
Total found: 5662. Displayed: 197.
Publication date: 17-08-2021

Number: RU2018142910A3
Publication date: 19-08-2019

SPEAKER VERIFICATION

Number: RU2697736C1
Assignee: GOOGLE LLC (US)

The invention relates to speaker verification. The technical result is accurate verification of the identity of speakers who speak different languages or dialects. Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, are provided for facilitating language-independent speaker verification. In one aspect, a method includes the actions of receiving, by a user device, audio data representing a fragment of a user's speech. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include the further actions of generating, based on the output of the neural network, a speaker representation and determining, based on the speaker representation and a second ...
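As a rough illustration of the pipeline this abstract describes — audio-derived features concatenated with a language identifier and fed into a neural network that emits a speaker representation, then compared with an enrolled representation — here is a minimal sketch. The single random linear layer with tanh and the 0.8 acceptance threshold are assumptions standing in for the trained model; nothing here is taken from the patent itself.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def one_hot(index, size):
    v = [0.0] * size
    v[index] = 1.0
    return v

def embed(audio_features, lang_id, weights, n_langs=4):
    # Input = audio-derived features concatenated with a one-hot language
    # identifier; one random linear layer + tanh stands in for the trained net.
    x = audio_features + one_hot(lang_id, n_langs)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

random.seed(0)
DIM_IN, N_LANGS, DIM_OUT = 8, 4, 5
weights = [[random.uniform(-1, 1) for _ in range(DIM_IN + N_LANGS)]
           for _ in range(DIM_OUT)]

# Enrollment utterance and a later, slightly different probe from the same speaker.
enrolled = embed([0.20] * DIM_IN, lang_id=1, weights=weights)
probe = embed([0.21] * DIM_IN, lang_id=1, weights=weights)
score = cosine(enrolled, probe)
accept = score > 0.8  # hypothetical verification threshold
```

Because the probe is nearly identical to the enrollment utterance and uses the same language identifier, the cosine score lands close to 1.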

Publication date: 02-02-2021

Biometric method for identifying a subscriber by a speech signal

Number: RU2742040C1

The invention relates to the field of computing and communications. The technical result is enabling remote identification of a subscriber. The method includes pre-defining a set of characteristics for a subscriber's speech portrait, establishing these characteristics for various subscribers, recording them in a database, determining the speech portrait of a calling subscriber, comparing that speech portrait with the speech portraits in the database, and identifying the subscriber. The voice characteristics used to build the speech portrait are determined using simultaneous spectral, temporal, and spectral-temporal analysis methods together with analog-to-digital conversion using the wavelet transform. Any fragment of speech except pauses can serve as the voice sample. Identification is performed not on the signal envelope but on a specially processed digital representation of the signal, which substantially speeds up the identification process ...
Publication date: 26-03-2018

Search algorithm for computer systems and databases

Number: RU2648572C1

The invention relates to means for searching in computer systems and databases. The technical result is the ability to match musical and textual information to each other on the basis of their rhythmic properties. The method includes indexing textual information, entering a query, and searching the index. To index the corpus of texts stored in the database, the search system computes rhythmic characteristics of the texts, namely vectors encoding the rhythmic properties of individual lines of text and of the text as a whole. The search system builds the vectors in two stages. In the first stage, a line is segmented into syllables, after which stresses are placed in the line automatically; at the output of the first stage, syllable boundaries are marked in the line of text and each syllable is labeled as stressed, unstressed, or of stress the search system cannot determine unambiguously. In the second stage, the search system computes a vector for each line of text, after which it computes ...
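The two-stage vector construction can be sketched as follows. The syllable segmentation here is a crude vowel-grouping heuristic for English text, and every syllable is marked "stress undetermined" (-1), since real stress assignment needs a pronunciation lexicon; both simplifications are assumptions, not the patent's method.

```python
VOWELS = set("aeiouy")

def syllabify(line):
    # First stage (sketch): approximate syllable boundaries by grouping each
    # vowel with the consonants that precede it.
    syllables, current = [], ""
    for ch in line.lower():
        if not ch.isalpha():
            continue
        current += ch
        if ch in VOWELS:
            syllables.append(current)
            current = ""
    if current and syllables:
        syllables[-1] += current  # trailing consonants join the last syllable
    return syllables

def line_vector(line):
    # Second stage (sketch): one entry per syllable, encoded as 1 (stressed),
    # 0 (unstressed) or -1 (stress undetermined) as the abstract describes;
    # without a lexicon every syllable stays undetermined here.
    return [-1 for _ in syllabify(line)]

text = ["Twinkle twinkle little star", "How I wonder what you are"]
vectors = [line_vector(line) for line in text]
```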

Publication date: 20-07-2000

ARRANGEMENT FOR SIGNAL PROCESSING

Number: DE0069230139T2
Publication date: 11-07-2019

Method for speech processing and speech processing apparatus

Number: DE102019100403A1

When making phone calls in public, users may hesitate to provide private or secret information because of the risk of eavesdropping. A hands-free solution for entering secret information into electronic voice communication devices is based on speech processing. A method for speech processing of a speech input data stream comprises the steps of searching the speech input data stream and detecting a spoken delimiter in it, determining a predefined audio sequence corresponding to the detected spoken delimiter, inserting the determined predefined audio sequence into the speech input data stream at the spoken delimiter, whereby a substituted speech data stream is obtained and speech sections of the speech input data stream at least before the spoken delimiter remain in the substituted speech data stream, and providing the substituted speech data stream at an audio output to a receiver.
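Assuming the speech has already been segmented and recognized into tokens, the delimiter-substitution step can be sketched like this; the token values and the `<beep>` delimiter marker are hypothetical.

```python
def substitute_at_delimiter(stream, delimiter, replacement):
    """Scan the input stream, detect the (already recognized) spoken
    delimiter, and splice the predefined audio sequence in at that point,
    keeping the speech that precedes the delimiter intact (sketch)."""
    out = []
    for token in stream:
        if token == delimiter:
            out.extend(replacement)  # insert the predefined audio sequence
        else:
            out.append(token)
    return out

# Hypothetical example: the user speaks a delimiter instead of a secret digit.
stream = ["my", "code", "is", "<beep>", "thanks"]
masked = substitute_at_delimiter(stream, "<beep>", ["tone1", "tone2"])
```

The receiver hears the predefined tones in place of the delimiter, while the surrounding speech is passed through unchanged.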

Publication date: 06-11-2003

Method and apparatus for speech verification

Number: DE0069725252D1
Assignee: TELIA AB, FARSTA
Publication date: 30-09-2020

Multi-user personalization at a voice interface device

Number: GB0002556656B
Assignee: Google LLC
Publication date: 01-04-2020

Microphone authentication

Number: GB0002567018B
Publication date: 11-10-2017

Detection of replay attack

Number: GB0201713699D0
Publication date: 12-02-2020

Detection of replay attack

Number: GB0201919464D0
Publication date: 13-12-2017

Neural networks for speaker verification

Number: GB0201717774D0
Publication date: 22-02-2023

Processing method and device

Number: GB0002610013A

A method obtains input information from an input member of an electronic apparatus. The input information includes a behaviour parameter captured in the process of inputting a target word. The method further includes determining a display parameter of the target word based on the behaviour parameter, so as to display the target word on a target display according to the display parameter. The display parameter represents feature information from when the target word was input. The input information may include trajectory information from the input, audio input information, or posture input information.
Publication date: 15-10-2009

ACOUSTIC CALLER IDENTIFICATION PROCEDURE

Number: AT0000445965T
Publication date: 15-08-2003

SPEAKER RECOGNITION

Number: AT0000246835T
Publication date: 15-08-2000

ARRANGEMENTS FOR SIGNAL PROCESSING

Number: AT0000194878T
Publication date: 02-05-2019

End-to-end speaker recognition using deep neural network

Number: AU2017322591A1
Assignee: Griffith Hack

The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable to perform speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained according to a loss function, e.g., utilizing a cosine measure of similarity between respective samples, along with positive and negative margins, to provide a robust representation of voiceprints.
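A minimal sketch of a triplet loss built on cosine similarity with positive and negative margins, as the abstract describes. The embedding values and margin settings are illustrative assumptions; in the patent the embeddings would come from the three trained feed-forward networks.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def triplet_cosine_loss(anchor, positive, negative, pos_margin=0.9, neg_margin=0.3):
    # Hinge-style triplet loss on cosine similarity: the same-speaker pair
    # should score above pos_margin, the cohort (different-speaker) pair
    # below neg_margin. The margin values are assumptions, not the patent's.
    loss_pos = max(0.0, pos_margin - cosine(anchor, positive))
    loss_neg = max(0.0, cosine(anchor, negative) - neg_margin)
    return loss_pos + loss_neg

anchor   = [0.9, 0.1, 0.0]   # embedding of an enrollment utterance
positive = [0.8, 0.2, 0.1]   # another utterance, same speaker
negative = [0.0, 0.9, 0.4]   # utterance from a cohort (different) speaker
loss = triplet_cosine_loss(anchor, positive, negative)
```

For this well-separated triplet both hinge terms are inactive, so the loss is zero; swapping the positive and negative samples produces a positive loss that a training loop would push back down.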

Publication date: 16-01-2020

METHOD, APPARATUS AND SYSTEM FOR SPEAKER VERIFICATION

Number: AU2019279933A1
Assignee: IP& Pty Ltd

The present disclosure relates to a method, apparatus, and system for speaker verification. The method includes: acquiring an audio recording; extracting speech signals from the audio recording; extracting features of the extracted speech signals; and determining whether the extracted speech signals represent speech by a predetermined speaker based on the extracted features and a speaker model trained with reference voice data of the predetermined speaker.
Publication date: 21-07-2005

Method for identifying people

Number: AU2004312589A1
Author: MARKUS KRESS
Publication date: 20-05-1980

AUTOMATIC SPEAKER VERIFICATION SYSTEMS EMPLOYING MOMENT INVARIANTS

Number: CA0001078066A1
Publication date: 15-03-2018

END-TO-END SPEAKER RECOGNITION USING DEEP NEURAL NETWORK

Number: CA0003096378A1
Assignee: HAUGEN, J. JAY
Publication date: 22-03-2018

CHANNEL-COMPENSATED LOW-LEVEL FEATURES FOR SPEAKER RECOGNITION

Number: CA0003036561A1
Assignee: HAUGEN, J. JAY

A system for generating channel-compensated features of a speech signal includes a channel noise simulator that degrades the speech signal, a feed forward convolutional neural network (CNN) that generates channel-compensated features of the degraded speech signal, and a loss function that computes a difference between the channel-compensated features and handcrafted features for the same raw speech signal. Each loss result may be used to update connection weights of the CNN until a predetermined threshold loss is satisfied, and the CNN may be used as a front-end for a deep neural network (DNN) for speaker recognition/verification. The DNN may include convolutional layers, a bottleneck features layer, multiple fully-connected layers and an output layer. The bottleneck features may be used to update connection weights of the convolutional layers, and dropout may be applied to the convolutional layers.
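The training signal described above — a loss between features computed from a degraded copy of the signal and handcrafted features of the same clean signal — can be sketched with a stand-in front-end. Here an affine map `(w, b)` replaces the CNN and the "handcrafted" features are just mean and peak; both are assumptions for illustration, not the patent's architecture.

```python
def degrade(signal, gain=0.5, offset=0.1):
    # Stand-in channel noise simulator (an assumption, not the patent's model).
    return [gain * s + offset for s in signal]

def handcrafted_features(signal):
    # Toy "handcrafted" target features: mean and peak of the signal.
    return [sum(signal) / len(signal), max(signal)]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Training pushes features of the *degraded* signal toward the handcrafted
# features of the *clean* signal, mirroring the loss in the abstract.
signal = [0.0, 0.4, 0.8, 0.4, 0.0]
target = handcrafted_features(signal)
h = handcrafted_features(degrade(signal))
w, b, lr = 1.0, 0.0, 0.5
for _ in range(500):  # crude gradient descent on the stand-in "weights"
    feats = [w * x + b for x in h]
    grad_w = sum(2 * (f - t) * x for f, t, x in zip(feats, target, h)) / len(h)
    grad_b = sum(2 * (f - t) for f, t in zip(feats, target)) / len(h)
    w, b = w - lr * grad_w, b - lr * grad_b
loss = mse([w * x + b for x in h], target)
```

After training, the compensated features of the degraded signal closely match the clean-signal targets, which is the stopping condition ("predetermined threshold loss") the abstract mentions.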

Publication date: 07-02-2008

IDENTIFICATION OF PEOPLE USING MULTIPLE TYPES OF INPUT

Number: CA0002653278A1

Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
Publication date: 25-05-2018

Identity verification method and device based on recurrent neural network

Number: CN0108074575A
Author: CHEN SHUDONG
Publication date: 04-01-2019

An apparatus and method for mobile payment at the vehicle end

Number: CN0109146492A
Author: SHI LIANG, JIA LIJUAN
Publication date: 14-12-2018

VOICE END-POINT DETECTION DEVICE, SYSTEM AND METHOD

Number: CN0109003626A
Publication date: 31-08-2018

AUDIO DIARISATION BY SEQUENTIAL CLASSIFICATION IN A REDUCED SPACE

Number: FR0003063377A1

Social interactions are the subject of applications in a multitude of domains: industry, health, defence, etc. Speech carries information that is essential for communication between humans and between humans and computing machines, including robots. Speech contains semantic data, carrying lexical meaning, and also non-semantic data, the latter having recently been the subject of rich fundamental and applied research. All non-semantic speech processing follows the artificial-intelligence paradigm: extraction of features from the audio signal; classification of the data to build a model with target "patterns", e.g. named emotions, stress, age, sex, etc.; and evaluation, or prediction, of the voice "patterns" on an audio stream close to the training context. In short, this comes down to the following phases: signal capture, classification in learning mode, and prediction. During the analysis ...
Publication date: 15-12-2020

Method for providing voice of each speaker

Number: KR0102190988B1
Publication date: 09-07-2020

METHOD FOR REAL-TIME SPEAKER DETERMINATION

Number: KR1020200083685A
Publication date: 03-02-2020

METHOD, APPARATUS AND COMPUTER PROGRAM FOR PROVIDING INTERACTION MESSAGE

Number: KR1020200011198A
Publication date: 25-04-2019

Number: KR1020190042919A
Publication date: 06-08-2001

Method and apparatus for speech verification

Number: SE0000515447C2
Publication date: 17-06-2010

METHOD FOR VERIFYING THE IDENTITY OF A SPEAKER, SYSTEM THEREFORE AND COMPUTER READABLE MEDIUM

Number: WO2010066310A1

The invention refers to a method of verifying the identity of a speaker based on the speaker's voice, comprising the steps of: receiving (1, 5) a first and a second voice utterance; using biometric voice data to verify (2, 6) that the speaker's voice corresponds to the speaker whose identity is to be verified, based on the received first and/or second voice utterance; and determining (8) the similarity of the two received voice utterances, characterized in that the similarity is determined using biometric voice characteristics of the two voice utterances or data derived from such biometric voice characteristics. The invention further refers to a system (80) for verifying the identity of a speaker based on the speaker's voice comprising: a component (81) for receiving a first and a second voice utterance; a component (82) for using biometric voice data to verify that the speaker's voice corresponds to the speaker whose identity is to be verified based on the received first and/or second ...
Publication date: 16-07-2019

Voice control of playback device using voice assistant service(s)

Number: US0010354658B2
Assignee: Sonos, Inc.

Disclosed herein are example techniques to identify a voice service to process a voice input. An example implementation may involve a playback device capturing, via a microphone array, audio into one or more buffers. The playback device analyzes the captured audio using multiple wake-word detection algorithms. When a particular wake-word detection algorithm detects a wake-word corresponding to a particular voice assistant service, the playback device transmits the captured audio to the particular voice assistant service. The captured audio includes a voice input that includes a command to modify at least one playback setting of a media playback system. After transmitting the captured audio, the playback device receives, from the particular voice assistant service, instructions to modify the at least one playback setting according to the command, modifies the at least one playback setting, and, with the at least one playback setting modified, plays back at least one audio track.
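The dispatch step — several wake-word detection algorithms examining the same captured audio, with the match deciding which voice assistant service receives it — might look like this in outline. The service names are hypothetical, and plain keyword matching stands in for real detection models.

```python
def detect_wake_word(captured_audio, detectors):
    """Run each wake-word "detector" over the same captured audio and return
    the voice service whose wake word fired, or None (sketch: detectors are
    plain substring matchers, not trained models)."""
    for service, wake_word in detectors.items():
        if wake_word in captured_audio:
            return service
    return None

detectors = {"ServiceA": "hey alpha", "ServiceB": "ok beta"}  # hypothetical names
service = detect_wake_word("ok beta turn the volume down", detectors)
# The playback device would now forward the captured audio (including the
# command "turn the volume down") to the matched voice assistant service.
```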

Publication date: 20-04-2021

Voice enhancement using depth image and beamforming

Number: US0010984816B2
Assignee: GOERTEK INC.

A voice enhancement method and apparatus of a smart device and a smart device are disclosed. The method comprises: monitoring and collecting a voice signal sent by a user in real time; determining a direction of the user according to the voice signal; collecting a depth image in the direction of the user; determining a sound source direction of the user according to the depth image; and adjusting a beamforming direction of a microphone array on the smart device according to the sound source direction of the user, and performing enhancement processing on the voice signal.

Publication date: 07-11-2019

VOICE IDENTIFICATION ENROLLMENT

Number: US2019341055A1

Examples are disclosed that relate to voice identification enrollment. One example provides a method of voice identification enrollment comprising, during a meeting in which two or more human speakers speak at different times, determining whether one or more conditions of a protocol for sampling meeting audio used to establish human speaker voiceprints are satisfied, and in response to determining that the one or more conditions are satisfied, selecting a sample of meeting audio according to the protocol, the sample representing an utterance made by one of the human speakers. The method further comprises establishing, based at least on the sample, a voiceprint of the human speaker.

Publication date: 31-10-2019

SPEAKER IDENTIFICATION

Number: US2019333522A1

A method of speaker identification comprises receiving an audio signal representing speech; performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and, if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker. The second voice biometric process is selected to be more discriminative than the first voice biometric process.
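The cascade in this abstract can be sketched as follows, with precomputed match scores and arbitrary thresholds standing in for the two voice biometric processes; the numbers are assumptions for illustration only.

```python
def first_stage(score):
    # Cheap, permissive first voice biometric process (stand-in threshold).
    return score > 0.5

def second_stage(score):
    # More discriminative second voice biometric process (stand-in threshold).
    return score > 0.9

def identify(cheap_score, precise_score):
    """Cascade from the abstract: only audio that passes the fast first
    process is handed to the more discriminative second one. Scores are
    assumed to be precomputed per-process match scores for the same audio."""
    if not first_stage(cheap_score):
        return False  # rejected early; the second process never runs
    return second_stage(precise_score)

decisions = [identify(0.3, 0.95),  # fails the first, cheap check
             identify(0.7, 0.60),  # passes the first, fails the second
             identify(0.8, 0.95)]  # passes both -> identified
```

The design saves the cost of the discriminative model on most audio while keeping its accuracy for the final decision.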

Publication date: 26-11-2019

Speech processing apparatus, speech processing method and computer-readable medium

Number: US0010490194B2
Assignee: NEC Corporation

A speech processing apparatus, method and non-transitory computer-readable storage medium are disclosed. A speech processing apparatus may include a memory storing instructions, and at least one processor configured to process the instructions to calculate an acoustic diversity degree value representing a degree of variation in types of sounds included in a speech signal representing a speech, on a basis of the speech signal, and compensate for a recognition feature value calculated to recognize specific attribute information from the speech signal, using the acoustic diversity degree value.

Publication date: 05-01-2021

Apparatuses and methods for recognizing object and facial expression robust against change in facial expression, and apparatuses and methods for training

Number: US0010885317B2

A facial expression recognition apparatus and method and a facial expression training apparatus and method are provided. The facial expression recognition apparatus generates a speech map indicating a correlation between a speech and each portion of an object based on a speech model, extracts a facial expression feature associated with a facial expression based on a facial expression model, and recognizes a facial expression of the object based on the speech map and the facial expression feature. The facial expression training apparatus trains the speech model and the facial expression model.

Publication date: 08-02-2018

VOICEPRINT-RECOGNITION-BASED SECURITY PROTECTION METHOD AND DEVICE

Number: US20180039767A1
Assignee: ZTE CORPORATION

Provided is a voiceprint-recognition-based security protection method. The method includes: acquiring voice data of a current user of a terminal and extracting voiceprint characteristic information from the voice data; matching the extracted voiceprint characteristic information of the current user of the terminal with a pre-saved voiceprint model of an owner of the terminal, and judging whether the current user of the terminal is the owner of the terminal; and when judging that the current user of the terminal is not the owner of the terminal, performing security protection processing on the terminal.

Publication date: 19-01-2017

Voice Controlled Multimedia Content Creation

Number: US20170019362A1

Voice controlled multimedia content creation techniques are discussed in which a multimedia package is created and shared to a specified destination responsive to voice commands. The voice commands can be received by a device as a single stream (e.g., a single phrase) that causes automatic performance of a sharing sequence, or as a series of multiple voice commands that are input in response to prompts for voice input as part of the sharing sequence. The voice commands can be recognized and handled by a content creation system of the device to select a clip for tagging of content (such as captured audio or video). The selected clip is then combined with the content to create the multimedia package. Voice commands can also be employed to specify a destination for sharing of the content, such as one or more contacts or a particular sharing site.

1. A method implemented by a computing device comprising: capturing audio input; recognizing one or more voice commands to create a multimedia package as indicated by the audio input; ascertaining contextual factors for creation of the multimedia package; determining a content clip for tagging of the multimedia package based at least in part upon the contextual factors; obtaining multimedia content for inclusion in the multimedia package; and forming the multimedia package by combining the multimedia content with the content clip.
2. A method as described in claim 1, wherein the multimedia content for inclusion in the multimedia package is pre-existing content identified via the one or more voice commands.
3. A method as described in claim 1, wherein the multimedia content for inclusion in the multimedia package is captured by the computing device responsive to the one or more voice commands.
4. A method as described in claim 1, wherein the method is performed responsive to the audio input received as a single input stream that includes the one or more voice commands.
5. A method as described in claim 1, wherein the one or ...
Publication date: 09-03-2017

METHOD AND DEVICE FOR SPEECH RECOGNITION

Number: US20170069320A1

Embodiments of the present disclosure provide a method and device for speech recognition. The solution comprises: receiving a first speech signal issued by a user; performing analog to digital conversion on the first speech signal to generate a first digital signal after the analog to digital conversion; extracting a first speech parameter from the first digital signal, the first speech parameter describing a speech feature of the first speech signal; if the first speech parameter coincides with a first prestored speech parameter in a sample library, executing control signalling instructed by the first digital signal, the sample library prestoring prestored speech parameters of N users, N≧1. The solution can be applied in a speech recognition process and can improve the accuracy of speech recognition.

Publication date: 08-10-2020

METHOD AND APPARATUS FOR DETECTING AN END OF AN UTTERANCE

Number: US20200321022A1

A device to perform end-of-utterance detection includes a speaker vector extractor configured to receive a frame of an audio signal and to generate a speaker vector that corresponds to the frame. The device also includes an end-of-utterance detector configured to process the speaker vector and to generate an indicator that indicates whether the frame corresponds to an end of an utterance of a particular speaker.

Publication date: 27-04-2017

ACOUSTIC AND SURFACE VIBRATION AUTHENTICATION

Number: US20170116995A1

Systems and methods for authorizing a user of a portable communications device entail sampling a user utterance via both an air mic (audio mic) and a conduction mic (surface mic or bone conduction mic). The difference between these signals is unique to each user since the tissues of each user will differ with respect to audio conduction. This difference may be characterized via a transform including magnitude, phase and time delay components. If the transform for a prospective user matches a stored transform for an authorized user, then the prospective user may be granted access.

Publication date: 16-03-2021

Audio fingerprint extraction method and device

Number: US0010950255B2

An audio fingerprint extraction method and device are provided. The method includes: converting an audio signal to a spectrogram; determining one or more characteristic points in the spectrogram; in the spectrogram, determining one or more masks for the characteristic points; determining mean energy of each of the spectrum regions; determining one or more audio fingerprint bits according to mean energy of the plurality of spectrum regions in the one or more masks; judging credibility of the audio fingerprint bits to determine one or more weight bits; and combining the audio fingerprint bits and the weight bits to obtain an audio fingerprint. Each of the one or more masks includes a plurality of spectrum regions.
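A sketch of the bit-and-weight stage, assuming the spectrogram, characteristic points, and masks have already been computed. Representing each mask as exactly two spectrum regions and the credibility margin value are assumptions for illustration; the patent allows a plurality of regions per mask.

```python
def mean_energy(region):
    return sum(x * x for x in region) / len(region)

def fingerprint(mask_regions, weight_margin=0.1):
    """For each mask (here: a pair of spectrum regions), emit a fingerprint
    bit from the sign of the energy difference, and a weight (credibility)
    bit that is 1 only when the difference is decisive (sketch)."""
    bits, weights = [], []
    for region_a, region_b in mask_regions:
        ea, eb = mean_energy(region_a), mean_energy(region_b)
        bits.append(1 if ea > eb else 0)
        weights.append(1 if abs(ea - eb) > weight_margin else 0)
    return bits, weights

masks = [
    ([0.9, 0.8], [0.1, 0.2]),  # strong energy contrast -> credible bit
    ([0.5, 0.5], [0.5, 0.4]),  # weak contrast -> low-credibility bit
]
bits, weights = fingerprint(masks)
# Combining bits and weights gives the final audio fingerprint, as claimed.
```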

Publication date: 06-03-2003

Method and system for non-intrusive speaker verification using behavior models

Number: US2003046072A1

A system and method for verifying user identity, in accordance with the present invention, includes a conversational system for receiving inputs from a user and transforming the inputs into formal commands. A behavior verifier is coupled to the conversational system for extracting features from the inputs. The features include behavior patterns of the user. The behavior verifier is adapted to compare the input behavior to a behavior model to determine if the user is authorized to interact with the system.

Publication date: 07-04-2020

System and method for performing caller identity verification using multi-step voice analysis

Number: US0010614813B2
Assignee: Intellisist, Inc.

Caller identity verification can be improved by employing a multi-step verification that leverages speech features obtained from multiple interactions with a caller. An enrollment is performed in which customer speech features and customer information are collected. When a caller calls into the call center, an attempt is made to verify the caller's identity by requesting the caller to speak a predefined phrase, extracting speech features from the spoken phrase, and comparing the features. If the purported identity of the caller can be matched with one of the customers based on the comparison, the identity of the caller is verified. If the match cannot be made with a high enough degree of confidence, the customer is asked to speak any phrase that is not predefined. Features are extracted from the caller's speech, combined with features previously extracted from the predefined speech, and compared to the enrollment features.
Publication date: 21-04-2020

Cepstral variance normalization for audio feature extraction

Number: US0010629184B2
Assignee: Intel Corporation

Cepstral variance normalization is described for audio feature extraction. In some embodiments a method includes receiving a sequence of frames of digitized audio from a microphone, determining a feature vector for a first frame of the sequence of frames, the feature vector being determined using an initial mean and an initial variance, updating the initial mean to a current mean using the determined feature vector for the first frame, updating the variance to a current variance using the current mean and the determined feature vector for the first frame, determining a next feature vector for each of subsequent frames of the sequence of frames, after determining a next feature vector for each subsequent frame, updating the current mean to a next current mean and updating the current variance to a next current variance and wherein determining a feature vector for a subsequent frame comprises using the next current mean and the next current variance, and sending the determined feature vectors ...
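The running-statistics update in this abstract can be sketched as streaming cepstral mean and variance normalization. An exponential moving average replaces the patent's exact update rule, and the initial mean/variance values are assumptions; the structure — normalize each frame with the current statistics, then update them for the next frame — follows the description.

```python
def cmvn_stream(frames, alpha=0.1, eps=1e-6):
    """Streaming CMVN sketch: an exponential moving average tracks the
    per-coefficient mean and variance; each frame is normalized with the
    statistics as they stood before that frame (initial mean 0 and
    variance 1 are assumptions, not values from the patent)."""
    dim = len(frames[0])
    mean, var = [0.0] * dim, [1.0] * dim
    out = []
    for frame in frames:
        # normalize with the current (pre-update) mean and variance
        out.append([(f - m) / (v + eps) ** 0.5
                    for f, m, v in zip(frame, mean, var)])
        # then update the running statistics with this frame
        for i, f in enumerate(frame):
            mean[i] = (1 - alpha) * mean[i] + alpha * f
            var[i] = (1 - alpha) * var[i] + alpha * (f - mean[i]) ** 2
    return out

frames = [[1.0, 2.0]] * 50  # a stationary toy "cepstral" stream
normalized = cmvn_stream(frames)
```

On a stationary input the running mean converges to the frame values, so later normalized frames approach zero, which is the intended effect of the normalization.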

Publication date: 14-07-2020

Voiceprint recognition model construction

Number: US0010714094B2

Technologies related to voiceprint recognition model construction are disclosed. In an implementation, a first voice input from a user is received. One or more predetermined keywords from the first voice input are detected. One or more voice segments corresponding to the one or more predetermined keywords are recorded. The voiceprint recognition model is trained based on the one or more voice segments. A second voice input is then received from the user, and the user's identity is verified based on the second voice input using the voiceprint recognition model.
Publication date: 05-10-2017

Unlocking Method and Electronic Device

Number: US20170287491A1

An unlocking method and electronic device are provided. The unlocking method includes: receiving input sound information; and, when a voiceprint feature of the sound information is within a preset voiceprint feature range of a corresponding function or application, unlocking the function or the application. The embodiments of the present disclosure keep a particular user group from operating a locked function or application, while enabling other user groups to unlock the corresponding function or application in at least one unlocking manner, thereby more fully meeting a user's requirements for encrypting a function or application of a terminal.
Publication date: 05-10-2017

SPEAKER RECOGNITION USING ADAPTIVE THRESHOLDING

Number: US20170287490A1

Techniques related to speaker recognition are discussed. Such techniques may include determining an adaptive speaker recognition threshold based on a speech to noise ratio and noise type label corresponding to received audio and performing speaker recognition based on the adaptive speaker recognition threshold and a speaker recognition score corresponding to received audio.
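A toy version of the adaptive thresholding idea: the threshold is computed as a function of the speech-to-noise ratio and a noise-type label, then compared with the recognition score. All numeric values and noise labels below are invented for illustration; only the structure follows the abstract.

```python
def adaptive_threshold(snr_db, noise_type):
    """Pick a speaker-recognition threshold from the speech-to-noise ratio
    and a noise-type label (sketch). Base value, slope and per-noise-type
    offsets are assumptions, not values from the patent."""
    offsets = {"quiet": 0.0, "babble": 0.10, "car": 0.05}
    base = 0.70 - 0.01 * min(max(snr_db, 0), 20)  # lower bar for cleaner audio
    return base + offsets.get(noise_type, 0.10)   # unknown noise -> cautious

def recognize(score, snr_db, noise_type):
    return score >= adaptive_threshold(snr_db, noise_type)

# The same recognition score can pass in quiet conditions yet fail in noise.
clean_ok = recognize(0.60, snr_db=20, noise_type="quiet")
noisy_ok = recognize(0.60, snr_db=5, noise_type="babble")
```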

Publication date: 10-01-2023

Payment method, client, electronic device, storage medium, and server

Number: US0011551219B2
Assignee: ALIBABA GROUP HOLDING LIMITED

Embodiments of this application disclose a payment method, a client, an electronic device, a storage medium, and a server. The method includes: receiving a payment instruction of a user; generating, according to audio information in a voice input of the user, a voice feature vector of the audio information; performing matching between the voice feature vector and a user feature vector; and when the matching succeeds, sending personal information associated with the user feature vector to a server, so that the server performs a payment operation for a resource account associated with the personal information. The method can bring convenience to shopping by a consumer.

Publication date: 03-05-2022

Voice user interface

Number: US0011322157B2

A method of speaker authentication comprises: receiving a speech signal; dividing the speech signal into segments; and, following each segment, obtaining an authentication score based on said segment and previously received segments, wherein the authentication score represents a probability that the speech signal comes from a specific registered speaker. In response to an authentication request, an authentication result is output based on the authentication score.
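The segment-wise scoring loop can be sketched as follows. Reducing segments to scalar averages and the distance-to-score mapping are gross simplifications standing in for real biometric scoring; only the accumulate-then-rescore structure is taken from the abstract.

```python
def segment_scores(segments, voiceprint):
    """After each incoming segment, recompute an authentication score from
    that segment and all previously received ones (sketch). The registered
    speaker's voiceprint is a single assumed statistic here."""
    scores, seen = [], []
    for segment in segments:
        seen.extend(segment)  # this segment plus everything received before
        avg = sum(seen) / len(seen)
        # map distance from the registered voiceprint into a (0, 1] score
        scores.append(1.0 / (1.0 + abs(avg - voiceprint)))
    return scores

voiceprint = 0.5  # enrolled speaker statistic (assumed)
segments = [[0.9, 0.8], [0.5, 0.5], [0.5, 0.5]]
scores = segment_scores(segments, voiceprint)
# An authentication request arriving after any segment is answered
# with the latest score, as the abstract describes.
```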

Publication date: 07-05-2024

System and method for automatic speech translation based on zero user interface

Number: US0011977855B2

The Zero User Interface (UI)-based automatic speech translation system and method can solve problems such as the procedural inconvenience of inputting speech signals and the malfunction of speech recognition due to crosstalk when users who speak different languages have a face-to-face conversation. The system includes an automatic speech translation server, speaker terminals and a counterpart terminal. The automatic speech translation server selects the speech signal of a speaker from among multiple speech signals received from speaker terminals connected to the automatic speech translation service and transmits the result of translating that speech signal into a target language to the counterpart terminal.
05-01-1984 дата публикации

PRIVATE COLLATOR

Номер: JP0059000192A
Принадлежит:

Подробнее
10-03-2010 дата публикации

ENHANCED CALLER IDENTIFICATION BASED ON SPEECH RECOGNITION

Номер: RU2383938C2

The invention relates to caller identification and enables reliable identification of a caller on the basis of speech recognition. A personalized context-free grammar (CFG grammar) is created for each potential call recipient and is configured to support identification of callers using voice recognition. Each CFG grammar contains an indication of highly probable callers, and the probability weighting coefficients in each CFG grammar are adjusted accordingly. When a recipient receives a call, the relevant CFG grammar is applied together with a voice recognition application to provide at least a preliminary identification of the caller. The caller can confirm the identification. Where possible, a standard caller-ID facility is used at least to assist the identification process. It is also possible to use ...

Подробнее
27-05-2011 дата публикации

METHOD FOR SPEAKER IDENTIFICATION FROM PHONOGRAMS OF ARBITRARY ORAL SPEECH BASED ON FORMANT EQUALIZATION

Номер: RU2419890C1

The invention relates to the field of speaker recognition by voice, in particular to methods for identifying a speaker from phonograms of arbitrary oral speech, intended among other things for forensic investigations. The essence of the method is that speaker identification from phonograms of oral speech is carried out by estimating the similarity between a first phonogram of the speaker and a second, reference phonogram. For this estimation, reference fragments of the speech signals containing formant trajectories of at least three formant frequencies are selected on the first and second phonograms; reference fragments in which the values of at least two formant frequencies coincide are compared with each other; the similarity of the compared reference fragments is assessed by the coincidence of the values of the remaining formant frequencies; and the similarity of the phonograms as a whole is determined from the total similarity score of all compared reference fragments. The technical result is reliable speaker identification both for long ...

Подробнее
17-05-2021 дата публикации

METHOD AND SYSTEM FOR USER AUTHENTICATION BY MEANS OF VOICE BIOMETRICS

Номер: RU2747935C2
Принадлежит: ПВ ГРУП (FR)

The invention relates to computing technology for authenticating a user by means of voice biometrics. The technical result is increased reliability of voice-biometric user authentication and improved resistance to attacks. This is achieved by obtaining reference data of an authorized user, during which the user utters a reference phrase at least once and the phrase is converted into a sequence of reference symbols by a statistical transformation common to all users whose reference data are to be obtained, and by an authentication test comprising a first step, during which a candidate user utters the reference phrase at least once and the uttered phrase is converted, in the same way as the reference phrase during the preliminary step and using the same transformation, into a sequence of candidate symbols, and a second step, during which ...

Подробнее
27-01-2006 дата публикации

ENHANCED CALLER IDENTIFICATION BASED ON SPEECH RECOGNITION

Номер: RU2004124499A
Принадлежит:

... 1. A computer-implemented method for determining the identity of a telephone caller associated with an incoming telephone call directed to a particular potential call recipient, comprising: creating and storing a personalized speech recognition grammar for a plurality of potential call recipients, including said particular potential call recipient; obtaining a speech sample from the telephone caller; and selecting the identity of the telephone caller based, at least in part, on the personalized speech recognition grammar associated with said particular potential call recipient. 2. The method of claim 1, wherein storing a personalized speech recognition grammar for a plurality of potential call recipients includes storing a speech recognition grammar that is weighted to support identification of a set of highly probable callers ...

Подробнее
12-04-1984 дата публикации

Номер: DE0002659083C2

Подробнее
20-06-2018 дата публикации

Speaker identification

Номер: GB0002557375A
Принадлежит:

A speaker recognition system extracts feature vectors from a signal to produce a match score to compare with stored models of enrolled speakers S1-S3, the method terminating upon speaker identification above a certainty threshold T1.2, or non-identification below a lower threshold T2.2. A Voice Activity Detector (VAD) triggers two parallel recognition processes S1 & S2 at t0, which accumulate match scores until the respective high and low thresholds are reached at t1 and t2, at which point the process is disabled until S2 speaks at t4. The process may be re-enabled during this period by a speech start event, e.g. a detected change of speaker direction or frequency. Only 1-2 seconds of resource-intensive biometric voice verification are thus required.
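The accumulate-until-threshold behaviour described above can be sketched as a simple running sum: per-frame match scores accumulate until either an upper (identify) or lower (reject) threshold is crossed, after which no further processing is needed. The threshold values and scores below are illustrative, not the patent's.

```python
# Sketch of dual-threshold early termination: match scores for an
# enrolled speaker accumulate frame by frame, and the process stops as
# soon as the running score crosses the accept or reject threshold,
# limiting how long the expensive biometric stage has to run.

def run_recognition(frame_scores, accept=3.0, reject=-3.0):
    total = 0.0
    for n, s in enumerate(frame_scores, start=1):
        total += s
        if total >= accept:
            return "identified", n      # certainty threshold reached
        if total <= reject:
            return "rejected", n        # non-identification threshold
    return "undecided", len(frame_scores)

result, frames_used = run_recognition([0.8, 1.1, 0.7, 0.9])
```

`frames_used` shows the early-exit point; with consistently positive frame scores, identification fires after only a few frames, which is the source of the "only 1-2 seconds of verification" claim.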

Подробнее
03-04-2019 дата публикации

Microphone authentication

Номер: GB0002567018A
Принадлежит:

A microphone authentication apparatus with a comparison block 301 which receives a signal indicative of one or more spectral parameters of at least part of an audio signal to be verified. The spectral parameters are compared to predetermined characteristic microphone resonance parameters associated with the acoustic port of a microphone (such as a Helmholtz resonance) and there is a determination block 305 which determines based on this comparison whether the audio signal originated from a microphone. The predetermined characteristics can relate to a peak frequency or range of peak frequencies or a quality factor or range thereof. They may also be parameters of a function, such as a parabolic curve, describing the characteristic resonance. The audio signal may be determined to originate from a generic microphone, or from a specific microphone associated with the system. A signed verification signal may be sent to a voice authentication module. The signal may be down-converted from a first ...

Подробнее
05-08-2020 дата публикации

Method, apparatus and system for speaker verification

Номер: GB0002580856A
Принадлежит:

The present disclosure relates to a method, apparatus, and system for speaker verification. The method includes: acquiring an audio recording; extracting speech signals from the audio recording; extracting features of the extracted speech signals; and determining whether the extracted speech signals represent speech by a predetermined speaker based on the extracted features and a speaker model trained with reference voice data of the predetermined speaker.

Подробнее
21-05-1980 дата публикации

Method of verifying a speaker

Номер: GB0002033637A
Принадлежит:

An improved method of verifying the speaker from whom long term spectral characteristics have been derived during a learning phase. Mean values and distance thresholds have been calculated and stored. Improved recognition is obtained by incorporating into the stored values additional values of speech samples in ranges situated outside the distance threshold value. When, during verification, a speech sample not situated within the distance threshold value is encountered, its distance from the additionally stored speech samples is successively determined and compared with the separate distance threshold value. The speaker is rejected only if the speech sample to be verified is not situated at a sufficiently small distance from any of the additionally stored speech samples. The method thus increases the recognition range to better approximate the actual distribution of speech samples of the learning phase.
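The fallback comparison described above can be sketched in a few lines. All distances, thresholds, and the scalar "sample" representation are illustrative assumptions; the patent operates on long-term spectral characteristics.

```python
# Sketch of threshold-plus-fallback verification: a sample outside the
# main mean-distance threshold is checked against additionally stored
# out-of-range samples (with their own, tighter threshold) before the
# speaker is finally rejected.

def verify(sample, mean, threshold, extra_samples, extra_threshold):
    if abs(sample - mean) <= threshold:
        return True                      # within the main threshold
    # Fallback: accept if close enough to any additionally stored sample
    return any(abs(sample - e) <= extra_threshold for e in extra_samples)

ok = verify(sample=5.0, mean=1.0, threshold=2.0,
            extra_samples=[4.8, 9.0], extra_threshold=0.5)
```

The extra stored samples widen the acceptance region toward the true distribution of the speaker's learning-phase samples, which is exactly the improvement the abstract claims.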

Подробнее
21-06-2017 дата публикации

Speaker identification

Номер: GB0201707094D0
Автор:
Принадлежит:

Подробнее
21-06-2023 дата публикации

Method and apparatus for improving speech intelligibility in a room

Номер: GB0002605693B
Принадлежит: PORSCHE AG [DE]

Подробнее
07-12-2022 дата публикации

Audio system with digital microphone

Номер: GB0002607505A
Принадлежит:

An audio system receives an audio signal from a digital microphone, which has an analog-digital converter with a controllable sampling rate. In response to a determination that a predetermined trigger phrase is not detected in the decimated audio signal, the sampling rate of the analog-digital converter in the digital microphone is controlled such that the audio signal has a first sample rate. In response to a determination that the predetermined trigger phrase is detected in the decimated signal, the sampling rate of the analog-digital converter in the digital microphone is controlled such that the audio signal has a second sample rate higher than the first sample rate, and the audio signal is applied to a spoof detection circuit, to determine whether the received signal contains live speech or replayed speech.
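The rate-switching control loop described above is essentially a two-state machine. A minimal sketch, with illustrative sample rates and a caller-supplied spoof check standing in for the spoof detection circuit:

```python
# Sketch of microphone sample-rate control: stay at a low rate while
# listening for the trigger phrase; on detection, switch the ADC to a
# higher rate and route the signal through spoof detection. The rates
# (16 kHz / 96 kHz) are illustrative, not taken from the patent.

LOW_RATE, HIGH_RATE = 16_000, 96_000

def control_step(trigger_detected, run_spoof_check):
    """Return (sample_rate, spoof_result) for the next capture period."""
    if not trigger_detected:
        return LOW_RATE, None            # idle: cheap, low-rate capture
    return HIGH_RATE, run_spoof_check()  # triggered: high rate + spoof check

rate, spoof = control_step(True, lambda: "live")
```

Keeping the ADC at the low rate until the trigger phrase appears is what saves power; the high rate is only paid for while the spoof detector needs the extra bandwidth.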

Подробнее
15-08-1996 дата публикации

METHOD FOR SPEAKER RECOGNITION

Номер: AT0000140552T
Принадлежит:

Подробнее
15-01-2012 дата публикации

SYSTEM AND METHOD FOR VOICE AUTHENTICATION

Номер: AT0000539430T
Автор: ADIBI, SASAN
Принадлежит:

Подробнее
15-05-2006 дата публикации

METHOD AND DEVICE FOR SPEAKER RECOGNITION AND VERIFICATION

Номер: AT0000323933T
Принадлежит:

Подробнее
19-12-2019 дата публикации

Technologies for authenticating a speaker using voice biometrics

Номер: AU2017274657B2

Technologies for authenticating a speaker in a voice authentication system using voice biometrics include a speech collection computing device and a speech authentication computing device. The speech collection computing device is configured to collect a speech signal from a speaker and transmit the speech signal to the speech authentication computing device. The speech authentication computing device is configured to compute a speech signal feature vector for the received speech signal, retrieve a speech signal classifier associated with the speaker, and feed the speech signal feature vector to the retrieved speech signal classifier. Additionally, the speech authentication computing device is configured to determine whether the speaker is an authorized speaker based on an output of the retrieved speech signal classifier. Additional embodiments are described herein.

Подробнее
21-07-2005 дата публикации

METHOD FOR IDENTIFYING PEOPLE

Номер: CA0002552247A1
Автор: KRESS, MARKUS
Принадлежит:

The invention relates to a method for identifying people, whereby a person is identified by comparing an electric signal derived from a sound produced by the person with a stored signal of the same kind. The invention is characterized in that the signals to be compared are derived from the subphonemic range of sound production. In particular, the signal corresponds to a quasi-period of a vowel or a semivowel.

Подробнее
04-05-2018 дата публикации

SYSTEM AND METHOD FOR PERFORMING CALLER IDENTITY VERIFICATION USING MULTI-STEP VOICE ANALYSIS

Номер: CA0002984787A1
Принадлежит:

Caller identity verification can be improved by employing a multi-step verification that leverages speech features that are obtained from multiple interactions with a caller. An enrollment is performed in which customer speech features and customer information are collected. When a caller calls into the call center, an attempt is made to verify the caller's identity by requesting the caller to speak a predefined phrase, extracting speech features from the spoken phrase, and comparing the phrase. If the purported identity of the caller can be matched with one of the customers based on the comparison, the identity of the caller is verified. If the match cannot be made with a high enough degree of confidence, the customer is asked to speak any phrase that is not predefined. Features are extracted from the caller's speech, combined with features previously extracted from the predefined speech, and compared to the enrollment features.
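The two-step fallback described above can be sketched as follows. The cosine similarity, the feature averaging used to "combine" the two utterances, and both confidence thresholds are illustrative assumptions, not the patent's actual matching procedure.

```python
# Sketch of multi-step caller verification: if the match on the
# predefined phrase is not confident enough, features from a freely
# spoken phrase are combined with the earlier features and compared
# against the enrollment features at a relaxed threshold.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def verify_caller(phrase_feats, free_feats, enrolled, high=0.95, low=0.80):
    if cosine(phrase_feats, enrolled) >= high:
        return "verified"                      # confident first-step match
    # Fallback: pool evidence from both utterances before deciding
    combined = [(p + f) / 2 for p, f in zip(phrase_feats, free_feats)]
    return "verified" if cosine(combined, enrolled) >= low else "rejected"

status = verify_caller([0.8, 0.6], [1.0, 0.0], enrolled=[1.0, 0.0])
```

The design point is that the second step never discards the first: evidence accumulates across interactions instead of each utterance being judged alone.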

Подробнее
27-10-2015 дата публикации

SYSTEM AND METHOD FOR LOW OVERHEAD VOICE AUTHENTICATION

Номер: CA0002720727C
Автор: ABIDI, SASAN

A system and method are provided to authenticate a voice in a frequency domain. A voice in the time domain is transformed to a signal in the frequency domain. The first harmonic is set to a predetermined frequency and the other harmonic components are equalized. Similarly, the amplitude of the first harmonic is set to a predetermined amplitude, and the harmonic components are also equalized. The voice signal is then filtered. The amplitudes of each of the harmonic components are then digitized into bits to form at least part of a voice ID. In another system and method, a voice is authenticated in a time domain. The initial rise time, initial fall time, second rise time, second fall time and final oscillation time are digitized into bits to form at least part of a voice ID. The voice IDs are used to authenticate a user's voice.
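The final "digitized into bits" step of the frequency-domain scheme can be sketched as follows. The bit depth, the normalization to the first harmonic, and the amplitude values are illustrative assumptions; the patent's equalization and filtering stages are omitted.

```python
# Sketch of forming a voice ID from harmonic amplitudes: each harmonic's
# amplitude, taken relative to the (normalized) first harmonic, is
# quantized to a fixed number of bits and the bit fields are
# concatenated into an ID string.

def voice_id_bits(harmonic_amplitudes, bits_per_harmonic=4):
    first = harmonic_amplitudes[0]
    levels = (1 << bits_per_harmonic) - 1
    out = []
    for a in harmonic_amplitudes:
        # Quantize the relative amplitude into [0, levels]
        q = round(min(a / first, 1.0) * levels)
        out.append(format(q, f"0{bits_per_harmonic}b"))
    return "".join(out)

vid = voice_id_bits([1.0, 0.5, 0.25])
```

Fixing the first harmonic's frequency and amplitude beforehand (as the abstract describes) is what makes such an ID comparable across utterances: only the relative harmonic structure survives quantization.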

Подробнее
27-10-2015 дата публикации

SYSTEM AND METHOD FOR LOW OVERHEAD VOICE AUTHENTICATION

Номер: CA0002720823C
Автор: ADIBI, SASAN

A system and method are provided to authenticate a voice in a frequency domain. A voice in the time domain is transformed to a signal in the frequency domain. The first harmonic is set to a predetermined frequency and the other harmonic components are equalized. Similarly, the amplitude of the first harmonic is set to a predetermined amplitude, and the harmonic components are also equalized. The voice signal is then filtered. The amplitudes of each of the harmonic components are then digitized into bits to form at least part of a voice ID. In another system and method, a voice is authenticated in a time domain. The initial rise time, initial fall time, second rise time, second fall time and final oscillation time are digitized into bits to form at least part of a voice ID. The voice IDs are used to authenticate a user's voice.

Подробнее
07-07-2011 дата публикации

METHOD AND SYSTEM FOR PROCESSING MULTIPLE SPEECH RECOGNITION RESULTS FROM A SINGLE UTTERANCE

Номер: CA0002785081A1
Принадлежит:

A method of and system for accurately determining a caller response by processing speech- recognition results and returning that result to a directed-dialog application for further interaction with the caller. Multiple speech-recognition engines are provided that process the caller response in parallel. Returned speech-recognition results comprising confidence-score values and word-score values from each of the speech-recognition engines may be modified based on context information provided by the directed-dialog application and grammars associated with each speech-recognition engine. An optional context database may be used to further reduce or add weight to confidence-score values and word-score values, remove phrases and/or words, and add phrases and/or words to the speech-recognition engine results. In situations where a predefined threshold-confidence-score value is not exceeded, a new dynamic grammar may be created. A set of n-best hypotheses of what the caller uttered is returned ...
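Combining the parallel recognizers' outputs can be sketched as weighted score pooling over an n-best list. The per-hypothesis context weights and confidence values below are illustrative; the patent also covers grammar-based modification and dynamic-grammar creation, which this sketch omits.

```python
# Sketch of fusing results from multiple speech-recognition engines:
# each engine returns hypotheses with confidence scores, context-based
# weights adjust them, and the combined scores are ranked into an
# n-best list returned to the directed-dialog application.

def n_best(engine_results, context_weights, n=2):
    """engine_results: list of {hypothesis: confidence} dicts,
    one per engine. Returns the n top-ranked hypotheses."""
    combined = {}
    for results in engine_results:
        for hyp, conf in results.items():
            weighted = conf * context_weights.get(hyp, 1.0)
            combined[hyp] = combined.get(hyp, 0.0) + weighted
    ranked = sorted(combined, key=combined.get, reverse=True)
    return ranked[:n]

best = n_best([{"yes": 0.7, "yeah": 0.2}, {"yes": 0.6, "no": 0.3}],
              context_weights={"no": 0.5})
```

Down-weighting "no" here plays the role of the optional context database that reduces or adds weight to scores based on dialog context.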

Подробнее
14-03-2000 дата публикации

A METHOD AND APPARATUS FOR SPEAKER RECOGNITION

Номер: CA0002158847C

Apparatus for speaker recognition which comprises means (210, 220, 230) for generating, in response to a speech signal, a plurality of feature data comprising a series of coefficient sets, each set comprising a plurality of coefficients indicating the short term spectral amplitude in a plurality of frequency bands, and means (260) for comparing said feature data with predetermined speaker reference data, and for indicating recognition of a corresponding speaker in dependence upon said comparison; characterised in that said frequency bands are unevenly spaced along the frequency axis, and by means (250) for deriving a long term average spectral magnitude of at least one of said coefficients; and for normalising the or each of said at least one coefficient by said long term average.
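The long-term-average normalization step described above can be sketched directly: each band coefficient is divided by its average over all frames, so slowly varying channel effects cancel out. The band values are illustrative, and the uneven band spacing is assumed to have happened upstream.

```python
# Sketch of normalizing band coefficients by their long-term average
# (per band, over all frames), as in the characterizing clause above.
# Stationary channel coloration multiplies every frame's coefficient
# equally, so dividing by the long-term average removes it.

def normalize_by_long_term_average(frames):
    """frames: list of per-frame band-amplitude lists (equal length)."""
    n_bands = len(frames[0])
    avg = [sum(f[b] for f in frames) / len(frames) for b in range(n_bands)]
    return [[f[b] / avg[b] for b in range(n_bands)] for f in frames]

normed = normalize_by_long_term_average([[2.0, 4.0], [4.0, 4.0]])
```

A band whose amplitude never varies normalizes to exactly 1.0 in every frame, which is why this kind of normalization suppresses channel effects while preserving speaker-dependent variation.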

Подробнее
05-09-2000 дата публикации

SPEECH RECOGNITION WITH PAUSE DETECTION

Номер: CA0002158849C

A recognition system comprising: input means for receiving a speech signal; recognition processing means for processing the speech signal to indicate its similarity to predetermined patterns to be recognised, said recognition processing means being arranged repeatedly to partition the speech signal into a pattern-containing portion and, preceding and following said pattern-containing portions, noise or silence portions, and to identify a pattern corresponding to said pattern containing portion; and output means for supplying a recognition signal indicating recognition of one of said patterns, characterised by pause detection means for detecting the noise or silence portion which follows the pattern-containing portion, and means, responsive to the detection thereof, arranged to supply a signal identifying the pattern currently corresponding to the pattern portion to the output means. Also provided are similarly operating rejection means.

Подробнее
30-07-2014 дата публикации

METHOD FOR SPEAKER IDENTIFICATION FROM PHONOGRAMS OF ARBITRARY ORAL SPEECH BASED ON FORMANT EQUALIZATION

Номер: EA0000019949B1

Method for identifying a speaker from phonograms of arbitrary oral speech based on formant equalization. The proposed method provides reliable speaker identification both for long and for short phonograms, for phonograms recorded in different channels with a high level of noise and distortion, and for phonograms of arbitrary oral speech by speakers in different psychophysiological states or speaking different languages, which ensures a wide field of application of the proposed method, including forensic investigations. Speaker identification from phonograms of oral speech is carried out by estimating the similarity between a first phonogram of the speaker and a second, reference phonogram. For this estimation, reference fragments of the speech signals containing formant trajectories of at least three formants are selected on the first and second phonograms, and reference fragments in which the values of at least two formant ...

Подробнее
30-07-2012 дата публикации

METHOD FOR SPEAKER IDENTIFICATION FROM PHONOGRAMS OF ARBITRARY ORAL SPEECH BASED ON FORMANT EQUALIZATION

Номер: EA201290082A1
Принадлежит:

The proposed method for identifying a speaker from phonograms of arbitrary oral speech based on formant equalization provides reliable speaker identification both for long and for short phonograms, for phonograms recorded in different channels with a high level of noise and distortion, and for phonograms of arbitrary oral speech by speakers in different psychophysiological states or speaking different languages, which ensures a wide field of application of the proposed method, including forensic investigations. Speaker identification from phonograms of oral speech is carried out by estimating the similarity between a first phonogram of the speaker and a second, reference phonogram. For this estimation, reference fragments of the speech signals containing formant trajectories of at least three formants are selected on the first and second phonograms, and reference fragments in which the values of at least two formant frequencies ...

Подробнее
03-09-2019 дата публикации

Electronic device, identity verification method, and computer-readable storage medium

Номер: CN0108564955B
Автор:
Принадлежит:

Подробнее
27-03-2020 дата публикации

Training classifiers using the selected subset of group samples

Номер: CN0106062871B
Автор:
Принадлежит:

Подробнее
15-04-2011 дата публикации

METHOD AND SYSTEM FOR AUTHENTICATING A USER AND/OR CRYPTOGRAPHIC DATA

Номер: FR0002940498B1
Принадлежит: THALES

Подробнее
09-05-1980 дата публикации

METHOD FOR VERIFYING THE VOICE OF AN INDIVIDUAL

Номер: FR0002438887A1
Автор:
Принадлежит:

Подробнее
07-11-1997 дата публикации

METHOD OF SPEAKER VOICE RECOGNITION IMPLEMENTING A PREDICTIVE MODEL, IN PARTICULAR FOR ACCESS CONTROL APPLICATIONS

Номер: FR0002748343A1
Автор:
Принадлежит:

Подробнее
22-04-2019 дата публикации

Номер: KR0101970753B1
Автор:
Принадлежит:

Подробнее
25-02-2020 дата публикации

Method for adding account, server, and computer storage medium

Номер: KR0102081495B1
Автор:
Принадлежит:

Подробнее
20-06-2017 дата публикации

VOICEPRINT INFORMATION MANAGEMENT METHOD AND APPARATUS, AND IDENTITY AUTHENTICATION METHOD AND SYSTEM

Номер: KR1020170069258A
Автор: 슝, 지안
Принадлежит:

... The present application relates to a voiceprint information management method and apparatus, and an identity authentication method and system, comprising: filtering historical voice files stored in an associated system to obtain voice information of a first user; obtaining text information corresponding to the voice information by text recognition processing; and compiling the voice information and the corresponding text information into reference voiceprint information of the first user. Because both the text information and the voice information in the reference voiceprint information are acquired from the aforementioned historical voice files rather than preset by the associated system, that is, they are not disclosed, the user cannot foresee the specific content of the text information to be read aloud when identity authentication is performed, and therefore a pre-recorded voice file cannot be replayed to achieve successful authentication. Hence, when identity authentication is performed on the basis of the voiceprint information management method provided by the embodiments of the present application, the authentication result is more accurate, no potential security risk exists, and account security is enhanced.

Подробнее
22-05-2020 дата публикации

METHOD FOR RETRIEVING CONTENT HAVING VOICE IDENTICAL TO VOICE OF TARGET SPEAKER AND APPARATUS FOR PERFORMING THE SAME

Номер: KR1020200056342A
Автор:
Принадлежит:

Подробнее
04-10-2012 дата публикации

Systems, methods, and media for generating hierarchical fused risk scores

Номер: US20120254243A1
Принадлежит: Victrio Inc

Systems, methods, and media for generating fused risk scores for determining fraud in call data are provided herein. Some exemplary methods include generating a fused risk score used to determine fraud from call data by generating a fused risk score for a leg of call data, via a fuser module of an analysis system, the fused risk score being generated by fusing together two or more uniquely calculated fraud risk scores, each of the uniquely calculated fraud risk scores being generated by a sub-module of the analysis system; and storing the fused risk score in a storage device that is communicatively couplable with the fuser module.
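The fusion step described above can be sketched as a weighted combination of the sub-modules' independently calculated risk scores. The weighted average used here is purely illustrative; the patent does not commit to a specific fusion function.

```python
# Sketch of a fuser module: several uniquely calculated fraud risk
# scores for one leg of call data are fused into a single score.
# Weights model how much each sub-module's score is trusted.

def fuse_risk_scores(scores, weights=None):
    if weights is None:
        weights = [1.0] * len(scores)        # unweighted by default
    total_w = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total_w

fused = fuse_risk_scores([0.9, 0.6, 0.3], weights=[2.0, 1.0, 1.0])
```

Because the fused score is a single number per call leg, it can be stored and thresholded downstream without the consumer needing to know how many sub-modules contributed.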

Подробнее
03-01-2013 дата публикации

Method and system for speaker diarization

Номер: US20130006635A1
Автор: Hagai Aronowitz
Принадлежит: International Business Machines Corp

A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speaker and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
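The extended feature vector described above can be sketched as follows. The one-dimensional Gaussian likelihoods, the frame summary, and the model means are simplifying assumptions standing in for the pre-trained acoustic models and background population model.

```python
# Sketch of extending a frame's acoustic feature vector with
# log-likelihood ratios (LLRs) of pre-trained speaker models against a
# background model; the extended vector then feeds segmentation and
# clustering.

import math

def log_likelihood(x, mean, var=1.0):
    # Log-density of a 1-D Gaussian (toy stand-in for an acoustic model).
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def extended_vector(frame_features, speaker_means, background_mean):
    x = sum(frame_features) / len(frame_features)   # toy frame summary
    llrs = [log_likelihood(x, m) - log_likelihood(x, background_mean)
            for m in speaker_means]
    return list(frame_features) + llrs              # original + LLR dims

vec = extended_vector([0.2, 0.4], speaker_means=[0.3, 1.0],
                      background_mean=0.0)
```

The appended LLR dimensions are large for speaker models that explain the frame better than the background population, which gives the clustering stage explicit speaker-affinity evidence rather than raw acoustics alone.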

Подробнее
27-06-2013 дата публикации

SYSTEM AND METHOD FOR RECOGNIZING A USER VOICE COMMAND IN NOISY ENVIRONMENT

Номер: US20130166279A1
Принадлежит: VEOVOX SA

An automatic speech recognition system for recognizing a user voice command in a noisy environment, including: matching means for matching elements retrieved from speech units forming said command with templates in a template library; characterized by processing means including a MultiLayer Perceptron for computing posterior templates (P^template(q)(O)) stored as said templates in said template library; means for retrieving posterior vectors (P^test(q)(O)) from said speech units, said posterior vectors being used as said elements. The present invention relates also to a method for recognizing a user voice command in noisy environments.
1. An automatic speech recognition system for recognizing a user voice command in a noisy environment, comprising: matching means for matching elements retrieved from speech units forming said command with templates in a template library; processing means including a MultiLayer Perceptron for computing posterior templates (P^template(q)(O)) stored as said templates in said template library; means for retrieving posterior vectors (P^test(q)(O)) from said speech units, said posterior vectors being used as said elements; calculating means for automatically selecting posterior templates stored in said template library, wherein said calculating means use a graph approach, such as Gabriel's approach, or the relative neighbour approach, or a linear interpolation to prepare posterior templates from training templates.
2. The system of claim 1, further comprising: a DTW decoder for matching posterior vectors with posterior templates.
3. The system of claim 2, further comprising a voice activity detector; and a dictionary.
4. The system of claim 1, wherein said MultiLayer Perceptron is multilingual.
5. The system of claim 1, comprising at least two MultiLayer Perceptrons, wherein each of said MultiLayer Perceptrons is used for a specific language.
6. The system of claim 1, wherein said template library is a pre-existing ...

Подробнее
18-07-2013 дата публикации

MULTIPLE CODING MODE SIGNAL CLASSIFICATION

Номер: US20130185063A1
Принадлежит: QUALCOMM INCORPORATED

Improved audio classification is provided for encoding applications. An initial classification is performed, followed by a finer classification, to produce speech classifications and music classifications with higher accuracy and less complexity than previously available. Audio is classified as speech or music on a frame by frame basis. If the frame is classified as music by the initial classification, that frame undergoes a second, finer classification to confirm that the frame is music and not speech (e.g., speech that is tonal and/or structured that may not have been classified as speech by the initial classification). Depending on the implementation, one or more parameters may be used in the finer classification. Example parameters include voicing, modified correlation, signal activity, and long term pitch gain.
1. A method comprising: receiving a portion of an audio signal at a first classifier; classifying the portion of the audio signal at the first classifier as speech or as music; if the portion is classified by the first classifier as speech, then encoding the speech using a first coding mode; and if the portion is classified by the first classifier as music, then: providing the portion to a second classifier; classifying the portion at the second classifier as speech or as music; if the portion is classified at the second classifier as speech, then encoding the portion using a second coding mode; and if the portion is classified at the second classifier as music, then encoding the portion using a third coding mode.
2. The method of claim 1, wherein the portion of the audio signal is a frame.
3. The method of claim 1, wherein the first coding mode comprises a first speech coder, the second coding mode comprises a second speech coder, and the third coding mode comprises a music coder.
4. The method of claim 3, wherein the first speech coder is a code excited linear predictive (CELP) type coder, the second speech coder is a ...
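The two-stage decision in the claims above maps onto a small cascade. The feature names (`voicing`, `pitch_gain`) and thresholds below are illustrative stand-ins for the parameters the abstract lists; the coder names are placeholders for the three coding modes.

```python
# Sketch of cascaded speech/music classification for coding-mode
# selection: a cheap first classifier labels each frame, and frames it
# calls "music" get a finer second check (to catch tonal, structured
# speech) before a coding mode is chosen.

def first_stage(frame):
    return "speech" if frame["voicing"] > 0.5 else "music"

def second_stage(frame):
    # Finer check: tonal/structured speech can look like music at first.
    return "speech" if frame["pitch_gain"] > 0.7 else "music"

def choose_coding_mode(frame):
    if first_stage(frame) == "speech":
        return "speech_coder_1"          # first coding mode
    if second_stage(frame) == "speech":
        return "speech_coder_2"          # second coding mode
    return "music_coder"                 # third coding mode

mode = choose_coding_mode({"voicing": 0.2, "pitch_gain": 0.9})
```

Only frames the first stage calls music pay for the finer classifier, which is where the "less complexity" claim comes from.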

Подробнее
08-08-2013 дата публикации

SPEAKER ADAPTATION OF VOCABULARY FOR SPEECH RECOGNITION

Номер: US20130204621A1
Принадлежит: NUANCE COMMUNICATIONS, INC.

A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
1. A method for constructing at least one speaker-specific recognition vocabulary from a speaker-independent recognition vocabulary that comprises a first group of words, wherein each word in the first group of words contains a first portion associated with plural alternate pronunciations in the speaker-independent recognition vocabulary for the respective word, the method comprising: recognizing, by at least one processor, a first keyword in speech input spoken by a first speaker, wherein the first keyword contains the first portion; identifying, by the at least one processor, a first spoken pronunciation for the first portion based, at least in part, on how the first speaker pronounced the first keyword in the speech input; and constructing a first speaker-specific recognition vocabulary by including, for each of the words in the first group of words, a first recognition pronunciation of the respective word selected from the plural alternate pronunciations based on the identified first spoken pronunciation.
2. The method of claim 1, wherein the first keyword is identified as a representative of the first group of words prior to recognizing the first keyword in the speech input.
3. The method of claim 1, comprising selecting, as the first recognition pronunciation, one of the plural alternate pronunciations based on comparing the first spoken pronunciation to a corresponding portion of each of the plural alternate pronunciations.
4. The method of claim 3, further comprising generating adaptation rules based upon the selected first recognition pronunciation, wherein the adaptation rules facilitate ...

Подробнее
29-08-2013 дата публикации

Methods employing phase state analysis for use in speech synthesis and recognition

Номер: US20130226569A1
Принадлежит: Lessac Tech Inc

A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods.

Подробнее
10-10-2013 дата публикации

Text dependent speaker recognition with long-term feature based on functional data analysis

Номер: US20130268272A1
Принадлежит: Sony Computer Entertainment Inc

One or more test features are extracted from a time domain signal. The test features are represented by discrete data. The discrete data is represented for each of the one or more test features by a corresponding one or more fitting functions, which are defined in terms of finite number of continuous basis functions and a corresponding finite number of expansion coefficients. Each fitting function is compressed through Functional Principal Component Analysis (FPCA) to generate corresponding sets of principal components. Each principal component for a given test feature is uncorrelated to each other principal component for the given test feature. A distance between a set of principal components for the given test feature and a set of principal components for one or more training features with the processing system is calculated. The test feature is classified according to the distance calculated with the processing system.
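The final distance-based classification step can be sketched as nearest-neighbour matching over component vectors. The Euclidean distance and the two-component vectors below are illustrative stand-ins for the FPCA principal components the abstract describes.

```python
# Sketch of classifying a test feature by its distance to training
# features, where each feature is represented by a short vector of
# expansion/principal components rather than by raw discrete data.

import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(test_components, training_sets):
    """training_sets: {label: component vector}. Nearest label wins."""
    return min(training_sets,
               key=lambda lbl: distance(test_components, training_sets[lbl]))

label = classify([0.9, 0.1], {"speaker_a": [1.0, 0.0],
                              "speaker_b": [0.0, 1.0]})
```

Working in the component space is what makes the distance meaningful: FPCA guarantees the components of a feature are mutually uncorrelated, so a simple Euclidean distance is a reasonable comparison.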

Publication date: 06-02-2014

System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain

Number: US20140037095A1
Assignee: Intellisis Corp

A system and method may be configured to process an audio signal. The system and method may track pitch, chirp rate, and/or harmonic envelope across the audio signal, may reconstruct sound represented in the audio signal, and/or may segment or classify the audio signal. A transform may be performed on the audio signal to place the audio signal in a frequency chirp domain that enhances the sound parameter tracking, reconstruction, and/or classification.
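The core of a frequency-chirp-domain transform can be illustrated by de-chirping: multiply the signal by a conjugate chirp for each candidate chirp rate and keep the rate that concentrates the spectrum into the sharpest peak. This is a minimal sketch of the idea, not the patent's transform; the signal parameters and candidate rates are invented for the demo.

```python
import cmath

def dft_peak(signal):
    """Largest DFT magnitude (naive O(N^2) transform for clarity)."""
    n = len(signal)
    return max(
        abs(sum(signal[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)))
        for k in range(n)
    )

def estimate_chirp_rate(signal, fs, candidates):
    """De-chirp with each candidate rate; the rate that turns the signal
    into the purest tone (sharpest spectral peak) wins."""
    def peak_for(c):
        dechirped = [s * cmath.exp(-1j * cmath.pi * c * (m / fs) ** 2)
                     for m, s in enumerate(signal)]
        return dft_peak(dechirped)
    return max(candidates, key=peak_for)

# Synthetic linear chirp: 8 Hz start, 200 Hz/s sweep, 1 s at fs = 64 Hz.
fs, f0, true_rate = 64.0, 8.0, 200.0
x = [cmath.exp(1j * (2 * cmath.pi * f0 * (m / fs)
                     + cmath.pi * true_rate * (m / fs) ** 2)) for m in range(64)]
best_rate = estimate_chirp_rate(x, fs, [0.0, 200.0, 400.0])
```

A full frequency-chirp-domain representation would evaluate this over a grid of (frequency, chirp rate) pairs; the matched candidate maximizes the peak because the residual chirp vanishes.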

Publication date: 13-02-2014

INFORMATION PROCESSING APPARATUS, COMPUTER PROGRAM PRODUCT, AND INFORMATION PROCESSING METHOD

Number: US20140046666A1
Assignee: KABUSHIKI KAISHA TOSHIBA

According to an embodiment, an information processing apparatus includes a dividing unit, an assigning unit, and a generating unit. The dividing unit is configured to divide speech data into pieces of utterance data. The assigning unit is configured to assign speaker identification information to each piece of utterance data based on an acoustic feature of the each piece of utterance data. The generating unit is configured to generate a candidate list that indicates candidate speaker names so as to enable a user to determine a speaker name to be given to the piece of utterance data identified by instruction information, based on operation history information in which at least pieces of utterance identification information, pieces of the speaker identification information, and speaker names given by the user to the respective pieces of utterance data are associated with one another. 1. An information processing apparatus comprising:a first receiving unit configured to receive speech data containing pieces of utterance data of speakers;a dividing unit configured to divide the speech data into the pieces of utterance data;an assigning unit configured to assign speaker identification information to each piece of utterance data based on an acoustic feature of the each piece of utterance data;a second receiving unit configured to receive, from a user, instruction information that indicates a piece of utterance data to which a speaker name is to be given from among the pieces of utterance data included in the speech data; anda generating unit configured to generate a candidate list that indicates candidate speaker names so as to enable the user to determine a speaker name to be given to the piece of utterance data identified by the instruction information, based on operation history information in which at least pieces of utterance identification information for identifying the respective pieces of utterance data, pieces of the speaker identification information that has 
...
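The dividing/assigning/generating flow above can be sketched with a toy acoustic feature and a frequency-ranked candidate list. Everything concrete here is an assumption: mean pitch as the acoustic feature, the greedy clustering with a fixed tolerance, and the sample history.

```python
from collections import Counter

def assign_speakers(features, tol=20.0):
    """Greedy online clustering on one acoustic feature (mean pitch):
    an utterance joins the nearest centroid within `tol`, otherwise it
    starts a new speaker id."""
    centroids, labels = [], []
    for f in features:
        best = min(range(len(centroids)),
                   key=lambda i: abs(centroids[i] - f), default=None)
        if best is not None and abs(centroids[best] - f) <= tol:
            labels.append(best)
            centroids[best] = (centroids[best] + f) / 2  # drift toward new sample
        else:
            labels.append(len(centroids))
            centroids.append(f)
    return labels

def candidate_names(history, speaker_id):
    """Candidate list from operation history: names the user previously
    gave to utterances of this speaker, ranked by frequency."""
    counts = Counter(name for sid, name in history if sid == speaker_id)
    return [name for name, _ in counts.most_common()]

pitches = [120.0, 125.0, 210.0, 118.0, 215.0]   # per-utterance mean pitch, Hz
labels = assign_speakers(pitches)
history = [(0, "Alice"), (0, "Alice"), (0, "Ann"), (1, "Bob")]
```

When the user points at an utterance, `candidate_names(history, labels[i])` yields the ranked candidate list for that utterance's speaker.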

Publication date: 13-01-2022

SYSTEM AND METHOD FOR PERFORMING VOICE BIOMETRICS ANALYSIS

Number: US20220012321A1
Assignee: NICE LTD.

In a system and method for audio analysis in a cloud-based computerized system, a real-time authentication (RTA) manager micro-service may send an audio packet to a voice processor micro-service. The voice processor may extract features of the audio. The RTA manager may obtain the extracted features from the voice processor; calculate, based on the extracted features, a quality grade of the audio packet; and send the extracted features to an at least one voice biometrics engine if the quality grade is above a threshold. Each of the at least one voice biometrics engines may be configured to generate a voiceprint of the audio packet, based on the extracted features of the audio packet, and to perform at least one of: authenticate a speaker, detect fraudsters, and enrich a previously stored voiceprint of the speaker with the voiceprint of the audio packet. 1. A method for audio analysis, performed by an at least one processor, the method comprising: a. sending part of a stream of audio to a voice processor, wherein the voice processor is configured to extract features of the audio of the part of a stream of audio; b. obtaining the extracted features from the voice processor; c. calculating, based on the extracted features, a quality grade of the part of a stream of audio; and d. sending the extracted features to an at least one voice biometrics engine if the quality grade is above a threshold; wherein each of the at least one voice biometrics engines is configured to generate a voiceprint of the part of a stream of audio, based on the extracted features of the part of a stream of audio. 2.
The method of claim 1 , wherein each of the at least one voice biometrics engines is further configured to perform at least one of:authenticate a speaker by comparing a voiceprint of the part of a stream of audio to a previously stored voiceprint of the same speaker,detect fraudsters by comparing the voiceprint of the part of a stream of audio to previously stored voiceprints of known fraudsters; ...
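The quality-gating step (grade the packet, forward features only above a threshold) can be sketched as below. The RMS-over-noise-floor grade, the 20 dB threshold, and the route names are illustrative assumptions, not the patent's metric.

```python
import math

def quality_grade(samples, noise_floor=0.01):
    """Toy quality grade: signal RMS relative to an assumed noise floor, in dB."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12) / noise_floor)

def route_packet(samples, threshold_db=20.0):
    """RTA-manager-style gate: forward features to the biometrics engines
    only when the packet's quality grade clears the threshold."""
    return "forward_to_biometrics" if quality_grade(samples) >= threshold_db else "discard"

loud = [0.5 * math.sin(0.1 * n) for n in range(400)]
quiet = [0.001 * math.sin(0.1 * n) for n in range(400)]
```

Gating before the engines keeps expensive voiceprint generation off low-quality audio, which is the point of computing the grade in the manager.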

Publication date: 04-01-2018

USER DEFINED KEY PHRASE DETECTION BY USER DEPENDENT SEQUENCE MODELING

Number: US20180005633A1
Assignee:

Techniques related to key phrase detection for applications such as wake on voice are discussed. Such techniques may include determining a sequence of audio units for received audio input representing a user defined key phrase, eliminating audio units from the sequence to generate a final sequence of audio units, and generating a key phrase recognition model representing the user defined key phrase based on the final sequence. 1. A computer-implemented method for user dependent key phrase enrollment comprising:determining a sequence of most probable audio units corresponding to a received audio input representing a user defined key phrase, wherein each audio unit of most probable audio units corresponds to a frame of a plurality of frames of the audio input;processing the sequence of most probable audio units to eliminate at least one audio unit from the sequence of most probable audio units to generate a final sequence of audio units; andgenerating a key phrase recognition model representing the user defined key phrase based on the final sequence of audio units, the key phrase recognition model comprising a single rejection state having a transition to a key phrase model, wherein the key phrase model comprises a plurality of states having transitions therebetween, the plurality of states including a final state of the key phrase model, wherein the plurality of states of the key phrase model correspond to the final sequence of audio units.2. The method of claim 1 , wherein the audio units comprises at least one of a sub-phonetic unit or a silence audio unit.3. 
The method of claim 1 , wherein processing the sequence of most probable audio units to eliminate at least one audio unit comprises determining a first sub-phonetic audio unit of the sequence and a second sub-phonetic audio unit of the sequence immediately temporally following the first sub-phonetic audio unit match and eliminating the first or second sub-phonetic audio unit from the sequence of most probable ...
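The elimination step in the claims (drop one of two matching consecutive sub-phonetic units) maps naturally onto a run-collapse. A minimal sketch, with the assumptions that silence units are also dropped and that the per-frame units and key phrase are invented:

```python
from itertools import groupby

def enroll_key_phrase(frame_units, silence="sil"):
    """Collapse runs of identical per-frame audio units (keeping one of
    each matching consecutive pair) and drop silence units, leaving the
    final sequence that becomes the key phrase model's states."""
    collapsed = [unit for unit, _ in groupby(frame_units)]
    return [u for u in collapsed if u != silence]

# Most-probable audio unit per frame for a user saying a key phrase.
frames = ["sil", "sil", "h", "h", "h", "e", "e", "sil", "l", "l", "o", "o", "sil"]
states = enroll_key_phrase(frames)
```

Each surviving unit would become one state of the key phrase model, reached from the single rejection state described in claim 1.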

Publication date: 02-01-2020

DIARIZATION USING LINGUISTIC LABELING

Number: US20200005796A1
Assignee: VERINT SYSTEMS LTD.

Systems and methods for diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction. 1.-20. (canceled) 21. A system for diarization and labeling of audio data, the system comprising: an audio database server comprising a plurality of audio files; a transcription server that transcribes the audio files into textual transcripts; receives a set of textual transcripts from the transcription server and a set of audio files associated with the set of textual transcripts from the audio database server, performs a blind diarization on the set of textual transcripts and the set of audio files to segment and cluster the textual transcripts into a plurality of textual speaker clusters, wherein the number of textual speaker clusters is at least equal to a number of speakers in the textual transcript, wherein the diarized textual transcripts are associated in groups of at least two, wherein the group of at least two includes a textual transcript originating from the identified group of speakers and at least one textual transcript originating from an other speaker, automatedly applies at least one heuristic to the textual speaker clusters with a processor to select textual speaker clusters likely to be associated with an identified group of speakers, wherein the at least one heuristic is a detection of a script associated with the identified group of speakers,
...
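The script-detection heuristic named in the claims (a cluster containing a known agent script line is likely the agent) can be sketched as below; the script phrases and transcripts are invented examples, and the agent/customer split is the simplest two-party case.

```python
AGENT_SCRIPTS = ["thank you for calling", "how may i help you"]  # assumed script lines

def label_clusters(clusters):
    """Heuristic labeling: a speaker cluster containing a known agent
    script line is labeled 'agent'; other clusters become 'customer'."""
    labels = {}
    for cluster_id, lines in clusters.items():
        text = " ".join(lines).lower()
        labels[cluster_id] = ("agent" if any(s in text for s in AGENT_SCRIPTS)
                              else "customer")
    return labels

clusters = {
    0: ["Thank you for calling Acme support", "Let me check that order"],
    1: ["Hi, my package never arrived"],
}
labels = label_clusters(clusters)
```

Transcripts selected this way would then feed the linguistic model that labels future transcripted audio.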

Publication date: 20-01-2022

INTERRUPT FOR NOISE-CANCELLING AUDIO DEVICES

Number: US20220020387A1
Assignee:

Implementations of the subject technology provide systems and methods for determining whether to interrupt a user of an audio device that is operating in a noise-cancelling mode of operation. For example, the user may desire to be interrupted by one or more pre-designated contacts that are identified at an associated electronic device as interrupt-authorized contacts, or by a person who speaks a designated keyword to the user. 1. A device of a first user, the device comprising: secure memory storing a plurality of contacts including contacts designated at the device as interrupt-authorized contacts for a peripheral device; and one or more processors configured to: provide audio content to the peripheral device, the audio content to be played by the peripheral device in a first mode of operation of the peripheral device or to be played combined with noise cancelling content by the peripheral device in a second mode of operation of the peripheral device; receive, from the peripheral device, information associated with a voice input received by the peripheral device from a person other than the first user during operation of the peripheral device in the second mode of operation; determine, at least in part based on the information received from the peripheral device, whether the person is one of the interrupt-authorized contacts; transmit an instruction to the peripheral device to switch from the second mode of operation to the first mode of operation if it is determined that the person is one of the interrupt-authorized contacts; and transmit an instruction to the peripheral device to continue operation in the second mode of operation if it is determined that the person is not one of the interrupt-authorized contacts. 2.
The device of claim 1 , wherein the interrupt-authorized contacts are not authorized users of the device or the peripheral device claim 1 , and wherein determining that the person is one of the interrupt-authorized contacts does not provide ...
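The mode-switch decision (interrupt only for authorized contacts or a designated keyword) reduces to a small gate. A minimal sketch; the mode names, keyword, and contact set are assumptions for illustration:

```python
def decide_mode(speaker_id, interrupt_authorized, spoken_word=None,
                keyword="emergency", current_mode="noise_cancelling"):
    """Switch the peripheral to pass-through when the detected speaker is
    an interrupt-authorized contact or speaks the designated keyword;
    otherwise keep the current mode (noise cancelling stays on)."""
    if current_mode == "noise_cancelling":
        if speaker_id in interrupt_authorized or spoken_word == keyword:
            return "pass_through"
    return current_mode

authorized = {"alice", "bob"}
```

In the patent's split of work, the speaker identification happens at the companion device holding the secure contact store; only the resulting mode instruction goes back to the headset.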

Publication date: 20-01-2022

AUDIO MODIFYING CONFERENCING SYSTEM

Number: US20220020388A1
Assignee:

A computer-implemented method for modifying audio-based communications produced during a conference call is disclosed. The computer-implemented method can include monitoring a plurality of utterances transmitted via an audio feed of a device connected to the conference call. The computer-implemented method can identify a first unwanted audio component transmitted via the audio feed. The computer-implemented method can actively modify the audio feed by removing the first unwanted audio component from the audio feed. 1. A computer-implemented method for modifying audio-based communications produced during a conference call , comprising:monitoring a plurality of utterances transmitted via an audio feed of a device connected to the conference call;identifying a first unwanted audio component transmitted via the audio feed; andactively modifying the audio feed by removing the first unwanted audio component from the audio feed.2. The computer-implemented method of claim 1 , wherein actively modifying the audio feed is based on:determining that the first unwanted audio component is being generated by a person that has not opted-in to the conference call.3. The computer-implemented method of claim 2 , further comprising:determining that the person has opted-in to the conference call; andpermitting, in response to the person opting-in to the conference call, the first unwanted audio component to be transmitted via the audio feed.4. 
The computer-implemented method of claim 2 , wherein determining if the person should be added to the conference call claim 2 , is based claim 2 , at least in part claim 2 , on:sending a prompt to a first person that has opted-in to the conference call;receiving, in response to the prompt, information indicative that the first person should be added to the conference call; andautomatically opting in, responsive to receiving the information indicative that the first person should be added to the conference call, the first person to the conference ...
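The monitor/identify/remove loop, with claim 2's opt-in rule, can be sketched at the utterance level. This treats each utterance as already attributed to a speaker, which is an assumption; the participants and text are invented.

```python
def modify_feed(utterances, opted_in):
    """Monitor the feed and strip the 'unwanted audio component':
    utterances from anyone who has not opted in to the call."""
    kept, removed = [], []
    for speaker, audio in utterances:
        (kept if speaker in opted_in else removed).append((speaker, audio))
    return kept, removed

utterances = [("host", "welcome everyone"),
              ("bystander", "background chatter"),
              ("guest", "hi all")]
kept, removed = modify_feed(utterances, opted_in={"host", "guest"})
```

Claim 3's flow corresponds to moving a speaker into `opted_in` and re-admitting their audio on the next pass.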

Publication date: 11-01-2018

Method and System for Facilitating the Detection of Time Series Patterns

Number: US20180012120A1
Author: Adrien Daniel
Assignee: NXP BV

According to a first aspect of the present disclosure, a method for facilitating the detection of one or more time series patterns is conceived, comprising building one or more artificial neural networks, wherein, for at least one time series pattern to be detected, a specific one of said artificial neural networks is built. According to a second aspect of the present disclosure, a corresponding computer program is provided. According to a third aspect of the present disclosure, a non-transitory computer-readable medium is provided that comprises a computer program of the kind set forth. According to a fourth aspect of the present disclosure, a corresponding system for facilitating the detection of one or more time series patterns is provided.
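The one-detector-per-pattern idea can be illustrated with the simplest possible "network": a single linear unit whose weights are the normalized template, i.e. a matched filter. This is a deliberate stand-in for the per-pattern artificial neural networks of the disclosure, not their actual construction; the pattern, series, and threshold are invented.

```python
def build_detector(pattern):
    """Build one detector per time-series pattern: a single linear unit
    whose weights are the normalized template (a matched filter)."""
    norm = sum(p * p for p in pattern) ** 0.5
    weights = [p / norm for p in pattern]

    def detect(series, threshold=0.99):
        """Return start indices where the windowed cosine score clears
        the threshold (scale-invariant match to the template)."""
        hits = []
        for i in range(len(series) - len(weights) + 1):
            window = series[i:i + len(weights)]
            wnorm = sum(x * x for x in window) ** 0.5 or 1.0
            score = sum(w * x for w, x in zip(weights, window)) / wnorm
            if score >= threshold:
                hits.append(i)
        return hits

    return detect

detect_rise = build_detector([1.0, 2.0, 3.0])   # one detector for this pattern
series = [0.0, 1.0, 2.0, 3.0, 0.0, 2.0, 4.0, 6.0]
hits = detect_rise(series)
```

Building a separate `detect` closure per pattern mirrors the "specific network per pattern" structure: each pattern gets its own weights rather than sharing one classifier.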

Publication date: 14-01-2016

SYSTEM AND METHODS FOR PERSONAL IDENTIFICATION NUMBER AUTHENTICATION AND VERIFICATION

Number: US20160012823A1
Author: Roos Derrick
Assignee:

Systems and methods to authenticate and verify user access replace the digits of a personal identification number (PIN) of a particular user with prompted randomized words that are to be uttered by an unidentified user. By virtue of this replacement, the PIN remains secret. A known speaker provides voice samples to the system in advance. The words uttered by the unidentified user (in response to the prompted words being displayed) correspond to digits. The uttered words are checked against the PIN, and are used to verify if the unidentified user's voice matches the voice of the known speaker. 1. A computing system for implementing an authentication and verification system, the system comprising: physical storage media configured to store information that represents audio characteristics of sounds generated by a speaker; and one or more physical processors configured to execute computer program ... : obtain a target personal identification sequence, wherein the target personal identification sequence is associated with the speaker; obtain a mapping between user-selectable input options and a set of prompts that represent words; obtain a target sequence of prompts that corresponds to the target personal identification sequence; effectuate presentation of the set of prompts to an unidentified user such that individual ones of the presented prompts are associated with individual ones of the user-selectable input options in accordance with the obtained mapping; obtain one or more audio files comprising sound generated by an unidentified user in response to the presentation; make a first determination whether the obtained one or more audio files represent a vocalization of the target sequence of prompts; make a second determination whether the obtained one or more audio files match the audio characteristics of sounds generated by the speaker; effectuate a grant of access to the unidentified user responsive to a positive first and second determination.
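The digit-to-word substitution and the first determination (do the uttered words match the PIN?) can be sketched as follows. The word table is hypothetical and fixed here for reproducibility; per the abstract, a real system would randomize the mapping per session, and the second (voice-match) determination is a separate biometric step not shown.

```python
DIGIT_WORDS = {  # hypothetical mapping; the real system randomizes it per session
    "0": "oak", "1": "river", "2": "stone", "3": "cloud", "4": "maple",
    "5": "ember", "6": "frost", "7": "cedar", "8": "dawn", "9": "harbor",
}

def prompts_for_pin(pin):
    """The display shows only the words mapped to the PIN's digits, so
    the digits themselves are never spoken and the PIN stays secret."""
    return [DIGIT_WORDS[d] for d in pin]

def utterance_matches_pin(uttered_words, pin):
    """First determination: do the recognized words vocalize the target
    sequence of prompts?"""
    return uttered_words == prompts_for_pin(pin)

pin = "4071"
prompts = prompts_for_pin(pin)
```

An eavesdropper hears only session words ("maple oak cedar river"), which are useless against the next session's fresh mapping.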

Publication date: 11-01-2018

SYSTEM AND METHODS FOR PRONUNCIATION ANALYSIS-BASED SPEAKER VERIFICATION

Number: US20180012602A1
Assignee:

A system and method for speaker verification based on using N-best speech recognition results. 1. A system for creating pronunciation analysis-based speaker verification comprising of:a speech recognition system that analyzes an utterance spoken by the user and returns a ranked list of recognized phrases;a speech analysis module that analyzes a list of recognized phrases and determines the parts of utterances that were pronounced correctly and the parts of utterances that were mispronounced;a star repository that contains star-like structures with the central node corresponding to a sequence of words or phonemes to be pronounced and the periphery nodes corresponding to results of ASR of pronunciation of the central node by a user or a group of users;a star generation system that finds sequences of phonemes, words and phrases that have homogeneous N-best results in multiple occurrences in one utterance and across multiple utterances for a user or a group of users and stores the results in star repository;a challenge phrase generation system that builds a set of phrases to be used to detect if a speaker is a legitimate user or an imposter using large corpora or internet at large to find phrases that correspond to stars that are consistently well recognized and stars that are consistently poorly recognized;a speaker verification system that uses challenge phrases to verify that the phrases that are consistently well recognized for a user continue to be well recognized during verification/authentication of a speaker, and the ones that were consistently mispronounced by a user are mispronounced during verification/authentication phase; anda human-machine interface that facilitates user registration and speaker verification phases.2. The system of where users' utterances are stored in an utterance repository accessible via the Internet.3. The system of claim 1 , further comprising a performance repository accessible via the Internet claim 1 , wherein users' ...

Publication date: 11-01-2018

SYSTEM AND METHODS FOR PRONUNCIATION ANALYSIS-BASED NON-NATIVE SPEAKER VERIFICATION

Number: US20180012603A1
Assignee:

A system and method for non-native speaker verification based on using N-best speech recognition results. 1. A system for creating pronunciation analysis-based non-native speaker verification comprising of: a speech recognition system that analyzes an utterance spoken by the user in the user's mother tongue (L1) and the user's acquired tongue (L2) and returns a ranked list of recognized phrases; a speech analysis module that analyzes a list of recognized phrases and determines the parts of utterances that were pronounced in L1 and/or L2 correctly and the parts of utterances that were mispronounced; a star repository that contains star-like structures with the central node corresponding to a sequence of words or phonemes to be pronounced and the periphery nodes corresponding to results of ASR of pronunciation of the central node by a user or a group of users for L1 and/or L2; a star generation system that finds sequences of phonemes, words and phrases in L1 and/or L2 that have homogeneous N-best results in multiple occurrences in one utterance and across multiple utterances for a user or a group of users and stores the results in a star repository; a challenge phrase generation system that builds a set of phrases in L1 and/or L2 to be used to detect if a speaker is a legitimate user or an imposter using large text corpora or internet at large to find phrases that correspond to stars that are consistently well recognized and stars that are consistently poorly recognized; a speaker verification system that uses challenge phrases in L1 and/or L2 to verify that the phrases that are consistently well recognized for a user continue to be well recognized during verification/authentication of a speaker, and the ones that were consistently mispronounced by a user are mispronounced during verification/authentication phase; and a human-machine interface that facilitates user registration and
speaker ...

Publication date: 12-01-2017

Call Distribution Techniques

Number: US20170013122A1
Assignee:

Described is a system and method for increasing the efficiency of phone call usage by using strategic call forwarding techniques to analyze incoming calls and process these calls in real time to: 1) divert unwanted robocallers and/or 2) provide information about unknown human callers. Robocallers are detected by analyzing incoming calls to determine if the audio is human-based or generated by a robocaller. Information about unknown human callers is obtained by real-time look up and reporting techniques to allow the user to determine whether to answer a call. 1. A method comprising:intercepting an incoming call placed by a caller to a receiver, wherein the incoming call comprises incoming call data;if a portion of the incoming call data matches a pre-established list of approved callers, forwarding the incoming call to the receiver;determining if the portion of the incoming call data matches a pre-established list of disapproved callers;answering the incoming call while playing a ringback tone to the caller for a preset time period;analyzing incoming call audio during the preset time period to determine if the caller is a human or a machine;if the caller is determined to be a human and the portion of incoming data matches the pre-established list of disapproved callers, terminating the call;if the caller is determined to be a human and the portion of incoming data does not match the pre-established list of disapproved callers, forwarding the incoming call to the receiver; andif the caller is determined to be a machine, forwarding the incoming call to a honeypot server.2. The method as in claim 1 , wherein analyzing the incoming call audio during the preset time period comprises calculating an average amplitude of the amount of noise produced during a segment of the incoming call audio.3. 
The method as in claim 2 , wherein analyzing the incoming call audio during the preset time period further comprises determining if the segment of the incoming call audio is speech ...
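Claim 2's average-amplitude analysis during the ringback window can be sketched like this. The threshold and the "steady and loud means machine" rule are purely illustrative assumptions; a real classifier would combine amplitude with the speech/non-speech test of claim 3.

```python
def average_amplitude(samples):
    """Mean absolute amplitude of an audio segment."""
    return sum(abs(s) for s in samples) / len(samples)

def classify_caller(segment, machine_threshold=0.30):
    """Toy human/machine decision: a robocaller typically plays its
    synthesized prompt straight into the ringback window, while a
    waiting human stays mostly quiet."""
    return "machine" if average_amplitude(segment) >= machine_threshold else "human"

robocall_audio = [0.5] * 100   # steady synthesized prompt
waiting_human = [0.05] * 100   # near-silence while the ringback plays
```

Per the method above, a "machine" verdict routes the call to the honeypot server instead of the receiver.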

Publication date: 10-01-2019

SYSTEM AND METHOD FOR EFFICIENT LIVENESS DETECTION

Number: US20190013026A1
Author: Feng Xuetao, Wang Yan
Assignee: ALIBABA GROUP HOLDING LIMITED

Embodiments described herein provide a system for facilitating liveness detection of a user. During operation, the system presents a verification interface to the user in a local display device. The verification interface includes one or more phrases and a reading style for a respective phrase in which the user is expected to recite the phrase. The system then obtains a voice signal based on the user's recitation of the one or more phrases via a voice input device of the system and determines whether the user's recitation of a respective phrase has complied with the corresponding reading style. If the user's recitation of a respective phrase has complied with the corresponding reading style, the system establishes liveness for the user. 1. A computer-implemented method for facilitating liveness detection of a user , the method comprising:presenting, by a computing device, a verification interface to the user in a local display device, wherein the verification interface includes one or more phrases and a reading style for a respective phrase in which the user is expected to recite the phrase;obtaining a voice signal based on the user's recitation of the one or more phrases via a voice input device of the computing device;determining whether the user's recitation of a respective phrase has complied with the corresponding reading style; andin response to determining that the user's recitation of a respective phrase has complied with the corresponding reading style, establishing liveness for the user.2. The method of claim 1 , further comprising providing a read-out of a respective phrase of the one or more phrases in a corresponding reading style as a guideline to the user.3. The method of claim 1 , further comprising:determining whether the user has recited a respective phrase correctly;wherein establishing liveness for the user is further dependent upon determining that the user has recited a respective phrase correctly.4. 
The method of claim 1 , further comprising: ...
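The compliance check (did the user recite each phrase in the requested reading style?) can be sketched with duration as a crude proxy for style. The style names, duration windows, and phrases are all assumptions; the patent does not specify how compliance is measured.

```python
STYLE_BOUNDS = {"fast": (0.0, 2.0), "normal": (2.0, 4.0), "slow": (4.0, 8.0)}  # seconds, assumed

def complies(style, recitation_seconds):
    """Crude style check: the recitation's duration must fall in the
    window expected for the requested reading style."""
    low, high = STYLE_BOUNDS[style]
    return low <= recitation_seconds < high

def establish_liveness(readings):
    """Liveness holds only if every prompted phrase was recited in its
    corresponding reading style, per the method above."""
    return all(complies(style, seconds) for _phrase, style, seconds in readings)
```

Because the phrase/style pairs are issued fresh at verification time, a pre-recorded replay is unlikely to match the requested styles.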

Publication date: 10-01-2019

DETECTING REPLAY ATTACKS IN VOICE-BASED AUTHENTICATION

Number: US20190013033A1
Assignee:

Disclosed are various embodiments for detecting replay attacks in voice-based authentication systems. In one embodiment, audio is captured via an audio input device. It is then verified that the audio includes a voice authentication factor spoken by a user. The audio is then compared with stored audio spoken by the user. If it is determined that an exact copy of the voice authentication factor is in the stored audio, one or more actions may be performed. 1. A method , comprising:receiving, via at least one of one or more computing devices, audio captured via an audio input device at a first geographic location;verifying, via at least one of the one or more computing devices, that the audio includes a voice authentication factor spoken by a user;receiving, via at least one of the one or more computing devices, information indicating that the user is physically present at a second geographic location instead of the first geographic location when the audio was captured; andperforming, via at least one of the one or more computing devices, at least one action in response to receiving the information, the at least one action comprising at least one of: causing a notification of authentication failure to be played by a speaker, requesting that the user provide another authentication factor, sending a notification to an administrator, blacklisting a network address, disabling access to an account associated with the user, storing the audio in a data store, or causing a honeypot mode to be entered by the one or more computing devices.2. The method of claim 1 , wherein the information includes a determination that the audio includes a voice in a language that differs from an expected language.3. The method of claim 1 , wherein the information includes a geolocation determination of the second geographic location.4. The method of claim 3 , wherein the second geographic location corresponds to a country never previously visited by the user.5. 
The method of claim 1 , further ...
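The exact-copy test (compare new audio against stored audio from earlier sessions) can be sketched with a fingerprint over quantized samples; a bit-identical replay collides with its stored record. The quantization scheme is an illustrative assumption, and a production detector would be robust to re-encoding rather than exact-match only.

```python
import hashlib

def fingerprint(samples, quant=1000):
    """Exact-copy fingerprint: hash coarsely quantized samples so that a
    bit-identical replay of previously stored audio collides with its
    stored record."""
    quantized = bytes(int(s * quant) & 0xFF for s in samples)
    return hashlib.sha256(quantized).hexdigest()

stored_fingerprints = {fingerprint([0.1, 0.2, 0.3, 0.2])}  # audio kept from past logins

def is_replay(samples):
    return fingerprint(samples) in stored_fingerprints
```

A replay hit would then trigger one of the listed actions (notify, request another factor, blacklist, honeypot) rather than granting access.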

Publication date: 18-01-2018

Smart unmanned aerial vehicle for home

Number: US20180016006A1
Author: Wenyan Jiang, Yu Tian
Assignee: Haoxiang Electric Energy Kunshan Co Ltd

The present invention discloses a smart unmanned aerial vehicle for home which includes a route reconnaissance module to respond to reconnaissance instructions which instruct the unmanned aerial vehicle to patrol a house according to a preset flight route; a feature recognition module to recognize family members and different reactions of the family members toward the unmanned aerial vehicle for generating and recording an instruction set for different family members; a control module to generate reconnaissance instructions according to external reconnaissance control signals, wherein when an exception occurs, warning signals are generated; the control module interrupts a patrol according to an external patrol interruption signal or when the feature recognition module recognizes the family members; the control module responds or waits for a response to the control signals for actions of the feature recognition module. The present invention is customized for family members and is able to monitor the home.

Publication date: 21-01-2016

SPEAKER RECOGNITION FROM TELEPHONE CALLS

Number: US20160019897A1
Assignee:

The present invention relates to a method for speaker recognition, comprising the steps of obtaining and storing speaker information for at least one target speaker; obtaining a plurality of speech samples from a plurality of telephone calls from at least one unknown speaker; classifying the speech samples according to at least one unknown speaker thereby providing speaker-dependent classes of speech samples; extracting speaker information for the speech samples of each of the speaker-dependent classes of speech samples; combining the extracted speaker information for each of the speaker-dependent classes of speech samples; comparing the combined extracted speaker information for each of the speaker-dependent classes of speech samples with the stored speaker information for at least one target speaker to obtain at least one comparison result; and determining whether at least one unknown speaker is identical with at least one target speaker based on at least one comparison result.
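The combine-then-compare step (pool speaker information across calls of one unknown-speaker class, then compare with the stored target) can be sketched with averaged speaker vectors and cosine similarity. The three-dimensional vectors and the 0.95 decision threshold are toy assumptions standing in for real speaker embeddings.

```python
def cosine(u, v):
    """Cosine similarity between two speaker vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / ((sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5))

def combine(vectors):
    """Pool the speaker vectors extracted from each call of one
    speaker-dependent class into a single averaged vector."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

target = [1.0, 0.0, 0.5]                             # stored target-speaker vector
unknown_calls = [[0.9, 0.1, 0.6], [1.1, -0.1, 0.4]]  # same unknown speaker, two calls
score = cosine(combine(unknown_calls), target)
same_speaker = score > 0.95
```

Pooling across calls is the point of the method: per-call noise averages out, so the combined vector compares more reliably against the target than any single call would.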

Publication date: 03-02-2022

VOICE RECOGNITION METHOD AND ELECTRONIC DEVICE USING THE SAME

Number: US20220036902A1
Author: LIANG Pei-Lin
Assignee:

A voice recognition method is provided. The voice recognition method includes: collecting a plurality of voice signals; extracting the voiceprint features of each of the voice signals; performing a data process on the voiceprint features, to convert the voiceprint features into an N-dimensional matrix, where N is an integer greater than or equal to 2; performing a feature normalization process on the N-dimensional matrix to obtain a plurality of voiceprint data; classifying the voiceprint data to generate a clustering result; finding out a centroid of each cluster according to the clustering result, and registering the voiceprint data adjacent to each of the centroids. The disclosure also provides an electronic device that is adapted for the voice recognition method. 1. A voice recognition method, comprising: collecting a plurality of voice signals; extracting voiceprint features of each of the voice signals; performing a data process on the voiceprint features, to convert the voiceprint features into an N-dimensional matrix, where N is an integer greater than or equal to 2; performing a feature normalization process on the N-dimensional matrix to obtain a plurality of voiceprint data; classifying the voiceprint data to generate a clustering result; and finding out a centroid of each cluster according to the clustering result, and registering the voiceprint data adjacent to each of the centroids. 2. The voice recognition method according to claim 1, after the step of classifying the voiceprint data to generate the clustering result, further comprising: performing a gender recognition process on the voiceprint data to obtain a gender data of each of the voiceprint data, and updating the clustering result according to the gender data. 3. The voice recognition method according to claim 1, wherein the step of performing the data process on the voiceprint features further comprises: using a t-distributed stochastic neighbor embedding (t-SNE) method to obtain the N-dimensional ...
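The final registration step (find each cluster's centroid, then register the voiceprint datum adjacent to it) can be sketched as below. The 2-D points stand in for normalized voiceprint data, and the clustering result is given rather than computed; the classification step itself is assumed done upstream.

```python
def centroid(points):
    """Component-wise mean of a cluster's points."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))

def register_representatives(data, labels):
    """For each cluster in the clustering result, find its centroid and
    register the voiceprint datum closest (adjacent) to that centroid."""
    registered = {}
    for label in set(labels):
        members = [p for p, l in zip(data, labels) if l == label]
        c = centroid(members)
        registered[label] = min(
            members, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, c)))
    return registered

# 2-D stand-ins for normalized voiceprint data with a given clustering result.
data = [(0.0, 0.0), (0.2, 0.0), (1.0, 1.0), (1.4, 1.0), (0.1, 0.1), (1.1, 1.1)]
labels = [0, 0, 1, 1, 0, 1]
registered = register_representatives(data, labels)
```

Registering the near-centroid datum rather than the raw centroid keeps an actual observed voiceprint, not a synthetic average, as each speaker's enrolled template.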

Publication date: 18-01-2018

Call Forwarding to Unavailable Party Based on Artificial Intelligence

Number: US20180018969A1
Assignee:

A called party indicates that he or she is unavailable to receive a call. However, by way of any one, or a combination, of determining who the caller is, where the caller is located, what he is speaking about, or the like, as well as comparing this to prior calls, the call might still be sent to the called party. This can be done by way of speech recognition of the call and creating a transcript, and by receiving feedback from the called party about prior calls. 1. A method of conditionally forwarding a received phone call to a bidirectional transceiver associated with a called party, comprising the steps of: receiving said phone call at a network node, said phone call directed towards a called party; determining an identity of a calling party based on at least one of call identification information, voice recognition, and speech recognition; determining that said called party is unavailable; detecting urgency in a voice of said calling party based on content, as determined by speech recognition, of said phone call originating from said calling party; 2. The method of claim 1, comprising an additional step of forwarding said call to said bidirectional transceiver associated with said called party based on said detecting of urgency. 3. The method of claim 2, further comprising a step of transcribing into text said audio within said phone call originating from said calling party; and wherein said step of detecting urgency is based on a keyword within said text which has been pre-designated as a keyword which indicates said urgency. 4. The method of claim 2, wherein said step of detecting urgency is further based on a combination of tone and speed of speech above a pre-defined threshold indicating said urgency. 5.
The method of claim 1, wherein: urgency is detected in said voice of said calling party and said call; a request from said calling party for the call to be sent to said calling party is denied based on said call identification information matching pre ...
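Claims 3 and 4 above combine a pre-designated keyword test with tone/speed thresholds. A minimal sketch of such an urgency check; the keyword list and threshold values are hypothetical, not taken from the patent.

```python
# Hypothetical pre-designated keyword list (claim 3).
URGENT_KEYWORDS = {"emergency", "urgent", "immediately", "asap"}

def detect_urgency(transcript, pitch_hz, words_per_minute,
                   pitch_threshold=220.0, speed_threshold=180.0):
    """Flag a call as urgent if the transcript contains a pre-designated
    keyword, or if tone (pitch) and speed both exceed their thresholds
    (the claim-4 combination)."""
    words = {w.strip(".,!?").lower() for w in transcript.split()}
    if words & URGENT_KEYWORDS:
        return True
    return pitch_hz > pitch_threshold and words_per_minute > speed_threshold

print(detect_urgency("Please call me back immediately!", 150.0, 120.0))  # True
print(detect_urgency("Just checking in.", 150.0, 120.0))                 # False
```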

More details
18-01-2018 publication date

SPEAKER VERIFICATION

Number: US20180018973A1
Assignee:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, to facilitate language-independent speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network produced in response to receiving the set of input data, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user. 1. A computer-implemented method comprising: receiving, by a mobile device that implements a language-independent speaker verification model comprising a neural network that is stored on the mobile device and configured to determine whether received audio data likely includes an utterance of one of multiple language-specific hotwords, (i) particular audio data corresponding to a particular utterance of a user, and (ii) data indicating a particular language spoken by the user; and in response to receiving (i) the particular audio data corresponding to a particular utterance of a user, and (ii) the data indicating a particular language spoken by the user, providing, for output, an indication that the language-independent speaker verification model has determined that the particular audio data likely includes the utterance of a hotword designated for the particular language spoken by the user. 2.
The computer-implemented method of claim 1, wherein providing, for output, the indication comprises providing ...
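One way to picture a language-conditioned verification model: the language identifier enters the model alongside the acoustic input. The sketch below stands in for the on-device neural network with a plain concatenation plus cosine comparison; the language list, threshold, and feature values are all invented for illustration.

```python
import math

LANGUAGES = ["en-US", "ko-KR", "fr-FR"]  # illustrative language identifiers

def one_hot(language):
    """Encode the language identifier as a one-hot vector."""
    return [1.0 if language == lang else 0.0 for lang in LANGUAGES]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def verify(audio_features, language, enrolled, threshold=0.8):
    """Concatenate acoustic features with the language one-hot vector,
    derive a speaker representation (a stand-in for the on-device neural
    network), and accept if it is close enough to the enrolled one."""
    representation = audio_features + one_hot(language)
    return cosine(representation, enrolled) >= threshold

enrolled = [0.9, 0.1, 0.2] + one_hot("en-US")
print(verify([0.88, 0.12, 0.25], "en-US", enrolled))  # True
print(verify([0.1, 0.9, 0.9], "ko-KR", enrolled))     # False
```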

More details
22-01-2015 publication date

BIOMETRIC AUDIO SECURITY

Number: US20150025889A1
Assignee: Max Sound Corporation

A biometric audio security system comprises providing an input voice audio source. The input audio is enhanced in two or more harmonic and dynamic ranges by re-synthesizing the audio into a full range PCM wave. A hardware key with a set of audio frequency spikes (identifiers) with varying amplitude and frequency values is provided. The enhanced voice audio input and the key are summed using additive resynthesis. The voice and the spike set are compared against the user's identification signature to verify the user's identity. The set of audio spikes is user specific. The spikes are stored on the protected key device as a template, which would plug into the system. The template is determined by the owner/manufacturer of the system. The spikes are created and identified using the additive synthesis technique with a predetermined number of partials (harmonics). The identifiers include both positive and negative values. The amplitude and frequency values are spaced in very fine intervals. The enhancing of the voice audio input includes parallel processing of the input audio as follows: a module that is a low-pass filter with dynamic offset; 1. A biometric audio security system comprising: providing an input voice audio source; enhancing the voice audio input in two or more harmonic and dynamic ranges by re-synthesizing the audio into a full range PCM wave; providing a hardware key with a set of audio frequency spikes (identifiers) with varying amplitude and frequency values; summing the enhanced voice audio input and the key using additive resynthesis; and comparing the voice and the spike set against the user's identification signature to verify the user's identity. 2. The system of claim 1, wherein the set of audio spikes is user specific. 3. The system of claim 1, wherein the spikes are stored on the protected key device as a template, which would plug into the system. 4. The system of claim 1, wherein the template is determined by the owner/manufacturer of the system. 5.
The system of claim 1, wherein the spikes ...
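The additive synthesis technique the claims rely on, summing a predetermined set of (amplitude, frequency) partials, with both positive and negative amplitude identifiers, into a waveform and then summing that key with the enhanced voice, can be sketched as follows. The spike set and sample values are hypothetical.

```python
import math

def additive_synth(partials, sample_rate=8000, n_samples=8):
    """Sum sinusoidal partials (amplitude, frequency in Hz) into a
    PCM-style waveform. Negative amplitudes are allowed, matching the
    positive/negative identifier values described in the abstract."""
    return [
        sum(a * math.sin(2 * math.pi * f * n / sample_rate) for a, f in partials)
        for n in range(n_samples)
    ]

key_template = [(0.5, 1000.0), (-0.25, 1250.5)]   # hypothetical spike set
voice = [0.1] * 8                                  # stand-in enhanced voice samples
key = additive_synth(key_template)
summed = [v + k for v, k in zip(voice, key)]       # voice + key, as in the claim
print(len(summed))  # 8
```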

More details
10-02-2022 publication date

Speaker identity and content de-identification

Number: US20220044667A1
Assignee: International Business Machines Corp

One embodiment of the invention provides a method for speaker identity and content de-identification under privacy guarantees. The method comprises receiving input indicative of privacy protection levels to enforce, extracting features from a speech recorded in a voice recording, recognizing and extracting textual content from the speech, parsing the textual content to recognize privacy-sensitive personal information about an individual, generating de-identified textual content by anonymizing the personal information to an extent that satisfies the privacy protection levels and conceals the individual's identity, and mapping the de-identified textual content to a speaker who delivered the speech. The method further comprises generating a synthetic speaker identity based on other features that are dissimilar from the features to an extent that satisfies the privacy protection levels, and synthesizing a new speech waveform based on the synthetic speaker identity to deliver the de-identified textual content. The new speech waveform conceals the speaker's identity.
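The parsing-and-anonymization step can be illustrated with a toy redactor. A real system would recognize privacy-sensitive personal information with trained NLP models and calibrate how much it conceals to the requested privacy protection levels; the regex patterns below are placeholders for that recognition step.

```python
import re

# Hypothetical patterns for privacy-sensitive items; stand-ins for a
# real named-entity recognizer.
PATTERNS = {
    "NAME": re.compile(r"\b(?:Alice|Bob) [A-Z][a-z]+\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{4}\b"),
}

def de_identify(text):
    """Replace recognized personal information with placeholder tags,
    yielding de-identified textual content for re-synthesis."""
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text

print(de_identify("Call Alice Smith at 555-0199."))  # Call [NAME] at [PHONE].
```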

More details
10-02-2022 publication date

SPEAKER SEPARATION BASED ON REAL-TIME LATENT SPEAKER STATE CHARACTERIZATION

Number: US20220044687A1
Assignee:

Systems, methods, and non-transitory computer-readable media can obtain a stream of audio waveform data that represents speech involving a plurality of speakers. As the stream of audio waveform data is obtained, a plurality of audio chunks can be determined. An audio chunk can be associated with one or more identity embeddings. The stream of audio waveform data can be segmented into a plurality of segments based on the plurality of audio chunks and respective identity embeddings associated with the plurality of audio chunks. A segment can be associated with a speaker included in the plurality of speakers. Information describing the plurality of segments associated with the stream of audio waveform data can be provided. 1. A computer-implemented method comprising:obtaining, by a computing system, a stream of audio waveform data that represents speech involving a plurality of speakers;as the stream of audio waveform data is obtained, determining, by the computing system, a plurality of audio chunks, wherein an audio chunk is associated with one or more identity embeddings;segmenting, by the computing system, the stream of audio waveform data into a plurality of segments based on the plurality of audio chunks and respective identity embeddings associated with the plurality of audio chunks, wherein a segment can be associated with a speaker included in the plurality of speakers; andproviding, by the computing system, information describing the plurality of segments associated with the stream of audio waveform data.2. The computer-implemented method of claim 1 , wherein the segmenting is performed in real-time based on a computational graph.3. The computer-implemented method of claim 1 , wherein each audio chunk in the plurality of audio chunks corresponds to a fixed length of time.4. 
The computer-implemented method of claim 1 , wherein the one or more identity embeddings associated with the audio chunk are generated by a temporal convolutional network that pre-processes ...
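A rough picture of the chunk-wise segmentation described above: consecutive fixed-length chunks whose identity embeddings stay similar belong to one speaker segment, and a new segment starts when the embedding drifts. The cosine threshold and 2-D embeddings are illustrative only; the patent's embeddings would come from a temporal convolutional network.

```python
import math

def segment_stream(chunk_embeddings, threshold=0.9):
    """Group consecutive audio chunks into speaker segments: start a new
    segment whenever a chunk's identity embedding falls below the cosine
    similarity threshold relative to the previous chunk's embedding."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    segments = [[0]]  # each segment is a list of chunk indices
    for i in range(1, len(chunk_embeddings)):
        if cosine(chunk_embeddings[i - 1], chunk_embeddings[i]) >= threshold:
            segments[-1].append(i)
        else:
            segments.append([i])
    return segments

embeddings = [[1.0, 0.0], [0.98, 0.05], [0.0, 1.0], [0.05, 0.99]]
print(segment_stream(embeddings))  # [[0, 1], [2, 3]]
```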

More details
10-02-2022 publication date

SAMPLE-EFFICIENT REPRESENTATION LEARNING FOR REAL-TIME LATENT SPEAKER STATE CHARACTERIZATION

Number: US20220044688A1
Assignee:

Systems, methods, and non-transitory computer-readable media can provide audio waveform data that corresponds to a voice sample to a temporal convolutional network for evaluation. The temporal convolutional network can pre-process the audio waveform data and can output an identity embedding associated with the audio waveform data. The identity embedding associated with the voice sample can be obtained from the temporal convolutional network. Information describing a speaker associated with the voice sample can be determined based at least in part on the identity embedding. 1. A computer-implemented method comprising:providing, by a computing system, audio waveform data that corresponds to a voice sample to a temporal convolutional network for evaluation, wherein the temporal convolutional network pre-processes the audio waveform data and outputs an identity embedding associated with the audio waveform data;obtaining, by the computing system, the identity embedding associated with the voice sample from the temporal convolutional network; anddetermining, by the computing system, information describing a speaker associated with the voice sample based at least in part on the identity embedding.2. The computer-implemented method of claim 1 , wherein determining the information describing the speaker further comprises:determining, by the computing system, an identity of the speaker associated with the voice sample based at least in part on the identity embedding.3. The computer-implemented method of claim 1 , wherein determining the information describing the speaker further comprises:determining, by the computing system, that the speaker associated with the voice sample matches a known speaker based at least in part on the identity embedding.4. 
The computer-implemented method of claim 1 , wherein the temporal convolutional network is trained based on a triplet loss function that evaluates a plurality of triplets claim 1 , wherein a triplet includes an anchor voice sample ...
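Claim 4 above trains the temporal convolutional network with a triplet loss over (anchor, positive, negative) voice samples. The standard form of that loss, applied here to toy 2-D identity embeddings:

```python
import math

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss over identity embeddings: pull the anchor toward a
    sample of the same speaker (positive) and push it away from a
    different speaker (negative) by at least `margin`."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [1.0, 1.0]
print(triplet_loss(anchor, positive, negative))  # 0.0 (already satisfied)
print(triplet_loss(anchor, negative, positive))  # large: wrong sample is closer
```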

More details
24-01-2019 publication date

GENERATING DIALOGUE BASED ON VERIFICATION SCORES

Number: US20190027152A1
Assignee: Intel Corporation

An example apparatus for generating dialogue includes an audio receiver to receive audio data including speech. The apparatus also includes a verification score generator to generate a verification score based on the audio data. The apparatus further includes a user detector to detect that the verification score exceeds a lower threshold but does not exceed a higher threshold. The apparatus includes a dialogue generator to generate dialogue to solicit additional audio data to be used to generate an updated verification score in response to detecting that the verification score exceeds a lower threshold but does not exceed a higher threshold. 1. An apparatus for generating dialogue , comprising:an audio receiver to receive audio data comprising speech;a verification score generator to generate a verification score based on the audio data;a user detector to detect that the verification score exceeds a lower threshold but does not exceed a higher threshold; anda dialogue generator to generate a dialogue to solicit additional audio data to be used to generate an updated verification score in response to detecting that the verification score exceeds a lower threshold but does not exceed a higher threshold.2. The apparatus of claim 1 , comprising a key phrase detector to detect a key phrase in the audio data claim 1 , wherein the verification score generator is to generate a verification score based on the audio data in response to the detection of the key phrase.3. The apparatus of claim 1 , comprising a speaker scorer to generate a speaker verification score based on the audio data and a speaker model claim 1 , wherein the verification score is at least in part based on the speaker verification score.4. 
The apparatus of claim 1 , comprising a speaker scorer to generate a speaker verification score based on the audio data and a speaker model claim 1 , wherein the speaker scorer is to calculate a text-dependent score based on the key phrase and a text-independent score ...
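The two-threshold logic of the apparatus reduces to three outcomes: reject, accept, or generate dialogue to solicit more audio and rescore. A sketch with invented threshold values:

```python
def next_action(score, lower=0.4, higher=0.8):
    """Map a verification score onto the apparatus's three outcomes:
    reject below the lower threshold, accept above the higher one, and
    generate a dialogue turn soliciting more audio in between."""
    if score < lower:
        return "reject"
    if score > higher:
        return "accept"
    return "solicit_more_audio"

print(next_action(0.2), next_action(0.6), next_action(0.95))
# reject solicit_more_audio accept
```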

More details
24-01-2019 publication date

INFORMATION PROCESSING APPARATUS, METHOD AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM

Number: US20190027165A1
Assignee: FUJITSU LIMITED

An information processing apparatus includes a memory, and a processor coupled to the memory and configured to specify a first signal level of a first voice signal, specify a second signal level of a second voice signal, and execute evaluation of at least one of the first voice signal and the second voice signal based on at least one of a sum of the first signal level and the second signal level and an average of the first signal level and the second signal level. 1. An information processing apparatus comprising: a memory; and a processor coupled to the memory and configured to: specify a first signal level of a first voice signal; specify a second signal level of a second voice signal; and execute evaluation of at least one of the first voice signal and the second voice signal based on at least one of a sum of the first signal level and the second signal level and an average of the first signal level and the second signal level. 2. The information processing apparatus according to claim 1, wherein the processor is configured to: specify a ratio of the first signal level and the second signal level; and execute the evaluation on impressions of at least one of the first voice signal and the second voice signal based on the specified ratio. 3. The information processing apparatus according to claim 1, wherein the processor is configured to: specify a time period for which at least one of the sum and the average continuously exceeds a threshold; and execute the evaluation on impressions of at least one of the first voice signal and the second voice signal based on the specified time period. 4. The information processing apparatus according to claim 1, wherein the processor is configured to: specify a frequency that at least one of the sum and the average exceeds a threshold; and execute the evaluation on impressions of at least one of the first voice signal and the second voice signal based on the specified frequency. 5.
The information processing apparatus according ...
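Claim 3 evaluates how long the combined signal level continuously exceeds a threshold (e.g. both parties talking over each other). Over per-frame levels, with an assumed 10 ms frame and invented values:

```python
def longest_run_above(levels_a, levels_b, threshold, frame_ms=10):
    """Length (in ms) of the longest period during which the sum of the
    two per-frame signal levels continuously exceeds the threshold."""
    longest = current = 0
    for a, b in zip(levels_a, levels_b):
        if a + b > threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest * frame_ms

a = [0.2, 0.6, 0.7, 0.7, 0.1]  # first voice signal levels per frame
b = [0.1, 0.5, 0.6, 0.5, 0.1]  # second voice signal levels per frame
print(longest_run_above(a, b, threshold=1.0))  # 30
```

The average-based variant of the claim is the same computation with `(a + b) / 2` compared against a correspondingly scaled threshold.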

More details
23-01-2020 publication date

Voice recognition based user authentication method using artificial intelligence device and apparatus therefor

Number: US20200026838A1
Author: Daeseok CHOI, Jeongsoo AHN
Assignee: LG ELECTRONICS INC

A voice recognition based user authentication method using an AI device is disclosed. A voice recognition based user authentication method using an AI device according to an embodiment of the present invention performs first user authentication on the basis of voice characteristics of a first speech of a specific user, provides at least one specific inquiry about a living history of an authentic user of the device to the specific user, senses at least one second speech which is a reply of the specific user to the at least one specific inquiry and performs second user authentication for the specific user to use the device on the basis of contents included in the at least one second speech. An intelligent computing device of the present invention can be associated with artificial intelligence modules, drones (unmanned aerial vehicles (UAVs)), robots, augmented reality (AR) devices, virtual reality (VR) devices, devices related to 5G service, etc.

More details
01-02-2018 publication date

METHOD AND DEVICE FOR TRANSFORMING FEATURE VECTOR FOR USER RECOGNITION

Number: US20180033439A1
Assignee:

A method of converting a feature vector includes extracting a feature sequence from an audio signal including utterance of a user; extracting a feature vector from the feature sequence; acquiring a conversion matrix for reducing a dimension of the feature vector, based on a probability value acquired based on different covariance values; and converting the feature vector by using the conversion matrix. 1. A method of converting a feature vector, the method comprising: extracting a feature sequence from an audio signal including utterance of a user; extracting a feature vector from the feature sequence; acquiring a conversion matrix for reducing a dimension of the feature vector, based on a probability value acquired based on different covariance values; and converting the feature vector by using the conversion matrix. 2. The method of claim 1, wherein the conversion matrix is a heteroscedastic linear discriminant analysis (HLDA) matrix. 3. The method of claim 1, wherein the acquiring of the conversion matrix comprises acquiring a useful dimension p of the conversion matrix, based on accumulated energy for each dimension of a variance matrix for an intra-class covariance matrix of each speaker. 4. The method of claim 1, wherein the feature vector is an i-vector that is acquirable by joint factor analysis. 5. The method of claim 1, further comprising: performing scoring on a feature vector resulting from the conversion and a feature vector of each state, at least once; and identifying the user, based on a result of the scoring. 6.
A device for converting a feature vector, the device comprising: a receiver which receives an audio signal including utterance of a user; and a controller which extracts a feature sequence from the audio signal, extracts a feature vector from the feature sequence, acquires a conversion matrix for reducing a dimension of the feature vector, based on a probability value acquired based on different covariance values, and converts the feature ...
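The conversion step itself is a projection by the pre-trained conversion matrix followed by keeping only the p useful dimensions; estimating the HLDA matrix from the covariance statistics is the hard part and is not shown. The matrix and i-vector values below are made up for illustration.

```python
def convert(feature, matrix, p):
    """Project a feature vector with a (pre-trained) conversion matrix
    and keep only the p 'useful' dimensions, as in the HLDA step."""
    projected = [sum(m * f for m, f in zip(row, feature)) for row in matrix]
    return projected[:p]

# Hypothetical 3x4 conversion matrix, assumed estimated offline from
# the intra-class covariance statistics.
A = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.25, -0.25, 0.25, -0.25]]
i_vector = [1.0, 3.0, 5.0, 7.0]
print(convert(i_vector, A, p=2))  # [2.0, 6.0]
```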

More details
01-05-2014 publication date

Method and system for using conversational biometrics and speaker identification/verification to filter voice streams

Number: US20140119520A1
Assignee: International Business Machines Corp

A method and system for using conversational biometrics and speaker identification and/or verification to filter voice streams during mixed mode communication. The method includes receiving an audio stream of a communication between participants. Additionally, the method includes filtering the audio stream of the communication into separate audio streams, one for each of the participants. Each of the separate audio streams contains portions of the communication attributable to a respective participant. Furthermore, the method includes outputting the separate audio streams to a storage system.

More details
17-02-2022 publication date

Intelligent voice enable device searching method and apparatus thereof

Number: US20220051677A1
Assignee: LG ELECTRONICS INC

An intelligent voice enable device searching method and apparatus are disclosed. A method for searching a plurality of voice enable devices according to one embodiment of the present disclosure includes receiving first device information from a first device receiving a wake-up voice; searching a first account associated with the first device based on the first device information; searching devices of a first group registered in the first account; searching a second account associated with a second device other than the first device among the devices of the first group; searching devices of a second group registered in the second account; searching devices of a third group sharing an IP address with the devices of the first group or the devices of the second group; and selecting a voice enable device to respond to the wake-up voice among the devices of the first group, the second group, and the third group. The method has the effect that the device that should respond to a wake-up voice can be found more accurately than with a conventional approach.

More details
17-02-2022 publication date

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM

Number: US20220051679A1
Assignee:

A control section performs control to give notification of information regarding a previous dialogue on the basis of each status of participants in dialogue. For example, the information regarding the previous dialogue includes information regarding a significant word extracted from a speech of the previous dialogue. In this case, the information regarding the previous dialogue further includes, for example, additional information related to the significant word. For example, when one of the utterers currently in dialogue makes an utterance indicative of intention to call up information, the control section performs control to give notification of the information regarding a previous dialogue in which all utterers currently in dialogue participated. 1. An information processing apparatus comprising: a control section configured to perform control in such a manner as to give notification of information regarding a previous dialogue on a basis of each status of participants in dialogue. 2. The information processing apparatus according to claim 1, wherein the information regarding the previous dialogue includes information regarding a significant word extracted from a speech of the previous dialogue. 3. The information processing apparatus according to claim 2, wherein the information regarding the previous dialogue further includes information related to the significant word. 4. The information processing apparatus according to claim 1, further comprising: a speech storage section configured to store a speech spanning a most recent predetermined period of time out of collected speeches, wherein the control section acquires the information regarding the previous dialogue on a basis of the speech stored in the speech storage section. 5.
The information processing apparatus according to claim 1, wherein, when any one of utterers currently in dialogue makes an utterance indicative of intention to call up information, the control section performs control in such a manner as to give ...

More details
30-01-2020 publication date

METHOD, DEVICE AND COMPUTER STORAGE MEDIUM FOR SPEECH INTERACTION

Number: US20200035241A1
Author: CHANG Xiantang

A method, a device and a computer storage medium for speech interaction are disclosed. The method includes: receiving speech data transmitted by a first terminal device; obtaining a speech recognition result and a voiceprint recognition result of the speech data; obtaining a response text for the speech recognition result, and performing speech conversion for the response text with the voiceprint recognition result; and transmitting audio data obtained from the conversion to the first terminal device. Speech self-adaptation of human-machine interaction may thus be achieved, and the realism and interest of human-machine speech interaction may be enhanced. 1. A method for speech interaction, comprising: receiving speech data transmitted by a first terminal device; obtaining a speech recognition result and a voiceprint recognition result of the speech data; obtaining a response text for the speech recognition result, and performing speech conversion for the response text with the voiceprint recognition result; and transmitting audio data obtained from the conversion to the first terminal device. 2. The method according to claim 1, wherein the voiceprint recognition result comprises at least one kind of identity information among the user's gender, age, region and occupation. 3. The method according to claim 1, wherein the obtaining a response text for the speech recognition result comprises: performing searching and matching with the speech recognition result to obtain at least one of a text search result and a prompt text corresponding to the speech recognition result. 4. The method according to claim 3, further comprising: under the condition that an audio search result is obtained by performing searching and matching with the speech recognition result, transmitting the audio search result to the first terminal device. 5.
The method according to claim 1 , wherein the obtaining a response text for the speech recognition result comprises ...

More details
30-01-2020 publication date

DIARIZATION USING LINGUISTIC LABELING

Number: US20200035245A1
Assignee: VERINT SYSTEMS LTD.

Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction. 1.-20. (canceled) 21. A method of diarization of audio data from a customer service interaction between at least an agent and a customer, the method comprising: receiving a set of diarized textual transcripts of customer service interactions between at least an agent and a customer from a transcription server, wherein the diarized textual transcripts are grouped in pluralities comprising at least a transcript associated to the agent and a transcript associated to the customer, wherein the transcript associated to the agent and the transcript associated to the customer are from a singular customer service interaction; automatedly applying at least one heuristic to the diarized textual transcripts with a processor to select at least one of the transcripts in each plurality as being associated to the agent; analyzing the selected transcripts with the processor to create at least one linguistic model; saving the at least one linguistic model to a linguistic database server; and applying the linguistic model to new transcribed audio data with the processor to label a portion of the transcribed audio data as having been spoken by the agent, wherein the new transcribed audio data is not diarized and a known speaker has not yet been associated with the new transcribed audio
data. 22. ...

More details
30-01-2020 publication date

DIARIZATION USING ACOUSTIC LABELING

Number: US20200035246A1
Assignee: VERINT SYSTEMS LTD.

Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated to an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.

More details
30-01-2020 publication date

MACHINE LEARNING FOR AUTHENTICATING VOICE

Number: US20200035247A1
Assignee:

A machine learning multi-dimensional acoustic feature vector authentication system, according to an example of the present disclosure, builds and trains multiple multi-dimensional acoustic feature vector machine learning classifiers to determine a probability of spoofing of a voice. The system may extract an acoustic feature from a voice sample of a user. The system may convert the acoustic feature into multi-dimensional acoustic feature vectors and apply the multi-dimensional acoustic feature vectors to the multi-dimensional acoustic feature vector machine learning classifiers to detect spoofing and determine whether to authenticate a user. 1. A machine learning multi-dimensional acoustic feature vector authentication system comprising: at least one processor to execute machine readable instructions stored on at least one non-transitory computer readable medium; at least one data storage to store a plurality of multi-dimensional acoustic feature vector machine learning classifiers, wherein the plurality of multi-dimensional acoustic feature vector machine learning classifiers comprise convolutional neural networks trained to identify multi-dimensional acoustic feature vectors; wherein the at least one processor is to: extract at least one acoustic feature from a voice sample of a user; convert the acoustic feature into a plurality of multi-dimensional acoustic feature vectors; apply each multi-dimensional acoustic feature vector in the plurality of multi-dimensional acoustic feature vectors to a corresponding multi-dimensional acoustic feature vector machine learning classifier from the plurality of multi-dimensional acoustic feature vector machine learning classifiers; determine a probability of spoofing for each multi-dimensional acoustic feature vector from an output of the corresponding multi-dimensional acoustic feature vector machine learning classifier; determine an overall probability of spoofing for the voice sample, based on the probability of spoofing for each multi-dimensional
...
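The apply-then-combine flow of claim 1 can be sketched with stub classifiers. Averaging is one plausible combination rule for the overall probability (the abstract does not fix the rule), and the two feature types are placeholders for real trained CNNs.

```python
def spoof_probability(feature_vectors, classifiers):
    """Apply each multi-dimensional feature vector to its matching
    classifier (stubbed here as plain callables standing in for trained
    CNNs) and combine the per-vector spoof probabilities into an overall
    score by averaging."""
    probs = [clf(vec) for vec, clf in zip(feature_vectors, classifiers)]
    return sum(probs) / len(probs)

# Toy stand-ins for classifiers over two hypothetical acoustic feature types.
feature_a_clf = lambda v: 0.9 if max(v) > 1.0 else 0.1
feature_b_clf = lambda v: 0.8 if sum(v) > 2.0 else 0.2

vectors = [[0.2, 1.4], [0.5, 0.6]]
print(spoof_probability(vectors, [feature_a_clf, feature_b_clf]))
```

Authentication would then compare the overall score against a decision threshold before accepting the voice sample.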

More details
04-02-2021 publication date

DETECTION OF REPLAY ATTACK

Number: US20210034730A1
Author: Lesso John Paul

In order to detect a replay attack in a speaker recognition system, at least one feature is identified in a detected magnetic field. It is then determined whether the at least one identified feature of the detected magnetic field is indicative of playback of speech through a loudspeaker. If so, it is determined that a replay attack may have taken place. 1.-19. (canceled) 20. A method of detecting a replay attack in a speaker recognition system, the method comprising: receiving an audio signal comprising speech; receiving a magnetometer signal; determining a syllabic rate or an articulation rate of the speech; detecting modulation of at least one feature of the magnetometer signal at the syllabic rate or the articulation rate; determining based on the detecting that the at least one identified feature of the detected magnetic field is indicative of playback of speech through a loudspeaker; and determining that a replay attack may have taken place. 21. The method of claim 20, wherein the syllabic rate or the articulation rate of the speech is determined for speech detected at the same time as the magnetometer signal. 22. The method of claim 20, wherein the determined syllabic rate or articulation rate is used to set a passband frequency range for detecting modulation of the at least one feature of the magnetometer signal. 23. A method as claimed in claim 20, wherein the audio signal is received at substantially the same time as the magnetic field is detected, the method further comprising, if it is determined that the at least one identified feature of the detected magnetic field is indicative of playback of speech through a loudspeaker, determining that the audio signal may result from said replay attack. 24. A method as claimed in claim 20, wherein the magnetometer signal is received from a magnetometer, the method further comprising: performing a Discrete Fourier Transform on the magnetometer signal. 25.
A method as claimed in claim ...
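The passband idea of claims 20-22 can be illustrated with a minimal numerical sketch (not the patent's implementation; the function name, parameters, and synthetic signals are invented for this example). A loudspeaker's driver current modulates the nearby magnetic field, so the magnetometer spectrum is examined for energy concentrated near the measured syllabic rate:

```python
import numpy as np

def replay_attack_score(mag_samples, fs, syllabic_rate_hz, half_band_hz=1.0):
    """Fraction of the magnetometer signal's spectral energy falling in a
    passband around the syllabic rate of concurrently detected speech.
    A high score suggests playback through a loudspeaker (replay attack)."""
    x = np.asarray(mag_samples, dtype=float)
    x = x - x.mean()                      # remove the static (Earth) field
    spectrum = np.abs(np.fft.rfft(x))     # Discrete Fourier Transform
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= syllabic_rate_hz - half_band_hz) & \
           (freqs <= syllabic_rate_hz + half_band_hz)
    total = spectrum.sum() + 1e-12
    return spectrum[band].sum() / total

# Synthetic example: a field modulated at a 4 Hz syllabic rate vs. noise only.
fs = 100.0
t = np.arange(0, 10, 1.0 / fs)
rng = np.random.default_rng(0)
modulated = 50.0 + np.sin(2 * np.pi * 4.0 * t) + 0.1 * rng.standard_normal(t.size)
quiet = 50.0 + 0.1 * rng.standard_normal(t.size)
s_replay = replay_attack_score(modulated, fs, syllabic_rate_hz=4.0)
s_clean = replay_attack_score(quiet, fs, syllabic_rate_hz=4.0)
```

In this toy setup the modulated field scores markedly higher than the unmodulated one, which is the decision cue the claims describe.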

More details
08-02-2018 publication date

METHODS AND APPARATUS FOR AUTHENTICATION IN AN ELECTRONIC DEVICE

Number: US20180039769A1

An electronic device, comprising one or more input devices, for receiving biometric input from a user and generating one or more biometric input signals; an applications processor; a mixer configurable by the applications processor to provide a first signal path between one or more of the input devices and the applications processor; and a biometric authentication module coupled to the one or more input devices via a second signal path that does not include the mixer, for performing authentication of at least one of the one or more biometric input signals.

1. An electronic device, comprising: one or more input devices, for receiving biometric input from a user and generating one or more biometric input signals; an applications processor; a mixer configurable by the applications processor to provide a first signal path between one or more of the input devices and the applications processor; and a biometric authentication module coupled to the one or more input devices via a second signal path that does not include the mixer, for performing authentication of at least one of the one or more biometric input signals.
2. The electronic device according to claim 1, wherein the biometric authentication module is configured to perform an authentication algorithm based on stored identification characteristics of the user, and a signal received at an input of the biometric authentication module.
3. The electronic device according to claim 2, wherein the second signal path is configured such that the signal received at the input of the biometric authentication module is based solely on the at least one of the one or more biometric input signals.
4. The electronic device according to claim 1, further comprising: a gating block, coupled between the one or more input devices and the mixer, for disabling the first signal path upon receipt of one or more control signals.
5. The electronic device according to claim 4, wherein the one or more control signals comprise a first ...

More details
08-02-2018 publication date

SPEAKER RECOGNITION

Number: US20180040323A1

This application describes methods and apparatus for speaker recognition. An apparatus according to an embodiment has an analyzer for analyzing each frame of a sequence of frames of audio data which correspond to speech sounds uttered by a user to determine at least one characteristic of the speech sound of that frame. An assessment module determines, for each frame of audio data, a contribution indicator of the extent to which the frame of audio data should be used for speaker recognition processing based on the determined characteristic of the speech sound. In this way frames which correspond to speech sounds that are of most use for speaker discrimination may be emphasized and/or frames which correspond to speech sounds that are of least use for speaker discrimination may be de-emphasized.

1. An apparatus for use in biometric speaker recognition, comprising: an analyzer for analyzing each frame of a sequence of frames of audio data which correspond to speech sounds uttered by a user to determine at least one characteristic of the speech sound of that frame; and an assessment module for determining for the each frame of audio data a contribution indicator of the extent to which the each frame of audio data should be used for speaker recognition processing based on the determined at least one characteristic of the speech sound.
2. The apparatus as claimed in claim 1, comprising a speaker recognition module configured to apply speaker recognition processing to said frames of audio data, wherein the speaker recognition module is configured to process the frames of audio data according to the contribution indicator for each frame.
3. The apparatus as claimed in wherein said contribution indicator comprises a weighting to be applied to the each frame in the speaker recognition processing.
4. The apparatus as claimed in wherein said contribution indicator comprises a selection of frames of audio data not to be used in the speaker recognition processing.
5. The ...

More details
08-02-2018 publication date

Multiple Voice Services

Number: US20180040324A1
Author: Wilberding Dayn
Assignee:

Disclosed herein are example techniques to identify a voice service to process a voice input. An example implementation may involve an NMD receiving, via a microphone, voice data indicating a voice input. The NMD may identify, from among multiple voice services registered to a media playback system, a voice service to process the voice input and cause, via a network interface, the identified voice service to process the voice input.

1. A networked microphone device comprising: a microphone; a network interface; one or more processors; tangible, non-transitory computer-readable media having stored therein instructions executable by the one or more processors to cause the networked microphone device to perform a method comprising: receiving, via the microphone, voice data indicating a voice input; identifying, from among multiple voice services registered to a media playback system, a voice service to process the voice input; and causing, via the network interface, the identified voice service to process the voice input.
2. The networked microphone device of claim 1, wherein identifying the voice service to process the voice input comprises: determining that a portion of the received voice data represents a particular wake-word corresponding to a specific voice service; and identifying, as the voice service to process the voice input, the specific voice service that corresponds to the particular wake-word, wherein each voice service of the multiple voice services registered to the media playback system corresponds to a respective wake-word.
3. The networked microphone device of claim 2, wherein determining that the portion of the received voice data represents the particular wake-word corresponding to the specific voice service comprises: querying wake-word detection algorithms corresponding to each voice service of the multiple voice services with the received voice data; and determining that a wake-word detection algorithm of the specific voice service detected that the ...
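The wake-word-based identification in claims 2-3 can be sketched as follows; the service names and keyword detectors below are placeholders, not real voice-service APIs:

```python
# Each registered voice service contributes a wake-word detection algorithm;
# the device queries all of them with the received voice data and routes the
# input to the service whose detector fires. Detector internals are stand-ins.
def make_keyword_detector(wake_word):
    # Stand-in for a real wake-word detection algorithm.
    return lambda voice_data: wake_word in voice_data.lower()

REGISTERED_SERVICES = {
    "ServiceA": make_keyword_detector("hey alpha"),
    "ServiceB": make_keyword_detector("ok beta"),
}

def identify_voice_service(voice_data):
    """Return the registered service whose wake-word detector matched, or None."""
    for service, detector in REGISTERED_SERVICES.items():
        if detector(voice_data):
            return service
    return None
```

For example, `identify_voice_service("Hey Alpha, play jazz")` selects `"ServiceA"`, while input with no registered wake-word selects no service.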

More details
08-02-2018 publication date

SPEAKER RECOGNITION

Number: US20180040325A1

This application describes methods and apparatus for generating a prompt to be presented to a user for the user to vocalise as part of speaker recognition. An apparatus according to an embodiment has a selector for selecting at least one vocal prompt element to form at least part of said prompt from a predetermined set of a plurality of vocal prompt elements. The selector is configured to select the vocal prompt element based, at least partly, on an indication of the operating conditions for the biometric speaker recognition, for example background noise. The prompt is selected to be one which will provide a good likelihood of discrimination between users when vocalised and used for speaker recognition in the current operating conditions. The prompt may be issued as part of a verification process for an existing user or an enrolment process for an enrolling user.

1. An apparatus for generating a prompt to be vocalised by a user for biometric speaker recognition comprising: a selector for selecting at least one vocal prompt element to form at least part of said prompt from a predetermined set of a plurality of vocal prompt elements; wherein the selector is configured to select the vocal prompt element based, at least partly, on an indication of the operating conditions for the biometric speaker recognition.
2. The apparatus as claimed in wherein the selector is configured to select the vocal prompt element based on respective discrimination scores for the vocal prompt elements wherein at least some discrimination scores vary according to the indication of operating conditions for the biometric speaker recognition.
3. The apparatus as claimed in wherein said set of plurality of vocal prompt elements comprises a plurality of predefined subsets of vocal prompt elements and the selector is configured to select the voice prompt from one of the subsets based on the indication of operating conditions.
4. The apparatus as claimed in wherein the voice prompt elements are ...

More details
24-02-2022 publication date

Speaker recognition with quality indicators

Number: US20220059121A1
Assignee: Pindrop Security Inc

Embodiments described herein provide for a machine-learning architecture for modeling quality measures for enrollment signals. Modeling these enrollment signals enables the machine-learning architecture to identify deviations from an expected or ideal enrollment signal in future test-phase calls. These differences can be used to generate quality measures for the various audio descriptors or characteristics of audio signals. The quality measures can then be fused at the score level with the speaker recognition system's embedding comparisons for verifying the speaker. Fusing the quality measures with the similarity scoring essentially calibrates the speaker recognition system's outputs based on what is expected for the enrolled caller and what was actually observed for the current inbound caller.
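A toy version of score-level fusion might look like the following (the logistic form, weights, and bias are illustrative assumptions, not Pindrop's learned calibration):

```python
import math

def fuse_scores(embedding_score, quality_measures, weights, bias=0.0):
    """Hypothetical score-level fusion: calibrate the speaker-recognition
    similarity score with per-call quality measures (e.g. how far the
    observed audio descriptors deviate from enrollment conditions)."""
    z = bias + weights[0] * embedding_score
    for w, q in zip(weights[1:], quality_measures):
        z += w * q                          # quality terms shift the score
    return 1.0 / (1.0 + math.exp(-z))       # logistic fusion -> [0, 1] score

# Same embedding similarity, but the second call's audio deviates more from
# the enrollment conditions, so its fused verification score is lower.
w = [4.0, -2.0]                             # illustrative learned weights
good = fuse_scores(0.8, [0.1], w)
bad = fuse_scores(0.8, [0.9], w)
```

The design point the abstract makes is visible here: two calls with identical embedding similarity receive different final scores once quality is taken into account.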

More details
24-02-2022 publication date

PROVIDING EMOTION MANAGEMENT ASSISTANCE

Number: US20220059122A1
Author: Luan Jian, Xiu Chi
Assignee:

A method for providing emotion management assistance is provided. Sound streams may be received. A speech conversation between a user and at least one conversation object may be detected from the sound streams. Identity of the conversation object may be identified at least according to speech of the conversation object in the speech conversation. Emotion state of at least one speech segment of the user in the speech conversation may be determined. An emotion record corresponding to the speech conversation may be generated, wherein the emotion record at least includes the identity of the conversation object, at least a portion of content of the speech conversation, and the emotion state of the at least one speech segment of the user.

1. A method for providing emotion management assistance, comprising: receiving sound streams; detecting a speech conversation between a user and at least one conversation object from the sound streams; identifying identity of the conversation object at least according to speech of the conversation object in the speech conversation; determining emotion state of at least one speech segment of the user in the speech conversation; and generating an emotion record corresponding to the speech conversation, the emotion record at least including the identity of the conversation object, at least a portion of content of the speech conversation, and the emotion state of the at least one speech segment of the user.
2. The method of claim 1, wherein emotion state of each speech segment in the at least one speech segment of the user includes emotion type of the speech segment and/or level of the emotion type.
3. The method of claim 1, wherein the detecting the speech conversation comprises: detecting a start point and an end point of the speech conversation at least according to speech of the user and/or speech of the conversation object in the sound streams.
4. The method of claim 3, wherein the start point and the end point of the speech conversation ...

More details
06-02-2020 publication date

Biometric authentication of electronic signatures

Number: US20200042688A1
Author: Steven R. Schwartz
Assignee: Ezee Steve LLC

At least one contemporaneous signature image is captured while a user generates an electronic signature for a document. When one or more contemporaneous signature images maps to a verification image, signature data representative of an electronic signature is associated with the document.

More details
06-02-2020 publication date

Diarization using acoustic labeling

Number: US20200043501A1
Assignee: Verint Systems Ltd

Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated to an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.

More details
06-02-2020 publication date

VOICE IDENTITY FEATURE EXTRACTOR AND CLASSIFIER TRAINING

Number: US20200043504A1
Author: Li Na, Wang Jun

A voice identity feature extractor training method includes extracting a voice feature vector of training voice. The method may include determining a corresponding I-vector according to the voice feature vector of the training voice. The method may include adjusting a weight of a neural network model by using the I-vector as a first target output of the neural network model, to obtain a first neural network model. The method may include obtaining a voice feature vector of target detecting voice and determining an output result of the first neural network model for the voice feature vector of the target detecting voice. The method may include determining an I-vector latent variable. The method may include estimating a posterior mean of the I-vector latent variable, and adjusting a weight of the first neural network model using the posterior mean as a second target output, to obtain a voice identity feature extractor.

1. A voice identity feature extractor training method, applied to an electronic device and comprising: extracting a voice feature vector of training voice; determining an Identity-vector (I-vector) corresponding to the training voice according to the voice feature vector of the training voice; adjusting a weight of a neural network model by using the I-vector as a first target output of the neural network model, to obtain a first neural network model; obtaining a voice feature vector of target detecting voice and determining an output result of the first neural network model for the voice feature vector of the target detecting voice; determining an I-vector latent variable according to the output result; and estimating a posterior mean of the I-vector latent variable, and adjusting a weight of the first neural network model by using the posterior mean as a second target output of the first neural network model, to obtain a voice identity feature extractor.
2. The voice identity feature extractor training method according to claim 1, wherein the adjusting a ...

More details
18-02-2021 publication date

VOICEPRINT RECOGNITION METHOD, MODEL TRAINING METHOD, AND SERVER

Number: US20210050020A1
Author: Li Na, TUO Deyi
Assignee:

Embodiments of this application disclose a voiceprint recognition method performed by a computer. After obtaining a to-be-recognized target voice message, the computer obtains target feature information of the target voice message by using a voice recognition model, the voice recognition model being obtained through training according to a first loss function and a second loss function. Next, the computer determines a voiceprint recognition result according to the target feature information and registration feature information, the registration feature information being obtained from a voice message of a to-be-recognized object using the voiceprint recognition model. The normalized exponential function and the centralization function are used for jointly optimizing the voice recognition model, and can reduce an intra-class variation between depth features from the same speaker. The two functions are used for simultaneously supervising and learning the voice recognition model, and enable the depth feature to have better discrimination, thereby improving recognition performance.

1. A voiceprint recognition method, comprising: obtaining a to-be-recognized target voice message; obtaining target feature information of the target voice message by using a voiceprint recognition model, the voiceprint recognition model being obtained through training according to a first loss function and a second loss function, the first loss function being a normalized exponential function, and the second loss function being a centralization function; and determining a voiceprint recognition result according to the target feature information and registration feature information, the registration feature information being obtained from a voice message of a to-be-recognized object using the voiceprint recognition model.
2. The method according to claim 1, wherein the determining a voiceprint recognition result according to the target feature information and registration feature information ...
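A generic form of the two supervising losses (softmax as the normalized exponential function, center loss as the centralization function) can be written in NumPy. This is the common "softmax + center loss" formulation, not necessarily the patent's exact one, and the data below is random for illustration:

```python
import numpy as np

def center_loss(features, labels, centers):
    """Centralization (center) loss: mean squared distance between each
    depth feature and the center of its speaker class. Minimizing it
    reduces intra-class variation, complementing the softmax loss."""
    diffs = features - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

def softmax_cross_entropy(logits, labels):
    """Normalized exponential (softmax) loss over speaker identities."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

# Joint objective: L = L_softmax + lambda * L_center (lambda illustrative).
rng = np.random.default_rng(1)
feats = rng.standard_normal((4, 3))     # depth features for 4 utterances
logits = rng.standard_normal((4, 2))    # classifier outputs, 2 speakers
labels = np.array([0, 0, 1, 1])
centers = np.stack([feats[:2].mean(axis=0), feats[2:].mean(axis=0)])
joint = softmax_cross_entropy(logits, labels) + 0.01 * center_loss(feats, labels, centers)
```

Note that the center loss is exactly zero when every feature sits on its class center, which is the intra-class-variation term the abstract describes.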

More details
18-02-2021 publication date

SIGNAL PROCESSING SYSTEM, SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD, AND RECORDING MEDIUM

Number: US20210050021A1
Assignee: NEC Corporation

A feature vector having high class identification capability is generated. A signal processing system is provided with: a first generation unit for generating a first feature vector on the basis of one of time-series voice data, meteorological data, sensor data, and text data, or on the basis of a feature quantity of one of these; a weight calculation unit for calculating a weight for the first feature vector; a statistical amount calculation unit for calculating a weighted average vector and a weighted high-order statistical vector of second or higher order using the first feature vector and the weight; and a second generation unit for generating a second feature vector using the weighted high-order statistical vector.

1. A signal processing system, comprising: a memory; and at least one processor coupled to the memory, the at least one processor performing operations to: generate a first feature vector based on any one of pieces of time-series voice data, weather data, sensor data, and text data, or a feature amount of any one of pieces of the data; calculate a weight for the first feature vector; calculate a weighted average vector and a weighted high-order statistical vector of second order or higher by using the first feature vector and the weight; and generate a second feature vector by using the weighted high-order statistical vector.
2. The signal processing system according to claim 1, wherein the weighted high-order statistical vector is a weighted standard deviation vector or a weighted variance vector.
3. The signal processing system according to claim 1, wherein the weighted high-order statistical vector is a weighted high-order statistical vector of third order or higher.
4. The signal processing system according to claim 1, wherein a function to generate the first feature vector, a function to calculate the weight, a function to calculate the weighted average vector and the weighted high-order statistical vector, and a function to generate the second feature ...

More details
25-02-2021 publication date

METHOD AND APPARATUS FOR VOICE IDENTIFICATION, DEVICE AND COMPUTER READABLE STORAGE MEDIUM

Number: US20210056975A1
Assignee:

Embodiments of the present disclosure provide a method and apparatus for voice identification, a device and a computer readable storage medium. The method may include: for an inputted voice signal, obtaining a first piece of decoded acoustic information by a first acoustic model and obtaining a second piece of decoded acoustic information by a second acoustic model, the second acoustic model being generated by joint modeling of acoustic model and language model. The method may further include determining a first group of candidate identification results based on the first piece of decoded acoustic information, determining a second group of candidate identification results based on the second piece of decoded acoustic information, and then determining a final identification result for the voice signal based on the first group of candidate identification results and the second group of candidate identification results.

1. A method for voice identification, comprising: obtaining, for an inputted voice signal, a first piece of decoded acoustic information and a second piece of decoded acoustic information respectively by a first acoustic model and a second acoustic model, the first acoustic model being generated by acoustic modeling and the second acoustic model being generated by joint modeling of acoustic model and language model; determining a first group of candidate identification results and a second group of candidate identification results respectively based on the first piece of decoded acoustic information and the second piece of decoded acoustic information; and determining an identification result for the voice signal based on the first group of candidate identification results and the second group of candidate identification results.
2. The method according to claim 1, wherein the first acoustic model is a connectionist temporal classification (CTC) model, the second acoustic model is a streaming multi-layer truncated attention (SMLTA) model ...

More details
22-02-2018 publication date

AUTOMATED AUDIO DATA SELECTOR

Number: US20180053511A1
Assignee:

Aspects define a capture signal as an audio input by a user of word content. An input of the capture signal word content is recognized in response to an audio input of the user reciting the capture signal word content into a microphone in communication with the recording device during a recording of a speech presentation by the recording device. A recording portion start time is identified that is prior to a time of the input of the capture signal during the current recording of the audio speech presentation, in response to recognizing the input of the capture signal word content from the user. The recording device is driven to capture a portion of the recorded audio speech presentation over a period of time spanning from the recording portion start time to the time of the input of the capture signal word content.

1. A computer-implemented method for the automated generation of audio selections, comprising executing on a computer processor the steps of: defining a capture signal as an audio input into a microphone of a recording device of a recitation by a user of word content that is selected from the group consisting of a specific word, and a phrase of multiple specific words; in response to an audio input of the user reciting the capture signal word content into a microphone in communication with a recording device during a recording of a speech presentation by the recording device, recognizing an input of the capture signal word content to the recording device; in response to recognizing the input of the capture signal word content from the user, identifying a recording portion start time during the current recording of the audio speech presentation that is prior to a time of the input of the capture signal; and driving the recording device to capture a portion of the recorded audio speech presentation over a period of time spanning from the recording portion start time to the time of the input of the capture signal word content.
2. The method of claim 1, further ...
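The pre-trigger capture can be illustrated with a rolling buffer; text chunks stand in for audio plus its recognition result, and the class below is purely illustrative:

```python
from collections import deque

class CaptureRecorder:
    """Sketch of the capture-signal behaviour: audio is recorded continuously
    into a bounded history; when the user recites the capture phrase, the
    portion from a start time before the trigger up to the trigger is kept."""
    def __init__(self, capture_phrase, lookback_chunks):
        self.capture_phrase = capture_phrase.lower()
        self.history = deque(maxlen=lookback_chunks)  # rolling recording
        self.captured = []

    def feed(self, chunk_text):
        # In a real device each chunk would be audio plus its transcript.
        self.history.append(chunk_text)
        if self.capture_phrase in chunk_text.lower():
            # Capture signal recognized: keep the preceding span too.
            self.captured.append(list(self.history))

rec = CaptureRecorder("capture that", lookback_chunks=3)
for chunk in ["intro", "key point A", "key point B", "please capture that"]:
    rec.feed(chunk)
```

The bounded `deque` is what makes the "start time prior to the input of the capture signal" possible: material spoken before the trigger is still in memory when the trigger arrives.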

More details
22-02-2018 publication date

Systems and Methods for Estimating Age of a Child Based on Speech

Number: US20180053514A1
Assignee: Disney Enterprises Inc

There is provided a system comprising a microphone, configured to receive an input speech from an individual, an analog-to-digital (A/D) converter to convert the input speech to digital form and generate a digitized speech, a memory storing an executable code and an age estimation database, a hardware processor executing the executable code to receive the digitized speech, identify a plurality of boundaries in the digitized speech delineating a plurality of phonemes in the digitized speech, extract a plurality of formant-based feature vectors from each phoneme in the digitized speech based on at least one of a formant position, a formant bandwidth, and a formant dispersion, compare the plurality of formant-based feature vectors with age determinant formant-based feature vectors of the age estimation database, determine the age of the individual when the comparison finds a match in the age estimation database, and communicate an age-appropriate response to the individual.
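The database comparison step can be sketched as a nearest-neighbour match over formant-based vectors; the toy database, ages, and formant values below are invented for illustration:

```python
import numpy as np

def estimate_age(formant_vector, database):
    """Hypothetical matcher: compare a formant-based feature vector
    (e.g. formant positions) against an age-estimation database and
    return the age of the nearest stored exemplar."""
    ages = np.array([age for age, _ in database])
    refs = np.stack([vec for _, vec in database])
    dists = np.linalg.norm(refs - np.asarray(formant_vector), axis=1)
    return int(ages[np.argmin(dists)])

# Toy database: (age, [F1, F2] positions in Hz). Children tend to have
# higher formant frequencies because of shorter vocal tracts.
db = [(5, np.array([900.0, 2400.0])),
      (10, np.array([750.0, 2100.0])),
      (35, np.array([600.0, 1800.0]))]
```

A query such as `estimate_age([880.0, 2350.0], db)` lands on the child exemplar, after which the system would pick an age-appropriate response.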

More details
26-02-2015 publication date

COLLABORATIVE AUDIO CONVERSATION ATTESTATION

Number: US20150058017A1
Assignee:

Disclosed in some examples are systems, methods, devices, and machine readable mediums which may produce an audio recording with included verification from the individuals in the recording that the recording is accurate. In some examples, the system may also provide rights management control to those individuals. This may ensure that individuals participating in audio events that are to be recorded are assured that their words are not changed, taken out of context, or otherwise altered and that they retain control over the use of their words even after the physical file has left their control.

1.-25. (canceled)
26. A method of recording audio comprising: using one or more processors to perform the operations of: receiving a voice exemplar from each of a plurality of individuals; recording an audio event; determining a plurality of audio event segments of the audio event, the audio event segments determined based upon changes in at least one identified active speaker, each segment having at least one corresponding identified active speaker, the identification based upon the received voice exemplars; receiving verification information for at least one segment from the corresponding identified active speaker for the at least one segment; and responsive to receiving verification information for at least one segment, producing a master audio file including the plurality of audio event segments and verification information.
27. The method of claim 26, comprising sending a verification request for the at least one segment to the corresponding at least one identified active speaker for that segment.
28. The method of claim 27, wherein the verification request includes an audio clip of the segment.
29. The method of claim 27, comprising automatically generating a transcript of each segment and wherein the verification request includes the transcript of the segment.
30. The method of claim 26, comprising receiving digital rights management information for a respective segment from ...

More details
23-02-2017 publication date

Blind Diarization of Recorded Calls With Arbitrary Number of Speakers

Number: US20170053653A1
Author: Sidi Oana, Wein Ron
Assignee:

In a method of diarization of audio data, audio data is segmented into a plurality of utterances. Each utterance is represented as an utterance model representative of a plurality of feature vectors. The utterance models are clustered. A plurality of speaker models are constructed from the clustered utterance models. A hidden Markov model is constructed of the plurality of speaker models. A sequence of identified speaker models is decoded.

1. A method for automatically transcribing a customer service telephone conversation between an arbitrary number of speakers, the method comprising: receiving data corresponding to the telephone conversation, wherein the received data comprises audio data; separating the audio data into frames; analyzing the frames to identify utterances, wherein each utterance comprises a plurality of frames; performing blind diarization of the audio data to differentiate speakers, wherein the blind diarization comprises identifying homogeneous speaker segments in the audio data, and associating each homogeneous speaker segment to a corresponding speaker in the telephone conversation; tagging each homogeneous speaker segment in the telephone conversation with a tag unique for each speaker; performing speaker diarization to replace one or more of the tags with a speaker's identity, wherein the speaker diarization comprises: comparing the homogeneous speaker segments in the telephone conversation to one or more models retrieved from a database, and, based on the comparison, identifying one or more of the speakers; and transcribing the conversation to obtain a text representation of the conversation, wherein each spoken part of the conversation is labeled with either the speaker's identity or the tag associated with the speaker.
2. The method according to claim 1, wherein the identifying homogeneous speaker segments in the audio data comprises using voice activity detection to identify segments of speech separated by segments of non-speech on ...
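The clustering of utterance models can be illustrated with a deliberately simplified greedy scheme. The patent's pipeline clusters utterance models and decodes speakers with a hidden Markov model; plain vectors and centroid distances stand in for those here:

```python
import numpy as np

def cluster_utterances(utterance_vectors, threshold):
    """Simplified sketch of the blind-diarization step: greedily group
    utterance models (here, plain vectors standing in for statistical
    utterance models) whose distance to an existing cluster centroid is
    below a threshold; each resulting cluster seeds one speaker model."""
    centroids, assignments = [], []
    for v in utterance_vectors:
        v = np.asarray(v, dtype=float)
        if centroids:
            d = [np.linalg.norm(v - c) for c in centroids]
            best = int(np.argmin(d))
            if d[best] < threshold:
                assignments.append(best)
                centroids[best] = (centroids[best] + v) / 2.0  # refine centroid
                continue
        centroids.append(v)                 # open a new speaker cluster
        assignments.append(len(centroids) - 1)
    return assignments

# Two speakers alternate; their utterance vectors form two tight groups.
utts = [[0.0, 0.0], [5.0, 5.0], [0.2, 0.1], [5.1, 4.8]]
```

On this toy input the scheme recovers the alternating speaker labels without knowing the number of speakers in advance, which is the "arbitrary number of speakers" property the claim aims at.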

More details
13-02-2020 publication date

ARTIFICIAL INTELLIGENCE DEVICE

Number: US20200051571A1
Assignee: LG ELECTRONICS INC.

An AI device is provided. The AI device includes a memory to store data, a voice acquisition interface to acquire a voice signal, and a processor to perform preprocessing for the voice signal based on a parameter, to provide the preprocessed voice signal to a voice recognition model, to acquire a voice recognition result, to store a characteristic of the preprocessed voice signal in the memory, and to change the parameter using a distribution of characteristics of voice signals accumulated in the memory.

1. An artificial intelligence (AI) device comprising: a memory configured to store data; a voice acquisition interface configured to acquire a voice signal; and a processor configured to: perform preprocessing for the voice signal based on a parameter, provide the preprocessed voice signal to a voice recognition model to acquire a voice recognition result, store a characteristic of the preprocessed voice signal in the memory, and change the parameter using a distribution of characteristics of voice signals accumulated in the memory.
2. The AI device of claim 1, wherein the parameter includes: at least one of an attenuation amount of a noise signal in the voice signal or a normalization value of a speech signal in the voice signal, and wherein the processor is configured to: perform, based on the parameter, at least one of an operation of lowering a level of the noise signal by the attenuation amount or adjusting a level of the speech signal to the normalization value.
3. The AI device of claim 1, wherein the characteristics of the voice signals accumulated in the memory include at least one of levels of noise signals in voice signals preprocessed based on the parameter or levels of speech signals in the voice signals preprocessed based on the parameter, and wherein the processor is configured to: acquire at least one of a distribution of levels of noise signals accumulated in the memory or a distribution of levels of speech signals accumulated in the memory.
4. The ...
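The adapt-from-accumulated-distribution loop of claims 1 and 3 can be sketched as follows; the dB levels and the median update rule are illustrative assumptions, not the patent's specified method:

```python
import statistics

class AdaptivePreprocessor:
    """Sketch of the described loop: preprocess with the current
    normalization value, remember each preprocessed signal's speech level,
    and update the parameter from the accumulated level distribution."""
    def __init__(self, normalization_value):
        self.normalization_value = normalization_value  # target level, dB
        self.speech_levels = []          # plays the role of the memory

    def preprocess(self, speech_level):
        gain = self.normalization_value - speech_level  # dB gain to target
        self.speech_levels.append(speech_level)
        return speech_level + gain

    def adapt(self):
        # Move the target toward the median of what was actually observed.
        self.normalization_value = statistics.median(self.speech_levels)

pp = AdaptivePreprocessor(normalization_value=-20.0)
for level in [-30.0, -26.0, -24.0]:
    pp.preprocess(level)
pp.adapt()
```

After adaptation the normalization value reflects the observed distribution, so later signals are adjusted toward a level the recognition model actually tends to receive.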

More details
03-03-2016 publication date

Sound source-separating device and sound source-separating method

Number: US20160064000A1
Assignee: Honda Motor Co Ltd

A sound source-separating device includes a sound-collecting part, an imaging part, a sound signal-evaluating part, an image signal-evaluating part, a selection part that selects whether to estimate a sound source direction based on the first sound signal or the first image signal, a person position-estimating part that estimates a sound source direction using the first image signal, a sound source direction-estimating part that estimates a sound source direction, a sound source-separating part that extracts a second sound signal corresponding to the sound source direction from the first sound signal, an image-extracting part that extracts a second image signal of an area corresponding to the estimated sound source direction from the first image signal, and an image-combining part that changes a third image signal of an area other than the area for the second image signal and combines the third image signal with the second image signal.

Publication date: 03-03-2016

VAD Detection Apparatus and Method of Operation the Same

Number: US20160064001A1
Assignee: Individual

At a processing device, a first signal from a first microphone and a second signal from a second microphone are received. The first signal indicates whether a voice signal has been determined at the first microphone, and the second signal indicates whether a voice signal has been determined at the second microphone. When the first signal indicates potential voice activity or the second signal indicates potential voice activity, the processing device is activated to receive data and the data is examined for a trigger word. When the trigger word is found, a signal is sent to an application processor to further process information from one or more of the first microphone and the second microphone. When no trigger word is found, the processing device is reset to deactivate data input and allowing the first microphone and the second microphone to enter or maintain an event detection mode of operation.
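The dual-microphone wake flow above can be sketched as a small state decision. This is an illustrative sketch only: the function name is invented, and the substring check stands in for a real keyword-spotting model.

```python
def handle_event(mic1_voice, mic2_voice, audio, trigger_word="hello"):
    """Sketch of the dual-microphone wake flow: wake on potential voice
    activity from either microphone, scan the buffered data for a trigger
    word, then either notify the application processor or reset back to
    event-detection mode. The substring check stands in for a real
    keyword-spotting model."""
    if not (mic1_voice or mic2_voice):
        return "event_detection"              # stay in low-power mode
    if trigger_word in audio.lower():
        return "notify_application_processor" # escalate for full processing
    return "reset_to_event_detection"         # deactivate data input

state = handle_event(True, False, "Hello there")
```

The point of the structure is that the expensive path (trigger-word search, application processor) is only reachable after a cheap voice-activity gate.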

Publication date: 20-02-2020

System and method for generating time-spectral diagrams in an integrated circuit solution

Number: US20200057932A1
Author: Lin Yang, XIANG Gao
Assignee: Gyrfalcon Technology Inc

A system for encoding data in an artificial intelligence (AI) integrated circuit solution may include a processor configured to receive voice data comprising at least a segment of an audio waveform, load the voice data into an input array of a cellular neural network (CeNN) in the AI integrated circuit, load one or more wavelet filters into one or more kernels of the CeNN, and perform one or more operations on the voice data to generate a time-spectral diagram. The time-spectral diagram may include a wavelet transformation of the voice data. Each of the one or more filters respectively represents a frequency. The system may also output a voice recognition result based on the time-spectral diagram. Sample training data may be encoded in a similar manner for training the cellular neural network.
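A software analogue of the time-spectral diagram can be built by convolving the waveform with one wavelet filter per frequency and stacking the magnitudes. The wavelet shape, scales, and sampling rate below are illustrative assumptions, not the patent's kernel coefficients.

```python
import numpy as np

def time_spectral_diagram(signal, scales, fs=16000):
    """Toy scalogram: convolve the waveform with one complex Morlet-like
    wavelet per scale (analogous to loading one wavelet filter per CeNN
    kernel) and stack the magnitudes into a time-scale image."""
    rows = []
    for s in scales:
        t = np.arange(-3 * s, 3 * s + 1) / fs
        carrier = np.exp(2j * np.pi * (fs / (2 * s)) * t)   # per-scale frequency
        envelope = np.exp(-((t * fs / s) ** 2))             # Gaussian window
        rows.append(np.abs(np.convolve(signal, carrier * envelope, mode="same")))
    return np.stack(rows)  # shape: (num_scales, num_samples)

sig = np.sin(2 * np.pi * 1000 * np.arange(1600) / 16000)   # 0.1 s of a 1 kHz tone
diagram = time_spectral_diagram(sig, scales=[4, 8, 16])
```

Each row of the resulting image is the response of one wavelet filter over time, which is the structure a downstream recognizer would consume.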

Publication date: 01-03-2018

SPEECH RECOGNITION METHOD AND APPARATUS

Number: US20180061397A1

The present application discloses speech recognition methods and apparatuses. An exemplary method may include extracting, via a first neural network, a vector containing speaker recognition features from speech data. The method may also include compensating bias in a second neural network in accordance with the vector containing the speaker recognition features. The method may further include recognizing speech, via an acoustic model based on the second neural network, in the speech data.

1. A speech recognition method, comprising: extracting, via a first neural network, a vector containing speaker recognition features from speech data; compensating bias in a second neural network in accordance with the vector containing the speaker recognition features; and recognizing speech, via an acoustic model based on the second neural network, in the speech data.
2. The speech recognition method of claim 1, wherein compensating bias in the second neural network in accordance with the vector containing the speaker recognition features includes: multiplying the vector containing the speaker recognition features by a weight matrix to be a bias term of the second neural network.
3. The speech recognition method of claim 2, wherein the first neural network, the second neural network, and the weight matrix are trained through: training the first neural network and the second neural network respectively; and collectively training the trained first neural network, the weight matrix, and the trained second neural network.
4. The speech recognition method of claim 3, further comprising: initializing the first neural network, the second neural network, and the weight matrix; updating the weight matrix using a back propagation algorithm in accordance with a predetermined objective criterion; and updating the second neural network and a connection matrix using the error back propagation algorithm in accordance with a predetermined objective criterion.
5.
The speech recognition ...
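The bias-compensation step of claim 2 (speaker vector times a weight matrix, added as a bias term of the second network) can be sketched for a single layer. All dimensions, names, and the random weights below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 100-dim speaker vector, 40-dim acoustic frame,
# 256-unit hidden layer. None of these dimensions come from the patent.
speaker_vec = rng.normal(size=100)     # output of the first neural network
W_layer = rng.normal(size=(256, 40))   # acoustic-model layer weights
b_layer = rng.normal(size=256)         # that layer's original bias
W_spk = rng.normal(size=(256, 100))    # trainable weight matrix for the bias

def hidden_layer(acoustic_frame):
    """One layer of the second network: the speaker vector, multiplied by
    W_spk, is added as an extra speaker-dependent bias term."""
    bias = b_layer + W_spk @ speaker_vec
    return np.maximum(0.0, W_layer @ acoustic_frame + bias)  # ReLU

h = hidden_layer(rng.normal(size=40))
```

Because the speaker vector is fixed for an utterance, the compensation costs one matrix-vector product per utterance, not per frame.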

Publication date: 20-02-2020

Electronic device and communication connection method using voice thereof

Number: US20200058309A1
Assignee: SAMSUNG ELECTRONICS CO LTD

An electronic device according to various embodiments of the present invention includes: a microphone; a communication module; a memory; and at least one processor, wherein the processor can receive and record a voice through the microphone while a function of receiving the voice is activated, generate first authentication data including data for the voice and identification data for the electronic device on the basis of the recorded voice, determine the mode of the electronic device on the basis of the recorded voice, send the first authentication data, receive second authentication data corresponding to the first authentication data, use identification data included in the second authentication data to connect communication with an external electronic device when the data for the voice included in the first authentication data matches data for voice included in the second authentication data, and perform, according to the mode, at least one function related to the communication-connected external electronic device and the voice. Various other embodiments are also possible.

Publication date: 04-03-2021

METHODS AND SYSTEMS FOR INTELLIGENT CONTENT CONTROLS

Number: US20210065719A1

Provided are methods and systems for intelligent content controls. A command may be received during presentation of content. The command may be time-driven, context-driven, or a combination of both. An end boundary may be determined based on a duration of time and/or one or more words of the command. Presentation of the content may be terminated at a nearest content transition with respect to the end boundary.

1. A method comprising: receiving a command associated with enforcement of content controls during presentation of a scene of content; determining, based on a portion of the command relating to metadata associated with the scene of the content and a timestamp, an end boundary; and causing presentation of the content to be terminated at the end boundary.
2. The method of claim 1, wherein the command comprises a voice signature.
3. The method of claim 2, further comprising determining, based on the voice signature, that the command is authorized.
4. The method of claim 1, wherein the portion of the command comprises one or more keywords relating to the metadata associated with the scene of the content and the timestamp.
5. The method of claim 1, wherein the content is presented at a user device, and wherein causing presentation of the content to be terminated at the end boundary comprises one or more of: causing the user device to power off at the timestamp; causing the user device to disregard a further command received at the user device at or following the timestamp; or causing the user device to present a screensaver at the timestamp.
6. The method of claim 1, further comprising determining, based on a content transition occurring during presentation of the content nearest the timestamp, an adjusted end boundary, wherein presentation of the content is caused, based on the adjusted end boundary, to be terminated at the content transition.
7.
The method of claim 1, further comprising: determining ...
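The "terminate at the nearest content transition" step reduces to a nearest-value search over known transition times. A minimal sketch, with invented names and illustrative timestamps:

```python
def snap_to_transition(end_boundary, transitions):
    """Terminate at the content transition (e.g. a scene cut) nearest the
    computed end boundary; times are in seconds and purely illustrative."""
    return min(transitions, key=lambda t: abs(t - end_boundary))

# A time-driven command implies stopping around t=95s; cuts at 30/60/90/120.
stop_at = snap_to_transition(95, [30, 60, 90, 120])
```

Snapping to a transition rather than the raw boundary avoids cutting off mid-scene, which is the adjusted end boundary of claim 6.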

Publication date: 28-02-2019

DIARIZATION USING LINGUISTIC LABELING

Number: US20190066690A1
Assignee: VERINT SYSTEMS LTD.

Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.

1–20. (canceled)
21. A method of diarization, the method comprising: receiving a set of textual transcripts from a transcription server and a set of audio files associated with the set of textual transcripts from an audio database server; performing a blind diarization on the set of textual transcripts and the set of audio files to segment and cluster the textual transcripts into a plurality of textual speaker clusters, wherein the number of textual speaker clusters is at least equal to a number of speakers in the textual transcript; automatedly applying at least one heuristic to the textual speaker clusters with a processor to select textual speaker clusters likely to be associated with an identified group of speakers; analyzing the selected textual speaker clusters with the processor to create at least one linguistic model; applying the linguistic model to transcribed audio data with the processor to label a portion of the transcribed audio data as having been spoken by the identified group of speakers; saving the at least one linguistic model to a linguistic database server and associating it with the labeled speaker; and with the processor, applying the saved at least one linguistic model from the linguistic database server to a new audio file transcript from an audio ...

Publication date: 28-02-2019

DIARIZATION USING LINGUISTIC LABELING

Number: US20190066691A1
Assignee: VERINT SYSTEMS LTD.

Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcribed audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.

1–20. (canceled)
21. A method of diarization, the method comprising: receiving a set of textual transcripts from a transcription server and a set of audio files associated with the set of textual transcripts from an audio database server; performing a blind diarization on the set of textual transcripts and the set of audio files to segment and cluster the textual transcripts into a plurality of textual speaker clusters, wherein the number of textual speaker clusters is at least equal to a number of speakers in the textual transcript; automatedly applying at least one heuristic to the textual speaker clusters with a processor to select textual speaker clusters likely to be associated with an identified group of speakers, wherein the at least one heuristic is a comparison of a plurality of scripts associated with the identified group of speakers to each set of the textual speaker clusters and a correlation score between each of the textual speaker clusters and the plurality of scripts is calculated and the speaker cluster in each set with the greatest correlation score is selected as being the transcript likely to be associated with the identified group of speakers; analyzing the selected textual speaker clusters with the processor to create at least one linguistic model, ...
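The correlation-score heuristic in claim 21 (compare known agent scripts to each textual speaker cluster, keep the cluster with the greatest score) can be sketched with a bag-of-words cosine similarity. The similarity measure and all example text are illustrative assumptions, not the patent's scoring function.

```python
from collections import Counter
from math import sqrt

def correlation(text_a, text_b):
    """Cosine similarity over word counts, a simple stand-in for the
    correlation score between a speaker cluster and the known scripts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_agent_cluster(clusters, script):
    """Pick the textual speaker cluster with the greatest correlation score."""
    return max(clusters, key=lambda c: correlation(c, script))

clusters = ["thank you for calling how may I help you",
            "my internet is down again since yesterday"]
script = "thank you for calling acme support how may I help you today"
agent = select_agent_cluster(clusters, script)
```

The cluster that echoes scripted phrasing wins, which is exactly how scripted agent speech separates from free-form customer speech.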

Publication date: 28-02-2019

DIARIZATION USING LINGUISTIC LABELING

Number: US20190066692A1
Assignee: VERINT SYSTEMS LTD.

Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.

1–20. (canceled)
21. A method of diarization, the method comprising: receiving a set of textual transcripts from a transcription server and a set of audio files associated with the set of textual transcripts from an audio database server; performing a blind diarization on the set of textual transcripts and the set of audio files to segment and cluster the textual transcripts into a plurality of textual speaker clusters, wherein the number of textual speaker clusters is at least equal to a number of speakers in the textual transcript; automatedly applying at least one heuristic to the textual speaker clusters with a processor to select textual speaker clusters likely to be associated with an identified group of speakers; analyzing the selected textual speaker clusters with the processor to create at least one linguistic model; applying the linguistic model to transcribed audio data with the processor to label a portion of the transcribed audio data as having been spoken by the identified group of speakers; saving the at least one linguistic model to a linguistic database server and associating it with the labeled speaker; with the processor, receiving a new textual transcript from the transcription server and a new audio file associated with the new textual transcript from
the ...

Publication date: 28-02-2019

DIARIZATION USING ACOUSTIC LABELING

Number: US20190066693A1
Assignee: VERINT SYSTEMS LTD.

Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated to an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in a diarization to identify audio data of the identified speaker.

1–20. (canceled)
21. A method of diarization of audio files, the method comprising: receiving a plurality of audio files from a database server and speaker metadata associations with each of the plurality of audio files, wherein each audio file is a recording of a customer service interaction including a known speaker and at least one other speaker, wherein the known speaker is a specific customer service agent and the at least one other speaker is a customer; selecting a subset of the audio files, wherein each audio file of the subset is selected to maximize an acoustical difference in voice frequencies between the known speaker and the at least one other speaker in the same audio file; performing a blind diarization on the subset of audio files to segment the audio files into a plurality of segments of speech separated by non-speech, such that each segment has a high likelihood of containing speech sections from a single speaker; automatedly applying at least one metric to the segments of speech with a processor to label segments of speech likely to be associated with the known speaker and clustering the selected segments into an audio speaker segment; analyzing the selected audio speaker segment to create an acoustic voiceprint, wherein the acoustic voiceprint is built from all the selected speaker segments; applying the acoustic voiceprint to the audio files with the processor to label a portion of the audio file as having been spoken by the known speaker; adding the labeled portion of the audio file to the acoustic voiceprint; saving the acoustic voiceprint to a ...

Publication date: 28-02-2019

Voiceprint registration method, server and storage medium

Number: US20190066695A1
Author: Cong Gao

Embodiments of the present disclosure provide a voiceprint registration method, a server and a storage medium. The method may include: acquiring present speech information collected by a smart device; extracting a present voiceprint feature of the present speech information; determining whether the present voiceprint feature is a voiceprint feature associated with the smart device; and determining the present voiceprint feature as a user identification associated with the smart device to determine the present voiceprint feature as the voiceprint feature associated with the smart device, in response to determining that the present voiceprint feature is not the voiceprint feature associated with the smart device.

Publication date: 09-03-2017

Voice command input device and voice command input method

Number: US20170069321A1
Author: Keiichi Toiyama

A voice command input device includes a first voice input unit, a second voice input unit, and a voice command identifier. The first voice input unit converts a voice into first voice command information, and outputs first identification information and the first voice command information. The second voice input unit converts a voice into second voice command information, and outputs second identification information and the second voice command information. The voice command identifier refers to the first identification information and the second identification information, and generates a control signal for controlling an operation target appliance based on the result of referring, the first voice command information, and the second voice command information.

Publication date: 11-03-2021

ELECTRONIC APPARATUS AND CONTROL METHOD THEREOF

Number: US20210074302A1
Author: SEO Heekyoung
Assignee: SAMSUNG ELECTRONICS CO., LTD.

Disclosed is an electronic apparatus which identifies utterer characteristics of an uttered voice input received; identifies one utterer group among a plurality of utterer groups based on the identified utterer characteristics; outputs a recognition result among a plurality of recognition results of the uttered voice input based on a voice recognition model corresponding to the identified utterer group among a plurality of voice recognition models provided corresponding to the plurality of utterer groups, the plurality of recognition results being different in recognition accuracy from one another; identifies recognition success or failure in the uttered voice input with respect to the output recognition result; and changes a recognition accuracy of the output recognition result in the voice recognition model corresponding to the recognition success, based on the identified recognition success in the uttered voice input.

1. An electronic apparatus comprising a processor configured to: identify utterer characteristics of an uttered voice input received; identify one utterer group among a plurality of utterer groups based on the identified utterer characteristics; output a recognition result among a plurality of recognition results of the uttered voice input based on a voice recognition model corresponding to the identified utterer group among a plurality of voice recognition models provided corresponding to the plurality of utterer groups, the plurality of recognition results being different in recognition accuracy from one another; identify recognition success or failure in the uttered voice input with respect to the output recognition result; and change a recognition accuracy of the output recognition result in the voice recognition model corresponding to the recognition success, based on the identified recognition success in the uttered voice input.
2. The electronic apparatus according to claim 1, wherein the processor is configured to obtain the ...

Publication date: 11-03-2021

PRIVACY-PRESERVING VOICEPRINT AUTHENTICATION APPARATUS AND METHOD

Number: US20210075787A1
Author: Yan Zheng, Zhang Rui

A voiceprint authentication apparatus is provided, comprising: a voice receiving module configured to receive a user's voices in different speaking modes; a feature extraction module configured to extract respective sets of voice features from the user's voices in different speaking modes; a synthesis module configured to generate a first voiceprint template by synthesizing the respective sets of voice features; and a first communication module configured to send the first voiceprint template to a server to authenticate the user, wherein the user's voices and the respective sets of voice features are not sent to the server. A corresponding voice authentication method, as well as a computer readable medium, are also provided.

1–23. (canceled)
24. An apparatus, comprising: at least one processing core, at least one memory including computer program code, the at least one memory and the computer program code being configured to, with the at least one processing core, cause the apparatus at least to: receive a user's voices in different speaking modes; extract respective sets of voice features from the user's voices in the different speaking modes; generate a first voiceprint template by synthesizing the respective sets of the voice features; and send the first voiceprint template to a server to authenticate the user, wherein the user's voices and the respective sets of the voice features are not sent to the server.
25. The apparatus of claim 24, wherein the apparatus is further configured to: extract a first set of features from the user's voice in a first speaking mode based on a linear prediction cepstrum coefficient algorithm; and extract a second set of features from the user's voice in a second speaking mode based on a mel frequency cepstral coefficient algorithm.
26.
The apparatus of claim 24, wherein the apparatus is further configured to: synthesize the respective sets of the voice features with a voice synthesis algorithm based on a log magnitude approximate vocal tract ...

Publication date: 16-03-2017

ELECTRONIC DEVICE AND METHOD

Number: US20170075652A1
Author: KIKUGAWA Yusaku

According to one embodiment, an electronic device records an audio signal, determines a plurality of user-specific utterance features within the audio signal, the plurality of user-specific utterance features including a first set of user-specific utterance features associated with the registered user and a second set of user-specific utterance features associated with the unregistered user, and displays the identifier of the registered user differently than an identifier of the unregistered user.

1. An electronic device comprising: a microphone configured to obtain audio and convert the audio into a first audio signal, the first audio including utterances from a first user and utterances from a second user, wherein one of the first user or the second user is a registered user and the other of the first user or second user is an unregistered user; a memory, wherein the memory stores an identifier associated with the registered user; and a hardware processor in communication with the memory, the hardware processor configured to: record the first audio signal; determine a plurality of user-specific utterance features within the first audio signal, the plurality of user-specific utterance features including a first set of user-specific utterance features associated with the registered user and a second set of user-specific utterance features associated with the unregistered user; and display the identifier of the registered user differently than an identifier of the unregistered user.
2.
The electronic device of claim 1, wherein the hardware processor is further configured to: identify the first set of user-specific utterance features associated with the registered user from a second audio signal; register the first set of user-specific utterance features with the registered user by associating the user-specific utterance feature with the identifier of the first user in the memory; and determine which portions of the second audio signal correspond to utterances from the ...

Publication date: 07-03-2019

Multiple Voice Services

Number: US20190074014A1
Author: Wilberding Dayn

Disclosed herein are example techniques to identify a voice service to process a voice input. An example implementation may involve a playback device capturing, via a microphone array, audio into one or more buffers. The playback device analyzes the captured audio using multiple wake-word detection algorithms. When a particular wake-word detection algorithm detects a wake-word corresponding to a particular voice assistant service, the playback device transmits the captured audio to the particular voice assistant service. The captured audio includes a voice input that includes a command to modify at least one playback setting of a media playback system. After transmitting the captured audio, the playback device receives, from the particular voice assistant service, instructions to modify the at least one playback setting according to the command, modifies the at least one playback setting, and with the at least one playback setting modified, plays back at least one audio track.

1. A playback device comprising: one or more amplifiers configured to drive one or more speakers; a microphone array; a network interface; one or more processors; tangible, non-transitory computer-readable media having stored therein instructions executable by the one or more processors to cause the playback device to perform a method comprising: continuously capturing, via the microphone array, audio into one or more buffers; analyzing the captured audio using multiple wake-word detection algorithms running concurrently on the one or more processors, each wake-word detection algorithm corresponding to a respective voice assistant service among multiple voice assistant services supported by the playback device; when a particular wake-word detection algorithm of the multiple wake-word detection algorithms detects, in the captured audio, a wake-word corresponding to a particular voice assistant service, transmitting, via the network interface, the captured audio to the particular voice ...
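The core dispatch, running one wake-word detector per supported voice assistant service over the same buffer and forwarding the audio to whichever service's detector fires, can be sketched as follows. The detector lambdas, wake words, and service names are invented stand-ins for the concurrent detection algorithms.

```python
def dispatch(captured_audio, detectors):
    """Run a wake-word detector per supported voice assistant service over
    the same captured buffer and return the service whose detector fires.
    The detector functions are stand-ins for real detection algorithms."""
    for service, detect in detectors.items():
        if detect(captured_audio):
            return service     # transmit the captured audio to this service
    return None                # no wake word: keep buffering

detectors = {
    "service_a": lambda audio: "hey alpha" in audio,
    "service_b": lambda audio: "ok beta" in audio,
}
chosen = dispatch("hey alpha, play some jazz", detectors)
```

A real implementation would run the detectors concurrently over a ring buffer; the sequential loop here only shows the routing decision.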

Publication date: 15-03-2018

End-to-end speaker recognition using deep neural network

Number: US20180075849A1
Assignee: Pindrop Security Inc

The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable to perform speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained according to a loss function, e.g., utilizing a cosine measure of similarity between respective samples, along with positive and negative margins, to provide a robust representation of voiceprints.
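The loss the abstract describes, cosine similarity between samples combined with positive and negative margins over a cohort of negatives, can be sketched as a hinge-style triplet loss. The margin values and vectors below are illustrative assumptions, not the patent's training configuration.

```python
import numpy as np

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_cosine_loss(anchor, positive, negatives, pos_margin=0.8, neg_margin=0.3):
    """Hinge-style triplet loss on cosine similarity: similarity to the
    positive sample is pushed above pos_margin, similarity to each cohort
    negative below neg_margin. Margin values are illustrative."""
    loss = max(0.0, pos_margin - cos_sim(anchor, positive))
    loss += sum(max(0.0, cos_sim(anchor, n) - neg_margin) for n in negatives)
    return loss

anchor = np.array([1.0, 0.0])
same_speaker = np.array([0.9, 0.1])
cohort = [np.array([0.0, 1.0])]
loss = triplet_cosine_loss(anchor, same_speaker, cohort)
```

In training, `anchor`, `same_speaker`, and the cohort embeddings would be the outputs of the three feed-forward networks for a batch of utterances.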

Publication date: 15-03-2018

SPEAKER SEGMENTATION AND CLUSTERING FOR VIDEO SUMMARIZATION

Number: US20180075877A1
Assignee: Intel Corporation

Techniques are provided for video summarization, based on speaker segmentation and clustering, to identify persons and scenes of interest. A methodology implementing the techniques according to an embodiment includes extracting audio content from a video stream and detecting one or more segments of the audio content that include the voice of a single speaker. The method also includes grouping the one or more detected segments into an audio cluster associated with the single speaker and providing a portion of the audio cluster to a user. The method further includes receiving an indication from the user that the single speaker is a person of interest. Segments of interest are then extracted from the video stream, where each segment of interest is associated with a scene that includes the person of interest. The extracted segments of interest are then combined into a summarization video.

1. A processor-implemented method for video summarization, the method comprising: detecting, by a processor, one or more segments of audio content, the segments including the voice of a single speaker, the audio content extracted from a video stream; grouping, by the processor, the one or more detected segments into an audio cluster associated with the single speaker; providing, by the processor, a portion of the audio cluster to a user; receiving, by the processor, an indication from the user that the single speaker is a person of interest (POI); extracting, by the processor, segments of interest (SOIs) from the video stream, each SOI associated with a scene that includes the POI; and combining, by the processor, the extracted SOIs into a summarization video.
2. The method of claim 1, further comprising extracting feature vectors from the audio cluster, matching the feature vectors to an existing speaker model, and designating the existing speaker model as the speaker model associated with the single speaker.
3. The method of claim 2, further comprising, in ...

Publication date: 16-03-2017

Speech processing device, speech processing method, and computer program product

Number: US20170076727A1
Author: Makoto Hirohata, Ning Ding
Assignee: Toshiba Corp

According to an embodiment, a speech processing device includes an extractor, a classifier, a similarity calculator, and an identifier. The extractor is configured to extract a speech feature from utterance data. The classifier is configured to classify the utterance data into a set of utterances for each speaker based on the extracted speech feature. The similarity calculator is configured to calculate a similarity between the speech feature of the utterance data included in the set and each of a plurality of speaker models. The identifier is configured to identify a speaker for each set based on the calculated similarity.
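The identifier stage of this device, computing a similarity between the speech features of an utterance set and each enrolled speaker model, then picking the best speaker per set, can be sketched with cosine similarity. The similarity function, feature dimensionality, and toy vectors are illustrative assumptions.

```python
import numpy as np

def identify_speaker(utterance_features, speaker_models):
    """For the utterance set of one (unknown) speaker, average the cosine
    similarity of each utterance feature to every enrolled speaker model
    and return the best-scoring speaker. Names and vectors are toy data."""
    def sim(x, m):
        return float(x @ m / (np.linalg.norm(x) * np.linalg.norm(m)))
    scores = {name: float(np.mean([sim(f, m) for f in utterance_features]))
              for name, m in speaker_models.items()}
    return max(scores, key=scores.get)

models = {"alice": np.array([1.0, 0.0]), "bob": np.array([0.0, 1.0])}
features = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]
best = identify_speaker(features, models)
```

Averaging over the whole per-speaker utterance set, rather than scoring each utterance alone, is what the classifier stage buys: one noisy utterance no longer flips the identity.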

Publication date: 24-03-2022

SPEAKER IDENTIFICATION

Number: US20220093108A1
Author: Lesso John Paul

A method of speaker identification comprises receiving an audio signal representing speech; performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and, if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker. The second voice biometric process is selected to be more discriminative than the first voice biometric process.

1. A method of speaker identification, performed in a portable electronic device, wherein the portable electronic device comprises first and second integrated circuits, the method comprising: receiving an audio signal representing speech; performing a first voice biometric process on the audio signal in said first integrated circuit of said portable electronic device to attempt to identify whether the speech is the speech of an enrolled speaker; and if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal in said second integrated circuit of said portable electronic device to attempt to identify whether the speech is the speech of the enrolled speaker, wherein the second voice biometric process is selected to be more discriminative than the first voice biometric process.
2. A method according to claim 1, wherein the second voice biometric process is configured to have a lower Equal Error Rate than the first voice biometric process.
3. A method according to claim 1, comprising making a decision as to whether the speech is the speech of the enrolled speaker, based on a result of the second voice biometric process.
4.
A method according to claim 1 , comprising making a decision as to whether the speech is the speech of the enrolled ...
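The two-stage cascade claimed above, where a cheap first biometric process gates a more discriminative (lower-EER) second one, can be sketched as below. The function names and the boolean interface are assumptions for illustration; in the patent the two stages run on separate integrated circuits.

```python
def cascade_verify(audio, first_process, second_process):
    # Run the cheap, always-on first biometric process; only when it makes
    # an initial "enrolled user" determination is the more discriminative,
    # more expensive second process invoked for the final decision.
    if not first_process(audio):
        return False
    return second_process(audio)
```

The design saves power: the second stage (and, in the patent, its integrated circuit) stays idle whenever the first stage rejects.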

24-03-2022 publication date

ANALYSING SPEECH SIGNALS

Number: US20220093111A1
Author: Lesso John Paul

A method of analysis of an audio signal comprises: receiving an audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively; analyzing the first and second components of the audio signal with models of the first and second acoustic classes of the speech of an enrolled user. Based on the analyzing, information is obtained about at least one of a channel and noise affecting the audio signal.

1.-12. (canceled)
13. A method of analysis of an audio signal, the method comprising: receiving an audio signal representing speech; extracting first and second components of the audio signal representing first and second acoustic classes of the speech respectively, the first acoustic class of speech being different to the second acoustic class of speech; analyzing the extracted first component of the audio signal with a first model of speech of an enrolled user in the first acoustic class and the extracted second component of the audio signal with a second model of speech of the enrolled user in the second acoustic class; and based on said analyzing, determining at least one of a property of a channel affecting the first and second components of said audio signal and a property of noise affecting the first and second components of said audio signal.
14. A method according to claim 13, wherein the first and second acoustic classes are phonetically distinguishable acoustic classes of speech.
15. A method according to claim 13, wherein the first and second acoustic classes each comprise one of the following classes: a) a phoneme class; b) a vowel class; c) a fricative class; d) a sibilant class; e) a voiced class; f) an unvoiced class.
16. A method according to claim 13, wherein extracting first and second components of the audio signal comprises: identifying periods when the audio signal contains voiced speech; and identifying remaining periods of speech as containing unvoiced ...
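One conventional way to separate voiced from unvoiced periods, as in claim 16, is a zero-crossing-rate heuristic: voiced speech is quasi-periodic with few sign changes, while unvoiced fricatives are noise-like with many. The threshold value below is illustrative and not taken from the patent.

```python
def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose sign differs.
    pairs = list(zip(frame, frame[1:]))
    return sum(1 for a, b in pairs if a * b < 0) / len(pairs)

def split_acoustic_classes(frames, zcr_threshold=0.25):
    # Label each frame voiced or unvoiced by its zero-crossing rate
    # (a common heuristic; the patent does not mandate this detector).
    voiced, unvoiced = [], []
    for frame in frames:
        (unvoiced if zero_crossing_rate(frame) > zcr_threshold else voiced).append(frame)
    return voiced, unvoiced
```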

05-03-2020 publication date

MULTIMEDIA PROCESSING CIRCUIT AND ELECTRONIC SYSTEM

Number: US20200075023A1
Assignee:

A multimedia processing circuit is provided. The multimedia processing circuit includes a smart interpreter engine and an audio engine. The smart interpreter engine includes a noise suppression module, a vocal identification module and a speech to text converter. The noise suppression module is utilized for performing a noise suppression process on speech data corresponding to a first language. The vocal identification module is utilized for performing a vocal identification process on the noise-suppressed speech data corresponding to the first language to generate vocal identification data corresponding to the first language. The speech to text converter is utilized for converting the vocal identification data corresponding to the first language into text data corresponding to the first language. The audio engine is utilized for receiving speech data corresponding to the first language and converting the speech data corresponding to the first language into an analog speech signal corresponding to the first language.

1. A multimedia processing circuit, comprising: a smart interpreter engine, comprising: a noise suppression module for performing a noise suppression process on speech data corresponding to a first language; a vocal identification module for performing a vocal identification process on the noise-suppressed speech data corresponding to the first language to generate vocal identification data corresponding to the first language; and a speech to text converter for converting the vocal identification data corresponding to the first language into text data corresponding to the first language; and an audio engine for receiving the speech data corresponding to the first language and converting the speech data corresponding to the first language into an analog speech signal corresponding to the first language.
2. The multimedia processing circuit of claim 1, wherein the smart interpreter engine further comprises: a natural language processing module for ...

05-03-2020 publication date

SPEAKER RECOGNITION AND SPEAKER CHANGE DETECTION

Number: US20200075028A1
Author: Lesso John Paul

A method of speaker recognition comprises: receiving an audio signal comprising speech; performing a biometric process on a first part of the audio signal, wherein the first part of the audio signal extends over a first time period; obtaining a speaker recognition score from the biometric process for the first part of the audio signal; performing a biometric process on a plurality of second parts of the audio signal, wherein the second parts of the audio signal are successive sections of the first part of the audio signal, and wherein each second part of the audio signal extends over a second time period and the second time period is shorter than the first time period; obtaining a respective speaker recognition score from the biometric process for each second part of the audio signal; and determining whether there has been a speaker change based on the respective speaker recognition scores for successive second parts of the audio signal.

1. A method of speaker recognition, comprising: receiving an audio signal comprising speech; performing a biometric process on a first part of the audio signal, wherein the first part of the audio signal extends over a first time period; obtaining a speaker recognition score from the biometric process for the first part of the audio signal; performing a biometric process on a plurality of second parts of the audio signal, wherein the second parts of the audio signal are successive sections of the first part of the audio signal, and wherein each second part of the audio signal extends over a second time period and the second time period is shorter than the first time period; obtaining a respective speaker recognition score from the biometric process for each second part of the audio signal; and determining whether there has been a speaker change based on the respective speaker recognition scores for successive second parts of the audio signal.
2. A method as claimed in claim 1, wherein determining whether there has been a speaker change ...
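The change-detection step over successive second parts can be sketched as a drop test on consecutive window scores: a sharp fall suggests the voice no longer matches the speaker who dominated the long first part. The windowing scheme and the drop threshold below are illustrative assumptions, not values from the patent.

```python
def window_scores(frames, score_fn, window):
    # Score successive short sections ("second parts") of the signal.
    return [score_fn(frames[i:i + window]) for i in range(0, len(frames), window)]

def first_speaker_change(scores, drop=0.5):
    # Report the index of the first window whose score falls sharply
    # relative to the previous window, or None if no change is detected.
    for i in range(1, len(scores)):
        if scores[i - 1] - scores[i] > drop:
            return i
    return None
```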

18-03-2021 publication date

CONVOLUTIONAL NEURAL NETWORK WITH PHONETIC ATTENTION FOR SPEAKER VERIFICATION

Number: US20210082438A1
Assignee:

Embodiments may include reception of a plurality of speech frames, determination of a multi-dimensional acoustic feature associated with each of the plurality of speech frames, determination of a plurality of multi-dimensional phonetic features, each of the plurality of multi-dimensional phonetic features determined based on a respective one of the plurality of speech frames, generation of a plurality of two-dimensional feature maps based on the phonetic features, input of the feature maps and the plurality of acoustic features to a convolutional neural network, the convolutional neural network to generate a plurality of speaker embeddings based on the plurality of feature maps and the plurality of acoustic features, aggregation of the plurality of speaker embeddings into a first speaker embedding based on respective weights determined for each of the plurality of speaker embeddings, and determination of a speaker associated with the plurality of speech frames based on the first speaker embedding.

1. A system comprising: a processing unit; and a storage device ...: determine a frame-level acoustic feature associated with each of a plurality of speech frames associated with a speaker; determine a frame-level phonetic feature associated with each of the plurality of speech frames based on the frame-level acoustic feature associated with each of the plurality of frame-level acoustic features; generate one or more two-dimensional feature maps based on the plurality of frame-level phonetic features; input the one or more two-dimensional feature maps to a trained neural network to generate a plurality of frame-level speaker embeddings, the trained neural network including a convolutional neural network; aggregate the plurality of frame-level speaker embeddings into a speaker embedding based on respective weights determined for each of the plurality of frame-level speaker embeddings; and determine an identity of the speaker based on the speaker embedding.
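The weighted aggregation of frame-level embeddings into a single utterance-level speaker embedding can be sketched with attention-style pooling. Softmax normalisation of the weights is an assumption for the sketch; the claim only says "respective weights determined for each" embedding.

```python
import math

def attentive_pool(frame_embeddings, raw_weights):
    # Softmax-normalise the per-frame weights, then take the weighted sum
    # of frame-level embeddings to get one utterance-level speaker embedding.
    m = max(raw_weights)
    exps = [math.exp(w - m) for w in raw_weights]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(frame_embeddings[0])
    return [sum(a * emb[d] for a, emb in zip(alphas, frame_embeddings))
            for d in range(dim)]
```

With equal raw weights this reduces to plain average pooling; unequal weights let phonetically informative frames dominate the embedding.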

18-03-2021 publication date

CHANNEL-COMPENSATED LOW-LEVEL FEATURES FOR SPEAKER RECOGNITION

Number: US20210082439A1
Assignee:

A system for generating channel-compensated features of a speech signal includes a channel noise simulator that degrades the speech signal, a feed forward convolutional neural network (CNN) that generates channel-compensated features of the degraded speech signal, and a loss function that computes a difference between the channel-compensated features and handcrafted features for the same raw speech signal. Each loss result may be used to update connection weights of the CNN until a predetermined threshold loss is satisfied, and the CNN may be used as a front-end for a deep neural network (DNN) for speaker recognition/verification. The DNN may include convolutional layers, a bottleneck features layer, multiple fully-connected layers and an output layer. The bottleneck features may be used to update connection weights of the convolutional layers, and dropout may be applied to the convolutional layers.

1. A computer-implemented method comprising: obtaining, by a computer, a recognition speech signal; generating, by the computer, a first degraded speech signal according to a first characteristic; and applying, by the computer, a neural network on the first degraded speech signal to generate a first set of low-level features; modifying, by the computer, the first characteristic to generate a second characteristic; generating, by the computer, a second degraded speech signal for the recognition speech signal according to the second characteristic; and applying, by the computer, the neural network on the second degraded speech signal to generate a second set of low-level features; and generating, by the computer, a trained neural network in response to determining that a plurality of sets of low-level features generated from the neural network satisfy a loss threshold.
2. The method according to claim 1, wherein determining that the sets of low-level features generated from the neural network satisfy the loss threshold comprises: calculating, by the computer, a loss result ...
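The training objective (a loss between the CNN's channel-compensated features and handcrafted features of the same clean signal, with training stopping at a threshold) can be sketched as follows. Mean squared error stands in for the unspecified "difference"; the patent does not name a particular loss.

```python
def feature_loss(compensated, handcrafted):
    # Mean squared difference between the CNN's channel-compensated features
    # and the handcrafted features computed from the same clean signal.
    count = 0
    total = 0.0
    for comp_frame, hand_frame in zip(compensated, handcrafted):
        for c, h in zip(comp_frame, hand_frame):
            total += (c - h) ** 2
            count += 1
    return total / count

def training_done(loss, threshold=0.01):
    # Connection weights keep being updated until the loss meets the
    # predetermined threshold (threshold value is illustrative).
    return loss <= threshold
```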

14-03-2019 publication date

SPEAKER VERIFICATION COMPUTER SYSTEM WITH TEXTUAL TRANSCRIPT ADAPTATIONS OF UNIVERSAL BACKGROUND MODEL AND ENROLLED SPEAKER MODEL

Number: US20190080697A1
Assignee:

A sampled speech data sequence contains words spoken by a speaker. A sequence of feature vectors is generated characterizing spectral distribution of sampled speech data. A textual transcript of the words spoken by the speaker is obtained. Data structures of a universal background model of a Gaussian mixture model (UBM-GMM) and of an Enrolled speaker Gaussian mixture model (ENR-GMM) are adapted responsive to the textual transcript, to generate an adapted UBM-GMM and an adapted ENR-GMM, respectively. An enrolled speaker probability is generated based on the sequence of feature vectors and the adapted ENR-GMM, and a universal speaker probability is generated based on the sequence of feature vectors and the adapted UBM-GMM. A speaker verification indication of whether the speaker is an enrolled speaker is generated by comparing the enrolled speaker probability to the universal speaker probability.

1. A method by a speaker verification computer system for verifying a speaker, the method comprising: obtaining a sequence of sampled speech data containing a sequence of words spoken by the speaker; generating a sequence of feature vectors characterizing spectral distribution of the sequence of sampled speech data; obtaining a textual transcript of the sequence of words spoken by the speaker; adapting data structures of a universal background model of a Gaussian mixture model, UBM-GMM, and of an Enrolled speaker Gaussian mixture model, ENR-GMM, responsive to the textual transcript, to generate an adapted UBM-GMM and an adapted ENR-GMM, respectively; generating an enrolled speaker probability based on a combination of the sequence of feature vectors and the adapted ENR-GMM, and a universal speaker probability based on a combination of the sequence of feature vectors and the adapted UBM-GMM; generating a speaker verification indication of whether the speaker is an enrolled speaker based on a comparison of the enrolled speaker probability to the universal ...
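The final comparison of the enrolled-speaker and universal-speaker probabilities amounts to a log-likelihood ratio test between the two GMMs. The sketch below uses one-dimensional GMMs given as (weight, mean, variance) components for brevity; real systems score multi-dimensional feature vectors.

```python
import math

def gmm_log_likelihood(features, gmm):
    # Total log-likelihood of 1-D feature values under a GMM given as a
    # list of (weight, mean, variance) components.
    total = 0.0
    for x in features:
        p = sum(w / math.sqrt(2 * math.pi * v) * math.exp(-((x - m) ** 2) / (2 * v))
                for w, m, v in gmm)
        total += math.log(p)
    return total

def verify_speaker(features, enr_gmm, ubm_gmm, threshold=0.0):
    # Accept when the adapted enrolled-speaker model explains the features
    # better than the adapted universal background model.
    llr = gmm_log_likelihood(features, enr_gmm) - gmm_log_likelihood(features, ubm_gmm)
    return llr > threshold
```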

14-03-2019 publication date

ADMINISTRATION OF PRIVILEGES BY SPEECH FOR VOICE ASSISTANT SYSTEM

Number: US20190080698A1
Author: Miller Gregory Thomas
Assignee:

A voice assistant system may be configured to receive a spoken introduction from a trusted user to introduce a new user and designate access privileges for the new user. The voice assistant system may process the speech using automated speech recognition and may parse the text to determine intent. The voice assistant system may also analyze the speech to determine a profile associated with the speaker that spoke the introduction. The voice assistant system may determine that the trusted user includes administrative privileges. The access privileges, when granted, may allow the new user to interact with the voice assistant system, such as to issue commands, extract information, play media, and/or perform other actions with the voice assistant system, which may be unavailable to people who are not introduced to the voice assistant system by a trusted user and/or do not have the access privileges associated with a user profile.

1. A computer-implemented method implemented by a voice-controlled assistant, the computer-implemented method comprising: receiving first speech from a first user who is associated with a first user profile having an administration privilege, the first speech to include at least identification of a group of privileges and a second user to gain privileges included in the group of privileges; analyzing first attributes of the first speech to determine that the first speech is associated with the first user profile; determining the second user and the group of privileges based at least in part on processing of the first speech; creating a second user profile associated with the second user; providing a prompt to request speech from the second user; receiving second speech from the second user; determining second attributes derived from the second speech, the second attributes to enable identification of subsequent speech associated with the second user; and associating the second attributes and the group of privileges with the second user profile.
2. The ...

22-03-2018 publication date

DETECTING CUSTOMERS WITH LOW SPEECH RECOGNITION ACCURACY BY INVESTIGATING CONSISTENCY OF CONVERSATION IN CALL-CENTER

Number: US20180082676A1
Assignee:

Methods and a system are provided for estimating automatic speech recognition (ASR) accuracy. A method includes obtaining transcriptions of utterances in a conversation over two channels. The method further includes sorting the transcriptions along a time axis using a forced alignment. The method also includes training a language model with the sorted transcriptions. The method additionally includes performing ASR for utterances in a conversation between a first user and a second user. The second user is a target of ASR accuracy estimation. The method further includes determining whether an ASR result of the second user is consistent or inconsistent with an ASR result of the first user using the trained language model. The method also includes estimating the ASR result of the second user as poor responsive to the ASR result of the second user being inconsistent with the ASR result of the first user.

1. A method for estimating automatic speech recognition (ASR) accuracy, the method comprising: obtaining transcriptions of utterances in a conversation over two channels; sorting the transcriptions along a time axis using a forced alignment; training a language model with the sorted transcriptions; performing ASR for utterances in a conversation between a first user and a second user, the second user being a target of ASR accuracy estimation; determining whether an ASR result of the second user is consistent or inconsistent with an ASR result of the first user using the trained language model; listing word sequences that have at least one word from the first user followed by a word from the second user; and estimating the ASR result of the second user as poor responsive to the ASR result of the second user being inconsistent with the ASR result of the first user.
2. The method of claim 1, wherein said obtaining, sorting, and training steps correspond to a training stage of the method, and said performing, determining, and ...
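The consistency check can be sketched as scoring cross-speaker word transitions under the conversation-trained language model: the claimed "word sequences that have at least one word from the first user followed by a word from the second user" are collected at turn boundaries and scored. The turn representation, the averaging, and the probability floor here are illustrative assumptions.

```python
def cross_speaker_pairs(turns):
    # turns: list of (speaker, words). Collect word pairs where a word from
    # one speaker is immediately followed by a word from the other speaker.
    pairs = []
    for (spk_a, words_a), (spk_b, words_b) in zip(turns, turns[1:]):
        if spk_a != spk_b and words_a and words_b:
            pairs.append((words_a[-1], words_b[0]))
    return pairs

def estimate_asr_quality(pairs, bigram_logprob, floor=-5.0):
    # If the second user's ASR output rarely forms probable continuations of
    # the first user's words under the language model, flag it as poor.
    avg = sum(bigram_logprob(a, b) for a, b in pairs) / len(pairs)
    return "poor" if avg <= floor else "ok"
```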

22-03-2018 publication date

METHODS AND SYSTEM FOR REDUCING FALSE POSITIVE VOICE PRINT MATCHING

Number: US20180082690A1
Assignee:

The methods, apparatus, and systems described herein are designed to reduce false positive voice print matching with fraudulent callers. A voice print of a call is created and compared to known voice prints to determine if it matches one or more of the known voice prints, and to transaction data associated with a database of voice prints. The methods include a pre-processing step to separate speech from non-speech, selecting a number of elements that affect the voice print the most, and/or generating a first score based on the number of selected audio elements matching audio elements of a voice print from the plurality of fraudulent speakers, determining if the first score exceeds a predetermined threshold score for the fraudulent speaker, and comparing the selected audio elements for the unknown caller, where the score exceeds the predetermined threshold score, to the voice prints associated with the customer account.

1. A method of reducing false positive matches in voice prints which comprises: receiving an audio communication from an unknown caller, separating a first portion of the audio communication into silent and non-silent segments, and evaluating the non-silent segments to determine which portions thereof are speech or non-speech; generating a plurality of parameters that determine what is speech and non-speech in the non-silent segments; using the generated parameters to determine what is speech and non-speech for at least the remainder of the telephonic communications; comparing the speech to selected audio elements of a background model that characterizes the speech of the unknown caller relative to a plurality of other audio elements of the background model; comparing the selected audio elements of the speech to matching audio elements of a recorded voice print from a plurality of fraudulent speakers to determine whether the speech belongs to a fraudulent speaker; generating a first score based on the number of selected audio elements matching audio ...

22-03-2018 publication date

DIMENSIONALITY REDUCTION OF BAUM-WELCH STATISTICS FOR SPEAKER RECOGNITION

Number: US20180082691A1
Assignee: PINDROP SECURITY, INC.

In a speaker recognition apparatus, audio features are extracted from a received recognition speech signal, and first order Gaussian mixture model (GMM) statistics are generated therefrom based on a universal background model that includes a plurality of speaker models. The first order GMM statistics are normalized with regard to a duration of the received speech signal. The deep neural network reduces a dimensionality of the normalized first order GMM statistics, and outputs a voiceprint corresponding to the recognition speech signal.

1. A speaker recognition apparatus comprising: a feature extractor configured to extract audio features from a received recognition speech signal; a statistics accumulator configured to generate first order Gaussian mixture model (GMM) statistics from the extracted audio features based on a universal background model that includes a plurality of speaker models; a statistics normalizer configured to normalize the first order GMM statistics with regard to a duration of the received speech signal; and a deep neural network having a plurality of fully connected layers configured to reduce a dimensionality of the normalized first order GMM statistics, the deep neural network configured to output a voiceprint corresponding to the recognition speech signal.
2. The speaker recognition apparatus according to claim 1, wherein the fully connected layers of the deep neural network include: an input layer configured to receive the normalized first order statistics; one or more sequentially arranged first hidden layers arranged to receive coefficients from the input layer; and a last hidden layer arranged to receive coefficients from one hidden layer of the one or more first hidden layers, the last hidden layer having a dimension smaller than each of the one or more first hidden layers, the last hidden layer configured to output a voiceprint corresponding to the recognition speech signal.
3. The speaker recognition apparatus according to claim 2, wherein ...
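The duration normalization of first-order Baum-Welch statistics can be sketched as below (one-dimensional features for brevity): dividing each component's first-order sum by its zeroth-order soft count removes the dependence on utterance length, which is what lets a fixed-size network consume utterances of any duration.

```python
import math

def component_posteriors(x, ubm):
    # Responsibilities of each (weight, mean, variance) UBM component
    # for a one-dimensional feature value x.
    liks = [w / math.sqrt(2 * math.pi * v) * math.exp(-((x - m) ** 2) / (2 * v))
            for w, m, v in ubm]
    z = sum(liks)
    return [l / z for l in liks]

def normalized_first_order_stats(features, ubm):
    # Accumulate zeroth- and first-order Baum-Welch statistics, then divide
    # so the result no longer scales with the duration of the utterance.
    k = len(ubm)
    n = [0.0] * k          # zeroth order (soft counts)
    f = [0.0] * k          # first order (posterior-weighted sums)
    for x in features:
        post = component_posteriors(x, ubm)
        for c in range(k):
            n[c] += post[c]
            f[c] += post[c] * x
    return [f[c] / n[c] if n[c] > 0.0 else 0.0 for c in range(k)]
```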

22-03-2018 publication date

Channel-Compensated Low-Level Features For Speaker Recognition

Number: US20180082692A1
Assignee: PINDROP SECURITY, INC.

A system for generating channel-compensated features of a speech signal includes a channel noise simulator that degrades the speech signal, a feed forward convolutional neural network (CNN) that generates channel-compensated features of the degraded speech signal, and a loss function that computes a difference between the channel-compensated features and handcrafted features for the same raw speech signal. Each loss result may be used to update connection weights of the CNN until a predetermined threshold loss is satisfied, and the CNN may be used as a front-end for a deep neural network (DNN) for speaker recognition/verification. The DNN may include convolutional layers, a bottleneck features layer, multiple fully-connected layers and an output layer. The bottleneck features may be used to update connection weights of the convolutional layers, and dropout may be applied to the convolutional layers.

1. A system for generating channel-compensated low level features for speaker recognition, the system comprising: an acoustic channel simulator configured to receive a recognition speech signal, degrade the recognition speech signal to include characteristics of an audio channel, and output a degraded speech signal; a first feed forward convolutional neural network configured, in a training mode, to receive the degraded speech signal, and to derive from the degraded speech signal a plurality of channel-compensated low-level features, and further configured, in a test and enrollment mode, to receive the recognition speech signal and to calculate from the recognition speech signal a plurality of the channel-compensated low-level features; a speech signal analyzer configured, in the training mode, to extract features of the recognition speech signal; a loss function processor configured to calculate a loss based on the features from the speech analyzer and the channel-compensated low-level features from the first feed forward convolutional neural network; wherein the calculated loss ...

26-03-2015 publication date

Anti-spoofing

Number: US20150088509A1
Assignee: Agnitio SL

A system and method for classifying whether audio data received in a speaker recognition system is genuine or a spoof, using a Gaussian classifier.

24-03-2016 publication date

ELECTRONIC DEVICE, METHOD AND STORAGE MEDIUM

Number: US20160086608A1
Author: Yamaguchi Ryuichi
Assignee:

According to one embodiment, an electronic device includes a display controller and circuitry. The display controller displays a first object indicative of a first speaker, a first object indicative of a second speaker different from the first speaker, a second object indicative of a first speech period identified as a speech of the first speaker, and a second object indicative of a second speech period identified as a speech of the second speaker. The circuitry integrates the first speech period and the second speech period into a speech period of a same speaker when a first operation of associating the first object indicative of the first speaker with the first object indicative of the second speaker is operated.

1. An electronic device comprising: a receiver configured to receive audio data corresponding to speech from one or more speakers; a display controller configured to display a first object indicative of a first speaker, a first object indicative of a second speaker different from the first speaker, a second object indicative of a first speech period automatically identified as a speech of the first speaker, and a second object indicative of a second speech period automatically identified as a speech of the second speaker, based on the audio data; and circuitry configured to integrate the first speech period and the second speech period into a speech period of a same speaker and cause the integrated speech period to be displayed, when a first operation comprising an operation of associating the first object indicative of the first speaker with the first object indicative of the second speaker is operated.
2. The device of claim 1, wherein the circuitry is configured to divide a third speech period automatically identified as a speech from a third speaker into a speech period of the third speaker and a fourth speech period automatically identified as a speech from a fourth speaker different from the third speaker and cause the divided speech periods to be ...

23-03-2017 publication date

VOICE RECOGNITION APPARATUS, VOICE RECOGNITION METHOD OF USER DEVICE, AND NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM

Number: US20170084278A1
Author: JUNG Chi-sang
Assignee: SAMSUNG ELECTRONICS CO., LTD.

A voice recognition apparatus, a voice recognition method, and a non-transitory computer readable recording medium are provided. The voice recognition apparatus includes a storage configured to store a preset threshold value for voice recognition; a voice receiver configured to receive a voice signal of an uttered voice; and a voice recognition processor configured to recognize a voice recognition starting word from the received voice signal, perform the voice recognition on the voice signal in response to a similarity score, which represents a recognition result of the recognized voice recognition starting word, being greater than or equal to the stored preset threshold value, and change the preset threshold value based on the recognition result of the voice recognition starting word.

1. A voice recognition apparatus comprising: a storage configured to store a preset threshold value for voice recognition; a voice receiver configured to receive a voice signal of an uttered voice; and a voice recognition processor configured to recognize a voice recognition starting word from the received voice signal, perform the voice recognition on the voice signal in response to a similarity score, which represents a recognition result of the recognized voice recognition starting word, being greater than or equal to the stored preset threshold value, and change the preset threshold value based on the recognition result of the voice recognition starting word.
2. The voice recognition apparatus as claimed in claim 1, wherein the voice recognition processor is further configured to change the preset threshold value and compare the changed preset threshold value with a similarity score related to a text-based recognition result which is generated by recognizing the voice recognition starting word.
3. The voice recognition apparatus as claimed in claim 2, wherein the voice recognition processor is further configured to change the preset threshold value in response to the text-based ...
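The threshold comparison and adaptation can be sketched as below. The direction, step size, and bounds of the adaptation are assumptions for illustration; the claims leave the update rule open beyond "change the preset threshold value based on the recognition result".

```python
def recognize_starting_word(similarity, state, step=0.05, floor=0.3, ceiling=0.9):
    # Accept when the similarity score of the recognized starting word meets
    # the stored threshold, then adapt the stored threshold from that result
    # (tighten after confident detections, relax after misses).
    accepted = similarity >= state["threshold"]
    if accepted:
        state["threshold"] = min(ceiling, state["threshold"] + step)
    else:
        state["threshold"] = max(floor, state["threshold"] - step)
    return accepted
```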

23-03-2017 publication date

REAL-TIME SPEAKER STATE ANALYTICS PLATFORM

Number: US20170084295A1
Assignee:

Disclosed are machine learning-based technologies that analyze an audio input and provide speaker state predictions in response to the audio input. The speaker state predictions can be selected and customized for each of a variety of different applications.

1. A speech analytics platform implemented in one or more computing devices, for providing speech-derived speaker state data as a service, the platform comprising: a speech data processing subsystem embodied in one or more non-transitory machine accessible storage media, the speech data processing subsystem configured to produce speech data corresponding to audio input captured from a human or synthetic speaker, the produced speech data being dynamically segmented for real-time speech-based speaker state determination; and a plurality of analytics engines embodied in one or more non-transitory machine accessible storage media, wherein each of the plurality of analytics engines is configured to receive the pre-processed speech data from the speech data processing subsystem and provide as output a speaker state indicator, the plurality of analytics engines comprising: an automatic speech recognition module configured to perform a speech recognition operation on the speech data; and a plurality of algorithms each configured for a different type of speaker state analytics, at least one of the algorithms extracting at least one non-word feature of the speech data and outputting speaker state data relating to the type of speaker state analytics for which the at least one algorithm has been configured.
2. The platform of claim 1, wherein the speech data processing subsystem is configured to perform at least one of: speaker identification and speaker verification.
3. The platform of claim 1, wherein at least one of the plurality of analytics engines is further configured to extract one or more speech features from the speech data based at least in part on a criterion specified by end user software.
4. The platform of ...

31-03-2022 publication date

SPEAKER RECOGNITION BASED ON SIGNAL SEGMENTS WEIGHTED BY QUALITY

Number: US20220101859A1
Assignee: NEC Corporation

This speech processing device is provided with: a contribution degree estimation means which calculates a contribution degree representing a quality of a segment of the speech signal; and a speaker feature calculation means which calculates a feature from the speech signal, for recognizing attribute information of the speech signal, using the contribution degree as a weight of the segment of the speech signal.

1. A speech processing device, comprising:
a processor; and
memory storing executable instructions that, when executed by the processor, cause the processor to perform as:
a contribution degree estimation unit configured to calculate a contribution degree representing a quality of a segment of a speech signal indicative of speech, the speech signal being divided into a plurality of segments; and
a speaker feature calculation unit configured to calculate a speaker feature from the speech signal, for recognizing attribute information of the speech signal, using the contribution degree as a weight of each segment of the speech signal, the speaker feature being indicative of individuality for identifying a speaker which utters the speech.

2. The speech processing device as claimed in claim 1, wherein the processor is configured to divide the speech signal into silence segments and speech segments and to classify the quality in the speech segments into a speech sound leading to a correct solution in speaker recognition and a speech sound causing an error in the speaker recognition.

3. The speech processing device as claimed in claim 1, wherein the processor further performs as a speech statistic calculation unit configured to calculate a speech statistic representing a degree of appearance of each of types of sounds included in the speech signal, and wherein the speaker feature calculation unit is configured to calculate the speaker feature on the basis of the speech statistic of the speech signal and the contribution degree of the speech signal.

4. The speech ...
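The weighting scheme in claim 1 — per-segment feature vectors combined with contribution degrees as segment weights — can be sketched roughly as a normalized weighted mean. This is an illustration only, not NEC's implementation; the function name `speaker_feature` and the toy 2-D feature vectors are hypothetical.

```python
def speaker_feature(segment_features, contribution):
    """Weighted mean of per-segment feature vectors.

    segment_features: list of equal-length feature vectors, one per segment.
    contribution: one quality weight per segment; silence or noisy
        segments get low weight, clean speech gets high weight.
    """
    total = sum(contribution)
    weights = [c / total for c in contribution]  # normalize to sum to 1
    dim = len(segment_features[0])
    return [sum(w * seg[d] for w, seg in zip(weights, segment_features))
            for d in range(dim)]

feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
weights = [0.5, 0.5, 0.0]  # third segment judged unreliable (e.g. noise)
print(speaker_feature(feats, weights))  # -> [0.5, 0.5]
```

Segments judged unreliable (claim 2's "speech sound causing an error") simply contribute nothing to the final speaker feature.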

25-03-2021 publication date

Data processing method, device and apparatus for data processing

Number: US20210089726A1
Author: Guangchao YAO

In the present disclosure, a data processing method, a data processing device, and an apparatus for data processing are provided. The method specifically includes: receiving a source language speech input by a target user; determining, based on the source language speech, a target acoustic model from a preset acoustic model library, the acoustic model library including at least two acoustic models corresponding to different timbre characteristics; converting, based on the target acoustic model, the source language speech into a target language speech; and outputting the target language speech. According to the embodiments of the present disclosure, the recognition degree of the speaker corresponding to the target language speech output by the translation device can be increased, and the effect of user communication can be improved.
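The model-selection step — choosing from the library the acoustic model whose timbre characteristics best match the input speaker — might look roughly like this nearest-neighbour sketch. All names here (`AcousticModel`, the two-number timbre embedding) are hypothetical; the abstract does not specify how timbre is represented or compared.

```python
from dataclasses import dataclass

@dataclass
class AcousticModel:
    name: str
    timbre: tuple  # hypothetical embedding, e.g. (pitch_mean_hz, brightness)

def select_model(source_timbre, library):
    """Pick the library model whose timbre embedding is closest
    (squared Euclidean distance) to the source speaker's timbre."""
    def dist(m):
        return sum((a - b) ** 2 for a, b in zip(m.timbre, source_timbre))
    return min(library, key=dist)

library = [AcousticModel("deep_male", (110.0, 0.3)),
           AcousticModel("bright_female", (220.0, 0.7))]
print(select_model((120.0, 0.35), library).name)  # -> deep_male
```

Synthesizing the translated speech with the matched model is what makes the output recognizable as "the same speaker" to listeners.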

29-03-2018 publication date

METHOD AND SYSTEM FOR USING CONVERSATIONAL BIOMETRICS AND SPEAKER IDENTIFICATION/VERIFICATION TO FILTER VOICE STREAMS

Number: US20180090148A1
Assignee:

A method and system for using conversational biometrics and speaker identification and/or verification to filter voice streams during mixed mode communication. The method includes receiving an audio stream of a communication between participants. Additionally, the method includes filtering the audio stream of the communication into separate audio streams, one for each of the participants. Each of the separate audio streams contains portions of the communication attributable to a respective participant. Furthermore, the method includes outputting the separate audio streams to a storage system.

1. A method implemented in a computing system, the method comprising:
extracting a plurality of audio streams from a communication, wherein
the plurality of audio streams correspond, respectively, to a plurality of participants in the communication, and
the plurality of audio streams contain portions of the communication corresponding, respectively, to the plurality of participants;
matching one or more of the portions of the communication in the plurality of audio streams to voice prints by comparing the plurality of audio streams to only a plurality of the voice prints corresponding to identified participants within the communication; and
adapting a speaker model of the voice prints after successfully matching the one or more of the portions of the communication in the plurality of audio streams to the voice prints, wherein adapting the speaker model includes capturing long-term voice changes of the identified participants in the voice prints used for the matching.

2. The method of claim 1, further comprising performing a verification process for at least one of the plurality of participants.

3. The method of claim 1, wherein the voice prints include seed phrases received for each of the plurality of participants.

4. The method of claim 1, wherein: providers of the voice prints are each associated with a role; and each of the voice prints is stored in one of a plurality ...
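A minimal sketch of the matching-and-adaptation loop in claim 1: each separated stream is scored against only the voice prints of this call's identified participants, and a confident match nudges the stored print toward the new sample so that long-term voice changes are captured. The cosine score, the 0.7 threshold, the EMA update rate, and all names are assumptions, not the patented method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def match_and_adapt(streams, voiceprints, threshold=0.7, rate=0.1):
    """streams: channel -> embedding of that participant's separated audio.
    voiceprints: speaker -> enrolled embedding (this call's participants only).
    Returns channel -> matched speaker; adapts matched prints in place."""
    matches = {}
    for channel, vec in streams.items():
        best = max(voiceprints, key=lambda s: cosine(vec, voiceprints[s]))
        if cosine(vec, voiceprints[best]) >= threshold:
            # exponential moving average: track slow drift in the voice
            voiceprints[best] = [(1 - rate) * p + rate * v
                                 for p, v in zip(voiceprints[best], vec)]
            matches[channel] = best
    return matches

prints = {"alice": [0.9, 0.1], "bob": [0.0, 1.0]}
print(match_and_adapt({"chan0": [1.0, 0.0]}, prints))  # -> {'chan0': 'alice'}
```

Restricting the comparison to the call's own participants (rather than the whole enrollment database) is what keeps the matching step cheap during live communication.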

21-03-2019 publication date

METHOD AND APPARATUS FOR PUSHING INFORMATION

Number: US20190088262A1
Author: WANG Wenyu
Assignee:

The present disclosure discloses a method and apparatus for pushing information. A specific embodiment of the method comprises: receiving voice information sent through a terminal by a user, the voice information including awakening voice information and querying voice information; extracting a voiceprint characteristic from the awakening voice information to obtain voiceprint characteristic information; matching the voiceprint characteristic information and a preset registration voiceprint information set, each piece of registration voiceprint information in the registration voiceprint information set including registration voiceprint characteristic information, and user behavior data of a registration user corresponding to the registration voiceprint characteristic information; and pushing, in response to the voiceprint characteristic information successfully matching the registration voiceprint characteristic information in the registration voiceprint information set, audio information to the terminal based on the querying voice information and user behavior data corresponding to the successfully matched registration voiceprint characteristic information.

1. A method for pushing information, comprising:
receiving voice information sent through a terminal by a user, the voice information including awakening voice information and querying voice information, and the awakening voice information being used to switch the terminal from a standby state to a wake-up state;
extracting a voiceprint characteristic from the awakening voice information to obtain voiceprint characteristic information;
matching the voiceprint characteristic information and a preset registration voiceprint information set, each piece of registration voiceprint information in the registration voiceprint information set including registration voiceprint characteristic information, and user behavior data of a registration user corresponding to the registration voiceprint characteristic information; ...
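The match-then-personalize flow — identify the speaker from the wake-word voiceprint, then answer the query using that user's behavior data — can be sketched as below. The cosine score, the 0.8 threshold, and the `favorite_station` behavior field are illustrative assumptions; the abstract does not say how voiceprints are compared or what the behavior data contains.

```python
import math

def cosine(a, b):
    """Cosine similarity between two voiceprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def handle_utterance(wake_vec, query, registrations, threshold=0.8):
    """registrations: user_id -> (registered voiceprint, behavior data).
    Matches the wake-word voiceprint first, then personalizes the
    response to `query` with the matched user's behavior data."""
    best = max(registrations,
               key=lambda u: cosine(wake_vec, registrations[u][0]))
    vec, behavior = registrations[best]
    if cosine(wake_vec, vec) < threshold:
        return None  # no registered voiceprint matches: push nothing
    return {"user": best, "query": query,
            "recommendation": behavior.get("favorite_station")}

regs = {"alice": ([0.9, 0.1], {"favorite_station": "jazz"}),
        "bob": ([0.1, 0.9], {"favorite_station": "news"})}
print(handle_utterance([1.0, 0.05], "play music", regs))
```

Extracting the voiceprint from the wake word rather than the query means the device knows who is asking before it even parses what is being asked.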
