Total found: 138. Displayed: 138.

Publication date: 11-11-2014

Frequency ratio fingerprint characterization for audio matching

Number: US0008886543B1

System and methods for characterizing interest points within a fingerprint are disclosed herein. The systems include generating a set of interest points and an anchor point related to an audio sample. A quantized absolute frequency of an anchor point can be calculated and used to calculate a set of quantized ratios. A fingerprint can then be generated based upon the set of quantized ratios and used in comparison to reference fingerprints to identify the audio sample. The disclosed systems and methods provide for an audio matching system robust to pitch-shift distortion by using quantized ratios within fingerprints rather than solely using absolute frequencies of interest points. Thus, the disclosed system and methods result in more accurate audio identification.
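
As a rough illustration of why ratios survive pitch shifting, here is a minimal Python sketch. It assumes interest points are hypothetical `(time_s, freq_hz)` pairs, quantizes each point's frequency ratio to the anchor on a log2 scale with an assumed 12 steps per octave, and stores time offsets in assumed 10 ms units. The names and resolutions are illustrative; for brevity it quantizes the raw ratio rather than first quantizing the anchor's absolute frequency as the abstract describes.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical (time_s, freq_hz) interest point


def quantize_ratio(ratio: float, steps_per_octave: int = 12) -> int:
    """Quantize a frequency ratio on a log2 scale (resolution is an assumption)."""
    return round(math.log2(ratio) * steps_per_octave)


def ratio_fingerprint(anchor: Point, points: List[Point],
                      steps_per_octave: int = 12) -> Tuple[Tuple[int, int], ...]:
    """Describe each interest point by (time offset, quantized frequency ratio).

    A pitch shift multiplies every frequency by the same factor, so the ratio
    to the anchor frequency, and therefore this fingerprint, stays the same.
    """
    anchor_time, anchor_freq = anchor
    entries = []
    for t, f in points:
        dt = round((t - anchor_time) * 100)  # offset in 10 ms units (assumed)
        entries.append((dt, quantize_ratio(f / anchor_freq, steps_per_octave)))
    return tuple(sorted(entries))


base = ratio_fingerprint((1.0, 440.0), [(1.1, 660.0), (1.25, 880.0), (1.4, 550.0)])
shifted = ratio_fingerprint((1.0, 440.0 * 1.06),
                            [(1.1, 660.0 * 1.06), (1.25, 880.0 * 1.06), (1.4, 550.0 * 1.06)])
print(base == shifted)  # True
```

The 6% pitch shift leaves the fingerprint unchanged, provided no ratio sits right on a quantization boundary.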

Publication date: 27-11-2018

Method for siren detection based on audio samples

Number: US0010140998B2
Assignee: Waymo LLC

The present disclosure provides methods and apparatuses that enable an apparatus to identify sounds from short samples of audio. The apparatus may capture an audio sample and create several audio signals of different lengths, each containing audio from the captured audio sample. The apparatus may process the several audio signals in an attempt to identify features of the audio signal that indicate an identification of the captured sound. Because shorter audio samples can be analyzed more quickly, the system may first process the shortest audio samples in order to quickly identify features of the audio signal. Because longer audio samples contain more information, the system may be able to more accurately identify features in the audio signal in longer audio samples. However, analyzing longer audio signals takes more buffered audio than identifying features in shorter signals. Therefore, the present system attempts to identify features in the shortest audio signals first.
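
The shortest-first strategy can be sketched as a loop over window lengths in increasing order that stops at the first confident result. In this Python sketch the classifier is a hypothetical callable returning `(label, confidence)`, and the 0.9 confidence threshold is an assumption; the abstract specifies neither.

```python
from typing import Callable, Optional, Sequence, Tuple

Classifier = Callable[[Sequence[float]], Tuple[Optional[str], float]]


def detect_shortest_first(buffer: Sequence[float],
                          window_lengths: Sequence[int],
                          classify: Classifier,
                          threshold: float = 0.9) -> Tuple[Optional[str], Optional[int]]:
    """Classify the most recent audio using progressively longer windows.

    Returns (label, window_length) for the first window whose confidence
    reaches `threshold`, or (None, None) if none is confident enough.
    """
    for length in sorted(window_lengths):
        if length > len(buffer):
            break  # not enough buffered audio yet for this or any longer window
        label, confidence = classify(buffer[-length:])  # newest `length` samples
        if label is not None and confidence >= threshold:
            return label, length
    return None, None


# Toy classifier whose confidence grows with window length (illustration only).
def toy_classifier(window: Sequence[float]) -> Tuple[Optional[str], float]:
    return "siren", min(1.0, len(window) / 1000)


print(detect_shortest_first([0.0] * 3000, [4000, 500, 1000, 2000], toy_classifier))
# ('siren', 1000)
```

Short windows answer quickly when they are already conclusive; longer windows are consulted, and need to be buffered, only when the short ones are not.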

Publication date: 27-05-2014

System and method for synchronizing tag history

Number: US0008735708B1

Systems and methods for music recognition and/or tag history synchronization are described. The system includes, for example, a first device, a second device and a server. The first device is configured to record music from a surrounding environment. The first device wirelessly sends the recorded music to the server for identification. The server is configured to identify the recorded music and to generate a tag corresponding to the identified music. The first tag history is updated to include the tag which includes information corresponding to the identified music. The first device and the second device are registered with the server as part of a particular user account. The server is configured to synchronize a second tag history stored in the second device with the updated first tag history.

Publication date: 15-11-2017

Segmenting content displayed on a computing device into regions based on pixels of a screenshot image that captures the content

Number: GB0002550236A

Methods and apparatus directed to segmenting content displayed on a computing device into regions, to facilitate user interactivity with graphical content by segmenting the content displayed on the computing device into regions via analysis of pixels of a "screenshot image" that captures at least a portion of (e.g., all of) the displayed graphical content. Individual pixels of the screenshot image may be analyzed to determine one or more regions of the screenshot image and to optionally assign a corresponding semantic type to each of the regions. Some implementations are further directed to generating, based on one or more of the regions, interactive content to provide for presentation to the user via the computing device.

Publication date: 12-07-2016

Interest points density control for audio matching

Number: US0009390719B1

Systems and methods are provided herein relating to audio matching. The density and quality of interest points can be controlled to assure a small but uniform number of high quality interest points. By scoring interest points based on quality and comparing them over time, those interest points that maintain a high quality when compared with a varying number of neighboring interest points can be retained, while those interest points that do not maintain a high quality can be discarded. Thus, the scalability of an audio matching system can be improved while retaining accuracy.
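
One simple way to enforce a small, uniform set of strong interest points is to keep a point only if it ranks among the top `k` by quality within a time neighbourhood. The Python sketch below does that; the `(time_s, freq_hz, score)` layout, the neighbourhood radius, and `k` are assumptions, and the patent's actual scoring and comparison over time may differ.

```python
from typing import List, Tuple

ScoredPoint = Tuple[float, float, float]  # hypothetical (time_s, freq_hz, quality score)


def thin_interest_points(points: List[ScoredPoint], radius: float, k: int) -> List[ScoredPoint]:
    """Keep a point only if fewer than `k` points within `radius` seconds
    of it have a strictly higher quality score."""
    kept = []
    for p in points:
        neighbours = [q for q in points if abs(q[0] - p[0]) <= radius]
        stronger = sum(1 for q in neighbours if q[2] > p[2])
        if stronger < k:
            kept.append(p)
    return kept


points = [(0.0, 100.0, 0.9), (0.1, 200.0, 0.5), (0.2, 300.0, 0.8), (5.0, 400.0, 0.1)]
print(thin_interest_points(points, radius=0.5, k=2))
# [(0.0, 100.0, 0.9), (0.2, 300.0, 0.8), (5.0, 400.0, 0.1)]
```

The weak point in the crowded region is dropped, while the isolated point at 5 s survives despite its low score, which keeps density roughly even over time. The quadratic scan is for clarity only.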

Publication date: 18-08-2015

Inverted client-side fingerprinting and matching

Number: US0009113202B1
Assignee: Google Inc.

A technique for inverted client side fingerprinting and matching provides the benefits of disposable fingerprinting to identify multiple content streams from multiple clients without overloading a fingerprinting system. Rather than tasking a fingerprinting system with the generation and comparison of all fingerprints, the technique distributes some fingerprinting tasks to the clients receiving the content streams. As a result, the fingerprinting system is not bottlenecked by fingerprinting tasks. In one embodiment, the fingerprinting system can provide additional services to the clients.

Publication date: 05-07-2016

Real-time audio recognition using multiple recognizers

Number: US0009384734B1

An audio recognition service recognizes an audio sample across multiple content types. At least a partial set of results generated by the service are returned to a client while the audio sample is still being recorded and/or transmitted. The client additionally displays the results in real-time or near real-time to the user. The audio sample can be sent over a first HTTP connection and the results can be returned over a second HTTP connection. The audio recognition service further processes check-in selections received from the client for content items indicated by the results. Responsive to receiving the check-in selections, the service determines whether a user is eligible for a reward. If the user is eligible, the service provides the reward.

Publication date: 11-07-2018

Personalized entity repository

Number: GB0002558472A

Systems and methods are provided for a personalized entity repository. For example, a computing device comprises a personalized entity repository having fixed sets of entities from an entity repository stored at a server, a processor, and memory storing instructions that cause the computing device to identify fixed sets of entities that are relevant to a user based on context associated with the computing device, rank the fixed sets by relevancy, and update the personalized entity repository using selected sets determined based on the rank and on set usage parameters applicable to the user. In another example, a method includes generating fixed sets of entities from an entity repository, including location-based sets and topic-based sets, and providing a subset of the fixed sets to a client, the client requesting the subset based on the client's location and on items identified in content generated for display on the client.

Publication date: 06-06-2019

AGGREGATION OF RELATED MEDIA CONTENT

Number: US20190172495A1

Systems and methods for media aggregation are disclosed herein. The system includes a media system that can transform media items into one aggregated media item. A synchronization component synchronizes media items with respect to time. The synchronized media items can be analyzed and transformed into an aggregated media item for storage and/or display. In one implementation, the aggregated media item is capable of being displayed in multiple ways to create an enhanced and customizable viewing and/or listening experience.
1. A media system, comprising: a memory; and a processor that executes ...: receive first metadata associated with a first video item and second metadata associated with a second video item; determine, based at least on the first metadata and the second metadata, that the first video item and the second video item are associated with a common event; determine, based on at least one common feature between the first metadata and the second metadata, a timeline associated with the common event such that a first time in the first video item is synchronized with a second time in the second video item; aggregate the first video item and the second video item into a composite video item that includes at least a first segment from the first video item and a second segment from the second video item, wherein the first segment and the second segment are arranged in the composite video in synchronization with the determined timeline, and wherein the first time in the first video item and the second time in the second video item both correspond to a third time in the timeline; and cause the composite video item to be presented on a user device in which the first video item and the second video item are simultaneously played back at the third time, wherein the composite video item switches, during the third time in the timeline, from presenting the composite video to the first video item at the first time or the second video at the second time.

Publication date: 03-11-2016

Audio Data Classification

Number: US20160322066A1

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for analyzing an audio sample to determine whether the audio sample includes music audio data. One or more detectors, including a spectral fluctuation detector, a peak repetition detector, and a beat pitch detector, may analyze the audio sample and generate a score that represents whether the audio sample includes music audio data. One or more of the scores may be combined to determine whether the audio sample includes music audio data or non-music audio data.
1. A computer implemented method comprising: receiving, by an audio classification system, an audio sample that is associated with audio data; computing, by the audio classification system, a spectrogram of the received audio sample, the spectrogram including one or more spectral slices; comparing, by the audio classification system for each of the one or more spectral slices in the spectrogram, the respective spectral slice with each of the spectral slices other than the respective spectral slice in the spectrogram; determining, by the audio classification system for each of the one or more spectral slices in the spectrogram, a plurality of similarity values that each represent a similarity between the respective spectral slice and one of the spectral slices other than the respective spectral slice in the spectrogram using the respective comparison, for each of the one or more spectral slices in the spectrogram, of the respective spectral slice with one of the spectral slices other than the respective spectral slice in the spectrogram; determining, by the audio classification system for each time shift between slices in the spectrogram, a mean similarity value based on the similarity values associated with the time shift; generating, by the audio classification system, a projection from the mean similarity values; smoothing, by the audio classification system, the projection; determining, by the audio classification ...
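
The claimed self-similarity steps (compare every pair of spectral slices, average the similarities per time shift, then smooth the resulting projection) can be sketched in plain Python. Cosine similarity and a 3-point moving average are assumptions, since the claim names neither the similarity measure nor the smoothing method, and the final decision step is truncated in the source.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two spectral slices (0.0 if either is silent)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lag_projection(slices: List[Sequence[float]]) -> List[float]:
    """Mean similarity for every time shift; element i holds lag i + 1."""
    n = len(slices)
    projection = []
    for lag in range(1, n):
        sims = [cosine(slices[i], slices[i + lag]) for i in range(n - lag)]
        projection.append(sum(sims) / len(sims))
    return projection


def smooth(values: List[float], width: int = 3) -> List[float]:
    """Centered moving average; windows are truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


# A toy spectrogram that repeats every three slices, as a looping bar of music might.
a, b, c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
projection = lag_projection([a, b, c] * 3)
print(projection[2])  # 1.0: slices three apart are identical
```

Repetitive material such as music tends to produce pronounced peaks at its repetition period; how the claim turns the smoothed projection into a decision is cut off in the source.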

Publication date: 21-07-2015

Adaptive weighting of popular reference content in audio matching

Number: US0009087124B1

Systems and methods are provided herein relating to audio matching. Adaptive weighting of popular reference content can be used to more efficiently allocate space in a weighted reference index used to match audio signals. An audio reference index can be maintained that contains a set of audio references wherein each audio reference in the set of audio references is associated with a score. A weighted reference index can be generated based on the audio reference index and the score associated with each audio reference wherein respective audio references are up-weighted or up-scored based at least in part on user popularity. Adaptive weighting of popular reference content can thus improve the accuracy of an audio matching system.
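
A minimal Python sketch of popularity-based up-weighting under a fixed index budget. It assumes each reference has a base score and a popularity value in [0, 1], multiplies the score by `1 + boost * popularity`, and keeps the `capacity` highest-scoring references; the formula, `boost`, and the tie-break rule are illustrative assumptions, not the patent's.

```python
from typing import Dict, List


def build_weighted_index(base_scores: Dict[str, float],
                         popularity: Dict[str, float],
                         capacity: int,
                         boost: float = 1.0) -> List[str]:
    """Return the reference IDs that fit in a fixed-size weighted index."""
    weighted = {
        ref: score * (1.0 + boost * popularity.get(ref, 0.0))
        for ref, score in base_scores.items()
    }
    # Highest weighted score first; ties broken by ID for determinism.
    ranked = sorted(weighted, key=lambda ref: (-weighted[ref], ref))
    return ranked[:capacity]


scores = {"ref_a": 1.0, "ref_b": 0.9, "ref_c": 0.5}
print(build_weighted_index(scores, {"ref_b": 0.5, "ref_c": 1.0}, capacity=2))
# ['ref_b', 'ref_a']
```

Without popularity data the same call would keep `ref_a` and `ref_b` in base-score order; with it, the popular `ref_b` moves to the front.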

Publication date: 11-05-2021

Audio processing with neural networks

Number: US0011003987B2
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio processing using neural networks. One of the systems includes multiple neural network layers, wherein the neural network system is configured to receive time domain features of an audio sample and to process the time domain features to generate a neural network output for the audio sample, the plurality of neural network layers comprising: a frequency-transform (F-T) layer that is configured to apply a transformation defined by a set of F-T layer parameters that transforms a window of time domain features into frequency domain features; and one or more other neural network layers having respective layer parameters, wherein the one or more neural network layers are configured to process frequency domain features to generate a neural network output.
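
A frequency-transform layer can be pictured as an ordinary linear layer whose weight matrix holds discrete Fourier transform (DFT) basis vectors. The Python sketch below sets hypothetical F-T weights to exact cosine and sine terms and returns per-bin magnitudes; the abstract only says the transform is defined by F-T layer parameters, so this DFT initialization is an assumption for illustration.

```python
import math
from typing import List


def dft_weights(window: int) -> List[List[float]]:
    """Real-valued DFT matrix: cosine rows for bins 0..window//2, then sine rows."""
    bins = window // 2 + 1
    cos_rows = [[math.cos(2 * math.pi * k * n / window) for n in range(window)]
                for k in range(bins)]
    sin_rows = [[-math.sin(2 * math.pi * k * n / window) for n in range(window)]
                for k in range(bins)]
    return cos_rows + sin_rows


def ft_layer(weights: List[List[float]], frame: List[float]) -> List[float]:
    """Apply the linear F-T layer to one time-domain window; return bin magnitudes."""
    outputs = [sum(w * x for w, x in zip(row, frame)) for row in weights]
    bins = len(outputs) // 2
    return [math.hypot(outputs[k], outputs[bins + k]) for k in range(bins)]


N = 16
frame = [math.cos(2 * math.pi * 3 * n / N) for n in range(N)]  # tone at bin 3
magnitudes = ft_layer(dft_weights(N), frame)
print(max(range(len(magnitudes)), key=magnitudes.__getitem__))  # 3
```

Because the layer is just a matrix product, its weights could in principle be adjusted by gradient descent together with the layers that follow, rather than staying fixed at the DFT values.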

Publication date: 01-09-2022

AGGREGATION OF RELATED MEDIA CONTENT

Number: US20220277773A1
Assignee: Google LLC

Systems and methods for media aggregation are disclosed herein. The system includes a media system that can transform media items into one aggregated media item. A synchronization component synchronizes media items with respect to time. The synchronized media items can be analyzed and transformed into an aggregated media item for storage and/or display. In one implementation, the aggregated media item is capable of being displayed in multiple ways to create an enhanced and customizable viewing and/or listening experience.

Publication date: 16-11-2017

SEGMENTING CONTENT DISPLAYED ON A COMPUTING DEVICE INTO REGIONS BASED ON PIXELS OF A SCREENSHOT IMAGE THAT CAPTURES THE CONTENT

Number: US20170330336A1
Assignee: Google LLC

Methods and apparatus directed to segmenting content displayed on a computing device into regions. The segmenting of content displayed on the computing device into regions is accomplished via analysis of pixels of a “screenshot image” that captures at least a portion of (e.g., all of) the displayed content. Individual pixels of the screenshot image may be analyzed to determine one or more regions of the screenshot image and to optionally assign a corresponding semantic type to each of the regions. Some implementations are further directed to generating, based on one or more of the regions, interactive content to provide for presentation to the user via the computing device.

Publication date: 24-12-2020

AGGREGATION OF RELATED MEDIA CONTENT

Number: US20200402538A1
Assignee: Google LLC

Systems and methods for media aggregation are disclosed herein. The system includes a media system that can transform media items into one aggregated media item. A synchronization component synchronizes media items with respect to time. The synchronized media items can be analyzed and transformed into an aggregated media item for storage and/or display. In one implementation, the aggregated media item is capable of being displayed in multiple ways to create an enhanced and customizable viewing and/or listening experience.

Publication date: 29-10-2019

Frequency based audio analysis using neural networks

Number: US0010460747B2
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for frequency based audio analysis using neural networks. One of the methods includes training a neural network that includes a plurality of neural network layers on training data, wherein the neural network is configured to receive frequency domain features of an audio sample and to process the frequency domain features to generate a neural network output for the audio sample, wherein the neural network comprises (i) a convolutional layer that is configured to map frequency domain features to logarithmic scaled frequency domain features, wherein the convolutional layer comprises one or more convolutional layer filters, and (ii) one or more other neural network layers having respective layer parameters that are configured to process the logarithmic scaled frequency domain features to generate the neural network output.
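
To make "logarithmic scaled frequency domain features" concrete, the Python sketch below sums linear-frequency bins into log-spaced bands. In the patent this mapping is performed by a convolutional layer with filters; the fixed geometric band edges here are an assumption chosen only to show the shape of the mapping.

```python
from typing import List, Sequence


def log_band_edges(min_hz: float, max_hz: float, bands: int) -> List[float]:
    """Geometrically spaced edges: each band spans a constant frequency ratio."""
    ratio = (max_hz / min_hz) ** (1.0 / bands)
    return [min_hz * ratio ** i for i in range(bands + 1)]


def to_log_bands(magnitudes: Sequence[float], bin_hz: float,
                 edges: Sequence[float]) -> List[float]:
    """Sum each linear bin k (centre k * bin_hz) into the band [edges[b], edges[b+1])."""
    out = [0.0] * (len(edges) - 1)
    for k, magnitude in enumerate(magnitudes):
        freq = k * bin_hz
        for b in range(len(out)):
            if edges[b] <= freq < edges[b + 1]:
                out[b] += magnitude
                break
    return out


edges = log_band_edges(100.0, 1600.0, 4)  # roughly 100, 200, 400, 800, 1600 Hz
print(to_log_bands([1.0] * 24, 70.0, edges))  # [1.0, 3.0, 6.0, 11.0]
```

Each higher band covers twice the frequency range of the one below it and so absorbs more linear bins, which mirrors the coarser frequency resolution of a log scale.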

Publication date: 01-09-2015

Large-scale speaker identification

Number: US0009123330B1
Assignee: Google Inc.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data encoding ambient sounds, identifying media content that matches the audio data, and a timestamp corresponding to a particular portion of the identified media content, identifying a speaker associated with the particular portion of the identified media content corresponding to the timestamp, and providing information identifying the speaker associated with the particular portion of the identified media content for output.
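
The last step of that pipeline, mapping a matched timestamp to the person speaking, can be sketched as a binary search over per-item speaker segments. The segment table, its `(start_s, end_s, speaker)` layout, and the function name are hypothetical; the abstract does not say how speaker information is stored, and the content-matching step is assumed to have already produced the timestamp.

```python
import bisect
from typing import List, Optional, Tuple

Segment = Tuple[float, float, str]  # hypothetical (start_s, end_s, speaker), sorted by start


def speaker_at(segments: List[Segment], timestamp: float) -> Optional[str]:
    """Return the speaker whose segment contains `timestamp`, or None."""
    starts = [start for start, _, _ in segments]
    i = bisect.bisect_right(starts, timestamp) - 1  # last segment starting at or before it
    if i >= 0:
        start, end, speaker = segments[i]
        if start <= timestamp < end:
            return speaker
    return None


episode = [(0.0, 12.5, "Host"), (12.5, 40.0, "Guest"), (45.0, 60.0, "Host")]
print(speaker_at(episode, 20.0))  # Guest
print(speaker_at(episode, 42.0))  # None (gap between segments)
```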

Publication date: 16-11-2017

FREQUENCY BASED AUDIO ANALYSIS USING NEURAL NETWORKS

Number: US20170330586A1

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for frequency based audio analysis using neural networks. One of the methods includes training a neural network that includes a plurality of neural network layers on training data, wherein the neural network is configured to receive frequency domain features of an audio sample and to process the frequency domain features to generate a neural network output for the audio sample, wherein the neural network comprises (i) a convolutional layer that is configured to map frequency domain features to logarithmic scaled frequency domain features, wherein the convolutional layer comprises one or more convolutional layer filters, and (ii) one or more other neural network layers having respective layer parameters that are configured to process the logarithmic scaled frequency domain features to generate the neural network output.
1. A method for training a neural network that includes a plurality of neural network layers on training data, wherein the neural network is configured to receive frequency domain features of an audio sample and to process the frequency domain features to generate a neural network output for the audio sample, wherein the neural network comprises (i) a convolutional layer that is configured to map frequency domain features to logarithmic scaled frequency domain features, wherein the convolutional layer comprises one or more convolutional layer filters, and (ii) one or more other neural network layers having respective layer parameters that are configured to process the logarithmic scaled frequency domain features to generate the neural network output, and obtaining training data comprising, for each of a plurality of training audio samples, frequency domain features of the training audio sample and a known output for the training audio sample; and training the neural network on the training data to adjust the values of the parameters of the other neural ...

Publication date: 25-01-2018

SEGMENT-BASED SPEAKER VERIFICATION USING DYNAMICALLY GENERATED PHRASES

Number: US20180025734A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.
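
The selection step (obtaining a candidate phrase that contains at least some of the identified subwords) could look like the following Python sketch. The candidate list, the subword transcriptions, and the tie-breaking rule are hypothetical; the abstract does not say how candidates are scored.

```python
from typing import Dict, List, Set


def pick_verification_phrase(candidates: Dict[str, List[str]],
                             wanted_subwords: Set[str]) -> str:
    """Choose the candidate covering the most wanted subwords.

    Ties go to the phrase with fewer subwords, then to alphabetical order.
    """
    def rank(phrase: str):
        covered = len(wanted_subwords & set(candidates[phrase]))
        return (-covered, len(candidates[phrase]), phrase)

    return min(candidates, key=rank)


# Toy subword transcriptions (real systems would use phone-like units).
candidates = {
    "alpha": ["a", "l", "f"],
    "beta": ["b", "e", "t"],
    "gamma": ["g", "a", "m"],
}
print(pick_verification_phrase(candidates, {"a", "m", "x"}))  # gamma
```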

Publication date: 04-04-2019

Identifying Music as a Particular Song

Number: US20190102144A1

In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for indicating a reference song. A computing device stores reference song characterization data that identifies a plurality of audio characteristics for each reference song in a plurality of reference songs. The computing device receives digital audio data that represents audio recorded by a microphone, converts the digital audio data from time-domain format into frequency-domain format, and uses the digital audio data in the frequency-domain format in a music-characterization process. In response to determining that characterization values for the digital audio data are most relevant to characterization values for a particular reference song, the computing device outputs an indication of the particular reference song.
1. A computer-implemented method, comprising: storing, by a computing device, reference song characterization data that identify a plurality of audio characteristics for each reference song in a plurality of reference songs; receiving, by the computing device, digital audio data that represents audio recorded by a microphone; converting, by the computing device, the digital audio data from time-domain format into frequency-domain format; using, by the computing device, the digital audio data in the frequency-domain format in a music-characterization process that outputs a collection of characterization values for the digital audio data, at least some of the characterization values representing values other than binary zeros and binary ones; comparing, by the computing device, the collection of characterization values for the digital audio data to the plurality of characterization values for each of the plurality of reference songs, to select a subset of multiple candidate songs from the plurality of reference songs as those reference songs that correspond to the characterization values for the digital audio data; comparing, by the computing ...
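
The claim's first comparison narrows the reference songs to a shortlist of candidates before a further comparison whose details are cut off in the source. The Python sketch below shows one such two-stage scheme; using a sign-based Hamming distance for the shortlist and squared Euclidean distance for the final pick are assumptions, as are the names.

```python
from typing import Dict, List, Sequence


def _signs(values: Sequence[float]) -> List[bool]:
    return [v > 0 for v in values]


def identify_song(query: Sequence[float],
                  references: Dict[str, Sequence[float]],
                  shortlist_size: int = 3) -> str:
    """Shortlist by a cheap sign-pattern distance, then refine on the real values."""
    query_signs = _signs(query)

    def hamming(name: str) -> int:
        return sum(a != b for a, b in zip(query_signs, _signs(references[name])))

    def squared_distance(name: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(query, references[name]))

    shortlist = sorted(references, key=lambda name: (hamming(name), name))[:shortlist_size]
    return min(shortlist, key=lambda name: (squared_distance(name), name))


songs = {
    "song_a": [1.0, -0.1, 0.5],
    "song_b": [-1.0, 0.5, 0.5],
    "song_c": [0.2, -0.9, 0.9],
}
print(identify_song([0.9, -0.2, 0.4], songs, shortlist_size=2))  # song_a
```

The cheap first pass limits the more precise comparison to a few candidates.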

Publication date: 12-09-2013

DYNAMIC DISPLAY OF CONTENT CONSUMPTION BY GEOGRAPHIC LOCATION

Number: US20130235027A1
Assignee: Google Inc.

This disclosure relates to dynamic display of content consumption by geographic location. A recognition component recognizes content being consumed by a set of users, and identifies geographic locations of the consumption and a set of characteristics associated with the consumption. An aggregation component ranks the consumed content based on a subset of the characteristics associated with the consumption, and a display component generates a map displaying subsets of the consumed content as a function of respective rankings and geographic location.
1. A system, comprising: a memory storing computer executable components; and a processor that executes the following computer executable components stored in the memory: a recognition component that recognizes content being consumed by a set of users, and that identifies geographic locations of consumption and a set of consumption characteristics; an aggregation component that ranks respective consumed content as a function of a subset of the consumption characteristics; and a display component that generates a map displaying subsets of the consumed content as a function of respective rankings and geographic location. 2. The system of claim 1, wherein the set consumption characteristics includes at least one of frequency of consumption, devices associated with the consumption, applications associated with the consumption, or a set of demographic data associated with the set of users. 3. The system of claim 1, wherein the map is a geographic map. 4. The system of claim 1, wherein the map is a heat map. 5. The system of claim 1, wherein the consumed content includes at least one of songs, movies, television shows, internet videos, websites, video games, applications, online articles, electronic books, or online searches. 6. The system of claim 1, further comprising a filter component that filters at least one ...

Publication date: 16-07-2020

PERSONALIZED ENTITY REPOSITORY

Number: US20200226187A1

Systems and methods are provided for a personalized entity repository. For example, a computing device comprises a personalized entity repository having fixed sets of entities from an entity repository stored at a server, a processor, and memory storing instructions that cause the computing device to identify fixed sets of entities that are relevant to a user based on context associated with the computing device, rank the fixed sets by relevancy, and update the personalized entity repository using selected sets determined based on the rank and on set usage parameters applicable to the user. In another example, a method includes generating fixed sets of entities from an entity repository, including location-based sets and topic-based sets, and providing a subset of the fixed sets to a client, the client requesting the subset based on the client's location and on items identified in content generated for display on the client.
1. A method implemented by one or more processors, the method comprising: receiving a screen capture image, the screen capture image capturing content displayed on a display of a mobile device; determining text in the screen capture image by performing text recognition on the screen capture image; processing the text using a trained set prediction model to predict one or more fixed sets of entities based on the text; storing at least one fixed set of entities of the predicted fixed sets of entities in a personalized entity repository in memory on the mobile device; and subsequent to storing the at least one fixed set of entities: using the stored at least one fixed set of entities to identify an entity in an additional screen capture image captured at the mobile device; and rendering, at the mobile device, content that is based on the identified entity. 2. The method of claim 1, further comprising: determining a location associated with content recognized in a further screen capture image; processing the location using the trained set prediction ...

Publication date: 18-07-2017

Dual model speaker identification

Number: US0009711148B1
Assignee: Google Inc.

A processing system receives an audio signal encoding an utterance and determines that a first portion of the audio signal corresponds to a predefined phrase. The processing system accesses one or more text-dependent models associated with the predefined phrase and determines a first confidence based on the one or more text-dependent models associated with the predefined phrase, the first confidence corresponding to a first likelihood that a particular speaker spoke the utterance. The processing system determines a second confidence for a second portion of the audio signal using one or more text-independent models, the second confidence corresponding to a second likelihood that the particular speaker spoke the utterance. The processing system then determines that the particular speaker spoke the utterance based at least in part on the first confidence and the second confidence.
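
The abstract leaves open how the two confidences are combined. A weighted average compared against a threshold is one simple possibility; the weight of 0.6 on the text-dependent score and the 0.5 threshold in this Python sketch are hypothetical values.

```python
def fused_confidence(text_dependent: float, text_independent: float,
                     weight: float = 0.6) -> float:
    """Weighted average of the two model confidences (weight is an assumption)."""
    return weight * text_dependent + (1.0 - weight) * text_independent


def is_claimed_speaker(text_dependent: float, text_independent: float,
                       threshold: float = 0.5, weight: float = 0.6) -> bool:
    """Accept the speaker when the fused confidence reaches the threshold."""
    return fused_confidence(text_dependent, text_independent, weight) >= threshold


print(is_claimed_speaker(0.9, 0.4))  # True  (0.54 + 0.16 = 0.70)
print(is_claimed_speaker(0.2, 0.6))  # False (0.12 + 0.24 = 0.36)
```

Weighting the text-dependent score more heavily reflects the intuition that a known phrase constrains the acoustics more than free speech does, but the balance would have to be tuned on data.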

Publication date: 20-08-2015

REFERENCE SIGNAL SUPPRESSION IN SPEECH RECOGNITION

Number: US20150235651A1

The technology described herein can be embodied in a method that includes receiving a first signal representing an output of a speaker device, and a second signal comprising the output of the speaker device, and an audio signal corresponding to an utterance of a speaker. The method includes aligning one or more segments of the first signal with one or more segments of the second signal. Acoustic features of the one or more segments of the first and second signals are classified to obtain a first set of vectors and a second set of vectors, respectively, the vectors being associated with speech units. The second set is modified using the first set, such that the modified second set represents a suppression of the output of the speaker device in the second signal. A transcription of the utterance of the speaker can be generated from the modified second set of vectors.
1. A computer implemented method comprising: receiving, at a processing system, a first signal representing an output of a speaker device; receiving, at the processing system, a second signal comprising (i) the output of the speaker device and (ii) an audio signal corresponding to an utterance of a speaker; aligning, by the processing system, one or more segments of the first signal with one or more segments of the second signal; classifying acoustic features of the one or more segments of the first signal to obtain a first set of vectors associated with speech units; classifying acoustic features of the one or more segments of the second signal to obtain a second set of vectors associated with speech units; modifying the second set of vectors using the first set of vectors to obtain a modified second set of vectors, wherein the modified second set of vectors represents a suppression of the output of the speaker device in the second signal; and providing the modified second set of vectors to generate a transcription of the utterance of the speaker. 2. The method of claim 1, wherein the acoustic features comprise ...

Publication date: 29-11-2016

Transformation invariant media matching

Number: US0009508023B1
Assignee: Google Inc.

This disclosure relates to transformation invariant media matching. A fingerprinting component can generate a transformation invariant identifier for media content by adaptively encoding the relative ordering of interest points in media content. The interest points can be grouped into subsets, and stretch invariant descriptors can be generated for the subsets based on ratios of coordinates of interest points included in the subsets. The stretch invariant descriptors can be aggregated into a transformation invariant identifier. An identification component compares the identifier against a set of identifiers for known media content, and the media content can be matched or identified as a function of the comparison.
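
The stretch invariance can be illustrated with groups of three interest points. This Python sketch uses ratios of coordinate differences within each triple, which cancel both scaling and offset of an axis; that choice, the `(time, frequency)` layout, and the 16 quantization levels are assumptions, since the abstract says only that descriptors are based on ratios of coordinates.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Point = Tuple[float, float]  # hypothetical (time, frequency) interest point


def triple_descriptor(triple: Tuple[Point, Point, Point], levels: int = 16) -> Tuple[int, int]:
    """Quantized coordinate ratios for three points, ordered by time."""
    (t0, f0), (t1, f1), (t2, f2) = sorted(triple)
    time_ratio = (t1 - t0) / (t2 - t0) if t2 != t0 else 0.0
    freq_ratio = (f1 - f0) / (f2 - f0) if f2 != f0 else 0.0
    return int(time_ratio * levels), round(freq_ratio * levels)


def invariant_identifier(points: List[Point]) -> FrozenSet[Tuple[int, int]]:
    """Aggregate the descriptors of every triple into one identifier."""
    return frozenset(triple_descriptor(t) for t in combinations(points, 3))


original = [(0.0, 100.0), (1.0, 300.0), (3.0, 200.0), (4.5, 500.0)]
# Same points after a 20% time stretch, a 0.5 s shift, and a 10% frequency scaling.
stretched = [(1.2 * t + 0.5, 1.1 * f) for t, f in original]
print(invariant_identifier(original) == invariant_identifier(stretched))  # True
```

Matching would then compare such identifiers against those of known media content, for example by counting shared descriptors.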

Publication date: 30-06-2015

Methods for enforcing time alignment for speed resistant audio matching

Number: US0009069849B1
Assignee: Google Inc.

Systems and methods are provided herein relating to speed resistant audio matching. Descriptors can be generated for a received audio signal and matched with reference descriptors. A set of hits for respective reference samples can be generated based on the matching. A histogram can then be generated that correlates probe sample hit time with reference sample hit time. In one implementation, a rolling window can be used in analyzing the histogram allowing for slight variances in the timing between probe sample hits and reference sample hits. In another implementation, the histogram generated can be based on an estimated time stretch of the probe sample. In yet another implementation, a set of histograms can be generated based on a minimum speed change, a maximum speed change, and a speed step. Histograms can be evaluated to determine a most likely matching histogram.
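
The speed-sweep variant (one histogram per candidate speed, from a minimum to a maximum speed change in fixed steps) can be sketched in Python as below. Each hit votes for the offset `reference_time - speed * probe_time`; offsets are binned to absorb small timing variance, loosely standing in for the rolling window described in the abstract. The hit format, bin width, and function name are assumptions.

```python
from collections import Counter
from typing import List, Tuple

Hit = Tuple[float, float]  # hypothetical (probe_time_s, reference_time_s)


def best_alignment(hits: List[Hit], min_speed: float, max_speed: float,
                   speed_step: float, bin_width: float = 0.1) -> Tuple[float, float, int]:
    """Return (speed, offset_s, votes) for the histogram with the tallest peak."""
    best = (min_speed, 0.0, 0)
    if not hits:
        return best
    steps = int(round((max_speed - min_speed) / speed_step))
    for i in range(steps + 1):
        speed = min_speed + i * speed_step
        # Hits from a true match at this speed agree on a single offset bin.
        histogram = Counter(round((ref - speed * probe) / bin_width) for probe, ref in hits)
        offset_bin, votes = histogram.most_common(1)[0]
        if votes > best[2]:
            best = (speed, offset_bin * bin_width, votes)
    return best


# Probe plays the reference 10% faster, starting 10 s in, plus one spurious hit.
hits = [(p, 10.0 + 1.1 * p) for p in (0.0, 1.0, 2.0, 3.0, 4.0)] + [(2.5, 50.0)]
speed, offset, votes = best_alignment(hits, 0.9, 1.2, 0.05)
print(round(speed, 2), round(offset, 2), votes)  # 1.1 10.0 5
```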

Publication date: 05-11-2019

Object detection using neural network systems

Number: US0010467493B2
Assignee: Google LLC

Systems, methods, and apparatus, including computer programs encoded on a computer storage medium. In one aspect, a system includes initial neural network layers configured to: receive an input image, and process the input image to generate a plurality of first feature maps that characterize the input image; a location generating convolutional neural network layer configured to perform a convolution on the representation of the first plurality of feature maps to generate data defining a respective location of each of a predetermined number of bounding boxes in the input image, wherein each bounding box identifies a respective first region of the input image; and a confidence score generating convolutional neural network layer configured to perform a convolution on the representation of the first plurality of feature maps to generate a confidence score for each of the predetermined number of bounding boxes in the input image.

Publication date: 22-02-2022

Determining that audio includes music and then identifying the music as a particular song

Number: US0011256472B2
Assignee: Google LLC

In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products. A computing device stores reference song characterization data and receives digital audio data. The computing device determines whether the digital audio data represents music and then performs a different process to recognize that the digital audio data represents a particular reference song. The computing device then outputs an indication of the particular reference song.

Publication date: 19-08-2014

Segment-based speaker verification using dynamically generated phrases

Number: US0008812320B1
Assignee: Google Inc.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.
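
To make the phrase-selection step concrete, here is a hypothetical sketch in which each candidate phrase is pre-annotated with its subwords and the "identified subwords" are, for instance, those for which enrollment audio of the user already exists. The candidate covering the most identified subwords is chosen. The annotation format, `min_coverage` and the coverage criterion are assumptions, not the claimed method.

```python
from typing import Dict, Iterable, Optional, Set


def choose_verification_phrase(candidates: Dict[str, Set[str]], wanted: Iterable[str],
                               min_coverage: int = 1) -> Optional[str]:
    """Pick the candidate phrase containing the most identified subwords."""
    wanted_set = set(wanted)
    best_phrase, best_cov = None, 0
    for phrase, subwords in candidates.items():
        cov = len(subwords & wanted_set)  # identified subwords this phrase exercises
        if cov > best_cov:
            best_phrase, best_cov = phrase, cov
    # No phrase is returned if none reaches the minimum coverage.
    return best_phrase if best_cov >= min_coverage else None
```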

Publication date: 05-07-2018

Machine Learning to Generate Music from Text

Number: US20180190249A1

The present disclosure provides systems and methods that leverage one or more machine-learned models to generate music from text. In particular, a computing system can include a music generation model that is operable to extract one or more structural features from an input text. The one or more structural features can be indicative of a structure associated with the input text. The music generation model can generate a musical composition from the input text based at least in part on the one or more structural features. For example, the music generation model can generate a musical composition that exhibits a musical structure that mimics or otherwise corresponds to the structure associated with the input text. For example, the music generation model can include a machine-learned audio generation model. In such fashion, the systems and methods of the present disclosure can generate music that exhibits a globally consistent theme and/or structure.
1. A computer system to generate music from text, the system comprising: a feature extractor configured to extract one or more structural features from an input text, wherein the one or more structural features are indicative of a structure associated with the input text; and a machine-learned audio generation model configured to obtain the one or more structural features from the feature extractor and generate a musical composition from the input text based at least in part on the one or more structural features; one or more processors; and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the …: obtain the input text; input the input text into the feature extractor; receive the one or more structural features as an output of the feature extractor; input the one or more structural features into the machine-learned audio generation model; and receive data descriptive of the musical composition as an output of the machine-learned audio generation model.
...

Publication date: 12-12-2017

Method for siren detection based on audio samples

Number: US0009842602B2
Assignee: Waymo LLC

The present disclosure provides methods and apparatuses that enable an apparatus to identify sounds from short samples of audio. The apparatus may capture an audio sample and create several audio signals of different lengths, each containing audio from the captured audio sample. The apparatus may process the several audio signals in an attempt to identify features of the audio signal that indicate an identification of the captured sound. Because shorter audio samples can be analyzed more quickly, the system may first process the shortest audio samples in order to quickly identify features of the audio signal. Because longer audio samples contain more information, the system may be able to more accurately identify features in the audio signal in longer audio samples. However, analyzing longer audio signals takes more buffered audio than identifying features in shorter signals. Therefore, the present system attempts to identify features in the shortest audio signals first.
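
The shortest-window-first strategy can be sketched as below. `classify` stands in for whatever siren classifier is used (hypothetical here) and returns a likelihood in [0, 1]; longer windows are consulted only when shorter ones stay below the threshold and enough audio has been buffered. The window lengths and threshold are illustrative.

```python
from typing import Callable, Optional, Sequence


def detect_siren(buffer: Sequence[float], sample_rate: int, window_secs: Sequence[float],
                 classify: Callable[[Sequence[float]], float],
                 threshold: float = 0.5) -> Optional[float]:
    """Return the window length (seconds) at which a siren was detected, or None."""
    for secs in sorted(window_secs):  # shortest windows first: fastest decision
        n = int(secs * sample_rate)
        if n <= 0:
            continue
        if n > len(buffer):
            break  # not enough buffered audio yet for this or any longer window
        window = buffer[-n:]  # most recent n samples
        if classify(window) >= threshold:
            return secs
    return None
```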

Publication date: 20-10-2016

Segment-Based Speaker Verification Using Dynamically Generated Phrases

Number: US20160307574A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.

Publication date: 14-12-2017

HOLD BACK AND REAL TIME RANKING OF RESULTS IN A STREAMING MATCHING SYSTEM

Number: US20170357718A1

A method includes receiving, from an audio streaming system, a probe audio sample and identifying sufficiently matching reference audio samples based on a first comparison of a first portion of the probe audio sample to reference audio samples. The method also includes, in response to determining that the sufficiently matching reference audio samples do not meet a predetermined score threshold, retaining the sufficiently matching reference audio samples, identifying additional matching reference audio samples based on a second comparison of a second portion of the probe audio sample to the reference audio samples, and outputting at least one of the reference audio samples based on the first comparison and the second comparison.
1. A method, comprising: using a processor to execute computer executable instructions stored on a non-transitory computer readable medium to perform operations comprising: receiving, from an audio streaming system, a probe audio sample; identifying sufficiently matching reference audio samples based on a first comparison of a first portion of the probe audio sample to reference audio samples; and in response to determining that the sufficiently matching reference audio samples do not meet a predetermined score threshold: retaining the sufficiently matching reference audio samples; identifying additional matching reference audio samples based on a second comparison of a second portion of the probe audio sample to the reference audio samples; and outputting at least one of the reference audio samples based on the first comparison and the second comparison.
2. The method of claim 1, further comprising: assigning first respective ranking scores to the sufficiently matching reference audio samples based on the first comparison, wherein the first comparison comprises a comparison of feature vectors of the first portion of the probe audio sample to respective first feature vectors of the reference audio samples; assigning second respective ...

Publication date: 05-09-2017

Hold back and real time ranking of results in a streaming matching system

Number: US0009754026B2
Assignee: Google Inc.

A matching system receives probe audio samples for comparison to references of a data store. Comparisons are generated between a first segment of a probe audio sample and corresponding time segments of a plurality of reference audio samples to identify a plurality of sufficiently matching reference audio samples based upon a first set of consistency scores. Matching references are retained, unless they meet a score threshold. Comparisons are continually generated with a second segment of the probe audio sample and corresponding time segments of the sufficiently matching reference audio samples to generate a second set of consistency scores. The retained results are outputted based on the first and second set of consistency scores.
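
A simplified sketch of the hold-back idea, assuming each probe and reference is already reduced to one hypothetical feature value per time segment and that `similarity` is a stand-in consistency score. References that match the first segment well enough are retained; later segments are compared only against the retained set, scores accumulate, and results are released as soon as any accumulated score reaches `release_at` (or when the probe ends). All names and thresholds are assumptions.

```python
from typing import Dict, List, Sequence


def similarity(a: float, b: float) -> float:
    # Hypothetical per-segment consistency score in [0, 1].
    return max(0.0, 1.0 - abs(a - b))


def stream_match(probe: Sequence[float], refs: Dict[str, Sequence[float]],
                 candidate_min: float = 0.5, release_at: float = 1.5) -> List[str]:
    retained: Dict[str, float] = {}
    for i, seg in enumerate(probe):
        # First segment: search every reference; afterwards only the held-back ones.
        names = list(refs) if i == 0 else list(retained)
        for name in names:
            if i >= len(refs[name]):
                continue
            s = similarity(seg, refs[name][i])
            if i == 0 and s < candidate_min:
                continue  # not a sufficient match, never retained
            retained[name] = retained.get(name, 0.0) + s
        if any(total >= release_at for total in retained.values()):
            break  # confident enough: release results now
    # Output the retained references ranked by accumulated score.
    return sorted(retained, key=lambda n: retained[n], reverse=True)
```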

Publication date: 05-07-2016

Compressed patch features for audio fingerprinting

Number: US0009384273B1
Assignee: Google Inc.

Systems and methods are provided herein relating to audio matching. In addition to interest points, localized patches surrounding interest points can be used as additional discriminative information. The patches can be compressed to increase scalability while retaining discriminative information related to the localized region within the patch. Compressed patches related to interest points of an audio sample can be compared to compressed patches related to interest points of a reference sample to determine whether the two samples are a match.
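
One simple way to compress a patch, swapped in here for illustration and not taken from the patent, is to keep one bit per cell indicating whether the cell exceeds the patch mean, and to compare patches by Hamming distance. The sketch assumes a spectrogram given as a list of frequency rows and an interest point at least `radius` cells from every edge.

```python
from typing import Sequence


def compress_patch(spec: Sequence[Sequence[float]], f: int, t: int, radius: int = 2) -> int:
    """Binarize the (2r+1)x(2r+1) patch around (f, t): one bit per cell,
    set when the cell exceeds the patch mean. Assumes the patch fits in `spec`."""
    cells = [spec[i][j]
             for i in range(f - radius, f + radius + 1)
             for j in range(t - radius, t + radius + 1)]
    mean = sum(cells) / len(cells)
    code = 0
    for bit, value in enumerate(cells):
        if value > mean:
            code |= 1 << bit
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two compressed patches."""
    return bin(a ^ b).count("1")
```

Because the bits depend only on comparisons with the patch mean, multiplying the whole spectrogram by a positive gain leaves the code unchanged.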

Publication date: 10-01-2017

Text-dependent speaker identification

Number: US0009542948B2
Assignee: Google Inc.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker verification. The methods, systems, and apparatus include actions of inputting speech data that corresponds to a particular utterance to a first neural network and determining an evaluation vector based on output at a hidden layer of the first neural network. Additional actions include obtaining a reference vector that corresponds to a past utterance of a particular speaker. Further actions include inputting the evaluation vector and the reference vector to a second neural network that is trained on a set of labeled pairs of feature vectors to identify whether speakers associated with the labeled pairs of feature vectors are the same speaker. More actions include determining, based on an output of the second neural network, whether the particular utterance was likely spoken by the particular speaker.
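
The two-network structure can be illustrated with a toy sketch: the evaluation vector is the output of a single `tanh` hidden layer, and the "second neural network" is reduced to one logistic unit applied to the element-wise absolute difference between the evaluation and reference vectors. The weights would normally come from training on labeled same/different pairs; any values used with it are placeholders, and the architecture is an assumption rather than the one claimed.

```python
import math
from typing import List, Sequence


def hidden_embedding(x: Sequence[float], W: List[List[float]], b: Sequence[float]) -> List[float]:
    """Output of one tanh hidden layer, used as the evaluation (or reference) vector."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def same_speaker_prob(e: Sequence[float], r: Sequence[float],
                      v: Sequence[float], c: float) -> float:
    """Toy second network: a logistic unit over |e - r|, giving P(same speaker)."""
    z = sum(vi * abs(ei - ri) for vi, ei, ri in zip(v, e, r)) + c
    return 1.0 / (1.0 + math.exp(-z))
```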

Publication date: 29-09-2015

Using local gradients for pitch resistant audio matching

Number: US0009148738B1

System and methods for characterizing interest points within a descriptor are disclosed herein. The systems include generating a set of interest points related to an audio sample. A set of gradients relating to respective interest points in the set of interest points can be generated. A set of descriptors can then be generated based upon the set of interest points and the set of gradients and used in comparison to reference descriptors to identify the audio sample. The disclosed systems and methods provide for an audio matching system robust to pitch-shift distortion by using gradients that characterize the time-frequency neighborhood around an interest point rather than solely relying on interest points themselves. Thus, the disclosed system and methods result in more accurate audio identification.
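
A sketch of a gradient-based descriptor, assuming a spectrogram stored as a list of frequency rows: central differences give the frequency and time gradients at each cell in a small neighborhood of the interest point, and their orientations are accumulated into a magnitude-weighted histogram that is then normalized. The neighborhood radius, the 8 orientation bins and the histogram form are illustrative assumptions; the interest point must lie at least `radius + 1` cells from every edge.

```python
import math
from typing import List, Sequence


def gradient_descriptor(spec: Sequence[Sequence[float]], f: int, t: int,
                        radius: int = 1, n_bins: int = 8) -> List[float]:
    hist = [0.0] * n_bins
    for i in range(f - radius, f + radius + 1):
        for j in range(t - radius, t + radius + 1):
            gf = (spec[i + 1][j] - spec[i - 1][j]) / 2.0  # gradient along frequency
            gt = (spec[i][j + 1] - spec[i][j - 1]) / 2.0  # gradient along time
            mag = math.hypot(gf, gt)
            angle = math.atan2(gf, gt) % (2 * math.pi)
            # The extra modulo guards against rounding that lands exactly on n_bins.
            hist[int(angle / (2 * math.pi) * n_bins) % n_bins] += mag
    total = sum(hist)
    # Normalizing makes the descriptor independent of overall loudness.
    return [h / total for h in hist] if total > 0 else hist
```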

Publication date: 15-10-2015

TEXT-DEPENDENT SPEAKER IDENTIFICATION

Number: US20150294670A1

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker verification. The methods, systems, and apparatus include actions of inputting speech data that corresponds to a particular utterance to a first neural network and determining an evaluation vector based on output at a hidden layer of the first neural network. Additional actions include obtaining a reference vector that corresponds to a past utterance of a particular speaker. Further actions include inputting the evaluation vector and the reference vector to a second neural network that is trained on a set of labeled pairs of feature vectors to identify whether speakers associated with the labeled pairs of feature vectors are the same speaker. More actions include determining, based on an output of the second neural network, whether the particular utterance was likely spoken by the particular speaker.
1. A computer-method comprising: inputting speech data that corresponds to a particular utterance to a first neural network; determining an evaluation vector based on output at a hidden layer of the first neural network; obtaining a reference vector that corresponds to a past utterance of a particular speaker; inputting the evaluation vector and the reference vector to a second neural network that is trained on a set of labeled pairs of feature vectors to identify whether speakers associated with the labeled pairs of feature vectors are the same speaker; and determining, based on an output of the second neural network, whether the particular utterance was likely spoken by the particular speaker.
2. The method of claim 1, wherein speakers associated with one or more of the labeled pairs of feature vectors are different speakers.
3. The method of claim 1, wherein a speaker associated with one or more of the labeled pairs of feature vectors is the particular speaker.
4. The method of claim 1, comprising: inputting the set of labeled pairs of feature vectors to a neural ...

Publication date: 18-08-2015

Real-time audio recognition protocol

Number: US0009111537B1
Assignee: Google Inc.

An audio recognition service recognizes an audio sample across multiple content types. At least a partial set of results generated by the service are returned to a client while the audio sample is still being recorded and/or transmitted. The client additionally displays the results in real-time or near real-time to the user. The audio sample can be sent over a first HTTP connection and the results can be returned over a second HTTP connection. The audio recognition service further processes check-in selections received from the client for content items indicated by the results. Responsive to receiving the check-in selections, the service determines whether a user is eligible for a reward. If the user is eligible, the service provides the reward.

Publication date: 08-09-2015

Min/max filter for audio matching

Number: US0009129015B1

Systems and methods are provided herein relating to audio matching. Descriptors can be generated for a received audio signal and matched with reference descriptors. Potential matching reference samples can then be filtered based, at least in part, on a number of hits, a match threshold, and a window size. The more hits that are accumulated for a reference sample, the more likely the reference sample is to pass through the filter. Eliminating potential false positive matches before performing more computationally demanding matching algorithms can increase efficiency within an audio matching system.
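
The hit-count filter can be sketched as follows, assuming each candidate reference comes with the reference-time positions of its descriptor hits. A reference passes if some window of `window_size` seconds contains at least `match_threshold` hits, so references that accumulate dense hits are kept for the more expensive matching stage. The exact pass rule is an assumption.

```python
from bisect import bisect_right
from typing import Dict, List


def min_max_filter(hits: Dict[str, List[float]], match_threshold: int,
                   window_size: float) -> List[str]:
    """Keep references whose densest window of hits reaches the threshold."""
    passed = []
    for ref, times in hits.items():
        ts = sorted(times)
        best = 0
        for k, start in enumerate(ts):
            # Count hits falling in [start, start + window_size].
            best = max(best, bisect_right(ts, start + window_size) - k)
        if best >= match_threshold:
            passed.append(ref)
    return passed
```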

Publication date: 09-05-2024

Machine Learning Based Enhancement of Audio for a Voice Call

Number: US20240153514A1
Assignee: Google LLC

Apparatus and methods related to enhancement of audio content are provided. An example method includes receiving, by a computing device and via a communications network interface, a compressed audio data frame, wherein the compressed audio data frame is received after transmission over a communications network. The method further includes decompressing the compressed audio data frame to extract an audio waveform. The method also includes predicting, by applying a neural network to the audio waveform, an enhanced version of the audio waveform, wherein the neural network has been trained on (i) a ground truth sample comprising unencoded audio waveforms prior to compression by an audio encoder, and (ii) a training dataset comprising decoded audio waveforms after compression of the unencoded audio waveforms by the audio encoder. The method additionally includes providing, by an audio output component of the computing device, the enhanced version of the audio waveform.

Publication date: 20-10-2020

Determining that audio includes music and then identifying the music as a particular song

Number: US0010809968B2
Assignee: Google LLC

In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products. A computing device stores reference song characterization data and receives digital audio data. The computing device determines whether the digital audio data represents music and then performs a different process to recognize that the digital audio data represents a particular reference song. The computing device then outputs an indication of the particular reference song.

Publication date: 18-07-2019

DYNAMIC DISPLAY OF CONTENT CONSUMPTION BY GEOGRAPHIC LOCATION

Number: US20190220473A1
Assignee: Google LLC

This disclosure relates to a method for providing a display of content consumption by geographic location. The method includes storing, in a data store, geographic locations of a set of users consuming content items and consumption characteristics of the content items, wherein the content items are identified by user devices at the geographic locations while the content items are played by source devices external to the user devices, and wherein information about a content item of the identified content items, which is consumed by a user of the set of users, is transmitted to the server system by a user device of the user. The method also includes extracting, from the data store, geographic locations of consumption and a set of consumption characteristics of each content item of the identified content items, wherein the set of consumption characteristics comprises a title and times of consumption of the content item by the set of users. The method further includes filtering the identified content items based on at least one filter that pertains to times of content consumption by the set of users, ranking the filtered content items based on the geographic locations of consumption and consumption statistics, selecting, from the ranked content items, popular content items at particular geographic locations of consumption and over a time period, and generating a geographic map displaying to a user each of the selected popular content items at one or more of the particular geographic locations of consumption, the map to display a title and an icon to represent each of the selected popular content items.
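
The filter, rank and select steps reduce to grouping and counting. The sketch below assumes consumption events are `(location, title, timestamp)` tuples and uses play counts as the only consumption statistic; rendering the map is out of scope, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, int]  # (location, title, unix timestamp of consumption)


def popular_by_location(events: Iterable[Event], start: int, end: int,
                        top_k: int = 1) -> Dict[str, List[str]]:
    counts: Dict[str, Counter] = defaultdict(Counter)
    for location, title, ts in events:
        if start <= ts < end:  # filter on time of consumption
            counts[location][title] += 1
    # Rank titles per location by play count and keep the top_k most popular.
    return {loc: [title for title, _ in c.most_common(top_k)] for loc, c in counts.items()}
```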

Publication date: 19-08-2021

AUDIO PROCESSING WITH NEURAL NETWORKS

Number: US20210256379A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio processing using neural networks. One of the systems includes multiple neural network layers, wherein the neural network system is configured to receive time domain features of an audio sample and to process the time domain features to generate a neural network output for the audio sample, the plurality of neural network layers comprising: a frequency-transform (F-T) layer that is configured to apply a transformation defined by a set of F-T layer parameters that transforms a window of time domain features into frequency domain features; and one or more other neural network layers having respective layer parameters, wherein the one or more neural network layers are configured to process frequency domain features to generate a neural network output.
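
A frequency-transform layer can be pictured as a dense layer whose weight matrix is initialized to the real and imaginary DFT bases, so that before training it computes a magnitude spectrum of the input window; during training those weights could be updated like any other layer parameters. This pure-Python sketch shows only the initialization and the forward pass; the initialization choice and the function names are assumptions.

```python
import math
from typing import List, Sequence


def init_ft_params(n: int) -> List[List[float]]:
    """2 * (n // 2 + 1) weight rows: DFT cosine rows, then (negated) sine rows."""
    bins = n // 2 + 1
    cos_rows = [[math.cos(2 * math.pi * k * t / n) for t in range(n)] for k in range(bins)]
    sin_rows = [[-math.sin(2 * math.pi * k * t / n) for t in range(n)] for k in range(bins)]
    return cos_rows + sin_rows


def ft_layer(window: Sequence[float], params: List[List[float]]) -> List[float]:
    """Apply the (trainable) weights to a time-domain window; return magnitudes."""
    bins = len(params) // 2
    out = [sum(w * x for w, x in zip(row, window)) for row in params]
    return [math.hypot(out[k], out[k + bins]) for k in range(bins)]
```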

Publication date: 26-05-2016

DYNAMIC DISPLAY OF CONTENT CONSUMPTION BY GEOGRAPHIC LOCATION

Number: US20160147789A1

This disclosure relates to dynamic display of content consumption by geographic location. A processor recognizes content being consumed by a set of users, and identifies geographic locations of the consumption and a set of characteristics associated with the consumption. The processor further determines at least one filter for a user of the set of users and filters the set of consumption characteristics based on the at least one filter.
1. A system, comprising: a memory; and a processor, coupled to the memory, to: recognize content being consumed by a set of users, and identify geographic locations of consumption and a set of consumption characteristics; determine at least one filter for a user of the set of users; filter the set of consumption characteristics based on the at least one filter; rank respective consumed content based on a filtered set of consumption characteristics; and generate a map displaying to the user subsets of the consumed content according to respective rankings and geographic location.
2. The system of claim 1, wherein the set of consumption characteristics comprises at least one of frequency of consumption, devices associated with the consumption, applications associated with the consumption, or a set of demographic data associated with the set of users.
3. The system of claim 1, wherein the map is a geographic map.
4. The system of claim 1, wherein the map is a heat map.
5. The system of claim 1, wherein the consumed content comprises at least one of songs, movies, television shows, internet videos, websites, video games, applications, online articles, electronic books, or online searches.
6. The system of claim 1, wherein the processor is further to filter consumed content to be ranked.
7. The system of claim 6, wherein the processor is further to receive a set of user preferences regarding the consumed content to be ...

Publication date: 09-09-2014

Intelligent interest point pruning for audio matching

Number: US0008831763B1

System and methods for intelligently pruning interest points are disclosed herein. The systems include generating a plurality of distorted audio samples and associated distorted interest points based upon a clean audio sample. Interest points that are common to the sets of distorted interest points are retained, while interest points that are not robust to distortion are discarded. The disclosed systems and methods can therefore provide a scalable audio matching solution by eliminating interest points from reference sample fingerprints. Because the pruned set of interest points is robust to distortion, the benefits of both scalability and accuracy can be had.
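
The retention rule can be sketched as a vote across distorted versions, assuming interest points are `(time_frame, frequency_bin)` pairs and that a clean point "survives" a distortion when some distorted point lies within `tol` cells in both coordinates. `min_count` and `tol` are hypothetical parameters.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (time frame, frequency bin)


def prune_interest_points(clean: Sequence[Point], distorted: Sequence[Sequence[Point]],
                          min_count: int, tol: int = 1) -> List[Point]:
    """Keep clean points that reappear in at least `min_count` distorted versions."""
    kept = []
    for t, f in clean:
        survived = sum(
            any(abs(t - dt) <= tol and abs(f - df) <= tol for dt, df in pts)
            for pts in distorted)
        if survived >= min_count:
            kept.append((t, f))
    return kept
```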

Publication date: 31-01-2023

Segment-based speaker verification using dynamically generated phrases

Number: US0011568879B2
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.

Publication date: 25-02-2021

Self-Supervised Audio Representation Learning for Mobile Devices

Number: US20210056980A1
Assignee: Google LLC

Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
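
For the "estimated distance between two sampled slices" target, a training example can be generated from unlabeled audio as sketched below: two non-overlapping slices are drawn at random, and the label is the number of samples separating them. The slice length and sampling scheme are assumptions; the model and the loss are not shown.

```python
import random
from typing import List, Sequence, Tuple


def sample_gap_example(signal: Sequence[float], slice_len: int,
                       rng: random.Random) -> Tuple[List[float], List[float], int]:
    """Return (first slice, second slice, gap), where gap is the number of samples
    between the end of the first slice and the start of the second."""
    max_start = len(signal) - 2 * slice_len
    if max_start < 0:
        raise ValueError("signal too short for two slices")
    a = rng.randint(0, max_start)
    b = rng.randint(a + slice_len, len(signal) - slice_len)  # second slice after the first
    return list(signal[a:a + slice_len]), list(signal[b:b + slice_len]), b - (a + slice_len)
```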

Publication date: 18-01-2018

Method For Siren Detection Based On Audio Samples

Number: US20180018981A1

The present disclosure provides methods and apparatuses that enable an apparatus to identify sounds from short samples of audio. The apparatus may capture an audio sample and create several audio signals of different lengths, each containing audio from the captured audio sample. The apparatus may process the several audio signals in an attempt to identify features of the audio signal that indicate an identification of the captured sound. Because shorter audio samples can be analyzed more quickly, the system may first process the shortest audio samples in order to quickly identify features of the audio signal. Because longer audio samples contain more information, the system may be able to more accurately identify features in the audio signal in longer audio samples. However, analyzing longer audio signals takes more buffered audio than identifying features in shorter signals. Therefore, the present system attempts to identify features in the shortest audio signals first.
1. An apparatus comprising: an audio unit configured to receive an audio signal; a control unit configured to operate the apparatus; and … process the audio signal from the audio unit to create a plurality of windowed audio samples including at least a first windowed audio sample and a second windowed audio sample, wherein the first windowed audio sample and the second windowed audio sample each have a different length of time; determine a likelihood that the first windowed audio sample comprises a siren signal based on a detection of a group of features in the first windowed audio signal associated with a siren-classification profile; based on the first windowed audio sample indicating a likelihood of a siren signal below a threshold, determine a likelihood that the second windowed audio sample includes a siren signal based on a detection of a group of features of the second windowed audio signal with the siren-classification profile; and alter control of the apparatus by the control unit based ...

Publication date: 09-08-2016

Noise based interest point density pruning

Number: US0009411884B1
Assignee: Google Inc.

Systems and methods for noise based interest point density pruning are disclosed herein. The systems include determining an amount of noise in an audio sample and adjusting the amount of interest points within an audio sample fingerprint based on the amount of noise. Samples containing high amounts of noise correspondingly generate fingerprints with more interest points. The disclosed systems and methods allow reference fingerprints to be reduced in size while increasing the size of sample fingerprints. The benefits in scalability do not compromise the accuracy of an audio matching system using noise based interest point density pruning.
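
A sketch of noise-dependent fingerprint density, assuming the median spectrogram magnitude serves as a crude noise estimate and the strongest cells are taken as interest points (a simplification; real systems usually pick local maxima). The linear mapping from noise level to point count (`base`, `per_unit_noise`, `max_points`) is hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def noise_adaptive_points(spec: Sequence[Sequence[float]], base: int = 4,
                          per_unit_noise: float = 2.0,
                          max_points: int = 64) -> List[Tuple[int, int]]:
    """Return (frequency, time) cells; noisier inputs yield more points."""
    cells = [(v, f, t) for f, row in enumerate(spec) for t, v in enumerate(row)]
    noise = statistics.median(v for v, _, _ in cells)  # crude noise-floor estimate
    k = min(max_points, base + int(per_unit_noise * noise))
    top = sorted(cells, reverse=True)[:k]  # strongest cells first
    return [(f, t) for _, f, t in top]
```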

Publication date: 05-03-2020

SEGMENT-BASED SPEAKER VERIFICATION USING DYNAMICALLY GENERATED PHRASES

Number: US20200075029A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.

Publication date: 15-11-2022

Self-supervised audio representation learning for mobile devices

Number: US0011501787B2
Assignee: Google LLC

Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.

Publication date: 04-12-2018

Segment content displayed on a computing device into regions based on pixels of a screenshot image that captures the content

Number: US0010147197B2
Assignee: Google LLC

Methods and apparatus directed to segmenting content displayed on a computing device into regions. The segmenting of content displayed on the computing device into regions is accomplished via analysis of pixels of a “screenshot image” that captures at least a portion of (e.g., all of) the displayed content. Individual pixels of the screenshot image may be analyzed to determine one or more regions of the screenshot image and to optionally assign a corresponding semantic type to each of the regions. Some implementations are further directed to generating, based on one or more of the regions, interactive content to provide for presentation to the user via the computing device.
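
As a much simpler stand-in for the pixel analysis described here, the sketch below splits a screenshot into horizontal regions separated by rows that contain only background pixels (a projection-profile heuristic, not the patent's method). Pixels are assumed to be RGB tuples, the background color is a hypothetical parameter, and no semantic typing of regions is attempted.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]


def segment_rows(image: Sequence[Sequence[Pixel]],
                 background: Pixel = (255, 255, 255)) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) spans of consecutive rows that contain
    at least one non-background pixel."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, row in enumerate(image):
        busy = any(p != background for p in row)
        if busy and start is None:
            start = y  # a content region begins
        elif not busy and start is not None:
            regions.append((start, y - 1))  # a blank row closes the region
            start = None
    if start is not None:
        regions.append((start, len(image) - 1))
    return regions
```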

Publication date: 03-07-2014

HOLD BACK AND REAL TIME RANKING OF RESULTS IN A STREAMING MATCHING SYSTEM

Number: US20140185815A1
Assignee: Google Inc.

A matching system receives probe audio samples for comparison to references in a data store. Comparisons are generated to determine a sufficient match for a portion, or first amount, of the probe sample. Ranking scores are assigned to the resulting matching references. The matching references are retained unless they meet a score threshold. Comparisons are continually generated with second amounts of the probe sample, and the retained references are updated with further matching references that are assigned ranking scores. The retained results are merged, and those determined to satisfy a score threshold are released as output matching references.
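The hold-back idea can be sketched as a loop over successive probe chunks, assuming each chunk yields a `{reference_id: score}` dict and that scores are merged by simple addition. Both the additive merge and the "release everything at or above the threshold, best first" policy are assumptions for illustration; `hold_back_match` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def hold_back_match(chunk_scores: Iterable[Dict[str, float]],
                    threshold: float) -> List[Tuple[str, float]]:
    """Accumulate per-reference scores over successive probe chunks.

    Candidates are held back (retained, not released) until some
    cumulative score reaches `threshold`; then every retained reference
    at or above the threshold is released, highest score first.
    """
    retained: Dict[str, float] = {}
    for scores in chunk_scores:
        # Merge this chunk's matches into the retained candidates.
        for ref, s in scores.items():
            retained[ref] = retained.get(ref, 0.0) + s
        released = [(r, s) for r, s in retained.items() if s >= threshold]
        if released:
            return sorted(released, key=lambda rs: rs[1], reverse=True)
    return []  # the probe ended before any reference reached the threshold
```

Because the function returns as soon as something clears the threshold, later chunks are never consumed, which mirrors the goal of releasing results while the stream is still arriving.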

Publication date: 01-10-2015

SEGMENT-BASED SPEAKER VERIFICATION USING DYNAMICALLY GENERATED PHRASES

Number: US20150279374A1

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.

1. (canceled)
2. A computer-implemented method, comprising:
providing a speaker identification verification phrase;
obtaining audio data representing a candidate user speaking the speaker identification verification phrase;
obtaining, for each of multiple subwords associated with the speaker identification phrase, sample acoustic features that are derived from the audio data representing the candidate user speaking the speaker identification verification phrase;
obtaining, for each of the multiple subwords associated with the speaker identification verification phrase, reference acoustic features that (i) are stored in a collection of acoustic features for a target user, and (ii) are derived from audio data of the target user speaking one or more words that include the subword;
determining, for each of the multiple subwords associated with the speaker identification verification phrase, that the sample acoustic features are associated with the reference acoustic features;
in response to determining that the sample acoustic features are associated with the reference acoustic features, identifying the candidate user as the target user;
obtaining, for each of one or ...

Publication date: 08-12-2015

Incentive-based check-in

Number: US0009208225B1
Assignee: Google Inc.

Apparatus, systems and methods provide incentive-based usage of an audio recognition system. In an aspect, a system is provided that includes a query component configured to receive an audio sample from a device and a recognition component configured to determine an identification of the audio sample. The system further includes a reward component configured to identify a reward associated with the identification of the audio sample, wherein the query component is further configured to provide a query result to the device, the query result comprising the identification of the audio sample and the reward associated therewith.

Publication date: 23-09-2021

SEGMENT-BASED SPEAKER VERIFICATION USING DYNAMICALLY GENERATED PHRASES

Number: US20210295850A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.
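One way to read "obtaining a candidate phrase that includes at least some of the identified subwords" is as a coverage choice over a list of candidate phrases. The sketch below assumes each candidate comes with a precomputed set of subwords and prefers the phrase covering the most wanted subwords, breaking ties by length. The scoring rule and `choose_verification_phrase` are hypothetical, not the patented selection method; in the test, single letters stand in for subwords.

```python
from typing import Dict, List, Set


def choose_verification_phrase(candidates: List[str],
                               phrase_subwords: Dict[str, Set[str]],
                               wanted: Set[str]) -> str:
    """Pick the candidate covering the most wanted subwords.

    Ties go to the shorter phrase, so the user has less to say.
    """
    def key(phrase: str):
        covered = len(phrase_subwords[phrase] & wanted)
        # min() picks the largest coverage first, then the shortest text.
        return (-covered, len(phrase))

    return min(candidates, key=key)
```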

Publication date: 08-03-2016

Interface for real-time audio recognition

Number: US0009280599B1
Assignee: Google Inc.

An audio recognition service recognizes an audio sample across multiple content types. At least a partial set of the results generated by the service is returned to a client while the audio sample is still being recorded and/or transmitted. The client additionally displays the results to the user in real time or near real time. The audio sample can be sent over a first HTTP connection and the results can be returned over a second HTTP connection. The audio recognition service further processes check-in selections received from the client for content items indicated by the results. Responsive to receiving the check-in selections, the service determines whether a user is eligible for a reward. If the user is eligible, the service provides the reward.

Publication date: 06-11-2018

Hold back and real time ranking of results in a streaming matching system

Number: US0010120934B2
Assignee: Google LLC, Google Inc.

A method includes receiving, from an audio streaming system, a probe audio sample and identifying sufficiently matching reference audio samples based on a first comparison of a first portion of the probe audio sample to reference audio samples. The method also includes, in response to determining that the sufficiently matching reference audio samples do not meet a predetermined score threshold, retaining the sufficiently matching reference audio samples, identifying additional matching reference audio samples based on a second comparison of a second portion of the probe audio sample to the reference audio samples, and outputting at least one of the reference audio samples based on the first comparison and the second comparison.

Publication date: 22-08-2017

Segment-based speaker verification using dynamically generated phrases

Number: US0009741348B2
Assignee: Google Inc.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.

Publication date: 17-05-2022

Aggregation of related media content

Number: US0011335380B2
Assignee: Google LLC

Systems and methods for media aggregation are disclosed herein. The system includes a media system that can transform media items into one aggregated media item. A synchronization component synchronizes media items with respect to time. The synchronized media items can be analyzed and transformed into an aggregated media item for storage and/or display. In one implementation, the aggregated media item is capable of being displayed in multiple ways to create an enhanced and customizable viewing and/or listening experience.
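The abstract does not say how the synchronization component aligns media items in time. A common technique, assumed here purely for illustration, is to cross-correlate the items' audio tracks and take the lag with the highest score. `best_offset` is a hypothetical helper; note that this unnormalized correlation favors lags with more overlap, which a production system would likely correct for.

```python
from typing import Sequence


def best_offset(ref: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) that best aligns `other` with `ref`.

    A positive lag means other[i] lines up with ref[i + lag], i.e. the
    second recording started `lag` samples later.
    """
    def score(lag: int) -> float:
        total = 0.0
        for i, v in enumerate(other):
            j = i + lag
            if 0 <= j < len(ref):
                total += ref[j] * v
        return total

    return max(range(-max_lag, max_lag + 1), key=score)
```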

Publication date: 12-08-2014

Real-time audio recognition protocol

Number: US0008805683B1
Assignee: Google Inc.

An audio recognition service recognizes an audio sample across multiple content types. At least a partial set of the results generated by the service is returned to a client while the audio sample is still being recorded and/or transmitted. The client additionally displays the results to the user in real time or near real time. The audio sample can be sent over a first HTTP connection and the results can be returned over a second HTTP connection. The audio recognition service further processes check-in selections received from the client for content items indicated by the results. Responsive to receiving the check-in selections, the service determines whether a user is eligible for a reward. If the user is eligible, the service provides the reward.

Publication date: 27-05-2014

Transformation invariant media matching

Number: US0008738633B1

This disclosure relates to transformation invariant media matching. A fingerprinting component can generate a transformation invariant identifier for media content by adaptively encoding the relative ordering of interest points in media content. The interest points can be grouped into subsets, and stretch invariant descriptors can be generated for the subsets based on ratios of coordinates of interest points included in the subsets. The stretch invariant descriptors can be aggregated into a transformation invariant identifier. An identification component compares the identifier against a set of identifiers for known media content, and the media content can be matched or identified as a function of the comparison.
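To see why ratios of coordinates give stretch invariance, consider triples of interest points sorted by time: scaling every time coordinate by the same factor s multiplies both time gaps by s, so their ratio is unchanged. The sketch below assumes interest points are `(time, frequency)` tuples and uses all 3-point subsets; the grouping rule and the name `stretch_invariant_descriptors` are assumptions, not the disclosed adaptive encoding.

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]  # (time, frequency)


def stretch_invariant_descriptors(points: List[Point]) -> List[float]:
    """For each time-ordered triple (a, b, c), return (t_b - t_a) / (t_c - t_a).

    A uniform time stretch t -> s * t (s > 0) scales numerator and
    denominator alike, so each descriptor is unchanged.
    """
    pts = sorted(points)
    descriptors = []
    for a, b, c in combinations(pts, 3):
        span = c[0] - a[0]
        if span == 0:
            continue  # degenerate triple: all three points share one time
        descriptors.append((b[0] - a[0]) / span)
    return descriptors
```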

Publication date: 29-12-2015

Dynamic display of content consumption by geographic location

Number: US0009224118B2

This disclosure relates to dynamic display of content consumption by geographic location. A recognition component recognizes content being consumed by a set of users, and identifies geographic locations of the consumption and a set of characteristics associated with the consumption. An aggregation component ranks the consumed content based on a subset of the characteristics associated with the consumption, and a display component generates a map displaying subsets of the consumed content as a function of respective rankings and geographic location.
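A minimal sketch of the aggregation step, assuming each recognition event is a `(location, content_id)` pair and that the only ranking characteristic is consumption count (the disclosure allows a richer subset of characteristics). `top_content_by_location` is a hypothetical name; its output is the per-region list a map view could render.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_content_by_location(events: Iterable[Tuple[str, str]],
                            k: int) -> Dict[str, List[Tuple[str, int]]]:
    """Return the k most consumed content items for each location."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for location, content_id in events:
        counts[location][content_id] += 1
    # most_common(k) sorts by count, descending.
    return {loc: c.most_common(k) for loc, c in counts.items()}
```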

Publication date: 04-08-2015

Ensemble interest point detection for audio matching

Number: US0009098576B1

Systems and methods for audio matching are disclosed herein. In one embodiment, a system includes both interest point mixing and fingerprint mixing by using multiple interest point detection methods in parallel. Because multiple interest point detection methods are used in parallel, the accuracy of audio matching is improved across a wide variety of audio signals. In addition, the scalability of the disclosed audio matching system is increased by matching the fingerprint of an audio sample with the fingerprint of a reference sample rather than matching an entire spectrogram. Accordingly, a more accurate and more general solution to audio matching can be achieved.
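The interest point mixing idea can be sketched with two deliberately simple detectors whose outputs are unioned. The spectrogram layout (`spec[t][f]`), the choice of detectors, and all function names are assumptions; the detectors are independent, so they could run in parallel even though this sketch calls them one after another.

```python
from typing import Callable, Iterable, List, Set, Tuple

Spectrogram = List[List[float]]  # spec[t][f] = magnitude
Point = Tuple[int, int]          # (time frame, frequency bin)


def local_maxima(spec: Spectrogram) -> Set[Point]:
    """Points strictly greater than all of their 4-connected neighbours."""
    pts = set()
    T, F = len(spec), len(spec[0])
    for t in range(T):
        for f in range(F):
            nbrs = [spec[t + dt][f + df]
                    for dt, df in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= t + dt < T and 0 <= f + df < F]
            if all(spec[t][f] > n for n in nbrs):
                pts.add((t, f))
    return pts


def frame_peaks(spec: Spectrogram) -> Set[Point]:
    """The strongest frequency bin in every frame (first bin on ties)."""
    return {(t, max(range(len(row)), key=row.__getitem__))
            for t, row in enumerate(spec)}


def ensemble_points(spec: Spectrogram,
                    detectors: Iterable[Callable[[Spectrogram], Set[Point]]]) -> Set[Point]:
    """Run each detector independently and take the union of their points."""
    points: Set[Point] = set()
    for detect in detectors:
        points |= detect(spec)
    return points
```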

Publication date: 24-12-2020

Determining that Audio Includes Music and then Identifying the Music as a Particular Song

Number: US20200401367A1
Assignee: Google LLC

In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products. A computing device stores reference song characterization data and receives digital audio data. The computing device determines whether the digital audio data represents music and then performs a different process to recognize that the digital audio data represents a particular reference song. The computing device then outputs an indication of the particular reference song.

1. A computer-implemented method, comprising:
storing, by a computing device, reference song characterization data that identify a plurality of audio characteristics for each reference song in a plurality of reference songs;
receiving, by the computing device, digital audio data that represents audio recorded by a microphone;
determining, by a first processor of the computing device and using a music determination process, whether the digital audio data represents music, wherein the determining includes converting the digital audio data from a time-domain format into a first frequency-domain format;
recognizing, by a second processor of the computing device after determining that the digital audio data represents music, that the digital audio data represents a particular reference song from among the plurality of reference songs, wherein the recognizing includes converting the digital audio data from the time-domain format into a second frequency-domain format, wherein the first processor and the second processor are distinct hardware processors included in the computing device, the first processor operating at a lower voltage than the second processor; and
outputting, by the computing device in response to recognizing that the digital audio data represents a particular reference song from among the plurality of reference songs, an indication of the particular reference song.
2. The computer-implemented method of claim 1, wherein the plurality of reference songs includes at least ten ...

Publication date: 23-08-2016

Segment-based speaker verification using dynamically generated phrases

Number: US0009424846B2
Assignee: Google Inc.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.

Publication date: 01-08-2023

Personalized entity repository

Number: US0011716600B2
Assignee: Google LLC

Systems and methods are provided for a personalized entity repository. For example, a computing device comprises a personalized entity repository having fixed sets of entities from an entity repository stored at a server, a processor, and memory storing instructions that cause the computing device to identify fixed sets of entities that are relevant to a user based on context associated with the computing device, rank the fixed sets by relevancy, and update the personalized entity repository using selected sets determined based on the rank and on set usage parameters applicable to the user. In another example, a method includes generating fixed sets of entities from an entity repository, including location-based sets and topic-based sets, and providing a subset of the fixed sets to a client, the client requesting the subset based on the client's location and on items identified in content generated for display on the client.

Publication date: 29-12-2015

Unified recognition of speech and music

Number: US0009224385B1
Assignee: Google Inc.

Methods, systems, and computer programs are presented for unified recognition of speech and music. One method includes an operation for starting an audio recognition mode by a computing device while receiving an audio stream. Segments of the audio stream are analyzed as the audio stream is received, where the analysis includes simultaneous checking for speech and music. Further, the method includes an operation for determining a first confidence score for speech and a second confidence score for music. As the audio stream is received, additional segments are analyzed until the end of the audio stream or until the first and second confidence scores indicate that the audio stream has been identified as speech or music. Further, results are presented on a display based on the identification of the audio stream, including text entered if the audio stream was speech or song information if the audio stream was music.
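The early-stopping loop described above can be sketched as follows, assuming each analyzed segment yields a `(speech_score, music_score)` pair in [0, 1] and that a class's confidence is the running mean of its per-segment scores. Both the scoring and the mean are assumptions; `classify_stream` is a hypothetical name.

```python
from typing import Iterable, Optional, Tuple


def classify_stream(segment_scores: Iterable[Tuple[float, float]],
                    threshold: float) -> Optional[str]:
    """Consume per-segment (speech, music) evidence until one class is confident.

    Stops as soon as either running-mean confidence reaches `threshold`;
    returns None if the stream ends first.
    """
    speech_total = music_total = 0.0
    n = 0
    for speech, music in segment_scores:
        n += 1
        speech_total += speech
        music_total += music
        speech_conf, music_conf = speech_total / n, music_total / n
        if max(speech_conf, music_conf) >= threshold:
            return "speech" if speech_conf >= music_conf else "music"
    return None  # end of stream without a confident decision
```

Passing a generator instead of a list keeps the early exit meaningful: segments after the decision point are never analyzed.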

Publication date: 13-08-2019

Machine learning to generate music from text

Number: US0010380983B2
Assignee: Google LLC, Google Inc.

The present disclosure provides systems and methods that leverage one or more machine-learned models to generate music from text. In particular, a computing system can include a music generation model that is operable to extract one or more structural features from an input text. The one or more structural features can be indicative of a structure associated with the input text. The music generation model can generate a musical composition from the input text based at least in part on the one or more structural features. For example, the music generation model can generate a musical composition that exhibits a musical structure that mimics or otherwise corresponds to the structure associated with the input text. For example, the music generation model can include a machine-learned audio generation model. In such fashion, the systems and methods of the present disclosure can generate music that exhibits a globally consistent theme and/or structure.

Publication date: 23-02-2016

Audio matching using time alignment, frequency alignment, and interest point overlap to filter false positives

Number: US0009268845B1
Assignee: Google Inc.

Systems and methods for audio matching using interest point overlap are disclosed herein. The systems include determining at least one matching reference segment based on a probe segment. Interest points for both the at least one matching reference segment and the probe segment can be generated. Probe segment interest points and matching reference segment interest points can be time aligned and frequency aligned. A count can be generated based on the number of overlapping interest points between each set of reference interest points and the set of probe segment interest points. The disclosed systems and methods allow false positive references to be identified and eliminated based on the count. Eliminating false positive matches improves the accuracy of an audio matching system.
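A minimal sketch of time and frequency alignment followed by an overlap count, assuming interest points are integer `(time_bin, frequency_bin)` pairs. Every probe/reference point pair votes for the shift that would align them; the most-voted shift gives the overlap count, and candidates below a minimum count are dropped as false positives. The voting scheme and both function names are assumptions, not the claimed procedure.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]  # (time bin, frequency bin)


def aligned_overlap(probe: Set[Point], ref: Set[Point]) -> int:
    """Largest number of probe points that land on reference points under
    a single (time shift, frequency shift)."""
    votes = Counter((rt - pt, rf - pf) for pt, pf in probe for rt, rf in ref)
    return max(votes.values(), default=0)


def filter_false_positives(probe: Set[Point], candidates: Dict[str, Set[Point]],
                           min_overlap: int) -> List[str]:
    """Keep only candidate references whose aligned overlap reaches min_overlap."""
    return [name for name, pts in candidates.items()
            if aligned_overlap(probe, pts) >= min_overlap]
```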

Publication date: 16-11-2017

AUDIO PROCESSING WITH NEURAL NETWORKS

Number: US20170330071A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio processing using neural networks. One of the systems includes multiple neural network layers, wherein the neural network system is configured to receive time domain features of an audio sample and to process the time domain features to generate a neural network output for the audio sample, the plurality of neural network layers comprising: a frequency-transform (F-T) layer that is configured to apply a transformation defined by a set of F-T layer parameters that transforms a window of time domain features into frequency domain features; and one or more other neural network layers having respective layer parameters, wherein the one or more neural network layers are configured to process frequency domain features to generate a neural network output.
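A frequency-transform layer can be pictured as a linear map whose weights start out as a discrete Fourier transform matrix. The sketch below assumes that initialization and a magnitude output; both are assumptions, as is keeping the weights fixed (in a trainable network `re` and `im` would be the learnable F-T layer parameters). `dft_weights` and `ft_layer` are hypothetical names.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dft_weights(n: int) -> Tuple[Matrix, Matrix]:
    """Real and imaginary DFT matrices, usable as initial F-T layer weights."""
    re = [[math.cos(2 * math.pi * k * t / n) for t in range(n)] for k in range(n)]
    im = [[-math.sin(2 * math.pi * k * t / n) for t in range(n)] for k in range(n)]
    return re, im


def ft_layer(window: List[float], re: Matrix, im: Matrix) -> List[float]:
    """Map a time-domain window to a magnitude spectrum via two matrix products."""
    out = []
    for row_re, row_im in zip(re, im):
        real = sum(w * x for w, x in zip(row_re, window))
        imag = sum(w * x for w, x in zip(row_im, window))
        out.append(math.hypot(real, imag))
    return out
```

With these weights, a cosine that completes one cycle per window puts energy n/2 into bins 1 and n - 1 and nothing elsewhere, as for an ordinary DFT.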

Publication date: 19-01-2023

GENERATING AUDIO WAVEFORMS USING ENCODER AND DECODER NEURAL NETWORKS

Number: US20230013370A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.

Publication date: 04-04-2019

Determining that Audio Includes Music and then Identifying the Music as a Particular Song

Number: US20190102458A1

In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products. A computing device stores reference song characterization data and receives digital audio data. The computing device determines whether the digital audio data represents music and then performs a different process to recognize that the digital audio data represents a particular reference song. The computing device then outputs an indication of the particular reference song.

1. A computer-implemented method, comprising:
storing, by a computing device, reference song characterization data that identify a plurality of audio characteristics for each reference song in a plurality of reference songs;
receiving, by the computing device, digital audio data that represents audio recorded by a microphone;
determining, by the computing device and using a music determination process, whether the digital audio data represents music;
recognizing, by the computing device after determining that the digital audio data represents music, that the digital audio data represents a particular reference song from among the plurality of reference songs; and
outputting, by the computing device in response to determining that the digital audio data represents a particular reference song from among the plurality of reference songs, an indication of the particular reference song.
2. The computer-implemented method of claim 1, wherein the plurality of reference songs includes at least ten thousand reference songs, such that the reference song characterization data identify audio characteristics for the at least ten thousand reference songs.
3. The computer-implemented method of claim 1, wherein reference song characterization values for the reference songs in the plurality of reference songs are limited to a binary one or a binary zero, such that each characterization value is limited to a binary one or a binary zero.
4. The computer-implemented method of claim 1, ...

Publication date: 08-01-2019

Personalized entity repository

Number: US0010178527B2
Assignee: Google LLC, Google Inc.

Systems and methods are provided for a personalized entity repository. For example, a computing device comprises a personalized entity repository having fixed sets of entities from an entity repository stored at a server, a processor, and memory storing instructions that cause the computing device to identify fixed sets of entities that are relevant to a user based on context associated with the computing device, rank the fixed sets by relevancy, and update the personalized entity repository using selected sets determined based on the rank and on set usage parameters applicable to the user. In another example, a method includes generating fixed sets of entities from an entity repository, including location-based sets and topic-based sets, and providing a subset of the fixed sets to a client, the client requesting the subset based on the client's location and on items identified in content generated for display on the client.

Publication date: 16-03-2023

Self-Supervised Audio Representation Learning for Mobile Devices

Number: US20230085596A1
Assignee: Google LLC

Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.

Publication date: 09-04-2019

Speaker identification using a text-independent model and a text-dependent model

Number: US0010255922B1
Assignee: Google LLC

In some implementations, a single registration utterance that includes a hotword and an introduction declaration is received. A user is registered, including training a text-dependent speaker identification model using the hotword of the single registration utterance and training a text-independent speaker identification model using the introduction declaration of the single registration utterance. An authentication utterance by the user that includes the hotword and a voice command that is different from the introduction declaration is received. The user is authenticated, including processing the hotword of the authentication utterance using the text-dependent speaker identification model and processing the voice command using the text-independent speaker identification model. Access to an access-controlled personal resource of the user is provided without requiring the user to submit any further authentication information other than the single registration utterance by the user that includes ...

Publication date: 14-04-2015

Melody recognition systems

Number: US0009008490B1
Assignee: Google Inc.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting, from among a collection of videos, a set of candidate videos that (i) are identified as being associated with a particular song, and (ii) are classified as a cappella video recordings; extracting, from each of the candidate videos of the set, a monophonic melody line from an audio channel of the candidate video; selecting, from among the set of candidate videos, a subset of the candidate videos based on a similarity of the monophonic melody line of the candidate videos of the subset with each other; and providing, to a recognizer that recognizes songs from sounds produced by a human voice, (i) an identifier of the particular song, and (ii) one or more of the monophonic melody lines of the candidate videos of the subset.

Publication date: 17-06-2021

Training Keyword Spotters

Number: US20210183367A1
Assignee: Google LLC

A method of training a custom hotword model includes receiving a first set of training audio samples. The method also includes generating, using a speech embedding model configured to receive the first set of training audio samples as input, a corresponding hotword embedding representative of a custom hotword for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples with a greater number of training audio samples than the first set of training audio samples. The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data. The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.
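Because the speech embedding model is pre-trained, the custom hotword model can be small. The sketch below assumes the hotword embeddings have already been computed (the embedding model itself is out of scope) and uses plain logistic regression trained by per-example gradient descent as a stand-in for the custom hotword model. The classifier choice, learning rate, epoch count and function names are all assumptions.

```python
import math
from typing import List, Sequence


def train_hotword_classifier(embeddings: Sequence[Sequence[float]],
                             labels: Sequence[int],
                             epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Fit logistic regression on fixed embeddings; returns weights, bias last."""
    dim = len(embeddings[0])
    w = [0.0] * (dim + 1)  # w[-1] is the bias
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            # zip stops at len(x), so only the weights (not the bias) are used here.
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            for i in range(dim):
                w[i] -= lr * g * x[i]
            w[-1] -= lr * g
    return w


def is_hotword(w: List[float], x: Sequence[float], threshold: float = 0.5) -> bool:
    """Classify one embedding as the custom hotword or not."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z)) >= threshold
```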

Publication date: 05-02-2019

Aggregation on related media content

Number: US0010199069B1
Assignee: Google LLC

Systems and methods for media aggregation are disclosed herein. The system includes a media system that can transform media items into one aggregated media item. A synchronization component synchronizes media items with respect to time. The synchronized media items can be analyzed and transformed into an aggregated media item for storage and/or display. In one implementation, the aggregated media item is capable of being displayed in multiple ways to create an enhanced and customizable viewing and/or listening experience.

Publication date: 17-01-2019

OBJECT DETECTION USING NEURAL NETWORK SYSTEMS

Number: US20190019050A1

Systems, methods, and apparatus, including computer programs encoded on a computer storage medium. In one aspect, a system includes initial neural network layers configured to: receive an input image, and process the input image to generate a plurality of first feature maps that characterize the input image; a location generating convolutional neural network layer configured to perform a convolution on the representation of the first plurality of feature maps to generate data defining a respective location of each of a predetermined number of bounding boxes in the input image, wherein each bounding box identifies a respective first region of the input image; and a confidence score generating convolutional neural network layer configured to perform a convolution on the representation of the first plurality of feature maps to generate a confidence score for each of the predetermined number of bounding boxes in the input image.

1. A system comprising:
one or more initial neural network layers, wherein the one or more initial neural network layers are configured to:
receive an input image, and
process the input image to generate a plurality of first feature maps that characterize the input image, wherein:
the plurality of first feature maps are each of the same size,
each of the plurality of first feature maps have a respective value at each of a plurality of first feature map locations, and
each of the plurality of first feature map locations correspond to a respective first region in the input image;
a location generating convolutional neural network layer, wherein the location generating ...
receive a representation of the first plurality of feature maps, and
perform a convolution on the representation of the first plurality of feature maps to generate data defining a respective location of each of a predetermined number of bounding boxes in the input image, wherein each bounding box identifies a respective first region of the input image; and
...

Publication date: 27-04-2017

Personalized Entity Repository

Number: US20170118576A1

Systems and methods are provided for a personalized entity repository. For example, a computing device comprises a personalized entity repository having fixed sets of entities from an entity repository stored at a server, a processor, and memory storing instructions that cause the computing device to identify fixed sets of entities that are relevant to a user based on context associated with the computing device, rank the fixed sets by relevancy, and update the personalized entity repository using selected sets determined based on the rank and on set usage parameters applicable to the user. In another example, a method includes generating fixed sets of entities from an entity repository, including location-based sets and topic-based sets, and providing a subset of the fixed sets to a client, the client requesting the subset based on the client's location and on items identified in content generated for display on the client.

1. A mobile device comprising:
a display device;
a personalized entity repository stored in memory, the personalized entity repository including a plurality of fixed sets of entities from an entity repository stored at a server, wherein each fixed set has a respective identifier and includes information about the entities in the set;
at least one processor; and
memory storing instructions that, when executed by the at least one processor, cause the mobile device to:
identify fixed sets of the entity repository that are relevant to a user of the mobile device based on context associated with the mobile device,
rank the fixed sets by relevancy;
determine selected sets from the identified fixed sets using the rank and set usage parameters applicable to the user; and
update the personalized entity repository using the selected sets.
2. The mobile device of claim 1, wherein updating the personalized entity repository occurs responsive to determining that a first fixed set of the identified fixed sets does not exist in the personalized entity ...

Publication date: 16-01-2018

Segmenting content displayed on a computing device into regions based on pixels of a screenshot image that captures the content

Number: US0009870623B2
Assignee: GOOGLE LLC, GOOGLE INC, Google Inc.

Methods and apparatus directed to segmenting content displayed on a computing device into regions. The segmenting of content displayed on the computing device into regions is accomplished via analysis of pixels of a “screenshot image” that captures at least a portion of (e.g., all of) the displayed content. Individual pixels of the screenshot image may be analyzed to determine one or more regions of the screenshot image and to optionally assign a corresponding semantic type to each of the regions. Some implementations are further directed to generating, based on one or more of the regions, interactive content to provide for presentation to the user via the computing device.
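As a rough illustration of pixel-based segmentation, the sketch below splits a screenshot into horizontal bands separated by rows of pure background, then assigns each band a coarse semantic type. It assumes a grayscale screenshot stored as a list of pixel rows with a white (255) background; the helper names and the distinct-gray-level heuristic in classify_region are hypothetical and far simpler than anything the abstract describes.

```python
from typing import List, Tuple

Pixels = List[List[int]]

def segment_rows(pixels: Pixels, background: int = 255) -> List[Tuple[int, int]]:
    """Split a grayscale screenshot into horizontal bands (top, bottom),
    separated by rows that contain only background pixels."""
    regions = []
    start = None
    for y, row in enumerate(pixels):
        blank = all(p == background for p in row)
        if not blank and start is None:
            start = y                       # a new region begins
        elif blank and start is not None:
            regions.append((start, y - 1))  # the current region ends
            start = None
    if start is not None:                   # region touching the bottom edge
        regions.append((start, len(pixels) - 1))
    return regions

def classify_region(pixels: Pixels, top: int, bottom: int, max_text_levels: int = 4) -> str:
    """Hypothetical semantic typing: few distinct gray levels suggests text,
    many suggests a photo."""
    levels = {p for row in pixels[top:bottom + 1] for p in row}
    return "photo" if len(levels) > max_text_levels else "text"

screenshot = [
    [255, 255, 255, 255],
    [0, 255, 0, 255],
    [0, 0, 255, 255],
    [255, 255, 255, 255],
    [10, 40, 90, 130],
    [200, 60, 20, 255],
    [255, 255, 255, 255],
]
for top, bottom in segment_rows(screenshot):
    print((top, bottom), classify_region(screenshot, top, bottom))
```

A real system would also split columns and use learned classifiers; the projection cut is only the simplest pixel-driven segmentation.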

Publication date: 02-02-2023

AUTOMATED MINING OF REAL-WORLD AUDIO TRAINING DATA

Number: US20230033103A1
Inventor: Dominik Roblek

Methods, systems, and apparatus, for generating labeled training examples for machine learning. In one aspect, a method includes receiving sets of audio recordings by a user device. For each set of audio recordings, each audio recording in the set is recorded over a respective separate microphone in the user device during a particular time interval, and each particular time interval is different for each set of audio recordings. For each set of audio recordings, a detector determines whether an audio recording in the set of audio recordings includes a particular audio feature, and whether another one of the audio recordings does not include the particular audio feature. For each set of audio recordings determined to include an audio recording that includes the particular audio feature and to include another audio recording that does not include the particular audio feature, a labeled training example is generated.
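The labeling rule in this abstract can be sketched as follows, under assumptions: each recording is a list of samples, and a hypothetical loudness detector (loud_event) stands in for the unspecified audio-feature detector. Sets where the detector fires on one microphone but not on another yield positive and negative examples; sets where all or none of the microphones fire are skipped.

```python
from typing import Callable, List, Sequence, Tuple

Recording = Sequence[float]

def mine_labeled_examples(
    recording_sets: List[List[Recording]],
    detector: Callable[[Recording], bool],
) -> List[Tuple[Recording, bool]]:
    """Emit (recording, label) pairs only from sets where the detector fires on
    at least one microphone's recording and not on at least one other."""
    examples = []
    for recordings in recording_sets:
        flags = [detector(r) for r in recordings]
        if any(flags) and not all(flags):
            examples.extend(zip(recordings, flags))
    return examples

def loud_event(recording: Recording, threshold: float = 0.5) -> bool:
    # Hypothetical stand-in for the feature detector: any sample above threshold.
    return max(abs(s) for s in recording) > threshold

# Three time intervals, two microphones each.
sets = [[[0.9, 0.1], [0.1, 0.2]], [[0.9], [0.8]], [[0.1], [0.2]]]
print(mine_labeled_examples(sets, loud_event))
```

Only the first interval disagrees across microphones, so it is the only one that produces examples.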

Publication date: 01-09-2020

Identifying music as a particular song

Number: US0010761802B2
Assignee: Google LLC

In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for indicating a reference song. A computing device stores reference song characterization data that identifies a plurality of audio characteristics for each reference song in a plurality of reference songs. The computing device receives digital audio data that represents audio recorded by a microphone, converts the digital audio data from time-domain format into frequency-domain format, and uses the digital audio data in the frequency-domain format in a music-characterization process. In response to determining that characterization values for the digital audio data are most relevant to characterization values for a particular reference song, the computing device outputs an indication of the particular reference song.
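A toy version of the pipeline in this abstract (time domain to frequency domain, characterization values, most relevant reference) might look like the sketch below. The naive DFT, the band-energy characterization values, and cosine similarity as the relevance measure are all assumptions chosen for brevity; the reference vectors are made up.

```python
import cmath
import math
from typing import Dict, List

def dft_magnitudes(samples: List[float]) -> List[float]:
    """Naive DFT: time-domain samples -> magnitudes of bins 0..n/2."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def band_energies(magnitudes: List[float], num_bands: int) -> List[float]:
    """Characterization values: energy in equal-width frequency bands."""
    width = max(1, len(magnitudes) // num_bands)
    return [sum(m * m for m in magnitudes[i * width:(i + 1) * width]) for i in range(num_bands)]

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify_song(samples: List[float], references: Dict[str, List[float]], num_bands: int = 4) -> str:
    """Return the reference whose band energies are most similar to the query's."""
    query = band_energies(dft_magnitudes(samples), num_bands)
    return max(references, key=lambda name: cosine(query, references[name]))

tone = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]  # energy in DFT bin 2
refs = {"low": [1.0, 0.0, 0.0, 0.0], "mid": [0.0, 1.0, 0.0, 0.0], "high": [0.0, 0.0, 0.0, 1.0]}
print(identify_song(tone, refs))
```

In practice reference vectors would be computed from the reference songs with the same feature function, and an FFT would replace the O(n²) DFT.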

Publication date: 26-04-2018

SEGMENTING CONTENT DISPLAYED ON A COMPUTING DEVICE INTO REGIONS BASED ON PIXELS OF A SCREENSHOT IMAGE THAT CAPTURES THE CONTENT

Number: US20180114326A1

Methods and apparatus directed to segmenting content displayed on a computing device into regions. The segmenting of content displayed on the computing device into regions is accomplished via analysis of pixels of a “screenshot image” that captures at least a portion of (e.g., all of) the displayed content. Individual pixels of the screenshot image may be analyzed to determine one or more regions of the screenshot image and to optionally assign a corresponding semantic type to each of the regions. Some implementations are further directed to generating, based on one or more of the regions, interactive content to provide for presentation to the user via the computing device. 1. A method, comprising: capturing, by one or more processors of a computing device, a screenshot image that captures at least a portion of a display provided to a user by the computing device; segmenting the screenshot image into at least a first region and a second region, the segmenting being by one or more of the processors of the computing device and being based on a plurality of pixels of the screenshot image; determining at least one first characteristic of the first region, the determining being by one or more of the processors of the computing device and being based on one or more of: a plurality of pixels of the first region, a size of the first region, and a position of the first region; determining at least one second characteristic of the second region, the determining being by one or more of the processors of the computing device and being based on one or more of: a plurality of pixels of the second region, a size of the second region, and a position of the second region; and providing, by one or more of the processors of the computing device, a plurality of the pixels of the first region to a content recognition engine based on the first region having the first characteristic; wherein the pixels of the second region are not provided to any content recognition engine based on the second ...

Publication date: 01-12-2015

Magnitude ratio descriptors for pitch-resistant audio matching

Number: US0009202472B1

Systems and methods for generating unique pitch-resistant descriptors for audio clips are provided. In one or more embodiments, a descriptor for an audio clip is generated as a function of relative magnitudes between interest points within the audio clip's time-frequency representation. A number of techniques for leveraging the relative magnitudes to generate descriptors are considered. These techniques include ordering of interest points as a function of ascending or descending magnitude, creation of binary vectors based on magnitude comparisons between pairs of points, and calculation of quantized magnitude ratios between pairs of points. Descriptors generated based on relative magnitudes according to the techniques disclosed herein are relatively invariant to common transformations to the original audio clip, such as pitch shifting, time stretching, global volume changes, equalization, and/or dynamic range compression.
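The three descriptor techniques listed above (ordering by magnitude, binary comparison vectors, quantized magnitude ratios) can be sketched on a handful of interest-point magnitudes. This assumes magnitudes are amplitudes (hence 20*log10 for decibels) and uses a hypothetical 3 dB quantization step; the example checks that a global volume change leaves the descriptors unchanged.

```python
import math
from itertools import combinations
from typing import List

def rank_order(magnitudes: List[float]) -> List[int]:
    """Indices of interest points sorted by descending magnitude."""
    return sorted(range(len(magnitudes)), key=lambda i: magnitudes[i], reverse=True)

def binary_comparisons(magnitudes: List[float]) -> List[int]:
    """One bit per pair (i < j): 1 if point i is louder than point j."""
    return [1 if a > b else 0 for a, b in combinations(magnitudes, 2)]

def quantized_ratios(magnitudes: List[float], step_db: float = 3.0) -> List[int]:
    """Pairwise magnitude ratios in dB (treating magnitudes as amplitudes),
    quantized to step_db buckets."""
    return [round(20 * math.log10(a / b) / step_db) for a, b in combinations(magnitudes, 2)]

mags = [0.2, 0.8, 0.4]
louder = [3.7 * m for m in mags]  # a global volume change
print(rank_order(mags), binary_comparisons(mags), quantized_ratios(mags))
print(quantized_ratios(louder) == quantized_ratios(mags))
```

Because every descriptor depends only on ratios or comparisons between points, multiplying all magnitudes by the same gain cancels out.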

Publication date: 10-12-2019

Segment-based speaker verification using dynamically generated phrases

Number: US0010504524B2
Assignee: Google LLC, GOOGLE LLC

A computer-implemented method includes receiving a request for a verification phrase for verifying an identity of a user, and in response to receiving the request for the verification phrase, identifying subwords to be included in the verification phrase. The method also includes, in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase, based on a predetermined criterion. The method also includes providing the verification phrase as a response to the request for the verification phrase, wherein identifying subwords to be included in the verification phrase includes identifying candidate subwords, for which no stored acoustic data is associated with the user, as one or more of the subwords to be included in the verification phrase.
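One possible reading of the phrase-selection step, sketched under assumptions: the subword splitter is passed in as a function (here plain str.split, treating whole words as subwords), and the unspecified predetermined criterion is replaced by a hypothetical rule that prefers the candidate covering the most subwords for which the user has no stored acoustic data.

```python
from typing import Callable, Iterable, List, Set

def choose_verification_phrase(
    candidates: Iterable[str],
    subwords_of: Callable[[str], List[str]],
    enrolled: Set[str],
) -> str:
    """Pick the candidate covering the most subwords with no stored acoustic data for the user."""
    def new_subwords(phrase: str) -> int:
        return len({s for s in subwords_of(phrase) if s not in enrolled})
    return max(candidates, key=new_subwords)

# Hypothetical: treat whole words as "subwords" to keep the example small.
phrase = choose_verification_phrase(
    ["open sesame", "blue river stone", "river open"],
    subwords_of=str.split,
    enrolled={"open", "river"},
)
print(phrase)
```

On ties, max keeps the first candidate in input order.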

Publication date: 10-08-2021

Personalized entity repository

Number: US0011089457B2
Assignee: GOOGLE LLC, Google LLC

Systems and methods are provided for a personalized entity repository. For example, a computing device comprises a personalized entity repository having fixed sets of entities from an entity repository stored at a server, a processor, and memory storing instructions that cause the computing device to identify fixed sets of entities that are relevant to a user based on context associated with the computing device, rank the fixed sets by relevancy, and update the personalized entity repository using selected sets determined based on the rank and on set usage parameters applicable to the user. In another example, a method includes generating fixed sets of entities from an entity repository, including location-based sets and topic-based sets, and providing a subset of the fixed sets to a client, the client requesting the subset based on the client's location and on items identified in content generated for display on the client.
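A minimal sketch of the rank-and-select update described above. The relevance signal (how often a set's topic appears in the device context) and the usage parameters (a maximum number of sets and a storage budget) are hypothetical stand-ins; the abstract does not specify either.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FixedSet:
    set_id: str      # identifier of the fixed set on the server
    topic: str       # location or topic the set covers
    size_bytes: int

def relevance(fixed_set: FixedSet, context: Dict[str, int]) -> int:
    # Hypothetical relevance signal: count of context mentions of the set's topic.
    return context.get(fixed_set.topic, 0)

def update_repository(
    repository: Dict[str, FixedSet],
    candidates: List[FixedSet],
    context: Dict[str, int],
    max_sets: int,
    max_bytes: int,
) -> Dict[str, FixedSet]:
    """Rank relevant sets and keep the best ones that fit the user's
    (hypothetical) usage parameters: a set count and a storage budget."""
    relevant = [s for s in candidates if relevance(s, context) > 0]
    ranked = sorted(relevant, key=lambda s: relevance(s, context), reverse=True)
    selected, used = [], 0
    for s in ranked:
        if len(selected) == max_sets:
            break
        if used + s.size_bytes <= max_bytes:
            selected.append(s)
            used += s.size_bytes
    repository.clear()
    repository.update({s.set_id: s for s in selected})
    return repository

sets = [FixedSet("paris", "paris", 50), FixedSet("jazz", "jazz", 30),
        FixedSet("tokyo", "tokyo", 40), FixedSet("cooking", "cooking", 10)]
context = {"paris": 5, "jazz": 2, "tokyo": 1}
print(sorted(update_repository({}, sets, context, max_sets=2, max_bytes=100)))
```

Replacing the whole repository keeps the sketch short; an incremental update that only downloads missing sets would follow the same ranking.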

Publication date: 24-05-2022

Training keyword spotters

Number: US0011341954B2
Assignee: Google LLC

A method of training a custom hotword model includes receiving a first set of training audio samples. The method also includes generating, using a speech embedding model configured to receive the first set of training audio samples as input, a corresponding hotword embedding representative of a custom hotword for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples with a greater number of training audio samples than the first set of training audio samples. The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data. The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.
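The training scheme above (a frozen pre-trained embedding model feeding a small custom head) can be illustrated with a logistic-regression head. The embeddings below are made-up 2-D vectors standing in for the output of the speech embedding model, and logistic regression is a swapped-in choice for the custom hotword model, not necessarily what the patent uses.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_hotword_head(
    embeddings: Sequence[Sequence[float]], labels: Sequence[int], lr: float = 0.5, epochs: int = 200
) -> Tuple[List[float], float]:
    """Logistic-regression head trained with per-example gradient steps on fixed embeddings."""
    w, b = [0.0] * len(embeddings[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # d(log loss)/dz
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def hotword_probability(w: List[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical 2-D embeddings, as if produced by a frozen pre-trained speech embedding model.
emb = [[1.0, 0.9], [0.9, 1.1], [-1.0, -0.8], [-0.9, -1.1]]
lab = [1, 1, 0, 0]
w, b = train_hotword_head(emb, lab)
print(hotword_probability(w, b, [1.0, 1.0]), hotword_probability(w, b, [-1.0, -1.0]))
```

Only the head's few parameters are learned, which is why a small first set of training samples can suffice when the embedding model was pre-trained on a much larger set.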

Publication date: 05-11-2015

GENERATING CORRELATION SCORES

Number: US20150317281A1
Assignee: Google Inc.

A computer-implemented method includes obtaining first and second binary vectors. For each of a plurality of vector locations in a first of j words in the first binary vector, the method includes shifting the binary values for the second binary vector so that a particular one of the binary values in the second binary vector is located at a vector location in a first of the k words in the second binary vector that matches the vector location in the first of j words in the first binary vector. For each of the j words in the first binary vector, the method includes aligning the second binary vector with the word in the first binary vector and determining a binary correlation score. A similarity of the first binary vector and the second binary vector can be determined based at least on one or more of the determined binary correlation scores.
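The core operation, scoring agreement between two binary vectors at each relative shift, can be sketched with XOR and a population count. This is a simplification: each vector is stored in a single Python integer and shifted circularly, whereas the abstract describes aligning across j and k machine words.

```python
from typing import List

def rotate_left(bits: int, shift: int, n: int) -> int:
    """Circularly rotate an n-bit vector (stored in an int) left by `shift`."""
    shift %= n
    mask = (1 << n) - 1
    return ((bits << shift) | (bits >> (n - shift))) & mask

def correlation_scores(a: int, b: int, n: int) -> List[int]:
    """For every shift of b, count agreeing bit positions: n - popcount(a XOR shifted b)."""
    return [n - bin(a ^ rotate_left(b, s, n)).count("1") for s in range(n)]

def similarity(a: int, b: int, n: int) -> float:
    """Best agreement over all shifts, normalized to [0, 1]."""
    return max(correlation_scores(a, b, n)) / n

a = 0b10110010
b = rotate_left(a, 5, 8)  # same pattern, circularly shifted
scores = correlation_scores(a, b, 8)
print(scores.index(max(scores)), similarity(a, b, 8))
```

XOR plus popcount is what makes binary correlation cheap: one machine instruction each per word on most hardware.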

Publication date: 13-10-2015

Aggregation of related media content

Number: US0009159364B1

Systems and methods for media aggregation are disclosed herein. The system includes a media system that can transform media items into one aggregated media item. A synchronization component synchronizes media items with respect to time. The synchronized media items can be analyzed and transformed into an aggregated media item for storage and/or display. In one implementation, the aggregated media item is capable of being displayed in multiple ways to create an enhanced and customizable viewing and/or listening experience.
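The synchronization component is not specified; one common way to line two recordings up in time is to pick the lag that maximizes their cross-correlation, as in the brute-force sketch below. The clips and the max_lag search window are illustrative assumptions.

```python
from typing import Sequence

def alignment_score(reference: Sequence[float], other: Sequence[float], lag: int) -> float:
    """Dot product of reference[t] with other[t - lag] over overlapping samples."""
    return sum(
        reference[t] * other[t - lag]
        for t in range(len(reference))
        if 0 <= t - lag < len(other)
    )

def best_lag(reference: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) by which `other` must be delayed to line up with `reference`."""
    return max(range(-max_lag, max_lag + 1), key=lambda lag: alignment_score(reference, other, lag))

clip_a = [0, 0, 0, 1, 2, 1, 0, 0]
clip_b = [1, 2, 1, 0, 0, 0, 0, 0]  # the same event appears 3 samples earlier in clip_b
print(best_lag(clip_a, clip_b, max_lag=4))
```

For long recordings an FFT-based correlation would replace the O(n * lags) loop.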

Publication date: 12-08-2014

Noise based interest point density pruning

Number: US0008805560B1
Assignee: Google Inc.

Systems and methods for noise based interest point density pruning are disclosed herein. The systems include determining an amount of noise in an audio sample and adjusting the amount of interest points within an audio sample fingerprint based on the amount of noise. Samples containing high amounts of noise correspondingly generate fingerprints with more interest points. The disclosed systems and methods allow reference fingerprints to be reduced in size while increasing the size of sample fingerprints. The benefits in scalability do not compromise the accuracy of an audio matching system using noise based interest point density pruning.
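A sketch of noise-dependent interest-point density, under assumptions: noise is estimated as the median frame energy relative to the loudest frame, and the point budget grows linearly with that estimate. Both rules and their constants are hypothetical; the abstract only states that noisier samples get more interest points.

```python
import statistics
from typing import List, Tuple

Point = Tuple[int, int, float]  # (time index, frequency bin, magnitude)

def noise_ratio(frame_energies: List[float]) -> float:
    """Crude noise estimate: median frame energy relative to the loudest frame."""
    peak = max(frame_energies)
    return statistics.median(frame_energies) / peak if peak > 0 else 0.0

def point_budget(noise: float, base: int = 10, per_unit_noise: int = 20, cap: int = 100) -> int:
    """Noisier samples are allowed more interest points (hypothetical linear rule)."""
    return min(cap, base + int(per_unit_noise * noise))

def prune_interest_points(points: List[Point], noise: float) -> List[Point]:
    """Keep the strongest points up to the noise-dependent budget."""
    return sorted(points, key=lambda p: p[2], reverse=True)[:point_budget(noise)]

points = [(t, t % 7, float(t)) for t in range(50)]
print(len(prune_interest_points(points, 0.0)), len(prune_interest_points(points, 0.5)))
```

Clean reference audio would get the small base budget, keeping the reference index compact, while noisy query samples keep more points.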

Publication date: 08-09-2020

Aggregation of related media content

Number: US0010770112B2
Assignee: Google LLC

Systems and methods for media aggregation are disclosed herein. The system includes a media system that can transform media items into one aggregated media item. A synchronization component synchronizes media items with respect to time. The synchronized media items can be analyzed and transformed into an aggregated media item for storage and/or display. In one implementation, the aggregated media item is capable of being displayed in multiple ways to create an enhanced and customizable viewing and/or listening experience.

Publication date: 02-03-2017

Hold Back and Real Time Ranking of Results in a Streaming Matching System

Number: US20170061002A1

A matching system receives probe audio samples for comparison to references of a data store. Comparisons are generated between a first segment of a probe audio sample and corresponding time segments of a plurality of reference audio samples to identify a plurality of sufficiently matching reference audio samples based upon a first set of consistency scores. Matching references are retained, unless they meet a score threshold. Comparisons are continually generated with a second segment of the probe audio sample and corresponding time segments of the sufficiently matching reference audio samples to generate a second set of consistency scores. The retained results are outputted based on the first and second set of consistency scores. 1. A method, comprising: receiving, from an audio streaming system, a probe audio sample; comparing a first time segment of the probe audio sample to corresponding time segments of a plurality of reference audio samples to identify a plurality of sufficiently matching reference audio samples based upon a first set of consistency scores generated between one or more feature vectors of the first time segment of the probe audio sample and corresponding feature vectors of the first time segment of each of the reference audio samples; determining that the sufficiently matching reference audio samples do not meet a predetermined score threshold; retaining the sufficiently matching reference audio samples; comparing a second time segment of the probe audio sample to corresponding time segments of the sufficiently matching reference audio samples to identify a plurality of additional matching reference audio samples based upon a second set of consistency scores generated between one or more feature vectors of the second time segment of the probe audio sample and corresponding feature vectors of the second time segment of each of the sufficiently matching reference audio samples; and outputting at least one of the reference audio ...
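The hold-back behaviour can be sketched with precomputed per-segment consistency scores. The cumulative-sum scoring and both thresholds are assumptions; the sketch only shows the control flow of retaining candidates from the first segment and refining them with later segments until one is confident enough.

```python
from typing import Dict, List, Tuple

def stream_match(
    segment_scores: List[Dict[str, float]],
    keep_threshold: float,
    accept_threshold: float,
) -> List[Tuple[str, float]]:
    """segment_scores[i] maps reference id -> consistency score for probe segment i.

    References matching the first segment well enough are retained; while none
    reaches accept_threshold, later segments add to their cumulative scores.
    Returns the retained candidates ranked by cumulative score."""
    candidates = {ref: s for ref, s in segment_scores[0].items() if s >= keep_threshold}
    for scores in segment_scores[1:]:
        if any(total >= accept_threshold for total in candidates.values()):
            break  # confident enough: stop holding results back
        candidates = {ref: total + scores.get(ref, 0.0) for ref, total in candidates.items()}
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)

segments = [{"a": 0.6, "b": 0.5, "c": 0.1}, {"a": 0.2, "b": 0.7, "c": 0.9}]
print(stream_match(segments, keep_threshold=0.3, accept_threshold=2.0))
```

Note that reference "c" never re-enters the ranking even though it scores well on the second segment, because it was not retained after the first one.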

Publication date: 25-10-2018

SEGMENT-BASED SPEAKER VERIFICATION USING DYNAMICALLY GENERATED PHRASES

Number: US20180308492A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.

Publication date: 25-11-2021

PERSONALIZED ENTITY REPOSITORY

Number: US20210368313A1

Systems and methods are provided for a personalized entity repository. For example, a computing device comprises a personalized entity repository having fixed sets of entities from an entity repository stored at a server, a processor, and memory storing instructions that cause the computing device to identify fixed sets of entities that are relevant to a user based on context associated with the computing device, rank the fixed sets by relevancy, and update the personalized entity repository using selected sets determined based on the rank and on set usage parameters applicable to the user. In another example, a method includes generating fixed sets of entities from an entity repository, including location-based sets and topic-based sets, and providing a subset of the fixed sets to a client, the client requesting the subset based on the client's location and on items identified in content generated for display on the client. 1. A computer-implemented method, comprising: determining that each of multiple user interactions via a user device relate to a particular location; determining, based on the multiple user interactions that each relate to the particular location, a confidence score for a location-based set of entities for the particular location; in response to the confidence score for the location-based set of entities satisfying a threshold: causing an interactive prompt, related to the location-based set of entities, to be rendered at the user device; in response to an acceptance user interaction with the interactive prompt at the user device: causing the user device to download the location-based set of entities, the location-based set of entities being within a geographic boundary defined for the particular location; and in response to a rejection user interaction with the interactive prompt at the user device: refraining from causing the user device to download the location-based set of entities. 2. The method of claim 1, further comprising: identifying an ...

Publication date: 06-07-2021

Segment-based speaker verification using dynamically generated phrases

Number: US0011056120B2
Assignee: Google LLC, GOOGLE LLC

A method includes obtaining enrollment audio data representing a particular user speaking an enrollment phrase, and in response to receiving a request to verify an identity of an unverified user, prompting the unverified user to speak a verification utterance. The method also includes receiving verification audio data representing the unverified user speaking the verification utterance and determining whether the unverified user speaking the verification phrase includes the particular user who spoke the enrollment phrase based on the enrollment audio data and the verification audio data. The method also includes verifying the identity of the unverified user as the particular user.

Publication date: 23-11-2023

PERSONALIZED ENTITY REPOSITORY

Number: US20230379678A1
Assignee: Google LLC

Systems and methods are provided for a personalized entity repository. For example, a computing device comprises a personalized entity repository having fixed sets of entities from an entity repository stored at a server, a processor, and memory storing instructions that cause the computing device to identify fixed sets of entities that are relevant to a user based on context associated with the computing device, rank the fixed sets by relevancy, and update the personalized entity repository using selected sets determined based on the rank and on set usage parameters applicable to the user. In another example, a method includes generating fixed sets of entities from an entity repository, including location-based sets and topic-based sets, and providing a subset of the fixed sets to a client, the client requesting the subset based on the client's location and on items identified in content generated for display on the client.

Publication date: 15-12-2020

Dynamic display of content consumption by geographic location

Number: US0010866974B2
Assignee: GOOGLE LLC, Google LLC

This disclosure relates to a method for providing a display of content consumption by geographic location. The method includes storing, in a data store, geographic locations of a set of users consuming content items and consumption characteristics of the content items, wherein the content items are identified by user devices at the geographic locations while the content items are played by source devices external to the user devices, and wherein information about a content item of the identified content items, which is consumed by a user of the set of users, is transmitted to the server system by a user device of the user. The method also includes extracting, from the data store, geographic locations of consumption and a set of consumption characteristics of each content item of the identified content items, wherein the set of consumption characteristics comprises a title and times of consumption of the content item by the set of users. The method further includes filtering the identified content items based on at least one filter that pertains to times of content consumption by the set of users, ranking the filtered content items based on the geographic locations of consumption and consumption statistics, selecting, from the ranked content items, popular content items at particular geographic locations of consumption and over a time period, and generating a geographic map displaying to a user each of the selected popular content items at one or more of the particular geographic locations of consumption, the map to display a title and an icon to represent each of the selected popular content items.
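The filter, count, and select steps described above can be sketched with collections.Counter. The event tuple layout (location, title, timestamp), the time window, and ranking by raw play count are illustrative assumptions; rendering the map is out of scope.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

def popular_by_location(
    events: Iterable[Tuple[str, str, int]], start: int, end: int, top_n: int = 1
) -> Dict[str, List[str]]:
    """events are (location, title, timestamp) reports from user devices.
    Keep events inside [start, end), count plays per location, return top titles."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for location, title, ts in events:
        if start <= ts < end:  # filter on time of consumption
            counts[location][title] += 1
    return {loc: [t for t, _ in c.most_common(top_n)] for loc, c in counts.items()}

events = [
    ("Berlin", "Song A", 10), ("Berlin", "Song A", 12), ("Berlin", "Song B", 11),
    ("Paris", "Song C", 15), ("Paris", "Song B", 99),
]
result = popular_by_location(events, start=0, end=50)
print(result)
```

The Paris play of "Song B" falls outside the time window, so it does not count toward that location.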

Publication date: 19-07-2018

Personalized Entities Repository

Number: DE112016004859T5
Assignee: Google LLC

Systems and methods are provided for a personalized entity repository. For example, a computing device comprises a personalized entity repository with fixed groups of entities from an entity repository stored at a server, a processor, and a memory storing instructions that cause the computing device to identify fixed groups of entities that are relevant to a user based on context associated with the computing device, rank the fixed groups by relevance, and update the personalized entity repository using selected groups determined based on the rank and on group usage parameters applicable to the user. In another example, a method includes generating fixed groups of entities from an entity repository, including location-based groups and topic-based groups, and providing a subset of the fixed groups to a client, the client requesting the subset based on the client's location and on items identified in content generated for display on the client.

Publication date: 16-10-2019

Envelope comparison for utterance detection

Number: EP3069336B1
Assignee: Google LLC

Publication date: 14-02-2017

Melody recognition systems

Number: US9569532B1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting, from among a collection of videos, a set of candidate videos that (i) are identified as being associated with a particular song, and (ii) are classified as a cappella video recordings; extracting, from each of the candidate videos of the set, a monophonic melody line from an audio channel of the candidate video; selecting, from among the set of candidate videos, a subset of the candidate videos based on a similarity of the monophonic melody line of the candidate videos of the subset with each other; and providing, to a recognizer that recognizes songs from sounds produced by a human voice, (i) an identifier of the particular song, and (ii) one or more of the monophonic melody lines of the candidate videos of the subset.
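The subset-selection step (keeping candidate videos whose melody lines agree with each other) might be sketched as below. Melody lines are assumed to be equal-rate sequences of MIDI pitch numbers; the similarity measure (fraction of frames within one semitone) and the threshold are hypothetical, and transposition between singers is ignored.

```python
from typing import Dict, List, Sequence

def melody_similarity(a: Sequence[int], b: Sequence[int], tolerance: int = 1) -> float:
    """Fraction of aligned frames whose pitches differ by at most `tolerance` semitones."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if abs(x - y) <= tolerance) / n

def consistent_subset(melodies: Dict[str, Sequence[int]], threshold: float = 0.6) -> List[str]:
    """Keep videos whose mean similarity to all other candidates reaches the threshold."""
    ids = list(melodies)
    keep = []
    for i in ids:
        others = [melody_similarity(melodies[i], melodies[j]) for j in ids if j != i]
        if others and sum(others) / len(others) >= threshold:
            keep.append(i)
    return keep

melodies = {
    "v1": [60, 62, 64, 65],
    "v2": [60, 62, 64, 66],
    "v3": [60, 61, 64, 65],
    "v4": [70, 40, 80, 30],  # e.g. a mislabeled or badly transcribed video
}
print(consistent_subset(melodies))
```

The outlier is dropped because it disagrees with every other candidate, while the three consistent renditions support each other.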

Publication date: 07-03-2023

Compressing audio waveforms using neural networks and vector quantizers

Number: US11600282B2
Assignee: Google LLC

Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
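The abstract does not say how the vector quantizers are chained; a common arrangement is residual vector quantization, where each quantizer encodes what the previous ones left over and the quantized vector is the sum of one code vector per codebook. The sketch below assumes that scheme and uses tiny hand-made codebooks in place of learned ones.

```python
from typing import List, Sequence

Vector = List[float]

def nearest_code(codebook: Sequence[Vector], v: Vector) -> int:
    """Index of the code vector closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))

def rvq_encode(v: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Each quantizer encodes the residual left by the previous quantizers."""
    residual = list(v)
    codes = []
    for codebook in codebooks:
        i = nearest_code(codebook, residual)
        codes.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return codes

def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """The quantized feature vector is the sum of one code vector per quantizer."""
    out = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out

codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],  # coarse quantizer
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],  # fine quantizer
]
codes = rvq_encode([1.2, 1.0], codebooks)
print(codes, rvq_decode(codes, codebooks))
```

Each feature vector is then transmitted as one small integer per quantizer, which is the coded representation that a final compression step would pack further.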

Publication date: 05-01-2023

Compressing audio waveforms using neural networks and vector quantizers

Number: WO2023278889A1
Assignee: Google LLC

Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.

Publication date: 25-01-2023

Speaker verification using dynamically generated phrases

Number: EP3392880B1
Assignee: Google LLC

Publication date: 15-06-2023

Compressing audio waveforms using neural networks and vector quantizers

Number: US20230186927A1
Assignee: Google LLC

Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.

Publication date: 24-06-2021

Training keyword spotters

Number: WO2021127064A1
Assignee: Google LLC

A method (300) of training a custom hotword model (200) includes receiving a first set of training audio samples (134). The method also includes generating, using a speech embedding model (202) configured to receive the first set of training audio samples as input, a corresponding hotword embedding (208) representative of a custom hotword (132) for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples (144) with a greater number of training audio samples than the first set of training audio samples. The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data (12). The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.

Publication date: 08-11-2023

Frequency based audio analysis using neural networks

Number: EP3440598B1
Assignee: Google LLC

Publication date: 17-01-2019

Object detection using neural network systems

Number: WO2019014625A1
Assignee: Google LLC

Systems, methods, and apparatus, including computer programs encoded on a computer storage medium. In one aspect, a system includes initial neural network layers configured to: receive an input image, and process the input image to generate a plurality of first feature maps that characterize the input image; a location generating convolutional neural network layer configured to perform a convolution on the representation of the first plurality of feature maps to generate data defining a respective location of each of a predetermined number of bounding boxes in the input image, wherein each bounding box identifies a respective first region of the input image; and a confidence score generating convolutional neural network layer configured to perform a convolution on the representation of the first plurality of feature maps to generate a confidence score for each of the predetermined number of bounding boxes in the input image.
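A minimal sketch of the two convolutional heads, assuming an SSD-style reading of the abstract: a fixed number of default boxes per feature-map location, with one head predicting four location offsets per box and the other a confidence score. A 1x1 kernel is used for brevity, the weights are random placeholders rather than trained parameters, and decoding offsets into boxes and non-maximum suppression are omitted.

```python
import math
import random
from typing import List, Tuple

Grid = List[List[float]]

def conv1x1(feature_maps: List[Grid], weights: List[List[float]]) -> List[Grid]:
    """1x1 convolution: [C][H][W] input, [out][C] weights -> [out][H][W] output."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [[sum(wc * fm[y][x] for wc, fm in zip(w_row, feature_maps)) for x in range(width)]
         for y in range(height)]
        for w_row in weights
    ]

def detection_heads(feature_maps: List[Grid], boxes_per_location: int, seed: int = 0):
    """Return one ((y, x), box index, 4 offsets, confidence) tuple per default box."""
    rng = random.Random(seed)
    channels, k = len(feature_maps), boxes_per_location
    # Hypothetical random weights stand in for trained parameters.
    loc_weights = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(4 * k)]
    conf_weights = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(k)]
    loc = conv1x1(feature_maps, loc_weights)      # 4 offsets per default box
    logits = conv1x1(feature_maps, conf_weights)  # 1 confidence logit per default box
    results: List[Tuple[Tuple[int, int], int, Tuple[float, ...], float]] = []
    for y in range(len(feature_maps[0])):
        for x in range(len(feature_maps[0][0])):
            for a in range(k):
                offsets = tuple(loc[4 * a + i][y][x] for i in range(4))
                confidence = 1.0 / (1.0 + math.exp(-logits[a][y][x]))
                results.append(((y, x), a, offsets, confidence))
    return results

feature_maps = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]  # C=2, H=W=2
boxes = detection_heads(feature_maps, boxes_per_location=3)
print(len(boxes))  # 2 x 2 locations x 3 boxes = 12
```

The "predetermined number of bounding boxes" falls out of the layout: feature-map height times width times boxes per location, regardless of image content.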

Publication date: 07-09-2022

Training keyword spotters

Number: EP4052252A1
Assignee: Google LLC

A method (300) of training a custom hotword model (200) includes receiving a first set of training audio samples (134). The method also includes generating, using a speech embedding model (202) configured to receive the first set of training audio samples as input, a corresponding hotword embedding (208) representative of a custom hotword (132) for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples (144) with a greater number of training audio samples than the first set of training audio samples. The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data (12). The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.

Publication date: 27-07-2022

Automated mining of real-world audio training data

Number: EP4032085A1
Inventor: Dominik Roblek
Assignee: Google LLC

Methods, systems, and apparatus, for generating labeled training examples for machine learning. In one aspect, a method includes receiving sets of audio recordings by a user device. For each set of audio recordings, each audio recording in the set is recorded over a respective separate microphone in the user device during a particular time interval, and each particular time interval is different for each set of audio recordings. For each set of audio recordings, a detector determines whether an audio recording in the set of audio recordings includes a particular audio feature, and whether another one of the audio recordings does not include the particular audio feature. For each set of audio recordings determined to include an audio recording that includes the particular audio feature and to include another audio recording that does not include the particular audio feature, a labeled training example is generated.

Publication date: 19-07-2023

Automated mining of real-world audio training data

Number: EP4032085B1
Author: Dominik Roblek
Assignee: Google LLC

Publication date: 27-05-2021

Automated mining of real-world audio training data

Number: WO2021101501A1
Author: Dominik Roblek
Assignee: Google LLC

Methods, systems, and apparatus, for generating labeled training examples for machine learning. In one aspect, a method includes receiving sets of audio recordings by a user device. For each set of audio recordings, each audio recording in the set is recorded over a respective separate microphone in the user device during a particular time interval, and each particular time interval is different for each set of audio recordings. For each set of audio recordings, a detector determines whether an audio recording in the set of audio recordings includes a particular audio feature, and whether another one of the audio recordings does not include the particular audio feature. For each set of audio recordings determined to include an audio recording that includes the particular audio feature and to include another audio recording that does not include the particular audio feature, a labeled training example is generated.

Publication date: 10-01-2024

Machine learning based enhancement of audio for a voice call

Number: EP4302295A1
Assignee: Google LLC

Apparatus and methods related to enhancement of audio content are provided. An example method includes receiving, by a computing device and via a communications network interface, a compressed audio data frame, wherein the compressed audio data frame is received after transmission over a communications network. The method further includes decompressing the compressed audio data frame to extract an audio waveform. The method also includes predicting, by applying a neural network to the audio waveform, an enhanced version of the audio waveform, wherein the neural network has been trained on (i) a ground truth sample comprising unencoded audio waveforms prior to compression by an audio encoder, and (ii) a training dataset comprising decoded audio waveforms after compression of the unencoded audio waveforms by the audio encoder. The method additionally includes providing, by an audio output component of the computing device, the enhanced version of the audio waveform.
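The training data described in (i) and (ii) pairs each unencoded waveform with its encoded-then-decoded version. The network learns to map the latter back to the former. Here is a minimal sketch of building such pairs. It assumes a toy uniform quantizer as a stand-in for a real audio encoder and decoder; the neural network itself is not shown.

```python
import math

def toy_codec(waveform, step=0.25):
    """Hypothetical lossy 'encode + decode': uniform quantization to multiples of step."""
    return [step * round(x / step) for x in waveform]

def make_training_pairs(clean_waveforms, codec=toy_codec):
    """(network input, target) pairs: decoded audio in, unencoded ground truth out."""
    return [(codec(w), w) for w in clean_waveforms]

def mse(a, b):
    """Mean squared error between two equal-length waveforms."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# A 10 ms, 440 Hz tone at an assumed 16 kHz sample rate.
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
pairs = make_training_pairs([clean])
decoded, target = pairs[0]
print(f"codec error the enhancer should learn to undo: {mse(decoded, target):.5f}")
```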

Publication date: 09-09-2022

Machine learning based enhancement of audio for a voice call

Number: WO2022186838A1
Assignee: Google LLC

Apparatus and methods related to enhancement of audio content are provided. An example method includes receiving, by a computing device and via a communications network interface, a compressed audio data frame, wherein the compressed audio data frame is received after transmission over a communications network. The method further includes decompressing the compressed audio data frame to extract an audio waveform. The method also includes predicting, by applying a neural network to the audio waveform, an enhanced version of the audio waveform, wherein the neural network has been trained on (i) a ground truth sample comprising unencoded audio waveforms prior to compression by an audio encoder, and (ii) a training dataset comprising decoded audio waveforms after compression of the unencoded audio waveforms by the audio encoder. The method additionally includes providing, by an audio output component of the computing device, the enhanced version of the audio waveform.

Publication date: 06-06-2024

Generating coded data representations using neural networks and vector quantizers

Number: US20240185870A1
Assignee: Google LLC

Methods, systems and apparatus, including computer programs encoded on computer storage media. According to one aspect, there is provided a method comprising: receiving a new input; processing the new input using an encoder neural network to generate a feature vector representing the new input; and generating a coded representation of the feature vector using a sequence of vector quantizers that are each associated with a respective codebook of code vectors, wherein the coded representation of the feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector.
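The abstract does not say how the sequence of vector quantizers is chained. A common choice, assumed here, is residual vector quantization: each quantizer codes what the previous ones left over. The coded representation is then one code-vector index per quantizer, and the quantized feature vector is the sum of the selected code vectors. The codebooks below are hypothetical toy values.

```python
def nearest(codebook, v):
    """Index of the code vector closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))

def rvq_encode(feature, codebooks):
    """Residual VQ: each quantizer codes the residual left by the previous ones."""
    residual = list(feature)
    indices = []
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices

def rvq_decode(indices, codebooks):
    """Sum the selected code vectors to get the quantized feature vector."""
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out

codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],   # coarse quantizer
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],   # refines the residual
]
codes = rvq_encode([1.2, 1.05], codebooks)
print(codes, rvq_decode(codes, codebooks))
```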

Publication date: 21-05-2024

Compressing audio waveforms using neural networks and vector quantizers

Number: US11990148B2
Assignee: Google LLC

Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
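The final step compresses the per-frame coded representations. The abstract does not say how. The sketch below assumes the simplest lossless option: packing each codebook index into a fixed number of bits. An entropy coder could shrink the payload further. Frame and codebook sizes are hypothetical.

```python
def pack_indices(frames, bits_per_index):
    """Pack per-frame lists of codebook indices into bytes, fixed width per index."""
    acc, nbits = 0, 0
    for indices in frames:
        for i in indices:
            if not 0 <= i < (1 << bits_per_index):
                raise ValueError(f"index {i} does not fit in {bits_per_index} bits")
            acc = (acc << bits_per_index) | i
            nbits += bits_per_index
    pad = (-nbits) % 8  # zero-pad to a whole number of bytes
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")

def unpack_indices(data, bits_per_index, num_quantizers, num_frames):
    """Inverse of pack_indices, given the shape of the coded representation."""
    total = bits_per_index * num_quantizers * num_frames
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - total)  # drop padding
    mask = (1 << bits_per_index) - 1
    flat = [(acc >> (total - bits_per_index * (k + 1))) & mask
            for k in range(num_quantizers * num_frames)]
    return [flat[f * num_quantizers:(f + 1) * num_quantizers] for f in range(num_frames)]

# Two frames, three quantizers each, 4-bit indices (16-entry codebooks, assumed).
frames = [[3, 15, 0], [7, 1, 9]]
payload = pack_indices(frames, 4)
print(len(payload), unpack_indices(payload, 4, 3, 2))
```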

Publication date: 27-03-2024

Compressing audio waveforms using neural networks and vector quantizers

Number: EP4341932A1
Assignee: Google LLC

Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.

Publication date: 29-06-2022

Multi-task adapter neural networks

Number: EP4018385A1
Assignee: Google LLC

A system including a multi-task adapter neural network for performing multiple machine learning tasks is described. The adapter neural network is configured to receive a shared input for the machine learning tasks, and process the shared input to generate, for each of the machine learning tasks, a respective predicted output. The adapter neural network includes (i) a shared encoder configured to receive the shared input and to process the shared input to extract shared feature representations for the machine learning tasks, and (ii) multiple task-adapter encoders, each of the task-adapter encoders being associated with a respective machine learning task in the machine learning tasks and configured to: receive the shared input, receive the shared feature representations from the shared encoder, and process the shared input and the shared feature representations to generate the respective predicted output for the respective machine learning task.
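The data flow in this abstract can be shown as a toy forward pass. A shared encoder turns the shared input into shared features. Each task adapter then receives both the shared input and those shared features. Three parts are assumptions: the single-layer encoder, the use of linear projections, and combining the two projections by addition. The weights are hypothetical.

```python
def linear(weights, v):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def relu(v):
    return [max(0.0, x) for x in v]

def multitask_forward(x, shared_weights, adapters):
    """One forward pass of a toy multi-task adapter network.

    shared_weights: weights of the shared encoder (hypothetical single layer).
    adapters: task name -> (input_weights, feature_weights) for that task's adapter.
    """
    shared = relu(linear(shared_weights, x))  # shared feature representation
    outputs = {}
    for task, (input_w, feature_w) in adapters.items():
        # Each adapter receives BOTH the shared input and the shared features;
        # summing the two projections is an illustrative assumption.
        from_input = linear(input_w, x)
        from_shared = linear(feature_w, shared)
        outputs[task] = [a + b for a, b in zip(from_input, from_shared)]
    return outputs

x = [1.0, 2.0]
shared_weights = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "keyword": ([[1.0, 1.0]], [[1.0, 0.0]]),
    "speaker": ([[0.0, 0.0]], [[0.0, 1.0]]),
}
print(multitask_forward(x, shared_weights, adapters))
```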

Publication date: 09-11-2016

Segment-based speaker verification using dynamically generated phrases

Number: EP3090428A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.
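The phrase-selection step can be sketched as follows. Once the subwords for this attempt are identified, the system returns a candidate phrase that contains at least some of them. The abstract does not give a selection rule. The sketch assumes candidates come with precomputed subword sets and picks the one covering the most identified subwords. The phrases and syllable-like subwords are hypothetical.

```python
def choose_verification_phrase(candidates, wanted_subwords):
    """Pick the candidate phrase that covers the most identified subwords.

    candidates: phrase -> set of subwords it contains (assumed precomputed).
    wanted_subwords: subwords identified for this verification request.
    Returns None if no candidate contains any of the wanted subwords.
    """
    wanted = set(wanted_subwords)
    # max() returns the first candidate among ties, so dict order breaks ties.
    best = max(candidates, key=lambda phrase: len(candidates[phrase] & wanted))
    return best if candidates[best] & wanted else None

candidates = {
    "open sesame": {"o", "pen", "se", "sa", "me"},
    "pen and paper": {"pen", "and", "pa", "per"},
    "paper moon": {"pa", "per", "moon"},
}
print(choose_verification_phrase(candidates, {"pa", "per", "moon", "pen"}))
```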

Publication date: 08-03-2018

Segment-based speaker verification using dynamically generated phrases

Number: JP2018036675A
Assignee: Google LLC

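[Problem] Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying the identity of a user. [Solution] The methods, systems, and apparatus include an action of receiving a request for a verification phrase for verifying the identity of a user. Further actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase, and, in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user. [Selected drawing] FIG. 1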

Publication date: 01-01-2020

Segment-based speaker verification using dynamically generated phrases

Number: EP3401906B1
Assignee: Google LLC

Publication date: 03-10-2018

Segment-based speaker verification using dynamically generated phrases

Number: EP3154056B1
Assignee: Google LLC

Publication date: 10-06-2020

Segment-based speaker verification using dynamically generated phrases

Number: EP3664082A1
Assignee: Google LLC

A method comprising: obtaining, by data processing hardware, audio data representing a particular user speaking a phrase, the phrase comprising at least one subword for which no stored audio data representing the user speaking the subword has been obtained; receiving, at data processing hardware, a request to verify an identity of an unverified user; in response to receiving the request to verify the identity of the unverified user, prompting, by the data processing hardware, the unverified user to speak a verification phrase; receiving, at the data processing hardware, verification audio data representing the unverified user speaking the verification phrase; determining, by the data processing hardware, whether the unverified user speaking the verification phrase comprises the particular user who spoke the phrase based on the audio data and the verification audio data; and in response to determining that the unverified user speaking the verification phrase comprises the particular user who spoke the phrase, verifying, by the data processing hardware, an identity of the unverified user as the particular user.
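The final decision compares the stored audio of the particular user with the verification audio of the unverified user. The abstract does not specify how. The sketch below assumes both recordings have already been reduced to fixed-length speaker embeddings (not shown) and compares them by cosine similarity against a hypothetical threshold.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def verify_speaker(enrolled, attempt, threshold=0.8):
    """Accept the unverified user if their embedding is close to the enrolled one.

    enrolled / attempt: embeddings (assumed) computed from the stored phrase
    audio and from the verification-phrase audio; threshold is hypothetical.
    """
    return cosine(enrolled, attempt) >= threshold

print(verify_speaker([1.0, 0.0, 1.0], [0.9, 0.1, 1.0]))
```

In practice the threshold trades false accepts against false rejects and would be tuned on held-out data.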

Publication date: 06-06-2018

Segment-based speaker verification using dynamically generated phrases

Number: EP3090428B1
Assignee: Google LLC

Publication date: 08-10-2015

Segment-based speaker verification using dynamically generated phrases

Number: WO2015153351A1
Assignee: GOOGLE INC.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request for the verification phrase for verifying the identity of the user, identifying subwords to be included in the verification phrase and in response to identifying the subwords to be included in the verification phrase, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request for the verification phrase for verifying the identity of the user.

Publication date: 12-09-2023

Self-supervised pitch estimation

Number: US11756530B2
Assignee: Google LLC

Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating, from a sample of audio data, two training samples by applying two different pitch shifts to the sample of audio training data. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. These known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the relative pitch output by the encoder is improved with respect to accuracy. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
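The abstract describes three pieces, each sketched below:

- Shifting a frequency-domain representation to create pitch-shifted copies.
- Comparing the difference of the two predicted pitches with the difference of the known shifts.
- Calibrating relative outputs against a few labeled absolute pitches.

Several details are assumptions: one bin per semitone, an absolute-difference loss, and a linear least-squares calibration. The encoder and its update step are not shown.

```python
def shift_log_spectrum(spectrum, bins):
    """Pitch-shift a log-frequency spectrum by moving it `bins` bins (zero fill)."""
    n = len(spectrum)
    out = [0.0] * n
    for i, v in enumerate(spectrum):
        j = i + bins
        if 0 <= j < n:
            out[j] = v
    return out

def relative_pitch_loss(pred1, pred2, shift1, shift2):
    """Penalize mismatch between the predicted and the known pitch difference."""
    return abs((pred1 - pred2) - (shift1 - shift2))

def calibrate(relative, absolute):
    """Least-squares fit absolute ~ a * relative + b from labeled samples."""
    n = len(relative)
    mx = sum(relative) / n
    my = sum(absolute) / n
    sxx = sum((x - mx) ** 2 for x in relative)
    sxy = sum((x - mx) * (y - my) for x, y in zip(relative, absolute))
    a = sxy / sxx
    return a, my - a * mx

# Two training copies of one (toy) spectrum, shifted by 0 and 2 semitone bins.
spectrum = [0.0, 1.0, 0.5, 0.0, 0.0]
view1, view2 = shift_log_spectrum(spectrum, 0), shift_log_spectrum(spectrum, 2)
print(view2, relative_pitch_loss(3.0, 1.0, 2, 0), calibrate([0, 1, 2], [60, 62, 64]))
```

Only pitch differences enter the loss, so the encoder learns relative pitch without labels. The few labeled samples fix the scale and offset afterwards.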

Publication date: 13-02-2019

Frequency based audio analysis using neural networks

Number: EP3440598A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for frequency based audio analysis using neural networks. One of the methods includes training a neural network that includes a plurality of neural network layers on training data, wherein the neural network is configured to receive frequency domain features of an audio sample and to process the frequency domain features to generate a neural network output for the audio sample, wherein the neural network comprises (i) a convolutional layer that is configured to map frequency domain features to logarithmic scaled frequency domain features, wherein the convolutional layer comprises one or more convolutional layer filters, and (ii) one or more other neural network layers having respective layer parameters that are configured to process the logarithmic scaled frequency domain features to generate the neural network output.
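In the abstract, a learned convolutional layer maps linear frequency-domain features to logarithmically scaled ones. As an illustration only, the sketch below builds the kind of fixed linear-to-log mapping such a layer is meant to realize: triangular filters with log-spaced centers. It is not the patented layer. The filter shape and sizes are assumptions. Note that the lowest filters can be narrower than one linear bin and end up empty, a known limitation of fixed log filterbanks that a learned layer could work around.

```python
def log_filterbank(num_linear_bins, num_log_bins, fmin_bin=1.0):
    """Triangular filters with log-spaced centers over linear-frequency bins."""
    fmax = num_linear_bins - 1
    ratio = (fmax / fmin_bin) ** (1.0 / (num_log_bins + 1))
    edges = [fmin_bin * ratio ** k for k in range(num_log_bins + 2)]
    bank = []
    for m in range(1, num_log_bins + 1):
        lo, center, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(num_linear_bins):
            if lo < k <= center:
                row.append((k - lo) / (center - lo))    # rising edge
            elif center < k < hi:
                row.append((hi - k) / (hi - center))    # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def apply_bank(bank, spectrum):
    """Map a linear-frequency spectrum to log-spaced band energies."""
    return [sum(w * s for w, s in zip(row, spectrum)) for row in bank]

bank = log_filterbank(257, 20)
bands = apply_bank(bank, [1.0] * 257)
print(len(bands), round(bands[-1], 2))
```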

Publication date: 16-11-2017

Frequency based audio analysis using neural networks

Number: WO2017196931A1
Assignee: Google LLC

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for frequency based audio analysis using neural networks. One of the methods includes training a neural network that includes a plurality of neural network layers on training data, wherein the neural network is configured to receive frequency domain features of an audio sample and to process the frequency domain features to generate a neural network output for the audio sample, wherein the neural network comprises (i) a convolutional layer that is configured to map frequency domain features to logarithmic scaled frequency domain features, wherein the convolutional layer comprises one or more convolutional layer filters, and (ii) one or more other neural network layers having respective layer parameters that are configured to process the logarithmic scaled frequency domain features to generate the neural network output.

Publication date: 05-06-2024

Segment-based speaker verification using dynamically generated phrases

Number: EP3664082B1
Assignee: Google LLC

Publication date: 04-09-2024

Compressing audio waveforms using neural networks and vector quantizers

Number: EP4425493A2
Assignee: Google LLC

Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.

Publication date: 01-10-2024

Automated mining of real-world audio training data

Number: US12106748B2
Author: Dominik Roblek
Assignee: Google LLC

Methods, systems, and apparatus, for generating labeled training examples for machine learning. In one aspect, a method includes receiving sets of audio recordings by a user device. For each set of audio recordings, each audio recording in the set is recorded over a respective separate microphone in the user device during a particular time interval, and each particular time interval is different for each set of audio recordings. For each set of audio recordings, a detector determines whether an audio recording in the set of audio recordings includes a particular audio feature, and whether another one of the audio recordings does not include the particular audio feature. For each set of audio recordings determined to include an audio recording that includes the particular audio feature and to include another audio recording that does not include the particular audio feature, a labeled training example is generated.

Publication date: 01-10-2024

Personalized entity repository

Number: US12108314B2
Assignee: Google LLC

Systems and methods are provided for a personalized entity repository. For example, a computing device comprises a personalized entity repository having fixed sets of entities from an entity repository stored at a server, a processor, and memory storing instructions that cause the computing device to identify fixed sets of entities that are relevant to a user based on context associated with the computing device, rank the fixed sets by relevancy, and update the personalized entity repository using selected sets determined based on the rank and on set usage parameters applicable to the user. In another example, a method includes generating fixed sets of entities from an entity repository, including location-based sets and topic-based sets, and providing a subset of the fixed sets to a client, the client requesting the subset based on the client's location and on items identified in content generated for display on the client.
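The client-side update in this abstract ranks fixed entity sets by relevance to the device context and then selects sets subject to usage parameters. Below is a minimal sketch with several hypothetical choices:

- Relevance is the overlap between a set's tags and the context tags.
- The only usage parameter is a maximum number of entities.
- Sets with zero relevance are never selected.

The set names and entities are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class EntitySet:
    name: str
    entities: Set[str]
    tags: Set[str]  # e.g. location- or topic-based tags (hypothetical)

def relevancy(entity_set: EntitySet, context_tags: Set[str]) -> int:
    """Hypothetical relevancy score: overlap between set tags and device context."""
    return len(entity_set.tags & context_tags)

def update_repository(sets: List[EntitySet], context_tags: Set[str],
                      max_entities: int) -> List[str]:
    """Rank fixed sets by relevancy and keep the best ones within an entity budget."""
    ranked = sorted(sets, key=lambda s: relevancy(s, context_tags), reverse=True)
    chosen, used = [], 0
    for s in ranked:
        if relevancy(s, context_tags) == 0:
            break  # remaining sets are irrelevant to the current context
        if used + len(s.entities) <= max_entities:
            chosen.append(s.name)
            used += len(s.entities)
    return chosen

sets = [
    EntitySet("paris-landmarks", {"Eiffel Tower", "Louvre"}, {"paris", "travel"}),
    EntitySet("jazz", {"Miles Davis", "John Coltrane", "Thelonious Monk"}, {"music"}),
    EntitySet("paris-cafes", {"cafe-1", "cafe-2", "cafe-3", "cafe-4"}, {"paris", "food"}),
]
print(update_repository(sets, {"paris", "travel"}, max_entities=5))
```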

Publication date: 09-07-2024

Aggregation of related media content

Number: US12033668B2
Assignee: Google LLC

Systems and methods for media aggregation are disclosed herein. The system includes a media system that can transform media items into one aggregated media item. A synchronization component synchronizes media items with respect to time. The synchronized media items can be analyzed and transformed into an aggregated media item for storage and/or display. In one implementation, the aggregated media item is capable of being displayed in multiple ways to create an enhanced and customizable viewing and/or listening experience.
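The abstract says a synchronization component aligns media items in time but not how. One common technique, assumed here, is to estimate the relative offset between two recordings of the same event by maximizing the cross-correlation of their audio over a range of lags. The signals are toy values.

```python
def best_offset(reference, other, max_lag):
    """Lag that best aligns `other` to `reference` by raw cross-correlation.

    A positive lag means reference[i] lines up with other[i - lag], i.e. `other`
    starts `lag` samples later on the reference timeline.
    """
    def score(lag):
        total = 0.0
        for i, r in enumerate(reference):
            j = i - lag
            if 0 <= j < len(other):
                total += r * other[j]
        return total
    return max(range(-max_lag, max_lag + 1), key=score)

reference = [0, 0, 1, 2, 1, 0, 0]
other = [1, 2, 1, 0, 0, 0, 0]   # same event, captured by a second device
print(best_offset(reference, other, 3))
```

Raw correlation can favor loud passages. A normalized variant is usually more robust on real recordings.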

Publication date: 14-02-2017

Generating correlation scores

Number: US09569405B2
Assignee: Google LLC

A computer-implemented method includes obtaining first and second binary vectors. For each of a plurality of vector locations in a first of j words in the first binary vector, the method includes shifting the binary values for the second binary vector so that a particular one of the binary values in the second binary vector is located at a vector location in a first of the k words in the second binary vector that matches the vector location in the first of j words in the first binary vector. For each of the j words in the first binary vector, the method includes aligning the second binary vector with the word in the first binary vector and determining a binary correlation score. A similarity of the first binary vector and the second binary vector can be determined based at least on one or more of the determined binary correlation scores.
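The core idea is to score the similarity of two binary vectors at several relative shifts by counting agreeing bits (XNOR followed by popcount). The sketch makes two simplifying assumptions. It treats each vector as a single Python integer instead of processing it word by word as the abstract describes. It also uses circular shifts.

```python
def agreement(a, b, n):
    """Number of equal bits in two n-bit vectors (XNOR followed by popcount)."""
    mask = (1 << n) - 1
    return n - bin((a ^ b) & mask).count("1")

def rotate_left(b, s, n):
    """Circularly shift an n-bit vector left by s bits."""
    mask = (1 << n) - 1
    s %= n
    return ((b << s) | (b >> (n - s))) & mask

def best_shift_score(a, b, n, max_shift):
    """Highest normalized agreement over shifts 0..max_shift, and the shift achieving it."""
    best_score, best_shift = -1.0, 0
    for s in range(max_shift + 1):
        score = agreement(a, rotate_left(b, s, n), n) / n
        if score > best_score:
            best_score, best_shift = score, s
    return best_score, best_shift

a = 0b10110010
b = ((a >> 3) | (a << 5)) & 0xFF   # a rotated right by 3 bits
print(best_shift_score(a, b, 8, 7))
```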

Publication date: 27-12-2016

Hold back and real time ranking of results in a streaming matching system

Number: US09529907B2
Assignee: Google LLC

A matching system receives probe audio samples for comparison to references of a data store. Comparisons are generated to determine a sufficient match for a portion or a first amount of the probe sample. Ranking scores are assigned to the resulting match references. The match references are retained, unless meeting a score threshold. Comparisons are continually generated with second amounts of the probe sample and the retained references are updated with further matching references assigned ranking scores. The retained results are merged and determined to satisfy a score threshold for release as outputted results for matching references.
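The hold-back loop can be sketched as follows. The probe grows as audio streams in. Each round's match scores are merged into the retained results, and a reference is released only once its score meets the threshold. The matcher is a hypothetical stand-in. Merging by keeping the maximum score per reference is an assumption.

```python
def streaming_match(chunks, match_fn, release_threshold):
    """Hold back candidate matches until their score meets the release threshold.

    match_fn(probe) -> {reference_id: score} is a hypothetical matcher.
    Returns (reference_id, score, chunks_consumed) for each released match.
    """
    retained = {}   # reference id -> best score so far (held back)
    released = []
    done = set()    # references already released
    probe = []
    for n, chunk in enumerate(chunks, start=1):
        probe.extend(chunk)  # compare a larger amount of the probe each round
        for ref, score in match_fn(probe).items():
            if ref not in done:
                # merge with earlier results; keeping the max is an assumption
                retained[ref] = max(score, retained.get(ref, 0.0))
        for ref in sorted(retained, key=retained.get, reverse=True):
            if retained[ref] >= release_threshold:
                released.append((ref, retained.pop(ref), n))
                done.add(ref)
    return released

REFERENCES = {"song-a": [1, 2, 3, 4], "song-b": [1, 2, 9, 9]}

def prefix_match(probe):
    """Hypothetical matcher: fraction of reference samples matched position by position."""
    return {rid: sum(p == r for p, r in zip(probe, seq)) / len(seq)
            for rid, seq in REFERENCES.items()}

print(streaming_match([[1, 2], [3, 4]], prefix_match, 0.9))
```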

Publication date: 05-11-2024

Training keyword spotters

Number: US12136412B2
Assignee: Google LLC

A method of training a custom hotword model includes receiving a first set of training audio samples. The method also includes generating, using a speech embedding model configured to receive the first set of training audio samples as input, a corresponding hotword embedding representative of a custom hotword for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples with a greater number of training audio samples than the first set of training audio samples. The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data. The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.
