Total found: 6872. Showing 100.

Publication date: 01-03-2012

System and method for subsequence matching

Number: US20120054196A1
Assignee: Hewlett Packard Development Co LP

An embodiment of a computer-executed method of subsequence matching is provided. The method comprises receiving a search string. A plurality of subsequences for the search string are stored in a tree structure. The tree structure comprises a plurality of nodes. Each of the plurality of nodes comprises a presence bit map, a sequence bit map, and a list of address pointers. The method further includes traversing the tree structure using the search string, the presence bit map, the sequence bit map, and the list of address pointers. Additionally, the method includes identifying, in linear time, the plurality of subsequences based on the search string, the presence bit map, the sequence bit map, and the list of address pointers.
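
A minimal Python sketch of such a node, under our own assumptions (none of these details come from the abstract): the alphabet is lowercase ASCII, the presence bitmap marks which child symbols exist, the child pointer list is indexed by a popcount over the presence bitmap, and the sequence bitmap marks symbols on which a stored subsequence ends. The hypothetical `prefix_matches` walks the search string once, so its cost is linear in the string length. It illustrates the bitmap-indexed node idea, not the patented method.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed symbol set


class BitmapNode:
    def __init__(self) -> None:
        self.presence = 0  # bit i set: a child exists for ALPHABET[i]
        self.sequence = 0  # bit i set: a stored subsequence ends on ALPHABET[i] (assumed meaning)
        self.children: list["BitmapNode"] = []  # child pointers, ordered by symbol index

    def slot(self, bit: int) -> int:
        # Position in the pointer list = number of present symbols below this one.
        return bin(self.presence & (bit - 1)).count("1")

    def add_child(self, bit: int) -> "BitmapNode":
        if not self.presence & bit:
            self.children.insert(self.slot(bit), BitmapNode())
            self.presence |= bit
        return self.children[self.slot(bit)]


def insert(root: BitmapNode, word: str) -> None:
    node = root
    for i, ch in enumerate(word):
        bit = 1 << ALPHABET.index(ch)
        if i == len(word) - 1:
            node.sequence |= bit  # the stored subsequence ends on this edge
        node = node.add_child(bit)


def prefix_matches(root: BitmapNode, text: str) -> list[str]:
    """Return every stored subsequence that is a prefix of `text`, in one pass."""
    matches, node = [], root
    for i, ch in enumerate(text):
        if ch not in ALPHABET:
            break
        bit = 1 << ALPHABET.index(ch)
        if not node.presence & bit:
            break
        if node.sequence & bit:
            matches.append(text[: i + 1])
        node = node.children[node.slot(bit)]
    return matches


root = BitmapNode()
for w in ("he", "hers", "his", "she"):
    insert(root, w)
print(prefix_matches(root, "hersheys"))  # ['he', 'hers']
```

Storing only the present children and locating them with a popcount keeps each node compact while still giving constant-time child lookup.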

Publication date: 12-04-2012

Modulo operation method and apparatus for same

Number: US20120089658A1
Assignee: SAMSUNG ELECTRONICS CO LTD

The present invention provides a modulo operation method. When the square of a divisor N is greater than or equal to a dividend C, the modulo operation method includes: determining the number of computation stages n satisfying 2^n < N ≤ 2^(n+1); performing an initialization operation by initializing a constant a to the smallest integer greater than or equal to half of N; performing a first operation by subtracting N·a (the product of N and a) from C when C is greater than or equal to N·a; and performing a second operation by assigning the smallest integer greater than or equal to half of a to a, wherein the value of C is output as the result of the modulo operation after the first operation and the second operation are repeated n times. In the first operation, when C is less than N·a, the value of C is unchanged. In the modulo operation method and apparatus of the present invention, the amount of computation in a modulo or division operation does not increase in linear proportion to the magnitude of the divisor N but in proportion to log N. As a result, the total amount of computation decreases and computation speed increases.
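
A minimal Python sketch of the staged conditional subtraction described above. Two choices are our own: the loop simply runs until the stage with a = 1 has been applied (which guarantees C < N on exit), and the dividend is restricted to 0 ≤ C < N², which keeps the invariant C < 2·N·a strict; the abstract's boundary case C = N² is rejected by this sketch. The function name `staged_mod` is hypothetical.

```python
def staged_mod(c: int, n: int) -> int:
    """Compute c mod n by conditionally subtracting n*a for a = ceil(n/2), ceil(a/2), ..., 1.

    Sketch assumption: 0 <= c < n*n, which keeps the invariant c < 2*n*a at every stage.
    """
    if n <= 0 or not 0 <= c < n * n:
        raise ValueError("sketch requires n > 0 and 0 <= c < n*n")
    a = (n + 1) // 2            # initialization: smallest integer >= n/2
    while True:
        if c >= n * a:          # first operation: conditional subtraction of n*a
            c -= n * a
        if a == 1:              # the a = 1 stage has run, so now c < n
            return c
        a = (a + 1) // 2        # second operation: a <- ceil(a/2)


print(staged_mod(24, 5))  # 4
```

Because a roughly halves every stage, the number of stages grows with log N rather than with N, which is the saving the abstract claims over repeated subtraction of N.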

Publication date: 21-06-2012

Modular exponentiation resistant against skipping attacks

Number: US20120159189A1
Inventor: Marc Joye
Assignee: Individual

An exponentiation method resistant against skipping attacks. A main idea of the present invention is to evaluate, in parallel with an exponentiation such as y = g^d, a value based on the exponent, e.g. f = d·1. These evaluations are performed using the same exponentiation algorithm by “gluing” together the group operations underlying the computation of y and f so that a perturbation to one operation also perturbs the other. This makes it possible to verify that f indeed equals d before returning the result. Also provided are an apparatus and a computer program product.
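
A minimal Python sketch of the idea, assuming left-to-right square-and-multiply for y = g^d mod p, with f = d·1 computed by the matching double-and-add in the same loop. Python cannot physically glue the two computations, so the hypothetical `skip_step` hook models a skipped loop iteration as hitting both at once, which is what the gluing is meant to guarantee.

```python
from typing import Optional


def glued_pow(g: int, d: int, p: int, skip_step: Optional[int] = None) -> int:
    """y = g^d mod p by left-to-right square-and-multiply, with f = d*1 evaluated in lockstep.

    `skip_step` is a hypothetical fault-injection hook: it skips one whole loop iteration,
    modelling a skip that (because the operations are glued) hits both computations.
    """
    if d < 0:
        raise ValueError("exponent must be non-negative")
    y, f = 1, 0                              # identities: 1 for multiplication, 0 for addition
    for i, bit in enumerate(bin(d)[2:]):     # exponent bits, most significant first
        if i == skip_step:
            continue                         # simulated skipping attack
        y, f = y * y % p, f + f              # square  <->  double
        if bit == "1":
            y, f = y * g % p, f + 1          # multiply by g  <->  add 1
    if f != d:                               # f equals d only if no step was skipped
        raise RuntimeError("fault detected: exponent check failed")
    return y
```

For any d > 0, skipping an iteration removes one bit from the recomputed value of f, so the final comparison with d fails and the result is withheld.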

Publication date: 28-06-2012

Performing Reciprocal Instructions With High Accuracy

Number: US20120166509A1
Assignee: Intel Corp

In one embodiment, the present invention includes a method for receiving a reciprocal instruction and an operand in a processor, accessing an entry of a lookup table based on a portion of the operand and the instruction, generating an encoder output based on a type of the reciprocal instruction and whether the reciprocal instruction is a legacy instruction, and selecting portions of the lookup table entry and input operand to be provided to a reciprocal logic unit based on the encoder output. Other embodiments are described and claimed.
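
A minimal Python sketch of the lookup-table part only: the leading mantissa bits of the operand index a table of reciprocals, and the exponent is handled by scaling. The table size `TABLE_BITS` and the midpoint-based entries are our assumptions; the encoder, legacy-instruction handling, and reciprocal logic unit from the abstract are not modelled.

```python
import math

TABLE_BITS = 6  # hypothetical table size: 2**6 entries indexed by leading mantissa bits
# Entry k holds 1/midpoint of the k-th sub-interval of the mantissa range [0.5, 1).
TABLE = [1.0 / (0.5 + (k + 0.5) / 2 ** (TABLE_BITS + 1)) for k in range(2 ** TABLE_BITS)]


def approx_reciprocal(x: float) -> float:
    """Table-based reciprocal estimate of a positive finite x."""
    m, e = math.frexp(x)                            # x = m * 2**e with 0.5 <= m < 1
    index = int((m - 0.5) * 2 ** (TABLE_BITS + 1))  # leading mantissa bits select the entry
    return math.ldexp(TABLE[index], -e)             # 1/x = (1/m) * 2**-e
```

With 64 midpoint entries the relative error stays below 2^-7; hardware typically refines such a seed further, but that step is outside this sketch.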

Publication date: 19-07-2012

Semiconductor apparatus and semiconductor system including random code generation circuit, and data programming method

Number: US20120185654A1
Assignee: Hynix Semiconductor Inc

A semiconductor apparatus includes a plurality of linear feedback shift registers configured to receive a plurality of seed codes as initial values and generate respective random codes under a control of a clock signal, a code combination section configured to logically combine the plurality of random codes generated by the plurality of linear feedback shift registers and generate a final random code, and a data conversion unit configured to convert input data based on the final random code and output conversion data.
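
A minimal Python sketch of the structure: several Fibonacci LFSRs clocked together from different seed codes, a code combination step (assumed here to be XOR), and a data conversion step (assumed here to be XOR of the data with the final code, so applying it twice restores the data). The seeds and tap positions are illustrative only; bit 0 is always tapped so the state update is invertible and a non-zero state never collapses to zero.

```python
from itertools import islice

WIDTH = 16
SEEDS = (0xACE1, 0x1234, 0xBEEF)                    # hypothetical seed codes
TAPS = ((0, 2, 3, 5), (0, 1, 3, 12), (0, 4, 9, 15))  # illustrative taps (bit 0 always tapped)


def lfsr_bits(seed: int, taps: tuple[int, ...], width: int = WIDTH):
    """Fibonacci LFSR: yields one pseudo-random bit per clock."""
    state = seed & ((1 << width) - 1)
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        yield state & 1
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (width - 1))


def final_code(n_bits: int) -> list[int]:
    """XOR-combine the outputs of all LFSRs, clock by clock, into one random code."""
    code = [0] * n_bits
    for seed, taps in zip(SEEDS, TAPS):
        for i, bit in enumerate(islice(lfsr_bits(seed, taps), n_bits)):
            code[i] ^= bit
    return code


def convert(data: bytes) -> bytes:
    """Scramble (or, applied again, unscramble) data by XOR with the final random code."""
    code = final_code(8 * len(data))
    out = bytearray()
    for i, byte in enumerate(data):
        key = 0
        for bit in code[8 * i: 8 * i + 8]:
            key = (key << 1) | bit
        out.append(byte ^ key)
    return bytes(out)
```

Because XOR is its own inverse, the same seeds regenerate the same code when the stored data is read back.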

Publication date: 06-09-2012

System and Method for Testing Whether a Result is Correctly Rounded

Number: US20120226730A1
Inventor: Alexandru Fit-Florea
Assignee: Nvidia Corp

A computer-implemented method for executing a floating-point calculation where an exact value of an associated result cannot be expressed as a floating-point value is disclosed. The method involves: generating an estimate of the associated result and storing the estimate in memory; calculating an amount of error for the estimate; determining whether the amount of error is less than or equal to a threshold of error for the associated result; and if the amount of error is less than or equal to the threshold of error, then concluding that the estimate of the associated result is a correctly rounded result of the floating-point calculation; or if the amount of error is greater than the threshold of error, then testing whether the floating-point calculation constitutes an exception case.
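
A minimal Python sketch of the error-versus-threshold test for round-to-nearest, using exact rational arithmetic from `fractions`. Our assumptions: the threshold is half the gap to the neighbouring float on the side where the exact value lies (which matters at powers of two, where the gap below is half the gap above), and an exact tie is reported as not confirmed, i.e. left to the slower exception path. `math.nextafter` requires Python 3.9+; infinities and the largest finite float are out of scope.

```python
import math
from fractions import Fraction


def is_correctly_rounded(estimate: float, exact: Fraction) -> bool:
    """True if `estimate` is the round-to-nearest value of `exact` (finite values only)."""
    est = Fraction(estimate)
    direction = math.inf if exact >= est else -math.inf
    neighbour = Fraction(math.nextafter(estimate, direction))
    threshold = abs(neighbour - est) / 2   # half the gap on the side where `exact` lies
    return abs(est - exact) < threshold    # a tie falls through to the exception path


def divide_checked(a: float, b: float) -> tuple[float, bool]:
    estimate = a / b                       # stands in for a fast hardware estimate
    exact = Fraction(a) / Fraction(b)      # exact rational value of the quotient
    return estimate, is_correctly_rounded(estimate, exact)
```

Using a fixed half-ulp threshold instead would wrongly accept 1.0 as the rounding of values just below 1 that are actually closer to the next float down.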

Publication date: 13-09-2012

System and method of bypassing unrounded results in a multiply-add pipeline unit

Number: US20120233234A1
Assignee: Oracle International Corp

A processing unit, system, and method for performing a multiply operation in a multiply-add pipeline. To reduce the pipeline latency, the unrounded result of a multiply-add operation is bypassed to the inputs of the multiply-add pipeline for use in a subsequent operation. If it is determined that rounding is required for the prior operation, then the rounding will occur during the subsequent operation. During the subsequent operation, a Booth encoder not utilized by the multiply operation will output a rounding correction factor as a selection input to a Booth multiplexer not utilized by the multiply operation. When the Booth multiplexer receives the rounding correction factor, the Booth multiplexer will output a rounding correction value to a carry save adder (CSA) tree, and the CSA tree will generate the correct sum from the rounding correction value and the other partial products.
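
A minimal Python sketch of why the rounding can be deferred, under our reading of the abstract: if the bypassed result y (an integer count of its units in the last place) should have been rounded up by one unit, then x·(y + 1) = x·y + x, so one otherwise-unused Booth row that selects +1×x (or 0) supplies the correction as an extra partial product. Radix-4 Booth recoding is standard; summing the rows with `sum` stands in for the CSA tree. Function names are hypothetical.

```python
def booth_radix4_digits(y: int, width: int) -> list[int]:
    """Radix-4 Booth recoding of an unsigned `width`-bit multiplier into digits in -2..2."""
    digits, prev = [], 0                  # prev holds bit y[i-1]; y[-1] = 0
    for i in range(0, width + 1, 2):      # the last group reads zero bits above the top,
        b0, b1 = (y >> i) & 1, (y >> (i + 1)) & 1  # so y is treated as unsigned
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def multiply_bypassed(x: int, y_unrounded: int, round_up: bool, width: int) -> int:
    """x * (y_unrounded + round_up) without first rounding y.

    Booth rows cover x * y_unrounded; one extra row selects +1*x (or 0) as the
    rounding correction, and all rows are summed (a CSA tree in hardware).
    """
    rows = [d * x * 4 ** j for j, d in enumerate(booth_radix4_digits(y_unrounded, width))]
    rows.append(x if round_up else 0)     # correction row: x * (y + 1) = x * y + x
    return sum(rows)
```

The correction row has the same shape as any other Booth partial product, which is why it can reuse an idle Booth encoder and multiplexer.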

Publication date: 13-09-2012

Methods and systems for full pattern matching in hardware

Number: US20120233693A1
Assignee: Hewlett Packard Development Co LP

Methods and systems are provided for hardware-based pattern matching. In an embodiment, an intrusion-prevention system (IPS) identifies a full match between a subject data word comprising subject-data blocks and a signature data pattern comprising signature-data blocks. The IPS receives the subject data word via a network interface, and thereafter makes a partial-match determination that two or more but less than all of the subject-data blocks respectively match the same number of the signature-data blocks stored in partial-match hardware with respect to both value and position. Thereafter, the IPS makes a full-match determination that all of the subject-data blocks respectively match all of the signature-data blocks stored in the IPS's full-match hardware with respect to both value and position. The IPS then stores an indicator that the full-match determination has been made, and may carry out one or more additional intrusion-prevention responses as well.
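
A minimal Python sketch of the two-stage check, assuming fixed-size blocks and assuming the partial-match hardware holds only the first `partial_blocks` signature blocks (at least two, but fewer than all). Only subject words that pass the cheap partial stage reach the full comparison; both stages compare value and position. Names and block sizes are hypothetical.

```python
def split_blocks(word: bytes, block_size: int) -> list[bytes]:
    return [word[i:i + block_size] for i in range(0, len(word), block_size)]


def full_match(subject: bytes, signature: bytes, block_size: int = 2, partial_blocks: int = 2) -> bool:
    subj = split_blocks(subject, block_size)
    sig = split_blocks(signature, block_size)
    if len(subj) != len(sig) or not 2 <= partial_blocks < len(sig):
        raise ValueError("sketch assumes equal block counts and 2 <= partial_blocks < total")
    # Stage 1 (partial-match hardware): only the first `partial_blocks` signature blocks.
    if subj[:partial_blocks] != sig[:partial_blocks]:
        return False
    # Stage 2 (full-match hardware): every block must match in value and position.
    return subj == sig
```

A caller would record an indicator (and possibly trigger further responses) whenever `full_match` returns True.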

Publication date: 11-10-2012

Pipelined divide circuit for small operand sizes

Number: US20120259907A1
Assignee: Oracle International Corp

A pipelined circuit for performing a divide operation on small operand sizes. The circuit includes a plurality of stages connected together in a series to perform a subtractive divide algorithm based on iterative subtractions and shifts. Each stage computes two quotient bits and outputs a partial remainder value to the next stage in the series. The first and last stages utilize a radix-4 serial architecture with edge modifications to increase efficiency. The intermediate stages utilize a radix-4 parallel architecture. The divide architecture is pipelined such that input operands can be applied to the divider on each clock cycle.
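
A minimal Python sketch of the per-stage arithmetic: unsigned restoring division that brings down two dividend bits per stage and retires two quotient bits by comparing the partial remainder with 1×, 2× and 3× the divisor. Each loop iteration corresponds to one pipeline stage; the serial/parallel stage variants and edge modifications from the abstract are not modelled.

```python
def radix4_divide(dividend: int, divisor: int, width: int) -> tuple[int, int]:
    """Unsigned restoring division that retires two quotient bits per stage."""
    if divisor <= 0 or width % 2 or not 0 <= dividend < 2 ** width:
        raise ValueError("need divisor > 0, even width, 0 <= dividend < 2**width")
    remainder, quotient = 0, 0
    for shift in range(width - 2, -1, -2):    # one iteration = one pipeline stage
        remainder = (remainder << 2) | ((dividend >> shift) & 0b11)  # bring down two bits
        digit = next(k for k in (3, 2, 1, 0) if remainder >= k * divisor)
        remainder -= digit * divisor          # keeps remainder < divisor
        quotient = (quotient << 2) | digit    # two quotient bits per stage
    return quotient, remainder
```

Since the remainder is below the divisor before each shift, it is below 4× the divisor after it, so a quotient digit of 0 to 3 always suffices.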

Publication date: 15-11-2012

Finite field cryptographic arithmetic resistant to fault attacks

Number: US20120288086A1
Assignee: NXP BV

Various embodiments relate to a method for integrity protected calculation of a cryptographic function including: performing an operation c = a∘b in a cryptographic function f(x_1, x_2, ..., x_n) defined over a commutative ring R; choosing a′ and b′ corresponding to a and b such that a′ and b′ are elements of a commutative ring R′; computing c′ = a′∘′b′; computing a″ = CRT(a, a′) and b″ = CRT(b, b′), where CRT is the Chinese Remainder Theorem; computing c″ = a″∘″b″; mapping c″ into R′; and determining if the mapping of c″ into R′ equals c′.
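
A minimal Python sketch with concrete choices that are our own: R = Z_p with p = 2^61 − 1, R′ = Z_r with r = 65537 (both prime, hence coprime), ∘ is multiplication, and a′, b′ are drawn at random. The product is computed once in Z_(p·r); reducing it mod r must reproduce c′, and reducing it mod p gives the result in R. The `fault` parameter is a hypothetical hook that injects an additive error. `pow(p, -1, r)` needs Python 3.8+.

```python
import random


def crt(x: int, p: int, y: int, r: int) -> int:
    """The unique z mod p*r with z = x (mod p) and z = y (mod r); p and r coprime."""
    return (x + p * ((y - x) * pow(p, -1, r) % r)) % (p * r)


def checked_mul(a: int, b: int, p: int, r: int, fault: int = 0) -> int:
    """c = a*b mod p, cross-checked in the small ring Z_r."""
    a1, b1 = random.randrange(r), random.randrange(r)  # a', b' in R'
    c1 = a1 * b1 % r                                    # c' = a' o' b'
    a2, b2 = crt(a, p, a1, r), crt(b, p, b1, r)         # a'', b'' in Z_(p*r)
    c2 = (a2 * b2 + fault) % (p * r)                    # c'' = a'' o'' b'' (plus injected fault)
    if c2 % r != c1:                                    # map c'' into R' and compare with c'
        raise RuntimeError("fault detected")
    return c2 % p                                       # map c'' back into R
```

Any injected error that is not a multiple of r changes c″ mod r and is caught; the random a′, b′ make it hard for an attacker to predict which errors would slip through.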

Publication date: 25-07-2013

Pseudo-noise generator

Number: US20130191427A1
Inventor: Lewis Farrugia
Assignee: Astrium Ltd

The present invention relates to a pseudo-noise generator comprising a plurality of pseudo-random number generators and an averaging unit. The averaging unit is arranged to receive a plurality of pseudo-random numbers from the plurality of pseudo-random number generators, calculate a mean value of the plurality of pseudo-random numbers, and output said mean value as a digital pseudo-noise signal.
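
A minimal Python sketch of the averaging unit, assuming each generator is an independently seeded `random.Random` producing uniform values in [0, 1); subtracting 0.5 to centre the output is our own choice. The mean of k independent uniforms has variance 1/(12k) and, by the central limit theorem, tends toward a Gaussian shape as k grows.

```python
import random


def pseudo_noise(n_samples: int, seeds: list[int]) -> list[float]:
    """Average one uniform [0, 1) draw per generator per sample, centred on 0."""
    generators = [random.Random(s) for s in seeds]  # independent PRNG instances
    return [
        sum(g.random() for g in generators) / len(generators) - 0.5
        for _ in range(n_samples)
    ]
```

With eight generators the output variance is about 1/96, and the same seeds always reproduce the same noise sequence.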

Publication date: 09-01-2014

Systems, apparatuses, and methods for performing a horizontal add or subtract in response to a single instruction

Number: US20140013075A1
Assignee: Intel Corp

Embodiments of systems, apparatuses, and methods are described for performing, in a computer processor, a vector packed horizontal add or subtract of packed data elements in response to a single vector packed horizontal add or subtract instruction that includes a destination vector register operand, a source vector register operand, and an opcode.
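
A minimal Python sketch of what "horizontal" means here: elements of one packed source are combined with their neighbours inside the same vector, rather than with the corresponding elements of a second vector. Pairing adjacent elements is our assumption, modelled on familiar x86 horizontal add/subtract behaviour; the abstract does not specify the pairing, element width, or where results land in the destination.

```python
def horizontal(src: list[int], subtract: bool = False) -> list[int]:
    """Combine adjacent element pairs of one packed vector: [a0 op a1, a2 op a3, ...]."""
    if len(src) % 2:
        raise ValueError("vector length must be even")
    if subtract:
        return [src[i] - src[i + 1] for i in range(0, len(src), 2)]
    return [src[i] + src[i + 1] for i in range(0, len(src), 2)]


print(horizontal([1, 2, 3, 4, 5, 6, 7, 8]))  # [3, 7, 11, 15]
```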

Publication date: 16-01-2014

Method and apparatus for decimal floating-point data logical extraction

Number: US20140019506A1
Inventor: Shihjong J. Kuo
Assignee: Intel Corp

Embodiments of systems, apparatuses, and methods for performing BIDSplit instructions in a computer processor are described. In some embodiments, the execution of a BIDSplit instruction tests the encoding of a binary-integer decimal source value and extracts a sign, exponent, and/or significand into a destination.
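
A minimal Python sketch of the field extraction for one concrete format, the IEEE 754-2008 decimal64 type in binary integer decimal (BID) encoding (exponent bias 398). The function name `bid64_split` is hypothetical; the actual BIDSplit instruction's encoding tests and destination layout are not given in the abstract. Infinities and NaNs are simply rejected here.

```python
def bid64_split(bits: int) -> tuple[int, int, int]:
    """Split a decimal64 BID bit pattern into (sign, unbiased exponent, significand)."""
    sign = bits >> 63
    if (bits >> 61) & 0b11 != 0b11:          # common case: 10-bit exponent, 53-bit significand
        biased_exp = (bits >> 53) & 0x3FF
        significand = bits & ((1 << 53) - 1)
    elif (bits >> 59) & 0b11 != 0b11:        # large-coefficient case: implicit '100' prefix
        biased_exp = (bits >> 51) & 0x3FF
        significand = (0b100 << 51) | (bits & ((1 << 51) - 1))
    else:
        raise ValueError("infinity or NaN")
    if significand > 9_999_999_999_999_999:  # non-canonical coefficients read as zero
        significand = 0
    return sign, biased_exp - 398, significand


print(bid64_split(0x31C0000000000001))  # (0, 0, 1), i.e. the decimal value 1
```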

Publication date: 16-01-2014

Systems, apparatuses, and methods for performing a double blocked sum of absolute differences

Number: US20140019713A1
Assignee: Intel Corp

Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
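
A minimal Python sketch of the underlying reduction only: the sum of absolute differences (SAD), the block-matching cost widely used in video motion estimation, computed per fixed-size block of two packed operands. How the immediate selects blocks and what exactly "double block" pairs together are not stated in the abstract and are not modelled; the block size of 4 is an assumption.

```python
def blocked_sad(src1: list[int], src2: list[int], block: int = 4) -> list[int]:
    """Per-block sum of absolute differences between two packed vectors."""
    if len(src1) != len(src2) or len(src1) % block:
        raise ValueError("operands must have equal length, a multiple of the block size")
    return [
        sum(abs(x - y) for x, y in zip(src1[i:i + block], src2[i:i + block]))
        for i in range(0, len(src1), block)
    ]


print(blocked_sad([1, 2, 3, 4, 10, 10, 10, 10], [4, 3, 2, 1, 0, 0, 0, 0]))  # [8, 40]
```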

Publication date: 23-01-2014

SIMD integer addition including mathematical operation on masks

Number: US20140025717A1
Inventor: Sergey Lyalin
Assignee: Intel Corp

Methods, apparatuses, and articles associated with SIMD adding two integers are disclosed. In embodiments, a method may include element-wise SIMD adding corresponding elements of a first SIMD-sized integer (A) and a second SIMD-sized integer (B) to generate a SIMD-sized integer result (R) and a carry bit. A may have an integer size (SizeA), while B may have an integer size (SizeB). The addition, in response to SizeA greater than SizeB, may further include updating R and the carry bit in view of one or more elements of A that do not have corresponding element or elements of B. Further, element-wise SIMD adding may include performing one or more mathematical operations on first one or more masks, with the first one or more masks interpreted as integers, and interpreting one or more integer results of the one or more mathematical operations as second one or more masks.
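
A minimal Python sketch of carry resolution by arithmetic on masks, for the equal-size case only (the SizeA > SizeB handling is not shown). After the element-wise add, a "generate" mask marks lanes that overflowed and a "propagate" mask marks lanes that are all ones; treating both masks as integers, ((G << 1) + P) XOR P yields the carry-in of every lane at once, and its bit beyond the top lane is the final carry. The formula is a known technique we chose for illustration, not necessarily the claimed one; helper names are hypothetical.

```python
def to_lanes(value: int, lanes: int, bits: int) -> list[int]:
    """Split a non-negative integer into `lanes` limbs, least significant first."""
    mask = (1 << bits) - 1
    return [(value >> (bits * i)) & mask for i in range(lanes)]


def from_lanes(limbs: list[int], bits: int) -> int:
    return sum(limb << (bits * i) for i, limb in enumerate(limbs))


def simd_add(a: list[int], b: list[int], bits: int) -> tuple[list[int], int]:
    """Add two equal-length multi-lane integers; carries resolved by arithmetic on masks."""
    mask = (1 << bits) - 1
    raw = [x + y for x, y in zip(a, b)]                        # element-wise lane sums
    sums = [s & mask for s in raw]
    g = sum((s >> bits) << i for i, s in enumerate(raw))      # generate: lane overflowed
    p = sum(int(s == mask) << i for i, s in enumerate(sums))  # propagate: lane is all ones
    carry_in = ((g << 1) + p) ^ p                             # masks treated as integers
    result = [(s + ((carry_in >> i) & 1)) & mask for i, s in enumerate(sums)]
    return result, (carry_in >> len(a)) & 1                   # final carry-out bit
```

The integer addition on the masks plays the role of a carry chain across lanes, so no lane-by-lane loop over carries is needed.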

Publication date: 20-02-2014

Searchable Encrypted Data

Number: US20140052999A1
Assignee: VISA INTERNATIONAL SERVICE ASSOCIATION

Broadly described, embodiments of the invention introduce systems and methods for enabling the searching of encrypted data. One embodiment of the invention discloses a method for generating a searchable encrypted database. The method comprises receiving a plurality of sensitive data records comprising personal information of different users, identifying one or more searchable fields for the sensitive data records, wherein each searchable field is associated with a subset of the personal information for a user, generating a searchable field index for each of the one or more searchable fields, and encrypting the sensitive data records using a database encryption key.
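
A minimal Python sketch of one common way to build a searchable field index, the keyed "blind index": each normalized field value is turned into an HMAC-SHA256 token under a separate index key, so a query can be matched by recomputing its token without decrypting any record. This technique, the normalization rule, and the function names are our assumptions; the record encryption itself needs an authenticated cipher from a vetted library and is omitted.

```python
import hashlib
import hmac


def field_index(index_key: bytes, field: str, value: str) -> str:
    """Deterministic keyed token for one searchable field value."""
    message = f"{field}:{value.strip().lower()}".encode()  # normalization is an assumption
    return hmac.new(index_key, message, hashlib.sha256).hexdigest()


def build_index(index_key: bytes, records: list[dict], searchable: list[str]) -> dict:
    """Map each token to the ids of records whose field has that value."""
    index: dict[str, list[int]] = {}
    for rid, record in enumerate(records):
        for field in searchable:
            token = field_index(index_key, field, record[field])
            index.setdefault(token, []).append(rid)
    return index
```

A search computes `field_index(key, "email", query)` and looks it up in the index. Deterministic tokens reveal which records share a value, which is the usual trade-off of this design.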

Publication date: 27-02-2014

Arithmetic circuit for performing division based on restoring division

Number: US20140059106A1
Assignee: Fujitsu Ltd

An arithmetic circuit for performing division based on restoring division includes an intermediate remainder register configured to store an intermediate remainder, a quotient prediction circuit configured to perform, based on information about two most significant digits of the intermediate remainder and a most significant digit of a divisor, quotient prediction having lower precision than a highest precision obtainable from the information, thereby generating a prediction result, a fixed-value multiplication circuit configured to output one or more N-th (N: integer) multiples of the divisor selected in response to the prediction result, one or more subtracters configured to subtract, from the intermediate remainder, the one or more N-th multiples of the divisor output from the fixed-value multiplication circuit, and a partial quotient calculating circuit configured to obtain a partial quotient in response to one or more carry-out bits of one or more subtractions performed by the one or more subtracters.
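
A minimal Python sketch of prediction followed by parallel trial subtraction. The prediction rule is swapped in from Knuth's classic long-division estimate (two leading remainder digits divided by the leading divisor digit), which never underestimates and, for a normalized divisor whose leading digit is at least radix/2, overestimates by at most 2; the patent's own lower-precision rule is not given in the abstract. The three trial subtractions model the subtracters, and the first non-negative result (no borrow) selects the partial quotient.

```python
def to_digits(x: int, radix: int) -> list[int]:
    digits = []
    while True:
        x, d = divmod(x, radix)
        digits.append(d)
        if x == 0:
            return digits[::-1]                    # most significant digit first


def predicted_divide(dividend: int, divisor: int, radix: int = 10) -> tuple[int, int]:
    """Digit-by-digit restoring division with a cheap quotient-digit prediction."""
    n = len(to_digits(divisor, radix))
    scale = radix ** (n - 1)
    top = divisor // scale                         # most significant digit of the divisor
    if 2 * top < radix:
        raise ValueError("sketch assumes a normalized divisor (top digit >= radix/2)")
    remainder = quotient = 0
    for digit in to_digits(dividend, radix):
        remainder = remainder * radix + digit      # intermediate remainder, < radix * divisor
        q_hat = min(radix - 1, (remainder // scale) // top)  # two top digits / one top digit
        # Trial subtractions for k = q_hat, q_hat - 1, q_hat - 2 (computed side by side in hardware).
        trials = [(k, remainder - k * divisor) for k in range(q_hat, max(q_hat - 3, -1), -1)]
        # The first non-negative result (no borrow) gives the partial quotient.
        q_digit, remainder = next((k, r) for k, r in trials if r >= 0)
        quotient = quotient * radix + q_digit
    return quotient, remainder
```

Hardware normally guarantees the normalization by pre-shifting the divisor; the sketch just rejects divisors that are not normalized.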

Publication date: 06-03-2014

Converting a first address mapping function for mapping addresses to storage locations to a second address mapping function

Number: US20140068211A1
Assignee: International Business Machines Corp

Provided are a computer program product, system, and method for converting a first address mapping function for mapping addresses to storage locations to a second address mapping function. For each of a plurality of addresses allocated in the storage using the first address mapping function, a node is generated in the second address mapping function. Each node in the second address mapping function associates a logical address with a physical location for the logical address. A determination is made of addresses having unused space and storage space is freed for the determined addresses having the unused space. Indication is made in the second address mapping function that the storage space for the determined addresses has been freed.

Publication date: 13-03-2014

Method for testing the security of an electronic device against an attack, and electronic device implementing countermeasures

Number: US20140075203A1
Assignee: Oberthur Technologies SA

A method of testing security of an electronic device against a combination of a side-channel attack and a fault-injection attack implemented during a method of cryptographic processing that includes: delivering a message signature based on a secret parameter and implementing a recombination of at least two intermediate values according to the Chinese remainder theorem; and verifying the signature on the basis of at least one public exponent. The method of testing includes: transmitting a plurality of messages to be signed by said electronic device; disturbing each message, including modifying the message by inserting an identical error for each message, before executing a step of determining one of the intermediate values; and analyzing physical measurements, obtained during the step of verifying the signature as a function of the message to be signed, the identical error for each message, and an assumption of a value of part of the secret parameter.

Publication date: 06-01-2022

REPURPOSED HEXADECIMAL FLOATING POINT DATA PATH

Number: US20220004361A1
Assignee:

A method includes dividing a fraction of a floating point result into a first portion and a second portion. The method includes outputting a first normalizer result based on the first portion during a first clock cycle. The method includes storing a first segment of the first portion during the first clock cycle. The method includes outputting a first rounder result based on the first normalizer result during the first clock cycle. The method includes outputting a second normalizer result based on the second portion during a second clock cycle. The method includes outputting a second rounder result based on the second normalizer result and the first segment during the second clock cycle.

1. A method comprising: dividing a fraction of a floating point result into a first portion and a second portion; outputting a first normalizer result based on the first portion during a first clock cycle; storing a first segment of the first portion during the first clock cycle; outputting a first rounder result based on the first normalizer result during the first clock cycle; outputting a second normalizer result based on the second portion during a second clock cycle; and outputting a second rounder result based on the second normalizer result and the first segment during the second clock cycle.
2. The method of claim 1, wherein the first rounder result includes the first segment.
3. The method of claim 1, wherein the floating point result is a quadruple precision binary floating point number.
4. The method of claim 3, wherein the first normalizer result is normalized using a double precision hexadecimal floating point normalizer.
5. The method of claim 3, wherein the first rounder result is rounded using a double precision hexadecimal floating point rounder.
6. The method of claim 3, wherein the first normalizer result is 57 bits including a 48-bit fraction, a one-bit leading-zero-too-large flag, and the first segment.
7. The method of claim 3, ...

Publication date: 02-01-2020

MULTIPLICATION OPERATIONS IN MEMORY

Number: US20200004502A1
Inventor: Tiwari Sanjay
Assignee:

Examples of the present disclosure provide apparatuses and methods for performing multiplication operations in a memory. An example method comprises performing a multiplication operation on a first element stored in a group of memory cells coupled to a first access line and a number of sense lines of a memory array and a second element stored in a group of memory cells coupled to a second access line and the number of sense lines of the memory array. The method can include a number of operations performed without transferring data via an input/output (I/O) line.

1.-20. (canceled)
21. A method, comprising: multiplying, in parallel and on a memory device, a plurality of element pairs stored in a memory array, wherein the plurality of element pairs each comprise: one of a plurality of first elements stored in a first group of memory cells coupled to a first access line and to a number of sense lines of a memory array; and one of a plurality of second elements stored in a second group of memory cells coupled to a second access line and to the number of sense lines of the memory array; wherein multiplying the plurality of element pairs, in parallel, comprises performing a number of logical operations between constituent bits of the respective plurality of first elements and constituent bits of a mask bit vector stored in the memory array; and wherein the number of logical operations are performed using sense amplifiers coupled to the number of sense lines and compute components coupled to the sense amplifiers; and providing multiplication results corresponding to the respective element pairs.
22. The method of claim 21, wherein providing multiplication results corresponding to the respective element pairs comprises storing the multiplication results in the memory array.
23. The method of claim 22, wherein storing the multiplication results in the memory array comprises storing the multiplication results in a third group of memory cells coupled to a third access ...
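
A minimal Python sketch of bit-sliced ("vertical") multiplication, our model of the layout: each Python int is one row of the array, with bit j of row k holding bit k of the element stored in column j (one sense line per element). Shift-and-add then needs only AND, OR and XOR on whole rows, so every column is multiplied in parallel; the rows of b act as the mask bit vectors. The exact logical-operation sequence used in the patent is not given in the abstract, and the helper names are hypothetical.

```python
def to_rows(values: list[int], bits: int) -> list[int]:
    """Bit-slice: row k holds bit k of every element, one column per element."""
    return [sum(((v >> k) & 1) << col for col, v in enumerate(values)) for k in range(bits)]


def from_rows(rows: list[int], count: int) -> list[int]:
    return [sum(((row >> col) & 1) << k for k, row in enumerate(rows)) for col in range(count)]


def add_rows(x: list[int], y: list[int]) -> list[int]:
    """Ripple-carry addition of two bit-sliced operands using only AND/OR/XOR on whole rows."""
    out, carry = [], 0
    for xr, yr in zip(x, y):
        out.append(xr ^ yr ^ carry)
        carry = (xr & yr) | (carry & (xr ^ yr))
    return out + [carry]


def multiply_rows(a: list[int], b: list[int]) -> list[int]:
    """All column-wise products at once: shift-and-add, masking a with each bit row of b."""
    acc = [0] * (len(a) + len(b))
    for k, mask in enumerate(b):                       # mask: bit k of every b element
        partial = [0] * k + [row & mask for row in a]  # a AND mask, shifted by k rows
        partial += [0] * (len(acc) - len(partial))
        acc = add_rows(acc, partial)[: len(acc)]       # products fit in len(a)+len(b) rows
    return acc
```

The loop count depends only on the bit width, not on how many elements share the rows, which is where in-memory parallelism pays off.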

Publication date: 13-01-2022

INTEGRATED CIRCUITS WITH MACHINE LEARNING EXTENSIONS

Number: US20220012015A1
Assignee:

An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.

1. An integrated circuit, comprising: a multiplier data path operable in a floating-point mode and a fixed-point mode; and an adder data path configurable to receive signals from the multiplier data path only during the floating-point mode.
2. The integrated circuit of claim 1, wherein the multiplier data path is decomposed into a plurality of smaller multiplier data paths.
3. The integrated circuit of claim 2, wherein each of the smaller multiplier data paths comprises: a partial product generator; a compressor operable to receive signals from the partial product generator; and a carry-propagate adder operable to receive signals from the compressor and to generate a corresponding product.
4. The integrated circuit of claim 1, comprising an adder operable to receive signals from the multiplier data path and to feed signals into the adder data path.
5. The integrated circuit of claim 4, wherein the adder data path comprises an additional adder, and wherein the adder and the ...

Publication date: 13-01-2022

LOW-LATENCY DIGITAL SIGNATURE PROCESSING WITH SIDE-CHANNEL SECURITY

Number: US20220012334A1
Assignee: Intel Corporation

A low-latency digital signature with side-channel security is described. An example of an apparatus includes a coefficient multiplier circuit to perform polynomial multiplication, the coefficient multiplier circuit providing Number Theoretic Transform (NTT) and INTT (Inverse NTT) processing; and one or more accessory operation circuits coupled with the coefficient multiplier circuit, each of the one or more accessory operation circuits to perform a computation based at least in part on a result of an operation of the NTT/INTT coefficient multiplier circuit, wherein the one or more accessory operation circuits are to receive results of operations of the NTT/INTT coefficient multiplier circuit prior to the results being stored in a memory.

1. An apparatus comprising: a coefficient multiplier circuit to perform polynomial multiplication, the coefficient multiplier circuit providing Number Theoretic Transform (NTT) and INTT (Inverse NTT) processing; and one or more accessory operation circuits coupled with the coefficient multiplier circuit, each of the one or more accessory operation circuits to perform a computation based at least in part on a result of an operation of the NTT/INTT coefficient multiplier circuit; wherein the one or more accessory operation circuits are to receive results of operations of the NTT/INTT coefficient multiplier circuit prior to the results being stored in a memory.
2. The apparatus of claim 1, wherein the one or more accessory operation circuits are to perform the accessory operations in a same cycle as the operations of the NTT/INTT coefficient multiplier circuit.
3. The apparatus of claim 2, wherein the performance of the one or more accessory operations overlaps at least in part with one or more other operations of the apparatus.
4. The apparatus of claim 1, wherein the polynomial multiplication includes multiplying a private polynomial with a public polynomial.
5. The apparatus of claim 1, wherein the computation by the one or more accessory ...
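
A minimal Python sketch of NTT-based polynomial multiplication with an "accessory" operation fused onto the output stream. Toy parameters are our own: Q = 17, N = 8, and root 2, which has multiplicative order 8 modulo 17, giving cyclic convolution modulo x^8 − 1. Real lattice signature schemes typically use negacyclic rings and much larger parameters, and a naive O(N²) transform stands in for the butterfly hardware. Adding a third polynomial before anything is stored is our hypothetical example of an accessory computation.

```python
Q, N, OMEGA = 17, 8, 2  # toy parameters: 2 has multiplicative order 8 modulo 17


def ntt(coeffs: list[int], root: int) -> list[int]:
    """Naive O(N^2) number theoretic transform over Z_Q (evaluation at powers of `root`)."""
    return [sum(c * pow(root, i * j, Q) for j, c in enumerate(coeffs)) % Q for i in range(N)]


def intt(values: list[int]) -> list[int]:
    n_inv = pow(N, -1, Q)
    return [v * n_inv % Q for v in ntt(values, pow(OMEGA, -1, Q))]


def multiply_then_add(a: list[int], b: list[int], c: list[int]) -> list[int]:
    """a*b mod (x^N - 1, Q); the addition of c is applied to the INTT output
    before anything is stored, standing in for a fused accessory circuit."""
    product_hat = [x * y % Q for x, y in zip(ntt(a, OMEGA), ntt(b, OMEGA))]
    return [(p + ci) % Q for p, ci in zip(intt(product_hat), c)]


def schoolbook(a: list[int], b: list[int]) -> list[int]:
    """Reference cyclic product, for checking the transform-based result."""
    out = [0] * N
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[(i + j) % N] += x * y
    return [v % Q for v in out]
```

Fusing the accessory step onto the INTT output saves a memory write and read-back per coefficient, which is the latency argument made in the abstract.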

Publication date: 01-01-2015

METHOD AND APPARATUS FOR PROCESSING AND RECONSTRUCTING DATA

Number: US20150006598A1
Assignee:

Certain aspects of the present disclosure relate to a method for quantizing signals and reconstructing signals, and/or encoding or decoding data for storage or transmission. Points of a signal may be determined as local extrema or points where an absolute rise of the signal is greater than a threshold. The tread and value of the points may be quantized, and certain of the quantizations may be discarded before the quantizations are transmitted. After being received, the signal may be reconstructed from the quantizations using an iterative process.

1. A method for processing data, comprising: receiving a plurality of measurements indicative of a signal, the plurality of measurements comprising at least a tread and a value at points of the signal; reconstructing the signal based at least in part on the plurality of measurements; and modifying the value of at least one sample of the reconstructed signal based on the measurements.
2. The method of claim 1, wherein the modifying comprises determining local extrema of the reconstructed signal.
3. The method of claim 2, wherein the modifying comprises comparing the determined local extrema with the received measurements of the signal.
4. The method of claim 2, comprising discarding one or more of the determined local extrema if a difference between a value of the one or more determined local extrema and a value of a corresponding received measurement is greater than a threshold.
5. The method of claim 4, further comprising reconstructing the signal based on the determined local extrema that remain after the one or more of the determined local extrema are discarded.
6. The method of claim 5, comprising interpolating values of the signal near a first location corresponding to a discarded local extremum based on remaining local extrema located near the first location.
7. The method of claim 5, comprising interpolating values of the signal near a first location corresponding to a discarded local extremum based on values of the ...

Publication date: 01-01-2015

Method, device and circuit for pattern matching

Number: US20150007320A1
Assignee: International Business Machines Corp

A method for pattern matching finds a target pattern in a stream of patterns, where both the stream of patterns and the target pattern are composed of elements. The method includes: acquiring occurrence numbers of target elements in the target pattern; initializing a buffer that indicates a section of the stream of patterns; determining whether the occurrence numbers of the target elements in the buffer reach the occurrence numbers of the target elements in the target pattern; in response to determining that they do not, updating the buffer and returning to the determining step; and, in response to determining that they do, outputting the elements in the buffer for subsequent processing. A device and a circuit for pattern matching are also provided to increase the speed of pattern matching.
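
A minimal Python sketch of the loop above, under our assumptions: the buffer is a window that grows one element at a time from the current start, and once it holds at least as many of each target element as the target does, the window is emitted for subsequent exact matching and a new buffer starts right after it. The function name `candidate_sections` is hypothetical.

```python
from collections import Counter


def candidate_sections(stream: list, target: list):
    """Yield (start, section) pairs whose element counts reach the target's counts."""
    if not target:
        raise ValueError("target pattern must be non-empty")
    need = Counter(target)                       # occurrence numbers of target elements
    start = 0
    while start < len(stream):
        have, end = Counter(), start             # buffer = stream[start:end]
        while end < len(stream) and any(have[e] < k for e, k in need.items()):
            if stream[end] in need:
                have[stream[end]] += 1           # update the buffer, then re-check
            end += 1
        if any(have[e] < k for e, k in need.items()):
            return                               # stream exhausted without reaching the counts
        yield start, stream[start:end]           # hand the section to exact matching
        start = end


print(list(candidate_sections(list("xxabyybaxcab"), list("ab"))))
```

Counting is cheap compared with full comparison, so sections that cannot possibly contain the target are never sent to the exact matcher.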

Publication date: 20-01-2022

INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING

Number: US20220019431A1
Assignee: Intel Corporation

A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.

1. A graphics processing unit (GPU) comprising: a plurality of memory controllers; cache memory coupled with the plurality of memory controllers; and a graphics multiprocessor coupled with the cache memory and the plurality of memory controllers, the graphics multiprocessor having a single instruction, multiple thread (SIMT) architecture, wherein the graphics multiprocessor includes: a register file; and circuitry coupled with the register file, the circuitry including a first core to perform a mixed precision matrix operation and a second core to perform, in response to a single instruction, multiple compute operations, wherein the multiple compute operations include a first operation to perform a fused multiply-add and a second operation to apply a rectified linear unit function to a result of the first operation.
2. The GPU as in claim 1, wherein the first operation and the second operation are single instruction multiple data (SIMD) operations.
3. The GPU as in claim 1, wherein the multiple compute operations are performed on input in a 16-bit floating-point format having a 1-bit sign and an 8-bit exponent.
4. The GPU as in claim 3, wherein the second core includes a dynamic precision processing resource that is configurable to automatically convert input in a 32-bit floating point format to the 16-bit floating-point format in conjunction with execution of the single instruction.
5. The GPU as in claim 4, wherein the dynamic precision processing resource includes ...
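
A minimal Python sketch of the fused operation in claim 1 on inputs in the 16-bit format of claim 3. Reading that format as bfloat16 (1 sign, 8 exponent, 7 fraction bits) is our interpretation. Inputs are first rounded to float32 by `struct`, then to bfloat16 with round-to-nearest-even; NaNs and values outside the float32 range are out of scope, and the arithmetic runs in Python doubles rather than the hardware's internal precision.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a value to bfloat16 (1 sign, 8 exponent, 7 fraction bits), ties to even."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))  # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)                  # round to nearest, ties to even
    (y,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return y


def fma_relu(a: list[float], b: list[float], c: list[float]) -> list[float]:
    """Per-lane max(0, a*b + c) on bfloat16-rounded inputs."""
    return [max(0.0, to_bf16(x) * to_bf16(y) + to_bf16(z)) for x, y, z in zip(a, b, c)]


print(fma_relu([1.0, 2.0], [3.0, -4.0], [0.5, 1.0]))  # [3.5, 0.0]
```

Keeping the float32 exponent width is what lets this 16-bit format drop into float32 code paths by simple truncation or rounding of the low 16 bits.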

Publication date: 12-01-2017

ENTROPY SOURCE WITH MAGNETO-RESISTIVE ELEMENT FOR RANDOM NUMBER GENERATOR

Number: US20170010864A1
Assignee:

An entropy source and a random number (RN) generator are disclosed. In one aspect, a low-energy entropy source includes a magneto-resistive (MR) element and a sensing circuit. A static current is applied to the MR element, which has a variable resistance determined based on the magnetization of the MR element. The sensing circuit senses the resistance of the MR element and provides random values based on the sensed resistance of the MR element. In another aspect, an RN generator includes an entropy source and a post-processing module. The entropy source includes at least one MR element and provides first random values based on the at least one MR element. The post-processing module receives and processes the first random values (e.g., based on a cryptographic hash function, an error detection code, a stream cipher algorithm, etc.) and provides second random values having improved randomness characteristics.

1. An apparatus comprising: an entropy source comprising at least one magneto-resistive (MR) element and configured to provide first random values based on the at least one MR element; and a post-processing module configured to receive and process the first random values and provide second random values.
2. The apparatus of claim 1, wherein the post-processing module is configured to hash the first random values and provide the second random values.
3. The apparatus of claim 2, wherein the post-processing module is configured to hash the first random values based on a cryptographic hash function.
4. The apparatus of claim 1, wherein a total number of bits of each second random value is greater than a total number of bits of all first random values used to generate the second random value.
5. The apparatus of claim 1, the post-processing module comprising: a plurality of shift registers configured to receive a plurality of sequences of first random values from a plurality of entropy sources including the entropy source, and a hash module configured to receive a plurality of ...
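
A minimal Python sketch of the two-part structure: a raw entropy source feeding a post-processing module that conditions its output with a cryptographic hash (one of the options the abstract lists). The MR element and sensing circuit are replaced by a hypothetical, deliberately biased simulation built on `random.Random`; it illustrates the data flow only and is not a real entropy source.

```python
import hashlib
import random


def sense_mr_bits(n_bits: int, rng: random.Random, p_one: float = 0.6) -> list[int]:
    """Hypothetical sensing model: a bit is 1 when the sampled resistance exceeds a
    threshold, which here happens with probability `p_one` (deliberately biased)."""
    return [1 if rng.random() < p_one else 0 for _ in range(n_bits)]


def post_process(raw_bits: list[int]) -> bytes:
    """Pack the first random values and condition them with SHA-256."""
    if not raw_bits:
        raise ValueError("need at least one raw bit")
    packed = int("".join(map(str, raw_bits)), 2).to_bytes((len(raw_bits) + 7) // 8, "big")
    return hashlib.sha256(packed).digest()
```

Hashing spreads whatever entropy the biased raw bits carry across the whole output, which is why conditioning is applied before the values are used.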

Publication date: 10-01-2019

Memory-Size- and Bandwidth-Efficient Method for Feeding Systolic Array Matrix Multipliers

Number: US20190012295A1

Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing a systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. The bandwidth of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array. 1. A systolic array architecture implemented in circuitry of an integrated circuit, comprising: a processing element array, comprising processing elements to perform multiplication of matrices; and a processing element buffer memory for each of the processing elements; a column feeder array communicatively coupled to a first orthogonal edge of the processing element array, comprising column feeders each communicatively coupled to a corresponding processing element to feed a first matrix into the processing element array; and a column feeder buffer memory for each of the column feeders; and a row feeder array communicatively coupled to a second orthogonal edge of the processing element array, adjacent to the first orthogonal edge, comprising row feeders each communicatively coupled to a corresponding processing element to feed a second matrix into the processing element array; and a row feeder buffer memory for each of the row feeders, wherein the systolic array architecture is communicatively coupled to an external memory to access matrix data of the first matrix and the second matrix, and bandwidth requirement of the external memory is reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array. 2. The systolic array architecture of claim 1, wherein the column feeder array and the row feeder array send the matrix data of the first matrix and the ...
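To show what a skewed feeding pattern looks like, here is a minimal cycle-level Python model of an output-stationary systolic array, assuming one multiply-accumulate per PE per cycle. It does not model the feeder buffer memories or the interleaving the patent uses to cut external-memory bandwidth; `systolic_matmul` is a hypothetical name.

```python
def systolic_matmul(A, B):
    """Cycle-level model of an output-stationary n x m systolic array computing A @ B.
    Feeders on the left edge inject rows of A and feeders on the top edge inject
    columns of B, each delayed by one cycle per row/column (the skew)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]  # A value currently held by PE(i, j)
    b_reg = [[0] * m for _ in range(n)]  # B value currently held by PE(i, j)
    for t in range(n + m + k - 2):
        # Update from the bottom-right corner so every PE reads the values its
        # left and upper neighbours held in the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j > 0:
                    a_in = a_reg[i][j - 1]
                else:  # left-edge feeder for row i, delayed by i cycles
                    a_in = A[i][t - i] if 0 <= t - i < k else 0
                if i > 0:
                    b_in = b_reg[i - 1][j]
                else:  # top-edge feeder for column j, delayed by j cycles
                    b_in = B[t - j][j] if 0 <= t - j < k else 0
                a_reg[i][j], b_reg[i][j] = a_in, b_in
                C[i][j] += a_in * b_in  # multiply-accumulate stays in the PE
    return C

print(systolic_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
# [[58, 64], [139, 154]]
```

Because of the skew, PE(i, j) sees A[i][s] and B[s][j] in the same cycle t = s + i + j, which is why each operand only needs to be fetched once at the array edge.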

Publication date: 15-01-2015

Word Boundary Lock

Number: US20150016574A1
Inventor: Craig Barner
Assignee: Cavium LLC

In an embodiment, a method for determining a word boundary in an incoming data stream includes initializing an N bit register with initial content, receiving a number of consecutive N bit words of the incoming data stream and processing each of the number of consecutive N bit words. The processing includes performing operations per bit position of the register, including performing an XOR operation on a corresponding received data bit and a next received data bit, performing an AND operation on a current state of the bit position of the register and a result of the XOR operation, and storing a result of the AND operation to update the state of the bit position of the register. The word boundary is defined based on the content of the register following the processing of the number of consecutive N bit words.
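The per-bit XOR/AND update can be sketched in Python as follows. The all-ones initial register content and the test stream (words carrying a 2-bit `01`/`10` sync header, as in 64b/66b-style framing) are assumptions made for illustration; the abstract specifies neither.

```python
import random

def word_boundary_lock(bits, n):
    """Return the register bit positions still set after processing the stream.

    reg[i] stays 1 only if bit i and bit i+1 differed in every processed word,
    i.e. bit position i always sees a transition.
    """
    reg = [1] * n                      # initial content: all ones (assumed)
    num_words = (len(bits) - 1) // n   # keep one spare bit for the last XOR
    for w in range(num_words):
        base = w * n
        for i in range(n):
            x = bits[base + i] ^ bits[base + i + 1]  # XOR with the next received bit
            reg[i] &= x                               # AND with the current register bit
    return [i for i, v in enumerate(reg) if v]

def make_stream(num_words, n, offset, seed=1234):
    """Words carry a 2-bit sync header ('01' or '10') plus random payload.
    The first `offset` bits are dropped so the receiver starts mid-word."""
    rng = random.Random(seed)
    bits = []
    for _ in range(num_words):
        header = [0, 1] if rng.random() < 0.5 else [1, 0]
        bits += header + [rng.randint(0, 1) for _ in range(n - 2)]
    return bits[offset:]

stream = make_stream(num_words=200, n=8, offset=3)
print(word_boundary_lock(stream, 8))  # [5]: the header starts at bit 5 of each received word
```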

Publication date: 09-01-2020

MEMRISTOR-BASED DIVIDERS USING MEMRISTORS-AS-DRIVERS (MAD) GATES

Number: US20200014388A1

Memristor-based dividers using memristors-as-drivers (MAD) gates. As a result of employing MAD gates in memristor-based dividers, such as binary non-restoring dividers and SRT dividers, the number of delay steps may be less than half the number of delay steps required in traditional CMOS implementations of dividers. Furthermore, by using MAD gates, memristor-based dividers can be implemented with less complexity (e.g., fewer memristors and drivers). As a result, memristor-based dividers using MAD gates improve the speed and reduce the complexity of a wide variety of arithmetic operations. 1. A binary non-restoring divider, comprising: a first memristor, wherein said first memristor is connected to a first switch, a second switch, a third switch, a fourth switch, a fifth switch and a sixth switch; a second memristor connected in parallel to said first memristor, wherein said second memristor is connected to a seventh switch, wherein an eighth switch is connected to said first and second memristors; a third memristor connected in parallel to said second memristor, wherein said third memristor is connected to a ninth switch, a tenth switch and an eleventh switch; and a fourth memristor connected in parallel to said third memristor, wherein said fourth memristor is connected to a twelfth switch, a thirteenth switch, a fourteenth switch, a fifteenth switch, a sixteenth switch and a seventeenth switch. 2. The binary non-restoring divider as recited in claim 1, wherein said second switch is connected to ground via a resistor. 3. The binary non-restoring divider as recited in claim 1, wherein said first memristor is connected to a first power source via a resistor. 4. The binary non-restoring divider as recited in claim 1, wherein said first, second and seventh switches are connected to a second power source. 5. The binary non-restoring divider as recited in claim 1, wherein said third and fourth switches are connected in series, wherein said fifth and ...
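As a reference for the arithmetic these dividers implement, here is a software model of unsigned binary non-restoring division. It models only the algorithm (one add or subtract per quotient bit, with a single remainder correction at the end), not the memristor or MAD-gate circuitry; the function name is hypothetical.

```python
def non_restoring_divide(dividend, divisor, n):
    """Unsigned n-bit binary non-restoring division; returns (quotient, remainder).
    Assumes 0 <= dividend < 2**n."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    r, q = 0, 0
    for i in range(n - 1, -1, -1):
        bit = (dividend >> i) & 1
        # Shift in the next dividend bit, then subtract or add the divisor
        # depending on the sign of the partial remainder (no restoring step).
        if r >= 0:
            r = 2 * r + bit - divisor
        else:
            r = 2 * r + bit + divisor
        q = (q << 1) | (1 if r >= 0 else 0)
    if r < 0:  # single final correction of the remainder
        r += divisor
    return q, r

print(non_restoring_divide(13, 3, 4))  # (4, 1)
```

Compared with restoring division, a negative partial remainder is not undone immediately; the next step adds the divisor instead, which is what removes the extra restore delay per bit.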

Publication date: 19-01-2017

ARITHMETIC PROCESSING DEVICE AND METHOD OF CONTROLLING ARITHMETIC PROCESSING DEVICE

Number: US20170017466A1
Inventor: HONDO Mikio
Assignee: FUJITSU LIMITED

An arithmetic processing device includes: a first memory configured to store values of a first coefficient of a logarithmic function, where the logarithmic function is decomposed into a series operation term and the coefficient term, depending on respective values of a first bit group included in operand data of a first instruction to calculate the value of the first coefficient; a second memory configured to store values of a second coefficient included in the series operation term depending on the respective values of the first bit group included in operand data of a second instruction to calculate the value of the second coefficient; and a selector configured to select the value of the first coefficient read from the first memory based on the execution of the first instruction and select the value of the second coefficient read from the second memory based on the execution of the second instruction. 1. An arithmetic processing device comprising: a first memory configured to store values of a first coefficient of a logarithmic function where the logarithmic function is decomposed into a series operation term and the coefficient term for the series operation term depending on respective values of a first bit group included in operand data of a first instruction to calculate the value of the first coefficient; a second memory configured to store values of a second coefficient included in the series operation term depending on the respective values of the first bit group included in operand data of a second instruction to calculate the value of the second coefficient; and a selector, coupled to the first memory and the second memory, configured to select the value of the first coefficient read from the first memory based on an execution of the first instruction and select the value of the second coefficient read from the second memory based on an execution of the second instruction. 2. The arithmetic processing device according to claim 1, wherein the first memory stores ...
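One plausible reading of the two-table decomposition is ln(x) = e*ln(2) + ln(c) + ln(1 + t), where c is a table point selected by the leading mantissa bits and t = m/c - 1 is small. The sketch below assumes a 4-bit table index, stores ln(c) and 1/c in two lookup tables, and evaluates the series term with a truncated Taylor series; the table sizes and names are illustrative, not taken from the patent.

```python
import math

K = 4  # number of leading mantissa fraction bits used as the table index (assumed)

# Table 1: ln(c) for each table point c = 1 + idx / 2**K (the coefficient term).
LN_TABLE = [math.log(1 + idx / (1 << K)) for idx in range(1 << K)]
# Table 2: 1 / c, used to form the small argument of the series term.
RECIP_TABLE = [1 / (1 + idx / (1 << K)) for idx in range(1 << K)]

def table_log(x, terms=8):
    """ln(x) = e*ln(2) + ln(c) + ln(1 + t), with t = m/c - 1 small."""
    if x <= 0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)            # x = m * 2**e, 0.5 <= m < 1
    m, e = 2 * m, e - 1             # renormalise to 1 <= m < 2
    idx = int((m - 1) * (1 << K))   # leading K fraction bits select both table entries
    t = m * RECIP_TABLE[idx] - 1    # 0 <= t < 2**-K (up to rounding)
    # Truncated Taylor series of ln(1 + t); error is below t**(terms+1) / (terms+1).
    series = sum((-1) ** (k + 1) * t ** k / k for k in range(1, terms + 1))
    return e * math.log(2) + LN_TABLE[idx] + series

print(table_log(10.0), math.log(10.0))
```

Using the leading bits to pick c keeps t below 1/16 here, so a handful of series terms already gives near double-precision accuracy.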

Publication date: 19-01-2017

Integer/floating point divider and square root logic unit and associates methods

Number: US20170017467A1
Inventor: Bonnie Sexton
Assignee: SAMSUNG ELECTRONICS CO LTD

Embodiments of the inventive concept include a shared hardware integer/floating point divider and square root logic unit, which combines floating point division, floating point square root operations, and/or integer division into one shared hardware design. The shared hardware logic unit can share, for example, a sparse random access memory (sparse RAM) in place of a full partial remainder divisor (PD) table, one or more on-the-fly (OTF) state machines, and/or a data path for integer division, floating point division, and square root operations. The normalization of subnormal numbers and the normalization of signed and unsigned integers can be handled with shared hardware. The division operations and the square root operations can be of the same radix. Early out exceptions and special cases can be automatically handled. Both improved latency and less die area can be achieved in accordance with embodiments of the inventive concept.

Publication date: 15-01-2015

Method for assessing an output of a random number generator

Number: US20150019605A1
Inventor: Boehl Eberhard
Assignee: ROBERT BOSCH GMBH

A method for assessing an output of a random number generator which is provided by two phase-locked loops of the random number generator includes: receiving, by a checking system, the output of the random number generator for at least two sampling cycles, wherein for each sampling cycle (i) the output of the random generator includes a sequence of sample values between a starting value and an end value, and (ii) all sample values between the starting value and the end value in the respective cycle are entered into a signature; and comparing, by the checking system, the signatures from the at least two sampling cycles to one another. 1. A method for assessing an output of a random number generator which is provided by two phase-locked loops of the random number generator, comprising: receiving, by a checking system, the output of the random number generator for at least two sampling cycles, wherein for each sampling cycle (i) the output of the random generator includes a sequence of sample values between a starting value and an end value, and (ii) all sample values between the starting value and the end value in the respective cycle are entered into a signature; and comparing, by the checking system, the signatures from the at least two sampling cycles to one another. 2. The method as recited in claim 1, wherein the signatures are formed by a multiple input signature register. 3. The method as recited in claim 1, wherein the signatures are formed by a counter which counts transitions of bit values which form the signatures. 4. The method as recited in claim 1, wherein the signatures are formed by a counter which counts the number of ones in bit values which form the signatures. 5. The method as recited in claim 1, wherein an entropy counter is incremented when the at least two signatures are different. 6. The method as recited in claim 1, wherein a warning counter is incremented when the at least two signatures are the same. 7. The method as recited in claim 5, wherein ...
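The counter-based signatures and the entropy/warning counters of claims 3 to 6 can be sketched in Python as follows. Each cycle is assumed to be a list of integer sample values, and the function names are hypothetical. The example also shows why the choice of signature matters: the ones counter can miss a difference that the transition counter detects.

```python
def ones_signature(samples):
    """Signature = total number of ones over all sampled values in a cycle."""
    return sum(bin(s).count("1") for s in samples)

def transition_signature(samples, width):
    """Signature = number of bit transitions between consecutive samples, per bit position."""
    counts = [0] * width
    for prev, cur in zip(samples, samples[1:]):
        diff = prev ^ cur
        for b in range(width):
            counts[b] += (diff >> b) & 1
    return tuple(counts)

def assess(cycles, signature):
    """Compare signatures of consecutive sampling cycles.
    Different signatures -> entropy counter += 1; identical -> warning counter += 1."""
    entropy = warning = 0
    sigs = [signature(c) for c in cycles]
    for a, b in zip(sigs, sigs[1:]):
        if a != b:
            entropy += 1
        else:
            warning += 1
    return entropy, warning

cycles = [[0b0011, 0b0101], [0b0011, 0b0101], [0b1111, 0b0000]]
print(assess(cycles, ones_signature))                          # (0, 2)
print(assess(cycles, lambda c: transition_signature(c, 4)))    # (1, 1)
```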

Publication date: 15-01-2015

Method for evaluating an output of a random generator

Number: US20150019606A1
Assignee: ROBERT BOSCH GMBH

A method and an assemblage for checking an output of a random generator are presented. In the method, signatures that are respectively created from a sequence of sampled values are compared with one another. 1. A method for checking an output of a random generator that is constructed as a ring oscillator, including an assemblage for testing, the method comprising: providing an output of the random generator that is made up of a sequence of sampled values, in which all of the sampled values between a starting value and a final value in one cycle enter into a signature; comparing signatures of at least two cycles with one another; and ensuring that the number of sampled values between a starting value and a final value of the at least two cycles is identical. 2. The method of claim 1, wherein an MISR that creates a signature from a sequence of sampled values is used to create the signature. 3. The method of claim 1, wherein a counter of the transitions of each bit value, which creates a signature from a sequence of sampled values, is used for signature creation. 4. The method of claim 1, wherein a counter of the numbers of ones of each bit value, which creates a signature from a sequence of sampled values, is used for signature creation. 5. The method of claim 1, wherein the starting value corresponds to the final value. 6. The method of claim 1, wherein a post-processing is carried out after testing. 7. The method of claim 1, wherein a check is made as to whether the starting value is valid. 8. The method of wherein an entropy counter is incremented when it is ascertained that the at least two signatures are different. 9. The method of claim 1, wherein a warning counter is incremented when it is ascertained that the at least two signatures are identical. 10. The method of claim 1, wherein a check is made as to whether the first sampled value after the starting value differs from that starting value. 11. An assemblage for testing an ...
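A multiple input signature register (MISR, claim 2) compacts each cycle into a short word. The Python sketch below assumes an 8-bit register with an arbitrary, illustrative tap mask and enforces the equal-sample-count condition of claim 1 before comparing; none of the names or parameters come from the patent.

```python
def misr_signature(samples, width=8, taps=0b10111000):
    """Compact a sequence of sampled values into a width-bit signature with a
    multiple input signature register: an LFSR whose state is XORed with
    each parallel input word on every clock. The tap mask is illustrative."""
    mask = (1 << width) - 1
    state = 0
    for s in samples:
        feedback = bin(state & taps).count("1") & 1  # parity of the tapped bits
        state = (((state << 1) | feedback) & mask) ^ (s & mask)
    return state

def check_cycles(cycle_a, cycle_b):
    """Return 'entropy' if the two cycles' signatures differ, else 'warning'.
    Both cycles must contain the same number of sampled values."""
    if len(cycle_a) != len(cycle_b):
        raise ValueError("cycles must have the same number of sampled values")
    same = misr_signature(cycle_a) == misr_signature(cycle_b)
    return "warning" if same else "entropy"

print(check_cycles([1, 2, 3], [1, 2, 3]))  # warning
print(check_cycles([1], [2]))              # entropy
```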

Publication date: 15-01-2015

MODAL INTERVAL PROCESSOR

Number: US20150019609A1
Inventor: Hayes Nathan T.
Assignee: Sunfish Studio, LLC

A logic circuit computes various modal interval arithmetic values using a plurality of arithmetic function units. A multiplexer gates the desired arithmetic values to a storage register. 1. A logic circuit for computing first and second modal interval (MI) result values of at least first and second different MI functions responsive, respectively, to first and second values of a selector signal, said computing based on at least one MI operand value encoded in an operand signal, each MI value comprising first and second multi-bit set theoretical numbers (STN) defining first and second end points of a range of real numbers, and further encoding one of the universal and existential quantification values, comprising: a) at least first and second arithmetic functional units (AFUs) each connected to receive the operand signal, and performing an arithmetic operation using as the arguments therefor each MI operand value encoded in the operand signal, and respectively providing the first and second MI result values in first and second result signals; b) a multiplexer having a selector input receiving the selector signal, having a multibit output port for providing an output signal encoding an MI result value, and having at least first and second multi-bit input ports, said first and second input ports connected to receive respectively the first and second result signals provided as the operand signals by the first and second AFUs, and each input port associated with a single selector signal value, said multiplexer supplying, encoded in an output signal provided by the output port, the MI result value provided at the input port thereof associated with the current selector signal value; and c) a result register for storing each MI result value, and connected to receive the values respectively provided by the multiplexer output port. 2. The logic circuit of claim 1, for computing at least one MI function based on first and second MI operands encoded in first and second operand ...
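The AFU-plus-multiplexer structure can be modelled in a few lines of Python. This sketch uses Kaucher (modal) interval arithmetic on endpoint pairs and reads the quantifier from endpoint order (proper = existential, improper = universal), whereas the claim encodes the quantifier explicitly; the three AFUs chosen here are hypothetical examples.

```python
from typing import NamedTuple

class MI(NamedTuple):
    """Modal interval as an ordered pair of endpoints. In this sketch a proper
    interval (lo <= hi) is read as existential and an improper one (lo > hi)
    as universal."""
    lo: float
    hi: float

def mi_add(x, y):
    # Kaucher/modal addition is endpoint-wise.
    return MI(x.lo + y.lo, x.hi + y.hi)

def mi_sub(x, y):
    # x - y = x + (-y), with -[c, d] = [-d, -c].
    return MI(x.lo - y.hi, x.hi - y.lo)

def mi_dual(x, _y=None):
    # Dual swaps the endpoints, which flips the quantifier.
    return MI(x.hi, x.lo)

AFUS = [mi_add, mi_sub, mi_dual]  # hypothetical arithmetic functional units

class ModalIntervalUnit:
    def __init__(self):
        self.result_register = None

    def clock(self, selector, a, b):
        # Every AFU computes in parallel; the multiplexer gates the result
        # chosen by the selector value into the result register.
        results = [afu(a, b) for afu in AFUS]
        self.result_register = results[selector]
        return self.result_register

unit = ModalIntervalUnit()
print(unit.clock(1, MI(1, 2), MI(3, 5)))  # MI(lo=-4, hi=-1)
```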

Publication date: 19-01-2017

RANDOM NUMBER GENERATION CIRCUIT AND SEMICONDUCTOR SYSTEM INCLUDING THE SAME

Number: US20170018317A1

A random number generation circuit may include a memory block. The random number generation circuit may include a fuse block configured to store an address of a failed memory cell from a memory cell array of the memory block, as a repair address, and generate a match signal by comparing the repair address with a normal address inputted from an exterior. The random number generation circuit may include a register configured to output a true random number by latching an address corresponding to activation timing of the match signal among normal addresses. 1. A random number generation circuit comprising: a memory block; a fuse block configured to store an address of a failed memory cell from a memory cell array of the memory block, as a repair address, and generate a match signal by comparing the repair address with a normal address inputted from an exterior; and a register configured to output a true random number by latching an address corresponding to activation timing of the match signal among normal addresses. 2. The random number generation circuit according to claim 1, further comprising: a repair processing block configured to replace the failed memory cell in the memory cell array of the memory block, with a redundant memory cell, according to the match signal. 3. The random number generation circuit according to claim 1, further comprising: a data input/output unit configured to process input/output data of the memory block, and output the true random number to an exterior of a semiconductor memory through an input/output terminal, wherein the random number generation circuit is included in a semiconductor memory. 4. The random number generation circuit according to claim 1, further comprising: a data input/output unit configured to perform output data encryption by encoding output data of the memory block and the true random number. 5. The random number generation circuit according to claim 1, further comprising: a data input/output unit configured to decode data ...

Publication date: 18-01-2018

PATTERN MATCHING BASED CHARACTER STRING RETRIEVAL

Number: US20180018405A1

Embodiments relate to generating a retrieval condition for retrieving a target character string from texts by pattern matching. An aspect includes dividing a first text into words. Another aspect includes generating a converted character string by performing at least one of appending at least one character in at least either one of previous and subsequent positions of the target character string. Another aspect includes replacing at least one character of the target character string. Another aspect includes generating the retrieval condition for retrieval candidates in the words of the first text, the retrieval condition comprising determining that a retrieval candidate matches the target character string and does not match the converted character string based on a ratio of a part of the retrieval candidate which matches the converted character string and corresponds to the target character string being less than or equal to a reference frequency. 1. A computer-implemented method for generating a retrieval condition for retrieving a target character string from texts by pattern matching, the method comprising: dividing a first text into words; generating a converted character string by performing at least one of appending at least one character in at least either one of previous and subsequent positions of the target character string; replacing at least one character of the target character string; and generating the retrieval condition for retrieval candidates in the words of the first text, the retrieval condition comprising determining that a retrieval candidate matches the target character string and does not match the converted character string based on a ratio of a part of the retrieval candidate which matches the converted character string and corresponds to the target character string being less than or equal to a reference frequency. 2. The method of claim 1, including a processor, and wherein: the retrieval condition improves extraction accuracy of the target ...

Publication date: 18-01-2018

PATTERN MATCHING BASED CHARACTER STRING RETRIEVAL

Number: US20180018406A1

Embodiments relate to generating a retrieval condition for retrieving a target character string from texts by pattern matching. An aspect includes dividing a first text into words. Another aspect includes generating a converted character string by performing at least one of appending at least one character in at least either one of previous and subsequent positions of the target character string. Another aspect includes replacing at least one character of the target character string. Another aspect includes generating the retrieval condition for retrieval candidates in the words of the first text, the retrieval condition comprising determining that a retrieval candidate matches the target character string and does not match the converted character string based on a ratio of a part of the retrieval candidate which matches the converted character string and corresponds to the target character string being less than or equal to a reference frequency. 1. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform a method, the method comprising: dividing, by a processor, a first text into words; generating a converted character string by performing at least one of appending at least one character in at least either one of previous and subsequent positions of the target character string; replacing at least one character of the target character string; generating the retrieval condition for retrieval candidates in the words of the first text, wherein the retrieval condition improves extraction accuracy of the target character string by determining that a retrieval candidate is an exclusion candidate based on the retrieval candidate being appended to the converted character string, and the retrieval candidate matching the target character string and not matching the converted character string based on a ratio of a part of the ...

Publication date: 18-01-2018

Finding k extreme values in constant processing time

Number: US20180018566A1
Assignee: GSI Technology Inc

A method includes determining a set of k extreme values of a dataset of elements in a constant time irrespective of the size of the dataset. A method creates a set of k indicators, each indicator associated with one multi-bit binary number in a large dataset of multi-bit binary numbers. The method includes arranging the multi-bit binary numbers such that each bit n of each said multi-bit binary number is located in a different row n of an associative memory array, starting from a row storing a most significant bit (MSB), adding an indicator to the set for each multi-bit binary number having a bit with an extreme value in the row and continuing the adding until said set contains k indicators.
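A software sketch of the MSB-first selection for the k largest values follows. Each loop iteration corresponds to one bit row of the associative array, so the iteration count depends on the bit width rather than on the number of elements (in hardware each row is tested for all elements in parallel). The tie-filling step and the function name are assumptions; the k smallest can be found the same way by testing for zeros instead of ones.

```python
def k_largest_indices(values, k, bits):
    """Indices of k largest values (values must fit in `bits` bits)."""
    if not 0 < k <= len(values):
        raise ValueError("need 0 < k <= len(values)")
    selected = set()                   # indicators already in the result set
    remaining = set(range(len(values)))  # contenders; they share all bits seen so far
    for b in range(bits - 1, -1, -1):  # one bit row per step, MSB first
        ones = {i for i in remaining if (values[i] >> b) & 1}
        if len(selected) + len(ones) <= k:
            selected |= ones           # these beat every remaining contender
            remaining -= ones
        else:
            remaining = ones           # too many: only the ones stay in contention
        if len(selected) == k:
            break
    # If still short after the last row, the remaining values are all equal (ties).
    for i in sorted(remaining):
        if len(selected) == k:
            break
        selected.add(i)
    return sorted(selected)

print(k_largest_indices([5, 3, 7, 7, 1, 6], 3, 3))  # [2, 3, 5]
```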

Publication date: 16-01-2020

BITONIC SORTING ACCELERATOR

Number: US20200019374A1

An accelerator for bitonic sorting includes a plurality of compare-exchange circuits and a first-in, first-out (FIFO) buffer associated with each of the compare-exchange circuits. An output of each FIFO buffer is a FIFO value. The compare-exchange circuits are configured to, in a first mode, store a previous value from a previous compare-exchange circuit or a memory to its associated FIFO buffer and pass a FIFO value from its associated FIFO buffer to a subsequent compare-exchange circuit or the memory; in a second mode, compare the previous value to the FIFO value, store the greater value to its associated FIFO buffer, and pass the lesser value to the subsequent compare-exchange circuit or the memory; and in a third mode, compare the previous value to the FIFO value, store the lesser value to its associated FIFO buffer, and pass the greater value to the subsequent compare-exchange circuit or the memory. 1. A hardware accelerator for bitonic sorting, the hardware accelerator comprising: a plurality of compare-exchange circuits; and a first-in, first-out (FIFO) buffer associated with each of the compare-exchange circuits, wherein an output of each FIFO buffer is a FIFO data value; wherein the compare-exchange ... in a first mode of operation, store a previous data value from a previous compare-exchange circuit or a memory to its associated FIFO buffer and pass a FIFO data value from its associated FIFO buffer to a subsequent compare-exchange circuit or the memory; in a second mode of operation, compare the previous data value to the FIFO data value, store the greater of the data values to its associated FIFO buffer, and pass the lesser of the data values to the subsequent compare-exchange circuit or the memory; and in a third mode of operation, compare the previous data value to the FIFO data value, store the lesser of the data values to its associated FIFO buffer, and pass the greater of the data values to the subsequent compare-exchange circuit or the memory. ...
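The following Python sketch is a plain recursive bitonic sort whose compare-exchange step reuses the three modes from the abstract. It does not model the FIFO-based streaming pipeline of the accelerator; the names are hypothetical and the input length must be a power of two.

```python
def compare_exchange(prev_value, fifo_value, mode):
    """Return (value_stored_to_fifo, value_passed_on) for the three modes."""
    if mode == 1:  # pass-through: store the incoming value, forward the FIFO value
        return prev_value, fifo_value
    if mode == 2:  # keep the greater, pass the lesser
        return max(prev_value, fifo_value), min(prev_value, fifo_value)
    if mode == 3:  # keep the lesser, pass the greater
        return min(prev_value, fifo_value), max(prev_value, fifo_value)
    raise ValueError("mode must be 1, 2 or 3")

def bitonic_merge(a, ascending):
    """Sort a bitonic sequence with half-cleaner steps."""
    n = len(a)
    if n == 1:
        return list(a)
    half = n // 2
    a = list(a)
    mode = 2 if ascending else 3
    for i in range(half):
        stored, passed = compare_exchange(a[i + half], a[i], mode)
        a[i], a[i + half] = passed, stored  # passed value goes to the lower position
    return bitonic_merge(a[:half], ascending) + bitonic_merge(a[half:], ascending)

def bitonic_sort(a, ascending=True):
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return list(a)
    # Sort halves in opposite directions to form a bitonic sequence, then merge.
    first = bitonic_sort(a[:n // 2], True)
    second = bitonic_sort(a[n // 2:], False)
    return bitonic_merge(first + second, ascending)

print(bitonic_sort([3, 7, 1, 0, 6, 2, 5, 4]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```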

Publication date: 16-01-2020

ARITHMETIC LOGIC UNIT FOR SINGLE-CYCLE FUSION OPERATIONS

Number: US20200019376A1

An arithmetic logic unit is disclosed that includes a first logical circuit that generates a first partial sum result from three operands in a first stage of a single clock cycle of a processor; a second circuit that generates a second partial result in the same first stage of the clock cycle of the processor; and an adder that receives the first partial result from the first logical circuit and the second partial result from the second circuit and generates a secondary result during a second stage of the single clock cycle of the processor. The arithmetic logic unit may optionally further include a backend circuit that performs additional arithmetic and logic functions in the same single clock cycle of the processor. 1. An arithmetic logic unit comprising: a first logical circuit that generates a first partial sum result from three operands in a first stage of a single clock cycle; a second circuit that generates a second partial result in the first stage of the clock cycle; and an adder that receives the first partial result from the first logical circuit and the second partial result from the second circuit and generates a secondary result during a second stage of the single clock cycle. 2. The arithmetic logic unit of claim 1, further comprising an inverse control input to the adder and a constant in control input to the adder. 3. The arithmetic logic unit of claim 1, wherein the first logic circuit comprises a first plurality of logic gates to receive the first and second operands, and a second plurality of logic gates to receive the third operand; the output of the first plurality of logic gates and the output of the second plurality of logic gates are two inputs to a third part of the first logic circuit, wherein the output of the third part of the first logic circuit is the first partial result received by the adder. 4. The arithmetic logic unit of claim 3, wherein the first plurality of logic gates of the first logic circuit includes an ...
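One common way to produce two partial results from three operands in a single stage is a carry-save (3:2) adder. Assuming that is what the first stage does, the two-stage data path looks like this in Python (32-bit wraparound assumed; the function names are hypothetical):

```python
MASK32 = (1 << 32) - 1

def carry_save(a, b, c):
    """Stage 1: a 3:2 compressor producing a partial sum and a carry word
    without any carry propagation."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1  # majority bits, shifted left
    return partial_sum & MASK32, carry & MASK32

def add3(a, b, c):
    """Stage 2: a single carry-propagate addition of the two partial results."""
    s, cy = carry_save(a, b, c)
    return (s + cy) & MASK32

print(add3(0xFFFFFFFF, 1, 2))  # 2 (wraps around at 32 bits)
```

Only the second stage contains a carry chain, which is what allows a three-operand operation to fit in one cycle with roughly the delay of a normal two-operand add.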

Publication date: 16-01-2020

APPARATUS AND METHOD

Number: US20200019378A1
Assignee: Toshiba Memory Corporation

According to one embodiment, an apparatus is capable of exchanging a frame with an external apparatus in a packet mode of a serial attached small computer system interface (SAS). The external apparatus includes a scrambler. The apparatus includes a descrambler and a controller. The descrambler is configured to descramble frame data scrambled by the scrambler. The controller is configured to, in a case where first frame data is received from the external apparatus, synchronize the descrambler with the scrambler using the first frame data and a first value that is to be scrambled by the scrambler to obtain the first frame data. 1. An apparatus capable of exchanging a frame with an external apparatus in a packet mode of a serial attached small computer system interface (SAS), the external apparatus including a scrambler, the apparatus comprising: a descrambler configured to descramble frame data scrambled by the scrambler; and a controller configured to, in a case where first frame data is received from the external apparatus, synchronize the descrambler with the scrambler using the first frame data and a first value that is to be scrambled by the scrambler to obtain the first frame data. 2. The apparatus of claim 1, wherein the controller is further configured to calculate a second state of the descrambler using the first frame data and the first value, the second state corresponding to either a primitive received after the first frame data or second frame data received after the first frame data. 3. The apparatus of claim 2, wherein the controller is further configured to set the descrambler in the second state. 4. The apparatus of claim 2, wherein each of the scrambler and the descrambler includes a linear feedback shift register, and the controller is further configured to: calculate first scrambling data used for generation of the first frame data by performing an exclusive logical OR (XOR) operation on the first frame data with the first value; calculate a first state of ...
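Claim 4 describes recovering scrambling data by XORing received frame data with the known value and then computing the LFSR state. The Python sketch below does this for a 16-bit right-shifting Fibonacci LFSR, using the x^16 + x^14 + x^13 + x^11 + 1 polynomial purely for illustration (it is not the SAS scrambler polynomial, and SAS scrambles 32-bit dwords). With this LFSR form the first 16 keystream bits are exactly the initial state bits, so state recovery is a direct read-out; all names are hypothetical.

```python
TAPS = (0, 2, 3, 5)  # illustrative polynomial x^16 + x^14 + x^13 + x^11 + 1 (not SAS)
WIDTH = 16

def lfsr_bits(state, count):
    """Advance a 16-bit right-shifting Fibonacci LFSR `count` times.
    Returns (new_state, output bits); each output bit is bit 0 of the state."""
    out = []
    for _ in range(count):
        out.append(state & 1)
        fb = 0
        for t in TAPS:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << (WIDTH - 1))
    return state, out

def scramble(bits, state):
    """XOR data bits with the keystream; returns (scrambled bits, final state).
    Descrambling is the same operation."""
    state, ks = lfsr_bits(state, len(bits))
    return [b ^ k for b, k in zip(bits, ks)], state

def synchronize(scrambled_first, known_first):
    """Recover the scrambler state from frame data whose plaintext is known."""
    if len(scrambled_first) < WIDTH:
        raise ValueError("need at least 16 known bits")
    ks = [s ^ p for s, p in zip(scrambled_first, known_first)]  # scrambling data
    initial = sum(bit << i for i, bit in enumerate(ks[:WIDTH]))
    # Advance past the known frame so the descrambler lines up with what follows.
    state, _ = lfsr_bits(initial, len(scrambled_first))
    return state
```

Usage: after `state = synchronize(rx[:32], known)`, calling `scramble(rx[32:], state)` recovers the data that follows the known frame.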

Publication date: 21-01-2021

FLOATING POINT UNIT FOR EXPONENTIAL FUNCTION IMPLEMENTATION

Number: US20210019116A1

A computer-implemented method for performing an exponential calculation using only two fully-pipelined instructions in a floating point unit that includes an estimation block. The method includes computing an intermediate value y′ by multiplying an input operand with a predetermined constant value. The input operand is received in floating point representation. The method further includes computing an exponential result for the input operand by executing a fused instruction. The fused instruction includes converting the intermediate value y′ to an integer representation z represented by v most significant bits (MSB) and w least significant bits (LSB). The fused instruction further includes determining exponent bits of the exponential result based on the v MSB from the integer representation z. The method further includes determining mantissa bits of the exponential result according to a piece-wise linear mapping function using a predetermined number of segments based on the w LSB from the integer representation z. 1. A computer-implemented method for performing, using a floating point unit that comprises an estimation block, exponential calculation using only two fully-pipelined instructions, the computer-implemented method comprising: computing an intermediate value y′ by multiplying an input operand that is received in floating point representation with a predetermined constant value; and computing an exponential result for the input operand by executing a fused instruction that comprises: converting the intermediate value y′ to an integer representation z represented by v most significant bits (MSB), and w least significant bits (LSB); determining exponent bits of the exponential result based on the v MSB from the integer representation z; and determining mantissa bits of the exponential result according to a piece-wise linear mapping function using a predetermined number of segments based on the w LSB from the integer representation z. 2. The computer-implemented method ...
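The two-instruction scheme rests on exp(x) = 2^(x * log2(e)): after the multiply, the integer part of y′ becomes the exponent and the fraction f selects a linear segment approximating 2^f. The Python sketch below assumes 16 fraction bits and 8 equal segments; both numbers and the function names are illustrative, and with these settings the relative error is at most about 0.2%.

```python
import math

LOG2E = math.log2(math.e)  # predetermined constant: exp(x) = 2**(x * log2(e))
W = 16                     # number of fraction (LSB) bits in z (assumed)
SEGMENTS = 8               # linear segments approximating 2**f on [0, 1) (assumed)

def scale_step(x):
    """Instruction 1: y' = x * log2(e)."""
    return x * LOG2E

def fused_exp_step(y):
    """Instruction 2 (fused): convert y' to fixed point z, use the integer
    part (v MSBs) as the exponent and the fraction (w LSBs) for the mantissa."""
    z = math.floor(y * (1 << W))   # fixed-point integer representation
    v = z >> W                     # integer part -> exponent (floors for negatives)
    w = z & ((1 << W) - 1)         # fraction bits -> mantissa
    f = w / (1 << W)               # 0 <= f < 1
    seg = min(int(f * SEGMENTS), SEGMENTS - 1)
    x0 = seg / SEGMENTS
    y0, y1 = 2.0 ** x0, 2.0 ** ((seg + 1) / SEGMENTS)
    mantissa = y0 + (y1 - y0) * (f - x0) * SEGMENTS  # piece-wise linear 2**f
    return math.ldexp(mantissa, v)

def exp_two_step(x):
    return fused_exp_step(scale_step(x))

print(exp_two_step(1.0), math.exp(1.0))
```

More segments, or a small lookup of per-segment slopes, trade table size for accuracy; the segment count is the main accuracy knob here.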

Publication date: 21-01-2021

Optimized compute hardware for machine learning operations

Number: US20210019631A1
Assignee: Intel Corp

A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.
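A minimal Python model of a per-lane dot product: with b-bit inputs, 32 // b multiplies run in parallel within one 32-bit lane, and their sum is added to the lane accumulator. For simplicity this sketch assumes both operands use the same element width, whereas the claim allows operands of different widths; the function names are hypothetical.

```python
LANE_BITS = 32

def pack_lane(values, bits):
    """Pack signed `bits`-wide integers into one 32-bit lane, element 0 in the low bits."""
    if len(values) != LANE_BITS // bits:
        raise ValueError("need exactly 32 // bits values")
    word = 0
    for i, v in enumerate(values):
        word |= (v & ((1 << bits) - 1)) << (i * bits)
    return word

def unpack_lane(word, bits):
    """Split a 32-bit lane into 32 // bits signed integers."""
    out = []
    for i in range(LANE_BITS // bits):
        v = (word >> (i * bits)) & ((1 << bits) - 1)
        if v >= 1 << (bits - 1):  # sign-extend
            v -= 1 << bits
        out.append(v)
    return out

def dot_accumulate(acc, a_word, b_word, bits):
    """32 // bits parallel multiplies, then one accumulate into the lane accumulator."""
    products = [x * y for x, y in zip(unpack_lane(a_word, bits), unpack_lane(b_word, bits))]
    return acc + sum(products)

a = pack_lane([1, -2, 3, 4], 8)
b = pack_lane([5, 6, -7, 8], 8)
print(dot_accumulate(10, a, b, 8))  # 14
```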

Publication date: 26-01-2017

Decomposition of Decimal Floating Point Data

Number: US20170024207A1

A decimal floating point finite number in a decimal floating point format is composed from the number in a different format. A decimal floating point format includes fields to hold information relating to the sign, exponent and significand of the decimal floating point finite number. Other decimal floating point data, including infinities and NaNs (not a number), are also composed. Decimal floating point data are also decomposed from the decimal floating point format to a different format.

1. A method of decomposing a decimal floating point datum, said method comprising: obtaining, by a processor of an emulated processing environment, a datum in a decimal floating point format; converting a significand of the datum in the decimal floating point format to the significand in a first format; and converting an exponent of the datum in the decimal floating point format to the exponent in a second format, wherein in response to converting the significand and converting the exponent the datum is provided in a format other than the decimal floating point format.
2. The method of claim 1, wherein the first format comprises a packed decimal format and the second format comprises a binary integer format.
3. The method of claim 1, further comprising providing a sign for the datum.
4. The method of claim 1, wherein the converting the significand comprises: converting a number of least significant digits of the significand into a packed decimal format and storing the result in a first register pair; shifting the significand of a first floating point register pair a specified number of places; converting remaining digits of the significand into a packed decimal format and storing the result in a second register pair; combining a plurality of digits of the second register pair and contents of the first register pair to provide a decomposed value.
5. The method of claim 4, wherein the combining comprises concatenating the plurality of digits and the contents of the first register pair in ...
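Claims 1 to 3 split a decimal floating point datum into a sign, a packed-decimal significand and a binary-integer exponent. Python's `decimal` module exposes the same three fields through `Decimal.as_tuple()`. The sketch below uses it as a stand-in for the register contents. It is an illustration under that assumption and does not implement the register-pair procedure of claim 4. It packs the significand as plain BCD nibbles and returns the sign separately instead of as a signed packed-decimal field. Both function names are hypothetical.

```python
from decimal import Decimal


def decompose(value: Decimal) -> tuple:
    """Split a finite decimal into (sign, packed-BCD significand, binary exponent)."""
    if not value.is_finite():
        raise ValueError("infinities and NaNs need separate handling")
    sign, digits, exponent = value.as_tuple()
    packed = 0
    for d in digits:                 # most significant digit first
        packed = (packed << 4) | d   # one 4-bit BCD nibble per decimal digit
    return sign, packed, exponent    # the exponent is already a plain binary integer


def compose(sign: int, packed: int, exponent: int) -> Decimal:
    """Inverse of decompose: rebuild a Decimal from its three fields."""
    digits = []
    while packed:
        digits.append(packed & 0xF)  # peel off the least significant nibble
        packed >>= 4
    digits.reverse()
    return Decimal((sign, tuple(digits) or (0,), exponent))
```

For example, `decompose(Decimal("-123.45"))` gives sign 1, packed significand `0x12345` and exponent -2.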

Publication date: 28-01-2016

SIMPLIFIED INVERSIONLESS BERLEKAMP-MASSEY ALGORITHM FOR BINARY BCH CODE AND CIRCUIT IMPLEMENTING THEREFOR

Number: US20160026435A1
Inventors: HUNG Jui Hui, YEN Chih Nan
Assignee: Storart Technology Co.,Ltd.

A simplified inversionless Berlekamp-Massey algorithm for binary BCH codes and a circuit implementing the method are disclosed. The circuit includes a first register group, a second register group, a control element, an input element and a processing element. The design breaks the completeness of the mathematical structure of the existing simplified inversionless Berlekamp-Massey algorithm, which reduces the number of registers used by two compared with the conventional algorithm. Hardware complexity and operation time can be reduced.

1. A circuit for implementing a simplified inversionless Berlekamp-Massey algorithm for binary BCH codes, comprising: a first register group, having 2t registers connected in series, each register receiving a calculation value of iterative operation from the upstream end during each clock and outputting the calculation value of iterative operation to the downstream end in the next clock; a second register group, having 2t−1 registers connected in series, each register receiving a copied value from the upstream end during each clock and outputting the copied value in the next clock or in a clock after the clock; a control element, electrically connected to the penultimate register from the most downstream end in the first register group, for receiving outputted calculation values of iterative operation from the register and outputting the first calculation value in each iterative operation, a discrepancy value and a control signal; an input element, electrically connected to the antepenultimate register from the most downstream end in the first register group, for receiving outputted calculation values of iterative operation from the register, electrically connected to the register in the most downstream end in the second register group, for receiving outputted copied values from the register, and selectively outputting Galois field value of 0 or 1, or the outputted calculation value of iterative operation to the first register group, and Galois field value of 0 or 1, or
...
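The Python sketch below gives background for this circuit. It implements the standard, non-simplified inversionless Berlekamp-Massey iteration as usually stated for RS/BCH decoders. The example is a binary BCH code of length 15 over GF(2^4) with t = 2. The sketch assumes the field polynomial x^4 + x + 1 and an all-zero transmitted codeword, and all helper names are hypothetical. The sketch runs all 2t iterations. For binary BCH codes the odd-step discrepancies are zero, which is what lets simplified variants such as the patent's get by with t iterations. The patent's register savings are a hardware matter that the sketch does not model.

```python
# GF(2^4) log/antilog tables built from the primitive polynomial x^4 + x + 1.
N = 15                      # code length, and the multiplicative order of alpha
EXP = [0] * (2 * N)         # EXP[i] = alpha**i, doubled so gf_mul needs no modulo
LOG = [0] * (N + 1)         # LOG[alpha**i] = i
_v = 1
for _i in range(N):
    EXP[_i] = EXP[_i + N] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x10:           # reduce modulo x^4 + x + 1
        _v ^= 0x13


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_eval(coeffs: list, x: int) -> int:
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule (addition is XOR)."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def syndromes(error_positions: list, t: int) -> list:
    """S_1..S_2t of an error pattern, assuming the all-zero codeword was sent."""
    out = []
    for j in range(1, 2 * t + 1):
        s = 0
        for p in error_positions:
            s ^= EXP[(p * j) % N]
        out.append(s)
    return out


def inversionless_bm(S: list, t: int) -> list:
    """Return a scalar multiple of the error-locator polynomial, using no field inversions."""
    lam, b = [1], [1]       # lambda(x) and the auxiliary polynomial b(x)
    k, gamma = 0, 1
    for r in range(2 * t):
        delta = 0           # discrepancy: sum of lambda_i * S_(r-i)
        for i, li in enumerate(lam):
            if r - i >= 0:
                delta ^= gf_mul(li, S[r - i])
        # lambda_new(x) = gamma*lambda(x) - delta*x*b(x); minus is XOR in GF(2^m)
        new = [0] * max(len(lam), len(b) + 1)
        for i, li in enumerate(lam):
            new[i] ^= gf_mul(gamma, li)
        for i, bi in enumerate(b):
            new[i + 1] ^= gf_mul(delta, bi)
        if delta != 0 and k >= 0:
            b, k, gamma = lam, -k - 1, delta
        else:
            b, k = [0] + b, k + 1
        lam = new
    return lam


def chien_search(lam: list) -> list:
    """Error positions i such that lambda(alpha**-i) == 0."""
    return [i for i in range(N) if poly_eval(lam, EXP[(N - i) % N]) == 0]
```

For example, `chien_search(inversionless_bm(syndromes([3, 10], 2), 2))` recovers the error positions 3 and 10.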

Publication date: 28-01-2016

APPARATUS AND METHOD FOR PERFORMING FLOATING-POINT SQUARE ROOT OPERATION

Number: US20160026437A1

A data processing apparatus has processing circuitry for performing a floating-point square root operation on a radicand value R to generate a result value. The processing circuitry has first square root processing circuitry for processing radicand values R which are not an exact power of two and second square root processing circuitry for processing radicand values which are an exact power of two. Power-of-two detection circuitry detects whether the radicand value is an exact power of two and selects the output of the first or second square root processing circuitry as appropriate. This allows the result to be generated in fewer processing cycles when the radicand is a power of two.

1. A data processing apparatus comprising: processing circuitry configured to perform a floating-point square root operation for determining a square root of a radicand value R having a radicand exponent and a radicand mantissa to generate a result value having a result exponent and a result mantissa; wherein the processing circuitry comprises: first square root processing circuitry configured to perform the floating-point square root operation for radicand values which are not an exact power of two; second square root processing circuitry configured to perform the floating-point square root operation for radicand values which are an exact power of two, wherein the second square root processing circuitry is configured to generate the result value in fewer processing cycles than the first square root processing circuitry; and power-of-two detection circuitry configured to detect whether the radicand value is an exact power of two, to control the processing circuitry to output the result value generated by the first square root processing circuitry if the radicand value is not an exact power of two, and to control the processing circuitry to output the result value generated by the second square root processing circuitry if the radicand value is an exact power of two.
2. The data processing ...
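The power-of-two shortcut can be modelled in software with Python's `math.frexp`, which reports a mantissa of exactly 0.5 for powers of two. The sketch assumes positive, finite double-precision radicands. For odd exponents it uses a precomputed sqrt(2). This is one plausible way a dedicated path could skip the iterative algorithm, not necessarily the patent's circuit. `fp_sqrt` and `is_power_of_two` are hypothetical names.

```python
import math

SQRT2 = math.sqrt(2.0)  # precomputed, correctly rounded sqrt(2)


def is_power_of_two(x: float) -> bool:
    """Power-of-two detection: x == 2**k exactly (all stored mantissa bits zero)."""
    if not (x > 0 and math.isfinite(x)):
        return False
    mantissa, _ = math.frexp(x)  # x == mantissa * 2**e with 0.5 <= mantissa < 1
    return mantissa == 0.5


def fp_sqrt(x: float) -> float:
    """Use the short path for powers of two and the general path otherwise."""
    if is_power_of_two(x):
        k = math.frexp(x)[1] - 1                  # x == 2**k
        if k % 2 == 0:
            return math.ldexp(1.0, k // 2)        # sqrt(2**k) == 2**(k/2), exact
        return math.ldexp(SQRT2, (k - 1) // 2)    # sqrt(2**k) == sqrt(2) * 2**((k-1)/2)
    return math.sqrt(x)                           # stands in for the iterative circuitry
```

Scaling by a power of two is exact, so the odd-exponent branch returns the same correctly rounded value as `math.sqrt` (for example for 8.0 and 0.125).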

Publication date: 22-01-2015

FINDING A BEST MATCHING STRING AMONG A SET OF STRINGS

Number: US20150026194A1

A method for finding a best matching string among a set of strings for a reference string includes representing, for each of the set of strings paired with the reference string, a dynamic programming problem for calculating a final alignment score as a matrix of cells, and calculating a current optimal alignment boundary threshold. The method also includes executing, for each string of the set of strings, a calculation of a prospective final alignment score of a candidate alignment of each of the set of strings and the reference string for each cell. If the prospective final alignment score improves the current optimal alignment boundary threshold, the method includes calculating a final alignment score for the string of the set of strings associated with the cell. Otherwise, the method includes aborting the calculation of a candidate alignment covering the string associated with the cell.

1-15. (canceled)
16. A computer-implemented method for finding a best matching string among a set of strings for a reference string, the method comprising: representing, for each of the set of strings paired with the reference string, a dynamic programming problem for calculating a final alignment score as a matrix of cells, each cell representing an intermediate result to be calculated; calculating a current optimal alignment boundary threshold, calculating a prospective final alignment score of a candidate alignment of the each of the set of strings and the reference string for each cell; based on determining that the prospective final alignment score improves the current optimal alignment boundary threshold, calculating a final alignment score for the string of the set of strings associated with the cell; and based on determining that the prospective final alignment score does not improve the current optimal alignment boundary threshold, aborting the calculation of a candidate alignment covering the string of the set of strings associated with the cell.
...
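The same prune-or-continue idea can be shown with Levenshtein distance, used here as a swapped-in stand-in for the patent's alignment score. With edit distance, lower is better. Every alignment path crosses each row of the DP matrix and step costs are non-negative, so the minimum of the current row is a lower bound on the final distance. That bound plays the role of the prospective score. The sketch and its function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def bounded_edit_distance(a: str, b: str, bound: float) -> float:
    """Levenshtein distance of a and b, or math.inf once it cannot beat `bound`."""
    prev = list(range(len(b) + 1))            # DP row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,                 # deletion
                         cur[j - 1] + 1,              # insertion
                         prev[j - 1] + (ca != cb))    # match or substitution
        # Every alignment path crosses this row and costs never decrease,
        # so min(cur) is a lower bound on the final distance: abort early.
        if min(cur) >= bound:
            return math.inf
        prev = cur
    return prev[-1] if prev[-1] < bound else math.inf


def best_match(reference: str, candidates: List[str]) -> Tuple[Optional[str], float]:
    best, best_score = None, math.inf         # current optimal threshold
    for cand in candidates:
        score = bounded_edit_distance(reference, cand, best_score)
        if score < best_score:
            best, best_score = cand, score
    return best, best_score
```

The threshold tightens as better candidates are found, so later candidates are abandoned sooner.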

Publication date: 25-01-2018

SIMPLIFYING CLAUSES FOR MAX-SAT

Number: US20180024967A1
Inventor: Yanagisawa Hiroki

A method includes obtaining a plurality of clauses associated with a plurality of logical variables, each of the clauses consisting of a weight and a disjunction of one or more literals of the logical variables, detecting conditions associated with one or more inference rules, and simplifying the plurality of clauses on the basis of the detecting.

1. A computer-implemented method comprising: obtaining a plurality of clauses associated with a plurality of logical variables, each of the clauses in the plurality of clauses including a weight and a disjunction of one or more literals of the logical variables; detecting (i) whether any clauses in the plurality of clauses other than a first clause $(a \lor b, w_{11})$, a second clause $(\bar{a} \lor \bar{b}, w_{12})$, a third clause $(a \lor c, w_{21})$, a fourth clause $(\bar{a} \lor \bar{c}, w_{22})$, and a fifth clause $(a, w_0)$, where $a$ is a first logical variable, $b$ is a second logical variable, $c$ is a third logical variable, and $w_{11}$, $w_{12}$, $w_{21}$, and $w_{22}$ are weights, include a literal of the first logical variable $a$ and a non-zero weight and, (ii) whether $\min(w_{11}, w_{12}) \ge w_0 + \max(w_{21}, w_{22})$; and
...

Publication date: 10-02-2022

Processing-in-memory (PIM) devices

Number: US20220043632A1
Inventor: Choung Ki Song
Assignee: SK hynix Inc

A processing-in-memory (PIM) device includes first to Lth multiplication/accumulation (MAC) operators, first to Lth memory banks, and a plurality of data input/output (I/O) circuits. The first to Lth MAC operators include first to Lth left MAC operators and first to Lth right MAC operators. The plurality of data I/O circuits include left data I/O circuits and right data I/O circuits. A Uth MAC operator among the first to Lth MAC operators is configured to output one of the first to Mth MAC result data through a Uth left MAC operator among the first to Lth left MAC operators or a Uth right MAC operator among the first to Lth right MAC operators. The PIM device is configured to output the MAC result data outputted through the left MAC operators through the left data I/O circuits, and output the MAC result data outputted through the right MAC operators through the right data I/O circuits.

Publication date: 23-01-2020

SEARCH DEVICE, SEARCH METHOD, PROGRAM, AND INFORMATION RECORDING MEDIUM

Number: US20200026491A1
Inventors: IWASE Hiroaki, TORII Junji
Assignee: RAKUTEN, INC.

A device searches a recorded file that includes lines sorted in accordance with keys included in the lines to find a line that matches a pattern. When the device receives a pattern, it initializes upper and lower limits of a search range and calculates a middle position between the limits. It acquires, from the file, a middle line that starts at or before the middle position and ends after it. If the key included in the middle line matches the pattern, it outputs the middle line. Otherwise, it re-sets the upper or lower limit based on whether the key included in the middle line is greater or less than the pattern and, if the distance between the limits is greater than the length of a newline, repeats the procedure starting from the calculation of the middle position. If the limits have met, it outputs a result to the effect that no matching line has been found.

1. A search device comprising: a recorder recording a text file that includes an array of a plurality of lines sorted in an order of keys, each of the keys being included in each of the plurality of lines, each of the lines including a newline placed at an end of the line; a receiver configured to receive a pattern for performing a search; at least one memory configured to store computer program code; at least one processor configured to read said computer program code and operate as instructed by said computer program code, said computer program code including: initializer code configured to cause at least one of said at least one processor to set a head position and an end position of the text file to a lower limit and an upper limit of a search range; calculator code configured to cause at least one of said at least one processor to make calculation of a middle position between the lower limit and the upper limit; acquirer code configured to cause at least one of said at least one processor to make acquisition of, from the text file ... (x) starts at or before the calculated middle position; and (y) ends after the calculated middle position; ...
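The procedure amounts to a binary search over byte offsets that snaps each probe to a whole line. The minimal sketch below assumes the file has been read, or memory-mapped, into a `bytes` object. It also assumes each line is `key<TAB>value`, sorted by byte-wise key order. The tab separator and the function name are hypothetical. The patent's stopping test, "distance greater than the length of a newline", is approximated by `lo < hi`, with both limits kept on line boundaries.

```python
from typing import Optional


def search_sorted_lines(data: bytes, pattern: bytes, sep: bytes = b"\t") -> Optional[bytes]:
    """Binary search over newline-terminated lines sorted by their leading key."""
    lo, hi = 0, len(data)                       # search range; always on line boundaries
    while lo < hi:
        mid = (lo + hi) // 2                    # middle position (a byte offset)
        start = data.rfind(b"\n", 0, mid) + 1   # middle line starts at or before mid...
        end = data.find(b"\n", mid) + 1         # ...and ends after it (newline included)
        if end == 0:                            # final line has no trailing newline
            end = len(data)
        line = data[start:end]
        key = line.rstrip(b"\n").split(sep, 1)[0]
        if key == pattern:
            return line
        if key < pattern:
            lo = end                            # continue in the upper part
        else:
            hi = start                          # continue in the lower part
    return None                                 # no matching line
```

Each probe discards at least one whole line, so the loop terminates after O(log n) probes for n lines of bounded length.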

Publication date: 23-01-2020

Memristor Spiking Architecture

Number: US20200026995A1

A circuit for a neuron of a multi-stage compute process is disclosed. The circuit comprises a weighted charge packet (WCP) generator. The circuit may also include a voltage divider controlled by a programmable resistance component (e.g., a memristor). The WCP generator may also include a current mirror controlled via the voltage divider and arrival of an input spike signal to the neuron. WCPs may be created to represent the multiply function of a multiply accumulate processor. The WCPs may be supplied to a capacitor to accumulate and represent the accumulate function. The value of the WCP may be controlled by the length of the spike in signal times the current supplied through the current mirror. Spikes may be asynchronous. Memristive components may be electrically isolated from input spike signals so their programmed conductance is not affected. Positive and negative spikes and WCPs for accumulation may be supported.
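The multiply-accumulate behaviour described here is charge = current × spike length per packet, summed on a capacitor. The sketch below is a behavioural model of that arithmetic only. It assumes ideal components with no leakage. It also gives the mirror current directly instead of deriving it from the memristor-controlled divider. The class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Spike:
    duration_s: float  # length of the input spike
    current_a: float   # current-mirror output, set via the memristor voltage divider
    sign: int = 1      # +1 for a positive spike, -1 for a negative one


def membrane_voltage(spikes: Iterable[Spike], capacitance_f: float) -> float:
    """Accumulate weighted charge packets on one capacitor and return V = Q / C."""
    charge = 0.0
    for s in spikes:
        charge += s.sign * s.current_a * s.duration_s  # one WCP: the "multiply" step
    return charge / capacitance_f                      # summed charge: the "accumulate" step
```

Because each packet's charge is fixed when its spike ends, spikes can arrive asynchronously in any order and still produce the same accumulated voltage.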

Publication date: 28-01-2021

MULTIPLICATION-FREE APPROXIMATION FOR NEURAL NETWORKS AND SPARSE CODING

Number: US20210027029A1
Assignee: Intel Corporation

Systems, apparatuses and methods may provide for replacing floating point matrix multiplication operations with an approximation algorithm or computation in applications that involve sparse codes and neural networks. The system may replace floating point matrix multiplication operations in sparse code applications and neural network applications with an approximation computation that applies an equivalent number of addition and/or subtraction operations.

1-25. (canceled)
26. A system comprising: a scanner; a controller to cause the scanner to scan an external image; an extractor to extract sparse codes from the scanned external image; a comparator to determine a similarity between two unit vectors from the scanned external image based on one or more matrix-vector multiplication operations executed on the two unit vectors; and an operation substitutor to replace the one or more matrix-vector multiplication operations executed on the two unit vectors with an approximation computation.
27. The system of claim 26, wherein the one or more matrix-vector multiplication operations executed on the two unit vectors are replaced with the approximation computation in the sparse code applications and the neural network applications.
28. The system of claim 27, wherein the approximation computation uses a set of basis vectors as an input, and outputs a best-matching neuron of the neural network applications, or a dictionary atom for the sparse code application that best corresponds to the input basis vectors.
29. The system of claim 28, wherein a matching pursuit (MP) or orthogonal matching pursuit (OMP) computation is executed to compute the sparse codes.
30. The system of claim 27, wherein the matrix-vector multiplication operation is to be replaced with a convolutional filter computation that is a function of a constant and a sub-region of a vector.
31. The system of claim 26, further comprising logic to replace the one or more matrix-vector multiplication operation ...
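The excerpt does not specify the approximation. The multiplication-free neural-network literature offers one known operator: it replaces each product x·y with sign(x·y)·(|x| + |y|), which needs only sign logic, additions and subtractions. The sketch below assumes that operator, which may differ from the one claimed. It uses the result to pick a best-matching atom, in the spirit of claim 28. The function names are hypothetical.

```python
from typing import List, Sequence


def mf_dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Multiplication-free similarity: sum of sign(x_i * y_i) * (|x_i| + |y_i|)."""
    total = 0.0
    for a, b in zip(x, y):
        if a == 0 or b == 0:
            continue                       # sign(a * b) == 0 contributes nothing
        if (a > 0) == (b > 0):             # equal signs: sign(a * b) == +1
            total += abs(a) + abs(b)
        else:                              # opposite signs: sign(a * b) == -1
            total -= abs(a) + abs(b)
    return total


def best_atom(query: Sequence[float], atoms: List[Sequence[float]]) -> int:
    """Index of the dictionary atom (or neuron) that scores highest against the query."""
    return max(range(len(atoms)), key=lambda i: mf_dot(query, atoms[i]))
```

Only the sign comparison and the add/subtract survive, which is the substitution the abstract describes at the level of operation counts.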

Publication date: 28-01-2021

Method of Operation for a Configurable Number Theoretic Transform (NTT) Butterfly Circuit For Homomorphic Encryption

Number: US20210028921A1

Fully homomorphic encryption integrated circuit (IC) chips, systems and associated methods are disclosed. In one embodiment, a method of operation for a number theoretic transform (NTT) butterfly circuit is disclosed. The NTT butterfly circuit includes a high input word path cross-coupled with a low input word path. The high input word path includes a first adder/subtractor and a first multiplier. The low input word path includes a second adder/subtractor and a second multiplier. The method includes selectively bypassing the second adder/subtractor and the second multiplier, and reconfiguring the low and high input word paths into different logic processing units in response to different mode control signals.

1. (canceled)
2. A homomorphic processor integrated circuit (IC) chip for transforming ciphertext (C) symbols into a number theoretic transform (NTT) domain, the homomorphic processor IC chip comprising: at least one processor slice, the slice including: local control circuitry; an NTT butterfly unit; and on-chip memory coupled to the control circuitry and the NTT butterfly unit, the on-chip memory partitioned into separately accessible storage units for homomorphic processing functions, the on-chip memory including: multiple input/output (I/O) storage units, a bit decomposed polynomial storage unit, and a twiddle factor memory unit.
3. The homomorphic processor IC chip according to claim 2, wherein: a first one of the multiple I/O storage units stores ciphertexts (Ctxts) in a row-by-row format; and a second one of the multiple I/O storage units stores Ctxts in a column-by-column format.
4. The homomorphic processor IC chip according to claim 3, wherein: a third one of the multiple I/O storage units stores output Ctxts resulting from a multiplication operation involving a first Ctxt from the first one of the multiple I/O storage units multiplied with a second Ctxt from the second one of the multiple I/O storage units.
5. The homomorphic processor IC ...
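The Python sketch below provides background on what the butterfly computes. It shows the two textbook radix-2 butterfly forms, Cooley-Tukey and Gentleman-Sande, selected by a mode flag. This loosely mirrors the mode-controlled reconfiguration and is not the patent's datapath. The sketch then checks a recursive NTT built from the Cooley-Tukey form against the O(n^2) definition. The toy parameters q = 17, n = 8 and omega = 2 are assumptions (2 has order 8 modulo 17). Homomorphic-encryption moduli are far larger.

```python
from typing import List, Tuple


def butterfly(a: int, b: int, w: int, q: int, mode: str = "ct") -> Tuple[int, int]:
    """One radix-2 butterfly modulo q.

    mode="ct": Cooley-Tukey (multiply, then add/subtract), typical for the forward NTT.
    mode="gs": Gentleman-Sande (add/subtract, then multiply), typical for the inverse NTT.
    """
    if mode == "ct":
        t = (w * b) % q
        return (a + t) % q, (a - t) % q
    if mode == "gs":
        return (a + b) % q, ((a - b) * w) % q
    raise ValueError(f"unknown mode {mode!r}")


def ntt(x: List[int], omega: int, q: int) -> List[int]:
    """Recursive radix-2 NTT: X_k = sum_j x_j * omega**(j*k) mod q (len(x) a power of 2)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = ntt(x[0::2], omega * omega % q, q)
    odd = ntt(x[1::2], omega * omega % q, q)
    out = [0] * n
    w = 1                                  # twiddle factor omega**k
    for k in range(n // 2):
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w, q)
        w = w * omega % q
    return out


def naive_ntt(x: List[int], omega: int, q: int) -> List[int]:
    """Direct O(n^2) evaluation of the same transform, for checking."""
    n = len(x)
    return [sum(v * pow(omega, j * k, q) for j, v in enumerate(x)) % q for k in range(n)]
```

The recursion relies on omega**(n/2) being -1 modulo q. That property is what lets a single butterfly produce outputs k and k + n/2 together.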

Publication date: 02-02-2017

QUADRATIC PROGRAM SOLVER FOR MPC USING VARIABLE ORDERING

Number: US20170031332A1
Inventor: Santin Ondrej

A system and approach for storing factors in a quadratic programming solver of an embedded model predictive control platform. The solver may be connected to an optimization model which may be connected to a factorization module. The factorization module may incorporate a memory containing saved factors that may be connected to a factor search mechanism to find a nearest stored factor in the memory. A factor update unit may be connected to the factor search mechanism to obtain the nearest stored factor to perform a factor update. The factorization module may provide variable ordering to reduce a number of factors that need to be stored to permit the factors to be updated at zero floating point operations per unit of time.

1. A system for quadratic programming comprising: an embedded platform comprising a model predictive control (MPC) controller connected to a physical subsystem; wherein the MPC controller comprises: a state observer; and a semi-explicit quadratic programming (QP) solver connected to the state observer; wherein the semi-explicit QP solver comprises: an optimization module; and a factorization module connected to the QP optimization module; and wherein the factorization module comprises: a memory having a saved factors unit; a factor search mechanism connected to the saved factors unit; and a factor update mechanism connected to the factor search mechanism.
2. The system of claim 1, wherein the factorization module provides variable ordering to reduce factors which need to be stored to allow them to be updated at a zero floating-point operations per unit time (FLOPs) cost.
3. The system of claim 1, wherein the physical subsystem comprises: a physical plant; one or more sensors situated at the physical plant and connected to the MPC controller and one or more actuators attached to the physical plant and connected to the MPC controller.
4. The system of claim 3, wherein the physical plant is an internal combustion engine or ...

Publication date: 09-02-2017

DOUBLE ROUNDED COMBINED FLOATING-POINT MULTIPLY AND ADD

Number: US20170039033A1

Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier. Final results are selected respectively using overflow detection.

1. An apparatus comprising: a floating-point (FP) multiplier circuit to multiply a first operand multiplicand mantissa by a second operand multiplier mantissa to generate a product; a FP alignment circuit to align a third operand mantissa according to the product of the FP multiplier circuit; an overflow detection circuit to detect an overflow condition in the product of the FP multiplier circuit; a first FP adder circuit to add together the aligned third operand mantissa and the product of the FP multiplier circuit using a first rounding input to generate a first sum or difference based on an assumption that the overflow condition in the product of the FP multiplier circuit was not detected; a second FP adder circuit to add together the aligned third operand mantissa and the product of the FP multiplier circuit using a second rounding input to generate a second sum or difference based on an assumption that the overflow condition in the ...
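The difference between double rounding (round the product, then round the sum) and a true fused multiply-add (round once) shows up even in plain Python. On floats, `a * b + c` rounds twice. The sketch computes a single-rounding reference with `fractions.Fraction`, whose conversion back to `float` rounds the exact rational result once. The function names are hypothetical. The inputs a = 1 + 2^-30 and c = -(1 + 2^-29) are chosen so that the two results differ.

```python
from fractions import Fraction


def double_rounded_madd(a: float, b: float, c: float) -> float:
    """Combined multiply-add with double-rounding semantics: round a*b, then round the sum."""
    product = a * b          # first rounding, to double precision
    return product + c       # second rounding


def fused_madd(a: float, b: float, c: float) -> float:
    """Reference single-rounding FMA: a*b + c computed exactly, rounded once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)
```

With a = 1 + 2^-30, the exact product is 1 + 2^-29 + 2^-60. The 2^-60 term is lost in the first rounding, so the double-rounded result is 0.0 while the fused result is 2^-60. The combined operations in the abstract must reproduce the double-rounded value even though they run on fused-multiply-add hardware.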

Publication date: 08-02-2018

Arithmetic processing device and control method for arithmetic processing device

Number: US20180039480A1
Assignee: Fujitsu Ltd

A plurality of floating-point registers store data therein. A processing execution unit executes arithmetic processing by using data stored in the floating-point registers. A first switch and a second switch select a route connecting the processing execution unit and the floating-point registers. A switch control unit controls the first switch and the second switch so as to switch a route to be selected, based on a switching instruction from the processing execution unit.

Подробнее
11-02-2016 дата публикации

Elliptic curve encryption method comprising an error detection

Number: US20160043863A1
Inventor: Vincent Dupaquis
Assignee: Inside Secure SA

A method in an elliptic curve cryptographic system, the method being executed by an electronic device and including a multiplication operation of multiplying a point of an elliptic curve by a scalar number, the point having affine coordinates belonging to a Galois field, the multiplication operation including steps of detecting the appearance of a point at infinity during intermediate calculations of the multiplication operation, and of activating an error signal if the point at infinity is detected and if the number of bits of the scalar number processed by the multiplication operation is lower than the rank of the most significant bit of an order of a base point of the cryptographic system.
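The Python model below illustrates the check. It assumes left-to-right double-and-add, since the abstract does not fix the multiplication algorithm. It also reads "rank of the most significant bit of the order" as the order's bit length. On a fault-free run, every intermediate result is a multiple of the base point by a non-zero prefix of the scalar. So reaching the point at infinity while fewer bits than that have been processed signals an error. The toy curve y^2 = x^3 + 2x + 2 over F_17, with base point (5, 1) of order 19, is a common textbook example and far too small for real use. All names are hypothetical.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None represents the point at infinity

# Toy textbook curve y^2 = x^3 + 2x + 2 over F_17; the base point (5, 1) has order 19.
P_MOD, A_COEF = 17, 2
BASE, ORDER = (5, 1), 19


class FaultDetected(Exception):
    """Raised when the point at infinity appears too early."""


def point_add(p: Point, q: Point) -> Point:
    """Affine point addition and doubling."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                                   # P + (-P) = infinity
    if p == q:
        s = (3 * x1 * x1 + A_COEF) * pow(2 * y1 % P_MOD, -1, P_MOD)   # tangent slope
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)             # chord slope
    s %= P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return x3, (s * (x1 - x3) - y1) % P_MOD


def scalar_mult(k: int, point: Point = BASE, order: int = ORDER) -> Point:
    """Left-to-right double-and-add with a premature point-at-infinity check."""
    limit = order.bit_length()        # assumed meaning of "rank of the MSB of the order"
    r: Point = None
    for processed, bit in enumerate(bin(k)[2:], start=1):
        r = point_add(r, r)                                           # double
        if bit == "1":
            r = point_add(r, point)                                   # add
        if r is None and processed < limit:
            raise FaultDetected(f"point at infinity after {processed} of {limit} bits")
    return r
```

`scalar_mult(19)` legitimately returns the point at infinity after all five bits. Claiming a larger order for the same scalar makes the check fire, which stands in for a fault.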

Подробнее
07-02-2019 дата публикации

Prefix Network-Directed Addition

Number: US20190042194A1

The present disclosure relates generally to techniques for enhancing adders implemented on an integrated circuit. In particular, arithmetic performed by an adder implemented to receive operands having a first precision may be restructured so that a set of sub-adders may perform the arithmetic on a respective segment of the operands. More specifically, the adder may be restructured so that a decoder may determine a generate signal and a propagate signal for each of the sub-adders and may route the generate signal and the propagate signal to a prefix network. The prefix network may determine respective carry bit(s), which may carry into and/or select a sum at a subsequent sub-adder. As a result, the integrated circuit may benefit from increased efficiencies, reduced latency, and reduced resource consumption (e.g., area and/or power) involved with implementing addition, which may improve operations such as encryption or machine learning on the integrated circuit.

1. Adder circuitry on an integrated circuit device, the adder circuitry comprising: first input circuitry configured to receive a first input having a first set of bits; second input circuitry configured to receive a second input having a second set of bits; a first decoder coupled to the first input circuitry and to the second input circuitry, wherein the first decoder is configured to receive a first subset of the first set of bits and a first subset of the second set of bits and to determine a generate signal and a propagate signal based at least in part on the first subset of the first set of bits and the first subset of the second set of bits; a prefix network coupled to the first decoder, wherein the prefix network is configured to determine a carry out signal based at least in part on the generate signal and the propagate signal, wherein the prefix network comprises first combinatorial circuitry; and second combinatorial circuitry coupled to the prefix network, wherein the second combinatorial circuitry is
...
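The Python sketch below models the restructured adder. It splits the operands into equal-width segments. A "decoder" step computes each segment's generate bit (the segment sum overflows on its own) and propagate bit (the segment sum is all ones, so an incoming carry would pass through). A prefix network turns these into the carry into every segment. Equal segment widths and the Kogge-Stone prefix topology are assumptions, since the excerpt names neither. The function names are hypothetical.

```python
from typing import List, Tuple

GP = Tuple[int, int]  # (generate, propagate) for a segment or group of segments


def combine(hi: GP, lo: GP) -> GP:
    """Associative prefix operator: the group generates if hi generates, or hi propagates and lo generates."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def kogge_stone(gp: List[GP]) -> List[GP]:
    """Inclusive prefix: out[i] covers segments 0..i (index 0 = least significant)."""
    out = list(gp)
    dist = 1
    while dist < len(out):
        out = [out[i] if i < dist else combine(out[i], out[i - dist]) for i in range(len(out))]
        dist *= 2
    return out


def segmented_add(a: int, b: int, width: int, seg: int) -> Tuple[int, int]:
    """Add two width-bit operands with width // seg sub-adders; return (sum, carry_out)."""
    if width % seg:
        raise ValueError("this sketch assumes equal-width segments")
    mask = (1 << seg) - 1
    n = width // seg
    a_seg = [(a >> (i * seg)) & mask for i in range(n)]
    b_seg = [(b >> (i * seg)) & mask for i in range(n)]
    # "Decoder": per-segment generate (overflows alone) and propagate (all ones).
    gp = [(int(x + y > mask), int(x + y == mask)) for x, y in zip(a_seg, b_seg)]
    prefix = kogge_stone(gp)
    carries = [0] + [g for g, _ in prefix[:-1]]      # carry into each segment
    total = 0
    for i, (x, y, c) in enumerate(zip(a_seg, b_seg, carries)):
        total |= ((x + y + c) & mask) << (i * seg)   # each sub-adder with its carry-in
    return total, prefix[-1][0]
```

The prefix network has log2(n) levels for n segments, instead of a carry rippling through all n sub-adders.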

Подробнее
07-02-2019 дата публикации

Signed division in memory

Number: US20190042196A1
Inventor: Sanjay Tiwari
Assignee: Micron Technology Inc

Examples of the present disclosure provide apparatuses and methods for performing signed division operations. An apparatus can include a first group of memory cells coupled to a sense line and to a number of first access lines. The apparatus can include a second group of memory cells coupled to the sense line and to a number of second access lines. The apparatus can include a controller configured to operate sensing circuitry to divide a signed dividend element stored in the first group of memory cells by a signed divisor element stored in the second group of memory cells by performing a number of operations.
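The abstract does not list the "number of operations". As a software analogy only, the sketch below divides magnitudes with restoring shift-and-subtract, a bit-serial pattern that maps naturally onto row-wise operations, and then applies the signs. It assumes quotients truncate toward zero and the remainder takes the dividend's sign, as in C. Both function names are hypothetical.

```python
from typing import Tuple


def unsigned_divide(dividend: int, divisor: int, bits: int) -> Tuple[int, int]:
    """Restoring shift-and-subtract division of non-negative integers."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient, remainder = 0, 0
    for i in reversed(range(bits)):
        remainder = (remainder << 1) | ((dividend >> i) & 1)   # bring down the next bit
        if remainder >= divisor:                               # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder


def signed_divide(dividend: int, divisor: int, bits: int = 32) -> Tuple[int, int]:
    """Signed division: quotient truncated toward zero, remainder with the dividend's sign."""
    q, r = unsigned_divide(abs(dividend), abs(divisor), bits)
    if (dividend < 0) != (divisor < 0):
        q = -q
    if dividend < 0:
        r = -r
    return q, r
```

The identity dividend == divisor * q + r holds for every sign combination.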

Подробнее
07-02-2019 дата публикации

Methods for using a multiplier to support multiple sub-multiplication operations

Number: US20190042198A1
Assignee: Intel Corp

Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit (e.g., an 18×18 or 18×19 multiplier circuit) may be used to support two or more smaller multiplication operations sharing one or two sets of multiplier operands, a complex multiplication, and a sum of two multiplications. If the multiplier products overflow and interfere with one another, correction operations can be performed. Partial products from two or more larger multiplier circuits can be used to combine decomposed partial products. A large multiplier circuit can also be used to support two floating-point mantissa multipliers.
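The idea of sharing one large multiplier between two smaller multiplications with a common operand can be shown with integer packing. The sketch places the two narrow operands far enough apart that their products cannot overlap. It assumes unsigned operands, and the abstract's correction step for overlapping products is not modelled. Python integers are unbounded, whereas an 18×18 or 18×19 multiplier caps how many bits can be packed. The function name is hypothetical.

```python
from typing import Tuple


def two_products_one_multiply(a1: int, a2: int, b: int,
                              a_bits: int, b_bits: int) -> Tuple[int, int]:
    """Return (a1 * b, a2 * b) from a single wide multiplication (unsigned inputs).

    a2 * b needs at most a_bits + b_bits bits, so placing a1 at that offset keeps
    the two partial results from overlapping and no correction step is needed.
    """
    if not (0 <= a1 < 1 << a_bits and 0 <= a2 < 1 << a_bits and 0 <= b < 1 << b_bits):
        raise ValueError("operands out of range")
    k = a_bits + b_bits                  # guard spacing between the two sub-products
    packed = (a1 << k) | a2              # one wide operand carrying two narrow ones
    wide = packed * b                    # the single large multiplication
    return wide >> k, wide & ((1 << k) - 1)
```

Shrinking the spacing below a_bits + b_bits lets the low product spill into the high one, which is the case where the correction operations mentioned in the abstract would be needed.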

Подробнее
07-02-2019 дата публикации

Compute-in-memory circuits with multi-VDD arrays and/or analog multipliers

Number: US20190042199A1
Assignee: Intel Corp

Compute-in-memory circuits and techniques are described. In one example, a memory device includes an array of memory cells, the array including multiple sub-arrays. Each of the sub-arrays receives a different voltage. The memory device also includes capacitors coupled with conductive access lines of each of the multiple sub-arrays and circuitry coupled with the capacitors, to share charge between the capacitors in response to a signal. In another example, a computing device, such as a machine learning accelerator, includes a first memory array and a second memory array. The computing device also includes an analog processor circuit coupled with the first and second memory arrays. The analog processor circuit receives first analog input voltages from the first memory array and second analog input voltages from the second memory array, performs one or more operations on the first and second analog input voltages, and outputs an analog output voltage.
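Charge sharing between precharged capacitors settles at the capacitance-weighted average voltage, V = sum(C_i V_i) / sum(C_i). The sketch applies this to a hypothetical read-out in which each sub-array's supply voltage acts as a bit weight. This interpretation of "sub-arrays receiving different voltages" is an assumption, not something the abstract states. The components are ideal and all names are hypothetical.

```python
from typing import Sequence


def charge_share(caps_f: Sequence[float], volts: Sequence[float]) -> float:
    """Voltage after connecting precharged capacitors: V = sum(C_i * V_i) / sum(C_i)."""
    total_charge = sum(c * v for c, v in zip(caps_f, volts))
    return total_charge / sum(caps_f)


def read_weighted_bits(bits: Sequence[int], supply_v: Sequence[float], cap_f: float) -> float:
    """Hypothetical read-out: bit i's line charges to supply_v[i] if the bit is 1, else to 0 V."""
    volts = [v if bit else 0.0 for bit, v in zip(bits, supply_v)]
    return charge_share([cap_f] * len(bits), volts)
```

With equal capacitors and supplies of 0.25 V, 0.5 V and 1.0 V, the shared voltage is proportional to b0 + 2*b1 + 4*b2. The stored bits are thereby read as a binary-weighted analog value.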

Подробнее
07-02-2019 дата публикации

METHOD AND APPARATUS FOR GENERATING FIXED-POINT QUANTIZED NEURAL NETWORK

Number: US20190042948A1
Assignee: SAMSUNG ELECTRONICS CO., LTD.

A method of generating a fixed-point quantized neural network includes analyzing a statistical distribution for each channel of floating-point parameter values of feature maps and a kernel for each channel from data of a pre-trained floating-point neural network, determining a fixed-point expression of each of the parameters for each channel statistically covering a distribution range of the floating-point parameter values based on the statistical distribution for each channel, determining fractional lengths of a bias and a weight for each channel among the parameters of the fixed-point expression for each channel based on a result of performing a convolution operation, and generating a fixed-point quantized neural network in which the bias and the weight for each channel have the determined fractional lengths.

1. A method of generating a fixed-point quantized neural network, the method comprising: analyzing a statistical distribution for each channel of floating-point parameter values of feature maps and a kernel for each channel from data of a pre-trained floating-point neural network; determining a fixed-point expression of each of the parameters for each channel statistically covering a distribution range of the floating-point parameter values based on the statistical distribution for each channel; determining fractional lengths of a bias and a weight for each channel among the parameters for each channel based on a result of performing a convolution operation; and generating a fixed-point quantized neural network in which the bias and the weight for each channel have the determined fractional lengths.
2. The method of claim 1, wherein the analyzing of the statistical distribution comprises obtaining statistics for each channel of floating-point parameter values of weights, input activations, and output activations used in each channel during pre-training of the pre-trained floating-point neural network.
3. The method of claim 1, wherein the ...
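The sketch below shows per-channel fractional lengths. It substitutes the simplest statistic, each channel's maximum magnitude, for whatever distribution analysis the patent performs, and it assumes a signed 8-bit format. It does not model how the bias and weight fractional lengths are tied together through the convolution result. All names are hypothetical.

```python
import math
from typing import List, Sequence


def fractional_length(values: Sequence[float], bit_width: int = 8) -> int:
    """Fractional bits for a signed bit_width-bit format covering max |value|."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bit_width - 1
    int_bits = math.floor(math.log2(max_abs)) + 1   # bits left of the binary point
    return bit_width - 1 - int_bits                  # one bit kept for the sign


def quantize(values: Sequence[float], fl: int, bit_width: int = 8) -> List[int]:
    """Round to the nearest multiple of 2**-fl and saturate to the signed range."""
    lo, hi = -(1 << (bit_width - 1)), (1 << (bit_width - 1)) - 1
    return [max(lo, min(hi, round(v * 2.0 ** fl))) for v in values]


def per_channel_fractional_lengths(channels: Sequence[Sequence[float]],
                                   bit_width: int = 8) -> List[int]:
    """One fractional length per channel, so small-valued channels keep more precision."""
    return [fractional_length(ch, bit_width) for ch in channels]
```

A channel whose values stay below 1 gets 7 fractional bits, while one reaching 3.0 gets 5. A single per-layer format would force the coarser choice on both channels.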

Подробнее
18-02-2021 дата публикации

Apparatus For Hardware Accelerated Machine Learning

Number: US20210049508A1
Inventors: Bruestle Jeremy, Ng Choong

An architecture and associated techniques of an apparatus for hardware accelerated machine learning are disclosed. The architecture features multiple memory banks storing tensor data. The tensor data may be concurrently fetched by a number of execution units working in parallel. Each operational unit supports an instruction set specific to certain primitive operations for machine learning. An instruction decoder is employed to decode a machine learning instruction and reveal one or more of the primitive operations to be performed by the execution units, as well as the memory addresses of the operands of the primitive operations as stored in the memory banks. The primitive operations, when performed or executed by the execution units, may generate output that can be saved into the memory banks. The fetching of the operands and the saving of the output may involve permutation and duplication of the data elements involved.

1. A processing unit comprising: at least one execution unit comprising: a clock input to receive a clock; an operation specification input to receive an opcode that specifies an instruction; a first operand input to receive a first operand; a second operand input to receive a second operand; and an accumulation register to store a current accumulation value, wherein the accumulation register is to be updated, within a clock cycle, with a new accumulation value based on the opcode and one or more of the first operand, the second operand, and the current accumulation value, wherein an output of the accumulation register comprises saturated lower bits of the new accumulation value.
2. The processing unit of claim 1, wherein the at least one execution unit further comprises: a random number register to store a current random number, wherein the instruction comprises a seed random number generator operation, and wherein the random number register is to be updated, within the clock cycle, with a new random number based on the first ...
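Claim 1's execution unit updates a wide accumulation register once per clock and exposes a saturated narrower output. The sketch below models that behaviour. The opcodes `"mac"`, `"add"` and `"clear"` are hypothetical, because the excerpt does not give the instruction set. Reading "saturated lower bits" as clamping the wide value to the signed range of the output width is also an assumption.

```python
def saturate(value: int, bits: int) -> int:
    """Clamp to the signed range of a bits-wide two's-complement field."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


class ExecutionUnit:
    """Wide accumulation register whose visible output is saturated to out_bits."""

    def __init__(self, out_bits: int = 16):
        self.acc = 0              # full-precision current accumulation value
        self.out_bits = out_bits

    def clock(self, opcode: str, a: int = 0, b: int = 0) -> int:
        """One clock cycle: compute the new accumulation value from the opcode and operands."""
        if opcode == "mac":
            self.acc += a * b
        elif opcode == "add":
            self.acc += a
        elif opcode == "clear":
            self.acc = 0
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        return saturate(self.acc, self.out_bits)
```

Keeping the register wide and saturating only the output means a temporary overshoot, such as 200 in an 8-bit output, does not corrupt later accumulation steps.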

16-02-2017 publication date

PROCESSING FIXED AND VARIABLE LENGTH NUMBERS

Number: US20170046128A1
Assignee:

Embodiments of a processor are disclosed for performing arithmetic operations on variable-length and fixed-length machine-independent numbers. The processor may include a floating point unit and a logic circuit. The logic circuit may be configured to receive an operation and first and second operands. Each of the first and second operands may include a sign byte and multiple mantissa bytes, and the operands may be processed in response to a determination that they are fixed-length numbers. The logic circuit may be further configured to perform the received operation on the processed first and second operands.

1. A processor, comprising: a floating point unit; and a logic circuit coupled to the floating point unit, wherein the logic circuit is configured to: receive an operation, a first operand and a second operand, wherein each of the first operand and the second operand include a sign and exponent block, a length block, and one or more mantissa digits; and, in response to a determination that the first operand and the second operand are fixed-length numbers: process the first operand to generate a first processed operand; process the second operand to generate a second processed operand; and perform the operation using the first processed operand and the second processed operand to generate a result, wherein the result includes a sign and exponent block, and one or more mantissa digits.
2. The processor of claim 1, wherein to perform the operation using the first processed operand and the second processed operand, the logic circuit is further configured to clear unused mantissa digits included in the result.
3. The processor of claim 2, wherein to clear the unused mantissa digits included in the result, the logic circuit is further configured to convert a format of each unused mantissa digit from a first format to a second format.
4. The processor of claim 1, wherein to process the first operand, the logic circuit is further configured to ...

16-02-2017 publication date

MULTIVARIATE DATA ANALYSIS METHOD

Number: US20170046392A1
Inventor: Lilienthal Scott E.
Assignee:

This invention is a computerized method which unites a multivariate dataset and then performs various operations on it, including data analytics. The set is stored in a “bipartite synthesis matrix” (BSM), e.g., a rectangular matrix with rows of data objects and columns of variable attributes defined by a plurality of partitions. Data objects are linked to one or more attributes within the matrix based on shared correspondences that occur within attribute partitions (each with a numerical range and a characteristic scale). Links within the matrix between data objects and attribute(s) are based on shared correspondences within partitions. The process exploits mode reduction, in which the shared correspondences of a BSM (or its graph) interrelate data objects by producing an adjacency matrix or its associated graph. The partition scale is repeatedly and incrementally altered, varying the density of shared correspondences within the data based on partition number and size; therefore, a fully connected and weighted unipartite network may be established. The given scale and variable attribute of the shared correspondences provide distance metrics for edges within the network.

1. A method for generating random numbers using a programmable controller including software comprising computer instructions stored on non-transitory computer media for performing the steps of: generating binary data using a pseudorandom number generator; creating a bipartite data synthesis matrix comprising a table with at least one row corresponding to said at least one variable, and columns defined by a plurality of partitions fitting within an interval according to an adjustable scale; generating a random number; determining a scale for said partitions based on said random number; and populating said bipartite data synthesis matrix with said binary data.
2. A method of analyzing data by use of a programmable controller including software comprising computer instructions stored on non-transitory computer media for performing ...

15-02-2018 publication date

Method for optimizing an artificial neural network (ANN)

Number: US20180046894A1
Inventor: Song Yao

The present invention relates to artificial neural networks, for example, convolutional neural networks. In particular, it relates to how to implement and optimize a convolutional neural network on an embedded FPGA. Specifically, it proposes an overall design process of compressing, fixed-point quantizing, and compiling the neural network model.
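The abstract names fixed-point quantization as one stage of the design flow without giving the procedure. One common approach, assumed here and not taken from the patent, is dynamic fixed point: each layer uses a fixed bit width and a per-layer fractional length chosen to minimize quantization error. `quantize` and `choose_frac_len` are illustrative names.

```python
def quantize(values, bit_width, frac_len):
    """Round values onto a signed fixed-point grid with `frac_len` fractional
    bits, saturating at the limits of a `bit_width`-bit integer."""
    scale = 2.0 ** frac_len
    lo, hi = -(1 << (bit_width - 1)), (1 << (bit_width - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) / scale for v in values]


def choose_frac_len(values, bit_width=8, candidates=range(-8, 24)):
    """Pick the fractional length that minimises the total absolute
    quantization error for one layer's weights (dynamic fixed point)."""
    def error(frac_len):
        q = quantize(values, bit_width, frac_len)
        return sum(abs(v - x) for v, x in zip(values, q))
    return min(candidates, key=error)


# Example: choose a format for a small weight vector, then quantize it.
weights = [0.5, -0.25, 0.125, 0.75]
fl = choose_frac_len(weights)
quantized = quantize(weights, 8, fl)
```

A larger fractional length gives finer resolution but a smaller representable range, so the search trades rounding error against saturation error per layer.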

15-02-2018 publication date

Sparse convolutional neural network accelerator

Number: US20180046906A1
Assignee: Nvidia Corp

A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.
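The dataflow in the abstract (every non-zero weight times every non-zero activation, coordinates combined to find where each product lands, then a scatter-add into accumulators) can be sketched in Python for a single input channel. Assumptions: weight positions are (output channel k, filter row r, filter column s), activation positions are (row x, column y), and the layer is a stride-1 "valid" convolution, so a product contributes to output pixel (x - r, y - s). The function name `sparse_conv` is hypothetical.

```python
from collections import defaultdict


def sparse_conv(weights, activations, out_h, out_w):
    """Cartesian-product sparse convolution for one input channel.

    weights:     list of (value, (k, r, s)) holding non-zero weights only
    activations: list of (value, (x, y)) holding non-zero activations only
    Returns {(k, ox, oy): sum}, i.e. the accumulator-array contents.
    """
    acc = defaultdict(float)
    for w, (k, r, s) in weights:
        for a, (x, y) in activations:
            prod = w * a                      # multiplier array: all pairs
            ox, oy = x - r, y - s             # combined coordinate of the product
            if 0 <= ox < out_h and 0 <= oy < out_w:
                acc[(k, ox, oy)] += prod      # scatter-add into the accumulator
    return dict(acc)
```

Products whose combined coordinate falls outside the output are discarded, which is the price of multiplying all pairs without first checking which ones overlap.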

19-02-2015 publication date

Differential power analysis-resistant cryptographic processing

Number: US20150052368A1
Assignee: Cryptography Research Inc

Information leaked from smart cards and other tamper resistant cryptographic devices can be statistically analyzed to determine keys or other secret data. A data collection and analysis system is configured with an analog-to-digital converter connected to measure the device's consumption of electrical power, or some other property of the target device, that varies during the device's processing. As the target device performs cryptographic operations, data from the A/D converter are recorded for each cryptographic operation. The stored data are then processed using statistical analysis, yielding the entire key, or partial information about the key that can be used to accelerate a brute force search or other attack.
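The statistical analysis the abstract refers to is classically the difference-of-means test: traces are split by a bit predicted from a key guess, and a correct guess produces a peak wherever the device's power consumption depends on that bit. A minimal sketch of the statistic, with `difference_of_means` as an illustrative name; a full attack would repeat it for every key guess and keep the guess with the largest peak.

```python
def difference_of_means(traces, selection_bits):
    """DPA statistic per sample point: mean of the traces whose predicted
    bit is 1 minus mean of the traces whose predicted bit is 0."""
    ones = [t for t, bit in zip(traces, selection_bits) if bit]
    zeros = [t for t, bit in zip(traces, selection_bits) if not bit]
    if not ones or not zeros:
        raise ValueError("the selection bits must split the traces into two groups")
    samples = len(traces[0])
    mean_ones = [sum(t[i] for t in ones) / len(ones) for i in range(samples)]
    mean_zeros = [sum(t[i] for t in zeros) / len(zeros) for i in range(samples)]
    return [m1 - m0 for m1, m0 in zip(mean_ones, mean_zeros)]
```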

03-03-2022 publication date

HARDWARE ACCELERATED MACHINE LEARNING

Number: US20220067522A1
Inventor: Bruestle Jeremy, Ng Choong
Assignee:

A machine learning hardware accelerator architecture and associated techniques are disclosed. The architecture features multiple memory banks of very wide SRAM that may be concurrently accessed by a large number of parallel operational units. Each operational unit supports an instruction set specific to machine learning, including optimizations for performing tensor operations and convolutions. Optimized addressing, an optimized shift reader, and variations on a multicast network that permutes and copies data and associates it with an operational unit that supports those operations are also disclosed.

1-20. (Canceled)
21. An apparatus comprising: an instruction decoder to decode a matrix multiplication instruction; a local memory comprising a plurality of static random-access memory (SRAM) banks to store at least a portion of a first input tensor and a second input tensor, each of the first and second input tensors comprising a multidimensional array of real numbers; circuitry to permute a first plurality of the real numbers associated with the first input tensor in accordance with a permutation pattern included with the matrix multiplication instruction to generate a permuted first plurality of real numbers; a plurality of operational units to perform a plurality of parallel multiply-accumulate (MAC) operations in accordance with the matrix multiplication instruction, each operational unit comprising: a multiplier to multiply a first real number of the permuted first plurality of real numbers and a second real number of a second plurality of real numbers associated with the second input tensor to generate a product, and an accumulator to add the product to an accumulation value to generate a result value, the first real number and the second real number each having a first bit width and the accumulation value having a second bit width at least twice the first bit width; and a hardware accelerator to multiply the first input tensor and the second input tensor to produce an ...
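Claim 21 pairs a permutation pattern carried by the instruction with multipliers feeding an accumulator at least twice as wide as the operands. The Python sketch below models one dot product under those constraints; the signed 8-bit inputs, the 32-bit default accumulator and the wrap-on-overflow behavior are assumptions, and `permuted_mac` is a hypothetical name.

```python
def permuted_mac(a, b, perm, in_bits=8, acc_bits=32):
    """Permute operand vector `a` by the instruction's pattern `perm`, then
    multiply-accumulate it against `b` in an `acc_bits`-wide accumulator."""
    if acc_bits < 2 * in_bits:
        raise ValueError("accumulator must be at least twice the input width")
    if len(perm) != len(b):
        raise ValueError("permutation and second operand lengths differ")
    lo, hi = -(1 << (in_bits - 1)), (1 << (in_bits - 1)) - 1
    if any(not lo <= x <= hi for x in list(a) + list(b)):
        raise ValueError(f"operands must fit in {in_bits} signed bits")
    permuted = [a[i] for i in perm]       # gather; perm may also duplicate entries
    acc = 0
    for x, y in zip(permuted, b):
        acc += x * y                      # each product fits in 2 * in_bits bits
    # model a fixed-width two's-complement register (wraps on overflow)
    half = 1 << (acc_bits - 1)
    return (acc + half) % (1 << acc_bits) - half
```

The 2x rule guarantees that a single product never overflows; the extra bits beyond 2x in the default width leave headroom for summing many products.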

26-02-2015 publication date

METHOD AND APPARATUS FOR A NON-DETERMINISTIC RANDOM BIT GENERATOR (NRBG)

Number: US20150055778A1
Assignee:

A hardware-based digital random number generator is provided. In one embodiment, a processor includes a digital random number generator (DRNG) to condition entropy data provided by an entropy source, to generate a plurality of deterministic random bit (DRB) strings, and to generate a plurality of nondeterministic random bit (NRB) strings, and an execution unit coupled to the DRNG, in response to a first instruction to read a seed value, to retrieve one of the NRB strings from the DRNG and to store the NRB string in a destination register specified by the first instruction.

1. A processor, comprising: a digital random number generator (DRNG) to condition entropy data provided by an entropy source, to generate a plurality of deterministic random bit (DRB) strings, and to generate a plurality of nondeterministic random bit (NRB) strings; and an execution unit coupled to the DRNG, in response to a first instruction to read a seed value, to retrieve one of the NRB strings from the DRNG and to store the NRB string in a destination register specified by the first instruction.
2. The processor of claim 1, further comprising a flag register to store a flag set by the execution unit to indicate whether the NRB string stored in the destination register is valid.
3. The processor of claim 1, wherein the execution unit is configured, in response to a second instruction to read a random number, to retrieve one of the DRB strings from the DRNG and to store the DRB in a destination register specified by the second instruction.
4. The processor of claim 3, wherein the DRNG comprises: a conditioner to condition the entropy data provided by the entropy source to generate conditioned entropy (CE) data; a DRB generator (DRBG) coupled to the conditioner to generate the DRB strings based on the CE data; and an NRB generator (NRBG) coupled to the conditioner and the DRBG to generate the NRB strings based on the DRB strings and the CE data.
5. The processor of claim 4, wherein ...

14-02-2019 publication date

CONVERTING A BOOLEAN MASKED VALUE TO AN ARITHMETICALLY MASKED VALUE FOR CRYPTOGRAPHIC OPERATIONS

Number: US20190050204A1
Assignee:

A first input share value, a second input share value, and a third input share value may be received. The first input share value may be converted to a summation or subtraction between an input value and a combination of the second input share value and the third input share value. A random number value may be generated and combined with the second input share value and the third input share value to generate a combined value. Furthermore, a first output share value may be generated based on a combination of the converted first input share value, the combined value, and additional random number values.

1. A system comprising: a first set of registers to store a first input share value, a second input share value, and a third input share value, the first input share value representing a Boolean combination between an input value, the second input share value, and the third input share value; a second set of registers to store a first output share value, a second output share value, and a third output share value, the first output share value representing an arithmetic combination between the input value, the second output share value, and the third output share value; a cryptographic component to perform a cryptographic operation based on a Boolean operation and an arithmetic operation; and receive an indication that the cryptographic operation being performed by the cryptographic component has switched from using the Boolean operation to using the arithmetic operation; in response to the indication, receive the first, second, and third input share values from the first set of registers; convert the first input share value to a summation or a subtraction between the input value and a combination of the second input share value and the third input share value; generate a random number value; combine the random number value with the second input share value and the third input share value to generate a combined value; generate additional random number ...
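The patent converts three Boolean shares; the sketch below instead shows the well-known two-share Boolean-to-arithmetic conversion published by Goubin (CHES 2001), which illustrates the same move from an XOR mask to an additive mask without recombining the secret. It is a substitute technique, not the patented three-share method; the 32-bit word size and the function name are assumptions.

```python
import secrets

K = 32                      # word size in bits (assumption)
MASK = (1 << K) - 1


def boolean_to_arithmetic(x_masked: int, r: int) -> int:
    """Goubin's Boolean-to-arithmetic mask conversion (two shares).

    Input: x_masked = x ^ r.  Output: A such that (A + r) mod 2**K == x.
    x itself is never formed; a fresh random gamma blinds the intermediates.
    """
    gamma = secrets.randbits(K)
    t = x_masked ^ gamma
    t = (t - gamma) & MASK
    t ^= x_masked
    gamma ^= r
    a = x_masked ^ gamma
    a = (a - gamma) & MASK
    a ^= t
    return a
```

The trick relies on the map r -> ((x_masked ^ r) - r) mod 2**K being affine over GF(2), so it can be evaluated at gamma and at gamma ^ r and the results XOR-combined, never touching r and x_masked together in a way that reveals x.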

22-02-2018 publication date

VARIABLE PRECISION FLOATING-POINT MULTIPLIER

Number: US20180052661A1
Inventor: Langhammer Martin
Assignee:

Integrated circuits with specialized processing blocks are provided. The specialized processing blocks may include floating-point multiplier circuits that can be configured to support variable precision. A multiplier circuit may include a first carry-propagate adder (CPA), a second carry-propagate adder (CPA), and an associated rounding circuit. The first CPA may be wide enough to handle the required precision of the mantissa. In a bridged mode, the first CPA may borrow an additional bit from the second CPA while the rounding circuit monitors the appropriate bits to select the proper multiplier output. A parallel prefix tree operable in a non-bridged mode or the bridged mode may be used to compute multiple multiplier outputs. The multiplier circuit may also include exponent and exception handling circuitry using various masks corresponding to the desired precision width.

1. Multiplier circuitry on an integrated circuit die, comprising: a first adder circuit; a second adder circuit; and a rounding circuit that is coupled to the first and second adder circuits, wherein the first and second adder circuits generate a first floating-point multiplier output having a first precision in a first mode and further generate a second floating-point multiplier output having a second precision that is different than the first precision in a second mode.
2. The multiplier circuitry of claim 1, wherein the first and second adder circuits comprise carry-propagate adders.
3. The multiplier circuitry of claim 1, wherein a mantissa of the first precision is entirely contained within the first adder circuit, and wherein the second adder circuit contains lower bits of the mantissa that are used to determine a rounding decision.
4. The multiplier circuitry of claim 1, wherein zeroes are appended to multiplier inputs of the second mode so that a mantissa of the second precision is entirely contained within the first adder circuit, wherein the most significant bits of the ...

25-02-2021 publication date

Inference accelerator using logarithmic-based arithmetic

Number: US20210056446A1
Assignee: Nvidia Corp

Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
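The addition scheme in the abstract can be sketched numerically. Assuming a base-2 log format in which an integer exponent e encodes the value 2^(e/n), splitting e = q·n + r turns each term into 2^q · 2^(r/n): the 2^q parts are plain shifts that can be summed per remainder class, and each partial sum is then multiplied once by the constant 2^(r/n). The asynchronous accumulator is not modeled, `log_add` is a hypothetical name, and n = 4 is an arbitrary choice.

```python
import math
from collections import defaultdict


def log_add(exponents, n=4):
    """Add values stored in a base-2 log format, value = 2 ** (e / n), and
    return the linear sum together with its re-encoded log exponent."""
    partial = defaultdict(float)             # one partial sum per remainder class
    for e in exponents:
        q, r = divmod(e, n)                  # e = q * n + r with 0 <= r < n
        partial[r] += math.ldexp(1.0, q)     # 2 ** q: a pure shift in hardware
    # Multiply each partial sum by its remainder factor 2 ** (r / n), which
    # hardware could read from an n-entry lookup table.
    total = sum(s * 2 ** (r / n) for r, s in sorted(partial.items()))
    e_out = round(n * math.log2(total)) if total > 0 else None
    return total, e_out
```

Only n distinct multiplications by 2^(r/n) are needed no matter how many terms are summed, which is where the saving over converting every term to linear form comes from.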

23-02-2017 publication date

Pattern matching based character string retrieval

Number: US20170053039A1
Assignee: International Business Machines Corp

Embodiments relate to generating a retrieval condition for retrieving a target character string from texts by pattern matching. An aspect includes dividing a first text into words. Another aspect includes generating a converted character string by performing at least one of appending at least one character in at least one of the previous and subsequent positions of the target character string. Another aspect includes replacing at least one character of the target character string. Another aspect includes generating the retrieval condition for retrieval candidates in the words of the first text, the retrieval condition comprising determining that a retrieval candidate matches the target character string and does not match the converted character string, based on whether a ratio of a part of the retrieval candidate which matches the converted character string and corresponds to the target character string is less than or equal to a reference frequency.

26-02-2015 publication date

FRACTIONAL SCALING DIGITAL FILTERS AND THE GENERATION OF STANDARDIZED NOISE AND SYNTHETIC DATA SERIES

Number: US20150058388A1
Inventor: Smigelski Jeffrey R.
Assignee: WRIGHT STATE UNIVERSITY

Generation of standardized noise signals that provide mathematically correct noise with no errors and no loss of data, and that can generate the noise of specific environments based on the transfer function of that environment, is discussed. Various embodiments can generate synthetic data sets based on natural data sets that have similar scaling behavior. Fractional scaling digital filters, containing the fractional scaling characteristics of one or more of the eleven fundamental forms of basic building block transfer functions which incorporate the scaling exponent, can be encoded on FPGA devices or DSP chips for use in digital signal processing. Fractional scaling digital filters allow fractional calculus, and thus fractional filtering (e.g., fractional scaling, fractional phase shifting, fractional integration, or fractional differentiation), to be performed on any signal; they represent exact filtering solutions rather than approximations, and demonstrably are extremely accurate, highly efficient, and exhibit a higher level of performance than traditional DSP filters.

1. A system, comprising: an input series component that receives an input series; a fast Fourier transform (FFT) component that converts the input series to a complex frequency domain representation of the input series; a transfer function component that multiplies the complex frequency domain representation of the input series by a selected transfer function to generate a complex frequency domain representation of an output series; and an output component that outputs a time domain representation of the output series, wherein the time domain representation of the output series is generated by the FFT component.
2. The system of claim 1, further comprising an analysis component that determines a scaling exponent associated with the input series.
3. The system of claim 2, wherein the transfer function component adjusts a scaling exponent of the selected transfer function based at least in part on the scaling ...
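Claim 1's pipeline (transform the input series, multiply by a transfer function, transform back) can be sketched in dependency-free Python. Assumptions: a naive O(N^2) DFT stands in for the FFT, and the transfer function is (iω)^α, the standard fractional differentiation and integration operator; the patent's eleven building-block transfer functions are not reproduced. `dft` and `fractional_filter` are illustrative names.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stands in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def fractional_filter(series, alpha):
    """Filter a real series with H(w) = (i*w) ** alpha in the frequency domain.

    alpha = 1 differentiates, alpha = -1 integrates, and non-integer alpha
    gives fractional differentiation or integration. The DC bin is zeroed
    because H(0) is undefined for alpha < 0.
    """
    n = len(series)
    spectrum = dft(series)
    for k in range(n):
        kk = k if k <= n // 2 else k - n          # signed frequency index of bin k
        w = 2 * math.pi * kk / n                  # angular frequency, rad/sample
        spectrum[k] *= ((1j * w) ** alpha) if kk != 0 else 0
    # The imaginary residue is rounding noise for inputs without Nyquist content.
    return [v.real for v in dft(spectrum, inverse=True)]
```

Because the principal branch of (iω)^α gives conjugate values at ±ω, the filtered output of a real input stays real, and applying the half-derivative (α = 0.5) twice reproduces the ordinary derivative.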

10-03-2022 publication date

Memory device and operation method thereof

Number: US20220075600A1
Assignee: Macronix International Co Ltd

A memory device and an operation method thereof are provided. The memory device includes: a memory array including a plurality of memory cells for storing a plurality of weights; a multiplication circuit for performing bitwise multiplication on a plurality of input data and the weights to generate a plurality of multiplication results, wherein, in performing the bitwise multiplication, the memory cells generate a plurality of memory cell currents; a digital accumulating circuit for digitally accumulating the multiplication results; an analog accumulating circuit for accumulating the memory cell currents in the analog domain to generate a first MAC operation result; and a decision unit for deciding whether to perform analog accumulation, digital accumulation, or hybrid accumulation, wherein, in hybrid accumulation, whether the digital accumulating circuit is triggered depends on the first MAC operation result.

10-03-2022 publication date

Protection of cryptographic operations by intermediate randomization

Number: US20220075879A1
Assignee: Cryptography Research Inc

Aspects of the present disclosure involve a method and a system to support execution of the method to perform a cryptographic operation involving a first vector and a second vector, by projectively scaling the first vector, performing a first operation involving the scaled first vector and the second vector to obtain a third vector, generating a random number, storing the third vector in a first location, responsive to the random number having a first value, or in a second location, responsive to the random number having a second value, and performing a second operation involving a first input and a second input, wherein, based on the random number having the first value or the second value, the first input is the third vector stored in the first location or the second location and the second input is a fourth vector stored in the second location or the first location.

15-05-2014 publication date

Impulse regular expression matching

Number: US20140136465A1
Assignee: LSI Corp

Disclosed is a method and apparatus for matching regular expressions. A symbol buffer recording a number of the most recent occurrence positions of each symbol is maintained. When two constants match on either side of a regular expression operator, the symbol buffer is queried to determine whether a member of the complement of the regular expression operator occurred between the two constants. If so, the operator was not satisfied; if not, it was satisfied.
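A minimal Python sketch of the last-occurrence idea for the simplest case, the pattern left[^forbidden]*right with single-symbol constants: rather than backtracking, the matcher records where each symbol was last seen and, when `right` arrives, checks whether any forbidden symbol (the complement of the [^...] operator) appeared after the most recent `left`. Keeping only one position per symbol is a simplification of the abstract's buffer, and `impulse_match` is a hypothetical name.

```python
def impulse_match(text, left, right, forbidden):
    """Decide whether `text` contains `left`, then a run of symbols none of
    which is in `forbidden`, then `right` (the regex left[^forbidden]*right
    for single-symbol constants)."""
    last = {}                              # symbol -> most recent position seen
    for pos, sym in enumerate(text):
        if sym == right and left in last:
            start = last[left]
            # The most recent `left` is the best candidate: a forbidden symbol
            # after it would also lie after any earlier `left`.
            if all(last.get(f, -1) <= start for f in forbidden):
                return True
        last[sym] = pos                    # update only after the query
    return False
```

Each input symbol costs one dictionary update plus, on a `right`, one lookup per forbidden symbol, so the scan is linear in the text length.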

03-03-2016 publication date

SEMICONDUCTOR DEVICE

Number: US20160062951A1
Assignee: Hitachi, Ltd.

A semiconductor device includes a plurality of spin units, each including a memory cell configured to store the value of a spin in an Ising model, a memory cell configured to store an interaction coefficient from an adjacent spin that exerts an interaction on the spin, a memory cell configured to store an external magnetic field coefficient of the spin, and an interaction circuit configured to determine a subsequent state of the spin. The spin units individually include a random number generator configured to supply the random number to the plurality of the spin units, and generate two-valued or three-valued simulated coefficients when performing the interaction that determines the subsequent state of a spin from the value of the spin of an adjacent spin unit, an interaction coefficient, and an external magnetic field coefficient.

1. A semiconductor device comprising: a plurality of spin units individually including: a memory cell configured to store a value of a single spin in an Ising model; a memory cell configured to store an interaction coefficient expressing an interaction from another spin to the single spin; a coefficient regulator configured to select one from a predetermined coefficient group at a probability proportional to a size of the interaction coefficient by comparing the interaction coefficient with a random number; and an arithmetic circuit configured to perform an arithmetic operation to determine a subsequent state of the spin according to the selected coefficient; and a random number generator configured to supply the random number to the plurality of the spin units.
2. The semiconductor device according to claim 1, wherein the random number supplied to the plurality of the spin units is a random pulse train.
3. The semiconductor device according to claim 1, wherein a product of the coefficient selected at the coefficient regulator and a value of an adjacent spin is computed and inputted ...
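The coefficient regulator in claim 1 can be sketched in Python. Assumptions: interaction coefficients are normalised to [-1, 1], the predetermined coefficient group is {-1, 0, +1}, and a spin takes the sign of its regulated local field with a random tie-break; `regulate` and `next_spin` are hypothetical names. Because sign(J) is chosen with probability |J|, the regulated coefficient equals J on average.

```python
import random


def regulate(j: float, rng: random.Random) -> int:
    """Replace a coefficient j in [-1, 1] by sign(j) with probability |j|
    and by 0 otherwise, by comparing |j| with a uniform random number."""
    if not -1.0 <= j <= 1.0:
        raise ValueError("coefficient must be normalised to [-1, 1]")
    if rng.random() < abs(j):
        return 1 if j > 0 else -1
    return 0


def next_spin(neighbour_spins, couplings, h, rng):
    """Next state (+1 or -1) of one spin from its neighbours' spins, the
    interaction coefficients, and the external-field coefficient h."""
    local = sum(regulate(j, rng) * s for j, s in zip(couplings, neighbour_spins))
    local += regulate(h, rng)
    if local != 0:
        return 1 if local > 0 else -1
    return rng.choice((-1, 1))             # tie-break: assumption
```

Replacing multi-bit coefficients by values in {-1, 0, +1} lets the per-spin arithmetic shrink to sign logic and counting, while the randomness preserves the coefficients' average effect.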

01-03-2018 publication date

COMPUTER-BASED SQUARE ROOT AND DIVISION OPERATIONS

Number: US20180060039A1
Assignee: Advanced Micro Devices, Inc.

Square root operations in a computer processor are disclosed. A first iteration for calculating partial results of a square root operation is performed in a larger number of cycles than the remaining iterations. The first iteration requires calculation of a first digit that is larger than the subsequent digits, and thus requires multiplication of values that are larger than the corresponding values for the subsequent digits. By splitting the first digit into two parts, the required multiplications can be performed in less time than if the first digit were not split. Performing these multiplications in less time reduces the total delay of the clock cycles associated with the first digit calculations, which raises the maximum allowable clock frequency. A multiply-and-accumulate unit that performs either packed-single operations or double-precision operations may be used, along with a combined division/square root unit for simultaneous execution of division and square root operations.

1. A method for performing a first operation by calculating at least a portion of a square root result of an operand, the method comprising: receiving the operand; calculating a first digit of the square root result based on the operand; splitting the first digit into a first digit component and a second digit component; calculating a first residual value based on the first digit component and the second digit component, wherein calculating the first residual value is performed in a first number of computer clock cycles; and calculating subsequent residual values and subsequent digits of the square root result based on the first residual value, wherein calculating each subsequent residual value is performed in a second number of computer clock cycles, the second number of computer clock cycles being less than the first number of computer clock cycles.
2. The method of claim 1, wherein: the first digit of the square root result includes more bits than each of the subsequent digits ...
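The abstract optimizes a digit-recurrence square root, in which each iteration produces one result digit and updates a residual. For reference, here is the simplest instance of that recurrence, a restoring radix-2 integer square root in Python; it does not implement the patent's split first digit or higher radix, and the function name is illustrative.

```python
def isqrt_digit_recurrence(n: int) -> int:
    """Restoring radix-2 digit-by-digit integer square root.

    Each iteration decides one bit of the root and updates the residual
    (the part of n not yet accounted for by root**2).
    """
    if n < 0:
        raise ValueError("negative operand")
    # Highest power of four not exceeding n (0 when n == 0).
    bit = 1 << ((n.bit_length() - 1) & ~1) if n else 0
    root, residual = 0, n
    while bit:
        if residual >= root + bit:         # trial subtraction succeeds: digit = 1
            residual -= root + bit
            root = (root >> 1) + bit
        else:                              # digit = 0, residual unchanged
            root >>= 1
        bit >>= 2
    return root
```

Hardware dividers and square-root units generalize this loop to higher radices so that each clock cycle retires several result bits, which is the setting the patent addresses.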

20-02-2020 publication date

METHOD FOR GENERATING A PRIME NUMBER FOR A CRYPTOGRAPHIC APPLICATION

Number: US20200057611A1
Assignee:

The present invention relates to a method for generating a prime number and using it in a cryptographic application, comprising the steps of:
a) determining at least one binary base $B$ with a small size $b = \log_2(B)$ bits and, for each determined base $B$, at least one small prime $p_i$ such that $B \bmod p_i = 1$, with $i$ an integer;
b) selecting a prime candidate $Y$;
c) decomposing the selected prime candidate $Y$ in a base $B$ selected among said determined binary bases: $Y = \sum_j y_j B^j$;
d) computing a residue $y$ from the candidate $Y$ for said selected base such that $y = \sum_j y_j$;
e) testing if said computed residue $y$ is divisible by one small prime $p_i$ selected among said determined small primes for said selected base $B$;
f) while said computed residue $y$ is not divisible by said selected small prime, iteratively repeating above step e) until tests performed at step e) prove that said computed residue $y$ is not divisible by any of said determined small primes for said selected base $B$;
g) when said computed residue $y$ is not divisible by any of said determined small primes for said selected base $B$, iteratively repeating steps c) to f) for each base $B$ among said determined binary bases;
h) when, for all determined bases $B$, said residue $y$ computed for a determined base is not divisible by any of said determined small primes for said determined base $B$, executing a known rigorous probable primality test on said candidate $Y$, and when the known rigorous probable primality test is a success, storing said prime candidate $Y$ and using said stored prime candidate $Y$ in said cryptographic application.

1. A method for generating a prime number and using it in a cryptographic application, comprising the steps of: a) determining, via a processing system (2) comprising at least one hardware processor (4), a test primality circuit (3) and a memory circuit (5), at least one binary base $B$ with a small size $b = \log_2(B)$ bits and, for each determined base $B$, at least one small prime $p_i$ such that $B \bmod p_i = 1$, with $i$ an ...
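The sieve in steps a) to h) rests on one identity: if B = 2^b and p divides 2^b - 1, then B ≡ 1 (mod p), so Y = Σ y_j B^j ≡ Σ y_j (mod p), and divisibility by p can be checked on the short digit sum. A Python sketch under these assumptions: the bases b in {8, 12, 16} and their prime lists are chosen for illustration, and Miller-Rabin stands in for the unnamed "known rigorous probable primality test".

```python
import random
import secrets

# For B = 2**b, every prime p dividing 2**b - 1 satisfies B mod p == 1.
SMALL_PRIMES_BY_BASE_BITS = {
    8: (3, 5, 17),      # 2**8 - 1  = 3 * 5 * 17
    12: (7, 13),        # 2**12 - 1 = 3**2 * 5 * 7 * 13 (3 and 5 already covered)
    16: (257,),         # 2**16 - 1 = 3 * 5 * 17 * 257
}


def digit_sum(y: int, b: int) -> int:
    """Sum of the base-2**b digits of y; congruent to y modulo any p | 2**b - 1."""
    mask, total = (1 << b) - 1, 0
    while y:
        total += y & mask
        y >>= b
    return total


def passes_sieve(y: int) -> bool:
    """Reject y if a listed small prime divides it, testing only digit sums."""
    for b, primes in SMALL_PRIMES_BY_BASE_BITS.items():
        s = digit_sum(y, b)
        if any(s % p == 0 for p in primes):
            return False
    return True


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test with random bases."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    rng = random.SystemRandom()
    for _ in range(rounds):
        x = pow(rng.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def generate_prime(bits: int) -> int:
    """Draw odd candidates of exactly `bits` bits (bits >= 32 assumed, so no
    candidate equals one of the sieving primes) until one passes both tests."""
    while True:
        y = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if passes_sieve(y) and is_probable_prime(y):
            return y
```

The digit sums are tiny compared with Y, so the sieve replaces long multi-precision divisions with short additions and small remainders before the expensive primality test runs.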

02-03-2017 publication date

METHOD AND DEVICE FOR OPTICS BASED QUANTUM RANDOM NUMBER GENERATION

Number: US20170060534A1
Assignee:

A device for random number generation based on an optical process of quantum nature, including a light source emitting photons randomly, a light detector adapted to absorb the randomly emitted photons and to measure a number n of photons produced by the light source in a time interval T, and a randomness extractor. The detector includes a photon sensor acting as a photon-to-electron converter, an amplifier for converting the electron signal received from the photon sensor into a voltage and amplifying the voltage signal, as well as an analog-to-digital converter for processing the amplified signal received from the amplifier by encoding the amplified signal into digital values and sending these digital values to the randomness extractor for further processing such as to produce quantum random numbers (QRNs) based on the number of photons produced by the light source in a time interval T.

1. Device for random number generation based on an optical process of quantum nature comprising: a light source emitting photons randomly; a light detector comprising a photon sensor adapted to absorb the randomly emitted photons; an amplifier for converting an electron signal received from the photon sensor into a voltage and amplifying the voltage signal; and an analog-to-digital converter (ADC) for treating the amplified signal received from the amplifier by encoding the amplified signal into digital values and sending these digital values to a randomness extractor of the device for further processing, wherein the randomness extractor is adapted to generate a number $k$ of high-entropy output bits $y_j$ from a number $l > k$ of lower-entropy raw input bits $r_i$, the photon sensor of the detector is adapted to operate in a linear regime and acts as a photon-to-electron converter to allow the detector to be adapted to measure a number n of photons produced by said light source in a time interval T, and the randomness extractor is adapted, for raw ...
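The abstract leaves the randomness extractor unspecified; Toeplitz hashing is a common choice in quantum RNGs and is assumed here. A Python sketch that compresses l raw bits to k < l output bits with a seeded binary Toeplitz matrix; choosing k from a min-entropy estimate of the detector output is not modeled, and `toeplitz_extract` is an illustrative name.

```python
import secrets


def toeplitz_extract(raw_bits, seed_bits, k):
    """Compress l raw bits into k < l output bits with a binary Toeplitz
    matrix T (k x l) over GF(2): out[i] = XOR over j of T[i][j] & raw[j].

    T is constant along its diagonals, so it is fully described by
    k + l - 1 seed bits: T[i][j] = seed_bits[i - j + l - 1].
    """
    l = len(raw_bits)
    if not 0 < k < l:
        raise ValueError("need 0 < k < l")
    if len(seed_bits) != k + l - 1:
        raise ValueError("seed must have k + l - 1 bits")
    out = []
    for i in range(k):
        bit = 0
        for j in range(l):
            bit ^= seed_bits[i - j + l - 1] & raw_bits[j]
        out.append(bit)
    return out


# Usage: one uniform seed, reused across many raw blocks.
l, k = 64, 32
seed = [secrets.randbits(1) for _ in range(k + l - 1)]
raw = [secrets.randbits(1) for _ in range(l)]   # stand-in for digitized ADC samples
print(toeplitz_extract(raw, seed, k))
```

The Toeplitz structure needs only k + l - 1 seed bits instead of k * l for a general random matrix, and the map is linear over GF(2), which keeps hardware implementations to AND and XOR gates.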

04-03-2021 publication date

ACCUMULATORS FOR BEHAVIORAL CHARACTERISTICS OF WAVES

Number: US20210064366A1
Assignee:

An apparatus such as a graphics processing unit (GPU) includes a plurality of processing elements configured to concurrently execute a plurality of first waves and accumulators associated with the plurality of processing elements. The accumulators are configured to store accumulated values representative of behavioral characteristics of the plurality of first waves that are concurrently executing on the plurality of processing elements. The apparatus also includes a dispatcher configured to dispatch second waves to the plurality of processing elements based on comparisons of values representative of behavioral characteristics of the second waves and the accumulated values stored in the accumulators. In some cases, the behavioral characteristics of the plurality of first waves comprise at least one of fetch bandwidths, usage of an arithmetic logic unit (ALU), and number of export operations. 1. An apparatus comprising: a plurality of processing elements configured to concurrently execute a plurality of first waves; accumulators associated with the plurality of processing elements, wherein the accumulators are configured to store accumulated values representative of behavioral characteristics of the plurality of first waves that are concurrently executing on the plurality of processing elements; and a dispatcher configured to dispatch second waves to the plurality of processing elements based on comparisons of values representative of behavioral characteristics of the second waves and the accumulated values stored in the accumulators. 2. The apparatus of claim 1, wherein the behavioral characteristics of the plurality of first waves comprise at least one of fetch bandwidths, usage of an arithmetic logic unit (ALU), and number of export operations. 3. The apparatus of claim 1, wherein the accumulators have corresponding maximum values, and wherein the dispatcher is configured to determine available portions of the accumulators that are equal to ...
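Claims 1 to 3 describe two parts. The first is per-processing-element accumulators with maximum values. The second is a dispatcher that compares a new wave's characteristics against what is still available. The sketch below models this under its own assumptions:

- Each wave is a dict of hypothetical costs (`fetch_bw`, `alu`, `exports`).
- The available portion is the maximum value minus the accumulated value.
- The dispatcher uses a simple first-fit rule.

Claim 3 is truncated, so the patent may define availability differently.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingElement:
    """Hypothetical PE with one accumulator per behavioral characteristic."""
    max_values: dict
    accumulated: dict = field(default_factory=dict)

    def available(self, key):
        # Available portion = maximum value minus what running waves already use.
        return self.max_values[key] - self.accumulated.get(key, 0)

    def fits(self, wave):
        return all(cost <= self.available(key) for key, cost in wave.items())

    def admit(self, wave):
        for key, cost in wave.items():
            self.accumulated[key] = self.accumulated.get(key, 0) + cost


def dispatch(wave, pes):
    """Send the wave to the first PE whose accumulators can absorb it (first fit)."""
    for idx, pe in enumerate(pes):
        if pe.fits(wave):
            pe.admit(wave)
            return idx
    return None  # no PE has room; the wave waits


limits = {"fetch_bw": 10, "alu": 8, "exports": 4}
pes = [ProcessingElement(dict(limits)), ProcessingElement(dict(limits))]
fetch_heavy = {"fetch_bw": 7, "alu": 1, "exports": 1}
print([dispatch(fetch_heavy, pes) for _ in range(3)])   # [0, 1, None]
```

Two fetch-heavy waves therefore land on different PEs instead of saturating one PE's fetch bandwidth.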

Publication date: 04-03-2021

Matrix multiplication device and operation method thereof

Number: US20210064373A1
Assignee: Neuchips Corp

A matrix multiplication device and an operation method thereof are provided. The matrix multiplication device includes calculation circuits, a control circuit, a multiplication circuit, and a routing circuit. The calculation circuits produce multiply-accumulate values. The control circuit receives a plurality of first element values of a first matrix. The control circuit classifies the first element values into at least one classification value. The multiplication circuit multiplies the classification value by a second element value of a second matrix in a low power mode to obtain at least one product value. The routing circuit transmits each of the product values to at least one corresponding calculation circuit in the calculation circuits in the low power mode.
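The abstract leaves open how the first-matrix elements are "classified". One plausible reading, assumed here, is that equal element values share one classification value. Equal values are common after low-bit quantization. Each distinct value is then multiplied once, and the product is routed to every position that needs it. The function name is hypothetical.

```python
def multiply_by_class(first_values, second_value):
    """Multiply first-matrix elements by one second-matrix element,
    issuing one multiplication per distinct value (hypothetical model)."""
    # "Classify": group element positions by equal value.
    classes = {}
    for pos, value in enumerate(first_values):
        classes.setdefault(value, []).append(pos)
    products = [0] * len(first_values)
    multiplies = 0
    for value, positions in classes.items():
        product = value * second_value    # one multiply per classification value
        multiplies += 1
        for pos in positions:              # "route" the product to each consumer
            products[pos] = product
    return products, multiplies


print(multiply_by_class([3, 0, 3, 3, 0, 5], 4))   # ([12, 0, 12, 12, 0, 20], 3)
```

Six elements need only three multiplications here. The saving grows with the number of repeated values.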

Publication date: 02-03-2017

Protection of a modular exponentiation calculation

Number: US20170061119A1
Inventor: Yannick Teglia
Assignee: STMICROELECTRONICS ROUSSET SAS

A method of protecting a modular exponentiation calculation executed by an electronic circuit using a first register and a second register, successively comprising, for each bit of the exponent: a first step of multiplying the content of one of the registers, selected from among the first register and the second register according to the state of the bit of the exponent, by the content of the other one of the first and second registers, and placing the result in said one of the registers; and a second step of squaring the content of said other one of the registers and placing the result in this other register, wherein the content of said other one of the registers is stored in a third register before the first step and is restored in said other one of the registers before the second step.
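The two steps per exponent bit match the structure of a Montgomery ladder. The register selected by the bit receives the product of both registers, and the other register is squared. The sketch below follows that reading. A list `r` stands in for the first and second registers, and `t` stands in for the third register that saves and restores the operand to be squared. In this fault-free software model the save and restore change nothing. The protection the patent claims depends on the circuit's behaviour, which a sketch cannot show.

```python
def ladder_modexp(x, e, n):
    """Compute x**e mod n with a Montgomery-ladder-style loop (illustrative)."""
    r = [1 % n, x % n]                     # first and second registers
    for i in reversed(range(e.bit_length())):
        bit = (e >> i) & 1
        t = r[bit]                         # third register: save the operand to be squared
        r[1 - bit] = (r[0] * r[1]) % n     # step 1: product goes into the selected register
        r[bit] = t                         # restore before step 2
        r[bit] = (r[bit] * r[bit]) % n     # step 2: square the other register in place
    return r[0]


print(ladder_modexp(7, 560, 561))   # 1 (561 is a Carmichael number)
```

Every bit performs one multiplication and one squaring, whatever its value. This regular pattern is the usual reason to use a ladder in side-channel-sensitive code.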

Publication date: 04-03-2021

Automatically bootstrapping a domain-specific vocabulary

Number: US20210064702A1
Assignee: International Business Machines Corp

A computer-implemented method, system and computer program product for automatically bootstrapping a domain-specific vocabulary from at least one source document using one or more computers, by: (a) encoding one or more passages in the source document to identify one or more relevant words therein, wherein the encoding assigns an importance to the relevant words using an attention mechanism (AM) on top of a recurrent neural network (RNN); (b) expanding the relevant words using word embedding distance, ontology information, or multi-part analogies; and (c) mapping the expanded words to concepts for inclusion into the domain-specific vocabulary, wherein concept disambiguation is performed to ensure that incorrect concepts are not included into the domain-specific vocabulary.

Publication date: 20-02-2020

Determining consensus by parallel proof of voting in consortium blockchain

Number: US20200059369A1

An example for determining consensus by Parallel Proof of Voting (PPoV) in a consortium blockchain includes causing each bookkeeping node to generate and publish a block to a consortium blockchain network. After collecting all the blocks generated in the previous step, the consortium nodes vote and send a total voting message (the hash value of each block, as well as the agreed opinion and signature) to the leader node. The leader node counts the voting results and randomly selects the next leader node, which publishes the block group header to the consortium blockchain network. When a blockchain node receives the blocks generated by the bookkeeping nodes and the block group header generated by the leader node, it stores them in the database as a block group.
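The leader's counting step can be sketched as follows. This is a simplified model built on several assumptions:

- Signatures are not checked.
- A block joins the block group once it has at least `quorum` approvals. The `quorum` threshold is a hypothetical parameter.
- The next leader is drawn with a seeded pseudo-random generator. The abstract does not name a randomness source.

```python
import random
from collections import Counter


def tally_and_elect(votes, nodes, quorum, rng):
    """Count approvals per block hash and pick the next leader at random.

    votes:  iterable of (voter_id, block_hash, agrees) tuples gathered by the leader.
    quorum: hypothetical number of approvals a block needs to join the block group.
    Signature verification is omitted in this sketch.
    """
    approvals = Counter()
    counted = set()
    for voter, block_hash, agrees in votes:
        if (voter, block_hash) in counted:   # ignore duplicate votes
            continue
        counted.add((voter, block_hash))
        if agrees:
            approvals[block_hash] += 1
    accepted = sorted(h for h, n in approvals.items() if n >= quorum)
    next_leader = rng.choice(sorted(nodes))
    return accepted, next_leader


nodes = {"A", "B", "C", "D"}
votes = [("A", "h1", True), ("B", "h1", True), ("C", "h1", True),
         ("D", "h1", False), ("A", "h2", True), ("A", "h2", True)]
print(tally_and_elect(votes, nodes, quorum=3, rng=random.Random(7)))
```

Block `h2` gets only one approval, because node A's repeated vote is ignored. Only `h1` is accepted.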

Publication date: 05-03-2015

Computing device storing look-up tables for computation of a function

Number: US20150067441A1
Assignee: Koninklijke Philips NV

A computing device is provided, configured to compute a function of one or more inputs, the device comprising a storage device storing one or more look-up tables used in the computation of said function, the look-up tables mapping input values to output values, the look-up tables being constructed with respect to a first error correcting code, a second error correcting code, a first error threshold and a second error threshold, such that any two input values (112) that each differ in at most a first error threshold number of bits from a same code word of the first error correcting code are mapped to respective output values (131-38) that each differ in at most a second error threshold number of bits from a same code word of the second error correcting code, wherein the first error threshold is at least 1 and at most the error correcting capability (t1) of the first error correcting code, and the second error threshold is at most the error correcting capability (t2) of the second error correcting code.
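A toy instance makes the construction concrete. The sketch uses these assumptions, none of which come from the patent:

- Both codes are the 3-bit repetition code, so t1 = t2 = 1.
- The function is one-bit NOT.
- The first error threshold is 1 and the second error threshold is 0.

Every input word within one bit of a codeword is mapped to the exact codeword of the function's output.

```python
from itertools import product

N = 3  # length of the toy repetition code; it corrects t = 1 bit error


def encode(bit):
    """Repetition-code encoding of one bit, e.g. 1 -> (1, 1, 1)."""
    return (bit,) * N


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def build_table(f, t_in=1):
    """Look-up table for a one-bit function f over encoded inputs.

    Every word within t_in bits of encode(x) maps to encode(f(x)), so a
    corrupted input still yields a valid codeword (output error threshold 0).
    """
    if not 1 <= t_in <= (N - 1) // 2:
        raise ValueError("t_in must be between 1 and the code's correction capability")
    table = {}
    for x in (0, 1):
        for word in product((0, 1), repeat=N):
            if hamming(word, encode(x)) <= t_in:
                table[word] = encode(f(x))
    return table


lut = build_table(lambda x: 1 - x)   # f(x) = NOT x
print(lut[(0, 0, 1)])   # (1, 1, 1): a flipped bit in encode(0) still gives encode(NOT 0)
```

Keeping t_in within the code's correction capability guarantees that the balls around different codewords do not overlap. Without that limit, table entries would conflict.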

Publication date: 17-03-2022

Flexible accelerator for a tensor workload

Number: US20220083314A1
Assignee: Nvidia Corp

Accelerators are generally utilized to provide high performance and energy efficiency for tensor algorithms. Currently, an accelerator is specifically designed around the fundamental properties of the tensor algorithm and shape it supports, and thus exhibits sub-optimal performance when used for other tensor algorithms and shapes. The present disclosure provides a flexible accelerator for tensor workloads. The flexible accelerator can be a flexible tensor accelerator or an FPGA having a dynamically configurable inter-PE network supporting different tensor shapes and different tensor algorithms, including at least a GEMM algorithm, a 2D CNN algorithm, and a 3D CNN algorithm, and/or having a flexible DPU in which the dot product length of its dot product sub-units is configurable based on a target compute throughput that is less than or equal to the maximum throughput of the flexible DPU.

Publication date: 28-02-2019

PROCESSOR AND METHOD FOR OUTER PRODUCT ACCUMULATE OPERATIONS

Number: US20190065149A1
Assignee: MICROUNITY SYSTEMS ENGINEERING, INC.

A processor and method for performing outer product and outer product accumulation operations on vector operands requiring large numbers of multiplies and accumulations are disclosed. 1. (canceled) 2. A system comprising: a memory storing a first plurality of multiplier operands and a second plurality of multiplicand operands; and a processor coupled to the memory to receive the first plurality of multiplier operands and the second plurality of multiplicand operands and to multiply each one of the first plurality of multiplier operands with every one of the second plurality of multiplicand operands, the processor including an array of circuit tiles arranged in rows and columns, each circuit tile in the array including: a multiplier circuit coupled to receive one multiplier operand and one multiplicand operand and provide a multiplication result; an adder circuit coupled to the multiplier circuit to receive the multiplication result, add it to any previous multiplication result stored in an accumulator circuit and provide a sum; and the accumulator circuit coupled to the adder circuit to store the sum as an accumulation result. 3. A system as in wherein: the array of circuit tiles has a same number n of rows and columns; the processor includes a register file having a bit width of r bits; and each one of a plurality of multiplier operands and each one of the plurality of multiplicand operands has a bit width of b bits, and an aggregate width of r bits where r = n*b. 4. A system as in wherein the accumulator circuit stores more than one copy of the accumulation result. 5. A system as in wherein each circuit tile in the array further includes an output stage circuit coupled to the accumulator circuit for storing the accumulation result before transfer of the accumulation result from the array. 6. A system as in wherein each circuit tile in the array further includes a switching circuit coupled between the accumulator circuit and the output stage circuit for controlling data ...
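Claims 2 and 3 describe an n × n array of tiles. Tile (i, j) multiplies the i-th multiplier by the j-th multiplicand and accumulates the result locally. Each r-bit register holds n lanes of b bits, with r = n*b. The sketch below models this in software. Lane 0 is assumed to occupy the least significant bits, and the function names are hypothetical.

```python
def unpack(word, n, b):
    """Split an r-bit register value (r = n * b) into n unsigned b-bit lanes."""
    mask = (1 << b) - 1
    return [(word >> (lane * b)) & mask for lane in range(n)]


def outer_product_accumulate(acc, a, b):
    """One outer-product-accumulate step on an n x n grid of tiles.

    Tile (i, j) multiplies multiplier a[i] by multiplicand b[j] and adds the
    product to its own accumulator acc[i][j].
    """
    n = len(a)
    if len(b) != n or len(acc) != n:
        raise ValueError("operands and accumulator grid must all have size n")
    for i in range(n):
        for j in range(n):
            acc[i][j] += a[i] * b[j]
    return acc


# r = 32-bit registers holding n = 4 lanes of b = 8 bits each.
a = unpack(0x04030201, 4, 8)        # [1, 2, 3, 4]
b = unpack(0x01010101, 4, 8)        # [1, 1, 1, 1]
acc = [[0] * 4 for _ in range(4)]
outer_product_accumulate(acc, a, b)
print(acc[2])                        # [3, 3, 3, 3]
```

A single step performs n² multiply-accumulates from just two r-bit register reads. This is what makes the tile array attractive for outer products.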

Publication date: 17-03-2022

Convolutional neural network operation method and device

Number: US20220083857A1
Inventor: BO Lin, Chao Li, Wei Zhu
Assignee: Xiamen Sigmastar Technology Ltd

A convolutional neural network operation device includes a scheduling mode unit, a first data processing circuit, a second data processing circuit and a multiply-accumulate (MAC) operation array. The scheduling mode unit determines, according to the quantity and size information of the target convolutional kernels, a target scheduling mode corresponding to a size of a convolutional computing block. According to the target scheduling mode, the first data processing circuit recombines weight data in the target convolutional kernels and the second data processing circuit recombines input data in a target convolutional layer. The MAC operation array includes multiple MAC operation cells and performs a MAC operation based on the recombined weight data and the recombined input data, wherein the quantity of MAC operation cells used by the MAC operation array in each round of operation corresponds to the size of the convolutional computing block.

Publication date: 08-03-2018

Apparatus for Calculating and Retaining a Bound on Error during Floating Point Operations and Methods Thereof

Number: US20180067722A1
Inventor: Jorgensen Alan A.

The apparatus and method for calculating and retaining a bound on error during floating point operations inserts an additional bounding field into the standard floating-point format that records the retained significant bits of the calculation with notification upon insufficient retention. The bounding field, accounting for both rounding and cancellation errors, includes the lost bits D Field and the accumulated rounding error R Field. The D Field states the number of bits in the floating point representation that are no longer meaningful. The bounds on the represented real value are determined by the truncated floating point value and the addition of the error determined by the number of lost bits. The true, real value is absolutely contained by these bounds. The allowable loss (optionally programmable) of significant digits provides a fail-safe, real-time notification of loss of significant digits. This allows representation of real numbers accurate to the last digit. 1. A processing component for use with a main processing unit (910) comprising a bounded floating point unit (BFPU) (950) communicably coupled to said main processing unit (910), wherein: said BFPU (950) comprises a bounded floating point addition/subtraction circuit (200); said bounded floating point addition/subtraction circuit (200) comprises a dominant bound circuit (400) and a main bound circuit (600); said bounded floating point addition/subtraction circuit (200) further comprises a first operand conglomerate register (210), a second operand conglomerate register (220), a final result conglomerate register (285); said first operand conglomerate register (210) accommodates a first operand (201) in a bounded floating point format (100); said bounded floating point format (100) comprises a sign bit S Field (50), an exponent E Field (51), a bound B Field (52), and a significand T Field (53); ...
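The D Field counts bits that are no longer meaningful. The sketch below is a rough software illustration of where such bits come from, assumed here and not the patent's circuit. When two close values are subtracted, their leading bits cancel. The number of bits lost can be estimated from the drop in binary exponent between the larger operand and the result. The R Field's accumulated rounding error is not modeled, and the function name is hypothetical.

```python
import math


def cancellation_bits(a, b):
    """Estimate significand bits lost to cancellation in a - b.

    Loss = binary exponent of the larger operand minus that of the result.
    An exact zero result is treated as losing all 53 bits of a double.
    """
    r = a - b
    if r == 0.0:
        return 53
    _, e_big = math.frexp(max(abs(a), abs(b)))
    _, e_res = math.frexp(r)
    return max(0, e_big - e_res)


print(cancellation_bits(1.0, 0.999))   # 10
print(cancellation_bits(1.0, -1.0))    # 0 (no cancellation)
```

A bounded format would add such a count to the operands' existing lost-bit counts. It would then raise the real-time notification once the total exceeds the allowable loss.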

Publication date: 10-03-2016

Method and apparatus for scalar multiplication secure against differential power attacks

Number: US20160072622A1
Assignee: Umm al-Qura University

A method of scalar multiplication to obtain the scalar product of a secret key and a point on an elliptic curve, wherein the key is m bits long. In selected embodiments, the first step is to partition the key into two partitions, each with m/2 bits. Point-doubling operations are performed on the point and stored into three buffers. Point additions are performed at randomized time intervals, thereby preventing the method from being susceptible to differential power analysis attacks.
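Of the steps described, only the partitioning is concrete enough to sketch. Splitting an m-bit key k into halves gives k = k_hi · 2^(m/2) + k_lo, so kP = k_hi · Q + k_lo · P with Q = 2^(m/2) · P. The sketch below uses integers modulo a prime under addition as a toy stand-in for the curve group. This group is insecure and is chosen only so the arithmetic is easy to check. The three buffers and the randomized timing of point additions are not modeled, and all names are hypothetical.

```python
def scalar_mult(k, P, add, double, zero):
    """Left-to-right double-and-add over a generic additive group."""
    R = zero
    for bit in bin(k)[2:]:
        R = double(R)
        if bit == "1":
            R = add(R, P)
    return R


def split_scalar_mult(k, P, m, add, double, zero):
    """k*P for an m-bit k (m even) via two m/2-bit partitions.

    With k = k_hi * 2**(m//2) + k_lo, k*P = k_hi*Q + k_lo*P, where
    Q = 2**(m//2) * P is produced by m//2 doublings of P.
    """
    if m % 2 or k >> m:
        raise ValueError("m must be even and k must fit in m bits")
    half = m // 2
    k_hi, k_lo = k >> half, k & ((1 << half) - 1)
    Q = P
    for _ in range(half):
        Q = double(Q)
    return add(scalar_mult(k_hi, Q, add, double, zero),
               scalar_mult(k_lo, P, add, double, zero))


# Toy stand-in for the curve group: integers modulo a prime under addition.
MOD = 1009


def add_mod(x, y):
    return (x + y) % MOD


def double_mod(x):
    return (2 * x) % MOD


print(split_scalar_mult(0b10110111, 5, 8, add_mod, double_mod, 0))   # 915 == 183 * 5 % 1009
```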

Publication date: 08-03-2018

PRE-PROCESSING BEFORE PRECISE PATTERN MATCHING

Number: US20180069873A1

Pre-processing before precise pattern matching of a target pattern from a stream of patterns. The pre-processing includes acquiring occurrence numbers of target elements in the target pattern; initializing a buffer, the buffer indicating a section in the stream of patterns; determining whether the occurrence numbers of the target elements in the buffer reach the occurrence numbers of the target elements in the target pattern; in response to determining that they do not, updating the buffer and then returning to the determining step; and, in response to determining that they do, outputting the elements in the buffer for subsequent processing. 1. A method for identifying a target pattern from a stream of patterns, the target pattern and the stream of patterns comprising consecutive elements and the target pattern comprising one or more of the consecutive elements of the stream of patterns, the method comprising: storing a predetermined number of consecutive elements from the stream of patterns in a buffer as a section of elements, wherein the section of elements is defined by a buffer starting point indicator and a buffer ending point indicator; determining a second occurrence value for each element in the target pattern, wherein the second occurrence value is equal to the number of times each element in the target pattern occurs in the section of elements stored in the buffer; updating the buffer to include one additional element in the section of elements by moving the buffer ending point indicator towards the end of the stream of patterns by one element; repeating determining the second occurrence value and updating the buffer until the second occurrence value matches a first occurrence value for each element in the target pattern,
wherein ...
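The buffer loop in the abstract and claim 1 can be sketched as a growing window. The sketch makes two assumptions. First, the "predetermined number" of initially buffered elements is taken to be the target's length. Second, the loop stops once every target element occurs at least as often as in the target ("reach" in the abstract). Only the returned window then needs exact matching. The function name is hypothetical.

```python
from collections import Counter


def prefilter(stream, target, start=0):
    """Grow a buffer from `start` until it holds every element of `target`
    at least as often as `target` does.

    Returns the (start, end) slice of the buffer, or None if the stream ends first.
    """
    need = Counter(target)                          # first occurrence values
    end = min(start + len(target), len(stream))     # initial buffer (assumed size)
    have = Counter(stream[start:end])               # second occurrence values
    while any(have[e] < c for e, c in need.items()):
        if end == len(stream):
            return None
        have[stream[end]] += 1                      # move the ending point indicator by one
        end += 1
    return start, end


print(prefilter("xxabyba", "ab"))   # (0, 4): only "xxab" needs exact matching
```

Counting occurrences is cheap and rules out buffers that cannot contain the target. The more expensive exact matcher then runs only on candidate buffers.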

Publication date: 11-03-2021

Reconfigurable arithmetic engine circuit

Number: US20210072954A1
Inventor: Raymond J. Andraka
Assignee: Cornami Inc

A representative reconfigurable processing circuit and a reconfigurable arithmetic circuit are disclosed, each of which may include input reordering queues; a multiplier shifter and combiner network coupled to the input reordering queues; an accumulator circuit; and a control logic circuit, along with a processor and various interconnection networks. A representative reconfigurable arithmetic circuit has a plurality of operating modes, such as floating point and integer arithmetic modes, logical manipulation modes, Boolean logic, shift, rotate, conditional operations, and format conversion, and is configurable for a wide variety of multiplication modes. Dedicated routing connecting multiplier adder trees allows multiple reconfigurable arithmetic circuits to be reconfigurably combined, in pair or quad configurations, for larger adders, complex multiplies and general sum of products use, for example.

Publication date: 11-03-2021

SPARSE MATRIX MULTIPLICATION ACCELERATION MECHANISM

Number: US20210073318A1
Assignee: Intel Corporation

An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware. 1. An apparatus to facilitate acceleration of matrix multiplication operations, comprising: a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices; and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware. 2. The apparatus of claim 1, wherein the sparse matrix acceleration hardware detects zeroes within the received matrix data by comparing each matrix value with a zero value. 3. The apparatus of claim 1, further comprising compression hardware to compress the matrix data. 4. The apparatus of claim 3, wherein compressing the matrix data comprises generating packed matrix data by removing zero values from the matrix data and generating an indicator vector to identify the location of the zero values in the packed matrix data. 5. The apparatus of claim 4, wherein the sparse matrix acceleration hardware receives the compressed matrix data and identifies the zero values in the compressed matrix data based on the indicator vector. 6. The apparatus of claim 1, wherein the sparse matrix acceleration hardware optimizing the matrix data comprises swapping rows in each of a plurality of sub-matrices of a first of the plurality of input matrices to achieve a maximum number of adjacent rows having a predetermined ...
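Claims 3 to 5 describe packing out zero values and keeping an indicator vector that records where they were. The sketch below models this in software. It assumes an indicator convention of 1 for a nonzero and 0 for a removed zero, and the function names are hypothetical. The dot product then issues a multiply only for packed (nonzero) entries.

```python
def compress(values):
    """Pack the nonzero values and build an indicator vector (1 = nonzero, 0 = zero)."""
    packed = [v for v in values if v != 0]
    indicator = [1 if v != 0 else 0 for v in values]
    return packed, indicator


def sparse_dot(packed, indicator, dense):
    """Dot product that multiplies only where the indicator marks a nonzero."""
    total, multiplies, k = 0, 0, 0
    for pos, flag in enumerate(indicator):
        if flag:
            total += packed[k] * dense[pos]
            multiplies += 1
            k += 1
    return total, multiplies


packed, indicator = compress([0, 3, 0, 0, 5, 2])
print(packed, indicator)                                  # [3, 5, 2] [0, 1, 0, 0, 1, 1]
print(sparse_dot(packed, indicator, [1, 2, 3, 4, 5, 6]))  # (43, 3)
```

Here three multiply-adds replace six. The saving is proportional to the fraction of zeros in the operand.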

Publication date: 11-03-2021

ADDER CIRCUITRY FOR VERY LARGE INTEGERS

Number: US20210075425A1
Inventor: Langhammer Martin
Assignee: Intel Corporation

An integrated circuit that includes very large adder circuitry is provided. The very large adder circuitry receives more than two inputs, each of which has hundreds or thousands of bits. The very large adder circuitry includes multiple adder nodes arranged in a tree-like network. The adder nodes divide the input operands into segments, compute the sum for each segment, and compute the carry for each segment independently of the segment sums. The carries at each level in the tree are accumulated using population counters. After the last node in the tree, the segment sums can then be combined with the carries to determine the final sum output. An adder tree network implemented in this way asymptotically approaches the same area and latency as an adder network that uses infinite-speed ripple carry adders. 1. An integrated circuit, comprising: a first adder circuit configured to generate a first sum and first carry bits; a second adder circuit configured to generate a second sum and second carry bits; a third adder circuit configured to receive the first and second sums; and counter circuitry configured to receive the first carry bits and the second carry bits. 2. The integrated circuit of claim 1, wherein the third adder circuit does not include any inputs for receiving the first and second carry bits. 3. The integrated circuit of claim 1, further comprising: a fourth adder circuit configured to generate a third sum and third carry bits; a fifth adder circuit configured to generate a fourth sum and fourth carry bits; and a sixth adder circuit configured to receive the third and fourth sums, wherein the counter circuitry is further configured to receive the third carry bits and the fourth carry bits. 4. The integrated circuit of claim 3, wherein: the third adder circuit is devoid of inputs for receiving the first and second carry bits; and the sixth adder circuit is devoid of inputs for receiving the third and fourth carry bits. 5. The integrated circuit of claim 3, ...
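The central idea is that segment sums and carries travel separately until a final combination step. The sketch below shows this as a single-level software model. The tree of adder nodes and the population counters are collapsed into one pass, and Python's big-integer addition stands in for the final combiner. Operands are assumed to fit in `seg_bits * n_segs` bits, and the function name is hypothetical.

```python
import random


def segmented_add(operands, seg_bits, n_segs):
    """Sum wide unsigned integers segment by segment (single-level model).

    Each segment column is added independently. Its overflow (a carry count,
    which can exceed 1 when more than two operands are added) is kept apart
    and only folded in during the final combination step.
    """
    mask = (1 << seg_bits) - 1
    seg_sums, carries = [], []
    for i in range(n_segs):
        column = sum((x >> (i * seg_bits)) & mask for x in operands)
        seg_sums.append(column & mask)
        carries.append(column >> seg_bits)      # flows into segment i + 1
    total = 0
    for i in range(n_segs):
        total += seg_sums[i] << (i * seg_bits)
        total += carries[i] << ((i + 1) * seg_bits)
    return total


rng = random.Random(0)
ops = [rng.getrandbits(1024) for _ in range(5)]
print(segmented_add(ops, 64, 16) == sum(ops))   # True
```

No segment waits for another segment's carry while its column is summed. This independence is what lets the hardware version avoid long carry chains inside each adder node.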

Publication date: 17-03-2016

DOUBLE ROUNDED COMBINED FLOATING-POINT MULTIPLY AND ADD

Number: US20160077802A1

Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier. Final results are selected respectively using overflow detection. 1. A machine implemented method comprising: detecting an executable thread portion including a first floating-point (FP) multiplication operation and a second FP operation, the second FP operation specifying as a source operand a result of the first FP multiplication operation; encoding the first FP multiplication operation and the second FP operation as a combined FP operation including a rounding of the result of the first FP multiplication operation followed by the second FP operation using the rounded result as said source operand; storing the encoding of said combined FP operation; and executing said combined FP operation as part of the executable thread portion. 2. The machine implemented method of claim 1, wherein the second FP operation is a FP addition operation. 3. The machine implemented method of claim 1, wherein the second FP operation is a FP subtraction operation. 4. The machine implemented method of claim 1, wherein the ...
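The combined operation must reproduce the double-rounded result of a separate multiply and add, and a plain fused multiply-add does not do that. The sketch below shows why the distinction matters, using Python floats (IEEE 754 doubles). The fused variant is emulated with exact rational arithmetic and rounded once. It is only a reference model, not the patent's hardware.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """Product rounded to double first, then the sum rounded again (two roundings)."""
    return (a * b) + c


def fused_mul_add(a, b, c):
    """Exact a*b + c, rounded once, emulated with exact rational arithmetic."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a, b, c = 1 + 2**-30, 1 - 2**-30, -1.0
print(mul_then_add(a, b, c))    # 0.0: the exact product 1 - 2**-60 rounds to 1.0
print(fused_mul_add(a, b, c))   # -2**-60 (about -8.67e-19)
```

Because the two results differ, compiled code that expects separate multiply-and-add semantics cannot simply be executed on fused hardware. Hence the patent's extra rounding of the product inside the combined operation.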

Publication date: 17-03-2016

Extracting Entropy From The Vibration Of Multiple Machines

Number: US20160077804A1
Assignee: International Business Machines Corp

Generating a pool of random numbers for use by computer applications. Vibration sensors are placed throughout a machine, and entropy data is collected from their measurements. The data is then filtered and sent via a secure connection to a second machine to be added to the second machine's entropy pool. Applications needing a random number may acquire a number from the pool. A method, computer program product and system to generate the pool are provided.
