Total found: 209. Displayed: 124.

Publication date: 15-11-2016

Method and apparatus for performing a shift and exclusive or operation in a single instruction

Number: US0009495166B2
Assignee: Intel Corporation, INTEL CORP

Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.
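
As a rough illustration of what a combined shift-and-XOR step computes, the sketch below models it in Python. The function name `shift_xor`, the 32-bit operand width, and the choice of a right shift are assumptions made for illustration; the abstract does not specify any of them.

```python
MASK32 = 0xFFFFFFFF  # assumed operand width; not stated in the abstract


def shift_xor(value: int, shift: int, other: int) -> int:
    """Shift `value` right by `shift` bits, then XOR the result with `other`.

    Models the two operations as one step, the way a single fused
    instruction would present them to software.
    """
    shifted = (value & MASK32) >> shift
    return (shifted ^ other) & MASK32
```

Fusing the two steps matters mostly for loops that repeat this pattern many times, such as hash-style mixing; the Python model only shows the arithmetic, not any performance effect.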

Publication date: 09-08-2016

Instructions and logic to provide memory access key protection functionality

Number: US0009411600B2
Assignee: Intel Corporation, INTEL CORP

Instructions and logic provide memory key protection functionality. Embodiments include a processor having a register to store a memory protection field. A decoder decodes an instruction having an addressing form field for a memory operand to specify one or more memory addresses, and a memory protection key. One or more execution units, responsive to the memory protection field having a first value and to the addressing form field of the decoded instruction having a second value, enforce memory protection according to said first value of the memory protection field, using the specified memory protection key, for accessing the one or more memory addresses, and fault if a portion of the memory protection key specified by the decoded instruction does not match a stored key value associated with the one or more memory addresses.
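
The key check described above can be sketched in software as follows. This is only a model under stated assumptions: the names `checked_load` and `ProtectionFault`, the per-address key table, and the boolean `protection_enabled` flag (standing in for the memory protection field) are all hypothetical and do not come from the patent.

```python
class ProtectionFault(Exception):
    """Raised when the key supplied with an access does not match the stored key."""


def checked_load(memory, stored_keys, address, key, protection_enabled=True):
    """Return memory[address], enforcing a key match when protection is enabled.

    `stored_keys` maps an address to the key value associated with it.
    `protection_enabled` is a stand-in for the memory protection field.
    """
    if protection_enabled and stored_keys.get(address) != key:
        raise ProtectionFault(f"key mismatch at address {address:#x}")
    return memory[address]
```

With protection disabled the load proceeds without a key comparison, which mirrors the abstract's condition that enforcement depends on the value of the protection field.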

Publication date: 22-11-2016

Method and apparatus for performing a shift and exclusive or operation in a single instruction

Number: US0009501281B2
Assignee: Intel Corporation, INTEL CORP

Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.

Publication date: 17-08-2017

SYSTEM FOR SPECULATIVE EXECUTION EVENT COUNTER CHECKPOINTING AND RESTORING

Number: US20170235580A1
Assignee:

An example system for speculative execution event counter checkpointing and restoring may include a plurality of processors, a first interconnect to couple two or more of the plurality of processors, a second interconnect to couple one or more of the plurality of processors to one or more other system components, and a system memory coupled to one or more of the processors. At least one processor of the plurality of processors may include: a plurality of symmetric cores, at least one of the symmetric cores to simultaneously process a plurality of threads and to perform out-of-order instruction processing for the plurality of threads; at least one shared cache circuit to be shared among two or more of the symmetric cores; and event counter circuitry comprising: a plurality of event counters including programmable event counters and fixed event counters; one or more configuration registers to store configuration data to specify an event type to be counted by the programmable event counters, wherein at least one of the one or more configuration registers is to store configuration data for a plurality of the programmable event counters. The processor may further include transactional memory circuitry to process transactional memory operations including load operations and store operations, the transactional memory circuitry to process a transaction begin instruction to indicate a start of a transactional execution region of a program, a transaction end instruction to indicate an end of the transactional execution region, and a transaction abort instruction to abort processing of the transactional execution region. The processor may further include transaction checkpoint circuitry to store a processor state at the start of the transactional execution region of the program, the processor state including values of one or more of the event counters. The processor may further include lock elision circuitry to cause critical sections of the program to execute as transactions ...

Publication date: 16-05-2017

Instruction and logic to control transfer in a partial binary translation system

Number: US0009652234B2

A dynamic optimization of code for a processor-specific dynamic binary translation of hot code pages (e.g., frequently executed code pages) may be provided by a run-time translation layer. A method may be provided to use an instruction look-aside buffer (iTLB) to map original code pages and translated code pages. The method may comprise fetching an instruction from an original code page, determining whether the fetched instruction is a first instruction of a new code page and whether the original code page is deprecated. If both determinations return yes, the method may further comprise fetching a next instruction from a translated code page. If either determination returns no, the method may further comprise decoding the instruction and fetching the next instruction from the original code page.
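
The fetch decision in this abstract reduces to a two-condition rule, sketched below. The function name `next_fetch_source` and the string return values are hypothetical, chosen only to make the rule concrete.

```python
def next_fetch_source(is_first_instruction_of_new_page: bool,
                      original_page_deprecated: bool) -> str:
    """Choose where the next instruction is fetched from.

    The translated code page is used only when both determinations are
    true; otherwise the instruction is decoded and fetching continues
    from the original code page.
    """
    if is_first_instruction_of_new_page and original_page_deprecated:
        return "translated"
    return "original"
```

Checking the condition only at the first instruction of a new page means control moves to translated code at page boundaries rather than in the middle of a page.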

Publication date: 25-04-2017

Flexible architecture and instruction for advanced encryption standard (AES)

Number: US0009634829B2
Assignee: Intel Corporation, INTEL CORP

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

Publication date: 02-05-2017

Flexible architecture and instruction for advanced encryption standard (AES)

Number: US0009641320B2
Assignee: Intel Corporation, INTEL CORP

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

Publication date: 22-08-2017

Processor-based apparatus and method for processing bit streams using bit-oriented instructions through byte-oriented storage

Number: US0009740484B2

An apparatus and method are described for processing bit streams using bit-oriented instructions. For example, a method according to one embodiment includes the operations of: executing an instruction to get bits for an operation, the instruction identifying a start bit address and a number of bits to be retrieved; retrieving the bits identified by the start bit address and number of bits from a bit-oriented register or cache; and performing a sequence of specified bit operations on the retrieved bits to generate results.
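
A software analogue of the "get bits" operation, reading an arbitrary bit range out of byte-oriented storage, might look like the sketch below. The name `get_bits` and the most-significant-bit-first numbering are assumptions for illustration; the abstract does not state a bit ordering.

```python
def get_bits(data: bytes, start_bit: int, num_bits: int) -> int:
    """Return `num_bits` bits of `data` starting at bit address `start_bit`.

    Bit 0 is taken to be the most significant bit of data[0]
    (an assumed convention). The result is an unsigned integer.
    """
    total_bits = len(data) * 8
    if start_bit < 0 or num_bits < 0 or start_bit + num_bits > total_bits:
        raise ValueError("bit range lies outside the buffer")
    whole = int.from_bytes(data, "big")
    # Drop the bits after the requested range, then mask off the bits before it.
    shift = total_bits - (start_bit + num_bits)
    return (whole >> shift) & ((1 << num_bits) - 1)
```

The range may straddle byte boundaries, which is the case that makes bit-oriented access awkward on byte-addressed storage.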

Publication date: 25-04-2017

Method and apparatus to process SHA-2 secure hashing algorithm

Number: US0009632782B2

A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.

Publication date: 02-08-2016

Apparatus and method of execution unit for calculating multiple rounds of a skein hashing algorithm

Number: US0009405537B2

An apparatus is described that includes an execution unit within an instruction pipeline. The execution unit has multiple stages of a circuit that includes a) and b) as follows. a) a first logic circuitry section having multiple mix logic sections each having: i) a first input to receive a first quad word and a second input to receive a second quad word; ii) an adder having a pair of inputs that are respectively coupled to the first and second inputs; iii) a rotator having a respective input coupled to the second input; iv) an XOR gate having a first input coupled to an output of the adder and a second input coupled to an output of the rotator. b) permute logic circuitry having inputs coupled to the respective adder and XOR gate outputs of the multiple mix logic sections.
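
The mix and permute stages described above can be modelled in a few lines. In this sketch a quad word is a 64-bit integer; the function names, the rotation amounts, and the permutation order passed by the caller are placeholders for illustration and are not the constants of the actual Skein specification.

```python
MASK64 = (1 << 64) - 1  # a quad word is modelled as a 64-bit unsigned integer


def rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def mix(x0: int, x1: int, rotation: int):
    """One mix section: an adder, a rotator on the second input, and an XOR."""
    y0 = (x0 + x1) & MASK64           # adder fed by both inputs
    y1 = rotl64(x1, rotation) ^ y0    # rotator output XORed with adder output
    return y0, y1


def round_step(words, rotations, order):
    """Apply mix to consecutive word pairs, then permute the outputs."""
    mixed = []
    for j in range(0, len(words), 2):
        mixed.extend(mix(words[j], words[j + 1], rotations[j // 2]))
    return [mixed[i] for i in order]
```

Chaining several calls to `round_step` corresponds to the multiple stages of the circuit, with each stage feeding the next.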

Publication date: 25-08-2016

Method, apparatus, and system for speculative abort control mechanisms

Number: US20160246606A1
Assignee:

An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions.

1. A processor comprising: a plurality of cores, one or more of the plurality of cores to concurrently execute multiple threads; one or more of the plurality of cores to perform out-of-order execution of instructions of the threads; one or more of the plurality of cores comprising: instruction fetch logic to fetch instructions of one or more of the threads, instruction decode logic to decode the instructions, register renaming logic to rename one or more registers within a register file, a data cache to cache data, a translation lookaside buffer to store virtual to physical address translations, and a second level cache unit to cache instructions and data; and an execution unit to execute a first instruction to indicate an end of a transaction execution region and to cause memory transactions to be atomically committed or aborted.

2. The processor of claim 1, the execution unit to further execute: a ...

Publication date: 25-04-2017

Flexible architecture and instruction for advanced encryption standard (AES)

Number: US0009634830B2
Assignee: Intel Corporation, INTEL CORP

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

Publication date: 29-12-2016

Instructions and Logic to Provide Memory Access Key Protection Functionality

Number: US20160378692A1
Assignee:

Instructions and logic provide memory key protection functionality. Embodiments include a processor having a register to store a memory protection field. A decoder decodes an instruction having an addressing form field for a memory operand to specify one or more memory addresses, and a memory protection key. One or more execution units, responsive to the memory protection field having a first value and to the addressing form field of the decoded instruction having a second value, enforce memory protection according to said first value of the memory protection field, using the specified memory protection key, for accessing the one or more memory addresses, and fault if a portion of the memory protection key specified by the decoded instruction does not match a stored key value associated with the one or more memory addresses.

Publication date: 06-09-2016

Apparatus and method for vector instructions for large integer arithmetic

Number: US0009436435B2

An apparatus is described that includes a semiconductor chip having an instruction execution pipeline having one or more execution units with respective logic circuitry to: a) execute a first instruction that multiplies a first input operand and a second input operand and presents a lower portion of the result, where, the first and second input operands are respective elements of first and second input vectors; b) execute a second instruction that multiplies a first input operand and a second input operand and presents an upper portion of the result, where, the first and second input operands are respective elements of first and second input vectors; and, c) execute an add instruction where a carry term of the add instruction's adding is recorded in a mask register.
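
The three instruction behaviours listed in this abstract can be modelled element-wise as below. The 64-bit element width and the names `vmul_lo`, `vmul_hi`, and `vadd_carry` are assumptions for illustration; the abstract does not name the instructions or fix the element size.

```python
MASK64 = (1 << 64) - 1  # assumed element width of 64 bits


def vmul_lo(a, b):
    """Element-wise multiply, keeping the lower 64 bits of each product."""
    return [(x * y) & MASK64 for x, y in zip(a, b)]


def vmul_hi(a, b):
    """Element-wise multiply, keeping the upper 64 bits of each product."""
    return [(x * y) >> 64 for x, y in zip(a, b)]


def vadd_carry(a, b):
    """Element-wise add; bit i of the returned mask records a carry out of element i."""
    sums, mask = [], 0
    for i, (x, y) in enumerate(zip(a, b)):
        total = x + y
        sums.append(total & MASK64)
        if total > MASK64:
            mask |= 1 << i
    return sums, mask
```

Splitting each product into low and high halves and keeping carries in a separate mask is what lets multi-word (large integer) arithmetic be assembled from fixed-width vector elements.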

Publication date: 10-01-2017

Method and apparatus for a non-deterministic random bit generator (NRBG)

Number: US0009544139B2

A hardware-based digital random number generator is provided. In one embodiment, a processor includes a digital random number generator (DRNG) to condition entropy data provided by an entropy source, to generate a plurality of deterministic random bit (DRB) strings, and to generate a plurality of nondeterministic random bit (NRB) strings, and an execution unit coupled to the DRNG, in response to a first instruction to read a seed value, to retrieve one of the NRB strings from the DRNG and to store the NRB string in a destination register specified by the first instruction.

Publication date: 17-08-2017

PROCESSOR FOR SPECULATIVE EXECUTION EVENT COUNTER CHECKPOINTING AND RESTORING

Number: US20170235579A1
Assignee:

An example processor for speculative execution event counter checkpointing and restoring may include a plurality of symmetric cores, at least one of the symmetric cores to simultaneously process a plurality of threads and to perform out-of-order instruction processing for the plurality of threads; at least one shared cache circuit to be shared among two or more of the symmetric cores. The processor may further include event counter circuitry comprising: a plurality of event counters including programmable event counters and fixed event counters and one or more configuration registers to store configuration data to specify an event type to be counted by the programmable event counters, wherein at least one of the one or more configuration registers is to store configuration data for a plurality of the programmable event counters. The processor may further include transactional memory circuitry to process transactional memory operations including load operations and store operations, the transactional memory circuitry to process a transaction begin instruction to indicate a start of a transactional execution region of a program, a transaction end instruction to indicate an end of the transactional execution region, and a transaction abort instruction to abort processing of the transactional execution region. The processor may further include transaction checkpoint circuitry to store a processor state at the start of the transactional execution region of the program, the processor state including values of one or more of the event counters. The processor may further include lock elision circuitry to cause critical sections of the program to execute as transactions on multiple threads without acquiring a lock, the lock elision circuitry to cause the critical sections to be re-executed non-speculatively using one or more locks in response to detecting a transaction failure.

1. A processor comprising: a plurality of symmetric cores, at least one of the symmetric cores to ...
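
The checkpoint-and-restore behaviour for event counters can be illustrated with a small software model. The class `EventCounters` and its method names are hypothetical; the sketch assumes that an abort restores the counter values saved at the start of the transactional region, which is suggested by the title but not spelled out in the abstract.

```python
class EventCounters:
    """Toy model of event counters that are checkpointed around a transaction."""

    def __init__(self, names):
        self.values = {name: 0 for name in names}
        self._checkpoint = None

    def count(self, name, amount=1):
        """Record `amount` occurrences of the named event."""
        self.values[name] += amount

    def begin_transaction(self):
        """Save a copy of the counter values at the start of the region."""
        self._checkpoint = dict(self.values)

    def end_transaction(self):
        """Commit: keep the counts accumulated inside the region."""
        self._checkpoint = None

    def abort_transaction(self):
        """Abort: discard counts from the region by restoring the checkpoint."""
        if self._checkpoint is None:
            raise RuntimeError("no transaction in progress")
        self.values = self._checkpoint
        self._checkpoint = None
```

Restoring on abort keeps events from discarded speculative work out of the totals, so a profile reflects only committed execution.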

Publication date: 20-12-2016

QoS based binary translation and application streaming

Number: US0009525586B2
Assignee: Intel Corporation, INTEL CORP

In one embodiment, Quality of Service (QoS) criteria based server side binary translation and execution of applications is performed on multiple servers utilizing distributed translation and execution in either a virtualized or native execution environment. The translated applications are executed to generate output display data, the output display data is encoded in a media format suitable for video streaming, and the video stream is delivered over a network to a client device. In one embodiment, one or more graphics processors assist the central processors of the servers by accelerating the rendering of the application output, and a media encoder encodes the application output into a media format.

Publication date: 18-08-2016

METHOD, APPARATUS, AND SYSTEM FOR SPECULATIVE ABORT CONTROL MECHANISMS

Number: US20160239304A1
Assignee:

An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions.

1. A system comprising: a plurality of processors; a processor interconnect to communicatively couple two of the plurality of processors; and a system memory comprising dynamic random access memory communicatively coupled to one or more of the plurality of processors over a memory interconnect; a plurality of cores, one or more of the plurality of cores to concurrently execute multiple threads; one or more of the plurality of cores to perform out-of-order execution of instructions of the threads; instruction fetch logic to fetch instructions of one or more of the threads, instruction decode logic to decode the instructions, register renaming logic to rename one or more registers within a register file, a data cache to cache data, a translation lookaside buffer to store virtual to physical address translations, a second level cache unit to cache instructions and data, and an execution unit to execute a ...

Publication date: 29-08-2017

Method and apparatus for performing a shift and exclusive or operation in a single instruction

Number: US0009747105B2

Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.

Publication date: 17-08-2017

QOS BASED BINARY TRANSLATION AND APPLICATION STREAMING

Number: US20170237797A1
Assignee:

In one embodiment, Quality of Service (QoS) criteria based server side binary translation and execution of applications is performed on multiple servers utilizing distributed translation and execution in either a virtualized or native execution environment. The translated applications are executed to generate output display data, the output display data is encoded in a media format suitable for video streaming, and the video stream is delivered over a network to a client device. In one embodiment, one or more graphics processors assist the central processors of the servers by accelerating the rendering of the application output, and a media encoder encodes the application output into a media format.

1. A system comprising: a server having a central processor and a network interface, the central processor having a first instruction set, the server to translate a binary having a second instruction set into a translated executable having the first instruction set, the translation performed using Quality of Service (QoS) criteria, wherein the server executes the translated binary to generate a frame of rendered output, and transmits the frame of rendered output via the network interface; and a client device having a display, a client processor and a client network interface, the client device to receive the frame of rendered output from the server via the client network interface, and to display the frame of rendered output on the display using the client processor.

2. The binary translation system of claim 1, wherein the QoS criteria include priority based acceleration and multiple client parameters, wherein the multiple client parameters include a client device resolution, a client device location, a client application type, and a set of client decode capabilities.

3. The binary translation system of claim 1, wherein the frame of rendered output is encoded into a media format before the server transmits the frame of rendered output.

4. ...

Publication date: 09-05-2017

Flexible architecture and instruction for advanced encryption standard (AES)

Number: US0009647831B2
Assignee: Intel Corporation, INTEL CORP

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

Publication date: 15-11-2016

Method and apparatus for performing a shift and exclusive or operation in a single instruction

Number: US0009495165B2
Assignee: Intel Corporation, INTEL CORP

Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.

Publication date: 25-08-2016

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20160248580A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

Publication date: 17-08-2017

SYSTEM-ON-CHIP FOR SPECULATIVE EXECUTION EVENT COUNTER CHECKPOINTING AND RESTORING

Number: US20170235638A1
Assignee:

An example system for speculative execution event counter checkpointing and restoring may include a plurality of symmetric cores, at least one of the symmetric cores to simultaneously process a plurality of threads and to perform out-of-order instruction processing for the plurality of threads; at least one shared cache circuit to be shared among two or more of the symmetric cores. The system may further include a memory controller to couple the symmetric cores to a system memory and a data communication interface to couple one or more of the cores to input/output devices. The system may further include event counter circuitry comprising: a plurality of event counters including programmable event counters and fixed event counters and one or more configuration registers to store configuration data to specify an event type to be counted by the programmable event counters, wherein at least one of the one or more configuration registers is to store configuration data for a plurality of the programmable event counters. The system may further include transactional memory circuitry to process transactional memory operations including load operations and store operations, the transactional memory circuitry to process a transaction begin instruction to indicate a start of a transactional execution region of a program, a transaction end instruction to indicate an end of the transactional execution region, and a transaction abort instruction to abort processing of the transactional execution region. The system may further include transaction checkpoint circuitry to store a processor state at the start of the transactional execution region of the program, the processor state including values of one or more of the event counters. The system may further include lock elision circuitry to cause critical sections of the program to execute as transactions on multiple threads without acquiring a lock, the lock elision circuitry to cause the critical sections to be re-executed non- ...

Publication date: 14-02-2017

Apparatus and method of execution unit for calculating multiple rounds of a skein hashing algorithm

Number: US9569210B2
Assignee: INTEL CORP, Intel Corporation

An apparatus is described that includes an execution unit within an instruction pipeline. The execution unit has multiple stages of a circuit that includes a) and b) as follows: a) a first logic circuitry section having multiple mix logic sections each having: i) a first input to receive a first quad word and a second input to receive a second quad word; ii) an adder having a pair of inputs that are respectively coupled to the first and second inputs; iii) a rotator having a respective input coupled to the second input; iv) an XOR gate having a first input coupled to an output of the adder and a second input coupled to an output of the rotator. b) permute logic circuitry having inputs coupled to the respective adder and XOR gate outputs of the multiple mix logic sections.

Publication date: 15-09-2016

Instruction and logic to test transactional execution status

Number: US20160266992A1
Assignee:

Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction.

Publication date: 26-09-2017

Method and apparatus to process KECCAK secure hashing algorithm

Number: US0009772845B2

A processor includes a plurality of registers, an instruction decoder to receive an instruction to process a KECCAK state cube of data representing a KECCAK state of a KECCAK hash algorithm, to partition the KECCAK state cube into a plurality of subcubes, and to store the subcubes in the plurality of registers, respectively, and an execution unit coupled to the instruction decoder to perform the KECCAK hash algorithm on the plurality of subcubes respectively stored in the plurality of registers in a vector manner.

Publication date: 28-07-2016

PROCESSORS, METHODS, AND SYSTEMS TO RELAX SYNCHRONIZATION OF ACCESSES TO SHARED MEMORY

Number: US20160216967A1
Assignee: Intel Corporation

A processor of an aspect includes a plurality of logical processors. A first logical processor of the plurality is to execute software that includes a memory access synchronization instruction that is to synchronize accesses to a memory. The processor also includes memory access synchronization relaxation logic that is to prevent the memory access synchronization instruction from synchronizing accesses to the memory when the processor is in a relaxed memory access synchronization mode.

1. A processor comprising: a plurality of logical processors; a first logical processor of the plurality, the first logical processor to execute software that includes a memory access synchronization instruction that is to synchronize accesses to a memory; and memory access synchronization relaxation logic to prevent the memory access synchronization instruction from synchronizing accesses to the memory when the processor is in a relaxed memory access synchronization mode.

2. The processor of claim 1, wherein the processor has one or more architecturally-visible bits to indicate that the processor is in the relaxed memory access synchronization mode.

3. The processor of claim 2, wherein the one or more architecturally-visible bits are accessible to software to allow the software to modify the one or more architecturally-visible bits to indicate that the processor is in the relaxed memory access synchronization mode.

4. The processor of claim 2, wherein the one or more architecturally-visible bits correspond to the memory, and further comprising another set of one or more architecturally-visible bits which correspond to a second, different memory.

5. The processor of claim 1, wherein the memory access synchronization instruction is selected from a fence instruction and a barrier instruction, and wherein the memory access synchronization relaxation logic comprises logic to convert the memory access synchronization instruction to a no operation (NOP).

6. The ...
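
The relaxation described in claim 5, converting fence or barrier instructions to NOPs while the relaxed mode is active, can be sketched over a list of instruction mnemonics. The mnemonic strings and the function name `apply_relaxation` are hypothetical and serve only to illustrate the idea.

```python
SYNC_INSTRUCTIONS = {"fence", "barrier"}  # illustrative mnemonics, not a real ISA


def apply_relaxation(instructions, relaxed_mode):
    """Return the instruction list as it would effectively execute.

    In relaxed mode every synchronization instruction is replaced by
    "nop"; otherwise the list is returned unchanged (as a copy).
    """
    if not relaxed_mode:
        return list(instructions)
    return ["nop" if op in SYNC_INSTRUCTIONS else op for op in instructions]
```

Because the mode is a runtime setting, the same program text can run with or without synchronization, for example when the memory in question is known not to be shared.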

Publication date: 16-05-2017

Flexible architecture and instruction for advanced encryption standard (AES)

Number: US0009654281B2
Assignee: Intel Corporation, INTEL CORP

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

More details
Publication date: 02-05-2017

Flexible architecture and instruction for advanced encryption standard (AES)

Number: US0009641319B2
Assignee: Intel Corporation, INTEL CORP

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

More details
Publication date: 16-05-2017

Flexible architecture and instruction for advanced encryption standard (AES)

Number: US0009654282B2
Assignee: Intel Corporation, INTEL CORP

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

More details
Publication date: 14-06-2012

ENHANCING PERFORMANCE BY INSTRUCTION INTERLEAVING AND/OR CONCURRENT PROCESSING OF MULTIPLE BUFFERS

Number: US20120151183A1
Assignee:

An embodiment may include circuitry to execute, at least in part, a first list of instructions and/or to concurrently process, at least in part, first and second buffers. The execution of the first list of instructions may result, at least in part, from invocation of a first function call. The first list of instructions may include at least one portion of a second list of instructions interleaved, at least in part, with at least one other portion of a third list of instructions. The portions may be concurrently carried out, at least in part, by one or more sets of execution units of the circuitry. The second and third lists of instructions may implement, at least in part, respective algorithms that are amenable to being invoked by separate respective function calls. The concurrent processing may involve, at least in part, complementary algorithms.

1. An apparatus comprising: circuitry to perform at least one of the following subparagraphs (a) and (b): (a) execution, at least in part, of a first list of instructions, the execution resulting, at least in part, from invocation of a first function call, the first list of instructions comprising at least one portion of a second list of instructions interleaved, at least in part, with at least one other portion of a third list of instructions, the at least one portion and the at least one other portion to be concurrently carried out, at least in part, by one or more sets of execution units of the circuitry, the second list and the third list of instructions being to implement, at least in part, respective algorithms that are amenable to being invoked by separate respective function calls; and (b) concurrent processing, at least in part, of a first buffer and a second buffer, the concurrent processing involving, at least in part, complementary algorithms.
2. The apparatus of claim 1, wherein: the circuitry is capable of performing both of the subparagraphs (a) and (b); the respective algorithms comprise, at least in part ...

More details
Publication date: 28-06-2012

SYSTEM, APPARATUS, AND METHOD FOR SEGMENT REGISTER READ AND WRITE REGARDLESS OF PRIVILEGE LEVEL

Number: US20120166767A1
Assignee:

Embodiments of systems, apparatuses, and methods for performing a privilege agnostic segment base register read or write instruction are described. An exemplary method may include fetching the privilege agnostic segment base register write instruction, wherein the privilege agnostic write instruction includes a 64-bit data source operand, decoding the fetched privilege agnostic segment base register write instruction, and executing the decoded privilege agnostic segment base register write instruction to write the 64-bit data of the source operand into the segment base register identified by the opcode of the privilege agnostic segment base register write instruction.

1. A method of performing a privilege agnostic segment base register write instruction in a computer processor, comprising: fetching the privilege agnostic segment base register write instruction, wherein the privilege agnostic write instruction includes a 64-bit data source operand; decoding the fetched privilege agnostic segment base register write instruction; executing the decoded privilege agnostic segment base register write instruction to write the 64-bit data of the source operand into the segment base register identified by the opcode of the privilege agnostic segment base register write instruction.
2. The method of claim 1, wherein the segment base register is IA32_FS_BASE.
3. The method of claim 1, wherein the segment base register is IA32_GS_BASE.
4. The method of claim 1, wherein the privilege agnostic segment base register write instruction is a part of a privilege level 3 program.
5. The method of claim 1, further comprising: determining that the computer processor can support the privilege agnostic segment base register write instruction by checking a CPUID feature flag of the processor.
6. The method of claim 5, further comprising: setting a flag in the computer processor indicating support for the privilege agnostic segment base register write instruction.
7. The method of claim 1, ...

More details
Publication date: 06-09-2012

Method, apparatus, and system for speculative execution event counter checkpointing and restoring

Number: US20120227045A1
Assignee: Intel Corp

An apparatus, method, and system are described herein for providing programmable control of performance/event counters. An event counter is programmable to track different events, as well as to be checkpointed when speculative code regions are encountered. So when a speculative code region is aborted, the event counter is able to be restored to its pre-speculation value. Moreover, the difference between a cumulative event count of committed and uncommitted execution and the event count of committed execution represents an event count/contribution for uncommitted execution. From information on the uncommitted execution, hardware/software may be tuned to enhance future execution to avoid wasted execution cycles.
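The checkpoint-and-restore idea in this abstract can be sketched in a few lines of software. The following is a minimal model under assumptions, not the hardware mechanism itself: the class and method names (`EventCounter`, `begin_speculation`, `abort`, `commit`) are hypothetical.

```python
class EventCounter:
    """Toy model of an event counter that can be checkpointed and restored."""

    def __init__(self) -> None:
        self.count = 0            # cumulative count (committed + uncommitted)
        self._checkpoint = None   # value saved when a speculative region starts

    def record(self, events: int = 1) -> None:
        self.count += events

    def begin_speculation(self) -> None:
        # Checkpoint the counter when a speculative code region is entered.
        self._checkpoint = self.count

    def commit(self) -> None:
        # The speculative work became architectural; drop the checkpoint.
        self._checkpoint = None

    def abort(self) -> int:
        """Restore the pre-speculation value.

        Returns the events attributable to uncommitted execution, i.e. the
        cumulative count minus the committed count.
        """
        if self._checkpoint is None:
            raise RuntimeError("no speculative region is active")
        uncommitted = self.count - self._checkpoint
        self.count = self._checkpoint
        self._checkpoint = None
        return uncommitted


counter = EventCounter()
counter.record(5)             # committed work
counter.begin_speculation()
counter.record(3)             # speculative work
wasted = counter.abort()      # region aborted: counter is rolled back
print(counter.count, wasted)
```

The value returned by `abort` is the quantity the abstract suggests feeding back into tuning, since it measures work that was executed but never committed.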

More details
Publication date: 08-08-2013

INSTRUCTION AND LOGIC TO TEST TRANSACTIONAL EXECUTION STATUS

Number: US20130205119A1
Assignee:

Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction. 1. A computer implemented method comprising:decoding an instruction to test a transactional status; andexecuting the decoded instruction to determine if the execution context is within a transactional region.2. The computer implemented method of wherein responsive to said executing the decoded instruction claim 1 , setting a register to a second value indicative of a nesting level of the transactional region.3. The computer implemented method of wherein responsive to said executing the decoded instruction claim 1 , setting a flag to a first value if the instruction is executed within a transactional region.4. The computer implemented method of wherein the flag is set to the first value of zero if the instruction is executed within a transactional region.5. The computer implemented method of wherein responsive to said executing the decoded instruction claim 3 , setting the flag to a second value if the instruction is not executed within a transactional region.6. The computer implemented method of wherein the flag is set to the second value of one if the instruction is not executed ...

More details
Publication date: 29-08-2013

Add Instructions to Add Three Source Operands

Number: US20130227252A1
Assignee:

A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly in a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.

1. A method comprising: receiving an add instruction, the add instruction indicating a first source operand, a second source operand, and a third source operand; and storing a sum calculated using the first, second, and third source operands as a result of the add instruction, in which the sum is stored partly in a destination operand indicated by the add instruction and partly in a plurality of flags.
2. The method of claim 1, wherein storing comprises storing a next to most significant bit of the sum in a first flag of the plurality and a most significant bit of the sum in a second flag of the plurality.
3. The method of claim 1, wherein storing the sum partly in the flags comprises storing the sum partly in a carry flag and partly in a second flag.
4. The method of claim 3, wherein the second flag comprises a re-purposed architectural flag.
5. The method of claim 3, wherein the second flag comprises an overflow flag.
6. The method of claim 1, wherein storing the sum comprises storing a sum of the first, second, and third source operands added to a combination of the plurality of flags.
7. The method of claim 6, wherein storing the sum comprises storing a sum of the first, second, and third source operands added to a first flag of the plurality and added to a product that is two times a second flag of the plurality.
8. The method of claim 6, wherein the combination of the plurality of flags includes an overflow flag, and ...

More details
Publication date: 19-09-2013

Instruction For Enabling A Processor Wait State

Number: US20130246824A1
Assignee:

In one embodiment, the present invention includes a processor having a core with decode logic to decode an instruction prescribing an identification of a location to be monitored and a timer value, and a timer coupled to the decode logic to perform a count with respect to the timer value. The processor may further include a power management unit coupled to the core to determine a type of a low power state based at least in part on the timer value and cause the processor to enter the low power state responsive to the determination. Other embodiments are described and claimed.

1. A processor comprising: a decode logic to receive and decode an instruction indicating a location in at least one storage to be monitored; and power management logic to indicate a power state based at least in part on an amount of time the location is not accessed by at least a second instruction.
2. The processor of claim 1, further comprising a monitor engine coupled to a cache memory to determine if a line of the cache memory including a copy of the monitored location is updated.
3. The processor of claim 2, wherein the monitor engine is to communicate the updated copy and a wake up signal to the processor.
4. The processor of claim 3, wherein the processor is to determine whether the updated copy corresponds to a target value, and if so to exit a low power state, and otherwise to determine a new low power state and to enter into the new low power state.
5. The processor of claim 1, wherein the instruction further indicates a timer value, wherein the processor further includes a timer coupled to the decode logic to perform a count with respect to the timer value.
6. The processor of claim 5, wherein the power management logic is to determine a type of a low power state for the processor based at least in part on the timer value, and to cause the processor to enter the low power state responsive to the determination if a value of the monitored location does ...

More details
Publication date: 17-10-2013

METHOD AND APPARATUS TO PROCESS KECCAK SECURE HASHING ALGORITHM

Number: US20130275722A1
Assignee: Intel Corporation

A processor includes a plurality of registers, an instruction decoder to receive an instruction to process a KECCAK state cube of data representing a KECCAK state of a KECCAK hash algorithm, to partition the KECCAK state cube into a plurality of subcubes, and to store the subcubes in the plurality of registers, respectively, and an execution unit coupled to the instruction decoder to perform the KECCAK hash algorithm on the plurality of subcubes respectively stored in the plurality of registers in a vector manner.

1. A processor, comprising: a plurality of registers; an instruction decoder to receive an instruction to process a KECCAK state cube of data representing a KECCAK state of a KECCAK hash algorithm, to partition the KECCAK state cube into a plurality of subcubes, and to store the subcubes in the plurality of registers, respectively; and an execution unit coupled to the instruction decoder to perform the KECCAK hash algorithm on the plurality of subcubes respectively stored in the plurality of registers in a vector manner.
2. The processor of claim 1, wherein the KECCAK state cube includes 64 slices partitioned into 4 subcubes, wherein each subcube contains 16 slices.
3. The processor of claim 2, wherein the plurality of registers include 4 registers, each having at least 450 bits.
4. The processor of claim 1, wherein, for each round of the KECCAK algorithm, the execution unit is configured to perform KECCAK_THETA operations, including performing a θ function of the KECCAK algorithm on the subcubes stored in the registers in parallel, and performing a first portion of a ρ function of the KECCAK algorithm on the subcubes in parallel.
5. The processor of claim 4, wherein the execution unit is further configured to perform KECCAK_ROUND operations, including performing a second portion of the ρ function of the KECCAK algorithm on the subcubes in parallel, performing a π function of the KECCAK algorithm on the ...
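The partitioning of a 64-slice state cube into 4 subcubes of 16 slices each can be illustrated in software. The sketch below assumes the usual Keccak-f[1600] layout of 25 lanes of 64 bits, with slice index equal to bit position within a lane; the function names and the list-of-integers representation are hypothetical and only stand in for the registers mentioned in the claims.

```python
LANES = 25           # 5 x 5 lanes in the Keccak-f[1600] state
SLICES = 64          # bits per lane, i.e. slices in the state cube
SUBCUBES = 4
SLICES_PER_SUBCUBE = SLICES // SUBCUBES   # 16
SUB_MASK = (1 << SLICES_PER_SUBCUBE) - 1


def partition_state(lanes):
    """Split 25 64-bit lanes into 4 subcubes of 25 16-bit lane fragments."""
    if len(lanes) != LANES:
        raise ValueError("expected 25 lanes")
    return [
        [(lane >> (SLICES_PER_SUBCUBE * s)) & SUB_MASK for lane in lanes]
        for s in range(SUBCUBES)
    ]


def merge_subcubes(subcubes):
    """Inverse of partition_state: rebuild the 25 64-bit lanes."""
    return [
        sum(sub[i] << (SLICES_PER_SUBCUBE * s) for s, sub in enumerate(subcubes))
        for i in range(LANES)
    ]


state = [(i * 0x0123456789ABCDEF) & ((1 << 64) - 1) for i in range(LANES)]
subcubes = partition_state(state)
print(len(subcubes), len(subcubes[0]) * SLICES_PER_SUBCUBE)  # subcubes, bits each
print(merge_subcubes(subcubes) == state)
```

Each subcube in this model carries 25 fragments of 16 bits, which is the amount of data one of the four registers would need to hold under these assumptions.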

More details
Publication date: 24-10-2013

METHOD AND APPARATUS TO PROCESS SHA-1 SECURE HASHING ALGORITHM

Number: US20130283064A1
Assignee:

A processor includes an instruction decoder to receive a first instruction to process a SHA-1 hash algorithm, the first instruction having a first operand to store a SHA-1 state, a second operand to store a plurality of messages, and a third operand to specify a hash function, and an execution unit coupled to the instruction decoder to perform a plurality of rounds of the SHA-1 hash algorithm on the SHA-1 state specified in the first operand and the plurality of messages specified in the second operand, using the hash function specified in the third operand.

1. A processor, comprising: an instruction decoder to receive a first instruction to process a SHA-1 hash algorithm, the first instruction having a first operand to store a SHA-1 state, a second operand to store a plurality of messages, and a third operand to specify a hash function; and an execution unit coupled to the instruction decoder, in response to the first instruction, to perform a plurality of rounds of the SHA-1 hash algorithm on the SHA-1 state specified in the first operand and the plurality of messages specified in the second operand, using the hash function specified in the third operand.
2. The processor of claim 1, wherein the first operand specifies a first register having at least 160 bits storing data of SHA-1 state variables.
3. The processor of claim 2, wherein the second operand specifies a second register or a memory location having at least 128 bits storing at least four messages.
4. The processor of claim 3, wherein at least four rounds of the SHA-1 algorithm are performed in response to the first instruction as a single instruction multiple data (SIMD) instruction.
5. The processor of claim 1, wherein the instruction decoder receives a second instruction, and wherein in response to the second instruction, the execution unit is configured to perform a first part of message scheduling operations based on a plurality of first previous messages specified by the second ...
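As a rough software picture of "several SHA-1 rounds per instruction", the sketch below applies four standard SHA-1 rounds to a five-word state, using four message words and a selector for the round function. The round logic and constants are those of the published SHA-1 algorithm; the function name `sha1_rounds4`, the tuple-based operands, and the 0 to 3 selector encoding are assumptions made for illustration, not the instruction's actual encoding.

```python
MASK32 = 0xFFFFFFFF
# Round constants of the four 20-round stages of SHA-1.
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)


def rotl32(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK32


def sha1_rounds4(state, words, func: int):
    """Apply four SHA-1 rounds.

    state: (a, b, c, d, e), five 32-bit words (160 bits in total)
    words: four 32-bit message-schedule words (128 bits in total)
    func:  0 = choose, 1 = parity, 2 = majority, 3 = parity
    """
    a, b, c, d, e = state
    for w in words:
        if func == 0:
            f = (b & c) | (~b & d)
        elif func == 2:
            f = (b & c) | (b & d) | (c & d)
        else:
            f = b ^ c ^ d
        temp = (rotl32(a, 5) + (f & MASK32) + e + K[func] + w) & MASK32
        a, b, c, d, e = temp, a, rotl32(b, 30), c, d
    return a, b, c, d, e


initial = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
print([hex(x) for x in sha1_rounds4(initial, (0, 0, 0, 0), 0)])
```

A full 80-round compression would call this helper twenty times, switching `func` every five calls, and then add the result to the initial state word by word.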

More details
Publication date: 14-11-2013

Instruction and Logic to Control Transfer in a Partial Binary Translation System

Number: US20130305019A1
Assignee:

A dynamic optimization of code for a processor-specific dynamic binary translation of hot code pages (e.g., frequently executed code pages) may be provided by a run-time translation layer. A method may be provided to use an instruction look-aside buffer (iTLB) to map original code pages and translated code pages. The method may comprise fetching an instruction from an original code page, determining whether the fetched instruction is a first instruction of a new code page and whether the original code page is deprecated. If both determinations return yes, the method may further comprise fetching a next instruction from a translated code page. If either determination returns no, the method may further comprise decoding the instruction and fetching the next instruction from the original code page.

1-27. (canceled)
28. A method comprising: fetching an instruction from an original code page; determining whether the fetched instruction is a first instruction of a new code page and whether the original code page is deprecated; if the fetched instruction is a first instruction of a new code page and the original code page is deprecated, fetching a next instruction from a translated code page; and if the fetched instruction is not a first instruction of a new code page or the original code page is not deprecated, decoding the instruction and fetching the next instruction from the original code page.
29. The method of claim 28, wherein the original code page and the translated code page are mapped by an instruction look-aside buffer (iTLB).
30. The method of claim 29, wherein the translated code page is stored in a context parallel to a context of the original code page.
31. The method of claim 29, wherein the iTLB has a tag bit for each code page to indicate whether a code page is an original code page or translated code page.
32. The method of claim 29, wherein the iTLB has a data bit for each code page to indicate whether a code page is deprecated or not.
33. The method of ...
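The two-condition fetch decision described above is small enough to write down directly. The sketch below is a plain software restatement under assumptions: the `ITlbEntry` record and its `deprecated` field are hypothetical stand-ins for the per-page data bit mentioned in the claims.

```python
from dataclasses import dataclass


@dataclass
class ITlbEntry:
    # Hypothetical model of the per-page bit kept in the iTLB.
    deprecated: bool   # the original page has been superseded by a translation


def next_fetch_source(is_first_instruction_of_new_page: bool, entry: ITlbEntry) -> str:
    """Decide where the next instruction is fetched from."""
    if is_first_instruction_of_new_page and entry.deprecated:
        # Both determinations are "yes": switch to the translated code page.
        return "translated"
    # Either determination is "no": decode and continue in the original page.
    return "original"


hot_page = ITlbEntry(deprecated=True)
print(next_fetch_source(True, hot_page))    # control enters a deprecated page
print(next_fetch_source(False, hot_page))   # already in the middle of the page
```

The check only redirects at a page entry point, so execution that is already inside an original page continues there until the next page boundary.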

More details
Publication date: 21-11-2013

ROTATE INSTRUCTIONS THAT COMPLETE EXECUTION WITHOUT READING CARRY FLAG

Number: US20130311756A1
Assignee:

A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.

1. A method comprising: receiving a rotate instruction, the rotate instruction indicating a source operand and a rotate amount; storing a result in a destination operand indicated by the rotate instruction, the result having the source operand rotated by the rotate amount; and completing execution of the rotate instruction without reading a carry flag.
2. The method of claim 1, wherein completing comprises completing execution of the rotate instruction without reading an overflow flag.
3. The method of claim 2, wherein completing comprises completing execution of the rotate instruction without writing the carry flag and without writing the overflow flag.
4. The method of claim 2, wherein completing comprises completing execution of the rotate instruction without reading a sign flag, without reading a zero flag, without reading an auxiliary carry flag, and without reading a parity flag.
5. The method of claim 4, wherein completing comprises completing execution of the rotate instruction without writing the carry flag, without writing the overflow flag, without writing the sign flag, without writing the zero flag, without writing the auxiliary carry flag, and without writing the parity flag.
6. The method of claim 1, wherein receiving comprises receiving a rotate instruction that explicitly specifies the source operand and that explicitly specifies the destination operand.
7. The method of claim 1, wherein receiving comprises receiving a rotate instruction that explicitly specifies a second source operand having the rotate amount.
8. ...
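A rotate whose result depends only on the source operand and the rotate amount is naturally a pure function, which is the property this entry emphasises. The sketch below assumes a rotate-right direction and a configurable operand width; both are choices made for illustration, since the listing does not state them.

```python
def rotate_right(value: int, amount: int, width: int = 64) -> int:
    """Rotate `value` right by `amount` bits within `width` bits.

    The result is a function of the inputs only: no flag state is read,
    and none is modified.
    """
    mask = (1 << width) - 1
    value &= mask
    amount %= width
    return ((value >> amount) | (value << (width - amount))) & mask


print(hex(rotate_right(0x12345678, 8, width=32)))
print(hex(rotate_right(1, 1)))
```

Because no flag is an input or an output, two such rotates in a row have no hidden dependency on each other through the flags register.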

More details
Publication date: 05-12-2013

PROCESSOR-BASED APPARATUS AND METHOD FOR PROCESSING BIT STREAMS

Number: US20130326201A1
Assignee:

An apparatus and method are described for processing bit streams using bit-oriented instructions. For example, a method according to one embodiment includes the operations of: executing an instruction to get bits for an operation, the instruction identifying a start bit address and a number of bits to be retrieved; retrieving the bits identified by the start bit address and number of bits from a bit-oriented register or cache; and performing a sequence of specified bit operations on the retrieved bits to generate results. 1. A method comprising:executing an instruction to get bits for an operation, the instruction identifying a start bit address and a number of bits to be retrieved;retrieving the bits identified by the start bit address and number of bits from a bit-oriented register or cache; andperforming a sequence of specified bit operations on the retrieved bits to generate results.2. The method as in further comprising:determining whether the bits identified by the start bit address and number of bits are stored in the bit-oriented register or cache;if not, then converting the start bit address and number of bits to a start byte address and number of bytes to be retrieved; andretrieving the bytes identified by the start byte address and number of bytes from a byte-oriented memory.3. The method as in further comprising:discarding unwanted bits from the first byte and the last byte retrieved; andperforming the sequence of specified bit operations on the remaining bits to generate results.4. The method as in further comprising:generating a byte address for storing the results back to a byte-oriented memory; andusing the byte address to store the results back to the byte-oriented memory.5. The method as in wherein the sequence of specified bit operations are part of a decompression process for decompressing a bit stream.6. A method comprising:executing an instruction to put new bits for an operation into a bit stream, the instruction identifying a start bit ...
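The fallback path in the claims (convert a start bit address and bit count into a byte range, fetch the bytes, then discard the unwanted bits of the first and last byte) can be sketched as follows. The function name `get_bits` and the most-significant-bit-first numbering within each byte are assumptions for illustration.

```python
def get_bits(memory: bytes, start_bit: int, num_bits: int) -> int:
    """Read `num_bits` bits starting at bit address `start_bit`.

    Bit 0 is taken to be the most significant bit of byte 0 (an assumption).
    """
    if num_bits == 0:
        return 0
    if start_bit < 0 or num_bits < 0 or start_bit + num_bits > 8 * len(memory):
        raise ValueError("bit range lies outside the memory")

    # Convert the bit range into a byte range.
    start_byte = start_bit // 8
    end_byte = (start_bit + num_bits - 1) // 8
    chunk = memory[start_byte:end_byte + 1]

    # Discard unwanted trailing bits of the last byte, then the unwanted
    # leading bits of the first byte.
    value = int.from_bytes(chunk, "big")
    trailing = 8 * len(chunk) - (start_bit % 8) - num_bits
    value >>= trailing
    return value & ((1 << num_bits) - 1)


stream = bytes([0b10110011, 0b01011100])
print(bin(get_bits(stream, 4, 8)))   # a field that straddles the two bytes
```

Fields that do not start or end on a byte boundary cost at most one extra byte at each end of the fetched range.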

More details
Publication date: 02-01-2014

Flexible Architecture and Instruction for Advanced Encryption Standard (AES)

Number: US20140003602A1
Assignee:

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

1. An apparatus comprising: a key scheduler, the key scheduler to generate a round key for an AES round associated with an AES round key operation based on a received key; and AES round logic to perform one of a plurality of AES round operations to compute a result of an AES operation on an input to the AES round and the round key for the AES round to provide a next input to a next AES round or the result of the AES operation.
2. The apparatus of claim 1, wherein the round key is stored in a data cache.
3. The apparatus of claim 2, wherein the key scheduler to pre-compute all round keys for the AES operation and to store all round keys in the data cache prior to starting the AES operation and the AES round logic to perform each AES round operation using the pre-computed round keys stored in the data cache.
4. The apparatus of claim 2, wherein an AES round operation to cause the execution unit to: perform an exclusive OR (XOR) operation on an input to the round and the round key for the AES round to produce an intermediate value; perform a substitution operation for each byte in the intermediate value based on values stored in a lookup table; and pass results of the substitution operation through a bit-linear transform that shifts rows in the intermediate value.
5. The apparatus of claim 1, wherein an AES round operation for a number of AES rounds − 1 to cause the execution unit to: perform an exclusive OR (XOR) operation on the input to the AES round and the round key for the AES round to produce an ...

More details
Publication date: 02-01-2014

Systems, Apparatuses, and Methods for Implementing Temporary Escalated Privilege

Number: US20140006739A1
Assignee:

Embodiments of systems, apparatuses, and methods for temporarily allowing access to a lower privilege level from a higher privilege level are described.

1. A processor comprising: execution logic to execute one or more instructions of a program, wherein the program operates at different privilege levels including a higher privilege level and a lower privilege level; storage for an indication of when the program is operating at the higher privilege level whether or not the program is allowed to access data associated with the lower privilege level.
2. The processor of claim 1, wherein the storage for an indication of when the program is operating at the higher privilege level whether or not the program is allowed to access data associated with the lower privilege level is an EFLAGS register.
3. The processor of claim 2, wherein bit 18 of the EFLAGS register is used for the indication.
4. The processor of claim 2, wherein a reserved bit of the EFLAGS register is used for the indication.
5. The processor of claim 1, wherein the storage for an indication of when the program is operating at the higher privilege level whether or not the program is allowed to access data associated with the lower privilege level is a status register.
6. The processor of claim 1, wherein the higher privilege level is a kernel level and the lower privilege level is a user mode.
7. The processor of claim 6, wherein the data to be accessed is in a user mode page.
8. A method of using temporary privilege while executing a program in a processor comprising: entering a higher privileged level; requesting access of data associated with a lower privilege level; determining if the access request for data associated with a lower privilege level is allowed based upon an indicator set in the processor; denying the access request when the indicator does not indicate that such a request should be granted; and allowing the access request when the indicator does indicate that such a request should be granted.
9. The method of ...

More details
Publication date: 02-01-2014

MATRIX MULTIPLY ACCUMULATE INSTRUCTION

Number: US20140006753A1
Assignee:

A method is described. The method includes iteratively performing, for each position in a result matrix stored in a third register: multiplying a value at a matrix position stored in a first register with a value at a matrix position stored in a second register to obtain a first multiplicative value, where the positions in the first register and the second register are determined by the position in the result matrix; and performing an exclusive or (XOR) operation with the first multiplicative value and a value stored at a result matrix position stored in the third register to obtain a result value.

1. A method of performing a MUL_ACCUMULATE_BYTE_GF2 instruction in a computer processor, comprising: decoding the MUL_ACCUMULATE_BYTE_GF2 instruction; executing the MUL_ACCUMULATE_BYTE_GF2 instruction by iteratively performing for each position in a result matrix stored in a third register: multiplying a value at a matrix position stored in a first register with a value at a matrix position stored in a second register to obtain a first multiplicative value, wherein the positions in the first register and the second register are determined by the position in the result matrix; and performing an exclusive or (XOR) operation with the first multiplicative value and a value stored at a result matrix position stored in the third register to obtain a result value.
2. The method of wherein executing the MUL_ACCUMULATE_BYTE_GF2 instruction further comprises storing the result value at the result position stored in the third register.
3. The method of wherein executing the MUL_ACCUMULATE_BYTE_GF2 instruction further comprises performing a reduction operation on the multiplicative value prior to performing the XOR operation.
4. The method of wherein executing the MUL_ACCUMULATE_BYTE_GF2 instruction further comprises determining if there are additional positions in the first matrix and the second matrix to process.
5. The method of wherein executing the MUL_ACCUMULATE_BYTE_GF2 ...
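A multiply-accumulate in which products are combined by XOR is what matrix multiplication looks like over a binary field. The sketch below assumes byte elements in GF(2^8), reduction by the AES polynomial 0x11B, and 4 x 4 matrices stored row-major in 16-element lists; the listing does not confirm any of these, and the function names are hypothetical.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two bytes in GF(2^8), reducing by `poly` (assumed polynomial)."""
    result = 0
    while b:
        if b & 1:
            result ^= a       # addition in GF(2) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly         # reduction step keeps `a` within one byte
        b >>= 1
    return result


def mul_accumulate_gf2(first, second, result, n: int = 4):
    """Return result XOR (first * second) for n x n byte matrices."""
    out = list(result)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Positions read from the two inputs are fixed by (i, j).
                out[i * n + j] ^= gf256_mul(first[i * n + k], second[k * n + j])
    return out


identity = [1 if r == c else 0 for r in range(4) for c in range(4)]
matrix = list(range(16))
print(mul_accumulate_gf2(identity, matrix, [0] * 16) == matrix)
```

Accumulating with XOR rather than integer addition means no carries cross element boundaries, so every result byte can be computed independently of its neighbours.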

More details
Publication date: 09-01-2014

ADDITION INSTRUCTIONS WITH INDEPENDENT CARRY CHAINS

Number: US20140013086A1
Assignee:

A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.

1. A method comprising: receiving a first addition instruction; receiving a second addition instruction; executing the first addition instruction and the second addition instruction without data dependency between the first addition instruction and the second addition instruction; storing a first carry output of the first addition instruction in a first flag of a flags register without modifying a second flag in the flags register; and storing a second carry output of the second addition instruction in the second flag of the flags register without modifying the first flag.
2. The method of claim 1, further comprising: receiving a multiplication instruction to multiply a first factor with a second factor; and multiplying the first factor with the second factor to produce a product that includes a least significant half and a most significant half, and the least significant half being a source operand for the first addition instruction, the most significant half being a source operand for the second addition instruction.
3. The method of claim 2, further comprising: receiving the multiplication instruction, the first addition instruction and the second addition instruction as three consecutive instructions.
4. The method of claim 1, wherein the first addition instruction reads the first flag for carry input and the second addition instruction reads the second flag for carry input.
5. The method of claim 4, wherein the first flag is one of a carry flag and an overflow flag, and the second flag is the other of the carry flag and the overflow flag.
6. The method of claim 1, further ...
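Two additions that each use their own flag for carry-in and carry-out can be modelled with a small helper. This is a sketch under assumptions: operands are 64 bits wide, the flags register is a plain dictionary, and the flag names "CF" and "OF" follow the carry/overflow pairing mentioned in the claims; `add_with_flag` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1


def add_with_flag(dst: int, src: int, flags: dict, flag: str):
    """Add with carry-in and carry-out routed through a single named flag.

    Returns (result, new_flags). Every flag other than `flag` is copied
    through unchanged, so chains using different flags do not interfere.
    """
    total = (dst & MASK64) + (src & MASK64) + flags[flag]
    new_flags = dict(flags)
    new_flags[flag] = total >> 64   # carry out of bit 63 (0 or 1)
    return total & MASK64, new_flags


flags = {"CF": 0, "OF": 0}
low, flags = add_with_flag(MASK64, 1, flags, "CF")    # first chain uses CF
high, flags = add_with_flag(MASK64, 2, flags, "OF")   # second chain uses OF
print(low, high, flags)
```

Since the second addition neither reads nor writes the flag used by the first, the two carry chains can be interleaved freely, which is the independence the abstract describes.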

Publication date: 16-01-2014

INSTRUCTIONS PROCESSORS, METHODS, AND SYSTEMS TO PROCESS BLAKE SECURE HASHING ALGORITHM

Number: US20140016773A1
Assignee:

A method of an aspect includes receiving an instruction indicating a first source having at least one set of four state matrix data elements, which represent a complete set of four inputs to a G function of a cryptographic hashing algorithm. The algorithm uses a sixteen data element state matrix, and alternates between updating data elements in columns and diagonals. The instruction also indicates a second source having data elements that represent message and constant data. In response to the instruction, a result is stored in a destination indicated by the instruction. The result includes updated state matrix data elements including at least one set of four updated state matrix data elements. Each of the four updated state matrix data elements represents a corresponding one of the four state matrix data elements of the first source, which has been updated by the G function.
1. A method comprising: receiving an instruction, the instruction indicating a first source having packed state matrix data elements including at least one set of four state matrix data elements that represent a complete set of four inputs to a G function of a cryptographic hashing algorithm, the cryptographic hashing algorithm using a state matrix having sixteen state matrix data elements and alternating between updating state matrix data elements in columns and diagonals of the state matrix, the instruction also indicating a second source having packed data elements that represent message and constant data; and storing a result in a destination indicated by the instruction in response to the instruction, the result having packed updated state matrix data elements including at least one set of four updated state matrix data elements, each of the four updated state matrix data elements in the one set representing a corresponding one of the four state matrix data elements in the one set of the first source updated by the G function.
2. The method of claim 1, wherein the cryptographic hashing ...
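To make the "four inputs to a G function" concrete, here is a minimal Python sketch of the G function as given in the public BLAKE-256 specification, together with the column and diagonal index sets of the sixteen-element state. The listing does not say which BLAKE variant the instruction targets, so the 32-bit word size, the rotation amounts (16, 12, 8, 7), and the function and parameter names are assumptions for illustration. The caller is expected to pass message and constant words already selected by the round permutation.

```python
MASK32 = 0xFFFFFFFF  # BLAKE-256 works on 32-bit words (assumed variant)


def rotr32(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def blake256_g(a, b, c, d, m_first, m_second, k_first, k_second):
    """Update four state words with two message words and two constant words."""
    a = (a + b + (m_first ^ k_second)) & MASK32
    d = rotr32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotr32(b ^ c, 12)
    a = (a + b + (m_second ^ k_first)) & MASK32
    d = rotr32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotr32(b ^ c, 7)
    return a, b, c, d


# Indices of the 4x4 state matrix updated by G in the column step and the diagonal step.
COLUMNS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONALS = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]
```

Each of the four calls within a column step (or a diagonal step) touches a disjoint set of state words, which is what lets one instruction process a complete set of four inputs, or several such sets, independently.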

Publication date: 16-01-2014

INSTRUCTIONS TO PERFORM GROESTL HASHING

Number: US20140016774A1
Assignee:

A method is described. The method includes executing an instruction to perform one or more Galois Field (GF) multiply by 2 operations on a state matrix and executing an instruction to combine results of the one or more GF multiply by 2 operations with exclusive or (XOR) functions to generate a result matrix.
1. A method of performing a process in a computer processor, comprising: executing an instruction to perform one or more Galois Field (GF) multiply by 2 operations on a state matrix; and executing an instruction to combine results of the one or more GF multiply by 2 operations with exclusive or (XOR) functions to generate a result matrix.
2. The method of further comprising executing the instruction to perform GF multiply by 2 operations a second time on the state matrix prior to executing the instruction to combine results.
3. The method of wherein performing the one or more GF multiply by 2 operations comprises: storing rows of the state matrix in a first register; performing a GF multiply by 2 operation on each row stored in the first register; and storing the results of the GF multiply by 2 operation in a second register.
4. The method of wherein performing the one or more GF multiply by 2 operations further comprises performing an XOR operation for each most significant bit having a value of 1.
5. The method of wherein performing the one or more GF multiply by 2 operations further comprises: performing a second GF multiply by 2 operation on each row stored in the second register; and storing the results of the second GF multiply by 2 operation in a third register.
6. The method of claim 5 wherein executing the instruction to combine results of the one or more GF multiply by 2 operations comprises using data stored in the first, second and third registers as source operands to combine factors of the state matrix.
7. The method of wherein the result matrix is stored in the first register.
8. The method of wherein the state matrix is an 8×8 matrix of 8 bit entries.
9. The ...
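The GF multiply-by-2 step and the XOR combination described in this entry can be sketched in a few lines of Python. The reduction constant `0x1B` (the AES field polynomial x^8 + x^4 + x^3 + x + 1, which the public Grøstl specification also uses) is an assumption here, since the listing does not state the polynomial; the function names are likewise hypothetical.

```python
def gf_mul2(byte, reduction=0x1B):
    """Multiply one byte by 2 in GF(2^8): shift left, then XOR if the top bit was 1."""
    shifted = (byte << 1) & 0xFF
    if byte & 0x80:  # most significant bit set, so reduce with an XOR
        shifted ^= reduction
    return shifted


def gf_mul2_rows(rows):
    """Apply the multiply-by-2 to every byte of every row of a state matrix."""
    return [[gf_mul2(b) for b in row] for row in rows]


def gf_mul_small(byte, factor):
    """Multiply by a factor in 0..7 by XOR-combining x, 2x and 4x."""
    x1 = byte
    x2 = gf_mul2(x1)  # first multiply by 2
    x4 = gf_mul2(x2)  # second multiply by 2
    result = 0
    if factor & 1:
        result ^= x1
    if factor & 2:
        result ^= x2
    if factor & 4:
        result ^= x4
    return result
```

Holding x, 2x and 4x in three registers mirrors the first, second and third registers named in the claims: any small factor is then just an XOR of a subset of those three values.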

Publication date: 20-02-2014

INSTRUCTIONS TO PERFORM JH CRYPTOGRAPHIC HASHING

Number: US20140053000A1
Assignee: Intel Corporation

A method is described. The method includes executing one or more JH_SBOX_L instructions to perform S-Box mappings and a linear (L) transformation on a JH state and executing one or more JH_Permute instructions to perform a permutation function on the JH state once the S-Box mappings and the L transformation have been performed.
1. A method of performing a process in a computer processor, comprising: executing one or more JH_SBOX_L instruction to perform S-Box mappings and a linear (L) transformation on a JH state; and executing one or more JH_Permute instruction to perform a permutation function on the JH state once the S-Box mappings and the L transformation have been performed.
2. The method of wherein JH state bits are stored consecutively in first and second registers before executing the JH_SBOX_L instruction.
3. The method of further wherein the first and second registers are 512-bit ZMM registers.
4. The method of further wherein the first register stores a lower 512 bits of the JH state and the second register stores the upper 512 bits of the JH state.
5. The method of further comprising: executing the JH_SBOX_L instruction a first time to perform the S-Box mappings and the L transformation on a first component of the JH state stored in the first register; and executing the JH_SBOX_L instruction a second time to perform the S-Box mappings and the L transformation on a second component of the JH state stored in the second register.
6. The method of wherein the JH_SBOX_L instruction is executed the first time and the second time using a mask register.
7. The method of further comprising: storing the results of the first execution of the JH_SBOX_L instruction in a first destination register as first JH state results; and storing the results of the second execution of the JH_SBOX_L instruction in a second destination register as second JH state results.
8. The method of wherein execution of the JH_Permute instruction further comprises: retrieving JH state results from the first ...

Publication date: 27-02-2014

Method, apparatus, and system for speculative abort control mechanisms

Number: US20140059333A1
Assignee: Intel Corp

An apparatus and method are described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. Hardware, firmware, software, or a combination thereof then makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events, or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining that an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. To enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions.

Publication date: 01-01-2015

Protected Power Management Mode In A Processor

Number: US20150006917A1
Assignee:

In an embodiment, a processor includes a plurality of cores. Each core includes a core power unit to detect one or more power management events, and in response to the one or more power management events, initiate a protected power management mode in the core. Software interrupts to the core may be disabled during the protected power management mode. The core is to execute power management code during the protected power management mode. Other embodiments are described and claimed.
1. A processor comprising: a plurality of cores, each core including a core power unit to: detect one or more power management events; and in response to the one or more power management events, initiate a protected power management mode in the core, wherein software interrupts to the core are disabled during the protected power management mode, wherein the core is to execute power management code during the protected power management mode.
2. The processor of claim 1, wherein the core power unit comprises one or more event detectors to detect the one or more power management events.
3. The processor of claim 1, wherein the one or more power management events comprise at least one of a temperature event, a voltage event, and a timing event.
4. The processor of claim 1, wherein the core power unit comprises an operating system interface to enable direct communication with an operating system.
5. The processor of claim 4, wherein the software interrupts disabled during the protected power management mode comprise interrupts not received through the operating system interface.
6. The processor of claim 1, wherein the core power unit comprises protected power management mode logic to disable the software interrupts responsive to the at least one power management event.
7. The processor of claim 1, wherein the core power unit is further to, in response to the at least one power management event, store an execution state of the core in one or more ...

Publication date: 17-04-2014

Apparatus and method for vector compute and accumulate

Number: US20140108480A1
Assignee: Intel Corp

An apparatus and method are described for comparing elements between two immediate values. For example, a method according to one embodiment includes the following operations: reading values of a first set of elements stored in a first immediate value, each element having a defined element position in the first immediate value; comparing each element from the first set of elements with each of a second set of elements stored in a second immediate value; counting the number of times the value of each element of the first set of elements is found in the second set of elements to arrive at a final count for each element of the first set of elements; and transferring the final count for each element to a third immediate value, wherein the final count is stored in an element position in the third immediate value corresponding to the defined element position in the first immediate value.
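The compare-and-count operation in this abstract can be illustrated with a short Python sketch. Plain lists stand in for the packed operands, and the function name `count_matches` is hypothetical; this models the described behaviour only and says nothing about how the hardware implements it.

```python
def count_matches(first, second):
    """For each element of `first`, count how often its value occurs in `second`.

    The count for the element at position i of `first` is stored at position i
    of the result, matching the "corresponding element position" in the abstract.
    """
    return [sum(1 for candidate in second if candidate == element)
            for element in first]


counts = count_matches([1, 2, 3, 2], [2, 2, 5, 1])
```

In this example the value 2 appears twice in the second operand, so both positions of the first operand that hold 2 receive a count of 2.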

Publication date: 29-01-2015

METHOD, APPARATUS, AND SYSTEM FOR TRANSACTIONAL SPECULATION CONTROL INSTRUCTIONS

Number: US20150032998A1
Assignee:

An apparatus and method are described herein for providing speculation control instructions. xAcquire and xRelease instructions are provided to define a critical section. In one embodiment, the xAcquire instruction includes a lock instruction with an elision prefix and the xRelease instruction includes a lock release instruction with an elision prefix. As a result, a processor is able to elide locks and transactionally execute a critical section defined in software by xAcquire and xRelease. But because only prefix hints are added, legacy processors are able to execute the same code by just ignoring the hints and executing the critical section traditionally with locks to guarantee mutual exclusion. Moreover, xBegin and xEnd are similarly provided for in an Instruction Set Architecture (ISA) to define a transactional code region. In addition, other control speculation instructions, such as xAbort to enable explicit abort of a critical or transactional code section and xTest to test a state of speculative execution, are also provided in the ISA.
1. An apparatus comprising: decode logic configured to decode a lock instruction including a lock elision hint field set to hint that at least a portion of the lock instruction is to be elided; and lock elision logic coupled to the decode logic, the lock elision logic being configured to determine if the lock instruction is to be elided based on the lock elision hint field being set to hint that at least a portion of the lock instruction is to be elided and eliding the lock instruction in response to determining the lock instruction is to be elided; and execution logic coupled to the lock elision logic, the execution logic being configured to execute a critical section started by the lock instruction transactionally in response to the lock elision logic eliding at least a portion of the lock instruction.
2. The apparatus of claim 1, wherein the lock instruction including the lock elision hint field comprises the lock elision hint field ...

Publication date: 01-05-2014

APPARATUS AND METHOD OF EXECUTION UNIT FOR CALCULATING MULTIPLE ROUNDS OF A SKEIN HASHING ALGORITHM

Number: US20140122839A1
Assignee:

An apparatus is described that includes an execution unit within an instruction pipeline. The execution unit has multiple stages of a circuit that includes a) and b) as follows. a) a first logic circuitry section having multiple mix logic sections each having: i) a first input to receive a first quad word and a second input to receive a second quad word; ii) an adder having a pair of inputs that are respectively coupled to the first and second inputs; iii) a rotator having a respective input coupled to the second input; iv) an XOR gate having a first input coupled to an output of the adder and a second input coupled to an output of the rotator. b) permute logic circuitry having inputs coupled to the respective adder and XOR gate outputs of the multiple mix logic sections.
1. An apparatus, comprising: an execution unit within an instruction pipeline, said execution unit having multiple stages of the following circuit: a) a first logic circuitry section comprising multiple mix logic sections each comprising: i) a first input to receive a first quad word and a second input to receive a second quad word; ii) an adder having a pair of inputs that are respectively coupled to said first and second inputs; iii) a rotator having a respective input coupled to said second input; iv) an XOR gate having a first input coupled to an output of said adder and a second input coupled to an output of said rotator; b) permute logic circuitry having inputs coupled to said respective adder and XOR gate outputs of said multiple mix logic sections.
2. The apparatus of wherein said execution unit further has: a ROM containing control values for said rotator for said multiple mix logic sections.
3. The apparatus of wherein said Skein hashing algorithm is a Skein 256 hashing algorithm.
4. The apparatus of wherein said Skein hashing algorithm is a Skein 512 hashing algorithm.
5. The apparatus of wherein said Skein hashing algorithm is a Skein 1024 hashing algorithm.
6. The apparatus of ...
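The adder, rotator, XOR gate and permute stage described in this entry map onto a small amount of arithmetic. The Python sketch below models one such stage in software, assuming 64-bit quad words and a left rotation; the function names, the rotation amounts, and the word permutation used in the example call are placeholders chosen for illustration and are not taken from the listing or from any table of Skein constants.

```python
MASK64 = (1 << 64) - 1  # one quad word is assumed to be 64 bits


def rotl64(x, r):
    """Rotate a 64-bit word left by r bits (rotation direction is an assumption)."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def mix(x0, x1, rotation):
    """One mix section: adder output, and XOR of the adder output with the rotated second input."""
    y0 = (x0 + x1) & MASK64           # adder fed by both inputs
    y1 = rotl64(x1, rotation) ^ y0    # rotator on the second input, then XOR gate
    return y0, y1


def mix_and_permute(words, rotations, permutation):
    """Apply a mix section to each pair of words, then reorder the outputs."""
    mixed = []
    for j, rotation in enumerate(rotations):
        mixed.extend(mix(words[2 * j], words[2 * j + 1], rotation))
    return [mixed[p] for p in permutation]


# Placeholder rotation amounts and permutation, for illustration only.
stage_out = mix_and_permute([1, 2, 3, 4], rotations=[1, 2], permutation=[0, 3, 2, 1])
```

Chaining several calls to `mix_and_permute`, each with its own rotation amounts, corresponds to the "multiple stages" of the circuit; the ROM mentioned in claim 2 would supply the per-stage rotation values.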

Publication date: 26-02-2015

METHOD AND APPARATUS FOR A NON-DETERMINISTIC RANDOM BIT GENERATOR (NRBG)

Number: US20150055778A1
Assignee:

A hardware-based digital random number generator is provided. In one embodiment, a processor includes a digital random number generator (DRNG) to condition entropy data provided by an entropy source, to generate a plurality of deterministic random bit (DRB) strings, and to generate a plurality of nondeterministic random bit (NRB) strings, and an execution unit coupled to the DRNG, in response to a first instruction to read a seed value, to retrieve one of the NRB strings from the DRNG and to store the NRB string in a destination register specified by the first instruction.
1. A processor, comprising: a digital random number generator (DRNG) to condition entropy data provided by an entropy source, to generate a plurality of deterministic random bit (DRB) strings, and to generate a plurality of nondeterministic random bit (NRB) strings; and an execution unit coupled to the DRNG, in response to a first instruction to read a seed value, to retrieve one of the NRB strings from the DRNG and to store the NRB string in a destination register specified by the first instruction.
2. The processor of claim 1, further comprising a flag register to store a flag set by the execution unit to indicate whether the NRB string stored in the destination register is valid.
3. The processor of claim 1, wherein the execution unit is configured, in response to a second instruction to read a random number, to retrieve one of the DRB strings from the DRNG and to store the DRB in a destination register specified by the second instruction.
4. The processor of claim 3, wherein the DRNG comprises: a conditioner to condition the entropy data provided by the entropy source to generate conditioned entropy (CE) data; a DRB generator (DRBG) coupled to the conditioner to generate the DRB strings based on the CE data; and an NRB generator (NRBG) coupled to the conditioner and the DRBG to generate the NRB strings based on the DRB strings and the CE data.
5. The processor of claim 4, wherein ...

Publication date: 02-03-2017

APPARATUS AND METHOD FOR VECTOR INSTRUCTIONS FOR LARGE INTEGER ARITHMETIC

Number: US20170060584A1
Assignee:

An apparatus is described that includes a semiconductor chip having an instruction execution pipeline having one or more execution units with respective logic circuitry to: a) execute a first instruction that multiplies a first input operand and a second input operand and presents a lower portion of the result, where the first and second input operands are respective elements of first and second input vectors; b) execute a second instruction that multiplies a first input operand and a second input operand and presents an upper portion of the result, where the first and second input operands are respective elements of first and second input vectors; and c) execute an add instruction where a carry term of the add instruction's adding is recorded in a mask register.
1-20. (canceled)
21. A hardware processor comprising: a hardware decoder to decode a first instruction into a decoded first instruction, a second instruction into a decoded second instruction, and an add instruction into a decoded add instruction; and a hardware execution unit to: execute the decoded first instruction to multiply a first input operand and a second input operand and store a lower portion of a result, said first and second input operands being respective elements of a first input vector and a second input vector, execute the decoded second instruction to multiply the first input operand and the second input operand and store an upper portion of a result, said first and second input operands being the respective elements of the first input vector and the second input vector, and execute the decoded add instruction to add aligned elements of the upper portion and the lower portion with a previous, corresponding carry term from an input operand and store a result.
22. The hardware processor of claim 21, wherein the hardware execution unit is to execute the decoded add instruction to further cause a next carry term of said add instruction's adding to be stored.
23. The hardware processor ...
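A minimal Python model of the three operations in this entry (element-wise multiply returning the lower portion, element-wise multiply returning the upper portion, and an add whose carries go to a mask) might look as follows. The 64-bit element width, the function names, and the use of an integer bit mask to stand in for the mask register are all assumptions made for the sketch.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit vector elements


def vmul_lo(a, b):
    """Element-wise multiply, keeping the lower 64 bits of each product."""
    return [(x * y) & MASK64 for x, y in zip(a, b)]


def vmul_hi(a, b):
    """Element-wise multiply, keeping the upper 64 bits of each product."""
    return [(x * y) >> 64 for x, y in zip(a, b)]


def vadd_carry(a, b, carry_in_mask=0):
    """Element-wise add with per-element carry-in; return (sums, carry-out mask)."""
    sums, carry_out_mask = [], 0
    for i, (x, y) in enumerate(zip(a, b)):
        carry_in = (carry_in_mask >> i) & 1
        total = x + y + carry_in
        sums.append(total & MASK64)
        carry_out_mask |= (total >> 64) << i  # bit i records element i's carry
    return sums, carry_out_mask


lo = vmul_lo([MASK64, 3], [2, 5])
hi = vmul_hi([MASK64, 3], [2, 5])
```

Keeping the carries in a separate mask rather than in a scalar flags register is what allows every vector lane to carry independently into a later add.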

Publication date: 19-03-2015

EFFICIENT MULTIPLICATION, EXPONENTIATION AND MODULAR REDUCTION IMPLEMENTATIONS

Number: US20150082047A1
Assignee:

In one embodiment, the present disclosure provides a method that includes segmenting an n-bit exponent e into a first segment e_t and a number t of k-bit segments e_i in response to a request to determine a modular exponentiation result R, wherein R is a modular exponentiation of a generator base g for the exponent e and a q-bit modulus m, wherein the generator base g equals two and k is based at least in part on a processor configured to determine the result R; iteratively determining a respective intermediate modular exponentiation result for each segment e_i, wherein the determining comprises multiplication, exponentiation and a modular reduction of at least one of a multiplication result and an exponentiation result; and generating the modular exponentiation result R = g^e mod m based on, at least in part, at least one respective intermediate modular exponentiation result.
1. A method, comprising: segmenting an n-bit exponent e into a first segment e_t and a number t of k-bit segments e_i in response to a request to determine a modular exponentiation result R, wherein R is a modular exponentiation of a generator base g for the exponent e and a q-bit modulus m, wherein the generator base g equals two and k is based at least in part on a processor configured to determine the result R; iteratively determining a respective intermediate modular exponentiation result for each segment e_i, wherein the determining comprises multiplication, exponentiation and a modular reduction of at least one of a multiplication result and an exponentiation result; and generating the modular exponentiation result R = g^e mod m based on, at least in part, at least one respective intermediate modular exponentiation result.
2. The method of claim 1, further comprising selecting k such that 2^k = p wherein the processor is a p-bit processor comprising a plurality of p-bit registers.
3. The method of claim 2, wherein an intermediate exponentiation result is determined ...
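The segment-by-segment scheme in this entry resembles fixed-window exponentiation specialised to base two, where multiplying by 2 raised to a segment value is just a left shift. The Python sketch below is one plausible reading under that assumption, not the patented procedure; the function name `modexp_base2` and the default `k = 6` (so that 2^k = 64, as for a 64-bit processor) are choices made for the example.

```python
def modexp_base2(e, m, k=6):
    """Compute 2**e mod m by processing the exponent in k-bit segments.

    Assumes e >= 0 and m >= 1. The topmost segment may be shorter than k bits.
    """
    segment_mask = (1 << k) - 1
    segments = []
    while e:
        segments.append(e & segment_mask)  # least significant segment first
        e >>= k

    result = 1 % m
    for segment in reversed(segments):     # most significant segment first
        for _ in range(k):
            result = (result * result) % m    # exponentiation step: result ** (2 ** k)
        result = (result << segment) % m      # multiply by 2 ** segment via a shift
    return result


r = modexp_base2(1000, 97)
```

Since each segment value is below 2^k, the shift amount never exceeds 2^k - 1 bits, which is one way to see why k would be tied to the register width.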

Publication date: 26-03-2015

METHOD AND APPARATUS FOR PERFORMING A SHIFT AND EXCLUSIVE OR OPERATION IN A SINGLE INSTRUCTION

Number: US20150089195A1
Assignee:

Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.
1. A processor comprising: a plurality of levels of cache including a Level 1 (L1) cache; a plurality of integer registers; a plurality of registers in which to store floating-point data elements including 128-bit packed double operands that are to have two 64-bit double floating-point data elements; a plurality of status registers; an instruction pointer register; an instruction prefetcher to fetch instructions; a decoder to decode the fetched instructions including an instruction to perform a shift and exclusive OR (XOR) operation, wherein the instruction to perform the shift and XOR operation has a first source operand identifier to identify a first source operand, a second source operand identifier to identify a second source operand, an immediate field to specify a shift amount, and a field to identify the first and second source operands as being one of a 32-bit source operand and a 64-bit source operand; and an execution unit coupled to the decoder, such that the processor, in response to the instruction to perform the shift and XOR operation, is to: shift the first source operand by the shift amount specified by the immediate field, wherein the first source operand is a scalar value, XOR the shifted first source operand with the second source operand, and store a resulting shifted and XOR'ed value in a destination register, wherein the destination register is a scalar register; and a floating-point unit to operate on floating-point data elements.
2. The processor of claim 1, wherein the processor, in response to the instruction, is to right shift the first source operand by the shift amount.
3. The processor of claim 1, wherein the processor, in response to the ...
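The combined operation in claim 1 (shift the first source by an immediate amount, then XOR with the second source, at a 32-bit or 64-bit operand size) is easy to model in software. In the Python sketch below the function name, the `width` and `right` parameters, and the choice of a logical rather than arithmetic shift are assumptions for illustration.

```python
def shift_xor(src1, src2, shift, width=64, right=True):
    """Shift src1 by `shift` bits, XOR the result with src2, and return a `width`-bit value."""
    if width not in (32, 64):
        raise ValueError("operand size is assumed to be 32 or 64 bits")
    mask = (1 << width) - 1
    src1 &= mask
    src2 &= mask
    if right:
        shifted = src1 >> shift              # logical right shift (assumption)
    else:
        shifted = (src1 << shift) & mask     # left shift, truncated to the operand size
    return shifted ^ src2


value = shift_xor(0b1100, 0b0101, 2)
```

Fusing the two steps saves an instruction in code such as hash functions, where a shifted copy of a value is repeatedly XORed into another value.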

Publication date: 26-03-2015

METHOD AND APPARATUS FOR PERFORMING A SHIFT AND EXCLUSIVE OR OPERATION IN A SINGLE INSTRUCTION

Number: US20150089196A1
Assignee:

Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.
1. A system comprising: a display control; a memory interface; and a processor, the processor comprising: a plurality of levels of cache including a Level 1 (L1) cache; a plurality of integer registers; a plurality of floating-point registers in which to store floating-point data elements including 128-bit packed double operands that are to have two 64-bit double floating-point data elements; a plurality of status registers; an instruction pointer register; an instruction prefetcher to fetch instructions; a decoder to decode the fetched instructions including an instruction to perform a shift and exclusive OR (XOR) operation, wherein the instruction to perform the shift and XOR operation has a first source operand identifier to identify a first source operand, a second source operand identifier to identify a second source operand, an immediate field to specify a shift amount, and a field to identify the first and second source operands as being one of a 32-bit source operand and a 64-bit source operand; and an execution unit coupled to the decoder, such that the processor, in response to the instruction to perform the shift and XOR operation, is to: shift the first source operand by the shift amount specified by the immediate field, wherein the first source operand is a scalar value, XOR the shifted first source operand with the second source operand, and store a resulting shifted and XOR'ed value in a destination register, wherein the destination register is a scalar register; and a floating-point unit to operate on floating-point data elements.
2. The system of claim 1, wherein the processor, in response to the instruction, is to right shift the first source operand by ...

Publication date: 26-03-2015

METHOD AND APPARATUS FOR PERFORMING A SHIFT AND EXCLUSIVE OR OPERATION IN A SINGLE INSTRUCTION

Number: US20150089197A1
Assignee:

Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.
1. A cellular phone comprising: a random access memory (RAM); a wireless transceiver; and a processor coupled to the RAM and the wireless transceiver, the processor comprising: a plurality of levels of cache including a Level 1 (L1) cache; a plurality of integer registers; a plurality of registers in which to store floating-point data elements including 128-bit packed double operands that are to have two 64-bit double floating-point data elements; a plurality of status registers; an instruction pointer register; an instruction prefetcher to fetch instructions; a decoder to decode the fetched instructions including an instruction to perform a shift and exclusive OR (XOR) operation, wherein the instruction to perform the shift and XOR operation has a first source operand identifier to identify a first source operand, a second source operand identifier to identify a second source operand, an immediate field to specify a shift amount, and a field to identify the first and second source operands as being one of a 32-bit source operand and a 64-bit source operand; and an execution unit coupled to the decoder, such that the processor, in response to the instruction to perform the shift and XOR operation, is to: shift the first source operand by the shift amount specified by the immediate field, wherein the first source operand is a scalar value, XOR the shifted first source operand with the second source operand, and store a resulting shifted and XOR'ed value in a destination register, wherein the destination register is a scalar register; and a floating-point unit to operate on floating-point data elements.
2. The cellular phone of claim 1, further comprising: a Bluetooth device; ...

Publication date: 26-03-2015

ROTATE INSTRUCTIONS THAT COMPLETE EXECUTION EITHER WITHOUT WRITING OR READING FLAGS

Number: US20150089199A1
Assignee: Intel Corporation

A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
1-30. (canceled)
31. A multi-core processor comprising: at least four cores, wherein each of the at least four cores comprises: at least one level 1 (L1) cache; a register to store a plurality of flags, including a carry flag, a sign flag, a zero flag, and an overflow flag; at least sixteen 64-bit general-purpose registers, wherein the 64-bit general-purpose registers are operable to store 64-bit operands in a 64-bit mode and are operable to store 32-bit operands in a 32-bit mode, wherein the 32-bit operands are to be stored in a lower 32-bits of the 64-bit general-purpose registers; a branch prediction logic; an instruction fetch logic to fetch a rotate right instruction; a decoder to decode the rotate right instruction, wherein the rotate right instruction is to indicate a 64-bit operand size, a first 64-bit source operand, a second 64-bit source operand, and a 64-bit general-purpose register; a register renaming logic to rename the 64-bit general-purpose registers; a re-order buffer; and a plurality of execution units, including a first execution unit to execute the rotate right instruction, wherein the rotate right instruction is to rotate the first 64-bit source operand right by an amount indicated by the second 64-bit source operand, wherein bits rotated out of a least significant bit of the first 64-bit source operand are to be rotated into a most significant bit of the first 64-bit source operand, wherein a result is to be stored into the 64-bit general-purpose register, and wherein the rotate right instruction is to complete without writing the carry flag, without writing the sign flag, without ...
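A flag-free rotate right, as described in this entry, can be modelled as a pure function of its two source operands. In the Python sketch below the function name and the reduction of the rotate amount modulo the operand width are assumptions; the point illustrated is only that the result depends on nothing but the sources, so no flag needs to be read or written.

```python
def rotate_right(value, amount, width=64):
    """Rotate `value` right by `amount` bits within `width` bits, with no flag inputs or outputs."""
    mask = (1 << width) - 1
    value &= mask
    amount %= width  # assumed handling of amounts >= width
    # Bits shifted out of the least significant end re-enter at the most significant end.
    return ((value >> amount) | (value << (width - amount))) & mask


rotated = rotate_right(0x0123456789ABCDEF, 8)
```

Because the function neither consumes nor produces a carry, a sequence of such rotates has no dependency on surrounding flag-setting instructions.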

Publication date: 26-03-2015

ROTATE INSTRUCTIONS THAT COMPLETE EXECUTION EITHER WITHOUT WRITING OR READING FLAGS

Number: US20150089200A1
Assignee: Intel Corporation

A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
1-30. (canceled)
31. A system-on-chip (SoC) comprising: an on-die memory controller; an on-die graphics device; an on die controller for a universal serial bus (USB) port; and an on-die multi-core processor comprising: at least four cores, wherein each of the at least four cores comprises: at least one level 1 (L1) cache; a register to store a plurality of flags, including a carry flag, a sign flag, a zero flag, and an overflow flag; at least sixteen 64-bit general-purpose registers, wherein the 64-bit general-purpose registers are operable to store 64-bit operands in a 64-bit mode and are operable to store 32-bit operands in a 32-bit mode, wherein the 32-bit operands are to be stored in a lower 32-bits of the 64-bit general-purpose registers; a branch prediction logic; an instruction fetch logic to fetch a rotate right instruction; a decoder to decode the rotate right instruction, wherein the rotate right instruction is to indicate a 64-bit operand size, a first 64-bit source operand, a second 64-bit source operand, and a 64-bit general-purpose register; a register renaming logic to rename the 64-bit general-purpose registers; a re-order buffer; and a plurality of execution units, including a first execution unit to execute the rotate right instruction, wherein the rotate right instruction is to rotate the first 64-bit source operand right by an amount indicated by the second 64-bit source operand, wherein bits rotated out of a least significant bit of the first 64-bit source operand are to be rotated into a most significant bit of the first 64-bit source operand, wherein a result is to be stored into the 64-bit ...

Publication date: 26-03-2015

ROTATE INSTRUCTIONS THAT COMPLETE EXECUTION EITHER WITHOUT WRITING OR READING FLAGS

Number: US20150089201A1
Assignee: Intel Corporation

A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.

1-30. (canceled) 31. A cell phone comprising: a random access memory (RAM); a wireless transceiver; and at least four cores, wherein each of the at least four cores comprises: at least one level 1 (L1) cache; a register to store a plurality of flags, including a carry flag, a sign flag, a zero flag, and an overflow flag; at least sixteen 64-bit general-purpose registers, wherein the 64-bit general-purpose registers are operable to store 64-bit operands in a 64-bit mode and are operable to store 32-bit operands in a 32-bit mode, wherein the 32-bit operands are to be stored in a lower 32-bits of the 64-bit general-purpose registers; a branch prediction logic; an instruction fetch logic to fetch a rotate right instruction; a decoder to decode the rotate right instruction, wherein the rotate right instruction is to indicate a 64-bit operand size, a first 64-bit source operand, a second 64-bit source operand, and a 64-bit general-purpose register; and a plurality of execution units, including a first execution unit to execute the rotate right instruction, wherein the rotate right instruction is to rotate the first 64-bit source operand right by an amount indicated by the second 64-bit source operand, wherein bits rotated out of a least significant bit of the first 64-bit source operand are to be rotated into a most significant bit of the first 64-bit source operand, wherein a result is to be stored into the 64-bit general-purpose register, and wherein the rotate right instruction is to complete without writing the carry flag, without writing the sign flag, without writing the zero flag, ...
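
The rotate-right behaviour described in this entry can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name rotate_right_64 is hypothetical, and reducing the rotate amount modulo 64 is an assumption made here for illustration. The point it shows is that the result depends only on the two operands, so no flag state is read or written.

```python
MASK64 = (1 << 64) - 1


def rotate_right_64(value: int, amount: int) -> int:
    """Rotate a 64-bit value right by `amount` bits.

    Bits leaving the least significant end re-enter at the most
    significant end. No carry, sign, zero, or overflow state is
    consulted or produced: the result is a pure function of the inputs.
    """
    value &= MASK64
    amount %= 64  # assumption: only the rotate amount modulo 64 matters
    return ((value >> amount) | (value << (64 - amount))) & MASK64
```

For example, rotating the value 1 right by one position moves its only set bit to bit 63.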

Publication date: 26-03-2015

EVENT COUNTER CHECKPOINTING AND RESTORING

Number: US20150089286A1
Assignee:

Event counter checkpointing and restoring is disclosed. In one implementation, a processor includes a first event counter to count events that occur during execution within the processor, event counter checkpoint logic, communicably coupled with the first event counter, to store, prior to a transactional execution of the processor, a value of the first event counter, a second event counter to count events prior to and during the transactional execution, wherein the second event counter is to increment without resetting after the transactional execution is aborted, event count restore logic to restore the first event counter to the stored value after the transactional execution is aborted, and tuning logic to determine, in response to aborting of the transactional execution, a number of the events that occurred during the transactional execution based on the stored value of the first event counter and a value of the second event counter.

1. A processor comprising: a first event counter to count events that occur during execution within the processor; event counter checkpoint logic communicably coupled with the first event counter, wherein the event counter checkpoint logic is to store, prior to a transactional execution of the processor, a value of the first event counter; a second event counter to count events prior to and during the transactional execution, wherein the second event counter is to increment without resetting after the transactional execution is aborted; event count restore logic communicably coupled with the first event counter, wherein the event count restore logic is to restore the first event counter to the stored value after the transactional execution is aborted; and tuning logic communicably coupled with the first event counter and the second event counter, wherein the tuning logic is to determine, in response to aborting of the transactional execution, a number of the events that occurred during the transactional execution based on the stored value ...
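
The checkpoint, restore, and tuning steps in this abstract can be sketched in software as follows. This is only an illustrative model: the class and method names are hypothetical, and it assumes the two counters hold the same value when the checkpoint is taken, which the abstract does not state.

```python
class EventCounters:
    """Toy model of a checkpointed counter paired with a free-running one."""

    def __init__(self) -> None:
        self.first = 0          # rolled back when a transaction aborts
        self.second = 0         # keeps counting, never rolled back
        self.checkpoint = None  # value of `first` saved before a transaction

    def record(self, events: int = 1) -> None:
        # Both counters observe every event.
        self.first += events
        self.second += events

    def begin_transaction(self) -> None:
        # Checkpoint logic: remember the first counter's value.
        self.checkpoint = self.first

    def abort_transaction(self) -> int:
        """Restore the first counter and report events seen in the transaction."""
        if self.checkpoint is None:
            raise RuntimeError("no transaction in progress")
        # Tuning logic: assumes first == second at checkpoint time.
        events_in_transaction = self.second - self.checkpoint
        # Restore logic: roll the first counter back.
        self.first = self.checkpoint
        self.checkpoint = None
        return events_in_transaction
```

After an abort, the first counter again reflects only committed work, while the second counter still includes the events of the aborted transaction.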

Publication date: 12-06-2014

Apparatus and method for vector instructions for large integer arithmetic

Number: US20140164467A1
Assignee: Intel Corp

An apparatus is described that includes a semiconductor chip having an instruction execution pipeline having one or more execution units with respective logic circuitry to: a) execute a first instruction that multiplies a first input operand and a second input operand and presents a lower portion of the result, where, the first and second input operands are respective elements of first and second input vectors; b) execute a second instruction that multiplies a first input operand and a second input operand and presents an upper portion of the result, where, the first and second input operands are respective elements of first and second input vectors; and, c) execute an add instruction where a carry term of the add instruction's adding is recorded in a mask register.
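
The three operations named in this abstract (multiply returning the lower portion, multiply returning the upper portion, and an add whose carries go to a mask) can be modelled element by element. The sketch below is an assumption-laden illustration: the function names, the 64-bit lane width, and the use of Python lists as vectors are all choices made here, not details from the patent.

```python
LANE_BITS = 64  # assumed lane width
LANE_MASK = (1 << LANE_BITS) - 1


def vec_mul_lo(a: list[int], b: list[int]) -> list[int]:
    """Per-lane product, keeping the lower LANE_BITS bits."""
    return [(x * y) & LANE_MASK for x, y in zip(a, b)]


def vec_mul_hi(a: list[int], b: list[int]) -> list[int]:
    """Per-lane product, keeping the upper LANE_BITS bits."""
    return [(x * y) >> LANE_BITS for x, y in zip(a, b)]


def vec_add_carry(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """Per-lane sum; bit i of the returned mask is set if lane i carried out."""
    sums = []
    carry_mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        total = x + y
        sums.append(total & LANE_MASK)
        if total > LANE_MASK:
            carry_mask |= 1 << i
    return sums, carry_mask
```

Keeping the carries in a separate mask rather than in a single carry flag lets each lane's carry be propagated independently in a later step of a multi-word addition.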

Publication date: 09-04-2015

PROCESSOR TO PERFORM A BIT RANGE ISOLATION INSTRUCTION

Number: US20150100760A1
Assignee: Intel Corporation

Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.

1-30. (canceled) 31. A processor comprising: instruction fetch logic; branch prediction logic; a plurality of registers including control registers, status registers, and 64-bit general purpose registers, wherein the lower 32-bits of the 64-bit general purpose registers are addressable to operate on 32-bit operands, and wherein the status registers include a 32-bit register having a plurality of bits associated with a group of status flags, the status flags including a carry flag, a zero flag, a sign flag, and an overflow flag; an instruction decoder to receive and decode a bit range isolation instruction, the instruction to implicitly indicate a first end of a range of bits of interest, and, through an immediate associated with the instruction, to explicitly specify a second end of the first range of bits of interest; and mask generation logic to generate a mask; and bitwise operation logic to receive the source operand and the mask, and to perform a bitwise AND operation on the source operand and the mask to produce a result including a first ...
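
The mask-then-AND scheme in the claim preview can be sketched briefly. The function name and parameters below are hypothetical, and the sketch assumes the implicit end of the range is bit 0 and that the cleared bits are zeros; the listing does not confirm either detail.

```python
def isolate_low_bits(source: int, end: int, width: int = 64) -> int:
    """Keep bits [0, end) of `source` in place and zero every other bit.

    `end` plays the role of the explicitly specified end of the range;
    the other end is assumed to be bit 0. The kept bits are not shifted.
    """
    if not 0 <= end <= width:
        raise ValueError("end must lie within the operand width")
    mask = (1 << end) - 1                       # mask generation
    return source & mask & ((1 << width) - 1)   # bitwise AND with the source
```

Because the kept bits stay in their original positions, no shift is needed before or after the AND.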

Publication date: 09-04-2015

SYSTEM-ON-CHIP (SoC) TO PERFORM A BIT RANGE ISOLATION INSTRUCTION

Number: US20150100761A1
Assignee: Intel Corporation

Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.

1-30. (canceled) 31. A system-on-chip (SoC) comprising: a network controller to couple the SoC to a network; a memory controller to provide access to a dynamic random access memory; a shared cache to store data; and an L1 cache; an instruction fetch logic; branch prediction logic; a plurality of registers including control registers, status registers, and 64-bit general purpose registers, wherein the lower 32-bits of the 64-bit general purpose registers are addressable to operate on 32-bit operands, and wherein the status registers include a 32-bit register having a plurality of bits associated with a group of status flags, the status flags including a carry flag, a zero flag, a sign flag, and an overflow flag; an instruction decoder to receive and decode a bit range isolation instruction, the instruction to implicitly indicate a first end of a range of bits of interest, and, through an immediate associated with the instruction, to explicitly specify a second end of the first range of bits of interest; and mask generation logic to ...

Publication date: 09-04-2015

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20150100796A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

1-26. (canceled) 27. A mobile computer comprising: a random access memory (RAM); a network interface controller; and a processor coupled to the RAM, the processor comprising: a plurality of cores; a level 1 instruction cache to cache instructions; a level 1 data cache; a bus interface unit; a plurality of 128-bit registers; a decode unit to decode the instructions from the level 1 instruction cache including an Advanced Encryption Standard (AES) decryption instruction that is to use a first 128-bit source/destination register of the plurality and a second 128-bit source register of the plurality; and an execution unit coupled to the decode unit, such that the AES decryption instruction is to cause the execution unit to receive data from the first 128-bit source/destination register and the second 128-bit source register, perform operations for an AES single decryption round including an inverse byte substitution, an inverse shift rows, and an exclusive OR, but omitting an inverse mix columns, and store a result in the first 128-bit source/destination register. 28. The mobile computer of claim 27, wherein the AES decryption instruction is capable of using any one of a 128-bit round key, a 192-bit round key, and a 256-bit round key. 29. The mobile computer of claim 27, wherein the processor further comprises a microcode Read Only Memory (ROM) to store micro-operations to implement the AES decryption instruction. 30. The mobile computer of claim 27, further comprising an input/output ...
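
The single decryption round named in claim 27 (inverse byte substitution, inverse shift rows, and an exclusive OR, with no inverse mix columns) can be sketched in Python. This is an illustration under stated assumptions rather than the claimed hardware: the function names are hypothetical, the state is assumed to be 16 bytes in the usual AES column-major order, and the S-box is derived from its standard mathematical definition instead of being typed in as a table.

```python
def _gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def _rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def _build_sbox() -> list[int]:
    """AES S-box: field inverse followed by the standard affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if _gf_mul(x, y) == 1)
        s = inv
        for n in range(1, 5):
            s ^= _rotl8(inv, n)
        sbox.append(s ^ 0x63)
    return sbox


SBOX = _build_sbox()
INV_SBOX = [0] * 256
for _index, _value in enumerate(SBOX):
    INV_SBOX[_value] = _index


def aes_dec_last_round(state: bytes, round_key: bytes) -> bytes:
    """Inverse ShiftRows, inverse SubBytes, then XOR with the round key."""
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("state and round key must be 16 bytes each")
    out = bytearray(16)
    for col in range(4):
        for row in range(4):
            src = row + 4 * ((col - row) % 4)  # row r moves right by r columns
            dst = row + 4 * col
            out[dst] = INV_SBOX[state[src]] ^ round_key[dst]
    return bytes(out)
```

Since the substitution acts on each byte independently, applying it before or after the row shift gives the same result; the sketch does both in a single pass.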

Publication date: 09-04-2015

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20150100797A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

1-26. (canceled) 27. A system comprising: a memory controller to control communication with a double data rate memory; and a processor comprising: a plurality of cores; a level 1 instruction cache to cache instructions; a level 1 data cache; a bus interface unit; a plurality of 128-bit registers; a decode unit to decode the instructions from the level 1 instruction cache including four Advanced Encryption Standard (AES) instructions, wherein each of the four AES instructions has a unique opcode, the four AES instructions including: a first AES encryption instruction that is to use a 128-bit source/destination register of the plurality and a 128-bit source register of the plurality, and a second AES encryption instruction that is to use a 128-bit source register of the plurality and a 128-bit destination register of the plurality; and an execution unit coupled to the decode unit, such that: the first AES encryption instruction is to cause the execution unit to receive data from the 128-bit source/destination register and the 128-bit source register of the first AES encryption instruction, perform operations for an AES single encryption round including a byte substitution, a shift rows, and an exclusive OR, but omitting a mix columns, and store a first result in the 128-bit source/destination register; and the second AES encryption instruction is to cause the execution unit to receive data from the 128-bit source register of the second AES encryption instruction, perform operations including a mix columns, ...

Publication date: 09-04-2015

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20150100798A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

1-26. (canceled) 27. A mobile computer comprising: a random access memory (RAM); a network interface controller; and a processor coupled to the RAM, the processor comprising: a plurality of cores; a level 1 instruction cache to cache instructions; a level 1 data cache; a bus interface unit; a plurality of 128-bit registers; a decode unit to decode the instructions from the level 1 instruction cache including four Advanced Encryption Standard (AES) instructions, wherein each of the four AES instructions has a unique opcode, the four AES instructions including: a first AES decryption instruction that is to use a 128-bit source/destination register of the plurality and a 128-bit source register of the plurality, and a second AES decryption instruction that is to use a 128-bit source register of the plurality and a 128-bit destination register of the plurality; and an execution unit coupled to the decode unit, such that: the first AES decryption instruction is to cause the execution unit to receive data from the 128-bit source/destination register and the 128-bit source register of the first AES decryption instruction, perform operations for an AES single decryption round including an inverse byte substitution, an inverse shift rows, and an exclusive OR, but omitting an inverse mix columns, and store a first result in the 128-bit source/destination register; and the second AES decryption instruction is to cause the execution unit to receive data from the 128-bit source register of the second AES decryption ...

Publication date: 16-04-2015

Flexible architecture and instruction for advanced encryption standard (AES)

Number: US20150104007A1
Assignee: Intel Corp

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

Publication date: 16-04-2015

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20150104008A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

1-26. (canceled) 27. A mobile computer comprising: a random access memory (RAM); a network interface controller; and a processor coupled to the RAM, the processor comprising: a plurality of cores; a level 1 instruction cache to cache instructions; a level 1 data cache; a bus interface unit; a plurality of 128-bit registers; a decode unit to decode the instructions from the level 1 instruction cache including four Advanced Encryption Standard (AES) instructions, wherein each of the four AES instructions has a unique opcode, the four AES instructions including: a first AES encryption instruction that is to use a 128-bit source/destination register of the plurality and a 128-bit source register of the plurality, and a second AES encryption instruction that is to use a 128-bit source register of the plurality and a 128-bit destination register of the plurality; and an execution unit coupled to the decode unit, such that: the first AES encryption instruction is to cause the execution unit to receive data from the 128-bit source/destination register and the 128-bit source register of the first AES encryption instruction, perform operations for an AES single encryption round including a byte substitution, a shift rows, and an exclusive OR, but omitting a mix columns, and store a first result in the 128-bit source/destination register; and the second AES encryption instruction is to cause the execution unit to receive data from the 128-bit source register of the second AES encryption instruction, perform ...

Publication date: 16-04-2015

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20150104009A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

1-26. (canceled) 27. A system comprising: a memory controller to control communication with a double data rate memory; and a processor comprising: a plurality of cores; a level 1 instruction cache to cache instructions; a level 1 data cache; a bus interface unit; a plurality of 128-bit registers; a decode unit to decode the instructions from the level 1 instruction cache including an Advanced Encryption Standard (AES) decryption instruction that is to use a first 128-bit source/destination register of the plurality and a second 128-bit source register of the plurality; and an execution unit coupled to the decode unit, such that the AES decryption instruction is to cause the execution unit to receive data from the first 128-bit source/destination register and the second 128-bit source register, perform operations for an AES single decryption round including an inverse byte substitution, an inverse shift rows, and an exclusive OR, but omitting an inverse mix columns, and store a result in the first 128-bit source/destination register. 28. The system of claim 27, wherein the AES decryption instruction is capable of using any one of a 128-bit round key, a 192-bit round key, and a 256-bit round key. 29. The system of claim 27, wherein the processor further comprises a microcode Read Only Memory (ROM) to store micro-operations to implement the AES decryption instruction. 30. A system comprising: a memory controller to control communication with a memory; and a processor comprising: a ...

Publication date: 16-04-2015

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20150104010A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

1-26. (canceled) 27. A system comprising: a memory controller to control communication with a double data rate memory; and a processor comprising: a plurality of cores; a level 1 instruction cache to cache instructions; a level 1 data cache; a bus interface unit; a plurality of 128-bit registers; a decode unit to decode the instructions from the level 1 instruction cache including four Advanced Encryption Standard (AES) instructions, wherein each of the four AES instructions has a unique opcode, the four AES instructions including: a first AES decryption instruction that is to use a 128-bit source/destination register of the plurality and a 128-bit source register of the plurality, and a second AES decryption instruction that is to use a 128-bit source register of the plurality and a 128-bit destination register of the plurality; and an execution unit coupled to the decode unit, such that: the first AES decryption instruction is to cause the execution unit to receive data from the 128-bit source/destination register and the 128-bit source register of the first AES decryption instruction, perform operations for an AES single decryption round including an inverse byte substitution, an inverse shift rows, and an exclusive OR, but omitting an inverse mix columns, and store a first result in the 128-bit source/destination register; and the second AES decryption instruction is to cause the execution unit to receive data from the 128-bit source register of the second AES decryption instruction, perform ...

Publication date: 08-04-2021

TECHNIQUES FOR RESTRICTED DEPLOYMENT OF TARGETED PROCESSOR FIRMWARE UPDATES

Number: US20210103662A1
Assignee:

Methods and apparatus for restricted deployment of targeted processor firmware updates. During a patch enabling per-work flow, service entitlement license information comprising one or more service entitlements is generated and provisioned on one or more computing platforms. A restricted deployment microcode (uCode) update release (aka uCode patch) targeted for platforms having CPUs and/or XPUs with a certain part identifier is sent to the one or more platforms. Run-time software and/or firmware on the platforms are executed to access the provisioned service entitlement license information, which is used to authenticate and verify the restricted deployment uCode update release using a service entitlement having a part identifier associated with the platform's CPU. In one solution, authentication is performed using a hash-matching scheme and verification is used to verify the platform is properly licensed to load uCode included in the restricted deployment microcode (uCode) update release into the CPU.

1. A method for restricted deployment of microcode (uCode) for a processing unit having a part identifier, comprising: receiving a restricted deployment uCode update release at a platform having a processing unit with the part identifier; authenticating and verifying the restricted deployment uCode update release against service entitlement license information stored on the platform; and when the restricted deployment uCode update release is determined to be authentic and verified, updating uCode on the processing unit using uCode in the restricted deployment uCode update release. 2. The method of claim 1, further comprising provisioning the platform with the service entitlement license information prior to receiving the restricted deployment uCode update release. 3. The method of claim 2, further comprising: computing a first hash using a first key over data comprising the part identifier plus additional data; generating a license table containing service entitlement license ...
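
The hash-matching idea in claims 1 to 3 (a keyed hash over the part identifier plus additional data, checked against provisioned license information) might look roughly like the sketch below. Everything specific here is an assumption for illustration: HMAC-SHA-256 as the keyed hash, a dictionary as the license table, and the function names are not taken from the listing.

```python
import hashlib
import hmac


def entitlement_hash(key: bytes, part_id: str, extra: bytes) -> bytes:
    """Keyed hash over the part identifier plus additional data (assumed HMAC-SHA-256)."""
    return hmac.new(key, part_id.encode("utf-8") + extra, hashlib.sha256).digest()


def may_load_update(key: bytes, part_id: str, extra: bytes,
                    license_table: dict[str, bytes]) -> bool:
    """Return True only if the platform's license table holds a matching hash."""
    expected = license_table.get(part_id)
    if expected is None:
        return False  # no entitlement provisioned for this part
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, entitlement_hash(key, part_id, extra))
```

In this sketch the license table would be filled in during provisioning, before any update release arrives, which mirrors the ordering stated in claim 2.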

Publication date: 20-04-2017

Instruction for performing an overload check

Number: US20170109160A1
Assignee: Intel Corp

A processor is described having a functional unit within an instruction execution pipeline. The functional unit has circuitry to determine whether substantive data from a larger source data size will fit within a smaller data size that the substantive data is to flow to.
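
One way to read the check described here is as a test of whether the upper bits of the wider value carry any information. The Python sketch below illustrates that reading only; the function name, the parameters, and the treatment of signed values as two's complement are assumptions, not details from the abstract.

```python
def fits_in(pattern: int, src_bits: int, dst_bits: int, signed: bool = True) -> bool:
    """Report whether a src_bits-wide bit pattern survives narrowing to dst_bits.

    Signed: the destination sign bit and every bit above it must all be equal
    (pure sign extension). Unsigned: every bit above the destination must be 0.
    """
    if not 0 < dst_bits <= src_bits:
        raise ValueError("need 0 < dst_bits <= src_bits")
    pattern &= (1 << src_bits) - 1
    if signed:
        top = pattern >> (dst_bits - 1)
        all_ones = (1 << (src_bits - dst_bits + 1)) - 1
        return top == 0 or top == all_ones
    return pattern >> dst_bits == 0
```

For instance, the 32-bit pattern 0xFFFFFF80 is -128 in two's complement and fits in a signed byte, while 0x00000080 (128) does not.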

Publication date: 20-04-2017

METHOD AND APPARATUS TO PROCESS SHA-2 SECURE HASHING ALGORITHM

Number: US20170109162A1
Assignee:

A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.

1. A processor, comprising: an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants; and an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction. 2. The processor of claim 1, wherein the first operand specifies a first register having at least 256 bits to store SHA-2 state variables used to perform SHA-256 round operations. 3. The processor of claim 2, wherein the second operand specifies a second register or a memory location having at least 64 bits to store at least two messages and round constants for the SHA-256 round operations. 4. The processor of claim 1, wherein the first operand specifies a first register having at least 512 bits to store SHA-2 state variables used to perform SHA-512 round operations. 5. The processor of claim 1, wherein the instruction decoder receives a second ...

Publication date: 10-07-2014

METHOD AND APPARATUS TO PROCESS SHA-2 SECURE HASHING ALGORITHM

Number: US20140195782A1
Assignee:

A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.

1. A processor, comprising: an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants; and an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction. 2. The processor of claim 1, wherein the first operand specifies a first register having at least 256 bits to store SHA-2 state variables used to perform SHA-256 round operations. 3. The processor of claim 2, wherein the second operand specifies a second register or a memory location having at least 64 bits to store at least two messages and round constants for the SHA-256 round operations. 4. The processor of claim 1, wherein the first operand specifies a first register having at least 512 bits to store SHA-2 state variables used to perform SHA-512 round operations. 5. The processor of claim 1, wherein the instruction decoder receives a second ...

Publication date: 10-07-2014

THREE INPUT OPERAND VECTOR ADD INSTRUCTION THAT DOES NOT RAISE ARITHMETIC FLAGS FOR CRYPTOGRAPHIC APPLICATIONS

Number: US20140195817A1
Assignee: Intel Corporation

A method is described that includes performing the following within an instruction execution pipeline implemented on a semiconductor chip: summing three input vector operands through execution of a single instruction; and, not raising any arithmetic flags even though a result of the summing creates more bits than circuitry designed to transport the summation is able to transport.

1. A method comprising: performing the following within an instruction execution pipeline implemented on a semiconductor chip: summing three input vector operands through execution of a single instruction; and, not raising any arithmetic flags even though a result of said summing creates more bits than circuitry designed to transport said summation is able to transport. 2. The method of wherein said summing is performed with a single micro-operation. 3. The method of wherein whether a result of said summation is written over one of said input vector operands is specified in said instruction's instruction format. 4. The method of further comprising: summing three different input vector operands through execution of a following single instruction, one of said different vector operands being the result of said summing performed by said single instruction; and, not raising any arithmetic flags even though a result of said summing of said following single instruction creates more bits than hardware designed to transport said summation is able to transport. 5. The method of further comprising iterating through the processes of and repeatedly to perform multiple rounds of a cryptographic hashing process. 6. The method of claim 5 wherein the performing of said multiple rounds includes executing a logic function instruction for each round on three input operand vectors, where, the logic function instruction also has an operand that specifies what specific logic function is to be performed on said three input operand vectors. 7. The method of further comprising iterating through the processes of ...
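
The flag-free three-operand sum can be modelled as a per-lane addition that simply wraps. The sketch below is illustrative only: the function name, the 32-bit default lane width, and the use of Python lists as vectors are assumptions, chosen because hash functions commonly work on 32-bit words.

```python
def vec_add3(a: list, b: list, c: list, lane_bits: int = 32) -> list:
    """Add three vectors lane by lane, wrapping modulo 2**lane_bits.

    Overflow out of a lane is discarded silently: nothing comparable to a
    carry or overflow flag is produced, which is what modular hash
    arithmetic expects.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all three vectors must have the same length")
    mask = (1 << lane_bits) - 1
    return [(x + y + z) & mask for x, y, z in zip(a, b, c)]
```

Feeding the result of one call back in as an operand of the next mirrors the chaining of successive sums described in claim 4.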

Publication date: 28-04-2016

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20160119123A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers. 1. A processor, comprising: a plurality of cores; a level 1 (L1) instruction cache to store a plurality of instructions, the plurality of instructions to include a first Advanced Encryption Standard (AES) instruction; an L1 data cache; instruction fetch logic to fetch instructions from the L1 instruction cache; decode logic to decode instructions; a first source register to store a round key to be used for a final round of an AES decryption operation; a second source register to store input data to be decrypted by the final round of the AES decryption operation; an execution unit including AES execution logic to execute the first AES instruction to perform the final round of the AES decryption operation, the final round of the AES decryption operation to use the round key from the first source register to decrypt input data from the second source register and to store a final decrypted result of the AES decryption operation in a destination register; wherein the final round of the AES decryption operation is to include: a substitution operation to be performed on the input data, the substitution operation to use an inverse substitution box (S-box), an inverse Shift Rows operation, and an Add Round Key operation in which an exclusive OR function is to use data from the round key. This application claims the priority filing benefit of, is a continuation of, and incorporates by reference, U.S. patent application Ser. No. 14/572,620 entitled “FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION ...

Publication date: 28-04-2016

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20160119124A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers. 1. A processor, comprising: a plurality of cores; a level 1 (L1) instruction cache to store a plurality of instructions, the plurality of instructions to include a first Advanced Encryption Standard (AES) instruction; an L1 data cache; instruction fetch logic to fetch instructions from the L1 instruction cache; decode logic to decode instructions; a first source register to store a round key to be used for a final round of an AES encryption operation; a second source register to store input data to be encrypted by the final round of the AES encryption operation; an execution unit including AES execution logic to execute the first AES instruction to perform the final round of the AES encryption operation, the final round of the AES encryption operation to use the round key from the first source register to encrypt input data from the second source register and to store a final encrypted result of the AES encryption operation in a destination register; wherein the final round of the AES encryption operation is to include: a substitution operation to be performed on the input data, the substitution operation to use a substitution box (S-box) to result in a first array of substituted data, a Shift Rows transform to shift row data in the first array by a specified amount to generate a shift rows result, and an Add Round Key transform in which an exclusive OR function is to use data from the round key and the shift rows result. This application claims the priority filing benefit of, is a continuation of, and ...
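
The three steps that claim 1 lists for the final encryption round (S-box substitution, Shift Rows, Add Round Key) can be followed in a small software model. This is a sketch only: the helper names are invented here, the state is assumed to be a 16-byte list in the column-major order used by FIPS-197, and the S-box is derived from its FIPS-197 definition rather than from anything in the patent text.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def build_sbox():
    """Derive the AES S-box: field inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):          # x^254 is the inverse of x (0 maps to 0)
            inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):     # XOR with the four left rotations of inv
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def shift_rows(state):
    """Rotate row r of the column-major 4x4 state left by r positions."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def aes_last_round(state, round_key):
    """Final encryption round: substitution, Shift Rows, Add Round Key."""
    substituted = [SBOX[b] for b in state]
    shifted = shift_rows(substituted)
    return [s ^ k for s, k in zip(shifted, round_key)]


print(aes_last_round([0] * 16, [0] * 16))
```

The final round differs from the other rounds only in that it has no Mix Columns step, which matches the list of operations in the claim. A table-driven model like this is useful for following the data flow but is not constant-time and should not be used to protect real data.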

Publication date: 28-04-2016

Flexible architecture and instruction for advanced encryption standard (AES)

Number: US20160119125A1
Assignee: Intel Corp

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

Publication date: 28-04-2016

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20160119126A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers. 1. A system, comprising: a processor, comprising: a plurality of cores; a level 1 (L1) instruction cache to store a plurality of instructions, the plurality of instructions to include a first Advanced Encryption Standard (AES) instruction; an L1 data cache; instruction fetch logic to fetch instructions from the L1 instruction cache; decode logic to decode instructions; a first source register to store a round key to be used for a final round of an AES decryption operation; a second source register to store input data to be decrypted by the final round of the AES decryption operation; an execution unit including AES execution logic to execute the first AES instruction to perform the final round of the AES decryption operation, the final round of the AES decryption operation to use the round key from the first source register to decrypt input data from the second source register and to store a final decrypted result of the AES decryption operation in a destination register; wherein the final round of the AES decryption operation is to include: a substitution operation to be performed on the input data, the substitution operation to use an inverse substitution box (S-box), an inverse Shift Rows operation, and an Add Round Key operation in which an exclusive OR function is to use data from the round key; a system memory comprising a multiple data rate dynamic random access memory coupled to the processor over one or more interconnects; and one or more storage devices coupled to the processor. 2. The system of ...

Publication date: 28-04-2016

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20160119127A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers. 1. A system, comprising: a processor, comprising: a plurality of cores; a level 1 (L1) instruction cache to store a plurality of instructions, the plurality of instructions to include a first Advanced Encryption Standard (AES) instruction; an L1 data cache; instruction fetch logic to fetch instructions from the L1 instruction cache; decode logic to decode instructions; a first source register to store a round key to be used for a final round of an AES encryption operation; a second source register to store input data to be encrypted by the final round of the AES encryption operation; an execution unit including AES execution logic to execute the first AES instruction to perform the final round of the AES encryption operation, the final round of the AES encryption operation to use the round key from the first source register to encrypt input data from the second source register and to store a final encrypted result of the AES encryption operation in a destination register; wherein the final round of the AES encryption operation is to include: a substitution operation to be performed on the input data, the substitution operation to use a substitution box (S-box) to result in a first array of substituted data, a Shift Rows transform to shift row data in the first array by a specified amount to generate a shift rows result, and an Add Round Key transform in which an exclusive OR function is to use data from the round key and the shift rows result; a system memory comprising a multiple data rate dynamic ...

Publication date: 28-04-2016

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20160119128A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers. 1. A system comprising: a processor comprising: a plurality of cores; a level 1 (L1) instruction cache to store a plurality of instructions, the plurality of instructions to include a first Advanced Encryption Standard (AES) instruction; an L1 data cache; instruction fetch logic to fetch instructions from the L1 instruction cache; decode logic to decode instructions; a first source register to store a round key to be used for a round of an AES decryption operation; a second source register to store input data to be decrypted by the round of the AES decryption operation; an execution unit including AES execution logic to execute the first AES instruction to perform the round of the AES decryption operation, the AES decryption operation to use the round key from the first source register to decrypt input data from the second source register and to store a result of the round of the AES decryption operation in a destination register; wherein the round of the AES decryption operation is to include: a substitution operation to be performed on the input data, the substitution operation to use an inverse substitution box (S-box), an inverse Shift Rows operation, an inverse Mix Columns operation, and an Add Round Key operation in which an exclusive OR function is to use data from the round key; a memory controller to couple the processor to a dynamic random access memory (DRAM); and an input/output (I/O) controller to couple the processor to one or more devices. 2. The system of claim 1, wherein the memory ...

Publication date: 28-04-2016

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20160119129A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers. 1. A system comprising: a processor comprising: a plurality of cores; a level 1 (L1) instruction cache to store a plurality of instructions, the plurality of instructions to include a first Advanced Encryption Standard (AES) instruction; an L1 data cache; instruction fetch logic to fetch instructions from the L1 instruction cache; decode logic to decode instructions; a first source register to store a round key to be used for a round of an AES encryption operation; a second source register to store input data to be encrypted by the round of the AES encryption operation; an execution unit including AES execution logic to execute the first AES instruction to perform the round of the AES encryption operation, the AES encryption operation to use the round key from the first source register to encrypt input data from the second source register and to store a result of the round of the AES encryption operation in a destination register; wherein the round of the AES encryption operation is to include: a Sub Bytes transform to perform a byte substitution on the input data, the Sub Bytes transform to use a substitution box (S-box) to result in a first array of substituted data, a Shift Rows transform to shift row data in the first array by a specified amount to result in a second array, a Mix Columns transform in which columns of the second array are to be treated as polynomials over a Galois Field GF(2^8) and multiplied modulo x^4 + 1 with a fixed polynomial to generate a mix columns result, and an Add Round Key ...
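
The Mix Columns transform named in this claim can be illustrated on a single four-byte column. The sketch below assumes the fixed polynomial {03}x^3 + {01}x^2 + {01}x + {02} and the reduction polynomial x^8 + x^4 + x^3 + x + 1 from FIPS-197; neither is spelled out in the claim text, and the function names are chosen here for illustration.

```python
def xtime(b):
    """Multiply a byte by x in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b


def mix_column(col):
    """Multiply one state column by the fixed polynomial modulo x^4 + 1.

    Multiplying by {02} is xtime(a); multiplying by {03} is xtime(a) ^ a.
    """
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Because the multiplication is taken modulo x^4 + 1, each output byte uses the same four coefficients rotated by one position, which is why the four expressions look like cyclic shifts of each other.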

Publication date: 28-04-2016

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20160119130A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers. 1. A system comprising: a processor comprising: a plurality of cores; a level 1 (L1) instruction cache to store a plurality of instructions, the plurality of instructions to include a first Advanced Encryption Standard (AES) instruction; an L1 data cache; instruction fetch logic to fetch instructions from the L1 instruction cache; decode logic to decode instructions; a first source register to store a round key to be used for a final round of an AES decryption operation; a second source register to store input data to be decrypted by the final round of the AES decryption operation; an execution unit including AES execution logic to execute the first AES instruction to perform the final round of the AES decryption operation, the final round of the AES decryption operation to use the round key from the first source register to decrypt input data from the second source register and to store a final decrypted result of the AES decryption operation in a destination register; wherein the final round of the AES decryption operation is to include: a substitution operation to be performed on the input data, the substitution operation to use an inverse substitution box (S-box), an inverse Shift Rows operation, and an Add Round Key operation in which an exclusive OR function is to use data from the round key; and a memory controller to couple the processor to a dynamic random access memory (DRAM); and an input/output (I/O) controller to couple the processor to one or more devices. 2. The system of claim 1, wherein ...

Publication date: 28-04-2016

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20160119131A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers. 1. A system comprising: a processor comprising: a plurality of cores; a level 1 (L1) instruction cache to store a plurality of instructions, the plurality of instructions to include a first Advanced Encryption Standard (AES) instruction; an L1 data cache; instruction fetch logic to fetch instructions from the L1 instruction cache; decode logic to decode instructions; a first source register to store a round key to be used for a final round of an AES encryption operation; a second source register to store input data to be encrypted by the final round of the AES encryption operation; an execution unit including AES execution logic to execute the first AES instruction to perform the final round of the AES encryption operation, the final round of the AES encryption operation to use the round key from the first source register to encrypt input data from the second source register and to store a final encrypted result of the AES encryption operation in a destination register; wherein the final round of the AES encryption operation is to include: a substitution operation to be performed on the input data, the substitution operation to use a substitution box (S-box) to result in a first array of substituted data, a Shift Rows transform to shift row data in the first array by a specified amount to generate a shift rows result, and an Add Round Key transform in which an exclusive OR function is to use data from the round key and the shift rows result; and a memory controller to couple the processor to a dynamic ...

Publication date: 24-07-2014

Instructions to perform JH cryptographic hashing in a 256 bit data path

Number: US20140205084A1
Assignee: Intel Corp

A method is described. The method includes executing one or more JH_SBOX_L instructions to perform S-Box mappings and a linear (L) transformation on a JH state and executing one or more JH_P instructions to perform a permutation function on the JH state once the S-Box mappings and the L transformation have been performed.

Publication date: 12-05-2016

Method, Apparatus, And System For Speculative Abort Control Mechanisms

Number: US20160132333A1
Assignee:

An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions. 1. A system comprising: a plurality of processors; a processor interconnect to communicatively couple two of the plurality of processors; a system memory comprising a dynamic random access memory communicatively coupled to a processor of the plurality of processors over a memory interconnect; a plurality of cores, one or more of the plurality of cores to concurrently execute multiple threads; one or more of the plurality of cores to perform out-of-order execution of instructions of the threads; instruction fetch logic to fetch instructions of one or more of the threads, instruction decode logic to decode the instructions, register renaming logic to rename one or more registers within a register file, a data cache to cache data, a translation lookaside buffer to store virtual to physical address translations, a second level cache unit to cache instructions and data, and an execution unit to execute a ...

Publication date: 12-05-2016

Method, apparatus, and system for speculative abort control mechanisms

Number: US20160132334A1
Assignee:

An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions. 1. A processor comprising: a plurality of cores, one or more of the plurality of cores to concurrently execute multiple threads; one or more of the plurality of cores to perform out-of-order execution of instructions of the threads; one or more of the plurality of cores comprising: instruction fetch logic to fetch instructions of one or more of the threads, instruction decode logic to decode the instructions, register renaming logic to rename one or more registers within a register file, a data cache to cache data, a translation lookaside buffer to store virtual to physical address translations, a second level cache unit to cache instructions and data, and an execution unit to execute a first instruction to abort transactional execution responsive to an abort condition. 2. The processor of claim 1, the execution unit to further execute: a second instruction to indicate a beginning of a transactional ...

Подробнее
12-05-2016 дата публикации

Method, apparatus, and system for speculative abort control mechanisms

Номер: US20160132335A1
Принадлежит:

An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions. 1. A system comprising: a plurality of cores, one or more of the plurality of cores to concurrently execute multiple threads; one or more of the plurality of cores to perform out-of-order execution of instructions of the threads; one or more of the plurality of cores comprising: instruction fetch logic to fetch instructions of one or more of the threads, instruction decode logic to decode the instructions, register renaming logic to rename one or more registers within a register file, a data cache to cache data, a translation lookaside buffer to store virtual to physical address translations, a second level cache unit to cache instructions and data, and an execution unit to execute a first instruction to abort transactional execution responsive to an abort condition; and one or more integrated memory controllers to communicatively couple the plurality of cores to dynamic random access system memory. 2. ...
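
The abstract describes retrying or abandoning a speculative region, falling back to a software path, and a mechanism that always aborts so the fallback path can be tested. The following is a pure software model of that control flow, not the hardware mechanism itself: the class `Region`, the `always_abort` knob, the retry count and the lock-protected fallback are all assumptions introduced here for illustration.

```python
import threading


class SpeculativeAbort(Exception):
    """Models an abort of a speculative code region (conflict, timer, explicit abort)."""


class Region:
    def __init__(self, always_abort=False):
        self.always_abort = always_abort   # hypothetical test knob: force every attempt to abort
        self.lock = threading.Lock()       # assumed fallback: plain mutual exclusion
        self.fallbacks_taken = 0

    def run(self, speculative, fallback, retries=3):
        """Try the speculative path a few times, then take the fallback path."""
        for _ in range(retries):
            try:
                if self.always_abort:
                    raise SpeculativeAbort("forced abort for fallback testing")
                return speculative()
            except SpeculativeAbort:
                continue                   # discard the attempt and retry
        with self.lock:
            self.fallbacks_taken += 1
            return fallback()


normal = Region()
forced = Region(always_abort=True)
print(normal.run(lambda: "speculative", lambda: "fallback"))
print(forced.run(lambda: "speculative", lambda: "fallback"))
```

Forcing aborts is what makes the fallback path testable: in normal operation it may run so rarely that a bug in it would otherwise go unnoticed.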

Publication date: 12-05-2016

Method, apparatus, and system for speculative abort control mechanisms

Number: US20160132336A1
Assignee:

An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions. 1. A system comprising: a plurality of cores, one or more of the plurality of cores to concurrently execute multiple threads; one or more of the plurality of cores to perform out-of-order execution of instructions of the threads; one or more of the plurality of cores comprising: instruction fetch logic to fetch instructions of one or more of the threads, instruction decode logic to decode the instructions, register renaming logic to rename one or more registers within a register file, a data cache to cache data, a translation lookaside buffer to store virtual to physical address translations, a second level cache unit to cache instructions and data, and an execution unit to execute a first instruction to indicate an end of the transaction execution region and to cause memory transactions to be atomically committed or aborted; and one or more integrated memory controllers to communicatively couple the ...

Подробнее
12-05-2016 дата публикации

Method, apparatus, and system for speculative abort control mechanisms

Номер: US20160132337A1
Принадлежит:

An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions. 1. A system comprising: a plurality of cores, one or more of the plurality of cores to concurrently execute multiple threads; one or more of the plurality of cores to perform out-of-order execution of instructions of the threads; one or more of the plurality of cores comprising: instruction fetch logic to fetch instructions of one or more of the threads, instruction decode logic to decode the instructions, register renaming logic to rename one or more registers within a register file, a data cache to cache data, a translation lookaside buffer to store virtual to physical address translations, a second level cache unit to cache instructions and data, and an execution unit to execute a first instruction to indicate a beginning of a transactional execution region of instructions; and one or more integrated memory controllers to communicatively couple the plurality of cores to dynamic random access system ...

Publication date: 10-05-2018

GATHERING AND SCATTERING MULTIPLE DATA ELEMENTS

Number: US20180129506A1
Assignee:

According to a first aspect, efficient data transfer operations can be achieved by: decoding, by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception. 1. A processor comprising: a destination register to store a plurality of data elements; a source register to store a plurality of index values, each of which corresponds to one of the plurality of data elements; a base address register to store a base address; a mask register to store mask values, each mask value corresponding to one of the plurality of data elements; a decoder to decode a vector gather instruction, the vector gather instruction having a scale field to specify a common scaling factor to be applied to the index values; and execution circuitry coupled to the decoder, the execution circuitry to perform operations associated with the vector gather instruction, the operations comprising conditionally accessing, based on corresponding mask values, one or more of the plurality of data elements and storing the one or more of the plurality of data elements in the destination register; wherein the execution circuitry is to scale the index values in accordance with the scale field to generate a corresponding plurality of scaled index values, and add the base address to each of the scaled index values to generate a corresponding plurality of non-contiguous system memory addresses for the data elements to be accessed and stored in the destination register. 2. The processor of wherein system memory addresses are determined by adding a displacement value to the combination of the base address and the scaled index ...
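
The address computation in claims 1 and 2 (base address plus scaled index, optionally plus a displacement, with a per-element mask) can be modeled in a few lines. This is a sketch under assumptions: memory is represented as a Python list indexed by address, and the function and parameter names are chosen here for illustration rather than taken from the patent.

```python
def gather(memory, base, indices, scale, mask, dest, disp=0):
    """Masked gather: dest[i] = memory[base + indices[i] * scale + disp] where mask[i] is set.

    Elements whose mask value is clear keep their previous destination value.
    """
    out = list(dest)
    for i, (index, enabled) in enumerate(zip(indices, mask)):
        if enabled:
            out[i] = memory[base + index * scale + disp]
    return out


memory = list(range(100, 200))   # address a holds the value 100 + a
loaded = gather(memory, base=10, indices=[0, 3, 5, 7], scale=4,
                mask=[1, 0, 1, 1], dest=[-1, -1, -1, -1])
print(loaded)
```

The second element is skipped because its mask value is clear, so the addresses actually read (10, 30 and 38) are non-contiguous, as the claim describes.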

Publication date: 21-05-2015

HAND HELD DEVICE TO PERFORM A BIT RANGE ISOLATION INSTRUCTION

Number: US20150143084A1
Assignee: Intel Corporation

Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.
1-30. (canceled)
31. A hand held device comprising: a dynamic random access memory; a memory controller coupled to the dynamic random access memory; a flash memory device to store data; a wireless transceiver; a user input device; a network controller to couple the hand held device to a network; a shared cache to store data; and an L1 cache; a instruction fetch logic; branch prediction logic; a plurality of registers including control registers, status registers, and 64-bit general purpose registers, wherein the lower 32-bits of the 64-bit general purpose registers are addressable to operate on 32-bit operands, and wherein the status registers include a 32-bit register having a plurality of bits associated with a group of status flags, the status flags including a carry flag, a zero flag, a sign flag, and an overflow flag; an instruction decoder to receive and decode a bit range isolation instruction, the instruction to implicitly indicate a first end of a range of bits of interest, and, through an immediate associated with the instruction, to ...
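The behaviour described in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `isolate_low_bits`, the 64-bit default width, and the choice of bit 0 as the implicit end of the range are hypothetical and are not taken from the patent text. The sketch keeps the selected bits in place (no shift) and forces every other bit to zero.

```python
def isolate_low_bits(src: int, end: int, width: int = 64) -> int:
    """Keep bits [end-1:0] of src where they are and clear bits [width-1:end].

    Assumption: the implicit end of the range is bit 0 and `end` is the
    explicitly specified end. Both choices are illustrative only.
    """
    if not 0 <= end <= width:
        raise ValueError("end must lie within the operand width")
    operand_mask = (1 << width) - 1   # limit src to the operand width
    range_mask = (1 << end) - 1       # ones in the bit positions to keep
    return src & operand_mask & range_mask
```

For example, `isolate_low_bits(0xABCD, 8)` keeps the low byte `0xCD` in its original position and clears everything above it.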

Publication date: 17-05-2018

MULTIPLICATION INSTRUCTION FOR WHICH EXECUTION COMPLETES WITHOUT WRITING A CARRY FLAG

Number: US20180136936A1
Assignee: Intel Corporation

A method in one aspect may include receiving a multiply instruction. The multiply instruction may indicate a first source operand and a second source operand. A product of the first and second source operands may be stored in one or more destination operands indicated by the multiply instruction. Execution of the multiply instruction may complete without writing a carry flag. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.
1. A processor comprising: a flags register to store arithmetic flags; a decoder to decode instructions, including an unsigned multiply instruction; and execution circuitry coupled to the decoder, the execution circuitry to perform operations associated with the unsigned multiply instruction, the operations comprising to perform a multiplication of an unsigned explicit source operand and an unsigned implicit source operand to generate an unsigned product, the execution circuit to store a lower half of the unsigned product in a first destination register and to store an upper half of the unsigned product in a second destination register without affecting any of the arithmetic flags.
2. The processor of claim 1, wherein the arithmetic flags include a carry flag and an overflow flag.
3. The processor of claim 2, wherein the unsigned multiply instruction comprises an enable flag update control bit set to a first value to indicate that the carry flag and the overflow flag are not to be updated.
4. The processor of claim 2, wherein the flags register is to additionally store a parity flag, an auxiliary carry flag, a zero flag, a sign flag, a trap flag, an interrupt enable flag, an I/O privileged level, a nested task flag, a resume flag, a virtual-8086 mode flag, an alignment check flag, a virtual interrupt flag, a virtual interrupt pending flag, and an ID flag, and a direction ...
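The data flow in claim 1 above (full-width unsigned product, lower half to one destination, upper half to another, flags untouched) can be modelled as follows. This is a sketch, not the patented circuit: the name `mul_no_flags`, the 64-bit operand width, and the dictionary used to stand in for the flags register are all assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1

def mul_no_flags(explicit_src: int, implicit_src: int, flags: dict):
    """Return (low_half, high_half, flags) for an unsigned 64x64 multiply.

    The flags mapping is passed through without being read or modified,
    mirroring the "without affecting any of the arithmetic flags" wording.
    """
    product = (explicit_src & MASK64) * (implicit_src & MASK64)
    low_half = product & MASK64    # goes to the first destination register
    high_half = product >> 64      # goes to the second destination register
    return low_half, high_half, flags
```

With both operands equal to 2^64 - 1, the low half is 1 and the high half is 2^64 - 2, and the flags mapping comes back unchanged.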

Publication date: 04-06-2015

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20150154122A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.
1-26. (canceled)
27. A system comprising: a memory controller to control communication with a double data rate memory; and a processor comprising: a plurality of cores; a level 1 instruction cache to cache instructions; a level 1 data cache; a bus interface unit; a plurality of 128-bit registers; a decode unit to decode the instructions from the level 1 instruction cache including an Advanced Encryption Standard (AES) encryption instruction that is to use a first 128-bit source/destination register and a second 128-bit source register; and an execution unit coupled to the decode unit, such that the AES encryption instruction is to cause the execution unit to receive data from the first 128-bit source/destination register and the second 128-bit source register, perform operations for an AES single encryption round including a byte substitution, a shift rows, and an exclusive OR, but omitting a mix columns, and store a result in the first 128-bit source/destination register.
28. The system of claim 27, wherein the AES encryption instruction is capable of using any one of a 128-bit round key, a 192-bit round key, and a 256-bit round key.
29. The system of claim 27, wherein the processor further comprises a microcode Read Only Memory (ROM) to store micro-operations to implement the AES encryption instruction.
30. A system comprising: a memory controller to control communication with a memory; and a processor comprising: a plurality of cores; a level 1 instruction cache to cache ...
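The round described in claim 27 above (byte substitution, shift rows, and an exclusive OR with the round key, with mix columns omitted) corresponds to the final round of standard AES. The sketch below models it in Python under stated assumptions: the names `build_sbox` and `last_round` are hypothetical, the state is assumed to be 16 bytes in the column-major order used by FIPS-197, and the S-box is derived from the standard definition rather than copied from the patent.

```python
def build_sbox():
    """Derive the AES S-box: inversion in GF(2^8) followed by the affine map."""
    exp, log = [0] * 256, [0] * 256
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        # Multiply x by the generator 3, reducing by x^8 + x^4 + x^3 + x + 1.
        doubled = (x << 1) ^ (0x11B if x & 0x80 else 0)
        x ^= doubled
    sbox = []
    for b in range(256):
        inv = 0 if b == 0 else exp[(255 - log[b]) % 255]
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def last_round(state: bytes, round_key: bytes) -> bytes:
    """Apply SubBytes, ShiftRows, then XOR with the round key (no MixColumns)."""
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("state and round key must both be 16 bytes")
    out = bytearray(16)
    for col in range(4):
        for row in range(4):
            # ShiftRows: row r is rotated left by r columns.
            src = row + 4 * ((col + row) % 4)
            out[row + 4 * col] = SBOX[state[src]] ^ round_key[row + 4 * col]
    return bytes(out)
```

Because a full encryption round differs only by the added mix-columns step, a single routine like this can be shared across 128-, 192-, and 256-bit key sizes: the key size changes how many rounds run, not what one round does.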

Publication date: 25-05-2017

METHOD AND APPARATUS TO PROCESS SHA-2 SECURE HASHING ALGORITHM

Number: US20170147340A1
Assignee:

A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
1. A processor, comprising: an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants; and an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
2. The processor of claim 1, wherein the first operand specifies a first register having at least 256 bits to store SHA-2 state variables used to perform SHA-256 round operations.
3. The processor of claim 2, wherein the second operand specifies a second register or a memory location having at least 64 bits to store at least two messages and round constants for the SHA-256 round operations.
4. The processor of claim 1, wherein the first operand specifies a first register having at least 512 bits to store SHA-2 state variables used to perform SHA-512 round operations.
5. The processor of claim 1, wherein the instruction decoder receives a second ...

Publication date: 25-05-2017

METHOD AND APPARATUS TO PROCESS SHA-2 SECURE HASHING ALGORITHM

Number: US20170147341A1
Assignee:

A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
1. A processor comprising: a plurality of 128-bit single instruction, multiple data (SIMD) registers; a decode unit to decode instructions, including a Secure Hash Algorithm (SHA) 256 schedule instruction, the SHA256 schedule instruction having a first field to specify a first 128-bit SIMD source register of the 128-bit SIMD registers, the first 128-bit SIMD source register to store a first operand that is to include a first 32-bit data element in bits [31:0], a second 32-bit data element in bits [63:32], a third 32-bit data element in bits [95:64], and a fourth 32-bit data element in bits [127:96]; a second field to specify a second 128-bit SIMD source register of the 128-bit SIMD registers, the second 128-bit SIMD source register to store a second operand that is to include a fifth 32-bit data element in bits [31:0], a sixth 32-bit data element in bits [63:32], a seventh 32-bit data element in bits [95:64], and an eighth 32-bit data element in bits [127:96]; and a third field to specify a third 128-bit SIMD source register of the 128-bit SIMD registers, the third 128-bit SIMD source register to store a third operand that is to include a ninth 32-bit data element in bits [31:0], a tenth 32-bit data element in bits [63:32], an eleventh 32-bit data element in bits [95:64], and a twelfth 32-bit ...
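For orientation, the message schedule that a SHA-256 schedule instruction helps compute can be written out in plain Python. The sketch below follows the standard SHA-256 recurrence from FIPS 180-4; it does not reproduce how the claimed instruction splits the work across its three 128-bit source registers, since the claim text is truncated here. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def small_sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def small_sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def expand_schedule(block_words):
    """Expand sixteen 32-bit message words into the 64-word SHA-256 schedule."""
    if len(block_words) != 16:
        raise ValueError("a SHA-256 block holds exactly 16 words")
    w = [word & MASK32 for word in block_words]
    for t in range(16, 64):
        w.append((small_sigma1(w[t - 2]) + w[t - 7]
                  + small_sigma0(w[t - 15]) + w[t - 16]) & MASK32)
    return w
```

Each new word depends on four earlier 32-bit words, which is why packing four 32-bit elements into each 128-bit SIMD register, as the claim describes, is a natural fit for this computation.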

Publication date: 25-05-2017

METHOD AND APPARATUS TO PROCESS SHA-2 SECURE HASHING ALGORITHM

Number: US20170147342A1
Assignee:

A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
1. A system on a chip (SoC) comprising: an integrated memory controller unit; and a processor core, the processor core comprising: a plurality of 128-bit single instruction, multiple data (SIMD) registers; a decode unit coupled to the instruction fetch unit, the decode unit to decode instructions, including a Secure Hash Algorithm (SHA) 256 schedule instruction, the SHA256 schedule instruction having: a first field to specify a first 128-bit SIMD source register of the 128-bit SIMD registers, the first 128-bit SIMD source register to store a first operand that is to include a first 32-bit data element in bits [31:0], a second 32-bit data element in bits [63:32], a third 32-bit data element in bits [95:64], and a fourth 32-bit data element in bits [127:96]; a second field to specify a second 128-bit SIMD source register of the 128-bit SIMD registers, the second 128-bit SIMD source register to store a second operand that is to include a fifth 32-bit data element in bits [31:0], a sixth 32-bit data element in bits [63:32], a seventh 32-bit data element in bits [95:64], and an eighth 32-bit data element in bits [127:96]; and a third field to specify a third 128-bit SIMD source register of the 128-bit SIMD registers, the third 128-bit SIMD source register to store a third operand that is to include a ninth 32- ...

Publication date: 25-05-2017

METHOD AND APPARATUS TO PROCESS SHA-2 SECURE HASHING ALGORITHM

Number: US20170147343A1
Assignee:

A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
1. A system comprising: a memory; a processor coupled with the memory, the processor comprising: a plurality of 128-bit single instruction, multiple data (SIMD) registers; a decode unit coupled to the instruction fetch unit, the decode unit to decode instructions, including a Secure Hash Algorithm (SHA) 256 schedule instruction, the SHA256 schedule instruction having: a first field to specify a first 128-bit SIMD source register of the 128-bit SIMD registers, the first 128-bit SIMD source register to store a first operand that is to include a first 32-bit data element in bits [31:0], a second 32-bit data element in bits [63:32], a third 32-bit data element in bits [95:64], and a fourth 32-bit data element in bits [127:96]; a second field to specify a second 128-bit SIMD source register of the 128-bit SIMD registers, the second 128-bit SIMD source register to store a second operand that is to include a fifth 32-bit data element in bits [31:0], a sixth 32-bit data element in bits [63:32], a seventh 32-bit data element in bits [95:64], and an eighth 32-bit data element in bits [127:96]; and a third field to specify a third 128-bit SIMD source register of the 128-bit SIMD registers, ...

Publication date: 25-05-2017

METHOD AND APPARATUS TO PROCESS SHA-2 SECURE HASHING ALGORITHM

Number: US20170147348A1
Assignee:

A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
1. A processor, comprising: an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants; and an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
2. The processor of claim 1, wherein the first operand specifies a first register having at least 256 bits to store SHA-2 state variables used to perform SHA-256 round operations.
3. The processor of claim 2, wherein the second operand specifies a second register or a memory location having at least 64 bits to store at least two messages and round constants for the SHA-256 round operations.
4. The processor of claim 1, wherein the first operand specifies a first register having at least 512 bits to store SHA-2 state variables used to perform SHA-512 round operations.
5. The processor of claim 1, wherein the instruction decoder receives a second ...

Publication date: 02-06-2016

Method, apparatus, and system for speculative abort control mechanisms

Number: US20160154648A1
Assignee:

An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions.
1. A system comprising: a plurality of processors; a processor interconnect to communicatively couple two of the plurality of processors; and a system memory comprising dynamic random access memory communicatively coupled to a processor of the plurality of processors over a memory interconnect; a plurality of cores, one or more of the plurality of cores to concurrently execute multiple threads; one or more of the plurality of cores to perform out-of-order execution of instructions of the threads; one or more of the plurality of cores comprising: instruction fetch logic to fetch instructions of one or more of the threads, instruction decode logic to decode the instructions, register renaming logic to rename one or more registers within a register file, a data cache to cache data, a translation lookaside buffer to store virtual to physical address translations, a second level cache unit to cache ...

Publication date: 31-05-2018

GATHERING AND SCATTERING MULTIPLE DATA ELEMENTS

Number: US20180150301A9
Assignee:

According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.
1. A processor comprising: a decoder stage to decode a single instruction for accessing data elements at a plurality of memory locations; and one or more execution units, coupled to the decoder to receive the decoded instruction and responsive to the decoded instruction, to: issue accesses to one or more of the plurality of memory locations; detect if any faults or exceptions occur; and handle any pending traps or interrupts upon completion of the single instruction, or detection of a fault or an exception.
2. The processor of claim 2, the one or more execution units further to: detect if any traps or interrupts occur; and record detected traps or interrupts as pending traps or interrupts.
3. The processor of wherein detecting if any traps or interrupts occur includes detecting any breakpoints.
4. The processor of claim 3, the one or more execution units further to: set a flag in response to said handling of any traps or interrupts.
5. The processor of claim 4, the one or more execution units further to: not handle a pending breakpoint upon said completion of the single instruction, or detection of a fault or an exception, whenever the flag has been set.
6. The processor of wherein the flag is an EFLAG.RF.
7. The processor of claim 2, the one or more execution units further to: store the data elements in a destination register; and clear any corresponding state elements in a mask register.
8. The processor of ...

Publication date: 11-06-2015

INSTRUCTIONS AND LOGIC TO PROVIDE MEMORY ACCESS KEY PROTECTION FUNCTIONALITY

Number: US20150160998A1
Assignee:

Instructions and logic provide memory key protection functionality. Embodiments include a processor having a register to store a memory protection field. A decoder decodes an instruction having an addressing form field for a memory operand to specify one or more memory addresses, and a memory protection key. One or more execution units, responsive to the memory protection field having a first value and to the addressing form field of the decoded instruction having a second value, enforce memory protection according to said first value of the memory protection field, using the specified memory protection key, for accessing the one or more memory addresses, and fault if a portion of the memory protection key specified by the decoded instruction does not match a stored key value associated with the one or more memory addresses.
1. A processor comprising: a control register to store a memory protection field; a cache to store cache coherent data in one or more cache lines for one or more memory addresses of a primary storage; a decode stage to decode a first instruction specifying a register operand, an addressing form field for a memory operand to specify said one or more memory addresses, and a memory protection key; and one or more execution units, responsive to said memory protection field having a first value and to said addressing form field of the decoded first instruction having a second value, to: enforce memory protection according to said first value of the memory protection field, using the specified memory protection key, for accessing said one or more memory addresses, and fault if a portion of the memory protection key specified by the decoded first instruction does not match one or more stored key values associated with said one or more memory addresses.
2. The processor of claim 1, wherein the fault is a general protection fault.
3. The processor of claim 1, wherein said first value of the memory protection field specifies protected write access.
4. The ...

Publication date: 21-08-2014

SIMD INTEGER MULTIPLY-ACCUMULATE INSTRUCTION FOR MULTI-PRECISION ARITHMETIC

Number: US20140237218A1
Assignee:

A multiply-and-accumulate (MAC) instruction allows efficient execution of unsigned integer multiplications. The MAC instruction indicates a first vector register as a first operand, a second vector register as a second operand, and a third vector register as a destination. The first vector register stores a first factor, and the second vector register stores a partial sum. The MAC instruction is executed to multiply the first factor with an implicit second factor to generate a product, and to add the partial sum to the product to generate a result. The first factor, the implicit second factor and the partial sum have a same data width and the product has twice the data width. The most significant half of the result is stored in the third vector register, and the least significant half of the result is stored in the second vector register.
1. A method comprising: receiving a multiply-and-accumulate (MAC) instruction for unsigned integer operations, the MAC instruction indicating a first vector register as a first operand, a second vector register as a second operand, and a third vector register as a destination, the first vector register storing a first factor and the second vector register storing a partial sum of the MAC instruction; executing the MAC instruction to multiply the first factor with an implicit second factor to generate a product, and to add the partial sum to the product to generate a result, wherein the first factor, the implicit second factor and the partial sum have a same data width and the product has twice the data width; storing a most significant half of the result in the third vector register; and storing a least significant half of the result in the second vector register.
2. The method of claim 1, wherein the first vector register stores a plurality of first factors of a plurality of multiplications, the second vector register stores a plurality of partial sums of the plurality of multiplications, and wherein executing the ...
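The arithmetic in claim 1 above can be illustrated lane by lane. The sketch below is a model under stated assumptions, not the instruction itself: `mac_lanes` is a hypothetical name, the lane width defaults to 32 bits, and the implicit second factor is passed as an ordinary parameter because the excerpt does not say where the hardware keeps it.

```python
def mac_lanes(first_factors, partial_sums, implicit_factor, width=32):
    """Per lane: result = factor * implicit_factor + partial_sum.

    Returns (high_halves, low_halves). In the claim, the high halves go to
    the third (destination) register and the low halves overwrite the
    second register that held the partial sums.
    """
    if len(first_factors) != len(partial_sums):
        raise ValueError("both vectors must have the same number of lanes")
    mask = (1 << width) - 1
    high_halves, low_halves = [], []
    for factor, partial in zip(first_factors, partial_sums):
        result = (factor & mask) * (implicit_factor & mask) + (partial & mask)
        low_halves.append(result & mask)
        high_halves.append(result >> width)
    return high_halves, low_halves
```

The result always fits in twice the lane width: with all inputs at their maximum of 2^w - 1, the result is (2^w - 1) * 2^w, which is below 2^(2w). That is why no carry out of the double-width result needs to be tracked.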

Publication date: 07-06-2018

METHOD AND APPARATUS TO PROCESS KECCAK SECURE HASHING ALGORITHM

Number: US20180157489A1
Assignee:

A processor includes a plurality of registers, an instruction decoder to receive an instruction to process a KECCAK state cube of data representing a KECCAK state of a KECCAK hash algorithm, to partition the KECCAK state cube into a plurality of subcubes, and to store the subcubes in the plurality of registers, respectively, and an execution unit coupled to the instruction decoder to perform the KECCAK hash algorithm on the plurality of subcubes respectively stored in the plurality of registers in a vector manner.
1. A processor, comprising: a plurality of registers; an instruction decoder to receive an instruction to process a KECCAK state cube of data representing a KECCAK state of a KECCAK hash algorithm, to partition the KECCAK state cube into a plurality of subcubes, and to store the subcubes in the plurality of registers, respectively; and an execution unit coupled to the instruction decoder to perform the KECCAK hash algorithm on the plurality of subcubes respectively stored in the plurality of registers in a vector manner.
2. The processor of claim 1, wherein the KECCAK state cube includes 64 slices partitioned into 4 subcubes, wherein each subcube contains 16 slices.
3. The processor of claim 2, wherein the plurality of registers include 4 registers, each having at least 450 bits.
4. The processor of claim 1, wherein, for each round of the KECCAK algorithm, the execution unit is configured to perform KECCAK_THETA operations, including performing a θ function of the KECCAK algorithm on the subcubes stored in the registers in parallel, and performing a first portion of a ρ function of the KECCAK algorithm on the subcubes in parallel.
5. The processor of claim 4, wherein the execution unit is further configured to perform KECCAK_ROUND operations, including performing a second portion of the ρ function of the KECCAK algorithm on the subcubes in parallel, performing a π function of the KECCAK algorithm on the ...
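The partitioning in claim 2 above (64 slices split into 4 subcubes of 16 slices each) can be pictured with the usual software view of the KECCAK state as 25 lanes of 64 bits, where slice z is bit z of every lane. The sketch below is an assumption-laden illustration: the function names are hypothetical, and assigning contiguous slice ranges to each subcube is a guess at one plausible layout, not something the excerpt specifies.

```python
def partition_state(lanes, parts=4, lane_bits=64):
    """Split 25 lanes of lane_bits bits into `parts` subcubes of equal depth.

    Subcube k holds bits [k*step, (k+1)*step) of every lane, i.e. a run of
    `step` consecutive slices (assumed layout).
    """
    if len(lanes) != 25:
        raise ValueError("a KECCAK state has 25 lanes")
    step = lane_bits // parts
    mask = (1 << step) - 1
    return [[(lane >> (k * step)) & mask for lane in lanes]
            for k in range(parts)]

def merge_subcubes(subcubes, lane_bits=64):
    """Inverse of partition_state: rebuild the 25 full-width lanes."""
    step = lane_bits // len(subcubes)
    lanes = [0] * 25
    for k, subcube in enumerate(subcubes):
        for i, piece in enumerate(subcube):
            lanes[i] |= piece << (k * step)
    return lanes
```

Under this layout each subcube carries 25 x 16 = 400 bits of state, which fits within the register size of at least 450 bits mentioned in claim 3.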

Publication date: 18-06-2015

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20150169473A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.
1-26. (canceled)
27. A mobile computer comprising: a random access memory (RAM); a network interface controller; and a processor coupled to the RAM, the processor comprising: a plurality of cores; a level 1 instruction cache to cache instructions; a level 1 data cache; a bus interface unit; a plurality of 128-bit registers; a decode unit to decode the instructions from the level 1 instruction cache including an Advanced Encryption Standard (AES) encryption instruction that is to use a first 128-bit source/destination register and a second 128-bit source register; and an execution unit coupled to the decode unit, such that the AES encryption instruction is to cause the execution unit to receive data from the first 128-bit source/destination register and the second 128-bit source register, perform operations for an AES single encryption round including a byte substitution, a shift rows, and an exclusive OR, but omitting a mix columns, and store a result in the first 128-bit source/destination register.
28. The mobile computer of claim 27, wherein the AES encryption instruction is capable of using any one of a 128-bit round key, a 192-bit round key, and a 256-bit round key.
29. The mobile computer of claim 27, wherein the processor further comprises a microcode Read Only Memory (ROM) to store micro-operations to implement the AES encryption instruction.
30. The mobile computer of claim 27, further comprising an input/output controller.
31. The mobile computer of claim 30, further ...

Publication date: 18-06-2015

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20150169474A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.
1-26. (canceled)
27. A processor comprising: a plurality of cores; a level 1 instruction cache to cache instructions; a level 1 data cache; a bus interface unit; a plurality of 128-bit registers; a decode unit to decode the instructions from the level 1 instruction cache including four Advanced Encryption Standard (AES) instructions, wherein each of the four AES instructions has a unique opcode, the four AES instructions including: a first AES decryption instruction that is to use a 128-bit source/destination register of the plurality and a 128-bit source register of the plurality, and a second AES decryption instruction that is to use a 128-bit source register of the plurality and a 128-bit destination register of the plurality; and an execution unit coupled to the decode unit, such that: the first AES decryption instruction is to cause the execution unit to receive data from the 128-bit source/destination register and the 128-bit source register of the first AES decryption instruction, perform operations for an AES single decryption round including an inverse byte substitution, an inverse shift rows, and an exclusive OR, but omitting an inverse mix columns, and store a first result in the 128-bit source/destination register; and the second AES decryption instruction is to cause the execution unit to receive data from the 128-bit source register of the second AES decryption instruction, perform operations including an inverse mix columns, and store a second result in the 128-bit destination register of ...

Publication date: 30-06-2016

Instruction and logic to test transactional execution status

Number: US20160188479A1
Assignee: Intel Corp

Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction.

Publication date: 07-07-2016

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20160196219A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

1. a plurality of cores; a level 1 (L1) instruction cache to store a plurality of instructions, the plurality of instructions to include a first Advanced Encryption Standard (AES) instruction; an L1 data cache; instruction fetch logic to fetch instructions from the L1 instruction cache; decode logic to decode instructions; a first source register to store a round key to be used for a round of an AES encryption operation; a second source register to store input data to be encrypted by the round of the AES encryption operation; an execution unit including AES execution logic to execute the first AES instruction to perform the round of the AES encryption operation, the AES encryption operation to use the round key from the first source register to encrypt input data from the second source register and to store a result of the round of the AES encryption operation in a destination register; wherein the round of the AES encryption operation is to include: a Sub Bytes transform to perform a byte substitution on the input data, the Sub Bytes transform to include a substitution box (S-box) lookup to result in a first array of substituted data; a Shift Rows transform to shift row data in the first array by a specified amount to result in a second array; a Mix Columns transform in which columns of the second array are to be treated as polynomials over a Galois Field GF(2^8) and multiplied modulo x^4 + 1 with a fixed polynomial to generate a mix columns result; and an Add Round Key transform in which an ...
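
The Mix Columns and Add Round Key transforms named in the claim can be illustrated in software. The sketch below is only a model of the arithmetic, not the patented hardware: it assumes the standard AES reduction polynomial (0x11B) and the standard fixed polynomial from the AES specification, and the function names `xtime`, `gmul`, `mix_column` and `add_round_key` are hypothetical.

```python
def xtime(b):
    """Multiply a GF(2^8) element by x, reducing by the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def gmul(a, b):
    """Multiply two GF(2^8) elements using shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def mix_column(col):
    """Multiply one 4-byte column by the fixed AES polynomial modulo x^4 + 1."""
    a0, a1, a2, a3 = col
    return [
        gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
        a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
        a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
        gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2),
    ]


def add_round_key(state, round_key):
    """XOR each state byte with the matching round-key byte."""
    return [s ^ k for s, k in zip(state, round_key)]
```

Because Add Round Key is a plain XOR, applying it twice with the same key restores the input, which is why the same transform appears in both the encryption and the decryption round.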

Publication date: 16-07-2015

SUPERVISOR MODE EXECUTION PROTECTION

Number: US20150199198A1
Assignee:

Apparatuses and methods for supervisor mode execution protection are disclosed. In one embodiment, a processor includes an interface to access a memory, execution hardware, and control logic. A region in the memory is user memory. The execution hardware is to execute an instruction. The control logic is to prevent the execution hardware from executing the instruction when the instruction is stored in user memory and the processor is in supervisor mode.

1. A processor comprising: an interface to access a memory, wherein a region in the memory is user memory; execution hardware to execute an instruction; and control logic to prevent the execution hardware from executing the instruction when the instruction is stored in user memory and the processor is in supervisor mode.
2. The processor of claim 1, further comprising instruction hardware to fetch the instruction, wherein the control logic is to prevent the execution hardware from executing the instruction by preventing the instruction hardware from fetching the instruction from user memory when the processor is in supervisor mode.
3. The processor of claim 1, wherein the processor is in supervisor mode when executing software in a privilege level higher than a lowest privilege level.
4. The processor of claim 1, further comprising a paging unit, wherein the paging unit is to translate a linear address to a physical address using a page table hierarchy, and the region may be designated as user memory by a flag at any level in the page table hierarchy.
5. The processor of claim 1, wherein the execution unit is also to execute a processor identification instruction by returning an indication that the processor includes control logic to prevent the processor from executing from user memory when the processor is in supervisor mode.
6. The processor of claim 1, further comprising processing storage including a programmable storage location to enable the control logic to prevent the processor from ...
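
The rule in claim 1 reduces to a single condition on each instruction fetch. The following is a minimal sketch of that condition, assuming a boolean enable bit in the spirit of claim 6; the names `FetchRequest`, `page_is_user`, `supervisor_mode` and `fetch_allowed` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class FetchRequest:
    page_is_user: bool      # the instruction resides in memory designated as user memory
    supervisor_mode: bool   # the processor is currently in supervisor mode


def fetch_allowed(request, protection_enabled=True):
    """Return False when a supervisor-mode fetch targets user memory."""
    if protection_enabled and request.supervisor_mode and request.page_is_user:
        return False
    return True
```

User-mode fetches from user memory and supervisor-mode fetches from supervisor memory are unaffected; only the combination of supervisor mode and user memory is blocked.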

Publication date: 18-09-2014

QOS BASED BINARY TRANSLATION AND APPLICATION STREAMING

Number: US20140281008A1
Assignee:

In one embodiment, Quality of Service (QoS) criteria based server side binary translation and execution of applications is performed on multiple servers utilizing distributed translation and execution in either a virtualized or native execution environment. The translated applications are executed to generate output display data, the output display data is encoded in a media format suitable for video streaming, and the video stream is delivered over a network to a client device. In one embodiment, one or more graphics processors assist the central processors of the servers by accelerating the rendering of the application output, and a media encoder encodes the application output into a media format.

2. The binary translation system of claim 1, wherein the QoS criteria include priority based acceleration and multiple client parameters, wherein the multiple client parameters include a client device resolution, a client device location, a client application type, and a set of client decode capabilities.
3. The binary translation system of claim 1, wherein the frame of rendered output is encoded into a media format before the server transmits the frame of rendered output.
4. The binary translation system of claim 3, wherein the server further includes a graphics processor to generate the frame of rendered output.
5. The binary translation system of claim 4, wherein the graphics processor encodes the frame of rendered output into the media format.
6. The binary translation system of claim 1, wherein the client processor of the client device has the first instruction set.
7. The binary translation system of claim 1, wherein the client processor of the client device has the second instruction set.
8. The binary translation system of claim 2, wherein the server executes the binary translation within a virtual machine.
9. The binary translation system of claim 8, wherein the virtual machine is tuned for the client device.
10. The ...

Publication date: 18-09-2014

Processors, methods, and systems to relax synchronization of accesses to shared memory

Number: US20140281196A1
Assignee: Intel Corp

A processor of an aspect includes a plurality of logical processors. A first logical processor of the plurality is to execute software that includes a memory access synchronization instruction that is to synchronize accesses to a memory. The processor also includes memory access synchronization relaxation logic that is to prevent the memory access synchronization instruction from synchronizing accesses to the memory when the processor is in a relaxed memory access synchronization mode.

Publication date: 07-07-2016

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Number: US20160197720A1
Assignee: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

1. A processor comprising: a plurality of cores; a level 1 (L1) instruction cache to store a plurality of instructions, the plurality of instructions to include a first Advanced Encryption Standard (AES) instruction; an L1 data cache; instruction fetch logic to fetch instructions from the L1 instruction cache; decode logic to decode instructions; a first source register to store a round key to be used for a round of an AES decryption operation; a second source register to store input data to be decrypted by the round of the AES decryption operation; an execution unit including AES execution logic to execute the first AES instruction to perform the round of the AES decryption operation, the AES decryption operation to use the round key from the first source register to decrypt input data from the second source register and to store a result of the round of the AES decryption operation in a destination register; wherein the round of the AES decryption operation is to include: a substitution operation to be performed on the input data, the substitution operation to include an inverse substitution box (S-box) lookup, an inverse Shift Rows operation, an inverse Mix Columns operation, and an Add Round Key operation in which an exclusive OR function is to use data from the round key.

This application claims the priority filing benefit of, is a continuation of, and incorporates by reference, U.S. patent application Ser. No. 14/572,620 entitled “FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION ...
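
The inverse Shift Rows operation in the decryption round undoes the byte rotation applied during encryption. A minimal sketch follows, assuming the usual AES layout in which the 16-byte state is stored column by column (byte index = row + 4 * column); the function names are hypothetical.

```python
def shift_rows(state):
    """Rotate row r of the 4x4 state left by r positions (column-major layout)."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def inv_shift_rows(state):
    """Rotate row r of the 4x4 state right by r positions, undoing shift_rows."""
    return [state[r + 4 * ((c - r) % 4)] for c in range(4) for r in range(4)]
```

Applying `inv_shift_rows` to the output of `shift_rows` returns the original state, which is the property a decryption round relies on.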

Publication date: 18-09-2014

INSTRUCTION EMULATION PROCESSORS, METHODS, AND SYSTEMS

Number: US20140281398A1
Assignee:

A processor of an aspect includes decode logic to receive a first instruction and to determine that the first instruction is to be emulated. The processor also includes emulation mode aware post-decode instruction processor logic coupled with the decode logic. The emulation mode aware post-decode instruction processor logic is to process one or more control signals decoded from an instruction. The instruction is one of a set of one or more instructions used to emulate the first instruction. The one or more control signals are to be processed differently by the emulation mode aware post-decode instruction processor logic when in an emulation mode than when not in the emulation mode. Other apparatus are also disclosed as well as methods and systems.

1. A processor comprising: a decoder to receive a first instruction having a given opcode, the decoder including: check logic to check whether the given opcode has a first meaning or a second meaning; decode logic to decode the first instruction, and output one or more corresponding control signals, when the given opcode has the first meaning; and emulation inducement logic to induce emulation of the first instruction when the given opcode has the second meaning.
2. The processor of claim 1, wherein the second meaning is older than the first meaning.
3. The processor of claim 2, wherein the second meaning comprises an opcode definition that is in a process of becoming deprecated.
4. The processor of claim 1, further comprising a storage location coupled with the decoder to store an indication of whether the given opcode has the first meaning or the second meaning, and wherein the check logic is to check the storage location to determine the indication.
5. The processor of claim 4, wherein the storage location is accessible to a program loader module to allow the program loader module to store the indication in the storage location.
6. The processor of claim 4, further comprising logic coupled with the storage location ...
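
The decoder behaviour in claims 1 and 4 can be modelled as a lookup followed by a branch. The sketch below is a software analogy under stated assumptions: the storage location is modelled as a dictionary keyed by opcode, and the names `decode`, `second_meaning_flags` and `control_signals` are hypothetical.

```python
def decode(opcode, second_meaning_flags, control_signals):
    """Decode an opcode, or induce emulation when it carries its second meaning.

    second_meaning_flags models the storage location of claim 4: it maps an
    opcode to True when the program expects the second (older) meaning.
    control_signals maps an opcode to the signals for its first meaning.
    """
    if second_meaning_flags.get(opcode, False):
        # Emulation inducement: the instruction is handed to an emulation path.
        return ("emulate", opcode)
    # Normal path: output the control signals for the first meaning.
    return ("decode", control_signals[opcode])
```

In this model a program loader would populate `second_meaning_flags` before the program runs, mirroring the role claim 5 gives to the program loader module.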

Publication date: 18-09-2014

INTER-PROCESSOR ATTESTATION HARDWARE

Number: US20140283032A1
Assignee:

Embodiments of an invention for inter-processor attestation hardware are disclosed. In one embodiment, an apparatus includes first attestation hardware associated with a first portion of a system. The first attestation hardware is to attest to a second portion of the system that the first portion of the system is secure.

1. An apparatus comprising: first attestation hardware associated with a first portion of a system, wherein the first attestation hardware is to attest to a second portion of the system that the first portion of the system is secure.
2. The apparatus of claim 1, wherein the second portion of the system includes second attestation hardware to which the first attestation hardware is to attest.
3. The apparatus of claim 2, wherein the first attestation hardware is to attest by sending a message to the second attestation hardware.
4. The apparatus of claim 1, further comprising a processor in the first portion of the system.
5. The apparatus of claim 4, wherein the processor is to execute untranslated machine instructions.
6. The apparatus of claim 4, wherein the processor operates without microcode.
7. The apparatus of claim 4, wherein the processor is a reduced instruction set computing processor.
8. The apparatus of claim 4, wherein the attestation hardware includes a state machine.
9. The apparatus of claim 8, wherein the state machine operates independently of software executing on the processor.
10. A method comprising: attesting, by first attestation hardware associated with a first portion of a system, to a second portion of the system that the first portion of the system is secure.
11. The method of claim 10, wherein attesting includes sending an attestation message.
12. The method of claim 11, wherein the attestation message is sent from the first attestation hardware to second attestation hardware associated with the second portion of the system.
13. The method of claim 12, further comprising creating, by the first attestation hardware ...

Publication date: 14-07-2016

Instruction And Logic To Test Transactional Execution Status

Number: US20160202979A1
Assignee:

Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction.

1. A system comprising: a plurality of processors; a processor interconnect to communicatively couple two of the processors; a system memory comprising dynamic random access memory communicatively coupled to the two processors; a plurality of multithreaded cores; one or more of the plurality of multithreaded cores to perform out-of-order instruction execution of instructions for a plurality of threads, one or more of the plurality of multithreaded cores comprising: instruction fetch logic to fetch instructions of one or more of the plurality of threads, an instruction decode unit to decode the instructions, register renaming logic to rename one or more registers within a register file, an instruction cache to cache instructions to be executed, a data cache to cache data, a level 2 (L2) cache unit to cache instructions and data, and an execution unit to execute a transactional execution region of instructions, the execution unit having a first instruction, the first instruction to test status related to the transactional execution region; one or more of the ...
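
The sequence in the abstract (start a region, checkpoint registers, track accesses, then test whether execution is inside the region) can be modelled in a few lines. This is a behavioural sketch only: the class `TransactionalCore`, its method names, and the polarity of `in_region_flag` are assumptions, since the abstract says only that a first flag is updated.

```python
class TransactionalCore:
    """Toy model of a core that supports a transactional region."""

    def __init__(self):
        self.registers = {"r0": 0}
        self.checkpoint = None        # saved architectural state, if any
        self.tracked = set()          # addresses accessed inside the region
        self.in_region_flag = False   # assumed meaning: True when inside a region

    def begin(self):
        """First instruction: checkpoint registers and start tracking."""
        self.checkpoint = dict(self.registers)
        self.tracked.clear()

    def access(self, address):
        """Track memory accesses only while a region is active."""
        if self.checkpoint is not None:
            self.tracked.add(address)

    def test(self):
        """Second instruction: update the flag from the execution context."""
        self.in_region_flag = self.checkpoint is not None
        return self.in_region_flag

    def commit(self):
        """Leave the region and keep the current register values."""
        self.checkpoint = None

    def abort(self):
        """Leave the region and restore the checkpointed registers."""
        self.registers = self.checkpoint
        self.checkpoint = None
```

A library routine could call `test()` to choose between a code path that is safe inside a region and one that is not, without needing to know how it was called.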

Publication date: 14-07-2016

Instruction and logic to test transactional execution status

Number: US20160202987A1
Assignee:

Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction.

1-4. (canceled)
5. A processor comprising: instruction fetch logic to fetch instructions of one or more of the plurality of threads; an instruction decode unit to decode the instructions; register renaming logic to rename one or more registers within a register file; an instruction cache to cache instructions to be executed; a data cache to cache data; a level 2 (L2) cache unit to cache instructions and data; checkpoint logic to checkpoint an architectural state responsive to a first instruction to initiate a transactional execution region including transactional memory operations; transaction tracking logic to determine whether the transactional memory operations of the transactional execution region result in a conflict, the transaction tracking logic to adjust one or more flags responsive to a determination that a conflict exists; logic to commit the transactional memory operations following a determination that no conflict exists or to roll back to a checkpointed architectural state; one or more of the plurality of multithreaded cores to perform out-of- ...

Publication date: 14-07-2016

Instruction and logic to test transactional execution status

Number: US20160203019A1
Assignee:

Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction.

1. A system comprising: a plurality of multithreaded cores; one or more of the plurality of multithreaded cores to perform out-of-order instruction execution of instructions for a plurality of threads, one or more of the plurality of multithreaded cores comprising: instruction fetch logic to fetch instructions of one or more of the plurality of threads, an instruction decode unit to decode the instructions, register renaming logic to rename one or more registers within a register file, an instruction cache to cache instructions to be executed, a data cache to cache data, a level 2 (L2) cache unit to cache instructions and data, and an execution unit to execute a transactional execution region of instructions, the execution unit having a first instruction, the first instruction to test status related to the transactional execution region; and one or more integrated memory controllers to communicatively couple a core of the plurality of cores to dynamic random access system memory.
2. The system as in further comprising: a shared cache to be shared by two or more of the ...

Publication date: 14-07-2016

Instruction and logic to test transactional execution status

Number: US20160203068A1
Assignee:

Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction.

1. A system comprising: a plurality of multithreaded cores; one or more of the plurality of multithreaded cores to perform out-of-order instruction execution of instructions for a plurality of threads, one or more of the plurality of multithreaded cores comprising: instruction fetch logic to fetch instructions of one or more of the plurality of threads; an instruction decode unit to decode the instructions; register renaming logic to rename one or more registers within a register file; an instruction cache to cache instructions to be executed; a data cache to cache data; a level 2 (L2) cache unit to cache instructions and data; checkpoint logic to checkpoint an architectural state responsive to a first instruction to initiate a transactional execution region including transactional memory operations; transaction tracking logic to determine whether the transactional memory operations of the transactional execution region result in a conflict, the transaction tracking logic to adjust one or more flags responsive to a determination that a conflict exists; logic to commit the transactional memory ...

Publication date: 21-07-2016

Method, apparatus, and system for speculative abort control mechanisms

Number: US20160210177A1
Assignee:

An apparatus and method are described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions.

1. A processor comprising: a plurality of cores, one or more of the plurality of cores to concurrently execute multiple threads; one or more of the plurality of cores to perform out-of-order execution of instructions of the threads; one or more of the plurality of cores comprising: instruction fetch logic to fetch instructions of one or more of the threads, instruction decode logic to decode the instructions, register renaming logic to rename one or more registers within a register file, a data cache to cache data, a translation lookaside buffer to store virtual to physical address translations, a second level cache unit to cache instructions and data, and an execution unit to execute a first instruction to indicate a beginning of a transactional execution region of instructions.
2. The processor of claim 1, the execution unit to further execute: a second instruction to abort transactional execution ...
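
The two-level abort determination described in the abstract (hardware decides on predefined events, otherwise a handler decides, with an always-abort switch for testing the fallback path) can be sketched as follows. The event names, the set `HARDWARE_ABORT_EVENTS`, and the functions `should_abort` and `run_region` are hypothetical illustrations, not mechanisms defined by the patent.

```python
# Events that the modelled hardware aborts on without consulting a handler (assumed set).
HARDWARE_ABORT_EVENTS = {"explicit_abort", "data_conflict"}


def should_abort(events, handler, always_abort=False):
    """Decide whether a speculative region must abort.

    events: names of abort events tracked during the region.
    handler: callable standing in for a firmware or software handler.
    always_abort: test switch that forces every region to abort.
    """
    if always_abort:
        return True
    if any(event in HARDWARE_ABORT_EVENTS for event in events):
        return True
    remaining = [event for event in events if event not in HARDWARE_ABORT_EVENTS]
    return bool(remaining) and handler(remaining)


def run_region(speculative, fallback, events, handler, always_abort=False):
    """Run the speculative path, or the fallback path when an abort is decided."""
    if should_abort(events, handler, always_abort):
        return fallback()
    return speculative()
```

Setting `always_abort=True` exercises the fallback path on every call, which is the testing use the abstract mentions.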

Publication date: 19-07-2018

PROCESSOR TO PERFORM A BIT RANGE ISOLATION INSTRUCTION

Number: US20180203698A1
Assignee: Intel Corporation

Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result may have: (1) a first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) a second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and a machine-readable medium storing such an instruction are also disclosed.

1. A processor comprising: a plurality of registers to store data, the plurality of registers including a first source register to store a source value, a second source register to store an 8-bit index value, a destination register, and a flags register to store a plurality of flag values, including a carry flag, a zero flag, a sign flag, and an overflow flag; instruction decode circuitry to decode a bit range isolation instruction; and execution circuitry coupled to the instruction decode circuitry, the execution circuitry to perform operations associated with the bit range isolation instruction, the operations comprising selecting a first range of bits from the source value or a system memory location starting at a first bit position identified by the 8-bit index value, writing the first range of bits to corresponding bit positions in the destination register, and writing zeroes to all bits in a second range of bits in the destination register having more significant bit positions than the first bit position; wherein the execution circuitry is to ...
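
The operation in claim 1 keeps a low range of source bits in place and zeroes the more significant bits. A minimal sketch, with several labelled assumptions: the kept range is taken to be the bits below the index, the register width defaults to 64 bits, and an index at or beyond the width leaves the source unchanged. The function name `isolate_low_bits` is hypothetical.

```python
def isolate_low_bits(source, index, width=64):
    """Keep the bits of source below position index and zero the bits above.

    The kept bits stay in their original positions (no shift is performed).
    Assumption: an index at or beyond the register width keeps every bit.
    """
    index &= 0xFF                      # the claim describes an 8-bit index value
    full_mask = (1 << width) - 1
    mask = (1 << index) - 1 if index < width else full_mask
    return source & full_mask & mask
```

Because the kept bits are not moved, the operation is a single AND with a computed mask, in contrast to a shift-based bit field extract.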
