Total found: 7798. Displayed: 200.

10-08-2009 publication date

PREVENTING MULTIPLE TRANSLATION LOOKASIDE BUFFER ACCESSES FOR THE SAME PAGE IN MEMORY

Number: RU2008103216A

... 1. A processor comprising: a memory configured to store data in a plurality of pages; a translation lookaside buffer (TLB) configured to look up, when accessed by an instruction containing a virtual address, address translation information that enables the virtual address to be translated into a physical address of one of the plurality of pages, and to output the address translation information if it is found in the TLB; and a TLB controller configured to determine whether a current instruction and a next instruction request access to the same page within the plurality of pages and, if so, to prevent the next instruction from accessing the TLB. 2. The processor of claim 1, wherein the current instruction includes information about the next instruction, the TLB controller being further configured to use the information included in the current instruction to determine whether the current ...
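
The claimed controller simply reuses the previous translation whenever consecutive accesses hit the same page. A minimal Python sketch of that idea, assuming 4 KB pages and a dict-backed TLB (all names here are illustrative, not from the patent):

```python
PAGE_SHIFT = 12  # assume 4 KB pages

class TLBController:
    def __init__(self, tlb):
        self.tlb = tlb            # dict: virtual page number -> physical page number
        self.last_vpn = None      # page of the most recent translation
        self.last_ppn = None
        self.lookups = 0          # count of actual TLB accesses

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn == self.last_vpn:
            # Same page as the previous access: skip the TLB entirely.
            return (self.last_ppn << PAGE_SHIFT) | offset
        self.lookups += 1
        ppn = self.tlb[vpn]       # a real design would walk page tables on a miss
        self.last_vpn, self.last_ppn = vpn, ppn
        return (ppn << PAGE_SHIFT) | offset

ctrl = TLBController({0x1: 0x80, 0x2: 0x90})
addrs = [0x1004, 0x1008, 0x100C, 0x2000]  # three accesses to page 0x1, one to 0x2
print([hex(ctrl.translate(a)) for a in addrs], "TLB lookups:", ctrl.lookups)  # 2 lookups
```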

15-01-2011 publication date

EXECUTION OF INSTRUCTIONS DIRECTLY FROM THE INPUT SOURCE

Number: AT0000495491T

15-11-2011 publication date

PROGRAMMABLE GENERAL-PURPOSE MEDIA PROCESSOR

Number: AT0000532130T

15-01-2012 publication date

SEPARATION OF THREADS IN A PROCESSOR

Number: AT0000540353T

15-02-2009 publication date

INTEGRATED CIRCUIT AND METHOD FOR PACKET SWITCHING CONTROL

Number: AT0000421823T

15-03-1999 publication date

PARALLEL PROCESSOR OPERATING DYNAMICALLY IN MULTIPLE MODES IN AN ARRAY ARCHITECTURE

Number: AT0000177547T

15-06-1983 publication date

DATA PROCESSING SYSTEM WITH OVERLAPPED FETCHING AND EXECUTION OF MACHINE INSTRUCTIONS

Number: ATA71078A

02-08-2018 publication date

Lock free streaming of executable code data

Number: AU2018205196A1
Assignee: Davies Collison Cave Pty Ltd

A method of writing a long opcode that overlaps two or more atomically writable blocks of memory, comprising: atomically writing, by a computing device, a latter portion of the long opcode to a second atomically writable block of memory of the computing device, the latter portion of the long opcode being after a prior portion of the long opcode; and atomically writing, by the computing device, the prior portion of the long opcode to a first atomically writable block of memory of the computing device; wherein the atomically writing the prior portion is performed after the atomically writing the latter portion; and wherein further the second atomically writeable block of memory comprises both debug break point opcodes and the latter portion of the long opcode after the atomically writing the latter portion.
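
The ordering is the point: writing the latter block first, padded with break-point opcodes, means a concurrent reader can never see the new opcode's first half followed by stale bytes. A rough Python model of that write order, assuming 4-byte atomic blocks and 0xCC as the debug break opcode (illustrative values, not taken from the patent):

```python
BLOCK = 4      # assumed atomically writable block size, in bytes
BREAK = 0xCC   # assumed debug break point opcode (x86 INT3-style)

def write_long_opcode(memory, offset, opcode):
    """Write an opcode spanning two blocks: latter block first, prior block second."""
    first_len = BLOCK - (offset % BLOCK)        # bytes that fit in the first block
    prior, latter = opcode[:first_len], opcode[first_len:]
    # Step 1: atomically write the latter portion, padded out with break points,
    # so a thread that falls through never executes stale bytes.
    latter_block = latter + bytes([BREAK] * (BLOCK - len(latter)))
    base2 = (offset // BLOCK + 1) * BLOCK
    memory[base2:base2 + BLOCK] = latter_block  # modeled as one atomic store
    # Step 2: atomically write the prior portion; only now does the opcode
    # become reachable from its starting address.
    base1 = (offset // BLOCK) * BLOCK
    block1 = bytes(memory[base1:base1 + BLOCK])
    memory[base1:base1 + BLOCK] = block1[:offset % BLOCK] + prior  # atomic store

mem = bytearray([BREAK] * 16)
write_long_opcode(mem, 2, bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]))  # 5-byte opcode at offset 2
print(mem.hex(" "))
```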

31-08-1989 publication date

CACHE SYSTEM

Number: AU0000587935B2

13-11-1986 publication date

INFORMATION PROCESSING SYSTEM WITH ENHANCED INSTRUCTION EXECUTION AND SUPPORT CONTROL

Number: AU0005634086A

16-11-2006 publication date

Memory caching in data processing

Number: AU2006245560A1
Author: RABIN EZRA

24-02-2003 publication date

Power reduction in microprocessor systems

Number: AU2002319536A1

24-01-1985 publication date

PREFETCH CIRCUITRY

Number: AU0000541875B2

23-11-1989 publication date

BUFFER MEMORY DEVICE CAPABLE OF MEMORIZING OPERAND AND INSTRUCTION DATA BLOCKS AT DIFFERENT BLOCK SIZES

Number: AU0003491289A

20-09-1988 publication date

INSTRUCTION PREFETCH BUFFER CONTROL

Number: CA1242282A

An instruction prefetch buffer control (20) is provided for an instruction prefetch buffer array (10) which stores the code for a number of instructions that have already been executed as well as the code for a number of instructions yet to be executed. The instruction prefetch buffer control includes a register (201) for storing an instruction fetch pointer, this pointer being supplied to the buffer array (10) as a write pointer which points to the location in the array where a new word is to be written from main memory. A second register (205) stores an instruction execution pointer which is supplied to the buffer array (10) as a read pointer. A first adder (203) increments the first register to increment the instruction fetch pointer for sequential instructions and calculates a new instruction fetch pointer for branch instructions. A second adder (215) increments the second register to increment the instruction execution pointer for sequential instructions and calculates a new instruction ...
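
In software terms, the two registers and their adders behave like the write and read indices of a circular buffer, with both pointers reloadable on a branch. A small Python sketch under those assumptions (class and method names are mine, not the patent's):

```python
class PrefetchBuffer:
    """Circular buffer with separate instruction fetch (write) and
    instruction execution (read) pointers, as described in CA1242282A."""
    def __init__(self, size):
        self.array = [None] * size
        self.size = size
        self.fetch_ptr = 0   # write pointer: where the next word from memory goes
        self.exec_ptr = 0    # read pointer: next instruction to execute

    def fill(self, word):
        self.array[self.fetch_ptr % self.size] = word
        self.fetch_ptr += 1  # first adder: increment for sequential fetch

    def next_instruction(self):
        word = self.array[self.exec_ptr % self.size]
        self.exec_ptr += 1   # second adder: increment for sequential execution
        return word

    def branch(self, target):
        # On a branch, both adders calculate new pointer values.
        self.fetch_ptr = self.exec_ptr = target

buf = PrefetchBuffer(8)
for word in ["add", "sub", "br", "mul"]:
    buf.fill(word)
print(buf.next_instruction(), buf.next_instruction())  # add sub
```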

26-01-1993 publication date

CONTROL OF PIPELINED OPERATION IN A MICROCOMPUTER SYSTEM EMPLOYING DYNAMIC BUS SIZING WITH 80386 PROCESSOR AND 82385 CACHE CONTROLLER

Number: CA0001313274C

Any incompatibility between pipelined operations (such as is available in the 80386) and dynamic bus sizing (allowing the processor to operate with devices of 8-, 16- and 32-bit sizes) is accommodated by use of an address decoder and by ensuring that device addresses for cacheable devices are in a first predetermined range and any device addresses for noncacheable devices are not in that predetermined range. Since by definition cacheable devices are 32-bit devices, pipelined operation is allowed only if the address decoder indicates the access is to a cacheable device. In that event, a next address signal is provided to the 80386. This allows the 80386 to proceed to a following cycle prior to completion of the previous cycle. For accesses to devices whose addresses indicate they are noncacheable, a next address signal is withheld until the cycle ...
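
The decision reduces to an address-range test: assert the next-address (NA) signal, and thus allow a pipelined cycle, only when the decoder maps the address into the 32-bit cacheable range. A tiny Python sketch of that decode, with an assumed cacheable window (the bounds are illustrative):

```python
# Assumed address map: one window of cacheable, i.e. 32-bit, devices.
CACHEABLE_LO, CACHEABLE_HI = 0x0000_0000, 0x00FF_FFFF

def next_address_signal(addr):
    """Assert NA (let the 80386 start the next cycle early) only for
    cacheable devices; withhold it for noncacheable ones."""
    return CACHEABLE_LO <= addr <= CACHEABLE_HI

for addr in (0x0001_0000, 0x0200_0000):
    print(hex(addr), "pipelined" if next_address_signal(addr) else "non-pipelined")
```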

01-02-1983 publication date

HIGH SPEED COMPACT DIGITAL COMPUTER SYSTEM

Number: CA1140678A
Assignee: DATA GENERAL CORPORATION

Therein is disclosed high speed, compact computer system architecture. System architecture includes a processor for processing machine language data, a memory for storing at least machine language instructions for use by processor, microinstruction logic for storing and providing sequences of frequently used instructions, and busses for transmitting at least instructions between processor and memory. Microinstruction memory circuitry is disclosed for efficient storage of microinstructions in available microinstruction memory space. Also disclosed is microinstruction selection circuitry for high speed selection of successive microinstructions of a sequence. Memory control circuitry is disclosed for providing memory refresh during battery back-up operation. Instruction pre-fetch circuitry is disclosed for fetching from memory a next instruction to be executed while a current instruction is being executed.

12-09-2003 publication date

METHOD OF PREFETCHING DATA/INSTRUCTIONS RELATED TO EXTERNALLY TRIGGERED EVENTS

Number: CA0002478007A1
Author: DOERING, ANDREAS

Method of prefetching data/instructions related to externally triggered events in a system including an infrastructure (18) having an input interface (20) for receiving data/instructions to be handled by the infrastructure and an output interface (22) for transmitting data after they have been handled, a memory (14) for storing data/instructions when they are received by input interface, a processor (10) for processing at least some data/instructions, the processor having a cache wherein the data/instructions are stored before being processed, and an external source (26) for assigning sequential tasks to the processor. The method comprises the following steps which are performed while the processor is performing a previous task: determining the location in the memory of data/instructions to be processed by the processor, indicating to the cache the addresses of these memory locations, fetching the contents of the memory locations and writing them into the cache, and assigning the task ...
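
The steps overlap the next task's cache fill with the current task's execution. A schematic Python version, using a thread to stand in for the cache-fill engine (the structure and names are my assumptions, not the patent's):

```python
import threading

cache = {}  # stands in for the processor's data/instruction cache

def prefetch(memory, addresses):
    """Fetch the named memory locations into the cache (the method's fill steps)."""
    for a in addresses:
        cache[a] = memory[a]

def run(memory, tasks):
    """Execute each task while the next task's operands are being prefetched."""
    filler = None
    for i, (addresses, handler) in enumerate(tasks):
        if filler:
            filler.join()                 # next task's data is now in the cache
        if i + 1 < len(tasks):            # overlap: start prefetching task i+1
            filler = threading.Thread(target=prefetch, args=(memory, tasks[i + 1][0]))
            filler.start()
        handler([cache.get(a, memory[a]) for a in addresses])

memory = {0: "pkt0", 1: "pkt1", 2: "pkt2"}
run(memory, [([0], print), ([1], print), ([2], print)])
```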

09-07-1994 publication date

METHOD AND SYSTEM FOR INCREASED INSTRUCTION DISPATCH EFFICIENCY IN SUPERSCALAR PROCESSOR SYSTEM

Number: CA0002107046A1

A method and system for increased instruction dispatch efficiency in a superscalar processor system having an instruction queue for receiving a group of instructions in an application specified sequential order and an instruction dispatch unit for dispatching instructions from an associated instruction buffer to multiple execution units on an opportunistic basis. The dispatch status of instructions within the associated instruction buffer is periodically determined and, in response to a dispatch of the instructions at the beginning of the instruction buffer, the remaining instructions are shifted within the instruction buffer in the application specified sequential order and a partial group of instructions are loaded into the instruction buffer from the instruction queue utilizing a selectively controlled multiplex circuit. In this manner additional instructions may be dispatched to available execution units without requiring a previous group of instructions to be dispatched completely.
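
The key idea is that the buffer need not drain completely before refilling: dispatched leading instructions are shifted out and only a partial group is pulled from the queue. A compact Python sketch, assuming a 4-slot buffer (the size and names are illustrative):

```python
from collections import deque

BUFFER_SLOTS = 4  # assumed instruction buffer width

def dispatch_cycle(buffer, queue, dispatched_count):
    """Shift out instructions dispatched from the front of the buffer and top it
    back up with a partial group from the instruction queue, preserving the
    application-specified sequential order."""
    del buffer[:dispatched_count]                # shift remaining instructions forward
    while len(buffer) < BUFFER_SLOTS and queue:  # load a partial group
        buffer.append(queue.popleft())
    return buffer

queue = deque(f"i{n}" for n in range(8))
buffer = [queue.popleft() for _ in range(BUFFER_SLOTS)]
buffer = dispatch_cycle(buffer, queue, dispatched_count=2)  # 2 leading slots dispatched
print(buffer)  # ['i2', 'i3', 'i4', 'i5']: order kept, refilled without a full drain
```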

20-08-1996 publication date

COMBINED QUEUE FOR INVALIDATES AND RETURN DATA IN MULTIPROCESSOR SYSTEM

Number: CA0002045756C

A pipelined CPU executing instructions of variable length, and referencing memory using various data widths. Macroinstruction pipelining is employed (instead of microinstruction pipelining), with queueing between units of the CPU to allow flexibility in instruction execution times. A wide bandwidth is available for memory access; fetching 64-bit data blocks on each cycle. A hierarchical cache arrangement has an improved method of cache set selection, increasing the likelihood of a cache hit. A writeback cache is used (instead of writethrough) and writeback is allowed to proceed even though other accesses are suppressed due to queues being full. A branch prediction method employs a branch history table which records the taken vs. not-taken history of branch opcodes recently used, and uses an empirical algorithm to predict which way the next occurrence of this branch will go, based upon the history table. A floating point processor function is integrated on-chip, with enhanced speed due to ...
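
The abstract leaves the "empirical algorithm" unspecified; a classic realization of a branch history table is an array of two-bit saturating counters indexed by branch address. The sketch below is offered as that assumed variant, not as the patent's exact scheme:

```python
class BranchHistoryTable:
    """Per-branch 2-bit saturating counters: 0-1 predict not-taken, 2-3 taken."""
    def __init__(self, entries=1024):
        self.counters = [2] * entries  # weakly taken on reset
        self.mask = entries - 1

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        self.counters[i] = min(3, self.counters[i] + 1) if taken \
                           else max(0, self.counters[i] - 1)

bht = BranchHistoryTable()
outcomes = [True, True, False, True, True]  # mostly-taken loop branch at pc 0x40
hits = sum(bht.predict(0x40) == taken or bht.update(0x40, taken) or False
           for taken in outcomes if (bht.update(0x40, taken) or True))
# The one-liner above is opaque; the plain loop reads better:
bht = BranchHistoryTable()
hits = 0
for taken in outcomes:
    hits += bht.predict(0x40) == taken
    bht.update(0x40, taken)
print(f"{hits}/{len(outcomes)} correct")  # 4/5 on this history
```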

30-12-1991 publication date

HIGH-PERFORMANCE MULTI-PROCESSOR HAVING FLOATING POINT UNIT

Number: CA0002046020A1

A pipelined CPU executing instructions of variable length, and referencing memory using various data widths. Macroinstruction pipelining is employed (instead of microinstruction pipelining), with queueing between units of the CPU to allow flexibility in instruction execution times. A wide bandwidth is available for memory access; fetching 64-bit data blocks on each cycle. A hierarchical cache arrangement has an improved method of cache set selection, increasing the likelihood of a cache hit. A writeback cache is used (instead of writethrough) and writeback is allowed to proceed even though other accesses are suppressed due to queues being full. A branch prediction method employs a branch history table which records the taken vs. not-taken history of branch opcodes recently used, and uses an empirical algorithm to predict which way the next occurrence of this branch will go, based upon the history table. A floating point processor ...

05-03-1996 publication date

MICROCOMPUTER MEMORY ACCESS METHOD

Number: CA0002059033C

The following measures are taken to obtain a microcomputer which securely fetches an instruction code from a low-speed memory to improve the reliability when the instruction code is not present in an instruction queue buffer. When the requested instruction code is not present in the instruction queue buffer, a CPU judges whether the memory to be accessed is a high-speed memory or low-speed memory. When the memory to be accessed is a high-speed memory, the CPU fetches the instruction code directly from the memory by skipping the instruction queue buffer. Meanwhile, when the memory to be accessed is a low-speed memory, the CPU does not skip the instruction queue buffer but it waits for the instruction code to be fetched to the instruction queue buffer. Thus, the instruction code is securely fetched, the reliability is improved, and the timing is easily set for design.
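
Reduced to control flow, the method is a two-way branch on the speed class of the target memory whenever the queue misses. A short Python sketch under assumed interfaces (the function and the fast/slow classifier are invented for illustration):

```python
def fetch_opcode(addr, queue, memory, is_fast):
    """Fetch an instruction code, bypassing the instruction queue buffer only
    when the backing memory is fast, per CA2059033C's decision rule."""
    if addr in queue:              # normal case: hit in the queue buffer
        return queue[addr]
    if is_fast(addr):
        return memory[addr]        # high-speed memory: skip the queue buffer
    queue[addr] = memory[addr]     # low-speed memory: wait for the queue fill
    return queue[addr]

memory = {0x100: "mov", 0x8000: "jmp"}
queue = {}
fast = lambda a: a < 0x1000        # assumed: low addresses are fast SRAM
print(fetch_opcode(0x100, queue, memory, fast),
      fetch_opcode(0x8000, queue, memory, fast))
```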

31-10-1977 publication date

Number: CH0000592340A5

30-11-1978 publication date

Number: CH0000607138A5
Assignee: SIEMENS AG

10-02-1988 publication date

Cache directory and control

Number: CN0087105300A

08-12-2010 publication date

System and method for broadcasting instructions/data to a plurality of processors in a multiprocessor device via aliasing

Number: CN0101082900B

A system and method for broadcasting instructions/data to a plurality of processors in a multiprocessor device via aliasing are provided. In order to broadcast data to a plurality of processors, a control processor writes to the registers that store the identifiers of the processors and sets two or more of these registers to a same value. The control processor may write the desired data/instructions to be broadcast to a portion of memory corresponding to the starting address associated with the processor identifier of the two or more processors. When the two or more processors look for a starting address of their local store from which to read, the two or more processors will identify the same starting address, essentially aliasing the memory region. The two or more processors will read the instructions/data from the same aliased memory region starting at the identified starting address and process the same instructions/data.
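
Setting two processor-ID registers to the same value makes both cores compute the same local-store base address, so one write serves both. A toy Python model of that aliasing (the register name and address arithmetic are assumptions):

```python
STORE_SIZE = 0x1000  # assumed size of each processor's local-store window

class Core:
    def __init__(self):
        self.id_reg = 0                     # identifier register set by the control processor
    def start_addr(self):
        return self.id_reg * STORE_SIZE     # starting address derived from the ID
    def read(self, memory):
        return memory.get(self.start_addr())

memory, cores = {}, [Core() for _ in range(4)]
for i, c in enumerate(cores):
    c.id_reg = i
cores[2].id_reg = cores[1].id_reg           # alias: cores 1 and 2 now share a region
memory[cores[1].start_addr()] = "broadcast kernel"  # control processor writes once
print([c.read(memory) for c in cores])      # cores 1 and 2 both see the broadcast
```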

23-06-2010 publication date

Method and device for partitioning resource between multiple threads within multi-threaded processor

Number: CN0001429361B

A method of partitioning a memory resource, associated with a multi-threaded processor, includes defining the memory resource to include first and second portions that are dedicated to the first and second threads respectively. A third portion of the memory resource is then designated as being shared between the first and second threads. Upon receipt of an information item, (e.g., a microinstruction associated with the first thread and to be stored in the memory resource), a history of Least Recently Used (LRU) portions is examined to identify a location in either the first or the third portion, but not the second portion, as being a least recently used portion. The second portion is excluded from this examination on account of being dedicated to the second thread.
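
Victim selection walks the LRU order but skips the other thread's dedicated portion. A small Python sketch, assuming an 8-entry resource split 2/2/4 between thread 0, thread 1, and the shared portion (the split is illustrative):

```python
ENTRIES = 8
DEDICATED = {0: range(0, 2), 1: range(2, 4)}  # assumed split; entries 4-7 are shared

def eligible(thread, entry):
    """A thread may evict from its own dedicated portion or the shared portion,
    never from the portion dedicated to the other thread."""
    other = 1 - thread
    return entry not in DEDICATED[other]

def pick_victim(lru_order, thread):
    # lru_order lists entry indices from least to most recently used
    for entry in lru_order:
        if eligible(thread, entry):
            return entry
    raise RuntimeError("no eligible entry")

lru = [2, 4, 0, 5, 1, 3, 6, 7]     # entry 2 is least recently used overall
print(pick_victim(lru, thread=0))   # 4: entry 2 is dedicated to thread 1, so skipped
```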

08-06-1994 publication date

Operating method of a microcomputer with pipeline structure

Number: CN0001024960C
Assignee: SONY CORP, SONY K.K.

14-12-1962 publication date

Synchronous digital computer

Number: FR0001312094A

04-07-1980 publication date

Compact high-speed digital computer system

Number: FR0002443721A
Author: Charles T. Retter

The computer comprises the central processing unit CPU 114, the main memory 124 and the I/O interface 130. The memory 124 holds the user's data and instructions, read and written under control of the CPU's arithmetic and logic unit ALU 112. The CPU contains the microinstruction logic 144, whose memory 160 supplies the current instructions. The ALU 112 is designed to prefetch the next instruction while the previous one is executing. The result is a computer operating at very high speed.

07-07-1967 publication date

Data processing apparatus

Number: FR0001477814A

18-08-1978 publication date

MEMORY ACCESS CONTROL DEVICE FOR A MICROPROCESSOR

Number: FR0002378312A

Program memory access control circuit for a microprocessor. It comprises a program counter (PZ) and a macroinstruction counter (MZ), an intermediate register (ZR), and a shift register or first-in, first-out buffer memory (BS). The program memory (PS) access time is reduced to the time needed to transfer, in parallel to the microprocessor (P), the oldest instruction (i_n) loaded into BS. Data memory (DS) access takes place simultaneously. Instruction i_n+1 takes the place of i_n in BS, and the MZ count is immediately incremented to supply the address of i_n+2. The PS memory access and the transfer of i_n+2 into BS, via ZR, then proceed asynchronously. One particular application is to telephone exchanges under decentralized microprocessor control.

18-08-1978 publication date

COMPUTER MEMORY ACCESS CONTROL CIRCUIT

Number: FR0002378313A

Interface circuit between a central data processing unit, in particular a microprocessor, and a program memory, for processing instructions of different lengths without wait cycles. The instructions are shifted through a register SR under the control of an instruction-length decoder BD (1 to 3 bytes in the case shown) and a control circuit SS, which ensures that each instruction occupies the right-hand end of the register once the preceding one has been transferred into the intermediate memory ZS. Reading of ZS by the microprocessor µP triggers the parallel transfer of the next instruction and the shift needed to put the instruction after it in place. As soon as the first locations of the register (for example 4) are free, new bytes are fetched in parallel from the memory SP via byte counters B and address counters A. One particular application is to microprocessor-controlled telephone exchanges.

21-06-1985 publication date

HIGH-SPEED COMPACT DIGITAL COMPUTER SYSTEM

Number: FR0002443721B1

14-08-1980 publication date

CACHE MEMORY UNIT WITH SIMULTANEOUS INSTRUCTION READOUT DEVICE

Number: FR0002447078A1

04-07-1980 publication date

HIGH-SPEED COMPACT DIGITAL COMPUTER SYSTEM

Number: FR0002443721A1

24-08-2001 publication date

Extraction of instructions from the memory sub-system of a mixed architecture processor using emulation of one instruction set, uses native instruction processor as intermediary to fetch emulated instructions

Number: FR0002805363A1

A method and apparatus for connecting a processor capable of processing instructions of several instruction-set types. More particularly, an engine (40) responsive to native instructions for fetching from a memory subsystem (20) (such as an EM fetch engine (40)) is connected to an engine (30) that processes emulated instructions (such as an x86 engine (30)). This is achieved using a handshake protocol, whereby the x86 engine (30) sends an explicit fetch request signal (110) to the EM fetch engine (40) together with a fetch address (120). The EM fetch engine (40) then accesses the memory subsystem (20) and retrieves an instruction line (150) for subsequent decode and execution. The EM fetch engine (40) sends this instruction line (150) to the x86 engine (30) together with an explicit fetch completion signal (140).

04-07-1986 publication date

METHOD OF OPERATING A MICROCOMPUTER WITH AN IMPROVED INSTRUCTION CYCLE

Number: FR0002575563A
Author: NOBUHISA WATANABE

A microcomputer comprises an instruction decoder 4 and a program counter 5. The instruction decoder decodes fetched instructions and issues a control signal ordering execution of the fetched instruction. The control signal from the instruction decoder includes an instruction-cycle control component that triggers a fetch cycle at the start of each instruction cycle, to fetch the operand of the instruction being executed, and halfway through each instruction cycle, to fetch the opcode of the next instruction. The program counter responds to the start of each fetch cycle by incrementing its count so as to keep it consistent with the address accessed during each fetch cycle.

22-12-2008 publication date

HIGH-PERFORMANCE RISC MICROPROCESSOR ARCHITECTURE

Number: KR0100875262B1

22-05-2008 publication date

METHOD AND APPARATUS FOR SHUFFLING DATA

Number: KR0100831472B1

29-09-1994 publication date

Number: KR19940009098B1

22-05-2003 publication date

HARDWARE INSTRUCTION TRANSLATION WITHIN A PROCESSOR PIPELINE

Number: KR20030040515A

A processing system has an instruction pipeline (30) and a processor core. An instruction translator (42) for translating non-native instructions into native instruction operations is provided within the instruction pipeline downstream of the fetch stage (32). The instruction translator is able to generate multiple step sequences of native instruction operations in a manner that allows variable length native instruction operation sequences to be generated to emulate non-native instructions. The fetch stage is provided with a word buffer (62) that stores both a current instruction word and a next instruction word. Accordingly, variable length non-native instructions that span between instruction words read from the memory may be provided for immediate decode, and multiple power-consuming memory fetches are avoided.
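
The two-word buffer lets a variable-length instruction that straddles a word boundary be decoded immediately instead of triggering a second fetch. A simplified Python model, assuming 32-bit instruction words and byte-granular instruction starts (these details are assumptions, not the patent's encoding):

```python
WORD_BYTES = 4  # assumed 32-bit instruction words

def fetch_instruction(words, byte_addr, length):
    """Extract an instruction of `length` bytes using a buffer that holds the
    current and the next instruction word, so a boundary-spanning instruction
    needs no extra memory fetch."""
    idx = byte_addr // WORD_BYTES
    buffer = words[idx] + words[idx + 1]   # word buffer: current word + next word
    offset = byte_addr % WORD_BYTES
    return buffer[offset:offset + length]

words = [b"\xAA\xBB\xCC\xDD", b"\x11\x22\x33\x44"]
print(fetch_instruction(words, byte_addr=3, length=3).hex())  # dd1122: spans both words
```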

19-01-2009 publication date

PRE-DECODING VARIABLE LENGTH INSTRUCTIONS

Number: KR1020090007629A

A pre-decoder in a variable instruction length processor indicates properties of instructions in pre-decode bits stored in an instruction cache with the instructions. When all the encodings of pre-decode bits associated with one instruction length are defined, a property of an instruction of that length may be indicated by altering the instruction to emulate an instruction of a different length, and encoding the property in the pre-decode bits associated with instructions of the different length. One example of a property that may be so indicated is an undefined instruction.

26-10-2002 publication date

INFORMATION PROCESSOR AND COMPUTER SYSTEM

Number: KR20020081038A

PURPOSE: To provide an information processor and a computer system capable of efficiently reading instructions such as a VLIW instruction and distributing them to an arithmetic unit. CONSTITUTION: The information processor is provided with instruction buffers (21, 31) of m lines and n columns, instruction executing parts (25 to 27, 35 to 37) to execute a plurality of instructions in parallel and control circuits (22 to 24) to select the prescribed number of instructions from the instruction buffers of m lines and n columns and to distribute them to the instruction executing parts.

13-01-2004 publication date

HIGH-PERFORMANCE RISC MICROPROCESSOR ARCHITECTURE

Number: KR20040004502A

The high-performance, RISC core based microprocessor architecture permits concurrent execution of instructions obtained from memory through an instruction prefetch unit having multiple prefetch paths allowing for the main program instruction stream, a target conditional branch instruction stream and a procedural instruction stream. The target conditional branch prefetch path allows both possible instruction streams for a conditional branch instruction to be prefetched. The procedural instruction prefetch path allows a supplementary instruction stream to be accessed without clearing the main or target prefetch buffers. Each instruction set includes a plurality of fixed length instructions. An instruction FIFO is provided for buffering instruction sets in a plurality of instruction set buffers including a first buffer and a second buffer. An instruction execution unit including a register file and a plurality of functional units is provided with an instruction control unit capable of examining ...

21-06-2019 publication date

Number: KR1020190070981A

11-07-1997 publication date

Number: TW0000310410B
Assignee: CIRRUS LOGIC INC

09-02-2006 publication date

AN APPARATUS AND METHOD FOR HETEROGENEOUS CHIP MULTIPROCESSORS VIA RESOURCE ALLOCATION AND RESTRICTION

Number: WO2006014254A1

A method and apparatus for heterogeneous chip multiprocessors (CMP) via resource restriction. In one embodiment, the method includes the accessing of a resource utilization register to identify a resource utilization policy. Once accessed, a processor controller ensures that the processor core utilizes a shared resource in a manner specified by the resource utilization policy. In one embodiment, each processor core within a CMP includes an instruction issue throttle resource utilization register, an instruction fetch throttle resource utilization register and other like ways of restricting its utilization of shared resources within a minimum and maximum utilization level. In one embodiment, resource restriction provides a flexible manner for allocating current and power resources to processor cores of a CMP that can be controlled by hardware or software. Other embodiments are described and claimed.

13-04-2006 publication date

EXPANSION OF COMPUTE ENGINE CODE SPACE BY SHARING ADJACENT CONTROL STORES USING INTERLEAVED PROGRAM ADDRESSES

Number: WO2006039183A2

Method and apparatus to support expansion of compute engine code space by sharing adjacent control stores using interleaved addressing schemes. Instructions corresponding to an original instruction thread are partitioned into multiple interleaved sequences that are stored in respective control stores. During thread execution, instructions are retrieved from the control stores in a repeated order based on the interleaving scheme. For example, in one embodiment two compute engines share two control stores. Thus, instructions for a given thread are sequentially loaded from the control stores in an alternating manner. In another embodiment, four control stores are shared by four compute engines. In this case, the instructions in a thread are interleaved across four stores, and each store is accessed every fourth instruction in the code sequence. Schemes are also provided for handling branching operations to maintain synchronized access to the control stores.
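
With N shared control stores, instruction i of a thread lives in store i mod N at word address i // N, so sequential execution visits the stores round-robin. A brief Python sketch of the two-store case (the layout is inferred from the abstract):

```python
def partition(instructions, n_stores):
    """Interleave a thread's instructions across n control stores."""
    stores = [[] for _ in range(n_stores)]
    for i, ins in enumerate(instructions):
        stores[i % n_stores].append(ins)   # store index cycles 0, 1, 0, 1, ...
    return stores

def fetch(stores, pc):
    """Fetch instruction `pc` from the appropriate shared control store."""
    return stores[pc % len(stores)][pc // len(stores)]

thread = [f"uop{i}" for i in range(6)]
stores = partition(thread, 2)                  # two compute engines, two stores
print([fetch(stores, pc) for pc in range(6)])  # original order is reconstructed
```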

30-01-2014 publication date

LOCK FREE STREAMING OF EXECUTABLE CODE DATA

Number: WO2014018812A1
Author: REIERSON, Kristofer

A disassembler receives instructions and disassembles them into a plurality of separate opcodes. The disassembler creates a table identifying boundaries between each opcode. Each opcode is written to memory in an opcode-by-opcode manner by atomically writing standard blocks of memory. Debug break point opcodes are appended to an opcode to create a full block of memory when needed. The block of memory may be thirty-two or sixty-four bits long, for example. Long opcodes may overlap two or more memory blocks. Debug break point opcodes may be appended to a second portion of the long opcode to create a full block of memory. A stream fault interceptor identifies when a requested data page is not available and retrieves the data page.

13-03-2014 publication date

FETCH WIDTH PREDICTOR

Number: WO2014039962A1

Various techniques for predicting instruction fetch widths. In one embodiment, a fetch prediction unit in a processor is configured to generate a fetch width that specifies a number of bits to be retrieved in a subsequent fetch from an instruction cache. The fetch prediction unit may also generate a fetch prediction that includes the fetch width in response to a current fetch request. A number of bits corresponding to the fetch width may be fetched from the instruction cache. The fetch width may correspond to a location of a predicted-taken control transfer instruction. This fetch width prediction may lead to power savings in instruction cache accesses.
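
If a taken branch is predicted partway through a cache line, bytes past it would be discarded anyway, so the predictor can request only the bytes up to the branch. A schematic Python version, assuming 32-byte full fetches and a table keyed by fetch address (all sizes are illustrative):

```python
FULL_WIDTH = 32  # assumed maximum fetch width in bytes

class FetchWidthPredictor:
    def __init__(self):
        self.table = {}  # fetch address -> predicted useful width in bytes

    def predict(self, fetch_addr):
        return self.table.get(fetch_addr, FULL_WIDTH)  # no history: fetch everything

    def train(self, fetch_addr, taken_branch_offset):
        # A predicted-taken control transfer ended the useful bytes at this offset.
        self.table[fetch_addr] = taken_branch_offset + 4  # assume 4-byte instructions

p = FetchWidthPredictor()
print(p.predict(0x1000))   # 32: cold, fetch the full width
p.train(0x1000, taken_branch_offset=8)
print(p.predict(0x1000))   # 12: fetch only up to the predicted-taken branch
```

Fetching fewer bits per instruction-cache access is where the claimed power savings come from.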

01-12-2005 publication date

MICROPROCESSOR ARCHITECTURE

Number: WO2005114441A3

A microprocessor is provided having improved performance and characterized by the use of dynamic branch prediction, a unified cache debug unit, a 64-bit right-only barrel shifter, a predictive pre-fetch memory pipeline and customizable processor extensions.

03-05-2007 publication date

ARCHITECTURE FOR MICROPROCESSOR-BASED SYSTEMS INCLUDING SIMD PROCESSING UNIT AND ASSOCIATED SYSTEMS AND METHODS

Number: WO000002007049150A3

An architecture for microprocessor-based systems is provided. The architecture includes a SIMD processing unit and associated systems and methods for optimizing the performance of such a system. In one embodiment, systems and methods for performing systolic array-based block matching are provided. In another embodiment, a parameterizable clip instruction is provided. In an additional embodiment, two pairs of deblocking instructions for use with the H.264 and VC1 codecs are provided. In a further embodiment, an instruction and datapath for accelerating sub-pixel interpolation are provided. In another embodiment, systems and methods for selectively decoupling processor extension logic are provided. In yet another embodiment, systems and methods for recording instruction sequences in a microprocessor having dynamically decoupleable extension logic are provided. In yet one other embodiment, systems and methods for synchronizing multiple processing engines of a SIMD engine are provided.

06-03-2003 publication date

PIPELINED PROCESSOR AND INSTRUCTION LOOP EXECUTION METHOD

Number: WO2003019356A1

Processor (10) having a processing pipeline (100) is extended with an arrangement to reduce the loss of cycles associated with loop execution in pipeline (100). Loop start detection unit (116a) detects a loop start instruction containing information about the loop count and last instruction in the loop. Information about the first instruction in the loop is also present. Loop end detection unit (114a) is provided with the loop end information, and fetch stage (112) is provided with the loop start information by loop start detection unit (116a). Upon detection of a loop end, loop end detection unit (114a) triggers fetch stage (112) to fetch the first instruction of the loop. In addition, loop end detection unit (114a) generates detection tags labeling the content of pipeline (100), which are evaluated by tag detection unit (144). Loop execution control stage (142) compares the loop count information with detection information generated by tag detection unit (144) and, if necessary, removes ...

09-08-2007 publication date

CROSS-ARCHITECTURE OPTIMIZATION

Number: WO000002007089535A3

Embodiments include a device, apparatus, and a method. An apparatus includes a monitor circuit for determining an execution characteristic of a first instruction associated with a first computing machine architecture. The apparatus also includes a generator circuit for creating an optimization profile useable in an execution of a second instruction associated with a second computing machine architecture.

30-08-2007 publication date

METHOD AND APPARATUS FOR MONITORING INPUTS TO A COMPUTER

Number: WO000002007098026A3

A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. The sleeping computer (12) can be awaiting data or instructions (12). In the case of instructions, the sleeping computer (12) can be waiting to store the instructions or to immediately execute the instructions. In the latter case, the instructions are placed in an instruction register (30a) when they are received and executed therefrom, without first placing the instructions into memory. The instructions can include a micro-loop (100) which is capable of performing a series of operations repeatedly. In one application, the sleeping computer (12) is awakened by an input such that it commences ...

21-11-2019 publication date

COMBINING STATES OF MULTIPLE THREADS IN A MULTI-THREADED PROCESSOR

Number: US20190354370A1

A processing apparatus comprising one or more processing modules, each comprising an execution unit. The one or more processing modules are operable to run a plurality of parallel or concurrent threads, and the processing apparatus further comprises a storage location for storing an aggregated exit state of the plurality of threads. An instruction set of the processing apparatus comprises an exit instruction for inclusion in each of the plurality of threads, the exit state instruction taking an individual exit state of the respective thread as an operand. The exit instruction terminates the respective thread and also causes the individual exit state specified in the operand to contribute to the aggregated exit state.

13-05-2014 publication date

Opcode length caching

Number: US0008725948B2

A computer system caches variable-length instructions in a data structure. The computer system locates a first copy of an instruction in the cached data structure using a current value of the instruction pointer as a key. The computer system determines a predictive length of the instruction, and reads a portion of the instruction from an instruction memory as a second copy. The second copy has the predictive length. Based on the comparison of the first copy with the second copy, the computer system determines whether or not to read the rest of the instruction from the instruction memory, and then interprets the instruction for use by the computer system.
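
The cached copy doubles as a checksum: read only the predicted length from instruction memory and fall back to a full decode only when the bytes disagree. A condensed Python sketch with invented helper names (the real patent targets variable-length instruction interpretation; this only models the cache protocol):

```python
def fetch_cached(ip, cache, imem, full_decode):
    """Fetch the instruction at `ip` using a length-predicting cache.
    `cache` maps ip -> (bytes, predictive length); `imem` is instruction memory."""
    entry = cache.get(ip)
    if entry:
        cached_bytes, length = entry
        probe = bytes(imem[ip:ip + length])  # read only the predictive length
        if probe == cached_bytes:
            return probe                     # copies match: skip reading the rest
    ins = full_decode(imem, ip)              # cold miss or mismatch: full decode
    cache[ip] = (ins, len(ins))
    return ins

imem = bytearray(b"\x48\x89\xc7\x90")              # example encoded stream
decode = lambda m, ip: bytes(m[ip:ip + 3])         # stand-in decoder: 3-byte insn at 0
cache = {}
print(fetch_cached(0, cache, imem, decode).hex())  # cold: full decode
print(fetch_cached(0, cache, imem, decode).hex())  # warm: verified via the short read
```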

30-07-2019 publication date

Spin loop delay instruction

Number: US0010365929B2

A Spin Loop Delay instruction. The instruction has a field associated therewith that indicates one or more conditions to be checked. Dispatching of the instruction is initially delayed. The instruction is subsequently dispatched based on a timeout, provided the instruction has not been previously dispatched based on meeting at least one condition of the one or more conditions to be checked.

29-09-2005 publication date

Microcomputer having instruction RAM

Number: US20050216614A1
Author: Tetsuya Sakairi
Assignee: NEC Electronics Corporation

A microcomputer comprises an instruction RAM temporarily storing a program transferred from an external memory, a CPU reading out the program from the instruction RAM via a dedicated fetch bus and carrying out a process according to the program, an instruction transfer control circuit directly transferring the program from the external memory to the instruction RAM via a dedicated transfer bus, and a transfer information register temporarily storing instruction transfer information which has been stored in the external memory and is necessary information for transferring the program from the external memory to the instruction RAM by the instruction transfer control circuit.

23-11-2006 publication date

Handling cache miss in an instruction crossing a cache line boundary

Number: US20060265572A1

A fetch section of a processor comprises an instruction cache and a pipeline of several stages for obtaining instructions. Instructions may cross cache line boundaries. The pipeline stages process two addresses to recover a complete boundary-crossing instruction. During such processing, if the second piece of the instruction is not in the cache, the fetch with regard to the first line is invalidated and recycled. On this first pass, processing of the address for the second part of the instruction is treated as a pre-fetch request to load instruction data to the cache from higher level memory, without passing any of that data to the later stages of the processor. When the first line address passes through the fetch stages again, the second line address follows in the normal order, and both pieces of the instruction can be fetched from the cache and combined in the normal manner.

02-12-2004 publication date

METHOD FOR USING SERIAL FLASH MEMORY AS PROGRAM STORAGE MEDIA FOR MICROPROCESSOR

Number: US20040243872A1

A method for dynamically adjusting an operating speed of a microprocessor for the microprocessor to access at least a serial flash memory (together with a random access memory). The method includes reducing the executing speed of the microprocessor if the required data in the serial flash memory (or the random access memory) is not yet ready, and running the microprocessor at its normal speed if the required data in the serial flash memory (or the random access memory) is ready.

15-09-1998 publication date

General purpose, multiple precision parallel operation, programmable media processor

Number: US0005809321A1
Assignee: MicroUnity Systems Engineering, Inc.

A general purpose, programmable media processor for processing and transmitting a media data stream of audio, video, radio, graphics, encryption, authentication, and networking information in real-time. The media processor incorporates an execution unit that maintains substantially peak data throughput of media data streams. The execution unit includes a dynamically partitionable multi-precision arithmetic unit, programmable switch and programmable extended mathematical element. A high bandwidth external interface supplies media data streams at substantially peak rates to a general purpose register file and the multi-precision execution unit. A memory management unit, and instruction and data cache/buffers are also provided. High bandwidth memory controllers are linked in series to provide a memory channel to the general purpose, programmable media processor. The general purpose, programmable media processor is disposed in a network fabric consisting of fiber optic cable, coaxial cable and ...

06-02-1996 publication date

Dual cache memory device with cache monitoring

Number: US0005490262A1
Author: Tahara; Tamotsu
Assignee: Oki Electric Industry Co., Ltd.

A CPU connected to a memory storing module programs has an instruction cache for reading a module header heading a module program and module instructions following the module header out of the memory, and storing and transferring them. An operand cache reads module data included in the module program of interest out of the memory, and stores and transfers them. A processing section performs arithmetic and logical operations in pipelining with the module header of the module program transferred from the instruction cache and with the module data transferred from the operand cache. A pipelined instruction control controls other circuitry of the CPU by decoding the module instructions of the module program transferred from the instruction cache. The CPU transfers the module header to the processing section via the instruction cache while writing the whole block of module header and module instructions to the instruction cache. Alternatively, the CPU may transfer the module header to the processing ...

16-12-1997 publication date

Computer processing system employing dynamic instruction formatting

Number: US0005699536A1

A computer processing apparatus includes a buffer called a decoded instruction buffer (DIB), which is used to store groups of commands representing instructions that can be executed in parallel. Each pattern in a DIB group may be an encoding of a long instruction termed a long decoded instruction (LDI). The DIB works in conjunction with a conventional computer processing apparatus consisting of a memory system, an instruction queue, and an instruction dispatch unit feeding into a set of execution units. When an instruction is not available in the DIB, this and subsequent instructions are fetched from the memory system into the instruction queue and executed in a conventional way. Simultaneous with the execution of instructions by the conventional apparatus, a group formatter creates a set of LDIs, each of which is an alternate encoding of a set of the original instructions which can be executed in parallel. In constructing the LDIs, the group formatter analyzes the dependency between instructions ...

06-06-1995 publication date

Instruction fetch unit with early instruction fetch mechanism

Number: US0005423014A1
Assignee: Intel Corporation

An instruction fetch unit in which an early instruction fetch is initiated to access a main memory simultaneously with checking a cache for the desired instruction. On a slow path to main memory is a large main translation lookaside buffer (TLB) that holds address translations. On a fast path is a smaller translation write buffer (TWB), a mini-TLB, that holds recently used address translations. A guess fetch access is initiated by presenting an address to the main memory in parallel with presenting the address to the cache. The address is compared with the contents of the TWB for a hit and with the contents of the cache for a hit. The guess access is allowed to proceed upon the condition that there is a hit in the TWB (the TWB is able to translate the logical address into a physical address) and a miss in the I-cache (the data are not available in the I-cache and hence the guess access of main memory is necessary to get the data). The guess access is canceled upon the condition that there ...

17-05-1994 publication date

System for controlling the number of data pieces in a queue memory

Number: US0005313600A1
Author: Kasai; Yoshio
Assignee: Mitsubishi Denki Kabushiki Kaisha

A system for controlling the number of data pieces in a queue memory includes a counter comprising a plurality of counting circuits each of which adds the current effective data piece number of the instruction queue to an associated, preselected value derived from the difference between a detectable input data piece number and a particular output data piece number. A selector responsive to a selecting signal provided by a queue controller selects one of the counting circuits with the associated preselected value equal to the difference between the current input data piece number and the current output data piece number to provide the effective data piece number. The queue controller calculates this difference and provides it as the selecting signal to the selector.

28-12-1999 publication date

Computer architecture capable of concurrent issuance and execution of general purpose multiple instructions

Number: US0006009506A

A system for issuing a family of instructions during a single clock includes a decoder for decoding the family of instructions and logic, responsive to the decode result, for determining whether resource conflicts would occur if the family were issued during one clock. If no resource conflicts occur, an execution unit executes the family regardless of whether dependencies among the instructions in the family exist.

05-11-1996 publication date

Data processing system and method thereof

Number: US0005572689A

A data processing system (55) and method thereof includes one or more data processors (10). Data processor (10) is capable of performing both vector operations and scalar operations. Using a single microsequencer (22), data processor (10) is capable of executing both vector instructions and scalar instructions. Data processor (10) also has a memory circuit (14) capable of storing both vector operands and scalar operands.

29-01-2002 publication date

Method and apparatus for compression, decompression, and execution of program code

Number: US0006343354B1
Assignee: Motorola Inc.

During a compressing portion, memory (20) is divided into cache line blocks (500). Each cache line block is compressed and modified by replacing address destinations of address indirection instructions with compressed address destinations. Each cache line block is modified to have a flow indirection instruction as the last instruction in each cache line. The compressed cache line blocks (500) are stored in a memory (858). During a decompression portion, a cache line (500) is accessed based on an instruction pointer (902) value. The cache line is decompressed and stored in cache. The cache tag is determined based on the instruction pointer (902) value.

22-09-1992 publication date

STATE CONTROLLED INSTRUCTION LOGIC MANAGEMENT APPARATUS INCLUDED IN A PIPELINED PROCESSING UNIT

Number: US5150468A

09-11-2021 publication date

Dynamic processor cache to avoid speculative vulnerability

Number: US0011169805B2

A processor including a logic unit configured to execute multiple instructions being one of a speculative instruction or an architectural instruction is provided. The processor also includes a split cache comprising multiple lines, each line including a data accessed by an instruction and copied into the split cache from a memory address, wherein a line is identified as a speculative line for the speculative instruction, and as an architectural line for the architectural instruction. The processor includes a cache manager configured to select a number of speculative lines allocated in the split cache. The cache manager prevents an architectural line from being replaced by a speculative line based on a number of speculative lines allotted in the split cache, and manages the number of speculative lines to be allocated in the split cache based on the number of speculative lines relative to a number of architectural lines.

25-09-2018 publication date

Instruction execution that broadcasts and masks data values at different levels of granularity

Number: US0010083316B2
Assignee: Intel Corporation

An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication ...

05-09-2006 publication date

System, circuit, and method for adjusting the prefetch instruction rate of a prefetch unit

Number: US0007103757B1

A system, circuit, and method are presented for adjusting a prefetch rate of a prefetch unit from a first rate to a second rate by determining a probability factor associated with a branch instruction. The circuit and method may determine the probability factor based on a type of disparity associated with the branch instruction. The circuit and method may further be adapted to calculate the second rate based on the probability factor. The ability to adjust the prefetch rate of a prefetch unit advantageously decreases the number of memory transactions, thereby decreasing the power consumption of a processing unit.

16-05-2006 publication date

Reordering serial data in a system with parallel processing flows

Number: US0007047395B2
Assignee: Intel Corporation

A distributed system is provided for apportioning an instruction stream into multiple segments for processing in multiple parallel processing units, and for merging the processed segments into a single processed instruction stream having the same sequential relative order as the original instruction stream. Tags may be attached to each segment after apportioning to indicate the order in which the various segments are to be merged. In one embodiment, the end of each segment includes a tag indicating the unit to which the next instruction in the original instruction sequence is directed.

01-03-2007 publication date

Power reduction for processor front-end by caching decoded instructions

Number: US2007050554A1

A power aware front-end unit for a processor may include a UOP cache that disables other circuitry within the front-end unit. In an embodiment, a front-end unit may disable instruction synchronization circuitry, instruction decode circuitry and, optionally, instruction fetch circuitry while instruction look-ups are underway in both a block cache and an instruction cache. If the instruction look-up indicates a miss, the disabled circuitry thereafter may be enabled.

12-11-2019 publication date

Methods and systems for performing a replay execution

Number: US0010474471B2
Assignee: Intel Corporation

One or more embodiments may provide a method for performing a replay. The method includes initiating execution of a program, the program having a plurality of sets of instructions, and each set of instructions has a number of chunks of instructions. The method also includes intercepting, by a virtual machine unit executing on a processor, an instruction of a chunk of the number of chunks before execution. The method further includes determining, by a replay module executing on the processor, whether the chunk is an active chunk, and responsive to the chunk being the active chunk, executing the instruction.

22-09-2016 publication date

PROCESSORS WITH BRANCH INSTRUCTION, CIRCUITS, SYSTEMS AND PROCESSES OF MANUFACTURE AND OPERATION

Number: US20160274914A1

An electronic processor is provided for use with a memory (2530) having selectable memory areas. The processor includes a memory area selection circuit (MMU) operable to select one of the selectable memory areas at a time, and an instruction fetch circuit (2520, 2550) operable to fetch a target instruction at an address from the selected one of the selectable memory areas. The processor includes an execution circuit (Pipeline) coupled to execute instructions from the instruction fetch circuit (2520, 2550) and operable to execute a first instruction for changing the selection by the memory area selection circuit (MMU) from a first one of the selectable memory areas to a second one of the selectable memory areas, the execution circuit (Pipeline) further operable to execute a branch instruction that points to a target instruction, access to the target instruction depending on actual change of selection to the second one of the memory areas; and the processor includes a logic circuit (3108, ...

09-08-2016 publication date

System, method and apparatus for improving transactional memory (TM) throughput using TM region indicators

Number: US0009411739B2
Assignee: Intel Corporation

Systems, apparatuses, and methods for improving transactional memory (TM) throughput using a TM region indicator (or color) are described. Through the use of TM region indicators younger TM regions can have their instructions retired while waiting for older TM regions to commit. A copy-on-write (COW) buffer may be used to maintain a mapping from checkpointed architectural registers to physical registers, wherein the COW buffer maintains a plurality of register checkpoints for a plurality of TM regions by marking separations between TM regions using pointers, a first pointer to identify a position in the COW buffer of the last committed instruction, a retirement pointer to identify a boundary between a youngest TM region and a currently retiring position.

04-02-2020 publication date

Livelock recovery circuit configured to detect illegal repetition of an instruction and transition to a known state

Number: US0010552155B2

Livelock recovery circuits configured to detect livelock in a processor, and cause the processor to transition to a known safe state when livelock is detected. The livelock recovery circuits include detection logic configured to detect that the processor is in livelock when the processor has illegally repeated an instruction; and transition logic configured to cause the processor to transition to a safe state when livelock has been detected by the detection logic.

21-02-2012 publication date

Dispatch mechanism for dispatching instructions from a host processor to a co-processor

Number: US0008122229B2

A dispatch mechanism is provided for dispatching instructions of an executable from a host processor to a heterogeneous co-processor. According to certain embodiments, cache coherency is maintained between the host processor and the heterogeneous co-processor, and such cache coherency is leveraged for dispatching instructions of an executable that are to be processed by the co-processor. For instance, in certain embodiments, a designated portion of memory (e.g., UCB) is utilized, wherein a host processor may place information in such UCB and the co-processor can retrieve information from the UCB (and vice-versa). The UCB may thus be used to dispatch instructions of an executable for processing by the co-processor. In certain embodiments, the co-processor may comprise dynamically reconfigurable logic which enables the co-processor's instruction set to be dynamically changed, and the dispatching operation may identify one of a plurality of predefined instruction sets to be loaded onto the ...

12-12-2006 publication date

Processor and method of automatic instruction mode switching between N-bit and 2N-bit instructions by using parity check

Number: US0007149879B2

A processor and method of automatic instruction mode switching between N-bit and 2N-bit instructions by using parity bit check. The processor and method includes an instruction input device having a memory for storing a plurality of 2N-bit words, an instruction fetch device fetching a 2N-bit word from the memory, and a mode switch logic determining whether the 2N-bit word fetched by the instruction fetch device is two (N-P)-bit instructions or one 2(N-P)-bit instruction to accordingly switch the processor to corresponding N-bit or 2N-bit mode, wherein when the 2N-bit word fetched is even parity, the 2N-bit word is determined as two (N-P)-bit instructions if two N-bit words included in the 2N-bit word are on the even parity state, or determined as a 2(N-P)-bit instruction if the two N-bit words are on the odd parity state.
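
The mode decision is a few lines of bit arithmetic: for an even-parity 2N-bit word, the parity of its two N-bit halves tells two short instructions apart from one long one. A Python sketch for N = 16, ignoring the P parity-bit positions inside each half for simplicity (the concrete width and that simplification are my choices, not the patent's):

```python
N = 16  # assumed N-bit half-word width

def parity(x):
    return bin(x).count("1") & 1  # 0 = even number of ones, 1 = odd

def classify(word):
    """Classify a fetched 2N-bit word per the parity rule: if both N-bit halves
    are even parity, it holds two short instructions; if both halves are odd
    parity, it is one long instruction."""
    hi, lo = word >> N, word & ((1 << N) - 1)
    if parity(word):                  # odd parity overall: outside the stated rule
        return "not a valid even-parity word"
    if parity(hi) == 0 and parity(lo) == 0:
        return "two N-bit instructions"
    return "one 2N-bit instruction"

print(classify(0x0003_0005))  # halves 0x0003, 0x0005: both even parity -> two short
print(classify(0x0001_0007))  # halves with 1 and 3 ones: odd parity -> one long
```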

Подробнее
31-12-2019 дата публикации

Determining the effectiveness of prefetch instructions

Номер: US0010521350B2

Effectiveness of prefetch instructions is determined. A prefetch instruction is executed to request that data be fetched into a cache of the computing environment. The effectiveness of the prefetch instruction is determined. This includes updating, based on executing the prefetch instruction, a cache directory of the cache. The updating includes recording, in the cache directory, effectiveness data relating to the data. The effectiveness data includes whether the data was installed in the cache based on the prefetch instruction. Additionally, determining the effectiveness includes obtaining at least a portion of the effectiveness data from the cache directory, and using the at least a portion of the effectiveness data to determine the effectiveness of the prefetch instruction.

Подробнее
28-04-2016 дата публикации

FLEXIBLE ARCHITECTURE AND INSTRUCTION FOR ADVANCED ENCRYPTION STANDARD (AES)

Номер: US20160119130A1
Принадлежит: Intel Corporation

A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate the round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.

Подробнее
20-06-2019 дата публикации

TWO ADDRESS TRANSLATIONS FROM A SINGLE TABLE LOOK-ASIDE BUFFER READ

Номер: US20190188151A1
Принадлежит:

A streaming engine employed in a digital data processor specifies a fixed read only data stream. An address generator produces virtual addresses of data elements. An address translation unit converts these virtual addresses to physical addresses by comparing the most significant bits of a next address N with the virtual address bits of each entry in an address translation table. Upon a match, the translated address is the physical address bits of the matching entry and the least significant bits of address N. The address translation unit can generate two translated addresses. If the most significant bits of address N+1 match those of address N, the same physical address bits are used for translation of address N+1. The sequential nature of the data stream increases the probability that consecutive addresses match the same address translation entry and can use this technique.
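
A minimal Python sketch of this reuse, assuming 4 KiB pages and an illustrative translation table (neither is specified by the abstract): one table read serves both address N and address N+1 whenever their page bits match.

```python
# Sketch of getting two translations from one table read: when stream address
# N and the next address N+1 share their most significant (page) bits, the
# physical bits from address N's matching entry are reused for N+1.

PAGE_BITS = 12                      # assume 4 KiB pages
MASK = (1 << PAGE_BITS) - 1

def translate_pair(table, addr_n, addr_n1):
    vpn = addr_n >> PAGE_BITS
    phys = table[vpn]               # single translation-table read
    pa_n = (phys << PAGE_BITS) | (addr_n & MASK)
    if addr_n1 >> PAGE_BITS == vpn: # MSBs match: reuse the same entry
        pa_n1 = (phys << PAGE_BITS) | (addr_n1 & MASK)
    else:                           # crossed a page: needs a second read
        pa_n1 = (table[addr_n1 >> PAGE_BITS] << PAGE_BITS) | (addr_n1 & MASK)
    return pa_n, pa_n1

table = {0x12345: 0xABCDE}
print([hex(a) for a in translate_pair(table, 0x12345678, 0x1234567C)])
```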

Подробнее
30-05-2019 дата публикации

PRE-STARTING SERVICES BASED ON TRAVERSAL OF A DIRECTED GRAPH DURING EXECUTION OF AN APPLICATION

Номер: US20190166023A1
Принадлежит:

A method and system for which a service call is referred to as an event and processing the service call is referred to as an action. A directed graph is generated for an application and has vertices representing services and edges representing events. The directed graph provides a map of process flow of the application. A traversal probability is associated with each event in the directed graph. Traversal of the directed graph is monitored during execution of the application and traversal probabilities for events in the directed graph which may still occur during the execution of the application are continually revised. Decision logic is applied during the execution of the application to decide whether to pre-start one service in the directed graph that may still be called prior to an event in the directed graph calling the one service. The one service decided upon by the decision logic is pre-started.
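
A toy Python sketch of the pre-start decision; the graph contents, probabilities, and threshold are invented for illustration. A service is pre-started once the (continually revised) traversal probability of the event that calls it crosses the threshold.

```python
# Toy sketch of the pre-start decision logic: the directed graph maps each
# vertex (service) to outgoing events with traversal probabilities, which
# would be revised as execution proceeds. All values are assumptions.

graph = {
    "checkout": [("payment", 0.7), ("save_cart", 0.3)],
    "payment":  [("receipt", 1.0)],
}

PRESTART_THRESHOLD = 0.6
prestarted = set()

def on_vertex_reached(vertex):
    # Pre-start any service that may still be called with high probability.
    for target, prob in graph.get(vertex, []):
        if prob >= PRESTART_THRESHOLD and target not in prestarted:
            prestarted.add(target)
            print(f"pre-starting service {target!r} (p={prob})")

on_vertex_reached("checkout")   # pre-starts 'payment' before the event fires
```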

Подробнее
31-10-2002 дата публикации

System and method including distributed instruction buffers holding a second instruction form

Номер: US2002161987A1
Автор:
Принадлежит:

A system and method are provided for processing a first instruction set and a second instruction set in a single processor. The method includes storing a plurality of control signals in a plurality of buffers proximate to a plurality of execution units, wherein the control signals are predecoded instructions of the second instruction set, executing an instruction of the first instruction set in response to a branch instruction of the first instruction set, and executing the control signals for an instruction of the second instruction set in response to a branch instruction of the second instruction set.

Подробнее
27-04-2021 дата публикации

Mechanism for interrupting and resuming execution on an unprotected pipeline processor

Номер: US0010990398B2

Techniques related to executing a plurality of instructions by a processor, comprising: receiving a first instruction for execution on an instruction execution pipeline; beginning execution of the first instruction; receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher-priority task than the first instruction; storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline; copying the register state from the capture queue to a memory; determining that the one or more second instructions have been executed; copying the register state from the memory to the one or more registers of the capture queue; and restoring the register state to the instruction execution pipeline from the capture queue.

Подробнее
24-05-2012 дата публикации

Correlation-based instruction prefetching

Номер: US20120131311A1
Автор: Yuan C. Chou
Принадлежит: Oracle International Corp

The disclosed embodiments provide a system that facilitates prefetching an instruction cache line in a processor. During execution of the processor, the system performs a current instruction cache access which is directed to a current cache line. If the current instruction cache access causes a cache miss or is a first demand fetch for a previously prefetched cache line, the system determines whether the current instruction cache access is discontinuous with a preceding instruction cache access. If so, the system completes the current instruction cache access by performing a cache access to service the cache miss or the first demand fetch, and also prefetching a predicted cache line associated with a discontinuous instruction cache access which is predicted to follow the current instruction cache access.
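
The sketch below, in Python, shows the core idea: detect a discontinuous instruction-cache access, learn the correlation, and prefetch the predicted successor line on later misses. Line size, table structure, and the training policy are simplifying assumptions.

```python
# Sketch: an instruction-cache access is "discontinuous" when it does not
# target the same or the next sequential cache line as the previous access;
# on a discontinuous miss, learn the correlation and prefetch the line the
# table predicts to follow. Illustrative simplification of the abstract.

LINE = 64
corr_table = {}          # current line -> learned next discontinuous line
last_line = None

def access(pc, miss):
    global last_line
    line = pc // LINE
    discontinuous = last_line is not None and line not in (last_line, last_line + 1)
    if miss and discontinuous:
        corr_table[last_line] = line             # train the correlation table
        predicted = corr_table.get(line)
        if predicted is not None:
            print(f"prefetch line {predicted}")  # prefetch predicted successor
    last_line = line

access(0x1000, miss=False)
access(0x8000, miss=True)    # discontinuous: trains the table
```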

Подробнее
31-05-2012 дата публикации

Miss buffer for a multi-threaded processor

Номер: US20120137077A1
Принадлежит: Oracle International Corp

A multi-threaded processor configured to allocate entries in a buffer for instruction cache misses is disclosed. Entries in the buffer may store thread state information for a corresponding instruction cache miss for one of a plurality of threads executable by the processor. The buffer may include dedicated entries and dynamically allocable entries, where the dedicated entries are reserved for a subset of the plurality of threads and the dynamically allocable entries are allocable to a group of two or more of the plurality of threads. In one embodiment, the dedicated entries are dedicated for use by a single thread and the dynamically allocable entries are allocable to any of the plurality of threads. The buffer may store two or more entries for a given thread at a given time. In some embodiments, the buffer may help ensure none of the plurality of threads experiences starvation with respect to instruction fetches.
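
A small Python model of the allocation policy described above (entry counts are assumptions): each thread's dedicated entry guarantees forward progress, while the shared pool is handed out dynamically.

```python
# Sketch of miss-buffer allocation: one dedicated entry per thread plus a
# dynamically allocable shared pool. Sizes are illustrative assumptions.

class MissBuffer:
    def __init__(self, num_threads, shared_entries):
        self.dedicated_free = {t: True for t in range(num_threads)}
        self.shared_free = shared_entries

    def allocate(self, thread_id):
        if self.dedicated_free[thread_id]:       # reserved: avoids starvation
            self.dedicated_free[thread_id] = False
            return ("dedicated", thread_id)
        if self.shared_free > 0:                 # allocable to any thread
            self.shared_free -= 1
            return ("shared", thread_id)
        return None                              # buffer full: retry later

buf = MissBuffer(num_threads=4, shared_entries=2)
print(buf.allocate(0), buf.allocate(0), buf.allocate(0), buf.allocate(0))
```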

Подробнее
02-08-2012 дата публикации

Guest to native block address mappings and management of native code storage

Номер: US20120198122A1
Автор: Mohammad Abdallah
Принадлежит: Soft Machines Inc

A method for managing mappings of storage on a code cache for a processor. The method includes storing a plurality of guest address to native address mappings as entries in a conversion look aside buffer, wherein the entries indicate guest addresses that have corresponding converted native addresses stored within a code cache memory, and receiving a subsequent request for a guest address at the conversion look aside buffer. The conversion look aside buffer is indexed to determine whether there exists an entry that corresponds to the index, wherein the index comprises a tag and an offset that is used to identify the entry. Upon a hit on the tag, the corresponding entry is accessed to retrieve a pointer to the corresponding block of converted native instructions in the code cache memory. The corresponding block of converted native instructions is then fetched from the code cache memory for execution.
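
In Python, the lookup path might look like the following sketch; field widths, table shapes, and the miss policy are assumptions rather than details from the patent.

```python
# Sketch of a conversion look aside buffer (CLB) lookup: a guest address is
# split into tag and offset; on a tag hit, the entry yields a pointer to the
# block of converted native instructions in the code cache.

OFFSET_BITS = 6

clb = {}           # tag -> pointer into the native code cache
code_cache = {}    # pointer -> block of converted native instructions

def lookup(guest_addr):
    tag = guest_addr >> OFFSET_BITS
    ptr = clb.get(tag)
    if ptr is None:
        return None                 # miss: convert the guest block, install it
    return code_cache[ptr]          # hit: fetch converted native block

clb[0x4000 >> OFFSET_BITS] = 0x10
code_cache[0x10] = ["native_op_a", "native_op_b"]
print(lookup(0x4000))
```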

Подробнее
06-09-2012 дата публикации

Method, apparatus, and system for speculative execution event counter checkpointing and restoring

Номер: US20120227045A1
Принадлежит: Intel Corp

An apparatus, method, and system are described herein for providing programmable control of performance/event counters. An event counter is programmable to track different events, as well as to be checkpointed when speculative code regions are encountered. So when a speculative code region is aborted, the event counter is able to be restored to its pre-speculation value. Moreover, the difference between a cumulative event count of committed and uncommitted execution and the committed-only event count represents an event count/contribution for uncommitted execution. From information on the uncommitted execution, hardware/software may be tuned to enhance future execution to avoid wasted execution cycles.
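
The checkpoint/restore idea reduces to a few lines. This Python sketch (class and method names invented for illustration) shows the restore-on-abort behaviour and how the uncommitted contribution falls out as a difference.

```python
# Sketch of speculative event-counter checkpointing: save the counter on
# entry to a speculative region, restore it on abort, and derive the
# uncommitted contribution as cumulative count minus committed count.

class EventCounter:
    def __init__(self):
        self.count = 0              # cumulative: committed + uncommitted events
        self.checkpoint_val = 0

    def enter_speculation(self):
        self.checkpoint_val = self.count

    def abort(self):
        self.count = self.checkpoint_val     # restore pre-speculation value

    def uncommitted_contribution(self):
        return self.count - self.checkpoint_val

c = EventCounter()
c.count = 100
c.enter_speculation()
c.count += 40                                # events in the speculative region
print(c.uncommitted_contribution())          # 40
c.abort()
print(c.count)                               # 100: restored pre-speculation value
```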

Подробнее
24-01-2013 дата публикации

Hardware acceleration components for translating guest instructions to native instructions

Номер: US20130024661A1
Автор: Mohammad Abdallah
Принадлежит: Soft Machines Inc

A hardware based translation accelerator. The hardware includes a guest fetch logic component for accessing guest instructions; a guest fetch buffer coupled to the guest fetch logic component and a branch prediction component for assembling guest instructions into a guest instruction block; and conversion tables coupled to the guest fetch buffer for translating the guest instruction block into a corresponding native conversion block. The hardware further includes a native cache coupled to the conversion tables for storing the corresponding native conversion block, and a conversion look aside buffer coupled to the native cache for storing a mapping of the guest instruction block to corresponding native conversion block, wherein upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates the guest instruction has a corresponding converted native instruction in the native cache.

Подробнее
28-03-2013 дата публикации

Processor and instruction processing method in processor

Номер: US20130080747A1
Автор: Young-Su Kwon

The present invention relates to a processor including: an instruction cache configured to store at least some of first instructions stored in an external memory and second instructions each including a plurality of micro instructions; a micro cache configured to store third instructions corresponding to the plurality of micro instructions included in the second instructions; and a core configured to read out the first and second instructions from the instruction cache and perform calculation, in which the core performs calculation by the first instructions from the instruction cache under a normal mode, and when the processor enters a micro-instruction mode, the core performs calculation by the third instructions corresponding to the plurality of micro instructions provided from the micro cache.

Подробнее
03-10-2013 дата публикации

APPARATUS AND METHOD FOR DYNAMICALLY MANAGING MEMORY ACCESS BANDWIDTH IN MULTI-CORE PROCESSOR

Номер: US20130262826A1
Принадлежит:

An apparatus and method are described for performing history-based prefetching. For example, a method according to one embodiment comprises: determining if a previous access signature exists in memory for a memory page associated with a current stream; if the previous access signature exists, reading the previous access signature from memory; and issuing prefetch operations using the previous access signature. 1. A method for dynamically adjusting prefetch requests to improve performance in a multi-core processor comprising: setting a current throttling threshold to one of a plurality of selectable threshold levels; determining a ratio of a number of current mid-level cache (MLC) hits to MLC demands; and throttling down prefetch requests if the ratio of the number of current MLC hits to MLC demands is below the currently selected throttling threshold level. 2. The method as in claim 1, wherein the plurality of selectable threshold levels includes a low throttle level of 25% or ¼, a medium throttle level of 50% or ½, and a high throttle level of 75% or ¾. 3. The method as in claim 1, further comprising: disabling least recently used (LRU) hints when the current threshold level is set at a low throttle level, medium throttle level, or high throttle level. 4. The method as in claim 1, further comprising: determining if the current prefetch detector has more than one demand pending; and if not, then setting the current throttling threshold level to No Throttle. 5. An apparatus for dynamically adjusting prefetch requests to improve performance in a multi-core processor comprising: a mid-level cache (MLC) for caching instructions and data according to a specified cache management policy; a prefetcher unit to prefetch instructions from memory, the instructions to be prefetched being identified by a prefetch detector; a memory controller with dynamic throttling logic to perform the operations of: setting a current throttling threshold to one of a plurality of selectable threshold levels; determining a ...
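
The throttling test in claim 1 is a simple ratio comparison; a worked Python example follows, with the threshold names taken from claim 2 and everything else assumed.

```python
# Worked example of the throttling test: compare the MLC hit/demand ratio to
# the selected threshold level and throttle prefetch requests when the ratio
# falls below it. Threshold values follow claim 2; the rest is illustrative.

THRESHOLDS = {"low": 0.25, "medium": 0.50, "high": 0.75, "no_throttle": 0.0}

def should_throttle(mlc_hits, mlc_demands, level):
    ratio = mlc_hits / mlc_demands if mlc_demands else 1.0
    return ratio < THRESHOLDS[level]

print(should_throttle(mlc_hits=30, mlc_demands=100, level="medium"))  # True: 0.30 < 0.50
```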

Подробнее
28-11-2013 дата публикации

Micro-Staging Device and Method for Micro-Staging

Номер: US20130318194A1
Автор: Jeffrey L. Timbs
Принадлежит: Dell Products LP

A micro-staging device has a wireless interface module for detecting a first data request that indicates a presence of a user and an application processor that establishes a network connection to a remote data center. The micro-staging device further allocates a portion of storage in a cache memory storage device for storing pre-fetched workflow data objects associated with the detected user.

Подробнее
23-01-2014 дата публикации

METHOD AND APPARATUS FOR CONTROLLING FETCH-AHEAD IN A VLES PROCESSOR ARCHITECTURE

Номер: US20140025931A1
Принадлежит: Freescale Semiconductor, Inc.

There is provided a method for controlling fetch-ahead of Fetch Sets into a decoupling First In First Out (FIFO) buffer of a Variable Length Execution Set (VLES) processor architecture, wherein a Fetch Set comprises at least a portion of a VLES group available for dispatch to processing resources within the VLES processor architecture, comprising, for each cycle, determining a number of VLES groups available for dispatch from previously pre-fetched Fetch Sets, and only requesting a fetch-ahead of a next Fetch Set in the next cycle if one of a select set of criteria related to the number of VLES groups available for dispatch is true. 1. A method for controlling fetch-ahead of Fetch Sets into a decoupling First In First Out (FIFO) buffer of a Variable Length Execution Set (VLES) processor architecture, wherein a Fetch Set comprises at least a portion of a VLES group available for dispatch to processing resources within the VLES processor architecture, comprising, for each cycle: determining a number of VLES groups available for dispatch from previously pre-fetched Fetch Sets; and requesting a fetch-ahead of a next Fetch Set in the next cycle only if one or more of the following is asserted: the number of VLES groups available for dispatch is less than a predetermined starvation threshold, the number of VLES groups available for dispatch is indicative of going below a predetermined upper limit threshold, or the number of VLES groups available for dispatch is indicative of being between the predetermined starvation threshold and the predetermined upper limit threshold, and a fetch-ahead of a Fetch Set occurred in an immediately previous cycle. 2. The method of claim 1, wherein said determining a number of VLES groups available for dispatch from previously pre-fetched Fetch Sets comprises: receiving an available VLES count value from a previous cycle indicative of a number of VLES groups available in an immediately previous cycle; determining a number of VLES ...

Подробнее
27-02-2014 дата публикации

Method, apparatus, and system for speculative abort control mechanisms

Номер: US20140059333A1
Принадлежит: Intel Corp

An apparatus and method are described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions.

Подробнее
10-04-2014 дата публикации

INFORMATION HANDLING SYSTEM INCLUDING HARDWARE AND SOFTWARE PREFETCH

Номер: US20140101413A1
Автор: Heisch Randall Ray

A prefetch optimizer tool for an information handling system (IHS) may improve effective memory access time by controlling both hardware prefetch operations and software prefetch operations. The prefetch optimizer tool selectively disables prefetch instructions in an instruction sequence of interest within an application. The tool measures execution times of the instruction sequence of interest when different prefetch instructions are disabled. The tool may hold hardware prefetch depth constant while cycling through disabling different prefetch instructions and taking corresponding execution time measurements. Alternatively, for each disabled prefetch instruction in the instruction sequence of interest, the tool may cycle through different hardware prefetch depths and take corresponding execution time measurements at each hardware prefetch depth. The tool selects a combination of hardware prefetch depth and prefetch instruction disablement that may improve the execution time in comparison with a baseline execution time. 1. A method, comprising: receiving, by a prefetch optimizer tool of an information handling system (IHS), an instruction sequence of interest including a plurality of instructions with respective software prefetch instructions in advance of particular load instructions; instructing, by the prefetch optimizer tool, a hardware prefetch mechanism in a processor of the IHS to fetch instructions from a memory store to one of a plurality of hardware prefetch depths; disabling, by the prefetch optimizer tool, particular prefetch instructions in the instruction sequence of interest; measuring, by the prefetch optimizer tool, execution times of the instruction sequence of interest when the particular prefetch instructions are disabled, the measuring being conducted by the prefetch optimizer tool while the hardware prefetch mechanism is set to a particular prefetch depth; and changing, by the prefetch optimizer tool, the hardware prefetch depth of the hardware ...

Подробнее
06-01-2022 дата публикации

MACHINE LEARNING ARCHITECTURE SUPPORT FOR BLOCK SPARSITY

Номер: US20220004597A1
Автор: Azizi Omid
Принадлежит:

This disclosure relates to matrix operation acceleration for different matrix sparsity patterns. A matrix operation accelerator may be designed to perform matrix operations more efficiently for a first matrix sparsity pattern than for a second matrix sparsity pattern. A matrix with the second sparsity pattern may be converted to a matrix with the first sparsity pattern and provided to the matrix operation accelerator. By rearranging the rows and/or columns of the matrix, the sparsity pattern of the matrix may be converted to a sparsity pattern that is suitable for computation with the matrix operation accelerator. 1. A system, comprising: matrix operation circuitry configured to perform a matrix operation on a first matrix having a first sparsity to generate a second matrix; and control circuitry communicatively coupled to the matrix operation circuitry and configured to: generate the first matrix via a transformation of a third matrix from a second sparsity to the first sparsity; in response to the matrix operation being performed on the first matrix to generate the second matrix, inversely transform the second matrix to generate a result matrix; and output the result matrix. 2. The system of claim 1, wherein the transformation of the third matrix comprises rearranging a plurality of columns of the third matrix, rearranging a plurality of rows of the third matrix, or rearranging the plurality of columns and the plurality of rows. 3. The system of claim 2, wherein rearrangement of the plurality of columns or rearrangement of the plurality of rows is based at least in part on a read order of the third matrix from memory. 4. The system of claim 3, wherein the read order comprises a coprime stride. 5. The system of claim 2, wherein rearrangement of the plurality of columns or rearrangement of the plurality of rows is based at least in part on a size of the third matrix or the second sparsity. 6. The system of claim 1, wherein inversely ...

Подробнее
05-01-2017 дата публикации

SYSTEM AND METHOD FOR INSTRUCTION SET CONVERSION

Номер: US20170003967A1
Автор: Lin Kenneth Chenghao

An instruction set conversion system and method are provided, which can convert guest instructions to host instructions for processor core execution. Through configuration, instruction sets supported by the processor core are easily expanded. A method for real-time conversion between host instruction addresses and guest instruction addresses is also provided, such that the processor core can directly read out the host instructions from a higher level cache, reducing the depth of a pipeline. 1. An instruction set conversion method, comprising: converting guest instructions to host instructions, and creating mapping relationships between guest instruction addresses and host instruction addresses; storing the host instructions in a cache memory that is accessed directly by a processor core; based on the host instruction address, performing a cache addressing operation to fetch directly the corresponding host instruction for processor core execution; or, after converting the guest instruction address outputted by the processor core to the host instruction address based on the mapping relationship, performing the cache addressing operation to fetch the corresponding host instruction for processor core execution. 2-62. (canceled) The present invention generally relates to computer, communication, and integrated circuit technologies. Currently, if one processor core executes programs belonging to different instruction sets, the most common method is to use a software virtual machine (or a virtual layer). The role of the virtual machine is to translate or interpret a program composed of an instruction set (guest instruction set) that is not supported by the processor core to generate the corresponding instructions in the instruction set (host instruction set) supported by the processor core for processor core execution. In general, during the operating process, an interpretation method fetches all fields including opcodes and operands in the guest instruction using the virtual machine in ...

Подробнее
04-01-2018 дата публикации

Interruptible and restartable matrix multiplication instructions, processors, methods, and systems

Номер: US20180004510A1
Принадлежит: Intel Corp

A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, a second memory location of a second source matrix, and a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate the amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

Подробнее
04-01-2018 дата публикации

OPERATION OF A MULTI-SLICE PROCESSOR IMPLEMENTING PRIORITIZED DEPENDENCY CHAIN RESOLUTION

Номер: US20180004527A1
Принадлежит:

Operation of a computer processor that includes: receiving a first instruction indicating a first target register; receiving, from an instruction fetch unit of the computer processor, the first instruction and a branch instruction; responsive to determining that the branch instruction is dependent upon a result of the first instruction, updating a priority value corresponding to the first instruction; and issuing, in dependence upon the priority value for the first instruction having a higher priority than a priority value for another instruction, the first instruction to an execution unit of the computer processor. 1. A method of operation of a computer processor, wherein the method comprises: receiving, from an instruction fetch unit of the computer processor, a first instruction and a branch instruction; responsive to determining that the branch instruction is dependent upon a result of the first instruction, updating a priority value corresponding to the first instruction; and issuing, in dependence upon the priority value for the first instruction having a higher priority than a priority value for another instruction, the first instruction to an execution unit of the computer processor. 2. The method of claim 1, wherein issuing the first instruction is further dependent upon the first instruction being a ready instruction among a plurality of ready instructions. 3. The method of claim 2, wherein the first instruction is a compare instruction, and wherein execution of the compare instruction sets a condition code flag within a condition code register, and wherein the branch instruction determines whether to branch in dependence upon a value of the condition code flag of the condition code register. 4. The method of claim 3, wherein determining that the branch instruction is dependent upon the result of the first instruction comprises decoding the branch instruction to determine that the branch instruction depends upon the condition code flag of the ...

Подробнее
07-01-2021 дата публикации

Energy Efficient Processor Core Architecture for Image Processor

Номер: US20210004232A1
Принадлежит:

An apparatus that includes a program controller to fetch and issue instructions is described. The apparatus includes an execution lane having at least one execution unit to execute the instructions. The execution lane is part of an execution lane array that is coupled to a two dimensional shift register array structure, wherein execution lanes of the execution lane array are located at respective array locations and are coupled to dedicated registers at the same respective array locations in the two-dimensional shift register array. 1. (canceled) 2. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an input program to be executed on a device comprising: a plurality of random access memories, and a plurality of execution lanes, wherein different groups of the execution lanes are assigned to use a different respective random access memory of the plurality of random access memories; determining that the input program specifies two or more execution lanes in a same group of the plurality of execution lanes to compete for different memory locations in a same random access memory of the plurality of random access memories; and in response, modifying the input program to generate multiple instructions that cause execution lanes within each group to access a respective random access memory sequentially. 3. The system of claim 2, wherein the plurality of execution lanes includes a plurality of rows of execution lanes, wherein the different groups of the execution lanes comprise different rows of the execution lanes. 4. The system of claim 2, wherein the plurality of execution lanes includes a plurality of columns of execution lanes, wherein the different groups of the execution lanes comprise different columns of the execution lanes. 5. The system of claim 2, wherein the ...

Подробнее
07-01-2021 дата публикации

Pipeline flattener with conditional triggers

Номер: US20210004236A1
Принадлежит: Texas Instruments Inc

A semiconductor device comprising a processor having a pipelined architecture and a pipeline flattener and a method for operating a pipeline flattener in a semiconductor device are provided. The processor comprises a pipeline having a plurality of pipeline stages and a plurality of pipeline registers that are coupled between the pipeline stages. The pipeline flattener comprises a plurality of trigger registers for storing a trigger, wherein the trigger registers are coupled between the pipeline stages.

Подробнее
02-01-2020 дата публикации

PREFETCHER FOR DELINQUENT IRREGULAR LOADS

Номер: US20200004541A1
Принадлежит:

Disclosed embodiments relate to a prefetcher for delinquent irregular loads. In one example, a processor includes a cache memory; fetch and decode circuitry to fetch and decode instructions from a memory; and execution circuitry including a binary translator (BT) to respond to the decoded instructions by storing a plurality of decoded instructions in a BT cache, identifying a delinquent irregular load (DIRRL) among the plurality of decoded instructions, determining whether the DIRRL is prefetchable, and, if so, generating a custom prefetcher to cause the processor to prefetch a region of instructions leading up to the prefetchable DIRRL. 1. A method to be performed by a processor, the processor comprising: a cache memory; fetch and decode circuitry to fetch and decode instructions from a memory; and execution circuitry comprising a binary translator (BT) to respond to the decoded instructions by: storing a plurality of the decoded instructions in a BT cache; identifying a delinquent irregular load (DIRRL) among the stored instructions; determining whether the DIRRL is prefetchable; and if so, generating a custom prefetcher to cause the processor to prefetch a region of instructions leading up to the prefetchable DIRRL. 2. The method of claim 1, wherein the DIRRL is a delinquent load that experiences greater than a first threshold number of cache misses on successive dynamic instances. 3. The method of claim 2, wherein the DIRRL is an irregular load having at least a second threshold number of address deltas among its successive dynamic instances, and wherein the second threshold number of address deltas covers less than a third threshold number of successive dynamic instances. 4. The method of claim 3, wherein the execution circuitry is to compute a backslice between two successive dynamic instances of the DIRRL, and to determine that the DIRRL is prefetchable when the backslice comprises cycles made entirely of non-memory operations or ...

Подробнее
03-01-2019 дата публикации

STREAMING ENGINE WITH EARLY EXIT FROM LOOP LEVELS SUPPORTING EARLY EXIT LOOPS AND IRREGULAR LOOPS

Номер: US20190004798A1
Автор: Zbiciak Joseph
Принадлежит:

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. Upon a stream break instruction specifying one of the nested loops, the streaming engine ends a current iteration of the loop. If the specified loop was not the outermost loop, the streaming engine begins an iteration of a next outer loop. If the specified loop was the outermost nested loop, the streaming engine ends the stream. The streaming engine places a vector of data elements in order in lanes within a stream head register. A stream break instruction is operable upon a vector break. 1. A digital data processor comprising: an instruction memory storing instructions each specifying a data processing operation and at least one data operand field; an instruction decoder connected to said instruction memory for sequentially recalling instructions from said instruction memory and determining said specified data processing operation and said specified at least one operand; at least one functional unit connected to said data register file and said instruction decoder for performing data processing operations upon at least one operand corresponding to an instruction decoded by said instruction decoder and storing results in an instruction specified data register; and a streaming engine connected to said instruction decoder operable in response to a stream start instruction to recall from memory a stream of an instruction specified sequence of a plurality of data elements including plural nested ...: an address generator for generating stream memory addresses corresponding to said stream of an instruction specified sequence of a plurality of data elements in said plural nested loops, and a stream head register storing a data element of said stream next to be used by said at least one operational unit;

Подробнее
03-01-2019 дата публикации

STREAM ENGINE WITH ELEMENT PROMOTION AND DECIMATION MODES

Номер: US20190004799A1
Автор: Zbiciak Joseph
Принадлежит:

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to operational units for use as operands. A promotion unit optionally increases data element size by an integral power of 2, either zero filling or sign filling the additional bits. A decimation unit optionally decimates data elements by an integral factor of 2. For ease of implementation the promotion factor must be greater than or equal to the decimation factor. 1. A digital data processor comprising: an instruction memory storing instructions each specifying a data processing operation and at least one data operand field; an instruction decoder connected to said instruction memory for sequentially recalling instructions from said instruction memory and determining said specified data processing operation and said specified at least one operand; at least one operational unit connected to said data register file and said instruction decoder for performing data processing operations upon at least one operand corresponding to an instruction decoded by said instruction decoder and storing results; and a streaming engine connected to said instruction decoder operable in response to a stream start instruction to recall from memory a stream of an instruction specified sequence of a plurality of data elements, said streaming engine including: an element decimation unit receiving data elements of said stream recalled from memory, said element decimation unit operable to omit received data according to an instruction specified decimation factor, and a stream head register receiving data elements from said element decimation unit, storing at least one data element of said stream next to be used by said at least one operational unit; wherein said at least one operational unit is responsive to a stream operand instruction to ...

Подробнее
03-01-2019 дата публикации

Instructions for vector operations with constant values

Номер: US20190004801A1
Принадлежит: Intel Corp

Disclosed embodiments relate to instructions for vector operations with immediate values. In one example, a system includes a memory and a processor that includes fetch circuitry to fetch the instruction from a code storage, the instruction including an opcode, a destination identifier to specify a destination vector register, a first immediate, and a write mask identifier to specify a write mask register, the write mask register including at least one bit corresponding to each destination vector register element, the at least one bit to specify whether the destination vector register element is masked or unmasked; decode circuitry to decode the fetched instruction; and execution circuitry to execute the decoded instruction to use the write mask register to determine unmasked elements of the destination vector register and, when the opcode specifies to broadcast, broadcast the first immediate to one or more unmasked vector elements of the destination vector register.
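
The masked-broadcast behaviour reduces to a per-element select. A Python sketch, with vector width and representation as assumptions:

```python
# Sketch of broadcast-with-write-mask: the immediate is copied only into
# destination elements whose mask bit is set; masked elements are unchanged.

def broadcast_imm(dest, imm, write_mask):
    # dest: list of vector elements; write_mask: one bit per element
    return [imm if (write_mask >> i) & 1 else dest[i] for i in range(len(dest))]

print(broadcast_imm([0, 0, 0, 0], imm=7, write_mask=0b0101))  # [7, 0, 7, 0]
```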

Подробнее
03-01-2019 дата публикации

STREAM PROCESSOR WITH DECOUPLED CROSSBAR FOR CROSS LANE OPERATIONS

Номер: US20190004814A1
Принадлежит:

Systems, apparatuses, and methods for implementing a decoupled crossbar for a stream processor are disclosed. In one embodiment, a system includes at least a multi-lane execution pipeline, a vector register file, and a crossbar. The system is configured to determine if a given instruction in an instruction stream requires a permutation on data operands retrieved from the vector register file. The system conveys the data operands to the multi-lane execution pipeline on a first path which includes the crossbar responsive to determining the given instruction requires a permutation on the data operands. The crossbar then performs the necessary permutation to route the data operands to the proper processing lanes. Otherwise, the system conveys the data operands to the multi-lane execution pipeline on a second path which bypasses the crossbar responsive to determining the given instruction does not require a permutation on the input operands. 1. A system comprising: a multi-lane execution pipeline; a vector register file; and a crossbar; wherein the system is configured to: retrieve a plurality of data operands from the vector register file; convey the plurality of data operands to the multi-lane execution pipeline via the crossbar responsive to determining a permutation is required; and convey the plurality of data operands to the multi-lane execution pipeline by bypassing the crossbar responsive to determining a permutation is not required. 2. The system as recited in claim 1, wherein the crossbar comprises multiple layers, and wherein the system is further configured to: perform a first permutation of data operands across lanes of the multi-lane execution pipeline with a first layer of N×N crossbars, wherein N is a positive integer; and perform a second permutation of data operands across lanes of the multi-lane execution pipeline with a second layer of N×N crossbars. 3. The system as recited in claim 1, wherein the crossbar comprises a first N/2-by-N/2 ...

Подробнее
01-01-2015 дата публикации

PREDICTIVE FETCHING AND DECODING FOR SELECTED RETURN INSTRUCTIONS

Номер: US20150006854A1
Принадлежит:

Predictive fetching and decoding for selected instructions. A determination is made as to whether an instruction to be executed in a pipelined processor is a selected return instruction, the pipelined processor having a plurality of stages including an execute stage. Based on the instruction being the selected return instruction, a predicted return address is obtained from a data structure, the predicted return address being an address of an instruction to which it is predicted that processing is to be returned. Additionally, based on the instruction being the selected return instruction, operating state for the instruction at the predicted return address is predicted. The instruction is fetched at the predicted return address, prior to the selected return instruction reaching the execute stage, and decoding of the fetched instruction is initiated based on the predicted operating state. 1. A computer program product for facilitating processing within a processing environment, the computer program product comprising: a computer readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a ...: determining whether an instruction to be executed in a pipelined processor is a selected return instruction, the pipelined processor having a plurality of stages including an execute stage; based on the instruction being the selected return instruction, obtaining from a data structure a predicted return address, the predicted return address being an address of an instruction to which it is predicted that processing is to be returned; based on the instruction being the selected return instruction, predicting operating state for the instruction at the predicted return address; fetching the instruction at the predicted return address, prior to the selected return instruction reaching the execute stage; and initiating decoding of the fetched instruction based on the predicted operating state.

Подробнее
01-01-2015 дата публикации

PREDICTIVE FETCHING AND DECODING FOR SELECTED INSTRUCTIONS

Номер: US20150006855A1
Принадлежит:

Predictive fetching and decoding for selected instructions (e.g., operating system instructions, hypervisor instructions or other such instructions). A determination is made that a selected instruction, such as a system call instruction, an asynchronous interrupt, a return from system call instruction or return from asynchronous interrupt, is to be executed. Based on determining that such an instruction is to be executed, a predicted address is determined for the selected instruction, which is the address to which processing transfers in order to provide the requested services. Then, fetching of instructions beginning at the predicted address prior to execution of the selected instruction is commenced. Further, speculative state relating to a selected instruction, including, for instance, an indication of the privilege level of the selected instruction or instructions executed on behalf of the selected instruction, is predicted and maintained. 1. A computer program product for facilitating processing within a processing environment, the computer program product comprising: a ...: predicting that a selected instruction is to execute in a pipelined processor, the pipelined processor having a plurality of stages of processing including an execute stage, and the selected instruction having a first privilege level and one or more other instructions executing in the pipelined processor having a second privilege level different from the first privilege level; based on predicting the selected instruction is to execute, predicting an entry address for the selected instruction and operating state associated therewith, the entry address indicating a location at which an instruction is to be fetched based on the selected instruction; based on predicting the entry address, fetching the instruction at the entry address prior to the selected instruction reaching the execute stage; and initiating decoding of the fetched instruction based on the predicted operating state.

Подробнее
14-01-2016 дата публикации

INSTRUCTION SET FOR ELIMINATING MISALIGNED MEMORY ACCESSES DURING PROCESSING OF AN ARRAY HAVING MISALIGNED DATA ROWS

Номер: US20160011870A1
Принадлежит:

A processor is described having an instruction execution pipeline. The instruction execution pipeline includes an instruction fetch stage to fetch an instruction. The instruction format of the instruction specifies a first input vector, a second input vector and a third input operand. The instruction execution pipeline comprises an instruction decode stage to decode the instruction. The instruction execution pipeline includes a functional unit to execute the instruction. The functional unit includes a routing network to route a first contiguous group of elements from a first end of one of the input vectors to a second end of the instruction's resultant vector, and, route a second contiguous group of elements from a second end of the other of the input vectors to a first end of the instruction's resultant vector. The first and second ends are opposite vector ends. The first and second groups of contiguous elements are defined from the third input operand. The instruction is not capable of routing non-contiguous groups of elements from the input vectors to the instruction's resultant vector. A software pipeline that uses the instruction is also described. 1. A processor, comprising: an instruction fetch stage to fetch an instruction, the instruction format of the instruction specifying a first input vector, a second input vector and a third input operand; an instruction decode stage to decode said instruction; a functional unit to execute the instruction, the functional unit including a routing network to route a first contiguous group of elements from a first end of one of said input vectors to a second end of said instruction's resultant vector, and route a second contiguous group of elements from a second end of the other of said input vectors to a first end of said instruction's resultant vector, said first and second ends being opposite vector ends, wherein said first and second groups of contiguous elements are defined from said third input operand, said ...

Подробнее
14-01-2016 дата публикации

INSTRUCTION FOR IMPLEMENTING VECTOR LOOPS OF ITERATIONS HAVING AN ITERATION DEPENDENT CONDITION

Номер: US20160011873A1
Автор: PLOTNIKOV Mikhail
Принадлежит:

A processor is described having an instruction execution pipeline. The instruction execution pipeline includes an instruction fetch stage to fetch an instruction. The instruction identifies an input vector operand whose input elements specify one or the other of two states. The instruction execution pipeline also includes an instruction decoder to decode the instruction. The instruction execution pipeline also includes a functional unit to execute the instruction and provide a resultant output vector. The functional unit includes logic circuitry to produce an element in a specific element position of the resultant output vector by performing an operation on a value derived from a base value using a stride in response to one but not the other of the two states being present in a corresponding element position of the input vector operand. 1. An apparatus, comprising: an instruction execution pipeline comprising: an instruction fetch stage to fetch an instruction, the instruction identifying an input vector operand whose input elements specify one or the other of two states; an instruction decoder to decode the instruction; a functional unit to execute the instruction and provide a resultant output vector, the functional unit including logic circuitry to produce an element in a specific element position of the resultant output vector by performing an operation on a value derived from a base value using a stride in response to one but not the other of the two states being present in a corresponding element position of the input vector operand. 2. The apparatus of claim 1, wherein the corresponding element position of the input vector operand is the same element position as the specific element position of the resultant output vector. 3. The apparatus of claim 1, wherein the corresponding element position of the input vector operand is the immediately preceding element position of the specific element position of the resultant output vector. 4. The apparatus of claim 1, wherein a second input ...

Подробнее
11-01-2018 дата публикации

INSTRUCTION PRE-FETCHING

Номер: US20180011735A1
Принадлежит:

Pre-fetching instructions for tasks of an operating system (OS) is provided by calling a task scheduler that determines a load start time for a set of instructions for a particular task corresponding to a task switch condition. The OS calls, in response to the load start time, a loader entity module that generates a pre-fetch request that loads the set of instructions for the particular task from a non-volatile memory circuit into a random access memory circuit. The OS calls the task scheduler to switch to the particular task. 1. A method for pre-fetching instructions for tasks under the control of an operating system (OS), the method comprising: calling, by the OS, a task scheduler in response to a task scheduler event, the task scheduler: identifying a future task switch condition corresponding to a particular task and a task switch time; accessing, in response to identifying the task switch condition, a task information queue to retrieve task information for the particular task; and determining, based upon the task information and the task switch time, a load start time for a set of instructions for the particular task; calling, in response to the load start time and by the task scheduler, a loader entity module, the loader entity module generating a pre-fetch request to load the set of instructions for the particular task from a non-volatile memory circuit into a random access memory circuit; and switching, by the task scheduler and in response to the task switch time, to the particular task. 2. The method of claim 1, wherein the set of instructions includes less than all instructions in the particular task and less than all constant data for the particular task, and wherein the loader entity module generates additional load times for additional sets of instructions for the particular task. 3. The method of claim 1, wherein identifying the future task switch condition further comprises determining that a priority for the particular task is higher ...

Подробнее
14-01-2021 дата публикации

METHODS AND APPARATUS TO DYNAMICALLY ENABLE AND/OR DISABLE PREFETCHERS

Номер: US20210011726A1
Принадлежит:

Methods, apparatus, and articles of manufacture to dynamically enable and/or disable prefetchers are disclosed. An example apparatus includes an interface to access telemetry data, the telemetry data corresponding to a counter of a core in a central processing unit, the counter corresponding to a first phase of a workload executed at the central processing unit; a prefetcher state selector to select a prefetcher state for a subsequent phase based on the telemetry data; and the interface to instruct the core in the central processing unit to operate in the subsequent phase according to the prefetcher state. 1. An apparatus to dynamically enable or disable a prefetcher, the apparatus comprising: an interface to access telemetry data, the telemetry data corresponding to a counter of a core in a central processing unit, the counter corresponding to a first phase of a workload executed at the central processing unit; a prefetcher state selector to select a prefetcher state for a subsequent phase based on the telemetry data; and the interface to instruct the core in the central processing unit to operate in the subsequent phase according to the prefetcher state. 2. The apparatus of claim 1, further including a score determiner to determine a telemetry score for the core at the first phase using the telemetry data, the telemetry score corresponding to instructions per second for the first phase. 3. The apparatus of claim 2, wherein the prefetcher state is a first prefetcher state, the prefetcher state selector to: determine a first value corresponding to first telemetry scores for a second prefetcher state across a plurality of previous phases; and determine a second value corresponding to second telemetry scores for a third prefetcher state across the plurality of previous phases, the third prefetcher state different than the first prefetcher state. 4. The apparatus of claim 3, wherein the prefetcher state selector is to select the first prefetcher ...

Подробнее
14-01-2021 дата публикации

Handling Trace Data for Jumps in Program Flow

Номер: US20210011833A1
Автор: Robertson Iain
Принадлежит:

A processor supervisory unit for monitoring the program flow executed by a processor, the supervisory unit being arranged to store a set of values representing locations to which the program flow is expected to return after jumps in the program flow, the unit being capable of: in a first mode, on detecting a jump in the program flow, storing a location value representing a location to which the program flow is expected to return from that jump; and in a second mode, on detecting a jump in the program flow, incrementing a counter associated with a location value representing a location to which the program flow is expected to return from that jump. 1. A processor supervisory unit configured to monitor the program flow executed by a processor, the supervisory unit being arranged to store a set of values representing locations to which the program flow is expected to return after jumps in the program flow, the unit being capable of: in a first mode, on detecting a jump in the program flow, storing a location value representing a location to which the program flow is expected to return from that jump; and in a second mode, on detecting a jump in the program flow, incrementing a counter associated with a location value representing a location to which the program flow is expected to return from that jump. 2. The processor supervisory unit as claimed in claim 1, wherein the unit is configured to store no more than a predetermined number of location values simultaneously by operating in the first mode. 3. The processor supervisory unit as claimed in claim 1, wherein the unit is configured, on detecting a return in the program flow that corresponds to a jump for which a location value has been stored by operation in the first mode or for which a counter has been incremented by operation in the second mode, to: compare (i) the actual location to which program flow has returned with (ii) the location to which program flow was expected to return as indicated by the ...
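
A Python sketch of the two modes (capacity and data structures are assumptions): location values are stored per jump while space remains, and repeats of an already-stored location increment a counter instead.

```python
# Sketch of the supervisory unit's two modes: store an expected return
# location per jump (first mode) or increment a counter associated with an
# already-stored location (second mode). Illustrative only.

class JumpTracker:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.returns = {}        # expected return location -> counter

    def on_jump(self, return_addr):
        if return_addr in self.returns:          # second mode: increment
            self.returns[return_addr] += 1
        elif len(self.returns) < self.capacity:  # first mode: store the value
            self.returns[return_addr] = 1

    def on_return(self, actual_addr):
        # Compare the actual return location against the expected ones.
        return actual_addr in self.returns

t = JumpTracker()
t.on_jump(0x1004); t.on_jump(0x1004)   # recursive jump: counter now 2
print(t.on_return(0x1004))             # True: matches an expected location
```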

Подробнее
03-02-2022 дата публикации

MICROPROCESSOR THAT FUSES LOAD AND COMPARE INSTRUCTIONS

Номер: US20220035634A1
Принадлежит:

Technology for fusing certain load instructions and compare-immediate instructions in a computer processor having a load-store architecture with respect to transferring data between memory and registers of the computer processor. In some embodiments the load and compare-immediate instructions are consecutive. In some embodiments, the instructions are only merged if: (i) the respective RA and RT fields of the two instructions match; (ii) the immediate field of the compare-immediate instruction has a certain value, or falls within a range of certain values; and/or (iii) the instructions are received in a consecutive manner. 2. The CIM of claim 1, wherein the load instruction and the compare-immediate instruction are consecutive instructions. 3. The CIM of claim 1, further comprising: determining, by the instruction fetch unit, that an immediate field of the compare-immediate instruction has a value of 0, 1, or −1; wherein the fusion is further responsive to the determination that an immediate field of the compare-immediate instruction has a value of 0, 1, or −1. 4-17. (canceled) 18. A computer-implemented method (CIM) comprising: receiving, by an instruction fetch unit of a load-store architecture style processor, a load instruction including an RT field value; receiving, by the instruction fetch unit, a compare-immediate instruction including an RA field value; determining that the load instruction and the compare-immediate instruction are consecutive instructions; determining, by the instruction fetch unit, that an immediate field of the compare-immediate instruction has a value that falls within a predetermined range of values; responsive to the determination that the load instruction and the compare-immediate instruction are consecutive instructions and further responsive to the determination that an immediate field of the compare-immediate instruction has a value that falls within a predetermined range of values, fusing, by the instruction fetch unit, the load instruction and the ...
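
The fusion condition is a straightforward predicate over the two instructions' fields. A Python sketch, with the instruction encoding assumed as a namedtuple:

```python
# Sketch of the fusion test: consecutive load and compare-immediate
# instructions fuse only when the load's RT matches the compare's RA and the
# immediate is 0, 1, or -1. Encoding and field names are assumptions.

from collections import namedtuple

Insn = namedtuple("Insn", "op rt ra imm")

def can_fuse(first, second):
    return (first.op == "load" and second.op == "cmpi"
            and first.rt == second.ra            # RT/RA fields match
            and second.imm in (0, 1, -1))        # immediate in allowed set

print(can_fuse(Insn("load", 5, 2, None), Insn("cmpi", None, 5, 0)))  # True
```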

03-02-2022 publication date

INSTRUCTION DISPATCH ROUTING

Number: US20220035636A1
Assignee:

A method of instruction dispatch routing comprises receiving an instruction for dispatch to one of a plurality of issue queues; determining a priority status of the instruction; selecting a rotation order based on the priority status, wherein a first rotation order is associated with priority instructions and a second rotation order, different from the first rotation order, is associated with non-priority instructions; selecting an issue queue of the plurality of issue queues based on the selected rotation order; and dispatching the instruction to the selected issue queue.
1. A method comprising: receiving an instruction for dispatch to one of a plurality of issue queues; determining a priority status of the instruction, wherein an instruction type designated as priority is determined based on historical data of instruction types dispatched in previous cycles; selecting a rotation order based on the priority status, wherein a first rotation order is associated with priority instructions and a second rotation order, different from the first rotation order, is associated with non-priority instructions; selecting an issue queue of the plurality of issue queues based on the selected rotation order; and dispatching the instruction to the selected issue queue.
2. The method of claim 1, wherein determining the priority status of the instruction includes checking a bit corresponding to the instruction in a vector received from an instruction fetch unit.
3. The method of claim 1, wherein determining the priority status of the instruction is based on an instruction type of the instruction; wherein one instruction type is designated as priority and at least one instruction type is designated as non-priority; wherein the first rotation order is adjusted to begin at a next issue queue in the first rotation order after a last used issue queue for the instruction type designated as priority; and wherein the second rotation order is adjusted to be opposite to the first rotation order.
4. ...
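
One way to realize two opposite rotation orders is a forward round-robin pointer for priority instructions and a backward pointer for non-priority ones, so the two classes tend to land on different queues. A toy Python model (illustrative only; the claimed adjustment rules are richer):

class DispatchRouter:
    def __init__(self, n_queues):
        self.n = n_queues
        self.fwd = 0               # next queue in the priority rotation
        self.bwd = n_queues - 1    # next queue in the non-priority rotation

    def dispatch(self, is_priority):
        if is_priority:
            q, self.fwd = self.fwd, (self.fwd + 1) % self.n
        else:
            q, self.bwd = self.bwd, (self.bwd - 1) % self.n
        return q

router = DispatchRouter(4)
picks = [router.dispatch(is_priority=p) for p in (True, False, True, False)]
assert picks == [0, 3, 1, 2]       # the two classes walk in opposite orders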

19-01-2017 publication date

EXECUTION OF MICRO-OPERATIONS

Number: US20170017490A1
Inventor: CAULFIELD Ian Michael
Assignee:

Processing circuitry includes execute circuitry for executing micro-operations in response to instructions fetched from a data store. Control circuitry is provided to determine, based on availability of at least one processing resource, how many micro-operations are to be executed by the execute circuitry in response to a given set of one or more instructions fetched from the data store. 1. Processing circuitry comprising:execute circuitry to execute micro-operations in response to instructions fetched from a data store; andcontrol circuitry to determine, in dependence on availability of at least one processing resource, how many micro-operations are to be executed by the execute circuitry in response to a given set of one or more instructions fetched from the data store.2. The processing circuitry according to claim 1 , wherein the given set of one or more instructions is for triggering the execute circuitry to execute a plurality of processing steps; andthe control circuitry is configured to determine, in dependence on said availability of said at least one processing resource, whether to control the execute circuitry to execute a compound micro-operation corresponding to at least two of said plurality of processing steps, or individual micro-operations each corresponding to one of said at least two of said plurality of processing steps.3. The processing circuitry according to claim 1 , wherein said at least one processing resource comprises at least one operand required for processing of said given set of one or more instructions.4. The processing circuitry according to claim 1 , wherein the given set of one or more instructions is for triggering the execute circuitry to perform at least a first processing step requiring a first operand set comprising one or more operands and a second processing step requiring a second operand set comprising one or more operands.5. The processing circuitry according to claim 4 , wherein said control circuitry is configured to ...
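
The decision the control circuitry makes can be pictured as: emit a compound micro-operation only when every operand of both processing steps is already available, and otherwise fall back to individual micro-operations. A sketch under those assumptions (field names are hypothetical):

def expand(steps, ready):
    """steps: two processing steps, each with an 'ops' operand list;
    ready: predicate for operand availability."""
    first, second = steps
    if all(ready(op) for op in first["ops"] + second["ops"]):
        return [("compound", first["name"], second["name"])]   # fused
    return [("uop", first["name"]), ("uop", second["name"])]   # split

steps = [{"name": "addr-gen", "ops": ["r1"]}, {"name": "load", "ops": ["r2"]}]
print(expand(steps, ready=lambda op: op == "r1"))   # r2 missing: two uops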

19-01-2017 publication date

APPARATUS AND METHOD FOR LOW-LATENCY INVOCATION OF ACCELERATORS

Number: US20170017491A1
Assignee:

An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a command register for storing command data identifying a command to be executed; a result register to store a result of the command or data indicating a reason why the command could not be executed; execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands, the accelerator invocation instruction to store command data specifying the command within the command register; one or more accelerators to read the command data from the command register and responsively attempt to execute the command identified by the command data, wherein if the one or more accelerators successfully execute the command, the one or more accelerators are to store result data comprising the results of the command in the result register; and if the one or more accelerators cannot successfully execute the command, the one or more accelerators are to store result data indicating a reason why the command cannot be executed, wherein the execution logic is to temporarily halt execution until the accelerator completes execution or is interrupted, wherein the accelerator includes logic to store its state if interrupted so that it can continue execution at a later time.
1. A processor comprising: a plurality of simultaneous multithreading (SMT) cores, each of the SMT cores to perform out-of-order instruction execution for a plurality of threads; at least one shared cache circuit to be shared among two or more of the SMT cores; an instruction fetch circuit to fetch instructions of one or more of the threads, an instruction decode circuit to decode the instructions, a register renaming circuit to rename registers of a register file, an instruction cache circuit to store instructions to be executed, and a data cache circuit to store data; at least one of ...
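
The command-register/result-register handshake can be mocked up in a few lines. The sketch below is a software analogue built on assumptions: invoke() plays the accelerator invocation instruction, and the core "halts" simply by calling the accelerator synchronously.

class Accelerator:
    """Hypothetical accelerator recognising a single command."""
    def execute(self, cmd):
        if cmd["op"] == "checksum":
            return {"ok": True, "value": sum(cmd["data"]) & 0xFFFF}
        return {"ok": False, "reason": "unsupported command"}

class Core:
    def __init__(self, accel):
        self.accel = accel
        self.command_reg = None     # command register
        self.result_reg = None      # result register

    def invoke(self, cmd):
        self.command_reg = cmd                      # invocation instruction
        # execution halts here until the accelerator deposits a result
        self.result_reg = self.accel.execute(self.command_reg)
        if not self.result_reg["ok"]:
            # the stored reason lets software choose a fallback path
            raise NotImplementedError(self.result_reg["reason"])
        return self.result_reg["value"]

core = Core(Accelerator())
assert core.invoke({"op": "checksum", "data": [1, 2, 3]}) == 6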

19-01-2017 publication date

APPARATUS AND METHOD FOR LOW-LATENCY INVOCATION OF ACCELERATORS

Number: US20170017492A1
Assignee:

An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a command register for storing command data identifying a command to be executed; a result register to store a result of the command or data indicating a reason why the command could not be executed; execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands, the accelerator invocation instruction to store command data specifying the command within the command register; one or more accelerators to read the command data from the command register and responsively attempt to execute the command identified by the command data, wherein if the one or more accelerators successfully execute the command, the one or more accelerators are to store result data comprising the results of the command in the result register; and if the one or more accelerators cannot successfully execute the command, the one or more accelerators are to store result data indicating a reason why the command cannot be executed, wherein the execution logic is to temporarily halt execution until the accelerator completes execution or is interrupted, wherein the accelerator includes logic to store its state if interrupted so that it can continue execution at a later time.
1. A system comprising: a plurality of processors; a first interconnect to communicatively couple two or more of the plurality of processors; a second interconnect to communicatively couple one or more of the plurality of processors to one or more other system components; and a system memory communicatively coupled to one or more of the processors; a plurality of simultaneous multithreading (SMT) cores, each of the SMT cores to perform out-of-order instruction execution for a plurality of threads; at least one shared cache circuit to be shared among two or more of the SMT cores; an instruction ...

21-01-2016 publication date

Prefetching instructions in a data processing apparatus

Number: US20160019065A1
Assignee: ARM LTD

A data processing apparatus has prefetch circuitry for prefetching cache lines of instructions into an instruction cache. A prefetch lookup table is provided for storing prefetch entries, with each entry corresponding to a region of a memory address space and identifying at least one block of one or more cache lines within the corresponding region from which processing circuitry accessed an instruction on a previous occasion. When the processing circuitry executes an instruction from a new region, the prefetch circuitry looks up the table, and if it stores a prefetch entry for the new region, then the at least one block identified by the corresponding entry is prefetched into the cache.
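
A prefetch lookup table of this kind can be modelled as a map from region base address to the set of cache-line blocks touched on the previous visit. A small sketch, assuming 4 KiB regions and 64-byte lines (the sizes are assumptions):

class PrefetchTable:
    """One entry per region, recording which cache-line blocks were
    touched the last time instructions in that region executed."""
    REGION, BLOCK = 4096, 64

    def __init__(self):
        self.table = {}               # region base -> set of block indexes

    def record(self, addr):
        base = addr - addr % self.REGION
        self.table.setdefault(base, set()).add(addr % self.REGION // self.BLOCK)

    def prefetch_for(self, addr):
        """Block addresses to prefetch when execution enters a new region."""
        base = addr - addr % self.REGION
        return sorted(base + b * self.BLOCK for b in self.table.get(base, ()))

pt = PrefetchTable()
pt.record(0x1040)
pt.record(0x1FC0)
assert pt.prefetch_for(0x1000) == [0x1040, 0x1FC0]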

15-01-2015 publication date

ANTICIPATED PREFETCHING FOR A PARENT CORE IN A MULTI-CORE CHIP

Number: US20150019841A1
Assignee:

Embodiments relate to prefetching data on a chip having a scout core and a parent core coupled to the scout core. A method includes determining that a program executed by the parent core requires content stored in a location remote from the parent core. The method includes sending a fetch table address determined by the parent core to the scout core. The method includes accessing, by the scout core, a fetch table that is indicated by the fetch table address. The fetch table indicates how many pieces of content are to be fetched by the scout core and a location of the pieces of content. The method includes, based on the fetch table indicating, fetching the pieces of content by the scout core. The method includes returning the fetched pieces of content to the parent core.
1. A computer program product for prefetching data on a chip having a scout core and a parent core coupled to the scout core, the computer program product comprising: a tangible storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising: determining that a program executed by the parent core requires content stored in a location remote from the parent core; sending a fetch table address determined by the parent core to the scout core based on determining that content remote from the parent core is required; accessing a fetch table by the scout core, the fetch table indicated by the fetch table address, the fetch table indicating how many pieces of content are to be fetched by the scout core and a location of the pieces of content; based on the fetch table indicating, fetching the pieces of content by the scout core; and returning the fetched pieces of content to the parent core.
2. The computer program product of claim 1, wherein the fetch table is an address fetch table that includes a series of addresses indicating a specific location of where the pieces of content are stored on the computer ...
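
Assuming the "address fetch table" layout of claim 2 (a count followed by a series of addresses; the actual layout is not given here), the scout core's job reduces to a short loop:

def scout_fetch(memory, fetch_table_addr):
    """Read the fetch table, fetch each referenced piece of content,
    and hand the pieces back for return to the parent core."""
    count = memory[fetch_table_addr]
    addresses = [memory[fetch_table_addr + 1 + i] for i in range(count)]
    return [(a, memory[a]) for a in addresses]

memory = {100: 2, 101: 500, 102: 700, 500: "piece-A", 700: "piece-B"}
assert scout_fetch(memory, 100) == [(500, "piece-A"), (700, "piece-B")]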

18-01-2018 publication date

INFORMATION PROCESSING DEVICE, STORAGE MEDIUM, AND METHOD

Number: US20180018153A1
Inventor: MUKAI Yuta
Assignee: FUJITSU LIMITED

A device includes a processor configured to: divide loop in a program into first loop and second loop when compiling the program, the loop accessing data of an array and prefetching data of the array to be accessed at a repetition after prescribed repetitions at each repetition, the first loop including one or more repetitions from an initial repetition to a repetition immediately before the repetition after the prescribed repetitions, the second loop including one or more repetitions from the repetition after the prescribed repetitions to a last repetition, and generate an intermediate language code configured to access data of the array using a first region in a cache memory and prefetch data of the array using a second region in the cache memory in the first loop, and to access and prefetch data of the array using the second region in the second loop. 1. An information processing device comprising:a memory; and divide loop processing in a program into first loop processing and second loop processing when compiling the program, the loop processing accessing data of an array and prefetching data of the array to be accessed at a repetition processing after prescribed repetition processings at each repetition processing in the loop processing, the first loop processing including one or more repetition processings from an initial repetition processing to a repetition processing immediately before the repetition processing after the prescribed repetition processings, the second loop processing including one or more repetition processings from the repetition processing after the prescribed repetition processings to a last repetition processing, and', 'generate an intermediate language code based on the program when compiling the program, the intermediate language code being configured to access data of the array by using a first region in a cache memory and prefetch data of the array by using a second region in the cache memory in the first loop processing, and to ...
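
The described split can be illustrated directly in source form. A Python rendering of the transformed loop, with access/prefetch as stand-ins for the cache operations the compiler would generate and D as an assumed prefetch distance:

def access(elem, region): pass      # stand-ins for the generated cache
def prefetch(elem, region): pass    # accesses in the intermediate code

array, D = list(range(100)), 8      # D: prefetch distance (assumption)
N = len(array)

# First loop: iterations 0..D-1 read through cache region one while
# prefetching the data needed D iterations ahead into region two.
for i in range(min(D, N)):
    access(array[i], region=1)
    if i + D < N:
        prefetch(array[i + D], region=2)

# Second loop: iterations D..N-1 both access and prefetch via region two.
for i in range(D, N):
    access(array[i], region=2)
    if i + D < N:
        prefetch(array[i + D], region=2)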

16-01-2020 publication date

INSTRUCTION-BASED NON-DETERMINISTIC FINITE STATE AUTOMATA ACCELERATOR

Number: US20200019404A1
Assignee:

An example processing device includes a memory including a non-deterministic finite automata (NFA) buffer configured to store a plurality of instructions defining an ordered sequence of instructions of at least a portion of an NFA graph, the portion of the NFA graph comprising a plurality of nodes arranged along a plurality of paths. The NFA engine determines a current symbol and one or more subsequent symbols of a payload segment that satisfy a match condition specified by a subset of instructions of the plurality of instructions for a path of the plurality of paths and, in response to determining the current symbol and the one or more subsequent symbols of the payload segment that satisfy the match condition, outputs an indication that the payload data has resulted in a match.
1. A processing device comprising: a memory including a non-deterministic finite automata (NFA) buffer configured to store a plurality of instructions defining an ordered sequence of instructions of at least a portion of an NFA graph, the portion of the NFA graph comprising a plurality of nodes arranged along a plurality of paths; and a program counter storing a value defining a next instruction of the plurality of instructions; and a payload offset memory storing a value defining a position of a current symbol in an ordered sequence of symbols of a payload segment of payload data, the NFA engine further comprising a ... configured to: determine the current symbol and one or more subsequent symbols of the payload segment that satisfy a match condition specified by a subset of instructions of the plurality of instructions for a path of the plurality of paths, the subset of instructions comprising the next instruction and one or more subsequent instructions of the plurality of instructions; and in response to determining the current symbol and the one or more subsequent symbols of the payload segment that satisfy the match condition, output an indication that the payload data has resulted in a match.
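
Instruction-encoded NFA matching of this flavour is commonly realised as a small virtual machine whose pending paths each carry a program counter and a payload offset, mirroring the claimed registers. A compact sketch (the instruction set below is invented for illustration and assumes an acyclic program):

def nfa_match(program, payload):
    threads = [(0, 0)]                    # (program counter, payload offset)
    while threads:
        pc, off = threads.pop()
        op, arg = program[pc]
        if op == "match":
            return True                   # some path satisfied its condition
        if op == "split":                 # nondeterministic fork across paths
            threads.extend((t, off) for t in arg)
        elif op == "char" and off < len(payload) and payload[off] == arg:
            threads.append((pc + 1, off + 1))
    return False

# matches "ab" or "ac"
prog = [("char", "a"), ("split", [2, 4]), ("char", "b"), ("match", None),
        ("char", "c"), ("match", None)]
assert nfa_match(prog, "ac")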

26-01-2017 publication date

Operating a Pipeline Flattener in Order to Track Instructions for Complex Breakpoints

Number: US20170024217A1
Assignee:

A semiconductor device comprising a processor having a pipelined architecture and a pipeline flattener and a method for operating a pipeline flattener in a semiconductor device are provided. The processor comprises a pipeline having a plurality of pipeline stages and a plurality of pipeline registers that are coupled between the pipeline stages. The pipeline flattener comprises a plurality of trigger registers for storing a trigger, wherein the trigger registers are coupled between the pipeline stages. 1. A semiconductor device comprising a processor having a pipelined architecture and a pipeline flattener , the processor comprising a pipeline having a plurality of pipeline stages and a plurality of pipeline registers which are coupled between the pipeline stages , wherein the pipeline flattener comprises a plurality of trigger registers for storing a trigger , wherein the trigger registers are coupled between the pipeline stages and wherein the pipeline flattener is configured to set the trigger register of the pipeline stage receiving an instruction to a predetermined trigger value indicating that the received instruction is selected for debug tracking , forward the trigger through the trigger registers of the pipeline together with the received instruction , determine whether the trigger indicates that the assigned instruction is selected for debug tracking and if so , provide the tracked debug data to a debug unit of the semiconductor device.2. The semiconductor device according to claim 1 , wherein the trigger registers are coupled in parallel to the respective pipeline register for each pipeline stage.3. The semiconductor device according to claim 2 , wherein the pipeline flattener comprises a counter unit for providing a sequential counter to each instruction entering the pipeline and further comprises a plurality of counter registers which are configured to receive the counter from the counter unit claim 2 , wherein the counter registers are coupled in ...

28-01-2016 publication date

POWER MANAGEMENT SYSTEM, SYSTEM-ON-CHIP INCLUDING THE SAME AND MOBILE DEVICE INCLUDING THE SAME

Number: US20160026498A1
Inventor: LEE JAE-GON, SONG Jin-ook
Assignee:

A power management system controlling power for a plurality of functional blocks included in a system-on-chip includes a plurality of programmable nano controllers, an instruction memory and a signal map memory. The instruction memory is shared by the nano controllers and stores a plurality of instructions that are used by the nano controllers. The signal map memory is shared by the nano controllers and stores a plurality of signals that are provided to the functional blocks and are controlled by the nano controllers. A first nano controller among the plurality of nano controllers is programmed as a central sequencer. Second through n-th nano controllers among the plurality of nano controllers are programmed as first sub-sequencers that are dependent on the first nano controller. 1. A power management system for controlling power to a plurality of functional blocks included in a system-on-chip , the power management system comprising:a plurality of nano controllers including first through n-th nano controllers that are programmable nano controllers, where n is a natural number that is greater than or equal to two;an instruction memory that is shared by the plurality of nano controllers, the instruction memory including a plurality of instructions stored therein that are used by the plurality of nano controllers; anda signal map memory that is shared by the plurality of nano controllers, the signal map memory configured to store a plurality of signals that are provided to the plurality of functional blocks under the control of one or more of the plurality of nano controllers,wherein the first nano controller is programmed as a central sequencer, and the second through n-th nano controllers are programmed as first sub-sequencers that are dependent on the first nano controller.2. The power management system of claim 1 , wherein each of the plurality of nano controllers includes:an instruction address generator that is configured to generate a target instruction address ...

22-01-2015 publication date

Compiler-control Method for Load Speculation In a Statically Scheduled Microprocessor

Number: US20150026444A1
Assignee: Texas Instruments Inc

A statically scheduled processor compiler schedules a speculative load in the program before the data is needed. The compiler inserts a conditional instruction confirming or disaffirming the speculative load before the program behavior changes due to the speculative load. The condition is not based solely upon whether the speculative load address is correct but preferably includes dependence according to the original source code. The compiler may statically schedule two or more branches in parallel with orthogonal conditions.

25-01-2018 publication date

DETERMINING THE EFFECTIVENESS OF PREFETCH INSTRUCTIONS

Number: US20180024836A1
Assignee:

Effectiveness of prefetch instructions is determined. A prefetch instruction is executed to request that data be fetched into a cache of the computing environment. The effectiveness of the prefetch instruction is determined. This includes updating, based on executing the prefetch instruction, a cache directory of the cache. The updating includes recording, in the cache directory, effectiveness data relating to the data. The effectiveness data includes whether the data was installed in the cache based on the prefetch instruction. Additionally, the determining the effectiveness includes obtaining at least a portion of the effectiveness data from the cache directory, and using the at least a portion of effectiveness data to determine the effectiveness of the prefetch instruction.
1. A computer program product for facilitating processing within a computing environment, said computer program product comprising: a computer readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising: executing a prefetch instruction to request that data be fetched into a cache of the computing environment; and determining effectiveness of the prefetch instruction, the determining the effectiveness comprising: updating, based on executing the prefetch instruction, a cache directory of the cache, the updating comprising including, in the cache directory, effectiveness information relating to the data, the effectiveness information including whether the data was installed in the cache based on the prefetch instruction; obtaining at least a portion of the effectiveness information from the cache directory; and using the at least a portion of the effectiveness information to determine the effectiveness of the prefetch instruction.
2. The computer program product of claim 1, wherein based on executing the prefetch instruction and the data being missing from the cache, the data is installed ...
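
One plausible reading is that each directory entry carries an installed-by-prefetch flag plus a used-since-install flag, from which an effectiveness ratio can be derived. A sketch under that assumption (the patent's effectiveness information is broader than these two bits):

class Directory:
    def __init__(self):
        self.entries = {}   # line address -> {"by_prefetch": ..., "used": ...}

    def install(self, line, by_prefetch):
        self.entries[line] = {"by_prefetch": by_prefetch, "used": False}

    def access(self, line):
        if line in self.entries:
            self.entries[line]["used"] = True

    def effectiveness(self):
        """Fraction of prefetched lines actually used, or None."""
        pf = [e for e in self.entries.values() if e["by_prefetch"]]
        return sum(e["used"] for e in pf) / len(pf) if pf else None

d = Directory()
d.install(0x100, by_prefetch=True)
d.install(0x140, by_prefetch=True)
d.access(0x100)
assert d.effectiveness() == 0.5      # half the prefetched lines were used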

24-01-2019 publication date

DUAL DATA STREAMS SHARING DUAL LEVEL TWO CACHE ACCESS PORTS TO MAXIMIZE BANDWIDTH UTILIZATION

Number: US20190026111A1
Assignee:

A streaming engine employed in a digital data processor specifies fixed first and second read-only data streams. Corresponding stream address generators produce the addresses of data elements of the two streams. Corresponding stream head registers store the data elements next to be supplied to functional units for use as operands. The two streams share two memory ports. A toggling preference of stream to port ensures fair allocation. The arbiters permit one stream to borrow the other's interface when the other interface is idle. Thus one stream may issue two memory requests, one from each memory port, if the other stream is idle. This spreads the bandwidth demand for each stream across both interfaces, ensuring neither interface becomes a bottleneck.
1. A processing device comprising: a memory; first and second command queues each being configured to supply memory addresses; an arbiter coupled to the first command queue and to the second command queue to receive the memory addresses supplied by the first and second command queues, the arbiter being configured to select a memory address supplied by either the first command queue or the second command queue as a selected memory address based at least partially on a selected preference setting applied to the arbiter; and an interface coupled to the arbiter and to the memory, the interface being configured to submit a memory access request corresponding to the selected memory address to the memory.
2. The processing device of claim 1, further comprising an arbitration controller coupled to the arbiter, the arbitration controller being configured to apply to the arbiter a selected preference, wherein the selected preference setting of the arbiter is selected as one of a first preference setting in which the first command queue is a preferred command queue and the second command queue is a non-preferred command queue and a second preference setting in which the second command queue is a preferred command queue and the ...
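
The port arbitration can be modelled as two arbiters with opposite stream preferences, each willing to lend its port when its preferred stream has nothing pending. A toy Python cycle of that policy (toggling 'prefer' each cycle is left to the caller; all structure here is assumed):

def arbitrate(queues, prefer):
    """queues: two lists of pending addresses, one per stream.
    Port 0 prefers stream 'prefer', port 1 prefers the other stream;
    an idle stream's port is lent to the busy stream."""
    grants = []
    for port, stream in enumerate((prefer, 1 - prefer)):
        other = 1 - stream
        if queues[stream]:
            grants.append((port, stream, queues[stream].pop(0)))
        elif queues[other]:
            grants.append((port, other, queues[other].pop(0)))
    return grants

# stream 0 idle, stream 1 busy: stream 1 gets both ports this cycle
print(arbitrate([[], [0x100, 0x140]], prefer=0))
# -> [(0, 1, 256), (1, 1, 320)]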

28-01-2021 publication date

INSTRUCTION CACHE COHERENCE

Number: US20210026770A1
Assignee:

A data processing apparatus is provided, which includes a cache to store operations produced by decoding instructions fetched from memory. The cache is indexed by virtual addresses of the instructions in the memory. Receiving circuitry receives an incoming invalidation request that references a physical address in the memory. Invalidation circuitry invalidates entries in the cache where the virtual address corresponds with the physical address. Coherency is thereby achieved when using a cache that is indexed using virtual addresses.
1. A data processing apparatus comprising: a cache to store operations produced by decoding instructions fetched from memory, wherein the cache is indexed by virtual addresses of the instructions in the memory; receiving circuitry to receive an incoming invalidation request, wherein the incoming invalidation request references a physical address in the memory; and invalidation circuitry to invalidate entries in the cache where the virtual address corresponds with the physical address.
2. The data processing apparatus according to claim 1, comprising correspondence table storage circuitry to store indications of physical addresses of the instructions fetched from the memory.
3. The data processing apparatus according to claim 2, wherein, in response to the incoming invalidation request, the invalidation circuitry is adapted to determine a correspondence table index of the correspondence table storage circuitry that corresponds with the physical address referenced in the incoming invalidation request.
4. The data processing apparatus according to claim 3, wherein the cache is adapted to store one of the operations in association with one of the correspondence table indexes of the correspondence table storage circuitry containing one of the indications of the physical addresses of one of the instructions that decodes to the one of the operations.
5. The data processing apparatus according to claim 2, wherein the indications of the physical ...
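
A software analogue of the correspondence table makes the invalidation flow concrete: each virtually indexed entry records the index of a table slot holding its physical address, and a physical-address invalidation sweeps the matching slots. A sketch (the structures are assumptions, not the claimed circuitry):

class VirtuallyIndexedCache:
    def __init__(self):
        self.ct = []           # correspondence table of physical addresses
        self.entries = {}      # virtual address -> (ct index, decoded ops)

    def fill(self, va, pa, ops):
        self.ct.append(pa)
        self.entries[va] = (len(self.ct) - 1, ops)

    def invalidate_physical(self, pa):
        """Handle an invalidation request naming only a physical address."""
        stale = {i for i, p in enumerate(self.ct) if p == pa}
        self.entries = {va: e for va, e in self.entries.items()
                        if e[0] not in stale}

c = VirtuallyIndexedCache()
c.fill(va=0x7000, pa=0x100, ops=["uop-a"])
c.fill(va=0x8000, pa=0x100, ops=["uop-b"])   # alias of the same physical line
c.invalidate_physical(0x100)
assert c.entries == {}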

02-02-2017 publication date

HYBRID COMPUTING MODULE

Number: US20170031413A1
Assignee:

A hybrid system-on-chip provides a plurality of memory and processor die mounted on a semiconductor carrier chip that contains a fully integrated power management system that switches DC power at speeds that match or approach processor core clock speeds, thereby allowing the efficient transfer of data between off-chip physical memory and processor die.
1. A hybrid computing module comprising: a plurality of semiconductor die mounted upon a carrier comprising a substrate that provides signal communication between a plurality of said semiconductor die and passive circuit elements formed upon the carrier substrate; a fully integrated power management circuit module having a resonant gate transistor that switches electrical power at speeds that synchronously transfer data and digital process information sets between said plurality of semiconductor dies; at least one microprocessor die among the plurality of semiconductor die; and a memory bank; and at least one electro-optic interface.
2. The hybrid computing module of , wherein the hybrid computing module in includes an additional fully integrated power management module that is frequency off-stepped from the fully integrated power module to supply power to circuit elements at a slower switching speed.
3. The hybrid computing module of , wherein the additional fully integrated power management module in supplies power to a baseband processor.
4. The hybrid computing module of , wherein the plurality of semiconductor die in provide field programmability, main memory control/arbitration, application-specific, bus management, or analog-to-digital and/or digital-to-analog functionality.
5. The hybrid computing module of , wherein the microprocessor die in is a CPU or GPU.
6. The hybrid computing module of , wherein the microprocessor die in comprise multiple processing cores.
7. The hybrid computing module of , wherein the plurality of semiconductor die provide CPU and GPU functionality.
8. The hybrid computing module of , wherein ...

02-02-2017 publication date

APPARATUS WITH REDUCED HARDWARE REGISTER SET

Number: US20170031685A1
Inventor: CRASKE Simon John
Assignee:

An apparatus comprises processing circuitry for processing program instructions according to a predetermined architecture defining a number of architectural registers accessible in response to the program instructions. A set of hardware registers is provided in hardware. A storage capacity of the set of hardware registers is insufficient for storing all the data associated with the architectural registers of the pre-determined architecture. Control circuitry is responsive to the program instructions to transfer data between the hardware registers and at least one register emulating memory location in memory for storing data corresponding to the architectural registers of the architecture. 1. An apparatus comprising:processing circuitry to process program instructions in accordance with a predetermined architecture defining a plurality of architectural registers accessible in response to the program instructions; anda set of hardware registers, wherein a storage capacity of the set of hardware registers is insufficient for storing data associated with all of the plurality of architectural registers of the predetermined architecture; andcontrol circuitry responsive to the program instructions to transfer data between the set of hardware registers and at least one register emulating memory location in memory for storing data corresponding to at least one of the plurality of architectural registers of the predetermined architecture.2. The apparatus according to claim 1 , wherein in response to a program instruction specifying at least one source architectural register for storing at least one operand value to be processed in response to the program instruction claim 1 , the control circuitry is configured to trigger a read operation to read the at least one operand value from a register emulating memory location corresponding to said at least one source architectural register and to store the at least one operand value in at least one register of said set of hardware ...

02-02-2017 publication date

HYBRID COMPUTING MODULE

Number: US20170031843A1
Assignee:

A hybrid system-on-chip provides a plurality of memory and processor die mounted on a semiconductor carrier chip that contains a fully integrated power management system that switches DC power at speeds that match or approach processor core clock speeds, thereby allowing the efficient transfer of data between off-chip physical memory and processor die.
1. A general purpose computational operating system that comprises a hybrid computer module, which further includes: a semiconductor chip carrier having electrical traces and passive component networks monolithically formed on the surface of the carrier substrate to maintain and manage electrical signal communications between: a microprocessor die mounted on the chip carrier; a memory bank consisting of at least one discrete memory die mounted on the semiconductor chip carrier adjacent to the microprocessor die; a fully integrated power management module having a resonant gate transistor embedded within it that synchronously transfers data from main memory to the microprocessor at the processor clock speed; a memory management architecture and operating system that compiles program stacks as a collection of pointers to the addresses where elemental code blocks are stored in main memory; a memory controller that sequentially references the pointers stored within the program stacks and fetches a copy of the program stack item referenced by the pointer from main memory and loads the copy into a microprocessor die; an interrupt bus that halts the loading process when an alert to a program jump or change to a global variable is registered and sends a memory management variable to a look-up table; a look-up table that redirects the controller to a new program stack following a program jump before it reinitiates the loading process; a look-up table that fetches and stores the change to a global variable at its primary location in main memory before it reinitiates the loading process; wherein program ...

02-02-2017 publication date

HYBRID COMPUTING MODULE

Number: US20170031844A1
Assignee:

A hybrid system-on-chip provides a plurality of memory and processor die mounted on a semiconductor carrier chip that contains a fully integrated power management system that switches DC power at speeds that match or approach processor core clock speeds, thereby allowing the efficient transfer of data between off-chip physical memory and processor die.
1. A general purpose stack machine computing module having an operating system that comprises: a hybrid computer module, which includes: an application-specific integrated circuit (ASIC) processor die mounted on the chip carrier that is designed with machine code that matches and supports a structured programming language so it functions as the general purpose stack machine processor; a main memory bank consisting of at least one discrete memory die mounted on the semiconductor chip carrier adjacent to the ASIC processor die; a fully integrated power management module having a resonant gate transistor embedded within it that synchronously transfers data from main memory to the ASIC processor die at the processor clock speed; a memory management architecture and operating system that compiles program stacks as a collection of pointers to the addresses where elemental code blocks are stored at a primary location in main memory; a memory controller that sequentially references the pointers stored within the program stacks and fetches a copy of the item referenced by the pointer in the program stack from main memory and loads the copy into a microprocessor die; an interrupt bus that halts the loading process when an alert to a program jump or change to a global variable is registered and sends a memory management variable to a look-up table; a look-up table that redirects the controller to a new program stack following a program jump before it reinitiates the loading process; a look-up table that fetches and stores the change to a global variable at its primary location in main memory before it ...

04-02-2016 publication date

Picoengine having a hash generator with remainder input s-box nonlinearizing

Number: US20160034278A1
Inventor: Gavin J. Stark
Assignee: Netronome Systems Inc

A processor includes a hash register and a hash generating circuit. The hash generating circuit includes a novel programmable nonlinearizing function circuit as well as a modulo-2 multiplier, a first modulo-2 summer, a modulo-2 divider, and a second modulo-2 summer. The nonlinearizing function circuit receives a hash value from the hash register and performs a programmable nonlinearizing function, thereby generating a modified version of the hash value. In one example, the nonlinearizing function circuit includes a plurality of separately enableable S-box circuits. The multiplier multiplies the input data by a programmable multiplier value, thereby generating a product value. The first summer sums a first portion of the product value with the modified hash value. The divider divides the resulting sum by a fixed divisor value, thereby generating a remainder value. The second summer sums the remainder value and the second portion of the input data, thereby generating a hash result.
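
All four arithmetic blocks are GF(2) operations, so the datapath can be checked in software. The sketch below uses assumed 8/16-bit widths, a stand-in S-box, and for brevity feeds the whole product (rather than only its first portion) to the first summer:

def clmul(a, b):
    """Modulo-2 (carry-less) multiplication."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a, b = a << 1, b >> 1
    return result

def mod2_rem(value, divisor):
    """Remainder of modulo-2 division, as in a CRC."""
    dlen = divisor.bit_length()
    while value.bit_length() >= dlen:
        value ^= divisor << (value.bit_length() - dlen)
    return value

SBOX = [(i * 7 + 3) % 16 for i in range(16)]    # stand-in programmable S-box

def nonlinearize(h8):
    """Apply the S-box to each nibble of the 8-bit hash register value."""
    return (SBOX[h8 >> 4] << 4) | SBOX[h8 & 0xF]

def hash_step(h8, data16, mult=0x1D, divisor=0x107):
    hi, lo = data16 >> 8, data16 & 0xFF   # two portions of the input data
    product = clmul(hi, mult)             # modulo-2 multiplier
    mixed = product ^ nonlinearize(h8)    # first modulo-2 summer
    return mod2_rem(mixed, divisor) ^ lo  # divider, then second summer

print(hex(hash_step(0xAB, 0x1234)))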

01-02-2018 publication date

OUT-OF-ORDER BLOCK-BASED PROCESSOR

Number: US20180032344A1
Assignee: Microsoft Technology Licensing, LLC

Technology related to out-of-order processor architectures is disclosed. In one example of the disclosed technology, a processor includes decode logic and issue logic. The decode logic is configured to decode a store mask of an instruction block. The instruction block can include load and store instructions. Each load and store instruction includes an identifier specifying a relative program order of the load or store instruction within the instruction block. The store mask identifies positions of the store instructions within the program order of the instruction block. The issue logic is configured to issue at least one of the instructions of the instruction block out of program order. The issue logic can be configured to use the decoded store mask to only issue load instructions after all store instructions preceding the load instructions have issued. 1. A processor comprising:decode logic configured to decode a store mask of an instruction block, the instruction block comprising load and store instructions, each load and store instruction including an identifier specifying a relative program order of the instruction within the instruction block, the store mask identifying positions of the store instructions within the program order of the instruction block; andissue logic configured to issue at least one instruction of the instruction block out of program order, and to use the decoded store mask to only issue load instructions after all store instructions preceding the load instructions have issued.2. The processor of claim 1 , wherein the issue logic is further configured to issue each store instruction in program order relative to other store instructions of the instruction block.3. The processor of claim 1 , wherein the issue logic is further configured to issue all load and store instructions of the instruction block in the sequential program order specified by the identifiers of the load and store instructions.4. The processor of claim 1 , wherein the decode ...
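
The store-mask gating rule is compact enough to state as a predicate: a load (or store) may issue only once every store holding an earlier position in the block has issued. A sketch with instruction ids standing in for the claimed relative-program-order identifiers:

def may_issue(instr_id, kind, store_mask, issued):
    """store_mask: set of instruction ids the decoded mask marks as
    stores. Loads wait for every earlier store; stores also stay in
    program order relative to other stores; other ops are unconstrained."""
    if kind in ("load", "store"):
        return all(s in issued for s in store_mask if s < instr_id)
    return True

store_mask = {1, 3}                 # decoded once per instruction block
issued = set()
assert not may_issue(2, "load", store_mask, issued)   # store 1 still pending
issued.add(1)
assert may_issue(2, "load", store_mask, issued)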

17-02-2022 publication date

HANDLING AND FUSING LOAD INSTRUCTIONS IN A PROCESSOR

Number: US20220050679A1
Assignee:

A system, processor, and/or technique configured to: determine whether two or more load instructions are fusible for execution in a load store unit as a fused load instruction; in response to determining that two or more load instructions are fusible, transmit information to process the two or more fusible load instructions into a single entry of an issue queue; issue the information to process the two or more fusible load instructions from the single entry in the issue queue as a fused load instruction to the load store unit using a single issue port of the issue queue, wherein the fused load instruction contains the information to process the two or more fusible load instructions; execute the fused load instruction in the load store unit; and write back data obtained by executing the fused load instruction simultaneously to multiple entries in the register file. 1. A computer system for processing information , the computer system comprising:at least one processor having circuitry and logic to process instructions, the processor comprising:an instruction fetch unit having circuitry and logic to process instructions, the instruction fetch unit configured to fetch instructions;an instruction issue unit having circuitry and logic to process instructions, the instruction issue unit having an issue queue having a plurality of entries to hold the instructions and a plurality of issue ports to issue the instructions held in one or more of the plurality of issue queue entries;one or more execution units having circuitry and logic to process instructions, the one or more execution units including a load store unit to process one or more load and store instructions; anda register file to hold data for processing by the processor, the register file having a plurality of entries to hold the data,wherein the processor is configured to:determine whether two or more load instructions are fusible for execution in the load store unit as a fused load instruction;in response to ...

31-01-2019 publication date

PARALLEL PROCESSING OF FETCH BLOCKS OF DATA

Number: US20190034205A1
Assignee:

A data processing system comprises fetch circuitry to fetch data as a sequence of blocks of data from a memory. Processing circuitry comprising a plurality of processing pipelines performs at least partially temporally overlapping processing by at least two processes so as to produce respective results for the combined sequence of blocks; that is, the processing of the data is performed block-by-block, at least partially in parallel, by the two processing pipelines. The processes performed may comprise a cryptographic hash processing operation performing verification of the data file and an AES MAC process serving to re-sign the data file.
1. Apparatus for processing data, comprising: fetch circuitry to fetch data as a sequence of blocks of data; and processing circuitry to subject a fetched block of data from among said sequence to at least partially temporally overlapping processing by at least two processes, and, for each of said at least two processes, to generate a result of said processing for said sequence.
2. Apparatus as claimed in claim 1, wherein said processing circuitry comprises at least two processing pipelines to subject said fetched data block to parallel processing to perform respective ones of said at least two processes.
3. Apparatus as claimed in claim 2, wherein said processing circuitry comprises synchronization circuitry to pause advancement to process a next fetched block of data by at least one of said at least two processing pipelines that completes processing of said fetched data block while another of said at least two processing pipelines continues to process said fetched data block.
4. Apparatus as claimed in claim 1, wherein said processing circuitry comprises at least two general purpose processors executing respective streams of program instructions to subject said fetched data block to parallel processing to perform respective ones of said plurality of processes.
5. Apparatus as claimed in claim 4, wherein at least one of ...
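
With SHA-256 standing in for the verification hash and HMAC-SHA256 standing in for the AES-based MAC (both substitutions, chosen purely so the sketch runs with the standard library), the block-by-block dual-process structure looks like this:

import hashlib
import hmac

KEY = b"re-signing-key"     # hypothetical key material

def process(blocks):
    h = hashlib.sha256()                          # verification pipeline
    m = hmac.new(KEY, digestmod=hashlib.sha256)   # re-signing MAC pipeline
    for block in blocks:        # each fetched block feeds both processes;
        h.update(block)         # real hardware overlaps these two updates
        m.update(block)         # in time, here they simply alternate
    return h.hexdigest(), m.hexdigest()

digest, mac = process([b"block-0", b"block-1"])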

30-01-2020 publication date

Super-Thread Processor

Number: US20200034153A1
Inventor: Halle Kevin Sean
Assignee:

The disclosed inventions include a processor apparatus and method that enable a general purpose processor to achieve twice the operating frequency of typical processor implementations with a modest increase in area and a modest increase in energy per operation. The invention relies upon exploiting multiple independent streams of execution. Low-area and low-energy memory arrays used for register files operate at a modest frequency. Instructions can be issued at a rate higher than this frequency by including logic that guarantees that instructions from the same thread are spaced wider apart than the time to access the register file. The result of the invention is the ability to overlap long-latency structures, which allows using lower-energy structures, thereby reducing energy per operation.
1. A processor system comprising: a pipeline including an execution unit; an instruction cache; a context unit including a plurality of rows, each of the plurality of rows being assigned to an independent thread, each of the plurality of rows including at least one program counter, each of the plurality of rows including storage configured to store one or more instructions, each of the plurality of rows including logic configured to fetch the one or more instructions from the instruction cache, and each of the plurality of rows including logic configured to determine when an instruction is ready to be issued to the pipeline from the respective row; and issue logic configured to select a row from among the plurality of rows and to issue an instruction from the selected row to the pipeline.
2. The system of claim 1, wherein the logic configured to determine when an instruction is ready is responsive to a history of past actions in the respective row.
3. The system of claim 1, wherein the logic configured to determine when an instruction is ready is responsive to a history of past actions in the respective row.
4. The system of claim 1, wherein each of the ...

04-02-2021 publication date

PROCESSING UNIT, PROCESSOR, PROCESSING SYSTEM, ELECTRONIC DEVICE AND PROCESSING METHOD

Number: US20210034364A1
Inventor: LI Yudong
Assignee:

The present application discloses a method, electronic device, processing unit, processing system, and system for processing operations. The method includes reading instruction information from an instruction tightly-coupled memory, reading data information from a data tightly-coupled memory, and executing one or more operations corresponding to one or more instructions, the one or more instructions being executed based at least in part on the instruction information and the data information. 1. A processing unit , comprising:an instruction tightly-coupled memory that is configured to store instruction information and not data information;a data tightly-coupled memory that is configured to store data information and not instruction information; anda processor core, the processor core being configured to execute one or more instructions, wherein in connection with executing the one or more instructions, the processor core reads instruction information from the instruction tightly-coupled memory, and reads data information from the data tightly-coupled memory.2. The processing unit of claim 1 , wherein the instruction tightly-coupled memory stores only the instruction information claim 1 , and the data tightly-coupled memory stores only the data information.3. The processing unit of claim 2 , wherein the instruction information indicates one or more operations to be executed claim 2 , and the data information indicates one or more operands corresponding to the one or more operations indicated by the instruction information.4. The processing unit of claim 1 , wherein a capacity of the instruction tightly-coupled memory is less than a capacity of the data tightly-coupled memory.5. The processing unit of claim 4 , wherein the capacity of the instruction tightly-coupled memory is 64 kb claim 4 , and the capacity of the data tightly-coupled memory is 128 kb.6. The processing unit of claim 1 , wherein:the instruction tightly-coupled memory stores instruction information of ...

04-02-2021 publication date

CONTROLLER ADDRESS CONTENTION ASSUMPTION

Number: US20210034438A1
Assignee:

Embodiments of the present invention are directed to a computer-implemented method for controller address contention assumption. A non-limiting example computer-implemented method includes a shared controller receiving a fetch request for data from a first requesting agent, the receiving via at least one intermediary controller. The shared controller performs an address compare using a memory address of the data. In response to the memory address matching a memory address stored in the shared controller, the shared controller acknowledges the at least one intermediary controller's fetch request, wherein upon acknowledgement, the at least one intermediary controller resets. In response to release of the data by a second requesting agent, the shared controller transmits the data to the first requesting agent. 1. A computer-implemented method comprising:receiving, by a shared controller, a fetch request for data from a first requesting agent, the receiving via at least one intermediary controller;performing, by the shared controller, an address compare using a memory address of the data;in response to the memory address matching a memory address stored in the shared controller, acknowledging, by the shared controller, the at least one intermediary controller's fetch request, wherein upon acknowledgement, the at least one intermediary controller resets; andin response to release of the data by a second requesting agent, transmitting, by the shared controller, the data to the first requesting agent.2. The computer-implemented method of claim 1 , wherein the acknowledging comprises:exchanging tokens by the shared controller and the at least one intermediary controller, wherein the at least one intermediary controller transmits an identity of the first requesting agent and a type of operation associated with the requested data, and wherein the shared controller transmits an acceptance.3. The computer-implemented method of claim 2 , the at least one intermediary controller ...

04-02-2021 publication date

Network Interface Device

Number: US20210034526A1
Assignee: Solarflare Communications, Inc.

A network interface device comprises a programmable interface configured to provide a device interface with at least one bus between the network interface device and a host device. The programmable interface is programmable to support a plurality of different types of a device interface. 1. A network interface device comprising:a programmable interface configured to provide a device interface with at least one bus between the network interface device and a host device, the programmable interface being programmable to support a plurality of different types of a device interface.2. The network interface device as claimed in claim 1 , wherein said programmable interface is configured to support at least two instances of device interfaces at the same time.3. The network interface device as claimed in claim 2 , wherein the programmable interface comprises a common descriptor cache claim 2 , said common descriptor cache configured to store respective entries for transactions for the plurality of device interface instances.4. The network interface device as claimed in claim 3 , wherein an entry in said common descriptor cache comprises one or more of:pointer information;adapter instance and/or opaque endpoint index; ormetadata.5. The network interface device as claimed in claim 4 , wherein said metadata comprises one or more of:an indication if said pointer is a pointer, is a pointer to a data location or to a further pointer;a size associated with at least a part of said entry;an indication of an adaptor associated with said entry;an indication of one or more queues; anda location in one or more queues.6. The network interface device as claimed in claim 3 , wherein the common descriptor cache is at least partly partitioned with different partitions being associated with different device interface instances.7. The network interface device as claimed in claim 3 , wherein the common descriptor cache is shared between different device interface instances.8. The network ...

08-02-2018 publication date

Selective suppression of instruction translation lookaside buffer (ITLB) access

Number: US20180039499A1
Assignee: International Business Machines Corp

Processing of an instruction fetch from an instruction cache is provided, which includes: determining whether the next instruction fetch is from a same address page as a last instruction fetch from the instruction cache; and based, at least in part, on determining that the next instruction fetch is from the same address page, suppressing for the next instruction fetch an instruction address translation table access, and comparing for an address match results of an instruction directory access for the next instruction fetch with buffered results of a most-recent, instruction address translation table access for a prior instruction fetch from the instruction cache.
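
The suppression logic amounts to buffering the last translation and consulting the ITLB only on a page change. A minimal model, assuming 4 KiB pages and a toy translation function (both assumptions):

PAGE = 4096

class ITLBGate:
    def __init__(self, itlb):
        self.itlb = itlb        # callable: virtual page -> physical page
        self.last_vpage = None
        self.buffered = None    # buffered translation of the last fetch

    def translate(self, fetch_addr):
        vpage = fetch_addr // PAGE
        if vpage != self.last_vpage:           # ITLB accessed only when the
            self.buffered = self.itlb(vpage)   # fetch leaves the last page
            self.last_vpage = vpage
        return self.buffered * PAGE + fetch_addr % PAGE

gate = ITLBGate(itlb=lambda vp: vp + 100)      # toy translation function
gate.translate(0x1000)
gate.translate(0x1040)          # same page: ITLB access suppressed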

08-02-2018 publication date

QUALITY OF LOCATION-RELATED CONTENT

Number: US20180039855A1
Assignee:

Briefly, example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to facilitate and/or support one or more operations and/or techniques for improving quality of location-related content, such as truncated location-related content, for example, while preserving and/or maintaining user privacy. 1. A method of executing computer instructions on at least one computing device in which the at least one computing device includes at least one processor and at least one memory , comprising:fetching computer instructions from the at least one memory of the at least one computing device for execution on the at least one processor of the at least one computing device;executing the fetched computer instructions on the at least one processor of the at least one computing device; andstoring in the at least one memory of the at least one computing device any results of having executed the fetched computer instructions on the at least one processor of the at least one computing device;wherein the computer instructions to be executed comprise instructions for improving quality of location-related content; converting location signals to image signal samples; and', 'performing image processing on the converted image signal samples; and, 'wherein the executing the fetched computer instructions further compriseswherein the storing in the at least one memory of the at least one computing device any results of having executed the fetched computer instructions on the at least one processor of the at least one computing device comprises: storing image processing results details in the at least one memory of the at least one computing device, the image processing results details resulting from the execution of the computer simulation on the at least one processor of the at least one computing device.2. The method of claim 1 , wherein the converting the location signals to the image signal ...

24-02-2022 publication date

PREFETCH MANAGEMENT IN A HIERARCHICAL CACHE SYSTEM

Number: US20220058127A1
Assignee:

An apparatus includes a CPU core, a first memory cache with a first line size, and a second memory cache having a second line size larger than the first line size. Each line of the second memory cache includes an upper half and a lower half. A memory controller subsystem is coupled to the CPU core and to the first and second memory caches. Upon a miss in the first memory cache for a first target address, the memory controller subsystem determines that the first target address resulting in the miss maps to the lower half of a line in the second memory cache, retrieves the entire line from the second memory cache, and returns the entire line from the second memory cache to the first memory cache.
1. A device comprising: a first memory cache that includes a first set of memory lines; a second memory cache that includes a second set of memory lines, wherein each line of the second set of memory lines includes a first portion having a size equal to a line size of the first set of memory lines and a second portion having a size equal to the line size of the first set of memory lines; and a memory controller coupled to the first memory cache and the second memory cache, wherein the memory controller is configured to: receive an access request for a set of data units directed to the first memory cache; and for a first data unit of the set of data units: determine whether the first data unit misses in the first memory cache; and based on the first data unit missing in the first memory cache: determine that the first data unit is associated with the first portion of a first line of the second set of memory lines of the second memory cache; and determine, based on an order of the first data unit in the set of data units, whether to request data stored in the second portion of the first line of the second set of memory lines of the second memory cache.
2. The device of claim 1, ...
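
The half-line decision can be phrased in a few lines: compute the containing L2 line, test which half the missing address falls in, and return the whole line on a lower-half miss. A sketch with assumed 64/128-byte line sizes:

L1_LINE, L2_LINE = 64, 128

def on_l1_miss(addr, l2_lines):
    """l2_lines: map from L2 line base address to 128 bytes of data.
    A miss mapping to the lower half returns the entire L2 line, so the
    L1 can be filled with both halves from a single access."""
    base = addr - addr % L2_LINE
    if base not in l2_lines:
        return None                       # would be fetched from memory
    if addr % L2_LINE < L1_LINE:          # lower half: return whole line
        return l2_lines[base]
    return l2_lines[base][L1_LINE:]       # upper half only

l2 = {0: bytes(range(128))}
assert len(on_l1_miss(0x20, l2)) == 128   # lower-half miss: both halves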

07-02-2019 publication date

SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO TRANSPOSE RECTANGULAR TILES

Number: US20190042202A1
Assignee:

Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix. 1. A processor comprising:fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix;decode circuitry to decode the fetched rectangular matrix transpose instruction; andexecution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.2. The processor of claim 1 , wherein the specified first and second destination matrices together define a square matrix of elements.3. The processor of claim 1 , wherein the specified first and second destination matrices are disposed at contiguous locations.4. The processor of claim 3 , ...
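A scalar C++ model of the transpose semantics the instruction implements (row i of a source becomes column i of the corresponding destination); the dimensions and element type are illustrative:

    #include <vector>

    using Matrix = std::vector<std::vector<int>>;

    // Each row of the source is written as a column of the destination.
    Matrix Transpose(const Matrix& src) {
        size_t rows = src.size(), cols = src[0].size();
        Matrix dst(cols, std::vector<int>(rows));
        for (size_t r = 0; r < rows; ++r)
            for (size_t c = 0; c < cols; ++c)
                dst[c][r] = src[r][c];  // row r -> column r
        return dst;
    }

Applying this to the two specified rectangular sources and destinations yields the pair of rectangles that, per claim 2, together tile a square.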

07-02-2019 publication date

MATRIX MULTIPLICATION ACCELERATION OF SPARSE MATRICES USING COLUMN FOLDING AND SQUEEZING

Number: US20190042237A1
Assignee:

Disclosed embodiments relate to sparse matrix multiplication (SMM) acceleration using column folding and squeezing. In one example, a processor, in response to a SMM instruction having fields to specify locations of first, second, and output matrices, the second matrix being a sparse matrix, uses execution circuitry to pack the second matrix by replacing one or more zero-valued elements with non-zero elements yet to be processed, each of the replaced elements further including a field to identify its logical position within the second matrix, and, the execution circuitry further to, for each non-zero element at row M and column K of the specified first matrix, generate a product of the element and each corresponding non-zero element at row K, column N of the packed second matrix, and accumulate each generated product with a previous value of a corresponding element at row M and column N of the specified output matrix. 1. A processor to execute a sparse matrix multiplication (SMM) instruction comprising:fetch and decode circuitry to fetch and decode the SMM instruction having fields to specify locations of first, second, and output matrices, the specified second matrix being a sparse matrix, the fetch circuitry further to fetch and store elements of the specified first and second matrices from their specified locations into a register file; andexecution circuitry, responsive to the decoded SMM instruction, to pack the second matrix stored in the register file by replacing one or more zero-valued elements with non-zero elements yet to be processed, each of the replaced elements further including a field to identify its logical position within the second matrix, and, the execution circuitry further to, for each non-zero element at row M and column K of the specified first matrix, generate a product of the non-zero element and each corresponding non-zero element at row K and column N of the packed second matrix, and accumulate each generated product with a previous ...
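A simplified C++ sketch of the packing idea (my reading of the claims, folded to one column for clarity): zeros are dropped, each surviving element keeps a field naming its logical row, and the multiply-accumulate uses that field to pick the matching multiplicand:

    #include <vector>

    // One packed element: a value plus the logical row it came from,
    // mirroring the claim's "field to identify its logical position".
    struct Packed { int value; int row; };

    // Fold a sparse column: drop zeros, keep (value, original-row) pairs.
    std::vector<Packed> PackColumn(const std::vector<int>& col) {
        std::vector<Packed> out;
        for (int r = 0; r < (int)col.size(); ++r)
            if (col[r] != 0) out.push_back({col[r], r});
        return out;
    }

    // C[m][n] += A[m][p.row] * p.value for each packed element p:
    // only the non-zero work survives the packing.
    void MacColumn(const std::vector<std::vector<int>>& A,
                   const std::vector<Packed>& packedB, int n,
                   std::vector<std::vector<int>>& C) {
        for (size_t m = 0; m < A.size(); ++m)
            for (const Packed& p : packedB)
                C[m][n] += A[m][p.row] * p.value;
    }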

07-02-2019 publication date

INSTRUCTIONS FOR FUSED MULTIPLY-ADD OPERATIONS WITH VARIABLE PRECISION INPUT OPERANDS

Number: US20190042242A1
Assignee:

Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits. 1. A processor to execute an asymmetric fused multiply-add (FMA) instruction , comprising:fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively;decode circuitry to decode the fetched FMA instruction; anda single instruction multiple data (SIMD) execution circuit to execute the decoded FMA instruction to process as many elements of the second source vector as fit into a SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination;wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.2. The processor of claim 1 , wherein the SIMD execution circuit processes the as many elements concurrently.3. The processor of claim 1 , wherein the SIMD execution circuit processes the as many elements in a single clock cycle.4. The ...
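A scalar C++ model of one SIMD lane of the asymmetric FMA (widths here are one allowed combination from the abstract, 8-bit by 2-bit elements into a 32-bit accumulator; the storage types are assumptions of the sketch):

    #include <cstdint>
    #include <vector>

    // How many second-source elements of elemBits fit in one lane,
    // e.g. a 32-bit lane holds sixteen 2-bit elements.
    int ElementsPerLane(int laneBits, int elemBits) {
        return laneBits / elemBits;
    }

    // Multiply each narrow element by its wider counterpart and
    // accumulate with the previous contents of the destination.
    int32_t FmaLane(const std::vector<int8_t>& src1,  // 8-bit elements
                    const std::vector<int8_t>& src2,  // 2-bit values
                    int32_t dest) {
        for (size_t i = 0; i < src2.size(); ++i)
            dest += int32_t(src1[i]) * int32_t(src2[i]);
        return dest;
    }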

07-02-2019 publication date

METHOD AND APPARATUS FOR EFFICIENT MATRIX TRANSPOSE

Number: US20190042248A1
Assignee:

Disclosed embodiments relate to a method and apparatus for efficient matrix transpose. In one example, a processor to execute a matrix transpose instruction includes fetch circuitry to fetch the matrix transpose instruction specifying a destination matrix and a source matrix having (N×M) elements and (M×N) elements, respectively, a (N×M) load buffer, decode circuitry to decode the fetched matrix transpose instruction, and execution circuitry, responsive to the decoded matrix transpose instruction to, for each row X of M rows of the specified source matrix: fetch and buffer N elements of the row in a load register, and cause the N buffered elements to be written, in the same relative order as in the row, to column X of M columns of the load buffer, and the execution circuitry subsequently to write each of N rows of the load buffer to a same row of the specified destination matrix. 1. A processor to execute a matrix transpose instruction, the processor comprising: fetch circuitry to fetch the matrix transpose instruction specifying a destination matrix and a source matrix having (N×M) elements and (M×N) elements, respectively; a (N×M) load buffer; decode circuitry to decode the fetched matrix transpose instruction; and execution circuitry, responsive to the decoded matrix transpose instruction to, for each row X of M rows of the specified source matrix: fetch and buffer N elements of the row in a load register; and cause the N buffered elements to be written, in the same relative order as in the row, to column X of M columns of the load buffer; the execution circuitry subsequently to write each of N rows of the load buffer to a same row of the specified destination matrix. 2. The processor of claim 1, wherein the load buffer comprises a (N×M) matrix of registers within a reorder buffer of the processor, and wherein the execution circuitry is to: generate an intermediate transposed result by causing each of the M buffered rows to be written to a corresponding column M of ...
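A small C++ model of the two-phase staging the claim describes, with the load buffer as an explicit intermediate (types and sizes illustrative): rows are buffered and written into buffer columns, then buffer rows stream out as destination rows:

    #include <vector>

    // Phase 1: row x of the source lands in column x of the buffer.
    // Phase 2: the buffer's rows are already the transposed rows.
    std::vector<std::vector<int>> TransposeViaBuffer(
            const std::vector<std::vector<int>>& src) {      // M x N
        size_t M = src.size(), N = src[0].size();
        std::vector<std::vector<int>> buf(N, std::vector<int>(M));
        for (size_t x = 0; x < M; ++x)
            for (size_t i = 0; i < N; ++i)
                buf[i][x] = src[x][i];  // same relative order as the row
        return buf;  // each buffer row is written to a destination row
    }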

07-02-2019 publication date

Systems and methods for performing matrix compress and decompress instructions

Number: US20190042257A1
Assignee: Intel Corp

Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
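A C++ sketch of the first compress variant the abstract names (pack non-zeros, record each element's matrix position in a header), flattened to a 1-D view of the matrix for brevity:

    #include <cstdint>
    #include <vector>

    struct Compressed {
        std::vector<int> values;       // packed non-zero elements
        std::vector<uint32_t> header;  // flat position of each value
    };

    Compressed Compress(const std::vector<int>& flat) {
        Compressed out;
        for (uint32_t i = 0; i < flat.size(); ++i)
            if (flat[i] != 0) {
                out.values.push_back(flat[i]);
                out.header.push_back(i);  // remember where it came from
            }
        return out;
    }

    std::vector<int> Decompress(const Compressed& c, size_t n) {
        std::vector<int> flat(n, 0);              // zeros by default
        for (size_t k = 0; k < c.values.size(); ++k)
            flat[c.header[k]] = c.values[k];      // scatter by header
        return flat;
    }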

07-02-2019 publication date

Systems and methods for performing instructions specifying ternary tile logic operations

Number: US20190042260A1
Assignee: Intel Corp

Disclosed embodiments relate to systems and methods for performing instructions specifying ternary tile operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction specifying a ternary tile operation, and locations of destination and first, second, and third source matrices, each of the matrices having M rows by N columns; and execution circuitry to respond to the decoded instruction by, for each equal-sized group of K elements of the specified first, second, and third source matrices, generate K results by performing the ternary tile operation in parallel on K corresponding elements of the specified first, second, and third source matrices, and store each of the K results to a corresponding element of the specified destination matrix, wherein corresponding elements of the specified source and destination matrices occupy a same relative position within their associated matrix.
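A scalar C++ model of one group of K elements; the particular ternary operation (a bitwise majority, A&B | A&C | B&C) is an illustrative choice, since the abstract leaves the operation unspecified:

    #include <vector>

    // K results from K corresponding elements of the three sources;
    // hardware would compute these in parallel across the group.
    void TernaryTile(const std::vector<int>& a, const std::vector<int>& b,
                     const std::vector<int>& c, std::vector<int>& dst) {
        for (size_t i = 0; i < a.size(); ++i)
            dst[i] = (a[i] & b[i]) | (a[i] & c[i]) | (b[i] & c[i]);
    }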

07-02-2019 publication date

SYSTEMS AND METHODS FOR PERFORMING HORIZONTAL TILE OPERATIONS

Number: US20190042261A1
Assignee:

Disclosed embodiments relate to systems and methods for performing instructions specifying horizontal tile operations. In one example, a processor includes fetch circuitry to fetch an instruction specifying a horizontal tile operation, a location of a M by N source matrix comprising K groups of elements, and locations of K destinations, wherein each of the K groups of elements comprises the same number of elements, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction by generating K results, each result being generated by performing the specified horizontal tile operation across every element of a corresponding group of the K groups, and writing each generated result to a corresponding location of the K specified destination locations. 1. A processor comprising: fetch circuitry to fetch an instruction specifying a horizontal tile operation, a location of a M by N source matrix comprising K groups of elements, and locations of K destinations, wherein each of the K groups of elements comprises an equal number of elements; decode circuitry to decode the fetched instruction; and execution circuitry to respond to the decoded instruction by generating K results, each result being generated by performing the specified horizontal tile operation across every element of a corresponding group of the K groups of elements and writing each generated result to a corresponding location of the K specified destination locations. 2. The processor of claim 1, wherein the horizontal tile operation is one of add, add-squares, multiply, maximum, minimum, logical AND, logical OR, and logical XOR. 3. The processor of claim 1, wherein N equals one and the specified M by N source matrix is a packed-data vector comprising M elements. 4. The processor of claim 1, wherein either K equals M and each of the K groups of elements is a row of the specified M by N source matrix, or K ...
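A C++ sketch of one horizontal tile operation (add), taking each row of an M by N tile as a group, so K equals M, which is one of the groupings the claims allow:

    #include <vector>

    // One result per group: the chosen operation reduced across
    // every element of that group.
    std::vector<int> HorizontalAdd(
            const std::vector<std::vector<int>>& tile) {
        std::vector<int> results;
        for (const auto& row : tile) {
            int acc = 0;
            for (int v : row) acc += v;  // reduce across the group
            results.push_back(acc);      // written to destination K
        }
        return results;
    }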

07-02-2019 publication date

PAUSE COMMUNICATION FROM I/O DEVICES SUPPORTING PAGE FAULTS

Number: US20190042461A1
Assignee:

A processing device includes a core to execute instructions, and memory management circuitry coupled to memory, the core and an I/O device that supports page faults. The memory management circuitry includes an express invalidations circuitry, and a page translation permission circuitry. The memory management circuitry is to, while the core is executing the instructions, receive a command to pause communication between the I/O device and the memory. In response to receiving the command to pause the communication, modify permissions of page translations by the page translation permission circuitry and transmit an invalidation request, by the express invalidations circuitry to the I/O device, to cause cached page translations in the I/O device to be invalidated. 1. A processing device comprising: a core to execute instructions; and memory management circuitry coupled to memory, the core and an I/O device that supports page faults, the memory management circuitry comprising: an express invalidations circuitry; and a page translation permission circuitry, wherein the memory management circuitry is to, while the core is executing the instructions: receive a command to pause communication between the I/O device and the memory; and, in response to receiving the command to pause the communication: modify permissions of page translation responses by the page translation permission circuitry; and transmit an invalidation request, by the express invalidations circuitry to the I/O device, to cause cached page translations in the I/O device to be invalidated. 2. The processing device of claim 1, wherein the memory management circuitry is further to: transmit the page translations comprising the modified permissions to the I/O device. 3. The processing device of claim 1, wherein the memory management circuitry is further to: forgo transmitting a response to a page fault request from the I/O device. 4. The processing device of claim 1, wherein the memory management ...

07-02-2019 publication date

Superimposing butterfly network controls for pattern combinations

Number: US20190042517A1
Assignee: Texas Instruments Inc

A multilayer butterfly network is shown that is operable to transform and align a plurality of fields from an input to an output data stream. Many transformations are possible with such a network, which may include separate control of each multiplexer. This invention supports a limited set of multiplexer control signals, which enables a similarly limited set of data transformations. This limited capability is offset by the reduced complexity of the multiplexer control circuits. This invention uses precalculated inputs and simple combinatorial logic to generate control signals for the butterfly network. Controls are independent for each layer and therefore are dependent only on the input and output patterns. Controls for the layers can be calculated in parallel.

07-02-2019 publication date

APPARATUS AND METHOD FOR ARBITRARY QUBIT ROTATION

Number: US20190042973A1
Assignee:

Apparatus and method for arbitrary qubit rotation. For example, one embodiment of a processor comprises: a decoder to decode a quantum rotation instruction specifying an arbitrary rotation value for performing a rotation of a quantum bit (qubit); a storage to store data for a plurality of waveform shapes/pulses; execution circuitry to perform the rotation of the qubit, the execution circuitry to combine a subset of the plurality of waveform shapes/pulses to approximate the arbitrary rotation value; and a classical-quantum (C-Q) interface coupled to the execution circuitry and comprising digital-to-analog circuitry to generate analog signals to rotate the qubit based on the approximation of the rotation value. 1. A processor comprising: a decoder to decode a quantum rotation instruction specifying an arbitrary rotation value for performing a rotation of a quantum bit (qubit); a storage to store data for a plurality of waveform shapes/pulses; execution circuitry to perform the rotation of the qubit, the execution circuitry to combine a subset of the plurality of waveform shapes/pulses to approximate the arbitrary rotation value; and a classical-quantum (C-Q) interface coupled to the execution circuitry and comprising digital-to-analog circuitry to generate analog signals to rotate the qubit based on the approximation of the rotation value. 2. The processor of claim 1, wherein the plurality of waveform shapes/pulses comprise N waveform shapes/pulses comprising values π, π/2, π/4, π/8, π/16, . . . , π/2^N. 3. The processor of claim 1, wherein the execution circuitry is to perform a binary search operation to combine different subsets of the plurality of waveform shapes/pulses to identify a combination which results in an approximation closest to the arbitrary rotation value. 4. The processor of claim 1, further comprising: a first source register to store a first value uniquely identifying the qubit, the quantum rotation instruction having a first operand to identify the ...
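Read as a worked equation (my gloss, not text from the filing): pulses realizing dyadic fractions of π turn subset selection into fixing the bits of a binary expansion, which is why a search over pulse combinations converges on the closest approximation:

    \theta \;\approx\; \hat{\theta} \;=\; \sum_{k=0}^{N-1} b_k \, \frac{\pi}{2^{k}},
    \qquad b_k \in \{0, 1\},
    \qquad \bigl|\theta - \hat{\theta}\bigr| \;\le\; \frac{\pi}{2^{N-1}},

so the residual rotation error is bounded by the smallest stored pulse.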

18-02-2021 publication date

INSTRUCTION SELECTION MECHANISM WITH CLASS-DEPENDENT AGE-ARRAY

Number: US20210049018A1
Author: SILBERMAN JOEL A.
Assignee:

Methods and systems for implementing an instruction selection mechanism with class-dependent age-array are described. In an example, a system can include a processor that may sequence instructions. The system can further include a memory operatively coupled to the processor. The system can further include an array allocated on the memory. The array can be operable to store instruction age designations associated with a plurality of instructions sequenced by the processor. The array can be further operable to store the instruction age designations based on instruction classes. The processor can be operable to fetch an instruction from the memory. The processor can be operable to dispatch the instruction to a queue. The processor can be operable to store the instruction age designations associated with the instruction, in the array, based on an instruction class of the instruction. 1. An apparatus comprising: a processor operable to at least sequence instructions; a memory operatively coupled to the processor; and an array allocated on the memory and operable to store instruction age designations associated with a plurality of instructions sequenced by the processor, the array operable to store the instruction age designations based on instruction classes, wherein an instruction age designation of an associated instruction indicates an instruction age of the associated instruction relative to other instructions in a same instruction class. 2. The apparatus of claim 1, wherein the array is allocated in a queue operable to store the plurality of instructions and issue the plurality of instructions to execution units. 3. The apparatus of claim 1, wherein the array is an n×n array of rows and columns corresponding to n instructions, and an i-th row and i-th column are associated with an i-th instruction. 4. The apparatus of claim 3, wherein the array is allocated in a queue operable to store the plurality of instructions and issue the plurality of instructions to ...
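A C++ sketch of an n×n age matrix of the kind claim 3 describes (the queue depth, the per-class valid mask, and the select loop are assumptions of this model, not taken from the filing):

    #include <bitset>
    #include <cstddef>

    constexpr size_t kQueueSize = 8;  // assumed queue depth

    // olderThan[j][i] set means entry j is older than entry i,
    // tracked only among entries of the same instruction class.
    std::bitset<kQueueSize> olderThan[kQueueSize];

    // On dispatch of slot i, every valid same-class entry already in
    // the queue becomes "older than" the newcomer.
    void OnDispatch(size_t i, const std::bitset<kQueueSize>& sameClassValid) {
        for (size_t j = 0; j < kQueueSize; ++j)
            if (sameClassValid[j]) olderThan[j][i] = 1;
    }

    // Oldest ready entry of a class: the ready entry that no other
    // ready entry is older than. Returns -1 if nothing is ready.
    int SelectOldest(const std::bitset<kQueueSize>& ready) {
        for (size_t i = 0; i < kQueueSize; ++i) {
            if (!ready[i]) continue;
            bool oldest = true;
            for (size_t j = 0; j < kQueueSize; ++j)
                if (ready[j] && olderThan[j][i]) { oldest = false; break; }
            if (oldest) return (int)i;
        }
        return -1;
    }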

18-02-2021 publication date

Forced Self Authentication

Number: US20210049258A1
Author: Kirschner Yuval
Assignee:

A computer system includes a memory, a processor and authentication enforcement hardware. The processor is configured to execute software, including an authentication program that authenticates data stored in the memory. The authentication enforcement hardware is coupled to the processor and is configured to verify that (i) the processor executes the authentication program periodically with at least a specified frequency, and that (ii) the authentication program successfully authenticates the data. 1. A computer system, comprising: a memory; a processor, configured to execute software, including an authentication program that authenticates data stored in the memory; and authentication enforcement hardware, which is coupled to the processor and is configured to verify that (i) the processor executes the authentication program periodically with at least a specified frequency, and that (ii) the authentication program successfully authenticates the data. 2. The computer system according to claim 1, wherein the authentication enforcement hardware is configured to initiate a responsive action when the processor fails to execute the authentication program with at least the specified frequency. 3. The computer system according to claim 1, wherein the authentication enforcement hardware is configured to initiate a responsive action when the authentication program fails to authenticate the data. 4. The computer system according to claim 1, wherein the authentication program instructs the processor to assert a signal upon successfully authenticating the data, and wherein the authentication enforcement hardware comprises a timer configured to verify that the signal is asserted with at least the specified frequency. 5. The computer system according to claim 1, wherein the processor is configured to execute the authentication program from a Read-Only Memory (ROM), and wherein the authentication enforcement hardware is configured to decide that a given run of the ...
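A software model in C++ of the timer mechanism in claim 4: the authentication routine must assert success before a deadline or the enforcement logic fires. The watchdog period and the shape of the responsive action are assumptions of this sketch:

    #include <chrono>

    class AuthWatchdog {
        using Clock = std::chrono::steady_clock;
        Clock::time_point deadline_;
        Clock::duration period_;
    public:
        explicit AuthWatchdog(Clock::duration period)
            : deadline_(Clock::now() + period), period_(period) {}

        // Called by the authentication program on each successful
        // verification; re-arms the deadline.
        void AssertSuccess() { deadline_ = Clock::now() + period_; }

        // Polled by the enforcement logic; true means the required
        // frequency was missed and a responsive action is due.
        bool Expired() const { return Clock::now() > deadline_; }
    };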

16-02-2017 publication date

PROCESSOR INSTRUCTION SEQUENCE TRANSLATION

Number: US20170046157A1
Assignee:

Computer readable medium and apparatus for translating a sequence of instructions is disclosed herein. In one embodiment, an operation includes recognizing a candidate multi-instruction sequence, determining that the multi-instruction sequence corresponds to a single instruction, and executing the multi-instruction sequence by executing the single instruction. 1-7. (canceled) 8. A system, comprising: a memory storing a plurality of program instructions; and a processor comprising a translator module configured to: retrieve a multi-instruction sequence from the plurality of program instructions; responsive to determining that operations of the multi-instruction sequence are functionally equivalent to operation of a first instruction, replace the multi-instruction sequence with the first instruction; and execute the multi-instruction sequence by executing the first instruction. 9. The system of claim 8, wherein retrieving the multi-instruction sequence comprises: fetching a first block of instructions of the multi-instruction sequence from a cache; and scanning the first block of instructions to determine whether the first block of instructions is recognized. 10. The system of claim 8, wherein responsive to determining that operations of the multi-instruction sequence are functionally equivalent to operation of the first instruction, determining whether the first block of instructions can be fully contained in a single line of a cache. 11. The system of claim 10, wherein responsive to determining that the first block of instructions can be fully contained in a single line of the cache, replacing the first block of instructions with a second block of instructions in the cache. 12. The system of claim 11, wherein the second block of instructions is a single instruction corresponding to the multi-instruction sequence. 13. The system of claim 11, wherein executing the multi-instruction sequence by executing the single instruction comprises: ...

16-02-2017 publication date

DETERMINING PREFETCH INSTRUCTIONS BASED ON INSTRUCTION ENCODING

Number: US20170046158A1
Assignee:

Systems and methods for identifying candidate load instructions for prefetch operations based on at least instruction encoding of the load instructions, include an identifier based on a function of at least one or more fields of a load instruction and optionally, a subset of bits of the PC value of the load instruction, wherein the one or more fields exclude a full address or program counter (PC) value of the load instruction. Prefetch mechanisms, including a prefetch table indexed by the identifier, can determine whether the load instruction is a candidate load instruction for prefetching load data, based on the identifier. The function may be a hash, a concatenation, or a combination thereof, of one or more bits of the one or more fields. The fields include one or more of a base register, a destination register, an immediate offset, an offset register, or other bits of instruction encoding of the load instruction. 1. A method of data prefetching, the method comprising: forming an identifier based on a function of at least one or more fields of a load instruction, wherein the one or more fields exclude a full address or program counter (PC) value of the load instruction; and determining whether the load instruction is a candidate load instruction for prefetching load data, based on the identifier. 2. The method of claim 1, wherein the function is a hash, a concatenation, or a combination thereof, of one or more bits of the at least one or more fields. 3. The method of claim 1, wherein the one or more fields comprise one or more of a base register, a destination register, an immediate offset, an offset register, or other bits of instruction encoding of the load instruction. 4. The method of claim 1, further comprising calculating one or more addresses for prefetching load data based on prefetch information stored in a prefetch table if the load instruction is determined to be a candidate load instruction ...
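An illustrative C++ identifier of the kind the claims describe: a hash over instruction-encoding fields plus only a subset of PC bits, deliberately excluding the full PC. The field widths, mixing constants, and table size are assumptions of this sketch:

    #include <cstdint>

    // Identifier for indexing a (hypothetical) 1024-entry prefetch table.
    uint32_t PrefetchId(uint8_t baseReg, uint8_t destReg,
                        int16_t immOffset, uint64_t pc) {
        uint32_t h = baseReg;
        h = h * 31 + destReg;                 // mix in destination register
        h = h * 31 + (uint16_t)immOffset;     // mix in immediate offset
        h = h * 31 + (uint32_t)(pc & 0xFF);   // only a subset of PC bits
        return h & 0x3FF;                     // table index
    }

Because the identifier avoids the full PC, the table stays small and the same static load is recognized cheaply on every dynamic encounter.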

15-02-2018 publication date

SYSTEM AND METHOD FOR LOAD AND STORE QUEUE ALLOCATIONS AT ADDRESS GENERATION TIME

Number: US20180046463A1
Author: KING JOHN M.
Assignee: Advanced Micro Devices, Inc.

A system and method for load queue (LDQ) and store queue (STQ) entry allocations at address generation time that maintains age-order of instructions is described. In particular, writing LDQ and STQ entries are delayed until address generation time. This allows the load and store operations to dispatch, and younger operations (which may not be store and load operations) to also dispatch and execute their instructions. The address generation of the load or store operation is held at an address generation scheduler queue (AGSQ) until a load or store queue entry is available for the operation. The tracking of load queue entries or store queue entries is effectively being done in the AGSQ instead of at the decode engine. The LDQ and STQ depth is not visible from a decode engine's perspective, and increases the effective processing and queue depth. 1. A method for processing micro-operations , the method comprising:fetching micro-operations;dispatching the micro-operations to an age-ordered scheduler queue, wherein the age-ordered scheduler queue holds a dispatch payload associated with each micro-operation;performing address generation for a micro-operation on a condition that an associated queue entry in a load/store queue is available and source information needed for the micro-operation is ready;reading the dispatch payload of the micro-operation; andsending the dispatch payload of the micro-operation to the load/store queue.2. The method of claim 1 , further comprising:associating, at dispatch time, each micro-operation with a queue entry in the load/store queue in program order to maintain age-order.3. The method of claim 1 , further comprising:updating an oldest uncommitted micro-operation queue entry number based on input from the load/store queue.4. The method of claim 3 , further comprising:comparing, at the age-ordered scheduler queue, the oldest uncommitted micro-operation queue entry to queue entries for each of the micro-operations to determine if the micro- ...

03-03-2022 publication date

ACCELERATING PROCESSOR BASED ARTIFICIAL NEURAL NETWORK COMPUTATION

Number: US20220066776A1
Assignee:

An apparatus employed in a processing device comprises a processor configured to process data of a predefined data structure. A memory fetch device is coupled to the processor and is configured to determine addresses of the packed data for the processor. The packed data is stored on a memory device that is coupled to the processor. The memory fetch device is further configured to provide output data based on the addresses of the packed data to the processor, where the output data is configured according to the predefined data structure.

03-03-2022 publication date

METHODS AND APPARATUSES FOR COALESCING FUNCTION CALLS FOR RAY-TRACING

Number: US20220066819A1
Assignee:

Methods and systems for executing threads in a thread-group, for example for ray-tracing. The threads are processed to collect, for each thread, a respective set of function call indicators over a respective number of call instances. The function call indicators are reordered across all threads and all call instances, to coalesce identical function call indicators to a common call instance, and non-identical function call indicators are reordered to different call instances. Function calls are executed across the threads of the thread-group, according to the reordered and coalesced function call indicators. In ray-tracing applications, the threads represent rays, each call instance is a ray-hit of a ray, and each function call is a shader call. 1. A method for thread execution , comprising:processing a set of threads belonging to a thread-group to collect, for each thread, a respective set of function call indicators over a respective number of call instances;reordering the function call indicators across all threads and all call instances, to coalesce identical function call indicators to a common call instance, and wherein non-identical function call indicators are reordered to different call instances; andexecuting function calls across the threads of the thread-group, according to the reordered and coalesced function call indicators.2. The method of claim 1 , wherein executing function calls comprises:prefetching instructions for a function call indicated by a function call indicator corresponding to a subsequent call instance, concurrent with execution of a function call indicated by a function call indicator corresponding to a current call instance.3. The method of claim 2 , wherein reordering the function call indicators comprises:reordering non-identical function call indicators, which indicate memory addresses close to each other, to be assigned to consecutive call instances.5. The method of claim 1 , wherein the function call indicators are function call ...
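A simplified C++ model of the reordering step (my reading of the claims, with a per-thread sort standing in for the reorder logic): identical indicators line up at common call instances, and nearby shader addresses fall into consecutive instances, matching the address-proximity reordering of claim 3:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // A function call indicator, e.g. a shader entry address kept
    // as an integer for well-defined ordering.
    using FnId = std::uint64_t;

    // perThread[t][k] is thread t's indicator for call instance k.
    void CoalesceIndicators(std::vector<std::vector<FnId>>& perThread) {
        for (auto& calls : perThread)
            std::sort(calls.begin(), calls.end());
    }

Execution then walks instance by instance: at instance k, all threads whose k-th indicator matches run the same function call together, and the indicator for instance k+1 can be used to prefetch the next function's instructions, per claim 2.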

03-03-2022 publication date

APPARATUS AND METHOD WITH NEURAL NETWORK OPERATION

Number: US20220067498A1
Author: PARK Seongwook
Assignee: SAMSUNG ELECTRONICS CO., LTD.

A neural network operation apparatus includes: a buffer configured to store data for a neural network operation; a processor configured to change a fetching order of the data based on an observation range for fetching the data and a size of the buffer; and a first multiplexer configured to multiplex at least a portion of the data having the changed fetching order.

14-02-2019 publication date

EFFICIENT MITIGATION OF SIDE-CHANNEL BASED ATTACKS AGAINST SPECULATIVE EXECUTION PROCESSING ARCHITECTURES

Number: US20190050230A1
Assignee: Intel Corporation

The present disclosure is directed to systems and methods for mitigating or eliminating the effectiveness of a side-channel based attack, such as one or more classes of an attack commonly known as Spectre. Novel instruction prefixes, and in certain embodiments one or more corresponding instruction prefix parameters, may be provided to enforce a serialized order of execution for particular instructions without serializing an entire instruction flow, thereby improving performance and mitigation reliability over existing solutions. In addition, improved mitigation of such attacks is provided by randomizing both the execution branch history as well as the source address of each vulnerable indirect branch, thereby eliminating the conditions required for such attacks. 1. A system for mitigating vulnerability to one or more side-channel based attacks, the system comprising: one or more processors; and a storage device coupled to the one or more processors, the storage device including machine-readable instructions that, when executed by at least one of the one or more processors, cause the at least one processor to: fetch a first instruction for execution that includes a speculative execution (SE) lock prefix, and initiate execution of the first instruction; after execution of the first instruction is initiated, fetch a second instruction; and responsive to a determination that the second instruction includes an SE prefix associated with the SE lock prefix of the first instruction, prevent speculative execution of the second instruction until execution of the first instruction is completed. 2. The system of claim 1, wherein the machine-readable instructions further cause the at least one processor to: after execution of the first instruction is initiated, fetch a third instruction; and responsive to a determination that the third instruction does not include a SE lock prefix and prior to the execution of the first instruction being completed, initiate speculative ...

22-02-2018 publication date

TRIPLE-PASS EXECUTION

Number: US20180052684A1
Author: Tran Thang
Assignee:

An execution pipeline architecture of a microprocessor employs a third-pass functional unit, for example, a third-level arithmetic logic unit (ALU) or third short-latency execution unit, to execute instructions with reduced complexity and area cost of out-of-order execution. The third-pass functional unit allows instructions with long latency execution to be moved into a retire queue. The retire queue further includes the third functional unit (e.g., ALU), a reservation station and a graduate buffer. Data dependencies of dependent instructions in the retire queue are handled independently from the main pipeline. 1. A microprocessor including an extended pipeline stage comprising: a main execution pipeline processing instructions and configured to forward long latency instructions to a retire queue, the long latency instructions being instructions taking more than one cycle to execute; and the retire queue configured to store the long latency instructions and third-pass instructions, the third-pass instructions partly depending on the long latency instructions, the retire queue further comprising a third-pass functional unit configured to receive result data of the long latency instructions and to process the third-pass instructions, the processing of the third-pass instructions being executed independently from the instructions in the main execution pipeline. 2. The microprocessor of claim 1, wherein the third-pass instructions include at least one of an arithmetic logic unit (ALU) instruction and a shift instruction. 3. The microprocessor of claim 1, wherein the long latency instructions and the third-pass instructions are stored in separate queues. 4. The microprocessor of claim 1, further comprising a data dependency matrix configured to forward data between a long latency instruction and a third-pass instruction. 5. The microprocessor of claim 1, further comprising a data dependency matrix configured to forward data between third-pass instructions. 6. The ...

22-02-2018 publication date

PROCESSOR AND METHOD FOR EXECUTING INSTRUCTIONS ON PROCESSOR

Number: US20180052685A1
Author: OUYANG Jian, Qi Wei, Wang Yong
Assignee:

The present application discloses a processor and a method for executing an instruction on a processor. The method includes: fetching a to-be-executed instruction, the instruction comprising a source address field, a destination address field, an operation type field, and an operation parameter field; determining, in at least one execution unit, an execution unit controlled by a to-be-generated control signal according to the operation type field, determining a source address and a destination address of data operated by the execution unit controlled by the to-be-generated control signal according to the source address field and the destination address field, and determining a data amount of the data operated by the execution unit controlled by the to-be-generated control signal according to the operation parameter field; generating the control signal; and controlling, by using the control signal, the execution unit in the at least one execution unit to execute an operation. 1. A processor connected to a host, the processor comprising an instruction fetching unit, a decoder, and at least one execution unit: the instruction fetching unit being configured to fetch a to-be-executed instruction, the instruction comprising a source address field, a destination address field, an operation type field, and an operation parameter field; the decoder being configured to determine, in the at least one execution unit, an execution unit controlled by a to-be-generated control signal according to the operation type field, determine a source address and a destination address of data operated by the execution unit controlled by the to-be-generated control signal according to the source address field and the destination address field, determine a data amount of the data operated by the execution unit controlled by the to-be-generated control signal according to the operation parameter field, and generate the control signal according to the determined execution unit, the determined source address, the determined destination address, and the determined ...

22-02-2018 publication date

INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS

Number: US20180052689A1
Assignee: Intel Corporation

Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively. 1-19. (canceled) 20. A processor comprising: at least one processing core to execute instructions and process data; multiple levels of cache, including a Level 1 (L1) cache; a first packed data source register; a second packed data source register; a decoder to decode instructions including a packed character comparison instruction, a first set of packed integer data elements to be stored in the first packed data source register and a second set of packed integer data elements to be stored in the second packed data source register, one or more packed integer data elements in the first set to encode an alphanumeric character and the packed integer data elements in the second set to encode a set of character ranges; and an execution circuit to execute the packed character comparison instruction to determine whether a first alphanumeric character encoded by a first packed integer data element in the first set falls within any of a set of ranges of alphanumeric characters defined by the packed integer data elements in the second set, the execution circuit to update a result register to provide an indication of whether the first alphanumeric character falls within one or more of the set of ranges. 21. The processor of claim 20, wherein the first set and second set of packed integer data elements comprise packed bytes. 22. The processor of claim 20, further comprising a microcode read only memory (ROM) to store microcode for the packed character comparison instruction. 23. The processor of claim 20, wherein the first and second packed data source registers comprise 128-bit packed data registers. 24. ...
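A scalar C++ model of the range comparison the execution circuit performs: the second source packs (lo, hi) pairs, and a character matches if it falls in any range. Packed bytes are one layout the claims mention; the container types are assumptions of this sketch:

    #include <cstdint>
    #include <utility>
    #include <vector>

    // True if ch falls within any of the packed character ranges.
    bool InAnyRange(uint8_t ch,
                    const std::vector<std::pair<uint8_t, uint8_t>>& ranges) {
        for (const auto& r : ranges)
            if (ch >= r.first && ch <= r.second) return true;
        return false;
    }

    // e.g. InAnyRange('q', {{'a','z'}, {'A','Z'}}) == true

In hardware this check runs for every element of the first source in parallel, with the per-element results packed into the result register.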

22-02-2018 publication date

Reorder buffer scoreboard

Number: US20180052690A1
Author: Thang Tran
Assignee: Synopsys Inc

Various embodiments of a microprocessor include a scoreboard implementation that directs the microprocessor to the location of data values. For example, the scoreboard may include individual bits that instruct the microprocessor to retrieve the data from a re-order buffer, retire queue, result bus, or register file. As a first step, the microprocessor receives an instruction indicating a process that requires data from one or more source registers. Instead of automatically retrieving the data from the register file, which is a costly process, the microprocessor may read the scoreboard to determine whether the needed data can be more cost-effectively retrieved from the re-order buffer, retire queue, or result busses. Therefore, the microprocessor can avoid costly data retrieval procedures. Additionally, the scoreboard implementation enables the microprocessor to handle limited out-of-order instructions, which improves overall performance of the microprocessor.

13-02-2020 publication date

SUPERIMPOSING BUTTERFLY NETWORK CONTROLS FOR PATTERN COMBINATIONS

Number: US20200050573A1
Assignee:

A multilayer butterfly network is shown that is operable to transform and align a plurality of fields from an input to an output data stream. Many transformations are possible with such a network, which may include separate control of each multiplexer. This invention supports a limited set of multiplexer control signals, which enables a similarly limited set of data transformations. This limited capability is offset by the reduced complexity of the multiplexer control circuits. This invention uses precalculated inputs and simple combinatorial logic to generate control signals for the butterfly network. Controls are independent for each layer and therefore are dependent only on the input and output patterns. Controls for the layers can be calculated in parallel. 1. An apparatus for data transformation of an input data word of 2^N sections, where N is an integer, comprising: a first input receiving a bit corresponding to a precalculated shuffle pattern; a second input receiving a bit corresponding to a precalculated replicate pattern; a third input receiving a bit corresponding to a precalculated rotate pattern; a first exclusive OR gate having a first input receiving the bit corresponding to the precalculated shuffle pattern, a second input receiving the bit corresponding to the precalculated replicate pattern, and an output; a second exclusive OR gate having a first input receiving the bit corresponding to the precalculated replicate pattern, a second input receiving the bit corresponding to the precalculated rotate pattern, and an output; a third exclusive OR gate having a first input receiving the bit corresponding to the precalculated rotate pattern, a second input receiving the bit corresponding to the precalculated shuffle pattern, and an output; and a control multiplexer having a first input receiving the bit corresponding to the precalculated shuffle pattern, a second input receiving the bit corresponding to the precalculated replicate pattern, a ...
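A tiny C++ rendering of the combinatorial logic in claim 1: three XOR gates combine one bit from each precalculated pattern, and a downstream control multiplexer picks which term drives a given butterfly multiplexer. The struct and function names are illustrative:

    // One bit from each precalculated pattern per multiplexer.
    struct XorTerms { bool shuffle_x_replicate, replicate_x_rotate, rotate_x_shuffle; };

    XorTerms DeriveControls(bool shuffle, bool replicate, bool rotate) {
        return { shuffle != replicate,    // first exclusive OR gate
                 replicate != rotate,     // second exclusive OR gate
                 rotate != shuffle };     // third exclusive OR gate
    }

Because each term depends only on the precalculated input and output patterns of its own layer, the controls for all layers can be computed in parallel, as the abstract states.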

15-05-2014 publication date

Execution of instruction loops using an instruction buffer

Number: US20140136822A1
Assignee: Advanced Micro Devices Inc

In a normal, non-loop mode a uOp buffer receives and stores for dispatch the uOps generated by a decode stage based on a received instruction sequence. In response to detecting a loop in the instruction sequence, the uOp buffer is placed into a loop mode whereby, after the uOps associated with the loop have been stored at the uOp buffer, storage of further uOps at the buffer is suspended. To execute the loop, the uOp buffer repeatedly dispatches the uOps associated with the loop's instructions until the end condition of the loop is met and the uOp buffer exits the loop mode.
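A C++ sketch of the buffer's two modes under stated assumptions (uOps reduced to plain ids, dispatch and the loop-exit test supplied by the caller); it shows the key behavior: once in loop mode, the stored body is redispatched without further decode until the end condition holds:

    #include <functional>
    #include <utility>
    #include <vector>

    struct UopBuffer {
        std::vector<int> loopBody;  // uOps of the detected loop
        bool loopMode = false;

        void EnterLoopMode(std::vector<int> body) {
            loopBody = std::move(body);  // storage of further uOps suspends
            loopMode = true;
        }

        void Run(const std::function<void(int)>& dispatch,
                 const std::function<bool()>& loopExits) {
            while (loopMode) {
                for (int uop : loopBody) dispatch(uop);   // one iteration
                if (loopExits()) loopMode = false;        // back to normal
            }
        }
    };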

21-02-2019 publication date

PROCESSING VECTOR INSTRUCTIONS

Number: US20190056933A1
Assignee:

Processing circuitry performs multiple beats of processing in response to a vector instruction, each beat comprising processing corresponding to a portion of a vector value comprising multiple data elements. The processing circuitry sets beat status information indicating which beats of a group of two or more vector instructions have completed. In response to a return-from-event request indicating a return to processing of the given vector instruction, the processing circuitry resumes processing of the group of uncompleted vector instructions while suppressing beats already completed, based on the beat status information. 1. An apparatus comprising: processing circuitry to process vector instructions for which at least one of a source operand and a result value is a vector value comprising a plurality of data elements; wherein in response to a given vector instruction, the processing circuitry is configured to perform a plurality of beats of processing, each beat comprising processing corresponding to a portion of the vector value; the processing circuitry is configured to set beat status information indicative of which beats of a plurality of vector instructions including said given vector instruction have completed; in response to an event, the processing circuitry is configured to suspend processing of said given vector instruction; and in response to a return-from-event request indicating a return to processing of said given vector instruction, the processing circuitry is configured to resume processing of said plurality of vector instructions while suppressing the beats of said plurality of vector instructions indicated by said beat status information as having completed; wherein the vector value comprises data elements having one of a plurality of data element sizes specified by data element size information accessible to the processing circuitry; and each beat of processing comprises processing corresponding to a fixed size portion of the vector value ...

03-03-2016 publication date

OPTIMIZE CONTROL-FLOW CONVERGENCE ON SIMD ENGINE USING DIVERGENCE DEPTH

Number: US20160062771A1

There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running SPMD (Single Program Multiple Data) code on a SIMD (Single Instruction Multiple Data) machine. The machine runs an instruction stream over input data streams. The machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation. The machine updates the lane-PC of each active lane according to targets of the branch operation. The machine selects an active lane and activates only lanes whose lane-PCs match the thread-PC. The machine decrements the lane depth counters of the selected active lanes and updates the lane-PC of each active lane upon the instruction stream reaching a first instruction. The machine assigns the lane-PC of a lane with a largest lane depth counter value to the thread-PC and activates all lanes whose lane-PCs match the thread-PC. 1. A method for selecting an active data stream while running a SPMD (Single Program Multiple Data) program of instructions on a SIMD (Single Instruction Multiple Data) machine, the method comprising: providing an instruction stream having one thread-PC (Program Counter), the thread-PC indicating an instruction memory address which stores an instruction to be fetched next for the instruction stream; running the instruction stream over one or more input data streams (“lanes”), each lane being associated with a corresponding lane depth counter, a corresponding lane-PC of a lane indicating a memory address which stores the instruction to be fetched next for the lane when the lane is activated, and a lane activation bit indicating whether a corresponding lane is active or not; incrementing lane depth counters of all active lanes upon the thread-PC reaching a branch operation in the instruction stream; updating the lane-PC of each active lane according to targets of the branch operation; and selecting one or more active lanes and assigning a corresponding lane-PC to the thread-PC, and activating ...
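A simplified C++ model of the reconvergence rule in the last sentence of the abstract: hand the thread-PC to the lane with the largest divergence depth, then activate every lane whose lane-PC matches. The data layout is an assumption of the sketch:

    #include <cstdint>
    #include <vector>

    struct Lane { uint64_t lanePc; int depth; bool active; };

    // Deepest lane wins the thread-PC; matching lanes reactivate.
    uint64_t Reconverge(std::vector<Lane>& lanes) {
        int deepest = 0;
        for (int i = 1; i < (int)lanes.size(); ++i)
            if (lanes[i].depth > lanes[deepest].depth) deepest = i;
        uint64_t threadPc = lanes[deepest].lanePc;
        for (auto& l : lanes) l.active = (l.lanePc == threadPc);
        return threadPc;
    }

Picking the deepest lane first drains the most-nested divergent path before shallower paths, which keeps the depth counters consistent as branches unwind.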

20-02-2020 publication date

CONFIGURABLE SIMD MULTIPLICATION CIRCUIT

Number: US20200057609A1
Assignee:

A configurable SIMD multiplication circuit is provided to perform multiplication on a multiplicand operand M and multiplier operand R with varying data element sizes supported. For each result element generated based on corresponding elements of the multiplicand operand M and the multiplier operand R, the multiplication is performed according to radix-N modified Booth multiplication, where N=2^P and P≥3. A Booth digit selection scheme is described for improving the efficiency with which higher radix modified Booth multiplication can be implemented in a configurable SIMD multiplier. 1. An apparatus comprising: a configurable SIMD multiplication circuit to perform multiplication on a multiplicand operand M and a multiplier operand R to generate a result value; and control circuitry responsive to a multiplication command specifying a selected element size from a plurality of element sizes supported by the configurable SIMD multiplication circuit, to control the configurable SIMD multiplication circuit to generate the result value in which each of one or more result elements within the result value has a value corresponding to the product of a corresponding multiplicand element of the multiplicand operand M and a corresponding multiplier element of the multiplier operand R, said corresponding multiplicand element having the selected element size; in which: for each of said plurality of element sizes supported by the configurable SIMD multiplication circuit, the configurable SIMD multiplication circuit is configured to generate each result element of the result value using radix-N modified Booth multiplication of the corresponding multiplicand element and the corresponding multiplier element, where N=2^P and P≥3. 2. The apparatus according to claim 1, in which a minimum element size supported by the configurable SIMD multiplication circuit is E bits, and E modulo P is non-zero. 3. The apparatus according to claim 1, in which a minimum element size supported ...
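For reference, the standard radix-2^P modified Booth recoding the abstract invokes (a known identity, not quoted from the filing): the multiplier R is scanned in overlapping groups of P+1 bits, each group yielding one signed digit,

    R = \sum_{i} d_i \, 2^{P i}, \qquad
    d_i = -2^{P-1} r_{Pi+P-1} + \sum_{k=0}^{P-2} 2^{k} r_{Pi+k} + r_{Pi-1}, \qquad
    d_i \in [-2^{P-1},\, 2^{P-1}],

with r_{-1} = 0. For P = 3 (radix-8) each digit lies in {-4, ..., +4}, so a W-bit multiplier produces roughly W/3 partial products instead of W, at the cost of a precomputed ±3M odd multiple.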

20-02-2020 publication date

Methods and apparatus for full-system performance simulation

Number: US20200057707A1
Author: Jiang Xiaowei
Assignee:

The present application provides methods and systems for simulating full-system performance of a hardware device. An exemplary system for simulating full-system performance of a hardware device may include a cycle-accurate performance simulator configured to model a performance of a hardware component of a plurality of hardware components of a system, and the cycle-accurate performance simulator may include a first transactor. The system may also include a full-system simulator configured to model a performance of the plurality of hardware components of the system, and the full-system simulator includes a second transactor. The system may further include a communication mechanism between the first transactor and the second transactor, wherein the communication mechanism is configured to communicate data between the cycle-accurate performance simulator and the full-system simulator. 1. A system for simulating full-system performance of a hardware component , the system comprising:a cycle-accurate performance simulator configured to model a performance of the hardware component of a plurality of hardware components of an application system, the cycle-accurate performance simulator including a first transactor;a full-system simulator configured to model a performance of the plurality of hardware components of the application system, the full-system simulator including a second transactor; anda communication mechanism between the first transactor and the second transactor, wherein the communication mechanism is configured to communicate data between the cycle-accurate performance simulator and the full-system simulator.2. The system of claim 1 , wherein the communication mechanism is configured to:transmit the data through a plurality of pipes between the cycle-accurate performance simulator and the full-system simulator;access a shared memory between the cycle-accurate performance simulator and the full-system simulator; andcommunicate a request or a response through a ...

20-02-2020 publication date

PREFETCH MANAGEMENT IN A HIERARCHICAL CACHE SYSTEM

Number: US20200057720A1
Assignee:

An apparatus includes a CPU core, a first memory cache with a first line size, and a second memory cache having a second line size larger than the first line size. Each line of the second memory cache includes an upper half and a lower half. A memory controller subsystem is coupled to the CPU core and to the first and second memory caches. Upon a miss in the first memory cache for a first target address, the memory controller subsystem determines that the first target address resulting in the miss maps to the lower half of a line in the second memory cache, retrieves the entire line from the second memory cache, and returns the entire line from the second memory cache to the first memory cache. 1. (canceled) 2. The apparatus of claim 3, wherein the second line size is twice the first line size. 3. An apparatus, comprising: a central processing unit (CPU) core; a first memory cache to store instructions for execution by the CPU core, the first memory cache having a first line size; a second memory cache to store instructions for execution by the CPU core, the second memory cache having a second line size, the second line size being larger than the first line size, each line of the second memory cache comprising an upper half and a lower half; and a memory controller subsystem coupled to the CPU core and to the first and second memory caches, the memory controller subsystem to: upon a determination of a first miss in the first memory cache for a first target address: determine that the first target address maps to the lower half of a first line in the second memory cache; retrieve the entire first line from the second memory cache; and return the entire first line from the second memory cache to the first memory cache; and upon a determination of a second miss in the first memory cache for a second target address: determine that the second target address maps to the upper half of a second line in the second memory cache; and return to the first memory cache the upper half of the second line from the second memory cache and not the lower half of the second line from the second memory cache.

20-02-2020 publication date

Slot/sub-slot prefetch architecture for multiple memory requestors

Number: US20200057723A1
Assignee: Texas Instruments Inc

A prefetch unit generates a prefetch address in response to an address associated with a memory read request received from the first or second cache. The prefetch unit includes a prefetch buffer that is arranged to store the prefetch address in an address buffer of a selected slot of the prefetch buffer, where each slot of the prefetch unit includes a buffer for storing a prefetch address, and two sub-slots. Each sub-slot includes a data buffer for storing data that is prefetched using the prefetch address stored in the slot, and one of the two sub-slots of the slot is selected in response to a portion of the generated prefetch address. Subsequent hits on the prefetcher result in returning prefetched data to the requestor in response to a subsequent memory read request received after the initial received memory read request.

02-03-2017 publication date

SYSTEM AND METHOD FOR MULTI-BRANCH SWITCHING

Number: US20170060591A1
Assignee:

A system and method for multi-branch switching are provided. A memory has stored therein a program comprising at least one sequence of instructions, the at least one sequence of instructions comprising a plurality of branch instructions, at least one branch of the program reached upon execution of each one of the plurality of branch instructions. The processor is configured for fetching the plurality of branch instructions from the memory, separately buffering each branch of the program associated with each one of the fetched branch instructions, evaluating the fetched branch instructions in parallel, and executing the evaluated branch instructions in parallel. 1. A system comprising: a memory having stored therein a program comprising at least one sequence of instructions, the at least one sequence of instructions comprising a plurality of branch instructions, at least one branch of the program reached upon execution of each one of the plurality of branch instructions; and a processor configured for: fetching the plurality of branch instructions from the memory; separately buffering each branch of the program associated with each one of the fetched branch instructions; evaluating the fetched branch instructions in parallel; and executing the evaluated branch instructions in parallel. 2. The system of claim 1, wherein the processor is configured for resolving each condition upon which the evaluated branch instructions depend and accordingly identifying, upon resolving the condition, ones of the plurality of branch instructions that are not to be taken and one of the plurality of branch instructions to be taken. 3. The system of claim 2, wherein the processor is configured for discarding the ones of the plurality of branch instructions not to be taken and carrying on with execution of the one of the plurality of branch instructions to be taken. 4. The system of claim 2, wherein the processor is configured for preventing further evaluation ...

Publication date: 02-03-2017

METHODS FOR COMBINING INSTRUCTIONS AND APPARATUSES HAVING MULTIPLE DATA PIPES

Number: US20170060594A1
Assignee:

A method for combining instructions, performed by a compiler, comprises at least the following steps. First instructions are obtained, where each performs one of a calculation operation, a comparison operation, a logic operation, a selection operation, a branching operation, a LD/ST (Load/Store) operation, a SMP (sampling) operation and a complicated mathematics operation. The first instructions are combined as one combined instruction according to data dependencies between the first instructions. The combined instruction is sent to a SP (Stream Processor).

1. A method for combining instructions, performed by a compiler, the method comprising: obtaining a plurality of first instructions, wherein each first instruction performs one of a calculation operation, a comparison operation, a logic operation, a selection operation, a branching operation, a LD/ST (Load/Store) operation, a SMP (sampling) operation and a complicated mathematics operation; combining the first instructions as one combined instruction according to data dependencies between the first instructions; and sending the combined instruction to a SP (Stream Processor).
2. The method of claim 1, wherein the first instructions are combined according to the following rules: ALG+CMP+SEL; ALG+CMP+SEL+SFU/LS/SMP; ALG+CMP+Branch; ALG+LGC+SEL; ALG+LGC+SEL+SFU/LS/SMP; or ALG+LGC+Branch, where ALG indicates a calculation instruction, CMP indicates a comparison instruction, LGC indicates a logic instruction, SEL indicates a selection instruction, Branch indicates a branching instruction, SFU indicates a mathematics computation instruction, LS indicates a Load/Store instruction and SMP indicates a sampling instruction.
3. The method of claim 1, further comprising: obtaining a second instruction, wherein the second instruction is used for sending data from a CR (Common Register) or a CB (Constant Buffer) to another CR or a post-processing unit; and combining the combined result for the first instructions with the second instruction ...
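A toy Python model of the rule-driven combining in claim 2; the IR encoding and the greedy longest-match strategy are my assumptions, and a real compiler would additionally verify the data dependencies required by claim 1:

    COMBINE_RULES = [
        ("ALG", "CMP", "SEL"),
        ("ALG", "CMP", "SEL", "SFU"), ("ALG", "CMP", "SEL", "LS"), ("ALG", "CMP", "SEL", "SMP"),
        ("ALG", "CMP", "Branch"),
        ("ALG", "LGC", "SEL"),
        ("ALG", "LGC", "SEL", "SFU"), ("ALG", "LGC", "SEL", "LS"), ("ALG", "LGC", "SEL", "SMP"),
        ("ALG", "LGC", "Branch"),
    ]

    def combine(instrs):
        """Greedily fuse a window of instruction kinds into one combined op."""
        out, i = [], 0
        while i < len(instrs):
            for rule in sorted(COMBINE_RULES, key=len, reverse=True):  # prefer longest
                if tuple(instrs[i:i + len(rule)]) == rule:
                    out.append("+".join(rule))     # one combined instruction for the SP
                    i += len(rule)
                    break
            else:
                out.append(instrs[i])
                i += 1
        return out

    print(combine(["ALG", "CMP", "SEL", "LS", "ALG", "LGC", "Branch", "SMP"]))
    # -> ['ALG+CMP+SEL+LS', 'ALG+LGC+Branch', 'SMP']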

Publication date: 05-03-2015

MICROPROCESSOR WITH BOOT INDICATOR THAT INDICATES A BOOT ISA OF THE MICROPROCESSOR AS EITHER THE X86 ISA OR THE ARM ISA

Number: US20150067301A1
Assignee:

A microprocessor includes a plurality of registers that holds an architectural state of the microprocessor and an indicator that indicates a boot instruction set architecture (ISA) of the microprocessor as either the x86 ISA or the Advanced RISC Machines (ARM) ISA. The microprocessor also includes a hardware instruction translator that translates x86 ISA instructions and ARM ISA instructions into microinstructions. The hardware instruction translator translates, as instructions of the boot ISA, the initial ISA instructions that the microprocessor fetches from architectural memory space after receiving a reset signal. The microprocessor also includes an execution pipeline, coupled to the hardware instruction translator. The execution pipeline executes the microinstructions to generate results defined by the x86 ISA and ARM ISA instructions. In response to the reset signal, the microprocessor initializes its architectural state in the plurality of registers as defined by the boot ISA prior to fetching the initial ISA instructions.

1. A microprocessor, comprising: a plurality of registers that holds an architectural state of the microprocessor; an indicator that indicates a boot instruction set architecture (ISA) of the microprocessor as either the x86 ISA or the Advanced RISC Machines (ARM) ISA; a hardware instruction translator that translates x86 ISA instructions and ARM ISA instructions into microinstructions, wherein the hardware instruction translator translates, as instructions of the boot ISA, the initial ISA instructions that the microprocessor fetches from architectural memory space after receiving a reset signal; an execution pipeline, coupled to the hardware instruction translator, wherein the execution pipeline executes the microinstructions to generate results defined by the x86 ISA and ARM ISA instructions; and wherein in response to the reset signal, the microprocessor initializes its architectural state in the plurality of registers as defined by the ...
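A schematic Python model of the reset behavior described above, assuming the well-known x86 reset vector (0xFFFFFFF0) and the traditional ARM reset vector (0x0); the class layout and state subset are illustrative, not the patent's microarchitecture:

    RESET_STATE = {   # illustrative subset of ISA-defined reset state
        "x86": {"mode": "real", "pc": 0xFFFFFFF0},
        "ARM": {"mode": "svc",  "pc": 0x00000000},
    }

    class Microprocessor:
        def __init__(self, boot_isa_indicator):
            self.boot_isa = boot_isa_indicator       # indicator: 'x86' or 'ARM'
            self.regs = {}

        def reset(self):
            # Initialize architectural state as defined by the boot ISA, *before*
            # fetching the initial instructions from architectural memory space.
            self.regs = dict(RESET_STATE[self.boot_isa])

        def translate(self, isa_instruction):
            # Hardware translator: both ISAs map onto one microinstruction set.
            return ("uop", self.boot_isa, isa_instruction)

    cpu = Microprocessor(boot_isa_indicator="ARM")
    cpu.reset()
    print(hex(cpu.regs["pc"]), cpu.translate("mov r0, #1"))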

Publication date: 22-05-2014

PREFETCHING BASED UPON RETURN ADDRESSES

Number: US20140143522A1
Assignee:

An apparatus for processing data includes signature generation circuitry for generating a signature value indicative of the current state of the apparatus in dependence upon a sequence of immediately preceding return addresses generated during execution of a stream of program instructions to reach that state of the apparatus. Prefetch circuitry performs one or more prefetch operations in dependence upon the signature value that is generated. The signature value may be generated by a hashing operation (such as an XOR) performed upon return addresses stored within a return address stack.

1. Apparatus for processing data in response to execution of a stream of program instructions including call instructions with respective associated return addresses, said apparatus comprising: signature generation circuitry configured to generate a signature value indicative of a current state of said apparatus in dependence upon a plurality of return addresses generated during execution of said stream; and prefetch circuitry configured to perform one or more prefetch operations in dependence upon said signature value.
2. Apparatus as claimed in claim 1, wherein when execution of said stream either generates a new return address or performs a return operation that consumes a previously generated return address, said signature generation circuitry generates an updated signature value.
3. Apparatus as claimed in claim 2, wherein said prefetch circuitry performs one or more prefetch operations in dependence upon said updated signature value.
4. Apparatus as claimed in claim 2, wherein said updated signature value is dependent upon whether said updated signature value was generated in response to generation of said new return address or consumption of said previously generated return address.
5. Apparatus as claimed in claim 4, wherein when said updated signature value is generated in response to generation of said new return address, then said updated signature value is ...
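A small Python sketch of the signature mechanism, assuming an XOR hash over the return address stack as the abstract suggests; the prefetch-table contents and the training step are invented for the example:

    from functools import reduce

    class SignaturePrefetcher:
        def __init__(self):
            self.ras = []                  # return address stack
            self.table = {}                # signature -> addresses worth prefetching

        def signature(self):
            # Hashing operation (here an XOR) over the stacked return addresses.
            return reduce(lambda a, b: a ^ b, self.ras, 0)

        def on_call(self, return_addr):
            self.ras.append(return_addr)   # new return address -> updated signature
            return self.prefetch()

        def on_return(self):
            self.ras.pop()                 # consuming an address also updates it
            return self.prefetch()

        def prefetch(self):
            return self.table.get(self.signature(), [])

    p = SignaturePrefetcher()
    p.table[0x1234 ^ 0x5678] = [0x9000]    # trained association for this call path
    print(p.on_call(0x1234))               # [] (signature not trained)
    print(p.on_call(0x5678))               # [0x9000]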

Publication date: 17-03-2022

INSTRUCTION PREFETCH BASED ON THREAD DISPATCH COMMANDS

Number: US20220083339A1
Assignee: Intel Corporation

A graphics processing device comprises a set of compute units to execute multiple threads of a workload, a cache coupled with the set of compute units, and a prefetcher to prefetch instructions associated with the workload. The prefetcher is configured to use a thread dispatch command that is used to dispatch threads to execute a kernel to prefetch instructions, parameters, and/or constants that will be used during execution of the kernel. Prefetch operations for the kernel can then occur concurrently with thread dispatch operations.

1. A graphics processing device comprising: a set of compute units to execute multiple threads of a workload; a cache coupled with the set of compute units; and circuitry coupled with the cache and the set of compute units, the circuitry configured to: receive a command to dispatch threads to multiple compute units in the set of compute units, wherein the command is associated with a kernel to be executed by the compute units; determine a memory address associated with the kernel based on the command; determine a prefetch block size associated with the kernel based on the command; prefetch instructions associated with the kernel via the memory address and the prefetch block size associated with the kernel; and load the cache with the instructions before a start of thread execution.
2. The graphics processing device as in claim 1, wherein the cache is a global cache for the graphics processing device.
3. The graphics processing device as in claim 1, wherein the cache is a local cache for the set of compute units.
4. The graphics processing device as in claim 1, wherein the circuitry is configured to prefetch the instructions associated with the kernel in parallel with execution of the command to dispatch the multiple threads of the workload.
5. The graphics processing device as in claim 1, wherein the circuitry is additionally configured to prefetch constant data associated with the kernel.
6. The graphics processing device as ...
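An illustrative Python sketch of prefetching driven by a thread dispatch command; the command fields, cache model, and 64-byte line size are assumptions, and for simplicity the sketch prefetches before dispatch rather than concurrently with it:

    CACHE_LINE = 64

    def handle_dispatch(command, cache, memory, dispatch_threads):
        kernel_addr = command["kernel_addr"]        # derived from the command
        block_size = command["prefetch_block"]      # prefetch block size, ditto
        # Warm the cache with kernel instructions so they are resident before
        # the start of thread execution.
        for addr in range(kernel_addr, kernel_addr + block_size, CACHE_LINE):
            cache[addr] = memory(addr)
        dispatch_threads(command["num_threads"])

    cache = {}
    handle_dispatch(
        {"kernel_addr": 0x4000, "prefetch_block": 256, "num_threads": 32},
        cache,
        memory=lambda a: f"insns@{a:#x}",
        dispatch_threads=lambda n: print(f"dispatched {n} threads"),
    )
    print(sorted(hex(a) for a in cache))   # 0x4000..0x40c0 now resident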

Publication date: 08-03-2018

INDEPENDENT MAPPING OF THREADS

Number: US20180067746A1
Assignee:

Embodiments of the present invention provide systems and methods for mapping the architected state of one or more threads to a set of distributed physical register files to enable independent execution of one or more threads in a multiple slice processor. In one embodiment, a system is disclosed including a plurality of dispatch queues which receive instructions from one or more threads and an even number of parallel execution slices, each parallel execution slice containing a register file. A routing network directs an output from the dispatch queues to the parallel execution slices and the parallel execution slices independently execute the one or more threads.

1. A processor core comprising: a plurality of dispatch queues, wherein the plurality of dispatch queues are configured to receive instructions associated with a first number of threads; a first even plurality of parallel execution slices, wherein each of the first even plurality of parallel execution slices includes a corresponding first set of register files; a second even plurality of parallel execution slices, wherein each of the second even plurality of parallel execution slices includes a corresponding second set of register files; and a routing network configured to direct an output of the plurality of dispatch queues to the first even plurality of parallel execution slices and the second even plurality of parallel execution slices, wherein the first even plurality of parallel execution slices are configured to independently execute a first number of threads and the second even plurality of parallel execution slices are configured to independently execute a second number of threads that differ from the first number of threads, wherein the first even plurality of parallel execution slices write results of the execution of the first number of threads to the first set of register files, and wherein the second even plurality of parallel execution slices writes results of the execution of the second number of ...
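A schematic of the slice mapping in Python; the two-slices-per-group sizing and the hash-based steering choice are purely illustrative, and the patent's routing network and register-file organization are more elaborate:

    class Slice:
        def __init__(self):
            self.regfile = {}              # this slice's distributed register file

        def execute(self, dst, value):
            self.regfile[dst] = value      # results land in this slice's file

    class Core:
        def __init__(self):
            # Two even groups of slices; each group serves its own thread set.
            self.groups = {0: [Slice(), Slice()], 1: [Slice(), Slice()]}

        def route(self, thread_id, dst, value):
            # Routing network: steer dispatch-queue output to the thread's group,
            # so the two groups execute their threads independently.
            group = self.groups[thread_id % 2]
            group[hash(dst) % len(group)].execute(dst, value)

    core = Core()
    core.route(thread_id=0, dst="r1", value=42)
    core.route(thread_id=1, dst="r1", value=99)   # lands in the other group's files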

Publication date: 27-02-2020

Pipelined Allocation for Operand Cache

Number: US20200065104A1
Assignee: Apple Inc

Techniques are disclosed relating to controlling an operand cache in a pipelined fashion. An operand cache may cache operands fetched from the register file or generated by previous instructions to improve performance and/or reduce power consumption. In some embodiments, instructions are pipelined and separate tag information is maintained to indicate allocation of an operand cache entry and ownership of the operand cache entry. In some embodiments, this may allow an operand to remain in the operand cache (and potentially be retrieved or modified) during the interval between allocation of the entry for another operand and ownership of the entry by the other operand. This may improve operand cache efficiency by allowing the entry to be used while retrieving the other operand from the register file, for example.
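A minimal Python sketch of the separate allocation and ownership tags, with invented state names; it shows the interval during which the old operand remains readable after the entry has been allocated to a new one:

    class OperandCacheEntry:
        def __init__(self, operand, value):
            self.owner = operand           # tag: who currently owns the data
            self.allocated_to = operand    # tag: who the entry is allocated to
            self.value = value

        def allocate(self, new_operand):
            self.allocated_to = new_operand   # pipelined allocation; data not here yet

        def fill(self, value):
            self.owner = self.allocated_to    # ownership transfers when data arrives
            self.value = value

        def read(self, operand):
            # The old operand stays usable during the allocate->ownership interval.
            return self.value if operand == self.owner else None

    e = OperandCacheEntry("r2", 7)
    e.allocate("r5")                  # r5 allocated; register-file read in flight
    print(e.read("r2"))               # 7  (r2 still owns the entry)
    e.fill(13)
    print(e.read("r5"), e.read("r2")) # 13 None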

Publication date: 27-02-2020

STREAMING ENGINE WITH SEPARATELY SELECTABLE ELEMENT AND GROUP DUPLICATION

Number: US20200065252A1
Author: Zbiciak Joseph
Assignee:

A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces the addresses of data elements. A stream head register stores the data elements next to be supplied to functional units for use as operands. An element duplication unit optionally duplicates each data element an instruction-specified number of times. A vector masking unit limits the data elements received from the element duplication unit to the least significant bits within an instruction-specified vector length. If the vector length is less than the stream head register size, the vector masking unit stores all 0's in the excess lanes of the stream head register (group duplication disabled) or stores duplicate copies of the least significant bits in the excess lanes of the stream head register (group duplication enabled).

1. A method comprising: receiving, from a memory of a computing system, a plurality of data elements stored in the memory; applying duplication to each data element based on an element duplication factor to produce a set of duplicated data elements; and supplying the set of duplicated data elements to a functional unit of a processing core of the computing system as at least part of a data stream.
2. The method of claim 1, wherein the element duplication factor is determined by accessing a register configured to store a stream definition template, the stream definition template including a field containing the element duplication factor.
3. The method of claim 2, wherein the element duplication factor is an integer power of 2.
4. The method of claim 3, wherein the element duplication factor is selected as one of the factors 2, 4, 8, 16, 32, and 64.
5. The method of claim 1, wherein the memory is a first memory of a hierarchical memory system of the computing system, and wherein the first memory is not hierarchically closest to the processing core.
6. The method of ...
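A worked Python model of the two duplication stages; the lane counts, the mask function, and the boolean group-duplication flag are illustrative stand-ins for the instruction-specified template fields:

    def element_duplicate(elements, factor):
        # Repeat each element 'factor' times (the element duplication unit).
        return [e for e in elements for _ in range(factor)]

    def vector_mask(lanes, vector_length, total_lanes, group_duplicate):
        lanes = lanes[:vector_length]
        if group_duplicate:
            # Duplicate copies of the least significant lanes fill the excess lanes.
            return [lanes[i % vector_length] for i in range(total_lanes)]
        # Group duplication disabled: excess lanes hold all zeros.
        return lanes + [0] * (total_lanes - vector_length)

    dup = element_duplicate([1, 2], factor=2)              # [1, 1, 2, 2]
    print(vector_mask(dup, 4, 8, group_duplicate=False))   # [1, 1, 2, 2, 0, 0, 0, 0]
    print(vector_mask(dup, 4, 8, group_duplicate=True))    # [1, 1, 2, 2, 1, 1, 2, 2]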

Publication date: 27-02-2020

Determining the effectiveness of prefetch instructions

Number: US20200065253A1
Assignee: International Business Machines Corp

Effectiveness of prefetch instructions is determined. A prefetch instruction is executed to request that data be fetched into a cache of the computing environment. The effectiveness of the prefetch instruction is then determined. This includes updating, based on executing the prefetch instruction, a cache directory of the cache; the update records, in the cache directory, effectiveness data relating to the data, including whether the data was installed in the cache based on the prefetch instruction. Additionally, determining the effectiveness includes obtaining at least a portion of the effectiveness data from the cache directory and using it to determine the effectiveness of the prefetch instruction.
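Hypothetical Python bookkeeping for the effectiveness data; the directory fields and the usefulness metric are my assumptions, not the patent's encoding:

    directory = {}   # line address -> {'prefetched': bool, 'used': bool}

    def prefetch(addr):
        # Install the line and record that a prefetch instruction brought it in.
        directory[addr] = {"prefetched": True, "used": False}

    def demand_access(addr):
        if addr in directory:
            directory[addr]["used"] = True          # the prefetched data paid off
        else:
            directory[addr] = {"prefetched": False, "used": True}

    def prefetch_effectiveness():
        # Fraction of prefetched lines that were later used by demand accesses.
        pre = [e for e in directory.values() if e["prefetched"]]
        return sum(e["used"] for e in pre) / len(pre) if pre else None

    prefetch(0x100); prefetch(0x200)
    demand_access(0x100)
    print(prefetch_effectiveness())   # 0.5: one of two prefetched lines was used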

Publication date: 09-03-2017

Hardware accelerated conversion system using pattern matching

Number: US20170068540A1
Author: Mohammad Abdallah
Assignee: SOFT MACHINES, INC.

A method for converting guest instructions into native instructions is disclosed. The method comprises accessing a guest instruction and performing a first level translation of the guest instruction. The performing comprises: (a) comparing the guest instruction to a plurality of group masks and a plurality of tags stored in multi-level conversion tables by pattern matching subfields of the guest instruction in a hierarchical manner, wherein the conversion tables store mappings of guest instruction bit-fields to corresponding native instruction bit-fields; and (b) responsive to a hit in a conversion table, substituting a bit-field in the guest instruction with a corresponding native equivalent of the bit-field. The method further comprises performing a second level translation of the guest instruction using a second level conversion table and outputting a resulting native instruction when the second level translation proceeds to completion.
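A toy two-level conversion table in Python in the spirit of the abstract; the masks, tags, bit-field substitutions, and instruction encodings are entirely invented:

    LEVEL1 = [   # (mask, tag, substitution) over an 8-bit guest opcode
        (0xF0, 0xA0, lambda g: 0x10 | (g & 0x0F)),   # guest ALU ops -> native ALU
        (0xF0, 0xB0, lambda g: 0x20 | (g & 0x0F)),   # guest loads   -> native loads
    ]
    LEVEL2 = {0x10: "add", 0x20: "ld"}               # second-level table

    def convert(guest):
        for mask, tag, substitute in LEVEL1:          # pattern-match subfields
            if guest & mask == tag:                   # hit in the conversion table
                native = substitute(guest)            # bit-field substitution
                return LEVEL2[native & 0xF0], native & 0x0F
        raise ValueError("no first-level match")      # hardware would fall back

    print(convert(0xA3))   # ('add', 3): guest 0xA3 became a native add, field 3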
