Total found: 10914. Showing 200.
10-01-2005 publication date

SYSTEM AND METHOD FOR PREFETCHING DATA INTO A CACHE BASED ON THE MISS DISTANCE

Number: RU2003119149A
Assignee:

... 1. A prefetch device for prefetching data for an instruction based on the interval between cache misses caused by the instruction. 2. The prefetch device of claim 1, wherein the prefetch device has a memory for storing a prefetch table containing one or more entries that include the interval between cache misses caused by the instruction. 3. The prefetch device of claim 2, wherein the prefetch table contains an entry for an instruction only if the instruction has caused at least two cache misses. 4. The prefetch device of claim 2, wherein the addresses of the data elements to be prefetched are determined based on the interval between cache misses recorded in the prefetch table for the instruction. 5. The prefetch device of claim 2, wherein the prefetch device has a "noise" filter for preventing ...
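The claims above describe a prefetcher keyed on the interval (distance) between cache misses caused by an instruction. A minimal sketch of the idea, where the class name, table layout, and the trivial zero-distance "noise" filter are illustrative assumptions rather than the patent's implementation:

```python
# Illustrative sketch: a per-instruction prefetch table keyed by instruction
# address (pc), recording the distance between successive cache-miss addresses
# and issuing a prefetch at last_miss + distance.
class MissDistancePrefetcher:
    def __init__(self):
        # pc -> {"last_miss": addr, "distance": stride, "misses": count}
        self.table = {}

    def on_cache_miss(self, pc, miss_addr):
        """Record a miss; return a prefetch address once a distance is known."""
        entry = self.table.get(pc)
        if entry is None:
            # First miss: remember it but record no distance yet
            # (mirrors claim 3: an entry is useful only after two misses).
            self.table[pc] = {"last_miss": miss_addr, "distance": None, "misses": 1}
            return None
        entry["distance"] = miss_addr - entry["last_miss"]
        entry["last_miss"] = miss_addr
        entry["misses"] += 1
        if entry["distance"]:  # crude "noise" filter: ignore zero distances
            return miss_addr + entry["distance"]
        return None


p = MissDistancePrefetcher()
p.on_cache_miss(0x40, 100)   # first miss for pc 0x40: no prefetch yet
p.on_cache_miss(0x40, 164)   # distance 64 learned: prefetch 228
```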

09-06-2010 publication date

Preload instruction control

Number: GB0201006758D0
Author:
Assignee:
12-07-1995 publication date

Manipulation of data

Number: GB0009509987D0
Author:
Assignee:
11-01-1989 publication date

METHODS OF OPERATING MICROPROCESSORS

Number: GB0002169115B
Assignee: SONY CORP, SONY CORPORATION
10-03-2010 publication date

System and Method for Reducing Execution Divergence in Parallel Processing Architectures

Number: GB0002463142A
Assignee:

A method for reducing execution divergence among a plurality of threads executable within a parallel processing architecture (eg SIMD) includes an operation of determining, among a plurality of data sets (410) that function as operands for a plurality of different execution commands, a preferred execution type for the collective plurality of data sets. A data set is assigned from a data set pool to a thread (434) which is to be executed by the parallel processing architecture, the assigned data set being of the preferred execution type, whereby the parallel processing architecture is operable to concurrently execute a plurality of threads, the plurality of concurrently executable threads including the thread having the assigned data set. An execution command (436) for which the assigned data set functions as an operand is applied to each of the plurality of threads. The pool may comprise local memory storage (412 & 414). If one of the plurality of threads has terminated the data set may be ...
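The grouping step described above can be sketched as follows. The pool representation and the choice of "most frequent pending command" as the preferred execution type are assumptions for illustration, not the patent's method:

```python
# Hypothetical sketch: pick the "preferred execution type" (the most common
# command among pending data sets) and assign only data sets of that type to
# the threads of one SIMD batch, so all lanes execute the same command.
from collections import Counter

def build_simd_batch(data_set_pool, num_threads):
    """data_set_pool: list of (command, operand) pairs awaiting execution."""
    if not data_set_pool:
        return []
    # Preferred execution type = most frequent command in the pool.
    preferred, _ = Counter(cmd for cmd, _ in data_set_pool).most_common(1)[0]
    batch = [ds for ds in data_set_pool if ds[0] == preferred][:num_threads]
    for ds in batch:
        data_set_pool.remove(ds)  # data set assigned to a thread
    return batch
```

Because every data set in the returned batch shares one command, applying that command across the batch causes no lane divergence.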

10-02-1982 publication date

Improvements in or relating to data processing systems including cache stores

Number: GB0002080989A
Assignee:

A data processing system includes a cache store to provide an interface with a main storage unit for a central processing unit. The central processing unit includes a microprogram control unit in addition to control circuits for establishing the sequencing of the processing unit during the execution of program instructions. Both the microprogram control unit and control circuits include means for generating pre-read commands to the cache store in conjunction with normal processing operations during the processing of certain types of instructions. In response to pre-read commands, the cache store, during predetermined points of the processing of each such instruction, fetches information which is required by such instruction at a later point in the processing thereof.

04-08-2004 publication date

Memory access latency hiding with hint buffer

Number: GB0002397918A
Assignee:

A request hint is issued prior to or while identifying whether requested data and/or one or more instructions are in a first memory. A second memory is accessed to fetch the data and/or instruction(s) in response to the request hint. The data and/or instruction(s) accessed from the second memory are stored in a buffer. If the requested data and/or instruction(s) are not in the first memory, the data and/or instruction(s) are returned from the buffer.
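A rough sketch of this hint-buffer flow, assuming dict-backed memories purely for illustration:

```python
# Hedged sketch of the hint-buffer scheme: a request hint starts the
# second-memory access alongside the first-memory lookup; the fetched data
# lands in a buffer and is consumed only on a first-memory miss.
class HintBufferMemory:
    def __init__(self, first_memory, second_memory):
        self.first = first_memory    # fast memory (e.g. a cache): addr -> data
        self.second = second_memory  # slower backing memory: addr -> data
        self.buffer = {}

    def read(self, addr):
        # Issue the request hint: fetch from the second memory into the buffer
        # before/while the first-memory lookup completes.
        self.buffer[addr] = self.second[addr]
        if addr in self.first:        # first-memory hit: buffered copy unused
            return self.first[addr]
        return self.buffer.pop(addr)  # miss: data is already in the buffer
```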

03-07-2002 publication date

Processor, multiprocessor system and method for data dependence speculative execution

Number: GB0000211979D0
Author:
Assignee:
25-08-2010 publication date

Data processing apparatus and method dependent on streaming preload instruction

Number: GB0002468007A
Assignee:

Data processing apparatus comprising a processor 10 and a cache memory 32 having a plurality of cache lines. A cache controller 34 is also provided, comprising: preload circuitry 35, operable in response to a streaming preload instruction received at the processor, for storing data values from a main memory into cache lines; identification circuitry 36, operable in response to the streaming preload instruction, to identify cache lines for preferential reuse (for example by setting a valid bit associated with the cache line); and cache maintenance circuitry 37 to select cache lines for reuse having regard to any preferred-for-reuse identification generated by the identification circuitry. In this way, a single streaming preload instruction can be used to trigger both a preload of cache lines of data values into the cache memory and also to mark for preferential reuse other cache lines of the cache memory. Data values can be stored in cache lines following the current line address for preload or preceding ...
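The dual effect of the streaming preload instruction (preload some lines, mark others for preferential reuse) might be sketched like this; the marking policy ("lines behind the stream start") is an assumption for illustration, not the patent's rule:

```python
# Sketch (all names assumed): one operation both preloads a range of lines
# into the cache and marks other lines as preferred-for-reuse, so the
# maintenance logic evicts those first.
class StreamingCache:
    def __init__(self):
        self.lines = {}           # line_addr -> data
        self.prefer_reuse = set()

    def streaming_preload(self, main_memory, start, count):
        for addr in range(start, start + count):
            self.lines[addr] = main_memory[addr]   # "preload circuitry"
        # "identification circuitry": mark lines behind the stream for reuse
        for addr in list(self.lines):
            if addr < start:
                self.prefer_reuse.add(addr)

    def evict_one(self):
        # "cache maintenance circuitry": prefer marked lines as victims
        victim = next(iter(self.prefer_reuse), None)
        if victim is None:
            victim = next(iter(self.lines))
        self.prefer_reuse.discard(victim)
        del self.lines[victim]
        return victim
```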

01-11-2006 publication date

Method and apparatus for dynamically adjusting the aggressiveness of an execute-ahead processor

Number: GB0000618749D0
Author:
Assignee:
03-11-1965 publication date

Asynchronous digital computer

Number: GB0001008775A
Author:
Assignee:

... 1,008,775. Digital electronic computers. GENERAL ELECTRIC CO. Ltd. Nov. 13, 1961 [Nov. 21, 1960], No.40597/61. Heading G4A. A binary serial-mode computer in which instructions and data are stored on a magnetic drum 10 is arranged to operate partly with single address and partly with two-address instructions, each word read from the drum being passed to a register such as 32 or 33 via an arithmetic unit 25 so that the simpler instructions may be carried out in one word-time. Where possible instructions (single address) are placed in alternate locations on the drum, and the operand required by a given instruction is placed in the next location. When an operation is to occupy more than one word-time, or the alternate arrangement of instructions and operands is not possible, a two-address instruction (double length) is used, one bit of the first word being reserved for indicating this; the second address then being that of the next instruction. An address consists of seven bits specifying a ...

13-02-1980 publication date

DATA PROCESSING SYSTEMS

Number: GB0001561091A
Author:
Assignee:
24-06-2020 publication date

Gateway pull model

Number: GB0002579412A
Author: Brian Manula
Assignee:

A gateway for interfacing a host with a subsystem acting as a work accelerator to the host. A computer system comprising: (i) a computer subsystem configured to act as a work accelerator, and (ii) a gateway connected to the computer subsystem, the gateway enabling the transfer of data to the computer subsystem from external storage at pre-compiled data exchange synchronisation points attained by the computer subsystem, which act as a barrier between a compute phase and an exchange phase of the computer subsystem, wherein the computer subsystem is configured to pull data from a gateway transfer memory of the gateway in response to the pre-compiled data exchange synchronisation point attained by the subsystem, wherein the gateway comprises at least one processor configured to perform at least one operation to pre-load at least some of the data from a first memory of the gateway to the gateway transfer memory in advance of the pre-compiled data exchange synchronisation point attained by ...

15-05-2010 publication date

SYSTEM WITH BROAD OPERAND ARCHITECTURE AND PROCEDURE

Number: AT0000467171T
Assignee:
17-11-2003 publication date

System and method for linking speculative results of load operations to register values

Number: AU2002367915A8
Assignee:
20-10-2003 publication date

Time-multiplexed speculative multi-threading to support single-threaded applications

Number: AU2003222244A8
Assignee:
21-11-2002 publication date

ISSUANCE AND EXECUTION OF MEMORY INSTRUCTIONS TO AVOID READ-AFTER-WRITE HAZARDS

Number: CA0002447425A1
Assignee:

A method and apparatus for issuing and executing memory instructions so as to maximize the number of requests issued to a highly pipelined memory and avoid reading data from memory (10) before a corresponding write to memory (10). The memory is divided into a number of regions, each of which is associated with a fence counter (18) that is incremented each time a memory instruction that is targeted to the memory region is issued and decremented each time there is a write to the memory region. After a fence instruction is issued, no further memory instructions (23) are issued if the counter for the memory region specified in the fence instruction is above a threshold. When a sufficient number of the outstanding issued instructions are executed, the counter will be decremented below the threshold and further memory instructions are then issued.
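A minimal sketch of the fence-counter mechanism above, assuming one counter per region and a zero threshold; all names are illustrative:

```python
# Illustrative sketch: each memory region has a counter, incremented when a
# memory instruction targeting the region issues and decremented when a write
# to the region completes; after a fence, further issues to the region are
# blocked while its counter exceeds the threshold.
class FenceCounters:
    def __init__(self, num_regions, threshold=0):
        self.counters = [0] * num_regions
        self.threshold = threshold
        self.fenced = set()

    def issue(self, region):
        """Try to issue a memory instruction; False means blocked by a fence."""
        if region in self.fenced and self.counters[region] > self.threshold:
            return False
        self.counters[region] += 1
        return True

    def write_complete(self, region):
        self.counters[region] -= 1
        if region in self.fenced and self.counters[region] <= self.threshold:
            self.fenced.discard(region)  # fence satisfied: issues may resume

    def fence(self, region):
        self.fenced.add(region)
```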

12-09-2003 publication date

METHOD OF PREFETCHING DATA/INSTRUCTIONS RELATED TO EXTERNALLY TRIGGERED EVENTS

Number: CA0002478007A1
Author: DOERING, ANDREAS
Assignee:

Method of prefetching data/instructions related to externally triggered events in a system including an infrastructure (18) having an input interface (20) for receiving data/instructions to be handled by the infrastructure and an output interface (22) for transmitting data after they have been handled, a memory (14) for storing data/instructions when they are received by the input interface, a processor (10) for processing at least some data/instructions, the processor having a cache wherein the data/instructions are stored before being processed, and an external source (26) for assigning sequential tasks to the processor. The method comprises the following steps which are performed while the processor is performing a previous task: determining the location in the memory of data/instructions to be processed by the processor, indicating to the cache the addresses of these memory locations, fetching the contents of the memory locations and writing them into the cache, and assigning the task ...

24-05-2017 publication date

Cross-coupled level shifter with transition tracking circuits

Number: CN0106716346A
Assignee:
04-05-2016 publication date

System and method for implementing shaped memory access operations

Number: CN0103218208B
Author:
Assignee:
08-06-1994 publication date

Method of operating a microcomputer with pipeline structure

Number: CN0001024960C
Assignee: SONY CORP, SONY K.K.
16-11-1979 publication date

DATA TRANSFER CONTROL DEVICE

Number: FR0002423822A
Assignee:

When an operand requested by unit A9 is not found in the cache A13, the first data-line portion transferred from the main memory A16, as well as the following line portions of the operand, are transferred to the cache and at the same time to unit A9. This unit comprises pairs of operand and operand-address registers for receiving variable-length operands with alignment. This device increases the performance of systems equipped with caches.

14-12-1962 publication date

Synchronous digital computer

Number: FR0001312094A
Author:
Assignee:
30-08-2002 publication date

METHOD FOR MANAGING INSTRUCTIONS WITHIN A DECOUPLED PROCESSOR ARCHITECTURE, IN PARTICULAR A DIGITAL SIGNAL PROCESSOR, AND CORRESPONDING PROCESSOR

Number: FR0002821449A1
Author: COFLER ANDREW
Assignee:

The processing unit DU is associated with a first FIFO memory RLDQ and a second FIFO memory DIDQ. Each load instruction LDRx into a register is stored in the first memory RLDQ. At least some of the other instructions are stored in the second memory DIDQ. An operative instruction involving at least one register is extracted from the second memory if no temporally older load instruction intended to modify the value of the register or registers associated with that operative instruction is present in the first memory. In the presence of such an older load instruction, the operative instruction is extracted from the second memory only after the modifying load instruction has been extracted from the first memory. Application to a digital signal processor.

04-07-1986 publication date

METHOD OF OPERATING A MICROCOMPUTER WITH AN IMPROVED INSTRUCTION CYCLE

Number: FR0002575563A
Author: NOBUHISA WATANABE
Assignee:

A microcomputer comprises an instruction decoder 4 and a program counter 5. The instruction decoder decodes fetched instructions and delivers a control signal ordering execution of the fetched instruction. The control signal from the instruction decoder includes an instruction-cycle control component which triggers a fetch cycle at the beginning of each instruction cycle, to fetch the operand of the instruction being executed, and halfway through each instruction cycle, to fetch the op code of the next instruction. The program counter responds to the triggering of each fetch cycle by incrementing its count so as to keep it consistent with the address accessed during each fetch cycle.

31-03-2006 publication date

METHOD AND APPARATUS FOR FACILITATING SPECULATIVE STORES IN A MULTIPROCESSOR SYSTEM

Number: KR0100567099B1
Author:
Assignee:
10-11-2004 publication date

TIME-MULTIPLEXED SPECULATIVE MULTI-THREADING TO SUPPORT SINGLE-THREADED APPLICATIONS

Number: KR20040094888A
Assignee:

One embodiment of the present invention provides a system that facilitates interleaved execution of a head thread and a speculative thread within a single processor pipeline. The system operates by executing program instructions using the head thread, and by speculatively executing program instructions in advance of the head thread using the speculative thread, wherein the head thread and the speculative thread execute concurrently through time-multiplexed interleaving in the single processor pipeline. © KIPO & WIPO 2007 ...

09-05-2008 publication date

ADVANCED LOAD VALUE CHECK ENHANCEMENT

Number: KR1020080041251A
Author: RYCHLIK BOHUSLAV
Assignee:

Systems and methods for performing re-ordered computer instructions are disclosed. A computer processor loads a first value from a first memory address, and records both the first value and the second value in a table or queue. The processor stores a second value to the same memory address, and either evicts the previous table entry, or adds the second value to the previous table entry. Upon subsequently detecting the evicted table entry or inconsistent second value, the processor generates an exception that triggers recovery of speculative use of the first value. © KIPO & WIPO 2008 ...
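The table-based check described above can be sketched as follows; this is a simplified software model with assumed names, not the actual hardware structure:

```python
# Hedged sketch: record a speculatively loaded (address, value) pair; a later
# store to the same address evicts the entry, and the check performed before
# speculative use detects the eviction and triggers recovery.
class AdvancedLoadTable:
    def __init__(self):
        self.entries = {}  # addr -> speculatively loaded value

    def advanced_load(self, addr, memory):
        self.entries[addr] = memory[addr]
        return memory[addr]

    def store(self, addr, value, memory):
        memory[addr] = value
        self.entries.pop(addr, None)  # evict: speculation on addr now invalid

    def check(self, addr):
        """True if the speculative value may be used; False means recover."""
        return addr in self.entries
```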

21-08-2006 publication date

Method, apparatus and computer system for generating prefetches by speculatively executing code during stalls

Number: TWI260540B
Author:
Assignee:

One embodiment of the present invention provides a system that generates prefetches by speculatively executing code during stalls through a technique known as "hardware scout threading." The system starts by executing code within a processor. Upon encountering a stall, the system speculatively executes the code from the point of the stall, without committing results of the speculative execution to the architectural state of the processor. If the system encounters a memory reference during this speculative execution, the system determines if a target address for the memory reference can be resolved. If so, the system issues a prefetch for the memory reference to load a cache line for the memory reference into a cache within the processor. In a variation on this embodiment, the processor supports simultaneous multithreading (SMT), which enables multiple threads to execute concurrently through time-multiplexed interleaving in a single processor pipeline. In this variation, the non-speculative ...

06-12-2007 publication date

GRAPHICS PROCESSOR WITH ARITHMETIC AND ELEMENTARY FUNCTION UNITS

Number: WO2007140338A2
Assignee:

A graphics processor capable of efficiently performing arithmetic operations and computing elementary functions is described. The graphics processor has at least one arithmetic logic unit (ALU) that can perform arithmetic operations and at least one elementary function unit that can compute elementary functions. The ALU(s) and elementary function unit(s) may be arranged such that they can operate in parallel to improve throughput. The graphics processor may also include fewer elementary function units than ALUs, e.g., four ALUs and a single elementary function unit. The four ALUs may perform an arithmetic operation on (1) four components of an attribute for one pixel or (2) one component of an attribute for four pixels. The single elementary function unit may operate on one component of one pixel at a time. The use of a single elementary function unit may reduce cost while still providing good performance.

21-12-2007 publication date

APPARATUS AND METHOD OF PREFETCHING DATA

Number: WO2007145700A1
Author: KELTCHER, Paul, S.
Assignee:

A device (300) and method is illustrated to prefetch information based on a location of an instruction that resulted in a cache miss during its execution. The prefetch information to be accessed is determined based on previous and current cache miss information. For example, information based on previous cache misses is stored at data records as prefetch information. This prefetch information includes location information based on an instruction that caused a previous cache miss, and is accessed to generate prefetch requests for a current cache miss. The prefetch information is updated based on current cache miss information.

16-08-2007 publication date

METHOD AND APPARATUS FOR SIMULTANEOUS SPECULATIVE THREADING

Number: WO000002007092281A2
Assignee:

One embodiment of the present invention provides a system which performs simultaneous speculative threading. The system starts by executing instructions in normal execution mode using a first thread. Upon encountering a data-dependent stall condition, the first thread generates an architectural checkpoint and commences execution of instructions in execute-ahead mode. During execute-ahead mode, the first thread executes instructions that can be executed and defers instructions that cannot be executed into a deferred queue. When the data dependent stall condition has been resolved, the first thread generates a speculative checkpoint and continues execution in execute-ahead mode. At the same time, the second thread commences execution in a deferred mode, wherein the second thread executes instructions deferred by the first thread.
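The head-thread/deferred-queue split above can be modelled in a few lines; the data structures and scheduling choices here are assumptions for illustration, not the patented mechanism:

```python
# Illustrative sketch: a head thread defers instructions whose operands are
# unavailable into a queue; once the stall resolves, a second thread drains
# the deferred queue while the head thread continues in execute-ahead mode.
from collections import deque

def execute_ahead(instructions, ready):
    """instructions: list of (op, source_operands); ready: available operands.
    Returns (executed ops, deferred queue)."""
    done, deferred = [], deque()
    for op, srcs in instructions:
        if all(s in ready for s in srcs):
            done.append(op)
        else:
            deferred.append((op, srcs))  # cannot execute yet: defer
    return done, deferred

def deferred_mode(deferred, ready):
    """Second thread: execute deferred instructions once their data arrives."""
    return [op for op, srcs in deferred if all(s in ready for s in srcs)]
```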

02-08-2007 publication date

PROCESSOR HAVING A DATA MOVER ENGINE THAT ASSOCIATES REGISTER ADDRESSES WITH MEMORY ADDRESSES

Number: WO000002007087270A2
Assignee:

A RISC processor having a data moving engine and instructions that associate register addresses with memory addresses. In an embodiment, the instructions include a read-tie instruction, a single write-tie instruction, a dual write-tie instruction, and an untie instruction. The read-tie, single write-tie, and dual write-tie instructions are used to associate software accessible register addresses with memory addresses. These associations effect the operation of the data moving engine such that, for the duration of the associations, the data moving engine routes data to and from associated memory addresses and the execution unit of the processor in response to instructions that specify moving data to and from the associated register addresses. The invention reduces the number of instructions and hardware overhead associated with implementing program loops in a RISC processor.

18-11-2004 publication date

SYSTEM AND METHOD TO PREVENT IN-FLIGHT INSTANCES OF OPERATIONS FROM DISRUPTING OPERATION REPLAY WITHIN A DATA-SPECULATIVE MICROPROCESSOR

Number: WO2004099977A2
Assignee:

A microprocessor (100) may include one or more functional units (126) configured to execute operations, a scheduler (118) configured to issue operations to the functional units (126) for execution, and at least one replay detection unit. The scheduler (118) may be configured to maintain state information (606) for each operation. Such state information (606) may, among other things, indicate whether an associated operation has completed execution. The replay detection unit may be configured to detect that one of the operations in the scheduler (118) should be replayed. If an instance of that operation is currently being executed by one of the functional units (126) when the operation is detected as needing to be replayed, the replay detection unit is configured to inhibit an update to the state information (606) for that operation in response to execution of the in-flight instance of the operation. Various embodiments of computer systems (900) may include such a microprocessor (100).

09-08-2007 publication date

CROSS-ARCHITECTURE OPTIMIZATION

Number: WO000002007089535A3
Assignee:

Embodiments include a device, apparatus, and a method. An apparatus includes a monitor circuit for determining an execution characteristic of a first instruction associated with a first computing machine architecture. The apparatus also includes a generator circuit for creating an optimization profile useable in an execution of a second instruction associated with a second computing machine architecture.

13-05-2004 publication date

ADAPTABLE DATAPATH FOR A DIGITAL PROCESSING SYSTEM

Number: WO2004040414A3
Author: RAMCHANDRAN, Amit
Assignee:

The present invention includes an adaptable high-performance node (RXN) with several features that enable it to provide high performance along with adaptability. A preferred embodiment of the RXN includes a run-time configurable data path and control path. The RXN supports multi-precision arithmetic including 8, 16, 24, and 32 bit codes. Data flow can be reconfigured to minimize register accesses for different operations. For example, multiply-accumulate operations can be performed with minimal, or no, register stores by reconfiguration of the data path. Predetermined kernels can be configured during a setup phase so that the RXN can efficiently execute, e.g., discrete cosine transform (DCT), fast-Fourier transform (FFT) and other operations. Other features are provided.

27-03-2008 publication date

INTELLIGENT PRE-FETCHING USING COMPOUND OPERATIONS

Number: WO2008036499A1
Assignee:

A system and method for pre-fetching data uses a combination of heuristics to determine likely next data retrieval operations and an evaluation of available resources for executing speculative data operations. When local resources, such as cache memory for storing speculative command results is not available, the compound operation request may not be sent. When resources on a server-side system are insufficient, only the primary command of a compound operation request may be processed and speculative command requests may be rejected. Both local computing resources and network resources may be evaluated when determining whether to build or process a compound operations request.

11-07-2002 publication date

SYSTEM AND METHOD FOR PREFETCHING DATA INTO A CACHE BASED ON MISS DISTANCE

Number: WO2002054230A2
Assignee:

A prefetcher to prefetch data for an instruction based on the distance between cache misses caused by the instruction. In an embodiment, the prefetcher includes a memory to store a prefetch table that contains one or more entries that include the distance between cache misses caused by an instruction. In a further embodiment, the addresses of data elements prefetched are determined based on the distance between cache misses recorded in the prefetch table for the instruction.

15-05-2003 publication date

SYSTEM AND METHOD TO REDUCE EXECUTION OF INSTRUCTIONS INVOLVING UNRELIABLE DATA IN A SPECULATIVE PROCESSOR

Number: WO2003040916A1
Assignee:

System and method to reduce execution of instructions involving unreliable data in a speculative processor. A method comprises identifying scratch values generated during speculative execution of a processor, and setting at least one tag associated with at least one data area of the processor to indicate that the data area holds a scratch value. Such data areas include registers, predicates, flags, and the like. Instructions may also be similarly tagged. The method may be executed by an execution engine in a computer processor.

11-07-2002 publication date

PROCESSOR ARCHITECTURE FOR SPECULATED VALUES

Number: WO2002054229A1
Assignee:

The present invention pertains to a super-scalar processor (1.1) and is intended to make execution of instructions in processor (1.1) more efficient. Processor (1.1) contains a state machine (21) that speculates values of variables. State machine (21) also determines, for each of the speculated values, if there is a first instruction that is dependent upon the speculated value. Processor (1.1) also determines if the speculation of a value has failed and restarts execution from a specified instruction in response to the detection of an incorrectly speculated value. If this is the case, processor (1.1) restarts from the specified instruction that is first affected by the speculated value for which speculation has failed.

16-10-2007 publication date

Method and system for safe data dependency collapsing based on control-flow speculation

Number: US0007284116B2
Assignee: Intel Corporation

The present invention is directed to an apparatus and method for data collapsing based on control-flow speculation (conditional branch predictions). Because conditional branch outcomes are resolved based on actual data values, the conditional branch prediction provides potentially valuable insight into data values. Upon encountering a branch if equal instruction and this instruction is predicted as taken or a branch if not equal instruction and this instruction is predicted as not taken, this invention assumes that the two operands used to determine the conditional branch are equal. The data predictions are safe because a data misprediction means a conditional branch misprediction which results in a pipeline flush of the instructions following the conditional branch instruction including the data mispredictions.

20-04-2006 publication date

Processor, data processing system and method for synchronizing access to data in shared memory

Number: US20060085605A1

A processing unit for a multiprocessor data processing system includes a processor core including a store-through upper level cache, an instruction sequencing unit that fetches instructions for execution, a data register, and at least one instruction execution unit. The instruction execution unit, responsive to receipt of a load-reserve instruction from the instruction sequencing unit, executes the load-reserve instruction to determine a load target address. The processor core, responsive to the execution of the load-reserve instruction, performs a corresponding load-reserve operation by accessing the store-through upper level cache utilizing the load target address to cause data associated with the load target address to be loaded from the store-through upper level cache into the data register and by establishing a reservation for a reservation granule including the load target address.

29-11-2007 publication date

Graphics processor with arithmetic and elementary function units

Number: US20070273698A1
Assignee:

A graphics processor capable of efficiently performing arithmetic operations and computing elementary functions is described. The graphics processor has at least one arithmetic logic unit (ALU) that can perform arithmetic operations and at least one elementary function unit that can compute elementary functions. The ALU(s) and elementary function unit(s) may be arranged such that they can operate in parallel to improve throughput. The graphics processor may also include fewer elementary function units than ALUs, e.g., four ALUs and a single elementary function unit. The four ALUs may perform an arithmetic operation on (1) four components of an attribute for one pixel or (2) one component of an attribute for four pixels. The single elementary function unit may operate on one component of one pixel at a time. The use of a single elementary function unit may reduce cost while still providing good performance.

29-08-2000 publication date

Processor configured to generate lookahead results from operand collapse unit and for inhibiting receipt/execution of the first instruction based on the lookahead result

Number: US0006112293A1
Author: Witt; David B.
Assignee: Advanced Micro Devices, Inc.

A processor includes a lookahead address/result calculation unit which is configured to receive operand information (either the operand or a tag identifying the instruction which will produce the operand value) corresponding to the source operands of one or more instructions. If the operands are available, lookahead address/result calculation unit may generate either a lookahead address for a memory operand of the instruction or a lookahead result corresponding to a functional instruction operation of the instruction. The lookahead address may be provided to a load/store unit for early initiation of a memory operation corresponding to the instruction. The lookahead result may be provided to a speculative operand source (e.g. a future file) for updating therein. A lookahead state for a register may thereby be provided early in the pipeline. Subsequent instructions may receive the lookahead state and use the lookahead state to generate additional lookahead state early. On the other hand, ...

07-07-1998 publication date

Prefetch instruction for improving performance in reduced instruction set processor

Number: US0005778423A1
Assignee: Digital Equipment Corporation

A high-performance CPU of the RISC (reduced instruction set) type employs a standardized, fixed instruction size, and permits only simplified memory access data width and addressing modes. The instruction set is limited to register-to-register operations and register load/store operations. Byte manipulation instructions, included to permit use of previously-established data structures, include the facility for doing in-register byte extract, insert and masking, along with non-aligned load and store instructions. The provision of load/locked and store/conditional instructions permits the implementation of atomic byte writes. By providing a conditional move instruction, many short branches can be eliminated altogether. A conditional move instruction tests a register and moves a second register to a third if the condition is met; this function can be substituted for short branches and thus maintain the sequentiality of the instruction stream. Performance can be speeded up by predicting the ...

Подробнее
12-07-2005 дата публикации

64-bit single cycle fetch scheme for megastar architecture

Номер: US0006918018B2

The 64-bit single cycle fetch method described here relates to a specific 'megastar' core processor employed in a range of new digital signal processor devices. The 'megastar' core incorporates 32-bit memory blocks arranged into separate entities or banks. Because the parent CPU has only three 16-bit buses, a maximum read in one clock cycle through the memory interface would normally be 48 bits. This invention describes a fetch method that taps into the memory bank data at an earlier stage, prior to the memory interface. This allows the normal 48-bit fetch to be extended to 64 bits, as required for full performance of the numerical processor accelerator and other speed-critical operations and functions.

Подробнее
18-01-2000 дата публикации

Apparatus and method for tracking changes in address size and for different size retranslate second instruction with an indicator from address size

Номер: US0006016544A
Автор:
Принадлежит:

An apparatus and method for improving the execution speed of stack segment load operations is provided. Rather than delaying translation of instructions following stack segment loads until the load is complete, the present invention presumes that no change will be made to the stack address size. Tracking of the stack address size at the time of translation is performed by a plurality of SAS bits associated with translated micro instructions, and logic is provided which compares the tracked SAS bits with any change in the stack address size. If no change is made by the stack load operation, the already translated instructions execute immediately. If a change is made by the stack load operation, logic interrupts processing of the translated instructions, and the instructions are retranslated using the new stack address size.

Подробнее
05-06-2001 дата публикации

Method and system for asynchronous array loading

Номер: US0006243822B1

The present invention decreases the delay associated with loading an array from memory by employing an asynchronous array preload unit. The asynchronous array preload unit provides continuous preliminary loading of data arrays located in a memory subsystem into a prefetch buffer. Array loading is performed asynchronously with respect to execution of the main program.

Подробнее
04-04-1995 дата публикации

CPU having pipelined instruction unit and effective address calculation unit with retained virtual address capability

Номер: US0005404467A
Автор:
Принадлежит:

A prefetch unit includes a Branch history table for providing an indication of an occurrence of a Branch instruction having a Target Address that was previously taken. A plurality of Branch mark bits are stored in an instruction queue, on a half word basis, in conjunction with a double word of instruction data that is prefetched from an instruction cache. The Branch Target Address is employed to redirect instruction prefetching. The Branch Target Address is also pipelined and follows the associated Branch instruction through an instruction pipeline. The prefetch unit includes circuitry for automatically self-filling the instruction pipeline. During a Fetch stage a previously generated Virtual Effective Address is applied to a translation buffer to generate a physical address which is used to access a data cache. The translation buffer includes a first and a second translation buffer, with the first translation buffer being a reduced subset of the second. The first translation buffer is ...

Подробнее
26-06-2001 дата публикации

Prefetch instruction mechanism for processor

Номер: US0006253306B1

Accordingly, a prefetch instruction mechanism is desired for implementing a prefetch instruction which is non-faulting, non-blocking, and non-modifying of architectural register state. Advantageously, the prefetch mechanism described herein is provided largely without the addition of substantial complexity to a load execution unit. In one embodiment, the non-faulting attribute of the prefetch mechanism is provided through use of a vector decode supplied Op sequence that activates an alternate exception handler. The non-modifying of architectural register state attribute is provided (in an exemplary embodiment) by first decoding a PREFETCH instruction to an Op sequence targeting a scratch register, wherein the scratch register has scope limited to the Op sequence corresponding to the PREFETCH instruction. Although described in the context of a vector decode embodiment, the prefetch mechanism can be implemented with hardware decoders, and suitable modifications to decode paths will be appreciated ...

Подробнее
04-06-2002 дата публикации

Apparatus for software initiated prefetch and method therefor

Номер: US0006401192B1

A mechanism and method for software hint initiated prefetch is provided. The prefetch may be directed to a prefetch of data for loading into a data cache, instructions for entry into an instruction cache or for either, in an embodiment having a combined cache. In response to a software instruction in an instruction stream, a plurality of prefetch specification data values are loaded into a register having a plurality of entries corresponding thereto. Prefetch specification data values include the address of the first cache line to be prefetched, and the stride, or the incremental offset, of the address of subsequent lines to be prefetched. Prefetch requests are generated by a prefetch control state machine using the prefetch specification data values stored in the register. Prefetch requests are issued to a hierarchy of cache memory devices. If a cache hit occurs having the specified cache coherency, the prefetch is vitiated. Otherwise, the request is passed to system memory for retrieval ...
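The prefetch specification described above — a first cache-line address plus a stride for subsequent lines — can be sketched as a simple request generator that skips (vitiates) lines already present in the cache and passes the rest toward system memory. This is an illustrative model under assumed names (`issue_prefetches`, a set standing in for the cache hierarchy), not the patent's state machine.

```python
def issue_prefetches(spec, cache_lines):
    """Generate prefetch requests from a (first_line, stride, count) spec.

    spec        -- tuple of first cache-line address, address stride, request count
    cache_lines -- set of line addresses already held with the required coherency
    """
    first, stride, count = spec
    issued = []
    for i in range(count):
        addr = first + i * stride
        if addr in cache_lines:   # cache hit with specified coherency: vitiate
            continue
        issued.append(addr)       # otherwise pass the request to system memory
    return issued
```

With a 64-byte line stride, `issue_prefetches((0x1000, 0x40, 4), {0x1080})` requests three lines and drops the one that already hits in the cache.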

Подробнее
24-05-2011 дата публикации

Processor architecture with wide operand cache

Номер: US0007948496B2

A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.

Подробнее
26-04-2011 дата публикации

Processor for executing switch and translate instructions requiring wide operands

Номер: US0007932911B2

A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.

Подробнее
28-06-2016 дата публикации

Modification of prefetch depth based on high latency event

Номер: US0009378144B2

A prefetch stream is established in a prefetch unit of a memory controller for a system memory at a lowest level of a volatile memory hierarchy of the data processing system based on a memory access request received from a processor core. The memory controller receives an indication of an upcoming high latency event affecting access to the system memory. In response to the indication, the memory controller temporarily increases a prefetch depth of the prefetch stream with respect to the system memory and issues, to the system memory, a plurality of prefetch requests in accordance with the temporarily increased prefetch depth in advance of the upcoming high latency event.
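The depth-modulation scheme above can be sketched as a stream prefetcher whose request count per issue temporarily grows when the memory controller is warned of a high latency event. All names (`StreamPrefetcher`, the `boost` parameter, the 64-byte line size) are assumptions for the example, not taken from the patent.

```python
class StreamPrefetcher:
    """Prefetch stream whose depth is temporarily increased ahead of a
    high latency event affecting system memory access."""
    def __init__(self, base_depth):
        self.base_depth = base_depth
        self.depth = base_depth
        self.next_addr = 0

    def notify_high_latency_event(self, boost):
        self.depth = self.base_depth + boost     # deepen the stream in advance

    def event_complete(self):
        self.depth = self.base_depth             # restore the normal depth

    def issue(self, line_size=64):
        """Issue one batch of prefetch requests at the current depth."""
        reqs = [self.next_addr + i * line_size for i in range(self.depth)]
        self.next_addr += self.depth * line_size
        return reqs
```

The deeper batch issued before the event buffers enough stream data to ride out the period when system memory is unavailable.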

Подробнее
01-03-2007 дата публикации

Cryptography methods and apparatus

Номер: US2007050641A1
Принадлежит:

In a first aspect, a first cryptography method is provided. The first method includes the steps of (1) in response to receiving a request to perform a first operation on data in a first memory cacheline, accessing data associated with the first memory cacheline; (2) performing cryptography on data of the first memory cacheline when necessary; and (3) speculatively accessing data associated with a second memory cacheline based on the first memory cacheline before receiving a request to perform an operation on data in the second memory cacheline. Numerous other aspects are provided.

Подробнее
18-11-2004 дата публикации

System and method to handle operand re-fetch after invalidation with out-of-order fetch

Номер: US2004230776A1
Автор:
Принадлежит:

A system and method to re-fetch data lost, for instructions with operands greater than eight bytes in length, due to line invalidation in a multiprocessor computer system using microprocessors that perform out-of-order operand fetch, in cases where it is not possible or desirable to kill execution of the instruction when the storage access rules require that the operand data appear to be accessed in program execution order.

Подробнее
06-12-2001 дата публикации

High-speed data processing using internal processor memory space

Номер: US2001049744A1
Автор:
Принадлежит:

Significant performance improvements can be realized in data processing systems by confining the operation of a processor within its internal register file so as to reduce the instruction count executed by the processor. Data, which is sufficiently small enough to fit within the internal register file, can be transferred into the internal register file, and execution results can be removed therefrom, using direct memory accesses that are independent of the processor, thus enabling the processor to avoid execution of load and store instructions to manipulate externally stored data. Further, the data and execution results of the processing activity are also accessed and manipulated by the processor entirely within the internal register file. The reduction in instruction count, coupled with the standardization of multiple processors and their instruction sets, enables the realization of a highly scaleable, high-performing symmetrical multi-processing system at manageable complexity and cost ...

Подробнее
31-12-2019 дата публикации

Determining the effectiveness of prefetch instructions

Номер: US0010521350B2

Effectiveness of prefetch instructions is determined. A prefetch instruction is executed to request that data be fetched into a cache of the computing environment. The effectiveness of the prefetch instruction is determined. This includes updating, based on executing the prefetch instruction, a cache directory of the cache. The updating includes recording, in the cache directory, effectiveness data relating to the data, including whether the data was installed in the cache based on the prefetch instruction. Additionally, determining the effectiveness includes obtaining at least a portion of the effectiveness data from the cache directory, and using that portion of the effectiveness data to determine the effectiveness of the prefetch instruction.
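One way to picture the effectiveness data in the cache directory: each entry carries a flag saying whether the line was installed by a prefetch, and whether it was subsequently used by demand accesses. A prefetch is then judged effective if its installed lines were actually consumed. The entry layout and the ratio metric below are assumptions for illustration; the patent does not prescribe them.

```python
class CacheDirectoryEntry:
    """Directory entry augmented with prefetch-effectiveness data."""
    def __init__(self, tag, by_prefetch):
        self.tag = tag
        self.installed_by_prefetch = by_prefetch  # set when a prefetch installs the line
        self.used_after_install = False           # set on a later demand hit

def prefetch_effectiveness(directory):
    """Fraction of prefetch-installed lines that were later used."""
    installed = [e for e in directory if e.installed_by_prefetch]
    if not installed:
        return 0.0
    return sum(e.used_after_install for e in installed) / len(installed)
```

Software (or hardware heuristics) could read such a ratio to throttle prefetch instructions that rarely install useful data.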

Подробнее
12-12-2019 дата публикации

METHOD FOR PERFORMING RANDOM READ ACCESS TO A BLOCK OF DATA USING PARALLEL LUT READ INSTRUCTION IN VECTOR PROCESSORS

Номер: US20190377578A1
Принадлежит:

This disclosure is directed to the problem of parallelizing random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part of a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
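The mechanism reduces to two steps: replicate the data block into one table per SIMD lane, then let a single LUT-read instruction gather, for each lane, the element selected by that lane's index. The sketch below models this in scalar Python; `setup_tables` and `parallel_lut_read` are hypothetical names standing in for the table setup and the vector LUT-read instruction.

```python
def setup_tables(block, lanes):
    """Replicate the data block into one look up table per SIMD lane."""
    return [list(block) for _ in range(lanes)]

def parallel_lut_read(tables, indices):
    """One LUT-read instruction: lane i of the destination vector receives
    tables[i][indices[i]] -- every lane reads its own copy of the table,
    so all random accesses complete simultaneously."""
    assert len(tables) == len(indices)
    return [table[idx] for table, idx in zip(tables, indices)]
```

Replicating the block costs directly addressable memory, but it converts a serial chain of random loads into one vector operation per batch of indices.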

Подробнее
04-07-2002 дата публикации

Multi-mode non-binary predictor

Номер: US2002087850A1
Автор:
Принадлежит:

A multi-mode predictor for a processor having a plurality of prediction modes is disclosed. The prediction modes are used to predict non-binary values. The predictor is a multi-mode predictor comprising a per-IP ("PIP") table and a next value table. The PIP table includes a plurality of PIP information fields and the next value table includes a plurality of fields. The multi-mode predictor also includes a plurality of prediction modes. The processor includes a set of instructions that index the PIP table to provide a valid signal. The processor also includes a set of predicted values for the set of instructions. The set of predicted values is stored in the PIP table and the next value table. According to the valid signal and a hit/miss condition in the next value table, a predicted value is selected from the PIP table or the next value table.
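Two classic non-binary prediction modes are last-value and stride prediction, and a minimal two-table predictor in that spirit can be sketched as follows. This is an assumed simplification — the patent's PIP table holds richer per-instruction information and more modes — with invented names (`ValuePredictor`, `pip`, `stride`).

```python
class ValuePredictor:
    """Per-IP table (last value) plus a next-value table (stride per IP).
    Predicts last + stride when a stride is known, else the last value."""
    def __init__(self):
        self.pip = {}      # ip -> last observed value (valid signal = presence)
        self.stride = {}   # ip -> last observed delta

    def predict(self, ip):
        if ip not in self.pip:
            return None                        # no valid signal: cannot predict
        return self.pip[ip] + self.stride.get(ip, 0)

    def update(self, ip, actual):
        """Train on the resolved value after the instruction executes."""
        if ip in self.pip:
            self.stride[ip] = actual - self.pip[ip]
        self.pip[ip] = actual
```

For a load walking an array, the stride mode locks on after two observations and predicts every subsequent address correctly.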

Подробнее
20-06-2019 дата публикации

TWO ADDRESS TRANSLATIONS FROM A SINGLE TABLE LOOK-ASIDE BUFFER READ

Номер: US20190188151A1
Принадлежит:

A streaming engine employed in a digital data processor specifies a fixed read only data stream. An address generator produces virtual addresses of data elements. An address translation unit converts these virtual addresses to physical addresses by comparing the most significant bits of a next address N with the virtual address bits of each entry in an address translation table. Upon a match, the translated address is the physical address bits of the matching entry and the least significant bits of address N. The address translation unit can generate two translated addresses. If the most significant bits of address N+1 match those of address N, the same physical address bits are used for translation of address N+1. The sequential nature of the data stream increases the probability that consecutive addresses match the same address translation entry and can use this technique.
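The pairing trick above — translate address N, then reuse the same entry for address N+1 whenever its virtual page bits match — can be shown with a dictionary standing in for the address translation table. The 12-bit page size and the function name `translate_pair` are assumptions for the sketch.

```python
PAGE_BITS = 12
OFFSET_MASK = (1 << PAGE_BITS) - 1

def translate_pair(tlb, vaddr_n, vaddr_n1):
    """Translate address N with one table read; reuse the matching entry
    for address N+1 when its upper (virtual page number) bits are the same."""
    vpn = vaddr_n >> PAGE_BITS
    ppn = tlb[vpn]                          # the single look-aside buffer read
    phys_n = (ppn << PAGE_BITS) | (vaddr_n & OFFSET_MASK)
    if (vaddr_n1 >> PAGE_BITS) == vpn:      # sequential stream stays on the page
        ppn1 = ppn                          # no second table read needed
    else:
        ppn1 = tlb[vaddr_n1 >> PAGE_BITS]   # page crossed: read again
    phys_n1 = (ppn1 << PAGE_BITS) | (vaddr_n1 & OFFSET_MASK)
    return phys_n, phys_n1
```

Because a fixed data stream walks addresses sequentially, the same-page case dominates, so most cycles get two translations from one table read.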

Подробнее
27-04-2021 дата публикации

Mechanism for interrupting and resuming execution on an unprotected pipeline processor

Номер: US0010990398B2

Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, beginning execution of the first instruction, receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher priority task than the first instruction, storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline, copying the register state from the capture queue to a memory, determining that the one or more second instructions have been executed, copying the register state from the memory to the one or more registers of the capture queue, and restoring the register state to the instruction execution pipeline from the capture queue.

Подробнее
16-12-2021 дата публикации

STREAMING ENGINE WITH ERROR DETECTION, CORRECTION AND RESTART

Номер: US20210390018A1
Принадлежит:

Disclosed embodiments relate to a streaming engine employed in, for example, a digital signal processor. A fixed data stream sequence including plural nested loops is specified by a control register. The streaming engine includes an address generator producing addresses of data elements and a stream head register storing the data elements next to be supplied as operands. The streaming engine fetches stream data ahead of use by the central processing unit core into a stream buffer. Parity bits are formed upon storage of data in the stream buffer and are stored with the corresponding data. Upon transfer to the stream head register a second parity is calculated and compared with the stored parity. The streaming engine signals a parity fault if the parities do not match. The streaming engine preferably restarts fetching the data stream at the data element that generated the parity fault.
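The parity check on the stream-buffer-to-stream-head transfer amounts to: compute parity at store time, recompute at read time, and flag a fault on mismatch. A word-granularity even-parity sketch (names `parity_bit` and `check_and_transfer` are invented; the patent computes parity at its own granularity):

```python
def parity_bit(word):
    """Even parity over the bits of a non-negative data word."""
    p = 0
    while word:
        p ^= word & 1
        word >>= 1
    return p

def check_and_transfer(stored_word, stored_parity):
    """On transfer to the stream head register, recompute parity and compare.
    A mismatch signals a fault so the engine can restart the fetch at this
    data element instead of consuming corrupted stream data."""
    if parity_bit(stored_word) != stored_parity:
        return ('parity_fault', None)
    return ('ok', stored_word)
```
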

Подробнее
14-11-2002 дата публикации

Scalable processor

Номер: US2002169947A1
Автор:
Принадлежит:

A method and apparatus for issuing and executing memory instructions from a computer system so as to (1) maximize the number of requests issued to a highly pipelined memory, the only limitation being data dependencies in the program, and (2) avoid reading data from memory before a corresponding write to memory. The memory instructions are organized to read and write into memory by using explicit move instructions, thereby avoiding any data storage limitations in the processor. The memory requests are organized to carry complete information, so that they can be processed independently when memory returns the requested data. The memory is divided into a number of regions, each of which is associated with a fence counter. The fence counter for a memory region is incremented each time a memory instruction that is targeted to the memory region is issued and decremented each time there is a write to the memory region. After a fence instruction is issued, no further memory instructions are issued ...
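The per-region fence counters described above can be modeled directly: increment on issue of a memory instruction targeting the region, decrement when the corresponding write lands, and let a fence drain only when the counter is back to zero. The class and method names below are invented for the sketch.

```python
class FencedMemory:
    """Per-region fence counters for ordering reads behind outstanding writes."""
    def __init__(self, num_regions):
        self.counters = [0] * num_regions

    def issue(self, region):
        """A memory instruction targeting this region is issued."""
        self.counters[region] += 1

    def write_complete(self, region):
        """A write to this region has reached memory."""
        self.counters[region] -= 1

    def fence_clear(self, region):
        """A fence on this region may pass only when no issued instruction
        is still outstanding against it."""
        return self.counters[region] == 0
```

A fence thus blocks subsequent memory instructions only for regions with work in flight, leaving traffic to other regions unconstrained.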

Подробнее
17-03-2020 дата публикации

Streaming engine with cache-like stream data storage and lifetime tracking

Номер: US0010592243B2

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores the data elements next to be supplied to functional units for use as operands. The streaming engine fetches stream data ahead of use by the central processing unit core into a stream buffer constructed like a cache. The stream buffer cache includes plural cache lines, each including tag bits, at least one valid bit and data bits. Cache lines are allocated to store newly fetched stream data. Cache lines are deallocated upon consumption of the data by a central processing unit core functional unit. Instructions preferably include operand fields with a first subset of codings corresponding to registers, a stream read only operand coding and a stream read and advance operand coding.

Подробнее
17-08-2021 дата публикации

Systems and methods to load a tile register pair

Номер: US0011093247B2
Принадлежит: Intel Corporation, INTEL CORP

Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.

Подробнее
22-12-2022 дата публикации

INSTRUCTION TO QUERY FOR MODEL-DEPENDENT INFORMATION

Номер: US20220405100A1
Принадлежит:

An instruction is executed to perform a query function. The executing includes obtaining information relating to a selected model of a processor. The information includes at least one model-dependent data attribute of the selected model of the processor. The information is placed in a selected location for use by at least one application in performing one or more functions.

Подробнее
05-04-2022 дата публикации

Method and apparatus for vector permutation

Номер: US0011294826B2
Принадлежит: Texas Instruments Incorporated

A method is provided that includes performing, by a processor in response to a vector permutation instruction, permutation of values stored in lanes of a vector to generate a permuted vector, wherein the permutation is responsive to a control storage location storing permute control input for each lane of the permuted vector, wherein the permute control input corresponding to each lane of the permuted vector indicates a value to be stored in the lane of the permuted vector, wherein the permute control input for at least one lane of the permuted vector indicates a value of a selected lane of the vector is to be stored in the at least one lane, and storing the permuted vector in a storage location indicated by an operand of the vector permutation instruction.
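Stripped of encoding details, the permutation the instruction performs is: lane i of the result takes the value of the source lane selected by the permute control input for lane i. A one-line model (the name `vperm` is an invented stand-in for the vector permutation instruction):

```python
def vperm(vec, control):
    """Vector permutation: result lane i holds vec[control[i]].
    Control entries may repeat (broadcast) or reorder lanes arbitrarily."""
    return [vec[sel] for sel in control]
```

Note that the same source lane may feed several result lanes, which is what makes the instruction useful for broadcasts and shuffles as well as pure permutations.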

Подробнее
17-05-2006 дата публикации

Probing computer memory latency

Номер: EP0000933698B1
Принадлежит: Hewlett-Packard Company

Подробнее
07-06-2006 дата публикации

Microprocessor with variable latency stack cache

Номер: EP0001555617A3
Автор: Hooker, Rodney E.
Принадлежит:

A variable latency cache memory is disclosed. The cache memory includes a plurality of storage elements for storing stack memory data in a first-in-first-out manner. The cache memory distinguishes between pop and load instruction requests and provides pop data faster than load data by speculating that pop data will be in the top cache line of the cache. The cache memory also speculates that stack data requested by load instructions will be in the top one or more cache lines of the cache memory. Consequently, if the source virtual address of a load instruction hits in the top of the cache memory, the data is speculatively provided faster than the case where the data is in a lower cache line or where a full physical address compare is required or where the data must be provided from a non-stack cache memory in the microprocessor, but slower than pop data.

Подробнее
18-08-2004 дата публикации

Information processing apparatus and method, and scheduling device

Номер: EP0000797143B1

Подробнее
14-12-1994 дата публикации

CPU HAVING PIPELINED INSTRUCTION UNIT AND EFFECTIVE ADDRESS CALCULATION UNIT WITH RETAINED VIRTUAL ADDRESS CAPABILITY

Номер: EP0000628184A1
Принадлежит:

A prefetch unit includes a Branch history table for providing an indication of an occurrence of a Branch instruction having a Target Address that was previously taken. A plurality of Branch mark bits are stored in an instruction queue, on a half word basis, in conjunction with a double word of instruction data that is prefetched from an instruction cache. The Branch Target Address is employed to redirect instruction prefetching. The Branch Target Address is also pipelined and follows the associated Branch instruction through an instruction pipeline. The prefetch unit includes circuitry for automatically self-filling the instruction pipeline. During a Fetch stage a previously generated Virtual Effective Address is applied to a translation buffer to generate a physical address which is used to access a data cache. The translation buffer includes a first and a second translation buffer, with the first translation buffer being a reduced subset of the second. The first translation buffer is ...

Подробнее
14-11-2001 дата публикации

PROCESSOR CONFIGURED TO PREDECODE RELATIVE CONTROL TRANSFER INSTRUCTIONS

Номер: EP0001031075B1
Автор: WITT, David, B.
Принадлежит: ADVANCED MICRO DEVICES INC.

Подробнее
30-01-1980 дата публикации

DATA PROCESSING SYSTEM

Номер: JP0055013498A
Принадлежит:

Подробнее
10-05-1979 дата публикации

DATA PROCESSING DEVICE WITH A MICROINSTRUCTION MEMORY

Номер: DE0002847934A1
Принадлежит:

Подробнее
16-10-2008 дата публикации

Method and apparatus for speculative prefetch in a multiprocessor/multicore message-passing engine

Номер: DE102008016178A1
Принадлежит:

In some embodiments, the invention includes a novel combination of methods for prefetching data and for passing messages between and among cores in a multiprocessor/multicore platform. In one embodiment, a receiving core has a message queue and a message prefetcher. Incoming messages are written into the message queue and the message prefetcher simultaneously. The prefetcher speculatively fetches the data referenced in the received message, so that the data is available when the message is executed in the execution pipeline, or shortly thereafter. Other embodiments are described and claimed.

Подробнее
23-12-2020 дата публикации

Gateway pull model

Номер: GB0002579412B
Автор: BRIAN MANULA, Brian Manula
Принадлежит: GRAPHCORE LTD, Graphcore Limited

Подробнее
24-11-1993 дата публикации

MULTI-LEVEL CACHE SYSTEM

Номер: GB0009320511D0
Автор:
Принадлежит:

Подробнее
26-01-2012 дата публикации

Parallel loop management

Номер: US20120023316A1
Принадлежит: International Business Machines Corp

The illustrative embodiments comprise a method, data processing system, and computer program product having a processor unit for processing instructions with loops. A processor unit creates a first group of instructions having a first set of loops and second group of instructions having a second set of loops from the instructions. The first set of loops have a different order of parallel processing from the second set of loops. A processor unit processes the first group. The processor unit monitors terminations in the first set of loops during processing of the first group. The processor unit determines whether a number of terminations being monitored in the first set of loops is greater than a selectable number of terminations. In response to a determination that the number of terminations is greater than the selectable number of terminations, the processor unit ceases processing the first group and processes the second group.

Подробнее
08-03-2012 дата публикации

Method and apparatus for handling critical blocking of store-to-load forwarding

Номер: US20120059971A1
Принадлежит: Advanced Micro Devices Inc

The present invention provides a method and apparatus for handling critical blocking of store-to-load forwarding. One embodiment of the method includes recording a load that matches an address of a store in a store queue before the store has valid data. The load is blocked because the store does not have valid data. The method also includes replaying the load in response to the store receiving valid data so that the valid data is forwarded from the store queue to the load.
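The block-and-replay flow above can be sketched with a small store queue model: a load that matches a store's address before the store has data is recorded as blocked, and the arrival of the store data both enables forwarding and names the loads to replay. `StoreQueue` and its methods are illustrative names, and the single-address entries are a simplification of real address matching.

```python
class StoreQueue:
    """Store queue with store-to-load forwarding and blocked-load replay."""
    def __init__(self):
        self.entries = {}        # addr -> data (None until the store has valid data)
        self.blocked_loads = {}  # addr -> list of load ids waiting on that store

    def allocate_store(self, addr):
        """Store enters the queue with its address but no data yet."""
        self.entries[addr] = None

    def load(self, load_id, addr):
        if addr in self.entries:
            data = self.entries[addr]
            if data is None:                         # address match, no valid data:
                self.blocked_loads.setdefault(addr, []).append(load_id)
                return ('blocked', None)             # the load must wait
            return ('forwarded', data)               # store-to-load forwarding
        return ('from_cache', None)                  # no match: normal cache access

    def store_data(self, addr, data):
        """Store receives valid data; return the loads to replay."""
        self.entries[addr] = data
        return self.blocked_loads.pop(addr, [])
```

Replayed loads then hit the now-valid entry and receive the forwarded data instead of stalling the pipeline indefinitely.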

Подробнее
22-03-2012 дата публикации

Constant Buffering for a Computational Core of a Programmable Graphics Processing Unit

Номер: US20120069033A1
Принадлежит: Via Technologies Inc

Embodiments of the present disclosure are directed to graphics processing systems, comprising: a plurality of execution units, wherein one of the execution units is configurable to process a thread corresponding to a rendering context, wherein the rendering context comprises a plurality of constants with a priority level; a constant buffer configurable to store the constants of the rendering context into a plurality of slots in a physical storage space; an execution unit control unit configurable to assign the thread to one of the execution units; and a constant buffer control unit providing a translation table for the rendering context to map the corresponding constants into the slots of the physical storage space. Comparable methods are also disclosed.

Подробнее
05-04-2012 дата публикации

Tracking written addresses of a shared memory of a multi-core processor

Номер: US20120084498A1
Принадлежит: LSI Corp

Described embodiments provide a method of controlling processing flow in a network processor having one or more processing modules. A given one of the processing modules loads a script into a compute engine. The script includes instructions for the compute engine. The given one of the processing modules loads a register file into the compute engine. The register file includes operands for the instructions of the loaded script. A tracking vector of the compute engine is initialized to a default value, and the compute engine executes the instructions of the loaded script based on the operands of the loaded register file. The compute engine updates corresponding portions of the register file with updated data corresponding to the executed script. The tracking vector tracks the updated portions of the register file. The compute engine provides the tracking vector and the updated register file to the given one of the processing modules.

Подробнее
03-05-2012 дата публикации

Out-of-order load/store queue structure

Номер: US20120110280A1
Автор: Christopher D. Bryant
Принадлежит: Advanced Micro Devices Inc

The present invention provides a method and apparatus for supporting embodiments of an out-of-order load/store queue structure. One embodiment of the apparatus includes a first queue for storing memory operations adapted to be executed out-of-order with respect to other memory operations. The apparatus also includes one or more additional queues for storing a memory operation in response to completion of that memory operation. The embodiment of the apparatus is configured to remove the memory operation from the first queue in response to the completion.

Подробнее
17-05-2012 дата публикации

Retirement serialisation of status register access operations

Номер: US20120124340A1
Автор: James Nolan Hardage
Принадлежит: ARM LTD

A processor 2 for performing out-of-order execution of a stream of program instructions includes a special register access pipeline for performing status access instructions accessing a status register 20. In order to serialise these status access instructions relative to other instructions within the system, access timing control circuitry 32 permits dispatch of other instructions to proceed but controls the commit queue and the result queue such that no program instructions succeeding the status access instruction in program order are permitted to complete until a trigger state has been detected in which all program instructions preceding the status access instruction in program order have been performed and have made any updates to the architectural state. This is followed by the performance of the status access instruction itself.

Подробнее
07-06-2012 дата публикации

Unified scheduler for a processor multi-pipeline execution unit and methods

Номер: US20120144173A1
Принадлежит: Advanced Micro Devices Inc

A unified scheduler for a processor execution unit and methods are disclosed for providing faster throughput of micro-instruction/operation execution with respect to a multi-pipeline processor execution unit. In one example, an execution unit has a plurality of pipelines that operate at a predetermined clock rate, each pipeline configured to process a selected subset of microinstructions. The execution unit has a scheduler that includes a unified queue configured to queue microinstructions for all of the pipelines and a picker configured to direct a queued microinstruction to an appropriate pipeline for processing based on an indication of readiness for picking. Preferably, when all of the pipelines are ready to receive a microinstruction for processing and there is at least one microinstruction queued that is ready for picking for each pipeline, the picker picks and directs a queued microinstruction to each of the pipelines in a single clock cycle.
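As a sketch of the single-cycle pick described above, the following Python model keeps a unified queue and picks at most one ready micro-op per pipeline each cycle. `MicroOp`, `pick_cycle`, and the use of queue order as age/priority are illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass

@dataclass
class MicroOp:
    name: str
    pipeline: int      # which pipeline can execute this op (assumed tagging)
    ready: bool        # operands available, eligible for picking

def pick_cycle(queue, num_pipelines):
    """Pick at most one ready micro-op for each pipeline in a single cycle."""
    picked = {}
    for op in queue:                     # queue order models age/priority
        if op.ready and op.pipeline not in picked:
            picked[op.pipeline] = op
        if len(picked) == num_pipelines:
            break
    for op in picked.values():           # picked ops leave the unified queue
        queue.remove(op)
    return picked

queue = [MicroOp("add", 0, True), MicroOp("mul", 1, False),
         MicroOp("sub", 0, True), MicroOp("div", 1, True)]
picked = pick_cycle(queue, 2)
```

Here both pipelines receive one micro-op in the same cycle (`add` and `div`), while the not-yet-ready `mul` and the lower-priority `sub` stay queued.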

Подробнее
14-06-2012 дата публикации

Cache Line Fetching and Fetch Ahead Control Using Post Modification Information

Номер: US20120151150A1
Принадлежит: LSI Corp

A method is provided for performing cache line fetching and/or cache fetch ahead in a processing system including at least one processor core and at least one data cache operatively coupled with the processor. The method includes the steps of: retrieving post modification information from the processor core and a memory address corresponding thereto; and the processing system performing, as a function of the post modification information and the memory address retrieved from the processor core, cache line fetching and/or cache fetch ahead control in the processing system.
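A minimal sketch of the idea, assuming the post-modification information retrieved from the core is an address-register increment that can serve as a prefetch stride; `LINE_SIZE`, `lines_to_prefetch`, and the fetch-ahead depth are hypothetical names and parameters, not the patent's.

```python
# Hypothetical fetch-ahead control: infer the stride from the core's
# post-modification increment and return the cache lines to fetch ahead.

LINE_SIZE = 64  # bytes per cache line (assumed)

def lines_to_prefetch(addr, post_modify, depth=2):
    """addr: memory address of the access; post_modify: the increment the
    core applies to its address register after the access."""
    stride = post_modify
    lines = []
    next_addr = addr
    for _ in range(depth):
        next_addr += stride
        line = (next_addr // LINE_SIZE) * LINE_SIZE  # align to line start
        if line not in lines:
            lines.append(line)
    return lines
```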

Подробнее
26-07-2012 дата публикации

Predicting a result for an actual instruction when processing vector instructions

Номер: US20120191957A1
Автор: Jeffry E. Gonion
Принадлежит: Apple Inc

The described embodiments provide a processor that executes vector instructions. In the described embodiments, while dispatching instructions at runtime, the processor encounters an Actual instruction. Upon determining that a result of the Actual instruction is predictable, the processor dispatches a prediction micro-operation associated with the Actual instruction, wherein the prediction micro-operation generates a predicted result vector for the Actual instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation, if a predicate vector is received, the processor generates the predicted result vector by setting each element for which the predicate vector is active to true; otherwise, the processor sets every element of the predicted result vector to true.
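The predicted-result generation reduces to a small function; this Python sketch (names assumed) sets active elements true when a predicate vector is supplied and all elements true otherwise.

```python
# Illustrative sketch: generate the predicted result vector for an Actual
# instruction. With a predicate vector, only active elements are set true;
# without one, every element is set true.

def predicted_result_vector(length, predicate=None):
    if predicate is None:
        return [True] * length
    return [bool(p) for p in predicate]
```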

Подробнее
26-07-2012 дата публикации

Sharing a fault-status register when processing vector instructions

Номер: US20120192005A1
Автор: Jeffry E. Gonion
Принадлежит: Apple Inc

The described embodiments provide a processor that executes vector instructions. In the described embodiments, the processor initializes an architectural fault-status register (FSR) and a shadow copy of the architectural FSR by setting each of N bit positions in the architectural FSR and the shadow copy of the architectural FSR to a first predetermined value. The processor then executes a first first-faulting or non-faulting (FF/NF) vector instruction. While executing the first vector instruction, the processor also executes one or more subsequent FF/NF instructions. In these embodiments, when executing the first vector instruction and the subsequent vector instructions, the processor updates one or more bit positions in the shadow copy of the architectural FSR to a second predetermined value upon encountering a fault condition. However, the processor does not update bit positions in the architectural FSR upon encountering a fault condition for the first vector instruction and the subsequent vector instructions.
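A toy model of the two-register scheme, assuming bit value 0 as the first predetermined value and 1 as the second; the class and method names are illustrative.

```python
# Faults raised by first-faulting/non-faulting (FF/NF) vector instructions
# update the shadow FSR only; the architectural FSR keeps its initialized
# value until software chooses to commit or inspect the shadow copy.

class FaultStatus:
    def __init__(self, n_bits):
        self.arch = [0] * n_bits     # architectural FSR (first value)
        self.shadow = [0] * n_bits   # shadow copy of the architectural FSR

    def record_fault(self, bit_position):
        # FF/NF instructions record fault conditions in the shadow copy only
        self.shadow[bit_position] = 1   # second predetermined value
```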

Подробнее
09-08-2012 дата публикации

Embedded opcode within an intermediate value passed between instructions

Номер: US20120204006A1
Автор: Jorn Nystad
Принадлежит: ARM LTD

A data processing system 2 is used to evaluate a data processing function by executing a sequence of program instructions including an intermediate value generating instruction Inst 0 and an intermediate value consuming instruction Inst 1 . In dependence upon one or more input operands to the evaluation, an embedded opcode within the intermediate value passed between the intermediate value generating instruction and the intermediate value consuming instruction may be set to have a value indicating that a substitute instruction should be used in place of the intermediate value consuming instruction. The instructions may be floating point instructions, such as a floating point power instruction evaluating the data processing function a^b.

Подробнее
16-08-2012 дата публикации

Running unary operation instructions for processing vectors

Номер: US20120210099A1
Автор: Jeffry E. Gonion
Принадлежит: Apple Inc

During operation, a processor generates a result vector. In particular, the processor records a value from an element at a key element position in an input vector into a base value. Next, for each active element in the result vector to the right of the key element position, the processor generates a result vector by setting the element in the result vector equal to a result of performing a unary operation on the base value a number of times equal to a number of relevant elements. The number of relevant elements is determined from the key element position to and including a predetermined element in the result vector, where the predetermined element in the result vector may be one of: a first element to the left of the element in the result vector; or the element in the result vector.
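The running computation can be sketched in Python; `running_unary` is an illustrative name, inactive elements are left as their input values (an assumption), and `inclusive` selects which predetermined element bounds the relevant-element count (the element itself, or the first element to its left).

```python
# Sketch of a running unary operation: starting from the base value at the
# key element position, each active element to its right receives the unary
# op applied once per relevant element counted so far.

def running_unary(src, pred, key, op, inclusive=False):
    base = src[key]
    result = list(src)
    count = 0                              # relevant (active) elements seen
    for i in range(key + 1, len(src)):
        if pred[i]:
            if inclusive:
                count += 1                 # count up to and including element i
            val = base
            for _ in range(count):
                val = op(val)
            result[i] = val
            if not inclusive:
                count += 1                 # count up to the element left of i
    return result
```

With increment as the unary op, the exclusive form yields a shifted running count and the inclusive form a plain running count.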

Подробнее
27-09-2012 дата публикации

Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines

Номер: US20120246450A1
Автор: Mohammad Abdallah
Принадлежит: Soft Machines Inc

A system for executing instructions using a plurality of register file segments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality register file segments are coupled to the partitionable engines for providing data storage.

Подробнее
27-09-2012 дата публикации

Method and apparatus for enhancing scheduling in an advanced microprocessor

Номер: US20120246453A1

Apparatus and a method for causing scheduler software to produce code which executes more rapidly by ignoring some of the normal constraints placed on its scheduling operations and simply scheduling certain instructions to run as fast as possible, raising an exception if the scheduling violates a scheduling constraint, and determining steps to be taken for correctly executing each set of instructions about which an exception is raised.

Подробнее
18-10-2012 дата публикации

Allocation of counters from a pool of counters to track mappings of logical registers to physical registers for mapper based instruction executions

Номер: US20120265969A1
Принадлежит: International Business Machines Corp

A computer system assigns a particular counter from among a plurality of counters currently in a counter free pool to count a number of mappings of logical registers from among a plurality of logical registers to a particular physical register from among a plurality of physical registers, responsive to an execution of an instruction by a mapper unit mapping at least one logical register from among the plurality of logical registers to the particular physical register, wherein the number of the plurality of counters is less than a number of the plurality of physical registers. The computer system, responsive to the counted number of mappings of logical registers to the particular physical register decremented to less than a minimum value, returns the particular counter to the counter free pool.
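A compact sketch of the pooled-counter bookkeeping, assuming a minimum count of one mapping; `CounterPool` and its methods are illustrative names, not the patent's mapper-unit design.

```python
# Illustrative counter pool: counters are allocated from a free pool to
# count logical-to-physical register mappings, and returned to the pool
# once the count is decremented below the minimum (one mapping).

class CounterPool:
    def __init__(self, n_counters):
        self.free = list(range(n_counters))   # fewer counters than phys regs
        self.assigned = {}                    # phys reg -> [counter_id, count]

    def map_logical(self, phys_reg):
        if phys_reg not in self.assigned:
            self.assigned[phys_reg] = [self.free.pop(), 0]
        self.assigned[phys_reg][1] += 1

    def unmap_logical(self, phys_reg):
        entry = self.assigned[phys_reg]
        entry[1] -= 1
        if entry[1] < 1:                      # below minimum: free the counter
            self.free.append(entry[0])
            del self.assigned[phys_reg]
```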

Подробнее
03-01-2013 дата публикации

Processing vectors using wrapping add and subtract instructions in the macroscalar architecture

Номер: US20130007422A1
Автор: Jeffry E. Gonion
Принадлежит: Apple Inc

Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a sum or difference operation on another input vector dependent upon the input vector and the control vector.

Подробнее
14-02-2013 дата публикации

Word line late kill in scheduler

Номер: US20130042089A1
Принадлежит: Advanced Micro Devices Inc

A method for picking an instruction for execution by a processor includes providing a multiple-entry vector, each entry in the vector including an indication of whether a corresponding instruction is ready to be picked. The vector is partitioned into equal-sized groups, and each group is evaluated starting with a highest priority group. The evaluating includes logically canceling all other groups in the vector when a group is determined to include an indication that an instruction is ready to be picked, whereby the vector only includes a positive indication for the one instruction that is ready to be picked.
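A Python sketch of the group-cancel pick; `pick_one`, the 0/1 vector encoding, and treating lower indices as higher priority are assumptions for illustration.

```python
# Partition the ready vector into equal-sized groups; the highest-priority
# group containing a ready bit logically cancels all other groups, and only
# its first ready entry survives in the output pick vector.

def pick_one(ready, group_size):
    for start in range(0, len(ready), group_size):  # highest priority first
        group = ready[start:start + group_size]
        if any(group):
            out = [0] * len(ready)
            out[start + group.index(1)] = 1         # first ready entry wins
            return out
    return [0] * len(ready)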

Подробнее
21-02-2013 дата публикации

Memory Management Unit Tag Memory with CAM Evaluate Signal

Номер: US20130046927A1
Автор: Ravindraraj Ramaraju
Принадлежит: Individual

A method and data processing system for accessing an entry in a memory array by placing a tag memory unit ( 114 ) in parallel with an operand adder circuit ( 112 ) to enable tag lookup and generation of speculative way hit/miss information ( 126 ) directly from the operands ( 111, 113 ) without using the output sum of the operand adder. PGZ-encoded address bits ( 0:51 ) from the operands ( 111, 113 ) are applied with a carry-out value (Cout 48 ) to a content-addressable memory array ( 114 ) having compact bitcells with embedded partial A+B=K logic to generate two speculative hit/miss signals under control of a delayed evaluate signal. A sum value (EA 51 ) computed from the least significant base and offset address bits determines which of the speculative hit/miss signals is selected for output ( 126 ).

Подробнее
28-03-2013 дата публикации

Method, apparatus and instructions for parallel data conversions

Номер: US20130080742A1
Автор: Gopalan Ramanujam
Принадлежит: Individual

Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits. The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register.

Подробнее
04-04-2013 дата публикации

Tracking operand liveliness information in a computer system and performance function based on the liveliness information

Номер: US20130086367A1
Принадлежит: International Business Machines Corp

Operand liveness state information is maintained during context switches for current architected operands of executing programs, the current operand state information indicating whether corresponding current operands are either enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA) for disabling current architected operands, wherein a current operand is accessed by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module.

Подробнее
04-04-2013 дата публикации

Generating compiled code that indicates register liveness

Номер: US20130086548A1
Принадлежит: International Business Machines Corp

Object code is generated from an internal representation that includes a plurality of source operands. The generating includes, for each source operand in the internal representation, determining whether a last use has occurred for the source operand. The determining includes accessing a data flow graph to determine whether all uses of a live range have been emitted. If it is determined that a last use has occurred for the source operand, an architected resource associated with the source operand is marked for last-use indication. A last-use indication is then generated for the architected resource. Instructions and the last-use indications are emitted into the object code.

Подробнее
25-04-2013 дата публикации

Data processing device and method, and processor unit of same

Номер: US20130103930A1
Автор: Takashi Horikawa
Принадлежит: NEC Corp

A processor unit ( 200 ) includes: cache memory ( 210 ); an instruction execution unit ( 220 ); a processing unit ( 230 ) that detects the fact that a thread enters an exclusive control section which is specified in advance as likely to become a bottleneck; a processing unit ( 240 ) that detects the fact that the thread exits the exclusive control section; and an execution flag ( 250 ) that indicates, based on the detection results, whether there is a thread that is executing a process in the exclusive control section. The cache memory ( 210 ) temporarily stores a priority flag in each cache entry, and the priority flag indicates whether data is to be used during execution in the exclusive control section. When the execution flag ( 250 ) is set, the processor unit ( 200 ) sets the priority flag belonging to an access target among the cache entries. The processor unit ( 200 ) keeps data used in the exclusive control section in the cache memory by determining a replacement target among the cache entries using the priority flag when a cache miss occurs.

Подробнее
09-05-2013 дата публикации

Low overhead operation latency aware scheduler

Номер: US20130117543A1
Принадлежит: Advanced Micro Devices Inc

A method and apparatus for processing multi-cycle instructions include picking a multi-cycle instruction and directing the picked multi-cycle instruction to a pipeline. The pipeline includes a pipeline control configured to detect a latency and a repeat rate of the picked multi-cycle instruction and to count clock cycles based on the detected latency and the detected repeat rate. The method and apparatus further include detecting the repeat rate and the latency of the picked multi-cycle instruction, and counting clock cycles based on the detected repeat rate and the latency of the picked multi-cycle instruction.

Подробнее
16-05-2013 дата публикации

Processor with power control via instruction issuance

Номер: US20130124900A1
Принадлежит: Advanced Micro Devices Inc

Methods and apparatuses are provided for power control in a processor. The apparatus comprises a plurality of operational units arranged as a group of operational units. A power consumption monitor determines when cumulative power consumption of the group of operational units exceeds a threshold (e.g., either or both of the cumulative power threshold and the cumulative power rate threshold) during a time interval, after which a filter for issuing instructions to the group of operational units suspends instruction issuance to the group of operational units for the remainder of the time interval. The method comprises monitoring cumulative power consumption by a group of operational units within a processor over a time interval. If the cumulative power consumption of the group of operational units exceeds the threshold, instruction issuance to the group of operational units is suspended for the remainder of the time interval.

Подробнее
06-06-2013 дата публикации

System and method for performing shaped memory access operations

Номер: US20130145124A1
Принадлежит: Individual

One embodiment of the present invention sets forth a technique that provides an efficient way to retrieve operands from a register file. Specifically, the instruction dispatch unit receives one or more instructions, each of which includes one or more operands. Collectively, the operands are organized into one or more operand groups from which a shaped access may be formed. The operands are retrieved from the register file and stored in a collector. Once all operands are read and collected in the collector, the instruction dispatch unit transmits the instructions and corresponding operands to functional units within the streaming multiprocessor for execution. One advantage of the present invention is that multiple operands are retrieved from the register file in a single register access operation without resource conflict. Performance in retrieving operands from the register file is improved by forming shaped accesses that efficiently retrieve operands exhibiting recognized memory access patterns.

Подробнее
13-06-2013 дата публикации

Micro architecture for indirect access to a register file in a processor

Номер: US20130151818A1
Принадлежит: International Business Machines Corp

A method and system for improving performance and latency of instruction execution within an execution pipeline in a processor. The method includes finding, while decoding an instruction, a pointer register used by the instruction; reading the pointer register; validating a pointer register entry; reading, if the pointer register entry is valid, a register file entry; validating a register file entry; validating, if the register file entry is invalid, a valid register file entry wherein the valid register file entry is in the register file's future file; bypassing, if the valid register file entry is valid, a valid register file value from the register file's future file to the execution pipeline wherein the valid register file value is in the valid register file entry; and executing the instruction using the valid register file value; wherein at least one of the steps is carried out using a computer device.

Подробнее
04-07-2013 дата публикации

Processor for Executing Wide Operand Operations Using a Control Register and a Results Register

Номер: US20130173888A1
Принадлежит: Microunity Systems Engineering Inc

A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.

Подробнее
18-07-2013 дата публикации

Use of Loop and Addressing Mode Instruction Set Semantics to Direct Hardware Prefetching

Номер: US20130185516A1
Принадлежит: Qualcomm Inc

Systems and methods for prefetching cache lines into a cache coupled to a processor. A hardware prefetcher is configured to recognize a memory access instruction as an auto-increment-address (AIA) memory access instruction, infer a stride value from an increment field of the AIA instruction, and prefetch lines into the cache based on the stride value. Additionally or alternatively, the hardware prefetcher is configured to recognize that prefetched cache lines are part of a hardware loop, determine a maximum loop count of the hardware loop, and a remaining loop count as a difference between the maximum loop count and a number of loop iterations that have been completed, select a number of cache lines to prefetch, and truncate an actual number of cache lines to prefetch to be less than or equal to the remaining loop count, when the remaining loop count is less than the selected number of cache lines.
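The loop-count truncation reduces to simple arithmetic; this sketch (illustrative names) caps the selected prefetch count by the remaining iterations of the hardware loop.

```python
# Truncate the number of cache lines to prefetch so it never exceeds the
# remaining loop count (maximum loop count minus completed iterations).

def truncated_prefetch_count(selected, max_loop_count, completed):
    remaining = max_loop_count - completed
    if remaining < selected:
        return max(remaining, 0)
    return selected
```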

Подробнее
12-09-2013 дата публикации

Method, apparatus and instructions for parallel data conversions

Номер: US20130238879A1
Автор: Gopalan Ramanujam
Принадлежит: Individual

Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits. The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register.

Подробнее
03-10-2013 дата публикации

Methods and apparatus to avoid surges in di/dt by throttling gpu execution performance

Номер: US20130262831A1
Принадлежит: Nvidia Corp

Systems and methods for throttling GPU execution performance to avoid surges in DI/DT. A processor includes one or more execution units coupled to a scheduling unit configured to select instructions for execution by the one or more execution units. The execution units may be connected to one or more decoupling capacitors that store power for the circuits of the execution units. The scheduling unit is configured to throttle the instruction issue rate of the execution units based on a moving average issue rate over a large number of scheduling periods. The number of instructions issued during the current scheduling period is less than or equal to a throttling rate maintained by the scheduling unit that is greater than or equal to a minimum throttling issue rate. The throttling rate is set equal to the moving average plus an offset value at the end of each scheduling period.
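The throttle update is a small calculation; the moving-average-plus-offset rule and the minimum-rate floor follow the abstract, while the function names and integer rates are assumptions.

```python
# At the end of each scheduling period the throttling rate is set to the
# moving average issue rate plus an offset, never below the minimum rate;
# issues within a period may not exceed the current throttling rate.

def next_throttle_rate(moving_avg, offset, minimum):
    return max(moving_avg + offset, minimum)

def issues_allowed(requested, throttle_rate):
    # instructions issued this period are capped at the throttling rate
    return min(requested, throttle_rate)
```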

Подробнее
03-10-2013 дата публикации

Instruction Scheduling for Reducing Register Usage

Номер: US20130262832A1
Принадлежит: Advanced Micro Devices Inc

A method, computer program product, and system are provided for scheduling a plurality of instructions in a computing system. For example, the method can generate a plurality of instruction lineages, in which the plurality of instruction lineages is assigned to one or more registers. Each of the plurality of instruction lineages has at least one node representative of an instruction from the plurality of instructions. The method can also determine a node order based on respective priority values associated with each of the nodes. Further, the method can include scheduling the plurality of instructions based on the node order and the one or more registers assigned to the plurality of instruction lineages.

Подробнее
03-10-2013 дата публикации

Memory Disambiguation Hardware To Support Software Binary Translation

Номер: US20130262838A1
Принадлежит: Intel Corp

A method of memory disambiguation hardware to support software binary translation is provided. This method includes unrolling a set of instructions to be executed within a processor, the set of instructions having a number of memory operations. An original relative order of the memory operations is determined. Then, possible reordering problems are detected and identified in software; a reordering problem occurs when a first memory operation has been reordered prior to, and aliases with, a second memory operation with respect to the original order of memory operations. The reordering problem is addressed, and the relative order of memory operations is communicated to the processor.

Подробнее
31-10-2013 дата публикации

Method and device for determining parallelism of tasks of a program

Номер: US20130290975A1
Принадлежит: Intel Corp

A method and device for determining parallelism of tasks of a program comprises generating a task data structure to track the tasks and assigning a node of the task data structure to each executing task. Each node includes a task identification number and a wait number. The task identification number uniquely identifies the corresponding task from other currently executing tasks and the wait number corresponds to the task identification number of a node corresponding to the last descendant task of the corresponding task that was executed prior to a wait command. The parallelism of the tasks is determined by comparing the relationship between the tasks.

Подробнее
07-11-2013 дата публикации

Semiconductor device

Номер: US20130297916A1
Принадлежит: Renesas Electronics Corp

A related-art semiconductor device suffers from the problem that processing capacity is degraded by switching the occupied state for each partition. A semiconductor device according to the present invention includes an execution unit that executes an arithmetic instruction, and a scheduler that includes multiple first setting registers, each defining a correspondence relationship between hardware threads and partitions, and that generates a thread select signal on the basis of a partition schedule and a thread schedule. When the execution unit executes a first occupation start instruction, a first occupation control signal is output; according to this signal, the scheduler outputs a thread select signal designating a specific hardware thread for the partition indicated by the first occupation control signal, without depending on the thread schedule.

Подробнее
14-11-2013 дата публикации

MFENCE and LFENCE Micro-Architectural Implementation Method and System

Номер: US20130305018A1
Принадлежит: Individual

A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.

Подробнее
14-11-2013 дата публикации

Speeding Up Younger Store Instruction Execution after a Sync Instruction

Номер: US20130305022A1
Принадлежит: International Business Machines Corp

Mechanisms are provided, in a processor, for executing instructions that are younger than a previously dispatched synchronization (sync) instruction. An instruction sequencer unit of the processor dispatches a sync instruction. The sync instruction is sent to a nest of one or more devices outside of the processor. The instruction sequencer unit dispatches a subsequent instruction after dispatching the sync instruction. The dispatching of the subsequent instruction after dispatching the sync instruction is performed prior to receiving a sync acknowledgement response from the nest. The instruction sequencer unit performs a completion of the subsequent instruction based on whether completion of the subsequent instruction is dependent upon receiving the sync acknowledgement from the nest and completion of the sync instruction.

Подробнее
28-11-2013 дата публикации

Micro-Staging Device and Method for Micro-Staging

Номер: US20130318194A1
Автор: Jeffrey L. Timbs
Принадлежит: Dell Products LP

A micro-staging device has a wireless interface module for detecting a first data request that indicates a presence of a user and an application processor that establishes a network connection to a remote data center. The micro-staging device further allocates a portion of storage in a cache memory storage device for storing pre-fetched workflow data objects associated with the detected user.

Подробнее
28-11-2013 дата публикации

Predicting and avoiding operand-store-compare hazards in out-of-order microprocessors

Номер: US20130318330A1
Принадлежит: International Business Machines Corp

A method and information processing system manage load and store operations that can be executed out-of-order. At least one of a load instruction and a store instruction is executed. A determination is made that an operand store compare hazard has been encountered. An entry within an operand store compare hazard prediction table is created based on the determination. The entry includes at least an instruction address of the instruction that has been executed and a hazard indicating flag associated with the instruction. The hazard indicating flag indicates that the instruction has encountered the operand store compare hazard. When a load instruction is associated with the hazard indicating flag, the load instruction becomes dependent upon all store instructions associated with a substantially similar hazard indicating flag.
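A minimal sketch of the prediction-table idea, assuming the hazard indicating flag is a shared tag written for both the load and the store; `OSCPredictor` and its methods are illustrative names.

```python
# Operand-store-compare (OSC) hazard prediction: once a load and a store
# have encountered a hazard together, both instruction addresses are tagged
# with the same flag, and the load is made dependent on all in-flight
# stores carrying that flag.

class OSCPredictor:
    def __init__(self):
        self.table = {}   # instruction address -> hazard indicating flag

    def record_hazard(self, load_addr, store_addr, flag):
        self.table[load_addr] = flag
        self.table[store_addr] = flag

    def load_dependencies(self, load_addr, inflight_stores):
        flag = self.table.get(load_addr)
        if flag is None:
            return []     # load not flagged: no forced dependencies
        return [s for s in inflight_stores if self.table.get(s) == flag]
```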

Подробнее
05-12-2013 дата публикации

ISSUING INSTRUCTIONS TO EXECUTION PIPELINES BASED ON REGISTER-ASSOCIATED PREFERENCES, AND RELATED INSTRUCTION PROCESSING CIRCUITS, PROCESSOR SYSTEMS, METHODS, AND COMPUTER-READABLE MEDIA

Номер: US20130326197A1
Принадлежит: QUALCOMM INCORPORATED

Issuing instructions to execution pipelines based on register-associated preferences and related instruction processing circuits, systems, methods, and computer-readable media are disclosed. In one embodiment, an instruction is detected in an instruction stream. Upon determining that the instruction specifies at least one source register, an execution pipeline preference(s) is determined based on at least one pipeline indicator associated with the at least one source register in a pipeline issuance table, and the instruction is issued to an execution pipeline based on the execution pipeline preference(s). Upon a determination that the instruction specifies at least one target register, at least one pipeline indicator associated with the at least one target register in the pipeline issuance table is updated based on the execution pipeline to which the instruction is issued. In this manner, optimal forwarding of instructions may be facilitated, thus improving processor performance.

1. A method for processing computer instructions, comprising: detecting an instruction in an instruction stream; upon determining that the instruction specifies at least one source register: determining at least one execution pipeline preference for the instruction based on at least one pipeline indicator associated with the at least one source register in a pipeline issuance table; and issuing the instruction to an execution pipeline based on the at least one execution pipeline preference; and upon determining that the instruction specifies at least one target register: updating at least one pipeline indicator associated with the at least one target register in the pipeline issuance table based on the execution pipeline to which the instruction is issued.

2. The method of claim 1, wherein issuing the instruction to the execution pipeline comprises issuing the instruction to a preferred execution pipeline indicated by the at least one execution pipeline preference.

3. The method of ...

Подробнее
05-12-2013 дата публикации

Integrated circuit devices and methods for scheduling and executing a restricted load operation

Номер: US20130326200A1
Принадлежит: FREESCALE SEMICONDUCTOR INC

An integrated circuit device comprising at least one instruction processing module arranged to compare validation data with data stored within a target register upon receipt of a load validation instruction. Wherein, the instruction processing module is further arranged to proceed with execution of a next sequential instruction if the validation data matches the stored data within the target register, and to load the validation data into the target register if the validation data does not match the stored data within the target register.
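The compare-then-load behavior described above can be modeled in a few lines of software (a sketch only; the function name, the register dict, and the boolean return convention are illustrative, not from the patent):

```python
def load_validation(registers, target, validation_data):
    """Model of the restricted load: compare validation data with the
    target register; fall through on a match, otherwise load the value."""
    if registers[target] == validation_data:
        return True   # match: proceed to the next sequential instruction
    registers[target] = validation_data  # mismatch: load into the register
    return False

regs = {"r1": 7}
assert load_validation(regs, "r1", 7) is True    # match, register untouched
assert load_validation(regs, "r1", 9) is False   # mismatch, value loaded
assert regs["r1"] == 9
```

The interesting property is that the load is conditional on the comparison, so an already-valid register is never rewritten.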

Подробнее
12-12-2013 дата публикации

Computer system

Номер: US20130332925A1
Принадлежит: Renesas Electronics Corp

There is a need to provide a computer system capable of preventing a failure from propagating and of recovering from the failure. VCPU#0 through VCPU#2 each operate a different OS. VCPU#0 operates a management OS that manages the other OSs. When notified of a bus error occurrence, a virtual CPU execution portion 201 operates only VCPU#0, regardless of the execution sequence stored in schedule register A. VCPU#0 reinitializes the bus where the error occurred.

Подробнее
19-12-2013 дата публикации

Selectively controlling instruction execution in transactional processing

Номер: US20130339328A1
Принадлежит: International Business Machines Corp

Execution of instructions in a transactional environment is selectively controlled. A TRANSACTION BEGIN instruction initiates a transaction and includes controls that selectively indicate whether certain types of instructions are permitted to execute within the transaction. The controls include one or more of an allow access register modification control and an allow floating point operation control.

Подробнее
19-12-2013 дата публикации

NONTRANSACTIONAL STORE INSTRUCTION

Номер: US20130339669A1

A NONTRANSACTIONAL STORE instruction, executed in transactional execution mode, performs stores that are retained, even if a transaction associated with the instruction aborts. The stores include user-specified information that may facilitate debugging of an aborted transaction. 1. A method of executing an instruction within a computing environment, said method comprising: obtaining, by a processor, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture, the machine instruction comprising: an operation code to specify a nontransactional store operation; a first operand; and a second operand to designate a location for the first operand; and executing, by the processor, the machine instruction, the executing comprising nontransactionally placing the first operand at the location specified by the second operand, wherein information stored at the second operand is retained despite an abort of a transaction associated with the machine instruction, and wherein the nontransactionally placing is delayed until an end of transactional execution mode of the processor. 2. The method of claim 1, wherein the end of transactional execution mode results from an end of an outermost transaction associated with the machine instruction or an abort condition. 3. The method of claim 1, wherein multiple nontransactional stores appear as concurrent stores to other processors. 4. The method of claim 1, further comprising: determining whether the processor is in transactional execution mode; based on the processor being in the transactional execution mode, determining whether the transaction is a constrained transaction or a nonconstrained transaction; and based on the transaction being a nonconstrained transaction, continuing execution of the machine instruction. 5. The method of claim 4, wherein, based on the transaction being a constrained transaction, providing a program exception and ...

Подробнее
19-12-2013 дата публикации

Next Instruction Access Intent Instruction

Номер: US20130339672A1
Принадлежит: International Business Machines Corp

Executing a Next Instruction Access Intent instruction by a computer. The processor obtains an access intent instruction indicating an access intent. The access intent is associated with an operand of a next sequential instruction. The access intent indicates usage of the operand by one or more instructions subsequent to the next sequential instruction. The computer executes the access intent instruction. The computer obtains the next sequential instruction. The computer executes the next sequential instruction, which comprises based on the access intent, adjusting one or more cache behaviors for the operand of the next sequential instruction.

Подробнее
19-12-2013 дата публикации

Restricted instructions in transactional execution

Номер: US20130339685A1
Принадлежит: International Business Machines Corp

Restricted instructions are prohibited from execution within a transaction. There are classes of instructions that are restricted regardless of type of transaction: constrained or nonconstrained. There are instructions only restricted in constrained transactions, and there are instructions that are selectively restricted for given transactions based on controls specified on instructions used to initiate the transactions.

Подробнее
26-12-2013 дата публикации

PIPELINING OUT-OF-ORDER INSTRUCTIONS

Номер: US20130346729A1
Принадлежит:

Systems, methods and computer program products provide for pipelining out-of-order instructions. Embodiments comprise an instruction reservation station for short instructions of a short latency type and long instructions of a long latency type, an issue queue containing at least two short instructions of a short latency type, which are to be chained to match a latency of a long instruction of a long latency type, a register file, at least one execution pipeline for instructions of a short latency type and at least one execution pipeline for instructions of a long latency type; wherein results of the at least one execution pipeline for instructions of the short latency type are written to the register file, preserved in an auxiliary buffer, or forwarded to inputs of said execution pipelines. Data of the auxiliary buffer are written to the register file. 1. A method comprising: determining an instruction chain comprising at least a first instruction having a first latency and a second instruction having a second latency, the first latency and the second latency each being less than a third latency of a third instruction; and submitting the instruction chain to a first execution pipeline of a processor and the third instruction to a second execution pipeline of the processor, wherein execution of the instruction chain at least partially overlaps execution of the third instruction. 2. The method according to claim 1, further comprising writing a result of the instruction chain to a register file during a writeback slot for the third instruction. 3. The method according to claim 1, further comprising: determining whether the second instruction is dependent on data from the first instruction; and in response to determining that the second instruction is dependent on data from the first instruction, forwarding the data from the first instruction to the second instruction. 4. The method according to claim 3, further comprising writing a result of the second instruction into a ...
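The chaining idea — two short-latency instructions sharing the slot of one long-latency instruction — can be reduced to a toy feasibility check (the latency values, names, and the "combined latency must fit" rule are assumptions for illustration, not the patent's scheduling logic):

```python
# Hypothetical cycle counts; the abstract gives no concrete latencies.
SHORT, LONG = 2, 4

def chain_fits(short_latencies, long_latency):
    """A chain of short instructions can overlap one long instruction
    if the chain's combined latency does not exceed the long latency."""
    return sum(short_latencies) <= long_latency

assert chain_fits([SHORT, SHORT], LONG)          # two shorts fit one long
assert not chain_fits([SHORT, SHORT, SHORT], LONG)
```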

Подробнее
26-12-2013 дата публикации

Arithmetic processing apparatus, and cache memory control device and cache memory control method

Номер: US20130346730A1
Автор: Naohiro Kiyota
Принадлежит: Fujitsu Ltd

An arithmetic processing apparatus includes a plurality of processors, each of the processors having an arithmetic unit and a cache memory. The processor includes an instruction port that holds a plurality of instructions accessing data of the cache memory, a first determination unit that validates a first flag when, upon receipt of an invalidation request for data in the cache memory, a cache index of a target address and a way ID of the received request match a cache index of a designated address and a way ID of the load instruction, a second determination unit that validates a second flag when target data is transmitted due to a cache miss, and an instruction re-execution determination unit that instructs re-execution of an instruction subsequent to the load instruction when both the first flag and the second flag are validated at the time of completion of an instruction in the instruction port.

Подробнее
16-01-2014 дата публикации

Systems, apparatuses, and methods for performing a double blocked sum of absolute differences

Номер: US20140019713A1
Принадлежит: Intel Corp

Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
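The per-block SAD arithmetic can be illustrated in scalar code modeling what each lane of such a vector instruction computes (the function names and the block width of 4 are assumptions; the real instruction's operand encoding and immediate-controlled block selection are not modeled):

```python
def packed_sad(block_a, block_b):
    """Sum of absolute differences over one packed block of elements."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def double_block_sad(src1, src2, block=4):
    """Toy model of a double-block SAD: one SAD result per block pair,
    as a vector instruction would produce per destination lane."""
    return [packed_sad(src1[i:i + block], src2[i:i + block])
            for i in range(0, len(src1), block)]

assert packed_sad([1, 2, 3, 4], [4, 3, 2, 1]) == 8
assert double_block_sad([1, 2, 3, 4, 5, 6, 7, 8],
                        [0, 0, 0, 0, 0, 0, 0, 0]) == [10, 26]
```

SAD over pixel blocks is the core of motion estimation in video encoding, which is why it merits a dedicated instruction.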

Подробнее
30-01-2014 дата публикации

Computer architecture with a hardware accumulator reset

Номер: US20140033203A1
Принадлежит: Mobileye Technologies Ltd

A processor with an accumulator. An event is selected to produce one or more selected events. A reset signal to the accumulator is generated responsive to the selected event. Responsive to the reset signal, the accumulator is reset to zero or another initial value while avoiding breaking pipelined execution of the processor.

Подробнее
06-02-2014 дата публикации

Space efficient checkpoint facility and technique for processor with integrally indexed register mapping and free-list arrays

Номер: US20140040595A1
Автор: Thang M. Tran
Принадлежит: FREESCALE SEMICONDUCTOR INC

A processor may efficiently implement register renaming and checkpoint repair even in instruction set architectures with large numbers of wide (bit-width) registers by (i) renaming all destination operand register targets, (ii) implementing free list and architectural-to-physical mapping table as a combined array storage with unitary (or common) read, write and checkpoint pointer indexing and (iii) storing checkpoints as snapshots of the mapping table, rather than of actual register contents. In this way, uniformity (and timing simplicity) of the decode pipeline may be accentuated and architectural-to-physical mappings (or allocable mappings) may be efficiently shuttled between free-list, reorder buffer and mapping table stores in correspondence with instruction dispatch and completion as well as checkpoint creation, retirement and restoration.
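Point (iii) — checkpointing the mapping table rather than register contents — can be sketched as follows (a toy model; the class, method names, and identity initial mapping are invented for illustration and say nothing about the patent's combined-array storage):

```python
class RenameMap:
    """Checkpoints are snapshots of the logical-to-physical mapping
    table, not of the physical registers' contents."""
    def __init__(self, n_logical):
        self.table = list(range(n_logical))  # identity mapping at reset
        self.checkpoints = []

    def rename(self, logical, new_physical):
        self.table[logical] = new_physical

    def checkpoint(self):
        self.checkpoints.append(self.table.copy())  # small, fixed-size copy

    def restore(self):
        self.table = self.checkpoints.pop()

m = RenameMap(4)
m.checkpoint()
m.rename(2, 9)
assert m.table[2] == 9
m.restore()            # repair: roll back to the checkpointed mapping
assert m.table[2] == 2
```

Snapshotting the map is what makes checkpoints space-efficient: its size scales with the number of logical registers, not with their bit width.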

Подробнее
06-02-2014 дата публикации

DATA PROCESSOR

Номер: US20140040600A1
Автор: Arakawa Fumio
Принадлежит:

Provided is a data processor which maintains compatibility with an existing instruction set, such as a 16-bit fixed-length instruction set, and in which the instruction code space is extended. 1. A data processor with multiple instruction pipelines, comprising: a global instruction queue that sequentially accumulates multiple instruction codes that are fetched in parallel; and a dispatch circuit that conducts a search with respect to the multiple instruction codes that are output from the global instruction queue, for every instruction code type, and distributes the instruction code to every instruction pipeline based on a result of the search, wherein an instruction set is retained that additionally defines as a separate instruction a prohibition combination pattern resulting from the multiple specific instruction codes, by which original processing of the individual instruction code is prohibited. 2. The data processor according to claim 1, wherein the instruction that is additionally defined by the prohibition combination pattern of the multiple specific instruction codes is limited to an instruction type that is the same as the instruction code defined only with a latter-half instruction code pattern of the combination pattern. 3. The data processor according to claim 2, wherein, in the prohibition combination pattern of the multiple specific instruction codes, a former half and a latter half of the instruction code pattern are different from each other. 4. The data processor according to claim 3, wherein the dispatch circuit outputs the detected instruction code as being valid and outputs the instruction code that immediately precedes the detected instruction code as a prefix code candidate when an intended instruction code of the instruction code type is detected in a search unit of the multiple instruction codes (a search target), and outputs the front instruction code as being valid when the intended instruction code of the instruction code type is detected in the ...

Подробнее
27-02-2014 дата публикации

Allocation of counters from a pool of counters to track mappings of logical registers to physical registers for mapper based instruction executions

Номер: US20140059329A1
Принадлежит: International Business Machines Corp

A computer system assigns a particular counter from among a plurality of counters currently in a counter free pool to count a number of mappings of logical registers from among a plurality of logical registers to a particular physical register from among a plurality of physical registers, responsive to an execution of an instruction by a mapper unit mapping at least one logical register from among the plurality of logical registers to the particular physical register, wherein the number of the plurality of counters is less than a number of the plurality of physical registers. The computer system, responsive to the counted number of mappings of logical registers to the particular physical register decremented to less than a minimum value, returns the particular counter to the counter free pool.
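The counter-pool idea — fewer counters than physical registers, with a counter returned to the free pool once its mapping count drops below the minimum — can be sketched like this (names, the dict-based bookkeeping, and a minimum of 1 are assumptions for illustration):

```python
class CounterPool:
    """A small pool of counters tracking how many logical registers
    currently map to a given physical register."""
    def __init__(self, n_counters):
        self.free = list(range(n_counters))
        self.assigned = {}   # physical register -> [counter id, count]

    def map_logical(self, phys):
        if phys not in self.assigned:
            # assign a counter from the free pool on first mapping
            self.assigned[phys] = [self.free.pop(), 0]
        self.assigned[phys][1] += 1

    def unmap_logical(self, phys):
        entry = self.assigned[phys]
        entry[1] -= 1
        if entry[1] < 1:                 # below minimum: release counter
            self.free.append(entry[0])
            del self.assigned[phys]

pool = CounterPool(2)
pool.map_logical(5)
pool.map_logical(5)
pool.unmap_logical(5)
assert pool.assigned[5][1] == 1
pool.unmap_logical(5)
assert 5 not in pool.assigned and len(pool.free) == 2
```

The saving comes from the observation that only a few physical registers have multiple logical mappings at any instant, so a full counter per physical register would be wasted.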

Подробнее
27-02-2014 дата публикации

Method, apparatus, and system for speculative abort control mechanisms

Номер: US20140059333A1
Принадлежит: Intel Corp

An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions.

Подробнее
06-03-2014 дата публикации

Instruction insertion in state machine engines

Номер: US20140068234A1
Автор: David R. Brown
Принадлежит: Micron Technology Inc

State machine engines are disclosed, including those having an instruction insertion register. One such instruction insertion register may provide an initialization instruction, such as to prepare a state machine engine for data analysis. An instruction insertion register may also provide an instruction in an attempt to resolve an error that occurs during operation of a state machine engine. An instruction insertion register may also be used to debug a state machine engine, such as after the state machine experiences a fatal error.

Подробнее
13-03-2014 дата публикации

Identifying load-hit-store conflicts

Номер: US20140075158A1
Принадлежит: International Business Machines Corp

A computing device identifies a load instruction and store instruction pair that causes a load-hit-store conflict. A processor tags a first load instruction that instructs the processor to load a first data set from memory. The processor stores an address at which the first load instruction is located in memory in a special purpose register. The processor determines whether the first load instruction has a load-hit-store conflict with a first store instruction. If the processor determines the first load instruction has a load-hit-store conflict with the first store instruction, the processor stores an address at which the first data set is located in memory in a second special purpose register, tags the first data set being stored by the first store instruction, stores an address at which the first store instruction is located in memory in a third special purpose register, and increments a conflict counter.
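The detection can be approximated by a software model that flags loads whose address matches an in-flight store that has not yet drained to memory (a simplification; the event encoding and `detect_lhs` are invented for illustration and omit the special-purpose-register logging):

```python
def detect_lhs(events):
    """Count load-hit-store conflicts in a stream of (op, addr) events."""
    pending_stores, conflicts = set(), 0
    for op, addr in events:
        if op == "store":
            pending_stores.add(addr)          # store enters the queue
        elif op == "load" and addr in pending_stores:
            conflicts += 1                    # load hits an undrained store
        elif op == "drain":
            pending_stores.discard(addr)      # store retires to memory
    return conflicts

assert detect_lhs([("store", 0x40), ("load", 0x40)]) == 1
assert detect_lhs([("store", 0x40), ("drain", 0x40), ("load", 0x40)]) == 0
```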

Подробнее
27-03-2014 дата публикации

Memory address aliasing detection

Номер: US20140089271A1
Принадлежит: Intel Corp

Method and apparatus to efficiently detect violations of data dependency relationships. A memory address associated with a computer instruction may be obtained. A current state of the memory address may be identified. The current state may include whether the memory address is associated with a read or a store instruction, and whether the memory address is associated with a set or a check. A previously accumulated state associated with the memory address may be retrieved from a data structure. The previously accumulated state may include whether the memory address was previously associated with a read or a store instruction, and whether the memory address was previously associated with a set or a check. If a transition from the previously accumulated state to the current state is invalid, a failure condition may be signaled.
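The abstract does not give the exact transition table, so the sketch below uses an invented rule (a "check" must agree with the kind of access last "set" for the address) purely to illustrate the set/check state-tracking pattern:

```python
def check_transition(history, addr, access, mode):
    """Validate one accumulated-state transition for an address.

    history: dict mapping address -> last access kind that was "set"
    access:  "read" or "store"; mode: "set" or "check"
    Returns False when a check observes a different access kind than
    was previously set (a hypothetical invalid transition).
    """
    prev = history.get(addr)
    if mode == "check" and prev is not None and prev != access:
        return False          # invalid transition: signal failure
    if mode == "set":
        history[addr] = access
    return True

h = {}
assert check_transition(h, 0x10, "read", "set")
assert check_transition(h, 0x10, "read", "check")
assert not check_transition(h, 0x10, "store", "check")
```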

Подробнее
03-04-2014 дата публикации

CROSS-PIPE SERIALIZATION FOR MULTI-PIPELINE PROCESSOR

Номер: US20140095836A1

Embodiments relate to cross-pipe serialization for a multi-pipeline computer processor. An aspect includes receiving, by a processor comprising a first pipeline, the first pipeline comprising a serialization pipeline, and a second pipeline, the second pipeline comprising a non-serialization pipeline, a request comprising a first subrequest for the first pipeline and a second subrequest for the second pipeline. Another aspect includes completing the first subrequest by the first pipeline. Another aspect includes, based on completing the first subrequest by the first pipeline, sending a cross-pipe unlock signal from the first pipeline to the second pipeline. Yet another aspect includes, based on receiving the cross-pipe unlock signal by the second pipeline, completing the second subrequest by the second pipeline. 1. A computer system for cross-pipe serialization for a multi-pipeline computer processor, the system comprising: a processor, the processor comprising a first pipeline, the first pipeline comprising a serialization pipeline, and a second pipeline, the second pipeline comprising a non-serialization pipeline, the system configured to perform a method comprising: receiving, by the processor, a request comprising a first subrequest for the first pipeline and a second subrequest for the second pipeline; completing the first subrequest by the first pipeline; based on completing the first subrequest by the first pipeline, sending a cross-pipe unlock signal from the first pipeline to the second pipeline; and based on receiving the cross-pipe unlock signal by the second pipeline, completing the second subrequest by the second pipeline. 2. The computer system of claim 1, wherein completing the first subrequest by the first pipeline comprises loading a first instance of a shared resource, and wherein completing the second subrequest by the second pipeline comprises loading a second instance of the shared resource. 3. The computer system of ...

Подробнее
03-04-2014 дата публикации

Systems, Apparatuses, and Methods for Performing Conflict Detection and Broadcasting Contents of a Register to Data Element Positions of Another Register

Номер: US20140095843A1
Принадлежит: Intel Corp

Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting.

Подробнее
03-04-2014 дата публикации

Tracking Operand Liveliness Information in a Computer System and Performing Function Based on the Liveliness Information

Номер: US20140095848A1

Operand liveness state information is maintained during context switches for current architected operands of executing programs; the current operand state information indicates whether corresponding current operands are enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA) for disabling current architected operands. A current operand is accessed by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module. 1. A computer system for maintaining liveness information for executing programs, the system comprising: a processor configured to communicate with a main storage, the processor comprising an instruction fetcher, an instruction optimizer and one or more execution units for executing optimized instructions, the processor configured to perform a method comprising: maintaining, by the processor, current operand state information, the current operand state information indicating whether corresponding current operands are any one of enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA), the first program module currently being executed by the processor; and accessing a current operand, by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module. 2. The computer system according to claim 1, further comprising: based on the current operand being disabled, the accessing comprising at least one of a) and b): a) returning an architecture-specified value, where the architecture-specified value is any one of an undefined ...

Подробнее
06-01-2022 дата публикации

COMMAND-AWARE HARDWARE ARCHITECTURE

Номер: US20220004388A1
Принадлежит: Lilac Cloud, Inc.

In an embodiment, responsive to determining: (a) a first command is not of a particular command type associated with one or more hardware modules associated with a particular routing node, or (b) at least one argument used for executing the first command is not available: transmitting the first command to another routing node in the hardware routing mesh. Upon receiving a second command of the command bundle and determining: (a) the second command is of the particular command type associated with the hardware module(s), and (b) arguments used by the second command are available: transmitting the second command to the hardware module(s) associated with the particular routing node for execution by the hardware module(s). Thereafter, the command bundle is modified based on execution of the second command by at least refraining from transmitting the second command of the command bundle to any other routing nodes in the hardware routing mesh. 1. A hardware routing mesh, comprising: a plurality of routing nodes associated respectively with a plurality of hardware modules, the plurality of routing nodes comprising a first routing node communicatively coupled to a first hardware module of the plurality of hardware modules, the first routing node configured to perform operations comprising: receiving a first command of a command bundle being streamed through the plurality of routing nodes, wherein the command bundle is modified based on execution of commands as the command bundle is streamed through the plurality of routing nodes; responsive to determining that (a) the first command is not of a particular command type associated with the first hardware module, or (b) at least one argument used for executing the first command is not received in association with the first command: transmitting the first command to a second routing node; receiving a second command of the command bundle being streamed through the plurality of routing nodes; and responsive to determining that (a) the second command is of the particular command type ...

Подробнее
06-01-2022 дата публикации

Method and Apparatus of Providing a Function as a Service (FAAS) Deployment of an Application

Номер: US20220004422A1
Принадлежит:

Disclosed are a method and an apparatus for providing a function as a service (FAAS) deployment of an application. A deployment unit is generated per group of application blocks, where said deployment unit comprises said group of application blocks and an implementation of function invocation for functions being accessed by groups of application blocks. Function invocations of the group of application blocks are constrained or bound to libraries of supporting implementations. Deployment units are provided, together with the element invocations attached to said libraries, to a lifecycle manager of a FAAS platform, whereby the FAAS platform implements the FAAS deployment of said application, the performance targets of which are related to the groups of application blocks. This disclosure enables a developer to adjust the performance of an application without having to change the logic of application implementations. 1.-17. (canceled) 18. A method of providing a function as a service (FAAS) deployment of an application comprising application blocks, the method comprising, by an application builder component: generating a deployment unit per group of application blocks, where the deployment unit comprises, in addition to the group of application blocks, an implementation of function invocation for each function being accessed by each group of application blocks, and where generating the deployment unit comprises constraining function invocations of the group of application blocks to one or more libraries of implementations; and providing the deployment units, together with the function invocations attached to the one or more libraries of implementations, to a lifecycle manager of a FAAS platform that is connected to the application builder component; whereby the FAAS platform implements the FAAS deployment of the application, the performance targets of which are related to the groups of application blocks. 19. The method of claim 18, wherein the groups of application ...

Подробнее
06-01-2022 дата публикации

TRANSACTION-ENABLED METHODS FOR PROVIDING PROVABLE ACCESS TO A DISTRIBUTED LEDGER WITH A TOKENIZED INSTRUCTION SET

Номер: US20220004927A1
Автор: Cella Charles Howard
Принадлежит:

Transaction-enabled methods for providing provable access to a distributed ledger with a tokenized instruction set for polymer production processes are described. A method may include accessing a distributed ledger comprising an instruction set for a polymer production process and tokenizing the instruction set. The method may further include interpreting an instruction set access request and providing a provable access to the instruction set. The method may further include providing commands to a production tool of the polymer production process and recording the transaction on the distributed ledger. 1. A method , comprising:accessing a distributed ledger comprising an instruction set, wherein the instruction set comprises an instruction set for a polymer production process;tokenizing the instruction set;interpreting an instruction set access request;in response to the instruction set access request, providing a provable access to the instruction set;providing commands to a production tool of the polymer production process in response to the instruction set access request; andrecording a transaction on the distributed ledger in response to the providing commands to the production tool.2. The method of claim 1 , wherein the instruction set comprises an instruction set for a chemical synthesis subprocess of the polymer production process.3. The method of claim 2 , further comprising providing commands to a production tool of the chemical synthesis subprocess of the polymer production process in response to the instruction set access request and recording a transaction on the distributed ledger in response to the providing commands to the production tool of the chemical synthesis subprocess of the polymer production process.4. The method of claim 1 , wherein the instruction set comprises a field programmable gate array (FPGA) instruction set.5. The method of claim 1 , wherein the instruction set further includes an application programming interface (API).6. The ...

Подробнее
06-01-2022 дата публикации

SYSTEM AND METHOD FOR EFFICIENT MULTI-GPU RENDERING OF GEOMETRY BY SUBDIVIDING GEOMETRY

Номер: US20220005146A1
Автор: Cerny Mark E.
Принадлежит:

A method for graphics processing. The method includes rendering graphics for an application using graphics processing units (GPUs). The method includes using the plurality of GPUs in collaboration to render an image frame including a plurality of pieces of geometry. The method includes, during the rendering of the image frame, subdividing one or more of the plurality of pieces of geometry into smaller pieces, and dividing the responsibility for rendering these smaller portions of geometry among the plurality of GPUs, wherein each of the smaller portions of geometry is processed by a corresponding GPU. The method includes, for those pieces of geometry that are not subdivided, dividing the responsibility for rendering the pieces of geometry among the plurality of GPUs, wherein each of these pieces of geometry is processed by a corresponding GPU. 1. A method for graphics processing, comprising: rendering graphics for an application using a plurality of graphics processing units (GPUs); using the plurality of GPUs in collaboration to render an image frame including a plurality of pieces of geometry; during the rendering of the image frame, subdividing one or more of the plurality of pieces of geometry into smaller pieces, and dividing the responsibility for rendering these smaller portions of geometry among the plurality of GPUs, wherein each of the smaller portions of geometry is processed by a corresponding GPU; and for those pieces of geometry that are not subdivided, dividing the responsibility for rendering the pieces of geometry among the plurality of GPUs, wherein each of these pieces of geometry is processed by a corresponding GPU. 2. The method of claim 1, wherein a process for rendering the image frame includes a geometry analysis phase of rendering, or a Z pre-pass phase of rendering, or a geometry pass phase of rendering. 3. The method of claim 2, further comprising: during the geometry analysis phase of rendering, or Z pre-pass phase of ...

Подробнее
05-01-2017 дата публикации

VARIABLE LATENCY PIPE FOR INTERLEAVING INSTRUCTION TAGS IN A MICROPROCESSOR

Номер: US20170003969A1
Принадлежит:

Techniques disclosed herein describe a variable latency pipe for interleaving instruction tags in a processor. According to one embodiment presented herein, an instruction tag is associated with an instruction upon issue of the instruction from the issue queue. One of a plurality of positions in the latency pipe is determined. The pipe stores one or more instruction tags, each associated with a respective instruction. The pipe also stores the instruction tags in a respective position based on the latency of each respective instruction. The instruction tag is stored at the determined position in the pipe.

1-7. (canceled)

8. A processor, comprising: an issue queue configured to store an instruction therein and further configured to issue the instruction; a tagger configured to associate the instruction with an instruction tag upon the issue of the instruction from the issue queue; and a pipe having a plurality of positions ordered from a head position to a tail position, the pipe configured to determine one of the plurality of positions to store the instruction tag and further configured to store the instruction tag at the determined position, wherein the pipe stores one or more instruction tags each associated with a respective instruction, the pipe stores the one or more instruction tags in a respective position based on the latency of each of the respective instructions, and the pipe determines the position of the instruction tag based on a latency of the instruction relative to the latency of each of the respective instructions.

9. The processor of claim 8, wherein the pipe is further configured to: broadcast an instruction tag stored at the tail position in the pipe; and remove the instruction tag from the pipe.

10. The processor of claim 9, wherein the broadcasted instruction tag wakes up an instruction in the issue queue that is dependent on an instruction associated with the broadcasted instruction tag.

11. The processor of claim 9, wherein the broadcasted instruction ...
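The latency-ordered tag pipe described in this entry can be modeled with a short sketch: a tag issued for an instruction with latency N is placed N slots from the tail, and every cycle the tag at the tail is broadcast and removed. This is a minimal illustrative model, not the patented implementation; the class and method names (`LatencyPipe`, `issue`, `tick`) and the fixed depth are assumptions.

```python
class LatencyPipe:
    def __init__(self, depth):
        self.slots = [None] * depth  # index 0 is the tail position

    def issue(self, tag, latency):
        # Interleave the tag among in-flight tags: an instruction with
        # latency N is stored N slots away from the tail.
        assert 0 < latency <= len(self.slots) and self.slots[latency - 1] is None
        self.slots[latency - 1] = tag

    def tick(self):
        # Broadcast the tag at the tail (waking dependents in the issue
        # queue), remove it, and shift all remaining tags toward the tail.
        broadcast = self.slots[0]
        self.slots = self.slots[1:] + [None]
        return broadcast

pipe = LatencyPipe(depth=4)
pipe.issue("add#7", latency=2)  # 2-cycle instruction
pipe.issue("mul#9", latency=4)  # 4-cycle instruction
completions = [pipe.tick() for _ in range(4)]
print(completions)  # "add#7" is broadcast on cycle 2, "mul#9" on cycle 4
```

Storing tags by remaining latency lets one pipe track instructions of different latencies without per-instruction countdown logic.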

Publication date: 07-01-2016

Systems And Methods For Processing Inline Constants

Number: US20160004536A1
Assignee: Freescale Semiconductor Inc.

Disclosed is a digital processor comprising an instruction memory having a first input, a second input, a first output, and a second output. A program counter register is in communication with the first input of the instruction memory. The program counter register is configured to store an address of an instruction to be fetched. A data pointer register is in communication with the second input of the instruction memory. The data pointer register is configured to store an address of a data value in the instruction memory. An instruction buffer is in communication with the first output of the instruction memory. The instruction buffer is arranged to receive an instruction according to a value at the program counter register. A data buffer is in communication with the second output of the instruction memory. The data buffer is arranged to receive a data value according to a value at the data pointer register.

1. A digital processor, comprising: an instruction memory having a first input, a second input, a first output, and a second output; a program counter register in communication with the first input of the instruction memory, the program counter register configured to store an address of an instruction to be fetched; a data pointer register in communication with the second input of the instruction memory, the data pointer register configured to store an address of a data value in the instruction memory; an instruction buffer in communication with the first output of the instruction memory, the instruction buffer arranged to receive an instruction according to a value at the program counter register; and a data buffer in communication with the second output of the instruction memory, the data buffer arranged to receive a data value according to a value at the data pointer register.

2. The digital processor of claim 1, further comprising: a register file; an instruction decode function that receives the instruction from the instruction buffer, decodes the instruction, and ...
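The dual-port arrangement above can be sketched as a memory read with two independent addresses, one from the program counter and one from the data pointer, delivered to separate buffers in the same cycle. A minimal illustrative model; the class name and encoding are assumptions, not from the patent.

```python
class DualPortIMem:
    """Instruction memory with two read ports: one for instructions
    (addressed by the PC), one for inline constants (addressed by the
    data pointer)."""

    def __init__(self, contents):
        self.mem = list(contents)

    def fetch(self, pc, dp):
        # First output feeds the instruction buffer, second output feeds
        # the data buffer; both reads hit the same physical memory.
        return self.mem[pc], self.mem[dp]

# Instruction stream and an inline constant share one memory image.
imem = DualPortIMem(["load r1, [dp]", "add r2, r1, r1", 0x2A])
instr_buf, data_buf = imem.fetch(pc=0, dp=2)
print(instr_buf, data_buf)  # the constant 0x2A is read alongside the instruction
```

The point of the design is that constants embedded in the instruction stream need no separate data memory or extra fetch cycle.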

Publication date: 07-01-2016

COMMITTING HARDWARE TRANSACTIONS THAT ARE ABOUT TO RUN OUT OF RESOURCE

Number: US20160004537A1
Assignee:

A transactional memory system determines whether a hardware transaction can be salvaged. A processor of the transactional memory system begins execution of a transaction in a transactional memory environment. Based on detection that an amount of available resource for transactional execution is below a predetermined threshold level, the processor determines whether the transaction can be salvaged. Based on determining that the transaction cannot be salvaged, the processor aborts the transaction. Based on determining the transaction can be salvaged, the processor performs a salvage operation, wherein the salvage operation comprises one or more of: determining that the transaction can be brought to a stable state without exceeding the amount of available resource for transactional execution, and bringing the transaction to a stable state; and determining that a resource can be made available, and making the resource available.

1. A method for determining whether a hardware transaction can be salvaged, the method comprising: beginning execution, by a processor, of a transaction in a transactional memory environment; based on detection that an amount of available resource for transactional execution is below a predetermined threshold level, determining, by the processor, whether the transaction can be salvaged; based on determining that the transaction cannot be salvaged, aborting, by the processor, the transaction; and based on determining the transaction can be salvaged, performing, by the processor, a salvage operation, wherein the salvage operation comprises one or more of: a) determining, by the processor, that the transaction can be brought to a stable state without exceeding the amount of available resource for transactional execution, and bringing the transaction to a stable state; and b) determining, by the processor, that a resource can be made available, and making the resource available.

2. The method of claim 1, further comprising, based on ...

Publication date: 04-01-2018

Administering instruction tags in a computer processor

Number: US20180004516A1
Assignee: International Business Machines Corp

Administering ITAGs in a computer processor includes, for each instruction in a single-thread mode: incrementing a value of a wrap-around counter; setting a wrap bit to a predefined value if incrementing the value causes the counter to wrap around; and generating, in dependence upon the counter value and the wrap bit, an ITAG for the instruction, the ITAG comprising a bit string having a wrap bit and an index comprising the counter value. For each instruction in a multi-thread mode, it includes: incrementing the value of the wrap-around counter; setting a wrap bit to a predefined value if incrementing the value causes the counter to wrap around; and generating, in dependence upon the counter value and the wrap bit, an ITAG for the instruction, the ITAG comprising a bit string having the wrap bit, a thread identifier, and an index comprising the counter value.
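The ITAG generation described above reduces to a small bit-packing routine. The sketch below assumes a 3-bit index and a 1-bit thread identifier purely for brevity; real field widths would differ, and the function name is invented.

```python
INDEX_BITS = 3  # illustrative index width; real designs use wider counters

def next_itag(counter, wrap, thread_id=None):
    # Increment the wrap-around counter and toggle the wrap bit on overflow.
    counter = (counter + 1) & ((1 << INDEX_BITS) - 1)
    if counter == 0:
        wrap ^= 1
    if thread_id is None:
        # single-thread mode: ITAG = wrap bit | index
        itag = (wrap << INDEX_BITS) | counter
    else:
        # multi-thread mode: ITAG = wrap bit | thread id | index
        itag = (wrap << (INDEX_BITS + 1)) | (thread_id << INDEX_BITS) | counter
    return itag, counter, wrap

itag, counter, wrap = next_itag(counter=6, wrap=0)
print(bin(itag))   # 0b111: index 7, wrap bit still 0
itag, counter, wrap = next_itag(counter, wrap)
print(bin(itag))   # 0b1000: counter wrapped to 0, wrap bit toggled to 1
```

The wrap bit lets age comparisons between two ITAGs stay correct across one counter wrap without widening the index field.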

Publication date: 04-01-2018

OPERATION OF A MULTI-SLICE PROCESSOR IMPLEMENTING PRIORITIZED DEPENDENCY CHAIN RESOLUTION

Number: US20180004527A1
Assignee:

Operation of a computer processor that includes: receiving a first instruction indicating a first target register; receiving, from an instruction fetch unit of the computer processor, a first instruction and a branch instruction; responsive to determining that the branch instruction is dependent upon a result of the first instruction, updating a priority value corresponding to the first instruction; and issuing, in dependence upon the priority value for the first instruction having a higher priority than a priority value for another instruction, the first instruction to an execution unit of the computer processor.

1. A method of operation of a computer processor, wherein the method comprises: receiving, from an instruction fetch unit of the computer processor, a first instruction and a branch instruction; responsive to determining that the branch instruction is dependent upon a result of the first instruction, updating a priority value corresponding to the first instruction; and issuing, in dependence upon the priority value for the first instruction having a higher priority than a priority value for another instruction, the first instruction to an execution unit of the computer processor.

2. The method of claim 1, wherein issuing the first instruction is further dependent upon the first instruction being a ready instruction among a plurality of ready instructions.

3. The method of claim 2, wherein the first instruction is a compare instruction, and wherein execution of the compare instruction sets a condition code flag within a condition code register, and wherein the branch instruction determines whether to branch in dependence upon a value of the condition code flag of the condition code register.

4. The method of claim 3, wherein determining that the branch instruction is dependent upon the result of the first instruction comprises decoding the branch instruction to determine that the branch instruction depends upon the condition code flag of the ...
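The priority bump this entry describes can be sketched as follows: when a pending branch is found to depend on an instruction's result, that instruction's issue priority is raised so the branch resolves sooner. All field names and the priority scheme are illustrative assumptions, not from the patent.

```python
def update_priorities(queue, branch_source_reg):
    # Raise the priority of any queued instruction whose result the
    # branch depends on (e.g. the condition-code register).
    for entry in queue:
        if entry["target"] == branch_source_reg:
            entry["priority"] += 1

def pick_ready(queue):
    # Issue the highest-priority ready instruction.
    ready = [e for e in queue if e["ready"]]
    return max(ready, key=lambda e: e["priority"])["op"]

queue = [
    {"op": "cmp r1, r2", "target": "cc", "priority": 0, "ready": True},
    {"op": "mul r3, r4", "target": "r3", "priority": 0, "ready": True},
]
update_priorities(queue, branch_source_reg="cc")  # the branch reads "cc"
print(pick_ready(queue))  # the compare feeding the branch issues first
```

Resolving the branch earlier shortens the window in which wrong-path instructions can be fetched.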

Publication date: 04-01-2018

Advanced processor architecture

Number: US20180004530A1
Author: Martin Vorbach
Assignee: HYPERION CORE Inc

The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises: 1) looking up operand sources in a Register Positioning Table (RPT) and setting the operand input references of the instruction to be issued accordingly; 2) checking for an Execution Unit (EXU) available to receive a new instruction; and 3) issuing the instruction to the available Execution Unit and entering a reference to the result register addressed by the issued instruction into the Register Positioning Table.

Publication date: 04-01-2018

TECHNIQUES FOR HYBRID COMPUTER THREAD CREATION AND MANAGEMENT

Number: US20180004554A1
Assignee:

A technique for operating a computer system to support an application, a first application server environment, and a second application server environment includes intercepting a work request relating to the application issued to the first application server environment prior to execution of the work request. A thread adapted for execution in the first application server environment is created. A context is attached to the thread that non-disruptively modifies the thread into a hybrid thread that is additionally suitable for execution in the second application server environment. The hybrid thread is returned to the first application server environment.

1. A method of operating a computer system to support an application, a first application server environment, and a second application server environment, the method comprising: intercepting, by a request interceptor component executing on the computer system, a work request relating to the application issued to the first application server environment prior to execution of the work request; responsive to the request interceptor component, creating, using the computer system, a thread adapted for execution in the first application server environment by an executor component; responsive to the executor component, attaching to the thread, by a thread dispatcher component executing on the computer system, a context to non-disruptively modify the thread into a hybrid thread that is additionally suitable for execution in the second application server environment; and responsive to the thread dispatcher component, returning the hybrid thread to the first application server environment by a catcher component executing on the computer system.

2. The method of claim 1, wherein the context comprises transactional control data of one of the application server environments, security control data of one of the application server environments, monitoring control data of one of the application server environments ...

Publication date: 07-01-2021

Energy Efficient Processor Core Architecture for Image Processor

Number: US20210004232A1
Assignee:

An apparatus that includes a program controller to fetch and issue instructions is described. The apparatus includes an execution lane having at least one execution unit to execute the instructions. The execution lane is part of an execution lane array that is coupled to a two-dimensional shift register array structure, wherein execution lanes of the execution lane array are located at respective array locations and are coupled to dedicated registers at the same respective array locations in the two-dimensional shift register array.

1. (canceled)

2. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an input program to be executed on a device comprising a plurality of random access memories and a plurality of execution lanes, wherein different groups of the execution lanes are assigned to use a different respective random access memory of the plurality of random access memories; determining that the input program specifies two or more execution lanes in a same group of the plurality of execution lanes to compete for different memory locations in a same random access memory of the plurality of random access memories; and in response, modifying the input program to generate multiple instructions that cause execution lanes within each group to access a respective random access memory sequentially.

3. The system of claim 2, wherein the plurality of execution lanes includes a plurality of rows of execution lanes, wherein the different groups of the execution lanes comprise different rows of the execution lanes.

4. The system of claim 2, wherein the plurality of execution lanes includes a plurality of columns of execution lanes, wherein the different groups of the execution lanes comprise different columns of the execution lanes.

5. The system of claim 2, wherein the ...

Publication date: 02-01-2020

HIGH PARALLELISM COMPUTING SYSTEM AND INSTRUCTION SCHEDULING METHOD THEREOF

Number: US20200004514A1
Assignee:

A high parallelism computing system and an instruction scheduling method thereof are disclosed. The computing system comprises: an instruction reading and distribution module for reading a plurality of types of instructions in a specific order, and distributing the acquired instructions to corresponding function modules according to the types; an internal buffer for buffering data and instructions for performing computation; and a plurality of function modules, each of which sequentially executes instructions of the present type distributed by the instruction reading and distribution module and reads the data from the internal buffer. The specific order is obtained by topologically sorting the instructions according to a directed acyclic graph consisting of the types and dependency relationships. By reading the instructions based on a topological sorting of the directed acyclic graph constructed according to the types and dependency relationships, deadlock caused by instruction dependencies can be avoided by a relatively simple operation.

1. A system, comprising: an instruction reading and distribution module for reading a plurality of types of instructions in a specific order, and distributing the instructions to corresponding function modules according to the types; an internal buffer for buffering data and instructions for performing computation; and a plurality of function modules each of which sequentially executes instructions of a present type distributed by the instruction reading and distribution module and reads the data from the internal buffer; wherein the specific order is obtained by topologically sorting the instructions according to a directed acyclic graph consisting of the types and dependency relationships.

2. The system of claim 1, wherein the directed acyclic graph simplifies dependencies of a certain instruction on two or more instructions of another type into a direct dependency on the last instruction in the two or more instructions ...
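The "specific order" above is a standard topological sort of the instruction dependency DAG; a Kahn-style sketch is shown below. The instruction names and dependency map are invented for illustration and are not from the patent.

```python
from collections import deque

def topo_order(instrs, deps):
    # deps maps an instruction to the instructions it depends on.
    indegree = {i: 0 for i in instrs}
    users = {i: [] for i in instrs}
    for node, sources in deps.items():
        for src in sources:
            indegree[node] += 1
            users[src].append(node)
    # Start from instructions with no unsatisfied dependencies.
    ready = deque(i for i in instrs if indegree[i] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for user in users[node]:
            indegree[user] -= 1
            if indegree[user] == 0:
                ready.append(user)
    # If some instruction never became ready, the graph has a cycle and
    # no deadlock-free issue order exists.
    assert len(order) == len(instrs), "cycle: instructions cannot be scheduled"
    return order

instrs = ["LOAD", "CONV", "SAVE"]
deps = {"CONV": ["LOAD"], "SAVE": ["CONV"]}
print(topo_order(instrs, deps))  # ['LOAD', 'CONV', 'SAVE']
```

Issuing instructions in a topological order guarantees that no function module ever waits on an instruction that is queued behind it, which is how the deadlock is avoided.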

Publication date: 02-01-2020

SYSTEMS AND METHODS TO PREDICT LOAD DATA VALUES

Number: US20200004536A1
Assignee:

Disclosed embodiments relate to predicting load data. In one example, a processor includes a pipeline having stages ordered as fetch, decode, allocate, write back, and commit; a training table to store an address, predicted data, a state, and a count of instances of unchanged return data; and tracking circuitry to determine, during one or more of the allocate and decode stages, whether a training table entry has a first state and matches a fetched first load instruction, and, if so, using the data predicted by the entry during the execute stage, the tracking circuitry further to update the training table during or after the write back stage to set the state of the first load instruction in the training table to the first state when the count reaches a first threshold.

1. A processor comprising: fetch and decode circuitry to fetch and decode load instructions; a pipeline having stages ordered as fetch, decode, allocate, write back, and commit; a training table to store, for each of a plurality of load instructions, an address, predicted data, a state, and a count of instances of unchanged return data; and tracking circuitry to determine, during one or more of the allocate and decode stages, whether a training table entry has a first state and matches a fetched first load instruction, and, if so, using the data predicted by the entry during the execute stage, the tracking circuitry further to update the training table during or after the write back stage by: when no match exists, adding a new entry reflecting the first load instruction; when a match exists, but has different predicted data than the data returned for the first load instruction, reset the count and set the state to a second state; and when a matching entry with matching predicted data exists, increment the count and, when the incremented count reaches a first threshold, set the state to the first state.

2. The processor of claim 1, wherein, when the predicted data is used to optimize execution during ...
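The training-table update rules above can be sketched directly. This assumes a plain dict keyed by load address; the threshold value, field names, and state labels are illustrative, not taken from the patent.

```python
FIRST_THRESHOLD = 2
STABLE, TRAINING = "stable", "training"

def train(table, address, returned_data):
    entry = table.get(address)
    if entry is None:
        # No match: add a new entry for this load.
        table[address] = {"data": returned_data, "count": 0, "state": TRAINING}
    elif entry["data"] != returned_data:
        # Returned data changed: reset the count and leave the stable state.
        entry.update(data=returned_data, count=0, state=TRAINING)
    else:
        # Unchanged return data: count it, and mark the entry stable once
        # the count reaches the threshold.
        entry["count"] += 1
        if entry["count"] >= FIRST_THRESHOLD:
            entry["state"] = STABLE  # prediction may now be used at execute

table = {}
for _ in range(3):
    train(table, 0x1000, 42)
print(table[0x1000]["state"])  # stable after repeated identical returns
```

Only entries in the stable state supply predicted data at execute; a single changed return value demotes the entry back to training.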

Publication date: 02-01-2020

APPARATUSES, METHODS, AND SYSTEMS FOR CONDITIONAL OPERATIONS IN A CONFIGURABLE SPATIAL ACCELERATOR

Number: US20200004538A1
Assignee:

Systems, methods, and apparatuses relating to conditional operations in a configurable spatial accelerator are described. In one embodiment, a hardware accelerator includes an output buffer of a first processing element coupled to an input buffer of a second processing element via a first data path that is to send a first dataflow token from the output buffer of the first processing element to the input buffer of the second processing element when the first dataflow token is received in the output buffer of the first processing element; an output buffer of a third processing element coupled to the input buffer of the second processing element via a second data path that is to send a second dataflow token from the output buffer of the third processing element to the input buffer of the second processing element when the second dataflow token is received in the output buffer of the third processing element; a first backpressure path from the input buffer of the second processing element to the first processing element to indicate to the first processing element when storage is not available in the input buffer of the second processing element; a second backpressure path from the input buffer of the second processing element to the third processing element to indicate to the third processing element when storage is not available in the input buffer of the second processing element; and a scheduler of the second processing element to cause storage of the first dataflow token from the first data path into the input buffer of the second processing element when both the first backpressure path indicates storage is available in the input buffer of the second processing element and a conditional token received in a conditional queue of the second processing element from another processing element is a first value.

1. An apparatus comprising: an output buffer of a first processing element coupled to an input buffer of a second processing element via a first data path that is ...

Publication date: 02-01-2020

System and method for prediction of multiple read commands directed to non-sequential data

Number: US20200004540A1
Assignee: Western Digital Technologies Inc

Systems and methods for predicting read commands and pre-fetching data when a memory device is receiving random read commands to non-sequentially addressed data locations are disclosed. A limited length search sequence of prior read commands is generated and that search sequence is then converted into an index value in a predetermined set of index values. A history pattern match table having entries indexed to that predetermined set of index values contains a plurality of read commands that have previously followed the search sequence represented by the index value. The index value is obtained via application of a many-to-one algorithm to the search sequence. The index value obtained from the search sequence may be used to find, and pre-fetch data for, a plurality of next read commands in the table that previously followed a search sequence having that index value.
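The history pattern match table above reduces to a many-to-one hash of the recent read sequence into a fixed-size table that remembers which read followed that sequence. The sketch below is illustrative only: the table size, history length, and the use of Python's built-in `hash` as the many-to-one algorithm are all assumptions.

```python
TABLE_SIZE = 64    # fixed number of index values (assumed)
HISTORY_LEN = 3    # limited-length search sequence (assumed)

def history_index(search_sequence):
    # Many-to-one reduction of the search sequence to a table index.
    return hash(tuple(search_sequence)) % TABLE_SIZE

def record(table, history, next_read):
    # Remember which read command previously followed this history.
    table[history_index(history)] = next_read

def predict(table, history):
    # Look up a pre-fetch candidate for the current history, if any.
    return table.get(history_index(history))

table = {}
record(table, [0x10, 0x80, 0x24], next_read=0x90)
print(hex(predict(table, [0x10, 0x80, 0x24])))  # pre-fetch candidate: 0x90
```

Because the mapping is many-to-one, distinct histories can collide on one index; the scheme trades occasional wrong pre-fetches for a small, bounded table.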

Publication date: 02-01-2020

Determining and predicting derived values

Number: US20200004544A1
Assignee: International Business Machines Corp

A predicted value to be used in register-indirect branching is predicted. The predicted value is to be stored in one or more locations based on the prediction. An offset for a predicted derived value is obtained. The predicted derived value is to be used as a pointer to a reference data structure providing access to variables used in processing. The predicted derived value is generated using the predicted value and the offset. The predicted derived value is used to access the reference data structure during processing.

Publication date: 02-01-2020

Information processing device and non-transitory computer readable medium

Number: US20200004545A1
Author: Atsushi Monna
Assignee: Fuji Xerox Co Ltd

An information processing device includes a generation unit and a providing unit. The generation unit generates data corresponding to a process specified by a user. The providing unit provides, after generation of the data, identification information to be used by the user to issue an instruction for a post-process on the data generated by the generation unit.

Publication date: 02-01-2020

SHARED COMPARE LANES FOR DEPENDENCY WAKE UP IN A PAIR-BASED ISSUE QUEUE

Number: US20200004546A1
Assignee:

An apparatus for shared compare lanes for dependency wakeup in a double issue queue includes a source dependency module that determines a number of source dependencies for two instructions to be paired in a row of a double issue queue of a processor. A source dependency includes an unavailable status of a dependent source for data required by the two instructions, where the data is produced by another instruction. The apparatus includes a pairing determination module that writes each of the two instructions into a separate row of the double issue queue in response to the source dependency module determining that the number of source dependencies is greater than a source dependency maximum, and pairs the two instructions in one row of the double issue queue in response to the source dependency module determining that the number of source dependencies is less than or equal to the source dependency maximum.

1. An apparatus comprising: a source dependency module that determines a number of source dependencies for two instructions intended to be paired in a row of a double issue queue of a processor, a source dependency comprising an unavailable status of a dependent source for data required by the two instructions where the data is produced by another instruction; and a pairing determination module that writes each of the two instructions into a separate row of the double issue queue in response to the source dependency module determining that the number of source dependencies is greater than a source dependency maximum and that pairs the two instructions in one row of the double issue queue in response to the source dependency module determining that the number of source dependencies is less than or equal to the source dependency maximum.

2. The apparatus of claim 1, wherein the source dependency maximum is equal to a number of dependency trackers available to the double issue queue, the dependency trackers each tracking a source dependency of paired instructions ...
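The pairing rule above is a simple threshold test: pair two instructions in one row only when their combined unresolved source dependencies fit the shared dependency trackers. The sketch below models the queue as a list of rows; the maximum and all names are illustrative assumptions.

```python
SOURCE_DEPENDENCY_MAX = 2  # assumed number of shared dependency trackers

def place(issue_queue, instr_a, instr_b, unresolved_sources):
    # unresolved_sources: the set of source registers the pair still
    # waits on (i.e. their producers have not yet broadcast results).
    if len(unresolved_sources) > SOURCE_DEPENDENCY_MAX:
        # Too many dependencies to track in one row: split the pair.
        issue_queue.append([instr_a])
        issue_queue.append([instr_b])
    else:
        # The shared compare lanes can track all dependencies: pair them.
        issue_queue.append([instr_a, instr_b])

queue = []
place(queue, "add r1,r2,r3", "sub r4,r1,r5", unresolved_sources={"r2", "r3", "r5"})
place(queue, "or r6,r7,r8", "and r9,r6,r7", unresolved_sources={"r7"})
print(queue)  # first pair split across rows, second pair shares a row
```

Sharing compare lanes halves the per-row wakeup hardware at the cost of occasionally refusing to pair dependency-heavy instructions.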

Publication date: 02-01-2020

Apparatus and method for using predicted result values

Number: US20200004547A1
Assignee: ARM LTD

An apparatus and method are provided for using predicted result values. The apparatus has a processing unit that comprises processing circuitry for executing a sequence of instructions, and value prediction circuitry for identifying a predicted result value for at least one instruction. A result producing structure is provided that is responsive to a request issued from the processing unit when the processing circuitry is executing a first instruction, to produce a result value for the first instruction and return that result value to the processing unit. While waiting for the result value from the result producing structure, the processing circuitry can be arranged to speculatively execute at least one dependent instruction using a predicted result value for the first instruction as obtained from the value prediction circuitry. The request issued from the processing unit includes a signature value indicative of the predicted result value, and the result producing structure references the signature value in order to detect whether a mispredict condition exists indicating that the predicted result value differs from the result value. The apparatus further provides a mispredict signal transmission path via which the result producing structure, when the mispredict condition is detected, can assert a mispredict signal for receipt by the processing unit prior to the result value being available to the processing unit. Such an approach can reduce the misprediction penalty associated with using a mispredicted result value.

Publication date: 02-01-2020

Combining load or store instructions

Number: US20200004550A1
Assignee: Qualcomm Inc

Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.
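The combining check described above can be sketched as: two narrow accesses off the same base register may be replaced by one wider access when their combined footprint fits on a single cache line and within the bus width. The line size, bus width, and tuple encoding below are illustrative assumptions.

```python
CACHE_LINE = 64  # bytes per cache line (assumed)
BUS_WIDTH = 8    # bytes per memory access (assumed)

def try_combine(load_a, load_b):
    # Each load is (base_register, offset, size_in_bytes).
    base_a, off_a, size_a = load_a
    base_b, off_b, size_b = load_b
    if base_a != base_b:
        return None  # must reference a common base register
    lo = min(off_a, off_b)
    hi = max(off_a + size_a, off_b + size_b)
    same_line = lo // CACHE_LINE == (hi - 1) // CACHE_LINE
    if same_line and hi - lo <= BUS_WIDTH:
        return (base_a, lo, hi - lo)  # one equivalent wider access
    return None

# Two 4-byte loads at [sp+0] and [sp+4] become one 8-byte load.
print(try_combine(("sp", 0, 4), ("sp", 4, 4)))  # ('sp', 0, 8)
```

The replacement uses more of the available bus width per access, so the same data moves in fewer pipeline slots.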

Publication date: 02-01-2020

APPARATUS AND METHOD FOR USING PREDICTED RESULT VALUES

Number: US20200004551A1
Assignee:

An apparatus and method are provided for using predicted result values. The apparatus has processing circuitry for executing a sequence of instructions, and value prediction storage that comprises a plurality of entries, where each entry is used to identify a predicted result value for an instruction allocated to that entry. Dispatch circuitry maintains a record of pending instructions awaiting execution by the processing circuitry, and selects pending instructions from the record for dispatch to the processing circuitry for execution. The dispatch circuitry is arranged to enable at least one pending instruction to be speculatively executed by the processing circuitry using as a source operand a predicted result value provided by the value prediction storage. Allocation circuitry is arranged to apply a default allocation policy to identify a first instruction to be allocated an entry in the value prediction storage. However, the allocation circuitry is further responsive to a trigger condition to identify a dependent instruction whose result value will be dependent on the result value produced by executing the first instruction, and to then allocate an entry in the value prediction storage to store a predicted result value for the identified dependent instruction. Such an approach can enable performance improvements to be achieved through the use of predicted result values even in situations where the prediction accuracy of the predicted result value for the first instruction proves not to be that high, by instead enabling a predicted result value for the dependent instruction to be used to allow speculative execution of further dependent instructions.

1. An apparatus comprising: processing circuitry to execute a sequence of instructions; value prediction storage comprising a plurality of entries, each entry being used to identify a predicted result value for an instruction allocated to that entry; dispatch circuitry to maintain a record of pending instructions ...

Publication date: 02-01-2020

COOPERATIVE WORKGROUP SCHEDULING AND CONTEXT PREFETCHING

Number: US20200004586A1
Assignee:

A first workgroup is preempted in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal. The first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value. A second workgroup is scheduled for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value. A third context is prefetched into registers of the processor core based on the first hint and the second value. The first context is stored in a first portion of the registers and the second context is prefetched into a second portion of the registers prior to preempting the first workgroup.

1. A method comprising: preempting a first workgroup in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal, wherein the first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value; scheduling a second workgroup for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value; and prefetching a third context into registers of the processor core based on the first hint and the second value.

2. The method of claim 1, wherein preempting the first workgroup comprises storing the first context in a memory, and wherein scheduling the second workgroup for execution comprises writing the second context from the memory into the registers of the processor core.

3. The method of claim 2, wherein writing the second context from the memory into the registers of the processor core comprises prefetching the second ...

Publication date: 03-01-2019

METHODS AND APPARATUS FOR HANDLING RUNTIME MEMORY DEPENDENCIES

Number: US20190004804A1
Assignee: Intel Corporation

An integrated circuit may include elastic datapaths or pipelines, through which software threads, or iterations of loops, may be executed. Throttling circuitry may be coupled along an elastic pipeline in the integrated circuit. The throttling circuitry may include dependency detection circuitry that dynamically detects memory dependency issues that may arise during runtime. To mitigate these dependency issues, the throttling circuitry may assert stall signals to upstream stages in the pipeline.

1. An integrated circuit, comprising: a memory circuit; and a pipelined datapath coupled to the memory circuit, the pipelined datapath comprises: memory access circuitry that reads from the memory circuit using a load address and that writes into the memory circuit using a store address; and throttling circuitry coupled to the memory access circuitry, the throttling circuitry is configured to compare the load address with the store address and to selectively stall a stage in the pipelined datapath based on the comparison.

2. The integrated circuit defined in claim 1, wherein the throttling circuitry comprises an address table configured to store a plurality of store addresses, and wherein the plurality of store addresses include the store address.

3. The integrated circuit defined in claim 1, wherein the throttling circuitry comprises an address table configured to store a plurality of load addresses, and wherein the plurality of load addresses include the load address.

4. The integrated circuit defined in claim 1, wherein the memory access circuitry comprises: a memory loading circuit that reads from the memory circuit using the load address; a memory storing circuit that writes into the memory circuit using the store address, wherein at least a portion of the throttling circuitry is interposed between the memory loading circuit and the stalled stage.

5. The integrated circuit defined in claim 4, wherein the pipelined datapath further comprises compute ...
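The throttling decision above comes down to an address comparison: stall an upstream stage whenever an incoming load address matches an in-flight store address in the address table. A minimal sketch; the set-based table and the function name are assumptions for illustration.

```python
def should_stall(load_address, store_address_table):
    # A match means the load may depend on a store that has not yet
    # committed to memory, so the upstream stage must wait.
    return load_address in store_address_table

in_flight_stores = {0x100, 0x140}
print(should_stall(0x140, in_flight_stores))  # True: stall upstream stages
print(should_stall(0x180, in_flight_stores))  # False: no dependency hazard
```

Checking dynamically at runtime lets the pipeline run at full rate whenever no load actually aliases a pending store, instead of stalling conservatively on every loop iteration.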

03-01-2019 publication date

Stream processor with overlapping execution

Number: US20190004807A1
Assignee: Advanced Micro Devices Inc

Systems, apparatuses, and methods for implementing a stream processor with overlapping execution are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of execution pipelines. The processing throughput of the parallel processing unit is increased by overlapping execution of multi-pass instructions with single pass instructions without increasing the instruction issue rate. A first plurality of operands of a first vector instruction are read from a shared vector register file in a single clock cycle and stored in temporary storage. The first plurality of operands are accessed and utilized to initiate multiple instructions on individual vector elements on a first execution pipeline in subsequent clock cycles. A second plurality of operands are read from the shared vector register file during the subsequent clock cycles to initiate execution of one or more second vector instructions on the second execution pipeline.

03-01-2019 publication date

INSTRUCTIONS FOR REMOTE ATOMIC OPERATIONS

Number: US20190004810A1
Assignee:

Disclosed embodiments relate to atomic memory operations. In one example, a method of executing an instruction atomically and with weak order includes: fetching, by fetch circuitry, the instruction from code storage, the instruction including an opcode, a source identifier, and a destination identifier; decoding, by decode circuitry, the fetched instruction; selecting, by a scheduling circuit, an execution circuit among multiple circuits in a system; scheduling, by the scheduling circuit, execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance; and executing the decoded instruction, by the execution circuit, to: atomically read a datum from a location identified by the destination identifier, perform an operation on the datum as specified by the opcode, the operation to use a source operand identified by the source identifier, and write a result back to the location. 1. A processor to execute an instruction atomically and with weak order, the processor comprising: fetch circuitry to fetch the instruction from a code storage, the instruction comprising an opcode, a source identifier, and a destination identifier; decode circuitry to decode the fetched instruction; and a scheduling circuit to select an execution circuit among multiple circuits in the system to execute the instruction, the scheduling circuit further to schedule execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance; wherein the execution circuit is to execute the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance, wherein the executing comprises atomically reading a datum from a location identified by the destination identifier, performing an ...
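The atomic read-modify-write sequence in the abstract (read a datum from the destination location, apply the operation named by the opcode with the source operand, write the result back) can be modeled in a few lines. The `Memory` class, `OPS` table, and `rmw` name are assumptions for this sketch; a lock stands in for the hardware's atomicity guarantee.

```python
import threading

# Illustrative model of a remote atomic operation: read, apply the
# opcode's operation with a source operand, write back - all under a
# lock that plays the role of hardware atomicity.

OPS = {
    "add":  lambda a, b: a + b,
    "and":  lambda a, b: a & b,
    "xchg": lambda a, b: b,      # exchange: result is the source operand
}

class Memory:
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def rmw(self, opcode, dest, src_operand):
        """Atomic read-modify-write; returns the prior datum."""
        with self._lock:
            old = self._data.get(dest, 0)
            self._data[dest] = OPS[opcode](old, src_operand)
            return old

mem = Memory()
mem.rmw("add", 0x40, 5)
mem.rmw("add", 0x40, 3)
assert mem._data[0x40] == 8
assert mem.rmw("xchg", 0x40, 1) == 8   # returns the prior datum
assert mem._data[0x40] == 1
```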

03-01-2019 publication date

STREAM PROCESSOR WITH DECOUPLED CROSSBAR FOR CROSS LANE OPERATIONS

Number: US20190004814A1
Assignee:

Systems, apparatuses, and methods for implementing a decoupled crossbar for a stream processor are disclosed. In one embodiment, a system includes at least a multi-lane execution pipeline, a vector register file, and a crossbar. The system is configured to determine if a given instruction in an instruction stream requires a permutation on data operands retrieved from the vector register file. The system conveys the data operands to the multi-lane execution pipeline on a first path which includes the crossbar responsive to determining the given instruction requires a permutation on the data operands. The crossbar then performs the necessary permutation to route the data operands to the proper processing lanes. Otherwise, the system conveys the data operands to the multi-lane execution pipeline on a second path which bypasses the crossbar responsive to determining the given instruction does not require a permutation on the input operands. 1. A system comprising: a multi-lane execution pipeline; a vector register file; and a crossbar; wherein the system is configured to: retrieve a plurality of data operands from the vector register file; convey the plurality of data operands to the multi-lane execution pipeline via the crossbar responsive to determining a permutation is required; and convey the plurality of data operands to the multi-lane execution pipeline by bypassing the crossbar responsive to determining a permutation is not required. 2. The system as recited in claim 1, wherein the crossbar comprises multiple layers, and wherein the system is further configured to: perform a first permutation of data operands across lanes of the multi-lane execution pipeline with a first layer of N×N crossbars, wherein N is a positive integer; and perform a second permutation of data operands across lanes of the multi-lane execution pipeline with a second layer of N×N crossbars. 3. The system as recited in claim 1, wherein the crossbar comprises a first N/2-by-N/2 ...
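The route-or-bypass decision can be shown as a small function: operands pass through a permutation only when the instruction requires a cross-lane shuffle, and otherwise take the bypass path unchanged. The function name `route_operands` and its `permutation` argument are illustrative assumptions, not the patent's interface.

```python
# Sketch of the decoupled-crossbar routing decision: operands either
# pass through a permutation crossbar or bypass it, depending on
# whether the instruction needs a cross-lane shuffle.

def route_operands(operands, permutation=None):
    """Return operands in lane order; permute only when required."""
    if permutation is None:
        return list(operands)          # bypass path: no crossbar traversal
    # crossbar path: lane i receives the operand from lane permutation[i]
    return [operands[src] for src in permutation]

lanes = [10, 11, 12, 13]
assert route_operands(lanes) == [10, 11, 12, 13]
assert route_operands(lanes, permutation=[3, 2, 1, 0]) == [13, 12, 11, 10]
```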

03-01-2019 publication date

STREAMING ENGINE WITH SHORT CUT START INSTRUCTIONS

Number: US20190004853A1
Assignee:

A streaming engine employed in a digital data processor specifies a fixed read-only data stream recalled from memory. Streams are started by one of two types of stream start instructions. A stream start ordinary instruction specifies a register storing a stream start address and a register storing a stream definition template which specifies stream parameters. A stream start short-cut instruction specifies a register storing a stream start address and an implied stream definition template. A functional unit is responsive to a stream operand instruction to receive at least one operand from a stream head register. The stream template supports plural nested loops with short-cut start instructions limited to a single loop. The stream template supports data element promotion to larger data element size with sign extension or zero extension. A set of allowed stream short-cut start instructions includes various data sizes and promotion factors. 1. A digital data processor comprising: an instruction memory storing instructions each specifying a data processing operation and at least one data operand field; an instruction decoder connected to said instruction memory for sequentially recalling instructions from said instruction memory and determining said specified data processing operation and said specified at least one operand; at least one functional unit connected to said data register file and said instruction decoder for performing data processing operations upon at least one operand corresponding to an instruction decoded by said instruction decoder and storing results; and a streaming engine connected to said instruction decoder operable in response to a stream start instruction to recall from memory a ..., said streaming engine comprising: an address generator for generating stream memory addresses corresponding to said stream of an instruction specified sequence of a plurality of data elements, and a stream head register storing a data element of said stream next to be used by said at least one functional unit.
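A stream defined by per-loop counts and strides, as in the stream definition template above, can be modeled by a simple nested-loop address generator. The `(count, stride)` template fields and the function name are assumptions for this sketch; the real template encodes more parameters (element size, promotion, and so on).

```python
# Sketch of a streaming-engine address generator for nested loops: a
# stream definition template gives per-loop counts and strides, and
# the engine emits element addresses with no further instructions.

def stream_addresses(base, loops):
    """loops: list of (count, stride) pairs, outermost loop first."""
    addrs = [base]
    for count, stride in loops:        # expand one nesting level at a time
        addrs = [a + i * stride for a in addrs for i in range(count)]
    return addrs

# 2 rows of 4 consecutive 32-bit elements, rows 256 bytes apart
assert stream_addresses(0x1000, [(2, 0x100), (4, 4)]) == [
    0x1000, 0x1004, 0x1008, 0x100C,
    0x1100, 0x1104, 0x1108, 0x110C,
]
```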

04-01-2018 publication date

HYBRID MEMORY CELL UNIT AND RECURRENT NEURAL NETWORK INCLUDING HYBRID MEMORY CELL UNITS

Number: US20180005107A1
Assignee:

A recurrent neural network including an input layer, a hidden layer, and an output layer, wherein the hidden layer includes hybrid memory cell units, each of the hybrid memory cell units including first memory cells of a first type, the first memory cells being configured to remember a first cell state value fed back to each of gates to determine a degree to which each of the gates is open or closed, and configured to continue to update the first cell state value, and second memory cells of a second type, each second memory cell of the second memory cells including a first time gate configured to control a second cell state value of the second memory cell based on phase signals of an oscillatory frequency, and a second time gate configured to control an output value of the second memory cell based on the phase signals, and each second memory cell of the second memory cells being configured to remember the second cell state value. 1. A recurrent neural network comprising: an input layer; a hidden layer; and an output layer, wherein the hidden layer comprises hybrid memory cell units, each of the hybrid memory cell units comprising: first memory cells of a first type, the first memory cells being configured to remember a first cell state value fed back to each of gates to determine a degree to which each of the gates is open or closed, and configured to continue to update the first cell state value; and second memory cells of a second type, each second memory cell of the second memory cells comprising a first time gate configured to control a second cell state value of the second memory cell based on phase signals of an oscillatory frequency, and a second time gate configured to control an output value of the second memory cell based on the phase signals, and each second memory cell of the second memory cells being configured to remember the second cell state value. 2. The recurrent neural network of claim 1, wherein each of the hybrid memory cell units is ...

03-01-2019 publication date

High-Speed, Fixed-Function, Computer Accelerator

Number: US20190004995A1
Assignee: WISCONSIN ALUMNI RESEARCH FOUNDATION

A hardware accelerator for computers combines a stand-alone, high-speed, fixed-program dataflow functional element with a stream processor; the latter may autonomously access memory in predefined access patterns after receiving simple stream instructions and provide the accessed data to the dataflow functional element. The result is a compact, high-speed processor that may exploit fixed-program dataflow functional elements.

03-01-2019 publication date

DEPLOYMENT OF INDEPENDENT DATABASE ARTIFACT GROUPS

Number: US20190005108A1
Assignee:

A dependency graph is generated for database files. An unvisited node of the dependency graph is selected and a breadth-first-search performed starting from the selected unvisited node. Results of the breadth-first-search are defined as a group. A group assignment for the database files is returned. 1. A computer-implemented method, comprising: generating a dependency graph for database files; selecting an unvisited node of the dependency graph; performing a breadth-first-search (BFS) starting from the selected unvisited node; defining results of the BFS as a group; and returning a group assignment for the database files. 2. The computer-implemented method of claim 1, wherein the selected unvisited node is selected arbitrarily and the BFS traverses nodes of the dependency graph in both incoming and outgoing directions of the dependency graph edges. 3. The computer-implemented method of claim 1, wherein each node of the dependency graph is assigned a unique ID from 0 to n−1, where n is the node count in the dependency graph. 4. The computer-implemented method of claim 1, further comprising marking entries in a Boolean array to indicate visited nodes in the dependency graph. 5. The computer-implemented method of claim 1, wherein a group is a set of file uniform resource identifiers. 6. The computer-implemented method of claim 1, further comprising: selecting another unvisited node of the dependency graph if unvisited nodes exist in the dependency graph; and filtering files for deployment if no unvisited nodes exist in the dependency graph. 7. The computer-implemented method of claim 1, further comprising initiating deployment of database files based upon group information. 8. A non-transitory computer-readable medium storing one or more instructions executable by a computer system to perform operations comprising: generating a dependency graph for database files; selecting an unvisited node of the dependency graph; performing a breadth-first-search (BFS) ...
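The claimed grouping method maps directly onto a standard connected-components computation: treat the dependency graph as undirected (claim 2's both-direction traversal), BFS from each unvisited node using a Boolean visited array (claim 4), and emit each BFS result as a group. A sketch in Python, with illustrative names:

```python
from collections import defaultdict, deque

# Group database artifacts by BFS over an undirected view of the
# dependency graph; every BFS yields one group (a connected component).

def group_artifacts(n, edges):
    adj = defaultdict(list)
    for u, v in edges:                 # traverse edges in both directions
        adj[u].append(v)
        adj[v].append(u)
    visited = [False] * n              # Boolean array marking visited nodes
    groups = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        queue, group = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if not visited[nxt]:
                    visited[nxt] = True
                    group.add(nxt)
                    queue.append(nxt)
        groups.append(group)
    return groups

# files 0-1-2 depend on each other; 3-4 form an independent group
assert group_artifacts(5, [(0, 1), (1, 2), (3, 4)]) == [{0, 1, 2}, {3, 4}]
```

Because the groups share no edges, each group of files can then be deployed independently, which is the point of the grouping.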

01-01-2015 publication date

Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

Number: US20150006840A1
Author: Martin Ohmacht
Assignee: International Business Machines Corp

In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.

01-01-2015 publication date

INSTRUCTION ORDER ENFORCEMENT PAIRS OF INSTRUCTIONS, PROCESSORS, METHODS, AND SYSTEMS

Number: US20150006851A1
Assignee:

A processor of an aspect includes an instruction fetch unit to fetch a pair of instruction order enforcement instructions. The pair of instruction order enforcement instructions are part of an instruction set of the processor. The pair of instruction order enforcement instructions includes an activation instruction and an enforcement instruction. The activation instruction is to occur before the enforcement instruction in a program order. The processor also includes an instruction order enforcement module. The instruction order enforcement module, in response to the pair of the instruction order enforcement instructions, is to prevent instructions occurring after the enforcement instruction in the program order, from being processed prior to the activation instruction, in an out-of-order portion of the processor. Other processors are also disclosed, as are various methods, systems, and instructions. 1. A processor comprising: an instruction fetch unit to fetch a pair of instruction order enforcement instructions, which are to be part of an instruction set of the processor, the pair of instruction order enforcement instructions to include an activation instruction and an enforcement instruction, the activation instruction to occur before the enforcement instruction in a program order; and an instruction order enforcement module, in response to the pair of the instruction order enforcement instructions, to prevent instructions occurring after the enforcement instruction in the program order, from being processed prior to the activation instruction, in an out-of-order portion of the processor. 2. The processor of claim 1, wherein the instruction order enforcement module comprises: an activation module to activate instruction order enforcement, in response to the activation instruction, at a first stage of a pipeline of the processor; and a blocking module coupled with the activation module, the blocking module, while the instruction order enforcement is activated, to ...

01-01-2015 publication date

Mode dependent partial width load to wider register processors, methods, and systems

Number: US20150006856A1
Assignee: Intel Corp

A method of an aspect is performed by a processor. The method includes receiving a partial width load instruction. The partial width load instruction indicates a memory location of a memory as a source operand and indicates a register as a destination operand. The method includes loading data from the indicated memory location to the processor in response to the partial width load instruction. The method includes writing at least a portion of the loaded data to a partial width of the register in response to the partial width load instruction. The method includes finishing writing the register with a set of bits stored in a remaining width of the register that have bit values that depend on a partial width load mode of the processor. The partial width load instruction does not indicate the partial width load mode. Other methods, processors, and systems are also disclosed.
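The mode dependence can be illustrated with a small model: the instruction always writes the low portion of the register from memory, while the fill of the remaining bits is chosen by the processor's current partial width load mode rather than by the instruction. The mode names `zero_extend` and `preserve_upper` are assumptions for this sketch, not the patent's terminology.

```python
# Sketch of a mode-dependent partial-width load into a 64-bit register:
# the low 32 bits come from memory; the upper 32 bits are filled
# according to the processor mode, which the instruction does not name.

def partial_width_load(old_reg, mem_value, mode):
    low = mem_value & 0xFFFFFFFF
    if mode == "zero_extend":          # upper bits forced to zero
        return low
    if mode == "preserve_upper":       # upper bits keep their old contents
        return (old_reg & 0xFFFFFFFF_00000000) | low
    raise ValueError(mode)

old = 0xDEADBEEF_00000000
assert partial_width_load(old, 0x12345678, "zero_extend") == 0x12345678
assert partial_width_load(old, 0x12345678, "preserve_upper") == 0xDEADBEEF_12345678
```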

01-01-2015 publication date

MULTIFUNCTIONAL HEXADECIMAL INSTRUCTION FORM SYSTEM AND PROGRAM PRODUCT

Number: US20150006859A1
Assignee:

A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and MULTIPLY and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle. 1. A computer system supporting both Binary Floating Point (BFP) operations and non-BFP floating point operations, wherein the non-BFP floating point operations comprise Hexadecimal Floating Point (HFP) operations, the computer system comprising: computer memory; and a processor in communication with the computer memory, the processor comprising an instruction fetching element for fetching instructions from memory to perform a method comprising: providing at least three operands of a combination multiply add/subtract instruction to a main fraction dataflow element of a floating point element, each of said operands comprising BFP floating point format values or non-BFP floating point format values, said non-BFP floating point format values comprising floating point format values other than BFP format values, and said main fraction dataflow element configured to process both BFP floating point format values and non-BFP floating point format values for said operands; responsive to the combined multiply add/subtract instruction being a non-BFP floating point multiply add/subtract instruction, the main fraction dataflow element performing a non-BFP floating point operation on the three operands to produce a non-BFP main fraction result, the ...

20-01-2022 publication date

GRAPHICS PROCESSORS

Number: US20220020108A1
Author: Uhrenholt Olof Henrik
Assignee: ARM LIMITED

To suspend the processing for a group of one or more execution threads currently executing a shader program for an output being generated by a graphics processor, the issuing of shader program instructions for execution by the group of one or more execution threads is stopped, and any outstanding register-content affecting transactions for the group of one or more execution threads are allowed to complete. Once all outstanding register-content affecting transactions for the group of one or more execution threads have completed, the content of the registers associated with the threads of the group of one or more execution threads, and a set of state information for the group of one or more execution threads, including at least an indication of the last instruction in the shader program that was executed for the threads of the group of one or more execution threads, are stored to memory. 1. A method of operating a data processor that includes a programmable execution unit operable to execute programs, and in which when executing a program, the programmable execution unit executes the program for respective groups of one or more execution threads, each execution thread in a group of execution threads corresponding to a respective work item of an output being generated, and each execution thread having an associated set of registers for storing data for the execution thread, the method comprising: in response to a command to suspend the processing of an output being generated by the data processor: stopping the issuing of program instructions for execution by the group of one or more execution threads; waiting for any outstanding transactions that affect the content of the registers associated with the threads of the group of one or more execution threads to complete; and storing to memory: the content of the registers associated with the threads of the group of one or more execution threads; and a set of ...

27-01-2022 publication date

OUT OF ORDER MEMORY REQUEST TRACKING STRUCTURE AND TECHNIQUE

Number: US20220027160A1
Assignee:

In a streaming cache, multiple, dynamically sized tracking queues are employed. Request tracking information is distributed among the plural tracking queues to selectively enable out-of-order memory request returns. A dynamically controlled policy assigns pending requests to tracking queues, providing for example in-order memory returns in some contexts and/or for some traffic and out-of-order memory returns in other contexts and/or for other traffic. 1. A memory request tracking circuit for use with a streaming cache memory, the memory request tracking circuit comprising: a tag check configured to detect cache misses; plural tracking queues; and a queue mapper coupled to the tag check and the plural tracking queues, the queue mapper being configured to distribute request tracking information to the plural tracking queues to enable in-order and out-of-order memory request returns. 2. The memory request tracking circuit of wherein the queue mapper is programmable to preserve in-order memory request return handling for a first type of memory requests and to enable out-of-order memory request return handling for a second type of memory requests different from the first type of memory requests. 3. The memory request tracking circuit of wherein the first and second types of memory requests are selected from the group consisting of loads from local or global memory; texture memory/storage; and acceleration data structure storage. 4. The memory request tracking circuit of wherein the plural tracking queues comprise first through N tracking queues, and the queue mapper allocates a first tracking queue to a particular warp and distributes certain types of memory requests evenly across second through N tracking queues. 5. The memory request tracking circuit of wherein the plural tracking queues each comprise a first-in-first-out storage. 6. The memory request tracking circuit of further includes a pipelined checker picker that selects tracking queue outputs for application ...
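The queue-mapper policy can be sketched as follows: traffic that must return in order is pinned to one FIFO, while other traffic is spread round-robin over the remaining queues so its returns may complete out of order relative to one another. The class name and queue-assignment details are assumptions for this sketch.

```python
from collections import deque

# Sketch of the queue mapper: in-order traffic goes to a single FIFO
# (preserving return order); other traffic is distributed round-robin
# across the remaining tracking queues.

class QueueMapper:
    def __init__(self, n_queues):
        self.queues = [deque() for _ in range(n_queues)]
        self._rr = 0

    def track(self, request, in_order):
        if in_order:
            self.queues[0].append(request)       # single FIFO keeps order
        else:
            q = 1 + self._rr % (len(self.queues) - 1)
            self._rr += 1
            self.queues[q].append(request)       # spread across queues

m = QueueMapper(4)
for r in ["ld0", "ld1"]:
    m.track(r, in_order=True)
for r in ["tex0", "tex1", "tex2"]:
    m.track(r, in_order=False)
assert list(m.queues[0]) == ["ld0", "ld1"]
assert list(m.queues[1]) == ["tex0"] and list(m.queues[2]) == ["tex1"]
```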

08-01-2015 publication date

COMPACT LINKED-LIST-BASED MULTI-THREADED INSTRUCTION GRADUATION BUFFER

Number: US20150012730A1
Author: Svendsen Kjeld
Assignee:

A processor and instruction graduation unit for a processor. In one embodiment, a processor or instruction graduation unit according to the present invention includes a linked-list-based multi-threaded graduation buffer and a graduation controller. The graduation buffer stores identification values generated by an instruction decode and dispatch unit of the processor as part of one or more linked-list data structures. Each linked-list data structure formed is associated with a particular program thread running on the processor. The number of linked-list data structures formed is variable and related to the number of program threads running on the processor. The graduation controller includes linked-list head identification registers and linked-list tail identification registers that facilitate reading and writing identification values to linked-list data structures associated with particular program threads. The linked-list head identification registers determine which executed instruction result or results are next to be written to a register file. 1. A processor, comprising: a results buffer having a plurality of registers, each register for temporarily storing a result of an executed instruction prior to the result being written to a register file; a results buffer allocater that generates identification values, wherein each identification value identifies a register of the results buffer in which an executed instruction result can be temporarily stored; and a graduation buffer having a plurality of registers, wherein identification values generated by the results buffer allocater are temporarily stored as part of a linked-list data structure. 2. The processor of claim 1, wherein identification values generated by the results buffer allocater are temporarily stored as part of a plurality of linked-list data structures, each linked-list data structure being associated with a particular program thread. 3. The processor of claim 1, further comprising: a ...
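The per-thread linked lists threaded through a shared buffer can be modeled with an array of next pointers plus head/tail registers per thread, as sketched below; the structure and method names are assumptions, not the patent's design.

```python
# Sketch of the linked-list graduation buffer: each thread's in-flight
# result IDs form a linked list threaded through shared storage, with
# per-thread head and tail registers.

class GraduationBuffer:
    def __init__(self, size):
        self.next_id = [None] * size   # shared storage for list links
        self.head = {}                 # per-thread head register
        self.tail = {}                 # per-thread tail register

    def allocate(self, thread, result_id):
        if thread not in self.head:
            self.head[thread] = result_id          # first entry for thread
        else:
            self.next_id[self.tail[thread]] = result_id
        self.tail[thread] = result_id

    def graduate(self, thread):
        """Pop the oldest result ID for a thread (next to write back)."""
        rid = self.head[thread]
        nxt = self.next_id[rid]
        self.next_id[rid] = None
        if nxt is None:
            del self.head[thread], self.tail[thread]
        else:
            self.head[thread] = nxt
        return rid

g = GraduationBuffer(8)
g.allocate("t0", 3); g.allocate("t1", 5); g.allocate("t0", 7)
assert g.graduate("t0") == 3   # t0's results graduate in program order
assert g.graduate("t1") == 5   # independent of t1's list
assert g.graduate("t0") == 7
```

Because the lists share one buffer, the number of lists grows and shrinks with the number of running threads, which is the compactness the title refers to.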

14-01-2016 publication date

Silent memory instructions and miss-rate tracking to optimize switching policy on threads in a processing device

Number: US20160011874A1
Assignee: Intel Corp

A processing device implementing silent memory instructions and miss-rate tracking to optimize switching policy on threads is disclosed. A processing device of the disclosure includes a branch prediction unit (BPU) to predict that an instruction of a first thread in a current execution context of the processing device is a delinquent instruction, indicate that the first thread including the delinquent instruction is in a silent execution mode, indicate that the delinquent instruction is to be executed as a silent instruction, switch an execution context of the processing device to a second thread, and when the execution context returns to the first thread, cause the delinquent instruction to be re-executed as a regular instruction.

11-01-2018 publication date

Computing System and controller thereof

Number: US20180011710A1
Author: Kaiyuan Guo, Song Yao

Computing system and controller thereof are disclosed for ensuring the correct logical relationship between multiple instructions during their parallel execution. The computing system comprises: a plurality of functional modules each performing a respective function in response to an instruction for the given functional module; and a controller for determining whether or not to send an instruction to a corresponding functional module according to dependency relationship between the plurality of instructions.

11-01-2018 publication date

METHOD FOR EXECUTING MULTITHREADED INSTRUCTIONS GROUPED INTO BLOCKS

Number: US20180011738A1
Author: Abdallah Mohammad
Assignee:

A method for executing multithreaded instructions grouped into blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein the instructions of the instruction blocks are interleaved with multiple threads; scheduling the instructions of the instruction block to execute in accordance with the multiple threads; and tracking execution of the multiple threads to enforce fairness in an execution pipeline. 1. A method of executing multithreaded instructions grouped into blocks, the method comprising: receiving an incoming instruction sequence at a front end of an execution pipeline; grouping instructions from the instruction sequence to form instruction blocks, where a first group of instruction blocks are a part of a first thread of execution and a second group of instruction blocks are a part of a second thread of execution; storing the first group of instruction blocks and the second group of instruction blocks in a scheduler array, where a first commit pointer points to a location in the scheduler array for a next block of the first group to be executed and a second commit pointer points to a next block of the second group to be executed; scheduling the instructions of the instruction blocks to execute in accordance with a position in the scheduler array; and tracking execution of the first thread and second thread to enforce a fairness policy using allocation counters to track a number of instruction blocks in the scheduler array for each thread. 2. The method of claim 1, wherein each allocation counter tracks a number of entries in a correlated thread pointer map. 3. The method of claim 1, wherein the fairness policy prevents any thread from exceeding an allocation threshold. 4. The method of claim 1, further comprising: tracking an array entry to place a next block in the scheduler array for each thread using a separate allocate pointer. 5. The method of claim 1, wherein ...
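The allocation-counter fairness policy (claims 1 and 3) can be sketched as a counter per thread that is checked against a threshold before a block is admitted to the scheduler array; a thread at its cap must wait until one of its blocks commits. The threshold value and class names are assumptions for this sketch.

```python
# Sketch of the fairness policy: per-thread allocation counters track
# how many blocks each thread holds in the scheduler array, and a
# thread at the allocation threshold may not allocate another block.

class BlockScheduler:
    def __init__(self, threshold):
        self.threshold = threshold
        self.alloc = {}                # allocation counter per thread

    def try_allocate(self, thread):
        if self.alloc.get(thread, 0) >= self.threshold:
            return False               # fairness: thread is at its cap
        self.alloc[thread] = self.alloc.get(thread, 0) + 1
        return True

    def commit(self, thread):
        self.alloc[thread] -= 1        # block executed, slot reclaimed

s = BlockScheduler(threshold=2)
assert s.try_allocate("t0") and s.try_allocate("t0")
assert not s.try_allocate("t0")   # t0 capped; other threads unaffected
assert s.try_allocate("t1")
s.commit("t0")
assert s.try_allocate("t0")       # slot freed, allocation resumes
```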
