Total found: 10914. Showing 200.
10-01-2005 publication date

SYSTEM AND METHOD FOR PREFETCHING DATA INTO A CACHE BASED ON THE MISS DISTANCE

Number: RU2003119149A
Assignee:

... 1. A prefetch device for prefetching data for an instruction based on the interval between cache misses caused by the instruction. 2. The prefetch device of claim 1, wherein the prefetch device has a memory for storing a prefetch table containing one or more entries that include the interval between cache misses caused by the instruction. 3. The prefetch device of claim 2, wherein the prefetch table contains an entry for an instruction only if the instruction has caused at least two cache misses. 4. The prefetch device of claim 2, wherein the addresses of the data elements to be prefetched are determined based on the interval between cache misses recorded in the prefetch table for the instruction. 5. The prefetch device of claim 2, wherein the prefetch device has a "noise" filter for preventing ...
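The claims above describe a prefetcher keyed on the interval (distance) between cache misses caused by an instruction. A minimal sketch of the idea, where the class name, table layout, and the trivial zero-distance "noise" filter are illustrative assumptions rather than the patent's implementation:

```python
# Illustrative sketch: a per-instruction prefetch table keyed by instruction
# address (pc), recording the distance between successive cache-miss addresses
# and issuing a prefetch at last_miss + distance.
class MissDistancePrefetcher:
    def __init__(self):
        # pc -> {"last_miss": addr, "distance": stride, "misses": count}
        self.table = {}

    def on_cache_miss(self, pc, miss_addr):
        """Record a miss; return a prefetch address once a distance is known."""
        entry = self.table.get(pc)
        if entry is None:
            # First miss: remember it but record no distance yet
            # (mirrors claim 3: an entry is useful only after two misses).
            self.table[pc] = {"last_miss": miss_addr, "distance": None, "misses": 1}
            return None
        entry["distance"] = miss_addr - entry["last_miss"]
        entry["last_miss"] = miss_addr
        entry["misses"] += 1
        if entry["distance"]:  # crude "noise" filter: ignore zero distances
            return miss_addr + entry["distance"]
        return None


p = MissDistancePrefetcher()
p.on_cache_miss(0x40, 100)   # first miss for pc 0x40: no prefetch yet
p.on_cache_miss(0x40, 164)   # distance 64 learned: prefetch 228
```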

09-06-2010 publication date

Preload instruction control

Number: GB0201006758D0
Author:
Assignee:
12-07-1995 publication date

Manipulation of data

Number: GB0009509987D0
Author:
Assignee:
11-01-1989 publication date

METHODS OF OPERATING MICROPROCESSORS

Number: GB0002169115B
Assignee: SONY CORP, SONY CORPORATION
10-03-2010 publication date

System and Method for Reducing Execution Divergence in Parallel Processing Architectures

Number: GB0002463142A
Assignee:

A method for reducing execution divergence among a plurality of threads executable within a parallel processing architecture (eg SIMD) includes an operation of determining, among a plurality of data sets (410) that function as operands for a plurality of different execution commands, a preferred execution type for the collective plurality of data sets. A data set is assigned from a data set pool to a thread (434) which is to be executed by the parallel processing architecture, the assigned data set being of the preferred execution type, whereby the parallel processing architecture is operable to concurrently execute a plurality of threads, the plurality of concurrently executable threads including the thread having the assigned data set. An execution command (436) for which the assigned data set functions as an operand is applied to each of the plurality of threads. The pool may comprise local memory storage (412 & 414). If one of the plurality of threads has terminated the data set may be ...
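The grouping step described above can be sketched as follows. The pool representation and the choice of "most frequent pending command" as the preferred execution type are assumptions for illustration, not the patent's method:

```python
# Hypothetical sketch: pick the "preferred execution type" (the most common
# command among pending data sets) and assign only data sets of that type to
# the threads of one SIMD batch, so all lanes execute the same command.
from collections import Counter

def build_simd_batch(data_set_pool, num_threads):
    """data_set_pool: list of (command, operand) pairs awaiting execution."""
    if not data_set_pool:
        return []
    # Preferred execution type = most frequent command in the pool.
    preferred, _ = Counter(cmd for cmd, _ in data_set_pool).most_common(1)[0]
    batch = [ds for ds in data_set_pool if ds[0] == preferred][:num_threads]
    for ds in batch:
        data_set_pool.remove(ds)  # data set assigned to a thread
    return batch
```

Because every data set in the returned batch shares one command, applying that command across the batch causes no lane divergence.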

10-02-1982 publication date

Improvements in or relating to data processing systems including cache stores

Number: GB0002080989A
Assignee:

A data processing system includes a cache store to provide an interface with a main storage unit for a central processing unit. The central processing unit includes a microprogram control unit in addition to control circuits for establishing the sequencing of the processing unit during the execution of program instructions. Both the microprogram control unit and control circuits include means for generating pre-read commands to the cache store in conjunction with normal processing operations during the processing of certain types of instructions. In response to pre-read commands, the cache store, during predetermined points of the processing of each such instruction, fetches information which is required by such instruction at a later point in the processing thereof.

04-08-2004 publication date

Memory access latency hiding with hint buffer

Number: GB0002397918A
Assignee:

A request hint is issued prior to or while identifying whether requested data and/or one or more instructions are in a first memory. A second memory is accessed to fetch the data and/or instruction(s) in response to the request hint. The data and/or instruction(s) accessed from the second memory are stored in a buffer. If the requested data and/or instruction(s) are not in the first memory, the data and/or instruction(s) are returned from the buffer.
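A rough sketch of this hint-buffer flow, assuming dict-backed memories purely for illustration:

```python
# Hedged sketch of the hint-buffer scheme: a request hint starts the
# second-memory access alongside the first-memory lookup; the fetched data
# lands in a buffer and is consumed only on a first-memory miss.
class HintBufferMemory:
    def __init__(self, first_memory, second_memory):
        self.first = first_memory    # fast memory (e.g. a cache): addr -> data
        self.second = second_memory  # slower backing memory: addr -> data
        self.buffer = {}

    def read(self, addr):
        # Issue the request hint: fetch from the second memory into the buffer
        # before/while the first-memory lookup completes.
        self.buffer[addr] = self.second[addr]
        if addr in self.first:        # first-memory hit: buffered copy unused
            return self.first[addr]
        return self.buffer.pop(addr)  # miss: data is already in the buffer
```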

03-07-2002 publication date

Processor, multiprocessor system and method for data dependence speculative execution

Number: GB0000211979D0
Author:
Assignee:
25-08-2010 publication date

Data processing apparatus and method dependent on streaming preload instruction

Number: GB0002468007A
Assignee:

Data processing apparatus comprising a processor 10 and a cache memory 32 having a plurality of cache lines. A cache controller 34 is also provided, comprising: preload circuitry 35, operable in response to a streaming preload instruction received at the processor, for storing data values from a main memory into cache lines; identification circuitry 36, operable in response to the streaming preload instruction, to identify cache lines for preferential reuse (for example by setting a valid bit associated with the cache line); and cache maintenance circuitry 37 to select cache lines for reuse having regard to any preferred-for-reuse identification generated by the identification circuitry. In this way, a single streaming preload instruction can be used to trigger both a preload of cache lines of data values into the cache memory and also to mark for preferential reuse other cache lines of the cache memory. Data values can be stored in cache lines following the current line address for preload or preceding ...
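The dual effect of the streaming preload instruction (preload some lines, mark others for preferential reuse) might be sketched like this; the marking policy ("lines behind the stream start") is an assumption for illustration, not the patent's rule:

```python
# Sketch (all names assumed): one operation both preloads a range of lines
# into the cache and marks other lines as preferred-for-reuse, so the
# maintenance logic evicts those first.
class StreamingCache:
    def __init__(self):
        self.lines = {}           # line_addr -> data
        self.prefer_reuse = set()

    def streaming_preload(self, main_memory, start, count):
        for addr in range(start, start + count):
            self.lines[addr] = main_memory[addr]   # "preload circuitry"
        # "identification circuitry": mark lines behind the stream for reuse
        for addr in list(self.lines):
            if addr < start:
                self.prefer_reuse.add(addr)

    def evict_one(self):
        # "cache maintenance circuitry": prefer marked lines as victims
        victim = next(iter(self.prefer_reuse), None)
        if victim is None:
            victim = next(iter(self.lines))
        self.prefer_reuse.discard(victim)
        del self.lines[victim]
        return victim
```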

01-11-2006 publication date

Method and apparatus for dynamically adjusting the aggressiveness of an execute-ahead processor

Number: GB0000618749D0
Author:
Assignee:
03-11-1965 publication date

Asynchronous digital computer

Number: GB0001008775A
Author:
Assignee:

... 1,008,775. Digital electronic computers. GENERAL ELECTRIC CO. Ltd. Nov. 13, 1961 [Nov. 21, 1960], No.40597/61. Heading G4A. A binary serial-mode computer in which instructions and data are stored on a magnetic drum 10 is arranged to operate partly with single address and partly with two-address instructions, each word read from the drum being passed to a register such as 32 or 33 via an arithmetic unit 25 so that the simpler instructions may be carried out in one word-time. Where possible instructions (single address) are placed in alternate locations on the drum, and the operand required by a given instruction is placed in the next location. When an operation is to occupy more than one word-time, or the alternate arrangement of instructions and operands is not possible, a two-address instruction (double length) is used, one bit of the first word being reserved for indicating this; the second address then being that of the next instruction. An address consists of seven bits specifying a ...

13-02-1980 publication date

DATA PROCESSING SYSTEMS

Number: GB0001561091A
Author:
Assignee:
24-06-2020 publication date

Gateway pull model

Number: GB0002579412A
Author: Brian Manula
Assignee:

A gateway for interfacing a host with a subsystem acting as a work accelerator to the host. A computer system comprising: (i) a computer subsystem configured to act as a work accelerator, and (ii) a gateway connected to the computer subsystem, the gateway enabling the transfer of data to the computer subsystem from external storage at pre-compiled data exchange synchronisation points attained by the computer subsystem, which act as a barrier between a compute phase and an exchange phase of the computer subsystem, wherein the computer subsystem is configured to pull data from a gateway transfer memory of the gateway in response to the pre-compiled data exchange synchronisation point attained by the subsystem, wherein the gateway comprises at least one processor configured to perform at least one operation to pre-load at least some of the data from a first memory of the gateway to the gateway transfer memory in advance of the pre-compiled data exchange synchronisation point attained by ...

15-05-2010 publication date

SYSTEM WITH BROAD OPERAND ARCHITECTURE AND PROCEDURE

Number: AT0000467171T
Assignee:
17-11-2003 publication date

System and method for linking speculative results of load operations to register values

Number: AU2002367915A8
Assignee:
20-10-2003 publication date

Time-multiplexed speculative multi-threading to support single-threaded applications

Number: AU2003222244A8
Assignee:
21-11-2002 publication date

ISSUANCE AND EXECUTION OF MEMORY INSTRUCTIONS TO AVOID READ-AFTER-WRITE HAZARDS

Number: CA0002447425A1
Assignee:

A method and apparatus for issuing and executing memory instructions so as to maximize the number of requests issued to a highly pipelined memory and avoid reading data from memory (10) before a corresponding write to memory (10). The memory is divided into a number of regions, each of which is associated with a fence counter (18) that is incremented each time a memory instruction that is targeted to the memory region is issued and decremented each time there is a write to the memory region. After a fence instruction is issued, no further memory instructions (23) are issued if the counter for the memory region specified in the fence instruction is above a threshold. When a sufficient number of the outstanding issued instructions are executed, the counter will be decremented below the threshold and further memory instructions are then issued.
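A minimal sketch of the fence-counter mechanism above, assuming one counter per region and a zero threshold; all names are illustrative:

```python
# Illustrative sketch: each memory region has a counter, incremented when a
# memory instruction targeting the region issues and decremented when a write
# to the region completes; after a fence, further issues to the region are
# blocked while its counter exceeds the threshold.
class FenceCounters:
    def __init__(self, num_regions, threshold=0):
        self.counters = [0] * num_regions
        self.threshold = threshold
        self.fenced = set()

    def issue(self, region):
        """Try to issue a memory instruction; False means blocked by a fence."""
        if region in self.fenced and self.counters[region] > self.threshold:
            return False
        self.counters[region] += 1
        return True

    def write_complete(self, region):
        self.counters[region] -= 1
        if region in self.fenced and self.counters[region] <= self.threshold:
            self.fenced.discard(region)  # fence satisfied: issues may resume

    def fence(self, region):
        self.fenced.add(region)
```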

12-09-2003 publication date

METHOD OF PREFETCHING DATA/INSTRUCTIONS RELATED TO EXTERNALLY TRIGGERED EVENTS

Number: CA0002478007A1
Author: DOERING, ANDREAS
Assignee:

Method of prefetching data/instructions related to externally triggered events in a system including an infrastructure (18) having an input interface (20) for receiving data/instructions to be handled by the infrastructure and an output interface (22) for transmitting data after they have been handled, a memory (14) for storing data/instructions when they are received by the input interface, a processor (10) for processing at least some data/instructions, the processor having a cache wherein the data/instructions are stored before being processed, and an external source (26) for assigning sequential tasks to the processor. The method comprises the following steps which are performed while the processor is performing a previous task: determining the location in the memory of data/instructions to be processed by the processor, indicating to the cache the addresses of these memory locations, fetching the contents of the memory locations and writing them into the cache, and assigning the task ...

24-05-2017 publication date

Cross-coupled level shifter with transition tracking circuits

Number: CN0106716346A
Assignee:
04-05-2016 publication date

System and method for implementing shaped memory access operations

Number: CN0103218208B
Author:
Assignee:
08-06-1994 publication date

Method of operating a microcomputer with pipeline structure

Number: CN0001024960C
Assignee: SONY CORP, SONY K.K.
16-11-1979 publication date

DATA TRANSFER CONTROL DEVICE

Number: FR0002423822A
Assignee:

When an operand requested by unit A9 is not found in the cache A13, the first data-line portion transferred from the main memory A16, as well as the following line portions of the operand, are transferred to the cache and at the same time to unit A9. This unit comprises pairs of operand and operand-address registers for receiving variable-length operands with alignment. This device increases the performance of systems equipped with caches.

14-12-1962 publication date

Synchronous digital computer

Number: FR0001312094A
Author:
Assignee:
30-08-2002 publication date

METHOD FOR MANAGING INSTRUCTIONS WITHIN A DECOUPLED PROCESSOR ARCHITECTURE, IN PARTICULAR A DIGITAL SIGNAL PROCESSOR, AND CORRESPONDING PROCESSOR

Number: FR0002821449A1
Author: COFLER ANDREW
Assignee:

The processing unit DU is associated with a first FIFO memory RLDQ and a second FIFO memory DIDQ. Each load instruction LDRx into a register is stored in the first memory RLDQ. At least some of the other instructions are stored in the second memory DIDQ. An operative instruction involving at least one register is extracted from the second memory if no temporally older load instruction intended to modify the value of the register or registers associated with that operative instruction is present in the first memory. In the presence of such an older load instruction, the operative instruction is extracted from the second memory only after the modifying load instruction has been extracted from the first memory. Application to a digital signal processor.

04-07-1986 publication date

METHOD OF OPERATING A MICROCOMPUTER WITH AN IMPROVED INSTRUCTION CYCLE

Number: FR0002575563A
Author: NOBUHISA WATANABE
Assignee:

A microcomputer comprises an instruction decoder 4 and a program counter 5. The instruction decoder decodes fetched instructions and delivers a control signal ordering execution of the fetched instruction. The control signal from the instruction decoder includes an instruction-cycle control component which triggers a fetch cycle at the beginning of each instruction cycle, to fetch the operand of the instruction being executed, and halfway through each instruction cycle, to fetch the op code of the next instruction. The program counter responds to the triggering of each fetch cycle by incrementing its count so as to keep it consistent with the address accessed during each fetch cycle.

31-03-2006 publication date

METHOD AND APPARATUS FOR FACILITATING SPECULATIVE STORES IN A MULTIPROCESSOR SYSTEM

Number: KR0100567099B1
Author:
Assignee:
10-11-2004 publication date

TIME-MULTIPLEXED SPECULATIVE MULTI-THREADING TO SUPPORT SINGLE-THREADED APPLICATIONS

Number: KR20040094888A
Assignee:

One embodiment of the present invention provides a system that facilitates interleaved execution of a head thread and a speculative thread within a single processor pipeline. The system operates by executing program instructions using the head thread, and by speculatively executing program instructions in advance of the head thread using the speculative thread, wherein the head thread and the speculative thread execute concurrently through time-multiplexed interleaving in the single processor pipeline. © KIPO & WIPO 2007 ...

09-05-2008 publication date

ADVANCED LOAD VALUE CHECK ENHANCEMENT

Number: KR1020080041251A
Author: RYCHLIK BOHUSLAV
Assignee:

Systems and methods for performing re-ordered computer instructions are disclosed. A computer processor loads a first value from a first memory address, and records both the first value and the second value in a table or queue. The processor stores a second value to the same memory address, and either evicts the previous table entry, or adds the second value to the previous table entry. Upon subsequently detecting the evicted table entry or inconsistent second value, the processor generates an exception that triggers recovery of speculative use of the first value. © KIPO & WIPO 2008 ...
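The table-based check described above can be sketched as follows; this is a simplified software model with assumed names, not the actual hardware structure:

```python
# Hedged sketch: record a speculatively loaded (address, value) pair; a later
# store to the same address evicts the entry, and the check performed before
# speculative use detects the eviction and triggers recovery.
class AdvancedLoadTable:
    def __init__(self):
        self.entries = {}  # addr -> speculatively loaded value

    def advanced_load(self, addr, memory):
        self.entries[addr] = memory[addr]
        return memory[addr]

    def store(self, addr, value, memory):
        memory[addr] = value
        self.entries.pop(addr, None)  # evict: speculation on addr now invalid

    def check(self, addr):
        """True if the speculative value may be used; False means recover."""
        return addr in self.entries
```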

21-08-2006 publication date

Method, apparatus and computer system for generating prefetches by speculatively executing code during stalls

Number: TWI260540B
Author:
Assignee:

One embodiment of the present invention provides a system that generates prefetches by speculatively executing code during stalls through a technique known as "hardware scout threading." The system starts by executing code within a processor. Upon encountering a stall, the system speculatively executes the code from the point of the stall, without committing results of the speculative execution to the architectural state of the processor. If the system encounters a memory reference during this speculative execution, the system determines if a target address for the memory reference can be resolved. If so, the system issues a prefetch for the memory reference to load a cache line for the memory reference into a cache within the processor. In a variation on this embodiment, the processor supports simultaneous multithreading (SMT), which enables multiple threads to execute concurrently through time-multiplexed interleaving in a single processor pipeline. In this variation, the non-speculative ...

06-12-2007 publication date

GRAPHICS PROCESSOR WITH ARITHMETIC AND ELEMENTARY FUNCTION UNITS

Number: WO2007140338A2
Assignee:

A graphics processor capable of efficiently performing arithmetic operations and computing elementary functions is described. The graphics processor has at least one arithmetic logic unit (ALU) that can perform arithmetic operations and at least one elementary function unit that can compute elementary functions. The ALU(s) and elementary function unit(s) may be arranged such that they can operate in parallel to improve throughput. The graphics processor may also include fewer elementary function units than ALUs, e.g., four ALUs and a single elementary function unit. The four ALUs may perform an arithmetic operation on (1) four components of an attribute for one pixel or (2) one component of an attribute for four pixels. The single elementary function unit may operate on one component of one pixel at a time. The use of a single elementary function unit may reduce cost while still providing good performance.

21-12-2007 publication date

APPARATUS AND METHOD OF PREFETCHING DATA

Number: WO2007145700A1
Author: KELTCHER, Paul, S.
Assignee:

A device (300) and method is illustrated to prefetch information based on a location of an instruction that resulted in a cache miss during its execution. The prefetch information to be accessed is determined based on previous and current cache miss information. For example, information based on previous cache misses is stored at data records as prefetch information. This prefetch information includes location information based on an instruction that caused a previous cache miss, and is accessed to generate prefetch requests for a current cache miss. The prefetch information is updated based on current cache miss information.

16-08-2007 publication date

METHOD AND APPARATUS FOR SIMULTANEOUS SPECULATIVE THREADING

Number: WO000002007092281A2
Assignee:

One embodiment of the present invention provides a system which performs simultaneous speculative threading. The system starts by executing instructions in normal execution mode using a first thread. Upon encountering a data-dependent stall condition, the first thread generates an architectural checkpoint and commences execution of instructions in execute-ahead mode. During execute-ahead mode, the first thread executes instructions that can be executed and defers instructions that cannot be executed into a deferred queue. When the data dependent stall condition has been resolved, the first thread generates a speculative checkpoint and continues execution in execute-ahead mode. At the same time, the second thread commences execution in a deferred mode, wherein the second thread executes instructions deferred by the first thread.
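The head-thread/deferred-queue split above can be modelled in a few lines; the data structures and scheduling choices here are assumptions for illustration, not the patented mechanism:

```python
# Illustrative sketch: a head thread defers instructions whose operands are
# unavailable into a queue; once the stall resolves, a second thread drains
# the deferred queue while the head thread continues in execute-ahead mode.
from collections import deque

def execute_ahead(instructions, ready):
    """instructions: list of (op, source_operands); ready: available operands.
    Returns (executed ops, deferred queue)."""
    done, deferred = [], deque()
    for op, srcs in instructions:
        if all(s in ready for s in srcs):
            done.append(op)
        else:
            deferred.append((op, srcs))  # cannot execute yet: defer
    return done, deferred

def deferred_mode(deferred, ready):
    """Second thread: execute deferred instructions once their data arrives."""
    return [op for op, srcs in deferred if all(s in ready for s in srcs)]
```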

02-08-2007 publication date

PROCESSOR HAVING A DATA MOVER ENGINE THAT ASSOCIATES REGISTER ADDRESSES WITH MEMORY ADDRESSES

Number: WO000002007087270A2
Assignee:

A RISC processor having a data moving engine and instructions that associate register addresses with memory addresses. In an embodiment, the instructions include a read-tie instruction, a single write-tie instruction, a dual write-tie instruction, and an untie instruction. The read-tie, single write-tie, and dual write-tie instructions are used to associate software accessible register addresses with memory addresses. These associations effect the operation of the data moving engine such that, for the duration of the associations, the data moving engine routes data to and from associated memory addresses and the execution unit of the processor in response to instructions that specify moving data to and from the associated register addresses. The invention reduces the number of instructions and hardware overhead associated with implementing program loops in a RISC processor.

18-11-2004 publication date

SYSTEM AND METHOD TO PREVENT IN-FLIGHT INSTANCES OF OPERATIONS FROM DISRUPTING OPERATION REPLAY WITHIN A DATA-SPECULATIVE MICROPROCESSOR

Number: WO2004099977A2
Assignee:

A microprocessor (100) may include one or more functional units (126) configured to execute operations, a scheduler (118) configured to issue operations to the functional units (126) for execution, and at least one replay detection unit. The scheduler (118) may be configured to maintain state information (606) for each operation. Such state information (606) may, among other things, indicate whether an associated operation has completed execution. The replay detection unit may be configured to detect that one of the operations in the scheduler (118) should be replayed. If an instance of that operation is currently being executed by one of the functional units (126) when the operation is detected as needing to be replayed, the replay detection unit is configured to inhibit an update to the state information (606) for that operation in response to execution of the in-flight instance of the operation. Various embodiments of computer systems (900) may include such a microprocessor (100).

09-08-2007 publication date

CROSS-ARCHITECTURE OPTIMIZATION

Number: WO000002007089535A3
Assignee:

Embodiments include a device, apparatus, and a method. An apparatus includes a monitor circuit for determining an execution characteristic of a first instruction associated with a first computing machine architecture. The apparatus also includes a generator circuit for creating an optimization profile useable in an execution of a second instruction associated with a second computing machine architecture.

13-05-2004 publication date

ADAPTABLE DATAPATH FOR A DIGITAL PROCESSING SYSTEM

Number: WO2004040414A3
Author: RAMCHANDRAN, Amit
Assignee:

The present invention includes an adaptable high-performance node (RXN) with several features that enable it to provide high performance along with adaptability. A preferred embodiment of the RXN includes a run-time configurable data path and control path. The RXN supports multi-precision arithmetic including 8, 16, 24, and 32 bit codes. Data flow can be reconfigured to minimize register accesses for different operations. For example, multiply-accumulate operations can be performed with minimal, or no, register stores by reconfiguration of the data path. Predetermined kernels can be configured during a setup phase so that the RXN can efficiently execute, e.g., discrete cosine transform (DCT), fast-Fourier transform (FFT) and other operations. Other features are provided.

27-03-2008 publication date

INTELLIGENT PRE-FETCHING USING COMPOUND OPERATIONS

Number: WO2008036499A1
Assignee:

A system and method for pre-fetching data uses a combination of heuristics to determine likely next data retrieval operations and an evaluation of available resources for executing speculative data operations. When local resources, such as cache memory for storing speculative command results is not available, the compound operation request may not be sent. When resources on a server-side system are insufficient, only the primary command of a compound operation request may be processed and speculative command requests may be rejected. Both local computing resources and network resources may be evaluated when determining whether to build or process a compound operations request.

11-07-2002 publication date

SYSTEM AND METHOD FOR PREFETCHING DATA INTO A CACHE BASED ON MISS DISTANCE

Number: WO2002054230A2
Assignee:

A prefetcher to prefetch data for an instruction based on the distance between cache misses caused by the instruction. In an embodiment, the prefetcher includes a memory to store a prefetch table that contains one or more entries that include the distance between cache misses caused by an instruction. In a further embodiment, the addresses of data elements prefetched are determined based on the distance between cache misses recorded in the prefetch table for the instruction.

15-05-2003 publication date

SYSTEM AND METHOD TO REDUCE EXECUTION OF INSTRUCTIONS INVOLVING UNRELIABLE DATA IN A SPECULATIVE PROCESSOR

Number: WO2003040916A1
Assignee:

System and method to reduce execution of instructions involving unreliable data in a speculative processor. A method comprises identifying scratch values generated during speculative execution of a processor, and setting at least one tag associated with at least one data area of the processor to indicate that the data area holds a scratch value. Such data areas include registers, predicates, flags, and the like. Instructions may also be similarly tagged. The method may be executed by an execution engine in a computer processor.

11-07-2002 publication date

PROCESSOR ARCHITECTURE FOR SPECULATED VALUES

Number: WO2002054229A1
Assignee:

The present invention pertains to a super-scalar processor (1.1) and is intended to make execution of instructions in processor (1.1) more efficient. Processor (1.1) contains a state machine (21) that speculates values of variables. State machine (21) also determines, for each of the speculated values, if there is a first instruction that is dependent upon the speculated value. Processor (1.1) also determines if the speculation of a value has failed and restarts execution from a specified instruction in response to the detection of an incorrectly speculated value. If this is the case, processor (1.1) restarts from the specified instruction that is first affected by the speculated value for which speculation has failed.

16-10-2007 publication date

Method and system for safe data dependency collapsing based on control-flow speculation

Number: US0007284116B2
Assignee: Intel Corporation

The present invention is directed to an apparatus and method for data collapsing based on control-flow speculation (conditional branch predictions). Because conditional branch outcomes are resolved based on actual data values, the conditional branch prediction provides potentially valuable insight into data values. Upon encountering a branch if equal instruction and this instruction is predicted as taken or a branch if not equal instruction and this instruction is predicted as not taken, this invention assumes that the two operands used to determine the conditional branch are equal. The data predictions are safe because a data misprediction means a conditional branch misprediction which results in a pipeline flush of the instructions following the conditional branch instruction including the data mispredictions.

20-04-2006 publication date

Processor, data processing system and method for synchronizing access to data in shared memory

Number: US20060085605A1

A processing unit for a multiprocessor data processing system includes a processor core including a store-through upper level cache, an instruction sequencing unit that fetches instructions for execution, a data register, and at least one instruction execution unit. The instruction execution unit, responsive to receipt of a load-reserve instruction from the instruction sequencing unit, executes the load-reserve instruction to determine a load target address. The processor core, responsive to the execution of the load-reserve instruction, performs a corresponding load-reserve operation by accessing the store-through upper level cache utilizing the load target address to cause data associated with the load target address to be loaded from the store-through upper level cache into the data register and by establishing a reservation for a reservation granule including the load target address.

29-11-2007 publication date

Graphics processor with arithmetic and elementary function units

Number: US20070273698A1
Assignee:

A graphics processor capable of efficiently performing arithmetic operations and computing elementary functions is described. The graphics processor has at least one arithmetic logic unit (ALU) that can perform arithmetic operations and at least one elementary function unit that can compute elementary functions. The ALU(s) and elementary function unit(s) may be arranged such that they can operate in parallel to improve throughput. The graphics processor may also include fewer elementary function units than ALUs, e.g., four ALUs and a single elementary function unit. The four ALUs may perform an arithmetic operation on (1) four components of an attribute for one pixel or (2) one component of an attribute for four pixels. The single elementary function unit may operate on one component of one pixel at a time. The use of a single elementary function unit may reduce cost while still providing good performance.

29-08-2000 publication date

Processor configured to generate lookahead results from operand collapse unit and for inhibiting receipt/execution of the first instruction based on the lookahead result

Number: US0006112293A1
Author: Witt; David B.
Assignee: Advanced Micro Devices, Inc.

A processor includes a lookahead address/result calculation unit which is configured to receive operand information (either the operand or a tag identifying the instruction which will produce the operand value) corresponding to the source operands of one or more instructions. If the operands are available, lookahead address/result calculation unit may generate either a lookahead address for a memory operand of the instruction or a lookahead result corresponding to a functional instruction operation of the instruction. The lookahead address may be provided to a load/store unit for early initiation of a memory operation corresponding to the instruction. The lookahead result may be provided to a speculative operand source (e.g. a future file) for updating therein. A lookahead state for a register may thereby be provided early in the pipeline. Subsequent instructions may receive the lookahead state and use the lookahead state to generate additional lookahead state early. On the other hand, ...

07-07-1998 publication date

Prefetch instruction for improving performance in reduced instruction set processor

Number: US0005778423A1
Assignee: Digital Equipment Corporation

A high-performance CPU of the RISC (reduced instruction set) type employs a standardized, fixed instruction size, and permits only simplified memory access data width and addressing modes. The instruction set is limited to register-to-register operations and register load/store operations. Byte manipulation instructions, included to permit use of previously-established data structures, include the facility for doing in-register byte extract, insert and masking, along with non-aligned load and store instructions. The provision of load/locked and store/conditional instructions permits the implementation of atomic byte writes. By providing a conditional move instruction, many short branches can be eliminated altogether. A conditional move instruction tests a register and moves a second register to a third if the condition is met; this function can be substituted for short branches and thus maintain the sequentiality of the instruction stream. Performance can be speeded up by predicting the ...

Подробнее
12-07-2005 дата публикации

64-bit single cycle fetch scheme for megastar architecture

Номер: US0006918018B2

The 64-bit single cycle fetch method described here relates to a specific 'megastar' core processor employed in a range of new digital signal processor devices. The 'megastar' core incorporates 32-bit memory blocks arranged into separate entities or banks. Because the parent CPU has only three 16-bit buses, a maximum read in one clock cycle through the memory interface would normally be 48 bits. This invention describes a fetch method that taps into the memory bank data at an earlier stage, prior to the memory interface. This allows the normal 48-bit fetch to be extended to 64 bits, as required for full performance of the numerical processor accelerator and other speed-critical operations and functions.

Подробнее
18-01-2000 дата публикации

Apparatus and method for tracking changes in address size and for different size retranslate second instruction with an indicator from address size

Номер: US0006016544A
Автор:
Принадлежит:

An apparatus and method for improving the execution speed of stack segment load operations is provided. Rather than delaying translation of instructions following stack segment loads until the load is complete, the present invention presumes that no change will be made to the stack address size. Tracking of the stack address size at the time of translation is performed by a plurality of SAS bits associated with translated micro instructions, and logic is provided which compares the tracked SAS bits with any change in the stack address size. If no change is made by the stack load operation, the already translated instructions execute immediately. If a change is made by the stack load operation, logic interrupts processing of the translated instructions, and the instructions are retranslated using the new stack address size.

Подробнее
05-06-2001 дата публикации

Method and system for asynchronous array loading

Номер: US0006243822B1

The present invention decreases the delay associated with loading an array from memory by employing an asynchronous array preload unit. The asynchronous array preload unit provides continuous preliminary loading of data arrays located in a memory subsystem into a prefetch buffer. Array loading is performed asynchronously with respect to execution of the main program.

Подробнее
04-04-1995 дата публикации

CPU having pipelined instruction unit and effective address calculation unit with retained virtual address capability

Номер: US0005404467A
Автор:
Принадлежит:

A prefetch unit includes a Branch history table for providing an indication of an occurrence of a Branch instruction having a Target Address that was previously taken. A plurality of Branch mark bits are stored in an instruction queue, on a half word basis, in conjunction with a double word of instruction data that is prefetched from an instruction cache. The Branch Target Address is employed to redirect instruction prefetching. The Branch Target Address is also pipelined and follows the associated Branch instruction through an instruction pipeline. The prefetch unit includes circuitry for automatically self-filling the instruction pipeline. During a Fetch stage a previously generated Virtual Effective Address is applied to a translation buffer to generate a physical address which is used to access a data cache. The translation buffer includes a first and a second translation buffer, with the first translation buffer being a reduced subset of the second. The first translation buffer is ...

Подробнее
26-06-2001 дата публикации

Prefetch instruction mechanism for processor

Номер: US0006253306B1

Accordingly, a prefetch instruction mechanism is desired for implementing a prefetch instruction which is non-faulting, non-blocking, and non-modifying of architectural register state. Advantageously, the prefetch mechanism described herein is provided largely without the addition of substantial complexity to a load execution unit. In one embodiment, the non-faulting attribute of the prefetch mechanism is provided through use of a vector decode supplied Op sequence that activates an alternate exception handler. The non-modifying of architectural register state attribute is provided (in an exemplary embodiment) by first decoding a PREFETCH instruction to an Op sequence targeting a scratch register, wherein the scratch register has scope limited to the Op sequence corresponding to the PREFETCH instruction. Although described in the context of a vector decode embodiment, the prefetch mechanism can be implemented with hardware decoders, and suitable modifications to decode paths will be appreciated ...

Подробнее
04-06-2002 дата публикации

Apparatus for software initiated prefetch and method therefor

Номер: US0006401192B1

A mechanism and method for software hint initiated prefetch is provided. The prefetch may be directed to a prefetch of data for loading into a data cache, instructions for entry into an instruction cache or for either, in an embodiment having a combined cache. In response to a software instruction in an instruction stream, a plurality of prefetch specification data values are loaded into a register having a plurality of entries corresponding thereto. Prefetch specification data values include the address of the first cache line to be prefetched, and the stride, or the incremental offset, of the address of subsequent lines to be prefetched. Prefetch requests are generated by a prefetch control state machine using the prefetch specification data values stored in the register. Prefetch requests are issued to a hierarchy of cache memory devices. If a cache hit occurs having the specified cache coherency, the prefetch is vitiated. Otherwise, the request is passed to system memory for retrieval ...
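The prefetch specification described above — a first cache-line address plus a stride for subsequent lines — can be sketched as a simple request generator that skips (vitiates) lines already present in the cache and passes the rest toward system memory. This is an illustrative model under assumed names (`issue_prefetches`, a set standing in for the cache hierarchy), not the patent's state machine.

```python
def issue_prefetches(spec, cache_lines):
    """Generate prefetch requests from a (first_line, stride, count) spec.

    spec        -- tuple of first cache-line address, address stride, request count
    cache_lines -- set of line addresses already held with the required coherency
    """
    first, stride, count = spec
    issued = []
    for i in range(count):
        addr = first + i * stride
        if addr in cache_lines:   # cache hit with specified coherency: vitiate
            continue
        issued.append(addr)       # otherwise pass the request to system memory
    return issued
```

With a 64-byte line stride, `issue_prefetches((0x1000, 0x40, 4), {0x1080})` requests three lines and drops the one that already hits in the cache.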

Подробнее
24-05-2011 дата публикации

Processor architecture with wide operand cache

Номер: US0007948496B2

A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.

Подробнее
26-04-2011 дата публикации

Processor for executing switch and translate instructions requiring wide operands

Номер: US0007932911B2

A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.

Подробнее
28-06-2016 дата публикации

Modification of prefetch depth based on high latency event

Номер: US0009378144B2

A prefetch stream is established in a prefetch unit of a memory controller for a system memory at a lowest level of a volatile memory hierarchy of the data processing system based on a memory access request received from a processor core. The memory controller receives an indication of an upcoming high latency event affecting access to the system memory. In response to the indication, the memory controller temporarily increases a prefetch depth of the prefetch stream with respect to the system memory and issues, to the system memory, a plurality of prefetch requests in accordance with the temporarily increased prefetch depth in advance of the upcoming high latency event.
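The depth-modulation scheme above can be sketched as a stream prefetcher whose request count per issue temporarily grows when the memory controller is warned of a high latency event. All names (`StreamPrefetcher`, the `boost` parameter, the 64-byte line size) are assumptions for the example, not taken from the patent.

```python
class StreamPrefetcher:
    """Prefetch stream whose depth is temporarily increased ahead of a
    high latency event affecting system memory access."""
    def __init__(self, base_depth):
        self.base_depth = base_depth
        self.depth = base_depth
        self.next_addr = 0

    def notify_high_latency_event(self, boost):
        self.depth = self.base_depth + boost     # deepen the stream in advance

    def event_complete(self):
        self.depth = self.base_depth             # restore the normal depth

    def issue(self, line_size=64):
        """Issue one batch of prefetch requests at the current depth."""
        reqs = [self.next_addr + i * line_size for i in range(self.depth)]
        self.next_addr += self.depth * line_size
        return reqs
```

The deeper batch issued before the event buffers enough stream data to ride out the period when system memory is unavailable.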

Подробнее
01-03-2007 дата публикации

Cryptography methods and apparatus

Номер: US2007050641A1
Принадлежит:

In a first aspect, a first cryptography method is provided. The first method includes the steps of (1) in response to receiving a request to perform a first operation on data in a first memory cacheline, accessing data associated with the first memory cacheline; (2) performing cryptography on data of the first memory cacheline when necessary; and (3) speculatively accessing data associated with a second memory cacheline based on the first memory cacheline before receiving a request to perform an operation on data in the second memory cacheline. Numerous other aspects are provided.

Подробнее
18-11-2004 дата публикации

System and method to handle operand re-fetch after invalidation with out-of-order fetch

Номер: US2004230776A1
Автор:
Принадлежит:

A system and method to re-fetch data lost, for instructions with operands greater than eight bytes in length, due to line invalidation in a multiprocessor computer system using microprocessors that perform out-of-order operand fetch, in cases where it is not possible or desirable to kill execution of the instruction when the storage access rules require that the operand data appear to be accessed in program execution order.

Подробнее
06-12-2001 дата публикации

High-speed data processing using internal processor memory space

Номер: US2001049744A1
Автор:
Принадлежит:

Significant performance improvements can be realized in data processing systems by confining the operation of a processor within its internal register file so as to reduce the instruction count executed by the processor. Data, which is sufficiently small enough to fit within the internal register file, can be transferred into the internal register file, and execution results can be removed therefrom, using direct memory accesses that are independent of the processor, thus enabling the processor to avoid execution of load and store instructions to manipulate externally stored data. Further, the data and execution results of the processing activity are also accessed and manipulated by the processor entirely within the internal register file. The reduction in instruction count, coupled with the standardization of multiple processors and their instruction sets, enables the realization of a highly scaleable, high-performing symmetrical multi-processing system at manageable complexity and cost ...

Подробнее
31-12-2019 дата публикации

Determining the effectiveness of prefetch instructions

Номер: US0010521350B2

Effectiveness of prefetch instructions is determined. A prefetch instruction is executed to request that data be fetched into a cache of the computing environment. The effectiveness of the prefetch instruction is determined. This includes updating, based on executing the prefetch instruction, a cache directory of the cache. The updating includes recording, in the cache directory, effectiveness data relating to the data, including whether the data was installed in the cache based on the prefetch instruction. Additionally, determining the effectiveness includes obtaining at least a portion of the effectiveness data from the cache directory, and using that portion of the effectiveness data to determine the effectiveness of the prefetch instruction.
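One way to picture the effectiveness data in the cache directory: each entry carries a flag saying whether the line was installed by a prefetch, and whether it was subsequently used by demand accesses. A prefetch is then judged effective if its installed lines were actually consumed. The entry layout and the ratio metric below are assumptions for illustration; the patent does not prescribe them.

```python
class CacheDirectoryEntry:
    """Directory entry augmented with prefetch-effectiveness data."""
    def __init__(self, tag, by_prefetch):
        self.tag = tag
        self.installed_by_prefetch = by_prefetch  # set when a prefetch installs the line
        self.used_after_install = False           # set on a later demand hit

def prefetch_effectiveness(directory):
    """Fraction of prefetch-installed lines that were later used."""
    installed = [e for e in directory if e.installed_by_prefetch]
    if not installed:
        return 0.0
    return sum(e.used_after_install for e in installed) / len(installed)
```

Software (or hardware heuristics) could read such a ratio to throttle prefetch instructions that rarely install useful data.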

Подробнее
12-12-2019 дата публикации

METHOD FOR PERFORMING RANDOM READ ACCESS TO A BLOCK OF DATA USING PARALLEL LUT READ INSTRUCTION IN VECTOR PROCESSORS

Номер: US20190377578A1
Принадлежит:

This disclosure is directed to the problem of parallelizing random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part of a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
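The mechanism reduces to two steps: replicate the data block into one table per SIMD lane, then let a single LUT-read instruction gather, for each lane, the element selected by that lane's index. The sketch below models this in scalar Python; `setup_tables` and `parallel_lut_read` are hypothetical names standing in for the table setup and the vector LUT-read instruction.

```python
def setup_tables(block, lanes):
    """Replicate the data block into one look up table per SIMD lane."""
    return [list(block) for _ in range(lanes)]

def parallel_lut_read(tables, indices):
    """One LUT-read instruction: lane i of the destination vector receives
    tables[i][indices[i]] -- every lane reads its own copy of the table,
    so all random accesses complete simultaneously."""
    assert len(tables) == len(indices)
    return [table[idx] for table, idx in zip(tables, indices)]
```

Replicating the block costs directly addressable memory, but it converts a serial chain of random loads into one vector operation per batch of indices.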

Подробнее
04-07-2002 дата публикации

Multi-mode non-binary predictor

Номер: US2002087850A1
Автор:
Принадлежит:

A multi-mode predictor for a processor having a plurality of prediction modes is disclosed. The prediction modes are used to predict non-binary values. The predictor is a multi-mode predictor comprising a per-IP ("PIP") table and a next value table. The PIP table includes a plurality of PIP information fields and the next value table includes a plurality of fields. The multi-mode predictor also includes a plurality of prediction modes. The processor includes a set of instructions that index the PIP table to provide a valid signal. The processor also includes a set of predicted values for the set of instructions. The set of predicted values is stored in the PIP table and the next value table. According to the valid signal and a hit/miss condition in the next value table, a predicted value is selected from the PIP table or the next value table.
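Two classic non-binary prediction modes are last-value and stride prediction, and a minimal two-table predictor in that spirit can be sketched as follows. This is an assumed simplification — the patent's PIP table holds richer per-instruction information and more modes — with invented names (`ValuePredictor`, `pip`, `stride`).

```python
class ValuePredictor:
    """Per-IP table (last value) plus a next-value table (stride per IP).
    Predicts last + stride when a stride is known, else the last value."""
    def __init__(self):
        self.pip = {}      # ip -> last observed value (valid signal = presence)
        self.stride = {}   # ip -> last observed delta

    def predict(self, ip):
        if ip not in self.pip:
            return None                        # no valid signal: cannot predict
        return self.pip[ip] + self.stride.get(ip, 0)

    def update(self, ip, actual):
        """Train on the resolved value after the instruction executes."""
        if ip in self.pip:
            self.stride[ip] = actual - self.pip[ip]
        self.pip[ip] = actual
```

For a load walking an array, the stride mode locks on after two observations and predicts every subsequent address correctly.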

Подробнее
20-06-2019 дата публикации

TWO ADDRESS TRANSLATIONS FROM A SINGLE TABLE LOOK-ASIDE BUFFER READ

Номер: US20190188151A1
Принадлежит:

A streaming engine employed in a digital data processor specifies a fixed read only data stream. An address generator produces virtual addresses of data elements. An address translation unit converts these virtual addresses to physical addresses by comparing the most significant bits of a next address N with the virtual address bits of each entry in an address translation table. Upon a match, the translated address is the physical address bits of the matching entry and the least significant bits of address N. The address translation unit can generate two translated addresses. If the most significant bits of address N+1 match those of address N, the same physical address bits are used for translation of address N+1. The sequential nature of the data stream increases the probability that consecutive addresses match the same address translation entry and can use this technique.
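The pairing trick above — translate address N, then reuse the same entry for address N+1 whenever its virtual page bits match — can be shown with a dictionary standing in for the address translation table. The 12-bit page size and the function name `translate_pair` are assumptions for the sketch.

```python
PAGE_BITS = 12
OFFSET_MASK = (1 << PAGE_BITS) - 1

def translate_pair(tlb, vaddr_n, vaddr_n1):
    """Translate address N with one table read; reuse the matching entry
    for address N+1 when its upper (virtual page number) bits are the same."""
    vpn = vaddr_n >> PAGE_BITS
    ppn = tlb[vpn]                          # the single look-aside buffer read
    phys_n = (ppn << PAGE_BITS) | (vaddr_n & OFFSET_MASK)
    if (vaddr_n1 >> PAGE_BITS) == vpn:      # sequential stream stays on the page
        ppn1 = ppn                          # no second table read needed
    else:
        ppn1 = tlb[vaddr_n1 >> PAGE_BITS]   # page crossed: read again
    phys_n1 = (ppn1 << PAGE_BITS) | (vaddr_n1 & OFFSET_MASK)
    return phys_n, phys_n1
```

Because a fixed data stream walks addresses sequentially, the same-page case dominates, so most cycles get two translations from one table read.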

Подробнее
27-04-2021 дата публикации

Mechanism for interrupting and resuming execution on an unprotected pipeline processor

Номер: US0010990398B2

Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, beginning execution of the first instruction, receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher priority task than the first instruction, storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline, copying the register state from the capture queue to a memory, determining that the one or more second instructions have been executed, copying the register state from the memory to the one or more registers of the capture queue, and restoring the register state to the instruction execution pipeline from the capture queue.

Подробнее
16-12-2021 дата публикации

STREAMING ENGINE WITH ERROR DETECTION, CORRECTION AND RESTART

Номер: US20210390018A1
Принадлежит:

Disclosed embodiments relate to a streaming engine employed in, for example, a digital signal processor. A fixed data stream sequence including plural nested loops is specified by a control register. The streaming engine includes an address generator producing addresses of data elements and a stream head register storing the data elements next to be supplied as operands. The streaming engine fetches stream data ahead of use by the central processing unit core into a stream buffer. Parity bits are formed upon storage of data in the stream buffer and are stored with the corresponding data. Upon transfer to the stream head register a second parity is calculated and compared with the stored parity. The streaming engine signals a parity fault if the parities do not match. The streaming engine preferably restarts fetching the data stream at the data element that generated the parity fault.
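The parity check on the stream-buffer-to-stream-head transfer amounts to: compute parity at store time, recompute at read time, and flag a fault on mismatch. A word-granularity even-parity sketch (names `parity_bit` and `check_and_transfer` are invented; the patent computes parity at its own granularity):

```python
def parity_bit(word):
    """Even parity over the bits of a non-negative data word."""
    p = 0
    while word:
        p ^= word & 1
        word >>= 1
    return p

def check_and_transfer(stored_word, stored_parity):
    """On transfer to the stream head register, recompute parity and compare.
    A mismatch signals a fault so the engine can restart the fetch at this
    data element instead of consuming corrupted stream data."""
    if parity_bit(stored_word) != stored_parity:
        return ('parity_fault', None)
    return ('ok', stored_word)
```
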

Подробнее
14-11-2002 дата публикации

Scalable processor

Номер: US2002169947A1
Автор:
Принадлежит:

A method and apparatus for issuing and executing memory instructions from a computer system so as to (1) maximize the number of requests issued to a highly pipelined memory, the only limitation being data dependencies in the program, and (2) avoid reading data from memory before a corresponding write to memory. The memory instructions are organized to read and write into memory by using explicit move instructions, thereby avoiding any data storage limitations in the processor. The memory requests are organized to carry complete information, so that they can be processed independently when memory returns the requested data. The memory is divided into a number of regions, each of which is associated with a fence counter. The fence counter for a memory region is incremented each time a memory instruction that is targeted to the memory region is issued and decremented each time there is a write to the memory region. After a fence instruction is issued, no further memory instructions are issued ...
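The per-region fence counters described above can be modeled directly: increment on issue of a memory instruction targeting the region, decrement when the corresponding write lands, and let a fence drain only when the counter is back to zero. The class and method names below are invented for the sketch.

```python
class FencedMemory:
    """Per-region fence counters for ordering reads behind outstanding writes."""
    def __init__(self, num_regions):
        self.counters = [0] * num_regions

    def issue(self, region):
        """A memory instruction targeting this region is issued."""
        self.counters[region] += 1

    def write_complete(self, region):
        """A write to this region has reached memory."""
        self.counters[region] -= 1

    def fence_clear(self, region):
        """A fence on this region may pass only when no issued instruction
        is still outstanding against it."""
        return self.counters[region] == 0
```

A fence thus blocks subsequent memory instructions only for regions with work in flight, leaving traffic to other regions unconstrained.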

Подробнее
17-03-2020 дата публикации

Streaming engine with cache-like stream data storage and lifetime tracking

Номер: US0010592243B2

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores the data elements next to be supplied to functional units for use as operands. The streaming engine fetches stream data ahead of use by the central processing unit core into a stream buffer constructed like a cache. The stream buffer cache includes plural cache lines, each including tag bits, at least one valid bit and data bits. Cache lines are allocated to store newly fetched stream data. Cache lines are deallocated upon consumption of the data by a central processing unit core functional unit. Instructions preferably include operand fields with a first subset of codings corresponding to registers, a stream read only operand coding and a stream read and advance operand coding.

Подробнее
17-08-2021 дата публикации

Systems and methods to load a tile register pair

Номер: US0011093247B2
Принадлежит: Intel Corporation, INTEL CORP

Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.

Подробнее
22-12-2022 дата публикации

INSTRUCTION TO QUERY FOR MODEL-DEPENDENT INFORMATION

Номер: US20220405100A1
Принадлежит:

An instruction is executed to perform a query function. The executing includes obtaining information relating to a selected model of a processor. The information includes at least one model-dependent data attribute of the selected model of the processor. The information is placed in a selected location for use by at least one application in performing one or more functions.

Подробнее
05-04-2022 дата публикации

Method and apparatus for vector permutation

Номер: US0011294826B2
Принадлежит: Texas Instruments Incorporated

A method is provided that includes performing, by a processor in response to a vector permutation instruction, permutation of values stored in lanes of a vector to generate a permuted vector, wherein the permutation is responsive to a control storage location storing permute control input for each lane of the permuted vector, wherein the permute control input corresponding to each lane of the permuted vector indicates a value to be stored in the lane of the permuted vector, wherein the permute control input for at least one lane of the permuted vector indicates a value of a selected lane of the vector is to be stored in the at least one lane, and storing the permuted vector in a storage location indicated by an operand of the vector permutation instruction.
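Stripped of encoding details, the permutation the instruction performs is: lane i of the result takes the value of the source lane selected by the permute control input for lane i. A one-line model (the name `vperm` is an invented stand-in for the vector permutation instruction):

```python
def vperm(vec, control):
    """Vector permutation: result lane i holds vec[control[i]].
    Control entries may repeat (broadcast) or reorder lanes arbitrarily."""
    return [vec[sel] for sel in control]
```

Note that the same source lane may feed several result lanes, which is what makes the instruction useful for broadcasts and shuffles as well as pure permutations.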

Подробнее
17-05-2006 дата публикации

Probing computer memory latency

Номер: EP0000933698B1
Принадлежит: Hewlett-Packard Company

Подробнее
07-06-2006 дата публикации

Microprocessor with variable latency stack cache

Номер: EP0001555617A3
Автор: Hooker, Rodney E.
Принадлежит:

A variable latency cache memory is disclosed. The cache memory includes a plurality of storage elements for storing stack memory data in a first-in-first-out manner. The cache memory distinguishes between pop and load instruction requests and provides pop data faster than load data by speculating that pop data will be in the top cache line of the cache. The cache memory also speculates that stack data requested by load instructions will be in the top one or more cache lines of the cache memory. Consequently, if the source virtual address of a load instruction hits in the top of the cache memory, the data is speculatively provided faster than the case where the data is in a lower cache line or where a full physical address compare is required or where the data must be provided from a non-stack cache memory in the microprocessor, but slower than pop data.

Подробнее
18-08-2004 дата публикации

Information processing apparatus and method, and scheduling device

Номер: EP0000797143B1

Подробнее
14-12-1994 дата публикации

CPU HAVING PIPELINED INSTRUCTION UNIT AND EFFECTIVE ADDRESS CALCULATION UNIT WITH RETAINED VIRTUAL ADDRESS CAPABILITY

Номер: EP0000628184A1
Принадлежит:

A prefetch unit includes a Branch history table for providing an indication of an occurrence of a Branch instruction having a Target Address that was previously taken. A plurality of Branch mark bits are stored in an instruction queue, on a half word basis, in conjunction with a double word of instruction data that is prefetched from an instruction cache. The Branch Target Address is employed to redirect instruction prefetching. The Branch Target Address is also pipelined and follows the associated Branch instruction through an instruction pipeline. The prefetch unit includes circuitry for automatically self-filling the instruction pipeline. During a Fetch stage a previously generated Virtual Effective Address is applied to a translation buffer to generate a physical address which is used to access a data cache. The translation buffer includes a first and a second translation buffer, with the first translation buffer being a reduced subset of the second. The first translation buffer is ...

Подробнее
14-11-2001 дата публикации

PROCESSOR CONFIGURED TO PREDECODE RELATIVE CONTROL TRANSFER INSTRUCTIONS

Номер: EP0001031075B1
Автор: WITT, David, B.
Принадлежит: ADVANCED MICRO DEVICES INC.

Подробнее
30-01-1980 дата публикации

DATA PROCESSING SYSTEM

Номер: JP0055013498A
Принадлежит:

Подробнее
10-05-1979 дата публикации

DATA PROCESSING DEVICE WITH A MICROINSTRUCTION MEMORY

Номер: DE0002847934A1
Принадлежит:

Подробнее
16-10-2008 дата публикации

Method and apparatus for speculative prefetch in a multiprocessor/multicore message-passing engine

Номер: DE102008016178A1
Принадлежит:

In some embodiments, the invention includes a novel combination of methods for prefetching data and for passing messages between and among cores in a multiprocessor/multicore platform. In one embodiment, a receiving core has a message queue and a message prefetcher. Incoming messages are written into the message queue and the message prefetcher simultaneously. The prefetcher speculatively fetches the data referenced in the received message, so that the data is available when the message is executed in the execution pipeline, or shortly thereafter. Other embodiments are described and claimed.

Подробнее
23-12-2020 дата публикации

Gateway pull model

Номер: GB0002579412B
Автор: BRIAN MANULA, Brian Manula
Принадлежит: GRAPHCORE LTD, Graphcore Limited

Подробнее
24-11-1993 дата публикации

MULTI-LEVEL CACHE SYSTEM

Номер: GB0009320511D0
Автор:
Принадлежит:

Подробнее
26-01-2012 дата публикации

Parallel loop management

Номер: US20120023316A1
Принадлежит: International Business Machines Corp

The illustrative embodiments comprise a method, data processing system, and computer program product having a processor unit for processing instructions with loops. A processor unit creates a first group of instructions having a first set of loops and second group of instructions having a second set of loops from the instructions. The first set of loops have a different order of parallel processing from the second set of loops. A processor unit processes the first group. The processor unit monitors terminations in the first set of loops during processing of the first group. The processor unit determines whether a number of terminations being monitored in the first set of loops is greater than a selectable number of terminations. In response to a determination that the number of terminations is greater than the selectable number of terminations, the processor unit ceases processing the first group and processes the second group.

Подробнее
08-03-2012 дата публикации

Method and apparatus for handling critical blocking of store-to-load forwarding

Номер: US20120059971A1
Принадлежит: Advanced Micro Devices Inc

The present invention provides a method and apparatus for handling critical blocking of store-to-load forwarding. One embodiment of the method includes recording a load that matches an address of a store in a store queue before the store has valid data. The load is blocked because the store does not have valid data. The method also includes replaying the load in response to the store receiving valid data so that the valid data is forwarded from the store queue to the load.
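The block-and-replay flow above can be sketched with a small store queue model: a load that matches a store's address before the store has data is recorded as blocked, and the arrival of the store data both enables forwarding and names the loads to replay. `StoreQueue` and its methods are illustrative names, and the single-address entries are a simplification of real address matching.

```python
class StoreQueue:
    """Store queue with store-to-load forwarding and blocked-load replay."""
    def __init__(self):
        self.entries = {}        # addr -> data (None until the store has valid data)
        self.blocked_loads = {}  # addr -> list of load ids waiting on that store

    def allocate_store(self, addr):
        """Store enters the queue with its address but no data yet."""
        self.entries[addr] = None

    def load(self, load_id, addr):
        if addr in self.entries:
            data = self.entries[addr]
            if data is None:                         # address match, no valid data:
                self.blocked_loads.setdefault(addr, []).append(load_id)
                return ('blocked', None)             # the load must wait
            return ('forwarded', data)               # store-to-load forwarding
        return ('from_cache', None)                  # no match: normal cache access

    def store_data(self, addr, data):
        """Store receives valid data; return the loads to replay."""
        self.entries[addr] = data
        return self.blocked_loads.pop(addr, [])
```

Replayed loads then hit the now-valid entry and receive the forwarded data instead of stalling the pipeline indefinitely.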

Подробнее
22-03-2012 дата публикации

Constant Buffering for a Computational Core of a Programmable Graphics Processing Unit

Номер: US20120069033A1
Принадлежит: Via Technologies Inc

Embodiments of the present disclosure are directed to graphics processing systems, comprising: a plurality of execution units, wherein one of the execution units is configurable to process a thread corresponding to a rendering context, wherein the rendering context comprises a plurality of constants with a priority level; a constant buffer configurable to store the constants of the rendering context into a plurality of slots in a physical storage space; an execution unit control unit configurable to assign the thread to one of the execution units; and a constant buffer control unit providing a translation table for the rendering context to map the corresponding constants into the slots of the physical storage space. Comparable methods are also disclosed.

Подробнее
05-04-2012 дата публикации

Tracking written addresses of a shared memory of a multi-core processor

Номер: US20120084498A1
Принадлежит: LSI Corp

Described embodiments provide a method of controlling processing flow in a network processor having one or more processing modules. A given one of the processing modules loads a script into a compute engine. The script includes instructions for the compute engine. The given one of the processing modules loads a register file into the compute engine. The register file includes operands for the instructions of the loaded script. A tracking vector of the compute engine is initialized to a default value, and the compute engine executes the instructions of the loaded script based on the operands of the loaded register file. The compute engine updates corresponding portions of the register file with updated data corresponding to the executed script. The tracking vector tracks the updated portions of the register file. The compute engine provides the tracking vector and the updated register file to the given one of the processing modules.

Подробнее
03-05-2012 дата публикации

Out-of-order load/store queue structure

Номер: US20120110280A1
Автор: Christopher D. Bryant
Принадлежит: Advanced Micro Devices Inc

The present invention provides a method and apparatus for supporting embodiments of an out-of-order load/store queue structure. One embodiment of the apparatus includes a first queue for storing memory operations adapted to be executed out-of-order with respect to other memory operations. The apparatus also includes one or more additional queues for storing a memory operation in response to completion of that memory operation. The embodiment of the apparatus is configured to remove the memory operation from the first queue in response to the completion.

Подробнее
17-05-2012 дата публикации

Retirement serialisation of status register access operations

Номер: US20120124340A1
Автор: James Nolan Hardage
Принадлежит: ARM LTD

A processor 2 for performing out-of-order execution of a stream of program instructions includes a special register access pipeline for performing status access instructions accessing a status register 20. In order to serialise these status access instructions relative to other instructions within the system, access timing control circuitry 32 permits dispatch of other instructions to proceed but controls the commit queue and the result queue such that no program instructions succeeding the status access instruction in program order are permitted to complete until a trigger state has been detected in which all program instructions preceding the status access instruction in program order have been performed and have made any updates to the architectural state. This is followed by the performance of the status access instruction itself.

Подробнее
07-06-2012 дата публикации

Unified scheduler for a processor multi-pipeline execution unit and methods

Номер: US20120144173A1
Принадлежит: Advanced Micro Devices Inc

A unified scheduler for a processor execution unit and methods are disclosed for providing faster throughput of micro-instruction/operation execution with respect to a multi-pipeline processor execution unit. In one example, an execution unit has a plurality of pipelines that operate at a predetermined clock rate, each pipeline configured to process a selected subset of microinstructions. The execution unit has a scheduler that includes a unified queue configured to queue microinstructions for all of the pipelines and a picker configured to direct a queued microinstruction to an appropriate pipeline for processing based on an indication of readiness for picking. Preferably, when all of the pipelines are ready to receive a microinstruction for processing and there is at least one microinstruction queued that is ready for picking for each pipeline, the picker picks and directs a queued microinstruction to each of the pipelines in a single clock cycle.
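As a sketch of the single-cycle pick described above, the following Python model keeps a unified queue and picks at most one ready micro-op per pipeline each cycle. `MicroOp`, `pick_cycle`, and the use of queue order as age/priority are illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass

@dataclass
class MicroOp:
    name: str
    pipeline: int      # which pipeline can execute this op (assumed tagging)
    ready: bool        # operands available, eligible for picking

def pick_cycle(queue, num_pipelines):
    """Pick at most one ready micro-op for each pipeline in a single cycle."""
    picked = {}
    for op in queue:                     # queue order models age/priority
        if op.ready and op.pipeline not in picked:
            picked[op.pipeline] = op
        if len(picked) == num_pipelines:
            break
    for op in picked.values():           # picked ops leave the unified queue
        queue.remove(op)
    return picked

queue = [MicroOp("add", 0, True), MicroOp("mul", 1, False),
         MicroOp("sub", 0, True), MicroOp("div", 1, True)]
picked = pick_cycle(queue, 2)
```

Here both pipelines receive one micro-op in the same cycle (`add` and `div`), while the not-yet-ready `mul` and the lower-priority `sub` stay queued.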

Подробнее
14-06-2012 дата публикации

Cache Line Fetching and Fetch Ahead Control Using Post Modification Information

Номер: US20120151150A1
Принадлежит: LSI Corp

A method is provided for performing cache line fetching and/or cache fetch ahead in a processing system including at least one processor core and at least one data cache operatively coupled with the processor. The method includes the steps of: retrieving post modification information from the processor core and a memory address corresponding thereto; and the processing system performing, as a function of the post modification information and the memory address retrieved from the processor core, cache line fetching and/or cache fetch ahead control in the processing system.
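A minimal sketch of the idea, assuming the post-modification information retrieved from the core is an address-register increment that can serve as a prefetch stride; `LINE_SIZE`, `lines_to_prefetch`, and the fetch-ahead depth are hypothetical names and parameters, not the patent's.

```python
# Hypothetical fetch-ahead control: infer the stride from the core's
# post-modification increment and return the cache lines to fetch ahead.

LINE_SIZE = 64  # bytes per cache line (assumed)

def lines_to_prefetch(addr, post_modify, depth=2):
    """addr: memory address of the access; post_modify: the increment the
    core applies to its address register after the access."""
    stride = post_modify
    lines = []
    next_addr = addr
    for _ in range(depth):
        next_addr += stride
        line = (next_addr // LINE_SIZE) * LINE_SIZE  # align to line start
        if line not in lines:
            lines.append(line)
    return lines
```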

Подробнее
26-07-2012 дата публикации

Predicting a result for an actual instruction when processing vector instructions

Номер: US20120191957A1
Автор: Jeffry E. Gonion
Принадлежит: Apple Inc

The described embodiments provide a processor that executes vector instructions. In the described embodiments, while dispatching instructions at runtime, the processor encounters an Actual instruction. Upon determining that a result of the Actual instruction is predictable, the processor dispatches a prediction micro-operation associated with the Actual instruction, wherein the prediction micro-operation generates a predicted result vector for the Actual instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation, if a predicate vector is received, the processor generates the predicted result vector by setting each element for which the predicate vector is active to true; otherwise, the processor sets every element of the predicted result vector to true.
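The predicted-result generation reduces to a small function; this Python sketch (names assumed) sets active elements true when a predicate vector is supplied and all elements true otherwise.

```python
# Illustrative sketch: generate the predicted result vector for an Actual
# instruction. With a predicate vector, only active elements are set true;
# without one, every element is set true.

def predicted_result_vector(length, predicate=None):
    if predicate is None:
        return [True] * length
    return [bool(p) for p in predicate]
```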

Подробнее
26-07-2012 дата публикации

Sharing a fault-status register when processing vector instructions

Номер: US20120192005A1
Автор: Jeffry E. Gonion
Принадлежит: Apple Inc

The described embodiments provide a processor that executes vector instructions. In the described embodiments, the processor initializes an architectural fault-status register (FSR) and a shadow copy of the architectural FSR by setting each of N bit positions in the architectural FSR and the shadow copy of the architectural FSR to a first predetermined value. The processor then executes a first first-faulting or non-faulting (FF/NF) vector instruction. While executing the first vector instruction, the processor also executes one or more subsequent FF/NF instructions. In these embodiments, when executing the first vector instruction and the subsequent vector instructions, the processor updates one or more bit positions in the shadow copy of the architectural FSR to a second predetermined value upon encountering a fault condition. However, the processor does not update bit positions in the architectural FSR upon encountering a fault condition for the first vector instruction and the subsequent vector instructions.
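A toy model of the two-register scheme, assuming bit value 0 as the first predetermined value and 1 as the second; the class and method names are illustrative.

```python
# Faults raised by first-faulting/non-faulting (FF/NF) vector instructions
# update the shadow FSR only; the architectural FSR keeps its initialized
# value until software chooses to commit or inspect the shadow copy.

class FaultStatus:
    def __init__(self, n_bits):
        self.arch = [0] * n_bits     # architectural FSR (first value)
        self.shadow = [0] * n_bits   # shadow copy of the architectural FSR

    def record_fault(self, bit_position):
        # FF/NF instructions record fault conditions in the shadow copy only
        self.shadow[bit_position] = 1   # second predetermined value
```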

Подробнее
09-08-2012 дата публикации

Embedded opcode within an intermediate value passed between instructions

Номер: US20120204006A1
Автор: Jorn Nystad
Принадлежит: ARM LTD

A data processing system 2 is used to evaluate a data processing function by executing a sequence of program instructions including an intermediate value generating instruction Inst 0 and an intermediate value consuming instruction Inst 1 . In dependence upon one or more input operands to the evaluation, an embedded opcode within the intermediate value passed between the intermediate value generating instruction and the intermediate value consuming instruction may be set to have a value indicating that a substitute instruction should be used in place of the intermediate value consuming instruction. The instructions may be floating point instructions, such as a floating point power instruction evaluating the data processing function a^b.

Подробнее
16-08-2012 дата публикации

Running unary operation instructions for processing vectors

Номер: US20120210099A1
Автор: Jeffry E. Gonion
Принадлежит: Apple Inc

During operation, a processor generates a result vector. In particular, the processor records a value from an element at a key element position in an input vector into a base value. Next, for each active element in the result vector to the right of the key element position, the processor generates a result vector by setting the element in the result vector equal to a result of performing a unary operation on the base value a number of times equal to a number of relevant elements. The number of relevant elements is determined from the key element position to and including a predetermined element in the result vector, where the predetermined element in the result vector may be one of: a first element to the left of the element in the result vector; or the element in the result vector.
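The running computation can be sketched in Python; `running_unary` is an illustrative name, inactive elements are left as their input values (an assumption), and `inclusive` selects which predetermined element bounds the relevant-element count (the element itself, or the first element to its left).

```python
# Sketch of a running unary operation: starting from the base value at the
# key element position, each active element to its right receives the unary
# op applied once per relevant element counted so far.

def running_unary(src, pred, key, op, inclusive=False):
    base = src[key]
    result = list(src)
    count = 0                              # relevant (active) elements seen
    for i in range(key + 1, len(src)):
        if pred[i]:
            if inclusive:
                count += 1                 # count up to and including element i
            val = base
            for _ in range(count):
                val = op(val)
            result[i] = val
            if not inclusive:
                count += 1                 # count up to the element left of i
    return result
```

With increment as the unary op, the exclusive form yields a shifted running count and the inclusive form a plain running count.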

Подробнее
27-09-2012 дата публикации

Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines

Номер: US20120246450A1
Автор: Mohammad Abdallah
Принадлежит: Soft Machines Inc

A system for executing instructions using a plurality of register file segments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality register file segments are coupled to the partitionable engines for providing data storage.

Подробнее
27-09-2012 дата публикации

Method and apparatus for enhancing scheduling in an advanced microprocessor

Номер: US20120246453A1

Apparatus and a method for causing scheduler software to produce code which executes more rapidly by ignoring some of the normal constraints placed on its scheduling operations and simply scheduling certain instructions to run as fast as possible, raising an exception if the scheduling violates a scheduling constraint, and determining steps to be taken for correctly executing each set of instructions about which an exception is raised.

Подробнее
18-10-2012 дата публикации

Allocation of counters from a pool of counters to track mappings of logical registers to physical registers for mapper based instruction executions

Номер: US20120265969A1
Принадлежит: International Business Machines Corp

A computer system assigns a particular counter from among a plurality of counters currently in a counter free pool to count a number of mappings of logical registers from among a plurality of logical registers to a particular physical register from among a plurality of physical registers, responsive to an execution of an instruction by a mapper unit mapping at least one logical register from among the plurality of logical registers to the particular physical register, wherein the number of the plurality of counters is less than a number of the plurality of physical registers. The computer system, responsive to the counted number of mappings of logical registers to the particular physical register decremented to less than a minimum value, returns the particular counter to the counter free pool.
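A compact sketch of the pooled-counter bookkeeping, assuming a minimum count of one mapping; `CounterPool` and its methods are illustrative names, not the patent's mapper-unit design.

```python
# Illustrative counter pool: counters are allocated from a free pool to
# count logical-to-physical register mappings, and returned to the pool
# once the count is decremented below the minimum (one mapping).

class CounterPool:
    def __init__(self, n_counters):
        self.free = list(range(n_counters))   # fewer counters than phys regs
        self.assigned = {}                    # phys reg -> [counter_id, count]

    def map_logical(self, phys_reg):
        if phys_reg not in self.assigned:
            self.assigned[phys_reg] = [self.free.pop(), 0]
        self.assigned[phys_reg][1] += 1

    def unmap_logical(self, phys_reg):
        entry = self.assigned[phys_reg]
        entry[1] -= 1
        if entry[1] < 1:                      # below minimum: free the counter
            self.free.append(entry[0])
            del self.assigned[phys_reg]
```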

Подробнее
03-01-2013 дата публикации

Processing vectors using wrapping add and subtract instructions in the macroscalar architecture

Номер: US20130007422A1
Автор: Jeffry E. Gonion
Принадлежит: Apple Inc

Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a sum or difference operation on another input vector dependent upon the input vector and the control vector.

Подробнее
14-02-2013 дата публикации

Word line late kill in scheduler

Номер: US20130042089A1
Принадлежит: Advanced Micro Devices Inc

A method for picking an instruction for execution by a processor includes providing a multiple-entry vector, each entry in the vector including an indication of whether a corresponding instruction is ready to be picked. The vector is partitioned into equal-sized groups, and each group is evaluated starting with a highest priority group. The evaluating includes logically canceling all other groups in the vector when a group is determined to include an indication that an instruction is ready to be picked, whereby the vector only includes a positive indication for the one instruction that is ready to be picked.
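A Python sketch of the group-cancel pick; `pick_one`, the 0/1 vector encoding, and treating lower indices as higher priority are assumptions for illustration.

```python
# Partition the ready vector into equal-sized groups; the highest-priority
# group containing a ready bit logically cancels all other groups, and only
# its first ready entry survives in the output pick vector.

def pick_one(ready, group_size):
    for start in range(0, len(ready), group_size):  # highest priority first
        group = ready[start:start + group_size]
        if any(group):
            out = [0] * len(ready)
            out[start + group.index(1)] = 1         # first ready entry wins
            return out
    return [0] * len(ready)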

Подробнее
21-02-2013 дата публикации

Memory Management Unit Tag Memory with CAM Evaluate Signal

Номер: US20130046927A1
Автор: Ravindraraj Ramaraju
Принадлежит: Individual

A method and data processing system for accessing an entry in a memory array by placing a tag memory unit ( 114 ) in parallel with an operand adder circuit ( 112 ) to enable tag lookup and generation of speculative way hit/miss information ( 126 ) directly from the operands ( 111, 113 ) without using the output sum of the operand adder. PGZ-encoded address bits ( 0:51 ) from the operands ( 111, 113 ) are applied with a carry-out value (Cout 48 ) to a content-addressable memory array ( 114 ) having compact bitcells with embedded partial A+B=K logic to generate two speculative hit/miss signals under control of a delayed evaluate signal. A sum value (EA 51 ) computed from the least significant base and offset address bits determines which of the speculative hit/miss signals is selected for output ( 126 ).

Подробнее
28-03-2013 дата публикации

Method, apparatus and instructions for parallel data conversions

Номер: US20130080742A1
Автор: Gopalan Ramanujam
Принадлежит: Individual

Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits. The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register.

Подробнее
04-04-2013 дата публикации

Tracking operand liveliness information in a computer system and performance function based on the liveliness information

Номер: US20130086367A1
Принадлежит: International Business Machines Corp

Operand liveness state information is maintained during context switches for current architected operands of executing programs, the current operand state information indicating whether corresponding current operands are either enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA) for disabling current architected operands, wherein a current operand is accessed by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module.

Подробнее
04-04-2013 дата публикации

Generating compiled code that indicates register liveness

Номер: US20130086548A1
Принадлежит: International Business Machines Corp

Object code is generated from an internal representation that includes a plurality of source operands. The generating includes, for each source operand in the internal representation, determining whether a last use has occurred for the source operand. The determining includes accessing a data flow graph to determine whether all uses of a live range have been emitted. If it is determined that a last use has occurred for the source operand, an architected resource associated with the source operand is marked for last-use indication. A last-use indication is then generated for the architected resource. Instructions and the last-use indications are emitted into the object code.

Подробнее
25-04-2013 дата публикации

Data processing device and method, and processor unit of same

Номер: US20130103930A1
Автор: Takashi Horikawa
Принадлежит: NEC Corp

A processor unit ( 200 ) includes: cache memory ( 210 ); an instruction execution unit ( 220 ); a processing unit ( 230 ) that detects the fact that a thread enters an exclusive control section which is specified in advance as likely to become a bottleneck; a processing unit ( 240 ) that detects the fact that the thread exits the exclusive control section; and an execution flag ( 250 ) that indicates, based on the detection results, whether there is a thread that is executing a process in the exclusive control section. The cache memory ( 210 ) temporarily stores a priority flag in each cache entry, and the priority flag indicates whether data is to be used during execution in the exclusive control section. When the execution flag ( 250 ) is set, the processor unit ( 200 ) sets the priority flag belonging to an access target among the cache entries. The processor unit ( 200 ) keeps data used in the exclusive control section in the cache memory by determining a replacement target among the cache entries using the priority flag when a cache miss occurs.

Подробнее
09-05-2013 дата публикации

Low overhead operation latency aware scheduler

Номер: US20130117543A1
Принадлежит: Advanced Micro Devices Inc

A method and apparatus for processing multi-cycle instructions include picking a multi-cycle instruction and directing the picked multi-cycle instruction to a pipeline. The pipeline includes a pipeline control configured to detect a latency and a repeat rate of the picked multi-cycle instruction and to count clock cycles based on the detected latency and the detected repeat rate. The method and apparatus further include detecting the repeat rate and the latency of the picked multi-cycle instruction, and counting clock cycles based on the detected repeat rate and the latency of the picked multi-cycle instruction.

Подробнее
16-05-2013 дата публикации

Processor with power control via instruction issuance

Номер: US20130124900A1
Принадлежит: Advanced Micro Devices Inc

Methods and apparatuses are provided for power control in a processor. The apparatus comprises a plurality of operational units arranged as a group of operational units. A power consumption monitor determines when cumulative power consumption of the group of operational units exceeds a threshold (e.g., either or both of the cumulative power threshold and the cumulative power rate threshold) during a time interval, after which a filter for issuing instructions to the group of operational units suspends instruction issuance to the group of operational units for the remainder of the time interval. The method comprises monitoring cumulative power consumption by a group of operational units within a processor over a time interval. If the cumulative power consumption of the group of operational units exceeds the threshold, instruction issuance to the group of operational units is suspended for the remainder of the time interval.

Подробнее
06-06-2013 дата публикации

System and method for performing shaped memory access operations

Номер: US20130145124A1
Принадлежит: Individual

One embodiment of the present invention sets forth a technique that provides an efficient way to retrieve operands from a register file. Specifically, the instruction dispatch unit receives one or more instructions, each of which includes one or more operands. Collectively, the operands are organized into one or more operand groups from which a shaped access may be formed. The operands are retrieved from the register file and stored in a collector. Once all operands are read and collected in the collector, the instruction dispatch unit transmits the instructions and corresponding operands to functional units within the streaming multiprocessor for execution. One advantage of the present invention is that multiple operands are retrieved from the register file in a single register access operation without resource conflict. Performance in retrieving operands from the register file is improved by forming shaped accesses that efficiently retrieve operands exhibiting recognized memory access patterns.

Подробнее
13-06-2013 дата публикации

Micro architecture for indirect access to a register file in a processor

Номер: US20130151818A1
Принадлежит: International Business Machines Corp

A method and system for improving performance and latency of instruction execution within an execution pipeline in a processor. The method includes finding, while decoding an instruction, a pointer register used by the instruction; reading the pointer register; validating a pointer register entry; reading, if the pointer register entry is valid, a register file entry; validating a register file entry; validating, if the register file entry is invalid, a valid register file entry wherein the valid register file entry is in the register file's future file; bypassing, if the valid register file entry is valid, a valid register file value from the register file's future file to the execution pipeline wherein the valid register file value is in the valid register file entry; and executing the instruction using the valid register file value; wherein at least one of the steps is carried out using a computer device.

Подробнее
04-07-2013 дата публикации

Processor for Executing Wide Operand Operations Using a Control Register and a Results Register

Номер: US20130173888A1
Принадлежит: Microunity Systems Engineering Inc

A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.

Подробнее
18-07-2013 дата публикации

Use of Loop and Addressing Mode Instruction Set Semantics to Direct Hardware Prefetching

Номер: US20130185516A1
Принадлежит: Qualcomm Inc

Systems and methods for prefetching cache lines into a cache coupled to a processor. A hardware prefetcher is configured to recognize a memory access instruction as an auto-increment-address (AIA) memory access instruction, infer a stride value from an increment field of the AIA instruction, and prefetch lines into the cache based on the stride value. Additionally or alternatively, the hardware prefetcher is configured to recognize that prefetched cache lines are part of a hardware loop, determine a maximum loop count of the hardware loop, and a remaining loop count as a difference between the maximum loop count and a number of loop iterations that have been completed, select a number of cache lines to prefetch, and truncate an actual number of cache lines to prefetch to be less than or equal to the remaining loop count, when the remaining loop count is less than the selected number of cache lines.
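The loop-count truncation reduces to simple arithmetic; this sketch (illustrative names) caps the selected prefetch count by the remaining iterations of the hardware loop.

```python
# Truncate the number of cache lines to prefetch so it never exceeds the
# remaining loop count (maximum loop count minus completed iterations).

def truncated_prefetch_count(selected, max_loop_count, completed):
    remaining = max_loop_count - completed
    if remaining < selected:
        return max(remaining, 0)
    return selected
```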

Подробнее
12-09-2013 дата публикации

Method, apparatus and instructions for parallel data conversions

Номер: US20130238879A1
Автор: Gopalan Ramanujam
Принадлежит: Individual

Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits. The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register.

Подробнее
03-10-2013 дата публикации

Methods and apparatus to avoid surges in di/dt by throttling gpu execution performance

Номер: US20130262831A1
Принадлежит: Nvidia Corp

Systems and methods for throttling GPU execution performance to avoid surges in DI/DT. A processor includes one or more execution units coupled to a scheduling unit configured to select instructions for execution by the one or more execution units. The execution units may be connected to one or more decoupling capacitors that store power for the circuits of the execution units. The scheduling unit is configured to throttle the instruction issue rate of the execution units based on a moving average issue rate over a large number of scheduling periods. The number of instructions issued during the current scheduling period is less than or equal to a throttling rate maintained by the scheduling unit that is greater than or equal to a minimum throttling issue rate. The throttling rate is set equal to the moving average plus an offset value at the end of each scheduling period.
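The throttle update is a small calculation; the moving-average-plus-offset rule and the minimum-rate floor follow the abstract, while the function names and integer rates are assumptions.

```python
# At the end of each scheduling period the throttling rate is set to the
# moving average issue rate plus an offset, never below the minimum rate;
# issues within a period may not exceed the current throttling rate.

def next_throttle_rate(moving_avg, offset, minimum):
    return max(moving_avg + offset, minimum)

def issues_allowed(requested, throttle_rate):
    # instructions issued this period are capped at the throttling rate
    return min(requested, throttle_rate)
```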

Подробнее
03-10-2013 дата публикации

Instruction Scheduling for Reducing Register Usage

Номер: US20130262832A1
Принадлежит: Advanced Micro Devices Inc

A method, computer program product, and system are provided for scheduling a plurality of instructions in a computing system. For example, the method can generate a plurality of instruction lineages, in which the plurality of instruction lineages is assigned to one or more registers. Each of the plurality of instruction lineages has at least one node representative of an instruction from the plurality of instructions. The method can also determine a node order based on respective priority values associated with each of the nodes. Further, the method can include scheduling the plurality of instructions based on the node order and the one or more registers assigned to the plurality of instruction lineages.

Подробнее
03-10-2013 дата публикации

Memory Disambiguation Hardware To Support Software Binary Translation

Номер: US20130262838A1
Принадлежит: Intel Corp

A method of memory disambiguation hardware to support software binary translation is provided. This method includes unrolling a set of instructions to be executed within a processor, the set of instructions having a number of memory operations. An original relative order of the memory operations is determined. Then, possible reordering problems are detected and identified in software; a reordering problem occurs when a first memory operation has been reordered prior to, and aliases with, a second memory operation with respect to the original order of memory operations. The reordering problem is addressed, and the relative order of memory operations is communicated to the processor.

Подробнее
31-10-2013 дата публикации

Method and device for determining parallelism of tasks of a program

Номер: US20130290975A1
Принадлежит: Intel Corp

A method and device for determining parallelism of tasks of a program comprises generating a task data structure to track the tasks and assigning a node of the task data structure to each executing task. Each node includes a task identification number and a wait number. The task identification number uniquely identifies the corresponding task from other currently executing tasks and the wait number corresponds to the task identification number of a node corresponding to the last descendant task of the corresponding task that was executed prior to a wait command. The parallelism of the tasks is determined by comparing the relationship between the tasks.

Подробнее
07-11-2013 дата публикации

Semiconductor device

Номер: US20130297916A1
Принадлежит: Renesas Electronics Corp

A related-art semiconductor device suffers from the problem that processing capacity is degraded by switching the occupied state for each partition. A semiconductor device according to the present invention includes an execution unit that executes an arithmetic instruction, and a scheduler that includes multiple first setting registers, each defining a correspondence relationship between hardware threads and partitions, and that generates a thread select signal on the basis of a partition schedule and a thread schedule. When the execution unit executes a first occupation start instruction, a first occupation control signal is output; according to this signal, the scheduler outputs a thread select signal designating a specific hardware thread for the partition indicated by the first occupation control signal, without depending on the thread schedule.

Подробнее
14-11-2013 дата публикации

MFENCE and LFENCE Micro-Architectural Implementation Method and System

Номер: US20130305018A1
Принадлежит: Individual

A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.

Подробнее
14-11-2013 дата публикации

Speeding Up Younger Store Instruction Execution after a Sync Instruction

Номер: US20130305022A1
Принадлежит: International Business Machines Corp

Mechanisms are provided, in a processor, for executing instructions that are younger than a previously dispatched synchronization (sync) instruction. An instruction sequencer unit of the processor dispatches a sync instruction. The sync instruction is sent to a nest of one or more devices outside of the processor. The instruction sequencer unit dispatches a subsequent instruction after dispatching the sync instruction. The dispatching of the subsequent instruction after dispatching the sync instruction is performed prior to receiving a sync acknowledgement response from the nest. The instruction sequencer unit performs a completion of the subsequent instruction based on whether completion of the subsequent instruction is dependent upon receiving the sync acknowledgement from the nest and completion of the sync instruction.

Подробнее
28-11-2013 дата публикации

Micro-Staging Device and Method for Micro-Staging

Номер: US20130318194A1
Автор: Jeffrey L. Timbs
Принадлежит: Dell Products LP

A micro-staging device has a wireless interface module for detecting a first data request that indicates a presence of a user and an application processor that establishes a network connection to a remote data center. The micro-staging device further allocates a portion of storage in a cache memory storage device for storing pre-fetched workflow data objects associated with the detected user.

Подробнее
28-11-2013 дата публикации

Predicting and avoiding operand-store-compare hazards in out-of-order microprocessors

Номер: US20130318330A1
Принадлежит: International Business Machines Corp

A method and information processing system manage load and store operations that can be executed out-of-order. At least one of a load instruction and a store instruction is executed. A determination is made that an operand store compare hazard has been encountered. An entry within an operand store compare hazard prediction table is created based on the determination. The entry includes at least an instruction address of the instruction that has been executed and a hazard indicating flag associated with the instruction. The hazard indicating flag indicates that the instruction has encountered the operand store compare hazard. When a load instruction is associated with the hazard indicating flag, the load instruction becomes dependent upon all store instructions associated with a substantially similar hazard indicating flag.
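A minimal sketch of the prediction-table idea, assuming the hazard indicating flag is a shared tag written for both the load and the store; `OSCPredictor` and its methods are illustrative names.

```python
# Operand-store-compare (OSC) hazard prediction: once a load and a store
# have encountered a hazard together, both instruction addresses are tagged
# with the same flag, and the load is made dependent on all in-flight
# stores carrying that flag.

class OSCPredictor:
    def __init__(self):
        self.table = {}   # instruction address -> hazard indicating flag

    def record_hazard(self, load_addr, store_addr, flag):
        self.table[load_addr] = flag
        self.table[store_addr] = flag

    def load_dependencies(self, load_addr, inflight_stores):
        flag = self.table.get(load_addr)
        if flag is None:
            return []     # load not flagged: no forced dependencies
        return [s for s in inflight_stores if self.table.get(s) == flag]
```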

Подробнее
05-12-2013 дата публикации

ISSUING INSTRUCTIONS TO EXECUTION PIPELINES BASED ON REGISTER-ASSOCIATED PREFERENCES, AND RELATED INSTRUCTION PROCESSING CIRCUITS, PROCESSOR SYSTEMS, METHODS, AND COMPUTER-READABLE MEDIA

Номер: US20130326197A1
Принадлежит: QUALCOMM INCORPORATED

Issuing instructions to execution pipelines based on register-associated preferences and related instruction processing circuits, systems, methods, and computer-readable media are disclosed. In one embodiment, an instruction is detected in an instruction stream. Upon determining that the instruction specifies at least one source register, an execution pipeline preference(s) is determined based on at least one pipeline indicator associated with the at least one source register in a pipeline issuance table, and the instruction is issued to an execution pipeline based on the execution pipeline preference(s). Upon a determination that the instruction specifies at least one target register, at least one pipeline indicator associated with the at least one target register in the pipeline issuance table is updated based on the execution pipeline to which the instruction is issued. In this manner, optimal forwarding of instructions may be facilitated, thus improving processor performance.

1. A method for processing computer instructions, comprising: detecting an instruction in an instruction stream; upon determining that the instruction specifies at least one source register: determining at least one execution pipeline preference for the instruction based on at least one pipeline indicator associated with the at least one source register in a pipeline issuance table; and issuing the instruction to an execution pipeline based on the at least one execution pipeline preference; and upon determining that the instruction specifies at least one target register: updating at least one pipeline indicator associated with the at least one target register in the pipeline issuance table based on the execution pipeline to which the instruction is issued.

2. The method of claim 1, wherein issuing the instruction to the execution pipeline comprises issuing the instruction to a preferred execution pipeline indicated by the at least one execution pipeline preference.

3. The method of ...

Подробнее
05-12-2013 дата публикации

Integrated circuit devices and methods for scheduling and executing a restricted load operation

Номер: US20130326200A1
Принадлежит: FREESCALE SEMICONDUCTOR INC

An integrated circuit device comprising at least one instruction processing module arranged to compare validation data with data stored within a target register upon receipt of a load validation instruction. Wherein, the instruction processing module is further arranged to proceed with execution of a next sequential instruction if the validation data matches the stored data within the target register, and to load the validation data into the target register if the validation data does not match the stored data within the target register.
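The compare-then-load behavior described above can be modeled in a few lines of software (a sketch only; the function name, the register dict, and the boolean return convention are illustrative, not from the patent):

```python
def load_validation(registers, target, validation_data):
    """Model of the restricted load: compare validation data with the
    target register; fall through on a match, otherwise load the value."""
    if registers[target] == validation_data:
        return True   # match: proceed to the next sequential instruction
    registers[target] = validation_data  # mismatch: load into the register
    return False

regs = {"r1": 7}
assert load_validation(regs, "r1", 7) is True    # match, register untouched
assert load_validation(regs, "r1", 9) is False   # mismatch, value loaded
assert regs["r1"] == 9
```

The interesting property is that the load is conditional on the comparison, so an already-valid register is never rewritten.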

Подробнее
12-12-2013 дата публикации

Computer system

Номер: US20130332925A1
Принадлежит: Renesas Electronics Corp

There is a need to provide a computer system capable of preventing a failure from propagating and of recovering from the failure. VCPU#0 through VCPU#2 each operate a different OS. VCPU#0 operates a management OS that manages the other OSs. When notified of a bus error occurrence, a virtual CPU execution portion 201 operates only VCPU#0, regardless of the execution sequence stored in schedule register A. VCPU#0 reinitializes the bus where the error occurred.

Подробнее
19-12-2013 дата публикации

Selectively controlling instruction execution in transactional processing

Номер: US20130339328A1
Принадлежит: International Business Machines Corp

Execution of instructions in a transactional environment is selectively controlled. A TRANSACTION BEGIN instruction initiates a transaction and includes controls that selectively indicate whether certain types of instructions are permitted to execute within the transaction. The controls include one or more of an allow access register modification control and an allow floating point operation control.

Подробнее
19-12-2013 дата публикации

NONTRANSACTIONAL STORE INSTRUCTION

Номер: US20130339669A1

A NONTRANSACTIONAL STORE instruction, executed in transactional execution mode, performs stores that are retained, even if a transaction associated with the instruction aborts. The stores include user-specified information that may facilitate debugging of an aborted transaction. 1. A method of executing an instruction within a computing environment, said method comprising: obtaining, by a processor, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture, the machine instruction comprising: an operation code to specify a nontransactional store operation; a first operand; and a second operand to designate a location for the first operand; and executing, by the processor, the machine instruction, the executing comprising nontransactionally placing the first operand at the location specified by the second operand, wherein information stored at the second operand is retained despite an abort of a transaction associated with the machine instruction, and wherein the nontransactionally placing is delayed until an end of transactional execution mode of the processor. 2. The method of claim 1, wherein the end of transactional execution mode results from an end of an outermost transaction associated with the machine instruction or an abort condition. 3. The method of claim 1, wherein multiple nontransactional stores appear as concurrent stores to other processors. 4. The method of claim 1, further comprising: determining whether the processor is in transactional execution mode; based on the processor being in the transactional execution mode, determining whether the transaction is a constrained transaction or a nonconstrained transaction; and based on the transaction being a nonconstrained transaction, continuing execution of the machine instruction. 5. The method of claim 4, wherein, based on the transaction being a constrained transaction, providing a program exception and ...

Подробнее
19-12-2013 дата публикации

Next Instruction Access Intent Instruction

Номер: US20130339672A1
Принадлежит: International Business Machines Corp

Executing a Next Instruction Access Intent instruction by a computer. The processor obtains an access intent instruction indicating an access intent. The access intent is associated with an operand of a next sequential instruction. The access intent indicates usage of the operand by one or more instructions subsequent to the next sequential instruction. The computer executes the access intent instruction. The computer obtains the next sequential instruction. The computer executes the next sequential instruction, which comprises based on the access intent, adjusting one or more cache behaviors for the operand of the next sequential instruction.

Подробнее
19-12-2013 дата публикации

Restricted instructions in transactional execution

Номер: US20130339685A1
Принадлежит: International Business Machines Corp

Restricted instructions are prohibited from execution within a transaction. There are classes of instructions that are restricted regardless of type of transaction: constrained or nonconstrained. There are instructions only restricted in constrained transactions, and there are instructions that are selectively restricted for given transactions based on controls specified on instructions used to initiate the transactions.

Подробнее
26-12-2013 дата публикации

PIPELINING OUT-OF-ORDER INSTRUCTIONS

Номер: US20130346729A1
Принадлежит:

Systems, methods and computer program products provide for pipelining out-of-order instructions. Embodiments comprise an instruction reservation station for short instructions of a short latency type and long instructions of a long latency type, an issue queue containing at least two short instructions of a short latency type, which are to be chained to match a latency of a long instruction of a long latency type, a register file, at least one execution pipeline for instructions of a short latency type and at least one execution pipeline for instructions of a long latency type; wherein results of the at least one execution pipeline for instructions of the short latency type are written to the register file, preserved in an auxiliary buffer, or forwarded to inputs of said execution pipelines. Data of the auxiliary buffer are written to the register file. 1. A method comprising: determining an instruction chain comprising at least a first instruction having a first latency and a second instruction having a second latency, the first latency and the second latency each being less than a third latency of a third instruction; and submitting the instruction chain to a first execution pipeline of a processor and the third instruction to a second execution pipeline of the processor, wherein execution of the instruction chain at least partially overlaps execution of the third instruction. 2. The method according to claim 1, further comprising writing a result of the instruction chain to a register file during a writeback slot for the third instruction. 3. The method according to claim 1, further comprising: determining whether the second instruction is dependent on data from the first instruction; and in response to determining that the second instruction is dependent on data from the first instruction, forwarding the data from the first instruction to the second instruction. 4. The method according to claim 3, further comprising writing a result of the second instruction into a ...
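The chaining idea — two short-latency instructions sharing the slot of one long-latency instruction — can be reduced to a toy feasibility check (the latency values, names, and the "combined latency must fit" rule are assumptions for illustration, not the patent's scheduling logic):

```python
# Hypothetical cycle counts; the abstract gives no concrete latencies.
SHORT, LONG = 2, 4

def chain_fits(short_latencies, long_latency):
    """A chain of short instructions can overlap one long instruction
    if the chain's combined latency does not exceed the long latency."""
    return sum(short_latencies) <= long_latency

assert chain_fits([SHORT, SHORT], LONG)          # two shorts fit one long
assert not chain_fits([SHORT, SHORT, SHORT], LONG)
```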

Подробнее
26-12-2013 дата публикации

Arithmetic processing apparatus, and cache memory control device and cache memory control method

Номер: US20130346730A1
Автор: Naohiro Kiyota
Принадлежит: Fujitsu Ltd

An arithmetic processing apparatus includes a plurality of processors, each of the processors having an arithmetic unit and a cache memory. The processor includes an instruction port that holds a plurality of instructions accessing data of the cache memory, a first determination unit that validates a first flag when, upon receipt of an invalidation request for data in the cache memory, a cache index of a target address and a way ID of the received request match a cache index of a designated address and a way ID of the load instruction, a second determination unit that validates a second flag when target data is transmitted due to a cache miss, and an instruction re-execution determination unit that instructs re-execution of an instruction subsequent to the load instruction when both the first flag and the second flag are validated at the time of completion of an instruction in the instruction port.

Подробнее
16-01-2014 дата публикации

Systems, apparatuses, and methods for performing a double blocked sum of absolute differences

Номер: US20140019713A1
Принадлежит: Intel Corp

Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
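The per-block SAD arithmetic can be illustrated in scalar code modeling what each lane of such a vector instruction computes (the function names and the block width of 4 are assumptions; the real instruction's operand encoding and immediate-controlled block selection are not modeled):

```python
def packed_sad(block_a, block_b):
    """Sum of absolute differences over one packed block of elements."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def double_block_sad(src1, src2, block=4):
    """Toy model of a double-block SAD: one SAD result per block pair,
    as a vector instruction would produce per destination lane."""
    return [packed_sad(src1[i:i + block], src2[i:i + block])
            for i in range(0, len(src1), block)]

assert packed_sad([1, 2, 3, 4], [4, 3, 2, 1]) == 8
assert double_block_sad([1, 2, 3, 4, 5, 6, 7, 8],
                        [0, 0, 0, 0, 0, 0, 0, 0]) == [10, 26]
```

SAD over pixel blocks is the core of motion estimation in video encoding, which is why it merits a dedicated instruction.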

Подробнее
30-01-2014 дата публикации

Computer architecture with a hardware accumulator reset

Номер: US20140033203A1
Принадлежит: Mobileye Technologies Ltd

A processor with an accumulator. An event is selected to produce one or more selected events. A reset signal to the accumulator is generated responsive to the selected event. Responsive to the reset signal, the accumulator is reset to zero or another initial value while avoiding breaking pipelined execution of the processor.

Подробнее
06-02-2014 дата публикации

Space efficient checkpoint facility and technique for processor with integrally indexed register mapping and free-list arrays

Номер: US20140040595A1
Автор: Thang M. Tran
Принадлежит: FREESCALE SEMICONDUCTOR INC

A processor may efficiently implement register renaming and checkpoint repair even in instruction set architectures with large numbers of wide (bit-width) registers by (i) renaming all destination operand register targets, (ii) implementing free list and architectural-to-physical mapping table as a combined array storage with unitary (or common) read, write and checkpoint pointer indexing and (iii) storing checkpoints as snapshots of the mapping table, rather than of actual register contents. In this way, uniformity (and timing simplicity) of the decode pipeline may be accentuated and architectural-to-physical mappings (or allocable mappings) may be efficiently shuttled between free-list, reorder buffer and mapping table stores in correspondence with instruction dispatch and completion as well as checkpoint creation, retirement and restoration.
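Point (iii) — checkpointing the mapping table rather than register contents — can be sketched as follows (a toy model; the class, method names, and identity initial mapping are invented for illustration and say nothing about the patent's combined-array storage):

```python
class RenameMap:
    """Checkpoints are snapshots of the logical-to-physical mapping
    table, not of the physical registers' contents."""
    def __init__(self, n_logical):
        self.table = list(range(n_logical))  # identity mapping at reset
        self.checkpoints = []

    def rename(self, logical, new_physical):
        self.table[logical] = new_physical

    def checkpoint(self):
        self.checkpoints.append(self.table.copy())  # small, fixed-size copy

    def restore(self):
        self.table = self.checkpoints.pop()

m = RenameMap(4)
m.checkpoint()
m.rename(2, 9)
assert m.table[2] == 9
m.restore()            # repair: roll back to the checkpointed mapping
assert m.table[2] == 2
```

Snapshotting the map is what makes checkpoints space-efficient: its size scales with the number of logical registers, not with their bit width.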

Подробнее
06-02-2014 дата публикации

DATA PROCESSOR

Номер: US20140040600A1
Автор: Arakawa Fumio
Принадлежит:

Provided is a data processor which maintains compatibility with an existing instruction set, such as a 16-bit fixed-length instruction set, and in which the instruction code space is extended. 1. A data processor with multiple instruction pipelines, comprising: a global instruction queue that sequentially accumulates multiple instruction codes that are fetched in parallel; and a dispatch circuit that conducts a search with respect to the multiple instruction codes that are output from the global instruction queue, for every instruction code type, and distributes the instruction code to every instruction pipeline based on a result of the search, wherein an instruction set is retained that additionally defines as a separate instruction a prohibition combination pattern resulting from the multiple specific instruction codes, by which original processing of the individual instruction code is prohibited. 2. The data processor according to claim 1, wherein the instruction that is additionally defined by the prohibition combination pattern of the multiple specific instruction codes is limited to an instruction type that is the same as the instruction code defined only with a latter-half instruction code pattern of the combination pattern. 3. The data processor according to claim 2, wherein, in the prohibition combination pattern of the multiple specific instruction codes, a former half and a latter half of the instruction code pattern are different from each other. 4. The data processor according to claim 3, wherein the dispatch circuit outputs the detected instruction code as being valid and outputs the instruction code that immediately precedes the detected instruction code as a prefix code candidate when an intended instruction code of the instruction code type is detected in a search unit of the multiple instruction codes (a search target), and outputs the front instruction code as being valid when the intended instruction code of the instruction code type is detected in the ...

Подробнее
27-02-2014 дата публикации

Allocation of counters from a pool of counters to track mappings of logical registers to physical registers for mapper based instruction executions

Номер: US20140059329A1
Принадлежит: International Business Machines Corp

A computer system assigns a particular counter from among a plurality of counters currently in a counter free pool to count a number of mappings of logical registers from among a plurality of logical registers to a particular physical register from among a plurality of physical registers, responsive to an execution of an instruction by a mapper unit mapping at least one logical register from among the plurality of logical registers to the particular physical register, wherein the number of the plurality of counters is less than a number of the plurality of physical registers. The computer system, responsive to the counted number of mappings of logical registers to the particular physical register decremented to less than a minimum value, returns the particular counter to the counter free pool.
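The counter-pool idea — fewer counters than physical registers, with a counter returned to the free pool once its mapping count drops below the minimum — can be sketched like this (names, the dict-based bookkeeping, and a minimum of 1 are assumptions for illustration):

```python
class CounterPool:
    """A small pool of counters tracking how many logical registers
    currently map to a given physical register."""
    def __init__(self, n_counters):
        self.free = list(range(n_counters))
        self.assigned = {}   # physical register -> [counter id, count]

    def map_logical(self, phys):
        if phys not in self.assigned:
            # assign a counter from the free pool on first mapping
            self.assigned[phys] = [self.free.pop(), 0]
        self.assigned[phys][1] += 1

    def unmap_logical(self, phys):
        entry = self.assigned[phys]
        entry[1] -= 1
        if entry[1] < 1:                 # below minimum: release counter
            self.free.append(entry[0])
            del self.assigned[phys]

pool = CounterPool(2)
pool.map_logical(5)
pool.map_logical(5)
pool.unmap_logical(5)
assert pool.assigned[5][1] == 1
pool.unmap_logical(5)
assert 5 not in pool.assigned and len(pool.free) == 2
```

The saving comes from the observation that only a few physical registers have multiple logical mappings at any instant, so a full counter per physical register would be wasted.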

Подробнее
27-02-2014 дата публикации

Method, apparatus, and system for speculative abort control mechanisms

Номер: US20140059333A1
Принадлежит: Intel Corp

An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions.

Подробнее
06-03-2014 дата публикации

Instruction insertion in state machine engines

Номер: US20140068234A1
Автор: David R. Brown
Принадлежит: Micron Technology Inc

State machine engines are disclosed, including those having an instruction insertion register. One such instruction insertion register may provide an initialization instruction, such as to prepare a state machine engine for data analysis. An instruction insertion register may also provide an instruction in an attempt to resolve an error that occurs during operation of a state machine engine. An instruction insertion register may also be used to debug a state machine engine, such as after the state machine experiences a fatal error.

Подробнее
13-03-2014 дата публикации

Identifying load-hit-store conflicts

Номер: US20140075158A1
Принадлежит: International Business Machines Corp

A computing device identifies a load instruction and store instruction pair that causes a load-hit-store conflict. A processor tags a first load instruction that instructs the processor to load a first data set from memory. The processor stores an address at which the first load instruction is located in memory in a special purpose register. The processor determines whether the first load instruction has a load-hit-store conflict with a first store instruction. If the processor determines the first load instruction has a load-hit-store conflict with the first store instruction, the processor stores an address at which the first data set is located in memory in a second special purpose register, tags the first data set being stored by the first store instruction, stores an address at which the first store instruction is located in memory in a third special purpose register, and increments a conflict counter.
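The detection can be approximated by a software model that flags loads whose address matches an in-flight store that has not yet drained to memory (a simplification; the event encoding and `detect_lhs` are invented for illustration and omit the special-purpose-register logging):

```python
def detect_lhs(events):
    """Count load-hit-store conflicts in a stream of (op, addr) events."""
    pending_stores, conflicts = set(), 0
    for op, addr in events:
        if op == "store":
            pending_stores.add(addr)          # store enters the queue
        elif op == "load" and addr in pending_stores:
            conflicts += 1                    # load hits an undrained store
        elif op == "drain":
            pending_stores.discard(addr)      # store retires to memory
    return conflicts

assert detect_lhs([("store", 0x40), ("load", 0x40)]) == 1
assert detect_lhs([("store", 0x40), ("drain", 0x40), ("load", 0x40)]) == 0
```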

Подробнее
27-03-2014 дата публикации

Memory address aliasing detection

Номер: US20140089271A1
Принадлежит: Intel Corp

Method and apparatus to efficiently detect violations of data dependency relationships. A memory address associated with a computer instruction may be obtained. A current state of the memory address may be identified. The current state may include whether the memory address is associated with a read or a store instruction, and whether the memory address is associated with a set or a check. A previously accumulated state associated with the memory address may be retrieved from a data structure. The previously accumulated state may include whether the memory address was previously associated with a read or a store instruction, and whether the memory address was previously associated with a set or a check. If a transition from the previously accumulated state to the current state is invalid, a failure condition may be signaled.
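The abstract does not give the exact transition table, so the sketch below uses an invented rule (a "check" must agree with the kind of access last "set" for the address) purely to illustrate the set/check state-tracking pattern:

```python
def check_transition(history, addr, access, mode):
    """Validate one accumulated-state transition for an address.

    history: dict mapping address -> last access kind that was "set"
    access:  "read" or "store"; mode: "set" or "check"
    Returns False when a check observes a different access kind than
    was previously set (a hypothetical invalid transition).
    """
    prev = history.get(addr)
    if mode == "check" and prev is not None and prev != access:
        return False          # invalid transition: signal failure
    if mode == "set":
        history[addr] = access
    return True

h = {}
assert check_transition(h, 0x10, "read", "set")
assert check_transition(h, 0x10, "read", "check")
assert not check_transition(h, 0x10, "store", "check")
```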

Подробнее
03-04-2014 дата публикации

CROSS-PIPE SERIALIZATION FOR MULTI-PIPELINE PROCESSOR

Номер: US20140095836A1

Embodiments relate to cross-pipe serialization for a multi-pipeline computer processor. An aspect includes receiving, by a processor comprising a first pipeline, the first pipeline comprising a serialization pipeline, and a second pipeline, the second pipeline comprising a non-serialization pipeline, a request comprising a first subrequest for the first pipeline and a second subrequest for the second pipeline. Another aspect includes completing the first subrequest by the first pipeline. Another aspect includes, based on completing the first subrequest by the first pipeline, sending a cross-pipe unlock signal from the first pipeline to the second pipeline. Yet another aspect includes, based on receiving the cross-pipe unlock signal by the second pipeline, completing the second subrequest by the second pipeline. 1. A computer system for cross-pipe serialization for a multi-pipeline computer processor, the system comprising: a processor, the processor comprising a first pipeline, the first pipeline comprising a serialization pipeline, and a second pipeline, the second pipeline comprising a non-serialization pipeline, the system configured to perform a method comprising: receiving, by the processor, a request comprising a first subrequest for the first pipeline and a second subrequest for the second pipeline; completing the first subrequest by the first pipeline; based on completing the first subrequest by the first pipeline, sending a cross-pipe unlock signal from the first pipeline to the second pipeline; and based on receiving the cross-pipe unlock signal by the second pipeline, completing the second subrequest by the second pipeline. 2. The computer system of claim 1, wherein completing the first subrequest by the first pipeline comprises loading a first instance of a shared resource, and wherein completing the second subrequest by the second pipeline comprises loading a second instance of the shared resource. 3. The computer system of ...

Подробнее
03-04-2014 дата публикации

Systems, Apparatuses, and Methods for Performing Conflict Detection and Broadcasting Contents of a Register to Data Element Positions of Another Register

Номер: US20140095843A1
Принадлежит: Intel Corp

Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting.

Подробнее
03-04-2014 дата публикации

Tracking Operand Liveliness Information in a Computer System and Performing Function Based on the Liveliness Information

Номер: US20140095848A1

Operand liveness state information is maintained during context switches for current architected operands of executing programs; the current operand state information indicates whether corresponding current operands are enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA) for disabling current architected operands. A current operand is accessed by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module. 1. A computer system for maintaining liveness information for executing programs, the system comprising: a processor configured to communicate with a main storage, the processor comprising an instruction fetcher, an instruction optimizer and one or more execution units for executing optimized instructions, the processor configured to perform a method comprising: maintaining, by the processor, current operand state information, the current operand state information indicating whether corresponding current operands are any one of enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA), the first program module currently being executed by the processor; and accessing a current operand, by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module. 2. The computer system according to claim 1, further comprising: based on the current operand being disabled, the accessing comprising at least one of a) and b): a) returning an architecture-specified value, where the architecture-specified value is any one of an undefined ...

Подробнее
06-01-2022 дата публикации

COMMAND-AWARE HARDWARE ARCHITECTURE

Номер: US20220004388A1
Принадлежит: Lilac Cloud, Inc.

In an embodiment, responsive to determining: (a) a first command is not of a particular command type associated with one or more hardware modules associated with a particular routing node, or (b) at least one argument used for executing the first command is not available: transmitting the first command to another routing node in the hardware routing mesh. Upon receiving a second command of the command bundle and determining: (a) the second command is of the particular command type associated with the hardware module(s), and (b) arguments used by the second command are available: transmitting the second command to the hardware module(s) associated with the particular routing node for execution by the hardware module(s). Thereafter, the command bundle is modified based on execution of the second command by at least refraining from transmitting the second command of the command bundle to any other routing nodes in the hardware routing mesh. 1. A hardware routing mesh, comprising: a plurality of routing nodes associated respectively with a plurality of hardware modules, the plurality of routing nodes comprising a first routing node communicatively coupled to a first hardware module of the plurality of hardware modules, the first routing node configured to perform operations comprising: receiving a first command of a command bundle being streamed through the plurality of routing nodes, wherein the command bundle is modified based on execution of commands as the command bundle is streamed through the plurality of routing nodes; responsive to determining that (a) the first command is not of a particular command type associated with the first hardware module, or (b) at least one argument used for executing the first command is not received in association with the first command: transmitting the first command to a second routing node; receiving a second command of the command bundle being streamed through the plurality of routing nodes; and responsive to determining that (a) the second command is of the particular command type ...

Подробнее
06-01-2022 дата публикации

Method and Apparatus of Providing a Function as a Service (FAAS) Deployment of an Application

Номер: US20220004422A1
Принадлежит:

Disclosed are a method and an apparatus for providing a function as a service (FAAS) deployment of an application. A deployment unit is generated per group of application blocks, where said deployment unit comprises said group of application blocks and an implementation of function invocation for functions being accessed by groups of application blocks. Function invocations of the group of application blocks are constrained or bound to libraries of supporting implementations. Deployment units are provided, together with the element invocations attached to said libraries, to a lifecycle manager of a FAAS platform, whereby the FAAS platform implements the FAAS deployment of said application, the performance targets of which are related to the groups of application blocks. This disclosure enables a developer to adjust the performance of an application without having to change the logic of application implementations. 1.-17. (canceled) 18. A method of providing a function as a service (FAAS) deployment of an application comprising application blocks, the method comprising, by an application builder component: generating a deployment unit per group of application blocks, where the deployment unit comprises, in addition to the group of application blocks, an implementation of function invocation for each function being accessed by each group of application blocks, and where generating the deployment unit comprises constraining function invocations of the group of application blocks to one or more libraries of implementations; and providing the deployment units, together with the function invocations attached to the one or more libraries of implementations, to a lifecycle manager of a FAAS platform that is connected to the application builder component; whereby the FAAS platform implements the FAAS deployment of the application, the performance targets of which are related to the groups of application blocks. 19. The method of claim 18, wherein the groups of application ...

Подробнее
06-01-2022 дата публикации

TRANSACTION-ENABLED METHODS FOR PROVIDING PROVABLE ACCESS TO A DISTRIBUTED LEDGER WITH A TOKENIZED INSTRUCTION SET

Номер: US20220004927A1
Автор: Cella Charles Howard
Принадлежит:

Transaction-enabled methods for providing provable access to a distributed ledger with a tokenized instruction set for polymer production processes are described. A method may include accessing a distributed ledger comprising an instruction set for a polymer production process and tokenizing the instruction set. The method may further include interpreting an instruction set access request and providing a provable access to the instruction set. The method may further include providing commands to a production tool of the polymer production process and recording the transaction on the distributed ledger. 1. A method , comprising:accessing a distributed ledger comprising an instruction set, wherein the instruction set comprises an instruction set for a polymer production process;tokenizing the instruction set;interpreting an instruction set access request;in response to the instruction set access request, providing a provable access to the instruction set;providing commands to a production tool of the polymer production process in response to the instruction set access request; andrecording a transaction on the distributed ledger in response to the providing commands to the production tool.2. The method of claim 1 , wherein the instruction set comprises an instruction set for a chemical synthesis subprocess of the polymer production process.3. The method of claim 2 , further comprising providing commands to a production tool of the chemical synthesis subprocess of the polymer production process in response to the instruction set access request and recording a transaction on the distributed ledger in response to the providing commands to the production tool of the chemical synthesis subprocess of the polymer production process.4. The method of claim 1 , wherein the instruction set comprises a field programmable gate array (FPGA) instruction set.5. The method of claim 1 , wherein the instruction set further includes an application programming interface (API).6. The ...

Подробнее
06-01-2022 дата публикации

SYSTEM AND METHOD FOR EFFICIENT MULTI-GPU RENDERING OF GEOMETRY BY SUBDIVIDING GEOMETRY

Номер: US20220005146A1
Автор: Cerny Mark E.
Принадлежит:

A method for graphics processing. The method includes rendering graphics for an application using graphics processing units (GPUs). The method includes using the plurality of GPUs in collaboration to render an image frame including a plurality of pieces of geometry. The method includes, during the rendering of the image frame, subdividing one or more of the plurality of pieces of geometry into smaller pieces, and dividing the responsibility for rendering these smaller portions of geometry among the plurality of GPUs, wherein each of the smaller portions of geometry is processed by a corresponding GPU. The method includes, for those pieces of geometry that are not subdivided, dividing the responsibility for rendering the pieces of geometry among the plurality of GPUs, wherein each of these pieces of geometry is processed by a corresponding GPU. 1. A method for graphics processing, comprising: rendering graphics for an application using a plurality of graphics processing units (GPUs); using the plurality of GPUs in collaboration to render an image frame including a plurality of pieces of geometry; during the rendering of the image frame, subdividing one or more of the plurality of pieces of geometry into smaller pieces, and dividing the responsibility for rendering these smaller portions of geometry among the plurality of GPUs, wherein each of the smaller portions of geometry is processed by a corresponding GPU; and for those pieces of geometry that are not subdivided, dividing the responsibility for rendering the pieces of geometry among the plurality of GPUs, wherein each of these pieces of geometry is processed by a corresponding GPU. 2. The method of claim 1, wherein a process for rendering the image frame includes a geometry analysis phase of rendering, or a Z pre-pass phase of rendering, or a geometry pass phase of rendering. 3. The method of claim 2, further comprising: during the geometry analysis phase of rendering, or Z pre-pass phase of ...

Подробнее
05-01-2017 дата публикации

VARIABLE LATENCY PIPE FOR INTERLEAVING INSTRUCTION TAGS IN A MICROPROCESSOR

Номер: US20170003969A1
Принадлежит:

Techniques disclosed herein describe a variable latency pipe for interleaving instruction tags in a processor. According to one embodiment presented herein, an instruction tag is associated with an instruction upon issue of the instruction from the issue queue. One of a plurality of positions in the latency pipe is determined. The pipe stores one or more instruction tags, each associated with a respective instruction. The pipe also stores the instruction tags in a respective position based on the latency of each respective instruction. The instruction tag is stored at the determined position in the pipe.

1-7. (canceled)

8. A processor, comprising: an issue queue configured to store an instruction therein and further configured to issue the instruction; a tagger configured to associate the instruction with an instruction tag upon the issue of the instruction from the issue queue; and a pipe having a plurality of positions ordered from a head position to a tail position, the pipe configured to determine one of the plurality of positions to store the instruction tag and further configured to store the instruction tag at the determined position, wherein the pipe stores one or more instruction tags each associated with a respective instruction, the pipe stores the one or more instruction tags in a respective position based on the latency of each of the respective instructions, and the pipe determines the position of the instruction tag based on a latency of the instruction relative to the latency of each of the respective instructions.

9. The processor of claim 8, wherein the pipe is further configured to: broadcast an instruction tag stored at the tail position in the pipe; and remove the instruction tag from the pipe.

10. The processor of claim 9, wherein the broadcasted instruction tag wakes up an instruction in the issue queue that is dependent on an instruction associated with the broadcasted instruction tag.

11. The processor of claim 9, wherein the broadcasted instruction ...
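The latency-ordered tag pipe described in this entry can be modeled with a short sketch: a tag issued for an instruction with latency N is placed N slots from the tail, and every cycle the tag at the tail is broadcast and removed. This is a minimal illustrative model, not the patented implementation; the class and method names (`LatencyPipe`, `issue`, `tick`) and the fixed depth are assumptions.

```python
class LatencyPipe:
    def __init__(self, depth):
        self.slots = [None] * depth  # index 0 is the tail position

    def issue(self, tag, latency):
        # Interleave the tag among in-flight tags: an instruction with
        # latency N is stored N slots away from the tail.
        assert 0 < latency <= len(self.slots) and self.slots[latency - 1] is None
        self.slots[latency - 1] = tag

    def tick(self):
        # Broadcast the tag at the tail (waking dependents in the issue
        # queue), remove it, and shift all remaining tags toward the tail.
        broadcast = self.slots[0]
        self.slots = self.slots[1:] + [None]
        return broadcast

pipe = LatencyPipe(depth=4)
pipe.issue("add#7", latency=2)  # 2-cycle instruction
pipe.issue("mul#9", latency=4)  # 4-cycle instruction
completions = [pipe.tick() for _ in range(4)]
print(completions)  # "add#7" is broadcast on cycle 2, "mul#9" on cycle 4
```

Storing tags by remaining latency lets one pipe track instructions of different latencies without per-instruction countdown logic.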

Publication date: 07-01-2016

Systems And Methods For Processing Inline Constants

Number: US20160004536A1
Assignee: Freescale Semiconductor Inc.

Disclosed is a digital processor comprising an instruction memory having a first input, a second input, a first output, and a second output. A program counter register is in communication with the first input of the instruction memory. The program counter register is configured to store an address of an instruction to be fetched. A data pointer register is in communication with the second input of the instruction memory. The data pointer register is configured to store an address of a data value in the instruction memory. An instruction buffer is in communication with the first output of the instruction memory. The instruction buffer is arranged to receive an instruction according to a value at the program counter register. A data buffer is in communication with the second output of the instruction memory. The data buffer is arranged to receive a data value according to a value at the data pointer register.

1. A digital processor, comprising: an instruction memory having a first input, a second input, a first output, and a second output; a program counter register in communication with the first input of the instruction memory, the program counter register configured to store an address of an instruction to be fetched; a data pointer register in communication with the second input of the instruction memory, the data pointer register configured to store an address of a data value in the instruction memory; an instruction buffer in communication with the first output of the instruction memory, the instruction buffer arranged to receive an instruction according to a value at the program counter register; and a data buffer in communication with the second output of the instruction memory, the data buffer arranged to receive a data value according to a value at the data pointer register.

2. The digital processor of claim 1, further comprising: a register file; an instruction decode function that receives the instruction from the instruction buffer, decodes the instruction, and ...
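The dual-port arrangement above can be sketched as a memory read with two independent addresses, one from the program counter and one from the data pointer, delivered to separate buffers in the same cycle. A minimal illustrative model; the class name and encoding are assumptions, not from the patent.

```python
class DualPortIMem:
    """Instruction memory with two read ports: one for instructions
    (addressed by the PC), one for inline constants (addressed by the
    data pointer)."""

    def __init__(self, contents):
        self.mem = list(contents)

    def fetch(self, pc, dp):
        # First output feeds the instruction buffer, second output feeds
        # the data buffer; both reads hit the same physical memory.
        return self.mem[pc], self.mem[dp]

# Instruction stream and an inline constant share one memory image.
imem = DualPortIMem(["load r1, [dp]", "add r2, r1, r1", 0x2A])
instr_buf, data_buf = imem.fetch(pc=0, dp=2)
print(instr_buf, data_buf)  # the constant 0x2A is read alongside the instruction
```

The point of the design is that constants embedded in the instruction stream need no separate data memory or extra fetch cycle.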

Publication date: 07-01-2016

COMMITTING HARDWARE TRANSACTIONS THAT ARE ABOUT TO RUN OUT OF RESOURCE

Number: US20160004537A1
Assignee:

A transactional memory system determines whether a hardware transaction can be salvaged. A processor of the transactional memory system begins execution of a transaction in a transactional memory environment. Based on detection that an amount of available resource for transactional execution is below a predetermined threshold level, the processor determines whether the transaction can be salvaged. Based on determining that the transaction cannot be salvaged, the processor aborts the transaction. Based on determining the transaction can be salvaged, the processor performs a salvage operation, wherein the salvage operation comprises one or more of: determining that the transaction can be brought to a stable state without exceeding the amount of available resource for transactional execution, and bringing the transaction to a stable state; and determining that a resource can be made available, and making the resource available.

1. A method for determining whether a hardware transaction can be salvaged, the method comprising: beginning execution, by a processor, of a transaction in a transactional memory environment; based on detection that an amount of available resource for transactional execution is below a predetermined threshold level, determining, by the processor, whether the transaction can be salvaged; based on determining that the transaction cannot be salvaged, aborting, by the processor, the transaction; and based on determining the transaction can be salvaged, performing, by the processor, a salvage operation, wherein the salvage operation comprises one or more of: a) determining, by the processor, that the transaction can be brought to a stable state without exceeding the amount of available resource for transactional execution, and bringing the transaction to a stable state; and b) determining, by the processor, that a resource can be made available, and making the resource available.

2. The method of claim 1, further comprising, based on ...

Publication date: 04-01-2018

Administering instruction tags in a computer processor

Number: US20180004516A1
Assignee: International Business Machines Corp

Administering ITAGs in a computer processor includes, for each instruction in a single-thread mode: incrementing a value of a wrap-around counter; setting a wrap bit to a predefined value if incrementing the value causes the counter to wrap around; and generating, in dependence upon the counter value and the wrap bit, an ITAG for the instruction, the ITAG comprising a bit string having a wrap bit and an index comprising the counter value. For each instruction in a multi-thread mode, it includes: incrementing the value of the wrap-around counter; setting a wrap bit to a predefined value if incrementing the value causes the counter to wrap around; and generating, in dependence upon the counter value and the wrap bit, an ITAG for the instruction, the ITAG comprising a bit string having the wrap bit, a thread identifier, and an index comprising the counter value.
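The ITAG generation described above reduces to a small bit-packing routine. The sketch below assumes a 3-bit index and a 1-bit thread identifier purely for brevity; real field widths would differ, and the function name is invented.

```python
INDEX_BITS = 3  # illustrative index width; real designs use wider counters

def next_itag(counter, wrap, thread_id=None):
    # Increment the wrap-around counter and toggle the wrap bit on overflow.
    counter = (counter + 1) & ((1 << INDEX_BITS) - 1)
    if counter == 0:
        wrap ^= 1
    if thread_id is None:
        # single-thread mode: ITAG = wrap bit | index
        itag = (wrap << INDEX_BITS) | counter
    else:
        # multi-thread mode: ITAG = wrap bit | thread id | index
        itag = (wrap << (INDEX_BITS + 1)) | (thread_id << INDEX_BITS) | counter
    return itag, counter, wrap

itag, counter, wrap = next_itag(counter=6, wrap=0)
print(bin(itag))   # 0b111: index 7, wrap bit still 0
itag, counter, wrap = next_itag(counter, wrap)
print(bin(itag))   # 0b1000: counter wrapped to 0, wrap bit toggled to 1
```

The wrap bit lets age comparisons between two ITAGs stay correct across one counter wrap without widening the index field.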

Publication date: 04-01-2018

OPERATION OF A MULTI-SLICE PROCESSOR IMPLEMENTING PRIORITIZED DEPENDENCY CHAIN RESOLUTION

Number: US20180004527A1
Assignee:

Operation of a computer processor that includes: receiving a first instruction indicating a first target register; receiving, from an instruction fetch unit of the computer processor, a first instruction and a branch instruction; responsive to determining that the branch instruction is dependent upon a result of the first instruction, updating a priority value corresponding to the first instruction; and issuing, in dependence upon the priority value for the first instruction having a higher priority than a priority value for another instruction, the first instruction to an execution unit of the computer processor.

1. A method of operation of a computer processor, wherein the method comprises: receiving, from an instruction fetch unit of the computer processor, a first instruction and a branch instruction; responsive to determining that the branch instruction is dependent upon a result of the first instruction, updating a priority value corresponding to the first instruction; and issuing, in dependence upon the priority value for the first instruction having a higher priority than a priority value for another instruction, the first instruction to an execution unit of the computer processor.

2. The method of claim 1, wherein issuing the first instruction is further dependent upon the first instruction being a ready instruction among a plurality of ready instructions.

3. The method of claim 2, wherein the first instruction is a compare instruction, and wherein execution of the compare instruction sets a condition code flag within a condition code register, and wherein the branch instruction determines whether to branch in dependence upon a value of the condition code flag of the condition code register.

4. The method of claim 3, wherein determining that the branch instruction is dependent upon the result of the first instruction comprises decoding the branch instruction to determine that the branch instruction depends upon the condition code flag of the ...
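The priority bump this entry describes can be sketched as follows: when a pending branch is found to depend on an instruction's result, that instruction's issue priority is raised so the branch resolves sooner. All field names and the priority scheme are illustrative assumptions, not from the patent.

```python
def update_priorities(queue, branch_source_reg):
    # Raise the priority of any queued instruction whose result the
    # branch depends on (e.g. the condition-code register).
    for entry in queue:
        if entry["target"] == branch_source_reg:
            entry["priority"] += 1

def pick_ready(queue):
    # Issue the highest-priority ready instruction.
    ready = [e for e in queue if e["ready"]]
    return max(ready, key=lambda e: e["priority"])["op"]

queue = [
    {"op": "cmp r1, r2", "target": "cc", "priority": 0, "ready": True},
    {"op": "mul r3, r4", "target": "r3", "priority": 0, "ready": True},
]
update_priorities(queue, branch_source_reg="cc")  # the branch reads "cc"
print(pick_ready(queue))  # the compare feeding the branch issues first
```

Resolving the branch earlier shortens the window in which wrong-path instructions can be fetched.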

Publication date: 04-01-2018

Advanced processor architecture

Number: US20180004530A1
Author: Martin Vorbach
Assignee: HYPERION CORE Inc

The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises: 1) looking up operand sources in a Register Positioning Table (RPT) and setting the operand input references of the instruction to be issued accordingly; 2) checking for an Execution Unit (EXU) available to receive a new instruction; and 3) issuing the instruction to the available Execution Unit and entering a reference to the result register addressed by the issued instruction into the Register Positioning Table.

Publication date: 04-01-2018

TECHNIQUES FOR HYBRID COMPUTER THREAD CREATION AND MANAGEMENT

Number: US20180004554A1
Assignee:

A technique for operating a computer system to support an application, a first application server environment, and a second application server environment includes intercepting a work request relating to the application issued to the first application server environment prior to execution of the work request. A thread adapted for execution in the first application server environment is created. A context is attached to the thread that non-disruptively modifies the thread into a hybrid thread that is additionally suitable for execution in the second application server environment. The hybrid thread is returned to the first application server environment.

1. A method of operating a computer system to support an application, a first application server environment, and a second application server environment, the method comprising: intercepting, by a request interceptor component executing on the computer system, a work request relating to the application issued to the first application server environment prior to execution of the work request; responsive to the request interceptor component, creating, using the computer system, a thread adapted for execution in the first application server environment by an executor component; responsive to the executor component, attaching to the thread, by a thread dispatcher component executing on the computer system, a context to non-disruptively modify the thread into a hybrid thread that is additionally suitable for execution in the second application server environment; and responsive to the thread dispatcher component, returning the hybrid thread to the first application server environment by a catcher component executing on the computer system.

2. The method of claim 1, wherein the context comprises transactional control data of one of the application server environments, security control data of one of the application server environments, monitoring control data of one of the application server environments ...

Publication date: 07-01-2021

Energy Efficient Processor Core Architecture for Image Processor

Number: US20210004232A1
Assignee:

An apparatus that includes a program controller to fetch and issue instructions is described. The apparatus includes an execution lane having at least one execution unit to execute the instructions. The execution lane is part of an execution lane array that is coupled to a two-dimensional shift register array structure, wherein execution lanes of the execution lane array are located at respective array locations and are coupled to dedicated registers at the same respective array locations in the two-dimensional shift register array.

1. (canceled)

2. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an input program to be executed on a device comprising a plurality of random access memories and a plurality of execution lanes, wherein different groups of the execution lanes are assigned to use a different respective random access memory of the plurality of random access memories; determining that the input program specifies two or more execution lanes in a same group of the plurality of execution lanes to compete for different memory locations in a same random access memory of the plurality of random access memories; and in response, modifying the input program to generate multiple instructions that cause execution lanes within each group to access a respective random access memory sequentially.

3. The system of claim 2, wherein the plurality of execution lanes includes a plurality of rows of execution lanes, wherein the different groups of the execution lanes comprise different rows of the execution lanes.

4. The system of claim 2, wherein the plurality of execution lanes includes a plurality of columns of execution lanes, wherein the different groups of the execution lanes comprise different columns of the execution lanes.

5. The system of claim 2, wherein the ...

Publication date: 02-01-2020

HIGH PARALLELISM COMPUTING SYSTEM AND INSTRUCTION SCHEDULING METHOD THEREOF

Number: US20200004514A1
Assignee:

A high parallelism computing system and an instruction scheduling method thereof are disclosed. The computing system comprises: an instruction reading and distribution module for reading a plurality of types of instructions in a specific order, and distributing the acquired instructions to corresponding function modules according to the types; an internal buffer for buffering data and instructions for performing computation; and a plurality of function modules, each of which sequentially executes instructions of the present type distributed by the instruction reading and distribution module and reads the data from the internal buffer. The specific order is obtained by topologically sorting the instructions according to a directed acyclic graph consisting of the types and dependency relationships. By reading the instructions based on a topological sorting of the directed acyclic graph constructed according to the types and dependency relationships, deadlock caused by instruction dependencies can be avoided by a relatively simple operation.

1. A system, comprising: an instruction reading and distribution module for reading a plurality of types of instructions in a specific order, and distributing the instructions to corresponding function modules according to the types; an internal buffer for buffering data and instructions for performing computation; and a plurality of function modules each of which sequentially executes instructions of a present type distributed by the instruction reading and distribution module and reads the data from the internal buffer; wherein the specific order is obtained by topologically sorting the instructions according to a directed acyclic graph consisting of the types and dependency relationships.

2. The system of claim 1, wherein the directed acyclic graph simplifies dependencies of a certain instruction on two or more instructions of another type into a direct dependency on the last instruction in the two or more instructions ...
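The "specific order" above is a standard topological sort of the instruction dependency DAG; a Kahn-style sketch is shown below. The instruction names and dependency map are invented for illustration and are not from the patent.

```python
from collections import deque

def topo_order(instrs, deps):
    # deps maps an instruction to the instructions it depends on.
    indegree = {i: 0 for i in instrs}
    users = {i: [] for i in instrs}
    for node, sources in deps.items():
        for src in sources:
            indegree[node] += 1
            users[src].append(node)
    # Start from instructions with no unsatisfied dependencies.
    ready = deque(i for i in instrs if indegree[i] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for user in users[node]:
            indegree[user] -= 1
            if indegree[user] == 0:
                ready.append(user)
    # If some instruction never became ready, the graph has a cycle and
    # no deadlock-free issue order exists.
    assert len(order) == len(instrs), "cycle: instructions cannot be scheduled"
    return order

instrs = ["LOAD", "CONV", "SAVE"]
deps = {"CONV": ["LOAD"], "SAVE": ["CONV"]}
print(topo_order(instrs, deps))  # ['LOAD', 'CONV', 'SAVE']
```

Issuing instructions in a topological order guarantees that no function module ever waits on an instruction that is queued behind it, which is how the deadlock is avoided.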

Publication date: 02-01-2020

SYSTEMS AND METHODS TO PREDICT LOAD DATA VALUES

Number: US20200004536A1
Assignee:

Disclosed embodiments relate to predicting load data. In one example, a processor includes a pipeline having stages ordered as fetch, decode, allocate, write back, and commit; a training table to store an address, predicted data, a state, and a count of instances of unchanged return data; and tracking circuitry to determine, during one or more of the allocate and decode stages, whether a training table entry has a first state and matches a fetched first load instruction, and, if so, using the data predicted by the entry during the execute stage, the tracking circuitry further to update the training table during or after the write back stage to set the state of the first load instruction in the training table to the first state when the count reaches a first threshold.

1. A processor comprising: fetch and decode circuitry to fetch and decode load instructions; a pipeline having stages ordered as fetch, decode, allocate, write back, and commit; a training table to store, for each of a plurality of load instructions, an address, predicted data, a state, and a count of instances of unchanged return data; and tracking circuitry to determine, during one or more of the allocate and decode stages, whether a training table entry has a first state and matches a fetched first load instruction, and, if so, using the data predicted by the entry during the execute stage, the tracking circuitry further to update the training table during or after the write back stage by: when no match exists, adding a new entry reflecting the first load instruction; when a match exists, but has different predicted data than the data returned for the first load instruction, reset the count and set the state to a second state; and when a matching entry with matching predicted data exists, increment the count and, when the incremented count reaches a first threshold, set the state to the first state.

2. The processor of claim 1, wherein, when the predicted data is used to optimize execution during ...
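The training-table update rules above can be sketched directly. This assumes a plain dict keyed by load address; the threshold value, field names, and state labels are illustrative, not taken from the patent.

```python
FIRST_THRESHOLD = 2
STABLE, TRAINING = "stable", "training"

def train(table, address, returned_data):
    entry = table.get(address)
    if entry is None:
        # No match: add a new entry for this load.
        table[address] = {"data": returned_data, "count": 0, "state": TRAINING}
    elif entry["data"] != returned_data:
        # Returned data changed: reset the count and leave the stable state.
        entry.update(data=returned_data, count=0, state=TRAINING)
    else:
        # Unchanged return data: count it, and mark the entry stable once
        # the count reaches the threshold.
        entry["count"] += 1
        if entry["count"] >= FIRST_THRESHOLD:
            entry["state"] = STABLE  # prediction may now be used at execute

table = {}
for _ in range(3):
    train(table, 0x1000, 42)
print(table[0x1000]["state"])  # stable after repeated identical returns
```

Only entries in the stable state supply predicted data at execute; a single changed return value demotes the entry back to training.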

Publication date: 02-01-2020

APPARATUSES, METHODS, AND SYSTEMS FOR CONDITIONAL OPERATIONS IN A CONFIGURABLE SPATIAL ACCELERATOR

Number: US20200004538A1
Assignee:

Systems, methods, and apparatuses relating to conditional operations in a configurable spatial accelerator are described. In one embodiment, a hardware accelerator includes an output buffer of a first processing element coupled to an input buffer of a second processing element via a first data path that is to send a first dataflow token from the output buffer of the first processing element to the input buffer of the second processing element when the first dataflow token is received in the output buffer of the first processing element; an output buffer of a third processing element coupled to the input buffer of the second processing element via a second data path that is to send a second dataflow token from the output buffer of the third processing element to the input buffer of the second processing element when the second dataflow token is received in the output buffer of the third processing element; a first backpressure path from the input buffer of the second processing element to the first processing element to indicate to the first processing element when storage is not available in the input buffer of the second processing element; a second backpressure path from the input buffer of the second processing element to the third processing element to indicate to the third processing element when storage is not available in the input buffer of the second processing element; and a scheduler of the second processing element to cause storage of the first dataflow token from the first data path into the input buffer of the second processing element when both the first backpressure path indicates storage is available in the input buffer of the second processing element and a conditional token received in a conditional queue of the second processing element from another processing element is a first value.

1. An apparatus comprising: an output buffer of a first processing element coupled to an input buffer of a second processing element via a first data path that is ...

Publication date: 02-01-2020

System and method for prediction of multiple read commands directed to non-sequential data

Number: US20200004540A1
Assignee: Western Digital Technologies Inc

Systems and methods for predicting read commands and pre-fetching data when a memory device is receiving random read commands to non-sequentially addressed data locations are disclosed. A limited length search sequence of prior read commands is generated and that search sequence is then converted into an index value in a predetermined set of index values. A history pattern match table having entries indexed to that predetermined set of index values contains a plurality of read commands that have previously followed the search sequence represented by the index value. The index value is obtained via application of a many-to-one algorithm to the search sequence. The index value obtained from the search sequence may be used to find, and pre-fetch data for, a plurality of next read commands in the table that previously followed a search sequence having that index value.
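The history pattern match table above reduces to a many-to-one hash of the recent read sequence into a fixed-size table that remembers which read followed that sequence. The sketch below is illustrative only: the table size, history length, and the use of Python's built-in `hash` as the many-to-one algorithm are all assumptions.

```python
TABLE_SIZE = 64    # fixed number of index values (assumed)
HISTORY_LEN = 3    # limited-length search sequence (assumed)

def history_index(search_sequence):
    # Many-to-one reduction of the search sequence to a table index.
    return hash(tuple(search_sequence)) % TABLE_SIZE

def record(table, history, next_read):
    # Remember which read command previously followed this history.
    table[history_index(history)] = next_read

def predict(table, history):
    # Look up a pre-fetch candidate for the current history, if any.
    return table.get(history_index(history))

table = {}
record(table, [0x10, 0x80, 0x24], next_read=0x90)
print(hex(predict(table, [0x10, 0x80, 0x24])))  # pre-fetch candidate: 0x90
```

Because the mapping is many-to-one, distinct histories can collide on one index; the scheme trades occasional wrong pre-fetches for a small, bounded table.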

Publication date: 02-01-2020

Determining and predicting derived values

Number: US20200004544A1
Assignee: International Business Machines Corp

A predicted value to be used in register-indirect branching is predicted. The predicted value is to be stored in one or more locations based on the prediction. An offset for a predicted derived value is obtained. The predicted derived value is to be used as a pointer to a reference data structure providing access to variables used in processing. The predicted derived value is generated using the predicted value and the offset. The predicted derived value is used to access the reference data structure during processing.

Publication date: 02-01-2020

Information processing device and non-transitory computer readable medium

Number: US20200004545A1
Author: Atsushi Monna
Assignee: Fuji Xerox Co Ltd

An information processing device includes a generation unit and a providing unit. The generation unit generates data corresponding to a process specified by a user. The providing unit provides, after generation of the data, identification information to be used by the user to issue an instruction for a post-process on the data generated by the generation unit.

Publication date: 02-01-2020

SHARED COMPARE LANES FOR DEPENDENCY WAKE UP IN A PAIR-BASED ISSUE QUEUE

Number: US20200004546A1
Assignee:

An apparatus for shared compare lanes for dependency wakeup in a double issue queue includes a source dependency module that determines a number of source dependencies for two instructions to be paired in a row of a double issue queue of a processor. A source dependency includes an unavailable status of a dependent source for data required by the two instructions, where the data is produced by another instruction. The apparatus includes a pairing determination module that writes each of the two instructions into a separate row of the double issue queue in response to the source dependency module determining that the number of source dependencies is greater than a source dependency maximum, and pairs the two instructions in one row of the double issue queue in response to the source dependency module determining that the number of source dependencies is less than or equal to the source dependency maximum.

1. An apparatus comprising: a source dependency module that determines a number of source dependencies for two instructions intended to be paired in a row of a double issue queue of a processor, a source dependency comprising an unavailable status of a dependent source for data required by the two instructions where the data is produced by another instruction; and a pairing determination module that writes each of the two instructions into a separate row of the double issue queue in response to the source dependency module determining that the number of source dependencies is greater than a source dependency maximum and that pairs the two instructions in one row of the double issue queue in response to the source dependency module determining that the number of source dependencies is less than or equal to the source dependency maximum.

2. The apparatus of claim 1, wherein the source dependency maximum is equal to a number of dependency trackers available to the double issue queue, the dependency trackers each tracking a source dependency of paired instructions ...
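The pairing rule above is a simple threshold test: pair two instructions in one row only when their combined unresolved source dependencies fit the shared dependency trackers. The sketch below models the queue as a list of rows; the maximum and all names are illustrative assumptions.

```python
SOURCE_DEPENDENCY_MAX = 2  # assumed number of shared dependency trackers

def place(issue_queue, instr_a, instr_b, unresolved_sources):
    # unresolved_sources: the set of source registers the pair still
    # waits on (i.e. their producers have not yet broadcast results).
    if len(unresolved_sources) > SOURCE_DEPENDENCY_MAX:
        # Too many dependencies to track in one row: split the pair.
        issue_queue.append([instr_a])
        issue_queue.append([instr_b])
    else:
        # The shared compare lanes can track all dependencies: pair them.
        issue_queue.append([instr_a, instr_b])

queue = []
place(queue, "add r1,r2,r3", "sub r4,r1,r5", unresolved_sources={"r2", "r3", "r5"})
place(queue, "or r6,r7,r8", "and r9,r6,r7", unresolved_sources={"r7"})
print(queue)  # first pair split across rows, second pair shares a row
```

Sharing compare lanes halves the per-row wakeup hardware at the cost of occasionally refusing to pair dependency-heavy instructions.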

Publication date: 02-01-2020

Apparatus and method for using predicted result values

Number: US20200004547A1
Assignee: ARM LTD

An apparatus and method are provided for using predicted result values. The apparatus has a processing unit that comprises processing circuitry for executing a sequence of instructions, and value prediction circuitry for identifying a predicted result value for at least one instruction. A result producing structure is provided that is responsive to a request issued from the processing unit when the processing circuitry is executing a first instruction, to produce a result value for the first instruction and return that result value to the processing unit. While waiting for the result value from the result producing structure, the processing circuitry can be arranged to speculatively execute at least one dependent instruction using a predicted result value for the first instruction as obtained from the value prediction circuitry. The request issued from the processing unit includes a signature value indicative of the predicted result value, and the result producing structure references the signature value in order to detect whether a mispredict condition exists indicating that the predicted result value differs from the result value. The apparatus further provides a mispredict signal transmission path via which the result producing structure, when the mispredict condition is detected, can assert a mispredict signal for receipt by the processing unit prior to the result value being available to the processing unit. Such an approach can reduce the misprediction penalty associated with using a mispredicted result value.

Publication date: 02-01-2020

Combining load or store instructions

Number: US20200004550A1
Assignee: Qualcomm Inc

Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.
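The combining check described above can be sketched as: two narrow accesses off the same base register may be replaced by one wider access when their combined footprint fits on a single cache line and within the bus width. The line size, bus width, and tuple encoding below are illustrative assumptions.

```python
CACHE_LINE = 64  # bytes per cache line (assumed)
BUS_WIDTH = 8    # bytes per memory access (assumed)

def try_combine(load_a, load_b):
    # Each load is (base_register, offset, size_in_bytes).
    base_a, off_a, size_a = load_a
    base_b, off_b, size_b = load_b
    if base_a != base_b:
        return None  # must reference a common base register
    lo = min(off_a, off_b)
    hi = max(off_a + size_a, off_b + size_b)
    same_line = lo // CACHE_LINE == (hi - 1) // CACHE_LINE
    if same_line and hi - lo <= BUS_WIDTH:
        return (base_a, lo, hi - lo)  # one equivalent wider access
    return None

# Two 4-byte loads at [sp+0] and [sp+4] become one 8-byte load.
print(try_combine(("sp", 0, 4), ("sp", 4, 4)))  # ('sp', 0, 8)
```

The replacement uses more of the available bus width per access, so the same data moves in fewer pipeline slots.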

Publication date: 02-01-2020

APPARATUS AND METHOD FOR USING PREDICTED RESULT VALUES

Number: US20200004551A1
Assignee:

An apparatus and method are provided for using predicted result values. The apparatus has processing circuitry for executing a sequence of instructions, and value prediction storage that comprises a plurality of entries, where each entry is used to identify a predicted result value for an instruction allocated to that entry. Dispatch circuitry maintains a record of pending instructions awaiting execution by the processing circuitry, and selects pending instructions from the record for dispatch to the processing circuitry for execution. The dispatch circuitry is arranged to enable at least one pending instruction to be speculatively executed by the processing circuitry using as a source operand a predicted result value provided by the value prediction storage. Allocation circuitry is arranged to apply a default allocation policy to identify a first instruction to be allocated an entry in the value prediction storage. However, the allocation circuitry is further responsive to a trigger condition to identify a dependent instruction whose result value will be dependent on the result value produced by executing the first instruction, and to then allocate an entry in the value prediction storage to store a predicted result value for the identified dependent instruction. Such an approach can enable performance improvements to be achieved through the use of predicted result values even in situations where the prediction accuracy of the predicted result value for the first instruction proves not to be that high, by instead enabling a predicted result value for the dependent instruction to be used to allow speculative execution of further dependent instructions.

1. An apparatus comprising: processing circuitry to execute a sequence of instructions; value prediction storage comprising a plurality of entries, each entry being used to identify a predicted result value for an instruction allocated to that entry; dispatch circuitry to maintain a record of pending instructions ...

Publication date: 02-01-2020

COOPERATIVE WORKGROUP SCHEDULING AND CONTEXT PREFETCHING

Number: US20200004586A1
Assignee:

A first workgroup is preempted in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal. The first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value. A second workgroup is scheduled for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value. A third context is prefetched into registers of the processor core based on the first hint and the second value. The first context is stored in a first portion of the registers and the second context is prefetched into a second portion of the registers prior to preempting the first workgroup.

1. A method comprising: preempting a first workgroup in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal, wherein the first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value; scheduling a second workgroup for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value; and prefetching a third context into registers of the processor core based on the first hint and the second value.

2. The method of claim 1, wherein preempting the first workgroup comprises storing the first context in a memory, and wherein scheduling the second workgroup for execution comprises writing the second context from the memory into the registers of the processor core.

3. The method of claim 2, wherein writing the second context from the memory into the registers of the processor core comprises prefetching the second ...

Publication date: 03-01-2019

METHODS AND APPARATUS FOR HANDLING RUNTIME MEMORY DEPENDENCIES

Number: US20190004804A1
Assignee: Intel Corporation

An integrated circuit may include elastic datapaths or pipelines, through which software threads, or iterations of loops, may be executed. Throttling circuitry may be coupled along an elastic pipeline in the integrated circuit. The throttling circuitry may include dependency detection circuitry that dynamically detects memory dependency issues that may arise during runtime. To mitigate these dependency issues, the throttling circuitry may assert stall signals to upstream stages in the pipeline.

1. An integrated circuit, comprising: a memory circuit; and a pipelined datapath coupled to the memory circuit, the pipelined datapath comprises: memory access circuitry that reads from the memory circuit using a load address and that writes into the memory circuit using a store address; and throttling circuitry coupled to the memory access circuitry, the throttling circuitry is configured to compare the load address with the store address and to selectively stall a stage in the pipelined datapath based on the comparison.

2. The integrated circuit defined in claim 1, wherein the throttling circuitry comprises an address table configured to store a plurality of store addresses, and wherein the plurality of store addresses include the store address.

3. The integrated circuit defined in claim 1, wherein the throttling circuitry comprises an address table configured to store a plurality of load addresses, and wherein the plurality of load addresses include the load address.

4. The integrated circuit defined in claim 1, wherein the memory access circuitry comprises: a memory loading circuit that reads from the memory circuit using the load address; a memory storing circuit that writes into the memory circuit using the store address, wherein at least a portion of the throttling circuitry is interposed between the memory loading circuit and the stalled stage.

5. The integrated circuit defined in claim 4, wherein the pipelined datapath further comprises compute ...
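The throttling decision above comes down to an address comparison: stall an upstream stage whenever an incoming load address matches an in-flight store address in the address table. A minimal sketch; the set-based table and the function name are assumptions for illustration.

```python
def should_stall(load_address, store_address_table):
    # A match means the load may depend on a store that has not yet
    # committed to memory, so the upstream stage must wait.
    return load_address in store_address_table

in_flight_stores = {0x100, 0x140}
print(should_stall(0x140, in_flight_stores))  # True: stall upstream stages
print(should_stall(0x180, in_flight_stores))  # False: no dependency hazard
```

Checking dynamically at runtime lets the pipeline run at full rate whenever no load actually aliases a pending store, instead of stalling conservatively on every loop iteration.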

03-01-2019 publication date

Stream processor with overlapping execution

Number: US20190004807A1
Assignee: Advanced Micro Devices Inc

Systems, apparatuses, and methods for implementing a stream processor with overlapping execution are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of execution pipelines. The processing throughput of the parallel processing unit is increased by overlapping execution of multi-pass instructions with single pass instructions without increasing the instruction issue rate. A first plurality of operands of a first vector instruction are read from a shared vector register file in a single clock cycle and stored in temporary storage. The first plurality of operands are accessed and utilized to initiate multiple instructions on individual vector elements on a first execution pipeline in subsequent clock cycles. A second plurality of operands are read from the shared vector register file during the subsequent clock cycles to initiate execution of one or more second vector instructions on the second execution pipeline.

03-01-2019 publication date

INSTRUCTIONS FOR REMOTE ATOMIC OPERATIONS

Number: US20190004810A1
Assignee:

Disclosed embodiments relate to atomic memory operations. In one example, a method of executing an instruction atomically and with weak order includes: fetching, by fetch circuitry, the instruction from code storage, the instruction including an opcode, a source identifier, and a destination identifier; decoding, by decode circuitry, the fetched instruction; selecting, by a scheduling circuit, an execution circuit among multiple circuits in a system; scheduling, by the scheduling circuit, execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance; and executing the decoded instruction, by the execution circuit, to: atomically read a datum from a location identified by the destination identifier, perform an operation on the datum as specified by the opcode, the operation to use a source operand identified by the source identifier, and write a result back to the location. 1. A processor to execute an instruction atomically and with weak order, the processor comprising: fetch circuitry to fetch the instruction from a code storage, the instruction comprising an opcode, a source identifier, and a destination identifier; decode circuitry to decode the fetched instruction; and a scheduling circuit to select an execution circuit among multiple circuits in the system to execute the instruction, the scheduling circuit further to schedule execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance; wherein the execution circuit is to execute the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance, wherein the executing comprises atomically reading a datum from a location identified by the destination identifier, performing an ...
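The atomic read-modify-write sequence in the abstract (read a datum from the destination location, apply the operation named by the opcode with the source operand, write the result back) can be modeled in a few lines. The `Memory` class, `OPS` table, and `rmw` name are assumptions for this sketch; a lock stands in for the hardware's atomicity guarantee.

```python
import threading

# Illustrative model of a remote atomic operation: read, apply the
# opcode's operation with a source operand, write back - all under a
# lock that plays the role of hardware atomicity.

OPS = {
    "add":  lambda a, b: a + b,
    "and":  lambda a, b: a & b,
    "xchg": lambda a, b: b,      # exchange: result is the source operand
}

class Memory:
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def rmw(self, opcode, dest, src_operand):
        """Atomic read-modify-write; returns the prior datum."""
        with self._lock:
            old = self._data.get(dest, 0)
            self._data[dest] = OPS[opcode](old, src_operand)
            return old

mem = Memory()
mem.rmw("add", 0x40, 5)
mem.rmw("add", 0x40, 3)
assert mem._data[0x40] == 8
assert mem.rmw("xchg", 0x40, 1) == 8   # returns the prior datum
assert mem._data[0x40] == 1
```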

03-01-2019 publication date

STREAM PROCESSOR WITH DECOUPLED CROSSBAR FOR CROSS LANE OPERATIONS

Number: US20190004814A1
Assignee:

Systems, apparatuses, and methods for implementing a decoupled crossbar for a stream processor are disclosed. In one embodiment, a system includes at least a multi-lane execution pipeline, a vector register file, and a crossbar. The system is configured to determine if a given instruction in an instruction stream requires a permutation on data operands retrieved from the vector register file. The system conveys the data operands to the multi-lane execution pipeline on a first path which includes the crossbar responsive to determining the given instruction requires a permutation on the data operands. The crossbar then performs the necessary permutation to route the data operands to the proper processing lanes. Otherwise, the system conveys the data operands to the multi-lane execution pipeline on a second path which bypasses the crossbar responsive to determining the given instruction does not require a permutation on the input operands. 1. A system comprising: a multi-lane execution pipeline; a vector register file; and a crossbar; wherein the system is configured to: retrieve a plurality of data operands from the vector register file; convey the plurality of data operands to the multi-lane execution pipeline via the crossbar responsive to determining a permutation is required; and convey the plurality of data operands to the multi-lane execution pipeline by bypassing the crossbar responsive to determining a permutation is not required. 2. The system as recited in claim 1, wherein the crossbar comprises multiple layers, and wherein the system is further configured to: perform a first permutation of data operands across lanes of the multi-lane execution pipeline with a first layer of N×N crossbars, wherein N is a positive integer; and perform a second permutation of data operands across lanes of the multi-lane execution pipeline with a second layer of N×N crossbars. 3. The system as recited in claim 1, wherein the crossbar comprises a first N/2-by-N/2 ...
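The route-or-bypass decision can be shown as a small function: operands pass through a permutation only when the instruction requires a cross-lane shuffle, and otherwise take the bypass path unchanged. The function name `route_operands` and its `permutation` argument are illustrative assumptions, not the patent's interface.

```python
# Sketch of the decoupled-crossbar routing decision: operands either
# pass through a permutation crossbar or bypass it, depending on
# whether the instruction needs a cross-lane shuffle.

def route_operands(operands, permutation=None):
    """Return operands in lane order; permute only when required."""
    if permutation is None:
        return list(operands)          # bypass path: no crossbar traversal
    # crossbar path: lane i receives the operand from lane permutation[i]
    return [operands[src] for src in permutation]

lanes = [10, 11, 12, 13]
assert route_operands(lanes) == [10, 11, 12, 13]
assert route_operands(lanes, permutation=[3, 2, 1, 0]) == [13, 12, 11, 10]
```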

03-01-2019 publication date

STREAMING ENGINE WITH SHORT CUT START INSTRUCTIONS

Number: US20190004853A1
Assignee:

A streaming engine employed in a digital data processor specifies a fixed read-only data stream recalled from memory. Streams are started by one of two types of stream start instructions. A stream start ordinary instruction specifies a register storing a stream start address and a register storing a stream definition template which specifies stream parameters. A stream start short-cut instruction specifies a register storing a stream start address and an implied stream definition template. A functional unit is responsive to a stream operand instruction to receive at least one operand from a stream head register. The stream template supports plural nested loops with short-cut start instructions limited to a single loop. The stream template supports data element promotion to larger data element size with sign extension or zero extension. A set of allowed stream short-cut start instructions includes various data sizes and promotion factors. 1. A digital data processor comprising: an instruction memory storing instructions each specifying a data processing operation and at least one data operand field; an instruction decoder connected to said instruction memory for sequentially recalling instructions from said instruction memory and determining said specified data processing operation and said specified at least one operand; at least one functional unit connected to said data register file and said instruction decoder for performing data processing operations upon at least one operand corresponding to an instruction decoded by said instruction decoder and storing results; and a streaming engine connected to said instruction decoder operable in response to a stream start instruction to recall from memory a ..., said streaming engine comprising: an address generator for generating stream memory addresses corresponding to said stream of an instruction specified sequence of a plurality of data elements, and a stream head register storing a data element of said stream next to be used by said at least one functional unit.
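A stream defined by per-loop counts and strides, as in the stream definition template above, can be modeled by a simple nested-loop address generator. The `(count, stride)` template fields and the function name are assumptions for this sketch; the real template encodes more parameters (element size, promotion, and so on).

```python
# Sketch of a streaming-engine address generator for nested loops: a
# stream definition template gives per-loop counts and strides, and
# the engine emits element addresses with no further instructions.

def stream_addresses(base, loops):
    """loops: list of (count, stride) pairs, outermost loop first."""
    addrs = [base]
    for count, stride in loops:        # expand one nesting level at a time
        addrs = [a + i * stride for a in addrs for i in range(count)]
    return addrs

# 2 rows of 4 consecutive 32-bit elements, rows 256 bytes apart
assert stream_addresses(0x1000, [(2, 0x100), (4, 4)]) == [
    0x1000, 0x1004, 0x1008, 0x100C,
    0x1100, 0x1104, 0x1108, 0x110C,
]
```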

04-01-2018 publication date

HYBRID MEMORY CELL UNIT AND RECURRENT NEURAL NETWORK INCLUDING HYBRID MEMORY CELL UNITS

Number: US20180005107A1
Assignee:

A recurrent neural network including an input layer, a hidden layer, and an output layer, wherein the hidden layer includes hybrid memory cell units, each of the hybrid memory cell units including first memory cells of a first type, the first memory cells being configured to remember a first cell state value fed back to each of gates to determine a degree to which each of the gates is open or closed, and configured to continue to update the first cell state value, and second memory cells of a second type, each second memory cell of the second memory cells including a first time gate configured to control a second cell state value of the second memory cell based on phase signals of an oscillatory frequency, and a second time gate configured to control an output value of the second memory cell based on the phase signals, and each second memory cell of the second memory cells being configured to remember the second cell state value. 1. A recurrent neural network comprising: an input layer; a hidden layer; and an output layer, wherein the hidden layer comprises hybrid memory cell units, each of the hybrid memory cell units comprising: first memory cells of a first type, the first memory cells being configured to remember a first cell state value fed back to each of gates to determine a degree to which each of the gates is open or closed, and configured to continue to update the first cell state value; and second memory cells of a second type, each second memory cell of the second memory cells comprising a first time gate configured to control a second cell state value of the second memory cell based on phase signals of an oscillatory frequency, and a second time gate configured to control an output value of the second memory cell based on the phase signals, and each second memory cell of the second memory cells being configured to remember the second cell state value. 2. The recurrent neural network of claim 1, wherein each of the hybrid memory cell units is ...

03-01-2019 publication date

High-Speed, Fixed-Function, Computer Accelerator

Number: US20190004995A1
Assignee: WISCONSIN ALUMNI RESEARCH FOUNDATION

A hardware accelerator for computers combines a stand-alone, high-speed, fixed-program dataflow functional element with a stream processor; the latter may autonomously access memory in predefined access patterns after receiving simple stream instructions and provide the accessed data to the dataflow functional element. The result is a compact, high-speed processor that may exploit fixed-program dataflow functional elements.

03-01-2019 publication date

DEPLOYMENT OF INDEPENDENT DATABASE ARTIFACT GROUPS

Number: US20190005108A1
Assignee:

A dependency graph is generated for database files. An unvisited node of the dependency graph is selected and a breadth-first-search performed starting from the selected unvisited node. Results of the breadth-first-search are defined as a group. A group assignment for the database files is returned. 1. A computer-implemented method, comprising: generating a dependency graph for database files; selecting an unvisited node of the dependency graph; performing a breadth-first-search (BFS) starting from the selected unvisited node; defining results of the BFS as a group; and returning a group assignment for the database files. 2. The computer-implemented method of claim 1, wherein the selected unvisited node is selected arbitrarily and the BFS traverses nodes of the dependency graph in both incoming and outgoing directions of the dependency graph edges. 3. The computer-implemented method of claim 1, wherein each node of the dependency graph is assigned a unique ID from 0 to n−1, where n is the node count in the dependency graph. 4. The computer-implemented method of claim 1, further comprising marking entries in a Boolean array to indicate visited nodes in the dependency graph. 5. The computer-implemented method of claim 1, wherein a group is a set of file uniform resource identifiers. 6. The computer-implemented method of claim 1, further comprising: selecting another unvisited node of the dependency graph if unvisited nodes exist in the dependency graph; and filtering files for deployment if no unvisited nodes exist in the dependency graph. 7. The computer-implemented method of claim 1, further comprising initiating deployment of database files based upon group information. 8. A non-transitory computer-readable medium storing one or more instructions executable by a computer system to perform operations comprising: generating a dependency graph for database files; selecting an unvisited node of the dependency graph; performing a breadth-first-search (BFS) ...
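The claimed grouping method maps directly onto a standard connected-components computation: treat the dependency graph as undirected (claim 2's both-direction traversal), BFS from each unvisited node using a Boolean visited array (claim 4), and emit each BFS result as a group. A sketch in Python, with illustrative names:

```python
from collections import defaultdict, deque

# Group database artifacts by BFS over an undirected view of the
# dependency graph; every BFS yields one group (a connected component).

def group_artifacts(n, edges):
    adj = defaultdict(list)
    for u, v in edges:                 # traverse edges in both directions
        adj[u].append(v)
        adj[v].append(u)
    visited = [False] * n              # Boolean array marking visited nodes
    groups = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        queue, group = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if not visited[nxt]:
                    visited[nxt] = True
                    group.add(nxt)
                    queue.append(nxt)
        groups.append(group)
    return groups

# files 0-1-2 depend on each other; 3-4 form an independent group
assert group_artifacts(5, [(0, 1), (1, 2), (3, 4)]) == [{0, 1, 2}, {3, 4}]
```

Because the groups share no edges, each group of files can then be deployed independently, which is the point of the grouping.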

01-01-2015 publication date

Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

Number: US20150006840A1
Author: Martin Ohmacht
Assignee: International Business Machines Corp

In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.

01-01-2015 publication date

INSTRUCTION ORDER ENFORCEMENT PAIRS OF INSTRUCTIONS, PROCESSORS, METHODS, AND SYSTEMS

Number: US20150006851A1
Assignee:

A processor of an aspect includes an instruction fetch unit to fetch a pair of instruction order enforcement instructions. The pair of instruction order enforcement instructions are part of an instruction set of the processor. The pair of instruction order enforcement instructions includes an activation instruction and an enforcement instruction. The activation instruction is to occur before the enforcement instruction in a program order. The processor also includes an instruction order enforcement module. The instruction order enforcement module, in response to the pair of the instruction order enforcement instructions, is to prevent instructions occurring after the enforcement instruction in the program order, from being processed prior to the activation instruction, in an out-of-order portion of the processor. Other processors are also disclosed, as are various methods, systems, and instructions. 1. A processor comprising: an instruction fetch unit to fetch a pair of instruction order enforcement instructions, which are to be part of an instruction set of the processor, the pair of instruction order enforcement instructions to include an activation instruction and an enforcement instruction, the activation instruction to occur before the enforcement instruction in a program order; and an instruction order enforcement module, in response to the pair of the instruction order enforcement instructions, to prevent instructions occurring after the enforcement instruction in the program order, from being processed prior to the activation instruction, in an out-of-order portion of the processor. 2. The processor of claim 1, wherein the instruction order enforcement module comprises: an activation module to activate instruction order enforcement, in response to the activation instruction, at a first stage of a pipeline of the processor; and a blocking module coupled with the activation module, the blocking module, while the instruction order enforcement is activated, to ...

01-01-2015 publication date

Mode dependent partial width load to wider register processors, methods, and systems

Number: US20150006856A1
Assignee: Intel Corp

A method of an aspect is performed by a processor. The method includes receiving a partial width load instruction. The partial width load instruction indicates a memory location of a memory as a source operand and indicates a register as a destination operand. The method includes loading data from the indicated memory location to the processor in response to the partial width load instruction. The method includes writing at least a portion of the loaded data to a partial width of the register in response to the partial width load instruction. The method includes finishing writing the register with a set of bits stored in a remaining width of the register that have bit values that depend on a partial width load mode of the processor. The partial width load instruction does not indicate the partial width load mode. Other methods, processors, and systems are also disclosed.
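The mode dependence can be illustrated with a small model: the instruction always writes the low portion of the register from memory, while the fill of the remaining bits is chosen by the processor's current partial width load mode rather than by the instruction. The mode names `zero_extend` and `preserve_upper` are assumptions for this sketch, not the patent's terminology.

```python
# Sketch of a mode-dependent partial-width load into a 64-bit register:
# the low 32 bits come from memory; the upper 32 bits are filled
# according to the processor mode, which the instruction does not name.

def partial_width_load(old_reg, mem_value, mode):
    low = mem_value & 0xFFFFFFFF
    if mode == "zero_extend":          # upper bits forced to zero
        return low
    if mode == "preserve_upper":       # upper bits keep their old contents
        return (old_reg & 0xFFFFFFFF_00000000) | low
    raise ValueError(mode)

old = 0xDEADBEEF_00000000
assert partial_width_load(old, 0x12345678, "zero_extend") == 0x12345678
assert partial_width_load(old, 0x12345678, "preserve_upper") == 0xDEADBEEF_12345678
```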

01-01-2015 publication date

MULTIFUNCTIONAL HEXADECIMAL INSTRUCTION FORM SYSTEM AND PROGRAM PRODUCT

Number: US20150006859A1
Assignee:

A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and MULTIPLY and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle. 1. A computer system supporting both Binary Floating Point (BFP) operations and non-BFP floating point operations, wherein the non-BFP floating point operations comprise Hexadecimal Floating Point (HFP) operations, the computer system comprising: computer memory; and a processor in communication with the computer memory, the processor comprising an instruction fetching element for fetching instructions from memory to perform a method comprising: providing at least three operands of a combination multiply add/subtract instruction to a main fraction dataflow element of a floating point element, each of said operands comprising BFP floating point format values or non-BFP floating point format values, said non-BFP floating point format values comprising floating point format values other than BFP format values, and said main fraction dataflow element configured to process both BFP floating point format values and non-BFP floating point format values for said operands; responsive to the combined multiply add/subtract instruction being a non-BFP floating point multiply add/subtract instruction, the main fraction dataflow element performing a non-BFP floating point operation on the three operands to produce a non-BFP main fraction result, the ...

20-01-2022 publication date

GRAPHICS PROCESSORS

Number: US20220020108A1
Author: Uhrenholt Olof Henrik
Assignee: ARM LIMITED

To suspend the processing for a group of one or more execution threads currently executing a shader program for an output being generated by a graphics processor, the issuing of shader program instructions for execution by the group of one or more execution threads is stopped, and any outstanding register-content affecting transactions for the group of one or more execution threads are allowed to complete. Once all outstanding register-content affecting transactions for the group of one or more execution threads have completed, the content of the registers associated with the threads of the group of one or more execution threads, and a set of state information for the group of one or more execution threads, including at least an indication of the last instruction in the shader program that was executed for the threads of the group of one or more execution threads, are stored to memory. 1. A method of operating a data processor that includes a programmable execution unit operable to execute programs, and in which when executing a program, the programmable execution unit executes the program for respective groups of one or more execution threads, each execution thread in a group of execution threads corresponding to a respective work item of an output being generated, and each execution thread having an associated set of registers for storing data for the execution thread, the method comprising: in response to a command to suspend the processing of an output being generated by the data processor: stopping the issuing of program instructions for execution by the group of one or more execution threads; waiting for any outstanding transactions that affect the content of the registers associated with the threads of the group of one or more execution threads to complete; and storing to memory: the content of the registers associated with the threads of the group of one or more execution threads; and a set of ...

27-01-2022 publication date

OUT OF ORDER MEMORY REQUEST TRACKING STRUCTURE AND TECHNIQUE

Number: US20220027160A1
Assignee:

In a streaming cache, multiple, dynamically sized tracking queues are employed. Request tracking information is distributed among the plural tracking queues to selectively enable out-of-order memory request returns. A dynamically controlled policy assigns pending requests to tracking queues, providing for example in-order memory returns in some contexts and/or for some traffic and out-of-order memory returns in other contexts and/or for other traffic. 1. A memory request tracking circuit for use with a streaming cache memory, the memory request tracking circuit comprising: a tag check configured to detect cache misses; plural tracking queues; and a queue mapper coupled to the tag check and the plural tracking queues, the queue mapper being configured to distribute request tracking information to the plural tracking queues to enable in-order and out-of-order memory request returns. 2. The memory request tracking circuit of wherein the queue mapper is programmable to preserve in-order memory request return handling for a first type of memory requests and to enable out-of-order memory request return handling for a second type of memory requests different from the first type of memory requests. 3. The memory request tracking circuit of wherein the first and second types of memory requests are selected from the group consisting of loads from local or global memory; texture memory/storage; and acceleration data structure storage. 4. The memory request tracking circuit of wherein the plural tracking queues comprise first through N tracking queues, and the queue mapper allocates a first tracking queue to a particular warp and distributes certain types of memory requests evenly across second through N tracking queues. 5. The memory request tracking circuit of wherein the plural tracking queues each comprise a first-in-first-out storage. 6. The memory request tracking circuit of further includes a pipelined checker picker that selects tracking queue outputs for application ...
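The queue-mapper policy can be sketched as follows: traffic that must return in order is pinned to one FIFO, while other traffic is spread round-robin over the remaining queues so its returns may complete out of order relative to one another. The class name and queue-assignment details are assumptions for this sketch.

```python
from collections import deque

# Sketch of the queue mapper: in-order traffic goes to a single FIFO
# (preserving return order); other traffic is distributed round-robin
# across the remaining tracking queues.

class QueueMapper:
    def __init__(self, n_queues):
        self.queues = [deque() for _ in range(n_queues)]
        self._rr = 0

    def track(self, request, in_order):
        if in_order:
            self.queues[0].append(request)       # single FIFO keeps order
        else:
            q = 1 + self._rr % (len(self.queues) - 1)
            self._rr += 1
            self.queues[q].append(request)       # spread across queues

m = QueueMapper(4)
for r in ["ld0", "ld1"]:
    m.track(r, in_order=True)
for r in ["tex0", "tex1", "tex2"]:
    m.track(r, in_order=False)
assert list(m.queues[0]) == ["ld0", "ld1"]
assert list(m.queues[1]) == ["tex0"] and list(m.queues[2]) == ["tex1"]
```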

08-01-2015 publication date

COMPACT LINKED-LIST-BASED MULTI-THREADED INSTRUCTION GRADUATION BUFFER

Number: US20150012730A1
Author: Svendsen Kjeld
Assignee:

A processor and instruction graduation unit for a processor. In one embodiment, a processor or instruction graduation unit according to the present invention includes a linked-list-based multi-threaded graduation buffer and a graduation controller. The graduation buffer stores identification values generated by an instruction decode and dispatch unit of the processor as part of one or more linked-list data structures. Each linked-list data structure formed is associated with a particular program thread running on the processor. The number of linked-list data structures formed is variable and related to the number of program threads running on the processor. The graduation controller includes linked-list head identification registers and linked-list tail identification registers that facilitate reading and writing identification values to linked-list data structures associated with particular program threads. The linked-list head identification registers determine which executed instruction result or results are next to be written to a register file. 1. A processor, comprising: a results buffer having a plurality of registers, each register for temporarily storing a result of an executed instruction prior to the result being written to a register file; a results buffer allocater that generates identification values, wherein each identification value identifies a register of the results buffer in which an executed instruction result can be temporarily stored; and a graduation buffer having a plurality of registers, wherein identification values generated by the results buffer allocater are temporarily stored as part of a linked-list data structure. 2. The processor of claim 1, wherein identification values generated by the results buffer allocater are temporarily stored as part of a plurality of linked-list data structures, each linked-list data structure being associated with a particular program thread. 3. The processor of claim 1, further comprising: a ...
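The per-thread linked lists threaded through a shared buffer can be modeled with an array of next pointers plus head/tail registers per thread, as sketched below; the structure and method names are assumptions, not the patent's design.

```python
# Sketch of the linked-list graduation buffer: each thread's in-flight
# result IDs form a linked list threaded through shared storage, with
# per-thread head and tail registers.

class GraduationBuffer:
    def __init__(self, size):
        self.next_id = [None] * size   # shared storage for list links
        self.head = {}                 # per-thread head register
        self.tail = {}                 # per-thread tail register

    def allocate(self, thread, result_id):
        if thread not in self.head:
            self.head[thread] = result_id          # first entry for thread
        else:
            self.next_id[self.tail[thread]] = result_id
        self.tail[thread] = result_id

    def graduate(self, thread):
        """Pop the oldest result ID for a thread (next to write back)."""
        rid = self.head[thread]
        nxt = self.next_id[rid]
        self.next_id[rid] = None
        if nxt is None:
            del self.head[thread], self.tail[thread]
        else:
            self.head[thread] = nxt
        return rid

g = GraduationBuffer(8)
g.allocate("t0", 3); g.allocate("t1", 5); g.allocate("t0", 7)
assert g.graduate("t0") == 3   # t0's results graduate in program order
assert g.graduate("t1") == 5   # independent of t1's list
assert g.graduate("t0") == 7
```

Because the lists share one buffer, the number of lists grows and shrinks with the number of running threads, which is the compactness the title refers to.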

14-01-2016 publication date

Silent memory instructions and miss-rate tracking to optimize switching policy on threads in a processing device

Number: US20160011874A1
Assignee: Intel Corp

A processing device implementing silent memory instructions and miss-rate tracking to optimize switching policy on threads is disclosed. A processing device of the disclosure includes a branch prediction unit (BPU) to predict that an instruction of a first thread in a current execution context of the processing device is a delinquent instruction, indicate that the first thread including the delinquent instruction is in a silent execution mode, indicate that the delinquent instruction is to be executed as a silent instruction, switch an execution context of the processing device to a second thread, and when the execution context returns to the first thread, cause the delinquent instruction to be re-executed as a regular instruction.

11-01-2018 publication date

Computing System and controller thereof

Number: US20180011710A1
Author: Kaiyuan Guo, Song Yao

Computing system and controller thereof are disclosed for ensuring the correct logical relationship between multiple instructions during their parallel execution. The computing system comprises: a plurality of functional modules each performing a respective function in response to an instruction for the given functional module; and a controller for determining whether or not to send an instruction to a corresponding functional module according to dependency relationship between the plurality of instructions.

11-01-2018 publication date

METHOD FOR EXECUTING MULTITHREADED INSTRUCTIONS GROUPED INTO BLOCKS

Number: US20180011738A1
Author: Abdallah Mohammad
Assignee:

A method for executing multithreaded instructions grouped into blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein the instructions of the instruction blocks are interleaved with multiple threads; scheduling the instructions of the instruction block to execute in accordance with the multiple threads; and tracking execution of the multiple threads to enforce fairness in an execution pipeline. 1. A method of executing multithreaded instructions grouped into blocks, the method comprising: receiving an incoming instruction sequence at a front end of an execution pipeline; grouping instructions from the instruction sequence to form instruction blocks, where a first group of instruction blocks are a part of a first thread of execution and a second group of instruction blocks are a part of a second thread of execution; storing the first group of instruction blocks and the second group of instruction blocks in a scheduler array, where a first commit pointer points to a location in the scheduler array for a next block of the first group to be executed and a second commit pointer points to a next block of the second group to be executed; scheduling the instructions of the instruction blocks to execute in accordance with a position in the scheduler array; and tracking execution of the first thread and second thread to enforce a fairness policy using allocation counters to track a number of instruction blocks in the scheduler array for each thread. 2. The method of claim 1, wherein each allocation counter tracks a number of entries in a correlated thread pointer map. 3. The method of claim 1, wherein the fairness policy prevents any thread from exceeding an allocation threshold. 4. The method of claim 1, further comprising: tracking an array entry to place a next block in the scheduler array for each thread using a separate allocate pointer. 5. The method of claim 1, wherein ...
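The allocation-counter fairness policy (claims 1 and 3) can be sketched as a counter per thread that is checked against a threshold before a block is admitted to the scheduler array; a thread at its cap must wait until one of its blocks commits. The threshold value and class names are assumptions for this sketch.

```python
# Sketch of the fairness policy: per-thread allocation counters track
# how many blocks each thread holds in the scheduler array, and a
# thread at the allocation threshold may not allocate another block.

class BlockScheduler:
    def __init__(self, threshold):
        self.threshold = threshold
        self.alloc = {}                # allocation counter per thread

    def try_allocate(self, thread):
        if self.alloc.get(thread, 0) >= self.threshold:
            return False               # fairness: thread is at its cap
        self.alloc[thread] = self.alloc.get(thread, 0) + 1
        return True

    def commit(self, thread):
        self.alloc[thread] -= 1        # block executed, slot reclaimed

s = BlockScheduler(threshold=2)
assert s.try_allocate("t0") and s.try_allocate("t0")
assert not s.try_allocate("t0")   # t0 capped; other threads unaffected
assert s.try_allocate("t1")
s.commit("t0")
assert s.try_allocate("t0")       # slot freed, allocation resumes
```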
