Total found: 9585. Displayed: 100.

Publication date: 26-01-2012

Parallel loop management

Number: US20120023316A1
Assignee: International Business Machines Corp

The illustrative embodiments comprise a method, data processing system, and computer program product having a processor unit for processing instructions with loops. A processor unit creates a first group of instructions having a first set of loops and a second group of instructions having a second set of loops from the instructions. The first set of loops has a different order of parallel processing from the second set of loops. A processor unit processes the first group. The processor unit monitors terminations in the first set of loops during processing of the first group. The processor unit determines whether the number of terminations being monitored in the first set of loops is greater than a selectable number of terminations. In response to a determination that the number of terminations is greater than the selectable number of terminations, the processor unit ceases processing the first group and processes the second group.
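
As a hedged illustration of the switching policy only (not the claimed hardware), the sketch below counts loop terminations reported while the first instruction group runs and signals a switch to the second group once the selectable limit is exceeded; the class and method names are assumptions.

```cpp
#include <atomic>
#include <cstddef>

// Sketch of the group-switching policy. on_termination() would be
// driven by the loops of the first group; should_switch() tells the
// scheduler to cease the first group and process the second group.
class LoopGroupSwitcher {
public:
    explicit LoopGroupSwitcher(std::size_t selectable_limit)
        : limit_(selectable_limit) {}

    void on_termination() { terminations_.fetch_add(1); }

    bool should_switch() const { return terminations_.load() > limit_; }

private:
    std::size_t limit_;                        // selectable number of terminations
    std::atomic<std::size_t> terminations_{0}; // monitored during the first group
};
```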

Publication date: 09-02-2012

Event-based bandwidth allocation mode switching method and apparatus

Number: US20120036518A1
Author: Jack Kang, Yu-Chi Chuang
Assignee: Jack Kang, Yu-Chi Chuang

A system, apparatus, and method for allocation mode switching on an event-driven basis are described herein. The allocation mode switching method includes detecting an event, selecting a bandwidth allocation mode associated with the detected event, and allocating a plurality of execution cycles of an instruction execution period of a processor core among a plurality of instruction execution threads based at least in part on the selected bandwidth allocation mode. Other embodiments may be described and claimed.
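
A minimal sketch of the idea, with hypothetical event identifiers and four hardware threads: an event selects an allocation mode, and the mode's weights split the execution cycles of one instruction execution period among the threads.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kThreads = 4;
using Mode = std::array<unsigned, kThreads>;  // per-thread cycle weights

// Map a detected event to a bandwidth allocation mode (ids illustrative).
Mode select_mode(int event) {
    switch (event) {
        case 0:  return {4, 2, 1, 1};  // e.g., favor thread 0 after a cache miss
        case 1:  return {1, 1, 3, 3};
        default: return {2, 2, 2, 2};  // fair default
    }
}

// Allocate `cycles` execution cycles proportionally to the mode's weights.
std::array<unsigned, kThreads> allocate(unsigned cycles, const Mode& m) {
    unsigned total = 0;
    for (unsigned w : m) total += w;
    std::array<unsigned, kThreads> out{};
    for (std::size_t t = 0; t < kThreads; ++t) out[t] = cycles * m[t] / total;
    return out;
}
```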

Publication date: 08-03-2012

Processor

Number: US20120060017A1
Author: Hiroyuki Morishita
Assignee: Panasonic Corp

A processor including L computing units, L being an integer of 2 or greater, the processor comprising: an instruction buffer including M×Z instruction storage areas each storing one instruction, M instruction streams being input in a state of being distinguished from each other, each of the M instruction streams including Z instructions, M and Z each being an integer of 2 or greater, M×Z being equal to or greater than L; an order information holding unit holding order information that indicates an order of the M×Z instruction storage areas; an extraction unit operable to extract instructions from the M×Z instruction storage areas; and a control unit operable to cause the extraction unit to extract L instructions in executable state from the M×Z instruction storage areas in accordance with the order indicated by the order information, and input the instructions into different ones of the L computing units.

Publication date: 22-03-2012

Constant Buffering for a Computational Core of a Programmable Graphics Processing Unit

Number: US20120069033A1
Assignee: Via Technologies Inc

Embodiments of the present disclosure are directed to graphics processing systems, comprising: a plurality of execution units, wherein one of the execution units is configurable to process a thread corresponding to a rendering context, wherein the rendering context comprises a plurality of constants with a priority level; a constant buffer configurable to store the constants of the rendering context into a plurality of slots in a physical storage space; an execution unit control unit configurable to assign the thread to one of the execution units; and a constant buffer control unit providing a translation table for the rendering context to map the corresponding constants into the slots of the physical storage space. Comparable methods are also disclosed.

Publication date: 22-03-2012

Scaleable Status Tracking Of Multiple Assist Hardware Threads

Number: US20120072707A1
Assignee: International Business Machines Corp

A processor includes an initiating hardware thread, which initiates a first assist hardware thread to execute a first code segment. Next, the initiating hardware thread sets an assist thread executing indicator in response to initiating the first assist hardware thread. The set assist thread executing indicator indicates whether assist hardware threads are executing. A second assist hardware thread initiates and begins executing a second code segment. In turn, the initiating hardware thread detects a change in the assist thread executing indicator, which signifies that both the first assist hardware thread and the second assist hardware thread terminated. As such, the initiating hardware thread evaluates assist hardware thread results in response to both of the assist hardware threads terminating.

Publication date: 05-04-2012

Dynamically adjusting pipelined data paths for improved power management

Number: US20120084540A1
Assignee: International Business Machines Corp

A design structure embodied in a machine readable, non-transitory storage medium used in a design process includes a system for dynamically varying the pipeline depth of a computing device. The system includes a state machine that determines an optimum length of a pipeline architecture based on a processing function to be performed. A pipeline sequence controller, responsive to the state machine, varies the depth of the pipeline based on the optimum length. A plurality of clock splitter elements, each associated with a corresponding plurality of latch stages in the pipeline architecture, are coupled to the pipeline sequence controller and adapted to operate in a functional mode, one or more clock gating modes, and a pass-through flush mode. For each of the clock splitter elements operating in the pass-through flush mode, data is passed through the associated latch stage without oscillation of clock signals associated therewith.

Publication date: 12-04-2012

Efficient implementation of arrays of structures on simt and simd architectures

Number: US20120089792A1
Assignee: Linares Medical Devices LLC, Nvidia Corp

One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays, proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).
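
The address arithmetic the driver would perform can be illustrated as follows. The layout is an array of structures of arrays tiled by the SIMT/SIMD group width; the names lanes and fields are assumptions, not the driver's API.

```cpp
#include <cstddef>

// Compute the linear offset of (element, field) in an array of
// structures of arrays: memory is a sequence of tiles, each holding
// `fields` arrays of length `lanes` (the group width).
std::size_t aosoa_index(std::size_t element, std::size_t field,
                        std::size_t lanes, std::size_t fields) {
    std::size_t tile = element / lanes;   // which group-wide tile
    std::size_t lane = element % lanes;   // position within the tile
    return (tile * fields + field) * lanes + lane;
}
```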

Publication date: 03-05-2012

Out-of-order load/store queue structure

Number: US20120110280A1
Author: Christopher D. Bryant
Assignee: Advanced Micro Devices Inc

The present invention provides a method and apparatus for supporting embodiments of an out-of-order load/store queue structure. One embodiment of the apparatus includes a first queue for storing memory operations adapted to be executed out-of-order with respect to other memory operations. The apparatus also includes one or more additional queues for storing a memory operation in response to completion of that memory operation. The apparatus is configured to remove the memory operation from the first queue in response to the completion.

Publication date: 17-05-2012

Retirement serialisation of status register access operations

Number: US20120124340A1
Author: James Nolan Hardage
Assignee: ARM LTD

A processor 2 for performing out-of-order execution of a stream of program instructions includes a special register access pipeline for performing status access instructions accessing a status register 20. In order to serialise these status access instructions relative to other instructions within the system, access timing control circuitry 32 permits dispatch of other instructions to proceed but controls the commit queue and the result queue such that no program instructions succeeding the status access instruction in program order are permitted to complete until after a trigger state has been detected in which all program instructions preceding the status access instruction in program order have been performed and have made any updates to the architectural state. This is followed by the performance of the status access instruction itself.

Publication date: 31-05-2012

Miss buffer for a multi-threaded processor

Number: US20120137077A1
Assignee: Oracle International Corp

A multi-threaded processor configured to allocate entries in a buffer for instruction cache misses is disclosed. Entries in the buffer may store thread state information for a corresponding instruction cache miss for one of a plurality of threads executable by the processor. The buffer may include dedicated entries and dynamically allocable entries, where the dedicated entries are reserved for a subset of the plurality of threads and the dynamically allocable entries are allocable to a group of two or more of the plurality of threads. In one embodiment, the dedicated entries are dedicated for use by a single thread and the dynamically allocable entries are allocable to any of the plurality of threads. The buffer may store two or more entries for a given thread at a given time. In some embodiments, the buffer may help ensure none of the plurality of threads experiences starvation with respect to instruction fetches.
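
A sketch of the allocation policy under simplifying assumptions (one dedicated entry per thread, one shared pool open to all threads); the real buffer also stores per-miss thread state, which is elided here.

```cpp
#include <cstddef>
#include <vector>

// Dedicated entries guarantee forward progress (no starvation) while
// the shared pool absorbs bursts from any thread.
struct MissBuffer {
    std::vector<bool> dedicated_busy;   // one dedicated slot per thread
    std::size_t shared_free;            // count of free shared entries

    MissBuffer(std::size_t threads, std::size_t shared)
        : dedicated_busy(threads, false), shared_free(shared) {}

    // Returns true if an entry could be allocated for thread `tid`.
    bool allocate(std::size_t tid) {
        if (!dedicated_busy[tid]) { dedicated_busy[tid] = true; return true; }
        if (shared_free > 0)      { --shared_free;              return true; }
        return false;               // miss must be retried later
    }
};
```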

Publication date: 21-06-2012

Programming Language Exposing Idiom Calls

Number: US20120159126A1
Assignee: International Business Machines Corp

A programming language may include hint instructions that may notify a programming idiom accelerator that a programming idiom is coming. An idiom begin hint exposes the programming idiom to the programming idiom accelerator. Thus, the programming idiom accelerator need not perform pattern matching or other forms of analysis to recognize a sequence of instructions. Rather, the programmer may insert idiom hint instructions, such as an idiom begin hint, to expose the idiom to the programming idiom accelerator. Similarly, an idiom end hint may mark the end of the programming idiom.

Publication date: 12-07-2012

System and method for enforcing software security through cpu statistics gathered using hardware features

Number: US20120179898A1
Assignee: Apple Inc

This disclosure is directed to measuring hardware-based statistics, such as the number of instructions executed in a specific section of a program during execution, for enforcing software security. The counting can be accomplished through a specific set of instructions, which can either be implemented in hardware or included in the instruction set of a virtual machine. For example, the set of instructions can include atomic instructions of reset, start, stop, get instruction count, and get CPU cycle count. To obtain information on a specific section of code, a software developer can insert start and stop instructions around the desired code section. For each instruction in the identified code block, when the instruction is executed, a counter is incremented. The counter can be stored in a dedicated register. The gathered statistics can be used for a variety of purposes, such as detecting unauthorized code modifications or measuring code performance.
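
A hedged software model of the counter instructions the abstract lists (reset, start, stop, get instruction count); on real hardware each would be a single atomic instruction backed by a dedicated register.

```cpp
#include <cstdint>

struct InstructionCounter {
    uint64_t count = 0;
    bool     running = false;

    void reset() { count = 0; }
    void start() { running = true; }
    void stop()  { running = false; }
    uint64_t get() const { return count; }

    // Conceptually invoked by hardware once per executed instruction.
    void on_instruction() { if (running) ++count; }
};
```

A developer would bracket the code section of interest with start() and stop(), then compare get() against an expected count to flag unauthorized code modifications or to measure performance.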

Publication date: 16-08-2012

Obtaining And Releasing Hardware Threads Without Hypervisor Involvement

Number: US20120210102A1
Assignee: International Business Machines Corp

A first hardware thread executes a software program instruction, which instructs the first hardware thread to initiate a second hardware thread. As such, the first hardware thread identifies one or more register values accessible by the first hardware thread. Next, the first hardware thread copies the identified register values to one or more registers accessible by the second hardware thread. In turn, the second hardware thread accesses the copied register values included in the accessible registers and executes software code accordingly.

Publication date: 23-08-2012

Thread transition management

Number: US20120216004A1
Assignee: International Business Machines Corp

Various systems, processes, products, and techniques may be used to manage thread transitions. In particular implementations, a system and process for managing thread transitions may include the ability to determine that a transition is to be made regarding the relative use of two data register sets and determine, based on the transition determination, whether to move thread data in at least one of the data register sets to second-level registers. The system and process may also include the ability to move the thread data from at least one data register set to second-level registers based on the move determination.

Publication date: 06-09-2012

Method, apparatus, and system for speculative execution event counter checkpointing and restoring

Number: US20120227045A1
Assignee: Intel Corp

An apparatus, method, and system are described herein for providing programmable control of performance/event counters. An event counter is programmable to track different events, as well as to be checkpointed when speculative code regions are encountered. So when a speculative code region is aborted, the event counter is able to be restored to its pre-speculation value. Moreover, the difference between the cumulative event count of committed and uncommitted execution and the count for committed execution alone represents the event contribution of uncommitted execution. From information on the uncommitted execution, hardware/software may be tuned to enhance future execution to avoid wasted execution cycles.
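
A minimal sketch of the checkpointing arithmetic, assuming a single counter register; field and method names are illustrative.

```cpp
#include <cstdint>

struct EventCounter {
    uint64_t count = 0;
    uint64_t checkpoint = 0;

    void on_event() { ++count; }
    void begin_speculation() { checkpoint = count; }  // checkpoint on entry

    // On abort: the delta is the uncommitted contribution, and the
    // counter rolls back to its pre-speculation value.
    uint64_t abort_speculation() {
        uint64_t uncommitted = count - checkpoint;
        count = checkpoint;
        return uncommitted;   // usable for tuning away wasted cycles
    }
};
```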

Publication date: 27-09-2012

Region-Weighted Accounting of Multi-Threaded Processor Core According to Dispatch State

Number: US20120246447A1
Assignee: International Business Machines Corp

According to one embodiment of the present disclosure, an approach is provided in which a thread is selected from multiple active threads, along with a corresponding weighting value. Computational logic determines whether one of the multiple threads is dispatching an instruction and, if so, computes a dispatch weighting value using the selected weighting value and a dispatch factor that indicates a weighting adjustment of the selected weighting value. In turn, a resource utilization value of the selected thread is computed using the dispatch weighting value.
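
Read literally, the accounting reduces to a small per-cycle computation; the sketch below is an assumption-laden paraphrase, not the claimed logic.

```cpp
// If the selected thread is dispatching this cycle, its weighting
// value is adjusted by the dispatch factor before being accumulated
// into its resource utilization value.
struct ThreadAccounting {
    double utilization = 0.0;

    void account(double weight, double dispatch_factor, bool dispatching) {
        double w = dispatching ? weight * dispatch_factor : weight;
        utilization += w;
    }
};
```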

Publication date: 27-09-2012

Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines

Number: US20120246450A1
Author: Mohammad Abdallah
Assignee: Soft Machines Inc

A system for executing instructions using a plurality of register file segments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality of register file segments are coupled to the partitionable engines for providing data storage.

Publication date: 08-11-2012

Verifying a processor design using a processor simulation model

Number: US20120284007A1
Assignee: International Business Machines Corp

An improved method of verifying a processor design using a processor simulation model in a simulation environment is disclosed, wherein the processor simulation model includes at least one execution unit for executing at least one instruction of a test file. The method includes tracking each execution of each of the at least one instruction, monitoring relevant signals in each simulation cycle, maintaining information about the execution of the at least one instruction, wherein the maintained information includes a determination of an execution length of a completely executed instruction, matching the maintained information about the completely executed instruction against a set of trap elements provided by the user through a trap file, and collecting the maintained information about the completely executed instruction in a monitor file in response to a match found between the maintained information and at least one of the trap elements.

Publication date: 24-01-2013

Hardware acceleration components for translating guest instructions to native instructions

Number: US20130024661A1
Author: Mohammad Abdallah
Assignee: Soft Machines Inc

A hardware based translation accelerator. The hardware includes a guest fetch logic component for accessing guest instructions; a guest fetch buffer coupled to the guest fetch logic component and a branch prediction component for assembling guest instructions into a guest instruction block; and conversion tables coupled to the guest fetch buffer for translating the guest instruction block into a corresponding native conversion block. The hardware further includes a native cache coupled to the conversion tables for storing the corresponding native conversion block, and a conversion look aside buffer coupled to the native cache for storing a mapping of the guest instruction block to corresponding native conversion block, wherein upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates the guest instruction has a corresponding converted native instruction in the native cache.

Publication date: 24-01-2013

Control flow integrity

Number: US20130024676A1
Assignee: Individual

In at least some embodiments, a processor in accordance with the present disclosure is operable to enforce control flow integrity. For example, a processor may comprise logic operable to execute a control flow integrity instruction specified to verify changes in control flow and respond to verification failure by at least one of a trap or an exception.

Publication date: 31-01-2013

Data processing apparatus, data processing system, and data processing method

Number: US20130028260A1
Author: Mitsuru Mushano
Assignee: Mush A Co Ltd

A data-processing apparatus includes a plurality of processing units having frequency bands different from one another set thereto, the plurality of processing units to process packets each including data and processing information added to the data, the processing information including instruction information indicating one or more processing instructions to the data, each processing unit in the processing units including: an input/output unit to obtain, in the packets, only a packet whose address indicates the processing unit in the processing units, the address determined in accordance with the processing information; and an operation unit to execute the processing instruction in the packet obtained by the input/output unit, the input/output unit including a receiving unit to receive only an electromagnetic wave having a frequency band set to the processing unit and obtain the packet.

Publication date: 07-02-2013

Information processing device and task switching method

Number: US20130036426A1
Author: Hiroyuki Igura
Assignee: NEC Corp

Disclosed are an information processing device and a task switching method that can reduce the time required for switching of tasks in a plurality of coprocessors. The information processing device includes a processor core; coprocessors including operation units that perform operations in response to a request from the processor core and operation storage units that store the contents of operation of the operation units; save storage units that store the saved contents of operation; a task switching control unit that outputs a save/restore request signal when switching a task on which operation is performed by the coprocessors; and save/restore units that perform at least one of saving of the contents of operation in the operation storage units to the save storage units and restoration of the contents of operation in the save storage units to the operation storage units in response to the save/restore request signal.

Publication date: 28-03-2013

Programming in a Simultaneous Multi-Threaded Processor Environment

Number: US20130080838A1
Assignee: International Business Machines Corp

A system, method, and product are disclosed for testing multiple threads simultaneously. The threads share a real memory space. A first portion of the real memory space is designated as exclusive memory such that the first portion appears to be reserved for use by only one of the threads. The threads are simultaneously executed. The threads access the first portion during execution. Apparent exclusive use of the first portion of the real memory space is permitted by a first one of the threads. Simultaneously with permitting apparent exclusive use of the first portion by the first one of the threads, apparent exclusive use of the first portion of the real memory space is also permitted by a second one of the threads. The threads simultaneously appear to have exclusive use of the first portion and may simultaneously access the first portion.

Publication date: 04-04-2013

Tracking operand liveliness information in a computer system and performance function based on the liveliness information

Number: US20130086367A1
Assignee: International Business Machines Corp

Operand liveness state information is maintained during context switches for current architected operands of executing programs, the current operand state information indicating whether corresponding current operands are any one of enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA) for disabling current architected operands, wherein a current operand is accessed by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module.

Publication date: 04-04-2013

Generating compiled code that indicates register liveness

Number: US20130086548A1
Assignee: International Business Machines Corp

Object code is generated from an internal representation that includes a plurality of source operands. The generating includes performing for each source operand in the internal representation determining whether a last use has occurred for the source operand. The determining includes accessing a data flow graph to determine whether all uses of a live range have been emitted. If it is determined that a last use has occurred for the source operand, an architected resource associated with the source operand is marked for last-use indication. A last-use indication is then generated for the architected resource. Instructions and the last-use indications are emitted into the object code.

Publication date: 25-04-2013

Data processing device and method, and processor unit of same

Number: US20130103930A1
Author: Takashi Horikawa
Assignee: NEC Corp

A processor unit (200) includes: cache memory (210); an instruction execution unit (220); a processing unit (230) that detects the fact that a thread enters an exclusive control section which is specified in advance to become a bottleneck; a processing unit (240) that detects the fact that the thread exits the exclusive control section; and an execution flag (250) that indicates, based on the detection results, whether there is a thread that is executing a process in the exclusive control section. The cache memory (210) temporarily stores a priority flag in each cache entry, and the priority flag indicates whether the data is to be used during execution in the exclusive control section. When the execution flag (250) is set, the processor unit (200) sets the priority flag that belongs to an access target of cache entries. The processor unit (200) leaves data used in the exclusive control section in the cache memory by determining a replacement target of cache entries using the priority flag when a cache miss occurs.

Publication date: 02-05-2013

Digital Signal Processing Data Transfer

Number: US20130111159A1
Assignee: Imagination Technologies Ltd

A technique for transferring data in a digital signal processing system is described. In one example, the digital signal processing system comprises a number of fixed function accelerators, each connected to a memory access controller and each configured to read data from a memory device, perform one or more operations on the data, and write data to the memory device. To avoid hardwiring the fixed function accelerators together, and to provide a configurable digital signal processing system, a multi-threaded processor controls the transfer of data between the fixed function accelerators and the memory. Each processor thread is allocated to a memory access channel, and the threads are configured to detect an occurrence of an event and, responsive to this, control the memory access controller to enable a selected fixed function accelerator to read data from or write data to the memory device via its memory access channel.

Publication date: 30-05-2013

Scaleable Status Tracking Of Multiple Assist Hardware Threads

Number: US20130139168A1
Assignee: International Business Machines Corp

A processor includes an initiating hardware thread, which initiates a first assist hardware thread to execute a first code segment. Next, the initiating hardware thread sets an assist thread executing indicator in response to initiating the first assist hardware thread. The set assist thread executing indicator indicates whether assist hardware threads are executing. A second assist hardware thread initiates and begins executing a second code segment. In turn, the initiating hardware thread detects a change in the assist thread executing indicator, which signifies that both the first assist hardware thread and the second assist hardware thread terminated. As such, the initiating hardware thread evaluates assist hardware thread results in response to both of the assist hardware threads terminating.

Publication date: 06-06-2013

System and method for performing shaped memory access operations

Number: US20130145124A1
Assignee: Individual

One embodiment of the present invention sets forth a technique that provides an efficient way to retrieve operands from a register file. Specifically, the instruction dispatch unit receives one or more instructions, each of which includes one or more operands. Collectively, the operands are organized into one or more operand groups from which a shaped access may be formed. The operands are retrieved from the register file and stored in a collector. Once all operands are read and collected in the collector, the instruction dispatch unit transmits the instructions and corresponding operands to functional units within the streaming multiprocessor for execution. One advantage of the present invention is that multiple operands are retrieved from the register file in a single register access operation without resource conflict. Performance in retrieving operands from the register file is improved by forming shaped accesses that efficiently retrieve operands exhibiting recognized memory access patterns.

Publication date: 20-06-2013

Policies for Shader Resource Allocation in a Shader Core

Number: US20130155077A1
Assignee: Advanced Micro Devices Inc

A method of determining priority within an accelerated processing device is provided. The accelerated processing device includes compute pipeline queues that are processed in accordance with predetermined criteria. The queues are selected based on priority characteristics and the selected queue is processed until a time quantum lapses or a queue having a higher priority becomes available for processing.

Publication date: 20-06-2013

Low latency variable transfer network for fine grained parallelism of virtual threads across multiple hardware threads

Number: US20130159669A1
Assignee: International Business Machines Corp

A method and circuit arrangement utilize a low latency variable transfer network between the register files of multiple processing cores in a multi-core processor chip to support fine grained parallelism of virtual threads across multiple hardware threads. The communication of a variable over the variable transfer network may be initiated by a move from a local register in a register file of a source processing core to a variable register that is allocated to a destination hardware thread in a destination processing core, so that the destination hardware thread can then move the variable from the variable register to a local register in the destination processing core.

Publication date: 20-06-2013

Verifying speculative multithreading in an application

Number: US20130159681A1
Author: Mitchell D. Felton
Assignee: International Business Machines Corp

Verifying speculative multithreading in an application executing in a computing system, including: executing one or more test instructions serially, thereby producing a serial result, including ensuring that all data dependencies among the test instructions are satisfied; executing the test instructions speculatively in a plurality of threads, thereby producing a speculative result; and determining whether a speculative multithreading error exists, including: comparing the serial result to the speculative result and, if the serial result does not match the speculative result, determining that a speculative multithreading error exists.
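
A hedged sketch of the verification flow, modeling the test instructions as a reduction so that the serial and speculative results are directly comparable; std::async stands in for the speculatively executing threads.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

bool verify_speculative(const std::vector<long>& data) {
    // Serial execution: data dependencies trivially satisfied.
    long serial = std::accumulate(data.begin(), data.end(), 0L);

    // "Speculative" execution across two threads over disjoint halves.
    auto mid = data.begin() + static_cast<std::ptrdiff_t>(data.size() / 2);
    auto lo = std::async(std::launch::async,
                         [&] { return std::accumulate(data.begin(), mid, 0L); });
    long hi = std::accumulate(mid, data.end(), 0L);
    long speculative = lo.get() + hi;

    // Mismatch indicates a speculative multithreading error.
    return serial == speculative;
}
```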

Publication date: 27-06-2013

Apparatus comprising a plurality of arithmetic logic units

Number: US20130166890A1
Author: David Smith

An arrangement of at least two arithmetic logic units carries out an operation defined by a decoded instruction including at least one operand and more than one operation code. The operation codes and at least one operand are received and corresponding executions are performed by the arithmetic logic units on a single clock cycle. The result of the execution from one arithmetic logic unit is used as an operand by a further arithmetic logic unit. The decoding of the instruction is performed in an immediately preceding single clock cycle.

Publication date: 27-06-2013

Data processing apparatus with an execution pipeline and error recovery unit and method of operating the data processing apparatus

Number: US20130166952A1
Assignee: ARM LTD

A data processing apparatus executes instructions in a sequence of pipelined execution stages. An error detection unit twice samples a signal associated with execution of an instruction and generates an error signal if the samples differ. An exception storage unit maintains an age-ordered list of entries corresponding to instructions issued to the execution pipeline and can mark an entry to show if the error signal has been generated in association with that instruction. A timer unit is responsive to generation of the error signal to initiate timing of a predetermined time period. An error recovery unit initiates a soft pipeline flush procedure if an oldest pending entry in the list has said error marker stored in association therewith and initiates a hard pipeline flush procedure if said predetermined time period elapses, said hard flush procedure comprising resetting said pipeline to a predetermined state.

Publication date: 04-07-2013

Processor for Executing Wide Operand Operations Using a Control Register and a Results Register

Number: US20130173888A1
Assignee: Microunity Systems Engineering Inc

A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.

Publication date: 11-07-2013

Providing logical partitions with hardware-thread specific information reflective of exclusive use of a processor core

Number: US20130179886A1
Assignee: International Business Machines Corp

Techniques for simulating exclusive use of a processor core amongst multiple logical partitions (LPARs) include providing hardware thread-dependent status information in response to access requests by the LPARs that is reflective of exclusive use of the processor by the LPAR accessing the hardware thread-dependent information. The information returned in response to the access requests is transformed if the requestor is a program executing at a privilege level lower than the hypervisor privilege level, so that each logical partition views the processor as though it has exclusive use of the processor. The techniques may be implemented by a logical circuit block within the processor core that transforms the hardware thread-specific information to a logical representation of the hardware thread-specific information or the transformation may be performed by program instructions of an interrupt handler that traps access to the physical register containing the information.

Publication date: 01-08-2013

Major branch instructions

Number: US20130198492A1
Assignee: International Business Machines Corp

Major branch instructions are provided that enable execution of a computer program to branch from one segment of code to another segment of code. These instructions also create a new stream of processing at the other segment of code enabling execution of the other segment of code to be performed in parallel with the segment of code from which the branch was taken. In one example, the other stream of processing starts a transaction for processing instructions of the other stream of processing.

Publication date: 01-08-2013

Major branch instructions

Number: US20130198496A1
Assignee: International Business Machines Corp

Major branch instructions are provided that enable execution of a computer program to branch from one segment of code to another segment of code. These instructions also create a new stream of processing at the other segment of code enabling execution of the other segment of code to be performed in parallel with the segment of code from which the branch was taken. In one example, the other stream of processing starts a transaction for processing instructions of the other stream of processing.

Publication date: 08-08-2013

Multi-threaded processor instruction balancing through instruction uncertainty

Number: US20130205118A1
Assignee: International Business Machines Corp

A computer system for instruction execution includes a processor having a pipeline. The system is configured to perform a method including: fetching, in the pipeline, a plurality of instructions, wherein the plurality of instructions includes a plurality of branch instructions; assigning a branch uncertainty to each of the plurality of branch instructions; assigning, to each of the plurality of instructions, an instruction uncertainty that is a summation of the branch uncertainties of older unresolved branches; and balancing the instructions in the pipeline based on a current summation of instruction uncertainty.
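
A toy model of the metric, assuming each unresolved branch contributes an uncertainty value (for example, derived from predictor confidence); the structure and names are illustrative.

```cpp
#include <deque>

struct UncertaintyTracker {
    std::deque<double> unresolved;   // uncertainty per older unresolved branch

    // An instruction's uncertainty is the sum over older unresolved
    // branches; the fetch/dispatch policy can deprioritize the thread
    // whose current summation is highest.
    double instruction_uncertainty() const {
        double sum = 0.0;
        for (double u : unresolved) sum += u;
        return sum;
    }

    void on_branch_fetched(double u) { unresolved.push_back(u); }
    void on_oldest_branch_resolved() { if (!unresolved.empty()) unresolved.pop_front(); }
};
```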

Publication date: 03-10-2013

Code generation method, and information processing apparatus

Number: US20130262824A1
Assignee: Fujitsu Ltd

A computer-readable recording medium stores a program for causing a computer to execute a process that includes: determining that a first specific instruction for executing parallel calculations of the same type, each calculation operating on a different piece of data, is generated by combining first and second instructions included in a first code; retrieving, from the first code, a third instruction for calculating data referenced by the first instruction and a fourth instruction for calculating data referenced by the second instruction; and selecting the third and fourth instructions as candidates of instructions to be combined with each other preferentially, to generate a second specific instruction which is different from the first specific instruction.

Publication date: 03-10-2013

Methods and apparatus to avoid surges in di/dt by throttling gpu execution performance

Number: US20130262831A1
Assignee: Nvidia Corp

Systems and methods for throttling GPU execution performance to avoid surges in DI/DT. A processor includes one or more execution units coupled to a scheduling unit configured to select instructions for execution by the one or more execution units. The execution units may be connected to one or more decoupling capacitors that store power for the circuits of the execution units. The scheduling unit is configured to throttle the instruction issue rate of the execution units based on a moving average issue rate over a large number of scheduling periods. The number of instructions issued during the current scheduling period is less than or equal to a throttling rate maintained by the scheduling unit that is greater than or equal to a minimum throttling issue rate. The throttling rate is set equal to the moving average plus an offset value at the end of each scheduling period.
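
The throttling rule can be sketched as a small update run at the end of each scheduling period; the smoothing constant, offset, and minimum rate below are illustrative assumptions, not the patented values.

```cpp
// Next period's issue cap = moving average of past issue rates plus an
// offset, clamped to a minimum throttling issue rate. Limiting the rise
// above the average bounds the current ramp (di/dt).
struct Throttle {
    double moving_avg = 0.0;
    double alpha = 1.0 / 64.0;   // smooths over many scheduling periods
    double offset = 2.0;         // allowed rise above the average
    double min_rate = 4.0;       // instructions per period

    double end_of_period(double issued_this_period) {
        moving_avg += alpha * (issued_this_period - moving_avg);
        double rate = moving_avg + offset;
        return rate < min_rate ? min_rate : rate;   // next period's cap
    }
};
```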

Publication date: 03-10-2013

Memory Disambiguation Hardware To Support Software Binary Translation

Number: US20130262838A1
Assignee: Intel Corp

A method of memory disambiguation hardware to support software binary translation is provided. This method includes unrolling a set of instructions to be executed within a processor, the set of instructions having a number of memory operations. An original relative order of the memory operations is determined. Then, possible reordering problems are detected and identified in software. A reordering problem arises when a first memory operation has been reordered prior to, and aliases with, a second memory operation with respect to the original order of memory operations. The reordering problem is addressed, and a relative order of memory operations is communicated to the processor.
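
A simplified software version of the alias check (quadratic and purely illustrative, unlike the patented hardware support): flag any operation hoisted above an originally older operation that touches an overlapping address range.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct MemOp { uint64_t addr; uint32_t size; int orig_pos; };

bool overlaps(const MemOp& a, const MemOp& b) {
    return a.addr < b.addr + b.size && b.addr < a.addr + a.size;
}

// `schedule` is the reordered sequence; orig_pos records program order.
bool has_reordering_problem(const std::vector<MemOp>& schedule) {
    for (std::size_t i = 0; i < schedule.size(); ++i)
        for (std::size_t j = i + 1; j < schedule.size(); ++j)
            // j runs later but was originally older, and the two alias.
            if (schedule[j].orig_pos < schedule[i].orig_pos &&
                overlaps(schedule[i], schedule[j]))
                return true;
    return false;
}
```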

Publication date: 03-10-2013

Semiconductor device

Number: US20130263129A1
Assignee: Renesas Electronics Corp

A semiconductor device includes an instruction decoder that decodes an instruction code and thereby generates instruction information, an execution unit that performs an operation based on the instruction information through pipeline processing, and a pipeline control unit that controls an order of the instruction code to be processed in the pipeline processing, in which the pipeline control unit includes a register for defining presence/absence of an authority to execute a first privilege program for each virtual machine, the first privilege program being to be executed on one virtual machine, refers to the register, and when the virtual machine that has issued the instruction code relating to the first privilege program has an authority to execute the first privilege program, instructs the execution unit to execute a process based on the instruction code relating to a second privilege program, based on an operation of the first privilege program.

Publication date: 31-10-2013

Method of Concurrent Instruction Execution and Parallel Work Balancing in Heterogeneous Computer Systems

Number: US20130290688A1
Assignee:

Embodiments of the present invention provide for concurrent instruction execution in heterogeneous computer systems by forming a parallel execution context whenever a first software thread encounters a parallel execution construct. The parallel execution context may comprise a reference to instructions to be executed concurrently, a reference to data said instructions may depend on, and a parallelism level indicator whose value specifies the number of times said instructions are to be executed. The first software thread may then signal to other software threads to begin concurrent execution of instructions referenced in said context. Each software thread may then decrease the parallelism level indicator and copy data referenced in the parallel execution context to said thread's private memory location and modify said data to accommodate for the new location. Software threads may be executed by a processor and operate on behalf of other processing devices or remote computer systems.

1. In a computer system, a method of concurrent execution of instructions comprising: forming a parallel execution context; copying the parallel execution context; modifying the parallel execution context; and executing instructions referenced in said parallel execution context concurrently by multiple threads.

2. The method of claim 1, wherein the parallel execution context comprises at least a reference to instructions to be executed concurrently, a reference to data said instructions may depend on, and a parallelism level indicator initialized to the number of times said instructions are to be executed.

3. The method of claim 1, wherein copying the parallel execution context comprises at least duplicating data referenced in said context to enable independent access to said data from multiple threads.

4. The method of claim 1, wherein modifying the parallel execution context comprises at least updating the parallelism level indicator in accordance with the actual number of ...

Publication date: 31-10-2013

Method and device for determining parallelism of tasks of a program

Number: US20130290975A1
Assignee: Intel Corp

A method and device for determining parallelism of tasks of a program comprises generating a task data structure to track the tasks and assigning a node of the task data structure to each executing task. Each node includes a task identification number and a wait number. The task identification number uniquely identifies the corresponding task from other currently executing tasks and the wait number corresponds to the task identification number of a node corresponding to the last descendant task of the corresponding task that was executed prior to a wait command. The parallelism of the tasks is determined by comparing the relationship between the tasks.

Publication date: 07-11-2013

Mitigation of thread hogs on a threaded processor using a general load/store timeout counter

Number: US20130297910A1
Assignee: Oracle International Corp

Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes entries which may be allocated for use by any thread. Control logic detects long latency instructions. Long latency instructions have a latency greater than a given threshold. One example is a load instruction that has a read-after-write (RAW) data dependency on a store instruction that misses a last-level data cache. The long latency instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the long latency instruction are held at a given pipeline stage until the long latency instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the long latency instruction is being serviced.

Publication date: 07-11-2013

Semiconductor device

Number: US20130297916A1
Assignee: Renesas Electronics Corp

A related-art semiconductor device suffers from the problem that processing capacity is degraded by switching the occupied state for each partition. A semiconductor device according to the present invention includes an execution unit that executes an arithmetic instruction, and a scheduler including multiple first setting registers, each defining a correspondence relationship between hardware threads and partitions, that generates a thread select signal on the basis of a partition schedule and a thread schedule. In response to a first occupation control signal, output when the execution unit executes a first occupation start instruction, the scheduler outputs a thread select signal designating a specific hardware thread of the partition indicated by the first occupation control signal, without depending on the thread schedule.

Publication date: 14-11-2013

MFENCE and LFENCE Micro-Architectural Implementation Method and System

Number: US20130305018A1
Assignee: Individual

A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.
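
A conceptual model of the described flow, not the actual micro-architecture: a queue of pre-fence access ids stands in for the memory ordering unit's buffer, newer accesses stall while the fence is pending, and the fence dispatches once every older access has retired.

```cpp
#include <queue>

struct MemoryOrderingUnit {
    std::queue<int> older;        // ids of accesses older than the fence
    bool fence_pending = false;

    void dispatch_fence() { fence_pending = true; }

    // Accesses newer than the fence are stalled while it is pending.
    bool may_issue_newer() const { return !fence_pending; }

    // Older accesses gradually retire; when the last one retires, the
    // fence dispatches from its buffer and newer accesses proceed.
    void retire_one() {
        if (!older.empty()) older.pop();
        if (older.empty() && fence_pending) fence_pending = false;
    }
};
```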

Publication date: 14-11-2013

Speeding Up Younger Store Instruction Execution after a Sync Instruction

Number: US20130305022A1
Assignee: International Business Machines Corp

Mechanisms are provided, in a processor, for executing instructions that are younger than a previously dispatched synchronization (sync) instruction. An instruction sequencer unit of the processor dispatches a sync instruction. The sync instruction is sent to a nest of one or more devices outside of the processor. The instruction sequencer unit dispatches a subsequent instruction after dispatching the sync instruction. The dispatching of the subsequent instruction after dispatching the sync instruction is performed prior to receiving a sync acknowledgement response from the nest. The instruction sequencer unit performs a completion of the subsequent instruction based on whether completion of the subsequent instruction is dependent upon receiving the sync acknowledgement from the nest and completion of the sync instruction.

Publication date: 28-11-2013

Apparatus and method for accelerating operations in a processor which uses shared virtual memory

Number: US20130318323A1
Assignee: Intel Corp

An apparatus and method are described for coupling a front end core to an accelerator component (e.g., such as a graphics accelerator). For example, an apparatus is described comprising: an accelerator comprising one or more execution units (EUs) to execute a specified set of instructions; and a front end core comprising a translation lookaside buffer (TLB) communicatively coupled to the accelerator and providing memory access services to the accelerator, the memory access services including performing TLB lookup operations to map virtual to physical addresses on behalf of the accelerator and in response to the accelerator requiring access to a system memory.

Publication date: 12-12-2013

Systems and methods for efficient scheduling of concurrent applications in multithreaded processors

Number: US20130332711A1
Assignee: Convey Computer

Systems and methods which provide a modular processor framework and instruction set architecture designed to efficiently execute applications whose memory access patterns are irregular or non-unit stride are disclosed. A hybrid multithreading framework (HMTF) of embodiments provides a framework for constructing tightly coupled, chip-multithreading (CMT) processors that contain specific features well-suited to hiding latency to main memory and executing highly concurrent applications. The HMTF of embodiments includes an instruction set designed specifically to exploit the high degree of parallelism and the concurrency control mechanisms present in the HMTF hardware modules. The instruction format implemented by an HMTF of embodiments is designed to give the architecture, the runtime libraries, and/or the application ultimate control over how and when concurrency between thread cache units is initiated. For example, one or more bits of the instruction payload may be designated as a context switch bit (CTX) for expressly controlling context switching.

Publication date: 06-02-2014

Space efficient checkpoint facility and technique for processor with integrally indexed register mapping and free-list arrays

Number: US20140040595A1
Author: Thang M. Tran
Assignee: FREESCALE SEMICONDUCTOR INC

A processor may efficiently implement register renaming and checkpoint repair even in instruction set architectures with large numbers of wide (bit-width) registers by (i) renaming all destination operand register targets, (ii) implementing free list and architectural-to-physical mapping table as a combined array storage with unitary (or common) read, write and checkpoint pointer indexing and (iii) storing checkpoints as snapshots of the mapping table, rather than of actual register contents. In this way, uniformity (and timing simplicity) of the decode pipeline may be accentuated and architectural-to-physical mappings (or allocable mappings) may be efficiently shuttled between free-list, reorder buffer and mapping table stores in correspondence with instruction dispatch and completion as well as checkpoint creation, retirement and restoration.
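
The space saving follows from snapshotting mappings instead of register values; a minimal sketch of that idea, with the free-list and reorder-buffer interaction elided.

```cpp
#include <cstdint>
#include <vector>

// A checkpoint copies the (small) architectural-to-physical mapping
// table, never the (wide) physical register contents.
struct RenameState {
    std::vector<uint16_t> map;   // arch reg index -> physical reg index

    explicit RenameState(std::size_t arch_regs) : map(arch_regs, 0) {}

    void rename(uint16_t arch, uint16_t fresh_phys) { map[arch] = fresh_phys; }

    std::vector<uint16_t> checkpoint() const { return map; }   // snapshot

    void restore(const std::vector<uint16_t>& snap) { map = snap; }
};
```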

Publication date: 27-02-2014

Method, apparatus, and system for speculative abort control mechanisms

Number: US20140059333A1
Assignee: Intel Corp

An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions.

Publication date: 27-03-2014

Optimizing System Throughput By Automatically Altering Thread Co-Execution Based On Operating System Directives

Number: US20140089637A1
Assignee: International Business Machines Corp

A technique for optimizing program instruction execution throughput in a central processing unit core (CPU). The CPU implements a simultaneous multithreading (SMT) operational mode wherein program instructions associated with at least two software threads are executed in parallel as hardware threads while sharing one or more hardware resources used by the CPU, such as cache memory, translation lookaside buffers, functional execution units, etc. As part of the SMT mode, the CPU implements an autothread (AT) operational mode. During the AT operational mode, a determination is made whether there is a resource conflict between the hardware threads that undermines instruction execution throughput. If a resource conflict is detected, the CPU adjusts the relative instruction execution rates of the hardware threads based on relative priorities of the software threads.

Publication date: 03-04-2014

Memory sharing across distributed nodes

Number: US20140095810A1
Assignee: Oracle International Corp

A method and apparatus are disclosed for enabling nodes in a distributed system to share one or more memory portions. A home node makes a portion of its main memory available for sharing, and one or more sharer nodes mirrors that shared portion of the home node's main memory in its own main memory. To maintain memory coherency, a memory coherence protocol is implemented. Under this protocol, load and store instructions that target the mirrored memory portion of a sharer node are trapped, and store instructions that target the shared memory portion of a home node are trapped. With this protocol, valid data is obtained from the home node and updates are propagated to the home node. Thus, no “dirty” data is transferred between sharer nodes. As a result, the failure of one node will not cause the failure of another node or the failure of the entire system.

Publication date: 06-01-2022

COMMAND PRIORITIZATION IN A COMMAND QUEUE

Number: US20220004337A1
Assignee:

Devices and techniques for command prioritization in a command queue of a memory device are described herein. A command can be received at the memory device. An expiration time for the command can be obtained and the command can be placed into the command queue. Entries in the command queue are ordered by expiration times of commands stored therein, such that earlier entries are closer to the head of the command queue. When the memory controller is able to perform a command, the memory controller selects the next command at the head of the command queue to perform.

1. A memory device, comprising: a memory array; and a memory controller including: a command queue buffer; and processing circuitry including one or more hardware processors, the processing circuitry configured to perform operations to implement command prioritization in a command queue of the memory device, the operations comprising: receive a command at the memory device; obtain an expiration time for the command; sort, based on the expiration time for the command, the command among other pending commands by use of a command queue solely ordered by expiration times of commands represented therein, wherein the expiration times of commands represented in the command queue are immutable; and in response to a representation of the command being at the head of the command queue, initiate performance of the command against the memory array.

2. The memory device of claim 1, wherein, to obtain the expiration time for the command, the processing circuitry is configured to read from a message used to deliver the command to the memory device.

3. The memory device of claim 1, wherein, to obtain the expiration time for the command, the processing circuitry is configured to create the expiration time by the memory device.

4. The memory device of claim 3, wherein, to create the expiration time by the memory device, the processing circuitry is configured to: ...
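
The queue discipline itself maps naturally onto a min-heap keyed by expiration time; a sketch under that assumption (Command and its fields are stand-ins, not the device's data layout).

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Commands carry an immutable expiration time; the queue is ordered
// solely by it, so the head is always the command expiring soonest.
struct Command { uint64_t expires_at; int id; };

struct LaterExpiry {
    bool operator()(const Command& a, const Command& b) const {
        return a.expires_at > b.expires_at;   // min-heap on expiration
    }
};

using CommandQueue =
    std::priority_queue<Command, std::vector<Command>, LaterExpiry>;

// Usage: q.push({deadline, id}); when the controller is free it takes
// q.top() (head of the queue) and q.pop() to initiate performance.
```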

Publication date: 05-01-2017

VARIABLE LATENCY PIPE FOR INTERLEAVING INSTRUCTION TAGS IN A MICROPROCESSOR

Number: US20170003971A1
Assignee:

Techniques disclosed herein describe a variable latency pipe for interleaving instruction tags in a processor. According to one embodiment presented herein, an instruction tag is associated with an instruction upon issue of the instruction from the issue queue. One of a plurality of positions in the latency pipe is determined. The pipe stores one or more instruction tags, each associated with a respective instruction. The pipe also stores the instruction tags in a respective position based on the latency of each respective instruction. The instruction tag is stored at the determined position in the pipe.

1. A method for issuing instructions in a processor, comprising: upon issue of an instruction from an issue queue, associating an instruction tag with the instruction; determining one of a plurality of positions in a pipe to store the instruction tag, the plurality of positions being ordered from a head position to a tail position, the pipe storing one or more instruction tags each associated with a respective instruction, the pipe storing the one or more instruction tags in a respective position based on the latency of each of the respective instructions, and the position of the instruction tag being determined based on a latency of the instruction relative to the latency of each of the respective instructions; and storing the instruction tag at the determined position in the pipe.
2. The method of claim 1, further comprising: broadcasting an instruction tag stored at the tail position in the pipe; and removing the instruction tag from the pipe.
3. The method of claim 2, wherein the broadcasted instruction tag wakes up an instruction in the issue queue that is dependent on an instruction associated with the broadcasted instruction tag.
4. The method of claim 2, wherein the broadcasted instruction tag indicates to a completion logic that an instruction associated with the broadcasted instruction tag has executed.
5. The method of claim 1, wherein each of the one or more ...
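
One way to picture the pipe is as a shift register indexed from the tail: a tag is written `latency` slots from the tail and drifts one slot per cycle until it reaches the tail and is broadcast. A sketch under those assumptions (depth and names are illustrative):

```python
# Sketch of a variable-latency pipe for instruction tags.

class LatencyPipe:
    def __init__(self, depth):
        self.slots = [None] * depth      # index 0 = tail, depth-1 = head

    def issue(self, itag, latency):
        assert 1 <= latency <= len(self.slots) and self.slots[latency - 1] is None
        self.slots[latency - 1] = itag   # position chosen by instruction latency

    def tick(self):
        broadcast = self.slots[0]        # the tag at the tail completes now
        self.slots = self.slots[1:] + [None]
        return broadcast                 # would wake up dependent instructions

pipe = LatencyPipe(depth=8)
pipe.issue("mul#7", latency=4)
pipe.issue("add#9", latency=1)
for cycle in range(4):
    done = pipe.tick()
    if done:
        print(f"cycle {cycle}: broadcast {done}")
# cycle 0: broadcast add#9
# cycle 3: broadcast mul#7
```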

Publication date: 05-01-2017

DATA PROCESSING SYSTEMS

Number: US20170003972A1
Assignee: ARM LIMITED

A data processing system has an execution pipeline with programmable execution stages which execute instructions to perform data processing operations provided by a host processor and in which execution threads are grouped together into groups in which the threads are executed in lockstep. The system also includes a compiler that compiles programs to generate instructions for the execution stages. The compiler is configured to, for an operation that comprises a memory transaction: issue to the execution stage instructions for executing the operation for the thread group to: perform the operation for the thread group as a whole; and provide the result of the operation to all the active threads of the group. At least one execution stage is configured to, in response to the instructions: perform the operation for the thread group as a whole; and provide the result of the operation to all the active threads of the group.

1. A method of operating a data processing system comprising an execution pipeline that comprises one or more programmable execution stages which execute instructions to perform data processing operations, and in which execution threads are grouped together into thread groups in which the threads of the thread group are executed in lockstep, one instruction at a time, the method comprising: for an operation to be executed for a thread group by an execution stage of the execution pipeline of the data processing system that comprises a memory transaction: issuing to the execution stage an instruction or set of instructions to cause the execution stage to perform the operation for a thread group as a whole and to provide the result of the operation to all the active threads of the thread group; and the execution stage of the execution pipeline, in response to the instruction or set of instructions, performing the operation for a thread group as a whole and providing the result of the operation to all the active threads of the thread group. ...

Publication date: 07-01-2016

METHOD OF OPERATING A MULTI-THREAD CAPABLE PROCESSOR SYSTEM, AN AUTOMOTIVE SYSTEM COMPRISING SUCH MULTI-THREAD CAPABLE PROCESSOR SYSTEM, AND A COMPUTER PROGRAM PRODUCT

Number: US20160004535A1
Assignee:

A method of operating a multi-thread capable processor system comprising a plurality of processor pipelines is described. The method comprises fetching an instruction comprising an address and selecting an operation mode based on the address of the fetched instruction, the operation mode being selected from at least a lock-step mode and a multi-thread mode. If the operation mode is selected to be the lock-step mode, the method comprises letting at least two processor pipelines of the multi-thread capable processor system execute the instruction in lock-step mode to obtain respective lock-step results, comparing the respective lock-step results against a comparison criterion for determining whether the respective lock-step results match, and, if the respective lock-step results match, determining a matching result from the respective lock-step results and writing back the matching results.

1. A method of operating a multi-thread capable processor system comprising a plurality of processor pipelines, the method comprising: fetching an instruction comprising an address; selecting an operation mode based on the address of the fetched instruction, wherein the operation mode is selected from at least a lock-step mode and a multi-thread mode; and, if the operation mode is selected to be the lock-step mode: letting at least two processor pipelines of the multi-thread capable processor system execute the instruction in lock-step mode to obtain respective lock-step results; comparing the respective lock-step results against a comparison criterion for determining whether the respective lock-step results match; and, if the respective lock-step results match, determining a matching result from the respective lock-step results and writing back the matching results.
2. A method according to claim 1, further comprising: signalling an error if the respective lock-step results do not match.
3. A method according to claim 1, further comprising: letting a first processor pipeline of ...
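
The mode selection and result comparison can be modelled compactly. In the sketch below the lock-step address range, the pipeline stand-ins, and the exact-match comparison criterion are all assumptions for illustration:

```python
# Sketch: select lock-step vs multi-thread mode from the instruction address.

LOCKSTEP_REGION = range(0x1000, 0x2000)   # assumed safety-critical code range

def execute(pipelines, instruction, address):
    if address in LOCKSTEP_REGION:
        # Lock-step mode: at least two pipelines run the same instruction.
        results = [p(instruction) for p in pipelines[:2]]
        if len(set(results)) != 1:        # comparison criterion: exact match
            raise RuntimeError("lock-step mismatch: error signalled")
        return results[0]                 # the matching result is written back
    # Multi-thread mode: a single pipeline executes the instruction.
    return pipelines[0](instruction)

pipe_a = pipe_b = lambda insn: insn * 2   # stand-ins for processor pipelines
print(execute([pipe_a, pipe_b], 21, address=0x1400))   # 42, via lock-step
```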

Publication date: 07-01-2016

MULTIPLE ISSUE INSTRUCTION PROCESSING SYSTEM AND METHOD

Number: US20160004538A1
Author: Lin Kenneth Chenghao
Assignee:

A multiple issue instruction processing system is provided. The system includes a central processing unit (CPU), a memory system and an instruction control unit. The CPU is configured to execute one or more instructions of the executable instructions at the same time. The memory system is configured to store the instructions. The instruction control unit is configured to, based on the location of a branch instruction stored in a track table, control the memory system to output the instructions likely to be executed to the CPU.

1. A multiple issue instruction processing system, comprising: a central processing unit (CPU) configured to execute one or more instructions of executable instructions at the same time; a memory system configured to store the instructions; and an instruction control unit configured to, based on location of a branch instruction stored in a track table, control the memory system to output the instructions likely to be executed to the CPU.
2. The system according to claim 1, wherein: the instruction control unit further includes a tracker, and the tracker is configured to: based on the location of the branch instruction stored in the track table, move in advance from a first branch instruction of an instruction being executed by the CPU and point to a branch instruction after a number of levels of branches; based on the branch instruction passed in the process of the tracker moving, select the instructions in the corresponding instruction segment; and control the memory system to output the selected instructions to the CPU.
3. The system according to claim 2, wherein: the instruction control unit also includes a segment pruner configured to give different segments to a target instruction segment of every branch instruction and a fall-through instruction segment of every branch instruction, and to give a different segment number to every segment; and the instruction control unit is further configured to control the memory system to output an instruction ...

Publication date: 02-01-2020

Deterministic Optimization via Performance Tracking in a Data Storage System

Number: US20200004456A1
Assignee: SEAGATE TECHNOLOGY LLC

A semiconductor data storage memory can receive data access commands into a queue in a first time sequence that correspond with the transfer of data between a host and portions of the memory. The memory may be divided into separate portions that each have a different owner and the access commands may be issued to each of the respective separate portions. The access commands can subsequently be executed in a different, second time sequence responsive to estimated completion times for each of the access commands based on measured completion times for previously serviced, similar commands to maintain a nominally consistent quality of service level for each of the respective owners.
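
Estimating completion times from previously serviced, similar commands is essentially a running average per command class. A sketch, where keying the history by (owner, operation) is an assumption:

```python
# Sketch: reorder queued access commands by estimated completion time,
# with estimates derived from measured times of similar past commands.
from collections import defaultdict

history = defaultdict(list)        # (owner, op) -> measured completion times

def record(owner, op, measured_time):
    history[(owner, op)].append(measured_time)

def estimate(owner, op, default=1.0):
    samples = history[(owner, op)]
    return sum(samples) / len(samples) if samples else default

def schedule(queue):
    # Commands arrive in one time sequence but execute in another:
    # shortest estimated completion first, keeping per-owner service steady.
    return sorted(queue, key=lambda cmd: estimate(cmd[0], cmd[1]))

record("owner0", "read", 0.2); record("owner1", "write", 0.9)
queue = [("owner1", "write", "cmdA"), ("owner0", "read", "cmdB")]
print([c[2] for c in schedule(queue)])   # ['cmdB', 'cmdA']
```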

Publication date: 04-01-2018

System and Method for Out-of-Order Clustered Decoding

Number: US20180004512A1
Author: Combs Jonathan D.
Assignee:

A processor includes a core to execute decoded instructions and a front end. The front end includes two decode clusters and circuitry to receive data elements representing undecoded instructions, in program order, and to direct different subsets of the data elements to the two decode clusters. A splitter begins directing data elements to the first decode cluster, detects a cluster switching trigger condition, and directs a second subset of the data elements that immediately follows the first subset of data elements in program order to the second decode cluster. The trigger condition may be a predicted taken branch. The front end also includes circuitry to merge the decoded instructions generated by the first decode cluster and the decoded instructions generated by the second decode cluster to generate a sequence of decoded instructions in program order, based on a toggle indicator, and to provide it to the core for execution.

1. A system, comprising: a core to execute decoded instructions; a first decode cluster; a second decode cluster; a first output queue; a second output queue; circuitry to: receive a plurality of data elements, each to represent an undecoded instruction in an ordered sequence of undecoded instructions of a program in program order; direct a first subset of the plurality of data elements to the first decode cluster, the data elements in the first subset of data elements to be in program order, and the first decode cluster including circuitry to: decode the first subset of data elements; and store decode results as decoded instructions in the first output queue; detect that a trigger condition for a cluster switch has been met; and direct, responsive to the detection, a second subset of the plurality of data elements that immediately follows the first subset of data elements in program order to the second decode cluster, the second decode cluster including circuitry to: decode the second subset of data elements; and store decode results as decoded instructions in the second output queue ...

Publication date: 04-01-2018

Administering instruction tags in a computer processor

Number: US20180004516A1
Assignee: International Business Machines Corp

Administering ITAGs in a computer processor includes, for each instruction in a single-thread mode: incrementing a value of a wrap around counter; setting a wrap bit to a predefined value if incrementing the value causes the counter to wrap around; generating, in dependence upon the counter value and the wrap bit, an ITAG for the instruction, the ITAG comprising a bit string having a wrap bit and an index comprising the counter value; and, for each instruction in a multi-thread mode: incrementing the value of the wrap around counter; setting a wrap bit to a predefined value if incrementing the value causes the counter to wrap around; and generating, in dependence upon the counter value and the wrap bit, an ITAG for the instruction, the ITAG comprising a bit string having the wrap bit, a thread identifier, and an index comprising the counter value.
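
A sketch of the ITAG generation described above; the counter width, the two-bit thread-identifier field, and toggling the wrap bit on wrap-around are assumptions chosen for the example:

```python
# Sketch: ITAG = wrap bit | (thread id in multi-thread mode) | counter index.

INDEX_BITS = 6                      # assumed counter width
INDEX_MASK = (1 << INDEX_BITS) - 1

class ITagGenerator:
    def __init__(self):
        self.counter = 0
        self.wrap = 0

    def next_itag(self, thread_id=None):
        self.counter = (self.counter + 1) & INDEX_MASK
        if self.counter == 0:       # the increment wrapped the counter
            self.wrap ^= 1          # assumed policy: toggle the wrap bit
        if thread_id is None:       # single-thread mode: wrap | index
            return (self.wrap << INDEX_BITS) | self.counter
        # multi-thread mode: wrap | thread id (assumed 2 bits) | index
        return ((self.wrap << (INDEX_BITS + 2)) |
                (thread_id << INDEX_BITS) | self.counter)

gen = ITagGenerator()
print(bin(gen.next_itag()))              # single-thread ITAG, e.g. 0b1
print(bin(gen.next_itag(thread_id=1)))   # multi-thread ITAG, e.g. 0b1000010
```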

Publication date: 04-01-2018

Advanced processor architecture

Number: US20180004530A1
Author: Martin Vorbach
Assignee: HYPERION CORE Inc

The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises: 1) looking up operand sources in a Register Positioning Table and setting operand input references of the instruction to be issued accordingly; 2) checking for an Execution Unit (EXU) available for receiving a new instruction; and 3) issuing the instruction to the available Execution Unit and entering a reference to the result register addressed by the issued instruction into the Register Positioning Table (RPT).

Publication date: 07-01-2021

Energy Efficient Processor Core Architecture for Image Processor

Number: US20210004232A1
Assignee:

An apparatus that includes a program controller to fetch and issue instructions is described. The apparatus includes an execution lane having at least one execution unit to execute the instructions. The execution lane is part of an execution lane array that is coupled to a two dimensional shift register array structure, wherein execution lanes of the execution lane array are located at respective array locations and are coupled to dedicated registers at the same respective array locations in the two-dimensional shift register array.

1. (canceled)
2. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an input program to be executed on a device comprising: a plurality of random access memories, and a plurality of execution lanes, wherein different groups of the execution lanes are assigned to use a different respective random access memory of the plurality of random access memories; determining that the input program specifies two or more execution lanes in a same group of the plurality of execution lanes to compete for different memory locations in a same random access memory of the plurality of random access memories; and in response, modifying the input program to generate multiple instructions that cause execution lanes within each group to access a respective random access memory sequentially.
3. The system of claim 2, wherein the plurality of execution lanes includes a plurality of rows of execution lanes, wherein the different groups of the execution lanes comprise different rows of the execution lanes.
4. The system of claim 2, wherein the plurality of execution lanes includes a plurality of columns of execution lanes, wherein the different groups of the execution lanes comprise different columns of the execution lanes.
5. The system of claim 2, wherein the ...

Publication date: 02-01-2020

SYSTEMS AND METHODS TO PREDICT LOAD DATA VALUES

Number: US20200004536A1
Assignee:

Disclosed embodiments relate to predicting load data. In one example, a processor includes a pipeline having stages ordered as fetch, decode, allocate, write back, and commit, a training table to store an address, predicted data, a state, and a count of instances of unchanged return data, and tracking circuitry to determine, during one or more of the allocate and decode stages, whether a training table entry has a first state and matches a fetched first load instruction, and, if so, using the data predicted by the entry during the execute stage, the tracking circuitry further to update the training table during or after the write back stage to set the state of the first load instruction in the training table to the first state when the count reaches a first threshold.

1. A processor comprising: fetch and decode circuitry to fetch and decode load instructions; a pipeline having stages ordered as fetch, decode, allocate, write back, and commit; a training table to store, for each of a plurality of load instructions, an address, predicted data, a state, and a count of instances of unchanged return data; and tracking circuitry to determine, during one or more of the allocate and decode stages, whether a training table entry has a first state and matches a fetched first load instruction, and, if so, using the data predicted by the entry during the execute stage, the tracking circuitry further to update the training table during or after the write back stage by: when no match exists, adding a new entry reflecting the first load instruction; when a match exists, but has different predicted data than the data returned for the first load instruction, resetting the count and setting the state to a second state; and when a matching entry with matching predicted data exists, incrementing the count and, when the incremented count reaches a first threshold, setting the state to the first state.
2. The processor of claim 1, wherein, when the predicted data is used to optimize execution during ...
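
The training-table state machine reduces to a small update rule. A sketch, where the state names and the threshold value are assumptions:

```python
# Sketch of the training-table update: a count of unchanged return data
# promotes an entry to a predictable state once it reaches a threshold.

THRESHOLD = 3
table = {}   # address -> {"data": ..., "count": int, "state": str}

def train(address, returned_data):
    entry = table.get(address)
    if entry is None:                          # no match: add a new entry
        table[address] = {"data": returned_data, "count": 0, "state": "training"}
    elif entry["data"] != returned_data:       # different data: reset
        entry.update(data=returned_data, count=0, state="training")
    else:                                      # unchanged return data
        entry["count"] += 1
        if entry["count"] >= THRESHOLD:
            entry["state"] = "predict"         # first state: prediction usable

def predict(address):
    entry = table.get(address)
    return entry["data"] if entry and entry["state"] == "predict" else None

for _ in range(4):
    train(0x80, 7)
print(predict(0x80))   # 7: the entry reached the predictable state
```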

Publication date: 02-01-2020

SHARED COMPARE LANES FOR DEPENDENCY WAKE UP IN A PAIR-BASED ISSUE QUEUE

Number: US20200004546A1
Assignee:

An apparatus for shared compare lanes for dependency wakeup in a double issue queue includes a source dependency module that determines a number of source dependencies for two instructions to be paired in a row of a double issue queue of a processor. A source dependency includes an unavailable status of a dependent source for data required by the two instructions where the data is produced by another instruction. The apparatus includes a pairing determination module that writes each of the two instructions into a separate row of the double issue queue in response to the source dependency module determining that the number of source dependencies is greater than a source dependency maximum and pairs the two instructions in one row of the double issue queue in response to the source dependency module determining that the number of source dependencies is less than or equal to the source dependency maximum.

1. An apparatus comprising: a source dependency module that determines a number of source dependencies for two instructions intended to be paired in a row of a double issue queue of a processor, a source dependency comprising an unavailable status of a dependent source for data required by the two instructions where the data is produced by another instruction; and a pairing determination module that writes each of the two instructions into a separate row of the double issue queue in response to the source dependency module determining that the number of source dependencies is greater than a source dependency maximum and that pairs the two instructions in one row of the double issue queue in response to the source dependency module determining that the number of source dependencies is less than or equal to the source dependency maximum.
2. The apparatus of claim 1, wherein the source dependency maximum is equal to a number of dependency trackers available to the double issue queue, the dependency trackers each tracking a source dependency of paired instructions ...
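
The pairing decision itself is a simple count-and-compare. A sketch, assuming an illustrative instruction encoding and a tracker budget per row:

```python
# Sketch: pair two instructions in one issue-queue row only if their combined
# number of not-yet-ready sources fits the available dependency trackers.

SOURCE_DEPENDENCY_MAX = 3   # assumed number of dependency trackers per row

def place(insn_a, insn_b, ready_registers):
    deps = [src for insn in (insn_a, insn_b)
            for src in insn["sources"] if src not in ready_registers]
    if len(deps) > SOURCE_DEPENDENCY_MAX:
        return [(insn_a,), (insn_b,)]   # separate rows: too many wakeups
    return [(insn_a, insn_b)]           # paired in one row, trackers shared

a = {"op": "add", "sources": ["r1", "r2"]}
b = {"op": "mul", "sources": ["r3", "r4"]}
print(len(place(a, b, ready_registers={"r1", "r4"})))  # 1 -> paired (2 deps)
print(len(place(a, b, ready_registers=set())))         # 2 -> separate rows
```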

Publication date: 02-01-2020

Shared local memory tiling mechanism

Number: US20200004548A1
Assignee: Intel Corp

An apparatus to facilitate memory tiling is disclosed. The apparatus includes a memory, one or more execution units (EUs) to execute a plurality of processing threads via access to the memory and tiling logic to apply a tiling pattern to memory addresses for data stored in the memory.

Publication date: 02-01-2020

MULTI-THREADED INSTRUCTION BUFFER DESIGN

Number: US20200004549A1
Assignee:

An instruction buffer for a processor configured to execute multiple threads is disclosed. The instruction buffer is configured to receive instructions from a fetch unit and provide instructions to a selection unit. The instruction buffer includes one or more memory arrays comprising a plurality of entries configured to store instructions and/or other information (e.g., program counter addresses). One or more indicators are maintained by the processor and correspond to the plurality of threads. The one or more indicators are usable such that for instructions received by the instruction buffer, one or more of the plurality of entries of a memory array can be determined as a write destination for the received instructions, and for instructions to be read from the instruction buffer (and sent to a selection unit), one or more entries can be determined as the correct source location from which to read.

1-20. (canceled)
21. An apparatus, comprising: a plurality of banks, wherein each bank of the plurality of banks is configured to store a respective one of a plurality of instructions; and circuitry configured to: receive a read pointer, wherein the read pointer includes a value indicative of a given bank of the plurality of banks; select a subset of the plurality of banks using the read pointer and one or more decode bits associated with an instruction stored at a location specified by the read pointer; activate the subset of the plurality of banks; and read a respective instruction from each bank of the subset of the plurality of banks to generate a dispatch group.
22. The apparatus of claim 21, wherein, to select the subset of the plurality of banks, the circuitry is configured to read the one or more decode bits from a memory.
23. The apparatus of claim 21, wherein the circuitry is further configured to increment the read pointer in response to a determination that reading the respective instruction from each bank of the subset of the plurality of banks ...

Publication date: 02-01-2020

Combining load or store instructions

Number: US20200004550A1
Assignee: Qualcomm Inc

Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.
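
A sketch of the combining test for two loads: they must share a base register, land on one cache line, and together still fit the bus width. The widths below are assumptions:

```python
# Sketch: replace two narrow loads off one base register with a single
# wider load when they fit on one cache line and within the bus width.

BUS_BYTES = 16
LINE_BYTES = 64

def try_combine(ld1, ld2):
    if ld1["base"] != ld2["base"]:
        return None                        # must reference a common base register
    lo = min(ld1["offset"], ld2["offset"])
    hi = max(ld1["offset"] + ld1["size"], ld2["offset"] + ld2["size"])
    same_line = (lo // LINE_BYTES) == ((hi - 1) // LINE_BYTES)
    if same_line and (hi - lo) <= BUS_BYTES:
        # One equivalent access that uses more of the available bus width,
        # covering adjacent or non-contiguous targets on the same line.
        return {"base": ld1["base"], "offset": lo, "size": hi - lo}
    return None

print(try_combine({"base": "r5", "offset": 0, "size": 4},
                  {"base": "r5", "offset": 8, "size": 4}))
# {'base': 'r5', 'offset': 0, 'size': 12}
```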

Publication date: 02-01-2020

APPARATUS AND METHOD FOR USING PREDICTED RESULT VALUES

Number: US20200004551A1
Assignee:

An apparatus and method are provided for using predicted result values. The apparatus has processing circuitry for executing a sequence of instructions, and value prediction storage that comprises a plurality of entries, where each entry is used to identify a predicted result value for an instruction allocated to that entry. Dispatch circuitry maintains a record of pending instructions awaiting execution by the processing circuitry, and selects pending instructions from the record for dispatch to the processing circuitry for execution. The dispatch circuitry is arranged to enable at least one pending instruction to be speculatively executed by the processing circuitry using as a source operand a predicted result value provided by the value prediction storage. Allocation circuitry is arranged to apply a default allocation policy to identify a first instruction to be allocated an entry in the value prediction storage. However, the allocation circuitry is further responsive to a trigger condition to identify a dependent instruction whose result value will be dependent on the result value produced by executing the first instruction, and to then allocate an entry in the value prediction storage to store a predicted result value for the identified dependent instruction. Such an approach can enable performance improvements to be achieved through the use of predicted result values even in situations where the prediction accuracy of the predicted result value for the first instruction proves not to be that high, by instead enabling a predicted result value for the dependent instruction to be used to allow speculative execution of further dependent instructions.

1. An apparatus comprising: processing circuitry to execute a sequence of instructions; value prediction storage comprising a plurality of entries, each entry being used to identify a predicted result value for an instruction allocated to that entry; dispatch circuitry to maintain a record of pending instructions ...

Publication date: 02-01-2020

MULTITHREADED PROCESSOR CORE WITH HARDWARE-ASSISTED TASK SCHEDULING

Number: US20200004587A1
Assignee:

Embodiments of apparatuses, methods, and systems for a multithreaded processor core with hardware-assisted task scheduling are described. In an embodiment, a processor includes a first hardware thread, a second hardware thread, and a task manager. The task manager is to issue a task to the first hardware thread. The task manager includes a hardware task queue in which to store a plurality of task descriptors. Each of the task descriptors is to represent one of a single task, a collection of iterative tasks, and a linked list of tasks.

1. A processor comprising: a first hardware thread; a second hardware thread; and a task manager to issue a task to the first hardware thread, the task manager including a hardware task queue in which to store a plurality of task descriptors, each of the task descriptors to represent one of a single task, a collection of iterative tasks, and a linked list of tasks.
2. The processor of claim 1, wherein the first hardware thread includes a load/store queue to request the task from the task manager by issuing a read request to the task manager.
3. The processor of claim 1, wherein a task descriptor representing a collection of iterative tasks is to include a count value to specify a number of iterations.
4. The processor of claim 1, wherein a task descriptor of a linked list of tasks is to include a pointer to a head of the linked list of tasks.
5. The processor of claim 1, wherein an instruction set architecture of the first hardware thread and an instruction set architecture of the second hardware thread are compatible.
6. The processor of claim 5, further comprising a thread engine to migrate a software thread from the first hardware thread to the second hardware thread.
7. The processor of claim 6, wherein the thread engine is to migrate the software thread to improve performance of a graph application.
8. The processor of claim 6, wherein the first hardware thread is in a first single-threaded pipeline and the second hardware thread is ...
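
The three descriptor kinds can be modelled directly. The structure below is illustrative, not the patent's hardware format; it shows how each descriptor expands into issuable work items:

```python
# Sketch of the three task-descriptor kinds: a single task, a collection of
# iterative tasks (with a count value), and a linked list of tasks (with a
# pointer to the head of the list).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Single:
    work: str

@dataclass
class Iterative:
    work: str
    count: int                 # number of iterations

@dataclass
class Node:
    work: str
    next: Optional["Node"] = None

def expand(descriptor):
    if isinstance(descriptor, Single):
        yield descriptor.work
    elif isinstance(descriptor, Iterative):
        for i in range(descriptor.count):
            yield f"{descriptor.work}[{i}]"
    elif isinstance(descriptor, Node):     # walk the linked list from its head
        node = descriptor
        while node:
            yield node.work
            node = node.next

queue = [Single("init"), Iterative("row", 3), Node("a", Node("b"))]
for d in queue:                # the task manager issues these to a hardware thread
    print(list(expand(d)))
```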

Publication date: 07-01-2021

BATCHING WAVEFORM DATA

Number: US20210004365A1
Assignee:

Methods and apparatus for generating waveforms for application in a quantum computing device. In one aspect, a system comprises a batch generator that receives experiment data sets defining respective experiments, each experiment data set comprising a set of waveforms defined by respective waveform data; determines unique waveforms; generates a corresponding set of respective waveform data that includes the respective waveform data for each unique waveform; generates, for each of the experiments, a waveform list that references the respective waveform data in the set of respective waveform data that corresponds to the waveforms in the set; and batch instructions that are executable by waveform generator hardware and that cause the waveform generator hardware to process each waveform list by selecting each referenced waveform data in the waveform list; and generate, in response to the selected waveform data, a waveform that is suitable for application in a quantum computing device.

1. A system implemented by one or more computers, comprising: a batch generator that: receives as input a plurality of experiment data sets that each defines a respective experiment, each experiment data set comprising a set of waveforms, wherein each waveform in the set of waveforms is defined by respective waveform data; determines, from the respective waveform data for each waveform in each experiment data set, unique waveforms; generates, from the identified unique waveforms, a corresponding set of respective waveform data that includes the respective waveform data for each unique waveform; generates, for each set of waveforms of the plurality of experiments, a waveform list that references the respective waveform data in the set of respective waveform data that corresponds to the waveforms in the set; and batch instructions that are executable by waveform generator hardware and that cause the waveform generator hardware, upon execution, to generate, in response to the ...
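
The batching step is a deduplication pass followed by reference rewriting. A sketch, where identifying a waveform by its raw bytes is an assumption:

```python
# Sketch: store each unique waveform once and turn every experiment into a
# list of references into that store.

def batch(experiments):
    store, index = [], {}            # unique waveform data + lookup table
    waveform_lists = []
    for experiment in experiments:
        refs = []
        for waveform in experiment:
            key = bytes(waveform)    # assumed identity of the waveform data
            if key not in index:
                index[key] = len(store)
                store.append(waveform)
            refs.append(index[key])  # a reference instead of a copy
        waveform_lists.append(refs)
    return store, waveform_lists

exp1 = [[0, 1, 0], [1, 1, 1]]
exp2 = [[1, 1, 1], [0, 0, 1]]        # shares one waveform with exp1
store, lists = batch([exp1, exp2])
print(len(store), lists)             # 3 [[0, 1], [1, 2]]
```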

Publication date: 13-01-2022

Handling Injected Instructions in a Processor

Number: US20220012061A1
Assignee:

Aspects of the present disclosure provide a processor having: an execution unit configured to execute machine code instructions, at least one of the machine code instructions requiring multiple cycles for its execution; instruction memory holding instructions for execution, wherein the execution unit is configured to access the memory to fetch instructions for execution; an instruction injection mechanism configured to inject an instruction into the execution pipeline during execution of the at least one machine code instruction fetched from the memory; the execution unit configured to pause execution of the at least one machine code instruction, to execute the injected instruction to termination, to detect termination of the injected instruction and to automatically recommence execution of the at least one machine code instruction on detection of termination of the injected instruction.

1. A processor comprising: an execution unit configured to execute machine code instructions, a first machine code instruction requiring multiple cycles for execution; instruction memory holding the machine code instructions for execution, wherein the execution unit is configured to access the instruction memory to fetch the first machine code instruction for execution; and an instruction injection mechanism configured to inject a second machine code instruction into an execution pipeline during execution of the first machine code instruction; the execution unit configured to pause execution of the first machine code instruction, to execute the second instruction to termination, to detect termination of the second instruction and to automatically recommence execution of the first machine code instruction on detection of termination of the second instruction.
2. The processor of claim 1, wherein the machine code instructions comprise a plurality of worker threads for execution in a respective one of a plurality of time slots and a supervisor thread, and wherein the supervisor thread ...

Publication date: 03-01-2019

EXPOSING VALID BYTE LANES AS VECTOR PREDICATES TO CPU

Number: US20190004797A1
Assignee:

A streaming engine employed in a digital data processor specifies a fixed read only data stream. Once fetched, data elements in the data stream are disposed in lanes in a stream head register in the fixed order. Some lanes may be invalid, for example when the number of remaining data elements is less than the number of lanes in the stream head register. The streaming engine automatically produces a valid data word stored in a stream valid register indicating lanes holding valid data. The data in the stream valid register may be automatically stored in a predicate register or otherwise made available. This data can be used to control vector SIMD operations or may be combined with other predicate register data.

1. A digital data processor comprising: a data register file including a plurality of data registers storing data accessed by register number; an instruction memory storing instructions each specifying a data processing operation and at least one data operand; an instruction decoder connected to said instruction memory for sequentially recalling instructions from said instruction memory and determining said specified data processing operation and said specified at least one operand; at least one functional unit connected to said data register file and said instruction decoder for performing data processing operations upon at least one operand corresponding to an instruction decoded by said instruction decoder and storing results in an instruction specified data register; an address generator for generating stream memory addresses corresponding to said stream of said instruction specified sequence of a plurality of data elements; storing a next data element of said sequence of said plurality of data elements in a next sequential lane; upon filling all lanes of said stream head register, clearing said stream head register and storing a next data element of said sequence of said plurality of data elements in a first lane; and if there are fewer remaining data ...
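
Producing the stream-valid word is a mask computation over the remaining element count. A sketch with an assumed eight-lane stream head register:

```python
# Sketch: when fewer data elements remain than there are lanes, only the
# leading lanes are valid; the lane-valid bits form a predicate word.

LANES = 8   # assumed number of lanes in the stream head register

def stream_valid(remaining_elements):
    valid = min(remaining_elements, LANES)
    return (1 << valid) - 1          # bit i set <=> lane i holds valid data

print(bin(stream_valid(8)))   # 0b11111111, a full vector
print(bin(stream_valid(3)))   # 0b111, only 3 elements left in the stream
```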

Publication date: 02-01-2020

Monitoring Real-Time Processor Instruction Stream Execution

Number: US20200004954A1
Assignee:

In one example embodiment, a computing device has a processor that executes a processor instruction stream that causes the processor to perform one or more operations for the computing device. The computing device generates one or more trace data packets including a first instruction pointer of the processor instruction stream, a second instruction pointer of the processor instruction stream subsequent to the first instruction pointer, and a string of characters derived from instructions associated with a control flow transfer between the first instruction pointer of the processor instruction stream and the second instruction pointer of the processor instruction stream. The computing device determines whether the one or more trace data packets are consistent with a secure processor instruction stream known or determined to be secure from malicious processor instructions and, if not, generates an indication that the processor instruction stream is not secure.

1. A method comprising: at a computing device having a processor that executes a processor instruction stream that causes the processor to perform one or more operations for the computing device: generating one or more trace data packets that include a first instruction pointer of the processor instruction stream, a second instruction pointer of the processor instruction stream subsequent to the first instruction pointer, and a string of characters derived from instructions associated with a control flow transfer between the first instruction pointer of the processor instruction stream and the second instruction pointer of the processor instruction stream; determining whether the one or more trace data packets are consistent with a secure processor instruction stream known or determined to be secure from malicious processor instructions; and if it is determined that the one or more trace data packets are not consistent with the secure processor instruction stream, generating an indication that the processor instruction stream is not secure. ...

Publication date: 03-01-2019

METHODS AND APPARATUS FOR HANDLING RUNTIME MEMORY DEPENDENCIES

Number: US20190004804A1
Assignee: Intel Corporation

An integrated circuit may include elastic datapaths or pipelines, through which software threads or iterations of loops may be executed. Throttling circuitry may be coupled along an elastic pipeline in the integrated circuit. The throttling circuitry may include dependency detection circuitry that dynamically detects memory dependency issues that may arise during runtime. To mitigate these dependency issues, the throttling circuitry may assert stall signals to upstream stages in the pipeline.

1. An integrated circuit, comprising: a memory circuit; and a pipelined datapath coupled to the memory circuit, the pipelined datapath comprising: memory access circuitry that reads from the memory circuit using a load address and that writes into the memory circuit using a store address; and throttling circuitry coupled to the memory access circuitry, the throttling circuitry configured to compare the load address with the store address and to selectively stall a stage in the pipelined datapath based on the comparison.
2. The integrated circuit defined in claim 1, wherein the throttling circuitry comprises an address table configured to store a plurality of store addresses, and wherein the plurality of store addresses include the store address.
3. The integrated circuit defined in claim 1, wherein the throttling circuitry comprises an address table configured to store a plurality of load addresses, and wherein the plurality of load addresses include the load address.
4. The integrated circuit defined in claim 1, wherein the memory access circuitry comprises: a memory loading circuit that reads from the memory circuit using the load address; and a memory storing circuit that writes into the memory circuit using the store address, wherein at least a portion of the throttling circuitry is interposed between the memory loading circuit and the stalled stage.
5. The integrated circuit defined in claim 4, wherein the pipelined datapath further comprises compute ...
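
The dependency check reduces to comparing an incoming load address against in-flight store addresses and stalling on a match. A sketch with an illustrative in-flight set standing in for the address table:

```python
# Sketch: stall a pipeline stage when a load targets an address with a
# pending (not yet committed) store, i.e. a runtime memory dependency.

class Throttle:
    def __init__(self):
        self.inflight_stores = set()      # store addresses still in flight

    def issue_store(self, addr):
        self.inflight_stores.add(addr)

    def retire_store(self, addr):
        self.inflight_stores.discard(addr)

    def may_issue_load(self, addr):
        # A pending store to the same address means the load must wait:
        # the throttling logic would assert a stall to upstream stages.
        return addr not in self.inflight_stores

t = Throttle()
t.issue_store(0x40)
print(t.may_issue_load(0x40))   # False -> stall this pipeline stage
t.retire_store(0x40)
print(t.may_issue_load(0x40))   # True  -> the load may proceed
```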

Publication date: 03-01-2019

Stream processor with overlapping execution

Number: US20190004807A1
Assignee: Advanced Micro Devices Inc

Systems, apparatuses, and methods for implementing a stream processor with overlapping execution are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of execution pipelines. The processing throughput of the parallel processing unit is increased by overlapping execution of multi-pass instructions with single pass instructions without increasing the instruction issue rate. A first plurality of operands of a first vector instruction are read from a shared vector register file in a single clock cycle and stored in temporary storage. The first plurality of operands are accessed and utilized to initiate multiple instructions on individual vector elements on a first execution pipeline in subsequent clock cycles. A second plurality of operands are read from the shared vector register file during the subsequent clock cycles to initiate execution of one or more second vector instructions on the second execution pipeline.

Publication date: 03-01-2019

CENTRALIZED MEMORY MANAGEMENT FOR MULTIPLE DEVICE STREAMS

Number: US20190004808A1
Assignee:

Described are examples for allocating buffers for multiple components. A stream server can provide an interface to a centralized memory allocator for allocating at least one buffer in a memory to each of the multiple components. The stream server can initialize an instance of the centralized memory allocator based at least in part on a request received from a component of the multiple components via the interface. The stream server can allocate, via the instance of the centralized memory allocator, the at least one buffer for the component in the memory. The stream server can receive, via the instance of the centralized memory allocator, data for storing in the at least one buffer. The stream server can modify the data to generate modified data stored in the at least one buffer.

1. A method for allocating buffers for multiple components, comprising: providing, by a stream server, an interface to a centralized memory allocator for allocating at least one buffer in a memory to each of the multiple components; initializing, by the stream server, an instance of the centralized memory allocator based at least in part on a request received from a component of the multiple components via the interface; allocating, by the stream server via the instance of the centralized memory allocator, the at least one buffer for the component in the memory; receiving, by the stream server via the instance of the centralized memory allocator, data for storing in the at least one buffer; modifying, by the stream server, the data to generate modified data stored in the at least one buffer; receiving, by the stream server via the instance of the centralized memory allocator, a request from the component to deallocate the at least one buffer; and allocating, based on receiving the request from the component to deallocate the at least one buffer, at least a portion of memory previously allocated to the at least one buffer to another buffer for another component.
2. The method of claim 1, wherein ...

Publication date: 03-01-2019

PROCESSOR SYNTHESIS DEVICE, PROCESSOR SYNTHESIS METHOD, AND COMPUTER READABLE MEDIUM

Number: US20190004809A1
Assignee: Mitsubishi Electric Corporation

A processor synthesis device inserts a stop circuit into a circuit configuration, which is defined by processor model information and includes a plurality of operators, based on instruction set information that defines an instruction set including a plurality of instructions, the stop circuit stopping an operator not used in an instruction to be executed among the plurality of operators when each of the plurality of instructions is executed. The processor synthesis device generates processor synthesis information which is an RTL description defining a circuit configuration into which the stop circuit is inserted.

1. A processor synthesis device comprising: processing circuitry: to acquire instruction set information that defines an instruction set including a plurality of instructions and processor model information that defines a circuit configuration including a plurality of operators; and to insert a stop circuit into the circuit configuration of the processor model information based on the instruction set information, and generate processor synthesis information that defines a circuit configuration into which the stop circuit is inserted, the stop circuit stopping an operator not used in an instruction to be executed among the plurality of operators when each of the plurality of instructions is executed.
2. The processor synthesis device according to claim 1, wherein the instruction set information includes information that indicates an operator used in each of the plurality of instructions, and the processing circuitry identifies, from the instruction set information, an operator not used in at least one of the plurality of instructions among the plurality of operators, and connects the stop circuit to the operator being identified in the circuit configuration of the processor model information.
3. The processor synthesis device according to claim 1, wherein the processing circuitry further inserts a detection circuit and a selection circuit into the ...

Publication date: 03-01-2019

INSTRUCTIONS FOR REMOTE ATOMIC OPERATIONS

Number: US20190004810A1
Assignee:

Disclosed embodiments relate to atomic memory operations. In one example, a method of executing an instruction atomically and with weak order includes: fetching, by fetch circuitry, the instruction from code storage, the instruction including an opcode, a source identifier, and a destination identifier; decoding, by decode circuitry, the fetched instruction; selecting, by a scheduling circuit, an execution circuit among multiple circuits in a system; scheduling, by the scheduling circuit, execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance; and executing the decoded instruction, by the execution circuit, to: atomically read a datum from a location identified by the destination identifier, perform an operation on the datum as specified by the opcode, the operation to use a source operand identified by the source identifier, and write a result back to the location.

1. A processor to execute an instruction atomically and with weak order, the processor comprising: fetch circuitry to fetch the instruction from a code storage, the instruction comprising an opcode, a source identifier, and a destination identifier; decode circuitry to decode the fetched instruction; and a scheduling circuit to select an execution circuit among multiple circuits in the system to execute the instruction, the scheduling circuit further to schedule execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance; wherein the execution circuit is to execute the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance, wherein the executing comprises atomically reading a datum from a location identified by the destination identifier, performing an ...

Publication date: 03-01-2019

STREAM PROCESSOR WITH DECOUPLED CROSSBAR FOR CROSS LANE OPERATIONS

Number: US20190004814A1
Assignee:

Systems, apparatuses, and methods for implementing a decoupled crossbar for a stream processor are disclosed. In one embodiment, a system includes at least a multi-lane execution pipeline, a vector register file, and a crossbar. The system is configured to determine if a given instruction in an instruction stream requires a permutation on data operands retrieved from the vector register file. The system conveys the data operands to the multi-lane execution pipeline on a first path which includes the crossbar responsive to determining the given instruction requires a permutation on the data operands. The crossbar then performs the necessary permutation to route the data operands to the proper processing lanes. Otherwise, the system conveys the data operands to the multi-lane execution pipeline on a second path which bypasses the crossbar responsive to determining the given instruction does not require a permutation on the input operands.

1. A system comprising: a multi-lane execution pipeline; a vector register file; and a crossbar; wherein the system is configured to: retrieve a plurality of data operands from the vector register file; convey the plurality of data operands to the multi-lane execution pipeline via the crossbar responsive to determining a permutation is required; and convey the plurality of data operands to the multi-lane execution pipeline by bypassing the crossbar responsive to determining a permutation is not required.
2. The system as recited in claim 1, wherein the crossbar comprises multiple layers, and wherein the system is further configured to: perform a first permutation of data operands across lanes of the multi-lane execution pipeline with a first layer of N×N crossbars, wherein N is a positive integer; and perform a second permutation of data operands across lanes of the multi-lane execution pipeline with a second layer of N×N crossbars.
3. The system as recited in claim 1, wherein the crossbar comprises a first N/2-by-N/2 ...

Publication date: 03-01-2019

REGISTER PARTITION AND PROTECTION FOR VIRTUALIZED PROCESSING DEVICE

Number: US20190004840A1
Assignee: ATI TECHNOLOGIES ULC

A register protection mechanism for a virtualized accelerated processing device ("APD") is disclosed. The mechanism protects registers of the accelerated processing device designated as physical-function-or-virtual-function registers ("PF-or-VF* registers"), which are single architectural instance registers that are shared among different functions that share the APD in a virtualization scheme whereby each function can maintain a different value in these registers. The protection mechanism for these registers comprises comparing the function associated with the memory address specified by a particular register access request to the "currently active" function for the APD and disallowing the register access request if a match does not occur.

1. A method for protecting a register for a virtualization-enabled processing device, the method comprising: receiving a request to access a register associated with the processing device, the register being time-shared among functions in a virtualization scheme such that a different function owns the register during a different time-slice of the virtualization scheme; analyzing an address specified by the request to obtain a requester function identifier and an offset; identifying a hardware unit associated with the register based on the offset; forwarding the requester function identifier and the offset to a hardware unit associated with the register; and comparing the requester function identifier to an active function identifier that indicates which function is currently active on the processing device.
2. The method of claim 1, wherein comparing the requester function identifier to the active function identifier comprises: determining that the requester function identifier and the active function identifier indicate the same function; and in response, allowing the access to the register to occur.
3. The method of claim 1, wherein comparing the requester function identifier to the active function identifier comprises: determining ...
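
The protection check itself is one comparison. A sketch, where the address layout (function identifier in the high bits, register offset in the low bits) is an assumption:

```python
# Sketch: allow a register access only when the function identifier decoded
# from the request address matches the currently active function.

FUNC_SHIFT = 16                     # assumed: function id in the high bits

def access_register(request_addr, active_function, registers):
    requester_function = request_addr >> FUNC_SHIFT
    offset = request_addr & ((1 << FUNC_SHIFT) - 1)   # selects the register
    if requester_function != active_function:
        raise PermissionError("register access from inactive function denied")
    return registers[offset]

regs = {0x10: 0xDEAD}
print(hex(access_register((2 << FUNC_SHIFT) | 0x10, active_function=2,
                          registers=regs)))           # 0xdead
```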

Publication date: 02-01-2020

Control Transfer Termination Instructions Of An Instruction Set Architecture (ISA)

Number: US20200004991A1
Assignee:

In an embodiment, the present invention includes a processor having an execution logic to execute instructions and a control transfer termination (CTT) logic coupled to the execution logic. This logic is to cause a CTT fault to be raised if a target instruction of a control transfer instruction is not a CTT instruction. Other embodiments are described and claimed.

1. A processor comprising: a fetch unit to fetch instructions; a decode unit to decode the instructions, the decode unit including a control transfer termination (CTT) logic, responsive to a control transfer instruction, to decode the control transfer instruction into a decoded control transfer instruction, associate second state information with the decoded control transfer instruction to indicate that the CTT logic is in a wait state and provide the decoded control transfer instruction and the second state information to an execution unit, the CTT logic to transition from an idle state to the wait state responsive to the control transfer instruction; the execution unit to execute decoded instructions; and a retirement unit to retire the decoded control transfer instruction, wherein the retirement unit is to raise a fault if a next instruction to be retired after the decoded control transfer instruction is not a CTT instruction.

This application is a continuation of U.S. patent application Ser. No. 15/635,294, filed Jun. 28, 2017, which is a continuation of U.S. patent application Ser. No. 13/690,221, filed Nov. 30, 2012, now U.S. Pat. No. 9,703,567, issued Jul. 11, 2017, the content of which is hereby incorporated by reference. Return-oriented programming (ROP) is a computer security exploit technique in which an attacker uses software control of a stack to execute an attacker-chosen sequence of machine instructions. These clusters of instructions typically end with a programmer-intended or unintended return (RET) instruction within existing program code. The intended or unintended RET instruction transfers ...

Publication date: 03-01-2019

High-Speed, Fixed-Function, Computer Accelerator

Number: US20190004995A1
Assignee: WISCONSIN ALUMNI RESEARCH FOUNDATION

A hardware accelerator for computers combines a stand-alone, high-speed, fixed program dataflow functional element with a stream processor, the latter of which may autonomously access memory in predefined access patterns after receiving simple stream instructions and provide them to the dataflow functional element. The result is a compact, high-speed processor that may exploit fixed program dataflow functional elements.

Publication date: 02-01-2020

Tile Assignment to Processing Cores Within a Graphics Processing Unit

Number: US20200005423A1
Assignee: Imagination Technologies Ltd

A graphics processing unit configured to process graphics data using a rendering space which is sub-divided into a plurality of tiles, the graphics processing unit comprising: a plurality of processing cores configured to render graphics data; cost indication logic configured to obtain a cost indication for each of a plurality of sets of one or more tiles of the rendering space, wherein the cost indication for a set of one or more tiles is suggestive of a cost of processing the set of one or more tiles; similarity indication logic configured to obtain similarity indications between sets of one or more tiles of the rendering space, wherein the similarity indication between two sets of one or more tiles is indicative of a level of similarity between the two sets of tiles according to at least one processing metric; and scheduling logic configured to assign the sets of one or more tiles to the processing cores for rendering in dependence on the cost indications and the similarity indications.

Publication date: 20-01-2022

INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING

Number: US20220019431A1
Assignee: Intel Corporation

A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.

1. A graphics processing unit (GPU) comprising: a plurality of memory controllers; cache memory coupled with the plurality of memory controllers; and a graphics multiprocessor coupled with the cache memory and the plurality of memory controllers, the graphics multiprocessor having a single instruction, multiple thread (SIMT) architecture, wherein the graphics multiprocessor includes: a register file; and circuitry coupled with the register file, the circuitry including a first core to perform a mixed precision matrix operation and a second core to perform, in response to a single instruction, multiple compute operations, wherein the multiple compute operations include a first operation to perform a fused multiply-add and a second operation to apply a rectified linear unit function to a result of the first operation.
2. The GPU as in claim 1, wherein the first operation and the second operation are single instruction multiple data (SIMD) operations.
3. The GPU as in claim 1, wherein the multiple compute operations are performed on input in a 16-bit floating-point format having a 1-bit sign and an 8-bit exponent.
4. The GPU as in claim 3, wherein the second core includes a dynamic precision processing resource that is configurable to automatically convert input in a 32-bit floating point format to the 16-bit floating-point format in conjunction with execution of the single instruction.
5. The GPU as in claim 4, wherein the dynamic precision processing resource includes ...

Publication date: 20-01-2022

INFORMATION PROCESSING METHOD AND INFORMATION PROCESSING PROGRAM

Number: US20220019466A1
Assignee: FUJITSU LIMITED

An information processing method for determining a pattern that indicates an arrangement order of a plurality of tasks from upstream to downstream of a stream is performed by a computer. The method includes acquiring a plurality of patterns to be candidates of an arrangement order of the plurality of tasks from upstream to downstream of the stream in a case of executing the plurality of tasks using a stream processing format; specifying, for each pattern of the plurality of acquired patterns, an amount of data to be reintroduced from one task of the plurality of tasks to another task located on the upstream side of the stream with respect to the one task; and determining the pattern from among the plurality of patterns based on the specified amount of data to be reintroduced for each pattern.

1. An information processing method to be performed by a computer, the method comprising: acquiring a plurality of patterns to be candidates of an arrangement order of a plurality of tasks from upstream to downstream of a stream in a case of executing the plurality of tasks using a stream processing format; specifying, for each pattern of the plurality of acquired patterns, an amount of data to be reintroduced from one task of the plurality of tasks to another task located on the upstream side of the stream with respect to the one task; and determining a pattern that indicates the arrangement order of the plurality of tasks from upstream to downstream of the stream from among the plurality of patterns based on the specified amount of data to be reintroduced for each pattern.
2. The information processing method according to claim 1, the method further comprising: causing a stream processing platform to execute the plurality of tasks, using the stream processing format, in a state of arranging the plurality of tasks from upstream to downstream of the stream according to the determined pattern.
3. The information processing method according to claim 1, the method further comprising: ...

Publication date: 27-01-2022

TASK SCHEDULING FOR AGENT PREDICTION

Number: US20220027193A1

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a task schedule for generating prediction data for different agents. In one aspect, a method comprises: receiving data that characterizes an environment in a vicinity of a vehicle at a current time step, the environment comprising a plurality of agents; receiving data that identifies high-priority agents for which respective data characterizing the agents must be generated at the current time step; identifying available computing resources at the current time step; processing the data that characterizes the environment using a complexity scoring model to determine a respective complexity score for each of the high-priority agents; and determining a schedule for the current time step that allocates the generation of the data characterizing the high-priority agents across the available computing resources based on the complexity scores.

1. A method comprising, at each of a plurality of time steps: receiving data that characterizes an environment in a vicinity of a vehicle at a current time step, the environment comprising a plurality of agents; receiving data that identifies, as high-priority agents, a proper subset of the plurality of agents for which respective data characterizing the agents must be generated at the current time step; identifying computing resources that are available for generating the respective data characterizing the high-priority agents at the current time step; processing the data that characterizes the environment using a complexity scoring model to determine one or more respective complexity scores for each of the high-priority agents, each respective complexity score characterizing an estimated amount of computing resources that is required for generation of the data characterizing the high-priority agent using a prediction model; and determining a schedule for the current time step that allocates the generation of the data ...
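
One plausible way to realize the final scheduling step is greedy longest-processing-time assignment; this is a sketch, not necessarily the allocation the application claims. Agent names, complexity scores, and resource names below are invented:

import heapq

# Hypothetical complexity scores: estimated compute cost per high-priority agent.
scores = {"cyclist": 7.0, "car": 5.0, "truck": 3.5, "pedestrian": 1.0}
resources = ["core0", "core1"]

# Greedy longest-processing-time: give the costliest agent to the
# least-loaded resource.
heap = [(0.0, r) for r in resources]
heapq.heapify(heap)
schedule = {r: [] for r in resources}
for agent, cost in sorted(scores.items(), key=lambda kv: -kv[1]):
    load, r = heapq.heappop(heap)
    schedule[r].append(agent)
    heapq.heappush(heap, (load + cost, r))
print(schedule)  # {'core0': ['cyclist', 'pedestrian'], 'core1': ['car', 'truck']}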

Publication date: 14-01-2016

Running a 32-bit operating system on a 64-bit processor

Number: US20160011869A1
Author: Hugh Jackson
Assignee: Imagination Technologies Ltd

A processor, and a method of efficiently running a 32-bit operating system on a 64-bit processor, are described. The processor includes 64-bit hardware and, when running a 64-bit operating system, operates as a single-threaded processor. However, when running a 32-bit operating system (which may be a guest operating system running on a virtual machine), the processor operates as a two-threaded core. The register file is logically divided into two portions, one for each thread, and logic within an execution unit may be split between threads, shared between threads, or duplicated to provide an instance of the logic for each thread. Configuration bits may be set to indicate whether the processor should operate as a single-threaded or multi-threaded device.
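
A toy Python model of the register-file split, with an invented size and a two_threaded flag standing in for the configuration bits; it illustrates the partitioning idea, not the processor's actual layout:

class RegisterFile:
    def __init__(self, size=64):
        self.regs = [0] * size
        self.two_threaded = False  # stands in for the configuration bits

    def index(self, thread, reg):
        if not self.two_threaded:
            return reg                 # 64-bit OS: one thread sees the whole file
        half = len(self.regs) // 2
        return thread * half + reg     # 32-bit OS: one portion per thread

    def write(self, thread, reg, value):
        self.regs[self.index(thread, reg)] = value

rf = RegisterFile()
rf.two_threaded = True
rf.write(0, 3, 0xAB)
rf.write(1, 3, 0xCD)                   # same register number, other portion
print(rf.regs[3], rf.regs[35])         # 171 205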

Publication date: 14-01-2016

Silent memory instructions and miss-rate tracking to optimize switching policy on threads in a processing device

Number: US20160011874A1
Assignee: Intel Corp

A processing device implementing silent memory instructions and miss-rate tracking to optimize the switching policy on threads is disclosed. A processing device of the disclosure includes a branch prediction unit (BPU) to: predict that an instruction of a first thread in a current execution context of the processing device is a delinquent instruction; indicate that the first thread including the delinquent instruction is in a silent execution mode; indicate that the delinquent instruction is to be executed as a silent instruction; switch an execution context of the processing device to a second thread; and, when the execution context returns to the first thread, cause the delinquent instruction to be re-executed as a regular instruction.
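
A toy Python model of the described switching policy, with an invented run loop and an is_delinquent predicate standing in for the BPU's prediction; the real mechanism is hardware, so this only illustrates the silent-issue, switch, then re-execute sequence:

def run(threads, is_delinquent):
    current = 0
    silent = set()   # (thread, instruction) pairs already issued silently
    while any(threads.values()):
        queue = threads[current]
        if not queue:
            current ^= 1
            continue
        instr = queue[0]
        if is_delinquent(instr) and (current, instr) not in silent:
            print(f"thread {current}: {instr} issued silently, switching")
            silent.add((current, instr))
            current ^= 1                     # switch execution context
        else:
            print(f"thread {current}: {instr} executed")  # regular (re-)execution
            silent.discard((current, instr))
            queue.pop(0)

run({0: ["load A", "add"], 1: ["mul", "load B"]},
    lambda i: i.startswith("load"))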

Publication date: 14-01-2016

MANAGING INSTRUCTION ORDER IN A PROCESSOR PIPELINE

Number: US20160011876A1

Executing instructions in a processor includes classifying, in at least one stage of a pipeline of the processor, operations to be performed by instructions. The classifying includes: classifying a first set of operations as operations for which out-of-order execution is allowed, and classifying a second set of operations as operations for which out-of-order execution with respect to one or more specified operations is not allowed, the second set of operations including at least store operations. Results of instructions executed out-of-order are selected to commit the selected results in-order. The selecting includes, for a first result of a first instruction and a second result of a second instruction executed before and out-of-order relative to the first instruction: determining which stage of the pipeline stores the second result, and committing the first result directly from the determined stage over a forwarding path, before committing the second result.

1. A method for executing instructions in a processor, the method comprising: classifying, in at least one stage of a pipeline of the processor, operations to be performed by instructions, the classifying including: classifying a first set of operations as operations for which out-of-order execution is allowed, and classifying a second set of operations as operations for which out-of-order execution with respect to one or more specified operations is not allowed, the second set of operations including at least store operations; and selecting results of instructions executed out-of-order to commit the selected results in-order, the selecting including, for a first result of a first instruction and a second result of a second instruction executed before and out-of-order relative to the first instruction: determining which stage of the pipeline stores the second result, and committing the first result directly from the determined stage over a forwarding path, before committing the second result.

2. The ...
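
A minimal sketch of the classification step only, assuming an invented operation taxonomy; it shows how a pipeline stage might answer whether two operations may be reordered, not how results are committed over the forwarding path:

# Invented taxonomy: loads and ALU ops may execute out of order;
# stores must keep their order relative to other operations.
OOO_ALLOWED = {"add", "mul", "load"}
ORDERED = {"store"}

def may_reorder(op_a, op_b):
    """True if op_a may execute before an older op_b."""
    if op_a in ORDERED or op_b in ORDERED:
        return False
    return op_a in OOO_ALLOWED and op_b in OOO_ALLOWED

print(may_reorder("load", "add"))    # True
print(may_reorder("load", "store"))  # False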

Publication date: 14-01-2016

MANAGING INSTRUCTION ORDER IN A PROCESSOR PIPELINE

Number: US20160011877A1

Executing instructions in a processor includes determining identifiers corresponding to instructions in at least one decode stage of a pipeline of the processor. A set of identifiers for at least one instruction includes: at least one operation identifier identifying an operation to be performed by the instruction, at least one storage identifier identifying a storage location for storing an operand of the operation, and at least one storage identifier identifying a storage location for storing a result of the operation. A multi-dimensional identifier is assigned to at least one storage identifier.

1. A method for executing instructions in a processor, the method comprising: determining identifiers corresponding to instructions in at least one decode stage of a pipeline of the processor, with a set of identifiers for at least one instruction including: at least one operation identifier identifying an operation to be performed by the instruction, at least one storage identifier identifying a storage location for storing an operand of the operation, and at least one storage identifier identifying a storage location for storing a result of the operation; and assigning a multi-dimensional identifier to at least one storage identifier.

2. The method of claim 1, wherein assigning a multi-dimensional identifier to a first storage identifier includes: assigning a first dimension of the multi-dimensional identifier to a value corresponding to the first storage identifier, and assigning a second dimension of the multi-dimensional identifier to a value indicating one of a plurality of sets of physical storage locations.

3. The method of claim 1, further comprising selecting a plurality of instructions to be issued to one or more stages of the pipeline in which multiple sequences of instructions are executed in parallel through separate paths through the pipeline, based at least in part on a Boolean value provided by circuitry that applies logic to condition ...
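
A minimal Python sketch of the identifier assignment, where StorageId, its two fields, and the decode helper are all invented for illustration: the first dimension names the storage location (here an architectural register number) and the second names one of several physical storage sets:

from dataclasses import dataclass

@dataclass(frozen=True)
class StorageId:
    reg: int        # first dimension: the storage identifier itself
    phys_set: int   # second dimension: which physical set backs it

def decode(instr, active_set):
    op, dst, src = instr
    return {"op": op,
            "result": StorageId(dst, active_set),
            "operand": StorageId(src, active_set)}

print(decode(("add", 3, 5), active_set=1))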

Publication date: 11-01-2018

Computing System and controller thereof

Number: US20180011710A1
Authors: Kaiyuan Guo, Song Yao

A computing system and a controller thereof are disclosed for ensuring the correct logical relationship between multiple instructions during their parallel execution. The computing system comprises: a plurality of functional modules, each performing a respective function in response to an instruction for the given functional module; and a controller for determining whether or not to send an instruction to a corresponding functional module according to dependency relationships between the plurality of instructions.
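
A minimal Python sketch of dependency-gated dispatch under assumed data (the instruction ids, module names, and dependency lists are invented); an instruction is sent to its functional module only after everything it depends on has completed:

instructions = [
    {"id": 0, "module": "load",    "deps": []},
    {"id": 1, "module": "compute", "deps": [0]},
    {"id": 2, "module": "store",   "deps": [1]},
    {"id": 3, "module": "load",    "deps": []},
]

completed = set()
pending = list(instructions)
while pending:
    # All instructions whose dependencies are satisfied may go out together.
    ready = [i for i in pending if set(i["deps"]) <= completed]
    for instr in ready:
        print(f"dispatch {instr['id']} to {instr['module']}")
        completed.add(instr["id"])
    pending = [i for i in pending if i["id"] not in completed]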

Publication date: 11-01-2018

JOB SCHEDULER TEST PROGRAM, JOB SCHEDULER TEST METHOD, AND INFORMATION PROCESSING APPARATUS

Number: US20180011734A1
Author: IWATA Akitaka
Assignee: FUJITSU LIMITED

A non-transitory computer-readable storage medium stores a job scheduler test program that causes a computer to execute a process including: determining whether or not the state of every thread of a test-target job scheduler is a standby state; and changing a time of a system referenced when a thread executes a process to a time that is put forward, in a case where the state of every thread is the standby state.

1. A non-transitory computer-readable storage medium storing therein a job scheduler test program that causes a computer to execute a process comprising: determining whether or not a state of every thread of a test-target job scheduler is a standby state; and changing a time of a system referenced when the thread executes a process to a time that is put forward in a case where the state of every thread is the standby state.

2. The non-transitory computer-readable storage medium storing therein the job scheduler test program according to claim 1, wherein the thread stores information indicating that the state of the thread is the standby state in a storage in a case where the thread is brought into the standby state, and stores information indicating that the state of the thread is an execution state in the storage in a case where the thread is brought into the execution state, and the determining includes determining whether or not the state of every thread is the standby state by referring to the storage.

3. The non-transitory computer-readable storage medium storing therein the job scheduler test program according to claim 1, wherein the thread in the standby state stores a time at which the thread is brought into an execution state next in a storage, and the changing includes changing the time of the system to an earliest time among the times at which the thread is brought into the execution state next by referring to the storage.

4. The non-transitory computer-readable storage medium storing therein the job scheduler test program according to claim 1, ...
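
The test technique amounts to fast-forwarding a virtual clock whenever every thread is waiting, much like a discrete-event simulator. A minimal sketch with invented thread names and wake-up times:

threads = {
    "dispatcher": {"state": "standby", "wake_at": 300.0},
    "janitor":    {"state": "standby", "wake_at": 120.5},
}
clock = 0.0

# When every thread is on standby, jump straight to the earliest wake-up
# time instead of letting the test wait in real time.
if all(t["state"] == "standby" for t in threads.values()):
    clock = min(t["wake_at"] for t in threads.values())
print(f"virtual clock advanced to {clock}")  # 120.5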

Publication date: 11-01-2018

METHOD FOR EXECUTING MULTITHREADED INSTRUCTIONS GROUPED INTO BLOCKS

Number: US20180011738A1
Author: Abdallah Mohammad

A method for executing multithreaded instructions grouped into blocks. The method includes: receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein the instructions of the instruction blocks are interleaved with multiple threads; scheduling the instructions of the instruction block to execute in accordance with the multiple threads; and tracking execution of the multiple threads to enforce fairness in an execution pipeline.

1. A method of executing multithreaded instructions grouped into blocks, the method comprising: receiving an incoming instruction sequence at a front end of an execution pipeline; grouping instructions from the instruction sequence to form instruction blocks, where a first group of instruction blocks are a part of a first thread of execution and a second group of instruction blocks are a part of a second thread of execution; storing the first group of instruction blocks and the second group of instruction blocks in a scheduler array, where a first commit pointer points to a location in the scheduler array for a next block of the first group to be executed and a second commit pointer points to a next block of the second group to be executed; scheduling the instructions of the instruction blocks to execute in accordance with a position in the scheduler array; and tracking execution of the first thread and the second thread to enforce a fairness policy using allocation counters to track a number of instruction blocks in the scheduler array for each thread.

2. The method of claim 1, wherein each allocation counter tracks a number of entries in a correlated thread pointer map.

3. The method of claim 1, wherein the fairness policy prevents any thread from exceeding an allocation threshold.

4. The method of claim 1, further comprising: tracking an array entry to place a next block in the scheduler array for each thread using a separate allocate pointer.

5. The method of claim 1, wherein ...
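
A minimal sketch of the fairness policy from the claims, with an invented THRESHOLD value and try_allocate helper: per-thread allocation counters cap how many blocks a thread may occupy in the scheduler array at once.

THRESHOLD = 4          # invented allocation threshold per thread
counters = {0: 0, 1: 0}
scheduler_array = []

def try_allocate(thread, block):
    if counters[thread] >= THRESHOLD:
        return False   # fairness: the thread would exceed its share
    scheduler_array.append((thread, block))
    counters[thread] += 1
    return True

for blk in range(6):
    print(blk, try_allocate(0, f"blk{blk}"))  # True four times, then False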

Publication date: 14-01-2021

SHARED LOCAL MEMORY TILING MECHANISM

Number: US20210011730A1
Assignee: Intel Corporation

An apparatus to facilitate memory tiling is disclosed. The apparatus includes a memory, one or more execution units (EUs) to execute a plurality of processing threads via access to the memory, and tiling logic to apply a tiling pattern to memory addresses for data stored in the memory.

1-19. (canceled)

20. An apparatus comprising: a shared local memory; and one or more processors including a graphics processing unit (GPU) to process a machine learning algorithm, wherein processing of the machine learning algorithm includes processing of a plurality of thread groups, each thread group including a plurality of processing threads, wherein the one or more processors are to: detect an access pattern and a shared local memory banking architecture for each of the plurality of thread groups; generate shared local memory tiling for each thread group based at least in part on the detected access pattern and the shared local memory banking architecture for the thread group; and apply the generated local memory tiling for each thread group for data stored in the shared local memory.

21. The apparatus of claim 20, wherein the one or more processors are further to: generate memory tiling parameters for each thread group of the plurality of thread groups based at least in part on the detected access pattern and shared local memory banking architecture for the thread group, the shared local memory tiling being based at least in part on the generated memory tiling parameters.

22. The apparatus of claim 21, wherein the memory tiling parameters include one or more of a pitch or element width of shared local memory tiles.

23. The apparatus of claim 20, wherein the one or more processors are further to generate and store a per-thread-group shared local memory state for each of the plurality of thread groups, the shared local memory state for each thread group of the plurality of thread groups being based at least in part on the generated shared local memory tiling.

24. The apparatus of ...
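
One tiling parameter the claims name is pitch. A minimal sketch, assuming a 32-bank layout and using the common pad-the-pitch trick (chosen here for illustration, not taken from the application), showing how padding spreads column-wise accesses across banks:

BANKS = 32  # assumed banking architecture

def tiled_pitch(row_len):
    # Pad by one element when the natural pitch is a multiple of the
    # bank count, so consecutive rows start in different banks.
    return row_len + 1 if row_len % BANKS == 0 else row_len

def bank_of(row, col, pitch):
    return (row * pitch + col) % BANKS

pitch = tiled_pitch(32)
print([bank_of(r, 0, pitch) for r in range(4)])  # [0, 1, 2, 3], not [0, 0, 0, 0]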

Publication date: 14-01-2021

PLC DEVICE

Number: US20210011731A1
Author: ONOSE Nao

The aim is to preferentially execute an instruction with higher priority in a case where the CNC is unable to respond due to an unresponsive timing, load on the bus, or the like. A PLC device includes: a special instruction control unit that sets a priority degree, indicating a degree of priority for executing predetermined processing, to a special instruction for performing the predetermined processing in a control device that controls an industrial machine, and transmits the special instruction in which the priority degree is set to the control device; an instruction storage determining unit that determines whether or not to queue the special instruction according to an operation state of the control device; and an instruction storage unit that sequentially stores the received special instruction on the basis of a determination result of the instruction storage determining unit.

1. A PLC device comprising: a special instruction control unit that sets a priority degree indicating a degree of priority for executing predetermined processing to a special instruction for performing the predetermined processing in a control device that controls an industrial machine, and transmits the special instruction in which the priority degree is set to the control device; an instruction storage determining unit that determines whether or not to queue the special instruction according to an operation state of the control device; and an instruction storage unit that sequentially stores the special instruction received, on the basis of a determination result of the instruction storage determining unit.

2. The PLC device according to claim 1, wherein communication with the control device is performed via an interface, and the interface includes: the instruction storage determining unit; the instruction storage unit; and an alternative execution unit that, in a case in which the control device becomes able to respond, causes the control device to execute a special instruction with higher priority of ...
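
A minimal Python sketch of the queueing behavior, using a heap as the instruction storage unit and negated priorities so the highest priority degree is executed first once the control device can respond; the instruction names are invented:

import heapq

queue = []  # instruction storage unit; min-heap on negated priority degree

def store(priority, instruction):
    heapq.heappush(queue, (-priority, instruction))

store(1, "log status")
store(5, "tool change")
store(3, "update offset")

device_can_respond = True
while device_can_respond and queue:
    _, instr = heapq.heappop(queue)
    print("execute:", instr)  # tool change, then update offset, then log status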

Publication date: 10-01-2019

Speeding Up Younger Store Instruction Execution After a Sync Instruction

Number: US20190012175A1

Mechanisms are provided, in a processor, for executing instructions that are younger than a previously dispatched synchronization (sync) instruction. An instruction sequencer unit of the processor dispatches a sync instruction. The sync instruction is sent to a nest of one or more devices outside of the processor. The instruction sequencer unit dispatches a subsequent instruction after dispatching the sync instruction. The dispatching of the subsequent instruction after dispatching the sync instruction is performed prior to receiving a sync acknowledgement response from the nest. The instruction sequencer unit performs a completion of the subsequent instruction based on whether completion of the subsequent instruction is dependent upon receiving the sync acknowledgement from the nest and completion of the sync instruction.

1. A method, in a processor, for executing instructions that are younger than a previously dispatched synchronization (sync) instruction, comprising: dispatching, by an instruction sequencer unit of the processor, a sync instruction; sending the sync instruction to a nest of one or more devices outside of the processor; dispatching, by the instruction sequencer unit, a subsequent instruction after dispatching the sync instruction, wherein the dispatching of the subsequent instruction after dispatching the sync instruction is performed prior to receiving a sync acknowledgement response from the nest; and performing, by the instruction sequencer unit, a completion of execution of the subsequent instruction and update of an architected state by committing results of the execution of the subsequent instruction to memory, based on whether completion of the subsequent instruction is dependent upon receiving the sync acknowledgement from the nest and completion of the sync instruction, wherein the subsequent instruction is one of a store instruction or a load instruction, and wherein: in response to the subsequent instruction being a store ...
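
A toy model of the completion rule: younger instructions may be dispatched immediately, but an instruction whose completion depends on the sync must wait for the acknowledgement. The sync_acked flag and try_complete helper are invented for illustration:

sync_acked = False
completed = []

def try_complete(instr, depends_on_sync):
    if depends_on_sync and not sync_acked:
        return False   # hold completion until the nest acknowledges the sync
    completed.append(instr)
    return True

print(try_complete("store X", depends_on_sync=True))  # False: must wait
sync_acked = True      # sync acknowledgement arrives from the nest
print(try_complete("store X", depends_on_sync=True))  # True: may complete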

Publication date: 14-01-2021

CONTROLLER AND OPERATION METHOD THEREOF

Number: US20210011842A1
Author: Lee Joo-Young

A controller configured to control memory chips in communication with the controller is provided. The controller comprises: a host interface configured to receive a request from a host; an address mapper configured to, upon receipt of both a turbo write request for writing data to one or more high-speed storage blocks at a high speed and a normal write request for writing data to one or more storage blocks at a lower speed, allocate a first plane including a memory block configured to perform write operations in a single level cell mode at the high speed to a first plane group in order to respond to the turbo write request, and allocate a second plane to a second plane group at the lower speed in order to respond to the normal write request; and a memory interface configured to control the memory chips.

1. A controller configured to control memory chips in communication with the controller, each of the memory chips including at least one plane, each plane including a plurality of memory blocks, the controller comprising: a host interface configured to receive a request from a host; an address mapper configured to, upon receipt of both a turbo write request for writing data to one or more high-speed storage blocks at a high speed and a normal write request for writing data to one or more storage blocks at a lower speed, allocate, among the at least one plane of each of the memory chips, a first plane including a memory block configured to perform write operations in a single level cell mode at the high speed to a first plane group in order to respond to the turbo write request, and allocate a second plane to a second plane group at the lower speed in order to respond to the normal write request; and a memory interface configured to control the memory chips such that the first plane group performs an operation corresponding to the turbo write request and the second plane group performs an operation corresponding to the normal write request.

2. The controller of ...
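
A minimal sketch of the plane-group allocation, with invented plane descriptors: planes capable of single-level-cell writes are routed to the turbo group while a turbo request is pending, and the rest to the normal group.

plane_groups = {"turbo": [], "normal": []}

def allocate(plane_id, slc_capable, turbo_pending):
    # SLC-capable planes serve the turbo write request; the rest serve
    # the normal write request.
    group = "turbo" if turbo_pending and slc_capable else "normal"
    plane_groups[group].append(plane_id)

planes = [{"id": 0, "slc": True}, {"id": 1, "slc": False},
          {"id": 2, "slc": True}, {"id": 3, "slc": False}]
for p in planes:
    allocate(p["id"], p["slc"], turbo_pending=True)
print(plane_groups)  # {'turbo': [0, 2], 'normal': [1, 3]}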
