
Search form

Supports entering several search phrases (one per line). Searches with morphological support for Russian and English.

Total found: 3569. Displayed: 200.
10-09-2005 publication date

PIPELINE SYNCHRONIZATION IN A DATA PROCESSING APPARATUS

Number: RU2005109409A

... 1. A data processing apparatus comprising: a main processor capable of executing a sequence of instructions, the main processor containing a first pipeline having a first plurality of pipeline stages; a coprocessor capable of executing coprocessor instructions from said sequence of instructions, the coprocessor containing a second pipeline having a second plurality of pipeline stages, each coprocessor instruction being arranged to be routed both through said first pipeline and through said second pipeline; and at least one synchronizing queue coupling a predetermined pipeline stage of one of said pipelines with a partner pipeline stage of the other of said pipelines, said predetermined pipeline stage being capable of causing a token to be placed in said synchronizing queue when processing a coprocessor instruction, and said partner pipeline stage being capable of processing that coprocessor ...

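As a loose illustration of the mechanism claimed above, here is a minimal Python sketch of a synchronizing queue coupling a stage of the main pipeline to its partner stage in the coprocessor pipeline; the queue depth, method names and tags are invented for the example.

```python
from collections import deque

class SyncQueue:
    """One token per coprocessor instruction couples the two pipelines."""
    def __init__(self, depth=4):
        self.depth = depth
        self.tokens = deque()

    def put_token(self, tag):
        if len(self.tokens) == self.depth:
            return False              # queue full: the sending stage stalls
        self.tokens.append(tag)       # predetermined stage marks its progress
        return True

    def take_token(self):
        # Partner stage may only process the instruction once a token exists.
        return self.tokens.popleft() if self.tokens else None

q = SyncQueue()
q.put_token("cp0")                    # main pipeline stage deposits a token
assert q.take_token() == "cp0"        # partner stage proceeds
assert q.take_token() is None         # no token yet: partner stage waits
```
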
19-07-1973 publication date

CALCULATING MACHINE DIVIDED INTO TWO PARTS

Number: DE0002300853A1

23-04-1981 publication date

Number: DE0002922234C2

15-02-2007 publication date

Processing unit's e.g. CPU, analog or digital signals switching and comparing method for computer system, involves switching between operating modes of units and comparing analog signals of units by changing one signal into digital value

Number: DE102005037242A1

The method involves switching between two operating modes of processing units (B10, B11) using a switching unit (B01). The operating modes correspond to comparison and performance modes, respectively. Two analog signals of the processing units are compared by changing one analog signal into a digital value. The conversion of the analog signals is conducted in one of the processing units. An independent claim is also included for a device for switching and comparing signals in a computer system with processing units.

09-06-2011 publication date

MODULAR MONTGOMERY MULTIPLICATION DEVICE

Number: DE602006021558D1
Assignee: THALES SA, THALES

23-01-2002 publication date

Semiconductor integrated circuit

Number: GB0000128799D0

27-12-2000 publication date

Processors and methods of operating processors

Number: GB0000027294D0

05-02-1975 publication date

DATA PROCESSING APPARATUS

Number: GB0001382850A

... 1382850 Digital computers: testing of ROM microprogram branching HONEYWELL INFORMATION SYSTEMS Inc 8 June 1972 [26 Aug 1971] 26878/72 Heading G4A A microprogram is stored in a read only memory (ROM) 100 and its branching capability is tested by a diagnostic program. An RIT flip-flop (Fig. 5, not shown) is set to modify the operation of the halt (HLT) microinstruction. The HLT micro-instruction usually stops the computer clock but, when RIT is set, the clock is not stopped by HLT but is stopped by any other micro-instruction. RIT is set by three micro-instructions and, when any of these is executed at the predetermined location, it must branch to a location containing HLT. If there are no malfunctions in the micro-instructions the clock continues to run, otherwise the machine halts and an error is identified. Fig. 1 shows a programmable terminal but the program can be used in a full computer system. Parity checking of the micro-instructions in register 101 is mentioned.

26-03-2014 publication date

Thread issue control

Number: GB0201402259D0

28-08-2002 publication date

Clock technologies for reducing power consumption of integrated circuits

Number: GB0002372601A

A semiconductor integrated circuit having one or more functional circuit blocks and executing a set of instructions is configured so as to change the operating frequency or halt operation of said one or more functional circuit blocks for each instruction or execution cycle. Another semiconductor integrated circuit, having a plurality of internal or external memory blocks or an internal or external single memory block that can be dealt with as a plurality of logical memory blocks and executing a set of instructions, is configured so as to change the operating frequency according to the performance of the memory block for each instruction or execution cycle so that the operating speed during data access time in execution cycle can be changed. This results in the semiconductor integrated circuit having reduced power consumption.

21-08-2002 publication date

Ranking order of threads in a multi-threaded processor

Number: GB0002372350A

A ranking order RANK (N-1) is assigned to a plurality of instruction threads to be executed on a multi-threaded processor (2, fig 1). Each thread is assessed in terms of a number of ranking factors, including the pre-assigned priority PRIORITY(N-1), a deadline count DEADLINE_COUNT (N-1) representing the maximum deadline for that thread's execution and a delay count DELAY_COUNT (N-1) representing the rate at which instruction issues are required, to determine a metric based on the relative importance of each factor. Each thread is only assessed if it is determined that the processor's resources required for completion of the next instruction are available, using a resource checker (20, figure 2) and block line BLOCKED n. The rank order comparator 36 then compares the N metrics to determine the order in which the next instructions in the threads are to be executed. The rank order is output from granting circuit 38 to the priority selection circuitry (26, figure 2) to grant access to the processor ...

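A toy version of that ranking computation in Python. The weights, factor values and the linear combination itself are invented; the abstract only says a metric reflects the relative importance of each factor.

```python
def rank_metric(priority, deadline_count, delay_count,
                w_prio=4, w_dead=2, w_delay=1):
    # Hypothetical weighted combination of the three ranking factors.
    return w_prio * priority + w_dead * deadline_count + w_delay * delay_count

# (PRIORITY, DEADLINE_COUNT, DELAY_COUNT) per thread; values invented.
threads = {"T0": (3, 10, 5), "T1": (1, 2, 8), "T2": (3, 1, 1)}
blocked = {"T2"}                  # resource checker: T2 lacks resources
candidates = [t for t in threads if t not in blocked]
order = sorted(candidates, key=lambda t: rank_metric(*threads[t]), reverse=True)
print(order)                      # -> ['T0', 'T1']: issue order for this cycle
```
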
10-03-2010 publication date

Scheduling data processing instructions to avoid short path errors

Number: GB0002463278A

A processor, such as a superscalar processor, is responsive to a stream of program instructions to issue said program instructions under control of dynamic scheduling circuitry to respective execution units for execution. At least one of the execution units includes error detecting circuitry for detecting a change in an output signal which occurs after the output signal has latched and during an error detecting period following said latching. The scheduling circuitry is arranged to suppress issue of program instructions to an execution unit having such error detecting circuitry on consecutive processing cycles, so as to avoid short path errors resulting from false positive error detection arising due to a signal path through the execution unit that is too quick. The error detecting circuitry is also responsive to a change in a signal generated by an error-detecting execution unit during the error detecting period to trigger an error recovery response.

08-05-2019 publication date

Hardware unit for performing matrix multiplication with clock gating

Number: GB0002568085A

Unit (100) to perform a matrix multiplication, comprising: a multiplier stage (102) comprising a plurality of multipliers (110), each configured to multiply first and second data elements (d1 & d2) to produce a multiplication data element; one or more adder stages (104, 106, 108) following the multiplier stage that form an adder tree to produce a sum of the multiplication data elements, each adder stage comprising one or more adders (112) configured to add at least two data elements output by a previous stage to produce an addition data element; wherein at least one multiplier and/or at least one adder is preceded by a storage element (202) corresponding to each bit of the data elements input to the at least one adder or the at least one multiplier; and control logic (302, 402, 602) configured to clock gate all or a portion of the storage elements corresponding to a data element in response to determining that all or a portion of that data element can be treated as having a zero value.

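The software analogue below mimics the intent of the clock-gating control logic: whenever a data element can be treated as zero, the corresponding multiply is skipped so no switching work is done. It sketches the idea, not the hardware.

```python
def gated_dot(weights, inputs):
    """Multiply-accumulate that skips multipliers fed a zero element."""
    acc = 0
    for w, x in zip(weights, inputs):
        if w == 0 or x == 0:
            continue              # storage elements stay gated: no toggling
        acc += w * x              # only non-zero pairs reach the adder tree
    return acc

print(gated_dot([1, 0, 3], [4, 5, 0]))   # -> 4; two of three products gated
```
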
23-12-2015 publication date

Replicating logic blocks to enable increased throughput

Number: GB0002527367A

A datapath pipeline 22, preferably within a functional unit of a processor, comprises one or more replicated blocks of logic 206, 208 that form multiple, parallel logic pathways 210, 212 within the pipeline. A replicated block of logic may comprise hardware logic which takes more than one clock cycle to complete. There is an input register block 214, 216 at the start of each logic path, one or more output register blocks 218, 220 to receive data from one or more of the logic paths and a multiplexer 222 to recombine the parallel logic paths into a single output. The pipeline is configured to enable the input and output register blocks in sequence on successive clock cycles (fig. 3). The pipeline may also comprise a register block connected to the output of the multiplexer and, perhaps, a block of logic between the multiplexer and the output register block. An intermediate multiplexer may be connected to a subset of the logic paths to combine them into a single output with, perhaps, a block ...

09-11-1994 publication date

Data processing circuits and interfaces

Number: GB0009419246D0

15-05-2010 publication date

PROCESSOR WITH SPEED DECREASE BY PROGRAMMED STOP CYCLE

Number: AT0000467866T

15-11-2011 publication date

PROGRAMMABLE GENERAL-PURPOSE MEDIA PROCESSOR

Number: AT0000532130T

15-07-2013 publication date

SYNCHRONOUS SEQUENTIAL LOGIC DEVICE WITH DOUBLE-TRIGGERED FLIP-FLOPS AND A METHOD FOR DELIBERATELY TIME-SHIFTED TRIGGERING OF SUCH STATE-STORING REGISTERS

Number: AT0000512192B1

Presented is an improved design or redesign concept for synchronous sequential logic devices using an alternative type of register, together with a suitable clock-tree concept for such registers. The special registers typically add a third latch to the traditional master and slave latches. This introduces latency between the overtaking and forwarding of information bits, between two edges of one or two clocks, so these registers can be clocked with extreme clock skewing. The approach advantageously changes the time margins for setup and hold time between combinationally connected registers. The extreme skewing reduces the peak currents, and the concept also allows the power dissipation to be reduced. Existing netlist-based designs may be quickly adapted to the new technology simply by changing the involved register types and optimizing the clock tree.

15-02-2005 publication date

INTERFACE FROM SYNCHRONOUS TO ASYNCHRONOUS TO SYNCHRONOUS

Number: AT0000288104T

03-12-2001 publication date

Asynchronous completion prediction

Number: AU0007499701A

23-02-2004 publication date

SYSTEM OF FINITE STATE MACHINES

Number: AU2003263987A1

23-01-2014 publication date

Processor instruction issue throttling

Number: AU2012227209B2

A system and method for reducing power consumption through issue throttling of selected problematic instructions. A power throttle unit within a processor maintains instruction issue counts for associated instruction types. The instruction types may be a subset of supported instruction types executed by an execution core within the processor. The instruction types may be chosen based on high power consumption estimates for processing instructions of these types. The power throttle unit may determine a given instruction issue count exceeds a given threshold. In response, the power throttle unit may select given instruction types to limit a respective issue rate. The power throttle unit may choose an issue rate for each one of the selected given instruction types and limit an associated issue rate to a chosen issue rate. The selection of given instruction types and associated issue rate limits is programmable.

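A minimal sketch of such a throttling policy in Python. The window length, threshold, cap and the instruction type name are invented parameters; the patent leaves them programmable.

```python
class PowerThrottle:
    """Caps the issue rate of instruction types whose issue count gets hot."""
    def __init__(self, window=16, threshold=4, cap=1):
        self.window, self.threshold, self.cap = window, threshold, cap
        self.counts, self.limited, self.cycle = {}, set(), 0

    def tick(self):
        self.cycle += 1
        if self.cycle % self.window == 0:
            self.counts.clear()               # new monitoring interval

    def may_issue(self, itype):
        n = self.counts.get(itype, 0)
        if itype in self.limited and n >= self.cap:
            return False                      # hold: type is rate-limited
        self.counts[itype] = n + 1
        if self.counts[itype] > self.threshold:
            self.limited.add(itype)           # issue count exceeded threshold
        return True

pt = PowerThrottle()
print([pt.may_issue("fp_mul") for _ in range(8)])
# -> first five issues pass, then the hot type is throttled
```
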
07-01-1998 publication date

System for controlling access to a register mapped to an I/O address space of a computer system

Number: AU0003369497A

13-12-1973 publication date

BRANCH FACILITY DIAGNOSTICS

Number: AU0004311472A

29-07-1986 publication date

TIMING CONTROL SYSTEM IN DATA PROCESSOR

Number: CA0001208798A1

07-12-2010 publication date

ASYNCHRONOUS PIPELINE WITH LATCH CONTROLLERS

Number: CA0002424572C

An asynchronous pipeline for high-speed applications uses simple transparent latches in its datapath and small latch controllers for each pipeline stage. The stages communicate with each other using request signals and acknowledgment signals. Each transition on the request signal indicates the arrival of a distinct new data item. Each stage comprises a data latch that is normally enabled to allow data to pass through, and a latch controller that enables and disables the data latch. The request signal and the data are inputs to the data latch. Once the stage has latched the data, a done signal is produced, which is sent to the latch controller, to the previous stage as an acknowledgment signal, and to the next stage as a request signal. The latch controller disables the latch upon receipt of the done signal, and re-enables the data latch upon receipt of the acknowledgment signal from the next stage. For correct operation, the request signal must arrive at the stage after the data inputs ...

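The request/acknowledge handshake described above can be caricatured in a few lines of Python. The single-slot stage and the method names are simplifications invented for the example.

```python
class Stage:
    """A transparent latch plus its controller, reduced to one data slot."""
    def __init__(self):
        self.data, self.enabled = None, True

    def request(self, data):
        if not self.enabled:
            return False         # latch disabled: upstream must retry
        self.data = data         # latch the new item
        self.enabled = False     # controller disables the latch ('done')
        return True              # done acts as ack upstream / request downstream

    def ack_from_next(self):
        self.enabled = True      # next stage took the item: re-enable latch

s0, s1 = Stage(), Stage()
assert s0.request("item0")       # item enters stage 0
assert s1.request(s0.data)       # s0's done acts as the request to stage 1
s0.ack_from_next()               # s1's done acks stage 0
assert s0.request("item1")       # stage 0 is free again
```
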
17-03-1981 publication date

DATA PROCESSING SYSTEM AND INFORMATION SCANOUT

Number: CA1097820A
Assignee: AMDAHL CORP, AMDAHL CORPORATION

Disclosed is a data processing system having a principal apparatus controlled by a principal program and including a main store, a storage unit, an instruction unit, an execution unit, a console unit and a channel unit. The console unit has a secondary apparatus controlled by a secondary program and including a digital computer for addressing and accessing information from locations throughout the principal apparatus of the data processing system independent of the principal program and independent of the information paths employed by the principal apparatus.

20-08-1996 publication date

COMBINED QUEUE FOR INVALIDATES AND RETURN DATA IN MULTIPROCESSOR SYSTEM

Number: CA0002045756C

A pipelined CPU executing instructions of variable length, and referencing memory using various data widths. Macroinstruction pipelining is employed (instead of microinstruction pipelining), with queueing between units of the CPU to allow flexibility in instruction execution times. A wide bandwidth is available for memory access; fetching 64-bit data blocks on each cycle. A hierarchical cache arrangement has an improved method of cache set selection, increasing the likelihood of a cache hit. A writeback cache is used (instead of writethrough) and writeback is allowed to proceed even though other accesses are suppressed due to queues being full. A branch prediction method employs a branch history table which records the taken vs. not-taken history of branch opcodes recently used, and uses an empirical algorithm to predict which way the next occurrence of this branch will go, based upon the history table. A floating point processor function is integrated on-chip, with enhanced speed due to ...

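The abstract mentions a branch history table driven by an empirical algorithm. The sketch below assumes the classic two-bit saturating counter per entry, which is a stand-in rather than necessarily the patented scheme; the table size is invented.

```python
class BranchHistoryTable:
    def __init__(self, size=256):
        self.size = size
        self.counters = [1] * size            # start weakly not-taken (0..3)

    def predict(self, pc):
        return self.counters[pc % self.size] >= 2   # True means 'taken'

    def update(self, pc, taken):
        i = pc % self.size
        self.counters[i] = min(3, self.counters[i] + 1) if taken \
            else max(0, self.counters[i] - 1)

bht = BranchHistoryTable()
bht.update(0x40, True)
bht.update(0x40, True)
print(bht.predict(0x40))    # True: two taken outcomes flip the prediction
```
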
27-10-1991 publication date

HIGH SPEED PROCESSOR FOR DIGITAL SIGNAL PROCESSING

Number: CA0002038822A1

A high speed processor (HSP) including a processor interface (106), an output pipeline (212) interface, an input pipeline (214) interface, and a multichip floating point processor (110). The floating point processor (110) further comprises a data register (200) file memory, a coefficient register (204) file memory, a floating point multiplier-accumulator (201), a microsequencer (206) and a control store (208) random access memory (RAM). The high speed processor provided by the present invention is capable of operating at a clock rate of about 200 MHz with the floating point processor capable of 400 MFLOPS peak performance. In one embodiment of the invention, the HSP is entirely self contained in a TFML package with all high speed interfaces residing on the package substrate and only CMOS-speed interfaces residing off of the FPP package. The FPP comprises five GaAs integrated circuits consisting of a floating point multiplier-accumulator ...

15-07-1982 publication date

Data processing installation

Number: CH0000631018A5
Assignee: AMDAHL CORP, AMDAHL CORP.

12-06-2013 publication date

Method, system and apparatus for multi-level processing

Number: CN103154892A
Author: Mekhiel Nagi

A Multi-Level Processor (200) for reducing the cost of synchronization overhead including an upper level processor (201) for taking control and issuing the right to use shared data and to enter critical sections directly to each of a plurality of lower level processors (202, 203...20n) at processor speed. In one embodiment the instruction registers of lower level parallel processors are mapped to the data memory of upper level processor (201). Another embodiment (1300) incorporates three levels of processors. The method includes mapping the instructions of lower level processors into the memory of an upper level processor and controlling the operation of lower level processors. A variant of the method and apparatus facilitates the execution of Single Instruction Multiple Data (SIMD) and single to multiple instruction and multiple data (SI>MIMD). The processor includes the ability to stretch the clock frequency to reduce power consumption.

17-01-2007 publication date

Multi-issue processor

Number: CN0001295597C

07-11-2003 publication date

SET OF ELECTRONIC CIRCUITS COMPRISING MEANS FOR DECONTAMINATING PARTS CONTAMINATED BY ERRORS

Number: FR0002828601B1
Author: NICOLAIDIS MICHEL

25-02-2011 publication date

METHOD AND DEVICE FOR MODE SWITCHING AND SIGNAL COMPARISON IN A COMPUTER SYSTEM COMPRISING AT LEAST TWO PROCESSING UNITS

Number: KR0101017444B1

06-04-2001 publication date

HIGH SPEED PIPELINE DEVICE AND METHOD FOR GENERATING CONTROL SIGNAL OF THE DEVICE

Number: KR20010027051A
Author: NAM, GYEONG U

PURPOSE: A pipeline device and a method for generating its control signals are provided to speed up the output of the first data by minimizing the margin using multi-phase clock signals. CONSTITUTION: The pipeline device includes n data passing portions, n pipe registers and a control signal generating portion. The n data passing portions have at least partly differing transmission times (T1, T2, ... Tn ≤ P, where P is one period of a reference clock signal) and are connected in series between an input terminal and an output terminal. The n pipe registers are disposed at the input terminal of each of the data passing portions and latch the data passed from the preceding stage. The control signal generating portion generates the n-1 pipeline control signals in sequence, generating the nth pipeline control signal of the n pipeline registers in response to the reference clock signal and then generating the pipeline control signal of the n-1th ...

17-05-2006 publication date

VECTOR PROCESSING DEVICE, AN INFORMATION PROCESSING DEVICE, AND A VECTOR PROCESSING METHOD FOR REDUCING NOISE CAUSED BY CONCURRENT OPERATIONS

Number: KR1020060046730A
Author: TODA HIDEMASA

PURPOSE: A vector processing device, an information processing device, and a vector processing method are provided to reduce noise caused by concurrent operations in diverse computations and to prevent faulty operation, by shifting the process start timing of the vector pipeline operators regardless of the executed load and store commands. CONSTITUTION: Multiple vector pipeline operators (160-167) operate according to operation control information ordering the start and execution of a process. A command controller (10) generates and outputs the operation control information to the vector pipeline operators at respectively different timings. A clock generator (20) generates a clock. The command controller is equipped with an operation control information generator (12) that sequentially outputs the operation control information to the vector pipeline operators based on the clock input from the clock generator. The command controller includes a command execution controller (13) outputting ...

23-02-2023 publication date

PROCESSOR AND METHOD OF OPERATING A PROCESSOR

Number: KR102502526B1
Assignee: BADENHORST, EMIL

... The present disclosure provides a processor comprising at least one core. A core includes at least one input buffer and a logic unit having an input and an output, where the input communicates with the input buffer and a memory unit communicates with the output of the logic unit. The processor also includes a control unit (CU) configured to direct the operation of the cores, and a communication bus configured to interconnect the cores and the CU. The CU is configured to direct the operation of a core by providing instructions to it, where an instruction is loaded into the logic unit and writes a value stored in the memory unit of one of the cores to the input buffer. The CU is further configured to direct the operation of the core by providing the output of the instruction through the logic unit, based at least in part on the value in the input buffer, and by writing the output of the instruction to the memory unit.

08-10-2013 publication date

METHOD AND EQUIPMENT FOR EXECUTING PROCESSOR INSTRUCTIONS BASED ON A DYNAMICALLY ALTERABLE DELAY

Number: BRPI0716620A2

01-10-2008 publication date

Method and apparatus for power throttling a processor in an information handling system

Number: TW0200839497A

A power system couples to a multi-core processor to provide power to the processor. The power system throttles at least one of the cores of the processor when the power that the processor consumes from the power system exceeds a predetermined threshold power. The power system may reduce the rate of instruction issue by a particular core or clock gate a particular core to provide power throttling. The power system dynamically responds to variance of the actual output voltage that processor circuitry receives from the power system in comparison to an expected output voltage over time and corrects for such variance.

09-02-2006 publication date

AN APPARATUS AND METHOD FOR HETEROGENEOUS CHIP MULTIPROCESSORS VIA RESOURCE ALLOCATION AND RESTRICTION

Number: WO2006014254A1

A method and apparatus for heterogeneous chip multiprocessors (CMP) via resource restriction. In one embodiment, the method includes the accessing of a resource utilization register to identify a resource utilization policy. Once accessed, a processor controller ensures that the processor core utilizes a shared resource in a manner specified by the resource utilization policy. In one embodiment, each processor core within a CMP includes an instruction issue throttle resource utilization register, an instruction fetch throttle resource utilization register and other like ways of restricting its utilization of shared resources within a minimum and maximum utilization level. In one embodiment, resource restriction provides a flexible manner for allocating current and power resources to processor cores of a CMP that can be controlled by hardware or software. Other embodiments are described and claimed.

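A tiny sketch of the min/max utilization restriction in Python; the register field names and slot counts are invented for illustration.

```python
def clamp_issue_slots(requested, util_reg):
    """Clamp a core's use of a shared resource to its policy band."""
    lo, hi = util_reg["min_slots"], util_reg["max_slots"]
    return max(lo, min(hi, requested))

# Hypothetical instruction-issue throttle register for one core of the CMP.
policy = {"min_slots": 1, "max_slots": 2}
print(clamp_issue_slots(4, policy))   # core asked for 4 slots, policy grants 2
```
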
30-09-2004 publication date

PIPELINED INSTRUCTION PROCESSOR WITH DATA BYPASSING

Number: WO2004084065A3

An instruction processing device has a pipeline of stages with a functional unit for executing a command from an instruction. A first register unit is coupled to the functional unit for storing a result of execution of the command when the command has reached a first one of the pipeline stages, and for supplying bypass operand data to the functional unit. A register file is coupled to the functional unit for storing the result when the command has reached a second one of the pipeline stages, downstream from the first one of the pipeline stages, and for supplying operand data to the functional unit. A disable circuit is coupled to selectively disable storing of the results in the register file under control of the instructions.

13-03-2003 publication date

DYNAMIC VOLTAGE CONTROL METHOD AND APPARATUS

Number: WO2003021409A2

A dynamic power controller is provided that identifies a clock frequency requirement of a processor and determines a voltage requirement to support the clock frequency requirement. The dynamic power controller transitions the processor to a power state defined by the clock frequency requirement and the voltage requirement. In particular, a voltage level indicated by the voltage requirement is supplied to the processor and the frequency distribution indicated by the frequency requirement is provided to the clock signals of the processor.

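A minimal model of picking such a power state, assuming a table of frequency/voltage operating points; the numbers are invented, not taken from the publication.

```python
# Hypothetical operating points: each frequency needs a minimum voltage.
OPERATING_POINTS = [(400, 800), (800, 900), (1200, 1050)]   # (MHz, mV)

def power_state_for(required_mhz):
    """Lowest operating point whose frequency satisfies the requirement."""
    for mhz, mv in OPERATING_POINTS:
        if mhz >= required_mhz:
            return mhz, mv
    return OPERATING_POINTS[-1]      # saturate at the highest state

print(power_state_for(900))          # -> (1200, 1050)
```
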
01-05-2003 publication date

Microprocessor power management control method

Number: US2003084355A1

A processing unit includes a plurality of subcircuits and circuitry for generating clock signals thereto. Detection circuitry detects the assertion of a control signal and disabling circuitry is operable to disable the clock signals to one or more of the subcircuits responsive to the control signal.

07-02-2008 publication date

Method and system for fast frequency switch for a power throttle in an integrated device

Number: US20080034237A1

The ability to change from a first bus ratio to a second bus ratio without draining the transaction queues of a processor.

13-12-2007 publication date

Hybrid Branch Prediction Scheme

Number: US20070288732A1
Author: David A. Luick

A method and apparatus for executing a branch instruction is provided. In one embodiment, the method includes determining if a predictability value for the branch instruction is below a threshold value. Upon determining that the predictability value is above or equal to the threshold value, branch prediction information for the branch instruction is used to predict the outcome of the branch instruction. Upon determining that the predictability value for the branch instruction is below the threshold value for predictability, an alternate method of executing the branch instruction is selected. The alternate method comprises at least one of preresolving the branch instruction, simultaneously issuing first instructions from a first path of the branch instruction and second instructions from a second path of the branch instruction, and buffering the first instructions from the first path of the branch instruction and the second instructions from the second path of the branch instruction.

15-09-1998 publication date

General purpose, multiple precision parallel operation, programmable media processor

Number: US0005809321A1
Assignee: MicroUnity Systems Engineering, Inc.

A general purpose, programmable media processor for processing and transmitting a media data stream of audio, video, radio, graphics, encryption, authentication, and networking information in real-time. The media processor incorporates an execution unit that maintains substantially peak data throughput for media data streams. The execution unit includes a dynamically partitionable multi-precision arithmetic unit, programmable switch and programmable extended mathematical element. A high bandwidth external interface supplies media data streams at substantially peak rates to a general purpose register file and the multi-precision execution unit. A memory management unit, and instruction and data cache/buffers are also provided. High bandwidth memory controllers are linked in series to provide a memory channel to the general purpose, programmable media processor. The general purpose, programmable media processor is disposed in a network fabric consisting of fiber optic cable, coaxial cable and ...

19-09-2000 publication date

Pipelined data processing circuit

Number: US0006122751A1
Assignee: U.S. PHILIPS CORPORATION

A pipelined circuit contains a cascade of stages, each with an initial register followed by a combinatorial logic circuit. The registers are clocked. At the beginning of each clock period, data in the initial register is updated. After that, during the clock period, data propagates from the initial register, along a path through the combinatorial logic circuits, to the initial register in the next stage, where it is stored at the beginning of the next cycle. In the path there are several other registers, in which the data is stored at intermediate phases of the clock cycle, while the data is kept in the initial register. Thus differences in propagation delay along different branches of the path are eliminated without increasing the number of clock cycles needed to pass data through the pipelined circuit. This reduces the number of glitches which consume energy without affecting the function of the circuit.

29-09-1998 publication date

Method and apparatus for controlling power consumption in a microprocessor

Number: US0005815724A1
Assignee: Intel Corporation

A system for controlling power consumption in a microprocessor. The microprocessor fetches an instruction from memory. The instruction is decoded, producing an operation flow of at least one operation. Then, power micro-operations are introduced into the operation flow. These power micro-operations provide power consumption control functions for those functional units which are required to execute the various operations which have been decoded from the fetched instruction. The operations and power micro-operations are then scheduled for dispatch to the appropriate execution units. The scheduling is based on the availability of the appropriate execution units and the validity of operation data. The operations and power micro-operations are dispatched to the appropriate execution units, where the operations and power micro-operations are executed. The execution results are subsequently committed to the processor state in the original program order.

31-10-1995 publication date

Emulation of slower speed processor

Number: US0005463744A

A pulse width modulation circuit in a computer system for emulating a processor operating at a slower instruction execution speed. The pulse width modulator comprises a computer system clock and a register containing a first value. The first value is user-definable by software and specifies a proportion of time that a processor should remain idle. The apparatus further comprises a counter coupled to the clock, the counter having a range between second and third values which includes the first value. A comparator is coupled to the counter and the register, and the comparator causes a central processing unit to suspend instruction execution for a specified interval of time. The comparator causes the central processing unit to resume instruction execution for the remainder of the counter's range. The processor is therefore kept idle for proportions of time depending on the values of the register and the counter, to emulate a slower speed processor. For high performance processors which have an on processor ...

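A quick numeric model of that duty-cycle mechanism, with an invented counter range and register value: the comparator idles the CPU while the count is below the register value.

```python
def executed_cycles(duty_register, counter_range=16, cycles=32):
    """Count cycles on which the comparator lets the CPU execute."""
    executed = 0
    for cycle in range(cycles):
        count = cycle % counter_range          # free-running counter
        if count >= duty_register:             # comparator decision
            executed += 1                      # CPU runs this cycle
    return executed

# Register value 12 of a 0..15 counter: idle 12/16 of the time.
print(executed_cycles(12))   # -> 8 of 32 cycles execute
```
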
01-03-2007 publication date

Power reduction for processor front-end by caching decoded instructions

Number: US2007050554A1

A power aware front-end unit for a processor may include a UOP cache that disables other circuitry within the front-end unit. In an embodiment, a front-end unit may disable instruction synchronization circuitry, instruction decode circuitry and, optionally, instruction fetch circuitry while instruction look-ups are underway in both a block cache and an instruction cache. If the instruction look-up indicates a miss, the disabled circuitry thereafter may be enabled.

19-05-2009 publication date

Self-timed processor

Number: US0007536535B2

Systems and methods for executing program instructions in a data processor at a variable rate. In one embodiment, a processor is configured to examine received instructions, identify an execution time associated with each instruction, and generate clock pulses at necessary intervals to obtain the appropriate execution time for each instruction. Instructions may be associated with types or "bins" that are in turn associated with corresponding execution times. The clock pulses may be generated by routing successive pulses through circuits that delay the pulses by desired amounts of time. The processor may also be configured to identify instructions which are input/output (I/O) instructions and are initiated or terminated by completion of handshake procedures and therefore have execution times that vary from one instance to the next.

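The bin idea can be shown with a small lookup table in Python; the bin names and delays are invented, and real hardware would realize the delays with delay circuits rather than arithmetic.

```python
# Hypothetical bins mapping instruction classes to execution times (ns).
BIN_DELAY_NS = {"alu": 1.0, "mul": 3.0, "mem": 5.0}

def clock_pulse_times(instructions):
    """Space clock pulses by each instruction's bin delay, not a fixed period."""
    t, pulses = 0.0, []
    for insn in instructions:
        t += BIN_DELAY_NS[insn]    # the pulse is routed through a delay
        pulses.append(t)
    return pulses

print(clock_pulse_times(["alu", "mul", "alu"]))   # -> [1.0, 4.0, 5.0]
```
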
20-08-2009 publication date

System and Method for Prioritizing Branch Instructions

Number: US2009210674A1

The present invention provides a system and method for prioritizing branch instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one branch instruction is in the issue group and, if so, schedule the at least one branch instruction in one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolve the issue conflict by scheduling the at least one branch instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.

21-11-2006 publication date

Controlling a clock frequency of a plurality of processors

Number: US0007139921B2
Assignee: SHERBURNE JR ROBERT WARREN

A low power reconfigurable processor core includes one or more processing units, each unit having a clock input that controls the performance of the unit; and a controller having a plurality of clock outputs each coupled to a respective clock input of one of the processing units, the controller varying the clock frequency of each processing unit to optimize power consumption and processing power for a task.

05-12-2002 publication date

Method and device for modifying the memory contents of and reprogramming a memory

Number: US2002184546A1

A low-power reconfigurable processor core includes one or more processing units, each unit having a clock input that controls the performance of the unit; and a controller having a plurality of clock outputs each coupled to the clock inputs of the processing units, the controller varying the clock frequency of each processing unit to optimize power consumption and processing power for a task.

11-05-2021 publication date

Power-saving mechanism for memory sub-system in pipelined processor

Number: US0011003457B2
Assignee: MEDIATEK INC., MEDIATEK INC

A pipelined processor for carrying out pipeline processing of instructions, which undergo a plurality of stages, is provided. The pipelined processor includes: a memory-activation indicator and a memory controller. The memory-activation indicator stores content information that indicates whether to activate a first volatile memory and/or a second volatile memory while performing a current instruction. The memory controller is arranged for controlling activation of the first volatile memory and/or the second volatile memory in a specific stage of the plurality of stages of the current instruction according to the content information stored in the memory-activation indicator.

19-07-2006 publication date

SYNCHRONISATION BETWEEN PIPELINES IN A DATA PROCESSING APPARATUS

Number: EP0001535144B1
Assignee: ARM Limited

13-04-2005 publication date

Method and apparatus for pausing execution in a processor

Number: EP0001522917A2

A method and apparatus for pausing execution of instructions from a thread is described. In one embodiment, a pause instruction is implemented as two instructions or microinstructions: a SET instruction and a READ instruction. When a SET flag is retrieved for a given thread, the SET instruction sets a bit flag in memory indicating that execution for the thread has been paused. The SET instruction is placed in the pipeline for execution. The following READ instruction for that thread, however, is prevented from entering the pipeline until the SET instruction is executed and retired (resulting in a clearing of the bit flag). Once the bit flag has been cleared, the READ instruction is placed in the pipeline for execution. During the time that processing of one thread is paused, the execution of other threads may continue.

18-07-2012 publication date

Microprocessor with automatic selection of SIMD parallelism

Number: EP2290527B1
Author: Dockser, Kenneth Alan
Assignee: Qualcomm Incorporated

12-10-1988 publication date

Clock skew compensation system for pipeline processors

Number: EP0000285892A3

06-12-1989 publication date

Method and apparatus for controlling execution speed of computer processor

Number: EP0000344951A2

An improved computer processor capable of executing software instructions at two rates has two sets of microinstructions in a microsequencer (16), one for executing software instructions at a first rate and a second for executing instructions at a second rate. The speed of execution is controlled by a speed bit stored in a latch (40) which, depending on the value of the speed bit, selects microinstructions from either the first or second set. The speed bit is set to a first state by a software instruction included in blocks of software instructions to be executed at a first rate. The bit is automatically set to a second state upon execution of blocks of software instructions which do not contain a software instruction to set the bit to the first state such that these blocks are automatically executed at the second rate. The speed bit of a block of instructions which is interrupted by interrupt logic (30) or by a subroutine is stored in a subroutine stack (34) together with the address of ...

02-09-1998 publication date

Data processing system having an instruction pipeline

Number: EP0000862112A2

A data processing system having a pipeline is provided which includes a CPU designed to fetch a plurality of instructions simultaneously from a memory through a data bus in an instruction fetching stage. Each of the instructions is expressed as a string of bits whose length is shorter than the bus width of the data bus. The instruction fetching stage is performed once every N instruction processing cycles, where N is the number of instructions fetched simultaneously from the memory, during a period in which N instruction decoding stages are performed in the instruction processing cycles in succession. This enables high-speed pipelined operations.

10-03-1999 publication date

DATA PROCESSOR

Number: EP0000901070A1

A data processor including: a CPU (1) for performing a wait operation upon input of a wait signal (10) to its wait terminal (9); a wait/wait cancel instruction setting register (11) to which the CPU (1) sets a wait instruction and a wait cancel instruction; and a wait controller (12) for outputting a wait signal to the wait terminal (9) of the CPU (1) in accordance with the setting of the register (11), wherein the inventive data processor allows a wait state to be set and canceled as programmed independently of address space constraints.

18-05-1990 publication date

Number: JP0002022414B2
Author: NISHIBE SHINJI

14-11-2012 publication date

Number: JP0005073903B2

04-11-1997 publication date

Number: JP0009511117A

24-05-2006 publication date

Number: JP0003777160B2

12-02-1993 publication date

SYNCHRONOUS METHOD FOR PIPELINE COMPUTER AND PIPELINE COMPUTER USING THE SAME

Number: JP0005035474A
Author: YAMASHITA HITOSHI

PURPOSE: To make instruction processing efficient by enabling each instruction pipeline of the computer to operate independently and by synchronizing the instruction pipelines without wasting time on synchronization. CONSTITUTION: An instruction 1 from an instruction supply part 20 is decoded. In the procedure for instruction pipelines 10 and 11 of the decoded instruction 2, information from an initial information generation part 22, indicating whether or not synchronization between the executing instruction pipeline and the other instruction pipeline is necessary, and which supply instruction among the instructions processed in the pipeline has not yet reached the stage that synchronizes the instruction pipelines, is attached to the entire instruction, based on the state information 3 of the decoded instruction and of the instructions being processed in the instruction pipelines 10 and 11. According to the procedure in the instruction pipeline of the supply instruction ...

10-05-2002 publication date

ASYNCHRONOUS DATA PROCESSING APPARATUS

Number: RU2182353C2
Assignee: ARM LIMITED (GB)

The invention relates to data processing. The technical result is an extension of functional capabilities. The apparatus contains a plurality of asynchronous control circuits and a stall circuit for blocking a control signal in the control loop of a first asynchronous control circuit. The stall circuit prevents the first of the asynchronous control circuits from exchanging data signals with the other asynchronous control circuits. A method describes the operation of the apparatus. 3 independent and 12 dependent claims, 4 figures.

31-05-2007 publication date

Program sequence controlling method for use in processor, involves storing pre-determined time management of program and function sequence in program operational sequence in memory, where program execution is timely controlled by sequence

Number: DE102006037907A1

The method involves controlling a processor by a program in such a manner that a program starting address is accessed in a program memory by a program control unit. The pre-determined time management of the program and/or functional access to the processor and the pre-determined time management of the sequence of the function of individual resources in the processor are stored in a program operational sequence in a program operational sequence memory, where program execution in the processor is timely controlled by the operational sequence. An independent claim is also included for an arrangement for controlling a program sequence in a processor.

05-08-2010 publication date

PROCESSOR WITH MULTIPLE INSTRUCTION ISSUE

Number: DE0060333089D1
Assignee: NXP BV, NXP B.V.

01-01-2014 publication date

Method and system for pipelining out of order instructions by combining short latency instructions to match long latency instructions

Number: GB0002503438A

An improved system (1) for pipelining out-of-order instructions combines short latency instructions into an instruction chain to match the latency of an instruction having a second latency type. The instructions are then issued together to the short instruction pipelines (30 & 40) whilst long instructions are issued to a long pipeline (50). Timing of the write back (WB) slot of the longer instruction issued to the long pipeline (50) is used to write a value to the register (10). Dependencies between short instructions are solved by forwarding data from the first instruction to the second instruction. Output from the second instruction can be written back to the register (10) or held in an auxiliary buffer before writing to the register when the register becomes available.

18-01-2017 publication date

Replicating logic blocks to enable increased throughput

Number: GB0002527367B
Author: HUGH JACKSON, Hugh Jackson

03-09-2014 publication date

Updating of shadow registers in N:1 clock domain

Number: GB0201413052D0

27-03-2013 publication date

Processor power management based on class and content instructions

Number: GB0201302383D0

10-08-2011 publication date

Error recovery following speculative execution with an instruction processing pipeline

Number: GB0201111036D0

31-07-2013 publication date

A data processing apparatus and method for handling retrieval of instructions from an instruction cache

Number: GB0201310557D0

29-06-1994 publication date

Data cache

Number: GB0009409148D0

15-03-2008 publication date

METHOD AND APPARATUS FOR ACHIEVING THERMAL MANAGEMENT USING PROCESSING TASK SCHEDULING

Number: AT0000387663T

15-03-2012 publication date

WIRELESS DEVICE WITH A PROGRAMMABLE GENERAL-PURPOSE MEDIA PROCESSOR

Number: AT0000548691T

15-07-2010 publication date

PROCESSOR WITH MULTIPLE INSTRUCTION ISSUE

Number: AT0000472134T

08-03-2012 publication date

Memory Device Having Multiple Power Modes

Number: US20120057424A1
Assignee: Individual

A memory device having a memory core is described. The memory device includes a clock receiver circuit, a first interface to receive a read command, a data interface, and a second interface to receive power mode information. The data interface is separate from the first interface. The second interface is separate from the first interface and the data interface. The memory device has a plurality of power modes, including a first mode in which the clock receiver circuit, first interface, and data interface are turned off; a second mode in which the clock receiver is turned on and the first interface and data interface are turned off; and a third mode in which the clock receiver and first interface are turned on. In the third mode, the data interface is turned on when the first interface receives the command, to output data in response to the command.

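The three power modes enumerate cleanly; the sketch below encodes them and the rule that the data interface only powers up in the third mode once a command arrives. The names are invented.

```python
from enum import Enum

class PowerMode(Enum):
    ALL_OFF = 1     # clock receiver, command and data interfaces off
    CLOCK_ON = 2    # clock receiver on; command and data interfaces off
    ACTIVE = 3      # clock receiver and command interface on

def data_interface_on(mode, command_received):
    """Data interface turns on only in ACTIVE mode after a read command."""
    return mode is PowerMode.ACTIVE and command_received

print(data_interface_on(PowerMode.ACTIVE, True))    # True: drive data out
print(data_interface_on(PowerMode.CLOCK_ON, True))  # False: stays off
```
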
05-04-2012 publication date

Dynamically adjusting pipelined data paths for improved power management

Number: US20120084540A1
Assignee: International Business Machines Corp

A design structure embodied in a machine readable, non-transitory storage medium used in a design process includes a system for dynamically varying the pipeline depth of a computing device. The system includes a state machine that determines an optimum length of a pipeline architecture based on a processing function to be performed. A pipeline sequence controller, responsive to the state machine, varies the depth of the pipeline based on the optimum length. A plurality of clock splitter elements, each associated with a corresponding plurality of latch stages in the pipeline architecture, are coupled to the pipeline sequence controller and adapted to operate in a functional mode, one or more clock gating modes, and a pass-through flush mode. For each of the clock splitter elements operating in the pass-through flush mode, data is passed through the associated latch stage without oscillation of clock signals associated therewith.

25-04-2013 publication date

Alignment of instructions and replies across multiple devices in a cascaded system, using buffers of programmable depths

Number: US20130103863A1
Author: Tom Teng
Assignee: Micron Technology Inc

Buffers of programmable depths are used in the instruction and reply paths of cascaded devices to account for possible differences in latencies between the devices. The buffers may be enabled or bypassed such that the alignment of instruction and result may be performed at the boundaries between separate groups of devices having different instruction latencies.

09-05-2013 publication date

Low overhead operation latency aware scheduler

Number: US20130117543A1
Assignee: Advanced Micro Devices Inc

A method and apparatus for processing multi-cycle instructions include picking a multi-cycle instruction and directing the picked multi-cycle instruction to a pipeline. The pipeline includes a pipeline control configured to detect a latency and a repeat rate of the picked multi-cycle instruction and to count clock cycles based on the detected latency and the detected repeat rate. The method and apparatus further include detecting the repeat rate and the latency of the picked multi-cycle instruction, and counting clock cycles based on the detected repeat rate and the latency of the picked multi-cycle instruction.

16-05-2013 publication date

Processor with power control via instruction issuance

Number: US20130124900A1
Assignee: Advanced Micro Devices Inc

Methods and apparatuses are provided for power control in a processor. The apparatus comprises a plurality of operational units arranged as a group of operational units. A power consumption monitor determines when cumulative power consumption of the group of operational units exceeds a threshold (e.g., either or both of the cumulative power threshold and the cumulative power rate threshold) during a time interval, after which a filter for issuing instructions to the group of operational units suspends instruction issuance to the group of operational units for the remainder of the time interval. The method comprises monitoring cumulative power consumption by a group of operational units within a processor over a time interval. If the cumulative power consumption of the group of operational units exceeds the threshold, instruction issuance to the group of operational units is suspended for the remainder of the time interval.

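A compact model of that monitor in Python. The per-issue energy estimates, budget and interval length are invented; the rule of suspending issue for the rest of the interval follows the abstract.

```python
class GroupPowerMonitor:
    """Blocks issue to a unit group once its interval energy budget is spent."""
    def __init__(self, budget=10.0, interval=50):
        self.budget, self.interval = budget, interval
        self.used, self.cycle, self.suspended = 0.0, 0, False

    def tick(self):
        self.cycle += 1
        if self.cycle % self.interval == 0:
            self.used, self.suspended = 0.0, False   # fresh interval

    def issue(self, energy_estimate):
        if self.suspended:
            return False              # filter holds instructions back
        self.used += energy_estimate
        if self.used > self.budget:
            self.suspended = True     # over budget: suspend until interval ends
        return True

mon = GroupPowerMonitor()
print([mon.issue(4.0) for _ in range(4)])   # -> [True, True, True, False]
```
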
20-06-2013 publication date

Reducing issue-to-issue latency by reversing processing order in half-pumped SIMD execution units

Number: US20130159666A1
Assignee: International Business Machines Corp

Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit, and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.

07-01-2021 publication date

PERFORMING RESYNCHRONIZATION JOBS IN A DISTRIBUTED STORAGE SYSTEM BASED ON A PARALLELISM POLICY

Number: US20210004163A1

The disclosure herein describes performing resynchronization ("resync") jobs in a distributed storage system based on a parallelism policy. A resync job is obtained from a queue and input/output (I/O) resources that will be used during execution of the resync job are identified. Available bandwidth slots of each I/O resource of the identified I/O resources are determined. The parallelism policy is applied to the identified I/O resources and the available bandwidth slots. Based on the application of the parallelism policy, a bottleneck resource of the I/O resources is determined and a parallel I/O value is calculated based on the available bandwidth slots of the bottleneck resource, wherein the parallel I/O value indicates a quantity of I/O tasks that can be performed in parallel. The resync job is executed using the I/O resources, the execution of the resync job including performance of I/O tasks in parallel based on the parallel I/O value.

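The bottleneck computation reduces to a min over the free bandwidth slots of the I/O resources a job touches; the resource names and slot counts below are invented.

```python
def parallel_io_value(free_slots):
    """Bottleneck resource bounds how many I/O tasks may run in parallel."""
    bottleneck = min(free_slots, key=free_slots.get)
    return bottleneck, free_slots[bottleneck]

# Hypothetical free bandwidth slots per resource used by one resync job.
slots = {"src_disk": 6, "dst_disk": 3, "network": 5}
print(parallel_io_value(slots))   # -> ('dst_disk', 3): run 3 tasks at once
```
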
07-01-2021 publication date

Pipeline flattener with conditional triggers

Number: US20210004236A1
Assignee: Texas Instruments Inc

A semiconductor device comprising a processor having a pipelined architecture and a pipeline flattener and a method for operating a pipeline flattener in a semiconductor device are provided. The processor comprises a pipeline having a plurality of pipeline stages and a plurality of pipeline registers that are coupled between the pipeline stages. The pipeline flattener comprises a plurality of trigger registers for storing a trigger, wherein the trigger registers are coupled between the pipeline stages.

07-01-2021 publication date

SCALABLE CENTRALIZED INTERNET-OF-THINGS MANAGER

Number: US20210004270A1
Assignee: Nutanix, Inc.

A scalable Internet of Things (IoT) system may include multiple instances of an IoT manager, each instance respectively configured to connect to a respective edge system of multiple edge systems. The IoT system may further include a containerized system configured to allow any instance of the IoT manager to deploy data pipelines to any edge system of the multiple edge systems in delta communications. Any instance of the IoT manager may send a change message to any edge system via a publish/subscribe notification method. In some examples, a centralized IoT manager may form a secure communication with an edge system, synchronize an object model with an edge object model for the edge system, and maintain the edge system using delta change communications. The IoT system may facilitate any instance of the IoT manager to subscribe a communication channel with an associated edge system for receiving update notifications.

More details
02-01-2020 publication date

SYSTEM, APPARATUS AND METHOD FOR BARRIER SYNCHRONIZATION IN A MULTI-THREADED PROCESSOR

Number: US20200004602A1
Assignee:

In one embodiment, a first processor core includes: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. Other embodiments are described and claimed. 1. A processor comprising: a first core comprising: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. 2. The processor of claim 1, further comprising a local network to couple the plurality of pipeline barrier circuits to the core barrier circuit. 3. The processor of claim 1, wherein the core barrier circuit is to receive a configuration message to program the plurality of pipeline barrier circuits for the first barrier group, the configuration message including a count of the at least two threads of the first barrier group. 4. The processor of claim 3, wherein a first thread ...
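A minimal sketch of the barrier-group bookkeeping, assuming a core-level outstanding-thread counter per group that releases the group when the last thread arrives; the group size, thread ids, and pipeline numbers are hypothetical.

class CoreBarrier:
    def __init__(self, group_sizes):
        self.remaining = dict(group_sizes)   # barrier group -> threads pending
        self.released = set()

    def arrive(self, group, thread, pipeline):
        print(f"{thread} (pipeline {pipeline}) reached barrier of {group}")
        self.remaining[group] -= 1
        if self.remaining[group] == 0:
            # Inform all pipeline barrier circuits that the barrier is reached.
            self.released.add(group)
            print(f"{group} released: all threads may proceed")

core = CoreBarrier({"grp0": 3})
for thread, pipe in [("t0", 0), ("t1", 0), ("t2", 1)]:  # threads span pipelines
    core.arrive("grp0", thread, pipe)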

More details
03-01-2019 publication date

Stream processor with overlapping execution

Number: US20190004807A1
Assignee: Advanced Micro Devices Inc

Systems, apparatuses, and methods for implementing a stream processor with overlapping execution are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of execution pipelines. The processing throughput of the parallel processing unit is increased by overlapping execution of multi-pass instructions with single pass instructions without increasing the instruction issue rate. A first plurality of operands of a first vector instruction are read from a shared vector register file in a single clock cycle and stored in temporary storage. The first plurality of operands are accessed and utilized to initiate multiple instructions on individual vector elements on a first execution pipeline in subsequent clock cycles. A second plurality of operands are read from the shared vector register file during the subsequent clock cycles to initiate execution of one or more second vector instructions on the second execution pipeline.
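A cycle-by-cycle Python sketch of the overlap described above: one register-file read fills temporary storage for a multi-pass vector instruction, which then occupies its own pipeline for several cycles while the freed register file feeds single-pass instructions to the second pipeline. Latencies and operation names are invented.

def schedule(multi_pass_elems, single_pass_ops):
    cycle, temp = 0, None
    while multi_pass_elems or single_pass_ops or temp:
        events = []
        if temp is None and multi_pass_elems:
            # One register-file read supplies operands for every pass.
            temp = list(multi_pass_elems); multi_pass_elems = []
            events.append("RF read -> temp storage (multi-pass operands)")
        elif temp:
            events.append(f"pipe0 executes vector element {temp.pop(0)}")
        if single_pass_ops:
            # The register file is free, so pipe1 can start a new instruction.
            events.append(f"RF read + pipe1 executes {single_pass_ops.pop(0)}")
        print(f"cycle {cycle}: " + "; ".join(events))
        cycle += 1

schedule(multi_pass_elems=[0, 1, 2, 3], single_pass_ops=["add", "mul", "sub"])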

More details
08-01-2015 publication date

DISTRIBUTION OF TASKS AMONG ASYMMETRIC PROCESSING ELEMENTS

Number: US20150012731A1
Assignee:

Techniques to control power and processing among a plurality of asymmetric cores. In one embodiment, one or more asymmetric cores are power managed to migrate processes or threads among a plurality of cores according to the performance and power needs of the system. 1. A multi-core processor comprising: a first processing core and a second processing core, each including one or more arithmetic logic units and an instruction decoder; wherein the first processing core is capable of operating at a higher processing throughput than that of the second processing core; wherein, in response to an occurrence of an event, a task processed on the first processing core is to be transferred from the first processing core to the second processing core after saving a core state of the first processing core and providing the core state to the second processing core; and wherein the first processing core is to be placed in a lower power state after the second processing core resumes processing the task. 2. The multi-core processor of claim 1, wherein the higher processing throughput is based on a first pipeline depth of the first processing core that is greater than a pipeline depth of the second processing core. 3. The multi-core processor of claim 1, wherein the higher processing throughput is based on a power consumption of the first processing core that is greater than a power consumption of the second processing core. 4. A method comprising: transferring a task processed on a first processing core to a second processing core after saving a core state of the first processing core and providing the core state to the second processing core, in response to an occurrence of an event; and placing the first processing core in a lower power state after the second processing core resumes processing the task, wherein the first processing core is capable of operating at a higher processing throughput than that of the second processing core. 5. The method of claim 4, wherein the higher processing ...
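A small Python sketch of the migration flow in the claims: save the faster core's state, hand the task to the slower core, then power the faster core down. Core names, the state fields, and the "C6" power-state label are invented.

class Core:
    def __init__(self, name, throughput):
        self.name, self.throughput = name, throughput
        self.power_state = "active"
        self.task = None

    def save_state(self):
        return {"regs": "r0..r31", "pc": 0x4000, "task": self.task}

    def resume(self, state):
        self.task = state["task"]          # resumes processing the task

def migrate(task, fast_core, slow_core):
    fast_core.task = task
    state = fast_core.save_state()         # save core state of the fast core
    slow_core.resume(state)                # provide the state, resume there
    fast_core.power_state = "C6 (sleep)"   # then enter a lower power state

big, little = Core("big", 2.0), Core("little", 1.0)
migrate("video-decode", big, little)
print(little.task, "|", big.power_state)   # video-decode | C6 (sleep)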

More details
14-01-2016 publication date

POWER AND THROUGHPUT OPTIMIZATION OF AN UNBALANCED INSTRUCTION PIPELINE

Number: US20160011642A1
Assignee:

A method includes determining a rate of resource occupancy of a constituent stage of an unbalanced instruction pipeline implemented in a processor through profiling an instruction code. The method also includes performing data processing at a maximum throughput at an optimum clock frequency based on the rate of resource occupancy. 1. A method comprising:determining a rate of resource occupancy of a constituent stage of an unbalanced instruction pipeline implemented in a processor through profiling an instruction code; andperforming data processing associated with the unbalanced instruction pipeline at a maximum throughput and at an optimum clock frequency based on the rate of resource occupancy.2. The method of claim 1 , wherein performing the data processing includes stalling processing associated with at least one of the constituent stage of the unbalanced instruction pipeline and a previous stage claim 1 , for at least a number of clock cycles corresponding to a delay time associated with the processing claim 1 , through the constituent stage by gating a clock input to the at least one of the constituent stage and the previous stage.3. The method of claim 1 , further comprising:determining a time interval within a processing time associated with the constituent stage of the unbalanced instruction pipeline based on a change in a processing scenario associated with processing;dynamically determining the rate of resource occupancy of the constituent stage periodically with a time period equal to the determined time interval; andobtaining, at every time interval, the clock frequency associated with the rate of resource occupancy of the constituent stage for performing the data processing associated with the unbalanced instruction pipeline.4. The method of claim 3 , wherein the clock frequency associated with the data processing is higher than a frequency corresponding to the higher delay time associated with the constituent stage.5. The method of claim 2 , further ...
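A back-of-the-envelope Python sketch of the idea: clock the pipeline at the slowest frequently-occupied stage, and let rarely occupied slower stages pay extra clock-gated stall cycles instead. The latencies, occupancy rates, and the 0.5 occupancy threshold are assumptions for illustration.

import math

def optimum_clock(stage_latency_ns, occupancy):
    busy = [lat for lat, occ in zip(stage_latency_ns, occupancy) if occ > 0.5]
    period = max(busy)                       # ns, set by busy stages only
    stalls = {
        i: math.ceil(lat / period) - 1       # gated cycles for slow stages
        for i, lat in enumerate(stage_latency_ns)
        if lat > period
    }
    return 1.0 / period, stalls              # GHz, per-stage stall counts

freq, stalls = optimum_clock([0.5, 0.5, 2.0, 0.5], [0.9, 0.9, 0.1, 0.9])
print(f"{freq:.1f} GHz, stall cycles per pass: {stalls}")   # 2.0 GHz, {2: 3}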

More details
08-01-2015 publication date

Distribution of tasks among asymmetric processing elements

Number: US20150012766A1
Assignee: Individual

Techniques to control power and processing among a plurality of asymmetric cores. In one embodiment, one or more asymmetric cores are power managed to migrate processes or threads among a plurality of cores according to the performance and power needs of the system.

More details
14-01-2016 publication date

MANAGING INSTRUCTION ORDER IN A PROCESSOR PIPELINE

Number: US20160011876A1
Assignee:

Executing instructions in a processor includes classifying, in at least one stage of a pipeline of the processor, operations to be performed by instructions. The classifying includes: classifying a first set of operations as operations for which out-of-order execution is allowed, and classifying a second set of operations as operations for which out-of-order execution with respect to one or more specified operations is not allowed, the second set of operations including at least store operations. Results of instructions executed out-of-order are selected to commit the selected results in-order. The selecting includes, for a first result of a first instruction and a second result of a second instruction executed before and out-of-order relative to the first instruction: determining which stage of the pipeline stores the second result, and committing the first result directly from the determined stage over a forwarding path, before committing the second result. 1. A method for executing instructions in a processor, the method comprising: classifying, in at least one stage of a pipeline of the processor, operations to be performed by instructions, the classifying including: classifying a first set of operations as operations for which out-of-order execution is allowed, and classifying a second set of operations as operations for which out-of-order execution with respect to one or more specified operations is not allowed, the second set of operations including at least store operations; and selecting results of instructions executed out-of-order to commit the selected results in-order, the selecting including, for a first result of a first instruction and a second result of a second instruction executed before and out-of-order relative to the first instruction: determining which stage of the pipeline stores the second result, and committing the first result directly from the determined stage over a forwarding path, before committing the second result. 2. The ...
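A compact sketch of classify-then-commit-in-order. The operation split and the id-to-stage map (standing in for the pipeline's result registers and forwarding path) are simplified assumptions.

OOO_ALLOWED = {"add", "mul", "load"}    # first set: out-of-order permitted
OOO_FORBIDDEN = {"store"}               # second set: must stay ordered

def commit_in_order(program_order, result_stage):
    """program_order: instruction ids in program order.
    result_stage: id -> pipeline stage currently holding its result."""
    for instr in program_order:          # commit strictly in program order,
        stage = result_stage[instr]      # even when execution finished OOO
        print(f"commit {instr} directly from stage {stage} over forwarding path")

# i2 executed before i1 (out of order); its result waits in a later stage.
commit_in_order(["i1", "i2"], {"i1": "EX", "i2": "WB"})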

More details
14-01-2016 publication date

MANAGING INSTRUCTION ORDER IN A PROCESSOR PIPELINE

Number: US20160011877A1
Assignee:

Executing instructions in a processor includes determining identifiers corresponding to instructions in at least one decode stage of a pipeline of the processor. A set of identifiers for at least one instruction includes: at least one operation identifier identifying an operation to be performed by the instruction, at least one storage identifier identifying a storage location for storing an operand of the operation, and at least one storage identifier identifying a storage location for storing a result of the operation. A multi-dimensional identifier is assigned to at least one storage identifier. 1. A method for executing instructions in a processor, the method comprising: determining identifiers corresponding to instructions in at least one decode stage of a pipeline of the processor, with a set of identifiers for at least one instruction including: at least one operation identifier identifying an operation to be performed by the instruction, at least one storage identifier identifying a storage location for storing an operand of the operation, and at least one storage identifier identifying a storage location for storing a result of the operation; and assigning a multi-dimensional identifier to at least one storage identifier. 2. The method of claim 1, wherein assigning a multi-dimensional identifier to a first storage identifier includes: assigning a first dimension of the multi-dimensional identifier to a value corresponding to the first storage identifier, and assigning a second dimension of the multi-dimensional identifier to a value indicating one of a plurality of sets of physical storage locations. 3. The method of claim 1, further comprising selecting a plurality of instructions to be issued to one or more stages of the pipeline in which multiple sequences of instructions are executed in parallel through separate paths through the pipeline, based at least in part on a Boolean value provided by circuitry that applies logic to condition ...

More details
16-01-2020 publication date

System, Apparatus And Method For Providing A Local Clock Signal For A Memory Array

Number: US20200019207A1
Assignee: Intel Corp

In an embodiment, a processor includes at least one processor core and at least one graphics processor. The at least one graphics processor may include a register file having a plurality of entries, where at least a portion of the at least one graphics processor is to operate at a first operating frequency and the register file is to operate at a second operating frequency greater than the first operating frequency, to enable the at least one graphics processor to issue a plurality of write requests to the register file in a single clock cycle at the first operating frequency and receive a plurality of data elements of a plurality of read requests from the register file in the single clock cycle at the first operating frequency. Other embodiments are described and claimed.

More details
25-01-2018 publication date

CONTROLLING THE OPERATING SPEED OF STAGES OF AN ASYNCHRONOUS PIPELINE

Number: US20180024837A1
Assignee:

An asynchronous pipeline includes a first stage and one or more second stages. A controller provides control signals to the first stage to indicate a modification to an operating speed of the first stage. The modification is determined based on a comparison of a completion status of the first stage to one or more completion statuses of the one or more second stages. In some cases, the controller provides control signals indicating modifications to an operating voltage applied to the first stage and a drive strength of a buffer in the first stage. Modules can be used to determine the completion statuses of the first stage and the one or more second stages based on the monitored output signals generated by the stages, output signals from replica critical paths associated with the stages, or a lookup table that indicates estimated completion times. 1. An apparatus comprising:an asynchronous pipeline comprising a first stage and at least one second stage; anda controller to provide control signals to the first stage to indicate a modification to an operating speed of the first stage, wherein the modification is determined based on a comparison of a completion status of the first stage to at least one completion status of the at least one second stage.2. The apparatus of claim 1 , wherein the at least one second stage comprises at least one of a left-hand stage that generates input data for the first stage and a right-hand stage that receives output data generated by the first stage.3. The apparatus of claim 1 , wherein the controller is to provide control signals to the first stage to indicate a modification of an operating voltage applied to the first stage claim 1 , and wherein the modification of the operating voltage is determined based on the comparison of the completion statuses of the first stage and the at least one second stage.4. The apparatus of claim 1 , further comprising:at least one buffer to drive signals between portions of the first stage, and wherein ...
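A sketch of the controller loop, assuming completion times are observable per stage and supply voltage is the speed knob; the completion values and the 0.05 V step are invented.

def adjust_stage(stage_done_at, left_done_at, right_done_at, vdd):
    """Return a new supply voltage for the stage under control."""
    neighbours = min(left_done_at, right_done_at)
    if stage_done_at < neighbours:
        return round(vdd - 0.05, 3)   # finishes early: slow it, save power
    if stage_done_at > neighbours:
        return round(vdd + 0.05, 3)   # finishes late: it is the bottleneck
    return vdd

# The stage completes well before both neighbours, so its voltage drops.
print(adjust_stage(stage_done_at=1.0, left_done_at=1.4, right_done_at=1.5, vdd=0.9))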

More details
24-01-2019 publication date

CONTROL SYSTEM AND METHOD TO PERFORM AN OPERATION

Number: US20190026213A1
Author: Mayer Albrecht
Assignee:

A method includes invoking a first instruction that, when executed by a first processor, causes the first processor to perform a first operation, and that, when executed by the first processor, causes a second processor to perform a second operation. The method further includes invoking a second instruction that, when executed by the first processor, causes the first processor to perform the first operation while causing the second processor to perform a third operation or while leaving the second processor unaffected. A control system includes a first processor and a second processor, wherein the first processor is configured to execute a first instruction to perform a first operation, wherein the second processor is configured to perform a second operation when the first processor executes the first instruction. 1. A method, comprising: providing a first instruction that, when executed by a first processor, causes the first processor to perform a first operation, and that, when executed by the first processor, causes a second processor to perform a second operation; and providing a second instruction that, when executed by the first processor, causes the first processor to perform the first operation while causing the second processor to perform a third operation or while leaving the second processor unaffected. 2. The method of claim 1, wherein the third operation is a no-operation. 3. The method of claim 1, wherein a first binary code representing the first instruction and a second binary code representing the second instruction have an equal length. 4. The method of claim 1, wherein a first number of clock cycles that are required to execute the first instruction and a second number of clock cycles required to execute the second instruction are equal. 5. The method of claim 1, comprising: associating a processor address register of the first processor with the second operation; wherein the first operation involves an access to an associated processor address register of the ...

More details
23-01-2020 publication date

Pipelined configurable processor

Number: US20200026685A1
Author: Paul Metzgen
Assignee: SILICON TAILOR Ltd

A configurable processing circuit capable of handling multiple threads simultaneously, the circuit comprising a thread data store, a plurality of configurable execution units, a configurable routing network for connecting locations in the thread data store to the execution units, a configuration data store for storing configuration instances that each define a configuration of the routing network and a configuration of one or more of the plurality of execution units, and a pipeline formed from the execution units, the routing network and the thread data store that comprises a plurality of pipeline sections configured such that each thread propagates from one pipeline section to the next at each clock cycle, the circuit being configured to: (i) associate each thread with a configuration instance; and (ii) configure each of the plurality of pipeline sections for each clock cycle to be in accordance with the configuration instance associated with the respective thread that will propagate through that pipeline section during the clock cycle.
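A Python sketch of per-cycle, per-section configuration: each thread is associated with a configuration instance, and each pipeline section adopts the configuration of the thread passing through it that cycle. The section count and configuration contents are hypothetical.

configs = {
    "thr-A": {"route": "regs->alu0", "alu": "add"},
    "thr-B": {"route": "regs->alu1", "alu": "xor"},
}

def run(threads, sections=3, cycles=5):
    pipe = [None] * sections                     # thread in each section
    for cycle in range(cycles):
        # Threads advance one section per clock; a new one enters in front.
        pipe = [threads.pop(0) if threads else None] + pipe[:-1]
        for sec, thr in enumerate(pipe):
            if thr is not None:
                cfg = configs[thr]               # section takes the thread's config
                print(f"cycle {cycle}, section {sec}: {thr} -> {cfg['alu']} via {cfg['route']}")

run(["thr-A", "thr-B"])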

More details
28-01-2021 publication date

LATENCY-BASED INSTRUCTION RESERVATION CLUSTERING IN A SCHEDULER CIRCUIT IN A PROCESSOR

Number: US20210026639A1
Assignee:

Latency-based instruction reservation clustering in a scheduler circuit in a processor is disclosed. The scheduler circuit includes a plurality of latency-based reservation circuits each having an assigned producer instruction cycle latency. Producer instructions with the same cycle latency can be clustered in the same latency-based reservation circuit. Thus, the number of reservation entries is distributed among the plurality of latency-based reservation circuits to avoid or reduce an increase in the number of scheduling path connections and complexity in each reservation circuit to avoid or reduce an increase in scheduling latency. The scheduling path connections are reduced for a given number of reservation entries over a non-clustered pick circuit, because signals (e.g., wake-up signals, pick-up signals) used for scheduling instructions in each latency-based reservation circuit do not have to have the same clock cycle latency so as to not impact performance. 1. A clustered scheduler circuit in a processor configured to receive a plurality of instructions comprising producer instructions and consumer instructions to be scheduled for execution, the clustered scheduler circuit comprising: a first latency-based reservation circuit configured to receive a plurality of single clock cycle latency wake-up signals on the single clock cycle latency wake-up signal port, each associated with an issue lane among a plurality of issue lanes, the plurality of single clock cycle latency wake-up signals each indicating an issue state of a single clock ..., the first latency-based reservation circuit further configured to: receive first consumer instructions among the plurality of instructions dependent on the producer instructions having a single clock cycle latency; store the first consumer instructions in first reservation entries among a plurality of first reservation entries; and select a plurality of first consumer instructions stored among the plurality of first reservation entries identified as having an issue state of issue ready.
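A minimal sketch of the clustering step itself, assuming a fixed producer-latency table: reservation entries are bucketed by their producer's cycle latency so that all wake-up signalling inside one bucket has a uniform latency. The latency values and instruction mix are invented.

from collections import defaultdict

PRODUCER_LATENCY = {"add": 1, "shift": 1, "mul": 3, "load": 4}  # cycles

def cluster(producer_instructions):
    circuits = defaultdict(list)    # cycle latency -> reservation entries
    for instr in producer_instructions:
        circuits[PRODUCER_LATENCY[instr]].append(instr)
    return dict(circuits)

print(cluster(["add", "mul", "shift", "load", "add"]))
# {1: ['add', 'shift', 'add'], 3: ['mul'], 4: ['load']}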

More details
02-02-2017 publication date

System and Method for Variable Lane Architecture

Number: US20170031689A1
Assignee:

A system and method for variable lane architecture includes memory blocks located in a memory bank, one or more computing nodes forming a vector instruction pipeline for executing a task, each of the computing nodes located in the memory bank, each of the computing nodes executing a portion of the task independently of other ones of the computing nodes, and a global program controller unit (GPCU) forming a scalar instruction pipeline for executing the task, the GPCU configured to schedule instructions for the task at one or more of the computing nodes, the GPCU further configured to dispatch an address for the memory blocks used by each of the computing nodes to the computing nodes. 1. A processor comprising:memory blocks located in a memory bank;one or more computing nodes forming a vector instruction pipeline for executing a task, each of the computing nodes located in the memory bank, each of the computing nodes executing a portion of the task independently of other ones of the computing nodes; anda global program controller unit (GPCU) forming a scalar instruction pipeline for executing the task, the GPCU configured to schedule instructions for the task at one or more of the computing nodes, the GPCU further configured to dispatch an address for the memory blocks used by each of the computing nodes to the computing nodes.2. The processor of claim 1 , wherein the computing nodes comprise a plurality of subsets of computing nodes claim 1 , each of the plurality of subsets of computing nodes executing a different portion of the task during a different period.3. The processor of claim 2 , wherein each of the plurality of subsets of computing nodes accesses the memory blocks specified by the address dispatched to each of the plurality of subsets of computing nodes.4. The processor of claim 1 , further comprising:an instruction queue configured to receive instructions for the task scheduled to the computing nodes.5. The processor of claim 1 , wherein each computing ...

More details
17-02-2022 publication date

INSTRUCTION DRIVEN DYNAMIC CLOCK MANAGEMENT FOR DEEP PIPELINE AND OUT-OF-ORDER OPERATION OF MICROPROCESSOR USING ON-CHIP CRITICAL PATH MESSENGER AND ELASTIC PIPELINE CLOCKING

Number: US20220050686A1
Author: Gu Jie, Joseph Russell E.
Assignee:

Systems and/or methods can include techniques to exploit dynamic timing slack on the chip. By using a special clock generator, the clock period can be shrunk as needed at every cycle. The clock period is determined during operation by checking “critical path messengers” to indicate how much dynamic timing slack exists. Elastic pipeline timing can also be introduced to redistribute timing among pipeline stages to bring further benefits. 1. A dynamic clock management method for executing instructions in deep pipeline stages of a processor , comprising:determining a dynamic timing slack (DTS) of all pipeline stages in each local clock cycle, wherein the DTS is a time margin between an actual timing period and a worst-case clock period to process all in-flight instructions by all the pipeline stages of a computing unit (CU) at a current local clock cycle, wherein the pipeline stages in the CU comprise: a first sequence of control pipeline stages followed by a second sequence of task execution pipeline stages;adjusting by increasing or decreasing a local clock period at the current local clock cycle, by a predetermined amount of time according to a determined DTS, for all the pipeline stages;executing in real-time at the current clock cycle, the in-flight instructions in each of the pipeline stages according to an adjusted local clock period; andrepeat execution of subsequent in-flight instructions in each of the pipeline stages according to a subsequent adjusted local clock period, wherein the subsequent adjusted local clock period is dynamically adjusted according to a subsequent determined DTS at a subsequent clock cycle.2. The dynamic clock management method of claim 1 , wherein the DTS determination comprising receiving simultaneously respective critical path messenger signal sent one local clock cycle earlier claim 1 , from each of the first sequence of control pipeline stages and the second sequence of execution pipeline stages.3. The dynamic clock management ...
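A toy Python sketch of elastic clocking from dynamic timing slack (DTS), assuming the critical-path messengers report the time this cycle's in-flight instructions actually need; the worst-case period, guard band, and delay values are invented numbers.

WORST_CASE_PERIOD = 1.00   # ns
GUARD = 0.02               # ns of safety margin kept on top of the need

def next_period(messenger_delays_ns):
    needed = max(messenger_delays_ns)          # slowest stage this cycle
    dts = WORST_CASE_PERIOD - needed           # slack available to reclaim
    return WORST_CASE_PERIOD - max(0.0, dts - GUARD)

for cycle, delays in enumerate([[0.6, 0.7], [0.95, 0.9], [0.5, 0.4]]):
    print(f"cycle {cycle}: period {next_period(delays):.2f} ns")
# cycle 0: 0.72 ns, cycle 1: 0.97 ns, cycle 2: 0.52 ns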

More details
17-02-2022 publication date

MULTIPLE DIES HARDWARE PROCESSORS AND METHODS

Number: US20220050805A1
Assignee:

Methods and apparatuses relating to hardware processors with multiple interconnected dies are described. In one embodiment, a hardware processor includes a plurality of physically separate dies, and an interconnect to electrically couple the plurality of physically separate dies together. In another embodiment, a method to create a hardware processor includes providing a plurality of physically separate dies, and electrically coupling the plurality of physically separate dies together with an interconnect. 1. A processor comprising:a multi-die package substrate;a plurality of modular dies mounted on the multi-die package substrate and including a first die and a plurality of other dies;a plurality of secondary management controllers, each secondary management controller integral to a corresponding one of the plurality of other dies;an interconnect coupled to the plurality of modular dies; anda primary management controller integral to the first die to transmit one or more management requests to the plurality of secondary management controllers over the interconnect, the plurality of secondary management controllers to perform a modification to a clock of the corresponding one of the plurality of other dies based, at least in part, on the one or more management requests.2. The processor of claim 1 , wherein the plurality of secondary management controllers are to also perform a modification to a voltage of the corresponding one of the plurality of other dies based claim 1 , at least in part claim 1 , on the one or more management requests.3. The processor of claim 1 , wherein the modification to the clock of the corresponding one of the plurality of other dies is a modification of an operating frequency of the clock of the corresponding one of the plurality of other dies.4. The processor of claim 3 , wherein the modification to the clock of the corresponding one of the plurality of other dies further comprises a modification of a clock edge placement of the clock of ...

More details
31-01-2019 publication date

PROCESSOR WITH HYBRID PIPELINE CAPABLE OF OPERATING IN OUT-OF-ORDER AND IN-ORDER MODES

Number: US20190034208A1
Assignee:

A method and circuit arrangement provide support for a hybrid pipeline that dynamically switches between out-of-order and in-order modes. The hybrid pipeline may selectively execute instructions from at least one instruction stream that require the high performance capabilities provided by out-of-order processing in the out-of-order mode. The hybrid pipeline may also execute instructions that have strict power requirements in the in-order mode where the in-order mode conserves more power compared to the out-of-order mode. Each stage in the hybrid pipeline may be activated and fully functional when the hybrid pipeline is in the out-of-order mode. However, stages in the hybrid pipeline not used for the in-order mode may be deactivated and bypassed by the instructions when the hybrid pipeline dynamically switches from the out-of-order mode to the in-order mode. The deactivated stages may then be reactivated when the hybrid pipeline dynamically switches from the in-order mode to the out-of-order mode. 1. A circuit arrangement , comprising:a hybrid pipeline including a plurality of pipeline stages configured to execute at least one instruction stream, wherein the plurality of pipeline stages includes a dispatch stage, wherein the dispatch stage is configured to dispatch instructions to an issue queue when the hybrid pipeline is in an out-of-order mode, and wherein the dispatch stage is configured to bypass the issue queue when the hybrid pipeline is in an in-order mode; andcontrol logic coupled to the hybrid pipeline and configured to dynamically switch the hybrid pipeline between the out-of-order and in-order modes to selectively execute instructions from the at least one instruction stream using out-of-order and in-order pipeline processing, wherein the control logic includes a mode selector configured to dynamically switch the hybrid pipeline between the out-of-order and in-order modes based on power requirements of one or more upcoming instructions in the at least ...
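A Python sketch of a mode selector, assuming per-instruction power annotations and a simple threshold; the wattage numbers are invented, and deactivating the bypassed stages is only represented by the printed label.

def dispatch(window, power_hungry_threshold=2):
    hungry = sum(1 for _, watts in window if watts > 1.0)
    mode = "in-order" if hungry >= power_hungry_threshold else "out-of-order"
    for instr, _ in window:
        path = "bypasses issue queue" if mode == "in-order" else "enters issue queue"
        print(f"[{mode}] {instr} {path}")
    return mode

dispatch([("vmul", 1.5), ("vadd", 1.2), ("ld", 0.4)])  # two hungry ops -> in-order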

More details
30-01-2020 publication date

Super-Thread Processor

Number: US20200034153A1
Author: Halle Kevin Sean
Assignee:

The disclosed inventions include a processor apparatus and method that enable a general purpose processor to achieve twice the operating frequency of typical processor implementations with a modest increase in area and a modest increase in energy per operation. The invention relies upon exploiting multiple independent streams of execution. Low area and low energy memory arrays used for register files operate at a modest frequency. Instructions can be issued at a rate higher than this frequency by including logic that guarantees that instructions from the same thread are spaced wider than the time to access the register file. The result of the invention is the ability to overlap long latency structures, which allows using lower energy structures, thereby reducing energy per operation. 1. A processor system comprising: a pipeline including an execution unit; an instruction cache; a context unit including a plurality of rows, each of the plurality of rows being assigned to an independent thread, each of the plurality of rows including at least one program counter, each of the plurality of rows including storage configured to store one or more instructions, each of the plurality of rows including logic configured to fetch the one or more instructions from the instruction cache, and each of the plurality of rows including logic configured to determine when an instruction is ready to be issued to the pipeline from the respective row; and issue logic configured to select a row from among the plurality of rows and to issue an instruction from the selected row to the pipeline. 2. The system of claim 1, wherein the logic configured to determine when an instruction is ready is responsive to a history of past actions in the respective row. 3. The system of claim 1, wherein the logic configured to determine when an instruction is ready is responsive to a history of past actions in the respective row. 4. The system of claim 1, wherein each of the ...

More details
04-02-2021 publication date

POLICY HANDLING FOR DATA PIPELINES

Number: US20210034372A1
Assignee:

Methods, systems, and devices for data processing are described. In some systems, data pipelines may be implemented to handle data processing jobs. To improve data pipeline flexibility, the systems may use separate pipeline and policy declarations. For example, a pipeline server may receive both a pipeline definition defining a first set of data operations to perform and a policy definition including instructions for performing a second set of data operations, where the first set of data operations is a subset of the second set. The server may execute a data pipeline based on a trigger (e.g., a scheduled trigger, a received message, etc.). To execute the pipeline, the server may layer the policy definition into the pipeline definition when creating an execution plan. The server may execute the execution plan by performing a number of jobs using a set of resources and plugins according to the policy definition. 1. A method for data pipeline execution at a server , comprising:receiving, at the server, a pipeline definition comprising a first set of data operations to perform;receiving, at the server, a policy definition comprising instructions for performing a second set of data operations, wherein the first set of data operations is a subset of the second set of data operations;generating an execution plan based at least in part on the pipeline definition and the policy definition; andexecuting the execution plan, wherein the executing comprises performing the first set of data operations according to the instructions.2. The method of claim 1 , wherein generating the execution plan comprises:identifying a network device for performing at least one data operation of the first set of data operations based at least in part on the pipeline definition and the policy definition.3. The method of claim 2 , wherein the network device comprises a first network device and the pipeline definition indicates a second network device for performing the at least one data operation ...
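A small Python sketch of layering a policy definition into a pipeline definition to produce an execution plan, as the abstract describes; the dict-based definitions, step names, resource pools, and retry counts are all hypothetical.

pipeline_def = {"steps": ["extract", "transform", "load"]}   # first set of ops
policy_def = {                                               # second set of ops
    "extract":   {"resource": "pool-a", "retries": 3},
    "transform": {"resource": "pool-b", "retries": 1},
    "load":      {"resource": "pool-a", "retries": 5},
    "audit":     {"resource": "pool-c", "retries": 0},       # superset-only op
}

def build_execution_plan(pipeline, policy):
    # Layer the policy onto exactly the operations the pipeline declares.
    return [{"step": s, **policy[s]} for s in pipeline["steps"]]

def execute(plan):
    for job in plan:
        print(f"run {job['step']} on {job['resource']} (retries={job['retries']})")

execute(build_execution_plan(pipeline_def, policy_def))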

More details
04-02-2021 publication date

METHOD AND SYSTEM CREATING AND USING SUB-DATA CONFIDENCE FABRICS

Number: US20210034382A1
Assignee:

In general, in one aspect, the invention relates to a method for managing data. The method includes obtaining, by a local data manager, a first data confidence fabric (DCF) configuration file, where the first DCF configuration file is associated with a first DCF pipeline and a first workload. The method further includes registering the first DCF pipeline in a DCF pipeline registry, obtaining a data set, identifying the first DCF pipeline using the DCF pipeline registry, and processing the data set based on the first DCF pipeline to obtain first processed data. 1. A method for managing data , the method comprising:obtaining, by a local data manager, a first data confidence fabric (DCF) configuration file, wherein the first DCF configuration file is associated with a first DCF pipeline and a first workload;registering the first DCF pipeline in a DCF pipeline registry;obtaining a data set;identifying the first DCF pipeline using the DCF pipeline registry; andprocessing the data set based on the first DCF pipeline to obtain first processed data.2. The method of claim 1 , further comprising:obtaining, by the local data manager, a second DCF configuration file, wherein the second DCF configuration file is associated with a second DCF pipeline and a second workload;registering the second DCF pipeline in the DCF pipeline registry; andprocessing the data set based on the second DCF pipeline to obtain second processed data.3. The method of claim 2 , wherein the first processed data and the second processed data are different.4. The method of claim 2 , wherein the first processed data is not accessible by the second workload.5. The method of claim 1 , wherein the local data manager obtains the data set from a sensor.6. The method of claim 1 , further comprising:obtaining, by the local data manager, a second DCF configuration file, wherein the second DCF configuration file is associated with a second DCF pipeline and a second workload;registering the second DCF pipeline in the ...

More details
04-02-2021 publication date

METHOD AND SYSTEM OPTIMIZING THE USE OF SUB-DATA CONFIDENCE FABRICS

Number: US20210034435A1
Assignee:

In general, in one aspect, the invention relates to a method for managing data, the method includes obtaining, by a data management system, a first resource usage of a local data system associated with a first data confidence fabric (DCF) pipeline, wherein the first DCF pipeline is associated with a first workload, obtaining second resource usage of the local data system associated with a second DCF pipeline, wherein the second DCF pipeline is associated with a second workload, analyzing the first resource usage and the second resource usage to obtain a workload ranking, and performing an action set based on the workload ranking. 1. A method for managing data , the method comprising:obtaining, by a data management system, a first resource usage of a local data system associated with a first data confidence fabric (DCF) pipeline, wherein the first DCF pipeline is associated with a first workload;obtaining second resource usage of the local data system associated with a second DCF pipeline, wherein the second DCF pipeline is associated with a second workload;analyzing the first resource usage and the second resource usage to obtain a workload ranking; andperforming an action set based on the workload ranking.2. The method of claim 1 , further comprising:generating a DCF configuration file for the first DCF pipeline; anddeploying the DCF configuration file to the local data system.3. The method of claim 1 , further comprising:prior to performing the action set:validating the workload ranking using verification votes obtained from an external computing device.4. The method of claim 1 ,wherein the first workload is executing on one selected from a group consisting of a client, the local data system, and a second local data system,wherein the local data system and the second local data system are managed by the data management system.5. The method of claim 1 , wherein analyzing the first resource usage and the second resource usage comprises:determining an amount of data ...

More details
16-02-2017 publication date

PREDICTING MEMORY INSTRUCTION PUNTS IN A COMPUTER PROCESSOR USING A PUNT AVOIDANCE TABLE (PAT)

Number: US20170046167A1
Assignee:

Predicting memory instruction punts in a computer processor using a punt avoidance table (PAT) are disclosed. In one aspect, an instruction processing circuit accesses a PAT containing entries each comprising an address of a memory instruction. Upon detecting a memory instruction in an instruction stream, the instruction processing circuit determines whether the PAT contains an entry having an address of the memory instruction. If so, the instruction processing circuit prevents the detected memory instruction from taking effect before at least one pending memory instruction older than the detected memory instruction, to preempt a memory instruction punt. In some aspects, the instruction processing circuit may determine, upon execution of a pending memory instruction, whether a hazard associated with the detected memory instruction has occurred. If so, an entry for the detected memory instruction is generated in the PAT. 1. An instruction processing circuit in an out-of-order (OOO) computer processor;the instruction processing circuit communicatively coupled to a front-end circuit of an execution pipeline, and comprising a punt avoidance table (PAT) providing a plurality of entries;the instruction processing circuit configured to prevent a detected memory instruction from taking effect before at least one pending memory instruction older than the detected memory instruction to preempt a memory instruction punt, responsive to determining that an address of the detected memory instruction is present in an entry of the plurality of entries of the PAT.2. The instruction processing circuit of claim 1 , wherein the instruction processing circuit is configured to prevent the detected memory instruction from taking effect before the at least one pending memory instruction older than the detected memory instruction by being configured to perform an in-order dispatch of the at least one pending memory instruction older than the detected memory instruction.3. The instruction ...
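A minimal PAT sketch in Python: record the address of a memory instruction that caused a hazard, and the next time that address is detected, hold the instruction to in-order dispatch behind older pending memory instructions. The addresses and the external hazard detection are invented.

class PuntAvoidanceTable:
    def __init__(self):
        self.entries = set()           # addresses of punt-prone instructions

    def record_hazard(self, pc):
        self.entries.add(pc)           # generate a PAT entry on a hazard

    def should_hold(self, pc):
        return pc in self.entries

pat = PuntAvoidanceTable()

def issue(pc, older_pending):
    if pat.should_hold(pc) and older_pending:
        return f"{pc:#x}: in-order dispatch after older memory instructions"
    return f"{pc:#x}: speculative (out-of-order) dispatch"

print(issue(0x1000, older_pending=True))   # no history yet: speculate
pat.record_hazard(0x1000)                  # hazard observed at execution
print(issue(0x1000, older_pending=True))   # now held, preempting the punt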

More details
03-03-2022 publication date

SYSTEM AND METHOD OF POPULATING AN INSTRUCTION WORD

Number: US20220066982A1
Author: Danilak Radoslav
Assignee: TACHYUM LTD.

A methodology for populating an instruction word for simultaneous execution of instruction operations by a plurality of ALUs in a data path is provided. The methodology includes: creating a dependency graph of instruction nodes, each instruction node including at least one instruction operation; first selecting a first available instruction node from the dependency graph; first assigning the selected first available instruction node to the instruction word; second selecting any available dependent instruction nodes that are dependent upon a result of the selected first available instruction node and do not violate any predetermined rule; second assigning to the instruction word the selected any available dependent instruction nodes; and updating the dependency graph to remove any instruction nodes assigned during the first and second assigning from further consideration for assignment.
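A toy Python sketch of the packing loop: pick a ready node from the dependency graph, then pull in its direct dependents unless that violates the predetermined rule (here simply a 3-slot word width, an assumption). The graph and node names are invented.

deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}   # node -> inputs

def populate_word(placed, width=3):
    # First select: any unplaced node whose inputs are all already placed.
    seed = next(n for n in deps
                if n not in placed and all(d in placed for d in deps[n]))
    word = [seed]
    # Second select: dependents of the seed that do not break the rule.
    for node, inputs in deps.items():
        if node not in placed and node not in word and seed in inputs \
                and len(word) < width:
            word.append(node)
    return word

placed = set()
while len(placed) < len(deps):
    word = populate_word(placed)
    placed |= set(word)       # update the graph: drop assigned nodes
    print("instruction word:", word)
# -> ['a', 'b', 'c'] then ['d']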

More details
13-02-2020 publication date

TECHNIQUES FOR CONFIGURING A PROCESSOR TO EXECUTE INSTRUCTIONS EFFICIENTLY

Number: US20200050251A1
Assignee:

Systems and techniques for improving the performance of circuits while adapting to dynamic voltage drops caused by the execution of noisy instructions (e.g. high power consuming instructions) are provided. The performance is improved by slowing down the frequency of operation selectively for types of noisy instructions. An example technique controls a clock by detecting an instruction of a predetermined noisy type that is predicted to have a predefined noise characteristic (e.g. a high level of noise generated on the voltage rails of a circuit due to greater amount of current drawn by the instruction), and, responsive to the detecting, decreasing a frequency of the clock. The detecting occurs before execution of the instruction. The changing of the frequency in accordance with instruction type enables the circuits to be operated at high frequencies even if some of the workloads include instructions for which the frequency of operation is slowed down. 1. A method of controlling a clock of a processor, comprising: detecting an instruction that is predicted to have a predefined noise characteristic, the detecting occurring before execution of the instruction by the processor; and responsive to the detecting, changing a frequency of the clock. 2. The method according to claim 1, wherein the changing the frequency includes decreasing the frequency by an offset determined to compensate for a predicted drop in voltage corresponding to the predefined noise characteristic. 3. The method according to claim 2, wherein the method further comprises: executing the instruction at least partially while the clock operates at the decreased frequency; and increasing the frequency of the clock after the executing. 4. The method according to claim 3, further comprising subjecting the increasing to a hysteresis process. 5. The method according to claim 4, wherein the decreasing is performed without being subject to a hysteresis process. 6. The method according to claim 2, wherein the offset is ...
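A sketch of the pre-execution detector in Python: spot a noisy instruction type before it executes, drop the clock immediately, and raise it again only after a calm streak (a stand-in for the hysteresis process). The instruction types, frequency offsets, and streak length are invented.

BASE_GHZ = 3.0
NOISY_OFFSET = {"avx512-fma": 0.2, "gather": 0.1}   # type -> GHz decrease

class ClockController:
    def __init__(self):
        self.freq = BASE_GHZ
        self.calm_streak = 0

    def before_issue(self, instr_type):
        if instr_type in NOISY_OFFSET:
            self.freq = BASE_GHZ - NOISY_OFFSET[instr_type]  # no hysteresis
            self.calm_streak = 0
        else:
            self.calm_streak += 1
            if self.calm_streak >= 4:        # hysteresis before speeding up
                self.freq = BASE_GHZ
        return self.freq

clk = ClockController()
for op in ["add", "avx512-fma", "add", "add", "add", "add"]:
    print(f"{op}: {clk.before_issue(op):.1f} GHz")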

More details
26-02-2015 publication date

VERIFYING FORWARDING PATHS IN PIPELINES

Number: US20150058604A1

A tool for formally verifying forwarding paths in an instruction pipeline. The tool creates two logic design copies of the pipeline to be verified. The tool retrieves a first and a second instruction, which have previously been proven to compute a mathematically correct result when executed separately. The tool defines driver input functions for issuing instructions to the two logic design copies. In accordance with the driver input functions, the tool issues instructions to the two logic design copies. The tool abstracts data flow of the two logic design copies to isolate forwarding paths for verification. The tool adjusts for latency differences between the first and second logic design copies. The tool checks a register for results, and when results from the two logic design copies become available in the register, the tool verifies the results to conclusively prove the correctness of all states of the instruction pipeline. 1. A method for verifying forwarding paths, the method comprising: a computer system creating a first and a second logic design copy of an instruction pipeline; wherein the first logic design copy, without forwarding enabled, is driven in such a way that it executes a selected sequence of two instructions as independent instructions, and the second logic design copy, with forwarding enabled, is driven in such a way that it executes the selected sequence of two instructions as dependent instructions; the computer system retrieving a first instruction and a second instruction, wherein the instructions have been previously proven to compute a mathematically correct result when executed separately; the computer system abstracting data flow of the two logic design copies of the instruction pipeline; the computer system adjusting for latency differences between the issuing of the second instruction in the first logic design copy and the issuing of the second instruction in the second logic design copy of the instruction pipeline; and the computer system ...

More details
03-03-2016 publication date

System and method for dynamically managed task switch lookahead

Number: US20160062797A1
Assignee: FREESCALE SEMICONDUCTOR INC

A processing system includes a processor pipeline, a detector circuit, and a task scheduler. The detector circuit includes a basic block detector circuit to determine that the processor pipeline received a first instruction of a first instance of a basic block, and to determine that a last-in-order instruction of the first instance of the basic block is a resource switch instruction (RSWI), and an indicator circuit to provide an indication in response to determining that the processor pipeline received the first instruction of a second instance of the basic block. The task scheduler initiates a resource switch, in response to the indication, at a time subsequent to the first instruction being received that is based on a cycle count that indicates a first number of processor cycles between receiving the first instruction and receiving the RSWI.

More details
21-02-2019 publication date

CYCLE CONTROL CIRCUITS

Number: US20190057730A1
Author: Kim Chang Hyun
Assignee: SK HYNIX INC.

A cycle control circuit may include a judgement pulse generation circuit, a detection signal generation circuit or a flag generation circuit. The judgement pulse generation circuit may be configured to set a predetermined value based on an initialization signal and a period signal, and to generate a judgment pulse. The detection signal generation circuit may be configured to generate a detection signal from a reference flag. The flag generation circuit may be configured to generate a reference flag based on a reference signal. A cycle of the reference signal may be maintained or adjusted based on the reference flag. 1. A cycle control circuit comprising:a judgement pulse generation circuit configured to set a predetermined value in response to an initialization signal and configured to generate a judgement pulse in synchronization with a point of time that a time period corresponding to the predetermined value in units of a cycle of a clock signal elapses;a detection signal generation circuit configured to generate a detection signal from a reference flag in response to the judgement pulse; anda code generation circuit configured to generate a calibration code for controlling a cycle of a reference signal in response to the detection signal and the judgement pulse.2. The cycle control circuit of claim 1 , wherein the initialization signal is generated from an information signal during a mode register write operation.3. The cycle control circuit of claim 1 ,wherein the information signal includes mode information; andwherein the information signal is inputted to the cycle control circuit through a line that transmits an address or a command.4. The cycle control circuit of claim 1 , wherein the judgement pulse generation circuit is configured to initialize a count signal in response to a set control signal and a reset control signal which are generated from the initialization signal while a period signal is disabled and is configured to generate the judgement pulse by ...

More details
20-02-2020 publication date

SYSTEM AND METHOD FOR CREATING AND EXECUTING AN INSTRUCTION WORD FOR SIMULTANEOUS EXECUTION OF INSTRUCTION OPERATIONS

Number: US20200057639A1
Author: Danilak Radoslav
Assignee:

A methodology for creating and executing instruction words for simultaneous execution of instruction operations is provided. The methodology includes creating a dependency graph of nodes with instruction operations, the graph including at least a first node having a first instruction operation and a second node having a second instruction operation being directly dependent upon the outcome of the first instruction operation; first assigning the first instruction operation to a first instruction word; second assigning a second instruction operation: to the first instruction word upon satisfaction of a first at least one predetermined criteria; and to a second instruction word, that is scheduled to be executed during a later clock cycle than the first instruction word, upon satisfaction of a second at least one predetermined criteria; and executing, in parallel by the plurality of ALUs and during a common clock cycle, any instruction operations within the first instruction word. 1. A system for creating and executing instruction words for simultaneous execution of instruction operations, the system comprising: a plurality of Arithmetic Logic Units (ALUs) in a data path operating on a clock cycle; a non-transitory computer readable memory storing instructions: creating a dependency graph of nodes with instruction operations, the graph including at least a first node having a first instruction operation and a second node having a second instruction operation, the second instruction operation being directly dependent upon the outcome of the first instruction operation; first assigning the first instruction operation to a first instruction word; second assigning a second instruction operation: to the first instruction word upon satisfaction of a first at least one predetermined criteria; and to a second instruction word, that is scheduled to be executed during a later clock cycle than the first instruction word, upon satisfaction of a second at least one predetermined criteria; ...

More details
20-02-2020 publication date

System and method of populating an instruction word

Number: US20200057642A1
Author: Radoslav Danilak
Assignee: Tachyum Inc

A methodology for populating an instruction word for simultaneous execution of instruction operations by a plurality of ALUs in a data path is provided. The methodology includes: creating a dependency graph of instruction nodes, each instruction node including at least one instruction operation; first selecting a first available instruction node from the dependency graph; first assigning the selected first available instruction node to the instruction word; second selecting any available dependent instruction nodes that are dependent upon a result of the selected first available instruction node and do not violate any predetermined rule; second assigning to the instruction word the selected any available dependent instruction nodes; and updating the dependency graph to remove any instruction nodes assigned during the first and second assigning from further consideration for assignment.

More details
20-02-2020 publication date

SYSTEM AND METHOD FOR LOCATION AWARE PROCESSING

Number: US20200057645A1
Author: Danilak Radoslav
Assignee: TACHYUM LTD.

A methodology for preparing a series of instruction operations for execution by a plurality of arithmetic logic units (ALU) is provided. The methodology includes first assigning a first instruction operation to the first ALU; first determining, for a second instruction operation having an input that depends directly on an output of a first instruction operation, whether all inputs for the second instruction operation are available within a locally predefined range from the first ALU; second assigning, in response to at least a positive result of the first determining, the second instruction operation to the second ALU; in response to a negative result of the first determining: ensuring a pause of at least one clock cycle will occur between execution of the first instruction operation and the second instruction operation; and third assigning the second instruction operation to an ALU of the plurality of ALUs. 1. A computer hardware device having a clock speed and a clock cycle, the device comprising: a plurality of arithmetic logic units (ALU) within a data path including first, second and third ALUs, the second ALU being within a locally predefined range of the first ALU and the third ALU being outside of the locally predefined range of the first ALU, wherein the locally predefined range is smaller than the data path; first assigning a first instruction operation to the first ALU; first determining, for a second instruction operation having an input that depends directly on an output of a first instruction operation, whether all inputs for the second instruction operation are available within a locally predefined range from the first ALU; second assigning, in response to at least a positive result of the first determining, the second instruction operation to the second ALU; in response to a negative result of the first determining: ensuring a pause of at least one clock cycle will occur between execution of the first instruction operation and the second instruction operation; and third assigning the second ...

More details
20-02-2020 publication date

SYSTEM AND METHOD FOR POPULATING MULTIPLE INSTRUCTION WORDS

Number: US20200057646A1
Author: Danilak Radoslav
Assignee:

A methodology for populating multiple instruction words is provided. The methodology includes: creating a dependency graph of instruction nodes, each instruction node including at least one instruction operation; first assigning a first instruction node to a first instruction word; identifying a dependent instruction node that is directly dependent upon a result of the first instruction node; first determining whether the dependent instruction node requires any input from two or more sources that are outside of a predefined physical range of each other, the range being smaller than the full extent of the data path; and second assigning, in response to satisfaction of at least one predetermined criteria including a negative result of the first determining, the dependent instruction node to the first instruction word. 1. A system for populating multiple instruction words for instruction operations, the system comprising: a plurality of Arithmetic Logic Units (ALUs) in a data path operating on a clock cycle; a non-transitory computer readable memory storing instructions: creating a dependency graph of instruction nodes, each instruction node including at least one instruction operation; first assigning a first instruction node to a first instruction word; identifying a dependent instruction node that is directly dependent upon a result of the first instruction node; first determining whether the dependent instruction node requires any input from two or more sources that are outside of a predefined physical range of each other, the range being smaller than the full extent of the data path; second assigning, in response to satisfaction of at least one predetermined criteria including a negative result of the first determining, the dependent instruction node to the first instruction word; and third assigning, in response to a negative result of the first determining and violation of any of the at least one predetermined criteria, the dependent instruction ...

More details
17-03-2022 publication date

Multiplier-Accumulator Circuitry having Processing Pipelines and Methods of Operating Same

Number: US20220083342A1
Assignee: Flex Logix Technologies, Inc.

An integrated circuit including memory to store image data and filter weights, and a plurality of multiply-accumulator execution pipelines, each multiply-accumulator execution pipeline coupled to the memory to receive (i) image data and (ii) filter weights, wherein each multiply-accumulator execution pipeline processes the image data, using associated filter weights, via a plurality of multiply and accumulate operations. In one embodiment, the multiply-accumulator circuitry of each multiply-accumulator execution pipeline, in operation, receives a different set of image data, each set including a plurality of image data, and, using filter weights associated with the received set of image data, processes the set of image data associated therewith, via performing a plurality of multiply and accumulate operations concurrently with the multiply-accumulator circuitry of the other multiply-accumulator execution pipelines, to generate output data. Each set of image data includes all of the image data that correlates to the output data generated therefrom. 1.-20. (canceled) 21. An integrated circuit comprising: memory to store image data and filter weights; first conversion circuitry, coupled to the memory, to (i) receive image data having a non-Winograd format, (ii) convert the image data to image data having a Winograd format, and (iii) output a plurality of sets of image data having the Winograd format, wherein each set of image data having the Winograd format includes a plurality of image data having the Winograd format; and a plurality of multiplier-accumulator circuits, connected in series to form a pipeline and configured to receive the plurality of sets of image data, having the Winograd format, output by the first conversion circuitry, wherein the plurality of multiplier-accumulator circuits process the sets of image data, having the Winograd format, using filter weights, via a plurality of concatenated multiply and accumulate operations, to generate output data, having ...

More
18-03-2021 publication date

Data Selection for a Processor Pipeline Using Multiple Supply Lines

Number: US20210081205A1
Author: Nield Simon, Rose Thomas
Assignee:

A method for a plurality of pipelines, each having a processing element with first and second inputs and first and second supply lines, wherein at least one of the pipelines includes first and second logic operable to select a respective line so that data is received at the first and second inputs respectively. In a first mode, the first and second lines of the at least one pipeline are selected such that the processing element of that pipeline receives data via the first and second lines of that pipeline, the first line being capable of supplying data that is different from the second line. In a second mode, a line of another pipeline is selected for the at least one pipeline, the second line of the at least one pipeline is selected, and the same data is supplied at the second line as at the first line.

1. A processor comprising: a plurality of processing pipelines, wherein each pipeline of the plurality of processing pipelines comprises: a processing element having a first input and a second input; a first supply line and a second supply line; a first multiplexer operable to select a supply line so that data from a selected supply line is received at the first input via the first multiplexer; and a second multiplexer operable to select a supply line so that data from a selected supply line is received at the second input via the second multiplexer; wherein, in a first mode of operation and for a first pipeline of the plurality of processing pipelines, the first multiplexer of the first pipeline is configured to select a supply line of another one of the plurality of pipelines, the second multiplexer of the first pipeline is configured to select the second supply line of the first pipeline, and the same data is provided on both the first and second supply lines of the first pipeline. 2. The processor as set forth in claim 1, ...
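A behavioural sketch of the two modes, assuming each pipeline carries a pair of named supply lines and that the "other pipeline" chosen in the second mode is simply the next neighbour; both are assumptions, not the patent's routing.

```python
# Minimal sketch (assumptions, not the patent's RTL): two operating modes
# for a pipeline's input multiplexers. In mode 1 each input takes its own
# supply line; in mode 2 the first mux borrows another pipeline's line and
# the second mux re-selects the local line carrying duplicated data.
def select_inputs(mode, pipe, pipes):
    """Return (first_input, second_input) for pipeline index `pipe`."""
    own_a, own_b = pipes[pipe]["line_a"], pipes[pipe]["line_b"]
    if mode == 1:
        return own_a, own_b                 # independent data per input
    other = pipes[(pipe + 1) % len(pipes)]  # assumed neighbour choice
    return other["line_a"], own_b           # own lines carry the same data

pipes = [
    {"line_a": "A0", "line_b": "B0"},
    {"line_a": "A1", "line_b": "B1"},
]
print(select_inputs(1, 0, pipes))  # ('A0', 'B0')
print(select_inputs(2, 0, pipes))  # ('A1', 'B0')
```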

More
12-06-2014 publication date

Frequency And Voltage Scaling Architecture

Number: US20140164802A1
Assignee: Individual

A method and apparatus for scaling frequency and operating voltage of at least one clock domain of a microprocessor. More particularly, embodiments of the invention relate to techniques to divide a microprocessor into clock domains and control the frequency and operating voltage of each clock domain independently of the others.

More
31-03-2022 publication date

APPARATUS AND METHOD FOR LOW-LATENCY DECOMPRESSION ACCELERATION VIA A SINGLE JOB DESCRIPTOR

Number: US20220100526A1
Assignee:

Apparatus and method for performing low-latency multi-job submission via a single job descriptor is described herein. An apparatus embodiment includes a plurality of descriptor queues to store job descriptors describing work to be performed and enqueue circuitry to receive a first job descriptor which includes a first field to store a Single Instruction Multiple Data (SIMD) width. If the SIMD width indicates that the first job descriptor is an SIMD job descriptor and open slots are available in the descriptor queues to store new job descriptors, then the enqueue circuitry generates a plurality of job descriptors based on fields of the first job descriptor and stores them in the open slots of the descriptor queues. The generated job descriptors are processed by processing pipelines to perform the work described. At least some of the generated job descriptors are processed concurrently or in parallel by different processing pipelines.

1. An apparatus comprising: a plurality of job descriptor queues to store job descriptors describing work to be performed; enqueue circuitry to receive a first job descriptor, the first job descriptor comprising a plurality of fields including a first field to store a Single Instruction Multiple Data (SIMD) width, wherein when the SIMD width indicates that the first job descriptor is an SIMD job descriptor and the SIMD width is less than or equal to a number of open job descriptor slots in the plurality of job descriptor queues, the enqueue circuitry is to generate a plurality of job descriptors based on fields of the first job descriptor and store the plurality of job descriptors in the open job descriptor slots of the plurality of job descriptor queues; and one or more processing pipelines to process job descriptors stored in the plurality of job descriptor queues to perform the work described, wherein at least some of the plurality of job descriptors generated from the first job descriptor are processed in parallel by the ...
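The expansion step is easy to capture in a few lines. The sketch below fans one SIMD descriptor out into per-lane jobs when enough slots are open; the descriptor fields and the striding rule are assumptions chosen for the example.

```python
# Minimal sketch (field names assumed): expanding one SIMD job descriptor
# into N per-lane descriptors when enough queue slots are open, mirroring
# the single-descriptor multi-job submission idea.
from dataclasses import dataclass, replace

@dataclass
class Descriptor:
    src: int         # base source address
    size: int        # bytes per lane
    simd_width: int  # 1 = ordinary descriptor, >1 = SIMD descriptor

def enqueue(desc, queue, capacity):
    """Expand a SIMD descriptor into per-lane jobs if slots are available."""
    open_slots = capacity - len(queue)
    if desc.simd_width <= 1:
        queue.append(desc)
        return True
    if desc.simd_width > open_slots:
        return False  # not enough open slots; caller may retry later
    for lane in range(desc.simd_width):
        # each generated job covers a disjoint stride of the input
        queue.append(replace(desc, src=desc.src + lane * desc.size,
                             simd_width=1))
    return True

q = []
print(enqueue(Descriptor(src=0x1000, size=4096, simd_width=4), q, capacity=8))
print([hex(d.src) for d in q])  # four jobs, processable in parallel
```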

More
25-03-2021 publication date

DISTRIBUTION OF TASKS AMONG ASYMMETRIC PROCESSING ELEMENTS

Number: US20210089113A1
Assignee: Intel Corporation

Techniques to control power and processing among a plurality of asymmetric cores. In one embodiment, one or more asymmetric cores are power managed to migrate processes or threads among a plurality of cores according to the performance and power needs of the system.

1. An apparatus comprising: a plurality of functional units integrated on a microprocessor chip, the functional units including: a memory controller, a graphics processor, an input-output (I/O) interface, and a digital signal processor (DSP); a processor integrated on the multiprocessor chip, the processor comprising a plurality of cores including a first processing core, a second processing core, and a third processing core, all having a same instruction set architecture (ISA), wherein the first and second processing cores are able to operate at a higher performance level and power consumption level than the third processing core; monitoring circuitry including counters to count values reflecting activity levels associated with the first, second, and third processing cores; and a power manager coupled to the plurality of cores, the power manager to adjust a clock frequency and voltage of one or more of the first, second, and third processing cores in accordance with a migration of a task between the first or second processing cores and the third processing core, the migration of the task based, at least in part, on the activity levels. 2. The apparatus of claim 1, further comprising a scheduler to schedule the migration of the task based, at least in part, on the activity levels. 3. The apparatus of claim 1, wherein a core of the plurality of cores is to execute scheduling software to schedule the migration of the task based, at least in part, on the activity levels. 4. The apparatus of claim ... or claim ..., further comprising: an interrupt controller coupled to the plurality of cores to process at least one interrupt related to the migration of the task from one core to another core of the first, ...

More
25-03-2021 publication date

Programmable instruction buffering

Number: US20210089323A1
Assignee: ARM LTD

A processing system 2 includes a processing pipeline 12, 14, 16, 18, 28 which includes fetch circuitry 12 for fetching instructions to be executed from a memory 6, 8. Buffer control circuitry 34 is responsive to a programmable trigger, such as explicit hint instructions delimiting an instruction burst, or predetermined configuration data specifying parameters of a burst together with a synchronising instruction, to trigger the buffer control circuitry to stall a stallable portion of the processing pipeline (e.g. issue circuitry 16), to accumulate within one or more buffers 30, 32 fetched instructions starting from a predetermined starting instruction, and, when those instructions have been accumulated, to restart the stallable portion of the pipeline.

More
25-03-2021 publication date

CONTROLLING THE OPERATING SPEED OF STAGES OF AN ASYNCHRONOUS PIPELINE

Number: US20210089324A1
Assignee:

An asynchronous pipeline includes a first stage and one or more second stages. A controller provides control signals to the first stage to indicate a modification to an operating speed of the first stage. The modification is determined based on a comparison of a completion status of the first stage to one or more completion statuses of the one or more second stages. In some cases, the controller provides control signals indicating modifications to an operating voltage applied to the first stage and a drive strength of a buffer in the first stage. Modules can be used to determine the completion statuses of the first stage and the one or more second stages based on the monitored output signals generated by the stages, output signals from replica critical paths associated with the stages, or a lookup table that indicates estimated completion times.

1-20. (canceled) 21. An apparatus comprising: an asynchronous pipeline comprising a first stage and at least one second stage; and a controller to provide control signals to the first stage to indicate a modification of an operating voltage applied to the first stage, wherein the modification of the operating voltage is determined based on a comparison of completion statuses of the first stage and the at least one second stage. 22. The apparatus of claim 21, wherein the at least one second stage comprises at least one of a left-hand stage that generates input data for the first stage and a right-hand stage that receives output data generated by the first stage. 23. The apparatus of claim 21, wherein the modification of the operating voltage modifies an operating speed of the first stage. 24. The apparatus of claim 21, further comprising: at least one buffer to drive signals between portions of the first stage, and wherein the controller is to provide control signals to indicate at least one modification of at least one drive strength of the at least one buffer, wherein the at least one modification of the at least one drive ...
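A toy version of the controller's decision, assuming completion status is a measured busy-time fraction and that voltage moves in fixed steps with a slack band; the policy, step size, and slack value are invented for illustration.

```python
# Minimal sketch (hypothetical policy): nudge a stage's operating voltage
# up or down by comparing its completion status with its neighbours',
# echoing the controller/stage split described above.
def completion_status(stage):
    """Fraction of the allotted time slot the stage actually needed."""
    return stage["busy_time"] / stage["slot_time"]

def adjust_voltage(first, neighbours, step=0.05, slack=0.1):
    """Return a new voltage for `first` based on relative completion."""
    mine = completion_status(first)
    others = max(completion_status(s) for s in neighbours)
    if mine > others + slack:   # first stage is the laggard: speed it up
        return round(first["vdd"] + step, 3)
    if mine < others - slack:   # first stage has slack: save power
        return round(first["vdd"] - step, 3)
    return first["vdd"]

stage = {"busy_time": 9.0, "slot_time": 10.0, "vdd": 0.80}
left  = {"busy_time": 6.0, "slot_time": 10.0, "vdd": 0.80}
right = {"busy_time": 5.5, "slot_time": 10.0, "vdd": 0.80}
print(adjust_voltage(stage, [left, right]))  # 0.85 -> stage sped up
```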

More
29-03-2018 publication date

Pipelined Configurable Processor

Number: US20180089140A1
Author: Metzgen Paul
Assignee:

A configurable processing circuit capable of handling multiple threads simultaneously, the circuit comprising a thread data store, a plurality of configurable execution units, a configurable routing network for connecting locations in the thread data store to the execution units, a configuration data store for storing configuration instances that each define a configuration of the routing network and a configuration of one or more of the plurality of execution units, and a pipeline formed from the execution units, the routing network and the thread data store that comprises a plurality of pipeline sections configured such that each thread propagates from one pipeline section to the next at each clock cycle, the circuit being configured to: (i) associate each thread with a configuration instance; and (ii) configure each of the plurality of pipeline sections for each clock cycle to be in accordance with the configuration instance associated with the respective thread that will propagate through that pipeline section during the clock cycle.

1-38. (canceled) 39. A configurable processing circuit capable of handling multiple threads simultaneously, the circuit comprising: a thread data store; a plurality of configurable execution units; a configurable routing network for connecting the thread data store to the execution units; a configuration data store for storing configuration instances that each define a configuration of the routing network and a configuration of one or more of the plurality of execution units; and a pipeline formed from the execution units, the routing network and the thread data store that comprises a plurality of pipeline sections configured such that each thread propagates from one pipeline section to the next at each clock cycle; the circuit being configured to: (i) associate each thread with a configuration instance; and (ii) configure each of the plurality of pipeline sections at each clock cycle to be in accordance with the configuration instance ...

More
02-04-2015 publication date

APPARATUS AND METHOD FOR EFFICIENT MIGRATION OF ARCHITECTURAL STATE BETWEEN PROCESSOR CORES

Number: US20150095614A1
Assignee:

An apparatus and method are described for the efficient migration of architectural state between processor cores. For example, a processor according to one embodiment comprises: a first processing core having a first instruction execution pipeline including a first register set for storing a first architectural state of a first thread being executed thereon; a second processing core having a second instruction execution pipeline including a second register set for storing a second architectural state of a second thread being executed thereon; and architectural state migration logic to perform a direct, simultaneous swap of the first architectural state from the first register set with the second architectural state from the second register set responsive to detecting that the execution of the first thread is to be migrated from the first core to the second core.

1. A processor, comprising: a first processing core having a first instruction execution pipeline including a first register set for storing a first architectural state of a first thread being executed thereon; a second processing core having a second instruction execution pipeline including a second register set for storing a second architectural state of a second thread being executed thereon; and architectural state migration logic to perform a direct swap of the first architectural state from the first register set with the second architectural state from the second register set responsive to detecting that the execution of the first thread is to be migrated from the first core to the second core. 2. The processor as in claim ..., wherein the direct swap is performed by swapping the architectural state from one register at a time from the first register set and the second register set. 3. The processor as in claim ..., wherein the direct swap is performed by swapping the architectural state from a block of registers at a time from the first register set and the second register set. 4. The processor as in claim ..., wherein the direct swap is ...

More
30-03-2017 publication date

External intrinsic interface

Number: US20170090943A1
Assignee: SAMSUNG ELECTRONICS CO LTD

An external intrinsic interface. A processor may include a core including a plurality of functional units, an intrinsic module located outside the core, and an interface module to perform relaying between the intrinsic module and a functional unit, among the plurality of functional units.

More
01-04-2021 publication date

METHOD AND APPARATUS FOR MINIMALLY INTRUSIVE INSTRUCTION POINTER-AWARE PROCESSING RESOURCE ACTIVITY PROFILING

Number: US20210096855A1
Assignee: Intel Corporation

Systems and methods for minimally intrusive instruction pointer-aware processing resource activity profiling are disclosed. In one embodiment, a graphics processor includes a grouping of processing resources and control logic that is associated with the grouping of processing resources. The control logic is configured to sample a state of at least one processing resource of the grouping of processing resources and to determine activity data from the state, the activity data including at least one of stalls and reason counts for stalling activity, instruction types, pipeline utilization, thread utilization, and shader activity.

1. A graphics processor, comprising: a grouping of processing resources; and control logic that is associated with the grouping of processing resources, the control logic is configured to sample a state of at least one processing resource of the grouping of processing resources and to determine activity data from the state with the activity data including at least one of stalls and reason counts for stalling activity, instruction types, pipeline utilization, thread utilization, or shader activity. 2. The graphics processor of claim 1, further comprising: a cache unit that is associated with the grouping of processing resources, the cache unit to receive an instruction pointer address and the activity data including a stall reason for each state of processing resources that are associated with the cache unit. 3. The graphics processor of claim 2, wherein each sampling of a state is scheduled for a chosen clock cycle and is minimally intrusive. 4. The graphics processor of claim 1, wherein the control logic is configured to store a state when threads are allocated on a processing resource with no instruction being executed for a chosen cycle that is sampled. 5. The graphics processor of claim 4, wherein the control logic is configured to discard a state for a chosen cycle that is sampled if the processing resource is idle or executing an ...

More
01-04-2021 publication date

COLLAPSING BUBBLES IN A PROCESSING UNIT PIPELINE

Number: US20210096877A1
Assignee:

An arithmetic logic unit (ALU) pipeline of a processing unit collapses execution bubbles in response to a stall at a stage of the ALU pipeline. An execution bubble occurs at the pipeline in response to an invalid instruction being placed in the pipeline for execution. The invalid instruction thus consumes an available "slot" in the pipeline, and proceeds through the pipeline until a stall in a subsequent stage (that is, a stage after the stage executing the invalid instruction) is detected. In response to detecting the stall, the ALU continues to execute instructions that are behind the invalid instruction in the pipeline, thereby collapsing the execution bubble and conserving resources of the ALU.

1. A method comprising: identifying a first execution bubble at a first stage of an arithmetic logic unit (ALU) and a first stall condition at a second stage of the ALU; and in response to identifying the first execution bubble and the first stall condition, collapsing the first execution bubble. 2. The method of claim 1, wherein collapsing the first execution bubble comprises executing a first instruction at a third stage of the ALU during the first stall condition. 3. The method of claim 2, wherein collapsing the first execution bubble comprises executing a second instruction at a fourth stage of the ALU during the first stall condition. 4. The method of claim 2, wherein collapsing the first execution bubble comprises issuing a second instruction to the ALU during the first stall condition. 5. The method of claim 2, further comprising: stalling the third stage of the ALU in response to collapsing the first execution bubble and in response to determining the first stall condition persists at the second stage of the ALU. 6. The method of claim 1, wherein identifying the first execution bubble comprises identifying the first execution bubble in response to identifying an invalid instruction executing at the ALU. 7. The ...
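A behavioural model makes the collapse visible: younger valid instructions advance into the bubble slot while the stalled stage holds still. The stage count, the single-cycle full compaction, and the op names are assumptions of this sketch, not the patented circuit.

```python
# Minimal sketch (behavioural, not RTL): a 4-stage ALU pipeline that, on a
# stall in a late stage, keeps advancing younger instructions into the slot
# occupied by an invalid instruction ("bubble") instead of stalling them.
def tick(stages, stalled_stage):
    """Advance one clock. `stages[i]` holds an op name or None (bubble).
    Stages at or after `stalled_stage` hold; earlier ops fill bubbles."""
    new = stages[:]
    for i in range(stalled_stage, 0, -1):        # walk back from the stall
        if new[i] is None and new[i - 1] is not None:
            new[i], new[i - 1] = new[i - 1], None  # collapse the bubble
    return new

# stage 3 is stalled; stage 2 holds a bubble; i1/i2 are younger, valid ops
pipe = ["i2", "i1", None, "stall-op"]
print(tick(pipe, stalled_stage=2))  # [None, 'i2', 'i1', 'stall-op']
```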

More
05-04-2018 publication date

CLOCK-GATING FOR MULTICYCLE INSTRUCTIONS

Number: US20180095767A1
Assignee:

A system and a method of clock-gating for multicycle instructions are provided. For example, the method includes enabling a plurality of logic blocks that include a subset of multicycle (MC) logic blocks and a subset of pipeline logic blocks. The method also includes computing a precise enable computation value after a plurality of cycles of executing an instruction, and disabling one or more of the subset of multicycle (MC) logic blocks based on the precise enable computation value. Also, at least the subset of pipeline logic blocks needed to compute the instruction remains on.

1. A method of clock-gating for multicycle instructions, the method comprising: enabling a plurality of logic blocks that include a subset of multicycle (MC) logic blocks and a subset of pipeline logic blocks, wherein the enabling is based on a combination of a valid signal and an MC running signal; computing a precise enable computation value in a pipeline domain after a plurality of cycles of executing an instruction; determining that no instructions correspond to the subset of MC logic blocks; disabling one or more of the subset of MC logic blocks based on the precise enable computation value, wherein the precise enable computation value includes a decoded instruction, identification of a type, and a location of the decoded instruction, and wherein at least the subset of pipeline logic blocks needed to compute the instruction remain on; and holding the plurality of logic blocks enabled for the plurality of cycles needed to compute the precise enable computation value using at least a control latch and an OR gate, wherein the OR gate receives both the valid bit and an MC_running signal from the control latch, wherein the OR gate and the pipeline clock domain simultaneously receive the valid bit, and wherein the control latch receives the precise enable computation value. 2. The method of claim 1, further comprising: computing an imprecise enable computation value before execution of the instruction ...
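The enable logic reduces to a small truth function: keep MC blocks on while `valid OR MC_running` holds and the precise computation is pending, then gate off everything the decoded instruction does not need. The block names and the exact gating policy below are assumptions for illustration.

```python
# Minimal sketch (assumed gating policy): keep multicycle (MC) blocks
# enabled via `valid OR mc_running` until the precise enable computation
# arrives, then gate off MC blocks the decoded instruction doesn't need.
def clock_enables(valid, mc_running, precise, mc_blocks):
    """Return the set of enabled blocks for this cycle."""
    enabled = {"pipeline"}                  # pipeline blocks stay on
    if valid or mc_running:                 # imprecise startup window
        if precise is None:
            enabled |= set(mc_blocks)       # can't tell yet: all MC on
        else:
            enabled |= set(mc_blocks) & {precise}  # only the block needed
    return enabled

MC = ["divider", "sqrt", "crypto"]
print(clock_enables(valid=True,  mc_running=False, precise=None,      mc_blocks=MC))
print(clock_enables(valid=False, mc_running=True,  precise="divider", mc_blocks=MC))
print(clock_enables(valid=False, mc_running=False, precise=None,      mc_blocks=MC))
```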

More
05-04-2018 publication date

CLOCK-GATING FOR MULTICYCLE INSTRUCTIONS

Number: US20180095768A1
Assignee:

A system and a method of clock-gating for multicycle instructions are provided. For example, the method includes enabling a plurality of logic blocks that include a subset of multicycle (MC) logic blocks and a subset of pipeline logic blocks. The method also includes computing a precise enable computation value after a plurality of cycles of executing an instruction, and disabling one or more of the subset of multicycle (MC) logic blocks based on the precise enable computation value. Also, at least the subset of pipeline logic blocks needed to compute the instruction remains on.

1. A method of clock-gating for multicycle instructions, the method comprising: enabling a plurality of logic blocks that include a subset of multicycle (MC) logic blocks and a subset of pipeline logic blocks, wherein the enabling is based on a combination of a valid signal and an MC running signal; computing a precise enable computation value in a pipeline domain after a plurality of cycles of executing an instruction; determining that no instructions correspond to the subset of MC logic blocks; disabling one or more of the subset of MC logic blocks based on the precise enable computation value, wherein at least the subset of pipeline logic blocks needed to compute the instruction remain on; and holding the plurality of logic blocks enabled for the plurality of cycles needed to compute the precise enable computation value. 2. The method of claim 1, further comprising: computing an imprecise enable computation value before execution of the instruction begins; and enabling an imprecise startup subset of logic blocks from the plurality of logic blocks based on the imprecise enable computation value, wherein the imprecise startup subset includes one or more of the MC logic blocks and one or more of the pipeline logic blocks. 3. The method of claim 1, further comprising: grouping the subset of pipeline logic blocks from the plurality of logic blocks into a pipeline clock domain; and grouping the subset ...

More
12-05-2022 publication date

Small branch predictor escape

Number: US20220147360A1
Author: Thomas C. McDonald
Assignee: Centaur Technology Inc

In one embodiment, a branch prediction control system is configured to move a mispredicted conditional branch from a smaller cache side that uses a lower-complexity conditional branch predictor to one of two larger cache sides that use higher-complexity conditional branch predictors. The move (write) is performed according to a configurable probability, allowing the branch to escape recurring mispredictions and reducing the number of mispredictions for the given branch instruction.
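The escape rule itself is a one-line probabilistic migration. In the sketch below the cache sides are plain Python sets, and the escape probability, branch address, and side-selection rule are illustrative assumptions rather than the product's values.

```python
# Minimal sketch (parameters assumed): on a misprediction in the small,
# low-complexity predictor side, move the branch to one of the two larger
# sides with a configurable probability, to escape recurring mispredicts.
import random

def on_mispredict(branch, small_side, large_sides, escape_prob=0.25, rng=random):
    """Probabilistically migrate `branch` out of the small cache side."""
    if branch in small_side and rng.random() < escape_prob:
        small_side.remove(branch)
        large_sides[rng.randrange(len(large_sides))].add(branch)

rng = random.Random(42)            # deterministic for the example
small, large = {0x400123}, [set(), set()]
for _ in range(8):                 # repeated mispredictions of one branch
    on_mispredict(0x400123, small, large, rng=rng)
print(small, large)                # the branch eventually escapes the small side
```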

More
12-04-2018 publication date

Execution of Data-Parallel Programs on Coarse-Grained Reconfigurable Architecture Hardware

Number: US20180101387A1
Assignee:

A GPGPU-compatible architecture combines a coarse-grain reconfigurable fabric (CGRF) with a dynamic dataflow execution model to accelerate execution throughput of massively thread-parallel code. The CGRF distributes computation across a fabric of functional units. The compute operations are statically mapped to functional units, and an interconnect is configured to transfer values between functional units.

1. A method of computing, comprising the steps of: providing a coarse grain fabric of processing units having direct interconnects therebetween; representing a series of computing operations to be executed in the fabric as a control data flow graph having code paths; routing the direct interconnects by enabling and disabling selected ones of the direct interconnects to match the processing units to the code paths of the control data flow graph; and executing the computing operations in the fabric in multiple simultaneous threads by transferring values among the processing units via the direct interconnects, wherein the routing of the direct interconnects is static during execution of the computing operations. 2. The method according to claim 1, wherein the threads comprise instructions to be executed in individual processing units, and executing the computing operations comprises dynamically scheduling the instructions of at least a portion of the threads. 3. The method according to claim 2, further comprising grouping the threads into epochs, wherein dynamically scheduling comprises deferring execution in one of the processing units of a current instruction of one of the epochs until execution in the one processing unit of all preceding instructions belonging to other epochs has completed. 4. The method according to claim 2, further comprising the steps of: making a determination that in the control data flow graph one of the code paths is longer than another code path; and delaying the computing operations in processing units that are matched with the other ...

More
12-04-2018 publication date

MULTIPLE DIES HARDWARE PROCESSORS AND METHODS

Number: US20180101502A1
Assignee:

Methods and apparatuses relating to hardware processors with multiple interconnected dies are described. In one embodiment, a hardware processor includes a plurality of physically separate dies, and an interconnect to electrically couple the plurality of physically separate dies together. In another embodiment, a method to create a hardware processor includes providing a plurality of physically separate dies, and electrically coupling the plurality of physically separate dies together with an interconnect.

1. A hardware processor comprising: a plurality of physically separate dies; an interconnect to electrically couple the plurality of physically separate dies together; a first transmitter circuit of a first die of the plurality of physically separate dies; a second receiver circuit of a second die of the plurality of physically separate dies electrically coupled to the first transmitter circuit of the first die through at least one data lane of the interconnect that corresponds to a clock lane of the interconnect; and a clock circuit to receive a request from the first transmitter circuit to change the second receiver circuit to an operating frequency and a clocking rate for the operating frequency, cause a look-up in a data storage device of a predetermined clock phase placement for the operating frequency and the clocking rate for the operating frequency from a plurality of predetermined clock phase placements for a first clocking rate for each single frequency of different operating frequencies and for a second, different clocking rate for each single frequency of different operating frequencies, and cause the second receiver circuit to receive data from the first transmitter circuit on the at least one data lane with the predetermined clock phase placement for the operating frequency and the clocking rate for the operating frequency on the clock lane. 2. The hardware processor of claim 1, wherein both a leading-edge placement and a trailing-edge placement of a ...

More
04-04-2019 publication date

PROCESSORS AND METHODS FOR CONFIGURABLE CLOCK GATING IN A SPATIAL ARRAY

Number: US20190101952A1
Assignee:

Methods and apparatuses relating to configurable clock gating in spatial arrays are described. In one embodiment, a processor includes processing elements; an interconnect network between the processing elements; and a configuration controller, coupled to a first processing element and a second processing element of the plurality of processing elements, the first processing element having an output coupled to an input of the second processing element, to configure the second processing element to clock gate at least one clocked component of the second processing element, and to configure the first processing element to send a reenable signal on the interconnect network to the second processing element to reenable the at least one clocked component of the second processing element when data is to be sent from the first processing element to the second processing element.

1. A processor comprising: a plurality of processing elements; an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the interconnect network and the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements; and a configuration controller coupled to the plurality of processing elements to configure the plurality of processing elements according to configuration information for the dataflow graph, and clock gate at least one clocked component of a processing element based on the configuration information. 2. The processor of claim 1, wherein the at least one clocked component is an input buffer of multiple parallel input buffers within the processing element. 3. The processor of claim 1, wherein the at least one clocked component is ...

More
09-06-2022 publication date

Data processing system and method for reading instruction data of instruction from memory

Number: US20220179562A1
Assignee: Winbond Electronics Corp

In the disclosure, a data processing system includes a microprocessor and a memory. The integrity of data read from the memory by the microprocessor may be checked. When an instruction address is transmitted from the microprocessor to the memory for reading the instruction data corresponding to the instruction address, predetermined dummy data is also read from the memory while the instruction data is read. The integrity of the instruction data may be checked by comparing the predetermined dummy data to hardwired data that is not stored in the memory. If the dummy data matches the hardwired data, the instruction data read from the memory is determined to be correct.
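The check amounts to comparing an extra word returned with each fetch against a constant the memory cannot supply from its array. In the sketch below the hardwired constant, the memory layout, and the return convention are all invented for illustration.

```python
# Minimal sketch (constants invented): alongside each instruction fetch,
# read a dummy word and compare it against a hardwired value that never
# resides in the memory array; a mismatch flags a corrupted read path.
HARDWIRED = 0xA5A5A5A5           # assumed hardwired reference in the core

def fetch(memory, addr):
    """Return (instruction, ok): ok is False if the dummy word is wrong."""
    instruction = memory["code"][addr]
    dummy = memory["dummy"]      # returned by the device with each read
    return instruction, dummy == HARDWIRED

mem_good = {"code": {0: 0xE3A00001}, "dummy": 0xA5A5A5A5}
mem_bad  = {"code": {0: 0xE3A00001}, "dummy": 0xFFFFFFFF}  # glitched read
print(fetch(mem_good, 0))   # (instruction, True)  -> trust the data
print(fetch(mem_bad, 0))    # (instruction, False) -> reject / re-fetch
```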

More
10-05-2018 publication date

Performing Local Power Gating In A Processor

Number: US20180129265A1
Assignee: Intel Corp

In an embodiment, the present invention includes an execution unit to execute instructions of a first type, a local power gate circuit coupled to the execution unit to power gate the execution unit while a second execution unit is to execute instructions of a second type, and a controller coupled to the local power gate circuit to cause it to power gate the execution unit when an instruction stream does not include the first type of instructions. Other embodiments are described and claimed.

More
10-05-2018 publication date

Performing Local Power Gating In A Processor

Number: US20180129266A1
Assignee: Intel Corp

In an embodiment, the present invention includes an execution unit to execute instructions of a first type, a local power gate circuit coupled to the execution unit to power gate the execution unit while a second execution unit is to execute instructions of a second type, and a controller coupled to the local power gate circuit to cause it to power gate the execution unit when an instruction stream does not include the first type of instructions. Other embodiments are described and claimed.

More
11-05-2017 publication date

CACHE SYSTEM AND METHOD

Number: US20170132140A1
Author: Lin Kenneth Chenghao
Assignee:

The present invention provides a cache method and a cache system. The cache method includes the following steps. An instruction issuing is scheduled based on program flow information stored in a cache system. The program flow information includes instruction sequence information and instruction distance information. A time point for the instruction issuing is determined based on the instruction sequence information and the instruction distance information.

1. A cache method, comprising: scheduling an instruction issuing based on program flow information stored in a cache system, wherein the program flow information includes instruction sequence information and instruction distance information; and determining a time point for the instruction issuing based on the instruction sequence information and the instruction distance information. 2. The cache method of claim 1, wherein: a portion or all of the program flow information is included in instructions stored in an instruction memory; or the program flow information extracted from instructions is stored in a program flow information memory. 3. The cache method of claim 2, wherein: the instruction memory is an instruction cache or an instruction read buffer; and the program flow information memory is a track table or a track read buffer. 4. The cache method of claim 3, wherein: a processor system includes a main pipeline and a plurality of early pipelines; and instructions are issued in advance to early pipelines based on the instruction sequence information stored in the instruction memory or the program flow information memory and the instruction distance. 5. The cache method of claim 4, wherein: instructions that require more execution cycles are issued in advance based on an instruction type read in advance from the instruction memory or the program flow information memory. 6. The cache method of claim 5, wherein: instructions are divided into at least two types including type "1" instructions that ...

More
28-05-2015 publication date

Arithmetic processing device, arithmetic processing system, and method for controlling arithmetic processing device

Number: US20150149724A1
Assignee: Fujitsu Ltd

An arithmetic processing device includes arithmetic cores, wherein each arithmetic core comprises: an instruction controller configured to request processing corresponding to an instruction; a memory configured to store lock information indicating that a locking target address is locked, the locking target address, and priority information of the instruction; and a cache controller configured to, when storing data of a first address in a cache memory to execute a first instruction including locking of the first address from the instruction controller, suppress updating of the memory if the lock information is stored in the memory and a priority of the priority information is higher than a first priority of the first instruction.

More
30-04-2020 publication date

READ QUALITY OF SERVICE FOR NON-VOLATILE MEMORY

Number: US20200133839A1
Assignee:

A method and apparatus to reduce read latency and improve read quality of service (Read QoS) for non-volatile memory, such as a NAND array in a NAND device. For read commands that collide with an in-progress program array operation targeting the same program locations in a NAND array, the in-progress program is suspended and the controller allows the read command to read from the internal NAND buffer instead of waiting for the in-progress program to complete. For read commands queued during an in-progress program that is processing pre-reads in preparation for a program array operation, pre-read bypass allows the reads to be serviced between the pre-reads and before the program's array operation starts. In this manner, read commands can be serviced without suspending the in-progress program. Allowing internal NAND buffer reads and enabling pre-read bypass reduces read latency and improves Read QoS.

1. A memory controller comprising: a command queue to queue commands issued to pages of block-addressable non-volatile memory; and command logic to reduce a read starvation of a read command in the command queue during a program operation on a page of the block-addressable non-volatile memory, the read starvation a result of any of: the read command in collision with the program operation on a same page of the block-addressable non-volatile memory, and the read command queued during pre-reads for the program operation. 2. The memory controller of claim 1, wherein to reduce the read starvation of the read command in collision with the program operation on the same page of the non-volatile memory, the command logic is to: suspend the program operation; cause the read command to read from an internal buffer of page data instead of the same page of the block-addressable non-volatile memory; and resume the program operation upon completion of the read from the internal buffer of page data; wherein the same page includes a page in a same linked page group of the block- ...
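A toy controller model shows both relief paths: colliding reads are served from the page buffer latched for programming, and queued reads are drained in the pre-read window before the array operation. The class shape, page numbering, and return values are assumptions of this sketch.

```python
# Minimal sketch (queue discipline assumed): two ways the controller avoids
# read starvation during an in-progress program: (a) serve a colliding read
# from the internal page buffer; (b) let queued reads bypass between the
# program's pre-reads and its array operation.
class Nand:
    def __init__(self):
        self.page_buffer = {}           # page -> data latched for programming

    def program(self, page, data, reads):
        self.page_buffer[page] = data   # data sits in the internal buffer
        served = []
        for r in list(reads):           # (b) pre-read bypass window
            served.append(self.read(r, reads))
        # ... the program's array operation would start here ...
        return served

    def read(self, page, reads):
        reads.remove(page)
        if page in self.page_buffer:    # (a) collision: serve from buffer
            return (page, self.page_buffer[page], "buffer")
        return (page, "<array data>", "array")

nand = Nand()
queued_reads = [7, 3]                   # page 7 collides with the program
print(nand.program(7, "new-data", queued_reads))
```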

More
10-06-2021 publication date

PROCESSOR FOR NEURAL NETWORK OPERATION

Number: US20210173648A1
Assignee: National Tsing Hua University

A processor adapted for neural network operation is provided to include a scratchpad memory, a processor core, a neural network accelerator coupled to the processor core, and an arbitration unit coupled to the scratchpad memory, the processor core and the neural network accelerator. The processor core and the neural network accelerator share the scratchpad memory via the arbitration unit.

1. A processor adapted for neural network operation, comprising: a scratchpad memory that is configured to store to-be-processed data and multiple kernel maps of a neural network model, and that has a memory interface; a processor core that is configured to issue core-side read/write instructions that conform with said memory interface to access said scratchpad memory; a neural network accelerator that is electrically coupled to said processor core and said scratchpad memory, and that is configured to issue accelerator-side read/write instructions that conform with said memory interface to access said scratchpad memory for acquiring the to-be-processed data and the kernel maps from said scratchpad memory so as to perform a neural network operation on the to-be-processed data based on the kernel maps; and an arbitration unit that is electrically coupled to said processor core, said neural network accelerator and said scratchpad memory to permit one of said processor core and said neural network accelerator to access said scratchpad memory. 2. The processor of claim 1, wherein the neural network model is a convolutional neural network (CNN) model, and said neural network accelerator includes an operation circuit electrically coupled to said scratchpad memory; a partial-sum memory electrically coupled to said operation circuit; and a scheduler electrically coupled to said processor core, said scratchpad memory and said partial-sum memory; ...
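The shared-scratchpad arrangement can be pictured as a single-grant arbiter in front of the memory. The priority rule (accelerator wins ties) and the request tuple format below are illustrative assumptions; the patent does not state its arbitration policy here.

```python
# Minimal sketch (priority policy assumed): a single-port scratchpad shared
# by the processor core and the neural-network accelerator through an
# arbitration unit that grants one requester per cycle.
class Arbiter:
    def __init__(self, scratchpad):
        self.mem = scratchpad

    def access(self, requests):
        """`requests`: list of (who, op, addr, value). Grant one; defer rest."""
        if not requests:
            return None, []
        # assumed policy: accelerator wins ties so the NN pipeline never stalls
        requests.sort(key=lambda r: 0 if r[0] == "accelerator" else 1)
        who, op, addr, value = requests[0]
        if op == "write":
            self.mem[addr] = value
        return (who, op, addr, self.mem.get(addr)), requests[1:]

arb = Arbiter(scratchpad={})
granted, deferred = arb.access([("core", "write", 0, 42),
                                ("accelerator", "read", 0, None)])
print(granted, deferred)  # accelerator granted; core retries next cycle
```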

More
17-06-2021 publication date

MARKER-BASED PROCESSOR INSTRUCTION GROUPING

Number: US20210182072A1
Assignee:

A system includes a processing unit such as a GPU that itself includes a command processor configured to receive instructions for execution from a software application. A processor pipeline coupled to the processing unit includes a set of parallel processing units for executing the instructions in sets. A set manager is coupled to one or more of the processor pipeline and the command processor. The set manager includes at least one table for storing a set start time, a set end time, and a set execution time. The set manager determines an execution time for one or more sets of instructions of a first window of sets of instructions submitted to the processor pipeline. Based on the execution time of the one or more sets of instructions, a set limit is determined and applied to one or more sets of instructions of a second window subsequent to the first window.

1. A processor comprising: a processor pipeline having a set of parallel processing units for executing instructions; and a set manager coupled to the processor pipeline, wherein the set manager is configured to: receive one or more sets of instructions; characterize at least one of the received sets of instructions; subsequent to the characterization, receive a plurality of sets of instructions for execution, wherein the plurality of sets of instructions includes at least one previously characterized set of instructions; and provision the plurality of sets of instructions to the processor pipeline in a subsequent window of time based on the at least one previously characterized set of instructions. 2. The processor of claim 1, further comprising: a command processor configured to receive the instructions for execution by the processor pipeline; and wherein the set manager is configured to characterize the at least one of the received sets of instructions by identifying a set start time, a set execution time, and a set end time for the at least one of the received sets of instructions. 3. The processor of claim 1, ...

More
17-06-2021 publication date

MEMORY INTERFACE MANAGEMENT

Number: US20210182201A1
Assignee:

A method includes receiving a signal at a memory sub-system controller to perform an operation. The method can further include, in response to receiving the signal, enabling, by the memory sub-system controller, an interface to transfer data to or from a registering clock driver (RCD) component. The RCD component is coupled to the memory sub-system controller. The method can further include transferring the data to or from the RCD component via the interface. The method can further include, in response to the enablement of the interface being unsuccessful, transferring control of a memory device to the memory sub-system controller.

1. A method, comprising: receiving a signal at a memory sub-system controller to perform an operation; in response to receiving the signal, enabling, by the memory sub-system controller, an interface to transfer data to or from a registering clock driver (RCD) component, wherein the RCD component is coupled to the memory sub-system controller; transferring the data to or from the RCD component via the interface; and in response to the enablement of the interface being unsuccessful, transferring control of a memory device to the memory sub-system controller. 2. The method of claim 1, further comprising, in response to the enablement of the interface being successful, maintaining control of the memory device by a device that previously had control of the memory device. 3. The method of claim 2, wherein the device that maintains control is a host coupled to the memory device. 4. The method of claim 1, wherein transferring control of the memory device comprises transferring control of a dynamic random access memory (DRAM) device to the memory sub-system controller. 5. The method of claim 1, further comprising enabling the interface without modifying a Basic Input/Output System (BIOS) associated with a host that is in communication with the memory sub-system controller. 6. The method of claim 1, further comprising sending a ...

More
24-06-2021 publication date

Resource Management Unit for Capturing Operating System Configuration States and Offloading Tasks

Number: US20210191775A1
Assignee: Google LLC

This disclosure describes methods, devices, systems, and procedures in a computing system for capturing a configuration state of an operating system executing on a central processing unit (CPU), and offloading resource-related tasks, based on the configuration state, to a resource management unit such as a system-on-chip (SoC). The resource management unit identifies a status of each resource based on the captured configuration state of the operating system. The resource management unit then processes tasks associated with the status of the resources, such as modifying a clock rate of a clocked component in the computing system. This can alleviate the CPU from processing those tasks, thereby improving overall computing system performance and dynamics.

1. A method of managing a computing system, the method comprising: capturing, by a resource management unit and into a first memory, a configuration state of an operating system in a second memory, the operating system executing on a processor of the computing system; identifying, by the resource management unit, a status of a resource in the computing system based on the configuration state of the operating system; and processing, by the resource management unit, a task associated with the status of the resource, the processing by the resource management unit effective to alleviate the processor from processing the task. 2. The method of claim 1, wherein the resource management unit communicates with at least one of the processor or the second memory for obtaining the configuration state of the operating system using a low-latency communication data link or a high-speed interface bus, or by being integrated with a same integrated circuit die as the processor. 3. The method of claim 1, wherein processing the task associated with the status of the resource comprises managing the resource to improve: a performance of the computing system; a security aspect of the computing system; or a processing of an overhead ...

More
16-06-2016 publication date

Power saving mechanism to reduce load replays in out-of-order processor

Number: US20160170758A1
Assignee: VIA Alliance Semiconductor Co Ltd

An apparatus includes a first reservation station and a second reservation station. The first reservation station dispatches a first load micro instruction, and detects and indicates on a hold bus if the first load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the first load micro instruction for execution after a first number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the first load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the first load micro instruction has retrieved the operand.
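The dispatch rule reduces to a simple condition per dependent micro-op: use the fixed forward latency for ordinary loads, but wait for the operand itself when the hold bus marks the load as a slow off-core access. The latency value and field names in the sketch are assumptions.

```python
# Minimal sketch (timing model assumed): a second reservation station delays
# dependents of a load by a fixed dispatch-to-forward latency, but if the
# hold bus flags the load as a specified (off-core) access, it stalls them
# until the operand actually returns.
FORWARD_LATENCY = 4   # assumed clocks from load dispatch to safe dispatch

def dispatch_ready(dep, now, load):
    """True if dependent micro-op `dep` of `load` may dispatch at `now`."""
    if load["hold_bus"]:                 # specified load: not on-core cache
        return load["operand_ready"]     # wait for the data itself
    return now >= load["dispatch_time"] + FORWARD_LATENCY

fast = {"dispatch_time": 10, "hold_bus": False, "operand_ready": False}
slow = {"dispatch_time": 10, "hold_bus": True,  "operand_ready": False}
print(dispatch_ready("add", now=14, load=fast))  # True: latency elapsed
print(dispatch_ready("add", now=30, load=slow))  # False: still stalled
slow["operand_ready"] = True
print(dispatch_ready("add", now=31, load=slow))  # True: operand retrieved
```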

More
14-06-2018 publication date

PIPE LATCH CIRCUIT AND DATA OUTPUT CIRCUIT INCLUDING THE SAME

Number: US20180165098A1
Author: HONG Yun-Gi
Assignee:

A pipe latch circuit includes: a pipe latch control block suitable for controlling a plurality of pipe input signals and a plurality of pipe output signals to be activated sequentially, or to be divided into at least two groups and activated sequentially by group, depending on a latency setting value, and outputting at least one pipe input signal and at least one pipe output signal; and a pipe latch block coupled between an input node and an output node, and suitable for storing data of the input node in response to the pipe input signal and outputting stored data to the output node in response to the pipe output signal.

1. A pipe latch circuit comprising: a pipe latch control block suitable for controlling a plurality of pipe input signals and a plurality of pipe output signals to be activated sequentially, or to be divided into at least two groups and activated sequentially by group, depending on a latency setting value, and outputting at least one pipe input signal and at least one pipe output signal; and a pipe latch block coupled between an input node and an output node, and suitable for storing data of the input node in response to the pipe input signal and outputting stored data to the output node in response to the pipe output signal. 2. The pipe latch circuit according to claim 1, wherein the pipe latch control block comprises: a first pipe latch control block suitable for controlling the pipe input signals to be activated sequentially, or to be divided into at least two groups and sequentially activated by each group, in response to a pipe input clock signal; and a second pipe latch control block suitable for controlling the pipe output signals to be activated sequentially, or to be divided into the at least two groups and sequentially activated by the each group, in response to a pipe output clock signal. 3. The pipe latch circuit according to claim 2, wherein the first pipe latch control block comprises: a plurality of shift registers suitable for ...

More
21-05-2020 publication date

Execution of Data-Parallel Programs on Coarse-Grained Reconfigurable Architecture Hardware

Number: US20200159539A1
Assignee:

A GPGPU-compatible architecture combines a coarse-grain reconfigurable fabric (CGRF) with a dynamic dataflow execution model to accelerate execution throughput of massively thread-parallel code. The CGRF distributes computation across a fabric of functional units. The compute operations are statically mapped to functional units, and an interconnect is configured to transfer values between functional units.

1. A method of computing, comprising the steps of: for a coarse-grain fabric of processing units having interconnects therebetween, receiving a representation of a series of computing operations to be processed in the fabric as a control data flow graph having code paths, the computing operations comprising instructions to be executed in the fabric; configuring the fabric by enabling and disabling selected ones of the interconnects to match the processing units to the code paths of the control data flow graph; and while a configuration of the fabric remains unchanged, executing the instructions of the computing operations in the fabric in a pipelined sequence of simultaneous threads. 2. The method according to claim 1, wherein configuring the fabric further comprises configuring one or more of the processing units in accordance with the control data flow graph. 3. The method according to claim 1, wherein the threads comprise instructions to be executed in individual processing units, and wherein processing the computing operations comprises dynamically scheduling the instructions of at least a portion of the threads. 4. The method according to claim 3, further comprising grouping the threads into epochs, wherein dynamically scheduling the instructions comprises deferring execution in one of the processing units of a current instruction of one of the epochs until execution in the one processing unit of all preceding instructions belonging to other epochs has completed. 5. The method according to claim 3, further comprising the steps of: making a determination ...

More
01-07-2021 publication date

Apparatus and method for power virus protection in a processor

Number: US20210200860A1
Assignee: Intel Corp

An apparatus and method for intelligent power virus protection in a processor. For example, one embodiment of a processor comprises: first circuitry including an instruction fetch circuit to fetch instructions, each instruction comprising an instruction type and an associated width comprising a number of bits associated with source and/or destination operand values associated with the instruction; detection circuitry to detect one or more instructions of a particular type and/or width; evaluation circuitry to evaluate an impact of power virus protection (PVP) circuitry when executing the one or more instructions based on the detected instruction types and/or widths; and control circuitry, based on the evaluation, to configure the PVP circuitry in accordance with the evaluation performed by the evaluation circuitry.

More
22-06-2017 publication date

Reconfigurable interconnected programmable processors

Number: US20170177517A1
Assignee: Wave Computing Inc

A plurality of software programmable processors is disclosed. The software programmable processors are controlled by rotating circular buffers. A first processor and a second processor within the plurality of software programmable processors are individually programmable. The first processor within the plurality of software programmable processors is coupled to neighbor processors within the plurality of software programmable processors. The first processor sends and receives data from the neighbor processors. The first processor and the second processor are configured to operate on a common instruction cycle. An output of the first processor from a first instruction cycle is an input to the second processor on a subsequent instruction cycle.

More
29-06-2017 publication date

BIOS REAL-TIME CLOCK UPDATE

Number: US20170185101A1
Assignee:

An example system includes a basic input/output system (BIOS) and a battery having a fuel gauge timer. The BIOS is associated with a real-time clock, and the BIOS uses timer information from the fuel gauge timer to update the real-time clock.

1. A system, comprising: a basic input/output system (BIOS), the BIOS being associated with a real-time clock; and a battery having a timer, wherein the BIOS is to utilize timer information from the timer to update the real-time clock. 2. The system of claim 1, wherein the real-time clock is updated using the timer information from the timer upon boot up of the BIOS. 3. The system of claim 2, wherein the real-time clock is without power before boot up of the BIOS. 4. The system of claim 1, wherein the timer is part of a fuel gauge of the battery. 5. The system of claim 1, wherein the timer information from the timer includes at least the calendar date and the current time. 6. A method, comprising: booting up a basic input/output system (BIOS); obtaining timer information from a timer of a battery; and updating a real-time clock associated with the BIOS using the timer information from the timer of the battery. 7. The method of claim 6, wherein the real-time clock is without power before boot up of the BIOS. 8. The method of claim 6, wherein the battery provides power to the BIOS and the real-time clock. 9. The method of claim 6, wherein the timer is part of a fuel gauge of the battery. 10. The method of claim 6, wherein the timer information from the timer includes at least the calendar date and the current time. 11. An apparatus, comprising: a processor; and a memory device including computer program code, the memory device and the computer program code, with the processor, to cause the apparatus to: obtain timer information from a timer of a fuel gauge of a battery; and update a real-time clock associated with a basic input/output system (BIOS) using the timer information from the timer of the fuel gauge of the battery. 12. The ...

More
29-06-2017 publication date

Processor Instructions to Accelerate FEC Encoding and Decoding

Number: US20170185399A1
Assignee: Coherent Logix Inc

Various embodiments are described of a system for improved processor instructions for a software-configurable processing element. In particular, various embodiments are described which accelerate functions useful for FEC encoding and decoding. In particular, the processing element may be configured to implement one or more instances of the relevant functions in response to receiving one of the processor instructions. The processing element may later be reconfigured to implement a different function in response to receiving a different one of the processor instructions. Each of the disclosed processor instructions may be implemented repeatedly by the processing element to repeatedly perform one or more instances of the relevant functions with a throughput approaching one or more solutions per clock cycle.

More
04-06-2020 publication date

FASTER SPARSE FLUSH RECOVERY

Number: US20200174796A1
Assignee:

Systems, apparatuses, and methods for performing efficient processor pipeline flush recovery are disclosed. A processor core includes a retire queue for storing information of outstanding instructions. When the retire queue logic detects that a pipeline flush condition occurs, the logic creates one or more groups of entries in the retire queue. The logic begins the groups with an entry storing information for the youngest outstanding instruction, and creates the other groups in a contiguous manner after creating this first group. The logic marks with a first indication a given group when the given group includes one or more instructions of a given type. The logic marks with a second indication the given group when the given group does not include an instruction of the given type. The logic sends to flush recovery logic information of one or more entries in only the groups marked with the first indication.

1. An apparatus comprising: a queue comprising a plurality of entries, each configured to store information for an instruction; and logic, wherein in response to detecting a first pipeline flush condition, the logic is configured to: create one or more groups of entries in the queue beginning with a first entry storing information for a youngest instruction and ending with a second entry storing information for an instruction that caused the first pipeline flush condition; identify each group of the one or more groups that includes at least one instruction of a given type with a first indication; identify each group of the one or more groups that does not include an instruction of the given type with a second indication different from the first indication; convey, to flush recovery logic, instruction identifiers stored in entries storing information for instructions of the given type in each group marked with the first indication; and prevent sending, to the flush recovery logic, instruction identifiers stored in entries in each group marked with the second indication.
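The grouping and filtering can be modelled directly on a list. Below, the queue layout, group size, and the tracked instruction type are assumptions; the sketch only shows how marked groups gate what reaches recovery.

```python
# Minimal sketch (data layout assumed): group retire-queue entries from the
# youngest instruction back to the flushing one, mark groups that contain
# entries of a tracked type, and hand only those groups' IDs to recovery.
def flush_recovery_ids(entries, flush_idx, group_size, tracked_type):
    """`entries`: oldest-first list of (inst_id, inst_type).
    Returns IDs of tracked-type entries in marked groups only."""
    window = entries[flush_idx:][::-1]          # youngest .. flushing inst
    ids = []
    for start in range(0, len(window), group_size):
        group = window[start:start + group_size]
        if any(t == tracked_type for _, t in group):   # first indication
            ids.extend(i for i, t in group if t == tracked_type)
        # groups carrying the second indication are skipped entirely
    return ids

q = [(1, "alu"), (2, "branch"), (3, "alu"), (4, "alu"),
     (5, "branch"), (6, "alu"), (7, "alu"), (8, "alu")]
print(flush_recovery_ids(q, flush_idx=1, group_size=2, tracked_type="branch"))
# [5, 2] -- only entries from groups marked with the first indication
```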

More
15-07-2021 publication date

Vector maximum and minimum with indexing

Number: US20210216313A1
Assignee: Texas Instruments Inc

A method to compare first and second source data in a processor in response to a vector maximum with indexing instruction includes specifying first and second source registers containing first and second source data, a destination register storing compared data, and a predicate register. Each of the registers includes a plurality of lanes. The method includes executing the instruction by, for each lane in the first and second source register, comparing a value in the lane of the first source register to a value in the corresponding lane of the second source register to identify a maximum value, storing the maximum value in a corresponding lane of the destination register, asserting a corresponding lane of the predicate register if the maximum value is from the first source register, and de-asserting the corresponding lane of the predicate register if the maximum value is from the second source register.
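
A lane-by-lane Python sketch of the vector-maximum-with-indexing semantics follows; how ties are resolved (to the first source here) is an assumption, since the abstract does not say.

def vmax_with_indexing(src1, src2):
    """Lane-wise maximum of two source vectors plus a predicate recording
    which source each maximum came from (ties resolved to src1: assumed)."""
    dst, pred = [], []
    for a, b in zip(src1, src2):
        if a >= b:
            dst.append(a); pred.append(1)   # asserted: max taken from src1
        else:
            dst.append(b); pred.append(0)   # de-asserted: max taken from src2
    return dst, pred

print(vmax_with_indexing([3, 7, 2], [5, 1, 2]))   # ([5, 7, 2], [0, 1, 1])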

More
11-06-2020 publication date

ARITHMETIC PROCESSING APPARATUS AND MEMORY APPARATUS

Number: US20200183702A1
Author: Takagi Noriko
Assignee: FUJITSU LIMITED

An arithmetic processing apparatus includes an arithmetic circuit configured to perform an arithmetic operation on data having a first data width and perform an instruction in parallel on each element of data having a second data width, and a cache memory configured to store data, wherein the cache memory includes a tag circuit storing tags for respective ways, a data circuit storing data for the respective ways, a determination circuit that determines a type of an instruction with respect to whether data accessed by the instruction has the first data width or the second data width, and a control circuit that performs either a first pipeline operation where the tag circuit and the data circuit are accessed in parallel or a second pipeline operation where the data circuit is accessed in accordance with a tag result after accessing the tag circuit, based on a result determined by the determination circuit.
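
A minimal Python sketch of the two pipeline operations is given below; the cache shape and, in particular, which data width selects which pipeline are assumptions, since the abstract leaves that mapping to the determination circuit.

class Way:
    def __init__(self, tags, data):
        self.tags, self.data = tags, data

def cache_lookup(ways, index, tag, wide_access):
    """Two pipeline operations: tag+data in parallel for one width class,
    tag-then-data for the other (the width-to-pipeline mapping is assumed)."""
    if not wide_access:
        # first pipeline: read every way's data while comparing tags
        reads = [(w.tags[index], w.data[index]) for w in ways]
        for t, d in reads:
            if t == tag:
                return d
    else:
        # second pipeline: compare tags first, then read only the hitting way
        for w in ways:
            if w.tags[index] == tag:
                return w.data[index]
    return None   # miss

ways = [Way(tags=[0xA], data=["a0"]), Way(tags=[0xB], data=["b0"])]
print(cache_lookup(ways, 0, 0xB, wide_access=True))   # b0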

More
23-07-2015 publication date

Clock routing techniques

Number: US20150205324A1
Assignee: Apple Inc

Techniques are disclosed relating to clock routing techniques in processors with both pipelined and non-pipelined circuitry. In some embodiments, an apparatus includes execution units that are non-pipelined and configured to perform instructions without receiving a clock signal. In these embodiments, one or more clock lines routed throughout the apparatus do not extend into the one or more execution units in each pipeline, reducing the length of the clock lines. In some embodiments, the apparatus includes multiple such pipelines arranged in an array, with the execution units located on an outer portion of the array and clocked control circuitry located on an inner portion of the array. In some embodiments, clock lines do not extend into the outer portion of the array. In some embodiments, the array includes one or more rows of execution units. These arrangements may further reduce the length of clock lines.

More
18-06-2020 publication date

ACCELERATING MEMORY ACCESS IN A NETWORK USING THREAD PROGRESS BASED ARBITRATION

Number: US20200192720A1
Assignee:

A method accelerates memory access in a network using thread progress based arbitration. A memory controller identifies a prioritized thread from multiple threads in an application. The prioritized thread reaches a synchronization barrier after the other threads due to the thread encountering more events than the other threads before reaching the barrier, where the events are from a group consisting of instruction executions, cache misses, and load/store operations in a core. The memory controller detects a cache miss by the prioritized thread during execution of the prioritized thread after the barrier is reached by the multiple threads. The memory controller then retrieves and returns data from the memory that cures the cache miss for the prioritized thread before retrieving data that cures cache misses for the other threads by applying thread progress based arbitration in the network.
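
Roughly, the arbitration could look like the following Python sketch; the equal weighting of the three event kinds and the request encoding are assumptions.

def prioritized_thread(threads):
    """Thread progress = executed instructions + cache misses + load/stores;
    the thread with the most events reaches the barrier last and is prioritized."""
    return max(threads, key=lambda t: t["instructions"] + t["misses"] + t["ldst"])

def arbitrate(pending, prio):
    """Return pending cache-miss requests with the prioritized thread served first."""
    return sorted(pending, key=lambda req: req["thread"] != prio["name"])

threads = [{"name": "t0", "instructions": 90, "misses": 4, "ldst": 30},
           {"name": "t1", "instructions": 120, "misses": 9, "ldst": 55}]
prio = prioritized_thread(threads)
pending = [{"thread": "t0", "addr": 0x100}, {"thread": "t1", "addr": 0x200}]
print([r["thread"] for r in arbitrate(pending, prio)])   # ['t1', 't0']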

More
27-06-2019 publication date

Method and System for Detection of Thread Stall

Number: US20190196816A1
Assignee:

A method of checking for a stall condition in a processor is disclosed, the method including inserting an inline instruction sequence into a thread, the inline instruction sequence configured to read the result from a timing register during processing of a first instruction and store the result in a first general purpose register, wherein the timing register functions as a timer for the processor; and read the results from the timing register during processing of a second instruction and store the results in a second general purpose register, wherein the second instruction is the next consecutive instruction after the first instruction. The inline instruction sequence may be inserted in sequence with the thread and further configured to compare the difference between the results in the first and second general purpose registers to a programmable threshold.
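
A minimal Python sketch of the inserted sequence, with an incrementing counter standing in for the processor's timebase register (both the counter and the threshold value are assumptions):

import itertools

# Stand-in for the processor's timebase register (assumed: +1 per read here).
_timebase = itertools.count()

def read_timebase() -> int:
    return next(_timebase)

STALL_THRESHOLD = 100   # programmable threshold, in timebase ticks (assumed value)

def inline_check() -> bool:
    """The inserted sequence: read the timer around two consecutive
    instructions and compare the difference against the threshold."""
    r1 = read_timebase()        # during the first instruction
    r2 = read_timebase()        # during the next consecutive instruction
    return (r2 - r1) > STALL_THRESHOLD

print(inline_check())   # False: consecutive reads are one tick apart here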

More
25-06-2020 publication date

HANDLING EXCEPTIONS IN A MULTI-TILE PROCESSING ARRANGEMENT

Number: US20200201652A1
Assignee: Graphcore Limited

A multitile processing system has an execution unit on each tile, and an interconnect which conducts communications between the tiles according to a bulk synchronous parallel scheme. Each tile performs an on-tile compute phase followed by an intertile exchange phase, where the exchange phase is held back until all tiles in a particular group have completed the compute phase. On completion of the compute phase, each tile generates a synchronisation request and pauses issue of instructions until it receives a synchronisation acknowledgement. If a tile attains an excepted state, it raises an exception signal and pauses instruction issue until the excepted state has been resolved. However, tiles which are not in the excepted state can continue to perform their on-tile compute phase, and will issue their own synchronisation request in their own normal time frame. Synchronisation acknowledgements will not be received by all of the tiles in the group until the excepted state has been resolved on the tile with the excepted state.
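
The synchronisation hold-back can be sketched as follows; the Tile class and the controller function are illustrative stand-ins for the hardware described.

class Tile:
    def __init__(self, tid):
        self.tid = tid
        self.sync_requested = False
        self.excepted = False

    def finish_compute(self):
        if self.excepted:
            return                      # issue paused until the exception resolves
        self.sync_requested = True      # request sync, pause instruction issue

def sync_controller(tiles) -> bool:
    """Acknowledge (releasing the exchange phase) only once every tile in the
    group has requested synchronisation."""
    return all(t.sync_requested for t in tiles)

tiles = [Tile(0), Tile(1)]
tiles[1].excepted = True
for t in tiles:
    t.finish_compute()
print(sync_controller(tiles))   # False: held back by the excepted tile
tiles[1].excepted = False       # excepted state resolved on tile 1
tiles[1].finish_compute()
print(sync_controller(tiles))   # True: acknowledgements can now be sent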

More
13-08-2015 publication date

THREAD ISSUE CONTROL

Number: US20150227376A1
Assignee:

A data processing system includes a processing pipeline for the parallel execution of a plurality of threads. An issue controller issues threads to the processing pipeline. A stall manager controls the stalling and unstalling of threads when a cache miss occurs within a cache memory. The issue controller issues the threads to the processing pipeline in accordance with both a main sequence and a pilot sequence. The pilot sequence is followed such that threads within the pilot sequence are issued at least a given time ahead of their neighbours within the main sequence. The given time corresponds approximately to the latency associated with a cache miss. The threads may be arranged in groups corresponding to blocks of pixels for processing within a graphics processing unit.
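
A small Python sketch of the two interleaved issue sequences; the pilot stride and the lead of three slots (standing in for the cache-miss latency) are assumed values.

def issue_order(threads, stride, lead):
    """Every `stride`-th thread forms the pilot sequence and issues `lead`
    slots (roughly the cache-miss latency) ahead of the main sequence."""
    pilot = threads[::stride]
    main = [t for t in threads if t not in set(pilot)]
    slots = {}
    for i, t in enumerate(pilot):
        slots.setdefault(i, []).append(t)            # pilot runs ahead
    for i, t in enumerate(main):
        slots.setdefault(i + lead, []).append(t)     # main trails by `lead`
    return [slots[k] for k in sorted(slots)]

# Pilot threads touch memory early so their neighbours hit warm cache lines.
print(issue_order(list(range(8)), stride=4, lead=3))
# [[0], [4], [1], [2], [3], [5], [6], [7]]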

More
18-08-2016 publication date

LOAD QUEUE ENTRY REUSE FOR OPERAND STORE COMPARE HISTORY TABLE UPDATE

Number: US20160239307A1
Assignee:

Embodiments relate to load queue entry reuse for operand store compare (OSC) history table update. An aspect includes allocating a load queue entry in a load queue to a load instruction that is issued into an instruction pipeline, the load queue entry comprising a valid tag that is set and a keep tag that is unset. Another aspect includes, based on the flushing of the load instruction, unsetting the valid tag and setting the keep tag. Another aspect includes reissuing the load instruction into the instruction pipeline. Another aspect includes, based on determining that the allocated load queue entry corresponds to the reissued load instruction, setting the valid tag and leaving the keep tag set. Another aspect includes, based on completing the reissued load instruction, and based on the valid tag and the keep tag being set, updating the OSC history table corresponding to the load instruction.
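
The valid/keep tag protocol reduces to a small state machine, sketched here in Python; the dictionary standing in for the OSC history table is an assumption.

class LoadQueueEntry:
    """Valid/keep tag protocol for reusing an entry across a flush and reissue."""
    def __init__(self, instr_tag):
        self.instr_tag = instr_tag
        self.valid = True      # set on allocation
        self.keep = False      # unset on allocation

    def on_flush(self):
        self.valid = False
        self.keep = True       # retain the entry for the reissued load

    def on_reissue(self, instr_tag):
        if self.instr_tag == instr_tag:   # entry matches the reissued load
            self.valid = True             # keep is left set

    def on_complete(self, osc_history):
        if self.valid and self.keep:
            osc_history[self.instr_tag] = "OSC info"   # update the history table

osc_history = {}
entry = LoadQueueEntry(instr_tag=42)
entry.on_flush(); entry.on_reissue(42); entry.on_complete(osc_history)
print(osc_history)   # {42: 'OSC info'}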

More
18-08-2016 publication date

LOAD QUEUE ENTRY REUSE FOR OPERAND STORE COMPARE HISTORY TABLE UPDATE

Number: US20160239308A1
Assignee:

Embodiments relate to load queue entry reuse for operand store compare (OSC) history table update. An aspect includes allocating a load queue entry in a load queue to a load instruction that is issued into an instruction pipeline, the load queue entry comprising a valid tag that is set and a keep tag that is unset. Another aspect includes, based on the flushing of the load instruction, unsetting the valid tag and setting the keep tag. Another aspect includes reissuing the load instruction into the instruction pipeline. Another aspect includes, based on determining that the allocated load queue entry corresponds to the reissued load instruction, setting the valid tag and leaving the keep tag set. Another aspect includes, based on completing the reissued load instruction, and based on the valid tag and the keep tag being set, updating the OSC history table corresponding to the load instruction.

More
30-10-2014 publication date

Memristor based multithreading

Number: US20140325192A1

A method and a device are provided, the device including a set of multiple pipeline stages, wherein the set of multiple pipeline stages is arranged to execute a first thread of instructions; multiple memristor based registers that are arranged to store a state of another thread of instructions that differs from the first thread of instructions; and a control circuit that is arranged to control a thread switch between the first thread of instructions and the other thread of instructions by controlling a storage of a state of the first thread of instructions at the multiple memristor based registers and by controlling a provision of the state of the other thread of instructions to the set of multiple pipeline stages; wherein the set of multiple pipeline stages is arranged to execute the other thread of instructions upon a reception of the state of the other thread of instructions.
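
A minimal sketch of the thread switch, with a plain Python object standing in for the non-volatile memristor register bank (an assumption, purely for illustration):

class MemristorRegisters:
    """Non-volatile (memristor-based) register bank holding a thread's state."""
    def __init__(self):
        self.saved = None
    def store(self, state):
        self.saved = dict(state)
    def load(self):
        return dict(self.saved)

def thread_switch(pipeline_state, bank):
    """Swap the running thread's state with the one held in the memristor bank."""
    incoming = bank.load()
    bank.store(pipeline_state)     # current thread's state parked in memristors
    return incoming                # pipeline resumes the other thread

bank = MemristorRegisters()
bank.store({"pc": 0x40, "regs": [7]})                    # state of the other thread
print(thread_switch({"pc": 0x10, "regs": [1]}, bank))    # {'pc': 64, 'regs': [7]}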

More
25-07-2019 publication date

Hardware Unit for Performing Matrix Multiplication with Clock Gating

Number: US20190227807A1
Assignee:

Hardware units and methods for performing matrix multiplication via a multi-stage pipeline wherein the storage elements associated with one or more stages of the pipeline are clock gated based on the data elements, and/or portions thereof, that are known to have a zero value (or can be treated as having a zero value). In some cases, the storage elements may be clock gated on a per data element basis based on whether the data element has a zero value (or can be treated as having a zero value). In other cases, the storage elements may be clock gated on a partial element basis based on the bit width of the data elements. For example, if the bit width of the data elements is less than a maximum bit width for the data elements, then a portion of the bits related to that data element can be treated as having a zero value and a portion of the storage elements associated with that data element may not be clocked. In yet other cases the storage elements may be clock gated on both a per element and a partial element basis.
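
Reading the gating decision as a per-bit clock-enable vector gives a compact sketch; the 16-bit maximum width is an assumed value.

MAX_BITS = 16   # maximum bit width of a data element (assumed)

def clock_enables(element: int, used_bits: int):
    """Per-bit clock enables for the storage element holding `element`.
    Zero elements are not clocked at all; elements narrower than MAX_BITS
    leave their upper storage bits unclocked."""
    if element == 0:
        return [False] * MAX_BITS                      # per-element gating
    return [b < used_bits for b in range(MAX_BITS)]    # partial-element gating

print(sum(clock_enables(0, 8)))      # 0  -> no bits clocked for a zero element
print(sum(clock_enables(200, 8)))    # 8  -> only the low 8 bits are clocked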

More
16-07-2020 publication date

METHOD AND SYSTEM FOR DETECTION OF THREAD STALL

Number: US20200225952A1
Assignee:

A method of checking for a stall condition in a processor is disclosed, the method including inserting an inline instruction sequence into a thread, the inline instruction sequence configured to read the result from a timing register during processing of a first instruction and store the result in a first general purpose register, wherein the timing register functions as a timer for the processor; and read the results from the timing register during processing of a second instruction and store the results in a second general purpose register, wherein the second instruction is the next consecutive instruction after the first instruction. The inline instruction sequence may be inserted in sequence with the thread and further configured to compare the difference between the results in the first and second general purpose registers to a programmable threshold.

More
26-08-2021 publication date

Disaggregated die with input/output (i/o) tiles

Number: US20210263880A1
Assignee: Intel Corp

Embodiments may relate to a die with a processor. The die may include a first input/output (I/O) tile adjacent to a first side of the processor, and a second I/O tile adjacent to a second side of the processor. The first or second I/O tiles may be communicatively coupled with the processor. Other embodiments may be described or claimed.

More
23-08-2018 publication date

Super-Thread Processor

Number: US20180239608A1
Author: Halle Kevin Sean
Assignee:

The disclosed inventions include a processor apparatus and method that enable a general purpose processor to achieve twice the operating frequency of typical processor implementations with a modest increase in area and a modest increase in energy per operation. The invention relies upon exploiting multiple independent streams of execution. Low area and low energy memory arrays used for register files operate at a modest frequency. Instructions can be issued at a rate higher than this frequency by including logic that guarantees that instructions from the same thread are spaced wider apart than the time to access the register file. The result of the invention is the ability to overlap long latency structures, which allows using lower energy structures, thereby reducing energy per operation.
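
The spacing guarantee can be sketched as issue logic that skips any thread whose previous issue is too recent; the four-cycle register-file access time is an assumed value.

REGFILE_ACCESS_CYCLES = 4   # register-file access time in cycles (assumed)

class Row:
    """One hardware context (thread) in the context unit."""
    def __init__(self, name):
        self.name = name
        self.ready = True
        self.last_issue = -REGFILE_ACCESS_CYCLES

def issue(rows, cycle):
    """Select a ready row whose previous issue is old enough that its
    register-file access has completed, then issue from it."""
    for row in rows:
        if row.ready and cycle - row.last_issue >= REGFILE_ACCESS_CYCLES:
            row.last_issue = cycle
            return row.name
    return None   # every ready thread issued too recently

rows = [Row("t0"), Row("t1")]
print([issue(rows, c) for c in range(4)])   # ['t0', 't1', None, None]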

More
23-07-2020 publication date

POWER-SAVING MECHANISM FOR MEMORY SUB-SYSTEM IN PIPELINED PROCESSOR

Number: US20200233673A1
Assignee:

A pipelined processor for carrying out pipeline processing of instructions, which undergo a plurality of stages, is provided. The pipelined processor includes a memory-activation indicator and a memory controller. The memory-activation indicator stores content information that indicates whether to activate a first volatile memory and/or a second volatile memory while performing a current instruction. The memory controller is arranged for controlling activation of the first volatile memory and/or the second volatile memory in a specific stage of the plurality of stages of the current instruction according to the content information stored in the memory-activation indicator.
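
A sketch of the instruction-fetch stage consulting the indicator; encoding the indicator as per-PC flags is an assumption made for illustration.

def instruction_fetch(pc, indicator, icache, imem):
    """Activate only the memories the indicator names for this fetch."""
    flags = indicator.get(pc, {"icache": True, "imem": True})
    if flags["icache"]:
        word = icache.get(pc)          # instruction cache activated
        if word is not None:
            return word
    if flags["imem"]:
        return imem[pc]                # instruction memory activated
    return None                        # neither memory clocked this fetch

imem = {0: "add", 4: "mul"}
icache = {0: "add"}
indicator = {0: {"icache": True, "imem": False},   # hit expected: skip imem
             4: {"icache": False, "imem": True}}   # known miss: skip the cache
print(instruction_fetch(0, indicator, icache, imem))   # add
print(instruction_fetch(4, indicator, icache, imem))   # mul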

More
17-09-2015 publication date

METHOD IN A PROCESSOR, AN APPARATUS AND A COMPUTER PROGRAM PRODUCT

Number: US20150261543A1
Author: Lahteenmaki Mika
Assignee:

There is disclosed a method in which a pipelining instruction is received by a first processor core of a multicore processor. Information in the pipelining instruction is used to determine a connection between a first functional unit in the first processor core and a second functional unit in a second processor core of the multicore processor. A switch is controlled to form a pipeline comprising the first functional unit and the second functional unit to enable a data communication connection between an output of the first functional unit and an input of the second functional unit. The method may further comprise using a translation unit to translate an instruction of an instruction set of the first processor core to a corresponding instruction or a sequence of instructions of an instruction set of the second processor core. There is also disclosed an apparatus and a computer program product to implement the method.
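
At its simplest, the switch configuration reduces to recording a connection from one unit's output to another's input; the instruction encoding and the switch representation below are assumptions.

def handle_pipelining_instruction(instr, switch):
    """Form a pipeline between the two functional units the instruction names."""
    src = (instr["src_core"], instr["src_unit"])
    dst = (instr["dst_core"], instr["dst_unit"])
    switch[src] = dst        # output of src unit now feeds the input of dst unit

switch = {}
handle_pipelining_instruction(
    {"src_core": 0, "src_unit": "mul", "dst_core": 1, "dst_unit": "add"}, switch)
print(switch)   # {(0, 'mul'): (1, 'add')}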

More
08-09-2016 publication date

Pipelined Configurable Processor

Number: US20160259757A1
Author: Paul Metzgen
Assignee: SILICON TAILOR Ltd

A configurable processing circuit capable of handling multiple threads simultaneously, the circuit comprising a thread data store, a plurality of configurable execution units, a configurable routing network for connecting locations in the thread data store to the execution units, a configuration data store for storing configuration instances that each define a configuration of the routing network and a configuration of one or more of the plurality of execution units, and a pipeline formed from the execution units, the routing network and the thread data store that comprises a plurality of pipeline sections configured such that each thread propagates from one pipeline section to the next at each clock cycle, the circuit being configured to: (i) associate each thread with a configuration instance; and (ii) configure each of the plurality of pipeline sections for each clock cycle to be in accordance with the configuration instance associated with the respective thread that will propagate through that pipeline section during the clock cycle.
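
One clock of the scheme can be sketched as follows; threads propagate one pipeline section per cycle and each section takes the configuration instance of the thread now occupying it (the list representation is an assumption).

def advance(pipeline, configs, incoming_thread):
    """One clock: each thread moves to the next pipeline section, and every
    section is configured from the instance associated with its thread."""
    pipeline = [incoming_thread] + pipeline[:-1]      # threads propagate
    section_configs = [configs.get(t) for t in pipeline]
    return pipeline, section_configs

pipeline = ["t0", "t1", None]
configs = {"t0": "cfgA", "t1": "cfgB", "t2": "cfgC"}
pipeline, cfg = advance(pipeline, configs, "t2")
print(pipeline, cfg)   # ['t2', 't0', 't1'] ['cfgC', 'cfgA', 'cfgB']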

More
08-09-2016 publication date

Dynamic clock rate control for power reduction

Number: US20160261251A1
Author: Reed P. Tidwell
Assignee: SanDisk Corp, SanDisk Technologies LLC

A pipeline system may adjust clock rates of variable-rate clock signals sent to different processing circuit blocks in a pipeline based on their respective, individual input and output buffer fill levels and processor busy statuses. Variable-rate clock generation circuitry may generate the variable-rate clock signals based on a common clock signal. Additionally, the variable-rate clock generation circuitry may set or adjust the rates of variable-rate clock signals linearly in evenly-spaced increments and decrements.
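
A per-block rate-adjustment sketch in Python; the 50 MHz step and the fill-level thresholds are assumed values, the point being the evenly spaced increments and decrements.

STEP_MHZ = 50   # evenly spaced increment/decrement (assumed value)

def next_rate(rate, in_fill, out_fill, busy, lo=0.25, hi=0.75):
    """Per-block clock adjustment from its own buffer fill levels and busy
    status: speed up when input backs up, slow down when idle or when the
    output buffer is full (thresholds are assumptions)."""
    if busy and in_fill > hi and out_fill < hi:
        return rate + STEP_MHZ
    if not busy or in_fill < lo or out_fill > hi:
        return max(STEP_MHZ, rate - STEP_MHZ)
    return rate

print(next_rate(400, in_fill=0.9, out_fill=0.2, busy=True))    # 450
print(next_rate(400, in_fill=0.1, out_fill=0.2, busy=False))   # 350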

More
30-09-2021 publication date

FAULT TOLERANT SYSTEM

Number: US20210303492A1
Author: Yoshida Yoshitaka
Assignee:

A fault tolerant system includes a primary virtual machine and a secondary virtual machine. The primary virtual machine includes a synchronizing information generator and a first interrupt blocker. The synchronizing information generator executes bytecode and outputs synchronizing information based on information related to the executed bytecode. The first interrupt blocker blocks an interrupt inputted from an external location. The secondary virtual machine includes a synchronous execution unit that executes the bytecode based on the synchronizing information and a second interrupt blocker that blocks the interrupt. When the interrupt is acquired, the synchronizing information generator executes the bytecode based on the interrupt. The first interrupt blocker outputs the interrupt to the synchronizing information generator when the interrupt is inputted during execution of an instruction, included in the bytecode, to accept the interrupt.

More
24-09-2015 publication date

EXECUTION OF DATA-PARALLEL PROGRAMS ON COARSE-GRAINED RECONFIGURABLE ARCHITECTURE HARDWARE

Number: US20150268963A1
Assignee:

A GPGPU-compatible architecture combines a coarse-grain reconfigurable fabric (CGRF) with a dynamic dataflow execution model to accelerate execution throughput of massively thread-parallel code. The CGRF distributes computation across a fabric of functional units. The compute operations are statically mapped to functional units, and an interconnect is configured to transfer values between functional units.
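
A toy dataflow fabric in Python makes the static mapping concrete; the two-unit fabric shape and the operand naming are assumptions for illustration.

import operator

OPS = {"add": operator.add, "mul": operator.mul}

# Static mapping: each functional unit is assigned one compute operation, and
# the configured interconnect names where its two operands come from.
MAPPING = {"fu0": ("mul", "a", "b"),
           "fu1": ("add", "fu0", "c")}

def execute_thread(inputs: dict) -> int:
    """Push one thread's input tokens through the statically mapped fabric."""
    values = dict(inputs)
    for fu, (op, src0, src1) in MAPPING.items():
        values[fu] = OPS[op](values[src0], values[src1])
    return values["fu1"]

print(execute_thread({"a": 2, "b": 3, "c": 4}))   # 10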

More
14-09-2017 publication date

SINGLE CYCLE MULTI-BRANCH PREDICTION INCLUDING SHADOW CACHE FOR EARLY FAR BRANCH PREDICTION

Number: US20170262287A1
Author: Abdallah Mohammad
Assignee: SOFT MACHINES, INC.

A method of identifying instructions including accessing a plurality of instructions that comprise multiple branch instructions. For each branch instruction of the multiple branch instructions, a respective first mask is generated representing instructions that are executed if a branch is taken. A respective second mask is generated representing instructions that are executed if the branch is not taken. A prediction output is received that comprises a respective branch prediction for each branch instruction. For each branch instruction, the prediction output is used to select a respective resultant mask from among the respective first and second masks. For each branch instruction, a resultant mask of a subsequent branch is invalidated if a previous branch is predicted to branch over said subsequent branch. A logical operation is performed on all resultant masks to produce a final mask. The final mask is used to select a subset of instructions for execution.
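
The mask combination can be sketched compactly; this version assumes forward branches whose targets may fall inside the fetch window, which the abstract does not spell out.

def final_mask(window, branches, predictions):
    """branches: (pos, target) pairs in program order; predictions is a
    parallel list of booleans (True = predicted taken). Returns the final
    mask: True = instruction selected for execution."""
    mask = [True] * window               # running AND of the resultant masks
    skip_until = -1
    for (pos, target), taken in zip(branches, predictions):
        if pos <= skip_until:
            continue                     # branched over: resultant invalidated
        if taken:                        # select the "taken" resultant mask
            for i in range(pos + 1, min(target, window)):
                mask[i] = False          # instructions jumped over are dropped
            skip_until = min(target, window) - 1
        # the "not taken" resultant mask leaves the window unchanged
    return mask

# Branch at 1 taken to 4 (skipping 2-3 and invalidating the branch at 2);
# branch at 5 predicted not taken.
print(final_mask(8, [(1, 4), (2, 6), (5, 7)], [True, True, False]))
# [True, True, False, False, True, True, True, True]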

More
06-08-2020 publication date

Information processing apparatus, non-transitory computer-readable medium, and information processing method

Number: US20200249945A1
Author: Akihiro Tabuchi
Assignee: Fujitsu Ltd

An information processing apparatus includes a memory and a processor configured to: acquire an instruction sequence including plural instructions; generate plural candidates of new instruction sequences capable of obtaining the same execution result as the original instruction sequence, by replacing at least a part of the plural nop instructions included in the instruction sequence with a wait instruction that waits for completion of all preceding instructions; delete any one of the nop instructions and the wait instruction from each of the new instruction sequences when the execution result does not change if that instruction is deleted; and select one candidate from among the candidates subjected to the deletion, the candidate including a number of instructions equal to or less than a certain number and having the smallest number of execution cycles.
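
A Python sketch of the candidate-generate/prune/select flow; same_result and cycles are assumed callbacks standing in for the apparatus's execution model, and the toy semantics at the end is purely illustrative.

from itertools import combinations

def candidates(seq):
    """Replace non-empty subsets of the nop positions with 'wait' (simplified)."""
    nops = [i for i, op in enumerate(seq) if op == "nop"]
    for r in range(1, len(nops) + 1):
        for chosen in combinations(nops, r):
            yield ["wait" if i in chosen else op for i, op in enumerate(seq)]

def optimize(seq, max_len, same_result, cycles):
    best = None
    for cand in candidates(seq):
        if not same_result(seq, cand):
            continue
        i = 0
        while i < len(cand):            # prune deletable nop/wait instructions
            if cand[i] in ("nop", "wait") and same_result(seq, cand[:i] + cand[i+1:]):
                cand = cand[:i] + cand[i+1:]
            else:
                i += 1
        if len(cand) <= max_len and (best is None or cycles(cand) < cycles(best)):
            best = cand
    return best

# Toy model: a single 'wait' covers any number of consecutive nops.
same = lambda a, b: "wait" in b or a == b
print(optimize(["ld", "nop", "nop", "add"], 3, same, cycles=len))  # ['ld', 'wait', 'add']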

More
01-10-2015 publication date

SPECULATIVE LOOKAHEAD PROCESSING DEVICE AND METHOD

Number: US20150277927A1
Assignee:

The present invention discloses a speculative lookahead processing device and method to enhance the statistical performance of datapaths. The method comprises the steps of: entering an input signal into at least two datapath units in a round-robin way; outputting the correct value at the Nth cycle, having acquired the speculation value at the Mth cycle beforehand to start the succeeding computation, wherein M and N are natural numbers and M is smaller than N; and comparing the speculation value with the correct value at the Nth cycle to determine whether the speculation is successful; if successful, excluding extra activities; if not successful, deleting the succeeding computation undertaken beforehand and restarting the succeeding computation with the correct value.
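
The speculate-then-verify loop can be sketched as follows; the early/exact value functions and the downstream stand-in are assumptions for illustration.

class Downstream:
    """Consumer of the datapath result; started early on the speculation."""
    def start(self, value):
        return value + 1            # stand-in for the succeeding computation
    def squash(self):
        pass                        # discard work begun on a wrong guess

def speculative_run(early, exact, downstream, x):
    """early(x) models the speculation available at cycle M; exact(x) the
    correct value at cycle N > M (both stand-ins for the datapath units)."""
    spec = early(x)
    result = downstream.start(spec)      # succeeding computation starts early
    correct = exact(x)
    if spec == correct:
        return result                    # speculation succeeded
    downstream.squash()                  # wrong: delete the early computation
    return downstream.start(correct)     # restart with the correct value

early = lambda x: (x * 3) & ~0x1        # speculation ignores the low bit
exact = lambda x: x * 3
print(speculative_run(early, exact, Downstream(), 4))   # 13 (12 + 1, spec OK)
print(speculative_run(early, exact, Downstream(), 3))   # 10 (restarted: 9 + 1)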

More
20-09-2018 publication date

Data Processing Device for Providing Access to a Data Stream

Number: US20180267808A1
Assignee:

A data processing device is configured to provide access to a stream of data elements to a consumer. The data processing device is further configured to generate a data stream source by generating a first instance of a stream providing module such that the stream providing module provides the data stream in one of a push-mode and a pull-mode, wherein, in the push mode, the first instance of the stream providing module provides elements of the data stream according to a predefined schedule, and wherein, in the pull mode, the first instance of the stream providing module provides elements of the data stream after receiving a data stream request from a consumer. Such a data processing device may provide the benefit of versatility as it can be transformed to both push and pull computation patterns.
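
A minimal Python rendering of the two modes; the fixed push period standing in for the predefined schedule is an assumption.

import time
from typing import Callable, Iterable

class StreamProvider:
    """First instance of a stream-providing module, usable in either mode."""
    def __init__(self, elements: Iterable):
        self._it = iter(elements)

    def pull(self):
        """Pull mode: provide the next element when the consumer requests it."""
        return next(self._it, None)

    def push(self, consumer: Callable, period_s: float = 0.0):
        """Push mode: provide elements to the consumer on a predefined schedule
        (a fixed period here, as an assumed stand-in for the schedule)."""
        for element in self._it:
            consumer(element)
            time.sleep(period_s)

pulled = StreamProvider([1, 2, 3])
print(pulled.pull(), pulled.pull())       # 1 2  (consumer-driven)
StreamProvider("ab").push(print)          # a, b (schedule-driven)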

More
06-10-2016 publication date

Method and apparatus for a superscalar processor

Number: US20160291980A1
Author: Wuxian Shi, Yiqun Ge
Assignee: Huawei Technologies Co Ltd

A superscalar processor for out-of-order self-timed execution comprises a plurality of independent self-timed function units having corresponding instruction queues for holding instructions to be executed by the function unit. The processor further comprises an instruction dispatcher configured to input instructions in program counter order and to determine an appropriate function unit for execution of each instruction, and a resource management unit configured to monitor the function units and signal availability of the appropriate function unit, wherein the dispatcher only dispatches the instruction to the appropriate function unit in response to the availability signal from the resource management unit.
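
A sketch of the dispatch rule; the FunctionUnit fields and the stall-on-busy policy are illustrative assumptions.

from collections import deque

class FunctionUnit:
    def __init__(self, kind):
        self.kind = kind
        self.queue = deque()   # per-unit instruction queue
        self.busy = False      # availability tracked by the resource manager

def dispatch(program, units):
    """Dispatch in program-counter order, but only once the resource
    management unit signals the target function unit is available."""
    for instr in program:                       # program-counter order
        unit = next(u for u in units if u.kind == instr["kind"])
        if unit.busy:
            break                               # stall: wait for availability
        unit.queue.append(instr["op"])

units = [FunctionUnit("alu"), FunctionUnit("mem")]
dispatch([{"kind": "alu", "op": "add"}, {"kind": "mem", "op": "ld"}], units)
print([list(u.queue) for u in units])   # [['add'], ['ld']]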

More