
Total found: 61. Displayed: 53.

02-08-2016 publication date

On-chip traffic prioritization in memory

Number: US0009405711B2

According to one embodiment, a method for traffic prioritization in a memory device includes sending a memory access request including a priority value from a processing element in the memory device to a crossbar interconnect in the memory device. The memory access request is routed through the crossbar interconnect to a memory controller in the memory device associated with the memory access request. The memory access request is received at the memory controller. The priority value of the memory access request is compared to priority values of a plurality of memory access requests stored in a queue of the memory controller to determine a highest priority memory access request. A next memory access request is performed by the memory controller based on the highest priority memory access request.
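
The selection step this abstract describes, comparing an incoming request's priority value against those already queued in the memory controller to pick the next request to service, behaves like a priority queue. A minimal behavioral sketch, with all class and method names invented for illustration:

```python
import heapq
from itertools import count

class MemoryController:
    """Toy model of the controller queue from the abstract: requests carry
    a priority value, and the next request serviced is the highest-priority
    one in the queue (FIFO among equal priorities)."""

    def __init__(self):
        self._queue = []     # min-heap of (-priority, seq, address)
        self._seq = count()  # tie-breaker preserves arrival order

    def enqueue(self, address, priority):
        # Negate the priority so the largest value surfaces first.
        heapq.heappush(self._queue, (-priority, next(self._seq), address))

    def next_request(self):
        # Pop the winner of the priority comparison across queued requests.
        _, _, address = heapq.heappop(self._queue)
        return address

ctrl = MemoryController()
ctrl.enqueue(0x1000, priority=1)
ctrl.enqueue(0x2000, priority=7)
assert ctrl.next_request() == 0x2000  # higher priority served first
```

In hardware the comparison happens across queue entries in parallel; the heap here is only a behavioral stand-in.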

27-03-2018 publication date

High bandwidth low latency data exchange between processing elements

Number: US9928190B2

Direct communication of data between processing elements is provided. An aspect includes sending, by a first processing element, data over an inter-processing element chaining bus. The data is destined for another processing element via a data exchange component that is coupled between the first processing element and a second processing element via a communication line disposed between corresponding multiplexors of the first processing element and the second processing element. A further aspect includes determining, by the data exchange component, whether the data has been received at the data exchange element. If so, an indicator is set in a register of the data exchange component and the data is forwarded to the other processing element. Setting the indicator causes the first processing element to stall. If the data has not been received, the other processing element is stalled while the data exchange component awaits receipt of the data.
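
The handshake described here, an indicator register in the data exchange component that stalls the producer when set and the consumer when clear, can be sketched as a one-entry mailbox. This is a loose behavioral reading of the abstract, not the patent's exact timing, and every name below is illustrative:

```python
class DataExchange:
    """One-entry mailbox between two processing elements: a data slot
    plus an 'indicator' bit. A set indicator stalls the sender; a clear
    indicator stalls the receiver while awaiting data."""

    def __init__(self):
        self.data = None
        self.indicator = False  # set once data has been received

    def send(self, value):
        if self.indicator:
            return "sender stalls"    # previous data not yet consumed
        self.data, self.indicator = value, True
        return "forwarded"            # data forwarded to the other element

    def receive(self):
        if not self.indicator:
            return "receiver stalls"  # awaiting receipt of the data
        value, self.data, self.indicator = self.data, None, False
        return value
```

In the real design the "stall" outcomes would gate the pipelines rather than return strings; strings keep the sketch self-contained.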

21-02-2017 publication date

Vector processing in an active memory device

Number: US0009575755B2

Embodiments relate to vector processing in an active memory device. An aspect includes a method for vector processing in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Based on the iteration count, execution of the sub-instructions in parallel is repeated for multiple iterations by the processing element. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.
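
The decode/repeat structure in this abstract, sub-instructions executed in parallel, repeated for a determined iteration count, can be modeled as a nested loop; hardware would run the inner lanes concurrently. Function names and the two example lanes are invented for illustration:

```python
def execute_vector_instruction(sub_instructions, iteration_count, memory):
    """Sketch of the abstract's loop: each iteration executes every
    sub-instruction (parallel lanes in hardware, sequential here),
    and each sub-instruction may access its own memory location."""
    for i in range(iteration_count):
        for sub in sub_instructions:  # lanes run concurrently in hardware
            sub(i, memory)

# Example: an accumulate lane and a copy lane as two sub-instructions.
memory = {"src": [1, 2, 3, 4], "dst": [0, 0, 0, 0], "acc": 0}
def acc_lane(i, mem): mem["acc"] += mem["src"][i]
def copy_lane(i, mem): mem["dst"][i] = mem["src"][i]
execute_vector_instruction([acc_lane, copy_lane], iteration_count=4, memory=memory)
assert memory["acc"] == 10 and memory["dst"] == [1, 2, 3, 4]
```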

28-02-2017 publication date

Vector register file

Number: US0009582466B2

An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.
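
The key idea above is that the read command names only a vector register, while a per-register read element counter picks which element of that register is returned. A minimal sketch under that reading (names are illustrative, and the address decoder is modeled as plain indexing):

```python
class VectorRegister:
    """One register of the file: a small memory array plus the read
    element counter that selects the next element to return."""
    def __init__(self, elements):
        self.array = list(elements)
        self.read_counter = 0

class VectorRegisterFile:
    def __init__(self, registers):
        self.registers = registers

    def read(self, vr_address):
        reg = self.registers[vr_address]    # address decoder, simplified
        word = reg.array[reg.read_counter]  # element chosen by the counter
        # Advance to the next sequential element for the next read.
        reg.read_counter = (reg.read_counter + 1) % len(reg.array)
        return word

vrf = VectorRegisterFile([VectorRegister([10, 20, 30])])
assert [vrf.read(0) for _ in range(4)] == [10, 20, 30, 10]
```

Successive reads of the same register address thus stream out its elements without the command having to carry an element index.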

14-02-2017 publication date

Predication in a vector processor

Number: US9569211B2

Embodiments relate to vector processor predication in an active memory device. An aspect includes a method for vector processor predication in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
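
Predication as described here amounts to gating a unit's operation with a mask bit. A tiny sketch, assuming (for illustration only) one mask bit per sub-instruction and the convention that a set bit enables the unit:

```python
def predicated_execute(sub_instructions, mask_bits, state):
    """Apply one mask bit per sub-instruction: a clear bit blocks the
    associated unit, a set bit lets it operate (illustrative convention;
    the patent leaves the polarity and granularity more general)."""
    for sub, bit in zip(sub_instructions, mask_bits):
        if bit:
            sub(state)

state = {"alu": 0, "lsu": 0}
predicated_execute(
    [lambda s: s.update(alu=1), lambda s: s.update(lsu=1)],
    mask_bits=[1, 0],   # ALU sub-instruction runs, load-store is blocked
    state=state,
)
assert state == {"alu": 1, "lsu": 0}
```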

15-12-2016 publication date

LOW LATENCY DATA EXCHANGE BETWEEN PROCESSING ELEMENTS

Number: US20160364352A1
Assignee: International Business Machines Corp

Direct communication of data between processing elements is provided. An aspect includes sending, by a first processing element, data over an inter-processing element chaining bus. The data is destined for another processing element via a data exchange component that is coupled between the first processing element and a second processing element via a communication line disposed between corresponding multiplexors of the first processing element and the second processing element. A further aspect includes determining, by the data exchange component, whether the data has been received at the data exchange element. If so, an indicator is set in a register of the data exchange component and the data is forwarded to the other processing element. Setting the indicator causes the first processing element to stall. If the data has not been received, the other processing element is stalled while the data exchange component awaits receipt of the data.

15-12-2016 publication date

LOW LATENCY DATA EXCHANGE BETWEEN PROCESSING ELEMENTS

Number: US20160364364A1
Assignee: International Business Machines Corp

Direct communication of data between processing elements is provided. An aspect includes sending, by a first processing element, data over an inter-processing element chaining bus. The data is destined for another processing element via a data exchange component that is coupled between the first processing element and a second processing element via a communication line disposed between corresponding multiplexors of the first processing element and the second processing element. A further aspect includes determining, by the data exchange component, whether the data has been received at the data exchange element. If so, an indicator is set in a register of the data exchange component and the data is forwarded to the other processing element. Setting the indicator causes the first processing element to stall. If the data has not been received, the other processing element is stalled while the data exchange component awaits receipt of the data.

06-03-2018 publication date

High bandwidth low latency data exchange between processing elements

Number: US0009910802B2

Direct communication of data between processing elements is provided. An aspect includes sending, by a first processing element, data over an inter-processing element chaining bus. The data is destined for another processing element via a data exchange component that is coupled between the first processing element and a second processing element via a communication line disposed between corresponding multiplexors of the first processing element and the second processing element. A further aspect includes determining, by the data exchange component, whether the data has been received at the data exchange element. If so, an indicator is set in a register of the data exchange component and the data is forwarded to the other processing element. Setting the indicator causes the first processing element to stall. If the data has not been received, the other processing element is stalled while the data exchange component awaits receipt of the data.

27-10-2016 publication date

ON-CHIP TRAFFIC PRIORITIZATION IN MEMORY

Number: US20160313947A1
Assignee: International Business Machines Corp

According to one embodiment, a method for traffic prioritization in a memory device includes sending a memory access request including a priority value from a processing element in the memory device to a crossbar interconnect in the memory device. The memory access request is routed through the crossbar interconnect to a memory controller in the memory device associated with the memory access request. The memory access request is received at the memory controller. The priority value of the memory access request is compared to priority values of a plurality of memory access requests stored in a queue of the memory controller to determine a highest priority memory access request. A next memory access request is performed by the memory controller based on the highest priority memory access request.

12-12-2017 publication date

On-chip traffic prioritization in memory

Number: US0009841926B2

According to one embodiment, a method for traffic prioritization in a memory device includes sending a memory access request including a priority value from a processing element in the memory device to a crossbar interconnect in the memory device. The memory access request is routed through the crossbar interconnect to a memory controller in the memory device associated with the memory access request. The memory access request is received at the memory controller. The priority value of the memory access request is compared to priority values of a plurality of memory access requests stored in a queue of the memory controller to determine a highest priority memory access request. A next memory access request is performed by the memory controller based on the highest priority memory access request.

15-12-2016 publication date

MECHANISM FOR CONTROLLING SUBSET OF DEVICES

Number: US20160363916A1
Assignee: International Business Machines Corp

A computer detects a request by a process for access to a shadow control page, wherein the shadow control page allows the process access to one or more devices. The computer assigns the shadow control page and a key to the process associated with the request. The computer detects a request by the process via the assigned shadow control page for creation of a subset of devices from the one or more devices. The computer inputs information detailing an association between the subset of devices and the assigned key into a subset definition table, wherein the subset definition table includes one or more keys and one or more corresponding subsets.
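
The bookkeeping this abstract describes, a key assigned alongside the shadow control page, and a subset definition table associating each key with its device subset, maps naturally onto a dictionary. A sketch under that reading; the class, its methods, and the key format are all invented for illustration:

```python
import secrets

class DeviceManager:
    """Toy model of the subset mechanism: a process is assigned a key
    with its shadow control page, and subsets it creates are recorded
    in a subset definition table keyed by that key."""

    def __init__(self, devices):
        self.devices = set(devices)
        self.subset_table = {}  # key -> subset of devices

    def assign_shadow_page(self):
        key = secrets.token_hex(4)   # stand-in for the assigned key
        self.subset_table[key] = set()
        return key

    def create_subset(self, key, requested):
        # Record only devices the manager actually controls.
        subset = self.devices & set(requested)
        self.subset_table[key] = subset
        return subset

mgr = DeviceManager(["disk0", "disk1", "nic0"])
key = mgr.assign_shadow_page()
assert mgr.create_subset(key, ["disk0", "nic0"]) == {"disk0", "nic0"}
```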

02-08-2016 publication date

On-chip traffic prioritization in memory

Number: US0009405712B2

According to one embodiment, a memory device is provided. The memory device includes a processing element coupled to a crossbar interconnect. The processing element is configured to send a memory access request, including a priority value, to the crossbar interconnect. The crossbar interconnect is configured to route the memory access request to a memory controller associated with the memory access request. The memory controller is coupled to memory and to the crossbar interconnect. The memory controller includes a queue and is configured to compare the priority value of the memory access request to priority values of a plurality of memory access requests stored in the queue of the memory controller to determine a highest priority memory access request and perform a next memory access request based on the highest priority memory access request.

14-03-2017 publication date

Vector register file

Number: US0009594724B2

An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.

03-01-2017 publication date

Vector processing in an active memory device

Number: US0009535694B2

Embodiments relate to vector processing in an active memory device. An aspect includes a system for vector processing in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Execution of the sub-instructions is repeated in parallel for multiple iterations, by the processing element, based on the iteration count. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.

25-04-2017 publication date

Gather/scatter of multiple data elements with packed loading/storing into/from a register file entry

Number: US0009632777B2

Embodiments relate to packed loading and storing of data. An aspect includes a method for packed loading and storing of data distributed in a system that includes memory and a processing element. The method includes fetching and decoding an instruction for execution by the processing element. The processing element gathers a plurality of individually addressable data elements from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction. The data elements are packed and loaded into register file elements of a register file entry by the processing element based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.
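
The packing step above, gathering data elements narrower than a register file element and fitting at least two of them into a single element, is essentially bit-field packing. A sketch with illustrative widths (16-bit data, 64-bit register elements; the patent does not fix these numbers):

```python
def gather_packed(memory, addresses, elem_bits=16, reg_bits=64):
    """Gather narrow elements from non-contiguous addresses and pack
    them into full-width register file elements: with these widths,
    four gathered values share one 64-bit register element."""
    regs, current, shift = [], 0, 0
    mask = (1 << elem_bits) - 1
    for addr in addresses:           # non-contiguous gather
        current |= (memory[addr] & mask) << shift
        shift += elem_bits
        if shift == reg_bits:        # register element full: emit it
            regs.append(current)
            current, shift = 0, 0
    if shift:                        # partially filled last element
        regs.append(current)
    return regs
```

A scatter/store would run the same loop in reverse, unpacking fields of a register element back to scattered addresses.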

21-02-2017 publication date

Predication in a vector processor

Number: US0009575756B2

Embodiments relate to vector processor predication in an active memory device. An aspect includes a system for vector processor predication in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.

26-07-2016 publication date

Chaining between exposed vector pipelines

Number: US0009400656B2

Embodiments include a method for chaining data in an exposed-pipeline processing element. The method includes separating a multiple instruction word into a first sub-instruction and a second sub-instruction, receiving the first sub-instruction and the second sub-instruction in the exposed-pipeline processing element. The method also includes issuing the first sub-instruction at a first time, issuing the second sub-instruction at a second time different than the first time, the second time being offset to account for a dependency of the second sub-instruction on a first result from the first sub-instruction, the first pipeline performing the first sub-instruction at a first clock cycle and communicating the first result from performing the first sub-instruction to a chaining bus coupled to the first pipeline and a second pipeline, the communicating at a second clock cycle subsequent to the first clock cycle that corresponds to a total number of latch pipeline stages in the first pipeline.
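
The scheduling idea above, issuing the dependent sub-instruction at an offset matching the producer pipeline's latch-stage count so its operand arrives on the chaining bus exactly when needed, can be sketched as a simple issue-time calculation. This is a simplified single-result model with invented names, not the patent's exact timing:

```python
def chained_issue_times(pipeline_depths, issue_first=0):
    """Issue times for a chain of dependent sub-instructions: each one
    issues when its predecessor's result reaches the chaining bus,
    i.e. the predecessor's issue time plus its latch-stage count."""
    times = [issue_first]
    for depth in pipeline_depths[:-1]:
        # Result lands on the bus 'depth' cycles after the producer issues.
        times.append(times[-1] + depth)
    return times

# A 5-stage producer issued at cycle 2 feeds a 3-stage pipeline, which
# feeds a third: the dependents issue at cycles 7 and 10.
assert chained_issue_times([5, 3, 4], issue_first=2) == [2, 7, 10]
```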

25-04-2017 publication date

Gather/scatter of multiple data elements with packed loading/storing into/from a register file entry

Number: US0009632778B2

Embodiments relate to packed loading and storing of data. An aspect includes a system for packed loading and storing of distributed data. The system includes memory and a processing element configured to communicate with the memory. The processing element is configured to perform a method including fetching and decoding an instruction for execution by the processing element. A plurality of individually addressable data elements is gathered from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction. The processing element packs and loads the data elements into register file elements of a register file entry based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.

06-02-2014 publication date

Active buffered memory

Number: US20140040592A1
Assignee: International Business Machines Corp

According to one embodiment of the present invention, a method for operating a memory device that includes memory and a processing element includes receiving, in the processing element, a command from a requestor, loading, in the processing element, a program based on the command, the program comprising a load instruction loaded from a first memory location in the memory, and performing, by the processing element, the program, the performing including loading data in the processing element from a second memory location in the memory. The method also includes generating, by the processing element, a virtual address of the second memory location based on the load instruction and translating, by the processing element, the virtual address into a real address.
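
The final step of the abstract, the processing element generating a virtual address from the load instruction and translating it to a real address, follows the usual page-number/offset split. A minimal single-level sketch; the page size and table layout are illustrative, not taken from the patent:

```python
PAGE_SIZE = 4096  # illustrative page size

def translate(page_table, virtual_addr):
    """Translate a virtual address to a real address: split into page
    number and offset, look the page up in a single-level table, and
    rebuild the real address from the mapped frame."""
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset

# Virtual page 1 maps to real frame 2: address 4100 -> 2*4096 + 4.
assert translate({1: 2}, 4100) == 8196
```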

06-02-2014 publication date

Packed load/store with gather/scatter

Number: US20140040596A1
Assignee: International Business Machines Corp

Embodiments relate to packed loading and storing of data. An aspect includes a method for packed loading and storing of data distributed in a system that includes memory and a processing element. The method includes fetching and decoding an instruction for execution by the processing element. The processing element gathers a plurality of individually addressable data elements from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction. The data elements are packed and loaded into register file elements of a register file entry by the processing element based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.

06-02-2014 publication date

Predication in a vector processor

Number: US20140040597A1
Assignee: International Business Machines Corp

Embodiments relate to vector processor predication in an active memory device. An aspect includes a system for vector processor predication in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.

06-02-2014 publication date

VECTOR PROCESSING IN AN ACTIVE MEMORY DEVICE

Number: US20140040598A1

Embodiments relate to vector processing in an active memory device. An aspect includes a system for vector processing in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Execution of the sub-instructions is repeated in parallel for multiple iterations, by the processing element, based on the iteration count. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.

1. A system for vector processing in an active memory device, the system comprising:
memory in the active memory device; and
a processing element in the active memory device, the processing element configured to perform a method comprising:
decoding, in the processing element, an instruction comprising a plurality of sub-instructions to execute in parallel;
determining an iteration count to repeat execution of the sub-instructions in parallel;
repeating execution of the sub-instructions in parallel for multiple iterations, by the processing element, based on the iteration count; and
accessing multiple locations in the memory in parallel based on the execution of the sub-instructions.
2. The system of claim 1, wherein the processing element is further configured to perform:
determining, by the processing element, an iteration count source based on the instruction as one of: an iteration count field in the instruction and an iteration count register; and
setting the iteration count based on the iteration count source.
3. The system of claim 1, wherein the sub-instructions comprise at least a pair of a memory access sub-instruction in parallel with an arithmetic-logical sub-instruction, and the processing element is further ...

06-02-2014 publication date

PACKED LOAD/STORE WITH GATHER/SCATTER

Number: US20140040599A1

Embodiments relate to packed loading and storing of data. An aspect includes a system for packed loading and storing of distributed data. The system includes memory and a processing element configured to communicate with the memory. The processing element is configured to perform a method including fetching and decoding an instruction for execution by the processing element. A plurality of individually addressable data elements is gathered from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction. The processing element packs and loads the data elements into register file elements of a register file entry based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.

1. A system for packed loading and storing of distributed data, the system comprising:
memory; and
a processing element configured to communicate with the memory, the processing element configured to perform a method comprising:
fetching and decoding an instruction for execution by the processing element;
gathering, by the processing element, a plurality of individually addressable data elements from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction; and
packing and loading the data elements into register file elements of a register file entry by the processing element based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.
2. The system of claim 1, wherein the processing element includes a vector computation register file comprising a plurality of register file entries, each of ...

06-02-2014 publication date

PREDICATION IN A VECTOR PROCESSOR

Number: US20140040601A1

Embodiments relate to vector processor predication in an active memory device. An aspect includes a method for vector processor predication in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.

1. A method for vector processor predication in an active memory device that includes memory and a processing element, the method comprising:
decoding, in the processing element, an instruction comprising a plurality of sub-instructions to execute in parallel;
accessing one or more mask bits from a vector mask register in the processing element; and
applying the one or more mask bits by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
2. The method of claim 1, wherein applying the one or more mask bits by the processing element to predicate operation further comprises blocking one or more of: execution of at least one element of the sub-instructions and execution of at least one execution slot operating on a sub-element of at least one of the sub-instructions.
3. The method of claim 1, wherein applying the one or more mask bits by the processing element to predicate operation further comprises blocking one or more of: a memory access sub-instruction and part of an arithmetic operation.
4. The method of claim 1, further comprising:
performing one or more of clock gating and data gating to one or more of: an arithmetic logic unit, a load-store unit, a vector computation register file, and a scalar computation register file based on the one or more mask bits.
5. The method of ...

06-02-2014 publication date

VECTOR PROCESSING IN AN ACTIVE MEMORY DEVICE

Number: US20140040603A1

Embodiments relate to vector processing in an active memory device. An aspect includes a method for vector processing in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Based on the iteration count, execution of the sub-instructions in parallel is repeated for multiple iterations by the processing element. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.

1. A method for vector processing in an active memory device that includes memory and a processing element, the method comprising:
decoding, in the processing element, an instruction comprising a plurality of sub-instructions to execute in parallel;
determining an iteration count to repeat execution of the sub-instructions in parallel;
repeating execution of the sub-instructions in parallel for multiple iterations, by the processing element, based on the iteration count; and
accessing multiple locations in the memory in parallel based on the execution of the sub-instructions.
2. The method of claim 1, further comprising:
determining, by the processing element, an iteration count source based on the instruction as one of: an iteration count field in the instruction and an iteration count register; and
setting the iteration count based on the iteration count source.
3. The method of claim 1, wherein the sub-instructions comprise at least a pair of a memory access sub-instruction in parallel with an arithmetic-logical sub-instruction, and further comprising:
flowing the memory access sub-instruction to a load-store unit in the processing element; and
flowing the arithmetic-logical sub-instruction to an arithmetic logic unit in the processing element to execute the memory access sub-instruction in parallel with the ...

13-02-2014 publication date

Vector register file

Number: US20140047211A1
Assignee: International Business Machines Corp

An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.

13-02-2014 publication date

VECTOR REGISTER FILE

Number: US20140047214A1

An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.

1. A method for accessing a vector register in a vector register file, the method comprising:
receiving a read command at a read port of the vector register file, the read command specifying a vector register address;
decoding the vector register address by an address decoder to determine a selected vector register of the vector register file, wherein the vector register file comprises a plurality of vector registers and each vector register comprises a plurality of elements;
determining an element address for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register;
selecting a word in a memory array of the selected vector register as read data based on the element address; and
outputting the read data from the selected vector register based on the decoding of the vector register address by the address decoder.
2. The method of claim 1, further comprising:
incrementing the read element counter to select a next sequential element in the memory array as the read data for one of: the read command and a next read command ...

More details
14-01-2016 publication date

MULTI-PETASCALE HIGHLY EFFICIENT PARALLEL SUPERCOMPUTER

Number: US20160011996A1
Assignee:

A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five-dimensional torus network that optimally maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. Integrated into the node design is a list-based prefetcher. The memory system implements transactional memory, thread-level speculation, and a multiversioning cache that improves the soft error rate while also supporting DMA functionality, allowing for parallel message-passing processing.

1. A massively parallel computing structure comprising: a plurality of processing nodes interconnected by multiple independent networks, each node including a plurality of processing elements for performing computation or communication activity as required when performing parallel algorithm operations; a first of said networks includes an n-dimensional torus network, n being an integer equal to or greater than 5, including communication links interconnecting said nodes for providing high-speed, low-latency point-to-point and multicast packet communications among said nodes or independent partitioned subsets thereof; said n-dimensional torus network enabling point-to-point, all-to-all, collective (broadcast, reduce) and global barrier and notification functions among said nodes or independent partitioned subsets thereof, wherein combinations of said networks interconnecting said nodes are collaboratively or independently utilized according to bandwidth and latency requirements of an algorithm for optimizing algorithm processing performance; wherein each said processing element is multi-way hardware threaded, supporting transactional memory execution ...

More details
08-05-2014 publication date

ADDRESS GENERATION IN AN ACTIVE MEMORY DEVICE

Number: US20140129799A1

Embodiments relate to address generation in an active memory device that includes memory and a processing element. An aspect includes a method for address generation in the active memory device. The method includes reading a base address value and an offset address value from a register file group of the processing element. The processing element determines a virtual address based on the base address value and the offset address value. The processing element translates the virtual address into a physical address and accesses a location in the memory based on the physical address.

1. A method for address generation in an active memory device that includes memory and a processing element, the method comprising: reading a base address value from a register file group of the processing element; reading an offset address value from the register file group of the processing element; determining, by the processing element, a virtual address based on the base address value and the offset address value; translating, by the processing element, the virtual address into a physical address; and accessing a location in the memory based on the physical address.
2. The method of claim 1, further comprising: receiving the base address value and the offset address value from a main processor in communication with the processing element; and storing the base address value and the offset address value in the register file group of the processing element.
3. The method of claim 1, wherein the register file group comprises a scalar register file and a vector register file, and further comprising: reading the base address value from the scalar register file; reading the offset address value from the scalar register file; and sequentially generating a sequence of virtual addresses and corresponding physical addresses by incrementally adding the offset address value to the base address value and subsequent intermediate sums, and translating the sequence of virtual addresses to the ...
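The strided generation described in the claims (base plus offset, then offset added to each intermediate sum, each virtual address translated to physical) can be sketched as a short loop. The page size and the page-table layout below are invented for illustration; only the base/offset arithmetic and the translate-per-address step come from the text.

```python
PAGE_SIZE = 4096  # illustrative page size

def translate(virtual_addr, page_table):
    # Map a virtual page to a physical frame, keeping the page offset.
    page = virtual_addr // PAGE_SIZE
    return page_table[page] * PAGE_SIZE + virtual_addr % PAGE_SIZE

def generate_addresses(base, offset, count, page_table):
    # Sequentially generate virtual addresses by incrementally adding
    # the offset to the base and subsequent intermediate sums,
    # translating each one to a physical address.
    addrs = []
    va = base
    for _ in range(count):
        addrs.append(translate(va, page_table))
        va += offset
    return addrs

page_table = {0: 7, 1: 3}  # virtual page -> physical frame (toy mapping)
pas = generate_addresses(base=0x100, offset=0x800, count=3, page_table=page_table)
assert pas == [7 * PAGE_SIZE + 0x100, 7 * PAGE_SIZE + 0x900, 3 * PAGE_SIZE + 0x100]
```

The third address crosses a page boundary, which is why translation has to happen per generated address rather than once for the base.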

More details
08-05-2014 publication date

Main processor support of tasks performed in memory

Number: US20140130050A1
Assignee: International Business Machines Corp

According to one embodiment of the present invention, a method for operating a computer system including a main processor, a processing element and memory is provided. The method includes receiving, at the processing element, a task from the main processor, performing, by the processing element, an instruction specified by the task, determining, by the processing element, that a function is to be executed on the main processor, the function being part of the task, sending, by the processing element, a request to the main processor for execution, the request comprising execution of the function and receiving, at the processing element, an indication that the main processor has completed execution of the function specified by the request.

More details
08-05-2014 publication date

MAIN PROCESSOR SUPPORT OF TASKS PERFORMED IN MEMORY

Number: US20140130051A1

According to one embodiment of the present invention, a computer system for executing a task includes a main processor, a processing element and memory. The computer system is configured to perform a method including receiving, at the processing element, the task from the main processor, performing, by the processing element, an instruction specified by the task, determining, by the processing element, that a function is to be executed on the main processor, the function being part of the task, sending, by the processing element, a request to the main processor for execution, the request including execution of the function, and receiving, at the processing element, an indication that the main processor has completed execution of the function specified by the request.

1. A computer system for executing a task, the computer system comprising: a main processor, a processing element and memory, the computer system configured to perform a method comprising: receiving, at the processing element, the task from the main processor; performing, by the processing element, an instruction specified by the task; determining, by the processing element, that a function is to be executed on the main processor, the function being part of the task; sending, by the processing element, a request to the main processor for execution, the request comprising execution of the function; and receiving, at the processing element, an indication that the main processor has completed execution of the function specified by the request.
2. The computer system of claim 1, wherein the request comprises an address pointing to a location of the function.
3. The computer system of claim 1, further comprising creating a plurality of tasks, including the task, from a compiled program, wherein the function is identified as capable of execution by the main processor.
4. The computer system of claim 1, further comprising executing, by the main processor, the function ...
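The round-trip described here (processing element works through a task, hits a function that must run on the main processor, sends a request, resumes on the completion indication) can be sketched sequentially. This is a toy model of an inherently concurrent exchange; representing main-processor functions as callables and other instructions as numbers is purely an illustrative choice.

```python
class MainProcessor:
    def execute(self, request):
        # Execute the function carried by the request, then return a
        # completion indication together with the result.
        result = request()
        return "done", result


class ProcessingElement:
    def __init__(self, main_processor):
        self.main = main_processor

    def run_task(self, task):
        results = []
        for step in task:
            if callable(step):
                # Function identified as needing the main processor:
                # send the request and wait for the completion indication.
                status, value = self.main.execute(step)
                assert status == "done"
                results.append(value)
            else:
                # Instruction the processing element performs itself
                # (doubling stands in for real work).
                results.append(step * 2)
        return results


pe = ProcessingElement(MainProcessor())
assert pe.run_task([1, (lambda: 99), 3]) == [2, 99, 6]
```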

More details
15-05-2014 publication date

ACTIVE MEMORY DEVICE GATHER, SCATTER, AND FILTER

Number: US20140136811A1

Embodiments relate to loading and storing of data. An aspect includes a method for transferring data in an active memory device that includes memory and a processing element. An instruction is fetched and decoded for execution by the processing element. Based on determining that the instruction is a gather instruction, the processing element determines a plurality of source addresses in the memory from which to gather data elements and a destination address in the memory. One or more gathered data elements are transferred from the source addresses to contiguous locations in the memory starting at the destination address. Based on determining that the instruction is a scatter instruction, a source address in the memory from which to read data elements at contiguous locations and one or more destination addresses in the memory to store the data elements at non-contiguous locations are determined, and the data elements are transferred.

1. A method for transferring data in an active memory device that includes memory and a processing element, the method comprising: fetching and decoding an instruction for execution by the processing element; and based on determining that the instruction is a gather instruction, the processing element performing: determining a plurality of source addresses in the memory from which to gather data elements; determining a destination address in the memory; and transferring one or more gathered data elements from the plurality of source addresses to contiguous locations in the memory starting at the destination address.
2. The method of claim 1, wherein the instruction, the plurality of source addresses, and the destination address are provided by a main processor in communication with the processing element.
3. The method of claim 2, wherein the plurality of source addresses and the destination address are received from the main processor in an effective address format and are translated by the processing element to ...
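Gather and scatter as described above reduce to two short loops: gather pulls from scattered source addresses into contiguous locations starting at a destination; scatter reads contiguous locations and writes to scattered destinations. A dict stands in for sparse memory; the addresses are illustrative.

```python
def gather(memory, source_addresses, destination):
    # Copy each scattered source element to contiguous locations
    # starting at the destination address.
    for i, src in enumerate(source_addresses):
        memory[destination + i] = memory[src]

def scatter(memory, source, destination_addresses):
    # Read contiguous elements starting at source and store them at
    # the non-contiguous destination addresses.
    for i, dst in enumerate(destination_addresses):
        memory[dst] = memory[source + i]


mem = {0: 'a', 5: 'b', 9: 'c'}
gather(mem, [9, 0, 5], destination=100)
assert (mem[100], mem[101], mem[102]) == ('c', 'a', 'b')

scatter(mem, source=100, destination_addresses=[200, 300, 400])
assert (mem[200], mem[300], mem[400]) == ('c', 'a', 'b')
```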

More details
15-05-2014 publication date

EXPOSED-PIPELINE PROCESSING ELEMENT WITH ROLLBACK

Number: US20140136894A1

An aspect includes providing rollback support in an exposed-pipeline processing element. A method for providing rollback support in an exposed-pipeline processing element includes detecting, by rollback support logic, an error associated with execution of an instruction in the exposed-pipeline processing element. The rollback support logic determines whether the exposed-pipeline processing element supports replay of the instruction for a predetermined number of cycles. Based on determining that the exposed-pipeline processing element supports replay of the instruction, a rollback action is performed in the exposed-pipeline processing element to attempt recovery from the error.

1. A method for providing rollback support in an exposed-pipeline processing element, the method comprising: detecting, by rollback support logic, an error associated with execution of an instruction in the exposed-pipeline processing element; determining, by the rollback support logic, whether the exposed-pipeline processing element supports replay of the instruction for a predetermined number of cycles; and based on determining that the exposed-pipeline processing element supports replay of the instruction, performing a rollback action in the exposed-pipeline processing element to attempt recovery from the error.
2. The method of claim 1, further comprising: based on determining that the exposed-pipeline processing element does not support replay of the instruction, triggering an exception to restore the exposed-pipeline processing element to a previously stored checkpoint.
3. The method of claim 1, wherein determining, by the rollback support logic, whether the exposed-pipeline processing element supports replay of the instruction further comprises checking a state of an instruction bit of the instruction by decode logic, the instruction bit configured to indicate whether the instruction supports rollback and replay.
4. The method of claim 1, further comprising: ...
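The decision logic in these claims is: on an error, check an instruction bit indicating replay support; if set, attempt the rollback action (replay) up to a fixed number of cycles; otherwise trigger an exception so a previously stored checkpoint can be restored. A sketch under those assumptions follows; the bit position, cycle count, and callback names are invented for illustration.

```python
REPLAY_BIT = 0x1  # hypothetical position of the "supports replay" bit

def handle_error(instruction_bits, replay, restore_checkpoint, max_cycles=3):
    if instruction_bits & REPLAY_BIT:
        # Instruction supports rollback and replay: retry for a
        # predetermined number of cycles.
        for _ in range(max_cycles):
            if replay():          # rollback action recovered from the error
                return "recovered"
        return "replay-exhausted"
    # No replay support: exception path restores the checkpoint.
    restore_checkpoint()
    return "checkpoint"


attempts = []
assert handle_error(0x1, lambda: attempts.append(1) or len(attempts) == 2,
                    restore_checkpoint=lambda: None) == "recovered"

restored = []
assert handle_error(0x0, lambda: True,
                    restore_checkpoint=lambda: restored.append(1)) == "checkpoint"
assert restored == [1]
```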

More details
15-05-2014 publication date

EXPOSED-PIPELINE PROCESSING ELEMENT WITH ROLLBACK

Number: US20140136895A1

An aspect includes providing rollback support in an exposed-pipeline processing element. A system includes the exposed-pipeline processing element with rollback support logic. The rollback support logic is configured to detect an error associated with execution of an instruction in the exposed-pipeline processing element. The rollback support logic determines whether the exposed-pipeline processing element supports replay of the instruction for a predetermined number of cycles. Based on determining that the exposed-pipeline processing element supports replay of the instruction, a rollback action is performed in the exposed-pipeline processing element to attempt recovery from the error.

1. A system for rollback support in an exposed-pipeline processing element, the system comprising: the exposed-pipeline processing element comprising rollback support logic; the rollback support logic configured to perform a method comprising: detecting an error associated with execution of an instruction in the exposed-pipeline processing element; determining whether the exposed-pipeline processing element supports replay of the instruction for a predetermined number of cycles; and based on determining that the exposed-pipeline processing element supports replay of the instruction, performing a rollback action in the exposed-pipeline processing element to attempt recovery from the error.
2. The system of claim 1, wherein the rollback support logic is further configured to perform: based on determining that the exposed-pipeline processing element does not support replay of the instruction, triggering an exception to restore the exposed-pipeline processing element to a previously stored checkpoint.
3. The system of claim 1, wherein determining whether the exposed-pipeline processing element supports replay of the instruction further comprises checking a state of an instruction bit of the instruction by decode logic, the instruction bit configured to indicate whether ...

More details
29-05-2014 publication date

LOW LATENCY DATA EXCHANGE

Number: US20140149673A1

According to one embodiment, a method for exchanging data in a system that includes a main processor in communication with an active memory device is provided. The method includes a processing element in the active memory device receiving an instruction from the main processor and receiving a store request from a thread running on the main processor, the store request specifying a memory address associated with the processing element. The method also includes storing a value provided in the store request in a queue in the processing element and the processing element performing the instruction using the value from the queue.

1. A system for exchanging data, the system comprising: a main processor in communication with an active memory device, the system configured to perform a method comprising: receiving, at a processing element in the active memory device, an instruction from the main processor; receiving, at the processing element, a store request from a thread running on the main processor, the store request specifying a memory address associated with the processing element; storing a value provided in the store request in a queue in the processing element; and performing, by the processing element, the instruction using the value from the queue.
2. The system of claim 1, wherein storing the value comprises storing an 8-byte value or a 16-byte value.
3. The system of claim 1, wherein storing the value and performing the instruction are synchronized based on a predetermined code execution on both the processing element and the main processor.
4. The system of claim 3, wherein the predetermined code execution is determined by a compiler when compiling an application that has code that executes on both the processing element and the main processor.
5. The system of claim 1, wherein receiving the store request from the thread running on the main processor further comprises receiving a store request from the main processor that bypasses all system cache before it is received by the processing ...
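The exchange path reduces to a data structure: a store to an address mapped to the processing element lands in a queue inside the element, and the element's instruction consumes the value from that queue. A minimal sketch, with an invented queue address; the real mechanism is a hardware address decode, not a Python comparison.

```python
from collections import deque

class ProcessingElement:
    QUEUE_ADDRESS = 0xF000  # illustrative address mapped to the queue

    def __init__(self):
        self.queue = deque()

    def store(self, address, value):
        # A store request targeting the mapped address fills the queue.
        if address == self.QUEUE_ADDRESS:
            self.queue.append(value)

    def perform(self, instruction):
        # The instruction uses the value from the queue.
        operand = self.queue.popleft()
        return instruction(operand)


pe = ProcessingElement()
pe.store(ProcessingElement.QUEUE_ADDRESS, 21)  # store from a main-processor thread
assert pe.perform(lambda v: v * 2) == 42
```

The queue is what makes the exchange low-latency: the value never takes a round trip through memory before the processing element can use it.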

More details
29-05-2014 publication date

Low latency data exchange

Number: US20140149680A1
Assignee: International Business Machines Corp

According to one embodiment, a method for exchanging data in a system that includes a main processor in communication with an active memory device is provided. The method includes a processing element in the active memory device receiving an instruction from the main processor and receiving a store request from a thread running on the main processor, the store request specifying a memory address associated with the processing element. The method also includes storing a value provided in the store request in a queue in the processing element and the processing element performing the instruction using the value from the queue.

More details
19-06-2014 publication date

SEQUENTIAL LOCATION ACCESSES IN AN ACTIVE MEMORY DEVICE

Number: US20140173224A1

Embodiments relate to sequential location accesses in an active memory device that includes memory and a processing element. An aspect includes a method for sequential location accesses that includes receiving from the memory a first group of data values associated with a queue entry at the processing element. A tag value associated with the queue entry and specifying a position from which to extract a first subset of the data values is read. The queue entry is populated with the first subset of the data values starting at the position specified by the tag value. The processing element determines whether a second subset of the data values in the first group of data values is associated with a subsequent queue entry, and populates a portion of the subsequent queue entry with the second subset of the data values.

1. A method for sequential location accesses in an active memory device that includes memory and a processing element, the method comprising: receiving from the memory a first group of data values associated with a queue entry at the processing element; reading a tag value associated with the queue entry, the tag value specifying a position from which to extract a first subset of the data values; populating the queue entry with the first subset of the data values starting at the position in the first group of data values specified by the tag value; determining, by the processing element, whether a second subset of the data values in the first group of data values is associated with a subsequent queue entry; and based on determining that the second subset of the data values in the first group of data values is associated with the subsequent queue entry, populating a portion of the subsequent queue entry with the second subset of the data values.
2. The method of claim 1, further comprising: analyzing a subsequent instruction associated with the subsequent queue entry targeting memory locations; and based on determining that a previous instruction associated with the ...
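The entry-population step can be sketched directly: a group of data values arrives, the tag gives the position of the first wanted value, the entry takes values from that position up to its capacity, and any remainder spills into the subsequent entry. The entry size below is an invented illustration.

```python
ENTRY_SIZE = 4  # illustrative queue-entry capacity

def populate(group, tag, next_entry):
    # Extract the first subset starting at the position the tag specifies.
    wanted = group[tag:]
    entry = wanted[:ENTRY_SIZE]
    # A second subset belonging to the subsequent queue entry, if any,
    # populates a portion of that entry.
    if len(wanted) > ENTRY_SIZE:
        next_entry.extend(wanted[ENTRY_SIZE:])
    return entry


nxt = []
entry = populate(group=[0, 1, 2, 3, 4, 5, 6, 7], tag=2, next_entry=nxt)
assert entry == [2, 3, 4, 5]
assert nxt == [6, 7]  # spilled into the subsequent queue entry
```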

More details
10-07-2014 publication date

On-chip traffic prioritization in memory

Number: US20140195743A1
Assignee: International Business Machines Corp

According to one embodiment, a method for traffic prioritization in a memory device includes sending a memory access request including a priority value from a processing element in the memory device to a crossbar interconnect in the memory device. The memory access request is routed through the crossbar interconnect to a memory controller in the memory device associated with the memory access request. The memory access request is received at the memory controller. The priority value of the memory access request is compared to priority values of a plurality of memory access requests stored in a queue of the memory controller to determine a highest priority memory access request. A next memory access request is performed by the memory controller based on the highest priority memory access request.
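The controller behavior described here, comparing each request's priority value against the queued requests and servicing the highest-priority one next, matches a priority queue. A binary heap gives the same selection rule; the FIFO tie-break among equal priorities is my assumption, not stated in the abstract.

```python
import heapq
import itertools

class MemoryController:
    def __init__(self):
        self.queue = []
        self.order = itertools.count()  # assumed FIFO tie-break for equal priorities

    def enqueue(self, priority, request):
        # heapq is a min-heap, so negate the priority value to pop the
        # highest-priority memory access request first.
        heapq.heappush(self.queue, (-priority, next(self.order), request))

    def next_request(self):
        return heapq.heappop(self.queue)[2]


mc = MemoryController()
mc.enqueue(1, "read A")
mc.enqueue(7, "write B")   # highest priority value
mc.enqueue(7, "read C")
assert mc.next_request() == "write B"
assert mc.next_request() == "read C"   # equal priority served in arrival order
assert mc.next_request() == "read A"
```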

More details
10-07-2014 publication date

ON-CHIP TRAFFIC PRIORITIZATION IN MEMORY

Number: US20140195744A1

According to one embodiment, a memory device is provided. The memory device includes a processing element coupled to a crossbar interconnect. The processing element is configured to send a memory access request, including a priority value, to the crossbar interconnect. The crossbar interconnect is configured to route the memory access request to a memory controller associated with the memory access request. The memory controller is coupled to memory and to the crossbar interconnect. The memory controller includes a queue and is configured to compare the priority value of the memory access request to priority values of a plurality of memory access requests stored in the queue of the memory controller to determine a highest priority memory access request and perform a next memory access request based on the highest priority memory access request.

1. A memory device, comprising: a processing element coupled to a crossbar interconnect, the processing element configured to send a memory access request comprising a priority value to the crossbar interconnect; the crossbar interconnect configured to route the memory access request to a memory controller associated with the memory access request; and the memory controller coupled to memory and to the crossbar interconnect, the memory controller comprising a queue and configured to compare the priority value of the memory access request to priority values of a plurality of memory access requests stored in the queue of the memory controller to determine a highest priority memory access request and perform a next memory access request based on the highest priority memory access request.
2. The memory device of claim 1, wherein the processing element further comprises a memory request priority register, and the processing element is further configured to set the priority value based on the memory request priority register in the processing element.
3. The memory device of claim 2, wherein the memory request priority ...

More details
25-06-2015 publication date

POWER MANAGEMENT FOR IN-MEMORY COMPUTER SYSTEMS

Number: US20150177811A1

According to one embodiment, a method for power management of a compute node including at least two power-consuming components is provided. A power capping control system compares the power consumption level of the compute node to a power cap. Based on determining that the power consumption level is greater than the power cap, actions are performed including: reducing power provided to a first power-consuming component based on determining that it has an activity level below a first threshold and that power can be reduced to the first power-consuming component. Power provided to a second power-consuming component is reduced based on determining that it has an activity level below a second threshold and that power can be reduced to the second power-consuming component. Power reduction is forced in the compute node based on determining that power cannot be reduced in either of the first or second power-consuming component.

1. A method for power management of a compute node comprising at least two power-consuming components, the method comprising: determining, by a power capping control system of the compute node, a power consumption level of the compute node; comparing, by the power capping control system, the power consumption level to a power cap; and based on determining that the power consumption level is greater than the power cap: reducing power provided to a first power-consuming component of the compute node based on determining that the first power-consuming component has an activity level below a first threshold and that power can be reduced to the first power-consuming component; reducing power provided to a second power-consuming component of the compute node based on determining that the second power-consuming component has an activity level below a second threshold and that power can be reduced to the second power-consuming component; and forcing a power reduction in the compute node based on determining that power cannot be reduced in either of the first or second power-consuming component ...
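One pass of the capping policy can be sketched as: compare consumption to the cap; if over, reduce power on the first component that is below its activity threshold and reducible, otherwise force a reduction. The component record layout, thresholds, and one-unit power step are invented for illustration.

```python
def cap_power(consumption, cap, components, force_reduction):
    if consumption <= cap:
        return "within-cap"
    # Try the first, then the second power-consuming component.
    for comp in components:
        if comp["activity"] < comp["threshold"] and comp["reducible"]:
            comp["power"] -= 1          # reduce power to this component
            return f"reduced-{comp['name']}"
    # Power cannot be reduced voluntarily in either component.
    force_reduction()
    return "forced"


comps = [
    {"name": "cpu", "activity": 0.9, "threshold": 0.5, "reducible": True, "power": 10},
    {"name": "mem", "activity": 0.2, "threshold": 0.4, "reducible": True, "power": 8},
]
assert cap_power(120, 100, comps, force_reduction=lambda: None) == "reduced-mem"
assert comps[1]["power"] == 7
assert cap_power(90, 100, comps, force_reduction=lambda: None) == "within-cap"
```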

More details
18-09-2014 publication date

LOCAL BYPASS FOR IN MEMORY COMPUTING

Number: US20140281084A1

Embodiments include a method for bypassing data in an active memory device. The method includes a requestor determining a number of transfers to a grantor that have not been communicated to the grantor, requesting to the interconnect network that the bypass path be used for the transfers based on the number of transfers meeting a threshold and communicating the transfers via the bypass path to the grantor based on the request, the interconnect network granting control of the grantor in response to the request. The method also includes the interconnect network requesting control of the grantor based on an event and communicating delayed transfers via the interconnect network from other requestors, the delayed transfers being delayed due to the grantor being previously controlled by the requestor, the communicating based on the control of the grantor being changed back to the interconnect network.

1-7. (canceled)
8. A system for communication in a computer system, the computer system comprising: an active memory device including an interconnect network, a memory vault, a processing element and a bypass path between the memory vault and the processing element that bypasses the interconnect network, the system configured to perform a method comprising: determining, by a requestor, a number of transfers to a grantor that have not been communicated to the grantor; requesting, by the requestor, to the interconnect network that the bypass path be used for the transfers based on the number of transfers meeting a threshold; communicating the transfers via the bypass path to the grantor based on the request, the interconnect network granting control of the grantor in response to the request; requesting, by the interconnect network, control of the grantor based on an event; and communicating delayed transfers via the interconnect network from other requestors, the delayed transfers being delayed due to the grantor being previously controlled by the requestor, the communicating based ...
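The bypass decision reduces to its trigger: the requestor counts transfers that have not yet been communicated and asks for the direct path only once the count meets a threshold; below it, traffic stays on the shared interconnect. The threshold value below is illustrative, not from the text.

```python
BYPASS_THRESHOLD = 4  # illustrative transfer count that triggers the bypass request

def route(pending_transfers):
    # Request the bypass path once enough transfers have accumulated;
    # otherwise keep using the shared interconnect network.
    if len(pending_transfers) >= BYPASS_THRESHOLD:
        return ("bypass", pending_transfers)
    return ("interconnect", pending_transfers)


assert route(["t1", "t2"])[0] == "interconnect"
assert route(["t1", "t2", "t3", "t4"])[0] == "bypass"
```

Batching behind a threshold is what makes the grant worthwhile: taking control of the grantor delays other requestors, so it should only happen when enough transfers amortize that cost.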

More details
18-09-2014 publication date

LOCAL BYPASS FOR IN MEMORY COMPUTING

Number: US20140281100A1

Embodiments include a method for bypassing data in an active memory device. The method includes a requestor determining a number of transfers to a grantor that have not been communicated to the grantor, requesting to the interconnect network that the bypass path be used for the transfers based on the number of transfers meeting a threshold and communicating the transfers via the bypass path to the grantor based on the request, the interconnect network granting control of the grantor in response to the request. The method also includes the interconnect network requesting control of the grantor based on an event and communicating delayed transfers via the interconnect network from other requestors, the delayed transfers being delayed due to the grantor being previously controlled by the requestor, the communicating based on the control of the grantor being changed back to the interconnect network.

1. A method for communication in an active memory device including an interconnect network, a memory vault, a processing element and a bypass path between the memory vault and the processing element that bypasses the interconnect network, the method comprising: determining, by a requestor, a number of transfers to a grantor that have not been communicated to the grantor; requesting, by the requestor, to the interconnect network that the bypass path be used for the transfers based on the number of transfers meeting a threshold; communicating the transfers via the bypass path to the grantor based on the request, the interconnect network granting control of the grantor in response to the request; requesting, by the interconnect network, control of the grantor based on an event; and communicating delayed transfers via the interconnect network from other requestors, the delayed transfers being delayed due to the grantor being previously controlled by the requestor, the communicating based on the control of the grantor being changed back to the interconnect network, the requestor ...

More details
18-09-2014 publication date

Chaining between exposed vector pipelines

Number: US20140281386A1
Assignee: International Business Machines Corp

Embodiments include a method for chaining data in an exposed-pipeline processing element. The method includes separating a multiple instruction word into a first sub-instruction and a second sub-instruction, receiving the first sub-instruction and the second sub-instruction in the exposed-pipeline processing element. The method also includes issuing the first sub-instruction at a first time, issuing the second sub-instruction at a second time different than the first time, the second time being offset to account for a dependency of the second sub-instruction on a first result from the first sub-instruction, the first pipeline performing the first sub-instruction at a first clock cycle and communicating the first result from performing the first sub-instruction to a chaining bus coupled to the first pipeline and a second pipeline, the communicating at a second clock cycle subsequent to the first clock cycle that corresponds to a total number of latch pipeline stages in the first pipeline.
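The scheduling rule described here is that the second sub-instruction depends on the first's result, which only reaches the chaining bus after the first pipeline's latch stages, so its issue time is offset by that pipeline depth. A minimal sketch; the stage count is invented, and real issue logic is per-cycle hardware rather than a list.

```python
PIPELINE_STAGES = 5  # illustrative number of latch stages in the first pipeline

def schedule(first_issue_cycle):
    # Issue the first sub-instruction, then offset the second one by the
    # pipeline depth: its operand appears on the chaining bus only after
    # the first result has traversed all latch stages.
    first = ("sub1", first_issue_cycle)
    second = ("sub2", first_issue_cycle + PIPELINE_STAGES)
    return [first, second]


plan = schedule(first_issue_cycle=0)
assert plan == [("sub1", 0), ("sub2", 5)]
```

In an exposed pipeline this offset is the compiler's or sequencer's responsibility, which is why the two sub-instructions of one multiple-instruction word get different issue times.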

More details
18-09-2014 publication date

CHAINING BETWEEN EXPOSED VECTOR PIPELINES

Number: US20140281403A1

Embodiments include a method for chaining data in an exposed-pipeline processing element. The method includes separating a multiple instruction word into a first sub-instruction and a second sub-instruction, receiving the first sub-instruction and the second sub-instruction in the exposed-pipeline processing element. The method also includes issuing the first sub-instruction at a first time, issuing the second sub-instruction at a second time different than the first time, the second time being offset to account for a dependency of the second sub-instruction on a first result from the first sub-instruction, the first pipeline performing the first sub-instruction at a first clock cycle and communicating the first result from performing the first sub-instruction to a chaining bus coupled to the first pipeline and a second pipeline, the communicating at a second clock cycle subsequent to the first clock cycle that corresponds to a total number of latch pipeline stages in the first pipeline.

1. A method for chaining data in an exposed-pipeline processing element, the method comprising: separating a multiple instruction word into a first sub-instruction and a second sub-instruction; receiving the first sub-instruction and the second sub-instruction in the exposed-pipeline processing element; issuing the first sub-instruction at a first time; issuing the second sub-instruction at a second time different than the first time, the second time being offset to account for a dependency of the second sub-instruction on a first result from the first sub-instruction; performing, by a first pipeline, the first sub-instruction at a first clock cycle; communicating, by the first pipeline, the first result from performing the first sub-instruction to a chaining bus coupled to the first pipeline and a second pipeline, the communicating at a second clock cycle subsequent to the first clock cycle that corresponds to a total number of latch pipeline stages in the first pipeline, the first ...

More details
18-09-2014 publication date

POWER MANAGEMENT FOR A COMPUTER SYSTEM

Number: US20140281605A1

Embodiments include a method for managing power in a computer system including a main processor and an active memory device including powered units, the active memory device in communication with the main processor by a memory link, the powered units including a processing element. The method includes the main processor executing a program on a program thread, encountering a first section of code to be executed by the active memory device, changing, by a first command, a power state of a powered unit on the active memory device based on the main processor encountering the first section of code, the first command including a store command. The method also includes the processing element executing the first section of code at a second time, changing a power state of the main processor from a power use state to a power saving state based on the processing element executing the first section.

1-10. (canceled)
11. A system for managing power in a computer system, the computer system comprising: a main processor and an active memory device including powered units, the active memory device in communication with the main processor by a memory link, the powered unit comprising a processing element, the system configured to perform a method comprising: executing, at the main processor, a program on a program thread; encountering, at the main processor, a first section of code to be executed by the active memory device; changing, by a first command, a power state of a powered unit on the active memory device based on the main processor encountering the first section of code, the first command comprising a store command; executing, by the processing element, the first section of code at a second time; changing a power state of the main processor from a power use state to a power saving state based on the processing element executing the first section of code; changing, by a second command, the power state of the main processor from the power saving state to the power use state based ...

More
18-09-2014 publication date

POWER MANAGEMENT FOR A COMPUTER SYSTEM

Number: US20140281629A1

Embodiments include a method for managing power in a computer system including a main processor and an active memory device including powered units, the active memory device in communication with the main processor by a memory link, the powered units including a processing element. The method includes the main processor executing a program on a program thread, encountering a first section of code to be executed by the active memory device, changing, by a first command, a power state of a powered unit on the active memory device based on the main processor encountering the first section of code, the first command including a store command. The method also includes the processing element executing the first section of code at a second time, changing a power state of the main processor from a power use state to a power saving state based on the processing element executing the first section.

1. A system for managing power in a computer system, the computer system comprising: a main processor and an active memory device including powered units, the active memory device in communication with the main processor by a memory link, the powered unit comprising a processing element, the system configured to perform a method comprising: executing, at the main processor, a program on a program thread; encountering, at the main processor, a first section of code to be executed by the active memory device; changing, by a first command, a power state of a powered unit on the active memory device based on the main processor encountering the first section of code, the first command comprising a store command; executing, by the processing element, the first section of code at a second time; changing a power state of the main processor from a power use state to a power saving state based on the processing element executing the first section of code; changing, by a second command, the power state of the main processor from the power saving state to the power use state based on the processing
...
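The power handoff this abstract describes can be illustrated with a minimal Python sketch. All names here (PowerState, Host, ProcessingElement) are invented for demonstration and are not from the patent; the two power-state commands are modeled as plain state assignments.

```python
# Hypothetical model of the claimed host/processing-element power handoff.
from enum import Enum

class PowerState(Enum):
    POWER_USE = "power_use"
    POWER_SAVING = "power_saving"

class ProcessingElement:
    def __init__(self):
        self.state = PowerState.POWER_SAVING

class Host:
    def __init__(self, pe):
        self.state = PowerState.POWER_USE
        self.pe = pe

    def run(self, sections):
        """sections: list of (name, offload_to_active_memory) pairs."""
        log = []
        for name, offload in sections:
            if offload:
                # First command (a store) raises the powered unit's state.
                self.pe.state = PowerState.POWER_USE
                # Host enters a power-saving state while the PE executes.
                self.state = PowerState.POWER_SAVING
                log.append(("pe", name))
                # Second command restores the host's power-use state.
                self.state = PowerState.POWER_USE
                self.pe.state = PowerState.POWER_SAVING
            else:
                log.append(("host", name))
        return log
```

For example, running `[("init", False), ("kernel", True)]` executes "init" on the host and "kernel" on the processing element, leaving the host back in its power-use state.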

More
25-07-2019 publication date

Cache miss thread balancing

Number: US20190227932A1
Assignee: International Business Machines Corp

A simultaneous multithread (SMT) processor having a shared dispatch pipeline includes a first circuit that detects a cache miss thread. A second circuit determines a first cache hierarchy level at which the detected cache miss occurred. A third circuit determines a Next To Complete (NTC) group in the thread and a plurality of additional groups (X) in the thread. The additional groups (X) are dynamically configured based on the detected cache miss. A fourth circuit determines whether any groups in the thread are younger than the determined NTC group and the plurality of additional groups (X), and flushes all the determined younger groups from the cache miss thread.
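The flush policy in this abstract can be sketched as follows. The mapping from cache level to the number of additional groups X, and the use of list indices as group ages, are illustrative assumptions; the patent only says X is configured dynamically based on the detected miss.

```python
# Illustrative flush of groups younger than NTC + X on a cache miss.
X_BY_LEVEL = {"L1": 4, "L2": 2, "L3": 1}  # assumed: deeper miss keeps fewer groups

def flush_younger_groups(groups, ntc_index, miss_level):
    """groups are ordered oldest -> youngest; keep NTC plus X more groups,
    flush everything younger than that boundary."""
    x = X_BY_LEVEL[miss_level]
    boundary = ntc_index + 1 + x
    keep = groups[:boundary]
    flushed = groups[boundary:]
    return keep, flushed
```

With ten groups, NTC at index 2, and an L3 miss (X = 1), groups 0-3 survive and groups 4-9 are flushed from the cache miss thread.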

More
13-09-2018 publication date

Cache miss thread balancing

Number: US20180260326A1
Assignee: International Business Machines Corp

A simultaneous multithread (SMT) processor having a shared dispatch pipeline includes a first circuit that detects a cache miss thread. A second circuit determines a first cache hierarchy level at which the detected cache miss occurred. A third circuit determines a Next To Complete (NTC) group in the thread and a plurality of additional groups (X) in the thread. The additional groups (X) are dynamically configured based on the detected cache miss. A fourth circuit determines whether any groups in the thread are younger than the determined NTC group and the plurality of additional groups (X), and flushes all the determined younger groups from the cache miss thread.

More
12-07-2016 publication date

Power management for in-memory computer systems

Number: US9389675B2
Assignee: International Business Machines Corp

According to one embodiment, a method for power management of a compute node including at least two power-consuming components is provided. A power capping control system compares power consumption level of the compute node to a power cap. Based on determining that the power consumption level is greater than the power cap, actions are performed including: reducing power provided to a first power-consuming component based on determining that it has an activity level below a first threshold and that power can be reduced to the first power-consuming component. Power provided to a second power-consuming component is reduced based on determining that it has an activity level below a second threshold and that power can be reduced to the second power-consuming component. Power reduction is forced in the compute node based on determining that power cannot be reduced in either of the first or second power-consuming component.
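The decision flow above can be sketched in a few lines of Python. The dictionary fields, threshold parameters, and action strings are hypothetical stand-ins for the patent's two power-consuming components and their controls.

```python
# Hedged sketch of the claimed power-capping decision ordering.
def apply_power_cap(consumption, cap, comp1, comp2, threshold1, threshold2):
    """Each component is a dict with 'activity' and 'reducible' keys.
    Returns the list of actions taken when consumption exceeds the cap."""
    if consumption <= cap:
        return []  # under the cap: nothing to do
    actions = []
    # Reduce a component only if its activity is below its threshold
    # and its power can actually be reduced.
    if comp1["activity"] < threshold1 and comp1["reducible"]:
        actions.append("reduce_component1")
    if comp2["activity"] < threshold2 and comp2["reducible"]:
        actions.append("reduce_component2")
    if not actions:
        # Power could not be reduced in either component: force reduction.
        actions.append("force_reduction")
    return actions
```

The forced reduction fires only when neither component qualifies, matching the ordering the abstract describes.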

More
07-07-2011 publication date

Register file soft error recovery

Number: US20110167296A1
Assignee: International Business Machines Corp

Register file soft error recovery including a system that includes a first register file and a second register file that mirrors the first register file. The system also includes an arithmetic pipeline for receiving data read from the first register file, and error detection circuitry to detect whether the data read from the first register file includes corrupted data. The system further includes error recovery circuitry to insert an error recovery instruction into the arithmetic pipeline in response to detecting the corrupted data. The inserted error recovery instruction replaces the corrupted data in the first register file with a copy of the data from the second register file.
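The recovery path can be modeled with a toy Python class. Comparing the two copies stands in for whatever error-detection circuitry the hardware actually uses, and the "error recovery instruction" is modeled as a direct rewrite from the mirror; all names are invented.

```python
# Toy model of a mirrored register file with soft-error recovery on read.
class MirroredRegisterFile:
    def __init__(self, values):
        self.primary = list(values)
        self.mirror = list(values)  # second file mirrors the first

    def corrupt(self, index, bad_value):
        """Simulate a soft error flipping a value in the primary file."""
        self.primary[index] = bad_value

    def read(self, index):
        value = self.primary[index]
        if value != self.mirror[index]:  # error detection fires
            # Recovery: replace the corrupted primary entry with the
            # mirror's copy, then return the repaired value.
            self.primary[index] = self.mirror[index]
            value = self.primary[index]
        return value
```

A read that hits a corrupted entry both returns the correct value and repairs the primary file in place.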

More
19-06-2014 publication date

Sequential location accesses in an active memory device

Number: WO2014090092A1

Embodiments relate to sequential location accesses in an active memory device that includes memory and a processing element. An aspect includes a method for sequential location accesses that includes receiving from the memory a first group of data values associated with a queue entry at the processing element. A tag value associated with the queue entry and specifying a position from which to extract a first subset of the data values is read. The queue entry is populated with the first subset of the data values starting at the position specified by the tag value. The processing element determines whether a second subset of the data values in the first group of data values is associated with a subsequent queue entry, and populates a portion of the subsequent queue entry with the second subset of the data values.
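One plausible reading of the split described above is sketched here: the tag gives the offset within the fetched group where the current queue entry's data starts, and any remainder spills into the next entry. The entry size and tag semantics are assumptions for illustration only.

```python
# Sketch of splitting one fetched group of data values across two queue
# entries; entry_size and the tag interpretation are assumed, not from
# the patent.
def populate_entries(group, tag, entry_size):
    """Return (current_entry, spill_for_next_entry) for one fetched group."""
    # First subset starts at the position specified by the tag value.
    current_entry = group[tag : tag + entry_size]
    # Any values beyond the current entry belong to the subsequent entry.
    spill = group[tag + entry_size :]
    return current_entry, spill
```

For a group of eight values with tag 3 and four-value entries, values 3-6 fill the current entry and value 7 spills into the subsequent one.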

More
04-12-1928 publication date

Design fob

Number: USD77084S
Author: Thomas W. Fox
Assignee:

More
15-05-2018 publication date

Multi-petascale highly efficient parallel supercomputer

Number: US09971713B2
Assignee: Globalfoundries Inc

A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five-dimensional torus network that optimally maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design is a list-based prefetcher. The memory system implements transactional memory, thread-level speculation, and a multiversioning cache that improves the soft error rate, and supports DMA functionality allowing for parallel message-passing.
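The latency advantage of a torus comes from wrap-around links: the per-dimension distance is the shorter of the direct and wrapped paths. A small sketch, with a hypothetical 5D node grid (the shape is invented; the abstract only specifies five dimensions):

```python
# Minimal hop count between two nodes on a torus: per dimension, take
# the shorter of the direct distance and the wrap-around distance.
def torus_hops(a, b, dims):
    """a, b: node coordinates; dims: torus extent in each dimension."""
    return sum(min(abs(x - y), d - abs(x - y))
               for x, y, d in zip(a, b, dims))
```

On a (4, 4, 4, 4, 2) torus, node (0, 0, 0, 0, 0) reaches (3, 0, 0, 0, 1) in two hops, because both differing dimensions wrap.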

More