Total found: 190. Displayed: 100.
Publication date: 02-08-2012

SYNCHRONIZING ACCESS TO DATA IN SHARED MEMORY VIA UPPER LEVEL CACHE QUEUING

Number: US20120198167A1

A processing unit includes a store-in lower level cache having reservation logic that determines presence or absence of a reservation and a processor core including a store-through upper level cache, an instruction execution unit, a load unit that, responsive to a hit in the upper level cache on a load-reserve operation generated through execution of a load-reserve instruction by the instruction execution unit, temporarily buffers a load target address of the load-reserve operation, and a flag indicating that the load-reserve operation has bound to a value in the upper level cache. If a storage-modifying operation is received that conflicts with the load target address of the load-reserve operation, the processor core sets the flag to a particular state, and, responsive to execution of a store-conditional instruction, transmits an associated store-conditional operation to the lower level cache with a fail indication if the flag is set to the particular state.

1. A method of data processing in a multi-processor data processing system including a processor core supported by a store-through upper level cache, the data processing system further including a store-in lower level cache and a system memory, said method comprising:
the processor core executing a load-reserve instruction to determine a load target address of a load-reserve operation;
responsive to the load target address hitting in the store-through upper level cache, temporarily buffering in the processor core the load target address;
if a storage-modifying operation is received by the processor core that conflicts with the buffered load target address of the load-reserve operation, setting a flag to a particular state; and
in response to execution of a store-conditional instruction by the processor core, transmitting an associated store-conditional operation to the store-in lower level cache with a fail indication if the flag is set to the particular state so that the lower level cache will fail the store- ...
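The flag-based tracking in this abstract can be illustrated with a small model. This is a hypothetical sketch, not IBM's implementation; the class and method names are mine, and the lower-level-cache side is reduced to a pass/fail result.

```python
class CoreReservationTracker:
    """Models the processor-core side: a buffered load target address plus a
    flag that is set when a conflicting storage-modifying operation arrives."""

    def __init__(self):
        self.reserved_addr = None   # buffered load target address
        self.conflict_flag = False  # set on a conflicting store

    def load_reserve(self, addr):
        # On an upper-level-cache hit, buffer the address and clear the flag.
        self.reserved_addr = addr
        self.conflict_flag = False

    def observe_store(self, addr):
        # A storage-modifying operation that conflicts with the buffered
        # address sets the flag to the "fail" state.
        if self.reserved_addr is not None and addr == self.reserved_addr:
            self.conflict_flag = True

    def store_conditional(self, addr):
        # The store-conditional goes to the lower-level cache with a fail
        # indication if the flag is set; here we just return success/failure.
        ok = (addr == self.reserved_addr) and not self.conflict_flag
        self.reserved_addr = None
        return ok
```

For example, an intervening store to the reserved address causes the subsequent store-conditional to fail, while an unrelated store does not.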

Publication date: 09-08-2012

Coordinated writeback of dirty cachelines

Number: US20120203968A1
Assignee: International Business Machines Corp

A data processing system includes a processor core and a cache memory hierarchy coupled to the processor core. The cache memory hierarchy includes at least one upper level cache and a lowest level cache. A memory controller is coupled to the lowest level cache and to a system memory and includes a physical write queue from which the memory controller writes data to the system memory. The memory controller initiates accesses to the lowest level cache to place into the physical write queue selected cachelines having spatial locality with data present in the physical write queue.
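As an illustration of the selection step, here is a minimal sketch (the function name and the 4 KiB locality granule are my assumptions) of picking dirty cachelines that share a region with data already waiting in the physical write queue.

```python
REGION = 4096  # assumed "spatial locality" granularity, e.g. one DRAM page

def select_coordinated_writebacks(dirty_lines, write_queue):
    """Return dirty cacheline addresses that fall in the same region as
    data already present in the physical write queue."""
    queued_regions = {addr // REGION for addr in write_queue}
    return [a for a in dirty_lines if a // REGION in queued_regions]
```

Writing such lines together lets the memory controller batch writes to the same open DRAM page.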

Publication date: 09-08-2012

MEMORY BUS WRITE PRIORITIZATION

Number: US20120203969A1

A data processing system includes a multi-level cache hierarchy including a lowest level cache, a processor core coupled to the multi-level cache hierarchy, and a memory controller coupled to the lowest level cache and to a memory bus of a system memory. The memory controller includes a physical read queue that buffers data read from the system memory via the memory bus and a physical write queue that buffers data to be written to the system memory via the memory bus. The memory controller grants priority to write operations over read operations on the memory bus based upon a number of dirty cachelines in the lowest level cache memory.

1. A method of data processing in a data processing system including a processor core, a multi-level cache memory hierarchy including a lowest level cache, a memory controller, and a system memory coupled to the memory controller by a memory bus, said method comprising:
the memory controller establishing a priority of read operations over write operations on the memory bus;
the memory controller temporarily granting priority to write operations over read operations on the memory bus based upon a number of dirty cachelines in the lowest level cache memory; and
while write operations have priority over read operations on the memory bus, the memory controller issuing one or more write operations from a physical write queue of the memory controller to the system memory via the memory bus.

2. The method of claim 1, wherein:
the lowest level cache includes a first subset of storage locations allocated as a virtual write queue accessible to the memory controller and a second subset of storage locations; and
temporarily granting priority to write operations comprises temporarily granting priority to write operations over read operations based upon a number of dirty cachelines within the virtual write queue but not based upon a number of dirty cachelines within the second subset of storage locations.

3. The method of claim 2, wherein: the lowest ...
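The temporary priority flip can be sketched as a simple hysteresis policy. The high/low watermarks and the class name are my assumptions; the patent only says priority is granted "based upon a number of dirty cachelines".

```python
class WritePriorityPolicy:
    """Reads normally win; writes win while the dirty-line count stays
    above a low-water mark after crossing a high-water mark."""

    def __init__(self, high=48, low=16):
        self.high, self.low = high, low
        self.writes_prioritized = False

    def update(self, dirty_count):
        if dirty_count >= self.high:
            self.writes_prioritized = True    # temporarily grant writes priority
        elif dirty_count <= self.low:
            self.writes_prioritized = False   # revert to read priority
        return self.writes_prioritized
```

The hysteresis band avoids thrashing between read and write priority as lines drain from the queue.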

Publication date: 16-08-2012

CACHE-BASED SPECULATION OF STORES FOLLOWING SYNCHRONIZING OPERATIONS

Number: US20120210072A1

A method of processing store requests in a data processing system includes enqueuing a store request in a store queue of a cache memory of the data processing system. The store request identifies a target memory block by a target address and specifies store data. While the store request and a barrier request older than the store request are enqueued in the store queue, a read-claim machine of the cache memory is dispatched to acquire coherence ownership of the target memory block of the store request. After coherence ownership of the target memory block is acquired and the barrier request has been retired from the store queue, a cache array of the cache memory is updated with the store data.

1. A method of processing store requests in a data processing system, the method comprising:
enqueuing a store request in a store queue of a cache memory of the data processing system, the store request identifying a target memory block by a target address and specifying store data;
while the store request and a barrier request older than the store request are enqueued in the store queue, dispatching a read-claim machine of the cache memory to acquire coherence ownership of the target memory block of the store request; and
after coherence ownership of the target memory block is acquired and the barrier request has been retired from the store queue, updating a cache array of the cache memory with the store data.

2. The method of claim 1, and further comprising:
in response to the dispatching of the read-claim machine, the read-claim machine initiating an operation to acquire the target memory block; and
in response to receipt of the target memory block and presence of the older barrier request in the store queue, installing the target memory block unmodified by store data of the store request into a cache array of the cache memory.

3. The method of claim 2, and further comprising releasing the read-claim machine in response to the installing.

4. The method of claim 1, wherein: the read- ...
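The ordering constraint above (acquire ownership early, but update the cache array only after both ownership is held and the barrier has retired) can be sketched as an event replay. Event and action names are invented for illustration.

```python
def process_store_past_barrier(events):
    """Replay events and return the externally visible actions, in order.
    Events: 'enqueue', 'ownership_acquired', 'barrier_retired'."""
    actions = []
    owned = barrier_retired = False
    for ev in events:
        if ev == 'enqueue':
            actions.append('dispatch_rc_machine')  # start acquiring ownership early
        elif ev == 'ownership_acquired':
            owned = True
        elif ev == 'barrier_retired':
            barrier_retired = True
        if owned and barrier_retired and 'update_cache_array' not in actions:
            actions.append('update_cache_array')   # only after both conditions hold
    return actions
```

Whichever of ownership or barrier retirement happens first, the array update is deferred until both have occurred.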

Publication date: 18-10-2012

PERFORMING A PARTIAL CACHE LINE STORAGE-MODIFYING OPERATION BASED UPON A HINT

Number: US20120265938A1

Analyzing pre-processed code includes identifying at least one storage-modifying construct specifying a storage-modifying memory access to a memory hierarchy of a data processing system and determining if more than one granule of a cache line of data containing multiple granules that is targeted by the storage-modifying construct is subsequently referenced by said pre-processed code. Post-processed code including a storage-modifying instruction corresponding to the at least one storage-modifying construct in the pre-processed code is generated and stored. Generating the post-processed code includes marking the storage-modifying instruction with a partial cache line hint indicating that said storage-modifying instruction targets less than a full cache line of data within a memory hierarchy if the analyzing indicates only one granule of the target cache line will be accessed while the cache line is held in the cache memory and otherwise refraining from marking the storage-modifying instruction with the partial cache line hint. 1. 
A program product, comprising:
a computer readable storage medium; and
analyzing pre-processed code, wherein the analyzing includes:
identifying at least one storage-modifying construct specifying a storage-modifying memory access to a memory hierarchy of a data processing system; and
determining if more than one granule of a cache line of data containing multiple granules that is targeted by said at least one storage-modifying construct is subsequently referenced by said pre-processed code; and
generating post-processed code including a storage-modifying instruction corresponding to the at least one storage-modifying construct in the pre-processed code, wherein the generating includes marking the storage-modifying instruction with a partial cache line hint indicating that said storage-modifying instruction targets less than a full cache line of data within a memory hierarchy if the analyzing indicates only one granule of the target ...
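The analysis step can be sketched as a toy compiler pass. The granule count, sizes, and function name are my assumptions; the function decides whether a store should carry the partial-cache-line hint.

```python
GRANULES_PER_LINE = 4   # assumed granules per cache line
GRANULE = 32            # assumed bytes per granule (128-byte line)

def should_mark_partial(store_addr, later_refs):
    """Mark the store with the hint only if the later code touches a single
    granule of the store's target cache line."""
    line = store_addr // (GRANULES_PER_LINE * GRANULE)
    touched = {(a // GRANULE) % GRANULES_PER_LINE
               for a in later_refs
               if a // (GRANULES_PER_LINE * GRANULE) == line}
    touched.add((store_addr // GRANULE) % GRANULES_PER_LINE)
    return len(touched) == 1
```

If the hint is set, the hardware can avoid fetching the full line for a store that will never need the rest of it.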

Publication date: 22-11-2012

FACILITATING DATA COHERENCY USING IN-MEMORY TAG BITS AND FAULTING STORES

Number: US20120297109A1

Fine-grained detection of data modification of original data is provided by associating separate guard bits with granules of memory storing the original data from which translated data has been obtained. The guard bits facilitate indicating whether the original data stored in the associated granule is indicated as protected. The guard bits are set and cleared by special-purpose instructions. Responsive to initiating a data store operation to modify the original data, the associated guard bit(s) are checked to determine whether the original data is indicated as protected. Responsive to the checking indicating that a guard bit is set for the associated original data, the data store operation to modify the original data is faulted and the translated data is discarded, thereby facilitating data coherency between the original data and the translated data.

1. A computer program product for facilitating data coherency, the computer program product comprising:
a non-transitory storage medium readable by a processor and storing instructions for execution by the processor to perform a method comprising:
responsive to initiating a data store operation to modify original data from which translated data has been obtained, checking whether at least one guard bit associated with the original data is set to indicate protection of the original data, and responsive to the checking indicating that the at least one guard bit is set:
faulting the data store operation; and
initiating discarding of the translated data.

2. The computer program product of claim 1, wherein the original data is stored in portions across multiple granules of memory of a single memory page, the single memory page comprising a plurality of granules of memory, each respective granule of memory of the multiple granules of memory having associated therewith a guard bit for indicating protection of that granule of memory, wherein the plurality of granules of memory of the single ...
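A minimal model of the guard-bit mechanism, with an invented API: protecting a granule sets its guard bit and records the derived translation; a store that hits a set bit faults and discards that translation.

```python
GRANULE = 64  # assumed granule size in bytes

class GuardedMemory:
    def __init__(self):
        self.guard_bits = set()   # granule indices with guard bit set
        self.translations = {}    # granule index -> translated data

    def protect(self, addr, translated):
        g = addr // GRANULE
        self.guard_bits.add(g)    # "set guard bit" special-purpose instruction
        self.translations[g] = translated

    def store(self, addr):
        g = addr // GRANULE
        if g in self.guard_bits:  # fine-grained modification detected
            self.translations.pop(g, None)  # discard stale translation
            self.guard_bits.discard(g)
            return 'fault'
        return 'ok'
```

A typical use is keeping JIT-translated code coherent with the original instructions it was derived from.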

Publication date: 20-12-2012

AGGREGATE SYMMETRIC MULTIPROCESSOR SYSTEM

Number: US20120324190A1
Author: STARKE William J.

An aggregate symmetric multiprocessor (SMP) data processing system includes a first SMP computer including at least first and second processing units and a first system memory pool and a second SMP computer including at least third and fourth processing units and second and third system memory pools. The second system memory pool is a restricted access memory pool inaccessible to the fourth processing unit and accessible to at least the second and third processing units, and the third system memory pool is accessible to both the third and fourth processing units. An interconnect couples the second processing unit in the first SMP computer for load-store coherent, ordered access to the second system memory pool in the second SMP computer, such that the second processing unit in the first SMP computer and the second system memory pool in the second SMP computer form a synthetic third SMP computer.

1. An aggregate symmetric multiprocessor (SMP) data processing system, comprising:
a first SMP computer including at least first and second processing units and a first system memory pool;
a second SMP computer including at least third and fourth processing units and second and third system memory pools, wherein the second system memory pool is a restricted access memory pool inaccessible to the fourth processing unit and accessible to at least the second and third processing units and the third system memory pool is accessible to both the third and fourth processing units; and
an interconnect coupling the second processing unit in the first SMP computer for load-store coherent, ordered access to the second system memory pool in the second SMP computer, wherein the second processing unit in the first SMP computer and the second system memory pool in the second SMP computer form a synthetic third SMP computer.

2. The aggregate symmetric multiprocessor (SMP) data processing system of claim 1, wherein the first SMP computer further includes a fourth system memory pool that is a ...
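The pool topology in claim 1 can be expressed as a toy access-control table (unit and pool names are mine): the restricted second pool excludes the fourth processing unit while admitting the second and third.

```python
# Which processing units may access which system memory pools.
ACCESS = {
    'pool1': {'unit1', 'unit2'},   # first SMP computer's pool
    'pool2': {'unit2', 'unit3'},   # restricted pool: unit4 excluded
    'pool3': {'unit3', 'unit4'},   # open to both units of the second SMP
}

def can_access(unit, pool):
    return unit in ACCESS[pool]
```

The pair (unit2, pool2) spanning the two physical machines is what the abstract calls the synthetic third SMP computer.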

Publication date: 08-08-2013

PROCESSOR PERFORMANCE IMPROVEMENT FOR INSTRUCTION SEQUENCES THAT INCLUDE BARRIER INSTRUCTIONS

Number: US20130205120A1

A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining that the load instruction is resolved based upon receipt of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes, if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating, in response to determining the barrier instruction completed, execution of the subsequent memory access instruction. The technique further includes, if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing, in response to determining the barrier instruction completed, tracking of the subsequent memory access instruction with respect to invalidation.

1.-7. (canceled)

8. A data processing system configured to process an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction, the data processing system comprising:
a cache memory; and
a processor core configured to:
determine that the load instruction is resolved based upon receipt by the processor core of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction;
if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating by the processor core, in response to determining the barrier instruction completed, execution of the subsequent memory access instruction; and
if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing by the processor ...

Publication date: 08-08-2013

PROCESSOR PERFORMANCE IMPROVEMENT FOR INSTRUCTION SEQUENCES THAT INCLUDE BARRIER INSTRUCTIONS

Number: US20130205121A1

A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining, by a processor core, that the load instruction is resolved based upon receipt by the processor core of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes, if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating by the processor core, in response to determining the barrier instruction completed, execution of the subsequent memory access instruction. The technique further includes, if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing by the processor core, in response to determining the barrier instruction completed, tracking of the subsequent memory access instruction with respect to invalidation.

1. A method of processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction, the method comprising:
determining, by a processor core, that the load instruction is resolved based upon receipt by the processor core of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction;
if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating by the processor core, in response to determining the barrier instruction completed, execution of the subsequent memory access instruction; and
if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, ...
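The "earliest of" resolution rule shared by this patent family can be sketched numerically: each load resolves at the earlier of its good combined response and its data arrival, and the barrier can complete once every preceding load has resolved. Cycle numbers are arbitrary and the function name is mine.

```python
def barrier_complete_cycle(loads):
    """loads: list of (combined_response_cycle, data_cycle) pairs for the
    loads preceding the barrier. Each load resolves at the earlier of its
    two events; the barrier completes once the last load has resolved."""
    return max(min(cr, d) for cr, d in loads)
```

Resolving on the combined response rather than waiting for data lets the barrier (and the accesses after it) complete earlier.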

Publication date: 03-10-2013

DATA CACHE BLOCK DEALLOCATE REQUESTS

Number: US20130262769A1

A data processing system includes a processor core supported by upper and lower level caches. In response to executing a deallocate instruction in the processor core, a deallocation request is sent from the processor core to the lower level cache, the deallocation request specifying a target address associated with a target cache line. In response to receipt of the deallocation request at the lower level cache, a determination is made if the target address hits in the lower level cache. In response to determining that the target address hits in the lower level cache, the target cache line is retained in a data array of the lower level cache and a replacement order field in a directory of the lower level cache is updated such that the target cache line is more likely to be evicted from the lower level cache in response to a subsequent cache miss.

1. A method of data processing in a data processing system including a processor core supported by upper and lower level caches, the method comprising:
in response to executing a deallocate instruction in the processor core, sending a deallocation request from the processor core to the lower level cache, the deallocation request specifying a target address associated with a target cache line;
in response to receipt of the deallocation request at the lower level cache, determining if the target address hits in the lower level cache; and
in response to determining that the target address hits in the lower level cache, retaining the target cache line in a data array of the lower level cache and updating a replacement order field in a directory of the lower level cache such that the target cache line is more likely to be evicted from the lower level cache in response to a subsequent cache miss in a congruence class including the target cache line.

2. The method of claim 1, wherein updating the replacement order field includes making the target cache line least recently used (LRU).

3. The method of claim 1, and further comprising: ...
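The replacement-order update can be sketched with a small congruence-class model (a hypothetical list-based LRU, index 0 being the next victim): a deallocation hint retains the line but demotes it to LRU.

```python
class CongruenceClass:
    def __init__(self, lines):
        self.order = list(lines)   # index 0 = LRU (next victim), last = MRU

    def deallocate_hint(self, tag):
        if tag in self.order:      # hit: retain the data, demote to LRU
            self.order.remove(tag)
            self.order.insert(0, tag)
        # on a miss the hint is simply ignored

    def victim(self):
        return self.order[0]       # evicted on the next miss in this class
```

The line's data stays valid, so a late re-reference still hits; only its eviction priority changes.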

Publication date: 03-10-2013

DATA CACHE BLOCK DEALLOCATE REQUESTS IN A MULTI-LEVEL CACHE HIERARCHY

Number: US20130262770A1

In response to executing a deallocate instruction, a deallocation request specifying a target address of a target cache line is sent from a processor core to a lower level cache. In response, a determination is made if the target address hits in the lower level cache. If so, the target cache line is retained in a data array of the lower level cache, and a replacement order field of the lower level cache is updated such that the target cache line is more likely to be evicted in response to a subsequent cache miss in a congruence class including the target cache line. In response to the subsequent cache miss, the target cache line is cast out to the lower level cache with an indication that the target cache line was a target of a previous deallocation request of the processor core.

1. A method of data processing in a data processing system including a processor core supported by upper and lower level caches, the method comprising:
in response to executing a deallocate instruction in the processor core, sending a deallocation request from the processor core to the lower level cache, the deallocation request specifying a target address associated with a target cache line;
in response to receipt of the deallocation request at the lower level cache, determining if the target address hits in the lower level cache;
in response to determining that the target address hits in the lower level cache, retaining the target cache line in a data array of the lower level cache and updating a replacement order field in a directory of the lower level cache such that the target cache line is more likely to be evicted from the lower level cache in response to a subsequent cache miss in a congruence class including the target cache line; and
in response to the subsequent cache miss, casting out the target cache line to the lower level cache with an indication that the target cache line was a target of a previous deallocation request of the processor core.

2. The method of claim 1, wherein ...
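The castout indication can be sketched as a flagged message plus a receiving-cache install policy. The message format and the install-at-LRU choice are my illustration, not the patent's exact mechanism.

```python
def cast_out(line, was_deallocation_target):
    """Castout message sent down the cache hierarchy on eviction."""
    return {'line': line, 'deallocation_hint': was_deallocation_target}

def install(order, msg):
    """order: replacement order list, index 0 = LRU. Returns the new order.
    A hinted line is installed at LRU so it is the next victim; a normal
    castout is installed at MRU."""
    if msg['deallocation_hint']:
        return [msg['line']] + order
    return order + [msg['line']]
```

Propagating the hint keeps a deallocated line from lingering at MRU in every level of the hierarchy it passes through.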

Publication date: 03-10-2013

DATA CACHE BLOCK DEALLOCATE REQUESTS

Number: US20130262777A1

A data processing system includes a processor core supported by upper and lower level caches. In response to executing a deallocate instruction in the processor core, a deallocation request is sent from the processor core to the lower level cache, the deallocation request specifying a target address associated with a target cache line. In response to receipt of the deallocation request at the lower level cache, a determination is made if the target address hits in the lower level cache. In response to determining that the target address hits in the lower level cache, the target cache line is retained in a data array of the lower level cache and a replacement order field in a directory of the lower level cache is updated such that the target cache line is more likely to be evicted from the lower level cache in response to a subsequent cache miss.

1.-11. (canceled)

12. A processing unit, comprising:
a processor core including an upper level cache, wherein the upper level cache, in response to executing a deallocate instruction, sends a deallocation request specifying a target address associated with a target cache line; and
a lower level cache coupled to the processor core, the lower level cache including a data array, a directory of contents of the data array, and control logic, wherein the control logic, in response to receipt of the deallocation request at the lower level cache, determines if the target address hits in the directory and, in response to determining that the target address hits in the directory, retains the target cache line in the data array and updates a replacement order field in the directory such that the target cache line is more likely to be evicted from the lower level cache in response to a subsequent cache miss in a congruence class including the target cache line.

13. The processing unit of claim 12, wherein, in response to receipt of the deallocation request, the control logic updates the replacement order field to make ...

Publication date: 03-10-2013

DATA CACHE BLOCK DEALLOCATE REQUESTS IN A MULTI-LEVEL CACHE HIERARCHY

Number: US20130262778A1

In response to executing a deallocate instruction, a deallocation request specifying a target address of a target cache line is sent from a processor core to a lower level cache. In response, a determination is made if the target address hits in the lower level cache. If so, the target cache line is retained in a data array of the lower level cache, and a replacement order field of the lower level cache is updated such that the target cache line is more likely to be evicted in response to a subsequent cache miss in a congruence class including the target cache line. In response to the subsequent cache miss, the target cache line is cast out to the lower level cache with an indication that the target cache line was a target of a previous deallocation request of the processor core.

1.-10. (canceled)

11. A processing unit, comprising:
a processor core including an upper level cache, wherein the upper level cache, in response to executing a deallocate instruction, sends a deallocation request specifying a target address associated with a target cache line; and
a lower level cache coupled to the processor core, the lower level cache including a data array, a directory of contents of the data array, and control logic, wherein the control logic, in response to receipt of the deallocation request at the lower level cache, determines if the target address hits in the directory and, in response to determining that the target address hits in the directory, retains the target cache line in the data array and updates a replacement order field in the directory such that the target cache line is more likely to be evicted from the lower level cache in response to a subsequent cache miss in a congruence class including the target cache line, and wherein the control logic, in response to the subsequent cache miss, casts out the target cache line to the lower level cache with an indication that the target cache line was a target of a previous deallocation request of the processor core.
...

Publication date: 05-01-2017

TRANSACTIONAL STORAGE ACCESSES SUPPORTING DIFFERING PRIORITY LEVELS

Number: US20170004004A1
Assignee:

In at least some embodiments, a cache memory of a data processing system receives a transactional memory access request including a target address and a priority of the requesting memory transaction. In response, transactional memory logic detects a conflict for the target address with a transaction footprint of an existing memory transaction and accesses a priority of the existing memory transaction. In response to detecting the conflict, the transactional memory logic resolves the conflict by causing the cache memory to fail the requesting or existing memory transaction based at least in part on their relative priorities. Resolving the conflict includes at least causing the cache memory to fail the existing memory transaction when the requesting memory transaction has a higher priority than the existing memory transaction, the transactional memory access request is a transactional load request, and the target address is within a store footprint of the existing memory transaction.

1. A method of data processing in a data processing system, the method comprising:
at a cache memory of a data processing system, receiving a transactional memory access request generated by execution by a processor core of a transactional memory access instruction within a requesting memory transaction, wherein the transactional memory access request includes a target address of data to be accessed and indicates a priority of the requesting memory transaction;
in response to the transactional memory access request, transactional memory logic detecting a conflict for the target address between the transactional memory access request and a transaction footprint of an existing memory transaction;
the transactional memory logic accessing a priority of the existing memory transaction; and
causing the cache memory to fail the existing memory transaction when the requesting memory transaction has a higher priority than the existing memory transaction, the transactional memory access request is a ...
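The resolution rule spelled out at the end of the abstract can be sketched as a small decision function. Only the transactional-load-versus-store-footprint case is modeled; the behavior for equal priorities is my assumption, since the abstract only specifies the strictly-higher-priority case.

```python
def resolve_conflict(req_priority, existing_priority, is_load, in_store_footprint):
    """Return which transaction fails: 'existing', 'requesting', or None.
    Models only the case the abstract spells out: a transactional load
    hitting the existing transaction's store footprint."""
    if not (is_load and in_store_footprint):
        return None
    if req_priority > existing_priority:
        return 'existing'      # higher-priority requester wins
    return 'requesting'        # otherwise the requester fails (assumed)
```

This lets a high-priority transaction make forward progress instead of being repeatedly aborted by lower-priority conflicting work.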

Publication date: 05-01-2017

TRANSACTIONAL STORAGE ACCESSES SUPPORTING DIFFERING PRIORITY LEVELS

Number: US20170004085A1
Assignee:

In at least some embodiments, a cache memory of a data processing system receives a transactional memory access request including a target address and a priority of the requesting memory transaction. In response, transactional memory logic detects a conflict for the target address with a transaction footprint of an existing memory transaction and accesses a priority of the existing memory transaction. In response to detecting the conflict, the transactional memory logic resolves the conflict by causing the cache memory to fail the requesting or existing memory transaction based at least in part on their relative priorities. Resolving the conflict includes at least causing the cache memory to fail the existing memory transaction when the requesting memory transaction has a higher priority than the existing memory transaction, the transactional memory access request is a transactional load request, and the target address is within a store footprint of the existing memory transaction.

1.-6. (canceled)

7. A processing unit, comprising:
a processor core;
a cache memory, coupled to the processor core, that receives a transactional memory access request generated by execution of a transactional memory access instruction within a requesting memory transaction, wherein the transactional memory access request includes a target address of data to be accessed and indicates a priority of the requesting memory transaction; and
transactional memory logic configured to:
responsive to the transactional memory access request, detect a conflict for the target address between the transactional memory access request and a transaction footprint of an existing memory transaction;
access a priority of the existing memory transaction; and
cause the cache memory to fail the existing memory transaction when the requesting memory transaction has a higher priority than the existing memory transaction, the transactional memory access request is a transactional load request, and the target address is within a store footprint of the ...

Publication date: 21-01-2021

MULTIPLE CHIP BRIDGE CONNECTOR

Number: US20210020529A1
Assignee:

The present invention includes a bridge connector with one or more semiconductor layers in a bridge connector shape. The shape has one or more edges, one or more bridge connector contacts on a surface of the shape, and one or more bridge connectors. The bridge connectors run through one or more of the semiconductor layers and connect two or more of the bridge connector contacts. The bridge connector contacts are within a tolerance distance from one of the edges. In some embodiments the bridge connector is a central bridge connector that connects two or more chips disposed on the substrate of a multi-chip module (MCM). The chips have chip contacts that are on an interior corner of the chip. The interior corners face one another. The central bridge connector overlaps the interior corners so that each of one or more of the bridge contacts is in electrical contact with each of one or more of the chip contacts. In some embodiments, overlap is minimized to permit more access to the surface of the chips. Arrays of MCMs and methods of making bridge connectors are disclosed. Bridge connector shapes include: rectangular, window pane, plus-shaped, circular, and polygonal.
1: A bridge connector comprising: one or more semiconductor layers in a bridge connector shape, the shape having one or more edges; one or more bridge connector contacts on a surface of the bridge connector shape, all the bridge connector contacts being within a tolerance distance from one or more of the edges; one or more bridge connectors, the bridge connectors running through one or more of the semiconductor layers and connecting two or more of the bridge connector contacts; and an opening passing completely through the bridge connector, the opening having an opening shape, the opening being centered within the bridge connector and not in the tolerance distance. 2: A connector, as in claim 1, where the semiconductor layers contain one or more active components. 3: A connector, as in ...
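As a rough illustration of the tolerance-distance constraint in claim 1 (all names and dimensions below are invented), a contact satisfies it when its distance to the nearest edge of a rectangular connector is within the tolerance, while a centered opening does not:

```python
# Toy geometry check: contacts must lie within `tol` of an edge of a
# (0,0)-(width,height) rectangle; a centered opening must not.
def within_tolerance_of_edge(x, y, width, height, tol):
    """True if (x, y) is within `tol` of the nearest rectangle edge."""
    nearest = min(x, width - x, y, height - y)
    return nearest <= tol

W, H, TOL = 10.0, 6.0, 0.5                    # illustrative dimensions
contacts = [(0.2, 5.0), (9.9, 1.0), (5.0, 0.1)]
assert all(within_tolerance_of_edge(x, y, W, H, TOL) for x, y in contacts)

opening_center = (W / 2, H / 2)               # centered opening
print(within_tolerance_of_edge(*opening_center, W, H, TOL))  # False
```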

17-02-2022 publication date

MULTIFUNCTION COMMUNICATION INTERFACE SUPPORTING MEMORY SHARING AMONG DATA PROCESSING SYSTEMS

Number: US20220050787A1
Assignee:

In a data processing environment, a communication interface of a second host data processing system receives, from a first host data processing system, a host command in a first command set. The host command specifies a memory access to a memory coupled to the second host data processing system. The communication interface translates the host command into a command in a different second command set emulating coupling of an attached functional unit to the communication interface. The communication interface presents the second command to a host bus protocol interface of the second host data processing system. Based on receipt of the second command, the host bus protocol interface initiates, on a system fabric of the second host data processing system, a host bus protocol memory access request specifying the memory access. 1. A method of communication in a data processing environment , the method comprising:receiving, from a first host data processing system at a communication interface of a second host data processing system, a host command in a first command set, the host command specifying a memory access to a memory coupled to the second host data processing system;translating the host command into a command in a different second command set emulating coupling of an attached functional unit to the communication interface;presenting the second command to a host bus protocol interface of the second host data processing system; andbased on receipt of the second command, initiating, by the host bus protocol interface on a system fabric of the second host data processing system, a host bus protocol memory access request specifying the memory access.2. The method of claim 1 , wherein:the second command specifies an address;the method further comprises pinning, in a page frame table of the second host data processing system, a page table entry for the address.3. 
The method of claim 1, wherein: the communication interface is a first communication interface; the second host ...
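A minimal sketch of the command-set translation idea: a command in a first (host) command set is re-expressed in a second command set that emulates an attached functional unit. The command names and fields are invented for illustration; the patent does not specify concrete encodings.

```python
# Translate a host memory-access command into an emulated second-set command.
def translate_host_command(host_cmd: dict) -> dict:
    """Map a first-command-set memory access to a second-set command that
    emulates an attached functional unit (names are illustrative)."""
    op_map = {"HOST_READ": "AFU_MEM_RD", "HOST_WRITE": "AFU_MEM_WR"}
    return {
        "op": op_map[host_cmd["op"]],
        "addr": host_cmd["addr"],   # address carried through unchanged
        "len": host_cmd["len"],
    }

host_cmd = {"op": "HOST_READ", "addr": 0x8000_0000, "len": 128}
afu_cmd = translate_host_command(host_cmd)
print(afu_cmd["op"])  # AFU_MEM_RD
```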

31-01-2019 publication date

SECURE MEMORY IMPLEMENTATION FOR SECURE EXECUTION OF VIRTUAL MACHINES

Number: US20190034627A1
Assignee:

An embodiment involves secure memory implementation for secure execution of virtual machines. Data is processed in a first mode and a second mode, and commands are sent to a chip interconnect bus using real addresses, wherein the chip interconnect bus includes a number of bits for the real addresses. A memory controller is operatively coupled to a memory component. A secure memory range is specified by using range registers. If the real address is detected to be in the secure memory range to match a memory component address, a real address bit is set. If the real address is in the memory address hole, a security access violation is detected. If the real address is not in the secure address range and the real address bit is set, the security access violation is detected.

1. A computer system comprising: a processor to process data in a first mode and a second mode, and send commands to a chip interconnect bus using real addresses, wherein the chip interconnect bus transports a number of bits for the real addresses, wherein the chip interconnect bus is larger than a number of bits needed for a maximum memory range supported by the computer system, and wherein a first portion of the bits for real addresses which are not in the range of the supported maximum memory range is used to indicate whether to operate in the first mode or the second mode, creating a memory address hole; a memory controller operatively coupled to a memory component; the processor further being capable of: specifying a secure memory range by using range registers; responsive to determining that the real address is detected to be in the secure memory range to match a memory component address, setting a real address bit; responsive to determining that the real address is in the memory address hole, detecting a security access violation; and responsive to determining that the real address is not in the secure address range and the real address bit is set, detecting the security access violation. ...
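A toy model of the address checks above, under assumed widths (`MAX_MEM` and `SECURE_BIT` are illustrative choices, not values from the patent): the supported memory tops out at 2^40, addresses above it form the address hole, and bit 41 is the real-address bit selecting secure mode.

```python
# Toy secure-address check: secure range -> set the bit; address hole or a
# stray secure bit outside the range -> security access violation.
MAX_MEM = 1 << 40      # assumed maximum supported memory range
SECURE_BIT = 1 << 41   # assumed real-address bit selecting secure mode

def check_real_address(ra, secure_base, secure_limit):
    """Return ('secure' | 'violation' | 'normal', adjusted_real_address)."""
    addr = ra & (SECURE_BIT - 1)
    if addr >= MAX_MEM:                    # in the memory address hole
        return "violation", ra
    if secure_base <= addr < secure_limit:
        return "secure", ra | SECURE_BIT   # set the real address bit
    if ra & SECURE_BIT:                    # secure bit set outside the range
        return "violation", ra
    return "normal", ra

res, adjusted = check_real_address(0x1000, secure_base=0x0, secure_limit=0x10000)
print(res)  # secure
```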

31-01-2019 publication date

SECURE MEMORY IMPLEMENTATION FOR SECURE EXECUTION OF VIRTUAL MACHINES

Number: US20190034628A1
Assignee:

Secure memory implementation for secure execution of virtual machines. Data is processed in a first mode and a second mode, and commands are sent to a chip interconnect bus using real addresses, wherein the chip interconnect bus includes a number of bits for the real addresses. A memory controller is operatively coupled to a memory component. A secure memory range is specified by using range registers. If the real address is detected to be in the secure memory range to match a memory component address, a real address bit is inverted. If the real address is in the secure memory address hole, a security access violation is detected. If the real address is not in the secure address range and the real address bit is set, the security access violation is detected. 1. A method comprising:processing, by one or more computer processors, data in a first mode and a second mode, wherein a data processing unit is configured to send commands to a chip interconnect bus using real addresses;wherein the chip interconnect bus includes a number of bits for the real addresses;wherein the chip interconnect bus is larger than a number of bits needed for a maximum memory range supported by the computer system;wherein a first portion of the bits for real addresses which are not in the range of the supported maximum memory range is used to indicate whether to operate in the first mode or the second mode creating a memory address hole, and wherein a memory controller operatively coupled to a memory component;specifying, by the one or more computer processors, a secure memory range by using range registers;responsive to determining that the real address is detected to be in the secure memory range to match a memory component address, inverting, by the one or more computer processors, a real address bit;responsive to determining that the real address is in the secure memory address hole, detecting, by the one or more computer processors, a security access violation; andresponsive to ...

07-02-2019 publication date

Techniques for requesting data associated with a cache line in symmetric multiprocessor systems

Number: US20190042428A1
Assignee: International Business Machines Corp

A technique for operating a data processing system includes transitioning, by a cache, to a highest point of coherency (HPC) for a cache line in a required state without receiving data for one or more segments of the cache line that are needed. The cache issues a command to a lowest point of coherency (LPC) that requests data for the one or more segments of the cache line that were not received and are needed. The cache receives the data for the one or more segments of the cache line from the LPC that were not previously received and were needed.
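The segment-level fill can be modeled roughly as follows; the class and function names are invented for illustration (in hardware, the HPC cache and LPC memory controller exchange these requests over the interconnect):

```python
# Toy model: a cache holds a line as HPC but is missing segments; it fetches
# only the needed, not-yet-received segments from the LPC.
class CacheLine:
    SEGMENTS = 4
    def __init__(self):
        self.valid = [False] * self.SEGMENTS  # which segments hold data

    def missing_segments(self, needed):
        return [s for s in needed if not self.valid[s]]

def fetch_from_lpc(line, needed, lpc_data):
    """Request from the LPC only the needed segments not yet received."""
    for seg in line.missing_segments(needed):
        line.valid[seg] = True                # data arrives from the LPC
    return [lpc_data[s] for s in needed]

line = CacheLine()
line.valid[0] = True                          # segment 0 arrived earlier
data = fetch_from_lpc(line, needed=[0, 2], lpc_data=["a", "b", "c", "d"])
print(line.valid)  # segments 0 and 2 now valid
```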

06-02-2020 publication date

LINK-LEVEL CYCLIC REDUNDANCY CHECK REPLAY FOR NON-BLOCKING COHERENCE FLOW

Number: US20200042449A1

Data processing in a data processing system including a plurality of processing nodes coupled to an interconnect includes receiving, by a fabric controller, a first command from a remote processing node via the interconnect. The fabric controller determines that the command includes a replay indication, the replay indication indicative of a replay event at one or more processing nodes of the plurality of processing nodes. The first command is dropped from a deskew buffer of the fabric controller responsive to the determining that the command includes the replay indication. 1. A method of data processing in a data processing system including a plurality of processing nodes coupled to an interconnect , the method comprising:receiving, by a fabric controller, a first command from a remote processing node via the interconnect;determining, by the fabric controller, that the command includes a replay indication, the replay indication indicative of a replay event at one or more processing nodes of the plurality of processing nodes;returning, by the fabric controller, a combined response to the remote processing node, wherein the combined response is comprised of one or more partial responses received from the plurality of processing nodes; and wherein the partial responses include an indication that the command was dropped; anddropping the first command from a deskew buffer of the fabric controller responsive to the determining that the command includes the replay indication.2. The method of claim 1 , further comprising:storing a partial response associated with the command in an overcommit queue of the fabric controller.3. The method of claim 2 , wherein the partial response includes an indication that the command was dropped.4. The method of claim 2 , further comprising:sending the partial response to the remote processing node via the interconnect.5. 
The method of claim 1, wherein the command is formatted as a link layer packet, the replay indication being in ...
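A minimal model of the replay-drop flow (invented names; real deskew buffers and partial responses are hardware structures, not Python queues): a command carrying a replay indication is dropped from the deskew buffer and a "dropped" partial response is queued.

```python
# Toy fabric controller: drop replayed commands from the deskew buffer and
# record a 'dropped' partial response in an overcommit queue.
from collections import deque

def handle_command(deskew_buffer, overcommit_queue, cmd):
    deskew_buffer.append(cmd)
    if cmd.get("replay"):                     # command carries a replay indication
        deskew_buffer.pop()                   # drop it from the deskew buffer
        overcommit_queue.append({"tag": cmd["tag"], "dropped": True})
        return "dropped"
    return "accepted"

deskew, overcommit = deque(), deque()
assert handle_command(deskew, overcommit, {"tag": 7, "replay": True}) == "dropped"
assert handle_command(deskew, overcommit, {"tag": 8, "replay": False}) == "accepted"
print(len(deskew), overcommit[0]["dropped"])  # 1 True
```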

16-02-2017 publication date

TECHNIQUES FOR IMPLEMENTING BARRIERS TO EFFICIENTLY SUPPORT CUMULATIVITY IN A WEAKLY-ORDERED MEMORY SYSTEM

Number: US20170046264A1
Assignee:

A technique for operating a cache memory of a data processing system includes creating respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory. Dependencies in a dependency data structure of a store queue of the cache memory are set based on the pollution vectors to reduce unnecessary ordering effects. Store operations are dispatched from the store queue in accordance with the dependencies indicated by the dependency data structure. 1. A cache memory , comprising:a data array;a store queue configured to buffer synchronization operations and store operations;a pollution vector control configured to create respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory; anda store queue controller configured to set dependencies in a dependency data structure of the store queue based on the pollution vectors to reduce unnecessary ordering effects and dispatch store operations from the store queue in accordance with the dependencies indicated by the dependency data structure.2. The cache memory of claim 1 , wherein the cache memory is a lower level cache memory and the cache memory is further configured to:compare a load target address of a load operation that hits in an upper level cache memory that is associated with the lower level cache memory to active store target addresses in the lower level cache memory; andin response to a match of the load target address with one or more of the active store target addresses in the lower level cache memory, set an associated one of the respective pollution vectors to indicate that a thread that issued the load operation is polluted by a given store operation active in the lower level cache memory.3. 
The cache memory of claim 1, wherein the cache memory is further configured ...
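The pollution-vector idea can be sketched as a bit vector per store-queue entry, one bit per thread; a barrier from a thread then only orders against stores that pollute that thread. All names below are illustrative:

```python
# Toy store queue with per-entry pollution vectors (one bit per thread).
class StoreEntry:
    def __init__(self, addr):
        self.addr = addr
        self.pollution = 0               # bit i set => thread i is polluted

    def pollute(self, thread_id):
        self.pollution |= 1 << thread_id

def barrier_dependencies(store_queue, thread_id):
    """Indices of stores a barrier from `thread_id` must wait for."""
    mask = 1 << thread_id
    return [i for i, e in enumerate(store_queue) if e.pollution & mask]

sq = [StoreEntry(0x100), StoreEntry(0x200), StoreEntry(0x300)]
sq[0].pollute(1)   # thread 1's load hit the target of store 0
sq[2].pollute(1)   # ...and of store 2
print(barrier_dependencies(sq, 1))  # [0, 2]
```

A barrier from thread 0 depends on no stores here, which is exactly the "reduce unnecessary ordering effects" point: unpolluted threads are not serialized behind unrelated stores.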

22-02-2018 publication date

MEMORY MOVE INSTRUCTION SEQUENCE TARGETING A MEMORY-MAPPED DEVICE

Number: US20180052599A1
Assignee:

A data processing system includes a processor core having a store-in lower level cache, a memory controller, a memory-mapped device, and an interconnect fabric communicatively coupling the lower level cache and the memory-mapped device. In response to a first instruction in the processor core, a copy-type request specifying a source real address is transmitted to the lower level cache. In response to a second instruction in the processor core, a paste-type request specifying a destination real address associated with the memory-mapped device is transmitted to the lower level cache. In response to receipt of the copy-type request, the lower level cache copies a data granule from a storage location specified by the source real address into a non-architected buffer. In response to receipt of the paste-type request, the lower level cache issues on the interconnect fabric a command that writes the data granule from the non-architected buffer to the memory-mapped device. 1. A method of data processing in a data processing system including a processor core having a store-through upper level cache and a store-in lower level cache , a memory controller , a memory-mapped device , and an interconnect fabric communicatively coupling the lower level cache and the memory-mapped device , the method comprising:in response to a first instruction in the processor core, generating a copy-type request and transmitting the copy-type request to the lower level cache, wherein the copy-type request specifies a source real address;in response to a second instruction in the processor core, generating a paste-type request and transmitting the paste-type request to the lower level cache, wherein the paste-type request specifies a destination real address associated with the memory-mapped device;in response to receipt of the copy-type request from the processor core at the lower level cache, the lower level cache copying a data granule from a storage location specified by the source real ...
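The copy/paste pairing described in this family of abstracts can be sketched with plain dictionaries standing in for memory; `L2Cache`, `copy`, and `paste` are invented names for the lower level cache and its copy-type/paste-type request handling:

```python
# Toy model: a copy-type request stages a data granule from the source real
# address into a non-architected buffer; a paste-type request then writes it
# to the destination real address.
class L2Cache:
    def __init__(self, memory):
        self.memory = memory
        self._copy_buffer = None          # non-architected buffer

    def copy(self, src_ra):
        self._copy_buffer = self.memory[src_ra]

    def paste(self, dst_ra):
        self.memory[dst_ra] = self._copy_buffer

mem = {0x1000: b"granule-0", 0x2000: None}
l2 = L2Cache(mem)
l2.copy(0x1000)       # from the core's copy-type request
l2.paste(0x2000)      # from the core's paste-type request
print(mem[0x2000])    # b'granule-0'
```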

22-02-2018 publication date

MEMORY ACCESS IN A DATA PROCESSING SYSTEM UTILIZING COPY AND PASTE INSTRUCTIONS

Number: US20180052605A1
Assignee:

A data processing system includes a processor core having a store-through upper level cache and a store-in lower level cache. In response to a first instruction, the processor core generates a copy-type request and transmits the copy-type request to the lower level cache, where the copy-type request specifies a source real address. In response to a second instruction, the processor core generates a paste-type request and transmits the paste-type request to the lower level cache, where the paste-type request specifies a destination real address. In response to receipt of the copy-type request, the lower level cache copies a data granule from a storage location specified by the source real address into a non-architected buffer, and in response to receipt of the paste-type request, the lower level cache writes the data granule from the non-architected buffer to a storage location specified by the destination real address. 1. A method of data processing in a data processing system including a processor core having a store-through upper level cache and a store-in lower level cache , the method comprising:in response to a first instruction in the processor core, generating a copy-type request and transmitting the copy-type request to the lower level cache, wherein the copy-type request specifies a source real address;in response to a second instruction in the processor core, generating a paste-type request and transmitting the paste-type request to the lower level cache, wherein the paste-type request specifies a destination real address;in response to receipt of the copy-type request from the processor core at the lower level cache, the lower level cache copying a data granule from a storage location specified by the source real address into a non-architected buffer; andin response to receipt of the paste-type request from the processor core at the lower level cache, the lower level cache writing the data granule from the non-architected buffer to a storage location ...

22-02-2018 publication date

EFFICIENT ENFORCEMENT OF BARRIERS WITH RESPECT TO MEMORY MOVE SEQUENCES

Number: US20180052606A1
Assignee:

In a data processing system implementing a weak memory model, a lower level cache receives, from a processor core, a plurality of copy-type requests and a plurality of paste-type requests that together indicate a memory move to be performed. The lower level cache also receives, from the processor core, a barrier request that requests enforcement of ordering of memory access requests prior to the barrier request with respect to memory access requests after the barrier request. In response to the barrier request, the lower level cache enforces a barrier indicated by the barrier request with respect to a final paste-type request ending the memory move but not with respect to other copy-type requests and paste-type requests in the memory move. 1. A method of data processing in a data processing system implementing a weak memory model , wherein the data processing system includes a processor core having a store-through upper level cache and a store-in lower level cache , the method comprising:the lower level cache receiving, from the processor core, a plurality of copy-type requests and a plurality of paste-type requests that together indicate a memory move to be performed;the lower level cache receiving, from the processor core, a barrier request that requests enforcement of ordering of memory access requests prior to the barrier request with respect to memory access requests after the barrier request; andin response to the barrier request, the lower level cache enforcing a barrier indicated by the barrier request with respect to a final paste-type request ending the memory move but not with respect to other copy-type requests and paste-type requests in the memory move.2. 
The method of claim 1, wherein: receiving the barrier request includes receiving the barrier request in a store queue of the lower level cache; receiving the plurality of copy-type requests and the plurality of paste-type requests includes receiving the plurality of copy-type requests and the plurality ...

22-02-2018 publication date

MIGRATION OF MEMORY MOVE INSTRUCTION SEQUENCES BETWEEN HARDWARE THREADS

Number: US20180052607A1
Assignee:

A data processing system includes at least one processor core each having an associated store-through upper level cache and an associated store-in lower level cache. In response to execution of a memory move instruction sequence including a plurality of copy-type instructions and a plurality of paste-type instructions, the at least one processor core transmits a corresponding plurality of copy-type and paste-type requests to its associated lower level cache, where each copy-type request specifies a source real address and each paste-type request specifies a destination real address. In response to receipt of each copy-type request, the associated lower level cache copies a respective data granule from a respective storage location specified by the source real address of that copy-type request into a non-architected buffer. In response to receipt of each paste-type request, the associated lower level cache writes a respective one of the data granules from the non-architected buffer to a respective storage location specified by the destination real address. The memory move instruction sequence begins execution on a first hardware thread and continues on a second hardware thread. 1. A method of data processing in a data processing system including at least one processor core each having an associated store-through upper level cache and an associated store-in lower level cache , the method comprising:in response to execution of a memory move instruction sequence including a plurality of copy-type instructions and a plurality of paste-type instructions, the at least one processor core transmitting a corresponding plurality of copy-type and paste-type requests to its associated lower level cache, wherein each copy-type request specifies a source real address and each paste-type request specifies a destination real address;in response to receipt of each copy-type request, the associated lower level cache copying a respective one of a plurality of data granules from a ...

22-02-2018 publication date

MEMORY MOVE INSTRUCTION SEQUENCE ENABLING SOFTWARE CONTROL

Number: US20180052608A1
Assignee:

A processor core of a data processing system, in response to a first instruction, generates a copy-type request specifying a source real address and transmits it to a lower level cache. In response to a second instruction, the processor core generates a paste-type request specifying a destination real address associated with a memory-mapped device and transmits it to the lower level cache. In response to receipt of the copy-type request, the lower level cache copies a data granule from a storage location specified by the source real address into a non-architected buffer. In response to receipt of the paste-type request, the lower level cache issues a command to write the data granule from the non-architected buffer to the memory-mapped device. In response to receipt from the memory-mapped device of a busy response, the processor core abandons the memory move instruction sequence and performs alternative processing.

1. A method of data processing in a data processing system including a processor core having a store-through upper level cache and a store-in lower level cache, a memory controller, and a memory-mapped device, the method comprising: the processor core executing a memory move instruction sequence, wherein the executing includes: in response to a first instruction, the processor core generating a copy-type request specifying a source real address and transmitting the copy-type request to the lower level cache; and in response to a second instruction, the processor core generating a paste-type request specifying a destination real address associated with the memory-mapped device and transmitting the paste-type request to the lower level cache; in response to the copy-type request, the lower level cache copying a data granule from a storage location specified by the source real address into a non-architected buffer; and in response to the paste-type request, the lower level cache issuing a command to write the data granule from the non-architected buffer to ...

22-02-2018 publication date

Memory move instruction sequence including a stream of copy-type and paste-type instructions

Number: US20180052687A1
Assignee: International Business Machines Corp

A processor core has a store-through upper level cache and a store-in lower level cache. In response to execution of a memory move instruction sequence including a plurality of copy-type instruction and a plurality of paste-type instructions, the processor core transmits a corresponding plurality of copy-type and paste-type requests to the lower level cache, where each copy-type request specifies a source real address and each paste-type request specifies a destination real address. In response to receipt of each copy-type request, the lower level cache copies a respective one of a plurality of data granules from a respective storage location specified by the source real address of that copy-type request into a non-architected buffer. In response to receipt of each paste-type request, the lower level cache writes a respective one of the plurality of data granules from the non-architected buffer to a respective storage location specified by the destination real address of that paste-type request.

22-02-2018 publication date

MEMORY MOVE INSTRUCTION SEQUENCE TARGETING AN ACCELERATOR SWITCHBOARD

Number: US20180052688A1
Assignee:

A processor core of a data processing system, in response to a first instruction, generates a copy-type request specifying a source real address and transmits it to a lower level cache. In response to a second instruction, the processor core generates a paste-type request specifying a destination real address associated with a memory-mapped device and transmits it to the lower level cache. In response to the copy-type request, the lower level cache copies a data granule from a storage location specified by the source real address into a non-architected buffer. In response to the paste-type request, the lower level cache writes the data granule from the non-architected buffer to the memory-mapped device. In response to receipt of the data granule, the memory-mapped device stores the data granule in a queue in the system memory associated with a hardware device of the data processing system. 1. A method of data processing in a data processing system including a processor core having a store-through upper level cache and a store-in lower level cache , a memory controller coupled to a system memory , a memory-mapped device and a hardware device , the method comprising:in response to a first instruction in the processor core, generating a copy-type request and transmitting the copy-type request to the lower level cache, wherein the copy-type request specifies a source real address;in response to a second instruction in the processor core, generating a paste-type request and transmitting the paste-type request to the lower level cache, wherein the paste-type request specifies a destination real address associated with a memory-mapped device;in response to receipt of the copy-type request from the processor core at the lower level cache, the lower level cache copying a data granule from a storage location specified by the source real address into a non-architected buffer;in response to receipt of the paste-type request from the processor core at the lower level cache, the 
...

03-03-2016 publication date

CACHE BACKING STORE FOR TRANSACTIONAL MEMORY

Number: US20160062891A1

In response to a transactional store request, the higher level cache transmits, to the lower level cache, a backup copy of an unaltered target cache line in response to a target real address hitting in the higher level cache, updates the target cache line with store data to obtain an updated target cache line, and records the target real address as belonging to a transaction footprint of the memory transaction. In response to a conflicting access to the transaction footprint prior to completion of the memory transaction, the higher level cache signals failure of the memory transaction to the processor core, invalidates the updated target cache line in the higher level cache, and causes the backup copy of the target cache line in the lower level cache to be restored as a current version of the target cache line.

1.-6. (canceled) 7. A processing unit, comprising: a processor core; a lower level cache; and a higher level cache, wherein: responsive to receipt at the higher level cache of a transactional store request of the processor core generated by execution of a transactional store instruction within a memory transaction, the transactional store request specifying a target real address of a target cache line and store data: in response to the target real address hitting in the higher level cache, the higher level cache transmitting, to the lower level cache, a backup copy of the target cache line unaltered by the store data; the higher level cache updating the target cache line with the store data to obtain an updated target cache line; and the higher level cache recording the target real address as belonging to a transaction footprint of the memory transaction; and responsive to a conflicting access to the transaction footprint prior to completion of the memory transaction, the higher level cache signaling failure of the memory transaction to the processor core, invalidating the updated target cache line in the higher level cache, and causing the backup copy of the target cache line in the ...
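A toy model of the backup-and-restore scheme, with dictionaries standing in for the two cache levels (all names invented): a transactional store backs the unaltered line up to the lower level before updating it, and a conflict restores the backup as the current version.

```python
# Toy two-level cache: L3 holds backup copies of lines transactionally
# stored in L2; a conflict fails the transaction and restores the backup.
class TwoLevelCache:
    def __init__(self):
        self.l2 = {}              # higher level cache (address -> data)
        self.l3 = {}              # lower level cache holding backups
        self.footprint = set()    # transaction footprint

    def tx_store(self, addr, data):
        self.l3[addr] = self.l2[addr]      # back up the unaltered line
        self.l2[addr] = data               # update the line in place
        self.footprint.add(addr)

    def conflict(self, addr):
        if addr in self.footprint:
            self.l2[addr] = self.l3[addr]  # restore backup as current version
            self.footprint.clear()
            return "tx-failed"
        return "no-conflict"

c = TwoLevelCache()
c.l2[0x40] = "old"
c.tx_store(0x40, "new")
result = c.conflict(0x40)
print(result, c.l2[0x40])  # tx-failed old
```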

03-03-2016 publication date

CACHE BACKING STORE FOR TRANSACTIONAL MEMORY

Number: US20160062892A1

In response to a transactional store request, the higher level cache transmits, to the lower level cache, a backup copy of an unaltered target cache line in response to a target real address hitting in the higher level cache, updates the target cache line with store data to obtain an updated target cache line, and records the target real address as belonging to a transaction footprint of the memory transaction. In response to a conflicting access to the transaction footprint prior to completion of the memory transaction, the higher level cache signals failure of the memory transaction to the processor core, invalidates the updated target cache line in the higher level cache, and causes the backup copy of the target cache line in the lower level cache to be restored as a current version of the target cache line.

1. A method of data processing in a data processing system including a plurality of processor cores including a processor core supported by higher and lower level caches, the method comprising: in response to receipt at the higher level cache of a transactional store request of the processor core generated by execution of a transactional store instruction within a memory transaction, the transactional store request specifying a target real address of a target cache line and store data: in response to the target real address hitting in the higher level cache, the higher level cache transmitting, to the lower level cache, a backup copy of the target cache line unaltered by the store data; the higher level cache updating the target cache line with the store data to obtain an updated target cache line; and the higher level cache recording the target real address as belonging to a transaction footprint of the memory transaction; and in response to a conflicting access to the transaction footprint prior to completion of the memory transaction, the higher level cache signaling failure of the memory transaction to the processor core, invalidating the updated ...

22-05-2014 publication date

SELECTIVE POSTED DATA ERROR DETECTION BASED ON REQUEST TYPE

Number: US20140143611A1

In a data processing system, a selection is made, based at least on an access type of a memory access request, between at least a first timing and a second timing of data transmission with respect to completion of error detection processing on a target memory block of the memory access request. In response to receipt of the memory access request and selection of the first timing, data from the target memory block is transmitted to a requestor prior to completion of error detection processing on the target memory block. In response to receipt of the memory access request and selection of the second timing, data from the target memory block is transmitted to the requestor after and in response to completion of error detection processing on the target memory block.

1.-10. (canceled) 11. A memory subsystem of a data processing system, comprising: an error detection circuit; and control logic coupled to system memory, wherein the control logic selects, based at least on an access type of a memory access request, between at least a first timing and a second timing of data transmission with respect to completion of error detection processing by the error detection circuit on a target memory block of the memory access request, wherein the control logic, responsive to receipt of the memory access request and selection of the first timing, causes the memory subsystem to transmit data from the target memory block to a requestor prior to completion of error detection processing on the target memory block by the error detection circuit, and wherein the control logic, responsive to receipt of the memory access request and selection of the second timing, causes the memory subsystem to transmit data from the target memory block to the requestor after and in response to completion of error detection processing on the target memory block by the error detection circuit. 12.
The memory subsystem of claim 11 , wherein the control logic selects the first timing for a memory access request ...
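The timing selection described in this entry can be illustrated with a small model. This is a sketch under assumptions: the speculative request types follow the examples in the claims (demand data loads and data prefetches), and the "posted"/"non-posted" terminology is taken from the entry title, not from any real implementation.

```python
# Illustrative model (not the patented circuit) of selecting between
# posted and non-posted error detection timing by request type.
SPECULATIVE_TYPES = {"demand_load", "data_prefetch"}  # assumed type names

def select_timing(access_type: str) -> str:
    """First timing: forward data before the error check completes (posted).
    Second timing: forward data only after the error check (non-posted)."""
    return "posted" if access_type in SPECULATIVE_TYPES else "non_posted"

def serve(access_type: str, data: bytes, ecc_ok: bool):
    """Return (data, status) for a memory access, modeling both timings."""
    if select_timing(access_type) == "posted":
        # Data goes to the requestor immediately; an error indication may
        # follow later if the deferred check fails.
        return data, ("clean" if ecc_ok else "error_follows")
    # Non-posted: data is released only after error detection completes.
    return (data, "clean") if ecc_ok else (None, "error")
```

The intuition: a speculative load can tolerate a late error signal (the core can be flushed), so its latency is cut by posting the check; other traffic waits for the check to finish.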

Publication date: 22-05-2014

SELECTIVE POSTED DATA ERROR DETECTION BASED ON REQUEST TYPE

Number: US20140143613A1

In a data processing system, a selection is made, based at least on an access type of a memory access request, between at least a first timing and a second timing of data transmission with respect to completion of error detection processing on a target memory block of the memory access request. In response to receipt of the memory access request and selection of the first timing, data from the target memory block is transmitted to a requestor prior to completion of error detection processing on the target memory block. In response to receipt of the memory access request and selection of the second timing, data from the target memory block is transmitted to the requestor after and in response to completion of error detection processing on the target memory block. 1. A method of data processing , comprising:selecting, based at least on an access type of a memory access request, between at least a first timing and a second timing of data transmission with respect to completion of error detection processing on a target memory block of the memory access request;in response to receipt of the memory access request and selection of the first timing, transmitting data from the target memory block to a requestor prior to completion of error detection processing on the target memory block; andin response to receipt of the memory access request and selection of the second timing, transmitting data from the target memory block to the requestor after and in response to completion of error detection processing on the target memory block.2. The method of claim 1 , wherein the selecting includes selecting the first timing for a memory access request that is one of a set including a demand data load request generated by execution of a load-type instruction and a data prefetch request.3. The method of claim 1 , wherein the selecting includes a memory subsystem selecting between at least the first timing and the second timing.4. The method of claim 1 , wherein the selecting further ...

Publication date: 27-02-2020

DROPPED COMMAND TRUNCATION FOR EFFICIENT QUEUE UTILIZATION IN MULTIPROCESSOR DATA PROCESSING SYSTEM

Number: US20200065276A1

Data processing in a data processing system including a plurality of processing nodes coupled by a communication link includes receiving a first command from a first processing node. A link stall of the communication link is detected by a first link layer of the first processing node. A stop command is received at a first transaction layer of the first processing node from the first link layer. The first command is truncated by the first transaction layer into a first truncated command responsive to receiving the stop command. A command arbiter is instructed to stop issuing new commands. The first truncated command is forwarded to an asynchronous crossing buffer of the first processing node. 1. A method of data processing in a data processing system including a plurality of processing nodes coupled by a communication link , the method comprising:receiving a first command from a first processing node;detecting, by a first link layer of the first processing node, a link stall of the communication link;receiving a stop command at a first transaction layer of the first processing node from the first link layer;truncating, by the first transaction layer, the first command into a first truncated command responsive to receiving the stop command;instructing a command arbiter to stop issuing new commands; andforwarding the first truncated command to an asynchronous crossing buffer of the first processing node.2. The method of claim 1 , further comprising:queuing the first truncated command in the asynchronous crossing buffer.3. The method of claim 2 , further comprising:detecting a recovery of the communication link; andforwarding the first truncated command to the link from the asynchronous crossing buffer responsive to the detecting of the recovery of the communication link.4. The method of claim 3 , further comprising:receiving the first truncated command at a second transaction layer of a second processing node;detecting the first truncated command by the second ...
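The stall-handling sequence above (truncate the in-flight command, stop the arbiter, queue the truncated command into the asynchronous crossing buffer, forward it on link recovery) can be sketched as a small model. Class and field names are illustrative assumptions, not the patent's structures.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Command:
    tag: int                 # identifier kept so the command can be matched later
    payload: bytes           # dropped on truncation to save queue space
    truncated: bool = False

class TransactionLayer:
    """Toy transaction layer reacting to a link-layer stop command."""
    def __init__(self):
        self.crossing_buffer = deque()   # asynchronous crossing buffer
        self.arbiter_stopped = False

    def on_stop(self, cmd: Command) -> None:
        # Truncate: keep only the tag, drop the payload.
        cmd.payload = b""
        cmd.truncated = True
        self.arbiter_stopped = True          # instruct arbiter: no new commands
        self.crossing_buffer.append(cmd)     # queue the truncated command

    def on_link_recovery(self):
        self.arbiter_stopped = False
        return list(self.crossing_buffer)    # forward queued truncated commands
```

The point of the truncation is queue efficiency: during a stall the buffer holds only minimal placeholders instead of full commands.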

Publication date: 29-05-2014

COHERENT PROXY FOR ATTACHED PROCESSOR

Number: US20140149681A1

A coherent attached processor proxy (CAPP) of a primary coherent system receives a memory access request from an attached processor (AP) and an expected coherence state of a target address of the memory access request with respect to a cache memory of the AP. In response, the CAPP determines a coherence state of the target address and whether or not the expected state matches the determined coherence state. In response to determining that the expected state matches the determined coherence state, the CAPP issues a memory access request corresponding to that received from the AP on a system fabric of the primary coherent system. In response to determining that the expected state does not match the coherence state determined by the CAPP, the CAPP transmits a failure message to the AP without issuing on the system fabric a memory access request corresponding to that received from the AP.

1-8. (canceled)

9. A coherent attached processor proxy (CAPP), comprising:
transport logic having a first interface configured to support communication with a system fabric of a primary coherent system and a second interface configured to support communication with an attached processor (AP) including a cache memory that holds copies of memory blocks belonging to a coherent address space of the primary coherent system;
snooper logic that services snooped memory access requests received from the system fabric on behalf of the AP; and
master logic that manages memory access requests within the primary coherent system on behalf of the AP, wherein the master logic, responsive to receiving a memory access request from the AP and an expected coherence state of a target address of the memory access request with respect to the cache memory of the AP, determines a coherence state of the target address with respect to the CAPP and determines whether or not the expected state matches the coherence state determined by the CAPP, and wherein the master logic, responsive to determining that the expected ...
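The expected-state handshake can be modeled in a few lines: the AP ships its guess of the coherence state along with the request, and the CAPP either issues the request on the fabric or fails it back so the AP can resynchronize. The MESI-style state letters are an assumption for the sketch; the entry does not fix a state encoding.

```python
# Toy model of the CAPP fast-path check. All names are illustrative.
class CAPP:
    def __init__(self, directory):
        self.directory = directory       # address -> coherence state letter
        self.fabric_requests = []        # requests issued on the system fabric

    def handle(self, address, request, expected_state):
        actual = self.directory.get(address, "I")   # untracked -> Invalid
        if expected_state == actual:
            # Expected state matches: issue the request on the fabric.
            self.fabric_requests.append((address, request))
            return "issued"
        # Mismatch: fail the request back without touching the fabric.
        return "failure"
```

A mismatch costs only the round trip to the proxy, never a fabric operation, which is the efficiency argument the abstract makes.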

Publication date: 29-05-2014

PROGRAMMABLE COHERENT PROXY FOR ATTACHED PROCESSOR

Number: US20140149682A1

A coherent attached processor proxy (CAPP) within a primary coherent system participates in an operation on a system fabric of the primary coherent system on behalf of an attached processor (AP) that is external to the primary coherent system and that is coupled to the CAPP. The operation includes multiple components communicated with the CAPP including a request and at least one coherence message. The CAPP determines one or more of the components of the operation by reference to at least one programmable data structure within the CAPP that can be reprogrammed.

1-10. (canceled)

11. A coherent attached processor proxy (CAPP), comprising:
transport logic having a first interface configured to support communication with a system fabric of a primary coherent system and a second interface configured to support communication with an attached processor (AP) external to the primary coherent system;
snooper logic that participates on behalf of the AP in operations snooped from the system fabric; and
master logic that participates in operations initiated by the master logic on the system fabric on behalf of the AP, wherein each of the operations includes multiple components communicated with the CAPP including a request and at least one coherence message;
wherein the CAPP determines one or more of the components of the operations by reference to at least one programmable data structure within the CAPP and wherein the CAPP updates the at least one programmable data structure responsive to an input.

12. The coherent attached processor proxy (CAPP) of claim 11, wherein the one or more of the components of the operation determined by the CAPP by reference to the at least one programmable data structure include a partial response of the AP.

13. The coherent attached processor proxy (CAPP) of claim 11, wherein the one or more of the components of the operation determined by the CAPP by reference to the at least one programmable data structure include a combined response that represents a ...

Publication date: 29-05-2014

PROGRAMMABLE COHERENT PROXY FOR ATTACHED PROCESSOR

Number: US20140149683A1

A coherent attached processor proxy (CAPP) within a primary coherent system participates in an operation on a system fabric of the primary coherent system on behalf of an attached processor (AP) that is external to the primary coherent system and that is coupled to the CAPP. The operation includes multiple components communicated with the CAPP including a request and at least one coherence message. The CAPP determines one or more of the components of the operation by reference to at least one programmable data structure within the CAPP that can be reprogrammed. 1. A method of data processing , comprising:a coherent attached processor proxy (CAPP) within a primary coherent system participating in an operation on a system fabric of the primary coherent system on behalf of an attached processor (AP) that is external to the primary coherent system and that is coupled to the CAPP, wherein the operation includes multiple components communicated with the CAPP including a request and at least one coherence message;the CAPP determining one or more of the components of the operation by reference to at least one programmable data structure within the CAPP; andreprogramming the at least one programmable data structure within the CAPP.2. The method of claim 1 , wherein determining one or more of the components of the operation by reference to at least one programmable data structure within the CAPP includes determining a partial response of the AP.3. The method of claim 1 , wherein determining one or more of the components of the operation by reference to at least one programmable data structure within the CAPP includes determining a combined response that represents a systemwide coherence response to the request.4. 
The method of claim 3 , wherein:participating in the operation includes a master machine within the CAPP issuing the request; anddetermining a combined response that represents a systemwide coherence response to the request includes the CAPP translating the combined ...

Publication date: 29-05-2014

COHERENT PROXY FOR ATTACHED PROCESSOR

Number: US20140149689A1

A coherent attached processor proxy (CAPP) of a primary coherent system receives a memory access request from an attached processor (AP) and an expected coherence state of a target address of the memory access request with respect to a cache memory of the AP. In response, the CAPP determines a coherence state of the target address and whether or not the expected state matches the determined coherence state. In response to determining that the expected state matches the determined coherence state, the CAPP issues a memory access request corresponding to that received from the AP on a system fabric of the primary coherent system. In response to determining that the expected state does not match the coherence state determined by the CAPP, the CAPP transmits a failure message to the AP without issuing on the system fabric a memory access request corresponding to that received from the AP. 1. A method of data processing , comprising:receiving a memory access request from an attached processor (AP) at a coherent attached processor proxy (CAPP) of a primary coherent system, wherein the CAPP includes a CAPP directory of contents of a cache memory in the AP that holds copies of memory blocks belonging to a coherent address space of the primary coherent system, and wherein the receiving includes receiving an expected coherence state of a target address of the memory access request with respect to the cache memory of the AP;in response to receiving the memory access request at the CAPP, the CAPP determining a coherence state of the target address with respect to the CAPP and determining whether or not the expected state matches the coherence state determined by the CAPP;in response to determining that the expected state matches the coherence state determined by the CAPP, the CAPP issuing a memory access request corresponding to that received from the AP on a system fabric of the primary coherent system; andin response to determining that the expected state does not match the 
...

Publication date: 12-06-2014

VIRTUAL MACHINES FAILOVER

Number: US20140164701A1

Disclosed is a computer system comprising a processor unit adapted to run a virtual machine in a first operating mode; a cache accessible to the processor unit, said cache including a cache controller; and a memory accessible to the cache controller for storing an image of said virtual machine; wherein the processor unit is adapted to create a log in the memory prior to running the virtual machine in said first operating mode; the cache controller is adapted to transfer a modified cache line from the cache to the memory and write only the memory address of the transferred modified cache line in the log; and the processor unit is further adapted to update a further image of the virtual machine in a different memory location, e.g. on another computer system, by retrieving the memory addresses stored in the log, retrieve the modified cache lines from the memory addresses and update the further image with said modifications. A computer cluster including such computer systems, a method of managing such a computer cluster and a computer program product are also disclosed.

1. A method of operating a computer cluster including a first computer system having a memory storing an image of a virtual machine, at least one processor unit, and a cache accessible to the at least one processor unit, wherein each processor unit is adapted to run the virtual machine in a first operating mode, the method comprising:
in said first operating mode, a processor unit of the first computer system:
defining a log in the memory of the first computer system;
running the virtual machine using said image; and
upon transferring a modified cache line to the memory or another cache of the first computer system, writing only the memory address of said cache line in the log; and
reading the memory addresses from the log in the memory;
retrieving the cache lines stored at said memory addresses; and
updating the further image of the virtual machine with the retrieved ...
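The address-only logging scheme in this entry can be sketched directly: evicting a modified line records just its address in the log, and the checkpoint step later copies the current line contents at those addresses into the backup image. All class and variable names below are illustrative.

```python
# Minimal sketch of address-only failover logging (assumed names throughout).
class FailoverCache:
    def __init__(self, memory, log):
        self.memory = memory   # address -> cache-line bytes (backing store)
        self.log = log         # list of modified-line addresses

    def evict(self, address, line):
        self.memory[address] = line   # write back the modified line
        self.log.append(address)      # record only the address, not the data

def update_backup(memory, log, backup_image):
    """Checkpoint step: patch the backup image from the logged addresses."""
    for address in log:
        backup_image[address] = memory[address]   # fetch current contents
    log.clear()
    return backup_image
```

The log stays small because it carries addresses only; the data is re-read from memory when the further image is updated.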

Publication date: 12-06-2014

VIRTUAL MACHINE FAILOVER

Number: US20140164709A1

Disclosed is a computer system comprising a processor unit adapted to run a virtual machine in a first operating mode; a cache accessible to the processor unit, said cache including a cache controller; and a memory accessible to the cache controller for storing an image of said virtual machine; wherein the processor unit is adapted to create a log in the memory prior to running the virtual machine in said first operating mode; the cache controller is adapted to transfer a modified cache line from the cache to the memory and write only the memory address of the transferred modified cache line in the log; and the processor unit is further adapted to update a further image of the virtual machine in a different memory location, e.g. on another computer system, by retrieving the memory addresses stored in the log, retrieve the modified cache lines from the memory addresses and update the further image with said modifications. A computer cluster including such computer systems, a method of managing such a computer cluster and a computer program product are also disclosed.

1. A computer system comprising:
a processor unit adapted to run a virtual machine in a first operating mode;
a cache accessible to the processor unit, said cache including a cache controller; and
a memory accessible to the cache controller that stores an image of said virtual machine, wherein:
the processor unit is adapted to provide a replication manager adapted to define a log in the memory prior to running the virtual machine in said first operating mode;
the cache controller is adapted to:
transfer a modified cache line from the cache to the memory or another cache; and
write only the memory address of the transferred modified cache line in the defined log; and
the processor unit is further adapted to update a further image of the virtual machine in a different memory location by:
retrieving the memory addresses from the log;
retrieving the modified cache lines from the memory ...

Publication date: 12-06-2014

VIRTUAL MACHINES FAILOVER

Number: US20140164710A1

Disclosed is a computer system comprising a processor unit adapted to run a virtual machine in a first operating mode; a cache accessible to the processor unit, said cache comprising a plurality of cache rows, each cache row comprising a cache line and an image modification flag indicating a modification of said cache line caused by the running of the virtual machine; and a memory accessible to the cache controller for storing an image of said virtual machine; wherein the processor unit comprises a replication manager adapted to define a log in the memory prior to running the virtual machine in said first operating mode; and said cache further includes a cache controller adapted to periodically check said image modification flags, write only the memory address of the flagged cache lines in the defined log and subsequently clear the image modification flags. A computer cluster including such computer systems and a method of managing such a computer cluster are also disclosed.

1. A method of operating a computer cluster including a first computer system including a memory storing an image of a virtual machine, at least one processor unit adapted to run the virtual machine in a first operation mode, and a cache accessible to the at least one processor unit, said cache including, for each of a plurality of cache lines, a respective associated one of a plurality of image modification flags indicating modification of the associated cache line caused by the running of the virtual machine, the method comprising:
in the first operation mode, a processor unit of the first computer system:
defining a log in the memory of the first computer system;
running the virtual machine using said image;
upon modifying a cache line of the cache during said running step, signaling modification of the cache line by setting the associated image modification flag; and
writing the memory addresses only of cache lines indicated as modified by the ...

Publication date: 12-06-2014

VIRTUAL MACHINE FAILOVER

Number: US20140165056A1

Disclosed is a computer system comprising a processor unit adapted to run a virtual machine in a first operating mode; a cache accessible to the processor unit, said cache comprising a plurality of cache rows, each cache row comprising a cache line and an image modification flag indicating a modification of said cache line caused by the running of the virtual machine; and a memory accessible to the cache controller for storing an image of said virtual machine; wherein the processor unit comprises a replication manager adapted to define a log in the memory prior to running the virtual machine in said first operating mode; and said cache further includes a cache controller adapted to periodically check said image modification flags, write only the memory address of the flagged cache lines in the defined log and subsequently clear the image modification flags. A computer cluster including such computer systems and a method of managing such a computer cluster are also disclosed.

1. A computer system comprising:
a processor unit adapted to run a virtual machine in a first operating mode;
a cache accessible to the processor unit, said cache including, for each of a plurality of cache lines, a respective associated one of a plurality of image modification flags indicating a modification of the associated cache line caused by the running of the virtual machine; and
a memory accessible to the cache controller that stores an image of said virtual machine; wherein:
the processor unit includes a replication manager adapted to define a log in the memory prior to running the virtual machine in said first operating mode; and
said cache further includes a cache controller adapted to:
periodically check the plurality of image modification flags; and
write the memory address of the cache lines in the defined log that are indicated as modified by the plurality of image modification flags.

2. The computer system of claim 1, wherein the cache controller is ...

Publication date: 09-04-2015

Techniques for Moving Checkpoint-Based High-Availability Log and Data Directly From a Producer Cache to a Consumer Cache

Number: US20150100731A1

A technique of operating a data processing system includes logging addresses for cache lines modified by a producer core in a data array of a producer cache to create a high-availability (HA) log for the producer core. The technique also includes moving the HA log directly from the producer cache to a consumer cache of a consumer core and moving HA data associated with the addresses of the HA log directly from the producer cache to the consumer cache. The HA log corresponds to a cache line that includes multiple of the addresses. Finally, the technique includes processing, by the consumer core, the HA log and the HA data for the data processing system.

1-7. (canceled)

8. A data processing system, comprising:
a producer core;
a producer cache coupled to the producer core;
a consumer core; and
a consumer cache coupled to the consumer core, wherein the producer cache is configured to log addresses for cache lines modified by the producer core in a data array of the producer cache to create a high-availability (HA) log for the producer core, write the HA log directly into the consumer cache of the consumer core, and write HA data associated with the addresses of the HA log directly into the consumer cache, and wherein the HA log corresponds to a cache line that includes multiple of the addresses and the consumer core is configured to process the HA log and the HA data for the data processing system.

9. The data processing system of claim 8, wherein the modified cache lines are indicated by an HA bit.

10. The data processing system of claim 8, wherein the addresses for the modified cache lines are logged in an intermediate buffer of the producer cache.

11. The data processing system of claim 10, wherein the HA log is transferred from the intermediate buffer to a circular buffer of the consumer cache in response to the intermediate buffer being full and is written into the consumer cache by injecting the HA log stored in the circular buffer into the consumer cache.

12. The ...
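The intermediate-buffer behavior described in the claims (addresses of HA-modified lines accumulate in a small producer-side buffer and, when it fills, are flushed as one multi-address log line directly toward the consumer) can be modeled as follows. The buffer capacity and all names are assumptions for the sketch; the consumer cache is modeled as a plain list.

```python
# Sketch of producer-side HA address logging with a full-buffer flush.
class ProducerCache:
    def __init__(self, consumer_cache, buffer_capacity=16):
        self.intermediate = []                 # intermediate address buffer
        self.capacity = buffer_capacity
        self.consumer_cache = consumer_cache   # injection destination

    def log_modified(self, address):
        self.intermediate.append(address)
        if len(self.intermediate) == self.capacity:
            # One log "cache line" carries multiple addresses (claim 11's
            # transfer-on-full, modeled as a direct append).
            self.consumer_cache.append(tuple(self.intermediate))
            self.intermediate.clear()
```

Batching many addresses into one log line is what makes moving the log "directly" between caches cheap relative to per-address transfers.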

Publication date: 09-04-2015

Moving Checkpoint-Based High-Availability Log and Data Directly From a Producer Cache to a Consumer Cache

Number: US20150100732A1

A technique of operating a data processing system, includes logging addresses for cache lines modified by a producer core in a data array of a producer cache to create a high-availability (HA) log for the producer core. The technique also includes moving the HA log directly from the producer cache to a consumer cache of a consumer core and moving HA data associated with the addresses of the HA log directly from the producer cache to the consumer cache. The HA log corresponds to a cache line that includes multiple of the addresses. Finally, the technique includes processing, by the consumer core, the HA log and the HA data for the data processing system. 1. A method of operating a data processing system , comprising:logging addresses for cache lines modified by a producer core in a data array of a producer cache to create a high-availability (HA) log for the producer core, wherein the HA log corresponds to a cache line that includes multiple of the addresses;moving the HA log directly from the producer cache to a consumer cache of a consumer core;moving HA data associated with the addresses of the HA log directly from the producer cache to the consumer cache; andprocessing, by the consumer core, the HA log and the HA data for the data processing system.2. The method of claim 1 , wherein the modified cache lines are indicated by an HA bit.3. The method of claim 1 , wherein the logging addresses for cache lines modified by a producer core in a producer cache includes logging addresses for the modified cache lines in an intermediate buffer associated with the producer cache.4. The method of claim 3 , wherein the moving the HA log directly from the producer cache to a consumer cache includes transferring the HA log from the intermediate buffer to a circular buffer in response to the intermediate buffer being full and injecting the HA log stored in the circular buffer into the consumer cache using a cache injection command.5. 
The method of claim 1 , wherein the moving HA ...

Publication date: 05-04-2018

PRE-TRANSMISSION DATA REORDERING FOR A SERIAL INTERFACE

Number: US20180095905A1

A serial communication system includes a transmitting circuit for serially transmitting data via a serial communication link including N channels where N is an integer greater than 1. The transmitting circuit includes an input buffer having storage for input data frames each including M bytes forming N segments of M/N contiguous bytes. The transmitting circuit additionally includes a reordering circuit coupled to the input buffer. The reordering circuit includes a reorder buffer including multiple entries. The reordering circuit buffers, in each of multiple entries of the reorder buffer, a byte in a common byte position in each of the N segments of an input data frame. The reordering circuit sequentially outputs the contents of the entries of the reorder buffer via the N channels of the serial communication link.

1. A serial communication system, comprising:
a transmitting circuit for serially transmitting data via a serial communication link including N channels where N is an integer greater than 1, the transmitting circuit including:
an input buffer having storage for input data frames each including M bytes forming N segments of M/N contiguous bytes;
a reordering circuit coupled to the input buffer, wherein the reordering circuit includes a reorder buffer, and wherein the reordering circuit buffers, in each of multiple entries of the reorder buffer, a byte in a common byte position in each of the N segments of an input data frame, and wherein the reordering circuit sequentially outputs the contents of the entries of the reorder buffer via the N channels of the serial communication link.

2. The serial communication system of claim 1, and further comprising:
a receiving circuit coupled to the serial communication link, wherein the receiving circuit includes a deserializing circuit that reassembles input data frames from serial data received from the transmitting circuit via the serial communication link.

3. The serial communication system of claim 2, wherein ...
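The reordering rule is concrete enough to demonstrate directly: entry k of the reorder buffer holds the byte at position k of each of the N segments, so each entry supplies one beat across the N channels. An 8-byte frame over N = 2 channels is used below as an example; the function names are illustrative.

```python
# Pre-transmission byte reordering and its inverse (illustrative sketch).
def reorder(frame: bytes, n_channels: int):
    """Split an M-byte frame into N segments of M/N contiguous bytes and
    build one reorder-buffer entry per common byte position."""
    seg_len = len(frame) // n_channels
    segments = [frame[i * seg_len:(i + 1) * seg_len] for i in range(n_channels)]
    # Entry k = byte k of segment 0, byte k of segment 1, ..., segment N-1.
    return [bytes(seg[k] for seg in segments) for k in range(seg_len)]

def reassemble(entries, n_channels: int):
    """Receiver-side inverse: channel c's byte stream is segment c."""
    segments = [bytes(entry[c] for entry in entries) for c in range(n_channels)]
    return b"".join(segments)
```

For example, `reorder(b"abcdefgh", 2)` yields entries `[b"ae", b"bf", b"cg", b"dh"]`: segments "abcd" and "efgh" are interleaved so both channels stay busy every beat, and `reassemble` recovers the original frame.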

Publication date: 09-04-2020

Information Handling System with Immediate Scheduling of Load Operations

Number: US20200110704A1
Assignee: International Business Machines Corp

An information handling system (IHS) includes a processor with a cache memory system. The processor includes a processor core with an L1 cache memory that couples to an L2 cache memory. The processor includes an arbitration mechanism that controls load and store requests to the L2 cache memory. The arbitration mechanism includes control logic that enables a load request to interrupt a store request that the L2 cache memory is currently servicing. When the L2 cache memory finishes servicing the interrupting load request, the L2 cache memory may return to servicing the interrupted store request at the point of interruption.
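The arbitration behavior in this abstract (a load request interrupts an in-progress store, and the store then resumes at the point of interruption) can be sketched at beat granularity. The beat model and the arrival map are assumptions for illustration, not the patented control logic.

```python
# Illustrative beat-level model of load-over-store arbitration in an L2.
def service(store_beats, arriving_loads):
    """store_beats: data beats of one store being serviced.
    arriving_loads: dict mapping beat index -> load tag that arrives just
    before that store beat would transfer. Returns the service order."""
    trace = []
    for i, beat in enumerate(store_beats):
        if i in arriving_loads:
            trace.append(("load", arriving_loads[i]))   # load preempts
        trace.append(("store", beat))                   # store resumes at beat i
    return trace
```

For example, `service([0, 1, 2], {1: "A"})` services store beat 0, lets load "A" cut in, then resumes the store at beat 1 rather than restarting it.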

Publication date: 25-04-2019

MANAGING EFFICIENT SELECTION OF A PARTICULAR PROCESSOR THREAD FOR HANDLING AN INTERRUPT

Number: US20190121760A1

A processing unit connected via a system fabric to multiple processing units calls a first single command in a bus protocol that allows sampling over the system fabric of the capability of snoopers distributed across the processing units to handle an interrupt. The processing unit, in response to detecting at least one first selection of snoopers with capability to handle the interrupt, calling a second single command in the bus protocol to poll the first selection of snoopers over the system fabric for an availability status. The processing unit, in response to detecting at least one second selection of snoopers respond with the available status indicating an availability to handle the interrupt, assigning a single snooper from among the second selection of snoopers to handle the interrupt by calling a third single command in the bus protocol. 1. A method comprising:calling, by a processing unit of a plurality of processing units connected via a system fabric, a first single bus command in a bus protocol that controls sampling over the system fabric of the capability of a plurality of snoopers distributed across the plurality of processing units to handle an interrupt, each of the plurality of snoopers controlling assignment of interrupts to one or more separate selections of processor threads distributed in each plurality of processing units;in response to detecting at least one first selection of snoopers of the plurality of snoopers with capability to handle the interrupt, calling, by the processing unit, a second single bus command in the bus protocol to poll the first selection of snoopers over the system fabric for an availability status; andin response to detecting at least one second selection of snoopers respond with the available status indicating an availability to handle the interrupt, assigning, by the processing unit, a single snooper from among the second selection of snoopers to handle the interrupt by calling a third single bus command in the bus 
...

Publication date: 21-05-2015

VIRTUAL MACHINE BACKUP

Number: US20150143055A1

A computer system comprises a processor unit arranged to run a hypervisor running one or more virtual machines, a cache connected to the processor unit and comprising a plurality of cache rows, each cache row comprising a memory address, a cache line and an image modification flag and a memory connected to the cache and arranged to store an image of at least one virtual machine. The processor unit is arranged to define a log in the memory and the cache further comprises a cache controller arranged to set the image modification flag for a cache line modified by a virtual machine being backed up, periodically check the image modification flags and write only the memory address of the flagged cache rows in the defined log. The processor unit is further arranged to monitor the free space available in the defined log and to trigger an interrupt if the free space available falls below a specific amount. 1. A method comprising:indicating, in a log, updates to memory of a virtual machine when the updates are evicted from a cache of the virtual machine;determining a guard band for the log, wherein the guard band indicates a threshold amount of free space for the log;determining that the guard band will be or has been encroached upon corresponding to indicating an update in the log;updating a backup image of the virtual machine based, at least in part, on a set of one or more entries of the log, wherein the set of entries is sufficient to comply with the guard band; andremoving the set of entries from the log.2. The method of claim 1 , wherein said determining a guard band comprises:determining a number of write-back cache lines in the cache;determining a number of instructions in a pipeline for a processor unit that executes instructions issued by the virtual machine;determining a number of additional instructions capable of being issued to the pipeline in the time taken to trigger an interrupt of the processor unit; anddefining the guard band based on a sum of the ...
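The guard band in claim 2 is a sum of three quantities: write-back lines still in the cache, instructions already in the pipeline, and instructions that can issue before the interrupt takes effect. A direct transcription, with illustrative numbers and the pessimistic assumption that each in-flight instruction could add one log entry:

```python
# Guard-band computation per claim 2 (names and figures are illustrative).
def guard_band(writeback_lines, pipeline_depth, issue_rate, interrupt_cycles):
    """Worst-case log entries that may still arrive after the interrupt fires:
    dirty lines that can be evicted, plus instructions in flight, plus
    instructions issuable while the interrupt is being taken."""
    issuable = issue_rate * interrupt_cycles
    return writeback_lines + pipeline_depth + issuable

def should_drain(log_capacity, log_used, band):
    """Trigger the interrupt when free log space falls below the guard band."""
    return (log_capacity - log_used) <= band
```

For example, 64 write-back lines, a 20-deep pipeline, and 4 instructions per cycle over a 5-cycle interrupt latency give a band of 104 entries; the log is drained once fewer than that many slots remain.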

More details
21-05-2015 publication date

DYNAMIC WRITE PRIORITY BASED ON VIRTUAL WRITE QUEUE HIGH WATER MARK

Number: US20150143056A1

A set associative cache is managed by a memory controller which places writeback instructions for modified (dirty) cache lines into a virtual write queue, determines when the number of the sets containing a modified cache line is greater than a high water mark, and elevates a priority of the writeback instructions over read operations. The controller can return the priority to normal when the number of modified sets is less than a low water mark. In an embodiment wherein the system memory device includes rank groups, the congruence classes can be mapped based on the rank groups. The number of writes pending in a rank group exceeding a different threshold can additionally be a requirement to trigger elevation of writeback priority. A dirty vector can be used to provide an indication that corresponding sets contain a modified cache line, particularly in least-recently used segments of the corresponding sets. 1.-8. (canceled) 9. A memory controller comprising: a virtual write queue containing writeback instructions for selected cache lines of a cache memory to be written to a system memory device, wherein the cache memory logically organizes the cache lines into sets according to different congruence classes; and a cache cleaner which determines when a number of the sets containing at least one modified cache line is greater than a predetermined threshold, and responsively elevates a priority of the writeback instructions. 10. The memory controller of claim 9, wherein the priority of the writeback instructions is elevated over read operations. 11. The memory controller of claim 9, wherein the predetermined threshold is a first predetermined threshold and said cache cleaner, after elevating the priority of the writeback instructions, further determines that a second number of the sets containing at least one modified cache line is less than a second predetermined threshold, and responsively lowers the priority of the writeback instructions. 12. The memory controller ...
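A minimal sketch of the high/low water mark hysteresis described above, assuming invented names and threshold values; a real controller would track dirty congruence-class sets in hardware via the dirty vector.

```c
#include <stdbool.h>

/* Hypothetical high/low water mark hysteresis for writeback priority.
 * Priority is elevated when the count of sets holding a dirty line
 * crosses HIGH_WATER and returns to normal only below LOW_WATER. */
#define HIGH_WATER 96
#define LOW_WATER  32

typedef struct {
    int  dirty_sets;       /* sets containing at least one modified line */
    bool writes_elevated;  /* writebacks currently prioritized over reads */
} cache_cleaner;

void cleaner_update(cache_cleaner *c)
{
    if (!c->writes_elevated && c->dirty_sets > HIGH_WATER)
        c->writes_elevated = true;    /* elevate writebacks over reads */
    else if (c->writes_elevated && c->dirty_sets < LOW_WATER)
        c->writes_elevated = false;   /* return to normal priority */
}
```

The gap between the two thresholds prevents the priority from oscillating when the dirty-set count hovers near a single cutoff.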

More details
21-05-2015 publication date

DYNAMIC WRITE PRIORITY BASED ON VIRTUAL WRITE QUEUE HIGH WATER MARK

Number: US20150143059A1

A set associative cache is managed by a memory controller which places writeback instructions for modified (dirty) cache lines into a virtual write queue, determines when the number of the sets containing a modified cache line is greater than a high water mark, and elevates a priority of the writeback instructions over read operations. The controller can return the priority to normal when the number of modified sets is less than a low water mark. In an embodiment wherein the system memory device includes rank groups, the congruence classes can be mapped based on the rank groups. The number of writes pending in a rank group exceeding a different threshold can additionally be a requirement to trigger elevation of writeback priority. A dirty vector can be used to provide an indication that corresponding sets contain a modified cache line, particularly in least-recently used segments of the corresponding sets. 1. A method of managing a cache memory of a computer system comprising: loading memory values into cache lines of the cache memory, wherein the cache lines are logically organized into sets according to different congruence classes; modifying memory values in selected cache lines; placing writeback instructions for modified cache lines into a virtual write queue of a system memory device; determining that a number of the sets containing at least one modified cache line is greater than a predetermined threshold; and responsive to said determining, elevating a priority of the writeback instructions. 2. The method of claim 1, wherein the cache memory is a lowest level cache memory in a memory hierarchy of the computer system. 3. The method of claim 1, wherein said elevating raises the priority of the writeback instructions over read operations. 4. The method of claim 1, further comprising: after elevating the priority of the writeback instructions, second determining that a second number of the sets containing at least one modified cache line is less than a second predetermined threshold; and responsive ...

More details
28-05-2015 publication date

EARLY DATA TAG TO ALLOW DATA CRC BYPASS VIA A SPECULATIVE MEMORY DATA RETURN PROTOCOL

Number: US20150149854A1

A bypass mechanism allows a memory controller to transmit requested data to an interconnect before the data's error code has been decoded, e.g., a cyclical redundancy check (CRC). The tag, tag CRC, data, and data CRC are pipelined from DRAM in four frames, each having multiple clock cycles. The tag includes a bypass bit indicating whether data transmission to the interconnect should begin before CRC decoding. After receiving the tag CRC, the controller decodes it and reserves a request machine which sends a transmit request signal to inform the interconnect that data is available. Once the transmit request is granted by the interconnect, the controller can immediately start sending the data, before decoding the data CRC. So long as no error is found, the controller completes transmission of the data to the interconnect, including providing an indication that the data as transmitted is error-free. 1.-7. (canceled) 8. A memory controller comprising: at least one error detection circuit; and logic that: receives a tag representing at least a portion of an address for a requested memory value in a memory device; receives a tag error code associated with the tag; makes a first determination that the tag is error-free using the error detection circuit; responsive to the first determination, issues a transmission request to an interconnect between the memory controller and a requesting device; receives a transmission grant from the interconnect; receives data representing the requested memory value; receives data error code associated with the data; makes a second determination that the data is error-free using the error detection circuit; initiates transmission of the data from the memory controller to the interconnect once the transmission grant is received and before making the second determination; and responsive to the second determination, completes transmission of the data from the memory controller to the interconnect, including providing an indication ...
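The speculative return protocol can be illustrated as an event trace. The event strings and frame mapping below are hypothetical; they capture only the ordering claimed in the abstract, namely that with the bypass bit set, transmission starts after the grant but before the data CRC is decoded.

```c
#include <stdbool.h>

/* Hypothetical event trace of the speculative memory data return protocol.
 * Four frames arrive from DRAM: tag, tag CRC, data, data CRC. */
#define MAX_EV 8
typedef struct { const char *ev[MAX_EV]; int n; } trace;

static void log_ev(trace *t, const char *e) { t->ev[t->n++] = e; }

/* Run one read; returns true if the data is delivered error-free. */
bool run_read(bool bypass, bool data_crc_ok, trace *t)
{
    log_ev(t, "tag+tagcrc ok");      /* frames 1-2: tag verified       */
    log_ev(t, "grant");              /* request machine gets the grant */
    if (bypass)
        log_ev(t, "tx start");       /* speculative: send before CRC   */
    log_ev(t, "data crc decoded");   /* frame 4                        */
    if (!bypass)
        log_ev(t, "tx start");       /* non-speculative start          */
    if (!data_crc_ok) { log_ev(t, "abort"); return false; }
    log_ev(t, "tx complete, error-free");
    return true;
}
```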

More details
28-05-2015 publication date

EARLY DATA TAG TO ALLOW DATA CRC BYPASS VIA A SPECULATIVE MEMORY DATA RETURN PROTOCOL

Number: US20150149866A1

A bypass mechanism allows a memory controller to transmit requested data to an interconnect before the data's error code has been decoded, e.g., a cyclical redundancy check (CRC). The tag, tag CRC, data, and data CRC are pipelined from DRAM in four frames, each having multiple clock cycles. The tag includes a bypass bit indicating whether data transmission to the interconnect should begin before CRC decoding. After receiving the tag CRC, the controller decodes it and reserves a request machine which sends a transmit request signal to inform the interconnect that data is available. Once the transmit request is granted by the interconnect, the controller can immediately start sending the data, before decoding the data CRC. So long as no error is found, the controller completes transmission of the data to the interconnect, including providing an indication that the data as transmitted is error-free. 1. A method of providing a requested memory value in a computer system comprising: receiving a tag from a memory device of the computer system at a memory controller of the computer system, the tag representing at least a portion of an address for the requested memory value in the memory device; receiving a tag error code associated with the tag from the memory device at the memory controller; decoding the tag error code at the memory controller to make a first determination that the tag is error-free; responsive to the first determination, issuing a transmission request to an interconnect between the memory controller and a requesting device; receiving a transmission grant from the interconnect at the memory controller; receiving data representing the requested memory value from the memory device at the memory controller; receiving data error code associated with the data from the memory device at the memory controller; initiating transmission of the data from the memory controller to the interconnect once the transmission grant is received; after said initiating, decoding the data ...

More details
31-05-2018 publication date

MANAGING LOWEST POINT OF COHERENCY (LPC) MEMORY USING A SERVICE LAYER ADAPTER

Number: US20180150396A1
Assignee:

Managing lowest point of coherency (LPC) memory using a service layer adapter, the adapter coupled to a processor and an accelerator on a host computing system, the processor configured for symmetric multi-processing, including receiving, by the adapter, a memory access instruction from the accelerator; retrieving, by the adapter, a real address for the memory access instruction; determining, using base address registers on the adapter, that the real address targets the LPC memory, wherein the base address registers direct memory access requests between the LPC memory and other memory locations on the host computing system; and sending, by the adapter, the memory access instruction and the real address to a media controller for the LPC memory, wherein the media controller for the LPC memory is attached to the adapter via a memory interface. 1. A method of managing lowest point of coherency (LPC) memory using a service layer adapter, the adapter coupled to a processor and an accelerator on a host computing system, the processor configured for symmetric multi-processing, the method comprising: receiving, by the adapter, a memory access instruction from the accelerator; retrieving, by the adapter, a real address for the memory access instruction; determining, using base address registers on the adapter, that the real address targets the LPC memory, wherein the base address registers direct memory access requests between the LPC memory and other memory locations on the host computing system; and sending, by the adapter, the memory access instruction and the real address to a media controller for the LPC memory, wherein the media controller for the LPC memory is attached to the adapter via a memory interface. 2. The method of claim 1, further comprising: receiving, from the processor, a subsequent memory access instruction; determining, using base address registers on the adapter, that the subsequent memory access instruction targets the LPC memory; and sending, by the ...
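The base-address-register check that routes a real address either to the LPC media controller or to other host memory reduces to a range test. The window values and type names below are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical base-address-register (BAR) window on the adapter. */
typedef struct { uint64_t base; uint64_t size; } bar_t;

bool targets_lpc(const bar_t *bar, uint64_t real_addr)
{
    return real_addr >= bar->base && real_addr < bar->base + bar->size;
}

typedef enum { ROUTE_LPC_MEDIA_CTRL, ROUTE_OTHER_MEMORY } route_t;

/* Route a memory access by comparing its real address against the BAR. */
route_t route_access(const bar_t *bar, uint64_t real_addr)
{
    return targets_lpc(bar, real_addr) ? ROUTE_LPC_MEDIA_CTRL
                                       : ROUTE_OTHER_MEMORY;
}
```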

More details
25-06-2015 publication date

COHERENCY OVERCOMMIT

Number: US20150178205A1

One or more systems, devices, methods, and/or processes described can receive, via an interconnect, messages from processing nodes, and a first portion of the messages can displace a second portion of the messages based on priorities of the first portion of messages or based on expiration times of the second portion of messages. In one example, the second portion of messages can be stored via a buffer of a fabric controller (FBC) of the interconnect, and the first portion of messages, associated with higher priorities than the second portion of messages, can displace the second portion of messages in the buffer. For instance, the second portion of messages can include speculative commands. In another example, the second portion of messages can be stored via the buffer, and the second portion of messages, associated with expiration times, can displace the second portion of messages based on the expiration times. 1. A method of data processing in a data processing system that includes an interconnect and a plurality of processing nodes coupled to the interconnect, the method comprising: a fabric controller receiving, via the interconnect, a plurality of messages from the plurality of processing nodes; the fabric controller storing, via a buffer, at least a first message of the plurality of messages and a second message of the plurality of messages; determining at least one of: that a third message of the plurality of messages is associated with a higher priority than a priority associated with the first message, and that a first amount of time has transpired that exceeds a first expiration associated with the first message; storing, via displacing the first message from the buffer, the third message in the buffer in response to said determining; and transmitting the first, second and third messages to at least one processor unit. 2. The method of claim 1, wherein: the determining is a first determining; the method further comprises: that a fourth message of the plurality ...
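A toy model of the displacement policy: a full buffer admits a new message by using a free slot, evicting an expired entry, or evicting a lower-priority (e.g. speculative) one. The slot count and field names are illustrative, not taken from the patent.

```c
#include <stdbool.h>

/* Hypothetical fabric controller buffer entry. */
#define BUF_SLOTS 4
typedef struct { int prio; int expires_at; bool valid; } msg_t;

/* Admit message m at time `now`.
 * Returns the slot used, or -1 if the newcomer itself must be dropped. */
int admit(msg_t buf[BUF_SLOTS], msg_t m, int now)
{
    int victim = -1, victim_prio = m.prio;
    for (int i = 0; i < BUF_SLOTS; i++) {
        if (!buf[i].valid)            { buf[i] = m; return i; } /* free    */
        if (now >= buf[i].expires_at) { buf[i] = m; return i; } /* expired */
        if (buf[i].prio < victim_prio) {   /* remember lowest priority */
            victim = i;
            victim_prio = buf[i].prio;
        }
    }
    if (victim >= 0) { buf[victim] = m; return victim; } /* displace */
    return -1;  /* nothing lower-priority or expired: drop newcomer */
}
```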

More details
25-06-2015 publication date

DETERMINING COMMAND RATE BASED ON DROPPED COMMANDS

Number: US20150178230A1
Assignee:

In one or more embodiments, one or more systems, devices, methods, and/or processes described can send, via an interconnect, a rate master command to at least one of multiple processing nodes; determine that a message indicating a dropped command, associated with the rate master command, is received; determine that a count, associated with dropped commands, satisfies a threshold; and provide, to the processing nodes via the interconnect, a signal indicating a command rate, in response to determining that the count satisfies the threshold. Moreover, the count can be incremented in response to determining that the message is received. The at least one of multiple processing nodes can receive, via the interconnect, the signal indicating the command rate and can utilize the command rate in issuing speculative commands, via the interconnect. 1.-7. (canceled) 8. A data processing system, comprising: an interconnect; a plurality of processing nodes coupled to the interconnect; and a first scheduler coupled to the interconnect and configured to: send, via the interconnect, a first rate master command to at least one of the plurality of processing nodes; determine that a first message indicating a first dropped command, associated with the first rate master command, is received; determine that a count, associated with dropped commands, satisfies a threshold; and provide, to the plurality of processing nodes via the interconnect, a signal indicating a command rate, in response to determining that the count satisfies the threshold. 9. The data processing system of claim 8, wherein the first scheduler is further configured to increment the count, associated with the dropped commands, in response to determining that the first message indicating the first dropped command is received. 10. The data processing system of claim 8, wherein the signal provided to the plurality of processing nodes is an out-of-band signal indicating the command rate. 11. The data ...
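The scheduler behavior reduces to a counter with a threshold. The threshold value and the halving of the command rate below are invented placeholders for whatever rate the real signal would convey to the processing nodes.

```c
#include <stdbool.h>

/* Hypothetical rate-master bookkeeping: each dropped-command message bumps
 * a counter; at the threshold, a reduced command rate is signaled. */
typedef struct {
    int dropped_count;
    int threshold;
    int command_rate;  /* speculative commands per interval nodes may issue */
} rate_master;

/* Returns true when a new command rate must be signaled on the interconnect. */
bool on_dropped_command(rate_master *rm)
{
    rm->dropped_count++;
    if (rm->dropped_count >= rm->threshold) {
        rm->command_rate /= 2;   /* throttle speculative command issue */
        rm->dropped_count = 0;
        return true;
    }
    return false;
}
```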

More details
25-06-2015 publication date

DETERMINING COMMAND RATE BASED ON DROPPED COMMANDS

Number: US20150178231A1

In one or more embodiments, one or more systems, devices, methods, and/or processes described can send, via an interconnect, a rate master command to at least one of multiple processing nodes; determine that a message indicating a dropped command, associated with the rate master command, is received; determine that a count, associated with dropped commands, satisfies a threshold; and provide, to the processing nodes via the interconnect, a signal indicating a command rate, in response to determining that the count satisfies the threshold. Moreover, the count can be incremented in response to determining that the message is received. The at least one of multiple processing nodes can receive, via the interconnect, the signal indicating the command rate and can utilize the command rate in issuing speculative commands, via the interconnect. 1. A method of operating a data processing system that includes an interconnect and a plurality of processing nodes coupled to the interconnect, the method comprising: a first scheduler coupled to the interconnect sending, via the interconnect, a first rate master command to at least one of the plurality of processing nodes coupled to the interconnect; the first scheduler determining that a first message indicating a first dropped command, associated with the first rate master command, is received; determining that a count, associated with dropped commands, satisfies a threshold; and providing, to the plurality of processing nodes via the interconnect, a signal indicating a command rate, in response to said determining that the count satisfies the threshold. 2. The method of claim 1, further comprising: incrementing the count, associated with the dropped commands, in response to said determining that the first message indicating the first dropped command is received. 3. The method of claim 1, wherein the signal provided to the plurality of processing nodes is an out-of-band signal indicating the command rate. 4. The method of claim 1, ...

More details
25-06-2015 publication date

COHERENCY OVERCOMMIT

Number: US20150178233A1
Assignee:

One or more systems, devices, methods, and/or processes described can receive, via an interconnect, messages from processing nodes, and a first portion of the messages can displace a second portion of the messages based on priorities of the first portion of messages or based on expiration times of the second portion of messages. In one example, the second portion of messages can be stored via a buffer of a fabric controller (FBC) of the interconnect, and the first portion of messages, associated with higher priorities than the second portion of messages, can displace the second portion of messages in the buffer. For instance, the second portion of messages can include speculative commands. In another example, the second portion of messages can be stored via the buffer, and the second portion of messages, associated with expiration times, can displace the second portion of messages based on the expiration times. 1.-7. (canceled) 8. A data processing system, comprising: an interconnect; a plurality of processing nodes coupled to the interconnect; and a fabric controller configured to: responsive to receiving via the interconnect a plurality of messages from the plurality of processing nodes, store, via a buffer, at least a first message of the plurality of messages and a second message of the plurality of messages; determine at least one of: that a third message of the plurality of messages is associated with a higher priority than a priority associated with the first message, and that a first amount of time has transpired that exceeds a first expiration associated with the first message; store, via displacing the first message from the buffer, the third message in the buffer in response to the determination; and transmit the first, second and third messages to at least one processor unit. 9. The data processing system of claim 8, wherein the fabric controller further: that a fourth message of the plurality of messages is not associated with a higher priority ...

More details
23-06-2016 publication date

Non-serialized push instruction for pushing a message payload from a sending thread to a receiving thread

Number: US20160179517A1
Assignee: International Business Machines Corp

In at least some embodiments, a processor core executes a sending thread including a first push instruction and a second push instruction subsequent to the first push instruction in a program order. Each of the first and second push instructions requests that a respective message payload be pushed to a mailbox of a receiving thread. In response to executing the first and second push instructions, the processor core transmits respective first and second co-processor requests to a switch in the data processing system via an interconnect fabric of the data processing system. The processor core transmits the second co-processor request to the switch without regard to acceptance of the first co-processor request by the switch.
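One consequence of issuing the second co-processor request without awaiting acceptance of the first is that requests may reach the switch out of order; the related claims mention a software-supplied sequence number for restoring order. A hypothetical receiver-side reorder, with invented structure names:

```c
#include <stdint.h>

/* Hypothetical mailbox message: a software-supplied sequence number plus
 * a small payload, as suggested by the related claims. */
typedef struct { uint32_t seq; char payload[16]; } msg_t;

/* Restore program order in the mailbox by sorting on sequence number.
 * Insertion sort is adequate for a small mailbox. */
void mailbox_reorder(msg_t *m, int n)
{
    for (int i = 1; i < n; i++) {
        msg_t key = m[i];
        int j = i - 1;
        while (j >= 0 && m[j].seq > key.seq) {
            m[j + 1] = m[j];
            j--;
        }
        m[j + 1] = key;
    }
}
```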

More details
23-06-2016 publication date

NON-SERIALIZED PUSH INSTRUCTION FOR PUSHING A MESSAGE PAYLOAD FROM A SENDING THREAD TO A RECEIVING THREAD

Number: US20160179518A1
Assignee:

In at least some embodiments, a processor core executes a sending thread including a first push instruction and a second push instruction subsequent to the first push instruction in a program order. Each of the first and second push instructions requests that a respective message payload be pushed to a mailbox of a receiving thread. In response to executing the first and second push instructions, the processor core transmits respective first and second co-processor requests to a switch in the data processing system via an interconnect fabric of the data processing system. The processor core transmits the second co-processor request to the switch without regard to acceptance of the first co-processor request by the switch. 1. A method of data processing within a data processing system, the method comprising: in a processor core, executing a sending thread including a first push instruction and a second push instruction subsequent to the first push instruction in a program order, wherein each of the first and second push instructions requests that a respective message payload be pushed to a mailbox of a receiving thread; in response to executing the first and second push instructions, the processor core transmitting respective first and second co-processor requests to a switch in the data processing system via an interconnect fabric of the data processing system, wherein the transmitting includes the processor core transmitting the second co-processor request to the switch without regard to acceptance of the first co-processor request by the switch. 2. The method of claim 1, wherein the executing includes executing the first and second push instructions without regard to the program order. 3. The method of claim 1, wherein transmitting respective first and second co-processor requests to the switch includes transmitting, in each of the first and second co-processor requests, a software-supplied sequence number that orders the respective message ...

More details
23-06-2016 publication date

Addressing for inter-thread push communication

Number: US20160179590A1
Assignee: International Business Machines Corp

In a data processing system, a switch includes a receive data structure including receive entries each uniquely corresponding to a receive window, where each receive entry includes addressing information for one or more mailboxes into which messages can be injected, a send data structure including send entries each uniquely corresponding to a send window, where each send entry includes a receive window field that identifies one or more receive windows, and switch logic. The switch logic, responsive to a request to push a message to one or more receiving threads, accesses a send entry that corresponds to a send window of the sending thread, utilizes contents of the receive window field of the send entry to access one or more of the receive entries, and pushes the message to one or more mailboxes of one or more receiving threads utilizing the addressing information of the receive entry or entries.
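The two-level lookup (send window → send entry → receive window field → receive entry → mailbox address) can be sketched with two small tables. Table sizes and field layouts are invented; a real switch entry would carry more addressing state than a single address.

```c
#include <stdint.h>

/* Hypothetical send/receive window tables of the switch. */
#define N_WINDOWS 8

typedef struct { uint64_t mailbox_addr; } recv_entry;  /* addressing info */
typedef struct { int recv_window; } send_entry;        /* receive window field */

typedef struct {
    send_entry send_tab[N_WINDOWS];
    recv_entry recv_tab[N_WINDOWS];
} switch_tables;

/* Resolve the mailbox address for a push issued on a given send window:
 * the send entry names a receive window, whose entry holds the address. */
uint64_t resolve_mailbox(const switch_tables *t, int send_window)
{
    int rw = t->send_tab[send_window].recv_window;
    return t->recv_tab[rw].mailbox_addr;
}
```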

More details
23-06-2016 publication date

PUSH INSTRUCTION FOR PUSHING A MESSAGE PAYLOAD FROM A SENDING THREAD TO A RECEIVING THREAD

Number: US20160179591A1

A processor core of a data processing system receives a push instruction of a sending thread that requests that a message payload identified by at least one operand of the push instruction be pushed to a mailbox of a receiving thread. In response to receiving the push instruction, the processor core executes the push instruction of the sending thread. In response to executing the push instruction, the processor core initiates transmission of the message payload to the mailbox of the receiving thread. In one embodiment, the processor core initiates transmission of the message payload by transmitting a co-processor request to a switch of the data processing system via an interconnect fabric. 1.-7. (canceled) 8. A processing unit, comprising: a memory; and a processor core coupled to the memory, wherein the processor core includes at least one execution unit that, responsive to receiving a push instruction of a sending thread that requests that a message payload identified by at least one operand of the push instruction be pushed to a mailbox of a receiving thread, executes the push instruction, and wherein the processor core, responsive to executing the push instruction, initiates transmission of the message payload to the mailbox of the receiving thread. 9. The processing unit of claim 8, wherein the processor core initiates transmission of the message payload by transmitting a co-processor request to a switch in the data processing system via an interconnect fabric of the data processing system. 10. The processing unit of claim 9, wherein: the push instruction includes a co-processor type parameter; the data processing system includes multiple switches including the switch; and the processor core transmits the co-processor type parameter in the co-processor request on the interconnect fabric to identify the switch as responsible for servicing the co-processor request. 11. The processing unit of claim 9, wherein: the switch includes a data structure including a plurality of ...

More details
23-06-2016 publication date

ADDRESSING FOR INTER-THREAD PUSH COMMUNICATION

Number: US20160179592A1
Assignee:

In a data processing system, a switch includes a receive data structure including receive entries each uniquely corresponding to a receive window, where each receive entry includes addressing information for one or more mailboxes into which messages can be injected, a send data structure including send entries each uniquely corresponding to a send window, where each send entry includes a receive window field that identifies one or more receive windows, and switch logic. The switch logic, responsive to a request to push a message to one or more receiving threads, accesses a send entry that corresponds to a send window of the sending thread, utilizes contents of the receive window field of the send entry to access one or more of the receive entries, and pushes the message to one or more mailboxes of one or more receiving threads utilizing the addressing information of the receive entry or entries. 1. A method of inter-thread push communication in a data processing system, the method comprising: maintaining, in a switch of the data processing system, a receive data structure including a plurality of receive entries each uniquely corresponding to a receive window, wherein each of the plurality of receive entries includes addressing information for one or more mailboxes into which messages can be injected via inter-thread push communication; maintaining, in the switch, a send data structure including a plurality of send entries each uniquely corresponding to a send window, wherein each of the plurality of send entries includes a receive window field that identifies one or more receive windows; and accessing a send entry among the plurality of send entries that corresponds to a send window of the sending thread; accessing one or more of the plurality of receive entries utilizing contents of the receive window field of the send entry; and pushing the message to one or more mailboxes of the one or more receiving threads utilizing the addressing information of the one or ...

More details
23-06-2016 publication date

PUSH INSTRUCTION FOR PUSHING A MESSAGE PAYLOAD FROM A SENDING THREAD TO A RECEIVING THREAD

Number: US20160179593A1
Assignee:

A processor core of a data processing system receives a push instruction of a sending thread that requests that a message payload identified by at least one operand of the push instruction be pushed to a mailbox of a receiving thread. In response to receiving the push instruction, the processor core executes the push instruction of the sending thread. In response to executing the push instruction, the processor core initiates transmission of the message payload to the mailbox of the receiving thread. In one embodiment, the processor core initiates transmission of the message payload by transmitting a co-processor request to a switch of the data processing system via an interconnect fabric. 1. A method of data processing within a data processing system, the method comprising: in a processor core, receiving a push instruction of a sending thread that requests that a message payload identified by at least one operand of the push instruction be pushed to a mailbox of a receiving thread; and in response to receiving the push instruction, the processor core executing the push instruction and, in response to executing the push instruction, initiating transmission of the message payload to the mailbox of the receiving thread. 2. The method of claim 1, wherein initiating transmission of the message payload includes transmitting a co-processor request to a switch in the data processing system via an interconnect fabric of the data processing system. 3. The method of claim 2, wherein: receiving the push instruction includes receiving the push instruction with a co-processor type parameter; the data processing system includes multiple switches including the switch; and initiating transmission of the message payload includes transmitting the co-processor type parameter in the co-processor request on the interconnect fabric to identify the switch as responsible for servicing the co-processor request. 4. The method of claim 2, wherein: the switch includes a data structure including a ...

More details
04-06-2020 publication date

SELECTIVELY PREVENTING PRE-COHERENCE POINT READS IN A CACHE HIERARCHY TO REDUCE BARRIER OVERHEAD

Number: US20200174931A1
Assignee:

A data processing system includes a processor core having a shared store-through upper level cache and a store-in lower level cache. The processor core executes a plurality of simultaneous hardware threads of execution including at least a first thread and a second thread, and the shared store-through upper level cache stores a first cache line accessible to both the first thread and the second thread. The processor core executes in the first thread a store instruction that generates a store request specifying a target address of a storage location corresponding to the first cache line. Based on the target address hitting in the shared store-through upper level cache, the first cache line is temporarily marked, in the shared store-through upper level cache, as private to the first thread, such that any memory access request by the second thread targeting the storage location will miss in the shared store-through upper level cache. 1. A method of data processing in a data processing system including a processor core having a shared store-through upper level cache and a store-in lower level cache, the method comprising: the processor core executing a plurality of simultaneous hardware threads of execution including at least a first thread and a second thread; storing in the shared store-through upper level cache a first cache line accessible to both the first thread and the second thread; the processor core executing in the first thread a store instruction that generates a store request specifying a target address of a storage location corresponding to the first cache line; and based on the target address hitting in the shared store-through upper level cache, temporarily marking, in the shared store-through upper level cache, the first cache line as private to the first thread, such that any memory access request by the second thread targeting the storage location will miss in the shared store-through upper level cache. 2. The method of claim 1, and further comprising ...
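A sketch of the marking scheme, assuming invented structure names: a store by one hardware thread flips the line into a private mode, after which lookups by sibling threads miss even on a tag match and must go to the point of coherency.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical L1 line with a temporary per-thread privacy mark. */
typedef struct {
    uint64_t tag;
    bool     valid;
    bool     private_mode;  /* temporarily private to owner_tid? */
    int      owner_tid;
} l1_line;

/* On a store hit, mark the line private to the storing hardware thread. */
void mark_private_on_store(l1_line *ln, int storing_tid)
{
    ln->private_mode = true;
    ln->owner_tid    = storing_tid;
}

/* A lookup hits only if the line is not private to a different thread. */
bool l1_lookup_hits(const l1_line *ln, uint64_t tag, int tid)
{
    if (!ln->valid || ln->tag != tag)
        return false;
    return !ln->private_mode || ln->owner_tid == tid;
}
```

Forcing sibling threads to miss keeps them from reading the new value before it reaches the coherence point, which is what lets barrier overhead be reduced.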

Publication date: 15-07-2021

EARLY COMMITMENT OF A STORE-CONDITIONAL REQUEST

Number: US20210216457A1

A data processing system includes multiple processing units all having access to a shared memory system. A processing unit includes a lower level cache configured to serve as a point of systemwide coherency and a processor core coupled to the lower level cache. The processor core includes an upper level cache, an execution unit that executes a store-conditional instruction to generate a store-conditional request that specifies a store target address and store data, and a flag that, when set, indicates the store-conditional request can be completed early in the processor core. The processor core also includes completion logic configured to commit an update of the shared memory system with the store data specified by the store-conditional request based on whether the flag is set.

1. A processing unit for a data processing system including multiple processing units all having access to a shared memory system, said processing unit comprising: a lower level cache configured to serve as a point of systemwide coherency; and a processor core coupled to the lower level cache, the processor core including: an upper level cache; an execution unit that executes a store-conditional instruction, wherein execution of the store-conditional instruction generates a store-conditional request that specifies a store target address and store data; a flag that, when set, indicates the store-conditional request can be completed early in the processor core; and completion logic configured to commit an update of the shared memory system with the store data in the processor core based on whether the flag is set.

2. The processing unit of claim 1, wherein the processor core is configured to set the flag to indicate the store-conditional request can be committed to the shared memory system based on an indication received from the lower level cache in conjunction with data from a target cache line identified by the store target address.

3. The processing unit of claim 2, wherein the lower ...
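The flag-gated completion path can be sketched as follows; the function and parameter names are illustrative assumptions, not the patent's logic:

```python
# Sketch: when the early-completion flag is set, the core commits the
# store-conditional update locally; otherwise the lower level cache
# (the point of systemwide coherency) returns the pass/fail outcome.
def complete_store_conditional(flag_set, commit_in_core, forward_to_l2):
    if flag_set:
        commit_in_core()             # early commitment in the processor core
        return "pass"
    return forward_to_l2()           # L2 decides pass/fail

committed = []
assert complete_store_conditional(True, lambda: committed.append(1),
                                  lambda: "fail") == "pass"
assert committed == [1]              # update committed locally
assert complete_store_conditional(False, lambda: committed.append(1),
                                  lambda: "fail") == "fail"
assert committed == [1]              # no local commit when the flag is clear
```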

Publication date: 09-10-2014

TRANSIENT CONDITION MANAGEMENT UTILIZING A POSTED ERROR DETECTION PROCESSING PROTOCOL

Number: US20140304558A1

In a data processing system, a memory subsystem detects whether or not at least one potentially transient condition is present that would prevent timely servicing of one or more memory access requests directed to the associated system memory. In response to detecting at least one such potentially transient condition, the memory subsystem identifies a first read request affected by the at least one potentially transient condition. In response to identifying the read request, the memory subsystem signals to a request source to issue a second read request for the same target address by transmitting to the request source dummy data and a data error indicator.

1. A method of data processing in a data processing system including a memory subsystem that controls access to a system memory, the method comprising: the memory subsystem detecting whether or not at least one potentially transient condition is present that would prevent timely servicing of one or more memory access requests directed to the associated system memory; in response to detecting at least one potentially transient condition that would prevent timely servicing of one or more memory access requests directed to the associated system memory, the memory subsystem identifying a first read request affected by the at least one potentially transient condition, wherein the first read request specifies a target address; and in response to identifying the read request, the memory subsystem signaling to a request source to issue a second read request for the target address by transmitting, to the request source, dummy data and a data error indicator.

2. The method of claim 1, wherein detecting whether or not at least one potentially transient condition is present includes detecting an address error.

3. The method of claim 1, wherein detecting whether or not at least one potentially transient condition is present includes detecting a memory refresh cycle.

4. The method of claim 1, wherein detecting whether or not at least ...
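The posted error detection protocol above can be modeled as a retry loop driven by the data error indicator; all names are assumptions for illustration:

```python
# Toy model: under a transient condition the memory subsystem returns dummy
# data plus a set data error indicator, prompting the source to reissue.
def read(memory, addr, transient_now):
    if transient_now(addr):
        return b"\x00" * 8, True     # dummy data, data error indicator set
    return memory[addr], False       # good data, no error

def read_with_retry(memory, addr, transient_now, max_tries=3):
    for _ in range(max_tries):
        data, error = read(memory, addr, transient_now)
        if not error:
            return data              # serviced on a later, second request
    raise TimeoutError("read not serviced")

busy = iter([True, False])           # transient condition clears after one try
mem = {0x40: b"DATAWORD"}
assert read_with_retry(mem, 0x40, lambda a: next(busy)) == b"DATAWORD"
```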

Publication date: 09-10-2014

TRANSIENT CONDITION MANAGEMENT UTILIZING A POSTED ERROR DETECTION PROCESSING PROTOCOL

Number: US20140304573A1

In a data processing system, a memory subsystem detects whether or not at least one potentially transient condition is present that would prevent timely servicing of one or more memory access requests directed to the associated system memory. In response to detecting at least one such potentially transient condition, the memory subsystem identifies a first read request affected by the at least one potentially transient condition. In response to identifying the read request, the memory subsystem signals to a request source to issue a second read request for the same target address by transmitting to the request source dummy data and a data error indicator.

1.-7. (canceled)

8. A memory subsystem for controlling access to a system memory of a data processing system, the memory subsystem comprising: error detection circuitry; and control logic that: detects whether or not at least one potentially transient condition is present that would prevent timely servicing of one or more memory access requests directed to the associated system memory; in response to detecting at least one potentially transient condition that would prevent timely servicing of one or more memory access requests directed to the associated system memory, identifies a first read request affected by the at least one potentially transient condition, wherein the first read request specifies a target address; and in response to identifying the read request, signals to a request source to issue a second read request for the target address by transmitting, to the request source, dummy data and a data error indicator.

9. The memory subsystem of claim 8, wherein the at least one potentially transient condition is an address error.

10. The memory subsystem of claim 8, wherein the at least one potentially transient condition is a memory refresh cycle.

11. The memory subsystem of claim 8, wherein the at least one potentially transient condition is a hang condition.

12. The memory subsystem of claim 8, wherein: ...

Publication date: 18-07-2019

REMOTE NODE BROADCAST OF REQUESTS IN A MULTINODE DATA PROCESSING SYSTEM

Number: US20190220409A1

A cache coherent data processing system includes at least non-overlapping first, second, and third coherency domains. A master in the first coherency domain of the cache coherent data processing system selects a scope of an initial broadcast of an interconnect operation from among a set of scopes including (1) a remote scope including both the first coherency domain and the second coherency domain, but excluding the third coherency domain that is a peer of the first coherency domain, and (2) a local scope including only the first coherency domain. The master then performs an initial broadcast of the interconnect operation within the cache coherent data processing system utilizing the selected scope, where performing the initial broadcast includes the master initiating broadcast of the interconnect operation within the first coherency domain.

1. A method of data processing in a cache coherent data processing system including at least non-overlapping first, second, and third coherency domains, said method comprising: a master in the first coherency domain of the cache coherent data processing system selecting a scope of an initial broadcast of an interconnect operation from among a set of scopes including (1) a remote scope including both the first coherency domain and the second coherency domain, but excluding the third coherency domain that is a peer of the first coherency domain, and (2) a local scope including only the first coherency domain; and the master performing an initial broadcast of the interconnect operation within the cache coherent data processing system utilizing the selected scope, wherein performing the initial broadcast includes the master initiating broadcast of the interconnect operation within the first coherency domain.

2. The method of claim 1, wherein: the operation includes a request address; and the selecting includes selecting the selected scope based on which coherency domain includes a home system memory of the request address.

3. The ...
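The scope-selection policy in claim 2, choosing by the home coherency domain of the request address, can be sketched as follows; the function names and domain labels are illustrative assumptions:

```python
# Sketch: select a local scope when the address's home system memory is in
# the master's own domain; otherwise a remote scope reaching only the local
# and home domains, excluding peer domains.
def select_scope(local_domain, home_domain_of, addr):
    home = home_domain_of(addr)
    if home == local_domain:
        return ("local", {local_domain})
    return ("remote", {local_domain, home})

home_of = {0x1000: "node0", 0x2000: "node2"}.get
assert select_scope("node0", home_of, 0x1000) == ("local", {"node0"})
assert select_scope("node0", home_of, 0x2000) == ("remote", {"node0", "node2"})
```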

Publication date: 27-08-2015

MANAGING SPECULATIVE MEMORY ACCESS REQUESTS IN THE PRESENCE OF TRANSACTIONAL STORAGE ACCESSES

Number: US20150242250A1

In at least some embodiments, a cache memory of a data processing system receives a speculative memory access request including a target address of data speculatively requested for a processor core. In response to receipt of the speculative memory access request, transactional memory logic determines whether or not the target address of the speculative memory access request hits a store footprint of a memory transaction. In response to determining that the target address of the speculative memory access request hits a store footprint of a memory transaction, the transactional memory logic causes the cache memory to reject servicing the speculative memory access request.

1.-6. (canceled)

7. A processing unit, comprising: a processor core; a cache memory coupled to the processor core; and transactional memory logic that, responsive to receipt of a speculative memory access request at the cache memory that includes a target address of data speculatively requested for the processor core, determines whether the target address of the speculative memory access request hits a store footprint of a memory transaction and, responsive to determining that the target address of the speculative memory access request hits a store footprint of a memory transaction, causes the cache memory to reject servicing the speculative memory access request.

8. The processing unit of claim 7, wherein the speculative memory access request is a load request.

9. The processing unit of claim 7, wherein the speculative memory access request comprises a data prefetch request.

10. The processing unit of claim 7, wherein: the transactional memory logic determines whether the speculative memory access request is a transactional speculative memory access request or a non-transactional speculative memory access request; the transactional memory logic causes the cache memory to reject servicing the speculative memory access request in response to determining the memory access request is a transactional ...

Publication date: 27-08-2015

MANAGING SPECULATIVE MEMORY ACCESS REQUESTS IN THE PRESENCE OF TRANSACTIONAL STORAGE ACCESSES

Number: US20150242251A1

In at least some embodiments, a cache memory of a data processing system receives a speculative memory access request including a target address of data speculatively requested for a processor core. In response to receipt of the speculative memory access request, transactional memory logic determines whether or not the target address of the speculative memory access request hits a store footprint of a memory transaction. In response to determining that the target address of the speculative memory access request hits a store footprint of a memory transaction, the transactional memory logic causes the cache memory to reject servicing the speculative memory access request.

1. A method of data processing in a data processing system, the method comprising: at a cache memory of a data processing system, receiving a speculative memory access request, the speculative memory access request including a target address of data speculatively requested for a processor core; in response to receipt of the speculative memory access request, transactional memory logic determining whether the target address of the speculative memory access request hits a store footprint of a memory transaction; and in response to determining that the target address of the speculative memory access request hits a store footprint of a memory transaction, the transactional memory logic causing the cache memory to reject servicing the speculative memory access request.

2. The method of claim 1, wherein the speculative memory access request is a load request.

3. The method of claim 1, wherein the speculative memory access request comprises a data prefetch request.

4. The method of claim 1, wherein: the determining includes determining whether the speculative memory access request is a transactional speculative memory access request or a non-transactional speculative memory access request; causing the cache memory to reject servicing the speculative memory access request comprises causing the cache memory to ...
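The footprint check above reduces to a membership test on the set of addresses a transaction has tentatively stored to; this minimal sketch uses assumed names:

```python
# Sketch: a speculative access whose target address hits a memory
# transaction's store footprint is rejected rather than serviced, so it
# cannot observe tentative (uncommitted) transactional data.
def service_speculative(req_addr, store_footprint):
    if req_addr in store_footprint:
        return "rejected"
    return "serviced"

footprint = {0x80, 0x88}             # lines stored to inside the transaction
assert service_speculative(0x80, footprint) == "rejected"
assert service_speculative(0x90, footprint) == "serviced"
```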

Publication date: 27-08-2015

SYNCHRONIZING ACCESS TO DATA IN SHARED MEMORY

Number: US20150242320A1

In some embodiments, in response to execution of a load-reserve instruction that binds to a load target address held in a store-through upper level cache, a processor core sets a core reservation flag, transmits a load-reserve operation to a store-in lower level cache, and tracks, during a core reservation tracking interval, the reservation requested by the load-reserve operation until the store-in lower level cache signals that the store-in lower level cache has assumed responsibility for tracking the reservation. In response to receipt during the core reservation tracking interval of an invalidation signal indicating presence of a conflicting snooped operation, the processor core cancels the reservation by resetting the core reservation flag and fails a subsequent store-conditional operation. Responsive to not canceling the reservation during the core reservation tracking interval, the processor core determines whether a store-conditional operation succeeds by reference to a pass/fail indication provided by the store-in lower level cache.

1. A processing unit for a multiprocessor data processing system, said processing unit comprising: a processor core including a store-through upper level cache, an instruction execution unit, and a core reservation flag, wherein the processor core is configured, responsive to execution of a load-reserve instruction that binds to a load target address held in the store-through upper level cache, to: set the core reservation flag; transmit a load-reserve operation to a store-in lower level cache that records a reservation in the store-in lower level cache for the load target address; track, during a core reservation tracking interval, the reservation requested by the load-reserve operation until the store-in lower level cache signals that the store-in lower level cache has assumed responsibility for tracking the reservation; responsive to receipt during the core reservation tracking interval of an invalidation signal indicating presence of a conflicting snooped operation, cancel the reservation by resetting the core reservation flag and fail a subsequent store-conditional operation; and responsive to ...
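The reservation-cancellation behavior can be sketched as a small software model; the class and method names are illustrative assumptions, not the patent's hardware:

```python
# Sketch: a conflicting invalidation snoop arriving while the core tracks
# the reservation resets the core reservation flag, so the subsequent
# store-conditional fails.
class Core:
    def __init__(self):
        self.reservation = None      # reserved address, or None

    def load_reserve(self, addr):
        self.reservation = addr      # set core reservation flag

    def snoop_invalidate(self, addr):
        if self.reservation == addr:
            self.reservation = None  # conflicting snoop cancels reservation

    def store_conditional(self, addr):
        ok = self.reservation == addr
        self.reservation = None      # reservation consumed either way
        return ok

core = Core()
core.load_reserve(0x10)
core.snoop_invalidate(0x10)          # conflicting store by another core
assert core.store_conditional(0x10) is False
core.load_reserve(0x10)
assert core.store_conditional(0x10) is True
```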

Publication date: 11-12-2014

INTEGRATED CIRCUIT SYSTEM HAVING DECOUPLED LOGICAL AND PHYSICAL INTERFACES

Number: US20140365733A1

An integrated circuit system includes a first integrated circuit chip including first logic, a second integrated circuit chip, and second logic distributed across the first and second integrated circuit chips. The second logic includes a first unit integrated in the first integrated circuit chip and a second unit integrated in the second integrated circuit chip. The integrated circuit system further includes a physical communication link coupling the first unit in the first integrated circuit chip and the second unit in the second integrated circuit chip and a request interface between the first logic and first unit of the second logic. The request interface is implemented in the first integrated circuit such that communication via the request interface between the first logic and the first unit of the second logic has low latency and such that the request interface is decoupled from the physical communication link.

1. An integrated circuit system, comprising: a first integrated circuit chip including first logic; a second integrated circuit chip; second logic distributed across the first and second integrated circuit chips, wherein the second logic includes a first unit integrated in the first integrated circuit chip and a second unit integrated in the second integrated circuit chip; a physical communication link coupling the first unit in the first integrated circuit chip and the second unit in the second integrated circuit chip; and a request interface between the first logic and first unit of the second logic, wherein the request interface is implemented in the first integrated circuit such that communication via the request interface between the first logic and the first unit of the second logic has low latency and such that the request interface is decoupled from the physical communication link.

2. The integrated circuit system of claim 1, wherein the first unit of the second logic includes a cache memory.

3. The integrated circuit system of claim 2, wherein the ...

Publication date: 08-10-2015

TECHNIQUES FOR IMPLEMENTING BARRIERS TO EFFICIENTLY SUPPORT CUMULATIVITY IN A WEAKLY-ORDERED MEMORY SYSTEM

Number: US20150286569A1

A technique for operating a cache memory of a data processing system includes creating respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory. Dependencies in a dependency data structure of a store queue of the cache memory are set based on the pollution vectors to reduce unnecessary ordering effects. Store operations are dispatched from the store queue in accordance with the dependencies indicated by the dependency data structure.

1.-7. (canceled)

8. A cache memory, comprising: a data array; a store queue configured to buffer synchronization operations and store operations; a pollution vector control configured to create respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory; and a store queue controller configured to set dependencies in a dependency data structure of the store queue based on the pollution vectors to reduce unnecessary ordering effects and dispatch store operations from the store queue in accordance with the dependencies indicated by the dependency data structure.

9. The cache memory of claim 8, wherein the cache memory is a lower level cache memory and the cache memory is further configured to: compare a load target address of a load operation that hits in an upper level cache memory that is associated with the lower level cache memory to active store target addresses in the lower level cache memory; and in response to a match of the load target address with one or more of the active store target addresses in the lower level cache memory, set an associated one of the respective pollution vectors to indicate that a thread that issued the load operation is polluted by a given store operation active in the lower level cache memory.

10. The cache memory of claim 8, wherein the cache memory is ...

Publication date: 08-10-2015

TECHNIQUES FOR IMPLEMENTING BARRIERS TO EFFICIENTLY SUPPORT CUMULATIVITY IN A WEAKLY-ORDERED MEMORY SYSTEM

Number: US20150286570A1

A technique for operating a cache memory of a data processing system includes creating respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory. Dependencies in a dependency data structure of a store queue of the cache memory are set based on the pollution vectors to reduce unnecessary ordering effects. Store operations are dispatched from the store queue in accordance with the dependencies indicated by the dependency data structure.

1. A method of operating a cache memory of a data processing system, comprising: creating, by a pollution vector control, respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory; setting, by a store queue controller, dependencies in a dependency data structure of a store queue of the cache memory based on the pollution vectors to reduce unnecessary ordering effects; and dispatching store operations from the store queue in accordance with the dependencies indicated by the dependency data structure.

2. The method of claim 1, wherein the cache memory is a lower level cache memory and the method further comprises: comparing, by respective comparators, a load target address of a load operation that hits in an upper level cache memory that is associated with the lower level cache memory to active store target addresses in the lower level cache memory; and in response to a match of the load target address with one or more of the active store target addresses in the lower level cache memory, setting, by the pollution vector control, an associated one of the respective pollution vectors to indicate that a thread that issued the load operation is polluted by a given store operation active in the lower level cache memory.

3. The method of claim 1, further comprising: closing, by the store ...
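A pollution vector is naturally a per-store bitmask with one bit per thread; this sketch uses assumed names to show how a barrier need only order stores whose vector marks the barrier's thread:

```python
# Sketch: bit t of a store's pollution vector is set when thread t's load
# has observed ("been polluted by") that store while it is still queued.
def pollute(vector, thread):
    return vector | (1 << thread)

def barrier_must_wait_on(vector, thread):
    # A barrier from `thread` only orders stores that polluted that thread,
    # avoiding unnecessary ordering effects on unrelated stores.
    return bool(vector & (1 << thread))

v = 0
v = pollute(v, 2)                    # thread 2's load hit this store's line
assert barrier_must_wait_on(v, 2) is True
assert barrier_must_wait_on(v, 0) is False   # thread 0 needs no ordering
```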

Publication date: 12-10-2017

DECREASING THE DATA HANDOFF INTERVAL IN A MULTIPROCESSOR DATA PROCESSING SYSTEM BASED ON AN EARLY INDICATION OF A SYSTEMWIDE COHERENCE RESPONSE

Number: US20170293557A1

A multiprocessor data processing system includes multiple vertical cache hierarchies supporting a plurality of processor cores, a system memory, and a system interconnect coupled to the system memory and the multiple vertical cache hierarchies. A first cache memory in a first vertical cache hierarchy issues on the system interconnect a request for a target cache line. Responsive to the request and prior to receiving a systemwide coherence response for the request, the first cache memory receives from a second cache memory in a second vertical cache hierarchy by cache-to-cache intervention the target cache line and an early indication of the systemwide coherence response for the request. In response to the early indication of the systemwide coherence response and prior to receiving the systemwide coherence response, the first cache memory initiates processing to install the target cache line in the first cache memory.

1. A method of data processing in a multiprocessor data processing system including multiple vertical cache hierarchies supporting a plurality of processor cores, a system memory, and a system interconnect coupled to the system memory and the multiple vertical cache hierarchies, the method comprising: a first cache memory in a first vertical cache hierarchy issuing on the system interconnect a request for a target cache line; responsive to the request and prior to receiving a systemwide coherence response for the request, the first cache memory receiving from a second cache memory in a second vertical cache hierarchy by cache-to-cache intervention the target cache line and an early indication of the systemwide coherence response for the request; and in response to the early indication of the systemwide coherence response and prior to receiving the systemwide coherence response, the first cache memory initiating processing to install the target cache line in the first cache memory.

2. The method of claim 1, and further comprising: the first cache memory ...

Publication date: 12-10-2017

Decreasing the data handoff interval for a reserved cache line based on an early indication of a systemwide coherence response

Number: US20170293558A1
Assignee: International Business Machines Corp

A multiprocessor data processing system includes multiple vertical cache hierarchies supporting a plurality of processor cores, a system memory, and a system interconnect. In response to a load-and-reserve request from a first processor core, a first cache memory supporting the first processor core issues on the system interconnect a memory access request for a target cache line of the load-and-reserve request. Responsive to the memory access request and prior to receiving a systemwide coherence response for the memory access request, the first cache memory receives from a second cache memory in a second vertical cache hierarchy by cache-to-cache intervention the target cache line and an early indication of the systemwide coherence response for the memory access request. In response to the early indication and prior to receiving the systemwide coherence response, the first cache memory initiates processing to update the target cache line in the first cache memory.

Publication date: 12-10-2017

EARLY FREEING OF A SNOOP MACHINE OF A DATA PROCESSING SYSTEM PRIOR TO COMPLETION OF SNOOP PROCESSING FOR AN INTERCONNECT OPERATION

Number: US20170293559A1

In at least one embodiment, a multiprocessor data processing system includes multiple vertical cache hierarchies supporting a plurality of processor cores, a system memory, and an interconnect fabric. In response to a first cache memory snooping on the interconnect fabric a request of an interconnect operation of a second cache memory, the first cache memory allocates a snoop machine to service the request. Responsive to the snoop machine completing its processing of the request and prior to the first cache memory receiving a systemwide coherence response of the interconnect operation, the first cache memory allocates an entry in a data structure to handle completion of processing for the interconnect operation and deallocates the snoop machine. The entry of the data structure protects transfer of coherence ownership of a target cache line from the first cache memory to the second cache memory during a protection window extending at least until the systemwide coherence response is received.

1. A method of data processing in a multiprocessor data processing system including multiple vertical cache hierarchies supporting a plurality of processor cores, a system memory, and an interconnect fabric coupled to the system memory and the multiple vertical cache hierarchies, the method comprising: in response to a first cache memory in a first vertical cache hierarchy supporting a first processor core snooping on the interconnect fabric a request of an interconnect operation of a second cache memory in a second vertical cache hierarchy, the first cache memory allocating a snoop machine to service the request; responsive to the snoop machine completing its processing of the request and prior to the first cache memory receiving a systemwide coherence response of the interconnect operation, the first cache memory allocating an entry in a data structure to handle completion of processing for the interconnect operation and deallocating the snoop machine; and the entry of ...
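The early-freeing idea can be modeled minimally: once the snoop machine finishes, a lightweight table entry keeps protecting the transferred line until the systemwide response arrives. All names here are assumptions for illustration:

```python
# Toy model: finishing request processing frees the snoop machine
# immediately, while a data-structure entry extends the protection window.
class Snooper:
    def __init__(self, n_machines):
        self.free_machines = n_machines
        self.protection_table = set()    # addresses still being protected

    def finish_processing(self, addr):
        self.protection_table.add(addr)  # entry takes over protection
        self.free_machines += 1          # snoop machine reusable at once

    def systemwide_response(self, addr):
        self.protection_table.discard(addr)  # protection window ends

    def is_protected(self, addr):
        return addr in self.protection_table

s = Snooper(n_machines=0)
s.finish_processing(0xA0)
assert s.free_machines == 1 and s.is_protected(0xA0)
s.systemwide_response(0xA0)
assert not s.is_protected(0xA0)
```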

Publication date: 31-10-2019

INTEGRATED CIRCUIT AND DATA PROCESSING SYSTEM SUPPORTING ATTACHMENT OF A REAL ADDRESS-AGNOSTIC ACCELERATOR

Number: US20190332537A1

An integrated circuit for a coherent data processing system includes a first communication interface for communicatively coupling the integrated circuit with the coherent data processing system, a second communication interface for communicatively coupling the integrated circuit with an accelerator unit including an effective address-based accelerator cache for buffering copies of data from a system memory of the coherent data processing system, and a real address-based directory inclusive of contents of the accelerator cache. The real address-based directory assigns entries based on real addresses utilized to identify storage locations in the system memory. The integrated circuit further includes request logic that communicates memory access requests and request responses with the accelerator unit via the second communication interface. A request response identifies a target of a corresponding memory access request utilizing a host tag specifying an entry associated with the target in the real address-based directory.
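The host-tag indirection used here and in the following accelerator entries can be sketched simply: the host's real-address-based directory hands out an index (the host tag) so the accelerator never needs the real address. The class and method names are illustrative assumptions:

```python
# Sketch: the host tag is an index into the host's real-address-based
# directory, letting request responses identify targets without exposing
# real addresses to the accelerator.
class HostDirectory:
    def __init__(self):
        self.entries = []                # host tag == index into this list

    def install(self, real_addr):
        self.entries.append(real_addr)
        return len(self.entries) - 1     # host tag for the request response

    def real_address(self, host_tag):
        return self.entries[host_tag]    # host-side lookup only

d = HostDirectory()
tag = d.install(0x7F000)                 # accelerator only ever sees `tag`
assert d.real_address(tag) == 0x7F000
```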

Publication date: 31-10-2019

TRANSLATION INVALIDATION IN A TRANSLATION CACHE SERVING AN ACCELERATOR

Number: US20190332548A1

An integrated circuit includes a first communication interface for communicatively coupling the integrated circuit with a coherent data processing system, a second communication interface for communicatively coupling the integrated circuit with an accelerator unit including an accelerator functional unit and an effective address-based accelerator cache for buffering copies of data from the system memory of the coherent data processing system, and a real address-based directory inclusive of contents of the accelerator cache. The real address-based directory assigns entries based on real addresses utilized to identify storage locations in the system memory. The integrated circuit includes request logic that, responsive to receipt on the first communication interface of a translation entry invalidation request, issues to the accelerator unit via the second communication interface an invalidation request that identifies an entry in the effective address-based accelerator cache to be invalidated utilizing a host tag identifying a storage location in the real address-based directory.

Publication date: 31-10-2019

INTEGRATED CIRCUIT AND DATA PROCESSING SYSTEM HAVING A CONFIGURABLE CACHE DIRECTORY FOR AN ACCELERATOR

Number: US20190332549A1

An integrated circuit includes a first communication interface for communicatively coupling the integrated circuit with a coherent data processing system, a second communication interface for communicatively coupling the integrated circuit with an accelerator unit including an effective address-based accelerator cache for buffering copies of data from a system memory, and a real address-based directory inclusive of contents of the accelerator cache. The real address-based directory assigns entries based on real addresses utilized to identify storage locations in the system memory. The integrated circuit further includes directory control logic that configures at least a number of congruence classes utilized in the real address-based directory based on configuration parameters specified on behalf of or by the accelerator unit.

Publication date: 31-10-2019

INTEGRATED CIRCUIT AND DATA PROCESSING SYSTEM SUPPORTING ADDRESS ALIASING IN AN ACCELERATOR

Number: US20190332551A1

An integrated circuit includes a first communication interface for communicatively coupling the integrated circuit with a coherent data processing system, a second communication interface for communicatively coupling the integrated circuit with an accelerator unit including an effective address-based accelerator cache for buffering copies of data from a system memory, and a real address-based directory inclusive of contents of the accelerator cache. The real address-based directory assigns entries based on real addresses utilized to identify storage locations in the system memory. The integrated circuit further includes request logic that communicates memory access requests and request responses with the accelerator unit. The request logic, responsive to receipt from the accelerator unit of a read-type request specifying an aliased second effective address of a target cache line, provides a request response including a host tag indicating that the accelerator unit has associated a different first effective address with the target cache line.

Publication date: 31-12-2015

VIRTUAL MACHINE BACKUP

Number: US20150378770A1

A virtual machine backup method includes utilizing a log to indicate updates to memory of a virtual machine when the updates are evicted from a cache of the virtual machine. A guard band is determined that indicates a threshold amount of free space for the log. A determination is made that the guard band will be or has been encroached upon corresponding to indicating an update in the log. A backup image of the virtual machine is updated based, at least in part, on a set of one or more entries of the log, wherein the set of entries is sufficient to comply with the guard band. The set of entries is removed from the log.

1. A method comprising: indicating, in a log, updates to memory of a virtual machine when the updates are evicted from a cache of the virtual machine; determining a guard band for the log, wherein the guard band indicates a threshold amount of free space for the log; determining that the guard band will be or has been encroached upon corresponding to indicating an update in the log; updating a backup image of the virtual machine based, at least in part, on a set of one or more entries of the log, wherein the set of entries is sufficient to comply with the guard band; and removing the set of entries from the log.

2. The method of claim 1, wherein said determining a guard band comprises: determining a number of write-back cache lines in the cache; determining a number of instructions in a pipeline for a processor unit that executes instructions issued by the virtual machine; determining a number of additional instructions capable of being issued to the pipeline in the time taken to trigger an interrupt of the processor unit; and defining the guard band based on a sum of the determined number of write-back cache lines, the determined number of instructions, and the determined number of additional instructions.

3. The method of claim 1, wherein said determining a guard band comprises: determining a number of dirty cache lines in the cache; determining a number of ...
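Claim 2's guard-band sizing is a simple sum; this worked example uses illustrative numbers and assumed function names:

```python
# Worked example: the log must always keep room for every update that could
# still arrive after the interrupt is triggered, so the guard band is the
# sum of the three in-flight sources named in claim 2.
def guard_band(writeback_lines, pipeline_insns, insns_until_interrupt):
    return writeback_lines + pipeline_insns + insns_until_interrupt

def must_flush(log_capacity, log_used, band):
    # Flush log entries to the backup image once free space would encroach
    # on the guard band.
    return (log_capacity - log_used) <= band

band = guard_band(writeback_lines=512, pipeline_insns=32,
                  insns_until_interrupt=16)
assert band == 560
assert must_flush(log_capacity=1024, log_used=500, band=band) is True
assert must_flush(log_capacity=4096, log_used=500, band=band) is False
```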

More
27-12-2018 publication date

Efficient enforcement of barriers with respect to memory move sequences

Number: US20180373436A1
Assignee: International Business Machines Corp

In a data processing system implementing a weak memory model, a lower level cache receives, from a processor core, a plurality of copy-type requests and a plurality of paste-type requests that together indicate a memory move to be performed. The lower level cache also receives, from the processor core, a barrier request that requests enforcement of ordering of memory access requests prior to the barrier request with respect to memory access requests after the barrier request. In response to the barrier request, the lower level cache enforces a barrier indicated by the barrier request with respect to a final paste-type request ending the memory move but not with respect to other copy-type requests and paste-type requests in the memory move.
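A minimal sketch of the claimed ordering rule, with the request stream reduced to strings (an assumption of this illustration, not the patent's interface): the barrier is enforced only against the final paste-type request ending the move.

```python
def barrier_ordering_set(sequence):
    """Indices of prior requests the barrier must be ordered against:
    only the final paste-type request ending the memory move, not the
    other copy-type and paste-type requests in the move."""
    if 'barrier' not in sequence:
        return []
    b = sequence.index('barrier')
    pastes = [i for i in range(b) if sequence[i] == 'paste']
    return pastes[-1:]  # empty if no paste precedes the barrier
```

For `['copy', 'paste', 'copy', 'paste', 'barrier']` only index 3, the paste ending the move, is ordered against.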

More
05-12-2019 publication date

MANAGING EFFICIENT SELECTION OF A PARTICULAR PROCESSOR THREAD FOR HANDLING AN INTERRUPT

Number: US20190370198A1
Assignee:

A snooper of a processing unit connected to processing units via a system fabric receives a first single bus command in a bus protocol that allows sampling over the system fabric of the capability of snoopers to handle an interrupt and returns a first response indicating the capability of the snooper to handle the interrupt. The snooper, in response to receiving a second single bus command in the bus protocol to poll a first selection of snoopers for an availability status to service a criteria specified in the second single bus command, returns a second response indicating the availability of the snooper to service the criteria. The snooper, in response to receiving a third single bus command in the bus protocol to direct the snooper to handle the interrupt, assigns the interrupt to a particular processor thread of a respective selection of the one or more separate selections of processor threads distributed in the processing unit.

1. A method comprising:
in response to receiving, by a particular snooper of a particular processing unit of a plurality of processing units connected via a system fabric, a first single bus command in a bus protocol that allows sampling over the system fabric of the capability of a plurality of snoopers to handle an interrupt, returning, by the particular snooper, a first response indicating the capability of the particular snooper to handle the interrupt, wherein each of the plurality of processing units comprises a respective snooper from among the plurality of snoopers distributed across the plurality of processing units, wherein each respective snooper of the plurality of snoopers controls assignment of interrupts to one or more separate selections of processor threads distributed in each of the plurality of processing units;
in response to receiving, by the particular snooper, a second single bus command in the bus protocol to poll a first selection of snoopers indicating capability of the plurality of snoopers for an availability ...
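The three-phase selection can be modeled as a toy Python exchange. Class and method names are invented; the bus fabric, coherence responses, and thread-selection criteria are reduced to plain function calls.

```python
class Snooper:
    """Toy model of one snooper answering the three bus commands."""
    def __init__(self, name, capable, threads):
        self.name = name
        self.capable = capable        # can this snooper handle interrupts?
        self.threads = list(threads)  # its selection of processor threads
        self.assigned = []

    def respond_capability(self):                 # first single bus command
        return self.capable

    def respond_availability(self, criteria):     # second single bus command
        return self.capable and bool(self.threads) and criteria(self)

    def handle(self, interrupt):                  # third single bus command
        thread = self.threads.pop(0)
        self.assigned.append((interrupt, thread))
        return thread

def route_interrupt(snoopers, interrupt, criteria=lambda s: True):
    """Sample capability, poll availability, then direct one snooper to
    assign the interrupt to one of its processor threads."""
    capable = [s for s in snoopers if s.respond_capability()]
    ready = [s for s in capable if s.respond_availability(criteria)]
    return ready[0].handle(interrupt) if ready else None
```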

More
26-12-2019 publication date

SECURE MEMORY IMPLEMENTATION FOR SECURE EXECUTION OF VIRTUAL MACHINES

Number: US20190392143A1
Assignee:

Secure memory implementation for secure execution of virtual machines. Data is processed in a first mode and a second mode, and commands are sent to a chip interconnect bus using real addresses, wherein the chip interconnect bus includes a number of bits for the real addresses. A memory controller is operatively coupled to a memory component. A secure memory range is specified by using range registers. If the real address is detected to be in the secure memory range to match a memory component address, a real address bit is inverted. If the real address is in the secure memory address hole, a security access violation is detected. If the real address is not in the secure address range and the real address bit is set, the security access violation is detected.

1. A computer implemented method comprising:
processing, by one or more computer processors, data in a first mode and a second mode, wherein a data processing unit sends commands to a chip interconnect bus using real addresses;
wherein the chip interconnect bus transports a number of bits for the real addresses;
wherein the chip interconnect bus is larger than a number of bits needed for a maximum memory range supported by the computer system;
wherein a first portion of the bits for real addresses which are not in the range of the supported maximum memory range is used to indicate whether to operate in the first mode or the second mode creating a memory address hole, and wherein a memory controller operatively coupled to a memory component;
specifying, by the one or more computer processors, a secure memory range by using range registers;
responsive to determining that the real address is detected to be in the secure memory range to match a memory component address, setting, by the one or more computer processors, a real address bit; and
responsive to determining that the real address is in the memory address hole, detecting, by the one or more computer processors, a security access violation.

2. The method of claim 1 ...
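The three address checks described in the abstract can be sketched as follows. The bit position (`SECURE_BIT`) and the half-open range encoding are invented for the example; the patent does not specify them.

```python
SECURE_BIT = 1 << 50  # hypothetical real-address bit above the supported maximum

def check_real_address(addr, secure, hole):
    """Apply the abstract's three checks; secure and hole are (lo, hi)
    half-open ranges.  Returns the (possibly modified) real address or
    raises on a security access violation."""
    lo, hi = hole
    if lo <= addr < hi:
        raise PermissionError("address in secure memory address hole")
    lo, hi = secure
    if lo <= addr < hi:
        return addr ^ SECURE_BIT   # in secure range: invert the real address bit
    if addr & SECURE_BIT:
        raise PermissionError("real address bit set outside secure range")
    return addr
```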

More
29-12-2022 publication date

DRAM CACHING STORAGE CLASS MEMORY

Number: US20220414007A1
Assignee:

A method, system, and computer program product for local DRAM caching of storage class memory elements are provided. The method identifies a cache line with a cache address in a local dynamic random-access memory (DRAM). The cache line is compressed within the local DRAM to generate a compressed cache line and an open memory space within the local DRAM. A cache tag is generated in the open memory space and a validation value is generated in the open memory space for the compressed cache line. The method determines a cache-hit for the cache line based on the cache address, the cache tag, and the validation value.

1. A computer-implemented method, comprising:
identifying a cache line with a cache address in a dynamic random-access memory (DRAM) cache, the DRAM cache being a first memory type in a computing environment including a second memory type distinct from the first memory type;
compressing the cache line within the DRAM cache to generate a compressed cache line and an open memory space within the DRAM cache;
generating a cache tag in the open memory space;
generating a validation value in the open memory space for the compressed cache line;
determining a presence of the cache line based on the cache address, the cache tag, and the validation value; and
in response to a cache-hit, delivering at least a portion of the cache line in response to the cache-hit, the portion of the cache line being decompressed for delivery.

2. The method of claim 1, wherein the open memory space is contiguous with the compressed cache line within the DRAM cache.

3. The method of claim 1, further comprising:
segmenting the cache line into a set of quadword sectors;
compressing each quadword sector of the set of quadword sectors to generate a set of compressed quadword sectors; and
generating a cache tag and validation value for each compressed quadword sector of the set of quadword sectors.

4. The method of claim 3, wherein a local address space of a processor is divided into a first region ...
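A toy version of the scheme, using `zlib` as a stand-in compressor: the cache tag and validation value are packed into the space freed by compression, and a hit requires both to match. Slot layout, tag width, and the CRC-32 validation value are assumptions of this sketch.

```python
import zlib

LINE = 128  # hypothetical cache-line size in bytes

def install(line_bytes, real_addr):
    """Compress a line; pack an 8-byte tag and 4-byte validation value
    into the open space.  Returns None if the line did not compress
    enough to hold the metadata."""
    comp = zlib.compress(line_bytes)
    if len(comp) + 12 > LINE:
        return None
    tag = real_addr.to_bytes(8, 'big')
    check = zlib.crc32(comp).to_bytes(4, 'big')
    return comp + tag + check

def lookup(slot, real_addr):
    """Cache-hit test: tag and validation value must both match; on a
    hit the line is decompressed for delivery."""
    if slot is None:
        return None
    comp, tag, check = slot[:-12], slot[-12:-4], slot[-4:]
    if int.from_bytes(tag, 'big') != real_addr:
        return None
    if zlib.crc32(comp).to_bytes(4, 'big') != check:
        return None
    return zlib.decompress(comp)
```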

More
28-08-2008 publication date

Data processing system, method and interconnect fabric supporting destination data tagging

Number: US20080209135A1
Assignee: International Business Machines Corp

A data processing system includes a plurality of communication links and a plurality of processing units including a local master processing unit. The local master processing unit includes interconnect logic that couples the processing unit to one or more of the plurality of communication links and an originating master coupled to the interconnect logic. The originating master originates an operation by issuing a write-type request on at least one of the one or more communication links, receives from a snooper in the data processing system a destination tag identifying a route to the snooper, and, responsive to receipt of the combined response and the destination tag, initiates a data transfer including a data payload and a data tag identifying the route provided within the destination tag.

More
14-02-2017 publication date

Push instruction for pushing a message payload from a sending thread to a receiving thread

Number: US9569293B2
Assignee: International Business Machines Corp

A processor core of a data processing system receives a push instruction of a sending thread that requests that a message payload identified by at least one operand of the push instruction be pushed to a mailbox of a receiving thread. In response to receiving the push instruction, the processor core executes the push instruction of the sending thread. In response to executing the push instruction, the processor core initiates transmission of the message payload to the mailbox of the receiving thread. In one embodiment, the processor core initiates transmission of the message payload by transmitting a co-processor request to a switch of the data processing system via an interconnect fabric.
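A software analogue of the push can be sketched with a bounded `queue.Queue` standing in for the mailbox; the hardware path (co-processor request, switch, interconnect fabric) is not modeled, and the function names are invented.

```python
from queue import Queue

mailboxes = {}  # receiving-thread id -> its mailbox

def create_mailbox(thread_id, depth=8):
    """Give a receiving thread a bounded mailbox (depth is an assumption)."""
    mailboxes[thread_id] = Queue(maxsize=depth)

def push(receiver_id, payload):
    """Analogue of executing the push instruction in the sending thread:
    initiate transmission of the payload to the receiver's mailbox.
    Returns False if the mailbox is missing or full."""
    box = mailboxes.get(receiver_id)
    if box is None or box.full():
        return False
    box.put(payload)
    return True
```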

More
27-03-2023 publication date

LOW-PROFILE SUPPORT STRUCTURE FOR A ROTARY REGENERATIVE HEAT EXCHANGER

Number: DK3966512T3
Assignee: Arvos Ljungstrom LLC

More
12-06-2018 publication date

Memory move instruction sequence enabling software control

Number: US9996298B2
Assignee: International Business Machines Corp

A processor core of a data processing system, in response to a first instruction, generates a copy-type request specifying a source real address and transmits it to a lower level cache. In response to a second instruction, the processor core generates a paste-type request specifying a destination real address associated with a memory-mapped device and transmits it to the lower level cache. In response to receipt of the copy-type request, the lower level cache copies a data granule from a storage location specified by the source real address into a non-architected buffer. In response to receipt of the paste-type request, the lower level cache issues a command to write the data granule from the non-architected buffer to the memory-mapped device. In response to receipt from the memory-mapped device of a busy response, the processor core abandons the memory move instruction sequence and performs alternative processing.
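The copy/paste sequence with its software fallback on a busy response can be sketched as below; the `Device` class, response strings, and fallback signature are assumptions of the illustration.

```python
class Device:
    """Stand-in for a memory-mapped device that may answer 'busy'."""
    def __init__(self, busy=False):
        self.busy = busy
        self.written = []

    def write(self, granule):
        if self.busy:
            return 'busy'
        self.written.append(granule)
        return 'done'

def memory_move(memory, src, device, fallback):
    """Copy a data granule into a (non-architected) buffer, then paste it
    to the device; on a busy response, abandon the sequence and run the
    software-controlled alternative path."""
    buffer = memory[src]                 # copy-type request fills the buffer
    if device.write(buffer) == 'busy':   # paste-type request issues the write
        return fallback(buffer)
    return 'done'
```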

More
17-07-2012 publication date

Lateral cache-to-cache cast-in

Number: US8225045B2
Assignee: International Business Machines Corp

A data processing system includes a first processing unit and a second processing unit coupled by an interconnect fabric. The first processing unit has a first processor core and associated first upper and first lower level caches, and the second processing unit has a second processor core and associated second upper and lower level caches. In response to a data request, a victim cache line is selected for castout from the first lower level cache. The first processing unit issues on the interconnect fabric a lateral castout (LCO) command that identifies the victim cache line to be castout from the first lower level cache and indicates that a lower level cache is an intended destination. In response to a coherence response indicating success of the LCO command, the victim cache line is removed from the first lower level cache and held in the second lower level cache.
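A toy lateral castout: the victim line moves from one lower level cache to a peer's lower level cache instead of being written back to memory. Coherence responses are reduced to a boolean, and the capacity check standing in for the peer's acceptance decision is an assumption.

```python
def lateral_castout(src_cache, dst_cache, victim, dst_capacity):
    """Issue an LCO for victim from src_cache with dst_cache as the
    intended destination; True means a success coherence response, after
    which the line is removed from the source and held in the peer."""
    if victim not in src_cache:
        return False
    if len(dst_cache) >= dst_capacity:
        return False  # peer cannot accept: no success response
    dst_cache[victim] = src_cache.pop(victim)
    return True
```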

More
21-10-2010 publication date

Updating Partial Cache Lines in a Data Processing System

Number: US20100268884A1
Assignee: International Business Machines Corp

A processing unit for a data processing system includes a processor core having one or more execution units for processing instructions and a register file for storing data accessed in processing of the instructions. The processing unit also includes a multi-level cache hierarchy coupled to and supporting the processor core. The multi-level cache hierarchy includes at least one upper level of cache memory having a lower access latency and at least one lower level of cache memory having a higher access latency. The lower level of cache memory, responsive to receipt of a memory access request that hits only a partial cache line in the lower level cache memory, sources the partial cache line to the at least one upper level cache memory to service the memory access request. The at least one upper level cache memory services the memory access request without caching the partial cache line.
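The partial-line behavior can be sketched as a lookup that sources a partial line upward without installing it in the upper level cache. The dictionary layout and the full-line fill path are assumptions added to make the example complete.

```python
def service_request(upper, lower, addr):
    """Service a memory access request: a lower-level hit on only a
    partial cache line is sourced to the upper level without being
    cached there."""
    if addr in upper:
        return upper[addr], 'upper-hit'
    line = lower.get(addr)
    if line is None:
        return None, 'miss'
    if line['partial']:
        return line['data'], 'sourced-uncached'  # upper cache left unchanged
    upper[addr] = line['data']  # full line: normal fill (assumed behavior)
    return line['data'], 'filled'
```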

More
06-08-2009 publication date

Method and Apparatus for Supporting Low-Overhead Memory Locks Within a Multiprocessor System

Number: US20090198916A1
Assignee: International Business Machines Corp

A method for supporting low-overhead memory locks within a multi-processor system is disclosed. A lock control section is initially assigned to a data block within a system memory of the multiprocessor system. In response to a request for accessing the data block by a processing unit within the multiprocessor system, a determination is made by a memory controller whether or not the lock control section of the data block has been set. If the lock control section of the data block has been set, the request for accessing the data block is ignored. Otherwise, if the lock control section of the data block has not been set, the lock control section of the data block is set, and the request for accessing the data block is allowed.
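The memory controller's check of the lock control section reduces to a test-and-set. This minimal sketch invents the field layout; the patent only specifies the ignore/set-and-allow behavior.

```python
class Block:
    """A data block with its assigned lock control section."""
    def __init__(self, data):
        self.lock = False
        self.data = data

def request_access(block):
    """Memory-controller handling of an access request: if the lock
    control section is set, the request is ignored (returns None);
    otherwise the section is set and the access is allowed."""
    if block.lock:
        return None
    block.lock = True
    return block.data

def release(block):
    """Clear the lock control section (release path assumed)."""
    block.lock = False
```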

More
06-12-2016 publication date

Cache backing store for transactional memory

Number: US9514049B2
Assignee: International Business Machines Corp

In response to a transactional store request, a higher level cache transmits, to a lower level cache, a backup copy of an unaltered target cache line in response to a target real address hitting in the higher level cache, updates the target cache line with store data to obtain an updated target cache line, and records the target real address as belonging to a transaction footprint of the memory transaction. In response to a conflicting access to the transaction footprint prior to completion of the memory transaction, the higher level cache signals failure of the memory transaction to the processor core, invalidates the updated target cache line in the higher level cache, and causes the backup copy of the target cache line in the lower level cache to be restored as a current version of the target cache line.
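The backup-copy scheme can be modeled with two dictionaries standing in for the L2 and L3 caches; class and method names are invented, and the coherence machinery is omitted.

```python
class TxCachePair:
    """Toy L2 (higher level) / L3 (lower level) pair for one transaction."""
    def __init__(self, l2, l3):
        self.l2, self.l3 = dict(l2), dict(l3)
        self.footprint, self.failed = set(), False

    def tx_store(self, addr, value):
        if addr in self.l2:
            self.l3[addr] = self.l2[addr]  # back up the unaltered line to L3
        self.l2[addr] = value              # update the target line in L2
        self.footprint.add(addr)           # record the transaction footprint

    def conflicting_access(self, addr):
        if addr in self.footprint:
            self.failed = True             # signal failure to the core
            self.l2.pop(addr, None)        # invalidate the updated L2 line

    def read(self, addr):
        # With the L2 copy gone, the L3 backup is the current version.
        return self.l2.get(addr, self.l3.get(addr))
```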

More
24-07-2008 publication date

Data processing system, method and interconnect fabric for selective link information allocation in a data processing system

Number: US20080175272A1

A data processing system includes a plurality of processing units coupled for communication by a communication link and a configuration register. The configuration register has a plurality of different settings each corresponding to a respective one of a plurality of different link information allocations. Information is communicated over the communication link in accordance with a particular link information allocation among the plurality of link information allocations determined by a respective setting of the configuration register.

More
25-05-2010 publication date

Data processing system and method that permit pipelining of I/O write operations and multiple operation scopes

Number: US7725619B2
Assignee: International Business Machines Corp

A data processing system includes at least a first processing node having an input/output (I/O) controller and a second processing node including a memory controller for a memory. The memory controller receives, in order, pipelined first and second DMA write operations from the I/O controller, where the first and second DMA write operations target first and second addresses, respectively. In response to the second DMA write operation, the memory controller establishes a state of a domain indicator associated with the second address to indicate an operation scope including the first processing node. In response to the memory controller receiving a data access request specifying the second address and having a scope excluding the first processing node, the memory controller forces the data access request to be reissued with a scope including the first processing node based upon the state of the domain indicator associated with the second address.

More
17-06-2008 publication date

Data processing system and method for efficient communication utilizing an In coherency state

Number: US7389388B2
Assignee: International Business Machines Corp

A cache coherent data processing system includes at least first and second coherency domains each including at least one processing unit. The first coherency domain includes a first cache memory, and the second coherency domain includes a coherent second cache memory. The first cache memory within the first coherency domain of the data processing system holds a memory block in a storage location associated with an address tag and a coherency state field. The coherency state field is set to a state that indicates that the address tag is valid, that the storage location does not contain valid data, and that the memory block is likely cached only within the first coherency domain.

More
18-09-2008 publication date

Data processing system, method and interconnect fabric supporting multiple planes of processing nodes

Number: US20080225863A1
Assignee: International Business Machines Corp

A data processing system includes a first plane including a first plurality of processing nodes, each including multiple processing units, and a second plane including a second plurality of processing nodes, each including multiple processing units. The data processing system also includes a plurality of point-to-point first tier links. Each of the first plurality and second plurality of processing nodes includes one or more first tier links among the plurality of first tier links, where the first tier link(s) within each processing node connect a pair of processing units in the same processing node for communication. The data processing system further includes a plurality of point-to-point second tier links. At least a first of the plurality of second tier links connects processing units in different ones of the first plurality of processing nodes, at least a second of the plurality of second tier links connects processing units in different ones of the second plurality of processing nodes, and at least a third of the plurality of second tier links connects a processing unit in the first plane to a processing unit in the second plane.

More
04-11-2008 publication date

Data processing system, processor and method of data processing in which local memory access requests are serviced by state machines with differing functionality

Number: US7447845B2
Assignee: International Business Machines Corp

A data processing system includes a local processor core and a cache memory coupled to the local processor core. The cache memory includes a data array, a directory of contents of the data array, at least one snoop machine that services memory access requests of a remote processor core, and multiple state machines that service memory access requests of the local processor core. The multiple state machines include a first state machine that has a first set of memory access requests of the local processor core that it is capable of servicing and a second state machine that has a different second set of memory access requests of the local processor core that it is capable of servicing.
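Dispatching local requests to state machines with differing request sets can be sketched as a small matcher. The machine names (echoing typical read-claim/castout machines) and the busy flag are assumptions of this illustration.

```python
class StateMachine:
    """A state machine that can service only a specific set of requests."""
    def __init__(self, name, handled_requests):
        self.name = name
        self.handled = set(handled_requests)
        self.busy = False

def dispatch(machines, request):
    """Send a local core request to the first idle machine whose request
    set covers it; return the machine name, or None if none can service
    the request right now."""
    for m in machines:
        if request in m.handled and not m.busy:
            m.busy = True
            return m.name
    return None
```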

More