Total found: 2963. Showing 100.
Publication date: 07-06-2012

Custom atomics using an off-chip special purpose processor

Number: US20120144128A1
Assignee: Advanced Micro Devices Inc

An apparatus for executing an atomic memory transaction comprises a processing core in a multi-processing core system, where the processing core is configured to store an atomic program in a cache line. The apparatus further comprises an atomic program execution unit that is configured to execute the atomic program as a single atomic memory transaction with a guarantee of forward progress.
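
As a software illustration only (the patent describes an off-chip hardware unit), the two guarantees in the abstract can be modeled by running a small "atomic program" under a blocking lock: the program's reads and writes appear as one indivisible transaction, and blocking rather than aborting models forward progress. All names below are hypothetical.

    import threading

    class AtomicUnit:
        """Software model of an atomic program execution unit: the whole
        program runs as one indivisible memory transaction."""
        def __init__(self):
            self._lock = threading.Lock()   # stands in for the hardware guarantee

        def run_atomic(self, program, memory):
            # Blocking acquisition models forward progress: the transaction
            # is only ever delayed, never aborted.
            with self._lock:
                return program(memory)

    mem = {"counter": 0}
    au = AtomicUnit()

    def fetch_and_add(memory):              # an "atomic program" stored by the core
        old = memory["counter"]
        memory["counter"] = old + 1
        return old

    threads = [threading.Thread(target=au.run_atomic, args=(fetch_and_add, mem))
               for _ in range(8)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(mem["counter"])                   # always 8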

Publication date: 12-07-2012

Method and system for dynamic templatized query language in software

Number: US20120179720A1
Assignee: eBay Inc

A system to automatically generate query language in software is described. The system receives a request for data that is persistently stored in a database. The system selects a predefined query template from a number of query templates based on the request. The system utilizes the query template to receive content from at least one different source, the first source being a prototype data object. The system generates a query statement based on the query template that includes the content. Finally, the system queries the database using the query statement to retrieve the requested data.
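
A minimal sketch of the described flow, with invented template names and a toy prototype object: select a predefined template based on the request, pull content from the prototype data object, and generate the final query statement.

    # Hypothetical templates; the parameter placeholder style is illustrative.
    QUERY_TEMPLATES = {
        "user_by_id": "SELECT {columns} FROM users WHERE id = :id",
        "orders_by_user": "SELECT {columns} FROM orders WHERE user_id = :id",
    }

    class PrototypeObject:                  # first content source: a prototype data object
        columns = ("id", "name", "email")

    def generate_query(request_kind, prototype):
        template = QUERY_TEMPLATES[request_kind]       # select by request
        return template.format(columns=", ".join(prototype.columns))

    print(generate_query("user_by_id", PrototypeObject()))
    # SELECT id, name, email FROM users WHERE id = :id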

Publication date: 06-09-2012

Method, apparatus, and system for speculative execution event counter checkpointing and restoring

Number: US20120227045A1
Assignee: Intel Corp

An apparatus, method, and system are described herein for providing programmable control of performance/event counters. An event counter is programmable to track different events, as well as to be checkpointed when speculative code regions are encountered. So when a speculative code region is aborted, the event counter is able to be restored to its pre-speculation value. Moreover, the difference between a cumulative event count of committed and uncommitted execution and the committed execution represents an event count/contribution for uncommitted execution. From information on the uncommitted execution, hardware/software may be tuned to enhance future execution to avoid wasted execution cycles.
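
The checkpoint/restore arithmetic can be illustrated with a toy counter model (names are hypothetical, not the patented hardware): the counter is checkpointed at the start of a speculative region, restored on abort, and the cumulative-minus-committed difference is the uncommitted contribution.

    class EventCounter:
        def __init__(self):
            self.count = 0
            self._checkpoint = 0

        def increment(self, n=1):
            self.count += n

        def begin_speculation(self):
            self._checkpoint = self.count          # committed-only value

        def abort_speculation(self):
            wasted = self.count - self._checkpoint # uncommitted contribution
            self.count = self._checkpoint          # restore pre-speculation value
            return wasted

    c = EventCounter()
    c.increment(100)            # committed work
    c.begin_speculation()
    c.increment(40)             # speculative work, later aborted
    print(c.abort_speculation(), c.count)   # 40 100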

Publication date: 24-01-2013

Hardware acceleration components for translating guest instructions to native instructions

Number: US20130024661A1
Author: Mohammad Abdallah
Assignee: Soft Machines Inc

A hardware based translation accelerator. The hardware includes a guest fetch logic component for accessing guest instructions; a guest fetch buffer coupled to the guest fetch logic component and a branch prediction component for assembling guest instructions into a guest instruction block; and conversion tables coupled to the guest fetch buffer for translating the guest instruction block into a corresponding native conversion block. The hardware further includes a native cache coupled to the conversion tables for storing the corresponding native conversion block, and a conversion look aside buffer coupled to the native cache for storing a mapping of the guest instruction block to corresponding native conversion block, wherein upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates the guest instruction has a corresponding converted native instruction in the native cache.
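
A rough software model of the lookup path only (the patent claims dedicated hardware): a conversion look aside buffer maps a guest block address to its native conversion block, and a miss consults the conversion tables and fills both the native cache and the buffer. The translate stand-in and all names are invented.

    clb = {}              # guest block address -> native cache index (the CLB)
    native_cache = []     # stores translated native conversion blocks

    def translate(guest_block):                    # stand-in for conversion tables
        return [("native", op) for op in guest_block]

    def fetch_native(guest_addr, guest_block):
        if guest_addr in clb:                      # CLB hit: translation reused
            return native_cache[clb[guest_addr]]
        native = translate(guest_block)            # CLB miss: translate and fill
        clb[guest_addr] = len(native_cache)
        native_cache.append(native)
        return native

    fetch_native(0x400, ["ld", "add"])   # miss: translates and caches
    fetch_native(0x400, ["ld", "add"])   # hit: served from the native cache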

Publication date: 09-05-2013

System and method for managing an object cache

Number: US20130117510A1
Assignee: Recursion Software Inc

In order to optimize efficiency of deserialization, a serialization cache is maintained at an object server. The serialization cache is maintained in conjunction with an object cache and stores serialized forms of objects cached within the object cache. When an inbound request is received, a serialized object received in the request is compared to the serialization cache. If the serialized byte stream is present in the serialization cache, then the equivalent object is retrieved from the object cache, thereby avoiding deserialization of the received serialized object. If the serialized byte stream is not present in the serialization cache, then the serialized byte stream is deserialized, the deserialized object is cached in the object cache, and the serialized object is cached in the serialization cache.
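
A minimal sketch of the described request path, assuming JSON as the serialization format for concreteness: the serialized byte stream keys the serialization cache, so a repeated request is served without deserialization.

    import json

    serialization_cache = {}    # serialized bytes -> cached (deserialized) object

    def handle_request(serialized: bytes):
        if serialized in serialization_cache:
            return serialization_cache[serialized]   # hit: no deserialization
        obj = json.loads(serialized)                  # miss: deserialize once
        serialization_cache[serialized] = obj         # cache the object form
        return obj

    payload = b'{"user": 7, "op": "read"}'
    handle_request(payload)     # deserializes and caches
    handle_request(payload)     # served from the serialization cache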

Publication date: 24-10-2013

Systems and methods for backing up storage volumes in a storage system

Number: US20130282975A1
Assignee: International Business Machines Corp

Systems and methods for backing up storage volumes are provided. One system includes a primary side, a secondary side, and a network coupling the primary and secondary sides. The secondary side includes first and second VTS, each including a cache and storage tape. The first VTS is configured to store a first portion of a group of storage volumes in its cache and migrate the remaining portion to its storage tape. The second VTS is configured to store the remaining portion of the storage volumes in its cache and migrate the first portion to its storage tape. One method includes receiving multiple storage volumes from a primary side, storing the storage volumes in the cache of the first and second VTS, migrating a portion of the storage volumes from the cache to storage tape in the first VTS, and migrating a remaining portion of the storage volumes from the cache to storage tape in the second VTS.

Publication date: 28-11-2013

Apparatus and method for accelerating operations in a processor which uses shared virtual memory

Number: US20130318323A1
Assignee: Intel Corp

An apparatus and method are described for coupling a front end core to an accelerator component (e.g., such as a graphics accelerator). For example, an apparatus is described comprising: an accelerator comprising one or more execution units (EUs) to execute a specified set of instructions; and a front end core comprising a translation lookaside buffer (TLB) communicatively coupled to the accelerator and providing memory access services to the accelerator, the memory access services including performing TLB lookup operations to map virtual to physical addresses on behalf of the accelerator and in response to the accelerator requiring access to a system memory.

Publication date: 27-02-2014

Method, apparatus, and system for speculative abort control mechanisms

Number: US20140059333A1
Assignee: Intel Corp

An apparatus and method are described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions.

Publication date: 05-01-2017

EFFICIENT INSTRUCTION FUSION BY FUSING INSTRUCTIONS THAT FALL WITHIN A COUNTER-TRACKED AMOUNT OF CYCLES APART

Number: US20170003965A1

A technique to enable efficient instruction fusion within a computer system. In one embodiment, a processor logic delays the processing of a second instruction for a threshold amount of time if a first instruction within an instruction queue is fusible with the second instruction.

1. A system comprising: a plurality of processors; a processor interconnect to communicatively couple two or more of the plurality of processors; a system memory communicatively coupled to one or more of the plurality of processors over a memory interconnect; one of the plurality of processors comprising: a plurality of cores to execute instructions; a cache shared by two or more of the plurality of cores; one of the cores comprising: an instruction memory to store instructions; a decoder to decode instructions; an instruction fusion circuit to fuse a first instruction and a second instruction to form a fused instruction to be processed by the core as a single instruction; and the instruction fusion circuit to fuse the first and second instructions when both the first and second instructions have been stored in the instruction memory prior to issuance.
2. The system of claim 1, further comprising at least one data communication device communicatively coupled to at least one of the plurality of processors.
3. The system of claim 1, further comprising at least one storage device communicatively coupled to at least one of the plurality of processors.

This application is a continuation of U.S. patent application Ser. No. 15/143,418, filed on Apr. 30, 2016, entitled “Improving Efficient Instruction Fusion By Fusing Instructions That Fall Within A Counter-Tracked Amount Of Cycles Apart”, which is a continuation of U.S. patent application Ser. No. 12/290,395, filed Oct. 30, 2008, entitled “Delayed Processing Techniques For Improving Efficient Instructions Fusion”, all of which is herein incorporated by reference. Embodiments of the invention relate generally to the field of information ...
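
A toy model of the fusion window named in the title (the window length, the fusible pairs, and all names are invented): a queued instruction is fused with its successor only if the pair is fusible and the two fall within a counter-tracked number of cycles of each other.

    FUSION_WINDOW = 3                       # threshold, in cycles (illustrative)
    FUSIBLE = {("cmp", "jcc"), ("add", "st")}

    def issue(queue):
        """queue: list of (opcode, arrival_cycle) in program order."""
        out, i = [], 0
        while i < len(queue):
            op, cycle = queue[i]
            if i + 1 < len(queue):
                nxt, nxt_cycle = queue[i + 1]
                if (op, nxt) in FUSIBLE and nxt_cycle - cycle <= FUSION_WINDOW:
                    out.append(f"{op}+{nxt}")   # fused: processed as one op
                    i += 2
                    continue
            out.append(op)
            i += 1
        return out

    print(issue([("cmp", 10), ("jcc", 12), ("add", 20), ("st", 30)]))
    # ['cmp+jcc', 'add', 'st']  (add/st arrive 10 cycles apart: not fused)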

Publication date: 05-01-2017

SYSTEM AND METHOD FOR INSTRUCTION SET CONVERSION

Number: US20170003967A1
Author: Lin Kenneth Chenghao

An instruction set conversion system and method is provided, which can convert guest instructions to host instructions for processor core execution. Through configuration, instruction sets supported by the processor core are easily expanded. A method for real-time conversion between host instruction addresses and guest instruction addresses is also provided, such that the processor core can directly read out the host instructions from a higher level cache, reducing the depth of a pipeline.

1. An instruction set conversion method, comprising: converting guest instructions to host instructions, and creating mapping relationships between guest instruction addresses and host instruction addresses; storing the host instruction in a cache memory that is accessed directly by a processor core; based on the host instruction address, performing a cache addressing operation to fetch directly the corresponding host instruction for processor core execution; or, after converting the guest instruction address outputted by the processor core to the host instruction address based on the mapping relationship, performing the cache addressing operation to fetch the corresponding host instruction for processor core execution.
2.-62. (canceled)

The present invention generally relates to computer, communication, and integrated circuit technologies. Currently, if one processor core executes programs belonging to different instruction sets, the most common method is to use a software virtual machine (or a virtual layer). The role of the virtual machine is to translate or interpret a program composed of an instruction set (guest instruction set) that is not supported by the processor core to generate the corresponding instructions in the instruction set (host instruction set) supported by the processor core for processor core execution. In general, during the operating process, an interpretation method fetches all fields including opcodes and operands in the guest instruction using the virtual machine in ...

Publication date: 05-01-2017

NON-FAULTING COMPUTE INSTRUCTIONS

Number: US20170004088A1

A compute instruction to be executed is to use a memory operand in a computation. An address associated with the memory operand is to be used to locate a portion of memory from which data is to be obtained and placed in the memory operand. A determination is made as to whether the portion of memory extends across a specified memory boundary. Based on the portion of memory extending across the specified memory boundary, the portion of memory includes a plurality of memory units and a check is made as to whether at least one specified memory unit is accessible and whether at least one specified memory unit is inaccessible. Based on the checking indicating the at least one specified memory unit is accessible and the at least one specified memory unit is inaccessible, accessing the at least one specified memory unit that is accessible and placing data from the at least one specified memory unit that is accessible in one or more locations in the memory operand, and, for the at least one unit of memory that is inaccessible, placing default data in one or more other locations of the memory operand.

1. A computer program product for facilitating processing of compute instructions in a computing environment, said computer program product comprising: obtaining, by a processor, a compute instruction to be executed, the compute instruction to use a memory operand in a computation indicated by the compute instruction; obtaining an address associated with the memory operand, the address to be used to locate a portion of memory from which data is to be obtained and placed in the memory operand; determining whether the portion of memory extends across a specified memory boundary, wherein based on the portion of memory extending across the specified memory boundary, the portion of memory comprises a plurality of memory units; based on determining the portion of memory extends across the specified memory boundary, checking whether at least one specified memory unit of the ...
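
A sketch of the boundary-crossing behavior under stated assumptions (a 4-byte memory unit and a zero-byte default fill, both invented): accessible units contribute real data to the operand, inaccessible units contribute default data, and no fault is raised.

    UNIT = 4                                 # memory unit (page) size, illustrative
    accessible = {0: b"ABCD", 1: None}       # unit 1 is inaccessible
    DEFAULT = b"\x00"                        # default fill for inaccessible bytes

    def non_faulting_load(addr, length):
        out = bytearray()
        for a in range(addr, addr + length):
            unit, off = divmod(a, UNIT)
            data = accessible.get(unit)
            out += data[off:off+1] if data is not None else DEFAULT
        return bytes(out)

    print(non_faulting_load(2, 4))   # b'CD\x00\x00': the load crosses into unit 1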

Publication date: 07-01-2016

ALERTING HARDWARE TRANSACTIONS THAT ARE ABOUT TO RUN OUT OF SPACE

Number: US20160004558A1

A transactional memory system determines whether to pass control of a transaction to an about-to-run-out-of-resource handler. A processor of the transactional memory system determines information about an about-to-run-out-of-resource handler for transaction execution of a code region of a hardware transaction. The processor dynamically monitors an amount of available resource for the currently running code region of the hardware transaction. The processor detects that the amount of available resource for transactional execution of the hardware transaction is below a predetermined threshold level. The processor, based on the detecting, saves speculative state information of the hardware transaction, and executes the about-to-run-out-of-resource handler, the about-to-run-out-of-resource handler determining whether the hardware transaction is to be aborted or salvaged.

1. A method for determining whether to pass control of a transaction, executing in a transactional memory environment, to an about-to-run-out-of-resource handler, the method comprising: determining, by a processor, information about an about-to-run-out-of-resource handler for transaction execution of a code region of a hardware transaction; dynamically monitoring, by the processor, an amount of available resource for the currently running code region of the hardware transaction; detecting, by the processor, that the amount of available resource for transactional execution of the hardware transaction is below a predetermined threshold level; based on detecting the amount of available resource is below the predetermined threshold level, saving, by the processor, speculative state information of the hardware transaction; and based on detecting the amount of available resource is below the predetermined threshold level, executing, by the processor, the about-to-run-out-of-resource handler, wherein the about-to-run-out-of-resource handler determines whether the hardware transaction is to be aborted or salvaged. ...

Publication date: 07-01-2016

REDUNDANT, FAULT-TOLERANT, DISTRIBUTED REMOTE PROCEDURE CALL CACHE IN A STORAGE SYSTEM

Number: US20160004613A1

A method of operating a remote procedure call cache in a storage cluster is provided. The method includes receiving a remote procedure call at a first storage node having solid-state memory and writing information, relating to the remote procedure call, to a remote procedure call cache of the first storage node. The method includes mirroring the remote procedure call cache of the first storage node in a mirrored remote procedure call cache of a second storage node. A plurality of storage nodes and a storage cluster are also provided.

1. A plurality of storage nodes, comprising: each of the plurality of storage nodes having non-volatile solid-state memory for user data storage, a remote procedure call cache, and a mirrored remote procedure call cache configured to mirror a remote procedure call cache of another storage node.
2. The plurality of storage nodes of claim 1, further comprising: each of the plurality of storage nodes having a table, configured to indicate a primary authority, a first backup authority, and a second backup authority, wherein the remote procedure call cache corresponds to the primary authority.
3. The plurality of storage nodes of claim 1, wherein each of the plurality of storage nodes is configured to send a copy of contents of the remote procedure call cache to a further storage node for storage in the mirrored remote procedure call cache of the further storage node.
4. The plurality of storage nodes of claim 1, further comprising: each of the plurality of storage nodes configured to check the remote procedure call cache and to determine whether a result of a remote procedure call is posted.
5. The plurality of storage nodes of claim 1, further comprising: each of the plurality of storage nodes having a non-volatile random access memory (NVRAM) containing the remote procedure call cache and the mirrored remote procedure call cache.
6. The plurality of storage nodes of claim 1, further comprising: the remote procedure call cache configured to ...

Publication date: 07-01-2016

Salvaging lock elision transactions

Number: US20160004640A1
Assignee: International Business Machines Corp

A transactional memory system salvages a hardware lock elision (HLE) transaction. A processor of the transactional memory system executes a lock-acquire instruction in an HLE environment and records information about a lock elided to begin HLE transactional execution of a code region. The processor detects a pending point of failure in the code region during the HLE transactional execution. The processor stops HLE transactional execution at the point of failure in the code region. The processor acquires the lock using the information, and based on acquiring the lock, commits the speculative state of the stopped HLE transactional execution. The processor starts non-transactional execution at the point of failure in the code region.

Publication date: 04-01-2018

SYSTEMS, APPARATUSES, AND METHODS FOR SNOOPING PERSISTENT MEMORY STORE ADDRESSES

Number: US20180004525A1
Author: Baghsorkhi Sara

Systems, methods, and apparatuses for executing an instruction are described. In some embodiments, a decoder circuit decodes an instruction, wherein the instruction to include at least an opcode, a field for a source operand, and a field for a destination operand. Execution circuitry executes the decoded instruction to determine if a tag from the address from the source operand matches a tag in a selected non-volatile memory address cache (NVMAC) cache line, wherein when there is a match a hit indication is stored in the destination operand, and when there is not a match, a no hit indication is stored in the destination operand and the NVMAC is updated with the tag from the address from the source operand.

1. An apparatus comprising: a decoder circuit to decode an instruction, wherein the instruction to include at least an opcode, a field for a source operand, and a field for a destination operand; and execution circuitry to execute the decoded instruction to determine if a tag from the address from the source operand matches a tag in a selected non-volatile memory address cache (NVMAC) cache line, wherein when there is a match a hit indication is stored in the destination operand, and when there is not a match, a no hit indication is stored in the destination operand and the NVMAC is updated with the tag from the address from the source operand.
2. The apparatus of claim 1, wherein the NVMAC stores cache lines for a non-volatile random access memory.
3. The apparatus of claim 1, wherein the execution circuitry is further to determine that a cache line entry with a matching tag is valid.
4. The apparatus of claim 1, wherein upon no match being found, the execution circuitry is to select a way of the NVMAC based on a least recently used value stored in the selected cache line.
5. The apparatus of claim 1, wherein the NVMAC cache line is selected using an index stored in the address from the source operand.
6. The apparatus of claim 1, wherein the NVMAC is separate ...
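
An illustrative model of the instruction's semantics, assuming a direct-mapped NVMAC and invented tag/index widths: split the source address into tag and index, probe the selected line, report hit or miss as the destination value, and install the tag on a miss.

    INDEX_BITS, OFFSET_BITS = 4, 6
    nvmac = {}                    # index -> tag (direct-mapped for brevity)

    def nvmac_probe(addr):
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        if nvmac.get(index) == tag:
            return 1              # hit indication for the destination operand
        nvmac[index] = tag        # miss: update the NVMAC with the new tag
        return 0                  # no-hit indication

    print(nvmac_probe(0x12340))   # 0: miss, tag installed
    print(nvmac_probe(0x12340))   # 1: hit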

Publication date: 04-01-2018

Method and Logic for Maintaining Performance Counters with Dynamic Frequencies

Number: US20180004532A1

A processor includes a front end including circuitry to decode an instruction from an instruction stream and a core including circuitry to process the instruction. The core includes an execution pipeline, a dynamic core frequency logic unit, and a counter compensation logic unit. The execution pipeline includes circuitry to execute the instruction. The dynamic core frequency logic unit includes circuitry to squash a clock of the core to reduce a core frequency. The clock may not be visible to software. The counter compensation logic unit includes circuitry to adjust a performance counter increment associated with a performance counter based on at least the dynamic core frequency logic unit circuitry to squash a clock of the core to reduce a core frequency.

1. A processor, comprising: a front end including circuitry to decode an instruction from an instruction stream; a core including circuitry to process the instruction, the core comprising: an execution pipeline including circuitry to execute the instruction; a dynamic core frequency logic unit including circuitry to squash a clock of the core to reduce a core frequency, wherein the clock is invisible to software; and a counter compensation logic unit including circuitry to adjust a performance counter increment associated with a performance counter based on at least the dynamic core frequency logic unit circuitry to squash a clock of the core to reduce a core frequency.
2. The processor in claim 1, wherein the counter compensation logic unit further includes circuitry to determine whether the performance counter monitors an event measured in cycles, and adjustment of the performance counter increment is further based on a determination that the performance counter monitors an event measured in cycles.
3. The processor in claim 1, wherein the counter compensation logic unit further includes circuitry to determine whether a mode of measurement for the performance counter is set to measure in cycles ...

Publication date: 07-01-2021

Metadata Programmable Tags

Number: US20210004231A1
Author: Andre' Dehon
Assignee: Charles Stark Draper Laboratory Inc

A method comprises receiving a current instruction for metadata processing performed in a metadata processing domain that is isolated from a code execution domain including the current instruction. The method further comprises determining, by the metadata processing domain in connection with metadata for the current instruction, whether to allow execution of the current instruction in accordance with a set of one or more policies. The one or more policies may include a set of rules that enforces execution of a complete sequence of instructions in a specified order from a first instruction of the complete sequence to a last instruction of the complete sequence. The metadata processing may be implemented by a metadata processing hierarchy comprising a control module, a masking module, a hash module, a rule cache lookup module, and/or an output tag module.

Publication date: 04-01-2018

COOPERATIVE TRIGGERING

Number: US20180004628A1
Assignee: Intel Corporation

There is disclosed in an example a processor, having: a front end including circuitry to decode instructions from an instruction stream; a data cache unit including circuitry to cache data for the processor; and a core triggering block (CTB) to provide integration between two or more different debug capabilities.

1. A processor, comprising: a front end including circuitry to decode instructions from an instruction stream; a data cache unit including circuitry to cache data for the processor; a core triggering block (CTB) to provide integration between two or more different debug capabilities.
2. The processor of claim 1, further comprising an on-chip visibility buffer, wherein the CTB is provided at least partly in hardware to write logging or tracing data to the on-chip visibility buffer.
3. The processor of claim 1, wherein the CTB is further to flush the on-chip visibility buffer to main memory.
4. The processor of claim 3, wherein the CTB is to provide the logging or tracing function on an instruction boundary.
5. The processor of claim 1, wherein the CTB is to provide a logging or tracing function.
6. The processor of claim 1, wherein the CTB is to provide a data address trace.
7. The processor of claim 1, further comprising a precise event based signaling (PEBS) facility, wherein the CTB is to cooperate with the PEBS facility.
8. The processor of claim 1, further comprising an off-die interface to interoperate with an uncore block or peripheral device.
9. The processor of claim 1, wherein the CTB is provided at least partly in microcode.
10. The processor of claim 1, wherein the CTB is provided primarily in hardware.
11. A method of providing visibility features for a processor, comprising: communicatively coupling to a front end having instruction decode circuitry; communicatively coupling to a data cache having circuitry to cache data for the processor; and integrating two or more different debug capabilities of the processor within a core ...

Publication date: 04-01-2018

Direct store to coherence point

Number: US20180004660A1
Assignee: Microsoft Technology Licensing LLC

A system that uses a write-invalidate protocol has at least two types of stores. A first type of store operation uses a write-back policy resulting in snoops for copies of the cache line at lower cache levels. A second type of store operation writes, using a coherent write-through policy, directly to the last-level cache without snooping the lower cache levels. By storing directly to the coherence point, where cache coherence is enforced, for the coherent write-through operations, snoop transactions and responses are not exchanged with the other caches. A memory order buffer at the last-level cache ensures proper ordering of stores/loads sent directly to the last-level cache.

Publication date: 04-01-2018

PREFETCH BANDWIDTH THROTTLING BY DYNAMICALLY ADJUSTING MISS BUFFER PREFETCH-DROPPING THRESHOLDS

Number: US20180004670A1
Author: Chou Yuan C., SUDHIR Suraj
Assignee: ORACLE INTERNATIONAL CORPORATION

The disclosed embodiments relate to a method for controlling prefetching in a processor to prevent over-saturation of interfaces in the memory hierarchy of the processor. While the processor is executing, the method determines a bandwidth utilization of an interface from a cache in the processor to a lower level of the memory hierarchy. Next, the method selectively adjusts a prefetch-dropping high-water mark for occupancy of a miss buffer associated with the cache based on the determined bandwidth utilization, wherein the miss buffer stores entries for outstanding demand requests and prefetches that missed in the cache and are waiting for corresponding data to be returned from the lower level of the memory hierarchy, and wherein when the occupancy of the miss buffer exceeds the prefetch-dropping high-water mark, subsequent prefetches that cause a cache miss are dropped.

1. A method for controlling prefetching to prevent over-saturation of interfaces in a memory hierarchy of a processor, comprising: while the processor is executing, determining a bandwidth utilization of an interface from a cache in the processor to a lower level of the memory hierarchy; and selectively adjusting a prefetch-dropping high-water mark for occupancy of a miss buffer associated with the cache based on the determined bandwidth utilization; wherein the miss buffer stores entries for outstanding demand requests and prefetches that missed in the cache and are waiting for corresponding data to be returned from the lower level of the memory hierarchy; and wherein when an occupancy of the miss buffer exceeds the prefetch-dropping high-water mark, subsequent prefetches that cause a cache miss are dropped.
2. The method of claim 1, wherein selectively adjusting the prefetch-dropping high-water mark based on the determined bandwidth utilization comprises: selecting a lower prefetch-dropping high-water mark when the determined bandwidth utilization indicates that the interface from the cache to the ...
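
A sketch of the throttling policy with invented thresholds and buffer size: pick a lower high-water mark when the determined bandwidth utilization says the interface is saturated, and drop any prefetch that arrives while miss-buffer occupancy is above the mark.

    MARKS = {"relaxed": 28, "strict": 12}   # entries, out of a 32-entry miss buffer

    def prefetch_dropping_mark(bandwidth_utilization):
        # Saturated interface -> lower mark, so prefetches are dropped sooner.
        return MARKS["strict"] if bandwidth_utilization > 0.9 else MARKS["relaxed"]

    def admit_prefetch(miss_buffer_occupancy, bandwidth_utilization):
        mark = prefetch_dropping_mark(bandwidth_utilization)
        return miss_buffer_occupancy <= mark    # above the mark: drop

    print(admit_prefetch(20, 0.5))    # True: plenty of headroom
    print(admit_prefetch(20, 0.95))   # False: saturated, prefetch dropped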

Publication date: 03-01-2019

STREAMING ENGINE WITH EARLY EXIT FROM LOOP LEVELS SUPPORTING EARLY EXIT LOOPS AND IRREGULAR LOOPS

Number: US20190004798A1
Author: Zbiciak Joseph

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. Upon a stream break instruction specifying one of the nested loops, the stream engine ends a current iteration of the loop. If the specified loop was not the outermost loop, the streaming engine begins an iteration of a next outer loop. If the specified loop was the outermost nested loop, the streaming engine ends the stream. The streaming engine places a vector of data elements in order in lanes within a stream head register. A stream break instruction is operable upon a vector break.

1. A digital data processor comprising: an instruction memory storing instructions each specifying a data processing operation and at least one data operand field; an instruction decoder connected to said instruction memory for sequentially recalling instructions from said instruction memory and determining said specified data processing operation and said specified at least one operand; at least one functional unit connected to said data register file and said instruction decoder for performing data processing operations upon at least one operand corresponding to an instruction decoded by said instruction decoder and storing results in an instruction specified data register; a streaming engine connected to said instruction decoder operable in response to a stream start instruction to recall from memory a stream of an instruction specified sequence of a plurality of data elements including plural nested ..., said streaming engine including: an address generator for generating stream memory addresses corresponding to said stream of an instruction specified sequence of a plurality of data elements in said plural nested loops, and a stream head register storing a data element of said stream next to be used by said at least one operational unit.
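
A toy address generator for two nested loops with a stream-break (strides, counts, and the break encoding are all invented): a break of the inner loop ends only the current inner iteration, while a break of the outermost loop ends the stream.

    def stream_addresses(base, inner_count, inner_stride,
                         outer_count, outer_stride, break_at=None):
        for o in range(outer_count):
            for i in range(inner_count):
                addr = base + o * outer_stride + i * inner_stride
                if break_at == ("inner", addr):
                    break                    # end the current inner iteration
                if break_at == ("outer", addr):
                    return                   # outermost loop: end the stream
                yield addr

    print(list(stream_addresses(0, 4, 1, 2, 16, break_at=("inner", 2))))
    # [0, 1, 16, 17, 18, 19]: first inner iteration cut short at address 2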

Publication date: 03-01-2019

STREAM ENGINE WITH ELEMENT PROMOTION AND DECIMATION MODES

Number: US20190004799A1
Author: Zbiciak Joseph

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to operational units for use as operands. A promotion unit optionally increases data element data size by an integral power of 2, either zero filling or sign filling the additional bits. A decimation unit optionally decimates data elements by an integral factor of 2. For ease of implementation the promotion factor must be greater than or equal to the decimation factor.

1. A digital data processor comprising: an instruction memory storing instructions each specifying a data processing operation and at least one data operand field; an instruction decoder connected to said instruction memory for sequentially recalling instructions from said instruction memory and determining said specified data processing operation and said specified at least one operand; at least one operational unit connected to said data register file and said instruction decoder for performing data processing operations upon at least one operand corresponding to an instruction decoded by said instruction decoder and storing results; and a streaming engine connected to said instruction decoder operable in response to a stream start instruction to recall from memory a stream of an instruction specified sequence of a plurality of data elements, said streaming engine including: an element decimation unit receiving data elements of said stream recalled from memory, said element decimation unit operable to omit received data according to an instruction specified decimation factor, and a stream head register receiving data elements from said element decimation unit, storing at least one data element of said stream next to be used by said at least one operational unit; wherein said at least one operational unit is responsive to a stream operand instruction to ...
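
A sketch of decimation followed by promotion (the factors, widths, and sample data are invented): keep every factor-th element, then widen each kept element, zero filling (unsigned) or sign filling (signed) the added bits.

    def decimate(elements, factor):
        return elements[::factor]            # omit all but every factor-th element

    def promote(value, from_bits, to_bits, signed):
        v = value & ((1 << from_bits) - 1)
        if signed and v >> (from_bits - 1):  # sign fill the widened bits
            v -= 1 << from_bits
        return v & ((1 << to_bits) - 1)      # zero fill falls out naturally

    data = [0x80, 0x7F, 0xFF, 0x01]          # 8-bit stream elements
    kept = decimate(data, 2)                 # decimation factor 2: [0x80, 0xFF]
    # Promotion factor 2 (8 -> 16 bits) equals the decimation factor 2,
    # satisfying the stated implementation constraint.
    print([hex(promote(v, 8, 16, signed=True)) for v in kept])
    # ['0xff80', '0xffff']: sign bits replicated into the widened elements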

Publication date: 07-01-2021

SYSTEM AND METHOD OF EXECUTING NEURAL NETWORKS

Number: US20210004684A1
Assignee: Neuralmagic Inc.

A system and method of inferring a neural network (NN) on one or more target computing devices. The NN may include a plurality of layers, where at least one layer includes one or more kernels. Embodiments may include: receiving a data structure representing the NN; analyzing the data structure to produce one or more tasks, where each task may include computations pertaining to a kernel of the NN; selecting a sparse version of at least one kernel and replacing the at least one kernel with the sparse version; and compiling the one or more tasks to produce one or more respective tensor columns. The one or more tensor columns are adapted to fit in respective one or more cache memories of the one or more target computing devices, and include task instruction code that represents at least one computation of the kernel of the NN.

2. The method of claim 1, comprising: storing the one or more tensor columns in respective cache memories of the one or more target computing devices; and inferring the NN on incoming data by executing the task instruction code on the incoming data, within the cache memory.
3. The method of claim 1, comprising selecting a sparse version of one or more kernels, wherein the selecting comprises: producing one or more sparse versions of the one or more kernels; calculating a level of precision of an output of a task corresponding to the one or more sparse versions; and selecting a sparse version from the one or more sparse versions according to at least one of: the calculated level of precision and the cache parameter values.
4. The method of claim 1, comprising compiling the task instruction code to produce a task instruction code block that is devoid of zero-value kernel elements.
5. The method of claim 1, comprising, for a tensor column: selecting a subset of nodes of a layer of the NN as an input vector of the task instruction code of the tensor column; and calculating a required memory space for output of computations of the task ...

Publication date: 03-01-2019

SOFTWARE CONDITION EVALUATION APPARATUS AND METHODS

Number: US20190004929A1

Devices and methods for debugging software or detecting malicious software on a compute node are described herein. A device can include an interface to a central processing unit (CPU) of a compute node. The device can include processing circuitry. During execution of a software application of the CPU, the processing circuitry can process CPU operational metrics received over the interface, wherein values of the CPU operational metrics vary with execution of the software application. Based on the values, the processing circuitry can determine an operational status of the software application and provide an indicator of an error condition of the software application responsive to detection of an error based on any criteria. Other embodiments are also described.

1. A device comprising: an interface to a central processing unit (CPU) of a compute node; and processing circuitry coupled to the interface to: during execution of a software application of the CPU, collect CPU operational metrics received over the interface, wherein values of the CPU operational metrics vary with execution of the software application; determine an operational status of the software application based on the values; and provide an output based at least in part on a result of the determination.
2. The device of claim 1, wherein the device is included within a same chip set as the CPU.
3. The device of claim 2, wherein the device includes a microcontroller to execute Original Equipment Manufacturer (OEM)-provided firmware.
4. The device of claim 2, wherein the device includes a microcontroller to provide at least one of remote configuration, booting from a remote hard drive, providing two-factor authentication, and enabling a poison pill to disable a remote system over a connection.
5. The device of claim 1, wherein the device is electrically coupled to a separate socket of the compute node from the CPU.
6. The device of claim 1, wherein the CPU operational metrics ...

Publication date: 03-01-2019

MEMORY TYPE WHICH IS CACHEABLE YET INACCESSIBLE BY SPECULATIVE INSTRUCTIONS

Number: US20190004961A1

An improved architectural means to address processor cache attacks based on speculative execution defines a new memory type that is both cacheable and inaccessible by speculation. Speculative execution cannot access and expose a memory location that is speculatively inaccessible. Such mechanisms can disqualify certain sensitive data from being exposed through speculative execution. Data which must be protected at a performance cost may be specifically marked. If the processor is told where secrets are stored in memory and is forbidden from speculating on those memory locations, then the processor will ensure the process trying to access those memory locations is privileged to access those locations before reading and caching them. Such countermeasure is effective against attacks that use speculative execution to leak secrets from a processor cache.

1. A method of accessing memory comprising: determining whether a memory location is marked as being cacheable but not speculatively accessible; and if said determining determines the memory location is marked as cacheable but not speculatively accessible, allowing said memory location contents to be cached but disallowing access to the memory location by speculative execution.
2. The method of wherein said determining is based on virtual memory page tables.
3. The method of wherein said determining is based on memory type.
4. The method of wherein said determining is based on the memory location being within a certain region of memory.
5. The method of wherein said determining is based on the memory location being within a predefined fixed region of memory.
6. The method of wherein said determining is based on type information a memory management device accesses from a stored page table.
7. The method of wherein said determining is based on information stored in a translation lookaside buffer.
8. The method of wherein said determining is based on an instruction format.
9. The method of wherein disallowing access includes denying ...

Publication date: 03-01-2019

MITIGATION OF CODE REUSE ATTACKS BY RESTRICTED INDIRECT BRANCH INSTRUCTION

Number: US20190005231A1
Author: Peleg Nitzan

A method, computer program product and/or system is disclosed. According to an aspect of this invention, one or more processors receive an indirect jump instruction comprising a target address offset and a maximal offset value. One or more processors determine whether the target address offset is valid by comparison of the target address offset and the maximal offset value and one or more processors execute a jump operation based on whether the target address offset is valid. In some embodiments of the present invention, the jump operation comprises one or more processors executing an instruction located at a target address referenced by the target address offset if the target address offset is valid. In some embodiments, the jump operation further comprises one or more processors raising an exception if the target address offset is not valid.

1. receiving, by one or more processors, a first indirect jump instruction comprising a first target address offset and a first maximal offset value; determining, by one or more processors, whether the first target address offset is valid by comparison of the first target address offset and the first maximal offset value; determining, by one or more processors, the first target address offset is valid if the target address offset is between zero and the first maximal offset value; executing, by one or more processors, a first jump operation based on whether the first target address offset is valid, wherein the first jump operation comprises executing an instruction located at a first target address referenced by the first target address offset if the first target address offset is valid, and wherein the first jump operation comprises raising a first exception if the first target address offset is not valid; receiving, by one or more processors, a second indirect jump instruction comprising a second target address offset and a first maximal number of bits; determining, by one or more processors, a second maximal offset value by ...
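
A minimal model of the validity check (the jump table and all names are hypothetical): the jump executes the instruction at the referenced target only when the target address offset lies between zero and the maximal offset value, and raises an exception otherwise.

    class InvalidOffset(Exception):
        pass

    def restricted_indirect_jump(jump_table, offset, max_offset):
        if not (0 <= offset <= max_offset):
            raise InvalidOffset(f"offset {offset} outside [0, {max_offset}]")
        return jump_table[offset]()          # execute instruction at the target

    table = [lambda: "handler0", lambda: "handler1", lambda: "handler2"]
    print(restricted_indirect_jump(table, 1, max_offset=2))   # 'handler1'
    try:
        restricted_indirect_jump(table, 7, max_offset=2)      # out of range
    except InvalidOffset as e:
        print("rejected:", e)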

Publication date: 01-01-2015

EVICT ON WRITE, A MANAGEMENT STRATEGY FOR A PREFETCH UNIT AND/OR FIRST LEVEL CACHE IN A MULTIPROCESSOR SYSTEM WITH SPECULATIVE EXECUTION

Number: US20150006821A1
Author: Gara Alan, Ohmacht Martin

In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write through, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation blind first level cache.

1. In a parallel processing system comprising a plurality of cores, at least first and second levels of cache, a method comprising: maintaining the first level cache selectively operable in accordance with at least first and second modes of speculation blind addressing; and choosing one of the first and second modes, responsive to program related considerations.
2. The method of claim 1, wherein the program related considerations comprise whether speculation is short running or long running.
3. The method of claim 1, wherein the first and second modes comprise: evicting a line from the L1 on write; and maintaining a multi-piece address space in the L1, wherein each thread has a separate space that gives the illusion of no speculation.
4. The method of claim 1, wherein choosing is responsive to a programmable switch.
5. A processor for use in a multiprocessor system, the processor comprising: means for communicating with a communications pathway, the pathway comprising first and second level caches; means for switching between at least two modes of using the first and second level caches, both modes allowing the first level cache and/or prefetch unit to be ...
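
A sketch of the evict-on-write mode under stated assumptions (dict-based caches and integer version tags, all invented): a speculative store is written through to the L2, which keeps per-version data, and the matching L1 line is evicted so later accesses must go to the L2.

    l1 = {}                                   # address -> value (speculation blind)
    l2 = {}                                   # address -> {version: value}

    def speculative_store(addr, value, thread_version):
        l2.setdefault(addr, {})[thread_version] = value   # write through to L2
        l1.pop(addr, None)                                # evict the L1 copy

    def load(addr, thread_version):
        if addr in l1:
            return l1[addr]
        versions = l2.get(addr, {})
        return versions.get(thread_version)   # L2 resolves the speculative version

    l1[0x100] = 5
    speculative_store(0x100, 9, thread_version=2)
    print(0x100 in l1, load(0x100, 2))        # False 9: L1 line gone, L2 serves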

Publication date: 20-01-2022

INSTRUCTION CACHE BEHAVIOR AND BRANCH PREDICTION

Number: US20220019435A1

Instruction cache behavior and branch prediction are used to improve the functionality of a computing device by profiling branching instructions in an instruction cache to identify likelihoods of proceeding to a plurality of targets from the branching instructions; identifying a hot path in the instruction cache based on the identified likelihoods; and rearranging the plurality of targets relative to one another and associated branching instructions so that a first branching instruction that has a higher likelihood of proceeding to a first hot target than to a first cold target and that previously flowed to the first cold target and jumped to the first hot target instead flows to the first hot target and jumps to the first cold target.

1. A computer-implemented method comprising: identifying a branching instruction associated with: an origin instruction; a cold target; and a hot target located after the cold target in an instruction cache, wherein, based on historical data, the branching instruction is more likely to proceed to the hot target than to the cold target at execution based on a routing condition; swapping an order of the cold target and the hot target in the instruction cache; and reversing the routing condition.
2. The computer-implemented method of claim 1, further comprising: in response to determining that the cold target flows to the hot target at execution, inserting a jump instruction to the hot target in the instruction cache after the swapped cold target.
3. The computer-implemented method of claim 1, wherein execution of the branching instruction originally jumps to the hot target when the routing condition is a first one of true or false and originally flows to the cold target when the routing condition is a second one of true or false; and wherein after swapping the order and reversing the routing condition, the execution of the branching instruction jumps to the cold target when the routing condition is the second one of true or ...

Publication date: 12-01-2017

MEMORY CONTROL UNIT AND DATA STORAGE APPARATUS INCLUDING THE SAME

Number: US20170010960A1

The memory control unit includes a descriptor fetch block suitable for fetching a descriptor from a volatile memory; an instruction fetch block suitable for fetching an instruction set from an instruction memory through an address information, wherein the instruction fetch block obtains the address information from the instruction memory through an index information included in the fetched descriptor; and a memory instruction generation block suitable for generating a memory instruction by combining a descriptor parameter value included in the fetched descriptor to the fetched instruction set.

1. A memory control unit comprising: a descriptor fetch block suitable for fetching a descriptor from a volatile memory; an instruction fetch block suitable for fetching an instruction set from an instruction memory through an address information, wherein the instruction fetch block obtains the address information from the instruction memory through an index information included in the fetched descriptor; and a memory instruction generation block suitable for generating a memory instruction by combining a descriptor parameter value included in the fetched descriptor to the fetched instruction set.
2. The memory control unit of claim 1, wherein the instruction memory includes an index region storing the address information, and an instruction set region storing the instruction set.
3. The memory control unit of claim 2, wherein the address information includes a start address and a count value for the instruction set within the instruction set region, wherein the instruction fetch block obtains the address information from the index region through the index information, and wherein the instruction fetch block fetches one or more instructions corresponding to the address information within the instruction set.
4. The memory control unit of claim 3, wherein the instruction fetch logic fetches the instruction set in whole or in part according to the address information.
5. The ...

Publication date: 12-01-2017

SYSTEMS AND METHODS FACILITATING REDUCED LATENCY VIA STASHING IN SYSTEM ON CHIPS

Number: US20170010966A1
Author: Mittal Millind

Systems and methods that facilitate reduced latency via stashing in multi-level cache memory architectures of systems on chips (SoCs) are provided. One method involves stashing, by a device including a plurality of multi-processor central processing unit cores, first data into a first cache memory of a plurality of cache memories, the plurality of cache memories being associated with a multi-level cache memory architecture. The method also includes generating control information including: a first instruction to cause monitoring contents of a second cache memory of the plurality of cache memories to determine whether a defined condition is satisfied for the second cache memory; and a second instruction to cause prefetching the first data into the second cache memory of the plurality of cache memories based on a determination that the defined condition is satisfied.

1. A method, comprising: stashing, by a device comprising a plurality of multi-processor central processing unit cores, first data into a first cache memory of a plurality of cache memories, the plurality of cache memories being associated with a multi-level cache memory architecture; generating control information comprising a first instruction to cause monitoring contents of a second cache memory of the plurality of cache memories to determine whether a defined condition is satisfied for the second cache memory; and prefetching the first data from the first cache memory to the second cache memory based on execution of the first instruction.
2. The method of claim 1, wherein the prefetching and the generating are performed concurrently.
3. The method of claim 1, wherein the defined condition comprises the second cache memory failing to store the first data associated with a defined address.
4. The method of claim 1, wherein the first cache memory is a shared cache memory for two or more of the plurality of multi-processor CPU cores.
5. The method of claim 1, wherein the second cache memory is a per ...

Publication date: 12-01-2017

USING L1 CACHE AS RE-ORDER BUFFER

Number: US20170010967A1

A method is shown that eliminates the need for a dedicated reorder buffer register bank or memory space in a multi level cache system. As data requests from the L2 cache may be returned out of order, the L1 cache uses its cache memory to buffer the out of order data and provides the data to the requesting processor in the correct order from the buffer.

1.-7. (canceled)
8. A data processing system comprising: a central processing unit capable of generating memory data requests; a cache temporarily storing memory data for use by the central processing unit; and a scoreboard connected to said central processing unit and said cache, said scoreboard operating to: store an ordered list of entries of pending memory data requests received from said central processing unit, said scoreboard operable to track outstanding requests, initiating an entry upon receiving a memory data request from said central processing unit and retiring an entry upon providing corresponding requested data to said central processing unit; verify if the data in response to an oldest pending data request is available in said cache: if so, providing the data to said central processing unit from said cache, and if not, requesting the data from external memory; and, upon receiving data returned from the external memory in response to a memory data request, verify that the returned data is in response to the oldest pending data request: if so, providing the data to said central processing unit, and if not, storing the returned data in said cache and indicating that the received data is available in said cache.
9. The data processing system of claim 8, wherein: said scoreboard verifies the presence of the returned data in said cache in parallel with receiving the data thus eliminating cache stalls.
10. The data processing system of claim 8, wherein: each entry in said scoreboard contains a transaction ID (CID), a virtual address, a physical address, a cache way, and cache hit/miss information.
11. The data ...
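
A toy model of the in-order delivery rule (the scoreboard here is a plain deque and all names are invented): L2 responses may arrive out of order, but only the oldest pending request is forwarded to the processor; younger returns are parked in the L1's cache memory until their turn.

    from collections import deque

    pending = deque()           # ordered scoreboard of outstanding request IDs
    parked = {}                 # request id -> data buffered in the L1 cache

    def request(rid):
        pending.append(rid)     # scoreboard entry initiated on request

    def on_l2_return(rid, data, deliver):
        parked[rid] = data
        while pending and pending[0] in parked:   # drain in program order
            oldest = pending.popleft()            # entry retired on delivery
            deliver(oldest, parked.pop(oldest))

    for r in (1, 2, 3):
        request(r)
    on_l2_return(2, "B", print)   # parked: request 1 still outstanding
    on_l2_return(1, "A", print)   # delivers 1 A, then 2 B from the buffer
    on_l2_return(3, "C", print)   # delivers 3 C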

Publication date: 12-01-2017

PROCESSOR WITH EFFICIENT PROCESSING OF RECURRING LOAD INSTRUCTIONS FROM NEARBY MEMORY ADDRESSES

Number: US20170010971A1

A method includes, in a processor, processing program code that includes memory-access instructions, wherein at least some of the memory-access instructions include symbolic expressions that specify memory addresses in an external memory in terms of one or more register names. Based on respective formats of the memory addresses specified in the symbolic expressions, a sequence of load instructions that access a predictable pattern of memory addresses in the external memory is identified. At least one cache line that includes a plurality of data values is retrieved from the external memory. Based on the predictable pattern, two or more of the data values that are requested by respective load instructions in the sequence are saved from the cache line to the internal memory. The saved data values are assigned to be served from the internal memory to one or more instructions that depend on the respective load instructions.

1. A method, comprising: in a processor, processing program code that includes memory-access instructions, wherein at least some of the memory-access instructions comprise symbolic expressions that specify memory addresses in an external memory in terms of one or more register names; identifying, based on the symbolic expressions of the memory addresses, a sequence of load instructions that access the external memory; further identifying, based on the symbolic expressions of the memory addresses, that the load instructions in the sequence will access a predictable pattern of different memory addresses in the external memory, wherein both the sequence of the load instructions and the predictable pattern are identified independently of numerical values of the memory addresses; in response to a given load instruction in the sequence, and before processing subsequent load instructions in the sequence, retrieving from the external memory at least one cache line that comprises a plurality of data values; saving from the cache line to an internal memory, based ...

Publication date: 12-01-2017

PROCESSOR WITH EFFICIENT PROCESSING OF RECURRING LOAD INSTRUCTIONS

Number: US20170010972A1

A method includes, in a processor, processing program code that includes memory-access instructions, wherein at least some of the memory-access instructions include symbolic expressions that specify memory addresses in an external memory in terms of one or more register names. At least first and second load instructions that access a same memory address in the external memory are identified in the program code, based on respective formats of the memory addresses specified in the symbolic expressions of the load instructions. An outcome of at least one of the load instructions is assigned to be served from an internal memory in the processor.

1. A method, comprising: in a processor, processing program code that includes memory-access instructions, wherein at least some of the memory-access instructions comprise symbolic expressions that specify memory addresses in an external memory in terms of one or more register names; identifying in the program code at least first and second load instructions that access a same memory address in the external memory, based on respective formats of the memory addresses specified in the symbolic expressions of the load instructions; and assigning an outcome of at least one of the load instructions to be served from an internal memory in the processor.
2. The method according to claim 1, wherein identifying the first and second load instructions further comprises identifying that no store instruction accesses the same memory address between the first and second load instructions.
3. The method according to claim 1, wherein assigning the outcome comprises reading a value from the same memory address in response to the first load instruction, saving the value in the internal memory, and assigning the value in response to the second load instruction from the internal memory.
4. The method according to claim 1, wherein identifying the first and second load instructions comprises identifying that the symbolic expressions ...

Publication date: 12-01-2017

PROCESSOR WITH EFFICIENT PROCESSING OF LOAD-STORE INSTRUCTION PAIRS

Number: US20170010973A1
Assignee:

A method includes, in a processor, processing program code that includes memory-access instructions, wherein at least some of the memory-access instructions include symbolic expressions that specify memory addresses in an external memory in terms of one or more register names. At least a store instruction and a subsequent load instruction that access the same memory address in the external memory are identified, based on respective formats of the memory addresses specified in the symbolic expressions. An outcome of at least one of the memory-access instructions is assigned to be served to one or more instructions that depend on the load instruction, from an internal memory in the processor. 1. A method , comprising:in a processor, processing program code that includes memory-access instructions, wherein at least some of the memory-access instructions comprise symbolic expressions that specify memory addresses in an external memory in terms of one or more register names;identifying at least a store instruction and a subsequent load instruction that access the same memory address in the external memory, based on respective formats of the memory addresses specified in the symbolic expressions; andassigning an outcome of at least one of the memory-access instructions, to be served to one or more instructions that depend on the load instruction, from an internal memory in the processor.2. The method according to claim 1 , wherein both the store instruction and the load instruction specify the memory address using the same symbolic expression.3. The method according to claim 1 , wherein the store instruction and the load instruction specify the memory address using different symbolic expressions.4. The method according to claim 1 , wherein both the store instruction and the load instruction are processed by the same hardware thread.5. The method according to claim 1 , wherein the store instruction and the load instruction are processed by different hardware threads.6. The ...
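
A sketch, in the same illustrative Python style, of forwarding a store's value to later loads keyed by the symbolic address expression (the ForwardingBuffer class is hypothetical):

    # Hypothetical model of store-to-load forwarding keyed by the symbolic
    # address expression rather than by a computed numeric address.
    class ForwardingBuffer:
        def __init__(self):
            self.pending = {}                 # symbolic expression -> stored value

        def store(self, expr, value, memory):
            self.pending[expr] = value        # outcome kept in internal memory
            memory[expr] = value              # the write still reaches memory

        def load(self, expr, memory):
            if expr in self.pending:          # dependents served without a read
                return self.pending[expr], "forwarded"
            return memory[expr], "from memory"

    memory = {}
    fb = ForwardingBuffer()
    fb.store("[sp-16]", 7, memory)
    print(fb.load("[sp-16]", memory))         # (7, 'forwarded')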

Publication date: 08-01-2015

SYSTEMS AND METHODS FOR MANAGING DATA

Number: US20150012692A1
Assignee: INTELLECTUAL PROPERTY HOLDINGS 2 LLC

Systems and methods for managing data input/output operations are described. In one aspect, a device driver identifies a data read operation generated by a virtual machine in a virtual environment. The device driver is located in the virtual machine and the data read operation identifies a physical cache address associated with the data requested in the data read operation. A determination is made regarding whether data associated with the data read operation is available in a cache associated with the virtual machine. 1. A computer-implemented method comprising:identifying a data read operation generated by a virtual machine in a virtual environment, wherein the data read operation is identified by a device driver within the virtual machine generating the data read operation; anddetermining whether data associated with the data read operation is available in a cache associated with the virtual machine.2. The computer-implemented method of claim 1 , further comprising retrieving the data associated with the data read operation from the cache if the cache contains the data associated with the data read operation.3. The computer-implemented method of claim 1 , further comprising determining a physical cache address associated with the data requested in the data read operation.4. The computer-implemented method of claim 1 , further comprising translating a storage I/O address associated with the data read operation into physical cache address using a cache tag associated with the storage I/O address.5. The computer-implemented method of claim 3 , wherein the cache tag is stored in the virtual machine.6. The computer-implemented method of claim 3 , wherein the cache tag is stored in the base operating system.7. A virtualized computing system comprising:a plurality of virtual machines, wherein each virtual machine includes an input/output driver for intercepting input/output operations associated with the virtual machine;a shared data cache; anda cache provisioner ...

Publication date: 08-01-2015

Parallel, pipelined, integrated-circuit implementation of a computational engine

Number: US20150012708A1
Assignee:

Embodiments of the present invention are directed to parallel, pipelined, integrated-circuit implementations of computational engines designed to solve complex computational problems. One embodiment of the present invention is a family of video encoders and decoders (“codecs”) that can be incorporated within cameras, cell phones, and other electronic devices for encoding raw video signals into compressed video signals for storage and transmission, and for decoding compressed video signals into raw video signals for output to display devices. A highly parallel, pipelined, special-purpose integrated-circuit implementation of a particular video codec provides, according to embodiments of the present invention, a cost-effective video-codec computational engine that provides an extremely large computational bandwidth with relatively low power consumption and low-latency for decompression and compression of compressed video signals and raw video signals, respectively. 1. An integrated-circuit computational engine comprising:processing-element subcomponents, each of which carries out a high-level computational step of a stepwise computational process, the processing-element subcomponents arranged in one or more assembly-line-like series, which operate concurrently on different computational objects;an object cache that stores computational objects, the computational objects comprising data-structure values input to processing elements prior to each high-level computational step and output from processing elements during each high-level computational step; andan object bus that provides computational-object-level transmission transactions and through which computational objects are exchanged between the processing elements and the object cache.2. The integrated-circuit computational engine of implemented as a single integrated circuit.3. The integrated-circuit computational engine of implemented as two or more single-integrated-circuit computational engines.4. The integrated ...

Publication date: 08-01-2015

Distribution of tasks among asymmetric processing elements

Number: US20150012766A1
Assignee: Individual

Techniques to control power and processing among a plurality of asymmetric cores. In one embodiment, one or more asymmetric cores are power managed to migrate processes or threads among a plurality of cores according to the performance and power needs of the system.

Publication date: 12-01-2017

ATOMIC MEMORY UPDATE UNIT & METHODS

Number: US20170011544A1
Assignee:

In an aspect, an update unit can evaluate condition(s) in an update request and update one or more memory locations based on the condition evaluation. The update unit can operate atomically to determine whether to effect the update and to make the update. Updates can include one or more of incrementing and swapping values. An update request may specify one of a pre-determined set of update types. Some update types may be conditional and others unconditional. The update unit can be coupled to receive update requests from a plurality of computation units. The computation units may not have privileges to directly generate write requests to be effected on at least some of the locations in memory. The computation units can be fixed function circuitry operating on inputs received from programmable computation elements. The update unit may include a buffer to hold received update requests. 1. A method of graphics processing of a 3-D scene using ray tracing , comprising:executing a thread of computation in a programmable computation unit, wherein the executing of the thread comprises executing an instruction, from an instruction set defining instructions that can be used to program the programmable computation unit, the instruction causing issuance of an operation code including data that identifies a ray, one or more shapes, and an operation to be performed for the ray with respect to the one or more shapes, wherein the operation to be performed is selected from a pre-determined set of operations;buffering the operation code in a non-transitory memory; andreading the operation code and performing the operation specified by the operation code for the ray, within a logic module that executes independently of the programmable computation unit and is capable of performing operations consisting of the operations from the pre-determined set of operations.2. The machine-implemented method of graphics processing of claim 1 , wherein the operation to be performed for the ray ...
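
A hypothetical Python model of the buffered, conditional update idea: only the drain loop touches memory, so each conditional "min swap" or unconditional increment is effectively atomic. The update types and field names are invented for illustration:

    # Hypothetical model: computation units enqueue update requests; the
    # update unit alone writes memory, evaluating each condition atomically.
    from collections import deque

    memory = {"closest_t": float("inf"), "hit_shape": None, "rays_done": 0}
    queue = deque()                                # buffer of pending update requests

    def drain():
        # Only this loop writes memory, so each update is effectively atomic.
        while queue:
            op, value, shape = queue.popleft()
            if op == "min_swap":                   # conditional update type
                if value < memory["closest_t"]:
                    memory["closest_t"] = value
                    memory["hit_shape"] = shape
            elif op == "increment":                # unconditional update type
                memory["rays_done"] += value

    queue.append(("min_swap", 3.5, "triangle_7"))
    queue.append(("min_swap", 9.0, "sphere_2"))    # loses the comparison
    queue.append(("increment", 1, None))
    drain()
    print(memory)                                  # only the 3.5 intersection survives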

Publication date: 14-01-2016

UNDEFINED INSTRUCTION RECODING

Number: US20160011875A1
Author: Sundar Shyam
Assignee:

A system and method for efficiently decoding and handling undefined instructions. A semiconductor chip predecodes instructions of a computer program. In response to determining a particular instruction is an undefined operation, the chip replaces an N-bit opcode in the particular instruction with an N-bit pattern different from the opcode. When instructions are fetched from an instruction cache, the corresponding opcodes are compared to the N-bit pattern. When a match is found, a trap may be set. The trap may later cause an exception handler subroutine for undefined operations to initiate execution. 1. A processor comprising:an instruction cache (i-cache) configured to store a plurality of instructions; andcontrol logic configured to receive a first instruction; in response to determining that the first instruction corresponds to an undefined operation, replace an opcode of the first instruction with one or more bits different from the opcode; and store the first instruction with the replaced opcode in the i-cache.2. The processor as recited in claim 1, wherein an indication that the first instruction stored in the i-cache is an undefined operation is represented by the one or more bits different from the opcode within the first instruction.3. The processor as recited in claim 1, wherein the control logic is further configured to:fetch a second instruction from the i-cache; anddetermine if an opcode of a second instruction corresponds to an undefined operation.4. The processor as recited in claim 3, wherein determining the opcode of the second instruction corresponds to an undefined operation comprises determining the opcode of the second instruction matches the one or more bits.5. The processor as recited in claim 1, wherein the one or more bits consist of a number of bits that varies in dependence on a size of the opcode.6. The processor as recited in claim 1, wherein to determine the first instruction corresponds to an undefined operation, the ...
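
A small Python model of the recoding scheme; the opcode values and the reserved pattern are invented, and real hardware would apply this at predecode rather than in software:

    # Hypothetical model: at predecode, undefined opcodes are rewritten to one
    # reserved pattern; the fetch path then needs a single comparison to trap.
    DEFINED = {0b000001: "add", 0b000010: "load"}
    UNDEF_PATTERN = 0b111111                 # reserved N-bit pattern (assumed free)

    def predecode(opcode):
        return opcode if opcode in DEFINED else UNDEF_PATTERN

    def fetch_and_decode(icache_word):
        if icache_word == UNDEF_PATTERN:     # one comparison at fetch time
            raise RuntimeError("trap: undefined instruction")
        return DEFINED[icache_word]

    icache = [predecode(op) for op in (0b000001, 0b101010, 0b000010)]
    for word in icache:
        try:
            print(fetch_and_decode(word))
        except RuntimeError as trap:
            print(trap)                      # exception handler for undefined ops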

Publication date: 14-01-2016

DISTRIBUTED PROCESSING METHOD AND SYSTEM

Number: US20160011980A1
Assignee: FUJITSU LIMITED

A disclosed information processing method is executed in a distributed processing system that processes data by plural information processing apparatuses. And the information processing method includes: obtaining, by a first information processing apparatus of the plural information processing apparatuses and from a second information processing apparatus that manages relations among data, identification information of first data that has a predetermined relation with second data and identification information of an information processing apparatus that manages the first data, upon detecting access to the second data managed by the first information processing apparatus; reading out, by the first information processing apparatus, the first data, upon determining that the information processing apparatus that manages the first data corresponds to the first information processing apparatus; and loading, by the first information processing apparatus, the first data into a cache. 1. An information processing method executed in a distributed processing system that processes data by a plurality of information processing apparatuses , comprising:obtaining, by a first information processing apparatus of the plurality of information processing apparatuses and from a second information processing apparatus that manages relations among data, identification information of first data that has a predetermined relation with second data and identification information of an information processing apparatus that manages the first data, upon detecting access to the second data managed by the first information processing apparatus;reading out, by the first information processing apparatus, the first data, upon determining that the information processing apparatus that manages the first data corresponds to the first information processing apparatus; andloading, by the first information processing apparatus, the first data into a cache.2. The information processing method as set forth in ...

Publication date: 14-01-2016

VARIABLE HANDLES

Number: US20160011982A1
Assignee:

According to one technique, a virtual machine identifies a first instruction to create a variable handle instance, the first instruction including declaration information that identifies a type of receiver and a variable held by the receiver to which the variable handle instance is configured to provide access. If access to the variable is permissible, the virtual machine creates the variable handle instance comprising constrained functions configured to execute constrained operations on a memory location of the variable. The virtual machine identifies a second instruction that specifies a call to a particular constrained function, wherein the second instruction specifies the receiver or is implicitly bound to the receiver. The virtual machine identifies a particular memory location where the instance of the variable is stored and performs the particular constrained function with respect to the particular memory location. 1. A method comprising:identifying a first instruction to create a variable handle instance, the first instruction including declaration information that identifies a type of a receiver and a variable held by the receiver to which the variable handle instance is configured to provide access;responsive to the first instruction, and based on the declaration information, performing one or more checks to determine whether access to the variable is permissible;in response to a determination that access to the variable is permissible, creating the variable handle instance, the variable handle instance comprising one or more constrained functions configured to execute constrained operations on a memory location of the variable;identifying a second instruction that specifies a call to a particular constrained function of the one or more constrained functions of the variable handle instance, wherein the second instruction indicates the receiver; andidentifying a particular memory location where the variable is stored and causing performance of the particular ...
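
A rough Python analogue of a variable handle: the access check happens once at creation, after which only a fixed set of constrained operations is exposed. Plain Python cannot make compare_and_set atomic the way a VM can, so this is purely illustrative:

    class VarHandle:
        """Illustrative only: a real VM performs these checks internally
        and makes compare_and_set atomic; plain Python offers neither."""
        def __init__(self, receiver_type, field):
            if field not in receiver_type.__slots__:     # one-time access check
                raise TypeError(f"{field} is not accessible")
            self.field = field

        def get(self, receiver):                         # constrained operation
            return getattr(receiver, self.field)

        def compare_and_set(self, receiver, expected, new):
            if getattr(receiver, self.field) == expected:
                setattr(receiver, self.field, new)
                return True
            return False

    class Point:
        __slots__ = ("x", "y")
        def __init__(self):
            self.x, self.y = 0, 0

    vh = VarHandle(Point, "x")
    p = Point()
    print(vh.compare_and_set(p, 0, 5), vh.get(p))        # True 5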

Publication date: 11-01-2018

TECHNIQUES FOR METADATA PROCESSING

Number: US20180011708A1
Author: DEHON Andre
Assignee:

Techniques are described for metadata processing that can be used to encode an arbitrary number of security policies for code running on a processor. Metadata may be added to every word in the system and a metadata processing unit may be used that works in parallel with data flow to enforce an arbitrary set of policies. In one aspect, the metadata may be characterized as unbounded and software programmable to be applicable to a wide range of metadata processing policies. Techniques and policies have a wide range of uses including, for example, safety, security, and synchronization. Additionally, described are aspects and techniques in connection with metadata processing in an embodiment based on the RISC-V architecture. 1-29. (canceled)30. A method of processing instructions comprising:receiving a current instruction for metadata processing performed in a metadata processing domain that is isolated from a code execution domain including the current instruction, anddetermining, by the metadata processing domain in connection with metadata for the current instruction, whether to allow execution of the current instruction in accordance with a set of one or more policies, wherein the one or more policies include a set of rules that enforce execution of a complete sequence of instructions in a specified order from a first instruction of the complete sequence to a last instruction of the complete sequence.31. The method of claim 30, further comprising:mapping a first shared physical page into a first virtual address space of a first process; andmapping the first shared physical page into a second virtual address space for a second process, said first shared physical page including a plurality of memory locations, wherein each of the plurality of memory locations is associated with one of a plurality of global metadata tags used in connection with rule processing in the metadata processing domain.32. The method of claim 31, wherein the plurality of global metadata tags ...

Publication date: 11-01-2018

STREAM REFERENCE REGISTER WITH DOUBLE VECTOR AND DUAL SINGLE VECTOR OPERATING MODES

Number: US20180011709A1
Author: Zbiciak Joseph
Assignee:

A streaming engine employed in a digital signal processor specifies a fixed read only data stream. Once fetched the data stream is stored in two head registers for presentation to functional units in the fixed order. Data use by the functional unit is preferably controlled using the input operand fields of the corresponding instruction. A first read only operand coding supplies data from the first head register. A first read/advance operand coding supplies data from the first head register and also advances the stream to the next sequential data elements. Corresponding second read only operand coding and second read/advance operand coding operate similarly with the second head register. A third read only operand coding supplies double width data from both head registers. 1. A digital signal processor comprising:an instruction memory storing instructions each specifying a data processing operation and at least one data operand field;an instruction decoder connected to said instruction memory for sequentially recalling instructions from said instruction memory and determining said specified data processing operation and said specified at least one operand;at least one functional unit connected to said data register file and said instruction decoder for performing data processing operations upon at least one operand corresponding to an instruction decoded by said instruction decoder and storing results in an instruction specified data register; a first stream head register storing a data element of said stream next to be used, and', 'a second stream head register storing a data element of said stream next to be used following said data stored in said first stream head register; and, 'a streaming engine connected to said instruction decoder operable in response to a stream start instruction to recall from memory a stream of an instruction specified sequence of a plurality of data elements, said streaming engine including'} decode an instruction having a predetermined ...
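
A Python model of the two head registers and the three operand codings (read-only, read/advance, double-width); the class and method names are invented:

    # Hypothetical model: two head registers expose the next stream elements;
    # the operand coding decides whether reading also advances the stream.
    class StreamingEngine:
        def __init__(self, data):
            self.data = list(data)
            self.pos = 0                       # head0 = pos, head1 = pos + 1

        def read(self, head):                  # read-only coding: no advance
            return self.data[self.pos + head]

        def read_advance(self, head):          # read/advance coding
            value = self.data[self.pos + head]
            self.pos += head + 1               # consume through that head
            return value

        def read_double(self):                 # double-width coding: both heads
            return self.data[self.pos], self.data[self.pos + 1]

    s = StreamingEngine([10, 20, 30, 40])
    print(s.read(0))                           # 10, stream not advanced
    print(s.read_advance(0))                   # 10, head0 now holds 20
    print(s.read_double())                     # (20, 30)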

Publication date: 11-01-2018

CONTROL STATE PRESERVATION DURING TRANSACTIONAL EXECUTION

Number: US20180011765A1
Assignee:

A method includes saving a control state for a processor in response to commencing a transactional processing sequence, wherein saving the control state produces a saved control state. The method also includes permitting updates to the control state for the processor while executing the transactional processing sequence. Examples of updates to the control state include key mask changes, primary region table origin changes, primary segment table origin changes, CPU tracing mode changes, and interrupt mode changes. The method also includes restoring the control state for the processor to the saved control state in response to encountering a transactional error during the transactional processing sequence. In some embodiments, saving the control state comprises saving the current control state to memory corresponding to internal registers for an unused thread or another level of virtualization. A corresponding computer system and computer program product are also disclosed herein. 1. A method comprising:saving a control state for a processor in response to commencing a transactional processing sequence, wherein saving the control state produces a saved control state;permitting updates to the control state for the processor while executing the transactional processing sequence; andrestoring the control state for the processor to the saved control state in response to encountering a transactional error during the transactional processing sequence.2. The method of claim 1 , wherein saving the control state comprises saving the current control state to a backup set of internal control registers or registers corresponding to an unused thread or another level of virtualization.3. The method of claim 1 , wherein saving the control state comprises saving the current control state to a private location in memory.4. The method of claim 3 , wherein the private location is owned by an operating system thread or the central processing unit (CPU).5. The method of claim 1 , wherein ...
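
A minimal Python sketch of the save/permit/restore sequence, with the control state modeled as a dictionary; which fields belong to the control state is assumed for illustration:

    # Hypothetical model: the control state is snapshotted at transaction
    # begin, may be updated inside, and is rolled back on a transactional error.
    import copy

    control_state = {"key_mask": 0xFF, "trace_mode": False, "interrupt_mode": "on"}

    def run_transaction(body):
        saved = copy.deepcopy(control_state)       # e.g. into an unused thread's registers
        try:
            body()                                 # updates are permitted here
        except Exception:
            control_state.clear()
            control_state.update(saved)            # restore on transactional error
            raise

    def body():
        control_state["trace_mode"] = True         # allowed update
        raise RuntimeError("transactional error")  # abort the sequence

    try:
        run_transaction(body)
    except RuntimeError:
        pass
    print(control_state["trace_mode"])             # False: state was restored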

Publication date: 19-01-2017

APPARATUS AND METHOD FOR LOW-LATENCY INVOCATION OF ACCELERATORS

Number: US20170017491A1
Assignee:

An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a command register for storing command data identifying a command to be executed; a result register to store a result of the command or data indicating a reason why the command could not be executed; execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands, the accelerator invocation instruction to store command data specifying the command within the command register; one or more accelerators to read the command data from the command register and responsively attempt to execute the command identified by the command data, wherein if the one or more accelerators successfully execute the command, the one or more accelerators are to store result data comprising the results of the command in the result register; and if the one or more accelerators cannot successfully execute the command, the one or more accelerators are to store result data indicating a reason why the command cannot be executed, wherein the execution logic is to temporarily halt execution until the accelerator completes execution or is interrupted, wherein the accelerator includes logic to store its state if interrupted so that it can continue execution at a later time. 1. A processor comprising:a plurality of simultaneous multithreading (SMT) cores, each of the SMT cores to perform out-of-order instruction execution for a plurality of threads;at least one shared cache circuit to be shared among two or more of the SMT cores; an instruction fetch circuit to fetch instructions of one or more of the threads, an instruction decode circuit to decode the instructions, a register renaming circuit to rename registers of a register file, an instruction cache circuit to store instructions to be executed, and a data cache circuit to store data; at least one of ...
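
A toy Python model of the command-register/result-register handshake, with invented command names; the real mechanism uses architectural registers and hardware stalls, not method calls:

    # Hypothetical model: the core writes a command, then stalls until the
    # accelerator fills the result register with a result or a failure reason.
    class Accelerator:
        SUPPORTED = {"crc32", "aes"}

        def execute(self, command_reg):
            op, payload = command_reg
            if op not in self.SUPPORTED:
                return ("error", f"unsupported command: {op}")  # reason code
            return ("ok", f"{op} done on {payload!r}")

    def invoke(accel, command):
        command_reg = command                      # accelerator invocation instruction
        result_reg = accel.execute(command_reg)    # core halts until this returns
        return result_reg

    print(invoke(Accelerator(), ("crc32", b"abc")))
    print(invoke(Accelerator(), ("fft", b"abc")))  # reports why it could not run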

Publication date: 19-01-2017

APPARATUS AND METHOD FOR LOW-LATENCY INVOCATION OF ACCELERATORS

Number: US20170017492A1
Assignee:

An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a command register for storing command data identifying a command to be executed; a result register to store a result of the command or data indicating a reason why the command could not be executed; execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands, the accelerator invocation instruction to store command data specifying the command within the command register; one or more accelerators to read the command data from the command register and responsively attempt to execute the command identified by the command data, wherein if the one or more accelerators successfully execute the command, the one or more accelerators are to store result data comprising the results of the command in the result register; and if the one or more accelerators cannot successfully execute the command, the one or more accelerators are to store result data indicating a reason why the command cannot be executed, wherein the execution logic is to temporarily halt execution until the accelerator completes execution or is interrupted, wherein the accelerator includes logic to store its state if interrupted so that it can continue execution at a later time. 1. A system comprising:a plurality of processors;a first interconnect to communicatively couple two or more of the plurality of processors;a second interconnect to communicatively couple one or more of the plurality of processors to one or more other system components; anda system memory communicatively coupled to one or more of the processors; a plurality of simultaneous multithreading (SMT) cores, each of the SMT cores to perform out-of-order instruction execution for a plurality of threads; at least one shared cache circuit to be shared among two or more of the SMT cores; an instruction ...

Publication date: 21-01-2016

OPERATING POINT MANAGEMENT IN MULTI-CORE ARCHITECTURES

Number: US20160018882A1
Assignee:

For one disclosed embodiment, a processor comprises a plurality of processor cores to operate at variable performance levels. One of the plurality of processor cores may operate at one time at a performance level different than a performance level at which another one of the plurality of processor cores may operate at the one time. The plurality of processor cores are in a same package. Logic of the processor is to set one or more operating parameters for one or more of the plurality of processor cores. Logic of the processor is to monitor activity of one or more of the plurality of processor cores. Logic of the processor is to constrain power of one or more of the plurality of processor cores based at least in part on the monitored activity. The logic to constrain power is to limit a frequency at which one or more of the plurality of processor cores may be set. Other embodiments are also disclosed.

Publication date: 21-01-2016

Prefetching instructions in a data processing apparatus

Number: US20160019065A1
Assignee: ARM LTD

A data processing apparatus has prefetch circuitry for prefetching cache lines of instructions into an instruction cache. A prefetch lookup table is provided for storing prefetch entries, with each entry corresponding to a region of a memory address space and identifying at least one block of one or more cache lines within the corresponding region from which processing circuitry accessed an instruction on a previous occasion. When the processing circuitry executes an instruction from a new region, the prefetch circuitry looks up the table, and if it stores a prefetch entry for the new region, then the at least one block identified by the corresponding entry is prefetched into the cache.
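
A Python model of the per-region prefetch table, assuming 4 KiB regions of 256-byte blocks (both sizes invented for the example):

    # Hypothetical model: a prefetch table keyed by memory region remembers
    # which cache-line blocks were used on the last visit; re-entering the
    # region prefetches exactly those blocks into the instruction cache.
    REGION, BLOCK = 4096, 256
    table = {}            # region base -> block ids used on the previous visit
    current = None

    def execute(addr, icache):
        global current
        region, block = addr // REGION * REGION, (addr % REGION) // BLOCK
        if region != current:                    # execution entered a new region
            current = region
            for b in table.get(region, ()):      # prefetch remembered blocks
                icache.add((region, b))
            table[region] = set()                # start recording this visit
        table[region].add(block)
        icache.add((region, block))

    icache = set()
    for a in (8192, 8448, 0, 8192):              # leave and re-enter region 8192
        execute(a, icache)
    print((8192, 1) in icache)                   # True: block 1 was prefetched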

Publication date: 21-01-2016

Synchronizing a translation lookaside buffer with an extended paging table

Number: US20160019164A1
Assignee: Intel Corp

A processor including logic to execute an instruction to synchronize a mapping from a physical address of a guest of a virtualization based system (guest physical address) to a physical address of the host of the virtualization based system (host physical address), and stored in a translation lookaside buffer (TLB), with a corresponding mapping stored in an extended paging table (EPT) of the virtualization based system.

Publication date: 21-01-2016

SYNCHRONIZING A TRANSLATION LOOKASIDE BUFFER WITH AN EXTENDED PAGING TABLE

Number: US20160019165A1
Assignee:

A processor including logic to execute an instruction to synchronize a mapping from a physical address of a guest of a virtualization based system (guest physical address) to a physical address of the host of the virtualization based system (host physical address), and stored in a translation lookaside buffer (TLB), with a corresponding mapping stored in an extended paging table (EPT) of the virtualization based system.

Publication date: 15-01-2015

Partitioned memory with shared memory resources and configurable functions

Number: US20150019803A1
Assignee: Mosys Inc

A memory device that includes an input interface that receives instructions and input data on a first plurality of serial links. The memory device includes a memory block having a plurality of banks, wherein each of the banks has a plurality of memory cells, and wherein the memory block has multiple ports. An output interface provides data on a second plurality of serial links. A cache coupled to the IO interface and to the plurality of banks, stores write data designated for a given memory cell location when the given memory cell location is currently being accessed, thereby avoiding a collision. Memory device includes one or more memory access controllers (MACs) coupled to the memory block and one or more arithmetic logic units (ALUs) coupled to the MACs. The ALUs perform one or more operations on data prior to the data being transmitted out of the IC via the IO, such as read/modify/write or statistics or traffic management functions, thereby reducing congestion on the serial links and offloading appropriate operations from the host to the memory device.

Publication date: 18-01-2018

INFORMATION PROCESSING DEVICE, STORAGE MEDIUM, AND METHOD

Number: US20180018153A1
Author: MUKAI Yuta
Assignee: FUJITSU LIMITED

A device includes a processor configured to: divide loop in a program into first loop and second loop when compiling the program, the loop accessing data of an array and prefetching data of the array to be accessed at a repetition after prescribed repetitions at each repetition, the first loop including one or more repetitions from an initial repetition to a repetition immediately before the repetition after the prescribed repetitions, the second loop including one or more repetitions from the repetition after the prescribed repetitions to a last repetition, and generate an intermediate language code configured to access data of the array using a first region in a cache memory and prefetch data of the array using a second region in the cache memory in the first loop, and to access and prefetch data of the array using the second region in the second loop. 1. An information processing device comprising:a memory; and divide loop processing in a program into first loop processing and second loop processing when compiling the program, the loop processing accessing data of an array and prefetching data of the array to be accessed at a repetition processing after prescribed repetition processings at each repetition processing in the loop processing, the first loop processing including one or more repetition processings from an initial repetition processing to a repetition processing immediately before the repetition processing after the prescribed repetition processings, the second loop processing including one or more repetition processings from the repetition processing after the prescribed repetition processings to a last repetition processing, and', 'generate an intermediate language code based on the program when compiling the program, the intermediate language code being configured to access data of the array by using a first region in a cache memory and prefetch data of the array by using a second region in the cache memory in the first loop processing, and to ...
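
A Python sketch of the resulting loop split, with a placeholder prefetch() standing in for the compiler's prefetch intrinsic and the two cache regions modeled as sets; the prefetch distance D is assumed:

    D = 4                                   # prescribed number of repetitions

    def prefetch(region, addr):             # placeholder for a prefetch intrinsic
        region.add(addr)

    a = list(range(16))
    region1, region2 = set(), set()         # two cache regions, modeled as sets
    total = 0

    for i in range(D):                      # first loop: access via region1
        region1.add(i)
        total += a[i]
        prefetch(region2, i + D)            # prefetch ahead into region2
    for i in range(D, len(a)):              # second loop: access and prefetch via region2
        total += a[i]
        if i + D < len(a):
            prefetch(region2, i + D)

    print(total, len(region1), len(region2))    # 120 4 12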

Publication date: 18-01-2018

METHOD FOR INCREASING CACHE SIZE

Number: US20180018265A1
Assignee:

A method for increasing storage space in a system containing a block data storage device, a memory, and a processor is provided. Generally, the processor is configured by the memory to tag metadata of a data block of the block storage device indicating the block as free, used, or semifree. The free tag indicates the data block is available to the system for storing data when needed, the used tag indicates the data block contains application data, and the semifree tag indicates the data block contains cache data and is available to the system for storing application data if no blocks marked with the free tag are available to the system. 1. A method for using a resource by one or more applications, the resource comprising multiple resource components that are individually accessed and controlled by an operating system for being used by the one or more applications, each of the resource components is tagged using a first tag, a second tag, or a third tag, and each of the resource components is capable of being used by the one or more applications for a first purpose and a second purpose, for use with a request from an application by an operating system to use two resource components respectively for the first and second purposes, the method comprising the steps of:determining if a resource component associated with the first tag or with the second tag is available for use;responsive to the determining, notifying the application if no resource component in the resource is associated with the first tag or with the second tag;determining, by the operating system, if a first resource component associated with the first tag is available in the resource;if a first resource component associated with the first tag is available, then:selecting the first resource component associated with the first tag;using the selected first resource component by the application for the first purpose; andtagging the first resource component with the third tag;determining, by the ...
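
A Python model of the free/used/semifree tagging policy (the allocator functions are hypothetical): caching is best-effort, while application allocations may reclaim semifree blocks once no free blocks remain:

    blocks = {i: "free" for i in range(3)}

    def alloc_app():                      # application data
        for tag in ("free", "semifree"):  # semifree blocks are reclaimable
            for b, t in blocks.items():
                if t == tag:
                    blocks[b] = "used"
                    return b
        raise MemoryError("no blocks available")

    def alloc_cache():                    # cache data only uses free blocks
        for b, t in blocks.items():
            if t == "free":
                blocks[b] = "semifree"
                return b
        return None                       # caching is best-effort

    alloc_cache(); alloc_cache(); alloc_cache()   # cache fills everything
    print(blocks)                                 # all blocks "semifree"
    alloc_app()                                   # reclaims a cache block
    print(blocks)                                 # one block now "used"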

Publication date: 26-01-2017

HARDWARE ACCELERATION COMPONENTS FOR TRANSLATING GUEST INSTRUCTIONS TO NATIVE INSTRUCTIONS

Number: US20170024212A1
Author: Abdallah Mohammad
Assignee:

A hardware based translation accelerator. The hardware includes a guest fetch logic component for accessing guest instructions; a guest fetch buffer coupled to the guest fetch logic component and a branch prediction component for assembling guest instructions into a guest instruction block; and conversion tables coupled to the guest fetch buffer for translating the guest instruction block into a corresponding native conversion block. The hardware further includes a native cache coupled to the conversion tables for storing the corresponding native conversion block, and a conversion look aside buffer coupled to the native cache for storing a mapping of the guest instruction block to corresponding native conversion block, wherein upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates the guest instruction has a corresponding converted native instruction in the native cache. 1. A hardware based translation accelerator , comprising:a guest fetch logic component for accessing a plurality of guest instructions;a guest fetch buffer coupled to the guest fetch logic component and a branch prediction component for assembling the plurality of guest instructions into a guest instruction block;a plurality of conversion tables coupled to the guest fetch buffer for translating the guest instruction block into a corresponding native conversion block;a native cache coupled to the conversion tables for storing the corresponding native conversion block;a conversion look aside buffer coupled to the native cache for storing a mapping of a guest far branch in the guest instruction block to a corresponding native instruction;wherein upon a subsequent request for the guest far branch, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates the guest far branch has a corresponding converted native instruction in the native cache; ...

Publication date: 26-01-2017

PROCESSOR AND CONTROL METHOD OF PROCESSOR

Number: US20170024215A1
Assignee: FUJITSU LIMITED

A processor includes: an instruction execution unit that executes an instruction; and a branch prediction unit that stores history information indicating, for each of the instruction fetches performed a certain number of times before an instruction fetch of a branch prediction target instruction, whether an instruction predicted as branch-taken is included, stores weight tables including weights corresponding to instructions, and predicts the branch prediction target instruction to be taken or not-taken. The branch prediction unit, before the instruction fetch of the branch prediction target instruction, obtains the history information and the weights related to the instruction fetches performed the certain number of times to perform a product-sum operation, and at the time of the instruction fetch of the branch prediction target instruction, performs an operation of a result of the product-sum operation and a weight of the branch prediction target instruction to perform branch prediction. 1. A processor comprising:an instruction execution unit that executes an instruction;a branch prediction unit that stores history information and weight tables including weights corresponding to instructions, before an instruction fetch of a branch prediction target instruction is performed, obtains the history information and the weights related to instruction fetches performed a certain number of times before the instruction fetch of the branch prediction target instruction, performs a weight product-sum operation using the obtained history information and weights, and at the time of the instruction fetch of the branch prediction target instruction, performs an operation of a result obtained by the weight product-sum operation and a weight of the branch prediction target instruction, and predicts the branch prediction target instruction to be branch-taken or branch-not-taken, the history information indicating, for every instruction fetch performed the certain number of times, whether the ...
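
The product-sum over history and weights resembles a perceptron-style predictor. The following Python sketch is a loose model of that idea, not the patented pipeline timing: the history product-sum is computed ahead of the target fetch, and only the branch's own weight is added at fetch time:

    HIST = 8
    history = [0] * HIST                 # +1 taken, -1 not taken per past fetch
    weights = {}                         # pc -> per-history weights
    bias = {}                            # pc -> the branch's own weight

    def predict(pc):
        w = weights.setdefault(pc, [0] * HIST)
        partial = sum(h * wi for h, wi in zip(history, w))   # done in advance
        return partial + bias.setdefault(pc, 0) >= 0         # final add at fetch

    def update(pc, taken):
        sign = 1 if taken else -1
        w = weights[pc]
        for i, h in enumerate(history):
            w[i] += sign * h             # simple perceptron-style training
        bias[pc] += sign
        history.pop(0); history.append(sign)

    for _ in range(20):                  # a branch that is always taken
        predict(0x40)                    # ensures tables exist, models the fetch
        update(0x40, taken=True)
    print(predict(0x40))                 # True: learned branch-taken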

Publication date: 28-01-2016

DATA CACHE SYSTEM AND METHOD

Number: US20160026469A1
Author: Lin Kenneth Chenghao
Assignee:

A data cache system is provided. The system includes a central processing unit (CPU), a memory system, an instruction track table, a tracker and a data engine. The CPU is configured to execute instructions and read data. The memory system is configured to store the instructions and the data. The instruction track table is configured to store corresponding information of branch instructions stored in the memory system. The tracker is configured to point to a first data read instruction after an instruction currently being executed by the CPU. The data engine is configured to calculate a data address in advance before the CPU executes the data read instruction pointed to by the tracker. Further, the data engine is also configured to control the memory system to provide the corresponding data for the CPU based on the data address. 1. A data cache system , comprising:a central processing unit (CPU) configured to execute instructions and read data;a memory system configured to store the instructions and the data;an instruction track table configured to store corresponding information of branch instructions stored in the memory system;a tracker configured to point to a first data read instruction after an instruction currently being executed by the CPU; anda data engine configured to calculate a data address in advance before the CPU executes the data read instruction pointed to by the tracker, and control the memory system to provide the corresponding data for the CPU based on the data address.2. The system according to claim 1 , wherein:the memory system further includes an instruction cache, and the instruction cache is configured to store instructions for the CPU to execute and corresponding information of the data read instructions, wherein the corresponding information indicates whether the instruction is a data read instruction.3. The system according to claim 2 , wherein:the corresponding information of the data read instruction is type information of the data ...

Publication date: 28-01-2016

CACHING BASED OPERATING SYSTEM INSTALLATION

Number: US20160026474A1
Assignee: VMWARE, INC.

An image of system software is installed by loading an executable image of the system software using a boot loader, where the executable image includes a kernel and a plurality of files used by the kernel. The kernel of the system software is executed to generate the image of the system software that includes a copy of the kernel. Generating the image of the system software involves the steps of generating a plurality of pointers that each point to a different one of the files, retrieving the files using the pointers, and storing a copy of the kernel and the files in a storage device from which the system software is to be booted as the image of the system software. 1. A method of generating an image of system software from an executable image that includes a kernel , and a plurality of files used by the kernel , comprising:generating a plurality of pointers, each pointing to a different one of the files;retrieving the files using the pointers; andstoring a copy of the kernel and the files in a storage device from which the system software is to be booted as the image of the system software.2. The method of claim 1 , further comprising compressing each of the files upon retrieving the files using the pointers.3. The method of claim 1 , wherein the copy of the kernel is one of the files.4. The method of claim 1 , wherein the storage device is partitioned according to a partition table included in the executable image.5. The method of claim 1 , wherein the storage device is partitioned according to a set of partition rules included in the executable image.6. The method of claim 1 , wherein the copy of the kernel is stored in a first partition of the storage device and the files are stored in a second partition of the storage device.7. The method of claim 1 , wherein the copy of the kernel and the files are stored in a single partition of the storage device.8. The method of claim 1 , wherein the files include at least one directory object.9. A system for generating an ...

Publication date: 26-01-2017

OPERAND CACHE FLUSH, EVICTION, AND CLEAN TECHNIQUES

Number: US20170024323A1
Assignee:

An apparatus includes an operand cache for storing operands from a register file for use by execution circuitry. In some embodiments, eviction priority for the operand cache is based on the status of entries (e.g., whether dirty or clean) and the retention priority of entries. In some embodiments, flushes are handled differently based on their retention priority (e.g., low-priority entries may be pre-emptively flushed). In some embodiments, timing for cache clean operations is specified on a per-instruction basis. Disclosed techniques may spread out write backs in time, facilitate cache clean operations, facilitate thread switching, extend the time operands are available in an operand cache, and/or improve the use of compiler hints, in some embodiments. 1. An apparatus, comprising:an execution unit;a register file configured to store operands for instructions to be executed by the execution unit; andan operand cache that includes a plurality of entries configured to store source operands from the register file and result operands of operations by the execution unit, wherein entries of the operand cache include fields that comprise at least: a hint field indicating a retention priority for the operand stored by the entry; and a dirty field indicating whether the operand stored by the entry has been modified;wherein the apparatus is configured to select an entry to evict ...: select, at a first priority level, the entry from clean entries associated with a first hint field value that indicates a first retention priority; select, at a second priority level that is lower than the first priority level, the entry from dirty entries associated with a second hint field value that indicates a second retention priority that is higher than the first retention priority; and select, at a third priority level that is lower than the second priority level, the entry from clean entries associated with the second hint field value that indicates the second retention priority.
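
A Python sketch of the three-level eviction choice spelled out in the claim (clean/low-retention first, then dirty/high-retention, then clean/high-retention); the tuple encoding of entries is invented:

    # Hypothetical model of the claim's three eviction priority levels.
    def pick_victim(entries):
        # entries: list of (name, dirty, retention) with retention in {"low", "high"}
        for want_dirty, want_ret in ((False, "low"), (True, "high"), (False, "high")):
            for name, dirty, retention in entries:
                if dirty == want_dirty and retention == want_ret:
                    return name
        return None                        # nothing matched the three levels

    cache = [("r0", True, "high"), ("r1", False, "high"), ("r2", False, "low")]
    print(pick_victim(cache))              # r2: clean + low retention goes first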

Publication date: 26-01-2017

Electronic device and method for fabricating the same

Number: US20170024336A1
Assignee: SK hynix Inc

This technology provides a method for fabricating an electronic device including a variable resistance element, which includes a free layer having a variable magnetization direction; a pinned layer having a first non-variable magnetization direction, and including first ferromagnetic materials and a first spacer layer interposed between two adjacent first ferromagnetic materials among the first ferromagnetic materials; a tunnel barrier layer interposed between the free layer and the pinned layer; a magnetic correction layer having a second magnetization direction which is anti-parallel to the first magnetization direction; and a third spacer layer interposed between the magnetic correction layer and the pinned layer, providing an anti-ferromagnetic exchange coupling between the magnetic correction layer and the pinned layer.

Publication date: 28-01-2016

INFORMATION PROCESSING DEVICE, MEMORY ORDER GUARANTEE METHOD, AND RECORDING MEDIUM STORING PROGRAM

Number: US20160026571A1
Author: FUKUYAMA Tomohisa
Assignee:

An information processing device includes a plurality of processors including an Acquire side processor and a Release side processor, and a shared memory. The Acquire side processor and the Release side processor includes a cache, a memory access control unit in the Release side processor configured to issue a StoreFence instruction for requesting a guarantee of completing the cache invalidation by the Acquire side processor, a memory access control unit in the Acquire side processor configured to issue a LoadFence instruction in response to the StoreFence instruction for guaranteeing completion of the cache invalidation in accordance with the invalidation request from the shared memory after completing a process for the cache invalidation, and an invalidation request control unit configured to perform a process for invalidating the cache in accordance with the invalidation request from the shared memory. 1. An information processing device comprising:a plurality of processors including an Acquire side processor intending to read data and a Release side processor intending to write data; anda shared memory,the Acquire side processor and the Release side processor includinga cache,a memory access control unit configured to control access from the processors to the shared memory, the memory access control unit in the Release side processor, configured to comprise a store counter whose value is increased if a Store instruction is issued to the shared memory, and is decreased if an acknowledgement response indicating correct reception of the Store instruction is received from the shared memory, and a wait counter whose value is set at a value representing a predetermined time if the store counter has come to indicate 0, the wait counter decreasing the value at every predetermined interval, and the predetermined time being determined such that, compared to a time since the shared memory's sending the invalidation request until the Acquire side processor's completing the ...

Publication date: 28-01-2016

Using a decrementer interrupt to start long-running hardware operations before the end of a shared processor dispatch cycle

Number: US20160026573A1
Assignee: International Business Machines Corp

Method to perform an operation, the operation comprising processing a first logical partition on a shared processor for the duration of a dispatch cycle, issuing, by a hypervisor, at a predefined time prior to completion of the dispatch cycle, a lightweight hypervisor decrementer (HDEC) interrupt specifying a cache line address buffer location in a virtual processor, and responsive to the lightweight HDEC, writing, by the shared processor, a set of cache line addresses used by the first logical partition to the cache line address buffer location in the virtual processor.

Publication date: 28-01-2016

GENERAL PURPOSE DIGITAL DATA PROCESSOR, SYSTEMS AND METHODS

Number: US20160026574A1
Author: Frank Steven J., Lin Hai
Assignee:

The invention provides improved data processing apparatus, systems and methods that include one or more nodes, e.g., processor modules or otherwise, that include or are otherwise coupled to cache, physical or other memory (e.g., attached flash drives or other mounted storage devices), collectively “system memory.” At least one of the nodes includes a cache memory system that stores data (and/or instructions) recently accessed (and/or expected to be accessed) by the respective node, along with tags specifying addresses and statuses (e.g., modified, reference count, etc.) for the respective data (and/or instructions). The tags facilitate translating system addresses to physical addresses, e.g., for purposes of moving data (and/or instructions) between system memory (and, specifically, for example, physical memory, such as attached drives or other mounted storage) and the cache memory system. 1. A digital data processor or processing system comprisinga plurality of nodes that are communicatively coupled to one another,at least one of the nodes including a cache memory that stores at least one of data and instructions that are at least one of accessed and expected to be accessed by the respective node, andsystem memory that includes the cache memory of multiple ones of said plurality of nodes and that includes a mounted storage device communicatively coupled to at least one of the plurality of nodes,wherein the cache memory of said at least one node additionally stores tags specifying addresses for respective data or instructions in the system memory, said addresses forming part of a system address space that is common to the system memory including multiple ones of the plurality of nodes and to the mounted storage device.2. (canceled)3. The digital data processor or processing system of claim 1, wherein the system memory comprises the cache memory of multiple nodes.4. (canceled)5. The digital data processor or processing system of claim 3, wherein the tags specify one ...

Publication date: 28-01-2016

CACHE LINE CROSSING LOAD TECHNIQUES FOR A CACHING SYSTEM

Number: US20160026580A1

A technique for handling an unaligned load operation includes detecting a cache line crossing load operation that is associated with a first cache line and a second cache line. In response to a cache including the first cache line but not including the second cache line, the second cache line is reloaded into the cache in a same set as the first cache line. In response to reloading the second cache line in the cache, a cache line crossing link indicator associated with the first cache line is asserted to indicate that both the first and second cache lines include portions of a desired data element. 1-7. (canceled)8. A processor, comprising:a cache; anda processor core coupled to the cache, wherein the processor is configured to: detect a cache line crossing load operation that is associated with a first cache line and a second cache line; in response to the cache including the first cache line but not including the second cache line, reload the second cache line in the cache in a same set as the first cache line; and in response to reloading the second cache line in the cache, assert a cache line crossing link indicator that is associated with the first cache line to indicate that both the first and second cache lines include portions of a desired data element.9. The processor of claim 8, wherein the processor is further configured to:detect, using a set hit prediction unit, at prefetch stream allocation whether a set in the cache was hit;in response to detecting that a set in the cache was hit, notifying a reload control unit to utilize the set indicated by the set hit for reloading the second cache line in the cache; andin response to not detecting that a set in the cache was hit, notifying the reload control unit to utilize a set indicated by a replacement policy unit for reloading the second cache line in the cache.10. The processor of claim 8, wherein the cache line crossing link indicator that is associated with the first cache line is only ...
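
A Python model of the same-set reload plus link-bit assertion; the 8-set geometry and the Cache class are invented for the example. Note that the second line would normally index a different set, which is exactly why it is deliberately placed in the first line's set here:

    LINE = 64

    class Cache:
        def __init__(self):
            self.sets = {}                # set index -> {line address: link_bit}

        def set_index(self, line_addr):
            return (line_addr // LINE) % 8

        def load(self, addr, width):
            first = addr // LINE * LINE
            last = (addr + width - 1) // LINE * LINE
            if last != first:                          # line-crossing load
                s = self.sets.setdefault(self.set_index(first), {})
                if first in s and last not in s:
                    s[last] = False                    # reload into the same set
                    s[first] = True                    # assert crossing-link bit
            return "both halves available"

    c = Cache()
    c.sets[c.set_index(64)] = {64: False}              # only the first line cached
    c.load(addr=120, width=16)                         # crosses the 64 B boundary
    print(c.sets[c.set_index(64)])                     # {64: True, 128: False}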

Publication date: 28-01-2016

Using a decrementer interrupt to start long-running hardware operations before the end of a shared processor dispatch cycle

Number: US20160026586A1
Assignee: International Business Machines Corp

Systems, methods, and computer program products to perform an operation, the operation comprising processing a first logical partition on a shared processor for the duration of a dispatch cycle, issuing, by a hypervisor, at a predefined time prior to completion of the dispatch cycle, a lightweight hypervisor decrementer (HDEC) interrupt specifying a cache line address buffer location in a virtual processor, and responsive to the lightweight HDEC, writing, by the shared processor, a set of cache line addresses used by the first logical partition to the cache line address buffer location in the virtual processor.

Publication date: 25-01-2018

DETERMINING THE EFFECTIVENESS OF PREFETCH INSTRUCTIONS

Number: US20180024836A1
Assignee:

Effectiveness of prefetch instructions is determined. A prefetch instruction is executed to request that data be fetched into a cache of the computing environment. The effectiveness of the prefetch instruction is determined. This includes updating, based on executing the prefetch instruction, a cache directory of the cache. The updating includes including, in the cache directory, effectiveness data relating to the data. The effectiveness data includes whether the data was installed in the cache based on the prefetch instruction. Additionally, the determining the effectiveness includes obtaining at least a portion of the effectiveness data from the cache directory, and using the at least a portion of the effectiveness data to determine the effectiveness of the prefetch instruction. 1. A computer program product for facilitating processing within a computing environment, said computer program product comprising:a computer readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising: executing a prefetch instruction to request that data be fetched into a cache of the computing environment; and determining effectiveness of the prefetch instruction, the determining the effectiveness comprising: updating, based on executing the prefetch instruction, a cache directory of the cache, the updating comprising including, in the cache directory, effectiveness information relating to the data, the effectiveness information including whether the data was installed in the cache based on the prefetch instruction; obtaining at least a portion of the effectiveness information from the cache directory; and using the at least a portion of the effectiveness information to determine the effectiveness of the prefetch instruction.2. The computer program product of claim 1, wherein based on executing the prefetch instruction and the data being missing from the cache, the data is installed ...
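
A Python model of the effectiveness bookkeeping: the directory records whether a line was installed by a prefetch and whether it was later demanded; all names are illustrative:

    directory = {}     # line address -> {"by_prefetch": bool, "used": bool}

    def prefetch(addr):
        if addr not in directory:                       # install on miss
            directory[addr] = {"by_prefetch": True, "used": False}

    def demand_load(addr):
        entry = directory.setdefault(addr, {"by_prefetch": False, "used": False})
        entry["used"] = True                            # line actually consumed

    prefetch(0x100); prefetch(0x200)
    demand_load(0x100)                                  # only this prefetch paid off
    useful = sum(1 for e in directory.values() if e["by_prefetch"] and e["used"])
    issued = sum(1 for e in directory.values() if e["by_prefetch"])
    print(f"{useful}/{issued} prefetches were effective")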

Publication date: 25-01-2018

INSTRUCTION TO QUERY CACHE RESIDENCY

Number: US20180024933A1
Assignee:

A query is performed to obtain cache residency and/or other information regarding selected data. The data to be queried is data of a cache line, prefetched or otherwise. The capability includes a Query Cache instruction that obtains cache residency information and/or other information and returns an indication of the requested information. 1. A computer program product for executing instructions in a computing environment , said computer program product comprising: [ an address of data to be queried; and', 'a plurality of parameters to control searching for the address in one or more caches, the plurality of parameters including a cache level parameter and a control parameter, the cache level parameter providing a cache level indication specifying a particular cache level to commence searching for the address and specifying that one or more other cache levels are available to be searched, and the control parameter indicating up to a selected cache level to be searched for the address; and, 'obtaining, by a processor, an instruction for execution in the computing environment, the instruction configured to provide, obtaining the address;', 'searching one or more caches on one or more cache levels for the address, wherein the searching is controlled by the cache level parameter and the control parameter and is not to go beyond the selected cache level defined by the control parameter for searching the address; and', 'returning information based on the searching., 'executing the instruction, the executing including], 'a computer readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising2. The computer program product of claim 1 , wherein the instruction includes an address field and the address is provided by the address field of the instruction.3. The computer program product of claim 1 , wherein the instruction includes a cache level field and the cache level parameter is ...

Publication date: 25-01-2018

ADAPTIVE TABLEWALK TRANSLATION STORAGE BUFFER PREDICTOR

Number: US20180024941A1
Assignee:

A system for generating predictions for a hardware table walk to find a mapping of a given virtual address to a corresponding physical address is disclosed. The system includes a plurality of memories, each of which includes a respective plurality of entries; each entry includes a prediction of a particular one of a plurality of buffers that includes a portion of a virtual-to-physical address translation map. A first circuit may generate a plurality of hash values to retrieve a plurality of predictions from the plurality of memories, where each hash value depends on a respective address and information associated with a respective thread. A second circuit may select a particular prediction of the retrieved predictions to use based on a history of previous predictions.

1. An apparatus, comprising: a plurality of memories, wherein each memory includes a plurality of entries, wherein each entry of the plurality of entries includes a respective prediction of a plurality of predictions, wherein each prediction of the plurality of predictions includes information identifying a given one of a plurality of buffers; a first circuit configured to: receive a plurality of addresses; generate a plurality of hash values, wherein each hash value is dependent upon a respective address of the plurality of addresses, and identification information associated with a respective process of a plurality of processes; and retrieve a respective prediction of a plurality of retrieved predictions from each one of the plurality of memories dependent upon a respective one of the plurality of hash values; and a second circuit configured to select a given prediction of the plurality of retrieved predictions dependent upon a history of previous predictions.

2. The apparatus of claim 1, wherein to generate the plurality of hash values, the first circuit is further configured to generate a second hash value of the plurality of hash values dependent upon a first hash value of the plurality of ...
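
A minimal sketch of the predictor scheme described above: several predictor tables are indexed by hashes of (virtual address, thread id), and a history-based selector picks which table's prediction to trust. The table sizes, hash function, and selector policy are illustrative assumptions.

    def make_hash(table_id, size):
        def h(address, thread_id):
            return ((address >> 12) ^ (thread_id * 0x9E3779B1) ^ table_id) % size
        return h

    TABLE_SIZE = 256
    tables = [dict() for _ in range(2)]             # prediction tables
    hashes = [make_hash(i, TABLE_SIZE) for i in range(2)]
    selector_score = [0, 0]                         # history of correct tables

    def predict(address, thread_id):
        candidates = [t.get(h(address, thread_id)) for t, h in zip(tables, hashes)]
        best = max(range(2), key=lambda i: selector_score[i])
        return candidates[best]

    def update(address, thread_id, actual_buffer):
        for i, (t, h) in enumerate(zip(tables, hashes)):
            idx = h(address, thread_id)
            if t.get(idx) == actual_buffer:
                selector_score[i] += 1              # this table was right
            t[idx] = actual_buffer                  # train toward the outcome

    update(0x7F0000, thread_id=3, actual_buffer="TSB0")
    print(predict(0x7F0000, thread_id=3))           # -> TSB0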

Publication date: 10-02-2022

Techniques For Metadata Processing

Number: US20220043654A1
Author: DeHon Andre, Boling Eli
Assignee:

Techniques are described for metadata processing that can be used to encode an arbitrary number of security policies for code running on a processor. Metadata may be added to every word in the system and a metadata processing unit may be used that works in parallel with data flow to enforce an arbitrary set of policies. In one aspect, the metadata may be characterized as unbounded and software programmable to be applicable to a wide range of metadata processing policies. Techniques and policies have a wide range of uses including, for example, safety, security, and synchronization. Additionally, described are aspects and techniques in connection with metadata processing in an embodiment based on the RISC-V architecture. 1. A method of processing instructions comprising:establishing a metadata processing domain that is separated and isolated from an associated instruction processing domain;establishing at least one control/status register (CSR) configured to facilitate an exchange of information between the metadata processing domain and the instruction processing domain;receiving from the instruction processing domain, for metadata processing, a current instruction with an associated metadata tag, the metadata processing being performed in the metadata processing domain;determining, in the metadata processing domain and in accordance with the current instruction and metadata tags associated with the current instruction, whether a rule exists in a rule cache for the current instruction, the rule cache including rules on metadata used by said metadata processing to define allowed operations; andresponsive to determining that no rule exists in the rule cache for the current instruction, performing rule cache miss processing in the metadata processing domain, wherein the rule cache miss processing includes performing first rule cache miss processing for a first set of one or more rules using a rule cache miss handler, the rule cache miss handler generating at least one ...
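
A sketch of the metadata-rule-cache flow described above: each instruction arrives with metadata tags, the rule cache is consulted, and a miss handler (running in the isolated metadata domain) computes and installs a new rule. The policy shown (taint propagation) is an illustrative assumption, not the patent's policy language.

    rule_cache = {}

    def taint_policy(opcode, tag_a, tag_b):
        # Example policy: allow everything; the result tag is the union of
        # the input taints.
        return tag_a | tag_b

    def metadata_check(opcode, tag_a, tag_b):
        key = (opcode, tag_a, tag_b)
        if key in rule_cache:
            return rule_cache[key]            # fast path: rule cache hit
        result_tag = taint_policy(*key)       # miss handler computes the rule
        rule_cache[key] = result_tag          # install for future instructions
        return result_tag

    print(metadata_check("add", frozenset({"secret"}), frozenset()))
    # -> frozenset({'secret'}); the second identical check hits the cache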

Publication date: 24-01-2019

DUAL DATA STREAMS SHARING DUAL LEVEL TWO CACHE ACCESS PORTS TO MAXIMIZE BANDWIDTH UTILIZATION

Number: US20190026111A1
Assignee:

A streaming engine employed in a digital data processor specifies fixed first and second read-only data streams. Corresponding stream address generators produce addresses of data elements of the two streams. Corresponding stream head registers store data elements next to be supplied to functional units for use as operands. The two streams share two memory ports. A toggling preference of stream to port ensures fair allocation. The arbiters permit one stream to borrow the other's interface when the other interface is idle. Thus one stream may issue two memory requests, one from each memory port, if the other stream is idle. This spreads the bandwidth demand for each stream across both interfaces, ensuring neither interface becomes a bottleneck.

1. A processing device comprising: a memory; first and second command queues each being configured to supply memory addresses; an arbiter coupled to the first command queue and to the second command queue to receive the memory addresses supplied by the first and second command queues, the arbiter being configured to select a memory address supplied by either the first command queue or the second command queue as a selected memory address based at least partially on a selected preference setting applied to the arbiter; and an interface coupled to the arbiter and to the memory, the interface being configured to submit a memory access request corresponding to the selected memory address to the memory.

2. The processing device of claim 1, further comprising an arbitration controller coupled to the arbiter, the arbitration controller being configured to apply to the arbiter a selected preference, wherein the selected preference setting of the arbiter is selected as one of a first preference setting in which the first command queue is a preferred command queue and the second command queue is a non-preferred command queue and a second preference setting in which the second command queue is a preferred command queue and the ...
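
A toy model of the two-stream, two-port sharing described above: each stream uses its own port by default, and a stream with multiple pending requests may borrow the other port when that port's stream is idle. The request model is an illustrative assumption, and the fairness toggle for contended tie cases is omitted for brevity.

    def arbitrate(req0, req1):
        """req0/req1: number of pending requests for stream 0/1.
        Returns a mapping port -> stream granted this cycle."""
        grants = {}
        if req0 and req1:
            grants[0], grants[1] = 0, 1      # each stream uses its own port
        elif req0:
            grants[0] = 0                    # stream 1 idle: stream 0 may
            if req0 > 1:
                grants[1] = 0                # borrow port 1 this cycle
        elif req1:
            grants[1] = 1
            if req1 > 1:
                grants[0] = 1
        return grants

    print(arbitrate(req0=2, req1=0))  # -> {0: 0, 1: 0}: stream 0 uses both ports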

Publication date: 24-01-2019

LOW GRANULARITY COARSE DEPTH TEST EFFICIENCY ENHANCEMENT

Number: US20190026855A1
Assignee: Intel Corporation

Briefly, in accordance with one or more embodiments, an apparatus comprises a processor to compute depth values for one or more 4×4 blocks of pixels using 16 source interpolators and 8 destination interpolators on an incoming fragment of pixel data if the destination is in min/max format, and a memory to store a depth test result performed on the one or more 4×4 blocks of pixels. Otherwise the processor is to compute depth values for one or more 8×4 blocks of pixels using 16 source interpolators and 16 destination interpolators if the destination is in plane format.

1. (canceled)

2. An apparatus comprising: a processor to compute depth values for one or more blocks of pixels using a first number of source interpolators and a second number of destination interpolators in response to a destination of the one or more blocks of pixels having a min/max format; and memory to store depth test results for the one or more blocks of pixels, wherein the processor is to read min/max values for the one or more blocks of pixels from the memory and to return a destination value for each of the one or more blocks of pixels in response to the destination having the min/max format.

3. The apparatus of claim 2, wherein the one or more blocks of pixels comprise 4×4 blocks of pixels.

4. The apparatus of claim 2, wherein the first number of source interpolators is 16 and the second number of destination interpolators is 8.

5. The apparatus of claim 2, wherein the first number of source interpolators is larger than the second number of destination interpolators.

6. The apparatus of claim 2, wherein the first number of source interpolators is double the second number of destination interpolators.

7. The apparatus of claim 2, wherein the memory comprises a coarse depth buffer.

8. The apparatus of claim 2, wherein the memory comprises a bank-annotation buffer.

9. The apparatus of claim 2, wherein the processor is to summarize an intermediate depth test result for the one or more blocks of pixels ...

Publication date: 23-01-2020

HIERARCHICAL GENERAL REGISTER FILE (GRF) FOR EXECUTION BLOCK

Number: US20200026514A1
Assignee: Intel Corporation

In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively coupled to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.

1. (canceled)

2. A general purpose graphics processor, comprising: a general-purpose graphics processing compute block comprising a plurality of processing resources to execute compute instructions, wherein each of the plurality of processing resources comprises a local general register file (GRF) which operates at a first speed; a shared general register file (GRF) communicatively coupled to the plurality of processing resources, wherein the shared GRF operates at a second speed, slower than the first speed.

3. The general purpose graphics processor of claim 2, wherein each of the plurality of processing resources is to retrieve data from the shared GRF when the data is unavailable in the local GRF.

4. The general purpose graphics processor of claim 2, wherein: the local GRF operates at a first power level; and the shared GRF operates at a second power level, lower than the first power level.

5. The general purpose graphics processor of claim 2, wherein: the local GRF and the shared GRF are separate memory structures.

6. The general purpose graphics processor of claim 2, wherein: the local GRF and the shared GRF are embodied in a single memory structure.

7. The general purpose graphics processor of claim 2, wherein: the shared GRF comprises a virtualized address space.

8. The general purpose graphics processor of claim 2, wherein: each of the plurality of processing resources is to detect a data context switch between two or more of the plurality of processing resources, and in response to the data context switch, to redirect a data inquiry from the local GRF to a remote memory device.

9. The general purpose graphics processor of claim 2, further comprising: a shared cache memory ...
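
A toy model of the register-file hierarchy described above: each processing resource reads its fast local GRF first and falls back to the slower shared GRF on a local miss (as in claim 3). The latencies and names are illustrative assumptions, not actual design parameters.

    class ProcessingResource:
        LOCAL_LATENCY, SHARED_LATENCY = 1, 4   # assumed cycle counts

        def __init__(self, shared_grf):
            self.local_grf = {}
            self.shared_grf = shared_grf

        def read(self, reg):
            # Fast local GRF first; slower shared GRF on a miss.
            if reg in self.local_grf:
                return self.local_grf[reg], self.LOCAL_LATENCY
            return self.shared_grf[reg], self.SHARED_LATENCY

    shared = {"r8": 42}
    pr = ProcessingResource(shared)
    pr.local_grf["r0"] = 7
    print(pr.read("r0"))  # -> (7, 1): local hit
    print(pr.read("r8"))  # -> (42, 4): falls back to the shared GRF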

Publication date: 23-01-2020

REGULATING HARDWARE SPECULATIVE PROCESSING AROUND A TRANSACTION

Number: US20200026558A1
Assignee:

A transaction is detected. The transaction has a begin-transaction indication and an end-transaction indication. If it is determined that the begin-transaction indication is not a no-speculation indication, then the transaction is processed.

1. A method comprising: determining, by one or more computer processors, that instructions preceding a transaction have not completed; prohibiting, by one or more computer processors, the transaction from being processed until a determination is made that indicates that all pending outside instructions are not, or are no longer, being processed in a speculative manner; and determining, by one or more computer processors, that an end-transaction indication associated with the transaction indicates an end to a period of no-speculation transaction processing.

2. The method of claim 1, wherein the transaction comprises two or more instructions to be processed atomically on a data structure in a memory.

3. The method of claim 1, the method comprising: determining, by one or more computer processors, that a begin-transaction indication associated with a transaction is a no-speculation indication, wherein the begin-transaction indication is selected from the group consisting of: a new instruction, a new prefix instruction, or a variant of an instruction in a current instruction set architecture.

4. The method of claim 1, the method comprising: determining, by one or more computer processors, that the instructions preceding the transaction have completed; and responsive to determining that the instructions preceding the transaction have completed, processing, by one or more computer processors, the transaction.

5. The method of claim 4, the method comprising: responsive to processing the transaction, determining, by one or more computer processors, whether an end-transaction indication associated with the transaction is a no-speculation indication; and responsive to determining that the end-transaction indication is the no-speculation ...

Publication date: 23-01-2020

EFFICIENT SILENT DATA TRANSMISSION BETWEEN COMPUTER SERVERS

Number: US20200026656A1
Author: Liao Mengze, Liu Yang, Yu Jiang
Assignee:

Aspects of the invention include receiving a request to transfer data from a first storage device, coupled to a sending server, to a second storage device, coupled to a receiving server. The data is transferred from the first storage device to the second storage device in response to the request. The transferring includes allocating a first temporary memory on the sending server and moving the data from the first storage device to the first temporary memory. The transferring also includes initiating a remote direct memory access (RDMA) between the first temporary memory and a second temporary memory on the second server. The RDMA causes the data to be transferred from the first temporary memory to the second temporary memory independently of an operating system executing on a processor of the sending server or the receiving server. The transferring further includes receiving a notification that the transfer completed.

1. A method comprising: receiving a request to transfer data from a first storage device to a second storage device, the first storage device coupled to a sending server and the second storage device coupled to a receiving server; and transferring the data from the first storage device to the second storage device in response to the request, the transferring comprising: allocating a first temporary memory on the sending server; moving the data from the first storage device to the first temporary memory; initiating a transfer of the data to the second storage device via a remote direct memory access (RDMA) between the first temporary memory and a second temporary memory on the second server, the RDMA causing the data to be transferred from the first temporary memory to the second temporary memory independently of an operating system executing on a processor of the sending server and independently of an operating system executing on a processor of the receiving server; and receiving a notification that the transfer to the second storage device was completed. ...

Publication date: 28-01-2021

VIRTUAL NETWORK PRE-ARBITRATION FOR DEADLOCK AVOIDANCE AND ENHANCED PERFORMANCE

Number: US20210026768A1
Assignee:

A device includes a data path, a first interface configured to receive a first memory access request from a first peripheral device, and a second interface configured to receive a second memory access request from a second peripheral device. The device further includes an arbiter circuit configured to, in a first clock cycle, select a pre-arbitration winner between a first memory access request and a second memory access request based on a first number of credits allocated to a first destination device and a second number of credits allocated to a second destination device. The arbiter circuit is further configured to, in a second clock cycle, select a final arbitration winner from among the pre-arbitration winner and a subsequent memory access request based on a comparison of a priority of the pre-arbitration winner and a priority of the subsequent memory access request.

1. An integrated circuit device comprising: a set of processor interfaces; a data path configured to couple the set of processor interfaces to a shared resource; and an arbiter circuit coupled to the set of processor interfaces and the data path and configured to: receive a set of requests via the set of processor interfaces; select a first request from among the set of requests for service over the data path; receive a subsequent request after the set of requests is received; select a second request from among the first request and the subsequent request for service over the data path; and cause the data path to service the second request.

2. The integrated circuit device of claim 1, wherein the arbiter circuit is configured to select the first request based on a first set of criteria and select the second request based on a second set of criteria that is different from the first set of criteria.

3. The integrated circuit device of claim 2, wherein the arbiter circuit is configured to select the first request based on the first set of criteria by: determining a respective credit cost for each ...
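
A sketch of the two-phase arbitration described above: the first cycle picks a pre-arbitration winner by destination credits, and the second cycle compares that winner's priority against a late-arriving request. The request fields and credit policy are illustrative assumptions.

    def pre_arbitrate(req_a, req_b, credits):
        # Prefer the request whose destination still has more credits.
        if credits[req_a["dest"]] >= credits[req_b["dest"]]:
            return req_a
        return req_b

    def final_arbitrate(pre_winner, subsequent):
        # Higher priority wins; ties go to the earlier, pre-arbitrated request.
        if subsequent is not None and subsequent["prio"] > pre_winner["prio"]:
            return subsequent
        return pre_winner

    credits = {"mem0": 3, "mem1": 1}
    a = {"dest": "mem0", "prio": 1}
    b = {"dest": "mem1", "prio": 1}
    late = {"dest": "mem0", "prio": 5}

    winner = final_arbitrate(pre_arbitrate(a, b, credits), late)
    print(winner)  # -> the high-priority subsequent request wins final arbitration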

Publication date: 28-01-2021

INSTRUCTION CACHE COHERENCE

Number: US20210026770A1
Assignee:

A data processing apparatus is provided, which includes a cache to store operations produced by decoding instructions fetched from memory. The cache is indexed by virtual addresses of the instructions in the memory. Receiving circuitry receives an incoming invalidation request that references a physical address in the memory. Invalidation circuitry invalidates entries in the cache where the virtual address corresponds with the physical address. Coherency is thereby achieved when using a cache that is indexed using virtual addresses.

1. A data processing apparatus comprising: a cache to store operations produced by decoding instructions fetched from memory, wherein the cache is indexed by virtual addresses of the instructions in the memory; receiving circuitry to receive an incoming invalidation request, wherein the incoming invalidation request references a physical address in the memory; and invalidation circuitry to invalidate entries in the cache where the virtual address corresponds with the physical address.

2. The data processing apparatus according to claim 1, comprising correspondence table storage circuitry to store indications of physical addresses of the instructions fetched from the memory.

3. The data processing apparatus according to claim 2, wherein in response to the incoming invalidation request, the invalidation circuitry is adapted to determine a correspondence table index of the correspondence table storage circuitry that corresponds with the physical address referenced in the incoming invalidation request.

4. The data processing apparatus according to claim 3, wherein the cache is adapted to store one of the operations in association with one of the correspondence table indexes of the correspondence table storage circuitry containing one of the indications of the physical addresses of one of the instructions that decodes to the one of the operations.

5. The data processing apparatus according to claim 2, wherein the indications of the physical ...
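
A minimal model of the scheme described above: a decoded-op cache indexed by virtual address, plus a side table recording each entry's physical address so that a physically addressed invalidation can find and invalidate the matching virtually indexed entries, aliases included. The class and field names are illustrative.

    class VivtOpCache:
        def __init__(self):
            self.lines = {}        # virtual address -> decoded operations
            self.pa_table = {}     # virtual address -> physical address

        def fill(self, va, pa, ops):
            self.lines[va] = ops
            self.pa_table[va] = pa

        def invalidate_by_pa(self, pa):
            victims = [va for va, p in self.pa_table.items() if p == pa]
            for va in victims:     # invalidate every alias of the physical line
                del self.lines[va]
                del self.pa_table[va]
            return victims

    cache = VivtOpCache()
    cache.fill(va=0x1000, pa=0x9000, ops=["uop_a"])
    cache.fill(va=0x2000, pa=0x9000, ops=["uop_a"])  # virtual alias, same PA
    print(cache.invalidate_by_pa(0x9000))  # -> both aliases invalidated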

Publication date: 28-01-2021

AN APPARATUS AND METHOD FOR STORING BOUNDED POINTERS

Number: US20210026773A1
Author: SMITH Lee Douglas
Assignee:

An apparatus and method are provided for storing bounded pointers. One example apparatus comprises a storage comprising storage elements to store bounded pointers, each bounded pointer comprising a pointer value and associated attributes including at least range information, and processing circuitry to store a bounded pointer in a chosen storage element. The storing process comprises storing in the chosen storage element a pointer value of the bounded pointer, and storing in the storage element the range information of the bounded pointer, such that the range information indicates both a read range of the bounded pointer and a write range of the bounded pointer that differs to the read range. The read range comprises at least one memory address for which reading is allowed when using the bounded pointer, and the write range comprises at least one memory address to which writing is allowed when using the bounded pointer.

1. An apparatus comprising: a storage comprising storage elements to store bounded pointers, each bounded pointer comprising a pointer value and associated attributes including at least range information; and processing circuitry to store a bounded pointer in a chosen storage element, said storing comprising: storing in the chosen storage element a pointer value of the bounded pointer; and storing in the chosen storage element the range information of the bounded pointer, such that the range information indicates both a read range of the bounded pointer and a write range of the bounded pointer that differs to the read range, wherein: the read range comprises at least one memory address for which reading is allowed when using the bounded pointer; and the write range comprises at least one memory address to which writing is allowed when using the bounded pointer.

2. An apparatus according to claim 1, wherein either: the write range is a proper subset of the read range; or the read range is a proper subset of the write range.

3. An apparatus according ...
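
A sketch of a bounded pointer carrying separate read and write ranges, as described above: loads are checked against the read range and stores against the write range. The range encoding and check API are illustrative assumptions, not the patent's capability format.

    class BoundedPointer:
        def __init__(self, value, read_range, write_range):
            self.value = value
            self.read_range = read_range     # (base, limit): base inclusive
            self.write_range = write_range   # limit exclusive

        def _check(self, addr, rng, op):
            base, limit = rng
            if not (base <= addr < limit):
                raise MemoryError(f"{op} of {addr:#x} outside {base:#x}-{limit:#x}")

        def load(self, addr):
            self._check(addr, self.read_range, "load")

        def store(self, addr):
            self._check(addr, self.write_range, "store")

    # Here the write range is a proper subset of the read range (claim 2).
    p = BoundedPointer(0x1000, read_range=(0x1000, 0x2000),
                       write_range=(0x1800, 0x1900))
    p.load(0x1004)        # allowed: inside the read range
    p.store(0x1880)       # allowed: inside the write range
    try:
        p.store(0x1004)   # readable but not writable
    except MemoryError as e:
        print("blocked:", e)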

Publication date: 28-01-2021

STREAMING ENGINE WITH MULTI DIMENSIONAL CIRCULAR ADDRESSING SELECTABLE AT EACH DIMENSION

Number: US20210026776A1
Author: Zbiciak Joseph
Assignee:

A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.

1. A device comprising: a stream template register configured to store a stream template that includes a set of addressing mode indicators that includes a respective addressing mode indicator for each loop of a set of nested loops; an address generator coupled to the stream template register to receive the set of addressing mode indicators, wherein the address generator includes: for each loop of the set of nested loops, a control word circuit configured to, in response to the respective addressing mode indicator of the loop, provide a control word that specifies a memory block size of the loop; and an adder circuit coupled to the control word circuit to receive the control word and configured to circularly traverse a memory region according to the control word such that the address generator provides a set of addresses for the set of nested loops; and a memory interface coupled to the address generator to receive the set of addresses and configured to retrieve a set of data associated with the set of addresses from a memory.

2. The device of claim 1, wherein: the stream template further includes a first block size and a second block size; and each of the set of addressing mode indicators selects between the first block size and a function of the first block size and the second block size.

3. The device of claim 2, wherein, for each loop of the set of nested loops, the respective control word circuit includes: an adder coupled to add the first block size and the second block size to provide a sum; a multiplexer coupled to ...
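
A sketch of nested-loop address generation with a per-dimension choice of linear or circular addressing, as described above. The loop counts, strides, and block-size wrap are illustrative assumptions.

    def generate_addresses(base, loops):
        """loops: innermost-first list of dicts with keys count, stride,
        mode ('linear' or 'circular'), and block (wrap size in bytes)."""
        def walk(level, addr):
            if level < 0:
                yield addr
                return
            loop = loops[level]
            for i in range(loop["count"]):
                offset = i * loop["stride"]
                if loop["mode"] == "circular":
                    offset %= loop["block"]   # wrap within the circular block
                yield from walk(level - 1, addr + offset)
        yield from walk(len(loops) - 1, base)

    loops = [
        {"count": 4, "stride": 8,  "mode": "circular", "block": 16},  # inner
        {"count": 2, "stride": 64, "mode": "linear",   "block": 0},   # outer
    ]
    print([hex(a) for a in generate_addresses(0x1000, loops)])
    # inner dimension wraps every 16 bytes; outer advances linearly by 64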

Publication date: 29-01-2015

METHOD FOR FLASH COMPRESSED INSTRUCTION CACHING FOR LIMITED RAM/FLASH DEVICE ARCHITECTURES

Number: US20150032945A1
Author: Marconcini Stefano
Assignee: THOMSON LICENSING

Compression and the caching of decompressed code in RAM is described by using an uncompressed paged instruction caching fault method to keep all of the code compressed in a FLASH memory. The method only decompresses and caches in DRAM memory the portion of code that is running at a certain instance in time (i.e., the DRAM window), which maintains a pre-fetched portion of code based on static windowing of FLASH.

1. A method for memory management in a device, the method comprising the steps of: caching uncompressed code from a FLASH memory in the device to a Dynamic Random Access Memory (DRAM) in the device (12); maintaining compressed code in said FLASH (14); and caching said uncompressed code in DRAM during a period of time while starting up said device (14).

2. The method of claim 1, wherein said caching uncompressed code (12) comprises: dimensioning of the DRAM memory area for the uncompressed code (20); and applying a pass operation at a compilation time to generate executable code from the DRAM cache.

3. The method of claim 2, wherein the applying a pass operation (20) restructures the executable code, said pass operation further comprising: embedding one or more jumps to run-time support (32); assimilating pages of code resident in certain areas of FLASH to FLASH blocks of the FLASH component (33); building runtime support tables (34); and building compressed code and prefetchable pages (36).

4. The method of claim 3, wherein said maintaining code compressed in FLASH (14) step further comprises: loading code residing in the assimilated pages based on a predefined fixed number of prefetched pages from said FLASH to a predefined caching area in said DRAM (40).

5. The method of claim 4, wherein said loading (40) further comprises: decompressing pages from a compressed cache buffer to a decompressed cache buffer of allocated DRAM for code execution ...

Publication date: 02-02-2017

SPECULATIVE CACHE MODIFICATION

Number: US20170031827A1
Author: James E. McCormick, Jr.
Assignee:

In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing a speculative cache modification design. For example, in one embodiment, such means may include an integrated circuit having a data bus; a cache communicably interfaced with the data bus; a pipeline communicably interfaced with the data bus, in which the pipeline is to receive a store instruction corresponding to a cache line to be written to cache; caching logic to perform a speculative cache write of the cache line into the cache before the store instruction retires from the pipeline; and cache line validation logic to determine if the cache line written into the cache is valid or invalid, in which the cache line validation logic is to invalidate the cache line speculatively written into the cache when determined invalid and further in which the store instruction is allowed to retire from the pipeline when the cache line is determined to be valid.

1. An integrated circuit comprising: a data bus; a cache communicably interfaced with the data bus; a pipeline communicably interfaced with the data bus, the pipeline to receive a store instruction corresponding to a cache line to be written to cache; caching logic to perform a speculative cache write of the cache line into the cache before the store instruction retires from the pipeline; and cache line validation logic to determine if the cache line written into the cache is valid or invalid, wherein the cache line validation logic is to invalidate the cache line speculatively written into the cache when determined invalid and wherein the store instruction is allowed to retire from the pipeline when the cache line is determined valid.

2. The integrated circuit of claim 1, wherein the integrated circuit comprises a central processing unit for one of a tablet computing device or a smartphone.

3. The integrated circuit of : wherein the store instruction corresponding to the cache line to be ...
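
A toy model of the speculative cache write described above: the line is written into the cache before the store retires, and a later validation step either lets the store retire or invalidates the speculatively written line. The structures and flow are illustrative assumptions.

    class SpeculativeCache:
        def __init__(self):
            self.lines = {}          # address -> data
            self.speculative = set()

        def speculative_write(self, addr, data):
            self.lines[addr] = data
            self.speculative.add(addr)   # written before the store retires

        def resolve(self, addr, valid):
            self.speculative.discard(addr)
            if not valid:
                del self.lines[addr]     # squash the invalid speculative write
                return "invalidated"
            return "retired"             # the store may now retire

    cache = SpeculativeCache()
    cache.speculative_write(0x40, b"new")
    print(cache.resolve(0x40, valid=True))   # -> retired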

Publication date: 02-02-2017

HYBRID COMPUTING MODULE

Number: US20170031843A1
Assignee:

A hybrid system-on-chip provides a plurality of memory and processor die mounted on a semiconductor carrier chip that contains a fully integrated power management system that switches DC power at speeds that match or approach processor core clock speeds, thereby allowing the efficient transfer of data between off-chip physical memory and processor die.

1. A general purpose computational operating system that comprises a hybrid computer module, which further includes: a semiconductor chip carrier having electrical traces and passive component networks monolithically formed on the surface of the carrier substrate to maintain and manage electrical signal communications between: a microprocessor die mounted on the chip carrier; a memory bank consisting of at least one discrete memory die mounted on the semiconductor chip carrier adjacent to the microprocessor die; a fully integrated power management module having a resonant gate transistor embedded within it that synchronously transfers data from main memory to the microprocessor at the processor clock speed; a memory management architecture and operating system that compiles program stacks as a collection of pointers to the addresses where elemental code blocks are stored in main memory; a memory controller that sequentially references the pointers stored within the program stacks and fetches a copy of the program stack item referenced by the pointer from main memory and loads the copy into a microprocessor die; an interrupt bus that halts the loading process when an alert to a program jump or change to a global variable is registered and sends a memory management variable to a look-up table; a look-up table that redirects the controller to a new program stack following a program jump before it reinitiates the loading process; a look-up table that fetches and stores the change to a global variable at its primary location in main memory before it reinitiates the loading process; wherein program ...

Publication date: 02-02-2017

HYBRID COMPUTING MODULE

Number: US20170031844A1
Assignee:

A hybrid system-on-chip provides a plurality of memory and processor die mounted on a semiconductor carrier chip that contains a fully integrated power management system that switches DC power at speeds that match or approach processor core clock speeds, thereby allowing the efficient transfer of data between off-chip physical memory and processor die.

1. A general purpose stack machine computing module having an operating system that comprises: a hybrid computer module, which includes: an application-specific integrated circuit (ASIC) processor die mounted on the chip carrier that is designed with machine code that matches and supports a structured programming language so it functions as the general purpose stack machine processor; a main memory bank consisting of at least one discrete memory die mounted on the semiconductor chip carrier adjacent to the ASIC processor die; a fully integrated power management module having a resonant gate transistor embedded within it that synchronously transfers data from main memory to the ASIC processor die at the processor clock speed; a memory management architecture and operating system that compiles program stacks as a collection of pointers to the addresses where elemental code blocks are stored at a primary location in main memory; a memory controller that sequentially references the pointers stored within the program stacks and fetches a copy of the item referenced by the pointer in the program stack from main memory and loads the copy into a microprocessor die; an interrupt bus that halts the loading process when an alert to a program jump or change to a global variable is registered and sends a memory management variable to a look-up table; a look-up table that redirects the controller to a new program stack following a program jump before it reinitiates the loading process; a look-up table that fetches and stores the change to a global variable at its primary location in main memory before it ...

Publication date: 04-02-2016

Method for scheduling operation of a solid state disk

Number: US20160034190A1
Assignee:

A method for scheduling operations of a solid state disk includes receiving accessing operations from a host, temporarily storing the accessing operations, setting a higher priority to the accessing operations having a shorter operation time, rearranging the sequence of the accessing operations according to the set priorities, distributing the accessing operations to corresponding flash memories to process data according to the accessing operations, and transmitting processed data to the host to increase efficiency of the accessing operations.

1. A method for scheduling operations of a solid state disk, comprising: receiving accessing operations from a host; temporarily storing the accessing operations; setting a higher priority to the accessing operations having a shorter operation time, and rearranging sequence of the accessing operations; distributing the accessing operations to corresponding flash memories to process data according to the accessing operations; and transmitting processed data to the host.

2. The method of claim 1, wherein temporarily storing the accessing operation is temporarily storing the accessing operation in a cache memory of the solid state disk.

3. The method of claim 1, wherein the sequence of the accessing operations from an accessing operation having shortest operation time to an accessing operation having longest operation time is respectively a read operation, a modify operation, a write operation, and an erase operation.

4. The method of claim 1, wherein the accessing operations are distributed to corresponding flash memories using a plurality of first in first out pipelines.

5. The method of claim 1, wherein each of the flash memories concurrently perform similar accessing operations.

6. The method of claim 1, wherein the processed data are transmitted to the host using a plurality of first in first out pipelines.

1. Field of the Invention

The present invention presents a method for scheduling operations of a ...
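
A sketch of the shortest-operation-first scheduling described above: queued SSD operations are reordered so faster operations run first (read before modify before write before erase, per claim 3), then dispatched to their target flash die. The queue structures are illustrative assumptions.

    OP_RANK = {"read": 0, "modify": 1, "write": 2, "erase": 3}

    def schedule(ops):
        """ops: list of (operation, flash_die) tuples received from the host."""
        ordered = sorted(ops, key=lambda op: OP_RANK[op[0]])
        per_die = {}
        for op, die in ordered:               # distribute to per-die FIFOs
            per_die.setdefault(die, []).append(op)
        return per_die

    pending = [("erase", 0), ("read", 1), ("write", 0), ("read", 0)]
    print(schedule(pending))
    # -> {1: ['read'], 0: ['read', 'write', 'erase']}: short ops queued first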

Publication date: 04-02-2016

Method and Apparatus for Ensuring Data Cache Coherency

Number: US20160034395A1
Assignee: Imagination Technologies Ltd

A multithreaded processor can concurrently execute a plurality of threads in a processor core. The threads can access a shared main memory through a memory interface; the threads can generate read and write transactions that cause shared main memory access. An incoherency detection module prevents incoherency by maintaining a record of outstanding global writes, and detecting a conflicting global read. A barrier is sequenced with the conflicting global write. The conflicting global read is allowed to proceed after the sequence of the conflicting global write and the barrier are cleared. The sequence can be maintained by a separate queue for each thread of the plurality.

Publication date: 04-02-2016

BUS-BASED CACHE ARCHITECTURE

Number: US20160034399A1
Assignee: ANALOG DEVICES TECHNOLOGY

Digital signal processors often operate on two operands per instruction, and it is desirable to retrieve both operands in one cycle. Some data caches connect to the processor over two busses and internally use two or more memory banks to store cache lines. The allocation of cache lines to specific banks is based on the address with which the cache line is associated. When two memory accesses map to the same memory bank, fetching the operands incurs extra latency because the accesses are serialized. An improved bank organization for providing conflict-free dual-data cache access is disclosed: a bus-based data cache system having two data buses and two memory banks. Each memory bank works as a default memory bank for the corresponding data bus. As long as the two values of data being accessed belong to two separate data sets assigned to the two respective data buses, memory bank conflicts are avoided.

1. A bus-based cache memory system for dual-data cache access, the bus-based data cache system comprising: a first data bus for receiving data memory accesses from a processor; a first memory bank for caching data accessed by the processor; a second data bus for receiving data memory accesses from the processor; and a second memory bank for caching data accessed by the processor; wherein the first memory bank is a first default memory bank for data memory accesses received over the first data bus and the second memory bank is a second default memory bank for data memory accesses received over the second data bus.

2. The bus-based data cache system of claim 1, further comprising: an instruction bus for receiving instruction memory accesses from the processor; and an instruction memory bank for caching instructions.

3. The bus-based data cache system of claim 1, further comprising: receive a first data memory access over the first data bus; check whether the first memory bank has data corresponding to the first data memory access; and if checking whether the first memory bank ...
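
A toy model of the default-bank organization described above: each data bus has its own bank, and a conflict only arises when two simultaneous accesses need the same bank. The address-to-bank mapping here is an arbitrary stand-in; the patent assigns banks per data set and bus.

    def cycle(access_bus0, access_bus1, bank_of):
        """bank_of: function mapping an address to bank 0 or 1."""
        b0, b1 = bank_of(access_bus0), bank_of(access_bus1)
        if b0 != b1:
            return "both served in one cycle"   # conflict-free dual access
        return "bank conflict: accesses serialized"

    def bank_of(addr):
        return (addr >> 5) & 1                  # assumed line-interleaved mapping

    print(cycle(0x000, 0x020, bank_of))  # different banks -> one cycle
    print(cycle(0x000, 0x040, bank_of))  # same bank -> serialized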

Publication date: 04-02-2016

Storage medium and information processing device for exchanging data with secondary storage section through cache

Number: US20160034402A1
Author: Satoshi Goshima
Assignee: Kyocera Document Solutions Inc

An information processing device includes a main control circuit including a central arithmetic processor that executes first processing through a first program, a sub-control circuit that executes second processing independently of the first processing, a primary storage circuit, and a secondary storage circuit. The secondary storage circuit has a slower access speed than the primary storage circuit. The secondary storage circuit stores a second program used for third processing executed once the first processing and the second processing are both complete. The main control circuit further includes a cache memory having a faster access speed than the secondary storage circuit and a cache controller. In a situation in which the second processing is not yet complete at a completion time of the first processing, the cache controller executes pre-reading of the second program from the secondary storage circuit and stores the second program into the cache memory.

Publication date: 01-02-2018

Processor and method for executing instructions on processor

Number: US20180032336A1
Author: Jian OUYANG, WEI Qi, Yong Wang

The present application discloses a processor and a method for executing an instruction on a processor. A specific implementation of the processor includes: a host interaction device, an instruction control device, an off-chip memory, an on-chip cache and an array processing device, wherein the host interaction device is configured to exchange data and instructions with a host connected with the processor, wherein the exchanged data has a granularity of a matrix; the off-chip memory is configured to store a matrix received from the host, on which a matrix operation is to be performed; and the instruction control device is configured to convert an external instruction received from the host to a series of memory access instructions and a series of computing instructions and execute the converted instructions. The implementation can improve the execution efficiency of a deep learning algorithm.

Publication date: 17-02-2022

Pointer dereferencing within memory sub-system

Number: US20220050637A1
Author: Dhawal Bavishi
Assignee: Micron Technology Inc

Various embodiments described herein provide for a memory sub-system read operation or a memory sub-system write operation that can be requested by a host system and involves performing a multi-level (e.g., two-level) pointer dereferencing internally within the memory sub-system. Such embodiments can at least reduce the number of read operations that a host system sends to a memory sub-system to perform a multi-level dereferencing operation.
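
A sketch of multi-level pointer dereferencing resolved inside the memory sub-system, as described above: the host issues one request and the device internally chases the pointer chain, instead of the host issuing one read per level. The media model and command format are illustrative assumptions.

    class MemorySubsystem:
        def __init__(self, media):
            self.media = media                  # address -> value or pointer

        def dereference_read(self, addr, levels):
            # Internally follow `levels` pointers, then return the final value.
            for _ in range(levels):
                addr = self.media[addr]         # internal read, no host round trip
            return self.media[addr]

    media = {0x10: 0x20, 0x20: 0x30, 0x30: "payload"}
    dev = MemorySubsystem(media)
    # One host request replaces three host-issued reads (two-level dereference).
    print(dev.dereference_read(0x10, levels=2))  # -> payload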

Publication date: 31-01-2019

SEMICONDUCTOR MEMORY DEVICE INCLUDING A CONTROL CIRCUIT AND AT LEAST TWO MEMORY CELL ARRAYS

Number: US20190034081A1
Assignee:

A memory device includes a memory cell array including first and second planes, and first and second caches. A controller is configured to output status information in response to a status read command, the status information indicating the states of the caches. The controller begins a first process in response to a command addressed to the first plane if the status information indicates the first and second caches are in the ready state, and begins a second process on the second plane according to a second command to the second plane if the status information indicates at least the second cache is in the ready state.

1. A semiconductor memory device, comprising: a memory cell array; a cache that holds data transferred from the memory cell array; and a control circuit configured to output first information indicating whether or not access to the cache from outside the device is available and second information indicating whether or not a reservation for access to the cache from outside the device is available.

2. The device according to claim 1, wherein the control circuit includes a first pad through which the first information is output and a second pad through which the second information is output.

3. The device according to claim 1, wherein a first command received from outside the device has the access to the cache and a second command received from outside the device subsequently to the first command has the reservation for the access.

4. The device according to claim 3, wherein the control circuit includes a first register for storing an address specified in the first command and a second register for storing an address specified in the second command.

5. The device according to claim 4, wherein the control circuit outputs information that the reservation for access to the cache from outside the device is available through the second information upon completion of the first command.

6. The device according to claim 5, wherein an address specified in a third ...

Publication date: 31-01-2019

INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD

Number: US20190034200A1
Assignee: FUJITSU LIMITED

An information processing device includes a first package including a first arithmetic circuit, first cache memory and a transmitting circuit, as well as a second package including a second arithmetic circuit, second cache memory and a receiving circuit. The first arithmetic circuit is configured to provide transfer data to the first cache memory that is destined for the second cache memory. The transmitting circuit is configured to transmit to the receiving circuit an indication of a data transfer of the transfer data and to restrict use of the first cache memory for data other than the transfer data during the data transfer. The receiving circuit is configured to receive the indication of the data transfer, to acquire the transfer data stored in the first cache memory and to store the acquired transfer data in the second cache memory.

1. An information processing device comprising: a first arithmetic package including a first arithmetic circuit and a second arithmetic circuit; and a second arithmetic package coupled to the first arithmetic unit and including a third arithmetic circuit and a fourth arithmetic circuit, wherein the first arithmetic package further includes: a first cache memory configured to hold data input to and output from the second arithmetic circuit in accordance with a procedure of maintaining consistency between the data input to and output from the second arithmetic circuit and data stored in a circuit other than the second arithmetic circuit; a transmitting circuit configured to transmit, to the second arithmetic package, information indicating start of transmission of transmission data from the second arithmetic circuit to the fourth arithmetic circuit; and a cache managing circuit configured to write the transmission data to the first cache memory and to restrict use of the first cache memory by data other than the transmission data, and wherein the second arithmetic package further includes a second cache memory configured to hold ...

Publication date: 31-01-2019

Precise invalidation of virtually tagged caches

Number: US20190034349A1
Assignee: Qualcomm Inc

A translation lookaside buffer (TLB) index valid bit is set in a first line of a virtually indexed, virtually tagged (VIVT) cache. The first line of the VIVT cache is associated with a first TLB entry which stores a virtual address to physical address translation for the first cache line. The TLB index valid bit of the first line is cleared upon determining that the translation is no longer stored in the first TLB entry. An indication of a received invalidation instruction is stored. When a context synchronization instruction is received, the first line of the VIVT cache is cleared based on the TLB index valid bit being cleared and the stored indication of the invalidate instruction.

Publication date: 30-01-2020

Preventing Information Leakage In Out-Of-Order Machines Due To Misspeculation

Number: US20200034152A1
Assignee:

Typical out-of-order machines can be exploited by security vulnerabilities, such as Meltdown and Spectre, that enable data leakage during misspeculation events. A method of preventing such information leakage includes storing information regarding multiple states of an out-of-order machine to a reorder buffer. This information includes the state of instructions, as well as an indication of data moved to a cache in the transition between states. In response to detecting a misspeculation event at a later state, access to at least a portion of the cache storing the data can be prevented.

1. A method, comprising: storing information regarding a first state of an out-of-order machine to a reorder buffer, the information indicating a state of at least one register; storing information regarding a second state of the out-of-order machine to the reorder buffer, the information indicating whether data is moved to a cache during transition from the first state to the second state; and in response to detecting a misspeculation event of the second state, preventing access to at least a portion of the cache storing the data.

2. The method of claim 1, wherein the preventing access includes invalidating the at least a portion of the cache storing the data.

3. The method of claim 1, wherein the preventing access includes invalidating an entirety of the cache.

4. The method of claim 1, wherein the cache includes a d-cache.

5. The method of claim 1, wherein the cache includes at least one of a branch target cache, a branch target buffer, a store-load dependence predictor, an instruction cache, a translation buffer, a second level cache, a last level cache, and a DRAM cache.

6. The method of claim 1, further comprising, in response to detecting the misspeculation event, invalidating at least one of a branch predictor, a branch target cache, a branch target buffer, a store-load ...
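
A toy model of the defense described above: the reorder buffer records which cache lines were filled by each in-flight region, so a misspeculation can invalidate exactly those lines before they leak through a timing channel. The structures and line granularity are illustrative assumptions.

    class ReorderBuffer:
        def __init__(self, cache):
            self.cache = cache                # modeled as a set of line addresses
            self.entries = []                 # (instruction, lines_filled)

        def issue(self, instr, lines_filled):
            self.cache.update(lines_filled)   # speculative fills land in the cache
            self.entries.append((instr, set(lines_filled)))

        def squash_from(self, index):
            # Misspeculation detected: invalidate lines filled by squashed work.
            for _, lines in self.entries[index:]:
                for line in lines:
                    self.cache.discard(line)
            del self.entries[index:]

    cache = {0x100}
    rob = ReorderBuffer(cache)
    rob.issue("load A", lines_filled=[0x200])   # speculative fill
    rob.squash_from(0)                          # branch resolved as mispredicted
    print(cache)  # -> {256}: only the pre-existing line (0x100) remains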

Publication date: 05-02-2015

Method of Adapting a Uniform Access Indexing Process to a Non-Uniform Access Memory, and Computer System

Number: US20150039907A1
Assignee:

Method and apparatus for constructing an index that scales to a large number of records and provides a high transaction rate. New data structures and methods are provided to ensure that an indexing algorithm performs in a way that is natural (efficient) to the algorithm, while a non-uniform access memory device sees IO (input/output) traffic that is efficient for the memory device. One data structure, a translation table, is created that maps logical buckets as viewed by the indexing algorithm to physical buckets on the memory device. This mapping is such that write performance to non-uniform access SSD and flash devices is enhanced. Another data structure, an associative cache, is used to collect buckets and write them out sequentially to the memory device as large sequential writes. Methods are used to populate the cache with buckets (of records) that are required by the indexing algorithm. Additional buckets may be read from the memory device to cache during a demand read, or by a scavenging process, to facilitate the generation of free erase blocks.

1.-36. (canceled)

37. A method of adapting a uniform access indexing process with a non-uniform access memory, the method comprising: a) storing a dictionary of index records in the non-uniform access memory, each index record comprising fields for an index key, a reference count and a physical block address, the index keys being uniformly distributed and unique; b) maintaining a bucket translation table for mapping logical bucket identifiers to physical bucket locations of the memory, including generating a logical bucket identifier by displacement hashing an index key, the table comprising a mapping of the logical bucket identifier to a physical bucket location of the memory where the associated index record is stored; c) collecting in cache a plurality of bucket entries, wherein each bucket entry comprises a set of index records having the same logical bucket identifier, prior to writing the collection of entries ...
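
A sketch of the two data structures described above: a bucket translation table maps logical buckets (as the indexing algorithm sees them) to physical buckets on the device, and an associative cache collects dirty buckets so they can be written out as one large sequential write. The hash function and write layout are illustrative assumptions.

    import hashlib

    NUM_LOGICAL_BUCKETS = 1024
    translation = {}                 # logical bucket -> physical bucket
    cache = {}                       # logical bucket -> list of index records
    next_physical = 0                # next free slot in the sequential write log

    def logical_bucket(key):
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % NUM_LOGICAL_BUCKETS

    def insert(key, record):
        cache.setdefault(logical_bucket(key), []).append(record)

    def flush():
        # Write cached buckets sequentially and remap them in one pass, so the
        # device sees one large sequential write instead of scattered updates.
        global next_physical
        for lb in sorted(cache):
            translation[lb] = next_physical
            next_physical += 1
        cache.clear()

    insert("key1", ("key1", 1, 0xAA))
    flush()
    print(translation)   # logical bucket of "key1" -> physical bucket 0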

Publication date: 04-02-2021

COMMAND RESULT CACHING FOR BUILDING APPLICATION CONTAINER IMAGES

Number: US20210034537A1
Author: Shuster Boaz
Assignee:

Implementations of the disclosure provide systems and methods for receiving, by a processing device, a request for an application image. A sequence of commands associated with the application image and a value of a parameter associated with the sequence of commands is received. Responsive to determining that the sequence of commands has been previously executed with the value of the parameter, the processing device retrieves, from a cache, a result of executing the sequence with the value of the parameter. The application image is built using the result of executing the sequence.

1. A method comprising: receiving, by a processing device, a request for an application image; receiving, by the processing device, a sequence of one or more commands associated with the application image and a value of a parameter associated with the sequence of commands; responsive to determining that the sequence has been previously executed with the value of the parameter, retrieving, from a cache, a result of executing the sequence with the value of the parameter; and building the application image using the result of executing the sequence.

2. The method of claim 1, further comprising: responsive to determining that the sequence has not been previously executed: determining that a result of executing the sequence is cacheable; generating the result of executing the sequence by executing the sequence; and storing the result of executing the sequence in the cache.

3. The method of claim 1, wherein determining that the sequence has been previously executed comprises: producing a first key value for the sequence by applying a hash function to at least one command of the sequence and the value of the parameter; comparing the first key value with each of a plurality of second key values, wherein each of the plurality of second key values corresponds to a previously executed sequence of one or more commands; and identifying a second key value of the plurality of second key values that ...
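
A sketch of the build cache keyed as in claim 3: the cache key is a hash of the command sequence plus its parameter value, and a hit skips re-executing the commands. The executor is a stub, since real image building is out of scope, and all names here are illustrative.

    import hashlib

    result_cache = {}

    def cache_key(commands, param_value):
        blob = "\n".join(commands) + "\0" + param_value
        return hashlib.sha256(blob.encode()).hexdigest()

    def run_step(commands, param_value):
        key = cache_key(commands, param_value)
        if key in result_cache:
            return result_cache[key]              # previously executed: reuse
        result = f"layer({commands!r}, {param_value!r})"   # stand-in executor
        result_cache[key] = result
        return result

    layer = run_step(["dnf install -y gcc"], param_value="fedora:39")
    assert run_step(["dnf install -y gcc"], "fedora:39") is layer  # cache hit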

Publication date: 04-02-2021

HARDWARE FOR SPLIT DATA TRANSLATION LOOKASIDE BUFFERS

Number: US20210034544A1
Assignee:

Systems, methods, and apparatuses relating to hardware for split data translation lookaside buffers. In one embodiment, a processor includes a decode circuit to decode instructions into decoded instructions, an execution circuit to execute the decoded instructions, and a memory circuit comprising a load data translation lookaside buffer circuit and a store data translation lookaside buffer circuit separate and distinct from the load data translation lookaside buffer circuit, wherein the memory circuit sends a memory access request of the instructions to the load data translation lookaside buffer circuit when the memory access request is a load data request and to the store data translation lookaside buffer circuit when the memory access request is a store data request to determine a physical address for a virtual address of the memory access request.

1. A processor comprising: a decode circuit to decode instructions into decoded instructions; an execution circuit to execute the decoded instructions; and a memory circuit comprising a load data translation lookaside buffer circuit and a store data translation lookaside buffer circuit separate and distinct from the load data translation lookaside buffer circuit, wherein the memory circuit sends a memory access request of the instructions to the load data translation lookaside buffer circuit when the memory access request is a load data request and to the store data translation lookaside buffer circuit when the memory access request is a store data request to determine a physical address for a virtual address of the memory access request, and the load data translation lookaside buffer circuit and the store data translation lookaside buffer circuit are different sizes and different associativity.

2. The processor of claim 1, wherein the load data translation lookaside buffer circuit comprises more entry storage locations than the store data translation lookaside buffer circuit.

3. The processor of claim 1, wherein the load ...
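
A minimal model of the split-TLB routing described above: loads are looked up in a load TLB and stores in a separate, smaller store TLB (matching claim 2's asymmetry). The sizes and the direct-mapped organization are illustrative assumptions.

    class Tlb:
        def __init__(self, entries):
            self.entries = entries
            self.table = {}                     # set index -> (vpn, ppn)

        def translate(self, vpn):
            hit = self.table.get(vpn % self.entries)
            if hit and hit[0] == vpn:
                return hit[1]
            return None                         # miss: would trigger a page walk

        def fill(self, vpn, ppn):
            self.table[vpn % self.entries] = (vpn, ppn)

    load_tlb, store_tlb = Tlb(entries=64), Tlb(entries=16)  # asymmetric sizes

    def translate(vpn, is_store):
        tlb = store_tlb if is_store else load_tlb   # route by access type
        return tlb.translate(vpn)

    load_tlb.fill(0x12345, 0x777)
    print(translate(0x12345, is_store=False))  # -> 0x777 (load TLB hit)
    print(translate(0x12345, is_store=True))   # -> None (store TLB miss)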

Publication date: 11-02-2016

CLIENT-SIDE DEDUPLICATION WITH LOCAL CHUNK CACHING

Number: US20160041777A1
Assignee: DELL PRODUCTS L.P.

Techniques and mechanisms described herein facilitate the transmission of a data stream from a client device to a networked storage system. According to various embodiments, a fingerprint for a data chunk may be identified by applying a hash function to the data chunk via a processor. The data chunk may be determined by parsing a data stream at the client device. A determination may be made as to whether the data chunk is stored in a chunk file repository at the client device. A block map update request message including information for updating a block map may be transmitted to a networked storage system via a network. The block map may identify a designated memory location at which the chunk is stored at the networked storage system.

1. A method comprising: at a client device comprising a processor and memory, identifying a fingerprint for a data chunk by applying a hash function to the data chunk via a processor, the data chunk determined by parsing a data stream at the client device; and determining whether the data chunk is stored in a chunk file repository at the client device; and transmitting a block map update request message to a networked storage system via a network, the block map update request message including information for updating a block map at the networked storage system, the block map identifying a designated memory location at which the chunk is stored at the networked storage system.

2. The method recited in claim 1, the method further comprising: when it is determined that the data chunk is not stored in the local chunk cache, determining whether the data chunk is stored at the networked storage system by transmitting the fingerprint to the networked storage system via the network.

3. The method recited in claim 2, the method further comprising: when it is determined that the data chunk is not stored at the networked storage system, transmitting the data chunk to the networked storage system for storage.

4. The method recited in claim 2, wherein ...
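
A sketch of the client-side flow described above: chunk the stream, hash each chunk to a fingerprint, skip upload work when the chunk is already known locally, and record where each chunk appears via a block map update. The chunking, transport, and repository layout are illustrative assumptions.

    import hashlib

    local_repo = set()        # fingerprints of chunks already stored
    block_map_updates = []    # (offset, fingerprint) messages for the server

    def process_stream(stream, chunk_size=4):
        for offset in range(0, len(stream), chunk_size):
            chunk = stream[offset:offset + chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in local_repo:
                local_repo.add(fingerprint)      # would also upload the chunk
            # Always tell the server where this chunk appears in the stream.
            block_map_updates.append((offset, fingerprint[:8]))

    process_stream(b"AAAABBBBAAAA")
    print(block_map_updates)  # first and last chunks share one fingerprint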

Publication date: 11-02-2016

DATA PROCESSING DEVICE

Number: US20160041912A1
Author: Nakajima Masami
Assignee:

The disclosed invention enables the operation of an MIMD type, an SIMD type, or coexistence thereof in a multiprocessor system including a plurality of CPUs, and reduces power consumption for instruction fetch by CPUs operating in the SIMD type. A plurality of CPUs and a plurality of memories corresponding thereto are provided. When the CPUs fetch instruction codes of different addresses from the corresponding memories, the CPUs operate independently (operation of the MIMD type). On the other hand, when the CPUs issue requests for fetching an instruction code of a same address from the corresponding memories, that is, operate in the SIMD type, the instruction code read from one of the memories by one access is parallelly supplied to the CPUs.

1. A data processing device comprising: a plurality of CPUs; and a plurality of memories corresponding to the CPUs, wherein when the CPUs issue requests for fetching instruction codes of different addresses from the corresponding memories, the instruction codes are supplied from the corresponding memories to the corresponding CPUs, and wherein when the CPUs issue requests for fetching an instruction code of a same address from the corresponding memories, the instruction code read from one of the memories by one access to the same address is parallelly supplied to the CPUs.

2. The data processing device according to claim 1, wherein the memories are instruction cache memories, an instruction cache common bus is further provided, and the instruction cache common bus is coupled to the CPUs and the instruction cache memories, wherein when the CPUs issue requests for fetching instruction codes of different addresses, the instruction codes are supplied from the corresponding instruction cache memories to the corresponding CPUs, and wherein when the CPUs issue requests for fetching an instruction code of a same address, the instruction code read from one of the instruction cache memories by one access to the address is parallelly supplied to ...
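
A toy model of the fetch sharing described above: when every CPU requests the same instruction address (an SIMD-like phase), a single memory access is broadcast to all CPUs; otherwise each CPU fetches from its own memory (MIMD phase). The access counter stands in for the power saving.

    def fetch_cycle(requests, memories, stats):
        """requests: per-CPU address list; memories: per-CPU {addr: insn}."""
        if len(set(requests)) == 1:                 # all CPUs want the same PC
            stats["accesses"] += 1                  # one access, broadcast result
            insn = memories[0][requests[0]]
            return [insn] * len(requests)
        stats["accesses"] += len(requests)          # independent MIMD fetches
        return [mem[addr] for mem, addr in zip(memories, requests)]

    mems = [{0x0: "add", 0x4: "mul"} for _ in range(4)]
    stats = {"accesses": 0}
    print(fetch_cycle([0x0, 0x0, 0x0, 0x0], mems, stats), stats)  # 1 access
    print(fetch_cycle([0x0, 0x4, 0x0, 0x4], mems, stats), stats)  # +4 accesses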

More details
11-02-2016 publication date

SYSTEM AND METHOD FOR MIRRORING A VOLATILE MEMORY OF A COMPUTER SYSTEM

Number: US20160041917A1
Assignee:

A system and method for mirroring a volatile memory to a CPIO device of a computer system is disclosed. According to one embodiment, a command buffer and a data buffer are provided to store data and a command for mirroring the data. The command specifies metadata associated with the data. The data is mirrored to a non-volatile memory of the CPIO device based on the command.

1. A method for mirroring a volatile memory of a computer system to a co-processor input/output (CPIO) device, the method comprising:
providing a command buffer and a data buffer;
receiving a command and data, the command specifying metadata associated with the data;
storing the command in the command buffer and storing the data in the data buffer; and
mirroring the data and the metadata to a non-volatile memory of the CPIO device based on the command.

2. The method of claim 1, further comprising storing the metadata in the command buffer.

3. The method of claim 1, further comprising storing the metadata in the data buffer.

4. The method of claim 1, wherein the metadata comprises one or more of a size of the data, context, a key, and a timestamp associated with the data.

5. The method of claim 1, wherein the CPIO device comprises the command buffer and the data buffer.

6. The method of claim 1, wherein the size of the data is a variable size.

7. The method of claim 6, wherein the variable size is a cache line size.

8. The method of claim 1, wherein the command buffer and the data buffer are mixed in a First In First Out (FIFO) buffer.

9. The method of claim 1, further comprising providing a plurality of functions to:
create a persistence context for an application running on the computer system;
manage the persistence context including recreating a memory image;
reset the persistence context;
delete the persistence context; and
update the persistence context with data stored in the volatile memory.

10. The method of claim 9, further comprising: providing a run-time environment in ...
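The buffering scheme of claims 1, 4, and 8 can be modelled in a few lines. Below is a minimal software sketch, not the patented device: the MirrorBuffer class and the dict standing in for the CPIO device's non-volatile memory are illustrative assumptions, with a mixed command/data FIFO as in claim 8 and the metadata fields of claim 4.

    import time
    from collections import deque

    class MirrorBuffer:
        def __init__(self):
            self.fifo = deque()    # command and data entries interleaved (claim 8)
            self.nonvolatile = {}  # stands in for the CPIO device's non-volatile memory

        def submit(self, key, data):
            command = {            # metadata carried with the mirror command (claim 4)
                "key": key,
                "size": len(data),
                "timestamp": time.time(),
            }
            self.fifo.append(("command", command))
            self.fifo.append(("data", data))

        def drain(self):
            """Mirror each buffered data entry to the non-volatile store,
            as directed by the command that precedes it in the FIFO."""
            while self.fifo:
                kind, command = self.fifo.popleft()
                assert kind == "command"
                _, data = self.fifo.popleft()
                self.nonvolatile[command["key"]] = (command, data)

    buf = MirrorBuffer()
    buf.submit("cacheline-42", b"dirty cache line contents")
    buf.drain()
    print(buf.nonvolatile)

Keeping the command ahead of its data in one FIFO preserves ordering without a separate synchronization step between the two buffers.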

More details
09-02-2017 publication date

DETERMINING LENGTHS OF ACKNOWLEDGMENT DELAYS FOR I/O COMMANDS

Number: US20170038972A1
Assignee:

Example implementations relate to determining lengths of acknowledgment delays for input/output (I/O) commands. In example implementations, a length of an acknowledgment delay for a respective I/O command may be based on cache availability, and activity level of a drive at which the respective I/O command is directed, after the respective I/O command has been executed. Acknowledgments for respective I/O commands may be transmitted after respective periods of time equal to respective lengths of acknowledgment delays have elapsed.

1. A system for determining lengths of acknowledgment delays for input/output (I/O) commands, the system comprising:
a plurality of drives;
a cache to store I/O commands directed at the plurality of drives; and
a processor communicatively coupled to the plurality of drives and to the cache, wherein:
the processor is to determine lengths of acknowledgment delays for I/O commands; and
a length of an acknowledgment delay for a respective I/O command is based on cache availability, and activity level of a drive at which the respective I/O command is directed, after the respective I/O command has been executed.

2. The system of claim 1, wherein the processor is further to:
set a length of acknowledgment delays, for I/O commands that are directed at a first drive of the plurality of drives, at a first non-zero value if cache availability is below a first cache availability threshold, and idle percentage of the first drive is below a first idle percentage threshold, after a first I/O command directed at the first drive has been executed; and
set a length of acknowledgment delays, for I/O commands that are directed at a second drive of the plurality of drives, at zero if cache availability is below the first cache availability threshold, and idle percentage of the second drive is above the first idle percentage threshold, after a second I/O command directed at the second drive has been executed.

3. The system of claim 2, wherein the processor is ...
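The threshold policy of claim 2 reduces to a small decision function. The sketch below is illustrative, not the example implementation: the threshold constants, the delay value, and the ack_delay name are assumptions; the shape of the logic follows the claim, delaying acknowledgments only when the cache is constrained and the target drive is busy.

    CACHE_AVAILABILITY_THRESHOLD = 0.20  # fraction of cache still free (assumed value)
    IDLE_PERCENTAGE_THRESHOLD = 50.0     # percent of time the drive sits idle (assumed)
    DELAY_SECONDS = 0.005                # the claim's "first non-zero value" (assumed)

    def ack_delay(cache_availability, drive_idle_percentage):
        """Length of the acknowledgment delay for an executed I/O command."""
        if cache_availability >= CACHE_AVAILABILITY_THRESHOLD:
            return 0.0  # cache is healthy: acknowledge immediately (assumption)
        if drive_idle_percentage > IDLE_PERCENTAGE_THRESHOLD:
            return 0.0  # idle drive keeps up on its own (claim 2, second drive)
        return DELAY_SECONDS  # busy drive under cache pressure (claim 2, first drive)

    print(ack_delay(0.10, 20.0))  # 0.005 -> delayed acknowledgment
    print(ack_delay(0.10, 80.0))  # 0.0   -> immediate acknowledgment

Delaying the acknowledgment throttles the sender's issue rate, which gives the cache time to drain toward the busy drive without rejecting commands outright.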

More details
09-02-2017 publication date

Adaptive core grouping

Number: US20170039069A1
Assignee: SonicWall US Holdings Inc

The present invention relates to a system, method, and non-transitory storage medium executable by one or more processors at a multi-processor system that improve load monitoring and processor-core assignments as compared to conventional approaches. A method consistent with the present invention includes a first data packet being received at a multi-processor system. After the first packet is received, it may be sent to a first processor, where the first processor identifies a first processing task associated with the first data packet. The first data packet may then be forwarded to a second processor that is optimized for processing the first processing task of the first data packet. The second processor may then process the first processing task of the first data packet. Program code associated with the first processing task may be stored in a level one (L1) cache at the first processor.
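As a rough illustration of the dispatch idea, the sketch below models a classifying core that identifies a packet's processing task and forwards it to a core whose (modelled) L1 cache already holds the code for that task. The port-to-task table, Core class, and classify function are illustrative assumptions, not SonicWall's implementation.

    TASK_OF_PORT = {443: "tls_decrypt", 80: "http_inspect"}  # illustrative mapping

    class Core:
        def __init__(self, task):
            self.task = task        # the processing task this core is optimized for
            self.l1_cache = {task}  # program code for the task, modelled as resident in L1
            self.processed = 0

        def process(self, packet):
            assert packet["task"] in self.l1_cache  # no cold-code miss on this core
            self.processed += 1

    def classify(packet):
        """Runs on the first processor: identify the packet's processing task."""
        packet["task"] = TASK_OF_PORT.get(packet["port"], "default")
        return packet

    cores = {t: Core(t) for t in ("tls_decrypt", "http_inspect", "default")}
    for pkt in [{"port": 443}, {"port": 80}, {"port": 22}]:
        pkt = classify(pkt)              # first processor identifies the task
        cores[pkt["task"]].process(pkt)  # packet forwarded to the optimized core
    print({t: c.processed for t, c in cores.items()})

Grouping cores by task keeps each core's instruction working set small and hot, which is the cache-locality benefit the abstract alludes to.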

More details
09-02-2017 publication date

INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND RECORDING MEDIUM STORING AN INFORMATION PROCESSING PROGRAM

Number: US20170039138A1
Author: TODA Hideki
Assignee: RICOH COMPANY, LTD.

An information processing system includes circuitry that executes multiple threads that, in converting data, concurrently perform the calculation required for a data conversion on an attribute including a type of the data, and a memory that stores the attribute associated with a converted value obtained by the calculation. The memory includes a storage area for the attribute, allocated to each of the multiple threads, to store the attribute associated with a value indicating an address that stores the converted value corresponding to the attribute, and a storage area for the converted value, shared by all of the multiple threads, to store the attribute associated with the converted value obtained by the calculation.

1. An information processing system comprising:
circuitry to execute multiple threads that, in converting data, concurrently performs calculation required for a data conversion on an attribute including a type of the data; and
a memory to store the attribute associated with a converted value obtained by the calculation,
wherein the memory includes:
a storage area for the attribute, allocated to each of the multiple threads and to store the attribute associated with a value indicating an address that stores the converted value corresponding to the attribute; and
a storage area for the converted value, to be shared by all of the multiple threads and store the attribute associated with the converted value obtained by the calculation.

2. The information processing system according to claim 1, wherein the data is image data, and the data conversion is a bitmap conversion that converts the image data into a bitmap image.

3. The information processing system according to claim 1, wherein the circuitry determines whether or not a new attribute exists in the storage area for the attribute, and, if it is determined by the circuitry that the new attribute exists in the storage area for the attribute, instead of performing the ...
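The two storage areas of claim 1 correspond to a familiar concurrency pattern: a per-thread lookup table pointing into one shared table of converted values. The Python sketch below is a minimal illustration under stated assumptions; convert is a stand-in for the real calculation, and using the attribute itself as the "address" into the shared area is a simplification.

    import threading

    shared_values = {}              # converted-value area, shared by all threads
    shared_lock = threading.Lock()
    per_thread = threading.local()  # attribute area, allocated to each thread

    def convert(attribute):
        """Stand-in for the calculation required for the data conversion."""
        return hash(attribute)

    def lookup(attribute):
        table = getattr(per_thread, "table", None)
        if table is None:
            table = per_thread.table = {}
        if attribute in table:      # this thread already resolved it: reuse
            return shared_values[table[attribute]]
        with shared_lock:
            if attribute not in shared_values:  # another thread may have won the race
                shared_values[attribute] = convert(attribute)
        table[attribute] = attribute  # record "where" the converted value lives
        return shared_values[attribute]

    threads = [threading.Thread(target=lookup, args=(("font", 12),)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(shared_values)  # one converted value despite four concurrent threads

Sharing only the converted values keeps the expensive calculation from being repeated across threads, while the per-thread attribute tables avoid lock contention on every lookup.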

More details