Method for controlling cache system comprising direct-mapped cache and fully-associative buffer
[0001] The present invention relates to a cache system and a control method therefor used for reducing the access time of a central processing unit (CPU), and more particularly, to a cache system constructed with a direct-mapped cache and a fully associative buffer, and to a control method therefor, for exploiting both temporal and spatial locality adaptively. [0002] Access to a memory upon a request by a CPU for instructions or data can engender considerable delays, as shown in [0003] Among the above memories, the cache system 11 is accessed before the main memory 12 is accessed, and therefore its structure and control method may significantly impact the execution speed and power consumption of a CPU. The cache system 11 is designed to exploit the principle of locality. [0004] Locality is divided into spatial locality and temporal locality. Spatial locality refers to the tendency for adjacent or nearby memory locations to be referenced close together in time. Temporal locality is the likelihood that data retrieved once will be retrieved again soon. [0005] Cache systems exploit temporal locality by retaining recently referenced data, and spatial locality by fetching multiple words as a cache block whenever a miss occurs. These two approaches to optimizing each type of locality conflict with each other when the cache capacity is fixed, because the number of cache blocks is inversely proportional to the block size. As the size of a block increases, more data adjacent to the accessed memory address are copied into the cache system; however, because the number of cache blocks is reduced, each data block resides in the cache system for a shorter time. Thus, if the storage capacity of the cache system is fixed at a predetermined level, as the size of a block increases, the cache system has higher spatial locality but lower temporal locality.
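The fixed-capacity tradeoff described in paragraph [0005] follows from simple arithmetic, sketched below (illustrative Python; the 4 KB capacity and the block sizes are example values, not taken from the embodiment):

```python
def num_blocks(capacity_bytes, block_size_bytes):
    """For a fixed cache capacity, block count is inversely proportional to block size."""
    return capacity_bytes // block_size_bytes

capacity = 4096  # a hypothetical 4 KB cache
small = num_blocks(capacity, 8)   # 512 blocks: longer residency, higher temporal locality
large = num_blocks(capacity, 32)  # 128 blocks: more adjacent data per miss, higher spatial locality
print(small, large)  # 512 128
```

Quadrupling the block size divides the number of resident blocks by four, which is why the two kinds of locality cannot both be maximized in a single fixed-size structure.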
Conversely, as the size of a block decreases, the cache system has lower spatial locality but higher temporal locality. [0006] To reduce this conflict as much as possible, a cache system including two separately controlled cache memories has been suggested. Conventional cache control methods have used complex mechanisms to exploit the two localities, e.g., a locality prediction table, a compiler, a locality detection unit, prefetching, and so forth. These conventional cache control methods have the problem of high design complexity and high hardware cost. [0007] To solve the above problems, it is an object of the present invention to provide a cache system consisting of two caches with different configurations, and a control method therefor, that exploit each type of locality, resulting in a reduced miss ratio and reduced power consumption. [0008] Accordingly, to achieve the above object, the present invention provides a cache system constructed as a direct-mapped cache configured with a small block size and a fully associative spatial buffer configured with a large block size, and a method for controlling the cache system. Here, if the accesses to the direct-mapped cache and the fully associative spatial buffer are both misses, data of the accessed address and data of the adjacent addresses are copied to a large block in the fully associative buffer according to a first-in-first-out (FIFO) algorithm. Also, one or more small blocks accessed before, among the large block of data to be expelled from the fully associative spatial buffer, are moved into the direct-mapped cache. [0009] According to the cache system and control method of the present invention, one or more small blocks accessed before, among the large block of data to be expelled from the fully associative spatial buffer, are moved into the direct-mapped cache.
Thus, the accessed data reside in the direct-mapped cache for a relatively long period, and therefore temporal locality is raised without reducing spatial locality, thereby reducing access miss rates and power consumption. [0010] [0011] [0012] [0013] [0014] [0015] [0016] [0017] Referring to [0018] The direct-mapped cache 21 includes a data storage unit 211 and a control bit storage unit 212. The data storage unit 211 is configured such that data accessed by a central processing unit (CPU, not shown) are stored in 8-byte small blocks SUB. The control bit storage unit 212 stores a group of bits, i.e., a 1-bit valid bit V, a 1-bit dirty bit D, and an n-bit tag T, for each small block SUB in response to an index signal DMI input through an address bus 201. A comparator 203 checks whether the value of a tag signal DMT input from the CPU through the address bus 201 exists among the tags T of the control bit storage unit 212, and generates an access result signal DMH indicating whether the access to the direct-mapped cache 21 is a hit or a miss. The access result signal DMH is input to the CPU through a control bus (not shown). A multiplexer 204 selectively inputs to the data storage unit 211 of the direct-mapped cache 21 one 8-byte data word DW, either from a data bus 202 or from the small blocks 220, 221, 222, and 223 of the fully associative spatial buffer 22. [0019] The fully associative spatial buffer 22 is configured such that data of an address accessed by the CPU and data of adjacent addresses are stored in a 32-byte large block consisting of the four small blocks 220, 221, 222, and 223. A content addressable memory (CAM) 227 in the fully associative spatial buffer 22 generates a valid bit V 226 for the large block 220, 221, 222, and 223, and a dirty bit D 225 and a hit bit H 224 for each of the small blocks 220, 221, 222, and 223, in response to a tag signal SBT input from the CPU through the address bus 201.
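The two lookup paths described above can be modelled with a short sketch (illustrative Python, not the hardware itself; the entry count, field names, and method names are assumptions introduced for the sketch):

```python
SMALL_BLOCK = 8          # 8-byte small block SUB
NUM_SETS = 256           # hypothetical number of direct-mapped entries
LARGE_BLOCK = 32         # 32-byte large block = four small blocks
SUBS_PER_LARGE = LARGE_BLOCK // SMALL_BLOCK

class DirectMappedCache:
    """Per paragraph [0018]: one valid bit V, dirty bit D, and tag T per small block."""
    def __init__(self):
        self.entries = [{"V": 0, "D": 0, "T": None} for _ in range(NUM_SETS)]

    def split(self, addr):
        # The address bus supplies an index DMI and a tag DMT.
        block = addr // SMALL_BLOCK
        return block % NUM_SETS, block // NUM_SETS

    def lookup(self, addr):
        # Models the comparator 203's hit/miss signal DMH.
        index, tag = self.split(addr)
        e = self.entries[index]
        return e["V"] == 1 and e["T"] == tag

class SpatialBuffer:
    """Per paragraph [0019]: a CAM tag per large block, a valid bit V for the
    large block, and a hit bit H per small block (dirty bits omitted here)."""
    def __init__(self):
        self.entries = []  # FIFO order: index 0 is the oldest large block

    def lookup(self, addr):
        # Models the CAM search on tag SBT; the 2-bit offset SBO selects the
        # small block, whose hit bit H is set on a hit.
        tag, offset = addr // LARGE_BLOCK, (addr % LARGE_BLOCK) // SMALL_BLOCK
        for e in self.entries:
            if e["V"] == 1 and e["T"] == tag:
                e["H"][offset] = 1
                return True
        return False

def access(dm, sb, addr):
    # Both structures are probed in parallel on every CPU access (step S301).
    return dm.lookup(addr), sb.lookup(addr)
```

Here `access` returns the pair of hit signals corresponding to DMH and SBH; a double miss triggers the buffer-fill path of steps S308 to S314.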
An AND gate 205 checks whether an address generated in the content addressable memory 227 is valid and generates an access result signal SBH indicating whether the access to the fully associative spatial buffer 22 is a hit or a miss. The access result signal SBH is input to the CPU through the control bus. [0020] A two-bit offset control signal SBO from the address bus 201 selectively enables the inputs and outputs of the small blocks 220, 221, 222, and 223 by issuing each bank enable signal BE through a multiplexer 207. Furthermore, 8-byte data words DWs from the data bus 202, or from a subordinate memory such as a main memory, are input to the selectively enabled small blocks 220, 221, 222, and 223 through a multiplexer 206. [0021] A method for controlling the cache system according to the present invention will now be described with reference to [0022] First, the direct-mapped cache 21 and the fully associative spatial buffer 22 are accessed in parallel, at the same level, by a read or write operation upon a read or write request from the CPU (step S301). Then, a check is made as to whether the access to the direct-mapped cache 21 is a hit (step S302). If a read access to the direct-mapped cache 21 is a hit (step S303), the read data are transmitted to the CPU (step S315). [0023] Meanwhile, while the direct-mapped cache 21 is being accessed, a check is simultaneously made as to whether the access to the fully associative spatial buffer 22 is a hit (step S305). If a read access to the fully associative spatial buffer 22 is a hit (step S306), the read data are transmitted to the CPU (step S315). [0024] If the accesses to both the direct-mapped cache 21 and the fully associative spatial buffer 22 are misses, a check is made as to whether an empty large block 220, 221, 222, and 223 exists in the fully associative spatial buffer 22 (step S308). Here, if all the valid bits V 226 in the fully associative spatial buffer 22 are set, no empty large block 220, 221, 222, and 223 exists therein.
If one of the valid bits 226 is not set, an empty large block 220, 221, 222, and 223 exists in the fully associative spatial buffer 22. [0025] If an empty large block 220, 221, 222, and 223 exists in the fully associative spatial buffer 22, a large data block is copied to the empty large block in the fully associative spatial buffer 22 according to a first-in-first-out (FIFO) algorithm (step S313). Then, the hit bit H3, H2, H1, or H0 for the small block 220, 221, 222, or 223 accessed is set (step S314). [0026] On the other hand, if no empty large block exists in the fully associative spatial buffer 22, at least one accessed small data block 220, 221, 222, or 223 is detected within the large block 220, 221, 222, and 223 of data to be expelled from the fully associative spatial buffer 22 according to the FIFO algorithm (step S309). In other words, in step S309, at least one small data block 220, 221, 222, or 223 whose hit bit is set is detected. [0027] Next, a check is made as to whether the small data block SUB to be expelled from the direct-mapped cache 21 has been in a written state (step S310); if so, the small data block to be expelled is moved into the main memory (step S311). [0028] Furthermore, at least one of the small data blocks 220, 221, 222, or 223 confirmed as an accessed block in step S309 is moved to the direct-mapped cache 21 (step S312). Then, the data of the address accessed in step S301 and data of adjacent addresses are copied into the corresponding large block 220, 221, 222, and 223 in the fully associative buffer according to the FIFO algorithm (step S313). Then, the hit bit H3, H2, H1, or H0 of the small block of data 220, 221, 222, or 223 accessed is set (step S314). [0029] [0030] In a victim cache system, a direct-mapped cache is configured with the same block size as a victim buffer.
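The miss-handling path of steps S308 to S314 described above can be sketched as follows (a standalone, simplified model; the dict-based entry layout and the callback name are assumptions, and the write-back of steps S310 and S311 is noted but not modelled):

```python
from collections import deque

SUBS_PER_LARGE = 4  # a 32-byte large block holds four 8-byte small blocks

def handle_miss(buffer, capacity, new_tag, offset, promote):
    """Steps S308-S314, simplified. `buffer` is a FIFO deque of large-block
    entries, each a dict with a 'tag' and per-small-block 'hit' bits. When the
    buffer is full (S308), the oldest large block is expelled (S309) and every
    small block whose hit bit is set is moved to the direct-mapped cache via
    promote(tag, sub_index) (S312); in the full method, a dirty small block
    displaced from the direct-mapped cache is first written back to the main
    memory (S310-S311). The new large block is then allocated (S313) and the
    accessed small block's hit bit is set (S314)."""
    if len(buffer) == capacity:               # no empty large block (S308)
        victim = buffer.popleft()             # FIFO victim (S309)
        for i in range(SUBS_PER_LARGE):
            if victim["hit"][i]:              # accessed before -> keep it
                promote(victim["tag"], i)     # step S312
    entry = {"tag": new_tag, "hit": [0] * SUBS_PER_LARGE}
    entry["hit"][offset] = 1                  # step S314
    buffer.append(entry)                      # step S313

# Usage: with a capacity of two large blocks, the third miss expels the oldest
# large block and promotes its previously accessed small block.
promoted = []
buf = deque()
for tag, off in [(0, 1), (1, 0), (2, 3)]:
    handle_miss(buf, 2, tag, off, promote=lambda t, i: promoted.append((t, i)))
print(promoted)  # [(0, 1)]
```

Only small blocks that were actually touched survive into the direct-mapped cache, which is how the method raises temporal locality without shrinking the large fetch unit.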
Reference numerals 411, 421, 431, 441, 451, 461, 471, 481, 491, 511, 521, 531, 541, 551, 561, 571, 581, and 591 denote graphs of a victim cache system including a direct-mapped cache and a victim buffer, each having 8-byte blocks. Reference numerals 412, 422, 432, 442, 452, 462, 472, 482, 492, 512, 522, 532, 542, 552, 562, 572, 582, and 592 denote graphs of a victim cache system including a direct-mapped cache and a victim buffer, each having 16-byte blocks. Reference numerals 413, 423, 433, 443, 453, 463, 473, 483, 493, 513, 523, 533, 543, 553, 563, 573, 583, 593, 611, 621, 631, 641, 651, 661, 671, 681, and 691 denote graphs of a victim cache system including a direct-mapped cache and a victim buffer, each having 32-byte blocks. Reference numerals 414, 424, 434, 444, 454, 464, 474, 484, 494, 514, 524, 534, 544, 554, 564, 574, 584, 594, 612, 622, 632, 642, 652, 662, 672, 682, and 692 denote graphs of the cache system according to the present invention including the direct-mapped cache 21 of [0031] [0032] In the prefetching mode, prefetching is performed when at least two of the hit bits H3, H2, H1, and H0 of a large data block SUB3, SUB2, SUB1, and SUB0 are set. In the embodiment described hereunder, a prefetch bit P is set and prefetching is performed when the last of the four hit bits H3, H2, H1, and H0 of a large data block SUB3, SUB2, SUB1, and SUB0 is set, i.e., when the other three hit bits have already been set. Here, when a small data block SUB3, SUB2, SUB1, or SUB0 is accessed again within a large block SUB3, SUB2, SUB1, and SUB0 whose prefetch bit P is set, prefetching is not performed, because the corresponding prefetching has already been performed. [0033] When an access to the fully associative spatial buffer 22 is attempted, the current read address used to search the content addressable memory 227 is also input directly to an address generator AG.
If a hit bit is finally set in a state in which the other three of the four hit bits H3, H2, H1, and H0 of a large data block SUB3, SUB2, SUB1, and SUB0 have already been set, the logic states of all four hit bits H3, H2, H1, and H0 are “1”. In this case, the prefetch controller 208 inverts the state of a prefetch signal PFS that is input to an AND gate 232, so that the state of the prefetch signal PFS is “1”. Also, an address generator AG inputs the address following the input address from the content addressable memory 227 to a multiplexer 228. In other words, the address generator AG inputs to the multiplexer 228 the address following the input address whose hit bits are all set to “1”. Then, the content addressable memory 227 internally searches for an address that is the same as the input address from the multiplexer 228. Here, no address of a large data block whose prefetch bit P is set to “1” is searched, so that the search time is reduced. [0034] This search operation introduces a one-cycle delay, but this overhead is negligible because prefetching is initiated for only about 1.5% to 2.5% of the total number of addresses generated by the CPU. In other words, the average MCPI (memory cycles per instruction) is increased by only about 0.06%. [0035] After the search operation, if the next address is present in the content addressable memory 227, the content addressable memory 227 inputs a signal of the “1” state to the AND gate 205, so that the output signals from an inverter 209 and the AND gate 232 go to the “0” state. In other words, a fetch enable signal PFSE of the “0” state is input to the address generator AG. Thereby, the address generator AG does not transmit the next address to the address bus 201, so that prefetching is not performed.
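The prefetch gating described in paragraphs [0032] through [0035] can be modelled as follows (an illustrative sketch, not the hardware; the dict-based entry layout is an assumption). Prefetching fires only when the fourth hit bit of a large block becomes set, the block's prefetch bit P is still clear, and the following large-block address is absent from the content addressable memory:

```python
SUBS_PER_LARGE = 4

def maybe_prefetch(entries, tag, offset):
    """Set the hit bit for the accessed small block of `entries[tag]`; if that
    makes all four hit bits '1' and the prefetch bit P is not yet set, mark P
    (so the same block never triggers twice) and return the next large-block
    address, as the address generator AG would, unless that address is already
    present in the CAM (the gating that drives signal PFSE to '0')."""
    entry = entries[tag]
    entry["hit"][offset] = 1
    if all(entry["hit"]) and not entry["P"]:
        entry["P"] = 1                 # corresponding prefetch performed once only
        next_tag = tag + 1             # AG: the address following the input address
        if next_tag not in entries:    # CAM search: absent -> prefetch enabled
            return next_tag
    return None                        # no prefetch this access

# Usage: after the fourth distinct small block of large block 5 is hit,
# large block 6 is nominated for prefetching.
entries = {5: {"hit": [0, 0, 0, 0], "P": 0}}
fired = [maybe_prefetch(entries, 5, off) for off in range(SUBS_PER_LARGE)]
print(fired)  # [None, None, None, 6]
```

The returned address stands in for the next address that the address generator AG places on the address bus 201 when the fetch enable signal PFSE is in the “1” state.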
[0036] Meanwhile, after the search operation, if the next address is not present in the content addressable memory 227, the content addressable memory 227 inputs a signal of the “0” state to the AND gate 205, so that the output signals from the inverter 209 and the AND gate 232 go to the “1” state. In other words, a fetch enable signal PFSE of the “1” state is input to the address generator AG. Thereby, the address generator AG transmits the next address to the address bus 201, so that prefetching is performed. This prefetching is performed within the step for updating the fully associative buffer 22 (step S313). [0037] According to the prefetching mode, prefetching is performed on the data at the address following a large data block that has a high access ratio. Thereby, the large block of prefetched data also has a high access ratio, so that the overall access time is reduced. [0038] As described above, according to the cache system and control method of the present invention, one or more small blocks accessed before, among the large block of data to be expelled from the fully associative spatial buffer, are moved into the direct-mapped cache. Thus, the accessed data reside in the direct-mapped cache for a relatively long period, and therefore temporal locality is raised without reducing spatial locality, thereby reducing access miss rates and power consumption. [0039] While this invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. A cache system and a control method therefor are provided. The cache system is organized as a direct-mapped cache configured with a small block size and a fully associative spatial buffer configured with a large block size, consisting of a plurality of small blocks.
Here, if accesses to the direct-mapped cache and the fully associative buffer are a miss, data of the accessed address and data of the adjacent addresses are copied to the large block in the fully associative spatial buffer according to a first-in-first-out (FIFO) algorithm. Furthermore, if one or more small data blocks accessed before exist in the corresponding large block of data to be expelled from the fully associative buffer, the accessed small block(s) are copied to the direct-mapped cache. 1. A cache system comprising a direct-mapped cache and a fully associative spatial buffer wherein:
the direct-mapped cache is configured with a small block size; the fully associative spatial buffer is configured with a large block size consisting of a plurality of small blocks of the small block size; if accesses to the direct-mapped cache and the fully associative spatial buffer are a miss, data of the accessed address and data of the adjacent addresses are copied to the large block in the fully associative buffer according to a first-in-first-out (FIFO) algorithm; and one or more small blocks accessed before among the large block of data to be expelled from the fully associative spatial buffer are moved into the direct-mapped cache. 2. A method for controlling a cache system constructed with a direct-mapped cache configured with a small block size and a fully associative spatial buffer configured with a large block size, consisting of a plurality of small blocks, the method comprising the steps of:
(a) copying data of the accessed address and data of the adjacent addresses to the large block in the fully associative buffer according to a first-in-first-out (FIFO) algorithm if accesses to the direct-mapped cache and the fully associative spatial buffer are a miss; and (b) moving one or more small blocks accessed before, among the large block of data to be expelled from the fully associative spatial buffer in the step (a), into the direct-mapped cache. 3. The method of claim 2, further comprising the step of: if at least two small data blocks of a large data block in the fully associative buffer are accessed, prefetching data of the next address of the large data block, and copying the prefetched data to the large block in the fully associative buffer within the step (a), according to the first-in-first-out (FIFO) algorithm. 4. A method for controlling a cache system constructed with a direct-mapped cache configured with a small block size and a fully associative spatial buffer configured with a large block size, consisting of a plurality of small blocks, the method comprising the steps of:
(a) checking whether an empty large block exists in the fully associative buffer if accesses to the direct-mapped cache and the fully associative buffer are a miss; (b) if the empty large block exists in the step (a), performing step (c5); (c) if no empty large block exists in the step (a), performing steps (c1)-(c5); (c1) checking a small block of data that has been accessed among the large block of data to be expelled from the fully associative buffer according to a first-in-first-out (FIFO) algorithm; (c2) checking whether the small block of data to be expelled from the direct-mapped cache has been in a written state; (c3) if the small block of data to be expelled has been in a written state, moving the small block of data to be expelled into a main memory; (c4) copying the small block of data confirmed to have been accessed in the step (c1) to the direct-mapped cache; and (c5) copying data of an address accessed in the step (a) and data of adjacent addresses in the large block to the fully associative buffer according to a FIFO algorithm.