Difference between revisions of "Load/Store unit"
Revision as of 12:50, 25 September 2017
This is the unit inside the core that executes load and store operations. It contains an L1 data cache in order to reduce memory access latency. It is divided into three stages, described in detail in the sections that follow. It interfaces with the Operand Fetch stage and the Writeback stage. Furthermore, it sends a signal to the instruction buffer unit in order to stop a thread when a miss occurs. Note that the signals to the Writeback stage also go to the cache controller (through the core interface module).
The Load/Store Unit does not store specific coherence protocol information (such as stable states), but it stores privileges for all cached addresses. Each cache line has two privileges: can read and can write. These privileges determine cache hits and misses and are updated only by the Cache Controller.
Stage 1
This is the first stage of the Load/Store pipeline. It has one queue per thread, in which it stores the instructions of that thread coming from the Operand Fetch stage; it then provides one instruction per thread, in parallel, to the second stage.
If the second stage is able to execute the instruction provided by the i-th thread, it combinationally asserts the i-th bit of the ldst2_dequeue_instruction mask to notify stage 1 that the instruction has been consumed. In this way, the second stage stalls the instructions in this stage when it is busy.
Before the request is enqueued, the data are aligned, compressed into a proper vector and replicated if necessary. Alignment is done for byte, half-word and word operations. Vector alignment implies compressing the data so that it is written consecutively. For example, a vec16i8, which has 1 significant byte every 4 bytes, is compressed into 16 consecutive bytes.
NOTE FOR MIRKO: IN MY OPINION THIS DOES NOT WORK IF YOU OPERATE ON IT AGAIN
The flush operation forces the data to be enqueued, even if the instruction_valid signal is not asserted.
NOTE FOR MIRKO: IN MY OPINION INSTRUCTION VALID IS REQUIRED INSTEAD
The stage contains a recycle buffer: if a cache miss occurs in the 3rd stage, the data is put in this buffer. The output of this buffer competes with the normally issued load/store instructions to be re-executed. Recycled instructions have a higher priority than the other operations.
Note that this stage consumes a lot of memory because the queues store the entire instructions and their fetched operands.
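The vec16i8 compression described in this stage can be illustrated with a minimal sketch. The module name, port names and widths are hypothetical, and the sketch assumes that the significant byte is the least significant byte of each 32-bit lane; the actual implementation may differ.

```systemverilog
// Hypothetical sketch, not the project's code: packs one byte per 32-bit lane
// into consecutive bytes, as described for vec16i8 operands.
module vec16i8_compress #(
    parameter int LANES = 16
) (
    input  logic [LANES-1:0][31:0] lane_data,   // one 32-bit word per hardware lane
    output logic [LANES-1:0][7:0]  compressed   // LANES consecutive bytes
);

    always_comb begin
        for ( int lane = 0; lane < LANES; lane++ )
            // Assumption: the significant byte is bits [7:0] of each lane.
            compressed[lane] = lane_data[lane][7:0];
    end

endmodule
```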
Stage 2
The main purpose of this stage is to choose a request to serve and to fetch the tag and privileges from the tag cache.
It receives from the previous stage the load/store requests and the recycled requests of each thread (ldst1_valid and ldst1_recycle_valid), while it receives the update signals from the cache controller (cc_update_ldst_xxx).
As for the signals from the first stage of the load/store unit, each thread can issue a request (normal or recycled), and one thread is chosen in a round-robin manner (ldst1_fifo_winner). Once a thread has been chosen, if both a normal and a recycled request are active for the winning thread, the recycled request always has the highest priority. This choice is mandatory in order to respect the scheduling order.
    always_comb begin
        if ( ldst1_recycle_valid[ldst1_fifo_winner_id] ) begin
            ...
        end else begin
            ... // normal request
        end
    end
Both kinds of request are dequeued using two different signals (ldst2_dequeue_instruction and ldst2_recycled).
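The selection policy described above (round-robin among threads, then recycled before normal for the winner) could be sketched as follows. This is only an illustration under stated assumptions: the module, its ports and the rotating-pointer arbiter are hypothetical and are not taken from the project sources.

```systemverilog
// Hypothetical sketch: round-robin thread selection with recycled-request priority.
module rr_select #(
    parameter int THREADS = 4
) (
    input  logic               clk,
    input  logic               reset,
    input  logic [THREADS-1:0] valid,              // normal requests, one bit per thread
    input  logic [THREADS-1:0] recycle_valid,      // recycled requests, one bit per thread
    output logic [THREADS-1:0] winner_oh,          // one-hot winning thread
    output logic               winner_is_recycled  // winner serves its recycled request
);

    localparam int ID_W = ( THREADS > 1 ) ? $clog2( THREADS ) : 1;

    logic [ID_W-1:0]    last_winner, winner_id;
    logic [THREADS-1:0] requestor;
    logic               found;

    // A thread competes if it has either kind of request pending.
    assign requestor = valid | recycle_valid;

    always_comb begin
        int candidate;
        winner_oh = '0;
        winner_id = last_winner;
        found     = 1'b0;
        // Scan the threads starting from the one after the previous winner.
        for ( int offset = 1; offset <= THREADS; offset++ ) begin
            candidate = ( int'( last_winner ) + offset ) % THREADS;
            if ( !found && requestor[candidate] ) begin
                winner_oh[candidate] = 1'b1;
                winner_id            = ID_W'( candidate );
                found                = 1'b1;
            end
        end
        // For the chosen thread, the recycled request always goes first.
        winner_is_recycled = found && recycle_valid[winner_id];
    end

    always_ff @( posedge clk ) begin
        if ( reset )
            last_winner <= '0;
        else if ( found )
            last_winner <= winner_id;
    end

endmodule
```

Because the pointer only advances when a request is granted, a thread that just won is scanned last on the next cycle, which keeps the selection fair among the requesting threads.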
Cache controller input dependencies
Cache update signals
The cache update signal from the cache controller has priority over all stage 1 requests. The signal cc_update_ldst_valid indicates when the cache controller wants to update the cache. This priority is enforced through the following signals:
    ldst1_fifo_requestor   = ( ldst1_valid | ldst1_recycle_valid ) & {`THREAD_NUMB{~cc_update_ldst_valid}} & ~sleeping_thread_mask_next;
    ldst1_request_valid    = |ldst1_fifo_requestor;
    tag_sram_read1_address = ( cc_update_ldst_valid ) ? cc_update_ldst_address.index : ldst1_fifo_request.address.index;
    next_request           = ( cc_update_ldst_valid ) ? cc_update_request : ldst1_fifo_request;
    ldst2_valid            = ldst1_request_valid;
The cc_update_ldst_valid signal determines whether a request from the previous stage is valid (ldst1_fifo_requestor) and which request can access the tag read port (tag_sram_read1_address). Consequently, it determines the output request to the third stage (next_request). The output request contains the tags and the other values (e.g. ldst2_hw_lane_mask) that the 3rd stage needs to execute its tasks.
The cache controller update signal is the only signal in charge of writing the tag cache (through the tag write port) and the cache privileges.
Snoop signals
In parallel, another read can happen because of a snoop request from the cache controller. This read is done through a second tag read port. So, in the end, there are two read ports and one write port. The two read ports always have priority over the write port because the cache controller can sometimes read a cache line and simultaneously write to it. To prevent it from reading the same data it is writing, the read ports have priority over the write operation.
Command signals
The Cache Controller can send four types of command.
- If an INSTRUCTION command is sent by the Cache Controller, a pending instruction has to complete.
- If an UPDATE_INFO occurs, this stage updates the info and nothing is propagated to the next one.
- If an UPDATE_INFO_DATA command is sent by the Cache Controller, the current stage has to update the info, using the index, tag and privileges sent by the Cache Controller. Furthermore, this stage must forward this information and the store_value to the data cache stage.
- If an EVICT occurs, the next stage requires the evicted tag and the new index to construct the evicting address.
    assign cc_command_is_update_data = cc_update_ldst_valid & ( cc_update_ldst_command == CC_UPDATE_INFO_DATA | cc_update_ldst_command == CC_REPLACEMENT );
    assign cc_command_is_evict       = cc_update_ldst_valid & ( cc_update_ldst_command == CC_REPLACEMENT );
The first signal is asserted if the cache controller wants to read and modify a cache line at the same time, e.g. for a replacement event. This is the reason why the read ports have the highest priority.
The second signal is asserted only if a read has to be executed (e.g. a cache line eviction).
Thread wakeup signals
The last responsibility of this stage is thread wake-up.
    assign sleeping_thread_mask_next = ( sleeping_thread_mask | ldst3_thread_sleep ) & ( ~thread_wakeup_mask );
    assign thread_wakeup_mask        = thread_wakeup_oh & {`THREAD_NUMB{cc_wakeup}};
If a cache miss puts a thread to sleep (ldst3_thread_sleep), the cache controller asserts cc_wakeup when the cache transaction completes, in order to restart the thread.
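For illustration, the wake-up logic can be wrapped in a small self-contained module. The two assignments mirror the ones in this section; the module name, the THREADS parameter and the registered update with a synchronous reset are assumptions, since the source does not show how sleeping_thread_mask is stored.

```systemverilog
// Hypothetical sketch: sleep/wake-up bookkeeping, one bit per thread.
module thread_sleep_tracker #(
    parameter int THREADS = 4
) (
    input  logic               clk,
    input  logic               reset,
    input  logic [THREADS-1:0] ldst3_thread_sleep,       // miss detected for this thread
    input  logic [THREADS-1:0] thread_wakeup_oh,         // one-hot thread to wake up
    input  logic               cc_wakeup,                // cache transaction completed
    output logic [THREADS-1:0] sleeping_thread_mask,     // registered state
    output logic [THREADS-1:0] sleeping_thread_mask_next // combinational next state
);

    logic [THREADS-1:0] thread_wakeup_mask;

    // The one-hot wake-up vector only counts when cc_wakeup is asserted.
    assign thread_wakeup_mask        = thread_wakeup_oh & {THREADS{cc_wakeup}};
    assign sleeping_thread_mask_next = ( sleeping_thread_mask | ldst3_thread_sleep ) & ~thread_wakeup_mask;

    // Assumption: the mask is held in a register with synchronous reset.
    always_ff @( posedge clk ) begin
        if ( reset )
            sleeping_thread_mask <= '0;
        else
            sleeping_thread_mask <= sleeping_thread_mask_next;
    end

endmodule
```

Since the wake-up mask is applied last, a wake-up wins if a sleep request and a wake-up for the same thread arrive in the same cycle.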
Stage 3
This stage primarily receives the cached tag and privileges from the previous stage in order to perform hit/miss detection:
    for ( dcache_way = 0; dcache_way < `DCACHE_WAY; dcache_way++ )
        assign way_matched_oh[dcache_way] = ( ( ldst2_tag_read[dcache_way] == ldst2_address.tag && ( ldst2_privileges_read[dcache_way].can_write || ldst2_privileges_read[dcache_way].can_read ) ) && ldst2_valid );
    assign is_hit = |way_matched_oh;
In the case of an update (write only) or a replacement (both read and write), the read and write ports of the data cache are enabled accordingly.
    assign is_replacement = ldst2_update_valid && ldst2_evict_valid;
    assign is_update      = ldst2_update_valid && !ldst2_evict_valid;
    always_comb begin
        .....
        ....
        end else if ( is_update ) begin
            data_sram_read_enable  = 1'b0;
            data_sram_write_enable = {( `DCACHE_WIDTH/8 ){1'b1}};
            ....
        end else if ( is_replacement ) begin
            data_sram_read_enable  = 1'b1;
            data_sram_write_enable = {( `DCACHE_WIDTH/8 ){1'b1}};
            ....
        end
There is a second read port used to serve data snoop requests. This port is WRITE FIRST, so the Cache Controller receives the latest version of the data even when a store instruction is about to be performed on the same cache line.
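As an illustration of what WRITE FIRST means for such a port, the following is a generic sketch of a memory whose read port returns the data being written when both ports target the same address in the same cycle. The module, its parameters and the synchronous read are assumptions; it is not the project's SRAM model.

```systemverilog
// Hypothetical sketch: one write port and one WRITE FIRST read port.
module write_first_ram #(
    parameter int DEPTH = 64,
    parameter int WIDTH = 32
) (
    input  logic                       clk,
    input  logic                       write_enable,
    input  logic [$clog2(DEPTH)-1:0]   write_address,
    input  logic [WIDTH-1:0]           write_data,
    input  logic                       read_enable,
    input  logic [$clog2(DEPTH)-1:0]   read_address,
    output logic [WIDTH-1:0]           read_data
);

    logic [WIDTH-1:0] mem [DEPTH];

    always_ff @( posedge clk ) begin
        if ( write_enable )
            mem[write_address] <= write_data;

        if ( read_enable ) begin
            // WRITE FIRST: on an address collision, forward the new data
            // instead of the value currently stored in the array.
            if ( write_enable && write_address == read_address )
                read_data <= write_data;
            else
                read_data <= mem[read_address];
        end
    end

endmodule
```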
The main outputs of this stage signal whether one of these events happened: cache miss, eviction or flush. A further important signal puts the thread to sleep if a miss occurs.
    ldst3_thread_sleep[thread_idx] = ( is_load_miss || is_store_miss ) && ldst2_instruction.thread_id == thread_id_t'( thread_idx );
    ldst3_miss                     = is_load_miss || is_store_miss;
    ldst3_evict                    = is_replacement;
    ldst3_flush                    = ldst2_is_flush;
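The definitions of is_load_miss and is_store_miss are not shown here. One plausible reading, given that the can read and can write privileges determine hits and misses, is sketched below. It assumes that a load needs a matching tag with can_read and a store needs a matching tag with can_write; the module name, the port names and the is_load/is_store inputs are hypothetical.

```systemverilog
// Hypothetical sketch: privilege-based load/store miss detection.
module miss_detect #(
    parameter int WAYS  = 4,
    parameter int TAG_W = 20
) (
    input  logic                       valid,        // request in this stage is valid
    input  logic                       is_load,
    input  logic                       is_store,
    input  logic [TAG_W-1:0]           request_tag,
    input  logic [WAYS-1:0][TAG_W-1:0] tag_read,     // tags read from each way
    input  logic [WAYS-1:0]            can_read,     // per-way read privilege
    input  logic [WAYS-1:0]            can_write,    // per-way write privilege
    output logic                       is_load_miss,
    output logic                       is_store_miss
);

    logic [WAYS-1:0] tag_match;

    always_comb begin
        for ( int way = 0; way < WAYS; way++ )
            tag_match[way] = valid && ( tag_read[way] == request_tag );

        // Assumption: a load hits only on a way that is readable,
        // a store hits only on a way that is writable.
        is_load_miss  = valid && is_load  && !( |( tag_match & can_read  ) );
        is_store_miss = valid && is_store && !( |( tag_match & can_write ) );
    end

endmodule
```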