L2 and Directory cache controller


Revision as of 11:47, 25 June 2019

The Directory controller manages the L2 cache and the ownership of memory lines. It is organized as a distributed directory.

Introduction

This component is composed of three stages, each with specific tasks. This approach was taken to manage the complexity of the component and to ease the testing phase.

This component interfaces with the Network Interface in order to send/receive coherence requests.


Directory Controller

Stage 1

Stage 1 is responsible for issuing requests to the control logic. All requests are coherence requests or responses coming from the network interface.

TSHR Signals

The arbiter checks whether a pending request is already issued in the pipeline or ongoing in the TSHR (see TSHR Update Logic). The tag and set of each type of request are forwarded to the TSHR, which returns the lookup result to the arbiter. The returned TSHR entry is considered valid for that class of request if and only if its hit signal is asserted:

// Signals to TSHR
assign ni_request_address                             = ni_request.memory_address;
assign dc1_tshr_lookup_tag[REQUEST_TSHR_LOOKUP_PORT]  = ni_request_address.tag;
assign dc1_tshr_lookup_set[REQUEST_TSHR_LOOKUP_PORT]  = ni_request_address.index;

// Signals from TSHR
assign request_tshr_hit                               = tshr_lookup_hit[REQUEST_TSHR_LOOKUP_PORT];
assign request_tshr_index                             = tshr_lookup_index[REQUEST_TSHR_LOOKUP_PORT];
assign request_tshr_entry_info                        = tshr_lookup_entry_info[REQUEST_TSHR_LOOKUP_PORT];

Stall Protocol ROM

To comply with the coherence protocol, all incoming coherence requests on blocks whose coherence state is non-stable have to be stalled. This task is performed by a protocol ROM whose output signal, when asserted, stalls the issue of that coherence request. For example, when a block is in state S_D, a GetS, GetM or replacement request on the same block is stalled. To drive this signal, the protocol ROM takes as input the type of the request, the state of the block, and whether the request comes from the current owner of the block:

assign dpr_state              = tshr_lookup_entry_info[REQUEST_TSHR_LOOKUP_PORT].state;
assign dpr_message_type       = ni_request.packet_type;
assign dpr_from_owner         = ni_request.source == request_tshr_entry_info.owner;

dc_stall_protocol_rom stall_protocol_rom (
.input_state         ( dpr_state        ),
.input_request       ( dpr_message_type ),
.input_is_from_owner ( dpr_from_owner   ),
.dpr_output_stall    ( stall_request    )
);

Note that if the request does not come from the current owner it can be issued because it does not change the coherence state for the block (see Coherence Protocol).

Issuing a Request

In order to issue a request, it is required that:

  • TSHR is not full and the address of the request is not already in the TSHR;
  • the network interface is available;
  • further stages are not busy;

The following code shows the issuing logic for a replacement request; the other cases are similar:

can_issue_replacement_request = !rp_empty &&
   !tshr_full && !replacement_request_tshr_hit &&
   ! (( dc2_pending ) || ( dc3_pending )) &&
   ni_forwarded_request_network_available && ni_response_network_available;

A cache coherence request adds further constraints to those above:

  • the network interface provides a valid request;
  • if the request hits in the TSHR, the matching entry must not be valid;
  • if the request hits a valid TSHR entry, it must not have been stalled by the Protocol ROM (see Stall Signals).

The latter two are added in order to give priority to pending requests.

assign can_issue_request = ni_request_valid &&
   !tshr_full &&
   ( !request_tshr_hit ||
          ( request_tshr_hit && !request_tshr_entry_info.valid ) ||
          ( request_tshr_hit && request_tshr_entry_info.valid && !stall_request ) ) &&
   ! (( dc2_pending ) || ( dc3_pending )) &&
   ni_forwarded_request_network_available && ni_response_network_available;

Finally, responses are never stalled: they are processed whenever the network interface outputs a response:

assign can_issue_response = ni_response_valid;
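
The three issue conditions above can be exercised outside the RTL with a small behavioural model. The following is a minimal Python sketch, not the actual implementation: the field names mirror the signals shown above, while the `IssueInputs` container and the `_pipeline_and_network_ready` helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class IssueInputs:
    """Snapshot of the Stage 1 signals used by the issue conditions."""
    rp_empty: bool = True                       # replacement queue is empty
    tshr_full: bool = False
    replacement_request_tshr_hit: bool = False
    ni_request_valid: bool = False
    request_tshr_hit: bool = False
    request_tshr_entry_valid: bool = False      # request_tshr_entry_info.valid
    stall_request: bool = False                 # output of the stall protocol ROM
    dc2_pending: bool = False
    dc3_pending: bool = False
    ni_forwarded_request_network_available: bool = True
    ni_response_network_available: bool = True
    ni_response_valid: bool = False


def _pipeline_and_network_ready(s):
    # Terms shared by the replacement and the coherence request conditions.
    return (not (s.dc2_pending or s.dc3_pending)
            and s.ni_forwarded_request_network_available
            and s.ni_response_network_available)


def can_issue_replacement_request(s):
    return (not s.rp_empty
            and not s.tshr_full
            and not s.replacement_request_tshr_hit
            and _pipeline_and_network_ready(s))


def can_issue_request(s):
    # Same three-way OR as in the RTL expression, written out term by term.
    tshr_ok = (not s.request_tshr_hit
               or (s.request_tshr_hit and not s.request_tshr_entry_valid)
               or (s.request_tshr_hit and s.request_tshr_entry_valid
                   and not s.stall_request))
    return (s.ni_request_valid
            and not s.tshr_full
            and tshr_ok
            and _pipeline_and_network_ready(s))


def can_issue_response(s):
    # Responses are never stalled.
    return s.ni_response_valid
```

For instance, `can_issue_request(IssueInputs(ni_request_valid=True))` holds, while setting `request_tshr_hit`, `request_tshr_entry_valid` and `stall_request` together blocks the request.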

Requests Scheduler

Once the issuing conditions have been verified, two or more requests could be ready to be scheduled at the same time, so a fixed-priority scheduler is used. Its priorities are set as below, highest first:

  1. replacement request
  2. coherence response
  3. coherence request

This ordering ensures coherence is preserved. Once a type of request is scheduled this block drives the output signals for the second stage.
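
The fixed-priority selection described above can be sketched as a simple cascade. This is an illustrative Python model under the assumption that the three `can_issue_*` conditions are already computed; the function name `schedule` and the returned labels are hypothetical.

```python
def schedule(can_issue_replacement_request, can_issue_response, can_issue_request):
    """Return the class of request granted this cycle, or None if nothing can issue."""
    if can_issue_replacement_request:   # priority 1
        return "replacement"
    if can_issue_response:              # priority 2
        return "response"
    if can_issue_request:               # priority 3
        return "request"
    return None
```

When all three conditions hold at once, only the replacement request is granted; the others wait for a later cycle.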

L2 Tag & Directory State Cache

Finally, a cache memory stores L2 tags and their directory state (recall that the directory is inclusive). The directory state is updated whenever a request is processed by Stage 3 and the protocol modifies it.

Stage 2

Stage 2 manages L2 Data and Info caches, and forwards signals from Stage 1 to Stage 3. It also contains all related logic for managing cache hits and block replacement. The policy used to replace a block is LRU (Least Recently Used).

The L2 cache contains cache data along with coherence information, i.e. the owner and sharers list (the directory state is included in L2 Directory State Cache).

Stage 3 updates LRU and cache data once the request is processed.

TSHR

The Transaction Status Handling Register (TSHR) tracks ongoing coherence transactions on scheduled memory blocks; whenever a memory line is in the TSHR it is in a non-stable state.

A TSHR entry comprises the following information:

| Valid | Address | State | Sharers list | Owner |

  • Valid: entry is valid
  • Address: entry memory address
  • State: current coherence state
  • Sharers list: list of sharers for the block (one-hot encoded)
  • Owner: block owner

See MSHR for details about this module implementation.
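
As an illustration of the entry layout, the sketch below models a TSHR entry in Python. It assumes the sharers list is stored as a bit vector with one bit per tile; the class and helper names (`TshrEntry`, `add_sharer`, `sharers`, `first_empty_index`) are hypothetical and do not come from the RTL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TshrEntry:
    valid: bool = False
    address: int = 0
    state: Optional[str] = None   # coherence state, meaningful only when valid
    sharers_list: int = 0         # assumption: one bit per tile
    owner: int = 0


def add_sharer(entry, tile_id):
    """Set the bit associated with tile_id in the sharers list."""
    entry.sharers_list |= 1 << tile_id


def sharers(entry, tile_count):
    """Decode the sharers list into a list of tile identifiers."""
    return [t for t in range(tile_count) if (entry.sharers_list >> t) & 1]


def first_empty_index(entries):
    """Index of the first entry that is not valid, or None if the TSHR is full."""
    for index, entry in enumerate(entries):
        if not entry.valid:
            return index
    return None
```

With this encoding, a block shared by tiles 1 and 5 has `sharers_list == 0b100010`.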

Stage 3

Stage 3 is responsible for the actual execution of requests based on the protocol ROM. Once a request is processed, this module issues signals to the units in the previous stages in order to update information and data in the caches properly. Every group of signals to a particular unit is managed by a subsystem, each one represented in the picture below. Each subsystem is simply combinational logic that converts the signals from the protocol ROM into the proper commands for the corresponding unit.

[Figure: DC stage 3]

Current State Selector

Before a coherence request is processed the correct source for cache block state has to be chosen. These data can be fetched from:

  • cache memory;
  • TSHR;
  • replacement queue;

The following code shows how the control logic selects the information for the issued request:

always_comb begin
	if ( dc2_message_tshr_hit ) begin
		current_address      = dc2_message_address;
		current_state        = dc2_message_tshr_entry_info.state;
		current_sharers_list = dc2_message_tshr_entry_info.sharers_list;
		current_owner        = dc2_message_tshr_entry_info.owner;
	end else if ( dc2_message_cache_hit ) begin
		current_address      = dc2_message_address;
		current_state        = dc2_message_cache_state;
		current_sharers_list = dc2_message_cache_sharers_list;
		current_owner        = dc2_message_cache_owner;
	end else if (is_replacement) begin
		current_address      = dc2_message_address;
		current_state        = dc2_replacement_state;
		current_sharers_list = dc2_replacement_sharers_list;
		current_owner        = dc2_replacement_owner;
	end else begin
		current_address      = dc2_message_address;
		current_state        = {`DIRECTORY_STATE_WIDTH{1'b0}}; // State N
		current_sharers_list = {`TILE_COUNT{1'b0}};
		current_owner        = tile_address_t'(TILE_MEMORY_ID);
	end
end

As shown in the above logic, if a TSHR hit occurs then the most up-to-date information for that block is retrieved from the TSHR. Otherwise, if a cache hit occurs, the required information is fetched from the L2 cache. In case of a replacement, it is retrieved from the replacement output signals of the previous stage. If none of the conditions above is met, the cache block is considered to be in state N.
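
The priority among the three sources can be checked in isolation with a short behavioural sketch. The Python function below only mirrors the order of the `if`/`else if` chain; the name `select_source` and the returned labels are hypothetical.

```python
def select_source(dc2_message_tshr_hit, dc2_message_cache_hit, is_replacement):
    """Return which source provides state, sharers list and owner for the request."""
    if dc2_message_tshr_hit:
        return "tshr"          # most up-to-date information
    if dc2_message_cache_hit:
        return "cache"         # L2 cache
    if is_replacement:
        return "replacement"   # replacement signals from the previous stage
    return "default_N"         # block treated as being in state N
```

A TSHR hit always wins, even when the cache also hits: `select_source(True, True, False)` returns `"tshr"`.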

Protocol ROM

This module implements the coherence protocol as represented in the figure below. It takes as input the current state and the request type and decodes the next actions.

[Figure: MSI_DC]

The coherence protocol used is MSI, with some changes due to the directory's inclusivity. In particular, a new stable state, N, has been added, meaning that the block is not cached in the directory and has to be fetched from the main memory. The N state is necessary because the stable state I means that the block is cached only by the directory controller: it is not present in any L1 cache, but the directory still holds information on the block. In contrast, the directory has no information on blocks in state N.

Furthermore, new non-stable states have been added:

  • state MN_A, in which the directory controller is evicting a block that was in state M and is waiting for an acknowledge message (MC_Ack) from the main memory. This may happen after a replacement request has been issued for that block. Further requests on the same block are stalled until data has been received from the block owner and sent to the memory. Note that the block is invalidated, so a new access to the main memory is necessary;
  • state SN_A, in which the directory controller is evicting a block that was in state S and is waiting for an acknowledge message (MC_Ack) from the main memory. This state is similar to MN_A;
  • state NS_D, in which the directory controller is waiting for data coming from the memory. This may occur after an Fwd-getS request on a block in state N. Further requests on the same block are stalled until data has been received from the main memory and sent to the requestor(s).

TSHR Update Logic

The TSHR can be updated in three different ways:

  • entry allocation;
  • entry deallocation;
  • entry update.

The TSHR stores the data of cache lines whose coherence transactions are pending, that is, cache lines in a non-stable state. An entry is therefore allocated every time a cache line moves from a stable state to a non-stable state. Conversely, an entry is deallocated when the cache line enters a stable state. Finally, an entry is updated when something in the TSHR line has to change but the cache line's state is still non-stable.

assign tshr_allocate     =  current_state_is_stable & !next_state_is_stable;
assign tshr_deallocate   = !current_state_is_stable &  next_state_is_stable; 
assign tshr_update       = !current_state_is_stable & !next_state_is_stable & coherence_update_info_en;

Once one of the previous signals is asserted, the proper information is passed to the TSHR. The remaining controls are simply sanity checks that verify all signals are set properly. [TODO: verify]

Note that if the operation is an entry allocation then the index of the first empty TSHR line is passed. This index is obtained directly from the TSHR unit. Remember that at this point there is surely an empty TSHR line, otherwise the request would not have been issued (see Issue Signals).

If the operation is an update or a deallocation, the index is obtained from Stage 1 (through Stage 2), in which the TSHR is queried for the index of the entry associated with the current request.

assign dc3_update_tshr_index = tshr_allocate ? tshr_empty_index : dc2_message_tshr_index;
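
The three TSHR operations follow directly from whether the current and next states are stable. The Python sketch below is a behavioural illustration only. It assumes that the stable states are M, S, I and N, as described in the Protocol ROM section, and that every other state is non-stable; the function names are hypothetical.

```python
STABLE_STATES = {"M", "S", "I", "N"}   # assumption: all other states are non-stable


def is_stable(state):
    return state in STABLE_STATES


def tshr_controls(current_state, next_state, coherence_update_info_en):
    """Return (tshr_allocate, tshr_deallocate, tshr_update) for a state transition."""
    cur = is_stable(current_state)
    nxt = is_stable(next_state)
    tshr_allocate = cur and not nxt
    tshr_deallocate = (not cur) and nxt
    tshr_update = (not cur) and (not nxt) and coherence_update_info_en
    return tshr_allocate, tshr_deallocate, tshr_update


def update_tshr_index(tshr_allocate, tshr_empty_index, dc2_message_tshr_index):
    """An allocation uses the first empty line; other operations use the looked-up index."""
    return tshr_empty_index if tshr_allocate else dc2_message_tshr_index
```

For example, a transition from N to NS_D allocates an entry, and a later transition from a non-stable state back to a stable one deallocates it. A stable-to-stable transition asserts none of the three signals.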

Cache Update Logic

The cache can be updated in three different ways:

  • entry allocation;
  • entry deallocation;
  • entry update.

Unlike the TSHR, the cache stores the data of cache lines whose coherence transactions are completed, that is, cache lines in a stable state. An entry is therefore allocated every time the cache line moves to a stable state, except when that state is N (that is, when the cache line has to be invalidated). Conversely, an entry is deallocated when the cache line enters a non-stable state, that is, when its data are written to the TSHR (see TSHR Update Logic). Finally, an entry is updated when something in the cache line has to change and the cache line's state is already stable.

assign allocate_cache    = next_state_is_stable & ( coherence_update_info_en | dpr_output.store_data ) & ~(tshr_deallocate & dpr_output.invalidate_cache_way) & ~update_cache; 
assign deallocate_cache  = tshr_allocate & dc2_message_cache_hit ;
assign update_cache      = current_state_is_stable & next_state_is_stable & dc2_message_cache_hit & ( coherence_update_info_en | dpr_output.store_data );

Once one of the previous signals is asserted, the proper information is passed to the cache memory.
Note that the remaining controls are simply sanity checks that verify all signals are set properly.
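
The interaction between the three cache signals is easier to see with concrete inputs. The following Python sketch mirrors the assignments above; it assumes that `tshr_allocate` and `tshr_deallocate` are derived from the stability flags as in TSHR Update Logic, and the function name `cache_controls` is hypothetical.

```python
def cache_controls(current_state_is_stable, next_state_is_stable,
                   dc2_message_cache_hit, coherence_update_info_en,
                   store_data, invalidate_cache_way):
    """Return (allocate_cache, deallocate_cache, update_cache)."""
    tshr_allocate = current_state_is_stable and not next_state_is_stable
    tshr_deallocate = (not current_state_is_stable) and next_state_is_stable
    has_new_info = coherence_update_info_en or store_data

    update_cache = (current_state_is_stable and next_state_is_stable
                    and dc2_message_cache_hit and has_new_info)
    allocate_cache = (next_state_is_stable and has_new_info
                      and not (tshr_deallocate and invalidate_cache_way)
                      and not update_cache)
    deallocate_cache = tshr_allocate and dc2_message_cache_hit
    return allocate_cache, deallocate_cache, update_cache
```

A stable-to-stable transition on a cache hit yields an update rather than an allocation, and a transition that leaves the TSHR with `invalidate_cache_way` set allocates nothing, which corresponds to the move to state N.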

Replacement Logic

When a replacement request is issued, its cache block is not invalidated, because the same cache line has already been replaced with a new valid one by a previous coherence request. The replaced cache block is placed in a replacement queue, where it stays until a new request can be issued from Stage 1. This replacement request has maximum priority and is scheduled before all other pending requests (see Requests Scheduler).

This module therefore manages the replacement queue and enqueues a cache block when there is a replacement, that is, when the current coherence request needs to store its info in the cache memory ((allocate_cache || update_cache) & !dc2_message_cache_hit), is not itself a replacement request (!is_replacement), and does not need the TSHR in order to complete (!deallocate_cache, see Cache Update Logic):

assign do_replacement  = dc2_message_valid && dc2_message_cache_valid
       && ((allocate_cache || update_cache) && !deallocate_cache) 
       && !is_replacement 
       && !dc2_message_cache_hit;
assign dc3_replacement_enqueue = dc2_message_valid && do_replacement;
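
The enqueue condition can be modelled together with a simple FIFO. The Python sketch below is illustrative only: the reading of `dc2_message_cache_valid` as "the selected way holds a valid line" is an assumption, and `enqueue_if_needed` and the `victim` argument are hypothetical names.

```python
from collections import deque


def do_replacement(dc2_message_valid, dc2_message_cache_valid,
                   allocate_cache, update_cache, deallocate_cache,
                   is_replacement, dc2_message_cache_hit):
    """Mirror of the do_replacement assignment shown above."""
    return (dc2_message_valid and dc2_message_cache_valid
            and (allocate_cache or update_cache) and not deallocate_cache
            and not is_replacement
            and not dc2_message_cache_hit)


def enqueue_if_needed(replacement_queue, victim, **signals):
    """Append the evicted block to the queue when a replacement is required."""
    if do_replacement(**signals):
        replacement_queue.append(victim)
        return True
    return False


# Usage: a miss that allocates a line holding valid data evicts the old block.
queue = deque()
enqueued = enqueue_if_needed(
    queue, victim={"address": 0x40, "state": "M"},
    dc2_message_valid=True, dc2_message_cache_valid=True,
    allocate_cache=True, update_cache=False, deallocate_cache=False,
    is_replacement=False, dc2_message_cache_hit=False)
```

A request that is itself a replacement, or that hits in the cache, never enqueues a block.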

Message Generator

This module sends a forward request or a response message to the network interface whenever one is available. Messages are formatted according to the coherence protocol.

Note that this block manages instruction cache misses as well. In this case requests are forwarded directly to memory without passing through coherence logic.

See Also

Coherence