L2 and Directory cache controller

From NaplesPU Documentation

The '''Directory controller''' manages the L2 cache and the ownership of memory lines; it is organized as a distributed directory. Because the L2 is inclusive, a line that is absent from the L2 is absent from every L1 cache as well, so the directory information is kept together with the corresponding L2 line.

== Introduction ==
This component is composed of three stages, each with its own tasks. This split keeps the complexity of the component manageable and eases testing.

The component interfaces with the Network Interface to send and receive coherence requests. A further unit shared across the stages, the TSHR (Transaction Status Handling Register), tracks pending requests.

[[File:L2_cache.jpg|Directory Controller]]

== Stage 1 ==
Stage 1 is responsible for issuing requests to the control logic. Requests are coherence requests and responses coming from the network interface, plus replacement requests generated internally by the controller (see [[L2 and Directory cache controller#Replacement Logic | Replacement Logic]]). At most one request is processed in the pipeline at a time.

=== TSHR Signals ===
The arbiter checks whether a pending request is already issued in the pipeline or ''ongoing'' in the TSHR (see [[L2 and Directory cache controller#TSHR Update Logic | TSHR Update Logic]]). The tag and set of each type of request (request, response, replacement) are forwarded to the TSHR, which answers with a hit signal, an entry index and the entry information for each lookup port.
The index and entry information of a lookup port are meaningful if and only if the corresponding hit signal is asserted:

 // Signals to TSHR
 assign ni_request_address                             = ni_request.memory_address;
 assign dc1_tshr_lookup_tag[REQUEST_TSHR_LOOKUP_PORT]  = ni_request_address.tag;
 assign dc1_tshr_lookup_set[REQUEST_TSHR_LOOKUP_PORT]  = ni_request_address.index;
 
 // Signals from TSHR
 assign request_tshr_hit                               = tshr_lookup_hit[REQUEST_TSHR_LOOKUP_PORT];
 assign request_tshr_index                             = tshr_lookup_index[REQUEST_TSHR_LOOKUP_PORT];
 assign request_tshr_entry_info                        = tshr_lookup_entry_info[REQUEST_TSHR_LOOKUP_PORT];

=== Stall Protocol ROM ===
To comply with the coherence protocol, incoming coherence requests on blocks whose coherence state is non-stable have to be stalled. This task is performed by a protocol ROM whose output signal, when asserted, stalls the issue of the coherence request. For example, when a block is in state S_D, a ''GetS'', ''GetM'' or ''replacement'' request on the same block is stalled.
To compute this signal, the protocol ROM takes as input the type of the request, the state of the block, and whether the request comes from the current owner of the block:

 assign dpr_state        = tshr_lookup_entry_info[REQUEST_TSHR_LOOKUP_PORT].state;
 assign dpr_message_type = ni_request.packet_type;
 assign dpr_from_owner   = ni_request.source == request_tshr_entry_info.owner;
 
 dc_stall_protocol_rom stall_protocol_rom (
     .input_state         ( dpr_state        ),
     .input_request       ( dpr_message_type ),
     .input_is_from_owner ( dpr_from_owner   ),
     .dpr_output_stall    ( stall_request    )
 );

Note that if the request does not come from the current owner it can be issued, because it does not change the coherence state of the block (see [[L2 and Directory cache controller#Protocol ROM | Coherence Protocol]]).

=== Issuing a Request ===
In order to issue a request, it is required that:

* the TSHR is not full and the address of the request is not already in the TSHR;
* the network interface is available;
* the following stages are not busy.

The following code shows the issuing logic for a replacement request, which additionally requires the replacement queue not to be empty; the other cases are similar:

 assign can_issue_replacement_request = !rp_empty &&
     !tshr_full && !replacement_request_tshr_hit &&
     ! (( dc2_pending ) || ( dc3_pending )) &&
     ni_forwarded_request_network_available && ni_response_network_available;

A coherence request must also be flagged as valid by the network interface, and its TSHR condition is relaxed. The request can be issued if one of the following holds:

* the address of the request misses in the TSHR;
* the address hits in the TSHR, but the matching entry is not valid;
* the address hits a valid TSHR entry, and the request is not stalled by the protocol ROM (see [[L2 and Directory cache controller#Stall Protocol ROM | Stall Signals]]).

The last two conditions let requests that belong to an already pending transaction proceed, unless the protocol stalls them.

 assign can_issue_request = ni_request_valid &&
     !tshr_full &&
     ( !request_tshr_hit ||
           ( request_tshr_hit && !request_tshr_entry_info.valid ) ||
           ( request_tshr_hit && request_tshr_entry_info.valid && !stall_request ) ) &&
     ! (( dc2_pending ) || ( dc3_pending )) &&
     ni_forwarded_request_network_available && ni_response_network_available;

Finally, responses are never stalled; they are processed whenever the network interface outputs a valid response:

 assign can_issue_response = ni_response_valid;

=== Requests Scheduler ===
Once the issuing conditions have been evaluated, two or more requests may be ready at the same time, so a fixed-priority scheduler selects among them. Priorities, from highest to lowest, are:

# replacement request
# coherence response
# coherence request

A pending entry in the replacement queue is therefore always served first, which preserves the order of operations and hence coherence. Once a request is scheduled, this block drives the output signals towards the second stage.

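The priority order above can be sketched as a simple if/else chain. This is only an illustrative sketch: the <code>can_issue_*</code> inputs follow the names used in this section, while the module name, the enum type and its encoding are assumptions made for the example and are not taken from the NaplesPU sources.

```systemverilog
// Hypothetical selection codes; names and encoding are assumptions.
typedef enum logic [1:0] {
    SCHED_NONE        = 2'd0,
    SCHED_REPLACEMENT = 2'd1,
    SCHED_RESPONSE    = 2'd2,
    SCHED_REQUEST     = 2'd3
} sched_sel_sketch_t;

// Illustrative fixed-priority scheduler: the first asserted
// can_issue_* input, in priority order, wins.
module dc_request_scheduler_sketch (
    input  logic              can_issue_replacement_request,
    input  logic              can_issue_response,
    input  logic              can_issue_request,
    output sched_sel_sketch_t selected
);
    always_comb begin
        if ( can_issue_replacement_request )
            selected = SCHED_REPLACEMENT;   // highest priority
        else if ( can_issue_response )
            selected = SCHED_RESPONSE;
        else if ( can_issue_request )
            selected = SCHED_REQUEST;       // lowest priority
        else
            selected = SCHED_NONE;          // nothing to issue this cycle
    end
endmodule
```

An if/else chain keeps the priority explicit and synthesizes to a small priority encoder, which is enough when only three request classes compete.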
=== L2 Tag & Directory State Cache ===
Finally, a cache memory stores the L2 tags and their directory state (recall that the directory is inclusive). The directory state is updated whenever a request processed by Stage 3 modifies it according to the protocol; the write and update signals are fed back from Stage 3, which is one of the main reasons why only one request is processed at a time.

== Stage 2 ==
Stage 2 manages the L2 Data and Info caches, and forwards signals from Stage 1 to Stage 3. It also contains the logic for detecting cache hits and selecting the block to replace. The replacement policy is LRU (Least Recently Used).

The L2 cache holds the cached data along with coherence information, i.e. the owner and the sharers list (the directory state is stored in the [[L2 and Directory cache controller#L2 Tag & Directory State Cache| L2 Tag & Directory State Cache]]).

Reads use the index coming from Stage 1 and the way selected by the hit/LRU logic, while writes are controlled by Stage 3, which updates the LRU and the cache data once the request has been processed.

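The hit detection and way selection can be sketched as follows. An earlier revision of this page showed a per-way tag comparison producing a one-hot hit vector, with the selected way being the hit way on a hit and the LRU way otherwise. The module below reproduces that idea in a self-contained form. The module name, port names, parameters and default widths are assumptions for illustration, and it assumes at least two ways.

```systemverilog
// Illustrative hit detection and way selection (assumes WAYS >= 2).
module dc_stage2_hit_sketch #(
    parameter int WAYS      = 4,   // assumed associativity
    parameter int TAG_WIDTH = 20   // assumed tag width
) (
    input  logic [WAYS-1:0]                cache_valid,
    input  logic [WAYS-1:0][TAG_WIDTH-1:0] cache_tag,
    input  logic [TAG_WIDTH-1:0]           request_tag,
    input  logic [$clog2(WAYS)-1:0]        lru_way,
    output logic                           hit,
    output logic [$clog2(WAYS)-1:0]        selected_way
);
    logic [WAYS-1:0]         hit_oh;
    logic [$clog2(WAYS)-1:0] hit_idx;

    // One comparator per way: a way hits if it is valid and its tag matches.
    for ( genvar way_idx = 0; way_idx < WAYS; way_idx++ ) begin : g_way_compare
        assign hit_oh[way_idx] = cache_valid[way_idx] &&
                                 ( cache_tag[way_idx] == request_tag );
    end

    assign hit = |hit_oh;

    // One-hot to binary conversion of the hit vector.
    always_comb begin
        hit_idx = '0;
        for ( int i = 0; i < WAYS; i++ )
            if ( hit_oh[i] )
                hit_idx = i[$clog2(WAYS)-1:0];
    end

    // On a hit use the matching way, otherwise the replacement candidate.
    assign selected_way = hit ? hit_idx : lru_way;
endmodule
```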
=== TSHR ===
The ''Transaction Status Handling Register'' (TSHR) tracks ongoing coherence transactions on scheduled memory blocks; whenever a memory line is in the TSHR, it is in a non-stable state.

A TSHR entry comprises the following information:

{| class="wikitable"
|-
! Valid
! Address
! State
! Sharers list
! Owner
|}

* Valid: the entry is valid
* Address: memory address of the entry
* State: current coherence state
* Sharers list: list of sharers for the block (bit vector, one bit per tile)
* Owner: block owner

The TSHR works in the same way as the MSHR; see [[L1_Cache_Controller#Implementation details |MSHR]] for details about the implementation of this module.

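The fields listed above map naturally onto a packed struct. The following is a minimal sketch only: the package name, type name and all widths are assumptions chosen for the example, not the definitions used in the actual sources.

```systemverilog
package tshr_sketch_pkg;
    // Assumed sizes, for illustration only.
    localparam int ADDRESS_WIDTH_SKETCH = 32;
    localparam int STATE_WIDTH_SKETCH   = 4;
    localparam int TILE_COUNT_SKETCH    = 8;

    // Hypothetical layout of a TSHR entry.
    typedef struct packed {
        logic                                 valid;        // entry is in use
        logic [ADDRESS_WIDTH_SKETCH-1:0]      address;      // tracked memory address
        logic [STATE_WIDTH_SKETCH-1:0]        state;        // current (non-stable) coherence state
        logic [TILE_COUNT_SKETCH-1:0]         sharers_list; // one bit per tile
        logic [$clog2(TILE_COUNT_SKETCH)-1:0] owner;        // identifier of the owning tile
    } tshr_entry_sketch_t;
endpackage
```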
== Stage 3 ==
Stage 3 is responsible for the actual execution of requests, based on the protocol ROM. Once a request is processed, this module issues signals to the units in the previous stages in order to update information and data in the caches properly. Each group of signals directed to a particular unit is managed by a subsystem, represented in the picture below. Each subsystem is simply combinational logic that converts the outputs of the protocol ROM into the proper commands for the relative unit: TSHR update, cache update, LRU update and message generation.

[[File:Stage3.png|800px|DC stage 3]]

=== Current State Selector ===
Before a coherence request is processed, the correct source for the state of the cache block has to be chosen. This information can be fetched from:

* the cache memory;
* the TSHR;
* the replacement queue.

The following code shows how the control logic selects the information for the issued request:

 always_comb begin
     if ( dc2_message_tshr_hit ) begin
         current_address      = dc2_message_address;
         current_state        = dc2_message_tshr_entry_info.state;
         current_sharers_list = dc2_message_tshr_entry_info.sharers_list;
         current_owner        = dc2_message_tshr_entry_info.owner;
     end else if ( dc2_message_cache_hit ) begin
         current_address      = dc2_message_address;
         current_state        = dc2_message_cache_state;
         current_sharers_list = dc2_message_cache_sharers_list;
         current_owner        = dc2_message_cache_owner;
     end else if ( is_replacement ) begin
         current_address      = dc2_message_address;
         current_state        = dc2_replacement_state;
         current_sharers_list = dc2_replacement_sharers_list;
         current_owner        = dc2_replacement_owner;
     end else begin
         current_address      = dc2_message_address;
         current_state        = {`DIRECTORY_STATE_WIDTH{1'b0}}; // State N
         current_sharers_list = {`TILE_COUNT{1'b0}};
         current_owner        = tile_address_t'(TILE_MEMORY_ID);
     end
 end

As shown in the logic above, the sources are checked in fixed priority order. If a TSHR hit occurs, the most up-to-date information for the block is retrieved from the TSHR. Otherwise, if a cache hit occurs, the required information is fetched from the L2 cache. In case of a replacement, it is taken from the replacement output signals of the previous stage. If none of these conditions is met, the cache block is considered to be in state N.

=== Protocol ROM ===
This module implements the coherence protocol as represented in the figure below. It takes as input the current state and the request type, and decodes the next actions.

[[File:MSI_Protocol_dc-rom_new.png|1600px]]

The coherence protocol used is MSI, with some changes due to the inclusivity of the directory. In particular, a new stable state ''N'' has been added, meaning that the block ''is not'' cached in the directory and has to be fetched from the main memory. The N state is needed because the stable state ''I'' means that the block ''is'' cached ''only'' by the directory controller and is not present in any L1 cache, so the directory still holds information on the block, whereas it holds no information on blocks in state N.

Furthermore, new non-stable states have been added:

* state '''MN_A''', in which the directory controller is evicting a block that was in state M and is waiting for an acknowledge message (MC_Ack) from the main memory. This may happen after a replacement request has been issued for that block. Further requests on the same block are ''stalled'' until the data has been received from the block owner and sent to the memory. Note that the block is ''invalidated'', so a new access to the main memory is necessary;
* state '''SN_A''', in which the directory controller is evicting a block that was in state S and is waiting for an acknowledge message (MC_Ack) from the main memory. It is similar to the '''MN_A''' state;
* state '''NS_D''', in which the directory controller is waiting for data coming from the memory. This may occur after a Fwd-getS request on a block in state N. Further requests on the same block are stalled until the data has been received from the main memory and sent to the requestor(s).

For further details about the memory coherence protocol, please refer to:
* ''[[MSI Protocol]]''

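The states named in this section, and the stable/non-stable distinction used by the update logic below, can be sketched as an enumeration with a helper predicate. This sketch is partial and hypothetical: it lists only the states mentioned on this page, and every encoding other than N (which the Current State Selector code shows as all zeros) is an assumption. The package, type and function names are invented for the example.

```systemverilog
package dc_state_sketch_pkg;
    // Partial list of directory states; encodings other than N are assumed.
    typedef enum logic [3:0] {
        STATE_N    = 4'd0, // not cached by the directory (all-zero encoding)
        STATE_I    = 4'd1, // cached only by the directory
        STATE_S    = 4'd2, // shared by one or more L1 caches
        STATE_M    = 4'd3, // owned and possibly modified by one L1 cache
        STATE_MN_A = 4'd4, // evicting an M block, waiting for MC_Ack
        STATE_SN_A = 4'd5, // evicting an S block, waiting for MC_Ack
        STATE_NS_D = 4'd6  // waiting for data from main memory
    } dc_state_sketch_t;

    // A state is stable when no coherence transaction is ongoing on the block.
    function automatic logic is_stable ( dc_state_sketch_t state );
        return state inside { STATE_N, STATE_I, STATE_S, STATE_M };
    endfunction
endpackage
```

A predicate of this kind, applied to the current and next state, would yield signals such as <code>current_state_is_stable</code> and <code>next_state_is_stable</code>.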
=== TSHR Update Logic ===
The TSHR can be updated in three ways:

* entry allocation;
* entry deallocation;
* entry update.

The TSHR stores the information of cache lines whose coherence transactions are ''ongoing'', i.e. lines in a non-stable state. An entry is therefore allocated every time a line moves from a stable to a non-stable state and, conversely, deallocated whenever a line in a non-stable state enters a stable state. Finally, an entry is updated when some of its information changes while the line remains in a non-stable state:

 assign tshr_allocate    =  current_state_is_stable & !next_state_is_stable;
 assign tshr_deallocate  = !current_state_is_stable &  next_state_is_stable;
 assign tshr_update      = !current_state_is_stable & !next_state_is_stable & coherence_update_info_en;

For an allocation, the index of the first free entry is provided directly by the TSHR module. At this point a free entry is guaranteed to exist, otherwise the request would not have been issued (see [[L2 and Directory cache controller#Issuing a Request | Issuing a Request]]), since all pending requests are stalled when the TSHR is full.

For an update or a deallocation, the index of the entry is forwarded by Stage 1 (through Stage 2):

 assign dc3_update_tshr_index = tshr_allocate ? tshr_empty_index : dc2_message_tshr_index;

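The signal <code>coherence_update_info_en</code> used above indicates that the protocol ROM changes some coherence information. An earlier revision of this page defined it as the OR of a state change with the owner and sharers-list commands of the ROM output. The sketch below wraps that expression in a self-contained module; the package, the struct type, the module name and the state width are assumptions, and the real <code>dpr_output</code> carries more fields than shown.

```systemverilog
package dc_update_sketch_pkg;
    localparam int STATE_WIDTH_SKETCH = 4; // assumed width

    // Hypothetical subset of the protocol ROM output.
    typedef struct packed {
        logic [STATE_WIDTH_SKETCH-1:0] next_state;
        logic owner_clear;
        logic owner_set_requestor;
        logic sharers_add_owner;
        logic sharers_add_requestor;
        logic sharers_clear;
        logic sharers_remove_requestor;
    } dpr_output_sketch_t;
endpackage

module coherence_update_info_sketch
    import dc_update_sketch_pkg::*;
(
    input  logic [STATE_WIDTH_SKETCH-1:0] current_state,
    input  dpr_output_sketch_t            dpr_output,
    output logic                          coherence_update_info_en
);
    assign coherence_update_info_en =
        ( current_state != dpr_output.next_state ) |     // state change
        dpr_output.owner_clear           |               // owner change
        dpr_output.owner_set_requestor   |
        dpr_output.sharers_add_owner     |               // sharers list change
        dpr_output.sharers_add_requestor |
        dpr_output.sharers_clear         |
        dpr_output.sharers_remove_requestor;
endmodule
```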
=== Cache Update Logic ===
The cache can be updated in three ways:

* entry allocation;
* entry deallocation;
* entry update.

Unlike the TSHR, the cache stores the information of cache lines whose coherence transactions are ''completed'', so a tracked cache line is in a stable state. An entry is therefore allocated every time a line moves from a non-stable to a stable state and is not already in the cache. Conversely, an entry is deallocated whenever a cached line enters a non-stable state, since it is then tracked in the TSHR (see [[L2 and Directory cache controller#TSHR Update Logic | TSHR Update Logic]]). Finally, an update occurs whenever the protocol ROM changes some information of a cache line that stays in a stable state:

 assign allocate_cache    = next_state_is_stable & ( coherence_update_info_en | dpr_output.store_data ) & ~(tshr_deallocate & dpr_output.invalidate_cache_way) & ~update_cache;
 assign deallocate_cache  = tshr_allocate & dc2_message_cache_hit;
 assign update_cache      = current_state_is_stable & next_state_is_stable & dc2_message_cache_hit & ( coherence_update_info_en | dpr_output.store_data );

=== Replacement Logic ===
Whenever a replacement occurs, the current cache block is invalidated but its entry is not freed, because the same cache line is immediately overwritten with the new valid block brought in by the coherence request that maps to the same set. The replaced block is pushed into a replacement queue until Stage 1 issues it; replacement requests have the highest priority and are scheduled as soon as they are pending (see [[L2 and Directory cache controller#Requests Scheduler | Requests Scheduler]]).

This module manages the replacement queue and enqueues a cache block whenever a replacement is needed. This happens when the current coherence request has to store data and info in the cache memory, is not itself a replacement (''!is_replacement''), and misses in the cache:

 assign do_replacement = dc2_message_valid && dc2_message_cache_valid
         && ((allocate_cache || update_cache) && !deallocate_cache)
         && !is_replacement
         && !dc2_message_cache_hit;
 
 assign dc3_replacement_enqueue = dc2_message_valid && do_replacement;

Signal <code>dc2_message_cache_valid</code> states whether the selected way stores a valid line; if so, that line has to be evicted: it is recalled if in state '''M''' and written back to the main memory. On a cache hit there is no need for a replacement, since the control logic is updating an existing line; this is what the term <code>!dc2_message_cache_hit</code> expresses.

=== Message Generator ===
This module sends forwarded or response messages to the network interface whenever the protocol ROM requires it:

 dpr_output.message_response_send,
         ...
 dpr_output.message_forwarded_send,

The snippet above shows the protocol ROM outputs that control message generation. When <code>message_response_send</code> is asserted the directory sends a response over the network; likewise, when <code>message_forwarded_send</code> is asserted a forwarded request is generated.

Note that this block handles instruction cache misses as well. In that case, requests are forwarded directly to memory, bypassing the coherence logic.

== See Also ==
[[Coherence]]

Latest revision as of 10:12, 12 July 2019

The Directory controller manages the L2 cache and the ownership of memory lines, it is organized in a distributed directory structure.

Introduction

This component is composed of three stages, each one with particular tasks. This approach has been taken in order to manage the complexity of the component and to ease testing phase.

This component interfaces with the Network Interface in order to send/receive coherence requests.


Directory Controller

Stage 1

Stage 1 is responsible for issuing requests to the control logic. All requests are coherence request/response from the network interface.

TSHR Signals

The arbiter checks if a pending request is already issued in the pipeline or ongoing in the TSHR (see TSHR Update Logic). Tags and sets for each type of request are forwarded from TSHR to the arbiter. TSHR entries are considered valid for that class of request if and only if its hit signal is asserted:

// Signals to TSHR
assign ni_request_address                             = ni_request.memory_address;
assign dc1_tshr_lookup_tag[REQUEST_TSHR_LOOKUP_PORT]  = ni_request_address.tag;
assign dc1_tshr_lookup_set[REQUEST_TSHR_LOOKUP_PORT]  = ni_request_address.index;

// Signals from TSHR
assign request_tshr_hit                               = tshr_lookup_hit[REQUEST_TSHR_LOOKUP_PORT];
assign request_tshr_index                             = tshr_lookup_index[REQUEST_TSHR_LOOKUP_PORT];
assign request_tshr_entry_info                        = tshr_lookup_entry_info[REQUEST_TSHR_LOOKUP_PORT];

Stall Protocol ROM

In order to be compliant with the coherence protocol all incoming coherence requests on blocks whose coherence state is non-stable state have to be stalled. This task is performed through a protocol ROM whose output signal will stall the issue of that coherence request when asserted, e.g. when a block is in state S_D and a GetS, GetM or a replacement request on the same block are stalled. In order to assert this signal the protocol ROM receives in input the type of the request, the state and the actual owner of the block:

assign dpr_state              = tshr_lookup_entry_info[REQUEST_TSHR_LOOKUP_PORT].state;
assign dpr_message_type       = ni_request.packet_type;
assign dpr_from_owner         = ni_request.source == request_tshr_entry_info.owner;

dc_stall_protocol_rom stall_protocol_rom (
.input_state         ( dpr_state        ),
.input_request       ( dpr_message_type ),
.input_is_from_owner ( dpr_from_owner   ),
.dpr_output_stall    ( stall_request    )
);

Note that if the request does not come from the current owner it can be issued because it does not change the coherence state for the block (see Coherence Protocol).

Issuing a Request

In order to issue a request, it is required that:

  • the TSHR is not full and the address of the request is not already in the TSHR;
  • the network interface is available;
  • the following stages are not busy.

The following code shows the issuing logic for a replacement request; the other cases are similar:

can_issue_replacement_request = !rp_empty &&
   !tshr_full && !replacement_request_tshr_hit &&
   ! (( dc2_pending ) || ( dc3_pending )) &&
   ni_forwarded_request_network_available && ni_response_network_available;

A cache coherence request refines the constraints above:

  • the network interface provides a valid request;
  • if the request hits in the TSHR, the matching entry must not be valid;
  • if the request hits a valid TSHR entry, it must not be stalled by the protocol ROM (see Stall Protocol ROM).

The latter two conditions give priority to pending requests.

assign can_issue_request = ni_request_valid &&
    !tshr_full &&
    ( !request_tshr_hit ||
          ( request_tshr_hit && !request_tshr_entry_info.valid ) ||
          ( request_tshr_hit && request_tshr_entry_info.valid && !stall_request ) ) &&
    ! (( dc2_pending ) || ( dc3_pending )) &&
    ni_forwarded_request_network_available && ni_response_network_available;
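The issue condition for a coherence request can be explored with a small behavioral model. The following Python sketch is not part of the RTL: the function names are hypothetical and every signal is reduced to a plain boolean. It mirrors the expression above term by term and checks that, for the TSHR part, the only blocking case is a hit on a valid entry that the protocol ROM stalls.

```python
from itertools import product


def can_issue_request(ni_request_valid, tshr_full, tshr_hit, entry_valid,
                      stall_request, dc2_pending, dc3_pending,
                      fwd_network_available, resp_network_available):
    """Boolean model of the can_issue_request expression (hypothetical helper)."""
    # The three TSHR alternatives, written out as in the RTL expression.
    tshr_ok = (not tshr_hit
               or (tshr_hit and not entry_valid)
               or (tshr_hit and entry_valid and not stall_request))
    pipeline_free = not (dc2_pending or dc3_pending)
    return (ni_request_valid and not tshr_full and tshr_ok and pipeline_free
            and fwd_network_available and resp_network_available)


def tshr_blocks(tshr_hit, entry_valid, stall_request):
    """Compact view: only a stalled hit on a valid entry blocks the request."""
    return tshr_hit and entry_valid and stall_request


def compact_form_matches():
    """Compare the compact view with the full expression for every TSHR input."""
    for hit, valid, stall in product([False, True], repeat=3):
        full = can_issue_request(True, False, hit, valid, stall,
                                 False, False, True, True)
        if full != (not tshr_blocks(hit, valid, stall)):
            return False
    return True
```

In this model, a request that hits a valid TSHR entry is issued only when the stall signal is low, while any request is held back when the TSHR is full or a later stage is busy.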

Finally, responses are never stalled; they are processed as soon as the network interface outputs a valid one:

assign can_issue_response = ni_response_valid;

Requests Scheduler

Once the issuing conditions have been evaluated, two or more requests could be ready to be scheduled at the same time, so a fixed-priority scheduler is used. The priorities are, from highest to lowest:

  1. replacement request
  2. coherence response
  3. coherence request

This ordering ensures coherence is preserved. Once a type of request is scheduled this block drives the output signals for the second stage.
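The fixed-priority selection can be sketched as a chain of checks in priority order. The Python model below is only illustrative: the `Grant` enumeration and the `schedule` function are hypothetical names, and the three inputs stand for the issue signals computed above.

```python
from enum import Enum


class Grant(Enum):
    """Which class of request is scheduled in this cycle (hypothetical)."""
    NONE = 0
    REPLACEMENT = 1
    RESPONSE = 2
    REQUEST = 3


def schedule(can_issue_replacement_request, can_issue_response, can_issue_request):
    """Pick the highest-priority class among those that can be issued."""
    if can_issue_replacement_request:
        return Grant.REPLACEMENT
    if can_issue_response:
        return Grant.RESPONSE
    if can_issue_request:
        return Grant.REQUEST
    return Grant.NONE
```

For instance, `schedule(False, True, True)` grants the response, and a coherence request is granted only when neither a replacement nor a response is ready.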

L2 Tag & Directory State Cache

Finally, a cache memory stores L2 tags and their directory state (recall that the directory is inclusive). The directory state is updated whenever a request is processed by Stage 3 and the protocol modifies it.

Stage 2

Stage 2 manages L2 Data and Info caches, and forwards signals from Stage 1 to Stage 3. It also contains all related logic for managing cache hits and block replacement. The policy used to replace a block is LRU (Least Recently Used).

The L2 cache contains cache data along with coherence information, i.e. the owner and sharers list (the directory state is included in L2 Directory State Cache).

Stage 3 updates LRU and cache data once the request is processed.

TSHR

The Transaction Status Handling Register (TSHR) tracks ongoing coherence transactions on scheduled memory blocks; whenever a memory line is in the TSHR it is in a non-stable state.

A TSHR entry comprises the following information:

Valid | Address | State | Sharers list | Owner
  • Valid: the entry is valid
  • Address: memory address of the entry
  • State: current coherence state
  • Sharers list: list of sharers for the block (one-hot encoded)
  • Owner: block owner
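The entry layout above can be pictured as a small record. The Python sketch below is a behavioral illustration only. The field types, the state names used as strings, the reading of the sharers list as one bit per tile, and the helpers `sharers` and `first_free_index` are all assumptions, not the module's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TshrEntry:
    """One TSHR entry (field types are assumptions)."""
    valid: bool = False
    address: int = 0
    state: str = "N"          # coherence state, modeled here as a string
    sharers_list: int = 0     # assumed encoding: bit t set when tile t is a sharer
    owner: int = 0


def sharers(entry: TshrEntry, tile_count: int) -> List[int]:
    """Decode the sharers bit vector into a list of tile identifiers."""
    return [t for t in range(tile_count) if (entry.sharers_list >> t) & 1]


def first_free_index(entries: List[TshrEntry]) -> Optional[int]:
    """Index of the first entry that is not valid, or None when the TSHR is full."""
    for index, entry in enumerate(entries):
        if not entry.valid:
            return index
    return None
```

With this model, a TSHR whose entries are all valid returns `None` from `first_free_index`, which corresponds to the full condition that prevents new requests from being issued.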

See MSHR for details about the implementation of this module.

Stage 3

Stage 3 is responsible for the actual execution of requests based on the protocol ROM. Once a request is processed, this module issues signals to the units in the previous stages in order to update information and data in the caches properly. Every group of signals to a particular unit is managed by a subsystem, each one represented in the picture below. Each subsystem is simply combinational logic that converts the signals from the protocol ROM into the proper commands for the corresponding unit.

[Figure: DC stage 3]

Current State Selector

Before a coherence request is processed the correct source for cache block state has to be chosen. These data can be fetched from:

  • cache memory;
  • TSHR;
  • replacement queue;

The following code shows how the control logic selects the information for the issued request:

always_comb begin
	if ( dc2_message_tshr_hit ) begin
		current_address      = dc2_message_address;
		current_state        = dc2_message_tshr_entry_info.state;
		current_sharers_list = dc2_message_tshr_entry_info.sharers_list;
		current_owner        = dc2_message_tshr_entry_info.owner;
	end else if ( dc2_message_cache_hit ) begin
		current_address      = dc2_message_address;
		current_state        = dc2_message_cache_state;
		current_sharers_list = dc2_message_cache_sharers_list;
		current_owner        = dc2_message_cache_owner;
	end else if (is_replacement) begin
		current_address      = dc2_message_address;
		current_state        = dc2_replacement_state;
		current_sharers_list = dc2_replacement_sharers_list;
		current_owner        = dc2_replacement_owner;
	end else begin
		current_address      = dc2_message_address;
		current_state        = {`DIRECTORY_STATE_WIDTH{1'b0}}; // State N
		current_sharers_list = {`TILE_COUNT{1'b0}};
		current_owner        = tile_address_t'(TILE_MEMORY_ID);
	end
end

As shown in the above logic, if a TSHR hit occurs then the most up-to-date information for that block is retrieved from the TSHR. Otherwise, if a cache hit occurs, the required information is fetched from the L2 cache. In case of a replacement, it is retrieved from the replacement output signals of the previous stage. If none of the conditions above is met, the cache block is considered to be in state N.
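The priority among the three sources can be checked with a behavioral model. In the Python sketch below, `BlockInfo`, `select_current` and the value given to `TILE_MEMORY_ID` are hypothetical, and states are represented as strings rather than encoded bit vectors.

```python
from typing import NamedTuple


class BlockInfo(NamedTuple):
    """Coherence information for one block (hypothetical container)."""
    state: str
    sharers_list: int
    owner: int


TILE_MEMORY_ID = 0  # assumed value, for illustration only


def select_current(tshr_hit, tshr_info, cache_hit, cache_info,
                   is_replacement, replacement_info):
    """Select the block information following the TSHR > cache > replacement order."""
    if tshr_hit:
        return tshr_info
    if cache_hit:
        return cache_info
    if is_replacement:
        return replacement_info
    # No source matches: the block is in state N, with no sharers,
    # and the memory tile is taken as the owner.
    return BlockInfo("N", 0, TILE_MEMORY_ID)
```

The TSHR wins over the cache even when both hit, because a block tracked in the TSHR is in the middle of a transaction and its cached copy may be out of date.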

Protocol ROM

This module implements the coherence protocol as represented in the figure below. It takes as input the current state and the request type and decodes the next actions.

[Figure: MSI Protocol dc-rom new.png]

The coherence protocol used is MSI plus some changes due to the directory's inclusivity. In particular, a new stable state N has been added, meaning that the block is not cached in the directory and has to be fetched from the main memory. The N state is necessary because the stable state I means that the block is cached only by the directory controller and is not present in any L1 cache, so the directory still holds information on the block. By contrast, the directory holds no information on blocks in state N.

Furthermore, new non-stable states have been added:

  • state MN_A, in which the directory controller is evicting a block that was in state M and is waiting for an acknowledgement message (MC_Ack) from the main memory. This might happen after a replacement request issued for that block. Further requests on the same block are stalled until data has been received from the block owner and sent to the memory. Note that the block is invalidated, so a new access to the main memory is necessary;
  • state SN_A, in which the directory controller is evicting a block that was in state S and is waiting for an acknowledgement message (MC_Ack) from the main memory. It is similar to the MN_A state;
  • state NS_D, in which the directory controller is waiting for data coming from the memory. This might occur after an Fwd-getS request on a block in state N. Further requests on the same block are stalled until data has been received from the main memory and sent to the requestor(s).


For further details about the memory coherence protocol, please refer to:

TSHR Update Logic

The TSHR can be updated in three different ways:

  • entry allocation;
  • entry deallocation;
  • entry update.

The TSHR stores the data of cache lines whose coherence transactions are ongoing, that is, lines in a non-stable state. An entry is therefore allocated every time the cache line's state moves towards a non-stable state. Conversely, an entry is deallocated whenever a cache line's state enters a stable state. Finally, an update is made when something has to change in the TSHR entry but the cache line's state is still non-stable:

assign tshr_allocate     =  current_state_is_stable & !next_state_is_stable;
assign tshr_deallocate   = !current_state_is_stable &  next_state_is_stable; 
assign tshr_update       = !current_state_is_stable & !next_state_is_stable & coherence_update_info_en;

Note that, if the operation is an entry allocation, the index of the first available entry is provided directly by the TSHR module. At this point there is certainly an empty TSHR entry, otherwise the request would not have been issued (see Issuing a Request), since all pending requests are stalled when the TSHR is full.

In case of update or deallocation, the index of the entry is forwarded by Stage 1 (through Stage 2):

assign dc3_update_tshr_index = tshr_allocate ? tshr_empty_index : dc2_message_tshr_index;
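The three TSHR operations and the index selection can be modeled with plain booleans. The Python sketch below is illustrative only: `tshr_command` and `update_index` are hypothetical names, and the arguments stand for the signals used in the assignments above.

```python
def tshr_command(current_state_is_stable, next_state_is_stable,
                 coherence_update_info_en):
    """Return (allocate, deallocate, update) for one processed request."""
    allocate = current_state_is_stable and not next_state_is_stable
    deallocate = not current_state_is_stable and next_state_is_stable
    update = (not current_state_is_stable and not next_state_is_stable
              and coherence_update_info_en)
    return allocate, deallocate, update


def update_index(tshr_allocate, tshr_empty_index, dc2_message_tshr_index):
    """Allocation uses the first free entry, other operations the looked-up one."""
    return tshr_empty_index if tshr_allocate else dc2_message_tshr_index
```

In this model at most one of the three commands is asserted for any input, and a transition between two stable states leaves the TSHR untouched.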

Cache Update Logic

The cache can be updated in three different ways:

  • entry allocation;
  • entry deallocation;
  • entry update.

Unlike the TSHR, the cache stores the data of cache lines whose coherence transactions are completed, so a tracked cache line is in a stable state. An entry is therefore allocated every time the cache line's state moves from a non-stable state to a stable one and the line was not already in the cache. Conversely, an entry is deallocated whenever a cache line's state enters a non-stable state, since the line is then tracked in the TSHR (see TSHR Update Logic). Finally, an update occurs whenever something has to change in the cache line in compliance with the protocol ROM:

assign allocate_cache    = next_state_is_stable & ( coherence_update_info_en | dpr_output.store_data ) & ~(tshr_deallocate & dpr_output.invalidate_cache_way) & ~update_cache; 
assign deallocate_cache  = tshr_allocate & dc2_message_cache_hit ;
assign update_cache      = current_state_is_stable & next_state_is_stable & dc2_message_cache_hit & ( coherence_update_info_en | dpr_output.store_data );
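The interaction among the three cache commands is easier to see in a behavioral model. The Python sketch below mirrors the assignments above with plain booleans. The function names are hypothetical, and `store_data` and `invalidate_cache_way` stand for the corresponding `dpr_output` fields.

```python
from itertools import product


def cache_commands(current_stable, next_stable, cache_hit,
                   update_info_en, store_data, invalidate_cache_way):
    """Return (allocate, deallocate, update) for the L2 cache."""
    tshr_allocate = current_stable and not next_stable
    tshr_deallocate = not current_stable and next_stable
    has_payload = update_info_en or store_data
    update = current_stable and next_stable and cache_hit and has_payload
    allocate = (next_stable and has_payload
                and not (tshr_deallocate and invalidate_cache_way)
                and not update)
    deallocate = tshr_allocate and cache_hit
    return allocate, deallocate, update


def commands_are_exclusive():
    """Check that at most one command is asserted for every input combination."""
    for inputs in product([False, True], repeat=6):
        if sum(cache_commands(*inputs)) > 1:
            return False
    return True
```

In this model a transaction that completes without a cache hit allocates a new entry, a stable-to-stable transition on a hit updates the existing one, and a line that becomes non-stable while cached is deallocated.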

Replacement Logic

Whenever a replacement request occurs the current cache block is invalidated, but the entry is not freed. That is because the same cache entry has been taken by a new valid line, brought in by a previous coherence request on the same set. The replaced cache block is held in a replacement queue until Stage 1 issues it. Replacement requests have the maximum priority and are scheduled as soon as they are pending (see Requests Scheduler).

This module manages the replacement queue and enqueues a cache block whenever there is a replacement. This happens when the current coherence request needs to store data and info in the cache memory, the request is not a replacement itself (!is_replacement), and a cache miss occurs:

assign do_replacement  = dc2_message_valid && dc2_message_cache_valid
       && ((allocate_cache || update_cache) && !deallocate_cache) 
       && !is_replacement 
       && !dc2_message_cache_hit;

assign dc3_replacement_enqueue = dc2_message_valid && do_replacement;

Signal dc2_message_cache_valid states whether the selected way stores a valid line; if so, this line has to be recalled if in state M and pushed back to the main memory. In case of a hit in the cache (expression !dc2_message_cache_hit is false), there is no need for a replacement, since the control logic is updating an existing line.
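The enqueue condition can be tried out with a boolean model. The Python sketch below follows the do_replacement expression term by term. The function name and the shortened argument names are hypothetical.

```python
def do_replacement(message_valid, way_valid, allocate_cache, update_cache,
                   deallocate_cache, is_replacement, cache_hit):
    """True when the selected way holds a valid line that must be evicted."""
    # The request writes the cache when it allocates or updates a line
    # and does not deallocate one.
    writes_cache = (allocate_cache or update_cache) and not deallocate_cache
    return (message_valid and way_valid and writes_cache
            and not is_replacement and not cache_hit)
```

For example, `do_replacement(True, True, True, False, False, False, False)` is true: a valid request allocates a line on a miss and the selected way is already occupied. The same call with `way_valid` set to false returns false, since an empty way can be filled without evicting anything.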

Message Generator

This module sends forwarded or response messages to the network interface whenever required by the protocol ROM:

	dpr_output.message_response_send,
       ...
	dpr_output.message_forwarded_send,

The above snippet shows the outputs of the protocol ROM related to the message to generate. When message_response_send is asserted the directory sends a response over the network; dually, when message_forwarded_send is high, a forwarded message is generated.

Note that this block manages instruction cache misses as well. In such a case, requests are forwarded directly to memory bypassing the coherence logic.

See Also

Coherence