Difference between revisions of "Load/Store unit"

From NaplesPU Documentation

Revision as of 18:03, 22 September 2017

The \texttt{Load Store Unit} holds the L1 cache data and tags, schedules the memory requests of the different threads, and forwards the resulting data to \texttt{Writeback}.

The \texttt{Load Store Unit} does not store coherence-protocol-specific information (such as stable states); instead, it stores \textit{privileges} for all cached addresses. Each cache line has two privileges: \textit{can read} and \textit{can write}. These privileges determine cache hits and misses, and they are updated only by the \texttt{Cache Controller}.
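
As a rough illustration of how the two privileges could decide hits and misses, the following Python sketch models a single cache line. The names `CacheLine` and `is_hit`, and the tag values, are hypothetical and are not taken from the NaplesPU sources. The sketch assumes that a load needs \textit{can read} and a store needs \textit{can write}.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    """Hypothetical model of one L1 line as seen by the Load Store Unit."""
    tag: int
    can_read: bool = False   # privilege, updated only by the Cache Controller
    can_write: bool = False  # privilege, updated only by the Cache Controller


def is_hit(line: CacheLine, tag: int, is_store: bool) -> bool:
    """A request hits only if the tag matches and the needed privilege is held."""
    if line.tag != tag:
        return False
    # Assumption: stores require can_write, loads require can_read.
    return line.can_write if is_store else line.can_read


line = CacheLine(tag=0x2A, can_read=True, can_write=False)
load_hit = is_hit(line, 0x2A, is_store=False)   # tag matches and line is readable
store_hit = is_hit(line, 0x2A, is_store=True)   # tag matches but no write privilege
```

Under this assumption, a store to a line held with only the \textit{can read} privilege counts as a miss, even though the tag is present.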

/*

* This is the unit inside the core that executes load and store operations. It contains an L1 data cache
* in order to reduce memory access latency. It is divided into three stages, each described in more detail in its own section.
* It sits between the Operand Fetch stage and the Writeback stage. Furthermore, it sends the instruction buffer unit
* a signal that stops a thread when a miss occurs.
* Note that the signals sent to the Writeback stage also go to the cache controller (through the core interface module).
*/

Stage 1

/*

* This is the first stage of the Load/Store Pipeline Unit.
* This stage has one queue per thread, which stores that thread's instructions coming from
* Operand Fetch, and it provides one instruction per thread, in parallel, to the second stage.
*
* If the second stage can execute the instruction provided for the i-th thread, it combinationally asserts
* the i-th bit of the ldst2_dequeue_instruction mask to notify stage 1 that the instruction
* has been consumed. In this way, the second stage stalls the instructions in this stage while it is busy.
*
* Before a request is enqueued, its data are aligned and compressed into a proper vector, and replicated if necessary.
*
* The flush operation forces the data to be enqueued, even if the instruction_valid signal is not asserted.
*
* This stage also contains a recycle buffer: if a cache miss occurs in the third stage, the data are put in this buffer.
* The output of this buffer is sent to the second stage.
*
* Note that this stage consumes a lot of memory, because the queues store the entire instructions together with
* their fetched operands.
*
*/
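
The queue-and-mask handshake described above can be sketched as a small behavioural model in Python. This is only an illustration: the class `Stage1Model`, its method names, and the instruction strings are hypothetical. The recycle buffer and the data alignment step are left out.

```python
from collections import deque


class Stage1Model:
    """Hypothetical behavioural model of the per-thread queues in stage 1."""

    def __init__(self, num_threads: int):
        self.queues = [deque() for _ in range(num_threads)]

    def enqueue(self, thread_id: int, instruction: str) -> None:
        # Instructions arriving from Operand Fetch are stored per thread.
        self.queues[thread_id].append(instruction)

    def heads(self) -> list:
        # One pending instruction per thread is offered in parallel (None if empty).
        return [q[0] if q else None for q in self.queues]

    def dequeue(self, mask: int) -> None:
        # Bit i of the mask plays the role of ldst2_dequeue_instruction[i]:
        # the head of queue i is removed only if stage 2 consumed it.
        for i, q in enumerate(self.queues):
            if (mask >> i) & 1 and q:
                q.popleft()


stage1 = Stage1Model(num_threads=2)
stage1.enqueue(0, "load A")
stage1.enqueue(0, "store B")
stage1.enqueue(1, "load C")
before = stage1.heads()
stage1.dequeue(0b01)      # stage 2 consumed only the instruction of thread 0
after = stage1.heads()
```

In this model, thread 1 keeps offering the same instruction until its mask bit is asserted, which mirrors how a busy second stage stalls the instructions waiting in stage 1.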

Stage 2

/*

* The main purpose of this stage is to choose a request to serve and to fetch its tag and privileges from the tag cache.
*
* It receives from the previous stage the load/store requests plus the recycle request for each thread; the two kinds of request
* are dequeued using two different signals (they are treated differently), but both assert the same signal. Recycled requests
* have priority over regular requests.
*
* The signal cc_update_ldst_valid is important: it indicates when the cache controller wants to update the cache, and it has the highest priority.
*
* Based on these signals, an arbiter chooses a request and performs the tag and privileges read or update.
* In parallel, another read can take place because of a snoop request from the cache controller.
*
*/
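
The priority order described above (cache controller update first, then recycled requests, then regular requests) can be sketched as follows. The function `arbitrate` and the mask parameters `recycle_valid` and `instruction_valid` are hypothetical names. The choice of the lowest thread index among requests of the same kind is an assumption made for this sketch, since the text does not state how threads are ordered.

```python
from typing import Optional, Tuple


def arbitrate(cc_update_ldst_valid: bool, recycle_valid: int,
              instruction_valid: int, num_threads: int
              ) -> Optional[Tuple[str, Optional[int]]]:
    """Return (kind, thread_id) of the request to serve, or None if idle."""
    # The cache controller update has the highest priority.
    if cc_update_ldst_valid:
        return ("cc_update", None)
    # Recycled requests are considered before regular ones.
    for kind, mask in (("recycle", recycle_valid),
                       ("instruction", instruction_valid)):
        for thread_id in range(num_threads):
            # Assumption: lowest thread index wins within the same kind.
            if (mask >> thread_id) & 1:
                return (kind, thread_id)
    return None


update_wins = arbitrate(True, 0b10, 0b01, num_threads=2)
recycle_wins = arbitrate(False, 0b10, 0b01, num_threads=2)
regular_served = arbitrate(False, 0b00, 0b11, num_threads=2)
```

A real arbiter might rotate priority between threads to avoid starvation; the fixed order here only keeps the sketch short.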

Stage 3

/*

* This stage primarily receives the cached tag and privileges from the previous stage in order to perform hit/miss detection and
* to fetch the data, if necessary. It also receives two other signals, concerning flushing and evicting/updating.
*
* The outputs of this stage indicate whether one of these events occurred: cache miss, eviction, or flush,
* plus another important signal that puts the thread to sleep when a miss occurs.
*/
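
A minimal sketch of the hit/miss detection and of the thread-sleep output is given below. It assumes a set with several ways, each carrying a tag and the two privileges. The names `Way`, `Stage3Result`, and `stage3_lookup` are hypothetical, and neither the number of ways nor the encoding of the sleep signal as a per-thread bit mask is taken from the NaplesPU sources. Eviction and flushing are not modelled.

```python
from typing import List, NamedTuple, Optional


class Way(NamedTuple):
    tag: int
    can_read: bool
    can_write: bool


class Stage3Result(NamedTuple):
    hit: bool
    way: Optional[int]   # index of the hitting way, None on a miss
    miss: bool
    sleep_mask: int      # assumption: bit i set means thread i goes to sleep


def stage3_lookup(ways: List[Way], tag: int, is_store: bool,
                  thread_id: int) -> Stage3Result:
    """Compare the request tag against every way and derive the outputs."""
    for index, way in enumerate(ways):
        allowed = way.can_write if is_store else way.can_read
        if way.tag == tag and allowed:
            return Stage3Result(hit=True, way=index, miss=False, sleep_mask=0)
    # On a miss, the requesting thread is flagged to sleep.
    return Stage3Result(hit=False, way=None, miss=True,
                        sleep_mask=1 << thread_id)


ways = [Way(0x10, True, False), Way(0x20, True, True)]
store_hit = stage3_lookup(ways, 0x20, is_store=True, thread_id=3)
store_miss = stage3_lookup(ways, 0x10, is_store=True, thread_id=3)
```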