Load/Store unit

From NaplesPU Documentation
Revision as of 18:03, 22 September 2017 by Fabio (talk | contribs)

It holds the L1 data cache (data and tags), schedules the memory requests of the different threads, and delivers data to the \texttt{Writeback} stage.

The \texttt{Load Store Unit} does not store specific coherence protocol information (such as stable states), but it stores \textit{privileges} for all cached addresses. Each cache line has two privileges: \textit{can read} and \textit{can write}. These privileges determine cache hits and misses, and are updated only by the \texttt{Cache Controller}.
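As an illustration, the hit/miss rule implied by these privileges can be modeled in software as follows. This is a minimal sketch: `CacheLinePrivileges` and `is_hit` are illustrative names, not identifiers from the NaplesPU sources.

```python
from dataclasses import dataclass

@dataclass
class CacheLinePrivileges:
    # The two per-line privileges described above.
    can_read: bool = False
    can_write: bool = False

def is_hit(privileges, address, is_store):
    """A cached address hits only if the line grants the privilege the
    access needs: 'can read' for loads, 'can write' for stores."""
    line = privileges.get(address)
    if line is None:
        return False  # address not cached at all
    return line.can_write if is_store else line.can_read

# Only the Cache Controller updates this table; the LSU just consults it.
privs = {0x1000: CacheLinePrivileges(can_read=True, can_write=False)}
print(is_hit(privs, 0x1000, is_store=False))  # load hits
print(is_hit(privs, 0x1000, is_store=True))   # store misses: no write privilege
```

Note that a store to a readable-but-not-writable line misses even though the line is cached: the miss is resolved by the Cache Controller upgrading the line's privileges, not by the LSU.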

/*

* This is the unit inside the core that executes the load and store operations. It contains an L1 data cache
* in order to reduce the memory access latency. It is divided into three stages (detailed below).
* It interfaces with the Operand Fetch and Writeback stages. Furthermore, it sends a signal to the
* Instruction Buffer unit in order to stop a thread when a miss occurs.
* Note that the signals to the Writeback stage go to the Cache Controller (through the core interface module) as well.
*/

Stage 1

/*

* This is the first stage of the Load/Store Pipeline Unit.
* This stage has one queue per thread, in which it stores that thread's instructions coming from the
* Operand Fetch, then provides in parallel one instruction per thread to the second stage.
*
* If the second stage is able to execute the provided instruction of the i-th thread, it combinationally asserts
* the i-th bit of the ldst2_dequeue_instruction mask in order to notify the first stage that the instruction
* has been consumed. In this way, the second stage stalls the instructions in this stage when it is busy.
*
* Before a request is enqueued, the data are aligned, compressed into a proper vector, and replicated if necessary.
*
* The flush operation forces the data to be enqueued, even if the instruction_valid signal is not asserted.
*
* This stage contains a recycle buffer: if a cache miss occurs in the third stage, the data is put into this
* buffer. The output of this buffer is sent to the second stage.
*
* Note that this stage consumes a large amount of memory, because the queues store the entire instructions and
* their fetched operands.
*
*/
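The per-thread queues and the ldst2_dequeue_instruction handshake described above can be sketched as a small software model. The `Stage1` class and its method names are hypothetical illustrations; the real unit is SystemVerilog hardware.

```python
from collections import deque

class Stage1:
    """Toy model of Stage 1: one FIFO per thread, heads offered in
    parallel, entries popped only when the dequeue mask bit is set."""

    def __init__(self, num_threads):
        self.queues = [deque() for _ in range(num_threads)]

    def enqueue(self, thread_id, instruction):
        # Alignment/compression/replication of data would happen
        # before this point in the real hardware.
        self.queues[thread_id].append(instruction)

    def heads(self):
        # One instruction per thread, offered in parallel to Stage 2.
        return [q[0] if q else None for q in self.queues]

    def apply_dequeue_mask(self, mask):
        # Stage 2 asserts bit i when it has consumed thread i's
        # instruction; unconsumed instructions stay queued (stall).
        for i, consumed in enumerate(mask):
            if consumed and self.queues[i]:
                self.queues[i].popleft()

s1 = Stage1(num_threads=4)
s1.enqueue(0, "load r1, [r2]")
s1.enqueue(1, "store r3, [r4]")
s1.apply_dequeue_mask([True, False, False, False])  # Stage 2 took thread 0 only
```

After the mask is applied, thread 0's queue is empty while thread 1's instruction is still offered, which is exactly the stall behavior the comment describes.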

Stage 2

/*

* The main task of this stage is to choose a request to serve and to fetch the tag & privileges from the tag cache.
*
* It receives from the previous stage the load/store requests plus the recycle request for each thread; the two
* kinds of request are dequeued using two different signals (they are treated differently), though both assert
* the same signal. Recycled requests have priority over the regular requests.
*
* The cc_update_ldst_valid signal establishes when the Cache Controller wants to update the cache; it has the
* highest priority.
*
* Based on these signals, an arbiter chooses a request and performs the tag & privileges read or update.
* In parallel, another read can take place, due to a snoop request issued by the Cache Controller.
* 
*  
*/
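The fixed-priority selection described above (Cache Controller updates first, then recycled requests, then regular requests) might be modeled as follows; the function name and request values are illustrative placeholders, not NaplesPU identifiers.

```python
def select_request(cc_update_valid, recycle_requests, regular_requests):
    """Return (source, request) for the winning request, or None.

    Priority order, highest first:
      1. Cache Controller update (cc_update_valid)
      2. recycled requests (one slot per thread, None = empty)
      3. regular load/store requests (one slot per thread)
    """
    if cc_update_valid:
        return ("cache_controller", "update")
    for tid, req in enumerate(recycle_requests):
        if req is not None:
            return ("recycle", (tid, req))
    for tid, req in enumerate(regular_requests):
        if req is not None:
            return ("regular", (tid, req))
    return None  # nothing to serve this cycle

# Thread 1 has a recycled request, thread 0 a regular one: recycle wins.
print(select_request(False, [None, "r1"], ["q0", None]))
```

Within each class this sketch scans threads in index order; the real arbiter's policy among threads of the same class (e.g. round-robin) is not specified in the text above.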

Stage 3

/*

* This unit primarily receives the cached tag & privileges from the previous stage in order to perform the
* hit/miss detection and, if necessary, the data fetch. It also receives two other signals, concerning
* flushing and evicting/updating.
*
* The outputs of this stage report whether one of these events occurred: cache miss, eviction, or flush,
* plus another important signal that puts the thread to sleep when a miss occurs.
*/
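A minimal sketch of the event detection performed here, assuming the privilege semantics described earlier; the function and field names are illustrative, not taken from the NaplesPU sources.

```python
def stage3(tag_match, can_read, can_write, is_store, flush_req, evict_req):
    """Decide the Stage 3 output events for one access.

    A miss occurs when the tag does not match, or when the matching
    line lacks the privilege the access needs (can_write for stores,
    can_read for loads). A miss also puts the issuing thread to sleep
    until the Cache Controller resolves it.
    """
    miss = not (tag_match and (can_write if is_store else can_read))
    return {
        "cache_miss": miss,
        "flush": flush_req,
        "eviction": evict_req,
        "sleep_thread": miss,
    }

# A store to a line that is cached and readable but not writable misses.
out = stage3(tag_match=True, can_read=True, can_write=False,
             is_store=True, flush_req=False, evict_req=False)
```

The `sleep_thread` output corresponds to the signal, mentioned in the overview, that the unit sends toward the Instruction Buffer to stop a thread on a miss.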