Load/Store unit
From NaplesPU Documentation
It holds the L1 data cache (data and tags), schedules memory requests from the different threads, and delivers fetched data to the \texttt{Writeback} stage.
The \texttt{Load Store Unit} does not store coherence-protocol-specific information (such as stable states), but it stores \textit{privileges} for all cached addresses. Each cache line has two privileges: \textit{can read} and \textit{can write}. These privileges determine cache misses/hits and are updated only by the \texttt{Cache Controller}.
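The privilege rule above can be sketched in a few lines of Python. This is a behavioral illustration, not the NaplesPU RTL: the names `CacheLine`, `can_read`, `can_write`, and `is_hit` are chosen to mirror the text.

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    # The two per-line privileges described in the text.
    can_read: bool = False
    can_write: bool = False

def is_hit(line, is_store):
    # A load hits if the line grants read privilege; a store
    # additionally needs the write privilege. A missing line is a miss.
    if line is None:
        return False
    return line.can_write if is_store else line.can_read

# A line held with read-only privilege hits on loads but misses on stores,
# so the Cache Controller must upgrade the privilege before a store can complete.
line = CacheLine(can_read=True, can_write=False)
print(is_hit(line, is_store=False))  # load  -> True
print(is_hit(line, is_store=True))   # store -> False
```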
/*
 * This is the unit inside the core that executes the load and store operations. It contains an L1 data cache
 * in order to reduce the memory access latency. It is divided into three stages (more details are given below).
 * It interfaces the Operand Fetch and the Writeback stages. Furthermore, it sends a signal to the Instruction
 * Buffer unit in order to stop a thread when a miss is raised.
 * Note that the signals to the Writeback stage go to the Cache Controller as well (through the core interface module).
 */
Stage 1
/*
 * This is the first stage of the Load/Store Pipeline Unit.
 * This stage has one queue per thread in which it stores that thread's instructions coming from the
 * Operand Fetch, then provides in parallel one instruction per thread to the second stage.
 *
 * If the second stage is able to execute the instruction provided by the i-th thread, it combinatorially asserts
 * the i-th bit of the ldst2_dequeue_instruction mask in order to notify stage 1 that the instruction
 * has been consumed. In this way, the second stage stalls the instructions in this stage when it is busy.
 *
 * Before a request is enqueued, the data are aligned, compressed into a proper vector, and replicated if necessary.
 *
 * The flush operation forces the data to be enqueued, even if the instruction_valid signal is not asserted.
 *
 * It contains a recycle buffer: if a cache miss occurs in the third stage, the request is put into this buffer.
 * The output of this buffer is sent to the second stage.
 *
 * Note that this stage consumes a significant amount of memory because the queues store the entire instructions
 * and their fetched operands.
 */
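The queue-per-thread structure and the dequeue mask can be modeled as follows. This is a hedged behavioral sketch: `Stage1`, `N_THREADS`, and the method names are illustrative, and only the dequeue-mask handshake from the comment above is reproduced.

```python
from collections import deque

N_THREADS = 4  # assumed thread count for the sketch

class Stage1:
    def __init__(self):
        # One FIFO per thread, filled from Operand Fetch.
        self.queues = [deque() for _ in range(N_THREADS)]

    def enqueue(self, tid, instr):
        self.queues[tid].append(instr)

    def heads(self):
        # One candidate instruction per thread, offered in parallel to stage 2.
        return [q[0] if q else None for q in self.queues]

    def consume(self, dequeue_mask):
        # Stage 2 asserts bit i of the mask when it accepted thread i's
        # instruction; unset bits leave the instruction queued (stalled).
        for tid in range(N_THREADS):
            if (dequeue_mask >> tid) & 1 and self.queues[tid]:
                self.queues[tid].popleft()

s1 = Stage1()
s1.enqueue(0, "load r1")
s1.enqueue(1, "store r2")
s1.consume(0b0001)     # stage 2 consumed only thread 0's instruction
print(s1.heads())      # -> [None, 'store r2', None, None]
```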
Stage 2
/*
 * The main purpose of this stage is to choose a request to serve and to fetch the tag and privileges from the tag cache.
 *
 * It receives from the previous stage the load/store requests plus the recycle request for each thread; both kinds of
 * request are dequeued using two different signals (they are treated differently), but assert the same signal. Recycled
 * requests have priority over the regular requests.
 *
 * The cc_update_ldst_valid signal establishes when the Cache Controller wants to update the cache; it has the highest priority.
 *
 * Based on these signals, an arbiter chooses a request and performs the tag and privilege read or update.
 * In parallel, another read can happen because of a snoop request from the Cache Controller.
 */
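The priority order described above (Cache Controller update first, then recycled requests, then regular requests) can be sketched as a simple fixed-priority arbiter. The function name and the tuple-based return value are assumptions of this sketch, not NaplesPU identifiers.

```python
def select_request(cc_update_valid, recycle_reqs, regular_reqs):
    """Pick one request per cycle. recycle_reqs and regular_reqs are
    per-thread lists, with None where a thread has nothing pending."""
    # Highest priority: the Cache Controller wants to update the cache.
    if cc_update_valid:
        return ("cc_update", None)
    # Next: recycled requests (misses re-issued from the recycle buffer).
    for tid, req in enumerate(recycle_reqs):
        if req is not None:
            return ("recycle", tid)
    # Last: regular load/store requests from stage 1.
    for tid, req in enumerate(regular_reqs):
        if req is not None:
            return ("regular", tid)
    return ("idle", None)

print(select_request(False, [None, "miss redo"], ["ld", "st"]))  # -> ('recycle', 1)
print(select_request(True,  [None, "miss redo"], ["ld", "st"]))  # -> ('cc_update', None)
```

A fixed-priority scheme like this is the simplest way to honor the ordering in the text; the real arbiter may additionally rotate among threads for fairness.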
Stage 3
/*
 * This unit primarily receives the cached tag and privileges from the previous stage in order to perform the hit/miss
 * detection and, if necessary, the data fetching. It also receives two other signals, about flushing and about
 * evicting/updating.
 *
 * The output of this stage determines whether one of these events happened: cache miss, eviction, or flush,
 * plus another important signal that puts the thread to sleep when a miss occurs.
 */
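The outcome classification and the miss-driven sleep signal can be sketched as below. This is an illustrative model: the function name, the event strings, and the use of a bitmask for sleeping threads are assumptions, though the bitmask mirrors the per-thread mask style used in stage 1.

```python
def stage3_outcome(hit, is_flush, is_evict, tid, sleep_mask):
    """Classify the access and update the per-thread sleep mask."""
    if is_flush:
        event = "flush"
    elif is_evict:
        event = "evict"
    elif not hit:
        event = "miss"
        # On a miss, stop the issuing thread until the line arrives;
        # the request itself goes back through the recycle buffer.
        sleep_mask |= (1 << tid)
    else:
        event = "hit"
    return event, sleep_mask

event, mask = stage3_outcome(hit=False, is_flush=False, is_evict=False,
                             tid=2, sleep_mask=0)
print(event, bin(mask))  # -> miss 0b100
```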