Difference between revisions of "Load/Store unit"
From NaplesPU Documentation
Revision as of 18:03, 22 September 2017
It holds the L1 cache data and tags, schedules memory requests from the different threads, and delivers data to the \texttt{Writeback} stage.
The \texttt{Load Store Unit} does not store specific coherence protocol information (such as stable states); instead, it stores \textit{privileges} for all cached addresses. Each cache line has two privileges: \textit{can read} and \textit{can write}. These privileges determine cache misses/hits and are updated only by the \texttt{Cache Controller}.
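The privilege check described above can be illustrated with a small behavioral model (a Python sketch, not NaplesPU source; the `CacheLine` structure and field names are hypothetical): a load hits only if the line is cached with the read privilege, a store only with the write privilege.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CacheLine:
    tag: int
    can_read: bool = False   # privilege granted/revoked by the Cache Controller
    can_write: bool = False  # privilege granted/revoked by the Cache Controller

def is_hit(line: Optional[CacheLine], tag: int, is_store: bool) -> bool:
    """A request hits only if the tag matches and the required privilege is held."""
    if line is None or line.tag != tag:
        return False
    return line.can_write if is_store else line.can_read

# A line readable but not writable: loads hit, stores miss
# (the Cache Controller must upgrade the privilege first).
line = CacheLine(tag=0x1A, can_read=True, can_write=False)
print(is_hit(line, 0x1A, is_store=False))  # True
print(is_hit(line, 0x1A, is_store=True))   # False
```

Keeping only privileges, rather than full protocol states, decouples the unit from the specific coherence protocol implemented by the \texttt{Cache Controller}.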
/*
 * This is the unit inside the core that executes load and store operations. It contains an L1 data
 * cache in order to reduce memory access latency. It is divided into three stages (more details are
 * given in the sections below). It interfaces the Operand Fetch stage and the Writeback stage.
 * Furthermore, it sends a signal to the instruction buffer unit in order to stall a thread when a
 * miss occurs.
 * Note that the signals to the Writeback stage also go to the Cache Controller (through the core
 * interface module).
 */
== Stage 1 ==
/*
 * This is the first stage of the Load/Store Unit pipeline.
 * This stage has one queue per thread in which it stores the per-thread instructions coming from
 * Operand Fetch; it then offers one instruction per thread in parallel to the second stage.
 *
 * If the second stage is able to execute the instruction of the i-th thread, it combinatorially
 * asserts the i-th bit of the ldst2_dequeue_instruction mask in order to notify stage 1 that the
 * instruction has been consumed. In this way, the second stage stalls the instructions in this
 * stage when it is busy.
 *
 * Before a request is enqueued, the data are aligned and compressed into a proper vector, and
 * replicated if necessary.
 *
 * The flush operation forces the data to be enqueued even if the instruction_valid signal is not
 * asserted.
 *
 * This stage contains a recycle buffer: if a cache miss occurs in the third stage, the request is
 * placed in this buffer. The output of this buffer is sent to the second stage.
 *
 * Note that this stage consumes a significant amount of memory, because the queues store the
 * entire instructions together with their fetched operands.
 */
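The handshake between the per-thread queues and the second stage can be sketched as follows (a Python behavioral model, not NaplesPU source; names and the four-thread configuration are illustrative): each thread has a FIFO, one head instruction per thread is offered in parallel, and stage 2 sets the corresponding dequeue bit only for the instructions it actually consumes, so the rest stall in place.

```python
from collections import deque

NUM_THREADS = 4
queues = [deque() for _ in range(NUM_THREADS)]

def enqueue(thread_id, instruction):
    """Stage 1 stores the per-thread instructions coming from Operand Fetch."""
    queues[thread_id].append(instruction)

def heads():
    """One instruction per thread is offered to stage 2 in parallel."""
    return [q[0] if q else None for q in queues]

def apply_dequeue_mask(mask):
    """Stage 2 asserts bit i when it has consumed thread i's instruction;
    unconsumed instructions simply remain queued (i.e. they stall)."""
    for i in range(NUM_THREADS):
        if (mask >> i) & 1 and queues[i]:
            queues[i].popleft()

enqueue(0, "load  r1, [r2]")
enqueue(0, "store [r3], r4")
enqueue(2, "load  r5, [r6]")
apply_dequeue_mask(0b0001)  # stage 2 consumed only thread 0's head
print(heads())  # ['store [r3], r4', None, 'load  r5, [r6]', None]
```

This models why the stage is memory-hungry: each queue entry carries the whole instruction plus its fetched operands, multiplied by the number of threads.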
== Stage 2 ==
/*
 * The main purpose of this stage is to choose a request to serve and to fetch the tag and
 * privileges from the tag cache.
 *
 * It receives from the previous stage the load/store requests plus the recycle request for each
 * thread; the two kinds of request are dequeued using two different signals (they are treated
 * differently), although they assert the same valid signal. Recycled requests have priority over
 * regular requests.
 *
 * The cc_update_ldst_valid signal establishes when the Cache Controller wants to update the
 * cache; it has the highest priority.
 *
 * Based on these signals, an arbiter chooses a request and performs the tag and privilege read or
 * update. In parallel, another read can occur due to a snooping request from the Cache Controller.
 */
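The arbitration order described above can be modeled as follows (a Python sketch under stated assumptions: `recycle_valid` and `request_valid` are per-thread bit masks, and a simple fixed-priority choice among threads is assumed, since the actual thread-selection policy is not detailed here). The priority is: Cache Controller update, then recycle, then regular requests.

```python
def arbitrate(cc_update_valid, recycle_valid, request_valid):
    """Pick one request per cycle.
    Priority (per the description): cache-controller update > recycle > regular.
    Among threads, the lowest-index pending one wins (assumed policy)."""
    if cc_update_valid:
        return ("cc_update", None)
    for tid in range(recycle_valid.bit_length()):
        if (recycle_valid >> tid) & 1:
            return ("recycle", tid)
    for tid in range(request_valid.bit_length()):
        if (request_valid >> tid) & 1:
            return ("request", tid)
    return ("idle", None)

print(arbitrate(False, 0b0100, 0b0011))  # ('recycle', 2)
print(arbitrate(True,  0b0100, 0b0011))  # ('cc_update', None)
```

Giving the Cache Controller the highest priority guarantees that coherence updates are never starved by the core's own memory traffic.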
== Stage 3 ==
/*
 * This unit primarily receives the cached tag and privileges from the previous stage in order to
 * perform the hit/miss detection and, if necessary, the data fetch. It also receives two other
 * signals, for flushing and for evicting/updating.
 *
 * The output of this stage determines whether one of these events happened: cache miss, eviction,
 * or flush. It also produces an important signal that puts the requesting thread to sleep when a
 * miss occurs.
 */
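The event classification performed by this stage can be sketched as follows (a Python behavioral model with hypothetical names, not NaplesPU source): flush and evict requests are handled as their own events, hit/miss detection combines the tag match with the privilege appropriate to the access type, and a miss raises the thread-sleep signal so the thread waits until the Cache Controller refills the line.

```python
def stage3_outcome(tag_match, can_read, can_write, is_store, is_flush, is_evict):
    """Classify the event produced by stage 3 (illustrative model).
    On a miss, the requesting thread is put to sleep; the Cache
    Controller later services the miss and wakes the thread up."""
    if is_flush:
        return {"event": "flush", "sleep_thread": False}
    if is_evict:
        return {"event": "eviction", "sleep_thread": False}
    hit = tag_match and (can_write if is_store else can_read)
    if hit:
        return {"event": "hit", "sleep_thread": False}
    return {"event": "miss", "sleep_thread": True}

# A store to a line held with read-only privilege misses even though the tag matches:
print(stage3_outcome(True, True, False, is_store=True, is_flush=False, is_evict=False))
# {'event': 'miss', 'sleep_thread': True}
```

Note how a privilege violation (e.g. a store to a read-only line) is indistinguishable from a capacity miss at this level: both simply raise the miss event and hand the request to the Cache Controller.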