Network router
This page describes the implementation of the network router.
Router
The router moves data between two or more terminals, so its interface is standard: input and output flits, input and output write enables, and backpressure signals.
This is a virtual-channel flow control X-Y look-ahead router for a 2D-mesh topology.
The first design choice is to use input buffering only, which takes one pipeline stage. Another widely used technique is look-ahead routing, in which each router computes the route for the next node rather than for itself. Virtual channel allocation and switch allocation can also be merged into a single stage.
To recap, there are four stages, two of which (routing and allocation) work in parallel, for a total of three pipeline stages. To reduce the pipeline depth further, the crossbar and link traversal stage is not buffered, which brings the count down to two and, de facto, merges the last stage into the first one.
First stage
There are five ports (the four cardinal directions plus the local port), each with V queues, where V is the number of virtual channels.
Each virtual channel has two queues: one that holds all flits (FQ) and one that holds only head flits (HQ). The two queues have the same length to cover the worst case, in which every packet consists of a single flit. Every time a valid flit enters this unit, the HQ enqueues an entry for it only if the flit type is `head' or `head-tail'. The FQ stores every flit, while the HQ "registers" every incoming packet. The HQ dequeue signal is asserted only when both conditions hold: the allocator asserts its grant and the outgoing flit is a tail (or head-tail) flit. As a result, the number of elements in the HQ equals the number of packets that have entered this virtual channel and have not yet fully left it.
header_fifo (
    .enqueue_en ( wr_en_in & flit_in.vc_id == i & ( flit_in.flit_type == HEADER | flit_in.flit_type == HT ) ),
    .value_i    ( flit_in.next_hop_port ),
    .dequeue_en ( ( ip_flit_in_mux[i].flit_type == TAIL | ip_flit_in_mux[i].flit_type == HT ) & sa_grant[i] ),
    ...
    ...
This organization works only under one condition: the flits of each packet must be stored consecutively and in order in the FQ. To guarantee this, routing must be deterministic and every network interface must send all the flits of a packet without interleaving them with flits of other packets.
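The FQ/HQ bookkeeping can be sketched with a small behavioral model. This is only an illustration in Python, not the router's RTL: the class name `VirtualChannelBuffer`, its methods, and the string flit types are hypothetical, and the model assumes that flits of one packet arrive consecutively, as required above.

```python
from collections import deque

# Hypothetical flit type tags (the RTL uses an enum with HEADER, TAIL, HT, ...).
HEADER, BODY, TAIL, HT = "HEADER", "BODY", "TAIL", "HT"


class VirtualChannelBuffer:
    """Behavioral model of one virtual channel: a flit queue plus a head queue."""

    def __init__(self, depth):
        self.depth = depth
        self.fq = deque()  # every flit, in arrival order
        self.hq = deque()  # one entry (the next-hop port) per stored packet

    def enqueue(self, flit_type, next_hop_port=None):
        """Store a flit; head and head-tail flits also register the packet in HQ."""
        if len(self.fq) >= self.depth:
            raise OverflowError("backpressure should have stopped the sender")
        self.fq.append(flit_type)
        if flit_type in (HEADER, HT):
            self.hq.append(next_hop_port)

    def dequeue(self, sa_grant):
        """Pop the front flit when granted; a tail or head-tail flit retires the packet."""
        if not sa_grant or not self.fq:
            return None
        flit_type = self.fq.popleft()
        if flit_type in (TAIL, HT):
            self.hq.popleft()
        return flit_type

    def packets_stored(self):
        """Number of packets that entered this channel and have not fully left."""
        return len(self.hq)
```

In this sketch the front of `hq` always holds the next-hop port of the packet currently being drained, which is the value the allocator would use as its request.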
Second stage
The second stage has two units working in parallel: the look-ahead routing unit and the allocator unit. The two units are linked through intermediate logic. The allocator unit issues a grant for each port. This signal is fed back both to the first stage and to a second-stage multiplexer, where it acts as the selector. The mux receives the outputs of all the virtual channels of that port and, based on the selection signal, forwards a single flit. This output flit goes to the look-ahead routing unit, which calculates the next-hop destination port.
Allocation
The allocation unit grants a flit permission to move toward a specific virtual channel of a specific port, resolving contention for virtual channels and crossbar ports. Each allocator is a two-stage input-first separable allocator, which needs fewer components than other allocator types.
The overall unit receives as many allocation requests as there are ports. Each request asks for a destination port grant for each of its own virtual channels, so the total number of request lines is P x V x P. The allocation produces two outputs for each port: (1) the winning destination port, which drives the crossbar selection; (2) the winning virtual channel, which is fed back to move the proper flit to the crossbar input.
The allocation unit has to respect the following rules:
- the packets can move only in their respective virtual channel;
- a virtual channel can request only one port at a time;
- the physical link can be interleaved by flits belonging to different flows;
- when a packet acquires a virtual channel on an output port, no other packets on different input ports can acquire that virtual channel on that output port.
Allocator core
Virtual channel allocation and switch allocation are logically the same, so the shared logic is encapsulated in a unit called the allocator core. It is simply a parameterizable number of parallel arbiters whose inputs and outputs are properly scrambled and whose outputs are OR-ed to obtain a port-granularity grant.
Unlike the arbiters in the other stages, each arbiter here is a round-robin arbiter with a grant-hold circuit. This gives the winner uninterrupted use of the resource it obtained, which is needed in particular to respect one of the VC allocation rules.
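rr_arbiter u_rr_arbiter (
    .request    ( request ),
    .update_lru ( '{default : '0} ),
    .grant_oh   ( grant_arb ),
    .*
);

// Keep the previous grant for as long as its holder asserts hold_in.
assign grant_oh = anyhold ? hold : grant_arb;
assign hold     = last & hold_in;
assign anyhold  = |hold;

// Remember the grant issued in the previous cycle.
always_ff @( posedge clk, posedge reset )
    if ( reset )
        last <= '0;
    else
        last <= grant_oh;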
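The grant-hold behavior can be checked with a cycle-level model. The sketch below is a Python illustration, not the RTL: the class `GrantHoldArbiter` and its `step` method are hypothetical, and the rotate-after-grant pointer policy is an assumption (the page does not show the internals of `rr_arbiter`).

```python
class GrantHoldArbiter:
    """Round-robin arbiter that keeps the last grant while its owner holds it."""

    def __init__(self, n):
        self.n = n
        self.pointer = 0          # index with highest priority at the next arbitration
        self.last = [False] * n   # one-hot grant issued in the previous cycle

    def _round_robin(self, request):
        """Return the first requester at or after the pointer, wrapping around."""
        for offset in range(self.n):
            i = (self.pointer + offset) % self.n
            if request[i]:
                return i
        return None

    def step(self, request, hold_in):
        """Advance one clock cycle and return the one-hot grant vector."""
        # hold = last & hold_in
        hold = [l and h for l, h in zip(self.last, hold_in)]
        if any(hold):
            grant = hold  # the previous winner keeps the resource
        else:
            grant = [False] * self.n
            winner = self._round_robin(request)
            if winner is not None:
                grant[winner] = True
                # Assumed policy: the winner gets lowest priority next time.
                self.pointer = (winner + 1) % self.n
        self.last = grant
        return grant
```

With two competing requesters, the first winner keeps its grant for as long as it asserts its hold bit; the other requester is served only after the hold is released.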
Virtual channel allocation
The first arbitration step of the virtual channel allocator is omitted, because each virtual channel is assumed to request only one port at a time. Under this assumption a first-stage arbitration is useless, so only the second stage is implemented, through an instance of the allocator core.
Using grant-hold arbiters in the second stage prevents a packet from losing its grant when other requests arrive afterwards. The on-off input signal is used to prevent a flit from being sent to a full virtual channel in the next node.
Switch allocation
The switch allocator receives as input the output signals of the VC allocation and all the port requests. For each port, one signal is asserted for each winning virtual channel. These winners now compete for switch allocation, which requires two arbiter stages. The first stage has as many round-robin arbiters as there are input ports. Each round-robin arbiter chooses one VC per port, and the result selects the requested port associated with the winning VC. The winning port request goes to the input of the second-stage arbiter, together with the winning requests from the other ports. The second-stage arbiter is an instance of the allocator core and chooses which input port can access each physical link. Its grant signal is important for two reasons: (1) it is AND-ed with the winning VC of each port and sent back to the first-stage round-robin unit; (2) it is registered, AND-ed with the winning destination port, and used as the crossbar selection signal for each port.
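The two arbitration stages can be sketched as a behavioral model. This Python illustration is a simplified reading of the description above, not the RTL: `InputFirstAllocator`, `rr_pick`, and the request encoding are hypothetical, grant-hold is left out, and the round-robin pointers are assumed to advance only for requests that win both stages.

```python
def rr_pick(candidates, pointer, n):
    """Return the first index in `candidates` at or after `pointer`, wrapping."""
    for offset in range(n):
        i = (pointer + offset) % n
        if i in candidates:
            return i
    return None


class InputFirstAllocator:
    """Two-stage input-first separable allocator (behavioral sketch)."""

    def __init__(self, ports, vcs):
        self.ports, self.vcs = ports, vcs
        self.vc_ptr = [0] * ports  # stage-1 pointer, one per input port
        self.in_ptr = [0] * ports  # stage-2 pointer, one per output port

    def allocate(self, requests):
        """requests[p][v] is the output port wanted by VC v of input p, or None.

        Returns {input_port: (winning_vc, output_port)} for this cycle.
        """
        # Stage 1: each input port elects one of its requesting VCs.
        stage1 = {}
        for p in range(self.ports):
            wanting = {v for v in range(self.vcs) if requests[p][v] is not None}
            v = rr_pick(wanting, self.vc_ptr[p], self.vcs)
            if v is not None:
                stage1[p] = (v, requests[p][v])

        # Stage 2: each output port elects one input port among the stage-1 winners.
        grants = {}
        for out in range(self.ports):
            contenders = {p for p, (_, o) in stage1.items() if o == out}
            p = rr_pick(contenders, self.in_ptr[out], self.ports)
            if p is not None:
                v = stage1[p][0]
                grants[p] = (v, out)
                # Feedback: only final winners rotate the priorities.
                self.in_ptr[out] = (p + 1) % self.ports
                self.vc_ptr[p] = (v + 1) % self.vcs
        return grants
```

For example, with `requests = [[2, None], [2, 0], [None, None]]` inputs 0 and 1 both elect a VC that wants output 2, so only one of them is granted per cycle, even though VC 1 of input 1 wanted the free output 0. This lost matching is the usual price of a separable allocator in exchange for its small hardware cost.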
Flit handler
A mux uses the granted_vc signal to forward one of the input flits to the output register. This flit then goes to the crossbar input port.
always_comb begin
    flit_in_granted_mod[i] = flit_in_granted[i];
    if ( flit_in_granted[i].flit_type == HEADER || flit_in_granted[i].flit_type == HT )
        flit_in_granted_mod[i].next_hop_port = port_t'( lk_next_port[i] );
end
Next hop routing calculation
The look-ahead routing unit calculates the destination port for the next node rather than for the current one, because the current destination port is already available in the header flit. The algorithm is a version of deterministic X-Y routing. It is deadlock-free because it forbids four of the eight possible turns: once a packet has turned into the Y direction, it cannot turn again.
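A minimal sketch of look-ahead X-Y routing follows, written in Python for illustration rather than taken from the RTL. The port names, the `(x, y)` coordinate tuples, and the convention that y grows toward SOUTH are assumptions; the function names are hypothetical.

```python
LOCAL, EAST, NORTH, WEST, SOUTH = "LOCAL", "EAST", "NORTH", "WEST", "SOUTH"

# Assumed mesh orientation: x grows toward EAST, y grows toward SOUTH.
STEP = {EAST: (1, 0), WEST: (-1, 0), NORTH: (0, -1), SOUTH: (0, 1), LOCAL: (0, 0)}


def xy_port(node, dest):
    """Plain X-Y routing: finish all X hops first, then move along Y."""
    x, y = node
    dx, dy = dest
    if dx > x:
        return EAST
    if dx < x:
        return WEST
    if dy > y:
        return SOUTH
    if dy < y:
        return NORTH
    return LOCAL


def lookahead_port(node, current_port, dest):
    """Port the flit will need at the NEXT node, given the port it takes here."""
    sx, sy = STEP[current_port]
    next_node = (node[0] + sx, node[1] + sy)
    return xy_port(next_node, dest)


def route(src, dest):
    """Trace the output ports used hop by hop, ending with LOCAL."""
    ports, node = [], src
    port = xy_port(src, dest)  # computed once at injection
    while True:
        ports.append(port)
        if port == LOCAL:
            return ports
        next_port = lookahead_port(node, port, dest)
        sx, sy = STEP[port]
        node = (node[0] + sx, node[1] + sy)
        port = next_port  # carried in the header flit to the next router
```

Tracing any route with this model shows that X hops always precede Y hops, so a packet never turns from a Y direction back into an X direction.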