Difference between revisions of "Network interface"

From NaplesPU Documentation
Latest revision as of 11:05, 2 July 2019

The network interface implementation is discussed on this page.

The Network Interface (NI) abstracts the details of network communication, providing a high-level interface to all the components in the tile that need to communicate. It splits application messages (such as coherence-related messages) into multiple flits to be injected into the local router.

The virtual channels used by tile components are reported below.

[Figure: Virtual channel usage]

Because of the coherence protocol implemented, both the directory controller and the cache controller might require access to the Response virtual channel. The network interface handles this by storing incoming requests in FIFOs and arbitrating between concurrent requests.

Another main feature supported is multicast addressing. The directory controller, in some cases, might send the same packet to multiple recipients (e.g. ACKs). In the current implementation, it is handled by the Network Interface as a sequence of unicast messages.

General architecture

The network interface has a regular internal structure, because ejection and injection are implemented as two distinct functions, which allows a modular design. Moreover, the ejection and injection logic is almost the same for every virtual channel.

For this reason, two separate modules are provided, which together handle the whole network communication for a single virtual channel:

  • virtual_network_core_to_net, which handles flit injection;
  • virtual_network_net_to_core, which handles ejection.

Both modules are parameterized and adapt easily to the needs of different virtual channels. In particular, the virtual channel ID and the packet length must be specified.

As the router ejects flits toward the tile, only the ejection module assigned to that specific virtual channel reassembles them and buffers the reconstructed packet until the corresponding component is ready to consume it. The instantiation for virtual channel 0 is reported below.

// --- Request Virtual Network VC0 --- //
virtual_network_net_to_core # (
	.VCID             ( VC0                                   ),
	.PACKET_BODY_SIZE ( $bits ( coherence_request_message_t ) ),
	.FLIT_NUMB        ( `RESP_FLIT_NUMB                       ),
	...
	...
)
request_virtual_network_net_to_core (
	...
	//Cache Controller interface
	.vn_ntc_packet_out    ( ni_request           ),
	.vn_ntc_packet_valid  ( ni_request_valid     ),
	.core_packet_consumed ( dc_request_consumed  ),
	//Router interface
	.vn_ntc_credit        ( ni_credit[VC0]       ),
	.router_flit_valid    ( router_flit_in_valid ),
	.router_flit_in       ( router_flit_in       )
);

Note that the packet body size parameter is linked with the flit number parameter, but the module handles them separately.
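The link between the two parameters can be sketched in Python as a ceiling division. This is a behavioral illustration only, assuming that the flit count equals the packet body size divided by the flit payload width, rounded up; the function name and the bit widths used below are hypothetical and not taken from the RTL.

```python
def flits_per_packet(packet_body_bits: int, flit_payload_bits: int) -> int:
    """Number of flits needed to carry a packet body (ceiling division).

    Both arguments are widths in bits. The values used in the usage lines
    are illustrative and are not taken from the NaplesPU sources.
    """
    if packet_body_bits <= 0 or flit_payload_bits <= 0:
        raise ValueError("sizes must be positive")
    # -(-a // b) is integer ceiling division without floating point
    return -(-packet_body_bits // flit_payload_bits)


# Usage: a 256-bit body over a hypothetical 64-bit payload needs 4 flits,
# and a single extra bit forces a fifth flit.
print(flits_per_packet(256, 64))
print(flits_per_packet(257, 64))
```

Because the module takes both values as independent parameters, whoever instantiates it has to keep them consistent.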

On the other side, the injection logic buffers outgoing packets, splits them into flits, and competes with the other VCs for access to the single router local port. Access to the local injection port is granted by a round-robin arbiter with grant-and-hold circuitry. The index of the granted virtual channel drives the select input of a multiplexer, which forwards the corresponding flit to the router.

assign vno_requests =
	{vn_packet_pending[ VC3 ] & ~router_credit[VC3],
		vn_packet_pending[ VC2 ] & ~router_credit[VC2],
		vn_packet_pending[ VC1 ] & ~router_credit[VC1],
		vn_packet_pending[ VC0 ] & ~router_credit[VC0]};

rr_arbiter # (
	.NUM_REQUESTERS ( `VC_PER_PORT )
)
ni_request_rr_arbiter (
	.clk        ( clk          ),
	.reset      ( reset        ),
	.request    ( vno_requests ),
	.update_lru ( 1'b1         ),
	.grant_oh   ( vno_granted  )
) ;

oh_to_idx # (
	.NUM_SIGNALS ( `VC_PER_PORT ),
	.DIRECTION   ( "LSB0"       )
)
ni_request_grant_oh_to_idx (
	.one_hot ( vno_granted    ),
	.index   ( vco_granted_id )
);

assign ni_flit_out    = vn_flit_out[vco_granted_id],
	ni_flit_out_valid = vn_flit_valid[vco_granted_id];
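The arbitration step above can be modelled behaviorally in Python. This is a minimal sketch, not the RTL: it assumes that a high router_credit bit means the virtual channel is back-pressured (as the `~router_credit` mask suggests) and that the arbiter moves its priority to the requester following the last one granted. The class and function names are hypothetical.

```python
class RoundRobinArbiter:
    """Behavioral model of a round-robin arbiter (assumed policy)."""

    def __init__(self, num_requesters):
        self.n = num_requesters
        self.next_priority = 0  # index examined first at the next arbitration

    def arbitrate(self, requests, update=True):
        """Return the granted index, or None if nobody is requesting."""
        for offset in range(self.n):
            idx = (self.next_priority + offset) % self.n
            if requests[idx]:
                if update:
                    # the requester after the winner gets top priority next time
                    self.next_priority = (idx + 1) % self.n
                return idx
        return None


def vc_requests(packet_pending, router_credit):
    """A VC requests the port when it has a packet and is not back-pressured."""
    return [bool(p) and not bool(c) for p, c in zip(packet_pending, router_credit)]


# Usage: VC0, VC1 and VC3 have packets pending, but VC3 is back-pressured.
arbiter = RoundRobinArbiter(4)
requests = vc_requests([1, 1, 0, 1], [0, 0, 0, 1])
grants = [arbiter.arbitrate(requests) for _ in range(3)]
print(grants)  # the two eligible VCs alternate
```

The granted index plays the role of vco_granted_id, which selects the flit forwarded to the router.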

Special care must be given to the response virtual channel. Two injection modules and two ejection modules are instantiated for this virtual channel: one pair interfaces with the cache controller and the other with the directory controller.

The ejection module has a parameter that tells it whether it is interfacing with the directory controller or the cache controller. When this parameter is set, the module checks the core_destination field in the flit (see flit structure) to decide whether to ignore the flit.

The injection modules are not aware of who is using them. For this reason, the two of them compete for access to the virtual channel (and the winner then competes with the other virtual channels for the router input port). A round-robin arbiter with grant-and-hold circuitry is used, which ensures that once one of the controllers gains access to the virtual channel, it retains it until the full request has been sent.

// each bit is high respectively if the Cache Controller or the Directory Controller wants to inject a packet
assign pending_tmp = {response_in[CC_ID].vn_packet_pending, response_in[DC_ID].vn_packet_pending};

// the arbiter chooses among the two of them
grant_hold_rr_arbiter #(
	.NUM_REQUESTERS( 2 )
)
response_vn_rr_arbiter (
	.clk      ( clk                  ),
	.reset    ( reset                ),
	.request  ( pending_tmp          ),
	.hold_in  ( pending_tmp          ),
	.grant_oh ( response_vn_grant_oh )
);

// the arbitration result is used to select the winning flit, and the signals are updated accordingly
assign vn_packet_pending[ VC1 ] = |response_vn_grant_oh ;
assign vn_flit_out[VC1]         = response_vn_grant_oh[0]? response_in[DC_ID].vn_flit_out   : response_in[CC_ID].vn_flit_out;
assign vn_flit_valid[VC1]       = response_vn_grant_oh[0]? response_in[DC_ID].vn_flit_valid : response_in[CC_ID].vn_flit_valid;
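The grant-and-hold behaviour can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the grant_hold_rr_arbiter source: the current holder is assumed to keep the grant for as long as its hold bit stays high, and a new round-robin arbitration happens only once it drops. Index 0 stands for the directory controller and index 1 for the cache controller, matching the bit order of pending_tmp; the class name is hypothetical.

```python
class GrantHoldArbiter:
    """Round-robin arbiter whose winner keeps the grant while it holds."""

    def __init__(self, num_requesters):
        self.n = num_requesters
        self.holder = None       # index currently owning the grant, if any
        self.next_priority = 0   # index examined first when re-arbitrating

    def arbitrate(self, requests, hold):
        # The current owner keeps the grant as long as its hold bit is set.
        if self.holder is not None and hold[self.holder]:
            return self.holder
        # Otherwise release the grant and run a fresh round-robin arbitration.
        self.holder = None
        for offset in range(self.n):
            idx = (self.next_priority + offset) % self.n
            if requests[idx]:
                self.holder = idx
                self.next_priority = (idx + 1) % self.n
                return idx
        return None


# Usage: DC (index 0) wins first and keeps the channel for two cycles,
# then CC (index 1) takes over once DC has nothing left to send.
DC, CC = 0, 1
arbiter = GrantHoldArbiter(2)
trace = []
for pending in ([True, True], [True, True], [False, True], [True, True]):
    trace.append(arbiter.arbitrate(pending, pending))
print(trace)
```

As in the snippet above, the same pending vector is used for both the request and the hold input.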

[Figure: Network Interface]

Network to core module

This module is composed of two parts: a control unit which handles the incoming flits, rebuilding them as a packet; and a queue of rebuilt packets.

Control unit

The control unit uses registers to hold the intermediate results of the rebuilding process.

A counter tracks how many flits have already been received, and its value selects the position at which the next flit payload is stored in the packet being rebuilt.

logic       [$clog2( FLIT_NUMB ) - 1 : 0] count;

flit_body_t [FLIT_NUMB - 1 : 0]           rebuilt_packet;

...

if (router_flit_valid) begin
	rebuilt_packet[count] <= router_flit_in.payload;
	
	if (router_flit_in.flit_type == TAIL || router_flit_in.flit_type == HT) begin
		count <= '{default: '0};
		cu_packet_rebuilt_compl <= 1'b1;
	end else
		count <= count + 1;
...

Note that the control logic also detects whether the flits are addressed to the cache controller or to the directory controller, since both are connected to the same response virtual channel.

	if (router_flit_in.flit_type == HEADER || router_flit_in.flit_type == HT) begin
		cu_is_for_cc <= router_flit_in.core_destination == TO_CC;
		cu_is_for_dc <= router_flit_in.core_destination == TO_DC;
	end 

Rebuilt packets queue

Rebuilt packets are stored in a FIFO, so we can enqueue multiple requests. When the receiver component is ready to handle the request, it will assert the core_packet_consumed signal, freeing one buffer slot.

The back-pressure signal is raised when only two free buffer slots remain. This covers the worst case, in which a sequence of 1-flit packets is incoming and some of them are still in the pipeline stages and must not be lost. There are two such stages in between: the router crossbar and the control unit of this module.

sync_fifo #(
	.WIDTH                 ( PACKET_BODY_SIZE     ),
	.SIZE                  ( PACKET_FIFO_SIZE     ),
	.ALMOST_FULL_THRESHOLD ( PACKET_FIFO_SIZE - 2 ) 
)
rebuilt_packet_fifo (
	...
	.almost_full ( packet_alm_fifo_full      ),
	.enqueue_en  ( enqueue_en                ),
	.value_i     ( cu_rebuilt_packet         ),
	.empty       ( rebuilt_packet_fifo_empty ),
	.almost_empty(                           ),
	.dequeue_en  ( core_packet_consumed      ),
	.value_o     ( vn_ntc_packet_out         )
);

assign vn_ntc_credit       = packet_alm_fifo_full;
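The choice of the threshold can be checked with a small cycle-based Python model. This is only a sketch under simplifying assumptions: the sender injects a 1-flit packet every cycle unless it sees the stop signal, the stop signal is asserted when the occupancy reaches the threshold, two pipeline stages separate the sender from the FIFO, and the consumer never drains the FIFO (worst case). The function and parameter names are hypothetical.

```python
def max_occupancy(fifo_size, almost_full_threshold, in_flight_stages=2, cycles=64):
    """Peak FIFO occupancy under a continuous stream of 1-flit packets.

    fifo_size is only documented here for readability: the model does not
    clamp the occupancy, so a peak above fifo_size means packets were lost.
    """
    occupancy = 0
    peak = 0
    pipe = [False] * in_flight_stages  # pipe[-1] reaches the FIFO this cycle
    for _ in range(cycles):
        stop = occupancy >= almost_full_threshold  # back-pressure seen by the sender
        arriving = pipe[-1]
        # shift the pipeline; the sender injects a packet unless it is stopped
        pipe = [not stop] + pipe[:-1]
        if arriving:
            occupancy += 1
        peak = max(peak, occupancy)
    return peak


# Usage: with 8 slots, a threshold of SIZE - 2 fills the FIFO exactly,
# while a threshold of SIZE - 1 would need a ninth slot.
print(max_occupancy(8, 8 - 2))
print(max_occupancy(8, 8 - 1))
```

In this model the two packets already in flight when the stop signal rises are exactly absorbed by the two slots left free by the threshold.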

The enqueue signal is generated from the virtual channel ID of the incoming flit. For the reason explained above, the only exceptions are the modules serving the cache and directory controllers. In those cases, the module also checks that the flit is addressed to the cache or directory controller, and enqueues the packet according to the TYPE parameter.

generate
	if ( TYPE == "CC" )
		assign
			enqueue_en    = cu_packet_rebuilt_compl & cu_is_for_cc;
	else if ( TYPE == "DC" )
		assign enqueue_en = cu_packet_rebuilt_compl & cu_is_for_dc;
	else
		assign enqueue_en = cu_packet_rebuilt_compl;
endgenerate

Example

If the incoming packet arrives as this sequence of flits:

1st Flit = {FLIT_TYPE_HEAD, FLIT_BODY_SIZE'h20}
2nd Flit = {FLIT_TYPE_BODY, FLIT_BODY_SIZE'h40}
3rd Flit = {FLIT_TYPE_BODY, FLIT_BODY_SIZE'h60}
4th Flit = {FLIT_TYPE_TAIL, FLIT_BODY_SIZE'h10}

The rebuilt packet passed to the Cache Controller is:

Packet = {FLIT_BODY_SIZE'h10, FLIT_BODY_SIZE'h60, FLIT_BODY_SIZE'h40, FLIT_BODY_SIZE'h20}
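The same example can be reproduced with a behavioral Python model of the control unit. It is a sketch rather than a translation of the RTL: the Flit and PacketRebuilder classes are hypothetical, flit types and destinations are plain strings, and the rebuilt packet is returned as a list in arrival order (the concatenation shown above lists the same payloads starting from the last one received).

```python
from dataclasses import dataclass


@dataclass
class Flit:
    flit_type: str                   # "HEADER", "BODY", "TAIL" or "HT"
    payload: int
    core_destination: str = "TO_CC"  # "TO_CC" or "TO_DC"


class PacketRebuilder:
    """Collects flit payloads and emits a packet on TAIL or HT."""

    def __init__(self, flit_numb, module_type="OTHER"):
        self.module_type = module_type  # "CC", "DC" or anything else
        self.count = 0                  # flits received for the current packet
        self.chunks = [0] * flit_numb
        self.is_for_cc = False
        self.is_for_dc = False

    def push(self, flit):
        """Consume one flit; return the packet if it is complete and accepted."""
        if flit.flit_type in ("HEADER", "HT"):
            # the destination is latched from the first flit of a packet
            self.is_for_cc = flit.core_destination == "TO_CC"
            self.is_for_dc = flit.core_destination == "TO_DC"
        self.chunks[self.count] = flit.payload
        if flit.flit_type not in ("TAIL", "HT"):
            self.count += 1
            return None
        packet = self.chunks[: self.count + 1]
        self.count = 0
        # mirror of the enqueue condition selected by the TYPE parameter
        if self.module_type == "CC":
            accepted = self.is_for_cc
        elif self.module_type == "DC":
            accepted = self.is_for_dc
        else:
            accepted = True
        return packet if accepted else None


# Usage: the four flits of the example, seen by the cache controller module.
flits = [Flit("HEADER", 0x20), Flit("BODY", 0x40), Flit("BODY", 0x60), Flit("TAIL", 0x10)]
rebuilder = PacketRebuilder(flit_numb=4, module_type="CC")
for flit in flits:
    packet = rebuilder.push(flit)
print([hex(chunk) for chunk in packet])
```

A module configured for the directory controller would see the same flits but discard the packet, since the header is addressed to the cache controller.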

Core to network module

This module splits a packet into flits and sends them to the router local port. It also supports multicasting, implemented as multiple unicast messages. It is composed of two parts: a packet queue, and a control unit which handles the outgoing flits.

Request queue

Incoming requests are enqueued in a FIFO while they wait for the control unit to handle them. The FIFO also provides a stop signal to the requesting device in case it fills up.

The enqueued structure is composed of a packet body along with all the recipients of the message.

typedef struct packed {
	logic [PACKET_BODY_SIZE - 1 : 0] packet_body;
	logic packet_has_data;
	tile_address_t [DEST_NUMB - 1 : 0 ] packet_destinations;
	logic [DEST_NUMB - 1 : 0 ] packet_destinations_valid;
} packet_information_t;

A request will be enqueued when the requester asserts the packet_valid signal, and the head of the FIFO will be dequeued when the control unit notifies completion.

sync_fifo # (
	.WIDTH                 ( $bits ( packet_information_t ) ),
	.SIZE                  ( PACKET_FIFO_SIZE               ),
	.ALMOST_FULL_THRESHOLD ( PACKET_ALMOST_FULL_THRESHOLD   )
)
packet_in_fifo (
	...
	.almost_full  ( vn_packet_fifo_full    ),
	.enqueue_en   ( packet_valid           ),
	.value_i      ( packet_information_in  ),
	.empty        ( packet_fifo_empty      ),
	.almost_empty (                        ),
	.dequeue_en   ( cu_packet_dequeue      ),
	.value_o      ( packet_information_out )
) ;

A request signal is generated whenever the FIFO is not empty, which allows this module to compete for router port access.

assign packet_pending                               = ~packet_fifo_empty;
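The queue behaviour described in this section can be summarised with a small Python model. It is a sketch with hypothetical names: the almost-full flag is assumed to rise when the occupancy reaches the threshold, and the size and threshold used in the usage lines are illustrative, since the actual values of PACKET_FIFO_SIZE and PACKET_ALMOST_FULL_THRESHOLD are not given on this page.

```python
from collections import deque


class RequestQueue:
    """Bounded FIFO exposing the stop and request flags used by the module."""

    def __init__(self, size, almost_full_threshold):
        self.size = size
        self.threshold = almost_full_threshold
        self.items = deque()

    @property
    def almost_full(self):
        # stop signal for the requesting device (assumed >= semantics)
        return len(self.items) >= self.threshold

    @property
    def packet_pending(self):
        # request signal: something is waiting to be injected
        return len(self.items) > 0

    def enqueue(self, packet_info):
        if len(self.items) >= self.size:
            raise OverflowError("enqueue on a full FIFO")
        self.items.append(packet_info)

    def head(self):
        """Packet currently being served by the control unit."""
        return self.items[0]

    def dequeue(self):
        """Called when the control unit notifies completion."""
        return self.items.popleft()


# Usage with illustrative sizes: the stop flag rises at the third entry.
queue = RequestQueue(size=4, almost_full_threshold=3)
for name in ("req_a", "req_b", "req_c"):
    queue.enqueue(name)
print(queue.packet_pending, queue.almost_full, queue.head())
```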

Control unit

The control unit's responsibilities are:

  • split a packet into flits;
  • determine the type (head, body, tail, head-tail) of each flit;
  • calculate a pre-routing of the outgoing flits, as routers implement routing look-ahead (see Network);
  • properly handle multicast messages, if required;
  • account for router back-pressure signals.

control_unit_packet_to_flit # (
	parameter DEST_OH          = "TRUE",
	...
	parameter PACKET_BODY_SIZE = 256,
	parameter DEST_NUMB        = 4 )
(
	...
	input  logic                                                packet_valid,
	input  logic                                                packet_has_data,
	input  tile_address_t [DEST_NUMB - 1 : 0]                   packet_destinations,
	input  logic          [DEST_NUMB - 1 : 0]                   packet_destinations_valid,

	input  logic                                                flit_credit,

	output logic          [$clog2 ( PACKET_BODY_SIZE ) - 1 : 0] cu_packet_chunck_sel,
	output logic                                                cu_flit_valid,
	output flit_header_t                                        cu_flit_out_header,
	output logic                                                cu_packet_dequeue

);

A request is considered fulfilled when a unicast message has been sent to all its recipients. Recipients are served in a round-robin fashion.

rr_arbiter # (
	.NUM_REQUESTERS ( DEST_NUMB )
)
rr_arbiter (
	.clk        ( clk                  ) ,
	.reset      ( reset                ) ,
	.request    ( packet_dest_pending  ) ,
	.update_lru ( 1'b0                 ) ,
	.grant_oh   ( destination_grant_oh )
);

Recipients that have already been served are recorded in a bit mask, which is updated with the grant signal after each message is sent.

dest_served <= dest_served | destination_grant_oh;

A second bit mask, derived from it, tracks the remaining recipients.

assign packet_dest_pending                 = packet_destinations_valid & ~dest_served;
assign packet_has_dest_pending             = |packet_dest_pending;
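The interplay between the two masks can be sketched in Python. This is a behavioral illustration with a hypothetical function name: the arbiter is modelled with a priority pointer that stays fixed while a packet is being served, which is an assumption based on the update_lru input being tied to 1'b0 in the instance above.

```python
def unicast_order(destinations_valid, start_priority=0):
    """Order in which the recipients of one multicast packet are served.

    destinations_valid is a list of booleans, one per destination slot.
    """
    n = len(destinations_valid)
    served = [False] * n
    order = []
    while True:
        # packet_dest_pending = packet_destinations_valid & ~dest_served
        pending = [v and not s for v, s in zip(destinations_valid, served)]
        if not any(pending):  # packet_has_dest_pending is low: request fulfilled
            return order
        # grant the first pending destination starting from the priority pointer
        grant = next((start_priority + k) % n
                     for k in range(n)
                     if pending[(start_priority + k) % n])
        order.append(grant)
        served[grant] = True  # dest_served <= dest_served | destination_grant_oh


# Usage: destinations 0, 2 and 3 are valid, so three unicast messages are sent.
print(unicast_order([True, False, True, True]))
```

The request can be dequeued once the returned list covers every valid destination.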

Routing is done for each recipient, based on the destination address.

The parameter DEST_OH determines how multicast addresses are generated. If DEST_OH is true, the module treats the input signal packet_destination as a bit mask in which every tile has a corresponding position. Otherwise, if DEST_OH is false, multicast addresses are passed to the module through the packet_destinations input signal, and packet_destinations_valid indicates which positions in the array contain a valid address.

generate
	if ( DEST_OH == "TRUE" ) begin
		assign
			real_dest.x  = destination_grant_id[`TOT_X_NODE_W - 1 : 0 ],
			real_dest.y  = destination_grant_id[`TOT_Y_NODE_W + `TOT_X_NODE_W - 1 : `TOT_X_NODE_W];
	end else begin
		assign real_dest = packet_destinations[destination_grant_id];
	end
endgenerate
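For the DEST_OH case, the bit slicing above can be mirrored in Python. This is a sketch under assumptions: destination_grant_id is taken to be the index of the granted bit, with the x coordinate in its low bits and the y coordinate just above, as the two slices suggest. The function names and the 2-bit widths in the usage lines are hypothetical.

```python
def one_hot_to_index(one_hot):
    """Index of the single set bit, with bit 0 as the least significant."""
    if one_hot <= 0 or one_hot & (one_hot - 1):
        raise ValueError("expected exactly one bit set")
    return one_hot.bit_length() - 1


def decode_tile_id(grant_id, x_width, y_width):
    """Split a tile index into (x, y) coordinates by bit slicing."""
    x = grant_id & ((1 << x_width) - 1)               # low x_width bits
    y = (grant_id >> x_width) & ((1 << y_width) - 1)  # next y_width bits
    return x, y


# Usage on a hypothetical 4x4 mesh (2 bits per coordinate):
grant_id = one_hot_to_index(0b0100_0000)  # bit 6 granted
print(decode_tile_id(grant_id, x_width=2, y_width=2))
```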

A finite state machine handles the remaining actions: it keeps count of how many flits have been sent so far and selects which chunk of the current packet becomes the payload of the next flit.
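The flit sequence produced for one recipient can be sketched in Python. This is a behavioral model and not the state machine itself: it assumes that a packet without data is sent as a single head-tail flit and that chunks are taken from the least significant bits upward, consistent with the reassembly example given earlier. The function name and the bit widths are hypothetical.

```python
def split_packet(body, body_bits, payload_bits, packet_has_data):
    """Return the (flit_type, payload) pairs generated for one recipient."""
    # a packet without data is assumed to fit in one head-tail flit
    flit_count = -(-body_bits // payload_bits) if packet_has_data else 1
    mask = (1 << payload_bits) - 1
    flits = []
    for chunk_sel in range(flit_count):
        if flit_count == 1:
            flit_type = "HT"
        elif chunk_sel == 0:
            flit_type = "HEADER"
        elif chunk_sel == flit_count - 1:
            flit_type = "TAIL"
        else:
            flit_type = "BODY"
        # chunk_sel plays the role of the chunk selector output
        payload = (body >> (chunk_sel * payload_bits)) & mask
        flits.append((flit_type, payload))
    return flits


# Usage: a 32-bit body sent over a hypothetical 8-bit payload.
for flit_type, payload in split_packet(0x10604020, 32, 8, packet_has_data=True):
    print(flit_type, hex(payload))
```

For a multicast packet, the same sequence would be generated once per pending recipient, each time with a different header destination.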