Difference between revisions of "Heterogeneous Tile"
Revision as of 17:59, 13 May 2019
The nu+ project provides a heterogeneous tile integrated into the NoC, meant to be extended by the user. Such a tile provides a first example of how to integrate a custom module in the network-on-chip with a dedicated tile.
Memory Interface
The Memory Interface provides a transparent way to interact with the coherence system. It implements a simple valid/available handshake per thread: different threads may issue different memory transactions, and the coherence system handles them concurrently.
When a thread has a memory request, it first checks the availability bit related to its ID. If this bit is high, the thread issues a memory transaction by setting the valid bit and loading all the needed information on the Memory Interface.
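The handshake described above can be sketched as a small behavioral model in Python. This is not the nu+ RTL: the class `MemoryInterfaceModel` and its method `try_issue` are hypothetical names introduced only for illustration, and the model assumes that every availability bit starts high, which the text does not state.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryInterfaceModel:
    """Behavioral sketch of the per-thread valid/available handshake (not the RTL)."""

    num_threads: int
    available: list = field(default_factory=list)  # one availability bit per thread
    issued: list = field(default_factory=list)     # requests accepted so far

    def __post_init__(self):
        # Assumption: every thread starts with its availability bit high.
        self.available = [True] * self.num_threads

    def try_issue(self, thread_id, op, address, data=0):
        """Raise 'valid' for this thread only if its availability bit is high."""
        if not self.available[thread_id]:
            return False  # the thread holds its request and retries later
        self.issued.append(
            {"thread_id": thread_id, "op": op, "address": address, "data": data}
        )
        return True


iface = MemoryInterfaceModel(num_threads=4)
iface.available[2] = False  # pretend the interface cannot accept thread 2 right now

# Each thread checks its own bit, so thread 2 is refused while the others proceed.
accepted = [iface.try_issue(t, op=0x0, address=0x100 + t) for t in range(4)]
print(accepted)
```

The point of the per-thread bit is that a thread that cannot issue does not block the others: each thread looks only at its own position.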
Supported memory operations are reported below along with their opcodes:
LOAD_8     = 'h0  - 'b000000
LOAD_16    = 'h1  - 'b000001
LOAD_32    = 'h2  - 'b000010
LOAD_V_8   = 'h7  - 'b000111
LOAD_V_16  = 'h8  - 'b001000
LOAD_V_32  = 'h9  - 'b001001
STORE_8    = 'h20 - 'b100000
STORE_16   = 'h21 - 'b100001
STORE_32   = 'h22 - 'b100010
STORE_V_8  = 'h24 - 'b100100
STORE_V_16 = 'h25 - 'b100101
STORE_V_32 = 'h26 - 'b100110
A custom core to be integrated in the nu+ system ought to implement the following interface in order to communicate with the memory system:
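For host-side tooling (for example, a script that decodes a simulation trace), the opcode table can be mirrored in a Python enumeration. This is only a sketch: `MemOp`, `is_store`, and `is_vector` are hypothetical names. The store test relies on an observation about the listed values (every STORE_* opcode has bit 5 set and every LOAD_* has it clear), not on a documented decoding rule.

```python
from enum import IntEnum


class MemOp(IntEnum):
    """Opcodes copied from the table above."""

    LOAD_8 = 0x00
    LOAD_16 = 0x01
    LOAD_32 = 0x02
    LOAD_V_8 = 0x07
    LOAD_V_16 = 0x08
    LOAD_V_32 = 0x09
    STORE_8 = 0x20
    STORE_16 = 0x21
    STORE_32 = 0x22
    STORE_V_8 = 0x24
    STORE_V_16 = 0x25
    STORE_V_32 = 0x26


def is_store(op):
    # Observation from the listed values only: bit 5 separates stores from loads.
    return bool(int(op) & 0x20)


def is_vector(op):
    # The vector variants are identified here by name, not by a bit pattern.
    return "_V_" in MemOp(op).name


for op in MemOp:
    kind = "store" if is_store(op) else "load"
    print(f"{op.name:<10} 0x{int(op):02x} {format(int(op), '06b')} {kind}")
```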
/* Memory Interface */

// To Heterogeneous LSU
output logic                          req_out_valid,       // Valid signal for issued memory requests
output logic [31 : 0]                 req_out_id,          // ID of the issued request, mainly used for debugging
output logic [THREAD_IDX_W - 1 : 0]   req_out_thread_id,   // Thread ID of issued request. Requests running on different threads are dispatched to the CC concurrently
output logic [7 : 0]                  req_out_op,          // Operation performed
output logic [ADDRESS_WIDTH - 1 : 0]  req_out_address,     // Issued request address
output logic [DATA_WIDTH - 1 : 0]     req_out_data,        // Data output

// From Heterogeneous LSU
input  logic                          resp_in_valid,       // Valid signal for the incoming responses
input  logic [31 : 0]                 resp_in_id,          // ID of the incoming response, mainly used for debugging
input  logic [THREAD_IDX_W - 1 : 0]   resp_in_thread_id,   // Thread ID of the incoming response
input  logic [7 : 0]                  resp_in_op,          // Operation code
input  logic [DATA_WIDTH - 1 : 0]     resp_in_cache_line,  // Incoming data
input  logic [BYTES_PERLINE - 1 : 0]  resp_in_store_mask,  // Bitmask of the position of the requesting bytes in the incoming data bus
input  logic [ADDRESS_WIDTH - 1 : 0]  resp_in_address,     // Incoming response address
Synchronization Interface
The Synchronization Interface connects the user logic with the core-side synchronization module allocated within the tile (namely the barrier_core unit). This interface allows user logic to synchronize at thread granularity.
The synchronization mechanism supports both inter-tile and intra-tile barrier synchronization. When a thread hits a synchronization point, it issues a request to the distributed synchronization master through the Synchronization Interface. The thread is then stalled (stalling it is up to the user logic) until its release signal goes high again.
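The request/release behavior described above can be illustrated with a toy counting barrier in Python. The internals of the synchronization master are not described on this page, so the counting scheme is an assumption, and `BarrierMasterModel` and `request` are hypothetical names. The model only shows the observable behavior: threads wait until the expected number of requests has arrived, then all of them are released together.

```python
class BarrierMasterModel:
    """Toy counting barrier: releases every waiting thread once `total` requests arrive."""

    def __init__(self):
        self.pending = {}  # barrier_id -> list of (tile, thread) still stalled

    def request(self, barrier_id, total, tile, thread):
        """Register one arrival; return the released threads (empty while still waiting)."""
        waiting = self.pending.setdefault(barrier_id, [])
        waiting.append((tile, thread))
        if len(waiting) < total:
            return []  # release stays low: the caller keeps its thread stalled
        return self.pending.pop(barrier_id)  # last arrival releases everyone


master = BarrierMasterModel()
history = []
# Two tiles with two threads each synchronize on the same (hypothetical) barrier ID 7.
for tile in range(2):
    for thread in range(2):
        history.append(master.request(barrier_id=7, total=4, tile=tile, thread=thread))

print(history[-1])  # threads released by the fourth and last request
```

Because the requests carry the tile as well as the thread, the same mechanism covers both the intra-tile and the inter-tile case in this model.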
A custom core has to implement the following interface if
Heterogeneous Dummy provided
The FSM of the provided dummy core first synchronizes with the other ht tiles in the NoC. Each dummy core in a ht tile requires a synchronization for LOCAL_BARRIER_NUMB threads (default = 4). The SEND_BARRIER state sends LOCAL_BARRIER_NUMB requests with barrier ID 42 through the Synchronization Interface. It sets the total number of threads synchronizing on barrier ID 42 to TOTAL_BARRIER_NUMB (= LOCAL_BARRIER_NUMB x `TILE_HT, where `TILE_HT is the number of heterogeneous tiles in the system). When the last barrier request is issued, SEND_BARRIER jumps to WAIT_SYNCH, which waits for the ACK from the synchronization master.

At this point all threads in each ht tile are synchronized, and the FSM starts all pending memory transactions. The START_MEM_READ_TRANS state performs LOCAL_WRITE_REQS read operations (default = 128), each of them a LOAD_8 (opcode = 0). In the default configuration, 128 LOAD_8 operations on consecutive addresses are spread among all threads and issued to the LSU through the Memory Interface.

When the read operations are over, the FSM starts the write operations in a similar way. The START_MEM_WRITE_TRANS state performs LOCAL_WRITE_REQS (default = 128) write operations on consecutive addresses through the Memory Interface. This time the operation performed is a STORE_8, and all ht tiles issue the same store operation on the same addresses, competing for ownership in a transparent way.

Coherence is handled entirely by the LSU and the CC. On the core side, the lsu_het_almost_full bitmap states the availability of the LSU for each thread (for both writes and reads). In both states, a thread first checks the availability stored at the position equal to its ID (lsu_het_almost_full[thread_id]), then performs a memory transaction.
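The state sequence described above can be traced with a short Python sketch. It is a software walk-through, not the dummy core's RTL. Several values are assumptions made only for illustration: `TILE_HT = 2`, a base address of 0, a stride of one byte between consecutive addresses, a thread count equal to LOCAL_BARRIER_NUMB, and a round-robin distribution of requests over threads (the text only says the requests are "spread among all threads"). The function name `dummy_core_trace` is hypothetical.

```python
LOCAL_BARRIER_NUMB = 4   # default given in the text
TILE_HT = 2              # assumption: two heterogeneous tiles in the system
LOCAL_WRITE_REQS = 128   # default given in the text, used for reads and writes
BARRIER_ID = 42
LOAD_8, STORE_8 = 0x00, 0x20
BASE_ADDRESS = 0x0       # assumption: the text does not give a base address

TOTAL_BARRIER_NUMB = LOCAL_BARRIER_NUMB * TILE_HT


def dummy_core_trace(num_threads=LOCAL_BARRIER_NUMB):
    """Yield (state, payload) tuples in the order the text describes."""
    # SEND_BARRIER: one request per local thread, all on the same barrier ID.
    for _ in range(LOCAL_BARRIER_NUMB):
        yield ("SEND_BARRIER", {"barrier_id": BARRIER_ID, "total": TOTAL_BARRIER_NUMB})
    # WAIT_SYNCH: wait for the ACK from the synchronization master.
    yield ("WAIT_SYNCH", None)
    # Reads first, then writes, both over the same consecutive addresses.
    for state, op in (("START_MEM_READ_TRANS", LOAD_8),
                      ("START_MEM_WRITE_TRANS", STORE_8)):
        for i in range(LOCAL_WRITE_REQS):
            yield (state, {
                "thread_id": i % num_threads,  # assumption: round-robin over threads
                "op": op,
                "address": BASE_ADDRESS + i,   # assumption: one-byte stride
            })


trace = list(dummy_core_trace())
print(TOTAL_BARRIER_NUMB, len(trace))
```

With these assumed values each tile contributes 4 of the 8 barrier requests, and the trace contains 4 barrier requests, 1 wait step, 128 loads, and 128 stores.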