Include
NPU Defines
The main NPU core defines are declared in the npu_defines.sv file, which stores global defines at the core level. Parameters such as the number of hardware lanes, the number of registers per register file, or the memory address width are defined in this file:
`define HW_LANE 16
`define ADDRESS_SIZE 32
`define REGISTER_NUMBER 64
along with information on special purpose registers:
`define PC_REG ( `REGISTER_NUMBER - 1 )
`define RA_REG ( `REGISTER_NUMBER - 2 )
`define SP_REG ( `REGISTER_NUMBER - 3 )
`define FP_REG ( `REGISTER_NUMBER - 4 )
`define MASK_REG ( `REGISTER_NUMBER - 5 )
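With the REGISTER_NUMBER value shown above, the special purpose registers occupy the five highest register indices. The following minimal sketch repeats the defines in a standalone module so that it compiles on its own; the module name is hypothetical and not part of the project.

```systemverilog
// Standalone sketch: the defines are copied from the text above.
`define REGISTER_NUMBER 64
`define PC_REG   ( `REGISTER_NUMBER - 1 )
`define RA_REG   ( `REGISTER_NUMBER - 2 )
`define SP_REG   ( `REGISTER_NUMBER - 3 )
`define FP_REG   ( `REGISTER_NUMBER - 4 )
`define MASK_REG ( `REGISTER_NUMBER - 5 )

module special_regs_sketch;  // hypothetical module name
  initial begin
    $display("PC_REG   = %0d", `PC_REG);    // 63
    $display("RA_REG   = %0d", `RA_REG);    // 62
    $display("SP_REG   = %0d", `SP_REG);    // 61
    $display("FP_REG   = %0d", `FP_REG);    // 60
    $display("MASK_REG = %0d", `MASK_REG);  // 59
  end
endmodule
```

Because each index is derived from REGISTER_NUMBER, changing the register file size moves all five registers together.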
as well as opcode definitions and the data structures that describe a decoded instruction:
NOT = `OP_CODE_WIDTH'b000000,
OR  = `OP_CODE_WIDTH'b000001,
AND = `OP_CODE_WIDTH'b000010,
XOR = `OP_CODE_WIDTH'b000011,
User Defines
User defines are collected in the file npu_user_defines.sv, which exposes all the configurable parameters of the architecture to the final user. Typically, those parameters must be a power of two.
Core-side user defines (all numeric parameters must be a power of two):
- THREAD_NUMB: number of hardware threads instantiated, default 8.
- USER_ICACHE_SET: number of instruction cache sets, default 32.
- USER_ICACHE_WAY: number of instruction cache ways, default 4.
- USER_DCACHE_SET: number of data cache sets, default 32.
- USER_DCACHE_WAY: number of data cache ways, default 4.
- USER_L2CACHE_SET: number of L2 data cache sets, default 128.
- USER_L2CACHE_WAY: number of L2 data cache ways, default 8.
- NPU_SPM: when defined, allocates a scratchpad memory in each NPU core.
- NPU_FPU: when defined, allocates an FPU in each NPU core.
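As an illustration, the core-side parameters above could be written as follows. The values are the defaults listed above, while the exact layout of npu_user_defines.sv is an assumption. The small check module is a hypothetical helper, not part of the project, that stops simulation when a numeric parameter is not a power of two.

```systemverilog
// Default values taken from the list above; the file layout is an assumption.
`define THREAD_NUMB      8
`define USER_ICACHE_SET  32
`define USER_ICACHE_WAY  4
`define USER_DCACHE_SET  32
`define USER_DCACHE_WAY  4
`define USER_L2CACHE_SET 128
`define USER_L2CACHE_WAY 8
// Presence-only flags: defining them enables the feature.
`define NPU_SPM
//`define NPU_FPU

module user_defines_check;  // hypothetical helper module
  // True when v has exactly one bit set.
  function automatic bit is_pow2(input int unsigned v);
    return (v != 0) && ((v & (v - 1)) == 0);
  endfunction

  initial begin
    assert (is_pow2(`THREAD_NUMB))
      else $fatal(1, "THREAD_NUMB must be a power of two");
    assert (is_pow2(`USER_ICACHE_SET) && is_pow2(`USER_ICACHE_WAY))
      else $fatal(1, "instruction cache sets/ways must be powers of two");
    assert (is_pow2(`USER_DCACHE_SET) && is_pow2(`USER_DCACHE_WAY))
      else $fatal(1, "data cache sets/ways must be powers of two");
    assert (is_pow2(`USER_L2CACHE_SET) && is_pow2(`USER_L2CACHE_WAY))
      else $fatal(1, "L2 cache sets/ways must be powers of two");
  end
endmodule
```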
System-side user defines:
- DIRECTORY_BARRIER: when defined, the manycore system supports a distributed directory mechanism spread over all tiles; otherwise, it allocates a single centralized directory. The single-core version always has a centralized synchronization master.
- CENTRAL_SYNCH_ID: Centralized directory ID, used only when DIRECTORY_BARRIER is undefined.
- NoC_X_WIDTH: Network-on-Chip mesh x dimension width, default value 2, must be a power of 2.
- NoC_Y_WIDTH: Network-on-Chip mesh y dimension width, default value 2, must be a power of 2.
- TILE_MEMORY_ID: Memory Tile ID.
- TILE_H2C_ID: Host interface Tile ID.
- TILE_NPU: number of tiles with an NPU core.
- IO_MAP_BASE_ADDR: base address of the non-coherent memory space dedicated for IO devices, default value 0xFF00_0000.
- IO_MAP_SIZE: size of the non-coherent memory space dedicated for IO devices, default value 0x00FF_FF00.
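With the default values, the IO window spans addresses 0xFF00_0000 through 0xFFFF_FEFF. The sketch below shows one way to test whether an address falls in that window. The literal form of the two defines, the helper function is_io_address, and the choice of an exclusive upper bound are assumptions made for illustration.

```systemverilog
// Default values from the list above; the 32-bit literal form is an assumption.
`define IO_MAP_BASE_ADDR 32'hFF00_0000
`define IO_MAP_SIZE      32'h00FF_FF00

module io_map_sketch;  // hypothetical module name
  // Hypothetical helper: true when addr lies in [base, base + size).
  function automatic bit is_io_address(input logic [31:0] addr);
    return (addr >= `IO_MAP_BASE_ADDR) &&
           ((addr - `IO_MAP_BASE_ADDR) < `IO_MAP_SIZE);
  endfunction

  initial begin
    $display("%0b", is_io_address(32'hFF00_0000));  // 1: first IO address
    $display("%0b", is_io_address(32'hFEFF_FFFF));  // 0: just below the base
    $display("%0b", is_io_address(32'hFFFF_FEFF));  // 1: last IO address
    $display("%0b", is_io_address(32'hFFFF_FF00));  // 0: one past the end
  end
endmodule
```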
Furthermore, a set of DISPLAY variables is defined, all commented out by default. When a DISPLAY variable is active, it generates a log file in a folder named after the selected kernel. Each DISPLAY variable logs a specific kind of transaction, namely:
- DISPLAY_MEMORY: logs to a file the memory state at the end of the kernel execution.
- DISPLAY_MEMORY_TRANS: logs to a file all requests to the main memory.
- DISPLAY_MEMORY_CONTROLLER: displays on the shell the memory requests sent to the memory controller and its responses.
- DISPLAY_INT: logs every integer operation in the integer pipeline, along with its result.
- DISPLAY_CORE: enables log from the core (file display_core.txt).
- DISPLAY_ISSUE: logs all scheduled instructions, and tracks the scheduled PC and the issued Thread, when DISPLAY_CORE is defined.
- DISPLAY_INT: logs all results from the integer module, when DISPLAY_CORE is defined.
- DISPLAY_WB: logs all results from the writeback module, when DISPLAY_CORE is defined.
- DISPLAY_LDST: enables logging in the load/store unit (file display_ldst.txt).
- DISPLAY_CACHE_CONTROLLER: logs memory transactions between Load/Store unit and the main memory.
- DISPLAY_SYNCH_CORE: logs synchronization requests within the core.
- DISPLAY_BARRIER_CORE: logs synchronization releases from the Synchronization master.
- DISPLAY_COHERENCE: logs all coherence transactions among Cache Controllers (CCs), Directory Controllers (DCs), and the Memory Controller (MC).
- DISPLAY_THREAD_STATUS: displays the status of all active threads and their trap reason.
These variables selectively enable the logging of a specific feature. For each define, a log file is typically created in npu/simulationlog/<name_of_the_kernel>/display_<name>. The DISPLAY_SIMULATION_LOG variable must always be defined in the simulation flow; it also displays architectural information on the Tcl shell.
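For example, enabling the core log together with the issue-stage log might look like the fragment below. This is a hypothetical excerpt: the define names come from the list above, but their order and placement in npu_user_defines.sv are assumptions.

```systemverilog
// Always required by the simulation flow.
`define DISPLAY_SIMULATION_LOG

// Core log (display_core.txt). DISPLAY_ISSUE, DISPLAY_INT and DISPLAY_WB
// only take effect when DISPLAY_CORE is defined.
`define DISPLAY_CORE
`define DISPLAY_ISSUE
//`define DISPLAY_INT
//`define DISPLAY_WB

// Load/store unit log (display_ldst.txt), left disabled here.
//`define DISPLAY_LDST
```

Leaving a define commented out keeps the corresponding log disabled, which matches the default state described above.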
Scratchpad Memory Defines
Defines related to the configuration of the scratchpad memory (SPM) are stored in the npu_spm_defines.sv file, which exposes to the final user all the configurable parameters of the scratchpad memory at the core level, along with other component-specific typedefs. The SPM has the following configurable parameters:
- SM_PROCESSING_ELEMENTS: number of concurrent input requests, default value 16 (equal to the number of hardware lanes).
- SM_ENTRIES: number of entries per bank (similar to cache sets).
- SM_MEMORY_BANKS: number of banks, default value 16.
- SM_BYTE_PER_ENTRY: number of bytes per entry, default value 4.
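Taken together, these parameters determine the scratchpad capacity as banks × entries per bank × bytes per entry. The sketch below works through that arithmetic. Since no default is given for SM_ENTRIES, the value 1024 is a purely hypothetical placeholder, and the module name is likewise not part of the project.

```systemverilog
// Defaults from the list above.
`define SM_PROCESSING_ELEMENTS 16
`define SM_MEMORY_BANKS        16
`define SM_BYTE_PER_ENTRY      4
// Hypothetical value: the text gives no default for SM_ENTRIES.
`define SM_ENTRIES             1024

module spm_size_sketch;  // hypothetical module name
  // Total capacity in bytes: banks * entries per bank * bytes per entry.
  localparam int SPM_BYTES =
      `SM_MEMORY_BANKS * `SM_ENTRIES * `SM_BYTE_PER_ENTRY;

  initial $display("SPM capacity: %0d bytes", SPM_BYTES);  // 65536
endmodule
```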
Network Defines
Network-related defines are spread over two include files, namely npu_message_service_defines.sv and npu_network_define.sv. The first defines shared typedefs used by the host interfacing mechanism over the Network-on-Chip.
The latter defines the length of flits and exposes two configuration parameters to the final user:
- VC_PER_PORT: number of Virtual Channels per router port (5 ports), default value 4, must be a power of 2.
- QUEUE_LEN_PER_VC: length of the FIFO for each Virtual Channel, default value 16, must be a power of 2.
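The power-of-two constraint keeps the index widths derived from these parameters exact. The sketch below derives the virtual channel identifier width, the FIFO pointer width, and the number of flit slots per router, assuming one FIFO per virtual channel on each of the 5 ports. The localparam and module names are hypothetical.

```systemverilog
// Defaults from the list above.
`define VC_PER_PORT      4
`define QUEUE_LEN_PER_VC 16

module noc_buffer_sketch;  // hypothetical module name
  localparam int PORT_NUM   = 5;                          // ports per router, from the text
  localparam int VC_ID_W    = $clog2(`VC_PER_PORT);       // 2 bits select a virtual channel
  localparam int FIFO_PTR_W = $clog2(`QUEUE_LEN_PER_VC);  // 4 bits address a FIFO slot
  // Assumes one FIFO per virtual channel on every port.
  localparam int FLIT_SLOTS = PORT_NUM * `VC_PER_PORT * `QUEUE_LEN_PER_VC;  // 320

  initial begin
    $display("VC id width      = %0d", VC_ID_W);
    $display("FIFO ptr width   = %0d", FIFO_PTR_W);
    $display("flit slots/router = %0d", FLIT_SLOTS);
  end
endmodule
```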
The following code snippet shows structures that ease flit management:
typedef struct packed {
    flit_type_t        flit_type;
    vc_id_t            vc_id;
    port_t             next_hop_port;
    tile_address_t     destination;
    tile_destination_t core_destination;
} flit_header_t;

typedef logic [`PAYLOAD_W-1:0] flit_body_t;

typedef struct packed {
    flit_header_t header;
    flit_body_t   payload;
} flit_t;
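The sketch below shows how a flit can be assembled and inspected through these structures. The field types are not given in this section, so every width here, including PAYLOAD_W, is a stand-in chosen only to make the example compile; the real definitions live in the network include files.

```systemverilog
`define PAYLOAD_W 64  // hypothetical width

module flit_sketch;  // hypothetical module name
  // Stand-in field types (assumptions, not the project's definitions).
  typedef logic [1:0] flit_type_t;
  typedef logic [1:0] vc_id_t;
  typedef logic [2:0] port_t;
  typedef logic [1:0] tile_address_t;
  typedef logic [3:0] tile_destination_t;

  typedef struct packed {
    flit_type_t        flit_type;
    vc_id_t            vc_id;
    port_t             next_hop_port;
    tile_address_t     destination;
    tile_destination_t core_destination;
  } flit_header_t;

  typedef logic [`PAYLOAD_W-1:0] flit_body_t;

  typedef struct packed {
    flit_header_t header;
    flit_body_t   payload;
  } flit_t;

  flit_t flit;

  initial begin
    // Fill the header field by field with an assignment pattern.
    flit.header  = '{flit_type: 2'd0, vc_id: 2'd1, next_hop_port: 3'd2,
                     destination: 2'd3, core_destination: 4'd0};
    flit.payload = 64'hDEAD_BEEF_0000_0001;

    // With these stand-in widths: 13 header bits + 64 payload bits.
    $display("flit width = %0d bits", $bits(flit_t));
    $display("vc_id = %0d, next hop port = %0d",
             flit.header.vc_id, flit.header.next_hop_port);
  end
endmodule
```

Because both structures are packed, a flit_t can be moved over a link as a plain bit vector and its header fields read back by name on the other side.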
Synchronization Defines
Synchronization-related structures and types are declared in the npu_sychronization_defines.sv file, which mainly defines the data structures that describe a synchronization message at the highest level of the network stack:
typedef struct packed {
    barrier_t     id_barrier;
    cnt_barrier_t cnt_setup;
    tile_id_t     tile_id_source;
} sync_account_message_t;

typedef struct packed {
    barrier_t id_barrier;
    logic [$bits(cnt_barrier_t)+$bits(tile_id_t)-1:0] padding;
} sync_release_message_t;
Synchronization traffic is encapsulated in a sync_message_t and then in a service_message_t:
typedef struct packed {
    sync_message_type_t sync_type;
    union packed {
        sync_account_message_t account_mess;
        sync_release_message_t release_mess;
    } sync_mess;
} sync_message_t;

typedef struct packed {
    cnt_barrier_t cnt;
    tile_mask_t   mask_slave;
} barrier_data_t;
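The padding field in sync_release_message_t makes the release message exactly as wide as the account message, which is what allows both to be members of the same packed union. The sketch below illustrates this with stand-in types; the widths and the enum encoding are assumptions, not the project's definitions.

```systemverilog
module sync_union_sketch;  // hypothetical module name
  // Stand-in types (assumptions chosen only to make the example compile).
  typedef logic [5:0] barrier_t;
  typedef logic [7:0] cnt_barrier_t;
  typedef logic [3:0] tile_id_t;
  typedef enum logic { ACCOUNT, RELEASE } sync_message_type_t;

  typedef struct packed {
    barrier_t     id_barrier;
    cnt_barrier_t cnt_setup;
    tile_id_t     tile_id_source;
  } sync_account_message_t;

  typedef struct packed {
    barrier_t id_barrier;
    logic [$bits(cnt_barrier_t)+$bits(tile_id_t)-1:0] padding;
  } sync_release_message_t;

  typedef struct packed {
    sync_message_type_t sync_type;
    union packed {
      sync_account_message_t account_mess;
      sync_release_message_t release_mess;
    } sync_mess;
  } sync_message_t;

  sync_message_t msg;

  initial begin
    // Both views have the same width (18 bits with these stand-in types).
    $display("account = %0d bits, release = %0d bits",
             $bits(sync_account_message_t), $bits(sync_release_message_t));

    msg.sync_type = RELEASE;
    msg.sync_mess.release_mess = '{id_barrier: 6'd3, padding: '0};

    // id_barrier occupies the same bits in both views.
    $display("id_barrier via account view = %0d",
             msg.sync_mess.account_mess.id_barrier);  // 3
  end
endmodule
```

A receiver would normally inspect sync_type first and then read only the matching view of the union.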
Coherence Defines
Coherence-related defines are stored in npu_coherence_defines.sv, which declares all types and structures used by coherence actors (mainly the Cache Controller and the Directory Controller), such as the TSHR entry type:
typedef struct packed {
    logic                                  valid;
    logic [`DIRECTORY_STATE_WIDTH - 1 : 0] state;
    l2_cache_address_t                     address;
    tile_mask_t                            sharers_list;
    tile_address_t                         owner;
} tshr_entry_t;
or the requests a Cache Controller (CC) can accept:
typedef enum coherence_request_t {
    load                   = 0,
    store                  = 1,
    replacement            = 2,
    Fwd_GetS               = 3,
    Fwd_GetM               = 4,
    Inv                    = 5,
    Put_Ack                = 6,
    Data_from_Dir_ack_eqz  = 7,
    Data_from_Dir_ack_gtz  = 8,
    Data_from_Owner        = 9,
    Inv_Ack                = 10,
    Last_Inv_Ack           = 11,
    recall                 = 12,
    flush                  = 13,
    load_uncoherent        = 14,
    store_uncoherent       = 15,
    replacement_uncoherent = 16,
    flush_uncoherent       = 17,
    Fwd_Flush              = 18,
    dinv                   = 19,
    dinv_uncoherent        = 20
} coherence_requests_enum_t;
and the kinds of messages a coherence actor can send:
typedef enum message_request_t {
    GETS      = 0,
    GETM      = 1,
    PUTS      = 2,
    PUTM      = 3,
    DIR_FLUSH = 13
} message_requests_enum_t;
The types and functions defined here are used extensively by the protocol ROMs, on both the Cache Controller and the Directory Controller side.