Difference between revisions of "Include"

From NaplesPU Documentation

Latest revision as of 12:20, 1 July 2019

NPU Defines

The main NPU core defines are declared in the npu_defines.sv file, which stores global defines at the core level. Parameters such as the number of hardware lanes, the number of registers per register file, and the memory address width are defined in this file:

`define HW_LANE                 16
`define ADDRESS_SIZE            32
`define REGISTER_NUMBER         64

along with information on special purpose registers:

`define PC_REG                  ( `REGISTER_NUMBER - 1 )
`define RA_REG                  ( `REGISTER_NUMBER - 2 )
`define SP_REG                  ( `REGISTER_NUMBER - 3 )
`define FP_REG                  ( `REGISTER_NUMBER - 4 )
`define MASK_REG                ( `REGISTER_NUMBER - 5 )

and opcode definitions, together with the data structures that describe decoded instructions:

NOT      = `OP_CODE_WIDTH'b000000,
OR       = `OP_CODE_WIDTH'b000001,
AND      = `OP_CODE_WIDTH'b000010,
XOR      = `OP_CODE_WIDTH'b000011,
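
As a quick sanity check of the special-purpose register macros above, the following minimal sketch prints the indices they expand to when REGISTER_NUMBER is 64. The module name special_reg_check is hypothetical and only serves as a container for the example; the defines are repeated so that the snippet is self-contained.

```systemverilog
// Defines repeated from the listing above so the snippet is self-contained.
`define REGISTER_NUMBER 64
`define PC_REG   ( `REGISTER_NUMBER - 1 )
`define RA_REG   ( `REGISTER_NUMBER - 2 )
`define SP_REG   ( `REGISTER_NUMBER - 3 )
`define FP_REG   ( `REGISTER_NUMBER - 4 )
`define MASK_REG ( `REGISTER_NUMBER - 5 )

// Hypothetical test module, not part of the NPU sources.
module special_reg_check;
    initial begin
        // The special-purpose registers occupy the five highest indices.
        $display("PC=%0d RA=%0d SP=%0d FP=%0d MASK=%0d",
                 `PC_REG, `RA_REG, `SP_REG, `FP_REG, `MASK_REG);
        // prints: PC=63 RA=62 SP=61 FP=60 MASK=59
    end
endmodule
```

Because every index is derived from REGISTER_NUMBER, changing the register file size moves all five special registers together.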

User Defines

User defines are collected in the npu_user_defines.sv file, which exposes all the configurable parameters of the architecture to the final user. Typically, these parameters must be a power of two.

Core-side user defines: all the numeric parameters below must be a power of two

  • THREAD_NUMB: number of hardware threads instantiated, default 8.
  • USER_ICACHE_SET: number of instruction cache sets, default 32.
  • USER_ICACHE_WAY: number of instruction cache ways, default 4.
  • USER_DCACHE_SET: number of data cache sets, default 32.
  • USER_DCACHE_WAY: number of data cache ways, default 4.
  • USER_L2CACHE_SET: number of L2 data cache sets, default 128.
  • USER_L2CACHE_WAY: number of L2 data cache ways, default 8.
  • NPU_SPM: when defined, allocates a scratchpad memory in each NPU core.
  • NPU_FPU: when defined, allocates an FPU in each NPU core.
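
A minimal sketch of how these defines might look with their default values, together with a simulation-time check of the power-of-two constraint. The exact layout of npu_user_defines.sv is not shown here, so the excerpt is illustrative; the module user_defines_check and the function is_pow2 are hypothetical helpers, not part of the NPU sources.

```systemverilog
// Illustrative excerpt using the default values listed above.
`define THREAD_NUMB      8
`define USER_ICACHE_SET  32
`define USER_ICACHE_WAY  4
`define USER_DCACHE_SET  32
`define USER_DCACHE_WAY  4
`define USER_L2CACHE_SET 128
`define USER_L2CACHE_WAY 8
`define NPU_SPM   // flag: its presence alone requests the scratchpad memory
`define NPU_FPU   // flag: its presence alone requests the FPU

// Hypothetical checker module, not part of the NPU sources.
module user_defines_check;
    // A non-zero value is a power of two when exactly one bit is set.
    function automatic bit is_pow2(input int unsigned v);
        return (v != 0) && ((v & (v - 1)) == 0);
    endfunction

    initial begin
        assert (is_pow2(`THREAD_NUMB))
            else $error("THREAD_NUMB must be a power of two");
        assert (is_pow2(`USER_DCACHE_SET))
            else $error("USER_DCACHE_SET must be a power of two");
        assert (is_pow2(`USER_L2CACHE_WAY))
            else $error("USER_L2CACHE_WAY must be a power of two");
    end
endmodule
```

A check of this kind catches a value such as 12 at the start of simulation rather than through malformed index widths later on.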

System-side user defines:

  • DIRECTORY_BARRIER: when defined, the manycore system supports a distributed directory mechanism spread over all tiles. Otherwise, it allocates a single centralized directory. The single-core version always has a centralized synchronization master.
  • CENTRAL_SYNCH_ID: Centralized directory ID, used only when DIRECTORY_BARRIER is undefined.
  • NoC_X_WIDTH: Network-on-Chip mesh x dimension width, default value 2, must be a power of 2.
  • NoC_Y_WIDTH: Network-on-Chip mesh y dimension width, default value 2, must be a power of 2.
  • TILE_MEMORY_ID: Memory Tile ID.
  • TILE_H2C_ID: Host interface Tile ID.
  • TILE_NPU: number of tiles with an NPU core.
  • IO_MAP_BASE_ADDR: base address of the non-coherent memory space dedicated for IO devices, default value 0xFF00_0000.
  • IO_MAP_SIZE: size of the non-coherent memory space dedicated for IO devices, default value 0x00FF_FF00.
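
To make the IO window concrete, the sketch below tests whether a 32-bit address falls inside the non-coherent IO space given by the two defaults above. It assumes the window is the half-open range starting at IO_MAP_BASE_ADDR and spanning IO_MAP_SIZE bytes; the literal format of the defines, the module io_map_check and the function is_io_address are assumptions made for illustration, and the actual decode logic in the NPU sources may differ.

```systemverilog
// Default values from the list above; the literal format is an assumption.
`define IO_MAP_BASE_ADDR 32'hFF00_0000
`define IO_MAP_SIZE      32'h00FF_FF00

// Hypothetical helper module, not part of the NPU sources.
module io_map_check;
    // True when addr lies in [IO_MAP_BASE_ADDR, IO_MAP_BASE_ADDR + IO_MAP_SIZE).
    function automatic bit is_io_address(input logic [31:0] addr);
        return (addr >= `IO_MAP_BASE_ADDR) &&
               ((addr - `IO_MAP_BASE_ADDR) < `IO_MAP_SIZE);
    endfunction

    initial begin
        $display("%0d", is_io_address(32'hFF00_0010)); // 1: inside the IO window
        $display("%0d", is_io_address(32'h0000_1000)); // 0: below the IO window
        $display("%0d", is_io_address(32'hFFFF_FF00)); // 0: first address past the window
    end
endmodule
```

With the default values the window ends just before 0xFFFF_FF00, so the last 256 bytes of the 32-bit address space are not part of it.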

Furthermore, DISPLAY variables are defined, all commented out by default. When a DISPLAY variable is active, it generates a file in a folder named after the selected kernel. Each DISPLAY variable logs a specific kind of transaction, namely:

  • DISPLAY_MEMORY: logs on file the memory state at the end of the kernel execution.
  • DISPLAY_MEMORY_TRANS: logs on file all requests to the main memory.
  • DISPLAY_MEMORY_CONTROLLER: displays on shell memory requests to the memory controller and its responses.
  • DISPLAY_INT: logs every integer operation in the integer pipeline, and their results.
  • DISPLAY_CORE: enables log from the core (file display_core.txt).
  • DISPLAY_ISSUE: logs all scheduled instructions, and tracks the scheduled PC and the issued Thread, when DISPLAY_CORE is defined.
  • DISPLAY_INT: logs all results from the integer module, when DISPLAY_CORE is defined.
  • DISPLAY_WB: logs all results from the writeback module, when DISPLAY_CORE is defined.
  • DISPLAY_LDST: enables logging into the load/store unit (file display_ldst.txt).
  • DISPLAY_CACHE_CONTROLLER: logs memory transactions between Load/Store unit and the main memory.
  • DISPLAY_SYNCH_CORE: logs synchronization requests within the core.
  • DISPLAY_BARRIER_CORE: logs synchronization releases from the Synchronization master.
  • DISPLAY_COHERENCE: logs all coherence transactions among CCs, DCs and MC.
  • DISPLAY_THREAD_STATUS: displays the status of all active threads and the trap reason.

These variables selectively enable the logging of a specific feature. For each define, a log file is typically created in npu/simulationlog/<name_of_the_kernel>/display_<name>. The DISPLAY_SIMULATION_LOG variable must always be defined in the simulation flow; it also displays architectural information on the Tcl shell.
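
As an illustration, a configuration that logs scheduled instructions from the core could look like the fragment below. This is a hypothetical excerpt: the defines are the ones listed above, but their order and the surrounding lines in the real file are not shown in this page.

```systemverilog
// Hypothetical excerpt with core issue logging enabled.
`define DISPLAY_SIMULATION_LOG   // always required in the simulation flow
`define DISPLAY_CORE             // enables logging from the core (display_core.txt)
`define DISPLAY_ISSUE            // takes effect because DISPLAY_CORE is defined
//`define DISPLAY_WB             // left commented out: writeback results are not logged
//`define DISPLAY_COHERENCE      // left commented out: no coherence transaction log
```

Enabling a variable is therefore a matter of removing the comment marker in front of its define; the variables that depend on DISPLAY_CORE need that define uncommented as well.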

Scratchpad Memory Defines

Defines related to the configuration of the scratchpad memory are stored in the npu_spm_defines.sv file, which exposes all the configurable core-level scratchpad memory parameters to the final user, along with other component-specific typedefs. The SPM has the following configurable parameters:

  • SM_PROCESSING_ELEMENTS: number of concurrent input requests, default value 16 (equal to the number of hardware lanes).
  • SM_ENTRIES: number of entries per bank (similar to cache sets).
  • SM_MEMORY_BANKS: number of banks, default value 16.
  • SM_BYTE_PER_ENTRY: number of bytes per entry, default value 4.
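
Since SM_ENTRIES is counted per bank and SM_BYTE_PER_ENTRY per entry, the total scratchpad capacity follows as the product of the three sizing parameters. The sketch below works this out; the value 1024 for SM_ENTRIES is an assumption chosen for illustration (no default is stated above), and the module name spm_size_sketch is hypothetical.

```systemverilog
`define SM_ENTRIES        1024  // assumed value, for illustration only
`define SM_MEMORY_BANKS   16    // default from the list above
`define SM_BYTE_PER_ENTRY 4     // default from the list above

// Hypothetical module, not part of the NPU sources.
module spm_size_sketch;
    // entries per bank * banks * bytes per entry
    localparam int SPM_BYTES = `SM_ENTRIES * `SM_MEMORY_BANKS * `SM_BYTE_PER_ENTRY;

    initial begin
        $display("SPM capacity: %0d bytes", SPM_BYTES);
        // prints: SPM capacity: 65536 bytes
    end
endmodule
```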

Network Defines

Network-related defines are spread over two include files, namely npu_message_service_defines.sv and npu_network_define.sv. The first defines shared typedefs used for the host interfacing mechanism over the Network-on-Chip. The latter defines the length of flits and exposes two configuration parameters to the final user:

  • VC_PER_PORT: number of Virtual Channels per router port (5 ports), default value 4, must be a power of 2.
  • QUEUE_LEN_PER_VC: length of the FIFO for each Virtual Channel, default value 16, must be a power of 2.
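
The power-of-two constraint means that virtual channel identifiers and FIFO pointers use every encoding of their bit width. The sketch below derives these widths and the total buffering per router from the defaults, assuming that each of the 5 ports holds VC_PER_PORT FIFOs of QUEUE_LEN_PER_VC flits each; that organization, the localparam names and the module name vc_sizing_sketch are assumptions.

```systemverilog
`define VC_PER_PORT      4   // default from the list above
`define QUEUE_LEN_PER_VC 16  // default from the list above

// Hypothetical module, not part of the NPU sources.
module vc_sizing_sketch;
    localparam int PORTS         = 5;                                   // router ports
    localparam int VC_ID_BITS    = $clog2(`VC_PER_PORT);                // 2 bits
    localparam int FIFO_PTR_BITS = $clog2(`QUEUE_LEN_PER_VC);           // 4 bits
    localparam int FLIT_SLOTS    = PORTS * `VC_PER_PORT * `QUEUE_LEN_PER_VC;

    initial begin
        $display("vc id bits=%0d fifo ptr bits=%0d flit slots=%0d",
                 VC_ID_BITS, FIFO_PTR_BITS, FLIT_SLOTS);
        // prints: vc id bits=2 fifo ptr bits=4 flit slots=320
    end
endmodule
```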

The following code snippet shows structures that ease flit management:

typedef struct packed {
    flit_type_t flit_type;
    vc_id_t vc_id;
    port_t next_hop_port;
    tile_address_t destination;
    tile_destination_t core_destination;
} flit_header_t;
typedef logic [`PAYLOAD_W-1:0] flit_body_t;
typedef struct packed {
    flit_header_t header;
    flit_body_t payload;
} flit_t;
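
Because both structures are packed, a flit is a plain bit vector with the header in the most significant bits and the payload below it. The following sketch shows this with stand-in field widths; all the widths, the value of PAYLOAD_W, and the names flit_sketch_pkg and flit_sketch are assumptions made so that the example is self-contained, while the real types come from npu_network_define.sv.

```systemverilog
`define PAYLOAD_W 64   // assumed payload width, for illustration only

package flit_sketch_pkg;
    // Stand-in field types with assumed widths.
    typedef logic [1:0] flit_type_t;
    typedef logic [1:0] vc_id_t;
    typedef logic [2:0] port_t;
    typedef logic [3:0] tile_address_t;
    typedef logic [3:0] tile_destination_t;

    typedef struct packed {
        flit_type_t        flit_type;
        vc_id_t            vc_id;
        port_t             next_hop_port;
        tile_address_t     destination;
        tile_destination_t core_destination;
    } flit_header_t;

    typedef logic [`PAYLOAD_W-1:0] flit_body_t;

    typedef struct packed {
        flit_header_t header;
        flit_body_t   payload;
    } flit_t;
endpackage

// Hypothetical module, not part of the NPU sources.
module flit_sketch;
    import flit_sketch_pkg::*;

    flit_t f;

    initial begin
        f               = '0;
        f.header.vc_id  = 2'd1;           // fields are set by name
        f.payload       = 64'hDEAD_BEEF;
        // Header is 2+2+3+4+4 = 15 bits, payload 64 bits.
        $display("flit width: %0d bits", $bits(flit_t));
        // prints: flit width: 79 bits
        // The payload occupies the low PAYLOAD_W bits of the packed vector.
        $display("%h", f[`PAYLOAD_W-1:0]);
        // prints: 00000000deadbeef
    end
endmodule
```

Accessing fields by name keeps router code independent of the exact widths, which change with the mesh size and the number of virtual channels.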

Synchronization Defines

Synchronization-related structures and types are declared in the npu_sychronization_defines.sv file, which mainly defines a data structure that describes the synchronization message at the highest level of the network stack:

typedef struct packed {
    barrier_t id_barrier;
    cnt_barrier_t cnt_setup;
    tile_id_t tile_id_source;
} sync_account_message_t;
typedef struct packed {
    barrier_t id_barrier;
    logic [$bits(cnt_barrier_t)+$bits(tile_id_t)-1:0] padding;
} sync_release_message_t;

Synchronization traffic is encapsulated in a sync_message_t and then in a service_message_t:

typedef struct packed {
    sync_message_type_t sync_type;
    union packed {
         sync_account_message_t account_mess;
         sync_release_message_t release_mess;
    } sync_mess;
} sync_message_t;
typedef struct packed {
    cnt_barrier_t cnt;
    tile_mask_t mask_slave;
} barrier_data_t;
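
The padding field in sync_release_message_t is what makes the packed union legal: SystemVerilog requires all members of a packed union to have the same width, so the release message is padded to the size of the account message. The sketch below reproduces the structures with stand-in widths to show the effect; the widths, the enum literals ACCOUNT and RELEASE, and the names sync_sketch_pkg and sync_sketch are assumptions.

```systemverilog
package sync_sketch_pkg;
    // Stand-in types with assumed widths and enum literals.
    typedef logic [5:0] barrier_t;
    typedef logic [3:0] cnt_barrier_t;
    typedef logic [1:0] tile_id_t;
    typedef enum logic { ACCOUNT, RELEASE } sync_message_type_t;

    typedef struct packed {
        barrier_t     id_barrier;
        cnt_barrier_t cnt_setup;
        tile_id_t     tile_id_source;
    } sync_account_message_t;

    typedef struct packed {
        barrier_t id_barrier;
        // Pads the release message to the width of the account message.
        logic [$bits(cnt_barrier_t)+$bits(tile_id_t)-1:0] padding;
    } sync_release_message_t;

    typedef struct packed {
        sync_message_type_t sync_type;
        union packed {
            sync_account_message_t account_mess;
            sync_release_message_t release_mess;
        } sync_mess;
    } sync_message_t;
endpackage

// Hypothetical module, not part of the NPU sources.
module sync_sketch;
    import sync_sketch_pkg::*;

    sync_message_t m;

    initial begin
        m.sync_type              = ACCOUNT;
        m.sync_mess.account_mess = '{id_barrier: 6'd3, cnt_setup: 4'd4, tile_id_source: 2'd1};
        // Both union members are 12 bits wide with these stand-in widths.
        $display("%0d %0d", $bits(sync_account_message_t), $bits(sync_release_message_t));
        // prints: 12 12
        // id_barrier sits in the same bits in both views of the union.
        $display("%0d", m.sync_mess.release_mess.id_barrier);
        // prints: 3
    end
endmodule
```

Keeping id_barrier as the first field of both messages lets a receiver read the barrier ID before it has decoded sync_type.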

Coherence Defines

Coherence-related defines are stored in npu_coherence_defines.sv, which declares all types and structures used by coherence actors (mainly Cache Controller and Directory Controller), such as the TSHR entry type:

typedef struct packed {
    logic                                  valid;
    logic [`DIRECTORY_STATE_WIDTH - 1 : 0] state;
    l2_cache_address_t                     address;
    tile_mask_t                            sharers_list;
    tile_address_t                         owner;
} tshr_entry_t;

or the requests a CC can accept:

typedef enum coherence_request_t {
    load                   = 0,
    store                  = 1,
    replacement            = 2,
    Fwd_GetS               = 3,
    Fwd_GetM               = 4,
    Inv                    = 5,
    Put_Ack                = 6,
    Data_from_Dir_ack_eqz  = 7,
    Data_from_Dir_ack_gtz  = 8,
    Data_from_Owner        = 9,
    Inv_Ack                = 10,
    Last_Inv_Ack           = 11,
    recall                 = 12,
    flush                  = 13,
    load_uncoherent        = 14,
    store_uncoherent       = 15,
    replacement_uncoherent = 16,
    flush_uncoherent       = 17,
    Fwd_Flush              = 18,
    dinv                   = 19,
    dinv_uncoherent        = 20
} coherence_requests_enum_t;

and the kinds of message a coherence actor can send:

typedef enum message_request_t {
    GETS      = 0,
    GETM      = 1,
    PUTS      = 2,
    PUTM      = 3,
    DIR_FLUSH = 13
} message_requests_enum_t;

The types and functions defined here are used extensively by the protocol ROMs, on both the Cache Controller and the Directory Controller side.