Difference between revisions of "The NaplesPU Hardware architecture"
Revision as of 15:59, 30 December 2018
TODO: add a reference to the single-core configuration
The nu+ manycore is a parameterizable regular-mesh Network on Chip (NoC) of configurable tiles, developed by CeRICT within the MANGO FETHPC project. The main objective of nu+ is to enable resource-efficient HPC based on special-purpose customized hardware. Our aim is to build an application-driven architecture that achieves the best hardware/software configuration for any data-parallel kernel. Specialized data-parallel accelerators are known to provide higher efficiency than general-purpose processors for codes with significant amounts of regular data-level parallelism (DLP). However, every parallel kernel has its own ideal configuration.
Each nu+ tile has the same basic components. It provides a configurable GPU-like open-source soft-core meant to be used as a configurable FPGA overlay. This HPC-oriented accelerator merges the SIMT paradigm with the vector processor model. Furthermore, each tile has a Cache Controller and a Directory Controller; these components handle data coherence between cores in different tiles.
Figure: nu+ manycore architecture
TODO: synchronization_core is missing (and io_interface?)
TODO: add a figure of the single-core architecture
A user design can set a large number of parameters to fit its needs, such as:
- NoC topology and number of tiles.
- Number of threads per core. Each thread has its own PC, so a core can execute as many programs as it has threads.
- Number of hardware lanes per thread. Each thread can issue vector operations, whose elements are processed by what is here called hardware lanes.
- Register file size (scalar and vector).
- L1 and L2 cache size and number of ways.
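As an illustration of how the parameters above interact, the following minimal Python sketch models one design point. Every field name is hypothetical and does not correspond to an actual nu+ parameter or source identifier, and the sketch assumes, for simplicity, that every tile of the mesh hosts exactly one core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignConfig:
    """Hypothetical design parameters; names are illustrative, not nu+ identifiers."""
    mesh_width: int        # tiles along the X axis of the regular mesh
    mesh_height: int       # tiles along the Y axis of the regular mesh
    threads_per_core: int  # each thread has its own PC
    lanes_per_thread: int  # hardware lanes available to each thread
    scalar_registers: int  # scalar register file size (entries)
    vector_registers: int  # vector register file size (entries)
    l1_sets: int           # L1 cache sets
    l1_ways: int           # L1 cache ways
    l2_sets: int           # L2 cache sets
    l2_ways: int           # L2 cache ways

    def __post_init__(self):
        # Reject non-positive values for any parameter.
        for name, value in vars(self).items():
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")

    @property
    def tile_count(self) -> int:
        return self.mesh_width * self.mesh_height

    @property
    def max_concurrent_programs(self) -> int:
        # One program per thread; assumes one core per tile (simplification).
        return self.tile_count * self.threads_per_core

    @property
    def total_lanes(self) -> int:
        return self.max_concurrent_programs * self.lanes_per_thread


if __name__ == "__main__":
    cfg = DesignConfig(mesh_width=2, mesh_height=2, threads_per_core=8,
                       lanes_per_thread=16, scalar_registers=64,
                       vector_registers=64, l1_sets=128, l1_ways=4,
                       l2_sets=1024, l2_ways=8)
    print(cfg.tile_count, cfg.max_concurrent_programs, cfg.total_lanes)
```

Under these assumptions, a 2x2 mesh with 8 threads per core can run up to 32 independent programs, which is the point made by the threads-per-core item: each thread carries its own PC.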
The main hardware sections are listed here. Each of them covers an important aspect of the hardware.
Hardware sections
TODO: split into common, single core, many core, infrastructure
nu+ core architecture
DSU architecture (TODO: remove?)
Coherence architecture
Synchronization architecture
TODO: System interface: detailed description of the item interface (commands, console) and memory
TODO: System deployment: description of uart_router and memory_controller, with reference to the nexys4ddr template