Difference between revisions of "Main Page"

From NaplesPU Documentation

Latest revision as of 14:10, 21 July 2019

The Naples Processing Unit, dubbed NaplesPU or NPU, is a comprehensive open-source manycore accelerator, encompassing all the architecture layers from the compute core up to the on-chip interconnect, the coherent memory hierarchy, and the compilation toolchain. Entirely written in SystemVerilog, NaplesPU exploits the three forms of parallelism normally found in modern compute architectures, particularly in heterogeneous accelerators such as GPU devices: vector parallelism, hardware multithreading, and manycore organization. Equipped with a complete LLVM-based compiler targeting the NaplesPU vector ISA, the NPU open-source project lets you experiment with all of the flavors of today’s manycore technologies.

The NPU manycore architecture is based on a parameterizable mesh of configurable tiles connected through a Network on Chip (NoC). Each tile has a Cache Controller and a Directory Controller, which handle data coherence between cores in different tiles. The compute core is based on a vector pipeline with a lightweight control unit, so that most of the hardware resources are devoted to accelerating data-parallel kernels. The latency of memory operations and long-latency instructions is masked by hardware multithreading. Each hardware thread (roughly equivalent to a wavefront in AMD terminology or a warp in NVIDIA CUDA terminology) has its own PC, register file, and control registers. The number of threads in the NaplesPU system is user-configurable.

Figure: NaplesPU overview


Getting started

This section shows how to get started with the project in order to simulate or implement a kernel for the NaplesPU architecture. Here, a kernel is a compute-intensive application, such as matrix multiplication or matrix transposition, written in a high-level programming language such as C/C++.

Required software

Simulation or implementation of any kernel relies on the following dependencies:

  • Git
  • Xilinx Vivado 2018.2 or ModelSim (e.g. Questa Sim-64 vsim 10.6c_1)
  • NaplesPU toolchain

Building process

The first step is to obtain the NaplesPU source code by cloning the official repository.

On Ubuntu Linux, run the following command:

$ git clone https://github.com/AlessandroCilardo/NaplesPU

In the NaplesPU repository, the toolchain is a git submodule, so it must be initialized and updated after cloning. On Ubuntu Linux, run the following command from the root folder of the repository:

$ git submodule update --init
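
As a quick sanity check after this step, the sketch below lists any submodules that are still uninitialized. It relies on the fact that `git submodule status` prefixes uninitialized entries with `-`; the helper name `uninitialised_submodules` is hypothetical and not part of the NaplesPU repository.

```shell
#!/bin/sh
# Read `git submodule status` output on stdin and print the paths of
# submodules that are not yet initialised (lines starting with "-").
# Hypothetical helper, not shipped with NaplesPU.
uninitialised_submodules() {
    awk '/^-/ { print $2 }'
}

# Typical use, from the root folder of the repository:
#   git submodule status | uninitialised_submodules
```

If the command prints nothing, every submodule has been initialized.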

The third step is to install the toolchain. This process is described on the Toolchain page: http://www.naplespu.com/doc/index.php?title=Toolchain

Simulate a kernel

Two sub-folders in the root folder of the repository are of particular interest for simulation:

  • software, which stores all kernels;
  • tools, which stores all scripts for simulation.

There are three ways to simulate a kernel:

  • running the test.sh script;
  • running setup_project.sh from the root folder of the repository, if the simulator is Vivado;
  • running simulate.sh from the root folder of the repository, if the simulator is ModelSim.

First, load the Vivado or ModelSim environment into the shell. This step is mandatory for all three methods. On Ubuntu Linux, for Vivado:

$ source Vivado/folder/location/settingsXX.sh

where XX is 32 or 64, depending on the installed version of Vivado.
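
To confirm that the environment was loaded, one option is to check that the simulator executable is reachable through PATH. This is a minimal sketch: it assumes the executables are named `vivado` and `vsim`, and the helper `have_tool` is hypothetical.

```shell
#!/bin/sh
# Report whether an executable is reachable through PATH.
# Hypothetical helper; the tool names below are assumptions.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

for tool in vivado vsim; do
    if have_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found, load the simulator environment first"
    fi
done
```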

test.sh script

The test.sh script, located in the npu/tools folder, runs all the kernels listed in it and compares the output from the NPU with the expected result produced by a standard x86 architecture:

$ ./test.sh [option]

Options are:

  • -h, --help show this help
  • -t, --tool=vsim or vivado specify the tool to use, default: vsim
  • -cn, --core-numb=VALUE specify the core number, default: 1
  • -tn, --thread-numb=VALUE specify the thread number, default: 8

The test.sh script automatically compiles the kernels and runs them on both the NaplesPU and x86 architectures. Once the simulation ends, a Python script compares the results of the two executions for each kernel to verify correctness.

In the tools folder, the file cosim.log stores the output of the simulator.

setup_project.sh script

The setup_project.sh script can be run as follows from the root of the project:

$ tools/vivado/setup_project.sh [option]

Options are:

  • -h, --help show this help
  • -k, --kernel=KERNEL_NAME specify the kernel to use
  • -s, --single-core select the single core configuration, by default the manycore is selected
  • -c, --core-mask=VALUE specify the core activation mask, default: 1
  • -t, --thread-mask=VALUE specify the thread activation mask, default FF
  • -m, --mode=gui or batch specify the tool mode, it can run in either gui or batch mode, default: gui

This script starts the kernel specified in the command. The kernel must already be compiled before it is run on the NaplesPU architecture:

tools/vivado/setup_project.sh -k mmsc -c 3 -t $(( 16#F )) -m gui

The -c 3 parameter passes the core activation mask, a bitmask with one bit per tile: 3 is 11 in binary, so tiles 0 and 1 start their cores. The -t $(( 16#F )) parameter passes the thread activation mask, a bitmask that states which threads are active in each core: F is 00001111 in binary, so threads 0 to 3 run. The -m gui parameter selects the mode in which the simulator runs.
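
The mask arithmetic above can be sketched with two small shell functions. Both names, `mask_for_count` and `active_bits`, are hypothetical helpers for illustration and are not part of the NaplesPU tools; they only assume that bit i of a mask corresponds to core or thread i, as in the example.

```shell
#!/bin/sh
# Build a mask with the lowest N bits set (N active cores or threads).
mask_for_count() {
    echo $(( (1 << $1) - 1 ))
}

# List the bit positions that are set in a decimal mask.
active_bits() {
    mask=$1
    bit=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="$out$bit "
        fi
        mask=$(( mask >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "${out% }"
}

active_bits 3                         # prints: 0 1
active_bits 15                        # prints: 0 1 2 3
printf '%X\n' "$(mask_for_count 8)"   # prints: FF
```

In bash, $(( 16#F )) evaluates to 15, so the second call decodes the thread mask used in the example, and the last line reproduces the documented default thread mask FF for 8 threads.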

simulate.sh script

The simulate.sh script can be run as follows from the root of the project:

$ tools/modelsim/simulate.sh [option]

Options:

  • -h, --help show this help
  • -k, --kernel=KERNEL_NAME specify the kernel to use
  • -s, --single-core select the single core configuration, by default the manycore is selected
  • -c, --core-mask=VALUE specify the core activation mask, default: 1
  • -t, --thread-mask=VALUE specify the thread activation mask, default FF
  • -m, --mode=gui or batch specify the tool mode, it can run in either gui or batch mode, default: gui

This script starts the kernel specified in the command. The kernel must already be compiled before it is run on the NaplesPU architecture.
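
By analogy with the Vivado example, a ModelSim invocation might look like the output of the sketch below. It assumes that simulate.sh accepts the same mask values as setup_project.sh, which the identical option lists suggest but do not confirm; `simulate_cmd` is a hypothetical helper that only prints the command line, and the kernel name mmsc is taken from the Vivado example.

```shell
#!/bin/sh
# Print the simulate.sh command line for a kernel and masks without running it.
# Hypothetical helper: arguments are kernel, core mask, thread mask, mode.
simulate_cmd() {
    kernel=$1
    core_mask=$2
    thread_mask=$3
    mode=${4:-gui}   # gui is the documented default mode
    echo "tools/modelsim/simulate.sh -k $kernel -c $core_mask -t $thread_mask -m $mode"
}

# Two cores (mask 3), four threads per core (mask 15, i.e. hexadecimal F), batch mode.
simulate_cmd mmsc 3 15 batch
```

Printing the command first makes it easy to review the masks before launching a long simulation from the root of the project.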

Full Documentation

The NaplesPU Hardware Architecture

The NaplesPU Toolchain

The NaplesPU Instruction Set Architecture

Extending NaplesPU

Heterogeneous Tile

Programming Model

Further information on MediaWiki

The NaplesPU project documentation will be based on MediaWiki. For information and guides on using MediaWiki, please see the links below: