Main Page

From NaplesPU Documentation

Revision as of 14:38, 19 June 2019 by Mirko (talk | contribs) (→‎Documentation)


NaplesPU is an open-source GPU-like compute core developed by Alessandro Cilardo's research group at the University of Naples Federico II / CeRICT, and now being integrated into the MANGO FETHPC project. Its main objective is to enable resource-efficient HPC based on special-purpose customized hardware. In MANGO, the GPU-like core is meant to support architecture-level exploration for massively parallel manycore systems; as one of its primary objectives, NaplesPU also targets FPGA-accelerated HPC systems. In that respect, NaplesPU will provide an FPGA overlay solution for readily building tailored processing elements. Because the overlay supports familiar programming models, it preserves software support and improves resource efficiency without requiring a dedicated accelerator to be developed from scratch.

The NaplesPU manycore is a parametrizable regular mesh Network on Chip (NoC) of configurable tiles. Every NPU tile has the same basic components and provides a configurable open-source GPU-like softcore meant to be used as an FPGA overlay. This HPC-oriented accelerator merges the SIMT paradigm with the vector processor model. Each tile also has a Cache Controller and a Directory Controller, which handle data coherence between cores in different tiles. On top of the customized hardware core, we are also developing a NaplesPU compiler backend based on the LLVM infrastructure.

The core is based on a RISC in-order pipeline. Its control unit is intentionally kept lightweight, and the architecture masks memory and operation latencies by relying heavily on hardware multithreading. With light control logic, the core can devote most of its resources to accelerating computation in highly data-parallel kernels. Each hardware thread in the NPU core has its own PC, register file, and control registers, and the number of threads is user-configurable. An NPU hardware thread is equivalent to a wavefront in AMD terminology and to a warp in NVIDIA CUDA terminology. The processor uses a deep pipeline to improve clock speed.

1 Getting started
2 Documentation
3 Further information on MediaWiki

Getting started

This section shows how to simulate or implement a kernel on the NaplesPU architecture. Here, a kernel is a complex application, such as matrix multiplication or matrix transposition, written in a high-level programming language such as C/C++.

Required software

Simulation or implementation of any kernel relies on the following dependencies:

Git
Xilinx Vivado 2018.2 or ModelSim (e.g. Questa Sim-64 vsim 10.6c_1)
NaplesPU toolchain

Building process

The first step is to obtain the NaplesPU source code by cloning the official repository.

On Ubuntu Linux, run the following command:

$ git clone https://gitlab.com/vincenscotti/nuplus

In the NaplesPU repository, the toolchain is a git submodule, so it must be initialized and updated. On Ubuntu Linux, run the following command from the root folder of the repository:

$ git submodule update --init
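To confirm that the submodule step took effect, the output of git submodule status can be inspected: git prefixes entries that are not yet initialized with a "-". The sketch below filters that output with a small helper. The helper name, the sample hashes, and the sample paths (including "toolchain") are illustrative assumptions, not taken from the repository.

```shell
# List submodules that are not yet initialized.
# `git submodule status` marks such entries with a leading "-".
uninitialized_submodules() {
    awk '/^-/ { print $2 }'
}

# Typical use from the root of the cloned repository:
#   git submodule status | uninitialized_submodules

# Self-contained demonstration on sample output (hashes and paths are made up).
sample_status='-1111111111111111111111111111111111111111 toolchain
 2222222222222222222222222222222222222222 other/module (heads/master)'
missing=$(printf '%s\n' "$sample_status" | uninitialized_submodules)
echo "not initialized: ${missing:-none}"
```

If the helper prints nothing on the real repository, every submodule has been initialized.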

The third step is to install the toolchain. This process is described [here].

Simulate a kernel

The following folders are of particular interest for this purpose:

software, stores all kernels;
tools, stores all scripts for simulation.

There are three ways to simulate a kernel:

running the test.sh script;
running setup_project.sh from the root folder of the repository, if the simulator is Vivado;
running simulate.sh from the root folder of the repository, if the simulator is ModelSim.

First, source the Vivado or ModelSim environment in the shell. This step is mandatory for all three ways. On Ubuntu Linux:

$ source Vivado/folder/location/settingsXX.sh

where XX depends on the installed version of Vivado (32 or 64 bit).
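A quick way to verify that the environment was sourced correctly is to check whether the simulator executable is reachable on the PATH. The following is a minimal sketch; the helper name have_tool is an assumption made for illustration, and the executable names vivado and vsim are the usual launchers of the two tools.

```shell
# Succeed if the given executable can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of both supported simulators.
for tool in vivado vsim; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found on PATH"
    fi
done
```

Only the simulator that will actually be used needs to be found.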

test.sh script

The test.sh script, located in the npu/tools folder, runs all the kernels listed in it and compares the output from the NPU with the expected result produced on a standard x86 architecture:

$ ./test.sh [option]

Options are:

-h, --help                    show this help
-t, --tool=vsim or vivado     specify the tool to use, default: vsim
-cn, --core-numb=VALUE        specify the core number, default: 1
-tn, --thread-numb=VALUE      specify the thread number, default: 8
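As an illustration of how these options combine, the sketch below only prints test.sh command lines built from the long options listed above; it does not run the script. The helper name test_cmd is hypothetical, and the --option=VALUE spelling is assumed from the help text.

```shell
# Print a test.sh command line built from the documented long options.
# Arguments: tool (vsim or vivado), core number, thread number.
test_cmd() {
    case "$1" in
        vsim|vivado) ;;
        *) echo "unknown tool: $1" >&2; return 1 ;;
    esac
    echo "./test.sh --tool=$1 --core-numb=$2 --thread-numb=$3"
}

test_cmd vsim 1 8      # the documented defaults, spelled out
test_cmd vivado 2 4    # Vivado with core number 2 and thread number 4
```

The printed line can then be run from the folder that contains test.sh.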

The test.sh script automatically compiles the kernels and runs them on both the NaplesPU and x86 architectures. Once the simulation terminates, a Python script compares the results of the two executions for each kernel to verify correctness.
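The idea behind the comparison step can be sketched in the shell as a byte-for-byte check of two result files. This is a stand-in for illustration only, not the project's Python script: the file names, their contents, and the PASS/FAIL labels are assumptions.

```shell
# Compare two result files byte for byte and report a verdict.
compare_results() {
    if cmp -s "$1" "$2"; then
        echo "PASS"
    else
        echo "FAIL"
        return 1
    fi
}

# Demonstration with made-up result files in a temporary folder.
workdir=$(mktemp -d)
printf '1 2 3 4\n' > "$workdir/npu.out"   # hypothetical NaplesPU output
printf '1 2 3 4\n' > "$workdir/x86.out"   # hypothetical x86 reference output
verdict=$(compare_results "$workdir/npu.out" "$workdir/x86.out")
echo "mmsc: $verdict"
rm -rf "$workdir"
```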

In the tools folder, the file cosim.log stores the output of the simulator.

setup_project.sh script

The setup_project.sh script can be run as follows from the root of the project:

$ tools/vivado/setup_project.sh [option]

Options are:

-h, --help                    show this help
-k, --kernel=KERNEL_NAME      specify the kernel to use
-s, --single-core             select the single core configuration, by default the manycore is selected
-c, --core-mask=VALUE         specify the core activation mask, default: 1
-t, --thread-mask=VALUE       specify the thread activation mask, default FF
-m, --mode=gui or batch       specify the tool mode, it can run in either gui or batch mode, default: gui

This script starts the kernel specified in the command. The kernel must already be compiled before it is run on the NaplesPU architecture:

$ tools/vivado/setup_project.sh -k mmsc -c 3 -t $(( 16#F )) -m gui

Parameter -c 3 passes the core activation bitmask, with one bit per tile: 3 is 11 in binary, hence tiles 0 and 1 will start their cores. Parameter -t $(( 16#F )) sets the active thread mask for each core, a bitmask that states which threads are active in each core: the shell expands 16#F (hexadecimal F) to 15, which is 00001111 in binary, so threads 0 to 3 are running. Parameter -m gui selects the mode in which the simulator executes.
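The mask arithmetic described above can be reproduced with plain shell arithmetic. The sketch below builds a mask from a list of tile or thread indices and decodes a mask back into indices; the helper names mask_from_indices and indices_from_mask are hypothetical and not part of the NaplesPU tools.

```shell
# Build a bitmask with one bit set per listed index, e.g. "0 1" gives 3.
mask_from_indices() (
    mask=0
    for i in "$@"; do
        mask=$(( mask | (1 << i) ))
    done
    echo "$mask"
)

# List the indices whose bits are set in a mask, e.g. 15 gives "0 1 2 3".
indices_from_mask() (
    mask=$1
    i=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$i"
        fi
        mask=$(( mask >> 1 ))
        i=$(( i + 1 ))
    done
    echo "$out"
)

core_mask=$(mask_from_indices 0 1)          # tiles 0 and 1
thread_mask=$(mask_from_indices 0 1 2 3)    # threads 0 to 3
printf 'core mask:   %d (0x%X)\n' "$core_mask" "$core_mask"
printf 'thread mask: %d (0x%X)\n' "$thread_mask" "$thread_mask"
echo "threads enabled by 0xF: $(indices_from_mask $(( 0xF )))"
```

The resulting decimal values match the ones in the example command, where -c receives 3 and -t receives 15.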

simulate.sh script

The simulate.sh script can be run as follows from the root of the project:

$ tools/modelsim/simulate.sh [option]

Options:

-h, --help                    show this help
-k, --kernel=KERNEL_NAME      specify the kernel to use
-s, --single-core             select the single core configuration, by default the manycore is selected
-c, --core-mask=VALUE         specify the core activation mask, default: 1
-t, --thread-mask=VALUE       specify the thread activation mask, default FF
-m, --mode=gui or batch       specify the tool mode, it can run in either gui or batch mode, default: gui

This script starts the kernel specified in the command. The kernel must already be compiled before it is run on the NaplesPU architecture: