Programming Model

From NaplesPU Documentation
Revision as of 12:32, 3 June 2019 by Mirko (talk | contribs) (Memory Model Matching)
Jump to: navigation, search

OpenCL support for Nu+

OpenCL defines a platform as a set of computing devices on which the host is connected to. Each device is further divided into several compute units (CUs), each of them defined as a collection of processing elements (PEs). Recall that the target platform is architecturally designed around a single core, structured in terms of a set of at most eight hardware threads. Each hardware threads competes with each other to have access to sixteen hardware lanes, to execute both scalar and vector operations on 32-bit wide integer or floating-point operands.

Plat model.png

The computing device abstraction is physically mapped on the nu+ many-core architecture. The nu+ many-core can be configured in terms of the number of cores. Each nu+ core maps on the OpenCL Compute Unit. Internally, the nu+ core is composed of hardware threads, each of them represents the abstraction of the OpenCL processing element.

Execution Model Matching

From the execution model point of view, OpenCL relies on an N-dimensional index space, where each point represents a kernel instance execution. Since the physical kernel instance execution is done by the hardware threads, the OpenCL work-item is mapped onto a nu+ single hardware-thread. Consequently, a work-group is defined as a set of hardware threads, and all work-items in a work-group executes on a single computing unit.

Execution model.png

Memory Model Matching

OpenCL partitions the memory in four different spaces:

  • the global and constant spaces: elements in those spaces are accessible by all work-items in all work-group.
  • the local space: visible only to work-items within a work-group
  • the private space: visible to only a single work-item.

The target-platform is provided of a DDR memory, that is the device memory in OpenCL nomenclature. Consequently, variables are physically mapped on this memory. The compiler itself, by looking at the address-space qualifier, verifies the OpenCL constraints are satisfied.

Each nu+ core is also equipped with a Scratchpad Memory, an on-chip non-coherent memory portion exclusive to each core. This memory is compliant with the OpenCL local memory features. However, referring to the state of the platform implementation, since it is not possible to instantiate more than one core without having timing violation, the concepts of local and global memory are merged into the nu+ global memory.

Memory mod.png

This memory section is private to each hardware thread, that is the OpenCL work-item, and cannot be addressed by others. As a result, each stack area acts as the OpenCL private memory.