Difference between revisions of "Extending NaplesPU for OpenCL support"

From NaplesPU Documentation
Jump to: navigation, search
Line 1: Line 1:
 
This section describes how the OpenCL C kernel compilation is supported in the nu+ toolchain.
 
This section describes how the OpenCL C kernel compilation is supported in the nu+ toolchain.
 
Since Clang natively supports the IR generation of OpenCL C kernels, our work focused on the adaption of the generated code on the nu+ core. As the nu+ start routine requires to jump to the ''main'' label after having initialized all necessary resources, it is necessary to build a custom optimization pass, to auto-generate the main-function and to define the calling frame to the kernel function. Also OpenCL requires the implementation of custom libraries, extending the ''libc'' set.
 
Since Clang natively supports the IR generation of OpenCL C kernels, our work focused on the adaption of the generated code on the nu+ core. As the nu+ start routine requires to jump to the ''main'' label after having initialized all necessary resources, it is necessary to build a custom optimization pass, to auto-generate the main-function and to define the calling frame to the kernel function. Also OpenCL requires the implementation of custom libraries, extending the ''libc'' set.
 +
 +
== LLVM IR Optimization ==
 +
 +
An IR optimization can be of two different types:
 +
* Analysis, does not provide a modification of the intermediate representation;
 +
* Transform, produces a modified IR as output.
 +
 +
LLVM provides an API to create a new optimization pass, hierarchically implemented on the ''Pass'' Class from which the following ones inherit:
 +
* ModulePass: it is the most general pass. It allows the whole module and the function of which it is composed to be checked.
 +
* FunctionPass: it allows functions defined in all modules (except for the external ones) to be managed without pursuing a specific order. Functions cannot be deleted.
 +
* BasicBlockPass: it allows basic blocks to be managed . A basic block can be modified but not removed.
 +
 +
As a consequence of the classification provided above, for our purpose the definition of a ModulePass is the most suitable choice.
 +
 +
To write a custom module pass, it is necessary to define a new class by inheriting the ModulePass one and overriding the ''runOnModule()'' method. It contains the code that is executed at each LLVM per-module optimization. If the defined pass modifies the IR, the ''runOnModule'' function returns a ''true'' value. Otherwise, it returns a ''false'' value. As previously discussed, the custom pass should create the ''main'' function by parsing the arguments from the ''argv'' field, and then by calling the kernel function with the right parameters.
 +
 +
The generated code should be of the form as described below.
 +
<syntaxhighlight>
 +
; Function Attrs: nounwind
 +
define void @main(i32, i8** nocapture readonly) local_unnamed_addr #0 {
 +
  %3 = bitcast i8** %1 to i32*
 +
  %4 = load i32, i32* %3, align 4
 +
  ...
 +
  tail call void @kernel_f(i32 %4, ...) #2
 +
  ret void
 +
}
 +
</syntaxhighlight>
 +
 +
As a consequence, the ModulePass should be structured in the following points:
 +
 +
* Definition of the main function signature:
 +
<syntaxhighlight>
 +
FunctionType *main_type =
 +
TypeBuilder<void(int, char**), false>::get(ctx);
 +
Function *func =
 +
cast<Function>(M.getOrInsertFunction("main", main_type));
 +
</syntaxhighlight>
 +
 +
*Creation of an ''entry'' labeled basic block:
 +
<syntaxhighlight>
 +
BasicBlock *block = BasicBlock::Create(ctx, "entry", func, 0);
 +
builder.SetInsertPoint(block);
 +
</syntaxhighlight>
 +
 +
* For each kernel argument, create an aligned load instruction to retrieve the argument from the ''argv'' array and add it to the kernel arguments:
 +
<syntaxhighlight>
 +
argument_list[counter] = builder.CreateLoad(arg.getType(),
 +
builder.CreateBitCast(builder.CreateGEP(&argv,
 +
builder.getInt32(index)),
 +
arg.getType()->getPointerTo()));
 +
argument_list[counter]->setAlignment(4);
 +
kernel_arguments.push_back(argument_list[counter++]);
 +
</syntaxhighlight>
 +
 +
*Kernel function call:
 +
<syntaxhighlight>
 +
builder.CreateCall(kernel_function, kernel_arguments);
 +
</syntaxhighlight>
 +
 +
Once the pass is defined, the next step consists to register the pass in the ''PassManager'' to add it to the IR optimization pipeline.
 +
LLVM provides a set of macros that are to be used to initialize the pass:
 +
 +
<syntaxhighlight>
 +
INITIALIZE_PASS_BEGIN(...)
 +
INITIALIZE_PASS_END(...)
 +
</syntaxhighlight>
 +
 +
and to define the pass dependencies:
 +
<syntaxhighlight>
 +
INITIALIZE_PASS_DEPENDENCIES(...)
 +
</syntaxhighlight>
 +
 +
The pass initialization must be linked through the ''InitializePasses.h'' header file, in which the following function has to be defined:
 +
<syntaxhighlight>
 +
void initialize<PassName>Pass(PassRegistry &);
 +
</syntaxhighlight>
 +
 +
In addition, the specific ModulePass constructor must be declared in the ''LinkAllPasses.h'' header file.
 +
 +
By following these steps, the optimization pass created is automatically chained to the ''opt'' tool pipeline. Moreover, it is useful to automatically execute the pass in each ''clang'' compilation. In order to realize the desired behaviour, the code generation step has to be modified. Clang provides the ''EmitAssemblyHelper::CreatePasses'' method to add a pass to the [[PassManager]].
 +
 +
As an example, consider the following trivial OpenCL kernel:
 +
 +
<syntaxhighlight>
 +
__kernel void kernel_function(unsigned param,
 +
  __global unsigned* out)
 +
{
 +
 +
unsigned int i = 0;
 +
*out = i + param;
 +
 +
}
 +
</syntaxhighlight>
 +
 +
The optimization pass creates the main function as follows:
 +
<syntaxhighlight>
 +
define void @main(i32, i8** nocapture readonly) local_unnamed_addr #1 {
 +
entry:
 +
  %2 = getelementptr i8*, i8** %1, i64 6
 +
  %3 = bitcast i8** %2 to i32*
 +
  %4 = load i32, i32* %3, align 4
 +
  %5 = getelementptr i8*, i8** %1, i64 7
 +
  %6 = bitcast i8** %5 to i32**
 +
  %7 = load i32*, i32** %6, align 4
 +
  tail call void @kernel_function(i32 %4, i32* %7)
 +
  ret void
 +
}
 +
</syntaxhighlight>
 +
 +
As a result, the nu+ processor is now capable to run OpenCL kernels.

Revision as of 14:58, 14 May 2019

This section describes how the OpenCL C kernel compilation is supported in the nu+ toolchain. Since Clang natively supports the IR generation of OpenCL C kernels, our work focused on the adaption of the generated code on the nu+ core. As the nu+ start routine requires to jump to the main label after having initialized all necessary resources, it is necessary to build a custom optimization pass, to auto-generate the main-function and to define the calling frame to the kernel function. Also OpenCL requires the implementation of custom libraries, extending the libc set.

LLVM IR Optimization

An IR optimization can be of two different types:

  • Analysis, does not provide a modification of the intermediate representation;
  • Transform, produces a modified IR as output.

LLVM provides an API to create a new optimization pass, hierarchically implemented on the Pass Class from which the following ones inherit:

  • ModulePass: it is the most general pass. It allows the whole module and the function of which it is composed to be checked.
  • FunctionPass: it allows functions defined in all modules (except for the external ones) to be managed without pursuing a specific order. Functions cannot be deleted.
  • BasicBlockPass: it allows basic blocks to be managed . A basic block can be modified but not removed.
As a consequence of the classification provided above, for our purpose the definition of a ModulePass is the most suitable choice.
To write a custom module pass, it is necessary to define a new class by inheriting the ModulePass one and overriding the runOnModule() method. It contains the code that is executed at each LLVM per-module optimization. If the defined pass modifies the IR, the runOnModule function returns a true value. Otherwise, it returns a false value. As previously discussed, the custom pass should create the main function by parsing the arguments from the argv field, and then by calling the kernel function with the right parameters. 
The generated code should be of the form as described below.
; Function Attrs: nounwind
define void @main(i32, i8** nocapture readonly) local_unnamed_addr #0 {
  %3 = bitcast i8** %1 to i32*
  %4 = load i32, i32* %3, align 4
  ...
  tail call void @kernel_f(i32 %4, ...) #2
  ret void
}

As a consequence, the ModulePass should be structured in the following points:

  • Definition of the main function signature:
FunctionType *main_type = 
TypeBuilder<void(int, char**), false>::get(ctx);
Function *func = 
cast<Function>(M.getOrInsertFunction("main", main_type));
  • Creation of an entry labeled basic block:
BasicBlock *block = BasicBlock::Create(ctx, "entry", func, 0);
builder.SetInsertPoint(block);
  • For each kernel argument, create an aligned load instruction to retrieve the argument from the argv array and add it to the kernel arguments:
argument_list[counter] = builder.CreateLoad(arg.getType(),
	builder.CreateBitCast(builder.CreateGEP(&argv, 
	builder.getInt32(index)), 
	arg.getType()->getPointerTo()));
argument_list[counter]->setAlignment(4);
kernel_arguments.push_back(argument_list[counter++]);
  • Kernel function call:
builder.CreateCall(kernel_function, kernel_arguments);

Once the pass is defined, the next step consists to register the pass in the PassManager to add it to the IR optimization pipeline. LLVM provides a set of macros that are to be used to initialize the pass:

INITIALIZE_PASS_BEGIN(...)
INITIALIZE_PASS_END(...)

and to define the pass dependencies:

INITIALIZE_PASS_DEPENDENCIES(...)

The pass initialization must be linked through the InitializePasses.h header file, in which the following function has to be defined:

void initialize<PassName>Pass(PassRegistry &);

In addition, the specific ModulePass constructor must be declared in the LinkAllPasses.h header file.

By following these steps, the optimization pass created is automatically chained to the opt tool pipeline. Moreover, it is useful to automatically execute the pass in each clang compilation. In order to realize the desired behaviour, the code generation step has to be modified. Clang provides the EmitAssemblyHelper::CreatePasses method to add a pass to the PassManager.

As an example, consider the following trivial OpenCL kernel:

__kernel void kernel_function(unsigned param,
  __global unsigned* out)
{

 unsigned int i = 0;
 *out = i + param;

}

The optimization pass creates the main function as follows:

define void @main(i32, i8** nocapture readonly) local_unnamed_addr #1 {
entry:
  %2 = getelementptr i8*, i8** %1, i64 6
  %3 = bitcast i8** %2 to i32*
  %4 = load i32, i32* %3, align 4
  %5 = getelementptr i8*, i8** %1, i64 7
  %6 = bitcast i8** %5 to i32**
  %7 = load i32*, i32** %6, align 4
  tail call void @kernel_function(i32 %4, i32* %7)
  ret void
}

As a result, the nu+ processor is now capable to run OpenCL kernels.