Difference between revisions of "ISA"
Revision as of 18:57, 23 November 2017
Register File
The nu+ register file is composed of a scalar register file and a vector register file, each containing 64 registers.
The scalar register file has 64 registers. The first 58 are general purpose registers, while the remaining 6 are special purpose registers. Each scalar register can store up to 32 bits of data. However, the nu+ architecture also supports 64-bit data, which is stored in a pair of contiguous registers.
The vector register file has 64 general purpose registers.
Each vector register can store up to 512 bits of data, organized as either 16 x 32-bit or 8 x 64-bit elements.
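The register pairing and lane counts described above can be sketched in a few lines of Python. This is only an illustration: the helper names (`split64`, `write64`, `read64`) are hypothetical, and the choice of placing the low half in the lower-numbered register is an assumption, since the text only says that the two registers are contiguous.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def split64(value):
    """Split a 64-bit value into its (low, high) 32-bit halves."""
    value &= MASK64
    return value & MASK32, (value >> 32) & MASK32

def join64(low, high):
    """Rebuild a 64-bit value from two 32-bit halves."""
    return ((high & MASK32) << 32) | (low & MASK32)

# Hypothetical model of the scalar register file: 64 registers of 32 bits.
scalar_regs = [0] * 64

def write64(regs, index, value):
    """Store a 64-bit value in registers index and index + 1.

    Assumption: the low half goes in the lower-numbered register.
    """
    regs[index], regs[index + 1] = split64(value)

def read64(regs, index):
    """Read back a 64-bit value from registers index and index + 1."""
    return join64(regs[index], regs[index + 1])

write64(scalar_regs, 4, 0x0123456789ABCDEF)
pair = (scalar_regs[4], scalar_regs[5])
roundtrip = read64(scalar_regs, 4)

# A 512-bit vector register holds 16 lanes of 32 bits or 8 lanes of 64 bits.
VECTOR_BITS = 512
lanes_32 = VECTOR_BITS // 32
lanes_64 = VECTOR_BITS // 64
```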
Data Types
The following table summarizes the data types available in nu+. The Type column lists the C/C++ type names, the LLVM Type column lists the type names used in LLVM, and the Register column shows the kind of register in which a value of that type is stored.
The types with an empty Notes column are those the architecture natively supports, given the register file widths. The others are obtained through extension, so that they are handled as one of the natively supported types. Their advantage lies in a more efficient use of system memory.
Type | LLVM Type | Register | Notes |
---|---|---|---|
bool | i1 | scalar (32 bits) | It is expanded to 32 bits |
char | i8 | scalar (32 bits) | It is expanded to 32 bits |
short | i16 | scalar (32 bits) | It is expanded to 32 bits |
int | i32 | scalar (32 bits) | |
float | f32 | scalar (32 bits) | |
long long int | i64 | scalar (64 bits) | |
double | f64 | scalar (64 bits) | |
vec16i8, vec16u8 | v16i8 | vector (16 x 32 bits) | It is expanded to 32 bits vector |
vec16i16, vec16u16 | v16i16 | vector (16 x 32 bits) | It is expanded to 32 bits vector |
vec16i32, vec16u32 | v16i32 | vector (16 x 32 bits) | |
vec16f32 | v16f32 | vector (16 x 32 bits) | |
vec8i8, vec8u8 | v8i8 | vector (8 x 64 bits) | It is expanded to 64 bits vector |
vec8i16, vec8u16 | v8i16 | vector (8 x 64 bits) | It is expanded to 64 bits vector |
vec8i32, vec8u32 | v8i32 | vector (8 x 64 bits) | It is expanded to 64 bits vector |
vec8f32 | v8f32 | vector (16 x 32 bits) | It is considered as a 16 elements vector |
vec8i64, vec8u64 | v8i64 | vector (8 x 64 bits) | |
vec8f64 | v8f64 | vector (8 x 64 bits) | |
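As a rough illustration of the expansion mentioned in the Notes column, the sketch below widens narrow values to 32 bits and compares the memory footprint of a 16-element byte vector with its in-register footprint. The function names are hypothetical, and the rule that signed types are sign-extended while unsigned types are zero-extended is an assumption suggested by the signed and unsigned load variants of the ISA, not something the table states.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Treat the low `bits` bits as two's complement and widen to 32 bits."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK32

def zero_extend(value, bits):
    """Keep the low `bits` bits and clear everything above them."""
    return value & ((1 << bits) - 1)

# A char holding -1 (0xFF) and a short holding 0x7FFF, widened to 32 bits.
char_signed = sign_extend(0xFF, 8)
char_unsigned = zero_extend(0xFF, 8)
short_signed = sign_extend(0x7FFF, 16)

# A vec16i8 value occupies 16 bytes in memory but a full 16 x 32-bit register.
packed = bytes([0x01, 0x80] * 8)
expanded = [sign_extend(b, 8) for b in packed]
memory_bytes = len(packed)
register_bytes = len(expanded) * 4
```

The last two values show where the memory saving comes from: the packed form needs a quarter of the space of the expanded form.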
Instructions Format
The nu+ instructions have a fixed length of 32 bits. They are grouped in seven types:
- The R type includes the logical and arithmetic operations and memory operations.
- The I type includes the logical and arithmetic operations between a register operand and an immediate operand.
- The MOVEI type includes the load operations of an immediate operand in a register.
- The C type is used for control operations and for synchronization instructions.
- The JR type includes jump instructions.
- The M type includes the instructions used to access memory.
- The M-poly type is used for memory instructions that use a polyhedral access pattern.
R type instructions
- RR (Register to Register) has a destination register and two source registers.
- RI (Register Immediate) has a destination register, one source register and an immediate encoded in the instruction word.
Mnemonic | Opcode | Meaning | Operation |
---|---|---|---|
or | 1 | or | Rd = Ra \| Rb |
and | 2 | and | Rd = Ra & Rb |
xor | 3 | xor | Rd = Ra ^ Rb |
add | 4 | addition | Rd = Ra + Rb |
sub | 5 | subtraction | Rd = Ra - Rb |
mull | 6 | multiplication | Rd = Ra * Rb |
mulhs | 7 | high multiply | Rd = Ra * Rb |
mulhu | 8 | high multiply unsigned | Rd = Ra * Rb |
ashr | 9 | arithmetic shift right | Rd = Ra >> Rb |
shr | 10 | shift right | Rd = Ra >> Rb |
shl | 11 | shift left | Rd = Ra << Rb |
clz | 12 | count leading zeros | |
ctz | 13 | count trailing zeros | |
shuffle | 24 | vector shuffle | Rd[i] = Ra[Rb[i]] |
getlane | 25 | Get lane from vector | Rd = Ra[Rb] |
move | 32 | move register | Rd = Ra |
fadd | 33 | floating point add | Rd = Ra + Rb |
fsub | 34 | floating point sub | Rd = Ra - Rb |
fmul | 35 | floating point multiplication | Rd = Ra * Rb |
fdiv | 36 | floating point division | Rd = Ra / Rb |
sext8 | 43 | sign extend 8 bits | |
sext16 | 44 | sign extend 16 bits | |
sext32 | 45 | sign extend 32 bits | |
f32tof64 | 46 | cast float to double | |
f64tof32 | 47 | cast double to float | |
i32tof32 | 48 | cast integer to float | |
f32toi32 | 49 | cast float to integer | |
cmpeq | 14 | compare equal | Rd = Ra == Rb |
cmpne | 15 | compare not equal | Rd = Ra != Rb |
cmpgt | 16 | compare greater than | Rd = Ra > Rb |
cmpge | 17 | compare greater or equal | Rd = Ra >= Rb |
cmplt | 18 | compare less than | Rd = Ra < Rb |
cmple | 19 | compare less or equal | Rd = Ra <= Rb |
cmpugt | 20 | unsigned compare greater than | Rd = Ra > Rb |
cmpuge | 21 | unsigned compare greater or equal | Rd = Ra >= Rb |
cmpult | 22 | unsigned compare less than | Rd = Ra < Rb |
cmpule | 23 | unsigned compare less or equal | Rd = Ra <= Rb |
cmpfeq | 37 | floating point compare equal | Rd = Ra == Rb |
cmpfne | 38 | floating point compare not equal | Rd = Ra != Rb |
cmpfgt | 39 | floating point compare greater than | Rd = Ra > Rb |
cmpfge | 40 | floating point compare greater or equal | Rd = Ra >= Rb |
cmpflt | 41 | floating point compare less than | Rd = Ra < Rb |
cmpfle | 42 | floating point compare less or equal | Rd = Ra <= Rb |
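Several rows above share the same Operation text even though they behave differently (mull, mulhs and mulhu; ashr and shr; cmpgt and cmpugt). The following Python sketch shows one way to read those differences on 32-bit scalar operands. It is not taken from the nu+ sources: the function bodies, the 0/1 compare result, the restriction to shift amounts below 32 and the result of 32 for clz and ctz on a zero input are all assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Reinterpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mull(a, b):
    """Low 32 bits of the product."""
    return (a * b) & MASK32

def mulhs(a, b):
    """High 32 bits of the signed 64-bit product."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32

def mulhu(a, b):
    """High 32 bits of the unsigned 64-bit product."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32

def ashr(a, n):
    """Arithmetic shift right: the sign bit is replicated."""
    return (to_signed(a) >> n) & MASK32

def shr(a, n):
    """Logical shift right: zeros are shifted in."""
    return (a & MASK32) >> n

def cmpgt(a, b):
    """Signed greater-than, returning 1 or 0 (assumed encoding)."""
    return int(to_signed(a) > to_signed(b))

def cmpugt(a, b):
    """Unsigned greater-than, returning 1 or 0 (assumed encoding)."""
    return int((a & MASK32) > (b & MASK32))

def clz(a):
    """Count leading zeros of a 32-bit value (32 for zero, by assumption)."""
    return 32 - (a & MASK32).bit_length()

def ctz(a):
    """Count trailing zeros of a 32-bit value (32 for zero, by assumption)."""
    a &= MASK32
    return 32 if a == 0 else (a & -a).bit_length() - 1

minus_one = 0xFFFFFFFF
results = {
    "mull": mull(minus_one, 2),
    "mulhs": mulhs(minus_one, 2),
    "mulhu": mulhu(minus_one, 2),
    "ashr": ashr(0x80000000, 4),
    "shr": shr(0x80000000, 4),
    "cmpgt": cmpgt(minus_one, 1),
    "cmpugt": cmpugt(minus_one, 1),
}
```

With the bit pattern 0xFFFFFFFF as first operand, the signed variants treat it as -1 and the unsigned variants as 4294967295, which is why the paired instructions return different results.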
I type instructions
Mnemonic | Opcode | Meaning | Operation |
---|---|---|---|
ori | 1 | or | Rd = Ra \| Imm |
andi | 2 | and | Rd = Ra & Imm |
xori | 3 | xor | Rd = Ra ^ Imm |
addi | 4 | addition | Rd = Ra + Imm |
subi | 5 | subtraction | Rd = Ra - Imm |
mulli | 6 | multiplication | Rd = Ra * Imm |
mulhi | 7 | high multiply | Rd = Ra * Imm |
mulhui | 8 | high multiply unsigned | Rd = Ra * Imm |
ashri | 9 | arithmetic shift right | Rd = Ra >> Imm |
shri | 10 | shift right | Rd = Ra >> Imm |
shli | 11 | shift left | Rd = Ra << Imm |
getlane | 25 | Get lane from vector | Rd = Ra[Imm] |
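The lane-oriented operations in the tables (getlane with a register or immediate index, and shuffle with Rd[i] = Ra[Rb[i]]) can be pictured with plain Python lists standing in for 16-lane vector registers. This is a minimal sketch: the function names mirror the mnemonics but are otherwise hypothetical, and all indices are assumed to be in range, since the page does not say how out-of-range lane indices are handled.

```python
def getlane(ra, lane):
    """Rd = Ra[lane]: extract one lane of a vector into a scalar."""
    return ra[lane]

def shuffle(ra, rb):
    """Rd[i] = Ra[Rb[i]]: permute the lanes of ra using the indices in rb."""
    return [ra[index] for index in rb]

# A 16-lane vector and an index vector that reverses the lane order.
ra = [10 * i for i in range(16)]
rb = list(range(15, -1, -1))

reversed_lanes = shuffle(ra, rb)
third_lane = getlane(ra, 3)
```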
MOVEI type instructions
MVI (Move Immediate) has a destination register and a 16-bit immediate encoded in the instruction word.
Mnemonic | Opcode | Meaning | Operation |
---|---|---|---|
moveil | 0 | move the 16 least significant bits | Rd = Ra & 0xFFFF |
moveih | 1 | move the 16 most significant bits | Rd = (Ra >> 16) & 0xFFFF |
movei | 2 | move the 16 least significant bits with zero extension | Rd = (Rd ^ Rd) \| (Ra & 0xFFFF) |
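Because the immediate is only 16 bits wide, a full 32-bit constant needs two MOVEI instructions. The sketch below shows one plausible reading of that sequence. It rests on assumptions the table does not confirm: that moveil replaces only the low half of Rd, that moveih replaces only the high half, and that movei writes the low half and clears the high half. The Python function names simply mirror the mnemonics.

```python
MASK16 = 0xFFFF

def moveil(rd, imm):
    """Assumed behavior: replace the low 16 bits of rd, keep the high half."""
    return (rd & 0xFFFF0000) | (imm & MASK16)

def moveih(rd, imm):
    """Assumed behavior: replace the high 16 bits of rd, keep the low half."""
    return (rd & MASK16) | ((imm & MASK16) << 16)

def movei(imm):
    """Assumed behavior: write the low 16 bits and zero the high half."""
    return imm & MASK16

def load_constant(value):
    """Build a 32-bit constant from two 16-bit immediates."""
    rd = 0
    rd = moveih(rd, (value >> 16) & MASK16)
    rd = moveil(rd, value & MASK16)
    return rd

constant = load_constant(0xDEADBEEF)
small = movei(0x1BEEF)
```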
C type instructions
Mnemonic | Opcode | Meaning | Operation |
---|---|---|---|
barrier_core | 0 | barrier through all the nu+’s cores | |
barrier_thread | 1 | barrier through all the threads of a core | |
flush | 2 | flush a cache line to the system memory | |
JR type instructions
J type instructions
M type instructions
MEM (Memory Instruction) has a destination/source field: for a load, the first register is the destination register, while for a store the first register holds the value to be stored. In both cases it is followed by the base address register and an immediate. The sum of the base address and the immediate gives the effective memory address.
Mnemonic | Opcode | Meaning | Operation |
---|---|---|---|
loadXD_s8 | 0 | load 1 byte with sign extension | Rd = [Rbase + Offset] |
loadXD_s16 | 1 | load 2 bytes with sign extension | Rd = [Rbase + Offset] |
load32D | 2 | load 1 word | Rd = [Rbase + Offset] |
loadXD_u8 | 4 | load 1 byte with zero extension | Rd = [Rbase + Offset] |
loadXD_u16 | 5 | load 2 bytes with zero extension | Rd = [Rbase + Offset] |
load64D_s32 | 2 | load 1 word sign-extended to 1 double-word | Rd = [Rbase + Offset] |
load64D_u32 | 6 | load 1 word zero-extended to 1 double-word | Rd = [Rbase + Offset] |
load64D | 3 | load 1 double-word | Rd = [Rbase + Offset] |
loadD_vYi8 | 7 | load a vector of Y bytes with sign extension | Rd = [Rbase + Offset] |
loadD_vYi16 | 8 | load a vector of Y 2-byte elements with sign extension | Rd = [Rbase + Offset] |
loadD_vYi32 | 9 | load a vector of Y words with sign extension | Rd = [Rbase + Offset] |
loadD_v8i64 | 10 | load a vector of 8 double-words | Rd = [Rbase + Offset] |
loadD_vYu8 | 11 | load a vector of Y bytes with zero extension | Rd = [Rbase + Offset] |
loadD_vYu16 | 12 | load a vector of Y 2-byte elements with zero extension | Rd = [Rbase + Offset] |
loadD_vYu32 | 13 | load a vector of Y words with zero extension | Rd = [Rbase + Offset] |
loadD_g_32 | 16 | load 16 words from different memory addresses | Rd[i] = [Rbase[i]] |
storeXD_8 | 32 | store 1 byte | [Rbase + Offset] = Rs |
storeXD_16 | 33 | store 2 bytes | [Rbase + Offset] = Rs |
store32D | 34 | store 1 word | [Rbase + Offset] = Rs |
store64D_32 | 34 | store 1 word | [Rbase + Offset] = Rs |
store64D | 35 | store 1 double-word | [Rbase + Offset] = Rs |
storeD_vYi8 | 32 | store Y bytes | [Rbase + Offset] = Rs |
storeD_vYi16 | 33 | store Y 2-byte elements | [Rbase + Offset] = Rs |
storeD_vYi32 | 34 | store Y words | [Rbase + Offset] = Rs |
storeD_v8i64 | 35 | store 8 double-words | [Rbase + Offset] = Rs |
storeD_s_32 | 42 | store 16 words to different memory addresses | [Rbase[i]] = Rs[i] |
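The addressing rule (effective address = base + immediate), the difference between sign-extending and zero-extending loads, and the gather/scatter forms can be sketched against a small byte array. This is an illustrative model only: the function names are hypothetical, little-endian byte order is an assumption, and the 256-byte memory is just large enough for the example.

```python
import struct

MASK32 = 0xFFFFFFFF
memory = bytearray(256)

def effective_address(base, offset):
    """Effective address = base register value + immediate offset."""
    return (base + offset) & MASK32

def load_s8(mem, base, offset):
    """Load 1 byte and sign-extend it to 32 bits."""
    value = struct.unpack_from("<b", mem, effective_address(base, offset))[0]
    return value & MASK32

def load_u8(mem, base, offset):
    """Load 1 byte and zero-extend it to 32 bits."""
    return mem[effective_address(base, offset)]

def load32(mem, base, offset):
    """Load 1 word (little-endian byte order is assumed)."""
    return struct.unpack_from("<I", mem, effective_address(base, offset))[0]

def store32(mem, base, offset, value):
    """Store 1 word (little-endian byte order is assumed)."""
    struct.pack_into("<I", mem, effective_address(base, offset), value & MASK32)

def scatter32(mem, addresses, values):
    """[Rbase[i]] = Rs[i]: each lane is stored to its own address."""
    for address, value in zip(addresses, values):
        store32(mem, address, 0, value)

def gather32(mem, addresses):
    """Rd[i] = [Rbase[i]]: each lane is loaded from its own address."""
    return [load32(mem, address, 0) for address in addresses]

# Scalar store and loads at base 8, offset 4 (effective address 12).
store32(memory, 8, 4, 0x12345680)
word = load32(memory, 8, 4)
byte_signed = load_s8(memory, 8, 4)
byte_unsigned = load_u8(memory, 8, 4)

# Scatter 16 words to non-contiguous addresses, then gather them back.
addresses = [128 + 8 * i for i in range(16)]
values = list(range(100, 116))
scatter32(memory, addresses, values)
gathered = gather32(memory, addresses)
```

The byte at the effective address is 0x80, so the sign-extending load fills the upper 24 bits with ones while the zero-extending load leaves them clear.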