Difference between revisions of "ISA"
From NaplesPU Documentation
Revision as of 08:40, 25 October 2017
Register File
The nu+ register file is composed of a scalar register file and a vector register file, each containing 64 registers.
Instructions Format
The nu+ instructions have a fixed length of 32 bits. They are grouped into seven types:
- The R type includes the logical and arithmetic operations and memory operations.
- The I type includes the logical and arithmetic operations between a register operand and an immediate operand.
- The MOVEI type includes the operations that load an immediate operand into a register.
- The C type is used for control operations and for synchronization instructions.
- The JR type includes jump instructions.
- The M type includes the instructions used to access memory.
- The M-poly type is used for memory instructions that use a polyhedral access pattern.
R type instructions
- RR (Register to Register) has a destination register and two source registers.
- RI (Register Immediate) has a destination register, one source register, and an immediate encoded in the instruction word.
Mnemonic | Opcode | Meaning | Operation |
---|---|---|---|
or | 1 | or | Rd = Ra \| Rb |
and | 2 | and | Rd = Ra & Rb |
xor | 3 | xor | Rd = Ra ^ Rb |
add | 4 | addition | Rd = Ra + Rb |
sub | 5 | subtraction | Rd = Ra - Rb |
mull | 6 | multiplication | Rd = Ra * Rb |
mulh | 7 | high multiply | Rd = upper half of (Ra * Rb) |
mulhu | 8 | high multiply unsigned | Rd = upper half of (Ra * Rb), unsigned |
ashr | 9 | arithmetic shift right | Rd = Ra >> Rb (arithmetic) |
shr | 10 | shift right | Rd = Ra >> Rb |
shl | 11 | shift left | Rd = Ra << Rb |
clz | 12 | count leading zeros | |
ctz | 13 | count trailing zeros | |
shuffle | 24 | vector shuffle | Rd[i] = Ra[Rb[i]] |
getlane | 25 | Get lane from vector | Rd = Ra[Rb] |
move | 32 | move register | Rd = Ra |
add_f | 33 | floating point add | Rd = Ra + Rb |
sub_f | 34 | floating point subtraction | Rd = Ra - Rb |
mul_f | 35 | floating point multiplication | Rd = Ra * Rb |
div_f | 36 | floating point division | Rd = Ra / Rb |
sext8 | 43 | sign extend 8 bits | |
sext16 | 44 | sign extend 16 bits | |
sext32 | 45 | sign extend 32 bits | |
f32tof64 | 46 | cast float to double | |
f64tof32 | 47 | cast double to float | |
i32tof32 | 48 | cast integer to float | |
f32toi32 | 49 | cast float to integer | |
cmpeq | 14 | compare equal | Rd = Ra == Rb |
cmpne | 15 | compare not equal | Rd = Ra != Rb |
cmpgt | 16 | compare greater than | Rd = Ra > Rb |
cmpge | 17 | compare greater or equal | Rd = Ra >= Rb |
cmplt | 18 | compare less than | Rd = Ra < Rb |
cmple | 19 | compare less or equal | Rd = Ra <= Rb |
cmpgt_u | 20 | unsigned compare greater than | Rd = Ra > Rb |
cmpge_u | 21 | unsigned compare greater or equal | Rd = Ra >= Rb |
cmplt_u | 22 | unsigned compare less than | Rd = Ra < Rb |
cmple_u | 23 | unsigned compare less or equal | Rd = Ra <= Rb |
cmpeq_fp | 37 | floating point compare equal | Rd = Ra == Rb |
cmpne_fp | 38 | floating point compare not equal | Rd = Ra != Rb |
cmpgt_fp | 39 | floating point compare greater than | Rd = Ra > Rb |
cmpge_fp | 40 | floating point compare greater or equal | Rd = Ra >= Rb |
cmplt_fp | 41 | floating point compare less than | Rd = Ra < Rb |
cmple_fp | 42 | floating point compare less or equal | Rd = Ra <= Rb |
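
The table gives only a C-like expression for each operation, which hides the differences between mull, mulh and mulhu, between ashr and shr, and the lane indexing done by shuffle and getlane. The following Python reference model is a minimal sketch of those semantics. It assumes 32-bit scalar operands and a result of 32 for clz/ctz on a zero input. Neither assumption is stated on this page, and the helper names are illustrative only.

```python
MASK32 = 0xFFFFFFFF  # assumption: scalar operands are 32 bits wide


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mull(a, b):
    """Low 32 bits of the product."""
    return (a * b) & MASK32


def mulh(a, b):
    """High 32 bits of the signed 64-bit product."""
    return ((to_signed32(a) * to_signed32(b)) >> 32) & MASK32


def mulhu(a, b):
    """High 32 bits of the unsigned 64-bit product."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


def ashr(a, n):
    """Arithmetic shift right: the sign bit is replicated."""
    return (to_signed32(a) >> n) & MASK32


def shr(a, n):
    """Logical shift right: zeros are shifted in."""
    return (a & MASK32) >> n


def clz(a):
    """Count leading zeros (assumption: returns 32 for a == 0)."""
    return 32 - (a & MASK32).bit_length()


def ctz(a):
    """Count trailing zeros (assumption: returns 32 for a == 0)."""
    a &= MASK32
    return 32 if a == 0 else (a & -a).bit_length() - 1


def shuffle(ra, rb):
    """Rd[i] = Ra[Rb[i]] for every lane i."""
    return [ra[index] for index in rb]


def getlane(ra, lane):
    """Rd = Ra[lane]."""
    return ra[lane]
```

For example, `shuffle([10, 20, 30, 40], [3, 2, 1, 0])` reverses the lanes, and `ashr(0x80000000, 4)` keeps the sign bit set while `shr(0x80000000, 4)` clears it. The four-lane vectors are chosen only to keep the example short.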
I type instructions
Mnemonic | Opcode | Meaning | Operation |
---|---|---|---|
ori | 1 | or | Rd = Ra \| Imm |
andi | 2 | and | Rd = Ra & Imm |
xori | 3 | xor | Rd = Ra ^ Imm |
addi | 4 | addition | Rd = Ra + Imm |
subi | 5 | subtraction | Rd = Ra - Imm |
mulli | 6 | multiplication | Rd = Ra * Imm |
mulhi | 7 | high multiply | Rd = upper half of (Ra * Imm) |
mulhui | 8 | high multiply unsigned | Rd = upper half of (Ra * Imm), unsigned |
ashri | 9 | arithmetic shift right | Rd = Ra >> Imm (arithmetic) |
shri | 10 | shift right | Rd = Ra >> Imm |
shli | 11 | shift left | Rd = Ra << Imm |
getlane | 25 | Get lane from vector | Rd = Ra[Imm] |
MOVEI type instructions
MVI (Move Immediate) has a destination register and a 16-bit immediate encoded in the instruction word.
Mnemonic | Opcode | Meaning | Operation |
---|---|---|---|
moveil | 0 | move the 16 least significant bits | Rd = Ra & 0xFFFF |
moveih | 1 | move the 16 most significant bits | Rd = (Ra >> 16) & 0xFFFF |
movei | 2 | move the 16 least significant bits with zero extension | Rd = (Rd ^ Rd) \| (Ra & 0xFFFF) |
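
Because the immediate is only 16 bits wide, a full 32-bit constant has to be split into two halves using the masks from the table. The sketch below shows that split and the reverse recombination. It assumes that the halves are meant to be supplied to moveil and moveih respectively; the function names are hypothetical and not part of any nu+ toolchain.

```python
def split_immediate(value):
    """Split a 32-bit constant into (low, high) 16-bit halves.

    low  corresponds to the moveil expression: value & 0xFFFF
    high corresponds to the moveih expression: (value >> 16) & 0xFFFF
    """
    low = value & 0xFFFF
    high = (value >> 16) & 0xFFFF
    return low, high


def join_halves(low, high):
    """Rebuild the 32-bit constant from its two 16-bit halves."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


if __name__ == "__main__":
    low, high = split_immediate(0xDEADBEEF)
    print(hex(low), hex(high))           # 0xbeef 0xdead
    print(hex(join_halves(low, high)))   # 0xdeadbeef
```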
C type instructions
Mnemonic | Opcode | Meaning | Operation |
---|---|---|---|
barrier_core | 0 | barrier across all the nu+ cores | |
barrier_thread | 1 | barrier across all the threads of a core | |
flush | 2 | flush a cache line to the system memory | |
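
The effect of a barrier is easiest to see in a two-phase computation: no thread may start the second phase until every thread has finished the first. The sketch below is only a software analogy written with Python's `threading.Barrier`. It does not model the nu+ hardware, and the worker function and thread count are illustrative assumptions.

```python
import threading


def run_with_barrier(num_threads=4):
    """Each thread writes a partial result, waits at the barrier,
    then reads the partial results written by all threads."""
    barrier = threading.Barrier(num_threads)
    partial = [0] * num_threads
    totals = [0] * num_threads

    def worker(tid):
        partial[tid] = tid + 1        # phase 1: produce a local result
        barrier.wait()                # plays the role of barrier_thread
        totals[tid] = sum(partial)    # phase 2: all phase-1 writes are visible

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return totals


if __name__ == "__main__":
    print(run_with_barrier(4))  # [10, 10, 10, 10]
```

Without the `barrier.wait()` call, a fast thread could sum `partial` before the other threads had written their entries.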
JR type instructions
J type instructions
M type instructions
MEM (Memory Instruction) has a destination/source field: for a load, the first register is the destination register; for a store, the first register holds the value to store. In both cases, the base address register and the immediate follow. The effective memory address is the sum of the base address and the immediate.
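
The address computation and the difference between sign-extending and zero-extending loads can be sketched as follows. This is a minimal model and not the hardware behaviour: it assumes a 32-bit address space that wraps around and a little-endian byte order, neither of which is stated on this page, and the function names are hypothetical.

```python
def effective_address(base, offset):
    """Effective address = base address + immediate.
    Assumption: addresses are 32 bits wide and wrap around."""
    return (base + offset) & 0xFFFFFFFF


def load(memory, base, offset, size, signed):
    """Load `size` bytes and extend them to a Python integer.
    signed=True models sign extension, signed=False zero extension.
    Assumption: little-endian byte order."""
    addr = effective_address(base, offset)
    raw = bytes(memory[addr:addr + size])
    return int.from_bytes(raw, "little", signed=signed)


def store(memory, base, offset, size, value):
    """Store the low `size` bytes of `value` at base + offset."""
    addr = effective_address(base, offset)
    mask = (1 << (8 * size)) - 1
    memory[addr:addr + size] = (value & mask).to_bytes(size, "little")


if __name__ == "__main__":
    memory = bytearray(16)
    store(memory, 4, 2, 1, 0xFF)            # one byte at address 6
    print(load(memory, 4, 2, 1, True))      # -1  (sign extension)
    print(load(memory, 4, 2, 1, False))     # 255 (zero extension)
```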
Mnemonic | Opcode | Meaning | Operation |
---|---|---|---|
loadXD_s8 | 0 | load 1 byte with sign extension | Rd = [Rbase + Offset] |
loadXD_s16 | 1 | load 2 bytes with sign extension | Rd = [Rbase + Offset] |
load32D | 2 | load 1 word | Rd = [Rbase + Offset] |
loadXD_u8 | 4 | load 1 byte with zero extension | Rd = [Rbase + Offset] |
loadXD_u16 | 5 | load 2 bytes with zero extension | Rd = [Rbase + Offset] |
load64D_s32 | 2 | load 1 word sign-extended to 1 double-word | Rd = [Rbase + Offset] |
load64D_u32 | 6 | load 1 word zero-extended to 1 double-word | Rd = [Rbase + Offset] |
load64D | 3 | load 1 double-word | Rd = [Rbase + Offset] |
loadD_vYi8 | 7 | load a vector of Y bytes with sign extension | Rd = [Rbase + Offset] |
loadD_vYi16 | 8 | load a vector of Y 2-byte elements with sign extension | Rd = [Rbase + Offset] |
loadD_vYi32 | 9 | load a vector of Y words with sign extension | Rd = [Rbase + Offset] |
loadD_v8i64 | 10 | load a vector of 8 double-words | Rd = [Rbase + Offset] |
loadD_vYu8 | 11 | load a vector of Y bytes with zero extension | Rd = [Rbase + Offset] |
loadD_vYu16 | 12 | load a vector of Y 2-byte elements with zero extension | Rd = [Rbase + Offset] |
loadD_vYu32 | 13 | load a vector of Y words with zero extension | Rd = [Rbase + Offset] |
loadD_g_32 | 16 | load 16 words from different memory addresses | Rd[i] = [Rbase[i]] |
storeXD_8 | 32 | store 1 byte | [Rbase + Offset] = Rs |
storeXD_16 | 33 | store 2 bytes | [Rbase + Offset] = Rs |
store32D | 34 | store 1 word | [Rbase + Offset] = Rs |
store64D_32 | 34 | store 1 word | [Rbase + Offset] = Rs |
store64D | 35 | store 1 double-word | [Rbase + Offset] = Rs |
storeD_vYi8 | 32 | store Y bytes | [Rbase + Offset] = Rs |
storeD_vYi16 | 33 | store Y 2-byte elements | [Rbase + Offset] = Rs |
storeD_vYi32 | 34 | store Y words | [Rbase + Offset] = Rs |
storeD_v8i64 | 35 | store 8 double-words | [Rbase + Offset] = Rs |
storeD_s_32 | 42 | store 16 words to different memory addresses | [Rbase[i]] = Rs[i] |
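
The last load and the last store in the table differ from the others in that every lane uses its own address: `Rd[i] = [Rbase[i]]` is a gather and `[Rbase[i]] = Rs[i]` is a scatter. The sketch below illustrates the two access patterns on a byte array. It assumes 4-byte words in little-endian order and uses hypothetical function names. The three-lane vectors only keep the example short; the instructions in the table operate on 16 words.

```python
WORD_BYTES = 4  # assumption: a word is 4 bytes, little-endian


def gather32(memory, addresses):
    """Rd[i] = [Rbase[i]]: read one word per lane from its own address."""
    return [int.from_bytes(memory[addr:addr + WORD_BYTES], "little")
            for addr in addresses]


def scatter32(memory, addresses, values):
    """[Rbase[i]] = Rs[i]: write one word per lane to its own address."""
    for addr, value in zip(addresses, values):
        memory[addr:addr + WORD_BYTES] = (
            (value & 0xFFFFFFFF).to_bytes(WORD_BYTES, "little"))


if __name__ == "__main__":
    memory = bytearray(64)
    scatter32(memory, [0, 8, 32], [1, 2, 3])
    print(gather32(memory, [32, 8, 0]))  # [3, 2, 1]
```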