FADD R0, R1, R2;     // R0 = R1 + R2 with round to NEAREST
FADD.RZ R0, R1, R2;  // R0 = R1 + R2 with round to ZERO
FADD.RP R0, R1, R2;  // R0 = R1 + R2 with round to POSITIVE (+Infinity)
FADD.RM R0, R1, R2;  // R0 = R1 + R2 with round to MINUS (-Infinity)
FMUL R0, R1, R2;     // R0 = R1 * R2 with round to NEAREST
FMUL.RZ R0, R1, R2;  // R0 = R1 * R2 with round to ZERO
FMUL.RP R0, R1, R2;  // R0 = R1 * R2 with round to POSITIVE (+Infinity)
FMUL.RM R0, R1, R2;  // R0 = R1 * R2 with round to MINUS (-Infinity)

DADD R0, R2, R4;     // R0.64 = R2.64 + R4.64 with round to NEAREST
DADD.RZ R0, R2, R4;  // R0.64 = R2.64 + R4.64 with round to ZERO
DADD.RP R0, R2, R4;  // R0.64 = R2.64 + R4.64 with round to POSITIVE (+Infinity)
DADD.RM R0, R2, R4;  // R0.64 = R2.64 + R4.64 with round to MINUS (-Infinity)
DMUL R0, R2, R4;     // R0.64 = R2.64 * R4.64 with round to NEAREST
DMUL.RZ R0, R2, R4;  // R0.64 = R2.64 * R4.64 with round to ZERO
DMUL.RP R0, R2, R4;  // R0.64 = R2.64 * R4.64 with round to POSITIVE (+Infinity)
DMUL.RM R0, R2, R4;  // R0.64 = R2.64 * R4.64 with round to MINUS (-Infinity)
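To make the four rounding modifiers concrete, here is a minimal host-side C++ sketch that reproduces them with the standard `<cfenv>` rounding modes. It is an illustration only, not GPU code: the helper name `add_with_rounding` is hypothetical, and the sketch assumes a host CPU and compiler that honor `fesetround` at run time (the `volatile` variables are there to stop the compiler from folding or moving the addition).

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: adds a and b under the given <cfenv> rounding mode.
// Rough correspondence with the modifiers in the listing above:
//   FE_TONEAREST  ~ FADD      FE_TOWARDZERO ~ FADD.RZ
//   FE_UPWARD     ~ FADD.RP   FE_DOWNWARD   ~ FADD.RM
float add_with_rounding(float a, float b, int mode) {
    const int saved = std::fegetround();
    const int rc = std::fesetround(mode);
    assert(rc == 0 && "rounding mode not supported on this host");
    (void)rc;

    // volatile keeps the addition at run time and between the two
    // fesetround() calls, so it is rounded according to `mode`.
    volatile float va = a;
    volatile float vb = b;
    volatile float sum = va + vb;

    std::fesetround(saved);  // restore the caller's rounding mode
    return sum;
}
```

For example, the exact value of `1.0f + 1.0e-10f` lies strictly between `1.0f` and the next float above it, so only `FE_UPWARD` should return a result greater than `1.0f`; with both operands negated, only `FE_DOWNWARD` should move away from `-1.0f`. In CUDA C++ the same modes are available per operation through intrinsics such as `__fadd_rn`, `__fadd_rz`, `__fadd_ru`, and `__fadd_rd`.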
There are also modifiers for handling denormal values, such as FTZ (flush to zero), which ignores denormals by treating them as zero:
FMUL.FTZ R3, R4, R5;
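The effect of flushing can be emulated on the host. The sketch below is a hedged illustration, not the hardware implementation: the helper names `flush_to_zero` and `mul_ftz` are hypothetical, and it assumes that flushing applies to both the inputs and the result and preserves the sign of the flushed value.

```cpp
#include <cmath>

// Hypothetical helper: replaces a denormal (subnormal) value with a zero
// of the same sign and leaves every other value unchanged.
float flush_to_zero(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Host-side emulation of a multiply with flush-to-zero semantics.
// Assumption: denormal inputs and a denormal result are both flushed.
float mul_ftz(float a, float b) {
    const float product = flush_to_zero(a) * flush_to_zero(b);
    return flush_to_zero(product);
}
```

With these definitions, `mul_ftz(1e-20f, 1e-20f)` returns zero because the product (about 1e-40) is a denormal float, whereas a plain multiplication keeps a small nonzero value. A denormal input such as `1e-40f` is flushed before the multiply, so `mul_ftz(1e-40f, 1e30f)` also returns zero.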
Fused Multiply-Add (FMA)
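A multiply-add can be implemented with two instructions: an FMUL, as described above, followed by an FADD that accumulates the product. The IEEE 754-2008 standard also defines a fused multiply-add (FMA) operation, which computes d = a × b + c in a single instruction. The intermediate result is kept at infinite precision and rounding happens only once, at the final step, so FMA is more accurate than the FMUL + FADD pair. An FMA instruction also has the same latency as FMUL or FADD, so a single FMA delivers twice the floating-point throughput of an FMUL or an FADD. The single-precision instruction is FFMA (Float Fused Multiply Add) and the double-precision instruction is DFMA (Double Fused Multiply Add). Their rounding modifiers are as follows: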
FFMA R0, R1, R2, R3;     // R0 = R1 * R2 + R3 with round to NEAREST
FFMA.RZ R0, R1, R2, R3;  // R0 = R1 * R2 + R3 with round to ZERO
FFMA.RP R0, R1, R2, R3;  // R0 = R1 * R2 + R3 with round to POSITIVE
FFMA.RM R0, R1, R2, R3;  // R0 = R1 * R2 + R3 with round to MINUS

DFMA R0, R2, R4, R6;     // R0.64 = R2.64 * R4.64 + R6.64 with round to NEAREST
DFMA.RZ R0, R2, R4, R6;  // R0.64 = R2.64 * R4.64 + R6.64 with round to ZERO
DFMA.RP R0, R2, R4, R6;  // R0.64 = R2.64 * R4.64 + R6.64 with round to POSITIVE
DFMA.RM R0, R2, R4, R6;  // R0.64 = R2.64 * R4.64 + R6.64 with round to MINUS
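The accuracy difference between a fused multiply-add and a separate multiply followed by an add can be observed on the host with `std::fma`. The following is a minimal sketch: the function names `mul_then_add` and `fused_mul_add` are hypothetical, and the `volatile` intermediate is there only to force the product to be rounded to float, so that the compiler cannot contract the two operations into an FMA on its own.

```cpp
#include <cmath>

// Two roundings, like FMUL followed by FADD: the product is rounded to
// float first, then the sum is rounded again.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;  // forces the rounded product to exist
    return product + c;
}

// One rounding, like FFMA: a * b + c is evaluated exactly and rounded once.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

Take a = 1 + 2^-12 and c = -(1 + 2^-11). The exact square of a is 1 + 2^-11 + 2^-24, which falls halfway between two adjacent floats and rounds to 1 + 2^-11 under round to nearest even. As a result, `mul_then_add(a, a, c)` returns 0, while `fused_mul_add(a, a, c)` keeps the low-order term and returns 2^-24.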
Inline Negation and Absolute Value
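NVIDIA's instruction set architecture has no dedicated subtraction instruction. Instead, subtraction is implemented by negating an operand inline within FADD: d = a - b is written as d = a + (-b), and the negation of b is performed by the FADD instruction itself: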
FADD R0, R1, -R2;     // R0 = R1 - R2 with round to NEAREST
FADD.RZ R0, R1, -R2;  // R0 = R1 - R2 with round to ZERO
FADD.RP R0, R1, -R2;  // R0 = R1 - R2 with round to POSITIVE (+Infinity)
FADD.RM R0, R1, -R2;  // R0 = R1 - R2 with round to MINUS (-Infinity)
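Negating an IEEE 754 value is exact, which is why folding it into the add loses nothing compared with a real subtraction. The host-side sketch below models this under the assumption that inline negation amounts to flipping the sign bit of the operand. The helper names `float_bits`, `negate_by_sign_bit`, and `sub_via_add` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw IEEE 754 bit pattern of a float.
std::uint32_t float_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Negates a float by flipping only its sign bit (bit 31), which is the
// assumed model for the "-R2" operand form.
float negate_by_sign_bit(float x) {
    const std::uint32_t bits = float_bits(x) ^ 0x80000000u;
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}

// Subtraction expressed as an addition with a negated operand: a + (-b).
float sub_via_add(float a, float b) {
    return a + negate_by_sign_bit(b);
}
```

Since a - b and a + (-b) have the same exact mathematical value, they round to the same float, so `sub_via_add(a, b)` should match `a - b` bit for bit.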