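Specification for the IME extension instructions implemented by K1

K1 implements the IME extension instructions. How can I look up the format of each instruction in the specification?

For example, the opcode, funct3, and funct7 values of the vmadot instruction.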

https://github.com/space-mit/riscv-ime-extension-spec

The information in this repository is not very complete.

You can download the PDF from the GitHub release "Release The first version of SpacemiT IME extension spec." in space-mit/riscv-ime-extension-spec; the instruction formats are on page 19.
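
Once the encoding table on that page is at hand, the fields asked about can be pulled out of a raw instruction word. Below is a minimal sketch in C, assuming the base RISC-V R-type field positions (opcode in bits 6:0, funct3 in bits 14:12, funct7 in bits 31:25). Standard vector arithmetic instructions split the top seven bits into funct6 and vm instead, so the layout the IME instructions actually use has to be confirmed against the PDF. The names `rtype_fields`, `bits`, and `decode_rtype` are hypothetical helpers, not part of any SpacemiT tooling.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 32-bit RISC-V instruction word in the base R-type layout. */
typedef struct {
    uint32_t opcode; /* bits  6:0  */
    uint32_t rd;     /* bits 11:7  */
    uint32_t funct3; /* bits 14:12 */
    uint32_t rs1;    /* bits 19:15 */
    uint32_t rs2;    /* bits 24:20 */
    uint32_t funct7; /* bits 31:25 */
} rtype_fields;

/* Extract `width` bits of `word`, starting at bit position `lo`. */
static uint32_t bits(uint32_t word, unsigned lo, unsigned width)
{
    assert(width > 0 && width < 32 && lo + width <= 32);
    return (word >> lo) & ((1u << width) - 1u);
}

/* Split an instruction word into its R-type fields. */
static rtype_fields decode_rtype(uint32_t word)
{
    rtype_fields f;
    f.opcode = bits(word, 0, 7);
    f.rd     = bits(word, 7, 5);
    f.funct3 = bits(word, 12, 3);
    f.rs1    = bits(word, 15, 5);
    f.rs2    = bits(word, 20, 5);
    f.funct7 = bits(word, 25, 7);
    return f;
}
```

For example, `decode_rtype(0x003100B3)` (the encoding of `add x1, x2, x3`) yields opcode 0x33, rd 1, rs1 2, rs2 3, funct3 0, and funct7 0. A vmadot instruction word would have to be taken from the spec or from a disassembly of the demo binary.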

Update: the repository has moved to spacemit-com/riscv-ime-extension-spec: Following the RISC-V IME extension standard, and reusing Vector register resources, these instructions can bring more than a tenfold performance improvement to AI applications at a very small hardware cost

The Issue

The spec explicitly states:

NOTE: This extension instructions only support cases where LMUL is less than or equal to 1.

However, the vmadot-gemm-demo.c uses LMUL=2 for the accumulator:

vsetvli t0, zero, e32, m2  // LMUL=2 - VIOLATES SPEC!

This violates the IME specification’s LMUL ≤ 1 constraint.

Possible Explanations

  1. Implementation-specific extension: The hardware may support LMUL=2 for accumulators despite the spec
  2. Demo code bug: Might work accidentally but isn’t spec-compliant
  3. Different interpretation: The accumulator storage might use a different mechanism
  4. Spec vs Implementation gap: The implementation predates or differs from the spec

Verification

The demo executable runs successfully on the native Spacemit RISC-V machine, producing correct results that match the reference implementation. This suggests:

  • The hardware does support LMUL=2 for accumulators
  • The implementation may be more permissive than the spec
  • Or the LMUL=2 is handled differently than standard RVV semantics

Test Results

LMUL=2 Version (Original Demo)

The vmadot-gemm-4x8x4 executable ran successfully on native Spacemit hardware:

Test successful. CRef equal to C.

Both reference (nested loops) and vmadot-accelerated implementations produced identical results, confirming correct operation of the 4×4×8 MAC unit with VLEN=256, SEW=8.
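
For readers who have not opened the demo, the "nested loops" reference can be pictured roughly as below. This is a sketch, not the demo's actual code: it assumes A is a 4×8 int8 matrix and B an 8×4 int8 matrix, both row-major, with int32 accumulation into a 4×4 C (the demo may store B in a different layout), and `gemm_ref_i8` and `matrices_equal` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { M = 4, K = 8, N = 4 }; /* C[M][N] = A[M][K] * B[K][N] */

/* Reference int8 GEMM with int32 accumulation, row-major operands. */
static void gemm_ref_i8(const int8_t *a, const int8_t *b, int32_t *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (int m = 0; m < M; m++) {
        for (int n = 0; n < N; n++) {
            int32_t acc = 0;
            for (int k = 0; k < K; k++) {
                acc += (int32_t)a[m * K + k] * (int32_t)b[k * N + n];
            }
            c[m * N + n] = acc;
        }
    }
}

/* Return 1 if the two int32 buffers hold the same `count` values. */
static int matrices_equal(const int32_t *x, const int32_t *y, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (x[i] != y[i]) {
            return 0;
        }
    }
    return 1;
}
```

A check in the spirit of "CRef equal to C" would then call `matrices_equal(cref, c, M * N)` on the reference output and the accelerated output.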

LMUL=1 Version (Spec-Compliant)

A modified version using LMUL=1 was created (vmadot-gemm-demo-lmul1.c) and tested:

Key Changes:

  • Changed vsetvli t0, zero, e32, m2 to vsetvli t0, zero, e32, m1
  • Cleared both v28 and v29 registers separately:
    vxor.vv v28, v28, v28
    vxor.vv v29, v29, v29
    
  • Stored results from both registers:
    vse32.v v28, (%[C])
    addi %[C], %[C], 32
    vse32.v v29, (%[C])
    

Test Results:

Test successful. CRef equal to C.

Comparison:
Both LMUL=2 and LMUL=1 versions produce identical results. The LMUL=1 version:

  • Complies with the IME specification (LMUL ≤ 1)
  • Correctly handles the 16 int32 element result matrix using two separate registers
  • Produces the same output as the LMUL=2 version
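
The register arithmetic behind the two variants can be checked with a few lines of C. This sketch uses the standard RVV relation VLMAX = LMUL × VLEN / SEW for integer LMUL and the VLEN=256 figure quoted above; the helper names `vlmax`, `regs_needed`, and `reg_bytes` are made up for illustration.

```c
#include <assert.h>

/* Elements per register group: VLMAX = LMUL * VLEN / SEW (integer LMUL). */
static unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    assert(sew_bits > 0 && lmul > 0 && vlen_bits % sew_bits == 0);
    return lmul * (vlen_bits / sew_bits);
}

/* Number of single (LMUL=1) registers needed to hold `elems` elements. */
static unsigned regs_needed(unsigned elems, unsigned vlen_bits, unsigned sew_bits)
{
    unsigned per_reg = vlmax(vlen_bits, sew_bits, 1);
    return (elems + per_reg - 1) / per_reg; /* round up */
}

/* Size of one vector register in bytes. */
static unsigned reg_bytes(unsigned vlen_bits)
{
    return vlen_bits / 8;
}
```

With VLEN=256 and SEW=32, `vlmax(256, 32, 1)` is 8 and `vlmax(256, 32, 2)` is 16, so the 16 int32 results need `regs_needed(16, 256, 32)` = 2 registers (v28 and v29). `reg_bytes(256)` is 32, which matches the `addi %[C], %[C], 32` between the two stores.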

Conclusion on LMUL

The native Spacemit RISC-V machine supports both LMUL=1 and LMUL=2 approaches:

  1. LMUL=1 (Spec-Compliant): Uses two separate registers (v28, v29) for the accumulator
  2. LMUL=2 (Demo Code): Uses register grouping (v28-v29 as one logical register)

Both approaches work correctly and produce identical results. The LMUL=1 approach is the recommended method for spec compliance.
