Hi, all
Sorry for my English, unfortunately I still don’t know Chinese.
I’ve got my SpacemiT K3 Pico ITX board.
I ran STREAM on it and measured lower memory bandwidth than I expected.
OS: preinstalled default Bianbu 4.0.1
Linux kernel: preinstalled default Linux k3 6.18.3-generic #1.0.0~rc2.4 SMP PREEMPT_DYNAMIC Thu Apr 16 11:50:58 CST 2026 riscv64 GNU/Linux
OpenSBI: preinstalled with OS
Here is my reproducer.
Build with GCC 15.2:
gcc -O3 -march=rv64gc -fno-pic -fno-pie -static -ffast-math -fno-finite-math-only -mcmodel=medany
Run, pinned to CPU 1:
taskset -c 1 ./stream-36000000
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 36000000 (elements), Offset = 0 (elements)
Memory per array = 274.7 MiB (= 0.3 GiB).
Total memory required = 824.0 MiB (= 0.8 GiB).
Each kernel will be executed 100 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 36045 microseconds.
(= 36045 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           17931.1     0.033547     0.032123     0.037976
Scale:          12767.1     0.046390     0.045116     0.055984
Add:            12323.1     0.070872     0.070112     0.078398
Triad:          12263.0     0.070969     0.070456     0.075608
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
I get about 12000 MB/s on Triad for several array sizes; the size shown here is enough to reproduce the result.
I would expect much more on Triad.
Is this bandwidth expected?
If higher bandwidth is expected, what am I doing wrong, and how can I get better results?