Low bandwidth on STREAM

Hi all,
Sorry for my English; unfortunately, I still don’t know Chinese.

I have a SpacemiT K3 Pico ITX board.
I ran STREAM on it and the measured bandwidth looks low.

OS: preinstalled default Bianbu 4.0.1
Linux kernel: preinstalled default Linux k3 6.18.3-generic #1.0.0~rc2.4 SMP PREEMPT_DYNAMIC Thu Apr 16 11:50:58 CST 2026 riscv64 GNU/Linux
OpenSBI: preinstalled with OS

My reproducer:

How to build (compiler is gcc 15.2):
gcc -O3 -march=rv64gc -fno-pic -fno-pie -static -ffast-math -fno-finite-math-only -mcmodel=medany

How to run:
taskset -c 1 ./stream-36000000

-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 36000000 (elements), Offset = 0 (elements)
Memory per array = 274.7 MiB (= 0.3 GiB).
Total memory required = 824.0 MiB (= 0.8 GiB).
Each kernel will be executed 100 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 36045 microseconds.
   (= 36045 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           17931.1     0.033547     0.032123     0.037976
Scale:          12767.1     0.046390     0.045116     0.055984
Add:            12323.1     0.070872     0.070112     0.078398
Triad:          12263.0     0.070969     0.070456     0.075608
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
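For reference, the "Best Rate" column can be re-derived from the "Min time" column, which makes it clear what the ~12000 MB/s figure counts. The sketch below assumes the standard STREAM accounting: only the bytes each kernel explicitly reads and writes are counted (two arrays for Copy and Scale, three for Add and Triad), and 1 MB is 10^6 bytes. The names in the code are my own, not taken from stream.c.

```python
# Number of arrays each STREAM kernel touches per iteration:
# Copy and Scale read one array and write one; Add and Triad read two
# and write one.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

ARRAY_SIZE = 36_000_000   # elements, from the run above
BYTES_PER_ELEMENT = 8     # "This system uses 8 bytes per array element."

# "Min time" column from the run above, in seconds.
MIN_TIME = {
    "Copy": 0.032123,
    "Scale": 0.045116,
    "Add": 0.070112,
    "Triad": 0.070456,
}


def best_rate_mb_s(kernel):
    """Bandwidth in MB/s (1 MB = 1e6 bytes) from the best (minimum) time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_SIZE
    return 1.0e-6 * bytes_moved / MIN_TIME[kernel]


for kernel in MIN_TIME:
    print(f"{kernel:6s} {best_rate_mb_s(kernel):10.1f} MB/s")
```

The results match the reported column to within the rounding of the printed times, so the numbers above are internally consistent and describe a single core (the run is pinned with taskset -c 1).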

I get ~12000 MB/s on Triad for several array sizes; the one shown here is enough to reproduce the result.

I would expect much more on Triad.
Is this bandwidth expected?
If higher bandwidth is expected, what am I doing wrong, and how can I get better results?