The running Qwen2.5:3b model doesn't have the relevant information about SpacemiT.

I'm running the Qwen2.5:3b model on two boards: a Milk-V Jupiter and a Banana Pi BPI-F3.
It's strange that the model running on these boards, despite using spacemit-ollama-toolkit, doesn't know about SpacemiT, the X60 and X100 cores they're developing, or the processors based on them, such as the K1 and M1. My question: am I correct in understanding that this model was trained by Alibaba Cloud, and that Alibaba Cloud apparently doesn't collaborate with SpacemiT? Otherwise, the Qwen2.5:3b model would contain the necessary information.

Qwen is licensed under the Apache 2.0 open-source license, allowing for free commercial use without requiring specific authorization. As an open-source model, it is agnostic to the underlying hardware chips used for its execution and requires no specific attribution or labeling.

My apologies for the delay in responding. For reasons I won't go into, I was only able to read your message yesterday.

Yes, I read the Qwen license. My question wasn't about licensing. I had a problem building and running the model: it didn't understand which processor I was asking about.
After feeding it information from the official SpacemiT website, the model finally began to understand what I was asking it.
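For anyone hitting the same issue: one way to make that context persist across sessions is to bake it into a custom Ollama model via a Modelfile. This is a minimal sketch; the model name qwen2.5-spacemit and the wording of the system prompt are my own, and the facts in it should be checked against the official SpacemiT site:

# Modelfile baking SpacemiT background into the system prompt
cat > Modelfile <<'EOF'
FROM qwen2.5:3b
SYSTEM """
SpacemiT is a RISC-V CPU vendor. Its X60 core implements the RVA22 profile
with RVV 1.0; the K1 is an octa-core SoC built on X60 cores and is used in
the Banana Pi BPI-F3 and Milk-V Jupiter boards.
"""
EOF
ollama create qwen2.5-spacemit -f Modelfile    # register the derived model
ollama run qwen2.5-spacemit "What is the SpacemiT K1?"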

Today, I finally managed to build TensorFlow Lite from source code in Bianbu-2.2.1 on a BPI-F3 board, and a week earlier, I built PyTorch on a Jupiter board. However, I’m still not entirely sure how to use these tools to train this model.
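As a quick sanity check that both builds are usable, something like this should work (my own commands, not from any guide; the tflite_runtime import assumes a Python wheel was built and installed, which may differ in your setup):

# confirm the PyTorch build imports and reports its version
python3 -c "import torch; print(torch.__version__)"
# confirm the TensorFlow Lite runtime wheel, if one was built
python3 -c "import tflite_runtime.interpreter as tfl; print('tflite runtime OK')"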

Before training the qwen2.5-3b-q4_k_m.gguf model using TensorFlow Lite, it must first be converted from GGUF to TFLite:

tflite_convert --input_format=gguf --output_format=tflite --input_file=qwen2.5-3b-q4_k_m.gguf --output_file=qwen2.5-3b-q4_k_m.tflite
However, when I attempted the conversion, it turned out that TensorFlow Lite requires Bazel >=6.5.0 to be installed. To get it, I repeatedly tried downloading and building various versions of Bazel.
I've currently settled on the following approach (a quick check of the resulting binary is sketched after the list):

  1. wget https://github.com/bazelbuild/bazel/releases/download/7.1.0/bazel-7.1.0-dist.zip

  2. unzip bazel-7.1.0-dist.zip -d ./bazel-7.1.0-dist

  3. grep -n "rules_python" ~/Build/bazel-7.1.0-dist/MODULE.bazel
    30:# #bazel_dep(name = "rules_python", version = "0.26.0")
    245:# python = use_extension("@rules_python//python/extensions:python.bzl", "python")
    248:# pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")

  4. ./compile.sh --without_python
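For anyone following the same route: a dist-zip bootstrap build drops the binary into output/, so it can be checked and put on PATH roughly like this (the install location is my own choice):

./output/bazel --version               # the bootstrap build places the binary in output/
sudo cp output/bazel /usr/local/bin/   # optional: make it visible to the TFLite build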

P.S. Could you somehow add Bazel >=6.5.0 to Bianbu-2.2.1 and Debian-13?

Well, either you need something other than the full PyTorch, which still has to be buildable on and for RISC-V; or TensorFlow, which cannot be compiled on RISC-V at all; or TensorFlow Lite, which can be compiled on RISC-V but requires tooling that is likewise unavailable on RISC-V.
Why am I saying this? In theory, RISC-V shouldn't depend on x86, just as x86 doesn't depend on RISC-V. If working fully on RISC-V requires software that only runs on x86, that suggests RISC-V is not yet a fully independent platform.

I have a Bazel build for K3.

I don't have a K3, only two boards with the K1: a Banana Pi BPI-F3 and a Milk-V Jupiter. Bianbu-2.2.1 ships bazel-4, nothing >=6.5.0, and the latter refuses to build on the K1.

I was advised to use ONNX Runtime instead of TensorFlow Lite. It can also convert full unquantized models and run trained models on embedded systems. What do you think about this? Is it the best option for running on riscv64 rva22?
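If I understand the workflow correctly, the export step would look something like this (a sketch of what I'd try; it assumes Hugging Face's optimum exporter installs and runs on riscv64, which I haven't verified):

pip install "optimum[exporters]"          # exporter tooling (assumption: builds on riscv64)
# export the full unquantized model from Hugging Face to ONNX
optimum-cli export onnx --model Qwen/Qwen2.5-3B qwen2.5-3b-onnx/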

Yes. Please refer to this document.


Thank you! Yes, I came across this documentation and, frankly, didn't quite understand what it was about. Sorry to bother you, but I'm a complete newbie to AI; I've never had to work in this area before. I'm just starting to dig into the intricacies of AI software, especially on riscv64 rva22 rather than x86. It's one thing to run ollama qwen2.5:3B, and quite another to download a full, unquantized model, train it, quantize it, and so on. It's a complete black box for me. But the topic intrigued me, especially since I built my own qwen2.5:3B-f16 version while learning how to use it. The model runs and is currently running on a Milk-V Jupiter board.

At some point, though, I realized that this is a pre-trained model from Alibaba Cloud, and it doesn't include information about SpacemiT and the processors it has developed (which is why I created this thread). I wanted to train the model, but ran into a problem: although it accepts new data, it loses everything it was given once the session ends. Now I'm trying to understand the whole pipeline, specifically on RISC-V without resorting to x86/ARM: downloading, training, quantizing, and running models.
Why wouldn't I use ARM for this? I'm limited by the performance of my current board: it has a dual-core ARM Cortex-A53 and 2 GB of RAM. Why not x86? I have a board with 32 GB of DDR4 and plan to increase the total memory to 64 GB, but that computer is currently far away from me. Since I travel between two cities without a car, I can't take an x86 machine with me every time, especially a desktop PC, so using x86 has become a problem. Instead I take a Banana Pi BPI-F3 and a Milk-V Jupiter with me; they don't take up much space. These two boards have become my main machines: I do most of my work on them, watch videos, listen to music, and so on. Literally everything. Over the time I've used them, I've discovered plenty of pros and cons of these boards. In the future I plan to abandon ARM and x86, i.e. I no longer plan to purchase those architectures.
Of course, someone might ask, "Hey, why don't you just buy the SpacemiT MUSE Book RISC-V laptop?" That would have been a good option for me, but when I bought the BPI-F3, the MUSE Book was out of stock, so I simply couldn't get hold of a RISC-V laptop.

There’s another issue that raised questions for me, but I quickly forgot about it. Now I’ve encountered it again.

When installing/reinstalling $ sudo apt install spacemit-ollama-toolkit, llama-cli is installed in addition to ollama, but when I try to run it:
$ llama-cli --version
llama-cli: error while loading shared libraries: libggml-cpu.so: cannot open shared object file: No such file or directory
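In case it helps others who hit this before a fixed package lands: if the library actually shipped somewhere on disk, the loader can be pointed at it manually (an untested sketch on my part; the path below is a guess):

find /usr -name 'libggml*' 2>/dev/null                     # locate the library, if installed at all
export LD_LIBRARY_PATH=/usr/lib/ollama:$LD_LIBRARY_PATH    # guessed location; use the find result
llama-cli --version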

The K1 platform is designed as an edge (end-device) platform, so it’s not capable of handling the heavy workloads required for LLM training or quantization. It’s mainly suitable for model deployment and inference, especially with already optimized or quantized models.

Tasks like training, fine-tuning, or quantizing large language models require server-grade resources (high RAM, strong CPU/GPU, etc.), which are far beyond what current RISC-V edge boards can provide.

If portability is a concern and carrying an x86 machine is inconvenient, a practical solution would be to rent a cloud server for training and quantization. Then you can deploy the processed model back onto your RISC-V devices for inference.

This package is intended to provide Ollama. The inclusion of llama-cli was a packaging mistake. Thanks for pointing it out!

No problem. You’ve probably already realized that by testing your platform, I’m looking for flaws, errors, and so on. And I’ll say this: I was a bit skeptical from the start, but now I like RISC-V and am starting to understand a little about the direction the SpacemiT platform is heading.

I understand that. That’s why I’m not planning on completely abandoning the x86 architecture just yet. But I don’t plan on buying any new x86 chips.

That’s why I:

  1. Ordered another 32 GB of ECC RAM for my dual-processor board, which has 24 cores and 48 threads, bringing it to 64 GB of ECC RAM. The board supports a maximum of 256 GB.
  2. I've been exploring different models and settled on Qwen, specifically Qwen2.5:3B with Q4_K_M quantization. I'm also testing the F16 version, which weighs 6.2 GB.
  3. I expect that training the full, unquantized model and then quantizing it will require about 60 GB of RAM for F16, and less for Q4_K_M, which I'm currently considering as my primary option. I assume this amount of RAM will be sufficient; as for disk space, I have a 1 TB NVMe. (A rough check of these numbers is below.)
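As a back-of-envelope check on those figures (my own arithmetic, not from any documentation): Qwen2.5:3B has roughly 3.1B parameters, so F16 weights come to about 3.1B x 2 bytes ≈ 6.2 GB, matching the file size above. Full fine-tuning with an Adam-style optimizer is commonly estimated at 12-16 bytes per parameter for weights, gradients, and optimizer state, i.e. roughly 40-50 GB for a 3B model, plus activations that scale with batch size and sequence length. So 64 GB looks tight but plausible, while quantization alone (e.g. producing Q4_K_M with llama.cpp's quantize tool) needs far less, on the order of the model size itself.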

No, I don't plan to use cloud services for training. I understand it's complicated, but I want to work through all the intricacies of the process myself and identify all the pros and cons. Renting cloud services might make sense for some small companies, but I'm trying to train the model myself, so that option isn't suitable for me. I consider portability a secondary concern in this case, and the primary reasons aren't related to RISC-V. The important thing is that I've been using this architecture constantly, 24/7, for the past few months.