x86_64-binfmt-P: QEMU internal SIGILL

Emulating a single Docker/Podman container with a non-RISC-V architecture is unstable.

screenshot

Running several containers of foreign architectures at the same time is impossible: every running container whose architecture differs from the host fails with a similar error.

I also run such containers on x86 and arm64 Linux hosts and have not encountered a similar problem there.

This issue might be related to the process in QEMU of translating x86 SIMD instructions into RISC-V Vector instructions. I’m analyzing it.


I didn’t wait for an answer on this forum, so I tried to solve the problem myself.
First, I found, downloaded, and installed qemu-user-static_9.0.2+ds-4ubuntu5_riscv64.deb … but no, the same error.
Now I’m building version 10.0.6 from source with git (the same version as in Debian 13, where everything works fine). I still don’t understand how to set all the necessary paths in ./configure: the install does not follow what I specify. It only creates bin and share, while libexec and lib do not get created.

I think I’ve fixed this issue. It stems from a problem in QEMU’s common code. Based on my analysis, the vsetvl instruction is emitted with a hardcoded t6 register, which overwrites other data and causes the error. Perhaps you could apply the following patch to the QEMU source code and compile it to test.
I can’t upload a pre-built qemu-x86_64 binary on this forum. If you need it, please leave your email address, and I’ll send it to you directly.

Maybe you can try the following steps to build QEMU from source on the Spacemit-K1 board.

mkdir build-qemu-x86
cd build-qemu-x86
../configure --target-list=x86_64-linux-user --static --disable-system
make

My patch to fix the issue.

diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 31b9f7d87a..26acc69064 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -3022,10 +3022,10 @@ static void probe_frac_lmul_1(TCGType type, MemOp vsew)
         p->vset_insn = encode_vseti(OPC_VSETIVLI, TCG_REG_ZERO, avl, vtype);
     } else if (lmul_eq_avl) {
         /* rd != 0 and rs1 == 0 uses vlmax */
-        p->vset_insn = encode_vset(OPC_VSETVLI, TCG_REG_TMP0, TCG_REG_ZERO, vtype);
+        p->vset_insn = encode_vset(OPC_VSETVLI, TCG_REG_TMP3, TCG_REG_ZERO, vtype);
     } else {
-        p->movi_insn = encode_i(OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, avl);
-        p->vset_insn = encode_vset(OPC_VSETVLI, TCG_REG_ZERO, TCG_REG_TMP0, vtype);
+        p->movi_insn = encode_i(OPC_ADDI, TCG_REG_TMP3, TCG_REG_ZERO, avl);
+        p->vset_insn = encode_vset(OPC_VSETVLI, TCG_REG_ZERO, TCG_REG_TMP3, vtype);
     }
 }

@@ -3070,6 +3070,8 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_TP);

     if (cpuinfo & CPUINFO_ZVE64X) {
+        tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP3);
+
         switch (riscv_lg2_vlenb) {
         case TCG_TYPE_V64:
             tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 6dc77d944b..0f2dced8e2 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -55,6 +55,7 @@ typedef enum {
     TCG_REG_TMP0       = TCG_REG_T6,
     TCG_REG_TMP1       = TCG_REG_T5,
     TCG_REG_TMP2       = TCG_REG_T4,
+    TCG_REG_TMP3       = TCG_REG_T3,
 } TCGReg;

 #define TCG_REG_ZERO  TCG_REG_ZERO
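
The patch reserves TCG_REG_TMP3 only inside the `cpuinfo & CPUINFO_ZVE64X` branch, so it should only matter on hosts that report vector support. Below is a minimal sketch for checking what the kernel advertises, assuming the usual `isa` line in /proc/cpuinfo on RISC-V Linux. The function name isa_has_vector is hypothetical, and this only approximates QEMU's own probe, which may use a different mechanism.

```sh
#!/bin/sh
# Return success if a RISC-V ISA string advertises V or a Zve64* subset.
# Hypothetical helper for illustration; QEMU's own detection may differ.
isa_has_vector() {
    isa_string=$1
    base=${isa_string%%_*}      # single-letter part, e.g. rv64imafdcv
    letters=${base#rv??}        # drop the rv32/rv64 prefix
    case "$letters" in
        *v*) return 0 ;;
    esac
    # Multi-letter extensions are separated by underscores.
    case "_${isa_string#*_}_" in
        *_zve64x_*|*_zve64f_*|*_zve64d_*) return 0 ;;
    esac
    return 1
}

# Read the first "isa" line; this is empty on non-RISC-V hosts.
host_isa=$(awk -F': *' '/^isa/ { print $2; exit }' /proc/cpuinfo 2>/dev/null)
if isa_has_vector "$host_isa"; then
    echo "host reports vector support: TCG vector code paths may be active"
else
    echo "no vector support reported in /proc/cpuinfo"
fi
```

If the host reports no vector support, the vsetvl code path is probably not involved and the failure likely has another cause.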

Applying the patch to tcg-target.c.inc fails with an error.
The tcg-target.h file was patched successfully.


rej.zip (650 Bytes)

It might be due to differences in our base versions, but the change is very small—you can modify it directly based on the patch file.

Attached is my pre-built qemu-x86_64 binary—you can give it a try.

qemu-x86.zip (5.7 MB)


I understand you compiled version 10.1.0, although the latest version is currently 10.1.3. Meanwhile, I tried patching 8.2.2, unpacked from source: https://forum.spacemit.com/uploads/short-url/cmJ4nzrp9YZbuvHm67ygnPdpFeh.zip . That is the version I was referring to when I said the first file could not be patched. I haven’t tested the patch on 10.1.0 yet.

I’ve now compiled version 10.1.3 with your patch applied and added it to the environment:
$ export PATH="/opt/qemu/qemu-10.1.3/qemu-user-static/bin:/opt/qemu/qemu-10.1.3/qemu-user-static/share/qemu:$PATH"
where all binaries are renamed with the -static suffix.
I registered the required architectures, plus qemu-arm, under /proc/sys/fs/binfmt_misc. After rebooting, the qemu-x86_64 and qemu-aarch64 entries showed up there, along with qemu-arm, just in case. After that, the docker/podman containers started loading… but for some reason, the error remained the same. :hot_face:

The config was like this:
./configure --prefix=/opt/qemu/qemu-10.1.3/qemu-user-static --mandir=/opt/qemu/qemu-10.1.3/qemu-user-static/share/man --libdir=/opt/qemu/qemu-10.1.3/qemu-user-static/lib --libexecdir=/opt/qemu/qemu-10.1.3/qemu-user-static/libexec --datadir=/opt/qemu/qemu-10.1.3/qemu-user-static/share --static --disable-system --enable-linux-user --container-engine=auto

I tried reproducing the issue using a Podman container and ran into the following issue, which appears to be identical to yours. I will continue working on resolving it.

Previously, I was only running x86 programs directly on the K1 board using QEMU, without using a container for testing.

root@fc52b00f0cbc:/mnt/# apt-get install libc-bin
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Suggested packages:
  manpages
The following packages will be upgraded:
  libc-bin
1 upgraded, 0 newly installed, 0 to remove and 15 not upgraded.
1 not fully installed or removed.
Need to get 706 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 libc-bin amd64 2.35-0ubuntu3.11 [706 kB]
Fetched 706 kB in 9s (74.5 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 5156 files and directories currently installed.)
Preparing to unpack .../libc-bin_2.35-0ubuntu3.11_amd64.deb ...
Unpacking libc-bin (2.35-0ubuntu3.11) over (2.35-0ubuntu3.10) ...
Setting up libc-bin (2.35-0ubuntu3.11) ...
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)
dpkg: error processing package libc-bin (--configure):
 installed libc-bin package post-installation script subprocess returned error exit status 139
Errors were encountered while processing:
 libc-bin
E: Sub-process /usr/bin/dpkg returned an error code (1)

Got it. Yes, the problem I described arises when working with containers with different architectures than the host.

Here are screenshots of containers for the riscv64 and x86 platforms running on a host with a Cortex-A53 CPU.

screenshots


  • P.S. I forgot to mention: both the host with the Cortex-A53 CPU and the containers are running Ubuntu 24.04.3 LTS, with the matching architecture selected for each container.
    And installed
screen

Sorry, I still haven’t been able to reproduce the illegal instruction error you initially reported. However, I did encounter the segmentation fault mentioned earlier when running the following commands in an ubuntu:22.04 container:

apt-get update
apt-get install libc-bin

Based on my analysis of the segmentation fault, I think this is an address space conflict. The Spacemit-K1 board runs the RISC-V MMU in SV39 mode. When an x86 guest application is loaded, it cannot be placed at its preferred address (0x5555555556000) and is instead relocated to a different base address, which then becomes the starting address of the heap. Unfortunately, this new heap base appears to be already allocated, which results in an address space collision and triggers the subsequent errors.
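
The address-space argument can be sanity-checked on the host. Below is a small sketch, assuming the kernel exposes an `mmu` line in /proc/cpuinfo (as RISC-V Linux does) and taking the user half of the virtual address space as one bit less than the width of the mode. The function names are made up for illustration.

```sh
#!/bin/sh
# Print the MMU mode from a cpuinfo-style file (default: /proc/cpuinfo).
mmu_mode() {
    awk -F': *' '/^mmu/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Map a RISC-V MMU mode to the number of user-space virtual address bits.
user_va_bits() {
    case "$1" in
        sv39) echo 38 ;;
        sv48) echo 47 ;;
        sv57) echo 56 ;;
        *)    return 1 ;;
    esac
}

mode=$(mmu_mode)
if bits=$(user_va_bits "$mode"); then
    echo "host MMU mode $mode: user addresses must stay below 2^$bits"
else
    echo "no RISC-V mmu line found"
fi
```

An x86_64 guest expects a 47-bit user address space, so on an sv39 host its default load addresses cannot all be honored.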

Could you please help test the attached patch? In my test environment (QEMU v10.1.0 on Ubuntu 22.04) it resolves the issue. I have also tested it in an Ubuntu 24.04 container and have not observed any problems so far.

Also, a quick reminder: after updating the QEMU binary, you’ll need to either follow the steps below or reboot the system to ensure the changes take effect.

sudo sh -c 'echo -1 > /proc/sys/fs/binfmt_misc/qemu-x86_64'
sudo systemctl restart systemd-binfmt
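
To confirm that the re-registered entry points at the new build, the interpreter path can be read back from binfmt_misc. This is a sketch, assuming the status file contains the usual `interpreter <path>` line; binfmt_interpreter is a hypothetical helper name.

```sh
#!/bin/sh
# Print the interpreter path recorded in a binfmt_misc status file.
binfmt_interpreter() {
    awk '$1 == "interpreter" { print $2; exit }' "$1"
}

entry=/proc/sys/fs/binfmt_misc/qemu-x86_64
if [ -r "$entry" ]; then
    interp=$(binfmt_interpreter "$entry")
    real=$(readlink -f "$interp")
    echo "registered interpreter: $interp (resolves to $real)"
    # The resolved binary should report the version that was just built.
    # Assumption: the interpreter is a symlink to a regular qemu-x86_64 binary.
    "$real" --version | head -n 1
else
    echo "$entry is not registered"
fi
```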

qemu-x86_64_2.zip (1.7 MB)

If the patch above hasn’t resolved your issue yet, we may need you to provide complete reproduction steps and a detailed description of your environment.

Please don’t apologize. Perhaps I was not entirely clear earlier, so I must apologize for the inaccuracies in the first post.

You encountered a problem while executing
apt-get update
apt-get install libc-bin
and I have a similar problem!

I’ll definitely test it!!!

At the moment I’m just analyzing your kernel version 6.6.63. Over the weekend, I managed to rebuild the kernel on Bianbu 2.2.7 according to your instructions, after which the system began to boot a little faster.

It’s difficult to get everything done quickly when you only have one Banana-Pi BPI-F3 motherboard. Therefore, I also ordered the Milk-V Jupiter board and when it arrives, it will be easier for me to master your OS.

Sorry for the delay in responding. I’ve been working on the newly arrived Milk-V Jupiter board and the kernel build.
I tested your patch on both Bianbu Star 2.1.7 and Bianbu 2.2.1.
The patch applies successfully. However, when building qemu-10.1.0, I always get the following

error

Sorry, this error occurred because I didn’t consider other 32-bit architectures. To speed up compilation, I usually build only the x86-64 version. You can fix the compilation error with the following change.

diff --git a/linux-user/main.c b/linux-user/main.c
index 88da676d36..c576560361 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -697,7 +697,7 @@ static int parse_args(int argc, char **argv)

 static int adjust_elf_et_dyn_base(void)
 {
-#if __riscv && __riscv_xlen == 64
+#if __riscv && __riscv_xlen == 64 && TARGET_ABI_BITS == 64

 #define TASK_UNMAPPED_BASE_SV39 TARGET_PAGE_ALIGN((1ull << (39 - 1)) / 3)
 #define ELF_ET_DYN_BASE_SV39    (TASK_UNMAPPED_BASE_SV39 * 2)
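
For reference, the two SV39 constants in this hunk can be evaluated with plain shell arithmetic. The sketch below assumes that TARGET_PAGE_ALIGN rounds up to a 4 KiB page, as for an x86_64 guest; the variable and function names are only illustrative.

```sh
#!/bin/sh
# Evaluate the SV39 constants from the patch with 64-bit shell arithmetic.
# Assumption: TARGET_PAGE_ALIGN rounds up to a 4 KiB page.
PAGE_SIZE=4096

page_align_up() {
    # Round $1 up to the next multiple of PAGE_SIZE.
    echo $(( ($1 + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE ))
}

task_unmapped_base_sv39=$(page_align_up $(( (1 << 38) / 3 )))
elf_et_dyn_base_sv39=$(( task_unmapped_base_sv39 * 2 ))

printf 'TASK_UNMAPPED_BASE_SV39 = 0x%x\n' "$task_unmapped_base_sv39"
printf 'ELF_ET_DYN_BASE_SV39    = 0x%x\n' "$elf_et_dyn_base_sv39"
printf 'SV39 user-space size    = 0x%x\n' $(( 1 << 38 ))
```

Under these assumptions the values come out as 0x1555556000 and 0x2aaaaac000, both below the 0x4000000000 size of the SV39 user half, which is the point of picking a base that fits the host.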

So, I made the fixes, and the build completed successfully!
The apt update and apt list --upgradable commands seemed to work without errors.

Screen 1

But then, when I ran apt install libc-bin -y, the command failed and asked me to run apt --fix-broken install.

Screen 2

That also failed with an error.
Unfortunately, the errors are back :face_exhaling:.

This is the result after n attempts:

apt upgrade

It looks like you’re on the right track… but the solution isn’t final yet. I hope this will eventually be fixed.

Is the illegal instruction error back? The previous patch may not have been fully applied. Please make sure the tcg/riscv patch posted earlier in this thread (the TCG_REG_TMP3 change to tcg-target.c.inc and tcg-target.h) is applied.

Since I haven’t encountered the “illegal instruction” error yet, I need your help to provide a complete Podman image along with the reproduction steps.

Well. That’s not really a problem.

  1. git clone -b v10.1.0 https://gitlab.com/qemu-project/qemu.git
  2. cd qemu
  3. patch -p1 < tcg-riscv.patch
  4. patch -p1 < main-000.patch
  5. patch -p1 < main-001.patch
  6. mkdir build && cd build
  7. ../configure --prefix=/opt/qemu/qemu-10.1.0/qemu-user-static --mandir=/opt/qemu/qemu-10.1.0/qemu-user-static/share/man --libdir=/opt/qemu/qemu-10.1.0/qemu-user-static/lib --libexecdir=/opt/qemu/qemu-10.1.0/qemu-user-static/libexec --datadir=/opt/qemu/qemu-10.1.0/qemu-user-static/share --static --disable-system --enable-linux-user --container-engine=auto
  8. time make -j4
  9. sudo make install && echo 'export PATH="/opt/qemu/qemu-10.1.0/qemu-user-static/bin:/opt/qemu/qemu-10.1.0/qemu-user-static/share/qemu:$PATH"' >> ~/.bashrc && source ~/.bashrc
  10. sudo -i
  11. cd /opt/qemu/qemu-10.1.0/qemu-user-static/bin
  12. for i in * ; do cp $i $i-static ; done
  13. ln -snf /opt/qemu/qemu-10.1.0/qemu-user-static/bin/qemu-aarch64-static /opt/qemu/qemu-10.1.0/qemu-user-static/bin/qemu-arm64-static
  14. mkdir -p /usr/libexec/qemu-binfmt
  15. ln -snf /opt/qemu/qemu-10.1.0/qemu-user-static/bin/qemu-x86_64-static /usr/libexec/qemu-binfmt/x86_64-binfmt-P
  16. ln -snf /opt/qemu/qemu-10.1.0/qemu-user-static/bin/qemu-aarch64-static /usr/libexec/qemu-binfmt/aarch64-binfmt-P
  17. ln -snf /opt/qemu/qemu-10.1.0/qemu-user-static/bin/qemu-arm-static /usr/libexec/qemu-binfmt/arm-binfmt-P
  18. ln -snf /opt/qemu/qemu-10.1.0/qemu-user-static/bin/qemu-arm64-static /usr/libexec/qemu-binfmt/arm64-binfmt-P
  19. cd /usr/lib/binfmt.d && touch qemu-x86_64.conf && touch qemu-arm.conf && touch qemu-aarch64.conf
  20. echo ':qemu-x86_64:M::\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x3e\x00:\xff\xff\xff\xff\xff\xfe\xfe\xfc\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/libexec/qemu-binfmt/x86_64-binfmt-P:OPF' > qemu-x86_64.conf
  21. echo ':qemu-arm:M::\x7f\x45\x4c\x46\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/libexec/qemu-binfmt/arm-binfmt-P:OPF' > qemu-arm.conf
  22. echo ':qemu-aarch64:M::\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xb7\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/libexec/qemu-binfmt/aarch64-binfmt-P:OPF' > qemu-aarch64.conf
  23. sudo systemctl restart systemd-binfmt

I don’t think I missed anything. If I did, I apologize in advance.
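
When a container still fails after these steps, it can help to confirm which registration a given binary should match. The e_machine bytes in the magic strings above (3e, 28, b7) sit at offset 18 of the ELF header. Below is a sketch that reads them with od; the helper names are hypothetical.

```sh
#!/bin/sh
# Print the e_machine field (bytes 18-19 of the ELF header) as hex, e.g. 3e00.
elf_machine_bytes() {
    od -An -tx1 -j18 -N2 "$1" | tr -d ' \n'
}

# Translate the little-endian e_machine bytes into an architecture name.
elf_arch() {
    case "$(elf_machine_bytes "$1")" in
        3e00) echo x86_64 ;;
        b700) echo aarch64 ;;
        2800) echo arm ;;
        f300) echo riscv ;;
        *)    echo unknown ;;
    esac
}
```

For example, running `elf_arch` on a binary copied out of the container image shows which of the qemu-x86_64, qemu-arm, or qemu-aarch64 entries the kernel is expected to pick for it.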

Please also provide the commands to be executed inside the Podman container. Thanks.

I think I missed a couple of points.

Item 19 should read:

  19. cd /usr/lib/binfmt.d && touch qemu-x86_64.conf && touch qemu-arm.conf && touch qemu-aarch64.conf && touch qemu-arm64.conf

and I should have added:

echo ':qemu-arm64:M::\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xb7\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/libexec/qemu-binfmt/aarch64-binfmt-P:OPF' > qemu-arm64.conf