Distributed Compilation with distcc on the Bit-Brick Cluster K1
Prerequisites
- Bit-Brick Cluster K1
- Gigabit Ethernet cable
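The Bit-Brick Cluster K1 is a high-performance compute expansion device. It combines multiple core boards into a compute cluster, raising the total compute capacity of the system for heavy workloads. It holds up to 4 core boards at the same time and provides a range of peripheral interfaces for attaching external devices and extending the system.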
Each SSOM-K1 core board is configured as follows:
- SpacemiT K1 RISC-V SoC
- 8GB LPDDR4x
- 64GB eMMC
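The full cluster has 32 SpacemiT X60 cores in total (4 boards with 8 cores each), and each core runs at 1.6GHz.
The core boards run at full load during compilation, so active cooling with a fan is recommended.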
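Getting the node IP addresses
Connect to the cluster over the USB serial port and note the IP address of each of the three nodes B, C, and D: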
192.168.1.21
192.168.1.22
192.168.1.18
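As a rough illustration of this step, the address can be read on each node with the `ip` tool from iproute2. The helper name `list_ipv4` is made up for this sketch, and it assumes the one-line-per-address layout that `ip -o` prints:

```shell
# Hypothetical helper: reduce `ip -o -4 addr show` output (read from stdin)
# to "interface address" pairs, dropping the /prefix length.
list_ipv4() {
    awk '{ split($4, a, "/"); print $2, a[1] }'
}

# On a node, over the serial console (left commented so the block only
# defines the helper):
#   ip -o -4 addr show scope global | list_ipv4
```

The interface names printed on the K1 boards depend on the system image, so pick the line that belongs to the wired Ethernet port.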
Installing distcc
Install distcc on all nodes, and make sure every node runs the same version.
sudo apt update
sudo apt install gcc make libncurses-dev libssl-dev bc flex bison -y
sudo apt install distcc -y
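One way to confirm that the versions match is to collect the first line of `distcc --version` from every node and compare the strings. The sketch below only does the comparison; the function name `all_same` is hypothetical, and how the strings are gathered is left open:

```shell
# Hypothetical helper: succeed only if every argument equals the first one.
all_same() {
    first=$1
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
    return 0
}
```

For example, assuming SSH access between the nodes (which this guide does not set up), `all_same "$(distcc --version | head -n 1)" "$(ssh 192.168.1.21 'distcc --version | head -n 1')" && echo "versions match"` would compare the master with one worker.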
Configuring distcc
Configure distcc on all nodes:
sudo vim /etc/default/distcc
Change the following settings:
STARTDISTCC="true"
ALLOWEDNETS="192.168.1.0/24" # Allow the master node's subnet; adjust to your network
LISTENER="0.0.0.0" # Listen on all IP addresses
JOBS="$(nproc)" # Use all CPU cores; adjust as needed
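On the master node, use these values instead:
ALLOWEDNETS="127.0.0.1 192.168.1.0/24" # Allow localhost and the master node's subnet; adjust to your network
LISTENER="127.0.0.1" # Listen on the local address only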
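If editing the file by hand on every node becomes tedious, the same settings could be applied non-interactively. This is a minimal sketch, not part of the original procedure: `set_distcc_opt` is a hypothetical helper, it assumes GNU sed (for `-i`), and it expects values that contain no `|`, `&`, or backslash.

```shell
# Hypothetical helper: set KEY="VALUE" in a shell-style config file,
# replacing an existing KEY= line or appending a new one.
set_distcc_opt() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        # "|" is the sed delimiter so values such as 192.168.1.0/24 work.
        sed -i "s|^${key}=.*|${key}=\"${value}\"|" "$file"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Example for a worker node, run from a root shell (commented out on purpose):
#   set_distcc_opt /etc/default/distcc STARTDISTCC true
#   set_distcc_opt /etc/default/distcc ALLOWEDNETS "192.168.1.0/24"
#   set_distcc_opt /etc/default/distcc LISTENER "0.0.0.0"
```

Pass `'$(nproc)'` in single quotes if the literal text should end up in the file rather than the number computed by the calling shell.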
Open the distcc port:
sudo ufw allow 3632/tcp
Restart the distcc service so the new settings take effect:
sudo systemctl restart distcc
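On the master node, set the DISTCC_HOSTS environment variable to the IP addresses of the worker nodes plus the master itself (localhost):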
export DISTCC_HOSTS="192.168.1.18 192.168.1.21 192.168.1.22 localhost"
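distcc host entries also accept an optional `/LIMIT` suffix that caps the number of jobs sent to that host at once. The sketch below builds such a list; `build_distcc_hosts` is a hypothetical helper, and the limit of 8 is an assumption based on the 8 cores per board (32 cores across 4 boards):

```shell
# Hypothetical helper: print "HOST/LIMIT" entries separated by spaces.
build_distcc_hosts() {
    limit=$1
    shift
    hosts=""
    for h in "$@"; do
        # ${hosts:+ } adds a separating space only after the first entry.
        hosts="${hosts}${hosts:+ }${h}/${limit}"
    done
    printf '%s\n' "$hosts"
}

DISTCC_HOSTS=$(build_distcc_hosts 8 192.168.1.18 192.168.1.21 192.168.1.22 localhost)
export DISTCC_HOSTS
echo "$DISTCC_HOSTS"
```

A lower limit for `localhost` may be worth trying if the master node runs short of memory, since it also runs preprocessing and linking.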
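Check the distcc host list on the master node:
# List the hosts distcc is configured to use
distcc --show-hosts
The output should contain the addresses of the master and the worker nodes. If the worker addresses are missing, try restarting the distcc service, then run the following loop to verify that the master and the workers can reach each other: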
# Replace the IP addresses below to match your network
for host in 192.168.1.18 192.168.1.21 192.168.1.22; do
    echo "Testing $host ..."
    nc -zv "$host" 3632
    distcc --version -h "$host"
done
Expected output:
Testing 192.168.1.18 ...
Connection to 192.168.1.18 3632 port [tcp/distcc] succeeded!
distcc 3.4 riscv64-unknown-linux-gnu
(protocols 1, 2 and 3) (default port 3632)
built Apr 1 2024 05:42:12
Copyright (C) 2002, 2003, 2004 by Martin Pool.
Includes miniLZO (C) 1996-2002 by Markus Franz Xaver Johannes Oberhumer.
Portions Copyright (C) 2007-2008 Google.
distcc comes with ABSOLUTELY NO WARRANTY. distcc is free software, and
you may use, modify and redistribute it under the terms of the GNU
General Public License version 2 or later.
Built with Zeroconf support.
Built with GSS-API support for mutual authentication.
Please report bugs to distcc@lists.samba.org
Testing 192.168.1.21 ...
Connection to 192.168.1.21 3632 port [tcp/distcc] succeeded!
distcc 3.4 riscv64-unknown-linux-gnu
(protocols 1, 2 and 3) (default port 3632)
built Apr 1 2024 05:42:12
Copyright (C) 2002, 2003, 2004 by Martin Pool.
Includes miniLZO (C) 1996-2002 by Markus Franz Xaver Johannes Oberhumer.
Portions Copyright (C) 2007-2008 Google.
distcc comes with ABSOLUTELY NO WARRANTY. distcc is free software, and
you may use, modify and redistribute it under the terms of the GNU
General Public License version 2 or later.
Built with Zeroconf support.
Built with GSS-API support for mutual authentication.
Please report bugs to distcc@lists.samba.org
Testing 192.168.1.22 ...
Connection to 192.168.1.22 3632 port [tcp/distcc] succeeded!
distcc 3.4 riscv64-unknown-linux-gnu
(protocols 1, 2 and 3) (default port 3632)
built Apr 1 2024 05:42:12
Copyright (C) 2002, 2003, 2004 by Martin Pool.
Includes miniLZO (C) 1996-2002 by Markus Franz Xaver Johannes Oberhumer.
Portions Copyright (C) 2007-2008 Google.
distcc comes with ABSOLUTELY NO WARRANTY. distcc is free software, and
you may use, modify and redistribute it under the terms of the GNU
General Public License version 2 or later.
Built with Zeroconf support.
Built with GSS-API support for mutual authentication.
Please report bugs to distcc@lists.samba.org
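To avoid hard-coding the addresses in a check like the one above, the host list can be derived from `DISTCC_HOSTS`. This is a minimal sketch that assumes only plain `HOST[:PORT][/LIMIT][,OPTIONS]` entries with IPv4 addresses or hostnames; `remote_hosts` is a hypothetical name:

```shell
# Hypothetical helper: print the remote hosts named in DISTCC_HOSTS,
# one per line, without port, job limit, or option suffixes.
remote_hosts() {
    for spec in $DISTCC_HOSTS; do
        host=${spec%%[/:,]*}    # cut at the first "/", ":" or ","
        case $host in
            localhost|127.0.0.1|--*) ;;    # skip local entries and --flags
            *) printf '%s\n' "$host" ;;
        esac
    done
}

# Usage (commented out so the block only defines the helper):
#   for host in $(remote_hosts); do nc -zv "$host" 3632; done
```

Entries in other forms, such as SSH hosts written as `user@host` or IPv6 literals, are not handled by this sketch.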
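distcc 3.4 has a bug in its compiler-name rewriting step (dcc_gcc_rewrite_fqn) that causes errors on the RISC-V architecture. As a workaround, run the following command on every node:
export DISTCC_NO_REWRITE_CROSS=1
For details, see: Bug: Buffer Overflow Detected on aarch64 with distcc 3.4. · Issue #546 · distcc/distcc · GitHub
The fix is expected in distcc 3.5.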
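An `export` only lasts for the current shell session. To keep the workaround across logins, the line could be appended to a shell startup file. The sketch below assumes bash with `~/.bashrc`; `append_once` is a hypothetical helper that avoids adding the same line twice:

```shell
# Hypothetical helper: append LINE to FILE unless an identical line exists.
append_once() {
    file=$1 line=$2
    touch "$file"
    # -x matches whole lines, -F treats the text literally.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

On each node this could be called as `append_once "$HOME/.bashrc" 'export DISTCC_NO_REWRITE_CROSS=1'`, after which new shells pick up the variable.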
Compiling
In general, a C project can be built from the master node with:
make -j$(nproc) CC=distcc
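Adjust the build command to your project. The rest of this section uses a Linux kernel build as the example.
Install the build dependencies:
sudo apt-get install debhelper libpfm4-dev libtraceevent-dev asciidoc libelf-dev devscripts git
Clone the source: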
git clone https://gitee.com/bianbu-linux/linux-6.6.git --depth=1
cd linux-6.6
Generate the configuration file:
make k1_defconfig
Build the kernel with the following script:
vim distcc_build.sh
#!/bin/bash
NUM_JOBS=$(( $(distcc -j) / 2 )) # Use half of the job slots reported by distcc -j
make -j$NUM_JOBS CC="distcc gcc" \
    CXX="distcc g++" \
    CPP="distcc cpp" \
    KBUILD_BUILD_TIMESTAMP='' # Avoid timestamp warnings
Run the script:
chmod +x distcc_build.sh
bash distcc_build.sh
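The job count in the script comes out as 0 when `distcc -j` reports fewer than 2 slots, and `make` rejects `-j0`. A slightly more defensive variant of that calculation might look like the sketch below; `half_jobs` is a hypothetical helper, not part of the original script:

```shell
# Hypothetical helper: halve a slot count but never go below 1.
half_jobs() {
    total=$1
    n=$(( total / 2 ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    printf '%s\n' "$n"
}

# In distcc_build.sh (commented out so the block only defines the helper):
#   NUM_JOBS=$(half_jobs "$(distcc -j)")
```

Halving is the original script's choice; a different divisor may suit other workloads.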
Alternatively, run the make command from the script directly on the master node.
Common issues
If you see errors like the following, the master node has run out of memory. Try adding swap space or reducing the number of jobs.
[ 5860.732132] Out of memory: Killed process 84853 (cc1) total-vm:59900kB, anon-rss:18336kB, file-rss:896kB, shmem-rss:0kB, UID:1000 pgtables:124kB oom_score_adj:0
[ 5861.875207] Out of memory: Killed process 84772 (cc1) total-vm:60008kB, anon-rss:18372kB, file-rss:896kB, shmem-rss:0kB, UID:1000 pgtables:120kB oom_score_adj:0
[ 5863.718845] Out of memory: Killed process 84498 (cc1) total-vm:60056kB, anon-rss:18664kB, file-rss:512kB, shmem-rss:0kB, UID:1000 pgtables:124kB oom_score_adj:0
[ 5864.533358] Out of memory: Killed process 84593 (cc1) total-vm:58144kB, anon-rss:18044kB, file-rss:1024kB, shmem-rss:0kB, UID:1000 pgtables:116kB oom_score_adj:0
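Adding swap usually comes down to four commands. The sketch below prints them by default and only executes them when `APPLY=1` is set; the function name `setup_swap`, the `/swapfile` path, and the 4G size are all assumptions, and swap files created with `fallocate` are not supported on every filesystem:

```shell
# Hypothetical helper: print (or, with APPLY=1, run) swap-file setup commands.
setup_swap() {
    swapfile=${1:-/swapfile}    # assumed location
    size=${2:-4G}               # assumed size
    for cmd in \
        "fallocate -l $size $swapfile" \
        "chmod 600 $swapfile" \
        "mkswap $swapfile" \
        "swapon $swapfile"
    do
        if [ "${APPLY:-0}" = 1 ]; then
            sudo sh -c "$cmd" || return 1
        else
            printf '%s\n' "$cmd"
        fi
    done
}
```

Running `setup_swap` shows what would be done; `APPLY=1 setup_swap` performs it. The swap file does not survive a reboot unless it is also added to `/etc/fstab`.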
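If the build hangs with errors like the following, the cluster nodes cannot reach each other over the network. Try pinging each node and check the firewall settings. If both look fine, the cause may be a system problem; try rebooting the cluster nodes.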
distcc[92628] ERROR: failed to connect to 192.168.1.21:3632: Network is unreachable
distcc[92628] ERROR: failed to connect to 192.168.1.22:3632: Network is unreachable
distcc[92628] ERROR: failed to connect to 192.168.1.18:3632: Network is unreachable
If the build fails with:
*** buffer overflow detected ***: terminated
已中止(核心已转储)
or, on an English-locale system:
Aborted (core dumped)
set the following environment variable:
export DISTCC_NO_REWRITE_CROSS=1