Each cluster comprises four scalar cores, two vector cores, two
local-memory banks, and a shared L2 cache. The two vector cores
share a single Tensor Core and one local-memory bank. This
sharing is intended to improve Tensor Core utilization and data
reuse under tight area and bandwidth budgets. The K3 chip
integrates two A100 clusters.
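As a quick check of the resource counts, the per-cluster composition
can be written down and scaled to the chip level. This is a minimal
sketch that assumes only the counts stated above: one Tensor Core and
one L2 cache per cluster, and no other chip-level resources. The class
and method names (`ClusterConfig`, `ChipConfig`, `totals`) are
hypothetical and do not come from any K3 toolchain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Per-cluster resources, as described in the text (hypothetical model)."""
    scalar_cores: int = 4
    vector_cores: int = 2
    local_memory_banks: int = 2
    tensor_cores: int = 1   # one Tensor Core shared by both vector cores
    l2_caches: int = 1      # one L2 cache shared across the whole cluster

    def vector_cores_per_tensor_core(self) -> float:
        # Sharing ratio: how many vector cores contend for each Tensor Core.
        return self.vector_cores / self.tensor_cores


@dataclass(frozen=True)
class ChipConfig:
    """A chip built from identical clusters (assumes no extra chip-level units)."""
    clusters: int
    cluster: ClusterConfig

    def totals(self) -> dict:
        # Multiply each per-cluster count by the number of clusters.
        c, n = self.cluster, self.clusters
        return {
            "scalar_cores": n * c.scalar_cores,
            "vector_cores": n * c.vector_cores,
            "tensor_cores": n * c.tensor_cores,
            "local_memory_banks": n * c.local_memory_banks,
            "l2_caches": n * c.l2_caches,
        }


k3 = ChipConfig(clusters=2, cluster=ClusterConfig())
print(k3.totals())
print(k3.cluster.vector_cores_per_tensor_core())
```

Under these assumptions the K3 chip totals 8 scalar cores, 4 vector
cores, 2 Tensor Cores, 4 local-memory banks, and 2 L2 caches, with
two vector cores sharing each Tensor Core.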