Spine-Triton Attention Implementation

Please see my thread "Spine-Triton compiler bug: two errors in constant multiplication". There are two bugs in the Triton to Triton-IR pass. Your original code should be fine as it is and does not need to change. In the meantime, I'll go ahead and study the core online-softmax kernel.
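For anyone else following along, here is a minimal pure-Python sketch of the online-softmax recurrence that flash-attention-style kernels are built around. It assumes a single query row whose attention scores are already computed and finite. It walks over the keys in blocks, keeping a running max `m`, a running normalizer `l`, and an unnormalized output accumulator `acc`, and it rescales all of them whenever the max grows. This is not the Spine-Triton kernel from the thread. The function names `online_softmax_attention` and `reference_attention` and the default `block_size` are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence


def online_softmax_attention(scores: Sequence[float],
                             values: Sequence[Sequence[float]],
                             block_size: int = 2) -> List[float]:
    """Hypothetical helper: softmax(scores) @ values in one pass over key blocks."""
    dim = len(values[0])
    m = -math.inf        # running max of the scores seen so far
    l = 0.0              # running sum of exp(score - m)
    acc = [0.0] * dim    # running sum of exp(score - m) * value
    for start in range(0, len(scores), block_size):
        blk_s = scores[start:start + block_size]
        blk_v = values[start:start + block_size]
        m_new = max(m, max(blk_s))
        # Rescale old statistics to the new max; exp(-inf) == 0.0 on the first block.
        alpha = math.exp(m - m_new)
        l *= alpha
        acc = [a * alpha for a in acc]
        for s, v in zip(blk_s, blk_v):
            p = math.exp(s - m_new)
            l += p
            acc = [a + p * x for a, x in zip(acc, v)]
        m = m_new
    # Normalize once at the end, as flash-attention-style kernels do.
    return [a / l for a in acc]


def reference_attention(scores: Sequence[float],
                        values: Sequence[Sequence[float]]) -> List[float]:
    """Two-pass (max, then sum) softmax attention, used only as a check."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z
            for j in range(len(values[0]))]


if __name__ == "__main__":
    scores = [0.5, 2.0, -1.0, 3.0, 1.5]
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
    online = online_softmax_attention(scores, values, block_size=2)
    ref = reference_attention(scores, values)
    print(online)
    print(max(abs(a - b) for a, b in zip(online, ref)))
```

The point of the rescaling factor `alpha = exp(m_old - m_new)` is that the kernel never needs the global max up front, so each block of keys can be loaded, folded into the running state, and discarded. The block size only affects how often rescaling happens, not the final result (up to floating-point rounding).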