Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study. Shawn Tan, Songlin Yang, et al. (2025). ICLR 2025.