PaTH Attention: Position Encoding via Accumulating Householder Transformations. Songlin Yang, Yikang Shen, et al. 2025. NeurIPS 2025. Conference paper.
Data Engineering for Scaling Language Models to 128K Context. Yao Fu, Rameswar Panda, et al. 2024. ICML 2024. Conference paper.
Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models. Yue Zhou, Yada Zhu, et al. 2024. NAACL 2024. Conference paper.