A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts. Mohammed Nowaz Rabbani Chowdhury, Meng Wang, et al. 2024. ICML 2024.
How Does Promoting the Minority Fraction Affect Generalization? A Theoretical Study of One-hidden-layer Neural Network on Group Imbalance. Hongkang Li, Shuai Zhang, et al. 2024. IEEE Journal of Selected Topics in Signal Processing.