Merging Text Transformer Models from Different Initializations

Publication
Transactions on Machine Learning Research, High Dimensional Learning Dynamics Workshop @ ICML 2024