Merging Feed-Forward Sublayers for Compressed Transformers

Publication
Transactions on Machine Learning Research