Improving Recursive Transformers with Mixture of LoRAs
Intermediate · Mohammadmahdi Nouriborji, Morteza Rohanian et al. · Dec 14 · arXiv
Recursive transformers save memory by reusing the same layer across depth, but that weight sharing makes them less expressive and hurts accuracy; adding a mixture of LoRA adapters restores per-step specialization at a small parameter cost.
#Mixture of LoRAs · #recursive transformers · #parameter sharing
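For intuition, here is a minimal PyTorch sketch of the general pattern the title and tags point to: one shared transformer layer applied recursively, with a routed mixture of low-rank LoRA adapters added at each step. Class names, ranks, expert counts, and the routing scheme are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed design, not the paper's exact method): a single
# transformer layer is reused across depth, and a small mixture of LoRA
# experts supplies a per-token low-rank correction at every recursion step.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """Low-rank update B(A(x)); adds ~2*d*r parameters instead of d*d."""
    def __init__(self, d_model: int, rank: int = 8, scale: float = 1.0):
        super().__init__()
        self.A = nn.Linear(d_model, rank, bias=False)
        self.B = nn.Linear(rank, d_model, bias=False)
        nn.init.zeros_(self.B.weight)  # standard LoRA init: starts as a no-op
        self.scale = scale

    def forward(self, x):
        return self.scale * self.B(self.A(x))


class MixtureOfLoRAs(nn.Module):
    """Token-wise soft routing over a small set of LoRA experts."""
    def __init__(self, d_model: int, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            [LoRAExpert(d_model, rank) for _ in range(num_experts)]
        )

    def forward(self, x):
        gates = F.softmax(self.router(x), dim=-1)                        # (B, T, E)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)         # (B, T, D, E)
        return torch.einsum("btde,bte->btd", outs, gates)                # (B, T, D)


class RecursiveBlock(nn.Module):
    """Shared layer reused `depth` times; the LoRA mixture restores some of
    the per-step specialization lost to weight sharing."""
    def __init__(self, d_model: int = 256, n_heads: int = 4, depth: int = 4):
        super().__init__()
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.mol = MixtureOfLoRAs(d_model)
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):
            x = self.shared_layer(x) + self.mol(x)  # shared weights + low-rank mixture
        return x


if __name__ == "__main__":
    model = RecursiveBlock()
    tokens = torch.randn(2, 16, 256)   # (batch, seq_len, d_model)
    print(model(tokens).shape)         # torch.Size([2, 16, 256])
```

The point of the sketch is the parameter accounting: the full layer is stored once regardless of depth, and each extra expert costs only two thin rank-r matrices, which is how such designs aim to recover accuracy without giving up the memory savings of parameter sharing.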