Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Paper: arXiv:2410.20672
Implementation of the Relaxed Recursive Transformer, uptrained with knowledge distillation on OpenWebText2. See the sketch of the core idea below. arxiv.org/abs/2410.20672
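
The paper's core idea is to tie the weights of a transformer block across recursion depths (a Recursive Transformer) and then "relax" that sharing by giving each depth its own low-rank LoRA adapter, so tied layers can still specialize. Below is a minimal, illustrative sketch of that structure, not the repository's actual code; all class names, ranks, and dimensions are assumptions chosen for brevity.

```python
# Illustrative sketch: a shared block applied recursively, relaxed by
# per-depth LoRA adapters. Names and hyperparameters are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Wraps a shared nn.Linear with a depth-specific low-rank update."""

    def __init__(self, shared_linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.shared = shared_linear  # base weights tied across recursion depths
        in_f, out_f = shared_linear.in_features, shared_linear.out_features
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))  # zero-init: starts as pure sharing
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Shared projection plus the depth-specific low-rank correction.
        return self.shared(x) + self.scale * (x @ self.lora_a.T) @ self.lora_b.T


class RelaxedRecursiveBlock(nn.Module):
    """One shared MLP block reused at every depth, relaxed by per-depth LoRA."""

    def __init__(self, d_model: int = 256, num_recursions: int = 4, rank: int = 8):
        super().__init__()
        shared_up = nn.Linear(d_model, 4 * d_model)
        shared_down = nn.Linear(4 * d_model, d_model)
        # One pair of LoRA adapters per recursion depth; base weights stay shared.
        self.up = nn.ModuleList(LoRALinear(shared_up, rank) for _ in range(num_recursions))
        self.down = nn.ModuleList(LoRALinear(shared_down, rank) for _ in range(num_recursions))
        self.norm = nn.LayerNorm(d_model)
        self.num_recursions = num_recursions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for depth in range(self.num_recursions):
            h = self.norm(x)
            h = self.down[depth](F.gelu(self.up[depth](h)))
            x = x + h  # residual connection
        return x


if __name__ == "__main__":
    block = RelaxedRecursiveBlock()
    tokens = torch.randn(2, 16, 256)  # (batch, sequence, d_model)
    print(block(tokens).shape)  # torch.Size([2, 16, 256])
```

In this sketch only the base linear weights are shared; the per-depth LoRA matrices add a small number of extra parameters, which is what lets the recursive model recover accuracy during uptraining (in the repo, via distillation on OpenWebText2).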