Article • Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand • 4 days ago • 49
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published Jan 29 • 59