Yutao Zeng

Taoer

AI & ML interests

None yet

Recent Activity

authored a paper 23 days ago

Virtual Width Networks

Paper • 2511.11238 • Published 27 days ago • 35

upvoted a paper 24 days ago

Virtual Width Networks

Paper • 2511.11238 • Published 27 days ago • 35

authored a paper 4 months ago

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26 • 36

updated a collection 4 months ago

Full Paper List

11 items • Updated Aug 27 • 1

upvoted a paper 4 months ago

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26 • 36

updated a collection 6 months ago

Full Paper List

11 items • Updated Aug 27 • 1

authored a paper 6 months ago

Stepsize anything: A unified learning rate schedule for budgeted-iteration training

Paper • 2505.24452 • Published May 30 • 5

upvoted a paper 6 months ago

Stepsize anything: A unified learning rate schedule for budgeted-iteration training

Paper • 2505.24452 • Published May 30 • 5

commented on a paper 6 months ago

Stepsize anything: A unified learning rate schedule for budgeted-iteration training

Paper • 2505.24452 • Published May 30 • 5

authored 2 papers 7 months ago

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

Paper • 2504.13914 • Published Apr 10 • 4

Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published May 20 • 76

updated a collection 7 months ago

Full Paper List

11 items • Updated Aug 27 • 1

upvoted a paper 7 months ago

Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published May 20 • 76

authored a paper 8 months ago

Efficient Pretraining Length Scaling

Paper • 2504.14992 • Published Apr 21 • 20

upvoted a paper 8 months ago

Efficient Pretraining Length Scaling

Paper • 2504.14992 • Published Apr 21 • 20

updated a collection 8 months ago

Full Paper List

11 items • Updated Aug 27 • 1

updated 2 models 8 months ago

Open-Foundation-Models/PolyNorm_1B

Text Generation • Updated Apr 8 • 12

Open-Foundation-Models/PolyReLU_1B

Text Generation • Updated Apr 8 • 16

upvoted a paper 9 months ago

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Paper • 2503.16057 • Published Mar 20 • 14

authored a paper 9 months ago

Frac-Connections: Fractional Extension of Hyper-Connections

Paper • 2503.14125 • Published Mar 18 • 22