SlimInfer: Accelerating Long-Context LLM Inference via Dynamic Token Pruning • Paper • 2508.06447 • Published Aug 8, 2025
LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation • Paper • 2510.08318 • Published Oct 9, 2025
LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit • Paper • 2508.09981 • Published Aug 13, 2025
QVGen: Pushing the Limit of Quantized Video Generative Models • Paper • 2505.11497 • Published May 16, 2025
HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration • Paper • 2410.01723 • Published Oct 2, 2024
TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models • Paper • 2311.16503 • Published Nov 27, 2023