arxiv:2507.08143

Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores

Published on Jul 10, 2025

Abstract

AI-generated summary

Compactor, a training-free KV compression strategy that scores token importance with approximate leverage scores, matches the performance of competing methods while retaining 20% fewer tokens and, with context-calibrated compression, reduces average KV memory usage by 68% while maintaining full-cache performance.

Modern Large Language Models (LLMs) are increasingly trained to support very large context windows. We present Compactor, a training-free, query-agnostic KV compression strategy that uses approximate leverage scores to determine token importance. We show that Compactor can achieve the same performance as competing methods while retaining 20% fewer tokens on both synthetic and real-world context tasks, and is more robust across tasks. We further introduce a procedure for context-calibrated compression: inferring the maximum compression a given context supports before significant performance loss. Using context-calibrated compression, we show that Compactor achieves full-KV performance on LongBench while reducing the KV memory burden by 68% on average. To demonstrate the efficacy and generalizability of our approach, we apply Compactor to 27 synthetic and real-world tasks from RULER and LongBench, with models from both the Qwen 2.5 and Llama 3.1 families. Finally, we release compactor-vllm, an inference engine and suite of optimized Triton kernels designed to efficiently support the sparse, non-contiguous memory access patterns inherent to compressed KV caches. This work demonstrates that Compactor offers a practical, high-performance solution for alleviating the memory bottleneck in modern LLM deployment.
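
To give a rough sense of the leverage-score idea the abstract describes, the sketch below scores each cached token by an approximate leverage score of its key vector and keeps only the highest-scoring tokens. This is a minimal NumPy illustration under assumed tensor shapes, using a generic Gaussian-sketch estimator for the leverage scores; the function names (approx_leverage_scores, compress_kv) and the estimator itself are assumptions for exposition, not the paper's implementation or the compactor-vllm API.

```python
import numpy as np

def leverage_scores(K):
    """Exact statistical leverage scores of the rows of K (n_tokens x d_head).

    The leverage score of row i is the squared norm of the i-th row of U,
    where K = U S V^T is the thin SVD.
    """
    U, _, _ = np.linalg.svd(K, full_matrices=False)
    return np.sum(U**2, axis=1)

def approx_leverage_scores(K, sketch_dim=64, seed=None):
    """Approximate leverage scores via a Gaussian sketch (illustrative only).

    Compresses the Gram matrix K^T K with a random projection before
    (pseudo-)inverting, then evaluates k_i^T (K^T K)^{-1} k_i per token.
    """
    rng = np.random.default_rng(seed)
    n, d = K.shape
    # (S K)^T (S K) is an unbiased estimate of K^T K when S has N(0, 1/m) entries.
    S = rng.standard_normal((sketch_dim, n)) / np.sqrt(sketch_dim)
    SK = S @ K                      # sketch_dim x d
    G_inv = np.linalg.pinv(SK.T @ SK)
    return np.einsum("nd,de,ne->n", K, G_inv, K)

def compress_kv(K, V, keep_ratio=0.5):
    """Keep the tokens whose keys have the highest approximate leverage scores."""
    scores = approx_leverage_scores(K)
    n_keep = max(1, int(keep_ratio * K.shape[0]))
    keep = np.sort(np.argsort(scores)[-n_keep:])  # preserve original token order
    return K[keep], V[keep], keep

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, V = rng.standard_normal((1024, 128)), rng.standard_normal((1024, 128))
    K_c, V_c, kept = compress_kv(K, V, keep_ratio=0.5)
    print(K_c.shape, V_c.shape)  # (512, 128) (512, 128)
```

In this toy version the sketch dimension trades accuracy for speed: a larger sketch_dim gives scores closer to the exact leverage scores at the cost of a larger projection. The paper's query-agnostic and context-calibrated variants, and its Triton kernels for the resulting non-contiguous caches, are not reflected here.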
