Panini: Continual Learning in Token Space via Structured Memory
Abstract
Panini is a non-parametric continual learning framework that structures new documents into Generative Semantic Workspaces at write time and retrieves reasoning-grounded inference chains at read time, improving QA accuracy while using far fewer context tokens.
Language models are increasingly used to reason over content they were not trained on, such as new documents, evolving knowledge, and user-specific data. A common approach is retrieval-augmented generation (RAG), which stores verbatim documents externally (as chunks) and retrieves only a relevant subset at inference time for an LLM to reason over. However, this uses test-time compute inefficiently (the LLM repeatedly reasons over the same documents); moreover, chunk retrieval can inject irrelevant context that increases unsupported generation. We propose a human-like non-parametric continual learning framework, in which the base model remains fixed and learning occurs by integrating each new experience into an external semantic memory state that continually accumulates and consolidates itself. We present Panini, which realizes this by representing documents as Generative Semantic Workspaces (GSW): an entity- and event-aware network of question-answer (QA) pairs, sufficient for an LLM to reconstruct the experienced situations and mine latent knowledge via reasoning-grounded inference chains on the network. Given a query, Panini traverses only the continually updated GSW (not the verbatim documents or chunks) and retrieves the most likely inference chains. Across six QA benchmarks, Panini achieves the highest average performance, 5-7% above competitive baselines, while using 2-30x fewer answer-context tokens; it also supports fully open-source pipelines and reduces unsupported answers on curated unanswerable queries. These results show that structuring experiences efficiently and accurately at write time, as the GSW framework does, yields both efficiency and reliability gains at read time. Code is available at https://github.com/roychowdhuryresearch/gsw-memory.
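The abstract describes the GSW only at a high level; the following is a minimal, hypothetical Python sketch of what an entity- and event-aware network of QA pairs could look like. All names here (`QAPair`, `GSW`, `integrate`, `neighbors`) are illustrative assumptions, not the Panini codebase's API.

```python
from dataclasses import dataclass

# Minimal, hypothetical sketch of a Generative Semantic Workspace (GSW):
# a network of QA pairs linked through the entities and events they mention.
# Class and method names are illustrative, not the Panini codebase's API.

@dataclass
class QAPair:
    question: str
    answer: str
    entities: frozenset[str]   # entities this pair mentions
    event: str | None = None   # event the pair describes, if any

class GSW:
    """Entity- and event-aware network of QA pairs."""

    def __init__(self) -> None:
        self.pairs: list[QAPair] = []
        self.entity_index: dict[str, list[int]] = {}  # entity -> pair ids

    def integrate(self, pair: QAPair) -> int:
        """Write time: fold a new experience into the workspace,
        linking it to prior pairs via shared entities."""
        idx = len(self.pairs)
        self.pairs.append(pair)
        for ent in pair.entities:
            self.entity_index.setdefault(ent, []).append(idx)
        return idx

    def neighbors(self, idx: int) -> set[int]:
        """Pairs reachable from `idx` in one hop through a shared entity."""
        linked: set[int] = set()
        for ent in self.pairs[idx].entities:
            linked.update(self.entity_index.get(ent, []))
        linked.discard(idx)
        return linked
```

Under this reading, an LLM would decompose each new document into such QA pairs at write time and `integrate` them, so the memory accumulates and consolidates without any weight updates.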
Community
Panini introduces a non-parametric continual learning framework where an LLM's knowledge grows not by updating weights, but by structuring each new document into a Generative Semantic Workspace (GSW): an entity- and event-aware network of QA pairs that captures both explicit facts and latent knowledge. At read time, instead of retrieving raw document chunks like standard RAG, Panini traverses the continually updated GSW to find reasoning-grounded inference chains relevant to a query. Across six QA benchmarks (including multi-hop tasks like MuSiQue, 2WikiMultiHop, and HotpotQA), Panini achieves 5-7% higher accuracy than competitive baselines while using 2-30x fewer context tokens, and reduces unsupported answers on unanswerable queries. These gains hold across both closed-source (GPT-4o) and fully open-source (Qwen3-8B/14B) pipelines, demonstrating that the framework is not dependent on any single model family. Our key insight: investing compute at write time to structure experiences into rich semantic memory pays off at read time with both efficiency and reliability gains. Our code is available at: https://github.com/roychowdhuryresearch/gsw-memory
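To make the read path concrete, here is a hedged sketch of chain retrieval over the hypothetical `GSW` class from the sketch above: a bounded breadth-first walk from query-matched seed pairs along shared-entity links. The `score` callable stands in for the paper's LLM-guided chain selection; it and `retrieve_chains` are assumptions for illustration, not Panini's API.

```python
from collections import deque
from typing import Callable

def retrieve_chains(
    gsw: "GSW",                            # hypothetical class from the sketch above
    seed_ids: list[int],                   # QA pairs matched to the query up front
    score: Callable[[list[int]], float],   # stand-in for LLM-guided chain scoring
    max_hops: int = 3,
    top_k: int = 5,
) -> list[list[int]]:
    """Read time: walk the QA network from seed pairs, extending chains
    through shared-entity links, and keep the highest-scoring chains."""
    chains: list[list[int]] = [[s] for s in seed_ids]
    queue = deque(chains)
    while queue:
        chain = queue.popleft()
        if len(chain) >= max_hops:
            continue
        for nxt in gsw.neighbors(chain[-1]):
            if nxt not in chain:           # avoid revisiting a pair (no cycles)
                extended = chain + [nxt]
                chains.append(extended)
                queue.append(extended)
    # Only the retrieved chains (not raw chunks) are handed to the LLM,
    # which is where the reported answer-context token savings come from.
    return sorted(chains, key=score, reverse=True)[:top_k]
```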
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- CompactRAG: Reducing LLM Calls and Token Overhead in Multi-Hop Question Answering (2026)
- Implicit Graph, Explicit Retrieval: Towards Efficient and Interpretable Long-horizon Memory for Large Language Models (2026)
- CIRAG: Construction-Integration Retrieval and Adaptive Generation for Multi-hop Question Answering (2026)
- REMem: Reasoning with Episodic Memory in Language Agent (2026)
- SPARC-RAG: Adaptive Sequential-Parallel Scaling with Context Management for Retrieval-Augmented Generation (2026)
- MemWeaver: Weaving Hybrid Memories for Traceable Long-Horizon Agentic Reasoning (2026)
- Use Graph When It Needs: Efficiently and Adaptively Integrating Retrieval-Augmented Generation with Graphs (2026)