arXiv:2509.23861

Investigating Multi-layer Representations for Dense Passage Retrieval

Published on Sep 28, 2025

AI-generated summary

Multi-layer Representations (MLR) improve dense retrieval by using vectors from multiple encoder layers, outperforming single-layer models and combining well with advanced training techniques.

Abstract

Dense retrieval models usually adopt vectors from the last hidden layer of the document encoder to represent a document, even though representations in different layers of a pre-trained language model are known to capture different kinds of linguistic knowledge and to behave differently during fine-tuning. We therefore propose to investigate utilizing representations from multiple encoder layers to form the representation of a document, which we denote Multi-layer Representations (MLR). We first investigate how representations in different layers affect MLR's performance under the multi-vector retrieval setting, and then propose pooling strategies that reduce multi-vector models to single-vector ones to improve retrieval efficiency. Experiments demonstrate the effectiveness of MLR over the dual encoder, ME-BERT, and ColBERT in the single-vector retrieval setting, and show that it also works well with other advanced training techniques such as retrieval-oriented pre-training and hard negative mining.
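To make the core idea concrete, below is a minimal sketch of pooling multi-layer representations into a single retrieval vector. It is not the paper's implementation: the choice of layers (4, 8, 12), the use of the [CLS] token, mean pooling, and the bert-base-uncased checkpoint are all illustrative assumptions; the paper studies which layers and pooling strategies actually work best.

```python
# Minimal MLR sketch (assumptions: BERT encoder, [CLS] vectors, mean pooling).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def mlr_single_vector(text, layers=(4, 8, 12)):
    """Encode `text`, take the [CLS] vector from several encoder layers,
    and mean-pool them into one single-vector representation."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = encoder(**inputs, output_hidden_states=True)
    # hidden_states[0] is the embedding layer; encoder layer i is hidden_states[i].
    cls_vectors = torch.stack(
        [outputs.hidden_states[i][0, 0] for i in layers]  # [CLS] from each layer
    )
    return cls_vectors.mean(dim=0)  # pool the multi-vector set down to one vector

doc_vec = mlr_single_vector("Dense passage retrieval with multi-layer representations.")
query_vec = mlr_single_vector("what are multi-layer representations for retrieval")
score = torch.dot(query_vec, doc_vec)  # dot-product relevance score
```

Keeping the per-layer vectors instead of pooling them would correspond to the multi-vector retrieval setting the abstract mentions; the pooling step is what recovers single-vector efficiency.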
