Beyond the limitation of a single query: Train your LLM for query expansion with Reinforcement Learning

Shu Zhao1,2,‡, Tan Yu1,†,‡, Anbang Xu1
1NVIDIA 2Pennsylvania State University
‡Equal Contributions †Project Lead
[Figure: ExpandSearch overview]

ExpandSearch addresses the dual challenges of semantic incompleteness and information overload through an expand-then-squeeze paradigm, enabling LLMs to generate multiple query variants while using a squeezer model to distill only reasoning-critical information.

Abstract

Reasoning-augmented search agents such as Search-R1, trained via reinforcement learning with verifiable rewards (RLVR), demonstrate remarkable capabilities in multi-step information retrieval from external knowledge sources. Nevertheless, due to their limited reasoning and search capabilities, their performance on multi-hop QA benchmarks remains far from satisfactory. To handle complex or compound questions, we train an LLM-based search agent with a native query-expansion capability through reinforcement learning. In each turn, our search agent proposes several query variants, which are searched simultaneously to cover more relevant information. Meanwhile, given limited post-training data and computing resources, it is challenging for a single search agent to master multiple tasks, including query generation, retrieved-information understanding, and answer generation. We therefore incorporate a pre-trained squeezer model that helps the search agent understand the retrieved documents, allowing the agent to focus on query generation for high retrieval recall. With the assistance of the squeezer model, we find that even a small-scale 3B LLM demonstrates strong query-expansion capability and achieves state-of-the-art accuracy on multi-hop QA benchmarks. Our experiments across seven question-answering benchmarks show that our method, named ExpandSearch, achieves an average improvement of 4.4% over state-of-the-art baselines, with strong gains on multi-hop reasoning tasks that require aggregating diverse evidence.

Method

ExpandSearch employs an expand-then-squeeze strategy that addresses two fundamental limitations in existing search agents. The Expand phase generates multiple diverse query variants including syntax expansions (paraphrases and reformulations) and semantic expansions (related concepts), overcoming the brittleness of single-query retrieval. The Squeeze phase uses a frozen long-context LLM to compress retrieved chunks into compact, reasoning-focused summaries, managing information overload while preserving critical facts. This dual approach enables effective multi-hop reasoning by maximizing retrieval recall while maintaining precision.
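The overall control flow can be sketched as a small Python loop. This is a minimal illustration, not the paper's implementation: `expand_query` stands in for the RL-trained expansion agent, `search` is a toy lexical retriever, and `squeeze` stands in for the frozen long-context squeezer LLM; all three names and their logic are placeholders we introduce here.

```python
def expand_query(query):
    """Placeholder for the RL-trained agent: emit the original query plus
    a syntax expansion (paraphrase) and a semantic expansion (related
    concepts). The string templates are purely illustrative."""
    return [
        query,
        f"rephrase: {query}",
        f"related concepts for: {query}",
    ]

def search(variant, corpus):
    """Toy lexical retriever: return passages sharing any word with the
    query variant. A real system would call a dense or BM25 retriever."""
    words = set(variant.lower().split())
    return [p for p in corpus if words & set(p.lower().split())]

def squeeze(passages, max_facts=2):
    """Stand-in for the frozen squeezer LLM: deduplicate the retrieved
    passages and keep only a few as 'reasoning-critical' facts (here,
    crudely, the shortest ones)."""
    unique = sorted(set(passages), key=len)
    return unique[:max_facts]

def expand_then_squeeze(query, corpus):
    """One turn of the expand-then-squeeze loop: search every variant,
    pool the results, then compress them before they reach the agent."""
    retrieved = []
    for variant in expand_query(query):
        retrieved.extend(search(variant, corpus))
    return squeeze(retrieved)
```

The design point the sketch makes concrete is the decoupling: the expansion policy only has to maximize recall across variants, because the squeezer absorbs the resulting information overload before the agent reasons over it.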

Key Features

  • Expand-then-Squeeze Paradigm: Generate multiple query variants for comprehensive coverage, then distill retrieved content to reasoning-critical information
  • Query Expansion Types: Learn both syntax expansions (reformulations) and semantic expansions (conceptual broadening) through RL
  • Modular Architecture: Decouple expansion and squeeze components for flexible deployment and post-training optimization
  • Comprehensive Evaluation: Tested on 7 benchmarks including NQ, HotpotQA, TriviaQA, PopQA, 2WikiMultiHopQA, Musique, and Bamboogle

Results

Main Results

Performance Comparison

ExpandSearch achieves a 4.4% average improvement over state-of-the-art baselines. Remarkably, our 3B-Instruct model (0.457 EM) surpasses 7B-parameter baseline methods.

BibTeX

@article{zhao2025expandsearch,
  title={Beyond the limitation of a single query: Train your LLM for query expansion with Reinforcement Learning},
  author={Zhao, Shu and Yu, Tan and Xu, Anbang},
  journal={arXiv preprint arXiv:2510.10009},
  year={2025}
}