ParallelSearch: Train your LLMs to Decompose Query and Search Sub-queries in Parallel with Reinforcement Learning

¹NVIDIA ²Pennsylvania State University
ParallelSearch Overview

ParallelSearch revolutionizes how LLMs interact with search engines by enabling parallel query execution instead of sequential processing, dramatically improving both efficiency and effectiveness.

Abstract

Reasoning-augmented search agents such as Search-R1, trained via reinforcement learning with verifiable rewards (RLVR), demonstrate remarkable capabilities in multi-step information retrieval from external knowledge sources. These agents overcome the limitations of their parametric memory by dynamically gathering relevant facts to tackle complex reasoning tasks. However, existing approaches suffer from a fundamental architectural limitation: they process search queries strictly sequentially, even when handling inherently parallelizable and logically independent comparisons. This sequential bottleneck significantly constrains computational efficiency, particularly for queries that require multiple entity comparisons. To address this limitation, we propose ParallelSearch, a novel reinforcement learning framework that empowers large language models (LLMs) to recognize parallelizable query structures and execute multiple search operations concurrently. Our approach introduces dedicated reward functions that incentivize the identification of independent query components while preserving answer accuracy, jointly considering correctness, query decomposition quality, and parallel execution benefits. Comprehensive experiments demonstrate that ParallelSearch outperforms state-of-the-art baselines by an average of 2.9% across seven question-answering benchmarks. Notably, on parallelizable questions, our method achieves a 12.7% performance improvement while requiring only 69.6% of the LLM calls made by sequential approaches.

Method

ParallelSearch Architecture

ParallelSearch employs a novel reinforcement learning framework with a multi-component reward structure. The architecture identifies when queries can be parallelized and executes them concurrently, significantly reducing the time and computational resources required for complex question answering.
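To make the multi-component reward concrete, the following is a minimal sketch of how a scalar training reward could combine the three signals the paper names (answer correctness, decomposition quality, and parallel execution benefit). The weights, helper arguments, and scoring rules here are illustrative assumptions, not the paper's exact formulation.

```python
def joint_reward(answer_correct: bool,
                 n_independent_subqueries: int,
                 n_search_rounds: int,
                 decomposition_valid: bool,
                 w_correct: float = 1.0,
                 w_decomp: float = 0.5,
                 w_parallel: float = 0.5) -> float:
    """Combine correctness, decomposition, and parallelism into one scalar.

    All weights and component definitions are hypothetical; the paper only
    states that the three signals are considered jointly.
    """
    # Correctness: did the final answer match the verifiable ground truth?
    r_correct = 1.0 if answer_correct else 0.0

    # Decomposition quality: reward well-formed splits into >1 independent
    # sub-queries (a query with a single sub-query gained nothing).
    r_decomp = 1.0 if (decomposition_valid and n_independent_subqueries > 1) else 0.0

    # Parallel execution benefit: fewer sequential search rounds relative to
    # the number of independent sub-queries means more work ran concurrently.
    r_parallel = 0.0
    if n_independent_subqueries > 0:
        r_parallel = 1.0 - (n_search_rounds / n_independent_subqueries)

    return w_correct * r_correct + w_decomp * r_decomp + w_parallel * r_parallel
```

With these illustrative weights, a correct answer whose two independent sub-queries were issued in a single search round scores 1.0 + 0.5 + 0.25 = 1.75, while the same answer obtained via two sequential rounds scores only 1.5.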

Key Features

  • Parallel Search Architecture: Train LLMs to perform multiple search operations concurrently
  • Reinforcement Learning Training: Use PPO for stable and efficient model training
  • Scalable Implementation: Built on vLLM and Ray for distributed training and inference
  • Comprehensive Evaluation: Tested on 7 benchmarks including NQ, HotpotQA, and MultihopRAG
  • Open Source: Full codebase and pre-trained models available for research use
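The core efficiency idea in the features above — issuing independent sub-queries concurrently instead of one search round each — can be sketched in a few lines. The `search` retriever below is a hypothetical stand-in; a real deployment would call an actual search engine or retrieval service.

```python
from concurrent.futures import ThreadPoolExecutor

def search(query: str) -> str:
    # Placeholder retriever for illustration only; in practice this would
    # issue a request to a search engine or retrieval backend.
    return f"results for: {query}"

def parallel_search(sub_queries: list[str]) -> dict[str, str]:
    """Dispatch all independent sub-queries in one concurrent batch."""
    with ThreadPoolExecutor(max_workers=len(sub_queries)) as pool:
        results = list(pool.map(search, sub_queries))
    return dict(zip(sub_queries, results))

# Example: a comparison question decomposed into two independent lookups,
# retrieved in a single round rather than two sequential ones.
docs = parallel_search(["Who founded NVIDIA?", "Who founded Intel?"])
```

Because the sub-queries are logically independent, the wall-clock cost of a batch is roughly that of the slowest single retrieval, which is where the reduction in sequential LLM calls comes from.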

Results

Main Results

[Figure: performance comparison across the seven QA benchmarks]

Search Efficiency Analysis

[Figure: number of searches per question, ParallelSearch vs. sequential baselines]

Case Study

[Figure: case study of a decomposed query executed in parallel]

ParallelSearch achieves state-of-the-art performance across multiple benchmarks while significantly reducing inference time and API calls compared to sequential search baselines.

BibTeX

@article{zhao2025parallelsearch,
  title={ParallelSearch: Train your LLMs to Decompose Query and Search Sub-queries in Parallel with Reinforcement Learning},
  author={Zhao, Shu and Yu, Tan and Xu, Anbang and Singh, Japinder and Shukla, Aaditya and Akkiraju, Rama},
  journal={arXiv preprint arXiv:2508.09303},
  year={2025}
}