Difficulty-aware Dynamic Chain-of-Thought Prompting for Large Language Models via BM25 and Semantic Retrieval

Abstract

Large language models (LLMs) have demonstrated strong capabilities across many domains and have become a core driving force in natural language processing. However, their reasoning can exhibit logical flaws and instability on difficult problems that require multi-step deduction, cross-domain knowledge, or implicit constraints. Conventional few-shot prompts compound the problem in two ways: they supply either redundant or insufficient exemplars, and the exemplars often fit the target problem poorly. To address these issues, we propose a dynamic Chain-of-Thought (CoT) prompting method based on problem difficulty assessment. First, the model performs a zero-shot self-evaluation of the number of solution steps the problem requires, which dynamically determines the number of exemplars. Then, BM25 retrieval selects the most similar high-quality question-answer pairs to construct a precise few-shot prompt. Experiments on multiple datasets indicate that the method addresses both limitations of traditional exemplar-based prompting, giving LLMs appropriately tailored exemplars for both multi-step mathematical reasoning and interdisciplinary question answering. As a result, accuracy on complex reasoning improves to varying degrees across tasks.
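The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The step estimate is assumed to come from a separate zero-shot LLM call and is passed in as `estimated_steps`. The mapping in `exemplar_count` (one exemplar per two estimated steps, capped at eight), the BM25 parameters `k1` and `b`, the tokenizer, and the prompt layout are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def exemplar_count(estimated_steps, max_exemplars=8):
    """Hypothetical difficulty-to-count mapping: one exemplar per two steps."""
    return max(1, min(max_exemplars, math.ceil(estimated_steps / 2)))


def build_prompt(question, pool, estimated_steps):
    """Assemble a few-shot CoT prompt from the top-k BM25 matches.

    pool is a non-empty list of (question, worked_answer) pairs.
    estimated_steps is assumed to come from a zero-shot self-evaluation call.
    """
    k = exemplar_count(estimated_steps)
    scores = bm25_scores(question, [q for q, _ in pool])
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)[:k]
    parts = [f"Q: {pool[i][0]}\nA: {pool[i][1]}" for i in ranked]
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)
```

Calling `build_prompt(question, pool, estimated_steps)` with a larger step estimate simply pulls more of the top-ranked pairs into the prompt, so harder questions receive more exemplars while easy ones stay short.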

Authors

  • Zuchen Zhuang, Beijing University of Posts and Telecommunications

DOI:

https://doi.org/10.31449/inf.v50i13.14118

Published

06/29/2026

How to Cite

Difficulty-aware Dynamic Chain-of-Thought Prompting for Large Language Models via BM25 and Semantic Retrieval. (2026). Informatica, 50(13). https://doi.org/10.31449/inf.v50i13.14118