arXiv · token optimization · 2024-12-24

Token-Budget-Aware LLM Reasoning

Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen


Abstract

Reasoning is critical for large language models (LLMs) to excel in a wide range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM performance by decomposing problems into intermediate steps, they also incur significant overhead in token usage, leading to increased costs. This paper finds that the reasoning process of current LLMs is unnecessarily lengthy and can be compressed by including a reasonable token budget in the prompt. The authors propose a token-budget-aware LLM reasoning framework (TALE) that dynamically adjusts the number of reasoning tokens based on the reasoning complexity of each problem. Experiments show that the method effectively reduces token costs in CoT reasoning with only a slight performance reduction.
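The core idea, placing a token budget in the prompt and scaling it with problem difficulty, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `estimate_budget` and `budgeted_prompt` are hypothetical, the word-count heuristic is only a stand-in for whatever complexity estimator TALE actually uses, and the prompt wording is one plausible phrasing rather than a confirmed template from the paper.

```python
def estimate_budget(question: str, min_budget: int = 16, max_budget: int = 256) -> int:
    """Hypothetical stand-in for a complexity estimator.

    Scales the budget with the question's word count and clamps it to
    [min_budget, max_budget]. This is NOT the estimator from the paper.
    """
    words = len(question.split())
    return max(min_budget, min(max_budget, words * 4))


def budgeted_prompt(question: str, budget: int) -> str:
    """Append a token-budget instruction to a chain-of-thought prompt.

    The instruction wording is an assumption for illustration.
    """
    return (
        f"{question}\n"
        f"Let's think step by step and use less than {budget} tokens."
    )


if __name__ == "__main__":
    question = "A train travels 60 km in 1.5 hours. What is its average speed?"
    budget = estimate_budget(question)
    # The resulting string would be sent to an LLM in place of a plain CoT prompt.
    print(budgeted_prompt(question, budget))
```

In a real pipeline the heuristic would presumably be replaced by a per-problem estimate of reasoning complexity, since the abstract describes the budget as adjusted dynamically for each problem.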

Relevance to Tokalator

Both TALE and Tokalator treat token expenditure as a controllable variable rather than a side effect. TALE controls tokens at the model-reasoning level; Tokalator controls tokens at the IDE-session level via per-turn preview and budget decomposition. The paper was published in Findings of ACL 2025.

cs.CL · cs.AI
