Token-Budget-Aware LLM Reasoning

Abstract

Reasoning is critical for large language models (LLMs) to excel at a wide range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM performance by decomposing problems into intermediate steps, they also incur significant overhead in token usage, which increases costs. The paper finds that the reasoning process of current LLMs is unnecessarily lengthy and can be compressed by including a reasonable token budget in the prompt. The authors propose TALE, a token-budget-aware LLM reasoning framework that estimates a token budget for each problem based on its reasoning complexity and uses that budget to guide the reasoning process. Experiments show that the method effectively reduces token costs in CoT reasoning with only a slight performance reduction.
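The budget-in-the-prompt idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TALE's implementation: `estimate_budget` and `build_budgeted_prompt` are hypothetical names, the complexity heuristic (one reasoning step per number in the question) is a toy stand-in for the paper's per-problem budget estimator, and the instruction wording and default values (`base=50`, `per_quantity=25`, `cap=500`) are illustrative choices.

```python
import re


def estimate_budget(question: str, base: int = 50, per_quantity: int = 25, cap: int = 500) -> int:
    """Toy stand-in for a per-problem token-budget estimator (HYPOTHETICAL, not TALE's).

    Assumes each number in the question roughly corresponds to one reasoning step,
    so the budget grows linearly with the count of numbers, capped at `cap`.
    """
    quantities = re.findall(r"\d+(?:\.\d+)?", question)
    return min(cap, base + per_quantity * len(quantities))


def build_budgeted_prompt(question: str, budget: int) -> str:
    """Append a chain-of-thought trigger that states an explicit token budget.

    The wording is illustrative (an assumption), not quoted from the paper.
    """
    return f"{question}\nLet's think step by step and use less than {budget} tokens:"


if __name__ == "__main__":
    question = "A shop sells pens at 3 dollars each. How much do 12 pens cost?"
    budget = estimate_budget(question)
    print(build_budgeted_prompt(question, budget))
```

For the example question, the two quantities (3 and 12) yield a budget of 100 tokens. A real system would replace the toy heuristic with a better estimate of reasoning complexity; the point of the sketch is only that the budget is computed per problem and then stated in the prompt.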

Relevance to Tokalator

Both TALE and Tokalator treat token expenditure as a controllable variable rather than a side effect. TALE controls tokens at the level of the model's reasoning, through the budget stated in the prompt; Tokalator controls tokens at the level of the IDE session, through per-turn previews and budget decomposition. The paper was published in Findings of ACL 2025.
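To contrast the two levels, here is a hypothetical sketch of session-level accounting in the spirit of "per-turn preview and budget decomposition". None of these names (`SessionBudget`, `preview_turn`, `commit_turn`, `approx_tokens`) come from Tokalator, the component names are invented, and token counts use a rough four-characters-per-token approximation rather than a real tokenizer such as tiktoken.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def approx_tokens(text: str) -> int:
    # ASSUMPTION: ~4 characters per token (rough heuristic, not a real tokenizer).
    return (len(text) + 3) // 4


@dataclass
class SessionBudget:
    """Hypothetical session budget decomposed into named components."""

    limit: int            # total context window for the session
    reserved_output: int  # tokens held back for the model's reply
    components: dict[str, int] = field(default_factory=dict)  # e.g. system, files, history

    def used(self) -> int:
        return sum(self.components.values())

    def remaining(self) -> int:
        # Tokens still available for new input after all components and the reply reserve.
        return self.limit - self.reserved_output - self.used()

    def preview_turn(self, message: str) -> dict[str, int | bool]:
        # Estimate the cost of the next turn WITHOUT changing the session state.
        cost = approx_tokens(message)
        left = self.remaining()
        return {"cost": cost, "remaining_after": left - cost, "fits": cost <= left}

    def commit_turn(self, message: str) -> None:
        # Charge the turn to the "history" component once it is actually sent.
        self.components["history"] = self.components.get("history", 0) + approx_tokens(message)


if __name__ == "__main__":
    session = SessionBudget(limit=8000, reserved_output=1000,
                            components={"system": 500, "files": 4000})
    print(session.remaining())
    print(session.preview_turn("x" * 400))
```

With these illustrative values, 2,500 tokens remain for input, and a 400-character message previews at about 100 tokens. Separating `preview_turn` from `commit_turn` keeps the preview side-effect free, so a user can inspect the cost of a turn before deciding to send it.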
