The Enterprise AI ROI Measurement Trap: Why ‘Tokenmaxxing’ Threatens IT Budgets

As organizations rush to integrate artificial intelligence into their workflows, a growing trend of tracking “token usage” as a primary performance metric is creating unintended consequences for IT departments and corporate budgets. While leaders aim to encourage AI adoption by quantifying interactions, some experts warn that focusing on input volume—often dubbed “tokenmaxxing”—risks incentivizing inefficiency and inflating operational costs without delivering actual business value.

The push to gamify AI usage has led some major corporations, including Amazon, JPMorgan, Meta, and Disney, to implement internal leaderboards that track how employees interact with AI tools. According to reports, these initiatives have sometimes resulted in extreme usage patterns, such as a Disney employee interacting with Claude AI 460,000 times over a nine-day period. This behavior, where employees inflate usage to climb rankings or meet quotas, prioritizes raw usage volume over meaningful results.

The Hidden Costs of Measuring Inputs Over Outputs

For IT leaders, the dilemma lies in balancing the need for adoption against the high, variable costs of large language models. Trevor Stewart, senior vice president at software development support firm Harness, notes that leaderboard programs often start with the positive intention of gauging how teams are using new technology, but they can inadvertently encourage poor habits. “You are effectively encouraging the wrong behavior,” Stewart explains, adding that employees may turn to complex AI models for tasks that could be handled by simpler, more cost-effective tools.
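The cost gap Stewart describes can be made concrete with a little arithmetic. The sketch below is illustrative only: the request volume, tokens per request, and per-million-token prices are made-up placeholder figures, not vendor pricing or numbers from the reporting, and the function name is hypothetical.

```python
def monthly_cost_usd(requests: int, tokens_per_request: int,
                     price_per_million_tokens: float) -> float:
    """Estimated spend for a workload at a flat per-token price."""
    total_tokens = requests * tokens_per_request
    return total_tokens * price_per_million_tokens / 1_000_000


# Hypothetical workload: a simple task that runs 100,000 times a month.
REQUESTS = 100_000
TOKENS_PER_REQUEST = 1_000

# Hypothetical prices in USD per million tokens (assumptions, not real quotes).
LARGE_MODEL_PRICE = 15.0
SMALL_MODEL_PRICE = 0.5

large = monthly_cost_usd(REQUESTS, TOKENS_PER_REQUEST, LARGE_MODEL_PRICE)
small = monthly_cost_usd(REQUESTS, TOKENS_PER_REQUEST, SMALL_MODEL_PRICE)

print(f"large model: ${large:,.2f}")
print(f"small model: ${small:,.2f}")
print(f"avoidable spend: ${large - small:,.2f}")
```

Under these assumed figures both options consume the same number of tokens, so a token leaderboard would score them identically even though one costs far more.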

The reliance on token consumption as a Key Performance Indicator (KPI) is also criticized for its susceptibility to manipulation. Logan Wolf, a partner in global enterprise innovation, AI, and sovereign technology strategy at Kyndryl, compares the practice to rewarding a software developer based solely on the number of lines of code written. Such metrics, Wolf argues, ignore critical factors like code quality, security, and overall efficiency. When token volume becomes the primary goal, quality, risk mitigation, and operational efficiency are often sidelined in favor of “showing the numbers.”

When Token Usage Masks Real Performance

The risk of relying on easily accessible data is that it provides a false sense of progress. Pendo CEO Todd Olson points out that while tracking tokens is a simple way to verify that a tool is being used, it fails to capture the nuance of value creation once the initial “zero-to-one” phase of adoption is over. “After people start using it, the situation becomes much more complex and the judgment becomes much more ambiguous,” Olson states.

This perspective is echoed by Itamar Friedman, CEO of Qodo, an AI code review firm. Friedman draws a parallel to personal health: tracking how many miles one walks daily is insufficient if it ignores caloric intake and other vital health metrics. Similarly, tracking token usage without measuring productivity or the quality of the output provides an incomplete picture of an AI project’s return on investment (ROI). In some cases, developers are encouraged to produce large volumes of AI-generated code, which may introduce severe security vulnerabilities or bugs if not properly reviewed.

Designing Better Metrics for AI Success

To avoid the pitfalls of “tokenmaxxing,” experts suggest shifting the focus from input-based metrics to output-based outcomes. For software development teams, this means prioritizing the amount of code successfully deployed to production environments rather than the raw number of tokens consumed during the generation phase. Stewart of Harness emphasizes the importance of a four-pillar approach: monitoring optimizable costs, identifying wasted expenditure, tracking total tokens, and, most importantly, verifying the actual business value of the resulting work.
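One way to picture the shift from input to output metrics is to rank the same teams twice, once by tokens and once by cost per change shipped to production. This is a minimal sketch under stated assumptions: the `TeamUsage` record, its field names, and the sample figures are all hypothetical, and "changes deployed" is only one possible proxy for business value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TeamUsage:
    """Hypothetical per-team record loosely following the four pillars."""
    team: str
    total_tokens: int            # total tokens consumed
    total_cost_usd: float        # spend attributed to those tokens
    optimizable_cost_usd: float  # spend a cheaper tool could have covered
    wasted_cost_usd: float       # spend on work that was discarded
    changes_deployed: int        # value proxy: changes shipped to production


def cost_per_deployed_change(u: TeamUsage) -> Optional[float]:
    """Spend per shipped change, or None when nothing was shipped."""
    if u.changes_deployed == 0:
        return None
    return u.total_cost_usd / u.changes_deployed


def rank_by_tokens(usages: List[TeamUsage]) -> List[str]:
    """Input-based leaderboard: most tokens first."""
    ordered = sorted(usages, key=lambda u: u.total_tokens, reverse=True)
    return [u.team for u in ordered]


def rank_by_outcome(usages: List[TeamUsage]) -> List[str]:
    """Output-based ranking: lowest cost per shipped change first.

    Teams that shipped nothing sort last.
    """
    def sort_key(u: TeamUsage):
        cost = cost_per_deployed_change(u)
        return (cost is None, cost if cost is not None else 0.0)

    return [u.team for u in sorted(usages, key=sort_key)]


# Made-up sample data for illustration only.
usages = [
    TeamUsage(team="team_a", total_tokens=9_000_000, total_cost_usd=900.0,
              optimizable_cost_usd=400.0, wasted_cost_usd=250.0,
              changes_deployed=3),
    TeamUsage(team="team_b", total_tokens=2_000_000, total_cost_usd=200.0,
              optimizable_cost_usd=20.0, wasted_cost_usd=10.0,
              changes_deployed=10),
]

print("by tokens: ", rank_by_tokens(usages))
print("by outcome:", rank_by_outcome(usages))
```

With this sample data the two rankings come out in opposite order: the team that tops the token leaderboard is the most expensive per shipped change.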

As energy costs and inference pricing remain volatile, the financial impact of unchecked token usage is becoming a significant concern for IT departments. Leaders are now tasked with aligning AI incentives with the specific goals of their organizations—ensuring that the drive for innovation does not result in unnecessary operational bloat. By focusing on tangible results, companies can move beyond the surface-level metrics of usage and toward a sustainable strategy for AI integration.

The conversation around AI governance continues to evolve as more enterprises move from pilot programs to full-scale deployment. IT leaders are expected to face increasing scrutiny over their AI spending and the efficacy of their internal adoption policies in the coming fiscal quarters.