Why Prompt Efficiency Is the New AI Governance Metric: WVU Medicine’s Brian Dilcher on Epic Agent Factory Token Costs

Brian Dilcher of WVU Medicine identifies token consumption as a critical new challenge for healthcare AI governance. He notes that inefficient prompt engineering within Epic’s Agent Factory can cause a single AI agent to consume 12 times more tokens than an optimized version, potentially leading to significant, unforeseen operational costs for health systems.

As healthcare organizations integrate generative artificial intelligence through Epic Systems’ new capabilities, the economic impact of “token consumption” is shifting from a technical detail to a primary budgetary concern. Dilcher’s observations suggest that the way developers build AI agents will directly dictate the long-term financial viability of these tools within clinical environments.

How do token costs impact healthcare AI budgets?

In the context of Large Language Models (LLMs), a “token” is the basic unit of text the model processes. A token can be a single character, a whole word, or a fragment of a word. Because most generative AI services bill customers by the number of tokens processed (input) and generated (output), every instruction sent to an AI agent carries a price.
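As a rough illustration of how per-token billing turns into a per-call price, here is a minimal sketch. The prices and the names `TokenPrice`, `call_cost`, and `EXAMPLE_PRICE` are assumptions made up for this example; they are not Epic's or any vendor's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical prices in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


def call_cost(input_tokens: int, output_tokens: int, price: TokenPrice) -> float:
    """Return the dollar cost of a single model call."""
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


# Illustrative numbers only; real rates depend on the vendor contract.
EXAMPLE_PRICE = TokenPrice(input_per_million=3.0, output_per_million=15.0)

if __name__ == "__main__":
    # A 2,000-token prompt that produces a 500-token answer.
    print(f"${call_cost(2_000, 500, EXAMPLE_PRICE):.4f} per call")
```

Under these assumed rates, the shorter output accounts for more than half of the cost of the call, which is why input and output tokens are worth tracking separately.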

Traditionally, healthcare IT budgets for Electronic Health Record (EHR) systems such as Epic’s have been based on predictable, subscription-style licensing models. However, the shift toward generative AI introduces a variable cost model. When a health system deploys an agent via Epic’s Agent Factory, the total cost is no longer just the software license; it also includes every interaction that agent has with a patient’s data or a clinician’s query.

Dilcher’s warning regarding the 12-fold difference in token usage highlights a massive scalability risk. If a hospital deploys 1,000 AI agents to assist with documentation, and those agents are built with inefficient prompts, the organization could face costs that are an order of magnitude higher than anticipated. This volatility makes it difficult for Chief Financial Officers (CFOs) to forecast the ROI of AI implementations.
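To make the scaling risk concrete, the arithmetic can be sketched as follows. Only the 1,000 agents and the 12x multiplier come from the scenario above; the call volume, tokens per call, and blended price are hypothetical inputs chosen for illustration.

```python
def monthly_cost(
    agents: int,
    calls_per_agent_per_day: int,
    tokens_per_call: int,
    price_per_million: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend in dollars from a blended per-token price."""
    total_tokens = agents * calls_per_agent_per_day * tokens_per_call * days
    return total_tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200 calls per agent per day, $5 per million tokens.
    optimized = monthly_cost(1_000, 200, 1_500, 5.0)
    inefficient = monthly_cost(1_000, 200, 1_500 * 12, 5.0)  # 12x the tokens
    print(f"optimized:   ${optimized:,.0f} per month")
    print(f"inefficient: ${inefficient:,.0f} per month")
```

Because cost is linear in tokens per call, the 12x difference in prompt size carries straight through to a 12x difference in the monthly bill, whatever the actual volumes are.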

Why does prompt efficiency vary so significantly between developers?

The discrepancy in token usage often stems from how “prompts”—the instructions given to the AI—are structured. Two developers can task an AI agent with the same goal, such as “summarize this clinical note,” but their methods may differ wildly in efficiency.

One developer might use a “verbose” prompt, including excessive context, redundant instructions, or poorly structured data that forces the LLM to process unnecessary information. Another developer might use “prompt engineering” techniques to create a concise, high-density instruction set that achieves the same result using a fraction of the tokens. This difference is not merely a matter of style; it is a matter of fundamental resource management.

The technical reasons for this variance include:

  • Context Window Management: Including too much irrelevant patient history in a single prompt increases the token count for every subsequent turn in the conversation, because chat-style agents typically resend the accumulated context with each new request.
  • Instruction Redundancy: Repeating the same rules multiple times within a prompt inflates the cost without necessarily improving accuracy.
  • Output Formatting: Asking an AI to provide highly decorative or overly wordy responses increases the “output tokens,” which are often more expensive than “input tokens.”
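The context-window effect in the first bullet can be sketched with a simple model. It assumes, as a simplification, that every turn resends the fixed context plus all prior turns, and that each turn adds a constant number of tokens to the history; `cumulative_input_tokens` is a hypothetical helper, not part of any Epic tooling.

```python
def cumulative_input_tokens(context_tokens: int, tokens_per_turn: int, turns: int) -> int:
    """Total input tokens billed across a conversation.

    Assumes each request resends the fixed context (for example, patient
    history) plus everything said so far, including the current turn.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn         # this turn joins the running history
        total += context_tokens + history  # the whole bundle is sent as input
    return total


if __name__ == "__main__":
    bloated = cumulative_input_tokens(context_tokens=8_000, tokens_per_turn=200, turns=10)
    trimmed = cumulative_input_tokens(context_tokens=1_000, tokens_per_turn=200, turns=10)
    print(bloated, trimmed)
```

In this toy model, trimming the fixed context from 8,000 to 1,000 tokens cuts the input billed over a ten-turn conversation from 91,000 to 21,000 tokens, since the context is paid for on every turn.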

What role does Epic’s Agent Factory play in clinical workflows?

Epic’s Agent Factory is designed to allow health systems to build specialized, autonomous AI “agents” that can perform specific tasks within the EHR. These agents are not just simple chatbots; they are programmed to interact with clinical data, follow specific workflows, and assist with tasks like clinical documentation, scheduling, or administrative triage.

Because these agents operate within the highly regulated environment of a hospital, they require a level of precision and reliability that standard consumer AI lacks. The Agent Factory provides the framework for these agents to exist, but the responsibility for their efficiency and accuracy falls on the builders—the IT professionals and clinical informaticists within the health system.

As these agents become more integrated into daily workflows, the “agentic” nature of the technology becomes a cost driver. An autonomous agent may “think” through a problem by making multiple internal calls to the LLM, each of which consumes tokens. If the agent’s logic is circular or inefficient, it can rapidly deplete a department’s AI budget.
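One common guard against runaway agent loops is a hard token budget combined with a step limit. The sketch below is an assumption about how such a guard could look. `step` is a hypothetical stand-in for one LLM call that reports its own token usage, and none of these names come from Agent Factory.

```python
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when an agent run has consumed more tokens than allowed."""


class TokenBudget:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> None:
        # Record actual consumption first, then stop any further calls.
        self.used += tokens
        if self.used > self.limit:
            raise BudgetExceeded(f"used {self.used} of {self.limit} tokens")


def run_agent(
    step: Callable[[int], Tuple[int, bool]],
    budget: TokenBudget,
    max_steps: int = 8,
) -> int:
    """Run step(i) until it reports done; return the number of steps taken.

    step(i) is a hypothetical callable that returns (tokens_used, done).
    """
    for i in range(max_steps):
        tokens, done = step(i)
        budget.charge(tokens)
        if done:
            return i + 1
    raise RuntimeError("agent did not finish within max_steps")
```

The budget is checked after each call, so the run can overshoot the limit by at most one call's worth of tokens; a stricter design would estimate the cost before each call.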

How can AI governance committees monitor token usage?

Dilcher suggests that prompt efficiency must become a standard metric for AI governance committees. In many health systems, governance has traditionally focused on clinical safety, data privacy, and HIPAA compliance. While those remain paramount, the economic dimension of AI is becoming equally vital.

To manage these costs, governance committees may need to implement several new oversight layers:

1. Technical Auditing of Prompts

Before an AI agent is moved from a testing environment to a live clinical setting, its “prompt architecture” should be audited. This involves reviewing the instructions to ensure they are as lean as possible without sacrificing the safety or accuracy of the output.
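Part of such an audit could be automated. The following sketch flags repeated instruction lines and gives a very rough size estimate; the "about four characters per token" figure is only a rule of thumb for English text, and `audit_prompt` and its threshold are hypothetical. A real audit would use the model vendor's own tokenizer.

```python
from typing import Dict, List, Union


def audit_prompt(prompt: str, max_estimated_tokens: int = 800) -> Dict[str, Union[int, bool, List[str]]]:
    """Flag repeated instruction lines and estimate prompt size."""
    seen = set()
    duplicates: List[str] = []
    for line in prompt.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # Compare lines case-insensitively with whitespace collapsed.
        key = " ".join(stripped.lower().split())
        if key in seen:
            duplicates.append(stripped)
        else:
            seen.add(key)

    # Heuristic assumption: roughly 4 characters per token in English.
    estimated_tokens = len(prompt) // 4
    return {
        "estimated_tokens": estimated_tokens,
        "duplicate_lines": duplicates,
        "over_budget": estimated_tokens > max_estimated_tokens,
    }


if __name__ == "__main__":
    sample = "Summarize the note.\nUse plain language.\nsummarize the note.\n"
    print(audit_prompt(sample))
```

A check like this only catches mechanical waste. Whether a trimmed prompt is still safe and accurate needs human review and testing against representative cases.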

2. Consumption Dashboards

IT leaders require real-time visibility into token consumption. Monitoring tools should be able to break down costs by department, by specific agent, and by individual user. This allows administrators to identify “expensive” agents that may need to be redesigned.
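The breakdown described above amounts to grouping usage records by a chosen dimension. Here is a minimal sketch, assuming a hypothetical `UsageRecord` log format; the actual fields exposed by any vendor's monitoring tools may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class UsageRecord(NamedTuple):
    """Hypothetical log entry for one agent call."""
    department: str
    agent: str
    user: str
    input_tokens: int
    output_tokens: int


def total_tokens_by(records: Iterable[UsageRecord], field: str) -> Dict[str, int]:
    """Sum tokens per department, agent, or user, largest consumer first."""
    if field not in ("department", "agent", "user"):
        raise ValueError(f"cannot group by {field!r}")
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        totals[getattr(record, field)] += record.input_tokens + record.output_tokens
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    log = [
        UsageRecord("ED", "triage", "u1", 1_000, 200),
        UsageRecord("ED", "triage", "u2", 500, 100),
        UsageRecord("Cardiology", "notes", "u3", 3_000, 900),
    ]
    print(total_tokens_by(log, "agent"))
```

Sorting with the largest consumer first puts the agents most worth redesigning at the top of the report.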

3. Standardized Prompt Libraries

To prevent the 12x cost variance Dilcher identified, health systems can develop a centralized library of “gold-standard” prompts. By using pre-vetted, highly efficient instructions, developers can ensure consistency and cost-control across the organization.
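A centralized library can be as simple as a versioned registry of approved templates. The sketch below is one possible shape, with hypothetical names (`ApprovedPrompt`, `PromptLibrary`) and a made-up template; it does not describe an Agent Factory feature.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ApprovedPrompt:
    """A template that has passed the organization's review (hypothetical)."""
    name: str
    version: int
    template: str


class PromptLibrary:
    def __init__(self) -> None:
        self._prompts: Dict[str, ApprovedPrompt] = {}

    def register(self, prompt: ApprovedPrompt) -> None:
        """Add a prompt; a newer version must have a higher version number."""
        current = self._prompts.get(prompt.name)
        if current is not None and current.version >= prompt.version:
            raise ValueError(f"{prompt.name} v{current.version} is already registered")
        self._prompts[prompt.name] = prompt

    def render(self, name: str, **fields: str) -> str:
        """Fill in the approved template; raises KeyError for unknown names."""
        return self._prompts[name].template.format(**fields)


if __name__ == "__main__":
    library = PromptLibrary()
    library.register(ApprovedPrompt(
        name="summarize_note",
        version=1,
        template="Summarize the note in at most {max_words} words:\n{note}",
    ))
    print(library.render("summarize_note", max_words="80", note="(note text)"))
```

Developers then call `render` instead of writing their own instructions, so every agent that summarizes notes pays for the same vetted wording.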

Key Takeaways for Healthcare IT Leaders

  • Variable Costs are the New Normal: Transitioning from fixed software licenses to token-based AI usage requires a shift in how clinical AI is budgeted.
  • Prompt Engineering is a Financial Skill: The ability to write efficient prompts is no longer just a technical requirement; it is a cost-containment strategy.
  • Governance Must Expand: AI governance committees must include metrics for token efficiency and operational cost alongside safety and privacy.
  • Scalability Depends on Efficiency: An AI tool that works in a pilot program may become financially unsustainable when scaled to an entire hospital system if token usage is not optimized.

Frequently Asked Questions

What exactly is a token in AI?

A token is a chunk of text that an LLM processes. Think of it as the “currency” of generative AI. A single word might be one token, or a complex word might be split into three or four tokens.

Why is prompt efficiency so important for hospitals?

Hospitals operate on thin margins. If an AI tool becomes too expensive to run because of inefficient instructions, it could lead to budget cuts in other critical areas or the discontinuation of the AI tool altogether.

Is Epic’s Agent Factory the only way to use AI in healthcare?

No, but it is one of the most integrated ways. Using AI directly within the EHR minimizes the need for clinicians to switch between different applications, but it also means the AI is deeply tied to the hospital’s core data and budget.

The next major development to watch will be the release of more advanced monitoring and observability tools within the Epic ecosystem, which are expected to provide deeper insights into agentic token consumption. As these tools mature, health systems will likely move toward more automated ways of optimizing AI costs.
