Anthropic is Turning Claude Into an “Internal Operating System” to Redefine How Work Gets Done

For decades, the concept of an operating system has been defined by boundaries: a desktop of icons, a sequence of nested menus, and a rigid set of workflows that dictate how a user moves from point A to point B. In the corporate world, this manifests as a fragmented ecosystem of dashboards, spreadsheets, and specialized software, where the “system” is often a series of hurdles employees must navigate to get a single task done.

At Anthropic, the AI safety and research company, that logic is being dismantled. The organization is currently reorganizing itself around a premise that is as simple as it is disruptive: work no longer requires a fixed system to run through. Instead, the company is treating its flagship AI, Claude, as an internal operating system, shifting the center of gravity from software interfaces to a single, intent-driven prompt.

This transition represents more than just the adoption of a new tool; it is a fundamental shift in organizational architecture. By integrating Claude into the very fabric of its daily operations—through products like Claude Code and Cowork—Anthropic is attempting to bypass the traditional “software stack” entirely. In this new model, the AI doesn’t just assist with the work; it interprets the intent, pulls the necessary context from disparate sources, and produces outputs that render traditional analytics dashboards and manual coordination obsolete.

According to internal data from Anthropic’s research on AI transformation, employees now use Claude in approximately 60% of their work and report productivity gains of roughly 50%. However, this efficiency does not necessarily mean fewer hours worked. Rather, it has expanded the scope of what is possible: about 27% of AI-assisted tasks are projects that employees would not even have attempted without the model’s capabilities.

The Rise of ‘Skills’ and Version-Controlled Workflows

One of the primary risks of a prompt-driven environment is inconsistency. When employees rely on natural language to execute complex tasks, there is a high probability that two people will approach the same problem differently, leading to uneven quality and a lack of auditability. To solve this, Anthropic has introduced a layer of standardization known as “Skills.”

Mike Krieger, co-lead of Labs at Anthropic and co-founder and former CTO of Instagram, explains that Skills are essentially packaged, version-controlled workflows. These packages include the specific instructions, context, and proven steps required to complete a task successfully. Once a high-performing workflow is identified—such as a specific method for contract review in the finance department—it is codified as a Skill and made available to the rest of the organization.

Mike Krieger [Photo: Anthropic]

This approach transforms the AI from a general-purpose chatbot into a reproducible corporate asset. By ensuring that the work is consistent and auditable, Anthropic is bridging the gap between the probabilistic nature of Large Language Models (LLMs) and the deterministic requirements of enterprise business operations. The result is a system where a new employee can achieve the same quality of output as a veteran on day one, simply by utilizing a shared Skill.
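Anthropic has not published the internal format of these Skills, but the idea is easy to sketch: a workflow’s instructions live in a version-controlled file, and every session loads the same file as the system prompt. The sketch below is illustrative only; the directory layout, skill name, and model identifier are assumptions, not Anthropic’s actual implementation.

```python
# Illustrative sketch only: loads a version-controlled "Skill" file and uses it
# as the system prompt, so every employee runs the same audited workflow.
# The file layout, skill name, and model id are hypothetical.
from pathlib import Path

import anthropic  # pip install anthropic

SKILLS_REPO = Path("skills")  # checked into git and reviewed like any other change


def load_skill(name: str) -> str:
    """Read the packaged instructions for a named workflow."""
    return (SKILLS_REPO / name / "SKILL.md").read_text()


def run_skill(name: str, task: str) -> str:
    """Run a task through a shared, reproducible Skill."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-sonnet-4-5",           # illustrative model id
        max_tokens=2048,
        system=load_skill(name),             # same instructions for every user
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text


if __name__ == "__main__":
    print(run_skill("contract-review", "Flag any unusual indemnity terms in this draft clause: ..."))
```

Because the Skill file sits in version control, changes to the workflow are reviewable and auditable in the same way code changes are.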

From Legal Review to Code Migration: AI in Practice

The impact of this “AI-first” office is most evident in departments traditionally resistant to automation, such as legal and high-level engineering. Mark Pike, associate general counsel at Anthropic, demonstrated the flexibility of the system by building a custom legal review plugin in a single afternoon. By feeding Claude the company’s specific policies, playbooks, and legal frameworks via markdown files and system instructions hosted on GitHub, Pike created a tool that evaluates drafts against a defined legal framework, flags risks, and summarizes findings directly in Slack.

Mark Pike [Photo: Anthropic]
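Pike’s plugin itself is internal, so its details are not public. The sketch below shows one plausible shape for such a tool, assuming the playbooks live as markdown in a repository, the review runs through the Anthropic API, and findings are posted to Slack via an incoming webhook; the file names, model identifier, and webhook URL are placeholders.

```python
# Hypothetical sketch of a legal-review helper, not Pike's actual plugin.
# Policies are read from version-controlled markdown, a draft is evaluated
# against them, and a summary is posted to Slack via an incoming webhook.
from pathlib import Path

import anthropic   # pip install anthropic
import requests    # pip install requests

POLICY_DIR = Path("legal-playbooks")                    # markdown files from the repo
SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # placeholder URL


def review_draft(draft: str) -> str:
    """Evaluate a draft against the team's playbooks and return flagged risks."""
    policies = "\n\n".join(p.read_text() for p in sorted(POLICY_DIR.glob("*.md")))
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model id
        max_tokens=4096,
        system=(
            "You are a contract-review assistant. Evaluate the draft strictly "
            "against the playbooks below, flag risks, and cite the relevant policy.\n\n"
            + policies
        ),
        messages=[{"role": "user", "content": draft}],
    )
    return response.content[0].text


def post_to_slack(summary: str) -> None:
    """Send the findings to the legal channel; a human still makes the final call."""
    requests.post(SLACK_WEBHOOK, json={"text": summary}, timeout=10)


if __name__ == "__main__":
    findings = review_draft(Path("draft_msa.md").read_text())
    post_to_slack(findings)
```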

Pike notes that while the AI handles the heavy lifting of pattern-matching and first drafts—including the analysis of 742 Jira tickets in a single conversation—human oversight remains mandatory. Given that LLMs can hallucinate, the final accountability rests with the lawyer, allowing the legal team to pivot their focus toward complex negotiations and high-level judgment calls rather than administrative “busywork.”

In the engineering domain, the gains are even more pronounced. Boris Cherny, head of Claude Code, has indicated that engineering productivity has surged by 200%, measured by the number of pull requests per engineer. This capability is already being mirrored by external partners. For instance, the cloud security firm Wiz reportedly used Claude Code to migrate a 50,000-line codebase in approximately 20 hours—a task their engineers had originally estimated would take two to three months of specialized labor.

The ‘Claude Effect’ and the Benchmark Battle

The internal success at Anthropic is underpinned by what some industry observers call the “Claude Effect”—a surge in performance across key technical benchmarks. As of April 2026, the latest iterations of the model, including Claude 4.5 and 4.6 Opus, have consistently ranked at the top of industry evaluations.

On the SWE-bench, which tests a model’s ability to resolve real-world software engineering issues, Claude achieved a score of approximately 78.7%, edging out OpenAI’s GPT-5.4, which scored 76.9%. Similarly, in the Vals Index—a composite benchmark measuring performance in high-stakes domains like finance and law—the Sonnet 4.6 variant has outperformed competitors such as Google’s Gemini 3.1 Pro in overall task execution.

However, the claim that an AI model can function as an “operating system” remains a point of contention among tech architects. Jeffrey Chivers, CEO of the AI litigation platform Syllo, argues that a true operating system must provide a deterministic and stable foundation. In his view, while Claude is an exceptional tool for building such systems, calling the model itself an operating system may be a “forced effort.”

This tension was highlighted by the emergence of OpenClaw, an open-source agent framework that attempted to turn Claude into a persistent execution layer. By connecting the AI to Slack and Discord and bypassing standard API billing, developers created “always-on” agents. This led Anthropic to intervene in April 2026, blocking subscription-based access for such platforms and requiring a shift to metered API usage to protect its infrastructure from unsustainable demand.

The Paradox of Productivity: More Work, Not Less

One of the most critical insights emerging from Anthropic’s experiment is that AI does not necessarily reduce the total volume of work; instead, it increases the capacity for output. Cat de Jong, head of applied AI at Anthropic, emphasizes that the goal is not simply to do the same work faster, but to do work that was previously impossible.

Cat de Jong [Photo: Anthropic]

By developing the Model Context Protocol (MCP), Anthropic has enabled Claude to plug directly into the tools companies actually use, such as Gmail, Slack, and Salesforce. This allows the AI to interact with the world like a human—creating real files and executing code rather than just describing the process. The result is a shift in the nature of the workload: employees spend less time on manual execution and more time on the “supervisory” task of auditing and refining AI output.
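MCP is an open protocol with an official Python SDK, so the connection pattern can be sketched concretely. The minimal server below exposes a single tool that an MCP-capable client such as Claude can discover and call; the tool itself, a toy expense lookup, is invented purely for illustration.

```python
# Minimal MCP server sketch using the official Python SDK (pip install "mcp[cli]").
# It exposes one tool that a connected client such as Claude can discover and call.
# The tool's data is fake and exists only to illustrate the pattern.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("expense-reports")  # server name shown to the client


@mcp.tool()
def total_expenses(employee: str, quarter: str) -> float:
    """Return the total reported expenses for an employee in a given quarter."""
    # A real server would query Salesforce, a database, or another system of record.
    fake_ledger = {("alice", "Q1"): 1240.50, ("alice", "Q2"): 980.00}
    return fake_ledger.get((employee.lower(), quarter.upper()), 0.0)


if __name__ == "__main__":
    # Runs over stdio by default, the transport Claude Desktop uses for local servers.
    mcp.run()
```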

This shift introduces a new set of risks. Senthil Muthiah, a senior partner at McKinsey & Company, warns that agentic AI may be compressing the “apprenticeship curve.” There is a growing concern that a new generation of workers may become proficient at supervising AI before they have developed a fundamental understanding of the work itself, potentially creating a leadership vacuum in the future.

Key Takeaways: The AI-Native Workflow

  • Shift to Intent: Work is moving away from navigating software interfaces and toward a single-prompt “internal operating system.”
  • Standardization via ‘Skills’: Version-controlled, reusable workflows prevent quality decay in prompt-driven environments.
  • Expansion of Scope: AI is not just saving time; roughly 27% of AI-assisted tasks are projects employees would not otherwise have attempted.
  • The Supervision Burden: Productivity gains are partially offset by the need for rigorous human auditing to prevent “AI slop” and hallucinations.
  • Infrastructure Strain: The move toward “always-on” agents (like OpenClaw) is forcing AI labs to shift from subscription models to metered API usage.

Conclusion: The Future of Enterprise Software

Anthropic’s internal transformation is a living laboratory for the future of the enterprise. If the company’s thesis holds true, the traditional foundation of enterprise software—the rigid, feature-based application—will be replaced by a flexible, conversational layer that governs how work happens. In this world, the value of a tool is not found in its features, but in the trust a user has in the AI’s decision-making process.

The ultimate question for other organizations will be how they reinvest the “saved time.” As Jeffrey Chivers suggests, the critical signal will be whether companies use this efficiency to foster higher-order thinking and mentorship, or simply to pad short-term margins.

As Anthropic continues to iterate on its frontier models and the Model Context Protocol, the next major checkpoint will be the wider industry adoption of “agentic” workflows and the subsequent impact on professional certification and apprenticeship standards. We will continue to monitor the rollout of new Claude iterations and their integration into global enterprise stacks.

What do you reckon? Is the “AI-as-an-OS” model the future of work, or does it create too much organizational fragility? Share your thoughts in the comments below.
