When a task is too large for a single conversation, let Claude write the orchestration as a script.
On May 28, Anthropic released Claude Opus 4.8, along with a new feature—Dynamic Workflows.
Let's start with a number: 11 days, roughly 750,000 lines of Rust code, 99.8% of existing tests passing. That's the result Bun author Jarred Sumner delivered when migrating the entire Bun runtime from Zig to Rust, and the workhorse behind this migration was Dynamic Workflows.
Released as a research preview, it targets tasks too large for a single agent to complete in one pass. Here's how the official blog puts it:
Some problems are too big for one pass by a single agent, especially in complex, legacy codebases: a bug hunt across an entire service, a migration that touches hundreds of files, a plan you want stress-tested from every angle before you commit to it.
These tasks share a common trait: their scale exceeds what a single conversation can coordinate. Dynamic Workflows' answer is to let Claude write the orchestration as an executable script.
From Subagent to Workflow
To understand where Workflow fits, we need to map out the existing collaboration layers in Claude Code.
At the bottom is a single session: one agent instance works from start to finish, processing sequentially. One level up is the subagent—the main agent spawns helpers to search files, read code, run commands, and report back. Above that are Agent Teams, introduced earlier this year, where multiple independent Claude Code instances collaborate in parallel like a team, with agents able to communicate among themselves.
These layers share a common bottleneck: the orchestrator is always Claude itself. It decides turn by turn what to spawn next, and every subagent result must return to Claude's context window before it can decide the next step. This mechanism is flexible for small tasks, but when coordinating dozens or hundreds of parallel tasks, problems arise: the context window can't hold all the intermediate results, and Claude's attention gets diluted by the flood of process information.
Workflow takes a different approach. This time, Claude doesn't orchestrate turn by turn. Instead, it writes the entire orchestration as a JavaScript script—loops, branches, and intermediate result collection are all baked into code—then hands it off to an independent runtime for execution. The official docs summarize this shift succinctly:
A workflow moves the plan into code. With subagents and skills, Claude is the orchestrator: it decides turn by turn what to spawn next, and every result lands in Claude's context. A workflow script holds the loop, the branching, and the intermediate results itself, so Claude's context holds only the final answer.
In other words, the plan moves out of the conversation and into code. This follows the same line of thinking as "fitting large codebases into limited context windows": that approach solves "how to conserve context," while Workflow solves "what to do when the workload simply won't fit in context."
Putting Agent Teams and Dynamic Workflows side by side:
[Image description: From Anthropic product manager Cat Wu's tweet announcing Dynamic Workflows. On the left, Agent Teams show a mesh-like collaboration; on the right, Dynamic Workflows shows a tree structure where "one Claude fans out to hundreds of tasks, each task goes through implementer → two verifiers → fixer, then fans back in"—the "one file with two reviewers" design in the Bun case is already visible in this diagram.]
Where Does Workflow Actually Run?
Before diving into the architecture, let's clear up a common misconception: many people assume Workflow is some orchestration engine running on Anthropic's server, so they go looking for its API protocol or worry about third-party proxy compatibility.
The reality is: the Workflow tool itself doesn't make any server requests. It's a JavaScript orchestration script that Claude Code runs on your local machine—agent(), parallel(), pipeline() are all control flows executing on your computer. What *does* make server requests are the subagents spawned by agent() calls in the script, and those subagents call the model the same way your main conversation window does.
This has a direct implication: if you're using a third-party API proxy and a Workflow fails, the Workflow tool itself is not the cause. Its subagents use the same interface Claude Code always uses to call models, which is Anthropic's native Messages API (not OpenAI's /v1/chat/completions).
With this relationship clear, Workflow's execution model becomes straightforward: a brainless, deterministic JavaScript runtime acts as the conductor—it only loops, concatenates strings, and awaits, containing no LLM itself. Only when the script reaches an agent(...) line does the runtime temporarily hire an LLM subagent to do work. And the "real agent"—the main Claude you're chatting with—isn't running at all during script execution: it finishes its turn after issuing the Workflow call, the script runs independently in the background, and when it's done, a notification wakes the main agent to read the final result.
One sentence sums up this division of labor: The JavaScript runtime is the conductor (brainless, deterministic), temporarily hiring LLMs at agent() points to do work, while the main agent sleeps through the whole process, only woken at the end to read the result. Keep this in mind, and all the other features will make sense.
Workflow's Core Architecture
When a Workflow runs, four components work together:
[Image description: The key line in this diagram is that subagent results flow into script variables first, where loops and validation happen inside the runtime, and only the summarized answer returns to Claude. This is fundamentally different from the subagent model where "every result passes through Claude's brain."]
The official docs provide a comparison table that clearly shows the positioning differences between Workflow, Subagents, and Skills:
[Image description: When reading this table, pay attention to the row "What is reusable." Subagents and skills reuse "a worker" or "an instruction," while Workflow reuses the entire orchestration logic—meaning a complex flow that coordinates hundreds of agents for cross-validation, once written, can be saved and run repeatedly. This is what sets Workflow apart from other primitives.]
What the Script Looks Like: meta, Primitives, and the Easiest Pitfall
Since Workflow is a JS script, what does it actually look like? Let's break down the skeleton.
Every script must start with export const meta = {...}, and meta must be a pure literal—no variables, function calls, or template interpolation. It defines the script's name, a one-line description (shown in the permission popup), and stage divisions:
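A minimal sketch of what such a header might look like. The field names (`name`, `description`, `stages`) and the shape of the stage list are assumptions inferred from the description above, not confirmed API. In a real script the declaration would be written as `export const meta = ...`; `export` is dropped here only so the snippet runs as a plain Node script.

```javascript
// Hypothetical meta block: the field names are assumptions, not confirmed API.
// In a real workflow script this would read `export const meta = { ... }`.
const meta = {
  name: "bug-hunt",
  description: "Scan the service for bugs and verify each finding",
  stages: ["scan", "verify", "report"],
};

// Forms the "pure literal" rule above would reject:
//   name: PREFIX + "-hunt"          (variable)
//   description: makeDescription()  (function call)
//   name: `hunt-${target}`          (template interpolation)

console.log(meta.name, meta.stages.length);
```

Keeping the header literal presumably lets the tool read the name and description without executing the script, which would fit with the description appearing in the permission popup before anything runs.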
After meta comes the script body. The core primitives are few enough to fit on one table:
Each agent() call's prompt is written directly in the script as a regular JS string. To feed different inputs to different agents, a common pattern is to write the prompt as a "function that returns a string," using .map() to interpolate loop variables. Data flows between agents the same way, through plain string concatenation with no message bus: per-item data is interpolated into each agent's prompt via .map(), and cross-stage data is passed by JSON.stringify-ing the previous stage's return value and concatenating it into the next stage's prompt.
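The two data-passing patterns described above can be sketched with nothing but strings. The file names, the prompt wording, and the shape of the findings are hypothetical; no agent is actually called here, the snippet only builds the prompts that would be handed to agent().

```javascript
// Per-item data: the prompt is a function that returns a string.
const reviewPrompt = (file) =>
  `Review ${file} for bugs. Reply with a JSON array of findings.`;

const files = ["src/auth.js", "src/db.js", "src/api.js"]; // hypothetical inputs
const prompts = files.map(reviewPrompt); // one prompt per file

// Cross-stage data: serialize the previous stage's return value into the
// next stage's prompt. `findings` stands in for what stage 1 returned.
const findings = [{ file: "src/db.js", issue: "unclosed connection" }];
const verifyPrompt =
  "Independently verify each finding below and drop false positives:\n" +
  JSON.stringify(findings, null, 2);

console.log(prompts[0]);
console.log(verifyPrompt);
```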
The easiest pitfall in the entire script is confusing pipeline and parallel. The fundamental difference is whether there's a barrier: parallel waits for the entire batch to finish before moving on, while pipeline lets each item flow through all stages independently, without waiting for others. The following pattern is a classic waste:
If 5 tasks have varying speeds, this barrier forces the fast tasks to wait for the slow ones. The correct approach is to put the intermediate transform into a pipeline stage, letting each piece of data move to the next step as soon as it's ready.
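To make the barrier concrete, here is a small simulation in plain JavaScript. It does not use the real parallel() or pipeline() primitives, whose signatures are not shown in this article; `withBarrier` and `withPipeline` are stand-ins built on Promise.all, and the two stages are timers rather than agents.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Stand-ins for two stages of work; `log` records the order of events.
async function stage1(item, log) {
  await sleep(item.ms);
  log.push(`s1-done:${item.id}`);
  return item;
}
async function stage2(item, log) {
  log.push(`s2-start:${item.id}`);
  return item.id;
}

// Barrier style: every item must finish stage 1 before any item starts stage 2.
async function withBarrier(items) {
  const log = [];
  const mid = await Promise.all(items.map((it) => stage1(it, log)));
  await Promise.all(mid.map((it) => stage2(it, log)));
  return log;
}

// Pipeline style: each item enters stage 2 as soon as its own stage 1 is done.
async function withPipeline(items) {
  const log = [];
  await Promise.all(items.map(async (it) => stage2(await stage1(it, log), log)));
  return log;
}

const items = [{ id: "fast", ms: 10 }, { id: "slow", ms: 60 }];
withBarrier(items).then((log) => console.log("barrier: ", log.join(" ")));
withPipeline(items).then((log) => console.log("pipeline:", log.join(" ")));
```

In the barrier log, the fast item's stage 2 only starts after the slow item has finished stage 1; in the pipeline log it starts right after its own stage 1.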
Only three situations truly require a barrier: when the next stage needs to deduplicate or merge the full set; when you need to exit early based on total count (e.g., "skip the entire validation stage if there are 0 bugs"); or when the next stage's prompt needs to reference "all other findings" for cross-comparison. Otherwise, when in doubt, use pipeline.
Three Ways to Trigger
There are three ways to get Claude to start a Workflow.
The first is to use the word "workflow" directly in your prompt. Claude Code will highlight it, indicating that this phrase might trigger a workflow:
If you mention "workflow" casually without intending to trigger one, press Alt+W to dismiss the highlight.
The second is to enable ultracode mode, letting Claude decide whether to start a Workflow:
The third is to run a saved Workflow, which appears as a slash command.
Saved Workflows are stored in two locations, determining their visibility:
Press Tab while saving to switch between these two locations. If a project-level and a personal-level Workflow share the same name, the project-level one takes priority. Once saved, it becomes a /<name> command that appears in the slash autocomplete menu alongside regular slash commands.
After the script is generated but before it actually runs, you have a chance to review what it's about to do. While running, pressing Ctrl+G opens the script in the editor, letting you see exactly what code Claude wrote. This "code is visible and reviewable" property is what makes Workflow more reassuring than black-box automation.
Execution Model and Hard Constraints
Workflow's runtime is isolated from your conversation—the script runs in a separate environment, and your session remains responsive during execution. This isolation also brings a set of hard constraints you need to know.
Concurrency has a cap: at most 16 subagents can run simultaneously (actually min(16, CPU cores − 2), lower on machines with fewer cores), and a single run can have at most 1000 agents. The latter number is a fuse to prevent runaway scripts from burning money in an infinite loop.
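The cap formula and the effect of a concurrency limit can be sketched as follows. `concurrencyCap` is a direct transcription of min(16, CPU cores - 2) from the text; `runWithLimit` is a hypothetical worker-pool helper written for illustration and is not how the real runtime is implemented.

```javascript
// Direct transcription of the formula quoted above.
function concurrencyCap(cpuCores) {
  return Math.min(16, cpuCores - 2);
}

// Illustrative worker pool: at most `limit` tasks are in flight at once.
// `tasks` is an array of zero-argument async functions.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // claim the next task index
      results[i] = await tasks[i]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results; // results keep the order of `tasks`
}

console.log([4, 8, 18, 64].map(concurrencyCap)); // [ 2, 6, 16, 16 ]
```

On an 8-core laptop the formula yields 6 concurrent subagents, so a fan-out of 100 agents would run in roughly 17 waves rather than all at once.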
For permissions, all subagents spawned inside a Workflow automatically run in acceptEdits mode—file edits no longer prompt for confirmation one by one—and they inherit your current session's tool allowlist. However, one type of action can still interrupt execution: shell commands, web scraping, and MCP tools not in the allowlist will still pop up confirmation dialogs mid-run. So the official recommendation is: before running at scale, add the commands your agents will need to the allowlist, to avoid getting stuck by a permission popup halfway through.
One more thing that's easy to overlook: the script itself has no direct filesystem or shell access—all reading, writing, and execution must go through subagents. The script is purely the "scheduling brain"; the "hands and feet" belong to the subagents.
Resumability is a unique capability of Workflow compared to subagents and Agent Teams. Each run leaves a journal; after modifying the script, you can rerun with resumeFromRunId, and unchanged agent() calls hit the cache directly—only the modified parts and everything after them are re-executed. This is very convenient for debugging orchestration logic—change one line of prompt and rerun, and the agents that already ran correctly hit the cache, saving both time and money. But note one boundary: resumption only works within the same session. You can pause mid-run and continue without losing completed work; but once you exit Claude Code, the next time you come in, this Workflow has to run from scratch.
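A toy model of the caching behavior described above, for intuition only. How the real journal is keyed and stored is not covered in this article; here it is assumed to be an ordered list of `{ prompt, result }` entries, and `createRun` and `callModel` are hypothetical names. The model only covers sequential calls.

```javascript
// Toy journal: replay cached results until the first changed prompt,
// then re-execute that call and everything after it.
function createRun(previousJournal, callModel) {
  const journal = [];
  let position = 0;
  let cacheValid = true;

  async function agent(prompt) {
    const previous = previousJournal[position];
    position += 1;
    if (cacheValid && previous && previous.prompt === prompt) {
      journal.push(previous);
      return previous.result; // cache hit: no model call
    }
    cacheValid = false; // first change: this call and all later ones re-run
    const result = await callModel(prompt);
    journal.push({ prompt, result });
    return result;
  }

  return { agent, journal };
}

async function demo() {
  let modelCalls = 0;
  const callModel = async (prompt) => {
    modelCalls += 1;
    return `result for: ${prompt}`;
  };
  const first = createRun([], callModel);
  for (const p of ["scan", "verify", "report"]) await first.agent(p);
  const second = createRun(first.journal, callModel);
  for (const p of ["scan", "verify carefully", "report"]) await second.agent(p);
  return modelCalls; // 3 calls in the first run + 2 in the second
}

demo().then((n) => console.log("model calls:", n));
```

In the second run only "scan" is served from the journal; the edited "verify carefully" call and the "report" call after it go back to the model.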
What the Built-in deep-research Looks Like
To make things less abstract, Anthropic includes a ready-made Workflow for you to try out: /deep-research.
Its usage is simply to follow it with a question:
When it runs, it goes through four steps: first, it fans out a batch of web searches from multiple angles; then it fetches and cross-references the sources; next, it votes on each claim; finally, it produces a report with citations, discarding claims that fail cross-validation.
It depends on the WebSearch tool being available. The value of this built-in Workflow is that it bakes "fighting hallucinations" into the orchestration—a single agent searching is easily led astray by one source, but multi-path search plus cross-voting essentially uses structured process to approximate facts. If you want to experience what Workflow feels like, running a /deep-research is the lowest-cost entry point.
Ultracode: Letting Claude Decide Whether to Start a Workflow
If you don't want to manually judge "is this task worth starting a Workflow for" every time, you can delegate that decision to Claude—by enabling ultracode.
This command does two things simultaneously: it pushes reasoning effort to xhigh, and it allows Claude to automatically decide when to use a Workflow for your task. Once enabled, a single request might be broken into multiple consecutive Workflows: for example, first one to understand the code, then one to make changes, and finally one to verify. This setting applies to every task in the current session; new sessions reset it.
The cost is straightforward: the official docs state clearly that each request in ultracode mode consumes significantly more tokens and time compared to lower effort settings. To switch back to daily work mode, just dial it down:
This "let the model decide its own scheduling scale" design takes a different direction from Codex's Goals (persistent objectives), which we'll discuss in more detail later.
Is Workflow a DAG?
When it comes to orchestration, many people's first thought is DAG (Directed Acyclic Graph)—Airflow, Argo, GitHub Actions' needs:, all of which first draw a static dependency graph and then execute along it. Is a Workflow the same?
The answer depends on which graph you're asking about: as a program, it's not necessarily a DAG; but the execution trace of any single run is always a DAG.
Let's start with the program level. Claude Code's Workflow is a Turing-complete imperative JavaScript program. Unlike traditional orchestrators that lock down the dependency graph before execution, it can express things a DAG cannot—most notably, loops. For example: "keep finding bugs until two consecutive rounds produce no new ones":
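A minimal sketch of that loop. `findBugs` is a hypothetical async function standing in for whatever agent() call does the searching; it receives the bugs seen so far and returns a list of bug identifiers.

```javascript
// Keep hunting until `quietRoundsNeeded` consecutive rounds find nothing new.
async function huntUntilQuiet(findBugs, quietRoundsNeeded = 2) {
  const seen = new Set();
  let quietRounds = 0;
  let rounds = 0;

  while (quietRounds < quietRoundsNeeded) {
    rounds += 1;
    const found = await findBugs([...seen]);
    const fresh = found.filter((bug) => !seen.has(bug));
    fresh.forEach((bug) => seen.add(bug));
    // A round with new findings resets the streak of quiet rounds.
    quietRounds = fresh.length === 0 ? quietRounds + 1 : 0;
  }

  return { bugs: [...seen], rounds };
}
```

The number of iterations depends entirely on what the agents return, which is exactly what a static dependency graph cannot encode ahead of time.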
Beyond loops, it can also write branches determined at runtime (if (bugs.length === 0) return—which path to take depends on the LLM's output from the previous step), and dynamic fan-out (how many agents to spin up in the next phase depends on how many results the previous phase returned)—the shape of the graph is unknown in advance. At the "program structure" level, it has cycles and data-driven branches, which is strictly more expressive than a DAG.
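The runtime branch and the dynamic fan-out can be sketched together. The `agent` parameter is a hypothetical stand-in for the real agent() primitive (assumed here to take a prompt and resolve to a string), Promise.all stands in for parallel(), and the prompts and JSON reply format are made up for illustration.

```javascript
async function scanThenVerify(agent) {
  // Stage 1: one agent lists candidate bugs (assumed to reply with a JSON array).
  const bugs = JSON.parse(
    await agent("List suspected bugs as a JSON array of strings.")
  );

  // Runtime branch: nothing found, so the verification stage never runs.
  if (bugs.length === 0) return { bugs, verdicts: [] };

  // Dynamic fan-out: one verifier per bug; the count is only known at run time.
  const verdicts = await Promise.all(
    bugs.map((bug) => agent(`Verify this bug report, answer "real" or "false": ${bug}`))
  );
  return { bugs, verdicts };
}
```

Whether this run has one node or one plus N nodes is decided by the first agent's output, so the graph's shape cannot be drawn before execution.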
But if you look at "a single specific execution," it is always a DAG. Two reasons: data only flows forward in time—a value cannot depend on a value produced after it; and loops are unrolled—the agents in the N+1th iteration of a while loop are different nodes from those in the Nth iteration. The cycle in the "program" is flattened into a chain in the "trace."
One sentence to remember: A program with while is not a DAG, but the execution trace it produces in a single run is always a DAG. This is precisely what makes it more flexible than traditional DAG orchestrators—traditional tools lock down the graph before execution, while Workflow's topology is produced by running an imperative script, with the shape determined at runtime.
Real Case 1: Bun's Migration from Zig to Rust
None of the concepts above are as convincing as the Bun case. Take another look at the numbers from the beginning: 11 days, roughly 750,000 lines of Rust, 99.8% of existing tests passing, from the first commit to merge.
Bun is a JavaScript runtime written in Zig, with performance as its hallmark. Migrating such a massive runtime from Zig to Rust in its entirety is a daunting piece of engineering: the Rust borrow checker's strict memory ownership requirements alone would make manual migration a nightmare. Jarred Sumner tackled it with three chained Workflows.
Phase one was lifetime mapping. The first Workflow did one thing: for every struct field in the Zig codebase, it worked out the corresponding, correct Rust lifetime. This step was isolated because it was the foundation for all subsequent migration work. Rust's memory safety relies on lifetime annotations; if this layer wasn't computed correctly, the resulting .rs files wouldn't even compile.
Phase two was parallel file migration, the part that best demonstrates Workflow's scale advantage. The next Workflow migrated each .zig file into a behaviorally equivalent .rs file, with hundreds of agents working simultaneously, each file also assigned two reviewers for cross-checking. Comparing this scale to Agent Teams reveals the gap—Agent Teams hits a coordination ceiling at three to five members, while Workflow here runs hundreds of agents in parallel with double review.
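The "one file, two reviewers" shape can be sketched as follows. This is not the script used for the Bun migration, which has not been published in this article; `agent` is a hypothetical stand-in for the real primitive, the prompts are invented, and reviewers are assumed to reply with the literal string "OK" when they find nothing.

```javascript
// One unit of work: implementer -> two independent reviewers -> optional fixer.
async function migrateFile(file, agent) {
  const draft = await agent(`Port ${file} to a behaviorally equivalent Rust file.`);

  const reviews = await Promise.all([
    agent(`Reviewer A: list behavioral differences in this port of ${file}:\n${draft}`),
    agent(`Reviewer B: list behavioral differences in this port of ${file}:\n${draft}`),
  ]);
  const issues = reviews.filter((review) => review.trim() !== "OK");

  if (issues.length === 0) return { file, code: draft, fixed: false };

  const fixedCode = await agent(
    `Fix these issues in the port of ${file}:\n${issues.join("\n")}\n\n${draft}`
  );
  return { file, code: fixedCode, fixed: true };
}

// Fan out over all files; each file runs through its own chain independently.
async function migrateAll(files, agent) {
  return Promise.all(files.map((file) => migrateFile(file, agent)));
}
```

Because each file carries its own review chain, a slow or contentious file does not hold up the others, and the main conversation never sees the drafts or review comments.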
Phase three was the compile-and-test fix loop. File migration was only half the battle; the real challenge was getting them to compile and pass tests. The third Workflow drove the entire build and test suite, looping to fix until both ran cleanly. This is a classic example of the while loop pattern from the previous section—iterating via script logic rather than Claude watching round by round.
The characteristics of each phase can be summarized in a table:
| Phase | Task | Parallelism | Key Feature |
|-------|------|-------------|-------------|
| 1 | Lifetime mapping | Single agent | Foundation for correctness |
| 2 | File migration | Hundreds of agents + dual review | Massive fan-out |
| 3 | Build & test fix loop | Iterative | while loop pattern |
But it didn't end there. After the migration was merged, an overnight workflow was run to handle cleanup—scanning for unnecessary data copies, opening a separate PR for each optimization found for human final review. This usage pattern—"running unattended overnight to clean up long-tail issues, producing a batch of PRs ready for review"—is an interesting side of Workflow.
To be clear, the official documentation notes that this Rust version of Bun was not yet in production at the time—the entire pipeline ran and tests passed, but it was still far from deployment. Jarred himself said he would write a dedicated article with more details later.
Real Case 2: Auditing 133 Historical Sessions with Workflow
Bun is an extreme case; most people won't get the chance to run a cross-language migration. I tried a more down-to-earth task myself: using Workflow to create a "usage profile" of my own Claude Code historical sessions.
The task: the ~/.claude directory had accumulated 133 sessions, 130MB of jsonl records. I wanted to extract usage patterns, recurring pain points, and automation opportunities. This task was characterized by large data volume and multiple dimensions—perfect for fan-out.
The entire task was split into two parts: "main agent preprocessing + Workflow orchestration." The main agent first did reconnaissance and cleaning: the jsonl was full of tool call noise, which would waste context if fed directly to an agent. So it first wrote a script to compress the 133 sessions into "title + real user input + metadata," yielding 601 real human inputs, then split them into 10 batches. Then Workflow took over: 10 analysis agents each processed one batch in parallel (extracting domain distribution, pain points, automation candidates according to a unified schema), and finally one synthesis agent aggregated and deduplicated across batches, producing a prioritized report.
The execution body of this run, stripped of declarations, looked roughly like this:

    const batches = splitIntoBatches(inputs, 10); // the 601 inputs, split into 10 batches
    const results = await parallel(batches.map(batch => analyzeBatch(batch)));
    const report = await synthesize(results);

The bill: 11 agents, 818,000 tokens, 254 seconds. I also hit a snag: the first run crashed with TypeError: undefined is not an object (evaluating 'batches.map'), because args wasn't passed correctly and was treated as a string. The fix demonstrated the value of Workflow's "script as file, iterable" nature: instead of resending the entire script, I simply edited the script file on disk, hardcoded the path to be self-contained, and reran it with scriptPath.
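The splitIntoBatches helper referenced in that snippet is not shown in the original run. One plausible version, assuming simple contiguous chunking (the actual split may have been different), looks like this:

```javascript
// Split `items` into at most `batchCount` contiguous batches of near-equal size.
function splitIntoBatches(items, batchCount) {
  if (items.length === 0 || batchCount <= 0) return [];
  const size = Math.ceil(items.length / batchCount);
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 601 placeholder inputs, mirroring the count from the audit.
const inputs = Array.from({ length: 601 }, (_, i) => `input-${i}`);
const batches = splitIntoBatches(inputs, 10);
console.log(batches.map((batch) => batch.length));
```

With 601 inputs and 10 batches this yields nine batches of 61 and a final batch of 52. A guard such as `Array.isArray(batches)` before the `.map` call would also have turned the TypeError above into a clearer failure.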
This case also answers a common question: "Couldn't you just use a few subagents for this? What's the difference?" Yes, you could. The difference isn't in "can it be done," but in "where does the orchestration logic live, and where do intermediate results flow?" If you use the Agent tool to dispatch 10 subagents, all 10 results come back to the main context as tool results, and you have to read them all and decide how to merge them in the next turn: the orchestration logic is ad-hoc judgment, and every coordination step burns main context tokens. Workflow, on the other hand, encodes orchestration as code; the 10 intermediate results never enter the main context, and only the final report comes back, with automatic schema validation and concurrency management.
But honestly, for this one-off "map-reduce" task, the gap isn't that big—10 parallel paths, one merge, subagents would suffice. Where Workflow truly pays off is with complexity that scales: when there are multiple phases, loops (keep going until N consecutive rounds with no new findings), multi-round adversarial validation, or fan-out to dozens of units, manual coordination with subagents becomes painful, while scripting it feels natural.
The Fundamental Difference from n8n, Coze, Dify
Seeing "use code to orchestrate multi-step flows + insert LLMs in steps" naturally sparks a thought: isn't this just n8n, Coze, Dify? The only difference is that now the model does the orchestration automatically.
This intuition captures the most critical commonality, but the phrase "the only difference is automatic model orchestration" needs some nuance.
First, the commonality is stronger than you might think. Anthropic's *Building Effective Agents* gives an authoritative definition: Workflows are systems where LLMs and tools are orchestrated via "predefined code paths"; in contrast, Agents are systems where the LLM dynamically directs its own flow at runtime. By this definition, Dynamic Workflow and n8n/Dify/Coze are in the same category—their control flow is deterministic; the LLM doesn't decide "which edge to take next" at runtime. Once the script is written, the runtime executes it mindlessly; the LLM only works inside the nodes. What's *not* in this category is the main conversation's ReAct-style agent (where the model decides the next step in real time). This judgment is entirely correct.
But "the only difference" misses two harder differences. Let's lay them out:
| Aspect | n8n / Dify / Coze | Dynamic Workflow |
|--------|-------------------|------------------|
| Author | Human | Model (generated on the fly) |
| Carrier | Visual DAG | Turing-complete imperative code |
| Expressiveness | Limited (no loops, no dynamic fan-out) | Full (loops, dynamic branches, dynamic fan-out) |
| Reusability | Yes (saved as templates) | Yes (saved as scripts in .claude/workflows/) |
| Node autonomy | Fixed connectors | Each node is an autonomous agent |
Condensing the table into one sentence: Workflow ≈ replacing n8n's graph with a piece of code generated on the fly by the model. It swaps two things: the author (human → model) and the carrier (visual DAG → imperative code). The first brings immediacy and customization (no pre-building, tailored to the task at hand); the second brings increased expressiveness (loops and dynamic fan-out, which visual DAGs cannot do).
As for the most eye-catching difference—"AI automatic orchestration"—it needs to be more precise: AI's involvement happens at the moment of "writing the code," not at the moment of "running the flow." n8n is human-written orchestration with deterministic runtime execution; Workflow is model-written orchestration with deterministic runtime execution—the model is asleep during execution. Both run flows the same way; the difference is only in who wrote the orchestration script.
It's worth noting that both sides are converging: Coze and Dify are adding agent nodes (nodes that become autonomous) and code nodes (where you can write JS/Python), moving toward "code + autonomous nodes"; while Workflow's scripts can be saved into .claude/workflows/ as reusable products, moving toward "build once, reuse many times." So a more accurate conclusion is: your fundamental judgment stands—both are deterministic flow orchestration with LLMs as steps; but the differences go beyond "AI automatic orchestration"—they also include the carrier (Turing-complete code vs. visual DAG) and the fact that each node is an autonomous agent rather than a fixed connector.
How to Hand-Roll a Workflow Before the Official Release
Since Workflow is essentially "deterministic script + calling LLM at nodes," it's entirely feasible to hand-roll something similar before the official release. The core building block is just one thing: claude -p.
claude -p (i.e., --print, headless mode) runs an entire agent loop non-interactively—thinking, calling tools, modifying files—and exits when done. It reads stdin and writes stdout, so it can be piped like any normal CLI tool. Treating each step as a claude -p call, with a shell or Python script as the orchestration loop, gives you a DIY Workflow:

    # Step 1: Analyze codebase
    # (assumes the reply is raw JSON shaped like {"bugs": [{"id": ...}]})
    claude -p "Analyze the codebase for bugs" > analysis.json

    # Step 2: Fix each bug in parallel
    for bug in $(jq -r '.bugs[].id' analysis.json); do
      claude -p "Fix bug $bug" &
    done
    wait

    # Step 3: Verify fixes
    claude -p "Verify all fixes pass tests"

Comparing this to the 133-session Workflow earlier, they're isomorphic: & plus wait is the parallel() barrier; the $(...) substitution and the $bug interpolation play the role of variable interpolation in prompts. The community has plenty of such practices. The futuresearch.ai article used claude -p with filesystem polling to build an 18-way parallel scanning pipeline: sub-agents wrote results to disk (.json for success, .error for failure), and the orchestrator only polled filenames without pulling outputs into context, reducing complexity from O(n × output size) to O(n × filename).
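The filename-polling idea can be sketched as a pure function. The file naming scheme and the `summarizeByFilename` helper are hypothetical, written only to illustrate the pattern as described above; in practice the list of names would come from a directory listing such as Node's `fs.readdirSync`.

```javascript
// Classify result files by extension without ever reading their contents.
function summarizeByFilename(filenames, expectedIds) {
  const done = new Set();
  const failed = new Set();
  for (const name of filenames) {
    if (name.endsWith(".json")) {
      done.add(name.slice(0, -".json".length));
    } else if (name.endsWith(".error")) {
      failed.add(name.slice(0, -".error".length));
    }
  }
  // Anything expected but not yet on disk is still running.
  const pending = expectedIds.filter((id) => !done.has(id) && !failed.has(id));
  return { done: [...done], failed: [...failed], pending };
}

const status = summarizeByFilename(
  ["scan-1.json", "scan-2.error", "notes.txt"],
  ["scan-1", "scan-2", "scan-3"]
);
console.log(status);
```

The orchestrator's cost per poll depends only on how many files exist, not on how large each agent's output is, which is where the O(n × filename) figure comes from.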
So what does the official Workflow add over the hand-rolled version? The answer: the model hasn't changed; what's been eliminated is all the engineering grunt work.
One sentence: Workflow productizes this hand-rolled harness, saving engineering grunt work, not changing the model. Understanding this gives you a clear sense of its capability boundaries—it's not magic; it's a runtime that polishes "claude -p + orchestration loop" into something smooth enough to use.
When to Use Workflow
Not every task needs a Workflow. It essentially buys speed and coverage by running many agents in parallel, and parallel agents burn tokens. So when does the trade-off make sense?
The official documentation identifies four categories of suitable scenarios.
First, codebase-wide batch audits, such as full-repo bug scanning, performance profiling-guided optimization audits, and security audits. These tasks share the characteristic of "search plus independent verification"—Claude searches the entire service in parallel, then independently verifies each finding to ensure the report surfaces only real issues. Authorization checks, input validation, and full-repo hardening for dangerous patterns follow the same shape.
Second, large-scale migration and modernization, including framework replacement, API deprecation migration, and cross-language porting—Bun is the most extreme example.
Third, critical decisions that require repeated deliberation. When the cost of a wrong answer is high, have Claude approach the problem from multiple independent angles, then dispatch adversarial agents to try to overturn those results, iterating until the answer converges—this adversarial validation can approach a quality level unattainable in a single run.
Fourth, long-tail cleanup, like Bun's overnight workflow—automatically scanning for issues and opening PRs one by one.
Conversely, these types of tasks are a poor fit for Workflow: small fixes that take one or two steps, exploratory work where you need to make frequent mid-course decisions, and changes to high-risk code like security and payment systems.
Putting Claude Code's existing collaboration primitives together, the selection logic looks roughly like this:
| Primitive | Best for |
|-----------|----------|
| Subagent | "Running errands" (simple, independent tasks) |
| Agent Teams | "Having meetings" (collaborative discussion) |
| Workflow | "Assembly line" (multi-step, parallel, iterative) |
One sentence to distinguish: use subagent when you need "legwork," use Agent Teams when you need "discussion," use Workflow when you need "pipeline processing."
Availability and How to Enable
Dynamic Workflows is currently in research preview, with requirements on both version and plan.
Version: Claude Code v2.1.154 or higher is required. Platform coverage is quite broad: CLI, Desktop, VS Code extension, Claude API, and the three major clouds—Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
Plan differences to note: On Max, Team, and API usage, it's enabled by default; on Pro, it's disabled by default and must be manually enabled in /config under the "Dynamic workflows" line; on Enterprise, it's also disabled by default at launch and must be enabled by an admin in settings. The first time you trigger a Workflow, Claude Code will display what's about to run and ask for confirmation—it won't run silently.
If you want to disable it completely, there are several levels of switches:
- Per-session: /config toggle
- Environment variable: Set CLAUDE_CODE_DYNAMIC_WORKFLOWS=0 at startup
- Organization-wide: Managed settings or admin console
Individuals can toggle it directly in /config, while organizations can disable it uniformly via managed settings or the admin dashboard.
Token Cost: A Must-Calculate
This is the part you absolutely have to account for. Workflow's power comes from massive parallelism, and parallelism burns tokens at scale. The Bun migration, for example, consumed hundreds of thousands of tokens per phase. For the 133-session audit, 11 agents burned 818,000 tokens in 254 seconds—that's roughly 3,200 tokens per second.
Is it worth it? It depends on the task's value. For a one-time audit that saves hours of manual work, 800K tokens is cheap. For a daily cleanup script, you'd want to optimize aggressively—perhaps by reducing parallelism, using cheaper models for sub-tasks, or caching intermediate results.
The key insight: Workflow's token cost scales with the number of parallel agents and the complexity of each node. A simple linear workflow with 3 agents might cost 50K tokens; a massive fan-out with 100 agents could easily hit 5M tokens. Always estimate before you run, and monitor after.
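A back-of-the-envelope estimate can be derived from the audit figures quoted above. The linear extrapolation is a deliberate simplification: it assumes every agent in a new job costs about as much as the average agent in the 133-session audit, which will not hold for heavier or lighter nodes.

```javascript
// Figures from the 133-session audit.
const run = { agents: 11, tokens: 818000, seconds: 254 };

const tokensPerSecond = Math.round(run.tokens / run.seconds);
const tokensPerAgent = Math.round(run.tokens / run.agents);

// Naive linear extrapolation: agent count times average cost per agent.
function estimateTokens(agentCount, perAgent = tokensPerAgent) {
  return agentCount * perAgent;
}

console.log(tokensPerSecond); // 3220
console.log(tokensPerAgent); // 74364
console.log(estimateTokens(100)); // 7436400
```

At this per-agent average, a 100-agent fan-out lands in the multi-million-token range, the same order of magnitude as the rough figure given above.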
The official documentation is unusually candid here: the token cost of a single Workflow run is significantly higher than a normal Claude Code session. The logic is straightforward—dozens or even hundreds of subagents running in parallel, each burning tokens, plus the overhead of cross-validation, adversarial review, and other "extra redundancy" mechanisms. The bill naturally goes up. In the earlier 133-session case, 11 agents consumed 818K tokens—and that's just a lightweight 10-path orchestration. All of this counts against your plan’s usage and rate limits.
A few practical tips from the official docs are worth jotting down. First, start with a small, well-scoped task to get a feel for how much it costs in your workflow before deciding to let it run on bigger jobs. Second, before scaling up, use /model to make sure you're on the right model, and ask Claude to switch to a smaller one for stages that don't need the strongest capabilities—you don't have to use the most expensive brain at every step. Third, as mentioned earlier, pre-allow-list your commands to avoid mid-run permission popups interrupting a task that's been running for hours.
The good news: you can stop a Workflow at any time, and work already done won't be wasted.
Known Limitations
Since this is still in research preview, here are the boundaries so you don't hit them later:
- No human input during execution: Except for permission confirmation popups, once a Workflow starts running, it won't stop to wait for your sign-off. If your process requires phased approvals, break it into multiple independent Workflows.
- The script itself has no file or shell access: All reads, writes, and execution are handled by subagents; the script only orchestrates.
- Concurrency and total limits: Maximum 16 concurrent subagents, with a cap of 1,000 agents per run.
- Not recoverable across sessions: If you exit Claude Code, the Workflow starts from scratch the next time you open it.
- How to pass parameters to custom Workflows: The built-in /deep-research accepts a question parameter, but the official docs are still vague on how to pass parameters to your own saved Workflows.
The entire feature is still in research preview, so behavior and constraints may change with releases.
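To make the concurrency ceiling and the per-run cap concrete, here is a minimal sketch of how such limits behave, written with Python's standard `asyncio`. It is not the Workflow script API, which this article does not reproduce. `run_capped` and `fake_agent` are hypothetical names, and the "agent" merely sleeps. The only numbers taken from the text are 16 and 1,000.

```python
import asyncio

MAX_CONCURRENT = 16          # concurrency ceiling quoted in the limitations list
MAX_AGENTS_PER_RUN = 1_000   # per-run agent cap quoted in the limitations list


async def run_capped(tasks, worker, limit=MAX_CONCURRENT, budget=MAX_AGENTS_PER_RUN):
    """Run worker(task) for every task, never exceeding `limit` at once.

    Returns (results, peak), where peak is the highest concurrency observed.
    """
    tasks = list(tasks)
    if len(tasks) > budget:
        raise ValueError(f"{len(tasks)} tasks exceeds the per-run budget of {budget}")

    gate = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def guarded(task):
        nonlocal running, peak
        async with gate:              # waits here once `limit` workers are active
            running += 1
            peak = max(peak, running)
            try:
                return await worker(task)
            finally:
                running -= 1

    results = await asyncio.gather(*(guarded(t) for t in tasks))
    return results, peak


async def fake_agent(task_id):
    """Hypothetical stand-in for a subagent; it only sleeps briefly."""
    await asyncio.sleep(0.01)
    return f"done:{task_id}"


results, peak = asyncio.run(run_capped(range(40), fake_agent))
print(len(results), "tasks finished; peak concurrency:", peak)
```

The practical consequence is that a fan-out wider than the ceiling does not fail. It queues, so wall-clock time grows in batches while total token cost stays roughly the same.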
Where It Sits in the Claude Code Ecosystem
At this point, it's worth laying out the several extension primitives of Claude Code in one table for a full overview:
[Table not provided in this chunk, but implied—keep as reference.]
From top to bottom, control over orchestration shifts step by step from you to Claude, and then to code. Workflow sits at the far right—you give it a single task description, and the orchestration is handled by Claude-written scripts, scaling up to hundreds of agents.
This brings us back to the foreshadowing at the beginning: why Dynamic Workflows and Opus 4.8 launched on the same day. When you have hundreds of agents working in parallel and cross-reviewing each other's conclusions, the reliability of each node becomes critical. Each node is a probabilistic LLM; ask the same question from a different angle and you may get a different answer. Uncertainty compounds across multi-step flows. Opus 4.8 was specifically strengthened in this area: the official docs say it "makes code defects roughly four times less likely to slip past unnoticed" compared to the previous generation, and it's more likely to flag things it's unsure about. This kind of honesty improvement is exactly what makes "hundreds of agents cross-validating" viable—for cross-validation to work, the reviewer has to actually point out problems, not just nod along. A strong model isn't optional for Workflow; it's the load-bearing wall.
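The "uncertainty compounds" point can be put into numbers with a toy model. Everything below is an assumption for illustration: steps are treated as independent, and the defect and miss rates are invented. Only the "four times" ratio between the two reviewer miss rates echoes the figure quoted above; none of these numbers are measurements of any model.

```python
def chain_reliability(per_step: float, steps: int) -> float:
    """Probability that every step in a sequential chain is correct (independence assumed)."""
    return per_step ** steps


def slip_rate(defect_rate: float, miss_rate: float) -> float:
    """Probability that a node produces a defect AND the reviewer fails to flag it."""
    return defect_rate * miss_rate


def any_slip(per_node_slip: float, nodes: int) -> float:
    """Probability that at least one unflagged defect gets through across `nodes` nodes."""
    return 1 - (1 - per_node_slip) ** nodes


print(f"10-step chain at 95% per step: {chain_reliability(0.95, 10):.1%}")
print(f"10-step chain at 99% per step: {chain_reliability(0.99, 10):.1%}")

# Hypothetical rates: 10% of nodes contain a defect; the weaker reviewer misses
# 20% of them, the stronger one misses 5% (a fourfold reduction).
DEFECT_RATE = 0.10
OLD_MISS, NEW_MISS = 0.20, 0.05

for label, miss in (("weaker reviewer", OLD_MISS), ("stronger reviewer", NEW_MISS)):
    per_node = slip_rate(DEFECT_RATE, miss)
    print(f"{label}: {any_slip(per_node, 20):.1%} chance of a missed defect across 20 nodes")
```

Under these made-up rates, a modest per-node improvement turns into a large difference at the level of the whole run, which is the intuition behind treating reviewer quality as structural for large fan-outs.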
Finally, let's zoom out and put Workflow in a bigger picture. Making "orchestration as code" an explicit product choice is worth pondering: orchestration evolves from a transient behavior that the model improvises every time into a code asset that can be read, reviewed, saved, and run repeatedly. Comparing it with Codex's Goals reveals an interesting divergence. Both aim to solve "how to keep pushing forward on large tasks," but they take opposite paths. Codex Goals bets on goal persistence: pin the goal and let the model figure out the steps by itself. Claude Code Workflow goes the orchestration-as-code route: write the process as a script, and rely on the script to keep the flow on track. One governs "where to go," the other governs "how to get there"; these are two different engineering philosophies. It's too early to say which path will go farther, but the fact that both leading products are putting serious effort into "handling super-scale tasks" shows this is the main battleground for AI coding in the next phase.
A Personal Take
To close, here is my own judgment. I think Dynamic Workflows are extremely powerful and likely represent the direction of the future. Their viability rests on two pieces fitting together: one is turning "orchestration" from the model's improvisation into controllable code; the other is having a frontier model honest and strong enough to support hundreds of agents cross-validating each other, which is exactly why the feature needed to launch on the same day as Opus 4.8. The frontier model is the foundation; orchestration as code is what makes "hundreds of agents collaborating without going off the rails" possible. Both are indispensable.
And precisely because it's so deeply dependent on the capabilities of frontier models, my judgment is this: within a year, this approach of "the model writing the orchestration script on the fly, then directing a fleet of agents" will go from being a research preview at one company to becoming the default in almost every coding agent.
By the way, the research behind this article was itself run as a Workflow: fifteen agents working in parallel—reading primary conversation logs, cross-referencing industry materials—using 270K tokens, taking 169 seconds, and finally assembling the results into a single piece of source material. Using Workflow to explain Workflow is probably the most direct dogfooding possible.
References
- Dynamic workflows official documentation
- Introducing dynamic workflows in Claude Code (official blog post)
- Claude Opus 4.8 release notes
- Run agents in parallel (official comparison: subagent / agent teams / workflow)
- Building Effective Agents (Anthropic's definitions of workflows and agents)
- Headless / claude -p documentation
- Bun “Rewrite Bun in Rust” PR (oven-sh/bun #30412)