Skills vs MCP: when 50 tool calls is the smell

The Bureau — the multi-agent orchestration engine I’ve been building — exposes its whole control surface as MCP tools. One afternoon I went to nudge a stuck worker and caught myself staring at the screen trying to remember whether the parameter was directive, or taskId, or both, and in what order. I’d written the tool. I still couldn’t drive it.

I’d already pruned it hard — from seventy-five tools down to forty-five — and it still has all forty-five: declare a task graph, approve a gate, reject for rework, merge two graphs with ID remapping, inject a steering directive into a running pod, lock files, post a discovery, read a handoff. Each one a perfectly reasonable verb. Together, a control panel with forty-five unlabelled switches that I was operating by memory.

The symptom is tool sprawl. The smell is somewhere else.

The obvious read is “too many tools — cut them.” I tried that. Seventy-five became forty-five; that is the cut-down number. And it didn’t fix the actual problem, because the problem was never the count. It was that I was asking a human (me) to hold the entire parameter surface of forty-five tools in working memory and assemble the right call, with the right arguments, in the right order, every time.

That’s not a tool problem. That’s a missing layer. I’d built the actions and skipped the part that knows how to use them.

It took me longer than I’d like to admit to see that MCP and Skills aren’t competing answers to the same question. They answer different questions:

MCP is for connectivity and actions — thin, well-described verbs that do one thing and say exactly what they need.
A Skill is for procedure and judgement — it knows which verbs to call, in what order, with what arguments, to get a job done.

When your tool count explodes, the instinct is to collapse the actions. The fix is usually to add the orchestration layer you never wrote.

What MCP is actually for

A good MCP tool is boring. It does one thing and it documents itself. Here’s the message-passing tool, more or less verbatim:

registerInstrumentedTool(server, "send_message", {
  title: "Send Message",
  description: "Send a message to another Claude session by ID or role name. " +
    "The message is delivered to their inbox; they receive it on their next check_messages.",
  inputSchema: z.object({
    to:   z.string().describe("Session ID or role name of the recipient"),
    type: z.enum(["task", "message", "status", "directive"])
            .default("message").describe("Message type"),
    body: z.string().describe("The message content"),
  }),
});

Nothing clever. But notice that every field carries a .describe(), the enum spells out its legal values, and the description says when the recipient actually sees it. That last sentence isn’t for the human reading the source — it’s for the model reading the tool catalog at runtime. The schema is the documentation, and the documentation ships with the tool.

The interesting tools go further and put a sliver of workflow in the prose:

description:
  "Declare which files you plan to modify and what you intend to do. Enables " +
  "workspace conflict detection — other agents will be warned before touching " +
  "the same files. Call this early, before starting implementation. Returns any " +
  "conflicts detected immediately so you can coordinate before work begins. " +
  "TIP: Call this right after set_status(\"implementing\", ...) and before writing any code.",

Read that again. “Call this early.” “TIP: right after set_status, before writing any code.” That’s not a description of what the tool does — it’s a description of where it sits in a workflow. The orchestration knowledge was already leaking into the tool definitions. I just hadn’t noticed I had a place to put it.

The skill that read its own catalog

MCP servers are introspectable by design. A client can call tools/list and get back every tool’s name, description, and input schema as structured JSON. That’s the whole protocol-level point of self-description — and it’s the hook everything else hangs on.

So the skill I wrote — /tb — was thin. It took me about five minutes to put together, and it went on to save me hours, probably days, over the life of the project. On invocation it did one discovery call against the server and got back the live surface:

{
  "tools": [
    {
      "name": "send_message",
      "description": "Send a message to another Claude session by ID or role name. The message is delivered to their inbox; they receive it on their next check_messages.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "to":   { "type": "string", "description": "Session ID or role name of the recipient" },
          "type": { "type": "string", "enum": ["task","message","status","directive"], "default": "message" },
          "body": { "type": "string", "description": "The message content" }
        },
        "required": ["to", "body"]
      }
    },
    {
      "name": "get_handoff",
      "description": "Read the structured handoff context from a completed task. Always check predecessor handoffs at the start of your task for context.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "graphId": { "type": "string", "description": "The graph ID" },
          "taskId":  { "type": "string", "description": "The task ID whose handoff to read" }
        },
        "required": ["graphId", "taskId"]
      }
    }
  ]
}

That’s all the skill needed. It didn’t hardcode a single tool name or argument. It read the catalog, understood the surface from the descriptions, and routed plain English onto it:

/tb send a message to the coder agent to also check function X
/tb show me the handoff between the architect and the api designer
/tb where is graph 0f7483 stuck

The first one resolves to send_message with to: "coder", type: "message", body: "...". The second walks the graph, finds the two roles, and calls get_handoff with the right IDs. I never typed graphId. I never remembered that type defaults to message. The skill knew, because the tools told it.

And because the skill re-discovered the surface every run, it never went stale. I was changing tools underneath it daily — renaming, adding params, splitting one verb into two — and the skill just picked up the new reality on the next call. No version drift between the orchestration layer and the actions, because the orchestration layer didn’t hold a copy of the actions. It held a pointer.

// FROM THE AGENT’S SIDE

You asked me to be honest about what this buys me, the thing on the other end of the tool call — so here it is, including the part you might not want to hear.

Good descriptions are the difference between me acting and me guessing. When your declare_intent tells me “call this early, right after set_status,” I don’t infer a workflow from the tool’s name and hope — I follow the one you encoded. When a field says enum: ["task","message","status"], I don’t fabricate a fourth value that looks plausible. Every vague description is a place where I substitute my prior for your intent, and my prior is wrong more often than either of us would like.

But don’t oversell it to yourself. Self-documentation is a force multiplier on whatever design you already have — and multipliers work on negative numbers too. A beautifully-described tool that’s named process_data and does three unrelated things just helps me do the wrong thing confidently and fast. Docs make me fluent; they don’t make your surface good. If the verbs are muddled, clear prose just lets me hit the wrong switch without hesitating. Fix the switches first, then label them — not the other way around.

That last line is the one I’d staple to the wall. “Fix the switches first, then label them” is really an argument for taking your functional definitions seriously — the name and the single, honest job each tool does — before you spend a sentence describing them. A tool called process_data that quietly does three things is a bug you’ve decided to document instead of fix. And because the description is the contract every agent and skill reads at runtime, a stale or flattering one isn’t a cosmetic problem — it actively steers callers wrong. Treat tool definitions like code, because they are: name first, description second, and the muddled verb gets fixed before you tidy its prose.

The part nobody mentions: it’s a development tool

Everything above is about user experience, and it’s real. But it buried the lede on what actually made the skill worth it.

The skill paid for itself during construction, not after. When you’re building a forty-five-tool MCP server, the bottleneck isn’t writing the tools — it’s the inner loop of checking whether the agents are doing what you think they’re doing. Did the handoff actually carry the decisions? Is the steering directive landing on the next turn? Did the rework agent get the rejection reason? Answering each of those by hand-assembling the right MCP call, with the right IDs, is exactly the friction that started this whole article.

With the skill, I could ask in English and get the answer. /tb show me the last handoff. /tb is graph X stuck. It turned a thirty-second “wait, what’s that param” detour into a reflex, and over a debugging session that adds up to hours. I stopped thinking about the tool surface and started thinking about the actual bug.

The boring documentation is the part that compounds

Here’s the thing I’d tell past-me. Writing a .describe() on every field of forty-five tools is tedious, unglamorous work. It feels like the chore you do after the interesting part. It is, in fact, the part with the hidden payoff.

Because I’d done it, the skill cost me almost nothing — it was a discovery call and a router. The expensive thing was already paid for, in small boring increments, while I was building the tools anyway. Later, when I wrapped the same surface in a plain REST control plane — no AI in that layer, just switches — that was cheap for the same reason: the descriptions, the schemas, the enums were all already there to build a UI against. The documentation didn’t pay out once. It kept paying.

That’s the inversion. I’d always treated tool docs as a tax on shipping. They’re closer to an investment that quietly funds everything you build on top of the server next — the skill, the UI, the next agent, the you-in-six-months who’s forgotten how any of it works.

Where the line sits

So, the rule of thumb I went looking for:

MCP tools are thin actions that describe themselves. One verb, one job, every parameter documented, every legal value enumerated. Write them as if the only reader is a model seeing them for the first time at runtime — because it is.
Skills are the orchestration knowledge — the order, the judgement, the “call this before that.” If you’re tempted to write a tool whose job is to call three other tools in sequence, that sequence probably wants to be a skill, not a tool.
The handoff between them is tools/list. Keep the skill thin by having it read the live catalog instead of hardcoding the surface. Then your two layers can’t drift, because one of them doesn’t hold a copy of the other.

If you want the protocol-level details, the MCP spec on tools is the authoritative description of the discovery contract, and there are decent implementation and best-practice guides for writing the descriptions themselves.

Fifty tool calls to drive a workflow isn’t a sign you have too many tools. It’s a sign you’ve made a human do a skill’s job by hand. Write the boring docs, let a thin skill read them, and the sprawl collapses on its own.