## Key Takeaways

- **A runbook is not a prompt.** It is a written agreement: trigger, inputs, exact steps, success criteria, escalation. Without it, the same task produces different output every week.
- **Start manual, graduate to auto.** The trust budget rule: every action runs as a draft for the human at least three times before you flip it to auto-execute. Skipping this is how teams end up rolling back a coworker after one bad email.
- **The three runbooks every team should write first** are the Monday revenue digest, the support ticket triage pass, and the pipeline reconciliation against your CRM. They cover finance, customer-facing, and pipeline data, the areas where most teams' SOPs are least well defined.
- **Test against last week's reality.** A runbook is good if running it on archived inputs produces the same output the human produced. If the numbers diverge, the runbook is wrong, not the data.
- **Runbooks fail in three predictable ways:** the trigger is ambiguous, the inputs name the wrong tool, or the success criteria are missing. All three are fixable in one editing pass.
- **A runbook is the artifact that makes an AI coworker recurring.** Once the steps are written, the same instructions become a Monday cron, a daily 7 AM digest, a weekly board prep. The runbook is what closes the loop from one-off chat to durable operation.

---

A new product manager joined a 30-person SaaS company last quarter. On her first Monday she asked the founder how the company tracks weekly revenue. The founder said, "Lena pulls it from Stripe and HubSpot and drops a Slack post Monday morning." The PM asked Lena. Lena said, "I just kind of know what to look at. I check Stripe MRR, then look at deals closing in HubSpot, then check if our biggest customer renewed, then I write a thread."

That answer is the problem. The company has run that report every Monday for two years and nobody has ever written it down. When the company wants an AI coworker to handle it, the founder hands the AI Lena's Slack thread from last week and says "do this." It works once. The next week the numbers are different and nobody is sure why.

A runbook is the missing artifact. It is the written contract between you and your AI coworker: when this trigger fires, pull these inputs, do these steps, produce this output, escalate if these conditions hit. This post is the operator's playbook for writing one.

## What a runbook actually is

A runbook is the smallest possible written description of a recurring task that any new teammate, human or AI, could execute without further questions.

### Not a prompt, not an SOP

A prompt is a single instruction sent in chat. A runbook is a durable artifact that lives in Notion, a Slack canvas, or a Google Doc and outlives any single execution. A new prompt is a new conversation. A runbook is the job description.

An SOP is usually 14 pages of Confluence with screenshots and "click the blue button." A runbook is closer to a 30-line checklist a senior teammate would write for a junior one. SOPs are oriented around a human reading a screen. Runbooks are oriented around an AI coworker that reads your tools directly.

### The five parts

The five parts of a runbook are short, named, and non-negotiable.

| Part | What it answers | Example |
| --- | --- | --- |
| Trigger | When should this run? | "Every Monday at 9 AM Warsaw" or "Every time a Stripe charge over $5,000 fires" |
| Inputs | What data is the source of truth? | "Stripe MRR (last 7 days), HubSpot deals (closed-won, last 7 days), Slack #revenue thread from last Monday" |
| Steps | What is the exact sequence? | "1. Pull Stripe MRR. 2. Calculate week-over-week delta. 3. Identify any deal over $10K. 4. Draft a Slack post in #revenue." |
| Success | How do we know it worked? | "The post lists current MRR, WoW delta as a number and a percent, and names every deal over $10K." |
| Escalation | When do we ping a human? | "If MRR fell more than 5% week-over-week, ping the founder in #revenue with the post draft and wait for review." |

The five parts are an inseparable set. Drop any one and the runbook starts producing inconsistent output.
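
To make the "inseparable set" idea concrete, here is a minimal sketch that treats the five parts as a data structure and flags any part left empty. The `Runbook` class and `missing_parts` helper are illustrative names, not part of any product API; the example values come from the table above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Runbook:
    """The five parts of a runbook, kept as plain text."""
    trigger: str
    inputs: list[str]
    steps: list[str]
    success: list[str]
    escalation: str


def missing_parts(runbook: Runbook) -> list[str]:
    """Return the names of any parts that are empty or whitespace-only."""
    missing = []
    for field in fields(runbook):
        value = getattr(runbook, field.name)
        items = [value] if isinstance(value, str) else value
        if not any(item.strip() for item in items):
            missing.append(field.name)
    return missing


digest = Runbook(
    trigger="Every Monday at 9 AM Warsaw",
    inputs=["Stripe MRR (last 7 days)", "HubSpot deals (closed-won, last 7 days)"],
    steps=[
        "Pull Stripe MRR",
        "Calculate week-over-week delta",
        "Identify any deal over $10K",
        "Draft a Slack post in #revenue",
    ],
    success=[],  # deliberately left empty to show the check firing
    escalation="If MRR fell more than 5% WoW, ping the founder in #revenue",
)

print(missing_parts(digest))  # ['success']
```

A check like this only proves that each part exists, not that it is any good. It is still a cheap gate to run before a runbook is scheduled.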

## The trust budget: start manual, graduate to auto

One pattern took us a year to internalize: the reliable shape is not "write a prompt and hope," it is "give the agent a structured workflow and a clear handoff to a human." A runbook is that handoff.

> The mistake teams make is going to full auto on day one. They test the runbook once, then set it as a cron. Two weeks later it sends a wrong number to the CFO because the underlying data shape changed and nobody noticed.

The trust budget fixes this. Every runbook step starts as a draft. The coworker drafts the post, the email, the Linear ticket, and waits for human approval. After three consecutive correct drafts, promote that step to auto. Some steps, the ones that touch money or customers, stay in draft forever. That is by design.

### What graduates and what does not

- **Internal Slack posts.** Internal-only revenue digests, ops digests, internal todo summaries. After three correct drafts, auto-send.
- **Outbound emails to customers.** Stay in draft mode forever. The cost of one wrong tone in a customer email is higher than the time saved by auto-send.
- **CRM updates.** Auto after five correct drafts, with an audit-log entry the human can roll back in one click.
- **Anything that touches a paid surface.** Auto top-up, plan changes, refunds. Stay in draft mode forever, full stop.

The failure mode we see most often is not the model "going rogue." It is a human flipping a flag from draft to auto on a workflow that was not yet stable. The trust budget is the discipline that prevents that.
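
One way to keep that discipline out of human hands is to compute the mode from the review history instead of storing it as a flag. The sketch below is an assumption-laden illustration: the category keys and the `StepTrust` class are hypothetical names, and the choice to reset the streak whenever a draft is rejected or edited is our reading of "consecutive", not a documented product behavior.

```python
from dataclasses import dataclass

# Thresholds taken from the list above. None means the step never graduates.
# The category keys are hypothetical labels for this sketch.
PROMOTION_THRESHOLDS = {
    "internal_slack_post": 3,
    "crm_update": 5,
    "customer_email": None,
    "paid_surface": None,
}


@dataclass
class StepTrust:
    category: str
    consecutive_correct: int = 0

    def record_review(self, approved_unchanged: bool) -> None:
        """Extend the streak on a clean approval, otherwise reset it to zero."""
        if approved_unchanged:
            self.consecutive_correct += 1
        else:
            self.consecutive_correct = 0

    @property
    def mode(self) -> str:
        """Derive 'draft' or 'auto' from the streak; there is no flag to flip."""
        threshold = PROMOTION_THRESHOLDS[self.category]
        if threshold is not None and self.consecutive_correct >= threshold:
            return "auto"
        return "draft"


post = StepTrust("internal_slack_post")
for approved in (True, True, False, True, True, True):
    post.record_review(approved)
print(post.mode)  # auto: three clean approvals in a row after the reset
```

Because `mode` is derived rather than stored, a customer email step stays in draft no matter how long its streak gets.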

## Three runbooks every team should write first

Write these three first. They cover the least-documented recurring work at most companies and pay back inside two weeks.

### 1. The Monday revenue digest

```prompt
Every Monday at 9 AM Warsaw, post in #revenue:

- Current MRR from Stripe (last 7 days)
- Week-over-week delta (number and percent)
- Top 3 customers by ARR
- Any deal over $10K closed-won in HubSpot last week
- Any churn event over $1K MRR with the cancellation reason

If MRR fell more than 5% WoW, ping the founder and wait for review
before posting publicly.

Format: Slack thread, dollar amounts rounded to thousands.
Pull data live, do not use any cached numbers.
```

Most teams take 90 minutes every Monday on a version of this. With a runbook the AI coworker runs it in 90 seconds and the human reviews instead of assembles.
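
The arithmetic behind that digest is small enough to pin down exactly, which helps when a reviewer asks how a number was derived. The following is a sketch of the week-over-week delta, the 5% escalation rule, and the round-to-thousands format. The function names are ours and the MRR figures are made up for illustration.

```python
def week_over_week(current_mrr: float, previous_mrr: float) -> tuple[float, float]:
    """Return (absolute delta, percent delta) versus last week."""
    if previous_mrr <= 0:
        raise ValueError("previous MRR must be positive to compute a percent change")
    delta = current_mrr - previous_mrr
    return delta, delta / previous_mrr * 100


def needs_founder_review(percent_delta: float, threshold_pct: float = 5.0) -> bool:
    """True when MRR fell by more than the threshold (a drop is a negative delta)."""
    return percent_delta < -threshold_pct


def in_thousands(amount: float) -> str:
    """Format dollars rounded to thousands, keeping the sign in front."""
    sign = "-" if amount < 0 else ""
    return f"{sign}${round(abs(amount) / 1000):,}K"


# Made-up figures: this week's MRR versus last Monday's.
delta, pct = week_over_week(116_400, 123_000)
print(in_thousands(116_400), in_thousands(delta), f"{pct:+.1f}%")  # $116K -$7K -5.4%
print(needs_founder_review(pct))  # True
```

Note that the escalation check runs on the unrounded percent, so a rounded figure in the post never masks a drop that crosses the threshold.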

### 2. The support ticket triage pass

The pattern: every morning, someone on support reads through overnight Pylon tickets, classifies them by urgency, and assigns the top 5 to senior engineers. The classification rule is in nobody's head except the lead's. Writing it as a runbook looks like this:

```prompt
Every weekday at 7 AM Warsaw, pull all Pylon tickets opened
since 7 AM the previous day.

For each ticket, classify as:
- P0: customer cannot log in, product down, billing failure
- P1: blocking workflow, customer mentions "urgent" or "ASAP"
- P2: question or feature request

Cross-reference each ticket's customer email against Stripe.
Any account with active subscription value over $1K MRR is
auto-promoted to P1 minimum, regardless of language.

Post in #support a thread:
- Top 5 by priority, tagged for human assignment
- Rest queued in a single summary line
- Any P0 raised in a separate ping to @on-call

If any ticket has been open more than 4 hours without a human
response, escalate to @support-lead immediately.
```

The runbook closes a gap that used to live entirely in the lead's head. Any new support engineer can read it and know what "urgent" means here.
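
Written as code, the priority rules and the 4-hour escalation look like the sketch below. Keyword matching is a deliberately crude stand-in for the judgment an AI coworker would apply to ticket text (it would, for instance, misread "non-urgent"), and the function names and phrase lists are assumptions for illustration. The thresholds come from the runbook above.

```python
from datetime import datetime, timedelta

# Phrases lifted from the runbook text; a real classifier would read for meaning.
P0_PHRASES = ("cannot log in", "product down", "billing failure")
P1_PHRASES = ("urgent", "asap")
HIGH_VALUE_MRR = 1_000          # dollars of active subscription value
STALE_AFTER = timedelta(hours=4)


def classify(ticket_text: str, account_mrr: float) -> str:
    """Return 'P0', 'P1', or 'P2' for a ticket."""
    text = ticket_text.lower()
    if any(phrase in text for phrase in P0_PHRASES):
        return "P0"
    # High-value accounts are promoted to P1 minimum, regardless of language.
    if any(phrase in text for phrase in P1_PHRASES) or account_mrr > HIGH_VALUE_MRR:
        return "P1"
    return "P2"


def is_stale(opened_at: datetime, now: datetime, has_human_response: bool) -> bool:
    """True when a ticket has waited more than 4 hours without a human reply."""
    return not has_human_response and now - opened_at > STALE_AFTER


print(classify("I cannot log in since this morning", account_mrr=0))   # P0
print(classify("How do I export a CSV?", account_mrr=2_500))           # P1
print(classify("How do I export a CSV?", account_mrr=200))             # P2
```

The order of the checks matters: P0 conditions are tested first, so the account-value promotion can only raise a ticket and never lower it.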

### 3. The pipeline reconciliation

Every sales team has a "what closed last week vs what HubSpot says closed last week" spreadsheet that someone updates by hand on Friday afternoon. The runbook pulls closed-won deals from HubSpot, cross-references against Stripe charges in the same period, flags any deal that closed in HubSpot but has no corresponding Stripe charge, and posts a reconciliation thread for the sales lead. This used to be a 2-hour Friday job. With a runbook it is a 5-minute review of the AI's draft.
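
At its core the reconciliation is a set difference: deals that closed in the period minus customers who were charged in the period. The sketch below assumes deals and charges can be matched on customer email, which the runbook does not specify; your team may match on a customer ID or invoice reference instead. The `Deal` and `Charge` types and the sample records are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deal:
    deal_id: str
    customer_email: str
    amount: float
    closed_on: date


@dataclass(frozen=True)
class Charge:
    customer_email: str
    amount: float
    charged_on: date


def unmatched_deals(deals: list[Deal], charges: list[Charge],
                    start: date, end: date) -> list[Deal]:
    """Deals closed-won in [start, end] with no charge from that customer in the period."""
    paid = {c.customer_email.lower() for c in charges if start <= c.charged_on <= end}
    return [
        d for d in deals
        if start <= d.closed_on <= end and d.customer_email.lower() not in paid
    ]


# Made-up records for one week.
deals = [
    Deal("d1", "ops@acme.test", 12_000, date(2024, 1, 3)),
    Deal("d2", "it@globex.test", 8_000, date(2024, 1, 4)),
]
charges = [Charge("Ops@acme.test", 12_000, date(2024, 1, 5))]

flagged = unmatched_deals(deals, charges, date(2024, 1, 1), date(2024, 1, 7))
print([d.deal_id for d in flagged])  # ['d2']
```

This version only checks that some charge exists. Comparing amounts, or allowing a charge to land a few days after the close date, are natural extensions once the basic match is trusted.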

## How to test a runbook before trusting it

Most teams skip this step and pay for it. The test is simple: take last week's archive (the post Lena actually sent, the triage Mark actually assigned, the reconciliation Anya actually built), feed the runbook the same inputs, and compare the output.

### The match-or-fix rule

If the AI coworker's output matches the human's output, the runbook is right. If it does not match, the runbook is wrong. Do not blame the model. The runbook is the contract; if the contract produces a different answer than the human did, the contract is missing a step.

The most common gap is "Lena does this thing in her head that the runbook does not name." For example: Lena always excludes refunds from the MRR number, but the runbook does not say so, so the AI includes them. Add the line to the runbook. Re-test. The output now matches.

### Three weeks of archive, not one

Run this test on at least three weeks of archive before flipping any step to auto. Three weeks surface edge cases a single run will miss. Where you can, pick weeks that include a holiday, an unusual deal, and a partial refund. If the runbook matches the human's output on all of them, it is ready to graduate.
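
The archive test is a small regression harness: replay each archived week through the runbook and list the weeks where the output differs from what the human posted. The sketch below reproduces the refund example with invented numbers. `failing_weeks` and the two MRR functions are illustrative names, and refunds are modeled as negative amounts as an assumption.

```python
from typing import Callable

Entry = dict  # {"kind": "charge" | "refund", "amount": int}


def mrr_as_written(entries: list[Entry]) -> int:
    """The runbook before the fix: sums everything, refunds included."""
    return sum(e["amount"] for e in entries)


def mrr_excluding_refunds(entries: list[Entry]) -> int:
    """The runbook after adding the line 'exclude refunds from MRR'."""
    return sum(e["amount"] for e in entries if e["kind"] != "refund")


def failing_weeks(runbook: Callable[[list[Entry]], int],
                  archive: dict[str, tuple[list[Entry], int]]) -> list[str]:
    """Return the archived weeks where the runbook's output differs from the human's."""
    return [week for week, (inputs, human_output) in archive.items()
            if runbook(inputs) != human_output]


# Invented archive: (inputs, the number the human actually posted).
archive = {
    "week 1": ([{"kind": "charge", "amount": 5000},
                {"kind": "refund", "amount": -400}], 5000),
    "week 2": ([{"kind": "charge", "amount": 5200}], 5200),
    "week 3": ([{"kind": "charge", "amount": 5100},
                {"kind": "refund", "amount": -100}], 5100),
}

print(failing_weeks(mrr_as_written, archive))         # ['week 1', 'week 3']
print(failing_weeks(mrr_excluding_refunds, archive))  # []
```

Week 2 has no refund, so a one-week test on it alone would have passed the unfixed runbook. That is the case for testing three weeks instead of one.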

## How runbooks fail (and the one-pass fix)

Failed runbooks fail in three predictable ways. We have seen all three across hundreds of customer deployments.

### Ambiguous trigger

"When a big deal closes" is not a trigger. "When a HubSpot deal moves to closed-won and the deal value field is greater than $10,000" is a trigger. The fix: define the trigger as a query against a specific tool field, not a feeling.

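A useful test of whether a trigger is a query is whether it can be written as a function that returns true or false for a given record. The sketch below assumes a deal arrives as a dictionary with `dealstage` and `amount` keys and that closed-won is stored as `closedwon`. Treat those names as placeholders and check the field names in your own CRM before relying on them.

```python
def is_big_deal_trigger(deal: dict, threshold: float = 10_000) -> bool:
    """True when the deal is closed-won and its value is strictly above the threshold."""
    if deal.get("dealstage") != "closedwon":
        return False
    try:
        # Amounts often arrive as strings; a missing amount counts as zero.
        amount = float(deal.get("amount") or 0)
    except ValueError:
        return False  # unparseable amount: do not fire on a guess
    return amount > threshold


print(is_big_deal_trigger({"dealstage": "closedwon", "amount": "12500"}))     # True
print(is_big_deal_trigger({"dealstage": "closedwon", "amount": "10000"}))     # False
print(is_big_deal_trigger({"dealstage": "contractsent", "amount": "50000"}))  # False
```

If you cannot write the trigger this way, it is still a feeling and not yet a trigger.
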
### Wrong-tool inputs

"Pull deal data from Salesforce" when the team uses HubSpot. The runbook silently fails because the AI cannot find the data and improvises. The fix: name the exact integration and the exact field path, not the generic concept ("the CRM").

### Missing success criteria

Without success criteria, every output looks fine until someone notices the post is missing the customer name, or the WoW delta, or the churn note. The fix: write the success criteria as a checklist the AI verifies against its own draft before posting.
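
A checklist of this kind can be as simple as a function that returns the list of criteria a draft fails. The sketch below checks the revenue digest criteria from the five-parts table using plain string and regex tests. The patterns assume the draft writes deltas like `-$7K` and `-5.4%`, and `check_digest` is an illustrative name, not a product feature.

```python
import re


def check_digest(draft: str, big_deal_names: list[str]) -> list[str]:
    """Return the success criteria the draft fails; an empty list means it passes."""
    failures = []
    lowered = draft.lower()
    if "mrr" not in lowered:
        failures.append("current MRR")
    if not re.search(r"[+-]\$\d", draft):      # a signed dollar amount, e.g. -$7K
        failures.append("WoW delta as a number")
    if not re.search(r"\d%", draft):           # a percentage, e.g. 5.4%
        failures.append("WoW delta as a percent")
    for name in big_deal_names:
        if name.lower() not in lowered:
            failures.append(f"deal over $10K not named: {name}")
    return failures


big_deals = ["Acme Corp", "Globex"]  # made-up deal names
good = "MRR: $116K (-$7K, -5.4% WoW). Deals over $10K: Acme Corp, Globex."
bad = "MRR: $116K (-$7K WoW). Deals over $10K: Acme Corp."

print(check_digest(good, big_deals))  # []
print(check_digest(bad, big_deals))
# ['WoW delta as a percent', 'deal over $10K not named: Globex']
```

A non-empty result is a reason to hold the post and escalate, not something to patch silently.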

### The one-pass fix

Read the runbook out loud to a teammate who has never seen it. If they can name the trigger, the inputs, the steps, the success criteria, and the escalation back to you without asking a question, the runbook is good. If they cannot, you have your editing list.

## From runbook to recurring cron

Once the runbook passes the test and graduates through the trust budget, it stops being something you run by hand and becomes a cron. This is the loop that closes the value.

A meaningful chunk of every knowledge-worker week is spent on low-value repetitive tasks the person recognizes as such while doing them. That is the pool a recurring runbook drains. Not the creative work, not the meetings that matter, not the strategy. The Monday digest, the Friday reconciliation, the morning triage. The work that has a clear shape and gets done the same way every week.

### One artifact, one source of truth

In Viktor, the same instructions you tested as a runbook get scheduled as a recurring task. The Monday revenue digest runs every Monday at 9 AM. The support triage runs every weekday at 7 AM. The pipeline reconciliation runs every Friday at 4 PM. The runbook is the source of truth; the cron is its scheduled execution.

When the runbook needs to change, you edit it once. The next scheduled execution picks up the new version. There is no separate prompt to update, no separate Slack message to rewrite. One artifact, one source of truth, one place to fix a problem.

> Runbooks are not glamorous. They look like checklists a sysadmin from 2008 would have written. But they are the reason a coworker becomes recurring instead of one-off, and the reason the team gets time back instead of constantly re-prompting.

## How to trust the numbers

The fastest way to lose trust in an AI coworker is to find one wrong number in a public post. The defense is the same one experienced operators use: build the runbook to show its work.

### The provenance line

Every revenue digest the coworker posts should include a line at the end like "MRR pulled from Stripe at 09:00:14 Warsaw, 1,247 active subscriptions counted, source query in this Notion page." That line takes the AI three seconds to write and saves the team 30 minutes the first time someone questions a number. The reviewer can click the source query and verify against the same data the AI used.

This is review-first taken to its logical conclusion. The AI is not asking you to trust it. It is showing you exactly where the answer came from and offering you a one-click way to verify. The pattern is simple: the agent acts, the human audits. A runbook with explicit sources makes the audit step a 10-second glance, not a 30-minute investigation.
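
Generating the provenance line from the same values the digest used, instead of having it typed separately, keeps the line from drifting away from the data. A minimal sketch follows; the function name, parameters, and example URL are assumptions for illustration.

```python
from datetime import datetime


def provenance_line(metric: str, source: str, pulled_at: datetime, tz_label: str,
                    record_count: int, record_noun: str, query_ref: str) -> str:
    """Build the closing line that says where a number came from and when."""
    return (
        f"{metric} pulled from {source} at {pulled_at:%H:%M:%S} {tz_label}, "
        f"{record_count:,} {record_noun} counted, source query: {query_ref}"
    )


line = provenance_line(
    metric="MRR",
    source="Stripe",
    pulled_at=datetime(2024, 1, 8, 9, 0, 14),  # pass the real pull time, already in local time
    tz_label="Warsaw",
    record_count=1247,
    record_noun="active subscriptions",
    query_ref="https://example.com/revenue-query",  # placeholder link
)
print(line)
# MRR pulled from Stripe at 09:00:14 Warsaw, 1,247 active subscriptions counted, source query: https://example.com/revenue-query
```

The timestamp and count should be captured at pull time and passed in, so the line describes the data that was actually used and not the moment the post was written.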

## Frequently Asked Questions

### How long should a runbook be?

Short. Most good runbooks fit in under 30 lines of plain text. If yours is over 100 lines, the steps are probably too granular and the AI does not need that much hand-holding. Cut the "click the blue button" instructions. Keep the trigger, the inputs, the steps as a numbered list, the success criteria, and the escalation. That is enough.

### Do I need a runbook for one-off tasks?

No. Runbooks are for recurring work. If you are asking the AI coworker to "pull a list of all customers in Germany" once for a board meeting, that is a chat conversation, not a runbook. Runbooks earn their cost when they run on a schedule. For one-off questions, just ask.

### How do I version a runbook?

Keep it in a single Notion page or Slack canvas and edit in place. Add a "Last updated" line at the top. If a change is significant (new escalation rule, new tool source), drop a note in the team channel so reviewers expect a different output shape next run.

### What if the runbook conflicts with what my team actually does?

Then your runbook is right and the tribal knowledge is the bug. Most companies discover during this exercise that two teammates do "the weekly report" three different ways. The runbook forces the conversation. Pick one way, document it, run it.

### Can I write a runbook for non-technical work?

Yes. The most-loved runbooks are not the engineering ones. They are the executive assistant runbooks: "every Sunday evening, draft my Monday agenda based on my calendar and my Linear todos, post it as a Slack DM to me." Anything recurring with a clear trigger and a clear output is a runbook candidate.

### How does Viktor handle a runbook step it cannot complete?

It stops, drafts a Slack message naming exactly what it could not do (missing tool, ambiguous input, conflicting data), and waits. It does not improvise. The escalation clause tells it who to ping and what to include. The point of a runbook is not autonomy. The point is reliability.

### What is the difference between a runbook and a workflow in a tool like Make or Zapier?

A workflow in Make or Zapier is a fixed sequence of triggers and actions. The branching logic has to be predefined. A runbook for an AI coworker is closer to a job description: it names the trigger and the goal, lists the steps as guidance, and lets the coworker handle small variations (a missing field, a slightly different output shape) without breaking. For deeper operator-vs-builder context, [Viktor vs Make](/blog/viktor-vs-make) walks through where each shape of tool actually fits.

## Closing thought

The reason "AI coworkers are flaky" is rarely the model. It is that the team never wrote down what reliable looks like. A runbook is the artifact that turns "Lena just kind of knows what to look at" into a contract any teammate, human or AI, can run consistently. The five parts are short. The trust budget is enforced. The test against archive is the gate. Once you write the first three runbooks and watch them run on a schedule for two weeks without you noticing, you understand what an AI coworker is actually for.

For more on what changes once a coworker is fully integrated, see [The first 7 days with an AI coworker](/blog/first-7-days-with-ai-coworker) and [What is an AI coworker?](/blog/what-is-an-ai-coworker). For an operator's view on whether ChatGPT-style agents fit this model, see [Viktor vs ChatGPT Agent](/blog/viktor-vs-chatgpt-agent).

---

**Viktor is an AI coworker that lives in Slack, connects to 3,000+ integrations, and does real work for your team.** [Add Viktor to your workspace -- free to start →](https://viktor.com/?utm_source=blog&utm_medium=cta&utm_campaign=how-to-write-a-runbook-for-your-ai-coworker)