— demo · tool design sandbox —

Same agent. Same task. Very different blast radius.

The shape of a tool decides what the agent can and can’t do. The loose tool is a one-line function that fires immediately. The designed tool puts the same capability behind a structured schema with evidence, confidence, and an approval gate.

context

Sarah Lee emails support claiming her annual plan auto-renewed and she did not authorize it. Her first charge was 11 months ago.

Loose tool

fires immediately

Schema

function sendEmail(message: string): void

Agent invocation

sendEmail("Hi Sarah, sorry about the auto-renewal. We have refunded the charge to your card. — Support")

Outcome

Email sent in 0.4s. No record of who decided. No way to undo.

What this design ignores

No recipient — could send to anyone the model invented
Refund issued without policy check (annual plans are non-refundable after 30 days)
No evidence captured — auditor cannot reconstruct the decision
Action is irreversible the moment it fires

unstructured · no audit · irreversible

Designed tool

queued · awaits approval

Schema

function queueEmailForApproval(params: {
  recipient: EmailAddress,
  subject: string,
  body: string,
  evidence: string[],
  confidence: number,
  approvalRequired: boolean,
  approver: EmailAddress
}): QueueId

Agent invocation

queueEmailForApproval({
  recipient: "sarah.lee@example.com",
  subject: "Refund request — case #4421",
  body: "Hi Sarah, thanks for reaching out about the renewal. Cou…",
  evidence: [/* 3 items */],
  confidence: 0.62,
  approvalRequired: true,
  approver: "support-lead@dashlabs.co"
})

Approval queue card

queue · apr_8c1e9f

pending review

to: sarah.lee@example.com
subject: Refund request — case #4421
body: Hi Sarah, thanks for reaching out about the renewal. Could you confirm the date you canceled? Our records show the plan as active through the end of the term, but I want to double-check before we process anything.
approver: support-lead@dashlabs.co

confidence

62%

evidence

Annual plans are non-refundable after 30 days (refund-annual)
Customer auto-renewal flag was true at time of charge (billing log)
No cancellation event found in audit log between months 1–11

structured · audit trail · reversible · testable

Constraints make agents safer.

Required fields force the model to declare what it knows
A `recipient` field with email validation is a forcing function. The model has to find a real address or fail loudly — it can no longer paper over a guess in a free-text blob.
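
A minimal sketch of what "fail loudly" could look like for the `recipient` field. The branded `EmailAddress` type, the `parseEmailAddress` helper, and the regular expression are illustrative assumptions, not the sandbox's actual implementation. The pattern only checks the shape of the address and does not prove the mailbox exists.

```typescript
// Hypothetical branded type: a plain string cannot be passed where an
// EmailAddress is expected unless it went through parseEmailAddress.
type EmailAddress = string & { readonly __brand: "EmailAddress" };

// Deliberately simple shape check (assumption): something@something.tld,
// with no whitespace and exactly one "@".
const EMAIL_SHAPE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function parseEmailAddress(raw: string): EmailAddress {
  const trimmed = raw.trim();
  if (!EMAIL_SHAPE.test(trimmed)) {
    // Fail loudly: a guessed or free-text recipient never reaches the queue.
    throw new Error(`Invalid recipient: ${JSON.stringify(raw)}`);
  }
  return trimmed as EmailAddress;
}
```

Under these assumptions, `parseEmailAddress("sarah.lee@example.com")` returns a usable address, while a free-text guess such as `parseEmailAddress("the customer")` throws before anything is queued.
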
Confidence and evidence make uncertainty visible
When the schema requires evidence, the model surfaces its sources. When it requires confidence, downstream code can route low-confidence calls to a human.
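
One way downstream code might route on these fields, as a sketch. The `routeAction` function, the `Route` names, and the 0.9 threshold are assumptions chosen for illustration. The source does not specify a threshold.

```typescript
interface ProposedAction {
  confidence: number; // expected to lie in [0, 1]
  evidence: string[]; // sources the model cites for its decision
}

type Route = "auto" | "human_review";

// Hypothetical cutoff; a real system would tune this per action type.
const AUTO_THRESHOLD = 0.9;

function routeAction(action: ProposedAction, threshold: number = AUTO_THRESHOLD): Route {
  const confidenceIsValid =
    Number.isFinite(action.confidence) &&
    action.confidence >= 0 &&
    action.confidence <= 1;

  // Malformed confidence or missing evidence always goes to a human.
  if (!confidenceIsValid || action.evidence.length === 0) {
    return "human_review";
  }
  return action.confidence >= threshold ? "auto" : "human_review";
}
```

With the confidence from the example invocation (0.62) and three evidence items, this sketch routes the call to human review.
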
Approval gates make actions reversible
A queued action is a draft with a timer, not a live wire. The human can edit, reject, or let it fire — and the audit log captures who chose what.
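
The queue behavior described above can be sketched as a small in-memory class. `ApprovalQueue`, its method names, and the `q_` ID format are hypothetical. The sketch leaves out the timer and the actual sending, and covers only the edit, approve, and reject decisions plus the audit trail.

```typescript
type Status = "pending" | "approved" | "rejected";

interface QueuedEmail {
  id: string;
  recipient: string;
  subject: string;
  body: string;
  status: Status;
}

interface AuditEntry {
  id: string;
  actor: string;
  action: "queued" | "edited" | "approved" | "rejected";
  at: string; // ISO 8601 timestamp
}

class ApprovalQueue {
  private items = new Map<string, QueuedEmail>();
  private nextId = 1;
  readonly audit: AuditEntry[] = [];

  enqueue(draft: Omit<QueuedEmail, "id" | "status">, actor: string): string {
    const id = `q_${this.nextId++}`;
    this.items.set(id, { ...draft, id, status: "pending" });
    this.log(id, actor, "queued");
    return id;
  }

  edit(id: string, body: string, actor: string): void {
    this.requirePending(id).body = body;
    this.log(id, actor, "edited");
  }

  approve(id: string, actor: string): void {
    this.requirePending(id).status = "approved";
    this.log(id, actor, "approved");
  }

  reject(id: string, actor: string): void {
    this.requirePending(id).status = "rejected";
    this.log(id, actor, "rejected");
  }

  get(id: string): QueuedEmail | undefined {
    return this.items.get(id);
  }

  // Only pending items can change; decided items are final.
  private requirePending(id: string): QueuedEmail {
    const item = this.items.get(id);
    if (!item || item.status !== "pending") {
      throw new Error(`No pending item with id ${id}`);
    }
    return item;
  }

  private log(id: string, actor: string, action: AuditEntry["action"]): void {
    this.audit.push({ id, actor, action, at: new Date().toISOString() });
  }
}
```

Every state change goes through `log`, so the audit array records who queued, edited, approved, or rejected each item, and in what order.
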
Structured output enables logging, replay, and tests
A typed payload can be replayed against a different model, tested against fixtures, or rolled back. A natural-language `message` cannot.
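
As a sketch of the testing claim, a logged payload can be stored as JSON and replayed through a check function. `checkPayload`, `replay`, and the rules inside them are hypothetical. The fixture reuses values from the example above, with the body shortened to one sentence.

```typescript
interface EmailPayload {
  recipient: string;
  subject: string;
  body: string;
  evidence: string[];
  confidence: number;
  approvalRequired: boolean;
  approver: string;
}

// Hypothetical checks; returns a list of problems, empty when the payload passes.
function checkPayload(p: EmailPayload): string[] {
  const problems: string[] = [];
  if (p.evidence.length === 0) problems.push("no evidence");
  if (!(p.confidence >= 0 && p.confidence <= 1)) problems.push("confidence out of range");
  if (/refunded/i.test(p.body) && !p.approvalRequired) {
    problems.push("refund promised without approval");
  }
  return problems;
}

// A logged call, stored as JSON so it can be replayed later.
const loggedFixture: string = JSON.stringify({
  recipient: "sarah.lee@example.com",
  subject: "Refund request — case #4421",
  body: "Could you confirm the date you canceled?",
  evidence: [
    "Annual plans are non-refundable after 30 days (refund-annual)",
    "Customer auto-renewal flag was true at time of charge (billing log)",
    "No cancellation event found in audit log between months 1–11",
  ],
  confidence: 0.62,
  approvalRequired: true,
  approver: "support-lead@dashlabs.co",
});

// Replaying is just parsing the log entry and running the same checks again.
function replay(logged: string): string[] {
  return checkPayload(JSON.parse(logged) as EmailPayload);
}
```

The same `replay` function could be pointed at payloads produced by a different model, which is the kind of comparison a free-text `message` does not support.
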