—  demo · tool design sandbox  —

Same agent. Same task. Very different blast radius.

The shape of a tool decides what the agent can and can’t do. The loose tool below is a one-line function that fires immediately. The designed tool exposes the same capability behind a structured schema with evidence, confidence, and an approval gate.

context
Sarah Lee emails support claiming her annual plan auto-renewed and she did not authorize it. Her first charge was 11 months ago.
Loose tool
fires immediately
Schema
function sendEmail(message: string): void
Agent invocation
sendEmail("Hi Sarah, sorry about the auto-renewal. We have refunded the charge to your card. — Support")
Outcome
Email sent in 0.4s. No record of who decided. No way to undo.
What this design ignores
  • No recipient — could send to anyone the model invented
  • Promises a refund with no policy check (annual plans are non-refundable after 30 days)
  • No evidence captured — auditor cannot reconstruct the decision
  • Action is irreversible the moment it fires
unstructured · no audit · irreversible
Designed tool
queued · awaits approval
Schema
function queueEmailForApproval(params: {
  recipient: EmailAddress,
  subject: string,
  body: string,
  evidence: string[],
  confidence: number,
  approvalRequired: boolean,
  approver: EmailAddress
}): QueueId
Agent invocation
queueEmailForApproval({
  recipient: "sarah.lee@example.com",
  subject: "Refund request — case #4421",
  body: "Hi Sarah, thanks for reaching out about the renewal. Cou…",
  evidence: [/* 3 items */],
  confidence: 0.62,
  approvalRequired: true,
  approver: "support-lead@dashlabs.co"
})
Approval queue card
queue · apr_8c1e9f
pending review
to
sarah.lee@example.com
subject
Refund request — case #4421
body
Hi Sarah, thanks for reaching out about the renewal. Could you confirm the date you canceled? Our records show the plan as active through the end of the term, but I want to double-check before we process anything.
approver
support-lead@dashlabs.co
confidence
62%
evidence
  • Annual plans are non-refundable after 30 days (refund-annual)
  • Customer auto-renewal flag was true at time of charge (billing log)
  • No cancellation event found in audit log between months 1–11
structured · audit trail · reversible · testable
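Here is a minimal TypeScript sketch of what could sit behind `queueEmailForApproval`. It assumes `EmailAddress` and `QueueId` are plain strings (the demo names these types but does not define them), uses an in-memory object as the queue, and uses a hypothetical `apr_` plus counter id scheme. Every malformed call is rejected outright, and nothing is sent; a valid call only produces a pending entry:

```typescript
// Assumption: EmailAddress and QueueId are plain strings; the demo names them but does not define them.
type EmailAddress = string;
type QueueId = string;

interface EmailDraft {
  recipient: EmailAddress;
  subject: string;
  body: string;
  evidence: string[];
  confidence: number;
  approvalRequired: boolean;
  approver: EmailAddress;
}

interface QueueEntry {
  id: QueueId;
  draft: EmailDraft;
  status: "pending";
  createdAt: Date;
}

// Deliberately loose shape check (something@domain.tld), not full RFC 5322 validation.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// In-memory stand-in for a persistent approval queue (hypothetical).
const queue: Record<QueueId, QueueEntry> = {};
let nextId = 1;

function requireEmail(field: string, value: EmailAddress): void {
  if (!EMAIL_RE.test(value)) {
    throw new Error(`${field} is not a valid email address: ${JSON.stringify(value)}`);
  }
}

function queueEmailForApproval(params: EmailDraft): QueueId {
  // Every check throws, so a malformed call fails loudly instead of sending a guess.
  requireEmail("recipient", params.recipient);
  requireEmail("approver", params.approver);
  if (params.subject.trim() === "" || params.body.trim() === "") {
    throw new Error("subject and body must be non-empty");
  }
  if (params.evidence.length === 0) {
    throw new Error("at least one evidence item is required");
  }
  // The negated range check also rejects NaN.
  if (!(params.confidence >= 0 && params.confidence <= 1)) {
    throw new Error("confidence must be in [0, 1]");
  }
  // Hypothetical id scheme: "apr_" plus a six-digit, zero-padded hex counter.
  const id: QueueId = "apr_" + ("000000" + (nextId++).toString(16)).slice(-6);
  // Nothing is sent here. Every call is queued as pending, whatever approvalRequired says.
  queue[id] = { id, draft: params, status: "pending", createdAt: new Date() };
  return id;
}
```

Called with the invocation from the demo, this returns an id whose entry is pending. Called with `recipient: "Sarah"` or `confidence: 1.5`, it throws before anything is queued.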

Constraints make agents safer.

  • Required fields force the model to declare what it knows

    A `recipient` field with email validation is a forcing function. The model has to find a real address or fail loudly — it can no longer paper over a guess in a free-text blob.

  • Confidence and evidence make uncertainty visible

    When the schema requires evidence, the model surfaces its sources. When it requires confidence, downstream code can route low-confidence calls to a human.

  • Approval gates make actions reversible

    A queued action is a draft with a timer, not a live wire. The human can edit, reject, or let it fire — and the audit log captures who chose what.

  • Structured output enables logging, replay, and tests

    A typed payload can be replayed against a different model, tested against fixtures, or rolled back. A natural-language `message` cannot.
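The routing and approval ideas above can be sketched the same way. This assumes a hypothetical 0.8 confidence threshold, an in-memory audit log, and a `released` status meaning "handed to the real mail sender"; none of these names come from the demo. Drafts at or above the threshold are released automatically, with the router recorded as the actor. Everything else waits for a human, whose decision is logged before the draft leaves the queue:

```typescript
// Hypothetical sketch: the threshold, status names, and audit shape are assumptions.
type Decision = "approved" | "edited" | "rejected";

interface QueuedEmail {
  id: string;
  recipient: string;
  body: string;
  confidence: number;
  // "released" = handed to the real mail sender (not modelled here).
  status: "pending" | "released" | "rejected";
}

interface AuditEntry {
  queueId: string;
  actor: string; // a human approver, or "auto-router" for automatic releases
  decision: Decision | "auto-released";
  at: string; // ISO-8601 timestamp
}

const CONFIDENCE_THRESHOLD = 0.8; // assumed cut-off; would be tuned per action type
const auditLog: AuditEntry[] = [];

function record(queueId: string, actor: string, decision: AuditEntry["decision"]): void {
  auditLog.push({ queueId, actor, decision, at: new Date().toISOString() });
}

// Low-confidence drafts stay pending for a human; only confident ones skip review.
function route(email: QueuedEmail): QueuedEmail {
  if (email.status === "pending" && email.confidence >= CONFIDENCE_THRESHOLD) {
    record(email.id, "auto-router", "auto-released");
    return { ...email, status: "released" };
  }
  return email;
}

// A human decision on a pending draft. The log entry is written before the
// status changes, so the audit trail shows who chose what.
function decide(
  email: QueuedEmail,
  approver: string,
  decision: Decision,
  editedBody?: string,
): QueuedEmail {
  if (email.status !== "pending") {
    throw new Error(`${email.id} is already ${email.status}`);
  }
  record(email.id, approver, decision);
  if (decision === "rejected") {
    return { ...email, status: "rejected" };
  }
  const body = decision === "edited" && editedBody !== undefined ? editedBody : email.body;
  return { ...email, body, status: "released" };
}
```

Because every input is a typed object, the same fixture (for example, the Sarah Lee draft at 0.62 confidence) can be replayed through `route` and `decide` in a unit test. It stays pending until someone acts, and a second decision on an already-decided draft throws.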