This is a helpful distinction. A lot of people still treat prompts as the control plane, but you’re right that capability and guardrails are the real boundary.
In live setups we’re seeing the biggest failures when teams don’t define what an agent is allowed to touch vs what it’s only allowed to suggest.
This piece made that line click for me.
Really strong framing here — “session key isolates context, not power” is the line most teams miss.
The capability-tier view also explains a lot of real incidents I’ve seen: people think they enabled one tool, but they actually exposed a different authority path (attached browser vs managed browser, sandbox exec vs host exec, etc.).
One thing I’d love to see in Part 6 is a practical “proof bundle” for each run: intent summary, capability surface snapshot, approval events, and post-run diff of side effects. That would make audits much less narrative and much more verifiable.
Really appreciate that. I'll try to address it in Part 6: the idea that a production run should leave behind a small “proof bundle,” not just a transcript, covering intent, capability surface, approval events, and durable side effects.
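To make the "proof bundle" idea concrete, here is a rough Python sketch of what such a record might look like. Everything in it is an assumption for illustration: the names `ProofBundle` and `diff_side_effects`, the `"act"` / `"suggest"` modes, the field layout, and the example tool names are hypothetical and not taken from any particular agent framework.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProofBundle:
    """Hypothetical per-run audit record; field names are illustrative."""
    intent_summary: str
    # Tool name -> "act" (may cause side effects) or "suggest" (proposal only).
    capability_surface: dict
    approval_events: list = field(default_factory=list)
    side_effect_diff: dict = field(default_factory=dict)

    def digest(self) -> str:
        # Hash a canonical JSON form so later edits to the bundle are detectable.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def diff_side_effects(before: dict, after: dict) -> dict:
    """Compare two {resource: content_marker} snapshots taken around a run."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            k for k in before.keys() & after.keys() if before[k] != after[k]
        ),
    }


# Example run: snapshots of durable state before and after the agent acted.
before = {"config.yaml": "v1", "notes.txt": "a"}
after = {"config.yaml": "v2", "report.md": "new"}

bundle = ProofBundle(
    intent_summary="Update config and write a report",
    capability_surface={"sandbox_exec": "act", "host_exec": "suggest"},
    approval_events=[
        {"tool": "sandbox_exec", "approved_by": "operator", "decision": "approved"}
    ],
    side_effect_diff=diff_side_effects(before, after),
)

print(json.dumps(asdict(bundle), indent=2))
print(bundle.digest())
```

The point of the digest is only that an auditor can check the bundle has not been altered after the run; a real setup would likely sign it or store it append-only, which this sketch does not attempt.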