Discussion about this post

Pawel Jozefiak

The point about the stack starting lower than most people think is accurate. I spent months focused on prompts and instruction files before realizing the serving layer was the source of inconsistency. Same model, same prompt, different caching behavior = different results. For anyone starting out: the interaction contract section here is worth reading slowly.

The long-PDF-with-tool-calls example you used is exactly the kind of edge case that breaks beginners. Simple chat demos hide the brittleness completely. It only shows up under real load with complex tool use, by which point you've already built half your system.
