A new study on AI agents and enterprise automation suggests that how you manage an AI's memory isn't just a technical detail: in the researchers' test, it was the difference between an agent that completed 8% of its tasks and one that nailed roughly 92% of them.
The AI Agent Productivity Problem You Didn't Know You Had
Researchers at Microsoft tested GPT-5 as an autonomous agent handling real enterprise workflows — specifically, hotel expense itemization inside Microsoft Dynamics 365. The results without any context engineering? A dismal 8% completion rate.
The culprit is something called context overflow: enterprise tools are chatty. They return walls of data with every call, and after a few dozen back-and-forth exchanges, the agent drowns in its own conversation history, loses track of where it is, and fails.
The Context Engineering Fix for AI Agent Automation
The fix turned out to be elegant. Instead of feeding the agent its full conversation history, the researchers pruned it to just the last 5 tool interactions, then added a compact running summary of everything before that. Completion rates jumped to 91.6% — and token usage dropped by 63% compared to full-history retention.
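To make the idea concrete, here is a minimal Python sketch of a "last 5 interactions plus running summary" context. It is an illustration under stated assumptions, not the researchers' implementation: the names `ToolInteraction`, `PrunedContext`, and `naive_summarize` are hypothetical, and the summarizer is a simple truncating placeholder where a real agent would probably ask a model to update the summary.

```python
from collections import deque
from typing import Callable, Deque


class ToolInteraction:
    """One tool call and its (often verbose) result. Hypothetical type."""

    def __init__(self, call: str, result: str) -> None:
        self.call = call
        self.result = result


def naive_summarize(summary: str, evicted: ToolInteraction) -> str:
    """Placeholder summarizer (an assumption): keeps the call name and the
    first 40 characters of the result. Swap in an LLM call in practice."""
    note = f"{evicted.call} -> {evicted.result[:40]}"
    return f"{summary}; {note}" if summary else note


class PrunedContext:
    """Keeps the last `window_size` interactions verbatim and folds
    anything older into a compact running summary."""

    def __init__(
        self,
        window_size: int = 5,
        summarize: Callable[[str, ToolInteraction], str] = naive_summarize,
    ) -> None:
        self.window_size = window_size
        self.summarize = summarize
        self.summary = ""
        self.recent: Deque[ToolInteraction] = deque()

    def record(self, interaction: ToolInteraction) -> None:
        """Add a new interaction; summarize whatever falls out of the window."""
        self.recent.append(interaction)
        while len(self.recent) > self.window_size:
            evicted = self.recent.popleft()
            self.summary = self.summarize(self.summary, evicted)

    def build_prompt(self, task: str) -> str:
        """Assemble what the agent sees: task, summary, recent raw history."""
        lines = [f"Task: {task}"]
        if self.summary:
            lines.append(f"Summary of earlier steps: {self.summary}")
        for item in self.recent:
            lines.append(f"Tool call: {item.call}")
            lines.append(f"Tool result: {item.result}")
        return "\n".join(lines)
```

In use, you would call `record(...)` after every tool call and send `build_prompt(...)` to the model instead of the full transcript. The prompt then stays roughly constant in size no matter how long the agent runs.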
Think of it like a skilled contractor who doesn't re-read every email ever sent on a project — they keep a one-page brief and the last few messages. That's context engineering in a nutshell, and you can apply this logic today when building or prompting any long-running AI agent.
If you want to go deeper on building agents that actually finish what they start, our Hermes Agent Essentials course covers exactly this kind of architecture, and Build Your First RAG Pipeline tackles the retrieval and memory patterns that underpin it.
What This Means for Learners
This research hands you a concrete, actionable principle: less context, smarter agents. Whether you're prompting ChatGPT for a multi-step project, building an automation in n8n, or deploying a customer-service bot, an overloaded conversation history can quietly drag down your results, just as it did in the study's 8% baseline.
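The practical takeaway? When designing any agentic workflow, build in a summarisation step after every 5–10 tool calls (the study kept the last 5 interactions; treat the exact number as something to tune). Discard the older raw history, keeping only the most recent interactions and the essence of what came before. Going by the study's numbers, your agent should be cheaper to run and dramatically more reliable, no PhD required.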
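If a batch-style trigger suits your workflow better than a sliding window, the summarisation step could look roughly like the sketch below. Everything here is an assumption for illustration: `compact_if_due` and `join_first_lines` are hypothetical helpers, the default of 5 is just one point in the 5–10 range, and the stand-in summarizer only keeps the first line of each tool output.

```python
from typing import Callable, List, Tuple


def join_first_lines(summary: str, raw: List[str]) -> str:
    """Stand-in summarizer (an assumption): keep only the first line of
    each raw tool output and append it to the existing summary."""
    heads = [entry.splitlines()[0] for entry in raw if entry]
    return " | ".join(([summary] if summary else []) + heads)


def compact_if_due(
    summary: str,
    raw: List[str],
    every: int = 5,
    summarize: Callable[[str, List[str]], str] = join_first_lines,
) -> Tuple[str, List[str]]:
    """Once `every` raw tool outputs have piled up, fold them into the
    summary and start a fresh raw buffer; otherwise change nothing."""
    if len(raw) < every:
        return summary, raw
    return summarize(summary, raw), []
```

Call it once per loop iteration, right after appending the latest tool output: `summary, raw = compact_if_due(summary, raw)`. Compared with a sliding window, this calls the summarizer less often, at the cost of briefly carrying up to `every - 1` raw outputs between compactions.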