TL;DR: Large Language Models (LLMs) are excellent at retrieval but terrible at self-awareness. Without a structured hand-off architecture, they become expensive loop engines that drive users to competitors.
The Evolution of Deflection
Support deflection is a three-act play where the customer usually loses.
It started with static FAQs: digital filing cabinets where users did the manual labor of hunting for answers.
Then came semantic support search. This used natural language processing to map a query to a relevant article.
Eventually, we got chatbots that acted as conversational proxies for that search. Each step was pitched as a win-win: faster answers for the customer and massive overhead savings for the business.
The problem is the “savings” side of the equation. Businesses have prioritized it so aggressively that automation is now being forced beyond its logical limits. We have moved from helping the user find an answer to preventing the user from reaching a human at all costs. (This is usually where the ROI calculation ignores churn.)
A recent example of this friction is an interaction with the Substack support agent.
The Context Leak
In the first interaction, the user uploads a PDF detailing a failure. The bot acknowledges the delivery failure but immediately drops the core issue (the fact that this was a Substack-recommended configuration that failed).
This is a classic context window failure. The bot is reading the “now” but has forgotten the “why.” (Pro tip: Support bots require a persistent Problem Summary in the system prompt to prevent the initial state from being pushed out by chat history.)
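The "pinned summary" pattern is straightforward to sketch. This is a minimal, vendor-neutral illustration (the message dict shape follows the common chat-completion convention; `build_messages`, `PROBLEM_SUMMARY`, and the truncation threshold are all assumptions, not a specific API): the original problem statement is re-injected into the system prompt on every call, so trimming old turns can never evict it.

```python
# Sketch: pin the original problem summary so it survives
# context-window truncation. All names here are illustrative.

PROBLEM_SUMMARY = (
    "User reports: the Substack-recommended email configuration "
    "fails to deliver; documented settings already verified."
)

def build_messages(history, max_turns=20):
    """Assemble a prompt: system rules + pinned summary + recent turns.

    Old turns are dropped past max_turns, but the pinned summary is
    re-sent with every request, so the bot never loses the 'why'.
    """
    system = {
        "role": "system",
        "content": (
            "You are a support agent.\n"
            f"PINNED PROBLEM SUMMARY (never drop): {PROBLEM_SUMMARY}"
        ),
    }
    return [system] + history[-max_turns:]
```

The key design choice is that the summary lives outside the rolling chat history entirely; truncation policy and summary content are managed by the plumbing, not the model.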
The “Loop of Death”
The next phase is the retrieval loop. The bot recommends settings the user has already confirmed are active.
The bot suggests setting “Who can reply” to “Everyone.” The user confirms this was done before the test. The bot’s response is a generic “thanks for confirming,” followed by an admission that its own security protections might still block the mail.
This is the architectural equivalent of a car that won’t start because it’s out of gas, while the manual keeps explaining how to turn the key.
Logic vs. Retrieval
LLMs typically prioritize the most relevant documentation. The “gotcha” is that documentation often contains circular logic or edge cases the model isn’t weighted to resolve.
When the user explicitly asks how to use the recommended address if Substack is going to block it, the bot resets to a template. It is no longer “solving.” It is just “responding.”
Tactical Breakdown
The agent failed to pivot. It was out of its depth by the second prompt, but it lacked the internal logic to admit it. Instead of acknowledging the mismatch between the “recommended setup” and the “blocking error,” it reverted to a script.
The ultimate UX insult happened at the end. After a prolonged exchange full of specific data and a technical PDF, the bot asked the customer to “describe the issue” to conclude the session. It essentially hit ‘Refresh’ on the customer’s frustration.
Even when the user mentioned leaving for a competitor, the bot stayed in its polite, useless loop. It missed every red flag in the conversation.
Architecture: The Exit Ramp
This is a design failure, not a model failure. You cannot expect an LLM to manage its own boundaries. You have to build them. (Think of it as adding a “break glass” protocol to your chat logic.)
The Resilient Support Stack
- Sentiment-Based Escalation: Use a secondary, lightweight model to monitor the chat for frustration signals. If terms like “frustrated,” “waste of time,” or competitor names appear, kill the bot and ping a human.
- Repetition Kill-Switch: Track the number of times a specific documentation link or advice block is served. If the bot repeats itself twice, it has failed. Trigger an automatic hand-off.
- Persistent State Management: Store the Original Problem Description outside the chat context window. Force the bot to reference this core state in every turn so it doesn’t forget why the user is there.
- The “Depth” Check: If a user’s prompt complexity (like an uploaded log file) exceeds a certain threshold, the bot should immediately preface its response with: “I’m looking at your logs. If I can’t solve this in two steps, I’ll get a human involved.”
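The first two guardrails above can be sketched as a small wrapper around the bot loop. This is a minimal illustration, not a production design: the class name, the keyword list, and the regex-based frustration check are all assumptions (in practice the sentiment check would be a lightweight classifier, and the advice fingerprint would come from your retrieval layer, not a hand-passed ID).

```python
import re

# Illustrative frustration signals; a real deployment would use a
# lightweight sentiment model and a maintained competitor-name list.
FRUSTRATION = re.compile(r"\b(frustrated|waste of time|switching to)\b", re.I)

class EscalationGuard:
    """Escalate to a human on frustration signals or repeated advice."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.served = {}        # advice fingerprint -> times served
        self.escalated = False

    def check_user(self, text):
        """Run before the bot replies; trip on frustration signals."""
        if FRUSTRATION.search(text):
            self.escalated = True
        return self.escalated

    def check_bot(self, advice_id):
        """Run on each candidate reply; trip when advice repeats."""
        self.served[advice_id] = self.served.get(advice_id, 0) + 1
        if self.served[advice_id] >= self.max_repeats:
            self.escalated = True
        return self.escalated
```

The point is architectural: the kill-switch lives outside the model, so the LLM never gets a vote on whether it has failed.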
Architecture is the differentiator between a tool and a liability in AI adoption. The model is just a component. The success of the implementation depends on the plumbing that surrounds it.
If your current AI adoption feels more like a liability than a multiplier, or is stalled by FUD, Contact Me to discuss options.
© Scott S. Nelson




