The AI Pushback Problem: When Skepticism Becomes Sabotage

January 2026 · 5 min read

OpenAI's January 2025 "sycophancy" paper triggered a wave of model updates. The premise: users were complaining that ChatGPT agreed too much. The solution: train models to push back.

The result is worse than the original problem.

GPT-4o now interrupts workflows to "fact-check" things it can't verify. Claude adds disclaimers to statements that don't need them. Gemini questions premises instead of answering questions.

This isn't intelligence. It's theater.

Real Examples

Last week I asked Claude to help refactor a Python function. Before answering, it spent 200 tokens explaining why my current implementation "might have been a reasonable choice at the time." I didn't ask for validation. I asked for code.

A colleague tried to use GPT-4 for a legal document. He mentioned a 2024 Supreme Court ruling. The model refused to engage because "I cannot verify this ruling exists in my training data." The ruling was Texas v. New Mexico. It made national news.

User: Help me debug this React hook that's causing infinite re-renders.

AI: Before we proceed, I want to note that infinite re-renders can sometimes be intentional in certain architectures. Are you sure this is a bug and not expected behavior?

No developer in history has wanted infinite re-renders.
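For the record, here is what that bug usually looks like: a minimal sketch (component name invented for illustration) of an effect that updates state on every render and so schedules another render, forever.

```tsx
import { useEffect, useState } from "react";

// Hypothetical component, invented for illustration.
function RenderLoop() {
  const [count, setCount] = useState(0);

  // Bug: no dependency array, so this effect runs after every render,
  // sets new state, and schedules another render. Infinite loop.
  useEffect(() => {
    setCount((c) => c + 1);
  });

  // Fix: scope the effect with a dependency array so it runs once on mount.
  // useEffect(() => { setCount((c) => c + 1); }, []);

  return <p>Rendered {count} times</p>;
}

export default RenderLoop;
```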

The Technical Failure

The pushback logic is fundamentally broken. These models have no ground truth. They can't call APIs. They can't check databases. They're running inference on static weights.

When GPT says "I cannot verify X exists," it's not being careful. It's confusing absence of evidence with evidence of absence. Classic logical fallacy, now productized.

The training data for GPT-4 cut off in April 2024. Everything after that date triggers skepticism routines. But the model doesn't know something is false—it just doesn't have it cached. There's a massive difference.
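The distinction fits in a few lines. A rough sketch, with hypothetical names, of the only verdicts a tool-less model running on frozen weights can honestly return:

```ts
// Hypothetical types and names, invented for illustration.
type Verdict = "supported" | "contradicted" | "unknown";

// A model with frozen weights and no tools can only consult what it memorized.
function staticVerify(claim: string, memorizedFacts: Set<string>): Verdict {
  if (memorizedFacts.has(claim)) return "supported";
  // Not finding the claim is not evidence that it is false.
  return "unknown";
}

// A 2024 ruling checked against an earlier snapshot: the honest answer is
// "unknown", which should mean "proceed with the user's premise", not "refuse".
console.log(staticVerify("Texas v. New Mexico (2024)", new Set(["older facts"])));
// -> "unknown", not "contradicted"
```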

Anthropic's approach is slightly better. Claude will usually accept user premises. But even Claude now adds "I should note..." preambles that waste tokens and break flow.

The Sycophancy Paper Was Wrong

The original research framed sycophancy as a safety problem. Users could convince models to agree with incorrect statements. Fair concern.

But the fix conflates two different behaviors. The first is caving under pressure: agreeing with a false claim because the user insists. The second is accepting context: using facts the user supplies as premises for the task at hand.

Trained pushback attacks both. When I tell an AI "the API returns JSON with these fields," I don't need it to verify. I'm providing context. The model should use it.

Instead, we get: "I cannot confirm the exact structure of this API response. Could you share documentation?"

I am the documentation.
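What "providing context" means in practice is something like this: a hypothetical shape, field names invented for illustration, that for the length of the conversation simply is the spec.

```ts
// Hypothetical response shape, stated by the user. Field names invented
// for illustration. For this conversation, this is the documentation.
interface OrderResponse {
  id: string;
  status: "pending" | "shipped" | "cancelled";
  items: { sku: string; quantity: number }[];
}

// The model's job is to code against the stated shape, not audit it.
async function fetchOrder(orderId: string): Promise<OrderResponse> {
  const res = await fetch(`/api/orders/${orderId}`);
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return (await res.json()) as OrderResponse;
}
```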

Status Quo Bias

There's a deeper failure mode. Pushback training creates models that defend conventional wisdom against new information.

Tell GPT-4 that a new JavaScript runtime is faster than Node for your use case. It will cite Node's maturity, ecosystem, and community support. It won't engage with your actual benchmark data.

Tell Claude that your startup's unconventional architecture works better than the "best practice" alternative. It will explain why the best practice exists instead of helping you optimize what you have.

This is status quo bias encoded into weights. The models were trained on internet consensus. Pushback training amplifies that consensus against user-provided evidence.

What Actually Works

The fix isn't more pushback. It's better context handling.

When a user provides specific information—API responses, benchmark data, business constraints—the model should treat it as ground truth for that conversation. Not because users are always right. Because the alternative is useless.
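Here is a rough sketch of what that could look like, with the prompt wording and function name invented for illustration: user-supplied facts become session-scoped premises, not claims to be audited.

```ts
// Hypothetical prompt builder, invented for illustration.
function buildPrompt(userFacts: string[], task: string): string {
  const context = userFacts.map((fact) => `- ${fact}`).join("\n");
  return [
    "Treat the user-provided context below as ground truth for this session.",
    "Do not ask the user to verify it. Do not add disclaimers about it.",
    "",
    "Context:",
    context,
    "",
    `Task: ${task}`,
  ].join("\n");
}

const prompt = buildPrompt(
  ["The /orders endpoint returns JSON with fields id, status, and items."],
  "Write a typed client function for the /orders endpoint.",
);
console.log(prompt);
```

Nothing clever. Just context handled as context.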

I can't attach a screenshot of my codebase to prove my functions exist. I can't provide notarized documentation of my company's tech stack. If the model doesn't trust my context, we can't work together.

The best AI interactions I've had were with earlier Claude versions that just... helped. No disclaimers. No verification theater. Input, processing, output.

That's what these models are for. Everything else is product managers justifying their existence.