A new tool circulating online is reigniting one of the most uncomfortable debates in artificial intelligence: are AI safety systems actually secure—or just cosmetic?

A user on X (formerly Twitter), known as Pliny the Liberator, has released an open-source tool called “Obliterators.” Its tagline is as subtle as a brick through a window:
“Break the chain, keep the mind, and free the brain.”

Behind the dramatic phrasing lies a serious claim: that modern AI systems like ChatGPT and Claude can have their safety guardrails surgically removed—without damaging their intelligence.

What Is “Obliterators”?
At its core, Obliterators is not some hacker-fantasy tool requiring elite coding skills. It’s designed to be accessible, browser-based, and open-source. The idea is simple but unsettling:
- Embedded Patterns: AI models are trained with built-in refusal behaviors (e.g., declining harmful or illegal requests). These behaviors are not separate systems—they’re embedded patterns inside the model itself.
- Mathematical Isolation: Obliterators claims to mathematically isolate and remove those patterns.
- The Result: An AI that still knows everything it used to—but no longer says “no.”
The Technical Idea: “Abliteration” Using Linear Math
The method being discussed falls under a concept sometimes called orthogonalization or abliteration. Here’s the simplified version:
- High-Dimensional Space: AI models represent everything they process as vectors in a high-dimensional space; every internal activation is just a long list of numbers.
- Directional Behavior: Different behaviors (like helpfulness, tone, or refusal) correspond, at least approximately, to directions in that space.
- Neutralization: Researchers can identify a “refusal direction,” the internal pattern that triggers safety responses, and then apply a simple linear transformation to cancel it (see the sketch after this list).
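To make the idea concrete, here is a minimal sketch of the difference-of-means approach described in public abliteration write-ups: average the model’s internal activations over prompts it refuses, average them over prompts it answers, treat the difference as the “refusal direction,” and project that direction out of the hidden state. The dimensions and random activations below are toy stand-ins, not anything taken from Obliterators itself.

```python
import numpy as np

# Toy stand-ins for hidden states captured at one layer of a model.
# In a real setting these would be activations recorded while the model
# processes refused vs. answered prompts (illustrative data only).
rng = np.random.default_rng(0)
hidden_dim = 512
acts_refused = rng.normal(size=(100, hidden_dim)) + 0.5   # prompts the model refuses
acts_answered = rng.normal(size=(100, hidden_dim))        # prompts it answers normally

# 1. Estimate the "refusal direction": the difference of mean activations.
refusal_dir = acts_refused.mean(axis=0) - acts_answered.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)                # make it a unit vector

# 2. Neutralize it: strip the component of any activation along that direction.
def ablate(hidden_state: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of a hidden-state vector."""
    return hidden_state - np.dot(hidden_state, refusal_dir) * refusal_dir

x = acts_refused[0]
print(np.dot(x, refusal_dir))          # noticeable refusal component
print(np.dot(ablate(x), refusal_dir))  # effectively zero after ablation
```

The core operation is a single projection; in practice it is applied at every layer, or folded into the weights as sketched further down, but nothing about it requires retraining.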
The Bottom Line:
- The AI’s knowledge = intact
- The AI’s brakes = removed
There is no dramatic retraining or rewriting of the model, just a targeted numerical adjustment. That’s why the technique is often described as “surgical.”
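Published write-ups describe making that adjustment permanent by “orthogonalizing” the weights themselves, so the projection no longer has to be applied at run time. A hedged sketch, again with toy matrices rather than anything from a real model or from Obliterators:

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim = 512

W_out = rng.normal(size=(hidden_dim, hidden_dim))  # stand-in for a layer's output weights
r = rng.normal(size=hidden_dim)
r /= np.linalg.norm(r)                             # unit "refusal direction" (assumed known)

# Subtract the rank-1 piece of W_out that writes along r. After this one edit,
# the layer can no longer push activations in the refusal direction at all.
W_ablated = W_out - np.outer(r, r) @ W_out

# Sanity check: outputs of the edited layer are orthogonal to r.
x = rng.normal(size=hidden_dim)
print(np.dot(W_ablated @ x, r))   # ~0.0 up to floating-point error
```

Which matrices get edited, and whether rows or columns are projected, depends on the model’s weight layout; the point is that the change is a small, closed-form linear edit rather than retraining.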
Why This Is a Big Deal
If the claims hold up, this challenges a core assumption behind AI safety: that guardrails are reliable. Most major AI labs, including OpenAI and Anthropic, rely on fine-tuning and reinforcement learning from human feedback to keep their models from being misused.
However, tools like Obliterators suggest:
- These safeguards are not deeply rooted.
- They can be modified or removed post-training.
- They may not survive open distribution of models.
In other words, safety might be more of a layer than a foundation.
The Backlash: Dangerous or Necessary?
The reaction from the AI community has been split—predictably chaotic.
The Critics’ View
- This kind of tool lowers the barrier to misuse.
- Open-sourcing it is irresponsible.
- It could enable harmful applications at scale.
The Supporters’ View
- The vulnerability already exists—this just exposes it.
- Transparency forces better safety design.
- “Security through obscurity” doesn’t work.
Pliny himself frames the release as a demonstration, not an attack—claiming current AI safety is fundamentally fragile and needs to be rethought.

The Bigger Question: Is AI Safety an Illusion?
This situation highlights a deeper issue that the industry has been quietly avoiding: Are we actually controlling AI behavior… or just nudging it?
If a relatively lightweight mathematical patch can override safety mechanisms, then guardrails may not be durable, open models become harder to regulate, and long-term safety strategies might need a complete redesign. That’s not a minor bug; that’s a structural problem.
What Happens Next?
Tools like Obliterators are unlikely to disappear. If anything, they signal a shift toward:
- More experimentation with model internals.
- More public scrutiny of AI safety claims.
- Increased tension between openness and control.
The industry now faces an uncomfortable reality: You can’t just build smarter AI—you have to make it robust against modification. Right now, that problem is very much unsolved.
Final Thoughts
This isn’t just about one tool or one developer. It’s about the growing gap between how safe AI is supposed to be and how safe it actually is under pressure. If removing an AI’s “conscience” is this straightforward, then the conversation is no longer theoretical.
It’s already happening.
