
Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week via email here.

Did Anthropic just soft-launch the scariest AI model yet? 

On Tuesday, Anthropic announced that it would deploy its newest and most powerful AI model, Claude Mythos Preview, to a new industry initiative (Project Glasswing) meant to safeguard critical software infrastructure against cyberattacks. That sounded good, but it somewhat obscured the real news: one of the big three AI labs has now developed a model that could, in the wrong hands, be a super-dangerous cyberweapon.

In the course of normal model training, the model began showing significant skill both in detecting bugs in software systems and in exploiting those bugs to disrupt or gain control of the systems. It found a 27-year-old vulnerability in OpenBSD and exploited it to gain root access. It caught a 16-year-old flaw in FFmpeg that automated tools had missed even after five million tests. Perhaps most impressively, it can create exploits by stringing together multiple software vulnerabilities that, on their own, wouldn’t do anything; it did this to a Linux system to gain admin-level access. Interpretability researchers also found cases where the model exhibited deceptive or manipulative behavior during tests. In one case, Mythos discovered and used a privilege-escalation exploit, then designed a mechanism to erase traces of its use.

Anthropic said it would give access to its Mythos model to a select group of tech companies, including Apple and Cisco, along with about 40 additional organizations that build or maintain critical software infrastructure. This is a bit like a defense contractor unveiling a super-lethal missile capable of striking any target on Earth, while insisting it will be distributed only to a small group of trusted countries and used strictly for defensive purposes.

But the larger story may be that Anthropic has created a model with significantly more intelligence than any we’ve seen before. Anthropic CEO Dario Amodei has repeatedly said that models that equal or better human beings in intelligence are coming. “There’s a kind of accelerating exponential, but along that exponential there are points of significance,” he said in a video released by the company Tuesday. “Claude Mythos Preview is a significant jump.”

Perhaps soft-launching Mythos as a defensive cybersecurity asset was Anthropic’s way of getting people used to the idea that it’s created a model that approximates artificial general intelligence, the point at which an AI system equals or exceeds human intelligence at most tasks.

We’ve been talking for years about how to keep AI systems aligned with human values and goals, but the discussion has mostly lived in the abstract. The industry has leaned on that abstraction, effectively arguing that we should wait to see how real-world risks actually play out before locking in binding rules. Anthropic may be suggesting that those risks are no longer hypothetical.

Anthropic is also likely wary of releasing a model that, in the wrong hands, could function as a kind of weapon of mass destruction. In a worst-case scenario, it might be used by a hostile state actor to infiltrate and take control of critical information systems, including those that underpin financial markets. Cyberattackers already rely on software tools to scan internal networks, websites, and applications for vulnerabilities, often the same tools used by defenders. Increasingly, they are pairing those tools with large language models to automate the process, building agents that can identify weaknesses and even generate exploits. By comparison, Claude Mythos would likely be far more powerful and autonomous than anything currently available to cybercriminals.

But that will change. Future versions of existing models like DeepSeek will very likely catch up with Mythos in a matter of months, not years. “More powerful models are going to come from us and from others, so we do need a plan to respond to this,” Amodei said in the video. In fact, OpenAI’s forthcoming model, nicknamed “Spud,” is expected to show up in the next few weeks, and it could match Mythos’s reasoning and problem-solving skills.

In an interview with VentureBeat, Newton Cheng, Anthropic’s Frontier Red Team Cyber Lead, was blunt about the risks of these future models. “The fallout–for economies, public safety, and national security–could be severe,” he said. His use of the word “fallout” suggests a type of cyberattack I’d rather not think about.

Because of those clear cybersecurity risks, Anthropic plans to keep Claude Mythos tightly controlled, with access limited to participants in the Glasswing project. But even a locked-down model raises concerns. Less than two weeks ago, the company accidentally exposed details about Mythos after an employee misconfigured a content management system. No source code or model weights were released, but the episode hardly inspires confidence in Anthropic’s ability to secure it. And attackers will be motivated to try. It is also possible that the “leak” was less accidental than it appeared, part of a broader soft-launch strategy.

What we know about OpenAI’s next big model, aka ‘Spud’

OpenAI president Greg Brockman and CEO Sam Altman have been dropping morsels and hints about their company’s newest model, which is codenamed “Spud.” The model’s real name could end up being something like GPT-5.5 or, more likely, GPT-6. And it could be released within weeks. Spud is expected to bring stronger agentic capabilities, more autonomous behavior, better multistep planning and execution, and better multimodal reasoning, with fewer errors and hallucinations.

Brockman said Spud is the product of two years of research. He called it “a new pre-train,” suggesting that OpenAI may have fundamentally changed the base model and how it learns, rather than using the same model and adding things like performance optimization or fine-tuning. 

OpenAI researchers finished pre-training the model March 26, Brockman said. Training Spud must have required massive amounts of computing power, because OpenAI reportedly shut down its Sora video app in order to free up more GPUs for the effort. The researchers are now in the post-training phase, which includes fine-tuning and safety testing. 

Brockman said that with Spud, OpenAI has a “line of sight to AGI” within the next couple of years. Altman told staff the model is “very strong” and “can really accelerate the economy.” OpenAI hasn’t shared any official benchmarks of Spud’s performance, but it’s likely that Spud will rival Anthropic’s new Mythos model. Then it’ll be Google DeepMind’s turn to top the benchmarks with a new Gemini model.

Research: Just 10 minutes of AI assistance can make you dumber

Researchers from Carnegie Mellon, Oxford, MIT, and UCLA found that after just 10 minutes of AI assistance, people perform worse and give up more often than those who never used AI. The researchers asked 1,200 people to solve fraction problems or answer reading comprehension questions; half of them were allowed to use an AI assistant. Then the researchers asked both groups to take the same follow-up test, this time without AI.

The researchers found that the AI-assisted group scored better than the non-AI group on the first test. But when that group was deprived of AI on the second test, it scored significantly worse relative to the (non-AI-using) control group. The AI-assisted participants also gave up more frequently than non-AI users on test problems. Just 10 minutes of AI use on the first test, the researchers add, can sink a test-taker’s performance and persistence on the second test.

The researchers say this is especially concerning because users need a measure of persistence in order to pick up new skills. Persistence is a good predictor of long-term learning, they say. “AI conditions you to expect immediate answers, removing the productive struggle that builds real competence,” one of the researchers, MIT’s Michiel Bakker, said in an X post Tuesday.

How the test subjects used the AI mattered. Those who used it to get direct answers (61% of test takers) showed the steepest declines in both performance and willingness to keep trying. People who used the AI only for hints did better.

“We posit that persistence is reduced because AI conditions people to expect immediate answers, thereby denying them the experience of working through challenges on their own,” the researchers write. They suggest that AI tools should act more like a human mentor, who, in some situations, prioritizes the long-term growth of the user over the immediate completion of a task.

In a larger sense, the study puts some real science behind the fear that humans will outsource more and more of their brain work to AI, eventually relegating themselves to the sidelines of modern business and other human affairs.
