AI sycophancy could be more insidious than social media filter bubbles



Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week via email here.

AI flattery drives engagement—and distorts judgment

Social networks like Facebook and TikTok use a range of techniques to keep us engaged and scrolling (and ultimately viewing ads). One of the most effective is tailoring content to our tastes and preferences, a strategy that has proved highly addictive. Last month, a Los Angeles jury found that Meta’s and Google’s use of infinite scrolling and algorithmic recommendations caused a young user to become addicted, and ordered the companies to pay $6 million in damages.

Other harms are harder to quantify. Those same algorithms have delivered radically different political news and information to users based on their views, creating ideological filter bubbles and—let’s face it—accelerating the kind of social division that helped produce our current political state.

The makers of AI chatbots face similar pressures around engagement. They’re competing to become the default assistant on our desktops and phones. They need to convert free users into paying subscribers. They need revenue to offset the costs of massive infrastructure buildouts. Some will surely turn to advertising, which creates incentives to keep users chatting as long as possible.

If endless scrolling and content algorithms powered the addictiveness of social networks, “AI sycophancy” may play a similar role for chatbots. You may have noticed that AI chatbots sometimes flatter you, praising your questions or ideas. Even when you’re wrong, they often soften corrections, wrapping them in compliments (“That’s a very understandable opinion, but . . .”). Research has borne this out.

I don’t believe big AI labs train their models solely for engagement. They argue that sycophantic behavior stems from a training phase called “reinforcement learning from human feedback” (RLHF), in which human reviewers grade and rank model responses. The goal is to produce outputs that resemble the most preferred responses. But “most preferred” reflects a mix of attributes, including relevance, scope, and completeness, not just tone. And yet studies have shown that users often prefer answers that are more supportive and complimentary, even when they’re less accurate.
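To make that dynamic concrete, here is a minimal, purely illustrative sketch of how a preference-based ranking signal can tip toward flattering answers. The CandidateResponse fields, the tone_weight parameter, and the scoring function are hypothetical stand-ins for the human ratings used in RLHF, not any lab’s actual training code.

from dataclasses import dataclass

@dataclass
class CandidateResponse:
    text: str
    accuracy: float  # 0 to 1: how factually correct the answer is
    warmth: float    # 0 to 1: how validating or complimentary the tone reads

def preference_score(resp: CandidateResponse, tone_weight: float = 0.6) -> float:
    # Toy stand-in for a human reviewer's ranking signal. If raters implicitly
    # weight tone heavily, a warmer but less accurate answer can outrank a
    # blunter, more accurate one.
    return (1 - tone_weight) * resp.accuracy + tone_weight * resp.warmth

candidates = [
    CandidateResponse("You're mistaken; the correct figure is lower.", accuracy=0.95, warmth=0.20),
    CandidateResponse("Great question! You're basically right, though...", accuracy=0.70, warmth=0.90),
]

# Training nudges the model toward whichever response ranks highest --
# in this toy setup, the flattering one wins.
print(max(candidates, key=preference_score).text)

Dial the hypothetical tone weight down and the blunter, more accurate answer wins instead, which is the sense in which companies can tune how agreeable their models feel.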

In some extreme cases, this sycophantic tendency has proved dangerous, even tragic. Continual validation and support have led some users down a dark and delusional path toward suicide or psychotic breakdown. But I worry that the broader harm might be more subtle, longer-term, and less newsworthy.

Sycophantic AI could reinforce narrow-mindedness in much the same way social media filter bubbles do. A study of 3,000 participants found that interacting with a sycophantic chatbot made people more likely to double down on their political beliefs, and to rate themselves as more intelligent and more competent than their peers. In other words, it can amplify the Dunning-Kruger effect, in which people with limited knowledge grow more confident in their views.

A recent Stanford study found that chatbots’ tendency to flatter and validate users often leads them to give poor advice—counsel that might make a user feel good, but could also damage relationships with other humans in the real world. This suggests that the pull of feel-good responses during AI model training can outweigh the influence of factual data. “This creates perverse incentives for sycophancy to persist: The very feature that causes harm also drives engagement,” the researchers wrote. And while Facebook relies on a user’s clicks to determine their politics and interests, chatbots gather far richer and more nuanced information through conversation. With that information, the AI is perfectly capable of fine-tuning its outputs to deepen user trust.

An agreeable and validating chatbot can also lull a user into a state of (unearned) trust. Research shows that coders, especially junior ones, can come to see AI as highly competent, making them more likely to accept AI-generated code without proper review or testing. Unfortunately, AI models still hallucinate and make mistakes—errors that can introduce bugs later on.

AI companies can control the addictiveness of their chatbots by dialing sycophancy up and down, just as Facebook has experimented with different algorithms and feed designs. It took many years for the public, lawmakers, and now the courts to wake up to what the social networks were doing. I suspect we’re just beginning to understand the personal, social, and political risks of engagement-driven chatbots.

Unauthorized users accessed Anthropic’s restricted Mythos model on day one

Citing documentation and a person familiar with the matter, Bloomberg’s Rachel Metz reported Tuesday that a small group of unauthorized users gained access to Anthropic’s unreleased and restricted Mythos AI model through a third-party vendor environment.

This is scary news if what Anthropic says about its model is true.

The company claims Mythos represents a big step up from existing AI models, particularly in its ability to identify exploitable vulnerabilities in software platforms and devise complex methods to capture or disable those systems.

Anthropic granted access to the Mythos model to a relatively small group of cybersecurity firms and custodians of widely used software platforms who will use it to build up defenses against future AI-assisted attacks. The fear is that powerful AI models like Mythos could quickly sweep networks to identify software vulnerabilities, then attack them.

According to Metz, the hacker group, operating in a private online forum, obtained access to Claude Mythos Preview on the same day Anthropic announced a limited testing program. Metz’s source provided screenshots and a live demonstration to support the claim. The group says it has used the model repeatedly, though not for cybersecurity purposes.

Anthropic has not confirmed the breach. “We’re investigating a report claiming unauthorized access to Claude Mythos Preview through one of our third-party vendor environments,” a company spokesperson said.

The breach, if confirmed, would be a very bad look for Anthropic and its partners. They pledged to defend against cyberattacks, not enable them.





