As part of its AI welfare research, Anthropic has added a feature that lets its Claude Opus 4 and 4.1 models end conversations with users, a measure reserved for rare instances of harmful or abusive interactions.
In a post on its website, the company said that Claude Opus 4 and 4.1 can now end a conversation with a user. The capability is reserved for “rare, extreme cases of persistently harmful or abusive user interactions.” Examples Anthropic cites include requests for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror.
The models will end a conversation only “as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted.” Anthropic expects that the vast majority of users will never encounter this behavior, even when discussing controversial subjects, since it is limited to “extreme edge cases.”
When Claude ends a chat, the user can no longer send new messages in that conversation, but they can start a new conversation immediately. Anthropic clarified that ending one conversation does not affect other ongoing chats, and users can still edit or retry earlier messages in the ended conversation to steer the interaction in a different direction.
The feature is part of Anthropic’s broader research program on AI welfare. The company views allowing its models to exit a “potentially distressing interaction” as a low-cost way to manage risks associated with AI welfare. Anthropic is currently treating the feature as an experiment and has invited users to share feedback on their experiences.