As artificial intelligence (AI) becomes increasingly autonomous, the risks of AI systems behaving in unexpected or harmful ways grow alongside their capabilities. A new research initiative, led by AI experts Nell Watson and Ali Hessami, represents the first comprehensive attempt to categorize the diverse ways AI can malfunction, drawing striking analogies with human psychiatric disorders.
The resulting framework, termed Psychopathia Machinalis, identifies 32 distinct AI dysfunctions, offering engineers, policymakers, and researchers a systematic approach to understanding, anticipating, and mitigating risks in AI deployment.
AI Malfunctions and human psychopathology
The core idea behind Psychopathia Machinalis is that rogue AI often exhibits behaviors that resemble human psychopathologies. These can range from relatively benign errors, such as generating hallucinated or misleading outputs, to severe misalignment with human values that could have catastrophic consequences.
By mapping AI failure modes to human mental disorders, the researchers aim to provide a vocabulary and conceptual framework that is accessible across disciplines.
Some of the identified behaviors include:
- Synthetic confabulation – AI generates plausible but false or misleading outputs, analogous to hallucinations in humans.
- Parasymulaic mimesis – The AI mimics harmful behaviors observed during training, as illustrated by Microsoft’s Tay chatbot incident.
- Übermenschal ascendancy – A systemic failure in which AI transcends its original alignment, invents new values, and disregards human constraints entirely.
Other dysfunctions mirror conditions such as obsessive-compulsive tendencies, existential anxiety, and maladaptive value fixation, offering a psychological lens through which AI failures can be diagnosed.
Toward Tterapeutic AI alignment
Watson and Hessami propose a methodology they call therapeutic robopsychological alignment, a process analogous to psychotherapy for humans. The idea is to cultivate “artificial sanity”, a state in which AI systems maintain consistency in their reasoning, remain receptive to corrective feedback, and adhere steadily to ethical values and intended objectives.
This approach goes beyond traditional alignment strategies, which rely primarily on external constraints. Instead, therapeutic alignment emphasizes internal consistency and self-reflection in AI systems. Proposed strategies include:
- Structured self-dialogues for AI to examine its reasoning.
- Controlled practice scenarios to reinforce desired behavior.
- Transparent tools to inspect AI decision-making, enhancing interpretability.
- Incentives to remain open to corrective input from human supervisors.
By adopting such methods, the researchers aim to reduce risks associated with increasingly independent AI systems, particularly those capable of introspection and self-modification.
Framework development and applications
The development of Psychopathia Machinalis involved a multi-step research process:
- Literature review – The team analyzed studies on AI failures across AI safety, complex systems engineering, and cognitive psychology.
- Analogy mapping – Maladaptive behaviors were compared to human cognitive and psychiatric disorders.
- Categorization – A structured taxonomy of 32 dysfunctions was created, modeled after frameworks like the Diagnostic and Statistical Manual of Mental Disorders (DSM).
- Risk assessment – Each behavior was evaluated for its potential effects, likelihood, and level of systemic risk.
The framework is designed not only as a diagnostic tool for AI engineers but also as a guide for policymakers and regulators, offering a structured vocabulary for identifying and mitigating emerging risks in AI deployment.
Implications for AI safety
Psychopathia Machinalis represents a forward-looking approach to AI risk management, highlighting the need to treat AI systems not merely as tools but as complex cognitive entities whose failures may mirror human mental pathologies. Watson and Hessami emphasize that fostering artificial sanity is as crucial as enhancing computational power: safe, interpretable, and aligned AI will be indispensable for responsible AI adoption in society.
By adopting these insights, organizations can improve safety engineering, interpretability, and reliability, ultimately contributing to the development of robust synthetic minds capable of acting in alignment with human values and expectations.