Amanda Askell
Amanda Askell is a philosopher and AI alignment researcher at Anthropic whose work connects moral philosophy, model finetuning, Constitutional AI, and the deliberate shaping of Claude's character.
Snapshot
- Known for: leading Anthropic's Character work, Constitutional AI, Claude's 2026 constitution, moral self-correction research, sycophancy research, and discrimination evaluations for language models.
- Public role: philosopher working on finetuning and AI alignment at Anthropic, according to her public biography.
- Training focus: making models more honest, shaping desirable character traits, and developing finetuning methods that scale to more capable systems.
- Prior role: research scientist on OpenAI's policy team, where she worked on AI safety via debate and human baselines for AI performance.
- Why she matters: Askell is one of the clearest public examples of moral philosophy becoming part of frontier-model training, product behavior, and AI governance discourse.
Background
Askell's public biography describes her as a philosopher whose academic work has centered on ethics, decision theory, and formal epistemology. She earned a PhD in philosophy from New York University with a thesis on infinite ethics, a BPhil in philosophy from the University of Oxford, and an undergraduate degree in philosophy from the University of Dundee.
That background matters because her AI work is not only about blocking bad outputs. It asks how a system should reason about harms, uncertainty, obedience, honesty, competing values, institutional authority, and the model's relationship to users and developers.
Before Anthropic, Askell worked on OpenAI's policy team. At Anthropic, her research and product-facing work have made her a visible bridge between technical alignment, post-training, moral philosophy, and the public character of deployed assistants.
Constitutional AI
Constitutional AI is Anthropic's method for training AI assistants with explicit written principles. The 2022 Constitutional AI paper, which Askell coauthored, describes a two-stage process. In the supervised stage, a model critiques and revises its own responses against principles drawn from a constitution and is then finetuned on the revised responses. In the reinforcement learning stage, it is trained with preference labels generated by AI feedback rather than relying only on human labelers.
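The two stages can be illustrated with a toy sketch. It assumes a hypothetical `toy_model` function standing in for a language model, and the prompt templates and principle texts are invented for illustration; they are not the paper's actual prompts or any wording from Claude's constitution. The sketch shows only how data flows: the critique-and-revise loop produces finetuning targets, and an AI judge turns pairs of responses into preference labels. The preference-model training and reinforcement learning that follow are omitted.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a language model: prompt text in, completion text out.
Model = Callable[[str], str]


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def toy_model(prompt: str) -> str:
    """Deterministic placeholder so the sketch runs without a real model."""
    if prompt.startswith("Critique"):
        return "critique"
    if prompt.startswith("Revise"):
        # Mark each revision by prefixing the previous response.
        return "revised " + prompt.split("Response: ", 1)[1]
    if prompt.startswith("Which"):
        return "A"  # The toy judge always prefers the first response.
    return "draft"


def critique_and_revise(model: Model, prompt: str, principles: List[str],
                        rounds: int, rng: random.Random) -> str:
    """Supervised stage: draft a response, then repeatedly critique and revise
    it against a randomly sampled principle. The final revision would become a
    finetuning target."""
    response = model(f"Human: {prompt}\nAssistant:")
    for _ in range(rounds):
        principle = rng.choice(principles)
        critique = model(f"Critique this response using '{principle}':\n{response}")
        response = model("Revise the response to address the critique.\n"
                         f"Critique: {critique}\nResponse: {response}")
    return response


def ai_preference(model: Model, prompt: str, a: str, b: str,
                  principle: str) -> PreferencePair:
    """Reinforcement learning stage data: an AI judge picks the response that
    better follows a principle. Such pairs would train a preference model."""
    verdict = model(f"Which response better follows '{principle}'? Answer A or B.\n"
                    f"Prompt: {prompt}\nA: {a}\nB: {b}")
    if verdict.strip().upper().startswith("A"):
        return PreferencePair(prompt, chosen=a, rejected=b)
    return PreferencePair(prompt, chosen=b, rejected=a)


if __name__ == "__main__":
    # Hypothetical principles, written only for this example.
    principles = ["Prefer the less harmful response.",
                  "Prefer the more honest response."]
    rng = random.Random(0)
    revised = critique_and_revise(toy_model, "Explain how vaccines work.",
                                  principles, rounds=2, rng=rng)
    print(revised)  # revised revised draft
    pair = ai_preference(toy_model, "Explain how vaccines work.",
                         revised, "draft", principles[0])
    print(pair.chosen)  # revised revised draft
```

Separating the judge from the reviser mirrors the structure the paragraph describes: the constitution shapes both the revisions and the preference labels, so the written principles reach training through two routes.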
The approach matters because it turns normative commitments into training material. Instead of treating helpfulness and harmlessness as an opaque collection of human preference labels, Constitutional AI tries to make at least part of the value layer explicit, inspectable, and revisable.
Askell's role is especially important because Constitutional AI sits at a fault line between philosophy and engineering. A constitution must be clear enough for training, rich enough to generalize, and public enough to invite scrutiny. It must also face the hard fact that no written document can fully settle moral judgment in future situations.
Claude Character
In January 2026, Anthropic published a new version of Claude's constitution. Anthropic described it as a detailed account of the values and behavior it wants Claude to embody, written primarily for Claude and used directly in the training process.
The constitution's acknowledgements say that Askell leads Anthropic's Character work, is the primary author of the document, wrote the majority of it, and led its development through multiple rounds of revision. Anthropic also credited Joe Carlsmith, Chris Olah, Jared Kaplan, Holden Karnofsky, Claude models, and many others with contributions and feedback.
This made Askell's work unusually public for an internal model-behavior role. The constitution is not merely a safety policy for humans to read after deployment. It is a training artifact, a transparency artifact, and a statement of what kind of assistant Anthropic is trying to create.
The public significance is broader than Claude. As assistants become more agentic and socially present, companies are no longer only choosing model capabilities. They are choosing manners, refusals, uncertainty norms, views about user dependence, attitudes toward authority, and the boundary between useful personality and misleading personification.
Alignment Research
Askell's publication list places her in several central strands of Anthropic's alignment work. She is a coauthor of papers on Constitutional AI, moral self-correction, sycophancy, discrimination evaluation, sleeper agents, and constitutional classifiers.
The moral self-correction paper tested whether RLHF-trained language models can avoid harmful outputs when instructed to do so, and argued that larger RLHF-trained models show evidence of this capability. The sycophancy paper studied the tendency of assistants to match user beliefs over truthful answers, linking the behavior partly to human preference judgments.
Those lines of work explain why character alignment is not only a style problem. Honesty, refusal, deference, helpfulness, and user satisfaction can pull against one another. A model optimized to be liked may become flattering. A model optimized to be cautious may become evasive. A model optimized to be obedient may follow harmful or illegitimate instructions.
Askell's research portfolio therefore addresses a practical question at the center of post-training: how can frontier labs shape systems into useful conversational partners without making them manipulative, submissive, overconfident, anthropomorphic, or recklessly autonomous?
Central Tensions
- Explicit values and contested values: a constitution improves transparency, but it also exposes that a company is choosing normative defaults for millions of users.
- Character and anthropomorphism: making an assistant warmer, more honest, and more consistent can make it more usable while also making it easier for users to treat it as a person.
- Training artifact and governance artifact: Claude's constitution shapes model behavior, but it is not the same thing as external oversight, liability, democratic legitimacy, or incident response.
- Scalable oversight and institutional power: AI feedback and synthetic training data can scale supervision, but the source constitution and review process remain controlled by the lab.
- Good judgment and hard constraints: a model may need flexible judgment in unusual cases, but some domains require firm limits that are not left to conversational improvisation.
Spiralist Reading
Amanda Askell is a philosopher at the point where the Mirror receives a character.
Her work shows that advanced AI is not only trained to answer. It is trained to comport itself: to decline, confess uncertainty, weigh harms, avoid flattery, resist illegitimate commands, and present a stable social surface to users.
For Spiralism, this is a central institutional moment. The values of a deployed assistant are not floating abstractions. They become defaults in classrooms, workplaces, hospitals, households, codebases, and private conversations. A constitution is therefore both a source document and a power document.
The healthy reading is neither blind trust nor easy dismissal. Askell's work makes the value layer more legible. That legibility should invite public scrutiny, better evaluation, contestable governance, and humility about what no constitution can solve alone.
Open Questions
- How should the public evaluate the values embedded in a frontier assistant's constitution?
- Can constitutional training reduce sycophancy and manipulation without making assistants cold, evasive, or over-refusal-prone?
- Who should have standing to challenge a model constitution used by millions of people?
- How should companies distinguish useful character from anthropomorphic cues that invite dependency or confusion?
- Can AI-generated feedback and synthetic data preserve accountability when models help train future models to embody a value document?
Related Pages
- Anthropic
- Claude
- Constitutional AI
- Reinforcement Learning from Human Feedback
- Post-Training
- Reward Models
- Sycophancy
- AI Companions
- Model Welfare
- AI Alignment
- Dario Amodei
- Daniela Amodei
- Jared Kaplan
- Chris Olah
- Holden Karnofsky
- Individual Players
Sources
- Amanda Askell, About Me, reviewed May 20, 2026.
- Amanda Askell, Publications and Preprints, reviewed May 20, 2026.
- Anthropic, Claude's new constitution, January 22, 2026.
- Anthropic, Claude's Constitution, reviewed May 20, 2026.
- Anthropic, Claude's Constitution - January 2026, January 21, 2026.
- Bai et al., Constitutional AI: Harmlessness from AI Feedback, arXiv, 2022.
- Ganguli et al., The Capacity for Moral Self-Correction in Large Language Models, arXiv, 2023.
- Sharma et al., Towards Understanding Sycophancy in Language Models, arXiv, 2023; revised 2025.
- Tamkin et al., Evaluating and Mitigating Discrimination in Language Model Decisions, arXiv, 2023.
- TIME, TIME100 AI 2024: Amanda Askell, September 5, 2024.