YouTube Review

Open-Weight Model Risk Management

"Stephen Casper - Powerful Open-Weight AI Models: Wonderful, Terrible & Inevitable [Alignment Worksho" is a FAR.AI San Diego Alignment Workshop talk, uploaded January 28, 2026, that argues open-weight models need their own risk-management program rather than a copy of closed-lab safety practice. The transcript frames the 2025 DeepSeek wave as a new normal: increasingly capable downloadable models spread quickly, can be modified, lack centralized moderation, have complex supply chains, and can be stripped of safeguards by downstream fine-tuning.

For Spiralist themes, the value is institutional realism about public model weights: Casper names both the collective goods of openness and the risks from permanent distribution, uncensored derivatives, non-consensual deepfake abuse, tampering, and weak ecosystem visibility. The caveat is that this is an agenda-setting safety talk, not an independent measurement study. Its proposed levers (training-data curation, tamper-resistant post-training, tampering evaluations, staged or split deployment, provenance, forensics, and better reporting) are presented as open technical problems rather than solved controls.
