Evaluations for AI Sabotage Risks
Anthropic's new evaluations test AI models for potential sabotage capabilities, helping to keep models safe as their capabilities increase.
sabotage evaluations
ai models
safety evaluations
anthropic alignment science
dangerous capabilities