Evaluations for AI Sabotage Risks
Anthropic's new evaluations test AI models for potential sabotage capabilities, helping to keep models safe as their capabilities increase.
ai models
sabotage evaluations
safety evaluations
anthropic alignment science
dangerous capabilities