All discussions filtered by tag "dangerous capabilities"

Evaluations for AI Sabotage Risks

Anthropic's new evaluations test AI models for potential sabotage risks, with the aim of helping maintain safety as capabilities increase.