A New Challenge for AI: Humanity’s Last Exam
If you’re not familiar with benchmarks, they’re how we measure the capabilities of AI models such as OpenAI’s o1 or Claude 3.5 Sonnet. Each one is a standardised test designed to check a specific skill set.