Brief notes on the multi-modal HLE AI benchmark built to test frontier models after traditional AI leaderboards hit saturation.
Humanity's Last Exam Benchmark