New math benchmark reveals AI models confidently solve problems that have no solution

A consortium of mathematicians has created SOOHAK, a new AI benchmark of 439 handwritten tasks.
Of these, 99 are deliberately unsolvable. No model correctly identified more than half of these unsolvable problems.

The findings reveal a significant gap in AI models' ability to check whether a problem is solvable at all before confidently presenting a solution. The results suggest that AI systems still need substantial improvements in robustness and reliability.