- A consortium of mathematicians has created a new AI benchmark called SOOHAK, featuring 439 handwritten tasks.
- Among these are 99 deliberately unsolvable problems, yet no model correctly identified more than half of them as unsolvable.

The findings highlight significant gaps in AI's ability to check whether a problem can be solved at all, rather than only verifying its own solutions. This research underscores the need for broader and deeper improvements in the robustness and reliability of AI systems.
Originally published at the-decoder.com. Curated by AI Maestro.




