Apple Unveils Critique of AI Reasoning with New Benchmark