THE AMERICA ONE NEWS
Zero Hedge
29 Apr 2025


Visualizing AI vs. Human Performance In Technical Tasks

The gap between human and machine reasoning is narrowing...and fast.

Over the past year, AI systems have continued to see rapid advancements, surpassing human performance in technical tasks where they previously fell short, such as advanced math and visual reasoning.

This graphic, via Visual Capitalist's Kayla Zhu, visualizes AI systems’ performance relative to human baselines for eight AI benchmarks measuring tasks including:

  1. Image classification

  2. Visual reasoning

  3. Medium-level reading comprehension

  4. English language understanding

  5. Multitask language understanding

  6. Competition-level mathematics

  7. PhD-level science questions

  8. Multimodal understanding and reasoning

This visualization is part of Visual Capitalist’s AI Week, sponsored by Terzo. Data comes from the Stanford University 2025 AI Index Report.

An AI benchmark is a standardized test used to evaluate the performance and capabilities of AI systems on specific tasks.

Below, we show how AI models have performed relative to the human baseline in various technical tasks in recent years.

Year | Performance relative to the human baseline (100%) | Task
2012 | 89.15% | Image classification
2013 | 91.42% | Image classification
2014 | 96.94% | Image classification
2015 | 99.47% | Image classification
2016 | 100.74% | Image classification
2016 | 80.09% | Visual reasoning
2017 | 101.37% | Image classification
2017 | 82.35% | Medium-level reading comprehension
2017 | 86.49% | Visual reasoning
2018 | 102.85% | Image classification
2018 | 96.23% | Medium-level reading comprehension
2018 | 86.70% | Visual reasoning
2019 | 103.75% | Image classification
2019 | 36.08% | Multitask language understanding
2019 | 103.27% | Medium-level reading comprehension
2019 | 94.21% | English language understanding
2019 | 90.67% | Visual reasoning
2020 | 104.11% | Image classification
2020 | 60.02% | Multitask language understanding
2020 | 103.92% | Medium-level reading comprehension
2020 | 99.44% | English language understanding
2020 | 91.38% | Visual reasoning
2021 | 104.34% | Image classification
2021 | 7.67% | Competition-level mathematics
2021 | 66.82% | Multitask language understanding
2021 | 104.15% | Medium-level reading comprehension
2021 | 101.56% | English language understanding
2021 | 102.48% | Visual reasoning
2022 | 103.98% | Image classification
2022 | 57.56% | Competition-level mathematics
2022 | 83.74% | Multitask language understanding
2022 | 101.67% | English language understanding
2022 | 104.36% | Visual reasoning
2023 | 47.78% | PhD-level science questions
2023 | 93.67% | Competition-level mathematics
2023 | 96.21% | Multitask language understanding
2023 | 71.91% | Multimodal understanding and reasoning
2024 | 108.00% | PhD-level science questions
2024 | 108.78% | Competition-level mathematics
2024 | 102.78% | Multitask language understanding
2024 | 94.67% | Multimodal understanding and reasoning
2024 | 101.78% | English language understanding

From ChatGPT to Gemini, many of the world’s leading AI models are surpassing the human baseline in a range of technical tasks.

The only task where AI systems still haven’t caught up to humans is multimodal understanding and reasoning, which involves processing and reasoning across multiple formats and disciplines, such as images, charts, and diagrams.
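
To check that claim against the data, here is a minimal Python sketch. It takes the most recent value reported for each task in the data table and lists the tasks still under the 100% human baseline. The dictionary layout and the function name below are illustrative choices, not part of the original dataset.

```python
# Most recent value reported for each task in the article's data table,
# stored as (year, percent of the human baseline).
latest = {
    "Image classification": (2022, 103.98),
    "Visual reasoning": (2022, 104.36),
    "Medium-level reading comprehension": (2021, 104.15),
    "English language understanding": (2024, 101.78),
    "Multitask language understanding": (2024, 102.78),
    "Competition-level mathematics": (2024, 108.78),
    "PhD-level science questions": (2024, 108.00),
    "Multimodal understanding and reasoning": (2024, 94.67),
}

HUMAN_BASELINE = 100.0  # the human baseline is normalized to 100%


def below_baseline(scores):
    """Return the tasks whose latest value is under the human baseline."""
    return [task for task, (_, pct) in scores.items() if pct < HUMAN_BASELINE]


print(below_baseline(latest))
```

With the table's values, only multimodal understanding and reasoning comes back, which matches the statement above.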

However, the gap is closing quickly.

In 2024, OpenAI’s o1 model scored 78.2% on MMMU, a benchmark that evaluates models on multi-discipline tasks demanding college-level subject knowledge.

This was just 4.4 percentage points below the human benchmark of 82.6%. The o1 model also has one of the lowest hallucination rates out of all AI models.

The o1 score was a major jump from the end of 2023, when Google Gemini scored just 59.4%, highlighting the rapid improvement of AI performance in these technical tasks.
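
The MMMU figures quoted here line up with the relative values in the data table if "performance relative to the human baseline" is taken to be the model score divided by the human score. That is an assumption on our part, since the article does not spell out the formula, but the short Python sketch below reproduces the table's 94.67% and 71.91% entries from the raw scores. The function name is hypothetical.

```python
def relative_to_human(score, human_score):
    """Express a benchmark score as a percentage of the human score.

    Assumption: the relative figure is a simple ratio, score / human_score.
    """
    return round(score / human_score * 100, 2)


HUMAN_MMMU = 82.6   # human benchmark on MMMU, per the article
O1_2024 = 78.2      # OpenAI o1, 2024
GEMINI_2023 = 59.4  # Google Gemini, end of 2023

print(relative_to_human(O1_2024, HUMAN_MMMU))      # relative score for 2024
print(relative_to_human(GEMINI_2023, HUMAN_MMMU))  # relative score for 2023

# Gap to the human benchmark in percentage points
gap = round(HUMAN_MMMU - O1_2024, 1)
print(gap)
```

Under that assumption, the 4.4-point gap in raw score corresponds to o1 reaching roughly 95% of the human result.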
