
America One News


ZeroHedge

29 Apr 2025

Visualizing AI vs. Human Performance In Technical Tasks


The gap between human and machine reasoning is narrowing...and fast.

Over the past year, AI systems have continued to see rapid advancements, surpassing human performance in technical tasks where they previously fell short, such as advanced math and visual reasoning.

This graphic, via Visual Capitalist's Kayla Zhu, visualizes AI systems’ performance relative to human baselines for eight AI benchmarks measuring tasks including:

Image classification
Visual reasoning
Medium-level reading comprehension
English language understanding
Multitask language understanding
Competition-level mathematics
PhD-level science questions
Multimodal understanding and reasoning

This visualization is part of Visual Capitalist’s AI Week, sponsored by Terzo. Data comes from the Stanford University 2025 AI Index Report.

An AI benchmark is a standardized test used to evaluate the performance and capabilities of AI systems on specific tasks.

Below, we show how AI models have performed relative to the human baseline in various technical tasks in recent years.

Year | Performance relative to the human baseline (100%) | Task
2012 | 89.15% | Image classification
2013 | 91.42% | Image classification
2014 | 96.94% | Image classification
2015 | 99.47% | Image classification
2016 | 100.74% | Image classification
2016 | 80.09% | Visual reasoning
2017 | 101.37% | Image classification
2017 | 82.35% | Medium-level reading comprehension
2017 | 86.49% | Visual reasoning
2018 | 102.85% | Image classification
2018 | 96.23% | Medium-level reading comprehension
2018 | 86.70% | Visual reasoning
2019 | 103.75% | Image classification
2019 | 36.08% | Multitask language understanding
2019 | 103.27% | Medium-level reading comprehension
2019 | 94.21% | English language understanding
2019 | 90.67% | Visual reasoning
2020 | 104.11% | Image classification
2020 | 60.02% | Multitask language understanding
2020 | 103.92% | Medium-level reading comprehension
2020 | 99.44% | English language understanding
2020 | 91.38% | Visual reasoning
2021 | 104.34% | Image classification
2021 | 7.67% | Competition-level mathematics
2021 | 66.82% | Multitask language understanding
2021 | 104.15% | Medium-level reading comprehension
2021 | 101.56% | English language understanding
2021 | 102.48% | Visual reasoning
2022 | 103.98% | Image classification
2022 | 57.56% | Competition-level mathematics
2022 | 83.74% | Multitask language understanding
2022 | 101.67% | English language understanding
2022 | 104.36% | Visual reasoning
2023 | 47.78% | PhD-level science questions
2023 | 93.67% | Competition-level mathematics
2023 | 96.21% | Multitask language understanding
2023 | 71.91% | Multimodal understanding and reasoning
2024 | 108.00% | PhD-level science questions
2024 | 108.78% | Competition-level mathematics
2024 | 102.78% | Multitask language understanding
2024 | 94.67% | Multimodal understanding and reasoning
2024 | 101.78% | English language understanding
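To see when each task first reached the human baseline, the figures above can be scanned with a short script. This is a minimal Python sketch with the values transcribed from the data above; the names `SCORES` and `first_year_at_or_above` are illustrative and do not come from the source.

```python
from typing import Dict, Optional

# Performance relative to the human baseline (100%), by task and year,
# transcribed from the article's data.
SCORES: Dict[str, Dict[int, float]] = {
    "Image classification": {
        2012: 89.15, 2013: 91.42, 2014: 96.94, 2015: 99.47, 2016: 100.74,
        2017: 101.37, 2018: 102.85, 2019: 103.75, 2020: 104.11,
        2021: 104.34, 2022: 103.98,
    },
    "Visual reasoning": {
        2016: 80.09, 2017: 86.49, 2018: 86.70, 2019: 90.67, 2020: 91.38,
        2021: 102.48, 2022: 104.36,
    },
    "Medium-level reading comprehension": {
        2017: 82.35, 2018: 96.23, 2019: 103.27, 2020: 103.92, 2021: 104.15,
    },
    "English language understanding": {
        2019: 94.21, 2020: 99.44, 2021: 101.56, 2022: 101.67, 2024: 101.78,
    },
    "Multitask language understanding": {
        2019: 36.08, 2020: 60.02, 2021: 66.82, 2022: 83.74, 2023: 96.21,
        2024: 102.78,
    },
    "Competition-level mathematics": {
        2021: 7.67, 2022: 57.56, 2023: 93.67, 2024: 108.78,
    },
    "PhD-level science questions": {2023: 47.78, 2024: 108.00},
    "Multimodal understanding and reasoning": {2023: 71.91, 2024: 94.67},
}


def first_year_at_or_above(series: Dict[int, float],
                           threshold: float = 100.0) -> Optional[int]:
    """Return the earliest year whose value meets the threshold, or None."""
    for year in sorted(series):
        if series[year] >= threshold:
            return year
    return None


crossings = {task: first_year_at_or_above(s) for task, s in SCORES.items()}
for task, year in crossings.items():
    print(f"{task}: {year if year is not None else 'not yet'}")
```

Under these figures, image classification crosses the baseline first (2016), three tasks cross it in 2024, and multimodal understanding and reasoning is the only task that has not crossed it.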

From ChatGPT to Gemini, many of the world’s leading AI models are surpassing the human baseline in a range of technical tasks.

The only task where AI systems still haven’t caught up to humans is multimodal understanding and reasoning, which involves processing and reasoning across multiple formats and disciplines, such as images, charts, and diagrams.

However, the gap is closing quickly.

In 2024, OpenAI’s o1 model scored 78.2% on MMMU, a benchmark that evaluates models on multi-discipline tasks demanding college-level subject knowledge.

That score was just 4.4 percentage points below the human benchmark of 82.6%. The o1 model also has one of the lowest hallucination rates among AI models.

The o1 score was a major jump from the end of 2023, when Google Gemini scored just 59.4%, highlighting how rapidly AI performance on these technical tasks is improving.
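The relative figures reported for multimodal understanding and reasoning (71.91% in 2023, 94.67% in 2024) appear to be each model's raw MMMU score divided by the human score. A quick check in Python, assuming that definition; the function name `relative_to_baseline` is hypothetical:

```python
def relative_to_baseline(score: float, human_baseline: float) -> float:
    """Express a benchmark score as a percentage of the human baseline."""
    return 100.0 * score / human_baseline


HUMAN_MMMU = 82.6  # human benchmark on MMMU, as quoted in the article

o1_2024 = relative_to_baseline(78.2, HUMAN_MMMU)      # OpenAI o1, 2024
gemini_2023 = relative_to_baseline(59.4, HUMAN_MMMU)  # Google Gemini, end of 2023

print(f"o1 (2024):     {o1_2024:.2f}%")      # 94.67%
print(f"Gemini (2023): {gemini_2023:.2f}%")  # 71.91%
print(f"Gap to human:  {HUMAN_MMMU - 78.2:.1f} points")  # 4.4 points
```

Both results match the tabulated values, which suggests the "human baseline (100%)" in the data is simply the human score on each benchmark.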

