


Apple announced Monday that it is making its on-device large language model accessible to developers, following the release of research from Apple’s own team highlighting significant limitations in powerful AI models used across the tech industry.
The move comes as several artificial intelligence companies invest heavily in large language models as the primary pathway to achieving advanced AI comparable to human capabilities, with applications ranging from healthcare to military use.
Apple’s researchers published a white paper this month examining the considerable limitations of such models, even as the company announced that it is making the foundation model at the core of Apple Intelligence, its AI system, available to developers.
“The models that power Apple Intelligence are becoming more capable and efficient, and we’re integrating features in even more places across each of our operating systems,” said Apple senior vice president Craig Federighi in a statement. “We’re also taking the huge step of giving developers direct access to the on-device foundation model powering Apple Intelligence, allowing them to tap into intelligence that is powerful, fast, built with privacy, and available even when users are offline.”
Apple’s artificial intelligence system will enable new features including live translation of text messages and visual search capabilities using device cameras. Through Apple’s Shortcuts app, users will be able to access Apple Intelligence directly, while developers will gain access to the “on-device large language model at the core of Apple Intelligence.”
For requests requiring larger models than what’s available on individual devices, users can engage with Apple’s Private Cloud Compute, a computing system designed to securely process user requests without storing or sharing data with Apple.
Federighi noted that the large models powering Apple Intelligence are becoming more capable, and the company believes this will enable new user experiences and developer creations.
However, Apple’s researchers have expressed skepticism about language models’ ability to achieve artificial general intelligence (AGI) on their own. AGI refers to advanced AI that performs at least as well as humans across all cognitive domains.
In a paper titled “The Illusion of Thinking,” Apple researchers argued that large language models (LLMs) and large reasoning models (LRMs) produced by companies like OpenAI, DeepSeek, Google, and Anthropic have major limitations. Rather than testing models solely on output, Apple evaluated their processes using puzzles instead of traditional mathematical and coding benchmarks.
“Our findings reveal fundamental limitations in current models: despite sophisticated self-reflection mechanisms, these models fail to develop generalizable reasoning capabilities beyond certain complexity thresholds,” the Apple researchers wrote.
According to the research, large language models perform well on low-complexity tasks, while large reasoning models excel at moderately complex tasks. However, both types of models used by leading AI companies completely fail at highly complex tasks.
The researchers said their findings indicate an “inherent compute scaling limit in LRMs,” pointing to a ceiling that these powerful models appear unable to break through to perform the range of tasks needed to reach AGI.
“These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning,” Apple’s researchers concluded.
Critics have viewed Apple’s research findings with suspicion, suggesting that because the company has not appeared competitive in certain AI domains, it is choosing to dismiss those domains’ potential.
Andrew White, co-founder of FutureHouse, connected Apple’s AI product performance to the company’s skepticism about large language models’ future. FutureHouse, which is invested in LLM development, unveiled a new large reasoning model focused on chemistry last week.
“Apple’s AI researchers have embraced a kind of anti-LLM cynic ethos, publishing multiple papers trying to argue that reasoning LLMs are somehow limited/cannot generalize,” White wrote Saturday on X. “Apple also has the worst AI products (Siri, Apple [Intelligence]). No idea what their ‘strategy’ is here.”
• Ryan Lovelace can be reached at rlovelace@washingtontimes.com.