
Apple reveals its AI models are nearly as good as ChatGPT, other competitors

Apple's AI models coming to the iPhone and Mac have already been benchmarked.


Published on June 11, 2024

Oliver Cragg / Android Authority

TL;DR

Apple has unveiled the models that will power its upcoming AI features on the iPhone, iPad, and Mac.
The foundation models come in on-device and server variants, depending on their use case.
Apple’s models can match GPT-3.5, but not the industry’s best.

Apple announced a litany of AI features for iPhone, iPad, Mac, and Siri at WWDC 2024 yesterday, but in a surprising twist, didn’t elaborate on the generative AI models it will use to power them. While rumors indicated that the company would rely on OpenAI’s ChatGPT or Google’s Gemini, they turned out to be only half-true. For example, while a ChatGPT integration is indeed coming to iOS, iPadOS, and macOS later this year, it won’t power the revamped Siri or other Apple Intelligence features.

But thanks to a new post on Apple’s machine learning research blog, we now know more about the company’s AI strategy for 2024 and beyond. For starters, the company will rely on its own large language models (LLMs) rather than licensing third-party offerings from the likes of Google and OpenAI.

Apple says its foundation models have been “fine-tuned for user experiences such as writing and refining text, prioritizing and summarizing notifications, creating playful images for conversations with family and friends, and taking in-app actions to simplify interactions across apps.” The blog post then delves into some of the technical aspects behind its generative AI models, with the main focus being on optimization for low latency and on-device performance.

Apple is still behind in the AI race, but it's gaining significant ground.

More notably, however, this marks our first glimpse at the performance of Apple’s AI models and how they stack up versus the competition.

In one chart, for example, we can see that human evaluators preferred responses from Apple’s cloud model roughly 50% of the time compared to GPT-3.5, which is the base model offered with the free version of ChatGPT. The two models were tied in 25.3% of instances, indicating that GPT-3.5 scored an outright win in only 24.7% of test cases.

However, the cloud model's win rate fell to just 28.5% when it was benchmarked against GPT-4 Turbo. It did manage a tie in a further 29.8% of cases, though, which leaves GPT-4 Turbo with an outright win in the remaining 41.7%.
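The percentages quoted above only state wins and ties for Apple's model, so the rival's share has to be derived. A minimal Python sketch of that arithmetic follows. The function name `loss_share` is hypothetical, and the "roughly 50%" figure is treated as exactly 50.0, which is an assumption.

```python
def loss_share(win_pct, tie_pct):
    """Return the percentage of comparisons the rival model won outright.

    Assumes every comparison ends in a win, a tie, or a loss, so the
    three shares sum to 100.
    """
    return round(100.0 - win_pct - tie_pct, 1)


# Figures as quoted in the article for Apple's cloud model.
# 50.0 stands in for "roughly 50%" (an assumption).
vs_gpt35 = loss_share(win_pct=50.0, tie_pct=25.3)
vs_gpt4_turbo = loss_share(win_pct=28.5, tie_pct=29.8)

print(f"GPT-3.5 outright wins: {vs_gpt35}%")
print(f"GPT-4 Turbo outright wins: {vs_gpt4_turbo}%")
```

Read this way, Apple's cloud model wins about twice as often as it loses against GPT-3.5, but loses more often than it wins against GPT-4 Turbo.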

Apple’s on-device model performs admirably too, either beating or keeping pace with the likes of Mistral-7B and Gemma-2B in the majority of tested responses.


Apple’s on-device model is approximately three billion parameters in size. Using typical model optimization techniques like quantization, it’s compact enough to run on devices like the iPhone 15 Pro and 15 Pro Max with as little as 8GB of RAM.
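To see why quantization matters for a phone with 8GB of RAM, here is a rough back-of-the-envelope sketch in Python. The bit widths below are illustrative assumptions, not figures from Apple, and the helper `weight_memory_gb` is hypothetical. It counts only the model weights and ignores activations, caches, and everything else the operating system keeps in memory.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory needed to hold the weights, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# "Approximately three billion parameters", per the article.
PARAMS = 3_000_000_000

# Illustrative precisions only; the article does not state which one Apple uses.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

At 16 bits per weight, the weights alone would take up most of an 8GB device, while lower precisions leave far more headroom for the rest of the system.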

The cloud-based model, on the other hand, is larger and more powerful. Apple didn’t specify the cloud model’s size, but it’s designed to run entirely in data centers powered by Apple Silicon. That is an important privacy win for Apple loyalists, as the company can guarantee that their sensitive data is never handed over to a third-party company like OpenAI.

On the subject of safety, Apple claims that its foundation models are vastly safer than the competition as well. The company’s cloud-based model returned “violating responses for harmful content, sensitive topics, and factuality” in just 6.6% of instances, far lower than GPT-3.5 Turbo’s 15.5% and GPT-4 Turbo’s 20.1%.

This benchmark may indicate why the company has adopted a hybrid approach to Siri, which selectively offloads certain queries to ChatGPT. Instead of responding to factual or potentially inflammatory questions that may tarnish the company’s brand, Apple can simply offer results from third-party sources alongside a disclaimer.

In an interesting twist, Apple claims that both of its foundation models outperform the best AI models available today in summarization. And in composition, GPT-4 Turbo ekes out only a narrow victory.

While these results sound impressive, it’s worth noting that they’re only claims at this point. Independent testing may arrive at a different conclusion that doesn’t favor the Cupertino giant. It also doesn’t help that the AI industry innovates quickly, and Apple’s AI features won’t be released for a few more months. OpenAI has already moved on to GPT-4o, for instance, and could be on the brink of releasing GPT-5 by the time iOS 18 reaches most iPhone users. Only time will tell if Apple’s lead will hold through the end of this year.
