How the AI Race Has Spread From the Cloud to the Palm of Your Hand

15 Dec 2023

By Zhai Shaohui and Hao Shuai

“Find a photo of a girl laughing while wearing a garbage bag in the rain.”

When Xie Weiqin, AI solution director at Chinese smartphone-maker Vivo, typed this sentence into a phone at the Vivo Developer Conference on Nov. 1, the photo of the girl appeared almost instantly on the phone’s screen.

Xie was demonstrating the smartphone assistant powered by Vivo’s latest artificial intelligence (AI) model. In addition to searching photos and files, the model can assist users in editing photos, extracting key points from papers, and generating social media threads based on keywords or images. These new smartphone functions are enabled by a technology known as edge AI.

Edge AI allows computations to run on local devices rather than in a centralized cloud computing facility. Unlike cloud-based AI, which relies on data centers to process resource-intensive tasks, edge AI can make real-time decisions without an internet connection and can be applied to smart devices in everyday use.

The release of OpenAI’s ChatGPT chatbot in late 2022 kicked off a global AI race around large language models (LLMs) — a type of AI that can mimic human intelligence — and sparked competition among device manufacturers and chipmakers.

Following Vivo, Korean smartphone giant Samsung Electronics Co. Ltd. showcased its Gauss generative AI model on Nov. 8, and Oppo Co. Ltd. released its AndesGPT model later that month. Xiaomi Corp. and Honor Technology Inc. had already announced progress in developing LLMs in late October. Huawei Technologies Co. Ltd. had integrated LLMs into its smartphones as early as August.

In late October, U.S. chip giant Qualcomm Inc. unveiled its Snapdragon 8 Gen 3 processor, the world’s first to support generative AI models with up to 10 billion parameters. Taiwanese chipmaker MediaTek earlier revealed its collaboration with Oppo and Vivo on LLMs. JC Hsu, senior vice president of MediaTek, said that “fierce competition is unfolding” in the generative AI field.

Companies and investors anticipate that the adoption of new AI technology will revitalize the languishing consumer electronics market. Data from market research firm Counterpoint shows that in the third quarter of 2023, global smartphone shipments declined for the ninth consecutive quarter, shrinking 8% year-on-year and hitting the lowest third-quarter levels in a decade. Global PC shipments also dropped by 9% in the third quarter. Counterpoint said that AI-powered PCs are very likely to drive a rebound in shipments in 2024 and dominate the PC market with a penetration rate of over 50% after 2026.

However, challenges persist for edge AI models due to high processing power requirements and extensive use of memory and storage space, which could significantly increase the cost. For mobile devices, LLMs face challenges in energy consumption since battery energy density is a hardware bottleneck that is difficult to overcome in the short term, analysts said.

An LLM engineer said that it might take six months to a year for consumers to experience the qualitative change brought by edge AI.

Performance metrics

The most common application of LLMs in smartphones is intelligent assistants. These assistants have evolved from voice-only interaction to supporting multiple input types, including voice, text, images and documents. They have progressed from passively following instructions to engaging in natural conversations and performing summarization, information retrieval and multilingual translation.

“In the past, talking to AI required careful consideration, similar to taking care of a child,” said Luan Jian, head of the AI Laboratory Big Model Team of the Xiaomi Technical Committee. But now, LLMs can help users communicate with AI more naturally and casually.

AI can also help users produce music and images on their local devices. During the Snapdragon Summit, Qualcomm demonstrated a “photo augmentation” feature, in which a photo can be augmented with AI-generated scenery. MediaTek showcased the rapid generation of emojis. Pat Gelsinger, CEO of Intel, showed that an AI-powered PC could generate a song in the style of Taylor Swift at Intel Innovation 2023 in September.

Cristiano Amon, CEO of Qualcomm, said in a speech that there were only one or two use cases for generative AI one year ago, and now there are hundreds. The number will reach thousands by 2024. “Running AI pervasively and continually on the device will transform our user experience,” Amon said.

Logic behind costs

“We can spend 1 billion dollars to create a trillion-parameter model in the cloud, but how do we get hundreds of millions of people to use it?” Gelsinger asked in a September interview with Caixin.

It shouldn’t be achieved solely by allowing users to access the cloud, but rather by bringing the technology toward the client, said Gelsinger, adding that running cloud AI models on PCs can ensure privacy and data security.

In the last six months, smartphone and chip developers repeatedly mentioned the advantages of edge AI models in protecting privacy and security. “User data including call recordings, photos, fingerprints and faces will all be consumed by large models for analysis and inference. If everything is sent to the cloud, especially to third parties, can you accept that?” said Luo Xuan, co-founder of the RWKV Language Model.

Zhao Ming, CEO of Honor, said cloud AI models address how to better integrate human knowledge, while edge AI models analyze personal data, behavior and habits to provide services. “If cloud AI models know everything about you, your ID card number, phone number, address and weight, that is terrifying,” said Zhao. Edge AI models may avoid these problems because both the data and the model outputs stay on the local device.

Rising costs are also pushing LLMs towards the edge. Cloud AI models often have tens of billions or hundreds of billions of parameters, making inference computation expensive.

During the Vivo Developer Conference, Zhou Wei, the vice president of Vivo, said the minimum cost of using a cloud AI model once is 0.012 yuan, and the current cost is around 0.015 yuan. “With 300 million users using it ten times a day, the bill adds up to about 10 billion yuan a year,” Zhou said.
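Zhou’s figures can be sanity-checked with simple arithmetic. Below is a back-of-the-envelope sketch using his minimum per-query cost; the result is order-of-magnitude only:

```python
# Back-of-the-envelope annual cost of cloud LLM inference,
# using the figures Zhou Wei cited at the Vivo Developer Conference.
users = 300_000_000        # assumed active user base
queries_per_day = 10       # queries per user per day
cost_per_query = 0.012     # yuan, Zhou's minimum per-query cost

annual_cost = users * queries_per_day * cost_per_query * 365
print(f"{annual_cost / 1e9:.1f} billion yuan per year")  # prints "13.1 billion yuan per year"
```

At the current per-query cost of 0.015 yuan the total rises to roughly 16 billion yuan, so Zhou’s “about 10 billion yuan a year” is the right order of magnitude.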

“Today, no cloud AI models are profitable because of the enormous computational power consumption,” Zhao said. Many computations do not require cloud-side solutions. In the future, the industry plans to use edge and cloud AI models collaboratively.

Seeking balance

Constrained by the processing power, memory, storage capacity and battery life of smart devices, edge AI models are limited to a scale of just a few billion parameters.

In the next six months to a year, smartphones will be able to run LLMs with up to 14 billion parameters, while PCs are expected to accommodate models with 60 billion parameters, Luo said.

“In the future, it is likely that there will be a large model with 14 billion parameters on smartphones, serving as the ‘engine’ for the operating system. On the cloud side, there will be a model even larger than GPT-4, acting as the foundation for the next-generation internet. They will complement each other, much like the current relationship between local software and the internet,” Luo said.

A large model with the ability to understand context requires at least 13 billion parameters, which significantly occupies smartphone memory and affects performance, Luo said. Energy consumption is a more significant bottleneck than memory usage considering the challenge of increasing battery capacity.
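Luo’s point about memory can be made concrete with a rough sketch. The bytes-per-parameter figures below reflect standard numeric precisions, not data from the article, and the estimates cover weight storage only, excluding activations and the KV cache:

```python
# Approximate weight-storage footprint of a 13-billion-parameter model
# at common numeric precisions.
params = 13_000_000_000

bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}
for precision, size in bytes_per_param.items():
    gib = params * size / 2**30
    print(f"{precision}: {gib:.1f} GiB")
```

Even at 4-bit precision, such a model would occupy several gigabytes, a large share of a typical smartphone’s RAM, which is why quantization and compression dominate the engineering discussion below.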

Zhao said that edge AI will inevitably demand hardware upgrades.

“The biggest challenges are user privacy, computing power and low power consumption. If an AI application cannot balance these three factors, it cannot give consumers a better experience,” Zhao said.

Several AI industry insiders said that edge AI models are still in the early stages, and the industry is still exploring future applications.

To enhance AI-powered devices’ ability to handle LLMs, the first step is increasing memory capacity and bandwidth, Luan said. The second step is increasing or optimizing computing power to efficiently support the network structure of LLMs. Additionally, continuous exploration of model compression and quantization, as well as improvements in inference algorithms, is required to reduce computing power demands.
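The weight quantization Luan mentions can be illustrated with a minimal toy example. This is a symmetric per-tensor int8 scheme for illustration only, not any vendor’s actual method:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: store weights as 8-bit
    integers plus one float scale, cutting memory roughly 4x vs fp32."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original weights for inference.
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

Production systems use far more sophisticated schemes (per-channel scales, 4-bit formats, quantization-aware training), but the trade-off is the same: less memory and bandwidth in exchange for a small, bounded approximation error.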

“Nevertheless, edge AI models will definitely change how smartphones are used,” Luan said.

Read also the original story. Caixin Global is the English-language online news portal of Chinese financial and business news media group Caixin. Global Neighbours is authorized to reprint this article.

Image: AD Collections