On-device GenAI: solving GenAI’s scalability issue

At XYZ Paris, held at Station F on September 27, 2024, George Tsirtsis of Qualcomm stressed GenAI's core problem, scalability, and sketched a new course for the technology's future: on-device GenAI. To the chip manufacturer, the future isn't the cloud but your good old handheld device (preferably equipped with its chips).

What if on-device GenAI were the future?

On-device GenAI is already here, Tsirtsis said at XYZ Paris on September 27, 2024, and it's the solution, or at least one of the solutions, to GenAI's scalability issue.

GenAI has a scalability problem 

Let me go "straight to the point," Tsirtsis said. "Generative AI has a scaling problem. The problem stems from the simple fact that the computation to serve generative AI queries, and to respond to them, is much higher than for a traditional web search, about 10 times as much, and that's a conservative estimate."

Tsirtsis on stage at XYZ Paris 2024

Towering problems

It's even worse than that, according to Qualcomm Europe's Senior Director of Technology.

“When you go to ChatGPT or any other AI assistant, the computation and electricity used is much higher than for traditional websites. You have to take that 10x number and multiply it by the deluge of applications and features that are powered by generative AI and multiply that again by the billions of users of that technology.”
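The multiplication Tsirtsis walks through can be sketched as a back-of-envelope calculation. Only the roughly 10x per-query factor comes from the talk; the baseline energy per web search, the user count, and the queries-per-user figure below are illustrative assumptions, not Qualcomm numbers.

```python
# Back-of-envelope sketch of the scaling problem Tsirtsis describes.
# Only the ~10x multiplier is from the talk; all other figures are
# illustrative assumptions chosen to make the arithmetic concrete.

WEB_SEARCH_WH = 0.3        # assumed energy per traditional web search (Wh)
GENAI_MULTIPLIER = 10      # Tsirtsis's "conservative" 10x estimate
USERS = 1_000_000_000      # "billions of users": 1 billion for illustration
QUERIES_PER_USER_DAY = 5   # assumed daily GenAI queries per user

daily_wh = WEB_SEARCH_WH * GENAI_MULTIPLIER * USERS * QUERIES_PER_USER_DAY
daily_gwh = daily_wh / 1e9
print(f"Illustrative daily GenAI energy: {daily_gwh:.1f} GWh")
```

Even with these deliberately modest assumptions, the compounding of per-query cost, applications, and users is what makes the problem "towering".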


Developers can work differently 

The good news, though, is that once you understand there is a problem, it's easier to work out solutions. Qualcomm thinks the answer to this towering problem is… your device.

"The solution to this is to distribute computation to the edge. The future is clearly hybrid," Tsirtsis went on. "We're going to be dealing with very large models in the cloud and smaller language models (SLMs) at the edge."

"SLMs aren't that small," Tsirtsis told us. "They are anything up to 10 to 15 billion parameters, with context windows of tens of thousands of tokens. A couple of days ago, Meta announced the latest generation, Llama 3.2: a 1-billion-parameter model, a 3-billion-parameter model, and also a multimodal 11-billion-parameter model."

But all of these SLMs can run on device, and "they can run very efficiently."
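To see why these parameter counts matter for a handheld device, a rough footprint estimate helps: weight memory is approximately parameter count times bytes per parameter, which is why quantization is central to on-device inference. This is illustrative arithmetic, not a Qualcomm figure, and it ignores activation memory and the KV cache.

```python
# Rough weight-memory sketch for the SLM sizes mentioned in the talk.
# Illustrative arithmetic only: weights ~= parameter count x bytes per
# parameter; activations and KV cache would add to this.

def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Llama 3.2 sizes from the talk: 1B, 3B, and the 11B multimodal model.
for b in (1, 3, 11):
    fp16 = weight_memory_gb(b, 16)
    int4 = weight_memory_gb(b, 4)
    print(f"{b}B params: ~{fp16:.1f} GB at fp16, ~{int4:.1f} GB at 4-bit")
```

At 4-bit precision even the 11-billion-parameter model fits in a few gigabytes, which is what brings these models within reach of a modern phone's memory budget.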

The data, too, stays on your device

There's more to it than just efficiency. Data privacy is also part of the equation. "The other big advantage for you as a developer or entrepreneur is that you can make sure that your customers' sensitive data stays on device," he added.

A hub to make developers’ lives easier

Qualcomm is adamant that its new architecture for GenAI isn't pie in the sky. It's here and it's working. "On-device AI is here," the Qualcomm speaker stated.

“There are more than 100 models that have already been optimised around our different hardware platforms and you can even bring your own model and optimise it there. Developers can test their applications on any one of the many devices we make available in the device farm that you can access online. They don’t have to buy all the different smartphones that are out there to test their applications.”

To this end, Qualcomm launched an online platform dedicated to developers at aihub.qualcomm.com.

The Qualcomm AI Hub home page

Time will tell if this solves the GenAI scalability conundrum, but at least there is light at the end of the AI tunnel.

Yann Gourvennec