Google AI compression technology saves data center energy

We’ve seen the future of AI with Large Language Models. And it’s smaller than you think.

That became clear in 2025, when we first saw China’s DeepSeek – a lean, lightweight LLM that doesn’t need the full power of a data center to do its job, yet performs surprisingly well in benchmark tests against the big American AI models. (Ironically, some of its smaller distilled variants are built on top of a US open-source model, Meta’s Llama.)

DeepSeek may raise privacy concerns, but the trend toward smaller, smarter AI is unstoppable. That evolution is also reflected in TurboQuant, a compression algorithm that Google quietly revealed this week in a Google Research paper.

The paper itself is pretty impenetrable if you’re not an AI expert fluent in tokens and high-dimensional vectors. We’ll attempt a fuller explanation below, but here’s the TL;DR: the TurboQuant algorithm can shrink an LLM’s memory usage roughly sixfold.

What does that mean? Lower power consumption, perhaps to the point where running a powerful AI model entirely on your smartphone becomes possible. Less RAM usage, too, just in time for the ongoing RAM shortage.

Indeed, algorithms like this could help LLMs make better use of the data centers that already exist – either by freeing up room to run more complex models or, hear me out, by letting us slow the rush to build new data centers in the first place.

And that, ironically, could be a problem for the AI economy, at least as it’s currently organized.

Why small and smart will destroy NVIDIA

For the last three years, tech stocks have been rising on the back of essentially one company: NVIDIA. And NVIDIA has been rising on the idea that we’re in the middle of what CEO Jensen Huang this month described as the biggest infrastructure buildout in history – an explosion of data centers for which NVIDIA will be the main supplier of chips.

But that buildout, if you compare data centers actually built with data centers merely announced, is already stumbling, as a new New York Times investigation makes clear. What’s the holdup? Not just opposition from concerned citizens across the US, now including the NAACP. There are also permits, applications, inspections and the other tedious but often necessary machinery of local government.

Not the least of the problems: an electricity supply and transmission grid that isn’t keeping up, and the AI industry’s unparalleled appetite for power and water.

What happens when the appetite for more AI outruns the available resources? Necessity becomes the mother of invention. We learn to do more with less. And that’s exactly what TurboQuant does.

Compression central

Here’s that explanation. TurboQuant is a compression algorithm, though you’d be forgiven for thinking Google took its inspiration from the NSFW-inspired “middle-out” algorithm that powered the fictional startup in the HBO comedy Silicon Valley.

LLMs have a couple of potential “bottlenecks” around the data they reach for most often. One is the “key-value cache,” which acts like a quick-reference library for information the model keeps reusing. Another is vector search, which matches similar items. TurboQuant tackles both at once by compressing the underlying vectors, making them faster to move around, lighter on memory, and less fragile.
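
For the tinkerers, here’s a rough, purely illustrative sketch – not Google’s code, and not the TurboQuant algorithm itself – of why quantizing cached vectors saves so much memory: store each number as a 4-bit integer plus one scale per vector instead of a 32-bit float, and the cache shrinks several times over while the reconstruction stays close to the original.

```python
import numpy as np

# Toy illustration only: shrink a batch of float32 "cache" vectors by
# storing 4-bit integers plus one float32 scale per vector.
rng = np.random.default_rng(0)
cache = rng.normal(size=(1024, 128)).astype(np.float32)  # stand-in for KV-cache entries

scales = np.abs(cache).max(axis=1, keepdims=True) / 7.0   # map each vector into the signed 4-bit range
quantized = np.clip(np.round(cache / scales), -8, 7).astype(np.int8)

restored = quantized * scales                             # approximate reconstruction
mean_error = np.abs(cache - restored).mean()

full_bytes = cache.nbytes                                 # 32 bits per value
small_bytes = quantized.size // 2 + scales.nbytes         # 4 bits per value (two packed per byte) + scales
print(f"{full_bytes / small_bytes:.1f}x smaller, mean absolute error {mean_error:.4f}")
```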

TurboQuant “helps to unclog key-value cache bottlenecks by reducing the size of key pairs,” the Google paper says, thanks in part to the clever trick of “randomly rotating the data vectors.”
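
As for what “randomly rotating the data vectors” buys you, here’s a loose sketch of the general idea – again, not TurboQuant itself: a single outlier value normally forces a coarse quantization scale that crushes every other coordinate, but multiplying by a random orthogonal matrix first smears that outlier across all coordinates, and the rotation is undone exactly after dequantizing.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize_4bit(v):
    """Round a vector onto 16 signed levels and return the reconstruction."""
    scale = np.abs(v).max() / 7.0
    return np.clip(np.round(v / scale), -8, 7) * scale

# A vector with one big outlier: the outlier dictates the scale,
# so the remaining coordinates lose most of their precision.
v = np.append(rng.normal(size=127), 25.0)

# Random orthogonal rotation (QR decomposition of a Gaussian matrix).
Q, _ = np.linalg.qr(rng.normal(size=(128, 128)))

plain_err = np.linalg.norm(v - quantize_4bit(v))
rotated_err = np.linalg.norm(v - Q.T @ quantize_4bit(Q @ v))  # rotate, quantize, rotate back
print(f"quantization error without rotation: {plain_err:.2f}, with rotation: {rotated_err:.2f}")
```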

Did you get all that? No? That’s okay; it doesn’t really matter. All you need to know is that this is a promising new approach to compressing very complex calculations, and it works the way compression algorithms have always worked – by making new technology faster, lighter, and easier to work with.

First it was ZIP files making downloads manageable, then video compression enabling streaming, and now AI. The result could be a more powerful LLM running entirely on your phone, or an upended global economy, or both at once. Isn’t life in 2026 shaping up to be interesting?
