
Google's Latest App Lets Your Phone Run AI in Your Pocket—Entirely Offline

June 3, 2025

Google has released a new app that nobody asked for, but everyone wants to try.

The AI Edge Gallery, which launched quietly on May 31, puts artificial intelligence directly on your smartphone—no cloud, no internet, and no sharing your data with Big Tech's servers.

The experimental app—released under the Apache 2.0 license, allowing anyone to use it for almost anything—is available on GitHub, starting with the Android platform. The iOS version is coming soon.

It runs models like Google's Gemma 3n entirely offline, processing everything from image analysis to code writing using nothing but your phone's hardware.

And it’s surprisingly good.

The app, which appears to be aimed at developers for now, includes three main features: AI Chat for conversations, Ask Image for visual analysis, and Prompt Lab for single-turn tasks such as rewriting text.

Users can download models from platforms like Hugging Face, although the selection remains limited to a handful of compatible models such as Gemma-3n-E2B and Qwen2.5-1.5B.
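For readers who want to see what that looks like outside the app, here is a minimal sketch of fetching an on-device model file from Hugging Face with the huggingface_hub library. The repository and file names below are placeholders rather than the exact listings the Gallery uses, so check the model card for the real ones; Gemma downloads also require accepting Google's license on Hugging Face first.

```python
# Minimal sketch: pulling a .task model bundle from Hugging Face for sideloading.
# NOTE: repo_id and filename are illustrative placeholders, not confirmed listings.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="google/gemma-3n-E2B-it-litert-preview",  # assumed repository name
    filename="gemma-3n-E2B-it-int4.task",             # assumed bundle file name
    token="hf_your_access_token",                     # needed for license-gated models
)
print(f"Model downloaded to: {model_path}")
```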

Reddit users immediately questioned the app's novelty, comparing it to existing solutions like PocketPal.

Some raised security concerns, though the app being hosted on Google's official GitHub account undercuts claims of impersonation. No evidence of malware has surfaced so far.

We tested the app on a Samsung Galaxy S24 Ultra, downloading both the largest and smallest Gemma 3 models available.

Each AI model is a self-contained file that holds all its “knowledge”—think of it as downloading a compressed snapshot of everything the model learned during training, rather than a giant database of facts like a local Wikipedia app. The largest Gemma 3 model available in-app is approximately 4.4 GB, while the smallest is around 554 MB.

Once downloaded, no further data is required—the model runs entirely on your device, answering questions and performing tasks using only what it learned before release.

Even on low-speed CPU inference, the experience matched what GPT-3.5 delivered at launch: not blazing fast with the bigger models, but definitely usable.

The smaller Gemma 3 1B model exceeded 20 tokens per second, providing a smooth experience and accuracy that is reliable as long as you keep an eye on its output.

This matters when you're offline or handling sensitive data you'd rather not share with Google or OpenAI's training algorithms, which use your data by default unless you opt out.

GPU inference on the smallest Gemma model delivered impressive prefill speeds of over 105 tokens per second, while CPU inference managed 39 tokens per second. Token output—how fast the model generates a response after processing your prompt—reached around 10 tokens per second on GPU and seven on CPU.
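To put those figures in perspective, here is a back-of-the-envelope estimate of how long a typical exchange would take at the rates we measured. The prompt and reply lengths are assumptions chosen purely for illustration.

```python
# Rough response-time estimate from the measured rates on the smallest Gemma model.
PREFILL_TPS = {"gpu": 105, "cpu": 39}  # prompt processing speed, tokens/second (measured)
DECODE_TPS = {"gpu": 10, "cpu": 7}     # response generation speed, tokens/second (measured)

prompt_tokens = 200  # assumed: a few short paragraphs of input
reply_tokens = 150   # assumed: a medium-length answer

for backend in ("gpu", "cpu"):
    prefill_s = prompt_tokens / PREFILL_TPS[backend]
    decode_s = reply_tokens / DECODE_TPS[backend]
    print(f"{backend.upper()}: ~{prefill_s:.1f}s to read the prompt, "
          f"~{decode_s:.1f}s to write the reply, ~{prefill_s + decode_s:.0f}s total")
```

At those rates, a medium-length answer arrives in roughly 17 seconds on GPU and under 30 seconds on CPU, which squares with the "usable, not blazing fast" impression above.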

The multimodal capabilities worked well in testing.

It also appears that CPU inference on the smaller models yields more accurate results than GPU inference. The observation may be anecdotal, but it held up across several of our tests.

For example, during a vision task, the model on CPU inference accurately guessed my age and my wife’s in a test photo: late 30s for me, late 20s for her.

The supposedly better GPU inference got my age wrong, guessing I was in my 20s (I’ll take this “information” over the truth any day, though.)

Google's models come with heavy censorship, but basic jailbreaks can be achieved with minimal effort.

Unlike centralized services, which can ban users for circumvention attempts, local models don't report your prompts to anyone. That makes it relatively safe to try jailbreak techniques, or to ask for information the censored versions won't provide, without risking your subscription.

Third-party model support is available, but it is somewhat limited.

The app only accepts .task files, not the widely adopted .safetensors format that competitors like Ollama support.

This significantly limits the available models, and although there are ways to convert .safetensors files into .task bundles, the process isn't for everyone.
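For the technically inclined, the usual conversion route runs through Google's MediaPipe tooling. The sketch below is only an approximation based on MediaPipe's LLM Inference converter: the module path, parameter names, and supported model types are assumptions that may not match your installed version, so treat it as a starting point rather than a recipe.

```python
# Approximate sketch: converting a Hugging Face checkpoint (.safetensors) into
# weights the on-device LLM runtime can load, via MediaPipe's genai converter.
# Parameter names and supported model types are assumptions based on MediaPipe's
# LLM Inference documentation and may differ between versions.
from mediapipe.tasks.python.genai import converter

config = converter.ConversionConfig(
    input_ckpt="./gemma-2b-it/",         # folder containing the .safetensors checkpoint
    ckpt_format="safetensors",           # source checkpoint format
    model_type="GEMMA_2B",               # the converter supports only a short list of architectures
    backend="gpu",                       # target backend: "cpu" or "gpu"
    output_dir="./intermediate/",        # scratch space for converted tensors
    combine_file_only=False,
    vocab_model_file="./gemma-2b-it/",   # tokenizer files shipped with the checkpoint
    output_tflite_file="./gemma-2b-it-gpu.bin",  # converted weights for the on-device runtime
)
converter.convert_checkpoint(config)

# Packaging the converted weights and tokenizer into a final .task bundle is a
# separate step handled by MediaPipe's bundler tooling, omitted here.
```

Only a handful of architectures are supported by the converter at any given time, which helps explain why the .task ecosystem remains so much smaller than the .safetensors one.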

Code handling works adequately, although a specialized model like Codestral would handle programming tasks more effectively than Gemma 3. Again, it would need a .task version, but it could be a very effective alternative.

For basic tasks, such as rephrasing, summarizing, and explaining concepts, the models excel without sending data to Samsung or Google's servers.

So there is no need to grant Big Tech access to your input, keyboard, or clipboard; your own hardware handles all the work.

The context window of 4,096 tokens (roughly 3,000 words of prompt and reply combined) feels limited by 2025 standards, but it matches what was the norm just two years ago.

Conversations flow naturally within those constraints, and that may be the best way to sum up the experience.

Considering you are running an AI model on a smartphone, the experience is similar to what early ChatGPT offered in terms of speed and text accuracy, with some advantages such as multimodality and code handling.

But why would you want to run a slower, inferior version of your favorite AI on your phone, taking up a lot of storage and making things more complicated than simply typing ChatGPT.com?

Privacy remains the killer feature. For example, healthcare workers handling patient data, journalists in the field, or anyone dealing with confidential information can now access AI capabilities without data leaving their device.

“No internet required” means the technology works in remote areas or while traveling, with all responses generated solely from the knowledge the model had at the time it was trained.

Cost savings add up quickly. Cloud AI services charge per use, while a local model only requires your phone's processing power. Small businesses and hobbyists can experiment without ongoing expenses: run a model locally and you can interact with it as much as you want, with no quotas, credits, or subscriptions to worry about.

Latency improvements feel noticeable. No server round-trip means faster responses for real-time applications, such as chatbots or image analysis. It also means your chatbot won’t ever go down.

Overall, for basic tasks, this could be more than enough for any user, with the free versions of ChatGPT, Claude, Gemini, Meta, Reka, and Mistral providing a good backup when heavier computation is needed.

Of course, this won’t be a substitute for your favorite internet-connected chatbot anytime soon. There are some early adoption challenges.

Battery drain concerns persist, especially with larger models; setup complexity might deter non-technical users; the model variety pales in comparison to cloud offerings; and Google's decision not to support .safetensors models (the format used by the vast majority of open-weight LLMs available online) is disappointing.

However, Google's experimental release signals a shift in the philosophy of AI deployment. Instead of forcing users to choose between powerful AI and privacy, the company is offering both, even if the experience isn't quite there yet.

The AI Edge Gallery delivers a surprisingly polished experience for an alpha release, with probably the best interface currently available for running AI models locally on a phone.

Adding .safetensors support would unlock the vast ecosystem of existing models, transforming a good app into an essential tool for privacy-conscious AI users.

Edited by Josh Quittner and Sebastian Sinclair
