Nexa AI Runs GPT-OSS on Android Smartphones

The Nexa AI team has launched an optimized Android build of the open-source GPT-OSS model, demonstrating that full-scale large-language-model inference is no longer confined to cloud servers.

10 GB model running natively on mobile hardware


According to Nexa AI’s announcement, the adapted model occupies roughly 10 GB and requires a smartphone with at least 16 GB of RAM and a high-end processor. Tests were conducted on the ASUS ROG Phone 9 equipped with the Snapdragon 8 Elite Gen 5 chip, currently one of the fastest Android SoCs available.
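
As a quick sanity check (our own arithmetic, not a figure from Nexa AI), a 20-billion-parameter model stored at roughly 4 bits per weight lands right around the reported footprint:

```python
# Rough footprint estimate for a 20B-parameter model at ~4-bit quantization.
# Illustrative arithmetic only; Nexa AI has not published its exact scheme.
params = 20e9           # 20 billion weights
bits_per_weight = 4     # assumed 4-bit quantization
overhead = 1.05         # ~5% assumed for scales, embeddings, and buffers

size_gb = params * bits_per_weight / 8 / 1e9 * overhead
print(f"~{size_gb:.1f} GB")  # ~10.5 GB, close to the reported ~10 GB
```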

During benchmarks, the device achieved an impressive time-to-first-token of just 3 seconds and an average generation speed of 17 tokens per second. For an on-device large-model setup, these results represent a major leap in mobile AI performance.
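
Taken together, those two figures give a simple latency model: total reply time is roughly the time-to-first-token plus output length divided by the decode rate. A small sketch using the reported numbers (real decode speed varies with context length):

```python
# Simple latency model built from the reported benchmark figures:
# 3 s time-to-first-token, 17 tokens/s sustained generation.
TTFT_S = 3.0
TOKENS_PER_S = 17.0

def reply_latency_s(n_tokens: int) -> float:
    """Approximate wall-clock time to generate n_tokens of output."""
    return TTFT_S + n_tokens / TOKENS_PER_S

for n in (50, 200, 500):
    print(f"{n:>3} tokens -> ~{reply_latency_s(n):.1f} s")
# 50 tokens -> ~5.9 s, 200 -> ~14.8 s, 500 -> ~32.4 s
```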


Expanding device support


The same chipset will soon appear in the upcoming Xiaomi 17 / 17 Pro / 17 Pro Max line and is expected across other flagship series such as the realme GT 8 Pro, Honor Magic 8 / 8 Pro, iQOO 15, OnePlus 15, and Samsung Galaxy S26. This hardware expansion could make on-device LLMs practical for thousands of developers and enthusiasts, with no reliance on external servers.

Based on GPT-OSS 20B architecture


The Android release is derived from GPT-OSS 20B, OpenAI’s open-weight 20-billion-parameter model aimed at smaller compute budgets. Nexa AI has not disclosed the specific compression or quantization techniques used, but the community suspects a mix of 4-bit weight sharing, attention-window pruning, and optimized inference kernels tailored to the Snapdragon 8 Elite Gen 5 NPU.
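
For readers curious what one suspected ingredient might look like, here is a generic sketch of group-wise symmetric 4-bit weight quantization. Every name and parameter below is our own illustration, not Nexa AI’s disclosed method:

```python
import numpy as np

def quantize_int4(weights: np.ndarray, group_size: int = 64):
    """Group-wise symmetric 4-bit quantization (generic illustration only)."""
    w = weights.reshape(-1, group_size)
    # One scale per group; guard against all-zero groups.
    scales = np.maximum(np.abs(w).max(axis=1, keepdims=True) / 7.0, 1e-12)
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scales

w = np.random.randn(4096 * 64).astype(np.float32)
q, s = quantize_int4(w)
err = np.abs(dequantize_int4(q, s).reshape(-1) - w).mean()
print(f"mean abs quantization error: {err:.4f}")
```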

The developers noted that the Android build retains full-context reasoning and multi-turn memory, making it suitable for offline chat, summarization, and lightweight code generation.
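
In practice, multi-turn memory on-device usually means feeding the running transcript (or its cached attention state) back into each prompt. A minimal sketch of that pattern, with a hypothetical `generate()` standing in for whatever call the runtime actually exposes:

```python
# Minimal multi-turn chat loop: "memory" here is simply the running transcript
# fed back into every prompt. `generate` is a hypothetical stand-in for the
# on-device inference call; Nexa AI's actual API may differ.
def generate(prompt: str) -> str:
    return "(model reply)"  # placeholder for the real on-device model call

history: list[tuple[str, str]] = []  # (role, text) pairs across turns

def chat(user_msg: str) -> str:
    history.append(("user", user_msg))
    prompt = "\n".join(f"{role}: {text}" for role, text in history) + "\nassistant:"
    reply = generate(prompt)
    history.append(("assistant", reply))
    return reply

print(chat("Summarize this article in one line."))
print(chat("Now make it shorter."))  # second turn sees the first exchange
```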


Why it matters


Running an open-weight LLM directly on a smartphone is a milestone for AI accessibility. It suggests that “edge inference” — local, private computation without cloud dependence — may soon reach consumer scale. This shift not only reduces latency and cost but also improves privacy, as data never leaves the device.

“Our goal is to prove that a flagship phone can serve as a portable AI workstation,” the Nexa AI team wrote in its announcement. “GPT-OSS 20B for Android is a proof-of-concept that scales down without dumbing down.”

Next steps and outlook


Nexa AI plans to open-source its optimization toolkit, enabling community ports to other chipsets such as the MediaTek Dimensity 9400 and Google Tensor G4. The company is also testing mixed-precision inference and memory offloading for devices with 12 GB of RAM or less.
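
Memory offloading typically keeps only part of the model resident in RAM and streams the remaining layers from flash during decoding. A rough planning sketch under assumed numbers (layer count and RAM budgets are our placeholders, not Nexa AI’s):

```python
# Back-of-the-envelope layer-offloading plan for a ~10 GB model on devices
# with smaller RAM budgets. Layer count and budgets are assumptions, not
# figures from Nexa AI.
MODEL_GB = 10.0
N_LAYERS = 48                     # assumed transformer depth
LAYER_GB = MODEL_GB / N_LAYERS    # ~0.21 GB per layer

def resident_layers(ram_budget_gb: float) -> int:
    """Layers kept in RAM; the rest would stream from flash during decode."""
    return min(N_LAYERS, int(ram_budget_gb // LAYER_GB))

for budget in (4.0, 6.0, 8.0):    # plausible app-usable RAM on 12 GB phones
    n = resident_layers(budget)
    print(f"{budget:.0f} GB budget -> {n}/{N_LAYERS} layers resident")
```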

If successful, this could bring full GPT-OSS chat capabilities to mid-range smartphones within a year — turning the mobile ecosystem itself into a distributed compute network.


Conclusion


The Android launch of GPT-OSS marks a symbolic end to the era when powerful language models required cloud supercomputers. For the first time, flagship smartphones can run a 20-billion-parameter model locally, opening a path toward personal, secure, and fully offline AI experiences.


Editorial Team — CoinBotLab

Source: Nexa AI / X (Twitter)
