Nexa AI Runs GPT-OSS on Android Smartphones
The Nexa AI team has launched an optimized Android build of the open-weight GPT-OSS model, demonstrating that large language model inference at full scale is no longer confined to cloud servers.
10 GB model running natively on mobile hardware
According to Nexa AI’s announcement, the adapted model occupies roughly 10 GB and requires a smartphone with at least 16 GB of RAM and a high-end processor. Tests were conducted on the ASUS ROG Phone 9 equipped with the Snapdragon 8 Elite Gen 5 chip, currently one of the fastest Android SoCs available.
In Nexa AI’s benchmarks, the device reached a time-to-first-token of about 3 seconds and an average generation speed of 17 tokens per second. For a 20-billion-parameter model running entirely on a phone, those numbers represent a major step forward in mobile AI performance.
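Those two figures together determine end-to-end latency: at a 3-second time-to-first-token (TTFT) and 17 tokens per second, a 300-token reply takes roughly 3 + 300/17 ≈ 21 seconds. Below is a minimal Python sketch of how such metrics are typically measured, assuming a generic token-streaming interface rather than Nexa AI’s actual API:

```python
import time

def benchmark_stream(token_stream):
    """Measure time-to-first-token (TTFT) and decode throughput for any
    iterable that yields tokens as they are generated. The streaming
    interface here is a generic stand-in, not Nexa AI's API."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_stream:
        if first is None:
            first = time.perf_counter()  # prefill done, decoding begins
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    end = time.perf_counter()
    ttft = first - start
    tps = (count - 1) / (end - first) if count > 1 else 0.0
    return ttft, tps

# Simulated stream with a 3 s prefill and ~17 tokens/s decode:
def fake_stream(n=50):
    time.sleep(3.0)              # stand-in for prompt processing
    for _ in range(n):
        time.sleep(1 / 17)       # stand-in for per-token decode time
        yield "tok"

print(benchmark_stream(fake_stream()))  # approximately (3.06, 17.0)
```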
Expanding device support
The same chipset will soon appear in the upcoming Xiaomi 17 / 17 Pro / 17 Pro Max line and is expected across other flagship series such as the realme GT 8 Pro, Honor Magic 8 / 8 Pro, iQOO 15, OnePlus 15, and Samsung Galaxy S26. As the silicon spreads, on-device LLMs could become practical for thousands of developers and enthusiasts with no reliance on external servers.
Based on the GPT-OSS 20B architecture
The Android release is derived from the GPT-OSS 20B model, an open-weight model released by OpenAI and sized for smaller compute budgets. Nexa AI has not disclosed the specific compression or quantization techniques used, but the community suspects a mix of 4-bit weight sharing, attention-window pruning, and inference kernels tuned to the Snapdragon 8 Elite Gen 5 NPU.
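The 4-bit figure also squares with the reported footprint: at 4 bits per weight, 20 billion parameters take about 20 × 10⁹ × 0.5 bytes ≈ 10 GB, matching the size Nexa AI cites. As an illustration of what a group-wise 4-bit scheme looks like in practice, here is a NumPy sketch; the group size and function names are hypothetical, not Nexa AI’s disclosed pipeline:

```python
import numpy as np

def quantize_4bit(w: np.ndarray, group_size: int = 32):
    """Group-wise 4-bit quantization sketch: each group of weights shares
    one scale/offset pair and stores values as integers in 0..15.
    Illustrative only; Nexa AI has not disclosed its actual scheme."""
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = np.maximum((hi - lo) / 15.0, 1e-8)   # 16 levels per 4 bits
    q = np.round((groups - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_4bit(q, scale, lo, shape):
    """Reconstruct approximate float weights at inference time."""
    return (q.astype(np.float32) * scale + lo).reshape(shape)

w = np.random.randn(4, 64).astype(np.float32)
q, s, lo = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, lo, w.shape)
print(np.abs(w - w_hat).max())   # small per-group rounding error
```

Real deployments also pack two 4-bit values per byte and store per-group metadata, so the on-disk size lands slightly above the raw estimate.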
The developers noted that the Android build retains full-context reasoning and multi-turn memory, making it suitable for offline chat, summarization, and lightweight code generation.
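In practice, multi-turn memory on a phone usually means the app keeps the conversation history and replays as much of it as fits the context window on each turn. A minimal sketch of such a loop, with a stub standing in for the unpublished on-device generation call:

```python
def generate(prompt: str) -> str:
    """Stand-in for the on-device model call; Nexa AI's runtime API is
    not public, so this stub just echoes for demonstration."""
    return f"[reply to {len(prompt)}-char prompt]"

history = []  # (role, text) pairs kept across turns

def chat(user_message: str, context_budget: int = 8000) -> str:
    """Append the new turn, rebuild the prompt from as much recent
    history as fits the context budget, and record the reply."""
    history.append(("user", user_message))
    prompt = ""
    for role, text in reversed(history):
        turn = f"{role}: {text}\n"
        if len(prompt) + len(turn) > context_budget:
            break  # oldest turns fall out of the window first
        prompt = turn + prompt
    reply = generate(prompt + "assistant:")
    history.append(("assistant", reply))
    return reply

print(chat("Summarize our plan so far."))
```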
Why it matters
Running an open-weight LLM directly on a smartphone is a milestone for AI accessibility. It suggests that “edge inference” — local, private computation without cloud dependence — may soon reach consumer scale. This shift not only reduces latency and cost but also improves privacy, as data never leaves the device.
Next steps and outlook
Nexa AI plans to open-source its optimization toolkit, enabling community ports to other chipsets such as the MediaTek Dimensity 9400 and Google Tensor G4. The company is also testing mixed-precision inference and memory offloading for devices with 12 GB of RAM or less; a rough sketch of the offloading idea follows.
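Memory offloading in this setting generally means leaving weights in flash storage and mapping them into RAM one layer at a time, so peak memory stays near a single layer’s size. A rough NumPy sketch of the idea, with dummy weight files in place of a real model (Nexa AI has not described its actual design):

```python
import numpy as np

def run_offloaded(x: np.ndarray, layer_paths: list[str]) -> np.ndarray:
    """Stream weights layer by layer: each file is memory-mapped only
    while its layer executes, so the OS can page it out afterwards.
    Purely illustrative; not Nexa AI's announced mechanism."""
    for path in layer_paths:
        w = np.load(path, mmap_mode="r")  # page weights in on demand
        x = np.tanh(x @ w)                # stand-in for the real layer op
        del w                             # release the mapping
    return x

# Three dummy 64x64 layers written to disk to make the sketch runnable:
paths = []
for i in range(3):
    path = f"layer_{i}.npy"
    np.save(path, np.random.randn(64, 64).astype(np.float32))
    paths.append(path)
print(run_offloaded(np.random.randn(1, 64).astype(np.float32), paths).shape)
```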
If successful, this could bring full GPT-OSS chat capabilities to mid-range smartphones within a year — turning the mobile ecosystem itself into a distributed compute network.
Conclusion
The Android launch of GPT-OSS marks a symbolic end to the era when powerful language models required cloud supercomputers. For the first time, flagship smartphones can run a 20-billion-parameter model locally, opening a path toward personal, secure, and fully offline AI experiences.
Editorial Team — CoinBotLab