- Updated: April 5, 2026
- 4 min read
Google Gemma 4 Runs Locally with LM Studio’s New Headless CLI and Claude Code Integration – A Comprehensive Guide
Meta description: Discover how to run the powerful Google Gemma 4 model locally using LM Studio’s headless CLI, explore performance on Apple Silicon, and learn how Claude Code integration boosts offline coding assistance.
Google’s latest open‑source AI model, Gemma 4, has captured the attention of developers looking for high‑quality, locally deployable large language models. In a detailed walkthrough published on Georgeliu AI, the author demonstrates how to run the 26B‑A4B variant on a MacBook Pro M4 Pro using LM Studio’s newly released headless command‑line interface (CLI). The guide also shows how to pair the model with Claude Code for an offline coding assistant.
Why Run Gemma 4 Locally?
Running large language models on‑premises offers several advantages: data privacy, reduced latency, and the ability to fine‑tune models without sending proprietary code to the cloud. Gemma 4, built on a Mixture‑of‑Experts (MoE) architecture, delivers strong performance while keeping inference costs manageable. The article highlights that the 26‑billion‑parameter version can comfortably fit into 48 GB of VRAM, making it feasible for high‑end laptops and workstations.
Setting Up LM Studio’s Headless CLI
The headless CLI removes the graphical user interface, allowing developers to script model downloads, installation, and inference directly from the terminal. The steps outlined include the following (a consolidated script follows the list):
- Installing LM Studio via `brew install lm-studio` (or using the provided `.deb` package on Linux).
- Downloading the Gemma 4‑A4B model with `lmstudio download gemma-4-a4b`.
- Launching the model in headless mode with `lmstudio serve --model gemma-4-a4b --port 8080`.
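Taken together, the setup reduces to three commands. The sketch below simply consolidates the commands from the walkthrough; the `lmstudio` CLI syntax follows the article, so verify the exact subcommand names against your installed version.

```bash
# Consolidated setup, following the commands from the walkthrough above.
# Assumes Homebrew on macOS; Linux users install the provided .deb instead.
brew install lm-studio                            # install LM Studio
lmstudio download gemma-4-a4b                     # fetch the Gemma 4 26B-A4B weights
lmstudio serve --model gemma-4-a4b --port 8080    # start the headless HTTP server
```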
These commands enable a lightweight HTTP API that can be queried by any client, including custom scripts, VS Code extensions, or the upcoming Claude Code integration.
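Once the server is up, any HTTP client can query it. Here is a minimal sketch using curl, assuming the server exposes the OpenAI‑compatible `/v1/chat/completions` endpoint referenced in the Claude Code section below:

```bash
# Minimal request against the local Gemma 4 server. The endpoint path
# matches the one cited later in this article; the payload follows the
# common OpenAI chat-completions shape (an assumption to verify).
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gemma-4-a4b",
        "messages": [{"role": "user", "content": "Summarize Mixture-of-Experts in two sentences."}],
        "max_tokens": 128
      }'
```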
Performance Benchmarks on Apple Silicon
The author reports that on a MacBook Pro M4 Pro with 64 GB of unified memory, the Gemma 4 model runs at approximately 2.5 tokens per second in full‑precision mode and up to 5 tokens per second with 8‑bit quantization. Memory consumption stays under 45 GB, leaving room for other applications. These numbers compare favorably with other open‑source models of similar size, positioning Gemma 4 as a practical choice for developers who need strong reasoning capabilities without a cloud subscription.
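To reproduce a rough throughput figure on your own hardware, you can time one generation and divide the completion tokens by the elapsed seconds. A sketch, assuming the server returns an OpenAI‑style `usage` block and that `jq` and `bc` are installed:

```bash
# Crude tokens-per-second check. Assumes the response carries an
# OpenAI-style "usage" object with a completion_tokens count.
START=$(date +%s)
RESP=$(curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-4-a4b", "messages": [{"role": "user", "content": "Write a 200-word summary of MoE models."}], "max_tokens": 256}')
END=$(date +%s)
TOKENS=$(echo "$RESP" | jq '.usage.completion_tokens')
echo "scale=2; $TOKENS / ($END - $START)" | bc    # tokens per second
```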
Claude Code Integration for Offline Coding Assistance
One of the most compelling parts of the guide is the seamless integration of Gemma 4 with Claude Code. By pointing Claude Code’s backend URL to the LM Studio server (e.g., http://localhost:8080/v1/chat/completions), developers can enjoy AI‑driven code suggestions, debugging help, and documentation generation entirely offline. This setup is especially valuable for teams working in restricted environments or with sensitive codebases.
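In practice, the redirect is typically done through an environment variable. The sketch below assumes a Claude Code build that honors the documented `ANTHROPIC_BASE_URL` override and that the LM Studio endpoint accepts the requests Claude Code sends; treat both as assumptions to verify on your setup.

```bash
# Redirect Claude Code's API traffic to the local LM Studio server.
# ANTHROPIC_BASE_URL is Claude Code's documented override for the API
# host; whether your Claude Code version and the LM Studio endpoint
# are wire-compatible is an assumption to verify.
export ANTHROPIC_BASE_URL="http://localhost:8080"
claude    # start Claude Code as usual; requests now stay on-machine
```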
Fine‑Tuning and Customization Options
LM Studio also supports LoRA‑based fine‑tuning, allowing users to adapt Gemma 4 to domain‑specific vocabularies. The article walks through creating a LoRA adapter, training it on a small dataset of internal documentation, and then loading the adapter at runtime with the --lora flag. This capability opens the door for highly specialized assistants, such as legal‑tech bots or scientific research helpers.
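Loading an adapter at serve time might look like the following, reusing the serve command from earlier together with the `--lora` flag described in the article (the adapter path is illustrative):

```bash
# Serve Gemma 4 with a domain-specific LoRA adapter attached.
# The --lora flag is the one described above; the adapter directory
# is a placeholder for your own trained adapter.
lmstudio serve --model gemma-4-a4b --lora ./adapters/internal-docs --port 8080
```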
Related UBOS Content
For readers interested in exploring further, the following UBOS resources provide deeper insights:
- LM Studio Overview
- Choosing the Right Open‑Source AI Model
- Deploying AI at the Edge
- Claude Code – Offline Coding Assistant
Conclusion
Running Google Gemma 4 locally with LM Studio’s headless CLI and pairing it with Claude Code offers a powerful, privacy‑first AI stack for developers. The setup is relatively straightforward, delivers respectable performance on modern hardware, and can be fine‑tuned for niche applications. As the open‑source AI ecosystem continues to mature, solutions like this will enable more organizations to harness cutting‑edge language models without relying on external APIs.
Read the original step‑by‑step guide here for additional technical details.