
Have you ever wondered if there’s a better way to install and run llama.cpp locally? Almost every local large language model (LLM) application today relies on llama.cpp as the backend for running models. But here’s the catch: most setups are too complex, require multiple tools, or don’t give you a powerful user interface (UI) out of the box.
Wouldn’t it be great if you could:
- Run a powerful model like GPT-OSS 20B with just a few commands
- Get a modern Web UI instantly, without extra hassle
- Have the fastest and most optimized setup for local inference
That’s exactly what this tutorial is about.
In this guide, we will walk through one of the simplest and fastest ways to run the GPT-OSS 20B model locally, using the llama-cpp-python package together with Open WebUI. By the end, you will have a fully working local LLM environment that’s easy to use and efficient.
# 1. Setting Up Your Environment
If you already have the uv command installed, your life just got easier. If not, don’t worry: you can install it quickly by following the official uv installation guide.
Once uv is installed, open your terminal and install Python 3.12 with:
uv python install 3.12
Next, let’s set up a project directory, create a virtual environment, and activate it:
mkdir -p ~/gpt-oss && cd ~/gpt-oss
uv venv .venv --python 3.12
source .venv/bin/activate
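To confirm the environment is active, run which python; it should point to ~/gpt-oss/.venv/bin/python.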
# 2. Installing Python Packages
Now that your environment is ready, let’s install the required Python packages.
First, update pip to the latest version. Next, install the llama-cpp-python server package. The wheel below is built with CUDA support (for NVIDIA GPUs), so you will get maximum performance if you have a compatible GPU:
uv pip install --upgrade pip
uv pip install "llama-cpp-python[server]" --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
Finally, install Open WebUI and Hugging Face Hub:
uv pip install open-webui huggingface_hub
- Open WebUI: Provides a ChatGPT-style web interface for your local LLM server
- Hugging Face Hub: Makes it easy to download and manage models directly from Hugging Face
# 3. Downloading the GPT-OSS 20B Model
Next, let’s download the GPT-OSS 20B model in a quantized format (MXFP4) from Hugging Face. Quantized models are optimized to use less memory while still maintaining strong performance, which is perfect for running locally.
Run the following command in your terminal:
huggingface-cli download bartowski/openai_gpt-oss-20b-GGUF openai_gpt-oss-20b-MXFP4.gguf --local-dir models
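If you prefer to script the download in Python, huggingface_hub exposes the same functionality through hf_hub_download. Here is a minimal sketch, mirroring the repo and file names from the CLI command above:

# download_model.py - fetch the quantized GGUF file into ./models
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/openai_gpt-oss-20b-GGUF",
    filename="openai_gpt-oss-20b-MXFP4.gguf",
    local_dir="models",  # same directory the server command expects
)
print(f"Model saved to: {path}")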
# 4. Serving GPT-OSS 20B Locally Using llama.cpp
Now that the model is downloaded, let’s serve it using the llama.cpp Python server.
Run the following command in your terminal:
python -m llama_cpp.server \
  --model models/openai_gpt-oss-20b-MXFP4.gguf \
  --host 127.0.0.1 --port 10000 \
  --n_ctx 16384
Here’s what each flag does:
- --model: Path to your quantized model file
- --host: Local host address (127.0.0.1)
- --port: Port number (10000 in this case)
- --n_ctx: Context length (16,384 tokens for longer conversations)
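If you installed the CUDA wheel, you can also pass --n_gpu_layers -1 to offload all model layers to the GPU; without it, the server runs inference on the CPU by default.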
If everything is working, you will see logs like this:
INFO: Started server process [16470]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:10000 (Press CTRL+C to quit)
To confirm the server is running and the model is available, run:
curl http://127.0.0.1:10000/v1/models
Expected output:
{"object":"list","data":[{"id":"models/openai_gpt-oss-20b-MXFP4.gguf","object":"model","owned_by":"me","permissions":[]}]}
Next, we will integrate it with Open WebUI to get a ChatGPT-style interface.
# 5. Launching Open WebUI
We have already installed the open-webui Python package. Now, let’s launch it.
Open a new terminal window (keep your llama.cpp server running in the first one) and run:
open-webui serve --host 127.0.0.1 --port 9000
This will start the WebUI server at: http://127.0.0.1:9000
When you open the link in your browser for the first time, you will be prompted to:
- Create an admin account (using your email and a password)
- Log in to access the dashboard
This admin account ensures your settings, connections, and model configurations are saved for future sessions.
# 6. Setting Up Open WebUI
By default, Open WebUI is configured to work with Ollama. Since we are running our model with llama.cpp, we need to adjust the settings.
Follow these steps inside the WebUI:
// Add llama.cpp as an OpenAI Connection
- Open the WebUI: http://127.0.0.1:9000 (or your forwarded URL).
- Click on your avatar (top-right corner) → Admin Settings.
- Go to: Connections → OpenAI Connections.
- Edit the existing connection:
  - Base URL: http://127.0.0.1:10000/v1
  - API Key: (leave blank)
- Save the connection.
- (Optional) Disable Ollama API and Direct Connections to avoid errors.
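After saving, Open WebUI queries the connection’s /v1/models endpoint (the same one you tested with curl), so the served GGUF file should now appear in the model list.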
// Map a Friendly Model Alias
- Go to: Admin Settings → Models (or under the connection you just created)
- Edit the model name to gpt-oss-20b
- Save the model
// Start Chatting
- Open a new chat
- In the model dropdown, select gpt-oss-20b (the alias you created)
- Send a test message
# Final Thoughts
I honestly didn’t expect it to be this easy to get everything running with just Python. In the past, setting up llama.cpp meant cloning repositories, running CMake builds, and debugging endless errors, a painful process many of us are familiar with.
But with this approach, using the llama.cpp Python server together with Open WebUI, the setup worked right out of the box. No messy builds, no complicated configs, just a few simple commands.
In this tutorial, we:
- Set up a clean Python environment with uv
- Installed the llama.cpp Python server and Open WebUI
- Downloaded the GPT-OSS 20B quantized model
- Served it locally and connected it to a ChatGPT-style interface
The result? A fully local, private, and optimized LLM setup that you can run on your own machine with minimal effort.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in technology management and a bachelor’s degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.