Running a Local LLM on Apple Silicon w/ MLX

Running a large language model (LLM) on your own machine used to be a distant dream — now it’s possible and surprisingly simple thanks to Apple’s MLX framework.
MLX is Apple’s machine learning library optimized for Apple Silicon, allowing you to run and fine-tune powerful models locally — without needing a GPU cluster or internet connection.

This post will walk you through setting up MLX and running your first model (like Mistral 7B) locally on macOS.


What You’ll Need

  • A Mac with Apple Silicon (M1 or later)

  • Python 3.10 or later

  • Basic Terminal familiarity
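
You can sanity-check the first two requirements from the Terminal before going any further. A quick sketch — on Apple Silicon the architecture should read arm64 (Intel Macs report x86_64):

uname -m            # Apple Silicon prints "arm64"
python3 --version   # MLX needs Python 3.10 or later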


Step 1: Set Up a Project Folder

Start by opening your Terminal and moving into a folder for your MLX project:

cd ~/dev/mlx-ui 

If the folder doesn’t exist yet:

mkdir -p ~/dev/mlx-ui
cd ~/dev/mlx-ui

This will be your working directory for everything MLX-related.


Step 2: Create a Python Virtual Environment

A virtual environment keeps your dependencies clean and separate from your system Python.

python3 -m venv .venv

Activate it:

source .venv/bin/activate

Once activated, your Terminal prompt should look something like this:

(.venv) aman@Mac mlx-ui % 

This means you’re now working inside your isolated Python environment.
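
Beyond the prompt change, you can verify activation by asking where python3 now resolves. A small demonstration in a throwaway directory (so it doesn't touch your project folder):

# Create a venv, activate it, and confirm python3 now points
# inside the venv rather than at the system interpreter.
tmpdir="$(mktemp -d)"
python3 -m venv "$tmpdir/.venv"
source "$tmpdir/.venv/bin/activate"
command -v python3    # prints a path ending in .venv/bin/python3
deactivate
rm -rf "$tmpdir"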


Step 3: Install MLX and Dependencies

Install the MLX library for running and managing local language models:

pip install mlx-lm 

That’s it — you now have MLX installed on your system.
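
If you want to confirm the install without downloading anything, a quick check from the Terminal (it only inspects Python's import machinery; it doesn't load a model):

python3 - <<'EOF'
# Report whether the mlx and mlx_lm packages are importable
# in the current environment.
import importlib.util
for pkg in ("mlx", "mlx_lm"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'missing'}")
EOF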


Step 4: Download a Model from the MLX Community

MLX supports a variety of models hosted on Hugging Face under the mlx-community organization.
For example, to try Mistral 7B Instruct (4-bit) — a strong open-weight model — run:

mlx_lm.generate --model mlx-community/Mistral-7B-Instruct-v0.3-4bit --prompt "hello" 

MLX will automatically:

  • Download the model files to your local Hugging Face cache

  • Run inference on your machine

  • Return a response to your prompt

If you see a response like “Hi there! How can I help you today?” — your model is live and local 🎉
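
The CLI is a thin wrapper over a Python API. Here's a sketch of the same "hello" prompt driven from Python instead — the load and generate names come from mlx-lm's documented API, and the snippet falls back to a notice if mlx-lm isn't installed in the active environment:

python3 - <<'EOF'
# Sketch: same prompt as the CLI example, via the mlx_lm Python API.
try:
    from mlx_lm import load, generate
except ImportError:
    print("mlx_lm is not installed; run `pip install mlx-lm` first")
else:
    # The first call downloads to the Hugging Face cache, like the CLI.
    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
    print(generate(model, tokenizer, prompt="hello"))
EOF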


Step 5: Chat Directly in the Terminal

You can launch a conversational session directly from your terminal:

mlx_lm.chat

This opens an interactive shell where you can type back and forth with your model.
Try a few questions like:

> What’s the capital of India? 
> Write a short poem about the ocean. 

💡 Tip: The terminal interface is great for quick tests, but not ideal for longer conversations or file-based Q&A — that’s where Streamlit comes in (we’ll cover that in the next post).


Step 6: Deactivate When You’re Done

When finished, simply deactivate the virtual environment:

deactivate 

You can always reactivate it later with:

source .venv/bin/activate

Summary

Step  Task                        Command
1     Create project folder       mkdir -p ~/dev/mlx-ui
2     Create virtual environment  python3 -m venv .venv
3     Activate environment        source .venv/bin/activate
4     Install MLX                 pip install mlx-lm
5     Download and test model     mlx_lm.generate --model mlx-community/Mistral-7B-Instruct-v0.3-4bit --prompt "hello"
6     Chat in terminal            mlx_lm.chat
7     Deactivate                  deactivate

Coming Up Next

In the next post, we’ll go beyond the terminal and build a Streamlit WebUI — a sleek, ChatGPT-style interface that lets you chat with your local MLX LLM right from your browser.

Stay tuned for Part 2: Building a Streamlit Web Interface for Your Local MLX Model.
