Glass Slipper

Meet Glass Slipper, the LLM that fits your Mac. Glass Slipper bundles defaults that just work, plus prompting libraries that make local LLM-based development easy and fun.

A key feature of Glass Slipper is runtime adaptation: depending on memory pressure, Glass Slipper switches to a larger or smaller model and tells you why.
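To make the idea concrete, here is a minimal sketch of memory-based model selection that also reports its reasoning. It is not Glass Slipper's actual code: the tier names, the memory figures, the headroom value, and the `choose_tier` function are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str           # hypothetical label, not a real model identifier
    required_gb: float  # illustrative footprint, not a measured value


# Ordered largest first. All numbers are placeholders.
TIERS = (
    ModelTier("large", 24.0),
    ModelTier("medium", 10.0),
    ModelTier("small", 4.0),
)


def choose_tier(free_gb, headroom_gb=2.0, tiers=TIERS):
    """Return (tier, reason) for the largest tier that fits in free memory.

    headroom_gb is memory deliberately left for the rest of the system.
    """
    budget = free_gb - headroom_gb
    for tier in tiers:
        if tier.required_gb <= budget:
            reason = (
                f"using {tier.name}: needs {tier.required_gb:.1f} GB, "
                f"{budget:.1f} GB available after {headroom_gb:.1f} GB headroom"
            )
            return tier, reason
    # Nothing fits comfortably, so pick the smallest tier and say so.
    smallest = tiers[-1]
    reason = (
        f"falling back to {smallest.name}: only {budget:.1f} GB available "
        f"after {headroom_gb:.1f} GB headroom, below the "
        f"{smallest.required_gb:.1f} GB it wants"
    )
    return smallest, reason
```

Calling `choose_tier(13.0)` picks the hypothetical "medium" tier and returns a one-line explanation alongside it. Returning the reason with the decision is what lets the app explain itself to the user.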

Contrast Glass Slipper with Tinker-Bell-style development. Reddit and Hacker News are rife with Tinker-Bell posts: "I patched llama-server and got 200 tokens/second on Qwen3.5 with zero quality loss. All it took was using upstream, not-yet-merged code from llama.cpp, a Q5_K_M quantized model, a 3090 Ti with 24 GB of VRAM, and imatrix calibration."

The intent for Glass Slipper is to ship a single application that works with a single model family (Qwen3.5) to perform a curated set of development tasks. It should work on basically any Apple Silicon Mac, and it gets better and faster the more unified memory you have.

The version of Glass Slipper linked below doesn't require you to patch llama.cpp. It doesn't require you to choose a quantization level. It uses sensible defaults that fit your Mac. And it monitors the resources on your Mac (namely unified memory and token throughput) to make sure it keeps chugging happily.
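As a rough illustration of the throughput side of that monitoring, the sketch below keeps a rolling average of tokens per second and flags when it drops below a floor. The `ThroughputMonitor` class, the window size, and the threshold are assumptions made up for this example, not Glass Slipper's real internals.

```python
from collections import deque


class ThroughputMonitor:
    """Rolling average of tokens/second over the last few generations."""

    def __init__(self, window=5, min_tokens_per_s=10.0):
        # window and min_tokens_per_s are illustrative defaults.
        self.samples = deque(maxlen=window)
        self.min_tokens_per_s = min_tokens_per_s

    def record(self, tokens, seconds):
        """Store one generation's rate; ignore zero-length timings."""
        if seconds > 0:
            self.samples.append(tokens / seconds)

    def average(self):
        """Mean rate over the window, or None if nothing is recorded."""
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)

    def should_step_down(self):
        """True once the window is full and the average is below the floor."""
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough data to judge yet
        return self.average() < self.min_tokens_per_s
```

Waiting for a full window before deciding means one slow response doesn't trigger a model switch on its own.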

v0.1.5 · 5.8 MB · macOS (Apple Silicon)

Download Glass Slipper.dmg

Mailing list

Approximately one email per month. Release notes, what I'm working on next.

Writing