From model discovery to production deployment — all in one place
OpenAI-compatible API. Serve any model via /v1/chat/completions. Drop-in replacement for
OpenAI, works with LangChain, Open WebUI, and more.
Test models interactively: image classification, object detection, text generation, and image synthesis, all in the browser.
Search, browse, and download models from HuggingFace with one click, straight into your model registry.
Convert PyTorch → ONNX → OpenVINO IR. Apply INT8/INT4 quantization with NNCF for hardware deployment.
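INT8 quantization maps each float tensor onto 8-bit integers via a scale and zero point. A pure-Python sketch of the affine scheme such tools apply (illustrative math only, not NNCF's or NPU-STACK's implementation):

```python
def quantize_int8(values):
    """Affine-quantize a list of floats to int8 with a scale and zero point."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0          # spread the float range over 256 levels
    zero_point = round(-128 - lo / scale)   # integer that represents float 0.0
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate floats from the int8 representation."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
q, s, z = quantize_int8(weights)
approx = dequantize_int8(q, s, z)
# Round-trip error is bounded by about half a scale step per element.
```

INT4 works the same way with 16 levels instead of 256, trading more error for a 2x smaller footprint.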
Full training with WebSocket metrics, plus LoRA/QLoRA fine-tuning. Custom datasets, hyperparameters, real-time loss tracking.
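LoRA freezes the base weight matrix W and learns a low-rank update, so the effective weight is W + (alpha / r) * (A @ B) with A and B far smaller than W. A tiny pure-Python illustration of merging that update (not the trainer's actual code):

```python
def matmul(a, b):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_merge(w, a, b, alpha, r):
    """Effective weight after LoRA: W + (alpha / r) * (A @ B)."""
    delta = matmul(a, b)
    return [[w[i][j] + (alpha / r) * delta[i][j]
             for j in range(len(w[0]))] for i in range(len(w))]

# 2x2 base weight, rank-1 adapters A (2x1) and B (1x2)
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0], [2.0]]
b = [[0.1, 0.2]]
merged = lora_merge(w, a, b, alpha=1.0, r=1)
# merged == [[1.1, 0.2], [0.2, 1.4]]
```

Only A and B are trained, which is why LoRA (and its quantized variant QLoRA) fits on modest hardware.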
Run latency and throughput benchmarks across CPU, GPU, and NPU. Compare quantization levels side-by-side.
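A benchmark like this boils down to timing repeated calls and reporting percentiles and throughput. A minimal sketch with a stand-in CPU workload (hypothetical harness; the real suite runs actual model inference on each device):

```python
import time

def benchmark(fn, warmup=3, iters=50):
    """Time fn() repeatedly; report p50/p95 latency (ms) and throughput."""
    for _ in range(warmup):      # warm caches and lazy initialization
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    p50 = samples[len(samples) // 2]
    p95 = samples[int(len(samples) * 0.95) - 1]
    throughput = 1000.0 * iters / sum(samples)   # calls per second
    return {"p50_ms": p50, "p95_ms": p95, "ips": throughput}

# Stand-in workload; swap in a real model call per device (CPU/GPU/NPU).
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
```

Comparing quantization levels is then just running the same harness against the FP32, INT8, and INT4 builds of a model.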
Upload, organize, and auto-detect dataset types. Supports images, CSVs, JSON, Parquet, and zip archives.
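Dataset-type auto-detection can be sketched as a majority vote over file extensions. The mapping below is hypothetical and for illustration only, not NPU-STACK's exact rules:

```python
from pathlib import Path

# Hypothetical extension -> dataset-type mapping for illustration.
TYPE_BY_EXT = {
    ".csv": "tabular", ".json": "tabular", ".parquet": "tabular",
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".zip": "archive",
}

def detect_dataset_type(filenames):
    """Majority-vote the dataset type from uploaded file extensions."""
    counts = {}
    for name in filenames:
        kind = TYPE_BY_EXT.get(Path(name).suffix.lower(), "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return max(counts, key=counts.get)

detect_dataset_type(["a.png", "b.png", "labels.csv"])  # -> "image"
```

A production detector would also peek inside zip archives and sniff file headers rather than trust extensions alone.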
Web app, Electron desktop (coming soon), Android & iOS (planned). One backend, every device.
Use NPU-STACK as a drop-in replacement for OpenAI. Works with any SDK, framework, or tool.
GET /v1/models: list all available and loaded models
POST /v1/chat/completions: chat completion with streaming (SSE)
POST /v1/completions: legacy text completion endpoint
POST /v1/embeddings: generate text embeddings
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="any",  # not required for local use
)

response = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in response:
    # delta.content can be None on the final chunk
    print(chunk.choices[0].delta.content or "", end="")
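With stream=True the server replies with Server-Sent Events: each chunk arrives as a "data: {...}" line, and the stream ends with "data: [DONE]". A minimal sketch of parsing that wire format by hand (standard OpenAI SSE framing; the SDK above does this for you):

```python
import json

def parse_sse_stream(raw):
    """Yield parsed JSON chunks from an OpenAI-style SSE response body."""
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue                      # skip blank separator lines
        payload = line[len("data: "):]
        if payload == "[DONE]":           # end-of-stream sentinel
            return
        yield json.loads(payload)

raw = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo!"}}]}\n\n'
    'data: [DONE]\n\n'
)
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_stream(raw))
# text == "Hello!"
```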
Deploy on any accelerator — auto-detected and ready to go
✅ GPU: Full CUDA GPU support with multi-GPU enumeration.
✅ GPU: RDNA & CDNA architectures via ROCm/HIP.
✅ NPU: Intel Core Ultra AI accelerators via OpenVINO.
✅ TPU: Edge TPU support via TFLite delegates.
✅ DML: Windows GPU fallback via ONNX Runtime.
✅ CPU: Optimized CPU inference. Always available.
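One coarse way to auto-detect usable backends is to probe which runtime packages are importable. A hedged sketch (the package names are the common ones; a real probe would also query the hardware itself):

```python
import importlib.util

# Runtime package -> backend label; presence suggests (not proves) support.
BACKEND_PACKAGES = {
    "torch": "CUDA/ROCm (PyTorch)",
    "openvino": "Intel CPU/GPU/NPU (OpenVINO)",
    "onnxruntime": "DirectML/CPU (ONNX Runtime)",
}

def detect_backends():
    """Return the backends whose runtime package is installed."""
    return [label for pkg, label in BACKEND_PACKAGES.items()
            if importlib.util.find_spec(pkg) is not None]

available = detect_backends()  # e.g. ["Intel CPU/GPU/NPU (OpenVINO)"]
```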
Clone, setup, run. It's that easy.
git clone https://github.com/chainchopper/NPU-STACK.git && cd NPU-STACK
setup.bat
Downloads portable Python, creates a venv, installs all dependencies, and generates a .env file
run-all.bat
Backend (FastAPI :8000) + Frontend (Vite :5173) + OpenAI API (/v1)
NPU-STACK is free and open source. Help us build the future of edge AI.
Fork → checkout dev → make changes → submit PR