A CLI proxy that intercepts Warp terminal AI requests and routes them to a local LLM without cloud exposure

Warp terminal routes all AI inference through its own cloud servers. Even with a custom inference endpoint configured, Warp rejects localhost URLs, so users must expose their local Ollama or LM Studio instance to the public internet through ngrok or Cloudflare Tunnel. As a result, every AI-assisted terminal session sends shell context, command history, and file paths to Warp's servers and to a public tunnel endpoint, with no opt-out.

A local socket-level proxy intercepts Warp's AI requests before they leave the machine, rewrites the destination to a local model endpoint, and returns the response in the format Warp expects: zero cloud exposure, zero tunneling, and the full Warp UX intact. The proxy runs as a background daemon, requires no modification to Warp, and works with any OpenAI-compatible local backend (Ollama, LM Studio, llama.cpp, vLLM).

It targets privacy-conscious developers, enterprise teams with data-handling constraints, and air-gapped environments where routing terminal context to a third-party cloud is not acceptable.
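The "rewrites the destination" step can be sketched as a small pure function. This is only an illustration under stated assumptions: the upstream URL is a placeholder (Warp's real AI endpoint and wire format are not described here), the function and table names are hypothetical, and the ports are the commonly documented defaults for each backend, which may differ on a given install.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default local addresses for OpenAI-compatible servers.
# Adjust these if your backend listens elsewhere.
LOCAL_BACKENDS = {
    "ollama": "http://127.0.0.1:11434",
    "lmstudio": "http://127.0.0.1:1234",
    "llama.cpp": "http://127.0.0.1:8080",
    "vllm": "http://127.0.0.1:8000",
}


def rewrite_destination(original_url: str, backend: str = "ollama") -> str:
    """Swap the scheme and host of an intercepted URL for a local backend.

    The path and query string are kept, on the assumption that the
    intercepted request already uses an OpenAI-compatible route.
    """
    base = urlsplit(LOCAL_BACKENDS[backend])
    parts = urlsplit(original_url)
    return urlunsplit((base.scheme, base.netloc, parts.path, parts.query, ""))


if __name__ == "__main__":
    # Placeholder upstream URL, not Warp's actual endpoint.
    print(rewrite_destination("https://ai.example.invalid/v1/chat/completions"))
```

A real daemon would also have to translate request and response bodies if Warp's format differs from the OpenAI schema; this sketch covers only the address rewrite.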

Demand Breakdown

GitHub: 1,517

HN: 110

Social Proof (2 sources)

GitHub: Make Warp work with Local Language Models (like Ollama models)

@hmdz105 · 2024-02-26 · 1,517

HN: Warp sends a terminal session to LLM without user consent

@ykurtov · 2025-08-19 · 110

Gap Assessment

Competitive: Multiple tools exist but differentiation opportunities remain

4 tools exist (Warp custom inference endpoint, ngrok / Cloudflare Tunnel, LLMStudio core, blue-context/warp Ollama provider (Go package)), but gaps remain. Warp's custom inference endpoint rejects localhost/127.0.0.1, so it still requires a public tunnel (ngrok, Cloudflare); requests still traverse the internet and expose terminal context publicly, and cloud exposure is not eliminated. ngrok and Cloudflare Tunnel worsen the privacy problem: terminal context now routes through Warp's cloud and a public tunnel endpoint, so they are not a privacy solution.

Features (8 agent-ready prompts)

Transparent socket-level intercept daemon

Multi-backend model router with live switching

Terminal context privacy filter

Request and response audit log with local TUI dashboard

Zero-config install with auto-detection and clean uninstall

Air-gapped and VPN-enforced mode with kill switch

Warp AI credits bypass tracker with cost comparison dashboard

Model benchmark runner for terminal AI workloads
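
As one illustration of the feature list, the terminal context privacy filter could start as a set of regex redaction rules applied to text before it reaches any model. The rules and names below are hypothetical and far from complete; they are a sketch of the idea, not a vetted secret scanner.

```python
import re

# Illustrative rules only; a real filter would need a broader, tested rule set.
_RULES = [
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY]"),
    # NAME=value assignments whose variable name hints at a secret.
    (
        re.compile(r"\b(\w*(?:TOKEN|SECRET|PASSWORD|API_KEY)\w*)=(\S+)", re.IGNORECASE),
        r"\1=[REDACTED]",
    ),
    # Home directory prefixes on macOS and Linux, to hide the username.
    (re.compile(r"/(?:Users|home)/[^/\s]+"), "~"),
]


def redact(text: str) -> str:
    """Apply each redaction rule in order and return the filtered text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(redact("export GITHUB_TOKEN=ghp_abc123"))
    print(redact("cat /Users/alice/project/.env"))
```

Running the filter inside the proxy, before forwarding, would mean even the local model and the audit log only see redacted context.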

Competitive Landscape

Product	Does	Missing
Warp custom inference endpoint	Lets users point Warp at a custom OpenAI-compatible URL for billing purposes	Rejects localhost/127.0.0.1 — still requires a public tunnel (ngrok, Cloudflare) so requests still traverse the internet and expose terminal context publicly; does not eliminate cloud exposure
ngrok / Cloudflare Tunnel	Expose local services to public internet so Warp can reach them as a custom endpoint	Worsens the privacy problem: terminal context now routes through Warp cloud AND a public tunnel endpoint; not a privacy solution
LLMStudio core	General-purpose LLM routing library supporting Ollama and custom backends	Generic router not integrated with Warp's local socket protocol; requires manual wiring and does not intercept Warp traffic transparently
blue-context/warp Ollama provider (Go package)	Experimental Go package with an Ollama provider shim for a warp-named project	1-3 star orphan repo, no packaging, no install path, no active maintenance, not a real product
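
The recurring gap in the table is that traffic ends up leaving the machine. A proxy aiming at the air-gapped use case could guard against that by refusing any backend URL that is not on the loopback interface. The helper below is a minimal sketch of such a check; the function name is hypothetical, and treating every hostname other than `localhost` as non-local is a deliberately conservative assumption.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True only if the URL's host is localhost or a loopback IP."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname would need DNS resolution; treat it as non-local.
        return False


if __name__ == "__main__":
    for candidate in ("http://127.0.0.1:11434/v1", "https://example.ngrok.io/v1"):
        print(candidate, is_loopback_url(candidate))
```

Rejecting unresolved hostnames outright avoids a DNS lookup, which itself could leak information in a strictly offline setup.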