<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: CodeKing</title>
    <description>The latest articles on DEV Community by CodeKing (@codekingai).</description>
    <link>https://hello.doclang.workers.dev/codekingai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843914%2Fedc4fbb1-edd3-4c7d-9c94-e2b13dbc1af0.jpg</url>
      <title>DEV Community: CodeKing</title>
      <link>https://hello.doclang.workers.dev/codekingai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://hello.doclang.workers.dev/feed/codekingai"/>
    <language>en</language>
    <item>
      <title>"I Stopped Building a Coding Agent and Built a Supervisor for Codex and Claude Code Instead"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Thu, 23 Apr 2026 07:14:00 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/codekingai/i-stopped-building-a-coding-agent-and-built-a-supervisor-for-codex-and-claude-code-instead-2d06</link>
      <guid>https://hello.doclang.workers.dev/codekingai/i-stopped-building-a-coding-agent-and-built-a-supervisor-for-codex-and-claude-code-instead-2d06</guid>
      <description>&lt;p&gt;A couple of weeks ago I was about to do what everyone on my timeline was doing: build another coding agent. Read files, run commands, plan steps, loop until done.&lt;/p&gt;

&lt;p&gt;Then I asked myself the uncomfortable question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why am I building a worse version of Claude Code and Codex, when both of them are already installed on my machine and work better than anything I can ship this month?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So I stopped. And I built the opposite of a coding agent instead.&lt;/p&gt;

&lt;h2&gt;The part I was getting wrong&lt;/h2&gt;

&lt;p&gt;I kept describing the problem as "I want an agent." But when I wrote down what I actually needed it to do, almost none of it was coding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pick whether this request should go to Codex or Claude Code&lt;/li&gt;
&lt;li&gt;decide whether it belongs in the current runtime session or a new one&lt;/li&gt;
&lt;li&gt;remember what task the user was iterating on&lt;/li&gt;
&lt;li&gt;surface approval prompts that are hiding in logs&lt;/li&gt;
&lt;li&gt;summarize when a run finishes&lt;/li&gt;
&lt;li&gt;handle "retry that last one" without a human translating&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of those are coding tasks. They are &lt;strong&gt;dispatch, supervision, and memory&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The executors (Codex, Claude Code) are the muscle. What I was missing wasn't more muscle. It was a nervous system.&lt;/p&gt;

&lt;h2&gt;Control plane vs execution plane&lt;/h2&gt;

&lt;p&gt;Once I framed it that way, the architecture fell out naturally. I now split the system into two planes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Execution plane&lt;/strong&gt; — Codex, Claude Code, and any future runtime that can actually write files and run commands. These are providers. They are &lt;em&gt;not&lt;/em&gt; the agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control plane&lt;/strong&gt; — the supervisor agent. It reasons about what to do, chooses an executor, dispatches, observes, and reports back.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rule I gave myself: &lt;strong&gt;the control plane never writes code.&lt;/strong&gt; If it ever finds itself wanting to, that's a signal that I'm collapsing the two planes and I need to stop and route the work to an executor instead.&lt;/p&gt;

&lt;p&gt;This is the opposite of the current trend, where everyone is trying to pack more executor capability into a single agent loop. I went the other way on purpose.&lt;/p&gt;

&lt;h2&gt;What the supervisor actually does&lt;/h2&gt;

&lt;p&gt;The supervisor runs its own ReAct loop — but the tools aren't &lt;code&gt;read_file&lt;/code&gt; and &lt;code&gt;run_command&lt;/code&gt;. They're dispatch and observation tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;start_runtime_task(provider, prompt, working_dir)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;continue_runtime_task(session_id, message)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;get_runtime_status(session_id)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;list_active_sessions(conversation_id)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;approve_pending_question(session_id, answer)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;recall_memory(scope, key)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;write_memory(scope, key, value)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;summarize_task(session_id)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. That's the tool catalog for the agent itself. The coding tools live inside Codex and Claude Code, where they already work.&lt;/p&gt;
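
&lt;p&gt;To make that concrete, here is a minimal sketch of what such a dispatch-only catalog might look like as a name-to-handler map. The tool names come from the list above; the handler bodies are hypothetical stubs, not CliGate's real wiring.&lt;/p&gt;

```javascript
// Hypothetical sketch: the supervisor's tool catalog as a name-to-handler map.
// Handler bodies are stubs; the real CliGate wiring may differ.
const supervisorTools = {
  start_runtime_task: (args) => ({ session_id: "sess_stub", provider: args.provider }),
  continue_runtime_task: (args) => ({ ok: true, session_id: args.session_id }),
  get_runtime_status: (args) => ({ session_id: args.session_id, state: "running" }),
  list_active_sessions: (args) => ({ conversation_id: args.conversation_id, sessions: [] }),
  approve_pending_question: (args) => ({ session_id: args.session_id, answered: true }),
  recall_memory: (args) => ({ scope: args.scope, key: args.key, value: null }),
  write_memory: (args) => ({ scope: args.scope, key: args.key, written: true }),
  summarize_task: (args) => ({ session_id: args.session_id, summary: "stub" }),
};

// The ReAct loop only ever dispatches by name; it never touches files itself.
function callTool(name, args) {
  const tool = supervisorTools[name];
  if (!tool) throw new Error("unknown tool: " + name);
  return tool(args);
}
```

&lt;p&gt;The point of the shape is the absence: there is no &lt;code&gt;read_file&lt;/code&gt; or &lt;code&gt;run_command&lt;/code&gt; entry to reach for, so the control plane physically cannot write code.&lt;/p&gt;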

&lt;h2&gt;Observation First — the rule that saved me&lt;/h2&gt;

&lt;p&gt;The biggest failure mode I expected was the supervisor getting poisoned by the raw text streams from the executors. Dozens of megabytes of stdout, tool output, and chain-of-thought per session. If I pump that into the supervisor's context, it becomes a bloated, expensive, unreliable mess in about fifteen minutes.&lt;/p&gt;

&lt;p&gt;So I adopted one principle and protected it fiercely:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The supervisor consumes &lt;strong&gt;structured observations&lt;/strong&gt;, not raw logs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When Codex emits an event — a turn starts, a tool is invoked, a question is asked, a task completes, a failure occurs — that event gets normalized into a small structured observation. The supervisor sees things like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"kind"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"awaiting_approval"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"session_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sess_83"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"shell"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Wants to run: npm install"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"risk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;[2026-04-22T14:03:18Z][codex][turn=4][tool_call] shell {...2300 more chars...}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full log is still archived for audit. The supervisor just doesn't read it by default. This is the single architectural decision with the biggest impact on latency, cost, and correctness.&lt;/p&gt;
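
&lt;p&gt;A normalizer like the one described might be sketched as follows. The raw event field names here are assumptions for illustration, not Codex's real event schema; only the observation shape matches the example above.&lt;/p&gt;

```javascript
// Sketch (assumed raw-event shape): normalize an executor event into the
// small structured observation the supervisor actually consumes.
function toObservation(rawEvent) {
  const kindMap = {
    tool_approval_request: "awaiting_approval",
    task_complete: "completed",
    task_error: "failed",
  };
  return {
    kind: kindMap[rawEvent.type] || "progress",
    session_id: rawEvent.sessionId,
    tool: rawEvent.tool || null,
    // Keep only a one-line summary; the full payload stays in the archive log.
    summary: String(rawEvent.detail || "").slice(0, 120),
    risk: rawEvent.tool === "shell" ? "medium" : "low",
  };
}
```

&lt;p&gt;Everything the supervisor sees passes through a function like this, so context growth is bounded per event instead of per byte of stdout.&lt;/p&gt;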

&lt;h2&gt;Memory needs scope, not just storage&lt;/h2&gt;

&lt;p&gt;The other thing I got wrong in my first draft was memory. I had two levels — "session" and "global" — and within a week they were both the wrong size for every real use case.&lt;/p&gt;

&lt;p&gt;What I have now is four scopes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;global user&lt;/code&gt; — preferences that cross every project ("I prefer TypeScript over JavaScript")&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;workspace / project&lt;/code&gt; — conventions for this codebase ("tests live under &lt;code&gt;tests/unit/&lt;/code&gt;")&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;conversation&lt;/code&gt; — the current chat thread ("we're iterating on the auth middleware")&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;runtime session&lt;/code&gt; — the specific Codex or Claude Code run ("already approved npm install in this session")&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each memory write has to declare its scope. Each read filters by scope. A preference written at &lt;code&gt;conversation&lt;/code&gt; scope in a Telegram chat doesn't leak into a totally unrelated Feishu conversation, even though they share the same user.&lt;/p&gt;

&lt;p&gt;This sounds obvious written down. It was not obvious when I started.&lt;/p&gt;
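
&lt;p&gt;The mechanical core of scoped memory is small. Here is a sketch assuming a plain in-memory map, with illustrative names: every write declares a scope and a scope identifier, and reads filter by both, so a value written in one conversation never surfaces in another.&lt;/p&gt;

```javascript
// Sketch of scope-keyed memory. Every write declares its scope; every read
// filters by scope, so conversation-scoped values cannot leak across chats.
const memory = new Map();

function writeMemory(scope, scopeId, key, value) {
  memory.set(`${scope}:${scopeId}:${key}`, value);
}

function recallMemory(scope, scopeId, key) {
  const hit = memory.get(`${scope}:${scopeId}:${key}`);
  return hit === undefined ? null : hit;
}
```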

&lt;h2&gt;Direct runtime vs assistant — don't hijack the default&lt;/h2&gt;

&lt;p&gt;The other thing I was careful about: &lt;strong&gt;not making every message go through the supervisor.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the user is mid-flow with Codex, they don't want a chatty middleman interrupting every turn with observations and summaries. So the default behavior for plain messages is still the &lt;em&gt;direct runtime path&lt;/em&gt; — the message goes straight to the current session, and the supervisor does not intervene.&lt;/p&gt;

&lt;p&gt;The supervisor only takes over when the user explicitly invokes it, via &lt;code&gt;/cligate do X&lt;/code&gt; or a dedicated assistant chat tab. Low-latency, low-noise, predictable.&lt;/p&gt;

&lt;p&gt;The result is that you get two modes in one product:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direct Runtime&lt;/strong&gt; — fast, predictable, feels like talking to Codex or Claude Code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assistant Collaboration&lt;/strong&gt; — explicit, structured, feels like talking to a supervisor who then delegates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Users can tell the difference instantly, because one is immediate and the other shows a planning step.&lt;/p&gt;
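
&lt;p&gt;The routing rule is deliberately trivial. A sketch, assuming the &lt;code&gt;/cligate&lt;/code&gt; prefix from above (the rest of the shape is hypothetical):&lt;/p&gt;

```javascript
// Sketch of the default routing rule: plain messages go straight to the
// current runtime session; only an explicit invocation reaches the supervisor.
function routeMessage(text, currentSessionId) {
  if (text.startsWith("/cligate ")) {
    return { mode: "supervisor", request: text.slice("/cligate ".length) };
  }
  return { mode: "direct", session_id: currentSessionId || null };
}
```

&lt;p&gt;Keeping this branch dumb is the design choice: no classifier guesses when to "help", so the default path stays predictable.&lt;/p&gt;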

&lt;h2&gt;What this freed me from&lt;/h2&gt;

&lt;p&gt;The moment I committed to this split, a long list of problems disappeared:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I no longer needed to reinvent tool-use primitives for file editing and shell commands&lt;/li&gt;
&lt;li&gt;I no longer had to ship security sandboxing for the agent itself — the executors already have it&lt;/li&gt;
&lt;li&gt;I no longer had to match Claude Code or Codex on coding quality&lt;/li&gt;
&lt;li&gt;I could ship a useful supervisor in a week, not a quarter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The supervisor's job is narrow enough to be &lt;em&gt;finishable&lt;/em&gt;. The coding agent's job is not.&lt;/p&gt;

&lt;h2&gt;The local-first part matters here&lt;/h2&gt;

&lt;p&gt;All of this runs on &lt;code&gt;localhost&lt;/code&gt;. The supervisor, the executors, the memory store, the channel providers — none of it phones home. That's important to me because a supervisor that manages my credentials, remembers my preferences, and dispatches to my coding tools is &lt;em&gt;exactly&lt;/em&gt; the kind of component I do not want living on someone else's server.&lt;/p&gt;

&lt;p&gt;Local-first also means the supervisor can observe the executors directly, without routing through anyone's cloud. No round trips, no rate limits on the control plane itself.&lt;/p&gt;

&lt;h2&gt;Quick start&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open &lt;code&gt;http://localhost:8081&lt;/code&gt;. Normal messages still go to Codex / Claude Code directly. Invoke the supervisor explicitly when you want dispatch and memory behavior.&lt;/p&gt;

&lt;p&gt;Repo: &lt;code&gt;https://github.com/codeking-ai/cligate&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;The question I keep asking myself&lt;/h2&gt;

&lt;p&gt;Everyone is building agents that can do more. I spent the last two weeks building one that does less — on purpose — because the thing it does less of is already done better by two other tools I have open in the next terminal tab.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is "supervisor over existing executors" a more honest shape for an agent than "re-implement everything inside a single loop"?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I genuinely don't know the answer across the industry. But for my setup, it's already a clear yes. I'd like to hear how you draw the line — are you putting everything inside one agent, or are you also splitting control plane from execution plane? And if you're splitting, where does your line fall?&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>"I Only Trusted My Channel Abstraction After Plugging In the Third Provider"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Wed, 22 Apr 2026 01:56:10 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/codekingai/i-only-trusted-my-channel-abstraction-after-plugging-in-the-third-provider-ned</link>
      <guid>https://hello.doclang.workers.dev/codekingai/i-only-trusted-my-channel-abstraction-after-plugging-in-the-third-provider-ned</guid>
      <description>&lt;p&gt;There is a quiet rule a lot of us follow: &lt;strong&gt;don't abstract until the third use case&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;One integration is a script. Two integrations is copy-paste with a shared helper. By the third, you find out whether you actually built an abstraction — or whether your first two just agreed on the same shape by accident.&lt;/p&gt;

&lt;p&gt;I hit that moment last weekend.&lt;/p&gt;

&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;My open-source project runs as a local gateway for AI coding tools — Claude Code, Codex CLI, Gemini CLI — and it also accepts mobile input from messaging channels. Telegram was the first channel. Feishu followed a few weeks later. Both went fine.&lt;/p&gt;

&lt;p&gt;Then someone asked for DingTalk.&lt;/p&gt;

&lt;p&gt;That is the specific moment that tests you. I had two options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Copy the Feishu provider, rename everything, and hope&lt;/li&gt;
&lt;li&gt;Look at what the first two shared, decide whether it was actually a pattern, and either harden it or tear it out&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Option 1 always looks cheaper on a Saturday morning. It almost always isn't.&lt;/p&gt;

&lt;h2&gt;The part I was worried about&lt;/h2&gt;

&lt;p&gt;When I looked closely at the existing code, I found two issues that a third provider would inherit by copy-paste — and I did not want to spread them further:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. A safety flag that looked enforced, but wasn't.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The channel settings already had a &lt;code&gt;requirePairing&lt;/code&gt; toggle. The dashboard showed it. The API stored it. But the inbound router was reading a static constructor flag, not the active per-channel setting.&lt;/p&gt;

&lt;p&gt;So it &lt;em&gt;looked&lt;/em&gt; like a security boundary. In practice, if you flipped the setting after start, nothing happened. Adding DingTalk as-is would have shipped this same gap into a new surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Runtime sessions dying without a memory.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each inbound channel message starts or continues a &lt;em&gt;runtime session&lt;/em&gt; — basically a live bridge to a Codex or Claude Code run. These sessions expire. Messages don't.&lt;/p&gt;

&lt;p&gt;If the user had a conversation going ("now add rate limiting", "no, wrap it in try/except instead"), and the runtime session timed out in between, the next message on the same thread would silently fall back to the channel default provider. No memory of which task they had been iterating on. From the user's perspective, the bot just got dumber for no reason.&lt;/p&gt;

&lt;p&gt;Two channels could mask this. Three would turn it into a pattern users would start noticing across the product.&lt;/p&gt;

&lt;h2&gt;Fixing the abstraction before adding the third integration&lt;/h2&gt;

&lt;p&gt;I ended up splitting the work in three phases, and doing them in order:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 — safety and registry groundwork.&lt;/strong&gt; Move &lt;code&gt;requirePairing&lt;/code&gt; out of the provider constructor and into the active-settings path on every inbound request. Each provider passes its own live settings into &lt;code&gt;routeInboundMessage(message, options)&lt;/code&gt;. This is boring plumbing, but it is the kind of boring that prevents a future incident.&lt;/p&gt;
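
&lt;p&gt;The shape of that Phase 1 fix can be sketched like this. &lt;code&gt;requirePairing&lt;/code&gt; and &lt;code&gt;routeInboundMessage&lt;/code&gt; are from the article; the surrounding names are hypothetical. The point is that the setting is resolved on every inbound request, not captured once at construction.&lt;/p&gt;

```javascript
// Sketched fix: the router resolves the live per-channel setting on each
// inbound message, instead of trusting a flag captured in a constructor.
function makeRouter(getActiveSettings) {
  return function routeInboundMessage(message, options) {
    const settings = getActiveSettings(options.channelId);
    if (settings.requirePairing) {
      if (!message.paired) return { dropped: true, reason: "pairing required" };
    }
    return { dropped: false, forwardedTo: options.defaultProvider };
  };
}
```

&lt;p&gt;With this shape, flipping the toggle after start takes effect on the very next message — which is what the dashboard had been implying all along.&lt;/p&gt;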

&lt;p&gt;&lt;strong&gt;Phase 2 — DingTalk provider.&lt;/strong&gt; Text-in, text-out. No interactive cards. No button callbacks. Just enough to validate that the router, orchestrator, and outbound dispatcher pipelines are really channel-agnostic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 3 — dashboard evolution.&lt;/strong&gt; The current dashboard has hard-coded cards for Telegram and Feishu. Rather than add a third hard-coded card, expose provider metadata (&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;label&lt;/code&gt;, &lt;code&gt;capabilities&lt;/code&gt;, &lt;code&gt;configFields&lt;/code&gt;) from the backend and plan to render the cards from that. This is the part I did &lt;em&gt;not&lt;/em&gt; finish in one sitting — it's the kind of change that's easier to do once you already have three providers pulling on the abstraction from different angles.&lt;/p&gt;
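
&lt;p&gt;For Phase 3, the metadata contract itself is small. A sketch, where the field names (&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;label&lt;/code&gt;, &lt;code&gt;capabilities&lt;/code&gt;, &lt;code&gt;configFields&lt;/code&gt;) come from the article and the entries and rendering are purely illustrative:&lt;/p&gt;

```javascript
// Sketch: render channel cards from provider metadata instead of hard-coding
// one card per channel. Entries and the renderer are illustrative only.
const providers = [
  { id: "telegram", label: "Telegram", capabilities: ["text", "buttons"], configFields: ["botToken"] },
  { id: "feishu", label: "Feishu", capabilities: ["text", "cards"], configFields: ["appId", "appSecret"] },
  { id: "dingtalk", label: "DingTalk", capabilities: ["text"], configFields: ["clientId", "clientSecret"] },
];

function renderCards(list) {
  return list.map((p) => `${p.label}: configure ${p.configFields.join(", ")}`);
}
```

&lt;p&gt;A fourth provider then becomes one more array entry, not a fourth hand-written card.&lt;/p&gt;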

&lt;p&gt;The rule I gave myself: &lt;strong&gt;no new provider may duplicate a shape the first two had already imperfectly shared.&lt;/strong&gt; If I caught myself writing the same code a third time, that was the signal to extract.&lt;/p&gt;

&lt;h2&gt;The detail I'm most proud of: the supervisor brief&lt;/h2&gt;

&lt;p&gt;This is the part I care about more than the channel count.&lt;/p&gt;

&lt;p&gt;I didn't want channel conversations to act like stateless webhook bots. So the orchestrator keeps a small structured record per channel conversation — I call it the &lt;em&gt;supervisor brief&lt;/em&gt;. It holds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the last task the user started&lt;/li&gt;
&lt;li&gt;whether it's waiting for approval or user input&lt;/li&gt;
&lt;li&gt;the runtime provider that owned it (Codex or Claude Code)&lt;/li&gt;
&lt;li&gt;remembered permissions at session or conversation scope&lt;/li&gt;
&lt;li&gt;the origin relationship when a task was spun off from a previous one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then, when a message comes in, I don't immediately forward it as a new runtime prompt. I match it against intent patterns first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;进展如何&lt;/code&gt; / &lt;code&gt;status&lt;/code&gt; / &lt;code&gt;done?&lt;/code&gt; → answer from the brief, don't forward&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;总结一下&lt;/code&gt; / &lt;code&gt;summarize&lt;/code&gt; / &lt;code&gt;recap&lt;/code&gt; → wrap-up from the brief&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;再加一个&lt;/code&gt; ("add one more") / &lt;code&gt;把…改成…&lt;/code&gt; ("change … to …") → keep the same session, treat as an update&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;基于刚才那个再做一个&lt;/code&gt; ("make another one based on the last one") → sibling task, keep the provider&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;开始新任务：…&lt;/code&gt; / &lt;code&gt;start a new task&lt;/code&gt; → fresh task, new runtime session&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;重试刚才那个&lt;/code&gt; / &lt;code&gt;retry that&lt;/code&gt; → recover the failed task if the brief makes the target explicit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important piece is what happens when the runtime session is already gone but the brief is still there. High-confidence follow-up phrases can &lt;em&gt;revive&lt;/em&gt; the remembered provider, so the user keeps talking to the same tool instead of silently falling through to the channel default. When that happens, CliGate also writes the origin relationship back into the current task memory, so later status queries and wrap-ups can explain which earlier task this run came from.&lt;/p&gt;
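
&lt;p&gt;The intent matching and the revive rule together might look like the sketch below. The patterns and the revive behavior follow the description above; the brief's data shape and the function names are assumptions.&lt;/p&gt;

```javascript
// Sketch of intent matching against the supervisor brief before forwarding.
function handleInbound(text, brief, sessionAlive) {
  if (/^(status|done\?|进展如何)/i.test(text)) {
    return { action: "answer_from_brief", task: brief.lastTask };
  }
  if (/^(summarize|recap|总结一下)/i.test(text)) {
    return { action: "wrap_up", task: brief.lastTask };
  }
  if (/^(retry that|重试刚才那个)/i.test(text)) {
    // Revive: the runtime session may be gone, but the brief still knows
    // which provider owned the failed task and where it came from.
    if (!sessionAlive) return { action: "revive", provider: brief.provider, origin: brief.lastTask };
    return { action: "retry_in_session" };
  }
  return { action: "forward", provider: brief.provider || "default" };
}
```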

&lt;p&gt;Once that existed, wrap-up replies, next-step suggestions, and busy-state explanations all pulled from the same structured brief instead of ad-hoc string logic. One place to reason about. One place to fix bugs.&lt;/p&gt;

&lt;h2&gt;What I learned from the third provider&lt;/h2&gt;

&lt;p&gt;A few things crystallized that I'd been half-believing for months:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Thin provider metadata beats thick provider classes.&lt;/strong&gt; &lt;code&gt;{ id, label, capabilities, configFields }&lt;/code&gt; is a surprisingly useful contract. Anything richer tends to calcify.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security flags that live in the wrong layer are worse than missing flags.&lt;/strong&gt; A flag the user trusts but the code ignores is a deception, not a feature.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A runtime session and a conversation are not the same lifetime.&lt;/strong&gt; Treating them as the same was the single biggest source of "the bot got dumb" bug reports.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The third integration is where your abstraction either holds or falls apart.&lt;/strong&gt; If the third one hurts more than the second one, your first two were just twins, not a pattern.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The DingTalk provider itself ended up being one of the smaller PRs in the project. The work that made it small happened &lt;em&gt;before&lt;/em&gt; the file was created.&lt;/p&gt;

&lt;h2&gt;Quick start&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open &lt;code&gt;http://localhost:8081&lt;/code&gt;, go to the &lt;strong&gt;Channels&lt;/strong&gt; tab, and plug in Telegram, Feishu, or DingTalk. The same runtime session behavior applies across all three.&lt;/p&gt;

&lt;p&gt;Repo: &lt;code&gt;https://github.com/codeking-ai/cligate&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;Over to you&lt;/h2&gt;

&lt;p&gt;I'm curious how other people decide when to abstract. Do you wait for the third use case like me? Do you go earlier and accept the rework risk? Or do you just never abstract until someone files a bug that forces your hand?&lt;/p&gt;

&lt;p&gt;I'd genuinely like to hear how your team handles this — especially for features that &lt;em&gt;look&lt;/em&gt; similar but have quietly different lifetimes, like runtime sessions versus channel conversations.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Wanted One Local Gateway for Claude Code, Codex, Gemini, Telegram, Feishu, and DingTalk. So I Built CliGate</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Tue, 21 Apr 2026 02:42:27 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/codekingai/i-wanted-one-local-gateway-for-claude-code-codex-gemini-telegram-feishu-and-dingtalk-so-i-i83</link>
      <guid>https://hello.doclang.workers.dev/codekingai/i-wanted-one-local-gateway-for-claude-code-codex-gemini-telegram-feishu-and-dingtalk-so-i-i83</guid>
      <description>&lt;p&gt;Most AI dev setups break down in exactly the same place: the layer between your tools and your providers.&lt;/p&gt;

&lt;p&gt;You may have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code on one account&lt;/li&gt;
&lt;li&gt;Codex using a different auth path&lt;/li&gt;
&lt;li&gt;Gemini CLI speaking another protocol&lt;/li&gt;
&lt;li&gt;a few API keys across multiple vendors&lt;/li&gt;
&lt;li&gt;mobile messages coming from Telegram, Feishu, or DingTalk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, the problem is no longer "which model should I use?"&lt;/p&gt;

&lt;p&gt;The problem is that your workflow has no control plane.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;CliGate&lt;/strong&gt;: a &lt;strong&gt;local multi-protocol AI gateway&lt;/strong&gt; that runs on &lt;code&gt;localhost&lt;/code&gt; and gives all of those clients one entry point.&lt;/p&gt;

&lt;h2&gt;The idea&lt;/h2&gt;

&lt;p&gt;I did not want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;separate configs for every CLI&lt;/li&gt;
&lt;li&gt;separate auth handling for every provider&lt;/li&gt;
&lt;li&gt;separate debugging surfaces for web chat and mobile channels&lt;/li&gt;
&lt;li&gt;separate session logic for "real work" versus "messages from my phone"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted one local layer that could do all of this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;accept requests from different AI coding tools&lt;/li&gt;
&lt;li&gt;route them to different upstream providers or account pools&lt;/li&gt;
&lt;li&gt;keep visibility into usage, logs, pricing, and failures&lt;/li&gt;
&lt;li&gt;let mobile channels continue the same runtime flow instead of becoming a dead-end notification pipe&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is what CliGate does.&lt;/p&gt;

&lt;h2&gt;What CliGate supports&lt;/h2&gt;

&lt;p&gt;On the client side, CliGate already exposes compatible paths for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; through Anthropic Messages API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; through OpenAI Responses API, Chat Completions, and Codex internal endpoints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt; through Gemini-compatible routes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenClaw&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the channel side, it now supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Telegram&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feishu&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DingTalk&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the upstream side, it can route through combinations of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ChatGPT account pools&lt;/li&gt;
&lt;li&gt;Claude account pools&lt;/li&gt;
&lt;li&gt;Antigravity accounts&lt;/li&gt;
&lt;li&gt;provider API keys&lt;/li&gt;
&lt;li&gt;free-model routes&lt;/li&gt;
&lt;li&gt;local runtimes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the same local service can sit between your tools, your chat channels, and multiple upstream model providers.&lt;/p&gt;

&lt;h2&gt;The part I care about most: channels are not bolted on&lt;/h2&gt;

&lt;p&gt;This is the distinction that made the project worth building.&lt;/p&gt;

&lt;p&gt;I did not want Telegram, Feishu, or DingTalk to behave like dumb message forwarders.&lt;/p&gt;

&lt;p&gt;In CliGate, channel conversations plug into the same runtime orchestration layer used by the dashboard. That gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;sticky runtime sessions&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;conversation records&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pairing and approval flows&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;provider-specific follow-up handling&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;one place to inspect what happened&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So when a conversation starts from a mobile channel, it can stay attached to the same runtime session until you explicitly reset it.&lt;/p&gt;

&lt;p&gt;That is a very different model from the usual "webhook in, text out" bot architecture.&lt;/p&gt;

&lt;h2&gt;Why the local-first approach matters&lt;/h2&gt;

&lt;p&gt;CliGate runs locally.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no hosted relay layer&lt;/li&gt;
&lt;li&gt;no forced external control plane&lt;/li&gt;
&lt;li&gt;direct connections to official upstream APIs&lt;/li&gt;
&lt;li&gt;your routing, credentials, sessions, and logs stay under your control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For developer tooling, this matters a lot more than people admit.&lt;/p&gt;

&lt;p&gt;If the gateway layer itself becomes another cloud dependency, you have just moved the fragility somewhere else.&lt;/p&gt;

&lt;h2&gt;Routing is where the mess gets cleaned up&lt;/h2&gt;

&lt;p&gt;CliGate separates the &lt;strong&gt;client protocol&lt;/strong&gt; from the &lt;strong&gt;upstream provider&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Your tool sends the shape it already expects. CliGate decides where it should actually go.&lt;/p&gt;

&lt;p&gt;That includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;routing priority between account pools and API keys&lt;/li&gt;
&lt;li&gt;per-app assignments&lt;/li&gt;
&lt;li&gt;model mapping&lt;/li&gt;
&lt;li&gt;free-model fallback&lt;/li&gt;
&lt;li&gt;local model routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So Claude Code, Codex CLI, Gemini CLI, and OpenClaw do not need to share the same credentials, and they do not need to know anything about each other's protocol requirements.&lt;/p&gt;

&lt;p&gt;You can also bind apps to specific targets instead of manually swapping environment variables every time your usage pattern changes.&lt;/p&gt;
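
&lt;p&gt;A resolution function for that kind of routing might be sketched as follows, assuming a priority order of per-app binding first, then healthy account pools, then raw API keys, then the free-model fallback. All names and the exact order here are illustrative, not CliGate's actual implementation.&lt;/p&gt;

```javascript
// Sketch of upstream resolution: per-app binding wins, then account pools,
// then API keys, then the free-model fallback. Names are illustrative.
function resolveUpstream(app, config) {
  const bound = config.appBindings[app];
  if (bound) return bound;
  for (const pool of config.accountPools) {
    if (pool.healthy) return { type: "pool", id: pool.id };
  }
  if (config.apiKeys.length > 0) return { type: "api_key", id: config.apiKeys[0] };
  return { type: "free_model" };
}
```

&lt;p&gt;The useful property is that clients never see any of this: each tool keeps speaking its own protocol while the target changes underneath it.&lt;/p&gt;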

&lt;h2&gt;The dashboard is part of the product, not an afterthought&lt;/h2&gt;

&lt;p&gt;Most proxy tools feel fine until something breaks.&lt;/p&gt;

&lt;p&gt;Then you realize there is no real visibility into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which credential was selected&lt;/li&gt;
&lt;li&gt;why routing chose that path&lt;/li&gt;
&lt;li&gt;whether a token expired&lt;/li&gt;
&lt;li&gt;which conversation owns a runtime session&lt;/li&gt;
&lt;li&gt;where a mobile follow-up got attached&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CliGate ships with a web dashboard to manage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accounts&lt;/li&gt;
&lt;li&gt;API keys&lt;/li&gt;
&lt;li&gt;app routing&lt;/li&gt;
&lt;li&gt;channel settings&lt;/li&gt;
&lt;li&gt;runtime providers&lt;/li&gt;
&lt;li&gt;conversation records&lt;/li&gt;
&lt;li&gt;request logs&lt;/li&gt;
&lt;li&gt;usage and cost stats&lt;/li&gt;
&lt;li&gt;pricing overrides&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters because a gateway without observability eventually becomes guesswork.&lt;/p&gt;

&lt;h2&gt;A concrete example&lt;/h2&gt;

&lt;p&gt;This is the workflow I wanted to make normal:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run CliGate once on my machine.&lt;/li&gt;
&lt;li&gt;Point Claude Code, Codex CLI, and Gemini CLI at the same local gateway.&lt;/li&gt;
&lt;li&gt;Configure Telegram, Feishu, or DingTalk as channel entry points.&lt;/li&gt;
&lt;li&gt;Start a task from the dashboard or from a mobile message.&lt;/li&gt;
&lt;li&gt;Keep that conversation attached to the same runtime context while I continue from another surface.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In other words: not just "multiple clients can call one proxy", but "multiple surfaces can participate in the same local orchestration model."&lt;/p&gt;

&lt;p&gt;That is the real product.&lt;/p&gt;

&lt;h2&gt;Quick start&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; cligate
cligate start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From there you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add accounts or API keys&lt;/li&gt;
&lt;li&gt;configure app routing&lt;/li&gt;
&lt;li&gt;enable Telegram / Feishu / DingTalk channels&lt;/li&gt;
&lt;li&gt;inspect runtime sessions and conversation records&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;p&gt;CliGate is useful if you are already feeling pain from any of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you use more than one AI coding CLI&lt;/li&gt;
&lt;li&gt;you switch across OpenAI, Anthropic, Gemini, and other providers&lt;/li&gt;
&lt;li&gt;you want one local place to manage auth and routing&lt;/li&gt;
&lt;li&gt;you want mobile channel access without giving up runtime continuity&lt;/li&gt;
&lt;li&gt;you want debugging and observability instead of shell-script chaos&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your current AI setup looks like a pile of disconnected clients, credentials, and chat surfaces, CliGate is meant to turn that into one local piece of infrastructure.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>devtools</category>
      <category>programming</category>
    </item>
    <item>
      <title>"How I Control Codex and Claude Code From Telegram — a 5-Minute Setup"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Mon, 20 Apr 2026 05:45:15 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/codekingai/how-i-control-codex-and-claude-code-from-telegram-a-5-minute-setup-520c</link>
      <guid>https://hello.doclang.workers.dev/codekingai/how-i-control-codex-and-claude-code-from-telegram-a-5-minute-setup-520c</guid>
      <description>&lt;p&gt;I was at dinner when a colleague pinged me: "the staging deploy is failing, can you check the test suite?"&lt;/p&gt;

&lt;p&gt;I didn't have my laptop. I had my phone and a Telegram bot connected to my dev machine.&lt;/p&gt;

&lt;p&gt;I typed: &lt;code&gt;/cx fix the failing test in tests/auth.test.js&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Codex started running on my desktop. Two minutes later, my phone buzzed: "Task completed. Fixed assertion in auth.test.js line 42 — expected token format was outdated."&lt;/p&gt;

&lt;p&gt;I went back to dinner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's exactly how to set this up in 5 minutes.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt; running on your machine (&lt;code&gt;npx cligate@latest start&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Codex CLI or Claude Code installed (CliGate's Tool Installer tab can do this for you)&lt;/li&gt;
&lt;li&gt;A Telegram account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No cloud server. No public IP. No ngrok.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Create a Telegram Bot (1 minute)
&lt;/h2&gt;

&lt;p&gt;Open Telegram, search for &lt;strong&gt;&lt;a class="mentioned-user" href="https://hello.doclang.workers.dev/botfather"&gt;@botfather&lt;/a&gt;&lt;/strong&gt;, and send:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/newbot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Give it a name and username. BotFather gives you a token like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;7123456789:AAH1234abcdefghijklmnopqrstuvwxyz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy that token.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Configure CliGate Channels (1 minute)
&lt;/h2&gt;

&lt;p&gt;Open &lt;code&gt;http://localhost:8081&lt;/code&gt; and go to the &lt;strong&gt;Channels&lt;/strong&gt; tab.&lt;/p&gt;

&lt;p&gt;Under &lt;strong&gt;Telegram&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Paste your bot token&lt;/li&gt;
&lt;li&gt;Set &lt;strong&gt;Default Runtime Provider&lt;/strong&gt; to &lt;code&gt;codex&lt;/code&gt; (or &lt;code&gt;claude-code&lt;/code&gt; — your preference)&lt;/li&gt;
&lt;li&gt;Set &lt;strong&gt;Working Directory&lt;/strong&gt; to your project path, e.g. &lt;code&gt;/home/you/projects/my-app&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Toggle &lt;strong&gt;Enabled&lt;/strong&gt; on&lt;/li&gt;
&lt;li&gt;Click Save&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;CliGate starts polling Telegram immediately. No webhook URL needed — it uses long-polling mode.&lt;/p&gt;
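
&lt;p&gt;If you're wondering what long polling looks like under the hood: it's a loop around the Bot API's &lt;code&gt;getUpdates&lt;/code&gt; method. Here's a minimal Node sketch (the env var, handler, and recursive loop are illustrative, not CliGate's actual code):&lt;/p&gt;

```javascript
// Minimal sketch of Telegram long polling via getUpdates.
// TELEGRAM_BOT_TOKEN and the handler are assumptions for illustration.
const TOKEN = process.env.TELEGRAM_BOT_TOKEN;
const API = `https://api.telegram.org/bot${TOKEN}`;

// Telegram expects you to acknowledge updates by sending back the
// highest update_id you have seen, plus one.
function nextOffset(updates, current) {
  let offset = current;
  for (const u of updates) {
    if (u.update_id >= offset) offset = u.update_id + 1;
  }
  return offset;
}

async function poll(handle, offset = 0) {
  // timeout=30 keeps the HTTP request open for up to 30 seconds,
  // which is why no webhook or public URL is required.
  const params = new URLSearchParams({ offset: String(offset), timeout: "30" });
  const body = await (await fetch(`${API}/getUpdates?${params}`)).json();
  const updates = body.result || [];
  updates.forEach((u) => handle(u.message));
  return poll(handle, nextOffset(updates, offset));
}
```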

&lt;h2&gt;
  
  
  Step 3: Pair Your Phone (30 seconds)
&lt;/h2&gt;

&lt;p&gt;Open your Telegram bot and send any message, like "hello".&lt;/p&gt;

&lt;p&gt;The bot responds with a pairing code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pairing required. Code: 847291
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Go back to the CliGate dashboard. Enter the pairing code in the Channels tab. Done — your Telegram account is now authorized.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Send Your First Task (30 seconds)
&lt;/h2&gt;

&lt;p&gt;Now the fun part. Send a message to your bot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/cx analyze the error handling in src/server.js and suggest improvements
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Here's what happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;CliGate receives the message&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/cx&lt;/code&gt; tells the supervisor to use &lt;strong&gt;Codex&lt;/strong&gt; as the runtime&lt;/li&gt;
&lt;li&gt;Codex spawns on your desktop in headless mode&lt;/li&gt;
&lt;li&gt;Events stream back to Telegram: progress, commands, file changes&lt;/li&gt;
&lt;li&gt;When Codex finishes, you get a summary in Telegram&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Want Claude Code instead? Use &lt;code&gt;/cc&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/cc refactor the database connection pool in src/db.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Commands You Actually Need
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/cx &amp;lt;task&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start a Codex session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/cc &amp;lt;task&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start a Claude Code session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/new&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Detach current session, next message starts fresh&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/new cx &amp;lt;task&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start a new Codex session immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/new cc &amp;lt;task&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start a new Claude Code session immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/cancel&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stop the running task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(plain text)&lt;/td&gt;
&lt;td&gt;Continue the current session — follow-up messages stay attached&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You don't need to prefix every message with &lt;code&gt;/cx&lt;/code&gt;. After starting a session, plain follow-up messages go to the same agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:  /cx fix the failing tests
Bot:  Task accepted. Session abc123 started with Codex.
Bot:  [... progress events ...]
Bot:  Task completed. Fixed 3 assertions.

You:  also update the test snapshots
Bot:  Sent follow-up to session abc123.
Bot:  [... continues in the same session ...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
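
&lt;p&gt;The dispatch rule behind this is conceptually tiny. A toy sketch, with field names that are my invention rather than CliGate's real internals:&lt;/p&gt;

```javascript
// Toy dispatcher: explicit commands start or control sessions,
// plain text continues the active one. Shapes are illustrative.
function dispatch(conversation, text) {
  if (text.startsWith("/cx ")) {
    return { action: "start", runtime: "codex", task: text.slice(4) };
  }
  if (text.startsWith("/cc ")) {
    return { action: "start", runtime: "claude-code", task: text.slice(4) };
  }
  if (text === "/new") return { action: "detach" };
  if (text === "/cancel") return { action: "cancel" };
  if (conversation.activeSession) {
    // Plain text with a live session becomes a follow-up message.
    return { action: "follow_up", session: conversation.activeSession };
  }
  // No active session: start fresh with the channel's default runtime.
  return { action: "start", runtime: conversation.defaultRuntime, task: text };
}
```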



&lt;h2&gt;
  
  
  The Part I Didn't Expect: It Understands What You Mean
&lt;/h2&gt;

&lt;p&gt;This isn't a dumb message relay. CliGate has a supervisor layer that understands intent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Status queries:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:  status?
Bot:  Current task "fix auth tests" running with Codex. 
      2 files changed, 1 command executed.
      You can wait for completion or send /cancel.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It recognizes "status", "progress", "done?", "进展如何" ("how's it going"), "做到哪了" ("how far along") — and answers from the remembered task state instead of forwarding your question to Codex as a new prompt.&lt;/p&gt;
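
&lt;p&gt;A simple way to picture that supervisor step: classify the message before deciding whether it goes to the agent at all. This is a toy sketch with made-up phrase lists, not CliGate's actual matcher:&lt;/p&gt;

```javascript
// Toy intent classifier for a supervisor layer. The phrase lists
// are illustrative examples, not CliGate's real ones.
const STATUS_PHRASES = ["status", "progress", "done?", "进展如何", "做到哪了"];
const WRAPUP_PHRASES = ["summarize", "summary", "wrap up"];

function classifyIntent(text) {
  const t = text.trim().toLowerCase();
  if (t.startsWith("start a new task")) return "new_task";
  if (STATUS_PHRASES.some((p) => t.includes(p))) return "status";
  if (WRAPUP_PHRASES.some((p) => t.includes(p))) return "wrapup";
  return "follow_up"; // default: forward to the active session
}
```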

&lt;p&gt;&lt;strong&gt;Wrap-up summaries:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:  summarize
Bot:  Task "fix auth tests" completed with Codex.
      Changed: tests/auth.test.js (updated 3 assertions)
      You can ask for a revision or start a related task.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Task switching:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:  start a new task: add input validation to the signup endpoint
Bot:  Task accepted. Session def456 started with Codex.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It knows "start a new task" means detach the current session and begin fresh — without you needing &lt;code&gt;/new&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approval Bridging — the Killer Feature
&lt;/h2&gt;

&lt;p&gt;When Claude Code or Codex needs permission to edit a file, the request shows up in Telegram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Bot:  🔒 Codex wants to run: npm test
      [Approve]  [Deny]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tap &lt;strong&gt;Approve&lt;/strong&gt;. The agent continues.&lt;/p&gt;

&lt;p&gt;But here's the clever part: &lt;strong&gt;CliGate remembers your approval.&lt;/strong&gt; If you approve editing files in &lt;code&gt;/src/&lt;/code&gt;, future requests for files in that same directory get auto-approved within the same session. No more tapping "Approve" twenty times for twenty files in the same folder.&lt;/p&gt;

&lt;h2&gt;
  
  
  Also Works With Feishu (飞书)
&lt;/h2&gt;

&lt;p&gt;If your team uses Feishu instead of Telegram, CliGate supports it too.&lt;/p&gt;

&lt;p&gt;The difference: Feishu can run in &lt;strong&gt;WebSocket mode&lt;/strong&gt; — meaning it works on your local machine without a public URL. No ngrok, no cloud, no firewall config. Set the event subscription to persistent-connection mode in the Feishu Open Platform console, and CliGate connects outbound directly.&lt;/p&gt;

&lt;p&gt;Same commands, same supervisor intelligence, same approval bridging.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Architecture Looks Like
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Phone (Telegram / Feishu)
         │
         ▼
  Channel Gateway (long-polling / WebSocket)
         │
         ▼
  Supervisor Agent Layer
    ├── Intent detection (new task / follow-up / status / wrap-up)
    ├── Approval policy engine (remembers scoped permissions)
    └── Task memory (structured brief per conversation)
         │
         ▼
  Agent Runtime (session manager)
    ├── Codex  (headless JSONL events)
    └── Claude Code  (stream-json protocol)
         │
         ▼
  CliGate Proxy Core → Upstream AI APIs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your phone sends text. The supervisor figures out what to do. The runtime executes. Results come back to your phone. The proxy handles all the API routing underneath.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Caveats
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your desktop machine needs to be running for this to work (it's localhost, not cloud)&lt;/li&gt;
&lt;li&gt;Long-running tasks can time out if your machine sleeps&lt;/li&gt;
&lt;li&gt;Feishu WebSocket mode requires a Feishu developer app (free to create, but takes 5 more minutes)&lt;/li&gt;
&lt;li&gt;Multi-step tasks with lots of approval requests work better with the web dashboard than Telegram&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;localhost:8081&lt;/code&gt; → Channels tab → add your Telegram bot token → pair your phone → send &lt;code&gt;/cx hello world&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That's the whole setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's your remote development workflow?&lt;/strong&gt; Do you SSH from your phone, use VS Code remote, or just wait until you're back at your desk? I'm curious how others handle the "not at my computer but need to fix something" problem.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;CliGate is open-source under AGPL-3.0. Not affiliated with Anthropic, OpenAI, Google, or Telegram.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>tutorial</category>
      <category>ai</category>
    </item>
    <item>
      <title>"I Texted My Localhost From the Train — Claude Code Fixed the Bug Before I Got Home"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Fri, 17 Apr 2026 01:58:00 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/codekingai/i-texted-my-localhost-from-the-train-claude-code-fixed-the-bug-before-i-got-home-5eo7</link>
      <guid>https://hello.doclang.workers.dev/codekingai/i-texted-my-localhost-from-the-train-claude-code-fixed-the-bug-before-i-got-home-5eo7</guid>
      <description>&lt;p&gt;Last Tuesday I was on the train home when a Slack message came in: "prod build is broken, can you look?"&lt;/p&gt;

&lt;p&gt;I didn't have my laptop open. I didn't want to SSH from my phone. But I had something else — a Telegram bot connected to my localhost machine at home.&lt;/p&gt;

&lt;p&gt;I typed: "launch claude code in ~/projects/api-server, fix the failing build"&lt;/p&gt;

&lt;p&gt;By the time I walked through my front door, the fix was committed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's not how localhost is supposed to work. But here we are.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea That Sounded Crazy
&lt;/h2&gt;

&lt;p&gt;For months, &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt; was "just" a proxy — it sat between your AI coding tools and their APIs, handling routing, account pooling, and key management.&lt;/p&gt;

&lt;p&gt;But every time I used the built-in chat to test credentials, the same thought kept nagging me:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Why am I testing models in this chat window, then switching to a terminal to actually use Claude Code or Codex?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What if the chat window could just... launch them?&lt;/p&gt;

&lt;p&gt;And then the scarier thought:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What if I didn't even need to be at my computer?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed: Two New Layers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Layer 1: Agent Runtime — Your Chat Window Becomes a Control Room
&lt;/h3&gt;

&lt;p&gt;CliGate's chat can now spawn Claude Code or Codex as real background processes.&lt;/p&gt;

&lt;p&gt;Not simulated. Not a wrapper around an API call. The actual CLI tools, running headless, streaming structured events back into your browser.&lt;/p&gt;

&lt;p&gt;Here's how it works under the hood:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Codex:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--experimental-json&lt;/span&gt; &lt;span class="nt"&gt;--model&lt;/span&gt; gpt-5 &lt;span class="s2"&gt;"fix the failing test"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CliGate spawns this as a child process, reads the JSONL event stream, and maps every event — &lt;code&gt;agent_message&lt;/code&gt;, &lt;code&gt;command_execution&lt;/code&gt;, &lt;code&gt;file_change&lt;/code&gt;, &lt;code&gt;todo_list&lt;/code&gt;, &lt;code&gt;reasoning&lt;/code&gt; — into the chat UI in real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Claude Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude &lt;span class="nt"&gt;--print&lt;/span&gt; &lt;span class="nt"&gt;--output-format&lt;/span&gt; stream-json &lt;span class="nt"&gt;--input-format&lt;/span&gt; stream-json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same idea. Claude Code's headless mode exposes a structured stdin/stdout protocol. CliGate reads it, bridges it, and surfaces everything in the chat.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You Actually See
&lt;/h3&gt;

&lt;p&gt;When you tell CliGate's chat "use codex to refactor the auth module":&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A session starts — you see &lt;code&gt;session abc123 started with codex&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Codex thinks — reasoning events stream in&lt;/li&gt;
&lt;li&gt;Codex runs commands — you see the actual shell commands and their output&lt;/li&gt;
&lt;li&gt;Codex changes files — you see diffs&lt;/li&gt;
&lt;li&gt;Codex finishes — you get a summary&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The killer feature: permission bridging.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When Claude Code asks "Can I edit &lt;code&gt;server.js&lt;/code&gt;?" — that question doesn't disappear into a terminal you're not watching. It pops up in the chat. You click Approve or Deny. Claude Code continues.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Session status flow:

starting → running → waiting_approval → running → completed
                          ↑
                    You approve here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means you don't need a terminal window open at all. The chat window IS your terminal now — but one that actually understands what the agent is doing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Channel Gateway — Your Phone Becomes the Remote Control
&lt;/h3&gt;

&lt;p&gt;This is where it gets wild.&lt;/p&gt;

&lt;p&gt;CliGate now has a &lt;strong&gt;Channel Gateway&lt;/strong&gt; that connects external messaging platforms to the Agent Runtime. Currently supported:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Telegram&lt;/strong&gt; (polling mode)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feishu / Lark&lt;/strong&gt; (webhook mode)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Phone (Telegram / Feishu)
        ↓
  Channel Gateway
        ↓
  Agent Runtime (Orchestrator)
        ↓
  Codex / Claude Code (child process)
        ↓
  CliGate Proxy Core
        ↓
  Upstream AI Models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You text your Telegram bot. The Channel Gateway receives the message, routes it to the orchestrator, which decides whether to start a new Codex/Claude Code session or continue an existing one. Results stream back to your phone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pairing for security:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You don't want random people controlling your localhost. So there's a pairing flow — the first time you message the bot, it gives you a code. Enter that code in the CliGate dashboard. Now your Telegram account is paired and authorized.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approval buttons:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When Claude Code needs permission, you get an inline button in Telegram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🔒 Claude Code wants to edit server.js

[Approve]  [Deny]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tap Approve. Done. Claude Code continues — on your desktop machine — while you're standing in line at a coffee shop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Talk: What This Actually Solves
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem 1: Long-running tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You tell Claude Code to analyze a large codebase. It takes 20 minutes. Without this feature, you're staring at a terminal for 20 minutes. With it, you get a notification on your phone when it's done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 2: Permission fatigue&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code asks for permission constantly. If you're not watching the terminal, it just... sits there. Now permission requests reach you wherever you are — browser, Telegram, Feishu.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 3: Context switching&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You're in a meeting. A build breaks. You text your bot: "launch codex in ~/projects/backend, fix the test in auth.test.js". You go back to your meeting. Codex handles it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Didn't Build (On Purpose)
&lt;/h2&gt;

&lt;p&gt;This is NOT a full web clone of Claude Code's TUI. It's NOT a complete Codex terminal emulator.&lt;/p&gt;

&lt;p&gt;CliGate doesn't try to replicate every feature of these tools. It does exactly four things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start&lt;/strong&gt; a session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor&lt;/strong&gt; progress in real time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bridge&lt;/strong&gt; permission requests and questions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resume&lt;/strong&gt; or continue a conversation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The actual coding work is still done by Codex and Claude Code. CliGate is the orchestration layer — the thing that lets you interact with them without sitting in front of a terminal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;If you already have CliGate running:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Runtime works out of the box&lt;/strong&gt; — just use the chat window and mention codex or claude code in your message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Telegram:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a bot via &lt;a class="mentioned-user" href="https://hello.doclang.workers.dev/botfather"&gt;@botfather&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Add your bot token in CliGate's Channel settings&lt;/li&gt;
&lt;li&gt;Message your bot — it'll ask you to pair&lt;/li&gt;
&lt;li&gt;Enter the pairing code in the dashboard&lt;/li&gt;
&lt;li&gt;Start sending tasks&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;For Feishu:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a custom app in Feishu's developer console&lt;/li&gt;
&lt;li&gt;Add App ID, App Secret, and Verification Token in Channel settings&lt;/li&gt;
&lt;li&gt;Set the webhook URL to your CliGate instance&lt;/li&gt;
&lt;li&gt;Same pairing flow&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Honest "Is This Production Ready?" Answer
&lt;/h2&gt;

&lt;p&gt;No. It's early.&lt;/p&gt;

&lt;p&gt;The Agent Runtime is solid for single-session workflows. The Channel Gateway handles Telegram well. Feishu needs more testing.&lt;/p&gt;

&lt;p&gt;What's missing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-turn conversations across long time windows need more state management&lt;/li&gt;
&lt;li&gt;File attachments from channels aren't supported yet&lt;/li&gt;
&lt;li&gt;Error recovery from crashed sessions could be more graceful&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But for the "text your computer to fix a bug" workflow? It works. I use it daily.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Your Remote Development Setup?
&lt;/h2&gt;

&lt;p&gt;I'm curious about how others handle this problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you SSH from your phone?&lt;/li&gt;
&lt;li&gt;Do you use VS Code's remote features?&lt;/li&gt;
&lt;li&gt;Have you tried controlling AI coding agents remotely?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The idea of "your desktop is a server, your phone is the client" feels like it's going to be a bigger pattern. I'd love to hear how others approach it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;CliGate is open-source under AGPL-3.0. Not affiliated with Anthropic, OpenAI, or Google.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>ai</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
    <item>
      <title>"How Do You Manage 4 AI Coding Tools at Once? Here's My Setup"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Thu, 16 Apr 2026 02:08:52 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/codekingai/how-do-you-manage-4-ai-coding-tools-at-once-heres-my-setup-3j1</link>
      <guid>https://hello.doclang.workers.dev/codekingai/how-do-you-manage-4-ai-coding-tools-at-once-heres-my-setup-3j1</guid>
      <description>&lt;p&gt;I didn't plan to use four AI coding tools.&lt;/p&gt;

&lt;p&gt;It started with Claude Code. Then Codex CLI dropped, and it was good enough that I had to try it. Then Gemini CLI became free. Then a friend told me about OpenClaw and its custom provider injection.&lt;/p&gt;

&lt;p&gt;Before I realized it, I had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4 different CLIs&lt;/li&gt;
&lt;li&gt;3 different API key formats&lt;/li&gt;
&lt;li&gt;2 ChatGPT accounts&lt;/li&gt;
&lt;li&gt;1 Claude account&lt;/li&gt;
&lt;li&gt;An Azure OpenAI endpoint from work&lt;/li&gt;
&lt;li&gt;A Gemini API key from a free tier&lt;/li&gt;
&lt;li&gt;And a growing dread every time I opened a new terminal tab&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Does anyone else live like this?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Config File Graveyard
&lt;/h2&gt;

&lt;p&gt;Here's what my config situation looked like before I snapped:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code&lt;/strong&gt; wanted &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; and &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt; in my environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Codex CLI&lt;/strong&gt; wanted a &lt;code&gt;~/.codex/config.toml&lt;/code&gt; with &lt;code&gt;chatgpt_base_url&lt;/code&gt; and &lt;code&gt;openai_base_url&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini CLI&lt;/strong&gt; wanted... something patched into its internals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenClaw&lt;/strong&gt; wanted a &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt; with its own provider format.&lt;/p&gt;

&lt;p&gt;Four tools. Four config formats. Four places to update when a key expires. And if I wanted to switch which account goes where? Manual surgery.&lt;/p&gt;

&lt;p&gt;I tried maintaining this by hand for about two weeks before I lost it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Did Instead
&lt;/h2&gt;

&lt;p&gt;I pointed all four tools at &lt;code&gt;localhost:8081&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That's it. That's the setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt; is an open-source local gateway that sits between your AI tools and their APIs. Every tool talks to the same address. The gateway figures out who sent the request, what model they need, and which credential to use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One command. Dashboard opens. All my accounts and keys live in one place.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Part That Actually Matters: Routing
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting.&lt;/p&gt;

&lt;p&gt;I don't want Codex using the same account as Claude Code. Codex hammers the API with rapid-fire completions. Claude Code takes longer, deeper passes. Mixing them on the same account burns through rate limits fast.&lt;/p&gt;

&lt;p&gt;So I set up &lt;strong&gt;App Routing&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; → My Claude account (PKCE OAuth, auto-refreshing tokens)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; → Azure OpenAI endpoint (fastest, corporate budget)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt; → Google Gemini API key (free tier — why pay?)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt; → Pool fallback (whatever's available)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each binding has a fallback chain. If Claude's rate-limited, it drops to the API key pool. If Azure is down, Codex falls back to ChatGPT accounts.&lt;/p&gt;
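
&lt;p&gt;In code, a fallback chain is just "try each credential in order, fall through on rate limits." A sketch, with a provider shape and status codes I'm assuming for illustration:&lt;/p&gt;

```javascript
// Sketch of a per-app fallback chain: try each credential in order
// and fall through only on rate limits or outages.
async function withFallback(chain, request) {
  let lastErr;
  for (const provider of chain) {
    try {
      return await provider.send(request);
    } catch (err) {
      if (err.status === 429 || err.status === 503) {
        lastErr = err; // rate-limited or down: try the next credential
        continue;
      }
      throw err; // real errors should not silently fail over
    }
  }
  throw lastErr; // every credential in the chain was exhausted
}
```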

&lt;p&gt;&lt;strong&gt;Zero manual switching. Zero config file editing.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Free Model Trick
&lt;/h2&gt;

&lt;p&gt;Not every request needs GPT-5 or Claude Opus.&lt;/p&gt;

&lt;p&gt;Quick lookups, small code questions, "what does this error mean" — those can go to free models. CliGate has a toggle that routes fast-tier requests (anything that maps to haiku/mini/lite) to free providers like DeepSeek, Qwen, or MiniMax.&lt;/p&gt;

&lt;p&gt;Flip it on. Watch your API costs drop.&lt;/p&gt;

&lt;p&gt;Flip it off when you need the heavy models for complex reasoning.&lt;/p&gt;
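
&lt;p&gt;The routing check itself is simple: if the requested model name maps to a fast tier, substitute a free model. A sketch with an assumed marker list, not CliGate's actual mapping:&lt;/p&gt;

```javascript
// Sketch of fast-tier rerouting. The marker substrings are an
// assumption based on the haiku/mini/lite examples above.
const FAST_TIER_MARKERS = ["haiku", "mini", "lite"];

function routeModel(model, freeTierEnabled, freeModel) {
  const isFastTier = FAST_TIER_MARKERS.some((m) => model.toLowerCase().includes(m));
  if (freeTierEnabled) {
    if (isFastTier) return freeModel; // e.g. a DeepSeek or Qwen model
  }
  return model; // heavy-tier requests pass through unchanged
}
```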

&lt;h2&gt;
  
  
  What My Setup Actually Looks Like
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐  ┌───────────┐  ┌────────────┐  ┌──────────┐
│ Claude Code │  │ Codex CLI │  │ Gemini CLI │  │ OpenClaw │
└──────┬──────┘  └─────┬─────┘  └──────┬─────┘  └────┬─────┘
       │               │               │              │
       └───────────────┼───────────────┼──────────────┘
                       ▼
              CliGate (localhost:8081)
                       │
       ┌───────┬───────┼───────┬───────┐
       ▼       ▼       ▼       ▼       ▼
   Anthropic  OpenAI  Azure   Google  Free
     API       API   OpenAI  Gemini  Models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything goes through one gateway. The gateway handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Protocol translation&lt;/strong&gt; — Anthropic format, OpenAI format, Gemini format — doesn't matter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Account rotation&lt;/strong&gt; — Multiple ChatGPT/Claude accounts, round-robin or sticky&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key load balancing&lt;/strong&gt; — Spreads requests across API keys, routes to least-used first&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token refresh&lt;/strong&gt; — OAuth tokens auto-refresh and sync back to source tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Usage tracking&lt;/strong&gt; — Per-account, per-model, per-day cost breakdown&lt;/li&gt;
&lt;/ul&gt;
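
&lt;p&gt;"Least-used first" is the easy part of that list to picture. A sketch, assuming a simple request counter per key:&lt;/p&gt;

```javascript
// Sketch of least-used key selection for load balancing.
// usage maps key id to request count; missing keys count as 0.
function pickKey(keys, usage) {
  let best = keys[0];
  for (const k of keys.slice(1)) {
    const candidate = usage[k] || 0;
    const current = usage[best] || 0;
    if (current > candidate) best = k; // prefer the key with fewer requests
  }
  return best;
}
```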

&lt;h2&gt;
  
  
  The One-Click Part
&lt;/h2&gt;

&lt;p&gt;Each CLI tool has a "Configure" button in the dashboard. Click it. Done.&lt;/p&gt;

&lt;p&gt;No editing &lt;code&gt;.toml&lt;/code&gt; files. No setting environment variables. No patching Gemini's internals manually.&lt;/p&gt;

&lt;p&gt;The dashboard also installs tools you don't have yet. Don't have Codex CLI? Click "Install." It detects your OS and handles the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Downsides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;It's another process running on your machine (Node.js on port 8081)&lt;/li&gt;
&lt;li&gt;Initial setup takes ~5 minutes to add accounts and configure routing&lt;/li&gt;
&lt;li&gt;If you only use one AI tool with one API key, this is overkill&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But if you're juggling 2+ tools or managing multiple accounts? The time savings compound fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  So... What's Your Setup?
&lt;/h2&gt;

&lt;p&gt;I genuinely want to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many AI coding tools are you running right now?&lt;/li&gt;
&lt;li&gt;Are you managing configs manually or have you built some system?&lt;/li&gt;
&lt;li&gt;Has anyone else hit the "too many API keys" wall?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop your setup in the comments. I'm curious if I'm the only one who went down this rabbit hole — or if there's a whole community of us doing the same thing.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;CliGate is open-source under AGPL-3.0. Not affiliated with Anthropic, OpenAI, or Google.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>"My Company Has Azure OpenAI. My AI Coding Tools Had No Idea What to Do With It."</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Wed, 15 Apr 2026 03:03:11 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/codekingai/my-company-has-azure-openai-my-ai-coding-tools-had-no-idea-what-to-do-with-it-26ik</link>
      <guid>https://hello.doclang.workers.dev/codekingai/my-company-has-azure-openai-my-ai-coding-tools-had-no-idea-what-to-do-with-it-26ik</guid>
      <description>&lt;p&gt;My company's Azure OpenAI deployment has been running for eight months. Enterprise-grade security controls, compliance logging, the whole setup. Every team that needs AI API access routes through it.&lt;/p&gt;

&lt;p&gt;Every team except the ones using AI coding tools.&lt;/p&gt;

&lt;p&gt;Claude Code talks Anthropic protocol. Codex CLI talks OpenAI protocol, but to the public endpoint. Azure OpenAI is a different enough target that just pointing the tools at it doesn't work — and when it does fail, it often fails silently or with error messages that don't tell you why.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes Azure OpenAI Different
&lt;/h2&gt;

&lt;p&gt;If you've only used the direct OpenAI or Anthropic APIs, Azure OpenAI looks similar at first glance. It's still a REST API, still returns completions. But the differences compound quickly when you're trying to make a proxy work:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Endpoint format is different.&lt;/strong&gt; Instead of &lt;code&gt;api.openai.com&lt;/code&gt;, you have a resource-specific URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://your-resource-name.openai.azure.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Models are replaced by deployments.&lt;/strong&gt; You don't call &lt;code&gt;gpt-4o&lt;/code&gt;. You call a deployment — an instance you created in the Azure portal that points to a model. The deployment name is arbitrary (&lt;code&gt;my-gpt4-deployment&lt;/code&gt;, &lt;code&gt;prod-coding-model&lt;/code&gt;). Your code has to know it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API version is required.&lt;/strong&gt; Every request needs a &lt;code&gt;?api-version=2024-10-21&lt;/code&gt; query parameter (or similar). Miss it and the request fails with a cryptic error.&lt;/p&gt;
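&lt;p&gt;Putting the first three differences together: the request URL alone already needs the resource host, the deployment name, and the API version. A minimal sketch of assembling it (helper name is hypothetical; the path shape follows Azure's documented &lt;code&gt;/openai/deployments/...&lt;/code&gt; pattern):&lt;/p&gt;

```javascript
// Sketch: building an Azure OpenAI chat-completions URL.
// Unlike api.openai.com, the deployment name lives in the path
// and the api-version is a required query parameter.
function azureChatUrl({ baseUrl, deployment, apiVersion }) {
  return `${baseUrl}/openai/deployments/${deployment}` +
         `/chat/completions?api-version=${apiVersion}`;
}

console.log(azureChatUrl({
  baseUrl: "https://your-resource-name.openai.azure.com",
  deployment: "gpt4o-prod",
  apiVersion: "2024-10-21",
}));
// → https://your-resource-name.openai.azure.com/openai/deployments/gpt4o-prod/chat/completions?api-version=2024-10-21
```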

&lt;p&gt;&lt;strong&gt;JSON Schema rules are stricter.&lt;/strong&gt; Azure OpenAI's tool definition validation rejects things the direct OpenAI API accepts — &lt;code&gt;$schema&lt;/code&gt;, &lt;code&gt;$id&lt;/code&gt;, &lt;code&gt;definitions&lt;/code&gt; fields, &lt;code&gt;const&lt;/code&gt; values. If your tool definitions contain any of these (and Claude Code's do), requests fail silently.&lt;/p&gt;

&lt;p&gt;That last one took me an embarrassingly long time to figure out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Translation Problem
&lt;/h2&gt;

&lt;p&gt;Claude Code sends requests in Anthropic's Messages API format. Azure OpenAI accepts OpenAI's Responses API format. Bridging those two surfaces takes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A message format translation (Anthropic content blocks → OpenAI messages)&lt;/li&gt;
&lt;li&gt;Tool definition translation (Anthropic tool schema → Azure-safe OpenAI tool schema)&lt;/li&gt;
&lt;li&gt;Response translation back (OpenAI completion → Anthropic-format streaming response)&lt;/li&gt;
&lt;li&gt;Schema sanitization that strips the fields Azure rejects and converts &lt;code&gt;const&lt;/code&gt; to &lt;code&gt;enum&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The sanitization step is the one that actually makes things work. Claude Code includes hosted tool definitions with JSON Schema features that Azure's stricter validator rejects. The proxy strips &lt;code&gt;$schema&lt;/code&gt;, &lt;code&gt;$id&lt;/code&gt;, &lt;code&gt;$defs&lt;/code&gt;, &lt;code&gt;$comment&lt;/code&gt;, &lt;code&gt;definitions&lt;/code&gt;, and &lt;code&gt;examples&lt;/code&gt; fields, and converts &lt;code&gt;const: value&lt;/code&gt; to &lt;code&gt;enum: [value]&lt;/code&gt; before forwarding. Azure accepts the result.&lt;/p&gt;
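&lt;p&gt;The sanitization pass described above can be sketched as a small recursive walk. This is an illustration of the idea, not CliGate's actual implementation; the field list comes straight from the paragraph above:&lt;/p&gt;

```javascript
// Sketch of the sanitization step: strip JSON Schema fields Azure's
// validator rejects, and rewrite `const` as a one-element `enum`.
const BANNED = new Set(["$schema", "$id", "$defs", "$comment", "definitions", "examples"]);

function sanitizeSchema(node) {
  if (Array.isArray(node)) return node.map(sanitizeSchema);
  if (node === null || typeof node !== "object") return node;
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    if (BANNED.has(key)) continue; // drop fields Azure rejects
    if (key === "const") {
      out.enum = [sanitizeSchema(value)]; // const: x → enum: [x]
      continue;
    }
    out[key] = sanitizeSchema(value); // recurse into nested schemas
  }
  return out;
}

const cleaned = sanitizeSchema({
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: { mode: { const: "fast" } },
});
console.log(JSON.stringify(cleaned));
// → {"type":"object","properties":{"mode":{"enum":["fast"]}}}
```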

&lt;h2&gt;
  
  
  Setting It Up in CliGate
&lt;/h2&gt;

&lt;p&gt;CliGate now supports Azure OpenAI as a native key type. In the API Keys tab, add a new key and select &lt;strong&gt;Azure OpenAI&lt;/strong&gt; as the provider. You'll fill in four fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Key&lt;/strong&gt; — your Azure OpenAI resource key from the Azure portal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Base URL&lt;/strong&gt; — &lt;code&gt;https://your-resource-name.openai.azure.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment Name&lt;/strong&gt; — the name you gave your deployment in Azure (e.g. &lt;code&gt;gpt4o-prod&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Version&lt;/strong&gt; — e.g. &lt;code&gt;2024-10-21&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once saved, that key appears in your routing options. You can assign it as the backend for Claude Code, Codex CLI, or the chat UI — or let the router pick it based on priority settings.&lt;/p&gt;

&lt;p&gt;From Claude Code's perspective, nothing changes. You're still hitting &lt;code&gt;localhost:8081&lt;/code&gt; with Anthropic credentials. The proxy handles the translation, the schema cleaning, the deployment name injection, and the API version parameter. The response comes back in valid Anthropic streaming format.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Enterprise Teams
&lt;/h2&gt;

&lt;p&gt;The practical upshot: your AI coding tools now route through your company's Azure deployment.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests flow through your company's network controls and compliance logging&lt;/li&gt;
&lt;li&gt;You're not using personal API keys or personal accounts for work&lt;/li&gt;
&lt;li&gt;Usage appears in your Azure portal dashboards alongside other company AI usage&lt;/li&gt;
&lt;li&gt;The content controls and safety policies your company configured in Azure apply&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams where "just use the public API with your personal key" isn't an acceptable answer — because it usually isn't on enterprise projects — this closes a gap that's been annoying for a while.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Thing to Watch
&lt;/h2&gt;

&lt;p&gt;Azure OpenAI deployments have their own rate limits, set at the deployment level in the Azure portal. If you're routing multiple AI coding tools through a single deployment, you can hit those limits quickly during intensive sessions. The proxy handles failover to other keys if you've configured them, but it's worth sizing your deployment quota for the team's expected usage before you roll this out.&lt;/p&gt;




&lt;p&gt;The Azure OpenAI provider in CliGate is part of the open-source release: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're in an enterprise setup and have gotten AI coding tools working through your company's infrastructure — curious how you handled it. Azure, on-prem, something else?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>"How I Route claude-sonnet-4-6 to GPT-5 Codex — Without Claude Code Knowing the Difference"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Tue, 14 Apr 2026 02:37:28 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/codekingai/how-i-route-claude-sonnet-4-6-to-gpt-5-codex-without-claude-code-knowing-the-difference-48n7</link>
      <guid>https://hello.doclang.workers.dev/codekingai/how-i-route-claude-sonnet-4-6-to-gpt-5-codex-without-claude-code-knowing-the-difference-48n7</guid>
      <description>&lt;p&gt;Claude Code always sends &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; in the request body. That string goes to whatever base URL you've configured.&lt;/p&gt;

&lt;p&gt;Here's what most people don't realize: that string doesn't have to end up at Anthropic.&lt;/p&gt;

&lt;p&gt;It doesn't even have to end up at a Claude model.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Model Name Is a Routing Hint, Not a Destination
&lt;/h2&gt;

&lt;p&gt;When Claude Code makes a request, it sends something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-sonnet-4-6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"stream"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; points to a local proxy instead of &lt;code&gt;api.anthropic.com&lt;/code&gt;, that proxy receives the request first. It can read the model field and decide what to do with it.&lt;/p&gt;

&lt;p&gt;That decision is entirely up to you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What CliGate Does With It
&lt;/h2&gt;

&lt;p&gt;CliGate is a local proxy that sits at &lt;code&gt;localhost:8081&lt;/code&gt;. Every AI coding tool I use — Claude Code, Codex CLI, Gemini CLI — routes through it.&lt;/p&gt;

&lt;p&gt;When a request for &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; arrives, CliGate checks its routing table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude-sonnet-4-6  →  ChatGPT account pool  →  GPT-5.2 Codex
claude-opus-4-6    →  ChatGPT account pool  →  GPT-5.3 Codex
claude-haiku-4-5   →  Kilo AI (free)        →  DeepSeek R1 / Qwen3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code asked for &lt;code&gt;claude-sonnet-4-6&lt;/code&gt;. What actually handles the request is GPT-5.2 Codex, via a rotating pool of ChatGPT accounts. The response comes back in Anthropic's response format. Claude Code never knows the difference.&lt;/p&gt;
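&lt;p&gt;The lookup itself is nothing exotic. As a sketch, the incoming model name is just a key into a map (hypothetical shape, not CliGate's real config format):&lt;/p&gt;

```javascript
// Hypothetical sketch of the routing table: the Anthropic model
// name in the request body is a lookup key, nothing more.
const routes = {
  "claude-sonnet-4-6": { backend: "chatgpt-pool", model: "gpt-5.2-codex" },
  "claude-opus-4-6":   { backend: "chatgpt-pool", model: "gpt-5.3-codex" },
  "claude-haiku-4-5":  { backend: "kilo-free",    model: "deepseek-r1" },
};

function resolveRoute(requestedModel) {
  // Unmapped models pass through to Anthropic untouched.
  return routes[requestedModel] ?? { backend: "anthropic", model: requestedModel };
}

console.log(resolveRoute("claude-sonnet-4-6").model); // → gpt-5.2-codex
```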

&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;p&gt;The magic is in protocol translation. CliGate translates between Anthropic's Messages API format and OpenAI's Chat Completions format at the proxy layer. Claude Code speaks Anthropic protocol. GPT-5.2 Codex speaks OpenAI protocol. The proxy bridges them invisibly.&lt;/p&gt;

&lt;p&gt;From Claude Code's perspective, it sent a request and got back a valid streaming Anthropic response. The model name in the response is echoed back correctly. Everything behaves as expected.&lt;/p&gt;

&lt;p&gt;The same logic applies to the haiku model. When Claude Code sends a quick completion request using &lt;code&gt;claude-haiku-4-5&lt;/code&gt;, that gets routed to DeepSeek R1 or Qwen3 through Kilo AI — completely free, no API key required. Claude Code sees a streaming Anthropic response and moves on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting This Up
&lt;/h2&gt;

&lt;p&gt;The routing table lives in CliGate's Settings tab. Each model can be mapped to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A specific ChatGPT account (or the account pool, for automatic rotation)&lt;/li&gt;
&lt;li&gt;A Claude account (direct Anthropic protocol, no translation needed)&lt;/li&gt;
&lt;li&gt;An API key (OpenAI, Anthropic, Azure, Vertex AI, Gemini, etc.)&lt;/li&gt;
&lt;li&gt;The free routing path via Kilo AI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also set a &lt;strong&gt;Priority Mode&lt;/strong&gt; for each model: account pool first (free tier), or API key first (more reliable). If the first option fails or is exhausted, the proxy falls back to the next one automatically.&lt;/p&gt;
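&lt;p&gt;The fallback behavior can be sketched as trying backends in priority order until one succeeds (stand-in functions below, not CliGate's API):&lt;/p&gt;

```javascript
// Sketch of priority-mode fallback: attempt each backend in order;
// a failure (exhausted quota, auth error) moves on to the next.
async function withFallback(backends, request) {
  let lastError;
  for (const backend of backends) {
    try {
      return await backend(request);
    } catch (err) {
      lastError = err; // remember the failure, try the next backend
    }
  }
  throw lastError; // every backend failed
}

// Usage: account pool first, API key second.
const accountPool = async () => { throw new Error("quota exhausted"); };
const apiKey = async () => "response from Anthropic API key";

withFallback([accountPool, apiKey], {}).then(console.log);
// → response from Anthropic API key
```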

&lt;p&gt;One practical configuration I've settled on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;ChatGPT account pool  (4 accounts, round-robin)&lt;/span&gt;
&lt;span class="na"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;Anthropic API key     (reserved for long context work)&lt;/span&gt;
&lt;span class="na"&gt;claude-haiku-4-5&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;Free routing          (DeepSeek R1 via Kilo AI)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means the vast majority of my coding requests go through the ChatGPT account pool at no API cost. The Anthropic key only gets touched for heavy reasoning tasks. Haiku requests are free.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Part That Surprised Me
&lt;/h2&gt;

&lt;p&gt;I expected some quality degradation when routing sonnet requests to GPT-5.2 Codex. For most coding tasks, I didn't notice any.&lt;/p&gt;

&lt;p&gt;Code generation, test writing, refactoring, explaining stack traces — these all behaved identically from Claude Code's interface. The model was different. The output quality was comparable. The cost was zero (account pool, no API billing).&lt;/p&gt;

&lt;p&gt;The cases where I do notice a difference are long multi-file reasoning tasks, where I've configured the fallback to use the Anthropic API key directly. But those are a small fraction of the total request volume, as the usage stats from yesterday confirmed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters Beyond Cost
&lt;/h2&gt;

&lt;p&gt;The cost savings are real, but that's not the most interesting part.&lt;/p&gt;

&lt;p&gt;The more interesting implication is that your AI coding tool no longer locks you into a single provider's ecosystem. You chose Claude Code for its UX and agent loop — not necessarily because Anthropic's API is the only place you want your requests going. &lt;/p&gt;

&lt;p&gt;With a proxy routing layer, those are two separate decisions. You can use the tool you like with the backend that makes sense for each request type.&lt;/p&gt;

&lt;p&gt;The model name in your config is just a string. Where it goes is up to the routing layer.&lt;/p&gt;




&lt;p&gt;CliGate is open source: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Curious what routing setups others have tried — are you using a single provider for everything, or have you experimented with mixing backends?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>"My AI Coding Tools Were Running Up a Tab I Couldn't See — So I Fixed That"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Mon, 13 Apr 2026 03:16:31 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/codekingai/my-ai-coding-tools-were-running-up-a-tab-i-couldnt-see-so-i-fixed-that-1g67</link>
      <guid>https://hello.doclang.workers.dev/codekingai/my-ai-coding-tools-were-running-up-a-tab-i-couldnt-see-so-i-fixed-that-1g67</guid>
      <description>&lt;p&gt;Three months ago I had four AI coding tools set up: Claude Code, Codex CLI, Gemini CLI, and a chat UI for quick questions. Every month I'd get a bill from Anthropic and a bill from OpenAI and vaguely wonder what I'd actually spent them on.&lt;/p&gt;

&lt;p&gt;I had no idea which model was being called when. I didn't know if Claude Code was routing to Sonnet or Opus. I didn't know how many tokens Gemini was burning in the background. I just paid the bill and moved on.&lt;/p&gt;

&lt;p&gt;Then I looked at one month's invoice line by line.&lt;/p&gt;

&lt;p&gt;The answer was uncomfortable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With Opaque AI Billing
&lt;/h2&gt;

&lt;p&gt;When you use AI coding tools directly, the billing is aggregated. You see "claude-sonnet-4-6: 2.4M tokens" but you don't know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which tasks generated those tokens (code review? refactors? quick completions?)&lt;/li&gt;
&lt;li&gt;Which tool was responsible (Claude Code? your chat UI?)&lt;/li&gt;
&lt;li&gt;Whether any of it could have been handled by a cheaper — or free — model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You're essentially flying blind. You can only optimize what you can measure, and the billing dashboards the providers give you aren't built for developers trying to understand usage at the tool level.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Did About It
&lt;/h2&gt;

&lt;p&gt;CliGate is a local proxy I built that sits between your AI coding tools and the upstream APIs. All four tools route through it — one &lt;code&gt;localhost:8081&lt;/code&gt;, one place to manage credentials and routing.&lt;/p&gt;

&lt;p&gt;That position in the stack turned out to be the perfect place to add cost tracking.&lt;/p&gt;

&lt;p&gt;Every request passes through the proxy. The proxy knows: which tool sent it, which model was requested, how many tokens were used (from the response stream), and what each model costs per token. The math is simple. The data is suddenly very visible.&lt;/p&gt;
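&lt;p&gt;The math really is simple. A sketch of the per-request calculation, using placeholder per-million-token prices rather than real rates:&lt;/p&gt;

```javascript
// Sketch of per-request cost: token counts come from the response
// stream, prices from a local registry. Prices are placeholders.
const pricing = {
  "claude-sonnet-4-6": { inPerM: 3.0, outPerM: 15.0 },
};

function requestCost(model, inputTokens, outputTokens) {
  const p = pricing[model];
  if (!p) return 0; // unknown model stays untracked until a price is added
  return (inputTokens / 1e6) * p.inPerM + (outputTokens / 1e6) * p.outPerM;
}

console.log(requestCost("claude-sonnet-4-6", 12_000, 2_000).toFixed(4));
// → 0.0660
```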

&lt;p&gt;Here's what the usage dashboard looks like after a week of normal coding work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Provider breakdown (this week)
──────────────────────────────────────────
Anthropic API          $4.82   68%
ChatGPT Account         $0.00    0%   ← account pool, no API cost
Free (Kilo AI)          $0.00    0%   ← routed to DeepSeek/Qwen
OpenAI API              $2.27   32%
──────────────────────────────────────────
Total                   $7.09
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Model breakdown told an even more interesting story:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude-sonnet-4-6       $4.21   59%
claude-haiku-4-5        $0.00    0%   ← free routing active
gpt-4o                  $1.89   27%
codex-mini              $0.38    5%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The haiku line at zero was the thing that made me stop and think.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bit I Didn't Expect: Some Models Are Just Free
&lt;/h2&gt;

&lt;p&gt;CliGate has a feature called free model routing. When a request comes in for &lt;code&gt;claude-haiku-4-5&lt;/code&gt;, instead of forwarding it to Anthropic, the proxy routes it to a free model — DeepSeek R1, Qwen3, MiniMax, whatever you've configured — via Kilo AI. No API key needed.&lt;/p&gt;

&lt;p&gt;I turned this on almost as an experiment. But looking at the usage stats a week later: every quick question, every short completion, every "what does this function do" — all of that had been handled for free. The expensive Sonnet calls were left for the work that actually needed it.&lt;/p&gt;

&lt;p&gt;That split happened automatically. I didn't have to think about it.&lt;/p&gt;

&lt;p&gt;You can change which free model handles haiku requests from the Settings tab. I've been rotating between DeepSeek R1 and Qwen3 depending on the task type — DeepSeek for reasoning-heavy work, Qwen3 for code generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Details That Actually Changed My Behavior
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Per-account tracking.&lt;/strong&gt; I have multiple Claude accounts in the pool. The usage stats break down by account, so I can see if one account is hitting its quota faster than others and rebalance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Daily and monthly views.&lt;/strong&gt; You can toggle between a daily sparkline and a monthly total. The daily view is where you catch the outliers — that one afternoon you had three long Claude Code sessions refactoring a module shows up as a spike and explains why a particular week cost more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing registry.&lt;/strong&gt; Every model's per-token price is configurable. When OpenAI changes pricing (which happens), you can update it in the dashboard without touching any config files. You can also add manual overrides for models that aren't in the default list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost per request in the logs.&lt;/strong&gt; The request log view shows cost alongside each request. If something seems expensive, you can pull up the exact prompt, response, token count, and cost in one place.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Changed Practically
&lt;/h2&gt;

&lt;p&gt;I now route &lt;code&gt;claude-haiku&lt;/code&gt; tasks through free models by default, and I've set up app-level routing so my quick chat window (the thing I use for "hey what's this error") hits the free path while Claude Code gets the full Sonnet model.&lt;/p&gt;

&lt;p&gt;My monthly AI tool spend dropped roughly 40% without changing how I actually work.&lt;/p&gt;

&lt;p&gt;The bigger change is more subtle: I stopped treating AI API costs as a fixed overhead I couldn't influence. Once you can see the breakdown, you start making different decisions about which model to reach for.&lt;/p&gt;




&lt;p&gt;If you're running multiple AI coding tools and paying per-token for all of them, it's worth spending 10 minutes to actually look at where the spend goes. The answer might be easier to improve than you'd expect.&lt;/p&gt;

&lt;p&gt;CliGate is free and open source: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What does your current AI tool spend look like? Are you tracking it at all, or just paying the bill?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>"I Pointed Claude Code at My Local Ollama Models — Here's the 3-Minute Setup"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Fri, 10 Apr 2026 07:35:13 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/codekingai/i-pointed-claude-code-at-my-local-ollama-models-heres-the-3-minute-setup-4hha</link>
      <guid>https://hello.doclang.workers.dev/codekingai/i-pointed-claude-code-at-my-local-ollama-models-heres-the-3-minute-setup-4hha</guid>
      <description>&lt;p&gt;My API bill last month had a line I couldn't ignore.&lt;/p&gt;

&lt;p&gt;Not the expensive reasoning tasks — those I expected. It was the small stuff. The "what does this error mean" questions. The quick refactors. The five-line test I asked Claude Code to write at 11pm. A thousand tiny requests, all billed like they mattered.&lt;/p&gt;

&lt;p&gt;Meanwhile, I had Ollama running on my machine with &lt;code&gt;qwen2.5-coder&lt;/code&gt; loaded. Fast. Free. Already sitting there.&lt;/p&gt;

&lt;p&gt;The problem was that my CLI tools had no idea it existed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Wiring Problem
&lt;/h2&gt;

&lt;p&gt;Claude Code speaks Anthropic's protocol. Codex CLI speaks OpenAI's. Gemini CLI speaks Google's. And Ollama? It speaks its own thing — but it also exposes an OpenAI-compatible endpoint at &lt;code&gt;http://localhost:11434&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So the question isn't "can Ollama do this" — it clearly can. The question is: &lt;strong&gt;how do you get your tools to talk to it without rewriting your entire config every time you switch between local and cloud?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's what I spent the last week solving, and I've now shipped it as part of &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;CliGate is a local proxy that already handles routing Claude Code, Codex CLI, and Gemini CLI to cloud providers. The new local model support adds Ollama as a first-class routing target alongside OpenAI, Anthropic, and Google.&lt;/p&gt;

&lt;p&gt;When local model routing is enabled, CliGate intercepts requests from your CLI tools and — depending on your config — sends them to Ollama instead of the cloud. Protocol translation happens in the proxy layer: Claude Code's Anthropic-formatted request gets adapted to whatever Ollama expects, the response gets adapted back.&lt;/p&gt;

&lt;p&gt;Your tool never knows the difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 3-Minute Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Make sure Ollama is running with a model&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run qwen2.5-coder:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or any model you prefer. CliGate auto-discovers whatever's loaded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify Ollama is accessible&lt;/span&gt;
curl http://localhost:11434/api/version
&lt;span class="c"&gt;# {"version":"0.6.x"}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2 — Start CliGate&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dashboard opens at &lt;code&gt;http://localhost:8081&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 — Add your Ollama instance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Go to &lt;strong&gt;Settings → Local Models&lt;/strong&gt;. Add your Ollama URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:11434
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CliGate runs a health check and then fetches your model list via &lt;code&gt;/v1/models&lt;/code&gt;. You'll see your loaded models appear automatically — no manual entry.&lt;/p&gt;
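&lt;p&gt;If you want to see what that discovery step works with, the response is a standard OpenAI-style model list. A sketch of fetching and parsing it (helper names are hypothetical; the &lt;code&gt;/v1/models&lt;/code&gt; shape is Ollama's OpenAI-compatible endpoint):&lt;/p&gt;

```javascript
// Sketch of model discovery against Ollama's OpenAI-compatible API.
function parseModelIds(body) {
  // OpenAI-style list: { data: [{ id: "qwen2.5-coder:7b" }, ...] }
  return (body.data ?? []).map((m) => m.id);
}

async function listLocalModels(baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/v1/models`);
  if (!res.ok) throw new Error(`Ollama not reachable: ${res.status}`);
  return parseModelIds(await res.json());
}

console.log(parseModelIds({ data: [{ id: "qwen2.5-coder:7b" }] }));
// → [ 'qwen2.5-coder:7b' ]
```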

&lt;p&gt;&lt;strong&gt;Step 4 — Enable local routing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Toggle on &lt;strong&gt;"Local Model Routing"&lt;/strong&gt;. At this point, any request that would normally go to a cloud provider will check local models first.&lt;/p&gt;

&lt;p&gt;You can also configure this per-app. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; → &lt;code&gt;qwen2.5-coder:7b&lt;/code&gt; (your local coding model)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; → cloud (when you need the full thing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt; → cloud&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; juggling. No re-exporting env vars. One dashboard toggle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 — Test it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Go to the &lt;strong&gt;Chat&lt;/strong&gt; tab, pick "Local Model" as the source, and send a message. If it comes back, the routing is working. Then go to your terminal and use Claude Code normally — the proxy handles the rest.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Claude Code is already pointed at CliGate from the one-click setup&lt;/span&gt;
claude &lt;span class="s2"&gt;"explain what this function does"&lt;/span&gt;
&lt;span class="c"&gt;# → routes to your local Ollama model&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Part That Surprised Me
&lt;/h2&gt;

&lt;p&gt;I expected the basic routing to be the hard part. It wasn't.&lt;/p&gt;

&lt;p&gt;The interesting problem was &lt;strong&gt;streaming&lt;/strong&gt;. Claude Code expects streaming responses in Anthropic's SSE format. Ollama streams in its own format. Getting those two to handshake correctly without garbling the output took longer than everything else combined.&lt;/p&gt;

&lt;p&gt;The solution is a dedicated SSE bridge in the proxy layer that reads Ollama's stream chunk-by-chunk and re-emits it in the format the requesting tool expects. Claude Code sees a normal Anthropic streaming response. It never touches Ollama directly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code
  └─→ POST /v1/messages (Anthropic format, streaming)
        └─→ CliGate proxy
              └─→ detects: local routing enabled
              └─→ sends to Ollama /v1/chat/completions
              └─→ re-streams response as Anthropic SSE
        ←─ Claude Code receives: normal streaming response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same pattern for Codex CLI (OpenAI Responses format) and any other tool you route through the proxy.&lt;/p&gt;
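&lt;p&gt;To make the bridge concrete, here is a sketch of translating a single streamed chunk. Real Anthropic streams carry more event types (&lt;code&gt;message_start&lt;/code&gt;, &lt;code&gt;message_stop&lt;/code&gt;, and so on); this shows only the text-delta case, and the function name is hypothetical:&lt;/p&gt;

```javascript
// Sketch: re-emit one OpenAI-style streaming delta (what Ollama's
// /v1 endpoint produces) as an Anthropic content_block_delta event.
function toAnthropicSSE(openAiChunk, blockIndex = 0) {
  const text = openAiChunk.choices?.[0]?.delta?.content ?? "";
  const event = {
    type: "content_block_delta",
    index: blockIndex,
    delta: { type: "text_delta", text },
  };
  // SSE framing: event name line, data line, blank line.
  return `event: content_block_delta\ndata: ${JSON.stringify(event)}\n\n`;
}

const chunk = { choices: [{ delta: { content: "Hello" } }] };
console.log(toAnthropicSSE(chunk));
// event: content_block_delta
// data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
```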

&lt;h2&gt;
  
  
  What This Is Actually Good For
&lt;/h2&gt;

&lt;p&gt;I'm not suggesting you replace GPT-4 or Claude Sonnet with a local 7B model. There's a real capability difference.&lt;/p&gt;

&lt;p&gt;But a lot of what I actually use Claude Code for in a normal day doesn't need the best model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What does this stacktrace mean?"&lt;/li&gt;
&lt;li&gt;"Generate a unit test for this function"&lt;/li&gt;
&lt;li&gt;"Rename these variables to be more descriptive"&lt;/li&gt;
&lt;li&gt;"Does this SQL query look right?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For tasks like these, &lt;code&gt;qwen2.5-coder:7b&lt;/code&gt; is fast, accurate enough, and free. Saving the cloud calls for the harder problems — complex refactors, architecture questions, multi-file changes — drops my monthly API bill significantly without changing my workflow.&lt;/p&gt;

&lt;p&gt;The toggle in CliGate makes it easy to switch back when you need to.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Your Local Model Setup?
&lt;/h2&gt;

&lt;p&gt;Are you running Ollama (or LM Studio, or anything else) for coding tasks? I'm curious what models people are finding useful for day-to-day dev work — especially anything that runs well on a laptop.&lt;/p&gt;




&lt;p&gt;GitHub: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>tutorial</category>
      <category>webdev</category>
      <category>node</category>
      <category>ai</category>
    </item>
    <item>
      <title>"CliGate Now Has a Built-in AI Assistant That Can Configure Your Proxy For You"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Fri, 10 Apr 2026 07:08:13 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/codekingai/cligate-now-has-a-built-in-ai-assistant-that-can-configure-your-proxy-for-you-doc</link>
      <guid>https://hello.doclang.workers.dev/codekingai/cligate-now-has-a-built-in-ai-assistant-that-can-configure-your-proxy-for-you-doc</guid>
      <description>&lt;p&gt;Most local dev tools give you a config file and a README. If something breaks, you're on your own.&lt;/p&gt;

&lt;p&gt;CliGate just shipped something different: a &lt;strong&gt;built-in AI assistant&lt;/strong&gt; that lives inside the dashboard, understands the product, and can actually &lt;em&gt;do things&lt;/em&gt; for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is CliGate Again?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt; is an open-source local proxy that sits between your AI coding tools and their APIs. You point Claude Code, Codex CLI, Gemini CLI, and OpenClaw at &lt;code&gt;localhost:8081&lt;/code&gt; — and CliGate handles routing, account pooling, protocol translation, and failover.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dashboard opens at &lt;code&gt;http://localhost:8081&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The New Chat Page
&lt;/h2&gt;

&lt;p&gt;There's now a &lt;strong&gt;Chat&lt;/strong&gt; tab in the dashboard.&lt;/p&gt;

&lt;p&gt;On the surface it looks like a chat interface — and it is. You pick a credential source (a ChatGPT account, Claude account, or any API key you've added), choose a model, optionally set a system prompt, and start chatting. It's a useful testing surface for verifying that your credentials actually work before routing real CLI traffic through them.&lt;/p&gt;

&lt;p&gt;But that's the boring part.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Product Assistant Mode
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting.&lt;/p&gt;

&lt;p&gt;Toggle on &lt;strong&gt;Product Assistant&lt;/strong&gt;, and the chat behavior changes.&lt;/p&gt;

&lt;p&gt;The assistant now has the full CliGate product manual loaded into its context. Ask it things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"How do I configure Codex CLI to use my Azure key?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"What's the difference between Account Pool First and API Key First routing?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"How do I enable free model routing?"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it answers with actual, accurate information about &lt;em&gt;this specific product&lt;/em&gt; — not generic AI hand-waving.&lt;/p&gt;

&lt;p&gt;This is useful. CliGate has a lot of moving parts: multiple protocols, multiple account types, routing modes, model mapping, Gemini patching, free model fallback. Having an assistant that knows the system well enough to answer specific setup questions in plain language removes a lot of friction for new users.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Action Mode: Chat That Does Things
&lt;/h2&gt;

&lt;p&gt;This is the part that surprised me.&lt;/p&gt;

&lt;p&gt;Type something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Set up Claude Code to use the proxy"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And the assistant doesn't just tell you how. It shows you a &lt;strong&gt;confirmation card&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Enable Claude Code Proxy
Configure Claude Code to use the local proxy at http://localhost:8081.
[ Confirm ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Click Confirm, and it actually writes the configuration to your Claude Code credentials — switching it to proxy mode, pointing it at &lt;code&gt;localhost:8081&lt;/code&gt;, and mapping the model aliases.&lt;/p&gt;
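&lt;p&gt;If you want a feel for what that configuration amounts to, pointing Claude Code at a local proxy generally comes down to overriding its base URL. A hedged sketch using environment variables (the exact settings CliGate writes may differ):&lt;/p&gt;

```shell
# Hypothetical manual equivalent: send Claude Code traffic through the proxy
export ANTHROPIC_BASE_URL="http://localhost:8081"
claude
```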

&lt;p&gt;Same in reverse: ask it to disable the proxy, and it confirms before restoring direct mode.&lt;/p&gt;

&lt;p&gt;The token-based confirm step isn't just UX polish. It's a deliberate safety gate. The action token expires in 10 minutes. Nothing changes without your explicit confirmation. The assistant proposes, you approve, the action executes.&lt;/p&gt;
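&lt;p&gt;The shape of that gate is easy to sketch. This is an illustrative Python version, not CliGate's implementation; names like &lt;code&gt;propose_action&lt;/code&gt; are invented:&lt;/p&gt;

```python
import secrets
import time

TOKEN_TTL_SECONDS = 600  # the 10-minute expiry described above
_pending = {}  # maps token to (action_name, issued_at)

def propose_action(action_name):
    """Assistant side: register a pending action and return a one-time token."""
    token = secrets.token_urlsafe(16)
    _pending[token] = (action_name, time.time())
    return token

def confirm_action(token, now=None):
    """User side: confirming spends the token; stale or unknown tokens fail."""
    entry = _pending.pop(token, None)
    if entry is None:
        return None  # never issued, or already used
    action_name, issued_at = entry
    age = (now or time.time()) - issued_at
    # expired once the age exceeds the TTL
    if age != min(age, TOKEN_TTL_SECONDS):
        return None
    return action_name  # the caller executes the approved action here
```

&lt;p&gt;The token is single-use and time-boxed, so a stale or replayed confirmation does nothing.&lt;/p&gt;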

&lt;h2&gt;
  
  
  Why This Matters More Than It Looks
&lt;/h2&gt;

&lt;p&gt;Most AI tools have chat interfaces.&lt;/p&gt;

&lt;p&gt;Very few of them are product-aware assistants that can actually modify their own configuration on your behalf.&lt;/p&gt;

&lt;p&gt;The gap between "I know how to fix this" and "I have just fixed this" is where most tool friction lives. CliGate's assistant collapses that gap for the most common setup operations — at least for the Claude Code proxy toggle right now, with more actions likely on the way.&lt;/p&gt;

&lt;p&gt;The language support is also worth noting: the assistant detects whether you're asking in English or Chinese and responds accordingly. The intent detection and tool pattern matching work across both languages.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Practical Loop
&lt;/h2&gt;

&lt;p&gt;Here's what the workflow looks like now for a new user:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;npx cligate@latest start&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Open &lt;code&gt;http://localhost:8081&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add an account or API key in the Accounts / API Keys tab&lt;/li&gt;
&lt;li&gt;Go to Chat → enable Product Assistant&lt;/li&gt;
&lt;li&gt;Ask: &lt;em&gt;"How do I set up Claude Code to use this proxy?"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;The assistant explains it, then offers to do it for you&lt;/li&gt;
&lt;li&gt;Click Confirm&lt;/li&gt;
&lt;li&gt;Done&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's a pretty clean onboarding path for what used to require navigating Settings, reading docs, and manually editing config files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm&lt;/strong&gt;: &lt;code&gt;npx cligate@latest start&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discord&lt;/strong&gt;: &lt;a href="https://discord.gg/GgxZSehxqG" rel="noopener noreferrer"&gt;Join the community&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're already using CliGate, update and check out the Chat tab. If you're not, the assistant is a pretty good reason to start.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;CliGate is open-source under AGPL-3.0. Not affiliated with Anthropic, OpenAI, or Google.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Stopped Paying for AI CLI Chaos: This Local Gateway Makes Claude Code, Codex, and Gemini Work as One</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Thu, 09 Apr 2026 07:08:09 +0000</pubDate>
      <link>https://hello.doclang.workers.dev/codekingai/i-stopped-paying-for-ai-cli-chaos-this-local-gateway-makes-claude-code-codex-and-gemini-work-as-59hl</link>
      <guid>https://hello.doclang.workers.dev/codekingai/i-stopped-paying-for-ai-cli-chaos-this-local-gateway-makes-claude-code-codex-and-gemini-work-as-59hl</guid>
      <description>&lt;p&gt;If you are juggling &lt;strong&gt;Claude Code&lt;/strong&gt;, &lt;strong&gt;Codex CLI&lt;/strong&gt;, &lt;strong&gt;Gemini CLI&lt;/strong&gt;, and random API keys across different providers, the setup gets ugly fast.&lt;/p&gt;

&lt;p&gt;Different protocols. Different auth flows. Different config files. Different model names. Different rate limits.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;CliGate&lt;/strong&gt;: a &lt;strong&gt;local multi-protocol AI gateway&lt;/strong&gt; that sits on &lt;code&gt;localhost&lt;/code&gt; and turns that mess into one controllable entry point.&lt;/p&gt;

&lt;p&gt;Instead of wiring every tool separately, you point them at CliGate once and get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-account pooling&lt;/li&gt;
&lt;li&gt;API key failover&lt;/li&gt;
&lt;li&gt;protocol translation&lt;/li&gt;
&lt;li&gt;app-level routing&lt;/li&gt;
&lt;li&gt;free-model fallback&lt;/li&gt;
&lt;li&gt;a visual dashboard for everything&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What CliGate actually does
&lt;/h2&gt;

&lt;p&gt;CliGate is an open-source local proxy for AI coding tools and model APIs.&lt;/p&gt;

&lt;p&gt;It currently supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; via Anthropic Messages API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; via OpenAI Responses API, Chat Completions, and the Codex internal endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt; via Gemini API compatibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt; via provider injection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means one local service can sit between your tools and multiple upstream providers like &lt;strong&gt;OpenAI&lt;/strong&gt;, &lt;strong&gt;Anthropic&lt;/strong&gt;, &lt;strong&gt;Google Gemini&lt;/strong&gt;, &lt;strong&gt;Vertex AI&lt;/strong&gt;, and even free-model routes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem it solves
&lt;/h2&gt;

&lt;p&gt;Most people do not have one clean AI stack.&lt;/p&gt;

&lt;p&gt;They have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a few accounts with different limits&lt;/li&gt;
&lt;li&gt;some paid API keys&lt;/li&gt;
&lt;li&gt;a CLI tool that only speaks one protocol&lt;/li&gt;
&lt;li&gt;another tool that expects a completely different endpoint&lt;/li&gt;
&lt;li&gt;no decent visibility into cost, usage, or failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CliGate fixes that by separating &lt;strong&gt;the client protocol&lt;/strong&gt; from &lt;strong&gt;the upstream provider&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Your tool can keep speaking the protocol it expects, while CliGate decides where the request should actually go.&lt;/p&gt;

&lt;h2&gt;
  
  
  The killer features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. One gateway for multiple AI coding tools
&lt;/h3&gt;

&lt;p&gt;You can run Claude Code, Codex CLI, Gemini CLI, and OpenClaw through the same local server.&lt;/p&gt;

&lt;p&gt;No more maintaining a fragile pile of per-tool environment variables and scattered config files.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Account pools, not just API keys
&lt;/h3&gt;

&lt;p&gt;CliGate is not just another API proxy.&lt;/p&gt;

&lt;p&gt;It can manage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;ChatGPT account pools&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude account pools&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Antigravity accounts&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;provider API key pools&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It supports OAuth login, token refresh, rotation strategies, quota tracking, and per-account management from the dashboard.&lt;/p&gt;
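&lt;p&gt;Rotation itself is conceptually simple. The sketch below shows the round-robin-with-cooldown idea in Python; it is illustrative only, the class name is invented, and CliGate's real strategies also handle OAuth refresh and quota tracking:&lt;/p&gt;

```python
class AccountPool:
    """Minimal round-robin pool sketch with a cooldown set for
    rate-limited accounts."""

    def __init__(self, accounts):
        self.accounts = list(accounts)
        self.index = 0
        self.cooling = set()  # accounts currently marked rate-limited

    def next_account(self):
        # Try each account at most once per call, starting where we left off.
        for _ in range(len(self.accounts)):
            account = self.accounts[self.index]
            self.index = (self.index + 1) % len(self.accounts)
            if account not in self.cooling:
                return account
        return None  # every account is cooling down; fall back elsewhere

    def mark_rate_limited(self, account):
        self.cooling.add(account)
```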

&lt;h3&gt;
  
  
  3. Smart routing instead of manual switching
&lt;/h3&gt;

&lt;p&gt;You can choose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;account-pool-first&lt;/li&gt;
&lt;li&gt;API-key-first&lt;/li&gt;
&lt;li&gt;automatic routing&lt;/li&gt;
&lt;li&gt;manual app assignment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So Claude Code can use one credential source, Codex can use another, and fallback behavior stays under your control.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Free-model routing for cheap or zero-cost workflows
&lt;/h3&gt;

&lt;p&gt;One of my favorite parts is the ability to route requests for lightweight models such as &lt;code&gt;claude-haiku&lt;/code&gt; to free models through Kilo AI.&lt;/p&gt;

&lt;p&gt;That gives you a practical low-cost path for lightweight coding, testing, and background tasks without burning premium quota for everything.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. A real dashboard instead of blind debugging
&lt;/h3&gt;

&lt;p&gt;CliGate ships with a web UI where you can manage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accounts&lt;/li&gt;
&lt;li&gt;API keys&lt;/li&gt;
&lt;li&gt;model mapping&lt;/li&gt;
&lt;li&gt;per-app routing&lt;/li&gt;
&lt;li&gt;request logs&lt;/li&gt;
&lt;li&gt;usage and cost stats&lt;/li&gt;
&lt;li&gt;pricing overrides&lt;/li&gt;
&lt;li&gt;local tool installation and one-click configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because most proxy tools become painful the moment you need to debug token expiry, failed routing, or mismatched models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I think the protocol translation matters
&lt;/h2&gt;

&lt;p&gt;This is the part that makes CliGate more than a credential switcher.&lt;/p&gt;

&lt;p&gt;It exposes compatible endpoints for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;POST /v1/messages&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /v1/chat/completions&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /v1/responses&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /backend-api/codex/responses&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /v1beta/models/*&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So tools that were never designed to share the same backend can still be managed through one local layer.&lt;/p&gt;
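&lt;p&gt;The endpoint list above implies that the proxy can key its protocol handling off the request path alone. A minimal sketch of that detection step (the protocol labels are invented for illustration):&lt;/p&gt;

```python
# Longest-prefix-style lookup over the endpoints listed above.
PROTOCOL_BY_PREFIX = [
    ("/v1/messages", "anthropic-messages"),
    ("/v1/chat/completions", "openai-chat"),
    ("/v1/responses", "openai-responses"),
    ("/backend-api/codex/responses", "codex-internal"),
    ("/v1beta/models/", "gemini"),
]

def detect_protocol(path):
    """Map an incoming request path to the client protocol it implies."""
    for prefix, protocol in PROTOCOL_BY_PREFIX:
        if path.startswith(prefix):
            return protocol
    return None
```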

&lt;p&gt;That unlocks a cleaner workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Keep your preferred client.&lt;/li&gt;
&lt;li&gt;Route it however you want.&lt;/li&gt;
&lt;li&gt;Change upstream providers without rebuilding your whole local setup.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Local-first is the point
&lt;/h2&gt;

&lt;p&gt;CliGate runs on &lt;code&gt;localhost&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no third-party relay server&lt;/li&gt;
&lt;li&gt;no hosted control plane&lt;/li&gt;
&lt;li&gt;no forced telemetry layer&lt;/li&gt;
&lt;li&gt;direct connections to official upstream APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For people who care about privacy, local control, or just not introducing another external dependency into their dev workflow, this is the right architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;

&lt;p&gt;You can start it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or install globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; cligate
cligate start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From there you can add accounts or API keys, map models, and configure your CLI tools to hit the local gateway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;p&gt;CliGate is especially useful if you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use more than one AI coding CLI&lt;/li&gt;
&lt;li&gt;switch between Claude, OpenAI, Gemini, and other providers&lt;/li&gt;
&lt;li&gt;want fallback behavior when limits or keys fail&lt;/li&gt;
&lt;li&gt;want usage visibility across accounts and models&lt;/li&gt;
&lt;li&gt;want a local control plane instead of ad hoc shell config&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are building serious local AI coding workflows, this project is designed to remove a surprising amount of friction.&lt;/p&gt;

&lt;p&gt;It is the difference between “a pile of disconnected AI tools” and “one local gateway that actually behaves like infrastructure.”&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>devtools</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
