I Built “llmglot” an LLM API Translation Proxy That Connects Claude Code, Codex, and Gemini
In AI development, people are increasingly struggling more with API specification differences than with the models themselves.
Claude Code uses the Anthropic Messages API. OpenAI-based tools use the Chat Completions API or the Responses API. Gemini has its own proprietary API, and local LLM environments such as Ollama and LM Studio provide OpenAI-compatible APIs.
Even when the model you want exists, you may not be able to connect to it because the API specifications differ. You need to rewrite SDKs, implement translation layers, and separate configurations for each environment.
llmglot is the tool that solves these problems.
GitHub: https://github.com/Himeyama/llmglot
What is llmglot?
llmglot is a proxy server that converts multiple LLM APIs back and forth.
It accepts Anthropic Messages API, OpenAI Responses API, OpenAI Chat Completions API, Gemini API, and more, then appropriately converts and forwards them to the upstream API.
In other words, even if the API specification required by the client does not match the API specification of the model you actually want to use, you can still use it.
Supported Scope
On the client side, it supports:
- Claude Code
- Anthropic SDK
- OpenAI SDK
- Responses API clients
- Gemini SDK
On the upstream side, it can connect to:
- Ollama
- LM Studio
- OpenAI
- Gemini
- OpenRouter
- Azure OpenAI
- vLLM
For example, it enables connections such as:
- Claude Code → Ollama
- OpenAI SDK → Gemini
- Responses API → Chat Completions API
Connecting Claude Code to a Local LLM
One use case for llmglot is connecting Claude Code to a local LLM.
For example, if you are using Ollama, start it with:
CHAT_BASE_URL=http://localhost:11434/v1 llmglot
Then simply change Claude Code’s connection target to llmglot, and you can use a local LLM from Claude Code.
Being able to try a code agent without using an expensive API is extremely appealing.
(Note: Ollama actually supports the Messages API, so a proxy is not really necessary.)
Supports the Responses API Too
Recently, more and more tools have been using the Responses API.
llmglot implements the /v1/responses endpoint and supports not only HTTP but also WebSocket.
That makes it easy to connect with Responses API-based clients such as the Codex CLI.
Useful Logging Features
llmglot includes a log display feature.
The information you can check includes:
- Model name
- Provider
- Input token count
- Output token count
- Cache usage
- Estimated cost
- Generation speed
In environments where multiple LLMs are used, the benefit of centrally managing usage information is significant.
Differences from LiteLLM
At this point, the question is: how is this different from LiteLLM?
Both are software that sits between LLMs, but the direction they aim for is very different.
LiteLLM’s Approach
LiteLLM unifies many LLM providers into OpenAI format.
In other words:
App
↓
LiteLLM
↓
Each LLM provider
You could say it is a mechanism for application developers to use a unified API.
llmglot’s Approach
llmglot, on the other hand, is structured like this:
Claude Code
Codex CLI
Gemini SDK
↓
llmglot
↓
Each LLM provider
Its goal is to translate the client-side API specification.
Comparing with Claude Code
Claude Code uses the Anthropic Messages API.
LiteLLM is basically centered on OpenAI-compatible APIs, so connecting Claude Code directly is difficult.
llmglot, on the other hand, accepts the Anthropic Messages API directly.
That means the following configuration is possible:
Claude Code
↓
llmglot
↓
Ollama
This is a very major feature of llmglot.
Which Should You Choose?
LiteLLM is a good fit for:
- People developing AI applications
- People who want to use the OpenAI SDK as a unified interface
- People who want load balancing or fallback
llmglot is a good fit for:
- People who want to run Claude Code with a different model
- People who want to use local LLMs from existing clients
- People who want to convert the Responses API to another API
- People who do not want to rewrite SDKs
Conclusion
In the LLM world, API differences are becoming a bigger problem than the models themselves.
You want to use Claude Code, but with Ollama. You want to use the OpenAI SDK, but with Gemini. Demands like these will likely continue to grow.
llmglot sits in the middle and absorbs the differences in API specifications.
- Claude Code → Ollama
- OpenAI SDK → Gemini
- Responses API → Chat Completions API
- Gemini API → OpenAI-compatible API
If LiteLLM is a “unified API for developers,” llmglot could be described as a “client-compatible proxy.”