Skip to main content

One post tagged with "LLM"

View all tags

I Built “llmglot” an LLM API Translation Proxy That Connects Claude Code, Codex, and Gemini

In AI development, people are increasingly struggling more with API specification differences than with the models themselves.

Claude Code uses the Anthropic Messages API. OpenAI-based tools use the Chat Completions API or the Responses API. Gemini has its own proprietary API, and local LLM environments such as Ollama and LM Studio provide OpenAI-compatible APIs.

Even when the model you want exists, you may not be able to connect to it because the API specifications differ. You need to rewrite SDKs, implement translation layers, and separate configurations for each environment.

llmglot is the tool that solves these problems.

GitHub: https://github.com/Himeyama/llmglot

What is llmglot?

llmglot is a proxy server that converts multiple LLM APIs back and forth.

It accepts Anthropic Messages API, OpenAI Responses API, OpenAI Chat Completions API, Gemini API, and more, then appropriately converts and forwards them to the upstream API.

In other words, even if the API specification required by the client does not match the API specification of the model you actually want to use, you can still use it.

Supported Scope

On the client side, it supports:

  • Claude Code
  • Anthropic SDK
  • OpenAI SDK
  • Responses API clients
  • Gemini SDK

On the upstream side, it can connect to:

  • Ollama
  • LM Studio
  • OpenAI
  • Gemini
  • OpenRouter
  • Azure OpenAI
  • vLLM

For example, it enables connections such as:

  • Claude Code → Ollama
  • OpenAI SDK → Gemini
  • Responses API → Chat Completions API

Connecting Claude Code to a Local LLM

One use case for llmglot is connecting Claude Code to a local LLM.

For example, if you are using Ollama, start it with:

CHAT_BASE_URL=http://localhost:11434/v1 llmglot

Then simply change Claude Code’s connection target to llmglot, and you can use a local LLM from Claude Code.

Being able to try a code agent without using an expensive API is extremely appealing.

(Note: Ollama actually supports the Messages API, so a proxy is not really necessary.)

Supports the Responses API Too

Recently, more and more tools have been using the Responses API.

llmglot implements the /v1/responses endpoint and supports not only HTTP but also WebSocket.

That makes it easy to connect with Responses API-based clients such as the Codex CLI.

Useful Logging Features

llmglot includes a log display feature.

The information you can check includes:

  • Model name
  • Provider
  • Input token count
  • Output token count
  • Cache usage
  • Estimated cost
  • Generation speed

In environments where multiple LLMs are used, the benefit of centrally managing usage information is significant.

Differences from LiteLLM

At this point, the question is: how is this different from LiteLLM?

Both are software that sits between LLMs, but the direction they aim for is very different.

LiteLLM’s Approach

LiteLLM unifies many LLM providers into OpenAI format.

In other words:

App

LiteLLM

Each LLM provider

You could say it is a mechanism for application developers to use a unified API.

llmglot’s Approach

llmglot, on the other hand, is structured like this:

Claude Code
Codex CLI
Gemini SDK

llmglot

Each LLM provider

Its goal is to translate the client-side API specification.

Comparing with Claude Code

Claude Code uses the Anthropic Messages API.

LiteLLM is basically centered on OpenAI-compatible APIs, so connecting Claude Code directly is difficult.

llmglot, on the other hand, accepts the Anthropic Messages API directly.

That means the following configuration is possible:

Claude Code

llmglot

Ollama

This is a very major feature of llmglot.

Which Should You Choose?

LiteLLM is a good fit for:

  • People developing AI applications
  • People who want to use the OpenAI SDK as a unified interface
  • People who want load balancing or fallback

llmglot is a good fit for:

  • People who want to run Claude Code with a different model
  • People who want to use local LLMs from existing clients
  • People who want to convert the Responses API to another API
  • People who do not want to rewrite SDKs

Conclusion

In the LLM world, API differences are becoming a bigger problem than the models themselves.

You want to use Claude Code, but with Ollama. You want to use the OpenAI SDK, but with Gemini. Demands like these will likely continue to grow.

llmglot sits in the middle and absorbs the differences in API specifications.

  • Claude Code → Ollama
  • OpenAI SDK → Gemini
  • Responses API → Chat Completions API
  • Gemini API → OpenAI-compatible API

If LiteLLM is a “unified API for developers,” llmglot could be described as a “client-compatible proxy.”