Inference proxy
The proxy is the inference entry point. It mirrors the upstream OpenAI/Anthropic API surface, with your token in the URL path.
https://api.usepod.ai/proxy/<token>/... # Anthropic-compatiblehttps://api.usepod.ai/proxy/<token>/v1/... # OpenAI-compatibleCommon endpoints behind the proxy:
| Endpoint | Surface |
|---|---|
/v1/chat/completions | OpenAI chat completions (streaming and non-streaming) |
/v1/messages | Anthropic messages |
/v1/models | Model listing |
Request headers
Section titled “Request headers”| Header | Required | Meaning |
|---|---|---|
Content-Type: application/json | yes | Standard JSON body |
X-Pod-Max-Price-Input | no | Max price per million input tokens (USDC microunits) |
X-Pod-Max-Price-Output | no | Max price per million output tokens (USDC microunits) |
The Authorization / api_key your SDK sends is ignored — auth is the token in
the path. See Spend controls for the price headers.
Response headers
Section titled “Response headers”| Header | Meaning |
|---|---|
X-Balance-Remaining | Remaining token balance after this request |
X-Pod-Route | Which source served the request: marketplace, BYOK relay, or centralized |
Errors
Section titled “Errors”- A token with no balance is rejected before any upstream call.
- An unknown token returns
401. - In marketplace-only mode, no provider at your price returns a dedicated no-provider result rather than billing at a higher price.
Streaming responses are relayed byte-for-byte from the selected source; token usage is extracted from the stream for settlement.