How it works
Every inference request flows through the Use Pod gateway, which authenticates the caller, checks balance, matches a provider, relays the response, and settles the bill asynchronously.
The request path
- Auth. Your token is resolved (Redis cache, with a Postgres fallback).
- Balance check. The token must have a positive balance.
- Rate limit. Per-token requests-per-minute and a concurrency cap apply.
- Marketplace match. Active providers for the requested model are loaded,
their prices capped at the cheapest centralized price, filtered by your price
ceiling and provider health, then sorted by price.
  - Marketplace candidate found → dispatched over an outbound WebSocket to a provider agent, which calls its local backend and streams bytes back.
  - BYOK relay candidate found → the gateway decrypts the operator’s stored key and forwards over HTTPS to the upstream.
  - Otherwise → falls through to the centralized router (the always-on tier-zero fallback).
- Relay. The response is streamed back to you byte-for-byte, with X-Balance-Remaining and X-Pod-Route headers.
- Settlement (async). Token usage is extracted from the stream and recorded; a background worker debits your balance and, for marketplace routes, credits the provider.
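The marketplace match step can be sketched as follows. This is a minimal illustration rather than the gateway's actual code: the `Provider` fields, the `match_provider` function, and the price unit are all hypothetical. It only shows the order described above: cap each price at the cheapest centralized price, filter by health and by your price ceiling, then sort by price.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Provider:
    # All field names are hypothetical; the real provider record is not shown in this page.
    name: str
    price: float      # listed price (unit assumed, e.g. USDC per 1M tokens)
    healthy: bool


def match_provider(
    providers: List[Provider],
    cheapest_centralized_price: float,
    price_ceiling: float,
) -> Optional[Tuple[Provider, float]]:
    """Return the cheapest eligible (provider, effective_price), or None."""
    candidates = []
    for p in providers:
        if not p.healthy:
            continue  # filter by provider health
        # Cap the listed price at the cheapest centralized price.
        effective = min(p.price, cheapest_centralized_price)
        if effective > price_ceiling:
            continue  # filter by the caller's price ceiling
        candidates.append((p, effective))
    # Sort by effective price, cheapest first.
    candidates.sort(key=lambda pair: pair[1])
    # None means: fall through to the centralized router.
    return candidates[0] if candidates else None


providers = [
    Provider("a", 3.0, True),
    Provider("b", 1.5, False),  # cheapest, but unhealthy
    Provider("c", 1.8, True),
]
best = match_provider(providers, cheapest_centralized_price=2.0, price_ceiling=2.2)
print(best[0].name, best[1])  # prints "c 1.8"
```

When no candidate survives the filters, the function returns `None`, which corresponds to the fall-through to the centralized router.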
The two sides
- Demand side. You hold a token with a USDC balance and send standard API requests. See Using Use Pod.
- Supply side. Operators run the provider agent (or enroll a BYOK relay), advertise models and prices, and earn 80% of every settled inference. See Running a provider.
Settlement and fees
Marketplace routes split each settled inference 80% to the provider and 20% to the treasury. Users are billed at the operator’s listed price, which is capped at the cheapest centralized price for that model, so the marketplace never costs more than the fallback.
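As a worked example of the arithmetic, the sketch below bills one inference and splits it 80/20. The function names, the per-million-token price unit, and the rounding direction are assumptions made for illustration. Amounts are integers in micro-USDC (USDC has six decimal places) to avoid floating-point error.

```python
PROVIDER_SHARE_PERCENT = 80  # from the text: 80% provider, 20% treasury


def billed_amount(listed_price: int, cheapest_centralized_price: int, tokens: int) -> int:
    """Cost in micro-USDC, with prices assumed to be micro-USDC per 1M tokens."""
    # The listed price is capped at the cheapest centralized price.
    price = min(listed_price, cheapest_centralized_price)
    return price * tokens // 1_000_000


def split_settlement(amount: int) -> tuple:
    """Split a billed amount into (provider_credit, treasury_credit)."""
    # Rounding down the provider share is an assumption; the remainder goes to the treasury.
    provider = amount * PROVIDER_SHARE_PERCENT // 100
    return provider, amount - provider


# Listed at 2.50 USDC per 1M tokens, capped to the 2.00 USDC centralized price.
cost = billed_amount(2_500_000, 2_000_000, tokens=1_500)
print(cost)                    # prints 3000 (0.003 USDC)
print(split_settlement(cost))  # prints (2400, 600)
```

Computing the treasury share as the remainder keeps the two credits summing exactly to the billed amount.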
Learn more in Routing & matching and Pricing.