Lower cost
Centralized purchasing can reduce marginal cost
Explainer
An AI relay, also called an API proxy or LLM gateway, is essentially a distribution layer that buys upstream model access in bulk and resells it through a unified interface.
It mainly solves payment friction, network access issues, and the integration cost of using multiple models.
Lower cost
Centralized purchasing can reduce marginal cost
Easier payment
Often supports more local payment methods
Lower network friction
Multi-region access can be more stable
Convenient
One entry point for many models
Full quality
The relay passes through official-grade model resources with stronger stability.
Capped / Nerfed
Capabilities are restricted, often in speed, quota, subscription, or routing policy.
Crash / Rug
The site fails abruptly, credits disappear, or the operator shuts down.
Shared ride
Multiple people split one account or plan, which is cheaper but much riskier.
Key pool
The platform rotates across multiple API keys to spread traffic and reduce rate limits.
Degraded
Users feel the model became weaker, often because requests are routed to a cheaper model.
Bait and switch
The platform charges for a premium model but forwards requests to a cheaper one.
Rate limit
Official RPM and TPM caps. Going over them often leads to 429 responses.
Official direct
(Official Channel)
Direct access to the official API. Highest stability, but usually higher cost and barrier.
Cloud vendor
(Cloud Vendor)
Access through Azure, AWS, and similar platforms, often for enterprise scenarios.
Relay / Proxy
(API Relay / Proxy)
The most common form. Aggregates multiple upstreams with uneven quality.
Reverse engineered
(Reverse Engineering)
Turns web traffic into APIs. Cheap but very unstable.
Subscription to API
(Sub2API)
Converts subscriptions like ChatGPT Plus into API access. Highest risk.
Base URL
The target request address and usually the first thing you change when integrating.
OpenAI-compatible format
Many platforms support it, so integration often only changes the base URL and model name.
Token
The unit used for processing and billing, often split between input and output.
Context window
Defines how much content the model can process at once.
Streaming
Returns text incrementally with SSE for a better interactive experience.
Temperature
Controls randomness: lower is steadier, higher is more diverse.
System prompt
Defines the model role and may also carry provider-side restrictions.