--rate-limit, --wait, and --manual — to slow down or gate requests before they reach Copilot’s servers. You can combine these flags with smallModel routing in config.json to further reduce how often premium quota is consumed.
Rate limiting flags
--rate-limit <seconds>
The --rate-limit flag sets a minimum number of seconds that must pass between requests. If a new request arrives before the cooldown expires, the proxy rejects it with an error.
--wait
Add --wait alongside --rate-limit to make the proxy hold the request in a queue until the cooldown expires instead of returning an error. This is useful when your client does not retry automatically on rate limit errors.
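The difference between rejecting and queueing can be sketched in a few lines of Python. This is only an illustrative model of the behavior described above, not the proxy's actual code: the names CooldownGate, RateLimitError, and admit are hypothetical, and the sketch assumes a rejected request does not restart the cooldown.

```python
import time


class RateLimitError(Exception):
    """Raised when a request arrives before the cooldown has expired."""


class CooldownGate:
    """Hypothetical model of --rate-limit <seconds>, optionally with --wait."""

    def __init__(self, seconds, wait=False, clock=time.monotonic, sleep=time.sleep):
        self.seconds = seconds  # minimum gap between admitted requests
        self.wait = wait        # True models --wait, False models rejection
        self._clock = clock
        self._sleep = sleep
        self._last = None       # time the previous request was admitted

    def admit(self):
        """Admit one request, or reject/hold it if the cooldown is active."""
        now = self._clock()
        if self._last is not None:
            remaining = self.seconds - (now - self._last)
            if remaining > 0:
                if not self.wait:
                    # Without --wait: fail fast and let the client retry.
                    raise RateLimitError(f"retry in {remaining:.1f}s")
                # With --wait: hold the request until the cooldown expires.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

With wait=False the caller must catch RateLimitError and retry on its own schedule. With wait=True the call simply returns later, which is why --wait suits clients that have no retry logic.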
--manual
The --manual flag pauses every request and prompts you in the terminal to approve or deny it before it is forwarded to Copilot. This gives you complete, per-request control over what gets sent.
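As a rough mental model, a manual gate is a blocking yes/no prompt placed in front of each forwarded request. The sketch below is an assumption-laden illustration: the approve function, its prompt text, and the accepted answers are all hypothetical and do not reflect the proxy's real prompt.

```python
def approve(summary, ask=input):
    """Return True only when the operator explicitly answers yes.

    `summary` is a hypothetical one-line description of the pending request;
    `ask` defaults to the built-in input() so the call blocks on the terminal.
    """
    answer = ask(f"Forward request? {summary} [y/N] ")
    # Anything other than an explicit yes counts as a denial.
    return answer.strip().lower() in {"y", "yes"}
```

Because the call blocks until someone answers, every request stalls while the terminal is unattended.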
--manual is most useful during debugging or when you want to inspect every request interactively. It is not practical for long-running sessions where you step away from the terminal.
Example commands
Reducing premium request consumption
Beyond pacing requests, you can configure Copilot API to route certain low-value requests to a cheaper model so they do not spend your premium quota.
smallModel: Set this in config.json to a free or cheaper model (default: gpt-5-mini). The proxy uses smallModel for tool-less warmup probes, compact/background requests, and other housekeeping turns that do not need a premium model.
compactUseSmallModel: When true (the default), detected compact requests from Claude Code or OpenCode are automatically routed to smallModel. Set it to false only if you want compact requests to use the full model.
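The way these two settings interact can be sketched as a small routing function. This is a minimal illustration under stated assumptions, not the proxy's implementation: pick_model and the is_compact / is_warmup flags are hypothetical, how the proxy actually detects compact or warmup requests is not shown here, and only the two key names and their defaults come from the description above.

```python
import json

# Hypothetical config.json contents using the two keys described above.
CONFIG = json.loads('{"smallModel": "gpt-5-mini", "compactUseSmallModel": true}')


def pick_model(requested, config, *, is_compact=False, is_warmup=False):
    """Choose which model a request should be sent to.

    `is_compact` and `is_warmup` stand in for the proxy's own detection,
    which this sketch does not attempt to reproduce.
    """
    small = config.get("smallModel", "gpt-5-mini")
    if is_warmup:
        # Tool-less warmup probes never need the premium model.
        return small
    if is_compact and config.get("compactUseSmallModel", True):
        # Compact requests are downgraded unless the option is turned off.
        return small
    return requested
```

Setting compactUseSmallModel to false in this model leaves compact requests on the model the client asked for, while warmup probes still go to smallModel.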
config.json
Premium request optimizations, including smallModel routing and compact detection, are covered in more detail in the configuration reference.