Control request rate limits in Copilot API

GitHub Copilot monitors usage patterns and can flag accounts that send rapid or bulk automated requests. Copilot API gives you three flags — --rate-limit, --wait, and --manual — to slow down or gate requests before they reach Copilot’s servers. You can combine these flags with smallModel routing in config.json to further reduce how often premium quota is consumed.

GitHub’s Acceptable Use Policies prohibit excessive automated or bulk activity. Rapid automated use of Copilot may trigger abuse-detection systems and result in a temporary suspension of your Copilot access. Review the GitHub Acceptable Use Policies and the GitHub Copilot Terms before running automated workloads.

Rate limiting flags

`--rate-limit <seconds>`

The --rate-limit flag sets a minimum number of seconds that must pass between requests. If a new request arrives before the cooldown expires, the proxy rejects it with an error.

npx @nick3/copilot-api@latest start --rate-limit 30

This enforces at least 30 seconds between each request sent upstream to Copilot.
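The cooldown behavior described above can be pictured with a small sketch. This is not the proxy's actual implementation; the class name `CooldownLimiter`, its method `try_acquire`, and the injectable `clock` parameter are hypothetical names chosen for illustration. It only shows the general idea of rejecting a request that arrives before the minimum interval has elapsed.

```python
import time


class CooldownLimiter:
    """Illustrative only: accept a request only if `min_interval` seconds
    have passed since the last accepted request."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock            # injectable so the logic can be tested
        self.last_accepted = None     # time of the last accepted request

    def try_acquire(self):
        """Return (True, 0.0) if accepted, else (False, seconds_remaining)."""
        now = self.clock()
        if self.last_accepted is not None:
            remaining = self.min_interval - (now - self.last_accepted)
            if remaining > 0:
                # Still inside the cooldown window: reject the request.
                return False, remaining
        self.last_accepted = now
        return True, 0.0
```

With `min_interval=30`, a second call made 10 seconds after an accepted one would be rejected with 20 seconds remaining, which mirrors the error case of `--rate-limit 30` without `--wait`.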

`--wait`

Add --wait alongside --rate-limit to make the proxy hold the request in a queue until the cooldown expires instead of returning an error. This is useful when your client does not retry automatically on rate limit errors.

npx @nick3/copilot-api@latest start --rate-limit 30 --wait

Use --wait when you want smooth operation without error-handling logic in your client. Use --rate-limit alone (without --wait) when you want the client to receive an error and decide whether to retry.
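To make the difference between rejecting and waiting concrete, here is a minimal sketch of the waiting variant. It is an assumption about the general technique, not the proxy's code; the function name `wait_for_cooldown` and its `clock` and `sleep` parameters are hypothetical. Instead of returning an error, the caller sleeps for whatever is left of the cooldown and then proceeds.

```python
import time


def wait_for_cooldown(last_request_time, min_interval,
                      clock=time.monotonic, sleep=time.sleep):
    """Illustrative only: block until `min_interval` seconds have passed
    since `last_request_time`, then return the time the request proceeds."""
    if last_request_time is not None:
        remaining = min_interval - (clock() - last_request_time)
        if remaining > 0:
            # Hold the request for the rest of the cooldown instead of failing.
            sleep(remaining)
    return clock()
```

The trade-off is latency: a client that sends requests faster than the cooldown allows sees slow responses rather than errors, so client-side timeouts may need to be longer than the configured interval.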

`--manual`

The --manual flag pauses every request and prompts you in the terminal to approve or deny it before it is forwarded to Copilot. This gives you complete, per-request control over what gets sent.

npx @nick3/copilot-api@latest start --manual

--manual is most useful during debugging or when you want to inspect every request interactively. It is not practical for long-running sessions where you step away from the terminal.
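A per-request approval gate of this kind can be sketched in a few lines. The prompt wording, the function name `approve_request`, and the `ask` parameter are all hypothetical; the proxy's real prompt may look different. The sketch only illustrates the pattern of blocking on operator input and defaulting to deny.

```python
def approve_request(summary, ask=input):
    """Illustrative only: ask the operator whether to forward a request.

    Returns True only for an explicit "y" or "yes"; anything else denies.
    """
    answer = ask(f"Forward request? {summary} [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Because the gate blocks until someone answers, every request stalls while the terminal is unattended, which is why this mode suits short debugging sessions rather than long-running ones.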

Example commands

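Rate limit only (requests that arrive during the cooldown are rejected with an error):

npx @nick3/copilot-api@latest start --rate-limit 30

Rate limit with waiting (requests are held until the cooldown expires):

npx @nick3/copilot-api@latest start --rate-limit 30 --wait

Manual approval of every request:

npx @nick3/copilot-api@latest start --manual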

Reducing premium request consumption

Beyond pacing requests, you can configure Copilot API to route certain low-value requests to a cheaper model so they do not spend your premium quota.

smallModel: Set this in config.json to a free or cheaper model (default: gpt-5-mini). The proxy uses smallModel for tool-less warmup probes, compact/background requests, and other housekeeping turns that do not need a premium model.

compactUseSmallModel: When true (the default), detected compact requests from Claude Code or OpenCode are automatically routed to smallModel. Set it to false only if you want compact requests to use the full model.

config.json

{
  "smallModel": "gpt-5-mini",
  "compactUseSmallModel": true
}
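The effect of these two settings can be illustrated with a simplified routing sketch. This is not the proxy's code: the function `choose_model`, the `is_compact` flag, and the model name `premium-model` are hypothetical, and the sketch covers only the compact-request case, not warmup probes or other housekeeping turns. The defaults mirror the values stated above.

```python
import json

# Defaults as described in this section.
DEFAULTS = {"smallModel": "gpt-5-mini", "compactUseSmallModel": True}


def choose_model(requested_model, is_compact, config):
    """Illustrative only: pick the model for a request.

    `is_compact` stands in for the proxy's own compact-request detection.
    """
    settings = {**DEFAULTS, **config}
    if is_compact and settings["compactUseSmallModel"]:
        # Compact requests are routed to the cheaper model.
        return settings["smallModel"]
    return requested_model


config = json.loads('{"smallModel": "gpt-5-mini", "compactUseSmallModel": true}')
print(choose_model("premium-model", True, config))   # compact request
print(choose_model("premium-model", False, config))  # regular request
```

With `compactUseSmallModel` set to false, the same compact request would keep the model the client asked for.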

Premium request optimizations — including smallModel routing and compact detection — are covered in more detail in the configuration reference.
