Rate Limiting - Gumnut AI

REST API requests are rate limited using a weighted token bucket to ensure fair usage and platform stability.

How It Works

Each user has a token bucket with a fixed capacity that refills at a steady rate. Every API request consumes tokens based on its cost: heavier operations such as uploads consume more tokens than simple reads. The capacity sets how large a burst can be, and the refill rate sets the sustained throughput.

Parameter	Value
Bucket capacity	400 tokens
Refill rate	100 tokens per second
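
To make the numbers above concrete, here is a minimal client-side model of a token bucket. It is only a sketch: the TokenBucket class and its method names are hypothetical, the bucket is assumed to start full, and the server's actual implementation may differ.

```javascript
// Client-side model of the documented bucket: 400-token capacity, 100 tokens/s refill.
// Hypothetical class for illustration; not part of any Gumnut SDK.
class TokenBucket {
  constructor(capacity = 400, refillPerSec = 100) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // assumption: the bucket starts full
    this.lastMs = 0;
  }

  // Add the tokens accrued since the last update, capped at capacity.
  refill(nowMs) {
    const elapsedMs = Math.max(0, nowMs - this.lastMs);
    this.tokens = Math.min(this.capacity, this.tokens + (elapsedMs * this.refillPerSec) / 1000);
    this.lastMs = nowMs;
  }

  // Returns true and deducts `cost` if enough tokens are available at `nowMs`.
  tryConsume(cost, nowMs) {
    this.refill(nowMs);
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

const bucket = new TokenBucket();
let accepted = 0;
while (bucket.tryConsume(20, 0)) accepted++; // burst of cost-20 requests at t = 0
console.log(accepted);                   // 20 (400 / 20)
console.log(bucket.tryConsume(20, 100)); // false: only 10 tokens refilled after 100 ms
console.log(bucket.tryConsume(20, 200)); // true: 20 tokens available after 200 ms
```

In this model a full bucket absorbs a burst of 400 tokens, after which throughput settles at the refill rate of 100 tokens per second.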

Request Costs

Each operation type consumes a fixed number of tokens:

Operation	Cost	Examples
Metadata	1	Get asset details, create album, update person
List / Search	5	List assets, search albums
Thumbnail	10	Download a thumbnail
Upload / Download	20	Upload an asset, download original file
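
The costs in the table can be used to budget a batch of requests before sending it. The sketch below is illustrative: the keys in COSTS and the helpers planCost and minSeconds are hypothetical names, and the time estimate assumes a full bucket at the start and no other traffic from the same user.

```javascript
// Costs from the table above; the key names are hypothetical labels, not API identifiers.
const COSTS = { metadata: 1, list: 5, thumbnail: 10, transfer: 20 };
const CAPACITY = 400;       // documented bucket capacity
const REFILL_PER_SEC = 100; // documented refill rate

// Total tokens for a plan such as { list: 2, thumbnail: 50 }.
function planCost(plan) {
  return Object.entries(plan).reduce((sum, [op, count]) => {
    if (!Object.prototype.hasOwnProperty.call(COSTS, op)) {
      throw new Error(`Unknown operation: ${op}`);
    }
    return sum + COSTS[op] * count;
  }, 0);
}

// Lower bound on the seconds needed to send the whole plan: tokens beyond the
// initial full bucket have to be refilled at REFILL_PER_SEC.
function minSeconds(plan) {
  return Math.max(0, planCost(plan) - CAPACITY) / REFILL_PER_SEC;
}

const plan = { list: 2, thumbnail: 50, transfer: 10 };
console.log(planCost(plan));   // 710 = 2*5 + 50*10 + 10*20
console.log(minSeconds(plan)); // 3.1 = (710 - 400) / 100
```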

MCP Tool Calls

If you use the MCP server, tool calls follow the same weighted cost model as the REST API. Each tool call is charged once using the cost of the closest REST operation. For example:

Search and list tools use the list/search cost
Write tools use the corresponding mutation cost
view_asset counts like a metadata read rather than a full file download

Rate Limit Headers

Rate limit information is included in every response:

X-RateLimit-Limit: 400
X-RateLimit-Remaining: 375
X-RateLimit-Cost: 5
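
One way to read these headers in client code is sketched below. The function name readRateLimit and the shape of the returned object are hypothetical, and the example assumes a runtime with the Fetch API globals (a browser, or Node.js 18 and later).

```javascript
// Reads the three documented headers from a fetch Response.
// Hypothetical helper; a header that is missing or not numeric is returned as null.
function readRateLimit(response) {
  const num = (name) => {
    const raw = response.headers.get(name);
    const value = raw === null ? NaN : Number.parseInt(raw, 10);
    return Number.isNaN(value) ? null : value;
  };
  return {
    limit: num('X-RateLimit-Limit'),
    remaining: num('X-RateLimit-Remaining'),
    cost: num('X-RateLimit-Cost'),
  };
}

// Sample response built locally with the header values shown above.
const sample = new Response(null, {
  headers: { 'X-RateLimit-Limit': '400', 'X-RateLimit-Remaining': '375', 'X-RateLimit-Cost': '5' },
});
console.log(readRateLimit(sample)); // { limit: 400, remaining: 375, cost: 5 }
```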

Handling Rate Limits

When rate limited, you’ll receive a 429 response with a Retry-After header indicating how many seconds to wait:

HTTP/1.1 429 Too Many Requests
Retry-After: 3
X-RateLimit-Limit: 400
X-RateLimit-Remaining: 0
X-RateLimit-Cost: 20

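Wait at least the Retry-After interval before retrying. The example below waits for the longer of Retry-After and an exponential backoff, and gives up after maxRetries retries: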

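async function requestWithBackoff(url, options, maxRetries = 3) {
  // Attempt 0 is the initial request; attempts 1..maxRetries are retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.status !== 429) return response;
    if (attempt === maxRetries) break; // out of retries: fail without waiting again

    // Wait for the longer of the server's Retry-After and exponential backoff (1 s, 2 s, 4 s, ...).
    const retryAfterSec = parseInt(response.headers.get('Retry-After'), 10) || 0;
    const backoffSec = Math.pow(2, attempt);
    const waitSec = Math.max(retryAfterSec, backoffSec);
    await new Promise(resolve => setTimeout(resolve, waitSec * 1000));
  }
  throw new Error('Max retries exceeded');
}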

If you’re using one of our SDKs, automatic retries with exponential backoff are built in.

Best Practices

Monitor X-RateLimit-Remaining to proactively throttle requests before hitting limits
Batch operations when possible to reduce total token consumption
Cache responses to avoid redundant requests
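
The first practice, throttling on X-RateLimit-Remaining, could look roughly like the sketch below. Both helpers are hypothetical names. The sketch assumes the documented refill rate of 100 tokens per second, requests sent one at a time, and a caller that knows the cost of each request in advance. It ignores tokens refilled since the last response, so it may wait slightly longer than strictly necessary.

```javascript
// Milliseconds to wait before sending a request that costs `nextCost` tokens,
// given the last seen X-RateLimit-Remaining value.
function delayBeforeNext(remaining, nextCost, refillPerSec = 100) {
  const deficit = nextCost - remaining;
  return deficit <= 0 ? 0 : Math.ceil((deficit * 1000) / refillPerSec);
}

// Wraps fetch: waits when the previous response left too few tokens, then
// records the new remaining count. `cost` is supplied by the caller.
function createThrottledFetch(fetchImpl = fetch) {
  let remaining = Infinity; // unknown until the first response arrives
  return async function throttledFetch(url, options, cost = 1) {
    const waitMs = delayBeforeNext(remaining, cost);
    if (waitMs > 0) await new Promise((resolve) => setTimeout(resolve, waitMs));
    const response = await fetchImpl(url, options);
    const header = Number.parseInt(response.headers.get('X-RateLimit-Remaining'), 10);
    if (!Number.isNaN(header)) remaining = header;
    return response;
  };
}

console.log(delayBeforeNext(375, 5)); // 0: enough tokens remain
console.log(delayBeforeNext(5, 20));  // 150: 15 missing tokens at 100 tokens/s
```

A client built this way should still handle 429 responses, since other clients acting for the same user draw from the same bucket.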
