Configuring Quotas
Learn how to configure and manage quotas to control resource usage in FoundationaLLM.
Overview
Quotas define limits on resource usage to:
- Prevent abuse: Stop runaway usage or malicious activity
- Manage costs: Control AI model token consumption
- Ensure fairness: Distribute resources among users
- Maintain performance: Prevent system overload
Quota Types
API Raw Request Rate
Limits the number of raw API calls made to FoundationaLLM services.
| Metric | Description |
|---|---|
| api_requests | Total API calls |
| Period | Minute, Hour, Day |
Agent Request Rate
Limits the number of agent completion requests (conversations with AI).
| Metric | Description |
|---|---|
| agent_requests | Agent completion calls |
| Period | Minute, Hour, Day |
Quota Configuration
TODO: Document the UI for quota configuration in the Management Portal if available. Currently, quotas are configured via App Configuration.
Via Azure App Configuration
Quota definitions are stored in Azure App Configuration and the storage account.
Quota Definition Structure
{
  "name": "api-rate-limit",
  "metric": "api_requests",
  "limit": 100,
  "period": "minute",
  "partitioning": "user_principal_name"
}
| Field | Description |
|---|---|
| name | Unique quota identifier |
| metric | What to measure (api_requests, agent_requests) |
| limit | Maximum number of requests allowed within the period |
| period | Time window (minute, hour, day) |
| partitioning | How to segment limits (optional) |
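The field table above can be turned into a small pre-deployment check. The following is a minimal sketch, not part of FoundationaLLM: the function name `validate_quota_definition` is hypothetical, and it assumes that `limit` must be a positive integer and that `period` values are written in lowercase as in the JSON example.

```python
from typing import Any

# Allowed values taken from the tables in this document.
ALLOWED_METRICS = ("api_requests", "agent_requests")
ALLOWED_PERIODS = ("minute", "hour", "day")
ALLOWED_PARTITIONING = ("user_identifier", "user_principal_name")


def validate_quota_definition(definition: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the definition looks valid."""
    problems: list[str] = []

    name = definition.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("name must be a non-empty string")

    if definition.get("metric") not in ALLOWED_METRICS:
        problems.append(f"metric must be one of {ALLOWED_METRICS}")

    limit = definition.get("limit")
    # Assumption: limits are positive integers. bool is a subclass of int,
    # so it is rejected explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        problems.append("limit must be a positive integer")

    if definition.get("period") not in ALLOWED_PERIODS:
        problems.append(f"period must be one of {ALLOWED_PERIODS}")

    # partitioning is optional; omitting it means a global limit.
    partitioning = definition.get("partitioning")
    if partitioning is not None and partitioning not in ALLOWED_PARTITIONING:
        problems.append(f"partitioning must be one of {ALLOWED_PARTITIONING}")

    return problems
```

Running a check like this before writing a definition to App Configuration catches typos such as an unknown metric name early, instead of discovering them as a quota that silently fails to enforce.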
Partitioning Options
Partitioning determines how quotas are applied:
| Partitioning | Description |
|---|---|
| None (field omitted) | Global limit shared by all users |
| user_identifier | Per-user by internal user ID |
| user_principal_name | Per-user by Azure AD UPN/email |
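One way to picture partitioning is as the choice of counter that a request is charged against. The sketch below is illustrative only: the function `counter_key`, the key format, and the case-insensitive handling of UPNs are assumptions, not documented FoundationaLLM behavior.

```python
from typing import Optional


def counter_key(quota_name: str, partitioning: Optional[str],
                user_identifier: str, user_principal_name: str) -> str:
    """Build the (hypothetical) key under which a request's usage is counted."""
    if partitioning is None:
        # No partitioning: every caller shares one counter.
        return f"{quota_name}:global"
    if partitioning == "user_identifier":
        return f"{quota_name}:{user_identifier}"
    if partitioning == "user_principal_name":
        # Assumption: UPNs are compared case-insensitively.
        return f"{quota_name}:{user_principal_name.lower()}"
    raise ValueError(f"unknown partitioning: {partitioning!r}")
```

With no partitioning, two different users map to the same key and therefore draw from the same limit. With either per-user option, each user gets a separate counter.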
Configuration Examples
Global Rate Limit
100 API requests per minute for all users combined:
{
  "name": "global-api-limit",
  "metric": "api_requests",
  "limit": 100,
  "period": "minute"
}
Per-User Rate Limit
50 API requests per user per minute:
{
  "name": "user-api-limit",
  "metric": "api_requests",
  "limit": 50,
  "period": "minute",
  "partitioning": "user_principal_name"
}
Daily Agent Request Limit
1000 agent requests per user per day:
{
  "name": "daily-agent-limit",
  "metric": "agent_requests",
  "limit": 1000,
  "period": "day",
  "partitioning": "user_principal_name"
}
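To make the interaction of `limit` and `period` concrete, here is a minimal in-memory sketch of a fixed-window counter. This document does not state which rate-limiting algorithm FoundationaLLM uses, so the fixed-window approach, the class name `FixedWindowQuota`, and the injectable `clock` parameter are all assumptions chosen for illustration. The sketch is not thread-safe.

```python
import time
from typing import Callable, Dict, Tuple

PERIOD_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


class FixedWindowQuota:
    """Counts requests per key in fixed windows of the configured period."""

    def __init__(self, limit: int, period: str,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = PERIOD_SECONDS[period]
        self.clock = clock
        # key -> (window index, requests counted in that window)
        self._counts: Dict[str, Tuple[int, int]] = {}

    def try_acquire(self, key: str = "global") -> bool:
        """Record one request; return False if the quota is already used up."""
        window_index = int(self.clock() // self.window)
        stored_index, count = self._counts.get(key, (window_index, 0))
        if stored_index != window_index:
            count = 0  # a new window started, so the counter resets
        if count >= self.limit:
            self._counts[key] = (window_index, count)
            return False
        self._counts[key] = (window_index, count + 1)
        return True
```

For the per-user example, each user principal name would be passed as `key`, so one user exhausting their 50 requests does not affect another user's counter.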
Applying Quotas
Instance-Level Quotas
Instance-level quotas apply to the entire FoundationaLLM instance.
Agent-Level Quotas
TODO: Document agent-level quota configuration if available.
Monitoring Quota Usage
In the Chat User Portal
- Token consumption is displayed per message (if enabled)
- Users can see their current usage
In Azure Monitor
- Review API metrics and logs
- Set up alerts for quota threshold warnings
In the Management Portal
TODO: Document quota monitoring dashboards in the Management Portal if available.
Quota Exceeded Behavior
When a quota is exceeded:
- API Response: Returns HTTP 429 (Too Many Requests)
- Error Message: Indicates quota type and reset time
- User Experience: Chat User Portal shows an appropriate error
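API clients can handle the 429 response by waiting and retrying. The sketch below assumes the reset time is exposed as a `Retry-After` header in seconds. That is the standard HTTP mechanism, but this document does not confirm that FoundationaLLM uses it. The function `call_with_quota_retry` and its `send` callback are hypothetical, and `send` stands in for whatever HTTP client call the application makes.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def call_with_quota_retry(send: Callable[[], Response],
                          max_attempts: int = 3,
                          default_wait: float = 1.0,
                          sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() and retry while it returns HTTP 429, up to max_attempts calls."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for _ in range(max_attempts - 1):
        status, headers = response
        if status != 429:
            break
        # Assumption: the reset time arrives as a Retry-After header in seconds.
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait  # e.g. an HTTP-date value, not handled here
        sleep(max(wait, 0.0))
        response = send()
    return response
```

Capping the number of attempts keeps an automated client from repeatedly sending requests against a quota that will not reset soon, such as a daily limit.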
Best Practices
Setting Appropriate Limits
| Use Case | Recommendation |
|---|---|
| Development | Higher limits for testing |
| Production | Balanced limits for normal usage |
| Public apps | Stricter limits to prevent abuse |
Monitoring and Adjustment
- Start with conservative limits
- Monitor actual usage patterns
- Adjust limits based on real needs
- Set up alerts before limits are hit
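The last point, alerting before a limit is hit, amounts to comparing observed usage against a fraction of the limit. The following is a minimal sketch. The function `usage_alert_level` and the 80% default threshold are illustrative assumptions, and the usage count would come from whatever metrics source is monitored.

```python
def usage_alert_level(used: int, limit: int, warn_ratio: float = 0.8) -> str:
    """Classify usage within the current period as 'ok', 'warning', or 'exceeded'."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = used / limit
    if ratio >= 1.0:
        return "exceeded"
    if ratio >= warn_ratio:
        return "warning"
    return "ok"
```

A warning threshold below 100% gives administrators time to raise a limit or contact a user before requests start failing.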
Communication
- Document quota limits for users
- Provide clear error messages
- Offer escalation paths for legitimate high-usage scenarios
Troubleshooting
Users Hitting Limits Unexpectedly
- Review current quota configuration
- Check for automated tools consuming quota
- Consider whether the limits are set appropriately for the workload
Quotas Not Enforcing
- Verify that the quota configuration is properly deployed
- Check that the quota applies to the correct scope
- Review service logs for quota processing
Performance Impact
- Quota checking adds minimal latency
- Use appropriate partitioning to distribute checks
- Consider caching for high-volume scenarios