Performance Guide

This guide covers the performance settings that the current Cloud Native MCP Server implementation actually exposes.

What You Can Tune Today

1. Server Timeouts

server:
  readTimeoutSec: 30
  writeTimeoutSec: 0
  idleTimeoutSec: 60

Recommendations:

Keep writeTimeoutSec: 0 for SSE-heavy workloads so that long-lived event streams are not cut off by a write deadline.
Increase readTimeoutSec for slow clients or large request bodies.
Increase idleTimeoutSec for long-lived clients behind proxies.

2. Kubernetes Client Throughput

kubernetes:
  timeoutSec: 30
  qps: 100.0
  burst: 200

Recommendations:

Increase qps/burst for larger clusters.
Increase timeoutSec for heavy list/watch or expensive queries.
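
To get a feel for how qps and burst bound throughput, the following minimal sketch estimates the shortest time needed to issue a batch of Kubernetes API calls. It assumes the client limiter behaves like a token bucket that starts full (burst tokens, refilled at qps tokens per second). The function name min_seconds_for_calls is hypothetical and not part of the server.

```python
def min_seconds_for_calls(n_calls: int, qps: float, burst: int) -> float:
    """Lower bound on the time to issue n_calls through a token bucket.

    Assumption: the bucket starts full with `burst` tokens and refills at
    `qps` tokens per second. A real client limiter may behave differently.
    """
    if n_calls < 0 or qps <= 0 or burst < 1:
        raise ValueError("n_calls must be >= 0, qps > 0, burst >= 1")
    # The first `burst` calls can go out immediately; the rest wait for refill.
    waiting_calls = max(0, n_calls - burst)
    return waiting_calls / qps


if __name__ == "__main__":
    # Values taken from the kubernetes block above (qps: 100.0, burst: 200).
    for n in (150, 200, 1200):
        seconds = min_seconds_for_calls(n, qps=100.0, burst=200)
        print(f"{n} calls need at least {seconds} s")
```

Under these assumptions, a tool call that fans out into more requests than burst is throttled by qps alone, which is why large clusters may need both values raised together.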

3. Request Rate Limiting

ratelimit:
  enabled: true
  requests_per_second: 100
  burst: 200

Recommendations:

Enable in multi-tenant or internet-facing deployments.
Start with conservative values and raise gradually from metrics.
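
The interaction between requests_per_second and burst can be illustrated with a small token-bucket simulation. This is a sketch under the assumption that the server's limiter follows token-bucket semantics; the TokenBucket class and the admitted helper are illustrative names, not the server's code.

```python
class TokenBucket:
    """Minimal token-bucket limiter (illustrative; not the server's code)."""

    def __init__(self, rate: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate            # tokens added per second (requests_per_second)
        self.burst = burst          # bucket capacity (burst)
        self.tokens = float(burst)  # assumption: the bucket starts full
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def admitted(bucket: TokenBucket, arrivals: list[float]) -> int:
    """Count how many of the given arrival timestamps are admitted."""
    return sum(bucket.allow(t) for t in arrivals)


if __name__ == "__main__":
    bucket = TokenBucket(rate=100, burst=200)
    # A spike of 500 simultaneous requests: only `burst` of them pass.
    print("admitted at t=0s:", admitted(bucket, [0.0] * 500))
    # One second later the bucket has regained `rate` tokens.
    print("admitted at t=1s:", admitted(bucket, [1.0] * 500))
```

Replaying a recorded arrival pattern through a model like this is one way to pick starting values before raising them from observed metrics.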

4. Reduce Unnecessary Service Overhead

enableDisable:
  enabledServices: ["kubernetes", "prometheus", "grafana", "aggregate"]

Recommendations:

Only enable services you actually use.
Disable unused tools for lower memory and startup overhead.

5. Audit Cost Control (When Audit Is Enabled)

audit:
  enabled: true
  storage: "database"
  sampling:
    enabled: true
    rate: 0.3

Recommendations:

Use sampling on high-QPS workloads.
Use storage: database for large, query-heavy audit datasets.
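
To estimate what a sampling rate means for audit volume, the sketch below models sampling as an independent random decision per event. That model is an assumption, since the server's actual sampling strategy is not described here, and both function names are hypothetical.

```python
import random


def should_record(rate: float, rng: random.Random) -> bool:
    """Decide whether to keep one audit event under probabilistic sampling."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    # rng.random() is uniform in [0, 1), so rate=1.0 keeps every event.
    return rng.random() < rate


def expected_events_per_day(requests_per_second: float, rate: float) -> float:
    """Expected number of stored audit events per day at a steady load."""
    return requests_per_second * rate * 86_400


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    kept = sum(should_record(0.3, rng) for _ in range(100_000))
    print(f"kept {kept} of 100000 simulated events")
    per_day = expected_events_per_day(100, 0.3)
    print(f"expected stored events per day at 100 req/s: {per_day:,.0f}")
```

At a steady 100 requests per second, a rate of 0.3 still stores roughly 2.6 million events per day under this model, which is worth keeping in mind when sizing the audit database.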

Built-In Optimizations (No Public YAML Key)

The server already includes internal optimizations such as:

response truncation safeguards
efficient JSON processing paths
internal caching and pooling in service/tool layers

These are implementation details, not stable public config keys. Tune the documented settings above (server, kubernetes, ratelimit, enableDisable, audit) first.

Performance Metrics to Track

Key metrics to watch:

request rate
p95/p99 latency
error rate
active connections
memory and CPU usage

Example PromQL queries:

rate(mcp_requests_total[5m])

histogram_quantile(0.99, rate(mcp_request_duration_seconds_bucket[5m]))

rate(mcp_errors_total[5m])
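
A p99 value read from a histogram is an estimate whose precision depends on the bucket boundaries. The sketch below shows the general idea of quantile estimation from cumulative buckets using linear interpolation. It is a simplified illustration rather than Prometheus's exact implementation, and the bucket values in the example are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative histogram buckets.

    `buckets` holds (upper_bound, cumulative_count) pairs sorted by upper
    bound, ending with math.inf, similar to the `le` label of a histogram.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least two buckets, the last with le=+Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Overflow bucket: the highest finite bound is the best guess.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical latency buckets in seconds: (le, cumulative count).
    sample = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
    for q in (0.5, 0.95, 0.99):
        print(f"p{int(q * 100)} ~ {histogram_quantile(q, sample)} s")
```

In the sample data the 99th percentile falls into the overflow bucket, so the estimate is capped at the highest finite boundary. If p99 sits at a bucket edge in practice, the real latency may be higher than reported.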

Benchmarking

Health Endpoint

ab -n 10000 -c 100 http://127.0.0.1:8080/health

Tool Call Endpoint (Legacy Message Endpoint Compatibility)

Create the payload and save it as payload.json:

{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}
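
If the payload is generated as part of a scripted benchmark run, a small helper can write it to disk. The following is a minimal sketch; the function names are hypothetical, and the default file name payload.json is chosen to match the -p argument of the benchmark command.

```python
import json
from pathlib import Path


def build_tools_list_payload(request_id: int = 1) -> dict:
    """Return the JSON-RPC 2.0 tools/list request body shown above."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list", "params": {}}


def write_payload(path: str = "payload.json") -> Path:
    """Write the payload as compact JSON and return the file path."""
    target = Path(path)
    body = json.dumps(build_tools_list_payload(), separators=(",", ":"))
    target.write_text(body, encoding="utf-8")
    return target


if __name__ == "__main__":
    written = write_payload()
    print(f"wrote {written}: {written.read_text(encoding='utf-8')}")
```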

Run benchmark:

ab -n 1000 -c 10 \
  -T "application/json" \
  -H "X-API-Key: your-key" \
  -p payload.json \
  http://127.0.0.1:8080/api/kubernetes/sse/message

Production Baseline Example

server:
  mode: "streamable-http"
  addr: "0.0.0.0:8080"
  readTimeoutSec: 30
  writeTimeoutSec: 0
  idleTimeoutSec: 60

logging:
  level: "info"
  json: true

kubernetes:
  kubeconfig: ""
  timeoutSec: 30
  qps: 100.0
  burst: 200

ratelimit:
  enabled: true
  requests_per_second: 100
  burst: 200

auth:
  enabled: true
  mode: "apikey"
  apiKey: "${MCP_AUTH_API_KEY}"

audit:
  enabled: true
  storage: "database"
  sampling:
    enabled: true
    rate: 0.3

enableDisable:
  enabledServices: ["kubernetes", "prometheus", "grafana", "aggregate"]
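
Before rolling out a baseline like this, a few sanity checks on the parsed configuration can catch inconsistent values. The sketch below works on a plain dict (for example, the result of loading the YAML with a parser of your choice). The check_baseline function, the heuristics it applies, and the "sse" mode string are assumptions for illustration, not rules enforced by the server.

```python
def check_baseline(cfg: dict) -> list[str]:
    """Return human-readable warnings for a parsed configuration dict.

    All checks are heuristics chosen for this example.
    """
    warnings = []
    rl = cfg.get("ratelimit", {})
    if rl.get("enabled") and rl.get("burst", 0) < rl.get("requests_per_second", 0):
        warnings.append("ratelimit.burst is lower than requests_per_second")
    k8s = cfg.get("kubernetes", {})
    if k8s.get("burst", 0) < k8s.get("qps", 0):
        warnings.append("kubernetes.burst is lower than kubernetes.qps")
    sampling = cfg.get("audit", {}).get("sampling", {})
    if sampling.get("enabled") and not 0.0 < sampling.get("rate", 0.0) <= 1.0:
        warnings.append("audit.sampling.rate should be greater than 0 and at most 1")
    server = cfg.get("server", {})
    # Assumption: "sse" is the mode string used for SSE deployments.
    if server.get("mode") == "sse" and server.get("writeTimeoutSec", 0) != 0:
        warnings.append("server.writeTimeoutSec should be 0 for SSE workloads")
    return warnings


if __name__ == "__main__":
    baseline = {
        "server": {"mode": "streamable-http", "writeTimeoutSec": 0},
        "kubernetes": {"timeoutSec": 30, "qps": 100.0, "burst": 200},
        "ratelimit": {"enabled": True, "requests_per_second": 100, "burst": 200},
        "audit": {"enabled": True, "sampling": {"enabled": True, "rate": 0.3}},
    }
    print(check_baseline(baseline) or "no warnings")
```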

Troubleshooting

High Latency

Actions:

Increase kubernetes.timeoutSec if backend queries are slow.
Increase kubernetes.qps and kubernetes.burst if the Kubernetes API client is the bottleneck.
Check p99 latency and correlate with backend service saturation.

High Error Rate

Actions:

Verify auth mode and credentials.
Check backend service availability (Prometheus/Grafana/etc.).
Inspect audit logs via /api/audit/logs if audit is enabled.

High Memory Usage

Actions:

Reduce enabled services/tools.
Enable audit sampling.
Reduce burst limits if traffic spikes cause memory pressure.

Best Practices

Prefer streamable-http in production unless a client requires SSE.
Tune one dimension at a time (timeout, then qps/burst, then ratelimit).
Keep load tests representative of real tool usage.
Track p95/p99 latency, not only average latency.
