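Deployment Guide#
This guide covers deployment strategies and best practices for Cloud Native MCP Server.
Prerequisites#
System Requirements#
- Operating system: Linux, macOS, or Windows
- CPU: 1 core minimum, 2 or more recommended
- Memory: 512 MB minimum, 1 GB or more recommended
- Disk: 100 MB minimum
- Network: access to the Kubernetes cluster and to any configured services
Software Requirements#
- Go: 1.25+ (only when building from source)
- Docker: 20.10+ (for containerized deployment)
- kubectl: configured with access to the cluster
- Helm: 3.0+ (for Helm deployment)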
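Before deploying, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch: `check_tools` is a hypothetical helper, not part of the project, and it only checks that a tool is present, not that it meets the minimum version.

```shell
# check_tools TOOL...
# Prints one "missing: <tool>" line for every tool that is not on the PATH.
# Returns 0 when all tools are present and 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For a Helm-based deployment, a call such as `check_tools docker kubectl helm` prints nothing and returns 0 when all three tools are installed.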
Service Dependencies#
Every integration is optional. The server can connect to:
- Grafana
- Prometheus
- Kibana
- Elasticsearch
- Alertmanager
- Jaeger
- OpenTelemetry
Quick Deployment#
Binary Deployment#
```bash
# Download the latest release
wget https://github.com/mahmut-Abi/cloud-native-mcp-server/releases/latest/download/cloud-native-mcp-server-linux-amd64
chmod +x cloud-native-mcp-server-linux-amd64

# Create the configuration
cat > config.yaml << EOF
server:
  mode: "sse"
  addr: "0.0.0.0:8080"
logging:
  level: "info"
kubernetes:
  kubeconfig: ""
EOF

# Run the server
./cloud-native-mcp-server-linux-amd64
```
Docker Quick Start#
```bash
docker run -d \
  --name cloud-native-mcp-server \
  -p 8080:8080 \
  -v ~/.kube:/root/.kube:ro \
  mahmutabi/cloud-native-mcp-server:latest
```
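Because the container starts in the background, the server may not accept requests immediately. A small retry helper can poll until it responds. This is a sketch under assumptions: `retry` is a hypothetical helper, and it assumes the container serves the same `/health` endpoint that the Kubernetes probes in this guide use.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between attempts.
# Returns 0 on the first success and 1 after MAX_ATTEMPTS failed attempts.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}
```

For example, `retry 15 2 curl -fsS http://localhost:8080/health` tries up to 15 times, two seconds apart, and gives up with a non-zero status if the server never answers.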
Kubernetes Deployment#
Basic Deployment#
Create the deployment manifest:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloud-native-mcp-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloud-native-mcp-server
  template:
    metadata:
      labels:
        app: cloud-native-mcp-server
    spec:
      serviceAccountName: cloud-native-mcp-server
      containers:
        - name: cloud-native-mcp-server
          image: mahmutabi/cloud-native-mcp-server:latest
          ports:
            - containerPort: 8080
          env:
            - name: MCP_MODE
              value: "sse"
            - name: MCP_ADDR
              value: "0.0.0.0:8080"
            - name: MCP_LOG_LEVEL
              value: "info"
          volumeMounts:
            - name: kubeconfig
              mountPath: /root/.kube
              readOnly: true
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: kubeconfig
          configMap:
            name: kubeconfig
```
Service Account and RBAC#
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-native-mcp-server
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cloud-native-mcp-server
rules:
  # Read-only access. "describe" is not an RBAC verb; kubectl describe
  # is served by get and list.
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cloud-native-mcp-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cloud-native-mcp-server
subjects:
  - kind: ServiceAccount
    name: cloud-native-mcp-server
    namespace: default
```
Service#
```yaml
apiVersion: v1
kind: Service
metadata:
  name: cloud-native-mcp-server
  namespace: default
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
  selector:
    app: cloud-native-mcp-server
```
Ingress#
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cloud-native-mcp-server
  namespace: default
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: k8s-mcp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: cloud-native-mcp-server
                port:
                  number: 8080
```
```bash
# Apply all manifests
kubectl apply -f deploy/kubernetes/

# Verify the deployment
kubectl get pods -l app=cloud-native-mcp-server
kubectl logs -l app=cloud-native-mcp-server

# Test connectivity (port-forward blocks, so run curl in a second terminal)
kubectl port-forward svc/cloud-native-mcp-server 8080:8080
curl http://localhost:8080/health
```
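`kubectl get pods` only shows a snapshot, so a port-forward started too early can fail because the Pod is not ready yet. One option is to block on `kubectl rollout status` first. The wrapper below is a sketch: `wait_for_rollout` and the `KUBECTL` override variable are hypothetical conveniences, and the `default` namespace and `120s` timeout are assumed defaults.

```shell
# Allow the kubectl binary to be overridden, for example in tests.
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_rollout DEPLOYMENT [NAMESPACE] [TIMEOUT]
# Blocks until the Deployment has finished rolling out or TIMEOUT elapses.
# The exit status is that of "kubectl rollout status".
wait_for_rollout() {
    "$KUBECTL" rollout status "deployment/$1" \
        --namespace "${2:-default}" \
        "--timeout=${3:-120s}"
}
```

Running `wait_for_rollout cloud-native-mcp-server` before the port-forward step returns once the Deployment reports a completed rollout.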
Docker Deployment#
Docker Compose#
Create docker-compose.yml:
```yaml
version: '3.8'
services:
  cloud-native-mcp-server:
    image: mahmutabi/cloud-native-mcp-server:latest
    container_name: cloud-native-mcp-server
    ports:
      - "8080:8080"
    volumes:
      - ~/.kube:/root/.kube:ro
      - ./config.yaml:/app/config.yaml:ro
    environment:
      - MCP_MODE=sse
      - MCP_ADDR=0.0.0.0:8080
      - MCP_LOG_LEVEL=info
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    networks:
      - monitoring
networks:
  monitoring:
    external: true
```
Running with Docker Compose#
```bash
# Start
docker-compose up -d

# Follow the logs
docker-compose logs -f

# Stop
docker-compose down

# Restart
docker-compose restart
```
Custom Docker Image#
Build your own image:
```dockerfile
# Build stage: compile a static Linux binary
FROM golang:1.25-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o cloud-native-mcp-server ./cmd/server

# Runtime stage: copy only the binary and CA certificates
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/cloud-native-mcp-server .
EXPOSE 8080
CMD ["./cloud-native-mcp-server"]
```
Build and push:
```bash
# Build
docker build -t your-registry/cloud-native-mcp-server:latest .

# Push
docker push your-registry/cloud-native-mcp-server:latest
```
Helm Deployment#
Installing from the Chart Repository#
```bash
# Add the repository
helm repo add k8s-mcp https://mahmut-Abi.github.io/cloud-native-mcp-server

# Update the repository index
helm repo update

# Install
helm install cloud-native-mcp-server k8s-mcp/cloud-native-mcp-server

# Upgrade
helm upgrade cloud-native-mcp-server k8s-mcp/cloud-native-mcp-server

# Uninstall
helm uninstall cloud-native-mcp-server
```
Custom Values#
Create values.yaml:
```yaml
replicaCount: 2
image:
  repository: mahmutabi/cloud-native-mcp-server
  tag: latest
  pullPolicy: IfNotPresent
service:
  type: ClusterIP
  port: 8080
ingress:
  enabled: true
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  hosts:
    - host: k8s-mcp.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: k8s-mcp-tls
      hosts:
        - k8s-mcp.example.com
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80
config:
  server:
    mode: "sse"
    addr: "0.0.0.0:8080"
  logging:
    level: "info"
    format: "json"
  kubernetes:
    kubeconfig: ""
  grafana:
    enabled: true
    url: "http://grafana:3000"
    apiKey: "${GRAFANA_API_KEY}"
  prometheus:
    enabled: true
    address: "http://prometheus:9090"
rbac:
  create: true
  rules:
    # "describe" is not an RBAC verb; get and list cover it
    - apiGroups: ["*"]
      resources: ["*"]
      verbs: ["get", "list", "watch"]
serviceAccount:
  create: true
  name: ""
nodeSelector: {}
tolerations: []
affinity: {}
```
Installing with Custom Values#
```bash
helm install cloud-native-mcp-server ./deploy/helm/cloud-native-mcp-server -f values.yaml
```
Production Considerations#
High Availability#
Run multiple replicas and set appropriate resource limits:
```yaml
replicaCount: 3
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1000m"
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
```
Resource Optimization#
Tune the server parameters that the current release supports:
```yaml
config:
  server:
    readTimeoutSec: 30
    writeTimeoutSec: 0
    idleTimeoutSec: 60
  kubernetes:
    timeoutSec: 30
    qps: 100.0
    burst: 200
  ratelimit:
    enabled: true
    requests_per_second: 100
    burst: 200
```
1. Enable Authentication#
```yaml
config:
  auth:
    enabled: true
    mode: "apikey"
    apiKey: "${MCP_AUTH_API_KEY}"
```
2. Use Secrets#
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: k8s-mcp-secrets
type: Opaque
stringData:
  mcp-api-key: "your-secret-key"
  grafana-api-key: "your-grafana-key"
```
3. Network Policies#
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cloud-native-mcp-server
spec:
  podSelector:
    matchLabels:
      app: cloud-native-mcp-server
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 443
        # DNS, so that in-cluster service names still resolve.
        # Add the ports of any enabled integrations as well.
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```
Logging and Monitoring#
```yaml
config:
  logging:
    level: "info"
    json: true
  audit:
    enabled: true
    storage: "file"
    format: "json"
    file:
      path: "/var/log/cloud-native-mcp-server/audit.log"
      maxSizeMB: 100
      maxBackups: 10
      maxAgeDays: 30
      compress: true
```
Add Prometheus monitoring:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: cloud-native-mcp-server
  namespace: default
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: 8080
  selector:
    app: cloud-native-mcp-server
```
Monitoring and Observability#
Health Checks#
The server exposes health check endpoints:
```bash
# Basic health check
curl http://localhost:8080/health

# Detailed health information
curl http://localhost:8080/health/detailed

# Readiness check
curl http://localhost:8080/ready
```
Prometheus metrics are available at the /metrics endpoint:
```bash
curl http://localhost:8080/metrics
```
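Key metrics:
- mcp_requests_total: total number of requests
- mcp_request_duration_seconds: request duration
- mcp_cache_hits_total: number of cache hits
- mcp_cache_misses_total: number of cache misses
- mcp_active_connections: number of active connections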
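As an illustration of how these counters can be combined, the sketch below derives a cache hit ratio from the text output of /metrics. It assumes the standard Prometheus text exposition format without trailing timestamps, and it sums across label sets in case the counters carry labels. `cache_hit_ratio` is a hypothetical helper name.

```shell
# cache_hit_ratio
# Reads Prometheus text-format metrics on stdin and prints
# hits / (hits + misses) with three decimals, or "n/a" when both are zero.
cache_hit_ratio() {
    awk '
        /^#/ { next }                                   # skip HELP and TYPE lines
        $1 ~ /^mcp_cache_hits_total([{]|$)/   { hits += $NF }
        $1 ~ /^mcp_cache_misses_total([{]|$)/ { misses += $NF }
        END {
            total = hits + misses
            if (total == 0) { print "n/a"; exit }
            printf "%.3f\n", hits / total
        }
    '
}
```

A typical call is `curl -fsS http://localhost:8080/metrics | cache_hit_ratio`. A persistently low ratio may point to a cache that is too small or to highly varied requests.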
Structured JSON logs:
```json
{
  "level": "info",
  "timestamp": "2024-01-01T00:00:00Z",
  "message": "Starting Cloud Native MCP Server",
  "version": "1.0.0",
  "mode": "sse"
}
```
Audit Logs#
The audit log records every operation:
```json
{
  "timestamp": "2024-01-01T00:00:00Z",
  "request_id": "abc123",
  "tool": "kubernetes_list_resources_summary",
  "params": {"kind": "Pod"},
  "duration_ms": 123,
  "status": "success"
}
```
Security Best Practices#
1. Least-Privilege RBAC#
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cloud-native-mcp-server
rules:
  # Read-only access to the most common resources
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "secrets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
  # "describe" is not an RBAC verb. Describe-style troubleshooting needs
  # get/list on the resource plus read access to events.
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch"]
```
2. Secret Management#
Store sensitive data in Kubernetes Secrets:
```bash
kubectl create secret generic k8s-mcp-secrets \
  --from-literal=mcp-api-key='your-key' \
  --from-literal=grafana-api-key='your-grafana-key'
```
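The placeholder values above should be replaced with real secrets. If the MCP API key is yours to choose, one way to produce a random value is sketched below. `generate_api_key` is a hypothetical helper, and the 32-byte length is an arbitrary assumption rather than a requirement of the server.

```shell
# generate_api_key
# Prints a random 64-character lowercase hexadecimal string,
# built from 32 bytes read from /dev/urandom.
generate_api_key() {
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
}
```

The result can be passed straight to the command above, for example `--from-literal=mcp-api-key="$(generate_api_key)"`, so the key never has to be written to a file.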
Expose the Secret values as environment variables:
```yaml
env:
  - name: MCP_AUTH_API_KEY
    valueFrom:
      secretKeyRef:
        name: k8s-mcp-secrets
        key: mcp-api-key
```
3. Network Security#
- Use TLS for external access
- Enforce network policies
- Restrict ingress and egress traffic
- Use a service mesh for mTLS
4. Pod Security#
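```yaml
# Pod-level settings (under spec.template.spec)
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
# Container-level settings: capabilities and readOnlyRootFilesystem
# are only valid on a container's securityContext
containers:
  - name: cloud-native-mcp-server
    securityContext:
      capabilities:
        drop:
          - ALL
      readOnlyRootFilesystem: true
```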
5. Image Security#
- Use signed images
- Scan images for vulnerabilities
- Keep images up to date
- Use specific version tags
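The last point can be checked mechanically. The sketch below classifies an image reference by how tightly it is pinned. `image_pinning` is a hypothetical helper, and the `1.0.0` tag mentioned in the usage note is illustrative only, not a confirmed published tag.

```shell
# image_pinning IMAGE_REFERENCE
# Prints "digest" for references pinned by sha256 digest, "tag" for an
# explicit tag other than latest, and "latest" when the tag is latest
# or missing (a missing tag resolves to latest).
image_pinning() {
    ref=$1
    # Drop the registry and repository path so a registry port such as
    # localhost:5000 is not mistaken for a tag.
    name=${ref##*/}
    case "$ref" in
        *@sha256:*) echo "digest" ;;
        *)
            case "$name" in
                *:latest) echo "latest" ;;
                *:*)      echo "tag" ;;
                *)        echo "latest" ;;
            esac
            ;;
    esac
}
```

`image_pinning mahmutabi/cloud-native-mcp-server:latest` prints `latest`, while the same image with a tag such as `:1.0.0` prints `tag`. A CI step could fail the build whenever the result is `latest`.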
Troubleshooting#
Common Issues#
1. Connection Refused#
Problem: cannot connect to the server
Solution:
```bash
# Check the pod status
kubectl get pods -l app=cloud-native-mcp-server

# Check the logs
kubectl logs -l app=cloud-native-mcp-server

# Check the service
kubectl get svc cloud-native-mcp-server

# Test through a port-forward (it blocks, so run curl in a second terminal)
kubectl port-forward svc/cloud-native-mcp-server 8080:8080
curl http://localhost:8080/health
```
2. Authentication Failure#
Problem: 401 Unauthorized
Solution:
```bash
# Check the authentication configuration
kubectl get configmap k8s-mcp-config -o yaml

# Verify the secrets
kubectl get secret k8s-mcp-secrets -o yaml

# Test with the correct header
curl -H "X-API-Key: your-key" http://localhost:8080/health
```
3. Kubernetes API Access Denied#
Problem: cannot access the Kubernetes API
Solution:
```bash
# Check the RBAC rules
kubectl get clusterrole cloud-native-mcp-server -o yaml

# Check the service account
kubectl get sa cloud-native-mcp-server

# Verify the cluster role binding
kubectl get clusterrolebinding cloud-native-mcp-server

# Test a permission
kubectl auth can-i list pods --as=system:serviceaccount:default:cloud-native-mcp-server
```
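Checking one permission at a time gets tedious when several are needed. The sketch below loops `kubectl auth can-i` over a list of verb and resource pairs. `check_permissions` and the `KUBECTL` override variable are hypothetical, and which pairs the server actually needs depends on the tools in use.

```shell
# Allow the kubectl binary to be overridden, for example in tests.
KUBECTL="${KUBECTL:-kubectl}"

# check_permissions SUBJECT VERB:RESOURCE...
# Prints "allowed: VERB RESOURCE" or "denied: VERB RESOURCE" per pair.
# Returns 1 if at least one pair is denied, 0 otherwise.
check_permissions() {
    subject=$1
    shift
    denied=0
    for pair in "$@"; do
        verb=${pair%%:*}
        resource=${pair#*:}
        # "kubectl auth can-i" exits 0 when the action is allowed
        if "$KUBECTL" auth can-i "$verb" "$resource" --as="$subject" >/dev/null 2>&1; then
            echo "allowed: $verb $resource"
        else
            echo "denied: $verb $resource"
            denied=1
        fi
    done
    return "$denied"
}
```

For the service account used in this guide, a call could look like `check_permissions system:serviceaccount:default:cloud-native-mcp-server list:pods get:deployments watch:services`.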
4. High Memory Usage#
Problem: Pod OOMKilled
Solution:
```yaml
# Raise the memory limit
resources:
  limits:
    memory: "1Gi"

# Smooth out traffic bursts
config:
  ratelimit:
    enabled: true
    requests_per_second: 80
    burst: 120
  # Reduce audit overhead
  audit:
    sampling:
      enabled: true
      rate: 0.3
```
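Before raising limits, it is worth confirming that the restarts really are OOM kills. The sketch below reads the last termination reason from the Pod status. `last_termination_reason` and the `KUBECTL` override variable are hypothetical, and the pod name must be supplied by the caller.

```shell
# Allow the kubectl binary to be overridden, for example in tests.
KUBECTL="${KUBECTL:-kubectl}"

# last_termination_reason POD [NAMESPACE]
# Prints the reason recorded for each container's previous termination,
# for example "OOMKilled". Prints nothing if no container has restarted.
last_termination_reason() {
    "$KUBECTL" get pod "$1" --namespace "${2:-default}" \
        -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}
```

Pod names can be taken from `kubectl get pods -l app=cloud-native-mcp-server`. If the reason is something other than `OOMKilled`, the memory limit is probably not the cause.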
5. Slow Responses#
Problem: requests time out
Solution:
```yaml
# Raise the Kubernetes client timeout and throughput
kubernetes:
  timeoutSec: 60
  qps: 120
  burst: 240

# Adjust the HTTP timeouts
server:
  readTimeoutSec: 60
  writeTimeoutSec: 0
  idleTimeoutSec: 90

# Prefer the summary tools:
# use kubernetes_list_resources_summary instead of kubernetes_list_resources
```
Debug Mode#
Enable debug logging:
```yaml
logging:
  level: "debug"
```
Or through an environment variable:
```bash
export MCP_LOG_LEVEL=debug
```
Health Check Script#
```bash
#!/bin/bash
echo "Checking Cloud Native MCP Server health..."

# Check the health endpoint
curl -f http://localhost:8080/health || exit 1

# Check the metrics endpoint
curl -f http://localhost:8080/metrics > /dev/null || exit 1

# Check readiness
curl -f http://localhost:8080/ready || exit 1

echo "All checks passed!"
```
Related Documentation#