Creata CP
Platform Hub · Global
SLO · Service Level Objectives
All within budget
Availability
99.97%
Target ≥ 99.95% · Met
Error Budget (30d): 72%
Remaining downtime budget: 7m 24s
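As a rough illustration of how a downtime budget follows from an availability target, the sketch below computes the total downtime a target allows over a window. The function name `downtime_budget` and the time-based accounting are assumptions made for this example; the dashboard may account for its budget differently (for instance per request rather than per minute), so this is not expected to reproduce the figures shown above.

```python
from datetime import timedelta


def downtime_budget(target: float, window: timedelta) -> timedelta:
    """Total downtime an availability target allows over a window.

    `target` is a fraction, e.g. 0.9995 for 99.95%.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    # The budget is the share of the window that may be unavailable.
    return window * (1.0 - target)


# Example: a 99.95% target over an assumed 30-day window.
total = downtime_budget(0.9995, timedelta(days=30))
print(f"{total.total_seconds() / 60:.1f} min")  # prints "21.6 min"
```

A time-based budget like this is the simplest model; request-based budgets weight each minute by its traffic and can give different remaining values for the same incident.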
P95 Latency
148ms
Target < 200 ms · Met
Burn rate: 0.23x
Requests over threshold: 0.03%
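For readers unfamiliar with burn rate, a minimal sketch of the usual definition: the observed fraction of bad events divided by the fraction the SLO allows. The function names and the sample inputs below are hypothetical, and the dashboard's exact formula is not given in the source.

```python
from datetime import timedelta
from typing import Optional


def burn_rate(bad_ratio: float, slo_target: float) -> float:
    """Observed bad-event fraction divided by the fraction the SLO allows."""
    allowed = 1.0 - slo_target
    if allowed <= 0.0:
        raise ValueError("slo_target must be below 1.0")
    return bad_ratio / allowed


def time_to_exhaustion(window: timedelta, rate: float) -> Optional[timedelta]:
    """Time for a full budget to run out at a constant burn rate.

    Returns None when nothing is being burned (rate <= 0).
    """
    if rate <= 0.0:
        return None
    return window / rate


# Hypothetical inputs: 0.2% bad requests against a 99.9% objective.
rate = burn_rate(bad_ratio=0.002, slo_target=0.999)
print(f"burn rate: {rate:.2f}x")  # prints "burn rate: 2.00x"
print(time_to_exhaustion(timedelta(days=30), 5.0))  # prints "6 days, 0:00:00"
```

A burn rate below 1x, such as the 0.23x shown here, means the budget would outlast its window if the current rate held.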
RPO (Data Loss)
18s
Target < 1 min · Met
Last measured: 12:05:42
Kafka lag + replication
RTO (Recovery)
8 min
Target < 15 min · Met
Last drill: 2026-03-28
Monthly DR drill · Next: 04-28
Metrics Dashboard
Prometheus scrape: 15s · Live
Request Rate (rate[5m], 1h window)
P95 Latency by Subsystem (ms, 15m window)
Error Rate (%, 1h window)
Kafka Lag (messages, 1h window)
Zenith Anchor Queue (depth, 1h window)
CPU / Memory across pods (%, 30m window)
Active Alerts
2 critical · 3 warning · 1 info
Severity · Service · Metric · Threshold · Current Value · Fired At
Prometheus Alert Rule Sample
/etc/prometheus/rules/approval_latency.yml
groups:
  - name: creata.approval.slo
    interval: 30s
    rules:
      - alert: ApprovalL1P95TooHigh
        expr: histogram_quantile(0.95, sum(rate(approval_duration_seconds_bucket[5m])) by (le, tenant)) > 0.2
        for: 5m
        labels:
          severity: warning
          # Quoted so that YAML does not parse the template as a flow mapping
          track: "{{ $labels.tenant }}"
        annotations:
          summary: "Approval L1 P95 latency > 200ms on {{ $labels.tenant }}"
          runbook: "https://runbook.creata/cp/approval-latency.md"
      # Dual threshold: a 5x burn rate is escalated to critical
      - alert: ApprovalBudgetBurn5x
        expr: error_budget_burn_rate{slo="approval_latency"} > 5
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Approval SLO burn rate 5x: budget exhausted in 2h"
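The first rule relies on `histogram_quantile`, which estimates a quantile from cumulative histogram buckets by interpolating linearly inside the bucket that contains the target rank. The sketch below is a simplified Python rendering of that idea, not Prometheus's actual implementation: it assumes the buckets are already sorted, cumulative, and end with a `+Inf` bucket, and the sample bucket values are made up.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch: expects 0 < q <= 1, buckets sorted by upper bound,
    cumulative counts, and a final +Inf bucket.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            # Linear interpolation inside the bucket holding the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Made-up cumulative counts for upper bounds given in seconds.
sample = [(0.05, 50), (0.1, 80), (0.2, 96), (0.5, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, sample)
print(f"P95 = {p95 * 1000:.0f} ms, alert condition: {p95 > 0.2}")
```

Because the estimate is interpolated, its accuracy depends on bucket boundaries: a threshold such as 0.2 s is measured most reliably when 0.2 is itself a bucket bound.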
Log Volume 24h
Loki
184.2 GB · average 2.1 MB/s
Warn+Error 3.8% · ERROR only 0.12%
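The per-second average follows directly from the 24-hour volume. A small sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB); the function name is hypothetical.

```python
def average_rate_mb_per_s(volume_gb: float, window_hours: float = 24.0) -> float:
    """Average throughput in MB/s, using decimal units (1 GB = 1000 MB)."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return volume_gb * 1000.0 / (window_hours * 3600.0)


rate = average_rate_mb_per_s(184.2)
print(f"{rate:.1f} MB/s")  # prints "2.1 MB/s"
```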
SRE Tool Shortcuts