Monitoring

Key Metrics to Watch

| Metric                         | Healthy threshold | Alert threshold |
|--------------------------------|-------------------|-----------------|
| Assessment job completion rate | > 98%             | < 95%           |
| Average collection duration    | < 60s             | > 90s           |
| Average quality score          | > 80%             | < 60% (per job) |
| Phase 1 failure rate           | 0%                | Any failure     |
| API error rate (403)           | < 5% of calls     | > 15%           |
| API throttling rate (429)      | < 2% of calls     | > 10%           |
| EDS write success rate         | 100%              | Any failure     |
| Admin notification delivery    | < 2 min           | > 5 min         |
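
The thresholds above lend themselves to an automated check. The following is a minimal Python sketch, not the CYC implementation: the metric keys, the `classify` function, and the intermediate "warning" band (values that fall between the healthy and alert thresholds, which the table leaves unnamed) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    healthy: float          # boundary of the healthy range
    alert: float            # boundary of the alert range
    higher_is_better: bool  # True for rates/scores, False for durations/errors


# Hypothetical metric keys; the numbers are copied from the table above.
THRESHOLDS = {
    "job_completion_rate_pct": Threshold(98, 95, True),
    "avg_collection_duration_s": Threshold(60, 90, False),
    "avg_quality_score_pct": Threshold(80, 60, True),
    "api_403_rate_pct": Threshold(5, 15, False),
    "api_429_rate_pct": Threshold(2, 10, False),
    "admin_notification_delivery_min": Threshold(2, 5, False),
}

# Metrics where any deviation from the required value is an alert.
ZERO_TOLERANCE = {
    "phase1_failure_rate_pct": 0.0,
    "eds_write_success_rate_pct": 100.0,
}


def classify(metric: str, value: float) -> str:
    """Return 'healthy', 'warning' or 'alert' for one metric reading."""
    if metric in ZERO_TOLERANCE:
        return "healthy" if value == ZERO_TOLERANCE[metric] else "alert"

    t = THRESHOLDS[metric]
    if t.higher_is_better:
        if value > t.healthy:
            return "healthy"
        if value < t.alert:
            return "alert"
    else:
        if value < t.healthy:
            return "healthy"
        if value > t.alert:
            return "alert"
    # Assumption: readings between the two thresholds are treated as a warning.
    return "warning"
```

With this sketch, `classify("job_completion_rate_pct", 96)` falls in the assumed warning band, while `classify("eds_write_success_rate_pct", 99.9)` is an alert because that metric tolerates no failures.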

Dashboard

The CYC operations dashboard shows:

  • Jobs completed in last 24 hours with quality score distribution
  • Error rate by domain (Defender, Cost, Sentinel, etc.)
  • Collection duration percentiles (p50, p90, p99)
  • Degraded quality job list with recommended actions
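
For readers who want to reproduce the duration percentiles outside the dashboard, here is a rough Python sketch using the standard library. The function name `duration_percentiles` and the choice of the "inclusive" interpolation method are assumptions; the dashboard may compute its percentiles differently.

```python
import statistics


def duration_percentiles(durations_s):
    """Return p50, p90 and p99 of collection durations, in seconds."""
    if len(durations_s) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    # quantiles(n=100) returns 99 cut points; percentile k sits at index k - 1.
    cuts = statistics.quantiles(durations_s, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}
```

A large gap between p50 and p99 suggests that a small number of slow jobs is inflating the average collection duration.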

Audit Log Review

Review the EDS audit log weekly for:

  • Jobs that have been in the collecting state for more than 10 minutes (stuck jobs)
  • Jobs where deleted_at is still null even though the TTL has expired (deletion failures)
  • Unusual access patterns in the access log
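
The first two checks in the list above can be scripted. The sketch below assumes each audit record is a dictionary; only `deleted_at` and the `collecting` state come from this page, while `job_id`, `state`, `state_entered_at`, and `ttl_expires_at` are hypothetical field names chosen for illustration and would need to be mapped to the actual EDS audit log schema.

```python
from datetime import datetime, timedelta, timezone

# Stuck-job cutoff taken from the review checklist above.
STUCK_AFTER = timedelta(minutes=10)


def find_stuck_jobs(jobs, now):
    """IDs of jobs that have been in the collecting state for over 10 minutes."""
    return [
        job["job_id"]
        for job in jobs
        if job["state"] == "collecting"
        and now - job["state_entered_at"] > STUCK_AFTER
    ]


def find_deletion_failures(jobs, now):
    """IDs of jobs whose TTL has passed but deleted_at is still unset."""
    return [
        job["job_id"]
        for job in jobs
        if job["deleted_at"] is None and job["ttl_expires_at"] < now
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {
            "job_id": "job-a",  # hypothetical record
            "state": "collecting",
            "state_entered_at": now - timedelta(minutes=25),
            "ttl_expires_at": now + timedelta(days=1),
            "deleted_at": None,
        },
    ]
    print("stuck:", find_stuck_jobs(sample, now))
    print("deletion failures:", find_deletion_failures(sample, now))
```

Passing `now` explicitly keeps both functions deterministic, which makes them easy to test against a fixed snapshot of the log.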