Skip to main content

Monitoring

Prometheus metrics for performance and availability tracking.


📊 Metrics Endpoint

curl http://localhost:5000/metrics

📈 All Metrics

MetricTypeDescriptionLabelsUse Case
dns_lookup_totalCounterTotal DNS lookupsserver, query_type, resultTrack query volume + success rate
dns_lookup_duration_secondsHistogramLookup duration (all servers)server, query_typeMeasure latency, calculate P95/P99
dns_lookup_errors_totalCounterTotal lookup errorsserver, error_typeIdentify problematic servers
dns_tasks_totalCounterTotal DNS tasksstatusMonitor async task processing
dns_api_requests_totalCounterTotal API requestsendpointTrack API usage patterns
dns_api_result_polls_totalCounterResult poll requests-Monitor polling frequency
dns_response_time_secondsHistogramDNS response timeserverDetailed server latency
dns_total_queriesCounterTotal queries per serverserverQuery distribution
dns_noerror_countCounterSuccessful resolutionsserverCount successful queries
dns_failure_countCounterFailed queriesserver, rcodeTrack error types (NXDOMAIN, SERVFAIL)
dns_avg_response_time_secondsGaugeAverage response timeserverQuick latency overview
dns_query_types_countCounterQueries by typeqtypeDistribution (A, AAAA, PTR, etc.)

🔍 Common PromQL Queries

Success Rate

# Per-server success rate (%)
sum by (server) (dns_lookup_total{result="success"}) /
sum by (server) (dns_lookup_total) * 100

Latency Analysis

# P95 latency per server
histogram_quantile(0.95,
sum by (server, le) (rate(dns_lookup_duration_seconds_bucket[5m]))
)

# Average latency
rate(dns_lookup_duration_seconds_sum[5m]) /
rate(dns_lookup_duration_seconds_count[5m])

Error Tracking

# Error rate per server
sum by (server) (rate(dns_lookup_errors_total[5m]))

# Top error types
topk(5, sum by (error_type) (rate(dns_lookup_errors_total[5m])))

Query Distribution

# Queries per server
sum by (server) (rate(dns_total_queries[5m]))

# Query types distribution
sum by (qtype) (dns_query_types_count)

🎯 Quick Setup

1. Enable Metrics (API)

# conf/config.yaml
api:
enable_metrics: true
metrics_port: 9090

2. Enable Metrics (Worker)

dnstestergo worker --enable-metrics --metrics-port 9091

3. Configure Prometheus

# prometheus.yml
scrape_configs:
- job_name: 'dnstester-api'
static_configs:
- targets: ['localhost:9090']

- job_name: 'dnstester-worker'
static_configs:
- targets: ['localhost:9091']

📊 Grafana Dashboard

Key Panels:

PanelQueryVisualization
Total Queriessum(dns_lookup_total)Single Stat
Success Ratesum(dns_lookup_total{result="success"})/sum(dns_lookup_total)*100Gauge
P95 Latencyhistogram_quantile(0.95, sum by (le) (rate(dns_lookup_duration_seconds_bucket[5m])))Graph
Errors/secsum(rate(dns_lookup_errors_total[5m]))Graph
Query Typessum by (qtype) (dns_query_types_count)Pie Chart
Server Comparisonsum by (server) (rate(dns_total_queries[5m]))Bar Chart

🐛 Troubleshooting

ProblemCheckSolution
No metrics datacurl localhost:9090/metricsVerify enable_metrics: true in config
Stale metricsPerform test queryMetrics update on API requests
Missing worker metricsWorker started with --enable-metrics?Add flag and restart
Prometheus "down"Check http://localhost:9090/targetsVerify ports and firewall

📚 See Also