Monitoring Quiz API Performance with Prometheus and Grafana
Instrument your quiz API with Prometheus metrics, build Grafana dashboards, and set up alerts that catch problems before users notice.
You Cannot Fix What You Cannot See
Your quiz API might be running fine right now, but when 500 students hit it during an exam, will you know about the latency spike before they start complaining? Monitoring with Prometheus and Grafana gives you visibility into request latency, error rates, database performance, and quiz-specific metrics like completion rates and scoring distributions.
This guide walks you through instrumenting a Node.js quiz API, defining custom metrics, building dashboards, and creating alerts.
Prerequisites
- Node.js quiz API (Express or Fastify)
- Docker for running Prometheus and Grafana locally
- Basic understanding of HTTP metrics
Setting Up prom-client
Install the Prometheus client library:
npm install prom-client
Create a metrics module at src/metrics.ts:
```typescript
import {
  Registry,
  Counter,
  Histogram,
  Gauge,
  collectDefaultMetrics,
} from "prom-client";

export const registry = new Registry();

// Collect Node.js runtime metrics (memory, CPU, event loop)
collectDefaultMetrics({ register: registry });

// HTTP request metrics
export const httpRequestDuration = new Histogram({
  name: "http_request_duration_seconds",
  help: "Duration of HTTP requests in seconds",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
  registers: [registry],
});

export const httpRequestTotal = new Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests",
  labelNames: ["method", "route", "status_code"],
  registers: [registry],
});

// Quiz-specific metrics
export const quizCompletionDuration = new Histogram({
  name: "quiz_completion_duration_seconds",
  help: "Time taken to complete a quiz",
  labelNames: ["quiz_id", "difficulty"],
  buckets: [30, 60, 120, 300, 600, 900, 1800],
  registers: [registry],
});

export const quizScore = new Histogram({
  name: "quiz_score_percentage",
  help: "Distribution of quiz scores as percentages",
  labelNames: ["quiz_id", "difficulty"],
  buckets: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
  registers: [registry],
});

export const quizSubmissions = new Counter({
  name: "quiz_submissions_total",
  help: "Total quiz submissions",
  labelNames: ["quiz_id", "difficulty", "passed"],
  registers: [registry],
});

export const activeQuizSessions = new Gauge({
  name: "active_quiz_sessions",
  help: "Number of currently active quiz sessions",
  registers: [registry],
});

// Database metrics
export const dbQueryDuration = new Histogram({
  name: "db_query_duration_seconds",
  help: "Duration of database queries",
  labelNames: ["operation", "table"],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1],
  registers: [registry],
});

export const dbConnectionPool = new Gauge({
  name: "db_connection_pool_size",
  help: "Current database connection pool size",
  labelNames: ["state"],
  registers: [registry],
});
```
Instrumenting Express
Add middleware to capture HTTP metrics:
```typescript
import express from "express";
import { registry, httpRequestDuration, httpRequestTotal } from "./metrics";

const app = express();

// Metrics endpoint for Prometheus to scrape
app.get("/metrics", async (req, res) => {
  res.setHeader("Content-Type", registry.contentType);
  res.send(await registry.metrics());
});

// Request duration middleware
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();

  res.on("finish", () => {
    const route = req.route?.path || req.path;
    const labels = {
      method: req.method,
      route: normalizeRoute(route),
      status_code: res.statusCode.toString(),
    };

    end(labels);
    httpRequestTotal.inc(labels);
  });

  next();
});

// Normalize route paths to avoid high cardinality
function normalizeRoute(path: string): string {
  return path
    .replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/g, "/:id")
    .replace(/\/\d+/g, "/:id")
    .replace(/\/cuid_[a-z0-9]+/g, "/:id");
}
```
High cardinality is the most common Prometheus mistake. If you use raw paths with IDs as label values, you create a new time series for every unique quiz ID. The normalizeRoute function collapses these into generic patterns.
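To make the collapsing concrete, here is a standalone sketch using the same regexes as the middleware above (illustrative only, not the production module):

```typescript
// Standalone sketch of route normalization: numeric IDs and UUIDs
// collapse into a single ":id" placeholder, so each route produces
// one time series instead of one per entity.
function normalizeRoute(path: string): string {
  return path
    .replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/g, "/:id")
    .replace(/\/\d+/g, "/:id")
    .replace(/\/cuid_[a-z0-9]+/g, "/:id");
}

console.log(normalizeRoute("/api/v1/quizzes/42/submit"));
// "/api/v1/quizzes/:id/submit"
console.log(normalizeRoute("/api/v1/quizzes/0f8fad5b-d9cb-469f-a165-70867728950e/start"));
// "/api/v1/quizzes/:id/start"
```

Both paths map to the same two label values, so Prometheus stores two series per method and status code rather than one per quiz.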
Instrumenting Quiz Logic
Add metrics to your quiz submission handler:
```typescript
import {
  quizCompletionDuration,
  quizScore,
  quizSubmissions,
  activeQuizSessions,
} from "./metrics";

app.post("/api/v1/quizzes/:id/submit", async (req, res) => {
  const { id: quizId } = req.params;
  const { answers, startedAt } = req.body;

  try {
    const quiz = await getQuiz(quizId);
    const result = calculateScore(quiz, answers);

    // Record completion time
    if (startedAt) {
      const durationSeconds = (Date.now() - new Date(startedAt).getTime()) / 1000;
      quizCompletionDuration.observe(
        { quiz_id: quizId, difficulty: quiz.difficulty },
        durationSeconds
      );
    }

    // Record score distribution
    const percentage = (result.score / result.total) * 100;
    quizScore.observe(
      { quiz_id: quizId, difficulty: quiz.difficulty },
      percentage
    );

    // Count submissions
    const passed = percentage >= 70;
    quizSubmissions.inc({
      quiz_id: quizId,
      difficulty: quiz.difficulty,
      passed: passed.toString(),
    });

    // Decrement active sessions
    activeQuizSessions.dec();

    res.json(result);
  } catch (err) {
    res.status(500).json({ error: "Submission failed" });
  }
});

// Track when quizzes start
app.post("/api/v1/quizzes/:id/start", async (req, res) => {
  activeQuizSessions.inc();
  // ... start logic
});
```
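The handler above assumes `getQuiz` and `calculateScore` helpers that this guide does not define. A minimal `calculateScore` might look like the sketch below; the `Question` and `Quiz` shapes and the answer keying are assumptions for illustration, not part of any real API:

```typescript
// Hypothetical helper: grade submitted answers against a quiz's answer key.
// All type shapes here are assumptions, shown only to make the handler concrete.
interface Question {
  id: string;
  correctAnswer: string;
}

interface Quiz {
  difficulty: string;
  questions: Question[];
}

function calculateScore(quiz: Quiz, answers: Record<string, string>) {
  const total = quiz.questions.length;
  const score = quiz.questions.filter(
    (q) => answers[q.id] === q.correctAnswer
  ).length;
  return { score, total };
}

const quiz: Quiz = {
  difficulty: "medium",
  questions: [
    { id: "q1", correctAnswer: "B" },
    { id: "q2", correctAnswer: "A" },
    { id: "q3", correctAnswer: "D" },
  ],
};

console.log(calculateScore(quiz, { q1: "B", q2: "C", q3: "D" }));
// { score: 2, total: 3 }
```

Whatever the real implementation, returning `{ score, total }` is what lets the handler derive the percentage it feeds into the `quiz_score_percentage` histogram.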
Database Query Instrumentation
Wrap your database client to capture query metrics:
```typescript
import { Pool } from "pg";
import { dbQueryDuration, dbConnectionPool } from "./metrics";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Monitor connection pool
setInterval(() => {
  dbConnectionPool.set({ state: "total" }, pool.totalCount);
  dbConnectionPool.set({ state: "idle" }, pool.idleCount);
  dbConnectionPool.set({ state: "waiting" }, pool.waitingCount);
}, 5000);

// Instrumented query function
export async function query(
  text: string,
  params?: unknown[]
): Promise<any> {
  const operation = text.trim().split(" ")[0].toUpperCase();
  const table = extractTableName(text);

  const end = dbQueryDuration.startTimer({ operation, table });

  try {
    const result = await pool.query(text, params);
    end();
    return result;
  } catch (err) {
    end();
    throw err;
  }
}

function extractTableName(sql: string): string {
  const match = sql.match(/(?:FROM|INTO|UPDATE|JOIN)\s+(\w+)/i);
  return match?.[1] ?? "unknown";
}
```
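The operation and table parsing in `query` is deliberately naive (it takes the first word and the first `FROM`/`INTO`/`UPDATE`/`JOIN` target), so it is worth sanity-checking against your actual SQL. Here is the same parsing extracted into a standalone sketch:

```typescript
// Standalone sketch of the SQL parsing used for metric labels above.
// Good enough for simple statements; CTEs, subqueries, and schema-qualified
// names will need a smarter parser.
function parseQueryLabels(sql: string): { operation: string; table: string } {
  const operation = sql.trim().split(" ")[0].toUpperCase();
  const match = sql.match(/(?:FROM|INTO|UPDATE|JOIN)\s+(\w+)/i);
  return { operation, table: match?.[1] ?? "unknown" };
}

console.log(parseQueryLabels("SELECT * FROM quizzes WHERE id = $1"));
// { operation: 'SELECT', table: 'quizzes' }
console.log(parseQueryLabels("INSERT INTO submissions (quiz_id) VALUES ($1)"));
// { operation: 'INSERT', table: 'submissions' }
```

Because operation and table are low-cardinality by construction, they are safe to use as Prometheus labels, unlike raw SQL text.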
Prometheus Configuration
Create prometheus.yml:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "quiz-api"
    metrics_path: "/metrics"
    static_configs:
      - targets: ["host.docker.internal:3000"]
        labels:
          environment: "production"
```
Alerting Rules
Create alert_rules.yml:
```yaml
groups:
  - name: quiz-api-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m]))
          /
          sum(rate(http_requests_total[5m]))
          > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "More than 5% of requests are returning 5xx errors"

      - alert: HighLatency
        expr: |
          histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
          > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High API latency"
          description: "95th percentile latency is above 2 seconds"

      - alert: DatabaseSlowQueries
        expr: |
          histogram_quantile(0.99, rate(db_query_duration_seconds_bucket[5m]))
          > 1
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "Slow database queries"
          description: "99th percentile query duration is above 1 second"

      - alert: LowQuizPassRate
        expr: |
          sum(rate(quiz_submissions_total{passed="true"}[1h]))
          /
          sum(rate(quiz_submissions_total[1h]))
          < 0.2
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "Low quiz pass rate"
          description: "Less than 20% of submissions are passing - questions may be too difficult"
```
Docker Compose Setup
Run Prometheus and Grafana locally:
```yaml
# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:v2.53.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml
    extra_hosts:
      - "host.docker.internal:host-gateway"

  grafana:
    image: grafana/grafana:11.1.0
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:
```
Start the stack:
docker compose -f docker-compose.monitoring.yml up -d
Grafana Dashboard
After connecting Prometheus as a data source in Grafana, create panels with these PromQL queries:
Request rate:
sum(rate(http_requests_total[5m])) by (route)
95th percentile latency:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route))
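Under the hood, histogram_quantile estimates the quantile by linear interpolation inside the bucket where the target rank falls. The sketch below shows that interpolation on cumulative bucket counts, the form Prometheus histograms expose; it is a simplification that glosses over edge cases such as the +Inf bucket:

```typescript
// Simplified sketch of histogram_quantile's bucket interpolation.
// buckets: cumulative observation counts keyed by upper bound ("le"),
// sorted ascending, ending with the +Inf bucket (total count).
function histogramQuantile(
  q: number,
  buckets: { le: number; count: number }[]
): number {
  const total = buckets[buckets.length - 1].count;
  const rank = q * total;

  let prevLe = 0;
  let prevCount = 0;
  for (const b of buckets) {
    if (b.count >= rank) {
      // Linearly interpolate between the bucket's lower and upper bound
      return prevLe + ((rank - prevCount) / (b.count - prevCount)) * (b.le - prevLe);
    }
    prevLe = b.le;
    prevCount = b.count;
  }
  return prevLe;
}

// 80 requests under 100ms, 15 more under 500ms, 5 slower:
const latencyBuckets = [
  { le: 0.1, count: 80 },
  { le: 0.5, count: 95 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.9, latencyBuckets));
// ~0.367 (interpolated inside the 0.1-0.5 bucket)
```

This is why bucket boundaries matter: the estimate can only be as precise as the bucket the quantile lands in, which is the reason the histograms earlier in this guide choose boundaries matched to expected latencies and scores.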
Error rate:
sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
Median quiz score per quiz (the 0.5 quantile of the score histogram):
histogram_quantile(0.5, sum(rate(quiz_score_percentage_bucket[1h])) by (le, quiz_id))
Active sessions:
active_quiz_sessions
Summary
Prometheus and Grafana give you broad visibility into your quiz API. The combination of standard HTTP metrics, database performance tracking, and quiz-specific metrics like score distributions and pass rates lets you understand both technical and product health.
Key points:
- Normalize route paths to avoid high-cardinality label problems
- Instrument both the HTTP layer and the business logic layer
- Set alerts on error rates, latency, and slow database queries
- Track quiz-specific metrics like pass rates to catch content problems
- Use histograms with meaningful bucket boundaries for latency and scores