Monitoring & Observability
tollbooth emits structured JSON logs for every request. This guide covers the log fields you should index, the metrics you should track, and how to diagnose common production issues.
Structured log fields
Every log line tollbooth produces is a JSON object. The fields below are the most useful for filtering, alerting, and dashboarding.
| Field | Type | Description |
|---|---|---|
| msg | string | Event type: "request", "payment_settled", "route_not_found", "settlement_failed", etc. |
| timestamp | string | ISO 8601 timestamp. |
| level | string | Log level: "debug", "info", "warn", "error". |
| method | string | HTTP method, e.g. "POST". |
| path | string | Request path, e.g. "/v1/messages". |
| route | string | Matched route pattern, e.g. "POST /v1/messages". |
| status | number | HTTP status code returned to the client. |
| duration_ms | number | Total request duration in milliseconds. |
| price | string | Price charged for the route, e.g. "$0.075". |
| payer | string | Payer wallet address extracted from the payment header. |
| tx_hash | string | On-chain transaction hash from settlement. |
| amount | string | Settlement amount. |
| upstream_status | number | HTTP status code from the upstream response. |
| reason | string | Reason for rejection or failure. |
| error | string | Error message, if any. |
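As a rough illustration of how these fields might be consumed, the sketch below parses one log line into a small typed record. The names `parse_log_line` and `LogRecord` are hypothetical and not part of tollbooth, and the code assumes `price` is always a dollar-prefixed decimal string like the example in the table.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

VALID_LEVELS = {"debug", "info", "warn", "error"}


@dataclass
class LogRecord:
    msg: str
    level: str
    status: Optional[int]
    duration_ms: Optional[float]
    price_usd: Optional[Decimal]


def parse_log_line(line: str) -> LogRecord:
    """Parse one JSON log line into a typed record (hypothetical helper)."""
    raw = json.loads(line)
    level = raw["level"]
    if level not in VALID_LEVELS:
        raise ValueError(f"unexpected log level: {level!r}")
    price = raw.get("price")
    return LogRecord(
        msg=raw["msg"],
        level=level,
        # Not every event carries these fields, so they stay optional.
        status=raw.get("status"),
        duration_ms=raw.get("duration_ms"),
        # Assumes prices look like "$0.075"; Decimal avoids float rounding.
        price_usd=Decimal(price.lstrip("$")) if price else None,
    )
```

Keeping the price as a `Decimal` rather than a float is a design choice that matters once prices are summed across many requests.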
Log levels
| Level | When |
|---|---|
| info | Successful paid requests, settlements completed. |
| warn | 402 responses, 429 rate-limit blocks, recoverable failures. |
| error | Settlement failures, upstream crashes, hook errors. |
| debug | Verbose tracing for development. Disable in production. |
Privacy note
Example log lines
Successful paid request:
{ "msg": "request", "level": "info", "timestamp": "2025-12-01T14:30:00.000Z", "method": "POST", "path": "/v1/messages", "route": "POST /v1/messages", "status": 200, "duration_ms": 342, "price": "$0.075", "payer": "0x1234...abcd", "tx_hash": "0xdeadbeef..."}
Payment settled:
{ "msg": "payment_settled", "level": "info", "timestamp": "2025-12-01T14:30:00.187Z", "route": "POST /v1/messages", "payer": "0x1234...abcd", "amount": "$0.075", "tx_hash": "0xdeadbeef..."}
402 (missing payment):
{ "msg": "request", "level": "warn", "timestamp": "2025-12-01T14:30:01.000Z", "method": "GET", "path": "/data", "route": "GET /data", "status": 402, "duration_ms": 3, "price": "$0.01", "error": "missing payment-signature header"}
Settlement failed:
{ "msg": "settlement_failed", "level": "error", "timestamp": "2025-12-01T14:30:02.000Z", "route": "POST /v1/messages", "payer": "0x1234...abcd", "reason": "facilitator timeout"}
Request ID correlation
tollbooth propagates request IDs for end-to-end tracing:
- If the incoming request has an `X-Request-Id` header, tollbooth uses it.
- Otherwise, tollbooth generates a `req_`-prefixed UUID.
- The ID is forwarded to the upstream in `X-Request-Id`.
- The same ID appears in the response header and in every log line for that request.
This lets you correlate a single request across tollbooth logs, upstream API logs, and client-side traces.
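The propagation rules above could be sketched roughly as follows. This is not tollbooth's implementation: `resolve_request_id` is a hypothetical helper, the generated format (`req_` followed by a UUIDv4) is an assumption, and treating a blank header as missing is a choice made for this example.

```python
import uuid
from typing import Mapping


def resolve_request_id(headers: Mapping[str, str]) -> str:
    """Return the caller's X-Request-Id, or mint a req_-prefixed UUID."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    incoming = lowered.get("x-request-id", "").strip()
    if incoming:
        return incoming
    # Assumption: "req_" followed by a UUIDv4; the real format may differ.
    return f"req_{uuid.uuid4()}"
```

The returned value would then be set on the upstream request, the response, and the logging context so that all three carry the same ID.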
Prometheus-style metrics
When `gateway.metrics` is enabled, tollbooth exposes a Prometheus-compatible `/metrics` endpoint. Metric names use the `tollbooth_` prefix.
gateway:
  metrics: true
Counters
| Metric | Labels | Description |
|---|---|---|
| tollbooth_requests_total | route, method, status | Total requests by route, method, and HTTP status. |
| tollbooth_payments_total | route, outcome | Payment attempts. outcome = success, rejected, missing. |
| tollbooth_settlements_total | strategy, outcome | Settlement attempts. outcome = success, failure. |
| tollbooth_cache_hits_total | route | Verification cache hits. |
| tollbooth_cache_misses_total | route | Verification cache misses. |
| tollbooth_rate_limit_blocks_total | route | Requests blocked by rate limiting. Use logs for the per-client breakdown: wallet addresses are unbounded, so a client_id label would cause a cardinality explosion. |
| tollbooth_upstream_errors_total | upstream, status | Non-2xx responses from upstreams. |
| tollbooth_revenue_usd_total | route | Cumulative revenue in USD from successful settlements. Incremented by the route price on each settled request. |
Histograms
| Metric | Labels | Description |
|---|---|---|
| tollbooth_request_duration_seconds | route, method | End-to-end request latency. |
| tollbooth_settlement_duration_seconds | strategy | Settlement latency. |
| tollbooth_upstream_duration_seconds | upstream | Upstream response latency. |
Gauges
| Metric | Labels | Description |
|---|---|---|
| tollbooth_active_requests | (none) | Currently in-flight requests. |
Payment funnel
tollbooth's request lifecycle forms a conversion funnel. Tracking drop-off at each stage shows where revenue is lost.
Request outcomes (from tollbooth_requests_total):
All inbound requests
├─ 402: payment required (client didn't pay)
├─ 200: success (free route, cached session, or paid + settled + upstream OK)
└─ 5xx: internal / upstream error
Paid-request pipeline (separate counters):
402 issued
  → tollbooth_payments_total{outcome="success"}      ← payment verified
  → tollbooth_settlements_total{outcome="success"}   ← settled on-chain
  → tollbooth_requests_total{status="200"}           ← upstream responded OK
  → tollbooth_revenue_usd_total                      ← revenue collected
Payment verification and settlement are separate phases: a payment can be verified successfully (`tollbooth_payments_total{outcome="success"}`) while the subsequent settlement still fails (`tollbooth_settlements_total{outcome="failure"}`). A healthy funnel has minimal drop-off between verification and settlement, and between settlement and successful upstream responses. A gap between verified payments and successful settlements points to the facilitator, so check facilitator health. A gap between settlements and 200s means upstreams are erroring after payment (see Refund Protection).
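To make the drop-off idea concrete, here is a minimal sketch that turns three counter readings into per-stage loss fractions. `FunnelCounts`, `drop_off`, and `funnel_report` are hypothetical names, and the inputs are assumed to be increases over the same time window. Because `tollbooth_requests_total{status="200"}` also counts free routes and cached sessions, the 200 count should be restricted to paid routes before it is passed in; the clamp to zero only guards against that mismatch and does not correct it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FunnelCounts:
    payments_verified: float  # tollbooth_payments_total{outcome="success"}
    settlements_ok: float     # tollbooth_settlements_total{outcome="success"}
    requests_ok: float        # tollbooth_requests_total{status="200"}, paid routes only


def drop_off(before: float, after: float) -> float:
    """Fraction lost between two funnel stages; 0.0 when the earlier stage is empty."""
    if before <= 0:
        return 0.0
    # Clamp so a later stage that exceeds the earlier one never reports negative loss.
    return max(0.0, (before - after) / before)


def funnel_report(counts: FunnelCounts) -> Dict[str, float]:
    """Summarise loss at the two transitions described in the funnel above."""
    return {
        "verify_to_settle": drop_off(counts.payments_verified, counts.settlements_ok),
        "settle_to_200": drop_off(counts.settlements_ok, counts.requests_ok),
    }
```

For example, `funnel_report(FunnelCounts(1000, 990, 970))` reports a loss of about 1% at settlement and about 2% at the upstream stage.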
Dashboards you want
If you're using Grafana (or any Prometheus-compatible dashboarding tool), these are the panels worth setting up:
Traffic overview
- Request rate: `rate(tollbooth_requests_total[5m])` broken down by `route`.
- 402 rate: `rate(tollbooth_requests_total{status="402"}[5m])`. A spike means clients aren't sending payment headers. Check whether a client library was updated or a route price changed.
- Error rate: `rate(tollbooth_requests_total{status=~"5.."}[5m])` by `route`. Includes both upstream failures and tollbooth's own errors.
Revenue & payments
- Revenue rate: `rate(tollbooth_revenue_usd_total[5m])` by `route`. Shows real-time earning velocity.
- Cumulative revenue: `tollbooth_revenue_usd_total` by `route`. Total revenue per route since the last restart. If you need persistent revenue accounting across restarts, export metrics to a remote store (e.g. Prometheus with long-term storage) or emit settlement events to an external analytics system; tollbooth itself is not an accounting system.
- Settlement success rate: `sum(rate(tollbooth_settlements_total{outcome="success"}[5m])) / sum(rate(tollbooth_settlements_total[5m]))`. The `sum()` wrappers matter because PromQL division only pairs series with identical labels; without them, successes are divided only by successes. Alert if this drops below 99%.
- Settlement latency p95: `histogram_quantile(0.95, rate(tollbooth_settlement_duration_seconds_bucket[5m]))`. The result is in seconds; facilitator latency over 500 ms (0.5) warrants investigation.
- Payment rejection rate: `rate(tollbooth_payments_total{outcome="rejected"}[5m])`. Rejections mean invalid signatures or insufficient funds.
Cache & rate limiting
- Cache hit ratio: `rate(tollbooth_cache_hits_total[5m]) / (rate(tollbooth_cache_hits_total[5m]) + rate(tollbooth_cache_misses_total[5m]))`. A high miss rate increases settlement load.
- Rate-limit blocks: `rate(tollbooth_rate_limit_blocks_total[5m])` by `route`. Use logs to identify specific abusive clients.
Upstream health
- Upstream latency p95: `histogram_quantile(0.95, rate(tollbooth_upstream_duration_seconds_bucket[5m]))` by `upstream`.
- Upstream error rate: `rate(tollbooth_upstream_errors_total[5m])` by `upstream` and `status`.
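The p95 panels rely on `histogram_quantile`, which estimates a quantile from bucket counts rather than from raw samples. The sketch below is a simplified Python rendition of the linear interpolation Prometheus documents for classic histograms, included only to show why the result depends on bucket boundaries. It is not Prometheus code, the bucket values in the usage note are made up, and it restricts `q` to the range (0, 1] for simplicity.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    The last bucket must have upper bound +Inf, as Prometheus histograms do.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("need cumulative buckets ending in +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # The lowest bucket is assumed to start at 0.
    prev_bound, prev_count = 0.0, 0.0
    for i, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the overflow bucket: report the highest finite bound.
                return buckets[i - 1][0] if i > 0 else math.nan
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan
```

With buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]`, the 0.95 quantile lands in the 0.5 to 1.0 bucket and interpolates to 0.75 seconds. A threshold such as 500 ms is only as precise as the nearest bucket boundary.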
Suggested SLOs
Use these as starting points and tune based on your traffic patterns:
| SLO | Target | Metric |
|---|---|---|
| Settlement success rate | >= 99.5% | sum(rate(tollbooth_settlements_total{outcome="success"}[5m])) / sum(rate(tollbooth_settlements_total[5m])) |
| Upstream p95 latency | < 800 ms | histogram_quantile(0.95, rate(tollbooth_upstream_duration_seconds_bucket[5m])) |
| 5xx error rate | < 0.5% | sum(rate(tollbooth_requests_total{status=~"5.."}[5m])) / sum(rate(tollbooth_requests_total[5m])) |
| Cache hit ratio | > 80% | sum(rate(tollbooth_cache_hits_total[5m])) / (sum(rate(tollbooth_cache_hits_total[5m])) + sum(rate(tollbooth_cache_misses_total[5m]))) |
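One way to act on these targets outside of Prometheus alerting is a small check that compares measured values against the table. The sketch below is illustrative only: `SLO_TARGETS` and `slo_violations` are hypothetical names, ratios are expressed as fractions, and latency is in seconds to match the histogram metric.

```python
import operator
from typing import Dict, List

# Targets copied from the table above; ratios as fractions, latency in seconds.
SLO_TARGETS = {
    "settlement_success_rate": (">=", 0.995),
    "upstream_p95_seconds": ("<", 0.800),
    "error_rate_5xx": ("<", 0.005),
    "cache_hit_ratio": (">", 0.80),
}

OPS = {">=": operator.ge, ">": operator.gt, "<": operator.lt}


def slo_violations(measured: Dict[str, float]) -> List[str]:
    """Return the names of SLOs whose measured value misses its target."""
    violations = []
    for name, (op, target) in SLO_TARGETS.items():
        if name not in measured:
            continue  # no data for this SLO; skip rather than guess
        if not OPS[op](measured[name], target):
            violations.append(name)
    return violations
```

Note that the comparisons are strict where the table is strict, so a cache hit ratio of exactly 80% counts as a miss.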
Troubleshooting checklist
High 402 rate
- Check whether clients are sending the `payment-signature` header.
- Verify the route price hasn't changed unexpectedly: `grep price tollbooth.config.yaml`.
- Check whether the x402 discovery endpoint is reachable: `curl https://your-tollbooth/.well-known/x402`.
- Look at log lines with `status: 402`; the `error` field gives the reason.
Settlement failures
- Check `tollbooth_settlements_total{outcome="failure"}` for the affected strategy.
- For the `facilitator` strategy: is the facilitator endpoint reachable? `curl https://x402.org/facilitator/health`.
- Look at `duration_ms` on `settlement_failed` log lines; timeouts may indicate network issues.
- Check for `msg: "settlement_failed"` entries in recent logs; the `reason` field gives the cause.
High upstream latency
- Compare upstream latency (`tollbooth_upstream_duration_seconds`) with the end-to-end `duration_ms`. If they're close, tollbooth isn't the bottleneck.
- Check the upstream's own status page or health endpoint.
- Look for `tollbooth_upstream_errors_total` spikes that coincide with latency increases.
Low cache hit rate
- Verify that verification caching is enabled in your config.
- Check if clients are sending unique payment tokens per request (expected for fresh payments, but repeated verifications should hit cache).
- A restart clears the in-memory cache — frequent restarts will reduce hit rate.
Rate-limit blocks affecting legitimate traffic
- Review the rate-limit config for the affected route.
- Check `payer` in the blocked requests: is it a single heavy client or many?
- Consider per-client rate limits instead of global limits if traffic patterns are uneven.
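The payer check above can be scripted against the JSON logs. This is a sketch under stated assumptions: `top_blocked_payers` is a hypothetical helper, and it assumes rate-limit blocks are logged with `status` 429 and include the `payer` and `route` fields, which this guide does not show in an example line.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def top_blocked_payers(lines: Iterable[str], route: str, n: int = 5) -> List[Tuple[str, int]]:
    """Count rate-limited requests per payer for one route."""
    counts: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise in the log stream
        if not isinstance(entry, dict):
            continue
        # Assumption: blocks are logged with status 429 and carry a payer field.
        if entry.get("status") == 429 and entry.get("route") == route:
            counts[entry.get("payer", "unknown")] += 1
    return counts.most_common(n)
```

If one payer dominates the result, the limit is probably doing its job; a long, flat list suggests the limit is too tight for normal traffic.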