
Monitoring & Observability

tollbooth emits structured JSON logs for every request. This guide covers the log fields you should index, the metrics you should track, and how to diagnose common production issues.

Every log line tollbooth produces is a JSON object. The fields below are the most useful for filtering, alerting, and dashboarding.

| Field | Type | Description |
| --- | --- | --- |
| msg | string | Event type: "request", "payment_settled", "route_not_found", "settlement_failed", etc. |
| timestamp | string | ISO 8601 timestamp. |
| level | string | Log level: "debug", "info", "warn", "error". |
| method | string | HTTP method, e.g. "POST". |
| path | string | Request path, e.g. "/v1/messages". |
| route | string | Matched route pattern, e.g. "POST /v1/messages". |
| status | number | HTTP status code returned to the client. |
| duration_ms | number | Total request duration in milliseconds. |
| price | string | Price charged for the route, e.g. "$0.075". |
| payer | string | Payer wallet address extracted from the payment header. |
| tx_hash | string | On-chain transaction hash from settlement. |
| amount | string | Settlement amount. |
| upstream_status | number | HTTP status code from the upstream response. |
| reason | string | Reason for rejection or failure. |
| error | string | Error message, if any. |

| Level | When |
| --- | --- |
| info | Successful paid requests, settlements completed. |
| warn | 402 responses, 429 rate-limit blocks, recoverable failures. |
| error | Settlement failures, upstream crashes, hook errors. |
| debug | Verbose tracing for development. Disable in production. |

Successful paid request:

{
  "msg": "request",
  "level": "info",
  "timestamp": "2025-12-01T14:30:00.000Z",
  "method": "POST",
  "path": "/v1/messages",
  "route": "POST /v1/messages",
  "status": 200,
  "duration_ms": 342,
  "price": "$0.075",
  "payer": "0x1234...abcd",
  "tx_hash": "0xdeadbeef..."
}

Payment settled:

{
  "msg": "payment_settled",
  "level": "info",
  "timestamp": "2025-12-01T14:30:00.187Z",
  "route": "POST /v1/messages",
  "payer": "0x1234...abcd",
  "amount": "$0.075",
  "tx_hash": "0xdeadbeef..."
}

402 — missing payment:

{
  "msg": "request",
  "level": "warn",
  "timestamp": "2025-12-01T14:30:01.000Z",
  "method": "GET",
  "path": "/data",
  "route": "GET /data",
  "status": 402,
  "duration_ms": 3,
  "price": "$0.01",
  "error": "missing payment-signature header"
}

Settlement failed:

{
  "msg": "settlement_failed",
  "level": "error",
  "timestamp": "2025-12-01T14:30:02.000Z",
  "route": "POST /v1/messages",
  "payer": "0x1234...abcd",
  "reason": "facilitator timeout"
}
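
As a rough illustration of how these fields can be used, the following sketch counts events by msg and level and tallies the error or reason strings on warn and error lines. It assumes one JSON object per line, as in the examples above; the function name summarize and the SAMPLE_LINES data are hypothetical and not part of tollbooth.

```python
import json
from collections import Counter


def summarize(lines):
    """Count events by (msg, level) and tally error/reason strings.

    Assumes one JSON object per line; anything else is skipped.
    """
    events = Counter()
    problems = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON noise mixed into the stream
        if not isinstance(entry, dict):
            continue
        events[(entry.get("msg"), entry.get("level"))] += 1
        # "error" and "reason" are the two fields that explain a failure.
        detail = entry.get("error") or entry.get("reason")
        if entry.get("level") in ("warn", "error") and detail:
            problems[detail] += 1
    return events, problems


# Hypothetical sample data modeled on the log examples in this guide.
SAMPLE_LINES = [
    '{"msg": "request", "level": "info", "status": 200, "route": "POST /v1/messages"}',
    '{"msg": "request", "level": "warn", "status": 402, "route": "GET /data", '
    '"error": "missing payment-signature header"}',
    '{"msg": "settlement_failed", "level": "error", "route": "POST /v1/messages", '
    '"reason": "facilitator timeout"}',
    "not json",
]

events, problems = summarize(SAMPLE_LINES)
```

In practice the same grouping is usually done in a log platform's query language; the script is only meant to show which fields carry the signal.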

tollbooth propagates request IDs for end-to-end tracing:

  1. If the incoming request has an X-Request-Id header, tollbooth uses it.
  2. Otherwise, tollbooth generates a req_-prefixed UUID.
  3. The ID is forwarded to the upstream in X-Request-Id.
  4. The same ID appears in the response header and in every log line for that request.

This lets you correlate a single request across tollbooth logs, upstream API logs, and client-side traces.
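
The propagation rules above can be sketched as follows. This is an illustration of the described behavior, not tollbooth's actual code: the function names are hypothetical, and the exact UUID format tollbooth generates is an assumption (a version 4 UUID is used here).

```python
import uuid


def resolve_request_id(headers):
    """Reuse an incoming X-Request-Id, or generate a req_-prefixed UUID.

    Header names are matched case-insensitively.
    """
    for name, value in headers.items():
        if name.lower() == "x-request-id" and value:
            return value
    # Assumption: a UUIDv4 after the req_ prefix.
    return f"req_{uuid.uuid4()}"


def forward_headers(headers):
    """Return the request ID and a copy of the headers for the upstream call."""
    request_id = resolve_request_id(headers)
    upstream = {k: v for k, v in headers.items() if k.lower() != "x-request-id"}
    upstream["X-Request-Id"] = request_id
    return request_id, upstream


incoming_id, _ = forward_headers({"x-request-id": "client-abc-123"})
generated_id, upstream_headers = forward_headers({"Accept": "application/json"})
```

The same ID would then be attached to the response header and to each log line for the request.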

When gateway.metrics is enabled, tollbooth exposes a Prometheus-compatible /metrics endpoint. Names follow the tollbooth_ prefix convention.

gateway:
  metrics: true

Counters:

| Metric | Labels | Description |
| --- | --- | --- |
| tollbooth_requests_total | route, method, status | Total requests by route, method, and HTTP status. |
| tollbooth_payments_total | route, outcome | Payment attempts. outcome = success, rejected, missing. |
| tollbooth_settlements_total | strategy, outcome | Settlement attempts. outcome = success, failure. |
| tollbooth_cache_hits_total | route | Verification cache hits. |
| tollbooth_cache_misses_total | route | Verification cache misses. |
| tollbooth_rate_limit_blocks_total | route | Requests blocked by rate limiting. Use logs for a per-client breakdown: putting client_id in a Prometheus label causes a cardinality explosion with unbounded wallet addresses. |
| tollbooth_upstream_errors_total | upstream, status | Non-2xx responses from upstreams. |
| tollbooth_revenue_usd_total | route | Cumulative revenue in USD from successful settlements. Incremented by the route price on each settled request. |

Histograms:

| Metric | Labels | Description |
| --- | --- | --- |
| tollbooth_request_duration_seconds | route, method | End-to-end request latency. |
| tollbooth_settlement_duration_seconds | strategy | Settlement latency. |
| tollbooth_upstream_duration_seconds | upstream | Upstream response latency. |

Gauges:

| Metric | Labels | Description |
| --- | --- | --- |
| tollbooth_active_requests | (none) | Currently in-flight requests. |
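
For a quick check without a Prometheus server, a counter can be summed across its label sets straight from the text a /metrics endpoint returns. The sketch below is a deliberately small parser for the Prometheus text exposition format, not a full implementation; the function name sum_metric and the SAMPLE_METRICS text are hypothetical.

```python
import re

# metric_name{optional="labels"} value [optional timestamp]
_SAMPLE_LINE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)")


def sum_metric(text, name):
    """Sum every sample of one metric across all of its label sets."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_LINE.match(line)
        if match and match.group(1) == name:
            total += float(match.group(3))
    return total


# Hypothetical scrape output, for illustration only.
SAMPLE_METRICS = """\
# TYPE tollbooth_requests_total counter
tollbooth_requests_total{route="GET /data",method="GET",status="402"} 5
tollbooth_requests_total{route="GET /data",method="GET",status="200"} 10
tollbooth_active_requests 2
"""

total_requests = sum_metric(SAMPLE_METRICS, "tollbooth_requests_total")
```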

tollbooth’s request lifecycle forms a conversion funnel. Tracking drop-off at each stage tells you exactly where revenue is lost.

Request outcomes (from tollbooth_requests_total):

All inbound requests
├─ 402 — payment required (client didn't pay)
├─ 200 — success (free route, cached session, or paid + settled + upstream OK)
└─ 5xx — internal / upstream error

Paid-request pipeline (separate counters):

402 issued
→ tollbooth_payments_total{outcome="success"} ← payment verified
→ tollbooth_settlements_total{outcome="success"} ← settled on-chain
→ tollbooth_requests_total{status="200"} ← upstream responded OK
→ tollbooth_revenue_usd_total ← revenue collected

Payment verification and settlement are separate phases — a payment can be verified successfully (payments_total{outcome="success"}) while the subsequent settlement still fails (settlements_total{outcome="failure"}). A healthy funnel has minimal drop-off between verification and settlement, and between settlement and successful upstream responses. If you see a gap between those first two counters, check facilitator health. A gap between settlements and 200s means upstreams are erroring after payment (see Refund Protection).
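
To make the two gaps concrete, the sketch below computes the fractional drop-off between stages from three counter values taken over the same time window. The function and argument names are hypothetical. Note that tollbooth_requests_total{status="200"} also counts free routes and cached sessions, so the second ratio is only meaningful if the 200 count is restricted to paid routes.

```python
def funnel_dropoff(verified, settled, ok_responses):
    """Return the fraction of requests lost at each funnel stage.

    verified     -- increase in tollbooth_payments_total{outcome="success"}
    settled      -- increase in tollbooth_settlements_total{outcome="success"}
    ok_responses -- increase in 200 responses on paid routes
    """

    def loss(before, after):
        if before <= 0:
            return 0.0  # nothing entered this stage, so nothing was lost
        return max(0.0, (before - after) / before)

    return {
        "verification_to_settlement": loss(verified, settled),
        "settlement_to_success": loss(settled, ok_responses),
    }


# Hypothetical numbers: 1000 verified, 990 settled, 970 served successfully.
dropoff = funnel_dropoff(1000, 990, 970)
```

With these inputs the first stage loses 1% of requests, which by the guidance above would point at the facilitator, and the second stage loses about 2%, which would point at the upstreams.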

If you’re using Grafana (or any Prometheus-compatible dashboarding tool), these are the panels worth setting up:

  • Request rate: rate(tollbooth_requests_total[5m]) broken down by route.
  • 402 rate: rate(tollbooth_requests_total{status="402"}[5m]). A spike means clients aren’t sending payment headers. Check if a client library updated or a route price changed.
  • Error rate: rate(tollbooth_requests_total{status=~"5.."}[5m]) by route. Covers both upstream failures and tollbooth’s own errors.
  • Revenue rate: rate(tollbooth_revenue_usd_total[5m]) by route. Shows real-time earning velocity.
  • Cumulative revenue: tollbooth_revenue_usd_total by route. Total revenue per route since last restart. If you need persistent revenue accounting across restarts, export metrics to a remote store (e.g. Prometheus with long-term storage) or emit settlement events to an external analytics system. tollbooth itself is not an accounting system.
  • Settlement success rate: sum(rate(tollbooth_settlements_total{outcome="success"}[5m])) / sum(rate(tollbooth_settlements_total[5m])). Alert if this drops below 99%. The sum() on both sides matters: dividing the unaggregated vectors only pairs series with identical labels, so the ratio would always be 1.
  • Settlement latency p95: histogram_quantile(0.95, rate(tollbooth_settlement_duration_seconds_bucket[5m])). Facilitator latency over 500 ms warrants investigation.
  • Payment rejection rate: rate(tollbooth_payments_total{outcome="rejected"}[5m]). Rejections mean invalid signatures or insufficient funds.
  • Cache hit ratio: rate(tollbooth_cache_hits_total[5m]) / (rate(tollbooth_cache_hits_total[5m]) + rate(tollbooth_cache_misses_total[5m])). A high miss rate increases settlement load.
  • Rate-limit blocks: rate(tollbooth_rate_limit_blocks_total[5m]) by route. Use logs to identify specific abusive clients.
  • Upstream latency p95: histogram_quantile(0.95, rate(tollbooth_upstream_duration_seconds_bucket[5m])) by upstream.
  • Upstream error rate: rate(tollbooth_upstream_errors_total[5m]) by upstream and status.

Use these as starting points and tune based on your traffic patterns:

| SLO | Target | Metric |
| --- | --- | --- |
| Settlement success rate | >= 99.5% | sum(rate(tollbooth_settlements_total{outcome="success"}[5m])) / sum(rate(tollbooth_settlements_total[5m])) |
| Upstream p95 latency | < 800 ms | histogram_quantile(0.95, rate(tollbooth_upstream_duration_seconds_bucket[5m])) |
| 5xx error rate | < 0.5% | sum(rate(tollbooth_requests_total{status=~"5.."}[5m])) / sum(rate(tollbooth_requests_total[5m])) |
| Cache hit ratio | > 80% | sum(rate(tollbooth_cache_hits_total[5m])) / (sum(rate(tollbooth_cache_hits_total[5m])) + sum(rate(tollbooth_cache_misses_total[5m]))) |
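
The targets above can also be checked in code, for example in a scheduled job that reads the query results from your metrics store. The sketch below encodes the four targets and returns the ones a set of measurements violates. The Slo class, the SLO names, and the measured dictionary are hypothetical; ratios are expressed as fractions and latency in seconds to match the metric units.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Slo:
    name: str
    compare: Callable[[float, float], bool]  # compare(measured, target) is True when met
    target: float


# Targets copied from the SLO table; tune them for your own traffic.
SLOS = [
    Slo("settlement_success_rate", operator.ge, 0.995),
    Slo("upstream_p95_latency_seconds", operator.lt, 0.8),
    Slo("error_5xx_rate", operator.lt, 0.005),
    Slo("cache_hit_ratio", operator.gt, 0.80),
]


def violations(measured):
    """Return the names of SLOs whose measured value misses the target.

    SLOs with no measurement are skipped rather than reported.
    """
    return [
        slo.name
        for slo in SLOS
        if slo.name in measured and not slo.compare(measured[slo.name], slo.target)
    ]


# Hypothetical measurements for one evaluation window.
breached = violations(
    {
        "settlement_success_rate": 0.99,
        "upstream_p95_latency_seconds": 0.4,
        "error_5xx_rate": 0.001,
        "cache_hit_ratio": 0.9,
    }
)
```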

Spike in 402 responses

  1. Check if clients are sending the payment-signature header.
  2. Verify the route price hasn’t changed unexpectedly: grep price tollbooth.config.yaml.
  3. Check if the x402 discovery endpoint is reachable: curl https://your-tollbooth/.well-known/x402.
  4. Look at log lines with status: 402. The error field gives the reason.

Settlement failures

  1. Check tollbooth_settlements_total{outcome="failure"} for the affected strategy.
  2. For facilitator strategy: is the facilitator endpoint reachable? curl https://x402.org/facilitator/health.
  3. Look at duration_ms on settlement_failed log lines. Timeouts may indicate network issues.
  4. Check for msg: "settlement_failed" entries in recent logs. The reason field gives the cause.

High upstream latency

  1. Compare upstream latency (tollbooth_upstream_duration_seconds) with end-to-end latency (tollbooth_request_duration_seconds, or duration_ms in logs). If they’re close, tollbooth isn’t the bottleneck.
  2. Check the upstream’s own status page or health endpoint.
  3. Look for tollbooth_upstream_errors_total spikes that coincide with latency increases.

Low cache hit ratio

  1. Verify that verification caching is enabled in your config.
  2. Check if clients are sending unique payment tokens per request (expected for fresh payments, but repeated verifications should hit cache).
  3. A restart clears the in-memory cache, so frequent restarts will reduce the hit rate.

Rate-limit blocks affecting legitimate traffic

  1. Review the rate-limit config for the affected route.
  2. Check payer in the blocked requests — is it a single heavy client or many?
  3. Consider per-client rate limits instead of global limits if traffic patterns are uneven.
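
The check in step 2 can be sketched as a small log query. It assumes blocked requests are logged as JSON lines with status 429 and a payer field, which is consistent with the log fields above but not confirmed for every configuration; the function names and sample lines are hypothetical.

```python
import json
from collections import Counter


def blocked_by_payer(lines, route=None):
    """Count 429 log lines per payer, optionally restricted to one route."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict) or entry.get("status") != 429:
            continue
        if route is not None and entry.get("route") != route:
            continue
        counts[entry.get("payer", "unknown")] += 1
    return counts


def top_share(counts):
    """Fraction of all blocks attributed to the single most-blocked payer."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total


# Hypothetical log lines, for illustration only.
SAMPLE_BLOCKS = [
    '{"msg": "request", "status": 429, "route": "GET /data", "payer": "0xaaa"}',
    '{"msg": "request", "status": 429, "route": "GET /data", "payer": "0xaaa"}',
    '{"msg": "request", "status": 429, "route": "GET /data", "payer": "0xaaa"}',
    '{"msg": "request", "status": 429, "route": "GET /data", "payer": "0xbbb"}',
    '{"msg": "request", "status": 200, "route": "GET /data", "payer": "0xccc"}',
]

counts = blocked_by_payer(SAMPLE_BLOCKS, route="GET /data")
share = top_share(counts)
```

A share close to 1 suggests a single heavy client, while a low share spread over many payers suggests the limit itself is too tight for normal traffic.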
