
# Streaming & SSE

Streaming works in tollbooth without response buffering, including Server-Sent Events (SSE). The important decision is settlement timing: when payment is finalized relative to when stream bytes start flowing.

For streaming routes, before-response and after-response behave differently at the first-byte boundary:

| Mode | What happens before first byte to client | Good fit | Main risk |
| --- | --- | --- | --- |
| before-response | Payment is settled first, then the upstream stream is opened | Premium streams where you must guarantee payment before content | Client is charged even if the upstream fails later |
| after-response | Payment is verified, upstream is called, then settlement is attempted once the upstream response is known | Unreliable or expensive upstreams where charge protection matters | Extra latency before the stream starts; settlement can fail after upstream already did work |

In practice, first-byte latency for a streaming route is the sum of:

  1. x402 verification/settlement round-trips
  2. upstream time-to-first-byte
  3. any reverse-proxy buffering or timeout behavior

## 1) after-response does not protect against mid-stream failure


With SSE, a 200 can be returned and then the stream can still terminate early. If settlement already happened, the request is still charged.

Use after-response to protect against obvious upstream failures (5xx, timeouts, no response), not token-perfect delivery guarantees.
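The ordering can be sketched as follows. This is an illustrative model of the after-response boundary, not tollbooth internals; the names (`afterResponseGate`, `settle`) are hypothetical.

```typescript
// Sketch: after-response settles only once the upstream status is known.
// A stream that dies AFTER settlement is still a charged request.
async function afterResponseGate(
  callUpstream: () => Promise<{ status: number }>,
  settle: () => Promise<void>,
): Promise<{ status: number; charged: boolean }> {
  const upstream = await callUpstream();
  if (upstream.status >= 500) {
    // Obvious upstream failure (5xx, timeout surfaced as 5xx): never settle.
    return { status: upstream.status, charged: false };
  }
  // Settlement happens here, before the first byte reaches the client.
  await settle();
  return { status: upstream.status, charged: true };
}
```

Note that the mid-stream risk lives entirely after the `settle()` call: once funds move, early stream termination does not refund the client.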

## 2) Settlement failure path is different by mode

  • before-response: settlement failure stops the request before upstream is called.
  • after-response: upstream may already be called; if settlement fails, the stream cannot be delivered as a paid success response.

If your upstream bills you per request/token, this can create provider-cost exposure even when the client was not charged.

## 3) Proxy defaults can break SSE even when tollbooth is correct


SSE often fails because of proxy buffering or short read timeouts, not tollbooth routing logic. See troubleshooting below.

| Use case | Settlement mode | Why |
| --- | --- | --- |
| Paid LLM/chat stream where payment certainty matters most | before-response | No stream starts until payment is finalized |
| Upstream is flaky and refund protection matters most | after-response | Avoid charging on upstream 5xx or timeout |
| Long-lived sessions (10-60 min stream windows) | Session purchase route + free stream route | Avoid per-message payment friction |

## Example 1: Pay-per-request LLM streaming proxy


Charge each streaming completion request. This is the simplest model for AI APIs.

tollbooth.config.yaml

```yaml
gateway:
  port: 3000
  discovery: true
wallets:
  base: "0xYourWalletAddress"
accepts:
  - asset: USDC
    network: base
upstreams:
  openai:
    url: "https://api.openai.com"
    headers:
      authorization: "Bearer ${OPENAI_API_KEY}"
routes:
  "POST /v1/chat/completions":
    upstream: openai
    type: token-based
    settlement: before-response
```

Why this works well:

  • Works for both non-streaming and `stream: true` requests.
  • Payment is guaranteed before bytes start.
  • No custom hooks required.

Client UX pattern:

  1. Call stream endpoint.
  2. If 402, show pay prompt and sign.
  3. Retry same request with payment headers.
  4. Render stream chunks as they arrive.
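
The client flow above can be sketched like this. `signPayment` stands in for an x402 wallet signer and the `x-payment` header shape is an assumption; check your x402 client library for the exact handshake.

```typescript
// Split one SSE frame ("data: ...\n\n") into its data payloads.
export function parseSseEvent(frame: string): string[] {
  return frame
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice("data: ".length));
}

// 402-retry pattern: call, sign the payment requirements, retry the SAME payload.
export async function streamWithPayment(
  url: string,
  body: unknown,
  signPayment: (challenge: string) => Promise<string>,
): Promise<Response> {
  const headers: Record<string, string> = { "content-type": "application/json" };
  const payload = JSON.stringify(body);

  let res = await fetch(url, { method: "POST", headers, body: payload });
  if (res.status === 402) {
    // Server answered with payment requirements: sign them, then retry
    // the identical request with the payment header attached.
    headers["x-payment"] = await signPayment(await res.text());
    res = await fetch(url, { method: "POST", headers, body: payload });
  }
  return res;
}
```

Retrying with an identical payload matters: a changed body after signing can invalidate the payment or produce a different completion than the one paid for.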

## Example 2: Time-window session streaming (pay once, stream N minutes)


Use one paid route to mint a short-lived session, then allow free SSE calls while the session is valid.

tollbooth.config.yaml

```yaml
gateway:
  port: 3000
upstreams:
  stream-api:
    url: "https://stream.example.com"
    timeout: 900
routes:
  # Step 1: paid session purchase
  "POST /session/start":
    upstream: stream-api
    path: "/session/start"
    price: "$0.25"
    settlement: before-response
  # Step 2: stream within the active session
  "GET /session/:id/events":
    upstream: stream-api
    path: "/session/${params.id}/events"
    price: "$0.00"
    hooks:
      onRequest: "hooks/require-valid-session.ts"
```
`hooks/require-valid-session.ts` validates a signed session token (for example, with a 15-minute TTL) and rejects with 401 when it is expired. This keeps pricing predictable for long streams and avoids re-paying on every reconnect.
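
A minimal sketch of such a hook is below. The token format (`<id>.<expiryMs>.<hmac>`), the `x-session-token` header, and the `onRequest` signature are all assumptions for illustration; adapt them to tollbooth's actual hook contract.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed shared secret; in production, always set SESSION_SECRET.
const SECRET = process.env.SESSION_SECRET ?? "dev-secret";

// Validate a token of the assumed form "<id>.<expiryMs>.<hex hmac>".
export function isSessionValid(token: string, now = Date.now()): boolean {
  const [id, expiry, sig] = token.split(".");
  if (!id || !expiry || !sig) return false;
  if (Number(expiry) < now) return false; // expired (e.g. 15-minute TTL)
  const expected = createHmac("sha256", SECRET)
    .update(`${id}.${expiry}`)
    .digest("hex");
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  // Constant-time compare to avoid leaking the signature via timing.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Hypothetical hook entry point: reject expired or forged sessions with 401.
export function onRequest(req: { headers: Record<string, string> }) {
  const token = req.headers["x-session-token"] ?? "";
  if (!isSessionValid(token)) {
    return { status: 401, body: "session expired or invalid" };
  }
}
```

The expiry is baked into the signed token, so the hook needs no session store; revocation before expiry would require a denylist or a rotated secret.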

| Model | Best for | Notes |
| --- | --- | --- |
| Fixed upfront per request | Most LLM streaming APIs | Easiest to reason about and document |
| Post-hoc exact usage | Advanced billing systems | Requires custom metering + settlement logic outside simple route pricing |
| Time-window sessions | Live feeds, dashboards, room streams | Better reconnect UX and fewer payment prompts |

For most teams, start with fixed upfront pricing, then move to session windows only when reconnect behavior becomes a UX problem.

## Troubleshooting

  • Disable proxy buffering (`proxy_buffering off;`) in Nginx.
  • Ensure Cache-Control: no-cache and Connection: keep-alive headers are preserved.
  • Increase upstream/proxy read timeouts for long-lived responses.
  • Check upstream logs first; many providers close idle streams.
  • Increase route/upstream timeout.
  • Add client auto-reconnect with idempotency keys to avoid duplicate work.
  • Reuse signed payment headers only within their validity window.
  • Ensure client retries the same request payload after a 402.
  • If reconnect frequency is high, move to a time-window session model.
  • Verify CDN/WAF is not buffering or transforming SSE.
  • Confirm HTTP/1.1 keep-alive behavior across load balancers.
  • Compare with the production proxy guidance in Production (VPS).
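
The Nginx-related fixes above might look like the following. This is a sketch, not a drop-in config: the location path, port, and timeout values are assumptions to adapt to your deployment.

```nginx
# Proxy SSE routes through tollbooth (assumed to listen on port 3000).
location /session/ {
    proxy_pass http://127.0.0.1:3000;
    proxy_http_version 1.1;          # keep-alive for long-lived streams
    proxy_set_header Connection "";  # do not forward "Connection: close"
    proxy_buffering off;             # deliver SSE chunks immediately
    proxy_cache off;
    proxy_read_timeout 900s;         # match long stream windows
}
```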

Next: Refund Protection →