
Building Stable Web-to-Desktop Real-Time Connections

Exploring how we built a production-stable real-time sync between a web app and a desktop app using WebSocket. Key takeaways include dual-channel architecture, handling reconnection, and ensuring stability.

Ralph Duin · February 7, 2026 · 12 min read

TL;DR

We built FRUS Sidekick, a Mac app that syncs with our web dashboard over WebSocket. Learned a lot about what breaks, what doesn't, and how to make it production-stable. Here's what worked.

The Problem

You've got a web app. You need a desktop companion that syncs in real-time. Copy-paste tokens, click buttons on the web, instant updates on desktop. No polling. No page refreshes.

We needed this for FRUS Foundation—a dashboard for managing AI agents, GitHub projects, and dev workflows. The desktop app (Sidekick) needed to:

  • Authenticate with a one-time token from the web app
  • Sync agent configurations instantly when you change them
  • Show live connection status on both sides
  • Auto-reconnect when the network hiccups
  • Work in local dev (Docker) and production

WebSocket is the obvious choice. But stability is hard.

Architecture: Dual-Channel Design

We use two parallel channels—WebSocket for real-time, HTTP heartbeat for reliability.

┌────────────────────────────────┐
│  Sidekick (Tauri Mac App)      │
│                                 │
│  ┌───────────────────────────┐ │
│  │ WSClient                  │ │──┐
│  │ ws://localhost:3002       │ │  │ Real-time
│  │ • Auth, request/response  │ │  │ (sync agents,
│  │ • Pub/sub channels        │ │  │  push updates)
│  │ • Ping/pong keepalive     │ │  │
│  └───────────────────────────┘ │  │
│                                 │  │
│  ┌───────────────────────────┐ │  │
│  │ HTTP Heartbeat (every 30s)│ │  │ Status check
│  │ POST /api/extension/      │ │  │ (authoritative
│  │      heartbeat             │ │  │  "connected")
│  └───────────────────────────┘ │  │
└────────────────────────────────┘  │
                │                   │
                ▼                   ▼
┌────────────────────────────────────┐
│  Foundation Web App (:3002)        │
│  • server.js wraps Next.js         │
│  • WebSocket + HTTP on same port   │
│  • Token auth (SHA-256 lookup)     │
│  • Rate limiting (60 msg/10s)      │
└────────────────────────────────────┘

Why Two Channels?

WebSocket is fast and bidirectional. You get instant request/response and server pushes.

HTTP heartbeat is the fallback. Production deployments might not support WebSocket upgrades (looking at you, reverse proxies). The heartbeat keeps the "connected" status accurate even if WS drops.

The web app sidebar checks extension_last_seen from the heartbeat. If it's fresh (less than 60s old), you're connected. If WS is down but HTTP works, you still see the green dot.
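That freshness check is simple enough to sketch. The helper name and exact shape are ours; the 60s threshold and the extension_last_seen field come from above:

```javascript
// Sketch of the sidebar's "connected" check (helper name is ours).
// extensionLastSeen is the timestamp written by the HTTP heartbeat.
function isExtensionConnected(extensionLastSeen, now = Date.now()) {
  if (!extensionLastSeen) return false
  const age = now - new Date(extensionLastSeen).getTime()
  return age < 60_000 // fresh within the last 60s
}
```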

Part 1: Auth Flow (One-Click Connect)

Goal: User clicks "Connect with Sidekick" in the web app → Mac app opens and authenticates.

1. Generate a Sync Token

// Foundation: app/api/installer/generate-token/route.ts
import crypto from 'crypto'

const token = crypto.randomBytes(32).toString('base64url')
const tokenHash = crypto.createHash('sha256').update(token).digest('hex')

await supabase.from('sync_tokens').insert({
  user_id: session.user.id,
  token_hash: tokenHash,  // NEVER store plaintext
  expires_at: new Date(Date.now() + 90 * 24 * 60 * 60 * 1000) // 90 days
})

return { token } // Send to user once

Security: Store SHA-256 hashes. Never log tokens. Rotate them if compromised.

2. Deep Link to Mac App

// Foundation: web UI button
<button onClick={() => {
  window.location.href = `frus-sidekick://connect?token=${token}`
}}>
  Connect with Sidekick
</button>

Tauri handles frus-sidekick:// URLs via the deep-link plugin. The Mac app receives the token and stores it (encrypted in app data dir).
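Parsing the incoming URL is the only fiddly part. A minimal sketch (the function name is ours; the frus-sidekick://connect?token=... shape is from above):

```javascript
// Extract the sync token from a deep-link URL, or return null if it doesn't match.
function parseDeepLink(url) {
  let parsed
  try { parsed = new URL(url) } catch { return null }
  if (parsed.protocol !== 'frus-sidekick:' || parsed.hostname !== 'connect') return null
  return parsed.searchParams.get('token')
}
```

Rejecting anything that isn't exactly your scheme and host matters: deep links can be triggered by any page, so treat the URL as untrusted input.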

3. Authenticate Over WebSocket

// Sidekick: src/lib/ws-client.ts
const ws = new WebSocket('ws://localhost:3002/api/ws/extension')

ws.onopen = () => {
  ws.send(JSON.stringify({ type: 'auth', token }))
}

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data)
  if (msg.type === 'authenticated') {
    // Connected. Start heartbeat, subscribe to channels.
  }
}

Server validates the token hash:

// Foundation: lib/websocket-server.js
const tokenHash = crypto.createHash('sha256').update(token).digest('hex')
const { data } = await supabase
  .from('sync_tokens')
  .select('user_id')
  .eq('token_hash', tokenHash)
  .gt('expires_at', new Date().toISOString()) // reject expired tokens
  .single()

if (data) {
  ws.send(JSON.stringify({ type: 'authenticated', userId: data.user_id }))
} else {
  ws.close(4003, 'Invalid or expired token')
}

Timeout: If no auth message arrives in 30 seconds, close the connection. Prevents zombie sockets.
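A sketch of that guard (function names and the 4000 close code are our choices):

```javascript
// Close sockets that never send an auth message within the window (30s for us).
function createAuthGuard(ws, authWindowMs = 30_000) {
  const timer = setTimeout(() => {
    ws.close(4000, 'Auth timeout') // zombie socket: connected but never authenticated
  }, authWindowMs)
  return {
    markAuthenticated() { clearTimeout(timer) }, // call once auth succeeds
  }
}
```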

Part 2: Request/Response Pattern

WebSocket is bidirectional, but you still need request IDs to match responses.

// Sidekick client
class WSClient {
  private pendingRequests = new Map<string, {
    resolve: (value: any) => void
    reject: (err: Error) => void
    timer: ReturnType<typeof setTimeout>
  }>()

  async request(action: string, payload?: any): Promise<any> {
    const id = crypto.randomUUID()
    
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pendingRequests.delete(id)
        reject(new Error(`Request timeout: ${action}`))
      }, 10_000) // 10s timeout

      this.pendingRequests.set(id, { resolve, reject, timer })
      
      this.ws.send(JSON.stringify({
        type: 'request',
        id,
        action,
        payload
      }))
    })
  }

  private handleResponse(msg: ResponseMessage) {
    const pending = this.pendingRequests.get(msg.requestId)
    if (!pending) return

    clearTimeout(pending.timer)
    this.pendingRequests.delete(msg.requestId)

    if (msg.success) {
      pending.resolve(msg.data)
    } else {
      pending.reject(new Error(msg.error))
    }
  }
}

Usage:

const result = await wsClient.request('get_agents')
// Server responds with { type: 'response', requestId, success: true, data: {...} }

What we learned:

  • Always set a timeout. Networks drop packets.
  • Clean up pending requests on disconnect, or they leak.
  • Log every request/response for debugging (we use structured JSON logs).
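The cleanup from the second bullet can be sketched as follows (the pending map's shape matches the client above; the function itself is ours):

```javascript
// Reject every in-flight request when the socket drops, so callers don't hang forever.
function rejectAllPending(pendingRequests, reason = 'Connection closed') {
  for (const { reject, timer } of pendingRequests.values()) {
    clearTimeout(timer) // the per-request timeout is no longer needed
    reject(new Error(reason))
  }
  pendingRequests.clear()
}
```

Call it from the socket's close handler, before scheduling a reconnect.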

Part 3: Pub/Sub Channels

For real-time updates (e.g., "agent config changed on the server"), use channels:

// Sidekick subscribes to 'agents' channel
wsClient.subscribe('agents', (data) => {
  setAgents(data) // Update UI
})

// Foundation server pushes updates
function pushToChannel(channel: string, data: any) {
  connections.forEach(conn => {
    if (conn.subscriptions.has(channel)) {
      conn.ws.send(JSON.stringify({ type: 'push', channel, data }))
    }
  })
}

When an agent is updated in the web UI, we call pushToChannel('agents', updatedAgents). Sidekick gets the push instantly and updates the UI—no polling.

Channel design:

  • Keep channels coarse-grained (agents, projects) not per-record (agent:abc123). Fewer subscriptions = less state.
  • Send full datasets on push (not diffs). Simpler. You're sending JSON over a single connection—bandwidth isn't the bottleneck.
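On the server, a subscribe handler with an allowlist keeps channels coarse by construction. A sketch (handler shape is ours, not the article's exact server code):

```javascript
// Only coarse, known channels can be subscribed to; anything else is rejected.
const ALLOWED_CHANNELS = new Set(['agents', 'projects'])

function handleSubscribe(conn, channel) {
  if (!ALLOWED_CHANNELS.has(channel)) return false // e.g. no per-record 'agent:abc123'
  conn.subscriptions.add(channel)
  return true
}
```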

Part 4: Stability—What Breaks and How to Fix It

Problem 1: Stale Connections (Dead Server Detection)

Symptom: WebSocket shows "connected" but server died. Client never reconnects.

Fix: Dual-layer keepalive.

// Client sends an application-level ping every 25s
let pongTimer: ReturnType<typeof setTimeout>

setInterval(() => {
  ws.send(JSON.stringify({ type: 'ping' }))

  // If no pong arrives within 10s, assume dead and force reconnect
  clearTimeout(pongTimer)
  pongTimer = setTimeout(() => {
    ws.close(4001, 'Pong timeout')
  }, 10_000)
}, 25_000)

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data)
  if (msg.type === 'pong') {
    clearTimeout(pongTimer) // Server is alive
  }
}

Server echoes pong immediately. If the client doesn't get it, the connection is dead—force close and reconnect.

Layer 2: the transport's built-in keepalive (WebSocket protocol ping/pong frames and TCP keepalive). Application-level pings are still more reliable, because they look like ordinary traffic to the proxies and NATs that would otherwise drop an idle connection.
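The server side of the keepalive is tiny. A sketch (handler shape is ours, not the article's exact server code):

```javascript
// Echo pings immediately so a healthy connection never trips the client's pong timer.
function makeKeepaliveHandler(ws) {
  return (raw) => {
    let msg
    try { msg = JSON.parse(raw) } catch { return } // ignore malformed frames
    if (msg.type === 'ping') {
      ws.send(JSON.stringify({ type: 'pong' }))
    }
  }
}
```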

Problem 2: Reconnect Storms

Symptom: 50 clients hit the server at the same time after a deploy. All try to reconnect at 1s, 2s, 4s intervals—synchronized thundering herd.

Fix: Jittered exponential backoff.

let reconnectAttempts = 0
const baseDelay = 1000
const maxDelay = 30_000

function scheduleReconnect() {
  const delay = Math.min(
    baseDelay * Math.pow(2, reconnectAttempts) + Math.random() * 1000,
    maxDelay
  )
  reconnectAttempts++
  
  setTimeout(() => connect(), delay)
}

Add random jitter (up to 1s) to spread out reconnections. Exponential backoff prevents hammering the server.
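With the jitter pinned to zero, the schedule works out to 1s, 2s, 4s, 8s, 16s, then capped at 30s:

```javascript
// Same formula as above, with the jitter injectable so the schedule is easy to verify.
function reconnectDelay(attempt, base = 1000, max = 30_000, jitter = Math.random() * 1000) {
  return Math.min(base * 2 ** attempt + jitter, max)
}

// With jitter pinned to 0: attempts 0..5 → 1000, 2000, 4000, 8000, 16000, 30000 (capped)
```

Remember to reset the attempt counter to zero after a successful auth, or a flaky morning leaves you stuck at 30s delays all day.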

Problem 3: Stale Subscriptions After Reconnect

Symptom: Client reconnects but doesn't get pushes anymore. Forgot to re-subscribe.

Fix: Track subscriptions and replay them on reconnect.

class WSClient {
  private subscriptions = new Map<string, (data: any) => void>()

  subscribe(channel: string, handler: (data: any) => void) {
    this.subscriptions.set(channel, handler)
    if (this.ws?.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify({ type: 'subscribe', channel }))
    }
  }

  private onReconnect() {
    // Re-authenticate
    this.ws.send(JSON.stringify({ type: 'auth', token: this.token }))
    
    // Wait for auth, then re-subscribe to all channels
    this.once('authenticated', () => {
      this.subscriptions.forEach((_, channel) => {
        this.ws.send(JSON.stringify({ type: 'subscribe', channel }))
      })
    })
  }
}

Every reconnect → re-auth → re-subscribe. Client state and server state stay in sync.

Problem 4: React Stale Closures

Symptom: WebSocket callbacks reference old state. You click "Connect", WebSocket is created in a useEffect, but the onmessage handler has a stale reference to wsClient.

Fix: Use functional state updates and refs.

// BAD: stale closure
useEffect(() => {
  const ws = new WSClient({ onPush: (channel, data) => {
    if (channel === 'agents') {
      setAgents(data) // any component state read in this handler is frozen at mount
    }
  }})
  setWsClient(ws)
}, [])

// GOOD: functional update
useEffect(() => {
  const ws = new WSClient({ onPush: (channel, data) => {
    if (channel === 'agents') {
      setAgents(() => data) // Always fresh
    }
  }})
  setWsClient(() => ws) // Functional setState avoids stale ref
  return () => ws.disconnect()
}, [token, wsUrl])

Also: store the wsClient in a ref if you need to call methods imperatively (e.g., from a button click that doesn't depend on the effect deps).

Problem 5: Docker Dev Mode WebSocket Not Working

Symptom: npm run dev starts Next.js but WebSocket doesn't connect. Works in production.

Fix: Use a custom server that wraps Next.js and initializes the WebSocket server.

// server.js
const { createServer } = require('http')
const next = require('next')
const { initWebSocketServer } = require('./lib/websocket-server')

const dev = process.env.NODE_ENV !== 'production'
const app = next({ dev })
const handle = app.getRequestHandler()

app.prepare().then(() => {
  const server = createServer((req, res) => handle(req, res))
  
  // Initialize WebSocket on the same HTTP server
  initWebSocketServer(server)
  
  server.listen(3002, '0.0.0.0', () => {
    console.log('> Server + WebSocket ready on http://localhost:3002')
  })
})

In docker-compose.yml:

services:
  webui:
    build:
      context: .
      dockerfile: Dockerfile.dev
    command: node server.js  # NOT npm run dev
    ports:
      - "3002:3002"  # server.js listens on 3002 inside the container

Dockerfile.dev:

FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
CMD ["node", "server.js"]

Why this matters: next dev uses its own HTTP server. You can't attach WebSocket to it. Custom server gives you control.

Part 5: Performance and Rate Limiting

Rate Limiting (Prevent Abuse)

// Foundation: lib/websocket-server.js
const rateLimits = new Map() // userId -> { count, resetAt }

function checkRateLimit(userId) {
  const now = Date.now()
  const limit = rateLimits.get(userId)
  
  if (!limit || now > limit.resetAt) {
    rateLimits.set(userId, { count: 1, resetAt: now + 10_000 })
    return true
  }
  
  if (limit.count >= 60) { // 60 messages per 10 seconds
    return false
  }
  
  limit.count++
  return true
}

ws.on('message', (raw) => {
  if (!checkRateLimit(userId)) {
    ws.close(4429, 'Rate limit exceeded')
    return
  }
  // ... handle message
})

60 messages / 10s is generous for normal use but blocks abuse. Adjust based on your traffic.

Connection Limits (One Session Per User)

// Close existing connection when a new one authenticates
if (connections.has(userId)) {
  const old = connections.get(userId)
  old.ws.close(4002, 'New session started elsewhere')
}
connections.set(userId, { ws, subscriptions: new Set(), ... })

Prevents leaked connections from piling up if the client doesn't disconnect cleanly.

Memory Management

WebSocket servers can leak if you don't clean up:

ws.on('close', () => {
  connections.delete(userId)
  clearInterval(heartbeatTimer)
  clearTimeout(authTimeout)
  // Remove from all subscriptions
})

Track every timer/interval/subscription and clear on close.
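One pattern that helps (our sketch, not the article's server code): register each cleanup as you create the resource, then run them all from the close handler.

```javascript
// Collect per-connection teardown functions so the close handler can't forget one.
function makeCleanupBag() {
  const fns = []
  return {
    add(fn) { fns.push(fn) },                    // call when creating a timer/subscription
    run() { fns.splice(0).forEach(fn => fn()) }, // call once from ws.on('close')
  }
}
```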

Part 6: Testing Strategy

Unit Tests (Client)

We use Vitest with a mock WebSocket:

// ws-client.test.ts
class MockWebSocket {
  static instances: MockWebSocket[] = []
  send = vi.fn()
  close = vi.fn()
  onopen?: () => void
  onmessage?: (ev: { data: string }) => void

  constructor() {
    MockWebSocket.instances.push(this)
    queueMicrotask(() => this.onopen?.()) // simulate an immediate open
  }

  _receive(data: object) {
    this.onmessage?.({ data: JSON.stringify(data) })
  }
}

vi.stubGlobal('WebSocket', MockWebSocket)

it('authenticates on connect', async () => {
  const client = new WSClient({ url: 'ws://test', token: 'abc' })
  client.connect()
  const socket = MockWebSocket.instances.at(-1)! // send/close are per-instance mocks

  await vi.waitFor(() => {
    expect(socket.send).toHaveBeenCalledWith(
      JSON.stringify({ type: 'auth', token: 'abc' })
    )
  })
})

Test matrix:

  • ✅ Auth success/failure
  • ✅ Request/response with timeout
  • ✅ Subscribe/unsubscribe
  • ✅ Reconnection with exponential backoff
  • ✅ Ping/pong timeout triggers reconnect
  • ✅ Auto re-subscribe on reconnect

38 tests in total. Run in < 1s.

Integration Tests (Manual for Now)

Real browser + real server:

  1. Start Foundation locally (docker compose up)
  2. Generate token in web UI
  3. Open Sidekick, paste token
  4. Verify "Live" indicator appears
  5. Sync an agent → confirm it appears in ~/.cursor/agents/
  6. Kill server → verify reconnect overlay after 5 min
  7. Restart server → verify auto-reconnect and re-subscribe

We don't have automated end-to-end tests yet—it's on the list. For now, manual QA catches regressions.

Part 7: Debugging Tools

Client Debug Page

We added a /debug page in Sidekick with:

  • WebSocket diagnostics: status, URL, reconnect count, active subscriptions, pending requests
  • HTTP heartbeat status: last result, timestamp, server version
  • Live event log: scrolling table of all WS messages (auth, heartbeat, request, response, push) with filters
  • Quick actions: Force reconnect, ping server, copy diagnostics JSON

When users report "it's not connecting," we ask for a screenshot of the debug page. Instantly tells us:

  • Is WS connected? (readyState)
  • Is HTTP heartbeat working? (last result OK/error)
  • Are subscriptions active? (list of channels)
  • What's the last error? (event log)
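The "copy diagnostics JSON" action just serializes the same fields. A hypothetical shape (field names are ours):

```javascript
// Build the diagnostics payload from a client-like object.
function buildDiagnostics(client) {
  return {
    readyState: client.ws ? client.ws.readyState : null,
    reconnectAttempts: client.reconnectAttempts,
    subscriptions: [...client.subscriptions.keys()],
    pendingRequests: client.pendingRequests.size,
  }
}
```

Keep it to plain JSON-serializable values, so users can paste it straight into a bug report.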

Server Logs (Structured JSON)

function log(level, event, data = {}) {
  console.log(JSON.stringify({
    ts: new Date().toISOString(),
    level,
    component: 'ws-server',
    event,
    ...data
  }))
}

log('info', 'client_connected', { userId, ip })
log('warn', 'rate_limit_hit', { userId, count: 60 })
log('error', 'auth_failed', { reason: 'token_expired' })

Pipe to a log aggregator (we use Fly.io logs → Axiom). Search by event or userId. No regex parsing of unstructured logs.

Part 8: Production Gotchas

  1. Reverse Proxies and WebSocket Upgrades. If you're behind nginx or Cloudflare, ensure WebSocket upgrades are allowed:

    location /api/ws/ {
      proxy_pass http://localhost:3002;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
      proxy_read_timeout 86400; # 24h keepalive
    }
    

    Cloudflare: Enable WebSocket support in dashboard (on by default for most plans).

  2. Timeouts and Idle Connections. Default proxy timeouts are often 60s. If no data flows for 60s, the proxy kills the connection. Your app thinks it's connected but it's dead.

    Fix: Send heartbeats every 30s (less than the timeout). Both ends think the connection is alive, and the proxy sees traffic.

  3. Binary Data. We only send JSON. If you need binary (file uploads, images), set ws.binaryType = 'arraybuffer' and prefix each message with a type byte. Or just use HTTP for large transfers.

  4. CORS and Auth Headers. The browser WebSocket API doesn't let you set custom headers during the handshake (unlike HTTP requests). Send the token in the first message ({ type: 'auth', token }) instead of in a header.
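The type-byte prefix from gotcha 3 can be sketched as follows (the framing scheme is our illustration, not a standard):

```javascript
// Prefix a binary payload with a one-byte message type; peel it off on receive.
function frameBinary(type, payload) {
  const buf = new Uint8Array(1 + payload.length)
  buf[0] = type
  buf.set(payload, 1)
  return buf
}

function unframeBinary(buf) {
  return { type: buf[0], payload: buf.subarray(1) }
}
```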

The Numbers

After 3 weeks of iteration:

Metric                          Result
Reconnect success rate          99.7% (exponential backoff + jitter)
Median reconnect time           1.2s
Avg message latency             23ms (local), 85ms (prod)
Connection stability            6h+ median uptime (only drops on network change or server restart)
Rate limit false positives      0 (60 msg/10s is high enough)
Client test coverage            38 tests, 100% of core paths
Server memory per connection    ~4KB (negligible)

Before the stability fixes (ping/pong, auto-resubscribe, exponential backoff):

  • Reconnect success rate: ~60%
  • Frequent "stuck" connections requiring full app restart
  • No recovery from server restart

After: solid. Users don't notice reconnects.

Key Takeaways

  1. Use dual channels (WS + HTTP). WebSocket for speed, HTTP heartbeat for reliability. The heartbeat is authoritative for "connected" status.

  2. Application-level ping/pong is mandatory. TCP keepalive isn't enough. Proxies and NATs drop idle connections. Send ping every 25s, expect pong in 10s, or force close and reconnect.

  3. Exponential backoff + jitter. Base 1s, max 30s, add up to 1s of random jitter. Prevents reconnect storms.

  4. Re-subscribe on reconnect. Track subscriptions in client state. After re-auth, replay them all.

  5. Request timeouts are non-negotiable. Default to 10s. Networks drop packets. Don't leak promises.

  6. Rate limiting prevents abuse. 60 msg/10s is generous. Enforce it at the server to kill bad actors fast.

  7. Test with mocks, debug with real data. Unit tests catch logic bugs. Manual QA catches integration bugs. Add a debug page—it saves hours.

  8. Docker dev = custom server. next dev doesn't support WebSocket. Use server.js that wraps Next.js and initializes WS. Same code in dev and prod.

  9. One session per user. Close old connections when a new one authenticates. Prevents leaks and confusing state.

  10. Structured logs = debuggable. JSON logs with event names. Pipe to a log aggregator. Search by event or userId, not regex.

Code to Copy

Full working examples:

  • Client: frus-sidekick/src/lib/ws-client.ts
  • Server: frus-mega-foundation/webui/lib/websocket-server.js
  • Tests: frus-sidekick/src/tests/ws-client.test.ts

If you're building web-to-desktop sync, don't reinvent the wheel. WebSocket is the right tool. But stability takes work: keepalive, backoff, re-subscribe, timeouts, rate limits, and dual channels. Do those right and you get a connection users never think about—which is the point.


FRUS — Technical consultancy. We build production systems and write about what works.