Chapter 68: Real-Time, Collaboration, and Multiplayer State

Real-time and collaborative applications are one of the categories that genuinely benefit from richer client-side architecture than the platform-first defaults provide.

Figma. Linear. Notion. Google Docs. Slack. Microsoft Teams. Discord. Replit. Each of these has multiple users interacting with shared state, with updates flowing between them in something close to real time, with conflict resolution when simultaneous edits collide. The architectural problems are non-trivial — the application has to maintain consistent state across many clients, handle network failures and reconnections, resolve conflicts predictably, and present a coherent UI to each user.

This chapter walks through how real-time and collaborative architectures fit into the Kit model. The platform provides the network primitives (WebSockets, Server-Sent Events, WebRTC, BroadcastChannel). The conflict-resolution algorithms (Operational Transforms and CRDTs) live in libraries that integrate as modules. Integration follows the same patterns as the rest of Part VII: the Kit runtime coordinates, and the real-time module handles its specific responsibilities.

The chapter is part inventory of the technology and part architectural guidance for how to integrate.

The Network Primitives

The platform provides four primitives for real-time work, each with a specific shape.

WebSockets (new WebSocket(url)) give the application a full-duplex connection to a server. Messages flow in both directions over a single long-lived connection, so the server can push updates as they happen instead of the client polling. Each message is either text or binary, and no message format is imposed; applications layer their own protocol on top.
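A minimal sketch of such a layered protocol, assuming JSON over text frames. The message names (`join`, `op`, `presence`) and their fields are hypothetical, chosen for illustration rather than taken from any standard. The decoder rejects frames it does not recognize, which matters because a WebSocket delivers whatever the other side sends.

```typescript
// Hypothetical application protocol carried in WebSocket text frames.
type ClientMessage =
  | { type: 'join'; docId: string }
  | { type: 'op'; docId: string; opId: string; payload: unknown }
  | { type: 'presence'; docId: string; cursor: { x: number; y: number } }

const KNOWN_TYPES = new Set<string>(['join', 'op', 'presence'])

function encodeMessage(msg: ClientMessage): string {
  return JSON.stringify(msg)
}

// Parse an incoming frame; return null for anything that is not a known message.
function decodeMessage(frame: string): ClientMessage | null {
  let parsed: unknown
  try {
    parsed = JSON.parse(frame)
  } catch {
    return null // not JSON
  }
  if (typeof parsed !== 'object' || parsed === null) return null
  const { type, docId } = parsed as { type?: unknown; docId?: unknown }
  // Shallow check of the discriminant and docId only; a production decoder
  // would validate every field (or use a schema library).
  if (typeof type !== 'string' || !KNOWN_TYPES.has(type) || typeof docId !== 'string') {
    return null
  }
  return parsed as ClientMessage
}

// Browser usage (not run here):
//   const socket = new WebSocket('wss://realtime.example.com')
//   socket.onopen = () => socket.send(encodeMessage({ type: 'join', docId: 'doc-1' }))
//   socket.onmessage = (event) => { const msg = decodeMessage(String(event.data)) }
```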

WebSockets have been the standard real-time primitive since they shipped widely around 2012. The protocol is well-supported, well-understood, and integrates with most server-side runtimes. The trade-offs are familiar: connections can drop and have to be reconnected; scaling requires server-side infrastructure (sticky sessions, message brokers, fan-out architecture); and the protocol has no built-in primitives for the kinds of messages real-time applications send (state diffs, presence updates, acknowledgements, ordering that survives a reconnect).
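Reconnection is the trade-off every WebSocket client ends up handling. One common approach, sketched here under stated assumptions, is exponential backoff with random jitter. The base delay, the cap, and the `keepConnected` helper are illustrative choices, not part of the WebSocket API.

```typescript
// Delay before reconnect attempt `attempt` (0-based): exponential growth capped
// at maxMs, with "full jitter" (a random value up to the ceiling).
// The default constants are assumptions; tune them for your servers.
function reconnectDelay(
  attempt: number,
  baseMs = 500,
  maxMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt)
  return Math.floor(random() * ceiling)
}

interface ConnectionHandlers {
  onOpen: () => void
  onClose: () => void
}

// Reopen whenever the connection closes; a successful open resets the backoff.
// `open` creates a socket and wires the two handlers to it.
function keepConnected(
  open: (handlers: ConnectionHandlers) => void,
  schedule: (fn: () => void, ms: number) => unknown = setTimeout,
): void {
  let attempt = 0
  const connect = () => {
    open({
      onOpen: () => { attempt = 0 },
      onClose: () => { schedule(connect, reconnectDelay(attempt++)) },
    })
  }
  connect()
}

// Browser usage (not run here):
//   keepConnected(({ onOpen, onClose }) => {
//     const socket = new WebSocket('wss://realtime.example.com')
//     socket.onopen = onOpen
//     socket.onclose = onClose
//   })
```

The jitter is the design choice that matters at scale: without it, every client disconnected by the same server restart retries on the same schedule and arrives in synchronized waves.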

Server-Sent Events (new EventSource(url)) are a one-way, HTTP-based alternative: the server pushes and the client receives. The protocol is simpler than WebSockets (it is an ordinary HTTP response with the text/event-stream content type that never ends), works over standard HTTP infrastructure, and reconnects automatically. For applications where only the server-to-client direction matters (live activity feeds, server-sent notifications), SSE is often simpler than WebSockets.
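To make the never-ending response concrete, here is a sketch of the server side of the wire format: each event is a few `field: value` lines followed by a blank line. The `formatSseEvent` helper is hypothetical; the field names (`event`, `id`, `retry`, `data`) are the ones the text/event-stream format defines.

```typescript
interface SseEvent {
  data: string
  event?: string // custom event name; clients listen with addEventListener(name, ...)
  id?: string    // echoed back by the browser on reconnect
  retry?: number // reconnection delay hint, in milliseconds
}

// Format one event. Assumes `event` and `id` contain no line breaks.
// Multi-line data is split so that each line gets its own "data:" field.
function formatSseEvent({ data, event, id, retry }: SseEvent): string {
  let out = ''
  if (event !== undefined) out += `event: ${event}\n`
  if (id !== undefined) out += `id: ${id}\n`
  if (retry !== undefined) out += `retry: ${retry}\n`
  for (const line of data.split(/\r\n|\r|\n/)) out += `data: ${line}\n`
  return out + '\n' // a blank line ends the event
}
```

A server writes these strings to a response sent with `Content-Type: text/event-stream`, and a browser reads them with `new EventSource(url)`. When the connection drops, EventSource reconnects and sends the last `id` it saw in a `Last-Event-ID` request header, which lets the server resume without gaps.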

WebRTC (Chapter 32) provides peer-to-peer connections. After an initial signaling handshake, browsers usually talk to each other directly; when NAT traversal fails, traffic is relayed through a TURN server. The data channel can carry arbitrary application data with low latency. WebRTC is useful when server bandwidth costs matter (most messages never pass through a server) or when latency-sensitive coordination between peers is required (multiplayer games, collaborative editing where every keystroke matters).

BroadcastChannel (Chapter 25) is the in-browser version of the same pattern. Tabs of the same origin can communicate without going through the network. For cross-tab synchronization of state, BroadcastChannel is simpler and faster than WebSockets.

Most real-time applications use some combination — WebSockets for client-server, BroadcastChannel for cross-tab, occasionally WebRTC for direct peer communication. The Kit architecture treats each as a module’s implementation detail; the application’s components don’t care which network primitive carries which message.
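One way to keep the transport an implementation detail is a small interface that the module implements and everything else depends on. The sketch below assumes a hypothetical `Transport` interface and an in-memory hub that mimics BroadcastChannel delivery (every other endpoint receives a message; the sender does not). A real module would implement the same interface over `WebSocket.send`/`onmessage` or `BroadcastChannel.postMessage`/`onmessage`.

```typescript
// Hypothetical boundary between a real-time module and the rest of the app.
interface Transport {
  send(message: string): void
  // Register a handler for incoming messages; returns an unsubscribe function.
  onMessage(handler: (message: string) => void): () => void
}

// In-memory hub with BroadcastChannel-like semantics, useful as a test double.
function createHub() {
  const endpoints = new Set<Set<(message: string) => void>>()
  return {
    connect(): Transport {
      const handlers = new Set<(message: string) => void>()
      endpoints.add(handlers)
      return {
        send(message) {
          // Deliver to every other endpoint, never back to the sender
          for (const other of endpoints) {
            if (other !== handlers) other.forEach((handler) => handler(message))
          }
        },
        onMessage(handler) {
          handlers.add(handler)
          return () => { handlers.delete(handler) }
        },
      }
    },
  }
}
```

Because components only see `Transport`, swapping the hub for a WebSocket-backed implementation changes nothing above the module boundary.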

Operational Transforms

Google Docs popularized one approach to real-time collaborative editing: Operational Transforms (OT).

The pattern, simplified: every user’s edit is represented as an operation (an insertion, a deletion, a formatting change). Operations are sent to a central server. The server orders the operations, transforms concurrent operations against each other to resolve conflicts, and broadcasts the resolved operations to all clients. Each client applies the transformed operations to its local state.
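The core of OT is the transform function. A toy version for plain text shows the idea, assuming only insertions and a numeric site id to break ties (both simplifications chosen for illustration): each client applies its own operation immediately, then applies the other client's operation after shifting its position past anything inserted earlier in the document.

```typescript
// Toy OT for plain text: insertions only. Real OT also has to transform
// deletions and formatting against each other, which is where bugs hide.
interface Insert {
  pos: number  // index in the document state the op was created against
  text: string
  site: number // client id, used only to break ties deterministically
}

function applyInsert(doc: string, op: Insert): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos)
}

// Rewrite `op` so it can be applied after `other`, given that both were
// created concurrently against the same document state.
function transformInsert(op: Insert, other: Insert): Insert {
  const otherLandsFirst =
    other.pos < op.pos || (other.pos === op.pos && other.site < op.site)
  return otherLandsFirst ? { ...op, pos: op.pos + other.text.length } : op
}

// Each client applies its own op, then the transformed remote op:
//   client 1: applyInsert(applyInsert(doc, a), transformInsert(b, a))
//   client 2: applyInsert(applyInsert(doc, b), transformInsert(a, b))
// Both orders produce the same string.
```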

OT has been used in production for over a decade by Google Docs, Microsoft Office Online, Quill, and many other collaborative editors. The pattern works, but the math is subtle. Designing a correct transform function for a non-trivial document model is hard. Implementations have routinely had bugs — operations that, when transformed in certain orders, produce inconsistent state across clients. Google’s internal OT implementation (Wave, then Docs) is one of the longest-running engineering investments in this kind of code.

For applications building on OT, the main libraries are ShareDB (the open-source successor to ShareJS), ot.js (a simpler implementation, now unmaintained), and the proprietary engines inside Google Docs and Office. Adopting OT means committing to one of these libraries and the operational model it imposes.

CRDTs: The Modern Alternative

Conflict-free Replicated Data Types (CRDTs) are the more recent alternative.

The pattern: every piece of state lives in a data structure whose merge operation is commutative, associative, and idempotent. Two clients can make changes independently; merging the changes produces the same result regardless of order, and merging the same change twice has no extra effect. No central authority is required to resolve conflicts.
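A last-writer-wins (LWW) register is one of the simplest state-based CRDTs and makes those three properties concrete. This sketch assumes each write is stamped with a timestamp and a unique replica id; the names are illustrative.

```typescript
interface LwwRegister<T> {
  value: T
  timestamp: number // e.g. a logical or hybrid clock; raw wall clocks can skew
  replica: string   // unique per replica, used to break timestamp ties
}

function isNewer<T>(a: LwwRegister<T>, b: LwwRegister<T>): boolean {
  return a.timestamp > b.timestamp ||
    (a.timestamp === b.timestamp && a.replica > b.replica)
}

// Keep the write with the larger (timestamp, replica) stamp. Because that is
// a total order, merge is commutative, associative, and idempotent.
function mergeLww<T>(a: LwwRegister<T>, b: LwwRegister<T>): LwwRegister<T> {
  return isNewer(b, a) ? b : a
}
```

Note what "conflict-free" means here: merging always converges, but one of two concurrent writes is discarded. Richer CRDTs, such as Yjs's text type, exist so that concurrent edits can both survive.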

The CRDT approach has theoretical advantages over OT — no central server is structurally necessary, peer-to-peer architectures are straightforward, the merge function is mechanical rather than requiring careful per-operation transforms. The trade-off is that CRDT data structures are more complex than the underlying state they represent (each value carries metadata about its causality), and the memory overhead is non-trivial for large documents.

The most-used CRDT library in the JavaScript ecosystem is Yjs, written by Kevin Jahns starting in 2014. Yjs provides CRDT-backed data types — Y.Map, Y.Array, Y.Text, Y.XmlFragment — that the application uses as its state. Edits to a Yjs document produce update messages that can be sent over any transport (WebSockets, WebRTC, BroadcastChannel, file sync, anything). The receiving end applies the update; the local state converges to the same value as the sender’s.

Yjs has been adopted by Notion (replacing Notion’s earlier OT implementation), Atlassian (Confluence’s collaborative editor), JupyterLab, BlockSuite (used by AFFiNE), and many smaller real-time applications. The library is efficient (the implementation has been optimized over years), small (under 80 KB minified + gzipped for the core), and reliable.

A second, newer CRDT library worth knowing about is Automerge, originally by Martin Kleppmann and his collaborators at Cambridge. Automerge’s design emphasizes local-first applications — applications where the user’s data lives on their own device and syncs to other devices peer-to-peer, with no central server. Local-first is a smaller market than collaborative-with-server applications, but the approach is principled and the library is mature.

Jahns and Kleppmann are both worth knowing about as researchers and engineers. Their work has made real-time collaboration substantially more tractable for application teams than it was a decade ago.

Liveblocks and Hosted Real-Time

For teams that don’t want to operate WebSocket infrastructure, hosted services provide the server-side piece.

Liveblocks is the most prominent of these. The service provides a hosted WebSocket layer with built-in primitives for presence (showing which users are in a document), storage (shared state with CRDT-backed conflict resolution), and Yjs hosting (running Yjs documents on Liveblocks’ infrastructure). The client SDK integrates with React, Vue, Svelte, and (with effort) Lit.

Other hosted real-time services include Ably, Pusher, PubNub, Cloudflare Durable Objects (more infrastructure than a finished real-time SDK, but commonly used for real-time work), and the various low-code platforms that include real-time features.

The trade-off with hosted services is the supply-chain consideration (Chapter 17) — the service is now a dependency, with its own pricing model, its own outage profile, its own deprecation risk. For applications where real-time is core, hosting it yourself (WebSocket server + message broker) may be the right answer. For applications where real-time is one feature among many, the hosted service often makes sense — the team doesn’t have to operate WebSocket infrastructure, scale message brokers, or build the presence and conflict-resolution machinery.

Figma’s Multiplayer Architecture

This case study is worth naming because the architecture has been described publicly.

Figma’s real-time editor was one of the most ambitious real-time applications of the past decade. The team published "How Figma’s multiplayer technology works" (Evan Wallace, 2019), a detailed account of the architecture and the engineering trade-offs.

The summary: Figma uses a CRDT-like approach with server-mediated message ordering. Every edit produces an operation. Operations are sent to the server. The server assigns each operation a monotonically increasing sequence number and broadcasts to all connected clients. Clients apply operations in sequence order, with local optimistic updates and reconciliation when the server’s ordering differs.

The architecture is somewhere between pure CRDTs and OT — the server provides ordering (which pure CRDTs don’t require), but the operations themselves are CRDT-like (mergeable in the right ways). Figma’s writeup is one of the clearest production accounts of these architectural decisions and is worth reading for anyone designing a real-time application.
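The ordering scheme described above can be sketched in a few dozen lines. This is a hypothetical illustration, not Figma's code: it assumes per-key last-writer-wins values (keys like `rect1.fill`), a server that delivers changes to each client in sequence order, and an in-process server standing in for the WebSocket. The key move is that a client ignores remote changes to any key it has an unacknowledged write for, because the server will sequence its own write later.

```typescript
interface Change {
  seq: number // assigned by the server, strictly increasing
  clientId: string
  opId: string // unique per local edit
  key: string // e.g. `${objectId}.${property}`
  value: unknown
}

// The order in which changes reach the server is the authoritative order.
class OrderingServer {
  private seq = 0
  private clients: Array<(change: Change) => void> = []

  connect(deliver: (change: Change) => void) {
    this.clients.push(deliver)
  }

  submit(change: Omit<Change, 'seq'>) {
    const ordered = { ...change, seq: ++this.seq }
    this.clients.forEach((deliver) => deliver(ordered))
  }
}

class OrderedClient {
  readonly state = new Map<string, unknown>()
  private readonly clientId: string
  private pending = new Map<string, string>() // key -> opId of latest unacked write
  private outbox: Array<Omit<Change, 'seq'>> = []
  private nextOp = 0

  constructor(clientId: string) {
    this.clientId = clientId
  }

  // Optimistic local write: visible immediately, sent on the next flush.
  set(key: string, value: unknown) {
    const opId = `${this.clientId}:${this.nextOp++}`
    this.state.set(key, value)
    this.pending.set(key, opId)
    this.outbox.push({ clientId: this.clientId, opId, key, value })
  }

  flush(server: OrderingServer) {
    this.outbox.splice(0).forEach((change) => server.submit(change))
  }

  receive(change: Change) {
    if (change.clientId === this.clientId) {
      // Our own change came back: acknowledged, unless a newer local write exists.
      if (this.pending.get(change.key) === change.opId) this.pending.delete(change.key)
      return
    }
    // A remote change that conflicts with an unacked local write is discarded:
    // our write will be sequenced after it, so ours wins everywhere.
    if (this.pending.has(change.key)) return
    this.state.set(change.key, change.value)
  }
}
```

Because every client accepts or discards changes according to the same server order, they converge without transform functions; the price is that nothing gets ordered while the server is unreachable.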

Figma’s desktop app is also an Electron application (Chapter 18), a wrapped version of the web application. The architecture composes cleanly: the same WebSocket protocol, the same CRDT-like operations, the same client code. The web and desktop versions share the implementation.

How Real-Time Fits Kit

The Kit architecture accommodates real-time work through modules that own the real-time concerns.

A typical real-time module:

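import * as Y from 'yjs'
import { WebsocketProvider } from 'y-websocket'

// defineKitModule, REALTIME_DOC, and currentDocId (the ID of the open
// document) come from the application's Kit setup, not from Yjs.
const realtimeDocModule = defineKitModule({
  name: 'realtime-doc',
  providers: [
    // Placeholder until onInstall creates the live document
    { token: REALTIME_DOC, value: null as Y.Doc | null }
  ],
  onInstall: async ({ runtime }) => {
    const doc = new Y.Doc()
    // Connects to the server and keeps `doc` in sync with the named room
    const provider = new WebsocketProvider(
      'wss://realtime.example.com',
      `doc-${currentDocId}`,
      doc
    )

    // Replace the placeholder with the live doc
    runtime.provide(REALTIME_DOC, doc)

    // Report every document update on the runtime's event bus. Updates the
    // provider applied (origin === provider) came in through it, from the
    // server or another tab; all others were made locally and the provider
    // broadcasts them.
    doc.on('update', (update, origin) => {
      if (origin === provider) {
        runtime.emit({
          type: 'realtime.update_received',
          payload: { from: 'remote', size: update.byteLength }
        })
      } else {
        runtime.emit({
          type: 'realtime.update_sent',
          payload: { size: update.byteLength }
        })
      }
    })

    // Surface connection state changes reported by the provider
    provider.on('status', (event) => {
      runtime.emit({
        type: 'realtime.connection_changed',
        payload: { status: event.status }
      })
    })
  }
})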

The module owns the Yjs document, the WebSocket connection, the bridge to the runtime’s event bus. Components that need to read or write the collaborative state inject the REALTIME_DOC provider:

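import { LitElement, html } from 'lit'
import type * as Y from 'yjs'

// `runtime` and REALTIME_DOC come from the application's Kit setup
class CollaborativeTextEditor extends LitElement {
  private observed?: Y.Text
  private readonly onTextChange = () => this.requestUpdate()

  private get doc() {
    return runtime.inject(REALTIME_DOC)
  }

  connectedCallback() {
    super.connectedCallback()
    // Re-render on every change to the text, local or remote
    this.observed = this.doc?.getText('content')
    this.observed?.observe(this.onTextChange)
  }

  disconnectedCallback() {
    // Stop observing so a removed editor does not leak its listener
    this.observed?.unobserve(this.onTextChange)
    super.disconnectedCallback()
  }

  render() {
    const text = this.doc?.getText('content').toString() ?? ''
    return html`<textarea .value=${text} @input=${this.handleInput}></textarea>`
  }

  private handleInput(event: Event) {
    const doc = this.doc
    if (!doc) return
    const value = (event.target as HTMLTextAreaElement).value
    const text = doc.getText('content')
    const current = text.toString()
    // Write only the changed range (skip the common prefix and suffix), so a
    // concurrent edit elsewhere merges instead of being overwritten
    let start = 0
    while (start < current.length && start < value.length && current[start] === value[start]) start++
    let end = 0
    while (end < current.length - start && end < value.length - start &&
      current[current.length - 1 - end] === value[value.length - 1 - end]) end++
    doc.transact(() => {
      text.delete(start, current.length - start - end)
      text.insert(start, value.slice(start, value.length - end))
    })
  }
}

customElements.define('collaborative-text-editor', CollaborativeTextEditor)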

The component reads from the Yjs document. When the document changes (either from local edits or from remote updates), the component re-renders. When the user types, the component writes to the document. Yjs handles the synchronization, conflict resolution, and propagation across clients.

The architecture’s separation of concerns helps. The component knows about the document but not about the WebSocket. The module knows about the WebSocket but not about the component. The runtime routes events but doesn’t care about either. The capability layer can observe realtime.update_received or realtime.connection_changed without depending on the specific real-time technology.
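To illustrate the last point, here is a capability that depends only on the event vocabulary. The event names match the module above; the payload types and the small `EventBus` class are assumptions standing in for the Kit runtime's bus, which this sketch does not reproduce.

```typescript
type RealtimeEvent =
  | { type: 'realtime.update_received'; payload: { from: 'remote'; size: number } }
  | { type: 'realtime.update_sent'; payload: { size: number } }
  | { type: 'realtime.connection_changed'; payload: { status: string } }

// Minimal stand-in for the runtime's event bus (hypothetical).
class EventBus<E extends { type: string }> {
  private listeners = new Set<(event: E) => void>()

  emit(event: E) {
    this.listeners.forEach((listener) => listener(event))
  }

  subscribe(listener: (event: E) => void) {
    this.listeners.add(listener)
    return () => { this.listeners.delete(listener) }
  }
}

// Tracks connection health and traffic without importing Yjs or y-websocket.
function trackRealtimeHealth(bus: EventBus<RealtimeEvent>) {
  const stats = { status: 'unknown', bytesSent: 0, bytesReceived: 0 }
  const unsubscribe = bus.subscribe((event) => {
    switch (event.type) {
      case 'realtime.update_sent':
        stats.bytesSent += event.payload.size
        break
      case 'realtime.update_received':
        stats.bytesReceived += event.payload.size
        break
      case 'realtime.connection_changed':
        stats.status = event.payload.status
        break
    }
  })
  return { stats, unsubscribe }
}
```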

Presence: A Common Pattern

A specific real-time feature most collaborative applications need is presence — showing which users are currently viewing a document, what they’re doing, where their cursor is.

The pattern doesn’t require CRDTs. It’s simpler: a transient state shared across clients, with each client publishing its own state and receiving updates when others change theirs. Yjs has awareness support that handles this with a simple API. Liveblocks has built-in presence. Custom WebSocket-based implementations are straightforward.

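// Yjs awareness (inside onInstall, where provider and runtime are in scope;
// currentUser is assumed to come from the application's auth state)
const awareness = provider.awareness

// Publish this client's presence; the cursor position is a placeholder
awareness.setLocalStateField('user', {
  id: currentUser.id,
  name: currentUser.displayName,
  cursor: { x: 100, y: 200 }
})

// Fires when any client's presence state is added, updated, or removed
awareness.on('change', () => {
  const states = Array.from(awareness.getStates().values())
  runtime.emit({
    type: 'realtime.presence_updated',
    // Skip clients that have not published a user field yet
    payload: { users: states.map(s => s.user).filter(Boolean) }
  })
})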

Components subscribed to the presence event can render the other users’ cursors, avatars, and indicators. The pattern is composable; multiple presence-aware features can coexist on the same WebSocket connection.
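For the custom WebSocket route, the server-side core of presence is little more than a table of the latest state per client plus a timeout, so that a crashed tab or dropped network eventually disappears from everyone's list. The `PresenceTable` class and its 30-second default are illustrative assumptions; time is passed in explicitly to keep the logic testable.

```typescript
interface PresenceState {
  name: string
  cursor?: { x: number; y: number }
}

class PresenceTable {
  private entries = new Map<string, { state: PresenceState; lastSeen: number }>()
  private timeoutMs: number

  constructor(timeoutMs = 30_000) {
    this.timeoutMs = timeoutMs
  }

  // Called whenever a client sends its state, including plain heartbeats.
  update(clientId: string, state: PresenceState, now: number) {
    this.entries.set(clientId, { state, lastSeen: now })
  }

  // Called on an explicit disconnect, e.g. the socket's close event.
  remove(clientId: string) {
    this.entries.delete(clientId)
  }

  // Drop clients that went silent, then return everyone still present.
  snapshot(now: number): Array<{ clientId: string } & PresenceState> {
    for (const [id, entry] of this.entries) {
      if (now - entry.lastSeen > this.timeoutMs) this.entries.delete(id)
    }
    return Array.from(this.entries, ([clientId, { state }]) => ({ clientId, ...state }))
  }
}
```

The server would broadcast each snapshot (or the diff from the previous one) to connected clients, which then emit a presence event much like the Yjs version above.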

Offline-First and Local-First

A related architectural pattern is offline-first — the application works without a network connection, with sync happening when connectivity is restored.

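CRDTs are particularly well-suited to offline-first because their merge semantics handle the case where two users make changes while disconnected. When connectivity returns, the changes merge deterministically without manual conflict resolution. Deterministic is not the same as intended, though: if two users set the same field while offline, one value still wins.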

For Kit applications that want offline-first, the storage layer (Chapter 25) and the real-time module compose naturally. Local edits go to IndexedDB. The real-time module syncs when connectivity is available. Cross-tab consistency comes from BroadcastChannel, with no server round-trip. The user never sees a "connection lost" error that blocks them from working; they just see other users’ updates delayed until the network returns.
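The "syncs when connectivity is available" step is often an outbox: apply each edit locally, queue it, and drain the queue in order when the network returns. This sketch assumes a hypothetical `send` callback that reports failure while offline, and uses an in-memory array where a real application would persist the queue to IndexedDB.

```typescript
interface PendingOp {
  id: string      // unique per edit, so the server can drop duplicates
  payload: string // e.g. an encoded CRDT update
}

class Outbox {
  private queue: PendingOp[] = []
  private send: (op: PendingOp) => boolean

  // `send` returns false when the network is unavailable.
  constructor(send: (op: PendingOp) => boolean) {
    this.send = send
  }

  enqueue(op: PendingOp) {
    this.queue.push(op)
    this.flush()
  }

  // Send in order; stop at the first failure so ordering is preserved.
  flush() {
    while (this.queue.length > 0 && this.send(this.queue[0]!)) this.queue.shift()
  }

  get size() {
    return this.queue.length
  }
}
```

In a browser, `flush` would also be called from the window's `online` event. Each op carries an id because a send can succeed while its acknowledgement is lost, so the server should drop duplicates; applying the same Yjs update twice has no further effect, so with Yjs a duplicate is harmless.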

The local-first movement (the Ink & Switch lab’s writing, particularly Geoffrey Litt’s and Martin Kleppmann’s work) has been articulating these architectural patterns for several years. The patterns are still ahead of mainstream adoption, but they’re applicable today for teams that want to build them.

Trade-Offs

Real-time work is a real complexity addition. The architecture should be honest about it.

Real-time adds infrastructure. WebSocket servers, message brokers, sticky sessions, scaling considerations. The hosted services (Liveblocks, Ably) remove some of this; self-hosted real-time is a substantial engineering investment.

CRDTs have overhead. Memory and CPU costs compared to plain data structures. For documents that need to scale to thousands of operations, the CRDT implementation has to be efficient (Yjs is; some others aren’t).

Real-time UI is harder. Optimistic updates, conflict indicators, presence cursors, undo across users — each of these is a UI design challenge that doesn’t exist in single-user applications.

Testing is harder. Multi-user scenarios are harder to test than single-user ones. Network conditions, timing, and ordering all matter.

For applications that genuinely need real-time, the complexity is worth it. For applications that don’t, adding real-time without a clear product reason is over-engineering. The architectural lift is in deciding when real-time is the right answer — and applying it only there.

Bridge to Privacy and Security

The next chapter (Chapter 69) takes on production privacy, security, and data-capture concerns. Many of the same considerations apply to real-time work — the WebSocket carries user data, the CRDT updates contain content that may be sensitive, the presence information reveals user activity. The privacy and security architecture composes with the real-time architecture.

Exercise: Build a Collaborative Counter

Build the smallest possible real-time application. A shared counter visible to multiple connected users, with each user able to increment it.

Choose a real-time substrate:

Yjs with y-websocket and a local Yjs server (the package includes a minimal one).
Liveblocks using their free tier.
Custom WebSocket with a small Node server.

Build:

A realtime-counter module that establishes the connection and shares the counter state.
A <kit-counter> custom element that displays the counter and provides an increment button.
The element reads the state from the module’s provider and renders.
The increment button emits a counter.incremented event that the module observes and applies as a CRDT operation.

Open the application in three browser tabs. Verify:

Each tab sees the same counter value.
Incrementing in one tab updates the others within milliseconds.
Disconnecting one tab (offline mode, or closing it entirely) and then reconnecting or reopening it restores the current state.
Two tabs incrementing simultaneously produces the correct sum (not a lost update).
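The last check is the one a naive implementation fails. Storing the counter as a single number and writing `count + 1` is last-writer-wins: two tabs that increment from the same value both write the same result, and one increment disappears. A grow-only counter (G-Counter) avoids this by giving each replica its own slot. The sketch assumes string replica ids and plain objects rather than any particular library.

```typescript
// Each replica increments only its own slot; the value is the sum of all slots.
type GCounter = Record<string, number>

function increment(counter: GCounter, replica: string): GCounter {
  return { ...counter, [replica]: (counter[replica] ?? 0) + 1 }
}

// Merge takes the per-replica maximum, so it is commutative, associative,
// and idempotent, and concurrent increments from different replicas all survive.
function mergeCounters(a: GCounter, b: GCounter): GCounter {
  const merged: GCounter = { ...a }
  for (const [replica, count] of Object.entries(b)) {
    merged[replica] = Math.max(merged[replica] ?? 0, count)
  }
  return merged
}

function total(counter: GCounter): number {
  return Object.values(counter).reduce((sum, n) => sum + n, 0)
}
```

With Yjs, the same shape maps onto a Y.Map keyed by `doc.clientID`: each tab only ever writes its own key, so last-writer-wins on that key never loses an increment, and the displayed value is the sum of the map's values.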

Then extend:

Add presence — show how many tabs are connected.
Add a "who incremented last" indicator.
Add reconciliation if the user makes changes while offline.
Measure the network bandwidth — how many bytes per increment?

Reflect on:

How much of the work was the network setup?
How much was the CRDT operation?
How much was the UI?
If the application had a hundred concurrent users, what would scale? What would break?
If the real-time service went down, what would the application’s degraded experience look like?

The exercise is the architecture’s most ambitious composition. The Kit runtime provides the coordination layer. The real-time module owns the WebSocket and CRDT. The component reads from the module’s provider. The capability layer can observe events as needed. The whole system composes from independent pieces, each doing one thing well, and the result is a working real-time application built on the platform’s primitives.