Chapter 71: Testing Modern Frontend Architecture

A Kit application is testable at several distinct levels, and the architecture’s separation of concerns shows up directly in how testable each level is.

Components are testable as components — does the markup render correctly, do the events fire with the right data, do the accessibility properties match the expected shape. Modules are testable as modules — given an event, does the analytics module produce the expected analytics call. The full closed loop is testable as integration — does a user action trigger the right modules in the right order with the right context.

This chapter walks through each layer, with the tools the field has settled on for each. The chapter is short on prose and dense on practical guidance — what to test, what to skip, what tools to use, how the test infrastructure fits into the architecture.

The Test Pyramid for the Kit Architecture

A reasonable test mix:

Component tests (many, fast). Each Kit component has tests covering its rendering, its events, its accessibility, its form participation. The tests run in a browser-like environment (Vitest’s browser mode, Web Test Runner, Playwright’s component testing) and exercise the component in isolation. The tests are fast and easy to maintain.

Module tests (many, very fast). Each capability module has tests that verify, for given input events and commands, the module produces the right output. The tests don’t render any DOM; they exercise the module’s logic directly. Often, modules can be tested with plain unit-test frameworks.

Integration tests (some, slower). The closed loop — user clicks a button, metadata boundary observes, runtime routes, modules respond, UI updates — is testable as a single flow. The tests render real DOM, install real modules, and verify the end-to-end behavior.

End-to-end tests (few, slow). Playwright (or Cypress) drives a real browser against the deployed application. These are the most expensive tests but the closest to user reality. Use them sparingly for critical paths.

Accessibility tests (woven through). Every layer should include accessibility assertions. The component tests verify ARIA roles and keyboard behavior. The integration tests verify focus management. The end-to-end tests can run axe-core (Deque’s automated accessibility scanner) against the deployed pages.

The mix isn’t novel — the test pyramid has been a working principle since Mike Cohn’s Succeeding with Agile (2010) — but applying it to the Kit architecture has specific patterns worth surfacing.

Component Tests with Web Test Runner or Vitest Browser

Modern web component testing has converged on two tools.

Web Test Runner (the option the Lit documentation recommends for Lit components) runs tests in real browsers, launched through Playwright, Puppeteer, or a locally installed Chrome. The tests use the fixture helpers from @open-wc/testing for setup; assertions can use any standard library.

Vitest browser mode (a more recent addition) runs Vitest tests in a real browser through a provider such as Playwright or WebdriverIO. The setup is similar but integrates with Vitest’s broader test infrastructure.

For Kit components, either tool works. A kit-button test:

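import { fixture, html, expect } from '@open-wc/testing'
import './kit-button.js' // registers the <kit-button> element
import type { KitButton } from './kit-button.js' // fixture type; assumes the module exports the class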

describe('kit-button', () => {
  it('renders with default variant', async () => {
    const el = await fixture<KitButton>(html`<kit-button>Save</kit-button>`)
    expect(el.variant).to.equal('secondary')

    const innerButton = el.shadowRoot!.querySelector('button')
    expect(innerButton).to.exist
    expect(innerButton!.textContent?.trim()).to.equal('') // empty (label in slot)
  })

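  it('reflects variant to host', async () => {
    const el = await fixture<KitButton>(html`<kit-button>Save</kit-button>`)
    el.variant = 'primary' // set the property, not the attribute
    await el.updateComplete // reflection happens during the update cycle
    expect(el.getAttribute('variant')).to.equal('primary')
  })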

  it('fires click events that bubble', async () => {
    const el = await fixture<KitButton>(html`<kit-button>Save</kit-button>`)
    let clicked = false
    el.addEventListener('click', () => clicked = true)
    el.shadowRoot!.querySelector('button')!.click()
    expect(clicked).to.be.true
  })

  it('participates in forms via internals', async () => {
    const form = await fixture<HTMLFormElement>(html`
      <form>
        <kit-button name="action" value="save" type="submit">Save</kit-button>
      </form>
    `)
    let submitted = false
    form.addEventListener('submit', (e) => {
      e.preventDefault()
      submitted = true
    })

    form.querySelector('kit-button')!.shadowRoot!.querySelector('button')!.click()
    expect(submitted).to.be.true
  })

  it('blocks clicks when disabled', async () => {
    const el = await fixture<KitButton>(html`<kit-button disabled>Save</kit-button>`)
    let clicked = false
    el.addEventListener('click', () => clicked = true)
    el.shadowRoot!.querySelector('button')!.click()
    expect(clicked).to.be.false
  })

  it('has correct accessibility role', async () => {
    const el = await fixture<KitButton>(html`<kit-button>Save</kit-button>`)
    const innerButton = el.shadowRoot!.querySelector('button')
    expect(innerButton).to.exist // a native <button> supplies the button role
    expect(innerButton!.hasAttribute('role')).to.be.false // and nothing overrides it
  })
})

The tests render the component into a real DOM, exercise its behavior, and assert the results. Each test is self-contained — the fixture is fresh for each test, and there’s no global state leaking between them. The tests are fast (a few hundred milliseconds for a typical component’s suite).

The pattern extends to every Kit component. A kit-dialog test verifies open/close behavior. A kit-text-field test verifies form participation and validity. A kit-disclosure test verifies the toggle behavior. The component-level test coverage scales linearly with the component library.

Module Tests

Modules are easier to test than components because they don’t render DOM.

import { expect } from 'chai' // any Chai-compatible expect works here
import { createRuntime } from '@kitsune/core'
import { analyticsModule } from './analytics-module.js'

describe('analytics module', () => {
  it('tracks profile.saved events with full context', async () => {
    const sent: any[] = []
    const fakeProvider = {
      track: (name: string, props: any) => sent.push({ name, props })
    }

    const runtime = createRuntime()
    await runtime.install(analyticsModule({ provider: fakeProvider }))

    runtime.emit({
      type: 'profile.saved',
      context: {
        surface: 'profile-editor',
        feature: 'preferences',
        entity: { type: 'profile', id: 'user_123' }
      }
    })

    expect(sent).to.have.length(1)
    expect(sent[0].name).to.equal('Profile Saved')
    expect(sent[0].props.surface).to.equal('profile-editor')
    expect(sent[0].props.feature).to.equal('preferences')
    expect(sent[0].props.entity_id).to.equal('user_123')
  })

  it('respects private boundaries', async () => {
    const sent: any[] = []
    const fakeProvider = { track: (name: string, props: any) => sent.push({ name, props }) }

    const runtime = createRuntime()
    await runtime.install(analyticsModule({ provider: fakeProvider }))

    runtime.emit({
      type: 'payment.attempted',
      context: { private: true, surface: 'payment-form' },
      payload: { amount: 100, cardNumber: '4242424242424242' }
    })

    expect(sent).to.have.length(1)
    expect(sent[0].name).to.equal('Payment Attempted')
    expect(sent[0].props.surface).to.equal('payment-form')
    expect(sent[0].props.amount).to.be.undefined // not sent
    expect(sent[0].props.cardNumber).to.be.undefined // not sent
  })

  it('redacts known-sensitive field names', async () => {
    const sent: any[] = []
    const fakeProvider = { track: (name: string, props: any) => sent.push({ name, props }) }

    const runtime = createRuntime()
    await runtime.install(analyticsModule({ provider: fakeProvider }))

    runtime.emit({
      type: 'signup.attempted',
      payload: { email: 'test@example.com', password: 'secret123' }
    })

    expect(sent[0].props.email).to.equal('test@example.com') // ok
    expect(sent[0].props.password).to.equal('[REDACTED]') // redacted
  })
})

The tests are unit-test-shaped. They install the module into a fresh runtime, fire events, check the output. No DOM. No browser. The tests run in plain Node (or any JavaScript runtime) and are very fast — thousands of these tests can run in seconds.

The pattern is the architectural payoff of capability modularity. Each module is tested in isolation. Each module’s behavior is provable. The application’s analytics, audit, observability, and notification policies are all testable as code, not as visual-regression checks against a deployed application.
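To make the "no DOM, no browser" point concrete, here is a minimal, self-contained sketch of the same idea. Everything in it (createTinyRuntime, tinyAnalyticsModule, the SENSITIVE_FIELDS list, the name-formatting rule) is a hypothetical stand-in written for illustration, not the @kitsune/core API. It only shows that redaction policy can live in plain functions that any test runner can exercise.

```typescript
// Hypothetical, simplified stand-ins; not the @kitsune/core API.
type KitEvent = {
  type: string
  context?: { private?: boolean; surface?: string; feature?: string }
  payload?: Record<string, unknown>
}
type Provider = { track(name: string, props: Record<string, unknown>): void }
type Module = { onEvent(event: KitEvent): void }

// A tiny runtime: every installed module sees every emitted event.
function createTinyRuntime() {
  const modules: Module[] = []
  return {
    install(module: Module): void { modules.push(module) },
    emit(event: KitEvent): void { modules.forEach((m) => m.onEvent(event)) }
  }
}

// Assumed list of field names that must never reach the provider in clear text.
const SENSITIVE_FIELDS = new Set(['password', 'cardNumber'])

// 'profile.save_requested' becomes 'Profile Save Requested'
function toDisplayName(type: string): string {
  return type
    .split(/[._]/)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(' ')
}

function tinyAnalyticsModule(provider: Provider): Module {
  return {
    onEvent(event) {
      const props: Record<string, unknown> = {}
      if (event.context?.surface) props.surface = event.context.surface
      if (event.context?.feature) props.feature = event.context.feature
      // Inside a private boundary the payload is dropped entirely.
      if (!event.context?.private) {
        for (const [key, value] of Object.entries(event.payload ?? {})) {
          // Elsewhere, sensitive fields are masked and the rest pass through.
          props[key] = SENSITIVE_FIELDS.has(key) ? '[REDACTED]' : value
        }
      }
      provider.track(toDisplayName(event.type), props)
    }
  }
}

// Usage: a recording fake stands in for the analytics provider.
const sent: Array<{ name: string; props: Record<string, unknown> }> = []
const runtime = createTinyRuntime()
runtime.install(tinyAnalyticsModule({ track: (name, props) => { sent.push({ name, props }) } }))
runtime.emit({ type: 'signup.attempted', payload: { email: 'test@example.com', password: 'secret123' } })
runtime.emit({
  type: 'payment.attempted',
  context: { private: true, surface: 'payment-form' },
  payload: { amount: 100 }
})
```

After these two emits, sent holds one call with a masked password and one call that carries the surface but no payload fields. Those are the same properties the module tests above assert against the real module.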

Integration Tests: The Closed Loop

The most valuable tests cover the architecture’s closed loop — from user action through metadata observation to module response.

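import { html, expect } from '@open-wc/testing'
import { setupShell } from './test-helpers.js'
// Module factories under test; these import paths are assumed for illustration.
import { profileSaveOrchestratorModule } from './profile-save-orchestrator-module.js'
import { analyticsModule } from './analytics-module.js'
import { auditModule } from './audit-module.js'
import { notificationsModule } from './notifications-module.js'

// Recording fakes: each provider stores its calls so the test can inspect them.
// The audit provider's record() method name is an assumption.
const tracked: any[] = []
const fakeAnalytics = { tracked, track: (name: string, props: any) => tracked.push({ name, props }) }
const recorded: any[] = []
const fakeAudit = { recorded, record: (entry: any) => recorded.push(entry) }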

describe('profile save flow', () => {
  it('completes the full flow when the form submits', async () => {
    const { runtime, fixture } = await setupShell({
      modules: [
        profileSaveOrchestratorModule(),
        analyticsModule({ provider: fakeAnalytics }),
        auditModule({ provider: fakeAudit }),
        notificationsModule()
      ],
      template: html`
        <kit-boundary surface="settings-page" feature="preferences">
          <kit-boundary surface="profile-form"
                        entity-type="profile" entity-id="me">
            <form data-meta-event="profile.save_requested"
                  data-meta-prop-prevent-default="true">
              <kit-text-field name="displayName" value="Jeremy"></kit-text-field>
              <kit-button type="submit">Save</kit-button>
            </form>
          </kit-boundary>
        </kit-boundary>
      `
    })

    // Capture diagnostic trace
    const trace: any[] = []
    runtime.onDiagnostic((entry) => trace.push(entry))

    // Trigger the form submit
    const form = fixture.querySelector('form')!
    form.requestSubmit()

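    // Wait for async work: poll for the final event (up to about 1 s) rather than
    // sleeping for a fixed interval, which is slow when it passes and flaky when it doesn't
    for (let i = 0; i < 50 && !trace.some((e) => e.type === 'profile.saved'); i++) {
      await new Promise((resolve) => setTimeout(resolve, 20))
    }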

    // Verify the event flowed through
    expect(trace.some((e) => e.kind === 'event' && e.type === 'profile.save_requested'))
      .to.be.true
    expect(trace.some((e) => e.kind === 'event' && e.type === 'profile.saved'))
      .to.be.true
    expect(fakeAnalytics.tracked).to.have.length(1)
    expect(fakeAudit.recorded).to.have.length(1)
  })
})

The test renders a fixture, installs modules, triggers an interaction, asserts the resulting trace and side effects. The test exercises the full architecture — boundary collection, metadata observation, event emission, module responses — without testing each piece independently.
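Boundary collection is the step of that flow that is easiest to overlook in assertions. The sketch below models it as a pure function so the expected context can be computed by hand. The BoundaryContext shape and the merge rule (the nearest boundary wins for a repeated key, unset keys are inherited) are assumptions made for illustration; the actual runtime may resolve nested boundaries differently.

```typescript
// Hypothetical shape: the real boundary attributes and merge rules may differ.
type BoundaryContext = {
  surface?: string
  feature?: string
  entity?: { type: string; id: string }
}

// Merge boundaries from outermost to innermost. Assumed rule: the nearest
// boundary wins for a repeated key, and keys it does not set are inherited.
function collectContext(boundaries: BoundaryContext[]): BoundaryContext {
  return boundaries.reduce<BoundaryContext>(
    (merged, boundary) => ({ ...merged, ...boundary }),
    {}
  )
}

// The two nested boundaries from the template above, outermost first.
const collected = collectContext([
  { surface: 'settings-page', feature: 'preferences' },
  { surface: 'profile-form', entity: { type: 'profile', id: 'me' } }
])
```

Under that assumed rule, collected carries surface 'profile-form', feature 'preferences', and the profile entity. An integration test could compare this against the context on the emitted profile.save_requested event.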

The pattern produces tests that are stable against refactoring. A test that says the analytics module is called when the profile save flow runs keeps passing through any internal change that doesn’t affect the user-visible behavior. The component implementation can change. The module’s internal structure can change. The runtime can change. As long as the visible behavior — the analytics call happens — is preserved, the test passes.
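One way to keep trace assertions stable in this sense is to check relative order rather than compare the whole trace. The helper below is a sketch. The TraceEntry shape mirrors the kind and type fields used in the test above, while the helper name and the 'command' kind in the sample data are assumptions.

```typescript
type TraceEntry = { kind: string; type: string }

// Returns true when the expected event types appear in the trace in that
// relative order. Unrelated entries in between are ignored, so adding a new
// diagnostic entry to the runtime does not break the assertion.
function eventsOccurInOrder(trace: TraceEntry[], expected: string[]): boolean {
  let next = 0
  for (const entry of trace) {
    if (entry.kind === 'event' && entry.type === expected[next]) next++
    if (next === expected.length) return true
  }
  return expected.length === 0
}

// Sample trace; the 'command' entry is illustrative.
const sampleTrace: TraceEntry[] = [
  { kind: 'event', type: 'profile.save_requested' },
  { kind: 'command', type: 'profile.save' },
  { kind: 'event', type: 'profile.saved' }
]

const inOrder = eventsOccurInOrder(sampleTrace, ['profile.save_requested', 'profile.saved'])
const reversed = eventsOccurInOrder(sampleTrace, ['profile.saved', 'profile.save_requested'])
```

Here inOrder is true and reversed is false. The intermediate command entry does not affect either result, which is the property that lets internal steps change without touching the test.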

Playwright for End-to-End

For the highest-fidelity tests, Playwright drives a real browser against the deployed application.

import { test, expect } from '@playwright/test'
import AxeBuilder from '@axe-core/playwright'

test('user can save their profile', async ({ page }) => {
  await page.goto('/settings/profile') // relative URL, resolved against baseURL in the Playwright config

  await page.locator('input[name="displayName"]').fill('Jeremy')
  await page.getByRole('button', { name: 'Save' }).click()

  await expect(page.getByText('Profile saved')).toBeVisible()
})

test('accessibility check on settings page', async ({ page }) => {
  await page.goto('/settings/profile')

  // Run axe-core against the rendered page via @axe-core/playwright
  const results = await new AxeBuilder({ page }).analyze()
  expect(results.violations).toEqual([]) // on failure, the diff lists the offending rules
})

Playwright tests run slowly (each one opens a fresh browser context, navigates, and interacts), but they catch issues that integration tests miss: actual network behavior, real CSS rendering, real interactions with the user agent. For critical user flows (signup, checkout, the application’s main task), the tests are worth having.

The architecture’s role here is incidental. Playwright tests against the deployed application; the architecture is one implementation detail. The test specifies behavior; the architecture happens to be how the behavior gets implemented.

Vitest Browser Mode for Modern Setups

A more recent addition worth knowing about: Vitest’s browser mode (shipped 2024–2025) runs Vitest test files in a real browser through a provider such as Playwright or WebdriverIO. The setup combines the speed of Vitest’s test runner with the realism of browser execution.

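import { test, expect } from 'vitest'
import { html } from 'lit'
import { render } from 'vitest-browser-lit' // assumed Lit counterpart of vitest-browser-react
import './kit-button.js'

test('kit-button renders with variant', async () => {
  const screen = await render(html`<kit-button variant="primary">Save</kit-button>`)

  // Locators are lazy; expect.element retries until the assertion passes or times out.
  // The label is slotted, so match it as the accessible name rather than as text content.
  const button = screen.getByRole('button', { name: 'Save' })
  await expect.element(button).toBeInTheDocument()
})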

For teams already using Vitest for unit tests, the browser mode is the natural extension. The same test file can include unit tests, component tests, and integration tests, all running in the same harness. The cognitive overhead of multiple test tools drops.

Accessibility Testing

Accessibility shows up at every test level.

Component tests verify the component’s accessibility properties — role, accessible name, keyboard interaction, focus management. Tools like @open-wc/testing and Playwright’s accessibility helpers expose these:

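it('kit-dialog forwards labelled-by to the inner dialog', async () => {
  const el = await fixture<KitDialog>(html`
    <kit-dialog labelled-by="title">
      <h2 slot="title" id="title">Settings</h2>
    </kit-dialog>
  `)

  const dialog = el.shadowRoot!.querySelector('dialog')!
  // Caveat: an aria-labelledby ID resolves only within its own DOM tree, so this checks
  // that the attribute is forwarded, not that a screen reader finds the light-DOM heading.
  expect(dialog.getAttribute('aria-labelledby')).to.equal('title')
})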

Integration tests verify the focus flow — focus moves into the dialog when it opens, returns when it closes, doesn’t escape during the dialog’s lifetime.

End-to-end tests run automated accessibility scanners (axe-core, Pa11y) against the deployed application. The scanners catch ~30-50% of WCAG issues automatically (Chapter 29 noted this). The other 50-70% require manual testing with screen readers — which can be incorporated into the team’s test plan but isn’t fully automatable.
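When a scan does report violations, the raw result is verbose, and a short summary makes triage easier. The helper below is a sketch: the AxeViolation type is a hand-written subset of the fields axe-core reports for each violation (id, impact, help, nodes), and summarizeViolations is a hypothetical name, not part of axe-core.

```typescript
// Hand-written subset of the per-violation fields in an axe-core result.
type AxeViolation = {
  id: string
  impact?: 'minor' | 'moderate' | 'serious' | 'critical' | null
  help: string
  nodes: Array<{ target: unknown[] }>
}

// One line per violated rule: rule id, impact, help text, affected node count.
function summarizeViolations(violations: AxeViolation[]): string {
  if (violations.length === 0) return 'no automated violations found'
  return violations
    .map((v) => `${v.id} [${v.impact ?? 'unknown'}] ${v.help} (${v.nodes.length} node(s))`)
    .join('\n')
}

// Sample data shaped like a single color-contrast finding.
const summary = summarizeViolations([
  {
    id: 'color-contrast',
    impact: 'serious',
    help: 'Elements must meet minimum color contrast ratio thresholds',
    nodes: [{ target: ['kit-button'] }]
  }
])
```

A clean summary only means the automated rules passed. The manual screen-reader pass described above is still needed for everything the scanner cannot see.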

The Kit architecture’s accessibility story is one of its measurable advantages. The components use native elements that the platform handles. The tests verify the platform’s contracts haven’t been broken. Regressions show up early.

Visual Regression

For applications where visual consistency matters — design systems, marketing pages, content sites — visual regression testing catches unintended visual changes.

The tools (Chromatic, Percy, Playwright’s expect(...).toHaveScreenshot(), Loki, Storybook’s visual testing) work the same way. The test takes a screenshot of a component or page. The first run establishes a baseline. Subsequent runs compare against the baseline and flag differences.

For Storybook-documented Kit components (Chapter 51), Chromatic or Percy integrates directly. Each story becomes a visual test. A CSS change that affects a component’s appearance is caught before it ships.

The trade-off is real — visual regression tests can produce false positives (anti-aliasing differences across browsers, font rendering variations, animation-frame timing). The team has to triage the failures and decide which are real. For applications where visual consistency is critical, the trade-off is worth it.
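Most tools reduce that triage load with tolerances. Playwright's toHaveScreenshot, for example, accepts threshold and maxDiffPixelRatio options. The sketch below shows the underlying idea on raw RGBA buffers. The function name and default values are illustrative assumptions, and production tools use more careful perceptual color comparisons than a per-channel difference.

```typescript
// Compare two RGBA buffers of equal size. channelTolerance absorbs small
// per-channel differences (anti-aliasing, font smoothing); maxDiffRatio caps
// the share of pixels that may differ beyond that tolerance.
function screenshotsMatch(
  baseline: Uint8Array,
  candidate: Uint8Array,
  channelTolerance = 8,
  maxDiffRatio = 0.01
): boolean {
  if (baseline.length !== candidate.length || baseline.length % 4 !== 0) return false
  const pixelCount = baseline.length / 4
  let differing = 0
  for (let p = 0; p < pixelCount; p++) {
    for (let c = 0; c < 4; c++) {
      const i = p * 4 + c
      if (Math.abs(baseline[i] - candidate[i]) > channelTolerance) {
        differing++
        break // count each pixel at most once
      }
    }
  }
  return pixelCount === 0 || differing / pixelCount <= maxDiffRatio
}

// Two-pixel images: white then black.
const base = new Uint8Array([255, 255, 255, 255, 0, 0, 0, 255])
const noisy = new Uint8Array([252, 253, 255, 255, 0, 0, 0, 255]) // rendering-noise-sized drift
const changed = new Uint8Array([255, 0, 0, 255, 0, 0, 0, 255]) // first pixel is now red

const noisyPasses = screenshotsMatch(base, noisy)
const changedPasses = screenshotsMatch(base, changed)
```

With these inputs noisyPasses is true and changedPasses is false. Loosening the tolerances hides rendering noise but also hides small real regressions, so the values are a per-project judgment.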

What Not to Test

A short list of test patterns that produce churn without value.

Don’t test framework internals. Lit’s reactive properties work; don’t test that setting one schedules an update. The runtime’s event bus works; don’t test that subscribers receive events from runtime.emit. Test your code, not the framework.

Don’t test implementation details. A component test that asserts which DOM element is third in the rendered output is brittle. Assert behavior (the click handler runs, the value updates) and accessibility (the role is correct, the accessible name matches) rather than structure.

Don’t over-mock. A test that mocks every dependency and asserts the mocks were called isn’t testing anything real. Use real modules where possible; only mock things you can’t run in the test environment (network calls, third-party services, browser APIs that don’t work in the test environment).

Don’t test what static analysis already covers. TypeScript catches type errors; don’t write tests for type-equivalence. ESLint catches naming inconsistencies; don’t write tests for naming.
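As an illustration of the static-analysis point, the sketch below uses a discriminated union with an exhaustiveness check. The AppEvent union and describeEvent function are hypothetical names invented for this example. The compiler, not a test, guarantees that every variant is handled.

```typescript
// Hypothetical event union for illustration.
type AppEvent =
  | { type: 'profile.saved'; entityId: string }
  | { type: 'payment.attempted'; surface: string }

function describeEvent(event: AppEvent): string {
  switch (event.type) {
    case 'profile.saved':
      return `profile saved: ${event.entityId}`
    case 'payment.attempted':
      return `payment attempted on ${event.surface}`
    default: {
      // If a new variant is added to AppEvent and not handled above,
      // this assignment stops compiling.
      const unhandled: never = event
      return unhandled
    }
  }
}

const description = describeEvent({ type: 'profile.saved', entityId: 'user_123' })
```

A test for describeEvent would check the returned string, which is behavior. It would not need a case asserting that every event type has a branch, because the never assignment already enforces that at compile time.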

The rule is test the behavior, not the structure. The architecture’s separation of concerns helps — components have clear behavior, modules have clear behavior, the integration has clear behavior. Tests at each level can focus on the behavior at that level without depending on the levels below.

What Comes Next

Part VII ends here. The architecture has a production story — server rendering, performance, real-time, privacy, migration, testing. Each of those concerns has a workable answer that fits the architecture’s posture.

Part VIII engages with the future. Contextual depth as the new superpower. The shift from chat to generated UIs. The security, reliability, and cross-device delivery story for stochastic UIs. The closing chapter returns to the introduction’s hook — the platform-first argument made for the future the rest of the book has been preparing us for.

Exercise: Test the Profile Editor

Take the profile-editor application you’ve built through Parts IV–VI. Write tests at each level:

Component tests:

kit-text-field renders the right input type.
kit-text-field participates in form data.
kit-button reflects variant to host.
kit-button triggers form submission via internals.

Module tests:

The analytics module sends profile.saved with the right context.
The audit module records the action.
The notification module’s notification.show command produces a visible toast (run in a browser environment).
The privacy redaction works as expected.

Integration tests:

Submitting the form fires profile.save_requested.
The orchestrator module dispatches profile.save (mocked) and emits profile.saved.
All subscribed modules respond.
The diagnostic trace contains the full chain.

End-to-end test:

Navigate to the profile editor.
Update the display name.
Submit.
Verify the success toast appears.
Run axe-core on the page; expect zero violations.

Reflect on:

How fast did each layer’s tests run?
Which layer caught the most bugs during development?
Which layer was the most expensive to maintain?
If a refactor changes the runtime’s internal structure, which tests would break? (Hopefully: very few, because the tests assert behavior, not structure.)

Part VII closes with the testing story. The architecture is testable at every level, with the tools the field has converged on. Part VIII begins by engaging with the future the architecture is structurally aimed at.