Shard your E2E tests across runners, merge the reports
E2E test suites slow down fast. Every new page, every new flow, every new edge case adds seconds. At some point the suite takes longer than the feature took to build. Sharding fixes this by splitting the suite across multiple runners that execute in parallel.
We built a proof of concept to nail down the pattern: Playwright sharding across GitHub Actions runners, each with its own PostgreSQL database, blob reports merged into a single HTML artifact at the end.
Default runners cap your parallelism. Sharding removes the cap.
Playwright defaults to half the available CPU cores as parallel workers. Standard GitHub Actions runners have 2 CPUs (4 on public repos), so you get one or two workers. That's it. Adding more tests doesn't add workers; it just makes each run longer.
You have two options: pay for larger runners with more CPU cores, or shard the suite across multiple standard runners.

Sharding requires no special runner configuration and works out of the box. For most teams, it's the faster path to faster tests.
4 shards, 4 databases, 1 report.
Each shard gets its own runner with its own PostgreSQL service container. No shared database, no cleanup between tests, no ordering assumptions. Within each shard, workers run tests in parallel against that shard's database.
After all shards finish, a separate job downloads the blob reports and merges them into a single HTML report uploaded as an artifact.
One setup job, N shard jobs, one merge job.
A dedicated setup job runs first and populates caches for the pnpm store and Playwright browser binaries. Shard jobs restore from cache instead of downloading from scratch. Playwright system dependencies (apt packages) still install per shard since they can't be cached across runners, but browser binary downloads are skipped entirely on cache hits.
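A sketch of what that setup job might look like; the action versions, Node version, cache key, and browser choice are assumptions to adapt to your repo:

```yaml
setup:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: pnpm/action-setup@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 22
        cache: 'pnpm' # caches the pnpm store, keyed on the lockfile
    - run: pnpm install --frozen-lockfile
    - name: Cache Playwright browsers
      uses: actions/cache@v4
      with:
        path: ~/.cache/ms-playwright
        key: playwright-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}
    - name: Install browsers
      run: pnpm exec playwright install chromium
```

On a warm cache, `playwright install` sees the binaries already present under `~/.cache/ms-playwright` and skips the download.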
```yaml
e2e:
  name: 'Shard ${{ matrix.shardIndex }}/${{ matrix.shardTotal }}'
  needs: [setup]
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      shardIndex: [1, 2, 3, 4]
      shardTotal: [4]
  services:
    postgres:
      image: postgres:16
      env:
        POSTGRES_USER: app
        POSTGRES_PASSWORD: app
        POSTGRES_DB: app
      ports:
        - 5432:5432
      options: >-
        --health-cmd="pg_isready -U app"
        --health-interval=5s
        --health-timeout=5s
        --health-retries=5
  steps:
    # ... restore caches, install deps, run migrations, seed ...
    - name: Run Playwright tests
      env:
        SHARD: ${{ matrix.shardIndex }} # exposed for the logging fixture
      run: pnpm exec playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
    - name: Upload blob report
      if: ${{ !cancelled() }}
      uses: actions/upload-artifact@v4
      with:
        name: blob-report-${{ matrix.shardIndex }}
        path: blob-report/
        retention-days: 1
```

`fail-fast: false` is important. If shard 2 fails, you still want shards 1, 3, and 4 to finish so the merged report shows every failure, not just the first one.
Each shard gets its own PostgreSQL service container via `services`. GitHub Actions starts the container before the job steps run and tears it down after. No Docker Compose, no manual lifecycle management.
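The elided setup steps in the shard job point the app at that container; a sketch, where the `db:migrate` and `db:seed` script names are hypothetical:

```yaml
    env:
      # The service container publishes 5432 on the runner host
      DATABASE_URL: postgres://app:app@localhost:5432/app
    steps:
      - name: Prepare database
        run: |
          pnpm db:migrate   # hypothetical migration script
          pnpm db:seed      # hypothetical seed script
```

Because each shard job has its own container, migrations and seeds run four times, once per shard, against four independent databases.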
```yaml
merge-reports:
  name: Merge Reports
  if: ${{ !cancelled() }}
  needs: [e2e]
  runs-on: ubuntu-latest
  steps:
    # ... checkout, restore caches, install deps ...
    - name: Download blob reports
      uses: actions/download-artifact@v4
      with:
        path: all-blob-reports
        pattern: blob-report-*
        merge-multiple: true
    - name: Merge reports
      run: pnpm exec playwright merge-reports --reporter html ./all-blob-reports
    - name: Upload merged HTML report
      uses: actions/upload-artifact@v4
      with:
        name: playwright-report
        path: playwright-report/
        retention-days: 14
```

The `if: ${{ !cancelled() }}` on the merge job ensures you get a report even when some shards fail. The blob reporter is a Playwright feature designed for this exact pattern: each shard writes a binary blob, the merge step combines them into a standard HTML report.
Blob reporter in CI, HTML locally.
```typescript
import { defineConfig } from '@playwright/test'

export default defineConfig({
  testDir: './tests',
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 1 : 0,
  workers: process.env.CI ? 2 : undefined,
  reporter: process.env.CI ? 'blob' : [['list'], ['html', { open: 'never' }]],
  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3000',
    trace: 'on-first-retry',
  },
  webServer: {
    command: 'pnpm dev',
    url: 'http://localhost:3000',
    reuseExistingServer: !process.env.CI,
    timeout: 30_000,
  },
})
```

`fullyParallel: true` lets Playwright run individual tests from different files in parallel across workers, rather than running all tests within a file sequentially before moving to the next. `workers: 2` in CI overrides the default (which would be 1 on a 2-CPU runner) to use both cores. Locally, `undefined` lets Playwright auto-detect your machine's cores.
The blob reporter in CI produces the sharded output that the merge job consumes. Locally you get the list reporter in the terminal and an HTML report for debugging.
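To reproduce one shard's slice locally, the same `--shard` flag works on your machine (assuming the 4-shard split above):

```shell
pnpm exec playwright test --shard=3/4
```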
When a test fails in shard 3 but passes locally, you need visibility.
A custom test fixture logs the worker index, parallel index, and shard identifier for every test:
```typescript
import { test as base } from '@playwright/test'

export const test = base.extend({
  page: async ({ page }, use, testInfo) => {
    const worker = testInfo.workerIndex
    const shard = process.env.SHARD || 'local'
    console.log(
      `[worker=${worker} shard=${shard}] START: ${testInfo.titlePath.join(' > ')}`,
    )
    const start = Date.now()
    await use(page)
    const duration = Date.now() - start
    console.log(
      `[worker=${worker} shard=${shard}] END (${duration}ms): ${testInfo.titlePath.join(' > ')}`,
    )
  },
})
```

When a test fails, you know exactly which shard and worker ran it, how long it took, and what ran alongside it. Combined with `trace: 'on-first-retry'` in the Playwright config, you get a full trace file for any flaky test on its retry attempt.
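Once the merged artifact is downloaded, both the report and individual traces open with Playwright's own viewers; the trace path here is illustrative:

```shell
# Open the merged HTML report
pnpm exec playwright show-report playwright-report

# Open one trace file from a retried test (path will vary)
pnpm exec playwright show-trace test-results/example-retry1/trace.zip
```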
Not every project needs sharding. If your E2E suite runs in under 2 minutes on a single runner, the overhead of multiple runners and report merging isn't worth it. But once the suite crosses 5 minutes and is still growing, sharding pays for itself quickly.
The breakeven depends on your suite size and how many tests write to the database. A suite of 30 read-heavy tests across 4 shards will see close to a 4x speedup. A suite where every test writes and needs unique identifiers will still see the speedup, but it requires the workerIndex pattern throughout.
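The workerIndex pattern can be as small as a helper that namespaces generated data by worker, so two workers on the same shard never collide on a unique column; `uniqueEmail` is a hypothetical name:

```typescript
// Hypothetical helper: derive collision-free test data from the worker index.
export function uniqueEmail(workerIndex: number, runId: string): string {
  return `user-w${workerIndex}-${runId}@example.test`
}

// In a test body: uniqueEmail(testInfo.workerIndex, String(Date.now()))
```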
Start with the number of shards matching your parallelism needs, not the number of test files. Four shards is a good default for suites in the 20-60 test range. Playwright distributes tests across shards automatically. You don't need to manually assign tests to shards.
Tell us what you're building. We'll tell you how we'd approach it, what it takes, and how fast we can move.
We'll tell you honestly if we're the right fit. And if we're not, we'll point you to someone who is.