Shard your E2E tests across runners, merge the reports
E2E test suites slow down fast. Every new page, every new flow, every new edge case adds seconds. At some point the suite takes longer than the feature took to build. Sharding fixes this by splitting the suite across multiple runners that execute in parallel.
We built a proof of concept to nail down the pattern: Playwright sharding across GitHub Actions runners, each with its own PostgreSQL database, blob reports merged into a single HTML artifact at the end.
Default runners cap your parallelism. Sharding removes the cap.
Playwright defaults to half the available CPU cores as parallel workers. Standard GitHub Actions runners have 2 CPUs (4 on public repos), so you get one or two workers. That's it. Adding more tests doesn't add workers; it just makes each run longer.
You have two options: pay for larger runners with more CPU cores, or shard the suite across multiple standard runners.

Sharding requires no special runner configuration and works out of the box. For most teams, it's the faster path to faster tests.
4 shards, 4 databases, 1 report.
Each shard gets its own runner with its own PostgreSQL service container. No shared database, no cleanup between tests, no ordering assumptions. Within each shard, workers run tests in parallel against that shard's database.
After all shards finish, a separate job downloads the blob reports and merges them into a single HTML report uploaded as an artifact.
One setup job, N shard jobs, one merge job.
A dedicated setup job runs first and populates caches for the pnpm store and Playwright browser binaries. Shard jobs restore from cache instead of downloading from scratch. Playwright system dependencies (apt packages) still install per shard since they can't be cached across runners, but browser binary downloads are skipped entirely on cache hits.
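A sketch of what that setup job might look like; the action versions, Node version, cache key, and browser choice are assumptions to adapt to your repo:

```yaml
setup:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: pnpm/action-setup@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 22
        cache: 'pnpm' # caches the pnpm store, keyed on the lockfile
    - run: pnpm install --frozen-lockfile
    - name: Cache Playwright browsers
      uses: actions/cache@v4
      with:
        path: ~/.cache/ms-playwright
        key: playwright-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}
    - name: Install browsers
      run: pnpm exec playwright install chromium
```

On a warm cache, `playwright install` sees the binaries already present under `~/.cache/ms-playwright` and skips the download.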
```yaml
e2e:
  name: 'Shard ${{ matrix.shardIndex }}/${{ matrix.shardTotal }}'
  needs: [setup]
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      shardIndex: [1, 2, 3, 4]
      shardTotal: [4]
  services:
    postgres:
      image: postgres:16
      env:
        POSTGRES_USER: app
        POSTGRES_PASSWORD: app
        POSTGRES_DB: app
      ports:
        - 5432:5432
      options: >-
        --health-cmd="pg_isready -U app"
        --health-interval=5s
        --health-timeout=5s
        --health-retries=5
  steps:
    # ... restore caches, install deps, run migrations, seed ...
    - name: Run Playwright tests
      env:
        SHARD: ${{ matrix.shardIndex }} # exposed for the logging fixture
      run: pnpm exec playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
    - name: Upload blob report
      if: ${{ !cancelled() }}
      uses: actions/upload-artifact@v4
      with:
        name: blob-report-${{ matrix.shardIndex }}
        path: blob-report/
        retention-days: 1
```

`fail-fast: false` is important. If shard 2 fails, you still want shards 1, 3, and 4 to finish so the merged report shows every failure, not just the first one.
Each shard gets its own PostgreSQL service container via `services`. GitHub Actions starts the container before the job steps run and tears it down after. No Docker Compose, no manual lifecycle management.
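The elided setup steps in the shard job point the app at that container; a sketch, where the `db:migrate` and `db:seed` script names are hypothetical:

```yaml
    env:
      # The service container publishes 5432 on the runner host
      DATABASE_URL: postgres://app:app@localhost:5432/app
    steps:
      - name: Prepare database
        run: |
          pnpm db:migrate   # hypothetical migration script
          pnpm db:seed      # hypothetical seed script
```

Because each shard job has its own container, migrations and seeds run four times, once per shard, against four independent databases.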
```yaml
merge-reports:
  name: Merge Reports
  if: ${{ !cancelled() }}
  needs: [e2e]
  runs-on: ubuntu-latest
  steps:
    # ... checkout, restore caches, install deps ...
    - name: Download blob reports
      uses: actions/download-artifact@v4
      with:
        path: all-blob-reports
        pattern: blob-report-*
        merge-multiple: true
    - name: Merge reports
      run: pnpm exec playwright merge-reports --reporter html ./all-blob-reports
    - name: Upload merged HTML report
      uses: actions/upload-artifact@v4
      with:
        name: playwright-report
        path: playwright-report/
        retention-days: 14
```

The `if: ${{ !cancelled() }}` on the merge job ensures you get a report even when some shards fail. The blob reporter is a Playwright feature designed for this exact pattern: each shard writes a binary blob, the merge step combines them into a standard HTML report.
Blob reporter in CI, HTML locally.
```typescript
import { defineConfig } from '@playwright/test'

export default defineConfig({
  testDir: './tests',
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 1 : 0,
  workers: process.env.CI ? 2 : undefined,
  reporter: process.env.CI ? 'blob' : [['list'], ['html', { open: 'never' }]],
  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3000',
    trace: 'on-first-retry',
  },
  webServer: {
    command: 'pnpm dev',
    url: 'http://localhost:3000',
    reuseExistingServer: !process.env.CI,
    timeout: 30_000,
  },
})
```

`fullyParallel: true` lets Playwright run individual tests from different files in parallel across workers, rather than running all tests within a file sequentially before moving to the next. `workers: 2` in CI overrides the default (which would be 1 on a 2-CPU runner) to use both cores. Locally, `undefined` lets Playwright auto-detect your machine's cores.
The blob reporter in CI produces the sharded output that the merge job consumes. Locally you get the list reporter in the terminal and an HTML report for debugging.
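To reproduce one shard's slice locally, the same `--shard` flag works on your machine (assuming the 4-shard split above):

```shell
pnpm exec playwright test --shard=3/4
```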
When a test fails in shard 3 but passes locally, you need visibility.
A custom test fixture logs the worker index, parallel index, and shard identifier for every test:
```typescript
import { test as base } from '@playwright/test'

export const test = base.extend({
  page: async ({ page }, use, testInfo) => {
    const worker = testInfo.workerIndex
    const shard = process.env.SHARD || 'local'
    console.log(
      `[worker=${worker} shard=${shard}] START: ${testInfo.titlePath.join(' > ')}`,
    )
    const start = Date.now()
    await use(page)
    const duration = Date.now() - start
    console.log(
      `[worker=${worker} shard=${shard}] END (${duration}ms): ${testInfo.titlePath.join(' > ')}`,
    )
  },
})
```

When a test fails, you know exactly which shard and worker ran it, how long it took, and what ran alongside it. Combined with `trace: 'on-first-retry'` in the Playwright config, you get a full trace file for any flaky test on its retry attempt.
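Once the merged artifact is downloaded, both the report and individual traces open with Playwright's own viewers; the trace path here is illustrative:

```shell
# Open the merged HTML report
pnpm exec playwright show-report playwright-report

# Open one trace file from a retried test (path will vary)
pnpm exec playwright show-trace test-results/example-retry1/trace.zip
```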
Not every project needs sharding. If your E2E suite runs in under 2 minutes on a single runner, the overhead of multiple runners and report merging isn't worth it. But once the suite crosses 5 minutes and is still growing, sharding pays for itself quickly.
The breakeven depends on your suite size and how many tests write to the database. A suite of 30 read-heavy tests across 4 shards will see close to a 4x speedup. A suite where every test writes and needs unique identifiers will still see the speedup, but it requires the workerIndex pattern throughout.
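The workerIndex pattern can be as small as a helper that namespaces generated data by worker, so two workers on the same shard never collide on a unique column; `uniqueEmail` is a hypothetical name:

```typescript
// Hypothetical helper: derive collision-free test data from the worker index.
export function uniqueEmail(workerIndex: number, runId: string): string {
  return `user-w${workerIndex}-${runId}@example.test`
}

// In a test body: uniqueEmail(testInfo.workerIndex, String(Date.now()))
```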
Start with the number of shards matching your parallelism needs, not the number of test files. Four shards is a good default for suites in the 20-60 test range. Playwright distributes tests across shards automatically. You don't need to manually assign tests to shards.
Tell us what you're building. We'll tell you how we'd approach it, what it takes, and how fast we can move.
We'll tell you honestly if we're the right fit. And if we're not, we'll point you to someone who is.