Every PR, On Video

Agentic speed is only valuable if the output holds. This is one concrete way we're making that real.

Right Idea, Wrong Process

A screen recording in a PR serves a real purpose. It lets reviewers see the change without checking out the branch. It gives clients something tangible to react to. It creates a record of what the feature actually looked like when it shipped.

We'd been doing this manually for a while. Every PR with a front-end change was expected to include a screen recording, and reviewers came to rely on it. The practice was good. The problem was everything around it. The engineer had to record, compress, and upload the video, then do it all again after every round of review comments. The work was tedious enough to be easy to deprioritize, so coverage was never uniform across PRs.

This became more pressing as we started running more agents in parallel. The surface area of front-end changes grows faster than any reviewer can track. An agent implementing a feature doesn't forget to test it, but it also can't upload a video. The manual step wasn't just friction. It was a gap in the feedback loop.

So we closed it.

Clapperboard

What We Built

If a test is touched, a video exists. No exceptions.

CI runs the full E2E suite on every PR. When test files are touched, those specific tests are re-run with video recording enabled. The recordings are uploaded to S3 and links appear in the PR comment automatically. On re-runs, the comment updates rather than multiplying.

The new expectation is simple: if a PR includes front-facing changes, it must include or update an E2E test. You don't upload a recording. You write the test, and the recording comes for free.

In the pilot project, the pipeline breaks down into four CI jobs: build, e2e-video, e2e, and quality. The e2e-video job runs only when test files are touched, which keeps cost and runtime minimal for PRs that don't affect the front end.

Figure: CI pipeline showing the build, e2e-video, e2e, and quality jobs.

How It Works

Detection

The first step queries the GitHub API for changed files in the PR and filters for E2E test files:

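# gh fills in {owner} and {repo} from the current repository;
# replace {number} with the PR number.
gh api repos/{owner}/{repo}/pulls/{number}/files \
  --paginate --jq '.[].filename' \
  | grep '^src/tests/e2e/.*\.spec\.ts$'
# grep exits with status 1 when no changed file matches.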

If no test files changed, the entire job short-circuits. No compute wasted.
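
The same filter can be expressed in TypeScript when the file list is already in memory. This is a minimal sketch, not the pipeline's actual code: `selectE2eSpecs` and the sample file names are hypothetical, and only the path pattern is taken from the grep above.

```typescript
// Same pattern as the grep in the shell step: spec files under src/tests/e2e/.
const E2E_SPEC_PATTERN = /^src\/tests\/e2e\/.*\.spec\.ts$/

// Keep only the changed files that are E2E specs.
function selectE2eSpecs(changedFiles: string[]): string[] {
  return changedFiles.filter((file) => E2E_SPEC_PATTERN.test(file))
}

// Hypothetical list of files changed in a PR.
const changed = [
  'src/components/Header.tsx',
  'src/tests/e2e/checkout.spec.ts',
  'src/tests/e2e/fixtures/users.ts',
]

const specs = selectE2eSpecs(changed)
if (specs.length === 0) {
  // Nothing to record: this is the short-circuit case.
  console.log('No E2E specs changed, skipping video job')
} else {
  console.log(`Recording ${specs.length} spec(s): ${specs.join(', ')}`)
}
```

Keeping the pattern anchored at both ends means helpers and fixtures that live next to the specs do not trigger a recording run on their own.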

Recording

Video recording is toggled by a single environment variable that the Playwright config reads. When RECORD_VIDEO is set, the config turns video on and adds a one-second slowMo delay between Playwright operations, so the recordings are watchable rather than a blur of instant page transitions:

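import { defineConfig } from '@playwright/test'

// Any non-empty RECORD_VIDEO value enables recording
const recording = !!process.env.RECORD_VIDEO

export default defineConfig({
  use: {
    // Record every test when enabled; record nothing otherwise
    video: recording ? 'on' : 'off',
  },
  projects: [
    {
      name: 'chromium',
      use: {
        // slowMo is in milliseconds and applies to each Playwright operation
        launchOptions: recording ? { slowMo: 1000 } : {},
      },
    },
  ],
})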

Each test produces its own .webm file under test-results/.
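
To connect detection to recording, the CI job has to run only the changed specs with RECORD_VIDEO set. The sketch below shows one way to assemble that invocation. `buildRecordRun` and the spec path are hypothetical; the sketch assumes Playwright is launched through `npx playwright test` with file paths as positional filters and the `chromium` project from the config.

```typescript
interface RecordRun {
  command: string
  args: string[]
  env: Record<string, string>
}

// Build the Playwright invocation for the changed specs,
// or return null when there is nothing to record.
function buildRecordRun(specs: string[]): RecordRun | null {
  if (specs.length === 0) return null
  return {
    command: 'npx',
    // Positional arguments limit the run to matching test files.
    args: ['playwright', 'test', '--project=chromium', ...specs],
    // Any non-empty value works, because the config checks !!process.env.RECORD_VIDEO.
    env: { RECORD_VIDEO: '1' },
  }
}

// Hypothetical spec path, as it would come out of the detection step.
const run = buildRecordRun(['src/tests/e2e/checkout.spec.ts'])
if (run) {
  console.log(`RECORD_VIDEO=${run.env.RECORD_VIDEO} ${run.command} ${run.args.join(' ')}`)
}
```

Returning null for an empty list keeps the "no specs, no run" rule in one place instead of relying on Playwright's behavior when it is given no filters.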

Upload and Comment

A post-test script walks test-results/, uploads every .webm to S3 keyed by {repo}/{pr}/{run-id}/{test}/video.webm, and posts a comment with one link per recording:

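import * as fs from 'node:fs'
import * as path from 'node:path'
import { PutObjectCommand, S3Client } from '@aws-sdk/client-s3'

const s3 = new S3Client({}) // region and credentials come from the environment

// videos, repo, prNumber, runId, and bucket are set earlier in the script
for (const videoPath of videos) {
  // Assumed layout: test-results/<test-dir>/video.webm
  const testDir = path.basename(path.dirname(videoPath))
  const fileName = path.basename(videoPath)
  const key = `${repo}/${prNumber}/${runId}/${testDir}/${fileName}`
  await s3.send(
    new PutObjectCommand({
      Bucket: bucket,
      Key: key,
      Body: fs.readFileSync(videoPath),
      ContentType: 'video/webm', // lets browsers play the file from its link
    }),
  )
}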
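
The loop above takes a `videos` list as given. One way to gather it is a recursive walk of test-results/, sketched below under stated assumptions: `findVideos` and `videoKey` are hypothetical helpers, each recording is assumed to sit in a per-test directory, and the repo name, PR number, and run id are placeholders.

```typescript
import * as fs from 'node:fs'
import * as path from 'node:path'

// Recursively collect every .webm file under root, in a stable order.
function findVideos(root: string): string[] {
  if (!fs.existsSync(root)) return []
  const found: string[] = []
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name)
    if (entry.isDirectory()) {
      found.push(...findVideos(full))
    } else if (entry.name.endsWith('.webm')) {
      found.push(full)
    }
  }
  return found.sort()
}

// Build the S3 key {repo}/{pr}/{run-id}/{test}/{file} for one recording.
function videoKey(repo: string, prNumber: number, runId: string, videoPath: string): string {
  const testDir = path.basename(path.dirname(videoPath))
  const fileName = path.basename(videoPath)
  return `${repo}/${prNumber}/${runId}/${testDir}/${fileName}`
}

// Placeholder values; a real run would read these from the CI environment.
const videos = findVideos('test-results')
for (const videoPath of videos) {
  console.log(videoKey('acme/web', 42, '1001', videoPath))
}
```

Including the run id in the key means a re-run never overwrites an earlier recording, so links in older comment revisions keep pointing at the video they were posted with.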

The comment carries a hidden marker, an HTML comment that GitHub does not render, so subsequent pushes update the same comment rather than creating a new one. Reviewers see a table, one row per test:

Figure: E2E Video Recordings PR comment posted by the github-actions bot.
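
The update-in-place behavior can be sketched as two small functions: one renders the comment body with the marker on top, the other looks for an existing comment that carries it. Everything named here is an assumption for illustration: the marker text, the column headings, and the `Recording` and `ExistingComment` shapes are hypothetical, not the pipeline's actual values.

```typescript
// Hypothetical marker text; any unique HTML comment would do.
const MARKER = '<!-- e2e-video-recordings -->'

interface Recording {
  test: string
  url: string
}

interface ExistingComment {
  id: number
  body: string
}

// Render the comment body: marker first, then a Markdown table with one row per test.
function renderComment(recordings: Recording[]): string {
  const rows = recordings.map((r) => `| ${r.test} | [Watch](${r.url}) |`)
  return [
    MARKER,
    '### E2E Video Recordings',
    '',
    '| Test | Recording |',
    '| --- | --- |',
    ...rows,
  ].join('\n')
}

// Find the comment a previous run posted, if any.
function findMarkedComment(comments: ExistingComment[]): ExistingComment | undefined {
  return comments.find((c) => c.body.includes(MARKER))
}

const body = renderComment([
  { test: 'checkout', url: 'https://example.com/checkout/video.webm' },
])
const previous = findMarkedComment([{ id: 7, body: 'Looks good to me' }])
console.log(previous ? `update comment ${previous.id}` : 'create a new comment')
console.log(body)
```

With the existing comment's id in hand, the script can update it through GitHub's "update an issue comment" endpoint; otherwise it creates a new comment on the PR. Rows are plain links rather than embedded players, which sidesteps GitHub's tag stripping.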

S3 objects expire after 30 days. Old recordings clean themselves up.
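
The post does not say how the 30-day expiry is configured. A common way to get it is an S3 lifecycle rule on the bucket, and the sketch below builds one as a plain object. It is an assumption, not the described setup: `expiryRule` and the rule ID are hypothetical, and the field names follow the rule shape that S3's PutBucketLifecycleConfiguration operation accepts.

```typescript
// One S3 lifecycle rule, reduced to the fields used here.
interface ExpiryRule {
  ID: string
  Status: 'Enabled' | 'Disabled'
  Filter: { Prefix: string }
  Expiration: { Days: number }
}

// Build a rule that expires objects under prefix after the given number of days.
function expiryRule(days: number, prefix = ''): ExpiryRule {
  if (!Number.isInteger(days) || days <= 0) {
    throw new Error('days must be a positive integer')
  }
  return {
    ID: `expire-after-${days}-days`,
    Status: 'Enabled',
    // An empty prefix applies the rule to every object in the bucket.
    Filter: { Prefix: prefix },
    Expiration: { Days: days },
  }
}

const lifecycleConfiguration = { Rules: [expiryRule(30)] }
console.log(JSON.stringify(lifecycleConfiguration, null, 2))
```

Because expiry is a bucket-level policy, the upload script needs no cleanup logic of its own.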

What Didn't Work

Our first version embedded inline players. GitHub strips <video> and <source> tags from Markdown comments, so the players rendered as empty space. That is why the comment links to each recording instead of embedding it.

Three Audiences, One Recording

The same video serves reviewers, agents, and customers.

Reviewers get visual context without checking out the branch. A link in the PR comment goes straight to the recording.

Agents get a forcing function. Writing an E2E test is now the mechanism by which a front-end change produces its own evidence. An agent that ships UI without a test produces no video, and that absence is visible in the PR. Code review now includes one more question: does the recording match what the PR claims to do?

Customers get something we couldn't offer before. The video that proves to a reviewer that the feature works is the same one that can surface in a Linear ticket or a release changelog. No extra recording, no separate demo session. A customer looking at what changed in a release sees the actual test run, not a polished screen capture made after the fact.

What's Next

The recordings are already being generated and stored. The next step is wiring them into Linear issues and changelog emails automatically. When it lands, every ticket gets a video of its own resolution, and every customer-facing changelog ships with a recording of what changed.

Visual proof that compounds across the entire delivery pipeline, without anyone having to think about it.