Mining your review history to teach the agentic reviewer what your team actually cares about
Every team has a style guide. Some rules remain unwritten and live in the heads of senior reviewers and in the comments they leave on PRs, year after year, repeating themselves. We pulled all of it out of our GitHub, distilled it into rules, and dropped it into the file the agent already reads.
The result is a closed loop. The agent writes code against the same standards a reviewer would enforce. The agentic reviewer flags the same things a senior would flag. Nobody types "use our error helper instead of throwing strings" for the eight hundredth time.
The whole pipeline is packaged as a reusable Claude Code skill. The full SKILL.md is included at the end of this post — drop it into ~/.claude/skills/extract-pr-rules/ and invoke it with /extract-pr-rules from any repo.
The rules that matter most are often not written down.
Every codebase has rules that exist only as PR comments. "We don't import lodash, use the native equivalent." "Server actions belong in actions/, not components." "If you're touching the queue, write a retry test." These are real conventions. They get enforced because the same three people review every PR and they remember.
New developers learn them slowly, by getting the same comment a few times. AI agents never learn them at all, because the rules are not in CONTRIBUTING.md, not in CLAUDE.md, not in AGENTS.md, not anywhere a tool can see. The agent writes code that ignores six conventions, the human reviewer writes the comments again, and the loop continues.
The information exists. It is locked inside thousands of pull_request_review_comments records on GitHub. The fix is to mine it.
The GitHub CLI gives you the entire review history in one command.
gh api paginates through any GitHub REST endpoint. The review comments endpoint returns every inline comment a reviewer has ever left, with the file path, line range, body, author, and commit context. The since parameter scopes the pull to a recent window — six months for the first run, three months for re-runs to match the quarterly cadence.
```bash
SINCE=$(date -u -d '6 months ago' +%Y-%m-%dT%H:%M:%SZ)
gh api \
  --paginate \
  -H "Accept: application/vnd.github+json" \
  "/repos/OWNER/REPO/pulls/comments?since=$SINCE" \
  | jq -s 'add' > review-comments.json
```

The `jq -s 'add'` step matters. `gh api --paginate` concatenates each page's JSON array onto stdout, which produces invalid JSON across page boundaries. Slurping with `-s` and folding with `add` reassembles the pages into one valid array.
That is the inline review feedback for the window. Sanity-check the corpus size before going further. Under about fifty PRs the pattern-extraction step degrades into noise — there are not enough repetitions for an LLM to distinguish convention from one-off feedback. Over about five thousand comments, batch the input; a single prompt cannot hold it. Most active repos sit comfortably between.
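A quick way to size the corpus before going further is two jq one-liners over the file from the previous step: total inline comments pulled, and how many distinct PRs they span.

```bash
# Total inline comments in the window
jq 'length' review-comments.json
# Distinct PRs those comments came from
jq '[.[].pull_request_url] | unique | length' review-comments.json
```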
We strip out approvals, bot comments, and short replies, and keep anything that reads like feedback.
```bash
jq '[.[]
  | select(.user.login | endswith("[bot]") | not)
  | select(.body | length >= 10)
  | select(.body | test("^(lgtm|thanks|ack|done|sgtm)"; "i") | not)
  | {body, path, author: .user.login, pr: .pull_request_url}
]' review-comments.json > feedback.json
```

There are three categories of noise to drop: bot accounts (anything ending in `[bot]`), bodies too short to carry a rule (under ten characters catches lgtm, done, single emoji, and the like), and comments opening with an approval keyword. Each category misses different things, so applying all three together strips far more than any single regex.
We deliberately keep nit: comments. The prefix is a politeness marker, not a noise signal — short nits (nit: rename this, nit: extract a constant) carry some of the most concentrated rule content in the entire corpus.
The shape of each entry is enough for an LLM to reason about: the comment, the file it was left on, who wrote it, and which PR it came from. For richer context, the issues comments endpoint (/repos/OWNER/REPO/issues/comments) covers the conversation thread, but inline review comments are where the actionable feedback lives.
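If you do want that thread context too, the pull is nearly identical. A sketch reusing the same window, written to a separate file so it does not dilute the inline corpus:

```bash
# Conversation-thread comments (issue comments) for the same window
gh api \
  --paginate \
  -H "Accept: application/vnd.github+json" \
  "/repos/OWNER/REPO/issues/comments?since=$SINCE" \
  | jq -s 'add' > issue-comments.json
```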
An LLM reads the comments, you read the patterns.
We feed feedback.json to Claude in batches with a single prompt:
```
You are reviewing the comment history of a software team.
Below is the team's existing AGENTS.md (which may be empty
or partial), followed by a JSON array of review comments.
For each recurring pattern, output a single rule.
A pattern is anything that appears more than twice across
different PRs. Ignore one-off feedback.
Return a JSON array. For each rule, include:
- rule: imperative form ("Use X, not Y")
- reasoning: one sentence
- examples: two short quotes (each under 120 characters,
truncate with ellipsis if needed)
- authors: distinct comment authors supporting the rule
- prs: distinct PR URLs supporting the rule
- status: one of NEW (not in AGENTS.md), DUPLICATE (in
AGENTS.md and not flagged much), CONTRADICTS (clashes
with a documented rule), VIOLATED (in AGENTS.md but
reviewers still flag it)
Existing AGENTS.md:
{agents_md}
Comments:
{batch}
```

The `status` field is what makes this more than a one-time mining run. NEW rules go into AGENTS.md. CONTRADICTS surfaces conflicts to resolve. DUPLICATE confirms documented wording is still landing. VIOLATED — documented rules reviewers keep flagging — is the most actionable output of the entire pipeline. A rule that is written but not followed is a punch list for tooling: it needs stronger enforcement, clearer wording, or both.
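Driving that prompt over the corpus is a short loop. Here is a minimal sketch that slices feedback.json into batches of 200 with jq and calls the Claude Code CLI in print mode; extract-rules-prompt.txt is a hypothetical file holding the prompt above with the placeholders stripped, and the merge at the end assumes each batch comes back as a bare JSON array (in practice you may need to trim surrounding prose first).

```bash
# Build the static part of the prompt once: instructions plus the current AGENTS.md
PROMPT="$(cat extract-rules-prompt.txt; printf '\nExisting AGENTS.md:\n'; cat AGENTS.md)"

TOTAL=$(jq 'length' feedback.json)
for ((i = 0; i < TOTAL; i += 200)); do
  # Slice out one batch of comments and pipe it in as the comment input
  jq --argjson from "$i" --argjson to "$((i + 200))" '.[$from:$to]' feedback.json \
    | claude -p "$PROMPT" > "rules-batch-$((i / 200)).json"
done

# Merge the per-batch arrays into one rules.json for the post-filter step
jq -s 'add' rules-batch-*.json > rules.json
```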
The structured output also lets us enforce the diversity rule mechanically. We post-filter so that only patterns appearing across at least two distinct PRs survive — a rule that only appears in one PR is usually a one-off, no matter how confidently the model phrased it.
```bash
jq '[.[] | select((.prs | unique | length) >= 2)]' \
  rules.json > rules-filtered.json
```

On teams with three or more active reviewers, layer on `(.authors | unique | length) >= 2` to filter out single-reviewer taste. On smaller teams, the reviewer is the standard, so author diversity is not a meaningful gate.
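For the larger-team variant, the two gates compose into a single pass, the same filter with the author check layered on:

```bash
jq '[.[]
  | select((.prs | unique | length) >= 2)
  | select((.authors | unique | length) >= 2)
]' rules.json > rules-filtered.json
```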
The result is a clean list of rules with citations. We review it manually. Some patterns are noise, or rules that were true two years ago and are no longer enforced. Most are real. The model is good at this because the comments are repetitive by nature. If reviewers said the same thing fifty times, the pattern is unmissable.
A typical run produces forty to sixty rules. About two thirds survive review.
The agent reads AGENTS.md when writing and when reviewing.
AGENTS.md (or CLAUDE.md) sits at the repo root and gets loaded into the agent's context for every session. It is the canonical place for project rules, commands, and conventions. We organized the extracted rules into sections that match how reviewers think.
```md
## Conventions
### Imports
- Prefer native methods over lodash — the team is migrating off it.
- Import from package roots, not internal paths — internal paths
break on package restructures.
### Error Handling
- Throw `AppError` instances, never plain strings — consumers rely
on the class for retry and routing logic.
- Always include a stable error code — the frontend branches on it.
### Database
- Wrap multi-statement writes in `db.transaction(...)` — partial
writes leave the system in an inconsistent state.
- New migrations require a corresponding seed update — fresh
environments break otherwise.
### Testing
- Every new queue handler gets a retry test — silent retry bugs
are the most common production incident in this area.
- E2E tests live next to the page they cover — top-level `tests/`
has lost coverage to refactors.
```

Each rule ships with a one-sentence reason. The reasoning is what makes the rule survive the next refactor or the next contributor — without it, the rule looks arbitrary and gets ignored or revised on the wrong grounds.
The same file drives writing and review.
We run an agentic code review as a GitHub Actions step on every PR. It uses Claude with AGENTS.md loaded into context, plus the diff. The system prompt tells it to act as a reviewer and post inline comments where the diff violates the rules.
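The heart of that step is only a few lines of shell once the diff is in hand. A sketch, not our production workflow: it assumes the claude CLI is installed on the runner, BASE_BRANCH is whatever your workflow exposes as the PR's base branch, and it prints findings to the job log instead of posting inline comments.

```bash
# Diff the PR against its base branch, then review the diff against AGENTS.md
git fetch origin "$BASE_BRANCH"
git diff "origin/$BASE_BRANCH...HEAD" > pr.diff

cat AGENTS.md pr.diff \
  | claude -p "Act as a code reviewer. The rules above come from AGENTS.md and the diff follows. Flag every hunk that violates a rule and cite the rule it violates." \
  > review-findings.md
```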
Because the rules came from real reviewers, the agent's comments look like real review comments. Because the same AGENTS.md is loaded when the agent writes code, the diff usually does not violate the rules in the first place. The agentic reviewer becomes a backstop, not the primary defense.
The feedback loop is the part we like most. The agent's review comments become future training data for the next extraction pass. As the team's standards evolve, the rules evolve with them.
Standards drift. So does the rule list.
We plan to re-run the extraction every quarter. The new comments since the last run go through the same pipeline, and any new pattern that appears more than twice gets reviewed for inclusion. Rules that no longer appear in recent comments, usually because the agent already enforces them and the comments stopped being needed, get a note in AGENTS.md so they do not get extracted again next time.
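Mechanically, the delta is just the date window. One way to carry it between runs (the marker file name here is arbitrary; nothing reads it except this pipeline):

```bash
# Use the previous run's timestamp if we have one, otherwise fall back to six months
LAST_RUN_FILE=.extract-pr-rules-last-run
SINCE=$(cat "$LAST_RUN_FILE" 2>/dev/null \
  || date -u -d '6 months ago' +%Y-%m-%dT%H:%M:%SZ)
date -u +%Y-%m-%dT%H:%M:%SZ > "$LAST_RUN_FILE"
```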
The whole pipeline is two short scripts, a prompt, and a one-line post-filter. It runs in about ten minutes. The team had been writing the style guide for years without realizing it.
The full pipeline, packaged for reuse.
Save the block below as ~/.claude/skills/extract-pr-rules/SKILL.md. Invoke with /extract-pr-rules from any repository.
---
name: extract-pr-rules
description: Extract recurring conventions from a repository's historical PR review comments and turn them into a Markdown block of imperative rules ready to paste into AGENTS.md or CLAUDE.md. Pulls inline review comments via the GitHub CLI (windowed by date), filters noise (bots, lgtm/thanks/acks/done/sgtm — but keeps `nit:`), asks the LLM to extract recurring patterns with author/PR diversity counts, post-filters on PR support, classifies each pattern against any existing AGENTS.md as NEW / DUPLICATE / CONTRADICTS / VIOLATED, stratifies by confidence, and emits a validated rule + reasoning list. Use when user says /extract-pr-rules, "extract review rules", "mine PR comments", "find patterns in our code reviews", "what do reviewers keep saying", "turn review comments into rules", "build a style guide from PR history", or wants to capture the unwritten conventions hiding in their team's review history. Always trigger when the user wants to derive rules from past code review activity.
---
# Extract PR Rules
Pull a repository's recent PR review comments, surface the patterns reviewers repeat, classify them against any rules already documented, and emit a Markdown block of imperative rules with reasoning — ready to paste into `AGENTS.md` or `CLAUDE.md`. The premise: reviewers have spent years writing an invisible style guide in PR comments. Make it visible.
## Required inputs
Ask the user for these if not already clear:
- **The repository** — `OWNER/REPO` form (e.g. `acme/api`).
- **GitHub CLI access** — `gh auth status` should pass.
- **Date window** — default `since=` 6 months ago. Also ask: "Any recent architectural shift, migration, or rename you want to exclude?" If yes, use that date as the floor — older comments will reflect a codebase that no longer exists.
- **Existing rules file** — read `AGENTS.md`, `CLAUDE.md`, or whichever the user names. Used in step 4 to classify patterns. If neither exists, skip.
## Process
1. **Pull historical review comments** into a fresh temp directory. `gh api --paginate` concatenates page arrays into invalid JSON across page boundaries — slurp with `jq -s 'add'` to reassemble into one valid array:
```bash
WORK_DIR=$(mktemp -d -t extract-pr-rules.XXXXXX)
SINCE=$(date -u -d '6 months ago' +%Y-%m-%dT%H:%M:%SZ) # or user's cutoff
gh api --paginate \
-H "Accept: application/vnd.github+json" \
"/repos/OWNER/REPO/pulls/comments?since=$SINCE" \
| jq -s 'add' > "$WORK_DIR/review-comments.json"
```
Report the path and raw comment count.
2. **Scope further if needed.** If raw count exceeds ~5000, pause and offer:
- By path (e.g. `src/api/`)
- By tighter date range
- By team or author set
Apply with `jq` filters before step 3.
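   For example, path scoping (a sketch; point step 3 at the scoped file instead of the raw pull):
   ```bash
   # Keep only comments left on files under src/api/
   jq '[.[] | select(.path | startswith("src/api/"))]' \
     "$WORK_DIR/review-comments.json" > "$WORK_DIR/scoped.json"
   ```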
3. **Filter noise — and treat the regex as a starting point.** Every team has different review banter. Print 10 sample dropped comments and ask the user whether any look like signal worth keeping.
```bash
jq '[.[]
| select(.user.login | endswith("[bot]") | not)
| select(.body | length >= 10)
| select(.body | test("^(lgtm|thanks|ack|done|sgtm)"; "i") | not)
| {body, path, author: .user.login, pr: .pull_request_url}]' \
"$WORK_DIR/review-comments.json" > "$WORK_DIR/filtered.json"
```
Keep `nit:` comments — they carry the most concentrated rule signal. Under ~200 filtered comments, warn the user that signal will be thin and the LLM will start hallucinating patterns.
4. **Extract patterns and classify against existing rules.** Feed `filtered.json` and the existing `AGENTS.md` / `CLAUDE.md` (if any) to the agent in batches of ~200 comments. Use this prompt:
> You are reviewing a team's PR comment history. Surface every pattern that recurs in the input. For each pattern, output a JSON object with:
>
> - **rule** — one imperative sentence (e.g. "Throw `AppError` instances, never plain strings").
> - **reasoning** — one sentence why.
> - **examples** — two example comments, each formatted as `PR #123 (@author): "<quote ≤120 chars>"`. Truncate with ellipsis if needed.
> - **authors** — list of distinct comment authors supporting the pattern.
> - **prs** — list of distinct PR URLs supporting the pattern.
> - **confidence** — `high` (3+ distinct authors), `medium` (2 distinct authors), `low` (1 author across multiple PRs).
> - **category** — prefer `Imports`, `Error Handling`, `Naming`, `Testing`, `Database`, `Logging`, `Types`, `Style`, `Documentation`. Add a new category only if no preferred name fits.
> - **status** — `NEW` (not documented), `DUPLICATE` (already documented and not recurring much → suggest drop), `CONTRADICTS` (clashes with a documented rule → flag), `VIOLATED` (documented but reviewers still flag it → most actionable signal). Compare against the existing rules file provided.
>
> A pattern is anything that recurs across different PRs. Do not invent rules the comments do not support.
Save the model's output as `$WORK_DIR/rules.json`.
5. **Post-filter on PR diversity.** The LLM is enthusiastic about pattern-matching; the gate is what turns enthusiasm into evidence. Keep only rules supported by 2+ distinct PRs:
```bash
jq '[.[] | select((.prs | unique | length) >= 2)]' \
"$WORK_DIR/rules.json" > "$WORK_DIR/rules-filtered.json"
```
On teams with 3+ active reviewers, layer on `(.authors | unique | length) >= 2` as well. On smaller teams the reviewer _is_ the standard, so author diversity is not a meaningful gate.
6. **Validate by status, then confidence.** Present output in four buckets, in this order:
- **VIOLATED** — documented rules reviewers still flag. Highest-value bucket. For each, ask the user "stronger tooling, clearer wording, or both?"
- **CONTRADICTS** — flag for manual resolution.
- **NEW** — stratify by confidence: high auto-accept (with chance to object), medium asks "drop, rewrite, or merge?", low listed briefly for rescue.
- **DUPLICATE** — list briefly so the user can confirm the existing rule still holds.
Wait for the user before producing the final output.
7. **Emit the rule block.** Print Markdown grouped by category, rule + reasoning per entry, citations stripped:
```md
## Conventions
### Imports
- Prefer native methods over lodash equivalents — the team is migrating off lodash.
- Import from package roots, not internal paths — internal paths break on package restructures.
### Error Handling
- Throw `AppError` instances, never plain strings — consumers rely on stable error codes for retry and routing.
```
If any VIOLATED rules surfaced, print a separate `## Enforcement gaps` section listing the rule and its violation citations — that is the user's punch list for tooling work.
8. **Suggest the next step.** Tell the user the block is ready to paste into their `AGENTS.md` or `CLAUDE.md`. If neither file exists, suggest creating `AGENTS.md` at the repo root. Mention the temp directory path. For re-runs, suggest setting `SINCE` to the date of this run — the date filter is the delta workflow.
## Output
Two Markdown blocks printed to chat:
- **`## Conventions`** — rules grouped by category, formatted as `- <rule> — <reasoning>`.
- **`## Enforcement gaps`** — only if VIOLATED rules surfaced; documented rules reviewers are still flagging, with citations.
Plus the temp directory path with `review-comments.json`, `filtered.json`, `rules.json`, and `rules-filtered.json` for re-inspection.
## Anti-patterns
- **Pulling without a date window.** `since=` is mandatory. Five years of comments will surface conventions from architectures the team has abandoned.
- **Treating the noise filter as copy-paste.** Print sample drops and let the user tune — different teams write different banter.
- **Skipping the post-filter on PR diversity.** The LLM emits enthusiastic-looking patterns; the `(.prs | unique | length) >= 2` gate is what filters single-PR flukes from real conventions.
- **Discarding DUPLICATE patterns silently.** Show them — they confirm existing rules are still accurate.
- **Treating VIOLATED as cleanup work.** It is the most actionable output of the whole pipeline. Surface it visibly, not as an aside.
- **Inventing rules the data does not support.** Every rule must be grounded in two or more real comments with cross-author or cross-PR support.
- **Dropping `nit:` comments.** They carry concentrated rule signal — `nit:` is a politeness prefix, not a quality marker.
- **Writing to `AGENTS.md` or `CLAUDE.md` directly.** This skill emits a rule block, not a file edit. The user owns the merge.