Build a Code Review MCP Stack That Actually Works
How to build an AI-powered code review stack using MCP servers for GitHub PRs, testing, error tracking, and structured reasoning. Practical configs included.
Most developers treat AI code review the same way they treat AI code generation: paste some code, ask "is this good?", and get a generic response about variable naming and missing error handling. That is not a code review. That is a linter with better grammar.
A real code review checks whether the code does what it claims to do. It runs the tests. It looks at the error logs. It considers the PR in the context of the broader codebase. It reasons through edge cases. MCP servers let you build exactly this kind of review workflow by connecting your AI assistant to the actual tools involved in code quality -- not just the code itself, but the infrastructure around it.
This guide covers the five MCP servers that make up a code review stack, how they work together in a review workflow, and the configuration needed to set it all up.
The Problem With AI Code Review Today
When you ask an AI assistant to review code, it works from the code snippet you provide. It does not have access to the full repository, the test suite, the error tracking dashboard, or the CI pipeline. It is reviewing code in a vacuum.
This leads to surface-level feedback: "consider adding error handling here," "this variable name could be more descriptive," "you should add tests for this function." These observations are technically correct but rarely actionable. They do not tell you whether the code will break in production, whether it passes the existing test suite, or whether it introduces a pattern that conflicts with the rest of the codebase.
MCP servers change this by giving the assistant access to real data. Instead of guessing whether tests pass, it can run them. Instead of suggesting you "check for errors," it can look at Sentry and tell you whether similar code is already throwing exceptions in production. Instead of reviewing a code snippet in isolation, it can read the full PR diff with context from the repository.
The Code Review Stack
Five MCP servers, each covering a different aspect of code quality.
1. GitHub MCP -- Read PRs, Not Pastes
Author: GitHub (official) | Tools: 20 | Context cost: ~8,000 tokens
This is the foundation of the review stack. GitHub MCP gives your assistant direct access to the GitHub API, which means it can read full pull request diffs, check PR descriptions, review individual file changes, read comments, and see the commit history.
The difference between reviewing a pasted code snippet and reviewing a full PR diff is enormous. With the full diff, the assistant sees what was added, what was removed, and what was modified. It can identify whether a change to one file breaks an import in another file. It can check whether the PR description accurately reflects the actual changes. It can look at the commit history to understand whether the changes were made incrementally or in one large dump.
For open-source maintainers, this server is indispensable. Your assistant can review incoming PRs, check whether the contributor followed the project's coding conventions, and draft review comments -- all without you opening GitHub in the browser.
```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>"
      }
    }
  }
}
```
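To get a feel for the extra signal a full diff carries, here is a rough sketch that parses a unified diff and summarizes per-file additions and removals. This is illustrative only — it is not part of GitHub MCP, and the file names in the sample diff are invented — but it shows the kind of structural overview the assistant can build when it reads a whole PR instead of a paste:

```python
# Sketch: summarize a unified diff per file (illustrative, not GitHub MCP code).
def summarize_diff(diff_text: str) -> dict[str, dict[str, int]]:
    """Return {filename: {"added": n, "removed": n}} from a unified diff."""
    summary: dict[str, dict[str, int]] = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ b/"):
            # A new file section starts; track changes under its path.
            current = line[len("+++ b/"):]
            summary[current] = {"added": 0, "removed": 0}
        elif current and line.startswith("+") and not line.startswith("+++"):
            summary[current]["added"] += 1
        elif current and line.startswith("-") and not line.startswith("---"):
            summary[current]["removed"] += 1
    return summary

diff = """\
--- a/src/utils.ts
+++ b/src/utils.ts
@@ -1,3 +1,4 @@
-export function formatDate(d) {
+export function formatDate(d: Date) {
+  // new validation
"""
print(summarize_diff(diff))  # {'src/utils.ts': {'added': 2, 'removed': 1}}
```

From a summary like this, the assistant can decide which files deserve a close read and which imports to trace through the rest of the repository.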
2. Playwright MCP -- Verify, Do Not Assume
Author: Microsoft | Tools: 20 | Context cost: ~5,000 tokens
Code review should include behavioral verification. A function might look syntactically correct, but does it actually produce the right output when the application runs? Playwright MCP gives your assistant the ability to run the application and verify the behavior directly.
In a code review context, this means your assistant can navigate to the affected pages, interact with the changed UI elements, and confirm that the behavior matches the PR's intent. If a PR claims to "fix the login form validation," Playwright can open the login page, submit invalid data, and verify that the error messages appear correctly.
This is especially powerful for frontend changes. CSS modifications, layout changes, and responsive behavior are nearly impossible to review from a diff alone. With Playwright, the assistant can actually see the result and catch visual regressions that no amount of code reading would reveal.
```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp"]
    }
  }
}
```
3. Sentry MCP -- Check Production Before Approving
Author: Sentry (official) | Tools: 12 | Context cost: ~5,000 tokens
The most underrated part of code review is asking: "Is the code this PR modifies already causing problems in production?" Sentry MCP connects your assistant to your error tracking system, letting it check whether the files or functions being modified are associated with existing errors, performance issues, or user-facing bugs.
This adds a dimension to code review that even experienced human reviewers often miss. When someone submits a PR that refactors the payment processing module, your assistant can check Sentry to see if that module is currently throwing unhandled exceptions, what the error rates look like, and whether the PR's changes address (or potentially worsen) existing issues.
Sentry MCP can also help evaluate the risk of a change. If a PR modifies a function that currently handles 10,000 requests per hour with zero errors, you know the risk profile is different from modifying a function that is already failing 2% of the time.
```json
{
  "mcpServers": {
    "sentry": {
      "command": "npx",
      "args": ["-y", "@sentry/mcp-server-sentry"],
      "env": {
        "SENTRY_AUTH_TOKEN": "<your-token>"
      }
    }
  }
}
```
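The risk framing above — healthy-but-hot paths versus already-failing ones — can be made concrete with a small heuristic. Everything in this sketch is invented for illustration: the `change_risk` function and its thresholds are not part of Sentry or its MCP server, just one way an assistant might weigh the production data it retrieves:

```python
# Hypothetical risk heuristic: combine traffic volume with current error rate.
# Thresholds are illustrative, not Sentry recommendations.
def change_risk(requests_per_hour: int, error_rate: float) -> str:
    """Classify the risk of modifying a code path, given its production profile."""
    if error_rate >= 0.02:
        # Already failing: a fix is welcome, but review regressions carefully.
        return "high: code path is already unhealthy"
    if requests_per_hour >= 10_000:
        # Healthy but hot: any new bug reaches many users fast.
        return "high: hot path, require tests and staged rollout"
    if requests_per_hour >= 1_000:
        return "medium: meaningful traffic, verify behavior before merge"
    return "low: cold path, standard review is enough"

print(change_risk(10_000, 0.0))  # healthy but high-traffic function
print(change_risk(500, 0.02))    # function already failing 2% of the time
```

The point is not the specific thresholds but that the assistant can ground its review in numbers it actually fetched, rather than generic caution.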
4. Sequential Thinking MCP -- Structured Reasoning for Complex Changes
Author: Anthropic (official) | Tools: 1 | Context cost: ~1,800 tokens
Not every PR is a simple bug fix. Some changes involve complex business logic, architectural decisions, or subtle interactions between multiple systems. For these, you want your assistant to think through the implications methodically rather than giving a quick gut reaction.
Sequential Thinking MCP provides a structured reasoning tool that breaks complex problems into explicit steps. During code review, this means the assistant works through questions systematically: What does this change do? What could break? Are there edge cases the author might have missed? Does this change interact with other parts of the system in unexpected ways?
The value is most visible when reviewing changes that span multiple files or modify shared utilities. Instead of reviewing each file in isolation and potentially missing cross-file implications, sequential thinking pushes the assistant to consider the change holistically.
At only ~1,800 tokens, this server adds negligible overhead while significantly improving the depth of review feedback on complex PRs.
```json
{
  "mcpServers": {
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    }
  }
}
```
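The pattern Sequential Thinking encourages can be sketched as a plain loop over explicit review questions. The checklist and report format below are invented for illustration — the actual server exposes a single general-purpose thinking tool, not this fixed list — but the shape of the output is the same idea: each question answered in order, producing a structured review instead of a gut reaction:

```python
# Illustrative sketch of step-by-step review reasoning (not the MCP server's API).
REVIEW_STEPS = [
    "What does this change do, in one sentence?",
    "What could break, and where?",
    "Which edge cases might the author have missed?",
    "Does this interact with other parts of the system?",
    "Is the change consistent with the rest of the codebase?",
]

def structured_review(answers: list[str]) -> str:
    """Pair each review question with its answer into a numbered report."""
    if len(answers) != len(REVIEW_STEPS):
        raise ValueError("one answer per review step is required")
    lines = [
        f"{i}. {q}\n   -> {a}"
        for i, (q, a) in enumerate(zip(REVIEW_STEPS, answers), start=1)
    ]
    return "\n".join(lines)

# Hypothetical answers for a payment-refactor PR.
report = structured_review([
    "Refactors payment retry logic into a shared helper.",
    "Call sites that relied on the old retry count.",
    "Zero-amount charges and already-refunded invoices.",
    "Yes: the webhook handler imports the same helper.",
    "Yes: follows the existing services/ pattern.",
])
print(report)
```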
5. Filesystem MCP -- Read the Full Codebase Context
Author: Anthropic (official) | Tools: 11 | Context cost: ~2,500 tokens
A PR diff shows what changed, but a code review needs to understand the surrounding code that did not change. Filesystem MCP gives your assistant access to the full project directory, so it can read related files, check import paths, verify that type definitions are consistent, and understand the architectural patterns the project follows.
This is critical for catching issues that only appear in context. A new utility function might look fine in isolation, but Filesystem MCP lets the assistant check whether a similar utility already exists elsewhere in the codebase (duplication), whether the naming follows the project's conventions, and whether the file is placed in the correct directory.
```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./"]
    }
  }
}
```
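The duplication check described above can be sketched as a simple scan over the project tree. This is a standalone illustration built on a throwaway temp directory — Filesystem MCP performs the equivalent through its own read and search tools, and the file names here are invented:

```python
# Sketch: find exported functions defined in more than one place.
import os
import re
import tempfile

def find_function_definitions(root: str) -> dict[str, list[str]]:
    """Map each exported TypeScript function name to the files defining it."""
    pattern = re.compile(r"export function (\w+)")
    defs: dict[str, list[str]] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".ts"):
                continue
            path = os.path.join(dirpath, name)
            with open(path) as f:
                for match in pattern.finditer(f.read()):
                    defs.setdefault(match.group(1), []).append(
                        os.path.relpath(path, root)
                    )
    return defs

# Build a tiny fake project where the same utility exists twice.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src/utils"))
    os.makedirs(os.path.join(root, "src/helpers"))
    with open(os.path.join(root, "src/utils/date.ts"), "w") as f:
        f.write("export function formatDate(d: Date) {}\n")
    with open(os.path.join(root, "src/helpers/time.ts"), "w") as f:
        f.write("export function formatDate(d: Date) {}\n")
    defs = find_function_definitions(root)
    duplicates = {k: v for k, v in defs.items() if len(v) > 1}
    print(duplicates)  # formatDate is defined in two places
```

A reviewer with this kind of whole-tree view catches the duplicate before it ships; a diff-only review never sees it.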
The Review Workflow
Here is how these five servers work together to produce a meaningful code review.
Step 1: Read the PR
GitHub MCP fetches the full pull request: the diff, the description, the commit messages, and any existing comments. Your assistant now has the same context a human reviewer would start with.
Step 2: Understand the context
Filesystem MCP reads the files surrounding the changes. If the PR modifies src/services/payment.ts, the assistant also reads the related types file, the tests file, and any files that import from the payment service. This builds the context needed to evaluate whether the change is consistent with the rest of the codebase.
Step 3: Check production health
Sentry MCP queries your error tracking for the affected files and functions. The assistant learns whether the modified code is currently healthy or already causing issues. This informs the risk assessment.
Step 4: Verify behavior
If the PR involves UI changes or testable functionality, Playwright MCP launches a browser, navigates to the affected pages, and verifies that the application behaves as the PR claims. This catches behavioral regressions that diff-reading alone cannot detect.
Step 5: Reason through implications
Sequential Thinking MCP structures the final analysis. The assistant works through the change's implications step by step: correctness, edge cases, performance, security, and maintainability. The output is a structured review rather than a collection of disconnected observations.
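Stitched together, the five steps form a pipeline. The sketch below stubs every function with canned data to show the shape of the workflow — none of these bodies are real MCP client code, and the PR number, file names, and error counts are invented:

```python
# Conceptual pipeline; in practice each step is an MCP tool call, not a stub.
def read_pr(pr_number: int) -> dict:
    # Step 1: GitHub MCP fetches the diff, description, and commits.
    return {"files": ["src/services/payment.ts"], "claim": "fix retry logic"}

def read_context(files: list[str]) -> list[str]:
    # Step 2: Filesystem MCP reads related types, tests, and importers.
    return ["src/services/payment.test.ts", "src/types/payment.ts"]

def check_production(files: list[str]) -> dict:
    # Step 3: Sentry MCP reports current health of the touched code.
    return {"open_errors": 3, "error_rate": 0.004}

def verify_behavior(claim: str) -> bool:
    # Step 4: Playwright MCP would exercise the UI to confirm the claim.
    return True

def reason(pr: dict, related: list[str], health: dict, verified: bool) -> str:
    # Step 5: Sequential Thinking structures the final verdict.
    verdict = "needs work" if health["open_errors"] and not verified else "approve"
    return f"{verdict}: {len(pr['files'])} file(s), {health['open_errors']} open error(s)"

pr = read_pr(1234)
related = read_context(pr["files"])
health = check_production(pr["files"])
verified = verify_behavior(pr["claim"])
print(reason(pr, related, health, verified))
```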
Ready-to-Copy Configuration
Here is the full code review stack:
```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>" }
    },
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp"]
    },
    "sentry": {
      "command": "npx",
      "args": ["-y", "@sentry/mcp-server-sentry"],
      "env": { "SENTRY_AUTH_TOKEN": "<your-token>" }
    },
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./"]
    }
  }
}
```
Token Budget
| Configuration | Servers | Estimated Tokens | % of 200K Context |
|---|---|---|---|
| Lean (GitHub + Filesystem + Sequential Thinking) | 3 | ~12,300 | 6.2% |
| Full (all 5) | 5 | ~22,300 | 11.2% |
The lean stack gives you PR access, codebase context, and structured reasoning at just 6.2% of the context window. The full stack adds behavioral testing and production error checking for under 12%. This is well within the budget for a focused review session.
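The table's totals are straightforward sums of the per-server estimates quoted in the sections above; a quick check:

```python
# Per-server context cost estimates from the sections above (approximate).
COSTS = {
    "github": 8_000,
    "playwright": 5_000,
    "sentry": 5_000,
    "sequential-thinking": 1_800,
    "filesystem": 2_500,
}
CONTEXT_WINDOW = 200_000

lean = COSTS["github"] + COSTS["filesystem"] + COSTS["sequential-thinking"]
full = sum(COSTS.values())
print(lean, lean / CONTEXT_WINDOW)  # 12300 tokens, ~6.2% of a 200K window
print(full, full / CONTEXT_WINDOW)  # 22300 tokens, ~11.2% of a 200K window
```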
When to Use This Stack
This stack is not meant to replace human code review. It is meant to augment it. Use it when:
- You are an open-source maintainer receiving more PRs than you can review manually. The stack triages PRs by identifying the ones that need your attention versus the ones that are straightforward.
- You are a team lead reviewing PRs across multiple repositories. The stack provides a consistent, thorough first pass that catches the issues you might miss when reviewing quickly.
- You are a solo developer and your code does not get reviewed by anyone. The stack acts as a second pair of eyes that has access to your full codebase, test suite, and error logs.
- You are reviewing a large PR that touches many files. Sequential thinking and filesystem access help the assistant reason about cross-cutting changes that are hard to evaluate file by file.
What This Stack Does Not Do
Be honest about the limitations:
- It does not understand your business requirements. The assistant can verify that code works correctly, but it cannot judge whether the feature itself is the right thing to build.
- It does not replace integration testing or CI pipelines. Playwright MCP can verify individual behaviors, but it is not a substitute for a comprehensive test suite.
- It works best with well-structured PRs. A PR that mixes refactoring, feature work, and bug fixes in a single diff will produce a less useful review than a focused, single-purpose PR.
Getting Started
You do not need all five servers to start getting value. Begin with the combination that addresses your biggest review pain point:
- Reviewing PRs in isolation without codebase context? GitHub MCP plus Filesystem MCP.
- Missing behavioral regressions? Add Playwright MCP.
- Not checking production impact? Sentry MCP closes that gap.
- Getting shallow feedback on complex changes? Sequential Thinking MCP adds depth.
Add servers incrementally. Each one is a few lines of config, and you can always remove any that are not earning their token cost.
For pre-configured stacks tailored to QA, testing, and open-source maintenance workflows, visit stackmcp.dev. Pick your role, choose your editor, and get a config that is ready to paste.