[HN Gopher] Get an AI code review in 10 seconds
___________________________________________________________________
Get an AI code review in 10 seconds
Author : oldmanrahul
Score : 89 points
Date : 2025-12-21 17:21 UTC (7 hours ago)
HTML web link (oldmanrahul.com)
TEXT w3m dump (oldmanrahul.com)
| Smaug123 wrote:
| With not much more effort you can get a _much_ better review by
| additionally concatenating the touched files and sending them as
| context along with the diff. It was the work of about five
| minutes to make the scaffolding of a very basic bot that does
| this, and then somewhat more time iterating on the prompt. By the
| way, I find it's _seriously_ worth sucking up the extra ~four
| minutes of delay and going up to GPT-5 high rather than using a
| dumber model; I suspect xhigh is worth the ~5x additional bump in
| runtime on top of high, but at that point you have to start
| rearchitecting your workflows around it and I haven't solved
| that problem yet.
|
| (That's if you don't want to go full Codex and have an agent play
| around with the PR. Personally I find that GPT-5.2 xhigh is
| incredibly good at analysing diffs-plus-context without tools.)
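|
| For concreteness, the scaffolding really is only a few shell
| commands (a rough sketch, assuming the base branch is main and
| glossing over deleted files and odd filenames; "llm" and the
| prompt wording are placeholders for whatever client you
| actually use):
|
|     # full post-change files first, then the diff, as one prompt
|     git diff main...HEAD > review.diff
|     git diff --name-only main...HEAD | xargs cat > touched.txt
|     cat touched.txt review.diff | llm \
|       "Review this change. Full files come first, then the diff."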
| verdverm wrote:
| I've been using gemini-3-flash for the last few days and it is
| quite good; I'm not sure you need the biggest models anymore. I
| have only switched to pro once or twice in that time.
|
| Here are the commits; the tasks were not trivial:
|
| https://github.com/hofstadter-io/hof/commits/_next/
|
| Social posts and pretty pictures as I work on my custom copilot
| replacement
|
| https://bsky.app/profile/verdverm.com
| Smaug123 wrote:
| Depends what you mean by "need", of course, but in my
| experience the curves aren't bending yet; a better model still
| means a better-quality review (although GPT-5.0 high was still
| a reasonably competent reviewer)!
| pawelduda wrote:
| Yes, it's my new daily driver for light coding and the rest.
| Also great at object recognition and image gen
| fweimer wrote:
| Do you do any preprocessing of diffs to replace significant
| whitespace with some token that is easier to spot? In my
| experience, some LLMs cannot tell unchanged context from the
| actual changes. That's especially annoying with -U99999 diffs
| as a shortcut to provide full file context.
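|
| (Concretely, the shortcut I mean is just something like
|     git diff -U99999 > change.diff
| so every hunk carries the whole file as context lines.)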
| Smaug123 wrote:
| I've only ever had that problem when supplying a formatted
| diff alone. Once I moved to "provide the diff, and then also
| provide the entire contents of the file after the change",
| I've never had the problem. (I've also only seriously used
| GPT-5.0 high or more powerful models for this.)
| ocharles wrote:
| I recently started using LLMs to review my code before asking for
| a more formal review from colleagues. It's actually been
| surprisingly useful - why waste my colleagues' time with small,
| obvious things? But it's also sometimes gone much further than
| that, with deeper review points. Even when I don't agree with
| them, it's great having that little bit more food for thought -
| if anything, it helps seed the review.
| danlamanna wrote:
| Are you using a particularly well crafted prompt or just
| something off the cuff?
| sultson wrote:
| This one's served fairly well: "Review this diff - detect top
| 10 problem-causers, highlight 3 worst - I'm talking bugs with
| editing,saving etc. (not type errors or other minor aspects)
| [your diff]". The bit on "editing, saving" would vary based
| on goal of diff.
| morkalork wrote:
| Not who you're replying to, but working at a small, small
| company, I didn't have anyone to give my code to for review,
| so I've used AI to fill in that gap. I usually go with a
| specific pass then a general pass, where for example if I'm
| making heavy use of async logic, I'll ask the LLM to pay
| particular attention to pitfalls that can arise with it.
| ocharles wrote:
| We're a Haskell shop, so I usually just say "review the
| current commit. You're an experienced Haskell programmer and
| you value readable and obvious code" (because that is
| indeed what we value on the team). I'll often ask it to
| explicitly consider testing, too.
| eterm wrote:
| Personally, this is what I use in claude code:
|
| "Diff to master and review the changes. Branch designed to
| address <problem statement>. Write output to d:\claudeOut in
| typst (.typ) format."
|
| It'll do the diffs and search both branch and master versions
| of files.
|
| I prefer reading PDFs to markdown, but it'll default to
| markdown if left unprompted, if that's what you prefer.
|
| I have almost all my workspaces configured with /add-dir to
| add d:/claudeOut and d:/claudeIn as general scratch folders
| for temporary in/out file permissions so it can read/write
| outside the context of the workspace for things like this.
|
| You might get better results using a better-crafted prompt
| (or a code review skill?). In general I find claude code
| reviews:
| - Are overly fussy about null checking everything
| - Completely miss whether the PR has properly distilled the
| problem down to its essence
| - Are good at catching spelling mistakes
| - Like to pretend they know whether something is well
| architected, but don't
|
| So it's a bit of a mixed bag: I find it focuses on trivia, but
| it's still useful as a first pass so your teammates don't have
| to catch that same trivia.
|
| It will absolutely assume too much from naming, so if it's
| making the wrong kind of assumptions about how parts work,
| that's a good prompt to think about how to name things more
| clearly.
|
| e.g. If you write a class called "AddingFactory", it'll go
| around assuming that's what it does, even if the core of it
| returns (a, b) -> a*b.
|
| You have to then work hard to get it to properly examine the
| file and convince itself that it is actually a multiplier.
|
| Obviously real-world examples are more subtle than that, but
| if you're finding yourself arguing with it, it's worth
| sometimes considering whether you should rename things.
| afro88 wrote:
| This is exactly the right approach IMO. You find the signal
| amongst the slop, and all your colleagues see is a better PR.
| ohans wrote:
| TIL: you could add a ".diff" to a PR URL. Thanks!
|
| As for PR reviews, assuming you've got linting and static
| analysis out of the way, you need a sufficiently well-crafted
| prompt to truly catch problems or surface review comments that
| match your standards rather than generic AI comments.
|
| My company uses some automatic AI PR review bots, and they annoy
| me more than they help. Lots of useless comments
| hrpnk wrote:
| `gh pr diff num` is an alternative if you have the repo checked
| out. One can then pipe the output to one's favorite llm CLI and
| create a shell alias with a default review prompt.
|
| > My company uses some automatic AI PR review bots, and they
| annoy me more than they help. Lots of useless comments
|
| One way to make them more useful is to ask them to list the
| top N problems found in the change set.
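|
| Something like this is enough to start with (a sketch; "llm"
| stands in for whichever CLI you prefer, the prompt is just an
| example, and it's a function rather than a strict alias so it
| can take an optional PR number):
|
|     # reviews the PR for the current branch, or "prreview 123"
|     prreview() {
|       gh pr diff "$@" | llm \
|         "List the top 5 problems in this diff, worst first."
|     }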
| visarga wrote:
| I would just put a PR_REVIEW.md file in the repo and have a CI
| agent run it on the diff/repo and decide pass or reject. In
| this file there are rules the code must be evaluated against.
| It could be project-level policy; you just put in the
| constraints you cannot check with code tests. Of course, any
| constraint that can be a code test had better be a code test.
|
| My experience is that you can trust any code that is well
| tested, human- or AI-generated, and you cannot trust any code
| that is not well tested (what I call "vibe tested"). But some
| constraints need to be in natural language, and for that you
| need an LLM to review the PRs. This combination of code tests
| and LLM review should be able to ensure reliable AI coding. If
| it does not, iterate on your PR rules and on your tests.
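|
| A rough sketch of the CI side (the agent CLI name, the base
| branch, and the PASS/REJECT convention are all illustrative
| here, not a real tool):
|
|     # fail the build unless the agent answers PASS
|     git diff origin/main...HEAD > pr.diff
|     cat PR_REVIEW.md pr.diff | your-agent-cli \
|       "Check the diff against the rules. Answer PASS or REJECT." \
|       | grep -q '^PASS' || exit 1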
| MYEUHD wrote:
| > TIL: you could add a ".diff" to a PR URL. Thanks!
|
| You can also append ".patch" and get a more useful output
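|
| For anyone following along, both are just the PR URL plus a
| suffix (owner/repo/number here are made up):
|
|     curl -L https://github.com/OWNER/REPO/pull/123.diff
|     curl -L https://github.com/OWNER/REPO/pull/123.patch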
| petesergeant wrote:
| I have been using Codex as a code review step and it has been
| magnificent, truly. I don't like how it writes code, but as a
| second line of defence I'm getting better code reviews out of it
| than I've ever had from a human.
| zedascouves wrote:
| Hum? I just tell claude to review pr #123 and it uses 'gh' to do
| everything, including responding to human comments! Feedback from
| colleagues has been awesome.
|
| We are sooo gonna get replaced soon...
| porise wrote:
| Good thing I work on an old C++ code base where it's impossible
| for AI to go through the millions of lines that all interact
| horribly in unpredictable ways.
| devttyeu wrote:
| Funny you mention that: I very recently came back from a
| one-shot prompt which fixed a rather complex template
| instantiation issue in a relatively big, very convoluted low-
| level codebase (lots of asm, SPDK / userspace nvme, unholy
| shuffling of data between numa domains into shared l3/l2
| caches). That codebase maybe isn't millions of lines of
| code, but it's definitely complex enough to need a month of
| onboarding time. Or you know, just give Claude Opus 4.5 an
| lldb backtrace with 70% of symbols missing due to unholy
| linker gymnastics and get a working fix in 10 mins.
|
| And those are the worst models we will have used from now on.
| porise wrote:
| Template instantiation is relatively simple and can be
| resolved immediately. Trying to figure out how 4 different
| libraries interact with undefined behavior to boot is not
| going to be easy for AI for a while.
| Yeask wrote:
| It's all marketing; it does not even work with JS frontend
| frameworks.
| didibus wrote:
| > Feedback from colleagues has been awesome
|
| Colleague's feedback:
|
| Claude> Address comments on PR #123
| tharkun__ wrote:
| Not my experience. Most Claude reviews are horrible and if I
| catch you replying with Claude (any AI really) under your own
| name, you are gonna get two earfuls. Don't get me wrong, if you
| have an AI bot that I can have a convo with on the PR, sure.
| But passing its stuff off as your own: do that twice and
| you're dead to me.
|
| Now, I use it to review as well; just like you mention, it
| pulls the PR via _gh_, has all the source to reference and
| then tells me what it thinks. But it can't be left alone.
|
| Similarly, people have been trying to pass root cause analyses
| off as true, and they sound confident but have holes like a
| good Swiss cheese.
| mehdibl wrote:
| How to do agentic workflow like 2 years ago.
| sgt101 wrote:
| What would SOA be?
| elliottkember wrote:
| https://cursor.com/bugbot
|
| I didn't see this mentioned, but we've been running bugbot for a
| while now and it's very good. It catches so many subtle bugs.
| howToTestFE wrote:
| While this approach is useful, I think the diff alone gives
| too little context to catch a lot of bugs.
|
| I use https://www.coderabbit.ai/ and it tends to be aware of
| files that aren't in the diff, and definitely can see the rest
| of the file you're editing (not just the lines in the diff).
| syndacks wrote:
| In CC or Codex (or whichever) -- "run git diff and review"
| mvanbaak wrote:
| I still don't get the idea of AI code reviews. A code review
| (at least in my opinion) is for your peers to check whether the
| changes will have a positive or negative effect on the overall
| code + architecture. I have yet to see an LLM be good at this.
|
| Sure, they will leave comments about commonly made errors (your
| editor should already warn about those before you even commit)
| etc. But they won't flag that weird thing that was done to make
| something a lot of customers wanted a reality.
|
| Also, PRs are created to share knowledge. Questions and answers
| on them spread knowledge within the team. AI does not do that.
|
| [edit] Added the part about knowledge sharing
| simonw wrote:
| Sure, AI code reviews aren't a replacement for an architecture
| review on a larger team project.
|
| But they're fantastic at spotting dumb mistakes or low-hanging
| fruit for improvements!
|
| And having the AI spot those for you first means you don't
| waste your team's valuable reviewing time on the simple stuff
| that you could have caught early.
| mvanbaak wrote:
| Those AI checks, if you insist on having them, should be
| part of your pre-commit, not part of your PR review flow.
| They are at best (if they even reach this level) as good as a
| local run of a linter or static type checker. If you are
| running them as a PR check, the PR is out there, so people
| will spend time on that PR no matter whether you are fixing
| the AI comments or not. Best to fix those things BEFORE you
| provide your code to the team.
|
| [edit] Added part about wasting your teams time
| simonw wrote:
| I completely agree.
| tokioyoyo wrote:
| We have AI code reviews enabled for some PRs, and we discuss
| them from time to time on the PR to see if they're worth
| keeping.
| nnutter wrote:
| My team uses draft PRs and goes through a process,
| including AI review, before removing the draft status
| thereby triggering any remaining human review.
|
| A PR is also a decent UI for getting the feedback but
| especially so for documenting/discussing the AI review
| suggestions with the team, just like human review.
|
| AI review is also not equivalent to linter and static
| checks. It can suggest practices appropriate for the
| language and appropriate for your code base. Like a lot of
| my AI experiences, it's pretty hit or miss and non-
| deterministic, but it doesn't cost much to disregard the
| misses, and I appreciate the hits.
| aidanlister wrote:
| This just sounds like you haven't worked in a team
| environment in the last 12 months.
|
| The ergonomics of doing this in pre-commit make no sense.
|
| Spin up a PR in GitHub and get Cursor and/or Claude to do a
| code review -- it's amazing.
|
| It'll often spot bugs (not only obvious ones), it'll
| utilise your agent.md to spot mismatched coding style or
| missing documentation, and it'll check Sentry to see if this
| part of the code touches a hotspot or a LOC that's been
| throwing errors ... it's an amazing first pass.
|
| Once all the issues are resolved you can mark the PR as
| ready for review and get a human to look big picture.
|
| It's unquestionably a huge time saver for reviewers.
|
| And having the AI and human review take place with the same
| UX (comments attached to lines of code, being able to chat
| to the AI to explain decisions, having the AI resolve the
| comment when satisfied) just makes sense and is an obvious
| time saver for the submitter.
| wakawaka28 wrote:
| It makes as much sense to use AI in pre-commit as it does
| to use a linter.
| mvanbaak wrote:
| Why not have AI review your code BEFORE you share it with
| the team? That shows so much more respect to the rest of
| the team than just throwing your code into the wild, only
| to change it because some robot tells you that X could be
| Y.
| Sharlin wrote:
| Stuff like coding style and missing documentation is what
| your basic dumb formatter and linter are supposed to catch;
| using an LLM for such things is hilarious overkill and a
| waste of electricity.
| gerad wrote:
| Your linter can tell if a comment exists. AI can tell if
| it's up to date.
| emeraldd wrote:
| My experience with AI code reviews has been very mixed and
| more on the negative side than the positive one. In
| particular, I've had to disable the AI reviewer on some
| projects my team manages because it was so chatty that it
| caused meaningful notifications from team members to be
| missed.
|
| In most of the repos I work with, it tends to make a large
| number of false positive or inappropriate suggestions that
| are just plain wrong for the code base in question. Sometimes
| these might be ok in some settings, but are generally just
| wrong. About 1 in every 10~20 comments is actually useful or
| something novel that hasn't been caught elsewhere etc. The
| net effect is that the AI reviewer we're effectively forced
| to use is just noise that gets ignored because it's so wrong
| so often.
| syntheticcdo wrote:
| Sometimes the only review a PR needs is "LGTM" - something
| today's LLMs are structurally incapable of.
| fusslo wrote:
| One person proved the uselessness of AI reviews for our
| entire company.
|
| He'd make giant PRs: 100+ file changes, 1000+ word
| descriptions. Impossible to review. Eventually he just
| modified the permissions to require a single approval, so he
| approves his own changes and merges. This is still going on,
| but he's isolated to repos he made himself.
|
| He'd copy/paste the output from AI onto other people's
| reviews. Often they were false positives or open-ended
| questions. So he automated his side, but doubled or tripled
| the work of the person requesting the review - not to
| mention the AI's comments were 100-300 words, with
| formatting and emojis.
|
| The contractors refused to address any comments made by
| him. Some felt it was massively disrespectful as they put
| tons of time and effort into their changes and he couldn't
| even be bothered to read them himself.
|
| It got to the CTO. And AI reviews have been banned.
|
| But it HAS helped the one Jr guy on the team prepare for
| reviews and understand review comments better. It's also
| helped us write better comments, since I and some others
| can be really bad at explaining something
| insin wrote:
| I love having to hit Resolve Conversation umpteen times
| before I can merge because somebody added Copilot and it
| added that many dumb questions/suggestions
| bilalq wrote:
| This question is surprising to me, because I consider AI code
| review the single most valuable aspect of AI-assisted software
| development today. It's ahead of line/next-edit tab completion,
| agentic task completion, etc.
|
| AI code review does not replace human review. But AI reviewers
| will often notice little things that a human may miss.
| Sometimes the things they flag are false positives, but it's
| still worth checking in on them. If even one logical error or
| edge case gets caught by an AI reviewer that would've otherwise
| made it to production with just human review, it's a win.
|
| Some AI reviewers will also factor in context of related files
| not visible in the diff. Humans can do this, but it's time
| consuming, and many don't.
|
| AI reviews are also a great place to put "lint"-like rules that
| would be complicated to express in standard linting tools like
| Eslint.
|
| We currently run 3-4 AI reviewers on our PRs. The biggest
| problem I run into is outdated knowledge. We've had AI
| reviewers leave comments based on limitations of DynamoDB or
| whatever that haven't been true for the last year or two. And
| of course it feels tedious when 3 bots all leave similar
| comments on the same line, but even that is useful as
| reinforcement of a signal.
| IshKebab wrote:
| Yeah a terrible review presumably. It has zero context.
| bhl wrote:
| gh pr diff [num]
|
| also works if you have the GitHub CLI installed. I would set up
| an AGENTS.md or SKILL.md to instruct an agent on how to use gh
| too.
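|
| A minimal version of that instruction file might be nothing
| more than this (a sketch; the wording is just an example):
|
|     cat >> AGENTS.md <<'EOF'
|     Code review: run `gh pr diff <num>` to fetch the change set
|     and `gh pr view <num> --comments` for existing discussion.
|     Post findings as one summary, worst issues first.
|     EOF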
___________________________________________________________________
(page generated 2025-12-22 01:00 UTC)