[HN Gopher] Get an AI code review in 10 seconds
___________________________________________________________________
Get an AI code review in 10 seconds
Author : oldmanrahul
Score : 89 points
Date : 2025-12-21 17:21 UTC (7 hours ago)
HTML web link (oldmanrahul.com)
TEXT w3m dump (oldmanrahul.com)
| Smaug123 wrote:
| With not much more effort you can get a _much_ better review by
| additionally concatenating the touched files and sending them as
| context along with the diff. It was the work of about five
| minutes to make the scaffolding of a very basic bot that does
| this, and then somewhat more time iterating on the prompt. By the
| way, I find it's _seriously_ worth sucking up the extra ~four
| minutes of delay and going up to GPT-5 high rather than using a
| dumber model; I suspect xhigh is worth the ~5x additional bump in
| runtime on top of high, but at that point you have to start
| rearchitecting your workflows around it and I haven't solved
| that problem yet.
|
| (That's if you don't want to go full Codex and have an agent play
| around with the PR. Personally I find that GPT-5.2 xhigh is
| incredibly good at analysing diffs-plus-context without tools.)
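|
| For concreteness, the scaffolding really is only a few shell
| commands (a rough sketch, assuming the base branch is main and
| glossing over deleted files and odd filenames; "llm" and the
| prompt wording are placeholders for whatever client you
| actually use):
|
|     # full post-change files first, then the diff, as one prompt
|     git diff main...HEAD > review.diff
|     git diff --name-only main...HEAD | xargs cat > touched.txt
|     cat touched.txt review.diff | llm \
|       "Review this change. Full files come first, then the diff."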
| verdverm wrote:
| I've been using gemini-3-flash for the last few days and it is
| quite good; I'm not sure you need the biggest models anymore. I
| have only switched to pro once or twice in that time.
|
| Here are the commits; the tasks were not trivial:
|
| https://github.com/hofstadter-io/hof/commits/_next/
|
| Social posts and pretty pictures as I work on my custom copilot
| replacement
|
| https://bsky.app/profile/verdverm.com
| Smaug123 wrote:
| Depends what you mean by "need", of course, but in my
| experience the curves aren't bending yet; a better model still
| means a better-quality review (although GPT-5.0 high was still
| a reasonably competent reviewer)!
| pawelduda wrote:
| Yes, it's my new daily driver for light coding and the rest.
| Also great at object recognition and image gen
| fweimer wrote:
| Do you do any preprocessing of diffs to replace significant
| whitespace with some token that is easier to spot? In my
| experience, some LLMs cannot tell unchanged context from the
| actual changes. That's especially annoying with -U99999 diffs
| as a shortcut to provide full file context.
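|
| (Concretely, the shortcut I mean is just something like
|     git diff -U99999 > change.diff
| so every hunk carries the whole file as context lines.)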
| Smaug123 wrote:
| I've only ever had that problem when supplying a formatted
| diff alone. Once I moved to "provide the diff, and then also
| provide the entire contents of the file after the change",
| I've never had the problem. (I've also only seriously used
| GPT-5.0 high or more powerful models for this.)
| ocharles wrote:
| I recently started using LLMs to review my code before asking for
| a more formal review from colleagues. It's actually been
| surprisingly useful - why waste my colleagues' time with small,
| obvious things? But it's also sometimes gone much further than
| that, with deeper review points. Even when I don't agree with
| them, it's great having that little bit more food for thought -
| if anything, it helps seed the review.
| danlamanna wrote:
| Are you using a particularly well crafted prompt or just
| something off the cuff?
| sultson wrote:
| This one's served fairly well: "Review this diff - detect top
| 10 problem-causers, highlight 3 worst - I'm talking bugs with
| editing,saving etc. (not type errors or other minor aspects)
| [your diff]". The bit on "editing, saving" would vary based
| on goal of diff.
| morkalork wrote:
| Not who you're replying to, but working at a small, small
| company, I didn't have anyone to give my code to for review,
| so I've used AI to fill in that gap. I usually go with a
| specific pass then a general pass, where for example if I'm
| making heavy use of async logic, I'll ask the LLM to pay
| particular attention to pitfalls that can arise with it.
| ocharles wrote:
| We're a Haskell shop, so I usually just say "review the
| current commit. You're an experienced Haskell programmer and
| you value readable and obvious code" (because that is
| indeed what we value on the team). I'll often ask it to
| explicitly consider testing, too.
| eterm wrote:
| Personally, this is what I use in claude code:
|
| "Diff to master and review the changes. Branch designed to
| address <problem statement>. Write output to d:\claudeOut in
| typst (.typ) format."
|
| It'll do the diffs and search both branch and master versions
| of files.
|
| I prefer reading PDFs to markdown, but it'll default to
| markdown if left unprompted, if that's what you prefer.
|
| I have almost all my workspaces configured with /add-dir to
| add d:/claudeOut and d:/claudeIn as general scratch folders
| for temporary in/out file permissions so it can read/write
| outside the context of the workspace for things like this.
|
| You might get better results using a better-crafted prompt
| (or a code review skill?). In general I find claude code
| reviews:
| - Are overly fussy about null checking everything
| - Completely miss whether the PR has properly distilled the
| problem down to its essence
| - Are good at catching spelling mistakes
| - Like to pretend they know whether something is well
| architected, but don't
|
| So it's a bit of a mixed bag: I find it focuses on trivia, but
| it's still useful as a first pass so your teammates don't have
| to catch that same trivia.
|
| It will absolutely assume too much from naming, so if it's
| making the wrong kind of assumptions about how parts work,
| that's a good prompt to think about how to name things more
| clearly.
|
| e.g. If you write a class called "AddingFactory", it'll go
| around assuming that's what it does, even if the core of it
| returns (a, b) -> a*b.
|
| You have to then work hard to get it to properly examine the
| file and convince itself that it is actually a multiplier.
|
| Obviously real-world examples are more subtle than that, but
| if you're finding yourself arguing with it, it's worth
| sometimes considering whether you should rename things.
| afro88 wrote:
| This is exactly the right approach IMO. You find the signal
| amongst the slop, and all your colleagues see is a better PR.
| ohans wrote:
| TIL: you could add a ".diff" to a PR URL. Thanks!
|
| As for PR reviews, assuming you've got linting and static
| analysis out of the way, you need a sufficiently well-crafted
| prompt to truly catch problems or surface review comments that
| match your standards rather than generic AI comments.
|
| My company uses some automatic AI PR review bots, and they annoy
| me more than they help. Lots of useless comments
| hrpnk wrote:
| `gh pr diff num` is an alternative if you have the repo checked
| out. One can then pipe the output to one's favorite llm CLI and
| create a shell alias with a default review prompt.
|
| > My company uses some automatic AI PR review bots, and they
| annoy me more than they help. Lots of useless comments
|
| One way to make them more useful is to ask them to list the
| top N problems found in the change set.
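|
| Something like this is enough to start with (a sketch; "llm"
| stands in for whichever CLI you prefer, the prompt is just an
| example, and it's a function rather than a strict alias so it
| can take an optional PR number):
|
|     # reviews the PR for the current branch, or "prreview 123"
|     prreview() {
|       gh pr diff "$@" | llm \
|         "List the top 5 problems in this diff, worst first."
|     }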
| visarga wrote:
| I would just put a PR_REVIEW.md file in the repo and have a CI
| agent run it on the diff/repo and decide pass or reject. In
| this file there are rules the code must be evaluated against.
| It could be project-level policy; you just put in the
| constraints you cannot check with code tests. Of course, any
| constraint that can be a code test had better be a code test.
|
| My experience is that you can trust any code that is well
| tested, human- or AI-generated, and you cannot trust any code
| that is not well tested (what I call "vibe tested"). But some
| constraints need to be in natural language, and for that you
| need an LLM to review the PRs. This combination of code tests
| and LLM review should be able to ensure reliable AI coding. If
| it does not, iterate on your PR rules and on your tests.
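|
| A rough sketch of the CI side (the agent CLI name, the base
| branch, and the PASS/REJECT convention are all illustrative
| here, not a real tool):
|
|     # fail the build unless the agent answers PASS
|     git diff origin/main...HEAD > pr.diff
|     cat PR_REVIEW.md pr.diff | your-agent-cli \
|       "Check the diff against the rules. Answer PASS or REJECT." \
|       | grep -q '^PASS' || exit 1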
| MYEUHD wrote:
| > TIL: you could add a ".diff" to a PR URL. Thanks!
|
| You can also append ".patch" and get a more useful output
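|
| For anyone following along, both are just the PR URL plus a
| suffix (owner/repo/number here are made up):
|
|     curl -L https://github.com/OWNER/REPO/pull/123.diff
|     curl -L https://github.com/OWNER/REPO/pull/123.patch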
| petesergeant wrote:
| I have been using Codex as a code review step and it has been
| magnificent, truly. I don't like how it writes code, but as a
| second line of defence I'm getting better code reviews out of it
| than I've ever had from a human.
| zedascouves wrote:
| Hum? I just tell claude to review pr #123 and it uses 'gh' to do
| everything, including responding to human comments! Feedback from
| colleagues has been awesome.
|
| We are sooo gonna get replaced soon...
| porise wrote:
| Good thing I work on an old C++ code base where it's impossible
| for AI to go through the millions of lines that all interact
| horribly in unpredictable ways.
| devttyeu wrote:
| Funny you mention that: I very recently came back from a
| one-shot prompt which fixed a rather complex template
| instantiation issue in a relatively big, very convoluted low-
| level codebase (lots of asm, SPDK / userspace nvme, unholy
| shuffling of data between numa domains into shared l3/l2
| caches). That codebase maybe isn't millions of lines of
| code, but it's definitely complex enough to need a month of
| onboarding time. Or you know, just give Claude Opus 4.5 an
| lldb backtrace with 70% of symbols missing due to unholy
| linker gymnastics and get a working fix in 10 mins.
|
| And those are the worst models we will have used from now on.
| porise wrote:
| Template instantiation is relatively simple and can be
| resolved immediately. Trying to figure out how 4 different
| libraries interact with undefined behavior to boot is not
| going to be easy for AI for a while.
| Yeask wrote:
| It's all marketing; it does not even work with JS frontend
| frameworks.
| didibus wrote:
| > Feedback from colleagues has been awesome
|
| Colleague's feedback:
|
| Claude> Address comments on PR #123
| tharkun__ wrote:
| Not my experience. Most Claude reviews are horrible and if I
| catch you replying with Claude (any AI really) under your own
| name, you are gonna get two earfuls. Don't get me wrong, if you
| have an AI bot that I can have a convo with on the PR, sure.
| But passing its stuff off as your own: do that twice and
| you're dead to me.
|
| Now, I use it to review as well; just like you mention, it
| pulls the PR via _gh_, has all the source to reference and
| then tells me what it thinks. But it can't be left alone.
|
| Similarly, people have been trying to pass root cause analyses
| off as true, and they sound confident but have holes like a
| good Swiss cheese.
| mehdibl wrote:
| How to do agentic workflow like 2 years ago.
| sgt101 wrote:
| What would SOA be?
| elliottkember wrote:
| https://cursor.com/bugbot
|
| I didn't see this mentioned, but we've been running bugbot for a
| while now and it's very good. It catches so many subtle bugs.
| howToTestFE wrote:
| While this approach is useful, I think the diff alone gives
| too little context to catch a lot of bugs.
|
| I use https://www.coderabbit.ai/ and it tends to be aware of
| files that aren't in the diff, and definitely can see the rest
| of the file you're editing (not just the lines in the diff).
| syndacks wrote:
| In CC or Codex (or whichever) -- "run git diff and review"
| mvanbaak wrote:
| I still don't get the idea of AI code reviews. A code review
| (at least in my opinion) is for your peers to check whether the
| changes will have a positive or negative effect on the overall
| code + architecture. I have yet to see an LLM be good at this.
|
| Sure, they will leave comments about commonly made errors (your
| editor should already warn about those before you even commit)
| etc. But they won't flag that weird thing that was done to make
| something a lot of customers wanted a reality.
|
| Also, PRs are created to share knowledge. Questions and answers
| on them spread knowledge within the team. AI does not do that.
|
| [edit] Added the part about knowledge sharing
| simonw wrote:
| Sure, AI code reviews aren't a replacement for an architecture
| review on a larger team project.
|
| But they're fantastic at spotting dumb mistakes or low-hanging
| fruit for improvements!
|
| And having the AI spot those for you first means you don't
| waste your team's valuable reviewing time on the simple stuff
| that you could have caught early.
| mvanbaak wrote:
| Those AI checks, if you insist on having them, should be
| part of your pre-commit, not part of your PR review flow.
| They are at best (if they even reach this level) as good as a
| local run of a linter or static type checker. If you are
| running them as a PR check, the PR is out there, so people
| will spend time on that PR no matter whether you are fixing
| the AI comments or not. Best to fix those things BEFORE you
| provide your code to the team.
|
| [edit] Added part about wasting your teams time
| simonw wrote:
| I completely agree.
| tokioyoyo wrote:
| We have AI code reviews enabled for some PRs, and we discuss
| them from time to time on the PR to see if they're worth
| keeping.
| nnutter wrote:
| My team uses draft PRs and goes through a process,
| including AI review, before removing the draft status
| thereby triggering any remaining human review.
|
| A PR is also a decent UI for getting the feedback but
| especially so for documenting/discussing the AI review
| suggestions with the team, just like human review.
|
| AI review is also not equivalent to linter and static
| checks. It can suggest practices appropriate for the
| language and appropriate for your code base. Like a lot of
| my AI experiences, it's pretty hit or miss and non-
| deterministic, but it doesn't cost much to disregard the
| misses, and I appreciate the hits.
| aidanlister wrote:
| This just sounds like you haven't worked in a team
| environment in the last 12 months.
|
| The ergonomics of doing this in pre-commit make no sense.
|
| Spin up a PR in GitHub and get Cursor and/or Claude to do a
| code review -- it's amazing.
|
| It'll often spot bugs (not only obvious ones), it'll
| utilise your agent.md to spot mismatched coding style or
| missing documentation, and it'll check Sentry to see if this
| part of the code touches a hotspot or a LOC that's been
| throwing errors ... it's an amazing first pass.
|
| Once all the issues are resolved you can mark the PR as
| ready for review and get a human to look big picture.
|
| It's unquestionably a huge time saver for reviewers.
|
| And having the AI and human review take place with the same
| UX (comments attached to lines of code, being able to chat
| to the AI to explain decisions, having the AI resolve the
| comment when satisfied) just makes sense and is an obvious
| time saver for the submitter.
| wakawaka28 wrote:
| It makes as much sense to use AI in pre-commit as it does
| to use a linter.
| mvanbaak wrote:
| Why not have AI review your code BEFORE you share it with
| the team? That shows so much more respect to the rest of
| the team than just throwing your code into the wild, only
| to change it because some robot tells you that X could be
| Y.
| Sharlin wrote:
| Stuff like coding style and missing documentation is what
| your basic dumb formatter and linter are supposed to catch;
| using an LLM for such things is hilarious overkill and a
| waste of electricity.
| gerad wrote:
| Your linter can tell if a comment exists. AI can tell if
| it's up to date.
| emeraldd wrote:
| My experience with AI code reviews has been very mixed and
| more on the negative side than the positive one. In
| particular, I've had to disable the AI reviewer on some
| projects my team manages because it was so chatty that it
| caused meaningful notifications from team members to be
| missed.
|
| In most of the repos I work with, it tends to make a large
| number of false positive or inappropriate suggestions that
| are just plain wrong for the code base in question. Sometimes
| these might be ok in some settings, but are generally just
| wrong. About 1 in every 10~20 comments is actually useful or
| something novel that hasn't been caught elsewhere etc. The
| net effect is that the AI reviewer we're effectively forced
| to use is just noise that gets ignored because it's so wrong
| so often.
| syntheticcdo wrote:
| Sometimes the only review a PR needs is "LGTM" - something
| today's LLMs are structurally incapable of.
| fusslo wrote:
| One person proved the uselessness of AI reviews for our
| entire company.
|
| He'd make giant PRs: 100+ file changes, 1000+ word
| descriptions. Impossible to review. Eventually he just
| modified the permissions to require a single approval, so he
| approves his own changes and merges. This is still going on,
| but he's isolated to repos he made himself.
|
| He'd copy/paste the output from AI onto other people's
| reviews. Often they were false positives or open-ended
| questions. So he automated his side, but doubled or tripled
| the work of the person requesting the review - not to
| mention the AI's comments were 100-300 words, with
| formatting and emojis.
|
| The contractors refused to address any comments made by
| him. Some felt it was massively disrespectful as they put
| tons of time and effort into their changes and he couldn't
| even be bothered to read them himself.
|
| It got to the CTO. And AI reviews have been banned.
|
| But it HAS helped the one Jr guy on the team prepare for
| reviews and understand review comments better. It's also
| helped us write better comments, since I and some others
| can be really bad at explaining something
| insin wrote:
| I love having to hit Resolve Conversation umpteen times
| before I can merge because somebody added Copilot and it
| added that many dumb questions/suggestions
| bilalq wrote:
| This question is surprising to me, because I consider AI code
| review the single most valuable aspect of AI-assisted software
| development today. It's ahead of line/next-edit tab completion,
| agentic task completion, etc.
|
| AI code review does not replace human review. But AI reviewers
| will often notice little things that a human may miss.
| Sometimes the things they flag are false positives, but it's
| still worth checking in on them. If even one logical error or
| edge case gets caught by an AI reviewer that would've otherwise
| made it to production with just human review, it's a win.
|
| Some AI reviewers will also factor in context of related files
| not visible in the diff. Humans can do this, but it's time
| consuming, and many don't.
|
| AI reviews are also a great place to put "lint"-like rules that
| would be complicated to express in standard linting tools like
| Eslint.
|
| We currently run 3-4 AI reviewers on our PRs. The biggest
| problem I run into is outdated knowledge. We've had AI
| reviewers leave comments based on limitations of DynamoDB or
| whatever that haven't been true for the last year or two. And
| of course it feels tedious when 3 bots all leave similar
| comments on the same line, but even that is useful as
| reinforcement of a signal.
| IshKebab wrote:
| Yeah a terrible review presumably. It has zero context.
| bhl wrote:
| gh pr diff [num]
|
| also works if you have the GitHub CLI installed. I would set up
| an AGENTS.md or SKILL.md to instruct an agent on how to use gh
| too.
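|
| A minimal version of that instruction file might be nothing
| more than this (a sketch; the wording is just an example):
|
|     cat >> AGENTS.md <<'EOF'
|     Code review: run `gh pr diff <num>` to fetch the change set
|     and `gh pr view <num> --comments` for existing discussion.
|     Post findings as one summary, worst issues first.
|     EOF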
___________________________________________________________________
(page generated 2025-12-22 01:00 UTC)