        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
       
       
       COMMENT PAGE FOR:
  HTML   Response Healing: Reduce JSON defects by 80%+
       
       
        culi wrote 51 min ago:
        I did some searching for an open-source version of this and found this
        pretty neat library for Elixir called json_remedy
        
  HTML  [1]: https://github.com/nshkrdotcom/json_remedy
       
        oats wrote 57 min ago:
        Is this a joke? Am I going crazy?
        
        I don't like this future we're going towards where we have to trick our
        software (which we can no longer understand the workings of) into doing
        what we tell it to by asking it nicely, or by putting another black box
        on the end to "fix" the output. This is the opposite of engineering.
        This is negotiation with a genie trapped in silicon.
       
          Eisenstein wrote 14 min ago:
It is easier once you realize that software development was
           never engineering. Physical engineering rests on physics, while
           software rests on other software. Physics is static and, as far
           as practical engineering is concerned, well understood: it can
           be applied rigorously and taught in courses. Software is
           constantly changing, contains tons of edge cases, and, as
           recent developments show, can change in unpredictable ways and
           lead to entirely new paradigms.
          
          So, the software that you learned on is changing. You aren't going
          crazy, but the ground is indeed shifting. The problem is that you
          assumed it couldn't shift because you were applying the wrong
          constraints.
       
          blibble wrote 38 min ago:
          it does seem as if the world has gone insane
          
          we have brilliant machines that can more or less work perfectly
          
then the scam artists have convinced people that spending a
         trillion dollars and terawatts to get essentially a biased random
         number generator to produce unusable garbage is somehow an
         improvement
       
            Spivak wrote 11 min ago:
            These models have turned a bunch of NLP problems that were
            previously impossible into something trivial. I have personally
            built extremely reliable systems from the biased random number
generator. Our F-score went from 20% with "classic" NLP to
             99% with LLMs.
       
        gruez wrote 1 hour 1 min ago:
        >What about XML? The plugin can heal XML output as well - contact us if
        you’d like access.
        
        Isn't this exactly how we got weird html parsing logic in the first
        place, with "autohealing" logic for mismatched closing tags or quotes?
       
        red2awn wrote 1 hour 40 min ago:
        Very confused. When you enable structured output the response should
        adhere to the JSON schema EXACTLY, not best effort, by constraining the
        output via guided decoding. This is even documented in OpenRouter's
        structured output doc
        
        > The model will respond with a JSON object that strictly follows your
        schema
        
        Gemini is listed as a model supporting structured output, and yet its
        fail rate is 0.39% (Gemini 2.0 Flash)!! I get that structured output
        has a high performance cost but advertising it as supported when in
        reality it's not is a massive red flag.
        
Worse yet, response healing only fixes JSON syntax errors, not
         schema adherence. This is only mentioned at the end of the
         article, which people are clearly not going to read.
        
        WTF
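
         To make the distinction concrete: a healed response can be
         syntactically valid JSON and still violate the schema. A minimal
         sketch (assuming Python's jsonschema package; the schema here is
         made up for illustration):

           import json
           import jsonschema  # pip install jsonschema

           schema = {
               "type": "object",
               "properties": {"age": {"type": "integer"}},
               "required": ["age"],
           }

           healed = '{"age": "twelve"}'       # valid JSON after healing
           data = json.loads(healed)          # parses without error
           jsonschema.validate(data, schema)  # raises ValidationError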
       
          osaariki wrote 6 min ago:
          You're exactly right. The llguidance library [1,2] seems to have
          emerged as the go-to solution for this by virtue of being >10X faster
          than its competition. It's work from some past colleagues of mine at
Microsoft Research based on the theory of (regex) derivatives,
           which we previously used to ship a novel kind of regex engine
           for .NET. It's
          cool work and AFAIK should ensure full adherence to a JSON grammar.
          
          llguidance is used in vLLM, SGLang, internally at OpenAI and
          elsewhere. At the same time, I also see a non-trivial JSON error rate
from Gemini models in large-scale synthetic generations, so
           perhaps Google hasn't seen the "llight" yet and is using
           something less principled.
          
  HTML    [1]: https://guidance-ai.github.io/llguidance/llg-go-brrr
  HTML    [2]: https://github.com/guidance-ai/llguidance
       
        lijok wrote 2 hours 18 min ago:
        One of the best shitposts I have ever seen, by far. Absurdism taken to
        its finest form.
       
        kristianp wrote 2 hours 30 min ago:
How do they know the output needs to be in JSON format?
       
        idle_zealot wrote 2 hours 51 min ago:
        This really gets at the heart of my instinctive dislike of how LLMs are
        being deployed. A core feature of computers, and tools in general, is
        reliability. I like software because you can set something up, run it,
        and (ideally) know that it will do the same job the same way each
        subsequent time you run it. I want a button that is clearly labeled,
        and when pressed, does a specific thing, acting like a limb, an
        extension of my will. I do not, in almost all cases, want my computer
        to be another distinct entity that I conduct social interactions with.
        
        Maybe people got used to computers being unreliable and unpredictable
        as the UIs we shipped became more distracting, less learnable, always
        shifting and hiding information, popping up suggestions and displaying
        non-deterministic-seeming behavior. We trained users to treat their
        devices like unruly animals that they can never quite trust. So now the
        idea of a machine that embodies a more clever (but still unreliable)
        animal to wrangle sounds like a clear upgrade.
        
        But as someone who's spent an inordinate amount of time tweaking and
        tuning his computing environment to prune out flakey components and
        fine-tune bindings and navigation, the idea of integrating a tool into
        my workflow that does amazing things but fails utterly even 1% of the
        time sounds like a nightmare, a sort of perpetual torture of low-grade
        anxiety.
       
          ksenzee wrote 2 hours 20 min ago:
          > We trained users to treat their devices like unruly animals that
          they can never quite trust. So now the idea of a machine that
          embodies a more clever (but still unreliable) animal to wrangle
          sounds like a clear upgrade.
          
          I wish I didn't agree with this, but I think you're exactly right.
          Even engineers dealing with systems we know are deterministic will
          joke about making the right sacrifices to the tech gods to get
          such-and-such working. Take that a step further and maybe it doesn't
          feel too bad to some people for the system to actually not be
          deterministic, if you have a way to "convince" it to do what you
          want. How depressing.
       
        wat10000 wrote 3 hours 10 min ago:
        I thought structured output was done by only allowing tokens that would
        produce valid output. For their example of a missing closing bracket,
        the end token wouldn't be allowed, and it would only accept tokens that
        contain a digit, comma, or closing bracket. I guess that must not be
        the case, though. Doing that seems like a better way to address this.
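
         A minimal sketch of that token-masking idea (toy code, not a
         real inference stack; is_valid_prefix stands in for a grammar
         check and is hypothetical):

           import math

           def mask_logits(logits, vocab, text_so_far, is_valid_prefix):
               # Ban every token whose addition would turn the output
               # into an invalid prefix of the target grammar, so only
               # grammar-conforming tokens can be sampled.
               masked = []
               for tok_id, tok in enumerate(vocab):
                   if is_valid_prefix(text_so_far + tok):
                       masked.append(logits[tok_id])
                   else:
                       masked.append(-math.inf)  # never sampled
               return masked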
       
          numlocked wrote 3 hours 0 min ago:
That is one way of doing it, but it's quite expensive
           computationally. There are some companies that can make it
           feasible [1], but it's often not a perfect process, and
           different inference providers implement it in different ways.
          
  HTML    [1]: https://dottxt.ai/
       
            xg15 wrote 38 min ago:
            Out of curiosity, why is it so expensive? Shouldn't constraining
            the possible result tokens make the inference less expensive?
(because you have to calculate fewer logits and could
             occasionally even skip tokens entirely if there is only one
             valid option)
       
              wat10000 wrote 31 min ago:
Is there anything in the JSON grammar that only allows one valid
              option? In any case, I also don't understand why it would be
              costly. The fact that tokens are typically multiple characters
              would complicate things somewhat, but checking that a given token
              results in valid partial JSON doesn't seem too hard.
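
               A crude stdlib-only approximation of that check (a
               heuristic sketch; a real masker would track parser state
               rather than guess closing suffixes):

                 import json

                 def could_be_json_prefix(s):
                     # s is plausible if it parses as-is, or if
                     # closing an open string/array/object fixes it.
                     for end in ('', '"', ']', '}', '"}', ']}', '"]}'):
                         try:
                             json.loads(s + end)
                             return True
                         except json.JSONDecodeError:
                             pass
                     return False

                 print(could_be_json_prefix('{"a": [1, 2'))  # True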
       
                xg15 wrote 22 min ago:
                Freeform JSON not so much I think, but if you combine it with a
                schema and strict whitespace/formatting rules, you could get
                quite a few.
                
                I think there are lots of boilerplate sequences like '":{' or
                '":[' or '", "', etc - though they might already be compressed
                into a single token if the tokenizer was trained on enough
                JSON.
                
                There are also situations where the schema would only allow a
                specific field name as the next token, e.g. if it was the only
                remaining valid and required field, or if fields have to be
                output in a specific order.
       
            ViewTrick1002 wrote 2 hours 58 min ago:
            I have used structured outputs both with OpenAI and the Gemini
            models. In the beginning they had some rough edges but lately it's
            been smooth sailing.
            
Seems like OpenRouter also supports structured outputs.
            
  HTML      [1]: https://openrouter.ai/docs/guides/features/structured-outp...
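
             For reference, the request shape looks roughly like this (a
             sketch following OpenRouter's structured-outputs docs; the
             model name, schema, and key are placeholders):

               import requests

               resp = requests.post(
                   "https://openrouter.ai/api/v1/chat/completions",
                   headers={"Authorization": "Bearer <API_KEY>"},
                   json={
                       "model": "openai/gpt-4o-mini",
                       "messages": [
                           {"role": "user", "content": "Give me a user."}
                       ],
                       "response_format": {
                           "type": "json_schema",
                           "json_schema": {
                               "name": "user",
                               "strict": True,
                               "schema": {
                                   "type": "object",
                                   "properties": {
                                       "name": {"type": "string"}
                                   },
                                   "required": ["name"],
                               },
                           },
                       },
                   },
               )
               print(resp.json()["choices"][0]["message"]["content"])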
       
        stuaxo wrote 3 hours 26 min ago:
This is good. Is there a Python library to do this?
       
        nubg wrote 3 hours 41 min ago:
        Dear Openrouter blog authors, could you please stop writing your
        blogposts with LLMs?
        
The content of your posts is really insightful and interesting,
         but it feels like junk quality because of the way LLMs write
         blogposts.
        
        What was your prompt?
       
          lab wrote 3 hours 0 min ago:
          A lot of it was finger written -- curious which part sounded like LLM
          to you?
       
            CallMeJim wrote 1 hour 41 min ago:
            > > Here's something most developers overlook: if an LLM has a 2%
            JSON defect rate, and Response Healing drops that to 1%, you
            haven't just made a 1% improvement. You've cut your defects, bugs,
            and support tickets in half.
            
            This sounds AI written.
       
            nubg wrote 2 hours 1 min ago:
            Meaning parts were LLM written? With no disclosure?
       
              Sabinus wrote 1 hour 28 min ago:
              "With no disclosure?"
              
              Why do you have an expectation that a company will disclose to
              you when they use AI for their copywriting? Do you want them to
              disclose the software they used to draft and publish? If a
              manager reviewed the blog post before it went live?
       
                curtisf wrote 11 min ago:
                Using words written by other people without disclosure has
                always been frowned upon. It's called plagiarism.
                
                Plagiarism is bad for a lot of reasons, all of which also apply
                to the undisclosed use of generative AI.
       
                nubg wrote 1 hour 5 min ago:
                Why not just publish the prompt? I can then take an LLM of my
                taste to reformat it the way I want.
                
                Basically, I'm asking for open source blogging!
       
          re-thc wrote 3 hours 24 min ago:
          Next up: blog healing
       
        top1aibooster wrote 3 hours 59 min ago:
        > Here's something most developers overlook: if an LLM has a 2% JSON
        defect rate, and Response Healing drops that to 1%, you haven't just
        made a 1% improvement. You've cut your defects, bugs, and support
        tickets in half.
        
        If part of my system can't even manage to output JSON reliably, it
        needs way more "healing" than syntax munging. This comes across as
        naive.
       
          Spivak wrote 17 min ago:
The model itself can't output JSON reliably. It's on you,
           building a system around the model, to make sure it either
           returns correct output or errors, which is trivial to do.
       
          01HNNWZ0MV43FF wrote 59 min ago:
          "it's not just X, it's Y"
          
          Don't you worry about Planet Express, let me worry about blank.
       
          0cf8612b2e1e wrote 3 hours 13 min ago:
          Sounds like we are twice as close to AGI!
       
          Dylan16807 wrote 3 hours 15 min ago:
Plus, that claim isn't even true. A 1% and a 2% JSON defect
           rate are going to annoy a similar number of people into
           filing bugs and tickets.
       
          arm32 wrote 3 hours 33 min ago:
          But, but, you've just cut your defects, bugs, and support tickets in
          half!
       
        seawatts wrote 1 day ago:
        This is incredible!
       
       