        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
       
       
       COMMENT PAGE FOR:
  HTML   Visual Features Across Modalities: SVG and ASCII Art Cross-Modal Understanding
       
       
        robot-wrangler wrote 10 hours 52 min ago:
        Generating and displaying diagrams in mermaid, svg, or css has become
        one of my go-to tests for reasoning.  This seems fair because while SVG
        is admittedly syntactically difficult and maybe not emphasized in
        training, CSS is certainly a popular output target, and mermaid is very
        simple.  It seems like SOTA models should be able to draw and modify things
        that it "understands".
        
        I'm much more interested in stuff like Venn diagrams and bipartite
        graphs than pictures of cats or pelicans riding bikes.  It's similar to
        a code-generation problem in that output is a new artifact that's one
        step away from the problem-presentation, but it has the advantage that
        it's simpler than code, is less likely to have exact-match training
        data, usually has one correct answer, and is easy to check.  Try making
        Venn diagrams of a few circles with "exactly and only the following
        intersections" and gradually elaborating the spec.
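
        To illustrate the "easy to check" part, here is a minimal sketch of
        such a check.  It assumes the generated diagram has already been
        reduced to named circles given as (x, y, r), and it only verifies
        pairwise overlaps, which is a simplification of a full Venn-region
        check.  The names overlapping_pairs and check_spec are hypothetical.

```python
from itertools import combinations
from math import hypot


def overlapping_pairs(circles):
    """Return the set of name pairs whose circles share interior area.

    `circles` maps a name to (x, y, r). Two circles overlap when the
    distance between their centers is strictly less than the sum of
    their radii (this also counts one circle containing another).
    """
    pairs = set()
    for (a, (ax, ay, ar)), (b, (bx, by, br)) in combinations(sorted(circles.items()), 2):
        if hypot(ax - bx, ay - by) < ar + br:
            pairs.add(frozenset((a, b)))
    return pairs


def check_spec(circles, required):
    """Compare actual overlaps against an 'exactly and only' pair spec."""
    want = {frozenset(p) for p in required}
    got = overlapping_pairs(circles)
    as_list = lambda s: sorted(tuple(sorted(p)) for p in s)
    return {"missing": as_list(want - got), "unexpected": as_list(got - want)}


# Hypothetical layout: A and B overlap, C sits off on its own.
layout = {"A": (0, 0, 1), "B": (1.5, 0, 1), "C": (5, 0, 1)}
print(check_spec(layout, [("A", "B")]))
print(check_spec(layout, [("A", "B"), ("B", "C")]))
```

        An empty "missing" and an empty "unexpected" list mean the layout
        matches the spec; elaborating the spec just means adding pairs and
        rerunning the check.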
        
        This is a great way to get a starter diagram boilerplate if that's what
        you're looking for.  One-shot prompts for simple things are OK,
        sometimes.  But it always completely falls apart when you try to
        iterate with small modifications, introducing errors in parts that were
        correct previously or ignoring requested changes.  Maybe it's wrong to
        conclude anything from that, but to me this looks bad for the "they can
        reason!" argument and very bad for trusting complicated work in other
        domains that are harder to check.  Haven't read TFA yet, but whether it
        confirms or contradicts my gut feeling here, hopefully it's going to
        add some perspective.
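
        The iteration failure described above can also be checked
        mechanically.  A rough sketch, assuming each revision of the diagram
        is available as a dict of named elements and that the set of elements
        the prompt asked to change is known; unrequested_changes and
        ignored_changes are hypothetical helper names.

```python
def unrequested_changes(before, after, allowed):
    """Elements that differ between revisions but were not asked to change."""
    names = set(before) | set(after)
    changed = {n for n in names if before.get(n) != after.get(n)}
    return sorted(changed - set(allowed))


def ignored_changes(before, after, allowed):
    """Elements that were asked to change but are identical in both revisions."""
    return sorted(n for n in allowed if before.get(n) == after.get(n))


# Hypothetical revisions: the prompt asked to move only circle C.
rev1 = {"A": (0, 0, 1), "B": (1.5, 0, 1), "C": (5, 0, 1)}
rev2 = {"A": (0, 0, 1), "B": (1.4, 0, 1), "C": (5, 0, 1)}
print(unrequested_changes(rev1, rev2, {"C"}))  # B moved without being asked
print(ignored_changes(rev1, rev2, {"C"}))      # C was never touched
```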
       
       