[HN Gopher] Show HN: Autograd.c - A tiny ML framework built from...
___________________________________________________________________
Show HN: Autograd.c - A tiny ML framework built from scratch
built a tiny pytorch clone in c after going through prof. vijay
janapa reddi's mlsys book: mlsysbook.ai/tinytorch/. perfect for
learning how ml frameworks work under the hood :)
Author : sueszli
Score : 77 points
Date : 2025-12-16 06:26 UTC (6 days ago)
HTML web link (github.com)
TEXT w3m dump (github.com)
| sueszli wrote:
| woah, this got way more attention than i expected. thanks a lot.
|
| if you are interested in the technical details, the design specs
| are here:
| https://github.com/sueszli/autograd.c/blob/main/docs/design....
|
| if you are working on similar mlsys or compiler-style projects
| and think there could be overlap, please reach out:
| https://sueszli.github.io/
| spwa4 wrote:
| Cool. But this makes me wonder: doesn't this negate most of the
| advantages of C? Is there a compiler-autograd "library", i.e.
| something that would compile into C specifically, to execute as
| fast as possible on CPUs with no indirection at all?
| thechao wrote:
| At best you'd be restricted to forward mode, which would still
| double stack pressure. If you needed reverse mode you'd need 2x
| stack, and the back sweep over the stack-based tape would have the
| nearly perfectly unoptimal "grain". If you allow the higher-order
| operators (both pushforward and pullback), you're going to end up
| with Jacobians & Hessians over nontrivial blocks. That's going to
| need the heap. It's still better than an unbounded loop tape,
| though.
|
| We had all these issues back in 2006 when my group was
| implementing autograd for C++ and, later, a computer algebra
| system called Axiom. We knew it'd be ideal for NN; I was trying
| to build this out for my brother who was porting AI models to
| GPUs. (This did not work in 2006 for both HW & math reasons.)
| sueszli wrote:
| a heap-free implementation could be a really cool direction to
| explore. thanks!
|
| i think you might be interested in MLIR/IREE:
| https://github.com/openxla/iree
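|
| a heap-free forward pass is roughly what dual numbers give you. a
| minimal sketch (hypothetical, not code from this repo): every value
| carries its own derivative, so there is no tape and no heap at all.
|
|     #include <stdio.h>
|     #include <math.h>
|
|     /* value plus its derivative w.r.t. one chosen input */
|     typedef struct { double val, dot; } dual;
|
|     static dual dmul(dual a, dual b) {
|         return (dual){ a.val * b.val,
|                        a.dot * b.val + a.val * b.dot };
|     }
|     static dual dadd(dual a, dual b) {
|         return (dual){ a.val + b.val, a.dot + b.dot };
|     }
|     static dual dtanh(dual a) {
|         double t = tanh(a.val);
|         return (dual){ t, (1.0 - t * t) * a.dot };
|     }
|
|     int main(void) {
|         dual x = { 2.0, 1.0 };  /* seed: dx/dx = 1 */
|         dual w = { 0.5, 0.0 };
|         dual b = { 0.1, 0.0 };
|         /* y = tanh(w*x + b), dy/dx computed alongside */
|         dual y = dtanh(dadd(dmul(w, x), b));
|         printf("y = %f, dy/dx = %f\n", y.val, y.dot);
|         return 0;
|     }
|
| the limitation is that each pass gives one directional derivative,
| so for many parameters you'd still want reverse mode and a tape,
| which is where the stack/heap trade-off above comes back in.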
| attractivechaos wrote:
| > _Is there a compiler-autograd "library"?_
|
| Do you mean the approach Theano used? Anyway, the performance
| bottleneck often lies in matrix multiplication or 2D CNNs (which
| can be reduced to matmul), so compiler-driven autograd wouldn't
| save much time.
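|
| Concretely, the hot spot is the inner loop of a kernel like this
| (illustrative sketch, not taken from the repo); how the graph
| around it is built or compiled barely changes where the time goes:
|
|     /* c[n x m] = a[n x k] * b[k x m], O(n*k*m) multiply-adds */
|     void matmul(const float *a, const float *b, float *c,
|                 int n, int k, int m) {
|         for (int i = 0; i < n; i++)
|             for (int j = 0; j < m; j++) {
|                 float acc = 0.0f;
|                 for (int p = 0; p < k; p++)
|                     acc += a[i * k + p] * b[p * m + j];
|                 c[i * m + j] = acc;
|             }
|     }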
| marcthe12 wrote:
| We would need to mirror JAX's architecture more, since JAX is
| essentially a JIT architecture. Basically you need a good way to
| convert the computational graph to machine code while also
| performing a set of transformations on the graph at compile time.
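|
| As a sketch of what such a compiler could emit for one fixed graph,
| say loss = (w*x + b - t)^2 (hypothetical output, just to show the
| shape): straight-line forward and backward code with no runtime
| graph, tape, or indirection.
|
|     typedef struct { double w, b; } params;
|     typedef struct { double dw, db; } grads;
|
|     static double step(params p, double x, double t, grads *g) {
|         /* forward pass */
|         double y    = p.w * x + p.b;
|         double diff = y - t;
|         double loss = diff * diff;
|         /* backward pass, unrolled ahead of time */
|         double dldy = 2.0 * diff;
|         g->dw = dldy * x;   /* d loss / d w */
|         g->db = dldy;       /* d loss / d b */
|         return loss;
|     }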
| PartiallyTyped wrote:
| Any reason for creating a new tensor when accumulating grads,
| rather than updating the existing one?
|
| Edit: I asked this before I read the design decisions. The
| reasoning, as far as I understand, is that for simplicity there
| are no in-place operations, hence accumulation is done into a new
| tensor.
| sueszli wrote:
| yeah, exactly. it's for explicit ownership transfer. you always
| own what you receive, sum it, release both inputs, done. no
| mutation tracking, no aliasing concerns.
|
| https://github.com/sueszli/autograd.c/blob/main/src/autograd...
|
| i wonder whether there is a more clever way to do this without
| sacrificing simplicity.
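|
| roughly, the pattern being described looks like this (simplified
| sketch; names and the real tensor struct in the repo differ):
|
|     #include <stdlib.h>
|
|     typedef struct { float *data; size_t n; } tensor;
|
|     static tensor tensor_zeros(size_t n) {
|         return (tensor){ calloc(n, sizeof(float)), n };
|     }
|     static void tensor_free(tensor t) { free(t.data); }
|
|     /* consumes both operands, returns a freshly allocated sum:
|        the caller never has to reason about aliasing or in-place
|        mutation, at the cost of one allocation per accumulation */
|     static tensor accumulate_grad(tensor acc, tensor grad) {
|         tensor out = tensor_zeros(acc.n);
|         for (size_t i = 0; i < acc.n; i++)
|             out.data[i] = acc.data[i] + grad.data[i];
|         tensor_free(acc);
|         tensor_free(grad);
|         return out;
|     }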
___________________________________________________________________
(page generated 2025-12-22 07:00 UTC)