[HN Gopher] Show HN: Autograd.c - A tiny ML framework built from...
___________________________________________________________________
Show HN: Autograd.c - A tiny ML framework built from scratch
built a tiny pytorch clone in c after going through prof. vijay
janapa reddi's mlsys book: mlsysbook.ai/tinytorch/. perfect for
learning how ml frameworks work under the hood :)
Author : sueszli
Score : 77 points
Date : 2025-12-16 06:26 UTC (6 days ago)
HTML web link (github.com)
TEXT w3m dump (github.com)
| sueszli wrote:
| woah, this got way more attention than i expected. thanks a lot.
|
| if you are interested in the technical details, the design specs
| are here:
| https://github.com/sueszli/autograd.c/blob/main/docs/design....
|
| if you are working on similar mlsys or compiler-style projects
| and think there could be overlap, please reach out:
| https://sueszli.github.io/
| spwa4 wrote:
| Cool. But this makes me wonder: doesn't this negate most of the
| advantages of C? Is there a compiler-autograd "library", i.e.
| something that would compile into C specifically, to execute as
| fast as possible on CPUs with no indirection at all?
| thechao wrote:
| At best you'd be restricted to forward mode, which would still
| double stack pressure. If you needed reverse mode you'd need 2x
| stack, and the back sweep over the stack-based tape would have the
| nearly perfectly unoptimal "grain". If you allow the higher-order
| operators (both pushforward and pullback), you're going to end up
| with Jacobians & Hessians over nontrivial blocks. That's going to
| need the heap. It's still better than an unbounded loop tape,
| though.
|
| We had all these issues back in 2006 when my group was
| implementing autograd for C++ and, later, a computer algebra
| system called Axiom. We knew it'd be ideal for NN; I was trying
| to build this out for my brother who was porting AI models to
| GPUs. (This did not work in 2006 for both HW & math reasons.)
| sueszli wrote:
| a heap-free implementation could be a really cool direction to
| explore. thanks!
|
| i think you might be interested in MLIR/IREE:
| https://github.com/openxla/iree
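|
| a heap-free forward pass is roughly what dual numbers give you. a
| minimal sketch (hypothetical, not code from this repo): every value
| carries its own derivative, so there is no tape and no heap at all.
|
|     #include <stdio.h>
|     #include <math.h>
|
|     /* value plus its derivative w.r.t. one chosen input */
|     typedef struct { double val, dot; } dual;
|
|     static dual dmul(dual a, dual b) {
|         return (dual){ a.val * b.val,
|                        a.dot * b.val + a.val * b.dot };
|     }
|     static dual dadd(dual a, dual b) {
|         return (dual){ a.val + b.val, a.dot + b.dot };
|     }
|     static dual dtanh(dual a) {
|         double t = tanh(a.val);
|         return (dual){ t, (1.0 - t * t) * a.dot };
|     }
|
|     int main(void) {
|         dual x = { 2.0, 1.0 };  /* seed: dx/dx = 1 */
|         dual w = { 0.5, 0.0 };
|         dual b = { 0.1, 0.0 };
|         /* y = tanh(w*x + b), dy/dx computed alongside */
|         dual y = dtanh(dadd(dmul(w, x), b));
|         printf("y = %f, dy/dx = %f\n", y.val, y.dot);
|         return 0;
|     }
|
| the limitation is that each pass gives one directional derivative,
| so for many parameters you'd still want reverse mode and a tape,
| which is where the stack/heap trade-off above comes back in.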
| attractivechaos wrote:
| > _Is there a compiler-autograd "library"?_
|
| Do you mean the approach Theano used? Anyway, the performance
| bottleneck often lies in matrix multiplication or 2D CNNs (which
| can be reduced to matmul), so compiler-driven autograd wouldn't
| save much time.
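|
| Concretely, the hot spot is the inner loop of a kernel like this
| (illustrative sketch, not taken from the repo); how the graph
| around it is built or compiled barely changes where the time goes:
|
|     /* c[n x m] = a[n x k] * b[k x m], O(n*k*m) multiply-adds */
|     void matmul(const float *a, const float *b, float *c,
|                 int n, int k, int m) {
|         for (int i = 0; i < n; i++)
|             for (int j = 0; j < m; j++) {
|                 float acc = 0.0f;
|                 for (int p = 0; p < k; p++)
|                     acc += a[i * k + p] * b[p * m + j];
|                 c[i * m + j] = acc;
|             }
|     }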
| marcthe12 wrote:
| We would need to mirror JAX's architecture more, since JAX is
| essentially a JIT architecture. Basically you need a good way to
| convert the computational graph to machine code while also
| performing a set of transformations on the graph at compile time.
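|
| As a sketch of what such a compiler could emit for one fixed graph,
| say loss = (w*x + b - t)^2 (hypothetical output, just to show the
| shape): straight-line forward and backward code with no runtime
| graph, tape, or indirection.
|
|     typedef struct { double w, b; } params;
|     typedef struct { double dw, db; } grads;
|
|     static double step(params p, double x, double t, grads *g) {
|         /* forward pass */
|         double y    = p.w * x + p.b;
|         double diff = y - t;
|         double loss = diff * diff;
|         /* backward pass, unrolled ahead of time */
|         double dldy = 2.0 * diff;
|         g->dw = dldy * x;   /* d loss / d w */
|         g->db = dldy;       /* d loss / d b */
|         return loss;
|     }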
| PartiallyTyped wrote:
| Any reason for creating a new tensor when accumulating grads,
| rather than updating the existing one?
|
| Edit: I asked this before I read the design decisions. The
| reasoning, as far as I understand, is that for simplicity there
| are no in-place operations, hence accumulation is done into a new
| tensor.
| sueszli wrote:
| yeah, exactly. it's for explicit ownership transfer. you always
| own what you receive, sum it, release both inputs, done. no
| mutation tracking, no aliasing concerns.
|
| https://github.com/sueszli/autograd.c/blob/main/src/autograd...
|
| i wonder whether there is a more clever way to do this without
| sacrificing simplicity.
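|
| roughly, the pattern being described looks like this (simplified
| sketch; names and the real tensor struct in the repo differ):
|
|     #include <stdlib.h>
|
|     typedef struct { float *data; size_t n; } tensor;
|
|     static tensor tensor_zeros(size_t n) {
|         return (tensor){ calloc(n, sizeof(float)), n };
|     }
|     static void tensor_free(tensor t) { free(t.data); }
|
|     /* consumes both operands, returns a freshly allocated sum:
|        the caller never has to reason about aliasing or in-place
|        mutation, at the cost of one allocation per accumulation */
|     static tensor accumulate_grad(tensor acc, tensor grad) {
|         tensor out = tensor_zeros(acc.n);
|         for (size_t i = 0; i < acc.n; i++)
|             out.data[i] = acc.data[i] + grad.data[i];
|         tensor_free(acc);
|         tensor_free(grad);
|         return out;
|     }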
___________________________________________________________________
(page generated 2025-12-22 07:00 UTC)