        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (unofficial)
       
       
       COMMENT PAGE FOR:
  HTML   How I turned Zig into my favorite language to write network programs in
       
       
        5- wrote 19 hours 47 min ago:
        perhaps it's a trivial observation that people tend to conflate the
        programming language in the strict sense (syntax, semantics, compiler
        implementation etc.) with its standard and/or community libraries and
        tooling.
        
         of course these are very important, but perhaps i'm just a language
         nerd/pedant who gets confused when an article about a programming
         language turns out to be mostly about async i/o libraries.
       
        RustSupremacist wrote 21 hours 23 min ago:
        > In the previous C++ version, I used Qt, which might seem very strange
        for a server software, but I wanted a nice way of doing asynchronous
        I/O and Qt allowed me to do that. It was callback-based, but Qt has a
        lot of support for making callbacks usable. In the newer prototypes, I
        used Go, specifically for the ease of networking and concurrency. With
        Zig, I was stuck.
        
         There are new Qt bindings for both: Go has [1] and Zig has [2] . I
         wonder if the author knew about them. I don't know enough about
         either language to speak on the async parts.
        
        For me, I want these for Rust, especially what Zig has because I use
        KDE. I know about [3] and it is the only maintained effort for Rust
        that is left standing after all these years. But I don't want QML. I
        definitely don't want C++ or CMake. I just want Rust and Cargo.
        
  HTML  [1]: https://github.com/mappu/miqt
  HTML  [2]: https://github.com/rcalixte/libqt6zig
  HTML  [3]: https://github.com/KDAB/cxx-qt
       
        d3ckard wrote 1 day ago:
         Honestly, I have been excited about Zig for quite a while, dabbled a
         bit a while back, and was waiting for it to get closer to 1.0 to
         actually do a deep dive... but that moment doesn't seem to come.
        
         I don't mind, it's up to the maintainers how they want to proceed.
         However, I would greatly appreciate it if Zig news were a bit clearer
         on what's happening, timelines, etc.
        
        I think it takes relatively little time to do so, but optics would be
        so much better.
       
        cat-whisperer wrote 1 day ago:
        Stackful coroutines make sense when you have the RAM for it.
        
        I've been using Zig for embedded (ARM Cortex-M4, 256KB RAM) mainly for
        memory safety with C interop. The explicitness around calling
        conventions catches ABI mismatches at compile-time instead of runtime
        crashes.
        
        I actually prefer colored async (like Rust) over this approach. The
        "illusion of synchronous code" feels magical, but magic becomes a
        gotcha in larger codebases when you can't tell what's blocking and what
        isn't.
       
          pron wrote 18 hours 32 min ago:
          All synchronous code is an illusion created in software, as is the
          very notion of "blocking". The CPU doesn't block for IO. An OS thread
          is a (scheduled) "stackful coroutine" implemented in the OS that
          gives the illusion of blocking where there is none.
          
          The only problem is that the OS implements that illusion in a way
          that's rather costly, allowing only a relatively small number of
          threads (typically, you have no more than a few thousand
          frequently-active OS threads), while languages, which know more about
          how they use the stack, can offer the same illusion in a way that
          scales to a higher number of concurrent operations. But there's
          really no more magic in how a language implements this than in how
          the OS implements it, and no more illusion. They are both a mostly
          similar implementation of the same illusion. "Blocking" is always a
          software abstraction over machine operations that don't actually
          block.
          
           The only question is how important it is for software to
           distinguish between the OS's and the language's implementations of
           the same abstraction.
       
            zozbot234 wrote 17 hours 55 min ago:
            Unfortunately, the illusion of an OS thread relies on keeping a
            single consistent stack. Stackful coroutines (implemented on top of
            kernel threads) break this model in a way that has many detrimental
            effects; stackless ones do not.
       
              pron wrote 15 hours 34 min ago:
               It is true that in some languages there could be difficulties
               due to idiosyncrasies of the language's implementation, but it's
               not an intrinsic difficulty. We've implemented virtual threads
               in the JVM, and we've used the same Thread API with no issue.
       
                hawk_ wrote 14 hours 21 min ago:
                 Yep, the JVM structured concurrency implementation is
                 amazing. One thing I started wondering, especially when
                 reading this post on HN, is whether stackless coroutines
                 could (have) fit the JVM in some way, to get even better
                 performance for those who may care.
       
                  pron wrote 12 hours 31 min ago:
                  They wouldn't have had better performance, though. There is
                  no significant performance penalty we're paying, although
                  there's a nuance here that may be worth pointing out.
                  
                   There are two different use cases for coroutines that
                   implementors may be tempted to address with a single
                   implementation, but the use cases are sufficiently
                   different to separate into two different implementations.
                   One is the generator use case.
                  What makes it special is that there are exactly two
                  communicating parties, and both of their state may fit in the
                  CPU cache. The other use case is general concurrency,
                  primarily for IO. In that situation, a scheduler juggles a
                  large number of user-mode threads, and because of that, there
                  is likely a cache miss on every context switch, no matter how
                  efficient it is. However, in the second case, almost all of
                  the performance is due to Little's law rather than context
                  switch time (see my explanation here: [1] ).
                  
                  That means that a "stackful" implementation of user-mode
                  threads can have no significant performance penalty for the
                  second use case (which, BTW, I think has much more value than
                  the first), even though a more performant implementation is
                  possible for the first use case. In Java we decided to tackle
                  the second use case with virtual threads, and so far we've
                  not offered something for the first (for which the demand is
                  significantly lower).
                  
                  What happens in languages that choose to tackle both use
                  cases with the same construct is that they gain negligible
                  performance in the second use case (at best), but they're
                  paying for that negligible benefit with a substantial
                  degradation in user experience. That's just a bad tradeoff,
                  but some languages (especially low-level ones) may have
                  little choice, because their stackful solution does carry a
                  significant performance cost compared to Java because of
                  Java's very efficient heap memory management.
                  
  HTML            [1]: https://inside.java/2020/08/07/loom-performance/
       
              lukaslalinsky wrote 17 hours 47 min ago:
               The OS allocates your thread stack in a very similar way to how
               a coroutine runtime allocates the coroutine stack. The OS will
               swap the stack pointer and a bunch more things on each context
               switch; the coroutine runtime will also swap the stack pointer
               and some other things. It's really the same thing. The only
               difference is that the runtime in a compiled language knows
               more about your code than the OS does, so it can make
               assumptions that the OS can't, and that's what makes user-space
               coroutines lighter. The mechanisms are the same.
       
                zozbot234 wrote 17 hours 42 min ago:
                And the stackless runtime will use some other register than the
                stack pointer to access the coroutine's activation frame,
                leaving the stack pointer register free for OS and library use,
                and avoiding the many drawbacks of fiddling with the system
                stack as stackful coroutines do. It's the same thing.
       
          audunw wrote 19 hours 1 min ago:
          The new Zig IO will essentially be colored, but in a nicer way than
          Rust.
          
           You don't have to color your function based on whether you're
           supposed to use it in an async or sync manner. But it will
           essentially be colored based on whether it does I/O or not (the
           function takes the IO interface as an argument). Which is actually
           important information to "color" a function with.
          
          Whether you're doing async or sync I/O will be colored at the place
          where you call an IO function. Which IMO is the correct way to do it.
          If you call with "async" it's nonblocking, if you call without it,
          it's blocking. Very explicit, but not in a way that forces you to
          write a blocking and async version of all IO functions.
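           
           A rough sketch of what that looks like, based on the proposed
           std.Io interface (io.async and Future.await are names from the
           proposal and may still change before release):
           
               const std = @import("std");
           
               fn fetchBoth(io: std.Io) !void {
                   // async at the call site: the two calls may overlap
                   var a = io.async(fetchOne, .{ io, "a.example.com" });
                   var b = io.async(fetchOne, .{ io, "b.example.com" });
                   try a.await(io);
                   try b.await(io);
               }
           
               fn fetchOne(io: std.Io, host: []const u8) !void {
                   // "colored" only by taking io; the same function serves
                   // blocking and non-blocking callers alike
                   _ = io;
                   _ = host;
               }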
          
           The Zio readme says it will be an implementation of the Zig IO
           interface when it's released.
          
          I guess you can then choose if you want explicit async (use Zig
          stdlib IO functions) or implicit async (Zio), and I suppose you can
          mix them.
          
          > Stackful coroutines make sense when you have the RAM for it.
          
          So I've been thinking a bit about this. Why should stackful
          coroutines require more RAM? Partly because when you set up the
          coroutine you don't know how big the stack needs to be, right? So you
          need to use a safe upper bound. While stackless will only set up the
          memory you need to yield the coroutine. But Zig has a goal of having
          a built-in to calculate the required stack size for calling a
          function. Something it should be able to do (when you don't have
          recursion and don't call external C code), since Zig compiles
          everything in one compilation unit.
          
           Zig devs are working on stackless coroutines as well. But I wonder
           if some of the benefits go away if you can allocate exactly the
           amount of stack a stackful coroutine needs to run and nothing more.
       
            zozbot234 wrote 18 hours 51 min ago:
             > You don't have to color your function based on whether you're
             supposed to use it in an async or sync manner. But it will
             essentially be colored based on whether it does I/O or not (the
             function takes the IO interface as an argument). Which is
             actually important information to "color" a function with.
            
            The Rust folks are working on a general effect system, including
            potentially an 'IO' effect. Being able to abstract out the
            difference between 'sync' and 'async' code is a key motivation of
            this.
       
            lukaslalinsky wrote 18 hours 55 min ago:
            This is not true. Imagine code like this:
            
                const n = try reader.interface.readVec(&data);
            
             Can you guess whether it's going to do a blocking or non-blocking
             I/O read?
            
             The io parameter is not really "coloring", as defined by the
             async/await debate, because you can have code that is completely
             unaware of any async I/O, pass it a std.Io.Reader, and it will
             just work, blocking or non-blocking, it makes no difference.
             Heck, you can even wrap this into C callbacks and use something
             like hiredis with async I/O.
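             
             A minimal sketch of that colorlessness, assuming the 0.15
             std.Io.Reader API (where readVec reports end of stream as
             error.EndOfStream):
             
                 const std = @import("std");
             
                 // This helper neither knows nor cares whether the Reader is
                 // backed by a plain blocking socket or by a coroutine
                 // runtime like zio that suspends inside readVec.
                 fn countBytes(reader: *std.Io.Reader) !usize {
                     var buf: [4096]u8 = undefined;
                     var vecs: [1][]u8 = .{&buf};
                     var total: usize = 0;
                     while (true) {
                         const n = reader.readVec(&vecs) catch |err|
                             switch (err) {
                                 error.EndOfStream => return total,
                                 else => return err,
                             };
                         total += n;
                     }
                 }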
            
             Stackful coroutines need more memory, because you need to
             pre-allocate a large enough stack for the entire lifetime. With
             stackless coroutines, you only need the current state, but with
             the disadvantage that you need frequent allocations.
       
              NobodyNada wrote 16 hours 32 min ago:
               > Stackful coroutines need more memory, because you need to
               pre-allocate a large enough stack for the entire lifetime. With
               stackless coroutines, you only need the current state, but with
               the disadvantage that you need frequent allocations.
              
              This is not quite correct -- a stackful coroutine can start with
              a small stack and grow it dynamically, whereas stackless
              coroutines allocate the entire state machine up front.
              
              The reason why stackful coroutines typically use more memory is
              that the task's stack must be large enough to hold both
              persistent state (like local variables that are needed across
              await points) and ephemeral state (like local variables that
              don't live across await points, and stack frames of leaf
              functions that never suspend). With a stackless implementation,
              the per-task storage only holds persistent state, and the OS
              thread's stack is available as scratch space for the current
              task's ephemeral state.
       
          vrnvu wrote 22 hours 44 min ago:
          > when you can't tell what's blocking and what isn't.
          
          Isn't that exactly why they're making IO explicit in functions? So
          you can trace it up the call chain.
       
        pjmlp wrote 1 day ago:
         Zio already exists:
        
  HTML  [1]: https://zio.dev/
       
        noselasd wrote 1 day ago:
         Mostly out of curiosity: a read on a TCP connection could easily
         block for a month, so what does the I/O timeout interface look like?
         E.g. if you want to send an application-level heartbeat when a read
         has blocked for 30 seconds.
       
          dgb23 wrote 18 hours 15 min ago:
           You can set read and write timeouts on TCP sockets with
           setsockopt: [1] Zig has a POSIX API layer.
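           
           A minimal sketch via std.posix (assuming Linux; the timeval field
           names vary slightly between Zig versions):
           
               const std = @import("std");
               const posix = std.posix;
           
               // After this, a read blocked for more than `seconds` fails
               // with error.WouldBlock, so the caller can send its
               // application-level heartbeat and retry.
               fn setReadTimeout(fd: posix.socket_t, seconds: i32) !void {
                   const tv = posix.timeval{ .sec = seconds, .usec = 0 };
                   try posix.setsockopt(
                       fd,
                       posix.SOL.SOCKET,
                       posix.SO.RCVTIMEO,
                       std.mem.asBytes(&tv),
                   );
               }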
          
  HTML    [1]: https://linux.die.net/man/3/setsockopt
       
          secondcoming wrote 1 day ago:
          This is very true. Most examples of async io I've seen - regardless
          of the framework - gloss over timeouts and cancellation. It's really
          the hardest part. Reading and writing asynchronously from a socket,
          or whatever, is the straightforward part.
       
          lukaslalinsky wrote 1 day ago:
          I don't have a good answer for that yet, mostly because TCP reads are
          expected to be done through std.Io.Reader which isn't aware of
          timeouts.
          
          What I envision is something like `asyncio.timeout` in Python, where
          you start a timeout and let the code run as usual. If it's in I/O
          sleep when the timeout fires, it will get woken up and the operation
          gets canceled.
          
          I see something like this:
          
              var timeout: zio.Timeout = .init;
              defer timeout.cancel(rt);
          
              timeout.set(rt, 10);
              const n = try reader.interface.readVec(&data);
       
            sgt wrote 1 day ago:
             Are you using Zig master with the new Io interface passed around,
             by the way?
       
              lukaslalinsky wrote 23 hours 56 min ago:
              No, I'm targeting Zig 0.15. The new Io interface is not in master
              yet, it's still evolving. When it's merged to master and stable,
              I'll start implementing the vtable. But I'm just passing Runtime
              around, instead of Io. So you can easily migrate code from zio to
              std when it's released.
       
        sriku wrote 1 day ago:
         The article says it was created to write audio software, but I'm
         unable to find any primary sources for that. Pointers?
       
          lukaslalinsky wrote 1 day ago:
          See the first example in Andrew's introduction:
          
  HTML    [1]: https://andrewkelley.me/post/intro-to-zig.html
       
        aidenn0 wrote 1 day ago:
        I am still mystified as to why callback-based async seems to have
        become the standard.  What this and e.g. libtask[1] do seems so much
        cleaner to me.
        
         The Rust folks adopted async with callbacks, even though they were
         essentially starting from scratch and had no need to do it that way.
         They are smarter than I am (both individually and collectively), so
         I'm sure they have a reason; I just don't know what it is.
        
  HTML  [1]: https://swtch.com/libtask/
       
          NobodyNada wrote 16 hours 49 min ago:
          > The Rust folks adopted async with callbacks
          
          Rust's async is not based on callbacks, it's based on polling. So
          really there are three ways to implement async:
          
          - The callback approach used by e.g. Node.js and Swift, where a
          function that may suspend accepts a callback as an argument, and
          invokes the callback once it is ready to make progress. The compiler
          transforms async/await code into continuation-passing style.
          
          - The stackful approach used by e.g. Go, libtask, and this; where a
          runtime switches between green threads when a task is ready to make
          progress. Simple and easy to implement, but introduces complexity
          around stack size.
          
          - Rust's polling approach: an async task is statically transformed
          into a state machine object that is polled by a runtime when it's
          ready to make progress.
          
          Each approach has its advantages and disadvantages.
          Continuation-passing style doesn't require a runtime to manage tasks,
          but each call site must capture local variables into a closure, which
          tends to require a lot of heap allocation and copying (you could also
          use Rust's generic closures, but that would massively bloat code size
          and compile times because every suspending function must be
          specialized for each call site). So it's not really acceptable for
          applications looking for maximum performance and control over
          allocations.
          
          Stackful coroutines require managing stacks. Allocating large stacks
          is very expensive in terms of performance and memory usage; it won't
          scale to thousands or millions of tasks and largely negates the
          benefits of green threading. Allocating small stacks means you need
          the ability to dynamically resize stacks at runtime, which requires
          dynamic allocation and adds significant performance and complexity
          overhead if you want to make an FFI call from an asynchronous task
          (in Go, every function begins with a prologue to check if there is
          enough stack space and allocate more if needed; since foreign
          functions do not have this prologue, an FFI call requires switching
          to a sufficiently large stack). This project uses fixed-sized task
          stacks, customizable per-task but defaulting to 256K [1]. This
           default is several orders of magnitude larger than a typical task
          size in other green-threading runtimes, so to achieve large scale the
          programmer must manually manage the stack size on a per-task basis,
          and face stack overflows if they guess wrong (potentially only in
          rare/edge cases).
          
          Rust's "stackless" polling-based approach means the compiler knows
          statically exactly how much persistent storage a suspended task
          needs, so the application or runtime can allocate this storage
          up-front and never need to resize it; while a running task has a full
          OS thread stack available as scratch space and for FFI. It doesn't
          require dynamic memory allocation, but it imposes limits on things
          like recursion. Rust initially had stackful coroutines, but this was
          dropped in order to not require dynamic allocation and remove the FFI
          overhead.
          
          The async support in Zig's standard library, once it's complete, is
          supposed to let the application developer choose between stackful and
          stackless coroutines depending on the needs of the application.
          
  HTML    [1]: https://github.com/lalinsky/zio/blob/9e2153eed99a772225de9b2...
       
          MisterTea wrote 17 hours 17 min ago:
           See also: [1] . The history of this concurrency model is here: [2]
          
  HTML    [1]: https://man.9front.org/2/thread
  HTML    [2]: https://seh.dev/go-legacy/
       
          torginus wrote 23 hours 38 min ago:
           Stackless coroutines can be implemented using high-level language
           constructs, entirely within your language. Because of this they
           interact with legacy code and existing language features in
           predictable ways.
           
           Also, async at a low level is literally always callbacks (even
           processor interrupts are callbacks).
           
           By mucking about with the stack, you break stuff like stack
           unwinding for exceptions and GC, debuggers, and you probably make a
           bunch of assumptions you shouldn't. Some security software or
           code-hardening and instrumentation libraries will break as well.
          
           If you start using the compiler backend in unexpected ways, you
           either expose bugs or find missing functionality, and you find that
           the compiler writers made some assumptions about the code (either
           rightfully or not) that break when you start wildly overwriting
           parts of the stack.
          
          Writing a compiler frontend is hard enough as it is, and becoming an
          LLVM expert is generally too much for most people.
          
           But even if you manage to get it working, should your code break in
           either the compiler or any of a number of widely used external
           tools, you literally can't fast-track your fix, and thus you can't
           release your language (since it depends on a broken external
           dependency, fix pending whenever they feel like it).
          
           I guess even if you are some sort of superhero who can do all this
           correctly, the LLVM people won't be happy merging some low-level
          codegen change that has the potential to break all compiled software
          of trillion dollar corporations for the benefit of some small
          internet project.
       
          secondcoming wrote 1 day ago:
          > callback-based async seems to have become the standard
          
          At some level it's always callbacks. Then people build frameworks on
          top of these so programmers can pretend they're not dealing with
          callbacks.
       
          boomlinde wrote 1 day ago:
          One thing I would consider "unclean" about the zio approach (and e.g.
          libtask) is that you pass it an arbitrary expected stack size (or, as
          in the example, assume the default) and practically just kind of hope
          it's big enough not to blow up and small enough to be able to spawn
          as many tasks as you need. Meanwhile, how much stack actually ends up
          being needed by the function is a platform specific implementation
          detail and hard to know.
          
          This is a gotcha of using stack allocation in general, but
          exacerbated in this case by the fact that you have an incentive to
          keep the stacks as small as possible when you want many concurrent
          tasks. So you either end up solving the puzzle of how big exactly the
          stack needs to be, you undershoot and overflow with possibly
          disastrous effects (especially if your stack happens to overflow into
          memory that doesn't cause an access violation) or you overshoot and
          waste memory. Better yet, you may have calculated and optimized your
          stack size for your platform and then the code ends up doing UB on a
          different platform with fewer registers, bigger `c_long`s or
          different alignment constraints.
          
          If something like [1] actually gets implemented I will be happier
          about this approach.
          
  HTML    [1]: https://github.com/ziglang/zig/issues/157
       
            aidenn0 wrote 7 hours 53 min ago:
            Maybe I've been on x64 Linux too long, but I would just specify 8MB
            of stack for each fiber and let overcommit handle the rest.  For
            small fibers that would be 4k per fiber of RSS so a million fibers
            is 4GB of RAM which seems fine to me?
       
            Hendrikto wrote 22 hours 27 min ago:
            Couldn’t you use the Go approach of starting with a tiny stack
            that is big enough for 90% of cases, then grow it as needed?
       
              boomlinde wrote 15 hours 31 min ago:
              Consider that resizing the stack may require reallocating it
              elsewhere in memory. This would invalidate any internal pointers
              to the stack.
              
              AFAIK Go solves this by keeping track of these pointer locations
              and adjusting them when reallocating the stack. Aside from the
              run-time cost this incurs, this is unsuitable for Zig because it
               can't strictly know whether values represent pointers.
              
               Go technically has this problem as well: if you, for example,
               convert a pointer to a uintptr, it maintains no guarantee that
               a former pointer will still be valid when converted back. Such
               conversions are also rarely warranted and are made explicit
               using the `unsafe` package.
              
               Zig is more like C in that it gives the programmer, rather than
               a memory-management runtime, exclusive control and free rein
               over memory. If there are some bits in memory that happen to
               have the same size as a pointer, Zig sees no reason to stop you
               from interpreting them as such. This is very powerful, but it
               precludes abstractions like Go's run-time stack reallocation.
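               
               A tiny illustration of that pointer laundering (an illustrative
               sketch, not zio or Go code):
               
                   const std = @import("std");
               
                   pub fn main() void {
                       var x: u32 = 41;
                       x += 1;
                       // The address is laundered into a plain integer. A
                       // runtime that moved this stack could never know that
                       // `addr` also needs fixing up.
                       const addr: usize = @intFromPtr(&x);
                       const p: *u32 = @ptrFromInt(addr);
                       std.debug.print("{d}\n", .{p.*});
                   }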
       
              loeg wrote 16 hours 50 min ago:
              8kB is enough for 90% of use cases.  But then you invoke
              getaddrinfo() once and now your stack is 128kB+.
       
              lukaslalinsky wrote 21 hours 50 min ago:
              Go depends on the fact that it can track all pointers, and when
              it needs to resize stacks, it can update them.
              
              Previous versions of Go used segmented stacks, which are
              theoretically possible, if Zig really wanted (would need compiler
              support), but they have nasty performance side-effects, see
              
  HTML        [1]: https://www.youtube.com/watch?v=-K11rY57K7k
       
                loeg wrote 16 hours 49 min ago:
                Resizing stacks on use does not depend on any of these
                properties of Go.  You can do it like this in C, too.  It does
                not require segmentation.
       
                  boomlinde wrote 15 hours 24 min ago:
                   Resizing stacks, insofar as expansion may require moving
                   the stack to some other place in memory that can support
                   the new size, depends on these properties. Your initial 4k
                   of coroutine stack may have been allocated some place that
                   won't fit the new 8k of coroutine stack.
                  
                  Or are you making a point about virtual memory? If so, that
                  assumption seems highly platform dependent.
       
                    loeg wrote 14 hours 2 min ago:
                    You would implement this with virtual memory.  Obviously,
                    this is less of a limited resource on 64-bit systems.  And
                    I wouldn't recommend the Go/stack/libtask style model for
                    high concurrency on any platform.
       
                  lukaslalinsky wrote 16 hours 9 min ago:
                     I'm very interested to know how. Do you mean reserving a
                     huge chunk of virtual memory and slowly allocating it?
                     That works to some degree, but limits how many coroutines
                     you can really spawn.
       
                    loeg wrote 14 hours 2 min ago:
                    Yes, exactly.
       
          oaiey wrote 1 day ago:
          I think it started with an interrupt. And less abstraction often
          wins.
       
            dgb23 wrote 18 hours 44 min ago:
            This is the only explanation here I can intuitively understand!
       
          loeg wrote 1 day ago:
          The thread stack for something like libtask is ambiguously sized and
          often really large relative to like, formalized async state.
       
          vlovich123 wrote 1 day ago:
           The research Microsoft engineers did on stackful vs stackless
           coroutines for the C++ standard, I think, swayed this as "the way"
           to implement it for something targeting the systems level:
           significantly less memory overhead (you only pay for what you use),
           and it offloads the implementation details of the executor (lots of
           different design choices that can be made).
       
            aidenn0 wrote 7 hours 58 min ago:
            > significantly less memory overhead
            
            On an OS with overcommit, you might also only pay for what you use
            (at a page granularity), but this may be defeated if the stack gets
            cleared (or initialized to a canary value) by the runtime.
       
            zozbot234 wrote 1 day ago:
            Yup, stackful fibers are an anti-pattern. Here's Gor Nishanov's
            review for the C++ ISO committee [1] linked from [2] . Notice how
            it sums things up:
            
            > DO NOT USE FIBERS!
            
  HTML      [1]: https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p...
  HTML      [2]: https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=...
       
              aidenn0 wrote 7 hours 39 min ago:
              Many of these issues go away if you control the compiler and
              runtime, which Rust does (and they needed to make changes to
              those to add async, so changes were inevitable).
       
              gpderetta wrote 23 hours 31 min ago:
               And this is the rebuttal: [1] There are downsides to stackful
               coroutines (peak stack usage, for example), but I feel that
               P1364 was attacking a strawman: first, it compares a solution
               with built-in compiler support against a pure library
               implementation; second, it doesn't even compare against the
               reference implementation of the competing proposal.
              
  HTML        [1]: https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2019...
       
                aidenn0 wrote 7 hours 50 min ago:
                The TL;DR of that sums up my opinions pretty well.
                
                As an aside, I know Rust would be unlikely to implement
                segmented stacks for fibers, given that they were burned by the
                performance implications thereof previously.
       
              torginus wrote 23 hours 48 min ago:
              > DO NOT USE FIBERS!
              
              For C++.
              
               If your language has RAII or exceptions, it raises crazy
               questions. Say thread A is hosting fiber 1, which throws an
               exception that propagates outside of the fiber invocation scope
               and destroys a bunch of objects; then we switch to fiber 2,
               which sees the world in an inconsistent state (outside
               resources have been cleaned up, inside ones are still alive).
               
               This was literally impossible in pre-fiber code, so most
               existing code would probably not handle it well.
       
                gpderetta wrote 23 hours 29 min ago:
                 That's no different from threads running concurrent
                 exceptions (in fact it is simpler in the single-threaded
                 case). RAII and exceptions are really not an issue for
                 stackful coroutines.
       
              sgt wrote 1 day ago:
                 Are stackful fibers the same as stackful coroutines?
       
                gpderetta wrote 23 hours 29 min ago:
                 Yes, same thing, different names.
       
        otobrglez wrote 1 day ago:
         There is an extremely popular library/framework for Scala named ZIO
         out there… Naming is hard.
       
        tombert wrote 1 day ago:
        I really need to play with Zig.  I got really into Rust a few months
        ago, and I was actually extremely impressed by Tokio, so if this
        library also gives me Go-style concurrency without having to rely on a
        garbage collector, then I am likely to enjoy it.
       
          lukaslalinsky wrote 1 day ago:
           Go has tricks that you can't replicate elsewhere, things like
           infinitely growable stacks; that's only possible thanks to the
           garbage collector. But I did enjoy working on this, and I'm
           continually impressed with Zig for how nice high-level-looking
           APIs are possible in such a low-level language.
       
            gpderetta wrote 23 hours 18 min ago:
             You mean Go's segmented stacks? You can literally use them in C
             and C++ with GCC and glibc. It was implemented to support gccgo,
             but it works for other languages as well.
             
             It is an ABI change though, so you need to recompile the whole
             stack (there might be the ability for segmented code to call
             non-segmented code, but I don't remember the extent of the
             support) and it is probably half deprecated now. But it works and
             it doesn't need GC.
       
              pjmlp wrote 22 hours 54 min ago:
               I think by now we can assume gccgo will eventually join gcj.
               
               The Fortran, Modula-2 and ALGOL 68 frontends are getting much
               more development work than gccgo, which is stuck at
               pre-generics Go (version 1.18 from 2022), with no one working
               on it other than minor bug fixes.
       
              lukaslalinsky wrote 23 hours 1 min ago:
               No, Go abandoned segmented stacks a long time ago. They caused
               unpredictable performance, because you could hit an alloc/free
               cycle somewhere deep in code. What they do now is that when
               they hit the stack guard, they allocate a new stack (2x the
               size), copy the data, and update pointers. Shrinking happens
               during GC.
       
            pjmlp wrote 1 day ago:
             Also, it is about time to let go of GC-phobia. [1] [2] Note the
             following:
            
            > This video illustrates the use case of Perc within the Aegis
            Combat System, a digital command and control system capable of
            identifying and tracking incoming threats and providing the war
            fighter with a solution to address threats. Aegis, developed by
            Lockheed Martin, is critical to the operation of the DDG-51, and
            Lockheed Martin has selected Perc as the operating platform for
            Aegis to address real-time requirements and response times.
            
            Not all GCs are born alike.
            
  HTML      [1]: https://www.withsecure.com/en/solutions/innovative-securit...
  HTML      [2]: https://www.ptc.com/en/products/developer-tools/perc
       
              jandrewrogers wrote 18 hours 36 min ago:
              That GC introduces latencies of ~1000µs. The article is about
              eliminating ~10µs context switching latencies. Completely
              different performance class. The "GC-phobia" is warranted if you
              care about software performance, throughput, and scalability.
              
              DoD uses languages like Java in applications where raw throughput
              and low-latency is not critical to success. A lot of what AEGIS
              does is not particularly performance sensitive.
       
              bccdee wrote 19 hours 5 min ago:
              Real-time GCs can only guarantee a certain number of
              deallocations per second. Even with a very well-designed GC,
              there's no free lunch. A system which manages its memory
              explicitly will not need to risk overloading its GC.
       
                aidenn0 wrote 7 hours 35 min ago:
                I think you have that backwards; they can only guarantee a
                certain number of allocations per second (once the application
                hits steady-state the two are the same, but there are times
                when it matters)
       
              RossBencina wrote 1 day ago:
              > Not all GCs are born alike.
              
              True. However in the bounded-time GC space few projects share the
              same definitions of low-latency or real-time. So you have to find
              a language that meets all of your other desiderata and provides a
              GC that meets your timing requirements. Perc looks interesting,
              Metronome made similar promises about sub-ms latency. But I'd
              have to get over my JVM runtime phobia.
       
                pjmlp wrote 1 day ago:
                   I consider one where human lives depend on it, for good or
                   worse depending on the side, real-time enough.
       
                  bccdee wrote 19 hours 11 min ago:
                  Human lives often depend on processes that can afford to be
                  quite slow. You can have a real time system requiring only
                  sub-hour latency; the "realness" of a real-time deadline is
                  quite distinct from the duration of that deadline.
       
              kunley wrote 1 day ago:
                 GC is fine, what scares me is using j*va in Aegis..
       
                Ygg2 wrote 1 day ago:
                   The OutOfMemoryError will happen after the rocket hits the
                   target.
       
            aidenn0 wrote 1 day ago:
            Pre-1.0 Rust used to have infinitely growing stacks, but they
            abandoned it due to (among other things) performance reasons (IIRC
            the stacks were not collected with Rust's GC[1], but rather on
            return; the deepest function calls may happen in tight loops, and
            if you are allocating and freeing the stack in a tight loop, oops!)
            
            1: Yes, pre-1.0 Rust had a garbage collector.
       
              RustSupremacist wrote 21 hours 19 min ago:
               Rust still has garbage collection if you use Arc and Rc. Not a
               garbage collector, but it is a form of garbage collection.
       
                aidenn0 wrote 7 hours 46 min ago:
                 I'm going to veer into no-true-Scotsman territory for a bit
                and claim that those don't count since they cannot collect
                cycles (if I'm wrong and they implement e.g. trial-deletion,
                let me know).  This isn't just academic, since cyclic
                data-structures are an important place where the borrow-checker
                can't help you, so a GC would be useful.
       
                echelon wrote 18 hours 45 min ago:
                You mean Drop, which is entirely predictable and controlled by
                the user?
       
        mrasong wrote 1 day ago:
         The first time I heard about Zig was actually on Bun's website;
         it's been getting better and better lately.
       
        mananaysiempre wrote 1 day ago:
        > Context switching is virtually free, comparable to a function call.
        
        If you’re counting that low, then you need to count carefully.
        
        A coroutine switch, however well implemented, inevitably breaks the
        branch predictor’s idea of your return stack, but the effect of
        mispredicted returns will be smeared over the target coroutine’s
        execution rather than concentrated at the point of the switch. (Similar
        issues exist with e.g. measuring the effect of blowing the cache on a
        CPU migration.) I’m actually not sure if Zig’s async design even
        uses hardware call/return pairs when a (monomorphized-as-)async
        function calls another one, or if every return just gets translated to
        an indirect jump. (This option affords what I think is a cleaner design
        for coroutines with compact frames, but it is much less friendly to the
        CPU.)
        
        So a foolproof benchmark would require one to compare the total
        execution time of a (compute-bound) program that constantly switches
        between (say) two tasks to that of an equivalent program that not only
        does not switch but (given what little I know about Zig’s
        “colorless” async) does not run under an async executor(?) at all.
        Those tasks would also need to yield on a non-trivial call stack each
        time. Seems quite tricky all in all.
       
          gpderetta wrote 20 hours 50 min ago:
          If you constantly switch between two tasks from the bottom of their
          call stack (as for stackless coroutines) and your stack switching
          code is inlined, then you can mostly avoid the mispaired call/ret
          penalty.
          
          Also, if you control the compiler, an option is to compile all
          call/rets in and out of "io" code in terms of explicit jumps. A ret
           implemented as pop+indirect jump will be less predictable than a
          paired ret, but has more chances to be predicted than an unpaired
          one.
          
           My hope is that, if stackful coroutines become more mainstream, CPU
           microarchitectures will start using a meta-predictor to choose
           between the return stack predictor and the indirect predictor.
       
          jadbox wrote 20 hours 54 min ago:
          Semi-unrelated, but async is coming soon to Zig. I'm sorta holding
          off getting deep into Zig until it lands.
          
  HTML    [1]: https://kristoff.it/blog/zig-new-async-io/
       
            throwawaymaths wrote 19 hours 21 min ago:
             The point of all this Io stuff is that you'll be able to start
             playing with Zig before async comes, and when async comes it will
             either be drop-in (if you choose an async Io for main()) or a
             line or two of code (if you pick an event loop manually).
       
          lukaslalinsky wrote 1 day ago:
           You are right that the statement was overblown. However, when I was
           testing with a "trivial" load between yields (synchronized
           ping-pong between coroutines), I was getting numbers that I had
           trouble believing when comparing them to other solutions.
       
            gpderetta wrote 21 hours 22 min ago:
            In my test of a similar setup in C++ (IIRC about 10 years ago!), I
            was able to do a context switch every other cycle. The bottleneck
            was literally the cycles per taken jump of the microarchitecture I
             was testing on. As in your case it was a trivial test with two
            coroutines doing nothing except context switching, so the compiler
            had no need to save any registers at all and I carefully defined
            the ABI to be able to keep stack and instruction pointers in
            registers even across switches.
       
          messe wrote 1 day ago:
          > I’m actually not sure if Zig’s async design even uses hardware
          call/return pairs
          
          Zig no longer has async in the language (and hasn't for quite some
          time). The OP implemented task switching in user-space.
       
            loeg wrote 1 day ago:
            Even so.  You're talking about storing and loading at least ~16
            8-byte registers, including the instruction pointer which is
            essentially a jump.  Even to L1 that takes some time; more than a
            simple function call (jump + pushed return address).
       
              ori_b wrote 16 hours 40 min ago:
               Which, with store forwarding, can be shockingly cheap. You may
               not actually be hitting L1, and if you are, you're probably not
               hitting it synchronously. See [1] and section 15.10 of [2] .
              
  HTML        [1]: https://easyperf.net/blog/2018/03/09/Store-forwarding
  HTML        [2]: https://www.agner.org/optimize/microarchitecture.pdf
       
                loeg wrote 14 hours 0 min ago:
                Are you talking about context switching every handful of
                cycles? This is going to be extremely inefficient even with
                store forwarding.
       
                  ori_b wrote 11 min ago:
                  Sure, and so is calling a function every handful of cycles.
                  That's a big part of why compilers inline.
                  
                  Either you're context switching often enough that store
                  forwarding helps, or you're not spending a lot of time
                  context switching. Either way, I would expect that you aren't
                  waiting on L1: you put the write into a queue and move on.
       
              lukaslalinsky wrote 1 day ago:
               Only the stack and instruction pointers are explicitly
               restored. The rest is handled by the compiler: instead of
               depending on the C calling convention, it can avoid having
               things in registers during the yield.
              
              See this for more details on how stackful coroutines can be made
              much faster:
              
  HTML        [1]: https://photonlibos.github.io/blog/stackful-coroutine-ma...
       
                messe wrote 1 day ago:
                 > The rest is handled by the compiler: instead of depending
                 on the C calling convention, it can avoid having things in
                 registers during the yield.
                 
                 Yep, the frame pointer as well if you're using it. This is
                 exactly how it's implemented in user-space in Zig's WIP
                 std.Io branch green-threading implementation: [1] On ARM64,
                 only fp, sp and pc are explicitly restored; and on x86_64
                 only rbp, rsp, and rip. For everything else, the compiler is
                 just informed that the registers will be clobbered by the
                 call, so it can optimize register allocation to avoid having
                 to save/restore them from the stack when it can.
                
  HTML          [1]: https://github.com/ziglang/zig/blob/ce704963037fed60a3...
       
                  flimflamm wrote 1 day ago:
                   Is this just buttering the cost of switches by crippling
                   the optimization options the compiler has?
       
                    GoblinSlayer wrote 1 day ago:
                     I wonder how you see it. Stackful coroutines switch
                     context on a syscall in the top stack frame; the deeper
                     frames are regular optimized code, and syscall/sysret is
                     already a big context switch. And a read/epoll loop has
                     exactly the same structure. The point of async
                     programming isn't optimization of computation, but
                     optimization of memory consumption. Performance is
                     determined by features and design (and Electron).
       
                    hawk_ wrote 1 day ago:
                       What do you mean by "buttering the cost of switches"?
                       Can you elaborate? (I am trying to learn about this
                       topic.)
       
                      masfuerte wrote 23 hours 53 min ago:
                       I think it is
                       
                       > buttering the cost of switches [over the whole
                       execution time]
                       
                       The switches get cheaper but the rest of the code gets
                       slower (because it has less flexibility in register
                       allocation), so the cost of the switches is "buttered"
                       (i.e. smeared) over the rest of the execution time.
                       
                       But I don't think this argument holds water. The
                       surrounding code can use whatever registers it wants.
                       In the worst case it saves and restores all of them,
                       which is what a standard context switch does anyway. In
                       other words, this can be better and is never worse.
       
                    lukaslalinsky wrote 1 day ago:
                     If this were done the classical C way, you would always
                     have to stack-save a number of registers, even if they
                     are not really needed. The only difference here is that
                     the compiler will do the save for you, in whatever way
                     fits the context best. Sometimes it will stack-save,
                     sometimes it will decide to use a different option. It's
                     always strictly better than explicitly saving/restoring N
                     registers unaware of the context. Keep in mind that in
                     Zig, the compiler always knows the entire code base; it
                     does not work on object/function boundaries. That leads
                     to better optimizations.
       
                      hawk_ wrote 1 day ago:
                       It's amazing to me that you can do this in Zig code
                       directly, as opposed to messing with the compiler.
       
                        lukaslalinsky wrote 1 day ago:
                         See [1] for a GNU C++ example. It's a tiny bit more
                         limited, because of how the compilation works, but
                         the concept is the same.
                        
  HTML                  [1]: https://github.com/alibaba/PhotonLibOS/blob/2f...
       
                        messe wrote 1 day ago:
                        To be fair, this can be done in GNU C as well. Like the
                        Zig implementation, you'd still have to use inline
                        assembly.
       
                          hawk_ wrote 1 day ago:
                           > If this were done the classical C way, you would
                           always have to stack-save a number of registers
                           
                           I see, so you're saying that GCC can be coaxed into
                           gathering only the relevant registers to stack and
                           unstack, rather than blindly doing all of them?
       
                            messe wrote 21 hours 37 min ago:
                             Yes, you write inline assembly that saves the
                             frame pointer, stack pointer, and instruction
                             pointer to the stack, and lists every other
                             register as a clobber. GCC will know which ones
                             it's using at the call site (assuming the
                             function gets inlined; this is more likely in Zig
                             due to its single-compilation-unit model), and
                             save those to the stack. If it doesn't get
                             inlined, it'll be treated as any other C function
                             and only save the ones that the target ABI
                             requires to be preserved.
       
        quantummagic wrote 1 day ago:
        Isn't this a bad time to be embracing Zig?  It's currently going
        through an intrusive upheaval of its I/O model.  My impression is that
        it was going to take a few years for things to shake out.  Is that
        wrong?
       
          dualogy wrote 1 day ago:
          > My impression is that it was going to take a few years for things
          to shake out. Is that wrong?
          
          I had that very impression in early 2020 after some months of Zigging
          (and being burned by constant breaking changes), and left, deciding
          "I'll check it out again in a few years."
          
          I had some intuition it might be one of these forever-refactoring
          eternal-tinker-and-rewrite fests and here I am 5 years later, still
          lurking for that 1.0 from the sidelines, while staying in Go or C
          depending on the nature of the thing at hand.
          
           That's not to say it'll never get there; it's a vibrant project
           prioritizing making the best design decisions rather than mere
           Shipping Asap. For a C replacement, that's the right spirit in
           principle. But whether there's inbuilt immunity to engineers
           falling prey to that forever-refine-and-resculpt, I can't tell. I
           find it a great project to wait for leisurely (=
       
          lukaslalinsky wrote 1 day ago:
          It really depends on what you are doing, but if it's something
          related to I/O and you embrace the buffered reader/writer interfaces
          introduced in Zig 0.15, I think not much is going to change. You
           might need changes in how you get those interfaces, but the core of
          your code is unchanged.
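           
           For example, the 0.15 buffered-writer pattern (a sketch from
           memory: the concrete writer owns the buffer, and downstream code
           sees only the generic std.Io.Writer interface):
           
               const std = @import("std");
           
               pub fn main() !void {
                   var buf: [1024]u8 = undefined;
                   var file_writer = std.fs.File.stdout().writer(&buf);
                   // Code below only touches the generic interface; that is
                   // the part that should survive the 0.16 std.Io changes.
                   const out = &file_writer.interface;
                   try out.print("hello from buffered stdout\n", .{});
                   try out.flush();
               }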
       
          laserbeam wrote 1 day ago:
           It kind of is a bad idea. Even the author's library is not using
           the latest Zig IO features and is planning for big changes with
           0.16. From the readme of the repo:
          
          > Additionally, when Zig 0.16 is released with the std.Io interface,
          I will implement that as well, allowing you to use the entire
          standard library with this runtime.
          
          Unrelated to this library, I plan to do lots of IO with Zig and will
          wait for 0.16. Your intuition may decide otherwise and that’s ok.
       
          grayhatter wrote 1 day ago:
           IMO, it's very wrong. Zig the language is not drastically changing;
           it's adding a new, *very* powerful API. Similar to how most
           everything in Zig passes an allocator as a function param,
           functions that want to do IO will soon accept an object that
           provides the desired abstraction, so that callers can define the
           ideal implementation.
          
          In other words, the only reason not to use Zig is if you
          detest upgrading or improving your code. Code you write
          today will still work tomorrow. Code you write tomorrow
          will likely use the new Io interface, because you'll want
          that standard abstraction. But if you don't want to use it,
          all your existing code will still work.
          
          Just like today, if you want to alloc but don't want to
          pass an `Allocator`, you can call
          std.heap.page_allocator.alloc from anywhere. But because
          that abstraction is so useful, and Zig supports it so
          ergonomically, everyone writes code that provides that
          improved API.
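
          The contrast in code, using today's stable APIs (`repeat`
          is just an illustrative name):

            const std = @import("std");

            // Caller injects the allocator: arena, fixed-buffer,
            // testing allocator with leak detection, whatever fits.
            fn repeat(
                gpa: std.mem.Allocator,
                word: []const u8,
                n: usize,
            ) ![]u8 {
                var out = try std.ArrayList(u8)
                    .initCapacity(gpa, word.len * n);
                defer out.deinit(gpa);
                for (0..n) |_| out.appendSliceAssumeCapacity(word);
                return out.toOwnedSlice(gpa);
            }

          As proposed, std.Io has the same shape: a function that
          wants to do IO takes an `io` parameter, and the caller
          decides what backs it (blocking syscalls, a thread pool,
          io_uring, ...).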
          
          side note: I was worried about upgrading all my code to
          interface with the new Reader/Writer API that's already
          mostly stable in 0.15.2, and I did have to add a few lines
          in many existing projects to upgrade. But I find myself
          optionally choosing to refactor a lot of functions, because
          the new API results in code that is SO much better, in
          readability but also in performance. Do I have to refactor?
          No, the old API works flawlessly, but the new API is simply
          more ergonomic, more performant, and easier to read and
          reason about. I'm doing it because I want to, not because I
          have to.
          
          Everyone knows a red diff is the best diff, and the new
          std.Io API exposes an easier way to do things. Still, like
          everything in Zig, it allows you to write the code that you
          want to write. But if you want to do it yourself, that's
          fully supported too!
       
            kunley wrote 1 day ago:
            Zealotry in almost every paragraph.
       
            brabel wrote 1 day ago:
            > Code you write today will still work tomorrow.
            
            Haha, no! Zig makes breaking changes in the stdlib in
            every release. I can guarantee you won't be able to
            update a non-trivial project between any of the latest 10
            versions without changing your code, often substantially,
            and the next release is changing pretty much all code
            doing any kind of IO. I know because I keep track of that
            in a project and can see the diffs between each of the
            latest versions; this allows me to modify other code much
            more easily.

            That said, in 0.15 IIRC only `zig build` broke for me,
            though I just didn't happen to use some of the things
            that changed.
       
            do_not_redeem wrote 1 day ago:
            This isn't quite accurate. If you look at the new IO branch[1]
            you'll see (for example) most of the std.fs functions are gone, and
            most of what's left is deprecated. The plan is for all file/network
            access, mutexes, etc. to be accessible only through the Io
            interface. It'll be a big migration once 0.16 drops.
            
            > Do I have to refactor? No, the old API works flawlessly
            
            The old API was deleted though? If you're saying it's possible to
            copy/paste the old stdlib into your project and maintain the old
            abstractions forward through the ongoing language changes, sure
            that's possible, but I don't think many people will want to fork
            std. I copy/pasted some stuff temporarily to make the 0.15
            migration easier, but maintaining it forever would be swimming
            upstream for no reason.
            
  HTML      [1]: https://github.com/ziglang/zig/blob/init-std.Io/lib/std/fs...
       
              grayhatter wrote 1 day ago:
              > most of the std.fs functions are gone, and most of what's left
              is deprecated.
              
              uhhh.... huh? you and I must be using very different definitions
              for the word most.
              
              > The old API was deleted though?
              
              To be completely fair, you're correct: the old
              deprecated writer that was still available in 0.15 [1]
              has been removed; the master branch doesn't provide it
              anymore.

              edit: lmao, the about text in your profile is
              hilarious, I appreciate the laugh!
              
  HTML        [1]: https://ziglang.org/documentation/0.15.2/std/#std.Io.Dep...
       
                do_not_redeem wrote 1 day ago:
                Even the basic stuff like `openFile` is deprecated. I don't
                know what else to tell you. Zig won't maintain two slightly
                different versions of the fs functions in parallel. Once
                something is deprecated, that means it's going away.
                
  HTML          [1]: https://github.com/ziglang/zig/blob/init-std.Io/lib/st...
       
                  grayhatter wrote 1 day ago:
                  Oh, I guess that's a fair point. I didn't consider the change
                  from `std.fs.openFile` to `std.Io.Dir.openFile` to be
                  meaningful, but I guess that is problematic for some reason?
                  
                  You're of course correct here, but I thought it was
                  reasonable to omit changes that I would describe as
                  namespace changes. Considering the audience, I now
                  regret doing so. (It also requires the Io object
                  now, so calling it a mere namespace change isn't
                  accurate anyway.)
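
                  To make that concrete, a before/after sketch (the
                  second half follows my reading of the init-std.Io
                  branch; names and signatures may still change, and
                  std.Io.File in particular is my assumption):

                    const std = @import("std");

                    // Zig 0.15: works today, no Io parameter.
                    fn openOld() !std.fs.File {
                        return std.fs.cwd()
                            .openFile("data.txt", .{});
                    }

                    // 0.16-dev sketch: Dir lives under std.Io and
                    // openFile takes the caller-chosen Io instance.
                    // std.Io.File is assumed, not a settled API.
                    fn openNew(io: std.Io, dir: std.Io.Dir) !std.Io.File {
                        return dir.openFile(io, "data.txt", .{});
                    }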
       
                    bccdee wrote 19 hours 28 min ago:
                    > I didn't consider the change from `std.fs.openFile` to
                    `std.Io.Dir.openFile` to be meaningful, but I guess that is
                    problematic for some reason?
                    
                    Because you explicitly said that existing code would
                    continue to work without `std.Io`.
                    
                    > Code you write tomorrow, will likely have a new Io
                    interface, because you want to use that standard
                    abstraction. But, if you don't want to use it, all your
                    existing code will still work.
                    
                    I like Zig, but it does not have a stable API. That's just
                    how it is.
       
                    Ar-Curunir wrote 1 day ago:
                    That is literally a breaking change, so your old code will
                    by definition not work flawlessly. Maybe the migration
                    overhead is low, but it’s not zero like your comment
                    implies
       
          geysersam wrote 1 day ago:
          What's a few years? They go by in the blink of an eye. Zig is a
          perfectly usable language. People who want to use it will, those who
          don't won't.
       
            attila-lendvai wrote 1 day ago:
            following upstream is overrated since we have good package managers
            and version control.
            
            it's completely feasible to stick to something that works for you,
            and only update/port/rewrite when it makes sense.
            
            what matters is the overall cost.
       
              kunley wrote 1 day ago:
              Hmm, if one writes a library Zetalib for the language
              Frob v0.14, and then Frob v0.15 introduces breaking
              changes that everyone else is going to adapt to, then
              package managers and version control will indeed help -
              they'll help Zetalib stay in a void, as no one will use
              it anymore because it's stuck on the older Frob.
       
                all2 wrote 18 hours 19 min ago:
                For libs, yes, for applications dev, no.
                
                I would expect pinning an application to an older
                version to be just fine, so long as you don't need
                newer language features. If newer language features
                are a requirement, I would expect that to drive a
                refactor, or the selection of a different
                implementation language entirely if refactoring would
                prove too onerous.
       
            tonyhart7 wrote 1 day ago:
            only for hobby project
       
              scuff3d wrote 1 day ago:
              TigerBeetle, Bun, and Ghostty all beg to differ...
       
              nesarkvechnep wrote 1 day ago:
              You or in general? Because, you know, this is like, your opinion,
              man.
       
                tonyhart7 wrote 1 day ago:
                My Opinion???

                How about you go to the Zig GitHub and check the
                progress of the language.

                It's literally there, and it's still in beta - not
                fit for production, let alone having a mature
                ecosystem.
       
                  dns_snek wrote 1 day ago:
                  Yes, your opinion. I run it in production and everything I've
                  built with it has been rock solid (aside from my own bugs). I
                  haven't touched a few of my projects in a few years and they
                  work fine, but if I wanted to update them to the latest
                  version of Zig I'd have a bit of work ahead of me. That's it.
       
        dxxvi wrote 1 day ago:
        Do you know that there's a concurrency library for Scala
        named ZIO ( [1] )?
        :-)
        
  HTML  [1]: https://zio.dev
       
        breatheoften wrote 1 day ago:
        What makes a NATS client implementation the right prototype from which
        to extract a generic async framework layer?
        
        This looks interesting, but I'm not familiar with NATS.
       
          lukaslalinsky wrote 1 day ago:
          The layer was not extracted from the NATS client; the NATS
          client was just the source of frustration that prompted its
          creation.
       
          maxbond wrote 1 day ago:
          If you succeed in creating a generic async primitive, it doesn't
          really matter what the original task was (as long as it's something
          that requires async), no? That's an implication of it being generic?
       
        supportengineer wrote 1 day ago:
        Move Zig, for great justice.
       
          echelon wrote 1 day ago:
          One of the very first internet memes. The zig team should adopt it as
          the slogan.
          
  HTML    [1]: https://en.wikipedia.org/wiki/All_your_base_are_belong_to_us
       
            dgb23 wrote 18 hours 9 min ago:
            It has in a way! See:
            
  HTML      [1]: https://github.com/ziglang/zig/blob/master/lib/init/src/ma...
       
       
   DIR <- back to front page