_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (unofficial)
HTML Visit Hacker News on the Web
COMMENT PAGE FOR:
HTML How I turned Zig into my favorite language to write network programs in
5- wrote 19 hours 47 min ago:
perhaps it's a trivial observation that people tend to conflate the
programming language in the strict sense (syntax, semantics, compiler
implementation etc.) with its standard and/or community libraries and
tooling.
of course these are very important, but perhaps i'm just a language
nerd/pedant who gets confused when an article about a programming
language turns out to be mostly about async i/o libraries.
RustSupremacist wrote 21 hours 23 min ago:
> In the previous C++ version, I used Qt, which might seem very strange
for a server software, but I wanted a nice way of doing asynchronous
I/O and Qt allowed me to do that. It was callback-based, but Qt has a
lot of support for making callbacks usable. In the newer prototypes, I
used Go, specifically for the ease of networking and concurrency. With
Zig, I was stuck.
There are new Qt bindings for these. Go has [1] and Zig has [2] . I
wonder if the author knew about them. I don't know enough about either
language to speak on the async parts.
For me, I want these for Rust, especially what Zig has because I use
KDE. I know about [3] and it is the only maintained effort for Rust
that is left standing after all these years. But I don't want QML. I
definitely don't want C++ or CMake. I just want Rust and Cargo.
HTML [1]: https://github.com/mappu/miqt
HTML [2]: https://github.com/rcalixte/libqt6zig
HTML [3]: https://github.com/KDAB/cxx-qt
d3ckard wrote 1 day ago:
Honestly, I have been excited about Zig for quite a while, dabbled a
bit a while back, and was waiting for it to get closer to 1.0 to
actually do a deep dive... but that moment doesn't seem to come.
I don't mind, it's up to the maintainers on how they want to proceed.
However, I would greatly appreciate if Zig news was a bit clearer on
what's happening, timelines etc.
I think it takes relatively little time to do so, but optics would be
so much better.
cat-whisperer wrote 1 day ago:
Stackful coroutines make sense when you have the RAM for it.
I've been using Zig for embedded (ARM Cortex-M4, 256KB RAM) mainly for
memory safety with C interop. The explicitness around calling
conventions catches ABI mismatches at compile-time instead of runtime
crashes.
I actually prefer colored async (like Rust) over this approach. The
"illusion of synchronous code" feels magical, but magic becomes a
gotcha in larger codebases when you can't tell what's blocking and what
isn't.
pron wrote 18 hours 32 min ago:
All synchronous code is an illusion created in software, as is the
very notion of "blocking". The CPU doesn't block for IO. An OS thread
is a (scheduled) "stackful coroutine" implemented in the OS that
gives the illusion of blocking where there is none.
The only problem is that the OS implements that illusion in a way
that's rather costly, allowing only a relatively small number of
threads (typically, you have no more than a few thousand
frequently-active OS threads), while languages, which know more about
how they use the stack, can offer the same illusion in a way that
scales to a higher number of concurrent operations. But there's
really no more magic in how a language implements this than in how
the OS implements it, and no more illusion. They are both a mostly
similar implementation of the same illusion. "Blocking" is always a
software abstraction over machine operations that don't actually
block.
The only question is how important it is for software to distinguish
between the OS's implementation of this abstraction and the
language's.
zozbot234 wrote 17 hours 55 min ago:
Unfortunately, the illusion of an OS thread relies on keeping a
single consistent stack. Stackful coroutines (implemented on top of
kernel threads) break this model in a way that has many detrimental
effects; stackless ones do not.
pron wrote 15 hours 34 min ago:
It is true that in some languages there could be difficulties due
to the language's idiosyncrasies of implementation, but it's not
an intrinsic difficulty. We've implemented virtual threads in the
JVM, and we've used the same Thread API with no issue.
hawk_ wrote 14 hours 21 min ago:
Yep the JVM structured concurrency implementation is amazing.
One thing I got to wondering, especially when reading this post on
HN, though, is whether stackless coroutines could (have) fit the JVM
in some way to get even better performance for those who may
care.
pron wrote 12 hours 31 min ago:
They wouldn't have had better performance, though. There is
no significant performance penalty we're paying, although
there's a nuance here that may be worth pointing out.
There are two different use cases for coroutines that may
tempt implementors to address them with a single implementation,
but the use cases are sufficiently different to separate into
two different implementations. One is the generator use case.
What makes it special is that there are exactly two
communicating parties, and both of their state may fit in the
CPU cache. The other use case is general concurrency,
primarily for IO. In that situation, a scheduler juggles a
large number of user-mode threads, and because of that, there
is likely a cache miss on every context switch, no matter how
efficient it is. However, in the second case, almost all of
the performance is due to Little's law rather than context
switch time (see my explanation here: [1] ).
That means that a "stackful" implementation of user-mode
threads can have no significant performance penalty for the
second use case (which, BTW, I think has much more value than
the first), even though a more performant implementation is
possible for the first use case. In Java we decided to tackle
the second use case with virtual threads, and so far we've
not offered something for the first (for which the demand is
significantly lower).
What happens in languages that choose to tackle both use
cases with the same construct is that they gain negligible
performance in the second use case (at best), but they're
paying for that negligible benefit with a substantial
degradation in user experience. That's just a bad tradeoff,
but some languages (especially low-level ones) may have
little choice, because their stackful solution does carry a
significant performance cost compared to Java because of
Java's very efficient heap memory management.
HTML [1]: https://inside.java/2020/08/07/loom-performance/
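To put rough, illustrative numbers on the Little's law point (made-up
values, not measurements): with L = λ·W, sustaining λ = 100,000
requests/s at W = 10 ms mean latency requires L = 100,000 × 0.010 =
1,000 operations in flight at once. Whether a context switch costs
100 ns or 1 µs barely matters against a 10 ms operation (0.001% vs
0.01% of it); what matters is being able to afford the 1,000-way
concurrency at all.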
lukaslalinsky wrote 17 hours 47 min ago:
The OS allocates your thread stack in a very similar way to how a
coroutine runtime allocates the coroutine stack. The OS will swap
the stack pointer and a bunch more things in each context switch,
the coroutine runtime will also swap the stack pointer and some
other things. It's really the same thing. The only difference is
that the runtime in a compiled language knows more about your
code than the OS does, so it can make assumptions that the OS
can't and that's what makes user-space coroutines lighter. The
mechanisms are the same.
zozbot234 wrote 17 hours 42 min ago:
And the stackless runtime will use some other register than the
stack pointer to access the coroutine's activation frame,
leaving the stack pointer register free for OS and library use,
and avoiding the many drawbacks of fiddling with the system
stack as stackful coroutines do. It's the same thing.
audunw wrote 19 hours 1 min ago:
The new Zig IO will essentially be colored, but in a nicer way than
Rust.
You don't have to color your function based on whether you're
supposed to use it in an async or sync manner. But it will
essentially be colored based on whether it does I/O or not (the
function takes IO interface as argument). Which is actually important
information to "color" a function with.
Whether you're doing async or sync I/O will be colored at the place
where you call an IO function. Which IMO is the correct way to do it.
If you call with "async" it's nonblocking, if you call without it,
it's blocking. Very explicit, but not in a way that forces you to
write a blocking and async version of all IO functions.
The Zio readme says it will be an implementation of the Zig IO
interface when it's released.
I guess you can then choose if you want explicit async (use Zig
stdlib IO functions) or implicit async (Zio), and I suppose you can
mix them.
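For reference, the call-site coloring looks roughly like this in the
published design drafts for std.Io (names like io.async and
Future.await are taken from those drafts and may still change before
release):
fn saveBoth(io: std.Io, data: []const u8) !void {
    // Concurrency is requested at the call site: with an evented Io
    // these may overlap; with a blocking Io they just run in order.
    var a = io.async(saveFile, .{ io, data, "saveA.txt" });
    var b = io.async(saveFile, .{ io, data, "saveB.txt" });
    try a.await(io);
    try b.await(io);
}
fn saveFile(io: std.Io, data: []const u8, name: []const u8) !void {
    // "Colored" only by taking io; it neither knows nor cares
    // whether its caller used io.async.
    const file = try std.Io.Dir.cwd().createFile(io, name, .{});
    defer file.close(io);
    try file.writeAll(io, data);
}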
> Stackful coroutines make sense when you have the RAM for it.
So I've been thinking a bit about this. Why should stackful
coroutines require more RAM? Partly because when you set up the
coroutine you don't know how big the stack needs to be, right? So you
need to use a safe upper bound. While stackless will only set up the
memory you need to yield the coroutine. But Zig has a goal of having
a built-in to calculate the required stack size for calling a
function. Something it should be able to do (when you don't have
recursion and don't call external C code), since Zig compiles
everything in one compilation unit.
Zig devs are working on stackless coroutines as well. But I wonder if
some of the benefits go away if you can allocate exactly the amount
of stack a stackful coroutine needs to run and nothing more.
zozbot234 wrote 18 hours 51 min ago:
> You don't have to color your function based on whether you're
supposed to use in in an async or sync manner. But it will
essentially be colored based on whether it does I/O or not (the
function takes IO interface as argument). Which is actually
important information to "color" a function with.
The Rust folks are working on a general effect system, including
potentially an 'IO' effect. Being able to abstract out the
difference between 'sync' and 'async' code is a key motivation of
this.
lukaslalinsky wrote 18 hours 55 min ago:
This is not true. Imagine code like this:
const n = try reader.interface.readVec(&data);
Can you guess if it's going to do blocking or non-blocking I/O
read?
The io parameter is not really "coloring", as defined by the
async/await debate, because you can have code that is completely
unaware of any async I/O, pass it std.Io.Reader and it will just
work, blocking or non-blocking, it makes no difference. Heck, you
can even wrap this in C callbacks and use something like hiredis
with async I/O.
Stackful coroutines need more memory, because you need to
pre-allocate a large enough stack for the entire lifetime. With
stackless coroutines, you only need the current state, but with the
disadvantage that you need frequent allocations.
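For instance (a minimal sketch against the 0.15 std.Io.Reader API):
fn readCommand(reader: *std.Io.Reader) ![]u8 {
    // Nothing here names zio or async: the same code runs whether
    // the Reader is backed by a plain blocking socket or by a
    // coroutine runtime that parks the task instead of blocking
    // the thread.
    return reader.takeDelimiterExclusive('\n');
}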
NobodyNada wrote 16 hours 32 min ago:
> Stackful coroutines need more memory, because you need to
pre-allocate large enough stack for the entire lifetime. With
stackless coroutines, you only need the current state, but with
the disadvantage that you need frequent allocations.
This is not quite correct -- a stackful coroutine can start with
a small stack and grow it dynamically, whereas stackless
coroutines allocate the entire state machine up front.
The reason why stackful coroutines typically use more memory is
that the task's stack must be large enough to hold both
persistent state (like local variables that are needed across
await points) and ephemeral state (like local variables that
don't live across await points, and stack frames of leaf
functions that never suspend). With a stackless implementation,
the per-task storage only holds persistent state, and the OS
thread's stack is available as scratch space for the current
task's ephemeral state.
vrnvu wrote 22 hours 44 min ago:
> when you can't tell what's blocking and what isn't.
Isn't that exactly why they're making IO explicit in functions? So
you can trace it up the call chain.
pjmlp wrote 1 day ago:
Zio already exists,
HTML [1]: https://zio.dev/
noselasd wrote 1 day ago:
Mostly out of curiosity: a read on a TCP connection could easily block
for a month - what does the I/O timeout interface look like? e.g. if
you want to send an application level heartbeat when a read has blocked
for 30 seconds.
dgb23 wrote 18 hours 15 min ago:
You can set read and write timeouts on TCP sockets: [1] Zig has a
posix API layer.
HTML [1]: https://linux.die.net/man/3/setsockopt
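For example (a sketch using std.posix; field and constant names as in
recent Zig versions, Linux semantics):
const std = @import("std");
const posix = std.posix;
// SO_RCVTIMEO: a read blocked on `sock` for more than 30 seconds
// fails with an error instead of blocking for a month.
fn setReadTimeout(sock: posix.socket_t) !void {
    const tv = posix.timeval{ .sec = 30, .usec = 0 };
    try posix.setsockopt(sock, posix.SOL.SOCKET, posix.SO.RCVTIMEO, std.mem.asBytes(&tv));
}
Note this applies to plain blocking reads; an evented runtime like zio
would typically implement timeouts in its scheduler instead, as the
author sketches below.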
secondcoming wrote 1 day ago:
This is very true. Most examples of async io I've seen - regardless
of the framework - gloss over timeouts and cancellation. It's really
the hardest part. Reading and writing asynchronously from a socket,
or whatever, is the straightforward part.
lukaslalinsky wrote 1 day ago:
I don't have a good answer for that yet, mostly because TCP reads are
expected to be done through std.Io.Reader which isn't aware of
timeouts.
What I envision is something like `asyncio.timeout` in Python, where
you start a timeout and let the code run as usual. If it's in I/O
sleep when the timeout fires, it will get woken up and the operation
gets canceled.
I see something like this:
// start a 10-second timeout on the runtime
var timeout: zio.Timeout = .init;
defer timeout.cancel(rt);
timeout.set(rt, 10);
// if the read is still parked in I/O sleep when the timeout fires,
// the coroutine is woken up and the read fails as canceled
const n = try reader.interface.readVec(&data);
sgt wrote 1 day ago:
Are you working with Zig master and the new Io interface passed
around, by the way?
lukaslalinsky wrote 23 hours 56 min ago:
No, I'm targeting Zig 0.15. The new Io interface is not in master
yet, it's still evolving. When it's merged to master and stable,
I'll start implementing the vtable. But I'm just passing Runtime
around, instead of Io. So you can easily migrate code from zio to
std when it's released.
sriku wrote 1 day ago:
The article says it was created to write audio software, but I'm
unable to find any primary sources for that. Pointers?
lukaslalinsky wrote 1 day ago:
See the first example in Andrew's introduction:
HTML [1]: https://andrewkelley.me/post/intro-to-zig.html
aidenn0 wrote 1 day ago:
I am still mystified as to why callback-based async seems to have
become the standard. What this and e.g. libtask[1] do seems so much
cleaner to me.
The Rust folks adopted async with callbacks, and they were essentially
starting from scratch so had no need to do it that way, and they are
smarter than I (both individually and collectively) so I'm sure they
have a reason; I just don't know what it is.
1:
HTML [1]: https://swtch.com/libtask/
NobodyNada wrote 16 hours 49 min ago:
> The Rust folks adopted async with callbacks
Rust's async is not based on callbacks, it's based on polling. So
really there are three ways to implement async:
- The callback approach used by e.g. Node.js and Swift, where a
function that may suspend accepts a callback as an argument, and
invokes the callback once it is ready to make progress. The compiler
transforms async/await code into continuation-passing style.
- The stackful approach used by e.g. Go, libtask, and this; where a
runtime switches between green threads when a task is ready to make
progress. Simple and easy to implement, but introduces complexity
around stack size.
- Rust's polling approach: an async task is statically transformed
into a state machine object that is polled by a runtime when it's
ready to make progress.
Each approach has its advantages and disadvantages.
Continuation-passing style doesn't require a runtime to manage tasks,
but each call site must capture local variables into a closure, which
tends to require a lot of heap allocation and copying (you could also
use Rust's generic closures, but that would massively bloat code size
and compile times because every suspending function must be
specialized for each call site). So it's not really acceptable for
applications looking for maximum performance and control over
allocations.
Stackful coroutines require managing stacks. Allocating large stacks
is very expensive in terms of performance and memory usage; it won't
scale to thousands or millions of tasks and largely negates the
benefits of green threading. Allocating small stacks means you need
the ability to dynamically resize stacks at runtime, which requires
dynamic allocation and adds significant performance and complexity
overhead if you want to make an FFI call from an asynchronous task
(in Go, every function begins with a prologue to check if there is
enough stack space and allocate more if needed; since foreign
functions do not have this prologue, an FFI call requires switching
to a sufficiently large stack). This project uses fixed-sized task
stacks, customizable per-task but defaulting to 256K [1]. This
default is several orders of magnitude larger than a typical task
size in other green-threading runtimes, so to achieve large scale the
programmer must manually manage the stack size on a per-task basis,
and face stack overflows if they guess wrong (potentially only in
rare/edge cases).
Rust's "stackless" polling-based approach means the compiler knows
statically exactly how much persistent storage a suspended task
needs, so the application or runtime can allocate this storage
up-front and never need to resize it; while a running task has a full
OS thread stack available as scratch space and for FFI. It doesn't
require dynamic memory allocation, but it imposes limits on things
like recursion. Rust initially had stackful coroutines, but this was
dropped in order to not require dynamic allocation and remove the FFI
overhead.
The async support in Zig's standard library, once it's complete, is
supposed to let the application developer choose between stackful and
stackless coroutines depending on the needs of the application.
[1]
HTML [1]: https://github.com/lalinsky/zio/blob/9e2153eed99a772225de9b2...
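To make the polling approach concrete, here is a hand-rolled sketch in
Zig of the kind of state-machine object such a transform produces
(illustrative names only; neither Rust's nor Zig's actual machinery):
const std = @import("std");
const Poll = union(enum) { pending, ready: usize };
// What "stackless" means in practice: a suspended task is just this
// struct. Only state that must survive a suspension point is stored;
// everything else lives on the OS thread's stack while polling.
const ReadTask = struct {
    state: union(enum) {
        start,
        waiting: struct { nread: usize },
        done,
    } = .start,
    fn poll(self: *ReadTask, src: []const u8) Poll {
        switch (self.state) {
            .start => {
                self.state = .{ .waiting = .{ .nread = 0 } };
                return .pending; // "suspend": no stack is kept alive
            },
            .waiting => |w| {
                const n = w.nread + @min(src.len, 8);
                self.state = .done;
                return .{ .ready = n };
            },
            .done => unreachable,
        }
    }
};
pub fn main() void {
    var task: ReadTask = .{};
    while (true) {
        switch (task.poll("hello world")) {
            .pending => continue, // a real runtime parks until readable
            .ready => |n| {
                std.debug.print("read {d} bytes\n", .{n});
                return;
            },
        }
    }
}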
MisterTea wrote 17 hours 17 min ago:
See also: [1] The history of this concurrency model is here:
HTML [1]: https://man.9front.org/2/thread
HTML [2]: https://seh.dev/go-legacy/
torginus wrote 23 hours 38 min ago:
Stackless coroutines can be implemented using high level language
constructs, entirely in your language. Because of this they interact
with legacy code and existing language features in predictable ways.
Stackful coroutines, by contrast, can break security software and
code hardening or instrumentation libraries.
Also, async at low level is literally always callbacks (even
processor interrupts are callbacks)
By mucking about with the stack, you break stuff like stack unwinding
for exceptions and GC, debuggers, and you probably make a bunch of
assumptions you shouldn't
If you start using the compiler backend in unexpected ways, you
either expose bugs or find missing functionality and find that the
compiler writers made some assumptions about the code (either
rightfully or not) that break when you start wildly overwriting
parts of the stack.
Writing a compiler frontend is hard enough as it is, and becoming an
LLVM expert is generally too much for most people.
But even if you manage to get it working, should you have your code
break in either the compiler or any number of widely used external
tooling, you literally can't fast track your fix, and thus you can't
release your language (since it depends on a broken external
dependency, fix pending whenever they feel like it).
I guess even if you are some sort of superhero who can do all this
correctly, the LLVM people won't be happy merging some low level
codegen change that has the potential to break all compiled software
of trillion dollar corporations for the benefit of some small
internet project.
secondcoming wrote 1 day ago:
> callback-based async seems to have become the standard
At some level it's always callbacks. Then people build frameworks on
top of these so programmers can pretend they're not dealing with
callbacks.
boomlinde wrote 1 day ago:
One thing I would consider "unclean" about the zio approach (and e.g.
libtask) is that you pass it an arbitrary expected stack size (or, as
in the example, assume the default) and practically just kind of hope
it's big enough not to blow up and small enough to be able to spawn
as many tasks as you need. Meanwhile, how much stack actually ends up
being needed by the function is a platform specific implementation
detail and hard to know.
This is a gotcha of using stack allocation in general, but
exacerbated in this case by the fact that you have an incentive to
keep the stacks as small as possible when you want many concurrent
tasks. So you either end up solving the puzzle of how big exactly the
stack needs to be, you undershoot and overflow with possibly
disastrous effects (especially if your stack happens to overflow into
memory that doesn't cause an access violation) or you overshoot and
waste memory. Better yet, you may have calculated and optimized your
stack size for your platform and then the code ends up doing UB on a
different platform with fewer registers, bigger `c_long`s or
different alignment constraints.
If something like [1] actually gets implemented I will be happier
about this approach.
HTML [1]: https://github.com/ziglang/zig/issues/157
aidenn0 wrote 7 hours 53 min ago:
Maybe I've been on x64 Linux too long, but I would just specify 8MB
of stack for each fiber and let overcommit handle the rest. For
small fibers that would be 4k per fiber of RSS so a million fibers
is 4GB of RAM which seems fine to me?
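Roughly like this (a Linux-only sketch with std.posix; constants and
page-size handling vary by target and Zig version):
const std = @import("std");
const posix = std.posix;
const page = std.heap.page_size_min;
// Reserve a big fiber stack and let overcommit do the work: untouched
// pages cost no physical memory, so RSS grows only as the fiber
// actually uses its stack. A PROT_NONE guard page at the low end
// turns overflow into a fault instead of silent corruption.
fn reserveFiberStack(size: usize) ![]align(page) u8 {
    const mem = try posix.mmap(
        null,
        size,
        posix.PROT.READ | posix.PROT.WRITE,
        .{ .TYPE = .PRIVATE, .ANONYMOUS = true },
        -1,
        0,
    );
    try posix.mprotect(mem[0..page], posix.PROT.NONE);
    return mem;
}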
Hendrikto wrote 22 hours 27 min ago:
Couldn't you use the Go approach of starting with a tiny stack
that is big enough for 90% of cases, then grow it as needed?
boomlinde wrote 15 hours 31 min ago:
Consider that resizing the stack may require reallocating it
elsewhere in memory. This would invalidate any internal pointers
to the stack.
AFAIK Go solves this by keeping track of these pointer locations
and adjusting them when reallocating the stack. Aside from the
run-time cost this incurs, this is unsuitable for Zig because it
can't strictly know whether values represent pointers.
Go technically has this problem as well: if you, for example,
convert a pointer to a uintptr, Go maintains no guarantee that a
former pointer will still be valid when converted back. Such
conversions are also rarely warranted and are made explicit using
the `unsafe` package.
Zig is more like C in that it gives the programmer, rather than a
memory management runtime, exclusive control and free rein over
the memory. If there are some bits in memory that happen to have
the same size as a pointer, Zig sees no reason to stop you from
interpreting them as such. This is very powerful, but precludes
abstractions like Go's run-time stack reallocation.
loeg wrote 16 hours 50 min ago:
8kB is enough for 90% of use cases. But then you invoke
getaddrinfo() once and now your stack is 128kB+.
lukaslalinsky wrote 21 hours 50 min ago:
Go depends on the fact that it can track all pointers, and when
it needs to resize stacks, it can update them.
Previous versions of Go used segmented stacks, which are
theoretically possible, if Zig really wanted (would need compiler
support), but they have nasty performance side-effects, see
HTML [1]: https://www.youtube.com/watch?v=-K11rY57K7k
loeg wrote 16 hours 49 min ago:
Resizing stacks on use does not depend on any of these
properties of Go. You can do it like this in C, too. It does
not require segmentation.
boomlinde wrote 15 hours 24 min ago:
Resizing stacks, insofar as expansion may require moving the
stack to some other place in memory that can support the new
size, depends on these properties. Your initial 4k of
coroutine stack may have been allocated some place that won't
fit the new 8k of coroutine stack.
Or are you making a point about virtual memory? If so, that
assumption seems highly platform dependent.
loeg wrote 14 hours 2 min ago:
You would implement this with virtual memory. Obviously,
this is less of a limited resource on 64-bit systems. And
I wouldn't recommend the Go/stack/libtask style model for
high concurrency on any platform.
lukaslalinsky wrote 16 hours 9 min ago:
I'm very interested to know how. Do you mean reserving a huge
chunk of virtual memory and slowly allocating it? That works
to some degree, but limits how many coroutines you can really
spawn.
loeg wrote 14 hours 2 min ago:
Yes, exactly.
oaiey wrote 1 day ago:
I think it started with an interrupt. And less abstraction often
wins.
dgb23 wrote 18 hours 44 min ago:
This is the only explanation here I can intuitively understand!
loeg wrote 1 day ago:
The thread stack for something like libtask is ambiguously sized and
often really large relative to like, formalized async state.
vlovich123 wrote 1 day ago:
The research Microsoft engineers did on stackful vs stackless
coroutines for the c++ standard I think swayed this as "the way"
to implement it for something targeting a systems level -
significantly less memory overhead (you only pay for what you use)
and offload the implementation details of the executor (lots of
different design choices that can be made).
aidenn0 wrote 7 hours 58 min ago:
> significantly less memory overhead
On an OS with overcommit, you might also only pay for what you use
(at a page granularity), but this may be defeated if the stack gets
cleared (or initialized to a canary value) by the runtime.
zozbot234 wrote 1 day ago:
Yup, stackful fibers are an anti-pattern. Here's Gor Nishanov's
review for the C++ ISO committee [1] linked from [2] . Notice how
it sums things up:
> DO NOT USE FIBERS!
HTML [1]: https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p...
HTML [2]: https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=...
aidenn0 wrote 7 hours 39 min ago:
Many of these issues go away if you control the compiler and
runtime, which Rust does (and they needed to make changes to
those to add async, so changes were inevitable).
gpderetta wrote 23 hours 31 min ago:
And this is the rebuttal: [1] There are downsides to stackful
coroutines (peak stack usage for example), but I feel that p1364
was attacking a strawman: first of all it is comparing a solution
with builtin compiler support against a pure library
implementation, second it is not even comparing against the
reference implementation of the competing proposal.
HTML [1]: https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2019...
aidenn0 wrote 7 hours 50 min ago:
The TL;DR of that sums up my opinions pretty well.
As an aside, I know Rust would be unlikely to implement
segmented stacks for fibers, given that they were burned by the
performance implications thereof previously.
torginus wrote 23 hours 48 min ago:
> DO NOT USE FIBERS!
For C++.
If your language has RAII or exceptions, it raises crazy
questions: if thread A is hosting fiber 1, which throws an
exception that propagates outside of the fiber invocation scope
and destroys a bunch of objects, and we then switch to fiber 2,
it sees the world in an inconsistent state (outside resources
have been cleaned up, inside ones still alive).
This was literally impossible in pre-fiber code, so most existing
code would probably not handle it well.
gpderetta wrote 23 hours 29 min ago:
That's not different from threads running concurrent exceptions
(in fact it is simpler in the single threaded example). RAII or
exceptions are really not an issue for stackful coroutines.
sgt wrote 1 day ago:
Is stackful fibers the same as stackful coroutines?
gpderetta wrote 23 hours 29 min ago:
yes same thing, different names.
otobrglez wrote 1 day ago:
There is an extremely popular library/framework for Scala named ZIO out
there… Naming is hard.
tombert wrote 1 day ago:
I really need to play with Zig. I got really into Rust a few months
ago, and I was actually extremely impressed by Tokio, so if this
library also gives me Go-style concurrency without having to rely on a
garbage collector, then I am likely to enjoy it.
lukaslalinsky wrote 1 day ago:
Go has tricks that you can't replicate elsewhere, things like
infinitely growable stacks; that's only possible thanks to the
garbage collector. But I did enjoy working on this; I'm continually
impressed with Zig for how nice high-level looking APIs are possible
in such a low-level language.
gpderetta wrote 23 hours 18 min ago:
You mean Go's segmented stacks? You can literally use them in C and
C++ with GCC and glibc. It was implemented to support gccgo, but it
works for other languages as well.
It is an ABI change though, so you need to recompile the whole
stack (there might be the ability for segmented code to call non
segmented code, but I don't remember the extent of the support) and
it is probably half deprecated now. But it works and it doesn't
need GC.
pjmlp wrote 22 hours 54 min ago:
I think by now we can consider that gccgo will eventually join gcj.
The Fortran, Modula-2 and ALGOL 68 frontends are getting much
more development work than gccgo, stuck in pre-generics Go,
version 1.18 from 2022, and no one is working on it other than
minor bug fixes.
lukaslalinsky wrote 23 hours 1 min ago:
No, Go abandoned segmented stacks a long time ago. It causes
unpredictable performance, because you can hit an alloc/free cycle
somewhere deep in the code. What they do now is that when they hit
stack guard, they allocate a new stack (2x size), copy the data,
update pointers. Shrinking happens during GC.
pjmlp wrote 1 day ago:
Also, it is about time to let go of the GC-phobia. [1] [2] Note the following:
> This video illustrates the use case of Perc within the Aegis
Combat System, a digital command and control system capable of
identifying and tracking incoming threats and providing the war
fighter with a solution to address threats. Aegis, developed by
Lockheed Martin, is critical to the operation of the DDG-51, and
Lockheed Martin has selected Perc as the operating platform for
Aegis to address real-time requirements and response times.
Not all GCs are born alike.
HTML [1]: https://www.withsecure.com/en/solutions/innovative-securit...
HTML [2]: https://www.ptc.com/en/products/developer-tools/perc
jandrewrogers wrote 18 hours 36 min ago:
That GC introduces latencies of ~1000µs. The article is about
eliminating ~10µs context switching latencies. Completely
different performance class. The "GC-phobia" is warranted if you
care about software performance, throughput, and scalability.
DoD uses languages like Java in applications where raw throughput
and low-latency is not critical to success. A lot of what AEGIS
does is not particularly performance sensitive.
bccdee wrote 19 hours 5 min ago:
Real-time GCs can only guarantee a certain number of
deallocations per second. Even with a very well-designed GC,
there's no free lunch. A system which manages its memory
explicitly will not need to risk overloading its GC.
aidenn0 wrote 7 hours 35 min ago:
I think you have that backwards; they can only guarantee a
certain number of allocations per second (once the application
hits steady-state the two are the same, but there are times
when it matters)
RossBencina wrote 1 day ago:
> Not all GCs are born alike.
True. However in the bounded-time GC space few projects share the
same definitions of low-latency or real-time. So you have to find
a language that meets all of your other desiderata and provides a
GC that meets your timing requirements. Perc looks interesting,
Metronome made similar promises about sub-ms latency. But I'd
have to get over my JVM runtime phobia.
pjmlp wrote 1 day ago:
I consider one where human lives depend on it, for good or
worse depending on the side, real time enough.
bccdee wrote 19 hours 11 min ago:
Human lives often depend on processes that can afford to be
quite slow. You can have a real time system requiring only
sub-hour latency; the "realness" of a real-time deadline is
quite distinct from the duration of that deadline.
kunley wrote 1 day ago:
GC is fine, what scares me is using j*va in Aegis..
Ygg2 wrote 1 day ago:
The OutOfMemoryError will happen after the rocket hits the target.
aidenn0 wrote 1 day ago:
Pre-1.0 Rust used to have infinitely growing stacks, but they
abandoned it due to (among other things) performance reasons (IIRC
the stacks were not collected with Rust's GC[1], but rather on
return; the deepest function calls may happen in tight loops, and
if you are allocating and freeing the stack in a tight loop, oops!)
1: Yes, pre-1.0 Rust had a garbage collector.
RustSupremacist wrote 21 hours 19 min ago:
Rust still has garbage collection if you use Arc and Rc. Not a
garbage collector, but a form of garbage collection.
aidenn0 wrote 7 hours 46 min ago:
I'm going to veer into no-true-scottsman territory for a bit
and claim that those don't count since they cannot collect
cycles (if I'm wrong and they implement e.g. trial-deletion,
let me know). This isn't just academic, since cyclic
data-structures are an important place where the borrow-checker
can't help you, so a GC would be useful.
echelon wrote 18 hours 45 min ago:
You mean Drop, which is entirely predictable and controlled by
the user?
mrasong wrote 1 day ago:
The first time I heard about Zig was actually on Bun's website.
It's been getting better and better lately.
mananaysiempre wrote 1 day ago:
> Context switching is virtually free, comparable to a function call.
If you're counting that low, then you need to count carefully.
A coroutine switch, however well implemented, inevitably breaks the
branch predictor's idea of your return stack, but the effect of
mispredicted returns will be smeared over the target coroutine's
execution rather than concentrated at the point of the switch. (Similar
issues exist with e.g. measuring the effect of blowing the cache on a
CPU migration.) I'm actually not sure if Zig's async design even
uses hardware call/return pairs when a (monomorphized-as-)async
function calls another one, or if every return just gets translated to
an indirect jump. (This option affords what I think is a cleaner design
for coroutines with compact frames, but it is much less friendly to the
CPU.)
So a foolproof benchmark would require one to compare the total
execution time of a (compute-bound) program that constantly switches
between (say) two tasks to that of an equivalent program that not only
does not switch but (given what little I know about Zig's
"colorless" async) does not run under an async executor(?) at all.
Those tasks would also need to yield on a non-trivial call stack each
time. Seems quite tricky all in all.
gpderetta wrote 20 hours 50 min ago:
If you constantly switch between two tasks from the bottom of their
call stack (as for stackless coroutines) and your stack switching
code is inlined, then you can mostly avoid the mispaired call/ret
penalty.
Also, if you control the compiler, an option is to compile all
call/rets in and out of "io" code in terms of explicit jumps. A ret
implemented as pop+indirect jump will be less predictable than a
paired ret, but has more chances to be predicted than an unpaired
one.
My hope is that, if stackful coroutines become more mainstream, CPU
microarchitectures will start using a meta-predictor to choose between
the return stack predictor and the indirect predictor.
jadbox wrote 20 hours 54 min ago:
Semi-unrelated, but async is coming soon to Zig. I'm sorta holding
off getting deep into Zig until it lands.
HTML [1]: https://kristoff.it/blog/zig-new-async-io/
throwawaymaths wrote 19 hours 21 min ago:
the point of all this io stuff is that you'll be able to start
playing with zig before async comes, and when async comes it will
either be drop-in (if you choose an async io for main()) or a line
or two of code (if you pick an event loop manually).
lukaslalinsky wrote 1 day ago:
You are right that the statement was overblown, however when I was
testing with "trivial" load between yields (synchronized ping-pong
between coroutines), I was getting numbers that I had trouble
believing, when comparing them to other solutions.
gpderetta wrote 21 hours 22 min ago:
In my test of a similar setup in C++ (IIRC about 10 years ago!), I
was able to do a context switch every other cycle. The bottleneck
was literally the cycles per taken jump of the microarchitecture I
was testing again. As in your case it was a trivial test with two
coroutines doing nothing except context switching, so the compiler
had no need to save any registers at all and I carefully defined
the ABI to be able to keep stack and instruction pointers in
registers even across switches.
messe wrote 1 day ago:
> I'm actually not sure if Zig's async design even uses hardware
call/return pairs
Zig no longer has async in the language (and hasn't for quite some
time). The OP implemented task switching in user-space.
loeg wrote 1 day ago:
Even so. You're talking about storing and loading at least ~16
8-byte registers, including the instruction pointer which is
essentially a jump. Even to L1 that takes some time; more than a
simple function call (jump + pushed return address).
ori_b wrote 16 hours 40 min ago:
Which, with store forwarding, can be shockingly cheap. You may
not actually be hitting L1, and if you are, you're probably not
hitting it synchronously. [1] and, section 15.10 of
HTML [1]: https://easyperf.net/blog/2018/03/09/Store-forwarding
HTML [2]: https://www.agner.org/optimize/microarchitecture.pdf
loeg wrote 14 hours 0 min ago:
Are you talking about context switching every handful of
cycles? This is going to be extremely inefficient even with
store forwarding.
ori_b wrote 11 min ago:
Sure, and so is calling a function every handful of cycles.
That's a big part of why compilers inline.
Either you're context switching often enough that store
forwarding helps, or you're not spending a lot of time
context switching. Either way, I would expect that you aren't
waiting on L1: you put the write into a queue and move on.
lukaslalinsky wrote 1 day ago:
Only stack and instruction pointer are explicitly restored. The
rest is handled by the compiler, instead of depending on the C
calling convention, it can avoid having things in registers
during yield.
See this for more details on how stackful coroutines can be made
much faster:
HTML [1]: https://photonlibos.github.io/blog/stackful-coroutine-ma...
messe wrote 1 day ago:
> The rest is handled by the compiler, instead of depending on
the C calling convention, it can avoid having things in
registers during yield.
Yep, the frame pointer as well if you're using it. This is
exactly how its implemented in user-space in Zig's WIP std.Io
branch green-threading implementation: [1] On ARM64, only fp,
sp and pc are explicitly restored; and on x86_64 only rbp, rsp,
and rip. For everything else, the compiler is just informed
that the registers will be clobbered by the call, so it can
optimize allocation to avoid having to save/restore them from
the stack when it can.
HTML [1]: https://github.com/ziglang/zig/blob/ce704963037fed60a3...
flimflamm wrote 1 day ago:
Is this just buttering the cost of switches by crippling the
optimization options the compiler has?
GoblinSlayer wrote 1 day ago:
I wonder how you see it. Stackful coroutines switch context
on syscall in the top stack frame, the deeper frames are
regular optimized code, but syscall/sysret is already big
context switch. And read/epoll loop has exactly same
structure, the point of async programming isn't
optimization of computation, but optimization of memory
consumption. Performance is determined by features and
design (and Electron).
hawk_ wrote 1 day ago:
What do you mean by "buttering the cost of switches", can
you elaborate? (I am trying to learn about this topic)
masfuerte wrote 23 hours 53 min ago:
I think it is
> buttering the cost of switches [over the whole
execution time]
The switches get cheaper but the rest of the code gets
slower (because it has less flexibility in register
allocation) so the cost of the switches is "buttered"
(i.e. smeared) over the rest of the execution time.
But I don't think this argument holds water. The
surrounding code can use whatever registers it wants. In
the worst case it saves and restores all of them, which
is what a standard context switch does anyway. In other
words, this can be better and is never worse.
lukaslalinsky wrote 1 day ago:
If this was done the classical C way, you would always have
to stack-save a number of registers, even if they are not
really needed. The only difference here is that the
compiler will do the save for you, in whatever way fits the
context best. Sometimes it will stack-save, sometimes it
will decide to use a different option. It's always strictly
better than explicitly saving/restoring N registers unaware
of the context. Keep in mind, that in Zig, the compiler
always knows the entire code base. It does not work on
object/function boundaries. That leads to better
optimizations.
hawk_ wrote 1 day ago:
It's amazing to me that you can do this in Zig code
directly, as opposed to messing with the compiler.
lukaslalinsky wrote 1 day ago:
See [1] for a GNU C++ example. It's a tiny bit more
limited, because of how the compilation works, but the
concept is the same.
HTML [1]: https://github.com/alibaba/PhotonLibOS/blob/2f...
messe wrote 1 day ago:
To be fair, this can be done in GNU C as well. Like the
Zig implementation, you'd still have to use inline
assembly.
hawk_ wrote 1 day ago:
> If this was done the classical C way, you would
always have to stack-save a number of registers
I see, so you're saying that GCC can be coaxed into
gathering only the relevant registers to stack and
unstack, rather than blindly doing all of them?
messe wrote 21 hours 37 min ago:
Yes, you write inline assembly that saves the frame
pointer, stack pointer, and instruction pointer to
the stack, and list every other register as a
clobber. GCC will know which ones its using at the
call-site (assuming the function gets inlined; this
is more likely in Zig due to its single unit of
compilation model), and save those to the stack. If
it doesn't get inlined, it'll be treated as any
other C function and only save the ones needed to
be preserved by the target ABI.
quantummagic wrote 1 day ago:
Isn't this a bad time to be embracing Zig? It's currently going
through an intrusive upheaval of its I/O model. My impression is that
it was going to take a few years for things to shake out. Is that
wrong?
dualogy wrote 1 day ago:
> My impression is that it was going to take a few years for things
to shake out. Is that wrong?
I had that very impression in early 2020 after some months of Zigging
(and being burned by constant breaking changes), and left, deciding
"I'll check it out again in a few years."
I had some intuition it might be one of these forever-refactoring
eternal-tinker-and-rewrite fests and here I am 5 years later, still
lurking for that 1.0 from the sidelines, while staying in Go or C
depending on the nature of the thing at hand.
That's not to say it'll never get there, it's a vibrant project
prioritizing making the best design decisions rather than mere
Shipping Asap. For a C-replacement that's the right spirit, in
principle. But whether there's inbuilt immunity to engineers falling
prey to their forever-refine-and-resculpt I can't tell. I find it a
great project to wait for leisurely (=
lukaslalinsky wrote 1 day ago:
It really depends on what you are doing, but if it's something
related to I/O and you embrace the buffered reader/writer interfaces
introduced in Zig 0.15, I think not much is going to change. You
might need changes on how you get those interfaces, but the core of
your code is unchanged.
laserbeam wrote 1 day ago:
Kind of is a bad idea. Even the author's library is not using the
latest zig IO features and is planning for big changes with 0.16.
From the readme of the repo:
> Additionally, when Zig 0.16 is released with the std.Io interface,
I will implement that as well, allowing you to use the entire
standard library with this runtime.
Unrelated to this library, I plan to do lots of IO with Zig and will
wait for 0.16. Your intuition may decide otherwise and that's ok.
grayhatter wrote 1 day ago:
IMO, it's very wrong. Zig's language is not drastically changing;
it's adding a new, *very* powerful API. Similar to how most
everything in zig passes an allocator as a function param, soon
functions that want to do IO will accept an object that provides
the desired abstraction, so that callers can define the ideal
implementation.
In other words, the only reason to not use zig is if you detest
upgrading or improving your code. Code you write today will still
work tomorrow. Code you write tomorrow will likely have a new Io
interface, because you want to use that standard abstraction. But, if
you don't want to use it, all your existing code will still work.
Just like today, if you want to alloc, but don't want to pass an
`Allocator` you can call std.heap.page_allocator.alloc from anywhere.
But because that abstraction is so useful, and zig supports it so
ergonomically, everyone writes code that provides that improved API.
Side note: I was worried about upgrading all my code to interface
with the new Reader/Writer API that's already mostly stable in
0.15.2; even though I had to add a few lines in many existing
projects to upgrade, I find myself optionally choosing to refactor a
lot of functions because the new API results in code that is SO much
better. Both in readability, but also performance. Do I have to
refactor? No, the old API works flawlessly, but the new API is simply
more ergonomic, more performant and easier to read and reason about.
I'm doing it because I want to, not because I have to.
Everyone knows a red diff is the best diff, and the new std.Io API
exposes an easier way to do things. Still, like everything in zig, it
allows you to write the code that you want to write. But if you want
to do it yourself, that's fully supported too!
kunley wrote 1 day ago:
Zealotry in almost every paragraph.
brabel wrote 1 day ago:
> Code you write today will still work tomorrow.
Haha no! Zig makes breaking changes in the stdlib in every release.
I can guarantee you wonât be able to update a non trivial project
between any of the latest 10 versions and beyond without changing
your code , often substantially, and the next release is changing
pretty much all code doing any kind of IO. I know because I keep
track of that in a project and can see diffs between each of the
latest versions. This allows me to modify other code much more
easily.
But TBH, in 0.15 only zig build broke IIRC. However, I just
didn't happen to use some of the things that changed, I believe.
do_not_redeem wrote 1 day ago:
This isn't quite accurate. If you look at the new IO branch[1]
you'll see (for example) most of the std.fs functions are gone, and
most of what's left is deprecated. The plan is for all file/network
access, mutexes, etc to be accessible only through the Io
interface. It'll be a big migration once 0.16 drops.
> Do I have to refactor? No, the old API works flawlessly
The old API was deleted though? If you're saying it's possible to
copy/paste the old stdlib into your project and maintain the old
abstractions forward through the ongoing language changes, sure
that's possible, but I don't think many people will want to fork
std. I copy/pasted some stuff temporarily to make the 0.15
migration easier, but maintaining it forever would be swimming
upstream for no reason.
HTML [1]: https://github.com/ziglang/zig/blob/init-std.Io/lib/std/fs...
grayhatter wrote 1 day ago:
> most of the std.fs functions are gone, and most of what's left
is deprecated.
uhhh.... huh? you and I must be using very different definitions
for the word most.
> The old API was deleted though?
To be completely fair, you're correct: the old deprecated writer
that was available in 0.15 [1] has been removed; the master branch
doesn't provide it anymore.
edit: lmao, your profile's about text is hilarious, I appreciate
the laugh!
HTML [1]: https://ziglang.org/documentation/0.15.2/std/#std.Io.Dep...
do_not_redeem wrote 1 day ago:
Even the basic stuff like `openFile` is deprecated. I don't
know what else to tell you. Zig won't maintain two slightly
different versions of the fs functions in parallel. Once
something is deprecated, that means it's going away.
HTML [1]: https://github.com/ziglang/zig/blob/init-std.Io/lib/st...
grayhatter wrote 1 day ago:
Oh, I guess that's a fair point. I didn't consider the change
from `std.fs.openFile` to `std.Io.Dir.openFile` to be
meaningful, but I guess that is problematic for some reason?
You're of course correct here; but I thought it was
reasonable to omit changes that I would describe as namespace
changes. Now, considering the audience, I regret doing so. (It
now also requires the Io object, so calling it a mere namespace
change is inaccurate here.)
bccdee wrote 19 hours 28 min ago:
> I didn't consider the change from `std.fs.openFile` to
`std.Io.Dir.openFile` to be meaningful, but I guess that is
problematic for some reason?
Because you explicitly said that existing code would
continue to work without `std.Io`.
> Code you write tomorrow, will likely have a new Io
interface, because you want to use that standard
abstraction. But, if you don't want to use it, all your
existing code will still work.
I like Zig, but it does not have a stable API. That's just
how it is.
Ar-Curunir wrote 1 day ago:
That is literally a breaking change, so your old code will
by definition not work flawlessly. Maybe the migration
overhead is low, but it's not zero, as your comment implies.
geysersam wrote 1 day ago:
What's a few years? They go by in the blink of an eye. Zig is a
perfectly usable language. People who want to use it will, those who
don't won't.
attila-lendvai wrote 1 day ago:
following upstream is overrated since we have good package managers
and version control.
it's completely feasible to stick to something that works for you,
and only update/port/rewrite when it makes sense.
what matters is the overall cost.
kunley wrote 1 day ago:
Hmm, if one writes a library Zetalib for the language Frob v0.14
and then Frob v0.15 introduces breaking changes that everyone
else is going to adapt to, then well, package managers and
version control is going to help indeed - they will help in
staying in a void as no one will use Zetalib anymore because of
the older Frob.
all2 wrote 18 hours 19 min ago:
For libs, yes, for applications dev, no.
I would expect fixing an application to an older version would
be just fine, so long as you don't need newer language
features. If newer language features are a requirement, I would
expect that would drive refactoring or selecting a different
implementation language entirely if refactoring would prove to
be too onerous.
tonyhart7 wrote 1 day ago:
only for hobby projects
scuff3d wrote 1 day ago:
TigerBeetle, Bun, and Ghostty all beg to differ...
nesarkvechnep wrote 1 day ago:
You or in general? Because, you know, this is like, your opinion,
man.
tonyhart7 wrote 1 day ago:
My opinion??? How about you go to the Zig GitHub and check the
progress of the language. It's literally there: still in beta,
not fit for production, let alone having a mature ecosystem.
dns_snek wrote 1 day ago:
Yes, your opinion. I run it in production and everything I've
built with it has been rock solid (aside from my own bugs). I
haven't touched a few of my projects in a few years and they
work fine, but if I wanted to update them to the latest
version of Zig I'd have a bit of work ahead of me. That's it.
dxxvi wrote 1 day ago:
Do you know that there's a concurrent Scala library named ZIO ( [1] )?
:-)
HTML [1]: https://zio.dev
breatheoften wrote 1 day ago:
What makes a NATS client implementation the right prototype from which
to extract a generic async framework layer?
This looks interesting but I'm not familiar with NATS
lukaslalinsky wrote 1 day ago:
The layer was not extracted from the NATS client, the NATS client was
just a source of frustration that prompted this creation.
maxbond wrote 1 day ago:
If you succeed in creating a generic async primitive, it doesn't
really matter what the original task was (as long as it's something
that requires async), no? That's an implication of it being generic?
supportengineer wrote 1 day ago:
Move Zig, for great justice.
echelon wrote 1 day ago:
One of the very first internet memes. The zig team should adopt it as
the slogan.
HTML [1]: https://en.wikipedia.org/wiki/All_your_base_are_belong_to_us
dgb23 wrote 18 hours 9 min ago:
It has in a way! See:
HTML [1]: https://github.com/ziglang/zig/blob/master/lib/init/src/ma...