* * * * *

Of course it's slower, but I didn't expect it to be quite that bad

Time for another useless µbenchmark! This time, the overhead of trapping integer overflow!

So, inspired by this post about trapping integer overflow [1], I thought it might be interesting to see how bad the overhead is of using the x86 [2] instruction INTO [3] to catch integer overflow. To do this, I'm using DynASM [4] to generate code from an expression, with an INTO after every operation. There are other ways of doing this, but the simplest way is to use INTO. I'm also using 16-bit operations, as the numbers involved (between -32,768 and 32,767) are reasonable (for a human) to deal with, unlike the 32-bit range of -2,147,483,648 to 2,147,483,647 or the insane 64-bit range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

The one surprising result was that Linux treats the INTO trap as a segfault! Even requesting additional information (passing the SA_SIGINFO flag to sigaction()) doesn't tell you anything. But that in itself tells you it's not a real segfault, as a real segfault reports the cause of the fault, such as a memory mapping error (SEGV_MAPERR in si_code). Personally, I would have expected a floating point fault (SIGFPE), even though it's not a floating point operation, because on Linux, integer division by 0 results in a floating point fault (and oddly enough, a floating point division by 0 results in ∞ but no fault)!

But, aside from that, some results. I basically run the expression one million times and simply record how long it takes. The first is just setting a variable to a fixed value (and the “- 0” bit is there just to ensure an overflow check is included):

Table: x = 1 - 0
overflow  time         expression result
----------------------------------------
true      0.009080000  1
false     0.006820000  1

Okay, not terribly bad. But how about a longer expression?
(And remember, this longer expression isn't optimized. It's also evaluated strictly left to right with no operator precedence, which is why the result is 46 rather than 12.)

Table: x = 1 + 1 + 1 + 1 + 1 + 1 * 100 / 13
overflow  time         expression result
----------------------------------------
true      0.079528000  46
false     0.030125000  46

Yikes! (But this also includes the function call overhead.)

For the curious, the last example compiled down to:

> xor eax,eax
> mov ax,1
> add ax,1
> into
> add ax,1
> into
> add ax,1
> into
> add ax,1
> into
> add ax,1
> into
> imul 100
> into
> mov bx,13
> cwd
> idiv bx
> into
> mov [$0804f50E],ax
> ret

The non-overflow version just had the INTO instructions missing; otherwise it was the same code.

What surprises me most here is that INTO just checks the overflow flag and only causes a trap if the flag is set. The timings I have (and I'll admit, the figures I have are old and for the 80486) show that INTO only has a three-cycle overhead if not taken. I'm guessing things are worse with the newer multipipelined multiscalar multiprocessor monstrosities we use these days.

Next I'll have to try using the JO instruction [5] and see how well that fares.

[1] http://blog.regehr.org/archives/1154
[2] https://en.wikipedia.org/wiki/X86
[3] http://x86.renejeschke.de/html/file_module_x86_id_142.html
[4] gopher://gopher.conman.org/0Phlog:2015/09/05.1
[5] gopher://gopher.conman.org/0Phlog:2015/09/07.1

Email Sean Conner at sean@conman.org.