--Topography--------------------------------------------------------------------
AWK Workshop / Discussion - April 11 - 23, 2026
  * what:  an informal exploration of plain AWK, aka "new AWK"
  * where: SDF.org - both pcom ("awk" room) and irc.sdf.org ("#awk")
  * when:  Saturdays, 10-11am MDT ; Tuesdays & Thursdays, 6-7pm MDT
--------------------------------------------------------------------------------
[b******s] Day 6: the getline function and misc. AWK code snippets
[b******s] somewhat interesting read: https://awklab.com/unix-pipes-under-load
[b******s] haven't played around much w/ GNU parallel
[j******e] Skimming the linked article. Yeah, the whole streaming aspect of pipes is really nice. It's something I wish more software would adopt.
[j******e] Too many libraries seem to be built around the idea of slurping up a whole file and then only working with selected parts of it.
[b******s] hey j******h
[b******s] I ran that first identity on two different 32 systems and got the same result
[j******h] hey b******s
[b******s] it was (IIRC) -0.00000000000000011102
[j******e] Floating-point numbers are fun!
[b******s] hey a******r
[j******h] More where it came from: https://www.tweedledum.com/rwg/idents.htm
[b******s] looks like it's time
[a******r] greetings all
[b******s] ... the getline function ...
[b******s] worth reading: http://awk.freeshell.org/AllAboutGetline (comp.lang.awk FAQ)
[b******s] getline provides an alternative to the implicit pattern-action input loop
[b******s] => getline syntax doesn't match the other AWK functions / statements
[b******s] in BEGIN / END getline allows reading sources not tied to FILENAME
[b******s] in main body getline can read from FILENAME, and/or other sources
[b******s] getline returns 1 if record is present, 0 if EOF, -1 on read error
[b******s] so while it looks like a "statement" you can test its return value
[b******s] note: if used as part of an expression, getline is evaluated first
[b******s] i.e. while (getline < "foo" > 0) == while((getline < "foo") > 0)
[b******s] probably you've all used this a bit
[a******r] parentheses are still a good call
[b******s] I agree
[j******e] So what happens to the text that getline *reads*? Is it just output to stdout and pipeable?
[b******s] it depends on how it's used
[b******s] the default form, without a var, will populate the usual: $0, NF, etc.
[j******h] j******e: the text that getline reads becomes $0, as I understand it.
[b******s] yeah, it shouldn't ever dump it to stdout I don't think
[j******e] That doesn't make for confusing syntax at all...
[j******e] So in the above example, getline is reading from a file named "foo"?
[j******h] Well, $0 and NF are always being updated, even when reading a file, so getline just inserts itself into that process.
[b******s] how getline is used decides what variable(s) are set:
------------------------------------------------------------------------
# Table A-10  Getline Function
# =======================================
# expression        variables set
# ---------------------------------------
# getline           $0, NF, NR, FNR  (default call)
# getline var       var, NR, FNR
# getline [...]
[...] best to use 'getline var' ; changing FS &/or using split() can help
[b******s] mainly it's using getline in a user-defined function that is a potential hazard, because of the $0 reassignment
[j******e] In the above example, would the <"/dev/stdin" be required? I'd assume that to just be the default behaviour.
[b******s] TBH I'm not sure that would always work but I haven't tested that
[b******s] you can use "-" ; I generally spell it out
[a******r] so you could use it to make run time, record by record choices if you wanted
[b******s] not sure I understand what you're asking
[b******s] oh, maybe choosing to use different fields on a per record basis?
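A minimal sketch of the 'getline var' advice above, assuming a POSIX shell and any POSIX awk. The wrapper name getline_demo, the temp file name, and the variable names are hypothetical. It reads a side file into a variable from the main body and then shows that $0 and NF still belong to the record the main loop read:

```shell
# Hypothetical demo: 'getline var < file' sets only var, so the current
# record ($0, NF) read by the main input loop stays intact.
getline_demo() {
    tmp=${TMPDIR:-/tmp}/getline_demo.$$        # assumed scratch file
    printf 'alpha beta\ngamma\n' > "$tmp"
    echo 'main input record' | awk -v f="$tmp" '
    {
        # parenthesize so the comparison applies to the return value of getline
        while ((getline line < f) > 0)
            n++
        close(f)
        # $0 and NF still describe the record read by the main loop
        print n, NF, $0
    }'
    rm -f "$tmp"
}
```

Calling getline_demo should print "2 3 main input record": two lines were read from the side file, while NF and $0 were left alone. With plain 'getline < f' the last line of the side file would have replaced $0 instead.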
[a******r] display a record ($0) then ask if you wanted to hand edit it
[b******s] ah okay, yeah that could work
[a******r] yes
[j******h] Regarding the "no close() needed", I still take issue with awk's insistence on passing the entire command to close(). Perl has a much more readable syntax, where you open a piped process into a filehandle, and you call close() on that filehandle, not the original command.
[j******e] j******h: Yeah, that is weird...
[b******s] maybe it's related to the "backpressure" mentioned in that article I linked to earlier?
[b******s] like, that pipes act like a pipe
[j******e] Especially if you open two instances of the same command line... then which one are you closing?
[b******s] right, good reason to assign the commands to separate variables
[b******s] anyway, here's part of the filter I've been using for these logs:
------------------------------------------------------------------------
# ex. collect workshop participants from pcom log before processing
#     (assumes each participant posted at least one comment;
#      since this is only for obfuscation it works fine)
#
#   BEGIN {
#       ...
#       RawLog = "pcom-raw.log"
#       FS = "[[\\]]"
#       while (getline < RawLog > 0)
#           if ($2 !~ /[[:space:]]/ && $0 ~ /^[[][[:alpha:]]/ && !($2 in Usrs))
#               Usrs[$2]        # <= array containing unique UIDs
#       close (RawLog)
#       FS = " "
#   }
#
# where
#   - calling getline in while() loop reads RawLog to EOF => 0 returned
#   - BEGIN{} + close(RawLog) => state of FILENAME, $0, etc. no problem
#   - if we did NOT close(RawLog) it would still be at EOF in main body
#   - need to reset FS back to default (" ") to avoid later weirdness..
#
------------------------------------------------------------------------
[b******s] here's another example, testing readability of files:
------------------------------------------------------------------------
# # fstate.awk - attempt to discern readability of file(s)
#   BEGIN {
#       Fmt = "%20s => %s\n"
#       for (i=1 ; i < ARGC ; i++) {
#           # note: the test order matters!
#           if (getline < ARGV[i] > 0) {
#               close (ARGV[i])
#               printf Fmt, ARGV[i], "readable w/ content"
#           } else if (getline < ARGV[i] < 0) {
#               printf Fmt, ARGV[i], "unreadable"
#           } else if (getline < ARGV[i] == 0) {
#               close (ARGV[i])
#               printf Fmt, ARGV[i], "readable & empty"
#           }
#       }
#   }
#
# $ awk -f fstate.awk test.regular test.empty test.unreadable
#         test.regular => readable w/ content
#           test.empty => readable & empty
#      test.unreadable => unreadable
#
# where
#   - above should run without producing any errors
#   - limitation: can't test writability => use test(1)
#   - no close() after 2nd test => file never opened
#
------------------------------------------------------------------------
[b******s] as mentioned, AWK can't directly test for writability of a file
[b******s] should I pause?
[a******r] but checking readability of potential input files is cool
[j******e] I'm okay. I find I do better reviewing after the fact where there isn't a ticking clock.
[b******s] yeah sorry that the dumps are kind of hard to take in on-the-fly
[b******s] maybe using pastebin or similar would work better
[j******e] Yeah, as long as I have a general idea of what they're doing. I'm okay for now.
[b******s] okay, onwards..
[b******s] although AWK lacks an eval() function one can be constructed:
------------------------------------------------------------------------
# ex. creating a quasi-eval() function:
#
# $ cat eval.awk
#   function eval(code, arr,    _cmd, _i) {
#       _cmd = "awk 'BEGIN{" code "}'"
#       while (_cmd | getline arr[++_i] > 0) ;
#       close (_cmd)
#       return _i
#   }
#   BEGIN {
#       printf "enter awk oneliner: "
#       getline Str < "/dev/tty"
#       N = eval(Str, Results)
#       for(i=1 ; i [...]
[...] getline within BEGIN block allows conventional procedural style:
------------------------------------------------------------------------
# ex. menu-based command selector:
#
#   #! /usr/bin/mawk -Wp,i,e
#   # run_cmds.awk
#   BEGIN {
#       # ANSI escapes:
#       Clr = "\033[H\033[2J" ; Grn = "\033[1;32m" ; Nrm = "\033[0m"
#       N = split ("date time uptime", Tags)
#       M = split ("date '+%x':date '+%X':uptime", Cmds,":")
#       while (1) {
#           print Clr, "\n Run commands:\n"
#           for(i=1; i<=N; i++)
#               printf "%4d)%4s%s\n", i, "", Grn Tags[i] Nrm
#           printf "\n enter # ; 'q' quits: "
#           getline Aws < "/dev/tty"
#           if (Aws ~ /^[qQ]$/) break
#           if (Aws in Cmds) {
#               Cmds[Aws] |getline Str
#               close (Cmds[Aws])
#               print "\n =>", Grn Str Nrm
#           } else
#               print "\n =>", Grn "invalid choice.." Nrm
#           system ("sleep 2")
#       }
#       print ""
#   }
#
# note: test terminal capabilities before using ANSI escapes
#
------------------------------------------------------------------------
[b******s] have included some ANSI escapes to illustrate they're an option
[b******s] that's all the getline stuff; we can pause here to discuss
[a******r] pretty cool too, I wouldn't have thought of that
[b******s] I did add misc. code snippets too just so it's not a half day
[b******s] I sort of like making silly little interactive terminal things
[a******r] yes, I like the one about getchar
[b******s] alright, on to the snippets
[b******s] ... misc. AWK code snippets ...
[b******s] insertion sort (The AWK Programming Language, 1st ed., p154):
------------------------------------------------------------------------
#   # A = called array[keys] (unsorted) ; n = size of A
#   function isort(A, n,    i, j, tmp) {
#       for (i = 2; i <= n; i++)
#           for (j = i; j > 1 && A[j-1] > A[j]; j--) {
#               tmp = A[j-1] ; A[j-1] = A[j] ; A[j] = tmp
#           }
#   }
#
# where
#   - above sorts A ascending and does not remove duplicate lines
#   - for descending simply print sorted array in reverse
#   - to skip duplicates add test, i.e. 'if (A[i] != A[i-1]) ...'
#
------------------------------------------------------------------------
[b******s] type 'r' or 'R 123' to display
[b******s] you could also use that 'sort | uniq' AWK replacement in that linked article if you wanted to remove the duplicates from the sorted array
[b******s] the AWK book also has some other sorts; the heap sort apparently does much better as the amount of data gets larger
[j******e] I don't imagine that first-class functions are a thing in awk? Can you pass a function into a function?
[b******s] yes you can do that, call functions within functions, call themselves, etc.
[j******e] Yeah, but can you pass a function as a parameter to another function?
[b******s] this next one sort of came out of a comp.lang.awk thread
[j******e] For instance with the sorting function, can you pass a comparator in?
[b******s] hmm, can you give an example?
[b******s] like, testing for duplicates within the sort?
[j******e] function foo(func) { func() } <-- would be a minimalistic (and not very useful) example
[b******s] oh I see, you want to pass a function as an arg to another function
[b******s] I don't *think* you can do that..
[j******e] Yeah.
[b******s] although... you *may* be able to from the main body if the function has a return value
[j******h] Speaking of sorting algorithms, did any of you notice midfavila's bboard thread on a longstanding bug in GNU coreutils tsort?
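On the comparator question above: POSIX awk has no way to pass a function as an argument, so one workaround is to pass the *name* of an ordering as a string and dispatch on it. The sketch below assumes a POSIX shell and awk. The names sort_by and cmp and the ordering keywords "desc" and "len" are hypothetical and not from the workshop:

```shell
# Sketch: choose the comparison by a name string, since POSIX awk cannot
# take a function as a parameter. All names here are hypothetical.
sort_by() {   # usage: sort_by ORDER < lines   (ORDER: asc | desc | len)
    awk -v order="$1" '
    # return true when a should come after b under the chosen ordering
    function cmp(a, b, how) {
        if (how == "desc") return a < b
        if (how == "len")  return length(a) > length(b)
        return a > b                       # default: ascending
    }
    # insertion sort as in the log, with the comparison routed through cmp()
    function isort(A, n, how,    i, j, tmp) {
        for (i = 2; i <= n; i++)
            for (j = i; j > 1 && cmp(A[j-1], A[j], how); j--) {
                tmp = A[j-1] ; A[j-1] = A[j] ; A[j] = tmp
            }
    }
    { Line[NR] = $0 }
    END {
        isort(Line, NR, order)
        for (i = 1; i <= NR; i++) print Line[i]
    }'
}
```

For example, printf 'pear\nfig\nbanana\n' | sort_by len should print fig, pear, banana, one per line. gawk additionally offers indirect function calls as an extension, which gets closer to a real comparator argument, but that is not plain AWK.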
[a******r] haven't read it yet
[b******s] @j******h I didn't see it
[j******h] From some of the cluster nodes I'm not able to run bboard anyway, I keep hitting a baud error.
[b******s] a POSIX getchar for AWK - similar to bash 'read -n1':
------------------------------------------------------------------------
#   # returns _char up to length 2:
#   function getchar(    _cmd, _char) {
#       system ("stty -icanon")     # put TTY in "raw" mode
#       _cmd = "dd bs=6 count=1 2>/dev/null"
#       _cmd | getline _char
#       close (_cmd)
#       system ("stty icanon")      # put TTY in "normal" mode
#       return _char
#   }
#
# where
#   - "stty -icanon"    => TTY in "raw" mode
#   - "dd bs=6 count=1" => reads 6 bytes from stdin ;
#                          using bs=6 seems to work well for most unicode
#   - "stty icanon"     => TTY in "normal" mode
#
------------------------------------------------------------------------
[b******s] can combine above with ANSI escapes for nicer interactive apps
[b******s] you can also combine the stty calls w/ the dd(1) bit if you want
[b******s] one last snippet..
[b******s] POSIXly obtaining time in secs since Unix epoch:
------------------------------------------------------------------------
#   # returns seconds since epoch:
#   function t_epoch() {
#       srand() ; return srand()
#   }
#
# where
#   - srand() obtains seed from system time by default
#   - 2nd call to srand() returns seed
#
------------------------------------------------------------------------
[b******s] boom - End of Workshop => you are all AWK experts!
[b******s] I hope y******i can resurrect the Awka project
[j******e] Why the first srand() in that function? Wouldn't the return srand() be sufficient?
[a******r] got to get a t-shirt made
[b******s] @j******e I thought that as well but it doesn't work
[b******s] that was from some AWK-related YouTube video I came across a while ago
[j******e] I wonder what the reason is.
[b******s] just guessing but I think the 1st call sets the seed then subsequent calls, if no new expression is passed, returns the current seed
[j******e] That would be my guess as well.
[j******e] I wonder if it's a bug.
[b******s] heh, IDK
[j******e] Are we certain the srand() function doesn't return the *previous* seed?
[b******s] perhaps the POSIX spec has some more info
[j******h] Here's the bug report midfavila linked to, regarding coreutils tsort: https://bugs-devel.debian.org/cgi-bin/bugreport.cgi?bug=990854
------------------------------------------------------------------------
srand([expr])
    Set the seed value for rand to expr or use the time of day if expr
    is omitted. The previous seed value shall be returned.
------------------------------------------------------------------------
[j******e] Ah, it *is* the previous seed value.
[b******s] plant the seed, then reap the bounty!
[j******e] So, you could do: function gettime(_prev, _result) {_prev = srand(); _result = srand(); srand(_prev); return _result} to be non-destructive of the seed.
[b******s] yeah, that works [actually it doesn't]
[b******s] thanks for being part of this; been fun
[j******e] I mean it's a weird hack, but...
[b******s] yeah, most AWKs do have time functions too so it's not really that useful
[j******e] Thanks for putting it together. I definitely learned a lot.
[a******r] thanks for all of your planning and effort!
[a******r] I've really enjoyed it
[b******s] hope to see lots of posts on the bboard>>AWK board
[j******e] Is there not a POSIX way of getting the time that AWK could call as well?
[b******s] is date(1) POSIX? could use that I guess; srand() works though
[b******s] well, for the secs since epoch
[b******s] you could write a conversion I suppose
[b******s] alright, back to planning our road trip
[b******s] take care ya'all
[a******r] yes, we need to keep learning in the AWK board
[a******r] thanks again!
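
The date(1) route mentioned near the end can be sketched as follows, assuming a POSIX shell and awk, and relying only on date -u with a format string. The names utc_epoch and epoch are hypothetical, and the conversion is a standard days-from-civil calculation rather than anything shown in the workshop. It is only meant for dates from 1970 onward:

```shell
# Sketch: seconds since the Unix epoch from date(1) fields, converted in awk.
# utc_epoch and epoch are hypothetical names; valid for dates from 1970 on.
utc_epoch() {   # usage: utc_epoch [Y m d H M S]   (no args => ask date -u)
    awk -v stamp="$*" '
    # days-from-civil arithmetic on a March-based year, then add the time
    function epoch(y, m, d, H, M, S,    era, yoe, doy, doe) {
        y += 0 ; m += 0                 # force numeric comparison below
        if (m <= 2) y--                 # Jan/Feb belong to the prior March-based year
        era = int(y / 400)              # 400-year cycle number
        yoe = y - era * 400             # year within the cycle
        doy = int((153 * (m + (m > 2 ? -3 : 9)) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return ((era * 146097 + doe - 719468) * 24 + H) * 3600 + M * 60 + S
    }
    BEGIN {
        if (stamp == "") {
            cmd = "date -u \"+%Y %m %d %H %M %S\""
            cmd | getline stamp         # one line: year month day hour min sec
            close(cmd)
        }
        split(stamp, t, " ")
        print epoch(t[1], t[2], t[3], t[4], t[5], t[6])
    }'
}
```

As a check, utc_epoch 2000 1 1 0 0 0 should print 946684800 and utc_epoch 1970 1 1 0 0 0 should print 0. Called with no arguments it prints the current time. Unlike the srand() trick it leaves the random seed untouched, at the cost of spawning a process.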