--Topography--------------------------------------------------------------------
AWK Workshop / Discussion - April 11 - 23, 2026
  * what:  an informal exploration of plain AWK, aka "new AWK"
  * where: SDF.org - both pcom ("awk" room) and irc.sdf.org ("#awk")
  * when:  Saturdays, 10-11am MDT ; Tuesdays & Thursdays, 6-7pm MDT
--------------------------------------------------------------------------------
[b******s] Day 5: arrays and user-defined functions
[b******s] hey, looks like it's time
[b******s] ... AWK Arrays ...
[b******s] like variables, there is no declaring of arrays in AWK
[b******s] arrays are 1-dimensional and associative; may be sparse
[b******s] probably not really sparse since associative..
[b******s] due to AWK's single namespace, arrays are always global
[b******s] ^ this will be expanded upon in the user-def. functions bit
[b******s] array indices are treated as strings even if purely numeric
[b******s] not necessary to assign values, arrays can just be sets
[b******s] as with variables, an unassigned array value is "" (null)
[j******e] Seems pretty straight-forward.
[b******s] okay, onwards..
[b******s] ENVIRON and ARGV are the two (2) standard built-in arrays
[b******s] ex. for(v in ENVIRON) print ENVIRON[v]  => prints env. vars
[b******s] ex. for(i=1; i < ARGC; i++) print ARGV[i]  => prints CLI args
[b******s] these have been seen already but anyways..
[b******s] in AWK you can't have a variable and array with same name
[b******s] arrays &/or array elements deletable via the 'delete' cmd
------------------------------------------------------------------------
#  ex. delete Arr[42]  => deletes Arr[42] but not Arr
#      delete Arr      => deletes all elements of Arr
#
#  note: deleted array names are NOT reusable as variable names
#
------------------------------------------------------------------------
[b******s] bulk assignment can be accomplished via the split() cmd
------------------------------------------------------------------------
#  ex.
#      split("AWK is different", arr)       # splits on FS (default)
#      => arr[1] = "AWK", arr[2] = "is", ...
#
#      split("hello, world", arr, "[, ]+")  # splits on regex
#      => arr[1] = "hello", arr[2] = "world"
#
#      => numerical indices via split() start at 1
#
------------------------------------------------------------------------
[j******e] Would setting an element to "" do the same as deleting?
[b******s] ya pretty much; 'delete' is kind of not the right term
[j******e] I guess it would leave the key intact.
[b******s] right; you might experiment with that, i.e. numeric vs string keys
[j******e] I also notice you keep using the term "command". Is there a difference between "command" and "function"?
[b******s] I think the books list "commands/expressions" & "statements"
[b******s] so like 'print' is a statement, 'toupper()' is a command
[j******e] Is there any effective difference?
[b******s] beats me; other than statements seem to lack the parens
[a******r] maybe whether or not it returns some value
[b******s] ya could be that
[b******s] actually it is probably that
[j******e] Consistency is overrated anyway... ;)
[b******s] like you can't test the result of doing 'print "xyz"'
[b******s] bulk assignment w/ non-numeric indices indirectly possible
------------------------------------------------------------------------
#  ex. populate a[] w/ x y z as indices, 1 2 3 as values:
#
#      N = split("x y z", tmp)
#      for (i = 1; i <= N; i++) { a[tmp[i]] = i }
#
#      => a["x"] = 1, a["y"] = 2, a["z"] = 3
#
------------------------------------------------------------------------
[b******s] AWK can simulate multi-dimensional arrays via subscripts
[b******s] => still key-value pairs, ie. Arr[1,2] == Arr[1 SUBSEP 2]
------------------------------------------------------------------------
#  ex.
#      arr[1, 2, 3] = "awk is different"
#      => index = 1, 2, 3 = 1 SUBSEP 2 SUBSEP 3
#         (prints like 123 because SUBSEP, "\034" by default, is non-printing)
#      => length(arr) = 1
#
#      for (i in arr) { N = split(i, ind, "") }
#      length(ind) = 5
#      ind[1] = 1
#      ind[2] = SUBSEP
#      ind[3] = 2
#      ind[4] = SUBSEP
#      ind[5] = 3
#
#      print arr[1 SUBSEP 2 SUBSEP 3]  => awk is different
#      print arr[1, 2, 3]              => awk is different
#      print arr[123]                  => (empty: "123" is a different key)
#
------------------------------------------------------------------------
[b******s] onwards..
[b******s] some array fun: Array[regex] = format
------------------------------------------------------------------------
#  --
#  #! /usr/bin/awk -f
#  # array_test.awk
#  #
#  BEGIN {
#    Pfpn = "^[-]?[[:digit:]]+[.][[:digit:]]+$" ; Pfmt[Pfpn] = " %8.2f = fpn\n"
#    Pint = "^[-]?[[:digit:]]+$"                ; Pfmt[Pint] = " %8d = int\n"
#    Pstr = ".*[^-.[:digit:]].*"                ; Pfmt[Pstr] = " %8s = str\n"
#  }
#  { for (i in Pfmt) if ($1 ~ i) printf Pfmt[i], $1 }
#  --
#
#  # input data:
#  $ tr '\n' ' ' < test.data
#  3.14 awk -57 4evar 42 -3.14 *$@&#!
#
#  $ ./array_test.awk test.data
#       3.14 = fpn
#        awk = str
#        -57 = int
#      4evar = str
#         42 = int
#      -3.14 = fpn
#     *$@&#! = str
#
------------------------------------------------------------------------
[b******s] clear as vodka?
[j******e] I'm definitely going to have to review this after the fact. I'm still hung up on the pseudo multi-dimensional arrays.
[b******s] ah okay; those seem kind of a fudge that nonetheless works
[j******e] I think I get the general idea, though.
[a******r] this will take experimenting
[j******e] It's shocking how infrequently I find I need to use multi-dimensional arrays anyway.
[b******s] same
[j******e] ...though that's mainly because I typically find myself using other hierarchical-type structures.
[a******r] might be easier to just use parallel arrays
[b******s] sometimes that's the easier route
[b******s] just need to make sure they are the same size
[j******e] That's also true of C's nested array approach.
[b******s] on to user-defined functions?
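A small sketch for experimenting with two points from the array discussion above: whether assigning "" is the same as 'delete', and how a multi-subscript key is actually stored. The array names and the shell variable `out` are made up for illustration; the behavior shown is what standard awk implementations should give, but it is worth running against your own awk.

```shell
out=$(awk 'BEGIN {
    # Assigning "" keeps the key in the array; delete removes it.
    a["x"] = 1; a["y"] = 2
    a["x"] = ""
    delete a["y"]
    print (("x" in a) ? "x: still a key" : "x: gone")
    print (("y" in a) ? "y: still a key" : "y: gone")

    # arr[1, 2, 3] is stored under the single string key 1 SUBSEP 2 SUBSEP 3.
    arr[1, 2, 3] = "awk is different"
    print (((1, 2, 3) in arr) ? "(1,2,3): found" : "(1,2,3): missing")
    print ((123 in arr) ? "123: found" : "123: missing")

    # Splitting the key on SUBSEP recovers the individual subscripts.
    for (k in arr) {
        n = split(k, part, SUBSEP)
        print n, part[1], part[2], part[3]
    }
}')
printf '%s\n' "$out"
```

The `(i, j) in arr` form tests membership without creating the element, which a plain `arr[i, j] == ""` comparison would do as a side effect.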
[a******r] yes
[b******s] ... user-defined functions ...
[b******s] same rules governing variable and array names apply to function names
[b******s] basic form: function fname() {..}
[b******s] note: no space allowed between fname and "(" in a call; best avoided in the definition too
[b******s] parameter list contains BOTH function args and any local variables
[b******s] => variations in # of args may turn local vars into function args!
[b******s] B. Kernighan lists this syntax decision as his biggest AWK regret
[b******s] it is customary to add several spaces between args and local vars
[b******s] prefixing local vars with underscore(s) also helps differentiate
------------------------------------------------------------------------
#  ex. function fubar(fu, bar,    _i, _j, _k) {...}
#
#  alt.
#      function fubar(fu, bar,
#                     _i, _j, _k) {...}
#
------------------------------------------------------------------------
[b******s] I seem to recall the split line throwing an error in some AWKs
[a******r] i like the prefixing idea
[b******s] local variables can have same name as global variables => shadowed
[b******s] called args => 'call by value' / called arrays => 'call by reference'
[b******s] => changes made to an array parameter affect the called array!
[b******s] this ties in with the "all arrays are global" comment earlier
[j******e] So, if I call the above function as fubar(0), bar essentially becomes a local variable.
[j******e] There's nothing special about the spaces or underscores?
[b******s] that's right
[b******s] I suppose it could be useful, like where a function has 2 mandatory args and one optional
[j******e] Weird, but okay. I understand why he would regret that decision.
[j******e] Though you could exploit that a little to make functions with optional parameters.
[b******s] 'return (expr)' statement should always be part of user-def. functions
[b******s] including '(expr)' is optional but what is returned is then undefined
[b******s] ex.
function oddp(n) { return (n % 2 != 0 ? 1 : 0) }  # returns '1' (T) if 'n' is odd
[b******s] .. and '0' (F) if not
[j******e] What happens if you omit the return entirely? Syntax error?
[b******s] I think the return value is undefined but it should work
[b******s] user-defined functions can call other functions as well as themselves
------------------------------------------------------------------------
#  ex. recursion example - mimic the rev(1) command:
#
#      # rev.awk - from Effect. AWK Prog., p169
#      function rev(str, start) {
#          if (start == 0)
#              return ""
#          return substr(str, start, 1) rev(str, start-1)
#      }
#
#      $ echo 'taktik' | gawk -e '{print rev($0,length($0))}' -f rev.awk
#      kitkat
#
#  where,
#    - args: $0 is passed to 'str' ; length($0) is passed to 'start'
#    - if start == 0 return "" => recursion complete / blank line
#    - otherwise return substr(str, start, 1) followed by the result of
#      re-calling rev() with start decremented ; recursion ends when
#      start == 0
#
------------------------------------------------------------------------
[b******s] I think you asked about this in an earlier meetup
[j******e] I believe you mentioned that recursion didn't tend to work well in AWK though?
[b******s] I think I said it tended to be slower and of course you could end up using more system resources
[b******s] true for other languages as well
[b******s] I thought this was a good example as it reverses the string w/o using arrays
[a******r] probably especially if you are using it to process a lot of lines or even files
[b******s] must use 'getline' with care within user-defined functions (next time)
[b******s] that's all I got
[b******s] questions?
[j******e] Wow. We're early today.
[b******s] I thought there'd be more questions after some of the earlier dumps
[j******e] I think apart from the aforementioned multi-dimensional array thing, everything's pretty straight-forward.
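For anyone reviewing the user-defined functions part afterwards, here is a minimal sketch of the parameter rules discussed above: extra parameters acting as locals, scalars passed by value, arrays passed by reference, and an omitted argument arriving as "". The function names fill(), bump() and greet() and the shell variable `out` are hypothetical, chosen only for this example.

```shell
out=$(awk '
# fill() modifies the caller array (arrays are passed by reference);
# _i is a local variable: callers are expected to pass two arguments only.
function fill(arr, n,    _i) {
    for (_i = 1; _i <= n; _i++) arr[_i] = _i * _i
    return n
}

# bump() only changes its private copy of n (scalars are passed by value).
function bump(n) { n++; return n }

# greet() treats its second parameter as optional: when omitted it is "".
function greet(name, greeting) {
    if (greeting == "") greeting = "hello"
    return greeting ", " name
}

BEGIN {
    _i = "global"              # not touched by the local _i inside fill()
    count = fill(sq, 3)
    x = 5; y = bump(x)
    print sq[1], sq[2], sq[3], count
    print x, y, _i
    print greet("awk")
    print greet("awk", "goodbye")
}')
printf '%s\n' "$out"
```

The wide gap before `_i` in the parameter list is only the convention mentioned in the session; awk itself does not distinguish locals from arguments.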
[a******r] i can see cobbling arrays together to make interesting structures
[a******r] but I'm still thinking parallel type arrays
[b******s] it's easier to quickly grasp vs multi-dimensional arrays
[b******s] and you can always use an otherwise unused character like ":" to split the values into temporary arrays
[j******e] Ohhhhhhh... I just figured it out. I forgot about how string concatenation worked.
[j******e] ...and the fact that array indexes are just strings
[b******s] Thursday may also be a shorter day
[b******s] I was going to include misc. AWK quirks but I seemed to have included them over the previous days so really it'll just be getline stuff
[a******r] when is final exam?
[b******s] heh
[b******s] alright well, I think that's a wrap
[j******e] There's something to be said for a deliberately simple language.
[b******s] ya, especially as you get older...
[j******e] Sometimes engineers like to overcomplicate things.
[a******r] again, thank you for all the time and expertise!
[b******s] oh no problem; I've actually learned a few things doing this
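As a follow-up to the parallel-arrays idea and the ":" suggestion from the closing discussion, a rough sketch of both side by side. The input records, the array names age/city/rec and the shell variable `out` are invented for illustration, and the packed form assumes ":" never appears in the data.

```shell
out=$(printf 'alice 30 denver\nbob 25 boise\n' | awk '
{
    # Parallel arrays: one array per attribute, all sharing the same key.
    age[$1]  = $2
    city[$1] = $3

    # Packed alternative: a single array whose values are joined with ":".
    rec[$1] = $2 ":" $3
}
END {
    print "alice", age["alice"], city["alice"]

    # Unpack a record into a temporary array when its parts are needed.
    n = split(rec["bob"], f, ":")
    print "bob", f[1], f[2], n
}')
printf '%s\n' "$out"
```

Parallel arrays stay in step here because every record assigns both of them under the same key; the packed form trades that bookkeeping for a split() call on each lookup.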