--Topography--------------------------------------------------------------------
AWK Workshop / Discussion - April 11 - 23, 2026
 * what:  an informal exploration of plain AWK, aka "new AWK"
 * where: SDF.org - both pcom ("awk" room) and irc.sdf.org ("#awk")
 * when:  Saturdays, 10-11am MDT ; Tuesdays & Thursdays, 6-7pm MDT
--------------------------------------------------------------------------------
[b******s] Day 4: data, variables, coercion, strings and maths
[b******s] except for local vars in user-defined functions, all vars are GLOBAL
[b******s] unlike other languages, AWK treats data as *BOTH* numbers & strings
[b******s] treatment is largely context driven, i.e. if maths => treat as numbers
[b******s] in AWK '0' or "" = False, everything else = True ; unassigned vars = 0 or ""
[b******s] => makes assignment test easy: if (Var) print "T"; else print "F"
[b******s] => same with ternary operator: print (Var) ? "T" : "F"
[b******s] an example..
------------------------------------------------------------------------
# ex. print value of 'V' before & after assignment:
# $ awk 'BEGIN { Fmt = "V = %d (%c)\n"          # format str
#                printf Fmt, V, (V) ? "T" : "F"
#                V = 1
#                printf Fmt, V, (V) ? "T" : "F"
#              }'
# V = 0 (F)
# V = 1 (T)
#
# where,
#   "%d" in Fmt str coerces V's unset value of "" to 0 for visibility
#   (V)?"T":"F" ternary oper. returns "T" if V is set to a non-zero,
#               non-empty value, "F" otherwise
#
------------------------------------------------------------------------
[j******e] Would the string "F" be treated as false?
[b******s] well it's set so, no
[j******e] ...or is that just a convention you are using for illustrative purposes?
[b******s] ya, just for illustration
[j******e] Gotcha.
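The truthiness rules above can be checked with a small sketch. It assumes a POSIX-style awk on the PATH (gawk documents that the string constant "0" counts as true because it is non-empty); `show_truth` and `truth()` are hypothetical helper names, not something used in the session.

```shell
# show_truth and truth() are hypothetical helper names, not from the session.
show_truth() {
    awk '
        function truth(v) { return v ? "T" : "F" }
        BEGIN {
            print "unset", truth(Unset)   # never assigned
            print "empty", truth("")      # empty string
            print "zero",  truth(0)       # number 0
            print "strF",  truth("F")     # non-empty string
            print "str0",  truth("0")     # string constant, non-empty
        }
    '
}
show_truth
```

The last two cases answer the question about "F": any non-empty string constant tests as true, whatever its text.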
[b******s] in the above if we just did 'print V' you'd get "" => using "%d" format string to coerce
[b******s] as seen before, by default $0 = current record, split into fields $1 - $NF using FS
[j******h] AWK does all arithmetic with floating point, so interpreting '0' as False will also catch any number smaller than machine precision?
[j******e] Sorry, I assume in the above example, the # marks a comment? I don't remember covering that, but I'm assuming it from context.
[b******s] ya, I think I didn't actually mention that comments begin with "#"
[b******s] @j******h, I'm not sure TBH; you are the math guy so what is it?
[j******h] I haven't tested it yet.
[b******s] I recall doing some testing on various machines WRT number limits but it isn't something I usually have to think about
[j******e] That would seem a reasonable assumption.
[b******s] all record fields are reassignable, e.g. {$1="" ; print $0}
[b******s] in above, use 'print substr($0, 2)' to avoid printing leading space
[b******s] setting RS = "" can select multi-line data separated by blank lines
[p******e] question: reassigning a field has effects in subsequent blocks?
[j******e] I assume that if you change $1, it *wouldn't* affect $0, and vice-versa?
[b******s] it does affect it
[j******e] That's surprising, but kind of cool.
[b******s] ya it can be useful for read-in data that is run through the main body's data gauntlet
[a******r] i think it depends on the awk
[a******r] i mean if you print it first
[j******h] j******e: $0 simply pieces together $1, $2, ... $NF, whatever their current values are, and separated by OFS. So yes, changing $1 has an effect on $0.
[b******s] right, and NF itself can be reassigned which can truncate or append to $0
[j******e] I would have expected that they were just regular variables that were automatically assigned on each record read, but I digress.
[b******s] I think it makes sense in the context of the standard AWK logic
[p******e] what would be appended?
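The rebuild of $0 described above is easier to see with a visible output separator. This is a minimal sketch, assuming an awk that rebuilds $0 when a field or NF is assigned (gawk and mawk behave this way); `rebuild_demo` is a hypothetical helper name.

```shell
# rebuild_demo is a hypothetical helper name, not from the session.
rebuild_demo() {
    echo 'a b c' | awk -v OFS='-' '
        { print }            # untouched record: original spacing kept
        { $1 = $1; print }   # any field assignment rebuilds $0 with OFS
        { NF++;    print }   # one more (empty) field => trailing OFS
    '
}
rebuild_demo
```

With the assumptions above, the three lines printed are "a b c", "a-b-c" and "a-b-c-": what gets appended is OFS followed by an empty field.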
[b******s] NF++ => $0 OFS $(NF+1)
[a******r] so if NF = 7, you could change it to 8 and then have a $8 that you could assign?
[j******e] This is dark magic.
[b******s] right; if you don't assign it it's just ""
[p******e] ok
[a******r] This is dark magic.
[b******s] decrementing NF truncates $0 ; incrementing NF appends empty fields
[j******h] a******r makes a good point. Could you assign to $8 before redefining NF, and would that automatically increase NF?
[p******e] what if I assign before incrementing NR?
[b******s] it *should* I think
[p******e] *NF
------------------------------------------------------------------------
# $ echo 'a b c' |awk '//;{print "NF =", NF ; $4="d"};//;{print "NF =", NF}'
# a b c
# NF = 3
# a b c d
# NF = 4
#
------------------------------------------------------------------------
[b******s] basically until a record gets to the last pattern-action pair you can tweak $0 continuously
[j******e] So, as mentioned before // prints the line. I assume a regex without a body has an implicit {print} and a body without a condition has an implicit //?
[b******s] right, it's just short-hand
[j******e] That's what I'd pieced together on review, but good to have it confirmed. :)
[b******s] for complex tweaking it helps to comment your code so you know what is happening at each pattern-action
[p******e] So, is RS="" special? I mean: is a single blank line a default-default separator?
[b******s] it's useful for multi-line records which I think used to be more common; the default value of RS is "\n"
[j******h] Once you get to the END block, there's no current record to operate on, so you have to type an explicit 'print' rather than the shorthand //, right?
[b******s] an example..
------------------------------------------------------------------------
# ex.
#   print records w/ "--" separator + total record count:
# $ printf 'a\nb\n\nc\nd\n\ne\nf' | \
#   awk -vRS='' '//;NR>Prev{print "--";Prev=NR};END{print "record cnt =", NR}'
# a
# b
# --
# c
# d
# --
# e
# f
# --
# record cnt = 3
#
------------------------------------------------------------------------
[b******s] no, we're setting it to an empty string to match the blank lines separating the records
[a******r] like a mailing address
[b******s] yes
[b******s] these days I think databases are doing all the heavy lifting
[p******e] this raises a question: what if I want no record separator and treat the whole file at once?
[b******s] you can do that; there's a small blurb in the Effective AWK Programming book about that
[p******e] RS='0' perhaps?
[b******s] think it's RS="\0"
[b******s] for "null"
[b******s] it's not portable but likely works w/ most AWKs
[b******s] so, as example, you could copy an entire file as one big string
[a******r] gawk says use '0' the null character
[a******r] with a forward slash
[a******r] i mean back, I think
[p******e] ok
[b******s] if I recall correctly, mawk doesn't like that version of null
[b******s] as mentioned, AWK treats data as both strings and numbers
[p******e] *forward*? Not ?
[p******e] ah ok back
[b******s] data can be coerced as needed: "42" + 0 => number ; 42 "" => string
[a******r] more dark magic
[p******e] nice
[b******s] as previously shown, AWK will try to use data per context of use:
[b******s] yeah, that's really best - convince yourself via experimentation
------------------------------------------------------------------------
# ex.
#   possibly unexpected behavior due to data coercion:
#
# $ echo "000" |awk '!$0{print "F"}'            # "000" => 0
# F
# $ awk 'BEGIN{print "123" + "zero" + 4}'       # "zero" => 0
# 127
# $ awk 'BEGIN{print "123" + "4evar"}'          # "4evar" => 4
# 127
# $ awk 'BEGIN{print "12" 1 + 2 "45"}'          # result => str
# 12345
#
------------------------------------------------------------------------
[p******e] ok this is tricky
[p******e] thank you for including it
[b******s] it's a bit weird at first
[j******h] For sorting of strings, does awk rely on ENVIRON["LANG"]? And should you assign to ENVIRON["LANG"] if you want your awk program to be portable?
[b******s] ya I haven't really mentioned that; it can really complicate things I think
[p******e] it blows my mind that "4evar" = 4
[b******s] and "evar4" == 0
[p******e] *is treated as
[b******s] ya it's not something I would actually do, just something to be aware of
[b******s] right
[a******r] if your field said "4lbs", it would use the 4 and leave the rest
[j******e] A lot of this lesson seems to be "what not to do". ;)
[j******e] Mind you, those things *are* good to know.
[b******s] ya I figured most people here already do programming so maybe covering the odd stuff is more useful?
[p******e] @a******r ok that's a good way to remember
[b******s] ..
[b******s] on to string concatenation
[b******s] string concatenation: no special operator, just string them together
------------------------------------------------------------------------
# $ awk 'BEGIN{Str = "A" SUBSEP "wk" FS "is" OFS "odd"; print Str}'
# Awk is odd
#
# where,
#   Str is a concatenation of 7 strings
#   FS = OFS = " "
#   SUBSEP = non-printing char "\034" (used w/ arrays)
#
------------------------------------------------------------------------
[b******s] useful functions for strings: length(), index(), match(), substr()
[b******s] don't know what else I can say about strings; most languages have similar ways to interact with them
[b******s] AWK has several built-in math functions: see reference materials
[b******s] a subsectional sampling..
------------------------------------------------------------------------
# int(x)    returns integer part of x, truncates toward zero
# rand()    returns uniformly distributed pseudorandom number r, 0 <= r < 1
# srand(x)  sets pseudorandom-number generator seed to x ; returns previous seed
# srand()   uses current time in secs as seed (relative to system epoch)
#
------------------------------------------------------------------------
[b******s] mawk excepted, AWK uses same default seed on each run if srand() is not called
------------------------------------------------------------------------
# ex.
#   rand() with & without srand():
#
# $ for R in {1..3} ;do
#     echo "run #$R:"
#     awk 'BEGIN { while(i++ < 4) { N=rand() ; print "N =", N} }'
#     sleep 3
#   done
# run #1:
# N = 0.924046
# N = 0.593909
# N = 0.306394
# N = 0.578941
# run #2:
# N = 0.924046
# N = 0.593909
# N = 0.306394
# N = 0.578941
# run #3:
# N = 0.924046
# N = 0.593909
# N = 0.306394
# N = 0.578941
#
# $ for R in {1..3} ;do
#     echo "run #$R:"
#     awk 'BEGIN { srand() ; while(i++ < 4) { N=rand() ; print "N =", N} }'
#     sleep 3
#   done
# run #1:
# N = 0.547808
# N = 0.248426
# N = 0.845114
# N = 0.847767
# run #2:
# N = 0.796647
# N = 0.199754
# N = 0.6066
# N = 0.245839
# run #3:
# N = 0.747587
# N = 0.117811
# N = 0.693405
# N = 0.82476
#
------------------------------------------------------------------------
[b******s] sorry for the big dump there
[p******e] cool
[b******s] int() can be used with rand() to get random integer ranges
------------------------------------------------------------------------
# ex. print random integer between 1 - N, inclusive:
#
# $ awk -vN=9 'BEGIN { srand() ; print int(N * rand()) + 1 }'
# 7
# $ awk -vN=9 'BEGIN { srand() ; print int(N * rand()) + 1 }'
# 3
# $ awk -vN=9 'BEGIN { srand() ; print int(N * rand()) + 1 }'
# 6
#
------------------------------------------------------------------------
[p******e] wait, I have a question on the first srand() example
[b******s] okay
[p******e] I suppose if we sleep between each run, the numbers would change, am I right?
[b******s] well by default srand() uses the system time for its seed so sleeping shouldn't be needed
[p******e] I thought the numbers were the same because less than a second passes between each run
[b******s] oh I see what you mean
[b******s] I'll redo that first srand() example w/ sleep(1) for the log
[b******s] actually, I have another dump that maybe illustrates this issue..
------------------------------------------------------------------------
# beware potentially unexpected behavior when running in shell..
# calling srand() yet not getting random numbers => need sleep(1)..
# $ for n in {1..5} ;do      # bash shell
#     awk -vN=9 'BEGIN {srand() ; print int(N * rand()) + 1 }'
#   done
# 5
# 5
# 5
# 5
# 5
#
# same but with mawk - and NOT calling srand()..
# $ for n in {1..5} ;do      # bash shell
#     mawk -vN=9 'BEGIN { print int(N * rand()) + 1 }'
#   done
# 4
# 9
# 8
# 2
# 2
#
# looping within awk instead...
# $ awk -vN=9 'BEGIN{srand();for(;i<5;i++)print int(N * rand()) + 1}'
# 7
# 4
# 2
# 1
# 8
#
------------------------------------------------------------------------
[a******r] mawk doing its own thing
[b******s] I mostly stick with mawk if I need randomness
[b******s] a bit about precedence of operations
------------------------------------------------------------------------
# AWK operators in the order of precedence (low to high):
# (from Sed & Awk, 2nd ed.)
#
# Operators               Description
# ----------------------------------------------------------------
# = += -= *= /= %= ^=     Assignment
# ?:                      C conditional expression
# ||                      Logical OR
# &&                      Logical AND
# ~ !~                    Match regular expression and negation
# < <= > >= != ==         Relational operators
# (blank)                 Concatenation
# + -                     Addition, subtraction
# * / %                   Multiplication, division, and modulus
# + - !                   Unary plus and minus, and logical negation
# ^                       Exponentiation
# ++ --                   Increment & decrement, prefix or postfix
# $                       Field reference
# ----------------------------------------------------------------
#
------------------------------------------------------------------------
[p******e] mawk has an attitude
[b******s] order of precedence can be modified using parens '()' as needed
[p******e] order of precedence is what I never remember on any lang
[p******e] cool
[b******s] that's all the bullet points I've got so feel free to ask questions, etc
[p******e] will you cover ranges?
[b******s] ranges of what?
[j******h] The Field reference operator $ is used similarly in AWK and Perl, you have to include it if you're reassigning a value to $NF.
[j******h] But bash and other shells tend to use $ only when dereferencing, not when assigning.
[p******e] I remember [g]awk having some tricky concept called "ranges"
[p******e] maybe start-pattern / end-pattern?
------------------------------------------------------------------------
# $ echo 'a b c' |awk '//;{$NF="dog"};//;NF--'
# a b c
# a b dog
# a b
#
------------------------------------------------------------------------
[b******s] re: j******h's NF comment
[j******h] p******e: maybe you're thinking of sed? Restrict the range of the s// operator, for example?
[b******s] hope some of that was helpful and not too confused
[a******r] very helpful!
[a******r] and good dose of dark magic
[b******s] lol
[j******e] I think I got most of it. I'll review and email if I have any questions.
[b******s] sounds good
[p******e] I'm still trying it
[j******e] A few points I really want to wrap my brain around a little more.
[b******s] BTW I have the raw logs if anyone wants those; the sanitized ones are on my gopher page
[j******h] Thanks b******s! Enjoy the rest of your weekend, everybody!
[p******e] echo 'a b c' |awk '//;{$NF--};//'
[p******e] echo 'a b c' |awk '{$NF--};//' ---> a b -1
[a******r] I'm copying the ones from the gopher page
[j******e] I've been keeping my own logs. ;)
[b******s] okay, have a great weekend
[p******e] sanitized for swearwords?
[j******e] They help with the review.
[j******e] Sanitized to anonymize a little.
[b******s] ^ and fix my typos
[p******e] "Oh, Cr@p!"
[b******s] heh
[p******e] cool