Notes on the M4 Macro Language

[1]Michael Breen © 2008
__________________________________________________________________

  * [2]About this document
      + [3]Which m4?
      + [4]Who should read this?
      + [5]How is this different from the manual?
  * [6]What is m4?
  * [7]Basics: Simple macros, whitespace, quoting, comments
  * [8]How m4 works
  * [9]Quotes, escaping and non-ASCII characters
  * [10]Comments
  * [11]Alternatives to comments
  * [12]Conditionals
  * [13]Numbers
  * [14]Strings
  * [15]Defining macros with arguments; a recursive macro
  * [16]Scope of macros; local variables
  * [17]Pushing and popping macro definitions
  * [18]Macros that don't expand without arguments
  * [19]Name clashes: making macro names safe
  * [20]Loops
  * [21]Suspending and discarding output: Buffers and redirection
  * [22]Including files
  * [23]Accessing the shell; creating temporary files
  * [24]Debugging
  * [25]Aliasing and renaming macros (including builtins)
  * [26]Accessing internal builtins
  * [27]Macros for literal quotes
  * [28]Indirect macro calls
  * [29]Recursion pitfall: nesting limits
  * [30]Using unexpanding macros for arrays and hashes
  * [31]String macro problem workaround
  * [32]M4: Assessment
__________________________________________________________________

About this document

Which m4?

This document describes GNU m4, as included with Linux; areas of potential incompatibility of which I am aware are mentioned as they arise and highlighted with a boldface “GNU”. This was originally based on GNU m4 version 1.4.5; it has been updated for version 1.4.10.

Who should read this?

You may find this helpful if
  * you want to decide whether m4 is the tool you need for some task (once you get a rough idea of what the language is about, you might want to skip down to the [33]comparative assessment)
  * you need to quickly get up to speed on m4, or revise or (perhaps) learn more about the language

You should already be familiar with fundamental programming concepts (e.g., recursion).

How is this different from the manual?
There is a substantial overlap between the GNU m4 info pages and this document. The info pages are designed to be a comprehensive reference. This document is a much shorter “m4 by example” which is still “practically” complete – that is, I have tried to include:
  * everything helpful in using m4 effectively
  * anything that might cause a problem if you weren't aware of it

Examples of the kind of details omitted are:
  * experimental features that may disappear in future versions
  * the ways different versions of m4 handle the changequote macro (in practice, all you need to know are the restrictions to observe in order to ensure compatibility)
  * details on the myriad debugging flags: effective debugging is possible using just two or three flags and macros

There is also some original material here:
  * tips, e.g., macros to protect unbalanced quote characters inside quotes
  * different examples

What is m4?

M4 can be called a “template language”, a “macro language” or a “preprocessor language”. The name “m4” also refers to the program which processes texts in this language: this “preprocessor” or “macro processor” takes as input an m4 template and sends this to the output, after acting on any embedded directives, called macros.

At its most basic, it can be used for simple embedded text replacement. If m4 receives the input

    define(AUTHOR, William Shakespeare)
    A Midsummer Night's Dream by AUTHOR

then it outputs

    A Midsummer Night's Dream by William Shakespeare

While similar in principle to the better-known C preprocessor, it is a far more powerful, general-purpose tool. Some significant uses are:
  * sendmail: sendmail's rather cryptic configuration file (/etc/mail/sendmail.cf) is generated using m4 from a template file that is much easier to read and edit (/etc/mail/sendmail.mc).
  * GNU Autoconf: m4 macros are used to produce “configure” scripts which make source code packages portable across different Unix-like platforms.
  * Security Enhanced Linux: SELinux policy files are (at time of writing) processed using m4. (In fact, m4 is the source of some difficulties here because its flexibility allows abuses and makes automated policy analysis difficult to apply.)

Basics: Simple macros, whitespace, quoting, comments

M4 is a Unix filter program. Its arguments, if any, are the files it is to read; if none is specified then it reads from stdin. The resulting text is sent to stdout.

M4 comes with an initial set of built-in macros, often simply called “builtins”. The most basic of these, define, is used to create new macros:

    define(AUTHOR, W. Shakespeare)

After this definition, the word “AUTHOR” is recognized as a macro that expands to “W. Shakespeare”.

The define macro itself – including its two arguments – expands to an empty string, that is, it produces no output. However, the newline at the end of the AUTHOR definition above would be echoed to the output. If a blank line added to the output is a problem then you can suppress it using the “delete to newline” macro:

    define(AUTHOR, W. Shakespeare)dnl

There is no space between the end of the macro and the dnl: if there were then that space would be echoed to the output.

No whitespace is allowed between a macro name and the opening parenthesis. Any whitespace before the beginning of a parameter is discarded. Thus the following definition is equivalent to the one above:

    define( AUTHOR,W. Shakespeare)dnl

It's also possible to pass definitions on the command line using the -D option, for example:

    m4 -DAUTHOR="W. Shakespeare" -DYEAR=1587 input_file.m4

Quoting a string suppresses macro expansion. The default quote characters are the backtick (`) and apostrophe ('). M4 strips off these delimiters before outputting the string. Thus

    define(AUTHOR, W. Shakespeare)dnl
    `AUTHOR' is AUTHOR

produces the output

    AUTHOR is W. Shakespeare

For conciseness, most examples will show m4's output in the following way:

    `AUTHOR' is AUTHOR # -> AUTHOR is W. Shakespeare

In m4, the hash character # is the default opening delimiter of a comment. A comment lasts up to and including the following newline character. The contents of a comment are not examined by m4; however, contrary to what you might expect, comments are echoed to the output. Thus, the previous line, if entered in full, would actually produce the output

    AUTHOR is W. Shakespeare # -> AUTHOR is W. Shakespeare

Opening comment delimiters can be protected by quotes:

    `#' AUTHOR # -> # W. Shakespeare

Nested quotes are recognized as such:

    ``AUTHOR'' is AUTHOR # -> `AUTHOR' is W. Shakespeare

Quoted strings can include newlines:

    define(newline,`line
    break')
    a newline here

outputs

    a line
    break here

Without a matching opening quote character (`), a closing quote (') is simply echoed to the output. Thus

    `AUTHOR' is AUTHOR.''

produces

    AUTHOR is W. Shakespeare.''

M4 also understands nested parentheses within a macro's argument list:

    define(PARENS, ())
    brackets: PARENS # -> brackets: ()

Unbalanced parentheses can be quoted to protect them:

    define(LPAREN,`(')
    define(RPAREN,`)')
    LPAREN bracketed RPAREN # -> ( bracketed )

(Unbalanced quote characters are more problematic; a solution is given [34]later.)

Pitfall: In fact, quoting of the macro name is also recommended. Consider the following:

    define(LEFT, [)
    LEFT # -> [
    define(LEFT, {)
    LEFT # -> [

Why didn't the second define work? The problem is that, within the second define, the macro LEFT was expanded before the define macro itself took effect:

    define(LEFT, {) # -> define([, {) ->

That is, instead of redefining the macro LEFT, a new macro named [ was defined. GNU m4 allows macros to have non-standard names, including punctuation characters like [. In fact, the new macro doesn't seem to work either:

    [ # -> [

That's because GNU m4 doesn't ordinarily recognize a macro as a macro unless it has a valid name – that is, a sequence of ASCII letters, underscores, or digits, beginning with an underscore or letter.
For example, my_macro1 and _1stMacro are both valid names; my.macro1 and 1stMacro are not. (We will see later how the ability to define [35]macros with invalid names can be useful.)

Quoting the macro's arguments avoids this problem:

    define(`LEFT',`[')
    LEFT # -> [
    define(`LEFT',`{')
    LEFT # -> {

For the same reason, the undefine macro will normally work as expected only if its argument is quoted:

    define(`RIGHT', `]')
    undefine(RIGHT) # -> undefine(]) ->
    RIGHT # -> ]
    undefine(`RIGHT')
    RIGHT # -> RIGHT

(Note that undefine does not complain if it is given the name of a non-existent macro; it simply does nothing.)

How m4 works

M4's behaviour can be mystifying. It is best to get an early understanding of how it works. This should save you time figuring out what's going on when it doesn't do what you expect.

First, m4 looks for tokens in its input – roughly speaking, it divides it into quoted strings, macro arguments, names (i.e., identifiers), numbers and other symbols (punctuation characters). Whitespace (including newlines), numbers and punctuation usually mark token boundaries; exceptions are when they appear within a quoted string or a macro argument.

    define( `Version2', A – 1 )99Version2:Version2_ Version22 # -> 99A – 1 :Version2_ Version22

Above, since a valid name can include digits but cannot begin with one, the names seen after the definition are Version2, Version2_, and Version22; only the first of these corresponds to a defined macro.

Continuing:

    Version2(arg1, arg2) Version2 (junk) garbage(trash)Version2() # -> A – 1 A – 1 (junk) garbage(trash)A – 1

If the name of a macro is followed immediately by a '(' then m4 reads in a list of arguments. The Version2 macro we have defined ignores its arguments -- but that doesn't matter to m4: it swallows up the arguments and outputs only the macro's expansion “A – 1 ”.
In general, m4 passes input tokens and separators straight through to the output, making no change except to remove the quotes surrounding quoted string tokens. When it encounters a macro name, however, it stops echoing to the output. Instead:
 1. it reads in the macro's arguments (if any)
 2. it determines the expansion of the macro and inserts this expansion at the beginning of its input
 3. m4 continues scanning the input, starting with the expansion

If, while reading in a macro's arguments, m4 encounters another macro, then it repeats this process for the nested macro.

An example makes this clearer:

    define(`definenum', `define(`num', `99')')
    num # -> num
    definenum num # -> define(`num', `99') num -> 99

As soon as m4 gets to the end of “definenum” on the last line above, it recognizes it as a macro and replaces it with “define(`num', `99')” -- however, instead of outputting this expansion, it sticks it back on the beginning of its input buffer and starts again from there. Thus, the next thing it reads in is “define(`num', `99')”. As the define macro expands to an empty string, nothing is output; however, the new macro num is now defined. Then m4 reads in a space which it echoes to the output, followed by the macro num, which it replaces with its expansion. The last line therefore results in the output “ 99”.

Unless a nested macro is quoted, it is expanded immediately:

    define(`definenum', define(`num', `99'))
    num # -> 99
    definenum # ->

Here, when m4 reads in the nested define macro, it immediately defines num; it also replaces the macro “define(`num', `99')” with its expansion – an empty string. Thus, “definenum” ends up being defined as an empty string.
Arbitrary nesting is possible -- with (ordinarily) an extra layer of protective quotes at each level of nesting:

    define(`definedefineX',`define(`defineX',`define(`X',`xxx')')')
    defineX X # -> defineX X
    definedefineX X # -> X
    defineX X # -> xxx

If rescanning of a macro's expansion is not what you want then just add more quotes:

    define(`stmt',``define(`Y',`yyy')'')
    stmt # -> define(`Y',`yyy')
    Y # -> Y

Above, the outermost quotes are removed when the nested macro is being read in – so stmt expands first to `define(`Y',`yyy')'; m4 then rescans this as a string token and removes the second layer of quotes before sending it to the output.

Now consider the definition

    define(`plus', `+')

Suppose we want to use this plus macro twice in succession with no intervening space. Clearly, plusplus doesn't work – it is read as a single token, plusplus, not two plus tokens:

    plusplus # -> plusplus

We can use an argument list as a separator:

    plus()plus # -> ++

But watch what happens with an extra level of indirection:

    define(`oper', `plus')
    oper()oper # -> plusoper

Here, oper() expands to plus; but then rescanning of the input starts from the beginning of the expansion. Thus, the next thing read in is the token plusoper. As it doesn't correspond to a macro, it is copied straight to the output.

The problem can be solved by adding an empty quote as a separator:

    oper`'oper # -> plus`'oper -> +`'oper -> ... -> ++

It is a good idea to include such a separator in macro definitions as a matter of policy:

    define(`oper',`plus`'')
    oper()oper # -> plus`'oper -> +`'oper -> +oper -> ... -> ++

If ever m4 seems to hang or stop working, it is probably because a faulty macro has sent it into an infinite loop:

    define(`Bye', `Bye for now')
    Hello. # -> Hello.
    Bye. # -> Bye for now. -> Bye for now for now. -> ...

Such an error is not always this obvious: the cycle may involve more than one macro.
Finally, look at this example:

    define(`args', ``NAME', `Marie'')
    define(args) # -> define(`NAME', `Marie') ->
    NAME # -> Marie

    args(define(`args',`Rachel')) # -> args() -> `NAME', `Marie' -> NAME, Marie
    args # -> Rachel

In the second part of the example, although args doesn't take an argument, we can still pass it one. In this case the argument redefines the macro that's currently being expanded. However, it is the expansion that was in force when the macro identifier was read in that is output.

Similarly, it is possible to define a self-modifying macro or even a self-destructing macro:

    define(`msg', `undefine(`msg')Secret message.')
    msg # -> Secret message.
    msg # -> msg

[36]Recursive macros can also be defined.

Quotes, escaping and non-ASCII characters

A deficiency of m4 is that there is no escape character. This means that if you want to use the backtick (`) for anything other than an opening quote delimiter you need to take care. Sometimes you can just add an extra layer of quotes:

    I said, ``Quote me.'' # -> I said, `Quote me.'

However, in other cases, you might need an opening quote without m4 interpreting it as such.

The general way around this problem is to use the changequote macro, e.g.,

    changequote(<!,!>)
    a `<!quoted string!>'

outputs

    a `quoted string'

Without parameters, changequote restores the default delimiters.

In general, it is best to avoid using changequote. You can define [37]macros to insert literal quotes should you need them.

Sometimes, however, it is necessary to change the quote character globally, e.g., because the backtick character is not available on some keyboards or because the text being processed makes extensive use of the default quote characters. If you do use changequote then be aware of the pitfalls:

GNU m4's changequote can differ from other implementations of m4 and from earlier versions of GNU m4.
For portability, call changequote only with two arguments – or with no arguments, i.e.,

    changequote`' # (trailing `' is separator if needed)

Note that changequote changes how existing macros are interpreted, e.g.,

    define(x,``xyz'')
    x # -> xyz
    changequote({,})
    x # -> `xyz'

Don't choose the same delimiter for the left and right quotes: doing so makes it impossible to have nested quotes.

Don't change a quote delimiter to anything that begins with a letter or underscore or a digit; m4 won't complain but it only recognizes a delimiter if it starts with a punctuation character. A digit may be recognized as a delimiter but not if it is scanned as part of the preceding token.

While later versions of GNU m4 have a greater tolerance for non-ASCII characters (e.g., the pound sign or an accented character), it is better to avoid them, certainly in macro names and preferably in delimiters too. If you do use 8-bit characters and m4 is not behaving quite as you expect, this may be the reason. Where multibyte character encoding is used, m4 should not be used at all.

Comments

As mentioned above, line comments are echoed to the output, e.g.,

    define(`VERSION',`A1')
    VERSION # VERSION `quote' unmatched`

expands to

    A1 # VERSION `quote' unmatched`

Comments are not very useful. However, even if you don't use them you need to remember to quote any hash character in order to prevent it being interpreted as the beginning of a comment:

    `#' VERSION # -> # A1

You can change the opening comment delimiter, e.g., changecom(`@@') – as with changequote, the new delimiter should start with a punctuation character.

If you want echoing block comments, you can also change the closing delimiter, e.g., for C-like comments,

    changecom(/*,*/)
    VERSION `quote' /* VERSION
    `quote' ` */ VERSION
    # ->
    # A1 quote /* VERSION
    # `quote' ` */ A1

In GNU m4, calling changecom without arguments does not restore the default comment delimiters: it disables comments altogether (some other implementations of m4 revert to the defaults instead). To restore the defaults, ask for them explicitly with changecom(`#').
Alternatives to comments

For a comment that should not be echoed to the output, use dnl: this macro not only prevents the following newline from being output (as we saw above), it also discards everything up to the newline.

    dnl These two lines will not result
    dnl in any output.

Non-echoing block comments: multiline comments that are not echoed to the output can be written like this

    ifelse(`
    This is a comment
    spanning more than
    one line.
    ')dnl

This is a hack which takes advantage of the fact that the ifelse macro (described below) has no effect if it is passed only one argument. Some versions of m4 may therefore issue a warning about insufficient arguments; GNU m4 doesn't. Be sure there are no unmatched quotes in the comment text.

Conditionals

ifdef(`a',b) outputs b if a is defined; ifdef(`a',b,c) outputs c if a is not defined. The definition being tested may be empty, e.g.,

    define(`def')
    `def' is ifdef(`def', , not )defined. # -> def is defined.

ifelse(a,b,c,d) compares the strings a and b. If they match, the macro expands to string c; if not, string d.

This can be extended to multiple else-ifs:

    ifelse(a,b,c,d,e,f,g)

means that if a matches b, then return (expand to) c; else if d matches e, then return f; else return g. In other words, it's shorthand for

    ifelse(a,b,c,ifelse(d,e,f,g))

Numbers

M4 normally treats numbers as strings. However, the eval macro allows access to integer arithmetic; expressions can include these operators (in order of precedence):

    + -                 unary plus and minus
    **                  exponent
    * / %               multiplication, division, modulo (eval(8/-5) -> -1)
    + -                 addition and subtraction
    << >>               shift up or down (eval(-8>>1) -> -4)
    == != < <= >= >     relational
    !                   logical not (converts non-zero to 0, 0 to 1)
    ~                   bitwise not (eval(~0) -> -1)
    &                   bitwise and (eval(6&5) -> 4)
    ^                   bitwise exclusive or (eval(3^2) -> 1)
    |                   bitwise or (eval(1|2) -> 3)
    &&                  logical and
    ||                  logical or

The above table is for GNU m4; unfortunately, the operators and precedence are version-dependent.
Some versions of m4 incorrectly treat ^ the same as ** (exponent). For maximum compatibility, make liberal use of parentheses to enforce precedence.

Should you need it, octal, hexadecimal and indeed arbitrary radix arithmetic are available. It's also possible to specify the width of eval's output. (See the m4 info pages for details on these.)

    eval(7*6) # -> 42
    eval(7/3+100) # -> 102

There are also incr and decr builtins as shortcuts which expand to the argument plus or minus one, e.g., incr(x) is equivalent to eval(x+1):

    define(`n', 0)
    n # -> 0
    define(`n', incr(n))
    n # -> 1

Beware of silent integer overflow, e.g., on my machine, the integer range is -2**31 ... 2**31-1; eval(2**31) erroneously expands to -2147483648.

Logical conditions can be checked like this:

    `n' is ifelse(eval(n < 2), 1, less than , eval(n == 2), 1, , greater than )2

Strings

len:

    len(`hello') # -> 5

substr:

    substr(`hello', 1, 3) # -> ell
    substr(`hello', 2) # -> llo

index:

    index(`hello',`llo') # -> 2
    index(`not in string', `xyz') # -> -1

translit:

    define(`ALPHA', `abcdefghijklmnopqrstuvwxyz')
    define(`ALPHA_UPR', `ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    define(`ROT13', `nopqrstuvwxyzabcdefghijklm')
    translit(`abc ebg13', ALPHA, ALPHA_UPR) # -> ABC EBG13
    translit(`abc ebg13', ALPHA, ROT13) # -> nop rot13

GNU m4 includes some additional string macros: regexp, to search for a regular expression in a string, and patsubst, to do find and replace.

Unfortunately, m4's usual approach of rescanning the expansion of a macro can be a problem with macros that operate on strings:

    define(`eng',`engineering')
    substr(`engineer',0,3) # -> eng -> engineering
    translit(`rat', ALPHA, ROT13) # -> eng -> engineering

This is not normally the desired behaviour and is arguably a design bug in m4: the builtins should at least provide some way to allow us to prevent the extracted or transformed substring from being expanded. A workaround is suggested [38]below.
Defining macros with arguments; a recursive macro

In standard m4 (Unix), a macro can have up to 9 arguments; within the macro definition, these are referenced as $1 ... $9. (GNU m4 has no fixed limit on the number of arguments.) Arguments default to the empty string, e.g., if 2 arguments are passed then $3 will be empty. Going in at the deep end, here is a reimplementation of the len builtin (replacing it) as a recursive macro.

    define(`len',`ifelse($1,,0,`eval(1+len(substr($1,1)))')')

In a macro definition, argument references like $1 expand immediately, regardless of surrounding quotes. For example, len(`xyz') above would expand (at the first step) to

    ifelse(xyz,,0,`eval(1+len(substr(xyz,1)))')

Where necessary, this immediate expansion can be prevented by breaking up the reference with inside quotes, e.g., $`'1.

The name of the macro is given by $0; $# expands to the number of arguments. Note in the following example that empty parentheses are treated as delimiting a single argument, an empty string:

    define(`count', ``$0': $# args')
    count # -> count: 0 args
    count() # -> count: 1 args
    count(1) # -> count: 1 args
    count(1,) # -> count: 2 args

$* expands to the list of arguments; $@ does the same but protects each one with quotes to prevent them being expanded:

    define(`list',`$`'*: $*; $`'@: $@')
    list(len(`abc'),`len(`abc')') # -> $*: 3,3; $@: 3,len(`abc')

A common requirement is to process a list of arguments where we don't know in advance how long the list will be. Here, the shift macro comes in useful – it expands to the same list of arguments with the first one removed:

    shift(1,2, `abc', 4) # -> 2,abc,4
    shift(one) # ->
    define(`echolast',`ifelse(eval($#<2),1,`$1`'', `echolast(shift($@))')')
    echolast(one,two,three) # -> three

Scope of macros; local variables

All macros have global scope.

What if we want a “local variable” – a macro that is used only within the definition of another macro?
In particular, suppose we want to avoid accidentally redefining a macro used somewhere else. One possibility is to prefix “local” macro names with the name of the containing macro. Unfortunately, this isn't entirely satisfactory – and it won't work at all in a recursive macro. A better approach is described in the [39]next section.

Pushing and popping macro definitions

For each macro, m4 actually creates a stack of definitions – the current definition is just the one on top of the stack. It's possible to temporarily redefine a macro by using pushdef to add a definition to the top of the stack and, later, popdef to destroy only the topmost definition:

    define(`USED',1)
    define(`proc', `pushdef(`USED',10)pushdef(`UNUSED',20)dnl
    `'`USED' = USED, `UNUSED' = UNUSED`'dnl
    `'popdef(`USED',`UNUSED')')
    proc # -> USED = 10, UNUSED = 20
    USED # -> 1

If the macro hasn't yet been defined then pushdef is equivalent to define. As with undefine, it is not an error to popdef a macro which isn't currently defined; it simply has no effect.

In GNU m4, define(X,Y) works like popdef(X)pushdef(X,Y), i.e., it replaces only the topmost definition on the stack; in some implementations, define(X,Y) is equivalent to undefine(X)define(X,Y), i.e., the new definition replaces the whole stack.

Macros that don't expand without arguments

When GNU m4 encounters a word such as “define” that corresponds to a builtin that requires arguments, it leaves the word unchanged unless it is immediately followed by an opening parenthesis.

    define(`MYMACRO',`text') # ->
    define a macro # -> define a macro

Actually, we can say that m4 does expand the macro – but that it expands only to the same literal string.
We can make our own macros equally intelligent by adding an ifelse – or an extra clause to an existing “ifelse”:

    define(`reverse',`ifelse($1,,, `reverse(substr($1,1))`'substr($1,0,1)')')
    reverse drawer: reverse(`drawer') # -> drawer: reward
    define(`reverse',`ifelse($#,0,``$0'',$1,,, `reverse(substr($1,1))`'substr($1,0,1)')')
    reverse drawer: reverse(`drawer') # -> reverse drawer: reward

Name clashes: making macro names safe

Unfortunately, some macros do not require arguments and so m4 has no way of knowing whether a word corresponding to a macro name is intended to be a macro call or just accidentally present in the text being processed.

Also, other versions of m4, and older versions of GNU m4, may expand macro names which are not followed by arguments even where GNU m4 does not:

    # GNU m4 1.4.10
    we shift the responsibility # -> we shift the responsibility
    # GNU m4 1.4.5
    we shift the responsibility # -> we the responsibility

In general, the problem is dealt with by quoting any word that corresponds to a macro name:

    we `shift' the responsibility # -> we shift the responsibility

However, if you are not fully in control of the text being passed to m4 this can be troublesome. Many macro names, like “changequote”, are unlikely to occur in ordinary text. Potentially more problematic are dictionary words that are recognized as macros even without arguments:
  * divert, undivert (covered [40]below)
  * windows (“windows” – as well as “unix” and “os2” – is defined in some versions of m4 as a way of testing the platform on which m4 is running; by default it is not defined in GNU m4.)

An alternative to quoting macro names is to change all m4's macro names so that they won't clash with anything.
Invoking m4 with the -P command-line option prefixes all builtins with “m4_”:

    define(`M1',`text1')M1 # -> define(M1,text1)M1
    m4_define(`M1',`text1')M1 # -> text1

On the basis that unnecessary changes to a language are generally undesirable, I suggest not using the -P option if you can comfortably avoid it. However, if you are writing a set of m4 macros that may be included by others as a module, do add some kind of prefix to your own macros to reduce the possibility of clashes.

Loops

Although m4 provides no builtins for iteration, it is not difficult to create macros which use recursion to do this. Various implementations can be found on the web. This author's “for” loop is:

    define(`for',`ifelse($#,0,``$0'',`ifelse(eval($2<=$3),1,
    `pushdef(`$1',$2)$4`'popdef(`$1')$0(`$1',incr($2),$3,`$4')')')')
    for n = for(`x',1,5,`x,')... # -> for n = 1,2,3,4,5,...
    for(`x',1,3,`for(`x',0,4,`eval(5-x)') ') # -> 54321 54321 54321

Note the use of pushdef and popdef to prevent loop variables clobbering any existing variable; in the nested for loop, this causes the second x to hide (shadow) the first one during execution of the inner loop.

A “for each” macro might be written:

    define(`foreach',`ifelse(eval($#>2),1,
    `pushdef(`$1',`$3')$2`'popdef(`$1')dnl
    `'ifelse(eval($#>3),1,`$0(`$1',`$2',shift(shift(shift($@))))')')')
    foreach(`X',`Open the X. ',`door',`window') # -> Open the door. Open the window.
    foreach(`X',`foreach(`Y',`Y the X. ',`Open',`Close')',`door',`window') # -> Open the door. Close the door. Open the window. Close the window.
    define(`OPER',``$2 the $1'')
    foreach(`XY',`OPER(XY). ', ``window',`Open'', ``door',`Close'') # -> Open the window. Close the door.
In a “for” loop of either kind, it can be useful to know when you've reached the last item in the sequence:

    define(`foreach',`ifelse(eval($#>2),1,
    `pushdef(`last_$1',eval($#==3))dnl
    `'pushdef(`$1',`$3')$2`'popdef(`$1')dnl
    `'popdef(`last_$1')dnl
    `'ifelse(eval($#>3),1,`$0(`$1',`$2',shift(shift(shift($@))))')')')
    define(`everyone',``Tom',`Dick',`Harry'')
    foreach(`one',`one`'ifelse(last_one,0,` and ')',everyone). # -> Tom and Dick and Harry.

Finally, a simple “while” loop macro:

    define(`while',`ifelse($#,0,``$0'',eval($1+0),1,`$2`'$0($@)')')
    define(`POW2',2)
    while(`POW2<=1000',`define(`POW2',eval(POW2*2))')
    POW2 # -> 1024

Here, the apparently redundant +0 in eval($1+0) does have a purpose: without it, a while without arguments expands to

    ifelse(0,0,``while'',eval() ...

whereupon eval() produces an empty argument warning.

Suspending and discarding output: Buffers and redirection

To discard output – in particular, to prevent newlines in a set of definitions being output – use divert:

    divert(-1)
    divert(0)dnl

Unlike the contents of a comment, the definitions (and any other macros) are still processed by m4; divert(-1) merely causes m4 to do this silently, without sending anything to the output.

The last line above, with its dnl to prevent the following newline being echoed, could also have been written:

    divert`'dnl

divnum expands to the number of the currently active diversion; 0, the default, means standard output (stdout); positive numbers are temporary buffers which are output in numeric order at the end of processing. Standard m4 has 9 buffers (1..9); in GNU m4 there is no fixed limit.

undivert(num) appends the contents of diversion num to the current diversion (normally stdout), emptying it; without arguments, undivert retrieves all diversions in numeric order. Note that undivert() is the same as undivert(0) and has no effect: diversion 0 is stdout which is effectively an empty buffer.
The contents of the buffer are not interpreted when undivert is run; they are simply output as raw text, e.g., the following code results in Z Z Z being output (not 9 9 9):

    divert(1)
    Z Z Z
    divert
    define(`Z',9)
    undivert(1)

There is an implicit divert and undivert when m4 reaches the end of the input, i.e., all buffers are flushed to the standard output. If you want to avoid this for any reason, you can of course discard the contents of the buffers by putting the following line at the end of your input

    divert(-1)undivert

or by exiting using the m4exit builtin.

Including files

include(filename.m4) causes the contents of the named file to be read and interpreted as if it was part of the current file (just like #include in the C preprocessor).

GNU m4 allows for an include file search path. To specify directories to be searched for include files use the -I option on the command line, e.g.,

    m4 -I ~/mydir -Ilocaldir/subdir

or use the environment variable M4PATH, e.g., (bash shell)

    export M4PATH=~/mydir:localdir/subdir
    m4 test.m4

sinclude(nonexistentfile) (silent include) is a version of include that doesn't complain if the file doesn't exist.

To include a file uninterpreted, GNU m4 allows undivert to be passed a filename argument. If inc.m4 contains

    define(`planet',`jupiter')

then

    undivert(`inc.m4') # -> define(`planet',`jupiter')
    planet # -> planet
    include(`inc.m4')planet # -> jupiter

Accessing the shell; creating temporary files

A system command can be passed to the shell, e.g.,

    syscmd(`date --iso-8601|sed s/-/./g')

outputs something like 2007.10.16.
The output from the command sent to syscmd is not interpreted:

    syscmd(`echo "define(\`AUTHOR',\`Orwell')"') # -> define(`AUTHOR',`Orwell')
    AUTHOR # -> AUTHOR

However, GNU m4 provides another macro, esyscmd, that does process the output of the shell command:

    esyscmd(`echo "define(\`AUTHOR',\`Orwell')"') # ->
    AUTHOR # -> Orwell

The macro sysval expands to the exit status of the last shell command issued (0 for success):

    sysval # -> 0
    esyscmd(`ls /no-dir/')
    sysval # -> 2

Naturally, m4 can be used as a filter in shell scripts or interactively:

    echo "eval(98/3)"|m4

outputs 32.

Temporary files can be created to store the output of shell commands: maketemp(prefixXXXXXX) creates a temporary file and expands to the filename – this name will be the (optional) prefix with the six X's replaced by six random letters and digits. In older versions of GNU m4 and in other implementations of m4, the X's are generated from the process ID. In certain contexts, this may be a security hole. Another macro, mkstemp, is available in newer versions of m4 and always generates a random filename extension.

    define(`FILENAME',mkstemp(`/tmp/myscriptXXXXXX'))

The temporary file can be read in using include (perhaps in conjunction with divert).

Debugging

Most bugs relate to problems with quoting so check that first.

If you want to see step-by-step what m4 is doing, either invoke it with the -dV option or, to limit full debug output to one part of the file,

    debugmode(V)
    ...problematic section...
    debugmode

The V flag is for full debugging; other flags for finer control are described in the info pages.

dumpdef(`macro', ...) outputs to standard error the formatted definition of each argument – or just the builtin's name in angle brackets (e.g., <define>) if the macro is a builtin; dumpdef without arguments dumps all definitions to stderr. Nothing is sent to stdout.

For user-defined macros, defn(`macro') expands to the definition string (i.e., not prefixed by the macro name).
errprint(`this message goes to standard error (stderr)')

Aliasing and renaming macros (including builtins)

Suppose we want to allow strlen to be used instead of len. This won't work:

define(`strlen',`len')
strlen(`hello') # -> len

because we forgot to relay the arguments:

define(`strlen',`len($@)')
strlen(`hello') # -> 5

OK, but suppose we want to replace len altogether. Clearly, this doesn't work:

define(`strlen',`len($@)')undefine(`len')
strlen(`hello') # -> len(hello)

since expansion now stops at len.

However, using the builtin defn to access the definition of a macro, it's possible to alias or rename macros quite simply. For user-defined macros, defn expands to the text of the macro (protected with quotes before being output). The defn of a builtin expands in most contexts to the empty string, but when passed as an argument to “define” it expands to a special token that has the desired effect:

define(`rename', `define(`$2',defn(`$1'))undefine(`$1')')
rename(`define',`create')
create(`vehicle',`truck')
vehicle # -> truck
define(`fuel',`diesel') # -> define(fuel,diesel)
fuel # -> fuel

And, because the intelligence is built into the macro definition, m4 is still smart enough not to expand the word “create” unless it is followed by arguments. Compare the indirect approach, where defn is not used:

create a macro # -> create a macro
create(`new',`create($@)')
new(`wheels', 6)
new wheels # -> 6

Accessing internal builtins

Even when you undefine a builtin or define another macro with the same name, GNU m4 still keeps the internal definition, which can be called indirectly via the macro builtin:

define(`TREE',`maple')
undefine(`define',`undefine')
undefine(`TREE') # -> undefine(TREE)
TREE # -> maple
builtin(`undefine',`TREE')
TREE # -> TREE
builtin(`define',`create',`builtin'(``define'',$`'@))
create(`TREE',`ash')
TREE # -> ash

(Note the judicious use of quotes for the last argument to the call to builtin which defines the create macro above.
Because of the use of inner quotes, the usual approach of surrounding the whole argument with quotes, i.e.,

builtin(`define',`create',`builtin(`define',$`'@)')

would not have worked as desired: instead, any call to the create macro would have ended up defining a macro called “$@”.)

Because they can be accessed only indirectly and so don't need to be protected, the names of these internal macros are not changed by the -P flag.

Macros for literal quotes

The obvious way to prevent the characters ` and ' from being interpreted as quotes is to change m4's quote delimiters as described [41]above. This has some drawbacks: for example, to ensure the new delimiters don't accidentally occur anywhere else, more than one character may be used for each delimiter, and if there's a lot of quoting, the code will become more verbose and perhaps more difficult to read.

Another approach is to keep m4's existing quote delimiters and define macros which hide the backtick and apostrophe from m4. The trick is to balance the quotes while m4 still sees them as nested quotes, temporarily change the quoting, and then prevent one of the quotes from being output. The line breaks inside the LQ and RQ definitions matter, because dnl discards everything up to and including the next newline:

define(`LQ',`changequote(<,>)`dnl'
changequote`'')
define(`RQ',`changequote(<,>)dnl`
'changequote`'')
define(myne, `It`'RQ()s mine!')
LQ()LQ()myne'' # -> ``It's mine!''

Indirect macro calls

GNU m4 allows any macro to be called indirectly using the macro indir:

indir(`define',`SIZE',78)
SIZE # -> 78
indir(`SIZE') # -> 78

This is useful where the name of the macro to be called is derived dynamically or where it does not correspond to a token (i.e., a macro name with spaces or punctuation).
Compared to an ordinary call, there are two differences to be aware of:

* the called macro must exist, otherwise m4 issues an error
* the arguments are processed before the definition of the macro being called is retrieved

indir(`define(`SIZE')',67) # -> m4: undefined macro `define(`SIZE')'
indir(`SIZE', indir(`define',`SIZE',53)) # -> 53
indir(`SIZE', indir(`undefine',`SIZE')) # -> m4: undefined macro `SIZE'

We can of course define our own higher-order macros. For example, here is a macro, do, roughly similar to indir [42]above:

define(do, $1($2, $3, $4, $5))
do(`define', ``x'', 4)
x # -> 4

Since extra arguments are normally ignored, do works for any macro taking up to 4 arguments. Note, however, that the example here, which expands to define(`x', 4, , , ), does generate a warning: “excess arguments to builtin `define' ignored”.

Recursion pitfall: nesting limits

Pretend we don't know that the sum n + (n-1) + ... + 1 is given by n*(n+1)/2, and so we define a recursive macro to calculate it:

define(`sigma',`ifelse(eval($1<=1),1,$1,`eval($1+sigma(decr($1)))')')

If too large a number is passed to this macro, m4 may stop with an error message like

ERROR: recursion limit of 1024 exceeded

(for GNU m4 1.4.10). In fact, the problem is not that sigma is recursive; it is the degree of nesting in the expansion, e.g., sigma(1000) will expand to

eval(1000 + eval(999 + eval(998 + eval(997 + ...

The nesting limit could be increased using a command line option (-L). However, we do better to avoid the problem by performing the calculation as we go, using an extra parameter as an accumulator:

define(`sigma',`ifelse(eval($1<1),1,$2,`sigma(decr($1),eval($2+$1))')')

Now, no matter how many steps there are in the expansion, the amount of nesting is limited at every step, e.g., sigma(1000) becomes

ifelse(eval(1000<1),1,,`sigma(decr(1000),eval(+1000))')

which becomes

sigma(999,1000)

which in turn expands to

ifelse(eval(999<1),1,1000,`sigma(decr(999),eval(1000+999))')

and so on.
Here, the default value of the added parameter (an empty string) worked OK. In other cases, an auxiliary macro may be required: the auxiliary macro will then be the recursive one, and the main macro will call it, passing the appropriate initial value for the extra parameter.

Using unexpanding macros for arrays and hashes

Although it is not standard, GNU m4 allows any text string to be defined as a macro. Since only valid identifiers are checked against macros, macros whose names include spaces or punctuation characters will not be expanded. However, they can still be accessed as variables using the defn macro:

define(`my var', `a strange one')
my var is defn(`my var'). # -> my var is a strange one.

This feature can be used to implement arrays and hashes (associative arrays):

define(`_set', `define(`$1[$2]', `$3')')
define(`_get', `defn(`$1[$2]')')
_set(`myarray', 1, `alpha')
_get(`myarray', 1) # -> alpha
_set(`myarray', `alpha', `omega')
_get(`myarray', _get(`myarray',1)) # -> omega
defn(`myarray[alpha]') # -> omega

String macro problem workaround

[43]Above, we noted a problem with the string macros: it's not possible to prevent the string that's returned from being expanded.

Steven Simpson wrote a [44]patch for m4 which fixes the problem by allowing an extra parameter to be passed to string macros; however, this of course means using a non-standard m4.

A less radical fix is to redefine the substr macro as follows. It works by extracting the substring one letter at a time, thus avoiding any unwanted expansion (assuming, of course, that no one-letter macros have been defined):

define(`substr',`ifelse($#,0,``$0'',
$#,2,`substr($@,eval(len(`$1')-$2))',
`ifelse(eval($3<=0),1,,
`builtin(`substr',`$1',$2,1)`'substr(
`$1',eval($2+1),eval($3-1))')')')dnl
define(`eng',`engineering')
substr(`engineer',0,3) # -> eng

To keep it simple, this definition assumes reasonably sensible arguments, e.g., it doesn't allow for substr(`abcdef', -2) or substr(`abc').
Note that, as with the corresponding builtin substr, you may have problems where a string contains quotes, e.g., substr(``quoted'',0,3).

The new version of substr can in turn be used to implement a new version of translit:

define(`translit',`ifelse($#,0,``$0'',
len(`$1'),0,,
`builtin(`translit',substr(`$1',0,1),`$2',`$3')`'translit(
substr(`$1',1),`$2',`$3')')')dnl
define(`ALPHA', `abcdefghijklmnopqrstuvwxyz')
define(`ALPHA_UPR', `ABCDEFGHIJKLMNOPQRSTUVWXYZ')
translit(`alpha', ALPHA, ALPHA_UPR)
# -> ALPHA

M4: Assessment

M4's general character as a macro language can be seen by comparing it to another, very different macro language: [45]FreeMarker.

GNU m4 and FreeMarker are both free in both senses of the word: FreeMarker is covered by a BSD-style license. They are more or less equally “powerful”, e.g., both languages support recursive macros.

In some respects, m4 has an edge over FreeMarker:

* m4 is a standalone tool; FreeMarker requires Java.
* On Unix platforms, m4 is a standard tool with a long heritage, e.g., a Makefile can reasonably expect to be able to invoke it as a filter in a processing sequence.
* m4 scripts can interact with the Unix shell.
* m4 is arguably a simpler, “cleaner” macro language.

The two languages are quite different in appearance and in how they work. In m4, macros are ordinary identifiers; FreeMarker uses XML-like markup for the <#opening> and </#closing> delimiters of macros. While m4's textual rescanning approach is conceptually elegant, it can be confusing in practice and demands careful attention to layers of nested quotes. FreeMarker, in comparison, works like a conventional structured programming language, making it much easier to read, write and debug. On the other hand, FreeMarker markup is more verbose and might seem intrusive in certain contexts, for example, where macros are used to extend an existing programming language.
FreeMarker has several distinct advantages:

* it has an associated tool, FMPP, which can read in data from different sources (e.g., in XML or CSV format) and incorporate it into the template output.
* FreeMarker has a comprehensive set of builtin macros and better data handling capabilities.
* No compatibility issues: there is a single, cross-platform implementation that is quite stable and mature (whereas even recent GNU m4 versions are not strictly backward compatible with one another).
* FreeMarker supports Unicode; m4 is generally limited to ASCII, or at best 8-bit character sets.

Ultimately, which language is “better” depends on the importance of their relative advantages in different contexts. This author has had very positive experience using FreeMarker/FMPP for [46]automatic code generation where, for several reasons, m4 was unsuitable. On the other hand, m4 is clearly a more sensible and appropriate choice for Unix sendmail's configuration macros.

References

1. http://mbreen.com/
2. https://mbreen.com/m4.html#toc1
3. https://mbreen.com/m4.html#toc2
4. https://mbreen.com/m4.html#toc3
5. https://mbreen.com/m4.html#toc4
6. https://mbreen.com/m4.html#toc5
7. https://mbreen.com/m4.html#toc6
8. https://mbreen.com/m4.html#toc7
9. https://mbreen.com/m4.html#quotes
10. https://mbreen.com/m4.html#toc9
11. https://mbreen.com/m4.html#toc10
12. https://mbreen.com/m4.html#toc11
13. https://mbreen.com/m4.html#toc12
14. https://mbreen.com/m4.html#strings
15. https://mbreen.com/m4.html#recursion
16. https://mbreen.com/m4.html#toc15
17. https://mbreen.com/m4.html#pushpop
18. https://mbreen.com/m4.html#ignore_argless_macros
19. https://mbreen.com/m4.html#toc18
20. https://mbreen.com/m4.html#toc19
21. https://mbreen.com/m4.html#redir
22. https://mbreen.com/m4.html#toc21
23. https://mbreen.com/m4.html#toc22
24. https://mbreen.com/m4.html#toc23
25. https://mbreen.com/m4.html#renamingmacros
26. https://mbreen.com/m4.html#builtin
27. https://mbreen.com/m4.html#quotemacro
28. https://mbreen.com/m4.html#indir
29. https://mbreen.com/m4.html#toc28
30. https://mbreen.com/m4.html#unexpandingmacros
31. https://mbreen.com/m4.html#substrfix
32. https://mbreen.com/m4.html#assessment
33. https://mbreen.com/m4.html#assessment
34. https://mbreen.com/m4.html#quotemacro
35. https://mbreen.com/m4.html#unexpandingmacros
36. https://mbreen.com/m4.html#recursion
37. https://mbreen.com/m4.html#quotemacro
38. https://mbreen.com/m4.html#substrfix
39. https://mbreen.com/m4.html#pushpop
40. https://mbreen.com/m4.html#redir
41. https://mbreen.com/m4.html#quotes
42. https://mbreen.com/m4.html#indir
43. https://mbreen.com/m4.html#strings
44. http://www.comp.lancs.ac.uk/~ss/websitemgmt/tools#m4patch
45. http://freemarker.sourceforge.net/
46. http://statestep.com/templates.html