===============================================================================
==========================                         ============================
==========================      PARSE-O-MATIC      ============================
==========================                         ============================
===============================================================================

   Parse-O-Matic is Copyright (C) 1992 by Pinnacle Software,
   CP 386 Mount Royal, Quebec Canada H3P 3C6
   U.S. Office: Box 714 Airport Road, Swanton, Vermont 05488 USA
   Support Line (514) 345-9578 -- Free Files BBS (514) 345-8654

   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

   This is a SHAREWARE product.  That means we would like you to pass
   around unregistered copies to other people.  If you have a modem,
   please upload it to your favourite bulletin board system, or give a
   copy to a friend who you think might need a program like this.

   Shareware means sharing!  Pass it on!

   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

   Design and coding:  Timothy Campbell
   Marketing:          James Ludwick, Paul Renault
   Documentation:      Natasha Mirage
   Distribution:       Kevin Beck
   Support services:   L. Wilson, Z. Krome, P. Katt

===============================================================================
                                 INTRODUCTION
===============================================================================

WHY YOU NEED PARSE-O-MATIC
--------------------------

There are plenty of programs out there that have valuable data locked away
inside them.  How do you get that data OUT of one program and into another
one?

Some programs provide a feature which "exports" a file into some kind of
generic format.  Perhaps the most popular of these formats is known as a
"comma-delimited file", which is a text file in which each data field is
separated by a comma.  Literal strings -- which might contain commas! -- are
represented in double quotes.
So a few lines from a comma-delimited file might look something like this (an
export from a hypothetical database of people who owe your company money):

+----------------------------------------------------------------+
| "JONES","FRED","1234 GREEN AVENUE", "KANSAS CITY", "MO",293.64 |
| "SMITH","JOHN","2343 OAK STREET","NEW YORK","NY",22.50         |
| "WILLIAMS","JOSEPH","23 GARDEN CRESCENT","TORONTO","ON",16.99  |
+----------------------------------------------------------------+

Unfortunately, not all programs export data in this format, and not all
programs READ data in that format.  What's most annoying of all is when one
program exports data in a format that is ALMOST what you need!

If that's the case, you might find that it's worth your while to spend a few
hours in a text editor, modifying the export file so that the other program
can understand it.  Or you might write a program to do the editing for you.
Both solutions are time-consuming.

An even more challenging problem arises when a program which has no export
capability DOES have the ability to "print" reports to a file.  You can write
a program to read these files and convert them to something you can use, but
this can be a LOT of work!

PARSE-O-MATIC TO THE RESCUE!
----------------------------

Parse-O-Matic is a utility that interprets text files and converts them to
other formats.  It can help you "boil down" reports into their essential
data.  You can also use it to convert NEARLY compatible file formats.

HOW IT WORKS
------------

You need three things:

   1) The Parse-O-Matic program
   2) A Parse-O-Matic "POM" file (to tell Parse-O-Matic what to do)
   3) The input file

The input file is usually a report from another program.  We've provided
several examples of typical input files.  For example, the file EXAMPLE2.TXT
comes from the AccPac accounting software.  AccPac is a great program, but
its export capabilities leave something to be desired.  Parse-O-Matic can
help!
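
As an aside on the comma-delimited format described in the introduction: the
sketch below is NOT part of Parse-O-Matic.  It is a small Python illustration
(Python and its standard csv module are our choice for the example, and the
name read_records is hypothetical) of how the three sample records break
down into fields.  Note that the first record has a space after some of its
commas, so the reader is told to skip it.

```python
import csv
import io

# The three sample records from the introduction, copied as shown.
SAMPLE = (
    '"JONES","FRED","1234 GREEN AVENUE", "KANSAS CITY", "MO",293.64\n'
    '"SMITH","JOHN","2343 OAK STREET","NEW YORK","NY",22.50\n'
    '"WILLIAMS","JOSEPH","23 GARDEN CRESCENT","TORONTO","ON",16.99\n'
)


def read_records(text):
    """Split comma-delimited text into a list of field lists.

    skipinitialspace=True ignores a blank that follows a comma, so a
    quoted field such as  "KANSAS CITY"  is still read as one string.
    """
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [row for row in reader if row]


records = read_records(SAMPLE)
for last, first, street, city, region, owed in records:
    print(f"{first} {last} of {city}, {region} owes {float(owed):.2f}")
```

Each record comes back as six fields, with the quotes removed and the amount
still in text form until it is converted.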
===============================================================================
                           HOW TO CODE THE POM FILE
===============================================================================

This documentation assumes that you are an experienced computer user.  If you
have trouble, you might ask a programmer to help you -- POM file creation is
a little like programming!

THE PARSE-O-MATIC COMMAND
-------------------------

The format of the Parse-O-Matic command line is:

   POM pom-file input-file output-file

Here's an example, as you would type it at the DOS command line:

   POM POMFILE.POM REPORT.TXT OUTPUT.TXT

For a more formal description of the command line, start up POM by typing
this command at the DOS prompt:

   POM

THE POM FILE
------------

The POM file is a text file with a .POM extension.  The following conventions
are used when interpreting the POM file:

   - Null lines and lines starting with a semi-colon are ignored.

   - A POM file may contain up to 500 lines of specifications.  Comment
     lines do not count in this total.

A POM file contains no "loops" (to use the programming term).  Each line of
the input file is processed by the entire POM file.  If you'd like it
expressed in terms of programming languages, here's what POM does:

+----------------------------------------------------------------+
| START: If there's nothing left in the input file, go to QUIT.  |
|        Read a line from the input file                         |
|        Do everything in the POM file                           |
|        Go to START                                             |
| QUIT:  Tell the user you're finished!                          |
+----------------------------------------------------------------+

PADDING FOR CLARITY
-------------------

Spaces and tabs between the words and variables in a POM file line are
generally ignored (except in the case of the OUT and OUTEND commands).  You
can use spaces to make the POM file easier to read.

Additionally, in any line in the POM file, the following terms are ignored:

   =   THEN   ELSE

These can be added to make the lines easier to read.
For example, the IF command can be written in any of the following ways:

   Very terse:          IF PRICE "0.00" BONUS "0.00" "1.00"
   Padded with spaces:  IF PRICE   "0.00"      BONUS   "0.00"      "1.00"
   Fully padded:        IF PRICE = "0.00" THEN BONUS = "0.00" ELSE "1.00"

COMMAND WORDS
-------------

For ease of learning, the command words are presented in the following
order:

   MINLEN  SET  IGNORE  ACCEPT  IF  TRIM  PAD  INSERT  CHANGE  OUT  OUTEND

MINLEN number

   Specifies the minimum length a line must be to be considered for parsing.
   (Note that null lines in the input file are always ignored.)

   If you omit the MINLEN command, the minimum length is assumed to be 1.
   That is to say, every line that is at least 1 character long will be
   processed.

   MINLEN is useful for ignoring brief information lines that clutter up a
   report that you are parsing.  For example, in the sample file
   EXAMPLE2.POM, MINLEN is set to 85 so that all lines shorter than 85
   characters are ignored.  This simplifies the coding considerably.

SET var1 value1

   This assigns a value to a variable.  The usual reason to do this is to
   set a variable from the input line ($FLINE) prior to cleaning it up with
   TRIM.  For example:

      SET  NAME = $FLINE[50 59]   (Sets the variable from the input line)
      TRIM NAME "R" " "           (Trims any spaces on the right side)

   We might find a blank-padded string between columns 50 and 59.  Thus:

      SET assigns this value to the variable NAME:  "FRED      "
      After the TRIM, the variable NAME becomes:    "FRED"

   You will also use SET if you plan to include a substring of $FLINE in
   the output, since OUT and OUTEND do not recognize substrings after the
   "|" marker, only complete variables.

IGNORE value1 value2

   When value1 contains value2, the input line is ignored and all further
   processing on the input line stops.  The usual format of this command is
   as in this example:

      IGNORE $FLINE[3 9] = "Date"

   This would skip any input line that contains the word "Date" between
   columns 3 and 9 ($FLINE is the line just read from the input file).
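
To make the column arithmetic concrete, here is a rough Python sketch of the
MINLEN, SET/TRIM and IGNORE behaviour described above.  It is only a model of
what the documentation says, not Parse-O-Matic's actual code: the function
names substr, trim_right and keep_line are hypothetical, and the sketch
assumes columns are counted from 1 with both ends included (so [50 59] is
ten characters wide).

```python
def substr(line, start, end):
    """Model of $FLINE[start end]: columns are 1-based and inclusive."""
    return line[start - 1:end]


def trim_right(value, ch=" "):
    """Model of TRIM var "R" ch: drop ch from the right side only."""
    return value.rstrip(ch)


def keep_line(line, minlen=1):
    """Return False if the line would be skipped, True otherwise.

    Models two rules from the text:
      MINLEN minlen                 lines shorter than minlen are skipped
      IGNORE $FLINE[3 9] = "Date"   skipped when columns 3-9 contain "Date"
    """
    if len(line) < minlen:
        return False
    if "Date" in substr(line, 3, 9):
        return False
    return True


# A line with FRED, blank-padded, in columns 50 to 59.
line = " " * 49 + "FRED      "
name = substr(line, 50, 59)      # the SET step
name = trim_right(name)          # the TRIM step
print(repr(name))

print(keep_line("  Date    Amount"))   # heading line, would be ignored
print(keep_line("  1993    12.50"))    # data line, would be processed
```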
ACCEPT value1 value2

   Accepts the input line if value1 contains value2.  For example, if the
   entire POM file read as follows:

      ACCEPT $FLINE[15 17] "YES"
      OUTEND "X" = "X" |{$FLINE}

   then any input line that contains "YES", starting in column 15, would be
   sent to the output file.  All other lines would be ignored.

   CLUSTERED ACCEPTS:

   Sometimes you have to check more than one value to see if the input line
   is valid.  You do this using "clustered ACCEPTs", which are several
   ACCEPT commands in a row.

   Briefly stated, if you have several ACCEPTs in a row ("clustered"), they
   are all processed to determine if the input line is acceptable or not.
   If even one ACCEPT matches up, the line is accepted.

   To express this in more detail...

   When value1 contains value2, the line is accepted, and processing of the
   POM file continues for that input line, even if the immediately
   following ACCEPTs do NOT produce a match.  After all, we've already got
   a match!

   If value1 does NOT contain value2, Parse-O-Matic looks at the next
   command in the POM file.  If it is not another ACCEPT, the input line is
   ignored.  If it is another ACCEPT, maybe it will produce a match!  So
   Parse-O-Matic moves to that command.

   The following POM file uses clustered ACCEPTs to accept any line that
   contains the name "FRED" or "MARY" between columns 5 and 8, or contains
   the word "MEMBER" between columns 20 and 25.

      SET NAME = $FLINE[5 8]             (Set the variable)
      ACCEPT NAME = "FRED"               (Look for FRED)
      ACCEPT NAME = "MARY"               (Look for MARY)
      ACCEPT $FLINE[20 25] = "MEMBER"    (Look for MEMBER)
      OUTEND "X" = "X" |{$FLINE}         (Output the line if we get this far)

   The following example would NOT work, however:

      ACCEPT $FLINE[20 25] = "MEMBER"
      SET NAME = $FLINE[5 8]
      ACCEPT NAME = "FRED"
      ACCEPT NAME = "MARY"
      OUTEND "X" = "X" |{$FLINE}

   It would not work because the ACCEPTs are not clustered; if the first
   ACCEPT fails, the input line will be rejected as soon as the SET command
   is encountered.  The next two ACCEPTs would not be reached in that case.
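
The clustering rule can be easier to follow as a few lines of code.  The
Python sketch below is our own model of the rule as described above, not the
program's real logic.  The name line_survives and the tuple layout are
hypothetical, every command other than ACCEPT is lumped together as "OTHER",
and for brevity each ACCEPT reads its columns straight from the input line
instead of going through a SET first.

```python
def substr(line, start, end):
    """Columns start..end of line, counted from 1, both ends included."""
    return line[start - 1:end]


def line_survives(line, commands):
    """Apply the clustered-ACCEPT rule to one input line.

    commands is a list of tuples: ("ACCEPT", start, end, text) or ("OTHER",).
    Returns True if processing would reach the end of the POM file.
    """
    i = 0
    while i < len(commands):
        cmd = commands[i]
        if cmd[0] == "ACCEPT":
            _, start, end, text = cmd
            if text in substr(line, start, end):
                # A match: skip the rest of this cluster and carry on.
                i += 1
                while i < len(commands) and commands[i][0] == "ACCEPT":
                    i += 1
                continue
            next_is_accept = (i + 1 < len(commands)
                              and commands[i + 1][0] == "ACCEPT")
            if not next_is_accept:
                return False      # cluster exhausted without a match
        i += 1
    return True


CLUSTERED = [("ACCEPT", 5, 8, "FRED"), ("ACCEPT", 5, 8, "MARY"),
             ("ACCEPT", 20, 25, "MEMBER"), ("OTHER",)]
SCATTERED = [("ACCEPT", 20, 25, "MEMBER"), ("OTHER",),
             ("ACCEPT", 5, 8, "FRED"), ("ACCEPT", 5, 8, "MARY"), ("OTHER",)]

fred_line = "    FRED"
member_line = " " * 19 + "MEMBER"

print(line_survives(fred_line, CLUSTERED))    # FRED matches in the cluster
print(line_survives(member_line, CLUSTERED))  # MEMBER matches in the cluster
print(line_survives(fred_line, SCATTERED))    # rejected at the first ACCEPT
```

In the scattered arrangement the first ACCEPT stands alone, so a line
without MEMBER never gets as far as the FRED and MARY tests.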
IF value1 value2 var1 value3 [value4]

   If value1 contains value2, var1 is set to value3.  Otherwise, it is set
   to value4.  If value4 is missing, nothing is done (i.e. var1 is not
   changed).

   Here's an example of the IF command...

      SET EARNING = $FLINE[20 26]
      TRIM EARNING "A" " "
      IF EARNING = "0.00" THEN BONUS = "0.00" ELSE "1.00"

   This would obtain the value between columns 20 and 26, remove any
   spaces, then check if it equals "0.00".  If it does, the variable BONUS
   would be set to "0.00".  If not, BONUS would be set to "1.00".

TRIM var1 spec1 character

   Removes characters from var1.  This is usually used to remove blanks.

   spec1 can be:

      A = All
      B = Both ends
      L = Left side only
      R = Right side only

   For example:

      SET PRICE = $FLINE[20 26]
      TRIM PRICE "A" ","
      TRIM PRICE "L" "$"

   This would remove all commas from the variable PRICE, and remove the
   leading dollar sign.  Thus:

      If the input contained the string:    "$25,783"
      The first TRIM would change it to:    "$25783"
      The second TRIM would change it to:   "25783"

PAD var1 spec1 character len

   Makes var1 a particular length, padded with a particular character.

      spec1      is "L", "R", or "C" (Left, Right or Center)
      character  is the character used to pad the string
      len        is the desired string length

   For example, if the variable ABC is set to "1234" ...

      PAD ABC "L" "0" "7"   left-pads it 7 characters wide with zeros
                            ("0001234")
      PAD ABC "R" " " "5"   right-pads it 5 characters wide with spaces
                            ("1234 ")
      PAD ABC "C" "*" "8"   would center it, 8 wide, with asterisks
                            ("**1234**")

   If len is less than the current length of the string, the string is
   left unchanged.  For example, if you set variable XYZ to "PINNACLE",
   then PAD XYZ "R" " " "3" would leave the string as-is ("PINNACLE").  If
   it was your intention to make XYZ 3 letters long, it would be better to
   use the SET command:

      SET XYZ = XYZ[1 3]

INSERT var1 spec1 value1

   Inserts text on the left or right of var1, or at a "found text"
   position.

      spec1   is "L" or "R" (Left or Right) or a find-string (e.g.
              "@HELLO")
      value1  is the value to be inserted

   For example, if the variable ABC is set to "ParseOMatic", then

      INSERT ABC "L" "Register "
         would set ABC to "Register ParseOMatic"

      INSERT ABC "R" " is super!"
         would set ABC to "ParseOMatic is super!"

   You can use a find-string to insert text at the first occurrence of the
   text you specify.  For example:

      INSERT ABC "@OMatic" "-"
         would set ABC to "Parse-OMatic"

   If the find-string is not found, nothing is done.

CHANGE var1 value1 value2

   Replaces ALL occurrences of value1 with value2.  This is more powerful
   than TRIM, but is not as efficient.  Here is an example of CHANGE:

      SET DATE = $FLINE[31 38]
      CHANGE DATE "/" "--"

      If the input contained the string:  "93/10/15"
      The CHANGE would convert it to:     "93--10--15"

OUT[END] value1 value2 |output-picture

   This is actually two command words: OUT and OUTEND.  OUT writes to the
   output file without an end-of-line.  OUTEND does the same, then writes
   an end-of-line to the file.

   When value1 matches value2, a line is output to the output file,
   according to the output picture.

   Within the output picture, all text is taken literally (i.e. " is taken
   to mean literally that -- a quotation mark character).  The only
   exception to this is that variable names are identified by the { and }
   characters.

   For example, a POM file that contained the following single line:

      OUTEND "X" = "X" |{$FLUPC}

   would simply output in uppercase every line in the input file.

   NOTE: OUT does not write immediately to the output file; it accumulates
   the output until it reaches 255 characters before writing.  You must do
   an OUTEND to ensure that the data is actually written.  No single OUT or
   OUTEND command can output more than 255 characters.

   You cannot use substrings after the "|" marker.
   Thus, the following line is NOT legal:

      OUTEND $FLINE[1 3] = "IBM" |{$FLINE[1 15]}

   The correct way to code this is as follows:

      SET CODE = $FLINE[1 15]
      OUTEND $FLINE[1 3] = "IBM" |{CODE}

   This would output the first 15 characters of any line that contains the
   letters IBM in the first three positions.

VALUES
------

A value can be specified in the following ways:

   "text"               A literal text string
   VARNAME              The name of a variable
   VARNAME[start end]   A substring of a variable
   VARNAME[start]       A single character
   VARNAME+             Increments variable (see explanation below)

Variable names can be up to 8 characters long.  There is no distinction
between upper and lower case.  You can create up to 225 variables and
literals (this number includes the predefined variables).

Parse-O-Matic predefines several variables.  They are:

   $FLINE = The line just read from the file
   $FLUPC = The line just read from the file, in uppercase
   $BRL   = The { character (used in OUT)
   $BRR   = The } character (used in OUT)

DELIMITERS
----------

If you need to specify a quotation mark, use "".  For example:

   IGNORE $FLINE "He said ""Hello"" to me."

This would ignore lines containing:  He said "Hello" to me.

ILLEGAL CHARACTERS
------------------

No command can contain these ASCII characters:

   HEX   DECIMAL   NAME
   ---   -------   ----
   $00      0      NULL
   $0A     10      LF
   $0D     13      CR

Of course, LF and CR do appear at the very end of each line.

INCREMENTING
------------

Only numeric incrementing is supported at this time.  Attempting to
increment another type of variable will result in an error.

   - Incrementing "1" gives you "2"
   - Incrementing "9" gives you "10"

TRACING
-------

By setting the DOS variable POM to ALL, you can generate a trace file, named
POM.TRC.  This is helpful if you have trouble understanding why your file
isn't being parsed properly.  But be sure to test it with a SMALL input
file.  The trace is quite detailed, and it can easily generate a HUGE output
file.
To save space, you can specify a particular list of variables to be traced,
rather than tracing everything.  For example, to trace only the variable
PRICE, you would enter this DOS command:

   SET POM=PRICE

To trace several variables, separate the variable names by slashes, as in
this example:

   SET POM=PRICE/BONUS/NAME
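
As a closing recap of the string commands covered under COMMAND WORDS (TRIM,
PAD, INSERT, CHANGE) and of INCREMENTING, here is a Python sketch that
reproduces the worked examples from this documentation.  It is an
illustration only: the function names are hypothetical, Python's string
methods stand in for whatever Parse-O-Matic does internally, and behaviour
the text does not describe (such as where the extra pad character goes when
centering by an odd amount) is simply whatever Python happens to do.

```python
def trim(value, spec, ch):
    """TRIM: remove ch from All of value, Both ends, Left or Right side."""
    if spec == "A":
        return value.replace(ch, "")
    if spec == "B":
        return value.strip(ch)
    if spec == "L":
        return value.lstrip(ch)
    if spec == "R":
        return value.rstrip(ch)
    raise ValueError(f"unknown TRIM spec: {spec!r}")


def pad(value, spec, ch, length):
    """PAD: widen value to length with ch; longer strings are unchanged."""
    if spec == "L":
        return value.rjust(length, ch)    # padding goes on the left
    if spec == "R":
        return value.ljust(length, ch)    # padding goes on the right
    if spec == "C":
        return value.center(length, ch)   # odd-pad placement is Python's
    raise ValueError(f"unknown PAD spec: {spec!r}")


def insert(value, spec, text):
    """INSERT: add text on the Left, on the Right, or before "@find"."""
    if spec == "L":
        return text + value
    if spec == "R":
        return value + text
    if spec.startswith("@"):
        pos = value.find(spec[1:])
        if pos < 0:
            return value                  # find-string absent: no change
        return value[:pos] + text + value[pos:]
    raise ValueError(f"unknown INSERT spec: {spec!r}")


def change(value, old, new):
    """CHANGE: replace ALL occurrences of old with new."""
    return value.replace(old, new)


def increment(value):
    """VARNAME+: numeric incrementing only; anything else is an error."""
    if not value.isdigit():
        raise ValueError(f"cannot increment non-numeric value {value!r}")
    return str(int(value) + 1)


price = trim(trim("$25,783", "A", ","), "L", "$")
print(price)
print(pad("1234", "L", "0", 7))
print(insert("ParseOMatic", "@OMatic", "-"))
print(change("93/10/15", "/", "--"))
print(increment("9"))
```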