Chapter 19. Report Writing with Pictures

Perl is the Practical Extraction and Report Language. After you have practically extracted and manipulated all the data in your file, you may want to write a formatted report to categorize and summarize this information. If you have written reports with the awk programming language, you may find Perl's formatting a little peculiar at first.
19.1. The Template
In order to write a report, Perl requires that you define a template to describe visually how the report will be displayed; that is, how the report is to be formatted. Do you want left-justified, centered, or right-justified columns? Do you have numeric data that needs formatting? Do you want a title on the top of each page or column titles? Do you have some summary data you want to print at the end of the report?
We'll start with a simple template for a simple report and build on that until we have a complete example.
A format template is structured as follows:
format FILEHANDLE=
picture line
value line (text to be formatted)
.
write;
19.1.1. Steps in Defining the Template
The steps for defining a template are as follows.
1. Start the format with the keyword format, followed by the name of the output filehandle and an equal sign. The default filehandle is STDOUT, the screen. The template definition can be anywhere in your script.
2. Although any text in the template will be printed as is, the template normally consists of a picture line to describe how the output will be displayed. The picture consists of symbols that describe the type of the fields (see Table 19.1). The fields are centered, left justified, or right justified (see Table 19.2). The picture line can also be used to format numeric values.
Table 19.1. Field Designator Symbols
Field Designator | Purpose |
---|---|
@ | Indicates the start of a field |
@* | Used for multiline fields |
^ | Used for filling fields |
Table 19.2. Field Display Symbols
Type of Field Symbol | Type of Field Definition |
---|---|
< | Left justified |
> | Right justified |
\| | Centered |
# | Numeric |
. | Indicates placement of decimal point |
3. After the field designator, the @ symbol, the type of field symbol is repeated as many times as there will be characters of that type. This determines the size of the field. The @ itself counts as one character, so if >>>>>> is placed directly after the @ symbol, it describes a seven-character right-justified field. Strange, huh? Do not place literal text directly after the @ symbol, or the field type will not be interpreted.
4. After the picture line, which breaks the line into fields, comes the value line, the text that will be formatted as described by the picture. The values are separated by commas and correspond, one to one, with the fields in the picture line. Any whitespace in the value line is ignored.
5. When you are finished creating the template, a period (.) on a line by itself terminates the template definition.
6. After the template has been defined, the write function invokes the format and sends the formatted records to the specified output filehandle. For now, the default filehandle is STDOUT.
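The field symbols in Tables 19.1 and 19.2 can be compared side by side with a minimal sketch. The filehandle name DEMO, the variables, and the sample values are hypothetical, and the report is written to an in-memory file (an assumption made only so the result can be inspected as a string) rather than to the screen.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample values, not taken from the chapter's data files.
our ($item, $qty, $price) = ("widget", 7, 3.5);

# Write the report into a string instead of onto the screen.
my $report = '';
open(DEMO, '>', \$report) or die "in-memory file: $!\n";

# Each text field is 8 characters wide (the @ plus seven symbols).
# The brackets are literal text that make the field edges visible.
format DEMO =
[@<<<<<<<] [@|||||||] [@>>>>>>>] [@##] [@##.##]
$item,     $item,     $item,     $qty, $price
.

write DEMO;
close(DEMO);
print $report;
```

If it runs as intended, the single output line shows the same word left justified, centered, and right justified, followed by the two numbers right aligned in their numeric fields:

[widget  ] [ widget ] [  widget] [  7] [  3.50]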
Example 19.1.
(The Script)
#!/bin/perl
1 $name="Tommy";
$age=25;
$salary=50000.00;
$now="03/14/97";
# Format Template
2 format STDOUT=
3 ---------------------REPORT-------------------------
4 Name: @<<<<<< Age:@##Salary:@#####.## Date:@<<<<<<<<<<
5 $name, $age, $salary, $now
6 .
# End Template
7 write;
8 print "Thanks for coming. Bye.\n";
(Output)
---------------------REPORT-------------------------
Name: Tommy Age: 25 Salary: 50000.00 Date:03/14/97
Thanks for coming. Bye.
Explanation
2. The keyword format is followed by STDOUT, the default and currently selected filehandle, followed by an equal sign.
3. This line will be printed as is. Any text in the format that is not specifically formatted will print as is.
4. This line is called the picture line. It is a picture of how the output will be formatted. The @ defines the start of a field. There will be four fields. The first one is a left-justified seven-character field preceded by the string Name: and followed by the string Age:. The second field is a three-character numeric field followed by Salary:. The third field is numeric, with six positions before the decimal point and two after it, followed by the string Date:. The fourth field is an eleven-character left-justified field.
5. These are the variables that are formatted according to the picture. Each variable is separated by a comma and corresponds to the picture field above it.
6. The dot ends the template definition.
7. The write function will invoke the template to display the formatted output to STDOUT.
19.1.2. Changing the Filehandle
If you want to write the report to a file instead of to the screen, the file is assigned to a filehandle when it is opened. This same filehandle is used for the format filehandle when defining the template. To invoke the format, the write function is called with the name of the output filehandle as an argument.
Example 19.2.
(The Script)
#!/bin/perl
1 $name="Tommy";
$age=25;
$salary=50000.00;
$now="05/21/07";
2 open(REPORT, ">report" ) || die "report: $!\n";
# REPORT filehandle is opened for writing
3 format REPORT= # REPORT is also used for the format filehandle
-----------------------
| EMPLOYEE INFORMATION |
-----------------------
4 Name: @<<<<<<
5 $name
-----------------------
Age:@###
$age
-----------------------
Salary:@#####.##
$salary
-----------------------
Date:@>>>>>>>>>>
$now
-----------------------
6 .
7 write REPORT; # The write function sends output to the file
# associated with the REPORT filehandle
(Output)
-----------------------
| EMPLOYEE INFORMATION |
-----------------------
Name: Tommy
-----------------------
Age: 25
-----------------------
Salary: 50000.00
-----------------------
Date: 05/21/07
-----------------------
Explanation
2. The file report is opened and attached to the REPORT filehandle.
3. The format keyword is followed by the output filehandle REPORT.
4. The picture line describes the text string Name:, followed by a seven-character left-justified field.
5. The scalar variable $name will be formatted as described in the picture line corresponding to it (see above).
6. The dot ends the format template definition.
7. The write function will invoke the format called REPORT and write formatted output to that filehandle. If the filehandle is omitted, write uses the currently selected filehandle (STDOUT by default), so the file attached to REPORT does not receive the formatted output.
19.1.3. Top-of-the-Page Formatting
In the following example, the title, EMPLOYEE INFORMATION, is printed each time the format is invoked. It might be preferable to print the title only at the top of each page. Perl allows you to define a top-of-the-page format that will be invoked only when a new page is started. The default length for a page is 60 lines. When a page is full, Perl prints the top-of-the-page format at the top of the next page. (The default length can be changed by setting the special variable $= to another value.) In the following example, the write function sends all output to STDOUT each time the while loop is entered.
The example is shown before top-of-the-page formatting is applied.
Example 19.3.
(The File)
$ cat datafile
Tommy Tucker:55:50000:5/19/66
Jack Sprat:44:45000:5/6/77
Peter Piper:32:35000:4/12/93
(The Script)
#!/bin/perl
1 open(DB, "datafile" ) || die "datafile: $!\n";
2 format STDOUT=
-----------------------
| EMPLOYEE INFORMATION |
-----------------------
Name: @<<<<<<<<<<<<
$name
-----------------------
Age: @##
$age
-----------------------
Salary: @#####.##
$salary
-----------------------
Date: @>>>>>>>>>>
$start
.
3 while(<DB>){
4 ($name, $age, $salary, $start)=split(":");
5 write ;
}
(Output)
-----------------------
| EMPLOYEE INFORMATION |
-----------------------
Name: Tommy Tucker
-----------------------
Age: 55
-----------------------
Salary: 50000.00
-----------------------
Date: 5/19/66
-----------------------
| EMPLOYEE INFORMATION |
-----------------------
Name: Jack Sprat
-----------------------
Age: 44
-----------------------
Salary: 45000.00
-----------------------
Date: 5/6/77
-----------------------
| EMPLOYEE INFORMATION |
-----------------------
Name: Peter Piper
-----------------------
Age: 32
-----------------------
Salary: 35000.00
-----------------------
Date: 4/12/93
Explanation
1. The file datafile is opened for reading via the DB filehandle.
2. The format for STDOUT is created with picture lines and data.
3. The while loop reads one line at a time from the DB filehandle.
4. Each line is split on colons into a list of scalars.
5. The write function invokes the STDOUT format and sends the formatted line to STDOUT.
The format for top-of-the-page formatting follows.
format STDOUT_TOP=
picture line
value line (text to be formatted)
. (End of template)
The keyword format is followed by the name of the filehandle appended with an underscore and the word TOP. If a picture line is included, the value line consists of the formatted text. Any text not formatted by a picture is printed literally. The period (.) terminates the top-of-page format template.
The $% is a special Perl variable that holds the number of the current page.
The following example shows the results of top-of-the-page formatting.
Example 19.4.
(The Script)
#!/bin/perl
1 open(DB, "datafile" ) || die "datafile: $!\n";
2 format STDOUT_TOP=
3 -@||-
4 $%
-----------------------
5 | EMPLOYEE INFORMATION |
-----------------------
6 .
7 format STDOUT=
Name: @<<<<<<<<<<<<<
$name
-----------------------
Age: @##
$age
-----------------------
Salary: @#####.##
$salary
-----------------------
Date: @>>>>>>>
$start
.
8 while(<DB>){
9 ($name, $age, $salary, $start)=split(":");
10 write;
}
(Output)
- 1 -
-----------------------
| EMPLOYEE INFORMATION |
-----------------------
Name: Tommy Tucker
-----------------------
Age: 55
-----------------------
Salary: 50000.00
-----------------------
Date: 5/19/66
Name: Jack Sprat
-----------------------
Age: 44
-----------------------
Salary: 45000.00
-----------------------
Date: 5/6/77
Name: Peter Piper
-----------------------
Age: 32
-----------------------
Salary: 35000.00
-----------------------
Date: 4/12/93
Example 19.5.
(The Script)
#!/bin/perl
1 open(DB, "datafile" ) || die "datafile: $!\n";
2 open(OUT, ">outfile" )|| die "outfile: $!\n";
3 format OUT_TOP= # New filehandle
4 -@||-
5 $%
-----------------------
| EMPLOYEE INFORMATION |
-----------------------
.
format OUT=
Name: @<<<<<<<<<<<<<
$name
-----------------------
Age: @##
$age
-----------------------
Salary: @#####.##
$salary
-----------------------
Date: @>>>>>>>
$start
-----------------------
.
while(<DB>){
($name, $age, $salary, $start)=split(":");
6 write OUT;
}
(Output)
$ cat outfile
- 1 -
-----------------------
| EMPLOYEE INFORMATION |
-----------------------
Name: Tommy Tucker
-----------------------
Age: 55
-----------------------
Salary: 50000.00
-----------------------
Date: 5/19/66
-----------------------
Name: Jack Sprat
-----------------------
Age: 44
-----------------------
Salary: 45000.00
-----------------------
Date: 5/6/77
-----------------------
Name: Peter Piper
-----------------------
Age: 32
-----------------------
Salary: 35000.00
-----------------------
Date: 4/12/93
-----------------------
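The reports above are shorter than one page, so the top-of-the-page format is printed only once. The following is a minimal sketch of what happens when a page fills up, assuming a deliberately tiny page length of four lines set through $=. The filehandle name PAGED, the in-memory output file, and the fourth name in the list are hypothetical choices made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $name;                       # value formatted by the body template
my $report = '';
open(PAGED, '>', \$report) or die "in-memory file: $!\n";

format PAGED_TOP =
-- Page @< --
$%
.

format PAGED =
Name: @<<<<<<<<<<<<
$name
.

# $= applies to the currently selected filehandle, so select PAGED first.
my $old = select(PAGED);
$= = 4;                          # hypothetical page length: 4 lines, header included
select($old);

# Four one-line records: three fit under the header, the fourth starts page 2.
for my $person ("Tommy Tucker", "Jack Sprat", "Peter Piper", "Mary Lamb") {
    $name = $person;
    write PAGED;
}
close(PAGED);

# Show the form feed that Perl emits between pages as a visible marker.
(my $visible = $report) =~ s/\f/<form feed>\n/g;
print $visible;
```

With a one-line header and a four-line page, three records fit on each page. Before the second header, Perl writes the contents of $^L, which is a form feed by default.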
19.1.4. The select Function
The select function is used to set the default filehandle for the print and write functions. When you have selected a particular filehandle with the select function, the write or print functions do not require an argument. The selected filehandle becomes the default when a format is invoked or when the print function is called.
The select function returns the name of the previously selected filehandle, which can be saved in a scalar and passed back to select later to restore it.
If you have a number of formats with different names, the $~ variable is used to hold the name of the report format for the currently selected output filehandle. The write and print functions will send their output to the currently selected output filehandle.
The $^ variable holds the name of the top-of-page format for the currently selected output filehandle.
The $. variable holds the current record (input line) number of the last filehandle that was read (similar to the NR variable in awk).
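As a minimal sketch of how select and $~ work together, the script below sends two differently named formats to one filehandle. The names OUT, DETAIL, and TOTAL, the sample figures, and the in-memory output file are all hypothetical and chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our ($label, $value);            # values shared by both formats
my $report = '';
open(OUT, '>', \$report) or die "in-memory file: $!\n";

format DETAIL =
@<<<<<<<< @###.##
$label,   $value
.

format TOTAL =
=================
@<<<<<<<< @###.##
$label,   $value
.

# select returns the name of the handle that was selected before.
my $previous = select(OUT);

my $sum = 0;
$~ = "DETAIL";                   # body format for the selected handle
for my $row (["rent", 950], ["food", 310.5]) {
    ($label, $value) = @$row;
    $sum += $value;
    write;                       # no argument: uses the selected handle
}

$~ = "TOTAL";                    # switch formats, same filehandle
($label, $value) = ("total", $sum);
write;

select($previous);               # restore the earlier default handle
close(OUT);
print "Previously selected: $previous\n";
print $report;
```

Because $~ belongs to the currently selected filehandle, it is assigned only after select(OUT) has been called; assigning it earlier would change the format used for STDOUT instead.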
Example 19.6.
(The Script)
#!/usr/bin/perl
# Write an awklike report
1 open(MYDB, "> mydb") || die "Can't open mydb: $!\n";
2 $oldfilehandle= select(MYDB);
# MYDB is selected as the filehandle for write
3 format MYDB_TOP =
DATEBOOK INFO
Name Phone Birthday Salary
____________________________________________________
.
4 format MYDB =
@<<<@<<<<<<<<<<<<<<<<@<<<<<<<<<<<<@|||||||||@#######.##
$., $name, $phone, $bd, $sal
.
5 format SUMMARY =
____________________________________________________
6 The average salary for all employees is $@######.##.
$total/$count
The number of lines left on the page is @###.
$-
The default page length is @###.
$=
7 .
open(DB,"datebook") || die "Can't open datebook: $!\n";
while(<DB>){
( $name, $phone, $address, $bd, $sal )=split(/:/);
8 write ;
$count++;
$total+=$sal;
}
close DB;
9 $~="SUMMARY"; # New report format for MYDB filehandle
10 write ;
11 select ($oldfilehandle); # STDOUT is now selected for further
# writes or prints
12 print "Report Submitted On " , `date`;
(Output)
16 Report Submitted On Sat Mar 26 11:52:04 PST 2001
(The Report)
$ cat mydb
DATEBOOK INFO
Name Phone Birthday Salary
1 Betty Boop 245-836-8357 6/23/23 14500.00
2 Igor Chevsky 385-375-8395 6/18/68 23400.00
3 Norma Corder 397-857-2735 3/28/45 245700.00
. . .
25 Paco Gutierrez 835-365-1284 2/28/53 123500.00
26 Ephram Hardy 293-259-5395 8/12/20 56700.00
27 James Ikeda 834-938-8376 12/1/38 45000.00
The average salary for all employees is $82572.50.
The number of lines left on the page is 32.
The default page length is 60.
Explanation
1. The filehandle MYDB is opened for writing.
2. The select function sets the default filehandle for the write and print functions to the filehandle MYDB. The scalar $oldfilehandle is assigned the name of the previously selected filehandle, which in this example is the default, STDOUT.
3. The top-of-the-page template is defined for filehandle MYDB.
4. The format for the body of the report is set for filehandle MYDB.
5. Another format template is defined with a new name, SUMMARY. This format can be invoked by assigning the format name SUMMARY to the special variable $~. (See line 9.)
6. The picture line is defined.
7. The format template is terminated.
8. The format is invoked and output is written to the currently selected filehandle, MYDB.
9. The $~ variable is assigned the new format name. This format will be used for the currently selected filehandle, MYDB.
10. The write function invokes the format SUMMARY for the currently selected filehandle, MYDB.
11. The select function sets the filehandle to the value of $oldfilehandle, STDOUT. Future write and print functions will send their output to STDOUT unless another output filehandle is selected.
12. This line is sent to the screen.
19.1.5. Multiline Fields
If a value contains embedded newlines, the @* field designator is used to allow multiline fields. It is placed in a format template on a line by itself, followed on the next line by the multiline value.
Example 19.7.
(The Script)
#!/bin/perl
1 $song="To market,\nto market,\nto buy a fat pig.\n";
2 format STDOUT=
3 @*
4 $song
5 @*
6 "\nHome again,\nHome again,\nJiggity, Jig!\n"
.
write;
(Output)
To market,
to market,
to buy a fat pig.
Home again,
Home again,
Jiggity, Jig!
Explanation
1. The scalar $song contains newlines.
2. The format template is set for STDOUT.
3. The @* fieldholder denotes that a multiline field will follow.
4. The value line contains the scalar $song, which evaluates to a multiline string.
5. The @* fieldholder denotes that a multiline field will follow.
6. The value line contains a string embedded with newline characters.
19.1.6. Filling Fields
The caret (^) fieldholder allows you to create a filled paragraph containing text that will be placed according to the picture specification. If there is more text than will fit on a line, the text will wrap to the next line, etc., until all lines are printed in a paragraph block format. Each line of text is broken into words. Perl will place as many words as will fit on a specified line. The value line variable can be repeated over multiple lines. Only the remaining text for each line is printed rather than reprinting the entire value over again. If the number of value lines is more than the actual number of lines to be formatted, blank lines will appear.
Extra blank lines can be suppressed by placing the special tilde (~) character, called the suppression indicator, on the picture line; a line whose fields all turn out to be empty is then not printed. If two consecutive tildes (~~) are placed on the picture line, that line is repeated, and the field that is to be filled (preceded by a ^) will continue filling until all the text has been placed in the paragraph.
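The filling behavior can be tried without keyboard input. The following is a minimal sketch, assuming a fixed sample sentence and a hypothetical filehandle PARA attached to an in-memory file. It also shows a side effect worth knowing: a ^ field removes the text it prints from the variable, so a copy is saved first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text; any sentence without very long words will do.
our $text = "Perl will place as many words as will fit on each line of the field.";
my $original = $text;            # keep a copy: ^ fields consume $text

my $report = '';
open(PARA, '>', \$report) or die "in-memory file: $!\n";

# First line: a label plus a 20-character fill field.
# Second line: ~~ repeats the line until $text is used up.
format PARA =
Note: ^<<<<<<<<<<<<<<<<<<<
$text
~~    ^<<<<<<<<<<<<<<<<<<<
$text
.

write PARA;
close(PARA);

print $report;
print "Left in \$text afterward: '$text'\n";
```

The tildes themselves are printed as spaces, which is why the continuation lines stay aligned under the first field. Because the variable is emptied as it is printed, pass a copy when the original value is still needed after the write.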
Example 19.8.
(The Script)
#!/bin/perl
$name="Hamlet";
print "What is your favorite line from Hamlet? ";
1 $quote = <STDIN>;
2 format STDOUT=
3 Play: @<<<<<<<<<< Quotation: ^<<<<<<<<<<<<<<<<<<
4 $name, $quote
5 ^<<<<<<<<<<<<<<<<<<
$quote
^<<<<<<<<<<<<<<<<<<
$quote
6 ~ ^<<<<<<<<<<<<<<<<<<
$quote
.
write;
(Output)
What is your favorite line from Hamlet? To be or not to be, that is the question:
Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune...
Play: Hamlet Quotation: To be or not to be, that is the
question: Whether 'tis nobler in the
mind to suffer the slings and arrows
of outrageous fortune...
Explanation
1. The user is asked for input and should type a line from Shakespeare's Hamlet. (The line wraps.)
2. The format template for STDOUT is defined.
3. The picture line contains two fields, one for the name of the play, $name, or Hamlet, and one for the line of user input, $quote. The ^ fieldholder is used to create a filled paragraph. The quote will be broken up into words that will fit over four lines. If there is more text than the four picture lines can hold, the remainder will not be printed. If there is less text than lines, the blank line will be suppressed due to the ~ character preceding the last picture line.
4. The value line contains the variables to be formatted according to the picture above them.
5. The second line of text from $quote is placed here if all of it did not fit on the first line.
6. If we run out of text after formatting three lines, the blank line will be suppressed.
Example 19.9.
(The Script)
#!/bin/perl
$name="Hamlet";
print "What is your favorite line from Hamlet? ";
$quote = <STDIN>;
format STDOUT=
Play: @<<<<<<<<<< Quotation: ^<<<<<<<<<<<<<<<<<<
1 $name, $quote
2 ~~ ^<<<<<<<<<<<<<<<<<<
$quote
.
3 write;
(Output)
What is your favorite line from Hamlet? To be or not to be, that is the question:
Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune...
Play: Hamlet Quotation: To be or not to be,
that is the
question: Whether
'tis nobler in the
mind to suffer the
slings and arrows of
outrageous fortune...
Explanation
1. The value line is set. It will contain the name of the play and the quotation.
2. Using two tildes (suppression indicators) tells Perl to continue filling the paragraph until all of the text in $quote is printed or a blank line is encountered.
3. The write function invokes the format.
19.1.7. Dynamic Report Writing
Now that you know how to create a report, you can dynamically change the width of the fields in the template on demand. First, the designated field types are assigned to variables, and Perl's repeat string operator is used to designate the number of characters per field. This number can be assigned different values depending on how wide the field will be. The report template will be assigned to a string consisting of variables to represent the field type, number of characters, and field values. When needed, the format string can be interpreted by Perl's eval function. See Example 19.10.
Example 19.10.
(The Script)
1 open(FH, "datebook") or die; # Open a file for reading
2 open(SORT, "|sort") or die; # Open a pipe to sort output
3 $field1="<" x 18; # Create format strings
$field2="<" x 12;
$field3="|" x 10;
$field4="#" x 6 . ".##";
# Create the format template
4 $format=qq(
5 format SORT=
6 \@$field1\@$field2\@$field3\@$field4
7 \$name, \$phone, \$birth, \$sal
.
);
8 eval $format;
9 while(<FH>){
($name,$phone,$address,$birth,$sal)=split(":");
($first, $last)=split(" ", $name);
$name=$last.", ". $first;
10 write SORT;
}
close(FH);
close(SORT);
(Output)
Blenheim, Steve 238-923-7366 11/12/56 20300.00
Boop, Betty 245-836-8357 6/23/23 14500.00
Chevsky, Igor 385-375-8395 6/18/68 23400.00
Corder, Norma 397-857-2735 3/28/45 245700.00
Cowan, Jennifer 548-834-2348 10/1/35 58900.00
DeLoach, Jon 408-253-3122 7/25/53 85100.00
Evich, Karen 284-758-2857 7/25/53 85100.00
Evich, Karen 284-758-2867 11/3/35 58200.00
Evich, Karen 284-758-2867 11/3/35 58200.00
Fardbarkle, Fred 674-843-1385 4/12/23 780900.00
Fardbarkle, Fred 674-843-1385 4/12/23 780900.00
Gortz, Lori 327-832-5728 10/2/65 35200.00
Explanation
1. The datebook file is opened for reading.
2. A pipe is created to send Perl's output to the system's "sort" utility. The sort will be ascending and by last name.
3. Variables are created to hold the strings to represent the "picture line." The "x" operator specifies the number of times the string on its left will be repeated. Doing this allows you to change the width of a field easily as you test the way the report looks when displayed.
4. The variable $format contains a string that will be used as the report format.
5. The template will send its output to the SORT filehandle, a pipe created on line 2.
6. These variables represent each field designator and its width. Note that the @ sign is preceded with a backslash to prevent Perl from trying to evaluate lines prematurely.
7. These variables represent the actual field values that will be displayed based on the picture line defined above. These fields are also backslashed so as not to be evaluated too soon.
8. The eval function compiles the string in $format as Perl code, which defines the SORT format at run time. When Perl interprets each line, it will see the current report format template as:
format SORT=
@<<<<<<<<<<<<<<<<<<@<<<<<<<<<<<<@||||||||||@######.##
$name, $phone, $birth, $sal
.
9. Now we loop through the file one line at a time, splitting the name field to get the first and last names and then concatenating the first name to the last name in order to sort by last name.
10. The write function invokes the SORT template to cause each line to be written, by way of the sort pipe, to the screen according to the report template on lines 6 and 7.
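The same technique can size a field from the data itself. The following is a minimal sketch, assuming two rows borrowed from the output above and a hypothetical format name, DYN, whose output goes to an in-memory file. The name field is made exactly as wide as the longest name before the format string is passed to eval.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our ($name, $sal);               # values named in the generated format

# Two sample rows taken from the report output shown above.
my @rows = (["Betty Boop", 14500], ["Igor Chevsky", 23400]);

# Find the longest name so the field is exactly wide enough.
my $width = 0;
for my $row (@rows) {
    my $len = length $row->[0];
    $width = $len if $len > $width;
}

# The @ counts as one character, so repeat "<" one time fewer.
my $field = "<" x ($width - 1);

# Backslashes keep @ and $ literal until eval compiles the format.
my $format = qq(
format DYN =
\@$field \@#####.##
\$name, \$sal
.
);
eval $format;
die "format did not compile: $@" if $@;

my $report = '';
open(DYN, '>', \$report) or die "in-memory file: $!\n";
for my $row (@rows) {
    ($name, $sal) = @$row;
    write DYN;
}
close(DYN);
print $report;
```

Checking $@ after eval is worthwhile here, because a mistake in a generated template otherwise shows up only later, when write cannot find the format.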