15.2. DBM Files
The Perl distribution comes with a set of database management libraries called DBM, short for database management. The concept of DBM files stems from the early days of UNIX: a set of C library routines that allow random access to the records of a database. A DBM database is stored as key/value pairs; it is, in effect, an associative array mapped into a disk file. There are a number of flavors of DBM support, and they demonstrate the most obvious reason for using tied hashes.
DBM files are binary. They can handle very large databases. The nice thing about storing data with DBM functions is that the data is persistent; that is, any program can access the file as long as the DBM functions are used. The disadvantage is that complex data structures, indexes, multiple tables, and so forth are not supported, and there is no reliable file locking and buffer flushing, making concurrent reading and updating risky. File locking can be done with the Perl flock function, but the strategy for doing this correctly is beyond the scope of this book.
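As a rough illustration of the flock idea mentioned above, and not a complete locking strategy, the following sketch takes an exclusive lock on a separate lock file before tying the database and releases it only after untying. The file names, the scratch directory, and the with_locked_db helper are assumptions made for this sketch. The lock is advisory, so it protects the data only if every program that touches the database follows the same convention.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);   # O_RDWR, O_CREAT, LOCK_EX
use SDBM_File;
use File::Temp qw(tempdir);

# Hypothetical file names; a scratch directory keeps the sketch self-contained.
my $dir      = tempdir(CLEANUP => 1);
my $dbfile   = "$dir/statedb";
my $lockfile = "$dir/statedb.lock";

# Run a code reference against the DBM file while holding an exclusive lock.
sub with_locked_db {
    my ($code) = @_;

    # Lock a separate file *before* opening the database, so the database
    # is never opened while another cooperating process is updating it.
    open(my $lock, '>>', $lockfile) or die "Cannot open $lockfile: $!";
    flock($lock, LOCK_EX)           or die "Cannot lock $lockfile: $!";

    tie(my %db, 'SDBM_File', $dbfile, O_RDWR|O_CREAT, 0644)
        or die "Cannot tie $dbfile: $!";
    my $result = $code->(\%db);
    untie %db;                      # close the database before unlocking

    close($lock);                   # closing the handle releases the lock
    return $result;
}

with_locked_db(sub { $_[0]{CA} = "California" });
my $name = with_locked_db(sub { $_[0]{CA} });
print "CA => $name\n";
```

Locking a separate file, rather than the DBM file itself, avoids having to open the database before the lock is held.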
So that you don't have to figure out which of the standard DBM packages to use, the AnyDBM_File.pm module will get the appropriate package for your system from the standard set in the standard Perl library. The AnyDBM_File module is also useful if your program will run on multiple platforms. It will select the correct libraries for one of five different implementations:
Table 15.1. DBM Implementations
odbm   | "Old" DBM implementation found on UNIX systems and replaced by NDBM
ndbm   | "New" DBM implementation found on UNIX systems
sdbm   | Standard Perl DBM, provides cross-platform compatibility, but not good for large databases
gdbm   | GNU DBM, a fast, portable DBM implementation; see www.gnu.org
bsd-db | Berkeley DB; found on BSD UNIX systems, most powerful of all the DBMs; see www.sleepycat.com
The table below comes from the documentation for AnyDBM_File and lists some of the differences among the DBM implementations. To read the documentation yourself, type the following at your command-line prompt:
perldoc AnyDBM_File
                         odbm    ndbm    sdbm    gdbm    bsd-db
                         ----    ----    ----    ----    ------
Linkage comes w/ perl    yes     yes     yes     yes     yes
Src comes w/ perl        no      no      yes     no      no
Comes w/ many unix os    yes     yes[0]  no      no      no
Builds ok on !unix       ?       ?       yes     yes     ?
Code Size                ?       ?       small   big     big
Database Size            ?       ?       small   big?    ok[1]
Speed                    ?       ?       slow    ok      fast
FTPable                  no      no      yes     yes     yes
Easy to build            N/A     N/A     yes     yes     ok[2]
Size limits              1k      4k      1k[3]   none    none
Byte-order independent   no      no      no      no      yes
Licensing restrictions   ?       ?       no      yes     no
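To see which implementation AnyDBM_File settles on, you can inspect its @ISA array after the module is loaded. The sketch below also sets a preference order before loading, a technique described in the AnyDBM_File documentation; the particular order shown is only an assumption for illustration.

```perl
use strict;
use warnings;

# Optional: state a preference order *before* AnyDBM_File is loaded.
# Without this line, the module tries its own default list.
BEGIN { @AnyDBM_File::ISA = qw(SDBM_File GDBM_File NDBM_File) }
use AnyDBM_File;

# After loading, @ISA holds the single module that was successfully loaded.
my ($chosen) = @AnyDBM_File::ISA;
print "AnyDBM_File is using: $chosen\n";
```

Because SDBM_File ships with Perl, listing it first should make it the module chosen on a standard installation.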
15.2.1. Creating and Assigning Data to a DBM File
Before a database can be accessed, it must be opened with the dbmopen function or the tie function. This binds the DBM file to an associative array (hash). With implementations such as SDBM, two files are created: one contains an index directory and has .dir as its suffix; the second, ending in .pag, contains all the data. The files are not in a human-readable format. The dbm functions are used to access the data, but these functions are invisible to the user.
Data is assigned to the hash just as with any Perl hash, and an element is removed with Perl's delete function. The DBM file is closed with the dbmclose or the untie function.
dbmopen(hash, dbfilename, mode);
tie(hash, Module, dbfilename, flags, mode);
Example 15.5.
dbmopen(%myhash, "mydbmfile", 0666);
tie(%myhash, 'SDBM_File', "mydbmfile", O_RDWR|O_CREAT, 0640);
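The following is a minimal sketch, not part of the original examples, that puts the tie form to work and shows that the data persists after the hash is untied. It assumes SDBM_File and uses a scratch directory from File::Temp so that it can be run anywhere; the variable names are illustrative.

```perl
use strict;
use warnings;
use Fcntl;                        # supplies O_RDWR and O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1); # scratch directory (an assumption)
my $base = "$dir/mydbmfile";

# First session: create the file, store two pairs, remove one.
tie(my %myhash, 'SDBM_File', $base, O_RDWR|O_CREAT, 0640)
    or die "Cannot tie $base: $!";
$myhash{CA} = "California";
$myhash{ME} = "Maine";
delete $myhash{ME};
untie %myhash;

# Second session: a different hash tied to the same file sees the data.
tie(my %again, 'SDBM_File', $base, O_RDWR, 0640)
    or die "Cannot reopen $base: $!";
my @keys = sort keys %again;
print "Keys on disk: @keys\n";

# Count the .dir and .pag files that SDBM created.
my @files = grep { -e } map { "$base$_" } qw(.dir .pag);
print "Files created: ", scalar(@files), "\n";
untie %again;
```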
Perl's report writing mechanism is very useful for generating formatted data from one of the DBM files. The following examples illustrate how to create, add, delete, and close a DBM file and how to create a Perl-style report.
Example 15.6.
(The Script)
    #!/usr/bin/perl
    # Program name: makestates.pl
    # This program creates the database using the dbm functions
1   use AnyDBM_File;   # Let Perl pick the right dbm for your system
2   dbmopen(%states, "statedb", 0666) || die;
                       # Create or open the database
3   TRY: {
4       print "Enter the abbreviation for your state. ";
        chomp($abbrev=<STDIN>);
        $abbrev = uc $abbrev;   # Make sure abbreviation is uppercase
5       print "Enter the name of the state. ";
        chomp($state=<STDIN>);
        $state = lc $state;     # Lowercase the name; \u capitalizes it below
6       $states{$abbrev}="\u$state";   # Assign values to the database
7       print "Another entry? ";
        $answer = <STDIN>;
8       redo TRY if $answer =~ /^y/i;  # Repeat if the answer starts with y or Y
    }
9   dbmclose(%states);   # Close the database
---------------------------------------------------------------------
(The Command line)
10  $ ls
    makestates.pl  statedb.dir  statedb.pag
------------------------------------------------------------------
(Output)
4   Enter the abbreviation for your state. CA
5   Enter the name of the state. California
7   Another entry? y
    Enter the abbreviation for your state. me
    Enter the name of the state. Maine
    Another entry? y
    Enter the abbreviation for your state. NE
    Enter the name of the state. Nebraska
    Another entry? y
    Enter the abbreviation for your state. tx
    Enter the name of the state. Texas
    Another entry? n
1. The AnyDBM_File module selects the proper DBM libraries for your particular installation.
2. The dbmopen function binds a DBM file to a hash. In this case, the database file created is called statedb and the hash is called %states. If the database does not exist, a valid permission mode should be given. The octal mode given here is 0666, read and write for all, on UNIX-type systems.
3. The labeled block is entered.
4. The user is asked for the abbreviation of the state. This input will be used as a key in the %states hash.
5. The user is asked to enter the name of the state.
6. The name of the state is assigned to the %states hash, where the key is the abbreviation for the state. The \u escape sequence causes the first letter of the state to be uppercase. When this assignment is made, the DBM file is assigned the new value through a tie mechanism that takes place behind the scenes.
7. The user is asked whether to add another entry to the DBM file.
8. If the user wants to add another entry to the DBM file, the program goes to the top of the block labeled TRY and starts over.
9. The dbmclose function breaks the tie that binds the DBM file to the hash %states (by calling the untie function).
10. The listing displays the files that were created with the dbmopen function. The first file, makestates.pl, is the Perl script. The second file, statedb.dir, is the index file, and the last file, statedb.pag, is the file that contains the hash data.
15.2.2. Retrieving Data from a DBM File
Once the DBM file has been opened, it is associated with a tied hash in the Perl script. All details of the implementation are hidden from the user. Data retrieval is fast and easy. The user simply manipulates the hash as though it were any ordinary Perl hash. Since the hash is tied to the DBM file, when the data is retrieved, it is coming from the DBM file.
Example 15.7.
(The Script)
    #!/bin/perl
    # Program name: getstates.pl
    # This program fetches the data from the database
    # and generates a report
1   use AnyDBM_File;
2   dbmopen(%states, "statedb", 0666);   # Open the database
3   @sortedkeys=sort keys %states;       # Sort the database by keys
4   foreach $key ( @sortedkeys ){
5       $value=$states{$key};
        $total++;
6       write;
    }
7   dbmclose(%states);   # Close the database
8   format STDOUT_TOP=
    Abbreviation State
    ==============================
9   .
10  format STDOUT=
    @<<<<<<<<<<<<<<@<<<<<<<<<<<<<<<
    $key, $value
    .
11  format SUMMARY=
    ==============================
    Number of states:@###
    $total
    .
    $~=SUMMARY;
    write;
---------------------------------------------------------------------
(Output)
Abbreviation State
==============================
AR             Arizona
CA             California
ME             Maine
NE             Nebraska
TX             Texas
WA             Washington
==============================
Number of states:   6
1. The AnyDBM_File module selects the proper DBM libraries for your particular installation.
2. The dbmopen function binds a DBM file to a hash. In this case, the database file opened is called statedb and the hash is called %states. Now that the DBM file has been opened, the user can access the key/value pairs.
3. The sort function combined with the keys function returns the keys of the %states hash in sorted order.
4. The foreach loop iterates through the list of sorted keys.
5. Each time through the loop, another value is retrieved from the %states hash, which is tied to the DBM file.
6. After adding one to the $total variable (keeping track of how many entries are in the DBM file), the write function invokes the report templates to produce formatted output.
7. The DBM file is closed; that is, the hash %states is disassociated from the DBM file.
8. This is the format template that will be used to put a header at the top of each page.
9. The period ends the template definition.
10. This is the format template for the body of each page printed to standard output. The picture line is used to format the key/value pair named on the line below it.
11. This is the format template that will be invoked at the bottom of the report.
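When a program only needs to read the data, the database can instead be tied read-only, and the each function can walk the pairs one at a time without first building a sorted list of every key. The following is a minimal sketch under those assumptions; it uses SDBM_File, a scratch directory, and sample data created only so that the example runs on its own.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1) . "/statedb";   # scratch location

# Build a small database to read from (sample data for this sketch only).
tie(my %states, 'SDBM_File', $base, O_RDWR|O_CREAT, 0644)
    or die "Cannot create $base: $!";
$states{CA} = "California";
$states{ME} = "Maine";
$states{NE} = "Nebraska";
untie %states;

# Reopen the same file read-only.
tie(my %ro, 'SDBM_File', $base, O_RDONLY, 0644)
    or die "Cannot open $base: $!";

# A single lookup; exists distinguishes a missing key from an empty value.
my $lookup = exists $ro{ME} ? $ro{ME} : "(not found)";
print "ME is $lookup\n";

# Walk every pair; the order is whatever the DBM library returns.
my $count = 0;
while (my ($abbrev, $name) = each %ro) {
    $count++;
}
print "Entries: $count\n";
untie %ro;
```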
15.2.3. Deleting Entries from a DBM File
To empty the entire DBM file, you can use the undef function; for example, undef %states would clear all entries in the DBM file created in Example 15.6. Deleting a single key/value pair is done by using the Perl built-in delete function on the appropriate key within the hash that is tied to the DBM file.
Example 15.8.
(The Script)
    #!/bin/perl
    # Program name: remstates.pl
    # dbmopen is an older method of opening a dbm file but simpler
    # than using tie and the SDBM_File module provided
    # in the standard Perl library
1   use AnyDBM_File;
2   dbmopen(%states, "statedb", 0666) || die;
    TRY: {
        print "Enter the abbreviation for the state to remove. ";
        chomp($abbrev=<STDIN>);
        $abbrev = uc $abbrev;   # Make sure abbreviation is uppercase
3       delete $states{"$abbrev"};
        print "$abbrev removed.\n";
        print "Another entry? ";
        $answer = <STDIN>;
        redo TRY if $answer =~ /^y/i;  # Repeat if the answer starts with y or Y
    }
4   dbmclose(%states);
(Output)
5   $ remstates.pl
    Enter the abbreviation for the state to remove. TX
    TX removed.
    Another entry? n
6   $ getstates.pl
    Abbreviation State
    ==============================
    AR             Arizona
    CA             California
    ME             Maine
    NE             Nebraska
    WA             Washington
    ==============================
    Number of states:   5
7   $ ls
    getstates.pl  makestates.pl  remstates.pl  statedb.dir
    statedb.pag
1. The AnyDBM_File module selects the proper DBM libraries for your particular installation.
2. The dbmopen function binds a DBM file to a hash. In this case, the database file opened is called statedb and the hash is called %states. Now that the DBM file has been opened, the user can access the key/value pairs.
3. The delete function removes the specified key and its value from the %states hash tied to the DBM file.
4. The DBM file is closed; in other words, the hash %states is disassociated from the DBM file.
5. The Perl script remstates.pl is executed to remove Texas from the DBM file.
6. The Perl script getstates.pl is executed to display the data in the DBM file. Texas was removed.
7. The listing shows the files that were created to produce these examples. The last two are the DBM files created by dbmopen.
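The script above reports that an entry was removed even when the key was never in the database. The following sketch shows one way to check first with exists, and also how assigning an empty list clears every entry. The remove_state helper, the sample data, and the scratch directory are assumptions made for this illustration, and SDBM_File is used so that the example is self-contained.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1) . "/statedb";   # scratch location

tie(my %states, 'SDBM_File', $base, O_RDWR|O_CREAT, 0644)
    or die "Cannot tie $base: $!";
$states{CA} = "California";
$states{ME} = "Maine";
$states{TX} = "Texas";

# Hypothetical helper: delete a key only if it is present.
# Returns 1 if an entry was removed, 0 if the key was not found.
sub remove_state {
    my ($db, $abbrev) = @_;
    $abbrev = uc $abbrev;
    return 0 unless exists $db->{$abbrev};
    delete $db->{$abbrev};
    return 1;
}

print remove_state(\%states, "tx") ? "TX removed.\n" : "TX not found.\n";
print remove_state(\%states, "zz") ? "ZZ removed.\n" : "ZZ not found.\n";

my $left = keys %states;          # number of entries still stored
print "Entries left: $left\n";

%states = ();                     # clear every entry in the DBM file
my $after_clear = keys %states;
print "Entries after clearing: $after_clear\n";
untie %states;
```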
Example 15.9.
1   use Fcntl;
2   use SDBM_File;
3   tie(%address, 'SDBM_File', 'email.dbm', O_RDWR|O_CREAT, 0644)
        || die $!;
4   print "The package the hash is tied to: ",ref tied %address,"\n";
5   print "Enter the email address.\n";
    chomp($email=<STDIN>);
6   print "Enter the first name of the addressee.\n";
    chomp($firstname=<STDIN>);
    $firstname = lc $firstname;
    $firstname = ucfirst $firstname;
7   $address{"$email"}=$firstname;
8   while( ($email, $firstname)=each(%address)){
        print "$email, $firstname\n";
    }
9   untie %address;
1. The file control module, Fcntl, is used to perform necessary tasks on the DBM files. It is where the O_RDWR and O_CREAT flags are defined, flags needed to set up the DBM file.
2. The SDBM_File module is used. It is the DBM implementation that comes with standard Perl and works across platforms.
3. Instead of using the dbmopen function to create or access a DBM file, this example uses the tie function. The hash %address is tied to the package SDBM_File. The DBM file is called email.dbm. The O_RDWR flag opens the database for reading and writing, and if the database doesn't exist, the O_CREAT flag causes it to be created, here with permission mode 0644.
4. The tied function returns the object underlying the tied hash (or a false value if the hash is not tied), and the ref function returns the name of the package that object belongs to.
5. In this example, an e-mail address is used as the key in the %address hash, and the value associated with the key will be the first name of the user. Since hash keys are always unique, the same e-mail address cannot be stored twice; assigning to an existing key replaces its value. The user is asked for input.
6. The value for the key is requested from the user.
7. Here the database is assigned a new entry. The key/value pair is assigned and stored in the DBM file.
8. The each function pulls out both the key and value from the hash %address, which is tied to the DBM file. The contents of the DBM file are displayed.
9. The untie function disassociates the hash from the DBM file.
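Because assigning to a key that already exists silently replaces the old value, a program that wants to refuse a duplicate address has to test for the key itself. The following is a minimal sketch of that check; the add_address helper, the sample addresses, and the scratch directory are assumptions made for illustration.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1) . "/email";     # scratch location

tie(my %address, 'SDBM_File', $base, O_RDWR|O_CREAT, 0644)
    or die "Cannot tie $base: $!";

# Hypothetical helper: store a new address, but never overwrite one.
# Returns 1 if the pair was stored, 0 if the key was already present.
sub add_address {
    my ($db, $email, $firstname) = @_;
    return 0 if exists $db->{$email};
    $db->{$email} = ucfirst lc $firstname;       # normalize as in the example
    return 1;
}

my $first  = add_address(\%address, 'joe@example.com', 'JOE');
my $second = add_address(\%address, 'joe@example.com', 'joseph');
print "First insert stored: $first\n";
print "Second insert stored: $second\n";
print "Stored name: $address{'joe@example.com'}\n";
untie %address;
```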