
15.2. DBM Files

The Perl distribution comes with a set of database management libraries known collectively as DBM. The concept stems from the early days of UNIX: a DBM is a set of C library routines that provide random access to the records in a file. A DBM database stores key/value pairs; in effect, it is an associative array mapped onto a disk file. There are a number of flavors of DBM support, and they demonstrate the most obvious reason for using tied hashes.

DBM files are binary. They can handle very large databases. The nice thing about storing data with DBM functions is that the data is persistent; that is, any program can access the file as long as the DBM functions are used. The disadvantage is that complex data structures, indexes, multiple tables, and so forth are not supported, and there is no reliable file locking and buffer flushing, making concurrent reading and updating risky.[2] File locking can be done with the Perl flock function, but the strategy for doing this correctly is beyond the scope of this book.[3]

[2] Although Perl's tie function will probably replace the dbmopen function, for now we'll use dbmopen because it's easier than tie.

[3] For details on file locking, see Descartes, A., and Bunce, T., Programming the Perl DBI, O'Reilly & Associates, 2000, p. 35.
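A full locking strategy is outside this section, but one common approach can be sketched briefly: take an exclusive flock on a separate sentinel file before opening the DBM file, and release it only after the hash has been untied. This is a minimal sketch under stated assumptions, not a complete solution; the database name statedb, the .lock suffix, and the scratch directory are hypothetical choices made for the illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);   # O_RDWR, O_CREAT and LOCK_EX
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # scratch directory for the sketch
my $base = "$dir/statedb";          # hypothetical database name

# Lock a separate sentinel file BEFORE the DBM file is opened, so that
# no other cooperating process has the database open at the same time.
open(my $lock, '>>', "$base.lock") or die "Can't open lock file: $!";
flock($lock, LOCK_EX)              or die "Can't lock: $!";

my %states;
tie(%states, 'SDBM_File', $base, O_RDWR|O_CREAT, 0644)
    or die "Can't tie: $!";
$states{CA} = "California";
my $stored = $states{CA};
untie %states;                      # close the DBM file first

close($lock);                       # closing the handle releases the lock
print "Stored under lock: $stored\n";
```

The lock is advisory: it protects the data only if every program that touches the database takes the same lock first.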

So that you don't have to figure out which of the standard DBM packages to use, the AnyDBM_File.pm module will get the appropriate package for your system from the standard set in the standard Perl library. The AnyDBM_File module is also useful if your program will run on multiple platforms. It will select the correct libraries for one of five different implementations:

Table 15.1. DBM Implementations
odbm     "Old" DBM implementation found on UNIX systems and replaced by NDBM
ndbm     "New" DBM implementation found on UNIX systems
sdbm     Standard Perl DBM; provides cross-platform compatibility, but not good for large databases
gdbm     GNU DBM, a fast, portable DBM implementation; see www.gnu.org
bsd-db   Berkeley DB; found on BSD UNIX systems, most powerful of all the DBMs; see www.sleepycat.com
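To see which implementation AnyDBM_File picked on a given system, you can inspect its @ISA array after the module has loaded. The sketch below relies on the behavior described in the AnyDBM_File documentation; the variable name $chosen is only illustrative.

```perl
use strict;
use warnings;
use AnyDBM_File;

# Once loaded, AnyDBM_File leaves the one module it selected in @ISA.
my ($chosen) = @AnyDBM_File::ISA;
print "AnyDBM_File selected: $chosen\n";

# To change the search order, set @AnyDBM_File::ISA in a BEGIN block
# placed before the "use AnyDBM_File" line, for example:
#   BEGIN { @AnyDBM_File::ISA = qw(DB_File GDBM_File NDBM_File) }
```

The name printed depends on which DBM libraries were available when your copy of Perl was built.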


The following table comes from the documentation for AnyDBM_File and lists some of the differences among the various DBM implementations. To view the documentation, type at your command-line prompt:

perldoc AnyDBM_File

                          odbm    ndbm    sdbm    gdbm    bsd-db
                          ----    ----    ----    ----    ------
  Linkage comes w/ perl   yes     yes     yes     yes     yes
  Src comes w/ perl       no      no      yes     no      no
  Comes w/ many unix os   yes     yes[0]  no      no      no
  Builds ok on !unix      ?       ?       yes     yes     ?
  Code Size               ?       ?       small   big     big
  Database Size           ?       ?       small   big?    ok[1]
  Speed                   ?       ?       slow    ok      fast
  FTPable                 no      no      yes     yes     yes
  Easy to build          N/A     N/A      yes     yes     ok[2]
  Size limits             1k      4k      1k[3]   none    none
  Byte-order independent  no      no      no      no      yes
  Licensing restrictions  ?       ?       no      yes     no

15.2.1. Creating and Assigning Data to a DBM File

Before a database can be accessed, it must be opened by using the dbmopen function or the tie function. This binds the DBM file to an associative array (hash). Two files will be created: one file contains an index directory and has .dir as its suffix; the second file, ending in .pag, contains all the data. The files are not in a readable format. The dbm functions are used to access the data. These functions are invisible to the user.

Data is assigned to the hash just as with any Perl hash, and an element is removed with Perl's delete function. The DBM file can be closed with the dbmclose or the untie function.

Format

dbmopen(hash, dbfilename, mode);
tie(hash, Module, dbfilename, flags, mode);

Example 15.5.

dbmopen(%myhash, "mydbmfile", 0666);
# The tie form requires: use Fcntl; use SDBM_File;
tie(%myhash, 'SDBM_File', "mydbmfile", O_RDWR|O_CREAT, 0640);
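The whole life cycle described above (open, assign, delete, close) can be sketched in a few lines. This is a minimal illustration that assumes the SDBM_File module and writes into a temporary directory so that it leaves nothing behind; the file name mydbmfile comes from the example above, and the sample states are made up.

```perl
use strict;
use warnings;
use Fcntl;                      # supplies O_RDWR and O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);     # scratch directory for the sketch
my $base = "$dir/mydbmfile";

my %myhash;
tie(%myhash, 'SDBM_File', $base, O_RDWR|O_CREAT, 0640)
    or die "Can't tie: $!";
$myhash{CA} = "California";           # stored in the DBM file
$myhash{TX} = "Texas";
delete $myhash{TX};                   # removed from the DBM file
untie %myhash;                        # close the database

# Reopen the database to confirm that the data persisted on disk.
tie(%myhash, 'SDBM_File', $base, O_RDWR, 0640)
    or die "Can't tie: $!";
my @keys = sort keys %myhash;
untie %myhash;

# List whichever of the two SDBM files exist next to the base name.
my @files = grep { -e "$base$_" } qw(.dir .pag);
print "Keys on disk: @keys\n";
print "Suffixes found: @files\n";
```

Because the hash is untied and then tied again, the keys printed at the end come from the disk file rather than from memory.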

 


Perl's report-writing mechanism is very useful for generating formatted output from the data in a DBM file. The following examples illustrate how to create a DBM file, add and delete entries, close the file, and produce a Perl-style report.

Example 15.6.

 (The Script)
    #!/usr/bin/perl
     # Program name: makestates.pl
     # This program creates the database using the dbm functions
1   use AnyDBM_File;  # Let Perl pick the right dbm for your system
2   dbmopen(%states, "statedb", 0666) || die "Can't open statedb: $!";
                      # Create or open the database; see note [a] on the mode
3   TRY:  {
4       print "Enter the abbreviation for your state. ";
        chomp($abbrev=<STDIN>);
        $abbrev = uc $abbrev;  # Make sure abbreviation is uppercase
5       print "Enter the name of the state. ";
        chomp($state=<STDIN>);
        $state = lc $state;    # Lowercase the name; \u below capitalizes it
6       $states{$abbrev}="\u$state";  # Assign values to the database
7       print "Another entry? ";
        $answer = <STDIN>;
8       redo TRY  if $answer =~ /Y|y/;
    }
9   dbmclose(%states);       # Close the database
---------------------------------------------------------------------

(The Command line)
10  $ ls
    makestates.pl statedb.dir statedb.pag[b]
------------------------------------------------------------------
(Output)
4   Enter the abbreviation for your state. CA
5   Enter the name of the state. California
7   Another entry? y
    Enter the abbreviation for your state. me
    Enter the name of the state. Maine
    Another entry? y
    Enter the abbreviation for your state. NE
    Enter the name of the state. Nebraska
    Another entry? y
    Enter the abbreviation for your state. tx
    Enter the name of the state. Texas
    Another entry? n

					  

Explanation

  1. The AnyDBM_File module selects the proper DBM libraries for your particular installation.

  2. The dbmopen function binds a DBM file to a hash. In this case, the database file created is called statedb and the hash is called %states. If the database does not exist, a valid permission mode should be given. The octal mode given here is 0666, read and write for all, on UNIX-type systems.

  3. The labeled block is entered.

  4. The user is asked for input, the abbreviation of his state. This input will be used to fill the %states hash.

  5. The user is asked to enter the name of his state.

  6. The value of $state is assigned to the %states hash where the key is the abbreviation for the state. The \u escape sequence causes the first letter of the state to be uppercase. When this assignment is made, the DBM file will be assigned the new value through a tie mechanism that takes place behind the scenes.

  7. The user is asked to enter another entry into the DBM file.

  8. If the user wants to add another entry to the DBM file, the program will go to the top of the block labeled TRY and start over.

  9. The dbmclose function breaks the tie that binds the DBM file to the hash %states (by calling the untie function).

  10. The listing displays the files that were created with the dbmopen function. The first file, makestates.pl, is the Perl script. The second file, statedb.dir, is the index file, and the last file, statedb.pag, is the file that contains the hash data.

[a] Permissions are ignored on Win32 systems.

[b] On some versions, only one file with a .db extension is created.
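The normalization performed in Example 15.6 (uppercase the abbreviation, lowercase the name, then capitalize its first letter) can be isolated in a small helper. The function name normalize_entry is hypothetical, and an ordinary hash stands in here for the tied %states hash.

```perl
use strict;
use warnings;

# Hypothetical helper: uppercase the abbreviation; lowercase the state
# name and then capitalize its first letter.
sub normalize_entry {
    my ($abbrev, $state) = @_;
    return (uc($abbrev), ucfirst(lc($state)));
}

my %states;    # an ordinary hash stands in for the tied DBM hash
for my $pair (["me", "MAINE"], ["tx", "texas"]) {
    my ($abbrev, $state) = normalize_entry(@$pair);
    $states{$abbrev} = $state;
}
print "$_ => $states{$_}\n" for sort keys %states;
```

Note that ucfirst capitalizes only the first letter of the whole string, so a two-word name would need extra handling.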

 

15.2.2. Retrieving Data from a DBM File

Once the DBM file has been opened, it is associated with a tied hash in the Perl script. All details of the implementation are hidden from the user. Data retrieval is fast and easy. The user simply manipulates the hash as though it were any ordinary Perl hash. Since the hash is tied to the DBM file, when the data is retrieved, it is coming from the DBM file.
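As a minimal sketch of such lookups, the following builds a small database in a temporary directory, reopens it read-only, and then uses exists and keys exactly as it would on an ordinary hash. It assumes the SDBM_File module; the sample entries and the key ZZ are made up for the illustration.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1) . "/statedb";   # scratch location

# Build a small database to read from.
my %states;
tie(%states, 'SDBM_File', $base, O_RDWR|O_CREAT, 0644)
    or die "Can't tie: $!";
@states{qw(CA ME NE)} = ("California", "Maine", "Nebraska");
untie %states;

# Reopen it read-only and query it like any other hash.
tie(%states, 'SDBM_File', $base, O_RDONLY, 0644)
    or die "Can't tie: $!";
my $found   = exists $states{ME} ? $states{ME} : "(none)";
my $missing = exists $states{ZZ} ? $states{ZZ} : "(none)";
my $count   = scalar keys %states;
untie %states;

print "ME: $found\n";
print "ZZ: $missing\n";
print "Entries: $count\n";
```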

Example 15.7.

 (The Script)
     #!/bin/perl
     # Program name: getstates.pl
     # This program fetches the data from the database
     # and generates a report

1   use AnyDBM_File;
2   dbmopen(%states, "statedb", 0666);   # Open the database
3   @sortedkeys=sort keys %states;    # Sort the database by keys
4   foreach $key ( @sortedkeys ){
5       $value=$states{$key};
        $total++;
6       write;
    }
7   dbmclose(%states);    #Close the database
8   format STDOUT_TOP=
    Abbreviation     State
    ==============================
9   .
10  format STDOUT=
    @<<<<<<<<<<<<<<@<<<<<<<<<<<<<<<
    $key,          $value
    .
11  format SUMMARY=
    ==============================
    Number of states:@###
                    $total
    .
    $~=SUMMARY;
    write;

---------------------------------------------------------------------


(Output)
  Abbreviation    State
==============================
  AR              Arizona
  CA              California
  ME              Maine
  NE              Nebraska
  TX              Texas
  WA              Washington
==============================
        Number of states:   6

					  

Explanation

  1. The AnyDBM_File module selects the proper DBM libraries for your particular installation.

  2. The dbmopen function binds a DBM file to a hash. In this case, the database file opened is called statedb and the hash is called %states.

  3. Now that the DBM file has been opened, the user can access the key/value pairs. The keys function returns the keys of the %states hash, and the sort function puts them in sorted order.

  4. The foreach loop iterates through the list of sorted keys.

  5. Each time through the loop, another value is retrieved from the %states hash, which is tied to the DBM file.

  6. After adding one to the $total variable (keeping track of how many entries are in the DBM file), the write function invokes the report templates to produce a formatted output.

  7. The DBM file is closed; that is, the hash %states is disassociated from the DBM file.

  8. This is the format template that will be used to put a header on the top of each page.

  9. The period ends the template definition.

  10. This is the format template for the body of each page printed to standard output. The picture line specifies the layout, and the line below it names the variables ($key and $value) whose values fill the fields.

  11. This is the format template that will be invoked at the bottom of the report. Assigning SUMMARY to the $~ variable makes it the current format, and the final write prints it.
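If you prefer not to use format templates, a similar left-justified report can be produced with sprintf. This is a swapped-in technique, not the one used in Example 15.7; the sketch uses an ordinary hash with made-up data in place of the tied %states hash, and the 15-character column width is an arbitrary choice.

```perl
use strict;
use warnings;

# Stand-in data; in the real script this would be the tied %states hash.
my %states = (CA => "California", ME => "Maine", NE => "Nebraska");

# Header lines, comparable to the STDOUT_TOP template.
my @report = (sprintf("%-15s%s", "Abbreviation", "State"), "=" x 30);

my $total = 0;
for my $key (sort keys %states) {
    # %-15s left-justifies the key in a 15-character column.
    push @report, sprintf("%-15s%s", $key, $states{$key});
    $total++;
}

# Summary lines, comparable to the SUMMARY template.
push @report, "=" x 30, sprintf("Number of states: %d", $total);
print "$_\n" for @report;
```

Unlike a picture line, %-15s pads short values but does not truncate long ones.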

 

15.2.3. Deleting Entries from a DBM File

To empty a DBM file completely, you can use the undef function; for example, undef %states would clear all entries in the DBM file used in Example 15.8. To delete a single key/value pair, use the Perl built-in delete function on the appropriate key of the hash that is tied to the DBM file.
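Both operations can be sketched against a scratch database. The sketch assumes the SDBM_File module and uses made-up entries; to empty the hash it assigns the empty list, which is the form the perltie documentation describes as the usual way to clear a tied hash, rather than the undef form mentioned above.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1) . "/statedb";   # scratch location

my %states;
tie(%states, 'SDBM_File', $base, O_RDWR|O_CREAT, 0644)
    or die "Can't tie: $!";
@states{qw(CA ME TX)} = ("California", "Maine", "Texas");

# Remove a single key/value pair from the DBM file.
delete $states{TX};
my @after_delete = sort keys %states;

# Remove every entry by assigning the empty list to the tied hash.
%states = ();
my $after_clear = scalar keys %states;

untie %states;
print "After delete: @after_delete\n";
print "Entries after clearing: $after_clear\n";
```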

Example 15.8.

(The Script)
    #!/bin/perl
    # Program name: remstates.pl
    # dbmopen is an older method of opening a dbm file but simpler
    # than using tie and the SDBM_File module provided
    # in the standard Perl library
1   use AnyDBM_File;
2   dbmopen(%states, "statedb", 0666) || die;
    TRY: {
        print "Enter the abbreviation for the state to remove. ";
        chomp($abbrev=<STDIN>);
        $abbrev = uc $abbrev;  # Make sure abbreviation is uppercase
3       delete $states{"$abbrev"};
        print "$abbrev removed.\n";
        print "Another entry? ";
        $answer = <STDIN>;
        redo TRY  if $answer =~ /Y|y/;  }
4   dbmclose(%states);

(Output)
5   $ remstates.pl
    Enter the abbreviation for the state to remove. TX
    TX removed.
    Another entry? n
6   $ getstates.pl
    Abbreviation     State
    ==============================
    AR             Arizona
    CA             California
    ME             Maine
    NE             Nebraska
    WA             Washington
    ==============================
    Number of states:   5

7   $ ls
    getstates.pl    makestates.pl   remstates.pl    statedb.dir    statedb.pag

					  

Explanation

  1. The AnyDBM_File module selects the proper DBM libraries for your particular installation.

  2. The dbmopen function binds a DBM file to a hash. In this case, the database file opened is called statedb and the hash is called %states.

  3. Now that the DBM file has been opened, the user can access the key/value pairs. The delete function removes the specified key and its associated value from the %states hash and, because the hash is tied, from the DBM file.

  4. The DBM file is closed; in other words, the hash %states is disassociated from the DBM file.

  5. The Perl script remstates.pl is executed to remove Texas from the DBM file.

  6. The Perl script getstates.pl is executed to display the data in the DBM file. Texas was removed.

  7. The listing shows the files that were created to produce these examples. The last two are the DBM files created by dbmopen.

Example 15.9.

1   use Fcntl;
2   use SDBM_File;
3   tie(%address, 'SDBM_File', 'email.dbm', O_RDWR|O_CREAT, 0644)
    || die $!;
4   print "The package the hash is tied to: ",ref tied %address,"\n";

5   print "Enter the email address.\n";
    chomp($email=<STDIN>);
6   print "Enter the first name of the addressee.\n";
    chomp($firstname=<STDIN>);
    $firstname = lc $firstname;
    $firstname = ucfirst $firstname;
7   $address{"$email"}=$firstname;
8   while( ($email, $firstname)=each(%address)){
        print  "$email, $firstname\n";
    }
9   untie %address;

Explanation

  1. The Fcntl (file control) module is loaded. It defines the O_RDWR and O_CREAT flags that are needed to set up the DBM file.

  2. The SDBM_File module is used. It is the DBM implementation that comes with standard Perl and works across platforms.

  3. Instead of using the dbmopen function to create or access a DBM file, this example uses the tie function. The hash %address is tied to the package SDBM_File. The DBM file is called email.dbm. The O_RDWR flag opens the database for reading and writing, and if the database doesn't exist, the O_CREAT flag causes it to be created. The mode 0644 sets the permissions on any newly created files.

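  4. The tied function returns a reference to the object underlying the tied hash (or the undefined value if the hash is not tied), and the ref function returns the name of the package to which that object belongs, in this case SDBM_File.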

  5. The user is asked for input. In this example, an e-mail address is used as the key in the %address hash, and the value associated with the key will be the first name of the user. Since hash keys are always unique, this mechanism prevents duplicate e-mail addresses from being stored.

  6. The value for the key is requested from the user.

  7. Here the database is assigned a new entry. The key/value pair is assigned and stored in the DBM file.

  8. The each function will pull out both the key and value from the hash %address, which is tied to the DBM file. The contents of the DBM file are displayed.

  9. The untie function disassociates the hash from the DBM file.
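The behavior of tied before and after the hash is bound can be checked directly. The following is a minimal sketch that assumes the SDBM_File module and a scratch directory; the base name email is chosen only to echo the example above.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1) . "/email";     # scratch location

my %address;
# Before tie, tied returns the undefined value.
my $before = tied(%address) ? "tied" : "not tied";

tie(%address, 'SDBM_File', $base, O_RDWR|O_CREAT, 0644)
    or die "Can't tie: $!";
# While tied, tied returns the underlying object; ref names its package.
my $package = ref tied %address;

untie %address;
# After untie, the hash is an ordinary hash again.
my $after = tied(%address) ? "tied" : "not tied";

print "Before tie:  $before\n";
print "Package:     $package\n";
print "After untie: $after\n";
```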
