Chapter 18. Interfacing with the System

18.1. System Calls

Those migrating from shell (or batch) programming to Perl often expect that a Perl script is like a shell script—just a sequence of UNIX/Linux (or MS-DOS) commands. However, system utilities are not accessed directly in Perl programs as they are in shell scripts. Of course, to be effective there must be some way in which your Perl program can interface with the operating system. Perl has a set of functions, in fact, that specifically interface with the operating system and are directly related to the UNIX/Linux system calls so often found in C programs. Many of these system calls are supported by Windows. The ones that are generally not supported are found at the end of this chapter.

A system call requests some service from the operating system (kernel), such as getting the time of day, creating a new directory, removing a file, creating a new process, terminating a process, and so on. A major group of system calls deals with the creation and termination of processes, how memory is allocated and released, and sending information (e.g., signals) to processes. Another function of system calls is related to the file system: file creation, reading and writing files, creating and removing directories, creating links, etc.^[1]

^[1] System calls are direct entries into the kernel, whereas library calls are functions that invoke system calls. Perl's system interface functions are named after their counterpart UNIX system calls in Section 2 of the UNIX manual pages.

The UNIX^[2] system calls are documented in Section 2 of the UNIX manual pages. Perl's system functions are almost identical in syntax and implementation. If a system call fails, it returns a –1 and sets the system's global variable errno to a value that contains the reason the error occurred. C programs use the perror function to obtain system errors stored in errno; Perl programs use the special $! variable. (See "Error Handling" on page 755.)
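As a minimal sketch of the $! convention just described, the following script tries to open a file that is assumed not to exist (the pathname is hypothetical) and saves the error in both its numeric and string forms.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on this system.
my $missing = "/no/such/dir/no_such_file.txt";

my ($errno, $message) = (0, "");
if (open(my $fh, '<', $missing)) {
    close($fh);
}
else {
    # Save $! right away; a later system call may overwrite it.
    $errno   = $! + 0;    # numeric context: the errno value
    $message = "$!";      # string context: the perror-style text
    print "open failed: $message (errno $errno)\n";
}
```

Copying $! into ordinary variables immediately after the failure is a defensive habit, not a requirement of the language.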

^[2] From now on when referring to UNIX, assume that Linux also applies.

The following Perl functions allow you to perform a variety of calls to the system when you need to manipulate or obtain information about files or processes. If the system call you need is not provided by Perl, you can use Perl's syscall function, which takes a UNIX system call as an argument. (See "The syscall Function and the h2ph Script" on page 747.)

In addition to the built-in functions, the standard Perl library comes bundled with a variety of over 200 modules that can be used to perform portable operations on files, directories, processes, networks, etc. If you installed ActiveState, you will also find a collection of Win32 modules in the standard Perl library under C:\perl\site\lib\Win32.

To read the documentation for any of the modules (filenames with a .pm extension) from the standard Perl library, use the perldoc command that comes with Perl or the UNIX man command. ActiveState (Win32) provides online documentation found by clicking the Start button, Programs, and then ActiveState.

Example 18.1.

(At the command line)
1   $ perldoc File::Copy

Explanation

The perldoc command takes a module name, such as File::Copy, as its argument. The documentation for the module will then be displayed in a pager (or in a window such as Notepad on some Win32 platforms). This example displays part of the documentation for the Copy.pm module (File::Copy) found in the standard Perl library.

Figure 18.1. perldoc and the Copy.pm module.

18.1.1. Directories and Files

When walking through a file system, directories are separated by slashes. UNIX file systems indicate the root directory with a forward slash (/), followed by subdirectories separated by forward slashes; if a filename is specified, it is the final component of the path. The names of files and directories are case sensitive and may contain any character except the slash and the null character, although whitespace and shell metacharacters are best avoided. A period in a filename has no special meaning but can be used to separate the base filename from its extension, such as in program.c or file.bak. The maximum length of a filename varies among operating systems; the minimum is 1 character, and most UNIX-type file systems allow up to 255 characters. Only the root directory can be named / (slash).^[3]

^[3] The Mac OS file system (HFS) is also hierarchical and uses colons to separate path components.

Win32 file systems, mainly FAT, FAT32, and NTFS, use a different convention for specifying a directory path. Basic FAT directories and files are separated by a backslash (\). Their names are case insensitive and start with a limit of 8 characters, followed by a period, and a suffix of no more than 3 characters. (Windows 2000/NT allow longer filenames.) The root of the file system is a drive letter, such as C:\ or D:\, rather than only a slash. In networked environments, the universal naming convention (UNC) uses a different convention for separating the components of a path; the drive letter is replaced with two backslashes, as in \\myserver\dir\dir.

Backslash Issues

The backslash in Perl scripts is used as an escape or quoting character (\n, \t, \U, \$500, etc.), so when specifying a Win32 path separator, two backslashes are often needed, unless a particular module allows a single backslash or the pathname is surrounded by single quotes. For example, in a double-quoted string, C:\Perl\lib\File should be written C:\\Perl\\lib\\File.
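The quoting rules above can be checked with a short sketch. The pathname is the one used in the text; the variable names are only for illustration.

```perl
use strict;
use warnings;

my $double  = "C:\\Perl\\lib\\File";   # double quotes: each \\ yields one backslash
my $single  = 'C:\Perl\lib\File';      # single quotes: \P, \l, and \F are left alone
my $forward = "C:/Perl/lib/File";      # forward slashes need no escaping at all

print "$double\n";
print "$single\n";
print "The two strings are identical.\n" if $double eq $single;
```

Perl's file functions on Windows generally accept forward slashes as well, which is often the simplest way to sidestep the issue, although paths handed to external commands may still need backslashes.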

The File::Spec Module

The File::Spec module found in the standard Perl library was designed to portably support operations commonly performed on filenames, such as creating a single path out of a list of path components and applying the correct path delimiter for the appropriate operating system or splitting up the path into volume, directory, and filename, etc. A list of File::Spec functions is provided in Table 18.1.

Table 18.1. File::Spec Functions
Function	What It Does
abs2rel	Takes a destination path and an optional base path and returns a relative path from the base path to the destination path.
canonpath	No physical check on the file system but a logical cleanup of a path. On UNIX, eliminates successive slashes and successive "/.".
case_tolerant	Returns a true or false value indicating, respectively, that alphabetic case is not or is significant when comparing file specifications.
catdir	Concatenates two or more directory names to form a complete path ending with a directory and removes the trailing slash from the resulting string.
catfile	Concatenates one or more directory names and a filename to form a complete path ending with a filename.
catpath	Takes volume, directory, and file portions and returns an entire path. In UNIX, $volume is ignored, and directory and file are catenated. A "/" is inserted if necessary.
curdir	Returns a string representation of the current directory. "." on UNIX.
devnull	Returns a string representation of the null device. "/dev/null" on UNIX.
file_name_is_absolute	Takes as argument a path and returns true if it is an absolute path.
join	join is the same as catfile.
no_upwards	Given a list of filenames, strips out those that refer to a parent directory.
path	Takes no argument, returns the environment variable PATH as an array.
rel2abs	Converts a relative path to an absolute path.
rootdir	Returns a string representation of the root directory. "/" on UNIX.
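splitpath	Splits a path into volume, directory, and filename portions. On systems with no concept of volume, returns '' (the empty string) for volume.
tmpdir	Returns a string representation of the first writable directory from a platform-specific list of candidates (on UNIX, $ENV{TMPDIR} and /tmp). Older versions return "" if none is writable; recent versions return the current directory.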
updir	Returns a string representation of the parent directory. ".." on UNIX.

Since these functions are different for most operating systems, each set of OS-specific routines is available in a separate module, including:

File::Spec::Unix
File::Spec::Mac
File::Spec::OS2
File::Spec::Win32
File::Spec::VMS

Example 18.2.

1   use File::Spec;
2   $pathname=File::Spec->catfile("C:","Perl","lib","CGI");
3   print "$pathname\n";

(Output)
3   C:\Perl\lib\CGI

Explanation

If the operating system is not specified, the File::Spec module is loaded for the current operating system, in this case Windows 2000. It is an object-oriented module but has a function-oriented syntax as well.
A scalar, $pathname, will contain a path consisting of the arguments passed to the catfile method. The catfile function will concatenate the list of path elements.
The new path is printed with backslashes separating the path components. On UNIX systems, the path would be printed C:/Perl/lib/CGI, because there C: is treated as an ordinary directory name.
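A few of the functions in Table 18.1 can be tried with a short sketch. It calls the File::Spec::Unix class directly so that the results are the same on any platform; the path components are made up for illustration.

```perl
use strict;
use warnings;
use File::Spec::Unix;

# Join components into a path, using the UNIX delimiter.
my $file = File::Spec::Unix->catfile("usr", "local", "lib", "CGI.pm");

# Logical cleanup only; the file system is never consulted.
my $clean = File::Spec::Unix->canonpath("/usr//local/./lib/");

# Split a path into its volume, directory, and filename portions.
my ($volume, $dirs, $name) =
    File::Spec::Unix->splitpath("/usr/local/lib/CGI.pm");

print "$file\n";                    # usr/local/lib/CGI.pm
print "$clean\n";                   # /usr/local/lib
print "dirs=$dirs name=$name\n";    # dirs=/usr/local/lib/ name=CGI.pm
```

In a real program you would normally call File::Spec itself and let it pick the class for the current operating system.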

18.1.2. Directory and File Attributes

UNIX

The most common type of file is a regular file. It contains data, an ordered sequence of bytes. The data can be text data or binary data. Information about the file is stored in a system data structure called an inode. The information in the inode consists of such attributes as the link count, the owner, the group, mode, size, last access time, last modification time, and type. The UNIX ls command lets you see the inode information for the files in your directory. This information is retrieved by the stat system call. Perl's stat function also gives you information about the file. It retrieves the device number, inode number, mode, link count, user ID, group ID, size in bytes, time of last access, and so on. (See "The stat and lstat Functions" on page 710.)
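As a rough sketch of what stat returns, the script below creates a scratch file with File::Temp (so that it can run anywhere) and prints a few of the inode attributes listed above. The variable names follow the order documented for stat.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file; it is removed when the script exits.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close($fh) or die "close $path: $!\n";

my @info = stat($path) or die "stat $path: $!\n";
my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
    $atime, $mtime, $ctime) = @info;

printf "inode number: %d\n",   $ino;
printf "permissions:  %04o\n", $mode & 07777;   # mask off the file-type bits
printf "link count:   %d\n",   $nlink;
printf "size:         %d bytes\n", $size;
printf "modified:     %s\n",   scalar localtime($mtime);
```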

A directory is a special file maintained by the UNIX kernel. It is composed of a list of filenames. Each filename has a corresponding number that points to the information about the file. The number, called an inode number, is a pointer to an inode. The inode contains information about the file as well as a pointer to the location of the file's data blocks on disk. The following functions allow you to manipulate directories, change permissions on files, create links, etc.

Directory Entry
Inode #	Filename

Windows

Files and directories contain data as well as metainformation that describes attributes of a file or directory. The four basic attributes of Win32 files and directories are ARCHIVE, HIDDEN, READONLY, and SYSTEM. See Table 18.2.

Table 18.2. Basic File and Directory Attributes
Attribute	Description
ARCHIVE	Set when file content changes
HIDDEN	A file not shown in a directory listing
READONLY	A file that cannot be changed
SYSTEM	Special system files, such as IO.SYS and MSDOS.SYS, normally invisible

To retrieve and set file attributes, use the standard Perl extension Win32::File. All of the functions return FALSE (0) if they fail, unless otherwise noted. The function names are exported into the caller's namespace by request. See Table 18.3.

Table 18.3. Win32::File Functions
Function	What It Does
GetAttributes(Filename, ReturnedAttributes)	Gets attributes of a file or directory. ReturnedAttributes will be set to the ORed combination of the file's attributes.
SetAttributes(Filename, NewAttributes)	Sets the attributes of a file or directory. NewAttributes must be an ORed combination of the attributes.

To retrieve file attributes, use Win32::File::GetAttributes($Path, $Attributes), and to set file attributes, use Win32::File::SetAttributes($Path, $Attributes). See Table 18.4. The Win32::File module also provides a number of constants; see Example 18.3.

Table 18.4. Win32::File Attributes
Attribute	Description
ARCHIVE	Set when file content changes. Used by backup programs.
COMPRESSED	Windows compressed file, not a zip file. Cannot be set by the user.
DIRECTORY	File is a directory. Cannot be set by the user.
HIDDEN	A file not shown in a directory listing.
NORMAL	A normal file. ARCHIVE, HIDDEN, READONLY, and SYSTEM are not set.
OFFLINE	Data is not available.
READONLY	A file that cannot be changed.
SYSTEM	Special system files, such as IO.SYS and MSDOS.SYS, normally invisible.
TEMPORARY	File created by some program.

Example 18.3.

1   use Win32::File;
2   $File='C:\Drivers';
3   Win32::File::GetAttributes($File, $attr) or die;
4   print "The attribute value returned is: $attr.\n";
5   if ( $attr ){
6       if ($attr & READONLY){
            print "File is readonly.\n";
        }
        if ($attr & ARCHIVE){
            print "File is archive.\n";
        }
        if ($attr & HIDDEN){
            print "File is hidden.\n";
        }
        if ($attr & SYSTEM){
            print "File is a system file.\n";
        }
        if ($attr & COMPRESSED){
            print "File is compressed.\n";
        }
        if ($attr & DIRECTORY){
            print "File is a directory.\n";
        }
        if ($attr & NORMAL){
            print "File is normal.\n";
        }
        if ($attr & OFFLINE){
            print "File is offline.\n";
        }
        if ($attr & TEMPORARY){
            print "File is temporary.\n";
        }
    }
    else{
7       print Win32::FormatMessage(Win32::GetLastError),"\n";
    }

(Output)
4   The attribute value returned is: 18.
    File is hidden.
    File is a directory.

Explanation

The Win32::File module is loaded.
The folder Drivers on the C:\ drive is assigned to $File.
The GetAttributes function is called with two arguments: the first is the name of the file, and the second is a variable that the function fills in with the bitwise ORed value of the attribute constants, READONLY, HIDDEN, etc. Note the GetAttributes function is called with a fully qualified package name. That is because it is listed in @EXPORT_OK in the Win32::File module and must be either specifically requested by the user or given a fully qualified name. If specifically requested, all of the constants would have to be listed as well or they will not be imported into the user's namespace.
The value of the ored attributes is printed. If the value is 0, something is wrong, and an error will be formatted and printed from line 7.
If one of the attributes for a file or directory is present, the following tests will show which ones were returned describing the file or directory.
By bitwise logically anding the value of $attr with the value of a constant (in this case, READONLY), if the resulting value is true (nonzero), the file is read-only.
This function will produce a human-readable error message coming from the last error reported by Windows.
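The bitwise test in Example 18.3 can be tried on any platform with a small sketch. The hash below is a stand-in defined only for this illustration; its values are assumed to mirror the Windows attribute bits, and on Windows the real constants come from Win32::File.

```perl
use strict;
use warnings;

# Stand-in constants for illustration only (not imported from Win32::File).
my %FLAG = (
    READONLY  => 1,
    HIDDEN    => 2,
    SYSTEM    => 4,
    DIRECTORY => 16,
    ARCHIVE   => 32,
);

my $attr = 18;    # the value printed in Example 18.3

# Keep each name whose bit is turned on in $attr, in order of bit value.
my @set = grep { $attr & $FLAG{$_} }
          sort { $FLAG{$a} <=> $FLAG{$b} } keys %FLAG;

print "Attributes set: @set\n";    # Attributes set: HIDDEN DIRECTORY
```

The value 18 is 2 + 16, which is why only the HIDDEN and DIRECTORY tests succeed in the example's output.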

18.1.3. Finding Directories and Files

The File::Find module lets you traverse a file system tree for specified files or directories based on some criteria, like the UNIX find command or the Perl find2perl translator.

Format

use File::Find;
find(\&wanted, '/dir1', '/dir2');
sub wanted { ... }

The first argument to find() is either a hash reference describing the operations to be performed for each file or a reference to a subroutine (a code reference). Type perldoc File::Find for details. The wanted() function does whatever verification you want for the file. $File::Find::dir contains the current directory name, and $_ is assigned the current filename within that directory. $File::Find::name contains the complete pathname to the file. You are chdir()ed to $File::Find::dir when the function is called, unless no_chdir was specified.

Table 18.5. Hash Reference Keys for File::Find
Key	Value
bydepth	Reports directory name after all entries have been reported.
follow	Follows symbolic links.
follow_fast	Similar to follow but may report files more than once.
follow_skip	Processes files (but not directories and symbolic links) only once.
no_chdir	Doesn't chdir to each directory as it recurses.
untaint	Needed when -T (taint mode) is turned on; directory names are untainted with untaint_pattern before find() changes into them.
untaint_pattern	This should be set using the qr quoting operator. The default is set to qr|^([-+@\w./]+)$|.
untaint_skip	If set, directories (subtrees) that fail the untaint_pattern are skipped. The default is to die in such a case.
wanted	Used to call the wanted function.
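The hash-reference form of the first argument might be used as in the following sketch. It builds a small throwaway tree with File::Temp so that it can run anywhere; the directory names docs and images are hypothetical.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Build a scratch tree: <top>/docs/images
my $top = tempdir(CLEANUP => 1);
mkdir(File::Spec->catdir($top, "docs"))
    or die "mkdir docs: $!\n";
mkdir(File::Spec->catdir($top, "docs", "images"))
    or die "mkdir images: $!\n";

my @dirs;
find(
    {
        # With no_chdir set, find() stays put, so test the full pathname.
        wanted   => sub { push @dirs, $File::Find::name if -d $File::Find::name },
        no_chdir => 1,
    },
    $top
);

print "$_\n" for sort @dirs;
```

Three directories are reported: the top of the tree and the two created beneath it.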

Example 18.4.

(UNIX)
1   use File::Find;
2   find(\&wanted, '/httpd', '/ellie/testing' );
3   sub wanted{
        -d $_ && print "$File::Find::name\n";
    }

(Output)
/httpd
/httpd/php
/httpd/Icons
/httpd/Cgi-Win
/httpd/HtDocs
/httpd/HtDocs/docs
/httpd/HtDocs/docs/images
/httpd/Cgi-Bin
/httpd/Logs
/ellie/testing
/ellie/testing/Exten.dir
/ellie/testing/extension
/ellie/testing/mailstuff
/ellie/testing/mailstuff/mailstuff
/ellie/testing/OBJECTS
/ellie/testing/OBJECTS/polymorph

Explanation

The File::Find module is loaded from the standard Perl library.
The first argument to find() is a reference to a subroutine called wanted, followed by the two directories to be searched.
The wanted function will check that each name is a directory (-d) and list the full pathname of all subdirectories found. $_ is assigned the name of the current file or directory in the search.

Example 18.5.

(Windows)
1   use File::Find;
2   use Win32::File;
    # Works on both FAT and NTFS file systems.
3   &File::Find::find(\&wanted,"C:\\httpd", "C:\\ellie\\testing");
4   sub wanted{
5       (Win32::File::GetAttributes($_,$attr)) &&
            ($attr & DIRECTORY) &&
            print "$File::Find::name\n";
    }

(Output)
C:\httpd
C:\httpd/php
C:\httpd/Icons
C:\httpd/Cgi-Win
C:\httpd/HtDocs
C:\httpd/HtDocs/docs
C:\httpd/HtDocs/docs/images
C:\httpd/Cgi-Bin
C:\httpd/Logs
C:\ellie\testing
C:\ellie\testing/Exten.dir
C:\ellie\testing/extension
C:\ellie\testing/mailstuff
C:\ellie\testing/mailstuff/mailstuff
C:\ellie\testing/OBJECTS
C:\ellie\testing/OBJECTS/polymorph

Explanation

The File::Find module is loaded from the standard Perl library.
The Win32::File module is loaded from the standard Perl library, from the site-specific directory for Win32 systems. It will be used to retrieve file or directory attributes.
The first argument to find() is a reference to a subroutine called wanted, followed by the two directories to be searched.
The wanted function is defined.
The wanted function will check that each name is a directory by calling the GetAttributes function (Win32::File::GetAttributes) and will list the full pathname of all subdirectories found. $_ is assigned the name of the current file or directory in the search.

18.1.4. Creating a Directory—The mkdir Function

UNIX

The mkdir function creates a new, empty directory with the specified permissions (mode), as modified by the process's umask. The permissions are set as an octal number. The entries for the . and .. directories are automatically created. The mkdir function returns 1 if successful and 0 if not. If mkdir fails, the system error is stored in Perl's $! variable.

Windows

If creating a directory at the MS-DOS prompt, the permission mask has no effect. Permissions on Win32 don't use the same mechanism as UNIX. For files on FAT partitions (which means all files on Windows 95), you don't have to set permissions explicitly on a file. All files are available to all users, and the directory is created with all permissions turned on for everyone.

Format

mkdir(FILENAME, MODE);    (UNIX)
mkdir(FILENAME);          (Windows)

Example 18.6.

(The Command Line)
1   $ perl -e 'mkdir("joker", 0755);'     # UNIX
2   $ ls -ld joker
    drwxr-xr-x  2 ellie   512 Mar  7 13:43 joker
3   $ perl -e "mkdir(joker);"             # Windows

Explanation

The first argument to the mkdir function is the name of the directory. The second argument specifies the mode, or permissions, of the file. The permissions, 0755, specify that the file will have read, write, and execute permission for the owner; read and execute for the group; and read and execute for the others. (Remember that without execute permission, you cannot access a directory.)
The ls -ld command prints a long listing of the directory file with information about the file, the inode information. The leading d is for directory, and the permissions are rwxr-xr-x.
On Win32 systems, the directory is created with all permissions turned on for everyone.

Example 18.7.

# This script is called "makeit"
1   die "$0 <directory name> " unless $#ARGV == 0;
2   mkdir ($ARGV[0], 0755 ) || die "mkdir: $ARGV[0]: $!\n";

(At The Command Line)
    $ makeit
1   makeit <directory name> at makeit line 3.
    $ makeit joker
2   mkdir: joker: File exists
    $ makeit cabinet
    $ ls -d cabinet
    cabinet

Explanation

If the user doesn't provide a directory name as an argument to the script, the die function prints an error message and the script exits.
Unless the directory already exists, it will be created.
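The return value and the $! variable can be inspected as in the following sketch, which works inside a scratch directory created by File::Temp. The %! hash used here is a standard Perl feature that reports which errno name is currently set.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $parent = tempdir(CLEANUP => 1);                 # scratch area for the demo
my $dir    = File::Spec->catdir($parent, "joker");

my $first  = mkdir($dir, 0755);     # true: the directory is created
my $second = mkdir($dir, 0755);     # false: the directory already exists
my $why    = "$!";                  # save the error text immediately
my $exists = $!{EEXIST} ? 1 : 0;    # true when the errno is "File exists"

print "First mkdir succeeded.\n" if $first;
print "Second mkdir failed: $why\n" unless $second;
```

Testing $!{EEXIST} lets a script treat "already there" differently from a real failure such as a permission problem.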

18.1.5. Removing a Directory—The rmdir Function

The rmdir function removes a directory but only if it is empty.

Format

rmdir(DIRECTORY);
rmdir DIRECTORY;

Example 18.8.

(At the Command Line)
1   $ perl -e 'rmdir("joke") || die qq(joke: $!\n)'       # UNIX
    joke: Directory not empty
2   $ perl -e 'rmdir("joker") || die qq(joker: $!\n)'
    joker: No such file or directory
3   $ perl -e "rmdir(joker) || die qq(joker: $!\n);"      # Windows
    joker: No such file or directory

Explanation

The directory joke contains files. It cannot be removed unless it is empty. The $! variable contains the system error Directory not empty.
The directory joker does not exist; therefore, it cannot be removed. The system error is stored in $!.
On Win32 systems, rmdir works the same way. You just have to watch the quotes if you are doing this at the MS-DOS prompt. The directory joker is not removed, because it doesn't exist.
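The "only if it is empty" rule can be seen in a short sketch. It works in a scratch directory from File::Temp, and the file name note.txt is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $parent = tempdir(CLEANUP => 1);
my $dir    = File::Spec->catdir($parent, "joke");
my $file   = File::Spec->catfile($dir, "note.txt");

mkdir($dir) or die "mkdir $dir: $!\n";
open(my $fh, '>', $file) or die "open $file: $!\n";
close($fh);

my $while_full = rmdir($dir);       # fails: the directory contains a file
print "rmdir failed: $!\n" unless $while_full;

unlink($file) or die "unlink $file: $!\n";
my $when_empty = rmdir($dir);       # succeeds once the directory is empty
print "Directory removed.\n" if $when_empty;
```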

18.1.6. Changing Directories—The chdir Function

Each process has its own present working directory. When resolving relative path references, this is the starting place for the search path. If the calling process (e.g., your Perl script) changes the directory, it is changed only for that process, not the process that invoked it, normally the shell. When the Perl program exits, the shell returns with the same working directory it started with.

The chdir function changes the current working directory. Without an argument, the directory is changed to the user's home directory. The function returns 1 if successful and 0 if not. The system error code is stored in Perl's $! variable.^[4]

^[4] chdir is a system call provided with Perl for changing directories. The cd command used at the command line is a shell built-in and cannot be used directly in a Perl script.

Format

chdir (EXPR);
chdir EXPR;
chdir;

Example 18.9.

1   $ pwd                                                  # UNIX
    /home/jody/ellie
2   $ perl -e 'chdir "/home/jody/ellie/perl"; print `pwd`'
    /home/jody/ellie/perl
3   $ pwd
    /home/jody/ellie
4   $ perl -e 'chdir("fooler") || die "Cannot cd to fooler: $!\n"'
    Cannot cd to fooler: No such file or directory
5   $ cd                                                   # Windows
    C:\ellie\testing
6   $ perl -e "chdir('fooler') || die qq(Cannot cd to fooler: $!\n);"
    Cannot cd to fooler: No such file or directory

Explanation

This is the present working directory for the shell.
The directory is changed to /home/jody/ellie/perl. When the pwd command is enclosed in backquotes, command substitution is performed, and the present working directory for this process is printed.
Since the Perl program is a separate process invoked by the shell, when Perl changes the present working directory, the directory is changed only while the Perl process is in execution. When Perl exits, the shell returns and its directory is unchanged.
If the attempt to change the directory fails, the die function prints its message to the screen. The system error is stored in the $! variable and then printed.
The present working directory is printed at the MS-DOS prompt. cd prints the present working directory. (At the UNIX prompt, it is used to change directories.)
The attempt to change directory failed as in the preceding UNIX example. If the directory had existed, the present working directory would be changed.
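Inside a script, the standard Cwd module can report the working directory without running pwd. The following is a minimal sketch that changes into a scratch directory created by File::Temp and then returns.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start  = getcwd();                  # where the script was started
my $target = tempdir(CLEANUP => 1);     # a scratch directory to visit

chdir($target) or die "Cannot cd to $target: $!\n";
my $inside = getcwd();
print "Now in $inside\n";

# Go back before exiting so the scratch directory can be removed.
chdir($start) or die "Cannot cd back to $start: $!\n";
my $back = getcwd();
print "Back in $back\n";
```

Note the parentheses around the argument to chdir. Written as chdir $target || die ..., the || would bind to the string rather than to the result of chdir, so die would never run; use parentheses or the lower-precedence or.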

18.1.7. Accessing a Directory via the Directory Filehandle

The following Perl directory functions are modeled after the UNIX system calls sharing the same name. Although the traditional UNIX directory contained a 2-byte inode number and a 14-byte filename (Figure 18.2), not all UNIX systems have the same format. The directory functions allow you to access the directory regardless of its internal structure. The directory functions work the same way with Windows.

Figure 18.2. A UNIX directory.

The opendir Function

The opendir function opens a named directory and attaches it to the directory filehandle. This filehandle has its own namespace, separate from the other types of filehandles used for opening files and filters. The opendir function initializes the directory for processing by the related functions readdir(), telldir(), seekdir(), rewinddir(), and closedir(). The function returns 1 if successful.

Format

opendir(DIRHANDLE, EXPR)

Example 18.10.

1 opendir(MYDIR, "joker");

Explanation

The directory joker is attached to the directory filehandle, MYDIR, and is opened for reading. The name joker must exist and must be a directory.

The readdir Function

A directory can be read by anyone who has read permission on the directory. You can't write to the directory itself even if you have write permission. The write permission on a directory means that you can create and remove files from within the directory, not alter the directory data structure itself.

When we speak about reading a directory with the readdir function, we are talking about looking at the contents of the directory structure maintained by the system. Once the opendir function has opened the directory, readdir in a scalar context returns the next directory entry. In a list context, it returns the rest of the entries in the directory.

Format

readdir(DIRHANDLE);
readdir DIRHANDLE;

The closedir Function

The closedir function closes the directory that was opened by the opendir function.

Format

closedir (DIRHANDLE);
closedir DIRHANDLE;

Example 18.11.

(The Script)
1   opendir(DIR, "..") || die "Can't open: $!\n";   # Open parent directory
2   @parentfiles=readdir(DIR);    # Gets a list of the directory contents
3   closedir(DIR);                # Closes the filehandle
4   foreach $file ( @parentfiles )   # Prints each element of the array
        { print "$file\n";}

(Output)
.
..
filea
fileb
filec
.sh_history
stories

Explanation

The opendir function opens the directory structure and assigns it to DIR, the directory filehandle. The .. (parent) directory is opened for reading.
The readdir function assigns all the rest of the entries in the directory to the array @parentfiles.
The closedir function closes the directory.
The files are printed in the order they are stored in the directory structure. This may not be the order that the ls command prints out the files.
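A common variation, sketched below, stores the directory handle in a lexical variable, drops the . and .. entries, and sorts the rest, since readdir returns entries in no particular order. The scratch directory and its file names are created only for this illustration.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Create a scratch directory holding three empty files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(filec filea fileb)) {
    open(my $fh, '>', File::Spec->catfile($dir, $name))
        or die "create $name: $!\n";
    close($fh);
}

opendir(my $dh, $dir) or die "Can't open $dir: $!\n";
# List context: readdir returns all remaining entries at once.
my @files = sort grep { $_ ne '.' && $_ ne '..' } readdir($dh);
closedir($dh);

print "$_\n" for @files;
```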

The telldir Function

The telldir function returns the current position of the readdir() routines on the directory filehandle. The value returned by telldir may be given to seekdir() to access a particular location in a directory.

Format

telldir(DIRHANDLE);

The rewinddir Function

The rewinddir function sets the position of DIRHANDLE back to the beginning of the directory opened by opendir. It is not supported on all machines.

Format

rewinddir(DIRHANDLE);
rewinddir DIRHANDLE;

The seekdir Function

The seekdir function sets the current position for readdir() on the directory filehandle. The position is set by a value returned by telldir().

Format

seekdir(DIRHANDLE, POS);

Example 18.12.

(The Script)
1   opendir(DIR, ".");    # Opens the current directory
2   while( $myfile=readdir(DIR) ){
3       $spot=telldir(DIR);
4       if ( "$myfile" eq ".login" ) {
            print "$myfile\n";
            last;
        }
    }
5   rewinddir(DIR);
6   seekdir(DIR, $spot);
7   $myfile=readdir(DIR);
    print "$myfile\n";

(Output)
.login
.cshrc

Explanation

The opendir function opens the present working directory for reading.
The while statement is executed, and the readdir function returns the next directory entry from the directory filehandle and assigns the file to the scalar $myfile.
After the readdir function reads a filename, the telldir function marks the location of that read and stores the location in the scalar $spot.
When the .login file is read, the loop is exited.
The rewinddir function resets the position of the DIR filehandle to the beginning of the directory structure.
The seekdir function uses the results of the telldir function to set the current position for the readdir function on the DIR filehandle.
The next directory entry is read by the readdir function and assigned to the scalar $myfile.
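The positioning functions can also be exercised in a self-contained sketch that uses a scratch directory with hypothetical file names. As far as I know, POSIX leaves the outcome unspecified when rewinddir is called between telldir and seekdir, so this version seeks back first and rewinds afterward; it is a cautious variation, not a correction of the example above.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Scratch directory with a few entries to step through.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(alpha beta gamma)) {
    open(my $fh, '>', File::Spec->catfile($dir, $name))
        or die "create $name: $!\n";
    close($fh);
}

opendir(my $dh, $dir) or die "Can't open $dir: $!\n";

my $first  = readdir($dh);      # read one entry
my $spot   = telldir($dh);      # remember the position just after it
my $second = readdir($dh);      # the entry that follows $spot
my $third  = readdir($dh);      # move further along

seekdir($dh, $spot);            # jump back to the remembered position
my $second_again = readdir($dh);

rewinddir($dh);                 # start over from the beginning
my $first_again = readdir($dh);

closedir($dh);

print "seekdir returned to the same entry\n"   if $second eq $second_again;
print "rewinddir returned to the first entry\n" if $first  eq $first_again;
```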

18.1.8. Permissions and Ownership

UNIX

There is one owner for every UNIX file. The one benefit the owner has over everyone else is the ability to change the permissions on the file, thus controlling who can do what to the file. A group may have a number of members, and the owner of the file may change the group permissions on a file so that the group will enjoy special privileges.

Every UNIX file has a set of permissions associated with it to control who can read, write, or execute the file. There are a total of 9 bits that constitute the permissions on a file. The first 3 bits control the permissions of the owner of the file, the second set controls the permissions of the group, and the last set controls the rest of the world; that is, everyone else. The permissions are stored in the mode field of the file's inode.

Windows

Win32 systems do not handle file permissions the way UNIX does. Files are created with read and write turned on for everyone. Files and folders inherit attributes that you can set. By clicking the mouse on a file icon and selecting Properties, you can, in a limited way, select permission attributes, such as Archive, Read-only, and Hidden. See Figure 18.3.

Figure 18.3. File attributes (Windows).

If your platform is Windows NT, you can set file and folder permissions only on drives formatted to use NTFS.^[5] To change permissions, you must be the owner or have been granted permission to do so by the owner. If you are using NTFS, go to Windows Explorer and then locate the file or folder for which you want to set permissions. Right-click the file or folder, click Properties, and then click the Security tab. You will be able to allow, deny, or remove permissions from the group or user.

^[5] NTFS is an advanced file system designed for Windows NT.

See the Win32::FileSecurity module in the Perl Resource Kit for Win32 if you need to maintain file permissions. To retrieve file permissions from a file or directory, use the Win32::FileSecurity::Get($Path, \%Perms) extension, where $Path is the relative or absolute path to the file or directory for which you are seeking permissions, and \%Perms is a reference to a hash containing keys representing the user or group and corresponding values representing the permission mask.

Table 18.6. Win32 Extensions to Manage Files and Directories
Extension	What It Does
Win32::File	Standard module for retrieving and setting file attributes
Win32::File::GetAttributes(path,attribute)	Retrieves file attributes
Win32::File::SetAttributes(path,attribute)	Sets file attributes
Win32::AdminMisc::GetFileInfo	Retrieves file information fields: CompanyName, FileVersion, InternalName, LegalCopyright, OriginalFileName, ProductName, ProductVersion, LangID, and Language

The chmod Function (UNIX)

The chmod function changes permissions on a list of files. The user must own the files to change permissions on them. The files must be quoted strings. The first element of the list is the numeric octal value for the new mode. (The UNIX chmod command also accepts a more convenient mnemonic notation, such as u+x, for changing permissions. Perl's chmod function does not; it accepts only the numeric mode.)

Table 18.7 illustrates the eight possible combinations of numbers used for changing permissions if you are not familiar with this method.

Table 18.7. Permission Modes
Octal	Binary	Permissions	Meaning
0	000	none	All turned off
1	001	--x	Execute
2	010	-w-	Write
3	011	-wx	Write, execute
4	100	r--	Read
5	101	r-x	Read, execute
6	110	rw-	Read, write
7	111	rwx	Read, write, execute

Make sure the first digit is a 0 to indicate an octal number. Do not use the mnemonic mode (e.g., +rx), because all the permissions will be turned off.

The chmod Function (Windows)

ActivePerl supports a limited version of the chmod function. However, it can be used only for giving the owner read/write access. (The group and other bits are ignored.)

The chmod function returns the number of files that were changed.

Format

chmod(LIST);
chmod LIST;

Example 18.13.

(UNIX)
1   $ perl -e '$count=chmod 0755, "foo.p", "boo.p" ;print "$count files changed.\n"'
2   2 files changed.
3   $ ls -l foo.p boo.p
    -rwxr-xr-x  1 ellie    0 Mar  7 12:52 boo.p*
    -rwxr-xr-x  1 ellie    0 Mar  7 12:52 foo.p*

Explanation

The first argument is the octal value 0755. It turns on rwx for the user, r and x for the group and others. The next two arguments, foo.p and boo.p, are the files affected by the change. The scalar $count contains the number of files that were changed.
The value of $count is 2 because both files were changed to 0755.
The output of the UNIX ls -l command is printed, demonstrating that the permissions on files foo.p and boo.p have been changed to 0755.
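
When the mode comes from user input or a configuration file, it arrives as a string rather than as an octal literal. The following is a minimal sketch, assuming a UNIX system, of converting such a string with oct before passing it to chmod and then confirming the result with stat. The temporary file stands in for foo.p and is an assumption of the sketch.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file standing in for foo.p.
my ($fh, $file) = tempfile(UNLINK => 1);
close($fh);

# A mode read from input is a string; oct() converts "0755" (or "755")
# to the number that chmod expects.
my $mode_string = "0755";
my $changed = chmod(oct($mode_string), $file);
$changed == 1 or die "chmod failed on $file: $!\n";

# The third field returned by stat is the mode; mask off the file-type bits.
my $mode = (stat($file))[2] & 07777;
printf "%s now has mode %04o\n", $file, $mode;
```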

The chown Function (UNIX)

The chown function changes the owner and group of a list of files. Only the owner or superuser can invoke it.^[6] The first two elements of the list must be a numerical uid and gid. Each authorized UNIX user is assigned a uid (user identification number) and a gid (group identification number) in the password file.^[7] The function returns the number of files successfully changed.

^[6] On BSD UNIX and some POSIX-based UNIX (Solaris), only the superuser can change ownership.

^[7] To get the uid or gid for a user, the getpwnam or getpwuid functions can be used.

Format

chown(LIST);
chown LIST;

Example 18.14.

(The Script)
1   $uid=9496;
2   $gid=40;
3   $number=chown($uid, $gid, 'foo.p', 'boo.p');
4   print "The number of files changed is $number.\n";

(Output)
4   The number of files changed is 2.

Explanation

The user identification number 9496 is assigned.
The group identification number 40 is assigned.
The chown function changes the ownership on files foo.p and boo.p and returns the number of files changed.

The umask Function (UNIX)

When a file is created, it has a certain set of permissions by default. The permissions are determined by what is called the system mask. On most systems, this mask is 022 and is set by the login program.^[8] A directory has 777 by default (rwxrwxrwx), and a file has 666 by default (rw-rw-rw-). The umask function is used to remove or subtract permissions from the existing mask.

^[8] The user can also set the umask in the .profile (sh or ksh) or .cshrc (csh) initialization files.

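To take write permission away from the "others" permission set, the umask value is subtracted from the maximum permissions allowed per directory or file. (Strictly speaking, the bits set in the umask are cleared from the maximum permissions rather than arithmetically subtracted; for the values shown here, the result is the same.)

  777 (directory)          666 (file)
- 002 (umask value)      - 002 (umask value)
  ---                      ---
  775                      664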

The umask function sets the umask for this process and returns the old one. Without an argument, the umask function returns the current setting.

Format

umask(EXPR)
umask EXPR
umask

Example 18.15.

1   $ perl -e 'printf("The umask is %o.\n", umask);'
    The umask is 22.
2   $ perl -e 'umask 027; printf("The new mask is %o.\n", umask);'
    The new mask is 27.

Explanation

The umask function without an argument returns the current umask value, which printf prints in octal.
The umask function resets the mask to octal 027.
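
The effect of the mask on a newly created file can be checked directly. The sketch below assumes a UNIX system and uses a scratch directory and a file name (masked.txt) that are illustrative only. It sets the mask to 027, creates a file, and compares the resulting permissions with the value obtained by clearing the mask bits from 666 by hand.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # hypothetical scratch directory
my $file = "$dir/masked.txt";       # hypothetical file name

my $old_mask = umask(027);          # set a new mask and keep the old one

# open() requests mode 0666; the bits set in the umask are cleared.
open(my $fh, '>', $file) or die "Can't create $file: $!\n";
close($fh);

my $mode     = (stat($file))[2] & 0777;
my $expected = 0666 & ~027;         # the same masking done by hand

printf "requested 0666, umask 027, got %04o (expected %04o)\n",
    $mode, $expected;

umask($old_mask);                   # restore the original mask
```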

18.1.9. Hard and Soft Links

UNIX

When you create a file, it has one hard link; that is, one entry in the directory. You can create additional links to the file, which are really just different names for the same file. The kernel keeps track of how many links a file has in the file's inode. As long as there is a link to the file, its data blocks will not be released to the system. The advantage to having a file with multiple names is that there is only one set of data, or master file, and that file can be accessed by a number of different names. A hard link cannot span file systems, and the file being linked to must exist at link-creation time.

A soft link is also called a symbolic link and sometimes a symlink. A symbolic link is really just a very small file (it has permissions, ownership, size, etc.). All it contains is the name of another file. When accessing a file that has a symbolic link, the kernel is pointed to the name of the file contained in the symbolic link. For example, a link from thisfile to /usr/bin/joking/otherfile links the name thisfile to /usr/bin/joking/otherfile. When thisfile is opened, otherfile is the file really accessed. Symbolic links can refer to files that do or don't exist and can span file systems and even different computers. They can also point to other symbolic links.^[9]

^[9] Symbolic links originated in BSD and are supported under many AT&T systems. They may not be supported on your system.

Windows

The Win32 system introduced shortcuts, special binary files with a .LNK extension. A shortcut is similar to a UNIX link, but it is processed by a particular application rather than by the system and is an alias for a file or directory. Shortcuts are icons with a little arrow in a white box in the left corner. See the Win32::Shortcut module to create, load, retrieve, save, and modify shortcut properties from a Perl script. (See Figure 18.4.)

Figure 18.4. Shortcuts and the .LNK extension.

The link and unlink Functions (UNIX)

The link function creates a hard link (i.e., two names for the same file) on UNIX systems. The first argument to the link function is the name of an existing file; the second argument is the name of the new file, which cannot already exist. Only the superuser can create a link that points to a directory. Use rmdir when removing a directory.

Format

link(OLDFILENAME, NEWFILENAME);

Example 18.16.

(UNIX)
1   $ perl -e 'link("dodo", "newdodo");'
2   $ ls -li dodo newdodo
    142726 -rw-r--r--  2 ellie    0 Mar  7 13:46 dodo
    142726 -rw-r--r--  2 ellie    0 Mar  7 13:46 newdodo

Explanation

The old file dodo is given an alternative name, newdodo.
The i option to the ls command gives the inode number of the file. If the inode numbers are the same, the files are the same. The old file, dodo, started with one link. The link count is now two. Since dodo and newdodo are linked, they are the same file, and changing one will then change the other. If one link is removed, the other still exists. To remove a file, all hard links to it must be removed.

The unlink function deletes a list of files on both UNIX and Windows systems (like the UNIX rm command or the MS-DOS del command). If the file has more than one link, the link count is dropped by 1. The function returns the number of files successfully deleted. To remove a directory, use the rmdir function, since only the superuser can unlink a directory with the unlink function.

Format

unlink (LIST);
unlink LIST;

Example 18.17.

(The Script)
1   unlink('a','b','c') || die "remove: $!\n";
2   $count=unlink <*.c>;
    print "The number of files removed was $count\n";

Explanation

The files a, b, and c are removed.
Any files ending in .c (C source files) are removed. The number of files removed is stored in the scalar $count.
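
The link-count behavior described in this section can be observed from within a script. The following is a minimal sketch, assuming a UNIX file system that supports hard links; the scratch directory is an assumption of the sketch, and the names dodo and newdodo are borrowed from Example 18.16. It links a file, compares inode numbers, and then shows that unlink counts only the names it actually removed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);            # hypothetical scratch directory
my ($old, $new) = ("$dir/dodo", "$dir/newdodo");

open(my $fh, '>', $old) or die "Can't create $old: $!\n";
close($fh);

link($old, $new) or die "Can't link $old to $new: $!\n";

# Fields 1 and 3 of stat are the inode number and the link count.
my ($inode_old, $links_before) = (stat($old))[1, 3];
my ($inode_new)                = (stat($new))[1];

# unlink returns the number of names it removed; the second name in this
# list does not exist, so only one removal is counted.
my $removed     = unlink($old, "$dir/no_such_file");
my $links_after = (stat($new))[3];

print "same inode: ", ($inode_old == $inode_new ? "yes" : "no"), "\n";
print "links before: $links_before, removed: $removed, ",
      "links after: $links_after\n";
```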

The symlink and readlink Functions (UNIX)

The symlink function creates a symbolic link. The new filename is the symbolic link; when it is referenced, the file named by the old filename is the one accessed.

Format

symlink(OLDFILE, NEWFILE)

Example 18.18.

1   $ perl -e 'symlink("/home/jody/test/old", "new");'
2   $ ls -ld new
    lrwxrwxrwx  1 ellie    8 Feb 21 17:32 new -> /home/jody/test/old

Explanation

The symlink function creates a new filename, new, linked to the old filename, /home/jody/test/old.
The ls -ld command lists the symbolically linked file. The symbol -> points from the new filename to the old filename it references. The l preceding the permissions also indicates a symbolic link file.

The readlink function returns the value of the symbolic link and is undefined if the file is not a symbolic link.

Format

readlink(SYMBOLIC_LINK);
readlink SYMBOLIC_LINK;

Example 18.19.

1   $ perl -e 'print readlink("new"), "\n";'
    /home/jody/test/old

Explanation

The file new is a symbolic link. It points to /home/jody/test/old, the value returned by the readlink function.
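
The two functions can be combined in one script. This is a minimal sketch, assuming a system that supports symbolic links; the scratch directory is an assumption, and the names old and new follow Example 18.18. It also shows that readlink returns an undefined value for a file that is not a symbolic link.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);             # hypothetical scratch directory
my ($target, $linkname) = ("$dir/old", "$dir/new");

open(my $fh, '>', $target) or die "Can't create $target: $!\n";
close($fh);

symlink($target, $linkname) or die "Can't create symbolic link: $!\n";

my $points_to = readlink($linkname);      # the name stored in the link
my $plain     = readlink($target);        # undef: $target is not a link
my $is_link   = -l $linkname ? 1 : 0;     # -l tests the link itself

print "$linkname -> $points_to\n";
print "$target is not a symbolic link\n" unless defined $plain;
```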

18.1.10. Renaming Files

The rename Function (UNIX and Windows)

The rename function changes the name of the file, like the UNIX mv command. The effect is to create a new link to an existing file and then remove the old link. The rename function returns 1 for success and returns 0 for failure. This function does not work across file system boundaries. If a file with the new name already exists, its contents will be destroyed.

Format

rename(OLDFILENAME, NEWFILENAME);

Example 18.20.

1 rename ("tmp", "datafile");

Explanation

The file tmp is renamed datafile. If datafile already exists, its contents are destroyed.
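
Because rename silently replaces an existing file, a script may want to check first. The helper below, rename_no_clobber, is a hypothetical name introduced for this sketch and is not a Perl built-in; the scratch directory and file names are also assumptions. The check and the rename are two separate steps, so another process could still create the file in between.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: refuse to rename if the new name is already taken.
sub rename_no_clobber {
    my ($old, $new) = @_;
    return 0 if -e $new;                 # do not destroy an existing file
    return rename($old, $new) ? 1 : 0;
}

my $dir = tempdir(CLEANUP => 1);         # hypothetical scratch directory
for my $name ("tmp", "datafile") {
    open(my $fh, '>', "$dir/$name") or die "Can't create $name: $!\n";
    close($fh);
}

my $first  = rename_no_clobber("$dir/tmp", "$dir/datafile");   # refused
my $second = rename_no_clobber("$dir/tmp", "$dir/datafile2");  # allowed

print "first attempt: $first, second attempt: $second\n";
```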

18.1.11. Changing Access and Modification Times

The utime Function

The utime function changes the access and modification times on each file in a list of files, like the UNIX touch command. The first two elements of the list must be the numerical access and modification times, in that order. The time function can be used to supply the current time to the utime function. The function returns the number of files successfully changed. The inode change time of each file in the list is set to the current time.

Format

utime (LIST);
utime LIST;

Example 18.21.

(The Script--UNIX)
1   print "What file will you touch (create or change time stamp)? ";
    chop($myfile=<STDIN>);
2   $now=time;     # This example makes the file if it doesn't exist
3   utime( $now, $now, $myfile) || open(TMP,">>$myfile");^[a]

(The Command Line)
    $ ls -l brandnewfile
    brandnewfile: No such file or directory
    $ update.p
1   What file will you touch (create or change time stamp)? brandnewfile
    $ ls -l brandnewfile
2   -rw-r--r--  1 ellie    0 Mar  6 17:13 brandnewfile

Explanation

The user will enter the name of a file either to update the access and modification times or, if the file does not exist, to create it.
The variable $now is set to the return value of the time function, the number of nonleap seconds since January 1, 1970, UTC.
The first argument to utime is the access time, the second argument is the modification time, and the third argument is the file affected. If the utime function fails because the file does not exist, the open function will create the file, using TMP as the filehandle, emulating the UNIX touch command.

^[a] Wall, L., Christiansen, T., and Orwant, J., Programming Perl, 3rd ed., O'Reilly & Associates: Sebastopol, CA, 2000.
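
The same idea can be wrapped in a subroutine. The following is a minimal sketch; touch is a name chosen here for illustration (it is not a Perl built-in), and the scratch directory is an assumption. The subroutine updates the times if the file exists and creates the file otherwise.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper that imitates the UNIX touch command for one file.
sub touch {
    my ($file) = @_;
    my $now = time;
    return 1 if utime($now, $now, $file);     # file exists: times updated
    open(my $fh, '>>', $file) or return 0;    # otherwise create it
    close($fh);
    return 1;
}

my $dir  = tempdir(CLEANUP => 1);             # hypothetical scratch directory
my $file = "$dir/brandnewfile";

my $created = touch($file);                   # file did not exist yet

my $old = time - 3600;                        # pretend the file is an hour old
utime($old, $old, $file) or die "Can't set times: $!\n";

my $updated = touch($file);                   # file exists now
my $mtime   = (stat($file))[9];               # field 9: modification time

print "created: $created, updated: $updated\n";
```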

18.1.12. File Statistics

The information for a file is stored in a data structure called an inode, maintained by the kernel. For UNIX users, much of this information is retrieved with the ls command. In C and Perl programs, this information may be retrieved directly from the inode with the stat function. See the File::stat module, which creates a user interface for the stat function. Although the emphasis here is UNIX, the stat function also works with Win32 systems.

The stat and lstat Functions

The stat function returns a 13-element array containing statistics retrieved from the file's inode. The last two fields, dealing with blocks, are defined only on BSD UNIX systems.^[10]

^[10] Wall, L., Christiansen, T., and Orwant, J., Programming Perl, 3rd ed., O'Reilly & Associates: Sebastopol, CA, 2000, p. 188.

The lstat function is like the stat function, but if the file is a symbolic link, lstat returns information about the link itself rather than about the file it references. If your system does not support symbolic links, a normal stat is done.

The special underscore filehandle is used to provide stat information from the file most recently examined with stat. The 13-element array returned contains the following elements stored in the stat structure. (The order is a little different from the UNIX system call stat.)

Device number
Inode number
Mode
Link count
User ID
Group ID
For a special file, the device number of the device it refers to
Size in bytes, for regular files
Time of last access
Time of the last modification
Time of last file status change
Preferred I/O block size for file system
Actual number of 512-byte blocks allocated

Format

stat(FILEHANDLE);
stat FILEHANDLE;
stat(EXPR);

Example 18.22.

(UNIX)
1   open(MYFILE, "perl1") || die "Can't open: $!\n";
2   @statistics=stat(MYFILE);
3   print "@statistics\n";
    close MYFILE;
4   @stats=stat("perl1");
5   printf("The inode number is %d and the uid is %d.\n",
           $stats[1], $stats[4]);
6   print "The file has read and write permissions.\n" if -r _ && -w _;

(Output)
3   1819 142441 33261 1 9496 40 -21335 75 761965998 727296409 8192 2
5   The inode number is 142441 and the uid is 9496.
6   The file has read and write permissions.

Explanation

The file perl1 is opened via the filehandle MYFILE.
The stat function retrieves information from the file's inode and returns that information to a 13-element array, @statistics.
The 13-element array is printed. The last two elements of the array are the block size and the number of blocks in 512-byte blocks. The size and number of blocks may differ because unallocated blocks are not counted in the number of blocks. The negative number is an NIS device number.
This time the stat function takes the filename as its argument, rather than the filehandle.
The second and fifth elements of the array are printed.
The special underscore (_) filehandle is used to retrieve the current file statistics from the previous stat call. The file perl1 was stated last. The file test operators, -r and -w, use the current stat information of perl1 to check for read and write access on the file.
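
The File::stat module mentioned earlier replaces the numeric array indexes with named methods. The following is a minimal sketch of that interface; the temporary file and its contents are assumptions made so that the example is self-contained.

```perl
use strict;
use warnings;
use File::stat;                      # stat now returns an object
use File::Temp qw(tempfile);

# Hypothetical scratch file containing six bytes.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close($fh);

my $st = stat($file) or die "Can't stat $file: $!\n";

# Named methods take the place of $stats[1], $stats[3], $stats[4], ...
printf "inode %d, links %d, uid %d, size %d bytes, mode %04o\n",
    $st->ino, $st->nlink, $st->uid, $st->size, $st->mode & 07777;

my $size  = $st->size;
my $links = $st->nlink;
```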

Example 18.23.

(Windows)
    # Since UNIX and Windows treat files differently,
    # some of the fields here are
    # blank or values returned are not meaningful
1   @stats = stat("C:\\ellie\\testing");
2   print "Device: $stats[0]\n";
3   print "Inode #: $stats[1]\n";
4   print "File mode: $stats[2]\n";
5   print "# Hard links: $stats[3]\n";
6   print "Owner ID: $stats[4]\n";
7   print "Group ID: $stats[5]\n";
8   print "Device ID: $stats[6]\n";
9   print "Total size: $stats[7]\n";
10  print "Last access time: $stats[8]\n";
11  print "Last modify time: $stats[9]\n";
12  print "Last change inode time: $stats[10]\n";
13  print "Block size: $stats[11]\n";
14  print "Number of blocks: $stats[12]\n";

(Output)
2   Device: 2
3   Inode #: 0
4   File mode: 16895
5   # Hard links: 1
6   Owner ID: 0
7   Group ID: 0
8   Device ID: 2
9   Total size: 0
10  Last access time: 981360000
11  Last modify time: 977267374
12  Last change inode time: 977267372
13  Block size:
14  Number of blocks:

18.1.13. Low-Level File I/O

The read Function (fread)

The read function reads a specified number of bytes from a filehandle and puts the bytes in a scalar variable. If you are familiar with C's standard I/O fread function, Perl's read function handles I/O buffering in the same way. To improve efficiency, rather than reading a character at a time, a block of data is read and stored in a temporary storage area. C's fread function and Perl's read function then transfer data, a byte at a time, from the temporary storage area to your program. (The sysread function is used to emulate C's low-level I/O read function.) The function returns the number of bytes read or an undefined value if an error occurred. If EOF (end of file) is reached, 0 is returned.

In Perl, the print function (not the write function) is used to output the actual bytes returned by the read function. Perl's print function emulates C's fwrite function.

Format

read(FILEHANDLE, SCALAR, LENGTH, OFFSET);
read(FILEHANDLE, SCALAR, LENGTH);

Example 18.24.

(The Script)
1   open(PASSWD, "/etc/passwd") || die "Can't open: $!\n";
2   $bytes=read (PASSWD, $buffer, 50);
3   print "The number of bytes read is $bytes.\n";
4   print "The buffer contains: \n$buffer";

(Output)
3   The number of bytes read is 50.
4   The buffer contains:
    root:YhTLR4heBdxfw:0:1:Operator:/:/bin/csh
    nobody:

Explanation

The /etc/passwd file is opened for reading via the PASSWD filehandle.
The read function attempts to read 50 bytes from the filehandle and returns the number of bytes read to the scalar $bytes.
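
The three possible return values of read (a byte count, 0 at end of file, and an undefined value on error) suggest a loop such as the following. This is a minimal sketch; the temporary file, its contents, and the 16-byte chunk size are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data written to a scratch file.
my $data = "root:x:0:1:Operator:/:/bin/csh\nnobody:x:60001:60001::/:\n";
my ($tmp, $file) = tempfile(UNLINK => 1);
print {$tmp} $data;
close($tmp);

open(my $in, '<', $file) or die "Can't open $file: $!\n";
binmode($in);                              # count bytes, not characters

my ($total, $contents) = (0, '');
while (1) {
    my $bytes = read($in, my $buffer, 16); # ask for 16 bytes at a time
    die "Read error: $!\n" unless defined $bytes;
    last if $bytes == 0;                   # end of file
    $total    += $bytes;
    $contents .= $buffer;
}
close($in);

print "Read $total bytes in 16-byte chunks.\n";
```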

The sysread and syswrite Functions

The sysread function is like C's read function. It bypasses the standard I/O buffering scheme and reads bytes directly from the filehandle to a scalar variable. Mixing read and sysread functions can cause problems, since the read function implements buffering, and the sysread function reads bytes directly from the filehandle.

The syswrite function writes bytes of data from a variable to a specified filehandle. It emulates C's write function.

Format

sysread(FILEHANDLE, SCALAR, LENGTH, OFFSET);
sysread(FILEHANDLE, SCALAR, LENGTH);
syswrite(FILEHANDLE, SCALAR, LENGTH, OFFSET);
syswrite(FILEHANDLE, SCALAR, LENGTH);
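
A typical use of the two functions together is an unbuffered copy loop. The following is a minimal sketch; copy_unbuffered is a hypothetical helper name, and the scratch files and the 4096-byte chunk size are assumptions. Because syswrite may write fewer bytes than requested, the inner loop uses the OFFSET argument to write whatever remains.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: copy $src to $dst using only sysread and syswrite.
sub copy_unbuffered {
    my ($src, $dst) = @_;
    open(my $in,  '<', $src) or die "Can't read $src: $!\n";
    open(my $out, '>', $dst) or die "Can't write $dst: $!\n";
    binmode($in);
    binmode($out);

    my $copied = 0;
    while (1) {
        my $got = sysread($in, my $buffer, 4096);
        die "sysread failed: $!\n" unless defined $got;
        last if $got == 0;                       # end of file

        my $offset = 0;                          # bytes of $buffer written so far
        while ($offset < $got) {
            my $written = syswrite($out, $buffer, $got - $offset, $offset);
            die "syswrite failed: $!\n" unless defined $written;
            $offset += $written;
        }
        $copied += $got;
    }
    close($in);
    close($out) or die "Can't close $dst: $!\n";
    return $copied;
}

my $dir     = tempdir(CLEANUP => 1);             # hypothetical scratch directory
my $payload = "line one\nline two\n" x 500;      # sample data, 9000 bytes
open(my $fh, '>', "$dir/source") or die "Can't create source: $!\n";
print {$fh} $payload;
close($fh);

my $copied = copy_unbuffered("$dir/source", "$dir/copy");
print "Copied $copied bytes.\n";
```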

The seek Function

Perl's seek function is the same as the fseek standard I/O function in C. It allows you to randomly access a file. It sets a position in a file, measured in bytes from the beginning of the file, where the first byte is byte 0. The function returns 1 if successful, 0 if not.

Format

seek(FILEHANDLE, OFFSET, POSITION);

POSITION = The reference point in the file from which OFFSET is measured:
0 = Beginning of file
1 = Current position in file
2 = End of file

OFFSET = Number of bytes from POSITION. A positive offset advances the position forward in the file. A negative offset moves the position backward in the file, so a negative OFFSET is valid only when POSITION is 1 or 2.

Example 18.25.

(The Script)
1   open(PASSWD, "/etc/passwd") || die "Can't open: $!\n";
2   while ( chomp($line = <PASSWD>) ){
3       print "---$line---\n" if $line =~ /root/;
    }
4   seek(PASSWD, 0, 0) || die "$!\n";   # Start back at the beginning
                                        # of the file at first byte
5   while(<PASSWD>){print if /ellie/;}
6   close(PASSWD);

(Output)
3   ---root:YhTLR4heBdxfw:0:1:Operator:/:/bin/csh---
5   ellie:aVD17JSsBMyGg:9496:40:Ellie Savage:/home/jodyellie:/bin/csh

Explanation

The /etc/passwd file is opened via the PASSWD filehandle.
The while statement loops through the PASSWD filehandle, reading a line at a time until end of file is reached.
The line is printed if it contains the regular expression root.
The seek function sets the file position at position 0, the beginning of the file, byte offset 0. Since the filehandle was not closed, it remains open until closed with the close function.
The while statement loops through the file.
The PASSWD filehandle is officially closed.
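
The three POSITION values can be written symbolically with the constants exported by the standard Fcntl module (SEEK_SET, SEEK_CUR, and SEEK_END correspond to 0, 1, and 2). The following is a minimal sketch; the ten-byte scratch file is an assumption chosen so that the byte positions are easy to follow.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);                 # SEEK_SET, SEEK_CUR, SEEK_END
use File::Temp qw(tempfile);

# Hypothetical scratch file: byte 0 is "0", byte 9 is "9".
my ($tmp, $file) = tempfile(UNLINK => 1);
print {$tmp} "0123456789";
close($tmp);

open(my $fh, '<', $file) or die "Can't open $file: $!\n";

seek($fh, 2, SEEK_SET) or die "seek: $!\n";   # 2 bytes from the beginning
read($fh, my $from_start, 2);                 # reads "23"; position is now 4

seek($fh, 2, SEEK_CUR) or die "seek: $!\n";   # skip 2 bytes forward, to 6
read($fh, my $from_current, 2);               # reads "67"

seek($fh, -3, SEEK_END) or die "seek: $!\n";  # 3 bytes back from the end
read($fh, my $from_end, 3);                   # reads "789"

close($fh);
print "$from_start $from_current $from_end\n";
```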

The tell Function

The tell function returns the current byte position of a filehandle for a regular file. The position can be used as an argument to the seek function to move the file position to a particular location in the file.

Format

tell (FILEHANDLE);
tell FILEHANDLE;
tell;

Example 18.26.

(The Script)
1   open(PASSWD, "/etc/passwd") || die "Can't open: $!\n";
    while ( chomp($line = <PASSWD>) ){
        if ( $line =~ /sync/){
2           $current = tell;
            print "---$line---\n";
        }
    }
3   printf "The position returned by tell is %d.\n", $current;
4   seek(PASSWD, $current, 0);
    while(<PASSWD>){
5       print;
    }

(Output)
2   ---sync::1:1::/:/bin/sync---
3   The position returned by tell is 296.
5   sysdiag:*:0:1:Old System Diagnostic:/usr/diag/sysdiag/sysdiag
    sundiag:*:0:1:System Diagnostic:/usr/diag/sundiag/sundiag
    ellie:aVD17JSsBMyGg:9496:40:Ellie Savage:/home/jodyellie:/bin/csh

Explanation

The /etc/passwd file is opened via the PASSWD filehandle.
When the line containing the regular expression sync is reached, the tell function will return the current byte position to the scalar $current. The current position is the next byte after the last character read.
The byte position returned by the tell function is printed.
The seek function locates the current position starting from the beginning of the file to the offset position returned by the tell function.
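
Recording the value of tell before each line is read gives a simple index that seek can later use to jump straight to any line. The following is a minimal sketch of that idea; the three-line scratch file is an assumption chosen for illustration and is not the /etc/passwd file of the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical three-line sample file.
my ($tmp, $file) = tempfile(UNLINK => 1);
print {$tmp} "root:x:0:1\n", "sync:x:1:1\n", "ellie:x:9496:40\n";
close($tmp);

open(my $fh, '<', $file) or die "Can't open $file: $!\n";

# Remember the byte position at which each line starts.
my @offsets;
while (1) {
    my $pos  = tell($fh);
    my $line = <$fh>;
    last unless defined $line;
    push @offsets, $pos;
}

# Jump directly to the start of the third line and read it.
seek($fh, $offsets[2], 0) or die "seek: $!\n";
my $third = <$fh>;
close($fh);

print "Line 3 starts at byte $offsets[2]: $third";
```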

18.1.14. Packing and Unpacking Data

Remember the printf and sprintf functions? They were used to format their arguments as floating point numbers, decimal numbers, strings, etc. But the pack and unpack functions take this formatting a step further. Both functions act on strings that can be represented as bits, bytes, integers, long integers, floating point numbers, etc. The format type tells both pack and unpack how to handle these strings.

The pack and unpack functions have a number of uses. These functions are used to pack a list into a binary structure and then expand the packed values back into a list. When working with files, you can use these functions to create uuencode files, relational databases, and binary files.

Not all files are text files. Some files are packed into a binary format to save space or to store images; others are uuencoded so that they can be sent through the mail. These files are not readable the way the text on this page is. The pack and unpack functions can be used to convert the lines in a file from one format to another. The pack function converts a list into a scalar value that may be stored in machine memory. The template shown in Table 18.8 specifies the type of each value and how many values will be formatted. For example, the template c4, or cccc, packs a list into four signed characters, and a14 packs a string into a 14-byte ASCII string, null padded. The unpack function does the opposite of pack: it converts a binary-formatted string back into a list of Perl values.

Table 18.8. The Template pack and unpack—Types and Values
Template Description
a An ASCII string (null padded)
A An ASCII string (space padded)
b A bit string (low-to-high order, like vec)
B A bit string (high-to-low order)
c A signed char value
C An unsigned char value
d A double-precision float in the native format
f A single-precision float in the native format
h A hexadecimal string (low nybble first, to high)
H A hexadecimal string (high nybble first)
i A signed integer
I An unsigned integer
l A signed long value
L An unsigned long value
n A short in "network" (big-endian) order
N A long in "network" (big-endian) order
p A pointer to a null-terminated string
P A pointer to a structure (fixed-length string)
q A signed 64-bit value
Q An unsigned 64-bit value
s A signed short value (16-bit)
S An unsigned short value (16-bit)
u A uuencoded string
v A short in "VAX" (little-endian) order
V A long in "VAX" (little-endian) order
w A BER compressed unsigned integer in base 128, high bit first
x A null byte
X Back up a byte
@ Null fill to absolute position

Format

$string=pack(Template, @list );
@list = unpack(Template, $string );

Example 18.27.

(The Script)
1   $bytes=pack("c5", 80,101,114, 108, 012);   # 5 ASCII characters
2   print "$bytes\n";

(Output)
Perl

Explanation

The first element in the list, the template (see Table 18.8), is composed of the type and the number of values to be packed; in this example, five signed characters. The rest of the list consists of the decimal values for characters P, e, r, and l and the octal value for the newline. This list is packed into a binary structure. The string containing the packed structure is returned and stored in $bytes. (See your ASCII table.)
The 5-byte character string (Perl followed by a newline) is printed.

Example 18.28.

(Script)
1   $string=pack("A15A3", "hey","you");   # ASCII string, space padded
2   print "$string";

(Output)
2   hey            you

Explanation

Two strings, hey and you, are packed into a structure using the template A15A3. A15 will convert the string hey into a space-padded ASCII string consisting of 15 characters. A3 converts the string you into a 3-character space-padded string.
The strings are printed according to the pack formatting template. They are left justified.
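
Unpacking with the same template recovers the original strings, which is what makes these templates useful for fixed-length records. The following minimal sketch packs and unpacks the record from the example and contrasts the space-padded A type with the null-padded a type. The variable names are illustrative.

```perl
use strict;
use warnings;

# Pack two strings into an 18-byte record: 15 bytes plus 3 bytes.
my $record = pack("A15 A3", "hey", "you");

# With the A type, unpack strips the trailing spaces again.
my ($first, $second) = unpack("A15 A3", $record);

# The a type pads with null bytes, and unpack leaves them in place.
my $null_padded = pack("a5", "hey");
my ($raw)       = unpack("a5", $null_padded);

printf "record is %d bytes; fields are '%s' and '%s'\n",
    length($record), $first, $second;
printf "null-padded field is %d bytes long\n", length($raw);
```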

Example 18.29.

(The Script)
    #!/bin/perl
    # Program to uuencode a file and then uudecode it
1   open(PW, "/etc/passwd") || die "Can't open: $!\n";
2   open(CODEDPW, ">codedpw") || die "Can't open: $!\n";
3   while(<PW>){
4       $uuline=pack("u*", $_);    # uuencoded string
5       print CODEDPW $uuline;
    }
    close PW;
    close CODEDPW;
6   open(UUPW, "codedpw") || die "Can't open: $!\n";
    while(<UUPW>){
7       print;
    }
    close UUPW;
    print "\n\n";
8   open(DECODED, "codedpw") || die;
9   while(<DECODED>){
10      @decodeline = unpack("u*", $_);
11      print "@decodeline";
    }

(Output)
7   E<F]O=#IX.C`Z,3I3=7!E<BU5<V5R.B\Z+W5S<B]B:6XO8W-H"@``
    19&%E;6]N.G@Z,3HQ.CHO.@H`
    58FEN.G@Z,CHR.CHO=7-R+V)I;CH*
    .<WES.G@Z,SHS.CHO.@H`
    :861M.G@Z-#HT.D%D;6EN.B]V87(O861M.@H`
    L;'`Z>#HW,3HX.DQI;F4@4')I;G1E<B!!9&UI;CHO=7-R+W-P;V]L+VQP.@H`
    ?<VUT<#IX.C`Z,#I-86EL($1A96UO;B!5<V5R.B\Z"@!R
    E=75C<#IX.C4Z-3IU=6-P($%D;6EN.B]U<W(O;&EB+W5U8W`Z"@!L
    M;G5U8W`Z>#HY.CDZ=75C<"!!9&UI;CHO=F%R+W-P;V]L+W5U8W!P=6)L:6,Z
    5+W5S<B]L:6(O=75C<"]U=6-I8V\*
    J;&ES=&5N.G@Z,S<Z-#I.971W;W)K($%D;6EN.B]U<W(O;F5T+VYL<SH*
    ?;F]B;V1Y.G@Z-C`P,#$Z-C`P,#$Z3F]B;V1Y.B\Z"@`O
    I;F]A8V-E<W,Z>#HV,#`P,CHV,#`P,CI.;R!!8V-E<W,@57-E<CHO.@H`
    J;F]B;V1Y-#IX.C8U-3,T.C8U-3,T.E-U;D]3(#0N>"!.;V)O9'DZ+SH*
    M96QL:64Z>#HY-#DV.C0P.D5L;&EE(%%U:6=L97DZ+VAO;64O96QL:64Z+W5S
    *<B]B:6XO8W-H"@!C

11  root:x:0:1:Super-User:/:/usr/bin/csh
    daemon:x:1:1::/:
    bin:x:2:2::/usr/bin:
    sys:x:3:3::/:
    adm:x:4:4:Admin:/var/adm:
    lp:x:71:8:Line Printer Admin:/usr/spool/lp:
    smtp:x:0:0:Mail Daemon User:/:
    uucp:x:5:5:uucp Admin:/usr/lib/uucp:
    nuucp:x:9:9:uucp Admin:/var/spool/uucppublic:/usr/lib/uucp/uucico
    listen:x:37:4:Network Admin:/usr/net/nls:
    nobody:x:60001:60001:Nobody:/:
    noaccess:x:60002:60002:No Access User:/:
    nobody4:x:65534:65534:SunOS 4.x Nobody:/:
    ellie:x:9496:40:Ellie Quigley:/home/ellie:/usr/bin/csh

Explanation

The local passwd file is opened for reading.
Another file, called codedpw, is opened for writing.
Each line of the filehandle is read into $_ until the end of file is reached.
The pack function uuencodes the line ($_) and assigns the coded line to the scalar $uuline. uuencode is often used to convert a binary file into an encoded representation that can be sent using e-mail.
The uuencoded string is sent to the filehandle.
The file containing the uuencoded text is opened for reading and attached to the UUPW filehandle.
Each line of uuencoded text is printed.
The uuencoded file is opened for reading.
Each line of the file is read from the filehandle and stored in $_.
The unpack function converts the uuencoded string back into its original form and assigns it to @decodeline.
The uudecoded line is printed.

Example 18.30.

(The Script)
    #!/bin/perl
1   $ints=pack("i3", 5,-10,15);    # pack into binary structure
2   open(BINARY, "+>binary" ) || die;
3   print BINARY "$ints";
4   seek(BINARY, 0,0) || die;
    while(<BINARY>){
5       ($n1,$n2,$n3)=unpack("i3", $_);
6       print "$n1 $n2 $n3\n";
    }

(Output)
6   5 -10 15

Explanation

The three integers 5, –10, and 15 are packed into three signed integers. The value returned is a binary structure assigned to $ints.
The BINARY filehandle is opened for reading and writing.
The packed integers are sent to the file. The file now contains binary data and is not readable as text. To display the values, they must be converted back into Perl numbers. This is done with unpack.
The seek function puts the file pointer back at the top of the file at byte position 0.
We're reading from the file one line at a time. Each line, stored in $_, is unpacked and returned to its original list of values. (Line-oriented reading works here only because none of the packed bytes happens to be a newline character.)
The original list values are printed.
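
Reading packed data with the line-input operator breaks as soon as one of the bytes is a newline; the integer 10, for example, packs to a byte with exactly that value. The following is a minimal sketch, under the assumption that the record size is known in advance, of reading the packed integers back with binmode and read instead. The temporary file is illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @numbers = (5, 10, 15);            # 10 packs to a newline byte
my $packed  = pack("i3", @numbers);   # three native signed integers
my $size    = length($packed);        # record size in bytes

# Hypothetical scratch file for the binary record.
my ($out, $file) = tempfile(UNLINK => 1);
binmode($out);
print {$out} $packed;
close($out);

open(my $in, '<', $file) or die "Can't open $file: $!\n";
binmode($in);
my $got = read($in, my $buffer, $size);    # read by size, not by line
(defined($got) && $got == $size) or die "Short read\n";
close($in);

my @values = unpack("i3", $buffer);
print "@values\n";
```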

Example 18.31.

(The Script)
    $str="0x123456789ABCDE ellie...";
1   print "$str\n";
    $bytes=unpack("H*",$str);    # hex string (regular order)
2   print "$bytes\n";
    $str2 = pack("H*", $bytes);
3   print "$str2\n";
    $bytes = unpack("h*",$str);  # hex string (reversed order)
4   print "$bytes\n";
    $str1 = pack("h*", $bytes);
5   print"$str1\n";

(Output)
1   0x123456789ABCDE ellie...
2   3078313233343536373839414243444520656c6c69652e2e2e
3   0x123456789ABCDE ellie...
4   038713233343536373839314243444540256c6c69656e2e2e2
5   0x123456789ABCDE ellie...

Explanation

The string contains a hexadecimal number and some text.
The "h" and "H" fields pack a string that many nybbles (4-bit groups, representable as hexadecimal digits, 0-9a-f) long. Each byte of the input field of pack() generates 4 bits of the result. The variable $bytes consists of a hexadecimal string in regular hex order, where each character in the original string is represented by two hexadecimal digits. For example, "ellie" is represented as 65 6c 6c 69 65, and the three dots are 2e 2e 2e. With the "h" template (output line 4), the two digits of each byte are swapped, so the three dots appear as e2 e2 e2.
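
The difference between the two templates is easier to see on a short string. The following minimal sketch unpacks the last part of the example string with both templates and splits the regular-order result into two-digit pairs. The variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "ellie...";

my $regular  = unpack("H*", $text);   # high nybble first
my $reversed = unpack("h*", $text);   # low nybble first

# Split the regular-order string into one pair of digits per byte.
my $pairs = join(" ", unpack("(A2)*", $regular));

# Packing with the matching template restores the original string.
my $restored = pack("h*", $reversed);

print "H*: $regular\n";     # 656c6c69652e2e2e
print "h*: $reversed\n";    # 56c6c69656e2e2e2
print "pairs: $pairs\n";    # 65 6c 6c 69 65 2e 2e 2e
```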
