Before starting this chapter, a note to you, the reader. Each line of code in an example is numbered. The output and explanations are also numbered to match the number in the code. These numbers are provided to help you understand important lines of each program. When copying examples into your text editor, don't include these numbers, or you will generate many unwanted errors! With that said, let's proceed.
Variables are fundamental to all programming languages. They are data items whose values may change throughout the run of the program, whereas literals or constants remain fixed. They can be placed anywhere in the program and, unlike in many other high-level languages, do not have to be declared with the data type that will be stored in them. You can assign strings, numbers, or a combination of these to Perl variables. For example, you may store a number in a variable and then later change your mind and store a string there. Perl doesn't care.
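As a small illustration of that flexibility, the sketch below stores a number in a variable and later a string in the same variable. The variable names are invented for this example.

```perl
use warnings;

# One scalar variable can hold a number now and a string later.
$item  = 42;              # numeric value
$first = $item + 1;       # used as a number: 43

$item   = "forty-two";    # same variable, now holding a string
$second = length $item;   # number of characters in the string: 9

print "$first $second\n"; # 43 9
```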
Perl variables are of three types: scalar, array, and associative array (more commonly called hashes). A scalar variable contains a single value (e.g., one string or one number), an array variable contains an ordered list of values indexed by an integer starting at 0, and a hash contains an unordered set of key/value pairs indexed by a string (the key) that is associated with a corresponding value. (See "Scalars, Arrays, and Hashes" on page 77.)
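The three types might look like this in practice. This is a minimal sketch with made-up names and values; note that the first array element has index 0.

```perl
use warnings;

$city   = "Boston";                      # scalar: one value
@colors = ("red", "green", "blue");      # array: ordered list of values
%ages   = ("Tom" => 30, "Ann" => 25);    # hash: key/value pairs

print "$city\n";         # Boston
print "$colors[0]\n";    # red   (the first element has index 0)
print "$colors[2]\n";    # blue
print "$ages{Ann}\n";    # 25    (looked up by its string key)
```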
The scope of a variable determines where it is visible in the program. In Perl scripts, the variable is visible to the entire script (i.e., global in scope) and can be changed anywhere within the script.
The Perl sample programs you have seen in the previous chapters are compiled internally into what is called a package, which provides a namespace for variables. Almost all variables are global within that package. A global variable is known to the whole package and, if changed anywhere within the package, the change will permanently affect the variable. The default package is called main, similar to the main() function in the C language. Such variables in C would be classified as static. At this point, you don't have to worry about naming the main package or the way in which it is handled during the compilation process. The only purpose in mentioning packages now is to let you know that the scope of variables in the main package, your script, is global. Later, when we talk about the our, local, and my functions in packages, you will see that it is possible to change the scope and namespace of a variable.
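To make the idea of a global variable in package main concrete, here is a short sketch. The variable $count and the subroutine bump are hypothetical names chosen for illustration.

```perl
use warnings;

$count = 1;                 # no package is named, so this is $main::count

sub bump {
    $count = $count + 1;    # the same global, changed inside a subroutine
}

bump();
print "$count\n";           # 2
print "$main::count\n";     # 2, the same variable under its package-qualified name
print __PACKAGE__, "\n";    # main
```

The change made inside bump is visible everywhere else in the script, which is what global scope means here.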
Unlike C or Java, Perl variables don't have to be declared before being used. They spring to life just by the mere mention of them. Variables have their own namespace in Perl. They are identified by the "funny characters" that precede them. Scalar variables are preceded by a $ sign, array variables are preceded by an @ sign, and hash variables are preceded by a % sign. Since the "funny characters" indicate what type of variable you are using, you can use the same name for a scalar, array, or hash and not worry about a naming conflict. For example, $name, @name, and %name are all different variables; the first is a scalar, the second is an array, and the last is a hash.[1]
[1] Using the same name is allowed but not recommended; it makes reading too confusing.
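A brief sketch shows that the three variables really are independent, even though they share a name. The values are arbitrary.

```perl
use warnings;

$name = "Tom";                       # scalar
@name = ("Tom", "Dick", "Harry");    # array, unrelated to $name
%name = ("first" => "Tom");          # hash, unrelated to both

print "$name\n";           # Tom
print "$name[1]\n";        # Dick  (an element of @name)
print "$name{first}\n";    # Tom   (a value from %name)
```

An element of an array or hash is itself a scalar, so it is written with a $; the brackets or braces that follow tell Perl which variable is meant.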
Since reserved words and filehandles are not preceded by a special character, variable names will not conflict with reserved words or filehandles. Variables are case sensitive. The variables named $Num, $num, and $NUM are all different.
If a variable starts with a letter, it may consist of any number of letters (an underscore counts as a letter) and/or digits. If the variable does not start with a letter, it must consist of only one character. Perl has a set of special variables (e.g., $_, $^, $., $1, $2, etc.) that fall into this category. (See "Special Variables" on page 845 in Appendix A.) In special cases, variables may also be preceded with a single quote but only when packages are used.
An uninitialized variable has the undefined value, which is treated as zero in a numeric context and as an empty string in a string context.
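The following sketch uses two variables that are never assigned, once in a numeric expression and once in a string expression. The names are invented for the example. Warnings are deliberately left off here; with warnings enabled, Perl would also report the use of an uninitialized value.

```perl
# $total and $label are never assigned, so both are undefined.
$sum  = $total + 5;            # numeric context: undefined acts as 0
$text = "[" . $label . "]";    # string context: undefined acts as ""

print "$sum\n";     # 5
print "$text\n";    # []
print defined($total) ? "defined" : "undefined", "\n";   # undefined
```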
The assignment operator, the equal sign (=), is used to assign the value on its right-hand side to a variable on its left-hand side. Any value that can be "assigned to" represents a named region of storage and is called an lvalue.[2] Perl reports an error if the operand on the left-hand side of the assignment operator does not represent an lvalue.
[2] The value on the left-hand side of the equal sign is called an lvalue, and the value on the right-hand side an rvalue.
When assigning a value or values to a variable, if the variable on the left-hand side of the equal sign is a scalar, Perl evaluates the expression on the right-hand side in a scalar context. If the variable on the left of the equal sign is an array, then Perl evaluates the expression on the right in a list context (called array context in older documentation). (See "Scalars, Arrays, and Hashes" on page 77.)
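The effect of context is easiest to see by assigning the same array to different kinds of variables. This is a minimal sketch with hypothetical names.

```perl
use warnings;

@pets = ("cat", "dog", "bird");

@copy    = @pets;    # array on the left: every element is copied
$howmany = @pets;    # scalar on the left: the number of elements
($first) = @pets;    # a parenthesized list on the left behaves like an array

print "@copy\n";       # cat dog bird
print "$howmany\n";    # 3
print "$first\n";      # cat
```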
A simple statement is an expression terminated with a semicolon.
Format:
variable = expression;

Example 5.1.
[a] The comma can be used in both Perl 4 and Perl 5. The => symbol was introduced in Perl 5.
[b] The => operator, unlike the comma, causes a key on its left to be treated as a quoted string when the key is a single bare word. If the key consists of more than one word or begins with a number, it must still be quoted explicitly.
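The assignment forms discussed here, including the comma and => styles of building a hash, can be sketched as follows. The variable names and values are illustrative only and are not taken from the book's own example.

```perl
use warnings;

$salary = 50000;                                  # scalar = expression
@months = ("Mar", "Apr", "May");                  # array = list
%states = ("CA", "California", "ME", "Maine");    # hash built with commas
%zip    = (CA => 95926, "New York" => 10001);     # hash built with =>

print "$salary\n";             # 50000
print "$months[1]\n";          # Apr
print "$states{ME}\n";         # Maine
print "$zip{'New York'}\n";    # 10001
```

The key CA needs no quotes to the left of =>, while the two-word key "New York" does.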
Since quoting affects the way in which variables are interpreted, this is a good time to review Perl's quoting rules. Perl quoting rules are similar to shell quoting rules. This may not be good news to shell programmers, who find using quotes frustrating, to say the least. It is often difficult to determine which quotes to use, where to use them, and how to find the culprit if they are misused; in other words, it's a real debugging nightmare.[3] For those of you who fall into this category, Perl offers an alternative method of quoting.[4]
[3] Barry Rosenberg, in his book KornShell Programming Tutorial, has a chapter titled "The Quotes From Hell."
[4] Larry Wall, creator of Perl, calls his alternative quoting method "syntactic sugar."
Perl has three types of quotes, and each has a different function. They are single quotes, double quotes, and backquotes.
The backslash (\) behaves like a set of single quotes but can be used only to quote a single character.
A pair of single or double quotes may delimit a string of characters. Quotes will either allow the interpretation of special characters or protect special characters from interpretation, depending on the kind of quotes you use.
Single quotes are the "democratic" quotes. All characters enclosed within them are treated equally; in other words, there are no special characters. But the double quotes discriminate. They treat some of the characters in the string as special characters. The special characters include the $ sign, the @ symbol, and escape sequences such as \t and \n.
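The difference shows up when the same text is quoted both ways. In this sketch, with an invented variable $name, the single-quoted string keeps $name and \n as literal characters, while the double-quoted string interprets them.

```perl
use warnings;

$name = "Tom";

$single = 'Hello, $name\n';    # no special characters: $name and \n stay as typed
$double = "Hello, $name\n";    # $name is replaced and \n becomes a newline

print $single, "\n";    # Hello, $name\n
print $double;          # Hello, Tom
print length($single), " ", length($double), "\n";    # 14 11
```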
When backquotes surround an operating system command, the command will be executed by the shell. This is called command substitution. The output of the command will either be printed as part of a print statement or assigned to a variable. If you are using Windows, Linux, or UNIX, the commands enclosed within backquotes must be supported by the particular operating system and will vary from system to system.
No matter what kind of quotes you are using, they must be matched. Because the quotes mark the beginning and end of a string, Perl will fail to compile if you forget one of the quotes, with messages such as "Might be a runaway multi-line ... string", "Execution of ... aborted due to compilation errors", or "Can't find string terminator ... anywhere before EOF".
Double quotes must be matched unless embedded within single quotes or preceded by a backslash.
When a string is enclosed in double quotes, scalar variables (preceded with a $) and arrays (preceded by the @ symbol) are interpolated (i.e., the value of the variable replaces the variable name in the string). Hashes (preceded by the % sign) are not interpolated within the string enclosed in double quotes, although an individual hash element such as $hash{key} is, because it is a scalar.
Strings that contain escape sequences (e.g., \t, \n) must be enclosed in double quotes for backslash interpretation.
A single quote may be enclosed in double quotes, as in "I don't care!"
If a string is enclosed in single quotes, it is printed literally (what you see is what you get).
If a single quote is needed within a string, then it can be embedded within double quotes or backslashed. If double quotes are to be treated literally, they can be embedded within single quotes.
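The quoting rules above can be tried out with a few print statements. This is a sketch only; the variables and their values are made up for the example.

```perl
use warnings;

$user   = "Dan";
@langs  = ("Perl", "C", "Java");
%grades = ("math" => "A");

print "User: $user\n";             # scalar is interpolated:  User: Dan
print "Languages: @langs\n";       # array is interpolated:   Languages: Perl C Java
print "Grade: $grades{math}\n";    # one hash element:        Grade: A
print "Hash: %grades\n";           # a whole hash is not:     Hash: %grades

print "I don't care!\n";           # single quote inside double quotes
print 'I don\'t care!', "\n";      # backslashed single quote inside single quotes
print 'She said "Hi"', "\n";       # double quotes inside single quotes
print "She said \"Hi\"\n";         # backslashed double quotes inside double quotes
```

When an array is interpolated, its elements are separated by a single space by default.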
UNIX/Windows[5] commands placed within backquotes are executed by the shell, and the output is returned to the Perl program. The output is usually assigned to a variable or made part of a print string. When the output of a command is assigned to a scalar variable, the context is scalar (i.e., a single value is assigned).[6] For command substitution to take place, the backquotes cannot be enclosed in either double or single quotes. (UNIX shell programmers, take note: unlike in shell programs, backquotes are not interpreted inside double quotes.)
[5] If using other operating systems, such as DOS or Mac OS 9.1 and below, the OS commands available for your system will differ.
[6] If output of a command is assigned to an array, the first line of output becomes the first element of the array, the second line of output becomes the next element of the array, and so on.
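Command substitution in scalar and list context might look like the sketch below. It assumes that an echo command is available to the shell, which is the case on most UNIX, Linux, and Windows systems; substitute any command your system supports.

```perl
use warnings;

# Assumption: the shell on this system provides an `echo` command.
$greeting = `echo hello`;    # scalar context: all output in one string
@lines    = `echo hello`;    # list context: one element per line of output

chomp($greeting);            # remove the trailing newline from the output
print "Output: $greeting\n";
print "Number of lines: ", scalar(@lines), "\n";

$literal = "`echo hello`";   # inside double quotes, the command is not run
print "$literal\n";          # `echo hello`
```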
Perl provides an alternative form of quoting: the q, qq, qx, and qw constructs.
The q represents single quotes.
The qq represents double quotes.
The qx represents backquotes.
The qw represents a quoted list of words. (See "Array Slices" on page 84.)
Quoting Construct | What It Represents |
---|---|
q/Hello/ | 'Hello' |
qq/Hello/ | "Hello" |
qx/date/ | `date` |
@list=qw/red yellow blue/; | @list=( 'red', 'yellow', 'blue'); |
The string to be quoted is enclosed in forward slashes, but alternative delimiters can be used for all four of the q constructs. You can use a nonalphanumeric character for the delimiter, such as a # sign or an exclamation point, or paired characters, such as parentheses, square brackets, etc. A single character or paired characters can be used:
q/Hello/
q#Hello#
q{Hello}
q[Hello]
q(Hello)
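A short sketch shows why the alternative constructs are convenient: the delimiter can be chosen so that quotes and slashes inside the string need no backslashes. The variable names and strings are invented for illustration.

```perl
use warnings;

$who = "world";

print q/Hello, $who/, "\n";       # like single quotes:  Hello, $who
print qq/Hello, $who/, "\n";      # like double quotes:  Hello, world
print q{It's "quoted"}, "\n";     # no backslashes needed:  It's "quoted"
print qq(Path: /usr/bin ($who)), "\n";    # balanced pairs may nest:  Path: /usr/bin (world)

@list = qw/red yellow blue/;      # same as ('red', 'yellow', 'blue')
print "$list[1]\n";               # yellow
print scalar(@list), "\n";        # 3
```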