Advanced Perl Workshop

Topics

Calls to the Operating System

You will often want to use Perl to perform annoying or repetitive system tasks on your computer. Luckily, Perl lets you issue commands to the operating system just as you would at the command line. There are three basic ways to do this.

system() and exec()

system() is easy to use. In its simplest form, the only argument is the exact command (as a string) you want to issue to the operating system. For example, system("ls -l *.pl") will list all files ending in the .pl extension on a Unix machine. Note that while the launched process is running, Perl waits around for it to finish. Of course, you could background the process in Unix with &. Incidentally, you can also invoke the system() function with a list of arguments. In that case, the first argument is the program to run and the rest are handed to it as its arguments, one apiece. What's nice about this is that the shell never gets to see the command, so it can't try to interpret any metacharacters it thinks it sees. Of course, that means the & (and the * wildcard) won't work. D'oh!
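Here is a minimal sketch of the list form, along with a check on how the command fared. It runs the current Perl interpreter (found in $^X) as the child command purely so that the example doesn't assume any particular program is installed; the exit code 3 is arbitrary.

```perl
use strict;
use warnings;

# List form: the first item is the program, the rest are its arguments.
# No shell is involved, so metacharacters such as * and & are passed literally.
# $^X is the path of the running perl; it is used here only as a handy command.
my $status = system($^X, "-e", "exit 3");

# system() returns the wait status; the command's exit code is in the high byte.
my $exit_code = $status >> 8;
print "Child exited with code $exit_code\n";

# A command that succeeds returns 0, which is false in Perl, so test with == 0.
my $ok = system($^X, "-e", "1") == 0;
print "Second command succeeded\n" if $ok;
```

The "== 0" test is the part that trips people up: success is zero, so a bare "if (system(...))" fires on failure.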

On the other hand, exec() does the same thing with one little twist. (Everything I said about system() still applies for exec().) The difference is that the Perl process is replaced by the new command's process. (It never turns back: nothing after a successful exec() in your program will run.) This is mainly handy if you want the call to the operating system to be the very last thing that Perl does. Then Perl isn't waiting around while the new command executes, only to terminate as soon as it's done. But for the most part, you want system().

Capturing Output

"Ah," you say, "but what if I want to capture the output of my command and do all kinds of Perl goodness to it?" Then you want the back-quotes (`). Placed around the command, they execute your command and return its output. So

      
	my $listing = `ls -l *.pl`;
      
    

will save the output of the list command into the $listing variable. (Note that there will be a lot of newlines in there.) If you use an array rather than a scalar on the left side

      
	my @listing = `ls -l *.pl`;
      
    

you get each line in its own array element. (Basically, Perl saves you from having to perform a split(/\n/, $listing).)

The back-quotes also interpolate, so variable names are replaced with what the variables contain and you need to use escapes for things. It's just like double-quoted strings.
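A small sketch of both points, assuming a Unix shell where echo is available. The variable $word and the strings are made up for the illustration.

```perl
use strict;
use warnings;

my $word = "alpha";

# $word is interpolated before the shell sees the command.
# In list context each line of output becomes one array element.
my @lines = `echo $word; echo beta`;

# Each element still ends in a newline; chomp removes them all at once.
chomp(@lines);

print "Got ", scalar(@lines), " lines: @lines\n";

# In scalar context the whole output comes back as a single string.
my $all = `echo $word`;
print "Raw output has a trailing newline\n" if $all =~ /\n\z/;
```

Because of the interpolation, never drop untrusted input straight into back-quotes; the shell will happily run whatever ends up in the string.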

Working with Files

Perl also provides you with a lot more ways of dealing with files so that you don't have to worry about which operating system you're using and the like. Yay Perl!

File Tests

First up: file tests. Perl lets you test the properties of files before you might end up doing something horrible to them. (Or before causing your program to abort because you can't do horrible things to them. At the very least, you can then die a little bit more elegantly.) File tests look like this:

      
	if(-e "myfile.txt")
	{
	  # Stuff to do if the file exists
	}
      
    

The -e is the file test and it is run on the file myfile.txt. (You can do this with directories, of course. Why "of course"? Because to Unix, directories are really just files anyway. So it makes a lot of sense.) This test returns a true or false value. Some others return file sizes, modification dates, etc. (Mostly, you'll use logical tests and not ones that return more information than "yes" and "no".) In this case, we've asked if the file exists or not. The syntax probably looks a bit creepy (it does to me), but that's it. All file tests are of the form -letter. Others include:

Test  Description
-e    File exists
-r    File is readable
-w    File is writable
-x    File is executable
-o    File is owned by this user
-s    File exists and has non-zero size (actually returns the file size)
-d    Entry is a directory
-T    File looks like a text file
-B    File looks like a binary file
-M    Modification age, in days
-A    Access age, in days

That's only about half of them, actually. You can find the rest in the Llama Book. (Page 159 in the third edition.) Oh, and a wee caveat on the age tests: they date things from when the Perl program started running. If the program runs for a really long time, negative ages are possible for files touched after it started.
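To see a few of these in action, here is a sketch that makes a scratch file with the core File::Temp module and pokes at it. The file's contents are arbitrary, and the byte count in the comments assumes Unix-style line endings.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the tests have something real to look at.
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "hello\n";
close($fh);

my $exists = -e $filename;   # true: the file exists
my $size   = -s $filename;   # 6: size in bytes ("hello" plus a newline)
my $is_dir = -d $filename;   # false: it is a plain file
my $age    = -M $filename;   # age in days, relative to program start

print "$filename exists and holds $size bytes\n" if $exists && $size;
print "It is not a directory\n" unless $is_dir;
printf "Modified %.5f days before this program started\n", $age;
```

Since the file is created after the program starts, the age printed at the end may well come out as zero or slightly negative, which is the caveat above in miniature.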

Deleting and Renaming Files

So you want to delete or rename a file, eh? And you hate making operating system calls. (As you well should.) Perl comes equipped with operators that delete and rename files regardless of the OS! (You use the same operators. Perl handles the details. Perl is like having your own Caliban, isn't it?)

First, deleting. The operator is unlink. It actually takes a list as an argument, although you can (of course) invoke it with only one item. The list contains file names, either relative or absolute, that you want whacked. So, for example, you could say

      
	unlink ("myfile1.dat", "myfile2.dat", "myProcedure.pro");
      
    

which would then send all of those happy, innocent files to their graves. Note that unlink works like rm -f: it won't ask for confirmation, and it'll destroy any file you're capable of destroying, even one marked read-only. (On Unix, what generally matters is whether you can write to the directory that holds the file, not the permissions on the file itself. If you can't, the file is safe, as you would expect it to be.)

Renaming is done with the rename("old", "new") function. This does what you think: it takes a file named old and renames it new. (Naturally, you can use names other than old and new if you want to be a rebel.) But be warned, if a file named new already exists, rename will happily tromp all over it, sending it to Never-Never Land. (The one from "Peter Pan", not Michael Jackson's ranch.) So this would be an excellent time to use that -e file test, wouldn't it?
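Here is one way that -e check might look, as a sketch. safe_rename is a made-up helper (not a Perl built-in), and the file names are invented; everything happens in a temporary directory from the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# safe_rename is a hypothetical helper, not a built-in.
# It refuses to rename when the target already exists.
sub safe_rename {
    my ($old, $new) = @_;
    return 0 if -e $new;            # don't clobber an existing file
    return rename($old, $new);      # true on success, false on failure
}

my $dir = tempdir(CLEANUP => 1);
for my $name ("old.dat", "keep.dat") {
    open(my $fh, ">", "$dir/$name") or die "Can't create $dir/$name: $!\n";
    close($fh);
}

my $blocked = safe_rename("$dir/old.dat", "$dir/keep.dat");  # target exists
my $renamed = safe_rename("$dir/old.dat", "$dir/new.dat");   # target is free

# unlink returns the number of files it actually deleted.
my $deleted = unlink("$dir/new.dat", "$dir/keep.dat", "$dir/no_such_file");
print "Deleted $deleted files\n";
```

The check and the rename are two separate steps, so another program could still sneak a file in between them; for a workshop script that is usually an acceptable risk.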

Globbing

Another of my favorite operator names in Perl is glob. You've seen globbing before with your operating system, particularly if you've lived in Unix-land very long. Globbing is the process of taking a wildcard and expanding it to match some or all of the files in a directory: ls *.pl, for example, is globbed to ls followed by a lot of file names ending in .pl before it is actually handed to ls. The same thing happens when you type something like that in to invoke Perl. However, wouldn't it be nice to be able to do that for strings internal to Perl? (You may have typed them right into the program, taken them from user input, or whatever.) I'm not even going to say it: you know what's coming.

The operator glob will take an expression like *.pl and expand it to a list of all matching files in the current directory. Pretty easy to use and sometimes handy. How about that?
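A quick sketch, using a temporary directory and three invented file names so there is something to match:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
for my $name ("a.pl", "b.pl", "notes.txt") {
    open(my $fh, ">", "$dir/$name") or die "Can't create $dir/$name: $!\n";
    close($fh);
}

# glob expands the wildcard just as the shell would.
# Note: glob splits its argument on whitespace, so paths with spaces need care.
my @found = glob("$dir/*.pl");

# Sorting makes the order explicit rather than relying on glob's default.
my @perl_files = sort @found;

print "$_\n" for @perl_files;
```

The names come back with the directory part attached, exactly as the pattern was written.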

More with Data Types and Structures

Hashes and arrays are a little bit more interesting than I led you to believe on the first day. For one thing, they can be combined in various ways.

Arrays of Arrays

You can think of these as 2-D arrays, just like in other languages. You create them via the following syntax:

      
	@array_of_arrays = (
	                    [1,3],
	                    [4,5,6],
	                    [7]
	                   );

      
    

Note that the arrays need not be the same size! Also, note the use of square brackets rather than parentheses for the inner arrays. This has to do with how Perl secretly (well, not quite so secretly) stores them: each square-bracketed list is a reference (a pointer, more or less) to an array. I'd invite you to read O'Reilly's Programming Perl if you want to learn the details and also learn how you can use them to your advantage.

You can access an element of such an array as:

      
	$array_of_arrays[1][2];
      
    

In this case, the element is 6, the third element of the second array.

You can add arrays to an array via push, but use square brackets around the array so that Perl knows that you're not adding the items to the array like normal:

      
	push(@array_of_arrays, [@sub_array]);
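
Putting the pieces of this section together, here is a short runnable sketch. The contents of @sub_array are invented for the example.

```perl
use strict;
use warnings;

my @array_of_arrays = (
    [1, 3],
    [4, 5, 6],
    [7],
);

# Add a fourth row. [@sub_array] makes a reference to a copy of the array.
my @sub_array = (8, 9);
push(@array_of_arrays, [@sub_array]);

# Walk every row; @$row turns the reference back into an ordinary list.
my @row_sizes;
for my $row (@array_of_arrays) {
    push(@row_sizes, scalar(@$row));
    print join(", ", @$row), "\n";
}

my $element = $array_of_arrays[1][2];   # third element of the second row: 6
```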
      
    

Hashes of Arrays

This is the one I seem to use most often: a hash that has arrays for its values. (Well, like arrays of arrays, the values are secretly pointers. Enough said.) You assign them like:

      
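	my %hash_of_arrays = (
	                       # The ... comments mark days left out for brevity.
	                       "January"  => ["Monday", "Tuesday", # ...
	                                      "Wednesday"],
	                       "February" => ["Thursday", # ...
	                                      "Wednesday"],
	                       # ... and so on for the remaining months
	                     );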
      
    

You can access an entry in this via:

      
	$hash_of_arrays{March}[1];
      
    

The value here will be Friday, by the way. (Second day of March.)
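For something you can run from top to bottom, here is a sketch with a smaller hash. The data are invented for the illustration and are not the calendar above.

```perl
use strict;
use warnings;

# Invented sample data, not the calendar from the text.
my %hash_of_arrays = (
    "fruits"     => ["apple", "banana"],
    "vegetables" => ["carrot"],
);

# Append to one of the inner arrays: @{ ... } dereferences the stored value.
push(@{ $hash_of_arrays{"fruits"} }, "cherry");

# Look up one element: key first, then index.
my $second_fruit = $hash_of_arrays{"fruits"}[1];   # "banana"

# Loop over the keys in sorted order, since hashes have no fixed order.
for my $kind (sort keys %hash_of_arrays) {
    my @items = @{ $hash_of_arrays{$kind} };
    print "$kind: @items\n";
}
```

Note the parentheses around the whole hash and the => between each key and its value; only the inner arrays get square brackets.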

Others

Of course, the other two possible combinations also exist. And you can probably guess how they'll work. If not, look them up in Learning Perl. I won't waste time on them here, though.

Dealing with Errors: When Tragedy Strikes

Sometimes your program will be unable to do what you wanted it to do and it will want to crash in a horrible scene involving an absolute landslide in the electoral college. No, wait, that's the Mondale presidential bid. But the point is, your programs might crash sometimes. It would sure be swell to catch those crashes and at least exit with a friendly error message informing the user what went wrong. Or, failing that, a serene haiku or a lecherous limerick. (If you can't help 'em, amuse 'em, I always say.)

Die! Die! Die! (Not German for "The! The! The!")

One way to do this is with the die() function. It does what its name strongly implies: it makes the program die as soon as it's called. Well, almost as soon. You see, die() takes an argument: a string containing a message to pass to the user (via STDERR) that will hopefully inform her what the heck just went wrong. (Or, if you're working for Microsoft, make the user want to hurl her monitor into a wall in a Hulk-like rage. But promise me you'll never do that, OK?) This, combined with the basic data about the crash (program name and line where the fatal error occurred), will help the user understand what went wrong. (Perl adds that basic data only if your message doesn't end in a newline; end the message with \n and the user gets just your message.)

Most often, you use die() after you've tried to do something really key for the rest of the program that has an unusually high probability of not working right. For example, opening files is often dependent on having the file structure and permissions (frequently set by powers outside of Perl) on your side. So it can fail with alarming frequency. And if the file fails to be opened, the rest of your program is often moot anyway, so it's best to just exit right away (with a friendly error message/haiku) and save the time and potential to do actual damage with a half-executed program. Your code might look like:

      
	if($my_value >= 0)
	{
	  $new_value = sqrt($my_value);
	}else
	{
	  die("\$my_value was a negative number: $my_value\n");
	}
      
    

A fantastic way to use die() is in conjunction with an or (||):

      
	open(INFILE, "< my_file.dat") || die("I couldn't open my_file.dat!\n");
      
    

which will gracefully exit with a rather helpful error message if the file fails to be opened for reading. Note a few things here. First, this works because the || will go on to the second statement if the first one is false, but not if it's true. Second, it only works if the command on the left returns a logical (true or false) value of some sort. (How you interpret that statement depends on what you consider a success. If any non-empty, non-zero string is a success for your function, you're good. If only certain strings are OK, you need to use a more careful system of checking things.) Third, you really need the parentheses on the arguments of the open here. Why? Because || has a somewhat higher precedence than you might want. Without the parentheses, the || would be applied to the file name string rather than to the result of the open, leading to wackiness. There is also an or operator that does the same thing, but has a much lower precedence and is safer for this kind of thing. Contrary to what I was once led to understand, it has not been deprecated; open(...) or die(...) is the form you'll see most often.
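Here is a sketch of the same idea using the lower-precedence or. The helper read_first_line and the file path are made up for the example. It also uses the three-argument form of open with a lexical filehandle and adds $! (the system's reason for the failure) to the message; those are my choices here, not something the example above requires.

```perl
use strict;
use warnings;

# read_first_line is a hypothetical helper used only for this illustration.
sub read_first_line {
    my ($path) = @_;
    # "or" binds more loosely than "||", so this works even without
    # parentheses around open's arguments. $! holds the system's reason.
    open(my $in, "<", $path) or die "I couldn't open $path: $!\n";
    my $line = <$in>;
    close($in);
    return $line;
}

# Calling it on a file that doesn't exist triggers the die.
# eval is used here only so the example can report the message and carry on.
my $result = eval { read_first_line("/no/such/dir/my_file.dat") };
my $error  = $@;
print "Caught: $error";
```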

Warning

Sometimes you have an error that's worth mentioning to the user, but not really worth dying for. In this case, you might find the warn() function handy. warn() does the same thing as die() (send an error message to STDERR; the message will contain both the string you pass to the function as well as information about the program and line number where the error was found), but it doesn't terminate the program.

Exiting Without Warning

You might occasionally find the need to exit a program without sending an error to STDERR. For example, the error might simply not be worth it, you might be in an environment where it wouldn't be very helpful, or you simply have no error to begin with. In this case, exit() is the function for you. It does what it says: exits the program. You can print an error message with a print command, of course (useful for web scripting errors), but you don't have to.

Trapping Errors

One problem in all of programming is that you simply cannot anticipate all manner of errors. Even if your program is debugged in the sense that everything works the way you planned, you might get inputs that throw it for a loop. (In fact, it's generally fantastically difficult to avoid that.) The solution is eval. It looks like this:

      
	eval
	 {
	   # Buncha code here
	 };
      
    

(Note the semi-colon? eval is an expression, not a control structure. So use the semi-colon. How confusing, eh?)

What this does is run the code inside the block. If an error occurs, it doesn't crash the program. Instead, it saves the error message in $@. If no errors occur, $@ will be empty. This means that you can easily determine if no errors occurred: !$@ means just that. If an error did occur, you can exit gracefully, perhaps with a nice message to the user and cleaning up your messes before you leave.
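A small sketch of the pattern, trapping a division by zero. safe_divide is a made-up helper for the illustration.

```perl
use strict;
use warnings;

# safe_divide is a hypothetical helper: it returns the quotient,
# or undef (after a warning) if the division blows up.
sub safe_divide {
    my ($numerator, $denominator) = @_;
    my $quotient = eval {
        $numerator / $denominator;    # dies with "Illegal division by zero"
    };
    if ($@) {
        warn "Division failed: $@";
        return;
    }
    return $quotient;
}

my $good = safe_divide(10, 4);   # 2.5
my $bad  = safe_divide(1, 0);    # undef, with a warning on STDERR
print "10 / 4 = $good\n";
print "1 / 0 could not be computed\n" unless defined $bad;
```

The eval block hands back the value of its last statement, which is why the quotient can be assigned directly from it.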

Taint-Checking

A way of guarding against unexpected inputs is "taint-checking". If you invoke Perl with the -T flag (either at the command line or in your Perl-path at the top of your program), Perl will no longer trust input data from just anywhere: anything that comes from outside the program (user input, environment variables, file contents and so on) is marked as tainted. With taint-mode on, Perl will not allow you to use these tainted variables or any variables based off of them (using an assignment to rename the variable doesn't cut it) to do anything that would modify system resources. (Say, opening a file to write where the file's name comes from an outside source.) The way you untaint data is with a regular expression match of some kind. The idea here is to create a regular expression that matches what you expect to be given, and then use the pieces it captures to create untainted variables. If the match fails, it means you got data in a form that you didn't anticipate and that you should probably not continue the program with those data.
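As a sketch of that regular-expression step: the helper untaint_filename and the rule it enforces (plain names made of letters, digits, underscores, dots and hyphens, with no slashes) are assumptions made up for this example, so pick a pattern that fits your own data. The code runs with or without -T; under -T, the captured $1 is what Perl treats as clean.

```perl
use strict;
use warnings;

# untaint_filename is a hypothetical helper. It accepts only simple names
# made of letters, digits, underscores, dots and hyphens (no slashes),
# and returns undef for anything else.
sub untaint_filename {
    my ($input) = @_;
    # Under -T, text captured by a regex group ($1) is considered untainted.
    if ($input =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]*)\z/) {
        return $1;
    }
    return;
}

my $clean    = untaint_filename("report_2.txt");
my $rejected = untaint_filename("../../etc/passwd");

print "Accepted: $clean\n" if defined $clean;
print "Rejected the suspicious name\n" unless defined $rejected;
```

A pattern like /(.*)/ would also untaint the data, but it defeats the whole point, so keep the pattern as narrow as you can.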

I cannot honestly say that I've ever used this, but I really should.

Redirecting Default Output

If you're going to be printing to a particular file handle quite a bit, it gets kind of tiresome to always have to say print OUTFILE... rather than just print. Happily, we can change our default output from standard output (STDOUT) to another filehandle. The operator we use is select FILEHANDLE where FILEHANDLE is (of course) a filehandle. After you run this command, all output will default to FILEHANDLE. It is good practice to select STDOUT when you're done with the file, by the way. And you can always use print STDOUT "Message"; to still print to standard out when some other filehandle is selected.
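Here is a sketch of select at work. To avoid leaving a file behind, it opens a filehandle that writes into a string (the variable $buffer, invented for the example) rather than a file on disk; a real file handle behaves the same way.

```perl
use strict;
use warnings;

# Open a filehandle that writes into a string instead of a real file,
# so the example leaves nothing behind on disk.
my $buffer = "";
open(my $out, ">", \$buffer) or die "Can't open in-memory file: $!\n";

# select returns the previously selected handle, which makes it easy to restore.
my $previous = select($out);
print "This line goes to the selected handle\n";
print STDOUT "This line still goes to standard out\n";

# Put things back the way they were, then finish with the file.
select($previous);
close($out);

print "Captured: $buffer";
```

Saving the return value of select and restoring it afterwards is a little safer than selecting STDOUT outright, in case something else had already changed the default.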

Brother, Can you Spare a Time?

It's often handy to know what the current date and time are when you're running a Perl program. Luckily, Perl has taken care of that. The function is localtime(). This function actually takes an argument which is the timestamp (the number of seconds since midnight on 1 January 1970 on Unix systems, but don't worry about that since Perl can hide all of these details from you). How do you figure that out? You don't, time does it for you. The command you'll usually want is localtime(time). (The only times I have had cause to modify this have been to get the information for a day a few days before or after "today". To do this, add or subtract a multiple of 86400, the number of seconds in a day.) There are other ways to use localtime, but I don't foresee you needing them with any real regularity. And I think you'll recognize them when you encounter them.

So I've told you how to call localtime(), but not how to use what it returns. What it returns is a nine-element list:

      
	my ($second, 
	    $minute, 
	    $hour, 
	    $day, 
	    $month, 
	    $year,
	    $weekday,
	    $yearday,
	    $isdst) 
	            = localtime(time);
      
    

where the first six elements are mostly what you probably think they are: the current second, minute, hour, day, month, and year of your local time. (As determined by the clock on your computer! Set it wrong and localtime() can't magically see through the error!) Two little dangers, though. First, the month is counted from 0, so January is 0 and December is 11; add 1 to get the month number you're used to. Second, the year is not the real year, and it isn't really a "two-digit" year either; it's the number of years since 1900 (holy Y2K, Batman!), so 2001 comes back as 101. The solution to this problem is clear: add 1900 to the year to get the real year. Anyway, pretty useful stuff, huh? Yeah, but it gets even better. $weekday is the day of the week, as a number from 0 to 6 (Sunday through Saturday); make an array of names of the weekdays if you want to turn that into the names you know and love. $yearday is the day in the year, starting with 1 January being 0 and working up to 364 (or 365 in a leap year) for 31 December. This can be handy for working out time between two days. (Also handy for astronomy nerds, since this does come up quite a bit.) Finally, $isdst is a logical (0 or 1) variable that tells you whether you're on daylight saving time.

Oh, there's another, similar function: gmtime. That gives you the current universal time, which you might also want. It works the exact same way as localtime() except that it'll be 6 or 7 hours off for Boulder.
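To tie the date pieces together, here is a sketch that turns the nine-element list into something readable. describe_date and the output format are made up for the example; the last call uses gmtime(0), the very start of the Unix clock, because its answer is fixed and easy to check.

```perl
use strict;
use warnings;

my @day_names = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);

# describe_date is a hypothetical helper that takes the nine-element list
# from localtime() or gmtime() and builds a readable date string.
sub describe_date {
    my ($second, $minute, $hour, $day, $month, $year, $weekday) = @_;
    return sprintf("%s %04d-%02d-%02d",
                   $day_names[$weekday],
                   $year + 1900,     # years are counted from 1900
                   $month + 1,       # months are counted from 0
                   $day);
}

print "Today:     ", describe_date(localtime(time)), "\n";
print "Yesterday: ", describe_date(localtime(time - 86400)), "\n";

# gmtime(0) is the start of the Unix epoch, a fixed and checkable value.
my $epoch = describe_date(gmtime(0));
print "Epoch:     $epoch\n";    # Thursday 1970-01-01
```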

Feeling out of Sorts

Remember when we learned about sort() and I said that it just did the ASCII order thing? Well, I promised that it can be taught new tricks. Here's how: you need to make a sort-definition subroutine. This function will tell Perl how to order any two items. After that, Perl's sort() handles the rest.

Your sort-definition subroutine compares two items. Perl hands them to you in the special variables $a and $b. (The names are fixed; you don't get to pick them.) You then return one of three values: -1, 0, or 1. These correspond to: $a comes before $b; $a and $b have the same position, sort-wise; and $a comes after $b. So, for example, to do a numeric sort:

      
	sub by_number
	{
	  if($a < $b)
	  {
	    return(-1);
	  }elsif($a > $b)
	  {
	    return(1);
	  }else
	  {
	    return(0);
	  }
	}
      
    

Quick eyes will have noticed that I didn't actually use the @_ array to define my variables. This is because sort handles this for you to speed things up. So $a and $b are automatically set. Yay!

By the way, a lot of sort-definition subroutines start with "by". You'll see why in a second. Another tidbit of interest is that there is a "space-ship" operator that does the whole if-structure above in one fell swoop: $a <=> $b will return -1, 0, or 1 just as above.

To use this subroutine rather than the default ASCIIbetical sorting, invoke sort like this:

      
	my @sorted_array = sort(by_number @unsorted_array_of_numbers);
      
    

Note the lack of the comma.
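Here is a compact sketch comparing the default sort with the numeric one. The list of numbers is arbitrary, and by_number is written with the space-ship operator rather than the full if-structure.

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

# The same comparison as the if/elsif/else version, written with <=>.
sub by_number { return $a <=> $b }

my @default_order = sort @numbers;                 # string order: 1 10 100 9
my @numeric_order = sort by_number @numbers;       # numeric order: 1 9 10 100

# The comparison can also be written inline as a block; swapping
# $a and $b reverses the order.
my @descending = sort { $b <=> $a } @numbers;      # 100 10 9 1

print "Default:    @default_order\n";
print "Numeric:    @numeric_order\n";
print "Descending: @descending\n";
```

For a one-off comparison the inline block saves you from naming a subroutine at all.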

Making it Modular

Sometimes you write a really swell function and you want to use it in a lot of Perl programs. Or you get sick of always including certain data in every program file. (For example, the APS website likes to know who the webmaster is, about Susan, Jo Ann, and Beth and all of our email addresses. It'd be a drag to always enter that data.) Or maybe you don't think you're up to coding a massive set of routines to get HTML documents across the web or to read CGI data. Wouldn't it be great if Perl let you somehow import these things?

OK, say it with me this time: Of course Perl can do it!

Importing Modules and Other Files

The syntax for reading in other Perl files or importing modules is either:

      
	require("filename");
      
    

or

      
	use ModuleName;
      
    

There is a slight difference between these, although you'll see both used. (use gets used almost all the time, in my experience, for full-blown Perl modules, by the way.) require() imports the code at run time while use does it at compile time. In fact, use calls require() under the hood and then calls the module's import routine, which is the little bit of other magic.

Unlike require(), use is not a function: it only accepts a bareword module name (that is, not a string) and takes no parentheses. You can call require() with a bareword too. With barewords, leave off the .pm extension, as it is assumed. It's a bit less typing for you. Also, in the case of barewords you indicate directories with :: rather than the usual / (in Unix, anyway).
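A short sketch showing both, using two modules that ship with Perl (List::Util and File::Basename). The numbers and the file path are invented for the example.

```perl
use strict;
use warnings;

# use loads the module at compile time and imports the named functions.
use List::Util qw(sum max);

# require loads at run time and imports nothing, so the function
# is called by its full package name.
require File::Basename;

my @scores = (3, 8, 5);
my $total   = sum(@scores);    # 16
my $highest = max(@scores);    # 8
my $name    = File::Basename::basename("/home/user/scripts/report.pl");

print "Total $total, highest $highest, file $name\n";
```

Note how List::Util uses :: where a directory separator would go: Perl looks for the file List/Util.pm.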

There are a lot of modules out there for Perl. You can always hit CPAN (http://www.cpan.org/) and get things you need. However, quite a few should be installed on your system to begin with. If you need a module that isn't, it might be best to ask your system administrator to install it for you. (Use cookies as a bribe if needed.) Some examples of modules I've found useful include CGI (helps you handle web forms), DBI (for interfacing with databases such as MySQL painlessly), and LWP (grabs webpages so I can have Perl look them over).