You will often want to use Perl to perform annoying or repetitive system tasks on your computer. So it's lucky that Perl lets you issue commands to the operating system as if you had typed them yourself. There are three basic ways to do this.
system() is easy to use. The only argument is the
exact command (as a string) you want to issue to the operating
system. For example, system("ls -l *.pl") will list
all files ending in the .pl extension on a Unix
machine. Note that while the launched process is running, Perl
waits around for it to finish. Of course, you could background
the process in Unix with &. Incidentally, you
can invoke the system() function with as many
arguments as you like. Then the first argument is the command
and the rest are handed to it as separate arguments, exactly as
written. What's nice about this is
that the shell doesn't ever get to see the command and get to try
to change the metacharacters it thinks it sees. Of course, then
the & (and wildcards like *.pl) won't work. D'oh!
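Here's a minimal sketch of the list form. To keep it runnable without depending on what's in your directory, I'm assuming we launch Perl itself (via the special variable $^X, the path of the running interpreter) rather than ls; the little one-liner being launched is made up purely for illustration.

```perl
use strict;
use warnings;

# List form: the first item is the program, the rest are its arguments.
# No shell is involved, so "*.pl" arrives as the literal string "*.pl".
# The child one-liner (hypothetical) exits 0 if its argument is "*.pl".
my @command = ($^X, "-e", 'exit($ARGV[0] eq "*.pl" ? 0 : 1)', "*.pl");
my $status  = system(@command);

# system() returns the wait status; the exit code lives in the high byte.
my $exit_code = $status >> 8;
print "Exit code: $exit_code\n";
```

Had the same thing been written as one string, the shell would have tried to expand *.pl before the command ever saw it.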
On the other hand, exec() does the same thing with
one little twist. (Everything I said about system()
still applies for exec().) The difference is that
the Perl process is replaced by the new command's process.
(It never turns back: nothing after a successful exec() gets run.) This
is mainly handy if you want the call to the operating system to be
the very last thing that Perl does. Then Perl isn't waiting
around while the new command executes, only to itself terminate as
soon as it's done. But for the most part, you want
system().
"Ah," you say, "but what if I want to capture the output of my
command and do all kinds of Perl goodness to it?" Then you want
the back-quotes. (`) Placed around the command, they
execute your command and return the result. So
my $listing = `ls -l *.pl`;
will save the output of the list command into the
$listing variable. (Note that there will be a lot of
newlines in there.) If you use a list variable rather than a
scalar on the left side
my @listing = `ls -l *.pl`;
you get each line in its own array element.
(Basically, Perl saves you from having to perform a
split(/\n/, $listing), except that each element keeps its
trailing newline. Use chomp if you don't want it.)
The back-quotes also interpolate, so variable names are replaced with what the variables contain and you need to use escapes for things. It's just like double-quoted strings.
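A quick sketch of both contexts plus interpolation. I'm assuming a Unix-ish shell with an echo command here; the variable names are just for illustration.

```perl
use strict;
use warnings;

my $word = "hello";

# Back-quotes interpolate like double quotes, so the shell sees: echo hello
my $greeting = `echo $word`;
chomp($greeting);                  # drop the trailing newline

# List context: one element per line of output, newlines still attached.
my @lines = `echo one; echo two`;
chomp(@lines);                     # chomp works on a whole array, too

print "Got '$greeting' and ", scalar(@lines), " lines\n";
```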
Perl also provides you with a lot more ways of dealing with files so that you don't have to worry about which operating system you're using and the like. Yay Perl!
First up: file tests. Perl lets you test the properties of files before you might end up doing something horrible to them. (Or before causing your program to abort because you can't do horrible things to them. At the very least, you can then die a little bit more elegantly.) File tests look like this:
if(-e "myfile.txt")
{
# Stuff to do if the file exists
}
The -e is the file test and it is run
on the file myfile.txt. (You can do this with
directories, of course. Why "of course"? Because to Unix,
directories are really just files anyway. So it makes a lot of
sense.) This test returns a true or false value. Some others
return file sizes, modification dates, etc. (Mostly, you'll use
logical tests and not ones that return more information than "yes"
and "no".) In this case, we've asked if the file exists or not.
The syntax probably looks a bit creepy (it does to me), but that's
it. All file tests are of the form -letter.
Others include:
| Test | Description |
|---|---|
| -e | File exists |
| -r | File is readable |
| -w | File is writeable |
| -x | File is executable |
| -o | File is owned by this user |
| -s | File exists and has non-zero size (actually returns the file size) |
| -d | Entry is a directory |
| -T | File looks like a text file |
| -B | File looks like a binary file |
| -M | Modification age, in days |
| -A | Access age, in days |
That's only about half of them, actually. You can find the rest in the Llama Book. (Page 159 in the third edition.) Oh, and a wee caveat on the age tests: they date things from when the Perl program started running. If the program runs for a really long time, negative ages are possible.
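To see a few of these in action, here's a small sketch. It makes its own scratch file with the standard File::Temp module (my choice for the demo, so nothing of yours gets touched), and the byte count assumes Unix line endings.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Make a scratch file to poke at; it is removed when the program exits.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close($fh) or die("Couldn't close $filename: $!\n");

my $exists   = -e $filename;    # true
my $size     = -s $filename;    # size in bytes (6 with Unix newlines)
my $is_dir   = -d $filename;    # false, it's a plain file
my $age_days = -M $filename;    # tiny, and possibly a hair below zero

if ($exists && !$is_dir) {
    print "$filename is a plain file of $size bytes\n";
}
```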
So you want to delete or rename a file, eh? And you hate making operating system calls. (As you well should.) Perl comes equipped with operators that delete and rename files regardless of the OS! (You use the same operators. Perl handles the details. Perl is like having your own Caliban, isn't it?)
First, deleting. The operator is unlink. It
actually takes a list as an argument, although you can – of
course – invoke it with only one item. The list contains a
list of file names, either relative or absolute names, that you
want whacked. So, for example, you could say
unlink ("myfile1.dat", "myfile2.dat", "myProcedure.pro");
which would then send all of those happy, innocent
files to their graves. (unlink returns the number of files it
actually deleted, so you can check.) Note that unlink works like
rm -f without the -i flag set: it never asks whether you're sure, and it'll
destroy any file, regardless of the file's own permissions, as long as you're
allowed to modify the directory the file lives in. (On Unix, deleting
a file is really a change to its directory. If you can't write to
the directory, the file is safe as you would expect it to be.)
Renaming is done with the rename("old", "new")
function. This does what you think: it takes a file named
old and renames it new. (Naturally, you
can use names other than old and new if
you want to be a rebel.) But be warned, if a file named
new already exists, rename will happily
tromp all over it, sending it to Never-Never Land. (The one from
"Peter Pan", not Michael Jackson's ranch.) So this would be an
excellent time to use that -e file test,
wouldn't it?
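Something like the following is what I have in mind. It's only a sketch: the file names are made up, and it works inside a scratch directory from File::Temp so no real files are harmed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);          # scratch directory for the demo
my ($old, $new) = ("$dir/old.dat", "$dir/new.dat");

open(my $fh, ">", $old) or die("Couldn't create $old: $!\n");
close($fh);

# Only rename if we wouldn't tromp on an existing file.
my $renamed = 0;
if (-e $new) {
    warn("$new already exists; leaving $old alone\n");
} else {
    $renamed = rename($old, $new);        # true on success
}

# unlink returns how many files it actually deleted.
my $deleted = unlink($new);
print "Renamed: $renamed, deleted: $deleted\n";
```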
Another of my favorite operator names in Perl is
glob. You've seen globbing before with your
operating system, particularly if you've lived in Unix-land very
long. Globbing is the process of taking a wildcard and expanding
it to match some or all of the files in a directory: ls
*.pl, for example, is globbed to ls a lot of files
ending in .pl before it is actually handed to
ls. The same thing happens when you type something
like that to invoke Perl. However, wouldn't it be nice to be
able to do that for strings internal to Perl? (You may have typed
them right into the program, taken them from user input, or
whatever.) I'm not even going to say it: you know what's coming.
The operator glob will take an expression like
*.pl and expand it to a list of all matching files in
the current directory. Pretty easy to use and sometimes handy.
How about that?
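Here's a small sketch of glob at work. The three file names are invented, and I'm assuming the scratch directory's path contains no spaces (glob treats spaces as separators between patterns).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a scratch directory with a few empty files in it.
my $dir = tempdir(CLEANUP => 1);
for my $name ("one.pl", "two.pl", "notes.txt") {
    open(my $fh, ">", "$dir/$name") or die("Couldn't create $name: $!\n");
    close($fh);
}

# Expand the wildcard ourselves, no shell required.
my @perl_files = glob("$dir/*.pl");
print "$_\n" for @perl_files;
```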
Hashes and arrays are a little bit more interesting than I led you to believe on the first day. For one thing, they can be combined in various ways.
First, arrays of arrays. You can think of these as 2-D arrays, just like in other languages. You create them via the following syntax:
@array_of_arrays = (
[1,3],
[4,5,6],
[7]
);
Note that the arrays need not be the same size! Also, note the use of the square-brackets rather than parentheses. This has to do with how Perl secretly (well, not quite so secretly) stores each inner array: as a reference (Perl's version of a pointer) to an array. I'd invite you to read O'Reilly's Programming Perl if you want to learn the details and also learn how you can use them to your advantage.
You can access an element of such an array as:
$array_of_arrays[1][2];
In this case, the element is 6, the third element of the second array.
You can add arrays to an array via push, but use
square brackets around the array so that Perl knows that you're
not adding the items to the array like normal:
push(@array_of_arrays, [@sub_array]);
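Putting those pieces together, here's a short sketch. The loop at the end is my own addition to show one way of walking the whole structure.

```perl
use strict;
use warnings;

my @array_of_arrays = (
    [1, 3],
    [4, 5, 6],
    [7],
);

my $element = $array_of_arrays[1][2];     # third element of the second array

# Add a whole new row; the brackets keep it together as one item.
my @sub_array = (8, 9);
push(@array_of_arrays, [@sub_array]);

# Walk every row; @$row turns the stored reference back into an array.
my $total = 0;
for my $row (@array_of_arrays) {
    for my $value (@$row) {
        $total += $value;
    }
}
print "Element: $element, rows: ", scalar(@array_of_arrays), ", total: $total\n";
```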
This is the one I seem to use most often: a hash that has arrays for its values. (Well, like arrays of arrays, the values are secretly pointers. Enough said.) You assign them like:
my %hash_of_arrays = (
"January" => ["Monday", "Tuesday", ..., "Wednesday"],
"February" => ["Thursday", ..., "Wednesday"],
...
);
You can access an entry in this via:
$hash_of_arrays{March}[1];
The value here will be Friday, by
the way. (Second day of March.)
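Here's a runnable sketch of the same idea. To keep it short I've cut each month down to its first two days, which is my own simplification, not the full lists.

```perl
use strict;
use warnings;

# Hypothetical, shortened data: just the first two days of each month.
my %hash_of_arrays = (
    "January"  => ["Monday",   "Tuesday"],
    "February" => ["Thursday", "Friday"],
    "March"    => ["Thursday", "Friday"],
);

my $second_of_march = $hash_of_arrays{March}[1];

# Add to one month's list: @{ ... } turns the stored reference into an array.
push(@{ $hash_of_arrays{March} }, "Saturday");

for my $month (sort keys %hash_of_arrays) {
    my @days = @{ $hash_of_arrays{$month} };
    print "$month: @days\n";
}
```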
Of course, the other two possible combinations also exist. And you can probably guess how they'll work. If not, look them up in Learning Perl. I won't waste time on them here, though.
Sometimes your program will be unable to do what you wanted it to do and it will want to crash in a horrible scene involving an absolute landslide in the electoral college. No, wait, that's the Mondale presidential bid. But the point is, your programs might crash sometimes. It would sure be swell to catch those crashes and at least exit with a friendly error message informing the user what went wrong. Or, failing that, a serene haiku or a lecherous limerick. (If you can't help 'em, amuse 'em, I always say.)
One way to do this is with the die() function. It
does what its name strongly implies: it makes the program die as
soon as it's called. Well, almost as soon. You see,
die() takes an argument: a string containing a
message to pass to the user (via STDERR) that will
hopefully inform her what the heck just went wrong. (Or, if
you're working for Microsoft, make the user want to hurl her
monitor into a wall in a Hulk-like rage. But promise me you'll
never do that, OK?) This, combined with the basic data about the
crash (program name and line where the fatal error occurred) will
help the user understand what went wrong.
Most often, you use die() after you've tried to do
something really key for the rest of the program that has an
unusually high probability of not working right. For example,
opening files is often dependent on having the file structure and
permissions (frequently set by powers outside of Perl) on your
side. So it can fail with alarming frequency. And if the file
fails to be opened, the rest of your program is often moot anyway,
so it's best to just exit right away (with a friendly error
message/haiku) and save the time and potential to do actual damage
with a half-executed program. Your code might look like:
if($my_value >= 0)
{
$new_value = sqrt($my_value);
}else
{
die("\$my_value wasn't a positive number: $my_value\n");
}
A fantastic way to use die() is in conjunction
with an or (||):
open(INFILE, "< my_file.dat") || die("I couldn't open my_file.dat!\n");
which will gracefully exit with a rather helpful
error message if the file fails to be opened for reading. Note a
couple of things, here. First, this works because the
|| will go on to the second statement if the first
one is false, but not if it's true. Second, it only works if the
command on the left returns a logical (true or false) value of
some sort. (How you interpret that statement depends on what you
consider a success. If any non-empty, non-zero string is a
success for your function, you're good. If only certain strings
are OK, you need to use a more careful system of checking things.)
Third, you really need the parentheses on the arguments of the
open here. Why? Because || has a
somewhat higher precedence than you might want. Without the parentheses, the
|| would be applied to the file name alone rather than to the whole open, leading to
wackiness. There is also an or operator that does
the same thing, but has a much lower precedence and is safer for this
kind of thing. It has not been deprecated, happily, so open ... or die ... is the
usual idiom.
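For comparison, here's a sketch of the or flavor. The lexical filehandle, the three-argument open, and the $! variable (the operating system's reason for the failure) are my additions; the scratch file exists only so the example has something to open.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Set up a scratch file so there is something to read.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print {$tmp_fh} "first line\n";
close($tmp_fh);

# "or" binds more loosely than ||, so this is safe even without parentheses.
open my $infile, "<", $filename
    or die("I couldn't open $filename: $!\n");

my $line = <$infile>;
close($infile);
print "Read: $line";
```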
Sometimes you have an error that's worth mentioning to the
user, but not really worth dying for. In this case, you might
find the warn() function handy. warn()
does the same thing as die() (send an error message
to STDERR; the message will contain both the string
you pass to the function as well as information about the program
and line number where the error was found), but it doesn't
terminate the program.
You might occasionally find the need to exit a program without
sending an error to STDERR. For example, the error
might simply not be worth it, you might be in an environment where
it wouldn't be very helpful, or you simply have no error to begin
with. In this case, exit() is the function for you.
It does what it says: exits the program. You can print an error
message with a print command, of course (useful for web scripting
errors), but you don't have to.
One problem in all of programming is that you simply cannot
anticipate all manner of errors. Even if your program is debugged
in the sense that everything works the way you planned, you might
get inputs that throw it for a loop. (In fact, it's generally
fantastically difficult to avoid that.) The solution is
eval. It looks like this:
eval
{
Buncha code here
};
(Note the semi-colon? eval is an
expression, not a control structure. So use the semi-colon. How
confusing, eh?)
What this does is run the code inside the block. If an error
occurs, it doesn't crash the program. Instead, it saves the error
message in $@. If no errors occur, $@
will be empty. This means that you can easily determine if no
errors occurred: !$@ means just that. If an error did
occur, you can exit gracefully, perhaps with a nice message to the
user and cleaning up your messes before you leave.
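A minimal sketch of the pattern, using division by zero as the disaster. The divide() helper is hypothetical; any code that can die would do.

```perl
use strict;
use warnings;

# Hypothetical helper: dividing by zero is a fatal error in Perl.
sub divide {
    my ($numerator, $denominator) = @_;
    return $numerator / $denominator;
}

my $message;
my $result = eval { divide(1, 0) };    # eval returns undef if the block died
if ($@) {
    $message = "Couldn't divide: $@";  # $@ already ends with a newline
} else {
    $message = "Result: $result\n";
}
print $message;
```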
A way of avoiding getting errors from unexpected inputs is
"taint-checking". If you invoke Perl with the -T
flag (either at the command line or on the #! line at the top
of your program), Perl will mark as "tainted" any data that comes
in from outside the program.
With taint-mode on, Perl will not allow you to use these tainted
variables or any variables based off of them (using an assignment
to rename the variable doesn't cut it) to do anything that would
modify system resources. (Say, opening a file to write where the
file's name comes from an outside source.) The way you untaint data is
with a regular expression match of some kind. The idea here is to
create a regular expression that matches what you expect to be
given, and then use the pieces it captures ($1 and friends) to create untainted variables. If the
match doesn't go off, it means you got data in a form that you
didn't anticipate and that you should probably not continue the
program with those data.
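Here's a sketch of the untainting step on its own. The rule for what counts as an acceptable file name is a made-up policy for the example, and the code runs the same with or without -T; the flag only changes whether Perl enforces the check.

```perl
use strict;
use warnings;

# Hypothetical policy: a file name starts with a letter, digit, or
# underscore and then contains only those plus dots and dashes.
# Returns the clean name, or undef if the input doesn't fit.
sub untaint_filename {
    my ($input) = @_;
    if ($input =~ /\A(\w[\w.-]*)\z/) {
        return $1;    # under -T, captured text counts as untainted
    }
    return;
}

my $good = untaint_filename("report-2.txt");
my $bad  = untaint_filename("../etc/passwd");

print "Accepted: $good\n" if defined $good;
print "Rejected the suspicious name\n" unless defined $bad;
```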
I cannot honestly say that I've ever used this, but I really should.
If you're going to be printing to a particular file handle
quite a bit, it gets kind of tiresome to always have to say
print OUTFILE... rather than just print.
Happily, we can change our standard output (STDOUT)
to another filehandle. The operator we use is select
FILEHANDLE where FILEHANDLE is – of
course – a filehandle. After you run this command, all
output will default to FILEHANDLE. It is good
practice to select STDOUT when you're done with the
file, by the way. And you can always use print STDOUT
"Message"; to still print to standard out when some other
filehandle is selected.
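In code, that looks something like this. The scratch file from File::Temp and the read-back at the end are only there so the sketch can show where the output went.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($outfile, $filename) = tempfile(UNLINK => 1);

my $previous = select($outfile);    # select returns the old default handle
print "This goes to the file\n";
print STDOUT "This still goes to the screen\n";
select($previous);                  # put things back when you're done
close($outfile);

# Read the file back to see what landed in it.
open(my $check, "<", $filename) or die("Couldn't reopen $filename: $!\n");
my @written = <$check>;
close($check);
print "The file got ", scalar(@written), " line(s)\n";
```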
It's often handy to know what the current date and time are
when you're running a Perl program. Luckily, Perl has taken care
of that. The function is localtime(). This function
actually takes an argument which is the timestamp (the number of
seconds since midnight UTC on 1 January 1970 on Unix systems, but don't
worry about that since Perl can hide all of these details from
you). How do you figure that out? You don't, time
does it for you. The command you'll usually want is
localtime(time). (The only times I have had cause to
modify this have been to get the information for a day a few days before
or after "today". To do this, add or subtract a multiple of
86400, the number of seconds in a day.) There are other ways to
use localtime, but I don't foresee you needing them
with any real regularity. And I think you'll recognize them when
you encounter them.
So I've told you how to call localtime(), but not
how to use what it returns. What it returns is a nine-element
list:
my ($second,
$minute,
$hour,
$day,
$month,
$year,
$weekday,
$yearday,
$isdst)
= localtime(time);
where the first six elements are mostly what you probably
think they are: the current second, minutes, hour, day, month, and
year of your local time. (As determined by the clock on your
computer! Set it wrong and localtime() can't
magically see through the error!) Two little dangers. The month
runs from 0 (January) to 11 (December), so add 1 before showing it
to a human. And the year is not the real year, or even a proper
"two-digit" year; it's the number of years since 1900 (holy
Y2K, Batman!). The solution to this problem is clear: add 1900 to
the year to get the real year. Anyway, pretty useful stuff, huh?
Yeah, but it gets even better. $weekday is the day
of the week, as a number from 0 to 6 (Sunday through Saturday);
make an array of names of the weekdays if you want to turn that
into the names you know and love. $yearday is the
day in the year, starting with 1 January being 0 and working up to
364 (or 365 in a leap year) for 31 December. This can be handy for
working out time between two days. (Also handy for astronomy
nerds, since this does come up quite a bit.) Finally,
$isdst is a logical (0 or 1) variable that tells you
whether you're on daylight saving time.
Oh, there's another, similar function: gmtime.
That gives you the current universal time, which you might also
want. It works the exact same way as localtime()
except that it'll be 6 or 7 hours off for Boulder.
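Here's a sketch that turns those lists into something readable. The describe_date() helper and its output format are my own invention; the last call uses timestamp 0 with gmtime so there's one answer that doesn't depend on your clock.

```perl
use strict;
use warnings;

my @weekday_names = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);

# Hypothetical helper: takes the list from localtime() or gmtime() and
# returns something like "Thursday 1970-01-01".
sub describe_date {
    my ($second, $minute, $hour, $day, $month, $year, $weekday) = @_;
    return sprintf("%s %04d-%02d-%02d",
                   $weekday_names[$weekday],
                   $year + 1900,     # years since 1900
                   $month + 1,       # months run 0 to 11
                   $day);
}

my $today     = describe_date(localtime(time));
my $yesterday = describe_date(localtime(time - 86400));
my $epoch_day = describe_date(gmtime(0));    # the very start of Unix time

print "Today: $today\nYesterday: $yesterday\nEpoch: $epoch_day\n";
```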
Remember when we learned about sort() and I said
that it just did the ASCII order thing? Well, I promised that it
can be taught new tricks. Here's how: you need to make a
sort-definition subroutine. This function will tell Perl how to
order any two items. After that, Perl's sort()
handles the rest.
Your sort-definition subroutine will take two arguments. We'll
call them $a and $b. You then return
one of three values: -1, 0, or 1. These correspond to
$a is earlier than $b, $a
and $b have the same position, sort-wise, and
$a comes after $b. So, for example, to
do a numeric sort:
sub by_number
{
if($a < $b)
{
return(-1);
}elsif($a>$b)
{
return(1);
}else
{
return(0);
}
}
Quick eyes will have noticed that I didn't
actually use the @_ array to define my variables.
This is because sort handles this for you to speed
things up. So $a and $b are
automatically set. Yay!
By the way, a lot of sort-definition subroutines start with
"by". You'll see why in a second. Another tidbit of interest is
that there is a "space-ship" operator that does the whole
if-structure above in one fell-swoop: $a <=>$b
will return -1, 0, or 1 just as above.
To use this subroutine rather than the default ASCIIbetical sorting, invoke sort like this:
my @sorted_array = sort(by_number @unsorted_array_of_numbers);
Note the lack of the comma.
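Here's a compact sketch using the space-ship operator, next to the default sort for comparison. The sample numbers are arbitrary, and the inline block at the end is an alternative way to hand sort its rule without naming a subroutine.

```perl
use strict;
use warnings;

# Same job as the long if-structure, in one line.
sub by_number {
    return $a <=> $b;
}

my @unsorted = (100, 9, 33, 2);

my @ascii      = sort(@unsorted);                 # 100, 2, 33, 9
my @numeric    = sort(by_number @unsorted);       # 2, 9, 33, 100
my @descending = sort { $b <=> $a } @unsorted;    # 100, 33, 9, 2

print "ASCII:      @ascii\n";
print "Numeric:    @numeric\n";
print "Descending: @descending\n";
```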
Sometimes you write a really swell function and you want to use it in a lot of Perl programs. Or you get sick of always including certain data in every program file. (For example, the APS website likes to know who the webmaster is, about Susan, Jo Ann, and Beth and all of our email addresses. It'd be a drag to always enter that data.) Or maybe you don't think you're up to coding a massive set of routines to get HTML documents across the web or to read CGI data. Wouldn't it be great if Perl let you somehow import these things?
OK, say it with me this time: Of course Perl can do it!
The syntax for reading in other Perl files or importing modules is either:
require("filename");
or
use Module;
There is a slight difference between these, although you'll see
both used. (use gets used almost all the time
– in my experience – for full-blown Perl modules, by
the way.) require() imports the code at run time
while use does it at compile time. In fact, if I
understand correctly, use uses
require() as well as a little bit of other magic (it
calls the module's import routine).
You can call require() with either a string (a file name) or a
bareword (that is, not a string); use only takes a bareword. With
barewords, leave off
the .pm extension as they'll assume it. It's
a bit less typing for you. Also, in the case of barewords you
indicate directories with :: rather than the usual
/ (in Unix, anyway).
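A small sketch of both, using two modules that ship with Perl (List::Util and File::Basename, picked only because they need no installing). The path being taken apart is made up.

```perl
use strict;
use warnings;

use List::Util qw(max);      # loaded at compile time; imports max()
require File::Basename;      # loaded at run time; nothing is imported

my $largest = max(3, 17, 5);

# Since require imported nothing, call the function by its full name.
my $name = File::Basename::basename("/home/me/scripts/report.pl");

print "Largest: $largest, file: $name\n";
```

Note how List::Util lives in the file List/Util.pm: that's the :: standing in for the directory separator.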
There are a lot of modules out there for Perl. You can always
hit CPAN (http://www.cpan.org/) and get
things you need. However, quite a few should be installed on your
system to begin with. If you need a module, it might be best to
ask your system administrator to install it for you. (Use cookies
as a bribe if needed.) Some examples of modules I've found useful
include CGI (helps you handle web forms),
DBI (for interfacing with MySQL databases
painlessly), and LWP (grabs webpages so I can have Perl look
them over).