The short version is that Perl is a C-like programming language. Anyone who knows C will be at home with most of the syntax in Perl. It's also more or less a scripting language, so you don't compile it into binaries on your drive but rather you run Perl on the program. (Effectively. Later, we'll see that you can set things up so that you can just type the program's name at the command prompt.)
What is Perl good for? A lot of things, but Perl is especially good at doing all kinds of fun things to text. It was originally written by Larry Wall (remember that name, it gets used a lot) for going into text files and doing various things to them. (I don't think I need to tell you how useful that capability is.) Anyone who has used C or FORTRAN to attempt this will know that while anything is possible with these languages, a lot of things are horrifically painful and borderline emotionally scarring.
Perl is also fantastic at scripting in any number of operating
systems. Not surprisingly, UNIX and Perl play extremely well
together. (It's to the point now where I'm told that people are
discouraging use of sed and awk in favor
of just writing Perl routines. They're more flexible and easier
to understand.)
Perl is also nice because it's fast to write. It's not as fast at execution as, say, C, but it's a lot easier to bang out a script thanks to a variety of built-in functions, pre-written modules (written by many Perl users around the globe, free for your enjoyment), and generally less strict coding requirements. (This has a down side in that it can allow for some slightly sloppy programming, but it's not too bad.)
In short, Perl is what I think of as my "Swiss-Army Programming Language". Like your basic "Swiss-Army Knife", it's a handy tool to have in your pocket, to be pulled out at a moment's notice and used. If you had a lot of time and resources, it would seldom be the best solution (if you could actually spend the time to write a program in C, it would probably run faster, for example), but you generally don't need execution speed, just writing speed.
Oh, here are some tasks for which Perl is well-suited:
outfoleXXX.dat where XXX is a number. Clearly, you're embarrassed by the typo. But with a thousand files, who wants to rename them one at a time? Perl does. It takes about 20 lines of code, including error checking and whitespace.)
crontab to actually activate the program regularly, by the way.
Oh, Perl doesn't really stand for anything. Sure, you'll hear stuff. But Larry Wall didn't make it stand for anything when he created it. He's since declared it to stand for "Practical Extraction and Report Language", but that's a retrofit. So it's not in all capitals and I don't think of it as a proper acronym.
If you already know C or a C-like (syntax-wise) language, you're a good part of the way to having learned Perl. If you've never used such a language, Perl is still pretty easy to pick up. And once you get the hang of it, you'll be positioned to learn a lot of other handy languages quickly.
For this workshop, I'll assume that you're working in a UNIX environment, although Perl runs on most every OS imaginable. There will be minor differences in the coding (directory paths may look different, for example), but nothing major. Also, note that there are Perl programming environments, but I'll be assuming that you're just using your text editor of choice. (I'm a big Xemacs fan, especially loaded with the various packages, including the Perl mode.)
To create a Perl program, you'll need to open a new file in
which to place the code. (Yes, there is a way to run Perl as a
command-line thing, but you'll probably never have call to use it
that way.) The usual extension for a Perl program is
.pl, by the way. You don't need to use this,
however. (If you want to make the code executable, you'll
generally leave the extension off altogether. Not that it matters
as far as the OS is concerned, but it does make the program more
like a command.)
A good way to start your program is with
#!/path/to/perl/command/perl -w. When you have
#! as the first two characters of the first line, the
OS knows that this is a script and that it should execute it with the
command that follows. Give the full path to the Perl
binary on your system rather than just the perl
command. (Many UNIX systems have it at
/usr/bin/perl, although many also have a second
Perl at /usr/local/bin/perl. For some reason,
the latter seems to be a better option as it gets updated more
often/rapidly. Why? I don't know.) The -w is one
of oh-so-many flags the perl command will take. In
this case, it says to turn on full warnings. (Before Perl
executes a script, it actually compiles it into memory. While it
does that, it can spot various kinds of programming errors.
-w says to be more verbose about these to help you
debug. You want this on most of the time, trust me.)
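As a minimal sketch of what such a file might look like (assuming your Perl binary lives at /usr/bin/perl, and with a made-up variable name and greeting), a complete program could be:

```perl
#!/usr/bin/perl -w
# The first line asks the OS to run this file with the perl binary,
# with warnings turned on. Adjust the path to match your system.

my $greeting = "Hello, world!\n";   # hypothetical text to print
print $greeting;
```

Saved as, say, hello.pl, this can be run by typing perl hello.pl at the command prompt.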
Before diving in very far, we should look at the basic syntax of the Perl language.
A Perl statement can occur on more than one line in your code
and it can be longer than the page width in your editor. (This
latter is a bad idea for the most part, since it makes things
difficult to read. But there are times when it's no big deal,
especially when the statement is a tiny bit more than the
line-width.) How do you continue over a line break? Basically,
you don't. Perl doesn't know that it's reached the end of a
statement until it sees a semi-colon (;). So if you
hit return a half a dozen times in the middle of a calculation
then finish things off and add a semi-colon, Perl will smile and
deal with it. So none of this $-type silliness to
continue statements on more than one line.
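To make that concrete, here is a small sketch (the variable name is invented for illustration) of a single statement spread over several lines:

```perl
# One statement on three lines; Perl keeps reading until the semi-colon.
my $total = 1
          + 2
          + 3;

print "Total: $total\n";
```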
Incidentally, forgetting your semi-colon is probably the most common beginning mistake. No shame in doing it, we all did and continue to do it periodically. (Although it comes a lot more naturally with time. You'll find that your thoughts naturally include the semi-colon when you finish a statement. It's like a period in written English, only books don't crash when you neglect one of those.) The semi-colon is also the statement terminator in C/Java/Javascript/etc, so it's probably a convention worth getting used to.
Whitespace is a term used to refer to any spaces, tabs, or returns in code. Perl, like its brethren, doesn't notice the stuff very often. (There are a few times when it matters, but they're sort of obvious. Like inside of strings and after a variable name but before a function name.) So you can tab indent, leave blank lines, wrap lines and align calculations neatly, and generally go to town with your code. The whole purpose of this is so that you'll make your program look pretty and readable, by the way.
That gentle introduction made, I'll now turn a bit nastier: tab-indent your code and leave blank lines where appropriate! It makes it a lot easier to read. Many editors know how to tab-indent code and will automatically choose the right indentation based on your current nesting of statements. Snazzy. And for blank lines, I typically group statements that are part of the same task together (no blanks between them), leave one blank line between related tasks, and progressively more between more distantly related tasks. (Of course, comments also slip their way in between these in many cases. But you see what I mean.)
A fundamental part of programming in Perl (and C, and... yeah.)
is the pair of braces: {}. You can group any statements together with them,
although typically you use them to group code to be executed (or
not) based on control statements (if's, for's, etc). There are
lots of styles on how to use braces. A common one seems to be to
start the brace after the control statement
(if(condition){...), but I usually go down to
the next line and put the brace more or less under the control
statement. This allows me to align the opening and closing braces
nicely. But whatever, it's a matter of personal style, not
absolute syntax.
The # in our suggested first line is our comment
character. It makes everything on that line after the
# into a comment. (Rather like the //
in C++ and the ; in IDL.) Use this character often or
be doomed to be forever trying to figure out what you did a week
ago. Comments can also be used to deactivate lines of code while
debugging.
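A short sketch of the three uses of comments just described (the variable names and the circle calculation are invented for illustration):

```perl
# A whole-line comment: Perl ignores everything after the #.
my $radius = 2;                           # a comment can also follow code
my $area   = 3.14159 * $radius * $radius; # approximate area of a circle

# The next line is deactivated while debugging:
# print "Debug: radius is $radius\n";
print "Area: $area\n";
```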
Functions in Perl look like function_name(arguments, go,
here). (Technical note: user-written functions are also
sometimes called "subroutines" in Perl. I think that this is
silly nomenclature, so let us never speak of it again.) Arguments
go inside the parentheses and are separated by commas. If you've
defined a function yourself, you can also put a
& in front of the function name
(&function_name()) to make it clear to Perl that
this is a function. (In Perl 5 the & is optional when you call
the function with parentheses.) Functions can either modify variables passed
as arguments or return values (or both). You can return any data
type you want, by the way.
(Oh, here's a little secret. The parentheses aren't strictly
necessary. For example, the print function in Perl is
print("string goes in here"), but you'll often see it
as print "string goes here". I'll admit that for the
print command, I generally leave off the parentheses. However,
this is, in my view, bad form in general. It's better to have the
parentheses there and to make it really clear to both humans and
Perl what you meant.)
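Here is a small sketch of both calling styles. The built-ins length() and join() are not otherwise covered in these notes; they are used here only because they take arguments and return values, and the variable names are made up.

```perl
# The same built-in function, with and without parentheses.
print("string goes in here\n");
print "string goes here\n";

# Arguments go inside the parentheses, separated by commas,
# and the return value can be stored in a variable.
my $name_length = length("Larry Wall");        # number of characters
my $joined      = join("-", "a", "b", "c");    # glue pieces together with "-"

print("$name_length $joined\n");
```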
We'll learn how to actually write functions in a little while, but now you know one when you see one.
Now we get to the real meat of coding: variables. Perl has a few basic data types. None of them are declared, as such, by the way. This leads to faster coding, but it also means that more of the burden to be a good coder is on you.
By the way, this is a good time to introduce the
my function. This is how you declare a variable
(array, scalar, string, hash, whatever) to have a certain scope.
A scope refers to where the variable is valid. For example, I can
declare a variable to have a scope limited to a specific function,
the whole program, or even inside of a for-loop. It's a good
programming practice to declare variables to have the minimum
scope necessary to do their jobs. The way you do this in Perl is,
at the first usage, applying my: my $variable =
5;. Actually, you don't even need to do anything to the
variable: my $variable; is also valid. The variable
will now have a scope limited to the containing block of code and
all of its sub-blocks. So if you declared this at the start of
the program, it'll have a scope of the entire program. If you did
it inside a for-loop, it's just the for-loop (and when that is
done, the variable goes away).
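As a rough sketch of scope in action (both variable names are hypothetical), a bare pair of braces is enough to create a block:

```perl
my $outer = 5;                 # declared at the top: visible everywhere below

{
    my $inner = 10;            # visible only inside these braces
    $outer = $outer + $inner;  # $outer is still visible in the sub-block
}

# $inner has gone away by this point; $outer keeps its new value.
print "outer = $outer\n";
```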
Oh, variable names. Variable names can contain letters,
numbers, and underscores (_). Different types of variables will
begin with special symbols, but you'll see that below. The first
character after the special symbol needs to be a letter or an
underscore (for variables you are creating; built-in variables can start
with other symbols). And note that Perl is case-sensitive, so
$a is a different variable from $A.
Scalars are any kind of data that comes in single units. They
are prefaced by a $ sign. (So a variable will look
like $time or $first_name.) There are
two types of scalars, numbers and strings.
Ye olde number data type. Perl is kind of like us, it only understands "number". It doesn't get the idea that there are floats and doubles and integers and... well, you know. Just numbers. (Really, this is kind of a good thing. Perl thinks like you do about numbers. As far as I know, people don't have floats and ints and stuff in their heads, after all.) Scalars all start with a dollar sign ($) as their special symbol. (Think "s" for scalar. Sort of.)
To declare a scalar, just do what I outlined above:
my $example_scalar = 5;
Or perhaps:
my $example_scalar;
...
$example_scalar = 5;
There is a bit more that could be said about numbers, like how to write hexadecimal numbers, but I'll skip that. Ask if you want to know or pick up a handy Perl book.
This will be fast as you know the basics. Your operators are
what you expect: +, -, *,
and /. Use parentheses to delimit subexpressions and
the normal order of operations applies. Oh, modulo is
% (like in C, of course) and exponentiation is done
the FORTRAN way, ** (compared to ^ in IDL). Of
course, there are also other functions, like cos(),
running around. But Perl isn't IDL: it isn't mainly a language
concerned with crunching numbers. Don't expect to see fanciness
unless you write it or find someone else who did.
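A quick sketch of these operators at work (the variable names are invented for illustration):

```perl
my $sum       = 2 + 3 * 4;      # multiplication happens first: 14
my $grouped   = (2 + 3) * 4;    # parentheses happen first: 20
my $remainder = 17 % 5;         # modulo: 2
my $power     = 2 ** 10;        # exponentiation: 1024
my $cosine    = cos(0);         # built-in cosine: 1

print "$sum $grouped $remainder $power $cosine\n";
```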
While I'm mentioning it, here's a helpful hint for the non-C
crowd. If you want to add 5 to a variable, say
$ctr, there's the obvious way:
$ctr=$ctr+5. But that's sort of annoying. If only
there were a faster way... Of course, I'm not one to taunt you and
there is: +=. This is actually a form of an
assignment operator that says, "Take the variable on the left side
and add the number on the right side. Store the result back in
the variable at left." (It's harder to say than it is to
understand, actually. Who knew?) So our new code would be
$ctr += 5. It even works when the number
isn't 5. See how cool Perl is? You also can do this
with -=, *=, /=, and
%=. (No points awarded for figuring out what each
does.)
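For the skeptical, a small sketch that runs one counter through all five shortcut operators in turn:

```perl
my $ctr = 10;

$ctr += 5;    # 10 + 5 -> 15
$ctr -= 3;    # 15 - 3 -> 12
$ctr *= 2;    # 12 * 2 -> 24
$ctr /= 4;    # 24 / 4 -> 6
$ctr %= 4;    # 6 mod 4 -> 2

print "$ctr\n";
```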
But it gets even better! You know how you always seem to be
adding or subtracting 1 to things? Perl can do that as a
shortcut, too: ++ and --. (There could
be a +- or a -+, but that would be silly since it wouldn't
actually do anything noticeable. So pretend that I never said
anything, OK?) So to add one to $ctr I write
$ctr++;. It's shorter to write and it feels more
natural to think. Take that, IDL!
(This is a massive aside, but I'm putting it here because
footnotes don't work well on the Web. It concerns where you put
++ and --. (I'll just talk about the
former, but everything is the same for both.) You can actually
put it either before or after the variable in question. The only
difference is that when it's before, Perl will add one to your
variable before using it in the rest of the expression it is
currently looking at while if it is after the variable, Perl
increments the variable after doing everything else. For example,
consider:
my $a1 = 1;
my $a2 = 1;
my $b1 = ++$a1;
my $b2 = $a2++;
In this case, $b1 ends up
holding 2 and $b2 ends up with 1. Why? Because Perl
increased $a1 before assigning the value to
$b1, but $a2 didn't get incremented
until after the assignment happened. Both $a1 and
$a2 equal 2 at the end of that code, by the way.
I always use "postincrement" format. But I also always treat the increment as a separate statement, so it doesn't matter. Personally, I think trying to do more than one thing (say, assign more than one value) in a given expression leads to seven kinds of badness down the road and should be avoided. It's generally worth it to assign more variables, even temporary ones, than to try to cram too much into one expression.)
Strings are a series of characters, like "Abraham Lincoln", "M", "", or "1234". (Note that the last of these is a number, but it can also be a string. It depends on how I used it. The second to last of these is an empty string. It's a defined variable, there's just nothing in it. This is generally handy as either a place-holder variable (we might not have filled it yet, but we plan to later) or as a way of noting that some bit of data we might have expected to read in wasn't there.) Strings also start with a dollar sign. (Um, "s" for string? Basically, single-element data start with dollar signs.) We'll see two types of data that don't in a second.
Now, there are a few ways to declare a string. First, there are single and double quotes:
my $string_a = 'Four score and seven years ago';
my $string_b = "Four score and seven years ago";
Now both strings are identical. So why two delimiting symbols?
They are a bit different. The single quotes basically take
anything inside them literally as you type it. The only
exceptions are the single-quote itself (if you need a single
quote, escape it with a backslash, like \') and then
the backslash (to get the backslash, \\). And when I
say everything is taken literally, I mean things like tabs and
returns, too. I don't use this one much, by the way. I tend to
think in terms of the double quotes.
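A brief sketch of single quotes taking things literally (the strings and variable names are made up; the double-quoted "\n" in the print is only there to end each output line):

```perl
my $literal = 'Costs $5\n';   # the $ and the \n stay exactly as typed
my $quoted  = 'It\'s here';   # \' gives a single quote
my $slash   = 'C:\\temp';     # \\ gives one backslash

print $literal, "\n", $quoted, "\n", $slash, "\n";
```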
Double quotes interpolate stuff inside them. First of all,
there are a lot of escape sequences. For example, \n
codes a newline character. (Depending on your system, this could
also be a \r. It's a Windows-related oddity,
mainly.) \t is a tab. There are a lot of special
characters, too, like the dollar sign. If you want any of these
to really be in your string, escape them with a backslash (so
\$). Why the special characters? Because another
thing that happens inside of double quotes is that Perl will
replace variable names with what they contain. For example, if I
continue the code above with:
my $string_c = "String B: $string_b\n";
print $string_c;
I'll get String B: Four score and seven years ago
with the next thing printed starting on a new line. You see, Perl
just stuck the value of $string_b in where its name
appeared. (And now you see why the $ has to be
escaped.)
There are a few handy string functions that you need to meet quickly. It won't take but a minute. (There will be much more string related goodness next time, but that will keep.)
First, there's adding two strings together: you use the
. operator. (It's just like a + in IDL.) So the following
code
my $stringA = "It was the best of times";
my $stringB = "it was the worst of times";
print $stringA . ", " . $stringB . "\n";
prints out It was the best of
times, it was the worst of times. (With a line break
afterward. When printing out results, it's usually a good idea to
remember the line break. Otherwise your output looks funky, most
of the time.)
By the way, that last statement was the same as print
"$stringA, $stringB\n";. You can concatenate with the
double-quotes if you so desire. (I often do, if only because I
often find myself wanting a delimiter anyway.)
Also by the way, there is a .= operator out there.
Guess what it does? Yep, it takes the string on the right and
adds it to the end of the string on the left. You can see how
this might be handy, I'm sure.
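A small sketch of .= building up a string piece by piece (the variable name is invented):

```perl
my $sentence = "It was the best of times";

$sentence .= ", ";                          # append a delimiter
$sentence .= "it was the worst of times";   # append the second half

print $sentence . "\n";
```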
Another handy function is split(/delimiter/,
$string). Here delimiter indicates a
delimiter (where you want to split up the string) and
$string is the string. So, for example, you could
split up a string of items that were separated by commas or tabs
so that each item is separate and easier to deal with. Note that
this function returns an array, but to see what exactly that means
(although I'll bet you've already guessed) you'll have to wait
while I make one more point.
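Peeking ahead slightly at arrays, a minimal sketch of split() on a comma-separated string might look like this (the data and names are made up for illustration):

```perl
my $line   = "Jan,Feb,Mar";
my @fields = split(/,/, $line);   # cut the string at every comma

# Each piece is now its own element of the array.
print $fields[0], "\n";
print $fields[2], "\n";
```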
How does Perl know a string from a number? Well, if it contains
anything except digits and at most one decimal point, it's
definitely a string. (Easy enough.) If not, it depends on how it
was declared or last assigned. (That is, when was the variable
last on the left side of an equal sign?) If you declare a
variable as $scalar_number=12.3 you'll have a number.
If you do it like $string_number="12.3", it'll be a
string. But in these cases, the definition is a bit fuzzy. If I
try to do math with $string_number, Perl will turn it
into a number for me. Conversely, if I try to do string things on
$scalar_number, Perl will make a string out of it
first. (However, in general I find it good form to wrap the
variable in double quotes when using it like a string. This, of
course, makes a string that contains the number in question.)
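A sketch of Perl converting back and forth on demand, reusing the two variable names from the paragraph above (the other names are invented):

```perl
my $scalar_number = 12.3;     # assigned as a number
my $string_number = "12.3";   # assigned as a string

my $sum     = $string_number + 1;           # the string is used as a number
my $text    = $scalar_number . " metres";   # the number is used as a string
my $wrapped = "$scalar_number";             # double quotes give a string holding the number

print "$sum | $text | $wrapped\n";
```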
Now the fun really begins. Arrays are what you think they are
(although you might also know them as "vectors" or even "lists" in
other languages): a list of scalars indexed by number. The
special symbol for an array is the at sign (@). So
an array name might be @month_array. But here's
where it gets a wee bit tricky. If you select a single element of
the array to do something, you use the $ again. Why?
Because the single element is a scalar. It's only when you are
treating the entire array that you use the @. If
this sounds a bit confusing, that's OK. It might take a little
while to get the feel for when you use the @ and when
you use the $. But with a little practice, you'll
develop an intuitive sense of it.
Of course, this raises the question, "How do you refer to a
single element, anyway?" The answer is by using the array name
(remember, with the $) and then the index in square
brackets. Nota bene: Perl is indexed starting at 0, not 1
like some languages. So the first element of an array is
$month_array[0] and the eleventh is
$month_array[10]. You can also use negative numbers.
These count from the end of the array:
$array_name[-1] is the last element, and so
forth.
There are multiple ways to create an array. One is to assign
values to each element, one at a time ($month_array[$i] =
$new_value;). This can be the way to go if you're looping
through the array filling it slowly. But if you already know all
of the values, it's kind of annoying to type. Luckily, there is a
faster way. You can provide a list, delimited with commas and
starting/ending with parentheses and assign it straight to the
array: @month_array = (31, 28, 31, 30, 31, 30, 31, 31, 30,
31, 30, 31);. That's a time-saver for you.
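Putting creation and indexing together, a small sketch (the @squares array and the scalar names are invented for illustration):

```perl
# The slow way: one element at a time.
my @squares;
$squares[0] = 0;
$squares[1] = 1;
$squares[2] = 4;

# The fast way: a whole list at once.
my @month_array = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

my $january  = $month_array[0];    # first element
my $november = $month_array[10];   # eleventh element
my $december = $month_array[-1];   # last element, counting from the end

print "$january $november $december\n";
```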
Another time-saver is interpolation. Perl can sometimes guess
what you want in an array and fill in the middle bits for you.
Interpolation is done with the .. operator (almost
like an ellipsis, but with one fewer dot, for
obvious reasons). For example, for an array from 1 to 100, I can
do my @integers = (1..100); and Perl would take care
of filling all 100 elements correctly. Isn't Perl wonderful?
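A quick sketch to confirm that the range really does fill in every element (the scalar names are made up):

```perl
my @integers = (1..100);

my $first = $integers[0];       # the range starts at 1
my $last  = $integers[-1];      # and ends at 100
my $count = scalar(@integers);  # number of elements in the array

print "$first to $last, $count elements\n";
```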
You can assign an entire array to another array in Perl just
like in IDL (but not like in C; bummer, eh?): my
@new_array = @old_array;.
At this point, I should tell you that you can index more than
one element of an array, but it's in an array context in that
case. For example, @month_array[5,6,7] will be an
array containing three elements: the number of days in each of the
summer months.
A final thing to know about arrays is $#array_name
(where array_name is, of course, the name of your array). This
variable contains the index of the last element of your array.
(So it's actually the size of the array less one because Perl
indexes starting at zero.) I won't bother telling you how handy
this can be; you have already figured it out.
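A sketch of a multi-element index and of $#array_name, using the month array from above (the other names are invented):

```perl
my @month_array = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

my @summer     = @month_array[5,6,7];   # June, July, August
my $last_index = $#month_array;         # index of the last element
my $size       = $last_index + 1;       # number of elements

print "@summer | $last_index | $size\n";
```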
At this point, I want to introduce a few special array
functions of interest: push, pop,
shift, and unshift.
push(@array, item): adds item to the end of the array.
pop(@array): removes the last element of the array and returns it.
shift(@array): like pop, but from the other end of the array.
unshift(@array, item): like push, but at the beginning of the array.
Also potentially of interest is reverse(). As
you've undoubtedly surmised, this reverses the array and returns
the result. (As opposed to altering the array that it gets as an
argument.)
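A minimal sketch of all five functions acting on one small array (the array and variable names are invented for illustration):

```perl
my @queue = (2, 3, 4);

push(@queue, 5);             # add to the end:        (2, 3, 4, 5)
unshift(@queue, 1);          # add to the beginning:  (1, 2, 3, 4, 5)
my $last  = pop(@queue);     # remove from the end; @queue is now (1, 2, 3, 4)
my $first = shift(@queue);   # remove from the beginning; @queue is now (2, 3, 4)

my @backward = reverse(@queue);   # a reversed copy; @queue itself is unchanged

print "@queue | @backward\n";
```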
One last function of interest is sort().
Obviously, this well-named function sorts an array. You might think it's
easy to guess how it deals with numbers, but you'd be wrong.
sort() works on strings by default and therefore
would sort numbers so that all of those that start with 0 were
first, then those that started with 1, and so forth. The result
of this is that 100 would come before 99, which isn't what you
generally want. (Of course, you've almost certainly seen this
sort of idiotic sorting before.) Luckily, Perl gives us a way to
make sort smarter. We'll see how to do that later, however. For
now, know that it exists and know that you can sort strings with
it without any extra help.
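A short sketch of the default behaviour, with made-up data, showing both the number surprise and the string case that works as expected:

```perl
my @numbers        = (99, 100, 9, 10);
my @sorted_numbers = sort(@numbers);   # compared as strings, not as numbers

my @names        = ("Wall", "Lincoln", "Dickens");
my @sorted_names = sort(@names);       # plain string sort

print "@sorted_numbers\n";   # prints "10 100 9 99"
print "@sorted_names\n";     # prints "Dickens Lincoln Wall"
```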
Finally, hashes or "associative arrays". Many of you will
never have seen these before, but they're like super arrays. A
hash, like an array, is a list of values. But now they're not
indexed by sequential numbers. They're indexed by "keys", each of
which could be a string or a number (as long as they're unique,
since otherwise things would get confusing, right?). For example,
you might have figured out that my array above
(@month_array) is really just the number of days in
each month. (Non-leap year.) But to use it, I have to turn each
month into a number. That would be annoying in many cases. With
a hash, I could use the same values as above, but choose keys
"Jan", "Feb", "Mar", "Apr", etc.
How do we indicate a hash? With a percent sign:
%month_hash. Again, when we select a single element
from the hash, we use a dollar sign. We use curly-braces
({}) to enclose the key:
$month_hash{"Aug"} is 31, for example. (Note that
the key was a string, so I needed the quotes.)
We have the same options to assign values and keys to elements.
We could fill each pair one at a time. (If the key has already been
used, Perl overwrites the old value. If the key is new, you get a
new key-value pair. Again, remember that you've got case-sensitive
keys!) Or you could do it in one fell swoop. You still
call the hash by its full hash name and use parentheses to enclose
the pairs and commas to delimit them. But now you need to give a
key and a value for each. Use => to indicate the
key to value relationship: %month_hash=("Jan"=>31, "Feb"=>28,
..., "Dec"=>31);
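A small sketch of both ways of filling a hash, using only a few months to keep it short:

```perl
# All at once, with => pairing each key with its value.
my %month_hash = ("Jan" => 31, "Feb" => 28, "Mar" => 31);

# One pair at a time.
$month_hash{"Apr"} = 30;   # new key: a new key-value pair is added
$month_hash{"Feb"} = 29;   # existing key: the old value is overwritten

my $days = $month_hash{"Feb"};
print "Feb has $days days\n";
```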
Of course there are special functions for hashes! (In fairness, I should note that I've only used one of these really, and even that is pretty rare. At least in my coding, hashes tend to take care of themselves.)
The first you might want to meet are keys() and
values(). Both of these act on the hash and return
arrays. You can probably figure out what they do, but in short
they return an array of the keys in the hash and the values in the
hash. (Note: if you do these two functions back-to-back, the keys
and values will be paired correctly. However, if you mess with
the hash at all (delete or add entries, even if you appear to undo
what you just did) in between, there are no promises about the
pairings. So I wouldn't get too attached to the idea that they'll
match up. But, then, you don't need to given how hashes work,
right?) Really, I've only ever seen keys() used,
although I could cook up a scheme where you might want values.
But it's sort of unlikely.
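A minimal sketch of keys() and values() called back-to-back on a small hash (the array names are invented). The order of the keys is not something to rely on, so the example only checks that the pairs line up:

```perl
my %month_hash = ("Jan" => 31, "Feb" => 28, "Mar" => 31);

my @names = keys(%month_hash);     # the keys, in whatever order Perl stores them
my @days  = values(%month_hash);   # the values, in the matching order

# Because nothing touched the hash in between, element 0 of each
# array belongs to the same key-value pair.
print "$names[0] has $days[0] days\n";
```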
Another function on hashes is each(%hash_name). This
function also returns an array, but in this case it's a
two-element array of the key and value. Each time you invoke the
function on a hash, the next key/value pair is returned. This can
be handy in loops, naturally.
delete() deletes a key/value pair from a given
hash. (Wait, why didn't we have something like this for arrays?
We did, it just looks different because of how arrays and hashes
are indexed.) The syntax is delete($hash_name{"key"}).
Note that it's difficult to confuse this function. As long as the
hash exists, it's happy since if the key doesn't exist, the
function's job is done before it begins!
The aptly-named exists($hash_name{"key"}) tells
you if a given key exists in the hash. (Perl is pretty good at
naming functions, isn't it?) To check whether an ordinary scalar
holds a value, use defined() instead. It's akin to IDL
n_elements(variable), but shorter.
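A short sketch of delete() and exists() together (the key "Xyz" is a deliberately missing key, and the variable names are made up):

```perl
my %month_hash = ("Jan" => 31, "Feb" => 28);

my $before = exists($month_hash{"Feb"});   # true: the key is there

delete($month_hash{"Feb"});   # removes the "Feb" pair
delete($month_hash{"Xyz"});   # no such key, so there is nothing to do

my $after = exists($month_hash{"Feb"});    # false: the key is gone
```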
Having variables is great. Being able to do math is also great. But what makes a programming language really useful are the control structures: those things that tell the code to repeat operations or to make decisions. And it's time we met Perl's retinue. (Note: once again, if you know C, you'll be at home here. But don't get too cocky, there are a few changes and new faces.)
Right, if. You already know that it tells Perl
to do one thing if a condition is met and (maybe) something else
if not. Pretty snazzy stuff, if you think about it. But also
very much at the heart of almost any code you'll ever write. Your
basic if-statement looks like this:
if(condition)
{
Commands here.
}
We will talk about what the conditions look like in a moment. The commands are any bit of Perl code you'd like. Oh, and the braces are mandatory in Perl (unlike C). What if you want an else-statement in there? Just add it:
if(condition)
{
Commands here.
}else
{
Other commands here.
}
Easy enough. What if you want more than two possible
conditions (for example, you're worried about the sign of a number
which could be positive, negative, or zero)? Then use the
elsif (note the spelling, it's a mite odd):
if(condition)
{
Commands here.
}elsif(another condition)
{
Some completely different set of commands here.
}else
{
Other commands here.
}
Note that you must have one if, at most one
else, and as many elsifs as you like.
(By the way, I recommend having a catchall else for the most part.
So for the above example of looking at the sign of the number, I'd
have an if that asks if it's positive, an
elsif that asks if it's negative, an
elsif that asks if it's zero, and then a final else
that we get to only if none of the conditions are true. (Say the
variable isn't defined or it is a string.) Believe it or not,
I've seen code hit these conditions before even when I thought
them impossible to meet!)
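The sign example can be sketched as follows, with the catchall else at the end. The comparison operators used in the conditions are the usual ones, and the variable names are invented:

```perl
my $number = -7;
my $sign;

if($number > 0)
{
    $sign = "positive";
}elsif($number < 0)
{
    $sign = "negative";
}elsif($number == 0)
{
    $sign = "zero";
}else
{
    $sign = "unknown";   # the catchall: none of the conditions were true
}

print "$number is $sign\n";
```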
?: is a shortcut operator that exists in many
languages. (Including C, IDL, and, naturally,
Perl.) It is also the only "ternary" operator I can think of in
any language. (Unary operators act on one thing, like
++. Binary operators act on two, like
+. Ternary act on... yes, three.) What the operator
does is give you a shortcut for short if...else
statements. It looks like this: (conditional) ? if true : if
false. So if the condition (left thing) is true, the
middle thing is done; if the condition is false, the right thing
is done. You can only use a single expression in each of those actions, so
it's of limited use. However, it's very handy for doing things
like $days_in_Feb = (($year % 4 == 0) ? 29 : 28);.
(Nota bene: the parentheses
encompassing the entire right side of the assignment are strictly
speaking not needed. I usually put them in, though, out of
paranoia and because I find it easier to read. White them out if
you don't like them. Unless you're reading this on your computer
screen, in which case for crying out loud don't!)
Really, this is just a lazy way of doing an
if...else. But that's Perl for you: always giving
you ways of doing simple tasks that much faster. And
incidentally, this offers you the closest thing to the
"case...of..." structure in IDL (or "switch...case" in C),
although with a limited ability to only perform one action per case.
The way you do this is by nesting the ?:
structures:
my $my_variable =
($a < 0) ? "Negative" :
($a == 0)? "Zero" :
($a > 0) ? "Positive" :
"Non-number";
Just what are the possible ways of writing that condition?
First, a quick note: all the conditions that you're about to see
really just return 1 or 0 (true or false). In Perl, the condition
in the if-statement can either be one of these
expressions, or a variable. Variables that are undef
(a special Perl value for "undefined"), 0, the string 0, or the
empty string are all false. Everything else is true. This can be
quite handy to know. But on to conditions.
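A small sketch of what counts as true and false, using the ?: operator to record each verdict (all the variable names are made up). The last two lines are cases beyond the list above that are easy to get wrong:

```perl
my $undefined;   # declared but never assigned, so it is undef

my $r_undef = $undefined ? "true" : "false";   # undef is false
my $r_zero  = 0          ? "true" : "false";   # the number 0 is false
my $r_str0  = "0"        ? "true" : "false";   # the string 0 is false
my $r_empty = ""         ? "true" : "false";   # the empty string is false
my $r_text  = "0.0"      ? "true" : "false";   # true: not exactly the string 0
my $r_neg   = -1         ? "true" : "false";   # true: any non-zero number

print "$r_undef $r_zero $r_str0 $r_empty $r_text $r_neg\n";
```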
Numbers are easy in Perl, you'll recognize the conditions.
First, there is "equal to": ==. (A common mistake,
especially among those new to Perl, but occurring even for the Perl
veterans: using = rather than ==. The
assignment evaluates to the value assigned, so the test passes
whenever that value is true. It also overwrites
your variable. This is generally not a good thing.) To see if
$pi is equal to 5, write if($pi == 5). (It
had better not be, but whatever.) "Not equal to" is !=,
by the way. We'll say hello to Mr. ! again in a moment.
"Greater than" and "less than" are > and
< while "greater than or equal to" and "less than
or equal to" are >= and <=. So
far so good.
Of course, this wouldn't be Perl if we couldn't also look at
strings. The most common string comparison is equal to,
eq. (IDL/FORTRAN people, you recognize this. But it
only works on strings in Perl.) This tests for an exact
string match. I find that I don't use this a lot except when the
string is pretty short. When we meet regular expressions next
time, you'll understand why. The opposite of eq is
ne, which tests for non-equality.
There are also gt, lt,
ge, and le for "greater than", "less
than", "greater than or equal to" and "less than or equal to".
These test alphabetization, in effect. (Actually, they test ASCII
order. So you'll find issues with non-letters and mixed cases.)
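To see the difference between the numeric and the string operators, here is a minimal sketch that runs both kinds on the same data. The variable names are mine, picked for the example.

```perl
use strict;
use warnings;

my $num_equal = ("1.0" == "1")       ? 1 : 0;   # numeric: 1.0 equals 1
my $str_equal = ("1.0" eq "1")       ? 1 : 0;   # string: the characters differ
my $num_less  = (10 < 9)             ? 1 : 0;   # numeric: 10 is not less than 9
my $str_less  = ("10" lt "9")        ? 1 : 0;   # string: "1" sorts before "9"
my $case_less = ("Zebra" lt "apple") ? 1 : 0;   # ASCII: capitals sort before lowercase

print "1.0 == 1      : $num_equal\n";
print "1.0 eq 1      : $str_equal\n";
print "10 <  9       : $num_less\n";
print "10 lt 9       : $str_less\n";
print "Zebra lt apple: $case_less\n";
```

The last two lines are the "issues with non-letters and mixed cases": lt and friends compare character by character, so they are not a safe way to compare numbers.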
As you already know, you'll frequently want to have one
condition and another condition met, or possibly one condition or
another condition. So you use Perl's "and" and "or" operators: &&
and ||. I suggest a liberal use of parentheses and
white-space to make sure that you get the conditional that you
want when you use compound conditions:
if(($a == 2) ||
   (($a == 1) && ($b == 2))
  )
{
    # Do stuff.
}
And sometimes it's easier to write a condition in the negative.
Preceding such a statement with a ! means "not". So
$b != 5 and !($b == 5) mean the same
thing.
By the way, a little secret to pass along. If you have a compound condition with an or in it and the first condition is true, Perl never even looks at the second condition. Conversely, if you have an and in there and the first condition is false, then the second condition is also not checked. (You can see why, if you think about it.) This means that you can avoid potentially time-wasting function calls or embarrassing crashes by putting the offending code in the second condition.
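Here is a small sketch of that short-circuit behavior. The function expensive_check() is a made-up stand-in for something slow or risky, and $calls just counts how often it actually runs.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical stand-in for a slow or risky function.
sub expensive_check {
    $calls++;
    return 1;
}

my $denominator   = 0;
my $already_known = 1;

# The division is never attempted because the first test is false.
my $safe = ($denominator != 0 && 10 / $denominator > 1) ? "big" : "skipped";

# expensive_check() is never called because the first test is already true.
my $quick = ($already_known || expensive_check()) ? "true" : "false";

print "$safe, $quick, expensive_check ran $calls time(s)\n";
```

Swap the two halves of the first condition and the script dies with a division-by-zero error, which is exactly the crash the ordering avoids.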
Besides the ability to make decisions, computers are very good
at repetitive tasks. In fact, they seem to like them. So it's
lucky for everyone that Perl has a for-loop. The
syntax goes like this: for(initialize counter; while
condition; stepping condition). The initialization means
you set some initial value to the counter, like
$counter=0. The while condition is the condition
under which Perl should run the loop again (like
$counter<5). The stepping condition says what Perl
should do with the counter at each step ($counter++
is popular). So we'd have something like
for(my $counter=0; $counter<5; $counter++)
{
    # Some mindless, repetitive tasks here
}
What will this do? This will repeat the loop 5 times. (For
$counter = 0, 1, 2, 3, and 4.) Make sure that you
note the semi-colons, by the way. They're important. (Note for
the nitpicky: I used my in the initialization. This
means that $counter is only valid inside the loop.
Or, at least, this incarnation of the variable is. If you want to
use the counter after the loop, declare it before the loop rather
than in the initialization. In general,
however, you won't care so my is recommended.)
Oh, a little trivia for anyone interested: you technically
don't need to fill in any of the for fields. If you
leave them empty (you still need two semicolons!), you'd better
have initialized the counter variable, set up an ending condition
inside the loop, and/or done something to change the counter. For
example, if you put for(;;), the loop will run
infinitely unless you have thought to set up a way to break
out.
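Both flavors can be sketched in a few lines. The first loop is the ordinary counted kind; the second leaves all three fields empty and relies on last (Perl's command for leaving a loop) to get out. The variable names are mine.

```perl
use strict;
use warnings;

# Ordinary counted loop: runs for $counter = 0, 1, 2, 3, 4.
my $sum = 0;
for (my $counter = 0; $counter < 5; $counter++) {
    $sum += $counter;
}
print "Sum of 0..4 is $sum\n";

# All three fields empty: the loop has to arrange its own exit.
my $n = 1;
for (;;) {
    $n *= 2;
    last if $n > 100;   # without this line the loop never ends
}
print "First power of 2 above 100 is $n\n";
```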
This is sort of a subset of for(), so I'll put it
here. There is another for-loop structure in Perl called
foreach. It's used like this: foreach $item
(@array). Perl's response to this is to go through each
item in @array in order and make it available as
$item. If you're churning through an array, this is
quite handy. (Note, however, that $item needs to be
declared, either before the loop or right in the statement with
foreach my $item (@array), and you need a
pre-declared array, naturally. Note also that $item is an alias for
the array element rather than a copy, so assigning to $item changes
the array.) It is worth remembering, however,
that this is much the same as
for(my $ctr=0;$ctr<=$#array;$ctr++)
{
$item = $array[$ctr];
.
.
.
}
(Question: why did I not use
my on $item?) So foreach
is redundant, but you can see why it is a bit nicer to work with
under many conditions.
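For comparison, here is a sketch that walks the same array both ways. The array @prices and the totals are made-up names. In current Perl 5, at least, the loop variable can be declared with my right in the foreach statement, and inside the loop it stands for the array element itself, which the last loop demonstrates.

```perl
use strict;
use warnings;

my @prices = (1.5, 20, 3.25);

# Index-based loop, in the style of the for() version.
my $total_by_index = 0;
for (my $ctr = 0; $ctr <= $#prices; $ctr++) {
    $total_by_index += $prices[$ctr];
}

# The same walk with foreach.
my $total_by_foreach = 0;
foreach my $item (@prices) {
    $total_by_foreach += $item;
}

# $item is an alias here: changing it changes the array element.
foreach my $item (@prices) {
    $item *= 2;
}

print "$total_by_index $total_by_foreach @prices\n";
```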
Technically, for and while are also
redundant, although a bit more subtly than with
foreach. In some sense, while is the
more basic type of loop, but I think that for is a
bit easier to understand. while looks like this:
while(condition)
{
    # Code goes here.
}
What this does is run the code inside the braces until the condition becomes false. If the condition starts false, the code in the braces never executes even once.
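A minimal sketch of both behaviors, with made-up variable names: the first loop runs until its condition turns false, and the second has a condition that is false from the start, so its body never runs.

```perl
use strict;
use warnings;

# Halve a number until it drops below 1, counting the steps.
my $value = 40;
my $steps = 0;
while ($value >= 1) {
    $value /= 2;
    $steps++;
}
print "Took $steps halvings to reach $value\n";

# The condition is false before the first pass, so the body is skipped.
my $never = 0;
while ($value > 100) {
    $never++;
}
print "Second loop body ran $never times\n";
```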
Knowing about the two big sorts of loops (for- and while-loops) is all well and good, but you'll eventually want more. You'll want to be able to break out of a loop earlier than you had intended, for example. There are three commands that help you in this regard.
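last: exits the loop immediately. Nothing more in the block is run
and the loop is over.
next: stops the current iteration of the loop and goes on to the
next one, incrementing counters and checking ending conditions as usual.
redo: like next, it stops the current iteration of
the loop. But this time, it hops up to the start of the block,
without incrementing counters or checking ending
conditions.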
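The three loop-control commands in Perl are last, next, and redo. Here is a small sketch that uses all of them in one loop; @kept and $retries are names invented for the example, and the "retry twice when we hit 4" rule is arbitrary.

```perl
use strict;
use warnings;

my @kept;
my $retries = 0;

foreach my $n (1 .. 10) {
    next if $n % 2;        # odd number: skip ahead to the next value
    last if $n > 8;        # past 8: leave the loop entirely
    if ($n == 4 && $retries < 2) {
        $retries++;
        redo;              # run this iteration again; $n is still 4
    }
    push @kept, $n;
}

print "Kept @kept after $retries retries\n";
```

Note that redo needs some way to stop redoing (here, the $retries counter), or the loop will sit on the same value forever.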
And now for the moment you've all been waiting for: how to create a function. (OK, it's not that amazing. Actually, people seem to write a rather small number of functions in Perl.) Functions are handy in programming for a lot of reasons. One is because it saves you time: if there's a task that gets done a lot, making a function saves you having to write the same code over and over. It also saves you time debugging and modifying the code. Functions are also handy for writing clear, less buggy code: the more you break your tasks into functions, the easier the code tends to be to read and the more likely it is to be written correctly. (Although, as I said, Perl programs don't seem to be quite as function-oriented as in other languages, I've observed. This may, however, be my biased sample. In any case, this doesn't stop you from going to town with functions.)
Getting on with how to create a function, let's look at how to
declare a function. It's quite simple, since there is no
prototyping and there is nothing to say how many arguments the
function has to take. (The code for the function might have
issues if too few arguments are passed, of course, but that's a
problem for you to deal with.) I put my functions at the end of
my code, but apparently many folks actually put theirs at the
beginning. So whatever turns you on. You denote a function with
sub (short for subroutine, of course):
sub my_function
{
    # All kinds of functional fun goes in here.
}
Again, the braces are required so that Perl knows where the function stops. And you can give the function any name that would be valid as a variable name.
One of the most important things to do with functions is to
pass them arguments so that they can do stuff with them. We have
already seen how to invoke a function with arguments, but here's
how to handle them. There is a built-in array in every function
called @_. This array contains the arguments used
when the function was called. So $_[0] is the first
argument, and so forth. (As a rule, I'd pick these off into
better-named variables in my functions right away when the
function was called. But that's a matter of style, really.) So
we could call a function with &my_function(1, 5, "Carl
Sagan"). The code for the function might be something
like:
sub my_function
{
    my $iterations = $_[0];
    my $prime_number = $_[1];
    my $astronomer_name = $_[2];
    # Some bizarre code to do things with these parameters.
}
As a matter of policy, it is wise to check how many parameters were passed to the function. (I'll confess that I don't do this all that often, myself. I should, though.) If too few arguments are passed, abort the execution with some kind of message. (We'll meet some ways of exiting functions in a moment. As far as programs go, we'll see that in two weeks.) Actually, it's wise to even check the nature of the passed arguments. (Are they strings? Numbers? Arrays? Etc.)
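One way such checks might look is sketched below. It uses die to abort with a message and eval to trap the failure so the demonstration can carry on; the particular checks, the messages, and the returned string are my own choices, and the pattern test on $iterations uses a regular expression, which is only one of several ways to test for a number.

```perl
use strict;
use warnings;

# Hypothetical function, named after the one in the text.
sub my_function {
    # Refuse to run with too few arguments.
    die "my_function needs 3 arguments, got " . scalar(@_) . "\n" if @_ < 3;

    my ($iterations, $prime_number, $astronomer_name) = @_;

    # Crude check that the first argument looks like a whole number.
    die "iterations must be a whole number\n" unless $iterations =~ /^\d+$/;

    return "$astronomer_name x $iterations (prime: $prime_number)";
}

my $ok = my_function(1, 5, "Carl Sagan");
print "$ok\n";

# eval traps the die so the script can report the problem and carry on.
my $result = eval { my_function(1, 5) };
my $error  = $@;
print "Call failed: $error" unless defined $result;
```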
Functions often return values (sometimes the value is just 1 or
0 to tell if it executed successfully). That was a general
statement about programming. Perl in particular always
returns values from functions. The values might be an empty list
or undef, Perl's internal code for "This variable
isn't really defined," but there is a return value. Bear this in
mind when working with functions! (In general, you may want to
make sure that the function returns something meaningful, if only
the simple "Yep, I did my job successfully" value of 1. Down the
road you might find it helpful to know that your function did its
thing.)
There are two ways to do this in Perl. The first way is what I call the sloppy (aka, "stupid") way: the result of the last operation you perform in a function will automatically be returned. This is just a bad way to go, though, for reasons I think you can work out. Also note that the "last operation" can be essentially any line of Perl (except things like braces), so even a print statement (which technically returns a value of 1 if it successfully does its job) could be that line.
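Here is a sketch of how the implicit return can bite. Both functions are made up for the example; the only difference is that the second one ends with a print, so the print's result is what comes back.

```perl
use strict;
use warnings;

# Hypothetical functions for illustration.
sub quiet_sum {
    my ($x, $y) = @_;
    $x + $y;                     # last thing evaluated, so this is returned
}

sub noisy_sum {
    my ($x, $y) = @_;
    my $sum = $x + $y;
    print "The sum is $sum\n";   # now print's result is what gets returned
}

my $quiet = quiet_sum(2, 3);
my $noisy = noisy_sum(2, 3);
print "quiet_sum gave $quiet, noisy_sum gave $noisy\n";
```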
The better way to return values is with the creatively-named
return() function. It's easy to use since it just
returns the value in the parentheses and exits the function. (So
any code that comes later won't be executed!) This is definitely
the way to go, in my view. For one thing, you can always have
multiple return statements in a function. (You might have some
if-statements and inside each you return a different
value, for example.) For another, it's always really easy to spot
where the returning is being done.
Oh, if you use return() with no parameter, you get back
undef if the caller wanted a scalar, or an empty list if the caller wanted a list.
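A short sketch of explicit returns, with made-up function names: sign_of() has several return statements, and nothing() uses a bare return so you can see what comes back in scalar and in list context.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a number with explicit returns.
sub sign_of {
    my ($n) = @_;
    return "negative" if $n < 0;
    return "zero"     if $n == 0;
    return "positive";
}

# A bare return: undef in scalar context, an empty list in list context.
sub nothing {
    return;
}

my $scalar_result = nothing();
my @list_result   = nothing();

print sign_of(-3), " ", sign_of(0), " ", sign_of(7), "\n";
print "scalar result is undef\n" unless defined $scalar_result;
print "list result has ", scalar(@list_result), " elements\n";
```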
It's obvious that programs want input and output quite a bit. Otherwise, what's the point? We'll start with output to screen and work our way up from there.
You've already seen the print() function. This
prints to the screen by default, as you've already learned. I
should add that you can also print arrays as well as scalars.
print @my_array; will print out all of the values
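stored in @my_array, run together with nothing between them. This can be handy. It is
much less useful on hashes, however: the keys and values come out run together in no particular order.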
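A few ways of printing a whole array are sketched below, along with one common way to print a hash readably (walking its sorted keys). The array and hash contents are invented for the example.

```perl
use strict;
use warnings;

my @my_array = ("red", "green", "blue");

print @my_array;                      # values run together with no separator
print "\n";

my $spaced = "@my_array";             # in double quotes, elements get a space between them
print "$spaced\n";

my $listed = join(", ", @my_array);   # join lets you pick the separator
print "$listed\n";

# One way to print a hash readably: walk its keys in sorted order.
my %age = (alice => 30, bob => 25);
foreach my $name (sort keys %age) {
    print "$name is $age{$name}\n";
}
```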
Another way to print is with printf(). The syntax
here is printf(format_string, variables).
format_string is a string that contains formatting
information (and also general text stuff, if you wish). It's
followed by the variables referenced in the format string (as many
as you referenced). Wait, what variables? Ah, that's the
beauty of the format string. While it's easy to pop variables
into a string for printing with print() (just use a
double-quoted string), you can't generally control the format.
Enter the format strings of printf().
The real key to format strings is the variable place-holders. They all start with a percent sign (%) and end with a letter: %s, %d, %f, %e, %E, %g, etc. There might be other stuff in between, as we will see in a moment. The letters indicate different variable types: string, integer, float, two kinds of scientific notation (one with a lowercase "e" and one with an uppercase "E"), and "let Perl guess what numeric syntax works best." Already you can see a value here: you can force the numbers into whatever format you need.
Even better, if you insert extra (optional) codes between the
percent and the identifier letter, you can control how the
variables are displayed. Just a number sets the minimum width of the
output. A 4 will make it at least 4 places long, padded on the left
with spaces. (This count includes the decimal point!) If you want to control the number of
places after the decimal, use a decimal point followed by the
number of places. (So %.2f would be good for
American currency, for example.) Prefacing the width with
a 0 pads with zeros instead of spaces, so that short numbers still
fill the whole width. (Either way, the width is only a minimum: a
number too long to fit is printed in full, not cut off.) So I
might use printf("My sister is class of '%02d.\n",
$sister_graduation_year); to get Perl to print "My sister
is class of '02." when $sister_graduation_year is 2. (With %2d in the format string instead, Perl would
print, "My sister is class of ' 2." Which looks funny to say the
least.)
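The formatting codes above can be tried out with a few printf() calls. This is only a sketch: the values are made up, and $sister_graduation_year is set to 2 on the assumption that this is what the "class of '02" example has in mind.

```perl
use strict;
use warnings;

my $sister_graduation_year = 2;   # assumed value, chosen to match the text
my $price = 4.5;
my $name  = "Carl Sagan";

printf("%s owes \$%.2f\n", $name, $price);    # two places after the decimal
printf("[%4d]\n", 42);                        # at least 4 wide, padded with spaces
printf("[%04d]\n", 42);                       # at least 4 wide, padded with zeros
printf("[%4d]\n", 123456);                    # too wide for 4: printed in full
printf("%e or %g\n", 12345.678, 12345.678);   # scientific notation vs. Perl's pick
printf("My sister is class of '%02d.\n", $sister_graduation_year);
```

The square brackets are only there so the padding is visible on screen.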
One final note. Since we've become wise in the ways of
printf(), it behooves me to point out that there is
another function, sprintf(), that works precisely the
same way. This function (which you can think of as "string print
formatted") does what printf() does, but puts the
result in a string which you can assign rather than print it to
screen.
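A minimal sketch of sprintf() building strings for later use; the variable names and values are invented for the example. The %-8s code uses a minus sign to pad on the right instead of the left, which is handy for lining up columns of text.

```perl
use strict;
use warnings;

my $price  = sprintf("%.2f", 19.999);                 # rounds to two places
my $id     = sprintf("%05d", 42);                     # zero-padded to 5 digits
my $column = sprintf("%-8s|%8s|", "left", "right");   # left- and right-justified

# The results are ordinary strings, so they can be stored, compared, or printed later.
print "Price: $price\n";
print "ID:    $id\n";
print "$column\n";
```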
(There are actually a number of places where these format strings are handy in I/O in Perl. And, for that matter, they're very handy in C. So they're probably worth your time to pick up, if only up to a conversant level.)