Introductory Perl

What is Perl, Anyway?

The short version is that Perl is a C-like programming language. Anyone who knows C will be at home with most of the syntax in Perl. It's also more or less a scripting language, so you don't compile it into binaries on your drive but rather you run Perl on the program. (Effectively. Later, we'll see that you can set things up so that you can just type the program's name at the command prompt.)

What is Perl good for? A lot of things, but Perl is especially good at doing all kinds of fun things to text. It was originally written by Larry Wall (remember that name, it gets used a lot) for going into text files and doing various things to them. (I don't think I need to tell you how useful that capability is.) Anyone who has used C or FORTRAN to attempt this will know that while anything is possible with these languages, a lot of things are horrifically painful and borderline emotionally scarring.

Perl is also fantastic at scripting in any number of operating systems. Not surprisingly, UNIX and Perl play extremely well together. (It's to the point now where I'm told that people are discouraging use of sed and awk in favor of just writing Perl routines. They're more flexible and easier to understand.)

Perl is also nice because it's fast to write. It's not as fast at execution as, say, C, but it's a lot easier to bang out a script thanks to a variety of built-in functions, pre-written modules (written by many Perl users around the globe, free for your enjoyment), and generally less strict coding requirements. (This has a down side in that it can allow for some slightly sloppy programming, but it's not too bad.)

In short, Perl is what I think of as my "Swiss-Army Programming Language". Like your basic "Swiss-Army Knife", it's a handy tool to have in your pocket, to be pulled out at a moment's notice and used. If you had a lot of time and resources, it would seldom be the best solution (if you could actually spend the time to write a program in C, it would probably run faster, for example), but you generally don't need execution speed, just writing speed.

Oh, here are some tasks for which Perl is well-suited:

Oh, Perl doesn't really stand for anything. Sure, you'll hear stuff. But Larry Wall didn't make it stand for anything when he created it. He's since declared it to stand for "Practical Extraction and Report Language", but that's a retrofit. So it's not in all capitals and I don't think of it as a proper acronym.

Basics of Perl Programs

If you already know C or a C-like (syntax-wise) language, you're a good part of the way to having learned Perl. If you've never used such a language, Perl is still pretty easy to pick up. And once you get the hang of it, you'll be positioned to learn a lot of other handy languages quickly.

For this workshop, I'll assume that you're working in a UNIX environment, although Perl runs on most every OS imaginable. There will be minor differences in the coding (directory paths may look different, for example), but nothing major. Also, note that there are Perl programming environments, but I'll be assuming that you're just using your text editor of choice. (I'm a big Xemacs fan, especially loaded with the various packages, including the Perl mode.)

How To Create a Program

To create a Perl program, you'll need to open a new file in which to place the code. (Yes, there is a way to run Perl as a command-line thing, but you'll probably never have call to use it that way.) The usual extension for a Perl program is .pl, by the way. You don't need to use this, however. (If you want to make the code executable, you'll generally leave the extension off altogether. Not that it matters as far as the OS is concerned, but it does make the program more like a command.)

A good way to start your program is with #!/path/to/perl/command/perl -w. When you have #! as the first two characters of the first line, the OS knows to execute the program with the command that follows. Give the full path to the Perl binary on your system rather than just the perl command. (Many UNIX systems have it at /usr/bin/perl, although many also have a second Perl at /usr/local/bin/perl. For some reason, the latter seems to be a better option as it gets updated more often/rapidly. Why? I don't know.) The -w is one of oh-so-many flags the perl command will take. In this case, it says to turn on full warnings. (Before Perl executes a script, it actually compiles it into memory. While it does that, it can spot various kinds of programming errors. -w says to be more verbose about these to help you debug. You want this on most of the time, trust me.)
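As a rough illustration of that first line in context, here is a minimal sketch of a complete program. The file name hello.pl and the path /usr/bin/perl are assumptions; substitute whatever matches your system.

```perl
#!/usr/bin/perl -w
# hello.pl: a hypothetical first program.
# The path on the first line is an assumption; use the location of perl
# on your own system.

my $greeting = "Hello from Perl!\n";   # a string that ends with a newline
print $greeting;                       # write it to the screen
```

Saved as hello.pl, this can be run by handing the file to Perl: perl hello.pl.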

Syntax Basics

Before diving in very far, we should look at the basic syntax of the Perl language.

Statements

A Perl statement can occur on more than one line in your code and it can be longer than the page width in your editor. (This latter is a bad idea for the most part, since it makes things difficult to read. But there are times when it's no big deal, especially when the statement is a tiny bit more than the line-width.) How do you continue over a line break? Basically, you don't. Perl doesn't know that it's reached the end of a statement until it sees a semi-colon (;). So if you hit return a half a dozen times in the middle of a calculation then finish things off and add a semi-colon, Perl will smile and deal with it. So none of this $-type silliness to continue statements on more than one line.
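A small sketch of the point about line breaks, using a made-up variable $total: the assignment below is a single statement even though it spans three lines.

```perl
#!/usr/bin/perl -w
use strict;

# One statement spread over three lines; only the semi-colon ends it.
my $total = 1
          + 2
          + 3;

print "$total\n";   # prints 6
```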

Incidentally, forgetting your semi-colon is probably the most common beginning mistake. No shame in doing it, we all did and continue to do it periodically. (Although it comes a lot more naturally with time. You'll find that your thoughts naturally include the semi-colon when you finish a statement. It's like a period in written English, only books don't crash when you neglect one of those.) The semi-colon is also the statement terminator in C/Java/Javascript/etc, so it's probably a convention worth getting used to.

Whitespace

Whitespace is a term used to refer to any spaces, tabs, or returns in code. Perl, like its brethren, doesn't notice the stuff very often. (There are a few times when it matters, but they're sort of obvious. Like inside of strings and after a variable name but before a function name.) So you can tab indent, leave blank lines, wrap lines and align calculations neatly, and generally go to town with your code. The whole purpose of this is so that you'll make your program look pretty and readable, by the way.

That gentle introduction made, I'll now turn a bit nastier: tab-indent your code and leave blank lines where appropriate! It makes it a lot easier to read. Many editors know how to tab-indent code and will automatically choose the right indentation based on your current nesting of statements. Snazzy. And for blank lines, I typically group statements that are part of the same task together (no blanks between them), leave one blank line between related tasks, and progressively more between more distantly related tasks. (Of course, comments also slip their ways in between these in many cases. But you see what I mean.)

Braces

A fundamental part of programming in Perl (and C, and... yeah.) is braces: {}. You can group any statements together with them, although typically you use them to group code to be executed (or not) based on control statements (if's, for's, etc). There are lots of styles on how to use braces. A common one seems to be to start the brace after the control statement (if(condition){...), but I usually go down to the next line and put the brace more or less under the control statement. This allows me to align the opening and closing braces nicely. But whatever, it's a matter of personal style, not absolute syntax.

Comments

The # in our suggested first line is our comment character. It makes everything on that line after the # into a comment. (Rather like the // in C++ and the ; in IDL.) Use this character often or be doomed to be forever trying to figure out what you did a week ago. Comments can also be used to deactivate lines of code while debugging.

Functions

Functions in Perl look like function_name(arguments, go, here). (Technical note: user-written functions are also sometimes called "subroutines" in Perl. I think that this is silly nomenclature, so let us never speak of it again.) Arguments go inside the parentheses and are separated by commas. If you've defined a function yourself, you should also put a & in front of the function name (&function_name()) to make it clear to Perl that this is a function. Functions can either modify variables passed as arguments or return values (or both). You can return any data type you want, by the way.

(Oh, here's a little secret. The parentheses aren't strictly necessary. For example, the print function in Perl is print("string goes in here"), but you'll often see it as print "string goes here". I'll admit that for the print command, I generally leave off the parentheses. However, this is, in my view, bad form in general. It's better to have the parentheses there and to make it really clear to both humans and Perl what you meant.)

We'll learn how to actually write functions in a little while, but now you know one when you see one.

Data Types

Now we get to the real meat of coding: variables. Perl has a few basic data types. None of them are declared, as such, by the way. This leads to faster coding, but it also means that more of the burden to be a good coder is on you.

By the way, this is a good time to introduce the my function. This is how you declare a variable (array, scalar, string, hash, whatever) to have a certain scope. A scope refers to where the variable is valid. For example, I can declare a variable to have a scope limited to a specific function, the whole program, or even inside of a for-loop. It's a good programming practice to declare variables to have the minimum scope necessary to do their jobs. The way you do this in Perl is, at the first usage, applying my: my $variable = 5;. Actually, you don't even need to do anything to the variable: my $variable; is also valid. The variable will now have a scope limited to the containing block of code and all of its sub-blocks. So if you declared this at the start of the program, it'll have a scope of the entire program. If you did it inside a for-loop, it's just the for-loop (and when that is done, the variable goes away).
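To make the idea of scope a bit more concrete, here is a minimal sketch with made-up variable names ($outer, $inner, $seen_inside). It uses a bare pair of braces to create a block; the use strict line is an addition of mine that makes Perl complain about undeclared variables.

```perl
#!/usr/bin/perl -w
use strict;

my $outer = 5;        # declared at the top level: valid to the end of the file
my $seen_inside;      # declared, but nothing assigned yet

{
    my $inner = 10;                   # valid only inside these braces
    $seen_inside = $outer + $inner;   # both variables are in scope here
}

# $inner no longer exists at this point. With "use strict" on, mentioning
# it here would stop the program at compile time.
print "$seen_inside\n";   # prints 15
```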

Oh, variable names. Variable names can contain letters, numbers, and underscores (_). Different types of variables will begin with special symbols, but you'll see that below. The first character after the special symbol needs to be a letter or an underscore (for variables you are creating; built-in variables can start with other symbols). And note that Perl is case-sensitive, so $a is a different variable from $A.

Scalars

Scalars are any kind of data that comes in single units. They are prefaced by a $ sign. (So a variable will look like $time or $first_name.) There are two types of scalars, numbers and strings.

Numbers

Ye olde number data type. Perl is kind of like us, it only understands "number". It doesn't get the idea that there are floats and doubles and integers and... well, you know. Just numbers. (Really, this is kind of a good thing. Perl thinks like you do about numbers. As far as I know, people don't have floats and ints and stuff in their heads, after all.) Scalars all start with a dollar sign ($) as their special symbol. (Think "s" for scalar. Sort of.)

To declare a scalar, just do what I outlined above:

      
	my $example_scalar = 5;
      
    

Or perhaps:

      
	my $example_scalar;
	...
	$example_scalar = 5;
	
    

There is a bit more that could be said about numbers, like how to write hexadecimal numbers, but I'll skip that. Ask if you want to know or pick up a handy Perl book.

Mathematical Operators

This will be fast as you know the basics. Your operators are what you expect: +, -, *, and /. Use parentheses to delimit subexpressions and the normal order of operations applies. Oh, modulo is % (like in C, of course) and exponentiation is done the FORTRAN way, ** (compared to ^ in IDL). Of course, there are also other functions like cos() running around. But Perl isn't IDL: it isn't mainly a language concerned with crunching numbers. Don't expect to see fanciness unless you write it or find someone else who did.
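Here is a quick sketch of those operators at work. The variable names are made up for the illustration.

```perl
#!/usr/bin/perl -w
use strict;

my $sum        = 7 + 2;         # 9
my $difference = 7 - 2;         # 5
my $product    = 7 * 2;         # 14
my $quotient   = 7 / 2;         # 3.5 (no integer truncation, just "number")
my $remainder  = 7 % 2;         # 1, modulo as in C
my $power      = 2 ** 10;       # 1024, exponentiation the FORTRAN way
my $grouped    = (1 + 2) * 3;   # 9, parentheses are evaluated first
my $ungrouped  = 1 + 2 * 3;     # 7, multiplication binds tighter than addition

print "$sum $difference $product $quotient $remainder $power\n";
print "$grouped $ungrouped\n";
```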

While I'm mentioning it, here's a helpful hint for the non-C crowd. If you want to add 5 to a variable, say $ctr, there's the obvious way: $ctr=$ctr+5. But that's sort of annoying. If only there were a faster way... Of course, I'm not one to taunt you and there is: +=. This is actually a form of an assignment operator that says, "Take the variable on the left side and add the number on the right side. Store the result back in the variable at left." (It's harder to say than it is to understand, actually. Who knew?) So our new code would be $ctr += 5. It even works when the number isn't 5. See how cool Perl is? You also can do this with -=, *=, /=, and %=. (No points awarded for figuring out what each does.)

But it gets even better! You know how you always seem to be adding or subtracting 1 to things? Perl can do that as a shortcut, too: ++ and --. (There could be a +- or a -+, but that would be silly since it wouldn't actually do anything noticeable. So pretend that I never said anything, OK?) So to add one to $ctr I write $ctr++;. It's shorter to write and it feels more natural to think. Take that, IDL!
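A short sketch that walks one counter through the shortcut operators, with the value after each step noted in the comments. The starting value of 10 is arbitrary.

```perl
#!/usr/bin/perl -w
use strict;

my $ctr = 10;

$ctr += 5;   # 15: same as $ctr = $ctr + 5
$ctr -= 3;   # 12
$ctr *= 2;   # 24
$ctr /= 4;   # 6
$ctr %= 4;   # 2: remainder of 6 divided by 4
$ctr++;      # 3: add one
$ctr--;      # 2: subtract one

print "$ctr\n";   # prints 2
```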

(This is a massive aside, but I'm putting it here because footnotes don't work well on the Web. It concerns where you put ++ and --. (I'll just talk about the former, but everything is the same for both.) You can actually put it either before or after the variable in question. The only difference is that when it's before, Perl will add one to your variable before using it in the rest of the expression it is currently looking at while if it is after the variable, Perl increments the variable after doing everything else. For example, consider:

      
	my $a1 = 1;
	my $a2 = 1;
	my $b1 = ++$a1;
	my $b2 = $a2++;
      
    

In this case, $b1 ends up holding 2 and $b2 ends up with 1. Why? Because Perl increased $a1 before assigning the value to $b1, but $a2 didn't get incremented until after the assignment happened. Both $a1 and $a2 equal 2 at the end of that code, by the way.

I always use "postincrement" format. But I also always treat the increment as a separate statement, so it doesn't matter. Personally, I think trying to do more than one thing (say, assign more than one value) in a given expression leads to seven kinds of badness down the road and should be avoided. It's generally worth it to assign more variables, even temporary ones, than to try to cram too much into one expression.)

Strings

Strings are a series of characters, like "Abraham Lincoln", "M", "", or "1234". (Note that the last of these is a number, but it can also be a string. It depends on how I used it. The second to last of these is an empty string. It's a defined variable, there's just nothing in it. This is generally handy as either a place-holder variable (we might not have filled it yet, but we plan to later) or as a way of noting that some bit of data we might have expected to read in wasn't there.) Strings also start with a dollar sign. (Um, "s" for string? Basically, single-element data start with dollar signs.) We'll see two types of data that don't in a second.

Now, there are a few ways to declare a string. First, there are single and double quotes:

      
	my $string_a = 'Four score and seven years ago';
	my $string_b = "Four score and seven years ago";
      
    

Now both strings are identical. So why two delimiting symbols? They are a bit different. The single quotes basically take anything inside them literally as you type it. The only exceptions are the single-quote itself (if you need a single quote, escape it with a backslash, like \') and then the backslash (to get the backslash, \\). And when I say everything is taken literally, I mean things like tabs and returns, too. I don't use this one much, by the way. I tend to think in terms of the double quotes.

Double quotes interpolate stuff inside them. First of all, there are a lot of escape sequences. For example, \n codes a newline character. (Depending on your system, this could also be a \r. It's a Windows-related oddity, mainly.) \t is a tab. There are a lot of special characters, too, like the dollar sign. If you want any of these to really be in your string, escape them with a backslash (so \$). Why the special characters? Because another thing that happens inside of double quotes is that Perl will replace variable names with what they contain. For example, if I continue the code above with:

      
	my $string_c = "String B: $string_b\n";
	print $string_c;
      
    

I'll get String B: Four score and seven years ago with the next thing printed starting on a new line. You see, Perl just stuck the value of $string_b in where its name appeared. (And now you see why the $ has to be escaped.)
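To put the two kinds of quotes side by side, here is a minimal sketch. The variable names and the sample text are made up.

```perl
#!/usr/bin/perl -w
use strict;

my $name = "Larry";

my $single  = 'Hello, $name\n';   # taken literally: a dollar sign, a backslash, an n
my $double  = "Hello, $name\n";   # $name is replaced and \n becomes a newline
my $escaped = "It costs \$5";     # the backslash keeps the dollar sign as plain text

print $single, "\n";    # prints Hello, $name\n
print $double;          # prints Hello, Larry
print "$escaped\n";     # prints It costs $5
```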

Useful String Functions

There are a few handy string functions that you need to meet quickly. It won't take but a minute. (There will be much more string related goodness next time, but that will keep.)

First, there's adding two strings together: you use the dot (.) operator. (It's just like a + in IDL.) So the following code

      
	my $stringA = "It was the best of times";
	my $stringB = "it was the worst of times";

	print $stringA . ", " . $stringB . "\n";
	
      
    

prints out It was the best of times, it was the worst of times. (With a line break afterward. When printing out results, it's usually a good idea to remember the line break. Otherwise your output looks funky, most of the time.)

By the way, that last statement was the same as print "$stringA, $stringB\n";. You can concatenate with the double-quotes if you so desire. (I often do, if only because I often find myself wanting a delimiter anyway.)

Also by the way, there is a .= operator out there. Guess what it does? Yep, it takes the string on the right and adds it to the end of the string on the left. You can see how this might be handy, I'm sure.
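A minimal sketch of .= building a string up a piece at a time. The variable name $line is made up.

```perl
#!/usr/bin/perl -w
use strict;

my $line = "It was the best of times";

$line .= ", ";                          # tack a delimiter onto the end
$line .= "it was the worst of times";   # then the second half

print "$line\n";   # prints It was the best of times, it was the worst of times
```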

Another handy function is split(/delimiter/, $string). Here delimiter indicates a delimiter (where you want to split up the string) and $string is the string. So, for example, you could split up a string of items that were separated by commas or tabs so that each item is separate and easier to deal with. Note that this function returns an array, but to see what exactly that means (although I'll bet you've already guessed) you'll have to wait while I make one more point.
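A small sketch of split() on a comma-separated string. The sample data and the names $record and @fields are made up, and the array syntax used to hold the result is the one described in the Arrays section.

```perl
#!/usr/bin/perl -w
use strict;

my $record = "Jan,Feb,Mar";

# Break the string apart wherever a comma appears.
my @fields = split(/,/, $record);

print "$fields[0]\n";   # prints Jan
print "$fields[2]\n";   # prints Mar
```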

Strings versus Numbers

How does Perl know a string from a number? Well, if you have anything except digits and at most one decimal point, it's definitely a string. (Easy enough.) If not, it depends on how it was declared or last assigned. (That is, when was the variable last on the left side of an equal sign?) If you declare a variable as $scalar_number=12.3 you'll have a number. If you do it like $string_number="12.3", it'll be a string. But in these cases, the definition is a bit fuzzy. If I try to do math with $string_number, Perl will turn it into a number for me. Conversely, if I try to do string things on $scalar_number, Perl will make a string out of it first. (However, in general I find it good form to wrap the variable in double quotes when using it like a string. This, of course, makes a string that contains the number in question.)
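Here is a rough sketch of that back-and-forth conversion, reusing the two variable names from the text. The other names ($sum, $label, $joined) are made up.

```perl
#!/usr/bin/perl -w
use strict;

my $string_number = "12.3";   # assigned as a string
my $scalar_number = 12.3;     # assigned as a number

my $sum    = $string_number + 1;               # math: the string is used as 12.3
my $label  = "$scalar_number apples";          # quotes: the number becomes text
my $joined = $scalar_number . $string_number;  # the dot treats both as strings

print "$sum\n";      # prints 13.3
print "$label\n";    # prints 12.3 apples
print "$joined\n";   # prints 12.312.3
```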

Arrays

Now the fun really begins. Arrays are what you think they are (although you might also know them as "vectors" or even "lists" in other languages): a list of scalars indexed by number. The special symbol for an array is the at sign (@). So an array name might be @month_array. But here's where it gets a wee bit tricky. If you select a single element of the array to do something, you use the $ again. Why? Because the single element is a scalar. It's only when you are treating the entire array that you use the @. If this sounds a bit confusing, that's OK. It might take a little while to get the feel for when you use the @ and when you use the $. But with a little practice, you'll develop an intuitive sense of it.

Of course, this raises the question, "How do you refer to a single element, anyway?" The answer is by using the array name (remember, with the $) and then the index in square brackets. Nota bene: Perl is indexed starting at 0, not 1 like some languages. So the first element of an array is $month_array[0] and the eleventh is $month_array[10]. You can also use negative numbers. These count from the end of the array: $array_name[-1] is the last element, and so forth.

There are multiple ways to create an array. One is to assign values to each element, one at a time ($month_array[$i] = $new_value;). This can be the way to go if you're looping through the array filling it slowly. But if you already know all of the values, it's kind of annoying to type. Luckily, there is a faster way. You can provide a list, delimited with commas and starting/ending with parentheses and assign it straight to the array: @month_array = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);. That's a time-saver for you.

Another time-saver is interpolation. Perl can sometimes guess what you want in an array and fill in the middle bits for you. Interpolation is done with the .. operator, also known as the range operator (almost like an ellipsis, but with one fewer dot, for obvious reasons). For example, for an array from 1 to 100, I can do my @integers = (1..100); and Perl would take care of filling all 100 elements correctly. Isn't Perl wonderful?

You can assign an entire array to another array in Perl just like in IDL. (But not like in C – bummer, eh?) my @new_array = @old_array;.

At this point, I should tell you that you can index more than one element of an array, but it's in an array context in that case. For example, @month_array[5,6,7] will be an array containing three elements: the number of days in each of the summer months.

A final thing to know about arrays is $#array_name (where array_name is, of course, the name of your array). This variable contains the index of the last element of your array. (So it's actually the size of the array less one because Perl indexes starting at zero.) (I won't bother telling you how handy this can be; you have already figured it out.)
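Pulling the array basics together, here is a minimal sketch built on the month array from the text. The scalar names ($january, $november, and so on) and @summer are made up for the illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Days in each month of a non-leap year, January first.
my @month_array = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

my $january    = $month_array[0];         # first element: index 0
my $november   = $month_array[10];        # eleventh element: index 10
my $december   = $month_array[-1];        # negative indices count from the end
my @summer     = @month_array[5, 6, 7];   # several elements at once: an array
my $last_index = $#month_array;           # 11, one less than the 12 elements

my @integers  = (1..100);                 # .. fills in 1 through 100
my @new_array = @month_array;             # copies every element

print "January: $january, December: $december\n";
print "Summer: @summer\n";                # prints Summer: 30 31 31
print "Last index: $last_index\n";        # prints Last index: 11
```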

Array Functions

At this point, I want to introduce a few special array functions of interest: push, pop, shift, and unshift.

push(@array, item)
"Pushes" an item onto the end of an array. I use this a lot; you'll see some uses in a bit.
pop(@array)
"Pops" the last item in the array off. This means that the item is deleted (and so the list is shrunk by one element) and returned.
shift(@array)
"Shifts" the first element off an array. Just like pop, but from the other end of the array.
unshift(@array, item)
Um, "unshifts" an item onto the beginning of the array? (That makes no sense. Forget I said it.) Basically, just like push, but at the beginning of the array.
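A short sketch of the four functions acting on one small array, with the array's contents after each step noted in the comments. The names @queue, $last, and $first are made up.

```perl
#!/usr/bin/perl -w
use strict;

my @queue = (2, 3);

push(@queue, 4);             # add to the end:        (2, 3, 4)
unshift(@queue, 1);          # add to the beginning:  (1, 2, 3, 4)

my $last  = pop(@queue);     # remove from the end:       $last is 4,  (1, 2, 3)
my $first = shift(@queue);   # remove from the beginning: $first is 1, (2, 3)

print "Removed $first and $last, leaving @queue\n";
# prints Removed 1 and 4, leaving 2 3
```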

Also potentially of interest is reverse(). As you've undoubtedly surmised, this reverses the array and returns the result. (As opposed to altering the array that it gets as an argument.)

One last function of interest is sort(). Obviously, this well-named function sorts an array. You might think it's easy to see how it would deal with numbers, but you'd be wrong. sort() works on strings by default and therefore would sort numbers so that all of those that start with 0 were first, then those that started with 1, and so forth. The result of this is that 100 would come before 99, which isn't what you generally want. (Of course, you've almost certainly seen this sort of idiotic sorting before.) Luckily, Perl gives us a way to make sort smarter. We'll see how to do that later, however. For now, know that it exists and know that you can sort strings with it without any extra help.
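A small sketch of reverse() and the default, string-based behavior of sort(). The sample arrays are made up.

```perl
#!/usr/bin/perl -w
use strict;

my @numbers = (99, 100, 7);
my @fruit   = ("pear", "apple", "fig");

my @backward      = reverse(@numbers);   # (7, 100, 99); @numbers is untouched
my @sorted_nums   = sort(@numbers);      # compared as strings: (100, 7, 99)
my @sorted_fruit  = sort(@fruit);        # (apple, fig, pear)

print "@backward\n";       # prints 7 100 99
print "@sorted_nums\n";    # prints 100 7 99
print "@sorted_fruit\n";   # prints apple fig pear
```

Note that 100 lands before both 7 and 99, because the comparison looks at the leading character "1" first.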

Hashes

Finally, hashes or "associative arrays". Many of you will never have seen these before, but they're like super arrays. A hash, like an array, is a list of values. But now they're not indexed by sequential numbers. They're indexed by "keys", each of which could be a string or a number (as long as they're unique, since otherwise things would get confusing, right?). For example, you might have figured out that my array above (month_array) is really just the number of days in each month. (Non-leap year.) But to use it, I have to turn each month into a number. That would be annoying in many cases. With a hash, I could use the same values as above, but choose keys "Jan", "Feb", "Mar", "Apr", etc.

How do we indicate a hash? With a percent sign: %month_hash. Again, when we select a single element from the hash, we use a dollar sign. We use curly-braces ({}) to enclose the key: $month_hash{"Aug"} is 31, for example. (Note that the key was a string, so I needed the quotes.)

We have the same options to assign values and keys to elements. We could fill each pair one at a time (if the key has already been used, Perl overwrites the old value. If the key is new, you get a new key-value pair. Again, remember that you've got case-sensitive keys!) Or you could do it in one fell swoop. You still call the hash by its full hash name and use parentheses to enclose the pairs and commas to delimit them. But now you need to give a key and a value for each. Use => to indicate the key to value relationship: %month_hash=("Jan"=>31, "Feb"=>28, ..., "Dec"=>31);
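A minimal sketch of both ways of filling a hash, using only a few months to keep it short. The scalar names are made up, and the lowercase "feb" key is there purely to show case sensitivity.

```perl
#!/usr/bin/perl -w
use strict;

# All at once: key => value pairs inside parentheses.
my %month_hash = ("Jan" => 31, "Feb" => 28, "Mar" => 31);

my $feb_before = $month_hash{"Feb"};   # 28

# One pair at a time.
$month_hash{"Apr"} = 30;   # new key: a new pair is added
$month_hash{"Feb"} = 29;   # existing key: the old value is overwritten
$month_hash{"feb"} = 0;    # different case, so this is a separate key

my $feb_after = $month_hash{"Feb"};    # 29

print "Feb was $feb_before, now $feb_after\n";
print "April has $month_hash{\"Apr\"} days\n";
```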

Functions for Hashes

Of course there are special functions for hashes! (In fairness, I should note that I've only used one of these really, and even that is pretty rare. At least in my coding, hashes tend to take care of themselves.)

The first you might want to meet are keys() and values(). Both of these act on the hash and return arrays. You can probably figure out what they do, but in short they return an array of the keys in the hash and the values in the hash. (Note: if you do these two functions back-to-back, the keys and values will be paired correctly. However, if you mess with the hash at all (delete or add entries, even if you appear to undo what you just did) in between, there are no promises about the pairings. So I wouldn't get too attached to the idea that they'll match up. But, then, you don't need to given how hashes work, right?) Really, I've only ever seen keys() used, although I could cook up a scheme where you might want values. But it's sort of unlikely.

Another function on hashes is each(%hash_name). This function also returns an array, but in this case it's a two-element array of the key and value. Each time you invoke the function on a hash, the next key/value pair is returned. This can be handy in loops, naturally.

delete() deletes a key/value pair from a given hash. (Wait, why didn't we have something like this for arrays? We did, it just looks different because of how arrays and hashes are indexed.) The syntax is delete($hash_name{"key"}). Note that it's difficult to confuse this function. As long as the hash exists, it's happy since if the key doesn't exist, the function's job is done before it begins!

The aptly-named exists($hash_name{"key"}) tells you if a given key exists in the hash. (Perl is pretty good at naming functions, isn't it?) Actually, exists() works on array elements, too. For a plain scalar, the function to reach for is defined(), which is akin to IDL's n_elements(variable), but shorter.
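A rough sketch of the hash functions in action on a three-month hash. The array and scalar names are made up, and "Smarch" is a deliberately nonexistent key.

```perl
#!/usr/bin/perl -w
use strict;

my %month_hash = ("Jan" => 31, "Feb" => 28, "Mar" => 31);

my @month_names  = keys(%month_hash);     # the keys, in no promised order
my @month_days   = values(%month_hash);   # the values, paired with the keys above
my @sorted_names = sort(@month_names);    # sorted so the printout is predictable

my $had_feb = exists($month_hash{"Feb"});   # true: the key is there
delete($month_hash{"Feb"});                 # remove the pair
my $has_feb = exists($month_hash{"Feb"});   # false: the key is gone
delete($month_hash{"Smarch"});              # no such key: nothing happens

my ($key, $value) = each(%month_hash);      # one key/value pair per call

print "Keys: @sorted_names\n";              # prints Keys: Feb Jan Mar
print "$key has $value days\n";
```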

Control Structures

Having variables is great. Being able to do math is also great. But what makes a programming language really useful are the control structures: those things that tell the code to repeat operations or to make decisions. And it's time we met Perl's retinue. (Note: once again, if you know C, you'll be at home here. But don't get too cocky, there are a few changes and new faces.)

If()

Right, if. You already know that it tells Perl to do one thing if a condition is met and (maybe) something else if not. Pretty snazzy stuff, if you think about it. But also very much at the heart of almost any code you'll ever write. Your basic if-statement looks like this:

      
	if(condition)
	{
	  Commands here.
	}
      
    

We will talk about what the conditions look like in a moment. The commands are any bit of Perl code you'd like. Oh, and the braces are mandatory in Perl (unlike C). What if you want an else-statement in there? Just add it:

      
	if(condition)
	{
	  Commands here.
	}else
	{
	  Other commands here.
	}
      
    

Easy enough. What if you want more than two possible conditions (for example, you're worried about the sign of a number which could be positive, negative, or zero)? Then use the elsif (note the spelling, it's a mite odd):

      
	if(condition)
	{
	  Commands here.
	}elsif(another condition)
	{
	  Some completely different set of commands here.
	}else
	{
	  Other commands here.
	}
      
    

Note that you must have one if, at most one else, and as many elsifs as you like. (By the way, I recommend having a catchall else for the most part. So for the above example of looking at the sign of the number, I'd have an if that asks if it's positive, an elsif that asks if it's negative, an elsif that asks if it's zero, and then a final else that we get to only if none of the conditions are true. (Say the variable isn't defined or it is a string.) Believe it or not, I've seen code hit these conditions before even when I thought them impossible to meet!)
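The sign-of-a-number example might look something like this as real code. It's a minimal sketch with made-up names ($number, $sign) and an arbitrary starting value, using the numeric comparisons described under Conditionals.

```perl
#!/usr/bin/perl -w
use strict;

my $number = -4;
my $sign;

if($number > 0)
{
  $sign = "positive";
}elsif($number < 0)
{
  $sign = "negative";
}elsif($number == 0)
{
  $sign = "zero";
}else
{
  # The catchall: reached only if none of the conditions above was true.
  $sign = "unknown";
}

print "$number is $sign\n";   # prints -4 is negative
```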

A Cute Little Ternary Operator ?:

?: is a shortcut operator that exists in many languages. (Including C, IDL, and, naturally, Perl.) It is also the only "ternary" operator I can think of in any language. (Unary operators act on one thing, like ++. Binary operators act on two, like +. Ternary act on... yes, three.) What the operator does is give you a shortcut for short if...else statements. It looks like this: (conditional)? if true : if false. So if the condition (left thing) is true, the middle thing is done; if the condition is false, the right thing is done. You can only use single statements in those actions, so it's of limited use. However, it's very handy for doing things like $days_in_Feb = (($year % 4 == 0) ? 29 : 28);. (Nota bene: the parentheses encompassing the entire right side of the assignment are, strictly speaking, not needed. I usually put them in, though, out of paranoia and because I find it easier to read. White them out if you don't like them. Unless you're reading this on your computer screen, in which case for crying out loud don't!)
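Here is a minimal sketch of the February example next to its long-hand if...else equivalent. The year 2024 and the name $days_long_way are made up, and the divisible-by-four test is the simplified one from the text (it ignores the century rules of the real calendar).

```perl
#!/usr/bin/perl -w
use strict;

my $year = 2024;

# The shortcut: pick 29 or 28 depending on the condition.
my $days_in_Feb = (($year % 4 == 0) ? 29 : 28);

# The same decision written out as an if...else.
my $days_long_way;
if($year % 4 == 0)
{
  $days_long_way = 29;
}else
{
  $days_long_way = 28;
}

print "$days_in_Feb $days_long_way\n";   # prints 29 29
```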

Really, this is just a lazy way of doing an if...else. But that's Perl for you: always giving you ways of doing simple tasks that much faster. And incidentally, this offers you the closest thing to the "case...of..." structure in IDL (or "switch...case" in C), although with a limited ability to only perform one action per case. The way you do this is by nesting the ?: structures:

      
	my $my_variable = 
	                 ($a < 0) ? "Negative" :
	                 ($a == 0)? "Zero" :
	                 ($a > 0) ? "Positive" :
	                            "Non-number";
      
    

Conditionals

Just what are the possible ways of writing that condition? First, a quick note: all the conditions that you're about to see really just return 1 or 0 (true or false). In Perl, the condition in the if-statement can either be one of these statements, or a variable. Variables that are undef (a special Perl variable for "undefined"), 0, the string 0, or the empty string are all false. Everything else is true. This can be quite handy to know. But on to conditions.

Numbers

Numbers are easy in Perl, you'll recognize the conditions. First, there is "equal to": ==. (A common mistake, especially among those new to Perl, but occurring even for the Perl veterans: using = rather than ==. The assignment evaluates to the value assigned rather than to the result of a comparison, so the test doesn't tell you what you wanted to know. It also overwrites your variable. This is generally not a good thing.) To see if $pi is equal to 5, write if($pi == 5). (It had better not, but whatever.) "Not equal to" is !=, by the way. We'll say hello to Mr. ! again in a moment.

"Greater than" and "less than" are > and < while "greater than or equal to" and "less than or equal to" are >= and <=. So far so good.

Strings

Of course, this wouldn't be Perl if we couldn't also look at strings. The most common string comparison is equal to, eq. (IDL/FORTRAN people, you recognize this. But it only works on strings in Perl.) This tests for an exact string match. I find that I don't use this a lot except when the string is pretty short. When we meet regular expressions next time, you'll understand why. The opposite of eq is ne, which tests for non-equality.

There are also gt, lt, ge, and le for "greater than", "less than", "greater than or equal to" and "less than or equal to". These test alphabetization, in effect. (Actually, they test ASCII order. So you'll find issues with non-letters and mixed cases: every uppercase letter sorts before every lowercase one, for example.)
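
It matters which family of operators you reach for, because the same two values can compare differently as numbers and as strings. A minimal sketch, with made-up variable names:

```perl
use strict;
use warnings;

my $first  = 10;
my $second = 9;

# Numeric comparison: 10 is greater than 9.
my $numeric = ($first > $second) ? "greater" : "not greater";

# String comparison: "10" comes before "9" because "1" comes before "9".
my $stringy = ($first gt $second) ? "greater" : "not greater";

print "With >,  $first is $numeric than $second\n";
print "With gt, $first is $stringy than $second\n";

# Mixed case: uppercase letters come before lowercase ones in ASCII.
my $case = ("Zebra" lt "apple") ? "before" : "after";
print "Zebra sorts $case apple\n";
```

So use the symbol operators when you mean numbers and the letter operators when you mean text.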

Compounding Conditions

As you already know, you'll frequently want to have one condition and another condition met, or possibly one condition or another condition. So you use Perl's and and or: && and ||. I suggest a liberal use of parentheses and white-space to make sure that you get the conditional that you want when you use compound conditions:

      
	if(($a == 2) ||
	   (($a == 1) && ($b == 2))
	   )
	{
	   # Do stuff.
	}

      
    

And sometimes it's easier to write a condition in the negative. Preceding such a statement with a ! means "not". So $b != 5 and !($b == 5) mean the same thing.

By the way, a little secret to pass along. If you have a compound condition with an or in it and the first condition is true, Perl never even looks at the second condition. Conversely, if you have an and in there and the first condition is false, then the second condition is also not checked. (You can see why, if you think about it.) This means that you can avoid potentially time-wasting function calls or embarrassing crashes by putting the offending code in the second condition.
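
Here is a minimal sketch of that trick, using a division that would crash if Perl ever got to it. The variable names and values are invented for the example.

```perl
use strict;
use warnings;

my $total = 10;
my $count = 0;
my $message;

# Because ($count != 0) is false, Perl never evaluates the division,
# so there is no "Illegal division by zero" crash.
if (($count != 0) && ($total / $count > 2)) {
    $message = "Average is above 2";
}
else {
    $message = "No average to report";
}

print "$message\n";
```

Swap the two conditions around and the program dies on the division instead, which is the whole point of putting the guard first.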

For Loops

Besides the ability to make decisions, computers are very good at repetitive tasks. In fact, they seem to like them. So it's lucky for everyone that Perl has a for-loop. The syntax goes like this: for(initialize counter; while condition; stepping condition). The initialization means you set some initial value to the counter, like $counter=0. The while condition is the condition under which Perl should run the loop again (like $counter<5). The stepping condition says what Perl should do with the counter at each step ($counter++ is popular). So we'd have something like

      
	for(my $counter=0; $counter<5;$counter++)
	{
	  # Some mindless, repetitive tasks here
	}
      
    

What will this do? This will repeat the loop 5 times. (For $counter = 0, 1, 2, 3, and 4.) Make sure that you note the semi-colons, by the way. They're important. (Note for the nitpicky: I used my in the initialization. This means that $counter is only valid inside the loop. Or, at least, this incarnation of the variable is. If you want to use the counter after the loop, declare it before the loop rather than in the for line. In general, however, you won't care, so my is recommended.)

Oh, a little trivia for anyone interested: you technically don't need to fill in any of the for fields. If you leave them empty (you still need two semicolons!), you'd better have initialized the counter variable, set up an ending condition inside the loop, and/or done something to change the counter. For example, if you put for(;;), the loop will run infinitely unless you have thought to set up a way to break out.
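
For the curious, here is what the empty-fields version might look like. This is only a sketch: the counter is set up before the loop, changed by hand inside it, and the way out is the last command, which is described under "Controlling Your Loops".

```perl
use strict;
use warnings;

my $counter = 0;    # initialized here, since the for fields are empty

for (;;) {
    print "Pass number $counter\n";
    $counter++;             # change the counter ourselves
    if ($counter >= 5) {
        last;               # our own ending condition
    }
}
```

This makes the same five passes as the ordinary for-loop above, just with all of the bookkeeping done by hand.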

Foreach

This is sort of a subset of for(), so I'll put it here. There is another for-loop structure in Perl called foreach. It's used like this: foreach $item (@array). Perl's response to this is to go through each item in @array in order and make $item stand for that item. ($item is an alias rather than a copy, so assigning to $item inside the loop changes the array element itself.) If you're churning through an array, this is quite handy. (Note that you can declare $item before the loop, or declare it right in the loop line with foreach my $item (@array), in which case it exists only inside the loop. You need a pre-declared array, naturally.) It is worth remembering, however, that this is roughly the same as

      
	for(my $ctr=0;$ctr<=$#array;$ctr++)
	{
	  $item = $array[$ctr];
	  # ...
	}

      
    

(Question: why did I not use my on $item?) So foreach is redundant, but you can see why it is a bit nicer to work with under many conditions.
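
A small sketch that may be worth trying for yourself: on the Perl versions I know of, my is accepted right in the foreach line, and the loop variable acts as an alias for the array element, so changing it changes the array. The array name and its values are invented for the example.

```perl
use strict;
use warnings;

my @prices = (1.50, 2.25, 4.00);

# my can go right in the foreach line.
foreach my $price (@prices) {
    $price = $price * 2;    # $price is an alias, so this changes @prices itself
}

print "Doubled: @prices\n";
```

If you want to leave the array alone, copy the value into another variable inside the loop and change the copy.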

While Loops

Technically, for and while are also redundant, although a bit more subtly than with foreach. In some sense, while is the more basic type of loop, but I think that for is a bit easier to understand. while looks like this:

      
	while(condition)
	{
	  # Code goes here.
	}
      
    

What this does is run the code inside the braces until the condition becomes false. If the condition starts false, the code in the braces never executes even once.
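
To make the connection with for concrete, here is a sketch of the five-pass counting loop rewritten as a while-loop, followed by a while-loop whose condition starts out false. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $counter = 0;    # what the for-loop did in its first field
my $passes  = 0;

while ($counter < 5) {      # the for-loop's "while condition"
    $passes++;
    $counter++;             # the for-loop's stepping, done by hand
}

# $counter is now 5, so this condition is false from the start
# and the code in the braces never runs.
my $skipped = 0;
while ($counter < 5) {
    $skipped++;
}

print "First loop ran $passes times, second loop ran $skipped times\n";
```

Forgetting the $counter++ line is the classic way to write a while-loop that never ends, so that is the first thing to check when a script hangs.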

Controlling Your Loops

Knowing about the two big sorts of loops (for- and while-loops) is all well and good, but you'll eventually want more. You'll want to be able to break out of a loop earlier than you had intended, for example. There are three commands that help you in this regard.

last
Makes this the last time through the loop. In fact, it exits the loop and heads to the first command after the loop. Akin to "break" in C.
next
Tells Perl to head to the next pass through the loop. (Actually, formally, it tells Perl to jump to the end of the loop block. This way, the control steps are taken: counters are incremented and ending conditions are checked.)
redo
Like next, it stops the current iteration of the loop. But this time, it hops up to the start of the block without incrementing counters or checking ending conditions.
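
Here is one loop that uses all three commands, as a minimal sketch. The numbers 3, 5, and 7 are arbitrary, and $retried is a made-up flag that keeps redo from repeating forever.

```perl
use strict;
use warnings;

my @seen;
my $retried = 0;

for (my $n = 1; $n <= 10; $n++) {
    if ($n == 3) {
        next;               # skip 3; the $n++ and the test still happen
    }
    if ($n == 5 && !$retried) {
        $retried = 1;
        redo;               # restart this pass with $n still equal to 5
    }
    if ($n == 7) {
        last;               # leave the loop entirely
    }
    push(@seen, $n);
}

print "Kept: @seen\n";
```

The loop keeps 1, 2, 4, 5, and 6: it skips 3, goes around twice on 5 (keeping it the second time), and quits when it reaches 7.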

Functions

And now for the moment you've all been waiting for: how to create a function. (OK, it's not that amazing. Actually, people seem to write a rather small number of functions in Perl.) Functions are handy in programming for a lot of reasons. One is that they save you time: if there's a task that gets done a lot, making a function saves you having to write the same code over and over. It also saves you time debugging and modifying the code. Functions are also handy for writing clear, less buggy code: the more you break your tasks into functions, the easier the code tends to be to read and the more likely it is to be written correctly. (Although, as I said, Perl programs don't seem to be quite as function-oriented as programs in other languages. This may, however, be my biased sample. In any case, this doesn't stop you from going to town with functions.)

Getting on with how to create a function, let's look at how to declare one. It's quite simple, since no prototyping is required and there is nothing to say how many arguments the function has to take. (The code for the function might have issues if too few arguments are passed, of course, but that's a problem for you to deal with.) I put my functions at the end of my code, but apparently many folks actually put theirs at the beginning. So whatever turns you on. You denote a function with sub (short for subroutine, of course):

      
	sub my_function
	{
	   # All kinds of functional fun goes in here.
	}
      
    

Again, the braces are required so that Perl knows when to stop with the function. And you can name the function any sort of valid variable name.

Passing Arguments

One of the most important things to do with functions is to pass them arguments so that they can do stuff with them. We have already seen how to invoke a function with arguments, but here's how to handle them. There is a built-in array in every function called @_. This array contains the arguments used when the function was called. So $_[0] is the first argument, and so forth. (As a rule, I'd pick these off into better-named variables in my functions right away when the function was called. But that's a matter of style, really.) So we could call a function with &my_function(1, 5, "Carl Sagan"). The code for the function might be something like:

      
	sub my_function
	{
	  my $iterations = $_[0];
	  my $prime_number = $_[1];
	  my $astronomer_name = $_[2];

	  # Some bizarre code to do things with these parameters.
	}
      
    

As a matter of policy, it is wise to check how many parameters were passed to the function. (I'll confess that I don't do this all that often, myself. I should, though.) If too few arguments are passed, abort the execution with some kind of message. (We'll meet some ways of exiting functions in a moment. As far as programs go, we'll see that in two weeks.) Actually, it's wise to even check the nature of the passed arguments. (Are they strings? Numbers? Arrays? Etc.)
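
As a rough sketch of that kind of checking, the function below counts its arguments with scalar(@_) and bails out with a message if there are too few. The function name describe_run and the wording of its messages are made up for the example, and it bails out with return, which is covered in the next section.

```perl
use strict;
use warnings;

# Hypothetical function, named only for this example.
sub describe_run
{
    # scalar(@_) is the number of arguments that were passed.
    if (scalar(@_) < 3) {
        return "Need 3 arguments, got " . scalar(@_);
    }

    # Pick all three arguments off into better-named variables at once.
    my ($iterations, $prime_number, $astronomer_name) = @_;

    return "$astronomer_name: $iterations runs with prime $prime_number";
}

print describe_run(1, 5, "Carl Sagan"), "\n";
print describe_run(1, 5), "\n";
```

The first call gets the real answer and the second gets the complaint. Whether a short-handed call deserves a polite message or a dead program is your call.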

Returning Values

Functions often return values (sometimes the value is just 1 or 0 to tell if it executed successfully). That was a general statement about programming. Perl in particular always returns values from functions. The values might be an empty list or undef, Perl's internal code for "This variable isn't really defined," but there is a return value. Bear this in mind when working with functions! (In general, you may want to make sure that the function returns something meaningful, if only the simple "Yep, I did my job successfully" value of 1. Down the road you might find it helpful to know that your function did its thing.)

There are two ways to do this in Perl. The first way is what I call the sloppy (aka, "stupid") way: the result of the last operation you perform in a function will automatically be returned. This is just a bad way to go, though, for reasons I think you can work out. Also note that the "last operation" can include essentially any line of Perl (except things like braces), so even a print statement (which technically returns a value of "1" if it successfully does its job) could be that line.

The better way to return values is with the creatively-named return() function. It's easy to use since it just returns the value in the parentheses and exits the function. (So any code that comes later won't be executed!) This is definitely the way to go, in my view. For one thing, you can always have multiple return statements in a function. (You might have some if-statements and inside each you return a different value, for example.) For another, it's always really easy to spot where the returning is being done.

Oh, if you use return() with no parameter, you get back undef (if the caller wanted a single value) or an empty list (if the caller wanted a list).
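
Here is a minimal sketch of both ideas: a function with several explicit returns, and a function with a bare return called once for a scalar and once for an array. The names sign_of and nothing_to_say are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical function with several explicit returns.
sub sign_of
{
    my $number = $_[0];
    if ($number < 0) {
        return "Negative";
    }
    if ($number > 0) {
        return "Positive";
    }
    return "Zero";
}

# Hypothetical function with a bare return.
sub nothing_to_say
{
    return;
}

my $scalar_result = nothing_to_say();   # the caller wants one value
my @list_result   = nothing_to_say();   # the caller wants a list

print sign_of(-4), " ", sign_of(0), " ", sign_of(12), "\n";
print "Scalar result is ", (defined($scalar_result) ? "defined" : "undef"), "\n";
print "List result has ", scalar(@list_result), " items\n";
```

The defined() test is the usual way to tell an undef apart from a legitimate false value like 0.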

Input and Output

It's obvious that programs want input and output quite a bit. Otherwise, what's the point? We'll start with output to screen and work our way up from there.

Printing to the Screen

You've already seen the print() function. This prints to the screen by default, as you've already learned. I should add that you can print arrays as well as scalars. print @my_array; will print out all of the values stored in @my_array, run together with nothing between them. This can be handy. It is much less handy on hashes: print %my_hash; prints the keys and values run together in no predictable order.
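
A quick sketch of three ways to print the same array, since the plain version runs everything together. The array and its contents are made up for the example.

```perl
use strict;
use warnings;

my @planets = ("Mercury", "Venus", "Earth");

print @planets;             # MercuryVenusEarth (nothing between the values)
print "\n";

print "@planets\n";         # Mercury Venus Earth (inside double quotes)

my $joined = join(", ", @planets);
print "$joined\n";          # Mercury, Venus, Earth
```

Putting the array inside a double-quoted string gets you spaces between the values, and join() lets you pick whatever separator you like.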

Another way to print is with printf(). The syntax here is printf(format_string, variables). format_string is a string that contains formatting information (and also general text stuff, if you wish). It's followed by the variables referenced in the format string, as many as you referenced. Wait, what variables? Ah, that's the beauty of the format string. While it's easy to pop variables into a string for printing with print() (just use a double-quoted string), you can't generally control the format. Enter the format strings of printf().

The real key to format strings is the variable place-holders. They all start with percent signs (%) and end with letters: %s, %d, %f, %e, %E, %g, etc. There might be other stuff in between, as we will see in a moment. The letters indicate different variable types. (String, integer, float, two kinds of scientific notation (one with a lowercase "e" and one with an uppercase "E"), and a "let Perl guess what numeric syntax works best.") Already you can see a value here: you can force the numbers into whatever format you need.

Even better, if you insert extra (optional) codes between the percent and the identifier letter, you can control how the variables are displayed. Just a number will control how wide the outputted number is. A 4 will make it at least 4 places long, padded on the left with spaces. (This count includes the decimal point, if there is one!) If you want to control the number of places after the decimal, use a decimal point followed by the number of places. (So %.2f would be good for American currency, for example.) Prefacing the width with a 0 pads the number with leading zeros instead of spaces. (The width is a minimum, not a maximum: a number that needs more places than you asked for is printed in full rather than chopped.) So I might use printf("My sister is class of '%02d.\n", $sister_graduation_year); to get Perl to print "My sister is class of '02." (With %2d, that is, without the 0 in the format string, Perl would print, "My sister is class of ' 2." Which looks funny, to say the least.)
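
Format strings are easier to absorb by looking at a few side by side. In this sketch the square brackets are there only to make the padding visible, and the values are arbitrary.

```perl
use strict;
use warnings;

my $price = 3.14159;

printf("[%d]\n",    42);        # [42]
printf("[%4d]\n",   42);        # [  42]      padded with spaces to 4 places
printf("[%04d]\n",  42);        # [0042]      padded with zeros to 4 places
printf("[%2d]\n",   12345);     # [12345]     too wide, but not chopped
printf("[%.2f]\n",  $price);    # [3.14]
printf("[%8.2f]\n", $price);    # [    3.14]  8 places in all, 2 after the point
printf("[%s]\n",    "text");    # [text]
```

Notice that the width and the number of decimal places can be combined in one place-holder, as in %8.2f.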

One final note. Since we've become wise in the ways of printf(), it behooves me to point out that there is another function, sprintf(), that works precisely the same way. This function (which you can think of as "string print formatted") does what printf() does, but puts the result in a string which you can assign rather than printing it to the screen.
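
A minimal sketch of sprintf() at work, with made-up variable names. Nothing reaches the screen until the ordinary print at the end.

```perl
use strict;
use warnings;

my $total = 19.5;

# Build the strings first; sprintf itself prints nothing.
my $label = sprintf("Total: \$%.2f", $total);
my $clock = sprintf("%02d:%02d", 7, 5);

print "$label at $clock\n";     # Total: $19.50 at 07:05
```

The backslash in front of the dollar sign keeps Perl from treating it as the start of a variable name inside the double-quoted string.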

(There are actually a number of places where these format strings are handy in I/O in Perl. And, for that matter, they're very handy in C. So they're probably worth your time to pick up, if only up to a conversant level.)
