This part of the course is intended for novices. In this seminar, we will discuss how to create a website in your own directory, the basic layout of the HTML document, and a few of the most important tags. In the next two sessions, we will build on these ideas, adding more tags, formatting capabilities, and other niftiness. If this is your first time using HTML, I encourage you to spend some time before next session creating a few personal pages for yourself. These need not be fancy, but they should give you some experience in writing HTML.
If this is your first time making a website, the first thing we
need to talk about is where to put your various files. If you
have an account on one of the ITS-run Unix machines on campus
(Origins, Cosmos, Bogart, etc.), you can create a personal website
by creating a directory in your home directory titled
public_html. (Remember, this is case-sensitive!)
This will make your website URLs look like
http://server.colorado.edu/~username/files, where
server is the name of the server (origins, bogart,
etc.) and username is your username on that machine.
(files will be the name of the files in the
directory. More on that shortly.)
A few words on permissions: your documents need to be readable
by people other than you! Set permissions so that group and world
have read privileges. You probably do not want to make
your files world writable (or even group writable, most of the
time). The execute privilege isn't really relevant to HTML
documents, since they aren't programs. I tend to
periodically issue the command chmod 755 *.html
in my web directories. This gives me read, write, and execute
privileges, and gives everyone else read and execute privileges. This
is mostly habit, though, and 744 (or 644, which drops the unneeded
execute bit for me as well) is probably wiser.
Incidentally, the only reason not to give yourself write
privileges is if you are worried that you might destroy or delete
a webpage that you have carefully worked on and don't intend to
change. However, be aware that it is the nature of webpages to
generally be dynamic entities, changing as our tastes, needs, and
whims change. The better solution to the danger of destroying
your hard work is to back up your website periodically by
downloading it to a local hard drive, writable CD-ROM, etc. (Some
servers are, of course, backed up by the operators. I'd encourage
you to not rely on that, however. Not only are you counting on
things well beyond your control, recovering files off of tapes is
often a rather annoying task and I suspect that the Ops would be
better used dealing with other matters.)
What should you call your file? Well, here is a good start:
index.html. This file, if it exists in any of your
web directories (either public_html or any
subdirectory of that directory), will automatically be loaded if a
user types a URL ending in that directory name rather than a
file inside that directory. For example, my homepage's URL (as I usually give
it out) is http://moonlets.org. The
source for that page, however, is actually in
http://moonlets.org/index.html. This
feature is nice for a couple of reasons. The first is that the
first URL is
shorter and probably easier for someone to remember, especially if
they're even passingly familiar with the ~
convention. The second is that this allows you to hide your
directory contents from snoopers. That's right, if you do
not have an index.html file, anyone typing a
URL ending in
your directory name will get to see the contents of your
directory. (OK, technically this is dependent on your server's
configuration. That can change with the server or if the
operators alter things... so I wouldn't wager on server
configurations for privacy.) If you have pages not intended for
public viewing or just plain don't like people snooping, this is
unfortunate.
Finally, I should mention document extensions. HTML documents should
have an extension .html or .htm.
Typically, on a Unix machine, the former is preferred. (The
latter is, I believe, a relic of DOS's inability to have extensions
with more than three characters.) This tells the browser that the
document is a hypertext document and that it should render it as
such. What happens if you mis-label an HTML document as, say, a .txt document?
The user will probably see the document source rather than the
document as you intended it to be seen.
HTML stands for HyperText Mark-up Language. "Hypertext" refers to the ability to link parts of documents together. The "mark-up" bit of the name describes the type of language it is. Specifically, you use tags to indicate instructions as to how to display the content of the page. Those of you who have used LaTeX will recognize this concept quite well: it's the same idea. The syntax is, however, quite different-looking in HTML and the two languages are clearly designed for very different applications. (Math in HTML is dicey business, while LaTeX has to make many assumptions about the medium in which it will be presented.)
A tag in HTML is enclosed in a <>
combination. Anything inside these delimiters will not be
rendered into the final document. Most tags have a start and stop
pairing, where the stop entity starts with a /. For
example, I start a paragraph with <p> and end
it with </p>. (<p> is HTML
for "paragraph." More on this matter shortly.) A few tags do not
have stop tags associated with them. They will be kind of obvious
when you get to them and I'll try to point them out. (Note that
the <p> tag did not have a stop element for quite
some time, and HTML 4.01 still treats it as optional, though XHTML
requires it. Most
browsers will know what you mean if you forget the stop tag, but
the results are... unpredictable.)
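As a rough sketch of the start/stop pairing described above (the sentences are just placeholder text made up for illustration, and the line-break tag used in the second half gets its own discussion later on):

```html
<!-- A paired tag: the stop tag repeats the name with a leading slash -->
<p>This sentence sits between a start tag and a stop tag.</p>

<!-- A tag with no stop tag: the break simply happens where it is placed -->
<p>First line<br>
second line</p>
```

Nothing inside the angle brackets shows up on the page; only the text between the tags does.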
A few more acronyms so that you can speak like a complete nerd. (Amaze your friends! Terrify your enemies! Get no attention at all from people who are totally indifferent to you!) URL stands for "Uniform Resource Locator," which just means that it's a general purpose addressing system for all manner of data online. HTTP stands for HyperText Transfer Protocol, which is complete computer nerd for "how to transfer the data back and forth." Many of you will recognize the similarity to FTP (File Transfer Protocol), and this isn't a coincidence. HTTP, HTML, and the Web were actually invented at CERN as a sort of glorified FTP to be used to read papers from other scientists. (And you thought that porn drove all developments in this kind of technology.)
How do you edit HTML files, anyway? Well, that's kind of up
to you. You can edit them with fancy proprietary software, but
that's kind of silly. You can also use free software, such as
what comes packaged with Netscape/Mozilla. (I'd suggest the
latter, but I'm rather keen on Mozilla in general. Feel free to
ignore my advice.) You can get Mozilla at www.mozilla.org. However (and
I'm a bit of a snob about this), I am a big fan of the good old
text editor of your choice. On a Unix machine, you can use
vi, nedit, or [X]emacs, as
well as a host of other editors, many of which are painful to use.
I'm a big fan of [X]emacs, as it is powerful, full of handy
shortcuts, has syntax-specific features that you can get (such as
color highlighting of parts of your HTML code and automatically creating
the nifty date stamp/reply-to stuff at the bottom of the page),
and is generally available. (I'd suggest using XEmacs when
you can, since it's a little bit friendlier in terms of graphical
buttons and what-not.) Most ITS-run machines should have the
syntax-specific packages already installed.
And here's where I get on my soap box. (Ever notice that you don't see soap boxes around anymore?) Wherever it is feasible for you, try to avoid converting anything to HTML as a way of generating a webpage. A PowerPoint presentation is an example. These things tend to render very badly in many browsers/platforms and the code is so ugly as to rival the Gorgons. In fact, this is sort of why I stump for editing source code directly: you will tend to write more elegant, easy to read code than any software package. (For example, a lot of software will leave vestigial tags in your files, which only serves to slow downloads and potentially confuse a browser.)
Since HTML is largely oblivious to white space, it is easy to make your documents look nice inside as well as when they're viewed in a browser. Put returns between parts of your document (more returns for more significant breaks), indent your elements, etc. And get rid of tags that are no longer needed. This will make maintaining your webpages easier. Also, since people might occasionally look at your source code (more on that later), it'll impress them to see a nice, well laid-out document.
This section will look at the parts of an HTML document to provide us with the context for how to write such a document. We'll get to that shortly, I promise.
A good HTML document should start with a line like the following:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
This line tells the agent (the browser or whatever) what kind
of document it is about to get. Is this obvious (I mean, it is a
web browser, right)? Almost, but not quite. There are
other agents running around out there and your documents could be in
any number of other formats. In this specific case, we're telling
the agent that what is coming is an HTML document that follows the W3C's
standards (hopefully), version 4.01 (transitional). (If you want
to follow the strict standards, just drop the
transitional from the definition.) The
EN is actually sort of trivial: it means that the DTD being referenced is written
in English (it says nothing about the language of your own page).
What if you forget this? Will the Earth be sucked into an enormous black hole? Will your pages totally fail? Unlikely. But it never hurts to be explicit to web browsers about what they are to do and it definitely never hurts to follow the standards. On a personal note, I have never seen the lack of this line cause a problem as such. But as I said, it never hurts to be explicit.
The next line should start the HTML, so put an <html> tag
there. (Not that it has to be on its own line, but it does look
nicer that way.)
Next up is the head of your document. Starting with a
<head> tag, the following information is
not supposed to get displayed in the rendered document.
However, it can set up a lot of the stuff in the document, so make
good use of the elements in the header.
The first line in the header should be
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
This is just another note about how the document is encoded (the character set in particular). Why you need this, I don't know. (Strictly speaking, the server should send this along. It seems that the ITS-run machines do not, however, so it's up to you to do so.) Again, it doesn't hurt to be explicit.
I should note that whenever I start a new HTML document, I just copy and paste everything up to here into the new document. Actually, I typically copy and paste a few more lines, too, but we won't see those until later sessions. For now, just go ahead and copy and paste this stuff.
On to the real meat of the HTML document! The first thing that you should
always add to an HTML document's header is a title. Enclose
your title in <title> and
</title> tags. The title should be
descriptive, but not overly long. (When I first started writing
HTML, back when it was on clay tablets, I was told that the title
couldn't be more than six words long. I'm not sure if that was
ever strictly true, or if it was just good advice. It certainly
isn't true now. But it's still a good rule of thumb.) I should
note that the title will not explicitly appear in the
final rendered document. So what does it do, then? First off, it
makes the title appear at the top of most (all?) browsers and in
the tabs if the user is using tabbed browsing. This is very handy
for anyone who has multiple windows/tabs opened and needs to jump
to one without having to examine every single window/tab.
Additionally, if a user bookmarks your page, the title will be the
title of the bookmark (at least, in all browsers that I've seen).
While the user can always modify a bookmark's title, it's nice to
help her out by giving her at least a name to start with.
(I have several bookmarks that I've never gotten around to giving
actual, useful names. If the writers had actually taken a few
seconds to add titles, I'd at least know what the bookmarks are
for.)
That's it for the header for the time being. In later sessions
I'll point out other nifty things that you can tuck in here to
make your webpages cool. End this part of the document with a
</head> tag.
At last, the main part of the webpage, the part that people
actually really see! Start off this section with a
<body> tag. A few more words on the body tag,
since it can take various arguments. First of all, you can set
the background image for your document with the
background option. If you want a solid color
background (or if you want to specify one should the background
image not load, which is always a good idea), use the
bgcolor option. You can change the color of your
main text, of the links, and of the already-viewed links with
text, link, and vlink,
respectively. So, for example, your body tag might look like:
<body background="ursae2.jpg" bgcolor="#000000" text="#FFFF00" link="#FF0000" vlink="#FF8800">
This says to the browser, "I'm starting the body of the document. Please use the file 'ursae2.jpg', in this current directory, for the background. Failing that, set the color to 000000. The text should be FFFF00, the links FF0000, and the viewed links FF8800".
Wait a second! What do those colors mean? They're hexadecimal (base 16) representations of RGB (red, green, blue) colors. The # alerts the browser that a hexadecimal number is coming. Each color component is two digits long (for a total of six digits), with digits ranging from 0 to F (the latter being 15 in hex). The components are in RGB order, so that "#000000" is no colors at all, so black. "#FF0000" says to turn on all the red and nothing else, so we get bright red. "#FFFF00" is all red and all green on and no blue. This makes a bright yellow. I'll leave it as an exercise to the reader to determine what "#FF8800" should mean.
I should note that hexadecimal isn't the only way to notate colors. For example, some colors (like "#FF0000" = "red") have names. But I wouldn't count on that. Besides, hexadecimal is extremely flexible, letting you tweak the colors to juuuuust what you want. So I'd learn them. You can always use trial and error to find the color that you want, of course.
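Here is a small sketch of hex colors at work in a body tag. The particular color choices are mine, picked only for illustration:

```html
<!-- Each pair of hex digits sets one channel, in the order red, green, blue -->
<!-- FFFFFF: every channel fully on, so white. 000000: every channel off, so black. -->
<!-- 0000FF: only blue on. 800080: red and blue each at half strength, a purple. -->
<body bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080">
<p>Black text on a white background, with blue links that turn purple once visited.</p>
</body>
```

Nudging any one pair of digits up or down is the trial-and-error approach in practice: reload the page and see whether you like the result.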
Another note: in the last session, we'll discover that none of
the above optional arguments to the <body> tag
are necessary. Style-sheets have taken over this kind of thing
entirely, and I highly recommend them. But for now, we'll stick
with this.
The rest of this session will be devoted to creating the body of your document. But before we get to that, I want to point out a few more general things. First, don't forget to close off the body and the html tags at the end of your document! Most browsers will know what you meant, but it's never wise to make them guess.
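Putting the pieces so far together, a bare-bones document might look something like the sketch below. The title and the paragraph text are placeholders of my own; everything else is the boilerplate discussed above:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <!-- Declares the character set, in case the server does not send it -->
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <!-- Shown in the title bar, tabs, and bookmarks, not in the page itself -->
  <title>My First Web Page</title>
</head>
<body>
  <!-- Everything the reader actually sees goes here -->
  <p>Hello, Web!</p>
</body>
</html>
```

Note that the body and html tags are closed in the reverse of the order in which they were opened.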
Second, there are two types of tags within the body of your
document. The first of these is the block-level tag, while the
second is an inline-level tag. Block-level tags are (usually) not
inside any other tags, except for the html and body tags. These
are self-standing entities, like paragraphs, headings, tables,
lists, and so forth. The inline-level tags should always appear
encased in some block tag. An example is the emphasis tag,
<em>, which tweaks the appearance of the font
inside of it, but isn't really a part of the document in the
component sense. (As a warning: the image tag is an
inline tag, even though I'm sure that you, like I, will
want to use it in a block context a lot. Wrap it in a paragraph
tag and you'll be fine.) I'll try to point out which are which in
what follows. After a while, you'll probably be able to guess on
your own, though.
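A quick sketch of the block/inline distinction follows. The file name photo.jpg and the text are hypothetical stand-ins:

```html
<!-- Block-level: the paragraph stands on its own inside the body -->
<p>
  This word is <em>emphasized</em>, but it is still part of the paragraph.
</p>

<!-- An image is inline too, so it gets wrapped in a paragraph -->
<p><img src="photo.jpg" alt="A placeholder photo"></p>
```

The emphasis tag never appears on its own here; it always sits inside the block that contains it.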
This final section of this session is devoted to what I consider the most basic tags. These are the ones that you'll use over and over again, and are generally pretty easy to employ. One warning: you can nest tags, but never close off the outer layer of tags without closing off the inner layer first!
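To make the nesting warning concrete, here is a small sketch with made-up text, showing one correct and one incorrect ordering:

```html
<!-- Correct: the inner tag is closed before the outer one -->
<p>Some <em>properly nested</em> text.</p>

<!-- Incorrect: the paragraph is closed while the emphasis is still open -->
<p>Some <em>badly nested text.</p></em>
```

A browser will probably make a guess at the second one, but as usual, it is better not to make it guess.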
Probably the single most often used tag, the paragraph tag
(<p>) denotes that the contents between the
start and stop tags are a paragraph. Simple idea, and easy to
use. As I have previously noted, older versions of HTML apparently didn't
support (or at least didn't encourage) the closing tag for
paragraphs. Newer standards, however, push for these for a
variety of reasons that I'll defer for later. Suffice it to say,
you should use both tags with each paragraph.
The paragraph tag is a block-level tag.
You'll want to denote headings for different sections in your
document, or even the title of the document as a whole. For this,
use the heading tags: <h#>, where # is a number
1 through 6. <h1> is the largest, boldest,
most in-your-face heading and <h6> is the
smallest, meekest heading that you can employ. (Note that we'll
see how to adjust these appearances later. That's another
style-sheets thing, really.)
This is also a block-level tag, so close it off before you start the paragraphs of wonderful text that you plan to write!
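A sketch of headings and paragraphs working together might look like this (the section names are invented for the example):

```html
<h1>My Home Page</h1>
<p>An introductory paragraph about the whole page.</p>

<h2>Hobbies</h2>
<p>A paragraph that belongs to the "Hobbies" section.</p>

<!-- The smallest heading level available -->
<h6>The fine print</h6>
```

Each heading is closed off before the paragraph that follows it begins.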
Sometimes you will want to break up bits of your document not
just as paragraphs, headings, and the like. HTML contains a tag
that inserts a single return (unlike the two that you tend to get
with the paragraph tag) without breaking the current
block-element. This is the <br> tag. It is an
inline tag, by the way. Its use is pretty straightforward and
there is no closing element. (This should make sense: there is no
real need to show that you've "stopped" a return.)
The other tag I wanted to mention at this point is the
<hr> tag. This tag inserts a "horizontal rule"
into your document, further dividing the document in an obvious
manner. This fellow is block-level and has no closing tag,
either. In a browser, he shows up as a horizontal line running across the page.
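Both tags in one short sketch, with placeholder text of my own:

```html
<!-- Line breaks inside a single paragraph, with no extra blank line between them -->
<p>
  1600 Pennsylvania Avenue<br>
  Washington DC, 20500<br>
  USA
</p>

<!-- A horizontal rule between two blocks -->
<hr>

<p>A new part of the page starts below the rule.</p>
```

The rule sits between the paragraphs rather than inside one, since it is block-level.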
The image tag, <img>, is the second sort of
complex tag we've encountered so far. (The first was the body
tag.) An image tag really needs to have more than just the tag
name to be useful, since you want to load a particular image. The
syntax for this is <img src="image.ext">,
where you should read src as "source" and
image.ext is the file name.
The image tag is an inline-level element; it must be wrapped in a block-level tag to be correctly employed. Inserting the image into a paragraph is quite easy and makes perfect sense.
The image also has additional options. The most important is
the alt option. alt is the
alternate version of your image, a short bit of text to tell the
user what they've missed because the image didn't load for them.
(It can range from "a pretty picture of my house" to "Click Here
to Submit", depending on what the image was for.) To be totally
compliant with HTML standards, you should use the
alt option. For a variety of reasons, you cannot
guarantee that the image will load, so at the very least you
should tell the user what was there.
An example of an image tag in action:
<p><img src="../Images/Cavies/minipigs3.jpg"
alt="Little Baby Pallas, a Few Hours Old"></p>
In a browser, this displays the photo. (Isn't he cute?)
If the image wouldn't load, we'd get the alt text, "Little Baby Pallas, a Few Hours Old", in its place.
The anchor tag, <a>, is another important
beast. Like the <img> tag, the anchor tag
needs other options to be of any use. Unlike the
<img> tag, the anchor tag needs to be closed
off. (</a>)
The main use of anchor tags is to link to another document.
The syntax for this is <a href="link_url">The Linked
Text Goes Here</a>, where link_url is
the URL of the
other page. Which leads us to a discussion we were going to have to
have, sooner or later: relative versus absolute URLs.
URLs come in two flavors, the relative and the absolute. The difference is really in how the addressing works. A relative URL is in relation to the current page; an absolute URL is the same from anywhere. Not clear yet? Here's an analogy: street addresses. I could give my street address as "walk 3 blocks east of here, on the left." In some cases, this is a great reference. However, if you were planning to mail me something, I'd suggest something more like
1600 Pennsylvania Avenue
Washington DC, 20500
USA
The difference is that for the latter format, you have a reference that anyone, anywhere can decipher, while the first one requires you to be in a specific spot.
OK, so the absolute reference has its appeal. It's the only way to refer to a totally different website, for example. Also, no matter how you shuffle your pages around, it still works. So why on Earth would you bother with a relative URL? Two reasons: simplicity and ease of maintenance. The first reason is rather trite, but relative URLs are generally shorter than absolute URLs. The second point is a stronger one. If you move your website and you have used absolute URLs for links between your pages, you need to update every single URL to the new machine or directory. A relative URL, however, is still accurate if the pages still bear the same relative "positions" to each other. (For example, if they all started in the same directory and they all ended in the same directory, the relative URLs haven't changed.)
Mechanically, here's the difference: a link with a relative
URL looks
something like <a href="index.html">. Notice
that it's simple and short. All this says is, "find the file
index.html in the same directory as I am in right now." A link
with an absolute URL would look like
<a href="http://moonlets.org/WebClass/index.html">.
The key difference is that the href starts with http://
("HyperText Transfer Protocol") and it requires knowing the name
of the server.
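Side by side, the two flavors might look like the sketch below. The subdirectory and file name photos/trip.html are hypothetical, as is all of the link text:

```html
<!-- Relative: a file in the same directory as the current page -->
<p><a href="index.html">Back to the index</a></p>

<!-- Relative: a file in a subdirectory, and one in the parent directory -->
<p><a href="photos/trip.html">Trip photos</a></p>
<p><a href="../index.html">Up one level</a></p>

<!-- Absolute: the same from any page on any server -->
<p><a href="http://moonlets.org/WebClass/index.html">An absolute link</a></p>
```

If the whole set of pages moves to a new server together, only the last link would need a second look.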
Some of you are probably already asking if the
http:// can ever be something else. The answer is
yes, indeed. For example, you can use ftp:// and then an FTP
reference. (I've never actually needed to do this, by the way.
You need an FTP server to make it work, of course, and HTTP seems
more powerful in any case.) A more common and more handy form of
link is mailto: in place of the http://.
This form of link tells the browser that the address that follows is a mail-to
link. So if the user clicks the linked text, their email client
starts up and prepares to compose a message to the address in the
link. (Provided that their email client is configured on that
machine, etc. You can't really bet on this working for everyone,
but it's a nice feature to those for whom it does work.)
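A sketch of both alternatives follows. The email address and FTP server are placeholders at example.com, not real destinations:

```html
<!-- mailto: asks the reader's email client to start a new message -->
<p><a href="mailto:someone@example.com">Send me email</a></p>

<!-- ftp:// points at a file on an FTP server instead of a web server -->
<p><a href="ftp://ftp.example.com/pub/notes.txt">Download the notes</a></p>
```

Note that mailto: takes no double slash after the colon, unlike http:// and ftp://.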
What about making our font look cooler? While straight text is great for emails, data files, and other text-only interfaces, HTML has a lot more power than that. You've all seen italic text, for example. How do you do that? Happily, it's an easy thing to pull off with HTML. However, there are two different ways to do this sort of thing, and you need to decide which one makes the most sense for each application. (Don't worry. It sounds like a lot of thinking, but after a few applications you find that you don't really need to think about it much.)
Basically, there are two ways to tell HTML that there is something stylistically different about some bit of text. The first is a physical style and the second is a logical style. Here's the difference (and here's where you start hearing me preach about keeping your HTML "pure"): a physical style is one that says "make this text look like X" while a logical style says "this text has this role in the overall flow, so format it appropriately."
So here's the preaching. HTML is not a typesetting language and it was never meant to be. (I know that going back to my early days of learning HTML this point was made to me. Then I pretty much ignored it, like everyone else, until recently when it became apparent that there was a problem.) What HTML is really meant to do is tell the client (browser, usually) what role the different parts of the document play and then let the browsers handle displaying it. You can compare this to Word or LaTeX, where you generally know what your final display will look like so that you can (and do) control what the appearance will be quite a bit. With the Web, you can make some guesses, but since a webpage can be viewed on different computers with different monitor settings with different browsers (if any browser at all!) in different operating systems, you really can't say a lot about how your end-user will be viewing your page. So this is why it's best to let the browser make more of the decisions. (As we'll see in the final session, style sheets let us give a healthy dose of suggestions to the user's client. The suggestions are generally abided, but the final choice is still on the user's end.)
In practical terms, here's what happens. There are two tags
for italicizing text and two for bolding it. For italics, you
have <i> and <em>
("emphasis"). For bold-face you have <b> and
<strong>. The former in each pair is a
physical style which tells the browser, "Yes, I really want this
text to be bold/italicized, definitely." The latter in each pair
is a logical style which says, "Hey, this bit of text is to be
emphasized/strengthened because it is important. Do whatever you
think best to render this." In most browsers, these are the same
thing, but not in all. For example, if I were blind and using a
speech browser, italics make little sense. But emphasized text
can be spoken with emphasis by my browser. For most uses, you
really want the logical styles, since you're not really so
interested in the appearance of the text exactly as in making sure
that the text gets its due attention. Occasionally, you really do
want to control the actual appearance, such as following certain
typesetting rules. (For example, technically, though
seldom in reality, foreign words and phrases are set in italics.) So you
should probably default to the logical styles of
<em> and <strong>.
A final word on italic and bold text. All of these tags (as with all text-styling tags) have closing tags. You want to remember this because all of the text inside the opening and closing tags has the style applied to it. If you forget a closing tag (or misplace it) you'll have extra text in that style. This can be very bad because lots of bolded text is read as shouting at the reader. This gets really annoying and rude if half of the web page is rendered this way! Italicized text is sort of worse, since it's actually rather hard to read in large quantities. So close off those tags! (And check your pages in a browser to make sure that you've done so.)
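Here is a small sketch of the four tags, plus the forgotten-closing-tag mistake described above. The sentences are invented for illustration:

```html
<p>Logical: this is <em>emphasized</em> and this is <strong>strongly emphasized</strong>.</p>
<p>Physical: this is <i>italic</i> and this is <b>bold</b>.</p>

<!-- A forgotten closing tag lets the style run on over the text that follows -->
<p>Only <strong>one word was meant to stand out, but the rest of the paragraph does too.</p>
```

In most graphical browsers the first two paragraphs will look alike; the difference shows up in clients that cannot or do not render italics and bold.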
HTML contains a lot of style tags of interest. Most of them
are logical, not physical. (Actually, the only remaining physical
tag which I can think of, <u> (underline), has
been deprecated and should not even be used.) Here are some
examples of nifty tags:
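<code>: marks a fragment of computer code. Most browsers render it in a monospaced font.
<pre>: preformatted text, shown with its spacing and line breaks as typed. This is a block-level element, so it does not belong inside <p> tags.
<blockquote>: Like the <pre> tag, the <blockquote> tag is a block-level element. In this case, it designates an extended quotation (more than a few sentences, typically). Most browsers treat this element by increasing the margins so that the text is narrower than other paragraphs.
<q>: a short, inline quotation.
<sub> and <sup>: subscript and superscript text.
<cite>: a citation or reference to another source.
<kbd>, <samp>, and <var>: text the user should type, sample output from a program, and a variable name, respectively.
<address>: contact information for the page or its author.
<abbr> and <acronym>: abbreviations and acronyms. If you set the title attribute to the full version of the acronym/abbreviation, when a user mouses over it they will see the translation. At least, this occurs in some browsers. This is kinda nifty, I think.

Note that this is not exactly an exhaustive list and that there are other tags out there. If you have logical-tag needs that aren't met by this list, check the HTML specifications to see if the tag you need exists.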
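A sketch of several of these logical tags in use follows. All of the sample text, including the command name and the maintainer's name, is made up for illustration:

```html
<p>Type <kbd>ls</kbd> and the shell prints something like <samp>index.html</samp>.</p>
<p>In the code, <code>x = x + 1</code> increments the variable <var>x</var>.</p>
<p>Water is H<sub>2</sub>O, and E = mc<sup>2</sup>.</p>
<p>As <cite>Hamlet</cite> puts it, <q>To be, or not to be</q>.</p>
<p><abbr title="HyperText Mark-up Language">HTML</abbr> is easier than it looks.</p>

<!-- Block-level elements stand outside of paragraphs -->
<blockquote>
  <p>A longer quotation goes here, set off from the surrounding paragraphs.</p>
</blockquote>

<pre>
Spacing   and
    line breaks   are kept as typed.
</pre>

<address>Page maintained by A. Webmaster</address>
```

How each of these is displayed is up to the browser, which is rather the point of a logical tag.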
These tags were either never part of HTML as such or have been officially
given the ax. They still work in many cases, but should be
avoided. Why, then, am I telling you about them? There are two
reasons. The first is because, like above with the
<body> tag attributes, I haven't told you how
to handle styling of documents using the preferred method, so
these folks will tide you over until I do so. The second reason
is what I like to call the "sex ed. rationale": you'll probably
see these tags around either in source code or hearing people
discuss them so it is just as well that I tell you about them now.
As I said, avoid them for now (but use them if you really feel a
compelling need) and expect to replace them in a few weeks, never
to use them again.
The <center> tag will do just that, center
(horizontally) everything that comes between it and its closing
element. Honestly, it's pretty simple.
The <font> tag is a way of controlling the
look of parts of your text. It has a number of attributes that
you can set, including color (sets the color),
face (sets the actual font/font family), and
size (sets the size of the font). Color just takes a
color value (see above for more on that topic). I'm not going to
even try to explain font families here, but you shouldn't be too
desperate to play with this for a while. Size, however, bears
discussing some. You have a few options with the
size attribute. One is to do a relative size
measurement, like size="+1" (bumps the font size up by
one unit). You can use any integer from -7 to +7 in this case.
(0 is not allowed for obvious reasons.) You can also specify an
absolute measurement, as in size="3". Absolute sizes run from 1 to 7,
and the result of a relative change is kept within that same scale.
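For completeness, here is a sketch of the two deprecated tags from this section in use. The text and the font names are my own choices, and note that in HTML 4.01 the attribute that picks the font is spelled face:

```html
<!-- Centers everything up to the closing tag -->
<center>
  <p><font color="#FF0000" size="+1">Slightly larger red text, centered.</font></p>
</center>

<!-- An absolute size plus a list of font faces to try, in order -->
<p><font face="Helvetica, Arial, sans-serif" size="3">Text at absolute size 3 in a sans-serif face.</font></p>
```

Like the other text-styling tags, the font tag needs its closing tag, or the styling keeps going.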
Finally, I've created a few exercises for you to try out. These aren't exactly brilliant, but really the best thing for you to do is to work on your own little website in order to try things out. And we'll see each other in a little while!