HTML forms are the user interface that provides input to your CGI scripts. They are primarily used for two purposes: collecting data and accepting commands. Examples of data you collect may include registration information, payment information, and online surveys. You may also collect commands via forms, such as using menus, checkboxes, lists, and buttons to control various aspects of your application. In many cases, a single form will include elements for both collecting data and controlling the application.
A great advantage of HTML forms is that you can use them to create a frontend for numerous gateways (such as databases or other information servers) that can be accessed by any client without worrying about platform dependency.
This chapter covers:
How form data is sent to the server
How to use HTML tags for writing forms
How CGI scripts decode the form data
In the last couple of chapters, we have referred to the options that a browser can include with an HTTP request. In the case of a GET request, these options are included as the query string portion of the URL passed in the request line. In the case of a POST request, these options are included as the content of the HTTP request. These options are typically generated by HTML forms.
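As a rough illustration of these two delivery paths, the sketch below shows how a script might pick up the still-encoded option string under the CGI convention: from the QUERY_STRING environment variable for a GET request, or by reading CONTENT_LENGTH bytes from standard input for a POST request. It is written in Python purely for brevity; the function name read_raw_form_data and its two parameters are hypothetical, introduced only for this example.

```python
import io
import os
import sys


def read_raw_form_data(environ=None, stdin=None):
    # environ and stdin are hypothetical parameters added so the function
    # can be tried without a web server; a real script would rely on the
    # defaults (os.environ and standard input).
    environ = os.environ if environ is None else environ
    method = environ.get("REQUEST_METHOD", "GET").upper()

    if method == "POST":
        # POST: the options arrive as the content of the HTTP request,
        # which the server hands to the script on standard input.
        stream = sys.stdin.buffer if stdin is None else stdin
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stream.read(length).decode("ascii")

    # GET: the options arrive as the query string portion of the URL.
    return environ.get("QUERY_STRING", "")


# Two simulated requests carrying the same option string.
get_data = read_raw_form_data(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "option=value"})
post_data = read_raw_form_data(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "12"},
    io.BytesIO(b"option=value"))
print(get_data == post_data)  # True
```

Either way the script ends up with the same encoded string; only the place it arrives differs.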
Each HTML form element has an associated name and value, like this checkbox:
<INPUT TYPE="checkbox" NAME="send_email" VALUE="yes">
If this checkbox is checked, then the option send_email with a value of yes is sent to the web server. Other form elements, which we will look at in a moment, act similarly. Before the browser can send form option data to the server, the browser must encode it. There are currently two ways of encoding form data. The default encoding, which has the media type of application/x-www-form-urlencoded, is used almost exclusively. The other encoding, multipart/form-data, is primarily used with forms that allow the user to upload files to the web server. We will look at this in Section 5.2.4, "File Uploads with CGI.pm".
For now, let's look at how application/x-www-form-urlencoded works. As we mentioned, each HTML form element has a name and a value attribute. First, the browser collects the names and values for each element in the form. It then takes these strings and encodes them according to the same rules for encoding URL text that we discussed in Chapter 2, "The Hypertext Transport Protocol". If you recall, characters that have special meaning for HTTP are replaced with a percent sign and a two-digit hexadecimal number; spaces are replaced with +. For example, the string "Thanks for the help!" would be converted to "Thanks+for+the+help%21".
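The encoding rule can be tried out with a few lines of Python, used here only as a convenient illustration. The hand-written encode_value function and its SAFE set are hypothetical names for this sketch; the set is chosen to match what Python's urllib.parse.quote_plus leaves unescaped, and real browsers may differ slightly in which punctuation they leave alone.

```python
import string
from urllib.parse import quote_plus, unquote_plus

# Assumption: characters left as-is, chosen to match quote_plus.
SAFE = set(string.ascii_letters + string.digits + "-_.~")


def encode_value(text):
    # Spaces become "+"; anything outside SAFE becomes "%" followed by
    # the two-digit hexadecimal value of the byte.
    pieces = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char == " ":
            pieces.append("+")
        elif char in SAFE:
            pieces.append(char)
        else:
            pieces.append("%{:02X}".format(byte))
    return "".join(pieces)


encoded = encode_value("Thanks for the help!")
print(encoded)                # Thanks+for+the+help%21
print(unquote_plus(encoded))  # Thanks for the help!
```

In practice a library routine such as quote_plus would be used rather than a hand-rolled loop; the loop is spelled out only to make the substitution rule visible.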
Next, the browser joins each name and value with an equals sign. For example, if the user entered "30" when asked for the age, the key-value pair would be "age=30". The key-value pairs are then joined to one another, using the "&" character as a delimiter. Here is an example of an HTML form:
<HTML>
<HEAD>
  <TITLE>Mailing List</TITLE>
</HEAD>
<BODY>
  <H1>Mailing List Signup</H1>
  <P>Please fill out this form to be notified via email about updates and future product announcements.</P>
  <FORM ACTION="/cgi/register.cgi" METHOD="POST">
    <P>
      Name: <INPUT TYPE="TEXT" NAME="name"><BR>
      Email: <INPUT TYPE="TEXT" NAME="email">
    </P>
    <HR>
    <INPUT TYPE="SUBMIT" VALUE="Submit Registration Info">
  </FORM>
</BODY>
</HTML>
Figure 4-1 shows how the form looks in Netscape with some sample input.
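When this form is submitted, the browser encodes the two named elements (the submit button has no NAME attribute, so it contributes nothing) as:

name=Mary+Jones&email=mjones%40jones.com

Because the form's METHOD is POST, this string is sent as the content of the HTTP request:

POST /cgi/register.cgi HTTP/1.1
Host: localhost
Content-Length: 40
Content-Type: application/x-www-form-urlencoded

name=Mary+Jones&email=mjones%40jones.com

If the form used METHOD="GET" instead, the same string would be appended to the URL as the query string:

GET /cgi/register.cgi?name=Mary+Jones&email=mjones%40jones.com HTTP/1.1
Host: localhost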
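To tie the steps together, here is a minimal sketch that builds the encoded string for the mailing list form and then decodes it again, which is the job a CGI script faces on the receiving end. It uses Python's standard urllib.parse module as a stand-in for whatever decoding routine your script relies on; the variable names are illustrative only.

```python
from urllib.parse import parse_qs, urlencode

# The two named form elements and the sample input.
fields = [("name", "Mary Jones"), ("email", "mjones@jones.com")]

# Encode each name and value, join each pair with "=",
# then join the pairs with "&".
body = urlencode(fields)
print(body)       # name=Mary+Jones&email=mjones%40jones.com
print(len(body))  # 40, the byte count of the encoded string

# For a GET request the same string becomes the query string.
request_line = "GET /cgi/register.cgi?" + body + " HTTP/1.1"
print(request_line)

# Decoding reverses the steps. parse_qs maps every name to a list,
# because a name may legitimately appear more than once in a form.
decoded = parse_qs(body)
print(decoded["name"][0])   # Mary Jones
print(decoded["email"][0])  # mjones@jones.com
```

Returning a list per name is a design choice worth noting: elements such as checkbox groups or multi-select lists can submit several values under one name.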
Copyright © 2001 O'Reilly & Associates. All rights reserved.