CGI Perl Programs & Forms




Introduction

A plain HTML document is static: it does not change, for example, in response to a database query. A Common Gateway Interface (CGI) program, by contrast, is dynamic and executes in real time. A CGI program can take a request for information from a database, which the user submits with an HTML form, query the database, perform any required processing, and return the results as a web page or a plain text document. The user visits our web site, and, as necessary, the CGI program runs on our machine. The CGI program can interface with both our web page and our database as well as perform additional processing.

Because the program runs on our computer, we must be conscious of security. Usually, a CGI program must be in the /cgi-bin directory, and the webmaster controls this directory, allowing only authorized programs to reside there.

For a CGI program, we can use any programming language whose programs the server machine can execute. C, C++, FORTRAN, Perl, Python, UNIX shell scripts, Visual Basic, AppleScript, and Tcl are common choices. We use Perl because of its wide use in scientific databases, such as genomic databases.

The current module discusses how a Perl CGI program interfaces with HTML forms, and the next module, "CGI Programs and Databases," covers how a Perl CGI program accesses MySQL databases.


Communication Methods

As Figure 1 in the module "Web Forms for Database Queries" shows, CGI is the interface for passing information between the web server and the CGI program, which is written in Perl or another programming language. CGI accomplishes the communication through four methods:

  • Environment variables
  • Command line (This method is not used very much and will not be discussed.)
  • Standard input
  • Standard output

The web server presents the user with a form, which the user completes and submits. Using standard input and environment variables, the server sends the information from the form to the CGI program. The program transmits SQL requests to the database, and the database returns the appropriate information, which the program processes and communicates to the web server through standard output.

Quick Review Question #1

Select the method(s) that CGI uses for communication.



CGI Output

For communication from a CGI program to the user, such as after accessing the database and performing any processing, the program writes the results to standard output, labeled with a MIME (Multipurpose Internet Mail Extensions) content type, and the web server returns this output to the browser. On the web page "Atomic Weights and Isotopic Compositions of the Elements with Relative Atomic Masses," the user can specify output as an HTML Table, a Pre-formatted ASCII Table, or Linearized ASCII Output. The latter two choices indicate an ASCII file, or text file, that is not a web page. In this case, the first line of output contains the content-type descriptor indicating plain text, as follows:

content-type: text/plain

Regardless of the content type, the second output line must be blank, so we end the first line with two newline characters, "\n\n". Suppose in a Perl CGI program the variables $formula, $MolecularWt, $RegistryNum, and $ChemStruct have appropriate values from the database. The following code segment produces a text file displaying the data:

print "content-type: text/plain\n\n";
print "Benzene\n\n";
print "      * Formula: $formula\n";
print "      * Molecular Weight: $MolecularWt\n";
print "      * CAS Registry Number: $RegistryNum\n";
print "      * Chemical Structure: [$ChemStruct]\n";

The text file that the browser displays is similar to the following:

Benzene

 

   * Formula: C6H6

   * Molecular Weight: 78.11

   * CAS Registry Number: 71-43-2

   * Chemical Structure: [C6H6]

For a web page, the first line of output contains a MIME content-type descriptor indicating an HTML document, as follows:

content-type: text/html 

In the Perl program, we are careful to display a blank line after this header by using two newline characters, "\n\n". Using the print function, we write HTML code to the standard output stream. Except for the content-type line, use of "\n" is optional but avoids having all the HTML code appear on one line. For the web page to have a paragraph or line break, we write the tag <p> or <br> to the output, as on the third-from-the-bottom line in the following segment:

print "content-type: text/html\n\n";
print "<html><head>\n";
print "<title>Benzene Search Results</title>\n";
print "</head><body>\n";
print "<h1>Benzene</h1>\n";
print "<ul>\n";
print "<li>Formula: $formula\n";
print "<li>Molecular Weight: $MolecularWt\n";
print "<li>CAS Registry Number: $RegistryNum\n";
print "<li>Chemical Structure: [$ChemStruct]\n";
print "</ul>\n";
print "<center>If you have comments or questions,<br>\n";
print "please contact us.</center>\n";
print "</body></html>\n";

The segment generates the following output:

content-type: text/html

<html><head>
<title>Benzene Search Results</title>
</head><body>
<h1>Benzene</h1>
<ul>
<li>Formula: C6H6
<li>Molecular Weight: 78.11
<li>CAS Registry Number: 71-43-2
<li>Chemical Structure: [C6H6]
</ul>
<center>If you have comments or questions,<br>
please contact us.</center>
</body></html>
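Because the form lets the user choose between HTML and ASCII output, a single routine can select the content-type line and the matching body. The following is a minimal sketch, not the module's own code: the subroutine name render_result and the hash keys name, formula, weight, and registry are hypothetical, and the sketch assumes the values contain no HTML special characters.

```perl
use strict;
use warnings;

# Hypothetical helper: build the complete CGI response for one compound.
# $format is 'html' for a web page; any other value yields plain text.
# $data is a hash reference with the (assumed) keys used below.
sub render_result {
    my ($format, $data) = @_;
    my @rows = (
        [ 'Formula',             $data->{formula}  ],
        [ 'Molecular Weight',    $data->{weight}   ],
        [ 'CAS Registry Number', $data->{registry} ],
    );

    if ($format eq 'html') {
        # Header line, blank line, then the HTML document.
        my $out = "content-type: text/html\n\n";
        $out .= "<html><head><title>$data->{name}</title></head><body>\n";
        $out .= "<h1>$data->{name}</h1>\n<ul>\n";
        $out .= "<li>$_->[0]: $_->[1]</li>\n" for @rows;
        $out .= "</ul>\n</body></html>\n";
        return $out;
    }

    # Header line, blank line, then the plain text report.
    my $out = "content-type: text/plain\n\n";
    $out .= "$data->{name}\n\n";
    $out .= "   * $_->[0]: $_->[1]\n" for @rows;
    return $out;
}

print render_result('text', {
    name     => 'Benzene',
    formula  => 'C6H6',
    weight   => '78.11',
    registry => '71-43-2',
});
```

Returning the response as a string, rather than printing inside the subroutine, keeps the header and body together and makes the routine easy to check outside a web server.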


Quick Review Question #2

Have no blanks in your answer and use all lowercase.

A: Give the hyphenated word that begins the first line of output for a standard output file from a CGI program.

B: Give the punctuation symbol that immediately follows this word.

C: Give the remainder of the content-type descriptor line to indicate ASCII output.

D: Give the remainder of the content-type descriptor line to indicate web page output.

E: Give the minimum number of blank lines, if any, that must follow the content-type descriptor line.

F: For HTML code output from a CGI program, check all items that we can use to cause advancement to a new line in the output file.

G: For HTML code output from a CGI program, check all items that we can use to cause advancement to a new line or paragraph in the resulting web page.


CGI Input

We employ forms to enter data or query a database via the web. A CGI program must obtain this data before accessing the database. For example, on the page "Atomic Weights and Isotopic Compositions of the Elements with Relative Atomic Masses," suppose we type "Li" for the atomic symbol of lithium, choose "HTML Table," and click "Get Data". (This page is derived from one at http://physics.nist.gov/PhysRefData/Compositions/index.html.) With the form method get, the URL for the result is as follows:

http://physics.nist.gov/cgi-bin/Compositions/stand_alone.pl?ele=Li&ascii=html

After the question mark (?), the CGI query string contains a list of name-value pairs. For example, in the "Atomic Symbol or Number" text box we typed "Li", and the HTML code reveals that the name of this text box is ele. Thus, the URL string contains "ele=Li". Similarly, the string also contains "ascii=html" because we selected the button with value "html" from the ascii radio button group. An ampersand (&) separates these name-value pairs.

If the form method for submission were post instead of get, the question mark and CGI query string would not appear in the URL, thus simplifying the URL and avoiding URL length restrictions from the browser. With post, a second step occurs in which, invisible to the user, the server passes the query string to the CGI program through standard input.

A query string, such as "ele=Li&ascii=html", is encoded in standard URL format, in which a blank is replaced by a plus (+) and a special (non-alphanumeric) character, such as the slash (/), is replaced by a percent sign (%) followed by its ASCII code in hexadecimal (base 16) representation, such as %2F. Thus, the string "val = x + 85.2" with blanks around the equals sign and plus sign is encoded as "val+%3D+x+%2B+85.2". In the encoded string, four pluses replace the blanks; 3D is the hexadecimal ASCII code for '='; and 2B is the code for '+'. The character-by-character encoding is as follows:

 

String:          v   a   l   (blank)   =     (blank)   x   (blank)   +     (blank)   8   5   .   2
Encoded string:  v   a   l   +         %3D   +         x   +         %2B   +         8   5   .   2
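The encoding rule can be sketched as a short Perl subroutine. This is an illustration only: the name url_encode is hypothetical, the input is assumed to be plain ASCII, and the set of characters left unencoded (letters, digits, and - _ . *) is an assumption consistent with the example, in which the period in 85.2 is unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: encode an ASCII string for use in a query string.
sub url_encode {
    my ($text) = @_;
    # Replace each special character (but not the blank) by % and its
    # two-digit hexadecimal ASCII code.
    $text =~ s/([^A-Za-z0-9\-_.* ])/sprintf("%%%02X", ord($1))/ge;
    # Replace each blank by a plus.
    $text =~ tr/ /+/;
    return $text;
}

print url_encode("val = x + 85.2"), "\n";   # prints: val+%3D+x+%2B+85.2
```

The special characters are encoded before the blanks become pluses. In the opposite order, the pluses standing for blanks would themselves be turned into %2B.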

 

The CGI program must decode the encoded string before processing further. Fortunately, libraries exist for performing this task in several languages. However, for generality and to illustrate the process, we use Perl to decode the query string.

Quick Review Question #3

A: If three (3) ampersands are in a query string, what is the number of name-value pairs?

B: Give the number of characters in the decoded value of the query string "seq=%27%2F3%27".

C: Suppose a user types three characters (#, a blank, and 5) in a form input blank that has the name val.  Referring to a table of ASCII codes, give the encoded query substring containing the name-value pair.



Process a Query String

Suppose the variable $posted_information stores a query string. The following lines of Perl decode the query string from its URL-encoded form back into ASCII:

$posted_information =~ s/\+/ /g;
$posted_information =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;

The first line replaces each + sign with a space. This substitution must come first; otherwise, a plus sign decoded from %2B would be mistaken for an encoded blank. The second line looks for a percent sign followed by two hexadecimal digits and decodes each match using the pack function. The "C" argument indicates that the value should be converted to the character with that code.
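As a check, the decoding can be wrapped in a subroutine and applied to the encoded string from the earlier example. This is a minimal sketch; the name url_decode is hypothetical. It converts plus signs to blanks before decoding the percent codes, so that a plus sign the user actually typed, which arrives as %2B, is not turned into a blank.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one URL-encoded string.
sub url_decode {
    my ($text) = @_;
    # Every literal '+' in the encoded string stands for a blank.
    $text =~ tr/+/ /;
    # Replace %XX by the character whose hexadecimal code is XX.
    $text =~ s/%([\dA-Fa-f]{2})/pack("C", hex($1))/eg;
    return $text;
}

print url_decode("val+%3D+x+%2B+85.2"), "\n";   # prints: val = x + 85.2
```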

Now that the query string is decoded, we use the split function to divide it into key-value pairs. The following line splits the string at each ampersand (&) and puts the resulting key-value pairs into an array called @fields:

@fields = split(/&/, $posted_information); 

The first argument, which is between forward slashes, is the pattern that separates values in the string. For instance, an argument of /:/ would split the string at colons. The @fields variable can be indexed like an array in C++, where $fields[0] is the first value in the array, $fields[1] is the second, and so forth. Note that when accessing an individual element of an array, we precede the variable name with a dollar sign ($).

Suppose that the variable $posted_information contains an email address and a telephone number submitted from a form, so that $posted_information looks as follows after decoding from hexadecimal:

$posted_information = 'email=name@yahoo.com&number=123-4567';

After calling the function split as above, $fields[0] contains the value "email=name@yahoo.com" and $fields[1] has "number=123-4567". By calling split again, we obtain the desired information.

($label, $email_address) = split(/=/, $fields[0]);

($label, $phone_number) = split(/=/, $fields[1]);

For this example, we separate values by an equals sign.

The function split returns an array, or list. Sometimes we know the number of fields split returns. For example, with email=name@yahoo.com, the function returns two fields. In this case, instead of using an arbitrary array name, we can give our own list of variables, and Perl will assign the values that split returns to the individual variables in order. Consequently, after execution of the above code, $email_address contains the value "name@yahoo.com" and $phone_number contains the value "123-4567".
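The steps of this section can be combined into one routine that returns the form data as an associative array. The following is a sketch under stated assumptions rather than the module's own code: the name parse_query is hypothetical, and, unlike the walkthrough above, it splits the still-encoded string first and decodes each name and value afterward, so that an ampersand or equals sign the user typed (arriving as %26 or %3D) is not mistaken for a separator.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an encoded query string into a hash of
# decoded name-value pairs.
sub parse_query {
    my ($query) = @_;
    my %form;
    foreach my $pair (split /&/, $query) {
        # Split at the first '=' only; the limit 2 keeps any later '='.
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        # Decode the name and the value in place.
        for ($name, $value) {
            tr/+/ /;
            s/%([\dA-Fa-f]{2})/pack("C", hex($1))/eg;
        }
        $form{$name} = $value;
    }
    return %form;
}

my %form = parse_query('email=name%40yahoo.com&number=123-4567');
print "$form{email} $form{number}\n";   # prints: name@yahoo.com 123-4567
```

If a name occurs more than once in the query string, this sketch keeps only the last value.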



Environment Variables

The server sets seventeen environment variables that a CGI program can access. When a CGI program is called, these variables are available to the program. For example, for the method get, the environment variable QUERY_STRING has as its value the query string, that is, the string after the question mark. The value of QUERY_STRING is "ele=Li&ascii=html" in the following example:

http://physics.nist.gov/cgi-bin/Compositions/stand_alone.pl?ele=Li&ascii=html

With the associative array %ENV, we can determine the character string value of any environment variable. For example, to obtain the query string when the method is get, we index %ENV with "QUERY_STRING", as follows in Perl:

$QueryString = $ENV{"QUERY_STRING"};

We can determine the query method using the index "REQUEST_METHOD", as follows:

$RequestMethod = $ENV{"REQUEST_METHOD"};
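To see which environment variables a particular server supplies, a small script can list the contents of %ENV as plain text. The following is a minimal sketch intended for experimentation only; the subroutine name env_report is hypothetical, and such a listing should not be left on a public server because it can reveal configuration details.

```perl
use strict;
use warnings;

# Hypothetical helper: format a hash of environment variables as a
# plain text CGI response, one "NAME = value" line per variable.
sub env_report {
    my ($env) = @_;
    my $out = "content-type: text/plain\n\n";
    foreach my $name (sort keys %$env) {
        $out .= "$name = $env->{$name}\n";
    }
    return $out;
}

print env_report(\%ENV);
```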

With a post request, the server sends the form's data to the CGI program's standard input but does not necessarily send the end-of-file marker, a special symbol indicating the end of the file. Thus, to process the correct amount of input, we use the value of the environment variable CONTENT_LENGTH, which contains the length of the query string. In the following statement, we obtain the value of this environment variable, which is a character string:

$ContentLength = $ENV{"CONTENT_LENGTH"};

Knowing that the request method is post and the length of the query string, we can read the characters into a string variable $QueryString, as follows:

read(STDIN, $QueryString, $ContentLength);
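The get and post cases can be handled by one routine that checks REQUEST_METHOD and then takes the query string from the appropriate source. The following is a sketch, not a definitive implementation: the name read_query_string is hypothetical, and the environment hash and input filehandle are passed as arguments (an assumption made here so that the routine can be tried outside a web server).

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw (still encoded) query string.
# $env is a reference to a hash such as %ENV; $in is a filehandle
# such as STDIN from which a post body can be read.
sub read_query_string {
    my ($env, $in) = @_;
    my $method = uc($env->{REQUEST_METHOD} || 'GET');

    if ($method eq 'POST') {
        # Read exactly CONTENT_LENGTH characters; do not wait for
        # an end-of-file marker.
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $query  = '';
        read($in, $query, $length) if $length > 0;
        return $query;
    }

    # For get, the server places the query string in QUERY_STRING.
    my $query = $env->{QUERY_STRING};
    return defined $query ? $query : '';
}
```

In a CGI script, the call would be read_query_string(\%ENV, \*STDIN), after which the result can be split and decoded as described in "Process a Query String."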



Exercises

1.     a.      Write a Perl segment to generate a plain text output with a greeting to the user.

        b.      Repeat Part a, generating HTML code output.



Projects

1.    Create a web page with a text box for the user's name and two radio buttons indicating plain text or HTML output. Develop a Perl script to generate a plain text output file or a web page, depending on the selected radio button. Each output file should contain a greeting to the user.

2.    Create a web page that enables the user to type a binary or hexadecimal number in a text box and to have a Perl script return the corresponding decimal number. Have a pair of radio buttons to indicate whether the original number is in base 2 or 16.

3.    Create a web page with a Perl script to perform a temperature conversion between the Fahrenheit and Celsius systems. Enable the user to type a number and to indicate the kind of conversion to perform. Have the answer also appear on a web page. The following formulas convert a temperature in Fahrenheit (F) to its equivalent in Celsius (C) and vice versa: C = (5/9)(F - 32) and F = (9/5)C + 32

4.    The following is a MySQL statement to create the table ecs_spectra:

create table ecs_spectra (

       spec            varchar(10),

       wavelength      varchar(10),

       Rel_int         varchar(10),

       Aki             float,

       Acc             char(2),

       Ei              varchar(10),

       Ej              varchar(10),

       Configurations  varchar(15),

       Terms           char(6),

       Ji              char(3),

       Jk              char(3),

       Gi              integer,

       Gk              integer,

       Type            char(2),

       Tp_refs         varchar(10),

       Line_refs       varchar(10)

);

         Table 1 of "Accessing Atomic Spectra Database Assignment" from Project 1 of "Accessing Databases with SQL" explains the meanings of the fields. The data type varchar(n) is the type of a variable length string of at most n characters, while char(n) is the type of a string of exactly n characters. The structure of the table ecs_spectra was derived by Dr. Orlando Karam from the NIST Atomic Spectra Database.

         Create a web page with an HTML form to allow the user to specify values of certain fields and to indicate the desired data. Develop a Perl script to access the web page and generate a text document thanking the user and displaying the information from the request. (After the module on "CGI Programs and Databases," we can return the desired data to the user.)

5.    The following is a MySQL statement to create the table ecs_sf_sites:

create table ecs_sf_sites (

       id              char(12) primary key,

       site_name       varchar(255),

       street_addr     varchar(255),

       city            varchar(255),

       state           varchar(255),

       zip             varchar(255),

       county          varchar(255),

       site_smsa     varchar(255),

       fed_facil     char(1),

       npl_stat        char(1),

       corp_link     varchar(255),

       rod_link        varchar(255),

       latitude        float,

       longitude     float,

       ownership     varchar(255),

       site_incident   varchar(255)

);

         The data type varchar(n) is the type of a variable length string of at most n characters, while char(n) is the type of a string of exactly n characters. The structure of the table ecs_sf_sites was derived by Dr. Orlando Karam from the EPA's Superfund (CERCLIS) Database.

         Create a web page with an HTML form to allow the user to specify values of certain fields and to indicate the desired data. Develop a Perl script to access the web page and generate a text document thanking the user and displaying the information from the request. (After the module on "CGI Programs and Databases," we can return the desired data to the user.)

6.    The following are MySQL statements to create the tables FixedType and ClassCodeRef:

create table FixedType (

       constellation     char(4) not null,

       ObjectName        varchar(30) not null,

       ClassCode         enum('D', 'S', 'V'),

       SpectralClass      char(2),

       hours             int,

       minutes           int,

       TimeSec           int,

       degrees           int,

       AngleSec          int,

       magnitude         float,

       primary key(constellation, ObjectName)

);

create table ClassCodeRef (

       ClassCode     enum('D', 'S', 'V') not null primary key,

       ClassCodeName varchar(255)

);

         Table 1 of "Accessing Star Database Assignment" from Project 3 of "Accessing Databases with SQL" explains the meanings of the fields. The data type varchar(n) is the type of a variable length string of at most n characters, while char(n) is the type of a string of exactly n characters. (The structure was derived by Dr. Orlando Karam from star.dat by Dr. Dan Welch; see Project 5 of "Introduction to Databases".)

         Create a web page with an HTML form to allow the user to specify values of certain fields and to indicate the desired data. Develop a Perl script to access the web page and generate a text document thanking the user and displaying the information from the request. (After the module on "CGI Programs and Databases," we can return the desired data to the user.)
