CGI Programs and Web Forms




Introduction

A plain HTML document produces a static web page that does not change, for example, in response to a database query. A Common Gateway Interface (CGI) program, by contrast, is dynamic and executes in real time. A CGI program can take a request for information that the user submits with an HTML form, query the database, perform any required processing, and return the results as a web page or a plain text document. The user visits our web site, and, as necessary, the CGI program runs on our machine, interfacing with both our web page and our database and performing any additional processing.

Because the program runs on our computer, we must be conscious of security. Usually, a CGI program must reside in the /cgi-bin directory, which the webmaster controls, allowing only authorized programs in this area.

For a CGI program, we can use any programming language whose programs the web server can execute, whether compiled executables or interpreted scripts. C, C++, FORTRAN, Perl, Python, UNIX shell scripts, Visual Basic, AppleScript, and Tcl are common choices. We use C++ in this and the next module to avoid introducing another language and because C++ programs are compiled, and compiled programs usually execute faster than Perl scripts. Such compiled programs are particularly advantageous for numerous queries of large scientific databases.

This module discusses the interface between C++ CGI programs and HTML forms, and the next module, "CGI Programs and Databases," covers C++ CGI program access of MySQL databases.


Communication Methods

As Figure 1 in the module "Web Forms for Database Queries" shows, CGI is the interface for passing information between the web server and the CGI program, which is in C++ or another programming language. CGI accomplishes the communication through four methods: command line, environment variables, standard input, and standard output.

The web server presents the user with a form that he or she completes and submits. Using standard input and environment variables, the server sends the information from the form to the CGI program. The program transmits SQL requests to the database, and the database returns the appropriate information, which the program processes and communicates to the web server through standard output.

Quick Review Question 1
Select the method(s) that CGI uses for communication.
                   
command line  directories    environment variables
schema  standard query language  standard input
standard output TCP/IP


CGI Output

For communication from a CGI program to the user, such as after accessing the database and performing any processing, the program writes its results to standard output, beginning with a MIME (Multipurpose Internet Mail Extensions) content-type line, and the web server returns this output to the browser. On the web page "Atomic Weights and Isotopic Compositions of the Elements with Relative Atomic Masses," the user can specify the output as an HTML Table, a Preformatted ASCII Table, or Linearized ASCII Output. The latter two choices produce an ASCII file, or text file, that is not a web page. In this case, the first line of output contains the content-type descriptor indicating plain text, as follows:

content-type: text/plain

Regardless of the content type, the line after the content-type line must be blank, so we write endl twice. Suppose in a C++ CGI program the variables formula, MolecularWt, RegistryNum, and ChemStruct have appropriate values from the database. The following code segment produces a text file displaying the data:

cout << "content-type: text/plain" << endl << endl;
cout << "Benzene" << endl << endl;
cout << " * Formula: " << formula << endl;
cout << " * Molecular Weight: " << MolecularWt << endl;
cout << " * CAS Registry Number: " << RegistryNum << endl;
cout << " * Chemical Structure: [" << ChemStruct << "]" << endl;

The text file that the browser displays is similar to the following:

Benzene

* Formula: C6H6
* Molecular Weight: 78.11
* CAS Registry Number: 71-43-2
* Chemical Structure: [C6H6]
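To see the whole response at once, the segment above can be packaged as a function that builds the output in a string. This is a minimal sketch, not the module's own code: the function name PlainTextResult, its parameters, and the use of std::ostringstream are assumptions, and a real CGI program would send the result with std::cout << PlainTextResult(...) after retrieving the values from the database.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds the complete plain-text CGI response for one
// compound. A CGI program would send it to the server with
// std::cout << PlainTextResult(...);
std::string PlainTextResult(const std::string& name, const std::string& formula,
                            double MolecularWt, const std::string& RegistryNum,
                            const std::string& ChemStruct)
{
    std::ostringstream out;
    out << "content-type: text/plain\n\n";   // header line plus required blank line
    out << name << "\n\n";
    out << " * Formula: " << formula << "\n";
    out << " * Molecular Weight: " << MolecularWt << "\n";
    out << " * CAS Registry Number: " << RegistryNum << "\n";
    out << " * Chemical Structure: [" << ChemStruct << "]\n";
    return out.str();
}
```

Writing "\n" produces the same lines as endl; endl additionally flushes the stream.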

For a web page, the first line of output contains a MIME content-type descriptor indicating an HTML document, as follows:

content-type: text/html

In the C++ program, we are careful to output a blank line after this header by writing endl twice. Using the insertion operator <<, we write HTML code to the standard output stream cout. Except after the content-type line, using '\n' or endl is optional, but doing so keeps all the HTML code from appearing on one line. For the web page itself to have a paragraph or line break, we write the tag <p> or <br> to the output, as on the third line from the bottom in the following segment:

cout << "content-type: text/html" << endl << endl;
cout << "<html>" << endl << "<head>" << endl;
cout << "<title>Benzene Search Results</title>" << endl;
cout << "</head>" << endl << endl << "<body>" << endl;
cout << "<h1>Benzene</h1>" << endl;
cout << "<ul>" << endl;
cout << "<li>Formula: " << formula << endl;
cout << "<li>Molecular Weight: " << MolecularWt << endl;
cout << "<li>CAS Registry Number: " << RegistryNum << endl;
cout << "<li>Chemical Structure: [" << ChemStruct << "]" << endl;
cout << "</ul>" << endl;
cout << "<center> If you have comments or questions, <br>" << endl;
cout << "please contact us. </center>" << endl;
cout << "</body>" << endl << "</html>" << endl;

The segment generates the following output:

content-type: text/html

<html>
<head>
<title>Benzene Search Results</title>
</head>

<body>
<h1>Benzene</h1>
<ul>
<li>Formula: C6H6
<li>Molecular Weight: 78.11
<li>CAS Registry Number: 71-43-2
<li>Chemical Structure: [C6H6]
</ul>
<center> If you have comments or questions, <br>
please contact us. </center>
</body>
</html>


Quick Review Question 2. Have no blanks in your answers and use all lowercase. 

a. Give the hyphenated word that begins the first line of output for a standard output file from a CGI program.


b. Give the punctuation symbol that immediately follows this word.


c. Give the remainder of the content-type descriptor line to indicate ASCII output.


d. Give the remainder of the content-type descriptor line to indicate web page output.


e. Give the minimum number of blank lines, if any, that must follow the content-type descriptor line.


f. For ASCII code output from a CGI program, check all items that we can use to cause advancement to a new line in the output file.
<br>        endl           \n           <p>


g. For HTML code output from a CGI program, check all items that we can use to cause advancement to a new line or paragraph in the resulting web page.

    <br>        endl           \n            <p>


CGI Input

We employ forms to enter data or query a database via the web. A CGI program must obtain this data before accessing the database. For example, using the page "Atomic Weights and Isotopic Compositions of the Elements with Relative Atomic Masses," suppose we type "Li" for the atomic symbol of lithium, choose "HTML Table," and click "Get Data". (This page is derived from one at http://physics.nist.gov/PhysRefData/Compositions/index.html.) With the form method get, the URL for the result is as follows:

http://physics.nist.gov/cgi-bin/Compositions/stand_alone.pl?ele=Li&ascii=html

After the question mark (?), the CGI query string contains a list of name-value pairs. For example, in the "Atomic Symbol or Number" text box we typed "Li", and the page's HTML source reveals that the name of this text box is ele. Thus, the URL string contains "ele=Li". Similarly, the string also contains "ascii=html" because we selected the button with value "html" from the ascii radio button group. An ampersand (&) separates these name-value pairs.

If the form method for submission were post instead of get, the question mark and CGI query string would not appear in the URL, thus simplifying the URL and avoiding the browser's URL length restrictions. With post, the server instead passes the query string to the CGI program through standard input, invisibly to the user.

A query string, such as "ele=Li&ascii=html", is encoded in standard URL format, in which a blank is replaced by a plus (+) and most non-alphanumeric characters, such as the slash (/), are replaced by a percent sign (%) followed by the character's ASCII code in two-digit hexadecimal (base 16) form, such as %2F. Letters, digits, and a few characters such as the period remain unchanged. Thus, the string "val = x + 85.2", with blanks around the equals sign and plus sign, is encoded as "val+%3D+x+%2B+85.2". In the encoded string, four pluses replace the blanks; 3D is the hexadecimal ASCII code for '='; and 2B is the code for '+'. The character-by-character encoding is as follows:

String     Encoded String
v          v
a          a
l          l
(blank)    +
=          %3D
(blank)    +
x          x
(blank)    +
+          %2B
(blank)    +
8          8
5          5
.          .
2          2
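The browser, not the CGI program, performs this encoding, but a short encoder makes the rule concrete and lets us check the example. This is a sketch under stated assumptions: the name FormEncode is hypothetical, and it assumes, as browsers do for form data, that letters, digits, and the characters . - _ * stay unchanged while every other character is written as %XX with uppercase hexadecimal digits.

```cpp
#include <cctype>
#include <string>

// Hypothetical sketch of the browser-side form encoding described above.
// Assumption: letters, digits, and . - _ * are copied unchanged,
// a blank becomes '+', and every other byte becomes '%' plus two
// uppercase hexadecimal digits.
std::string FormEncode(const std::string& s)
{
    const char* hexDigits = "0123456789ABCDEF";
    std::string encoded;
    for (char ch : s) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c) || c == '.' || c == '-' || c == '_' || c == '*')
            encoded += ch;                    // copied as is
        else if (c == ' ')
            encoded += '+';                   // blank -> plus
        else {
            encoded += '%';
            encoded += hexDigits[c / 16];     // first hexadecimal digit
            encoded += hexDigits[c % 16];     // second hexadecimal digit
        }
    }
    return encoded;
}
```

For example, FormEncode("val = x + 85.2") yields "val+%3D+x+%2B+85.2", matching the table.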

The CGI program must decode the encoded string before processing further. Fortunately, libraries exist for performing this task in several languages. NCSA's "Decoding FORMs with CGI" and "Programs and Scripts: C and C++: Libraries and Classes" present lists of such libraries. However, for generality and to illustrate the process, we write our own C++ code to decode the query string.

Quick Review Question 3.

a. If three (3) ampersands are in a query string, select the number of name-value pairs.

2 4 Impossible to tell

b. Give the number of characters in the decoded value of the query string "seq=%27%2F3%27".


c. Suppose a user types three characters, #, a blank, and 5, in a form input box that has the name val. Referring to the table of ASCII codes, give the encoded query substring containing the name-value pair.


Algorithms to Process a Query String

Figure 1 gives a structure chart for functions to process a query string. NamesValues takes the query string and sends back two string arrays of corresponding names and values along with the number of such pairs. ConvertPair receives a string containing an unconverted name-value pair and returns a string name and the corresponding decoded string value. PlusToBlank changes all pluses to blanks in a string. HexToChar converts each percent sign followed by two hexadecimal digits to the corresponding character. HexToDec returns a decimal number (0-15) corresponding to a hexadecimal character ('0'-'9', 'A'-'F', or 'a'-'f').

Figure 1. Structure chart for obtaining arrays of names and values from query string

In the query string, the '&' character separates name-value pairs. Thus, as the following algorithm for NamesValues indicates, we repeatedly search for '&' to obtain a substring with the encoded pair. We send this encoded pair string to ConvertPair, which returns two strings, the name and the decoded value. Decoding changes each '+' to a blank and each '%' followed by a two-digit hexadecimal number to the character that the number encodes.


NamesValues (QueryString, name[], value[], NumPairs)
Function to take a query string and send back the number of name-value pairs and two arrays with the corresponding names and values

Pre: QueryString is a well-formed query string.

Post: name is a string array with the names.
value is a string array with the values. NumPairs is the number of name-value pairs.
Algorithm:
     NumPairs <- 0
     while the length of QueryString is greater than 0
          if '&' is not found in QueryString // on last pair
               pair <- QueryString
               QueryString <- null string
          else
               pair <- substring of QueryString from beginning up to '&'
               QueryString <- QueryString from just beyond first '&' to end of string
          call ConvertPair(pair, name[NumPairs], value[NumPairs])
          increment NumPairs by 1

The algorithm for ConvertPair, which follows, searches for the first '=', the character that separates the name substring from the value substring. The name substring, which occurs before '=', needs no further processing. However, we call PlusToBlank for the plus-to-blank conversion and HexToChar for the hexadecimal-code-to-character translation of the encoded value substring.

ConvertPair(pair, name, value)

Function to convert a string containing a name-value pair to a string name and a string value

Pre: pair is a string containing a name-value pair.

Post: name is a string containing the name.
         value is a string containing the value.

Algorithm:
     name <- substring of pair up to '='
     value <- substring of pair from beyond '=' to end of string
     call PlusToBlank(value) to convert each '+' to blank
     call HexToChar(value) to convert each %-hexadecimal-number to a character
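The splitting loop of NamesValues translates almost line for line into C++ with the std::string functions find and substr, which the section "C++ Code for the CGI Input Algorithms" describes. In this sketch the ConvertPair call is left out so that the code runs on its own; the name SplitPairs and the use of a std::vector in place of a fixed-size array are assumptions for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper mirroring the NamesValues loop: it collects the
// still-encoded "name=value" substrings instead of calling ConvertPair.
std::vector<std::string> SplitPairs(std::string QueryString)
{
    std::vector<std::string> pairs;
    while (QueryString.length() > 0) {
        std::string::size_type loc = QueryString.find('&');
        if (loc == std::string::npos) {                 // on last pair
            pairs.push_back(QueryString);
            QueryString = "";
        } else {
            pairs.push_back(QueryString.substr(0, loc));  // up to '&'
            QueryString = QueryString.substr(loc + 1);     // beyond '&'
        }
    }
    return pairs;
}
```

With the query string "ele=Li&ascii=html", SplitPairs returns the two strings "ele=Li" and "ascii=html".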

In a loop, PlusToBlank replaces each '+' with a blank, as the following algorithm indicates:

PlusToBlank(value)
Function to change each plus to a blank

Pre: value is a string.

Post: Every '+' in value has been changed to a blank.

Algorithm:
     while '+' is found
          change that '+' to blank
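A minimal C++ sketch of PlusToBlank follows. It assumes value is a std::string passed by reference so that the caller's string changes. One small departure from the algorithm: each search starts just past the previous '+' rather than at the beginning of the string, which gives the same result with less rescanning.

```cpp
#include <string>

// Change every '+' in value to a blank (value is modified in place).
void PlusToBlank(std::string& value)
{
    std::string::size_type LocPlus = value.find('+');
    while (LocPlus != std::string::npos) {
        value[LocPlus] = ' ';
        LocPlus = value.find('+', LocPlus + 1);   // continue after this '+'
    }
}
```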

In the encoded value string, HexToChar replaces each sequence of '%' and two hexadecimal character digits with one character. "The ASCII Character Code" lists the two-digit hexadecimal numbers with their corresponding characters. As the section "Conversion from Hexadecimal to Decimal Numbers" of "Hexadecimal Representation" explains, to determine the decimal number corresponding to a two-digit hexadecimal number, we convert each digit to the corresponding decimal number, multiply the first number by 16, and add the results. For example,

$$\mathrm{3D}_{16} = 3 \times 16 + 13 = 61_{10}$$
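The same computation decodes %2F, the code for the slash mentioned earlier, since 47 is the ASCII code for '/':

$$\mathrm{2F}_{16} = 2 \times 16 + 15 = 47_{10}$$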

Each time HexToChar finds '%', the function calls HexToDec twice, once with each of the following two hexadecimal digit characters, such as '3' and 'D', as an argument to obtain the decimal equivalent, such as 3 and 13. After computing the decimal number that corresponds to the two hexadecimal characters, HexToChar concatenates the substring of value before the three characters beginning with '%', the character corresponding to that decimal code, and the substring after the sequence. The process continues until value does not contain any '%' character. The algorithm for HexToChar follows:

HexToChar(value)

Function to change each percent sign followed by 2 hexadecimal digits to the corresponding character

Pre: value is a string.

Post: Each hexadecimal-encoded character sequence has been changed to the corresponding character.

Algorithm:
     while '%' is found
          dec <- HexToDec(character after '%' in value) * 16 +
                       HexToDec(second character after '%' in value)
          value <- concatenate together the following:
                       substring of value up to '%'
                       character corresponding to dec
                       substring of value after '%' and two hexadecimal characters


HexToDec determines whether a hexadecimal character is in the range '0'-'9', 'a'-'f', or 'A'-'F'. In each case, we subtract the lowest character in the sequence, '0', 'a', or 'A', from the character to obtain the digit's relative position. For example, '3' is '3' - '0' = 3 beyond '0', and 'D' is 'D' - 'A' = 3 beyond 'A'. For a letter, we also add 10 to move its value into the range 10-15, such as 3 + 10 = 13 for the decimal equivalent of 'D'. The following algorithm presents the logic of HexToDec:

HexToDec(c) -> digit 
Function to take a hexadecimal character and to return the corresponding decimal number

Pre: c is a hexadecimal character.

Post: The corresponding decimal number has been returned.

Algorithm:
     if c is between '0' and '9'
          digit <- c - '0'
     else if c is between 'a' and 'f'
          digit <- c - 'a' + 10
     else
          digit <- c - 'A' + 10
     return digit
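The HexToDec algorithm maps directly onto C++ character arithmetic. This sketch assumes, as the algorithm does, that c is a valid hexadecimal character and that the letters 'a'-'f' and 'A'-'F' have consecutive codes, as they do in ASCII; the C++ standard guarantees consecutive codes only for the digits '0'-'9'.

```cpp
// HexToDec following the three-case algorithm.
// Pre: c is one of '0'-'9', 'a'-'f', or 'A'-'F' (ASCII assumed).
int HexToDec(char c)
{
    int digit;
    if (c >= '0' && c <= '9')
        digit = c - '0';            // '0'..'9' -> 0..9
    else if (c >= 'a' && c <= 'f')
        digit = c - 'a' + 10;       // 'a'..'f' -> 10..15
    else
        digit = c - 'A' + 10;       // 'A'..'F' -> 10..15
    return digit;
}
```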

Alternatively, instead of checking whether a non-digit character is lowercase or uppercase, we can perform the computation with the uppercase version of the letter, which the function toupper from <cctype> returns, as follows:

digit <- toupper(c) - 'A' + 10
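The toupper alternative and the HexToChar algorithm can be combined in one C++ sketch. It assumes an ASCII character set and a std::string passed by reference. Two details go beyond the pseudocode and are choices of this sketch, not of the module: the loop stops if a '%' lacks two following characters, and each new search starts after the character just inserted, so that a decoded '%' (from %25) is not mistaken for the start of another code.

```cpp
#include <cctype>
#include <string>

// HexToDec with the toupper alternative: digits directly, letters in
// uppercase. Assumes ASCII and a valid hexadecimal character.
int HexToDec(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return std::toupper(static_cast<unsigned char>(c)) - 'A' + 10;
}

// HexToChar: replace '%' and the two hexadecimal digits after it with the
// character they encode, for every such sequence in value.
void HexToChar(std::string& value)
{
    std::string::size_type loc = value.find('%');
    while (loc != std::string::npos && loc + 2 < value.length()) {
        int dec = HexToDec(value[loc + 1]) * 16 + HexToDec(value[loc + 2]);
        value = value.substr(0, loc)            // substring up to '%'
              + static_cast<char>(dec)          // decoded character
              + value.substr(loc + 3);          // substring after the code
        loc = value.find('%', loc + 1);         // search past the new character
    }
}
```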


C++ Code for the CGI Input Algorithms

Four of the functions perform a string search for a character: NamesValues for '&', ConvertPair for '=', PlusToBlank for '+', and HexToChar for '%'. In each case, we use the std::string member function find to locate the character. For example, the following statement assigns to the integer variable loc the location (index) of the first occurrence of '&' in the string object QueryString:

 loc = QueryString.find('&');

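If '&' does not appear in the string, find returns the special constant std::string::npos. Because loc is an int, npos converts to -1 on typical systems (guaranteed since C++20), so a negative value of loc signals that the character was not found. If loc has type std::string::size_type instead, we compare it with std::string::npos.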

The string member function substr returns a substring.  For example, the following statement assigns to string variable pair the substring of QueryString from the beginning for loc number of characters:

         pair = QueryString.substr(0, loc);

To obtain the number of characters in a string, we can call the string member function length, as in QueryString.length().  To concatenate two strings together, we employ the plus (+) operator.
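Putting these string operations together with the algorithms of the previous section gives a complete, if minimal, decoder. This is one possible free-function version, not the FromBrowser class of the exercises; returning names and values in std::vector objects rather than fixed-size arrays is an assumption that avoids choosing a maximum number of pairs. As in the algorithm, ConvertPair calls PlusToBlank before HexToChar, so an encoded plus (%2B) is not turned into a blank.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Decimal value (0-15) of one hexadecimal character (ASCII assumed).
int HexToDec(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return std::toupper(static_cast<unsigned char>(c)) - 'A' + 10;
}

// Change every '+' in value to a blank.
void PlusToBlank(std::string& value)
{
    for (std::string::size_type loc = value.find('+');
         loc != std::string::npos; loc = value.find('+', loc + 1))
        value[loc] = ' ';
}

// Replace each '%' and two hexadecimal digits with the encoded character.
void HexToChar(std::string& value)
{
    std::string::size_type loc = value.find('%');
    while (loc != std::string::npos && loc + 2 < value.length()) {
        char ch = static_cast<char>(HexToDec(value[loc + 1]) * 16 +
                                    HexToDec(value[loc + 2]));
        value = value.substr(0, loc) + ch + value.substr(loc + 3);
        loc = value.find('%', loc + 1);
    }
}

// Split pair at the first '=' and decode the value part.
void ConvertPair(const std::string& pair, std::string& name, std::string& value)
{
    std::string::size_type loc = pair.find('=');
    name = pair.substr(0, loc);                  // whole pair if '=' is missing
    value = (loc == std::string::npos) ? std::string() : pair.substr(loc + 1);
    PlusToBlank(value);
    HexToChar(value);
}

// Break QueryString into decoded names and values; NumPairs is set to the
// number of pairs found.
void NamesValues(std::string QueryString, std::vector<std::string>& name,
                 std::vector<std::string>& value, int& NumPairs)
{
    name.clear();
    value.clear();
    NumPairs = 0;
    while (QueryString.length() > 0) {
        std::string pair;
        std::string::size_type loc = QueryString.find('&');
        if (loc == std::string::npos) {          // on last pair
            pair = QueryString;
            QueryString = "";
        } else {
            pair = QueryString.substr(0, loc);
            QueryString = QueryString.substr(loc + 1);
        }
        name.push_back("");
        value.push_back("");
        ConvertPair(pair, name[NumPairs], value[NumPairs]);
        NumPairs++;
    }
}
```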

If the string value represents an integer, such as "24", we can use atoi(value.c_str()) to return the corresponding integer, such as 24; the member function c_str supplies the C-style string that atoi requires. Similarly, if value represents a floating point number, such as "85.2", atof(value.c_str()) returns the corresponding number. Both functions are declared in <cstdlib>.
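Because atoi and atof return 0 when no conversion can be performed, they cannot distinguish a user's entry of "0" from an entry such as "abc". The sketch below shows the c_str conversions and, as one option, a stricter check that uses the standard function strtod to report whether the whole value was numeric; the names ToInt, ToDouble, and ParseDouble are hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Thin wrappers: atoi and atof need a C-style string, hence c_str().
// Both return 0 when no conversion can be performed.
int ToInt(const std::string& value)
{
    return std::atoi(value.c_str());
}

double ToDouble(const std::string& value)
{
    return std::atof(value.c_str());
}

// Hypothetical stricter version: succeeds only if the entire value is a
// number. strtod sets end to the first character it could not convert.
bool ParseDouble(const std::string& value, double& result)
{
    const char* start = value.c_str();
    char* end = nullptr;
    double number = std::strtod(start, &end);
    if (end == start || *end != '\0')
        return false;          // no number, or extra characters after it
    result = number;
    return true;
}
```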

Quick Review Question 4. Do not type any blanks in your answers.

a. Complete the segment to convert each '+' to a blank in string variable value.

while((LocPlus = ) >= 0) value[LocPlus] = ' '; 


b. Suppose temp is a string variable.  Complete the boolean test in the while loop so that iterations continue as long as temp is not the null string.

while(> 0)


c. The following string variable star stores data for an object in the constellation Andromeda:

 string star = "AND,3,f|S|K0,23:04:11,50:03,4.64,2000";

Fill in the numbers to assign to string variable obj the substring up to but not including the first vertical line of star.

 obj = star.substr();


Environment Variables

The server sets seventeen environment variables that a CGI program can access.  When a CGI program is called, the environment variables are available to the program.  For example, for the method get, the environment variable QUERY_STRING has as its value the query string, or the string after the question mark. The value of QUERY_STRING is "ele=Li&ascii=html" in the following example:

http://physics.nist.gov/cgi-bin/Compositions/stand_alone.pl?ele=Li&ascii=html

With the standard library function getenv, declared in <cstdlib>, we can obtain the character string value of any of the environment variables; getenv returns a null pointer if the variable is not set. For example, to obtain the query string when the method is get, we use getenv with the parameter "QUERY_STRING", as follows in C++:

string QueryString = getenv("QUERY_STRING");

We can determine the query method by invoking the getenv function and the parameter "REQUEST_METHOD", as follows in C++:

string RequestMethod = getenv("REQUEST_METHOD");
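If a variable is not set, for example, when we run the program from a shell instead of through the web server, getenv returns a null pointer, and initializing a string from a null pointer has undefined behavior. A small wrapper avoids the problem; the name GetEnvString is hypothetical, and returning an empty string for a missing variable is a design choice of this sketch.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: the value of an environment variable, or an empty
// string if the variable is not set (getenv returned a null pointer).
std::string GetEnvString(const char* variable)
{
    const char* text = std::getenv(variable);
    return text ? std::string(text) : std::string();
}
```

The two statements above would then read string QueryString = GetEnvString("QUERY_STRING"); and string RequestMethod = GetEnvString("REQUEST_METHOD");.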

With a post request, the server sends the form's data to the CGI program's standard input but not necessarily an end-of-file marker, a special symbol indicating the end of the file. Thus, to process the correct amount of input, we use the value of the environment variable CONTENT_LENGTH, which contains the length of the query string. In the following statement, through getenv we obtain the character string value of this environment variable, and atoi returns the equivalent integer:

int ContentLength = atoi(getenv("CONTENT_LENGTH"));

Knowing that the request method is post and the length of the query string, we read the characters one at a time and concatenate them onto the end of a string variable, in this case QueryString. We read with cin.get rather than the extraction operator >>, which would skip any whitespace characters:

char ch;
QueryString = "";
for (int i = 0; i < ContentLength; i++)
{
   cin.get(ch);
   QueryString += ch;
}
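The get and post cases can be combined in one routine, which Exercise 10 asks for in class form. This sketch is a free function instead, and it assumes the caller passes the values of REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH (any of which may be a null pointer) along with the input stream, so that the logic can be tested without a web server; ReadQueryString is a hypothetical name. The CGI specification gives REQUEST_METHOD in uppercase, such as "POST".

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>   // std::istringstream can stand in for std::cin in tests
#include <string>

// Obtain the query string for either request method.
std::string ReadQueryString(const char* method, const char* query,
                            const char* length, std::istream& in)
{
    std::string QueryString;
    if (method && std::string(method) == "POST") {
        int ContentLength = length ? std::atoi(length) : 0;
        char ch;
        // get() reads every character, including blanks and newlines,
        // and the loop stops early if the input runs out.
        for (int i = 0; i < ContentLength && in.get(ch); i++)
            QueryString += ch;
    } else if (query) {
        QueryString = query;            // get: data follows '?' in the URL
    }
    return QueryString;
}
```

In a CGI program, the call would be ReadQueryString(getenv("REQUEST_METHOD"), getenv("QUERY_STRING"), getenv("CONTENT_LENGTH"), cin).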


Exercises

1.      a.  Write a C++ segment to generate a plain text output with a greeting to the user.

         b.  Repeat Part a, generating HTML code output.

2.      Suppose s is a string variable of 20 characters.  Write a statement to assign to string variable t a string consisting of s with its fifth through eighth characters removed so that the prefix and suffix of s are concatenated.

 Exercises 3-11 relate to the object-oriented development of the routines of the section "Algorithms to Process a Query String."

3.      Write a header file FromBrowser.h for a class FromBrowser to accomplish standard input from a browser to a C++ CGI program.  The following functions are public:  the default constructor, NamesValues, and GetNamesValues to return two string arrays and an integer.  The private data are string variable QueryString, string arrays name and value and integer variable NumPairs.  ConvertPair, PlusToBlank, HexToChar, and HexToDec are private member functions.

4.      Define the member function GetNamesValues.

5.      Define the member function PlusToBlank.

6.      Define the member function HexToDec.

7.      Define the member function HexToChar.

8.      Define the member function ConvertPair.

9.      Define the member function NamesValues.

10.    Define the default constructor to obtain a value for QueryString whether the method is get or post.

11.    Write a main function using the FromBrowser class to display the decoded name and value pairs.


Projects

 See Exercises 3-10 for an object-oriented development of the class FromBrowser with algorithms from the section "Algorithms to Process a Query String."

1.      Create a web page with a text box for the user's name and two radio buttons indicating plain text or HTML output.  Develop a CGI program to generate a plain text output file or a web page, depending on the selected radio button.  Each output file should contain a greeting to the user.

 2.     Create a web page that enables the user to type a binary or hexadecimal number in a text box and to have a CGI program return the corresponding decimal number.  Have a pair of radio buttons to indicate whether the original number is in base 2 or 16.

3.      Create a web page with a CGI program to perform a temperature conversion between the Fahrenheit and Celsius scales. Enable the user to type a number and to indicate the kind of conversion to perform.  Have the answer also appear on a web page.  The following formulas convert a temperature in Fahrenheit (F) to its equivalent in Celsius (C) and vice versa:

$$C = \frac{5}{9}(F - 32) \qquad F = \frac{9}{5}C + 32$$

4.      The following is a MySQL statement to create the table ecs_spectra, which is also available for download as spectrauct.txt:

create table ecs_spectra (
       spec                varchar(10),
       wavelength          varchar(10),
       Rel_int             varchar(10),
       Aki                 float,
       Ac                  char(2),
       Ei                  varchar(10),
       Ej                  varchar(10),
       Configurations      varchar(15),
       Terms               char(6),
       Ji                  char(3),
       Jk                  char(3),
       Gi                  integer,
       Gk                  integer,
       Type                char(2),
       Tp_refs             varchar(10),
       Line_refs           varchar(10)
);

Table 1 of "Accessing Atomic Spectra Database Assignment" from Project 1 of "Accessing Databases with SQL" explains the meanings of the fields.  The data type varchar(n) is the type of a variable length string of at most n characters, while char(n) is the type of a string of exactly n characters.  The structure of the table ecs_spectra was derived by Dr. Orlando Karam from the NIST Atomic Spectra Database.

Create a web page with an HTML form to allow the user to specify values of certain fields and to indicate the desired data.  Develop a CGI program that processes the form data and generates a text document thanking the user and displaying the information from the request.  (After the module "CGI Programs and Databases," we can return the desired data to the user.)

5.      The following is a MySQL statement to create the table ecs_sf_sites, which is also available for download as envirouct.txt:

create table ecs_sf_sites (
       id                    char(12) primary key,
       site_name             varchar(255),
       street_addr           varchar(255),
       city                  varchar(255),
       state                 varchar(255),
       zip                   varchar(255),
       county                varchar(255),
       site_smsa             varchar(255),
       fed_facil             char(1),
       npl_stat              char(1),
       corp_link             varchar(255),
       rod_link              varchar(255),
       latitude              float,
       longitude             float,
       ownership             varchar(255),
       site_incident         varchar(255)
);

The data type varchar(n) is the type of a variable length string of at most n characters, while char(n) is the type of a string of exactly n characters.  The structure of the table ecs_sf_sites was derived by Dr. Orlando Karam from the EPA's Superfund (CERCLIS) Database.

Create a web page with an HTML form to allow the user to specify values of certain fields and to indicate the desired data.  Develop a CGI program that processes the form data and generates a text document thanking the user and displaying the information from the request.  (After the module "CGI Programs and Databases," we can return the desired data to the user.)

6.    The following are MySQL statements to create the tables FixedType and ClassCodeRef, which are also available for download:

create table FixedType (
          constellation char(4) not null,
          ObjectName varchar(30) not null,
          ClassCode enum('D', 'S', 'V'),
          SpectralClass char(2),
          hours int,
          minutes int,
          TimeSec int,
          degrees int,
          AngleSec int, 
          magnitude float,
          primary key(constellation, ObjectName)
);

create table ClassCodeRef (
          ClassCode enum('D', 'S', 'V') not null primary key,
          ClassCodeName varchar(255)
);

Table 1 of "Accessing Star Database Assignment" from Project 3 of "Accessing Databases with SQL" explains the meanings of the fields. The data type varchar(n) is the type of a variable length string of at most n characters, while char(n) is the type of a string of exactly n characters. (The structure was derived by Dr. Orlando Karam from star.dat by Dr. Dan Welch (see Project 5 of "Introduction to Databases").) 

Create a web page with an HTML form to allow the user to specify values of certain fields and to indicate the desired data. Develop a CGI program that processes the form data and generates a text document thanking the user and displaying the information from the request. (After the module "CGI Programs and Databases," we can return the desired data to the user.)


Copyright © 2002, Dr. Angela B. Shiflet
All rights reserved