In this module, we discuss the Structured Query Language (SQL), which is the English-like interface to relational databases. We study standard SQL in general and the SQL product MySQL in particular. A very good book on the subject is MySQL & mSQL by R. J. Yarger, G. Reese, and T. King, O'Reilly publ., Sebastopol, CA, 1999.

Database Creation

To start MySQL on a UNIX system, type mysql to launch the command version and xmysql for the GUI version. Chapter 4 of the book MySQL & mSQL has documentation on creating an empty database and administering a MySQL database with the recommended security and backup features. The administrative user for MySQL can create a database with a create database statement. As with all MySQL statements, a semicolon (;) ends the following example to create a database called BioStudy:

create database BioStudy;

To indicate which existing database we are we are using, we employ the use statement, such as follows:

use BioStudy;


Comments

As always, English comments to document code are important. In SQL, a comment line begins with a number sign (#) or semicolon (;) and continues to the end of the line, such as follows:
# SQL to create a database for Physics
# laboratory experiments

Identifiers

SQL keywords are not case-sensitive. Thus, select, Select, or SELECT accomplishes the same thing. However, the identifier names, such as names of relations and columns that we create, are case sensitive. Thus, table name solventtable is different than SolventTable. Standard SQL allows identifier names to contain letters, digits, and underscores (_) and requires that they begin with a letter. MySQL allows an identifier to have length up to 64 characters. The identifier should be descriptive of its role. For example, if a relation is to contain student data, we might use the identifier student. This name is self-documenting because it is descriptive. The identifier s, however, could stand for just about anything. A name should transmit as much information as possible to a reader of the commands, so that he or she can more easily correct and change the code.

Definition An identifier is self-documenting if the name is descriptive of the item's role.

In this module, we use lowercase for one-word identifier names, such as average. A compound identifier, such as TaxRate, has the first letter of each word capitalized. You might choose another convention. Each programmer develops his or her own style over time. The main criterion should be that the system used enhances code readability.

Quick Review Question
Quick Review Question 1  Which are the following are not "legal" SQL identifiers?
a. RunNumber b. Run# c. run number
d. 2run e. run_number f. run2


Table Creation and Data Types

Before we can add data to a table, the table must exist. Suppose we want to create the table CurrentBioStudy with primary key subject and other attributes of RhFactor and BloodType, as in the module on "Relational Databases." The schema is as follows:

BioStudy_schema = (subject:string, RhFactor:{+, -}, BloodType:{M, N, MN})

The following SQL generates table CurrentBioStudy with this schema:
	create table CurrentBioStudy (
	   subject   char(10) not null,
	   RhFactor  enum('+', '-'),
	   BloodType enum('M', 'N', 'MN'),
	   primary key(subject)
	);
Alternatively, we can place primary key on the same line as subject, as follows:
	create table CurrentBioStudy (
	   subject   char(10) not null primary key,
	   RhFactor  enum('+', '-'),
	   BloodType enum('M', 'N', 'MN')
	);
The name for the new table (CurrentBioStudy) appears after create table. A list of the column names and data types is in parentheses with commas separating the names. The modifier not null indicates that null cannot be a value in the column subject. After the list of column names and types or on the line with subject's declaration, we indicate that this attribute is the primary key. The data type char(10) allows for character strings of exactly 10 characters, padded for blanks as necessary. RhFactor and BloodType are both of enumeration types, and their possible values appear in apostrophes.

Frequently, attributes have numeric types. In a chemical database, attribute FreezingPoint has floating point type float or real, while EventNumber might have integer type int. The modifier unsigned, as in unsigned int, indicates that the value cannot be negative, such as for a counter. Scientific data can be very large or small. For such floating point numbers, we use the type double

The EventTime attribute for a psychology schema probably would be of type time and would have standard time values. DiscoveryDate would be an attribute of type date, the type for a standard date value.   

Attribute subject above is of type char(10), a fixed length string of 10 characters. If the value has less than 10 characters, such as "SubjectA", the computer pads the string with blanks on the end to be exactly 10 characters. Values of greater length are truncated or chopped off to exactly 10 characters.

The type of a variable length string of at most 100 characters is text(100) or varchar(100). Thus, the value "SubjectA" occupies space for only 8 characters. For text(n) the maximum string length n is at most 65,535, while for varchar, n is at most 255. When we know the maximum length of a string, as in the case of subject, we should use char(n) because database searching is faster on fixed length as opposed to variable length strings. Thus, char is used more often than text.

When sorting on a column of type char, text, or varchar, the database ignores case and sorts alphabetically as in a dictionary. Thus, string "anthracene," starting with a lowercase letter, sorts before "Benzene," starting with a uppercase letter, even though uppercase letters occur before lowercase ones in the ASCII encoding scheme. Sorting the strings in C++ using regular ASCII string comparisons, the order would be reversed.    

Table 1 summarizes some of the most frequently used data types.

Table 1. Some SQL data types

Data Type Meaning
char(n) Fixed-length string of n characters, padded with blanks as necessary; n ² 255
date Date; output format YYYY-MM-DD; input format YY-MM-DD, YYYY-MM-DD, YYMMDD, or YYYYMMDD
double Double floating point; approximate range ±10308; smallest positive value about 10-308
enum Enumeration; list of values in parentheses, each value in apostrophes
float or real Floating point; approximate range ±1038; smallest positive value about 10-38
int Integer; approximate range ±2 billion
money Money
text(n) Variable-length string of at most n characters; n ² 65,535
time Time; output format HH:MM:SS; input format HH:MM:SS, HHMMSS, HHMM, or HH
varchar(n) Variable-length string of at most n characters; n ² 255
year Year; output format YYYY or YY; input format YYYY or YY; YY indicates years between 1970 and 2069

To eliminate the necessity of specifying a frequently used field value, we can specify a default value. For example, because most individuals are Rh-positive, we can give the attribute RhFactor the default value of '+', as follows:

create table CurrentBioStudy (
   subject   char(10) not null primary key,
   RhFactor  enum('+', '-') default '+',
   BloodType enum('M', 'N', 'MN')
);

To save space, we should remove tables (drop table) that we no longer use. To drop a table CurrentBioStudy, we use the following statement

drop table CurrentBioStudy;
With this statement, all references to the table elsewhere, such as in constraints and foreign keys, are dropped.
Quick Review Question
Quick Review Question 2  Suppose we want to write a statement to create a table LabSection that has the following schema:

(CatalogNumber: string, SecYear: year, semester: {F, S, S1, S2}, section: char, DragExpWeek:integer)

Answer the following questions concerning this task:
a. What are the first three words of the statement?

b. Suppose CatalogNumber has a maximum length of 9. Using all lowercase, give the type of CatalogNumber.

c. Of SecYear

d. Of section

e. Of DragExpWeek

f. What is the first word used to indicate semester is an enumeration type.

g. Suppose the possible values of semester are F, S, S1, and S2. What encloses each value in the type declaration?
Apostrophes (')
Quotation marks (")
Square brackets ([ ])
Nothing


h. What encloses the list of these values?
Square brackets
Parentheses
Pointed brackets < >
Nothing


i. What separates these values?
Colons
Commas
Semicolons
Blanks


j. We indicate that CatalogNumber, SecYear, semester, and section must each have values by writing what two words (in lowercase) after each of their types?

k. What two words (in lowercase) do we use that represents the fact that these identifiers are underlined in the schema?

Write the entire statement, and click to view the answer.

Quick Review Question
Quick Review Question 3  Write a statement to remove table LabSection and all references to the table from the database. Use lowercase, except for the table name, and only one blank between words.


Show

It is easy to forget the name of a database or its structure. The show statement provides a way for us to refresh our memories. Once in MySQL, to obtain a listing of a databases, we write

show databases;

Within a database, we get a listing of the tables, as follows:

show tables;

To display the relations in database PsyDB that we are not using, we issue the statement

show tables from PsyDB;

The following statements display table LabSection's columns with their data types and the table's keys:

show columns from LabSection;
show keys from LabSection;

 
Quick Review Question
Quick Review Question 4  Type your answers on one line with one blank between items, using lowercase for keywords. For each part, write a statement to display
a. The relations in database BioStudy

b. The primary key in relation CurrentBioStudy

c. The data types for all attributes of CurrentBioStudy

d. The names of all databases


Alterations

To change the structure or name of a database or table, we employ the alter table statement. For example, to change the column RhFactor of table CurrentBioStudy to include the not null modifier, we have the following:

alter table CurrentBioStudy change RhFactor RhFactor enum('+', '-') not null;

 The last line includes a complete redeclaration of the column. RhFactor appears twice because we have the option of changing the column's name, say to factor, as follows:

alter table CurrentBioStudy change RhFactor factor enum('+', '-') not null;

To add the column temperature with type float to table PsyExp, we perform the add action, as follows:

alter table PsyExp add temperature float;

To give the default value of 1 to column EventCode of table PsyExp, we alter the column with a set default, as follows:

table PsyExp alter EventCode set default 1;

We can remove the column, temperature, with the drop action, as the following illustrates:

alter table PsyExp drop temperature;

Similarly, we can drop the default value for column EventCode of table PsyExp, as follows:

alter table PsyExp alter EventCode drop default;
To rename table PsyExp as PsychologyExp, we write:

alter table PsyExp rename as PsychologyExp;

 
Quick Review Question
Quick Review Question 5  In the following questions, use lowercase, except for names, and only one blank between words. Complete each of the following statements.
a. In table weather, add column wind with type int.

______(a)______ weather ____(b)____
    wind ___(c)___;

b. In table weather, change column wind to have the name WindVelocity and the type float.

______(a)_______ weather
___(b)____ wind
    ___________(c)_____________;

c. Remove column WindVelocity from table weather.

____(a)_____ weather _____(b)______ WindVelocity;


Insertions

Once created, we can put data into a table. For example, to insert the tuple (SubjectA, +, M) into table CurrentBioStudy, we write the following insert statement:
insert into CurrentBioStudy(subject, RhFactor, BloodType
values ('SubjectA','+', 'M');

Alternatively, using a set clause, we can place the values with the column names in any order, as follows:

insert into CurrentBioStudy
set RhFactor = '+',BloodType = 'M', subject = 'SubjectA';

In MySQL, if we have a value for every column, we can omit the column names:

insert into CurrentBioStudy
values ('SubjectB', '-', 'M');

For an apostrophe to be part of the constant, we employ \' to avoid terminating the string. For example, "Smith's lab" in the values clause is written as 'Smith\'s lab'. To store a backslash, we must write two backslashes, as in 'C:\\main' to represent "C:\main".

If we do not specify a value, the database stores null in that field unless the field has a default value. Also, we can use the constant null as a value, such as follows:

insert into CurrentBioStudy(subject, RhFactor, BloodType)
values ('SubjectC', null,'M');

MySQL allows us to insert several tuples at once, as the following illustrates:

insert into CurrentBioStudy
values ('SubjectD', null, 'M'),
       ('SubjectE', '+','MN'),
       ('SubjectF', '-','N');
Quick Review Question
Quick Review Question 6  Suppose into LabSection from Quick Review Question 2 we want to insert a row that has the following data:
Phys201, 2004, F, A, 3
a. What are the first three words of the statement?

b. What word appears before the listing of data?

c. What encloses each data value?
Apostrophes (‘)
Quotation marks (")
Square brackets ([ ])
Nothing


d. What encloses the list of data values?
brackets
Parentheses
Pointed brackets < >
Nothing


e. What separates these data values?
Colons
Commas
Semicolons
Blanks



Write the entire statement, and click to view the answer.

We can read data from a file and insert the data into a table with a load statement. We must create the table so that the columns agree with the data. If tabs separate the data items and a newline character terminates each tuple in the file, we can insert the data of file lab.dat, which is in the subdirectory /home/johnson/db/, into table LabSection, as follows:
load data infile '/home/johnson/db/lab.dat'
into table LabSection;
To load lab.dat from the current local directory, we can issue the following command using local:
load data infile local lab.dat
into table LabSection;

If we have a lot of data to add to a table, loading data from a file with the load statement can be much faster than using the insert statement.


Auto-Increment

We can establish an integer attribute as the primary key and have MySQL automatically increment the value with each table insertion. The auto_increment modifier can apply to at most one field. For example, the team entity in the physics laboratory database (Figure 16, "Entity-Relationship Model" module) has attributes TeamNumber and TeamGrade. TeamNumber is an integer primary key, and sequential integers form reasonable values. Thus, we create the team table as follows:
create table team (
	TeamNumber int not null primary key auto_increment,
	TeamGrade float
);
To use the auto_increment feature, we use null or 0 as the value for TeamNumber, as follows:
insert into team
values (null, 86.5);
If this insertion is the first one, the computer makes the TeamNumber value 1. On subsequent insertions, the value is one more than the largest TeamNumber.


Updating

Some databases change frequently. To modify the row of a table, we use the update statement. For example, to update the SubjectA tuple of the table CurrentBioStudy so that the Rh-factor is negative and the blood type is MN, we write the following:
update CurrentBioStudy
set RhFactor = '-', BloodType = 'MN'
where subject = 'SubjectA';
We use the where clause to indicate particular row(s). Although optional, the where clause occurs frequently in update statements.

Any relational operator (=, !=, <, <=, >, >=) is permissible for comparisons in a where clause. We can also use the logical connectors (not or !, or or ||, and or &&), bitwise operators (|, &, <<, >>), and arithmetic operators (+, -, *, /, %) of C++, as well as parentheses.

Example 1   Suppose table ChemExperiment of a chemical database has the following schema, ChemExperiments_schema, from Example 1 of the "Relational Database" module:

ChemExperiments_schema
ExpNumber SoluteMass SolventName SolventMass FP

    The following statement creates the table:
create table ChemExperiment (
	ExpNumber int not null primary key,
	SoluteMass float,
	SolventName char(30),
	SolventMass float,
	FP float
);
Suppose in Experiment 37, we dissolve a 13.5-gram sample of a nonelectrolyte in 100 grams of benzene, and the solution freezes at 2.95°C. To insert this information into the table, we type the following:
insert into ChemExperiment
values (37, 13.5, ‘benzene’, 100, 2.95);
If later we want to change every occurrence of "benzene" in the table to start with a capital letter, we issue an update command, as follows:
update ChemExperiment
set SolventName = 'Benzene'
where SolventName = 'benzene';
If the update is successful, MySQL issues a message that resembles the two lines below. One row was affected, matched, and changed.
Query OK, 1 row affected  (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
If the syntax of the update statement was correct, but the system found no rows that matched the where clause, a message like the following would appear:
Query OK, 0 row affected (0.01 sec)
Rows matched: 0 Changed: 0 Warnings: 0
If we know the row for which we are searching is in the database, we probably mistyped a data item, such as using Phys203 instead of Phys201.

If a statement has an SQL syntax error, MySQL issues an error message and gives the approximate location of the mistake.

To update the quantity of benzene to be 100 grams in every row in which the amount is greater than 100, we use the relational operator greater than (>) and the logical operator && (or and), as follows:
update ChemExperiment
set SolventMass = 100
where SolventName = 'benzene' &&
SolventMass > 100;

Quick Review Question
Quick Review Question 7 Suppose we want to update the LabSection table row above, (Phys201, 2004, F, A, 3), to have a DragExpWeek of 2. Type your answers on one line with one blank between items, using lowercase for keywords.
a. What are the first two words of the statement?

b. Give the clause to change the DragExpWeek column value to 2.

c. To update a particular row, we give values for the entire primary key. On one line, start the where clause by specifying a value for CatalogNumber. Do not type anything after the value.

d. Because the primary key is a collection of four fields, to designate a particular row, we must specify values for each field that is part of the key. In the where clause, what connector do we use at the end of Part c’s answer and between the other field assignments?
comma
&
and(or &&)
|
or (or ||)
blank



Write the entire statement, and click to view the answer.


Row Deletions
We also use the where clause in a delete statement to delete a row or rows from a table. To delete the row for Experiment 37, we give the table and the particular key, as follows:
delete from ChemExperiment
where ExpNumber = 37;

If we wish to delete every row that has a solute mass less than 1 gram, we write

delete from ChemExperiment
where SoluteMass < 1;

We must be very careful of such massive changes that could change a number of rows. It is advisable to make a backup of the database before issuing such commands. Later in this module, we discuss how to make backups. An even more dangerous delete statement does not have a where clause, resulting in removal of every row from the table.

# CAUTION: Removes every row from the table
delete from ChemExperiment;

Selections

The most common SQL statement is select, which allows us to query the database. To view the RhFactor of SubjectA in the CurrentBioStudy table, we have the following statement:
select RhFactor
from CurrentBioStudy
where subject = 'SubjectA';
The requested field value (+) appears with the name of the field, such as follows:
+----------+
| RhFactor |
+----------+
| +        |
+----------+
To see the entire record, we use the wild card * to designate all columns, as follows:
select *
from CurrentBioStudy
where subject = 'SubjectA';
The column names with values for SubjectA appear, as follows:
+----------+----------+-----------+
| subject  | RhFactor | BloodType |
+----------+----------+-----------+
| SubjectA | +        | M         |
+----------+----------+-----------+
If we wish to see all the data in the table, we use * and omit the where clause:
select * from CurrentBioStudy;
The output, which is unordered, might appear as follows:
+----------+----------+-----------+
| subject  | RhFactor | BloodType |
+----------+----------+-----------+
| SubjectA | +        | M         |
| SubjectB | +        | N         |
| SubjectD | -        | M         |
| SubjectE | +        |           |
| SubjectC | -        |           |
+----------+----------+-----------+
To alphabetize the results, we include the order by clause. For example, suppose we want to have a listing of the last names, first names, ids, and grades in the student table of Figure 4 of the "Database Development" module, such as follows:
+-----------+-----------+-----------+--------------+
| StudentId | LastName  | FirstName | StudentGrade |
+-----------+-----------+-----------+--------------+
|       107 | Davis     | George    | A            |
|       102 | Bolder    | Anne      | A            |
|       103 | Donnovan  | Patrick   | C            |
|       104 | Davis     | Kyle      | D            |
|       105 | d'Antonio | Mary      | B            |
|       106 | Iseman    | Bart      | F            |
|       101 | Davis     | George    | C            |
+-----------+-----------+-----------+--------------+

We want to alphabetize by the last name. However, if several students have the same last name, we want to alphabetize by their first names. If even they agree, we have the ids appear in sorted order. Although we are displaying all fields, because the output should appear in a different order, we specify the columns, as follows:

select LastName, FirstName, StudentId, StudentGrade
from student
order by LastName, FirstName, StudentId;
Execution of this select statement returns a table, such as the below. "George Davis" with student id 101 occurs before "George Davis" with id 107, and "Kyle Davis" appears after "George Davis".
+-----------+-----------+-----------+--------------+
| LastName  | FirstName | StudentId | StudentGrade |
+-----------+-----------+-----------+--------------+
| Bolder    | Anne      | 102       | A            |
| d'Antonio | Mary      | 105       | B            |
| Davis     | George    | 101       | A            |
| Davis     | George    | 107       | C            |
| Davis     | Kyle      | 104       | D            |
| Donnovan  | Patrick   | 103       | C            |
| Iseman    | Bart      | 106       | F            |
+-----------+-----------+-----------+--------------+
Example 2   In the "Relational Database" module, we use the notation (yyy link) sKnownCompound = Benzene(ChemKnowns) for selection of the tuple from the (yyy link) ChemKnowns relation that has a KnownCompound value of "Benzene". In SQL, we write
select *
from ChemKnowns
where KnownCompound = 'Benzene';
The select statement also handles the (yyy link) projection operation pKnownCompound(ChemKnowns) to obtain a list of the compounds in ChemKnowns, as follows:
select KnownCompound from ChemKnowns;
For a new table, SolventFP, of the solvent names and their freezing points, we use the notation
SolventFP ¨ pSolventName,FreezingPoint(ChemSolvents)
In SQL, we create SolventFP and insert the data into it by employing insert and select, as follows:
create table SolventFP (
     SolventName char(30) not null primary key,
     FreezingPoint float
);

insert into SolventFP (SolventName, FreezingPoint)
select (SolventName, FreezingPoint)
from ChemSolvents;
The number of columns and their types match between the SolventFP table and the projected ChemSolvents table.

Quick Review Question
Quick Review Question 8 Consider table LabSection from Quick Review Question 2. Type your answers on one line with one blank between items, using lowercase for keywords.
a. Write a statement to see the value of the DragExpWeek field for every tuple in the table.

b. Give the clause that would appear at the end of Part a's answer to obtain only results with SecYear after 1995. Do not end the clause with any punctuation.

c. Write a statement to view the entire table.

d. Give the clause to sort the result of Part c by SecYear. Do not end the clause with any punctuation.


Joining
We also use select to implement join operations. In Example 1 of the "Relational Databases" module, we developed the following three schemas:
ChemKnowns_schema
KnownCompound MolecularWeight
ChemSolvents_schema
SolventName FreezingPoint Kf
ChemExperiments_schema
ExpNumber SoluteMass SolventName SolventMass FP
Suppose relations ChemKnown, ChemSolvent, and ChemExperiment, respectively, have these schemas. To obtain a listing of solvents with freezing points from ChemSolvent and molecular weights from ChemKnown, we form the natural join, ChemSolvent *SolventName = KnownCompound ChemKnown. In SQL, we employ the select statement, listing both tables and the values that must agree, as follows:
select SolventName, FreezingPoint, MolecularWeight
from ChemSolvent, ChemKnown
where SolventName = KnownCompound;
For the data of Example 1 in the "Relational Database" module, only Benzene and p-Dichlorobenzene appear in both tables. Thus, the output is as follows:
+-------------------+---------------+-----------------+
| SolventName       | FreezingPoint | MolecularWeight |
+-------------------+---------------+-----------------+
| Benzene           |          5.50 |           78.11 |
| p-Dichlorobenzene |         53.00 |          147.00 |
+-------------------+---------------+-----------------+
To obtain the set of tuples where each experiment has data from ChemExperiment matched with the data about its solvent from ChemSolvent, we again equijoin on the solvent name. However, the column name is SolventName in both tables. To avoid ambiguity, we employ the dot operator as in C++, designating the tables and column name, ChemExperiment.SolventName and ChemSolvent.SolventName. Thus, to obtain the solute mass, solution freezing point (FP), solvent mass, and solvent constant Kf for Experiment 35, we perform the join as follows:
select SoluteMass, FP, SolventMass, Kf
from ChemExperiment, ChemSolvent
where ExpNumber = 35 and
ChemExperiment.SolventName = ChemSolvent.SolventName;
 
Quick Review Question
Quick Review Question 9 In Quick Review Question 13 of the "Relational Database" module, we considered relations session (ccc Charles, change time to EventTime in that module) and EventCode of the following schemas, respectively:

PsyExp_schema = (EventNumber:integer, EventTime:float, SubjectID:integer)
PsyEventCode_schema = (code:integer, event:string)

EventNumber and code have the same domain. Type your answers on one line with one blank between items, using lowercase for keywords.

a. To obtain a listing of the times and events, we project the natural join, pEventTime, event(session * EventNumber=code EventCode). Begin the statement to accomplish this task by choosing the columns to present.

b. Give the from clause.

c. Give the where clause. Do not type a semicolon.

d. Write the entire statement, and click to view the answer.

e. Write the where clause if column EventNumber were also called code. Do not use a semicolon.


Aggregate Functions

MySQL has a number of functions that we can use to obtain values or to specify in a where clause. Table 2 lists aggregate functions that apply to a set of values, such as all data in a column.

Table 2. MySQL aggregate functions on set of values, s

Function Meaning
avg(s) Average of values in s
count(s) Number of non-null values in s
max(s) Maximum value in s
min(s) Minimum value in s
std(s) Standard deviation of values in s
sum(s) Sum of values in s

To find the average week of the semester in which Physics 201 performed the Coefficient of Drag Experiment, we have the following statement:
select avg(DragExpWeek)
from LabSection
where CatalogNumber = 'Phys201';
The database does not include null values for DragExpWeek in computation of the average. The results might be reported as follows:
+------------------+
| avg(DragExpWeek) |
+------------------+
|           3.7500 |
+------------------+
The aggregate function count counts only on the non-null values. To obtain a count of the number of tuples in the relation CurrentBioStudy, even those with null for some but not all fields, we have * as the argument of count, as follows:
select count(*) from CurrentBioStudy;
For a count of the number of Rh+ subjects,
select count(*)
from CurrentBioStudy
where RhFactor = '+';
The modifier distinct appears before the column name to compute the function on only distinct values. Thus, to discover how many different solvents appear in the ChemExperiment relation, we write the following:
select distinct count(SolventName)
from ChemExperiment;

Grouping

Sometimes, we group tuples with a common attribute value to perform an aggregate function on the collection. For example, to display the number of subjects in each blood type, grouping by blood type, we display each blood type and a count of the elements in that group, as follows:
select BloodType, count(BloodType)
from CurrentBioStudy
group by BloodType;

Because of the amount of computation involved in executing order or group, the clauses should be used sparingly. Their use can significantly slow database access in a distributed environment. If many such summaries are needed, retrieval of the data from the database with summary processing by a computer program should be considered.

Quick Review Question
Quick Review Question 10 For the LabSection table, complete the statement to display each catalog number (CatalogNumber) and the average week (DragExpWeek) in which the course performs the Coefficient of Drag Experiment. Use lowercase and one blank between items.
select CatalogNumber, ___________(a)__________
from LabSection
_____________(b)_____________;


Non-Aggregate Functions

Besides aggregate functions, SQL has functions that apply to one value (such as a field value) at a time. Table 3 contains a partial list of such functions. These functions can appear in a select clause or a where clause.

Table 3. MySQL functions that apply to one value at a time. Material in square brackets, [ ], is optional.

Function Meaning
abs(n) Absolute value of n
acos(n) Arc cosine (inverse cosine of n)
asin(n) Arc sine (inverse sine of n)
atan(n) Arc tangent (inverse tangent of n)
concat(s1, s2, ..., sn) Concatenation of strings s1, s2, ..., sn
conv(n, b1, b2) Convert n from base b1 to base b2
ceiling(n) Ceiling of n, smallest integer ≥ n
cos(r) Cosine of r, which is in radians
cot(r) Cotangent of r, which is in radians
curdate() Current date; number YYYYMMDD or string 'YYYY-MM-DD'
curtime() Current time; number HHMMSS or string 'HH:MM:SS'
database() Name of current database
adddate(d, i) Date d plus interval i
subdate(d, i) Date d minus interval i
dayname(d) Day of week for date d
dayofmonth(d) Day of month for date d
degrees(r) Degrees corresponding to radians r
encrypt(s1, s2) Password encrypt string s1 using salt s2
exp(p) ep
floor(n) Floor of n, largest integer ≤ n
format(n, d) Format n with d decimal places
hour(t) Hour of time t
if(b, v1, v2) If b is true, v1, else v2
isnull(e) 1 (true) if e is null, 0 (false) otherwise
log(n) Natural logarithm of n
log10(n) Common logarithm of n
minute(t) Minute of time t
mod(n1, n2) n1 % n2
month(d) Number of month for date d
monthname(d) Name of month for date d
now() Current date and time
password(s) Password-encrypted string s
pi() Pi (π)
pow(n, p) np
radians(d) Radians corresponding to degrees d
rand([seed]) Random floating point number between 0 and 1; optional seed seed
round(n [, d]) n rounded to d decimal places or to an integer if no d
second(t) Seconds of time t
sin(r) Sine of r, which is in radians
sqrt(n) Square root of n
tan(r) Tangent of r, which is in radians
truncate(n, d) n truncated to d decimals
user() Current user
week(d) Week of year for date d
year(d) Year for date d

Suppose the table, mushroom, of a database contains the MushroomId and radius of the cap as well as other information for a number of mushrooms. To estimate the surface area of the top, we compute the area of a circle, πr2, as follows:
select MushroomId, Pi() * Pow(radius, 2)
from mushroom;

The area is computed individually for each value of radius. An alternative to Pow(radius, 2) is to multiply, radius * radius. As this example illustrates, we can use the arithmetic operators +, -, *, /, and % as well as the SQL functions.

Quick Review Question
Quick Review Question 11 Complete the select statement to display the estimated circumference (2πr) of each mushroom cap. Use one blank between each operator and operand.
select MushroomId, _______________(a)_________________
from mushroom;

Quick Review Question
Quick Review Question 12 The escape velocity of an object from a planet is , where G is the gravitational constant, M is the mass of the planet, and R is its radius. Suppose a table, planet, stores this information for a number of planets along with the planet names (PlanetName) and other information. Complete the select statement to return the name and escape velocity for each planet whose radius is larger than that of the earth (about 6,400,000 m).
a. Complete the select clause using one space between each operator and operand.
select PlanetName, _______________(a)________________

b. Complete the where clause with single blanks to separate words.
where R > (_______________(a)_______________);


Write the entire statement, and click to view the answer.



Indices

The primary key, such as subject in table CurrentBioStudy above, is an example of an attribute that has an associated index. Indexing was part of the early SQL standard but is not part later standards, such as SQL2. We include indexing because MySQL does and because it can speed database searching significantly. For example, if we plan to use BloodType from table CurrentBioStudy in queries, we can create an index BloodType_index for the attribute, as follows:
create index BloodType_index
on CurrentBioStudy (BloodType);
We can also create an index at the same time we form a table, such as follows:
create table CurrentBioStudy (
	subject char(10) not null primary key,
	RhFactor enum(‘+’, ‘-‘),
	BloodType enum(‘M’, ‘N’, ‘MN’) not null,
       	index BloodType_index (BloodType)
	);
As the example illustrates, any indexed column must have the not null modifier to indicate that a value in such a field cannot be null.

If we plan to perform queries involving RhFactor and BloodType together, both of which must be not null, we can create an index for the combination by listing the attributes in parentheses, as follows:
create index RhBloodType_index
on CurrentBioStudy (RhFactor, BloodType);
MySQL generates an index on RhFactor and BloodType together and on RhFactor by itself.
The naming of indices allows us to later drop them, such as
drop index RhBloodType_index;
Indices create a time-space trade-off. An index makes searches faster but consumes space. If we know that it is unlikely that we will perform further searching on an indexed attribute, we can drop the index to free the space and to eliminate the time involved in indexing new tuples.

Quick Review Question
Quick Review Question 13 The primary key of LabSection from Quick Review Question 2 is the collection of attributes CatalogNumber, SecYear, semester, section. Check all indices that are automatically created by declaring this collection a primary key:
CatalogNumber, SecYear, semester, section
CatalogNumber, SecYear, semester
SecYear, semester, section
CatalogNumber, SecYear
SecYear, semester
semester, section
CatalogNumber
SecYear
semester
section

Quick Review Question
Quick Review Question 14 Suppose we also frequently search on the DragExpWeek column of table LabSection. Create an index for that column.


Backups

Making backups is extremely important, particularly considering the amount of data that most databases store. The mysqldump utility, which we access from the operating system prompt, not in MySQL, stores ANSI SQL commands to recreate a database or a table. For example, we can use mysqldump to store SQL statements to create the tables and insert the data from the PsyDB database. If the database is password protected, we must also designate the user (here root) after –u and the password (here dbpass) immediately after -p. The following utility call at the operating system prompt creates a file, PsyDBBkup.sql, of SQL statements:
mysqldump -u root -p dbpass PsyDB > PsyDBBkup.sql

PsyDBBkup.sql, which begins as follows, contains SQL statements to recreate every table and its data:

	# MySQL dump 5.13
	#
  	# Host: localhost   Database: PsyDB
  	#--------------------------------------------------------
  	# Server version 3.22.22
  	#
 	# Table structure for table 'PsyEvent'
 	#
  	CREATE TABLE PsyEvent (
  		EventCode int(11) DEFAULT '0' NOT NULL,
  		event char(30),
  		PRIMARY KEY (EventCode)
  	);
  	#
  	# Dumping data for table 'PsyEvent'
  	#
  	INSERT INTO PsyEvent VALUES (1,'\'Left press\'');
  	INSERT INTO PsyEvent VALUES (2,'\'Right press\'');
  	INSERT INTO PsyEvent VALUES (3,'\'Turn on all panel lights\'');
  	...
To backup the table PsyExp from that database, we give the database and table names after mysqldump, as follows:
mysqldump -u root -p dbpass PsyDB PsyExp > PsyExp.sql
To recreate the database or table, we can execute the commands in the file at the operating system prompt. If PsyDB exists, but the PsyExp no longer does, we can recreate the table and its data using redirection in UNIX, as follows:
mysql -u root -p dbpass PsyDB < PsyExp.sql
Because the backup files are in ASCII format, we can also search for data with which to perform a partial backup.
Quick Review Question
Quick Review Question 15 From where do we use the mysqldump utility?

Operating system prompt
MySQL prompt
neither


Quick Review Question
Quick Review Question 16
a. Complete the command to store in file ChemBkup.sql the SQL to restore the ChemExpDB database.

___(a)____ -u root -pdbpass ___(b)___ _(c)_ ____(d)_____

b. Complete the command to restore ChemExpDB from file ChemBkup.sql.

___(a)___ -u root -p dbpass ___(b)___ _(c)_ ____(d)____

To store only the data and table creation statements in separate files in a subdirectory, PsyDir, we use the -T option, as follows:
mysqldump -u root -p dbpass -T PsyDir PsyDB
The directory PsyDir contains two files for each PsyDB table. One file having the name of the table with the extension .sql contains the SQL statement to create, but not to load, the table. Another file with the extension .txt contains the tab-deliminted data from the file. For example, the following file, PsyExp.sql, has the create statement for table PsyExp:
	# MySQL dump 5.13
  	#
  	# Host: localhost Database: PsyDB
  	#--------------------------------------------------------
  	# Server version 3.22.22
  	#
  	# Table structure for table 'PsyExp'
  	#
  	CREATE TABLE PsyExp (
  		EventTime float(10,2) DEFAULT '0.00' NOT NULL,
  		EventCode int(11) DEFAULT '0' NOT NULL,
  		ExpNumber int(11) DEFAULT '0' NOT NULL,
  		PRIMARY KEY (EventTime)
  	);
The file PsyExp.txt of tab-delimited data follows:
	1.22	 2	3
	2.71	 5	3
	3.02	 1	3
	5.66	 2	3
	5.91	 5	3
	7.55	 2	3
	8.00	10	3
This file can be input data for another table, a different database, or a spreadsheet. In MySQL, we can use the load statement to insert the data into a table, as follows:
load data local infile 'PsyExp.txt' into table PsyExp;
The database administrator can also have the system generate an update log file by launching the mysql database server with the --log-update option.

Exercises

1. Write statements to do each of the following tasks:
    a. Create a database called anthropology
    b. Use that database
    c. Create a discovery table with the following fields: ItemID (integer key), DiscoveryDate, longitude and latitude (floating point of location of discovery), condition (enumeration type with possible values of EXCELLENT, VERYGOOD, GOOD, POOR, and VERYPOOR) with default value GOOD, and carbon14 (floating point amount of Carbon-14). A Geiger counter measures the amount of radioactive carbon-14 in the sample. Living material has a disintegration rate of about 13.5 atoms per minute per gram of carbon. When the organism dies, the disintegration rate decreases, enabling us to date the material.
    d. Change the table to include column ItemDescription (a possibly long string).
    e. Add three rows of data all at one time to the table
    f. Add one row that does not specify condition so that the field uses the default value.
    g. Update the first tuple to have a condition of POOR
    h. Remove the last row added to the table
    i. Assume more data has been added to the table. View the ItemIDs and descriptions of all items in poor condition.
    j. Count the number of items
    k. Compute the approximate age of the item as

2. For the psychology database of Quick Review Question 8, write a statement to display each event code (EventNumber) and the number of times that event occurs.

3. a. Example 1 of the "Relational Database" module computed an estimate of the molecular weight of a solute, as follows, where the masses are in grams:


Kf is a column in table ChemSolvent, and the other columns are in ChemExperiment. Write a statement to display the experiment number (ExpNumber) and the computed molecular weight for each solvent.
Write the entire statement, and click to view the answer.

b. Write a statement to return the estimated molecular weight for the experiment number 37.

Write the entire statement, and click to view the answer.

c. Write a statement to query the ChemKnowns relation to return any KnownCompound whose MolecularWeight (g/mole) is within 5 g/mole of the value computed from Part b.

Write the entire statement, and click to view the answer.


Projects


Develop a database for each of Projects 1-4 from module "Entity-Relationship Model", which was normalized and converted to a logical relational database model in module "Database Development". Keeping an online log of your work, perform a number of queries on the database.

1. Project 1 2. Project 2
3. Project 3 4. Project 4

Develop a database for each of Projects 5 and 6 from module "Relational Databases". Keeping an online log of your work, perform a number of queries on the database.

5. Project 1
6. Project 2 (yyy link) Click here for a data file.

7. Develop a psychology database as described in Quick Review Question 2 of module "Relational Databases". Keeping an online log of your work, perform a number of queries on the database. (yyy link) Click here for data files.

8. Develop a psychology database xxx Alliston Reid's program. Keeping an online log of your work, perform a number of queries on the database. (yyy link) Click here for data files.

Copyright © 2002, Dr. Angela B. Shiflet
All rights reserved