Creating and Changing Databases with SQL

We considered relational database queries in the module on "Accessing Databases with SQL."  In the module on "Updating Data with SQL," we discussed how to backup and restore relational databases and how to add, change, and delete the tuples of data in such databases.  After learning in other modules how to carefully design normalized databases, we are now ready to consider creating relational databases  and their tables using the Structured Query Language.  This discussion must include material on identifiers and data types for attributes.  Because maintenance of databases sometimes involves changing schemas, we also consider altering the structure of tables.


Database Creations

Chapter 4 of the book MySQL & mSQL by R. J. Yarger, G. Reese, and T. King has documentation on creating an empty database and administering a MySQL database with the recommended security and backup features.  The administrative user for MySQL can create a database with a create database statement.  As with all MySQL statements, a semicolon (;) ends the following example to create a database called BioStudy:

create database BioStudy;

 


Identifiers

SQL keywords are not case-sensitive.  Thus, select, Select, or SELECT accomplishes the same thing.  However, the identifier names, such as names of relations and columns that we create, are case sensitive.  Thus, table name solventtable is different than SolventTable.

Standard SQL allows identifier names to contain letters, digits, and underscores (_) and requires that they begin with a letter.  MySQL allows an identifier  to have length up to 64 characters.  The identifier should be descriptive of its role.  For example, if a relation is to contain student data, we might use the identifier student.  This name is self-documenting because it is descriptive.  The identifier s, however, could stand for just about anything.  A name should transmit as much information as possible to a reader of the commands, so that he or she can more easily correct and change the code.

Definition.    An identifier is self-documenting if the name is descriptive of the item's role.

In this module, we use lowercase for one-word identifier names, such as average.  A compound identifier, such as TaxRate, has the first letter of each word capitalized.  You might choose another convention.  Each programmer develops his or her own style over time.  The main criterion should be that the system used enhances code readability.  

Quick Review Question
Quick Review Question 1  Which are the following are not "legal" SQL identifiers?
a. RunNumber b. Run# c. run number
d. 2run e. run_number f. run2

Table Creation and Data Types

Before we can add data to a table, the table must exist.  Suppose we want to create the table CurrentBioStudy with primary key subject and other attributes of RhFactor and BloodType, as in the module on �Relational Databases.�   The schema is as follows:

         BioStudy_schema = (subject:string, RhFactor:{+, -}, BloodType:{M, N, MN})

The following SQL generates table CurrentBioStudy with this schema:

 create table CurrentBioStudy (

   subject   char(10) not null,

   RhFactor  enum(�+�, �-�),

   BloodType enum(�M�, �N�, MN�),

   primary key(subject)

);

 

Alternatively, we can place primary key on the same line as subject, as follows:

create table CurrentBioStudy (

   subject   char(10) not null primary key,

   RhFactor  enum(�+�, �-�),

   BloodType enum(�M�, �N�, �MN�)

};

 

The name for the new table (CurrentBioStudy) appears after create table.  A list of the column names and data types is in parentheses with commas separating the names.  The modifier not null indicates that null cannot be a value in the column subject.  After the list of column names and types or on the line with subject�s declaration, we indicate that this attribute is the primary key.  The data type char(10) allows for character strings of exactly 10 characters, padded for blanks as necessary.  RhFactor and BloodType are both of enumeration types, and their possible values appear in apostrophes.  

Frequently, attributes have numeric types.  In a chemical database, attribute FreezingPoint has floating point type float or real, while EventNumber might have integer type int.  The modifier unsigned, as in unsigned int, indicates that the value cannot be negative, such as for a counter.  Scientific data can be very large or small.  For such floating point numbers, we use the type double.

The EventTime attribute for a psychology schema probably would be of type time and would have standard time values.  DiscoveryDate would be an attribute of type date, the type for a standard date value.

Attribute subject above is of type char(10), a fixed length string of 10 characters.  If the value has less than 10 characters, such as �SubjectA�, the computer pads the string with blanks on the end to be exactly 10 characters. Values of greater length are truncated or chopped off to exactly 10 characters. 

The type of a variable length string of at most 100 characters is text(100) or varchar(100).  Thus, the value �SubjectA� occupies space for only 8 characters.  For text(n) the maximum string length n is at most 65,535, while for varchar(n), n is at most 255.  When we know the maximum length of a string, as in the case of subject, we should use char(n) because database searching is faster on fixed length as opposed to variable length strings.  Thus, char is used more often than text.

When sorting on a column of type char, text, or varchar, the database ignores case and sorts alphabetically as in a dictionary.  Thus, string �anthracene,� starting with a lowercase letter, sorts before �Benzene,� starting with a uppercase letter, even though uppercase letters occur before lowercase ones in the ASCII encoding scheme. Sorting the strings in C++ using regular ASCII string comparisons, the order would be reversed. 

         Table 1 summarizes some of the most frequently used data types.

Table 1.         Some SQL data types

Data Type

Meaning

char(n)

Fixed-length string of n characters, padded with blanks as necessary; n ≤ 255

date

Date; output format YYYY-MM-DD; input format YY-MM-DD, YYYY-MM-DD, YYMMDD, or YYYYMMDD

double

Double floating point; approximate range �10308; smallest positive value about 10-308

enum

Enumeration; list of values in parentheses, each value in apostrophes

float or real

Floating point; approximate range �1038; smallest positive value about 10-38

int

Integer; approximate range �2 billion

money

Money

text(n)

Variable-length string of at most n characters; n ≤ 65,535

time

Time; output format HH:MM:SS; input format HH:MM:SS, HHMMSS, HHMM, or HH

varchar(n)

Variable-length string of at most n characters; n ≤ 255

year

Year; output format YYYY or YY; input format YYYY or YY; YY indicates years between 1970 and 2069

To eliminate the necessity of specifying a frequently used field value, we can specify a default value.  For example, because most individuals are Rh-positive, we can give the attribute RhFactor the default value of �+�, as follows:

create table CurrentBioStudy (

   subject   char(10) not null primary key,

   RhFactor  enum(�+�, �-�) default �+�,

   BloodType enum(�M�, �N�, �MN�)

);

 

 

Quick Review Question
Quick Review Question 2  Suppose we want to write a statement to create a table LabSection that has the following schema:

(CatalogNumber: string, SecYear: year, semester: {F, S, S1, S2}, section: char, DragExpWeek:integer)

Answer the following questions concerning this task:
a. What are the first three words of the statement?
b. Suppose CatalogNumber has a maximum length of 9. Using all lowercase, give the type of CatalogNumber.
c. Of SecYear
d. Of section
e. Of DragExpWeek
f. What is the first word used to indicate semester is an enumeration type.

g. Suppose the possible values of semester are F, S, S1, and S2. What encloses each value in the type declaration?
Apostrophes (')
Quotation marks (")
Square brackets ([ ])
Nothing


h. What encloses the list of these values?
Square brackets
Parentheses
Pointed brackets < >
Nothing


i. What separates these values?
Colons
Commas
Semicolons
Blanks

j. We indicate that CatalogNumber, SecYear, semester, and section must each have values by writing what two words (in lowercase) after each of their types?
k. What two words (in lowercase) do we use that represents the fact that these identifiers are underlined in the schema?
Write the entire statement, and click to view the answer.

 


Deleting Tables

To save space, we should remove tables (drop table) that we no longer use.  To drop a table CurrentBioStudy, we use the following statement

 drop table CurrentBioStudy;

With this statement, all references to the table elsewhere, such as in constraints and foreign keys, are dropped.  

Quick Review Question
Quick Review Question 3  Write a statement to remove table LabSection and all references to the table from the database. Use lowercase, except for the table name, and only one blank between words.

Alterations

To change the structure or name of a database or table, we employ the alter table statement.  For example, to change the column RhFactor of table CurrentBioStudy to include the not null modifier, we have the following:

alter table CurrentBioStudy

change RhFactor

    RhFactor enum('+', '-') not null;  

The last line includes a complete redeclaration of the column.  RhFactor appears twice because we have the option of changing the column�s name, say to factor, as follows:

alter table CurrentBioStudy

change RhFactor

      factor enum('+', '-') not null;  


To add the column temperature with type float to table PsyExp, we perform the add action, as follows:

 alter table PsyExp add

      temperature float;  

To give the default value of 1 to column EventCode of table PsyExp, we alter the column with a set default, as follows:

alter table PsyExp alter EventCode set default 1;

We can remove the column, temperature, with the drop action, as the following illustrates:

alter table PsyExp drop temperature;

Similarly, we can drop the default value for column EventCode of table PsyExp, as follows:

alter table PsyExp alter EventCode drop default;

To rename table PsyExp as PsychologyExp, we write:

alter table PsyExp rename as PsychologyExp;  

Quick Review Question
Quick Review Question 4  In the following questions, use lowercase, except for names, and only one blank between words. Complete each of the following statements.
a. In table weather, add column wind with type int.

______(a)______ weather ____(b)____
    wind ___(c)___;
b. In table weather, change column wind to have the name WindVelocity and the type float.

______(a)_______ weather
___(b)____ wind
    ___________(c)_____________;
c. Remove column WindVelocity from table weather.

____(a)_____ weather _____(b)______ WindVelocity;

Auto-Increment

We can establish an integer attribute as the primary key and have MySQL automatically increment the value with each table insertion.  The auto_increment modifier can apply to at most one field. For example, the team entity in the physics laboratory database (yyy link) (Figure 16, �Entity-Relationship  Model� module) has attributes TeamNumber and TeamGrade. TeamNumber is an integer primary key, and sequential integers form reasonable values.  Thus, we create the team table as follows:

create table team (

     TeamNumber int  not null  primary key  auto_increment,

      TeamGrade  float

);  

To use the auto_increment feature, we use null or 0 as the value for TeamNumber, as follows:

insert into team

values (null, 86.5);  

If this insertion is the first one, the computer makes the TeamNumber value 1.  On subsequent insertions, the value is one more than the largest TeamNumber.


Indices

The primary key, such as subject in table CurrentBioStudy above, is an example of an attribute that has an associated index.  Indexing was part of the early SQL standard but is not part later standards, such as SQL2.  We include indexing because MySQL does and because it can speed database searching significantly.  For example, if we plan to use BloodType from table CurrentBioStudy in queries, we can create an index BloodType_index for the attribute, as follows:

create index BloodType_index

on CurrentBioStudy (BloodType);  

We can also create an index at the same time we form a table, such as follows:

create table CurrentBioStudy (

    subject   char(10) not null primary key,

     RhFactor  enum(�+�, �-�),

    BloodType enum(�M�, �N�, �MN�) not null,

     index BloodType_index (BloodType)

);  

As the example illustrates, any indexed column must have the not null modifier to indicate that a value in such a field cannot be null.

If we plan to perform queries involving RhFactor and BloodType together, both of which must be not null, we can create an index for the combination by listing the attributes in parentheses, as follows:

create index RhBloodType_index

on CurrentBioStudy (RhFactor, BloodType);  

MySQL generates an index on RhFactor and BloodType together and on RhFactor by itself.

The naming of indices allows us to later drop them, such as

drop index RhBloodType_index;  

Indices create a time-space trade-off.  An index makes searches faster but consumes space.  If we know that it is unlikely that we will perform further searching on an indexed attribute, we can drop the index to free the space and to eliminate the time involved in indexing new tuples.

Quick Review Question
Quick Review Question 5 The primary key of LabSection from Quick Review Question 2 is the collection of attributes CatalogNumber, SecYear, semester, section. Check all indices that are automatically created by declaring this collection a primary key:
CatalogNumber, SecYear, semester, section
CatalogNumber, SecYear, semester
SecYear, semester, section
CatalogNumber, SecYear
SecYear, semester
semester, section
CatalogNumber
SecYear
semester
section
Quick Review Question
Quick Review Question 6  Suppose we also frequently search on the DragExpWeek column of table LabSection. Create an index for that column.

Exercises

1. Write statements to do each of the following tasks:
    a. Create a database called anthropology
    b. Create a discovery table with the following fields: ItemID (integer key), DiscoveryDate, longitude and latitude (floating point of location of discovery), condition (enumeration type with possible values of EXCELLENT, VERYGOOD, GOOD, POOR, and VERYPOOR) with default value GOOD, and carbon14 (floating point amount of Carbon-14). A Geiger counter measures the amount of radioactive carbon-14 in the sample. Living material has a disintegration rate of about 13.5 atoms per minute per gram of carbon. When the organism dies, the disintegration rate decreases, enabling us to date the material.
    c. Change the table to include column ItemDescription (a possibly long string).
    d. Add one row that does not specify condition so that the field uses the default value.
   
2. Write a statement to create the relation ChemExperiment, having 30 characters for SolventName and having the following structure:

ExpNumber

SoluteMass

SolventName

SolventMass

FP

          Click here to see the answer.


Projects

Develop a database for each of Projects 1-4 from module "Entity-Relationship Model", which was normalized and converted to a logical relational database model in module "Database Development". Keeping an online log of your work, perform a number of queries on the database.

1. Project 1 2. Project 2
3. Project 3 4. Project 4

Develop a database for each of Projects 5 and 6 from module "Relational Databases". Keeping an online log of your work, perform a number of queries on the database.

5. Project 1
6. Project 2  Click here for a data file.

7. Develop a psychology database as described in Quick Review Question 2 of module "Relational Databases". Keeping an online log of your work, perform a number of queries on the database. Click here for data files.

8. Develop a psychology database Alliston Reid's program. Keeping an online log of your work, perform a number of queries on the database. Click here for data files.

Copyright � 2002, Dr. Angela B. Shiflet
All rights reserved