Relational Databases

In the module on "Database Introduction," we gave an overview of relational databases. In the current module, we develop this data model more carefully and include its mathematical basis and notation. For clarity, we will repeat and expand some definitions.

The mathematical structure of relational databases results in great flexibility than some other database models by allowing queries that the database designers did not originally envision. At the heart and even the name of relational databases is the concept of relation, which depends on the concepts of ordered tuples and Cartesian product. Some mathematics course use the concept of ordered pairs of numbers: When asked to graph y = x² over the set of real numbers, Ā, we plot S = {(x, y) | y = x², x Ā}, the set of all ordered pairs of real numbers where the second coordinate is the square of the first. The ordered pair (3, 9) S, because 3² = 9; but (9, 3) Ļ S, because 9² 3. S is a subset of the Cartesian product Ā ´ Ā, which is the set of all ordered pairs of elements where the first and second elements are from Ā.

The first and second sets of the cross product do not have to be the same. Suppose two students are picked at random as subjects for a science class to examine various characteristics. The set P = {SubjectA, SubjectB} holds the designations for the subjects. One characteristic to examine is whether a person is Rh⁺ or Rh^-. We define the set Rh = {+, -}. An ordered pair consisting of the person's designation followed by + or - indicates the subject and his or her Rh type. A listing of all possible pairs is the Cartesian product of P and Rh:

P´Rh = {(SubjectA, +), (SubjectA, -), (SubjectB, +), (SubjectB, -)}.

We can take the Cartesian product of several sets. For example, another characteristic is the blood type of M, N, or MN. We let T = {M, N, MN}. The Cartesian product of P, Rh, and T, P´Rh´T, has (2)(2)(3) = 12 elements and is

P´Rh´T =

{(SubjectA, +, M), (SubjectA, +, N), (SubjectA, +, MN),
(SubjectA, -, M), (SubjectA, -, N), (SubjectA, -, MN),
(SubjectB, +, M), (SubjectB, +, N), (SubjectB, +, MN),
(SubjectB, -, M), (SubjectB, -, N), (SubjectB, -, MN)}.

Definition

Let A₁, A₂, A₃,..., A_n be sets. The Cartesian or cross product of these sets is

That is, the Cartesian product of sets is the set of all ordered n-tuples or tuples where each coordinate comes from the corresponding set.

The Cartesian product and the Cartesian coordinate system are named after René Descartes, who invented analytical geometry in the 17^th century. Descartes, of fragile health, was fond of spending most of the morning in bed, resting and thinking. This habit of a lifetime was disrupted when he was invited by the Queen of Sweden to Stockholm. Her demands to work with him in the early morning are said to have contributed to his premature death within a few months from pneumonia.

In mathematics or relational databases, a relation is just a subset of the Cartesian product of sets; it has some, but not necessarily all, the elements in the Cartesian product. Thus, {(x, y) | y = x², x

Ā} is a relation on Ā ´ Ā, and this set indicates how y is related to x. The set CurrentBioStudy = {(SubjectB, -, N), (SubjectA, +, M)}, giving specific relationship among the people, their Rh factor, and their blood types, is a subset of the Cartesian product of P, Rh, and T, P´Rh´T, and thus is a relation on P´Rh´T. This relation has degree 3 because the relation is a subset of the Cartesian product of 3 sets.

Definition

Let A₁, A₂, A₃,..., A_n be sets. A relation on

is a subset of

. The degree of this relation is n.

We can picture a relation as a two-dimensional table, such as follows:

*CurrentBioStudy*
*subject*	*RhFactor*	*BloodType*
SubjectB	+	N
SubjectA	+	M

A tuple, such as (SubjectB, -, N), corresponds to a row of the table. A column heading, such as RhFactor, is an attribute; and the data type or the set of all possible values of a possible values for an attribute, such as {+, -} for RhFactor, is the domain of the attribute. Moreover, the values in a domain must be atomic or indivisible with respect to the model. Thus, a cell in a table contains a single value, such as +, that we do not subdivide. In terms of an ER diagram, a tuple is an entity with three attributes, as the following illustrates:

Definition

An attribute corresponds to a column in a relation. A value is atomic if it is indivisible. The domain of an attribute is the set of all possible values of that attribute; each value must be atomic.

A relation, such as CurrentBioStudy, is a particular instance of a relation schema, which we might call BioStudy_schema. For a relation or table, the relation schema describes the column headings for a table. The schema (such as BioStudy_schema) is a list of attributes (such as subject, RhFactor, and BloodType) with their corresponding domains (such as string, {-, +}, and {M, N, MN}). (xxx should types be char and string instead of an enum type?) We can denote the relation schema BioStudy_schema as the name with the attributes and their domains in parentheses, as follows:

BioStudy_schema = (subject:string, RhFactor:{+, -}, BloodType:{M, N, MN})
For the tuple t = (SubjectB, +, N) in relation CurrentBioStudy, "SubjectB" is the value corresponding to the subject attribute; the value of the RhFactor attribute is +; and "N" is the instance of the BloodType attribute.

Definition

A relation schema S is a list of attributes, A₁, A₂, A₃,..., A_n, with their corresponding domains, D₁, D₂, D₃,..., D_n, written as

S = (A₁:D₁, A₂: D₂, A₃: D₃,É, A_n: D_n)

Quick Review Questions

A psychology relational database stores data from a number of experiments. One conditioning experiment with rats involves two-lever boxes. The following is a list of possible events and their event codes:

Code	Event
1	Left press
2	Right press
3	Turn on all panel lights
4	Turn off all panel lights
5	Pellet delivery
6	Retract right lever
7	Insert right lever
8	Pulse tone on
10	End of session

The computer monitors the these events along with the times of occurrence Data from a session might be as follows:

		2     0.10
 		2     1.48
		5     1.48
		2     3.57
		1     4.72
		5     4.72
		.
 		.
		.
		2  1334.25
		2  1335.40
		1  1336.72
		1  1337.10
		5  1337.10
		10  1337.71

Each rat has its own integer identification number, such as 23.

Quick Review Question 1 Complete the relation...

PsyExp_schema = (EventNumber: a , time: b , SubjectID: c )

A
integer	int	float
double	real	string
bool	boolean	struct

B
integer	int	float
double	real	string
bool	boolean	struct

C
integer	int	float
double	real	string
bool	boolean	struct

Quick Review Question 2 Give a schema, PsyEventCode_schema, for relations that store events (event) along with their codes (code). Have exactly one blank before and after = and after each comma.

Properties of Relations
The following is a list of properties of relations:

A value of an attribute is atomic.
Attribute names in a relation are distinct.
The order of tuples in a relation is irrelevant. Thus, because CurrentBioStudy is a set, we have the same relation if (SubjectB, -, N) is the first row of the table CurrentBioStudy and (SubjectA, +, M) is the second row, or vice versa.
Tuples are distinct. Thus, a table cannot have two identical rows.

In the relation schema BioStudy_schema, each human subject has a unique designation, or unique value of subject. We say that the attribute subject is a superkey. A superkey is an attribute or set of attributes that uniquely identifies a particular tuple in a relation.

We could use {subject, RhFactor}, {subject, BloodType}, or even {subject, RhFactor, BloodType} as a superkey because the values of the attributes uniquely identifies a tuple. When several attributes make up the superkey, we call the superkey a composite key. As another example, suppose instead of using generic titles, such as SubjectA, we use the subjectÕs first, middle, and last names. Assuming no two people have the same name, the set of attributes, {FirstName, MiddleName, LastName}, forms a composite key. In this case, this superkey is a candidate key because no proper subset is a superkey. In the earlier version, {subject, RhFactor} is not a candidate key because the proper subset {subject} is a key. A candidate key has the properties of uniqueness and irreducibility. The key uniquely identifies a tuple, and no proper subset uniquely identifies a tuple.

A relation can have more than one candidate key. For example, our relation could also have attributes for the subject as well as the name. The attribute subject and the set of attributes {FirstName, MiddleName, LastName} are both candidate keys. If we use subject to uniquely identify tuples, then subject becomes the primary key or key. In fact, we would be wise to pick SSN instead of {FirstName, MiddleName, LastName} because of the possibility that for some class two people could have the same name. In a relation schema, we underline the primary key, such as follows:

(subject, FirstName, MiddleName, LastName, RhFactor, BloodType)

Because we use the primary key to identify tuples, clearly the primary key cannot have a value of null. We call this requirement the entity integrity constraint.

Definition

A superkey is an attribute or set of attributes that uniquely identifies a particular tuple in a relation. A candidate key is a superkey in which no proper subset is a superkey. The primary key or key is the candidate key that we choose to uniquely identify tuples. The entity integrity constraint requires that no primary key value be null.

Quick Review Questions

Quick Review Quesion 3 To revise the schema PsyExp_schema = (EventNumber:integer, time:float, SubjectID:integer) of QRQ 1 to indicate that time is a primary key, we do which of the following?:

boldface time	boldface time:float	underline time
underline time:float	change time to TIME

Quick Review Quesion 4 The primary key(s) of schema PsyEventCode_schema = (code:integer, event:string) of QRQ 2 is (are):

code

event

Quick Review Quesion 5 Check all possible superkeys of PsyExp_schema = (EventNumber:integer, time:float, SubjectID:integer).

time	EventNumber	SubjectID
{time, SubjectId}	{time, EventNumber}	{EventNumber, SubjectId}
{EventNumber, time, SubjectId}

Example 1 (Idea for Example from Teacher's Manual to Accompany Laboratory Manual General Chemistry Principles and Structure, Jo A. Beran and James E. Brady, John Wiley & Sons, Inc., New York, 1978, p. 76)

The molecular weight of a non-volatile nonelectrolyte can be determined by dissolving this solute in an organic solvent, such as benzene, which has a freezing point of f.p. = 5.5^oC and a freezing point depression constant of Kf = 5.12^oC kg/mole. (The freezing point depression constant for a solvent is how much the freezing point of a solution is lowered when one kg of solvent is dissolved in one mole of the solute.) We record temperature versus time for the solution as it cools. The temperature when the graph levels off is the freezing point. For example, suppose we dissolve a 13.5-gram sample of nonelectrolyte in 100 grams of benzene, and the solution freezes at 2.95^oC. The molecular weight of the nonelectrolyte is 234 g/mole, as the following shows:

Suppose a database stores experimental data and known molecular weights for a number of compounds. One relation, ChemKnowns, stores the names of compounds (KnownCompound) and their floating point molecular weights (MolecularWeight). Consequently, the relation schema for ChemKnowns is

ChemKnowns_schema = (KnownCompound:string, MolecularWeight:float)
Because each compound has a unique name, the key for the ChemKnowns relation is the attribute KnownCompound. A partial listing of the relation is follows:

ChemKnowns

*KnownCompound*	*MolecularWeight*
Anthracene	178.23
Benzene	78.11
Benzil	210.23
Benzophenone	182.22
Biphenyl	154.21
p-Dichlorobenzene	147.00

Although the tuples appear with the key in alphabetical order, there is no requirement that they be in such an order.

Another relation, ChemSolvents, stores the names of possible organic solvents (SolventName) along with each freezing point (FreezingPoint) and freezing point depression constant (Kf). The relation schema follows:

ChemSolvents_schema = (SolventName: string, FreezingPoint: float, Kf:float)
Again, the name attribute, SolventName, is the key for the relation. A partial list of this relation follows:

ChemSolvents

*SolventName*	*FreezingPoint*	Kf
Benzene	5.5	5.12
Cyclohezane	6.5	20.2
p-Dichlorobenzene	53	7.1
Glacial acetic acid	17	3.9

Although the solvent name "Glacial acetic acid" has three words, the value is atomic. In the relation, we do not split the value into three parts.

The experiments have distinct identification numbers, such as 10. To determine the molecular weight of an unknown solute, we record the mass of the solute in grams, solvent name and its mass in kg, and the freezing point of the solution. We have a relation, ChemExperiments, for this data with schema

ChemExperiments_schema = (ExpNumber:integer, SoluteMass:float, SolventName:string, SolventMass: float, FP: float)
Notice that SolventName is the key in the relation ChemSolvents. In this case, we call SolventName a foreign key in the relation ChemExperiments. Even if the attribute names had not been identical, they would match. For example, we could refer to CompoundName in ChemSolvents and SolventName in ChemExperiments, but SolventName would still be a foreign key.

By the referential integrity constraint, any value (other than null) for a foreign key must appear as a candidate key value in the other relation. A tuple in one relation refers to a tuple, which must exist, in another relation. We indicate the referential integrity constraint on the database schema by drawing an arrow from the foreign key to the attribute in the relation that it references, such as follows:

ChemKnowns_schema

KnownCompound

MolecularWeight

ChemSolvents_schema

SolventName

FreezingPoint

ChemExperiments_schema

ExpNumber

SoluteMass

SolventName

SolventMass

Later, we will use such foreign keys to make connections among the data. We will return to this example as we develop additional database concepts.

Definition

A foreign key is an attribute or set of attributes in a relation that matches a candidate key in a relation. The referential integrity constraint states if relation B has a foreign key k from relation A, then each value of k in B (other than null) must appear as a value for the corresponding candidate key in some tuple of A.

Quick Review Question

Quick Review Quesion 6 Check any foreign key in PsyExp_schema = (EventNumber:integer, time:float, SubjectID:integer) of QRQ 3 and that matches a key in PsyEventCode_schema = (code:integer, event:string) of QRQ 4.

EventNumber	time	SubjectID
EventNumber and SubjectId,	None of the above

Selection Operation
One of the most common operations on a relation is to select a subset of the tuples or rows. This selection operation, symbolized with the lowercase Greek letter s (s), employs a selection condition or criterion for determining which tuples to choose. The general notation for the operation is s_{selection condition}(relation). For example, to retrieve the molecular weight of Benzene, we select the tuple from the ChemKnowns relation that has a KnownCompound value of "Benzene". We can write the operation mathematically as s_{KnownCompound
= Benzene}(ChemKnowns). The operation will return the tuple (Benzene, 78.11).

Definition

The selection operation s_{selection condition}(relation) returns a subset of the tuples from the relation that satisfy the selection condition.

Quick Review Question

Projection Operation
Instead of picking a subset of the rows, we often need to select a subset of columns or attribute values. To do so, we use the projection operation, which we symbolize using the lowercase Greek letter p. For example, to obtain a list of the compounds whose molecular weight is in the chemistry database, we have the notation p_{KnownCompound}(ChemKnowns). The operation returns a relation with only the KnownCompound attribute column, as follows:

p_{KnownCompound}(ChemKnowns)

KnownCompound

Anthracene

Benzene

Benzil

Benzophenone

Biphenyl

p-Dichlorobenzene

Suppose we want a table of the solvent names and their freezing points, such as follows:

*SolventName*	*FreezingPoint*
Benzene	5.5
Cyclohezane	6.5
p-Dichlorobenzene	53
Glacial acetic acid	17

In this case, we list both attributes in the subscript of p and have the relation name ChemSolvents in parentheses: p_{SolventName,FreezingPoint}(ChemSolvents). If we choose to name this new table SolventFP, we can employ the following left-arrow notation:

SolventFP ¬ p_{SolventName,FreezingPoint}(ChemSolvents)

Quick Review Questions

Union Operations
We can perform set operations, such as union, intersection, difference, and Cartesian product, on relations. For the first three operations, we need relations with tuples that are of the same type, in which case the relations are union compatible. To be union compatible, the relations must have the same number of attributes and corresponding domains must be the same. However, corresponding attribute names do not have to be identical.

Definition

Two relations R and S are union compatible if they have the same degree n and corresponding attributes have the same domain.

The relations ChemKnowns and ChemSolvents are not union compatible because the former has two attributes, while the later has three. However, the projections of these relations to their first attributes with domains of string are union compatible. We call these new relations, which follow, KnownNamesTable and SolventNamesTable.
KnownNamesTable ¬ p_{KnownCompound}(ChemKnowns)
SolventNamesTable ¬ p_SolventName(ChemSolvents

KnownNamesTable SolventNamesTable

KnownCompound SolventName

Anthracene Benzene

Benzene Cyclohezane

Benzil p-Dichlorobenzene

Benzophenone Glacial acetic acid

Biphenyl

p-Dichlorobenzene

The union, , of two union-compatible relations is a relation consisting of all the tuples from the relations with duplicates omitted. For example, to obtain the names of all compounds, knowns and solvents, in the (yyy link) chemical database, we perform the union of KnownNamesTable and SolventNamesTable, KnownNamesTable SolventNamesTable, which yields the following relation:

KnownNamesTable SolventNamesTable

Anthracene

Benzene

Benzil

Benzophenone

Biphenyl

p-Dichlorobenzene

Cyclohezane

Glacial acetic acid

Notice that Benzene and p-Dichlorobenzene each appear in both tables but are present only once in the union because duplicate tuples do not appear.

Definition The union of two union-compatible relations R and S, written R S, is a relation consisting of all the tuples from the relations with duplicates omitted.

Quick Review Questions

Quick Review Quesion 10  Perform projections on relations ChemKnowns and ChemExperiments using the maximum number of attributes so that the resulting relations are assignment compatible. Explain why it is not reasonable to take the union of these tables although the results are union compatible.

Quick Review Quesion 11   Suppose relations study5 and study8 have schemas BioStudy_schema = (subject:string, RhFactor:{+, -}, BloodType:{M, N, MN}). Suppose study5 has 15 tuples and study8 has 10.

a. How many attributes are in study5 study8?

b. Give the least number of tuples in this union.

c. Give the most number of tuples in this union.

Intersection Operation
Suppose we want a list of solvents for which we have the molecular weight information that appears in ChemKnowns. In this case, we perform the intersection of KnownNamesTable and SolventNamesTable, KnownNamesTable SolventNamesTable, which yields the following relation:

KnownNamesTable SolventNamesTable

Benzene

p-Dichlorobenzene

Definition The intersection of two union-compatible relations R and S, written R S, is a relation consisting of only the tuples that appear in both relations.

Quick Review Questions

Quick Review Question 12  For relations study5 and study8 of QRQ 11, give the least and most number of tuples in study5 study8.

Difference Operation
Perhaps, we need to know the solvents for which the database does not store their molecular weights. In this case, we start with the SolventNamesTable and take out the solvent names that also appear in the KnownNamesTable. This set difference operation, KnownNamesTable - SolventNamesTable, returns the following relation that does not contain Benzene or p-Dichlorobenzene:

KnownNamesTable - SolventNamesTable

Anthracene

Benzil

Benzophenone

Biphenyl

Definition The difference of two union-compatible relations R and S, written R - S, is a relation consisting of the tuples that appear in R but not in S.

Quick Review Question

Quick Review Quesion 13  For relations study5 and study8 of QRQ 11, give the least and most number of tuples in study5 - study8.

Cartesian Product Operation
The set operation of Cartesian product does not require that the operands be union compatible. For example, the relation ChemSolvents has three attributes (SolventName, FreezingPoint, and Kf), and the relation ChemKnowns has two (KnownCompound and MolecularWeight). The Cartesian product of ChemSolvents and ChemKnowns, ChemSolvents ChemKnowns, has all five attributes. Every one of the 4 tuples in ChemSolvents is combined with every one of the 6 tuples of ChemKnowns to form a relation with 24 tuples. This new relation, CrossTable, follows in part:

CrossTable <-- ChemSolvents ChemKnowns

SolventName FreezingPoint Kf KnownCompound MolecularWeight

Benzene 5.5 5.12 Anthracene 178.23

Benzene 5.5 5.12 Benzene 78.11

Benzene 5.5 5.12 Benzil 210.23

Benzene 5.5 5.12 Benzophenone 182.22

Benzene 5.5 5.12 Biphenyl 154.21

Benzene 5.5 5.12 p-Dichlorobenzene 147.00

Cyclohezane 6.5 20.2 Anthracene 178.23

Cyclohezane 6.5 20.2 Benzene 78.11

.
.
. .
.
. .
.
. .
.
. .
.
.

Definition The Cartesian product of two relations R and S, written R S, is a relation consisting of the tuples with attributes from R and S, where each tuple of R is paired with every tuple of S.

However, we probably are only interested in the subset or selection of the tuples in the CrossTable relation where the SolventName and KnownCompound agree. We store these in a relation KnownSolventTable, as follows:

SelectCrossTable <-- s_{SolventName
= KnownCompound}(CrossTable)

SolventName FreezingPoint Kf KnownCompound MolecularWeight

Benzene 5.5 5.12 Benzene 78.11

p-Dichlorobenzene 53 7.1 p-Dichlorobenzene 147.00

To eliminate the duplicate columns, we perform a projection with all of the attributes except KnownCompound, as follows:

KnownSolventTable <-- p_{SolventName,
FreezingPoint, Kf, MolecularWeight}(SelectCrossTable)

SolventName FreezingPoint Kf MolecularWeight

Benzene 5.5 5.12 78.11

p-Dichlorobenzene 53 7.1 147.00

This process of performing the Cartesian product, selecting tuples that match in a particular attribute, and projecting to eliminate duplicate columns is so common that another relational database operation, join, accomplishes the result in one step.

Quick Review Question

Quick Review Quesion 14

a. How many attributes are in the Cartesian product of ChemSolvents, which has 3 attributes, and ChemExperiments, which has 5 attributes?

b. Suppose ChemExperiments has 30 tuples, and ChemSolvents has 6. How many tuples are in the Cartesian product of ChemSolvents and ChemExperiments?

Join Operation
We just used the Cartesian product and selection to obtain a table SelectCrossTable of related tuples from two relations. Because of its utility, such a sequence of operations is combined into a single operation, join. The join of relations R and S, written R|X|_conditionS, is the set of tuples from the Cartesian product of R and S that satisfy condition. The condition compares an attribute of R with an attribute of S using one of the relational operators (=, , <, ², >, ³). Thus, to generate the relation SelectCrossTable, we perform the join operation

SelectCrossTable <-- ChemSolvents|X|_{SolventName
= KnownCompound}ChemKnowns
The join where the condition is a test of equality is used so often that we give it a special name, equijoin. As with the Cartesian product followed by selection, the equijoin generates a table with two identical columns. We eliminated this duplication above by projecting the table on all attributes except the repeated one(s). Equijoin with elimination of duplication is so common that we give the operation a special name, natural join, and notation, *. Thus, to obtain KnownSolventTable from ChemSolvents and ChemKnowns, we can perform the natural join, as follows:

KnownSolventTable ¬ChemSolvents *_{SolventName
= KnownCompound} ChemKnowns

Definition The join of relations R and S, written R |X|_condition S, is the set of tuples from the Cartesian product of R and S that satisfy condition. The condition compares an attribute of R with an attribute of S using one of the relational operators (=, , <, ², >, ³). An equijoin of relations is a join where condition is a test of equality. A natural join of R and S, written R *_{RA = SA} S, is the projection of R |X|_{RA = SA} S to eliminate duplicate attributes.

Quick Review Question

Quick Review Quesion 15   From the psychology database, relation EventCode is the table of Quick Review Question 1 and relation.

session

EventNumber time SubjectID

2 1.22 3

5 2.71 3

1 3.02 3

2 5.66 3

5 5.91 3

2 7.55 3

10 8.00 3

a. Give the relation session |X|_{EventNumber=code} EventCode.

b. Give the relation session *_{EventNumber=code} EventCode.

Exercises
1. Consider the psychology relational database of QRQ 1 and QRQ 15. Give the relational operation and the resulting relation for each of the following queries.

a. Retrieve the meaning of code 4.
b. For relation session, obtain a list of all event codes.
c. For session, list all events that occurred.
d. Give a table of all the tuples in session where the event occurred before time 4.00.
2. Consider the biology relational database with relation schema BioStudy_schema and relation CurrentBioStudy from the beginning of this module. Suppose schema details_schema is as follows:

details_schema = (DetailSubject:string, sex:{F, M})
Relation CurrentDetails follows:

DetailSubject Sex

SubjectA M

SubjectB F

SubjectC F

SubjectD M

Give the relational operation and the resulting relation for each of the following queries.

a. Generate a relation with the information from CurrentBioStudy along with the sex of the subject.
b. Obtain a list of Rh^- female subjects in the current study.

Suppose PreviousBioStudy has schema BioStudy_schema and is the following:

Subject RhFactor BloodType

SubjectD + MN

SubjectC - M

SubjectB + N

SubjectE + N

c. Take the union of PreviousBioStudy and CurrentBioStudy.
d. Take the intersection of PreviousBioStudy and CurrentBioStudy.
e. Take the difference PreviousBioStudy - CurrentBioStudy.
f. Take the difference CurrentBioStudy - PreviousBioStudy.

3. Consider the chemical relational database of the Example in this module. Give the relational operation and the resulting relation for each of the following queries.

a. Obtain the molecular weight of Anthracene.
b. Return a relation with the information of ChemExperiments along with the freezing point and K_f of the solvent.

4. Suppose relations exp25 and exp3 have schemas PsyExp_schema in the psychology database of QRQ 1. An abbreviated listing of the relations follows:

exp25 exp3

EventNumber time SubjectID EventNumber time SubjectID

2 .10 25 2 1.22 3

2 1.48 25 5 2.71 3

5 1.48 25 1 3.02 3

2 3.57 25 2 5.66 3

1 4.72 25 5 5.91 3

5 4.72 25 2 7.55 3

2 6.81 25 10 8.00 3

1 7.63 25

10 8.00 25

a. How many attributes are in exp25 exp3?
b. How many tuples are in this union?
c. Give the intersection.
d. Give exp25 - exp3.
e. Give exp3 - exp25.
f. How many attributes are in the Cartesian product?
g. How many tuples are in the Cartesian product?

Projects

1. Draw a relational database schema for the physics database in the Entity Relation Module.

2. Below is a description of a file that holds star data. (The description and data are from Dr. Daniel Welch, Wofford College.) Give a relational database schema for the data.

The first few data items are as follows:

AND,3,f|S|K0,23:04:11,50:03,4.64,2000
AND,51,f|S|K3,01:37:59,48:37,3.57,2000
AND,58,f|S|A5,02:08:29,37:51,4.8,2000
AND,60,f|S|K4,02:13:13,44:14,4.83,2000
AND,65,f|S|K4,02:25:37,50:17,4.72,2000
AND,7,f|S|F0,23:12:33,49:25,4.52,2000
AND,8,f|D|M2,23:17:44,49:01,4.83,2000
AND,Alpheratz,f|D|B8,00:08:23,29:05,2.07,2000
AND,Chi,f|S|G8,01:39:21,44:24,4.98,2000
AND,Delta,f|D|K3,00:39:20,30:52,3.25,2000

A description of the data follows:

*
* Each major category begins with a line starting with # for easy searching.
*
* Each object occupies one line. Fields are separated with commas. Some fields
* are further subdivided into subfields with vertical bars (|). Lines
* beginning with anything other than a-z, A-Z or 0-9 are ignored and may be
* used for comments. Objects may be in any order.
*
* all date fields may be in either of two forms:
* 1) month/day/year, where day may contain a trailing decimal portion.
* examples: 1/1/1993 and 1/1.234/1993
* 2) a real-number, such as 1993.123.
*
* The first three fields are always Constellation's Name, Object's Name, and Type.
* Remaining fields depend on the first entry within the Type field.
*
* The first field abbreviates the constellation name, such as AND for Andromeda.
* The second field is the star name within the constellation.
*
* The Type field always starts with a single letter designating the form of
* the object's motion:
* f: fixed (no proper motion)
* e: heliocentric elliptical orbit
* h: heliocentric hyperbolic orbit
* p: heliocentric parabolic orbit
*
* if "Type" is fixed, an object class code may follow in the next subfield:
* C: Cluster, globular
* U: Cluster, with nebulosity
* O: Cluster, open
* G: Galaxy, spiral
* H: Galaxy, spherical
* A: Cluster of galaxies
* N: Nebula, bright
* F: Nebula, diffuse
* K: Nebula, dark
* P: Nebula, planetary
* Q: Quasar
* T: stellar object
* B: Star, binary
* D: Star, double
* M: Star, multiple
* S: Star
* V: Star, variable
*
* if "Class" is one of T, B, D, S or V, the spectral class and possibly the
* numerical subclass designation may follow in the next subfield:
* O, B, A, F, G, K, M, N, C, S
*
* Following Type, the remaining fields are defined as follows:
*
* elliptical format (e < 1):
* i = inclination, degrees
* O = longitude of ascending node, degrees
* o = argument of perihelion, degrees
* a = mean distance (aka semi-major axis), AU
* n = mean daily motion, degrees per day (computed from a**3/2 if omitted)
* e = eccentricity,
* M = mean anomaly (ie, degrees from perihelion),
* E = epoch date (ie, time of M),
* D = the equinox year (ie, time of i/O/o).
* g/k or H/G = magnitude model; select which by preceding the first field
* with either a "g" or an "H"; H/G is the default if neither is given.
* s = angular size at 1 AU, arc seconds, optional
*
* hyperbolic format (e > 1):
* T = epoch of perihelion
* i = inclination, degrees
* O = longitude of ascending node, degrees
* o = argument of perihelion, degrees
* e = eccentricity,
* q = perihelion distance, AU
* D = the equinox year (ie, time of i/O/o).
* g/k = magnitude model
* s = angular size at 1 AU, arc seconds, optional
*
* parabolic format (e == 1):
* T = epoch of perihelion
* i = inclination, degrees
* o = argument of perihelion, degrees
* q = perihelion distance, AU
* O = longitude of ascending node, degrees
* D = the equinox year (ie, time of i/O/o).
* g/k = magnitude model
* s = angular size at 1 AU, arc seconds, optional
*
* fixed format:
* ra, hours
* dec, degrees
* magnitude
* reference epoch
* s = angular size, arc minutes, optional

Copyright © 2002, Dr. Angela B. Shiflet
All rights reserved

*KnownNamesTable*	*SolventNamesTable*
*KnownCompound*	*SolventName*
Anthracene	Benzene
Benzene	Cyclohezane
Benzil	p-Dichlorobenzene
Benzophenone	Glacial acetic acid
Biphenyl
p-Dichlorobenzene

SolventName	FreezingPoint	Kf	KnownCompound	MolecularWeight
Benzene	5.5	5.12	Anthracene	178.23
Benzene	5.5	5.12	Benzene	78.11
Benzene	5.5	5.12	Benzil	210.23
Benzene	5.5	5.12	Benzophenone	182.22
Benzene	5.5	5.12	Biphenyl	154.21
Benzene	5.5	5.12	p-Dichlorobenzene	147.00
Cyclohezane	6.5	20.2	Anthracene	178.23
Cyclohezane	6.5	20.2	Benzene	78.11
. . .	. . .	. . .	. . .	. . .

Subject	RhFactor	BloodType
SubjectD	+	MN
SubjectC	-	M
SubjectB	+	N
SubjectE	+	N

	exp25			exp3
EventNumber	time	SubjectID	EventNumber	time	SubjectID
2	.10	25	2	1.22	3
2	1.48	25	5	2.71	3
5	1.48	25	1	3.02	3
2	3.57	25	2	5.66	3
1	4.72	25	5	5.91	3
5	4.72	25	2	7.55	3
2	6.81	25	10	8.00	3
1	7.63	25
10	8.00	25