Definition | A database is an integrated collection of files. |
Definition | There is data inconsistency between the files if the same data is stored in different formats in two files or if data must be matched between files. |
Definition | Redundant data is data that is duplicated between files. |
Definition | Data integrity is the assurance that data is correct and consistent--that the data correctly reflects the "real" world. |
Definition | Data security refers to data being protected so that only authorized personnel can access them. |
Definition | There is data dependence if the format of the data storage is dependent upon the application program. |
![]() |
In the 1940s, punch cards and paper tape stored files with data encoded using holes in the medium. Magnetic tape was introduced in the early 1950s as an external storage medium for the UNIVAC, the first commercially produced computer. Some people would not believe that information was recorded on the tape, because they could not see holes as they did on punched cards. One plant worker, who did not know about computer tape, received a delivery of tape he thought was no good because it had no adhesive back! |
Definition | A database management system (DBMS) is database software that contains two major functional parts, a data manipulation language (DML) or query language to access and update data in the database and a data definition language (DDL) to design and maintain the database. A schema is a description of the overall database structure, while a subschema holds the definition of part of the data described in the schema. |
The relational model is based on tables of data.
Each table has a name and a finite number of columns. Each column has a
name and a domain, or the possible values it can assume. Each row contains
values with the value in each column being in that column's domain.
The designer of a relational database establishes relations
or two-dimensional tables of data. Various attributes, which correspond
to the columns in a table, are like elements or fields in a C++ structure. The domain
of an attribute, as with the domain of a function in mathematics, is the set of
all possible values of that attribute. Thus, for a Social Security number (SSN)
attribute, the domain might be the set of all 9-digit character strings. A tuple
is similar to a row in a table or a particular value of a structure variable.
For example, suppose the relation PatientAddr has the structure
In the tuple, the address2 attribute has the special null to represent that no value exists. The value null is in every domain and indicates a missing or, as in this case, a nonexistant data value.
Definition | The relational database model is based on mathematical principles. A relation is a set of tuples, or a two-dimensional table of data. An attribute corresponds to a column in a relation. The domain of an attribute is the set of all possible values of that attribute. A tuple corresponds to a row in a relation. An attribute or set of attributes that uniquely identifies a particular entity is a key. The special value null is in every domain and indicates a missing or nonexistant data value. |
SSN | LastName | FirstName | address1 | address2 | city | state | zip | phone |
123456789 | Smith | Mary | 123 Baker Rd. | null | Atlanta | GA | 30342 | (404)555-2345 |
946573218 | Hodges | James | 847 El Dorado | null | Roswell | GA | 30327 | (510)555-9876 |
565324884 | Hall | William | Apt. 12 | 4 Elm St. | Spartanburg | SC | 29303 | (864)555-3838 |
657982253 | May | Ann | 654 Main St. | null | Richland | WA | 94550 | (912)555-8329 |
429348203 | Hodges | Jane | 847 El Dorado | null | Roswell | GA | 30327 | (510)555-9876 |
267392093 | Smith | Hondre | Apt. 3A | 342 Church St. | Atlanta | GA | 30320 | (404)555-4239 |
834295377 | Martinez | Louis | 672 High St. | null | Roswell | GA | 30327 | (510)555-7722 |
Suppose also for each drug study there is a relation
like StudyA05 for those in Study A05 indicating their SSN and
whether or not they are taking a placebo instead of the tested drug. The
structure of this relation might be
(SSN, placebo)
and the relation might be as in Figure 2.
Figure 2 The relation StudyA05
SSN | placebo |
946573218 | 'y' |
123456789 | 'n' |
657982253 | 'y' |
834295377 | 'n' |
insert into RoswellPatientAddr
select *
from PatientAddr
where city = 'Roswell'
The resulting relation is like that of Figure 3.
Figure 3 Relation RoswellPatientAddr,
created by selecting tuples from PatientAddr where city =
'Roswell'
SSN | LastName | FirstName | address1 | address2 | city | state | zip | phone |
946573218 | Hodges | James | 847 El Dorado | null | Roswell | GA | 30327 | (510)555-9876 |
429348203 | Hodges | Jane | 847 El Dorado | null | Roswell | GA | 30327 | (510)555-9876 |
834295377 | Martinez | Louis | 672 High St. | null | Roswell | GA | 30327 | (510)555-7722 |
insert into NamePhone
select FirstName, LastName, phone
from PatientAddr
Figure 4 displays the NamePhone relation. If we only want the
names and phone numbers of patients from Roswell, we would perform the
projection on RoswellPatientAddr instead of PatientAddr.
Figure 4 Relation NamePhone, created
with projecting FirstName, LastName,
and phone
from PatientAddr
LastName | FirstName | phone |
Smith | Mary | (404)555-2345 |
Hodges | James | (510)555-9876 |
Hall | William | (864)555-3838 |
May | Ann | (912)555-8389 |
Hodges | Jane | (510)555-9876 |
Smith | Hondre | (404)555-4239 |
Martinez | Louis | (510)555-7722 |
insert into AddrStudyA05
select *
from PatientAddr, StudyA05
where PatientAddr.SSN = StudyA05.SSN
Notice we specify to which the relation SSN belongs (PatientAddr
or StudyA05) by using a syntax similar to that of structures or objects,
which is the relation name followed by a dot and the attribute name (SSN).
Figure 5 shows the result of such an operation.
Figure 5 Relation AddrStudyA05
formed by joining PatientAddr and StudyA05 where PatientAddr.SSN
= StudyA05.SSN
SSN | LastName | FirstName | address1 | address2 | city | state | zip | phone | placebo |
123456789 | Smith | Mary | 123 Baker Rd. | null | Atlanta | GA | 30342 | (404)555-2345 | 'n' |
946573218 | Hodges | James | 847 El Dorado | null | Roswell | GA | 94583 | (510)555-9876 | 'y' |
657982253 | May | Ann | 654 Main St. | null | Richland | WA | 94550 | (912)555-8329 | 'y' |
834295377 | Martinez | Louis | 672 High St. | null | Roswell | GA | 30327 | (510)555-7722 | 'n' |
With its powerful ability to restructure the data in any number of ways and its relative ease of use, relational databases have become very popular. Some examples of such databases are DB2 for IBM mainframe computers, Oracle and Sybase for UNIX systems, Access for Windows, and FileMaker for Macintosh and Windows.
1. Create a row in the PatientAddr
table for yourself and a row for a friend of yours.
2.
Suppose for this semester there is a relation, Course, storing the
courses each patient takes with an attribute structure of (SSN, CourseNumber).
Give part of this relation, containing information for you and a friend.
Note that your SSN will appear several times, but the course numbers
will be different.
3. For the relation PatientAddr,
sketch the resulting table that is formed as a result of each of the following
commands:
a. insert into A
select *
from Study A05
where placebo = 'n'
b. insert into B
select SSN, LastName, FirstName
from PatientAddr
c. insert into C
select *
from A, B
where A.SSN = B.SSN
4. How is data integrity preserved in a relational database?
5. How is data consistency preserved in a relational database?
6. How is data independence preserved in a relational database?
In the following exercises, write relational database commands to create the relations.
7. A relation with the FirstName, LastName and state
columns from
PatientAddr.
8. A relation with the records from StudyA05
of those taking a placebo.
9. A relation with the SSN and LastName
attributes from PatientAddr and the placebo
attribute from StudyA05.
1. How would you represent a table its data values in a program?
2. Design an algorithm to select a subset of the set of rows in a
table of data, which is in a file. The algorithm should involve
traditional file processing to emulate the select operation of
relational databases. We employed the select operation to create RoswellPatientAddr
of Figure 3.
3. Design an algorithm to project by selecting a subset of the set of columns in a table of data, which is in a file. The algorithm should involve traditional file processing to emulate the project operation of relational databases. We employed the project operation to create NamePhone of Figure 4.
4. Design an algorithm to join the data in two tables of data, which are in separate files. The algorithm should involve traditional file processing to emulate the join operation of relational databases. We employed the join operation to create AddrStudyA05 of Figure 5.
5. a. The file star.dat contains data on stars. Click here to obtain a description of the file format. Develop a program to read the data from star.dat into an array. Repeatedly present the user with a menu to perform the select or project operations on the data or to quit. If the user chooses to select or project, display a menu of subfields. For the select operation, have the user pick a subfield and then type a value for that subfield. Display all records that have the indicated subfield value. If the user wishes to project the data, display a user-selected subfield for all records.
b. Revise the program from Part a to allow the user to perform the select or project operation on more than one subfield. For the select operation, repeatedly allow the user to indicate a subfield and value for that subfield. Display the records that have all the indicated subfield values. Also, for the project operation, display one or more chosen subfields for all records.
Copyright © 2002, Dr. Angela B. Shiflet
All rights reserved