Genomic Data


References

The following references, which are the sources for most of the current module's figures, provide a tutorial on protein structures and an excellent hyperlinked glossary of genetic terms.

· "Protein Structure Basics" by Bernhard Rupp, UCRL-MI-125269, Lawrence Livermore National Laboratory, 2000, http://ruppweb.dyndns.org/

· "Talking Glossary of Genetic Terms," National Human Genome Research Institute, http://www.nhgri.nih.gov/DIR/VIP/Glossary


Introduction

In the module "Computational Science and Web-Accessed Databases," we introduced a "Genomic Example" involving web-accessed genomic databases. The current module provides the biological background for understanding such databases and a discussion of their characteristics. The next module on "Genomic Sequence Comparison" presents the dynamic programming technique for measuring the similarity of two DNA sequences, and the module "Searching Genomic Databases" considers algorithms for discovering regions of sequence alignment. Thus, with our knowledge of HTML, databases, and CGI programming, we will be able to develop form-based web pages and CGI programs for interfacing between web pages and genomic databases, for processing DNA sequences, and for returning results to the user. Such applications fall in the domain of bioinformatics.

A newly developing area of computational science, called bioinformatics, deals with the organization of biological data, such as in databases, and the analysis of such data. Recently, enormous strides have been made in genetics, due in part to the power of bioinformatics, as the following examples illustrate:

Geneticists in Sweden and with Merck Research Laboratories in West Point, Pennsylvania, studied families with a high occurrence of macular degeneration, which causes loss of vision in old age. "Once the combined team got to within 800,000 bases of it [a section of the chromosome], the researchers searched computer databases for potential genes in that target region. Both labs then looked for mutations in those genes in family members with the disease but not in people with normal vision." One of the groups "found that the mutations consistently appeared" in a particular gene. The researchers are now investigating how mutations in this gene initiate the pathology. ["New Gene Found for Inherited Macular Degeneration", Human Genetics News Focus column by Elizabeth Pennisi, v. 281, 3 July 1998, Science, p. 31]

A microbial ribosomal DNA (rDNA) sequence database is particularly valuable because biologists have not been able to culture many bacteria and viruses outside the body. By extracting the rDNA and comparing its genetic makeup with data from the database, scientists have been able to determine organisms that cause Crohn's disease, which is an inflammatory bowel disorder, and several other diseases ["The Search for Unrecognized Pathogens" by David A. Relman, v. 284, 21 May 1999, Science, pp. 1308-1310].

The Educational page of the National Center for Biotechnology Information (NCBI) web site addresses the questions "What is bioinformatics?" and "Why use bioinformatics?".

Proteins

Proteins are the basic building blocks of life, performing many critical functions. Some proteins are the fundamental structural components of tissue, while others (enzymes) are catalysts for chemical reactions. A simple protein is a linear polymer, or chain, of amino acids. Table 1 lists the twenty amino acids common to proteins along with their one-letter and three-letter codes. Each amino acid contains an amino group (-NH3+) at one end and a carboxyl group (-COO-) at the other, connected by a central carbon (the α-carbon). A variable side-chain (R-group) and a hydrogen are attached to the α-carbon (see Figure 1). The R-group is responsible for the chemical nature of each amino acid. Chains of amino acids are linked by peptide bonds, which form through the interaction of an amino group of one amino acid with the carboxyl group of another (see Figure 2). This interaction is a condensation, which releases a molecule of water. Because part of each amino acid is lost in this reaction, each amino acid component of the chain is referred to as a residue. Because one end of a protein has a free amino group (N-terminal) and the other has a free carboxyl group (C-terminal), we can assign a direction to the chain and list the amino acids from the “beginning” (N-terminal) of the chain to the “end” (C-terminal).

Table 1. The twenty commonly occurring amino acids along with their one-letter and three-letter codes. (Note: B is used when one cannot distinguish between D and N because of amino acid analytical processing. Similarly, Z is used when it is ambiguous whether the amino acid is E or Q. X represents an unknown or nonstandard amino acid.)

One-Letter Code   Three-Letter Code   Name
A                 Ala                 Alanine
C                 Cys                 Cysteine
D                 Asp                 Aspartic Acid
E                 Glu                 Glutamic Acid
F                 Phe                 Phenylalanine
G                 Gly                 Glycine
H                 His                 Histidine
I                 Ile                 Isoleucine
K                 Lys                 Lysine
L                 Leu                 Leucine
M                 Met                 Methionine
N                 Asn                 Asparagine
P                 Pro                 Proline
Q                 Gln                 Glutamine
R                 Arg                 Arginine
S                 Ser                 Serine
T                 Thr                 Threonine
V                 Val                 Valine
W                 Trp                 Tryptophan
Y                 Tyr                 Tyrosine
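
As a rough illustration of how a program might hold Table 1, the following Python sketch stores the one-letter and three-letter codes in a dictionary and expands a one-letter sequence. The names THREE_LETTER, AMBIGUOUS, and expand are hypothetical, chosen only for this example, and printing an ambiguous symbol as "Asp/Asn" or "Glu/Gln" is an assumption about presentation, not a standard notation.

```python
# One-letter code -> three-letter code for the twenty amino acids of Table 1.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}

# Ambiguity symbols from the note to Table 1: B is D or N, Z is E or Q.
AMBIGUOUS = {"B": ("D", "N"), "Z": ("E", "Q")}


def expand(sequence):
    """Return the three-letter form of a one-letter amino acid sequence."""
    parts = []
    for letter in sequence.upper():
        if letter in THREE_LETTER:
            parts.append(THREE_LETTER[letter])
        elif letter in AMBIGUOUS:
            # Show both possibilities for an ambiguous residue (display choice).
            parts.append("/".join(THREE_LETTER[c] for c in AMBIGUOUS[letter]))
        elif letter == "X":
            parts.append("unknown")
        else:
            raise ValueError(f"not an amino acid symbol: {letter!r}")
    return "-".join(parts)


print(expand("MTB"))
```

The sequence is read left to right, which matches the convention of listing residues from the N-terminal to the C-terminal.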
 

Figure 1 Structure of an amino acid ["Protein Structure Basics" by Bernhard Rupp, UCRL-MI-125269, Lawrence Livermore National Laboratory, 2000, http://ruppweb.dyndns.org/Xray/tutorial/protein_structure.htm]



Figure 2 Chain of two amino acids ["Protein Structure Basics" by Bernhard Rupp, UCRL-MI-125269, Lawrence Livermore National Laboratory, 2000, http://ruppweb.dyndns.org/Xray/tutorial/protein_structure.htm]


The linear sequence of residues is called the primary structure of a protein (see Figure 3). However, interactions between hydrogens and oxygens of the amino and carboxyl groups of different amino acids may result in regular arrangements, called the secondary structure of the protein. For example, a helix is one type of secondary structure (see Figure 4). Chemical interactions that take place between R-groups of nonadjacent amino acids maintain this structure. The primary structure governs these higher-order structures of proteins, which are essential in determining the three-dimensional conformation of the protein. A single polymer of amino acids might be more properly called a polypeptide. Some functional proteins are made up of only one polypeptide, whereas many proteins are made up of more than one polypeptide. For instance, hemoglobin consists of two pairs of polypeptides. Proteins that consist of two or more interacting polypeptides are said to exhibit quaternary structure (see Figure 5). These higher-order structures are very important because they determine the overall shape of the protein, possible bindings of the protein, and, thus, its function.

Figure 3 Primary protein structure ["Talking Glossary of Genetic Terms," National Human Genome Research Institute, http://www.genome.gov/Pages/Hyperion//DIR/VIP/Glossary/Illustration/amino_acid.shtml]

Figure 4 Helix ["Protein Structure Basics" by Bernhard Rupp, UCRL-MI-125269, Lawrence Livermore National Laboratory, 2000, http://ruppweb.dyndns.org/Xray/tutorial/protein_structure.htm]


Figure 5 Protein structures ["Talking Glossary of Genetic Terms," National Human Genome Research Institute, http://www.genome.gov/Pages/Hyperion//DIR/VIP/Glossary/Illustration/protein.shtml]

Quick Review Question 1
a. Give the name of the area of computational science that deals with the organization and the analysis of biological data.

b. Give the number of amino acids common to proteins.

Match each of the following phrases with the best term from this list: α-carbon, amino acids, amino group, C-terminal, carboxyl group, enzymes, higher-order structures, N-terminal, peptide bonds, polypeptide, primary structure, proteins, R-group, residue.

c. Basic building blocks of life

d. Proteins that are catalysts for chemical reactions

e. A simple protein is a linear chain of these

f. Link chains of amino acids

g. Formed through the interaction of an amino group of one amino acid with the carboxyl group of another

h. Amino acid component of a protein

i. Free amino group that is the beginning of the chain of amino acids

j. Free carboxyl group that is the end of the chain of amino acids

k. Linear sequence of residues of a protein

l. A single polymer of amino acids

m. Determine the overall shape of the protein and its function


Nucleic Acids

In the cell, the nucleic acid DNA (deoxyribonucleic acid) contains the encoded information for the manufacture of all the proteins a cell needs. However, DNA does not oversee protein synthesis directly but acts through an intermediary nucleic acid, RNA (ribonucleic acid). The RNA sequences subsequently specify the amino acid sequences of proteins. Both DNA and RNA are polymers of molecules called nucleotides. A nucleotide is a compound molecule made up of a sugar (deoxyribose in DNA or ribose in RNA), a phosphate, and a nitrogen base: adenine (A), guanine (G), cytosine (C), and either thymine (T) in DNA or uracil (U) in RNA (see Figure 6). DNA is a double strand of nucleotides, whereas RNA is a single strand. Because each nucleotide contains exactly one base, we can say that a particular strand has 300 bases or, equivalently, 300 nucleotides. As with proteins, because the backbone of a strand always has specific chemical structures at opposite ends, we can canonically give direction to the sequence of nucleotides (or bases) in a strand.

Figure 6 RNA and DNA ["Talking Glossary of Genetic Terms," National Human Genome Research Institute, http://www.genome.gov/Pages/Hyperion//DIR/VIP/Glossary/Illustration/rna2.shtml]


Bases in one strand may bond with bases in another. Because of their structure, A and T always bond together, and C and G always bond together.  Each pair is said to be made up of complementary bases and is referred to as a base pair (bp).  The number of such base pairs is used to describe the length of a DNA molecule.  Because of pairing consistency, by knowing the sequence of bases in one strand, we can deduce the sequence of bases in the other strand through reverse complementation. For example, suppose one sequence is s = ATGAC. Because of the required pairing, A - T and C - G, we know the base pairs must appear as follows:

s:  A T G A C
    | | | | |
    T A C T G

However, to list the bases of the other strand in the canonical order, we first reverse s to obtain s’ and then complement each base of s’, which yields the sequence GTCAT, as follows:

s:                  A T G A C
s’ (reverse of s):  C A G T A
complement of s’:   G T C A T
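
The two steps above, reversing and then complementing, can be sketched in a few lines of Python. The names COMPLEMENT and reverse_complement are hypothetical, and the sketch assumes the input contains only the DNA bases A, C, G, and T.

```python
# Complementary DNA bases: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Reverse the strand, then replace each base by its complement."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGAC"))  # GTCAT, as in the worked example
```

Applying the function twice returns the original sequence, which is a convenient check on any implementation.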
 
Quick Review Question 2  Give the reverse complement of the sequence GTACCT.

s:                  G T A C C T
s’ (reverse of s):  T C C A T G
complement of s’:   A G G T A C

In contrast to DNA, RNA (ribonucleic acid) is a single strand of nucleotides made up of ribose sugars and the bases A, C, G, and U; uracil (U) takes the place of the nitrogen base thymine (T) (see Table 2). Several types of RNA with different functions exist in the cell.

Table 2. Bases in DNA and RNA

Base       Abbreviation   Complement           In DNA   In RNA
adenine    A              T in DNA, U in RNA   yes      yes
guanine    G              C                    yes      yes
cytosine   C              G                    yes      yes
thymine    T              A                    yes      no
uracil     U              A                    no       yes

Quick Review Question 3 Match all the terms that apply for each of the following parts. Choose from this list: A, C, DNA, G, nucleotide, protein, RNA, T, U.

a. Contains the encoded information that is stored to direct the manufacture of all the proteins a cell needs

b. An intermediary nucleic acid in protein synthesis

c. Compound molecule made of a sugar, a phosphate, and a nitrogen base

d. Type of molecule in DNA and RNA sequences

e. Bases in DNA

f. Bases in RNA

g. Always bonds with base A in DNA

h. Always bonds with base A in RNA

i. Always bonds with base C in DNA or RNA

j. Always bonds with base T in DNA

k. Always bonds with base U in RNA

l. Always bonds with base G in DNA or RNA

m. Single strand of nucleotides

n. Double strand of nucleotides

 


From Genes to Proteins

Each cell contains chromosomes, which are very long DNA molecules. A gene is a contiguous section of a chromosome that encodes information to build a protein or an RNA molecule (see Figure 7). In humans, a gene is composed of about 10,000 bp. A chromosome contains genes and contiguous sections that are not part of any gene. Some scientists believe that genes compose only about 10% of a human chromosome. A complete set of chromosomes in a cell is called the genome. For example, a human genome has 46 chromosomes in 23 pairs.

Figure 7 Gene as contiguous section of chromosome [Figure slightly modified from graphics at "Talking Glossary of Genetic Terms," National Human Genome Research Institute, http://www.genome.gov/Pages/Hyperion//DIR/VIP/Glossary/Illustration/gene.shtml]


    

For simplicity, we assume that a particular protein in an organism corresponds to exactly one gene. In a gene, a sequence of three nucleotides (triplet) specifies an amino acid. For example, the triplet ACG or the triplet ACA encodes the information for the amino acid threonine (Thr). The genetic code represents such a correspondence between these triplets and the amino acids they specify. With four base choices, a pair of bases could encode information for only (4)(4) = 16 amino acids, which is fewer than the twenty that commonly occur. With three bases, (4)(4)(4) = 64 possible triplets exist, which is more than enough. Several triplets, such as ACG and ACA, encode the same amino acid, and three triplets do not encode any amino acid.
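
The counting argument above can be checked by brute force. The following Python sketch simply enumerates every pair and every triplet over the four DNA bases; the variable names are hypothetical and serve only this illustration.

```python
from itertools import product

BASES = "ACGT"  # the four DNA bases

# Every ordered pair and every ordered triplet of bases.
pairs = ["".join(p) for p in product(BASES, repeat=2)]
triplets = ["".join(t) for t in product(BASES, repeat=3)]

print(len(pairs))     # 16: too few to name twenty amino acids
print(len(triplets))  # 64: enough, with some triplets to spare
```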

Quick Review Question 4 Give the number of each of the following:
a. Bases in a triplet


b. Different bases in DNA


c. Commonly occurring amino acids


d. Possible triplets

 

Protein synthesis is the process of using the genetic code to direct the building of proteins. Synthesis begins in the nucleus, where enzymes catalyze the production of a molecule of RNA, termed messenger RNA or mRNA. As Figure 8 illustrates, each DNA triplet specifies a complementary sequence of three nucleotides, which we call a codon, in the RNA. The synthesis of RNA is called transcription. During transcription, base pairing ensures formation of a strand of RNA that is complementary to the gene sequence, with U replacing T.
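
A minimal Python sketch of this base-pairing rule follows. It complements a DNA sequence position by position, writing U where DNA would have T, and then splits the result into codons. The names DNA_TO_RNA, transcribe, and codons are hypothetical, and the sketch deliberately ignores strand direction and the later processing of the transcript.

```python
# Base pairing during transcription: the RNA base that pairs with each DNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(dna):
    """Return the RNA strand complementary to the given DNA sequence."""
    return "".join(DNA_TO_RNA[base] for base in dna.upper())


def codons(mrna):
    """Split an mRNA string into consecutive codons, dropping leftover bases."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


mrna = transcribe("TGCTGT")
print(mrna)          # ACGACA
print(codons(mrna))  # ['ACG', 'ACA']
```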

Figure 8 Codons in RNA ["Talking Glossary of Genetic Terms," National Human Genome Research Institute, http://www.genome.gov/Pages/Hyperion//DIR/VIP/Glossary/Illustration/codon.shtml]

As Figure 9 shows, transcription represents only the first step in protein synthesis. The initial transcript must be processed through a complex series of chemical changes that includes removal of portions of the RNA and splicing back together of the loose ends. However, this modified mRNA molecule is still in the nucleus, which a double membrane separates from the cytoplasm, the area surrounding the cell nucleus, whereas protein synthesis must conclude in the cytoplasm on small structures called ribosomes. Thus, the mRNA molecule must be transported from the nucleus into the cytoplasm before any protein can be synthesized.

Figure 9 Protein synthesis ["Talking Glossary of Genetic Terms," National Human Genome Research Institute, http://www.genome.gov/Pages/Hyperion//DIR/VIP/Glossary/Illustration/mrna.shtml]

After movement to the cytoplasm, the mRNA attaches to a ribosome, which essentially reads the code (sequence of codons) specified by the gene. The ribosome brings together the mRNA strand with various molecules of another type of very small RNA, called transfer RNA (tRNA). There are many different tRNA molecules, each of which attaches very specifically to only one type of amino acid. The ribosomal enzymes must ensure that each codon of the mRNA combines with a tRNA that carries the correct amino acid in the nascent protein sequence. This process is possible because each tRNA molecule contains a triplet code, called an anticodon. The anticodon then base pairs with the complementary codon of the mRNA, ensuring addition of the correct amino acid. The ribosome moves down the mRNA molecule one codon at a time, allowing the correct tRNA with the specified amino acid to base pair. Enzymes of the ribosome also help by catalyzing the formation of peptide bonds between the amino acid and the growing peptide chain (see Figure 10). Eventually, the ribosome reaches a codon that does not code for any amino acid, which signals the ribosome to stop. Voila! With the help of some RNA and ribosomes, we have a protein with an amino acid sequence that the DNA sequence of the gene specified.

Figure 10 Growing peptide chain ["Talking Glossary of Genetic Terms," National Human Genome Research Institute, http://www.genome.gov/Pages/Hyperion//DIR/VIP/Glossary/Illustration/peptide.shtml]
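
The codon-by-codon reading performed by the ribosome can be imitated with a table lookup. The Python sketch below assumes the standard genetic code, which this module does not reproduce, written compactly as a 64-letter string of one-letter amino acid codes in which * marks the three codons that code for no amino acid. The names CODON_TABLE and translate are hypothetical, and the sketch starts reading at the first base rather than searching for a start signal.

```python
from itertools import product

# Standard genetic code, with codons ordered UUU, UUC, UUA, UUG, UCU, ...
# (first base varies slowest). '*' marks a codon that codes for no amino acid.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def translate(mrna):
    """Read codons left to right until a stop codon or the end of the strand."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":  # stop signal: the ribosome releases the chain
            break
        protein.append(amino)
    return "".join(protein)


print(translate("ACGACAUAA"))  # TT: ACG and ACA both specify threonine
```

The one-letter result can be read against Table 1; for example, T stands for threonine.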

 

Quick Review Question 5 Select the best match for each of the following from this list: anticodon, chromosome, codon, cytoplasm, DNA, gene, genome, mRNA, nucleus, peptide bond, protein synthesis, ribosome, transcription, triplet, tRNA.

a. Very long DNA molecule

b. Contiguous section of a chromosome that encodes information to build a protein or an RNA molecule

c. A complete set of chromosomes in a cell

d. Sequence of three nucleotides

e. The process of using genetic code to direct the building of proteins

f. The place in the cell where protein synthesis begins

g. The place in the cell where enzymes catalyze the production of a molecule of RNA

h. A molecule of RNA produced in the nucleus

i. Sequence of three nucleotides in RNA complementary to a DNA triplet

j. The synthesis of RNA

k. Area surrounding the cell nucleus

l. Small structure on which protein synthesis concludes

m. Location in the cell of ribosomes

n. A type of RNA that attaches very specifically to only one type of amino acid

o. Triplet code that a tRNA molecule contains

p. Bond between an amino acid and a growing peptide chain

Studying the Genome

Sequencing is the process of finding the base-pair sequence of a section of DNA. Although a human chromosome contains about 100 million (10^8) base pairs, scientists have been able to sequence only pieces of DNA with several thousand bp at a time. One way to sequence a much larger segment is to split it into smaller pieces; determine the sequences of these fragments; and, using computational techniques, reconstruct the sequence for the larger segment. Computational techniques must handle the errors that are almost always in the laboratory data.
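
To give a feel for the reconstruction step, the following Python sketch merges fragments greedily by their longest suffix-prefix overlap. This is a textbook simplification substituted here for illustration, not the method of any particular sequencing project. It assumes error-free fragments that all come from the same strand, whereas real laboratory data contains errors, and the names overlap and greedy_assemble and the threshold min_len are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments wholly contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the pieces separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags


print(greedy_assemble(["ATGACCG", "ACCGTTAG", "TTAGGCA"]))
```

For these three fragments the sketch recovers the single sequence ATGACCGTTAGGCA. A greedy strategy can choose a wrong merge when a sequence contains repeats, which is one reason practical assembly is considerably more involved.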

Human Genome Project

The Human Genome Project (HGP), which started in 1988, is an international effort to determine the whole human DNA sequence. A working draft of the human genome sequence, produced by a private company, Celera Genomics, and by government laboratories, was published in February 2001. Knowing the sequence will help scientists in their work to determine the location of genes and the functions of proteins. Databases of DNA, RNA, and protein sequences and computational techniques to search and analyze the data will help in these efforts. Already, scientists have completely sequenced the DNA for 26 organisms, such as baker’s yeast, several microbes that cause disease, and a common worm (Caenorhabditis elegans), which is the first animal whose entire genome is known.
    

The National Center for Biotechnology Information (NCBI), USA, maintains GenBank, a database of more than 920,000 plant and animal DNA sequences from over 16,000 species. The database has been doubling about every 18 months. Figure 11 is an example of a GenBank record. The accession number, here M82814, is a key, and field identifiers describe the contents of the fields, which are strings. One can search the database by keyword or sequence. Other databases include the DNA database EMBL, the protein sequence database PIR, the PDB database for three-dimensional structures of proteins, and the EcoCyc Encyclopedia of Escherichia coli Genes and Metabolism.

Figure 11. A GenBank record


Genomic Database Characteristics

The data and access characteristics of genomic databases present challenges to computational scientists. First, genomic data is very complex. For example, the yeast genome has a 10-million bp sequence. A genomic database stores subsequences and a wealth of related data, such as locations, references, comments, and features.

This complexity results in another characteristic of genomic databases, which is the necessity for some form of redundancy. For example, there can be several alternative genes (alleles), say for determining different eye colors, at the same location (locus) on a DNA sequence. Databases handle redundancy in different ways. GenBank makes little attempt to reduce redundancy, while the nr (non-redundant) nucleotide database by NCBI has merged entries that have identical sequences.
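
The idea of merging entries that have identical sequences can be pictured with a small Python sketch that groups accession numbers by sequence. This is only an illustration of the concept, not a description of how NCBI builds the nr database; the function name merge_identical and the accession numbers in the example are hypothetical.

```python
from collections import defaultdict


def merge_identical(entries):
    """Group accession numbers whose sequences are identical.

    entries is an iterable of (accession, sequence) pairs; the result maps
    each distinct sequence to the list of accessions that share it.
    """
    merged = defaultdict(list)
    for accession, sequence in entries:
        merged[sequence.upper()].append(accession)
    return dict(merged)


# Hypothetical entries: the first and third have the same sequence.
entries = [("EX0001", "ATGACC"), ("EX0002", "ATGTCC"), ("EX0003", "atgacc")]
for sequence, accessions in merge_identical(entries).items():
    print(sequence, accessions)
```

Note that alleles such as ATGACC and ATGTCC differ in sequence and therefore remain separate entries, so this kind of merging removes duplicates without discarding the redundancy that the biology requires.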

A high degree of variability is another aspect of the complexity of biological data. Often there are exceptions, or new discoveries cause revisions or additions to possible values. For example, besides the 20 common amino acids, we use three other symbols for ambiguous or unknown situations. Thus, data types must be flexible with few constraints.

Because of this variability, schemas change rapidly. Database designers must manage extensions to the schemas as scientific knowledge expands. Many genomic databases are re-released annually or semiannually.

To add to the complexity involving data types and schemas, two biologists or two databases may represent the same data in different ways. However, scientists need to be able to compare their findings.

One of the most significant access characteristics of genomic databases is that scientists read, or query, such databases extensively but rarely write to one. For example, each month in 2000, over 10,000 people queried the MITOMAP database, but fewer than five made additions to it.

As another characteristic, these querying users have a limited knowledge of the database schema design. Hence, the web access to the database must be flexible and intuitive to handle a variety of queries. Often, these queries are complex and involve multiple data sets.

Moreover, biologists are interested in the data in context. For example, a biologist might want to know similar sequences to a query sequence, where they are alike, where they are different, how the sequences compare, functions of similar subsequences, etc. By making such comparisons, he or she hopes to deduce functions of the query sequence.

Biologists must query the most recent version of a database but also must be able to access earlier versions to examine data and repeat procedures. Thus, a genomic database must provide access to historic data.

The following is a summary of the main characteristics of genomic databases:

Quick Review Question 6 Select the data and access characteristics of genomic databases.

· All versions of data needed
· Complex
· Context of data important
· Databases may represent the same data in different ways
· Data only needed in isolation
· Detailed knowledge of schema by users
· Fixed schemas
· Intricate queries important
· Limited knowledge of schema by users
· Mostly read and write access
· Mostly read only access
· Mostly write only access
· No data redundancy
· One form of data representation
· Only latest version of data needed
· Only simple queries
· Schemas change often
· Simple
· Some data has exceptions
· Some data redundancy needed
· User-friendly web access needed


Exercises

1. Consider TTGCGGAATC, which is part of a hypothetical DNA sequence.
a. Find its complementary strand.
b. Find the corresponding sequence of bases in mRNA.
c. Give 6 possible sequences of codons for the protein it produces.
2. Repeat Exercise 1 for the sequence CTGGATAGGCCAGT.

3. In GenBank, search for each of the following:
a. The information on Accession Number M82606.  Hint:  You can use the online search at the GenBank site.
b. The disease-causing agent Vibrio cholerae, which causes cholera. Search the vital sequence section of the database.
4. For the GenBank entity record in Figure 11, determine possible attributes and their types.

Projects

1. Develop a program to read DNA sequences from a file and to write the sequences along with their complementary strands to another file.

2. Develop a program to read DNA sequences from a file and to write to another file each sequence with the six possible sequences of codons for the protein it produces.

Copyright © 2002, Dr. Angela B. Shiflet
All rights reserved