
C.2. A Little Background on DNA

DNA carries the genetic instructions that make every living thing what it is, including you and me. It also holds the code for building proteins. Proteins are of great interest because they control cell activity and how the cell functions; that is, how you digest your food, fight off infection, and get oxygen to your muscles and organs.

In order to use Perl to work with DNA, here are some of the terms you will encounter:

  1. Nucleotides, or bases

  2. Base pairing

  3. Reverse complement

  4. Sequencing

  5. Proteins and amino acids

  6. RNA molecules and transcription

  7. BLAST and FASTA

DNA is found in the nucleus of our cells. It is called a double helix because it is shaped like a spiral ladder, and it contains thousands of genes. The rungs of the ladder are built from four chemicals, called nucleotides, or bases, that carry the information used to make a body and to keep it running. Each of the four bases is named with a letter: G (guanine), A (adenine), T (thymine), or C (cytosine). All of the letters in one cell, joined end to end, make up the human genome, which holds a complete set of instructions for building a human being.
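Because a sequence is just a run of the four letters, a Perl program can hold it in an ordinary string. The following is a minimal sketch, not a definitive implementation; the sequence and the variable names are made up for illustration. It checks that a string contains only the four base letters and counts how often each one appears.

```perl
use strict;
use warnings;

# Made-up sequence, for illustration only.
my $dna = 'GATTACAGATTACA';

# Reject anything that is not one of the four base letters.
die "Unexpected character in sequence\n" if $dna =~ /[^ACGT]/i;

# Start every base at zero so that absent bases still get reported.
my %count = (A => 0, C => 0, G => 0, T => 0);

# Split the string into single characters and tally each one.
$count{$_}++ for split //, uc $dna;

for my $base (sort keys %count) {
    print "$base: $count{$base}\n";
}
```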

In a complete DNA helix, hydrogen bonds always link the As with the Ts and the Gs with the Cs. Because of this pairing, if you know one strand of DNA you can work out the other: the opposite strand is the reverse complement of the first, that is, the complement of each base, read in the opposite direction. The order of the DNA bases is called the sequence. Even with just four letters, the sequence can spell out all the instructions needed to create the cells for your whole body and to determine individual hereditary characteristics.

"If you wrote down all of the bases in one cell, you would fill a stack of 1,000 phone books with As, Ts, Gs and Cs. Scientists trying to locate small sections of DNA out of the whole genome have to flip through billions of bases to find what they want! Sometimes this takes years." (From http://www.thetech.or/exhitits/online/genome.) GenBank, the Genetic Sequence Data Bank (http://www.ncbi.nlm.nih.gov/Genbank), holds most of the known sequence data.
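The pairing rule (A with T, G with C) is what makes the reverse complement easy to compute. As a minimal sketch, assuming the sequence is held in a plain string, Perl's built-in reverse function and tr/// operator are enough. The subroutine name reverse_complement and the sample strand are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the reverse complement of a DNA string.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;           # scalar context: reverse the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;    # swap each base for its pairing partner
    return $rc;
}

my $strand = 'GATTACA';              # made-up sequence
print "Strand:             $strand\n";
print "Reverse complement: ", reverse_complement($strand), "\n";
```

Applying the subroutine twice returns the original strand, which is a quick sanity check when experimenting with it.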

Figure C.1. The double helix and nucleotides, or bases.


What is RNA? RNA is like DNA but usually consists of one strand rather than two and isn't confined to the nucleus. RNA has the same pairing scheme as DNA, except that T (thymine) is replaced with U (uracil). There are different types of RNA; one type, messenger RNA, carries the instructions coded in DNA from the nucleus out to the parts of the cell where the information is needed to create proteins. Proteins are long chains of smaller molecules called amino acids. When you eat food, the body digests it and breaks it down into amino acids, which can then be reused by the cells. Another type, transfer RNA, helps translate the coded instructions into amino acids. The term "transcription" describes the process in which DNA is used to make RNA (ribonucleic acid), and "translation" is the process in which the code in the RNA is turned into proteins.
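Transcription and translation can both be sketched as simple string operations. The example below is illustrative only and rests on two stated assumptions: the DNA string is treated as the coding strand, so transcription is just replacing each T with a U, and the codon table is deliberately partial (the real genetic code has 64 three-letter codons). The subroutine names transcribe and translate and the sample gene are hypothetical.

```perl
use strict;
use warnings;

# Partial codon table, for illustration only.
my %codon_to_aa = (
    AUG => 'M',    # methionine (start)
    UUU => 'F',    # phenylalanine
    GGC => 'G',    # glycine
    GCU => 'A',    # alanine
    UAA => '*',    # stop
);

# Hypothetical helper: coding-strand DNA to RNA (every T becomes U).
sub transcribe {
    my ($dna) = @_;
    (my $rna = uc $dna) =~ tr/T/U/;
    return $rna;
}

# Hypothetical helper: read the RNA three letters at a time and
# look up one amino-acid letter per codon, stopping at a stop codon.
sub translate {
    my ($rna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $rna; $i += 3) {
        my $codon = substr $rna, $i, 3;
        my $aa = exists $codon_to_aa{$codon} ? $codon_to_aa{$codon} : '?';
        last if $aa eq '*';
        $protein .= $aa;
    }
    return $protein;
}

my $gene = 'ATGTTTGGCGCTTAA';        # made-up sequence
my $rna  = transcribe($gene);
print "RNA:     $rna\n";
print "Protein: ", translate($rna), "\n";
```

Codons missing from the partial table are shown as "?" rather than guessed at.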

BLAST (Basic Local Alignment Search Tool) is one of the most popular biological search tools today. It takes a query sequence and compares it against a library of known sequences, then reports scores showing how statistically significant each match is. FASTA files are simple text files in which each record has one header line, starting with a ">" character followed by the name of the DNA or gene the sequence comes from, and then one or more lines of nucleotide or amino-acid sequence data.
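The FASTA layout described above is simple enough to read with a few lines of Perl. This is a minimal sketch, assuming each record is identified by the first word after the ">" character; the subroutine name read_fasta and the two records are made up for illustration, and the data is read from an in-memory string so the example runs without any external file.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from a filehandle into a hash
# keyed by the first word of each header line.
sub read_fasta {
    my ($fh) = @_;
    my (%seq_for, $id);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                 # tolerate Windows line endings
        next if $line =~ /^\s*$/;         # skip blank lines
        if ($line =~ /^>(\S+)/) {         # header line starts a new record
            $id = $1;
            $seq_for{$id} = '';
        }
        elsif (defined $id) {             # sequence line: append to current record
            $line =~ s/\s+//g;
            $seq_for{$id} .= $line;
        }
    }
    return %seq_for;
}

# Made-up records, for illustration only.
my $fasta = <<'END_FASTA';
>seq1 made-up example record
GATTACAGAT
TACA
>seq2 another made-up record
ACGTACGT
END_FASTA

open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my %seq_for = read_fasta($fh);
close $fh;

for my $id (sort keys %seq_for) {
    printf "%s: %s (%d bases)\n", $id, $seq_for{$id}, length $seq_for{$id};
}
```

To read a real file instead, open a filehandle on the file's path and pass it to the same subroutine.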

Understanding some of these terms will be helpful when looking at examples in this appendix and on the Web.
