Appendix C. Perl and Biology

C.1. What Is Bioinformatics?

"The avalanche of genome data grows daily. The new challenge will be to use this vast reservoir of data to explore how DNA and proteins work with each other and the environment to create complex, dynamic, living systems. Systematic studies of function on a grand scale—functional genomics—will be the focus of biological explorations in this century and beyond..." (http://www.ornl.gov/sci/techresources/Human_Genome/project/). Bioinformatics is the use of computers to process and analyze these vast amounts of biological data.

In the past decade, computers have become increasingly important in biological research, especially since 2003, when the Human Genome Project completed its goal of identifying all the genes in human DNA. (See the goals of the project at http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml.) Programs can now analyze large amounts of data quickly, and Perl has become one of the most popular languages in bioinformatics, if not the most popular. It is used for tasks such as accessing sequence data from local and remote databases, converting sequence files between formats, manipulating sequence data, computing statistics, and modeling biological systems.
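As a small taste of "manipulating sequence data" and "computing statistics," the sketch below uses only core Perl (no BioPerl) to count nucleotides, compute GC content, and build the reverse complement of a DNA string. The sequence is made up for illustration, and the subroutine names (`base_counts`, `gc_content`, `reverse_complement`) are hypothetical helpers, not part of Perl or BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count each nucleotide. tr/X// with an empty replacement list
# counts the matching characters without modifying the string.
sub base_counts {
    my ($seq) = @_;
    my %count = (
        A => ($seq =~ tr/Aa//),
        C => ($seq =~ tr/Cc//),
        G => ($seq =~ tr/Gg//),
        T => ($seq =~ tr/Tt//),
    );
    return %count;
}

# Fraction of bases that are G or C (returns 0 for an empty string).
sub gc_content {
    my ($seq) = @_;
    return 0 unless length($seq);
    my $gc = ($seq =~ tr/GCgc//);
    return $gc / length($seq);
}

# Reverse the string, then swap each base for its complementary partner.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse $seq;           # scalar context: reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;    # A<->T, C<->G
    return $rc;
}

my $dna = "ATGCGCTTAGGCTA";          # hypothetical example sequence
my %count = base_counts($dna);
print join(" ", map { "$_=$count{$_}" } sort keys %count), "\n";
printf "GC content: %.2f\n", gc_content($dna);
print "Reverse complement: ", reverse_complement($dna), "\n";
```

For this sequence the script prints `A=3 C=3 G=4 T=4`, a GC content of `0.50`, and the reverse complement `TAGCCTAAGCGCAT`. The `tr///` operator is a natural fit here because DNA is just a string over a four-letter alphabet, so counting and complementing reduce to character-level translation.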

Many biologists prefer to write their own functions to handle these massive amounts of data, and they may find that downloading and using the BioPerl modules provides tools that speed the development of their programs. Not all Perl programmers have a strong background in biology, any more than all biologists understand Perl. Therefore, the following overview briefly describes some of the biology and some of the ways Perl is used, with and without the BioPerl modules, to manipulate the symbols that represent DNA sequences and proteins.