BioPerl is an open source (essentially free) toolkit of over 500 Perl modules that enable scripts to analyze large quantities of data for Web-based systems. "Bioperl provides reusable Perl modules that facilitate writing Perl scripts for sequence manipulation, accessing of databases using a range of data formats, and execution and parsing of the results of various molecular biology programs including Blast, clustalw, TCoffee, genscan, ESTscan, and HMMER." BioPerl also supports accessing remote databases and creating indices for accessing local databases, so you can build programs that analyze huge quantities of sequence data. It requires familiarity with object-oriented Perl and does not provide complete ready-made programs; instead, it provides a set of tools that simplify common bioinformatics tasks. BioPerl is under continuous development and is maintained by a large number of programmers.
For complete installation instructions, see the BioPerl home page at www.bioperl.org. The easiest ways to install BioPerl are to use PPM, the package manager provided by ActiveState for all major operating systems, or to unpack the tar file by entering the following commands:
>gunzip bioperl-1.2.tar.gz
>tar xvf bioperl-1.2.tar
>cd bioperl-1.2
Now issue the make commands:
>perl Makefile.PL
>make
>make test
For downloads and information, go to the BioPerl Web site shown in Figure C.2.
Directory of C:\ActivePerl\site\lib\Bio
07/03/2007  07:14 PM    <DIR>          .
07/03/2007  07:14 PM    <DIR>          ..
07/03/2007  07:14 PM    <DIR>          Align
07/03/2007  07:14 PM    <DIR>          AlignIO
07/03/2007  07:14 PM            14,301 AlignIO.pm
07/03/2007  07:14 PM            23,350 AnalysisI.pm
07/03/2007  07:14 PM             6,042 AnalysisParserI.pm
07/03/2007  07:14 PM             6,422 AnalysisResultI.pm
07/03/2007  07:14 PM             2,798 AnnotatableI.pm
07/03/2007  07:14 PM    <DIR>          Annotation
07/03/2007  07:14 PM             5,004 AnnotationCollectionI.pm
07/03/2007  07:14 PM             4,654 AnnotationI.pm
07/03/2007  07:14 PM    <DIR>          Assembly
07/03/2007  07:14 PM    <DIR>          Biblio
07/03/2007  07:14 PM            11,153 Biblio.pm
07/03/2007  07:14 PM    <DIR>          Cluster
07/03/2007  07:14 PM             4,100 ClusterI.pm
07/03/2007  07:14 PM    <DIR>          ClusterIO
07/03/2007  07:14 PM             7,792 ClusterIO.pm
07/03/2007  07:14 PM    <DIR>          Coordinate
07/03/2007  07:14 PM    <DIR>          Das
07/03/2007  07:14 PM            13,084 DasI.pm
07/03/2007  07:14 PM    <DIR>          DB
07/03/2007  07:14 PM             2,738 DBLinkContainerI.pm
07/03/2007  07:14 PM             2,882 DescribableI.pm
07/03/2007  07:14 PM    <DIR>          Event
07/03/2007  07:14 PM    <DIR>          Expression
07/03/2007  07:14 PM    <DIR>          Factory
07/03/2007  07:14 PM             5,327 FeatureHolderI.pm
07/03/2007  07:14 PM    <DIR>          Graphics
07/03/2007  07:14 PM             2,793 Graphics.pm
07/03/2007  07:14 PM             2,530 IdCollectionI.pm
07/03/2007  07:14 PM             5,146 IdentifiableI.pm
07/03/2007  07:14 PM    <DIR>          Index
07/03/2007  07:14 PM    <DIR>          LiveSeq
07/03/2007  07:14 PM            10,986 LocatableSeq.pm
07/03/2007  07:14 PM    <DIR>          Location
07/03/2007  07:14 PM            12,304 LocationI.pm
07/03/2007  07:14 PM    <DIR>          Map
07/03/2007  07:14 PM    <DIR>          MapIO
07/03/2007  07:14 PM             5,329 MapIO.pm
07/03/2007  07:14 PM    <DIR>          Matrix
07/03/2007  07:14 PM    <DIR>          Ontology
07/03/2007  07:14 PM    <DIR>          OntologyIO
07/03/2007  07:14 PM             8,235 OntologyIO.pm
07/03/2007  07:14 PM            17,252 Perl.pm
07/03/2007  07:14 PM    <DIR>          Phenotype
07/03/2007  07:14 PM            24,568 PrimarySeq.pm
07/03/2007  07:14 PM            20,679 PrimarySeqI.pm
07/03/2007  07:14 PM             7,472 Range.pm
07/03/2007  07:14 PM            11,972 RangeI.pm
07/03/2007  07:14 PM    <DIR>          Root
07/03/2007  07:14 PM    <DIR>          Search
07/03/2007  07:14 PM             5,435 SearchDist.pm
07/03/2007  07:14 PM    <DIR>          SearchIO
07/03/2007  07:14 PM            15,069 SearchIO.pm
07/03/2007  07:14 PM    <DIR>          Seq
07/03/2007  07:14 PM            37,739 Seq.pm
07/03/2007  07:14 PM             3,299 SeqAnalysisParserI.pm
07/03/2007  07:14 PM    <DIR>          SeqFeature
07/03/2007  07:14 PM            17,041 SeqFeatureI.pm
07/03/2007  07:14 PM             6,484 SeqI.pm
07/03/2007  07:14 PM    <DIR>          SeqIO
07/03/2007  07:14 PM            21,805 SeqIO.pm
07/03/2007  07:14 PM             9,799 SeqUtils.pm
07/03/2007  07:14 PM            50,828 SimpleAlign.pm
07/03/2007  07:14 PM             8,380 Species.pm
07/03/2007  07:14 PM    <DIR>          Structure
07/03/2007  07:14 PM    <DIR>          Symbol
07/03/2007  07:14 PM    <DIR>          Taxonomy
07/03/2007  07:14 PM             6,372 Taxonomy.pm
07/03/2007  07:14 PM    <DIR>          Tools
07/03/2007  07:14 PM    <DIR>          Tree
07/03/2007  07:14 PM    <DIR>          TreeIO
07/03/2007  07:14 PM             6,223 TreeIO.pm
07/03/2007  07:14 PM             3,251 UpdateableSeqI.pm
07/03/2007  07:14 PM    <DIR>          Variation
              39 File(s)        430,638 bytes
              38 Dir(s)  62,847,074,304 bytes free
(Perl script using BioPerl modules, from the BioPerl Web site)
use Bio::Seq;
use Bio::SeqIO;

# create a sequence object of some DNA
my $seq = Bio::Seq->new(-id => 'testseq', -seq => 'CATGTAGATAG');

# print out some details about it
print "seq is ", $seq->length, " bases long\n";
print "reversed complement seq is ", $seq->revcom->seq, "\n";

# write it to a file in Fasta format
# (the leading '>' in the file name opens the file for writing)
my $out = Bio::SeqIO->new(-file => '>testseq.fsa', -format => 'Fasta');
$out->write_seq($seq);

(Output)
seq is 11 bases long
reversed complement seq is CTATCTACATG

$ more testseq.fsa    <-- The FASTA file
>testseq
CATGTAGATAG
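The script above only writes a sequence. As a complementary sketch, the same Bio::SeqIO class can read the file back. The helper name read_fasta is hypothetical (it is not part of BioPerl), and the example assumes that BioPerl is installed and that testseq.fsa sits in the current directory; if the file is missing, the loop is simply skipped.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper (not a BioPerl function): return [id, sequence]
# pairs for every record in a FASTA file.
sub read_fasta {
    my ($path) = @_;

    # A plain file name (no leading '>') opens the file for reading
    my $in = Bio::SeqIO->new(-file => $path, -format => 'Fasta');

    my @records;
    # next_seq returns one sequence object per call and undef at end of input
    while (my $seq = $in->next_seq) {
        push @records, [ $seq->display_id, $seq->seq ];
    }
    return @records;
}

# Assumption: testseq.fsa was written by the earlier script
if (-e 'testseq.fsa') {
    for my $rec (read_fasta('testseq.fsa')) {
        my ($id, $residues) = @$rec;
        print "$id: $residues (", length($residues), " bases)\n";
    }
}
```

Returning plain id/sequence pairs keeps the sketch short; a real program would more likely keep the sequence objects so that methods such as revcom remain available.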
For more examples using BioPerl go to:
http://www.bioperl.org/wiki/Bptutorial#Quick_getting_started_scripts
BioPerl comes with a set of production-quality scripts that are kept in the scripts/ directory. You can install these scripts by answering the prompts during make install. The default installation directory is /usr/bin. Installation copies the scripts to the specified directory, changes the PLS suffix to pl, and prepends bp_ to any script name that does not already start with it.
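The renaming rule described above can be illustrated with a few lines of core Perl. This is only a sketch of the naming convention, not the code that make install actually runs; the helper installed_name and the two sample file names are hypothetical and chosen purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): apply the naming rule to one
# script file name.
sub installed_name {
    my ($name) = @_;
    $name =~ s/\.PLS\z/.pl/;                      # change the PLS suffix to pl
    $name = "bp_$name" unless $name =~ /\Abp_/;   # prepend bp_ only if missing
    return $name;
}

# Illustrative file names; only the naming pattern matters here
for my $script (qw(seqconvert.PLS bp_index.PLS)) {
    print "$script -> ", installed_name($script), "\n";
}
```

Under this rule a name that already begins with bp_ only has its suffix changed, which is why the prefix check comes after the suffix substitution.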