
C.4. What Is BioPerl?

BioPerl is an open source (essentially free) toolkit of over 500 Perl modules that enable scripts to analyze large quantities of sequence data, including for Web-based systems. "Bioperl provides reusable Perl modules that facilitate writing Perl scripts for sequence manipulation, accessing of databases using a range of data formats, and execution and parsing of the results of various molecular biology programs including Blast, clustalw, TCoffee, genscan, ESTscan, and HMMER." BioPerl also supports accessing remote databases as well as creating indices for accessing local databases. It does require familiarity with object-oriented Perl, and it does not provide complete ready-made programs; instead, it provides a set of tools that simplify common bioinformatics tasks. BioPerl is actively developed and maintained by a large number of programmers.

For complete installation instructions, see the BioPerl home page at www.bioperl.org. The easiest way to install BioPerl is either to use the PPM manager that ActiveState provides for all major operating systems, or to unpack the tar file and enter the following commands:

>gunzip bioperl-1.2.tar.gz
>tar xvf bioperl-1.2.tar
>cd bioperl-1.2

Now issue the make commands:

>perl Makefile.PL
>make
>make test
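
Once BioPerl has been installed, a quick way to confirm that Perl can find the modules is to try loading them. The following is a minimal sketch, not part of the BioPerl distribution; the helper name module_available is hypothetical, and the two modules checked are simply the ones used later in this section.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Seq becomes Bio/Seq.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the two modules used in the example later in this section.
for my $module (qw(Bio::Seq Bio::SeqIO)) {
    printf "%-12s %s\n", $module,
        module_available($module) ? 'found' : 'NOT found';
}
```

If a module is reported as not found, the directory where BioPerl was installed is probably not in Perl's @INC search path.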

For downloads and information, go to the BioPerl Web site shown in Figure C.2.

Figure C.2. Installing bioperl with PPM (ActiveState).


Figure C.3. `bioperl` is a large set of modules as shown here from a Windows32 system.


Directory of C:\ActivePerl\site\lib\Bio


07/03/2007  07:14 PM    <DIR>          .
07/03/2007  07:14 PM    <DIR>          ..
07/03/2007  07:14 PM    <DIR>          Align
07/03/2007  07:14 PM    <DIR>          AlignIO
07/03/2007  07:14 PM            14,301 AlignIO.pm
07/03/2007  07:14 PM            23,350 AnalysisI.pm
07/03/2007  07:14 PM             6,042 AnalysisParserI.pm
07/03/2007  07:14 PM             6,422 AnalysisResultI.pm
07/03/2007  07:14 PM             2,798 AnnotatableI.pm
07/03/2007  07:14 PM    <DIR>          Annotation
07/03/2007  07:14 PM             5,004 AnnotationCollectionI.pm
07/03/2007  07:14 PM             4,654 AnnotationI.pm
07/03/2007  07:14 PM    <DIR>          Assembly
07/03/2007  07:14 PM    <DIR>          Biblio
07/03/2007  07:14 PM            11,153 Biblio.pm
07/03/2007  07:14 PM    <DIR>          Cluster
07/03/2007  07:14 PM             4,100 ClusterI.pm
07/03/2007  07:14 PM    <DIR>          ClusterIO
07/03/2007  07:14 PM             7,792 ClusterIO.pm
07/03/2007  07:14 PM    <DIR>          Coordinate
07/03/2007  07:14 PM    <DIR>          Das
07/03/2007  07:14 PM            13,084 DasI.pm
07/03/2007  07:14 PM    <DIR>          DB
07/03/2007  07:14 PM             2,738 DBLinkContainerI.pm
07/03/2007  07:14 PM             2,882 DescribableI.pm
07/03/2007  07:14 PM    <DIR>          Event
07/03/2007  07:14 PM    <DIR>          Expression
07/03/2007  07:14 PM    <DIR>          Factory
07/03/2007  07:14 PM             5,327 FeatureHolderI.pm
07/03/2007  07:14 PM    <DIR>          Graphics
07/03/2007  07:14 PM             2,793 Graphics.pm
07/03/2007  07:14 PM             2,530 IdCollectionI.pm
07/03/2007  07:14 PM             5,146 IdentifiableI.pm
07/03/2007  07:14 PM    <DIR>          Index
07/03/2007  07:14 PM    <DIR>          LiveSeq
07/03/2007  07:14 PM            10,986 LocatableSeq.pm
07/03/2007  07:14 PM    <DIR>          Location
07/03/2007  07:14 PM            12,304 LocationI.pm
07/03/2007  07:14 PM    <DIR>          Map
07/03/2007  07:14 PM    <DIR>          MapIO
07/03/2007  07:14 PM             5,329 MapIO.pm
07/03/2007  07:14 PM    <DIR>          Matrix
07/03/2007  07:14 PM    <DIR>          Ontology
07/03/2007  07:14 PM    <DIR>          OntologyIO
07/03/2007  07:14 PM             8,235 OntologyIO.pm
07/03/2007  07:14 PM            17,252 Perl.pm
07/03/2007  07:14 PM    <DIR>          Phenotype
07/03/2007  07:14 PM            24,568 PrimarySeq.pm
07/03/2007  07:14 PM            20,679 PrimarySeqI.pm
07/03/2007  07:14 PM             7,472 Range.pm
07/03/2007  07:14 PM            11,972 RangeI.pm
07/03/2007  07:14 PM    <DIR>          Root
07/03/2007  07:14 PM    <DIR>          Search
07/03/2007  07:14 PM             5,435 SearchDist.pm
07/03/2007  07:14 PM    <DIR>          SearchIO
07/03/2007  07:14 PM            15,069 SearchIO.pm
07/03/2007  07:14 PM    <DIR>          Seq
07/03/2007  07:14 PM            37,739 Seq.pm
07/03/2007  07:14 PM             3,299 SeqAnalysisParserI.pm
07/03/2007  07:14 PM    <DIR>          SeqFeature
07/03/2007  07:14 PM            17,041 SeqFeatureI.pm
07/03/2007  07:14 PM             6,484 SeqI.pm
07/03/2007  07:14 PM    <DIR>          SeqIO
07/03/2007  07:14 PM            21,805 SeqIO.pm
07/03/2007  07:14 PM             9,799 SeqUtils.pm
07/03/2007  07:14 PM            50,828 SimpleAlign.pm
07/03/2007  07:14 PM             8,380 Species.pm
07/03/2007  07:14 PM    <DIR>          Structure
07/03/2007  07:14 PM    <DIR>          Symbol
07/03/2007  07:14 PM    <DIR>          Taxonomy
07/03/2007  07:14 PM             6,372 Taxonomy.pm
07/03/2007  07:14 PM    <DIR>          Tools
07/03/2007  07:14 PM    <DIR>          Tree
07/03/2007  07:14 PM    <DIR>          TreeIO
07/03/2007  07:14 PM             6,223 TreeIO.pm
07/03/2007  07:14 PM             3,251 UpdateableSeqI.pm
07/03/2007  07:14 PM    <DIR>          Variation
             39 File(s)         430,638 bytes
             38 Dir(s)   62,847,074,304 bytes free

Example C.4.

(Perl Script using bioperl modules from the bioperl Web site)
use Bio::Seq;
use Bio::SeqIO;

# create a sequence object of some DNA
my $seq = Bio::Seq->new(-id => 'testseq', -seq => 'CATGTAGATAG');

# print out some details about it
print "seq is ", $seq->length, " bases long\n";
print "reversed complement seq is ", $seq->revcom->seq, "\n";

# write it to a file in Fasta format
my $out = Bio::SeqIO->new(-file => '>testseq.fsa', -format => 'Fasta');
$out->write_seq($seq);

(Output)
seq is 11 bases long
reversed complement seq is CTATCTACATG

$ more testseq.fsa     <-- The FASTA file
>testseq
CATGTAGATAG
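
To make clear what the revcom call in the example computes, here is a plain-Perl sketch that produces the same reverse complement without BioPerl. It is an illustration only, not how BioPerl implements the method; the helper name revcom_plain is hypothetical, and it handles only the four unambiguous DNA bases.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): reverse-complement a DNA string.
sub revcom_plain {
    my ($dna) = @_;
    my $rc = reverse $dna;            # scalar assignment reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # swap each base for its complement
    return $rc;
}

print "reversed complement seq is ", revcom_plain('CATGTAGATAG'), "\n";
```

Run on the same input, this prints the sequence CTATCTACATG shown in the output above. The BioPerl object is still the better choice in real scripts, because it carries the sequence ID and works with the Bio::SeqIO readers and writers.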

For more examples using BioPerl go to:

http://www.bioperl.org/wiki/Bptutorial#Quick_getting_started_scripts

BioPerl comes with a set of production-quality scripts that are kept in the scripts/ directory. You can install these scripts by answering the questions asked during make install. The default installation directory is /usr/bin. Installation copies the scripts to the specified directory, changes the PLS suffix to pl, and prepends bp_ to any script name that does not already start with it.
