
Introduction to Perl: part 1

EPFL Bioinformatics II 13 Mar 2006


What is Perl?
How is Perl used?
Why is Perl useful in biology?
Basics of Perl
Simple text file parsing
Arrays

By the way: Perl stands for Practical Extraction and Report Language.
What is Perl?
Shell-like: standard Unix commands can be called from within Perl code (e.g. with system() or backticks).
An interpreted programming language: a script is compiled on the fly each time it is run; there is no separate compilation step.
A rich and complicated programming language: it contains many commands, and a single command may do several different things at the same time.
A powerful but dangerous language: you are allowed to do almost anything you want, at your own risk. Typos often lead to unexpected actions rather than compilation errors.
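A minimal sketch of two of these points, assuming a Unix system where the date and echo commands are available. Backticks return the output of a Unix command, system() returns its exit status, and the pragmas use strict and use warnings (not used in the slides) turn many typos into errors. The misspelt variable $coutn is deliberate; it is compiled inside a string eval so that the error is caught instead of stopping the example.

```perl
#!/usr/local/bin/perl
use strict;     # undeclared (e.g. misspelt) variables become compile errors
use warnings;   # report suspicious constructs such as uninitialized values

# Backticks run a Unix command and return its standard output
my $today = `date`;
print "Today is $today";

# system() runs a command and returns its exit status (0 means success)
my $status = system("echo", "hello from system");
print "exit status: ", $status >> 8, "\n";

# A misspelt variable ($coutn instead of $count) is rejected at compile time.
# eval with a string compiles the code at run time, so the error is trapped
# in $@ here instead of aborting this example.
my $ok = eval 'use strict; my $count = 1; $coutn = $count + 1; 1';
print "caught: $@" unless $ok;
```

Without use strict, the same typo would silently create a new global variable $coutn.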

Pros and Cons of Perl
Advantages:
Powerful: you can program a complicated task in very few lines of code. A useful Perl script can often be written in less than an hour.
Advanced support for regular expressions (string matching operations). This is especially useful for text file parsing.
Easy access to system commands from within Perl.
Support for web-based applications through the CGI interface.
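To illustrate the regular expression support, here is a small sketch that extracts the entry name and sequence length from a made-up line laid out like a Swiss-Prot ID line. Parentheses in the pattern capture sub-matches, which are returned as a list (and also stored in $1, $2, ...).

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Made-up example line in the layout of a Swiss-Prot ID line
my $line = "ID   TEST1_HUMAN   Reviewed;   100 AA.";

# \s+ = one or more spaces, \S+ = one or more non-space characters,
# \d+ = one or more digits; each (...) captures one value
my ($name, $length) = $line =~ /^ID\s+(\S+)\s+.*?(\d+) AA\./
    or die "not an ID line: $line\n";

print "$name has $length residues\n";   # prints: TEST1_HUMAN has 100 residues
```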

Disadvantages:
Code is very concise and often difficult to read.
Slow for computationally intensive tasks.
Little control over system resources.
Not very transparent: many trivial tasks, like the initialization of variables, are done automatically, but you don't know exactly how.
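A small illustration of this automatic initialization (the variable names are made up): a variable that was declared but never given a value holds the special value undef, which Perl silently treats as 0 in arithmetic and as an empty string in string operations.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $sum;                          # declared but not initialized: holds undef
foreach my $x (1 .. 4) {
    $sum += $x;                   # undef silently behaves as 0 here
}

my $seq;                          # also undef
foreach my $codon ("ATG", "GCC") {
    $seq .= $codon;               # undef silently behaves as "" here
}

print "$sum $seq\n";              # prints: 10 ATGGCC
```

Convenient, but it also means a forgotten initialization produces no error at all.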

For what purposes is Perl used in biology?
To parse and reformat structured text files, e.g. nucleotide sequence entry files.
To build web-based program interfaces.
For implementing computationally inexpensive algorithms.
For testing computationally expensive algorithms on small data sets.
For driving complex data processing pipelines that invoke several compiled programs in succession.
For automating any simple, repetitive task.
For educational purposes: you will not escape writing the Smith-Waterman algorithm in Perl.
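A minimal sketch of the pipeline use case, assuming a Unix system. The standard programs sort and uniq stand in for the compiled programs of a real pipeline, and the helper run_pipeline and the file names are hypothetical. Each step is started with system() and the script stops as soon as one step fails.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: run each step (program + arguments) in order,
# stopping at the first step that returns a non-zero exit status.
sub run_pipeline {
    my @steps = @_;
    for my $step (@steps) {
        # system() with a list runs the program directly, without a shell
        system(@$step) == 0
            or die "step '@$step' failed with status $?\n";
    }
    return 1;
}

# Create a small made-up input file with a duplicated identifier
open(my $out, '>', 'ids.txt') or die "cannot write ids.txt: $!";
print $out "P12345\nQ99999\nP12345\n";
close($out);

# Step 1 sorts the identifiers, step 2 removes adjacent duplicates
run_pipeline(
    ['sort', '-o', 'ids.sorted.txt', 'ids.txt'],
    ['uniq', 'ids.sorted.txt', 'ids.uniq.txt'],
);
print "pipeline finished: ids.uniq.txt\n";
```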
How is Perl used?
On a Unix system, write a text file named e.g. myprog.pl:
#!/usr/local/bin/perl
#
print "Hello!\n";
Then make the file executable:
% chmod +x myprog.pl
and call it like any Unix command (the ./ prefix is needed unless the current directory is in your PATH):
% ./myprog.pl
Note: the character sequence \n represents a new line (line feed) character.
Alternatively, Perl commands can be submitted directly from the Unix command line:
% perl -e 'print "hello!\n"'
Note: the first line of the script, #!/usr/local/bin/perl, indicates the location of the Perl interpreter on the local machine.
Basics of Perl
Text from # to the end of a line is ignored (comments).
Individual commands are separated by ;
Scalar variable names start with a $
Blocks of commands are enclosed in {}
Example of a Perl script which computes and prints the square roots of the integers 1 to 10:
#!/usr/local/bin/perl
#
for($i=1; $i <= 10; $i = $i+1) {
$sqrt = $i**0.5;
print "square-root of $i is $sqrt\n"
}
Result
square-root of 1 is 1
square-root of 2 is 1.4142135623731
square-root of 3 is 1.73205080756888
square-root of 4 is 2
square-root of 5 is 2.23606797749979
square-root of 6 is 2.44948974278318
square-root of 7 is 2.64575131106459
square-root of 8 is 2.82842712474619
square-root of 9 is 3
square-root of 10 is 3.16227766016838
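For comparison, a sketch of the same computation in a more modern style: declared variables (use strict), the range operator 1 .. 10 instead of a C-style loop, and the built-in sqrt() instead of $i**0.5. The helper name square_root_line is hypothetical. It should print the same ten lines as above.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Build one line of output; $i and $root are interpolated into the string
sub square_root_line {
    my ($i) = @_;
    my $root = sqrt($i);          # built-in sqrt(), same value as $i**0.5
    return "square-root of $i is $root\n";
}

foreach my $i (1 .. 10) {         # 1 .. 10 is the list (1, 2, ..., 10)
    print square_root_line($i);
}
```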
Example of a simple text-parsing script:
#!/usr/local/bin/perl
#
$prt = 0;
while(<STDIN>) {
if(/^ID/) {$text = "$_"; $prt = 0}
if(/^OS.*Homo sapiens/) {$prt = 1}
if(/^DE/) {$text = $text . "$_"}
if(/^\/\// and $prt) {print "$text"}
}
This script scans a Swiss-Prot sequence library file and prints the ID and DE lines of each human entry. The Swiss-Prot library file is read from standard input. The script may be called as follows:

% text_parsing.pl < swissprot.dat
Explanations:
while(expr){block} repeats block as long as expr returns true (1).
<STDIN> reads one line from standard input; inside a while condition, the line is stored in the predefined variable $_.
/^string/ returns true (1) if string is found at the beginning (^) of $_.
$text . "$_" concatenates two character strings (. is the concatenation operator).
\/\/ the backslashes are needed to force the slashes to be interpreted as ordinary characters rather than as the delimiters of the pattern.
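A sketch of the same parsing logic rewritten with declared variables and a subroutine, so that it can be tried on a small in-memory sample instead of a full Swiss-Prot file. The subroutine name human_entries is hypothetical, and the two sample entries are made up and heavily abbreviated (real entries contain many more line types). Writing the pattern as m{^//} uses braces as delimiters, so the slashes need no backslashes.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return the ID and DE lines of every entry whose OS line mentions Homo sapiens
sub human_entries {
    my ($fh) = @_;
    my ($text, $human) = ('', 0);
    my @found;
    while (my $line = <$fh>) {
        if ($line =~ /^ID/)               { $text = $line; $human = 0 }
        if ($line =~ /^OS.*Homo sapiens/) { $human = 1 }
        if ($line =~ /^DE/)               { $text .= $line }
        # end of entry: keep it only if it was a human entry
        if ($line =~ m{^//} and $human)   { push @found, $text }
    }
    return @found;
}

# Made-up two-entry sample, read from a string instead of a file
my $sample = <<'END';
ID   TEST1_HUMAN   Reviewed;   100 AA.
DE   RecName: Full=Test protein 1;
OS   Homo sapiens (Human).
//
ID   TEST2_MOUSE   Reviewed;   120 AA.
DE   RecName: Full=Test protein 2;
OS   Mus musculus (Mouse).
//
END

open(my $fh, '<', \$sample) or die "cannot open sample: $!";
print human_entries($fh);
close($fh);
```

To read a real library file from standard input, as in the original script, the call would be print human_entries(\*STDIN);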
Arrays:
#!/usr/local/bin/perl
#
@numbers = ("one", "two", "three", "four");
print scalar(@numbers) . "\n";
print "$numbers[0]\n";
Output:
4
one
Notes:
Array names start with @
Individual array elements are accessed with $, e.g. $numbers[0]
The numbering of array elements starts with 0, as in C.
The function scalar returns the size (number of elements) of the array.
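A few more common array operations, as a short sketch on the same @numbers array: push appends an element, $#numbers gives the index of the last element, a negative subscript counts from the end, foreach visits every element, and join builds a single string.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @numbers = ("one", "two", "three", "four");
push @numbers, "five";              # append an element at the end
print scalar(@numbers), "\n";       # number of elements: 5
print "$#numbers\n";                # index of the last element: 4
print "$numbers[-1]\n";             # negative index counts from the end: five
foreach my $word (@numbers) {       # visit every element in order
    print "$word\n";
}
print join(", ", @numbers), "\n";   # one, two, three, four, five
```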
Two-dimensional array:
#!/usr/local/bin/perl
#
@table = ( ["aa", "ab", "ac"], ["ba" , "bb" , "bc"] );
print scalar(@table) . "\n";
print "$#table\n";
print "$#{$table[0]}\n";
print "$table[1][2]\n";
Output:
2
1
2
bc
Note:
$#array returns the subscript of the last element of @array. For an array held by reference, such as the row $table[0], the same is written $#{$table[0]}.
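A sketch of how the same two-dimensional @table can be traversed with two nested loops. Each element of @table is a reference to an inner (row) array, so the last column index of a row is written $#{$table[$row]}.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my @table = ( ["aa", "ab", "ac"], ["ba", "bb", "bc"] );

my @lines;
for my $row (0 .. $#table) {                   # row indices 0 .. 1
    my @cells;
    for my $col (0 .. $#{$table[$row]}) {      # column indices 0 .. 2
        push @cells, $table[$row][$col];       # same as $table[$row]->[$col]
    }
    push @lines, join(" ", @cells);
}
print "$_\n" foreach @lines;                   # prints "aa ab ac" then "ba bb bc"
```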
