This document provides an introduction to the Perl programming language: what Perl is, how it is used, why it is useful for biology, its basic features such as variables and arrays, and simple example scripts for text parsing and computing square roots. Key points: Perl is an interpreted language used for tasks such as parsing text files and building web interfaces. Its advantages include powerful regular expressions and easy access to system commands. Its disadvantages include terse, hard-to-read code. The examples demonstrate basic Perl syntax and how to write simple programs.
Outline:
- What is Perl?
- How is Perl used?
- Why is Perl useful in biology?
- Basics of Perl
- Simple text file parsing
- Arrays
What is Perl?
EPFL Bioinformatics II 13 Mar 2006

- Shell-like: standard Unix commands are accessible from within Perl code.
- An interpreted programming language: a script is compiled on the fly each time it is run, with no separate compilation step.
- A large and rich programming language: it contains many commands, and a single command may do several different things at once.
- A powerful but dangerous language: you are allowed to do almost anything you want, at your own risk. Typos often lead to unexpected actions rather than compilation errors.

By the way: Perl stands for Practical Extraction and Report Language.
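A minimal sketch of how standard Unix commands can be run from Perl code, using backticks (which capture a command's output) and the built-in system function (which runs a command and reports its exit status). It assumes a Unix-like system where echo, true and false are available on the PATH; the variable names are illustrative only. Run it with perl followed by the file name.

```perl
use strict;
use warnings;

# Backticks run a command through the shell and return its standard output.
my $greeting = `echo hello`;
chomp $greeting;                     # remove the trailing newline
print "captured: $greeting\n";

# In list context, backticks return one element per output line.
my @codons = `echo ATG; echo GCC`;
chomp @codons;                       # remove the newline from every element
print "lines captured: @codons\n";

# system() runs a command and waits for it; the status ends up in $?.
# The program's own exit code is stored in the high byte, hence $? >> 8.
system('true');                      # 'true' always exits with code 0
my $true_exit = $? >> 8;
system('false');                     # 'false' exits with a non-zero code
my $false_exit = $? >> 8;
print "exit codes: true=$true_exit false=$false_exit\n";
```

Backticks are convenient for short outputs; for long outputs or untrusted arguments, the list form of system (one argument per list element) avoids passing the command through the shell.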
Pros and Cons of Perl

Advantages:
- Powerful: a complicated task can be programmed in very few lines of code, and a Perl script often takes less than an hour to write.
- Advanced support for regular expressions (string matching operations). This is especially useful for parsing text files.
- Easy access to system commands from within Perl.
- Support for web-based applications through the CGI (Common Gateway Interface).
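As a small illustration of the regular-expression point, the sketch below extracts the entry name and the sequence length from a Swiss-Prot ID line using capture groups. The helper parse_id_line is hypothetical, and the sample line is only illustrative: the exact spacing and middle fields of ID lines have changed between Swiss-Prot releases, so the pattern relies only on the leading "ID", the entry name, and the trailing "<number> AA.".

```perl
use strict;
use warnings;

# Hypothetical helper: return (entry name, sequence length) for an ID line,
# or an empty list if the line does not look like an ID line.
sub parse_id_line {
    my ($line) = @_;
    # "ID", whitespace, entry name (a run of non-space characters),
    # anything in between, then "<digits> AA." at the end of the line.
    if ($line =~ /^ID\s+(\S+)\s.*?(\d+)\s+AA\./) {
        return ($1, $2);    # $1 and $2 hold the two captured groups
    }
    return;
}

# Illustrative ID line (spacing follows the general Swiss-Prot layout).
my $id_line = "ID   INS_HUMAN               Reviewed;         110 AA.\n";
my ($name, $length) = parse_id_line($id_line);
print "entry $name has $length residues\n";
```

Because the middle of the line is matched loosely with .*?, the same pattern also accepts the older layout with STANDARD and PRT fields.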
Disadvantages:
- Code is very concise and often difficult to read.
- Slow for computationally intensive tasks, compared with compiled languages such as C.
- Little control over system resources (e.g. memory).
- Not very transparent: many trivial tasks, such as the initialization of variables, are done automatically, but you don't know exactly how.
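The automatic creation of variables is also why a typo can go unnoticed. A minimal sketch, assuming a Perl 5 interpreter: the same snippet is compiled twice with eval, once without the standard strict pragma (the misspelled $totl silently becomes a new, undefined variable) and once with it (the typo becomes a compile-time error). The names $total and $totl are illustrative only; my declares a variable explicitly.

```perl
use strict;
use warnings;

# Without 'strict vars', the misspelled $totl is silently accepted:
# it is a new, undefined global that counts as 0 in arithmetic.
my $lenient = eval q{
    no strict 'vars';
    no warnings;          # silence the "uninitialized value" warning for this demo
    my $total = 5;
    $totl + 1;            # typo for $total
};
print "without strict: $lenient\n";

# With 'strict vars', the same typo is rejected when the code is compiled.
my $strict = eval q{
    use strict 'vars';
    my $total = 5;
    $totl + 1;            # typo for $total
};
my $error = $@;           # eval stores the error message in $@
print "with strict: ", (defined $strict ? $strict : "error: $error");
```

Starting every script with use strict; and use warnings; makes many of these silent mistakes visible.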
For what purposes is Perl used in biology?

- To parse and reformat structured text files, e.g. nucleotide sequence entry files.
- For web-based program interfaces.
- For implementing computationally inexpensive algorithms.
- For testing computationally expensive algorithms on small data sets.
- For piloting complex data processing pipelines that invoke several compiled programs in succession.
- For automating any repetitive simple task.
- For educational purposes: you will not escape writing the Smith-Waterman algorithm in Perl.

How is Perl used?

On a Unix system, write a text file named e.g. myprog.pl:

    #!/usr/local/bin/perl
    print "Hello!\n";

The first line indicates the location of the Perl interpreter on the local machine. The character sequence \n represents a newline (line feed) character.

Then make this file executable for you:

    % chmod +x myprog.pl

and call it like any Unix command:

    % ./myprog.pl

(the ./ prefix can be omitted if the current directory is in your PATH). Alternatively, Perl commands can be submitted directly from the Unix command line:

    % perl -e 'print "hello!\n"'

Basics of Perl

- Lines starting with # are ignored (comment lines).
- Individual commands are separated by ;
- Scalar variable names start with $ (array names start with @, see Arrays below).
- Blocks of commands are enclosed in {}

Example of a Perl script which computes and prints the square roots of the integers 1 to 10:
    #!/usr/local/bin/perl
    use strict;
    use warnings;

    # Print the square root of each integer from 1 to 10.
    # 'my' declares a new variable, as required by 'use strict'.
    for (my $i = 1; $i <= 10; $i = $i + 1) {
        my $sqrt = $i ** 0.5;    # power 0.5, equivalent to sqrt($i)
        print "square-root of $i is $sqrt\n";
    }

Result:

    square-root of 1 is 1
    square-root of 2 is 1.4142135623731
    square-root of 3 is 1.73205080756888
    square-root of 4 is 2
    square-root of 5 is 2.23606797749979
    square-root of 6 is 2.44948974278318
    square-root of 7 is 2.64575131106459
    square-root of 8 is 2.82842712474619
    square-root of 9 is 3
    square-root of 10 is 3.16227766016838

Example of a simple text parsing script:

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    my $prt  = 0;     # 1 once the current entry is known to be human
    my $text = "";    # ID and DE lines collected for the current entry
    while (<STDIN>) {
        if (/^ID/)               { $text = "$_"; $prt = 0 }   # a new entry starts
        if (/^OS.*Homo sapiens/) { $prt = 1 }                 # organism is human
        if (/^DE/)               { $text = $text . "$_" }     # add description line
        if (/^\/\// and $prt)    { print "$text" }            # "//" ends the entry
    }

This script scans a Swiss-Prot sequence library file and prints the ID and DE lines of each human entry. The Swiss-Prot library file is read from standard input. The script may be called as follows:
    % text_parsing.pl < swissprot.dat

Explanations:
- while (expr) {block} repeats block as long as expr is true.
- In a while condition, <STDIN> reads one line from standard input and stores it in the predefined variable $_. The loop stops at the end of the input.
- /^string/ returns true if string is found at the beginning (^) of $_.
- $text . "$_" concatenates two character strings.
- In \/\/, the backslashes make the slashes ordinary characters, so that they are not taken as the delimiters of the pattern.

Arrays

Example:

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    my @numbers = ("one", "two", "three", "four");
    print scalar(@numbers) . "\n";    # number of elements
    print "$numbers[0]\n";            # first element

Output:

    4
    one

Notes:
- Array names start with @.
- Individual array elements are accessed with $, e.g. $numbers[0], because each element is a scalar.
- Array elements are numbered from 0, as in C.
- The function scalar returns the size (number of elements) of the array.

Two-dimensional arrays

Example:

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    # Each row is an anonymous array created with [ ... ];
    # @table holds references to these rows.
    my @table = ( ["aa", "ab", "ac"],
                  ["ba", "bb", "bc"] );
    print scalar(@table) . "\n";    # number of rows
    print "$#table\n";              # subscript of the last row
    print "$#{$table[0]}\n";        # subscript of the last element of row 0
    print "$table[1][2]\n";         # row 1, column 2

Output:

    2
    1
    2
    bc

Note: $#array returns the subscript of the last element of @array, i.e. one less than its size. For an array reached through a reference, such as a row of @table, write $#{$table[0]}.
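Two-dimensional arrays are often built row by row rather than written out by hand. A minimal sketch, assuming the same 2 x 3 table as in the example above: each row is created as an anonymous array with [ ... ] and appended with push, and the table is then walked with nested loops whose bounds come from $#table and $#{ $table[$i] }. The helper arrays @rows and @cols are illustrative only.

```perl
use strict;
use warnings;

# Build the table row by row: row $r holds $r followed by each column letter.
my @rows = ('a', 'b');
my @cols = ('a', 'b', 'c');
my @table;
for my $r (@rows) {
    push @table, [ map { $r . $_ } @cols ];   # e.g. ["aa", "ab", "ac"]
}

# Walk the table with explicit subscripts: $#table is the last row index,
# $#{ $table[$i] } the last column index of row $i.
for my $i (0 .. $#table) {
    for my $j (0 .. $#{ $table[$i] }) {
        print "row $i, column $j: $table[$i][$j]\n";
    }
}
```

Because each row is a separate reference, rows do not need to have the same length; $#{ $table[$i] } gives the correct bound for each row individually.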