A Perl Tutorial
NLP Course - 2006
What is Perl?
Practical Extraction and Report Language
Interpreted Language
Optimized for String Manipulation and File I/O
Full support for Regular Expressions
UNIX Cygwin
Put the following in the first line of your script:
#!/usr/bin/perl
Run the script:
% perl script_name
Basic Syntax
Statements end with a semicolon ;
Comments start with #
Only single line comments
Variables
You don't have to declare a variable before you access it
You don't have to declare a variable's type
Scalar
A single value (string or numerical)
Accessed by prefixing an identifier with '$'
Assignment with '='
$scalar = expression
Strings
Quoting Strings
With ' (apostrophe): everything is interpreted literally
With " (double quotes): variables get expanded
With ` (backtick): the text is executed as a separate process, and the output of the command is returned as the value of the string
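The three quoting styles can be compared side by side. This is a minimal sketch, not one of the course scripts: the variable names are illustrative, and the backtick line assumes an `echo` command is available (as on UNIX or Cygwin). It uses `use strict` and `my`, which the slides do not require.

```perl
use strict;
use warnings;

my $name = "Perl";

# Single quotes: nothing is expanded; \n stays as a backslash and an n.
my $literal = 'Hello, $name\n';

# Double quotes: $name is expanded and \n becomes a newline.
my $expanded = "Hello, $name\n";

# Backticks: the text runs as a separate process (assumes `echo` exists)
# and the command's output becomes the value of the string.
my $output = `echo hello`;

print $literal, "\n";
print $expanded;
print "Command returned: $output";
```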
Check 01_printDate.pl
Comparison Operators
Operation                     String   Arithmetic
less than                     lt       <
greater than                  gt       >
equal to                      eq       ==
less than or equal to         le       <=
greater than or equal to      ge       >=
not equal to                  ne       !=
compare, return 1, 0, -1      cmp      <=>
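The difference between the two operator families matters when values look like numbers. The sketch below is an illustration with made-up variable names: it compares the same values once as strings and once as numbers.

```perl
use strict;
use warnings;

# As strings, "10" comes before "9" because "1" sorts before "9".
my $as_strings = ("10" lt "9") ? "true" : "false";

# As numbers, 10 is not less than 9.
my $as_numbers = (10 < 9) ? "true" : "false";

# cmp and <=> return -1, 0 or 1 depending on the ordering.
my $str_order = "apple" cmp "banana";
my $num_order = 10 <=> 9;
my $num_equal = 7 <=> 7;

print "\"10\" lt \"9\" is $as_strings\n";
print "10 < 9 is $as_numbers\n";
print "cmp: $str_order, <=>: $num_order and $num_equal\n";
```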
Logical Operators
Operator   Operation
||, or     logical or
String Operators
Operator   Operation
.          concatenation
x          repetition
.=         concatenation and assignment
$string1 = "potato";
$string2 = "head";
$newstring = $string1 . $string2; # "potatohead"
$newerstring = $string1 x 2; # "potatopotato"
$string1 .= $string2; # "potatohead"
Check concat_input.pl
Perl Functions
Perl functions are identified by their unique names (print, chop, close, etc.)
Function arguments are supplied as a comma-separated list in parentheses.
The commas are necessary
The parentheses are often not
Be careful! You can write some nasty and unreadable code this way!
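To see why dropping parentheses needs care, the sketch below writes the same nested call both ways. The list contents and variable names are illustrative only.

```perl
use strict;
use warnings;

my @words = ("pear", "apple", "fig");

# With parentheses: the nesting of the two calls is explicit.
my $with = join(", ", sort(@words));

# Without parentheses: the same result, but the reader has to work out
# which arguments belong to which function.
my $without = join ", ", sort @words;

print "$with\n";
print "$without\n";
```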
Check 02_unreadable.pl
Lists
Ordered collection of scalars
Zero-indexed (first item in position '0')
Elements addressed by their positions
List Operators
(): list constructor
, : element separator
[]: take slices (single or multiple element chunks)
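The three list operators can be seen together in a short sketch. The instrument names are borrowed from the arrays example later in the slides; the variable names are illustrative.

```perl
use strict;
use warnings;

# () builds the list, commas separate the elements, [] takes a slice.
my $second = ("violin", "viola", "cello", "bass")[1];     # single element
my @pair   = ("violin", "viola", "cello", "bass")[0, 2];  # several elements

print "Second: $second\n";
print "Pair: ", join(" ", @pair), "\n";
```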
List Operations
sort(LIST)
Check 03_listOps.pl
Arrays
A named list
Dynamically allocated, can be saved
Zero-indexed
Shares list operations, and adds to them
Array Operators
@: reference to the array (or a portion of it, with [])
$: reference to an element (used with [])
Array Operations
push(@ARRAY, LIST)
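A short sketch of the `@` and `$` prefixes together with `push`, using illustrative names. `$#brass` (the last index) also appears in the arrays example later in the slides.

```perl
use strict;
use warnings;

my @brass = ("trumpet", "horn", "trombone");

my $first = $brass[0];         # $ with []: one element
my @pair  = @brass[1, 2];      # @ with []: a portion of the array

push(@brass, "tuba");          # append a list to the end of the array

my $count      = scalar(@brass);  # number of elements
my $last_index = $#brass;         # index of the last element

print "First: $first\n";
print "Pair: @pair\n";
print "Count: $count, last index: $last_index\n";
```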
Hash Operators
% : refers to the hash
{}: denotes the key
$ : the value of the element indexed by the key (used with {})
Hash Operations
keys(%HASH)
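As a minimal sketch of the hash operators and `keys`, the example below builds a small hash and reads it back. The names reuse two entries from the hashes example in the slides. The `sort` is added here only because `keys` returns the keys in no particular order.

```perl
use strict;
use warnings;

# % refers to the whole hash.
my %player = (flute => "Heidi Lawson", oboe => "Jeanine Hassel");

# $ with {} reads or writes the value stored under one key.
$player{"clarinet"} = "Susan Bartlett";
my $oboist = $player{"oboe"};

# keys returns every key; sorting gives a stable order for printing.
my @instruments = sort(keys(%player));

print "Oboe: $oboist\n";
print "Instruments: @instruments\n";
```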
Arrays Example
#!/usr/bin/perl
# Simple List operations
# Address an element in the list
@stringInstruments = ("violin","viola","cello","bass");
@brass = ("trumpet","horn","trombone","euphonium", "tuba");
$biggestInstrument = $stringInstruments[3];
print("The biggest instrument: ", $biggestInstrument, "\n");
# Join elements at positions 0, 1, 2 and 4 into a white-space delimited string
print("orchestral brass: ", join(" ",@brass[0,1,2,4]), "\n");
@unsorted_num = ('3','5','2','1','4');
@sorted_num = sort( @unsorted_num ); # Sort the list (string order by default)
print("Numbers (Sorted, 1-5): ", @sorted_num, "\n");
# Add a few more numbers
@numbers_10 = @sorted_num;
push(@numbers_10, ('6','7','8','9','10'));
print("Numbers (1-10): ", @numbers_10, "\n");
# Remove the last; pop returns the removed element
print("Removed the last: ", pop(@numbers_10), "\n");
# Remove the first; shift returns the removed element
print("Removed the first: ", shift(@numbers_10), "\n");
# Count the elements; scalar( @numbers_10 ) equals $#numbers_10 + 1
print("Count elements (2-9): ", scalar( @numbers_10 ), "\n");
print("What's left (numbers 2-9): ", @numbers_10, "\n");
Hashes Example
#!/usr/bin/perl
# Simple hash operations
$player{"clarinet"} = "Susan Bartlett";
$player{"bassoon"} = "Andrew Vandesteeg";
$player{"flute"} = "Heidi Lawson";
$player{"oboe"} = "Jeanine Hassel";
@woodwinds = keys(%player);
@woodwindPlayers = values(%player);
# Who plays the oboe?
print("Oboe: ", $player{'oboe'}, "\n");
$playerCount = scalar(@woodwindPlayers);
# Walk through every key/value pair
while (($instrument, $name) = each(%player)) {
    print( "$name plays the $instrument\n" );
}
Pattern Matching
A pattern is a sequence of characters to be searched for in a character string
/pattern/
Match operators
=~: tests whether a pattern is matched
!~: tests whether a pattern is not matched
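The two match operators can be tried on a single string. This is a small sketch with an illustrative input and variable names.

```perl
use strict;
use warnings;

my $text = "a def word";

# =~ is true when the pattern is found somewhere in the string.
my $has_def = ($text =~ /def/) ? "yes" : "no";

# !~ is true when the pattern is NOT found.
my $lacks_xyz = ($text !~ /xyz/) ? "yes" : "no";

print "Contains def: $has_def\n";
print "Lacks xyz: $lacks_xyz\n";
```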
Patterns
Pattern        Matches
/def/          "define"
/d.f/          dif
/\bdef\b/      a def word
/^def/         def in start of line
/^def$/        def line
/de?f/         df, def
/d[eE]f/       def, dEf
/d[^eE]f/      daf, dzf
/d.+f/         dabcf
/d.*f/         df, daffff
/de{1,3}f/     deef, deeef
/de{3}f/       deeef
/de{3,}f/      deeeeef
/de{0,3}f/     up to deeef
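A few of these patterns can be checked mechanically. The sketch below pairs each pattern with one string that should match and one that should not. The test strings are chosen for illustration, and `qr//` (which stores a compiled pattern in a variable) is standard Perl that the slides do not cover.

```perl
use strict;
use warnings;

# Each entry: pattern, a string that should match, a string that should not.
my @checks = (
    [ qr/\bdef\b/,  "a def word", "define" ],
    [ qr/^def$/,    "def",        "a def"  ],
    [ qr/de?f/,     "df",         "deef"   ],
    [ qr/de{1,3}f/, "deeef",      "deeeef" ],
    [ qr/d[^eE]f/,  "dzf",        "dEf"    ],
);

my $all_ok = 1;
for my $check (@checks) {
    my ($pattern, $hit, $miss) = @$check;
    my $ok = ($hit =~ $pattern) && ($miss !~ $pattern);
    $all_ok = 0 unless $ok;
    print "$pattern: '$hit' / '$miss' ", ($ok ? "as expected" : "UNEXPECTED"), "\n";
}
print "All checks passed\n" if $all_ok;
```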
Character Ranges
Escape Sequence   Pattern            Description
\d                [0-9]              Any digit
\D                [^0-9]             Anything but a digit
\w                [_0-9A-Za-z]       Any word character
\W                [^_0-9A-Za-z]      Anything but a word char
\s                [ \r\t\n\f]        White-space
\S                [^ \r\t\n\f]       Anything but white-space
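The escape sequences are most useful for pulling pieces out of a line of text. The sketch below uses an invented input line. It also uses the `g` option (find all matches) and `split`, which are standard Perl but are not introduced in these slides.

```perl
use strict;
use warnings;

my $line = "Room 42, floor 3";

# \d+ : runs of digits; in list context with /g, all matches are returned.
my @numbers = ($line =~ /\d+/g);

# \w+ : runs of word characters (letters, digits, underscore).
my @words = ($line =~ /\w+/g);

# \s+ : runs of white-space, used here as the field separator.
my @fields = split(/\s+/, $line);

print "Numbers: @numbers\n";
print "Words:   @words\n";
print "Fields:  ", join("|", @fields), "\n";
```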
Backreferences
Memorize the matched portion of input
Use of parentheses
/[a-z]+(.)[a-z]+\1[a-z]+/
Matches: asd-eeed-sdsa, sd-sss-ws
Does NOT match: as_eee-dfg
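The backreference pattern above can be run against the same three strings. In this sketch the hash `%result` and the loop are illustrative; `$1` holds whatever the parentheses captured.

```perl
use strict;
use warnings;

# (.) remembers one character; \1 requires that same character again.
my $pattern = qr/[a-z]+(.)[a-z]+\1[a-z]+/;

my %result;
for my $s ("asd-eeed-sdsa", "sd-sss-ws", "as_eee-dfg") {
    if ($s =~ $pattern) {
        $result{$s} = "matches, remembered '$1'";
    } else {
        $result{$s} = "no match";
    }
    print "$s: $result{$s}\n";
}
```

The last string fails because its two separators differ, so `\1` cannot repeat what `(.)` captured.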
Substitutions
Substitution operator
s/pattern/substitution/options
If $string = "abc123def";
$string =~ s/123/456/;
then $string becomes "abc456def"
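The effect of the options part of `s/pattern/substitution/options` can be sketched with the two most common options, `g` (replace every match) and `i` (ignore case). The slides do not list the options, so treat the choice of these two, and the sample strings, as illustrative.

```perl
use strict;
use warnings;

my $string = "abc123def";
$string =~ s/123/456/;            # replace the first match

my $text = "a-b-c";
(my $first_only = $text) =~ s/-/+/;    # no options: first match only
(my $every      = $text) =~ s/-/+/g;   # g: every match

my $shout = "ABC";
$shout =~ s/b/x/i;                # i: ignore case when matching

print "$string\n$first_only\n$every\n$shout\n";
```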
$` : the text before the match
$& : the matched text
$' : the text after the match
EXAMPLE
$_ = "this is a sample string";
/sa.*le/; # matches "sample" within the string
# $` is now "this is a "
# $& is now "sample"
# $' is now " string"
Because these variables are set on each successful match, you should save the values elsewhere if you need them later in the program.
# what about capital P in "please"?
if ($question =~ /please/) {
    print ("Thank you for being polite!\n");
} else {
    print ("That was not very polite!\n");
}
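One way to handle the capital P is the `i` option, which makes the match ignore case. The sketch below compares the two forms on an invented question; the variable names are illustrative.

```perl
use strict;
use warnings;

my $question = "Please pass the salt";

# Case-sensitive: "Please" does not contain lowercase "please".
my $case_sensitive = ($question =~ /please/) ? 1 : 0;

# With the i option the case of the letters is ignored.
my $case_insensitive = ($question =~ /please/i) ? 1 : 0;

if ($case_insensitive) {
    print("Thank you for being polite!\n");
} else {
    print("That was not very polite!\n");
}
```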