Welcome
EDITORIAL
Editor Catherine Emma Ellis
Editor-in-chief Graham Barlow
Art editor Efrain Hernandez-Mendoza

MARKETING

Group marketing manager NOT YET APPOINTED


Marketing manager Richard Stephens

PRODUCTION & DISTRIBUTION


Production controller Marie Quilter
Production manager Mark Constance
Printed in the UK by William Gibbons & Sons Ltd
on behalf of Future
Distributed by Seymour Distribution Ltd,
2 East Poultry Avenue, London EC1A 9PT,
Tel: 0207 429 4000
Overseas distribution by Seymour International

CIRCULATION
Trade marketing manager Juliette Winyard
(07551 150 984)

LICENSING
International director Regina Erak
regina.erak@futurenet.com
+44 (0)1225 442244 Fax +44 (0)1225 732275

MANAGEMENT
Content & marketing director Nial Ferguson
Head of content & marketing, technology Nick Merritt
Group editor-in-chief Paul Newman
Group art director Steve Gotobed
Future is an award-winning international media
group and leading digital business. We reach more
than 49 million international consumers a month
and create world-class content and advertising
solutions for passionate consumers online, on tablet
& smartphone and in print.
Future plc is a public
company quoted
on the London
Stock Exchange
(symbol: FUTR).
www.futureplc.com

Chief executive Zillah Byng-Maddick


Non-executive chairman Peter Allen
Chief financial officer Richard Haley
Tel +44 (0)207 042 4000 (London)
Tel +44 (0)1225 442 244 (Bath)

All contents copyright 2014 Future Publishing Limited


or published under licence. All rights reserved. No part of
this magazine may be reproduced, stored, transmitted
or used in any way without the prior written permission
of the publisher.
Future Publishing Limited (company number 2008885)
is registered in England and Wales. Registered office:
Quay House, The Ambury, Bath, BA1
1UA. All information contained in this publication is for
information only and is, as far as we are aware, correct at
the time of going to press. Future cannot accept any
responsibility for errors or inaccuracies in such
information. You are advised to contact manufacturers
and retailers directly with regard to the price and other
details of products or services referred to in this
publication. Apps and websites mentioned in this
publication are not under our control. We are not
responsible for their contents or any changes or updates
to them.
If you submit unsolicited material to us, you
automatically grant Future a licence to publish your
submission in whole or in part in all editions of the
magazine, including licensed editions worldwide and in
any physical or digital format throughout the world. Any
material you submit is sent at your risk and, although
every care is taken, neither Future nor its employees,
agents or subcontractors shall be liable for loss or
damage.

Programming is no longer the domain of
the computer scientist. It has become an
essential everyday skill for anyone involved
with computers. Whether that's tinkering with
websites, writing a macro for your home accounts,
or playing around with Minecraft on a Raspberry Pi,
being able to put one line of code after another is
your path to computing liberation.
Coding Academy 2015 offers plenty of help
for beginners, with a whole section dedicated to
essential programming concepts and principles,
plus a selection of projects that turn those
concepts into real tools. You can then use the
same skills to create an interactive online calendar,
or go into Web 3.0 company startup mode with
Ruby on Rails. And if you're looking to expand your
repertoire of programming languages even further,
we'll give you a tour of everything from C and SDL
to Python and Perl. So what are you waiting for?
Get stuck in, and happy coding!

LINUX is a trademark of Linus Torvalds, GNU/Linux is
abbreviated to Linux throughout for brevity. All other
trademarks are the property of their respective owners.
Where applicable, code printed in this magazine is
licensed under the GNU GPL v2 or later. See
www.gnu.org/copyleft/gpl.html.
Disclaimer: All tips in this magazine are used at your own
risk. We accept no liability for any loss of data or damage
to your computer, peripherals or software through the
use of any tips or advice.

Contents


Code concepts
Types of data ... 8
More data types ... 10
Abstraction ... 12
Files and modules ... 14
Use an IDE ... 18
Write a program ... 20
Add features ... 22
Put it all together ... 24
Data modules ... 26
Data storage ... 28
Data organisation ... 30
Data encryption ... 32
Spot mistakes ... 34

Ruby
Ruby: Master the basics ... 38
Ruby: Add a little more polish ... 42
Ruby: Modules, blocks and gems ... 49
Ruby on Rails: Web development ... 54
Ruby on Rails: Code testing ... 58
Ruby on Rails: Site optimisation ... 62

More languages
C and beyond: Code a starfield ... 68
Scheme: Learn the basics ... 72
Scheme: Recursion ... 76
Scheme: High order procedures ... 80

PHP
PHP: Write your first script ... 86
PHP: Build an online calendar ... 90
PHP: Extend your calendar ... 94
PHP: Get started with MySQL ... 98
PHP: Do more with MySQL ... 102

Modern Perl
Modern Perl: Track your reading ... 108
Modern Perl: Build a web app ... 112
Modern Perl: Adding to our app ... 116

Python
Python: Different types of data ... 122
Python: Code a system monitor ... 124
Python: Clutter animations ... 128
Python: Stream video ... 132
Python: Code a Gimp plugin ... 136
Python: Gimp snowflakes ... 140
Python: Make a Twitter client ... 144
Minecraft: Start hacking ... 148
Minecraft: Image wall importing ... 150
Minecraft: Make a trebuchet ... 154
Minecraft: Build a cannon ... 158

Code concepts

Every programmer, whether they're coding for
bare-metal embedded programs or stringing
together website functions, needs to know the
basics. Whether you're a relative beginner or an old
hand, this look at the fundamentals of coding will
strengthen your command of the basic techniques
so you can spend more time being creative.


Code concepts:
Types of data
Functions tell programs how to work, but it's data that they operate
on. Jonathan Roberts explains the basics of data in Python.

In this article, we'll be covering the basic data types in
Python and the concepts that accompany them. In later
articles, we'll look at a few more advanced topics that
build on what we do here: data abstraction (p12), fancy
structures such as trees, and more.

What is data?

In the world, and in the programs that we'll write, there's an
amazing variety of different types of data. In a mortgage
calculator, for example, the value of the mortgage, the interest
rate and the term of the loan are all types of data; in a
shopping list program, there are all the different types of food
and the list that stores them, each of which is its own kind
of data.
The computer's world is a lot more limited. It doesn't know
the difference between all these data types, but that doesn't
stop it from working with them. The computer has a few basic
ones it can work with, and that you have to use creatively to
represent all the variety in the world.
We'll begin by highlighting three data types. First, we have
numbers: 10, 3 and 2580 are all examples of these.
In particular, these are ints, or integers. Python knows about
other types of numbers, too, including longs (long integers),
floats (such as 10.35 or 0.8413) and complex (complex
numbers). There are also strings, such as "Hello World",
'Banana' and "Pizza". These are identified as a sequence of
characters enclosed within quotation marks. You can use
either double or single quotes. Finally, there are lists, such as
["Bananas", "Oranges", "Fish"]. In some ways, these are like a
string, in that they are a sequence. What makes them
different is that the elements that make up a list can be of any
type. In this example, the elements are all strings, but you
could create another list that mixes different types, such as
["Bananas", 10, "a"]. Lists are identified by the square brackets
that enclose them, and each item or element within them is
separated by a comma.

While we're looking only at basic data types, in real programs
getting the wrong type can cause problems, in which case
you'll see a TypeError.

Working with data


There are lots of things you can do with the different types of
data in Python. For instance, you can add, subtract, divide and
multiply two numbers and Python will return the result:
>>> 23 + 42
65
>>> 22 / 11
2
If you combine different types of numbers, such as an int
and a float, the value returned by Python will be of whatever
type retains the most detail: that is to say, if you add an int
and a float, the returned value will be a float.
You can test this by using the type() function. It returns
the type of whatever argument you pass to it.
>>> type(8)
<type 'int'>
>>> type(23.01)
<type 'float'>
>>> type(8 + 23.01)
<type 'float'>
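The sessions in this article come from Python 2. As a hedged aside, here is a minimal sketch of the same int-plus-float check written so that it also runs under Python 3, where type() reports <class 'int'> rather than <type 'int'> and the / operator always returns a float. The helper name describe is our own invention, not part of Python.

```python
def describe(value):
    """Return the name of a value's type, e.g. 'int' or 'float'."""
    return type(value).__name__

# Mixing an int and a float gives a float, as described above.
mixed = 8 + 23.01

# Floor division (//) of two ints stays an int in Python 2 and 3.
whole = 22 // 11

print(describe(8), describe(23.01), describe(mixed), describe(whole))
```

Comparing type names like this is only for illustration; in real code, isinstance(value, float) is usually the more robust check.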
You can also use the same operations on strings and lists,
but they have different effects. The + operator concatenates
(combines together) two strings or two lists, while the *
operator repeats the contents of the string or list.
>>> "Hello " + "World"
'Hello World'
>>> ["Apples"] * 2
['Apples', 'Apples']
Strings and lists also have their own special set of
operations, including indexing and slices. These let you select
a particular part of the sequence by its numerical index, which
begins from 0.
>>> word = "Hello"
>>> word[0]
'H'
>>> word[3]
'l'
>>> list = ["banana", "cake", "tiffin"]
>>> list[2]
'tiffin'
Indexes work in reverse, too. If you want to reference the last
element of a list or the last character in a string, you can use
the same notation with a -1 as the index. -2 will reference the
second-to-last character, -3 the third-to-last, etc. Note that when
working backwards, the indexes don't start at 0.
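To make the index rules concrete, here is a short sketch of negative indexes and slices. The variable name foods is our own choice (it avoids reusing list, which is also the name of Python's built-in list type); the values in the comments follow from the rules described above.

```python
word = "Hello"
foods = ["banana", "cake", "tiffin"]

last_letter = word[-1]     # 'o', the last character
second_last = foods[-2]    # 'cake', the second-to-last element

# A slice [start:stop] selects from start up to, but not including, stop.
first_three = word[0:3]    # 'Hel'
tail = foods[1:]           # ['cake', 'tiffin'], from index 1 to the end
```

Slices always return a new sequence of the same type, so slicing a string gives a string and slicing a list gives a list.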

Methods
Lists and strings also have a range of other special
operations, each unique to that particular type. These are
known as methods. They're similar to functions such as
type() in that they perform a procedure. What makes them
different is that they're associated with a particular piece of
data, and hence have a different syntax for execution.
For example, among the list type's methods are append
and insert.
>>> list.append("chicken")
>>> list
['banana', 'cake', 'tiffin', 'chicken']
>>> list.insert(1, "pasta")
>>> list
['banana', 'pasta', 'cake', 'tiffin', 'chicken']
As you can see, a method is invoked by placing a period
between the piece of data that you're applying the method to
and the name of the method. Then you pass any arguments
between round brackets, just as you would with a normal
function. It works the same with strings and any other data
object, too:
>>> word = "HELLO"
>>> word.lower()
'hello'
There are lots of different methods that can be applied to
lists and strings, and to tuples and dictionaries (which we're
about to look at). To see the order of the arguments and the
full range of methods available, you'll need to consult the
Python documentation.

Variables
In the previous examples, we used the idea of variables to
make it easier to work with our data. Variables are a way to
name different values, that is, different pieces of data. They
make it easy to manage all the bits of data you're working with,
and greatly reduce the complexity of development (when you use
sensible names).
As we saw above, in Python you create a new variable with
an assignment statement. First comes the name of the
variable, then a single equals sign, followed by the piece of
data that you want to assign to that variable.
From that point on, whenever you use the name assigned
to the variable, you're referring to the data that you assigned
to it. In the examples, we saw this in action when we
referenced the first character in a string or the third
element in a list by appending index notation to the variable
name. You can also see this in action if you apply the type()
function to a variable name:
>>> type(word)
<type 'str'>
>>> type(list)
<type 'list'>

Other data types


There are two other common types of data that are used by
Python: tuples and dictionaries.
Tuples are very similar to lists: they're a sequence data
type, and they can contain elements of mixed types. The big
differences are that tuples are immutable (that is to say, once
you create a tuple you cannot change it) and that tuples are
identified by round brackets, as opposed to square brackets:
("bananas", "tiffin", "cereal"). Dictionaries are similar to a list or
a tuple in that they contain a collection of related items. They
differ in that the elements aren't indexed by numbers, but by
keys, and are created with curly brackets: {}. It's quite like an
English language dictionary. The key is the word that you're
looking up, and the value is the definition of the word.
With Python dictionaries, however, you can use any
immutable data type as the key (strings are immutable, too),
so long as it's unique within that dictionary. If you try to use
an already existing key, its previous association is forgotten
completely and that data is lost forever.
>>> english = {"free": "as in beer", "linux": "operating system"}
>>> english["free"]
'as in beer'
>>> english["free"] = "as in liberty"
>>> english["free"]
'as in liberty'
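As a small, hedged illustration of immutability, the sketch below tries to change a tuple and catches the TypeError that Python raises. It then uses a tuple as a dictionary key, which a list cannot be. The names snacks and prices, and the prices themselves, are made up for the example.

```python
snacks = ("bananas", "tiffin", "cereal")

try:
    snacks[0] = "apples"   # tuples do not support item assignment
    changed = True
except TypeError:
    changed = False        # this branch runs: the tuple is unchanged

# An immutable tuple can serve as a dictionary key.
prices = {("tiffin", "large"): 2.50}
prices[("tiffin", "large")] = 3.00   # reusing the key replaces the value
```

After this runs, snacks still holds its original three items, and prices holds a single entry with the newer value.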

The Python interpreter is a great place to experiment with
Python code and see how different data types work together.

Looping sequences
One common operation that you may want to perform on any
of the sequence types is looping over their contents to apply
an operation to every element contained within. Consider this
small Python program:
list = ["banana", "tiffin", "burrito"]
for item in list:
    print item
First, we created the list as we would normally, then we
used the for ... in construct to print each item in the list.
The second word in that construct doesn't have to be item,
that's just a variable name that gets assigned temporarily to
each element contained within the sequence specified at the
end. We could have written for letter in word and it would
have worked just as well.
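Here is what that for letter in word variation might look like in practice. It is only a sketch: instead of printing, it collects an upper-case copy of each character in a list named letters, which is our own addition, so that the result is easy to inspect.

```python
word = "Hello"
letters = []

# The loop variable takes each character of the string in turn.
for letter in word:
    letters.append(letter.upper())
```

The same loop shape works unchanged on lists and tuples, because all three are sequence types.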
That's all we have time to cover in this article, but with the
basic data types covered, we'll be ready to look at how you
can put this knowledge to use when modelling real-world
problems in later articles.
In the meantime, read the Python documentation to
become familiar with some of the other methods that it
provides for the data types we've looked at here. You'll find
lots of useful tools, such as sort and reverse!
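As a taster, this sketch tries the two list methods just mentioned. Both change the list in place and return None, so the intermediate order is copied into a second variable. The names foods and in_order are ours.

```python
foods = ["banana", "tiffin", "burrito"]

foods.sort()            # alphabetical order, in place
in_order = list(foods)  # keep a copy of the sorted order

foods.reverse()         # reverse whatever order the list is in now
```

Because sort() returns None, writing foods = foods.sort() is a common slip that throws the list away.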

Code concepts:
More data types
Learn how different types of data come together to solve a real
problem as Jonathan Roberts counts some words.

In the first Code Concepts tutorial, we introduced Python's
most common data types: numbers (ints and floats),
strings, lists, tuples and dictionaries. We demonstrated
how they work with different operators and a few of their
most useful methods. We didn't, however, give much insight
into how they might be used in real situations. In this article,
we're going to fix that.
We're going to write a short program that counts the
number of times each unique word occurs in a text file.
Punctuation marks will be excluded, and if the same word
occurs but in different cases (eg, 'the' and 'The'), they will be
taken to represent a single word. Finally, the program will print
the results to the screen. It should look like this:
the: 123
you: 10
a: 600
...
As an example, we'll be using The Time Machine, by HG
Wells, which you can download from Project Gutenberg,
saving it in the same folder as your Python file under the
name timemachine.txt.
As the program description suggests, the first thing we'll
need to do is make the text accessible from inside our Python
program. This is done with the open() function:
tm = open("timemachine.txt", "r")
In this example, open() is passed two arguments. The first is
the name of the file to open; if it were in a different directory
from the Python script, the entire path would have to be
given. The second argument specifies which mode the file
should be opened in: 'r' stands for read, but you can also use
'w' for write or 'r+' for read-write.
Notice we've also assigned the file to a variable, tm, so we
can refer to it later in the program.

Hardly surprisingly, our counting program, after being sorted,
finds 'the' to be the most common word in The Time Machine,
by HG Wells.

With a reference to the file created, we also need a way to
access its contents. There are several ways to do this, but
today we'll be using a for ... in loop. To see how this works,
try opening timemachine.txt in the interactive interpreter
and then typing:
>>> for line in tm:
...     print line
...
The result should be every line of the file printed to the
screen. By putting this code into a .py file, say cw.py, we've
got the start of our Python program.
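One refinement worth knowing about, though it isn't part of the program built in this article, is the with statement, which closes the file for you when its block ends. The sketch below is written for Python 3 and makes two assumptions of its own: a helper named read_lines, and a throwaway file called demo.txt standing in for timemachine.txt.

```python
import os
import tempfile

def read_lines(path):
    """Return every line of a text file as a list of strings."""
    with open(path, "r") as handle:   # closed automatically afterwards
        return [line for line in handle]

# Demonstration with a throwaway file rather than timemachine.txt.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as out:
    out.write("The Time Traveller\nwas expounding\n")

lines = read_lines(demo_path)
```

Each string in lines still ends with its newline character, which is exactly the kind of stray whitespace the next section deals with.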

Cleaning up
The program description also specified that we should
exclude punctuation marks, consider the same word but in
different cases as one, and that we're counting individual
words, not lines! As it stands, however, we've been able to
read only entire lines as strings, with punctuation, strange
whitespace characters (such as \r\n) and different
cases intact.
Looking at the Python string documentation
(http://docs.python.org/library), we can see that there are four
methods that can help us convert line strings into a format
closer to that specified by the description: strip(),
translate(), lower() and split().
Each of these is a method, and as such they're functions
that are applied to particular strings using the dot notation.
For example, strip(), which removes specified characters
from the beginning and end of a string, is used like this:
>>> line.strip()
When called with no arguments, it removes all
whitespace characters, which is one of the jobs we needed to
get done.
translate() is a method that can be used for
removing a set of characters, such as all punctuation marks,
from a string. To use it in this capacity, it needs to be passed
two arguments, the first being None and the second being
the string of characters to be deleted (this two-argument form
is specific to Python 2 strings).
>>> line.translate(None, '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
lower() speaks for itself, really: it converts every character in
a string to lower-case. split() splits distinct elements inside
a string into separate strings, returning them as a list.
By passing an argument to split(), it's possible to specify
which character identifies the end of one element and the
start of another.
>>> line.split(" ")
In this example, we've passed a single space as the
character to split the string around. With all punctuation
removed, this will create a list, with each word in the string
stored as a separate element.
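One detail to watch, shown here as a hedged aside: splitting around a single space produces empty strings wherever two or more spaces sit side by side, whereas calling split() with no argument treats any run of whitespace as one separator. The sample line and variable names below are our own.

```python
line = "the  time   traveller"   # note the repeated spaces

by_space = line.split(" ")   # splits at every single space
by_any = line.split()        # splits on runs of whitespace
```

Here by_space contains three empty strings alongside the three words, while by_any contains only the words. Which behaviour you want depends on how tidy the input text is.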
Put all of this in the Python file we started working on
earlier, inside the for loop, and we've made considerable
progress. It should now look like this:
tm = open("timemachine.txt", "r")
for line in tm:
    line = line.strip()
    line = line.translate(None, '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
    line = line.lower()
    list = line.split(" ")
Because all of the string methods return a new, modified
string, rather than operating on the existing string, we've
re-assigned the line variable in each line to store the work of
the previous step.
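The translate(None, ...) call above only works on Python 2 strings. For readers on Python 3, the following is a rough equivalent of the same four cleaning steps, offered as a sketch rather than as the article's own code. It assumes a helper named clean_line and a table named REMOVE_PUNCTUATION (both ours), builds the table with str.maketrans() and string.punctuation from the standard library, and uses split() without an argument so that repeated spaces don't produce empty words.

```python
import string

# Translation table that deletes every ASCII punctuation character.
REMOVE_PUNCTUATION = str.maketrans("", "", string.punctuation)

def clean_line(line):
    """Strip, de-punctuate, lower-case and split one line into words."""
    line = line.strip()
    line = line.translate(REMOVE_PUNCTUATION)
    line = line.lower()
    return line.split()

words = clean_line("  The Time Machine, by H. G. Wells!\r\n")
```

Wrapping the steps in a function also means they can be tried on a single made-up line before being let loose on the whole book.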

Uniqueness
Phew, look at all that work we've just done with data! By using
the string methods, we've been able to remove all the bits of
data that we weren't interested in. We've also split one large
string, representing a line, into smaller chunks by converting
it to a list, and in the process gotten at the exact, abstract
concept we're most interested in: words.
Our stunning progress aside, there's still work to be done.
We now need a way to identify which words are unique, and
not just in this line, but in every line contained within the
entire file!
The first thing that should pop into your head when
thinking about uniqueness is a dictionary, the key-value
store we saw in the first article (p8). It doesn't allow duplicate
keys, so by entering each word as a key within a dictionary,
we're guaranteed there won't be any duplicates.
What's more, we can use the value to store the number of
times each word has occurred, incrementing it as the
program comes across new instances of each key.
Start by creating the dictionary, and ensuring that it
persists for the entire file, not just a single line, by placing
this line before the start of the for loop:
dict = {}
This creates an empty dictionary, ready to receive
our words.
Next, we need to think about a way to get each word into
the dictionary. As we saw last time, ordinarily a simple
assignment statement would be enough to add a new word
to the dictionary. We could then iterate over the list we
created above (using another for loop), adding each entry to
the dictionary with a value of 1 (to represent that it has
occurred once in the file).
for word in list:
    dict[word] = 1
But remember, if the key already exists the old value is
overwritten and the count will be reset. To get around this, we
can place an if-else clause inside the loop:
if word in dict:
    count = dict[word]
    count += 1
    dict[word] = count
else:
    dict[word] = 1
This is a bit confusing because dict[word] is being used in
two different ways. In the second line, it returns the value and
assigns it to the variable count, while in the fourth and
sixth lines, count and 1 are assigned to that key's
value, respectively.
Notice, too, that if a word is already in the dictionary, we
increment the count by 1, representing another occurrence.
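The if-else clause can be written more compactly with the dictionary's get() method, which returns a fallback value when the key is missing. This is an alternative sketch, not the version the article goes on to use; the function name count_words and its counts parameter are our own.

```python
def count_words(words, counts=None):
    """Add each word in `words` to the `counts` dictionary."""
    if counts is None:
        counts = {}
    for word in words:
        # get() returns 0 when the word has not been seen yet.
        counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_words(["the", "time", "the"])
counts = count_words(["machine", "the"], counts)
```

Passing the same dictionary back in for each line mirrors the requirement that the counts persist across the whole file, not just a single line.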

Python's Standard Library reference,
http://docs.python.org/library, is an invaluable source for
discovering what methods are available and how to use them.

Putting it together
Another data type wrestled with, another step closer to our
goal. At this point, all that's left to do is insert some code to
print the dictionary, put it all together and run the
program. The print section should look like this and be at the
very end of the file, outside of the line-looping code.
for word, count in dict.iteritems():
    print word + ": " + str(count)
This for loop looks different to what you've seen before.
By using the iteritems method of the dictionary, we can
access both the key (word) and value (count) in a single loop.
What's more, we've had to use the str() function to convert
count, an integer, into a string, as the + operator can't
concatenate an integer and a string.
Try running it, and you should see your terminal screen
filled with lines like:
...
other: 20
sick: 2
ventilating: 2
...

Data everywhere!
That's all we planned to achieve in this particular tutorial and
it's actually turned out to be quite a lot. As well as having a
chance to see how several different types of data and their
methods can be applied to solve a real problem, we hope
you've noticed how important it is to select the appropriate
type for representing different abstract concepts.
For example, we started off with a single string
representing an entire line, and we eventually split this into
a list of individual strings representing single words. This
made sense until we wanted to consider unique instances, at
which point we put everything into a dictionary.
As a further programming exercise, why not look into
sorting the resulting dictionary in order to see which words
occur most often? You might also want to consider writing
the result to a file, one entry on a line, to save the fruits of
your labour.
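If you'd like a nudge on the sorting exercise, here is one possible approach, offered as a sketch rather than the answer. It assumes a helper called most_common (our name) and borrows the illustrative counts from the sample output at the start of this article. The built-in sorted() does the work, ordering the (word, count) pairs by their second element.

```python
def most_common(counts, limit=3):
    """Return (word, count) pairs, highest count first."""
    pairs = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    return pairs[:limit]

sample = {"the": 123, "you": 10, "a": 600}
top = most_common(sample)
```

A dictionary can't be sorted in place here, so the function returns a list of pairs instead, leaving the original counts untouched.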
Code concepts:
Abstraction
Jonathan Roberts shows you how creating abstractions can
make code more reliable and easier to maintain.

In the first couple of Code Concepts tutorials, we've been
looking at data. First, we introduced some of Python's
core data types, and then we demonstrated how they can
be put to use when solving a real problem. The next
data-related topic we want to consider is abstraction, but before
we get on to that, we're first going to look at abstraction in
general and as it applies to procedures. So, this time we'll take
a brief hiatus from data, before returning to it in a later article.

Square roots
To get our heads around the concept of abstraction, let's start
by thinking about square roots and different techniques for
finding them. One of these was discovered by Newton, and is
thus known as Newton's method.
It says that when trying to find the square root of a number
(x), we should start with a guess (y) of its square root. We can
then improve that by averaging our guess (y) with the result
of dividing the number (x) by our guess (y). As we repeat this
procedure, we get closer and closer to the square root. In
most attempts, we'll never reach a definite result; we'll only
make our guess more and more accurate. Eventually, we'll
reach a level of accuracy that is good enough for our needs
and give up. Just to be clear about what's involved, take a look
at the table below for how you would apply this method to
find the square root of 2 (ie, x = 2).
It's a lot of work just to find the square root of a number!
Imagine if when you were in school, every time you had to find
a square root you had to do all these steps manually. For
instance, solving problems involving Pythagoras' theorem would
be much more unwieldy.
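For the curious, the repeated averaging can be handed to Python. This is a minimal sketch that assumes a function name of our own, newton_steps, and a starting guess of 1.0; it simply records each improved guess.

```python
def newton_steps(x, guess, steps):
    """Return the successive guesses for the square root of x."""
    guesses = []
    for _ in range(steps):
        # Average the guess with x divided by the guess.
        guess = ((x / guess) + guess) / 2
        guesses.append(guess)
    return guesses

rounded = [round(g, 4) for g in newton_steps(2, 1.0, 3)]
```

Three steps starting from 1.0 give 1.5, 1.4167 and 1.4142 once rounded to four decimal places, already close to the true square root of 2.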
Luckily, assuming you were allowed calculators at school,
there's another, much simpler method to find square roots.
Calculators come with a button marked with the square root
symbol, and all you have to do is press this button once.
What could be easier? This second approach is what's known
as an abstraction. When working on problems, such as those
involving Pythagoras' theorem, we don't care how to calculate
the square root, only that we can do it and get the correct
result. We can treat the square root button on our calculator
as a black box: we never look inside it and we don't know
how it does what it does, all that matters is we know how to
use it and that it gives the correct result.
Abstraction is a very powerful technique that makes
programming a lot easier, as it helps us to manage
complexity. To demonstrate how abstraction can help,
consider this Python code for finding the longest side of a
right-angled triangle:
import math
def pythag(a, b):
    a2b2 = (a * a) + (b * b)
    guess = 1.0
    while (math.fabs((guess * guess) - a2b2) > 0.01):
        guess = (((a2b2 / guess) + guess) / 2)
    return guess
The first thing to note is that it's not in the least bit
readable. Sure, with a piece of code this short, you can read
through it reasonably quickly and figure out what's going on,
but at a glance it's not obvious, and if it were longer and
written like this, you'd have a terrible time figuring out what
on Earth it was doing. What's more, it would be very
difficult to test the different parts of this code as you go
along (aka incremental development, vital for
building robust software).
For instance, how would you break out the code for testing
whether or not a guess is close enough to the actual result
(if you can even identify it), or the code for improving
a guess, to check that it works? What if this function didn't
return the expected results, how would you start testing all
the different parts to find where the error was?
Finally, there's code in here that could be reused in other
functions, such as that for squaring a number, for taking an
average of two numbers, and even for finding the square root
of a number, but none of it is reusable because of the way it's
written. You could type it all out again, or copy and paste it,
but the more typing you have to do, and the more obscure code
you have to copy and paste, the more likely mistakes are
to make it into your programming.
Finding a square root (here, of x = 2)

Guess (y)    Division (x/y)       Average (((x/y) + y)/2)
1            2/1 = 2              (2 + 1)/2 = 1.5
1.5          2/1.5 = 1.33         (1.33 + 1.5)/2 = 1.4167
1.4167       2/1.4167 = 1.4118    (1.4118 + 1.4167)/2 = 1.4142

Let's try writing that code again, this time coming up
with some abstractions to fix the problems listed above.
We haven't listed the contents of each new function we've
created, leaving them for you to fill in.
import math

def square(x):
    ...

def closeEnough(x, guess):
    ...

def improveGuess(x, guess):
    ...

# guess has a default so that pythag() can call sqrt() with one argument
def sqrt(x, guess=1.0):
    ...

def pythag(a, b):
    a2b2 = square(a) + square(b)
    return sqrt(a2b2)
Here, we've split the code into several smaller functions,
each of which fulfils a particular role. This has many benefits.
For starters, how much easier is the pythag() function to
read? In the first line, you can see clearly that a2b2 is the
sum of two squared numbers, and everything below that
has been consolidated into a single function call, the purpose
of which is also obvious.
What's more, because each part of the code has been split
into a different function, we can easily test it. For example,
testing whether improveGuess() was doing the right thing
would be very easy: come up with a few values for x and
guess, do the improvement by hand, and then compare your
results with those returned by the function.
If pythag() itself was found not to return the correct
result, we could quickly test all these auxiliary functions to
narrow down where the bug was.
And, of course, we can easily reuse any of these new
functions. If you were finding the square root of a number in
a different function, for instance, you could just call the sqrt()
function: six characters instead of four lines means there's far
less opportunity to make mistakes.
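If you'd like something to compare your own attempt against, here is one possible way of filling in the skeleton. Treat it as a sketch rather than the definitive answer: the tolerance parameter and the default starting guess of 1.0 are our assumptions, borrowed from the values used in the first listing.

```python
def square(x):
    """Return x multiplied by itself."""
    return x * x


def closeEnough(x, guess, tolerance=0.01):
    """Return True when guess squared is within tolerance of x."""
    # tolerance is an assumed parameter; 0.01 matches the first listing
    return abs(square(guess) - x) <= tolerance


def improveGuess(x, guess):
    """Average the current guess with x divided by the guess."""
    return ((x / guess) + guess) / 2


def sqrt(x, guess=1.0):
    """Keep improving the guess until its square is close enough to x."""
    while not closeEnough(x, guess):
        guess = improveGuess(x, guess)
    return guess


def pythag(a, b):
    """Return the longest side of a right-angled triangle."""
    return sqrt(square(a) + square(b))


# Each helper can be checked on its own against values worked out by hand.
print(improveGuess(2, 1.0))      # (2/1 + 1)/2
print(closeEnough(2, 1.5))       # 1.5 squared is 2.25, too far from 2
print(round(pythag(3, 4), 2))    # the classic 3-4-5 triangle
```

Because every step is its own function, a wrong answer from pythag() can be traced by calling the helpers one at a time with values you have already worked out on paper.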
One final point: because our sqrt code is now abstracted,
we could change the implementation completely, but so long
as we kept the function call and arguments the same, all code
that relies on it would continue to work properly.
This means that if you come across a much more efficient
way of calculating square roots, you're not stuck with working
through thousands of lines of code, manually changing every
section that finds a square root: you do it once, and it's
done everywhere.

Figure: layers of abstraction (Java, C, Assembler, object code).
There are layers of abstraction underneath everything
you do on a PC; you just don't often think of them.

This code can be improved still further by taking advantage
of scope (see http://bit.ly/1CQS7xl for more details).
closeEnough() and improveGuess() are particular to the
sqrt() function; that is to say, other functions are unlikely
to rely on their services. To help keep our code clean, and
make the relationship between these functions and sqrt()
clear, we can place their definitions inside the definition
of sqrt():
def sqrt(x, guess=1.0):
    def closeEnough(x, guess):
        ...
    def improveGuess(x, guess):
        ...
    ...
These functions are now visible only to code within the
sqrt() definition: we say they're in the scope of sqrt().
Anything outside of it has no idea that they even exist. This
way, if we later need to define similar functions for improving
a guess in a different context, we won't face the issue of
colliding names or the headache of figuring out what
improveGuess1() and improveGuess2() do.
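To see scope in action, here is a small sketch with the nested definitions filled in. The function bodies and the default guess of 1.0 are our own assumptions, modelled on the first listing; the point is the final few lines, which show that the helpers can't be reached from outside sqrt().

```python
def sqrt(x, guess=1.0):
    """Approximate the square root of x by repeatedly improving a guess."""

    def closeEnough(x, guess):
        # Visible only inside sqrt(); 0.01 is the tolerance from the first listing.
        return abs((guess * guess) - x) <= 0.01

    def improveGuess(x, guess):
        # Average the current guess with x divided by the guess.
        return ((x / guess) + guess) / 2

    while not closeEnough(x, guess):
        guess = improveGuess(x, guess)
    return guess


print(round(sqrt(16), 2))

# The helpers do not exist at the top level, so this lookup fails.
try:
    improveGuess(16, 1.0)
except NameError as error:
    print("Not visible outside sqrt():", error)
```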

Our final code for finding the longest side of a triangle is
longer than what we had to start, but it's more readable,
more robust, and generally better!

Layers of abstraction
Hopefully, this example has demonstrated how powerful
a technique abstraction is. Bear in mind that there are many
layers of abstraction present in everything you do on
a computer that you never think of.
For instance, when you're programming, do you know how
Python represents integers in the computer's memory? Or
how the CPU performs arithmetic operations such as
addition and subtraction?
The answer is probably no. You just accept the fact that
typing 2 + 3 into the Python interpreter returns the correct
result, and you never have to worry about how it does this.
You treat it as a black box.
Think how much longer it would take you to program if
you had to manually take care of what data went in which
memory location, to work with binary numbers, and translate
alphabetic characters into their numeric representations.
Thank goodness for abstraction!
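If you're curious about what is being hidden, Python will let you peek beneath a couple of these layers. This is only a quick illustration using built-in functions, not something you need in everyday code.

```python
# Python normally hides these representations from you.
print(bin(2 + 3))                    # the sum written as a binary number
print(ord("A"))                      # the numeric code behind the letter A
print(list("Hi".encode("ascii")))    # the byte values used to store "Hi"
```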
Code concepts

Code concepts:
Files and modules
Graham Morrison expands your library of functions and grabs
external data with just two lines of Python.

For the majority of programming projects, you don't get
far before facing the age-old problem of how to get
data into and out of your application. Whether it's using
punched cards to get patterns into a 19th century Jacquard
textile loom, or Google's robots skimming websites for data
to feed its search engine, dealing with external input is as
fundamental as programming itself.
And it's a problem and a concept you may be more
familiar with on the command line. When you type ls to list
the contents of the current directory, for example, the
command is reading in the contents of a file, the current
directory, and outputting the contents to another,
the terminal.
Of course, the inputs and outputs aren't files in the sense
most people would recognise, but that's the way the Linux
filesystem has been designed: nearly everything is a file.
This helps when you want to save the output of a command,
or use that output as the input to another.
You may already know that typing ls >list.txt will redirect
the output from the command to a file called list.txt, but you
can take this much further because the output can be treated
exactly like a file. ls | sort -r will pipe (that's the vertical bar
character) the output of ls into the input of sort to create
a reversed alphabetical list of a folder's contents.
The complexity of how data input and output can be
accomplished is entirely down to your programming
environment. Every language will include functions to load
and save data, for instance, but this can either be difficult or
easy depending on how many assumptions the language is
willing to make on your behalf. However, there's always a
logical sequence of events that needs to occur.
You will first need to open a file, creating one if it doesn't
exist, and then either read data from this file, or write
data to it, before explicitly closing the file again so that
other processes can use it.
Most languages require you to specify a mode, such as
read-only or write, when you open a file, as this tells the
filesystem whether to expect file modifications or not. This is
important because many different processes may also want
to access the file, and if the filesystem knows the file is being
changed, it won't usually allow access. However, many
processes can access a read-only file without worrying about
the integrity of the data it holds, because nothing is able to
change it. If you are familiar with databases, it's the same
kind of problem you face with multiple users accessing the
same table.
In Python, as with most other languages, opening a file to
write or as read-only can be done with a single line:
>>> f = open("list.txt", "r")
If the file doesn't exist, Python will generate a 'No such file
or directory' error. To avoid this, we've used the output from
our command line example to create a text file called list.txt.
This is within the folder from where we launched the
Python interpreter.
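If you'd rather have your script cope with a missing file than stop with an error, one approach is sketched below. The open_for_reading() helper is our own invention for illustration, and a temporary folder stands in for the directory you launched the interpreter from, so nothing on your system is touched.

```python
import os
import tempfile


def open_for_reading(path):
    """Return an open file object, or None if the file is missing."""
    try:
        return open(path, "r")
    except FileNotFoundError as error:
        # Python 3 reports "No such file or directory" through this exception.
        print("Could not open file:", error)
        return None


# A throwaway folder stands in for the directory Python was launched from.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "list.txt")

    missing = open_for_reading(path)     # list.txt has not been created yet

    with open(path, "w") as out:         # "w" creates the file if needed
        out.write("notes.txt\nphotos\n")

    f = open_for_reading(path)           # now the same call succeeds
    print(f.readline(), end="")
    f.close()
```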


When you read a file, most languages will step through its
data from the beginning to the end in chunks you specify.
In this example, we're reading a line at a time.

Environment variables
Dealing with paths, folders and file locations can quickly
become complicated, and it's one of the more tedious issues
you'll face with your own projects. You'll find that different
environments have different solutions for finding files,
with some creating keywords for common locations and
others leaving it to the programmer. This isn't so bad when
you only deal with files created by your projects, but it
becomes difficult when you need to know where to store a
configuration file or load a default icon. These locations may
be different depending on your Linux distribution or desktop,
but with a cross-platform language such as Python, they'll
also be different for each operating system. For that reason,
you might want to consider using environment variables.
These are similar to variables with a global scope in many
programming languages, but they apply to any one user's
Linux session rather than within your own code. If you type
env on the command line, for instance, you'll see a list of the
environment variables currently set for your terminal
session. Look closely, and you'll see a few that apply to default
locations and, most importantly, one called HOME. The value
assigned to this environment variable will be the location of
your home folder on your Linux system, and if we want to use
this within our Python script, we first need to add a line to
import the operating system-specific module. The line to do
this is:
import os
This command is also opening a file, but not in the same
way we opened list.txt. This file is known as a module in
Python terms, and modules like this import functionality,
including statements and definitions, so that a programmer
doesn't have to keep re-inventing the wheel.
Modules extend the simple constructs of a language to
add portable shortcuts and solutions, which is why other
languages might call them libraries. Libraries and modules
are a little like copying and pasting someone else's research
and insight into your own project. Only it's better than that,
because modules such as os are used by everyone, turning
the way they do things into a standard.

Setting the standard


There are even libraries called std, and these embed standard
ways of doing many things a language doesn't provide by
default, such as common mathematical functions, data types
and string services, as well as file input/output and support
for specific file types. You will find the documentation for
what a library does within its API reference. This will list each
function, what it does, and what it requires as an input and
an output. You should also be able to find the source files used
by the import (and by #include in other languages). On most
Linux systems, for example, /lib/python2.x will include all the
modules. If you load os.py into a text editor, you'll see the
code you've just added to your project, as well as which
functions are now accessible to you.
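You can also ask Python itself where a module lives and what it offers, without leaving the interpreter. The following is a minimal sketch; the path it prints will differ from system to system, so don't expect it to match ours.

```python
import os

# Where the interpreter loaded the module from; the exact path varies by system.
print(os.__file__)

# Public names the module makes available after "import os".
public_names = [name for name in dir(os) if not name.startswith("_")]
print(len(public_names), "names, including:", public_names[:5])

# The short description attached to a single function.
print(os.getcwd.__doc__)
```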
There are many, many different modules for Python (it's
one of the best reasons to choose it over any other language),
and more can usually be installed with just a couple of clicks
from your package manager.
But this is where the ugly spectre of dependencies can
start to have an effect on your project, because if you want
to give your code to someone else, you need to make sure
that person has also got the same modules installed.
If you were programming in C or C++, where your code
is compiled and linked against binary libraries, those binary
libraries will also need to be present on any other system that
runs your code. They will become dependencies for your
project, and resolving dependencies like these is what
package managers do when you install a complex package.
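One way to soften the dependency problem is to have your script check for the modules it needs before it does anything else. Here is a small sketch using importlib from the Standard Library; the missing_modules() helper and the deliberately made-up module name are ours, purely for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "os" and "math" ship with Python; the last name is a made-up placeholder.
required = ["os", "math", "a_module_that_is_not_installed"]
print("Missing:", missing_modules(required))
```

A script could print a friendly message listing what needs installing, rather than failing halfway through with an import error.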

The os module
Getting back to our project, the os module is designed to
provide a portable way of accessing operating system-dependent
functionality so that you can write multi-platform
applications without worrying about where files should be
placed. This includes knowing where your home directory
might be.
To see what we mean, add the following piece of code to
your project:
f = open(os.environ["HOME"] + "/list.txt", "r")
This line will open the file list.txt in your home folder.
Python knows which home folder is yours, because the
os.environ mapping from the os module returns the string
held in an environment variable, and the one we've asked it
to return is HOME. But all we've done is open the file; we've
not yet read any of its contents. This might seem
counterintuitive, but it's an historical throwback to the way
that files used to be stored, which is why this is also the way
nearly all languages work. It's only after a file has been
opened that you can start to read its contents:
f.readline()
The above instruction will read a single line of the text
file and output this to the interpreter as a string. Repeating
the command will read the next line, because Python is
remembering how far through the file it has read. Internally,
this is being done using something called a pointer, and this
too is common to the vast majority of languages.
Alternatively, if you wanted to read the entire file, you
could use f.read(). As our file contains only text, copying the
contents to a Python string is an easy conversion. The same
isn't true of a binary file. Rather than being treated as text,
the bits and bytes that make up a binary file are organised
according to the file type used by the file, or no file type at all
if it's raw data. As a result, Python (or any other language)
would be unable to extract any context from a binary file,
causing an error if you try to read it into a string.
The solution, at least for the initial input, is to add a b flag
when you first open the file, as this warns Python to expect
raw binary. When you then try to read the input, you'll see the
hexadecimal values of the file output to the display.
To make this data useful, you'll need to do some extra
work, which we'll look at next time; but first, make sure you
close the open file, as this should ensure the integrity of your
filesystem. As you might guess, the command looks like this:
f.close()
and it's as easy as that!

Binary files have no context without an associated file type
and a way of handling them, which is why you get the raw
data output when you read one.
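To tie the os module and the file commands together, here is a short sketch you can run safely. We've assumed a temporary folder in place of your real home folder so that nothing of yours is touched, and we've used os.environ.get() so the script doesn't stop if HOME happens not to be set.

```python
import os
import tempfile

# HOME is normally set on Linux; .get() avoids a KeyError where it is not.
print("Home folder:", os.environ.get("HOME", "(HOME is not set)"))

# A temporary folder stands in for the home folder, so nothing real is touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "list.txt")
    with open(path, "w") as out:
        out.write("alpha\nbeta\ngamma\n")

    f = open(path, "r")
    first = f.readline()     # reads up to and including the first newline
    second = f.readline()    # continues from where the last read stopped
    rest = f.read()          # everything that is left
    f.close()                # note the brackets: f.close alone does nothing

    with open(path, "rb") as raw:    # the b flag asks for raw bytes
        raw_bytes = raw.read()

print(repr(first), repr(second), repr(rest))
print(raw_bytes)
```

The with statement used for writing and for the binary read closes the file for you when the indented block ends, which is a handy safety net if you tend to forget f.close().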


Code concepts:
Use an IDE
Lazy Graham Morrison explains why it's never too early to start
using a development environment.

It's a good idea to use an editor with syntax highlighting
when writing your code. Using an editor such as Kate or
Gedit, with your language selected for highlighting, marks
all the different elements within your code a different colour.
There's a good reason why this makes your life easier:
when you are new to a language, syntax highlighting will
help you to see easily when an element isn't recognised, or
when a parenthesis is broken, or when you've made a simple
formatting error. But this is a solution that also scales: many
experts will also use highlighting because it means they have
one less thing to worry about, especially if they're hammering
out code faster than a famished Richard Stallman. Which is
maybe why Stallman's Emacs editor has got some excellent
syntax highlighting of its own.

Why use an IDE?


Syntax highlighting is just as useful for beginners as it is for
experts because it gives you less to think about and more
time to code, which is why, even when you're a beginner,
it's worth finding an editing environment you can grow
into, and one that will accommodate your projects as they
get larger while you learn. Text editors are great for editing
single-file projects using an interpreted language, but can
become cumbersome when projects get bigger. This is where
Integrated Development Environments can help take the
strain. Not only will they manage the various files within
a project, they'll also manage how those files are built into
a single executable, as well as how functions and objects from
one file can be used within another.
This might not be particularly applicable to Python, but
most IDEs work in the same way, so you can take your skills
with you when you move to a different environment. And to
get a better understanding of how capable IDEs can be, even
for the beginner, we're going to cover a few of their essential
functions and how you can start using them for your projects.

Down in Komodo
In almost all the examples we've used in this series of code
concept guides, we've used Python to illustrate the ideas and
concepts covered in the text. We've even mentioned some
of the Python IDEs available, such as Eric, but we've never
covered any alternative, or how you use any of them with the
language, and how similar functions are available for other
IDEs for other languages. There are many Python IDEs, but
perhaps because of its cross-platform credentials, some of
the more popular ones are commercial. This isn't ideal when
you're starting out, so we're going to forgo the commercial
options and look at a free alternative. The one we've chosen
is called Komodo Edit. It's open source and fairly
comprehensive, but it's also the little brother of a closed
source commercial version called Komodo IDE, so your skills
will be transferable if you need a more comprehensive solution.
Installation from the download is as easy as untarring the
file and running ./install.sh in the new directory. Most
distributions will now show a shortcut to the IDE on your
desktop, or you can run the bin/komodo executable from
your home directory. Your first view of this application might
be a little overwhelming, as the default configuration offers
a large news pane at the top, mostly containing an
advertisement for the commercial version. But spend a few
moments familiarising yourself, and it will soon feel like home.
The two panels on the left, for instance, contain a simple file
manager at the top and a project view at the bottom.
A project is what an IDE calls the glut of code, configuration
and IDE files that come together to create a single application
or project. You can create a new project by clicking on the
small symbol to the right, and when you've created a new
project, you can drag source files onto the project name, or
right-click on it, to add new files. We'd recommend starting
with a new Python 3-derived template. The New File dialog
allows you to choose between many different languages
supported by Komodo, but all this really does is pre-define an
environmental variable at the top of the file and make sure
the file extension is correct.

Syntax highlighting and code completion are two of the best
reasons for using an integrated development environment
such as Komodo.

Code constructions
With a fresh file ready for your Python code, we'll now give
some examples of how an IDE will help with code
constructions. Taking a cue from previous tutorials in the
Code Concepts series, type import. As we've covered
previously, this is the command to add extra functionality to
Python by importing code from other modules or libraries.
When using a simple text editor, you had to know the exact
name of the module you wanted to import. With Komodo,
you'll be presented with a list of modules that are already
installed, and you just have to choose the one you're after.
Choose math to add the mathematical functions. Now add
the following code to the file:
def square(x):
    return x * x
As you might remember, this is a super-simple function
that returns the square of x, and you should have found
typing those two lines easier than with a text editor. The tab
would have been added automatically, for example, and now
when you type print(square(10)) on a separate line, Komodo
already knows about your new square function and prompts
you to include a value within its brackets. Unfortunately,
Komodo Edit doesn't integrate the running functionality
within the application, which means you need to run your
scripts semi-manually. Pressing Ctrl+R or selecting Tools >
Run Command from the menu opens a small dialog, and into
this you need to type %(python) %F. All this is doing is
replacing %(python) with the name of the default Python
executable, as defined within the Preferences panel, and %F
with the full path to the script you're currently editing.
Running this command will attempt to execute the script,
printing any output into a new panel that appears below the
editor. If there are any errors, they'll also appear in the
Command Output panel and you can click on the errors to
force the editor to jump to their position within the file
you're editing.
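There's no magic in that run command. As a rough illustration of the idea (and not a description of how Komodo works internally), the sketch below does the equivalent from Python: it uses the current interpreter as a stand-in for %(python) and the path of a temporary script as a stand-in for %F, then collects the output and any errors.

```python
import os
import subprocess
import sys
import tempfile

# The script an editor would be holding open; written to a temporary file here.
source = "def square(x):\n    return x * x\n\nprint(square(10))\n"
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as script:
    script.write(source)

# Stand-ins for %(python) and %F: this interpreter and the script's full path.
result = subprocess.run(
    [sys.executable, script.name],
    capture_output=True,
    text=True,
)
os.remove(script.name)

print("Output:", result.stdout.strip())
print("Errors:", result.stderr.strip() or "none")
```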

Komodo includes many templates for starting a project, but if you find
yourself with the same setup each time, you can also create your own.

Syntax checking
You should also take a look at the Syntax Checking Status
page. This updates in real-time to show you any errors that
creep into your code as you're typing, such as an incorrect
indentation when you create a new function in Python. This is
also why you need to make sure the correct language is
pre-configured from the drop-down menu on the bottom-right
of the screen, as this is where Komodo loads all its
language-dependent intelligence from. This is set
automatically when you create a Python project. You should
also see that when you do create a function in the editor,
small square brackets on the left encase those sections of
code that are logically disconnected from the main flow of
execution. You can click on the small plus icons to fold these
sections away to make your code easier to read. Many IDEs,
and even editors, offer the same facility. In fact, the simple
process we've just stepped through explains how the majority
of IDEs work. They act mostly as an advanced editor sitting
on top of the tools that run the code or build the binaries.
Now we've executed the run command, we can add it
to Komodo's toolbox, and from there create a keybinding to
run the same command when we need to run our scripts.
From the Run dialog, enable the Add to Toolbox checkbox.
A panel on the right will appear, complete with the Python
command we've been using to execute our code. Right-click
this and select Properties. From here, you can rename the
command to something slightly friendlier and use the Key
Binding page to assign a keyboard shortcut to the function.
Back at the code face, and to explore Komodo Edit further,
start another function with def square_root(x):, and for
the code within the function, type math. What you'll then
see is the list of functions provided by the math module we
imported with the previous command. Select math.sqrt and
as soon as you add the first (, you'll see a small pop-up box
that informs you of exactly what the function is going to do,
complete with its expectations for passed variables. This is
what makes Komodo Edit so powerful for learning how to use
a language. It helps beginners to work on their projects
without having to refer constantly to the documentation.
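For reference, here is roughly what the finished square_root function might look like, followed by a tiny demonstration of the kind of mistake a syntax checker is looking for. We're using Python's built-in compile() function purely as a stand-in; how Komodo performs its own checks is not something we're claiming to reproduce.

```python
import math


def square_root(x):
    # math.sqrt expects a non-negative number and returns a float.
    return math.sqrt(x)


print(square_root(100))

# A function body that has lost its indentation, as an editor might flag it.
broken_source = "def square(x):\nreturn x * x\n"
try:
    compile(broken_source, "<example>", "exec")
except IndentationError as error:
    print("Syntax check failed:", error.msg)
```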

Part-time programmers
Some developers argue that code completion promotes
lazy programming because, they say, you never really learn
a language while you let an application complete function
names for you and highlight any mistakes. We think this is
partly true, but it doesn't take into account hobbyist or
part-time programmers. Professionals spend every day of their
working lives surrounded by code, so their best option is
always going to be to master the language they rely on for
a living. It will happen without them trying. But for those of us
who code only occasionally, when our schedules allow, tools
such as code completion and syntax highlighting can make
us more productive. And this is where IDEs can make
a massive difference.

Code concepts:
Write a program
Jonathan Roberts shows you how to re-implement classic Unix
tools to bolster your Python knowledge and build real programs.

In the next few pages, we're aiming to get you writing real
programs. Over the next few tutorials, we're going to
create a Python implementation of the popular Unix tool
cat. Like all Unix tools, cat is a great target because it's small
and focused on a single task, all the while using several
different operating system features, including accessing files,
pipes, and so on.
This means it won't take too long to complete, but at the
same time will expose you to a selection of Python's core
features in the Standard Library, and once you've mastered
the basics, it's learning the ins-and-outs of your chosen
language's libraries that will let you get on with real work.

Our goal for the project overall is to:
Create a Python program, cat.py, that when called with no
arguments accepts user input on the standard input pipe
until an end of line character is reached, at which point
it sends the output to standard out.
When called with file names as arguments, cat.py should
send each line of the files to standard output, displaying the
whole of the first file and then the whole of the second file.
It should accept two arguments: -E, which will make it put
$ signs at the end of each line, and -n, which will make it
put the current line number at the beginning of each line.
This time, we're going to create a cat clone that can work
with any number of files passed to it as arguments on the
command line. We're going to be using Python 3, so if you
want to follow along, make sure you're using the same
version, as some features are not backwards-compatible with
Python 2.x.

You now know more


than enough to start
writing real programs.

The final program we'll be implementing. It's not long, but it makes use of a lot
of core language features you'll be able to re-use time and again.

Python files
Let's start at the easiest part of the problem: displaying the
contents of a file, line by line, to standard out. In Python, you
access a file with the open function, which returns a file
object that you can later read from, or otherwise manipulate.
To capture this file object for use later in your program, you
need to assign the result of running the open function to
a variable, like so:
file = open("hello.txt", "r")
This creates a variable, file, that will later allow us to read
the contents of the file hello.txt. It will only allow us to read
from this file, not write to it, because we passed a second
argument to the open function, "r", which specified that the
file should be opened in read-only mode.
With access to the file now provided through the newly
created file object, the next task is to display its contents, line
by line, on standard output. This is very easy to achieve, since
in Python files are iterable objects.
Iterable objects, like lists, strings, tuples and dictionaries,
allow you to access their individual member elements one at
a time through a for loop. With a file, this means you can
access each line contained within simply by putting it in a for
loop, as follows:
for line in file:
    print(line)
The print function then causes whatever argument you
pass to it to be displayed on standard output.

If you put all this in a file, make it executable and create
a hello.txt file in the same directory, you'll see that it works
rather well. There is one oddity, however: there's an empty
line between each line of output.
The reason this happens is that print automatically adds
a newline character to the end of each line. Since there's
already a newline character at the end of each line in hello.txt
(there is, even if you can't see it, otherwise everything would
be on one line!), the second newline character leads to an
empty line.
You can fix this by calling print with a second, named
argument such as: print(line, end=""). This tells print to put
an empty string, or no character, at the end of each line
instead of a newline character.
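Putting those pieces together gives something like the sketch below. So that it runs anywhere, we've assumed a hello.txt created in a temporary folder rather than one sitting next to your script, and we've wrapped the loop in a function called show_file(), a name of our own choosing.

```python
import os
import tempfile


def show_file(path):
    """Print a text file line by line without doubling up the newlines."""
    file = open(path, "r")
    for line in file:
        # Each line already ends in "\n", so tell print not to add another.
        print(line, end="")
    file.close()


# Create a small hello.txt in a temporary folder so the sketch is self-contained.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "hello.txt")
with open(path, "w") as out:
    out.write("Hello\nWorld\n")

show_file(path)

# Tidy up the temporary file and folder afterwards.
os.remove(path)
os.rmdir(folder)
```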

Passing arguments
This is perfectly fine, but compared to the real cat command,
there's a glaring omission here: we would have to edit the
program code itself to change which file is being displayed
to standard out. What we need is some way to pass
arguments on the command line, so that we could call our
new program by typing cat.py hello.txt on the command
line. Since Python has 'batteries included', this is a fairly
straightforward task as well.
The Python interpreter automatically captures all
arguments passed on the command line, and a module
called sys, part of the Standard Library, makes this available
to your code.
Even though sys is part of the Standard Library, it's not
available to your code by default. Instead, you first have to
import it to your program and then access its contents with
dot notation (don't worry, we'll explain this in a moment).
First, to import it to your program, add:
import sys
to the top of your cat.py file.
The part of the sys module that we're interested in is the
argv object. This object stores all of the arguments passed on
the command line in a Python list, which means you can
access and manipulate it using various techniques we've
seen in past Code Concepts and will show in future ones.
There are only two things you really need to know about this.
They are:
The first element of the list is the name of the program
itself; all arguments follow this.
To access the list, you need to use dot notation. That
is to say, argv is stored within sys, so to access it, you type
sys.argv, or sys.argv[1] to get the first argument to
your program.
Knowing this, you should now be able to adjust the code
we created previously by replacing hello.txt with sys.argv[1].
When you call cat.py from the command line, you can then
pass the name of any text file, and it will work just the same.
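
As a rough sketch of how the list is laid out, the snippet below works on a made-up list shaped like sys.argv rather than the real thing, so that it runs the same way however you launch it. The helper name first_argument is hypothetical.

```python
def first_argument(argv):
    """Return the first real argument in an argv-style list, or None."""
    # argv[0] is always the program name, so arguments start at index 1.
    if len(argv) > 1:
        return argv[1]
    return None

# A made-up list, shaped like sys.argv after typing: cat.py hello.txt
example_argv = ["cat.py", "hello.txt"]

program_name = example_argv[0]
filename = first_argument(example_argv)
print(program_name, filename)
```

In your own cat.py you would pass sys.argv itself (after import sys) instead of example_argv.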

The output of the real Unix command, cat, and our Python re-implementation
are exactly the same in this simple example.

Many files
Of course, our program is meant to accept more than one file
and output all their contents to standard output, one after
another, but as things stand, our program can only accept
one file as an argument!
To fix this particular problem, you need to loop over all the
files in the argv list. The only thing you need to be careful of
when you do this is that you exclude the very first element,
since this is the name of the program itself. If you think
back to our previous article on data types and common list
operations, you'll realise that this is easily done with a slice.
This is just one line:
for file in sys.argv[1:]:
Because operating on all the files passed as arguments
to a program is such a common operation, Python
provides a shortcut for doing this in the Standard Library,
called fileinput.
To use this shortcut, you must first import it by putting
import fileinput at the top of your code. You will then be
able to use it to recreate the rest of our cat program so far,
as follows:
for line in fileinput.input():
    print(line, end="")
This shortcut function takes care of opening each file
in turn and then making all their lines accessible through a
single iterator.
That's about all that we have space for in this tutorial.
Although there's not been much code in this example, we
hope you've started to get a sense for how much is available
in Python's Standard Library (and therefore how much work
is available for you to recycle), and how a good knowledge of
its contents can save you a lot of work when implementing
new programs.
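
If you'd like to try the fileinput module without creating files by hand, here is a self-contained sketch. It writes two throwaway files (their names and contents are invented for the demonstration) and passes them in explicitly with the files= parameter; when called with no arguments, fileinput.input() reads the names from sys.argv[1:] instead.

```python
import fileinput
import os
import tempfile

def cat_files(paths):
    """Return the contents of all the given files, joined in order."""
    pieces = []
    # One iterator walks through every line of every file in turn.
    with fileinput.input(files=paths) as lines:
        for line in lines:
            pieces.append(line)
    return "".join(pieces)

# Build two small throwaway files to demonstrate on.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

combined = cat_files(paths)
print(combined, end="")
```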


Code concepts:
Add features
Jonathan Roberts' tour of the Python programming language
continues, as we write a clone of the Unix cat command.

Last time, we showed you how to build a simple cat
clone in Python. In this tutorial, we're going to add
some more features to our program, including the
ability to read from the standard input pipe, just like the real
cat, and the ability to pass options to your cat clone. So,
without further delay, let's dive in.
Fortunately, you already know everything you need to
interact with the standard input pipe. In Linux, all pipes are
treated just like files: you can pass a file as an argument to
a command, or you can pass a pipe as an argument. It
doesn't matter which you do, because they're basically the
same thing.
In Python, the same is true. All you need to get to work
with the standard input pipe is access to the sys library, which,
if you followed along last time, you already have. Let's write
a little sample program first to demonstrate:
import sys
for line in sys.stdin:
    print(line, end="")
The first line imports the sys module. The lines that follow
are almost identical to those we had last time. Rather than
specifying the name of a file, however, we specified the name
of the file-object, stdin, which is found inside the sys module.
Just like a real file, in Python the standard input pipe is an
iterable object, so we use a for loop to walk through each line.
You might be wondering how this works, however, since
standard input starts off empty. If you run the program,
you'll see what happens. Rather than printing out everything
that's present straight away, it will simply wait. Every time a
newline character is passed to standard input (by pressing
Return), it will then print everything that came before it to
standard output.
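
Because standard input behaves like any other file object, you can experiment with the same loop without a terminal. In this sketch, io.StringIO stands in for sys.stdin, and the echo function and its out parameter are our own additions for illustration.

```python
import io
import sys

def echo(stream, out=sys.stdout):
    """Copy every line from stream to out and return how many were copied."""
    count = 0
    for line in stream:              # a file object yields one line at a time
        print(line, end="", file=out)
        count += 1
    return count

# io.StringIO stands in for sys.stdin so the sketch runs without a terminal.
fake_stdin = io.StringIO("first line\nsecond line\n")
captured = io.StringIO()
lines_copied = echo(fake_stdin, out=captured)
print(captured.getvalue(), end="")
```

Calling echo(sys.stdin) instead would give you the interactive behaviour described above.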
Right, now we have two modes that our program can
operate in, but we need to put them together into a single
program. If we call our program with arguments, we want
it to work like last time, that is, by concatenating the files'
contents together; if it's called without any arguments, we
want our program to work by repeating each line entered into
standard input. We could easily do this with what we
know so far: simply check to see what the length of
the sys.argv list is. If it's greater than 1, do last
lesson's version, otherwise do this version:
if len(sys.argv) > 1:
    [last month...]
else:
    [this month...]
Pretty straightforward. The only point of interest here is
the use of the len() function, seeing as we're on a journey to
discover different Python functions. This function is built in to
Python, and can be applied to any type of sequence object
(a string, tuple or list) or a mapping (like a dictionary), and it
always tells you how many elements are in that object.
There are more useful functions like this, which you can
find at http://docs.python.org/3/library/functions.html.
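
Here is a small sketch of len() at work on each of those types, along with the mode check described above. The function name choose_mode and the example lists are hypothetical.

```python
def choose_mode(argv):
    """Return "files" when arguments were passed, otherwise "stdin"."""
    # argv always holds the program name, so 1 means "no arguments".
    if len(argv) > 1:
        return "files"
    return "stdin"

# len() works on strings, tuples, lists and dictionaries alike.
sizes = [len("cat"), len((1, 2)), len(["a", "b", "c"]), len({"x": 1})]

print(sizes)
print(choose_mode(["cat.py"]))
print(choose_mode(["cat.py", "hello.txt"]))
```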


The Python language comes with all the bells and whistles you need to write
useful programs. In this example, you can see the replace method applied to
a string in order to remove all white space; in the tutorial, we used the rstrip
method for a similar purpose.


Parsing arguments and options


This is quite a simplistic approach, however, and Python
provides us with a much more powerful alternative to
sys.argv. To demonstrate, we're going to add two options
to our program that will modify the output it generates.
You may not have realised it, but cat does in fact have
a range of options. We're going to implement -E, which
shows dollar symbols at the end of lines, and -n, which
displays line numbers at the beginning of lines.
To do this, we'll start by setting up an OptionParser.
This is a special object, provided as part of the optparse
module, that will do most of the hard work for you. As well as
automatically detecting options and arguments,
OptionParser will automatically generate
help text for your users in the event that they use your
program incorrectly or pass --help to it, like this:
[jon@LT04394 ~]$ ./cat.py --help
Usage: cat.py [OPTION]... [FILE]...
Options:
  -h, --help  show this help message and exit
  -E          Show $ at line endings
  -n          Show line numbers
just like a real program!

The Python 3 website provides excellent documentation for a wealth of
built-in functions and methods. If you ever wonder how to do something
in Python, docs.python.org/3/ should be your first port of call.

To get started with OptionParser, first import the
necessary components:
from optparse import OptionParser
You may notice that this looks a bit different to what we
saw before. Instead of importing the entire module, we're only
importing the OptionParser object. Next, you need to create
a new instance of the object, add some new options for it to
detect with the add_option method, and pass it a usage
string to display:
usage = "usage: %prog [option]... [file]..."
parser = OptionParser(usage=usage)
parser.add_option("-E", dest="showend", action="store_true",
                  help="Show $ at line endings")
parser.add_option("-n", dest="shownum", action="store_true",
                  help="Show line numbers")
The %prog part of the usage string will be replaced with
the name of your program. The dest argument specifies what
name you'll be able to use to access the value of an
argument once the parsing has been done, while the action
specifies what that value should be. In this case, the action
store_true says to set the dest variable to True if the
argument is present. If it isn't, the variable keeps its default
value, which is None (treated as false by Python) unless you
also pass default=False to add_option.
You can read about other actions at
http://docs.python.org/3/library/optparse.html.
Finally, with everything set, you just need to parse the
arguments that were passed to your program and assign the
results to two variables:
(options, args) = parser.parse_args()
The options variable will contain all user-defined options,
such as -E or -n, while args will contain all positional
arguments left over after parsing out the options. You can call
these variables whatever you like, but the two will always be
set in the same order, so don't confuse yourself by putting the
variables the other way around!
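
Putting those pieces together, a sketch of the parsing stage might look like the following. Two things here are our own choices rather than part of the tutorial: the build_parser helper, and the explicit list handed to parse_args, which stands in for a real command line so the example behaves the same however it's run. We've also passed default=False so that unset options come back as False.

```python
from optparse import OptionParser

def build_parser():
    """Create a parser that understands the -E and -n options."""
    usage = "usage: %prog [option]... [file]..."
    parser = OptionParser(usage=usage)
    parser.add_option("-E", dest="showend", action="store_true",
                      default=False, help="Show $ at line endings")
    parser.add_option("-n", dest="shownum", action="store_true",
                      default=False, help="Show line numbers")
    return parser

parser = build_parser()
# With no argument, parse_args() reads sys.argv[1:]; here we pass a list.
(options, args) = parser.parse_args(["-E", "hello.txt", "world.txt"])

print(options.showend, options.shownum, args)
```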

With the argument-parsing code written, you'll next want
to start implementing the code that will run when a particular
option is set. In both cases, we'll be modifying the string of
text that's output by the program, which means you'll need to
know a little bit about Python's built-in string editing
functions. Let's think about the -E, or showend, option first.
All we want this to do is take the invisible line break that's
at the end of every line in a file (or every line of the standard
input pipe, as implied by pressing Return), and replace it with a
dollar symbol followed by a line break.
The first part, removing the existing new line, can be
achieved by the string method rstrip(). By default, this removes
all white space characters at the right-hand edge of
a string. If you pass a string to it as an argument, it will strip
those characters from the right-hand edge instead of white
space. In our case, just white space will do.
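
A quick sketch shows both behaviours of rstrip(). The sample strings are invented for the demonstration.

```python
line = "Hello, world  \n"

# With no argument, all trailing white space goes: spaces and the newline.
stripped = line.rstrip()

# With an argument, only the characters you name are stripped.
only_newline = line.rstrip("\n")
no_dollars = "price$$$".rstrip("$")

print(repr(stripped), repr(only_newline), repr(no_dollars))
```

Notice that the default form also removes trailing spaces, which is fine for our purposes but worth knowing about.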

Completing the job


The second part of the job is as simple as setting the end
argument of the print function to the string "$\n", and the job
is almost complete. We say almost complete because we still
need to write some more logic to further control the flow of
the program based on what options were set, as well as
whether or not any arguments are passed. The thing is, this
logic needs to be a bit more complicated than it ordinarily
would be, because we need to
maintain a cumulative count of lines that have been printed
as the program runs to implement the second option, -n, or
shownum.
While there are several ways you could achieve this, in the
next tutorial we're going to introduce you to a bit of
object-oriented programming in Python and implement this
functionality in a class. We'll also introduce you to a very
important Python convention: the main() function and the
__name__ variable.
In the meantime, you can keep yourself busy by
investigating the str.format() method and see if you can
figure out how you can add a number to the beginning of
each line.


Code concepts:
Put it all together
Jonathan Roberts' guide to the Python programming language
continues. In this tutorial, we're going to finish our clone of cat.

We've come quite a long way over the last two
tutorials, having implemented the ability to echo
the contents of multiple files to the screen, the
ability to echo standard input to the screen and the ability
to detect and act upon options passed by the user of our
program. All that remains is for us to implement the line
number option and to gather together everything else
we've written into a single, working program.

Objects
Last time, we ended by saying that there are many ways we
could implement the line counting option in our program.
We're going to show you how to do it in an object-oriented
style, as it gives us an excuse to introduce you to this aspect
of Python programming. You could, however, with a bit of
careful thought, implement the same function with some
nested for loops, although they're not nearly as readable as
object-oriented code!
When building complicated programs, it can be challenging
to organise them so that they remain easy to read, so that it's
easy to track which variables are being used by which
functions, and so that they're easy to update, extend or add
new features to. To make this easier, there are
various paradigms that provide techniques for
managing complexity.
One of these paradigms is object-oriented programming.
In object-oriented programming, the elements of the program
are broken down into objects, which contain state (that is,
variables that describe the current condition of the object)
and methods that allow us to perform actions on those
variables or with that object.
It's a very natural way of thinking, because it mirrors the
real world so closely. I can describe a set of properties about
my hand, such as having five fingers that are in certain
locations, and I can describe certain methods or things I can
do with my hand, such as moving one finger to press a key, or
holding a cup. My hand is an object, complete with state and
methods that let me work with it.
We're going to turn our cat program into an object,
where its state records how many lines have been displayed,
and its methods perform the action of the cat program:
redisplaying file contents to the screen.

Just to prove that it works, here's our cat implementation, with all of the
options being put to use.

Python objects

Python implements objects through a class system. A class
is a template, and an object is a particular instance of that
class, modelled on the template. We define a new class with
a keyword, much like we define a new function:
class catCommand:
Inside the class, we specify the methods (functions) and
state that we want to associate with every instance of the
object. There are some special methods, however, that are
often used. One of these is the __init__ method. This is run when
the class is first instantiated into a particular object, and
allows you to set specific variables that you want to belong to
that object.
    def __init__(self):
        self.count = 1
In this case, we've assigned 1 to the count variable, and
we'll be using this to record how many lines have been
displayed. You probably noticed the self variable, passed as
the first argument to the method, and wondered what on
Earth that was about. Well, it is the main distinguishing
feature between methods and ordinary functions.
Methods, even those with no other arguments, must have
the self variable. It is an automatically populated variable
that will always point to the particular instance of the object
that you're working with. So self.count is a count variable
that's exclusive to individual instances of the
catCommand object.
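
To convince yourself that each instance really does get its own copy of the state, try a sketch like this one. The LineCounter class and its next_number method are hypothetical stand-ins, kept deliberately smaller than the real catCommand.

```python
class LineCounter:
    def __init__(self):
        # Runs once for each new instance; every instance gets its own count.
        self.count = 1

    def next_number(self):
        """Return the current line number, then move on to the next one."""
        number = self.count
        self.count += 1
        return number

a = LineCounter()
b = LineCounter()
a.next_number()
a.next_number()

# a has handed out two numbers; b is untouched.
print(a.count, b.count)
```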

The run method


We next need to write a method that will execute the
appropriate logic depending on whether certain options are
set. We've called this the run method:
    def run(self, i, options):
        # set default options
        e = ""
        for line in i:
            # modify printed line according to options
            if options.showend:
                [...last month]
            if options.shownum:
                line = "{0} {1}".format(self.count, line)
                self.count += 1
            print(line, end=e)
Notice that we've passed the self variable to this method,
too. The two other arguments passed to this function are
arguments that we'll pass when we call the method later
on, just like with a normal function. The first, i, is going to
be a reference to whichever file is being displayed at this
moment, while the options variable is a reference to the
options decoded by the optparse module.
The logic after that is fairly clear: for each line in the
current file, modify the line depending on what options
have been set. Either we do as in the last tutorial, and modify
the end character to be "$\n", or we modify the line, using
the .format method that we suggested you research last
time, to put the count variable, defined in the __init__
method, in front of the rest of the line. We then increment the
count and print the line.
The most important part is the use of self. It lets us refer
to variables stored within the current instance of the object.
Because the count is stored as part of the object, it will persist
after the current execution of the run method ends. As long as we
use the run method attached to the same object each time
we cat a new file in the argument list, the count will
remember how many lines were displayed in the last file, and
continue to count correctly.
It might seem more natural, given the description of
methods as individual actions that can be taken by our
objects, to split each option into a different method, and
this is a fine way to approach the problem.
The reason we've done it this way is we found it meant we
could re-use more code, making it more readable and less
error-prone.
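
Here is one way the whole class might look once the showend branch is filled in as described last time (rstrip the line, then end it with "$\n"). Treat it as a sketch under a few assumptions of ours: SimpleNamespace stands in for the options object that optparse would return, and io.StringIO objects stand in for open files.

```python
import io
from contextlib import redirect_stdout
from types import SimpleNamespace

class catCommand:
    def __init__(self):
        self.count = 1

    def run(self, i, options):
        e = ""                       # by default, keep each line's own newline
        for line in i:
            if options.showend:
                line = line.rstrip() # drop the newline...
                e = "$\n"            # ...and print "$" plus a newline instead
            if options.shownum:
                line = "{0} {1}".format(self.count, line)
                self.count += 1
            print(line, end=e)

# Stand-ins for real options and real files (assumptions for this demo).
options = SimpleNamespace(showend=True, shownum=True)
c = catCommand()
buffer = io.StringIO()
with redirect_stdout(buffer):
    c.run(io.StringIO("alpha\nbeta\n"), options)
    c.run(io.StringIO("gamma\n"), options)   # numbering carries on
output = buffer.getvalue()
print(output, end="")
```

Because both calls use the same object, the second "file" starts at line 3 rather than going back to 1.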
Now all that's left to do is to tie everything together.
We're going to do this by writing a main function. This isn't
required in Python, but many programs follow this idiom, so
we will too:
def main():
    [option parsing code ...]
    c = catCommand()
    if len(args) > 0:
        for a in args:
            f = open(a, "r")
            c.run(f, options)
    else:
        c.run(sys.stdin, options)

The completed program isn't very long, but it has given us a chance to
introduce you to many different aspects of the Python language.

We've not filled in the option parsing code from last
time, because that hasn't changed. What's new is the c =
catCommand() line. This is how we create an instance of a
class, that is, how we create a new object. The c object now
has a variable, count, that is accessible by all its methods
as the self.count variable. This is what will allow us to track
line numbers.
We then check to see whether any arguments have been
passed. If they have, we call the run method of the object
c for each file that was passed as an argument, passing in
any options extracted by optparse along the way. If there
weren't any arguments, we simply call the run method with
sys.stdin instead of a file object.
The last thing we need to do is actually call the main
function when the program is run:
if __name__ == "__main__":
    main()
These last two lines are the oddest of all, but quite useful
in a lot of circumstances. The __name__ variable is special: when
the program is run on the command line, or otherwise as
a standalone application, it is set to "__main__"; when it's
imported as an external module to other Python programs, it's
set to the module's name instead.
This way, we can automatically execute main when run
as a standalone program, but not when importing it as
a module.
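
A stripped-down sketch of the idiom looks like this. The message is invented; the point is only that main() runs by itself when the file is executed directly, and stays quiet when the file is imported.

```python
def main():
    message = "running as a standalone program"
    print(message)
    return message

# True when this file is run directly; False when it is imported,
# because __name__ is then the module's own name.
if __name__ == "__main__":
    main()
```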


Code concepts:
Data modules
Jonathan Roberts introduces a way of untangling the mess of
your code and adding structure to your programs.

In the last few Code Concepts tutorials, we've mentioned
how programming is all about managing complexity, and
we've introduced you to quite a few of the tools and
techniques that help programmers do this. From variables
to function definitions to object orientation, they all help.
One tool we've yet to cover, in part because you don't start
to come across it until you're writing larger programs, is the
idea of modules and namespaces.
Yet, if you've written a program of any length, even just
a few hundred lines, this is a tool that you're no doubt
desperate for. In a long file of code, you'll have noticed how
quickly it becomes more difficult to read it. Functions seem to
blur into one another, and when you're trying to hunt down
the cause of the latest error, you find it difficult to remember
exactly where you defined that all-important variable.
These problems are caused by a lack of structure. With
all your code in a single file, it's harder to determine the
dependencies between elements of your program (that is,
which parts rely on each other to get work done) and it's
harder to visualise the flow of data through your program.
As your programs grow in length, other
problems also occur. For instance, you may find yourself with
naming conflicts, as two different parts of your program
require functions called add (adding integers or adding
fractions in a mathematics program, for example), or you
may have written a useful function that you want to share
with other programs that you're writing, and the only tool you
have to do that is boring and error-prone copy and pasting.


Untangling the mess


Modules are a great way to solve all of these problems, letting
you put structure back into your code, enabling you to avoid
naming conflicts, and making it easier for you to share useful
chunks of code between programs.
You've no doubt been using them all the time in your code
as you've relied on Python built-in or third-party modules to
provide lots of extra functionality. As an example, remember
the optparse module we used on page 22.
We included it in our program with the import statement,
like so:
import optparse
After putting this line at the top of our Python program, we
magically got access to a whole load of other functions that
automatically parsed command line options. We could access
them by typing the module's name, followed by the name of
the function we wanted to execute:
optparse.OptionParser()
This was great from a readability perspective. In our cat
clone, we didn't have to wade through lots of code about
parsing command line arguments; instead we could focus on
the code that dealt with the logic of echoing file contents to
the screen and to the standard output pipe. What's more, we
didn't have to worry about using names in our own program
that might collide with those in the optparse module,
because they were all hidden inside the optparse
namespace, and reusing this code was as easy as typing
import optparse: no messy copy and pasting here.

How modules work


Modules sound fancy and you might think they're
complicated, but in Python at least they're really just plain
old files. You can try it out for yourself. Create a new directory
and inside it create a fact.py file. Inside it, define a function to
return the factorial of a given number:
def factorial(n):
    result = 1
    while n > 0:
        if n == 1:
            result *= 1
        else:
            result *= n
        n -= 1
    return result
Then, create a second Python file called doMath.py.
Inside this, first import the module you just created and
then execute the factorial function, printing the result to
the screen:
import fact
print(fact.factorial(5))
Now, when you run the doMath.py file, you should see
120 printed on the screen. You should notice that the name of
the module is just the name of the file, in the same directory,
with the extension removed. We can then call any function
defined in that module by typing the module's name, followed
by a dot, followed by the function name.
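
If you'd rather see the two-file experiment run from a single script, the sketch below writes a fact.py into a temporary directory and then imports it. The temporary directory and the sys.path line are conveniences of ours so that the example is self-contained; in the tutorial's version, both files simply sit side by side. The factorial loop is also a slightly condensed variant.

```python
import importlib
import os
import sys
import tempfile

# The text of our module: just an ordinary file containing a function.
module_source = '''
def factorial(n):
    result = 1
    while n > 1:
        result *= n
        n -= 1
    return result
'''

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "fact.py"), "w") as f:
    f.write(module_source)

# Tell Python where to look, then import the file by its name minus ".py".
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
fact = importlib.import_module("fact")

answer = fact.factorial(5)
print(answer)
```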

The Python path


By splitting your code up into smaller chunks, each placed in its own file
and directory, you can bring order to your projects and make future
maintenance easier.

The big question that's left is, how does Python know where
to look to find your modules?
The answer is that Python has a pre-defined set of
locations that it looks in to find files that match the name
specified in your import statements. It first looks inside all of
the built-in modules, the locations of which are defined when
you install Python; it then searches through a list of
directories known as the path.
This path is much like the Bash shell's $PATH
environment variable: it uses the same syntax, and serves
much the same function. It varies, however, in how the
contents of the Python path are generated. Initially,
the locations stored in the path consist of the following:
The directory containing the script doing the importing.
The directories listed in the PYTHONPATH environment variable, followed by
a set of directories predefined in your default installation.
You can inspect the path in your Python environment by
importing the sys module, and then inspecting the path
attribute (typing sys.path will do the trick).
Once a program has started, it can even modify the path
itself and add other locations to it.
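
Inspecting and extending the path takes only a few lines. In this sketch, /tmp/my_modules is a made-up directory name used purely for illustration; it doesn't need to exist for the example to run.

```python
import sys

# sys.path is an ordinary list of directory names, searched in order.
search_path = list(sys.path)
for directory in search_path:
    print(directory)

# A running program can add its own locations (hypothetical directory).
extra_dir = "/tmp/my_modules"
if extra_dir not in sys.path:
    sys.path.append(extra_dir)
added = extra_dir in sys.path

# Tidy up so the demonstration leaves the path as it found it.
sys.path.remove(extra_dir)
```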

Variable scope in modules


Before you head off and start merrily writing your own
modules, there's one more thing you need to know about:
variable scope.
We've no doubt talked about scope as a concept before,
but as a quick refresher, scope refers to the part of a program
from which particular variables can be accessed. For instance,
a single Python module might contain the following code:
food = ["apples", "oranges", "pears"]
print(food)
def show_choc():
    food = ["snickers", "kitkat", "dairy milk"]
    print(food)
show_choc()
print(food)
If you run that, you'll see that outside the function the
variable food refers to a list of fruit, while inside the function,
it refers to a list of chocolate bars. This small program
demonstrates two different scopes: the global scope of the
current module, in which food refers to a list of fruit, and the
local scope of the function, in which food refers to a list of
chocolate.
When looking up a variable, Python starts with the
innermost scope and works its way out, starting with the
immediately enclosing function, and then any functions
enclosing that, and then the module's global scope, and then
finally it will look at all the built-in names.
In simple, single-file programs, it's a bad idea to put
variables in the global scope. It can cause confusion and
subtle errors elsewhere in your program. Modules help with
this problem, because each module has its own global scope.
As we saw above, when we import a module, its contents are
all stored as attributes of the module's name, accessed via
dot notation. This makes global variables less troublesome,
although you should still be careful when using them.
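
The lookup order can be sketched by adding one more function to the example. The show_fruit function is our own addition: it has no local food variable, so Python falls back to the module's global one.

```python
food = ["apples", "oranges", "pears"]

def show_choc():
    # This assignment creates a new local variable; the global is untouched.
    food = ["snickers", "kitkat", "dairy milk"]
    return food

def show_fruit():
    # No local 'food' here, so the lookup moves out to the global scope.
    return food

local_view = show_choc()
global_view = show_fruit()
print(local_view)
print(global_view)
```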

Python style
While many people think of Python as
a modern language, it's actually been
around since the early 1990s. As with
any programming language that's been
around for any length of time, people
who use it often have learned a lot about
the best ways to do things in the
language, both in terms of the easiest way
to solve common problems, and the best
ways to format your code to make sure
it's readable for co-workers and anyone
else working on the code with you
(including your future self!).
If you're interested in finding out more
about these best practices in Python,
there are two very useful resources from
which you can start learning:
http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
www.python.org/dev/peps/pep-0008
Read these and you're sure to gain
some deeper insight into the language.


Code concepts:
Data storage
Jonathan Roberts explains how to deal with persistent data and
store your files in Python.

Storage is cheap: you can buy a 500GB external
hard drive for less than £40 these days, and even
smartphones come with at least 8GB of storage, and
many are easily expandable up to 64GB for only the price of
a pair of jeans.
It's no surprise, then, that almost every modern
application stores data in one way or another, whether that's
configuration data, cached data to speed up future use, saved
games, to-do lists or photos. The list goes on and on.
With this in mind, this Code Concepts is going to
demonstrate how to deal with persistent data in our language
of choice: Python.
The most obvious form of persistent storage that you can
take advantage of in Python is file storage. Support for it is
included in the standard library, and you don't even have to
import any modules to take advantage of it!
To open a file in the current working directory (that is,
wherever you ran the Python script from, or wherever you
were when you launched the interactive shell), use the open()
function:
file = open("lxf-test.txt", "w")
The first argument to open is the filename, while
the second specifies which mode the file should be opened in:
in this case, write, but other valid options include read-only
("r") and append ("a").
In previous issues, we've shown you that this file object is
in fact an iterator, which means you can use the in keyword to
loop through each line in the file and deal with its contents,
one line at a time. Before reviewing that information, however,
let's look at how to write data to a file.


Writing to files
Suppose you're writing your own RSS application to replace
Google Reader. You've already got some way to ask users to
enter a list of feeds (perhaps using input(), or perhaps
using a web form and CGI), but now you want to store that list
of feeds on disk so you can re-use it later when you're
checking for new updates. At the moment, the feeds are just
stored in a Python list:
feeds = ["http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml",
         "http://www.tuxradar.com/rss"]
To get the feeds into the file is a simple process. Just use
the write method:
for feed in feeds:
    file.write("{0}\n".format(feed))


Easy! Notice how we used the format string method to
add a new line to the end of each string; otherwise we'd end
up with everything on one line, which would have made it
harder to use later.
Re-using the contents of this file would be just as simple.
Using the file as an iterator, load each line in turn into a list,
stripping off the trailing newline character. We'll leave you to
figure this one out.
When using files in your Python code, there are two things
that you need to keep in mind. The first is that you need to
convert whatever you want to write to the file to a string first.
This is easy, though, since you can just use the built-in str()
function, eg, str(42) => "42".
The second is that you have to close the file after you've
finished using it. If you don't do this, you risk losing data that
you thought had been committed to disk, but that had not yet
been flushed. You can do this manually with the close method
of file objects. In our example, this would translate to adding
file.close() to our program. It's better, however, to use the
with keyword:
with open("lxf-test.txt", "r") as file:
    feeds = [line.rstrip("\n") for line in file]
This simple piece of Python handles opening the file
object and, when the block inside the with statement is
finished, automatically closes it for us, too! If you're unsure
what the second line does, look up Python list
comprehensions; they're a great way to write efficient,
concise code and to bring a little bit of functional style into
your work.
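
Here is the full round trip, writing the list out and reading it back, as a sketch you can run anywhere. The example.com addresses are placeholders of ours, and the file is created in a temporary directory rather than the current one so that nothing of yours gets overwritten.

```python
import os
import tempfile

# Placeholder feed addresses, invented for the demonstration.
feeds = ["http://example.com/news.rss", "http://example.com/linux.rss"]
path = os.path.join(tempfile.mkdtemp(), "lxf-test.txt")

# Write one feed per line; the file is closed when the block ends.
with open(path, "w") as file:
    for feed in feeds:
        file.write("{0}\n".format(feed))

# Read the lines back, stripping the trailing newline from each.
with open(path, "r") as file:
    loaded = [line.rstrip("\n") for line in file]

print(loaded)
```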

Serialising
Working with files would be much easier if you didn't have to
worry about converting your list (or dictionary, for that
matter) into a string first of all; for dictionaries in particular,
this could get messy. Fortunately, Python provides two tools
to make this easier.
The first of these tools is the pickle module. Pickle
accepts many different kinds of Python objects, and can then
convert them to a string of bytes and back again. You still
have to do the file opening and closing (in binary mode, hence
the "wb" and "rb" below), but you no longer
have to worry about figuring out an appropriate string
representation for your data:
import pickle
...
with open("lxf-test.txt", "wb") as file:
    pickle.dump(feeds, file)

with open("lxf-test.txt", "rb") as file:
    feeds = pickle.load(file)

If you're interested in persistent data in Python, a good next stopping point is the ZODB object database. It's much
easier and more natural in Python than a relational database engine (www.zodb.org).

This is much easier, and it has other applications outside


of persisting data in files, too. For example, if you wanted to
transfer your feed list across the network, you would first
have to make it in to a character string, too, which you could
do with pickle.
The problem with this is that it will only work in Python;
that is to say, other programming languages don't support
the pickle data format. If you like the concept of pickling
(more generically, serialising), there's another option that
does have support in other languages, too: JSON.
You may have heard of JSON. It stands for JavaScript
Object Notation, and is a way of converting objects into
human-readable string representations, which look almost
identical to objects found in the JavaScript programming
language. It's great, because it's human readable, and also
because it's widely supported in many different languages,
largely because it's become so popular with fancy Web
2.0 applications.
In Python, you use it in almost the same way as pickle: in
the above example, just replace pickle with json throughout
and open the files in text mode ("w" to write, "r" to read),
and you'll be writing interoperable, serialised code! Bear in
mind that JSON can only represent simple types such as strings,
numbers, lists and dictionaries.
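Here is a minimal sketch of the JSON version. The feed URLs and the temporary file location are our own assumptions, chosen only so the example can run anywhere without leaving files behind in your working directory:

```python
import json
import os
import tempfile

# Hypothetical feed list for illustration.
feeds = ["http://example.com/news.rss", "http://example.org/blog.rss"]

# Write to a throwaway directory rather than the current one.
path = os.path.join(tempfile.mkdtemp(), "lxf-test.json")

# JSON is text, so the file is opened in text mode.
with open(path, "w") as file:
    json.dump(feeds, file)

with open(path, "r") as file:
    loaded = json.load(file)

# dumps() shows the human-readable form that ends up in the file.
print(json.dumps(feeds))
print(loaded == feeds)
```

Open the resulting file in a text editor and you can read (and even edit) the data by hand, which is not something you'd want to try with a pickle file.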

Shelves
Of course, some code bases have many different objects that
you want to store persistently between runs, and keeping
track of many different pickled files can get tricky. There's
another Python standard module, however, that uses pickle
underneath, but makes access to the stored objects more
intuitive and convenient: the shelve module.
Essentially, a shelf is a persistent dictionary; that is to
say, a persistent way to store key-value pairs. The great thing
about shelves, however, is that the value can be any Python
object that pickle can serialise (the keys must be strings).
Let's take a look at how you can use it. Thinking back to our
RSS reader application, imagine that as well as the list of
feeds to check, you wanted to keep track of how many unread
items each feed had, and which item was the last to be read.
You might do this with a dictionary, eg,
tracker = {
    "bbc.co.uk": {"last-read": "foo", "num-unread": 10},
    "tuxradar.co.uk": {"last-read": "bar", "num-unread": 5},
}
You could then store the list of feeds and the tracking
details for each in a single file by using the shelve module,
like so:
import shelve
shelf = shelve.open("lxf-test")
shelf["feeds"] = feeds
shelf["tracker"] = tracker
shelf.close()
There are a few important things that you should be aware
of about the shelve module:
- The shelve module has its own operations for opening and
  closing files, so you can't just use the standard open function.
- To save some data to the shelf, you must first use a
  standard Python assignment operation to set the value of
  a particular key to the object you want to save.
- As with files, you must close the shelf object once you have
  finished with it, otherwise your changes may not be stored.
Accessing data inside the shelf is just as easy. Rather than
assigning a value to a key in the shelf dictionary, you assign
the value stored at a particular key to a variable:
feeds = shelf["feeds"]. If you want to modify the data that
was stored in the shelf, modify it in the temporary value you
assigned it to, then re-assign that temporary value back to
the shelf before closing it again.
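The read, modify and re-assign cycle can be sketched like this. The shelf lives in a temporary directory (our choice, to keep the example tidy), and the tracker data is the made-up example from earlier:

```python
import os
import shelve
import tempfile

# Keep the shelf in a throwaway directory for this example.
path = os.path.join(tempfile.mkdtemp(), "lxf-test")

tracker = {"bbc.co.uk": {"last-read": "foo", "num-unread": 10}}

# First run: store the tracker.
shelf = shelve.open(path)
shelf["tracker"] = tracker
shelf.close()

# Later run: copy the value out, change the copy, then assign it back.
shelf = shelve.open(path)
temp = shelf["tracker"]
temp["bbc.co.uk"]["num-unread"] = 0
shelf["tracker"] = temp
shelf.close()

# Check that the change really was stored.
shelf = shelve.open(path)
unread = shelf["tracker"]["bbc.co.uk"]["num-unread"]
shelf.close()
print(unread)
```

The re-assignment matters: by default, changing the dictionary you pulled out of the shelf doesn't touch the stored copy until you assign it back to its key.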
That's about all we have space for in this tutorial, but keep
reading, as we'll discuss one final option for persistent data:
relational databases (eg, MySQL).

Code concepts:
Data organisation
Jonathan Roberts uses SQL and a relational database to add
some structure to his extensive '70s rock collection.

In the last tutorial, we looked at how to make data
persistent in your Python programs. The techniques we
looked at were flat-file based, and as useful as they are,
they're not exactly industrial scale. As your applications grow
more ambitious, as performance becomes more important,
or as you try to express more complicated ideas and
relationships, you'll need to look towards other technologies,
such as an object database or even a relational database.
As relational databases are by far the most common
tool for asking complex questions about data today, in this
tutorial we're going to introduce you to the basics of relational
databases and the language used to work with them (which
is called SQL, or Structured Query Language). With the
basics mastered, you'll be able to start integrating relational
databases into your code.
To follow along, make sure you've got MySQL or one of its
drop-in replacements installed and can get access to the MySQL
console:
mysql -uroot
If you've set a password, use the -p switch to give that as well
as the username. Throughout, we'll be working on a small
database to track our music collection.

Relational databases
are used to ask complex
questions about data.

Relationships
Let's start by thinking about the information we want to store
in our music collection. A logical place to start might be
thinking about it in terms of the CDs that we own. Each CD
is a single album, and each album can be described by lots
of other information, or attributes, including the artist who
created the album and the tracks that are on it.
We could represent all of this data in one large,
homogeneous table like the one below, which is all well
and good, but very wasteful. For every track on the same
album, we have to duplicate all the information, such as the
album name, its running time, the year it was published, and
all the information about the artist, too, such as their name
and the year they split. As well as being wasteful with storage
space, this also makes the data slower to search, harder to
interpret and more dangerous to modify later.

Duplicated data
Album        | Artist | Track              | Track Time | Album Time | Year | Band Split
Free At Last | Free   | Little Bit of Love | 2:34       | 65:58      | 1972 | 1973
Free At Last | Free   | Travellin' Man     | 3:23       | 65:58      | 1972 | 1973

Relational database
Album Name   | Running Time | Year | Artist_id
Free At Last | 65:58        | 1972 | 1
Relational databases resolve these problems by letting us
split the data and store it in a more efficient, useful form. They
enable us to identify separate entities within the database
that would benefit from being stored in independent tables.
In our example, we might split information about the
album, artist and track into separate tables. We would then
only need to have a single entry for the artist Free (storing
the name and the year they split), a single entry for the
album Free At Last (storing its name, the year published
and the running time), and a single entry for each track in
the database (storing everything else) in each of their
respective tables.
All that duplication is gone, but now all the data has been
separated, what happens when you want to report all the
information about a single track, including the artist who
produced it and the album it appeared on? That's where the
'relational' part of relational database comes in.
Every row within a database table must in some way
be unique, either based on a single unique column (eg a
unique name for an artist, or a unique title for an album), or
a combination of columns (eg album title and year published).
These unique columns form what is known as a primary
key. Where a natural primary key (a natural set of unique
columns) doesn't exist within a table, you can easily
add an artificial one in the form of an automatically
incrementing integer ID.
We can then add an extra column to each of our tables
that references the primary key in another table. For example,
consider the Relational database table above. Here, rather than
giving all the information about the artist in the same table,
we've simply specified the unique ID for a row in another table,
probably called Artist. When we want to present this album to a
user, in conjunction with information about the artist who
published it, we can get the information first from this Album
table, and then retrieve the information about the artist, whose
ID is 1, from the Artist table, combining it together for
presentation.

SQL
That, in a nutshell, is what relational databases are all about:
splitting information into manageable, reusable chunks of
data, and describing the relationships between those chunks.
To create these tables within the database, to manage the
relationships, and to insert and query data, most relational
databases make use of SQL, and now that you know what
tables and relationships are, we can show you how to use SQL
to create and use your own.
After logging into the MySQL console, the first thing we
need to do is create a database. The database is the top-level
storage container for bits of related information, so we need
to create it before we can start storing or querying anything
else. To do this, you use the create database statement:
create database lxfmusic;
Notice the semi-colon at the end of the command: all
SQL statements must end with a semi-colon. Also notice that
we've used lower-case letters: SQL keywords are not case
sensitive, and you can issue your commands in whatever case you
like (although, depending on your system, MySQL may treat
database and table names as case sensitive).
With the database created, you now need to switch to it.
Much as you work within a current working directory on the
Linux console, in MySQL, many commands you issue are
relative to the currently selected database. You can switch
databases with the use command:
use lxfmusic;
Now to create some tables:
create table Album (
    Album_id int auto_increment primary key,
    name varchar(100)
);
create table Track (
    Track_id int auto_increment primary key,
    title varchar(100),
    running_time int,
    Album_id int
);
The most obvious things to note here are that we've
issued two commands, separated by semi-colons, and that
we've split each command over multiple lines. SQL doesn't
care about white space, so you can split your code up
however you like, as long as you put the right punctuation in
the correct places.
As for the command itself, notice how similar it is to the
create database statement. We specify the action we want
to take, the type of object we're operating on and then the
properties of that object. With the create database
statement, the only property was the name of the database;
with the create table statement, we've also got a whole load
of extra properties that come inside the parentheses and are
separated by commas.
These are known as column definitions, and each
comma-separated entry describes one column in the table. First,
we give the column a name, then we describe the type of data
that is stored in it (this is necessary in most databases), and
then after that we specify any additional properties of that
column, such as whether or not it is part of the primary key.
The auto_increment keyword means that you don't have
to worry about specifying the value of Track_id when inserting
data, as MySQL will ensure that this is an integer that gets
incremented for every row added to the table, thus forming a
primary key. You can find out more about the create table
statement in the MySQL documentation at
http://dev.mysql.com/doc/refman/5.5/en/create-table.html.

MariaDB is a drop-in replacement for the MySQL database, and is quickly
finding favour among distros including Mageia, OpenSUSE and even Slackware.

Inserts and queries


Inserting data into the newly created tables isn't any trickier:
insert into Album (name) values ('Free At Last');
Once again, we specify the action and the object on which
we're acting, we then specify the columns which we're
inserting into, and finally the values of the data to be put in.
Before we can insert an entry into the Track table,
however, we must discover what the ID of the album Free At
Last is, otherwise we won't be able to link the tables together
very easily. To do this, we use the select statement:
select * from Album where name = 'Free At Last';
This command says we want to select all columns from the
rows of the Album table whose name field is equal to 'Free At
Last'. Pretty self-explanatory really! If we'd only wanted to get
the ID field, we could have replaced the asterisk with Album_id
and it would have taken just that column.
Since that returned a 1 for me (it being the first entry in
the table), we can insert into the Track table as follows:
insert into Track (title, running_time, Album_id) values
('Little Bit of Love', 154, 1);
The big thing to note is that we specified the running time
in seconds and stored it as an integer. With most databases,
you must always specify a data type for your columns, and
sometimes this means you need to represent your data in a
different manner than in your application, and you'll need to
write some code to convert it for display. That said, MySQL
does have a wide range of data types, so many eventualities
are covered.
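The conversion code mentioned above needn't be complicated. Here's a small sketch in Python; the function names are our own invention, and it assumes running times under an hour written as minutes:seconds:

```python
def to_seconds(display):
    """Turn a 'minutes:seconds' string such as '2:34' into whole seconds."""
    minutes, seconds = display.split(":")
    return int(minutes) * 60 + int(seconds)


def to_display(total_seconds):
    """Turn whole seconds back into a 'minutes:seconds' string."""
    minutes, seconds = divmod(total_seconds, 60)
    # Pad the seconds to two digits so 185 becomes 3:05 rather than 3:5.
    return "{}:{:02d}".format(minutes, seconds)


print(to_seconds("2:34"))  # 154, the value stored in the Track table
print(to_display(154))     # 2:34
```

Storing the integer and formatting only for display also means the database can sort tracks by length or add up running times without any string handling.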
That's all we have space for this month, but don't let your
MySQL education stop there. Now you've seen the basics,
you'll want to investigate foreign keys and joins, two more
advanced techniques that will let you be far more expressive
with your SQL. You'll also want to investigate the different
types of relationship, such as one-to-one, one-to-many,
many-to-one and many-to-many.
Finally, if you want to integrate MySQL with your
programming language of choice, look out for an appropriate
module, such as the python-mysql module for Python.
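To give a flavour of what that integration looks like, here's a sketch that builds the same two tables from Python and uses a join to pull a track and its album back together. Note the assumptions: it uses the sqlite3 module from the standard library rather than MySQL, so that it runs without a server, and the column types are simplified to suit SQLite. A MySQL module would need connection details and may use a different placeholder style.

```python
import sqlite3

# An in-memory SQLite database stands in for the MySQL server here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# In SQLite, an "integer primary key" column is filled in automatically.
cur.execute("create table Album (Album_id integer primary key, name text)")
cur.execute("""create table Track (
    Track_id integer primary key,
    title text,
    running_time integer,
    Album_id integer references Album(Album_id))""")

# The ? placeholders keep the values separate from the SQL text.
cur.execute("insert into Album (name) values (?)", ("Free At Last",))
album_id = cur.lastrowid  # saves running a select just to find the new ID

cur.execute(
    "insert into Track (title, running_time, Album_id) values (?, ?, ?)",
    ("Little Bit of Love", 154, album_id))

# A join follows the Album_id reference from each track to its album.
cur.execute("""select Track.title, Track.running_time, Album.name
               from Track join Album on Track.Album_id = Album.Album_id""")
rows = cur.fetchall()
print(rows)
conn.close()
```

The join is what makes the split tables practical: one query returns the track alongside the album name, even though the name is only stored once.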

Code concepts:
Data encryption
Learn the principles behind encryption: Ben Everard unpacks
how it prevents snoopers from reading your data.

When writing a program, we usually assume that
the other applications are friendly. We don't
normally try to hide our data from them. In fact,
we normally try to write our files so that other programs can
read them. We use text encodings, XML and other standards
to make sure that our files play nicely with other software.
However, there are times when we want to keep our
information to ourselves. Perhaps we need to transmit it
across an insecure network, or put it on a USB key that
could be lost. Whatever the case, we need to make sure that
prying eyes can't see what it is, and for this we use
encryption.
Before we get started, we should say that the first rule of
data encryption is never try to create your own. Unless, that
is, you have a PhD in the appropriate area of mathematics and
plenty of experience.
There are a number of standards that are generally
considered unbreakable, and most languages have a good set
of encryption libraries that support these. These libraries will
be far more secure than any you can create yourself; just
make sure you read the documentation properly to avoid
using insecure options. Here we're going to break this rule,
but only to show how encryption works.
All encryption starts with data you want to hide. It then
applies some method of rendering that data unreadable.
Ideally, it should have some method of recovering the data
that only the original user can perform. Usually this is done
with a password (when the password is a string of binary
information rather than an alphanumeric word, it's referred
to as a key, but the basic principle is the same). There are
two different types of encryption: symmetric key and
asymmetric key. The former uses the same key to decrypt
information as it used to encrypt it, while the latter uses two
different keys.

The first rule of data
encryption is this: never
try to create your own.

Symmetric encryption
Here we're going to take a look at a simple (and fairly
insecure) method of symmetric encryption (also known as
private key encryption) for text that uses the XOR (exclusive
OR) function. XOR takes two binary digits as an input (which
can each be either 0 or 1). It outputs a 1 if exactly one of the
two inputs is a 1, and a 0 if neither or both are (see the XOR
truth table below).

XOR truth table
Input 1 | Input 2 | XOR
0       | 0       | 0
0       | 1       | 1
1       | 0       | 1
1       | 1       | 0

We can XOR strings of data by XORing each item in turn.
That simple XOR provides all we need for our encryption
method, which we'll get onto in a minute. First we'll look at the
information we're going to encrypt.
Text, like all computer data, is stored in binary as 1s and
0s. ASCII text encoding stores each character as a string of
eight bits of binary information. For example, 'B' is
01000010, 'e' is 01100101 and 'n' is 01101110. ASCII text, then,
is just a chain of these characters, each one eight bits long.
'Ben', therefore, is 010000100110010101101110.
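If you'd like to check those bit patterns for yourself, a couple of lines of Python will do it. The to_bits helper is our own name, not part of any library:

```python
def to_bits(text):
    """Return the ASCII encoding of text as a string of 1s and 0s."""
    # format(byte, "08b") writes one byte as exactly eight binary digits.
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


print(to_bits("B"))    # 01000010
print(to_bits("Ben"))  # 010000100110010101101110
```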
Now, back to our encryption method. We're going to use a
password that's just a single character (we told you it wasn't
going to be secure!). It can be any single character. And our
encryption method is to XOR each letter of our text with our
key (see the XORing characters box).
Asymmetric encryption
Symmetric encryption is great for securing
files, but it has some problems when securing
communication. For starters, you need a secure
way of sharing the keys with everyone. If you wanted
an encrypted communication with, for example,
Google, you'd somehow need to obtain a key, and
this key would need to be different for every person
Google communicated with, otherwise they'd be
able to eavesdrop.
To get around this, we have asymmetric
encryption (which is also known as public key
encryption). In this method there are two keys,
one public and the other private. Anything that is
encrypted with the public key can only be decrypted
with the private key and vice versa.
Now, for example, if you need an encrypted
communication with Google, you only need to
know Google's public key. Using this public key
encryption, you can exchange a single-use key for
a session of symmetric encryption.
For further info and an example of an asymmetric
implementation, see Neil Bothwick's tutorial at
techradar.com/news/internet/data-privacy-how-safe-is-your-data-in-the-cloud--1170332/1.

XORing characters
[Figure: a worked example with rows for the text, the key (A) and the resulting ciphertext.]
We can encrypt a stream by XORing each character in turn with our key.

That ciphertext is our unreadable text. Without knowing
the key, there's no way a program can read it, or is there?
We know the key (it's A), but how can we use this to get
the original text back? Actually, it's really simple. Back in
school you may have learned that (a + b) + c = a + (b + c).
That is, with addition at least, it doesn't matter which order
you do the addition; you always get the same result. Well, it
turns out that the same thing is true of XOR. But before we
can use that, there are two more things we need to know:
anything that is XORed with itself is 0, and anything XORed with
0 is unchanged.
So, key XOR key is 0 and text XOR 0 is text. That means
that text XOR (key XOR key) is text, and we've just learned
that this is the same as (text XOR key) XOR key. Since our
ciphertext is just text XOR key, we now know that ciphertext
XOR key is our original text. Basically, that's just a really long
way of saying that we can decrypt something exactly the
same way we encrypted it.
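Here's the whole scheme as a short Python sketch. The key is the A used above; the text 'Ben' is simply borrowed from the earlier ASCII example, and xor_bytes is our own name. Remember, this is for illustration only and is not something to protect real data with:

```python
def xor_bytes(data, key):
    """XOR every byte of data with a single-byte key."""
    return bytes(byte ^ key for byte in data)


key = ord("A")        # our one-character password, 01000001
plaintext = b"Ben"

ciphertext = xor_bytes(plaintext, key)
# Running exactly the same function again undoes the encryption.
recovered = xor_bytes(ciphertext, key)

print(ciphertext)
print(recovered)
```

Because encrypting and decrypting are the same operation, one function does both jobs, which is precisely what the algebra above promised.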

Statistical attacks
We now have our method of encryption and decryption, but
it's not very secure. First of all, there are only 256 possible
keys (if we include all eight-bit strings, and not just the ones
that have ASCII characters), so it's perfectly feasible for an
attacker to check every one in turn. However, it turns out they
don't have to.
In English text, some characters are very common, and
others are quite rare. For example, the space character
usually makes up 15 to 20% of all the characters in a piece of
text and the lower-case 'e' about 10%, while lower-case 'z' can
be as little as 0.02% and capital 'Z' almost never appears.
Since a given character in our text will always evaluate to
the same character in our ciphertext, these proportions will
carry across. For example, given the ciphertext:
00001001 00011110 00000011 01110001 00110010 00111000
00100001 00111001 00110100 00100011 00100010 01111101
01110001 00110000 00100010 01110001 00101000 00111110
00100100 01110001 00100010 00110100 00110100 01111101
01110001 00110000 00100011 00110100 01110001 00111111
00111110 00100101 01110001 00100010 00110100 00110010
00100100 00100011 00110100
We can see that the most common 8-bit string is 01110001.
We can take a guess that this is the encrypted form of the
space character.
We know that text XOR text is 0 and key XOR 0 is key.
Therefore we know that (text XOR key) XOR text is key. So, if
we XOR the most common character with space
(00100000), we get the key. In this case, it's 01010001, which
corresponds to the ASCII character 'Q'.
Using this, we can decrypt the whole text to:
XOR ciphers, as you see, are not secure
This is known as a statistical attack and is one of the most
common ways of attacking encryption. The simplest way to
prevent your data falling victim to it is to use an encryption
method that's well known and has been tested by the best minds
in the business. AES, Twofish, Threefish and Serpent are
all good choices. As we mentioned at the start, you should
find a library that implements one or more of them in your
language of choice.
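The statistical attack described above takes only a few lines of Python. This sketch makes the same guess we did, namely that the most common byte is an encrypted space; with a short message or an unusual text that guess could easily be wrong, in which case you'd move on to the next most common byte:

```python
from collections import Counter

# The ciphertext from the example, one 8-bit string per character.
CIPHERTEXT_BITS = """
00001001 00011110 00000011 01110001 00110010 00111000
00100001 00111001 00110100 00100011 00100010 01111101
01110001 00110000 00100010 01110001 00101000 00111110
00100100 01110001 00100010 00110100 00110100 01111101
01110001 00110000 00100011 00110100 01110001 00111111
00111110 00100101 01110001 00100010 00110100 00110010
00100100 00100011 00110100
"""

ciphertext = bytes(int(bits, 2) for bits in CIPHERTEXT_BITS.split())

# Find the byte value that turns up most often.
most_common_byte, count = Counter(ciphertext).most_common(1)[0]

# Guess that it is an encrypted space: (space XOR key) XOR space is key.
key = most_common_byte ^ ord(" ")

plaintext = bytes(byte ^ key for byte in ciphertext).decode("ascii")
print(chr(key))
print(plaintext)
```

No knowledge of the key was needed, only a count of the bytes, which is exactly why a cipher that maps each character to the same output every time is so weak.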
You may have noticed a slight flaw in this plan. How do you
find out a server's public key in the first place? After all, if an
attacker could trick you into using their public key, they could
read all the supposedly secure communications.
We get around this by using certificates. When you install a
web browser, it will come with public keys for a number of
trusted certificate authorities. A website can then get one of
these authorities to sign their public key. When you go to an
encrypted web page, the server sends you a certificate (which
contains both their public key, and the signature from the
authority) and the encryption method. Since you trust the
authority, you can now trust this certificate, and read the page
safe in the knowledge that it's been securely transmitted and
not tampered with.
This method isn't perfect. First, it requires you to trust a
range of authorities (you can see how many authorities you
trust in your web browser. For example, in Firefox, go to Edit >
Preferences > Advanced > View Certificates > Authorities).
It's quite a few, and most are probably companies you've
never heard of, much less trust. If a hacker or disgruntled
employee gets into the computer systems at any one of
them, they could intercept almost any encrypted web traffic
they wanted. But that's not the only way these certificates
can be subverted. If an attacker can find a way of forging the
digital signature, they can trick a computer into thinking a
communication comes from a particular source when it
doesn't. In fact, it was this method that allowed the Flame
malware to get into Iranian computers. Microsoft had used
the insecure MD5 (rather than the more secure SHA) hash,
and attackers were able to use this to make the computer
think something was signed when it wasn't. Be careful!

Code concepts:
Spot mistakes
Bug reports are useful, but you don't really want to cause too many.
Alex Cox explains what to avoid and how to avoid it.

It doesn't matter how much care you put into writing your
code. Even if you've had four cups of coffee and triple-check
every line you write, sooner or later you are going to
make a mistake. It might be as simple as a typo (a missing
bracket or the wrong number), or it could be as complex as
broken logic, memory problems or just inefficient code. Either
way, the results will always be the same: at some point, your
program won't do what you wanted it to. This might mean it
crashes and dumps the user back to the command line. But it
could also mean a subtle rounding error in your tax returns
that prompts the Inland Revenue to send you a tax bill for
millions of pounds, forcing you to sell your home and declare
yourself bankrupt.

Finding the problems

The IDLE Python IDE has a debug mode that can show how your variables change over time.

How quickly your mistakes are detected and rectified is
dependent on how complex the problem is, and your skills in
the delicate art of troubleshooting. For instance, even though
our examples of code from previous tutorials stretch to no
more than 10 lines, you've probably needed to debug them as
you've transferred them from these pages to the Python
interpreter. When your applications grow more complex than
just a few lines or functions, you can spend more time hunting
down problems than you do coding. Which is why, before you
worry about debugging, you should follow a few simple rules
while writing your code.
The first is that, while you can't always plan what you're
going to write or how you're going to solve a specific problem,
you should always go back and clean up whatever code you
end up with. This is because it's likely you'll have used
now-redundant variables and bolted on functionality in illogical
places. Going back and cleaning up these areas makes the
code easier to maintain and easier to understand. And
making your project as easy to understand as possible
becomes important as it starts to grow, and you seldom
revisit these old bits of code.
Whenever you write a decent chunk of functionality, the
second thing you should do is add a few comments to
describe what it does and how it does it. Comments are
simple text descriptions about what your code is doing,
usually including any inputs and expected output. They're not
interpreted by the language or the compiler, so they don't
affect how your code works; they are there purely to help
other developers and users understand what a piece of code
does. But, more importantly, they are there to remind you of
what your own code does.
This might sound strange, but no matter how clear your
insight might have been when you wrote it, give it a few days,
weeks or months, and it may as well have been written by
someone else for all the sense it now makes. And as a
programmer, one of the most frustrating things you have to
do is solve a difficult problem twice: once when you create
the code, and again when you want to modify it but don't
understand how it works. A line or two of simple description
can save you days of trying to work out what a function
calculates and how it works, or may even obviate the need for
you to understand anything about what a piece of code does,
as you need to know only the inputs and outputs.
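As an illustration, here's the sort of brief comment we have in mind, on a made-up helper function. Three lines are enough to tell a reader what goes in and what comes out without them having to read the body:

```python
# Return the mean (average) of a list of numbers.
# Input:  values, a non-empty list of ints or floats
# Output: a float; raises ValueError if the list is empty
def mean(values):
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


print(mean([2, 4, 9]))
```

Note that the comment also records the awkward case (an empty list), which is exactly the kind of detail you'll have forgotten in six months' time.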

The importance of documentation


This is exactly how external libraries and APIs work. When you
install Qt, for instance, you're not expected to understand
how a specific function works. You need only to study the
documentation of the interface and how to use it within the
context of your own code. Everything a programmer needs to
know should be included in the documentation. If you want to
use Qt's excellent sorting algorithms, for example, you don't
have to know how it manages to be so efficient; you need to
know only what to send to the function and how to get the
results back.
You should model your own comments on the same idea,
both because it makes documentation easier, and because
self-contained code functionality is easier to test and forget
about. But we don't mean you need to write a book. Keep
your words as brief as they need to be; sometimes that
might mean a single line. How you add comments to code is
dependent on the language you're using. In Python, for
example, comments are marked by the # symbol. Everything
that comes after this symbol on the same line will be ignored
by the interpreter, and if you're using an editor with syntax
highlighting, the comment will also be coloured differently to
make it more obvious. Detail in a comment is useful, but adding
comments to code can be tedious when you just want
to get on with programming, so make them as brief as you
can without stopping your flow. If necessary, you can go back
and flesh out your thoughts when you don't feel like writing
code (usually the day before a public release).
When you start to code, you'll introduce many errors without
realising it. To begin with, for example, you won't know what is
and isn't a keyword: a word used by your chosen language to do
something important. Each language is different, but Python's
list of keywords is quite manageable, and includes common
language words such as and, if, else, import, class and
break, as well as less obvious words such as yield, lambda,
raise and assert. This is why it's often a good idea to create
your own variable names out of composite parts, rather than
go with real words. If you're using an IDE, there's a good
chance that its syntax highlighting will stop you from using a
protected keyword.
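If you're not sure whether a name is reserved, Python can tell you. This short sketch uses the standard keyword module; the candidate names are just examples we picked:

```python
import keyword

# Some names you might be tempted to use for variables.
candidates = ["class", "yield", "lambda", "feed_list", "unread_count"]

for name in candidates:
    # iskeyword() is True for words Python reserves for itself.
    print(name, keyword.iskeyword(name))

# The names from our list that would be safe to use as variables.
safe_names = [name for name in candidates if not keyword.iskeyword(name)]
print(safe_names)
```

Composite names such as feed_list pass the test, which is one reason they make safer variable names than single real words.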

Undeclared values
A related problem that doesn't affect Python is using
undeclared values. This happens in C or C++, for instance, if
you use a variable without first saying what type it's going to
be, such as int x to declare x an integer. It's only after doing
this that you can use the variable in your own code. This is a
big difference between statically typed languages such as C and
C++ and dynamically typed ones such as Python.
However, in both kinds of language, you can't assume a default
value for an uninitialised variable. Typing print(x) in Python,
for instance, will result in an error, but not if you precede the
line with x = 1. This is because the interpreter knows about a
variable only after you've assigned it a value. C and C++ can be
even more random: using an uninitialised variable doesn't
necessarily generate an error, but the value it holds is
unpredictable until you've assigned it a value.
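You can watch Python's behaviour for yourself with this small sketch. The variable name is deliberately odd so that nothing else will have defined it already:

```python
# Try to use a name before anything has been assigned to it.
try:
    print(lxf_unassigned_total)
    outcome = "no error"
except NameError as error:
    # Python refuses to guess at a value and raises NameError instead.
    outcome = "NameError: {}".format(error)
print(outcome)

# Once a value has been assigned, the same line works without complaint.
lxf_unassigned_total = 1
print(lxf_unassigned_total)
```

An immediate error like this is far easier to track down than a C variable that quietly holds whatever happened to be in memory.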
Typos are also common, especially in conditional
statements, where they can go undetected because they are
syntactically correct. Watch out for using a single equals sign
to check for equality, for example (although Python is pretty
good at catching these problems). Another type of problem
Python is good at avoiding is inaccurate indenting. This is
where conditions and functions use code hierarchy to split
the code into parts. Python enforces this by refusing to run
the script if you get it wrong, but other languages try to make
sense of the code hierarchy, and sometimes a misplaced bracket
is all that's needed to create unpredictable results. However,
this can make Python trickier to learn. Initially, if you don't
know about its strict indentation requirements, or that it needs a
colon at the end of compound statement headers, the errors
created don't make sense. You also need to be careful about
case sensitivity, especially with keywords and your own
variable names.
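To see how Python catches these slips before a single line runs, you can ask it to compile a few snippets. This is only a sketch: the compiles helper and the sample snippets are our own, and the built-in compile function checks the syntax without executing anything:

```python
def compiles(source):
    """Return True if Python accepts source as valid syntax."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        # IndentationError is a kind of SyntaxError, so it lands here too.
        return False


# A single equals sign in a condition is rejected outright.
print(compiles("if total = 10:\n    print('ten')\n"))
# The double equals version is fine.
print(compiles("if total == 10:\n    print('ten')\n"))
# Forgetting to indent the body of the if is also rejected.
print(compiles("if total == 10:\nprint('ten')\n"))
# As is leaving off the colon.
print(compiles("if total == 10\n    print('ten')\n"))
```

In C, by contrast, the single-equals version of that condition would compile and quietly assign 10 to the variable.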
When you've got something that works, you need to test
it, not just with the kind of values your application might
expect, but with anything that can be input. Your code should
fail gracefully, rather than randomly. And when you've got
something ready to release, give it to other people to test.
They'll have a different approach, and will be happier to break
your code in ways you couldn't imagine. Only then will your
code be ready for the wild frontier of the internet, and you'd
better wear your flameproof jacket for that release.
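Here's one way that 'fail gracefully' might look in practice. The function is hypothetical (it isn't from any earlier tutorial), and returning None for bad input is just one possible policy; what matters is that every input gets a predictable answer:

```python
def parse_unread_count(raw):
    """Return raw as a non-negative int, or None if it can't be used."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Covers None, empty strings, words and anything else int() rejects.
        return None
    # A negative count of unread items makes no sense, so reject that too.
    return value if value >= 0 else None


# Test with the values we expect and plenty that we don't.
for candidate in ["10", "0", "  7 ", "-3", "ten", "", None]:
    print(repr(candidate), "->", parse_unread_count(candidate))
```

A list of awkward inputs like this one is worth keeping around: it's the start of a test suite you can re-run every time you change the code.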

You have to be careful in Python that the colons and indentation are in the correct
place, or your script won't run. But this does stop a lot of runtime errors.

Comment syntax
Different languages mark comments differently, and there seems to be
little consensus on what a comment should look like. However, there are
a couple of rules. Most languages offer both inline and block comments,
for example. Inline comments are usually for a single line, or a comment
after a piece of code on the same line, and they're initiated by using a
couple of characters before the comment. Block comments are used to wrap
pieces of text (or code you don't want interpreted/compiled), and usually
have different start and end characters.

Bash
# A hash is used for comments in many scripting languages. When # is followed by a ! it becomes a shebang (#!) and is used to
# tell the system which interpreter to use, for example: #!/usr/bin/bash

BASIC
REM For many of us, this is the first comment syntax we learn

C
/* This kind of comment in C can be used to make a block of text span many lines */

C++
// Whereas this kind of comment is used after the code or for just a single line

HTML
<!-- Though not a programming language, we've included this because you're likely to have already seen the syntax, and
therefore comments, in action -->

Java
/** Similar to C, because it can span lines, but with an extra * at the beginning */

Perl
=head1 Overview
As well as the hash, in Perl you can also use something called Plain Old Documentation. It has a specific format, but it does
force you to explain your code more thoroughly
=cut

Python
"""As well as the hash, Python users can denote blocks of comments using a source code literal called a docstring, which is
a convoluted way of saying enclose your text in blocks of triple quotes, like this"""


Ruby

Ruby
I

f you want to be a hip, cool web developer,


you had better learn Ruby on Rails. This web
development framework takes the grunt work
out of building scalable web apps, leaving you to do
the problem-solving without having to reinvent the
wheel every time you start a new project.

Ruby: Master the basics
Ruby: Add a little more polish
Ruby: Modules, blocks and gems
Ruby on Rails: Web development
Ruby on Rails: Code testing
Ruby on Rails: Site optimisation


Ruby: Master the basics
Juliet Kemp introduces the ins and outs of the Ruby
programming language: enough to write your first program.

Quick tip
Indentation doesn't matter from a code point of view, but the Ruby community prefers two-character indentation.

Ruby on Rails is the current web stack framework,
popping up on open source projects all over the
web. The Rails part is the web stack; the language
underlying it is Ruby. It's flexible, highly object-oriented, and
quick to develop in. Ruby itself is growing rapidly in popularity
alongside Rails, and it's very easy to jump in and get started.
To install Ruby, see the boxout. Once you've installed it, to get
an idea of how it works just type irb at the command line.
This fires up the Interactive Ruby Shell, which allows you to
type in code and get its value back immediately. Try it out:
:001 > 3*5
=> 15
:003 > print 'Hello!'
Hello! => nil
:004 > puts 'Is there anyone there?'
Is there anyone there?
=> nil
Here, both print and puts ('put string': print a string to
standard output) are methods. You could also put the
parameter in brackets if you prefer, eg, puts('Is there
anyone there?'). In Ruby, brackets are often optional, so it's
up to you (or the project you're working on) to decide what
your preferred coding style is (the Ruby community norms
tend to be minimal and leave brackets out unless needed for
clarity). In both cases, the return value of the method is nil,
whereas the return value of 3*5 is 15.
You can even write a method in IRB. Try this out:
> def hithere
?> return 'Hello!'
?> end
=> nil
> hithere
=> "Hello!"
Here we define a method that returns a string, then call it, and
the return value is, as expected, our string. Note that IRB is
smart enough to recognise that the method isn't finished
until the end line and doesn't return anything until then.
(From Ruby 2.1 onwards, def itself returns the method name as a
symbol, so you'll see => :hithere rather than => nil.)
However, it's not necessary to use return to get a return value
from a Ruby method. Try this:
> def hithere2
?> 'Hello; no return'
?> end
=> nil
> hithere2
=> "Hello; no return"
Ruby methods will automatically return the value of
the last expression evaluated in the method. So you need return only if you
have multiple possible return values, or to improve code

Note the different return values with 7 (treated as integer) and 7.0 (treated as floating point).

Install Ruby
RVM (the Ruby Version Manager) is the easiest way to install Ruby. Among other things, it allows you to install and use multiple versions of Ruby on one machine, which may come in handy later on in your Ruby experience. If you have Git installed, all you need is:
\curl -L https://get.rvm.io | bash -s stable --ruby
(yes, the backslash is correct). This will download and install RVM, Ruby, and any other basic necessities, and you'll be prompted to do anything else you need to (as a rule, this should just be to source the RVM script). For more information, see the RVM website: https://rvm.io/rvm/install, or for other ways of installing Ruby try www.ruby-lang.org/en/downloads.

Halfway through installing RVM.

clarity with more complex code. To run code as a file rather
than in IRB, just put the commands in a file with the extension
.rb, and run it with ruby myfile.rb. Alternatively, you can add
a shebang line at the top, make the file executable, and call it
anything you like:

#!/usr/bin/ruby -w
puts 'I can run Ruby!'
Note the -w flag, which turns on warnings; this is good
practice. You can run this file (once executable) with
./myfile.rb.
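The return-value rules described above are easy to check in a script as well as in IRB. The following is a minimal sketch; the method names greeting and describe are invented for illustration and are not part of the tutorial's code.

```ruby
# Implicit return: the value of the last expression evaluated
# becomes the method's return value.
def greeting
  'Hello; no return'
end

# Explicit return is useful when a method can finish early
# with one of several possible values.
def describe(number)
  return 'zero' if number == 0
  return 'negative' if number < 0
  'positive'
end

puts greeting
puts describe(0)
puts describe(-7)
puts describe(7.0)
```

Save it as, say, returns.rb and run it with ruby returns.rb; only the early exits in describe need the return keyword.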

Write your first Ruby program

We're going to write a little Ruby program that acts as a basic
notebook. By the end of the tutorial, notebook.rb will show
your current notes, and allow you to add another one on the
end. See www.linuxformat.com/files/ca2015.zip for full
code details; the initial version is called notebook_v1.rb.
First, let's look at writing to a file. Input and output in Ruby
are handled by the IO class, and the File class is a subclass of
that. Create a notebook.rb file that looks like this:
#!/usr/bin/ruby -w
nbk = File.open('notebook.txt', 'w')
nbk.puts 'My first note'
nbk.close
As you've almost certainly heard, in Ruby everything is an
object. This includes things like numbers, which other OO
languages (eg Java) often treat as primitive types. In Ruby,
absolutely everything can have a method or an instance
variable associated with it.
Among other things, this means that the standard way of
making something happen looks like thing.method. Here, we
use the File class methods to open a new file to write. Note
that unlike in (for example) Java, you don't have to explicitly
use new() to create a new object of a particular type. nbk is
automatically set up as a File object; to test this, you can run
that line in irb then type nbk.class.name to return "File".
We can then call .puts 'string' on it, and close it.
Set the file's execute bit, run it with ./notebook.rb, then
take a look at notebook.txt.
There are a couple of alternative methods you could use
to write to the file; both nbk.write "My String\n", and nbk
<< "My String\n" will work. However, with both of these
you need to explicitly add the newline, which puts will
automatically add.
As it stands, this method will clobber any data that already
exists in the file every time you run it. To add more data on
the end of the file, you need to use append mode instead of
write mode, using File.open('notebook.txt', 'a'). Make that
change now, so your notes file will get steadily bigger.
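To see the difference between write mode and append mode without touching your real notebook.txt, here is a small sketch that works in a temporary directory. The write_note helper is a made-up name for this example; File.open, puts and close are used exactly as in the tutorial.

```ruby
require 'tmpdir'

# Hypothetical helper: open path in the given mode, write one note, close.
def write_note(path, mode, note)
  nbk = File.open(path, mode)
  nbk.puts note
  nbk.close
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'notebook.txt')

  # Write mode: each open truncates the file, so only the last note survives.
  write_note(path, 'w', 'My first note')
  write_note(path, 'w', 'My second note')
  puts "After two 'w' writes: #{File.readlines(path).length} line(s)"

  # Append mode: existing contents are kept and the note goes on the end.
  write_note(path, 'a', 'My third note')
  puts "After one 'a' write: #{File.readlines(path).length} line(s)"
end
```

Dir.mktmpdir removes the directory again when its block finishes, so nothing is left behind.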
What about reading the data back? Add these lines at the
end of notebook.rb:
nbk_read = File.open('notebook.txt')
while line = nbk_read.gets do
  puts line
end
nbk_read.close
We open the file again (you might want to reorder the lines
to avoid opening it twice), then start a while block. In Ruby,
the syntax for this is while CONDITION do on a single line,
followed by the block to perform on each repetition, then end
to mark the end of the block. Here, the while block repeats
for as long as gets returns a line from the file (myfile.gets
takes a single line at a time from myfile). All we do in the
block is to output that line with puts, to standard out
(the console).
To create a new note, we want to be able to get user input.
The most basic way to do this is with a single string.
Comment out the write to file section of your notebook.rb
file, and edit it to look like this:
nbk = File.open('notebook.txt', 'a+')
while line = nbk.gets do
  puts line
end
puts 'Enter a new note'
note = gets
nbk.puts note
nbk.close
We open the file with the parameter a+; this means to
open for reading and appending. So we'll read and output the
existing notes, then ask the user for a new one, using puts to
output the query string to the console, and gets to get a
string from the console (standard in). We then use nbk.puts
to write the string (our new note) to the file.
Run this, and you may notice that while you do
get asked for a new note, you don't see the old ones. But if
you add the note then look at notebook.txt, all your old notes
and your new one are still there. What's happened? The
answer is that, depending on your Ruby version and platform,
a+ can position the IO stream at the
end of the file, ready to append. To be sure of reading back all
the old notes, add this line before the while loop:
nbk.rewind
This repositions the IO stream at the start of the file. Note
that if you do a writing operation after this, the stream will
automatically move to the end of the file, so you won't clobber
anything. Using gets and puts like this automatically does
the Right Thing with newlines; each added note is put on its
own line. In other circumstances, you might want to lose the
extraneous newline from your new note, which you can do by
using note = gets.chomp. You can also make the code even
neater by reducing that last section to a mere three lines:
puts 'Enter a new note'
nbk.puts gets
nbk.close
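The read-then-append pattern can be tried out safely with the sketch below. It assumes a temporary file instead of notebook.txt and takes the new note as a parameter rather than from the console; read_then_append is a hypothetical helper name. The rewind call is harmless if the stream already starts at the top of the file.

```ruby
require 'tmpdir'

# Hypothetical helper: collect the existing notes, then append a new one.
# Returns the notes that were already in the file.
def read_then_append(path, note)
  existing = []
  nbk = File.open(path, 'a+')   # creates the file if it is missing
  nbk.rewind                    # read from the top, wherever the stream began
  while line = nbk.gets do
    existing << line.chomp
  end
  nbk.puts note                 # append mode: the write goes on the end
  nbk.close
  existing
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'notebook.txt')
  read_then_append(path, 'First note')
  old_notes = read_then_append(path, 'Second note')
  puts "Notes before the second run: #{old_notes.inspect}"
  puts "Lines in the file now: #{File.readlines(path).length}"
end
```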

Quick tip
Ruby treats both semicolons and newlines as the end of a statement. An operator (+, -, \, etc) at the end of a line indicates a continuation. Other whitespace is usually ignored; use the -w switch to flag up the rare occasions where it is used to interpret ambiguous statements.

Running version one of the code a couple of times; including fixing an error
where I left an old line hanging around at the bottom of the file.

Quick tip
When looking for a method, Ruby will try the named class first, then its parent, up the inheritance chain. Here, there's only one ancestor: the basic Object class (check this with Note.superclass). You can explicitly call the ancestor of a method you're overriding with super.

Running the code on the notes generated by the first version has errors, as there are no titles; once that's fixed, it runs fine. The second set of edits allow me to input a new note.

Create a class

So far, this has been structured much as a procedural
program: do one thing, then do the next thing. If you want
to write a more complicated program, you'll want to create
your own classes. We'll start again with a blank file; the code
for this version is in notebook_v2.rb in the ZIP file at
www.linuxformat.com/files/ca2015.zip. This time, we'll start by
defining a Note class. Each Note object has a title and a body:
class Note
  def initialize(title, body)
    @title = title
    @body = body
  end
end
initialize is automatically called when you create a new
object by calling Class.new, so you can set up your object's
initial state. Here, Note.initialize takes two parameters,
title and body. By Ruby convention, local variables (and
parameters, like these, that act like them) start with a lowercase letter, while classes start with a capital letter. Each Note
object will have its own title and body, so each object will have
instance variables for title and body. Instance variables in
Ruby always begin with @. Here, we have @title and @body.
To test this, add these lines under the class definition:
myNote = Note.new('Note 1', 'this is a note')
puts myNote.inspect
This creates a new Note object with the given title and
body, then uses the inspect method to take a look at the
object. Run this, and you should get this output:
#<Note:0x10e092978 @body="this is a note", @title="Note 1">
It looks like it's done the right thing, but the formatting
isn't great. Objects in Ruby have a standard method, to_s,
which will output the object as a string. However, if we try
puts myNote.to_s, we'll just get the output
#<Note:0x10e092978>, the object ID, which isn't that
useful to us. To solve this problem, we can override the to_s
method for our Note class. Add this method definition inside
the Note class definition, after the initialize method:
class Note
  def to_s
    "Note: #{@title}, #{@body}"
  end
end
myNote = Note.new('Note 1', 'this is a note')
puts myNote.to_s
Note that here we're using double quotes ("") rather than single quotes (''); this is
because "" allows variable interpolation, while '' doesn't, and
here we want to use interpolation. We interpolate the instance
variables directly with #{@title}; the shorthand #@title also works.
Once the class has reader methods for title and body (added below),
you can write #{title} instead, which seems to be the preference in
the Ruby community. Now if we run the program, we'll get
the more useful output Note: Note 1, this is a note.
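Here is a small self-contained sketch of the to_s override, along with the difference between the two kinds of quotes. The local variable title in the last two lines is only there to illustrate interpolation.

```ruby
class Note
  def initialize(title, body)
    @title = title
    @body = body
  end

  # Override the default to_s so the note prints in a readable form.
  def to_s
    "Note: #{@title}, #{@body}"
  end
end

note = Note.new('Note 1', 'this is a note')
puts note           # puts calls to_s on the object for us
puts note.inspect   # inspect still shows the class and instance variables

title = 'Note 1'
puts "Title: #{title}"   # double quotes: the variable is interpolated
puts 'Title: #{title}'   # single quotes: the text is printed literally
```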
Next, what if we want to access just the title of the note,
or just the body? The instance variables title and body are
private to their specific object; no other object can access
them. This is useful in that it avoids objects changing other
objects accidentally. But it does mean you need to do
something explicit if you want to be able to access them. We
could write a couple of methods to do that, by adding this in
the class definition:
class Note
  def title
    @title
  end
  def body
    @body
  end
end
myNote = Note.new('Note 1', 'this is a note')
puts myNote.title
puts myNote.body
This will output the title and the body. However, because
this is such a common thing to want to do, Ruby provides you
with a shortcut method, attr_reader. Replace the title and
body methods we just added with this:
class Note
  attr_reader :title, :body
end
myNote = Note.new('Note 1', 'this is a note')
puts myNote.title
puts myNote.body
You could make only the title, or only the body, accessible
via attr_reader. The :foo notation creates a Symbol object
that corresponds to the @foo instance variable, allowing you
to manipulate it via meta methods like this. You might also
want to be able to set the instance variables, and sure enough
there's a convenient shortcut method for that, too:
class Note
  attr_writer :title, :body
end
myNote = Note.new('Note 1', 'this is a note')
myNote.title = 'Note 1 edited'
puts 'New title is: ' + myNote.title
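The reader and writer shortcuts can be mixed per attribute. In the sketch below (an illustration, not the tutorial's final class) the title can be read and changed, while the body can only be read.

```ruby
class Note
  attr_reader :title, :body   # generates the title and body methods
  attr_writer :title          # generates title= only

  def initialize(title, body)
    @title = title
    @body = body
  end
end

note = Note.new('Note 1', 'this is a note')
note.title = 'Note 1 edited'
puts 'New title is: ' + note.title
puts note.body
# No writer was requested for body, so there is no body= method.
puts note.respond_to?(:body=)
```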
To create getter and setter methods both at once, use the
shortcut attr_accessor :title, :body. Next, add a few lines to
the program to request a second note:
puts 'Enter new note title'
myTitle = gets.chomp
puts 'Enter new note body'
myBody = gets.chomp
myNote2 = Note.new(myTitle, myBody)
puts myNote2.to_s
This prompts the user for a new note (title and body),
creates a new Note object, then outputs it. One question you
might be interested in is how many notes there are in total?
To keep track of this, you need to use a class variable; a
variable that exists only once, for the Note class as a whole,
and is incremented every time you create a new Note:

class Note
  @@notes = 0
  def initialize(title, body)
    @title = title
    @body = body
    @@notes += 1
  end
  def Note.total_notes
    "Total notes: #{@@notes}"
  end
end
Class variables are written as @@foo. We set it at the top
of the class, then increment it in the constructor every time
a new Note is created. To find out the value of a class variable,
we can create a class method, using Class.classmethod, as
here. Add this line to the end of the file, after you've added
your two notes, to call the method:
puts Note.total_notes
You can also refer to a class variable within a regular
instance method, so you could do the same thing with an
instance method:
class Note
  def total_notes
    "Total notes: #{@@notes}"
  end
end
puts myNote.total_notes
However, that means having to call it via a particular note.
Conceptually, it makes more sense to use a class method.
Another way to define a class method is to use self.total_notes.
It's just a matter of preference. You may have noticed
that in this version of the program, your notes don't last from
one instance of the program running to the next one. Let's roll
in the File interaction we used before to write out to a file. To
see the version of the code for this last part of the tutorial,
download the archive from www.linuxformat.com/files/ca2015.zip.
Add this method to the Note class:
class Note
  @@notebook_file = 'notebook.txt'
  def write_to_file
    nbk = File.open(@@notebook_file, 'a')
    nbk.puts(@title + ',' + @body)
    nbk.close
  end
end
myNote = Note.new('Note 1', 'this is a note')
myNote.write_to_file
# comment out the rest of the file for now, for ease of testing
Our write_to_file method does what it says on the tin:
writes a given note to the end of the general notebook file
(defined as a class variable). Run this, then have a look at
notebook.txt and you should see the note added.

Documentation
It's always a good idea to document your code clearly for others (or for yourself at a later time!). One popular option for Ruby is TomDoc (http://tomdoc.org). Here's how that looks with the final version of our Note class:
# Public: class to define a note.
class Note
  @@notes = 0
  @@notebook_file = 'notebook.txt'
  # Public: Initialize a Note.
  #
  # title - The String title of the Note.
  # body - The String body of the Note.
  def initialize(title, body)
    # ... code here ...
  end
  # Public: Write Note to file.
  #
  # Returns nothing.
  def write_to_file
    # ... code here ...
  end
  # Public: Class method to return name of notebook file.
  #
  # Returns the String name of the notebook file.
  def self.notebook_file
    # ... code here ...
  end
  # Public: Gets/Sets the String title and body of the Note.
  attr_accessor :title, :body
end
You should state what the method does, describe any arguments, and give a return value. Constructor (initialize) and attribute (attr_accessor, etc) methods can be shorthanded as shown here. TomDoc is designed to be both human-readable and machine-parsable; check the webpage out for more information.
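Filling in the "code here" placeholders with the method bodies developed in this tutorial gives something like the sketch below. One assumption is made so that the example is safe to run: the notebook file lives in the system temporary directory (notebook_example.txt is a made-up name) rather than in notebook.txt, and it is deleted at the end.

```ruby
require 'tmpdir'

# Public: class to define a note.
class Note
  @@notes = 0
  # Assumption for this sketch: a file in the temp directory, so that
  # running the example does not touch a real notebook.txt.
  @@notebook_file = File.join(Dir.tmpdir, 'notebook_example.txt')

  # Public: Gets/Sets the String title and body of the Note.
  attr_accessor :title, :body

  # Public: Initialize a Note.
  #
  # title - The String title of the Note.
  # body  - The String body of the Note.
  def initialize(title, body)
    @title = title
    @body = body
    @@notes += 1
  end

  # Public: Write Note to file.
  #
  # Returns nothing.
  def write_to_file
    nbk = File.open(@@notebook_file, 'a')
    nbk.puts(@title + ',' + @body)
    nbk.close
    nil
  end

  # Public: Class method to return name of notebook file.
  #
  # Returns the String name of the notebook file.
  def self.notebook_file
    @@notebook_file
  end

  # Public: Class method to report how many notes have been created.
  #
  # Returns a String such as "Total notes: 2".
  def self.total_notes
    "Total notes: #{@@notes}"
  end
end

Note.new('Note 1', 'this is a note').write_to_file
Note.new('Note 2', 'another note').write_to_file
puts Note.total_notes
puts File.readlines(Note.notebook_file).last
File.delete(Note.notebook_file)   # tidy up the example file
```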


What about reading your notes back? This shouldn't be
an instance method, as we want to be able to read back the
existing notes independently of any specific note. We could
do it either as part of the command flow of the program
(outside the class altogether), or as a class method.
Here it is outside the Note class:
class Note
  def self.return_file
    @@notebook_file
  end
end
nbk = File.open(Note.return_file, 'r')
while line = nbk.gets do
  note = line.split(',')
  thisNote = Note.new(note.first, note.last)
  puts thisNote.to_s
end
The variable note is used as an array; but Ruby doesn't
insist that you declare variables or their types in advance,
so we just go ahead and use it. line.split(',') splits line on
commas. If you miss out the argument (line.split) you would
split it on whitespace, which is no good for us, as we can have
whitespace within either a note title or note body. We then
use the first and last elements of the array to create a new
Note, and print it to screen. As we're not doing anything with
thisNote other than printing it, we could reduce this further:
while line = nbk.gets do
  puts line.split(',')
end
This will output each title and body on a separate line.
If you use the print command, you won't get any spaces.
Creating thisNote and using to_s gives you more control
over the output format.
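Splitting on commas has one catch: a comma inside the note body produces extra fields. As a hedged sketch of one way round this, split accepts an optional limit as its second argument, so only the first comma is treated as the separator. The parse_note_line helper and the sample lines are invented for illustration.

```ruby
# Hypothetical helper: turn one "title,body" line into [title, body],
# or nil if the line does not contain both a title and a body.
def parse_note_line(line)
  # The limit of 2 means only the first comma splits the line, so a
  # body may contain commas of its own.
  fields = line.chomp.split(',', 2)
  return nil if fields.length != 2
  fields
end

lines = ["Note 1,this is a note\n",
         "Shopping,eggs, milk, bread\n",
         "no comma here\n"]

lines.each do |line|
  title, body = parse_note_line(line)
  if title
    puts "Note: #{title}, #{body}"
  else
    puts "Skipping malformed line: #{line.chomp}"
  end
end
```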

Quick tip
You might want to error-check here that you only have two values in the array. Try:
if note.length != 2
  puts 'There is a problem: note has too many fields!'
  next
end

Experimentation
Irb, Interactive Ruby, is a great tool for experimenting and trying out code snippets. Making good use of irb can really speed up code production. For example, if you enter a string, then a dot, then hit Tab, irb will give you a list of the methods you can use on a String object. Since in Ruby everything is an object, this works for anything you input. If completion isn't working, try irb --readline -r irb/completion.
ri provides online Ruby documentation. To see all the classes ri knows about, try ri -c; then try ri FileUtils to see documentation for the FileUtils class. To get documentation on a specific method, try ri String#split.
You can also install the Ruby Documentation Bundle for easy access to a bunch of resources, including the free version of Programming Ruby: The Pragmatic Programmer's Guide (aka the Pickaxe), an FAQ, and a couple of tutorials. It's also available online.


Ruby: Add a little more polish
Build on your Ruby: reorganise your code, learn about modules
and blocks, and introduce a few tests with Juliet Kemp.

In the first part of this series of introductory Ruby tutorials,
we got started with Ruby and wrote a basic single-file
notebook program. In this next section, we'll improve the
overall structure, interface and usability of the code, looking at
how best to organise it and split different parts out. We'll also
introduce command-line option parsing, learn about
modules, and do a bit of testing. Last time, I suggested using
RVM to install Ruby and a couple of other bits and pieces.
It's a good idea to update RVM fairly regularly, as a new stable
release comes out every month or two. It's easy to do: just
type rvm get stable, and check for any notes that appear in
the output.

Organise your code

In the previous tutorial, all of our code was in a single file. This
is fine for getting started, but as soon as your project gets to
be any reasonable size, it is likely to become confusing.
Single-file code is also less likely to be reusable by other
projects, and it is harder to write automated tests, because
you can't test parts of the code without having to run the
whole program.
If you look at the code from the previous tutorial, we have
the definition of a Note mixed up with the logic that creates
test Notes. It would be much easier to read if we break
that out into separate files.
Unlike some other languages (such as Java), Ruby doesn't
enforce any particular organisational standards.
But there is a set of conventions emerging from within the
community (which are also used by the RubyGems system,
which we'll look at in the next tutorial), so we'll take that
approach here.
Even with a very small program, it's worth getting the hang
of structuring things like this, so it's a habit when you start on
larger things.

As your project gets to any reasonable size, it'll become confusing.

Let's take a look at our notebook program; the original is
in the ZIP file (www.linuxformat.com/files/ca2015.zip) as
notebook_old.rb. Currently, it has several sections:
The Note class, which defines a Note and a couple of methods.
Writing a test note to file.
Opening the file and reading the notes back.
Getting another note from the user.
The second one of these (writing a test note) is a bit of a
red herring; it's really more of a test, or a proof-of-concept, so
we'll ignore it. We can then divide the code up into three
parts: a class with the Note definition, a section which
reads Notes back from a file, and an input interface.
Effectively, the first two of these are library files, and the final
one is a command-line interface.
The Ruby convention, then, is a directory set up like this:
notebook/    # top level
  bin/       # command line interface
  lib/       # library files
  test/      # test files
We're also going to set this all up as a module; if other
people want to use parts of this code in future, it's not good
practice to have it all sloshing around in the top-level
namespace (see the boxout for more on modules, and a brief
introduction to mixins, which we won't be using here). So we'll
call the module Notebook, and the Note class will be
Notebook::Note.
This also means that it's good practice to create a
lib/notebook/ subdirectory to keep the library files in, for
ease of navigation at a later date. A Notebook::Foo module
will then be in lib/notebook/foo.rb, which makes it easy
to find.

Modules and mixins
Modules are a way of grouping classes together to use the same namespace. This eliminates problems of a method that is named the same as another unrelated method from another class. So if light is a method concerning the electromagnetic spectrum, but light also concerns the set of things which are not heavy, you can identify them with EMSpectrum.light() and Weight.light() (or Mass.light(), but that's another discussion). Modules remove the need for multiple inheritance, by providing mixins. If you include a module within a class definition:
require 'bar'
class Foo
  include Bar
  # ....
end
then the class gains access to all of that module's instance methods and constants. Note that you do need to require the file that the module definition lives in; include doesn't do that for you. It's also important to remember that include doesn't copy the methods; it creates a reference to them. If Bar's methods are changed later (for example, by reopening the module), the change is seen by every class that includes Bar.
Mixins can get more complicated and useful than this, but we'll tackle that in another tutorial.
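As a rough illustration of both uses of modules, the sketch below keeps two unrelated light methods apart with namespaces, then mixes a module into a class with include. Describable, Feather and the return strings are made-up names for this example only.

```ruby
# Two unrelated methods called light, kept apart by their modules.
module EMSpectrum
  def self.light
    'electromagnetic radiation you can see'
  end
end

module Weight
  def self.light
    'not heavy'
  end
end

# A mixin: instance methods that any class can pull in with include.
module Describable
  def describe
    "#{self.class.name} weighing #{grams}g"
  end
end

class Feather
  include Describable
  attr_reader :grams

  def initialize(grams)
    @grams = grams
  end
end

puts EMSpectrum.light
puts Weight.light
puts Feather.new(2).describe
# The module now sits in the class's ancestor chain.
puts Feather.ancestors.include?(Describable)
```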
So, we can shift the Note class wholesale into
lib/notebook/note.rb, and just add a module line at the top:
module Notebook
  class Note
    # see the DVD for the code
  end
end
Next, we'll shift the reading-in-from-file code into
lib/notebook/reader.rb. Currently, this code looks like this:
nbk = File.open(Note.notebook_file, 'r')
while line = nbk.gets do
  note = line.split(',')
  if note.length != 2
    puts 'There is a problem!'
    next
  end
  thisNote = Note.new(note.first, note.last)
  puts thisNote.to_s
end
That doesn't look much like a class just yet. What we want
is a Reader constructor that takes a file as its parameter,
then has a read method that reads and outputs the notes
(of course, this isn't the only way to organise this code; you
can probably think of several different ways off the top of your
head. Feel free to play around with them).
Let's rewrite Reader:
module Notebook
  # Public: class to read back notes from a file
  class Reader
    # Public: initialize the reader
    #
    # file - The String name of the file to read in from
    def initialize(file)
      @file = file
    end
    # Public: read back from the notebook file
    #
    # Returns nothing
    def read
      File.open(@file, 'r') do |f|
        while line = f.gets do
          note = line.split(',')
          if note.length != 2
            puts 'There is a problem!'
            next
          end
          puts Note.new(note.first, note.last).to_s
        end
      end
    end
  end
end
This demonstrates an important feature of Ruby: the
block. The while...do...end is one style of block, which is
probably familiar to you from other languages, and does what
you'd expect: execute the code between do and end while
f.gets continues to return lines.
The other block is here:
File.open(@file, 'r') do |f|
  # execute code here
end
This block of code is executed on the filehandle returned
by File.open(@file, 'r'), as represented by f. One of the
advantages of using this syntax is that the file will
automatically be closed once the block is finished.
Blocks are a powerful and important part of Ruby, but they
can also take a little getting used to. See the Ruby in blocks
box for a bit more detail.

Ruby in blocks
Blocks in Ruby are used to interact with methods. The basic format is:
variable.method do |n|
  # put some code here that operates on the variable n
end
The variable n is identified by method, then the code block is passed into method, and is applied to each n in turn. So when opening a file, File.open(@file, 'r') do |f| opens a file and creates a filehandle, here f. The code block after this will then be applied to f.
We won't delve into the code that makes this possible just yet, but this highly extensible way of hooking an arbitrary piece of code into an existing method is one of the things that makes Ruby so powerful, and fun, to use.
Blocks can be either enclosed in braces, or with do...end. The Ruby standard is to use braces for single-line blocks, and do...end for multi-line blocks.
In the next tutorial, we'll delve into blocks a bit more, and look at how to write methods that use blocks. For now, just get accustomed to the syntax and the way blocks are used in existing Ruby methods.
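To get a feel for blocks with existing methods, the following sketch uses both block styles and checks that File.open really does close the file when its block ends. It works in a temporary directory; the handle variable exists only so the check can be made after the block.

```ruby
require 'tmpdir'

Dir.mktmpdir do |dir|
  path = File.join(dir, 'notebook.txt')

  # Brace form for a single-line block.
  File.open(path, 'w') { |f| f.puts 'Note 1,this is a note' }

  # do...end form for a multi-line block.
  handle = nil
  File.open(path, 'r') do |f|
    handle = f
    puts f.gets
  end
  puts "Closed after the block? #{handle.closed?}"

  # The same pattern with a different method: each passes every
  # element of the array to the block in turn.
  ['bin', 'lib', 'test'].each { |name| puts "notebook/#{name}/" }
end
```

Dir.mktmpdir is itself a method that takes a block: it hands the block a directory name and cleans the directory up afterwards.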
Now we can write the bin/notebook command (no .rb
extension; commands usually don't have an extension).
We want this to have as little in it as possible:
require_relative '../lib/notebook/runner'
runner = Notebook::Runner.new()
runner.run
Spot the deliberate mistake here: we don't yet have
a Runner module.
Next, then, dump all the rest of the code from the original
version into notebook/lib/notebook/runner.rb, with a little
bit of a rewrite to make it into a module:
require_relative 'note'
require_relative 'reader'
module Notebook
  class Runner
    def run
      reader = Reader.new(Note.notebook_file)
      reader.read
      puts 'Enter new note title'
      myTitle = gets.chomp
      puts 'Enter new note body'
      myBody = gets.chomp
      myNote2 = Note.new(myTitle, myBody)
      myNote2.write_to_file
    end
  end
end
Where require searches Ruby's load path (which is how it finds the
standard library and installed gems), require_relative looks at a
path relative to the current file. So it's
a good choice for loading files within your project.
This class doesn't need an initialize() method, as it has no
instance variables or anything else that needs initialising; it's
just created as a blank object to hold the run() method.
You'll need to touch notebook.txt (in the main
notebook/ directory) to avoid getting a missing file error;
then run ruby -I lib bin/notebook, and you should see any
existing notes printed to screen, before being asked to input
a new note.

Quick tip
To make the program create the file if it doesn't exist, go back to Reader.read and replace notebook = File.open(@file, 'r') with notebook = File.open(@file, 'a+'). This is read-append mode and creates a file if that file doesn't exist.

Quick tip
Running a test suite is easy. Just create test/test_suite.rb, with the following lines:
require_relative 'test_note'
require_relative 'test_reader'
and run it with ruby notebook/test/test_suite.rb.

Testing, one, two, three

Testing is an important part of software development, and the
way that our code is now set up is intended to make it easier
to test. There's a standard Ruby test framework, Test::Unit,
and a useful gem, shoulda, to help you out a bit more. Install
this with gem install shoulda (you'll likely get quite a lot of
other stuff along with it).
Shoulda is a test library that helps you to write clearer
tests. With shoulda, you can provide context for tests, so you
can group them by feature or scenario. Specifically, it allows
you to write context, setup and should blocks, which combine
to create specific tests.
Let's take a look at this in action. We'll create some tests
for note.rb in test/test_note.rb. Here we'll look at one of
them in detail:
require 'test/unit'
require 'shoulda'
require_relative '../lib/notebook/note'
class TestNote < Test::Unit::TestCase
  context 'With no notes' do
    should 'have total_notes return 0' do
      assert_equal 'Total notes: 0', Notebook::Note.total_notes
    end
  end
  context 'With notes' do
    setup do
      @title = 'test_title'
      @body = 'test_body'
      @test_note = Notebook::Note.new(@title, @body)
    end
    should 'output title and body string for to_s' do
      assert_equal "Note: #{@title}, #{@body}", @test_note.to_s
    end
    should 'have last line of file equal to test note values after write_to_file' do
      @test_note.write_to_file
      @last_line = `tail -n 1 notebook.txt`
      assert_equal "#{@title}, #{@body}", @last_line.chomp
    end
    should 'have total_notes return 2' do
      assert_equal 'Total notes: 2', Notebook::Note.total_notes
    end
  end
end
This generates these tests:
test: With no notes should have total_notes return 0
test: With notes should output title and body string for to_s
test: With notes should have last line of file equal to test note values after write_to_file
test: With notes should have total_notes return 2
The string labels for the context and should blocks are
just labels; they can be whatever you like. Within the context
block, the setup block will be run once for each test. You can
also nest context blocks if you want (see the boxout earlier for
more on blocks).

Running tests, and looking at the structure of the module.

Testing
assert_equal does what you'd expect from the name.
The first argument should be what you expect, and the
second argument what you actually get. There's a full set
of assertions available via Test::Unit, and documented at
RubyDoc; they include assert_not_equal, assert_raise,
assert_throws, and so on. Shoulda also adds assert_contains
and assert_same_elements, for working
with arrays. Run this with ruby test/test_note.rb. It turns
out that we get a failure:
1) Failure: test: With notes should have last line of file equal
to test note values after write_to_file. (TestNote) [notebook/test/test_note.rb:32]:
<"test_title, test_body"> expected but was
<"test_title,test_body">.
There's a misplaced space in there. We need to decide
whether we want the space (in which case we edit the code),
or don't (in which case we edit the test).
Note that the tests run in alphabetical order, first of the
context blocks, and then of the should blocks, within each
context block. This means that our zero notes test, for
example, needs to be alphabetically first, or the class variable
@@notes will already be incremented.
If you have a situation like this, you could consider either
running the tests separately, or numbering the tests to be
clear about the requirement. The same issue happens with
have total_notes return 2; the number returned will depend
on how many other tests are run before this one. In fact, this
also draws attention to a problem with total_notes more
generally: as things stand, it only keeps track of notes created
during this session, rather than tracking how many notes are
in the notebook file. This is a code problem rather than a test
problem, though!
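Another way to avoid order-dependent tests is to reset the shared state before each test. The sketch below is one possible approach, not the tutorial's code: it uses Minitest rather than Test::Unit with shoulda, purely to keep the example in a single file, and it adds a hypothetical reset_total class method to a cut-down stand-in for Notebook::Note.

```ruby
require 'minitest/autorun'

# Cut-down stand-in for Notebook::Note with just the note counter.
class Note
  @@notes = 0

  def initialize(title, body)
    @title = title
    @body = body
    @@notes += 1
  end

  def self.total_notes
    "Total notes: #{@@notes}"
  end

  # Hypothetical addition: lets a test start from a known count.
  def self.reset_total
    @@notes = 0
  end
end

class TestNoteCount < Minitest::Test
  # Runs before every test, so no test depends on which ran earlier.
  def setup
    Note.reset_total
  end

  def test_total_is_zero_with_no_notes
    assert_equal 'Total notes: 0', Note.total_notes
  end

  def test_total_counts_each_new_note
    Note.new('test_title', 'test_body')
    Note.new('another_title', 'another_body')
    assert_equal 'Total notes: 2', Note.total_notes
  end
end
```

With the counter reset in setup, the two tests pass whichever one runs first.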
We could also run some more tests; for example, we
could change the code to require that either or both of the
note title and body are non-nil, and test accordingly.
Another issue with the tests as they currently stand is that
our tests are messing up our actual notebook file! It would
certainly be worth changing the code to use a test notebook,
but in fact, we want to be able to specify a notebook file
anyway, so we'll leave that till the next section.
We can also write some tests for the Reader class:
class TestReader < Test::Unit::TestCase
  context "read" do
    should "return nothing when reading test file" do
      test_file = "testfile"
      note = "Test Note,test body"
      # write the local variable note to the file (an unset @note would be nil)
      File.open(test_file, "w") { |f| f.write(note) }
      reader = Notebook::Reader.new(test_file)
      assert_equal nil, reader.read
      File.delete(test_file)
    end
  end
end
You may have noticed the problem here: the Reader class
currently just outputs to the console, and doesn't return
anything.
There are ways to test console output, but we won't go
into them here because we're going to do a bit of rewriting of
this class anyway. You'll see another way of writing a File
block, though:
File.open(test_file, "w") { |f| f.write(note) }
As with the Reader class, this opens the file, runs the block
(in {}) on it, then closes it when the block is finished.
Remember to delete the test file afterwards!
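The claim that the block form closes the file for you can be checked directly. This is a small sketch rather than part of the Notebook code; the helper name write_and_read_back is made up for illustration, and it works in a throwaway temporary directory so no real notebook file is touched:

```ruby
require 'tmpdir'

# Hypothetical helper: writes one line with the block form of File.open,
# then reports the file's contents and whether the handle was closed.
def write_and_read_back(line)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'testfile')
    handle = nil
    File.open(path, 'w') do |f|
      handle = f          # keep a reference so we can inspect it afterwards
      f.write(line)
    end
    # By the time we get here the block has finished and the file is closed.
    [File.read(path), handle.closed?]
  end
end

contents, closed = write_and_read_back('Test Note,test body')
puts "contents: #{contents.inspect}, closed after block: #{closed}"
```

Because Dir.mktmpdir removes the directory when its block ends, there is no separate clean-up step to forget.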

Another failed test; this time I'd removed a piece of code but not the corresponding test.

Options and minor improvements


Another improvement would be to add in some command-line
option parsing. The OptionParser class provides an API for that;
it lives in the optparse library, which is part of the main Ruby
install, so there's nothing extra to install. The first option
we'll add is one to take a reference to a notebook file, or to
use the default notebook.
require "optparse"
module Notebook
  class Options
    DEFAULT_NOTEBOOK = "notebook.txt"
    attr_reader :notebook
    def initialize(argv)
      @notebook = DEFAULT_NOTEBOOK
      parse(argv)
    end
    private
    def parse(argv)
      OptionParser.new do |opts|
        opts.banner = "Usage: notebook [ options ]"
        opts.on("-n", "--notebook PATH",
                String, "Path to notebook file") do |notebook|
          @notebook = notebook
        end
        opts.on("-h", "--help", "Show this message") do
          puts opts
          exit
        end
        opts.parse!(argv)
      end
    end
  end
end
We're going to set up a reader method for @notebook
(with attr_reader), so it can be used elsewhere in the code.
Then initialize() just takes the arguments passed in when
creating the object, and runs the private parse() method. This
is where the work is done. Check out the box for the details on
the various OptionParser methods.
Now we need to fix up Runner to use the options:
require_relative "options" # as well as other files
class Runner
  attr_reader :options
  def initialize(argv)
    @options = Options.new(argv)
  end
  def run
    reader = Reader.new(@options.notebook)
    # ... rest of code as before
  end
end

Parsing options
When creating a new OptionParser, you use
a do block to set up how it behaves in various
situations. (new() yields itself when called with
a block; see the other boxout for more
on blocks.)
opts.banner sets the heading banner for
any help output produced.
opts.on() adds an option switch and handler
for that switch. The first argument is a short
switch (you could miss this out if you prefer),
and the second one is a long switch with
a mandatory argument. To specify, instead, an
optional argument, you would use
--notebook [PATH]. For a switch with no
argument, use --notebook. Note that if you
don't specify an argument here, an argument
passed in on the command line will be silently
ignored. We also tell OptionParser to cast the
argument to a String, and provide a description
string. The do block then acts on the
command-line argument, referred to by |notebook|.
The help option demonstrates an option
switch without an argument. Once all the option
switch handlers are set up, opts.parse!() parses
whatever has actually been passed in on the
command line, removing each one as it is dealt
with (parse would leave them in place).
There's scope to get much more complicated
and detailed with OptionParser, but it's
straightforward once you have the basic idea.
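The three kinds of switch (mandatory argument, optional argument, no argument) and the difference between parse and parse! can be tried out in isolation. This is only a sketch: the parse_demo helper and the -t and -v switches are invented for illustration and are not part of the Notebook code.

```ruby
require 'optparse'

# Hypothetical helper: parses a copy of argv and reports what was found.
def parse_demo(argv)
  found = { notebook: 'notebook.txt', tag: nil, verbose: false }
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: demo [options]'
    # Mandatory argument: -n must be followed by a value.
    opts.on('-n', '--notebook PATH', String, 'Path to notebook file') do |v|
      found[:notebook] = v
    end
    # Optional argument: the block receives nil when no value is given.
    opts.on('-t', '--tag [NAME]', 'Optional tag') do |v|
      found[:tag] = v || 'untagged'
    end
    # Plain switch with no argument.
    opts.on('-v', '--verbose', 'Chatty output') { found[:verbose] = true }
  end
  args = argv.dup
  leftover = parser.parse(args)   # non-destructive: args is left alone
  parser.parse!(args)             # destructive: switches are removed from args
  [found, leftover, args]
end

found, leftover, remaining = parse_demo(%w[-n mine.txt -v extra])
puts found.inspect
puts leftover.inspect    # => ["extra"]
puts remaining.inspect   # => ["extra"]
```

Both calls hand back the non-switch arguments; the difference is that parse! also strips the switches out of the array you passed in.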
And edit notebook/bin/notebook to pass in command-line arguments:
require_relative "../lib/notebook/runner"
runner = Notebook::Runner.new(ARGV)
runner.run
If you try running it without an argument, it should now
work; but if you try bin/notebook -n myfile, nothing will be
written to myfile. This is because we still have the notebook
file hard-coded in lib/notebook/note.rb. So let's take out
any references to @@notebook_file in the Note class
(including self.notebook_file), and rewrite write_to_file to
take an argument:
def write_to_file(file)
  File.open(file, "a+") { |f| f.puts(@title + "," + @body) }
end
You'll notice that we're using that same handy block
syntax again. Back to the Runner class, and edit the
write_to_file line:
myNote.write_to_file(@options.notebook)

This or that?

Trying out various option switches from the command line.


The Runner here is still pretty basic: it outputs what you
have, and adds something else. It would be good to have the
option to do one or the other. Back to Options for a bit more
parsing. Add this into the class:
class Options
  attr_reader :notebook, :add, :read
  def parse(argv)
    @add = false
    @read = false
    OptionParser.new do |opts|
      # .... as before
      opts.on("-a", "--add", "Add a note") { @add = true }
      opts.on("-r", "--read", "Read back notes") { @read = true }
      # .... as before
    end
  end
end
Note that you can do these easy options as a single-line
piece of code, using { } rather than do ... end to set off the
block. This can improve readability when used judiciously;
but always bear in mind that it's better to use a couple more
lines and have more readable code, than to crush it all onto
one line.
While we're here, we'll also add something to deal with an
invalid argument. Replace the line opts.parse!(argv)
with this:
begin
  opts.parse!(argv)
rescue OptionParser::InvalidOption => e
  puts e
  puts opts
  exit(1)
end
begin/rescue/end is the Ruby way of handling
exceptions. The begin block contains code that might throw
an exception, and the rescue block, or blocks, handle specific
exceptions. You can also add an else block, which runs if there
are no exceptions, and an ensure block, which runs whatever
happens, before closing it out with end.
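The order in which the four clauses run is easiest to see by recording each step. This sketch uses a made-up parse_count helper rather than anything from the Notebook code; Integer() is used only because it raises ArgumentError for non-numeric text.

```ruby
# Hypothetical helper showing which clauses run, and in what order.
def parse_count(text)
  steps = []
  begin
    steps << 'begin'
    value = Integer(text)          # raises ArgumentError for non-numeric text
  rescue ArgumentError => e
    steps << "rescue (#{e.class})"
    value = nil
  else
    steps << 'else'                # only runs when nothing was raised
  ensure
    steps << 'ensure'              # always runs
  end
  [value, steps]
end

p parse_count('42')         # => [42, ["begin", "else", "ensure"]]
p parse_count('forty-two')  # => [nil, ["begin", "rescue (ArgumentError)", "ensure"]]
```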
Now edit Runner to deal with these new options:
class Runner
  def run
    reader = Reader.new(@options.notebook)
    if @options.read
      reader.read
    end
    if @options.add
      puts "Enter new note title"
      title = gets.chomp
      puts "Enter new note body"
      body = gets.chomp
      note = Note.new(title, body)
      note.write_to_file(@options.notebook)
    end
  end
end
Try out your new arguments, and you should be able to
either read back your old notes (ruby bin/notebook -r),
or add a new one (ruby bin/notebook -a). If you try an
invalid switch, eg, -f, you'll get the help output.
By default (ie, without any switch), this now does nothing;
to make it either read or add by default, change the default
value at the top of Options.parse. Note that you may also
want to change the behaviour of the switch further down;
eg, if you set @read = true at the top, then unless you set
@read = false when you parse the -a switch, you'll both read
back old notes and add a new one.
You may have realised at some point during this tutorial
that total_notes is no longer doing the Right Thing, as it only
tracks notes within a particular session, rather than reading
the total from the file.
For now, just delete it and any references to it, as we're
going to be arranging the notes a bit differently as of the next
round of edits in the next tutorial.
In the next and final tutorial in this series, we'll look at
blocks a bit further; find out more about data storage so we
can edit and delete notes; look at packaging, gems, and rake;
find out more about mixins; and discover a few more bits and
pieces of Ruby syntax and usage along the way.


Ruby: Modules, blocks and gems
Learn more about modules and mixins, blocks and yields, and
how to get your code out there, with Juliet Kemp.

In the previous two tutorials we got started with Ruby,
learnt some more syntax and structures, and began to
organise our code in the way that's expected in the Ruby
community. In this tutorial, we'll find out more about mixins
(the other big use of modules in Ruby), blocks and yields (one
of Ruby's neatest features), and finally, how to package your
code as a gem for ease of install and sharing with others.
We'll be building again on the code used in the last tutorial,
the Notebook command-line tool to collect short notes (see
www.linuxformat.com/files/ca2015.zip for the code).

Data structures and storage


At the end of the last tutorial, we had code that could read our
notes back from a file, print them out to screen, and add one
to the end. What this doesn't allow for is deleting or editing
any notes, since the notes aren't kept in memory at any point.
To delete or edit notes, we'll need to read them into a data
structure so we can refer to them elsewhere. Ruby has the
standard data structures, including hashes and arrays, so
let's stick with the straightforward option and put our notes
into an array once read in (we'd need to add a key to use
a hash, as neither title nor body is guaranteed to be unique).
class Reader
  attr_reader :notebook
  def read
    notebook = Array.new
    File.open(@file, "a+") do |f|
      while line = f.gets do
        @@total_notes += 1
        # chomp strips the trailing newline before splitting
        note = line.chomp.split(",")
        if note.length != 2
          puts "There is a problem!"
          next
        end
        notebook << Note.new(note.first, note.last)
      end
    end
    notebook.each { |x| puts x.to_s }
  end
  # rest of class
end
This creates an array, and adds each Note to it with <<,
the shovel operator, which adds an item to the end of the
array. It then outputs the whole array. Run this, and you
should see an output a bit like this:
Note: argh, bin
Note: a, b
Note: any, ping
Note: my, dog
Note: test_title, test_body
The main issue is that there is no easy way for the user to
reference each note. Replace that notebook.each line with:
notebook.each_with_index { |val, index| puts "#{index}: #{val}" }
We don't need to explicitly call the to_s method; since
we're referring to our Note in a String context, Ruby will
automatically use the appropriate to_s method (while we're
at it, though, edit the to_s method to remove that extraneous
Note: string). Run this, and your notes will have an index
associated with them:
0: argh, bin
1: a, b
2: any, ping
3: my, dog
4: test_title, test_body
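The shovel operator, each_with_index and the automatic to_s call can all be seen in a few lines. This is a sketch only: the Memo class and the numbered helper are hypothetical stand-ins for Note and the Reader output, assuming a to_s that returns "title, body".

```ruby
# Hypothetical stand-in for Note with a to_s method.
class Memo
  def initialize(title, body)
    @title = title
    @body = body
  end

  def to_s
    "#{@title}, #{@body}"
  end
end

# Builds "index: item" strings; interpolation calls to_s on each item for us.
def numbered(list)
  list.each_with_index.map { |val, index| "#{index}: #{val}" }
end

notes = []
notes << Memo.new('argh', 'bin')   # << appends to the end of the array
notes << Memo.new('a', 'b')

puts numbered(notes)
# 0: argh, bin
# 1: a, b
```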
But how are we going to interact with these? More
perturbingly, if you start experimenting and try to refer to this
array from another file, you'll find it's empty. What's going on?

Flexible initialization
When you use Foo.new in Ruby, it calls Foo's
initialize method. In our code so far, we have some
initialize() methods without any arguments,
and one with an argument (Reader's
initialize(file)). If you call Reader.new with no
argument, Ruby will raise an ArgumentError. But what if
we wanted to set a default value? (In our code
the default value is set in Options, but we could
move it.) We could then either specify a
notebook file, or call Reader.new without an
argument and use the default. Ruby provides
a way to do this without having to write multiple
constructors:
def initialize(file = "notebook.txt")
  @file = file
end
This will use a value passed into the
constructor if there is one (Reader.new("foo.txt"))
and notebook.txt if not. You could also set
a constant earlier in the file and use that (def
initialize(file = DEFAULT_NOTEBOOK)).
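A short sketch of the default-argument approach with a constant; FileReader is a hypothetical class used here so that it doesn't clash with the tutorial's Reader:

```ruby
# Hypothetical class illustrating a constructor with a default argument.
class FileReader
  DEFAULT_NOTEBOOK = 'notebook.txt'

  attr_reader :file

  def initialize(file = DEFAULT_NOTEBOOK)
    @file = file
  end
end

puts FileReader.new.file             # => notebook.txt
puts FileReader.new('foo.txt').file  # => foo.txt
```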

Singletons
The problem with the Reader code above is that every time you
create a new Reader, you'll also create a new notebook array,
which will make it impossible to be sure that you're always
referring to the same array (or that the array has any notes
in it).
What we want instead is a Singleton class, which can be
instantiated only once. Happily, Ruby provides a module to do
this. We'll create a singleton NoteStore class to go alongside
Reader, and move some of our functionality into that:
require "singleton"
module Notebook
  class NoteStore
    include Singleton
    attr_accessor :notebook_array
    def initialize
      @notebook_array = Array.new
    end
    def add(note)
      @notebook_array << note
    end
    def edit(number)
      new_note = @notebook_array[number].edit
      @notebook_array[number] = new_note
    end
    def delete(number)
      @notebook_array.delete_at(number)
    end
    def output
      @notebook_array.each_with_index { |val, index| puts "#{index}: #{val}" }
    end
  end
end
You may at this point notice that this all looks quite clear
and straightforward, which is often a good sign that your code
is doing the right thing. The reading in will still be done in
Reader (see below), but this new class is used to store and
access the array data.
We're doing much the same as we were with the array in
Reader. The magic happens with that include Singleton line.
This makes NoteStore use the Singleton module, which is
an example of how Ruby uses modules as mixins, to provide
an inheritance mechanism. See the boxout (Modules, classes
and mixins, oh my) for more on this.
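As a rough illustration of a mixin outside the Singleton case, here is a sketch using an Editable module and an EditableNote subclass. The method bodies are assumptions made for the example, not the tutorial's actual Note code.

```ruby
# A module supplies behaviour that any class can mix in.
module Editable
  def edit_title(new_title)
    @title = new_title   # creates or updates @title on the object itself
    self
  end
end

# Simplified Note for this example only.
class Note
  attr_reader :title

  def initialize(title)
    @title = title
  end
end

# Subclass of Note that also mixes in Editable.
class EditableNote < Note
  include Editable
end

note = EditableNote.new('old title')
note.edit_title('new title')
puts note.title                               # => new title
puts EditableNote.ancestors.first(3).inspect  # => [EditableNote, Editable, Note]
puts Note.new('plain').respond_to?(:edit_title)  # => false
```

The ancestors list shows where Ruby looks for a method: the class itself, then any included modules, then the parent class.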

Singularity
What the Singleton module does, among other things, is to
make the new() method private and add an instance() method.
The first time instance() is called, a new object will be
created; because new() is private, it can't be called from
outside the class. NoteStore.new would raise a NoMethodError.
This ensures that there is one, and only one, instance of
the class. The instance() method is what allows you to
access this single instance of the class.
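Both behaviours are easy to confirm with a small sketch. Store here is a hypothetical stand-in for NoteStore, kept deliberately minimal:

```ruby
require 'singleton'

# Hypothetical stand-in for NoteStore.
class Store
  include Singleton

  attr_reader :items

  def initialize
    @items = []
  end
end

Store.instance.items << 'first note'

# Every call to instance returns the very same object.
puts Store.instance.equal?(Store.instance)   # => true
puts Store.instance.items.inspect            # => ["first note"]

begin
  Store.new
rescue NoMethodError => e
  puts "Store.new failed: #{e.class}"        # new is private
end
```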
To make use of this, Reader now looks like this:
class Reader
  # no @notebook or @total_notes variable needed
  def read
    File.open(@file, "a+") do |f|
      # while loop as before, but take out total_notes
      while line = f.gets do
        note = line.chomp.split(",")
        if note.length != 2
          puts "There is a problem!"
          next
        end
        NoteStore.instance.add(Note.new(note.first, note.last))
      end
    end
    NoteStore.instance.output
  end
  def write_all
    File.open(@file, "w") do |f|
      NoteStore.instance.notebook_array.each { |x| f.puts(x) }
    end
  end
  def total_notes_string
    total_notes = NoteStore.instance.notebook_array.length
    "Total notes: #{total_notes}"
  end
end
Reading in from the file adds each element to the array in
our NoteStore instance, then we output it to the user. Writing
the array out again (once we've edited it) overwrites the
existing file content, again accessing the contents of that
NoteStore array. And we can use the array length to get our
total notes info. NoteStore then calls a Note.edit method, so
let's next write that:
class Note
  def edit
    puts "Current title is: #{title}; enter new title or enter to keep"
    new_title = gets.chomp
    if (new_title != "")
      @title = new_title
    end
    puts "Current body is: #{body}; enter new body or enter to keep"
    new_body = gets.chomp
    if (new_body != "")
      @body = new_body
    end
    return self
  end
end
Again, this is straightforward. If there's a new title or body,
we change the values in the Note, and return the Note itself.
In NoteStore.edit, that new note is used to replace the old
one in the notebook array.
Finally, then, Runner and Options need to be set up to fire
all of this off. First, we add an edit option to Options:
class Options
  attr_reader :notebook, :add, :read, :edit, :edit_number
  def parse(argv)
    # rest of code here as before
    @edit = false
    OptionParser.new do |opts|
      opts.on("-e", "--edit NUMBER", Integer, "Edit a specific note") do |number|
        @edit = true
        @edit_number = number
      end
    end
  end
end
Now we add code to Runner to handle it:
require_relative "notebook"
module Notebook
  class Runner
    def run
      reader = Reader.new(@options.notebook)
      # read and add options as before
      if @options.edit
        reader.read
        if @options.edit_number >= NoteStore.instance.notebook_array.length
          puts "No note of that number; can't edit!"
          return
        end
        NoteStore.instance.edit(@options.edit_number)
        reader.write_all
      end
    end
  end
end
The only thing to really draw your attention to here is the
error-handling; we need to check that the number to edit
actually exists within the array.
Try now with ruby bin/notebook -r to see what notes you
have, then ruby bin/notebook -e 2 to edit the note with
index 2, then ruby bin/notebook -r again to see what you
now have. It should all work as expected.
In fact, as you might already have thought, Reader and
NoteStore could just as well be the same (singleton) class; try
making that change to the code yourself.


Blocks and yields: the lowdown


We've used blocks in several places in the code, but without
really looking at what they're doing. Let's try using a block
with a yield to understand what is actually happening under
the surface.
Blocks and yields are one of the most powerful features of
Ruby, so they're well worth getting to grips with.
Before doing something with our real code, take a look at
a very simple code block.
def my_first_block
  puts "Starting block..."
  yield
  puts "...and ending block"
end
my_first_block { puts "Hello" }
Run this and it should output
Starting block...
Hello
...and ending block
The yield statement just runs whatever block was passed in
between the braces. The block can be multi-line if you want.
So far, so good; but the blocks we've already seen in code use
a parameter (look at the File.open blocks, for example). How
do we set up a block with a parameter?
def things_with_five
  yield 5
end
things_with_five { |x| puts 3 * x }
This outputs 15. If you swap in 15 / x for 3 * x in the block,
you'll get 3. Now try this:
def things_with_five_and_ten
yield 5
yield 10
end
things_with_five_and_ten { |x| puts 3 * x }
Output is 15, then 30. Effectively, what happens is this
pseudocode:
things_with_five_and_ten:
puts 3 * 5
puts 3 * 10
Each time yield takes the block, sticks first 5, then 10, into
the x variable, and runs the block.
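The yield statement runs the code you write in the block,
handing it whatever values the method supplies. The block itself
is a closure, so it can still see the variables from the place
where it was written.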
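Two further points are worth trying out: yield hands back whatever the block returns, and the block can still use variables from the code that called the method. The collect_results method below is a hypothetical example, not part of the Notebook code.

```ruby
# Hypothetical method: yields each number and collects what the block returns.
def collect_results
  results = []
  [5, 10].each { |n| results << yield(n) }
  results
end

factor = 3   # a local variable belonging to the calling code
tripled = collect_results { |x| factor * x }   # the block can still see factor
puts tripled.inspect   # => [15, 30]
```

The method never knows about factor; it simply yields 5 and then 10, and stores whatever comes back.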
Modules, classes and mixins, oh my
In our code until now, we've really only been
using modules for their namespace purposes.
The use of the Singleton module demonstrates
their other purpose: mixins.
One way of thinking of modules is to see
them as providing characteristics, whereas
classes provide things. Since things can have
characteristics, classes can include modules,
and thereby access their characteristics
(methods and constants). This is demonstrated
by NoteStore. The include Singleton line
means that NoteStore includes the instance()
method, the newly private new() method, and
the other rewritten or added methods that
make the Singleton pattern work.
Another example might be if we wanted to
have two different sorts of notes: ones which
were editable and ones which were not.
We could set this up as follows:
An Editable module, describing various
methods which could apply to an Editable thing.
A Note class, which has methods and
variables which apply to all Notes (including
non-editable ones).
An EditableNote class, which is a subclass of
Note and includes Editable. Subclasses, in Ruby
as with other languages, can be thought of as a
specialisation of their parent class. It would
look like this:
class EditableNote < Note
  include Editable
  # rest of class goes here
end
EditableNote would inherit methods from
both Editable and Note, and could also override
those to make its own versions.
However! Notice that classes cannot inherit
variables. Instance variables in Ruby belong to
individual objects, and are created when a value
is first assigned to them. If an EditableNote
object uses an inherited method that
assigns a value to a variable (for example,
EditableNote might inherit a set_title method
which sets the @title variable), that object will
then acquire its very own @title variable. There
is no separate copy of the variable in the parent
class for it to shadow.
Another thing to keep in mind is an important
feature of modules: they can't be instantiated.
Only a class can be instantiated. This means
that you could, instead of using Singleton, write
a module as a type of singleton class
(containing variables and methods); however,
the Singleton module is probably a better way of
doing this, as it has done the hard work for you.
Keep an eye out when looking at Ruby code
for ways in which modules are used as mixins,
and how that can make your Ruby code more
flexible and user-friendly.

Quick tip
If you get an error LoadError: cannot load such file
-- myfile, check that all the files are included in the
s.files line of the gemspec. If using git ls-files, make
sure everything is checked in to git!

Blocks in action!

Now let's write a NoteStore.do_to_all method, which will
apply a change to every single note, a Note.edit_title method
as a test, and an option in Runner to call it:
class NoteStore
  def do_to_all
    i = 0
    while @notebook_array[i]
      new_note = yield @notebook_array[i]
      @notebook_array[i] = new_note
      i += 1
    end
  end
end
class Note
  def edit_title(title)
    @title = title
    return self
  end
end
class Runner
  def run
    # ... code as before
    if @options.all
      reader.read
      NoteStore.instance.do_to_all do |n|
        n.edit_title(@options.all_title)
      end
      reader.write_all
    end
  end
end

(You'll also need to add an all option to the Options class,
which takes a parameter with which we'll replace all the titles.
This is exactly the same syntax as the other options; see the
link on the Contents page for the code.)
So, Runner.run calls the do_to_all method on our
singleton NoteStore, with an edit_title block applied to its
variable n. do_to_all uses a while loop to put each element of
@notebook_array in turn in as n, via the yield statement.
One way to see this is that do_to_all throws one note at
a time back to Runner.run, so it can be substituted in for that
n variable, like this:
new_note = @notebook_array[i].edit_title(@options.all_title)
The do_to_all method then runs its next couple of lines of
code before looping round again. It's acting as an iterator.
In fact, because @notebook_array is an Array, and Arrays
already have an iterator method called each, you can simplify
this further:
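def do_to_all
  # each hands every note to the caller's block in turn. Assigning to the
  # block variable would not change the array, so we just yield; this works
  # because edit_title changes the note in place. Use map! instead if the
  # block returns brand new objects.
  @notebook_array.each do |note|
    yield note
  end
end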
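One detail to watch with each: assigning to the block variable only rebinds that variable, it does not store anything back into the array. The sketch below contrasts that with map!, using two hypothetical helper methods and plain strings in place of notes.

```ruby
# Rebinding the block variable leaves the array untouched.
def reassign_in_each(list)
  list.each { |item| item = yield(item) }
  list
end

# map! stores each block result back into the array.
def replace_with_map(list)
  list.map! { |item| yield(item) }
end

titles = %w[shopping chores]
reassign_in_each(titles) { |t| t.upcase }
puts titles.inspect   # => ["shopping", "chores"]
replace_with_map(titles) { |t| t.upcase }
puts titles.inspect   # => ["SHOPPING", "CHORES"]
```

With notes this difference is hidden when the block changes each note in place and returns that same note, but it matters as soon as a block returns a new object.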
This time, we're nesting our yield/block structures:
do_to_all itself uses a block to access the members of
@notebook_array one by one, and then passes them back up to
the block in Runner.run. Which is pretty neat, if a little hard
to get your head around initially.
OK! With all that in place, try running ruby bin/notebook
-x NOTHING to see all your titles replaced by the string
NOTHING (to use this in anger, you'll probably want to think
of some other methods to apply to your Notes; you might
want to append a string to the title or body instead, for
example, or even create an interactive method which allows
you to make a change to each title one by one).
Blocks and yields are one of the most powerful aspects of
Ruby, so keep playing around with them, and look out for
them in other methods and classes that you use. You'll soon
notice that they show up nearly everywhere, and that too will
help you get used to their structure and uses.

Building, rake and gems


Once you've produced a decent piece of code, the next thing
you might want to do with it is to share it with the Ruby
community; and the standard way of doing that is with
a gem. You may have already used gems; effectively, they're
packages for Ruby, and RubyGems provides a straightforward
package management system. As of Ruby 1.9, it's part of the
standard Ruby install, so you shouldn't need to do anything
more to use gems. The standard commands to manage
already-existing gems are:
gem install mygem
gem uninstall mygem
gem list --local #lists installed gems
gem list --remote #lists available gems
What about creating your own gem? We've already set our
Notebook code up in a gem-like way, with bin/, lib/, and
test/ directories. There are a couple of things we're missing,
though, before we can package up our gem.
First, it's customary to have a notebook.rb, which just
sets up our other library files:
require "notebook/note"
require "notebook/notestore"
require "notebook/options"
require "notebook/reader"
require "notebook/runner"
This helps to ensure that namespaces work properly and
no one steps on anyone else's toes.
We'll also rewrite notebook/bin/notebook just a little to
fit the new gem structure:
#!/usr/bin/env ruby
begin
  require "notebook"
rescue LoadError
  require "rubygems"
  require "notebook"
end
runner = Notebook::Runner.new(ARGV)
runner.run
The begin/rescue/end control structure here avoids
requiring RubyGems unless it is needed. If someone is not
using RubyGems to manage their path, you don't want to force
them to do so. But it's reasonable to use RubyGems in the
rescue block, as a second chance to load your gem if it hasn't
been found by whatever the local path management system is.
Your gem will need a version number, and best practice is
to store that in lib/notebook/version.rb:
module Notebook
  VERSION = "0.0.1"
end
Next, we need to write a basic gemspec, notebook.gemspec,
which lives in the top directory of your project.
A gemspec is a specification for your gem, with a list of
attributes, most of which are optional.
The required ones are date, name, summary, and
version. platform and require_paths are technically also
required, but both have defaults that should work fine, so you
needn't specify them yourself. (Newer RubyGems releases fill
in date for you, and expect authors and files as well.) Here's
a short gemspec for our gem:
# -*- encoding: utf-8 -*-
lib = File.expand_path("../lib", __FILE__)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
require "notebook/version"
Gem::Specification.new do |s|
  s.name        = "notebook"
  s.version     = Notebook::VERSION
  s.date        = "2013-02-10"
  s.summary     = "Notebook"
  s.description = "A notebook gem to hold single-line notes"
  s.authors     = ["Juliet Kemp"]
  s.email       = "juliet@example.com"
  s.files       = `git ls-files`.split("\n")
  s.test_files  = `git ls-files test/*`.split("\n")
  s.executables = ["notebook"]
  s.homepage    = ""
end
The lines at the top allow your version file to be located.
Check the RubyGems documentation for more gemspec
options you might want. One common one is to specify
runtime dependencies, but as our only external library,
OptionParser, is part of the main Ruby install, there's no need
to include it here.
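If your gem did rely on another gem, a runtime dependency could be declared along these lines. This is a sketch under stated assumptions: the build_spec wrapper exists only so the result can be inspected, and example-dependency is a made-up gem name, not something the Notebook gem needs.

```ruby
require 'rubygems'

# Hypothetical wrapper that builds a specification in memory.
def build_spec
  Gem::Specification.new do |s|
    s.name    = 'notebook'
    s.version = '0.0.1'
    s.summary = 'Notebook'
    s.authors = ['Juliet Kemp']
    s.files   = ['lib/notebook.rb']
    # 'example-dependency' is an invented name used purely for illustration.
    s.add_runtime_dependency 'example-dependency', '~> 1.0'
  end
end

spec = build_spec
puts spec.full_name                                 # => notebook-0.0.1
puts spec.runtime_dependencies.map(&:name).inspect  # => ["example-dependency"]
```

The '~> 1.0' requirement accepts any 1.x release from 1.0 upwards but not 2.0, which is a common way of allowing compatible updates.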
Git is the strongly recommended way to keep track of the
files in a gemspec without having to write them all out (some
gem/bundle tools won't work at all if there's no git repo), but
a flat list of files here would of course also do the job. It's
important to list all of the files in your gem; any files not listed
here won't be available to the gem, which will cause it
to break.
It's also good practice to write a README.md file to
demonstrate usage; here's a very brief one:
# Notebook
## Installation
gem install notebook
## Usage
require "notebook"
notebook -h for help information on the command line
Normally, you'd include API information here, but our gem
is currently designed to be used from the command line
rather than within another piece of Ruby code.
Finally, build and install the gem:
gem build notebook.gemspec
gem install ./notebook-0.0.1.gem
You should now be able to type just notebook -r and have
your notebook read back (one thing to bear in mind is the
current location of your notebook file; you may need to
specify it on the command line if you've moved directories).
You're now ready to contribute your (ruby) gems of code
wisdom to the community as a whole, as you continue along
your new Ruby programming path.

Here we're building and installing our gem package.

Using Rake and Bundler


Rake is the Ruby equivalent of the Unix
tool make, and operates similarly.
It's useful to automate your gem build
process. The Bundler gem (available
using gem install bundler) will also help
you to build a well-structured gem.
If you're starting from scratch, you can
use Bundler to create a directory for you
to code in, including a nice new git repo
for it.
If you have already worked through
your gemspec construction by hand, as
above, you can still use the Rake tools to
automate your install, by using this
Rakefile:
require "bundler/gem_tasks"
Now rake install will build and install
your gem.
The other tasks Bundler installs
automatically are build (which builds
your gem) and release (which tags it,
pushes the source to your git remote, such as
GitHub, and pushes the gem to rubygems.org).
Be sure it really is finished and ready for
public release before you do this!
You can also set Rake up to manage
your tests. One way to do this is to add
these lines to your Rakefile:
require "rake/testtask"
Rake::TestTask.new do |t|
  t.libs << "test"
  t.test_files = FileList["test/test*.rb"]
  t.verbose = true
end
Type rake test to run everything in
test/test*.

Running tests with Rake; time to rewrite the test suite


Ruby on Rails: Web development
Gavin Montague introduces us to Ruby on Rails, a powerful
framework that puts programmer happiness first.

The banner on the Ruby on Rails homepage isn't what
you might expect. It doesn't proclaim Rails to be the
fastest, most powerful, or most all-encompassing
framework; instead, it gives Rails' goal as optimising
programmer happiness and sustained productivity. If you've
had previous dalliances with other frameworks, you might
appreciate this goal. Too many development tools sacrifice
the productivity of the programmer in exchange for more
power in the code. Thankfully, Rails is not only a powerful
tool; it's nice to use too!
The structure of Rails is such that it encourages good
design, manageable code and best practice at each stage of
development. This means you'll spend less time being tied in
knots by cryptic code, unexpected bugs and boiler-plating,
giving you more time to build awesome things.
Over the course of this three-part series, we'll see how
Rails can help us to not only make great web apps, but build
them in a way that keeps us happy and productive. In this
part, we'll take a quick tour of the framework, bootstrap our
sample application and add some functionality.

Get started
Although your distro almost certainly has packages available for Ruby,
we're going to use rbenv to get us up and running. Rbenv allows us to
create and manage entirely isolated versions of Ruby inside a user's home
directory. Ruby developers use this to hop between different versions of
Ruby as projects require, but its main use for us is to avoid touching the
root-owned system and to ensure that we all have a consistent starting
point.
Start by installing rbenv and ruby-build via the Git source code
management tool; if you don't have Git, your package manager will be able
to provide it. Open a terminal and type:
$ git clone git://github.com/sstephenson/rbenv.git ~/.rbenv

The console
Rails is a web framework, but it's not just accessible in your browser.
$ bundle exec rails console
This starts the interactive console. You can execute code directly from
within your Rails environment, without the need for a browser. For
example, you can add a new task:
t = Task.create(:title => 'Use the console', :due_at => Time.zone.now + 10.minutes)
The console is a great way of quickly experimenting with objects and
methods that you're unfamiliar with.

Make your shell aware of rbenv by adding it to your start-up files. If you
don't have .bash_profile in your home directory, change the end of each
command to ~/.profile, or see the rbenv home page for help. The second
line runs rbenv init, which puts rbenv's shims on your path so that the
Ruby it manages is the one your shell finds.
$ echo 'export PATH="$HOME/.rbenv/bin:$PATH"' >> ~/.bash_profile
$ echo 'eval "$(rbenv init -)"' >> ~/.bash_profile
Reload your shell, or open a new window, and type rbenv: you should see
rbenv print its usage summary. If so, we can now install Ruby, the
language underlying Rails. The rbenv global line makes the new version
the default Ruby for your user.
$ git clone git://github.com/sstephenson/ruby-build.git ~/.rbenv/plugins/ruby-build
$ rbenv install 1.9.3-p392
$ rbenv global 1.9.3-p392
$ rbenv rehash
After this, running `ruby --version` should report version 1.9.3p392, and
`which ruby` should print a path inside your ~/.rbenv folder. We'll also
be using SQLite3 to store data later. Your package manager will have a
suitable version, eg:
$ sudo apt-get install sqlite3

Introducing Ruby
Ruby is intended to be a language that makes programmers
productive and happy. The syntax lends itself to very clear
expression of ideas, and a working grasp of its simple but
powerful features can be picked up in an hour or two.
In technical terms, Ruby is an Object Oriented language:
everything you interact with is treated as a self-contained
box of data that you perform operations on via methods.
For example:
a = "a string"
a.reverse # "gnirts a"
array = [3, 2, 1]
array.sort # [1, 2, 3]
5.next # 6
In Ruby, you'll spend most of your time writing classes, best thought of
as blueprints for creating objects. Here, we create a new class, Person,
whose instances know that they have a name and how to return a greeting.
class Person
  attr_accessor :name
  def initialize(name)
    self.name = name
  end
  def greet
    "Hi, I'm #{name}"
  end
end
bob = Person.new "Bob"
bob.greet # => "Hi, I'm Bob"
An interesting feature of Ruby is that parentheses are largely optional,
and certain non-alphabetic characters can be used in method names, so it's
perfectly valid to write:
name = "Bob"
name.is_a? String
array = ["apple", "orange", nil]
array.empty?
array.compact!
This makes Ruby ideal for writing very expressive, easily read code. The
following is executable Rails code, but its meaning is quite apparent,
even to non-programmers.
objects.each do |o|
  o.update_timestamp and o.save! unless o.new_record?
end
We'll be focusing on Rails from here on in, but I'd recommend having a
look at the previous Ruby tutorials starting on page 38.
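
To see how the ? and ! method names and the trailing unless read outside
Rails, here is a small plain-Ruby sketch. The Record struct and its fields
are hypothetical stand-ins for a database-backed object, not anything
Rails provides.

```ruby
# Hypothetical stand-in for a database record; not a Rails class.
Record = Struct.new(:name, :saved, :stamped) do
  def new_record?
    !saved
  end

  def update_timestamp
    self.stamped = true # returns true, so `and` carries on to save!
  end

  def save!
    self.saved = true
  end
end

fruit = ['apple', 'orange', nil]
fruit.compact!                                   # bang method: strips nil in place
all_strings = fruit.all? { |f| f.is_a? String }  # parentheses omitted

objects = [Record.new('old', true, false), Record.new('new', false, false)]
objects.each do |o|
  # The unless modifier guards the whole "a and b" expression,
  # so records that have never been saved are skipped entirely.
  o.update_timestamp and o.save! unless o.new_record?
end
```

After the loop, only the first record has been stamped; the second is
left untouched because new_record? returned true for it.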

Set up Rails
Rails is distributed as a gem, the Ruby community's standard package
format, so let's install it and generate the skeleton for our application:
$ gem install bundler
$ gem install rails --version=3.2.12
$ rbenv rehash
$ rails new todolist --skip-test-unit
Rails is an opinionated framework. There's a correct place for everything
in its world, and this is enforced in all projects, starting with the
predefined file structure. If you use Git for source control management,
you'll also appreciate that Rails has dropped off appropriate .gitkeep and
.gitignore files.
Dependencies on other Ruby libraries are managed via the Gemfile at the
root of your new project. Open it up and add the following around line
three.
gem 'therubyracer', '~> 0.11.4'
group :development, :test do
  gem 'rspec-rails', '~> 2.13.0'
end
Then, back in the terminal, we'll install these gems with Bundler and run
some post-install magic. We'll use these libraries in the next tutorial,
when we'll learn about behaviour-driven development, but for now they can
be ignored.
$ bundle install
$ rbenv rehash
$ rails generate rspec:install

Explore the MVC


Rails adheres to the MVC (Model/View/Controller) design pattern. This is
an abstract way of thinking about where responsibility for different
aspects of your application lies. Rails maps these layers to three
eponymous folders in /app.
Your models are meaningful collections of data that will generally be very
specific to your application. Users, Messages, Calendars, Tasks and
Projects are suitable model objects. It's up to the Model to handle its
own storage and relationships with other Models, and ensure that its
internal state is both accurate and valid. In our app, Tasks and Projects
will be models.
The views present our data to the end-user. In a Rails app, this normally
means HTML, JSON, XML or PDF. Views should worry about how stuff looks,
but not much else. Rails provides a number of helper libraries for turning
Models into HTML, dealing with internationalisation and so forth. The
middlemen of the stack are the controllers. They deal with interpreting
the incoming request and lining up the correct Models to pass to the View
for display. In our app, the controller decides which Task is to be
operated on, what the operation is and which templates to render.

Rails' scaffold tool might not produce the prettiest pages in the world,
but it's a great way of getting up and running quickly with well-designed
code.

The usefulness of the MVC approach is that it creates
clear areas of responsibility. For example, a Task will have
certain requirements before it can be considered valid (eg, it
must have a title and a due date).
The controller need never know what the requirements are, but it will know
how to ask a task, 'are you valid?' and where to direct the browser as a
result. Similarly, Tasks know
nothing about HTML, but the view knows how to take data
and present it as a form, table and so forth.
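
The division of labour described above can be sketched in plain Ruby,
without any Rails at all. Every class name below is hypothetical and the
validity rule is only the one mentioned in the text (a title and a due
date); Rails' own classes do far more than this.

```ruby
# Model: owns its data and decides for itself whether it is valid.
class SketchTask
  attr_reader :title, :due_at

  def initialize(title, due_at)
    @title = title
    @due_at = due_at
  end

  def valid?
    !title.to_s.strip.empty? && !due_at.nil?
  end
end

# View: only knows how to present data it is handed.
class SketchTaskView
  def render(task)
    "<li>#{task.title} (due #{task.due_at})</li>"
  end
end

# Controller: asks the model "are you valid?" and picks what happens
# next, without knowing what the validity rules actually are.
class SketchTasksController
  def create(title, due_at)
    task = SketchTask.new(title, due_at)
    if task.valid?
      [:redirect, SketchTaskView.new.render(task)]
    else
      [:render_form_again, nil]
    end
  end
end

controller = SketchTasksController.new
ok = controller.create('Write tests', '2014-06-01')
bad = controller.create('', nil)
```

If the rules for a valid task change, only SketchTask needs editing; the
controller and view carry on unchanged, which is the point of the pattern.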

Create your tasks


Let's see how this works out in practice by building our first feature:
$ bundle exec rails generate scaffold task title:string notes:text due_at:datetime done:boolean
This generates a scaffold: a boilerplate version of our MVC stack for the
Task class. This scaffold contains just enough code to enable us to
manipulate a database table called tasks via a Model, a Controller and
some Views.
On seeing the scaffold, most people react in one of two ways: 'OMG! Rails
writes code for me!' or 'Oh, Rails is just a boilerplate generator.' In
fact, Rails is neither of these things. The scaffold is intended as a
training wheel and a tool for getting running quickly.
As its name suggests, scaffolding gives support during construction: it's
not meant to be permanent. What the scaffold does give us is an idiomatic
example of how to write code in Rails.
The scaffold has also generated the code that will build a table in an
SQLite database to store our objects. It does this through migrations:
timestamped Ruby files, which can be found in /db/migrate. Notice how
we've not written anything that ties us to one kind of database. Whether I
were running MySQL, PostgreSQL or SQLite, this migration would generate
the correct SQL to build the table. By describing our tables in Ruby
rather than SQL, we remain adaptable. Let's now build the database and
start Rails' built-in server:
$ bundle exec rake db:migrate
$ bundle exec rails server
Open http://localhost:3000/tasks in your browser to
start interacting with the scaffold by adding, editing and
deleting items.
On the pages for the new and edit actions, you'll see how Rails has read
the types of our database columns and generated appropriate form fields
for each: a checkbox for the boolean, selects for the date, a textarea for
the long text. If you need to quit the Rails server, just press Control+C.
Notice how Rails has also decided your URL structure for you. By default,
Rails adheres to a REST architecture
(http://en.wikipedia.org/wiki/Representational_state_transfer).
In practical terms, this means that all Rails projects automatically share
the same relationships between objects, controllers and URLs. This makes
building open-source extensions easier because all apps will share the
same assumptions about where functionality should live. The file
config/routes.rb is responsible for mapping URLs to controllers, actions
and parameters, as we'll see later. You can do an awful lot with Rails'
routing language, but the default will serve our purposes today.
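
To make those shared relationships concrete, here are the seven
conventional routes that a tasks resource gets in Rails 3.2, written out
as plain data. The recognise method is a hypothetical illustration of the
idea of matching a verb and path to an action; it is not how Rails'
router is implemented.

```ruby
# Verb, path pattern and controller action for a "tasks" resource.
ROUTES = [
  ['GET',    %r{\A/tasks\z},            'index'],
  ['GET',    %r{\A/tasks/new\z},        'new'],
  ['POST',   %r{\A/tasks\z},            'create'],
  ['GET',    %r{\A/tasks/(\d+)\z},      'show'],
  ['GET',    %r{\A/tasks/(\d+)/edit\z}, 'edit'],
  ['PUT',    %r{\A/tasks/(\d+)\z},      'update'],
  ['DELETE', %r{\A/tasks/(\d+)\z},      'destroy']
]

# Returns [action, id] for a verb and path, or nil when nothing matches.
def recognise(verb, path)
  ROUTES.each do |route_verb, pattern, action|
    next unless route_verb == verb
    match = pattern.match(path)
    return [action, match[1]] if match
  end
  nil
end

listing  = recognise('GET', '/tasks')
updating = recognise('PUT', '/tasks/7')
unknown  = recognise('GET', '/projects')
```

Notice that the same path, /tasks/7, leads to show, update or destroy
depending only on the HTTP verb used.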

The model
Open your text editor inside the ./app directory, and we'll start by
examining models/task.rb.
Not a lot there, is there? Rails' ActiveRecord classes take care of all
the interactions with your database and leave you to think about the
behaviour of the object. All our model initially contains is a list of
attributes that can be set by bulk assignment. Rails will infer what
attributes a task object should have, based on the columns of its database
table. We just have to make sure we protect any sensitive fields that
users shouldn't be able to set as they edit records.
Additionally, Rails provides a huge number of commonly needed functions
for database-backed applications, such as the management of inter-model
relationships and of validity. A task isn't much use if you don't know
what it is, so let's make sure we always have a title before allowing
tasks to be saved.
Add the following inside the Task class, anywhere between the first line
and the final end:
validates :title, :presence => true

Rails 4
The most recent version of Rails at the time of writing is 4.1.6, which
includes some exciting features to make developers and users even happier.
These are:
Support for streaming data to the browser.
Turbolinks AJAX goodness to speed up every request.
A common API for background queuing.
And a lot more!
We've used Rails 3 here to ensure that you can complete this series of
tutorials without having to rely on pre-release software. However,
upgrading from 3 to 4 should be relatively easy, and the basic structure
and features of any Rails 3 app won't be too heavily affected. If you're
interested in progressing further with Rails, keep an eye on
www.rubyonrails.org, where you'll be able to find out when new versions
are released.

Rails comes with a built-in web server that's suitable for development
work. Fire it up, visit localhost:3000 and follow what's happening in the
terminal.

Go back to your browser and try creating a new task without a title:
you'll be presented with an error state. The validates method in Rails
allows us to enforce a large number of common requirements on our models
without having to hand-roll code each time we want to make sure a field is
a certain length, a certain state, or present/absent.
Let's add a slightly more clever validation. We shouldn't be able to
create Tasks with due dates that have passed, so edit task.rb, then try to
create a new task that's in the past.
validate :due_at_is_in_the_past
def due_at_is_in_the_past
  errors.add(:due_at, 'is in the past!') if due_at < Time.zone.now
end
That worked, but wait. What happens if we try to mark a previously created
task that's overdue as complete? Go to one of your pre-existing tasks
whose due date has passed and try to save it. Hmm, looks like we'll have
to be a bit smarter and specify that this condition applies only to new
tasks. Alter your code:
validate :due_at_is_in_the_past, :on => :create
Rails now knows to apply this validation only when records are first
created.
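
If it helps to see the logic without the framework, the two checks can be
imitated by hand. This is a rough sketch and not ActiveRecord:
ValidatedTask, persisted and the error strings are all hypothetical, and
the date check here also quietly skips tasks with no due date.

```ruby
class ValidatedTask
  attr_accessor :title, :due_at, :persisted
  attr_reader :errors

  def initialize(title, due_at, persisted = false)
    @title = title
    @due_at = due_at
    @persisted = persisted
    @errors = []
  end

  # The clock is a parameter so the check can be exercised repeatably.
  def valid?(now = Time.now)
    @errors = []
    @errors << 'title is missing' if title.to_s.strip.empty?
    # Rough equivalent of :on => :create: records that have already
    # been saved once are not re-checked against the clock.
    if !persisted && due_at && due_at < now
      @errors << 'due_at is in the past!'
    end
    @errors.empty?
  end
end

now = Time.now
overdue_new = ValidatedTask.new('Late', now - 60)
overdue_old = ValidatedTask.new('Late', now - 60, true)
new_is_valid = overdue_new.valid?(now)
old_is_valid = overdue_old.valid?(now)
```

The brand-new overdue task is rejected, while the identical task that
already exists can still be edited and saved.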

The controller
When you submitted the task form, how did Rails translate it into data and
how did it then know to display the form again or redirect us off to the
show action? To answer that, we look at controllers/tasks_controller.rb.
Each method in our tasks controller becomes the entry point for a user's
browser request. Depending on the URL and the type of request (POST, GET,
etc), Rails' routing system will execute one action on one controller
inside your application and deliver the output back to the browser. Take a
look at the update method:
def update
  @task = Task.find(params[:id])
  respond_to do |format|
    if @task.update_attributes(params[:task])
      format.html { redirect_to @task, notice: ... }
      format.json { head :no_content }
    else
      format.html { render action: "edit" }
      format.json { render json: @task.errors, status: :unprocessable_entity }
    end
  end
end

In not a lot of code we accomplish quite a lot. We start by finding one of
our tasks via the Task class's find method and the id parameter that Rails
has extracted from the URL. We then go into a respond_to block, which is
Rails' way of dealing with different output types. We're not dealing with
JSON here, so focus on the HTML formatters.
First, we call update_attributes on our task object with the data passed
in from the form. This will return either true or false, depending on
whether the task could be saved. If we fail to save the task, usually
because of a validation error, the controller renders our edit action. If
we are successful, the browser is redirected to the @task itself. Rails
assumes that if we ask to redirect to an object rather than a URL then
there must be a show action on a controller named after the object's
class.
Notice here what we didn't have to do. We never accessed the incoming
request directly: Rails automatically parsed it into Ruby objects for us
to manipulate. We didn't have to know about the internals of our task
object. Our controller simply relied on @task to manage its own state, and
instead concerned itself with the result. Finally, by following Rails'
naming conventions we were able to redirect and render as required without
directly calling URLs or template files.
This might not seem like a lot, but consider that pretty much every page
of every web application on the planet is some combination of showing,
editing, updating, creating or destroying database records. Rails provides
just the right level of abstraction to make the whole process almost
seamless. You can always go deeper if need be, but an awful lot can be
achieved without the need to do this.

The view
Finally, let's look at our templating language. Open the views/tasks
directory, and you'll see we have five files: in Rails terms, four are
actions and one is a partial. All are written in Rails' default templating
language, ERB.
Essentially, ERB is just Ruby injected into text files. Although this
means you can technically call any Ruby method in your templates, it would
be exceptionally bad form to do so. Templates should only concern
themselves with the presentation of data. Generally speaking, if you find
more than two consecutive lines of Ruby in a template then something's
wrong.
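
ERB is part of Ruby's standard library, so you can try it without Rails.
The sketch below is only an illustration: TemplateTask and the template
text are made up for the example, and Rails adds its own layer on top of
plain ERB (including escaping output by default, which this does not do).

```ruby
require 'erb'

# A hypothetical two-field task; in Rails this would be a model instance.
TemplateTask = Struct.new(:title, :done)
tasks = [
  TemplateTask.new('Buy milk', false),
  TemplateTask.new('Write tests', true)
]

# <% %> runs Ruby; <%= %> runs Ruby and inserts the result in the output.
template = ERB.new(<<-TEMPLATE)
<ul>
<% tasks.each do |task| %>
  <li><%= task.title %><%= ' (done)' if task.done %></li>
<% end %>
</ul>
TEMPLATE

# binding hands the template access to the local variable `tasks`.
html = template.result(binding)
```

The only Ruby in the template is a loop and two output tags, which is
about as much logic as a view should carry.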

The best place to get more information on Rails is www.rubyonrails.org,
the project's home. Here, you'll find numerous screencasts, links to other
tutorials and more than enough information to take Rails to the next
level.

Inside our edit template, you'll see that it just renders a different
template: the form partial. This makes a lot of sense when you consider
that updating an existing task and creating a new one are essentially the
same process. Why bother having two copies of the same form when we can
just reuse one?
This exposes another principle of Rails: Don't Repeat Yourself, or DRY.
Code should exist in one definitive, canonical place. If we had to
maintain two copies of the same form then one would almost certainly get
out of step with the other. DRY keeps our code clean and our minds
uncluttered.
Look at the _form.html.erb partial. You'll see that the majority of the
template sits inside form_for(@task). This wraps our @task object in a
form helper, which maps our object's data to HTML tags, deals with
displaying error messages and decides which URL our form should point to.

Add a new feature


Now that we know what lives where, let's add a new feature to our app:
tasks should have a priority, and higher-priority tasks should be
displayed first on our index.
First off, we'll need a new field on our model to store the priority in.
We do this by creating a new migration. By following the convention of
ending the name with that of the model, Rails will deal with making sure
the correct table is altered.
$ bundle exec rails generate migration add_priority_to_tasks priority:integer
$ bundle exec rake db:migrate
Next, we'll update our form to allow users to assign a priority. Open up
the form partial and add a new field to expose our priority in HTML:
<div class="field">
  <%= f.label :priority %><br />
  <%= f.text_field :priority %>
</div>
Before we can make use of this new field, tell our model that users are
allowed to make changes to the field directly:
class Task < ActiveRecord::Base
  attr_accessible :done, :due_at, :notes, :title, :priority
Actually, while we're here let's also ensure that people don't try
anything silly, such as assigning the priority 'very' to a task...
validates :priority, :numericality => true
We can now build tasks and ensure that they have a priority. All that's
left is to make them display in the correct order. Tell our controller's
index action to specifically request that the collection is sorted by
priority by changing the first line, where we build @tasks:
@tasks = Task.order(:priority)
Rails provides a very nice syntax for building up quite complex queries on
our database, without having to drop down to SQL. For example, we could
exclude any completed task from our index by changing to:
@tasks = Task.where(:done => false).order(:priority)
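
As a mental model, that chained query asks for the same result as the
plain-Ruby filter and sort below. This is only an analogy: QueryTask is a
hypothetical stand-in, and Rails hands the work to the database as SQL
(roughly a WHERE on done followed by an ORDER BY on priority) rather than
looping over objects in memory.

```ruby
QueryTask = Struct.new(:title, :priority, :done)

all_tasks = [
  QueryTask.new('Tidy desk', 3, false),
  QueryTask.new('Pay bill', 1, false),
  QueryTask.new('Book train', 2, true)
]

# where(:done => false) keeps the unfinished tasks;
# order(:priority) sorts them with the lowest number first.
open_by_priority = all_tasks.
  select { |t| t.done == false }.
  sort_by { |t| t.priority }

titles = open_by_priority.map { |t| t.title }
```

The completed task drops out and the remaining two come back with
priority 1 ahead of priority 3.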
To make sure your tasks are displayed in the correct order, try to get
your index template to explicitly display the priority for each task. At
this point, we've got a functional Rails application and have oriented
ourselves to the correct way of doing things.
In the next tutorial, we'll focus on perhaps the most important part of
Rails' march to happiness: test-driven development. We'll see how to drive
development of features by creating an automated test suite that runs
alongside our application and constantly checks for bugs, design problems
and unnecessary code.

Ruby

Ruby on Rails:
Code testing
Gavin Montague shows how Test Driven Development
can improve your code and catch bugs.

In the previous tutorial we looked at how Rails seeks to optimise
developer productivity and happiness. We took a tour of the Rails
framework and built a basic to-do list application. You can find the
completed app, along with instructions to get it up and running, in the
ZIP file at www.linuxformat.com/files/ca2015.zip.
This time we'll look at how Test Driven Development (TDD) can be used to
build our application, catch bugs and improve code, all things that I
think you'll agree will contribute to programmer happiness.
The TDD methodology is a deceptively simple rethink of how most developers
find themselves working. To see the changes that TDD brings, let's first
look at the 'Test' part. If you were to watch most web developers at work,
they'd constantly skip back and forth between an editor and a browser.
After changing code they'll swap to the browser, reload the page, look for
errors and then hop back to the editor. This isn't very productive
behaviour.
Firstly, it's massively inefficient to be shifting back and forth between
two programs. Not only does it force you to push the code to the back of
your mind as you remember what steps to take in the browser, but it's also
physically slow. Imagine trying to manually test a sign-in system where
each change requires the developer to sign out, clear cookies and start
over.
Secondly, consider all the moving parts between the edited code and the
end result in the browser. If the page doesn't load, what does that mean?
How can we isolate which part of our application has failed? Even worse,
what if it's failed in some silent way that's not immediately apparent
from the front-end?
In TDD we use an automatic test suite to address these issues. Over time a
series of standalone tests is built that can run automatically without any
human input and make a series of assertions to check our code. The tests
are reproducible, so we never have to worry about someone forgetting to
manually check functionality each time they edit the code. In terms of
speeding the process up, a well-written suite can typically run several
hundred tests in less than 10 seconds.
Additionally, the tests we write will be isolated, which means that they
will run without reference to the rest of our application as a whole. This
means that if we break something it will be immediately obvious where the
fault is and we won't have to spend time picking our way back through the
code looking for what's gone wrong. At a higher level, it's even possible
to write tests that actually take control of your browser to exercise the
full stack of your application, with tools like Capybara.


Run tests with guard


You'll quickly get bored of manually triggering your test suites, so why
not do it automatically? You can use the guard gem
(https://github.com/guard/guard) to watch your project and automatically
run the correct subset of tests whenever a change is saved. Add guard's
dependencies to your Gemfile:
gem 'guard'
gem 'guard-rspec'
gem 'libnotify'
Then install them and start guard:
$ bundle install && rbenv rehash
$ bundle exec guard init
$ bundle exec guard
Now, as you save files in your project, guard will intelligently run the
matching parts of your suite. It can also hook into several desktop
notification systems to provide more visual feedback on how you're
progressing.

Your code as documentation


Strangely, very little of a programmer's day is spent writing new stuff.
We spend most of our time reading and rewriting code provided by others or
our past selves. It's therefore important to optimise for readability and
understandability, or to quote Damian Conway: 'Always write your code as
though it will have to be maintained by an angry axe murderer who knows
where you live.'
A test suite can help developers get oriented with new code by giving them
a living, breathing set of specifications to run. This is often easier to
understand than parsing cryptic inline comments that were written weeks
ago and never updated. You'll also find that because we always want to be
able to run our tests in isolation, our methods will be smaller, more
descriptive and have less dependency on other parts, all of which will
help with readability.

Red, green, refactor


All the benefits mentioned so far are what you get for writing tests, but
what about the 'Driven' part of TDD? Well, we call it Test Driven because
the first step of development is to write a test that fails. Hang on, we
start by writing a test? How can we write a test before we have any code?
This is the key feature of Test Driven Development. Before we write any
production code we first define what the code should do by way of a test.
In this context the test serves two purposes: it acts as a target for us
to work towards and as a line over which we don't step. We write just
enough code to pass our current test and then re-evaluate where we are
before either starting on a new test or altering our existing code. As you
build up code through multiple cycles, two trends should naturally emerge.
Your code will become simpler as a result of focusing on
writing just enough to hop to the next green stage. Too many
developers will go off on flights of fancy writing large, tightly
coupled, overly complex methods and classes that become
impossible to debug. A Test Driven system is more likely to
be composed of many tiny interconnected parts that can
all be operated independently and are easy to understand
in isolation.
Additionally, you'll spend less time chasing developmental dead-ends. And
because you start by writing a test that actually exercises the code as
it's finally intended to be used, you'll become much more aware of
dependencies and flaws in the interface design.
The TDD development cycle can be summed up as red,
green, refactor. We start with a failing test: red. We then write
just enough code to make the test pass: green. Finally, we
refactor our new code for maintainability and performance,
safe in the knowledge that if we break anything our suite will
catch it.
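
One full turn of that cycle can be acted out in a few lines of plain Ruby,
with no test framework at all. This is a toy illustration rather than how
RSpec works: the check helper and the Slug module are hypothetical names
invented for the sketch.

```ruby
# Runs a test block and reports red or green. A NameError here means the
# code under test has not been written yet, which also counts as red.
def check(description)
  passed = yield
  passed ? "green: #{description}" : "red: #{description}"
rescue NameError
  "red: #{description} (code not written yet)"
end

the_test = lambda { Slug.make('Hello World') == 'hello-world' }

# Red: the test comes first, before Slug exists.
first_run = check('builds a slug', &the_test)

# Green: just enough code to pass.
module Slug
  def self.make(title)
    title.downcase.gsub(' ', '-')
  end
end
second_run = check('builds a slug', &the_test)

# Refactor: tidy the implementation, then rerun to prove nothing broke.
module Slug
  def self.make(title)
    title.downcase.strip.split(/\s+/).join('-')
  end
end
third_run = check('builds a slug', &the_test)
```

The test itself never changes between the three runs; only the code it
exercises does.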
A full discussion of the merits and drawbacks of Test Driven Development
could fill up an entire bookshelf, but if you're interested in the theory
and evidence behind the technique, I recommend you take a look at this
paper from (dare I say it) Microsoft, which provides a comprehensive
overview (http://research.microsoft.com/en-us/groups/ese/nagappan_tdd.pdf).
But that's enough theory: let's look at TDD in practice.

A simple RSpec example


Rails ships with the Test::Unit library, which is a perfectly decent test
tool, but we'll be using RSpec today because I find its syntax and setup
much easier to get my head around. Technically, RSpec is a Behaviour
Driven Development (BDD) tool, but for our purposes we can ignore the
slight differences between this and classic TDD. Open a new file inside
the previous tutorial's project at lib/hello_bot.rb:
describe HelloBot do
  describe '#greet' do
    it 'says hi to friends' do
      HelloBot.greet('Bob').should == 'Hi, Bob'
    end
  end
end
This shows us the format of an RSpec test. We use Ruby's block syntax and
the describe method to lay out a series of tests, each of which is marked
by the it method. Inside each of our tests we will make an assertion. In
this case we assert that our output should be 'Hi, Bob'. Now run your spec
with the command:
$ bundle exec rspec ./lib/hello_bot.rb
Notice that we're running our test even though we haven't actually defined
the HelloBot module yet. This is how we drive our development: by starting
off writing a test that exercises the code that we haven't yet written, it
becomes easier to imagine how we want to use that code later on. RSpec
will complain that the module doesn't exist, so let's jump forward a few
steps by adding a definition to the top of the file and rerunning the
spec:
module HelloBot
  def self.greet(name)
    "Hi, #{name}"
  end
end

It might not look like much, but if you can see this, congratulations!
You've test-driven your first feature.

When you run the tests a single dot will appear in your terminal,
indicating the completion of a test. Try altering the output of the greet
method and rerunning to see how that alters the output. RSpec has many
configuration options that can be used to print more or less information,
or even to format your results as HTML. These options will become more
useful as your suite gets bigger:
$ bundle exec rspec ./lib/hello_bot.rb -f d --color
$ bundle exec rspec ./lib/hello_bot.rb -f h > ~/Desktop/RSpec.html
$ bundle exec rspec --help
If you find a combination you like, add it to the .rspec file in the
project root to apply it automatically.
I'm going to add another test to our suite for you to complete: our method
should also know how to deal with formal situations. Add this test to your
spec, then try to get back to green. Have a look at the code in the source
archive (www.linuxformat.com/files/ca2015.zip) if you get stuck:
it 'is more formal with people whose names are unknown' do
  HelloBot.greet('Miss Smith').should == 'How do you do, Miss Smith?'
end

Rails and RSpec


In your project's spec directory you'll find an approximate mirror of the
app directory. We subdivide our tests to match the various layers of our
application to make management easier. Running the tests in Rails requires
a bit more orchestration than in our simple example, but this is mostly
taken care of automatically.
To conform to our idea of isolation, RSpec will do its best to ensure that
nothing bleeds between tests. All instance variables, class definitions
and so forth are reset between each test, but because almost all Rails
applications will be backed by a database, we have to get round the fact
that each of our tests might potentially alter our persistent data.
Luckily, Rails handles almost all of this as part of its various
environments, which you can see in the config/environments and
config/database.yml files. Running our tests via rake will ensure that our
test database is created and correctly reset for each test.
Our test suite can be run via rake, either in full or by
providing one of the test subsets, such as models:
$ bundle exec rake spec
$ bundle exec rake spec:models

Fixing a bug
If you run the full suite from the last tutorial's app you'll see that
we're starting with half a dozen errors. The output is too big to print
here, but if you read through it you'll see that the errors relate to our
requirement that a new task can't have a due_at time in the past. If you
try to save a task that has no due_at date set, an exception is thrown.
Let's fix that by opening spec/models/task_spec.rb and adding a failing
test:
describe '#due_at_is_in_the_past' do
  it "doesn't throw an exception if due_at is nil" do
    lambda {
      Task.new(:due_at => nil).due_at_is_in_the_past
    }.should_not raise_error
  end
end
Here we use a different expectation to trap any raised exception and
report it back as a test failure. In this test we're doing more than just
capturing the bug in code: we're also suggesting the correct outcome.
If a task isn't supplied with a due_at value then it should simply save
nil. Run this test with rake spec:models and watch it explode. Our tests
are failing because our record tries to compare nil with a Time object: a
no-no in Ruby. We now adjust our Task class:
def due_at_is_in_the_past
  errors.add(:due_at, 'is in the past!') if due_at && due_at < Time.zone.now
end
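
You can reproduce the underlying failure in plain Ruby, away from Rails.
The snippet below is a small sketch using Time.now in place of
Time.zone.now; the in_the_past lambda is a hypothetical name for the
guarded comparison.

```ruby
# The unguarded comparison: nil has no < method, so Ruby raises.
due_at = nil
error_class = begin
  due_at < Time.now
  nil
rescue NoMethodError => e
  e.class
end

# The guarded form short-circuits, so the comparison never runs
# when the time is nil.
in_the_past = lambda { |time| !!(time && time < Time.now) }

nil_result    = in_the_past.call(nil)
past_result   = in_the_past.call(Time.now - 60)
future_result = in_the_past.call(Time.now + 60)
```

The && operator stops evaluating as soon as its left side is nil, which is
why the guard is enough to avoid the exception.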

Uncle Bob's three rules of TDD
Bob Martin, known by the community as 'Uncle Bob', is one of the clearest
voices on what good TDD looks like. There are many pearls of wisdom on his
site (www.cleancoder.com), but my personal favourites are his Three Rules
for Test Driven Development:
1 You are not allowed to write any production code unless it is to make a
failing unit test pass.
2 You are not allowed to write any more of a unit test than is sufficient
to fail; and compilation failures are failures.
3 You are not allowed to write any more production code than is sufficient
to pass the one failing unit test.
If you can stay within these boundaries when practising TDD, you won't go
far wrong.

(Fig 1) RSpec can format the results of your tests in a variety of
readable formats, including HTML.

Our test now passes! There's not much to refactor so we'll skip that step
and run our full suite. It's good practice to do this after each
successful cycle to make sure our changes haven't broken any other part of
the app.
In order to better prioritise our time, items that are due soon should
appear in red on our index. To implement this we'll first add the concept
of due_soon? to our model. As before, start by writing a test in
task_spec.rb that expresses the code we want to be able to call:
describe '#due_soon?' do
  it 'is true if due in less than 24 hours' do
    task = Task.new(:due_at => Time.zone.now + 23.hours)
    task.should be_due_soon
  end
  it 'is false if due in more than 24 hours' do
    task = Task.new(:due_at => Time.zone.now + 25.hours)
    task.should_not be_due_soon
  end
end
There's a little bit of RSpec magic being used here to make your tests a
little more readable: it understands that be_due_soon means that the
method due_soon? should return true (or false, when used with should_not).
Next, update the Task class in app/models/task.rb:
def due_soon?
  due_at < Time.zone.now + 24.hours
end
This passes, but recall that our due_at attribute is
allowed to be nil. We should add a test to describe what
should happen here:
it "is false if no due date is set" do
  task = Task.new(:due_at => nil)
  task.should_not be_due_soon
end
As it did earlier, the test fails because we can't compare nil
with a Time object. We'd best amend our method:
def due_soon?
  return false unless due_at
  due_at < Time.zone.now + 24.hours
end
This will give us green tests in our model layer and we can
rerun the full suite. Try extending your tests to cope with the
specification "A completed task can never be due soon". A
solution is available in the sample app.
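
If you'd like to poke at the due-soon rule outside Rails, it can be sketched in plain Ruby. This is only an illustration under assumptions: SketchTask is a made-up stand-in for the Task model, Time.now replaces Time.zone.now, and 24.hours is spelled out in seconds because ActiveSupport isn't loaded.

```ruby
# Plain-Ruby stand-in for the Rails model: no ActiveRecord, no ActiveSupport.
DAY_IN_SECONDS = 24 * 60 * 60

# SketchTask is a hypothetical name; the real app uses the Task model.
SketchTask = Struct.new(:title, :due_at) do
  # `now` can be passed in so the checks do not depend on the wall clock.
  def due_soon?(now = Time.now)
    return false unless due_at   # a task with no due date is never due soon
    due_at < now + DAY_IN_SECONDS
  end
end

now = Time.at(0)
puts SketchTask.new("a", now + 23 * 3600).due_soon?(now)  # due in 23 hours
puts SketchTask.new("b", now + 25 * 3600).due_soon?(now)  # due in 25 hours
puts SketchTask.new("c", nil).due_soon?(now)              # no due date set
```

Passing the current time in as an argument is a design choice for the sketch: it keeps the three cases repeatable, which is the same property we want from the specs.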

Helper tests
It's bad practice to put anything other than the most
minimal flow control in templates, so we'll put our
formatting logic in a helper. Add a failing test to
spec/helpers/tasks_helper_spec.rb:
describe "task_title_formatter(task)" do
  before do
    @task = Task.new(:title => "task")
  end
  it "adds a due css class to tasks which are due_soon" do
    @task.due_at = Time.zone.now
    task_title_formatter(@task).should == "<span class='due'>task</span>"
  end
  it "adds no extra classes to tasks which aren't due_soon" do
    @task.due_at = nil
    task_title_formatter(@task).should == "<span>task</span>"
  end
end
Again, I'd recommend you add the tests one at a time and
try to develop incrementally towards a solution, which should
look something like this:
module TasksHelper
  def task_title_formatter(task)
    if task.due_soon?
      "<span class='due'>#{task.title}</span>".html_safe
    else
      "<span>#{task.title}</span>".html_safe
    end
  end
end
Finally, you will need to update your index.html.erb view
to call task_title_formatter(task) and add a .due rule to your
application.css.scss file (we'll look at why this isn't just a
plain CSS file in the next and final tutorial). If you start up the
Rails server your index page should now look something like
the grab on p59.
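
One caveat is worth a quick sketch. Calling html_safe tells Rails not to escape the string, so a title that contains markup would be sent to the browser verbatim. The plain-Ruby version below escapes the title first using ERB::Util.html_escape from the standard library. The name sketch_title_formatter and its two-argument signature are invented for this illustration so that it runs without Rails.

```ruby
require "erb"

# Hypothetical plain-Ruby version of the helper: it takes the title and the
# due-soon flag directly instead of a Task, so it runs without Rails.
def sketch_title_formatter(title, due_soon)
  safe_title = ERB::Util.html_escape(title)   # neutralise <, >, & and quotes
  if due_soon
    "<span class='due'>#{safe_title}</span>"
  else
    "<span>#{safe_title}</span>"
  end
end

puts sketch_title_formatter("Buy milk", true)
puts sketch_title_formatter("<b>Buy milk</b>", false)
```

In the Rails helper the equivalent step would be to escape task.title before interpolating it, then mark the finished string as html_safe.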

Controller test
Notice that our new feature didn't actually need to be tested
in either the controller or the view. Rails tends towards a
style of design known as "Fat Model, Skinny Controller".
Where we have custom functionality it's often pushed down
to the model or, for presentation data, out to a helper. A
well-designed Rails controller will usually contain very little
code because all the web-specific stuff, like session-handling,
URL parsing and header generation, is handled automatically
by Rails. That said, it's the controller's responsibility to
manage the user's login status, permissions etc and these
should be tested thoroughly.
Once a task has been completed it should no longer
be editable through the web interface. Let's drive out this
feature in two parts: we'll remove the edit links from our
task's show page and stop our controller from allowing
updates. We'll start with the controller's spec.
Go to the description of our update method around line
105 of tasks_controller_spec.rb and start a new nested
describe block:
describe "PUT update" do
  # (the existing update specs stay as they are)
  describe "where the task has already been completed" do
    before do
      @task = Task.create! valid_attributes
      @task.update_attribute :done, true
    end
    it "does not update the task" do
      # The expectation must be set before the request is made
      Task.any_instance.should_not_receive(:update_attributes)
      put :update, {:id => @task.to_param, :task => { "title" => "t" }}, valid_session
    end
    it "redirects the user back to the index" do
      put :update, {:id => @task.to_param, :task => { "title" => "t" }}, valid_session
      response.should redirect_to(:action => :index)
    end
  end
end

To find out more about RSpec, your best resource is the
project documentation at www.relishapp.com/RSpec.

Our tests here are slightly different from before.
Remember I said that a controller's main job is to
orchestrate other objects. This means that we're not so
much interested in the outcome of some actions, but
whether the actions trigger other events. In our first test we
attach an expectation directly onto our Task class using a
mocking library. A mock object can be used in place of a real
one, but may have additional behaviour. If you were testing a
library to transfer money between bank accounts it could get
pretty expensive to run your tests against real APIs. Instead
you would mock out the various responses the bank could
give (ok.xml, fail.xml etc) and run tests against them. Here
we use a mock to make sure that no update_attributes call
is made to any task.
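
To make the idea of a mock less abstract, here is a hand-rolled one in plain Ruby. It is a sketch rather than how RSpec's mocking library works internally, and every name in it (MockBank, pay_rent and so on) is invented for the example. The mock returns a canned response and records each call, so a test can check that a call was, or wasn't, made.

```ruby
# A hand-rolled mock: it stands in for a real bank client and records
# every call it receives. All names here are hypothetical.
class MockBank
  attr_reader :calls

  def initialize(canned_response)
    @canned_response = canned_response
    @calls = []
  end

  def transfer(from, to, amount)
    @calls << [:transfer, from, to, amount]
    @canned_response            # e.g. :ok or :fail, no network involved
  end
end

# The code under test only talks to whatever "bank" it is given.
def pay_rent(bank, amount)
  return :skipped if amount <= 0   # nothing to pay, so never call the bank
  bank.transfer("me", "landlord", amount)
end

bank = MockBank.new(:ok)
pay_rent(bank, 0)
puts bank.calls.empty?     # did the mock see no calls?
pay_rent(bank, 500)
puts bank.calls.length     # how many calls has it seen now?
```

Checking bank.calls after the fact plays the same role as should_not_receive in our controller spec: the assertion is about which messages were sent, not about the final state.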
You can alter the update method in tasks_controller.rb
to pass the tests:
respond_to do |format|
  if @task.done?
    format.html { redirect_to tasks_url, notice: "Completed tasks can't be changed." }
  elsif @task.update_attributes(params[:task])

View testing
Finally, we can test our view. Open up show.html.erb_spec.rb
and add a test against a completed task:
it "doesn't link to edit on complete tasks" do
  @task.done = true
  render
  rendered.should_not match(/Edit/)
end
That test will fail, so make it pass by updating
show.html.erb:
<%= link_to 'Edit', edit_task_path(@task) unless @task.done? %>
Congratulations, you've now test-driven your second
feature. Try extending the behaviour to not show edit links on
the index page. Remember to write your tests first.
We've only grazed the surface of TDD here, but I hope it's
given you some idea of how useful it can be in helping your
development workflow. The key to getting the greatest value
from it is to apply it from the very start of your project and to
avoid the temptation to skip steps of the Red-Green-Refactor
loop. Over time, you'll get faster at writing tests and faster at
deciding what tests should be written, and that's the path to
TDD happiness.
In the next tutorial we'll look at how Rails simplifies the
client-side aspects of web development with the asset
pipeline and the JavaScript compiler CoffeeScript.


Ruby on Rails:
Site optimisation
Gavin Montague shows how Rails, with help from Ajax,
CoffeeScript and SASS, can help front-end development.

Over the past two tutorials we've looked at how Ruby
on Rails optimises developer happiness by providing
sensible defaults and enforcing best practice in its
users. In part one we took a tour of a simple Rails application,
explored the MVC pattern and built our basic to-do list manager
in a matter of minutes with the Scaffold tool.
In the previous guide, we looked at how Rails uses Test
Driven Development to produce higher quality code in less
time. This time we'll turn our attention to the client side of
web development. Although Rails is a server-side tool it is
remarkably opinionated about how one should build a
front-end. We'll see how Rails removes a lot of the friction of
working with JavaScript and CSS, and how best practice for
handling static assets is incorporated into the framework.

JavaScript has a mixed reputation because of its occasional
eccentricities. CoffeeScript tries to fix them and you can try
it out for yourself in the browser.


Unobtrusive JavaScript
Rails was one of the first web frameworks to integrate Ajax
into its core. This made for faster, more responsive sites,
but the markup it generated wasn't notably good quality.
That all changed in Rails 3, which generates clean, semantic
markup and integrates with a range of JavaScript frameworks
in an unobtrusive manner. Let's look at how easy it is to add
Ajax interactivity to our to-do application. Before we get
started, you'll need a good set of front-end developer tools.
Debugging JavaScript can be a frustrating task, but it's
much more bearable if you have the right tools for the job.
Install an up-to-date version of Firefox and the Firebug toolbar
(www.getfirebug.com). If you haven't used Firebug before,
refer to its dedicated box (see Debugging JavaScript, p64)
for usage instructions.
At present, deleting a task in our application triggers a full
page reload. You can see this happening with the Firebug Net
Viewer. Our application must rebuild the whole page, send it
back to the client and then have the client render it. How
wasteful! It would be much nicer to deal with the deletion as
Ajax and remove the offending task from the current page.
In many frameworks we'd have to build this from the
ground up. Rails has support for this kind of action baked in,
which saves each developer from reinventing the wheel.

Ajax on Rails
When you delete a task from the list, you'll be met with a
JavaScript confirmation before you can proceed. But where
does this come from? Inspect the page source and you'll see
that each delete link contains:
<a [...] data-confirm="Are you sure?">delete</a>
Rails uses HTML5 data attributes to describe behaviours
to a JavaScript framework (in this case jQuery) in an
unobtrusive way. The markup of the page isn't littered with
script tags or inline onclick attributes. This behaviour is
injected afterwards, as is best practice.
In the <head> of your page you'll see our scaffold has
already included the jQuery and jquery_ujs libraries, which deal
with setting up and monitoring the behaviour indicated by
these extra attributes. If you haven't used jQuery before, I'd
recommend taking a look at the official tutorial
(http://bit.ly/13t9K7S). If you have a preferred JavaScript
framework, there's likely to be an analogous adaptor library
available for use with Rails.
To let jQuery know that our delete links should use Ajax we
only need to make two small changes to our code. Update the
delete link to include:
link_to 'Destroy', task, remote: true, method: :delete, data: { confirm: 'Are you sure?' }
In your tasks controller, flip the respond_to block inside
the destroy method so that the JSON format comes first:
respond_to do |format|
  format.json { head :no_content }
  format.html { redirect_to tasks_url }
end
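
Why should the order matter? My understanding, which you should treat as an assumption rather than a description of Rails internals, is that when a client says it will accept anything, the first format declared in the block wins. The toy dispatcher below models only that one rule in plain Ruby. ToyResponder and its methods are invented for the sketch.

```ruby
# Toy model of a respond_to block: handlers are kept in declaration order.
# ToyResponder is a made-up class, not part of Rails.
class ToyResponder
  def initialize
    @handlers = []                      # [format, block] pairs, in order
  end

  def on(format, &block)
    @handlers << [format, block]
    self
  end

  # accepted: formats the client asked for, in order; :any stands for "*/*"
  def respond(accepted)
    return :not_acceptable if @handlers.empty?
    accepted.each do |wanted|
      return @handlers.first[1].call if wanted == :any
      match = @handlers.find { |format, _| format == wanted }
      return match[1].call if match
    end
    :not_acceptable
  end
end

responder = ToyResponder.new
responder.on(:json) { "204 No Content" }
responder.on(:html) { "302 redirect to /tasks" }

puts responder.respond([:html])   # a request that asks for HTML
puts responder.respond([:any])    # a request that accepts anything
```

With json declared first, the accept-anything request gets the empty response while a request for HTML still gets its redirect.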

Reload the page and the rendered links will now contain a
data-remote attribute. This tells jQuery to intercept any clicks
on the link and submit the request via an XMLHttpRequest
instead. Open the network console in Firebug then delete a
task. You'll see the request being fired off in the background.
Refresh the page and the task will be gone, but that's not
the most user-friendly behaviour. It would be better if our
user got instant visual feedback. Create a new file at
app/assets/javascripts/ajax_tasks.js and add:
$(document).on('ajax:success', '.index-table a[data-method=delete]', function() {
  return $(this).closest('tr').fadeOut();
});
If you've used jQuery before, this should look relatively
familiar. We tell our document to watch for any successful
Ajax pseudo-event that originates from links with the delete
data-method. When we receive this event we fade out the
table row around the originating link. Save your changes,
refresh the page and try deleting a task.

CoffeeScript & SASS everywhere!
If you're keen on trying out CoffeeScript or
SASS, but not fortunate enough to be into Ruby
on Rails, no problem! Both are standalone tools
that can be used in any project, albeit without
the built-in integration that Rails provides. SASS
is distributed as a Ruby Gem and can usually be
installed system-wide with one line:
$ sudo gem install sass
Once it's installed you simply have to tell
SASS which scss files you want it to watch and
what to call the generated output. This, for
example, is how you tell it to watch mobile.scss:
$ sass --watch mobile.scss:mobile.css
SASS can be left running in the background
while you work on the scss file: it will output a
new version of the stylesheet immediately after
each save.
The installation of CoffeeScript is a little more
involved. Although the actual compiler can be
run in any JavaScript environment, you'll need
node.js and its package manager, npm, to run
the command-line tool. Installing node.js itself
will vary between platforms and your best guide
is the project's site (http://nodejs.org/download).
With that in place:
$ npm install -g coffee-script
$ coffee --watch --compile application.coffee
As with SASS, the Coffee command-line tool
can be left to monitor all the required files for
changes in the background.

Rails requests
At this point it's worth looking at how Rails deals with
different kinds of requests when rendering output. Open your
tasks_controller and focus on the destroy method we altered.
In the tutorial on p54, I said that the job of the controller is
twofold: it lines up the correct operations on the models and
then decides what output should be rendered. Generally
these two halves are independent of one another and Rails'
respond_to syntax gives us a neat way to organise our code.
The destroy method starts by dealing with the
model-wrangling: we find the correct task and we destroy it.
In the respond_to block we then tell Rails how different kinds
of client should be handled. In our full-page refresh the
browser request for HTML is handled as a redirect back to the
index page. Our Ajax request is handled by sending back a
successful, but empty, HTTP response.
Developing APIs in Rails for other consumers is often
as simple as tacking on an extra respond_to formatter.
For example, if the output is to be consumed by a desktop
program then HTML wouldn't be suitable, but JSON would
be ideal. Try visiting /tasks.json and /tasks/<id>.json to
see how the respond_to block in each of these methods
automatically generates JSON-formatted output. Looking at
how the JSON is generated, try getting your ToDo
controller to respond with XML output.
The Ajax support in Rails is useful for simple functionality.
But at some point in a complex application you'll have to write
a chunk of complex JavaScript and, if you're anything like me,
that is not a happy thought. JavaScript is a very good
language and when it comes to client-side interactivity it's the
only game in town. However, it does have some eccentricities,
particularly in the eyes of Ruby and Python developers:
- Trailing semicolons are sort of required, but not really.
- Variables are global by default unless declared with var.
- Return values must be explicit (in Ruby, all methods return
  values implicitly).
- C-style control structures for loops and decisions can feel
  quite messy.

Using CoffeeScript
None of these things make JavaScript a bad programming
language per se, but it's a language that a lot of
developers don't look forward to working with. That's why we
have CoffeeScript, the precompiler for JavaScript.
To quote its website: "Underneath that awkward Java-esque
patina, JavaScript has always had a gorgeous heart.
CoffeeScript is an attempt to expose the good parts of
JavaScript in a simple way." The CoffeeScript
project isn't part of Rails, but out of the box, our to-do app
has been set up to work seamlessly with the compiler.
Let's look at a small piece of CoffeeScript to demonstrate
its key features. Create a new file at
app/assets/javascripts/congratulations.js.coffee:
@congratulation_bot =
  messages: ['Well done', 'Great job', '', 'Top stuff']
  name: 'name'
  congratulate: (message) ->
    alert message + ' ' + @name unless message.length is 0
  overcongratulate: ->
    for message in @messages
      @congratulate message
By amending our Ajax deletion method from earlier, we
can receive the praise of our congratulation bot each time we
delete a task.
window.congratulation_bot.name = 'Gavin';
$(document).on('ajax:success', '.index-table a[data-method=delete]', function() {
  window.congratulation_bot.overcongratulate();
  return $(this).closest('tr').fadeOut();
});
On the next refresh, Rails will automatically compile the
CoffeeScript and serve the JavaScript output in its place.
This happens seamlessly and you'll never have to worry
about making sure an up-to-date version is being served.
To see how CoffeeScript's syntax breaks down, go to
http://coffeescript.org and paste our code into the online
compiler. You'll see the output appear on the right-hand side.
It's slightly easier to experiment online with the code than
reloading pages in your app. Let's look at some of the
differences CoffeeScript brings.

CoffeeScript features

If you're debugging JavaScript or examining the dialogue
between the browser and the server, Firefox's Firebug
extension is a must-have tool.

First, like Python, CoffeeScript is whitespace-sensitive. This is
becoming a feature of more modern languages. Given that all
developers indent their code for readability, why not make a
feature of it and lose the curly brackets?
The ungainly for-loop is represented in CoffeeScript as
the much more readable for message in @messages, which
should be familiar to Ruby developers. We also get access to
Ruby-style suffix flow control and the unless keyword, which
serves as a negative if.
Also, as a nod to Ruby, we can use @ as a shortcut for the
keyword this when referencing a scoped variable.
The most ubiquitous feature of CoffeeScript is probably
->, the so-called dash-rocket that replaces the function
keyword. Most JavaScript rapidly becomes a mess of
callbacks and nested functions, which become difficult to
read. The dash-rocket is far more readable.
There are other lumps of syntactic sugar to sweeten our
code, such as testing for the existence of a variable with ?,
Python-style chained comparison and multi-line strings.
CoffeeScript's inclusion in Rails is somewhat divisive, but I
find it to be a welcome addition.
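
For comparison, here is a rough Ruby rendering of the congratulation bot, showing where the for...in loop, the suffix unless and the @ prefix come from. It's a sketch for side-by-side reading only: the CongratulationBot class is invented here, and it returns strings instead of calling alert, since there's no browser involved.

```ruby
# Ruby rendering of the congratulation bot, for comparison only.
class CongratulationBot
  attr_accessor :name

  def initialize(name)
    @name = name                       # CoffeeScript's @name mirrors this
    @messages = ["Well done", "Great job", "", "Top stuff"]
  end

  def congratulate(message)
    # Suffix `unless`, as in the CoffeeScript version; nil when skipped.
    "#{message} #{@name}" unless message.empty?
  end

  def overcongratulate
    praise = []
    for message in @messages           # Ruby's for...in loop
      line = congratulate(message)
      praise << line if line           # the empty message produced nothing
    end
    praise
  end
end

puts CongratulationBot.new("Gavin").overcongratulate
```

Most Ruby developers would reach for each rather than for...in, but the loop is written this way to keep it as close as possible to the CoffeeScript.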
The project's website contains a more detailed
walkthrough of the language's features, but its basics can
be picked up in a few minutes. You could start by trying to
replace our unobtrusive JavaScript delete method with its
CoffeeScript equivalent.
In the same way CoffeeScript extends JavaScript to
increase programmer productivity, Rails also includes SASS
to improve our relationship with stylesheets. Anyone who's
ever tried to wrangle the stylesheets for a large web project
will probably have encountered the following problems:
- CSS selector structure leads to a lot of duplication when
  targeting related nested elements.
- The only way to share style between unrelated elements is
  via classes, which results in either duplication of styles or
  presentational class names.
- There's no way to show relationships between numbers:
  how can we show that the margin should always be twice the
  line-height?
- Although CSS keywords exist for some colours, there's no
  way to specify our own, such as error-red or logo-blue.

SASS
SASS extends CSS with variables and nesting, which we'll
see below, along with mixins (reusable chunks of CSS),
mathematical operations (for example, "set the margin to
line-height/2") and even colour maths (for example, "set
the background to halfway between blue and green").
The true value of SASS doesn't become apparent until
you're managing large stylesheets and sadly our little
application doesn't quite meet that criterion. We'll just look
at two of the most important features of SASS: nesting and
variables. Append the following to
app/assets/stylesheets/tasks.css.scss and then reload the
page to see it in action.
$header_color: #0000FF;
$danger_color: #FF0000;
.fancy_text {
  font-weight: bold;
  color: $header_color;
}
h1 { @extend .fancy_text; }
#error_explanation {
  h2 { background: $danger_color; }
}
table {
  th { @extend .fancy_text; }
  a[data-method=delete] {
    color: $danger_color;
  }
}
With SASS we are able to nest CSS selectors and use this to
organise our styles into a more visible hierarchy. We also gain
the ability to extract common colours as meaningfully named
variables, rather than having to always refer back to an
external style guide to remind us which shade of red is which.
The SASS website has excellent documentation on all the
language's features. Even if you don't use Rails, I'd urge you to
have a read and see how you can incorporate it into your web
framework of choice.

Debugging JavaScript
Developing JavaScript and Ajax behaviour is
best done with a good client-side extension to
your browser, such as Firebug for Firefox. Install
Firebug as you would any Firefox extension, and
then press F12 to bring up its extensive set of
tools. The Console and Net tabs are of most
interest to us. Both of these are disabled by
default, but you can click on their little black
triangle icons to activate them.
The Console gives you access to a JavaScript
REPL that runs within the scope of your page.
Fire it up when your browser is pointed at a page
within our Rails app, and start to type jQuery
into the bottom bar. Firebug will autocomplete
function and object names, show you the output
of operations, and is a generally invaluable help
when trying to work out what's going on inside a
particular page.
When debugging Ajax it's often difficult to
visualise why problems occur. For instance, is a
failure because an Ajax request isn't firing or
because it's not being answered by the server?
Sometimes you'll find it's something more
arcane. The content-type, maybe. The best tool
to use here is the Net Monitor.
Enable the Net Monitor and reload the
page. As resources come in you'll see them
represented as rows showing the request, the
status and the loading time. Opening up any
of these rows will enable you to drill down and
examine every detail of the request, from the
outgoing headers through to the response from
the server.

Speed is important. Users have a spectacularly low
attention span and any lag in your site will cost you visitors.
Amazon famously established a relationship between
response time and income where a 100 millisecond reduction
in load time resulted in a 1% bump in revenue. While you
should always try to optimise the server-side component (by
correctly indexing your database, caching templates etc), a
more substantial improvement often comes from optimising
how a browser will render your site. Rails provides a set of
built-in features (collectively called the Asset Pipeline)
which attempt to maximise the delivery speed of your pages.

Optimise website rendering
To demonstrate how Rails helps us, we'll need to alter our
server settings. By default, Rails runs in development mode,
where speed is sacrificed in favour of clarity: more is logged,
files are delivered unadulterated, and much of the
environment is reloaded on every request. Start the server in
development mode, as you have done to date, and save the
task index's HTML source from your browser out to a file.
We'll use this for comparison as we examine the optimisations
that are contained in production mode.
The built-in web server Rails uses is not at all
suitable for running in a production environment, but we can
use it here. Open up the config/environments/production.rb
file and remove the comment from the following line to
make our Rails server handle static content.
config.serve_static_assets = true
You'll then need to build a new database for the
production environment, prepare your assets for delivery and
start the server:
$ RAILS_ENV=production bundle exec rake db:migrate
$ bundle exec rake assets:precompile
$ bundle exec rails server -e production
One thing to note about the production environment is
that changes made to the app won't take effect until the
server is restarted. When you're done with this section,
remember to drop back into development mode. Let's look at
some of the differences between our development and
production pages.

Get faster load times
Comparing the <head> of your development and production
pages, you'll see that the production version contains far
fewer resource declarations. Where the development
environment loaded in each of our scripts and stylesheets
separately, our production page has compressed all these
down to just two files.
Rails allows you to specify manifests of nested
JavaScript/CSS files, as shown in your application.js/css
files. The development environment expands this out to
include the individual files, but production will concatenate
them all into a single lump. A surprising amount of a page's
loading time is spent waiting for the HTTP overhead
associated with requesting each resource on a page.
Fewer files means fewer handshakes and faster load times.
In addition to concatenating files, the pipeline also attempts
to minimise the file size of the grouped file.

Achieve better caching
By default Rails will remove non-essential white space from
JavaScript and CSS, but it's also possible to use more
destructive compressors that will rewrite JavaScript to use
shorter variable names and various other optimisation
methods.
On our production page you'll notice that the filenames of
both your CSS and JavaScript files have been suffixed with
seemingly random strings, eg
application-27b252c669ac588cef435fa3d3e8aebf.css
These suffixes are MD5 hashes of the file contents.
By tying the name of our files directly to the content that they
contain, we can aggressively cache them on the browser
without any fear of serving stale content to a client.
Any change to the file will result in a change to the hash, thus
causing a cache-miss in the browser which will, in turn,
request the updated file. If the file's contents do not change
between versions then a correctly configured web server can
make sure the browser never has to download the file more
than once.
As with SASS and CoffeeScript, Rails provides developers
with a sensible set of defaults for handling asset delivery.
By enforcing this at the level of the framework, it ensures that
all Rails projects automatically incorporate a decent set of
optimisations without any intervention by the developer.
I hope you've enjoyed this three-part tour of Rails.
By necessity we've skipped over much of what Rails can do in
order to focus on a few of the aspects that distinguish it from
other frameworks. There's a huge amount left to explore,
including how ActiveRecord makes working with complex
database relationships trivial, the wealth of community tools
available, and much, much more.
If you'd like to learn more about Rails, pick up the
Pragmatic Programmers' book Agile Web Development With
Rails by David Thomas or follow the popular RailsCasts
screencast series (http://railscasts.com). Good luck with
your future Rails projects and I hope they leave you full of
programmer happiness!

Rails ships with support for SASS (Syntactically Awesome
Stylesheets), a rethink of how Cascading Stylesheets
ought to work.


Lesser-known Languages
More languages

This section contains a potpourri of languages: some that are
cutting edge and some established. But they each offer a
unique insight into how code can be constructed and
organised, offering many techniques you can take
back to your language of choice.

C and beyond: Code a starfield ... 68
Scheme: Learn the basics ... 72
Scheme: Recursion ... 76
Scheme: High order procedures ... 80

C and beyond: Code a starfield
Mike Saunders takes you on a whirlwind tour of three programming
languages and toolkits, showing how to make a starfield effect in each.

We've taken a good look at Ruby, but how many
programming languages do you need to know to
be a really good coder? It's not an exact science,
but we'd say three or four. Of course, you can learn a single
language inside-out and become an absolute genius at it, but
no programming language does everything perfectly, so you'd
still be missing out on other features and ideas. It's a bit like
human languages: even if you can speak English wonderfully
well, there's a lot to be gained from learning one of its
relatives, such as German or French. You pick up more than
just replacement words: you learn to think in a different way,
and discover a new culture in the process.
So a really great hacker usually knows a handful of
different programming languages. This is why we recommend
that everyone, even hobbyist coders, should try a few
different languages at some point, as they all have something
to contribute. Learn a low-level language and you'll discover
a lot about memory management and pointers, for instance;
and with a high-level language you can manage complicated
algorithms more easily. Once you're comfortable with
a bunch of languages, you can always pick the right one for
the job (too many coders become experts at just a single
language, such as C++, and then try to solve every single
problem using it).
So with this in mind, we decided to make this tutorial
about multiple languages. But there's a twist: we're going to
do the same thing in each. That might sound a bit pointless,
but it's a really good way to discover how these languages
work in action, and how identical goals are achieved using
their varying approaches. We're not going to provide lengthy,
meandering introductions to the languages; the best thing
you can do is read the code and our short explanations,
and then start hacking around with it yourself. So, without
further ado...

Low-level: C and SDL
We're going to write a parallax starfield simulation, which
shows a bunch of stars scrolling across the screen at varying
speeds. It's the sort of thing you can use in a screensaver, or
as the background for a side-scrolling shoot-'em-up game.
Most importantly, it shows how to achieve a number of things
in the language: creating arrays, performing maths, doing
loops, plotting pixels and so forth.
Now, we're starting off at a pretty low level in the form of C
and SDL. C is regarded by many as a portable assembly
language, and doesn't include hand-holding features of
higher-level languages, such as garbage collection. With C,
you can interact more closely with memory and hardware,
and that's ideal for certain tasks. SDL (Simple DirectMedia
Layer), meanwhile, is a very popular multimedia library that
enables you to work with images, fonts and sounds. It's not
the easiest library to use when compared with game
development kits, but like C it gives you plenty of control.
So we'll start with this happy couple, looking at the "raw"
way of making a starfield, and then move on to some
high-level alternatives later. Here's the code (you can find it
on the coverdisc as starfield.c; open the c_with_sdl directory
inside starfield.tgz).
#include <stdlib.h>
#include <SDL.h>

#define MAX_STARS 100

typedef struct
{
    int x, y, speed;
} star_type;

star_type stars[MAX_STARS];

int main()
{
    int i;
    Uint8 *p;
    SDL_Surface *screen;

    SDL_Init(SDL_INIT_VIDEO);
    atexit(SDL_Quit);
    screen = SDL_SetVideoMode(640, 480, 8, SDL_SWSURFACE);

    for(i = 0; i < MAX_STARS; i++) {
        stars[i].x = rand()%640;
        stars[i].y = rand()%480;
        stars[i].speed = 1 + rand()%16;
    }

    for(i = 0; i < SDL_NUMEVENTS; i++) {
        if (i != SDL_QUIT)
            SDL_EventState(i, SDL_IGNORE);
    }

    while (SDL_PollEvent(NULL) == 0) {
        SDL_FillRect(screen, NULL, SDL_MapRGB(screen->format, 0, 0, 0));
        for(i = 0; i < MAX_STARS; i++) {
            stars[i].x -= stars[i].speed;
            if(stars[i].x <= 0)
                stars[i].x = 639; /* last visible column; 640 would fall outside the row */
            p = (Uint8 *) screen->pixels + stars[i].y * screen->pitch + stars[i].x * screen->format->BytesPerPixel;
            *p = 255;
        }
        SDL_UpdateRect(screen, 0, 0, 0, 0);
        SDL_Delay(30);
    }
    return 0;
}
Not bad, eh? In just over 50 lines of code, we have a
snazzy starfield effect, all using plain C and SDL without any
fancy layers on top. To compile the binary from this, enter:
gcc -o starfield starfield.c `sdl-config --cflags` `sdl-config
--libs`
Note the backtick characters here the key to generate
them will probably be at the top-left of your keyboard.
Basically, anything inside backticks is treated as a command,
and the output from them is generated before the complete
command (from gcc onwards) is processed. So if you type
sdl-config --libs on its own, for instance, you can see the
compiler parameters required to build SDL programs. To run
the compiled program, enter ./starfield.
Anyway, let's look at the code. The first two lines simply tell
the compiler that we want to include header files for the
standard library (so that we can generate random numbers
later), and SDL (so that we can use routines from the library).
Then we have a #define line, which tells GCC that all
instances of MAX_STARS from here onwards in the source
code should be replaced with the number 100. This allows us
to experiment with the number of stars by changing just one
line, instead of having to make changes all over the code.
Next up, we define a structure, that is, a collection of
variables that can be referenced together under one name.
We call this star_type, and every variable we create with this
type will have X and Y coordinate variables inside it, along with
a speed (they're all integer numbers). With this line:

star_type stars[MAX_STARS];
we create a new array of star_type structures, called stars.
So, in the coverdisc version of the code, that's an array of
100 stars, which can be referenced from stars[0] to stars[99].
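
For comparison, here is a rough sketch of the same structure-plus-array idea in Python, a language covered later in this article. It is not part of the C listing: the Star class and the make_stars helper are names made up for this illustration, with a dataclass standing in for the C struct.

```python
import random
from dataclasses import dataclass

MAX_STARS = 100  # same role as the #define in the C listing


@dataclass
class Star:
    """Hypothetical stand-in for the C star_type structure."""
    x: int
    y: int
    speed: int


def make_stars(count=MAX_STARS, width=640, height=480):
    """Build a list of stars with random positions and speeds (illustrative helper)."""
    return [
        Star(random.randrange(width), random.randrange(height), 1 + random.randrange(16))
        for _ in range(count)
    ]


stars = make_stars()
print(len(stars))            # how many stars the list holds
print(stars[0], stars[-1])   # the first and last elements, stars[0] and stars[99]
```

As in C, the elements are numbered from zero, so a list of 100 stars runs from index 0 to index 99.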

My main man
Now it's time to kick off the code itself, inside the main()
function (the first one that is executed in a C program, and
indeed the only one in our case). We declare i as an integer
variable that we'll use for counting purposes later. Then we
declare p as a pointer to an unsigned 8-bit integer, which is
a variable that we'll use to point to the graphics data later
on. And, lastly, we have screen, a pointer that we'll use
with SDL.

To make the screenshot more exciting, here's what the C
implementation looks like with 2000 double-size stars
whizzing around.

If you've never heard of pointers before, they're like
variables, but instead of holding numbers that we can use
directly they hold the address in memory of other variables.
So you might have the integer variable x, which holds the
number 50, and is sitting at location 1000 in RAM. If you
create a pointer variable called y and point it to x, then y
won't contain 50 but will contain the memory address instead:
1000. They can be quite fiddly to use and many high-level
languages avoid them, but it's worth being aware of them.
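
The x and y example can be modelled in a few lines of Python, shown here purely as an illustration. Python has no raw pointers, so a plain list stands in for RAM and a list index stands in for an address; the names memory and x_address are assumptions made for this sketch.

```python
# Toy model of memory: a list of cells, where the index plays the role of an address.
memory = [0] * 2048

x_address = 1000          # pretend the variable x lives at location 1000
memory[x_address] = 50    # x holds the number 50

y = x_address             # y is the "pointer": it holds x's address, not x's value

print(y)                  # prints 1000, the address stored in the pointer
print(memory[y])          # prints 50, the value found by following the pointer

memory[y] = 255           # writing through the pointer changes x itself
print(memory[x_address])  # prints 255
```

Following the pointer to reach the value is what C calls dereferencing, which is the job of the star in the *p = 255 line of the listing.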

Moving on, we have three lines which initialise the SDL
subsystems, ask for the SDL_Quit routine to be called when the
program ends, and create a new window (640 pixels wide by
480 high, in 8-bit colour mode for 256 colours).
The call to SDL_SetVideoMode returns a pointer to a structure
containing the display information, so we make sure that our
screen pointer is pointing to it.
Next up, we have two for loops. The first one populates
our array of stars, using random X and Y locations for the
starting points, along with their speeds. Try changing the 16 in
the speed line to higher and lower numbers to see the effects.
The second for loop tells SDL that we don't want it to pester
us with every keyboard or mouse movement event it receives:
it should ignore anything apart from when the user closes
the window.
And then we come to the while loop, where all the fun
takes place. We tell SDL that we want to perform this loop
until the window is closed, and our first order of business is to
blank out the screen with black (RGB 0, 0, 0) using the
SDL_FillRect function (the NULL here makes it draw to the whole
window). Then we cycle through the star array, updating the
horizontal positions of each star by subtracting their speed
values from their x coordinates. The window's pixels go from
0 on the left to 639 on the right, and 0 on the top to 479 at
the bottom. So, if a star flies off the left of the window (ie, its
x is zero or less), we start it again from the right-hand side,
at 640.
The two lines beginning with p and *p are used to plot the
stars. In the first, we set our p pointer variable to point at the
exact area of the graphics data where the pixel should be
plotted. This data is stored inside the pixels member of the
screen structure, and because it's a linear one-dimensional
row of bytes we need to do some maths to match it up with
our 2D image. Once we have p pointing to the right place, we
place the number 255 (meaning white) inside the byte that
it points to by dereferencing it (using the star).
Finally, we tell SDL that we're finished with our drawing
operations so it should render the whole lot to the screen, and
then have a delay of 30 milliseconds so that it doesn't run too
quickly. And we're done!
This might seem a bit complicated if you've never done
any C programming before, so let's move on to the
higher-level alternatives. Once you have those sussed out, come
back to this one and it will be clearer.

Even humble Ncurses, mixed up with a bit of Perl, is capable of
parallax-scrolling starfields.

High-level: Python and Pygame
Compared with what we've just done, life is a lot easier when
you're using Python and its Pygame library. Working with
graphics is a lot simpler, and you don't have to fiddle around
with pointers. Here's the code. It's called starfield.py, and if
you've got Pygame installed then you can run it with
./starfield.py. If you're new to Python you should find this code
quite readable, but it's important to note the indentation here,
which is vital for program flow. Code belonging to blocks
(such as loops) must always be indented.
#!/usr/bin/env python
import pygame
from random import randrange
MAX_STARS = 100
pygame.init()
screen = pygame.display.set_mode((640, 480))
clock = pygame.time.Clock()
stars = []
for i in range(MAX_STARS):
    star = [randrange(0, 639), randrange(0, 479), randrange(1, 16)]
    stars.append(star)
while True:
    clock.tick(30)
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            exit(0)
    screen.fill((0,0,0))
    for star in stars:
        star[0] -= star[2]
        if star[0] < 0:
            star[0] = 640
        screen.set_at((star[0], star[1]), (255, 255, 255))
    pygame.display.flip()

A path to follow
We're bound to start a few flame wars here, but
anyway... If you want to spread your wings and
become a great all-round programmer, these
are the languages we'd recommend exploring:
Assembly: This will get you familiar with the
nuts and bolts of programming, working directly
with hardware and memory. Telling the CPU
directly what to do is way more satisfying than
magically obscuring things behind compilers.
x86 is a bit of a mess, so to start with get a ZX
Spectrum or Commodore 64 emulator, and try
out their assembly languages (Z80 and
6502 respectively).
C: It's close to being the standard
programming language, if there could ever be
such a thing. It's available almost everywhere,
most of the Linux kernel is written in it, and it
combines low-level features with some
higher-level abstractions. Most implementations of
other programming languages are written in C.
Python: This gives you a taste of a high-level
language, with object orientation and highly
readable code. There's a vast range of add-on
modules, making it ideal for all kinds of
programming, from network tools to GUI
desktop apps. We'll look at Python later on.
Lisp: The constant use of brackets may drive
you insane, but it's a good way to learn about
functional programming, a very different
approach to the likes of C and Python.
We're not saying these are the best languages
to learn, or indeed the most useful (if you want
to make a career in programming, then you'll
want to learn Java, C# or Objective-C).
But we think that if you spend some time with
the above languages, you'll absorb a huge
amount of knowledge and flesh out your
programming prowess so that you're a great
all-rounder.

Logic-wise, this implementation is very similar to the C
one, so you can compare features in the languages. We start
off by telling Python that we want to use the Pygame module
and the randrange number-generating routine from the
random module, and then set up a variable called
MAX_STARS, which has the same purpose as its C equivalent.
Then we get Pygame fired up and create a window, assigning
it to a variable called screen, before setting up a background
timer using Pygame's clock facility, to slow things down
a bit later.
Next comes the construction of our star array (or list, in
Python parlance), in the form of stars = []. You can see that
Python is a lot more flexible than C, and you don't have to
declare the size of an array at the start. What we do here is
create, step by step, 100 star objects and provide each of
them with three values: the X coordinate, the Y coordinate,
and the speed, just like in the C version. After creating each
star object we drop it into our stars list using append.
Then there's the while True loop, which is the main loop.
First, we call clock.tick(30), which pauses just long enough to
cap the loop at 30 frames per second, and then we tell
Pygame to process any keyboard or mouse events coming in
(so it can quit the program if the user closes the window).
Then, we use the fill routine of our screen object to black out
the window, and start cycling through the stars. For each star,
we subtract its third element (speed) from its first (X
position), making the stars move left (elements in an array or
list are counted from zero). Like in the C version, we check to
see if the star has gone off the left-hand side of the screen.
Then we plot a white pixel (255, 255, 255 in RGB format)
at the X and Y positions, and we're done with the star
processing loop. Lastly, we call the display flip routine, which
renders all of our previous drawing operations to the screen.
Overall, it's shorter, simpler and easier to read than the
C/SDL version, and you can imagine that writing games in
Pygame is a lot of fun.
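
Pygame's set_at hides the address arithmetic that the C version did by hand with its p pointer. As a rough illustration of that arithmetic (not a description of Pygame's or SDL's actual internals), the sketch below treats the screen as one flat run of bytes. The names WIDTH, HEIGHT, PITCH, BYTES_PER_PIXEL and plot are assumptions chosen for the example, and the pitch is assumed to equal the row width with no padding.

```python
WIDTH, HEIGHT = 640, 480
BYTES_PER_PIXEL = 1                 # 8-bit colour mode, as in the C listing
PITCH = WIDTH * BYTES_PER_PIXEL     # bytes per row (assumed: no padding)

# One flat, one-dimensional buffer standing in for screen->pixels.
pixels = bytearray(PITCH * HEIGHT)


def plot(x, y, value=255):
    """Write one pixel and return its byte offset, or None if it is off-screen."""
    if 0 <= x < WIDTH and 0 <= y < HEIGHT:
        offset = y * PITCH + x * BYTES_PER_PIXEL
        pixels[offset] = value
        return offset
    return None


print(plot(0, 0))      # prints 0: the very first byte
print(plot(639, 0))    # prints 639: the last pixel of the top row
print(plot(0, 1))      # prints 640: the first pixel of the second row
print(plot(640, 479))  # prints None: x = 640 is outside the window
```

The bounds check is a design choice of this sketch: a raw pointer write in C has no such safety net, which is one reason the higher-level version is more forgiving.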

Unusual-level: Perl and Ncurses


Finally, let's take a look at that equally loved and hated master
of text processing, Perl. And to mix things up even more,
instead of using graphics to render the starfield, we're going
to use the text terminal. How? By printing full-stop characters
for the stars! There's a very helpful library called Ncurses
that's available to most programming languages; it makes
handling the terminal window (such as moving around,
disabling the cursor, etc) pretty easy. Here's the code, which
you'll find in the zip file listed on the contents page:
#!/usr/bin/perl
$numstars = 100;
use Time::HiRes qw(usleep);
use Curses;
$screen = new Curses;
noecho;
curs_set(0);
for ($i = 0; $i < $numstars ; $i++) {
    $star_x[$i] = rand(80);
    $star_y[$i] = rand(24);
    $star_s[$i] = rand(4) + 1;
}
while (1) {
    $screen->clear;
    for ($i = 0; $i < $numstars ; $i++) {
        $star_x[$i] -= $star_s[$i];
        if ($star_x[$i] < 0) {
            $star_x[$i] = 80;
        }
        $screen->addch($star_y[$i], $star_x[$i], '.');
    }
    $screen->refresh;
    usleep 50000;
}

Get hacking!
Once you've spent a bit of time with
these programs, why not see if you can
expand them? Some ideas:
Try experimenting with different
window sizes and dimensions.
For the C and Python versions, you
can add a few lines so that the stars
have random colours.
Take input from the keyboard to
affect the direction of the stars and
their speed.
For C and SDL, see the documentation
website at www.libsdl.org/cgi/docwiki.cgi;
the quick guide is especially
useful. Pygame has especially good
tutorials and reference guides at
www.pygame.org/docs, while Perl + Ncurses
aren't so well documented, but you can
often find examples of a specific
instruction by searching online. Also see
http://tldp.org/HOWTO/NCURSES-Programming-HOWTO;
it's for C, but
most of the functions are implemented
in Perl as well. And if you get totally
stuck, or just want to share your work,
pop by the LXF forums at
www.linuxformat.com/forums and head
into the Programming section.

Quick tip
If you're thinking about writing a game, check out
the Allegro library (http://alleg.sf.net). It's stable,
mature and cross-platform, and many impressive games
have been written with it, as you can see at
www.allegro.cc (browse the Action category in
Projects on the left, for instance).

By now, the general structure of this code should make
a lot of sense to you. There's one major difference here,
though; instead of having an array of stars containing
coordinates and speeds for each one (ie, an array of arrays),
to make things simpler we've just set up three arrays. star_x
holds the X coordinates of the stars, star_y the Y coordinates,
and star_s the speeds (at least one, so that no stars are
static). You'll see that the dimensions of 80x24 refer to the
standard X Window System terminal size, but you can change
these for bigger terminals.
Here, addch is the Ncurses routine to print a character
and, interestingly, it takes the Y coordinate first. usleep pauses
execution for the specified number of microseconds. Oh, and
the noecho and curs_set instructions at the start say that we
don't want to see our own keyboard output or the text cursor.
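
The two layouts mentioned above, one list per star versus three parallel arrays, can be contrasted in a short Python sketch. It is only an illustration: the variable names mirror the Perl listing, step is a hypothetical helper that performs one frame of movement, and, unlike the Perl code, it uses whole-number speeds and wraps stars to the last valid column (79) rather than to 80.

```python
import random

NUM_STARS = 100
COLS, ROWS = 80, 24   # terminal size assumed in the Perl listing

# Layout 1: three parallel lists, as in the Perl version.
star_x = [random.randrange(COLS) for _ in range(NUM_STARS)]
star_y = [random.randrange(ROWS) for _ in range(NUM_STARS)]
star_s = [random.randrange(1, 5) for _ in range(NUM_STARS)]  # speed is at least 1

# Layout 2: one list of [x, y, speed] lists, as in the Python version.
stars = [list(triple) for triple in zip(star_x, star_y, star_s)]


def step(xs, speeds, cols=COLS):
    """Move every star left by its speed, wrapping to the right-hand column."""
    for i in range(len(xs)):
        xs[i] -= speeds[i]
        if xs[i] < 0:
            xs[i] = cols - 1


step(star_x, star_s)
print(len(stars), stars[0])
```

Both layouts hold the same information; the parallel lists keep each line of code short, while the list of lists keeps everything about one star in one place.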
So, we've explored generating a starfield in three
languages with three toolkits, and hopefully it has tempted
you to try more languages and libraries. You can see how the
basic algorithms in a program are usually the same across
implementations, but there's always something new to learn.
Enjoy, and happy hacking.

Scheme: Learn the basics

To help you get used to learning new languages, Jonathan Roberts explains
how to get started with the simple but popular Scheme.

So far in this guide we've given you a crash course in
Ruby and introduced you to the essentials of some
other popular languages. One that we haven't
mentioned so far is Scheme, which is a shame really, as many
of the most well regarded introductions to programming and
computer science choose it as their teaching tool.
While there are many reasons for this, suffice it to say that
Scheme is a popular choice because it's a simple language
with little syntax to learn. This means it does a wonderful
job of demonstrating many important principles that are
obscured in other languages by their more complex syntax.
As well as being a good language for new programmers,
Scheme is also interesting to those with more experience.
Unlike most of the languages we'll focus on in this guide, it's
primarily a functional language, as opposed to an
object-orientated or imperative one. As such, it's a great chance to
see some different approaches to common problems.
With all that in mind, over the next three issues we're
going to introduce you to Scheme. To begin with, we're going
to focus on some of the basics of Scheme, with later articles
looking at some particularly functional approaches to
solving problems.

Installing Scheme
Almost all Linux distributions come with a Scheme
interpreter built in. It's called Guile, but it's pretty complex and
we won't be using it. Instead, we're going to use a tool called
Dr. Racket, which you can find in some distributions'
repositories or by heading to its official website:
http://racket-lang.org/download.
Dr. Racket is actually designed to work with a particular
dialect of Scheme (which is itself a dialect of Lisp, the
language Emacs is written and extended in) called Racket.
But it's a great programming environment, and you can
make it work with standard Scheme by going to the Language
> Choose Language menu and selecting R5RS from the
window that appears. With Dr. Racket happily installed,
we can start programming.

Dr. Racket is our tool of choice for programming in
Scheme. It provides many useful features, including
bracket balancing and tools for debugging code.

Dr. Racket
Dr. Racket is a programming environment that
makes coding in Scheme much easier than in a
plain text editor. Its main window is split in two.
The top half can be used for entering, saving
and loading entire programs, made up of many
functions and definitions. When you have
completed work on part of a program and you
want to see how it works, press the Run button
in the top-right of the screen and Dr. Racket will
evaluate the program and display any results in
the bottom half of the window.
As well as being used to display results of
programs entered in the top half, the bottom
half of Dr. Racket's main window can be used as
an interactive interpreter. If you type Scheme
expressions into it and press return, it will
immediately show the results.
This is a handy way to quickly check that
small snippets of code work as you expect. It's
also the part that we've used throughout this
month's tutorial.
There are other parts of Dr. Racket that make
it an appealing environment to code in.
Elsewhere in the article, we've mentioned its
automatic bracket balancing, which helps us
to avoid syntactic mistakes. However, it also
provides a number of other tools to help us spot
mistakes in our code.
For instance, if you try to run a program with
a mistake, Dr. Racket will provide a description
of the error and even highlight the part of the
code where it occurred. What's more, when we
start working with more complicated programs
the Debug button in the top-right can help us
track what the program does and understand
how it evaluates different types of expressions.

Programming paradigms
At the start of this article, we said one of the
things that sets Scheme apart is that it's a largely
functional language, as opposed to an
object-orientated or imperative one. If you're new to
programming, this might not have meant much
to you, but we've got you covered.
These three obscure-sounding titles refer to
different programming paradigms, that is,
particular ways of thinking about problems and
programming solutions for them: different styles
of programming.
When programming in an imperative style,
the main focus is on recording and adjusting the
state of the program. This is often done through
the use of variables and functions that modify
their value directly. This particular paradigm
matches the way programs are executed on the
hardware, but it can make programs harder to
develop and test, since the programmer must
take into account the state of many external
variables, not just the inner workings of a single
function. That is to say, in imperative languages
functions have side-effects. The imperative
paradigm is most often associated with
Assembly language or C.
Object-orientated programming attempts to
resolve the problem of external variables by
structuring a program around the idea of
separate objects. Each object records its own
state, which it keeps separate from the rest of
the program. Each object also specifies
a number of methods (functions) that let other
parts of the program inspect and modify its
state. It models the real world well, where most
things appear to us as individual objects.
A cooker, for instance, is an object. It has
controls, like methods, that let me modify and
inspect its state without affecting other parts of
the kitchen.
Finally, functional programming attempts
to do away with the idea of state completely.
There are no external variables, no side effects.
This makes designing and testing programs
much simpler, since every time a function is
run with the same input values, it will return
the same output. There's no need to take into
consideration any external variables, or its
interaction with any external objects.

Simple expressions
All programming is really about manipulating information, or
data. That data can represent anything, from the ingredients
used to mass produce a certain kind of biscuit to the position
of a robot's arms and claws on a factory assembly line.
In order to manipulate this data, we must find ways to
represent it that a computer can understand, and we must be
able to define the processes that we will use to manipulate it,
too. Doing this for complex kinds of data is obviously very
challenging, so we're going to start off with a far simpler
type: numbers.
In Scheme, numbers and many common (and some not
so common) operations that you might perform on them are
known as primitives: they're built into the language itself.
You can experiment with this in the bottom half of the Dr.
Racket window.
This part of the window is an interactive interpreter: you
can type Scheme expressions into it and the result of
evaluating the expression will immediately be displayed to
you. Try typing a 5 and pressing return and you'll see another
5: this tells us that the result of evaluating a simple number
(a primitive) is the number itself.

Doing calculations
If you combine numbers with their basic operations, you can
use Dr. Racket's interpreter and Scheme as a simple
calculator, although the syntax is a bit different from what you
might be used to from school:
> (+ 5 5)
10
> (- 8 4)
4
> (* 2 3 4)
24
As you can see, the operation comes first and the
numbers it's applied to (the operands) come afterwards;
what's more, the whole expression must be wrapped in
brackets. This syntax has two advantages. First, you can easily
apply the operation to as many numbers as you like without
having to repeat the symbol.
The other advantage is that there's never any question
about what order to perform the calculation in; just
remember that you evaluate the inner-most expression first
and work your way out:
> (+ 3 (* 2 4) (+ 4 (- 8 3)))
20
In this example, you calculate (- 8 3) first, then (+ 4 5) and
(* 2 4), before finally doing (+ 3 8 9). To make it easier to keep
track of all the brackets, Dr. Racket will highlight matching
brackets and which part of the expression they apply to as
you close them off.
Our lives are also made easier thanks to a convention
called pretty-printing. Rather than writing the entire
expression and all its sub-expressions on a single line, you
can break it, aligning all the operands (the numbers each
operation is applied to) vertically:
> (+ 3
     (* 2 4)
     (+ 4
        (- 8 3)))

Pretty-printing and bracket-balancers make dealing with Scheme's many
brackets much easier. Once you get the hang of the brackets, you'll come to love
the lack of ambiguity.

Variables
When we're programming, the data we're working with rarely
represents just numbers. For instance, a number might be
the quantity of elderberries needed to make 30 litres of
elderberry wine, or it might be the volume of a bottle needed
to store the wine that's being made.
To make our program more manageable and more readable
as it becomes more complex, one of the most
important things a language can do is let us give the numbers
we're working with names.
In Scheme, this is done with the define statement:
> (define pi 3.14)
> (define radius 3)
As you can see, it works just like any of the mathematical
operators in our earlier example. The only difference is that
the operands must come in a certain order: first is the name
you want to give to the variable and then comes the value to
be assigned.
After defining a variable, you can then use it just as you
would any primitive value built into the language:
> pi
3.14
> (* 2 pi radius)
18.84
A good way to think about how Scheme evaluates this
expression is the way you were taught to do algebra in school:
through substitution. It will look up the values for pi and
radius and then re-write the expression with these values in
place: (* 2 3.14 3). After getting down to primitives, it will then
carry out the final evaluation and return the result (this is
a simplification, but it's a good way to think about things).
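
Both the inner-most-first rule and the substitution of names by values can be seen in a toy evaluator. The Python sketch below is an illustration only, not how Dr. Racket works internally: expressions are written as nested Python tuples rather than parsed from text, the names evaluate, OPERATORS and env are invented for the example, and single-operand forms such as (- 5) are not handled the way Scheme handles them.

```python
import operator
from functools import reduce

# Supported prefix operators; each one is folded over any number of operands.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def evaluate(expr, env):
    """Evaluate a nested prefix expression such as ("*", 2, "pi", "radius")."""
    if isinstance(expr, (int, float)):
        return expr                      # a number evaluates to itself
    if isinstance(expr, str):
        return env[expr]                 # a name is replaced by its value
    op, *operands = expr
    # Inner expressions are evaluated first, then the operator is applied.
    values = [evaluate(operand, env) for operand in operands]
    return reduce(OPERATORS[op], values)


env = {"pi": 3.14, "radius": 3}          # plays the role of the two defines

print(evaluate(("+", 3, ("*", 2, 4), ("+", 4, ("-", 8, 3))), env))  # prints 20
print(evaluate(("*", 2, "pi", "radius"), env))                      # about 18.84
```

Because the operator always comes first, the evaluator never needs precedence rules: the brackets (here, the nesting of the tuples) say everything about the order of evaluation.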

Procedures, aka functions
We're not restricted to giving names only to primitive pieces
of data, however. We can even give names to entire
procedures if we want to.
The second expression, above, calculates the
circumference of a circle with a radius of 3. You might
remember from school that there's a general formula for
describing how to calculate the circumference of any circle:
2 times pi times the radius.
In the expression above, we executed a specific instance
of the procedure described by this formula, but Scheme lets
us express the general form, as well:
> (define (circumference radius) (* 2 3.14 radius))
The final part of this expression, (* 2 3.14 radius), is exactly
the same as the (* 2 pi radius) we saw above; the only difference
is that the value of radius isn't set yet. Instead, it refers to the
name in the previous part of the expression, (circumference
radius).
We can also define procedures using pretty-printing:
> (define (circumference radius)
    (* 2 3.14 radius))

Pretty-printing helps separate the function name and formal
parameters from the body. Once defined, we can use functions
just like any other primitive operation.

This makes it easier to distinguish between the different
parts of the expression: the top line gives the name and any
parameters, while everything below that is the body of the
procedure, the actual operations to be carried out. Once
we've defined our procedure like this, we can use it just like
any of the primitive operations:
> (circumference 10)
62.800...
> (circumference 5)
31.400...
What happens when we do this is that, within the
expression, the substitution model is once again applied:
circumference is substituted by the body of the procedure,
while the value given in place of radius is put into the
appropriate parts of the expression. In these examples, this
results in the evaluation of (* 2 3.14 10) and (* 2 3.14 5).
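
For readers more at home in Python, the circumference procedure corresponds roughly to the function below. This is a side-by-side sketch rather than anything from the Scheme listing, and the constant name PI is an assumption made for the example.

```python
PI = 3.14  # same approximation as the Scheme example


def circumference(radius):
    """Circumference of a circle: 2 times pi times the radius."""
    return 2 * PI * radius


# Calling the function substitutes the argument for the parameter,
# so circumference(10) evaluates 2 * 3.14 * 10.
for r in (10, 5):
    print(r, round(circumference(r), 3))
```

The name and parameter on the def line play the same part as (circumference radius) in the Scheme definition, and the indented body plays the part of (* 2 3.14 radius).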

Conditional expressions
One tool that virtually all languages, including Scheme, have is
the ability to take an action only if a certain condition is true. This
is obviously a vital ability, as it's something we do all the time
when following a procedure. For example, if the elderberry
wine is clear on day 35, progress to the next step; if it's not,
keep stirring.
In Scheme, this ability is implemented through three
separate tools. The first is the existence of relational
operations that allow us to inspect the relationship between
two things:
> (< 10 5)
#f
> (> 10 5)
#t
> (= 10 10)
#t
< is the less than symbol, and checks whether the number
on the left is less than the number on the right; > is the
greater than symbol and checks whether the number on the
left is greater than the number on the right; finally, = tests
whether the two numbers are equal. These operations will
always return #f for false and #t for true.
The second is the existence of the boolean operations that
allow us to combine the results of multiple relational
expressions:
> (and (> 10 5) (= 10 10))
#t
> (or (< 10 5) (= 10 10))
#t
> (not (> 10 5))
#f
As you might expect, and returns #t if, and only if, all the
expressions within it are true; or returns #t if any of the
expressions are true; and not inverts the result of the
expression, turning true to false and false to true.

Case analysis
The final tool that Scheme gives us is the ability to perform
case analysis. This allows us to perform different operations
depending on the result of any of the tests described above.
The general form of a case analysis in Scheme is:
(cond (<test> <expression>)
      (<test> <expression>)
      ...
      (<test> <expression>))
The evaluation of a cond expression like this checks the top
test first. If it's true, it will evaluate the associated expression
and the case analysis will be finished; if it's false, however, it
will proceed to the next test down and do the same. If none of
the tests are true, it will eventually reach the bottom of
the cond expression and nothing will have happened.
There's an alternative form of cond, too, that replaces the
final test with the keyword else. This means that if none of
the other tests evaluate to true, the final expression is
carried out regardless.

Fizz-Buzz
For instance, imagine we were making a program to play the
game Fizz-Buzz. The rules of the game say that if a number is
a multiple of 3, it should return the word Fizz; if it's a multiple
of 5, it should return the word Buzz; and if it's a multiple of 3
and 5, it should return Fizz-Buzz:
> (define (fizz-buzz x)
    (cond ((and (= (remainder x 3) 0)
                (= (remainder x 5) 0))
           'fizz-buzz)
          ((= (remainder x 3) 0) 'fizz)
          ((= (remainder x 5) 0) 'buzz)
          (else 'shh)))

By putting the code for Fizz-Buzz in the definitions window, the
top part of Dr. Racket, we can use its debugger to walk through
its execution step-by-step.

You can see how this procedure fits the pattern of the case
analysis laid out above, even as it introduces a few other concepts.
First, note that remainder is a primitive function that
returns the remainder of a division between two
numbers, eg, the remainder of 5 divided by 3 is 2. By
checking to see if the remainder is 0, we can check whether
a number is a multiple of another. Second, note that to use
plain English words in Scheme, we have to put a single quote
at the beginning of the word.
The first test, then, checks to see whether x is a multiple of
3 and 5, in which case it evaluates the expression 'fizz-buzz,
which, as with numbers, returns itself. The second checks
whether x is a multiple of just 3, returning 'fizz, and the one
after that whether x is a multiple of 5, returning 'buzz. The
else clause at the end returns 'shh in case the number isn't
a multiple of either 3 or 5.
Now you know all the basic elements of the Scheme
programming language, we'll move on to look at lists and
recursion, two elements that are put to great use in
functional programming languages.
Exercises
Having learned the basics of Scheme,
you ought to solidify your knowledge by
working on some example problems.
Having a good grasp of the material
covered here will make sure you're
ready for the next tutorial.
EXERCISE 1 Without using Dr. Racket's
interactive interpreter, evaluate the
following expression:
(+ 2 (* (- 5 3) 4 (/ 2 (- 3 1)))).
Rewrite the expression in
pretty-printing format.
EXERCISE 2 Write a procedure, called
currency-conv, that accepts a value in
Pounds Sterling and returns a value in
US Dollars. Use Google to look up the
current rate of conversion.
EXERCISE 3 Without using Dr. Racket's
interactive interpreter, evaluate the
following expressions:
> (define a 2)
> (define b 3)
> (define (square x) (* x x))
> (square 2)
> (define (sum-squares a b)
    (+ (square a) (square b)))
> (sum-squares 2 4)
EXERCISE 4 A cinema sets prices
based on age. If you're younger than six,
you get in for free; if you're younger
than 12, you get in for £3; if you're older
than 65, you also get in for £3.
Otherwise, you have to pay full adult
price, which is £5. Complete the
following Scheme procedure to
calculate how much someone has to
pay to go to the cinema.
> (define (ticket-price age)
    (...


Ready to get your brain in a twist? In the second part of our guide to Scheme,
Jonathan Roberts introduces recursion and some useful data types.

Last time, we introduced you to Scheme and walked you
through the basics of the language, and indeed of
programming in any language. By the end of it, you
knew how to create procedures, including ones that can
handle different situations through case analysis. This time,
we're going to introduce you to recursion. As we do so, we're
going to introduce a new data type that will let your programs
work with much more than simple numbers.

Recur what?
If youre put off by the weird-sounding name recursion in
the introduction to this article, dont panic as theres nothing
much to it really; in fact, once you know what its all about,
you may think its pretty cool that is, if you enjoyed
Inception.
If youve ever watched that movie, or stood in a hall of
mirrors and seen the way the reflections seem to carry on for
ever, thats recursion the thing that youre interested in
repeats itself within itself: dreams within dreams, mirrors
within mirrors.
To understand how this can be put to work in
programming, imagine that you work for a large publishing
company, with 100 advertising sales people, 30 database
marketers, 20 photographers and 100 magazines, each with
six people working on them. Your boss comes along and asks
you to find out what every single person had for breakfast.
Ugh! you think, what a boring task!.
After procrastinating for a while, you decide that the
easiest way to get out of this is to pass the ball along. You call
the head of advertising and say that the boss wants them to
find out what all their staff had for breakfast, and ask them to
let you have the information once theyve found out; you then
do the same to the head of database marketing, the

photographers and the magazines. Obviously, the head of the


magazines doesnt want to ask 600 people, so she does the
same thing you did: asks the head of each magazine to find
out what each of their staff had for breakfast, and to let her
know the results when theyve found out. The advertising
team do the same thing.
The magazine editors have a small enough team to just
ask everyone, which is what they do. After some time passes,
magazine editors begin reporting in what their staff had for
breakfast, as do the advertisers. Eventually, all the managers
you contacted know what all the staff below them had for
breakfast, and they ring you and let you know. You can now
tell the boss.
After getting in touch with just four people, you've
managed to find out what 750 people had for breakfast. This
is recursion in action. One over-arching task, finding out what
everyone in the company had for breakfast, was completed
by breaking it into smaller, although identical, sub-tasks, with
a tiny amount of effort. What's more, while the whole task
seemed daunting and complex to solve, the smaller tasks
were very easy to solve: just ask a handful of people what
they had for breakfast.
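To make the delegation idea concrete, here is a minimal Python sketch of the breakfast survey. The company structure, the staff names and the ask_breakfast function are all made up for illustration: a nested list stands for a team, and a plain string stands for one member of staff.

```python
# Hypothetical org chart: a list is a team (whose members may themselves
# be teams), and a string is a single member of staff.
def ask_breakfast(person_or_team, answers):
    """Collect (name, breakfast) pairs by delegating down the hierarchy."""
    if isinstance(person_or_team, str):
        # Base case: one person, so just ask them directly.
        return [(person_or_team, answers[person_or_team])]
    results = []
    for report in person_or_team:
        # Recursive case: hand the identical task to each report.
        results.extend(ask_breakfast(report, answers))
    return results

# Made-up data: two departments, one of which has two sub-teams.
company = [["ann", "bob"], [["cat"], ["dan", "eve"]]]
answers = {"ann": "toast", "bob": "cereal", "cat": "eggs",
           "dan": "porridge", "eve": "toast"}

breakfasts = ask_breakfast(company, answers)
```

The caller only ever talks to the top level of the list; every level below repeats the same small job.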

Factorials
Exactly the same principle can be applied to code. A popular
example of this is calculating factorials, that is, calculating
the number of ways a list of items can be arranged. By itself,
this sounds like a daunting problem to solve. Where on earth
do you start, and how can you be sure that you've got all
the arrangements?
Mathematicians figured out how to do this a long time
ago, and the procedure can be described thus:
If the length of the list is one, then the answer is 1: there's

Iterative recursion
Experienced programmers may know that the
factorial problem is easily solved in other
languages through a more common method.
Instead of writing a recursive function, you can
just assign some variables and write a loop, as
this Python example shows:
def factorial(n):
    fact = 1
    for x in range(1, n + 1):
        fact *= x
    return fact

This is, in fact, a more efficient way to solve
the factorial problem. By reusing variables like
this, the amount of memory required to calculate
a factorial is constant, no matter how big a
number we feed in to it.
In contrast, the recursive implementation
given in the article will need more and more
space in order to remember all the parts of the
multiplication it will have to do later. If you write a
few more examples of our Scheme procedure by
hand, you'll see this illustrated in the way the line
grows to the right. Scheme isn't inefficient,
however. Being a simple language, it doesn't have
a special form to support loops like this, but the
same effect can be achieved with standard
procedures and recursion:
(define (fact n)
  (define (fact-iter count result)
    (cond ((> count n) result)
          (else (fact-iter (+ count 1) (* count result)))))
  (fact-iter 1 1))

More languages
only one way to arrange a list with a single item in it.
For any other length of list, the number of arrangements
equals that list's length multiplied by the factorial of the
length one smaller than it.
For example, 1 factorial (written 1!) equals 1; 2! = 2 x 1!; 3! =
3 x 2!, and so on.
There are two points to note here. First is that this is
a recursive technique: the factorial of any number can be
found by finding the factorial of smaller and smaller numbers.
The second is that there's a base case, that is, a simple case
at which the recursion stops and a simple, primitive answer is
returned (in this case, the base case is 1! = 1).
This can be translated into a Scheme procedure quite
literally:
(define (factorial n)
  (cond ((= n 1) 1)
        (else (* n (factorial (- n 1))))))
The clever thing about this is that the procedure calls
itself! This is what makes it a recursive procedure. To
understand how this works, you can apply the substitution
model that we talked about last time and work through
a small run of factorial by hand:
(factorial 3)
(* 3 (factorial 2))
(* 3 (* 2 (factorial 1)))
(* 3 (* 2 1))
(* 3 2)
6
Each time the procedure is evaluated, the interpreter
substitutes the formal parameters into the appropriate parts
of the procedure's body. Instead of reaching primitive
operations that it can evaluate straightaway, however, it finds
it has to evaluate another procedure: itself.
Eventually, the parameter given to factorial will be 1,
which is a primitive and can be evaluated, and the procedure

Exercises
As with the previous tutorial, here are
some Scheme-based exercises to keep
you busy.
EXERCISE 1 Write a procedure that
takes a list as input and returns the
reversed version of it: eg, (1 2 3 4)
becomes (4 3 2 1).
EXERCISE 2 Draw a box and pointer
diagram that shows how you can use
pairs and lists to build a tree structure.
EXERCISE 3 The Towers of Hanoi is a
famous puzzle. In it, there are three
pegs: A, B and C. On peg A is a stack of
six discs, each of a different size, with
the largest at the bottom and the
smallest at the top. The challenge is to
move the entire stack to peg C, obeying
the following rules:
1. Only one disc can be moved at a time.
2. A disc can never be sat on another
smaller than it.
EXERCISE 4 Describe a recursive
method for solving this puzzle. For an
extra challenge, implement it in Scheme.

can unravel itself, just like the managers all reporting back to
the person above them.
Without this base case, however, the recursive procedure
would run forever and your interpreter would eventually give
up. Having a base case, then, is a vital element in all recursive
programming and thinking, and it's always wise to start
writing recursive procedures with a base case to ensure you
don't miss it.
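As a rough illustration in Python (the language used for comparison in the sidebar), the sketch below shows a recursive factorial with its base case, alongside a deliberately broken variant without one. The name factorial_no_base is made up for this example; in Python, the interpreter "giving up" shows itself as a RecursionError.

```python
def factorial(n):
    # Base case: stop recursing and return a primitive answer.
    if n == 1:
        return 1
    # Recursive case: n! = n x (n - 1)!
    return n * factorial(n - 1)

def factorial_no_base(n):
    # Deliberately broken: with no base case, this never stops calling itself.
    return n * factorial_no_base(n - 1)

try:
    factorial_no_base(3)
    gave_up = False
except RecursionError:
    # Python abandons the call once its recursion limit is exceeded.
    gave_up = True
```

Like the Scheme version, this factorial assumes it is called with a whole number of 1 or more.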

Pairs
The other way recursion is put to work in programming is in
creating data representations. To see how this works in
Scheme, we must first introduce one of its fundamental data
structures: pairs. These are Scheme's data building blocks.
Remember what we said at the start of the last article:
programming is all about data, and specifically manipulating
that data. So far, we've seen how to write procedures that can

manipulate simple numbers, but it's rare that real-world data
is this simple.
For instance, we might want to write procedures that can
manipulate information about a CD, including title, artist and
year of production; equally, we might want to write
a procedure that controls the pixels on our monitor, including
their colour and position.
In the pixel example, we could obviously treat each piece
of information separately, but it's far more natural to
understand the pixel as a single object that's composed of
the other, smaller bits of data.
In Scheme, it is pairs that let us represent compound
data. For instance, to represent a pixel on screen at
position (3, 4), we could create a pair called point:
> (define point (cons 3 4))
The procedure cons, short for construct, takes two
arguments, which it compounds together into a single object.
The elements of that object, that is, the x and y coordinates,
can be accessed through two more primitive procedures, car
and cdr.
> (car point)
3
> (cdr point)
4
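For readers more at home in Python, the behaviour of cons, car and cdr can be imitated with a two-element tuple. This is only a sketch under that assumption: the Python functions below borrow Scheme's names for clarity and are not part of Python itself.

```python
def cons(a, b):
    # Combine two values into a single compound object (a 2-tuple here).
    return (a, b)

def car(pair):
    # The first element of the pair.
    return pair[0]

def cdr(pair):
    # The second element of the pair.
    return pair[1]

point = cons(3, 4)
x = car(point)
y = cdr(point)
```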

Calculating distance
Once you've created a new compound object with cons, you
can pass it around just as you would any primitive object. To
see how this works, consider using cons, car and cdr to write
a small procedure that calculates the distance a point on a
graph is from the origin, that is the point (0, 0). To do this you
need to:
Add the squares of the x and y coordinates together
Take the square root of the resulting number
As ever, this translates quite literally into Scheme. First,
create the square procedure:
(define (square x)
  (* x x))
Then you can create the distance procedure, using the
primitive sqrt:
(define (distance point)
  (sqrt (+ (square (car point))
           (square (cdr point)))))
To use this, you can then create a new compound
structure, as we did above, and pass it to distance, where car
and cdr do the work of accessing the individual coordinates.
Now, cons, car and cdr all have their names for historical
reasons, and what they do is hardly obvious. You can make
your programs more readable, however, by wrapping them in
other procedures. For instance, the following procedures give
more obvious names, given the context of our distance
program, that will make it easier to read:
(define (make-point x y)
  (cons x y))
(define (get-x point)
  (car point))
(define (get-y point)
  (cdr point))
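As a cross-check, the distance calculation can be sketched in Python, with a tuple standing in for the pair. The names make_point, get_x and get_y simply mirror the article's Scheme wrappers, and representing a point as a tuple is an assumption of this sketch.

```python
import math

def make_point(x, y):
    # A point is represented as a plain 2-tuple in this sketch.
    return (x, y)

def get_x(point):
    return point[0]

def get_y(point):
    return point[1]

def square(x):
    return x * x

def distance(point):
    # Add the squares of the coordinates, then take the square root.
    return math.sqrt(square(get_x(point)) + square(get_y(point)))

d = distance(make_point(3, 4))
```

Because distance only touches the point through get_x and get_y, the representation could later change without rewriting it.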

Pairs within pairs


This is quite useful, but what if you wanted to represent
a compound object with more than two elements? The
answer is that cons can be used to combine any kind of
object, even other pairs:
> (define x (cons (cons 1 2) (cons 3 4)))
> (car x)
(1 . 2)
> (cdr x)
(3 . 4)
> (car (car x))
1
> (car (cdr x))
3
This time, you can see that the car and cdr of x both point
to another pair, where calling car and cdr once again will

Scope
In the iterative version of factorial that we
demonstrated, you may have noticed that we
defined a procedure inside another procedure.
What was going on here? The first thing you need
to know is that a program is constructed of a
number of environments. These environments
provide a mapping between names, including
primitive or user-defined ones, and the values
and procedures they specify. The environment
can then provide a context for procedures to be
evaluated in. For example:
> (define x 10)
> (define y 7)
> (+ x y)
17
The define procedure adds the names x and y to
the environment, associating them with their
respective values. When the addition is
performed, the interpreter looks up the
procedure associated with the symbol +, and
then substitutes the values of x and y in to its
body for evaluation, all according to the values
stored in the environment.

Recursive environments
One problem with the environment model is that
there can only be a single occurrence of each
name in an environment, since otherwise the
interpreter wouldn't know which value or
procedure you meant to refer to when you used
it. The trouble is, within a single program you
might want to refer to different kinds of points in
different circumstances, but you'd be unable to
re-use the make-point, get-x and get-y
procedures, since they'd already been used.
To get around this, Scheme has many
environments arranged in a hierarchy. At the top
of the hierarchy is the global environment, which
we saw above. Below this, however, each
procedure gets its own unique environment, with
its own set of names and values. So when we
do (define (make-point x y) ...) we're creating a
new environment.
All the assignments made in this local
environment are then invisible to the outside
world. Nothing from the global environment can
see inside. So if you define a global procedure
get-x and a local one, they won't conflict.
It's worth noting that local values always take
precedence over global ones, so if you do define
two values with the same name, the local value
gets evaluated. If there's no local value supplied,
however, the interpreter will look to whichever
environment surrounds it for a value to use.
It's often a good idea, then, to define helper
procedures, such as fact-iter, inside the local
environment rather than the global one. This
decreases the chance of creating names that
conflict, and helps make the close association
between two procedures clearer.
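Python's nested functions follow similar scoping rules, so the idea can be sketched there too. In this illustrative example, fact_iter is defined inside fact, reads n from the surrounding environment, and is invisible at the top level; the variable fact_iter_visible exists only to demonstrate that.

```python
def fact(n):
    def fact_iter(count, result):
        # n is not a parameter here: it is looked up in the
        # environment of the enclosing procedure, fact.
        if count > n:
            return result
        return fact_iter(count + 1, count * result)
    return fact_iter(1, 1)

answer = fact(5)

# The helper lives only inside fact, so the global environment
# has no name called fact_iter.
fact_iter_visible = "fact_iter" in globals()
```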

reach the base values. As you can see, when using car and
cdr as we did before, you get access to the pairs that are held
in those positions. By calling car or cdr again, you then get
access to the primitive data that's held inside those pairs.
Dreams within dreams, mirrors within mirrors, pairs within
pairs: this, as you might have guessed, is a recursive data
structure. It is a pair's ability to combine other pairs that
makes it such a powerful building block, allowing us to build
all kinds of fancy data structures that we can use to represent
the real world. For instance, pairs can be used to
represent sequences:
> (define li (cons 1 (cons 2 (cons 3 '()))))
> li
(1 2 3)
Here, the cdr of each pair points to the next pair in the list. The
final pair's cdr is a special symbol, (), that is used by Scheme
to represent an empty list. This is, however, such a common
structure in Scheme that there is some syntactic sugar to
make this easier to code:
> (define l (list 1 2 3))
This will create exactly the same list, only called l and
without all the confusing cons. The elements of a list can be
accessed with successive cars and cdrs:
> (car l)
1
> (car (cdr l))
2
> (cdr l)
(2 3)
The third example here is the most important. Notice that
the cdr of the list is the list minus the first element. We're
going to put this to use in a moment.
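To see the chain of pairs more explicitly, here is a small Python sketch that builds the same structure from nested tuples. Using None to stand for the empty list, and the helper name make_list, are assumptions of this example rather than anything Scheme or Python prescribes.

```python
EMPTY = None  # stands in for Scheme's empty list in this sketch

def cons(a, b):
    return (a, b)

def car(pair):
    return pair[0]

def cdr(pair):
    return pair[1]

def make_list(*items):
    # Build the chain from the back: each new pair's cdr is the list so far.
    result = EMPTY
    for item in reversed(items):
        result = cons(item, result)
    return result

l = make_list(1, 2, 3)
second = car(cdr(l))
rest = cdr(l)
```

Printing l shows the nesting directly: (1, (2, (3, None))).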

[Figure: box-and-pointer diagram of a list. Finally, you can see
that a sequence is easily represented by a chain of pairs, each
with its cdr pointing to the next pair in the sequence, and its car
to the value of the current element. The final cdr has a strike
through it, representing (), the empty list.]

Recursing a list
Lists, being a recursive data structure, are naturally dealt with
by recursive procedures. Imagine writing a procedure to find
the length of a list. That sounds quite complicated, but if you
recognise that the empty list, at the end of the list, has length
0, then you've got a base case that can be used for making
a simple recursive procedure to do the hard work for you.

[Figure: organisation chart with the labels You, Advertising,
Advertising managers, Database marketing, Magazines,
Photography, Magazine editors and Magazine staff. By asking a
few people, recursion enables you to find out what many people
had for breakfast. The same technique can be used to solve
difficult programming problems.]

(define (list-length list)
  (cond ((null? list) 0)
        (else (+ 1 (list-length (cdr list))))))
In this example, we've used the primitive null? predicate
that checks to see if we're looking at an empty list.
Besides that, the procedure is almost identical to the one
we used to calculate factorials, the only difference being that
rather than passing on a smaller number, we use cdr to pass on
a smaller segment of the overall list, and the base case is the
empty list rather than 1.
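The same shape carries over to Python if a list is modelled as nested (car, cdr) tuples ending in None. That representation is an assumption of this sketch, chosen only to mirror Scheme's pairs.

```python
def list_length(lst):
    # Base case: the empty list (None in this sketch) has length 0.
    if lst is None:
        return 0
    # Recursive case: one element plus the length of the rest (the cdr).
    return 1 + list_length(lst[1])

numbers = (1, (2, (3, None)))
length = list_length(numbers)
```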
As another example, consider the task of returning the nth
element of a list. You could do it manually with a lot of cdrs,
but that would look terrible and be very error prone. It's quite
easy to solve as a recursive procedure, though:
(define (list-get list n)
  (cond ((= n 0) (car list))
        (else (list-get (cdr list) (- n 1)))))
Once again, the structure of the procedure is exactly the
same as we've seen before; the only things we're changing
are the base case and what we pass on to the next iteration of
the procedure. In this example, we've had to construct an
external base case, that is, n, the number of the element we
want to return from the list, which we reduce as we step along
the list, counting the number of elements we've already
passed. Once this counter reaches 0, we return the current
element, not the rest of the list (hence we use car).
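A Python sketch of the same countdown, again assuming a list is modelled as nested (car, cdr) tuples ending in None, might look like this. Like the Scheme version, it counts from 0 and does no checking for an index that runs past the end of the list.

```python
def list_get(lst, n):
    # Base case: the counter has reached 0, so return the current element.
    if n == 0:
        return lst[0]
    # Otherwise step along the list and reduce the counter by one.
    return list_get(lst[1], n - 1)

letters = ("a", ("b", ("c", None)))
first = list_get(letters, 0)
third = list_get(letters, 2)
```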
That's all we've got space for in this tutorial. In the next
and final article of this chapter, we'll look at map, filter,
enumerate and accumulate, four procedures that are key to
functional programming, and which can even shed some light
on how to get the best out of Bash.


Scheme: High-order procedures


Jonathan Roberts explains how to take your Scheme skills to the next level
with an advanced technique for building procedures that can multi-task.

In the previous tutorial, we introduced Scheme's universal
data building block, pairs, along with the idea of recursion.
This month, we're going to look at a more advanced idea
that is central to functional programming: higher-order
procedures. Just as we saw with giving names to variables
and functions in part one of this series, or with the use of
pairs which let us pass related bits of data around a program
together, higher-order procedures are another technique for
managing complexity in programs.
Instead of giving names or combining lumps of data
together, however, higher-order procedures let us build
procedures that are capable of doing more than one thing,
based on their parameters.
In most programming languages, the ability to
construct higher-order procedures comes from
their treatment of functions as first-class objects, that
is to say, like any other primitive, they can be passed as
arguments to other functions and returned as values from
them, too. To see how this works, how it relates to
higher-order procedures and to see why it's useful, read on.

Abstract procedures
Before we look at higher-order procedures, let's start by
looking at the simpler idea of abstract procedures. Imagine
that you run an online shopping business and your users can
construct wish lists of items they would like to buy.
In Scheme, a typical wish list might look like this:
(define wish-list-a (list 'lxf-subscription '1984 'car))
(define wish-list-b (list 'laptop 'holiday 'wallet))
One thing you might like to do is find out whether
a particular wish list contains a certain item. Maybe you
want to use this information to find users with similar desires,
and make recommendations based on their peers' wish lists.
In Scheme, you could write a simple recursive procedure
to do this:
(define (car-in-wish-list wish-list)
  (cond ((null? wish-list) false)
        ((eq? (car wish-list) 'car) true)
        (else (car-in-wish-list (cdr wish-list)))))
This example tests to see whether a given wish list
contains the symbol car: if it does, it returns true; otherwise,
it returns false. As with all our work on lists last time, this
procedure takes a recursive approach to solving the
problem. It looks very much like the procedures we saw in
Part 2:
There's a base case at the start of the function, ensuring
that we don't get stuck in a never-ending loop. In this
instance, our base case is the empty list, as found by the
primitive null? predicate.
We then do some work on the current element of the list,
this time checking to see whether it meets a certain criterion,
that is, whether it's the symbol car.
Finally, we apply the same process to the remaining
elements in the list.
As clever as this is, it's actually not that useful. What
happens if you want to check to see whether a wish list
contains the symbol lxf-subscription or laptop? You'd have

Lazy evaluation
One very cool functional programming technique
that we haven't had time to cover in detail is the
idea of lazy evaluation. The idea is that, if you
assign the result of an expression to a variable,
the value of that variable doesn't matter until
you make use of it. As such, evaluation of the
expression can be delayed until later in the
application or, depending on the expression,
it can be partially evaluated, doing only those
calculations that are strictly necessary at the
current point of the program.
As well as reducing execution time, this can
also reduce the amount of memory used when
expressions are applied to very large sets of data.
A good example of lazy evaluation would be a
random number generator. You wouldn't want to
use one that could only return 10, 100 or maybe
1,000 random numbers, but rather you want one
capable of creating an infinite set of random
numbers. If you were to build an infinitely large
set of random numbers right from the start,
however, you'd use an infinite amount of memory
and time to generate it. Instead, you can use lazy
or deferred evaluation to only generate one
random number at a time.
In Scheme, lazy evaluation can be achieved by
encapsulating variables within a scope and with
the delay procedure. Other languages provide
support for lazy evaluation as well, however;
in particular, Python's iterator data structure
represents this idea in that language, and in
Python 3, a common example is the
range function.
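The random number example can be sketched with a Python generator, which produces each value only when it is asked for. The function name random_numbers and the seed value are invented for this illustration; random.Random and itertools.islice are standard library tools.

```python
import itertools
import random

def random_numbers(seed=None):
    # An endless stream of random numbers. Nothing is computed
    # until a consumer asks for the next value.
    rng = random.Random(seed)
    while True:
        yield rng.random()

stream = random_numbers(seed=42)

# Take just three values from the conceptually infinite stream.
first_three = list(itertools.islice(stream, 3))
```

Only three numbers are ever generated here, however long the stream could in principle run.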

to write new procedures for each and every symbol you want
to check for. This is bad because repetition leads to mistakes;
it's also bad because it makes programming, normally a fun
and challenging hobby, into a boring one! To fix this, you can
create a more abstract, or more general, version of the
procedure. For example, consider what the differences would
have been between car-in-wish-list and laptop-in-wish-list:
the only thing to change between the two would be the
symbol car in the second line, which would become laptop.
Recognising this, you can easily create a more general
version by adding a new parameter to the procedure: the
symbol to search for in the list.
(define (in-wish-list symbol wish-list) ...
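As an illustration of the same abstraction step in Python, here is one way a general membership test might look. This is a sketch written for this example, not a translation of the article's elided Scheme body, and it uses ordinary Python lists of strings for the wish lists.

```python
def in_wish_list(symbol, wish_list):
    # Base case: an empty list cannot contain the symbol.
    if not wish_list:
        return False
    # Check the current element against the symbol we were given.
    if wish_list[0] == symbol:
        return True
    # Otherwise apply the same process to the rest of the list.
    return in_wish_list(symbol, wish_list[1:])

wish_list_a = ["lxf-subscription", "1984", "car"]
wish_list_b = ["laptop", "holiday", "wallet"]

has_car = in_wish_list("car", wish_list_a)
has_laptop = in_wish_list("laptop", wish_list_a)
```

One procedure now answers the question for any item, where before each item needed its own.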

A higher-order procedure
This was an easy example, but in many situations featuring
repetition, the same techniques work: look at examples of the
similar procedures, note what changes between them, and
then abstract these out of the body of the procedure and into
the parameters. In fact, exactly the same technique can be
applied when creating higher-order procedures.
Consider the following procedure, which extracts all the
even numbers from a list and returns them as a new list:
(define (ev? list)
  (cond ((null? list) '())
        ((= (remainder (car list) 2) 0)
         (cons (car list) (ev? (cdr list))))
        (else (ev? (cdr list)))))
Again, in itself this is quite a clever procedure, but it could
be made more general or abstract. What would happen if you
wanted to create a new procedure that would extract all the
odd numbers from a list, or all the numbers that are divisible
by 3? You'd once again find yourself writing three almost
identical procedures: first checking for the empty list, then
checking to see if the current item in the list matches your
criteria, whether even, odd or divisible by 3, and then moving
on to the next item if it doesn't match.
As before, you can create a more abstract version of this
procedure by looking for the differences in similar functions.
Unlike in our last example, this time the difference actually
comes in the predicate used to check whether the current
item meets our requirements, which seems altogether more
difficult to abstract than a simple data variable.
The thing is, it's really no more difficult, since in Scheme
predicates are just procedures, and procedures are first-class
objects and can be passed into other procedures just like any
other. This means that we can abstract the above procedure
in exactly the same way as before, by abstracting the
differences into parameters.
(define (filter list pred)
  (cond ((null? list) '())
        ((pred (car list))
         (cons (car list) (filter (cdr list) pred)))
        (else (filter (cdr list) pred))))
As you can see, we've given the procedure a new name,
filter, as it more accurately represents what this new, more
abstract version does. We've also created a new parameter
called pred, a placeholder for the predicate that's used to do
the work of the function: to determine whether or not a given
element meets our criteria. With this in hand, we can easily
find all even numbers, or all odd numbers, or any other category
of number, simply by writing a new predicate to do the work,
without nearly so much duplication:
(define (ev? list)
  (define (z x)

Functional programming
Many of the examples we've looked at
over the last few months have involved
mathematical problems. These kinds
of problems suit pure functional
programming well, since most
mathematical functions don't involve
side-effects. What's more, mathematical
examples have allowed us to focus on
patterns and techniques, as opposed
to libraries for interacting with more
complex data types, such as files,
images or web pages, since numbers
are built straight in to most
programming languages.
Because of this focus, you might have
come to the conclusion that while
functional programming is an
interesting novelty, it's not much use in
the real world. You would be wrong!
The recursive programming model is
well suited to working with files and
directories, while list or stream
processing can easily be applied to text
file processing or network programming.
For a good introduction to the
application of functional programming
to real-world problems, there are two
good free books. The first is Text
Processing in Python (available at
http://gnosis.cx/TPiP/) and uses
Python's functional features to do
increasingly complex text processing.
This might not sound that interesting,
but all the configuration files on Linux
systems are made of text, and so are
web pages and many of the protocols
that power networks. Being able to
effectively do text processing with
functional techniques immediately
opens up a whole world of possibilities.
The other book is called Real World
Haskell (http://book.realworldhaskell.org).
Haskell is a functional
programming language that's increasing
in popularity. Its syntax isn't as simple
as Scheme's, but it has a range of
libraries that let you do anything
from systems to GUI to database
programming using techniques that
should now be familiar to you.

[Figure: flow diagram for car-in-wish-list, with the steps
"wish-list is null?", "(car wish-list) equals car?", "cdr wish-list",
"Return true" and "Return false". This flow diagram shows how the
recursive procedure car-in-wish-list goes about checking to see
whether a list contains a given item.]

    (= (remainder x 2) 0))
  (filter list z))
(define (threes list)
  (define (z x)
    (= (remainder x 3) 0))
  (filter list z))
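For comparison, the same idea can be sketched in Python. The name my_filter is used here to avoid clashing with Python's built-in filter, and the argument order (list first, predicate second) follows the article's Scheme definition rather than Python's own convention.

```python
def my_filter(items, pred):
    # Base case: nothing left to test.
    if not items:
        return []
    # Keep the current item only if the predicate approves of it.
    if pred(items[0]):
        return [items[0]] + my_filter(items[1:], pred)
    return my_filter(items[1:], pred)

def evens(items):
    return my_filter(items, lambda x: x % 2 == 0)

def threes(items):
    return my_filter(items, lambda x: x % 3 == 0)

even_numbers = evens([1, 2, 3, 4, 5, 6])
multiples_of_three = threes([1, 2, 3, 4, 5, 6])
```

Each new category of number now costs one short predicate rather than a whole new recursive procedure.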
Filter is one example of a higher-order procedure. It's also
one of three procedures designed to operate on lists, which
when combined together can be made to do an amazing
number of things for very little effort on your part. Let's look
at these other procedures now.

Higher-order procedures
The first of these other procedures we're going to look at is
map, another useful higher-order procedure for operating on
lists, which lets us modify all the elements in a list.
(define (map list proc)
  (cond ((null? list) '())
        (else (cons (proc (car list))
                    (map (cdr list) proc)))))
With this procedure, you could pass a list of numbers in
and return a new list with the square of all the original list's
elements contained within, find the absolute value of all
those items (that is, removing the sign from all the numbers),
or anything else that you can imagine.
We've shown you this example map procedure so you can
get an idea of how it works, but Scheme actually has a built-in
version of it that is more flexible and more powerful, capable
of applying the given procedure to the elements of an
arbitrary number of lists. It's well worth investigating.
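A Python sketch of the same procedure follows. The name my_map is an invention of this example, and its argument order copies the article's Scheme definition. The last line uses Python's real built-in map, which takes the procedure first and, like Scheme's built-in, accepts more than one list.

```python
def my_map(items, proc):
    # Base case: an empty list maps to an empty list.
    if not items:
        return []
    # Transform the current element, then map over the rest.
    return [proc(items[0])] + my_map(items[1:], proc)

squares = my_map([1, -2, 3], lambda x: x * x)
absolutes = my_map([1, -2, 3], abs)

# Python's built-in map can walk several lists in step.
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
```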
The second of these is a bit more complicated: it's called
accumulate. It compresses an entire list to a single element.
To see why this is useful, consider the case of summing all the
values in the list. You could do this with an independent
procedure, sum:
(define (sum list)
  (cond ((null? list) 0)
        (else (+ (car list)
                 (sum (cdr list))))))
But then the same problem we've seen throughout this
article rears its head again: what happens if you want to find
the product of all the elements in the list, for instance? Once
again, you'd have to write a new procedure, product.
While it's a touch more tricky, with more variables to keep
track of, the same technique of looking for differences
between similar functions, such as sum and product, makes
creating a higher-order procedure simple.
On this occasion, the differences lie in the base case, and
the procedure that's applied to the current element of the list
and the rest of it. In sum, for instance, the base case would be
0, as above, but in product it would be 1 (think about it: if you
kept the base case as 0, you'd get 0 as your answer every
time, since anything multiplied by 0 is 0!).
Thinking it through, you'll find yourself with a procedure
like the one below:
(define (accumulate proc base list)
  (cond ((null? list) base)
        (else (proc (car list)
                    (accumulate proc base (cdr list))))))
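Here is a rough Python equivalent, taking its arguments in the order procedure, base case, list. The function is written out by hand for illustration; Python's standard library offers functools.reduce for the same job. operator.add and operator.mul are simply the function forms of + and *.

```python
import operator

def accumulate(proc, base, items):
    # Base case: an empty list collapses to the base value.
    if not items:
        return base
    # Combine the current element with the accumulated rest of the list.
    return proc(items[0], accumulate(proc, base, items[1:]))

total = accumulate(operator.add, 0, [1, 2, 3, 4])
product = accumulate(operator.mul, 1, [1, 2, 3, 4])
```

Note how the base value changes with the operation: 0 for a sum, 1 for a product.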
After looking at all these procedures, you might be feeling
a bit lost. They all look quite similar, and none of them seem
incredibly powerful or useful alone, even if the idea of
writing one procedure to do the work of many seems like
a clever idea.
Let's take a look at an example to see how these
higher-order procedures can let us express complex ideas in
clear and simple ways.
Our example task will be to find the sum of all the cubes
of numbers which are multiples of three between any two
numbers. If we were to write a procedure to accomplish this
without the help of any higher-order procedures, it might look
something like this, without all the helper procedures'
definitions:
(define (sum-cubes a b)
  (cond ((> a b) 0)
        ((= (remainder a 3) 0)
         (+ (cube a)
            (sum-cubes (+ a 1) b)))
        (else (sum-cubes (+ a 1) b))))
After everything else we've seen over the last three
instalments, with a bit of careful thought you can probably get


Functional programming in Python


As we hope you've seen, Scheme is a powerful
little language with a simple syntax that makes
it easy to pick up and do very clever things with.
That said, it doesn't have the same level of
adoption and library support that many other
popular languages enjoy. If the functional
techniques demonstrated by Scheme appeal to
you, however, then you may want to investigate
applying some of them in a mainstream
language, such as Python or Perl.
In Python, for example, there are many
functions provided explicitly for functional
programming, including map, filter, reduce (aka
accumulate, found in the functools module in
Python 3) and enumerate.
There are also many useful constructs,
including generators, list
comprehensions and iterators. In fact, things like
iterators are deeply integrated in to the language,
as common objects such as open files
are themselves iterators.
If you're interested in delving deeper into
functional programming in Python, we'd
recommend the functional programming
HOWTO on the Python website as a good starting
point (http://bit.ly/TRyWMF).
For getting a greater understanding of Perl,
another very popular language, there's an entire
book dedicated to the subject, called Higher
Order Perl (http://hop.perl.plover.com/), and
it's an excellent read.
If this chapter has piqued your interest in
functional programming, a book like Higher
Order Perl can show how these techniques
work in a more familiar language.

[Figure: box-and-pointer diagram of "map cube" applied to a list.
The map procedure lets you transform the elements of one list
into another, making use of a helper procedure to do the work of
transforming the value of the current element. The final cdr has a
strike through it, representing (), the empty list.]
your head around this, but it's not the most readable piece of
code you'll ever see. One of the main reasons for this, despite
the use of clear procedure names such as cube, is that the
overall logic is muddled and obscured by details not relevant
to this specific task.
For instance, the first, fourth and fifth lines of this
procedure are all involved in walking the procedure through
the integers between a and b. Wouldn't it be simpler if this
part of the process could be kept to one small area of code,
rather than spread through the rest of it? What's more, if you
came to this code without any guidance, you might find
yourself tripped up by the remainder statement or by the two
recursive calls. It's not exactly clear what the purpose of any
of these lines is.
You may have noticed another significant problem with
this version of the procedure: it's not very re-usable, so if you
ever wanted to complete a similar task, such as finding the
sum of the cubes of multiples of four, you'd have to write an
entirely new procedure, all the while watching for typos and
other mistakes. And if there were any mistakes, because all
the parts of the code are mixed together, it would be very
difficult to debug.
Fortunately, it doesn't have to be this way: with the help
of the higher-order procedures discussed in the rest of this
article, you can come up with a much better solution.
(define (sum-cubes-better a b)
  (accumulate +
              0
              (map cube
                   (filter multiple-three
                           (enumerate a b)))))
The only thing that we've not seen so far in this version is
the enumerate procedure at the very end. This simply
creates a list containing all the integers between a and b.
It's not as flexible as the other procedures, but the idea of a
procedure that enumerates things is very useful. Note that
map and filter are called here with the procedure first and the
list second, the order that Scheme's built-in map expects,
rather than the order used in our example definitions.
At first glance, and while you're still unfamiliar with exactly
what these different procedures do, this may not seem any
easier to read. If you do find that to be the case, first, try to
think about what each of these procedures does and not
how they do it. If you try to do the latter, you'll get caught
in a maze of recursive calls that's almost impossible to
parse manually.
Then, start from the bottom of the procedure and
work backwards:
The enumerate procedure creates a list of all the integers
were interested in.
Then the filter procedure creates a new list, based on the
one generated by enumerate, that only contains multiples
of three.
The map procedure then transforms this list, cubing every
element in it.
Finally, the accumulate procedure sums all the elements
contained.
Seen like this, each step of the process is expressed
distinctly far more so than the previous example and
once youre familiar with the job each of the higher-order
procedures does, its much easier to read, too, since all the
inner-workings of recursion and conditional testing have been
hidden away.
Its also an easier procedure to write, since each step of
the process is contained in an independent procedure, each
part can be tested and debugged independently. Finally, all
of these procedures can be re-used elsewhere, reducing
mistakes from repetition and making programming more
fun and less boring. Q

PHP

PHP is one of the most popular programming languages around, and is
the secret sauce that powers many millions of websites. Follow our
series of tutorials and find out how you can get in on the act.
PHP: Write your first script
PHP: Build an online calendar
PHP: Extend your calendar
PHP: Get started with MySQL
PHP: Do more with MySQL

PHP: Write your first script
Using this open source technology and your Linux platform, Mike Mackay
explores how to dive into the popular world of PHP programming.

PHP dates back to 1995 when its creator, Greenland-born
programmer Rasmus Lerdorf, began work on a
scripting toolset that was originally known as Personal
Home Page (PHP). The sudden demand for the toolset
spurred Rasmus to further develop the language and, in 1997,
version 2.0 was released with a number of enhancements
and improvements from programmers worldwide.
The version 2.0 release was hugely popular and spurred a
team of core developers to join Rasmus in developing the
language even further.
Version 3.0, released in 1998, saw a rewrite and release of
the parsing engine, and in the same year it was estimated that
more than 50,000 users were using PHP on their web pages.
This version also saw the change to the name that we know
now, PHP: Hypertext Preprocessor.
Fast-forward a year to 1999 and, with an estimated base
of more than one million users, PHP was rapidly becoming
one of the most popular languages in the world. Development
continued at a frenzied pace, with hundreds of functions
being added. It was at this time that two core developers,
Zeev Suraski and Andi Gutmans, decided to rethink the
way that PHP operated, and so the parser was once again
rewritten, dubbed the Zend scripting engine, and released
in version 4.0.

A few months after version 4.0 was released, Netcraft
estimated that PHP had been installed on more than 3.6
million domains. Version 4.0 represented a massive leap
forward at an enterprise and programming level, but the
language still had some drawbacks, mainly due to its infancy.
Version 5.0 was released to the world in 2004, and with it
came a myriad of improvements taking the language to a
maturity and installation peak; it's thought that PHP is
running on more than 20 million domains and it's reported
that it's the most popular Apache module, available
on almost 54% of all Apache installations. At the
time of writing, version 6.0 is nearing public release
and is intended to further
improve the functionality and maturity of the language.
With websites such as Wikipedia, Facebook,
Flickr and Digg all making use of it, it's no wonder
that PHP has become so widely adopted amongst web
developers. Let's see just how easy it is to get started with
this dynamic, server-side scripting language.

Setup and installation
Most of the latest distributions of Linux come with PHP, so
this tutorial assumes that it's already been installed and set
up on your Linux platform and is being parsed correctly
through your web server of choice.
Although you can run PHP scripts via the command line,
we'll be using the browser (and therefore a web server) for
this tutorial. You can follow along by uploading your PHP files
to a web server on the internet (if you have one). We're using
a default installation of Apache2 on our local Linux machine,
though, because we find it easier and quicker to write and
test PHP on a local machine instead of having to upload files
via FTP each time.
If you require installation and/or setup instructions or
guides for your local machine, we recommend reading the
"Installation on Unix Systems" manual on the official PHP site,
available at http://php.net/manual/en/install.unix.php.
Alternatively, there are hundreds of installation guides
written for pretty much every flavour of Linux. Simply search
Google for your distribution if the official guide doesn't tick
all the boxes.

The PHP website might not be the most eye-catching in the world, but it'll be a
site you return to time and time again.

Getting started
Now we get to the fun part: working with, and writing, our
first PHP script. Historically, we'd write a basic "Hello World"
script, but that can be, well, a little boring. Instead we'll write
some dynamic text output using PHP's date() function.
Before we get into the real nitty gritty of the language, we
must first understand how the interpreter reads in our PHP
code and generates the necessary output.

Easy embedding
One of the advantages of PHP is that you can embed your
code directly into your static HTML pages; the entire page
is then sent to the interpreter. It's extremely
important to note that all of your PHP files must end in the
.php extension. If you embed your PHP code in HTML or HTM
files, they won't be run through the interpreter and
won't get executed. Instead, you'll just see the plain text
code in your pages.
For your PHP code to be extracted from the rest of your
content, it must be enclosed within delimiters. PHP will
execute any code found within these delimiters; anything
else is simply passed through untouched by the interpreter.
The default, and most common, delimiters that we use are
<?php to indicate the start of our code and ?> to signal the end.
There are a few other options available for delimiters,
such as short tags, but some of these can have implications
with XML and XHTML languages. For the purpose of this
tutorial, we're going to stick with the recommended default.
With this in mind, open up your favourite text editor and write
the following:
<?php echo "Welcome to the world of PHP"; ?>
Save this file as welcome.php in your web server's root
folder; that is, the folder that your web server reads when
you request the site in your browser. Once the file has been
saved, open your web browser and point it to the file on your
local web server. For me this is http://127.0.0.1/welcome.php,
but this URL may differ based on your Linux setup/
configuration. When run, you should simply see "Welcome to
the world of PHP" displayed in your browser.
If, instead, you see the raw PHP code, this means that
you haven't set up your web server to interpret your PHP files
correctly. Go back, or find the relevant installation guide, and
make sure you've followed all the steps outlined. If you
successfully see the text without any PHP code, then we're
ready to move on.

Syntax, data types and functions
You'll notice that we ended our code with a semi-colon
before the closing delimiter. PHP uses a semi-colon to
indicate the end of a line of code, or statement. Without
it, PHP wouldn't know where one statement stops and the
next begins, in turn breaking the script. PHP is very forgiving
when it comes to formatting: it will ignore any white space and
new lines (except when they're contained inside string
quotes), allowing you to be pretty free over how you
format (indent etc) your code.
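To see these rules in action, here's a minimal sketch (the variable names are ours, purely for illustration) showing two statements sharing a line, one statement spread over several lines, and white space inside quotes being kept:

```php
<?php
// Two statements on one line: each ends with its own semi-colon.
$greeting = "Welcome"; $subject = "PHP";

// One statement spread over several lines. PHP ignores the extra
// white space and new lines between the parts.
$message =
    $greeting
    . " to the world of "
    . $subject;

// White space inside the quotes is part of the string, so it is kept.
$spaced = "Welcome   to   PHP";

echo $message . "\n";
echo $spaced . "\n";
```

We've left off the closing ?> here, which PHP allows when a file contains nothing but PHP code.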
PHP supports many data types, giving us enormous
flexibility when writing our programs. To quote Wikipedia:
"In computing, a data type is a classification identifying
one of various types of data, such as floating-point, integer
or Boolean." PHP supports all of these data types and more,
including strings and compound data types such as
Arrays and Objects.
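If you'd like to check what type PHP thinks a value is, the built-in gettype() function will tell you. The values below are just examples we've picked; note that PHP reports floating-point numbers as "double":

```php
<?php
// Ask PHP to name the type of a few example values.
$integer_type = gettype(29);                     // "integer"
$float_type   = gettype(3.14);                   // "double"
$boolean_type = gettype(true);                   // "boolean"
$string_type  = gettype("Welcome");              // "string"
$array_type   = gettype(array("Red", "Orange")); // "array"
$object_type  = gettype(new stdClass());         // "object"

echo "29 is an " . $integer_type . "\n";
echo "3.14 is a " . $float_type . "\n";
echo "true is a " . $boolean_type . "\n";
```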
In the standard distribution of PHP, there are more than
1,000 functions available to use. These range from simple
things such as date & time functions (which we're using in
this tutorial) to more advanced concepts such as LDAP and
MySQL database functionality. For anything that's missing
(or something you want to improve) you can simply roll your
own function to give additional support. We won't go into too
much detail about functions here as we'll be focusing on the
basics and keeping things simple.

Why choose PHP?
PHP has long been a popular choice for
web developers. Not only does it have
a massive userbase, meaning that
support for developing (and
debugging) your code is widely and
freely available, but web servers or
hosts with PHP installed and ready to
use are ten-a-penny these days.
The relative ease of the language
has been one of the reasons for its
massive uptake. PHP can also be very
forgiving when it comes to
programming. For instance, you don't
have to declare your variables (or their
type) before using or instantiating
them and there are a couple of other
ways in which you can bypass
traditional programming methods.
The easiest way to understand why
PHP is so popular is to simply dive in
and get started.
If you've already worked with
another programming language you'll
soon see just how easy PHP is to get to
grips with. If this is your very first
language, you'll be pleasantly
surprised at just how quickly you can
get usable results.

Flexible scripts
In our first example, we simply instructed PHP to output a
specific string of text by using the echo function. The value
of our string can come from many places: a database,
the output of a function, a file on the server or even from
user interaction on our site. By hard-coding this value we're
pretty much stuck on what the string value can be. Instead
we'll now assign the value to a variable, so open up a new file
in your text editor, enter the code below and save it as
welcome-var.php:
<?php
$display_text = "Welcome to the world of PHP";
echo $display_text;
?>
When you run this script you shouldn't see any difference
in output from the first file, but in the code you'll notice a new
line starting with $display_text. This is a variable. Variables in
PHP can hold a single piece of data at any one time. This
data can change and can be of any type at any point.
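As a quick sketch of that last point, the same variable can hold a string one moment and a number the next. The variable name $value and the values are our own, chosen only for illustration:

```php
<?php
// The variable starts life holding a string...
$value = "Welcome to the world of PHP";
$type_at_first = gettype($value);  // "string"

// ...is then given a whole number...
$value = 29;
$type_later = gettype($value);     // "integer"

// ...and finally a Boolean. Only the latest value is kept.
$value = true;
$type_finally = gettype($value);   // "boolean"

echo $type_at_first . ", then " . $type_later . ", then " . $type_finally . "\n";
```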

Quick tip
Use a text editor that has syntax highlighting for PHP; it'll help
you quickly identify your code and specific parts, or functions,
inside it. There are free programs available too, so have a look
around and go with the one you prefer the look of.

Believe it or not, we used to write all our PHP code in Notepad, then we realised
how much nicer life is in colour. Text editors are heaven to coding eyes.

You can clearly see the rapid growth in PHP usage, and the number of
installations is still on the rise.

Variables begin with a $ followed by the variable name,
in this case, display_text. A variable name can only begin with a
letter or an underscore but the rest of the name can consist
of any letters, underscores or numbers. An important note
to be aware of is that variables are case-sensitive, meaning
that $Display_text is different (and a separate variable)
from $display_text.
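Here's a small sketch of those naming rules. Apart from $display_text, the variable names are made up for the example:

```php
<?php
// Same letters, different capitalisation: two separate variables.
$display_text = "lower-case d";
$Display_text = "capital D";

// A leading underscore is allowed, as are digits after the first character.
$_note = "starts with an underscore";
$text2 = "ends with a digit";

// Changing one variable leaves the other untouched.
$display_text = "changed";
$are_separate = ($display_text !== $Display_text);

echo $display_text . " / " . $Display_text . "\n";

// A name such as $2text would be a syntax error, because it starts with a digit.
```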
In our script above, we're declaring and assigning the
value simultaneously. Some languages do not allow this;
however, PHP is very flexible when it comes to
programming. Value assignment is simply the
process of copying a value
to the assigned variable, such as:
$display_text = "Welcome to the world of PHP";
$my_age = 29;
In some languages you would declare your variable before
assigning a value, but given the nature and context of this
tutorial it's acceptable to do the above.
The next line in our script simply changes from echoing
the hard-coded string to echoing the value of the variable
(that we assigned in the previous line). Although they
essentially do the same job of outputting the string value, this
method allows us to be truly flexible with the output we see in
the browser.

Quick tip
Where possible, make use of indenting as it will make things flow
better and help you read the code on the page. Some text editors
auto-indent for you but most developers use either a single tab
or two to four spaces.

Time gentlemen, please
So far, so good... but static text is pretty boring. Let's do
something about that and add the date and time into the
mix. For this, we're going to make use of two things: the
first is the date() function and the second is the
concatenation operator. To concatenate means to combine
two or more things together to form one single entity; in this
script we're going to combine a welcome message with the
date and time.
To do this, open up a new file in your text editor and enter
the following code. Once you've done that, save it as
welcome-date.php:
<?php
$display_text = "Welcome to the world of PHP. It is ";
echo $display_text . date("l, jS F") . ", and the time is " . date("H:iA");
?>
When run, you should see the line of text with the
current date and time embedded. It's important to note that
the date and time come from the server and NOT your
browser, as they would from a JavaScript program (or similar
browser-based script).
The first line of our code is almost identical to the previous
script; all we've changed here is the copy to reflect the more
dynamic nature of the output. The second line is where the
magic (also known as concatenation) happens. Essentially,
and in non-computer talk, all the second line is saying is
"print the display text, followed by the date command, then
some more text, and finally append the time".
At this point you might be wondering how we've
specified the date that gets printed. That is all to do with the
parameters that we supply to the date function. PHP's date
function accepts input parameters that determine
exactly what value/string is returned from the function.
It currently accepts 35+ date parameters, each one
representing a unique piece of date and/or time. In our
example we've split the date and time into two different
date() calls; it would be perfectly acceptable to merge
them into one.
The more eagle-eyed amongst you will notice that I've
hard-coded the text in the middle of the date functions. Again,
this could be assigned to a variable instead (as we did for the
initial text), for greater flexibility, in which case the line might
look something like:
echo $display_text . date("l, jS F") . $secondary_text . date("H:iA");
For a full list of date input parameters, check out the
date() function page of the official PHP docs:
http://php.net/manual/en/function.date.php.
Essential resources
There are plenty of books dedicated to learning PHP and it's
often hard to tell which one(s) to buy to steer you in the right
direction. While I can't help you choose the book that suits you
the most, I can point you in the general direction of great
companion websites:
http://php.net The ultimate resource for anything PHP related.
http://php.net/manual/en/intro-whatcando.php A taste of things
that can be done with PHP.
http://phpsec.org/ A great resource for anything security related
with PHP.
Despite owning a few PHP books, I often find myself heading over
to the official PHP documentation online; it's often quicker than
picking up a book and looking for the right page. For some
inspiration, check out the second link, let your mind wander and
think of something you'd love to build!
The last link is equally important if you plan on installing PHP
on a public-facing web server. Install the script, as it gives you
some recommended base settings, then read up on general
security practices.

PHP 6: what's it all about?
So what can we expect in version 6? One of the core updates will
be better support for Unicode strings, allowing for a much broader
set of available characters to cover greater international support.
For the more advanced developers, it's bringing in better support
for Namespaces. With the massive take up of Web 2.0 functionality,
version 6 is also giving default support for the SOAP protocol, and
the library of XML features (for both reading and writing) is being
overhauled.
A handful of features are being dropped from the core language;
these include magic_quotes, register_globals, register_long_arrays
and safe_mode. The main reason is security related: some features
allowed for potential security holes to be exposed while others
led to poor programming practice.
You can already download a developer version of PHP 6 to try out,
but at the time of writing there's no official public release date.
Once it's been released, you can expect a good wait before it's
available on public servers as most companies let it have a good
run to iron out any bugs before installing it.

Put it all together
We mentioned earlier that one of PHP's great selling points is
the ability to embed snippets of code into static HTML
documents with ease. This becomes apparent when we want
to create dynamic sections inside an otherwise static page,
for example, our date script above. We could easily drop this
code into our existing template (if we have one). Let's see
how that might look:
<html>
...
<body>
<div id="welcome-text">
<?php
$display_text = "Welcome to the world of PHP. It is ";
echo $display_text . date("l, jS F") . ", and the time is " . date("H:iA");
?>
</div>
...
</body>
</html>
In this case, I've pasted some trimmed and rudimentary HTML code
and you can clearly see where I've embedded the PHP code to
output my dynamic text on the page. The PHP code block can sit
anywhere on the page and appear any number of times inside a
page; don't be concerned if you have five, 10 or sometimes more
code blocks within your HTML.
Something that's really handy is that internally, PHP will
communicate between each code block on your page. For
example, if you set the value of a variable in the first code
block at the top of your page, it will be available to the last
code block at the bottom of your page. This can work
wonders when you're altering the display of the content
based on the value of a variable elsewhere in the page. A
really common use of this is a Login/Logout system where a
user is presented with the Login or Logout options based on
their logged in state.
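As a rough sketch of variables being shared between code blocks, the page below sets a variable in one block and reads it in a later one. The $logged_in flag, the markup and the link text are all assumptions for illustration; a real site would work out the login state from a session or a database.

```php
<?php
// First code block, near the top of the page: set the state once.
$logged_in = false; // hypothetical flag, hard-coded for this sketch
?>
<div id="header">My site</div>
<?php
// A later code block on the same page can still read $logged_in.
if ($logged_in) {
    $link_text = "Logout";
} else {
    $link_text = "Login";
}
echo "<a href=\"#\">" . $link_text . "</a>\n";
```

Change $logged_in to true and the same page offers the Logout link instead.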

Multiple updates
Don't forget when doing this that you must save your files
with the .php extension, otherwise your PHP code will fail to be
executed and you'll be left with plain text code on your page.
If you find that you're adding the same block of code to
multiple pages and you need to change it, going through each
page and updating your code can be a treacherous job.
Thankfully, PHP has got you covered. To help with this
process, we can use the include() function. This allows us to
write our PHP to a file (just as we have in our welcome-var.php
file). Instead of embedding the full code in our HTML
each time, we can instead do:
<?php include("welcome-var.php"); ?>
When the page is run, PHP will pick up the include()
request and will read in and execute the code in that file,
simply embedding the output; it works as if the code was
directly on the page.
A note of caution: the filename specified in the include()
function is relative to the script that's calling it. In other words,
if your main HTML is located in the root folder and your
welcome-var.php file is in a folder called scripts, your PHP
code would look like this instead:
<?php include("scripts/welcome-var.php"); ?>
While it's not quite rocket science, we've actually
covered some pretty decent fundamentals about PHP in
this tutorial. We have learned a little bit about how PHP got
started and just how much it's grown. You've been
introduced to some basic, but core, programming
skills and we've covered the basic syntax. By now you're
hopefully beginning to understand a little of PHP's
potential, and be ready to tackle the next part of this chapter.

Over to you...
We've written our first script and now have first-hand
experience of how easy it is to make use of PHP on a website.
From here, why not play around further with the date
example? Try changing the input parameters to something
different or even try embedding this code into an existing
site. Take a few minutes to have a look through the official
PHP documentation online, www.php.net/manual/en,
and see what other functions PHP has to offer; you'll be
surprised how much you can achieve with just the standard
installation. It's worth bookmarking that URL as the more
you use PHP the more you'll use the website as a reference
manual, and a great one at that.

Facebook loves PHP so much, it even wrote its own Facebook-optimised
version called HipHop.


PHP: Build an online calendar
Continuing from his introductory tutorial, Mike Mackay explores arrays and
functions to build a basic events calendar for our website.

In the last tutorial, we covered the basics of PHP, including
how the language was created and subsequently grew. We
were also introduced to various parts of the language,
such as variables, strings, integers and PHP's internal date()
function. In this tutorial, we'll expand on those parts, but we'll
also introduce the concept of arrays and functions to make a
fully working calendar.
We'll assume that you have your Linux platform
configured and serving PHP pages to your web browser,
as outlined in the previous tutorial. If not, please refer to
the previous article or the section titled "Installing PHP
on Linux".
So what exactly is an array? Well, to help us define
this, let's go back to the last tutorial, where we made use of a
variable ($display_text) to hold a simple string message. The
problem with variables is that they can hold only one piece of
information at any one time. Wouldn't it be great if we could
store multiple items inside a variable? Well, this is where
arrays come in.

Introducing arrays
The best way to think of an array is as a special variable that
holds other variables. An array allows you to hold as many
items inside it as necessary (the only limitation on the size of
the array is based on how much memory PHP is allocated).
You can step through all the items inside an array (known as
traversing), and PHP comes with more than 70 functions,
allowing you to perform certain actions on your array, such as
searching inside, counting the number of items inside it,
removing duplicate items, and even reversing the order.
There's almost nothing to it when creating an array either:
$data = array();
We have now created a new, empty array called $data. Arrays
are structured using a key index and value data architecture. By
default, when you add an item to an empty array, that item's
position in the array is 0. If you add another item, that item's
position becomes 1 in the array.
You can also create your array with pre-populated data (if
you already know what's going to be in it). To do this, we just
create the array as before, but this time we supply the data in
a comma-separated list:
$data = array("Red", "Orange", "Yellow", "Green", "Blue");
This is where the key system comes into play. The way
that PHP interprets this array will be as follows:
0 = Red, 1 = Orange, 2 = Yellow, 3 = Green, 4 = Blue
As you can see, each key is associated with the value in the
array. The most important part to remember is that arrays
always start at key 0 and not key 1 as many might assume;
it's easy at first to forget.
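To give a flavour of the array functions mentioned above, here's a small sketch that counts, searches, de-duplicates and reverses a list of colours. We've added a second "Red" to the data purely so there's a duplicate to remove:

```php
<?php
// A list of colours with one deliberate duplicate at the end.
$data = array("Red", "Orange", "Yellow", "Green", "Blue", "Red");

// Keys start at 0, so the first item lives at key 0.
$first = $data[0];

// count() returns the number of items in the array.
$total = count($data);

// array_search() returns the key of the first matching value.
$yellow_key = array_search("Yellow", $data);

// array_unique() removes duplicate values, keeping the first occurrence.
$unique = array_unique($data);

// array_reverse() returns the items in the opposite order.
$reversed = array_reverse($unique);

echo "First: " . $first . ", total: " . $total . "\n";
echo "Yellow is at key " . $yellow_key . "\n";
echo "Reversed: " . implode(", ", $reversed) . "\n";
```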

Despite the many books that have been written on the language, the PHP
website is still the most up-to-date and comprehensive reference manual
available.

Associative arrays
Arrays also have the flexibility of allowing us to specify our
own keys (known as associative arrays). This helps a lot when
you want to store a value against a specific key instead of
having to rely on automatic indexes.
Let's say we wanted to store data about a person in an
array; to do that we can do the following:
$person = array("name" => "Mike Mackay", "location" => "Essex", "age" => 29);
By using the associate instruction (=>), we're telling PHP
that we want to create a key called name and store the value
Mike Mackay against it. You can store any data type in an
array, even other arrays. The way that PHP interprets our
person array is how we would expect:
name = Mike Mackay, location = Essex, age = 29
When you want to use an array item, all you have to do is
call the array and key you want:
echo $data[0];
This echoes out the word Red to the screen. To echo out
the word Orange, you would simply change the key from 0 to
1. On our person array it's just as simple to echo the name
to the screen; all I need to do is:
echo $person["name"];
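With custom keys it's often useful to find out which keys an array has, or to check for one before using it. The sketch below uses the built-in array_keys() and array_key_exists() functions; the email key is a made-up example of a key that isn't there:

```php
<?php
$person = array("name" => "Mike Mackay", "location" => "Essex", "age" => 29);

// array_keys() returns a plain list of the keys in the array.
$keys = array_keys($person);

// array_key_exists() checks for a key without touching its value.
$has_age   = array_key_exists("age", $person);
$has_email = array_key_exists("email", $person); // hypothetical key, not set

// Values are read by key, just as with numbered arrays.
$summary = $person["name"] . " lives in " . $person["location"];

echo implode(", ", $keys) . "\n";
echo $summary . "\n";
```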

Add data to your array
If we have our existing array, but want to add more data to it,
how do we accomplish that? There are a few ways in which
we can do this and often it depends largely on whether your
array has custom key indexes or not; but to add another item
to our $data array we can simply do:
$data[] = "Indigo";
By supplying square brackets next to the array name, PHP
recognises this action as wanting to push a value into the
array. PHP has a built-in function that does the same trick:
array_push($data, "Indigo");
This function takes a minimum of two arguments. The first
is the array you want to push the data to, then any items
afterwards are pushed to the end of the array. This
conveniently allows you to push multiple items into the array
at once, for example:
array_push($data, "Indigo", "Violet");
If you need only to push one item, the first method (using
the square brackets) is recommended, as it has no system
overheads of calling a function. If we wanted to add another
item to our $person array, we need only to specify the key we
wish to use, along with the required value:
$person["profession"] = "Developer";
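Putting those pieces together, this short sketch shows where new items end up. The overwrite at the end isn't covered above; it's included as an extra to show that assigning to a key that already exists replaces its value:

```php
<?php
$data = array("Red", "Orange", "Yellow", "Green", "Blue");

// Square brackets push one item on to the end, at the next free key (5).
$data[] = "Indigo";

// array_push() does the same job and returns the new number of items.
$new_total = array_push($data, "Violet");

$person = array("name" => "Mike Mackay");

// A key that doesn't exist yet is created...
$person["profession"] = "Developer";

// ...while a key that already exists has its value replaced.
$person["name"] = "Mike";

echo implode(", ", $data) . "\n";
echo $person["name"] . " is a " . $person["profession"] . "\n";
```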

Arrays within arrays
As I mentioned before, an array can hold any type of item
inside; this includes other arrays. The practice of nesting
arrays is quite common, and you'll find it extremely useful.
Again, there are a couple of different ways of achieving this,
and the one you use will be based on your array structure.
For this example, let's say we have an array of McLaren F1
racing drivers. Open up a text editor, enter the following PHP
code and save it as drivers.php in your web root:
<?php
$drivers[] = array(
    "name" => "Jenson Button",
    "nationality" => "British",
    "championships" => 1,
);
$drivers[] = array(
    "name" => "Lewis Hamilton",
    "nationality" => "British",
    "championships" => 1,
);
?>
We're using the square brackets to instruct PHP that we
wish to push the driver data to the end of the array. Each item
inside the master array() must be separated by a comma.
Our $drivers array now contains two items; these items are
arrays of data containing driver information that we wish to
display. In PHP's eyes, the data for Jenson Button is located in
$drivers[0], while the data for Lewis Hamilton is located in
$drivers[1]. We could have created custom keys instead of
using 0 and 1, but it's not strictly worthwhile for this example.
Installing PHP on Linux
Most of the latest distributions of Linux come with PHP.
Although you can run PHP scripts from the command line,
we'll be using the web browser (and therefore web server) for
this tutorial. You can follow this tutorial by uploading your PHP
files to a web server on the internet (if you have one).
For me, though, I'm using a default installation of Apache 2 on
my local Linux machine. I find it easier and quicker to write and
test PHP on my local machine instead of having to upload files
via FTP each time.
If you require installation and/or setup instructions or guides
for your local machine, I recommend reading through the
"Installation on Unix systems" manual found on the official PHP
site, available at the following address:
http://php.net/manual/en/install.unix.php.
Alternatively, there are hundreds of installation guides written
for pretty much each flavour of Linux. Simply search Google for
your distribution if the official PHP guide doesn't tick all of
your boxes.

We could simply display the data using echo and then
specifying the array index key (such as $drivers[0]), but how
would we display each item when we don't know how long the
array is? Thankfully for us, there's a simple control structure
called foreach() that allows us to do this.
So you might be asking why we wouldn't know the length
of an array? Well, we know what driver data is contained in
each item (name, nationality and championships), but a
database query (or similar function) might return one driver,
or it might return five drivers. We could get the total
number of items in the array by using a PHP function, but
using foreach() is simpler and lets us write shorter code. The
foreach() control gives us an easy way to iterate over an
array. Using drivers.php that we've just created, copy the
code just below the PHP block that contains the drivers array:
<?php
foreach ($drivers as $driver) {
    echo "Name: " . $driver["name"] . "<br />";
    echo "Nationality: " . $driver["nationality"] . "<br />";
    echo "World Championships: " . $driver["championships"] . "<br /><br />";
}
?>
When you view the file in your browser, you should see a
basic list of drivers on your screen:
Name: Jenson Button
Nationality: British
World Championships: 1

Name: Lewis Hamilton
Nationality: British
World Championships: 1
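The foreach() control can also hand you each item's key alongside its value, using the form $array as $key => $value. That isn't used in drivers.php, so treat the following as an optional sketch; the numbered-list output and the $position and $lines names are our own:

```php
<?php
$drivers = array(
    array("name" => "Jenson Button", "nationality" => "British", "championships" => 1),
    array("name" => "Lewis Hamilton", "nationality" => "British", "championships" => 1),
);

// Collect one line of text per driver, numbered from 1 rather than 0.
$lines = array();
foreach ($drivers as $position => $driver) {
    $lines[] = ($position + 1) . ". " . $driver["name"];
}

// count() is the PHP function that tells us how many items there are.
$total = count($drivers);

echo implode("\n", $lines) . "\n";
echo "Total drivers: " . $total . "\n";
```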

Using a code editor that has a built-in syntax checker (such as Eclipse, the
winner of our IDEs Roundup in LXF152) can save a lot of time and frustration!

The 2012 F1 calendar we'll be recreating with our code.


On each loop of $drivers, the value of the current item is
assigned to $driver (the first loop being Jenson Button), and
foreach() then moves on by one; so on the next
loop you'll get the next item from the array (Lewis Hamilton).
This continues through each item in the array until the
end is reached.
The foreach() construct needs two things. The first
is the array we want to loop through. We then use the PHP
keyword as, followed by a temporary variable name that
we want to assign the loop item to (this variable is meant for
use inside the loop, although PHP leaves it holding the last item
once the loop has finished). In literal terms, we're saying: loop
through each item in the $drivers array and store each driver
item in a temporary array called $driver.
Hopefully, in the example you'll recognise a few parts from
the last tutorial; we're echoing out a string concatenated with a
variable, this being each bit of information about the driver
in the array. We then concatenate another string, which is
HTML, allowing us to format the output in a basic manner.
On the last array item, $driver['championships'], we echo
out two line breaks; this just gives us a bit of separation
between each driver.
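As a minimal sketch of the same loop, here is a self-contained version that also uses the $key => $value form of foreach(). The two-driver array is an assumption standing in for the one in drivers.php, and format_driver() is a hypothetical helper that isn't part of the tutorial's script.

```php
<?php
// Stand-in data: assumed to mirror the array created in drivers.php.
$drivers = array(
    array('name' => 'Jenson Button', 'nationality' => 'British', 'championships' => 1),
    array('name' => 'Lewis Hamilton', 'nationality' => 'British', 'championships' => 1),
);

// Hypothetical helper: builds the HTML for one driver instead of echoing it.
function format_driver($driver) {
    return 'Name: ' . $driver['name'] . '<br />'
         . 'Nationality: ' . $driver['nationality'] . '<br />'
         . 'World Championships: ' . $driver['championships'] . '<br /><br />';
}

// $index receives the array key (0, 1, ...) and $driver the current item.
foreach ($drivers as $index => $driver) {
    echo ($index + 1) . '. ' . format_driver($driver);
}
```

Returning the string from a helper, rather than echoing inside it, keeps the loop short and makes the formatting easy to reuse elsewhere.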

Let's talk about functions

There are two types of functions in PHP:
1 Built-in PHP functions, such as date() and array_push().
2 User-defined functions.
We'll be focusing on the second type for now (we've
already covered a few built-in PHP functions).
A user-defined function is a special block of PHP code that
we write, which can perform custom operations any time it's
called. Some functions are written to manipulate data and
then send the new value back, while others perform one-way
operations, such as writing data to a file or inserting the data
into a database. To create a function, all we need to do is
write the word function followed by the name we wish to give
our function (it's important to note that function names can
only start with letters or underscores), followed by
parentheses and then a pair of curly braces:
function shout() {
}
We can also supply data, known as arguments, to our
function to be used inside it. When calling this function and
sending information to it, the function assigns this data to the
internal variable called $text, where it can manipulate it, or
do as required. This data is then held locally to the function
and does not overwrite any variables outside of it:

function shout($text) {
}
If we want the newly-modified data back, we can use
return to send it back:
function shout($text) {
return $text;
}
To call a function, all you need to do is write the function
name followed by parentheses, either with or without any
arguments (based on the function's requirements):
shout('Hello World');
We have our basic function, but all it does is send back
exactly what we sent to it, which is pretty pointless, I'm sure
you'll agree. Let's make our function do something a bit more
interesting. Create a new PHP file, copy the following code
into it and save it as function.php:
<?php
echo shout('Hello World');
function shout($text) {
    return strtoupper($text);
}
?>
If you run that in your browser, you should see HELLO
WORLD being displayed. What's happening is that we're sending
a string straight to the function and echoing the returned
value out.
The built-in PHP function strtoupper() has a simple
purpose: take the string input and convert it to uppercase.
We could modify the function to perform the echo inside
instead of using return, but our original method gives us
greater flexibility for multi-purpose use (we may not always
want to echo a value out immediately).
We could write any kind of code inside our function, and
we're not limited to doing string transformations.
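To illustrate the point about data being held locally, here is a small sketch (not part of the tutorial's files) in which the function changes its own copy of $text while the variable of the same name outside the function stays as it was. The variable $loud is an assumption introduced only for this example.

```php
<?php
// Returns an upper-cased copy; the caller's variable is left untouched.
function shout($text) {
    $text = strtoupper($text); // changes only the function's local copy
    return $text;
}

$text = 'Hello World';
$loud = shout($text);

echo $loud . "\n"; // the value returned by the function
echo $text . "\n"; // the original variable outside the function
```

Because the function returns its result instead of echoing it, the caller can store the value, echo it later, or pass it on to another function.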

Essential PHP resources

There are plenty of books dedicated to learning PHP, and it's
often hard to tell which one(s) to buy to steer you in the right
direction. While I can't help you choose the book that suits you
most, I can point you in the general direction of great
companion websites:
http://php.net The ultimate resource for anything PHP related.
http://php.net/manual/en/intro-whatcando.php A taste of things
that can be done with PHP.
http://phpsec.org A fantastic resource for any security related
to PHP.
Despite owning a few PHP books, I often find myself heading
over to the official PHP documentation online; it's often quicker
than picking up a book and looking for the right page.
For some inspiration, check out the second link; let your mind
wander and think of something that you would love to build!
The last link is equally important if you plan on installing PHP
on a public-facing web server. Install their script, as it gives
you some recommended base settings to use, then read up on
general security practices.

From the previous tutorial

In case you missed it, here are some of the basics from what we
covered in the last tutorial:
In the standard distribution of PHP, there are more than 1,000
functions available to use. These range from simple things, such
as date and time functions, through to more advanced concepts,
such as LDAP and MySQL database functionality.
All PHP code (usually) starts with the <?php delimiter and ends
with ?>.
PHP supports many data types, such as strings, booleans,
integers, arrays, objects and more.
Variables in PHP can hold a single piece of data at any one
time. Variables begin with $ followed by the variable name,
which can only begin with a letter or an underscore.
The rest of the name can consist of any letters, underscores or
numbers, and is case-sensitive.
The built-in date() function accepts more than 35 format
characters to control exactly what is returned, and uses the
server's time, not the browser's.
PHP can be run as a standalone script, or as part of an existing
template, using include().

If() and else

You'll notice something else in the code that follows: we'll be
performing a conditional check using if() and else. If/else
provides a simple way of choosing which code to run, based
on the outcome of a particular check, or condition. The if() block
is executed only when the condition inside the parentheses
evaluates to (or returns) TRUE; otherwise the else block is run.
Only the code between the relevant pair of curly braces is
executed, and only one of the two blocks will ever be run:
if(condition is true) {
    // Run the code in this block
}
else {
    // Run the code in this block instead
}
In literal terms, all we're going to say is: if today's date is
found as an index in the array, then echo out the race data,
otherwise echo the 'no races' message instead.
Let's now create a basic events calendar using our arrays
and function knowledge.

Put it all together

We don't want anything over the top, so for now we'll create an F1
2012 race calendar. We'll get today's date inside a function and
echo out the current race if one is happening on that day,
otherwise we'll echo out a generic message.
Start off by creating a new PHP file called calendar.php,
then create a function that initially holds an array of the race
dates:
<?php
function is_race_day() {
    $race_dates = array(
        '18/3/2012' => array('title' => 'Australian Grand Prix', 'location' => 'Melbourne'),
        '25/3/2012' => array('title' => 'Malaysia Grand Prix', 'location' => 'Kuala Lumpur'),
        '15/4/2012' => array('title' => 'Chinese Grand Prix', 'location' => 'Shanghai'),
        '22/4/2012' => array('title' => 'Bahrain Grand Prix', 'location' => 'Sakhir'),
        '13/5/2012' => array('title' => 'Spanish Grand Prix', 'location' => 'Catalunya'),
    );
}
?>
I've included just the first five dates of the season for now,
but feel free to add more. Next, let's update the function. Get
today's date (see the first tutorial on the date() function) and
check whether it has a Grand Prix or not; for this we'll use a
built-in function, array_key_exists().
Write the following just below the end of the array and
before the function's closing curly brace:
// j/n/Y gives the day and month without leading zeros, matching our array keys
$date = date('j/n/Y');
if(array_key_exists($date, $race_dates)) {
    echo "Today's race is the " . $race_dates[$date]['title'] . ' in ' . $race_dates[$date]['location'] . '.';
}
else {
    echo 'There is no race today.';
}
The date format matters here: j/n/Y produces values such as
13/5/2012, which is exactly how the array keys are written.
A format with leading zeros or the month first would never match.
We use the value of $date inside the array_key_exists()
function. This function accepts two parameters: the first is the
key you're looking for (in our case it's the date) and the
second is the array you wish to check against (our $race_dates
array).
The array_key_exists() function returns a boolean of
TRUE if the key exists, or FALSE if it doesn't.
If the race is found, we're going to echo out the race
details. We can retrieve this data because we know the key
exists, therefore we use the $date variable as a shortcut to
retrieve the information. Essentially, it's the same as writing:
echo $race_dates['13/5/2012']['title'];
All that's left for us to do is to call the function. We do this in
exactly the same way as in our earlier function script and put
the function name (with parentheses) at the top of our script:
is_race_day();
We can then run this script in our browsers by going to
calendar.php, or we can use include() to embed it in an
existing website. If a race exists on the day the script is run,
we'll be presented with the details. To test this, simply
hardcode a date in the $date variable:
$date = '13/5/2012';
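The finished calendar logic can be sketched in a form that is easy to try with different dates. This is a variation rather than the tutorial's exact script: race_message() is a hypothetical name, the date is passed in as an argument (defaulting to today), the message is returned instead of echoed, and only two races are listed. The j/n/Y date format is chosen here so that the result matches keys written like 13/5/2012.

```php
<?php
// Variation on is_race_day(): the date is passed in (defaulting to today)
// and the message is returned rather than echoed.
function race_message($date = null) {
    $race_dates = array(
        '18/3/2012' => array('title' => 'Australian Grand Prix', 'location' => 'Melbourne'),
        '13/5/2012' => array('title' => 'Spanish Grand Prix', 'location' => 'Catalunya'),
    );

    if ($date === null) {
        // Day and month without leading zeros, e.g. 13/5/2012.
        $date = date('j/n/Y');
    }

    if (array_key_exists($date, $race_dates)) {
        return "Today's race is the " . $race_dates[$date]['title']
             . ' in ' . $race_dates[$date]['location'] . '.';
    }
    else {
        return 'There is no race today.';
    }
}

echo race_message('13/5/2012') . "\n"; // a date that is in the array
echo race_message('14/5/2012') . "\n"; // a date that is not
```

Passing the date in means you can check any day without editing the function, which is handy when testing outside the race season.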

And with that, we're done!

This might all feel like quite a lot to take in at once if you're
new to arrays and functions, but hopefully you'll see that
what we've learnt here is extremely important and intrinsic
to programming.
Try modifying the calendar; you could be more specific
with your dates and have something on every day of a month
if you wish. As an exercise, look at the date() function and
alter the calendar to echo messages based on the hour of the
day. Remember, you won't need an entry for every day, only
for every hour.

Extend
your calendar
Following his second tutorial, Mike Mackay explains how to add
the functionality to dynamically select races and view details.

In the last tutorial, we covered the basics of using
arrays, control statements (if/else) and functions,
both custom and native to PHP. We put this together to
create an F1 season calendar. In this tutorial, we're altering
the functionality of the calendar by allowing the user to select
a specific race from an HTML drop-down list.
We'll assume that you have your Linux platform
configured and serving PHP pages through your web browser,
as outlined in the first tutorial. If not, please refer to the
previous article or the section titled 'Installing PHP on Linux'.

Forms and security


Forms are a common occurrence in every developer's career,
yet some good practices are often overlooked, especially
when you're starting out in the world of PHP. On any kind of
user input, we should always perform a certain amount of
validation and filtering because we can't always be sure
where the data came from and what it contains. We want to
verify that the user has submitted only the data we're looking
for and nothing else.
Validation is the process of making sure the input we
receive is the input we're expecting (correct format, type etc).
Filtering, or sanitising, means cleaning the input of undesirable
characters so that it's safe to use. Without doing this, we open
our PHP scripts up to possible code injection attacks. We're
going to use PHP's built-in validation and filtering functions so
that we can work safely when dealing with user input. Start
by creating a blank PHP file (mine's called races.php), and
at the top make a PHP code block containing a single
variable, $race_data, set to FALSE (we'll come to using this
variable a little later on):
<?php
$race_data = FALSE;
?>
Before we begin building our full script, we need a few
things in place. First, create an array of race dates inside the
code block after the $race_data line you've just created
(refer back to the previous article if you've forgotten all about
arrays). You don't have to use F1 races; you're free to use
any kind of dates you want. For F1 race dates, please refer to
www.formula1.com/races/calendar.html.
My array looks like this (for display purposes, I've
truncated most of my array data):
$races = array(
    'Australia' => array('title' => 'Australian Grand Prix', 'location' => 'Melbourne', 'date' => '18/3/2012'),
    'Malaysia' => array('title' => 'Malaysia Grand Prix', 'location' => 'Kuala Lumpur', 'date' => '25/3/2012'),
    'China' => array('title' => 'Chinese Grand Prix', 'location' => 'Shanghai', 'date' => '15/4/2012'),
    'Bahrain' => array('title' => 'Bahrain Grand Prix', 'location' => 'Sakhir', 'date' => '22/4/2012'),
    'Spain' => array('title' => 'Spanish Grand Prix', 'location' => 'Catalunya', 'date' => '13/5/2012'),
...
You may notice that the array differs from last time: I'm now
using a location instead of a date as the array key. We're going
to allow the user to specify the location of the race, so that
they can view more details; for this we need to change the
array key.

Add the HTML

The official Formula One website (www.formula1.com) is the best resource,
with all the latest news, reviews and race dates available.


The next thing we need is to add the HTML code and form
that lets the user choose the race. We're going to assume
that you have some HTML knowledge, as covering HTML is
outside the scope of this article. Create a basic page
structure (head, body etc) directly below (and outside) the
PHP code block.
The form is basic and consists of one drop-down menu
and a Submit button. To populate the drop-down menu with the
locations dynamically, we're going to use PHP's foreach()
construct. We covered this previously, but to
summarise: it allows us to simply loop, or iterate, through
an array and access each array item's data.
In your HTML body area, add the following form:
<form method="post">
<fieldset>
<label for="location">Choose a race:</label>
<select name="location" id="location">
<?php foreach($races as $location => $race): ?>
<option value="<?php echo $location; ?>"><?php echo $location; ?></option>
<?php endforeach; ?>
</select>
<input type="submit" name="submit" value="View" />
</fieldset>
</form>
View the script in your browser, and you should be
presented with all races in the drop-down. As you can see,
a minimal amount of code gives us a dynamically generated
drop-down of the items inside our array. I've called my
drop-down 'location', but you can choose whatever name you want.
After the drop-down, I've added a Submit button, so that
once the user has chosen a location, they can then submit
the form.
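The option-building loop can also be sketched as a function, which makes it easy to add escaping. This is an illustration under assumptions rather than the tutorial's code: race_options() is a hypothetical helper, and running each key through htmlspecialchars() is an extra precaution in case a location ever contains characters such as & or a quote mark.

```php
<?php
// Hypothetical helper: returns the <option> tags for a races array.
function race_options($races) {
    $html = '';
    foreach ($races as $location => $race) {
        // Escape the key so characters such as & or " cannot break the markup.
        $safe = htmlspecialchars($location, ENT_QUOTES, 'UTF-8');
        $html .= '<option value="' . $safe . '">' . $safe . '</option>' . "\n";
    }
    return $html;
}

// Shortened stand-in for the $races array used in the tutorial.
$races = array(
    'Australia' => array('title' => 'Australian Grand Prix'),
    'Spain' => array('title' => 'Spanish Grand Prix'),
);

echo race_options($races);
```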
You may notice that I've set the form method to post. You
can use get if you wish, but to keep the URL tidy, I like to use
post for my forms. If you want to know the difference between
POST and GET, refer to the information box titled 'GET v
POST in short'.

Checking for user data


By omitting the action attribute, the form will post to itself.
How do we know when the form has been posted? By default,
when GET or POST data is sent to a script, PHP converts that
data to an array ($_GET or $_POST), using the field names as
array keys. This is extremely helpful, as we can easily check for
the presence of a specific form field by using the isset()
function. Add the following code right beneath the races array:
if(isset($_POST['location'])) check_for_race($_POST['location']);
Here, we're checking if the $_POST data is available and
has the array key 'location' inside it. If you changed your select
name from 'location' then you should change the field name
in the $_POST array accordingly.
If the field is present (meaning the form was submitted),
we're going to call the function check_for_race() with the
location field sent to it as an argument. We haven't built that
function yet, so let's do so now.
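Here is a small sketch of the isset() check on its own. Because $_POST is only filled in by a real form submission, the example passes in an ordinary array standing in for it; submitted_location() and its $post parameter are hypothetical names used only for this illustration.

```php
<?php
// Hypothetical helper: $post stands in for the $_POST superglobal.
function submitted_location($post) {
    if (isset($post['location'])) {
        return $post['location'];
    }
    return null; // form not submitted, or the field is missing
}

// No form data at all, as on the first visit to the page.
var_dump(submitted_location(array()));

// Data as it would arrive after choosing Spain and pressing View.
var_dump(submitted_location(array('location' => 'Spain', 'submit' => 'View')));
```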

Gentlemen, start your engines

We have our array of races, and we have
our HTML page and form; all we need to do now is validate
the user data and check for the race details. When the race is
found in the array, we will display the details to the user,
otherwise we'll show an error message.
Copy the following PHP code just below the code snippet
(if(isset($_POST...) added above:
function check_for_race($location) {
    global $races, $race_data;
    $location = filter_input(INPUT_POST, 'location', FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_LOW);
    if(isset($races[$location])) $race_data = $races[$location];
    else $race_data = 'No matching races have been found.';
    return;
}
Before we do anything, we're declaring our function,
check_for_race(). This function houses the crux of our
validation and assignment code.
The first line (starting with global) is new to us. Because
we defined the races array outside of the function, we have
to tell PHP that we want our function to be able to access it.
By writing global, followed by the name(s) of any existing
arrays and/or variables, we're giving our function scope to access
the data. Scope can be difficult to understand if you're new
to it; check out the information box titled 'Scope within PHP'
for a more detailed explanation.
The next line is where PHP's internal validation and
filtering routine comes in. PHP (as of 5.2) comes with
built-in functions for checking specific input. You can verify
that input is in a certain format, within a range, or conforms
to a specific pattern. The filter_input() function allows you to
validate integers, floats, email addresses and URLs, or supply
your own regular expression pattern to test against. You can also
perform a few sanitising functions on data, too; this is the
part we're most interested in for this tutorial.
You'll notice that we've supplied four arguments to
the filter_input() function. The first one tells PHP that we
want to deal with data that's coming from the POST input.
Following on, we then specify the field name that we want
to sanitise, 'location'. Don't forget, if you changed the field
name earlier to something other than 'location', you'll need
to amend this line accordingly.

Installing PHP on Linux

Most of the latest distributions of Linux come with PHP.
Although you can run PHP scripts on the command line, we'll
be using the web browser (and therefore a web server) for this
tutorial. You can follow this tutorial by uploading your PHP
files to a web server on the internet (if you have one).
For me, though, I'm using a default installation of Apache 2 on
my local Linux machine.
I find it easier, and quicker, to write and test PHP on my local
machine instead of having to upload files via FTP each time.
If you require installation and/or setup instructions, or guides
for your local machine, I recommend reading through the
'Installation on Unix Systems' manual found on the official
PHP site, available at the following URL:
http://php.net/manual/en/install.unix.php
Alternatively, there are hundreds of installation guides written
for pretty much every flavour of Linux. Simply search on Google
for your distribution if the official PHP guide doesn't tick all
of your boxes.

Quick tip
Always comment your code throughout. While at the time
everything makes sense to you, will it do so if you have to
return to your code a few months later? Commenting will help
you make sense of those complicated functions that you've
written.

The official PHP website is the best place online to learn about the other
filtering and validation options available natively to PHP.


The next two arguments are the really interesting ones.
The first, FILTER_SANITIZE_STRING, tells PHP that we're
expecting a string input, and that we want to remove any
potentially unsafe characters; by default, PHP will remove any
tags from the string for you. We then supply the
FILTER_FLAG_STRIP_LOW argument, which tells PHP to remove any
characters with a low ASCII value (explained below). There are
six possible options we can use, ranging from encoding
ampersands and not encoding quotes within strings to stripping
or encoding low or high values.
When PHP talks about low or high values, it's referring to
the ASCII table. Standard ASCII input (numbers, letters,
ampersands) starts at #32 and ends at #127. Anything below
32 is considered low (inputs such as line endings, tabs,
null characters etc), and anything above 127 is considered
high (inputs such as foreign accents, currency symbols,
numeracy symbols etc). By specifying whether we want
these removed or encoded, we can better control how PHP deals
with the string. As our race names, or other data, could
include foreign characters, we'll leave these in. Input such as
tabs and line endings isn't required, so we're going to remove
it altogether.
Validating and sanitising input data is extremely important
when the data you need to work with is coming from the
outside world. This is not only because there's the potential for
a hacker to break your code, but also because users aren't always
predictable and can submit incorrect data by mistake.
By checking the data before we use it, we're reducing the
possibility of bad things happening. This isn't by any means
the only precaution you should take when dealing with user
input, but it's a good start.
Once we've validated and sanitised our input data
(the value from the drop-down menu), we assign it back to
the $location variable; we now know that this data is safe
to work with.
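The stripping behaviour can be tried on an ordinary string. The sketch below is an approximation under stated assumptions, not the tutorial's exact call: filter_input() only reads from a real request, so filter_var() is used on a plain variable instead, and because FILTER_SANITIZE_STRING is deprecated as of PHP 8.1 the example combines strip_tags() with FILTER_UNSAFE_RAW and the same FILTER_FLAG_STRIP_LOW flag. Unlike FILTER_SANITIZE_STRING, this combination does not encode quote marks. The name clean_location() is hypothetical.

```php
<?php
// Hypothetical helper that approximates the tutorial's filtering step.
function clean_location($input) {
    // Remove any HTML tags first.
    $input = strip_tags($input);
    // Strip characters with an ASCII value below 32 (tabs, line endings, nulls).
    return filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);
}

// Tags, a tab and a line ending are removed from the submitted value.
var_dump(clean_location("<b>Spain</b>\t\n"));

// High values (such as accented letters) are left in place.
var_dump(clean_location('São Paulo'));
```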
The next two lines deal with checking if the race exists,
and what to do if it doesn't. Hopefully, you'll recognise the
isset() call from earlier; it's almost
identical to how we're checking for our input POST data.
Because our array keys are defined by the place names, it's
easy to check if the chosen race exists.
The key will exist in our array if we have the data. We
verify this by using the shortcut to an array item with the
square brackets and using the isset() function to return
a TRUE or FALSE value (boolean). If the function returns
TRUE, we have array data, and so we assign the value of that
array item to the $race_data variable. If the function returns
FALSE, it means the value the user chose doesn't exist,
so we supply a string message back.
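The lookup itself can be sketched as a function that hands back either the race details or the error string. This is a variation on the tutorial's approach rather than a copy of it: find_race() is a hypothetical name, the races array is passed in as an argument instead of being reached through global, and the result is returned instead of being assigned to $race_data.

```php
<?php
// Hypothetical variant of the lookup: the array is passed in as an argument
// and the result is returned, rather than using global variables.
function find_race($races, $location) {
    if (isset($races[$location])) {
        return $races[$location]; // array of race details
    }
    return 'No matching races have been found.'; // string error message
}

// Shortened stand-in for the tutorial's $races array.
$races = array(
    'Spain' => array('title' => 'Spanish Grand Prix', 'location' => 'Catalunya', 'date' => '13/5/2012'),
);

var_dump(find_race($races, 'Spain'));    // a key that exists
var_dump(find_race($races, 'Atlantis')); // a key that does not
```

Passing the data in and returning the result avoids global variables altogether, which some developers find easier to follow; the global approach used in the tutorial works just as well for a script of this size.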
You might be wondering how a user could choose a value
that's not in the array when we're using the array itself to
populate the drop-down menu in the HTML. Well, nine times
out of ten this won't be an issue, but this is in place as a
preventative measure.
Sometimes, URLs get broken, or people tamper with
the HTML to see what they can do to sites. By only
showing the array data if it's there, or an error message
if it isn't, we're preventing our script from breaking
and showing a PHP error message, or disclosing private
information about our script or server. You may not think
this is important right now, but if you're using a database
or have potentially private data in the script or on the server,
the less you reveal about your hardware, or your script's
content, the better.
For example, in one case, I saw a website break on a
database connection, and the error message displayed
to me contained the database username and password.
Anyone with a mind to do so could use this information
to get further into the database (or server) to get their
hands on information they wouldn't otherwise be able to
access. You should always be proactive in your security
measures, and not have the bad luck of needing to be
reactive instead.
After we've assigned a value to the $race_data variable,
be it the race data array or an error message, we finish
off the function with return. This isn't completely necessary,
but it's always good to return from a function when you're
done using it.


The SecurePHP Wiki (www.securephpwiki.com) is a good site to visit
for general security practices and tips related to PHP.

GET v POST in short

GET and POST both form part of the RFC guidelines on the
HTTP protocol, which defines the methods for transferring
data between the client (your internet browser, for
example) and a web server. When you type a URL into your
browser, it sends a GET request (in plain text) to the server
for the content you want to view. GET requests can carry a fair
amount of data and can provide copyable deep links into
websites, or search pages. Data (typically in key/value pairs)
is sent as a part of the URL, and interpreted by the server.
POST requests are similar to GET in that they, too, can be
used to retrieve content. However, any data that's sent
along with the request (again, in key/value pairs) travels in
the body of the request rather than in the URL, so it can't
easily be manipulated, viewed, or copied via the URL.
POST is perfect for sending data to a site without
cluttering the URL.
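To see what key/value pairs look like when they travel as part of a URL, the built-in http_build_query() function can be used to encode an array the way a GET form would. This is only an illustration: the address below is a made-up local one, and the array stands in for the fields of the tutorial's form.

```php
<?php
// Form-style data as key/value pairs.
$data = array('location' => 'Spain', 'submit' => 'View');

// http_build_query() URL-encodes the pairs the way a GET form would.
$query = http_build_query($data);

// Hypothetical address, used only to show where the data ends up.
echo 'http://localhost/races.php?' . $query . "\n";
```

With the form's method set to post, the same pairs are sent in the body of the request instead, so the address in the browser stays as plain races.php.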

Show the user their data
Now that we have data, no matter what its contents are, we
need a way of showing it to the user. We do this by embedding
some additional PHP code into our HTML page, similar to
how we embedded our drop-down menu code. Below the
form, in the body area, add in the following:
<?php if(is_array($race_data)): ?>
<h3><?php echo $race_data['title']; ?> - <?php echo $race_data['date']; ?></h3>
<h4><?php echo $race_data['location']; ?></h4>
<?php elseif(is_string($race_data)): ?>
<h3><?php echo $race_data; ?></h3>
<?php endif; ?>
This may look a little confusing at first, but if you break it
down line by line you should be able to work out what's
happening. Essentially, we're checking our $race_data
variable to see what sort of content it holds: if it contains an
array, we know we have race data; if it contains just a string,
we have an error message.
PHP provides us with some great, simple tools to check
the data type of a variable. The first line uses the function
is_array(), with $race_data supplied to it as an argument.
This function will return a TRUE or FALSE value.
By checking the response, we know whether to show all the
data fields that we know are encased inside it, or to show a
standard string line.
In our previous tutorial, we used only if() and else; this is the
first time we've come across elseif(). The elseif() control
simply allows us to execute a certain block of code if
an additional condition is met; this condition differs from
the one contained in the initial if()
block. In readable terms, we're saying: if it's an array, show
this data, otherwise if it's just a string, show this data instead.
We're not using else here, because $race_data holds
an initial value of FALSE, set at the start of our script. If we
used else, then we'd see an error message even if the form
hadn't been submitted, which is not really ideal.
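The three possible states of $race_data (an array, a string, or the initial FALSE) can be sketched as one function. render_race() is a hypothetical helper written for this illustration; the tutorial itself embeds the same checks directly in the HTML page.

```php
<?php
// Hypothetical helper: returns the HTML for the three possible states of $race_data.
function render_race($race_data) {
    if (is_array($race_data)) {
        // Race found: show its details.
        return '<h3>' . $race_data['title'] . ' - ' . $race_data['date'] . '</h3>'
             . '<h4>' . $race_data['location'] . '</h4>';
    }
    elseif (is_string($race_data)) {
        // Error message supplied as a plain string.
        return '<h3>' . $race_data . '</h3>';
    }
    return ''; // FALSE: the form has not been submitted yet
}

echo render_race(array('title' => 'Spanish Grand Prix', 'date' => '13/5/2012', 'location' => 'Catalunya')) . "\n";
echo render_race('No matching races have been found.') . "\n";
echo render_race(FALSE) . "\n"; // prints an empty line
```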

Scope within PHP

Scope, in general programming terms, refers to where inside
your script a variable can be seen/accessed.
In short, in PHP variables created outside a function cannot
be seen inside it, and variables created inside a function
(local variables) cannot be seen outside it. To give a function
or method access to a variable created outside of it (as we did
in our tutorial), we can give the variable global scope. This
term should be self-explanatory, but means that the variable is
accessible anywhere within the function in which it's declared
as global. It's important to remember that variables created
inside functions and methods have local scope and aren't
accessible outside (unless declared global), whereas control
structures, such as if() and while(), don't create a scope of
their own.
Scope can initially be tricky to understand and keep in mind,
but for more details I recommend reading the official PHP
documentation page available at:
http://php.net/manual/en/language.variables.scope.php

Wrapping it all up
With the output code in place, you should now have
everything together to run an interactive F1 calendar. When
the user first visits the page, they're presented with a
drop-down menu of race locations. By selecting a location
and submitting the form, we're dynamically showing the
race data on the page to the user, and, if the user chooses
any race that doesn't exist (or if the form is tampered with),
we display an error message instead, helpfully informing
the user that something has gone wrong. Try it for yourself
by visiting the script in your browser and then submitting
the form with a location chosen.
You are, of course, free to display the data differently,
however you choose. Why not try spicing up the page with
a sprinkling of CSS, or altering the date output to show
weekday names or full month names? By making a few basic
changes here and there, you
can dramatically transform the look and feel of the calendar.
Perhaps you have an existing template, or theme, that you
want to replicate?
As mentioned before, you can choose any kind of events
calendar, and aren't confined to using F1 races. You could be
creative and use this on your personal website for family
occasions, or memorable events; you're bound only by the
limit of your imagination.
All that you need to remember is to use a distinct key as
the array index, and then you're free to store and display any
amount of data you want.
If you decide that you would prefer to store dates instead
of the names for the array keys, and have those shown in
the drop-down list instead, then you can simply replace
them as required. It should be a straightforward task to alter
the code, but if you're unsure how to do this then please
refer back to the previous article in this chapter, in which
we created our first version of the calendar and used race
dates for the array keys.

Deeper into the dynamic web

I love comments in code; I describe most things in my
scripts. If I have to return, it's easy to pick back up.

Working with, and displaying, data doesn't stop at arrays.
There's a whole plethora of ways to get data into your
website and/or application, and databases are one of,
if not the, most popular ways of doing so. Dynamic and
responsive websites play a massive part in today's web as
we know it, and you'll soon begin to realise the potential of
what you can build once you get to grips with it all. Read
on to find out more.

Get started
with MySQL
Mike Mackay puts the M in LAMP, and shows how easily we
can use databases by writing our own live visitor counter.

In our last tutorial, we finished off our F1 season calendar
by extending the functionality to allow a user to select
a specific race from an HTML drop-down list. This time,
we're starting on something new: databases. We're going
to create a live visitor counter for our website.
An additional step is now required for this tutorial, and
that's adding MySQL into the server mix. You can download
and install MySQL for Linux from
http://dev.mysql.com/downloads/mysql; you should follow the
links for the appropriate version of Linux you're using. If in
doubt, the MySQL online documentation has installation
guidelines and can be found at http://dev.mysql.com/doc.
We'll assume that you have your Linux platform
configured and serving PHP pages through your web browser,
as outlined in the first tutorial. If not, please refer to the
previous article or the section titled 'Installing PHP on Linux'.
You'll also now need to make sure you have compiled PHP
with the --with-mysql[=DIR] configuration option. See the
official PHP documentation at
http://php.net/manual/en/mysql.installation.php.
To manage your database(s), I'd recommend downloading
and installing phpMyAdmin. phpMyAdmin provides a web-based
interface to a MySQL server and allows you to fully
manage all aspects of your databases. It also makes
debugging and testing our application easier. You can
download the software at www.phpmyadmin.net.

So, what is MySQL?

MySQL is a relational database management system
(RDBMS) that runs as a server, providing access to databases
(collections of data) using Structured Query Language (SQL).
MySQL has been around since 1996, and is currently one of
the world's most popular database platforms. It puts the M in
the acronym LAMP (Linux Apache MySQL PHP). It's open
source and is used by some of the biggest names on the web,
including Facebook, Twitter and Wikipedia.
Data inside a database is held in tables. A table
is simply a collection of related data entries consisting of,
and defined by, columns and rows. We'll cover a brief table
structure shortly as we make our first database application.
MySQL is available on a plethora of operating systems,
and is extremely quick. I've used it for a database containing
more than 33 million rows, and haven't noticed any issues
with speed or performance; in fact, it's my preferred
database choice.

Planning our script


We have a task: count the number of visitors on a live site
and then display the output on our page(s). Whenever I have
a specific task, I always sit down and plan out what's going to
happen. As soon as I have that information clear in my mind,
it makes writing the script 100 times easier. It's a practice
that I'd recommend highly, as there's nothing worse than
being confused.
It doesn't have to be anything complicated; layman's
terms on a bit of paper, with some kind of basic logic flow, is
usually enough. With that said, this is how I envisage we'll
approach the script:
1. A visitor lands on a page with our whosonline.php script
included on it.
2. The script gets the visitor's IP address and adds, or
updates, it in our database.
3. The script counts how many unique IP addresses are in the
database inside a five-minute window.
4. The script then outputs an integer representing the
above value.
Now, how easy was that? Granted, it gets more
complicated for larger scripts and processes, but now we've
got that written down we can start writing our script.
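The plan above can be tried out in plain PHP before any database is involved. This is only a sketch of the logic, assuming an in-memory array stands in for the visitors table; the function names record_visit() and count_online() are hypothetical and are not part of the finished script.

```php
<?php
// Hypothetical helpers: an array keyed by IP address stands in for the table.

// Add the visitor, or refresh their timestamp if they are already listed.
function record_visit(array $visitors, string $ip, int $now): array
{
    $visitors[$ip] = $now;
    return $visitors;
}

// Count the IP addresses seen within the last $time_limit seconds.
function count_online(array $visitors, int $now, int $time_limit = 300): int
{
    $online = 0;
    foreach ($visitors as $last_seen) {
        if ($now - $last_seen <= $time_limit) {
            $online++;
        }
    }
    return $online;
}

$visitors = array();
$visitors = record_visit($visitors, '192.0.2.10', 1000); // first visitor
$visitors = record_visit($visitors, '192.0.2.20', 1200); // second visitor
$visitors = record_visit($visitors, '192.0.2.10', 1400); // first visitor again

echo count_online($visitors, 1450); // both were seen within 300 seconds
```

Because the array is keyed by IP address, a returning visitor updates their existing entry rather than adding a second one, which is the same behaviour we'll later ask the database for.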

Getting started

Figure: Creating a database in phpMyAdmin is laughably easy. All you need to do is give it a name and hit Create.

The simplest way to achieve what we need is to create one
PHP script that does all the hard work and simply echoes an
integer value out (as we outlined above); that way we can
make use of the native PHP include() function (as we've
done in our other tutorials) and drop our counter in to any
existing PHP pages we want.
So, open up your favourite text editor and create a new
PHP file called whosonline.php and add the standard
opening and closing PHP tags:
<?php
?>
In our planning steps, I mentioned that we'd count only
visitors inside a five-minute window. You might prefer a larger,
or even smaller, timeframe, so what we'll do is allow you to
customise this amount. We're going to use the current date
and time value in EPOCH in order to allow us some pretty
fine-grained control.
What exactly is EPOCH, I hear you cry? EPOCH, or Unix
Time, or even POSIX Time, is simply the number of seconds
that have elapsed since midnight (UTC) of 1 January, 1970.
For example, the EPOCH value for midnight (UTC) of 1 January,
2012 is 1,325,376,000. So that's 1 billion, 325 million and
376 thousand seconds since midnight of 1 January, 1970. Capiche?
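PHP can produce these values itself. The following is a minimal sketch using the built-in time(), gmmktime() and gmdate() functions; the variable names are only for illustration. Note that the exact figure for any given moment depends on the time zone you measure it in, and the values in the comments are for UTC.

```php
<?php
// gmmktime(hour, minute, second, month, day, year) returns the EPOCH value
// for a date and time given in UTC.
$midnight_2012 = gmmktime(0, 0, 0, 1, 1, 2012);
echo $midnight_2012, "\n";            // 1325376000

// Twelve hours (43,200 seconds) later is noon UTC on the same day.
echo $midnight_2012 + 43200, "\n";    // 1325419200

// time() returns the current EPOCH value, so a five-minute window starts here:
$window_start = time() - 300;
echo gmdate('Y-m-d H:i:s', $window_start), " UTC\n";
```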

Figure: Our Visitors database isn't complicated, so all we need is to create two columns to house our user data in.

Creating our database


Before our PHP script can do anything, we need a database
to work with. Using the copy of phpMyAdmin installed earlier
(go back and do this if you skipped that step), create a new
database called whosonline. You can see in my screenshot
that I've chosen to specify a utf8_unicode_ci format for my
database; this isn't wholly required, though, as we're storing
only an IP address and a timestamp.
Once your database has been created, you'll be presented
with a screen that says your database has no tables. This is
fixed easily by creating the necessary table that will house our
visitor data. Do so by creating a new table called visitors and
specifying the number of fields as two.
The third screen may look a little confusing at first, but
don't worry as I'll explain what to select for each field and why
we're doing it. Let's now start structuring our table:
Field: This is the name of the column that you want. For the
first row, enter ip_address, and underneath on the second
row enter visited.
Type: This instructs MySQL what sort of data type we're
using for this column. MySQL supports many different data
types, including integers, long text, binary, floats, varchars,
dates and times. For this exercise, we want the first row for
ip_address to be a VARCHAR, meaning variable character. The
second row should be set to TIMESTAMP, indicating we want
MySQL to store a date/time representation.
Length/Values: For certain data types, we can limit the
length of that field. For our VARCHAR record (ip_address)
we want a maximum string length of 15 characters, which is
enough for any dotted IPv4 address (an IPv6 address can need
up to 45). Restricting the length of the field can save database
storage size and help keep speeds up. Because TIMESTAMP is
a fixed-length field, we can leave this blank for visited.
Default: MySQL allows you to specify a fallback value when
inserting records in to a database if there isn't a value
supplied for a field. We always want the user's IP address, so
we can leave this blank, but we'll specify CURRENT_TIMESTAMP
for visited, as it saves us having to supply that
information when saving the visitor in the database.
Index: Indexing in relational databases can be extremely
useful. You can gain a performance boost from your SELECT
queries when indexes are in place. For more information on
indexing, please refer to the info box titled Indexing, what
gives? For our database, though, select PRIMARY for our
ip_address row, and INDEX for the visited row.
That's pretty much it where structuring our table is
concerned. You can leave Collation, Attributes, Null, AI and
Comments empty in the column rows, as we don't need to
mess around with them. The only additional change I've
made is specifying MyISAM as my Storage Type and, again,
utf8_unicode_ci as my Collation on the general structure.
Make sure you hit Save, and let's move on!
Now that our database and table have been set up, we're
ready to move on to writing our script.
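If you'd rather not click through phpMyAdmin, the same table can be described in SQL. The statement below is a sketch of what the settings above amount to, assuming the whosonline database already exists; the variable name $create_table_sql is only for illustration, and phpMyAdmin may well generate slightly different SQL.

```php
<?php
// SQL approximation of the table structure described above.
$create_table_sql = <<<SQL
CREATE TABLE IF NOT EXISTS visitors (
    ip_address VARCHAR(15) NOT NULL,
    visited TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (ip_address),
    INDEX (visited)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
SQL;

// With an open connection in $mysqli, it could be run like this:
// mysqli_query($mysqli, $create_table_sql);
echo $create_table_sql;
```

The statement can also be pasted into the SQL tab in phpMyAdmin.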
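Let's set up some default values in our script. Open up
whosonline.php in your editor and, just below the opening
PHP tag, insert the following lines:
$time_limit = 300;            // visitor window, in seconds
$dbHost = '127.0.0.1';        // MySQL server address
$dbUsername = 'myuser';       // replace with your MySQL username
$dbPassword = 'mypassword';   // replace with your MySQL password
$dbDatabase = 'whosonline';   // the database we created earlier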

Time limit

This code should be relatively self-explanatory. One thing to
note is the value we've got for $time_limit: this value is the
number of seconds you want the visitor window to count for
(continuing from our earlier discussion about EPOCH, 300
seconds is 5 minutes). Feel free to change the value to suit
your needs.
The $dbHost value should point to your MySQL server.
In most cases, this will be 127.0.0.1 or localhost.
$dbUsername and $dbPassword will again be different for
your set-up. You should replace these values with the correct
ones for your server (see the info box titled MySQL users in
brief for more information). $dbDatabase should remain the
same unless you have called your database something other
than whosonline.
We're going to be using the MySQLi set of PHP
functions. It's similar to its predecessor, the original MySQL
extension, but offers both procedural and object-orientated
methods of working. For the purpose of simplicity, we'll work
exclusively with the procedural method in this tutorial.
We'll be using the mysqli_connect() function initially, and
then we'll take a look at mysqli_query() and the trick to
echoing out the live visitor count easily, the
mysqli_num_rows() function.

Quick tip
Always comment your code throughout. While everything makes
sense to you at the time, will it still do so if you have to
return to your code a few months later? Commenting will help
you make sense of those complicated functions that you've
written.

Figure: We have many different data and structure options in MySQL. Take a browse through the options available and see what else is there.

Connect, Query and Display

Quick tip
Use a text editor that has syntax highlighting for PHP; it'll
help you quickly identify your code and specific parts, or
functions, inside it. There are free programs available, too,
so have a look around and go with the one you prefer the
look of.

Now we've come to the main focus of this tutorial: working
with our MySQL database. With the whosonline.php file
already open in your editor, just below the database variables
we wrote, add the following lines of code:
$mysqli = mysqli_connect($dbHost, $dbUsername,
    $dbPassword, $dbDatabase);
if(mysqli_connect_errno()) echo 'Failed to connect to
MySQL: ' . mysqli_connect_error();
Again, this should be relatively easy going. We're calling
the mysqli_connect() function using our database
connection credentials and assigning a database connection
handle to the variable $mysqli. Why do we do this? Well, this
is because the additional database functions need an
available resource to work with (which you'll see shortly);
otherwise, well, they just won't work. It's important to note
that this rule applies only to the procedural method and not
the object-orientated method.
The next line down, containing the if() statement,
performs a quick check to make sure we connected
to our specified database without any problems. If this fails,
then the script will echo a convenient error message
explaining why. (As written, the script carries on after the
message; use die() instead of echo if you'd rather it
stopped there.)
Providing you didn't encounter any connection errors, or if
you did you have since corrected them, it's now time to insert
the visiting user in to our visitors table.
Beneath the connection code, write the following:
mysqli_query($mysqli, "REPLACE INTO visitors (ip_address) VALUES ('" .
    mysqli_real_escape_string($mysqli, $_SERVER['REMOTE_ADDR']) . "')");
This is now starting to look a little more complicated.
The first part isn't too bad, though. mysqli_query() is the
function we use to perform a database query. We have
to supply two parameters to this function: the first is
our database handle and the second is the SQL query we
want to perform.

Installing PHP on Linux

Most of the latest distributions of Linux come with PHP.
Although you can run PHP scripts from the command line,
we'll be using the web browser (and therefore web server)
for this tutorial.
You can follow this tutorial by uploading your PHP files to
a web server on the internet (if you have one). For me,
though, I'm using a default installation of Apache2 on my
local Linux machine. I find it easier and quicker to write and
test PHP on my local machine instead of having to upload
files via FTP each time.
If you require installation and/or setup instructions or
guides for your local machine, I recommend reading
through the Installation on Unix systems manual found on
the official PHP site, available at the following URL:
http://php.net/manual/en/install.unix.php.
Alternatively, there are hundreds of installation guides
written for pretty much each flavour of Linux. Simply
search Google for your distribution if the official PHP guide
doesn't tick all of your boxes.

For our query, we use the SQL REPLACE INTO to tell
MySQL that we want to update or insert a value into our
visitors table. We could go about this another way and check
if an IP address exists in the table first, and then either
update it or insert it if it doesn't exist, but this approach is
more succinct.
One caveat to this approach is that we can use this
method only on distinct values in our database, but seeing as
each visitor will have their own distinct IP address, this makes
it perfect for our script.
In order to make the database accept only distinct values
and allow the REPLACE INTO query to work, we had to set
the ip_address field as a PRIMARY KEY in our table
structure. The role of a PRIMARY KEY within a database
table, put simply, is to identify uniquely a single row, or piece
of data, within a table.

MySQL users in brief

Users in MySQL work in a very similar way to users on your
Linux platform. When you first install MySQL, you are asked to
create a password for the root user. Once the installation
has been completed, you can then connect to your server
as root.
For local development, a lot of users run and test their
PHP application as root. It goes without saying that you
should never connect your application as root on the web.
Personally, I always create a local user that mimics the
same user I will be using on my public web server. This user
has fewer privileges and, therefore, is limited in what they
can perform.
Setting up a user correctly can also protect against
more serious forms of SQL injection. The permissions you give
your user should be only what they need to complete the
job. You can easily create users in phpMyAdmin under the
Privileges tab. You can assign global permissions or restrict
them to certain databases. Think carefully about their
needs and go from there.

Figure: MySQL.com is a great resource for the latest news in the world of MySQL. It's also home to official documentation.

The next part of the query specifies which table we want
our data added to (visitors) and then what our data is. The
SQL approach is very similar to the key => value relationship
that we use in arrays. Here, we're saying simply (column1,
column2) VALUES (value1, value2), but when it comes to
the actual data insert, MySQL translates that to column1 =>
value1, column2 => value2.
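To make the comparison with arrays concrete, here is a small PHP sketch. It isn't how MySQL works internally; PHP's array_combine() is simply used to show how a list of column names pairs up with a list of values. The sample IP address and date are made up.

```php
<?php
// The column list and the VALUES list from an INSERT-style query...
$columns = array('ip_address', 'visited');
$values  = array('192.0.2.10', '2012-01-01 00:00:00');

// ...pair up position by position, just like keys and values in an array.
$row = array_combine($columns, $values);

print_r($row);
```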
Our query inserts only one item of data, the user's IP
address, but to make sure we do this securely, we're going to
enclose that piece of data inside a special function.
In the previous tutorial, we touched on some important
security basics, such as validating and sanitising user input
data. The same rules apply to working with databases, too.
Not only should we protect the data that goes in, but we
also need to make sure that the user isn't inserting malicious
code that can mess around with our database, such as
deleting every single record; a practice more commonly
known as SQL Injection.
To counteract this, we'll use MySQLi's built-in escaping
function, mysqli_real_escape_string(). As with our other
functions, the first parameter is a database handle. This
function takes the second input parameter and escapes any
special characters inside it, making it safe for use inside a
quoted value in SQL queries. This can be a real life-saver, so
I'd recommend using it all the time, even for user input that
you believe is safe.
While we can guarantee with 99.9% certainty that
$_SERVER['REMOTE_ADDR'] contains the user's IP
address, safeguarding our input in this way prevents anyone
from messing around on the server and inserting
inappropriate or malicious content.
The $_SERVER global array contains information such as
headers, paths and script locations that are created by the
web server itself. Check out the PHP website if you want
further information.
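Escaping makes a value safe to place inside a quoted SQL string, but it doesn't tell you whether the value is a sensible IP address. As an optional extra check (it is not part of the script in this tutorial), PHP's built-in filter_var() can validate the format first. The helper name clean_ip() is hypothetical; in the real script you would pass it $_SERVER['REMOTE_ADDR'].

```php
<?php
// Hypothetical helper: return the address if it is a valid IPv4 address,
// or null if it is anything else.
function clean_ip($candidate)
{
    $ip = filter_var($candidate, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4);
    return $ip === false ? null : $ip;
}

var_dump(clean_ip('192.0.2.10'));        // a valid address is returned as-is
var_dump(clean_ip("1.2.3.4'; DROP"));    // anything else gives NULL

// The longest dotted IPv4 address fits our VARCHAR(15) column exactly.
echo strlen('255.255.255.255'), "\n";
```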

Counting our unique visitors


Now that our visitor has been added, all we need to do is
count the number of unique visitors in our database and echo
out the value. Beneath the above query, add the following
lines to your whosonline.php file:
$visitors = mysqli_query($mysqli, "SELECT ip_address FROM visitors
    WHERE UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(visited) <= '" .
    mysqli_real_escape_string($mysqli, $time_limit) . "'");
Here, we again use the mysqli_query() function
(supplying the database handle), but this time we perform
a SELECT query on our visitors table. Here, we specify an
additional statement in our SQL that restricts the data that's
pulled in from the database; we do this by adding a
WHERE clause:
WHERE UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(visited) <= '" .
    mysqli_real_escape_string($mysqli, $time_limit) . "'
There are two new things to spot here: these are
UNIX_TIMESTAMP() and NOW(). You could be forgiven for
thinking that they are PHP functions, as the syntax is
identical, but they're SQL functions. NOW() is a function in
SQL that returns the current date and time (in the format
YYYY-MM-DD HH:MM:SS), and UNIX_TIMESTAMP()
converts a timestamp to EPOCH.
Indexing, what gives?

Indexes are used to find rows with specific values quickly.
Without an index, MySQL must begin with the first row and
then read through each row until the correct value is found.
The larger the table, the slower this can be. If an index is
present, MySQL can quickly jump to the approximate position
to start searching from, giving your database a performance
boost.
Think of it like an address book with your contacts in.
Without an alphabetical index, you would have to flip through
each contact until the right one is found, but with an index
you know to jump straight to M when looking for Mike Mackay,
saving you time by not needing to go through A-L first.
When an index is in place, each INSERT, UPDATE or DELETE
means MySQL has to recalculate its index data. If you place
too many indexes on a table, you will eventually cause a
slowdown with each request, as MySQL is busy updating its
indexes.

Effectively, we're saying here that we want any IP
addresses back where the time they visited was less than, or
equal to, 300 seconds (5 minutes) ago. We can achieve this by
taking the stored visitor's time from the database (ie the time
their record was added, visited) and subtracting it from the
current time. The difference that remains from the calculation
is the number of seconds since they last looked at a page on
the site, which we compare with our counter time window.
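The same subtraction can be tried out in PHP to see where the cut-off falls. This sketch mirrors the WHERE clause rather than replacing it; the helper name is_still_online() is hypothetical, and the timestamps are assumed to be in UTC purely for the example (MySQL works in its own configured time zone).

```php
<?php
// Hypothetical helper mirroring the WHERE clause in PHP:
// UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(visited) <= $time_limit
function is_still_online(string $visited, int $now, int $time_limit = 300): bool
{
    // strtotime() turns 'YYYY-MM-DD HH:MM:SS' into an EPOCH value.
    $visited_epoch = strtotime($visited . ' UTC');
    return ($now - $visited_epoch) <= $time_limit;
}

$now = strtotime('2012-01-01 12:05:00 UTC');

var_dump(is_still_online('2012-01-01 12:00:00', $now)); // exactly 300 seconds ago
var_dump(is_still_online('2012-01-01 11:59:59', $now)); // 301 seconds ago
```

A visitor last seen exactly 300 seconds ago still counts, because the comparison is "less than or equal to"; one second more and they drop out of the total.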
We're concatenating our $time_limit variable, as we've
done in previous tutorials. Even though we're in control of this
variable, I have still chosen to enclose it in the
mysqli_real_escape_string() function.
Where security is concerned, even if you believe the data
won't change, it's still worth putting in the extra characters to
be sure. It's better that than losing your database!
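Another common safeguard for numeric values such as $time_limit is to force them to an integer before they go anywhere near the SQL, since escaping functions such as mysqli_real_escape_string() are designed for values that sit inside quotes. The following is a minimal sketch of that alternative (it is not what the script above does); it only builds the query string, so it doesn't need a database connection, and the tampered value is invented for the example.

```php
<?php
$time_limit = '300; DROP TABLE visitors'; // imagine this value had been tampered with

// Casting to int keeps the leading digits and discards everything else.
$safe_limit = (int) $time_limit;

$sql = 'SELECT ip_address FROM visitors'
     . ' WHERE UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(visited) <= ' . $safe_limit;

echo $sql;
```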
All that's left to do now is display the number of visitors
from our query. We have two ways to do this, but I'm opting
for the neater solution (we'll explain the other way in the final
tutorial in this chapter).
As we want only the row count, and not the specific data
set, we can use the function mysqli_num_rows(). This
simply gives us a count of the number of rows that a query
has returned, so simply add in:
echo mysqli_num_rows($visitors);
In order for this function to work, it needs a valid
mysqli_query() result supplied as a parameter. In our previous
query, we stored the results of our SELECT query in the
$visitors variable, so this is all we need to supply to the
function to get the count. If we wanted the number of rows for
an alternate query, we could assign that to a different variable
and supply it as necessary.

Putting the script to use


With that, our script is complete. If you open your web
browser and run the script, you should simply see 1
displayed. If you then look at the database using
phpMyAdmin, you'll be able to view the IP address and
timestamp that's held in the record. Using databases really
can be this easy!
I mentioned earlier that we could drop this in to all our
pages on a website, or inside a web template. I'm doing this
by using the include function inside an HTML tag and
wrapping some text around the output, for example:
<h4>Current Site Visitors: <?php include('whosonline.php'); ?></h4>
This value is then shown to all the users on the website.
You could, of course, choose to display it a different way, or
even change the text surrounding the count. Be adventurous
and see what you can do with it.

Do more with MySQL
Mike Mackay explains how to add to your site's visitor counter database
to show not only the all-important numbers but also visitor data.

In the previous tutorial in this chapter, we looked at using
MySQL and databases for the first time. We created a
basic live visitor counter that simply told us the number
of visitors to our website at any given moment. Now, let's take
a look at how we can work closer with MySQL and display
not only the number of visitors, but also their IP address
and visited time.

Installing MySQL
Just as before, we have now added MySQL to our arsenal.
If you followed along with the last tutorial, you can skip this
installation step. If you're new to MySQL then you'll need to
download and install the MySQL Server software so that PHP
can make use of it.
You can download and install MySQL for Linux from
dev.mysql.com/downloads/mysq; you should follow the
links for the appropriate version of Linux you are using. If in
doubt, the MySQL online documentation has installation
guidelines, and can be found at dev.mysql.com/doc.
We're going to make the assumption here that you
already have your Linux platform configured and serving
PHP pages through your web browser, as outlined in the
first tutorial. If not, please refer to the previous tutorials in
this chapter, or the section titled Installing PHP on your
Linux platform.
You'll also need to make sure that PHP has been compiled with
MySQL support. The --with-mysql[=DIR] configuration option
enables the original MySQL extension; the MySQLi functions we
use are enabled with --with-mysqli. See the
official PHP documentation at http://php.net/manual/en/
mysql.installation.php
For managing your database(s), I would recommend
downloading and installing phpMyAdmin. phpMyAdmin
provides a web-based interface to a MySQL server and allows
you to manage fully all aspects of your databases. It also
makes debugging and testing our application easier. You can
download the software at www.phpmyadmin.net

Figure: Creating a database in phpMyAdmin is laughably easy. All you really need to do is give it a name and hit Create.

As we're working on from the previous tutorial, we'll be
using a lot of the same code we built. It's a great timesaver
and already gives us a foundation on which we can expand.
We'll be using the same database table structure too.
Just to recap, MySQL is a relational database
management system (RDBMS) that runs as a server
providing access to databases (collections of data) using
Structured Query Language (SQL). Data is held inside
a database using tables. A table is simply a collection of
related data entries consisting of, and defined by, columns
and rows. MySQL is available on a plethora of operating
systems, and is extremely quick.

Diving back in
So, our task now is to be able to display the visitor data along
with our counter. Let's get started with our changes and
review what we need to update.
Open up whosonline.php in your favourite text editor and
spend a few minutes familiarising yourself with the script.
Check out the variables we set at the top of the
script, such as the database connection details and our
visitor timeout window. If you need to alter these for your
local setup then feel free to do so. The $dbHost value should
point to your MySQL server; in most cases this will be
127.0.0.1 or localhost. $dbUsername and $dbPassword will
again be different for your setup, and you should replace
these values with the correct ones for your server (see the
info box titled MySQL users in a nutshell). $dbDatabase
should remain the same unless you called your database
something other than whosonline.
We still need to continue using the mysqli_connect()
function as well as the mysqli_query() function, as these
form the basics of our SQL query and are required. We're
calling the mysqli_connect() function using our database
connection credentials, and assigning a database connection
handle to the variable $mysqli:
$mysqli = mysqli_connect($dbHost, $dbUsername,
    $dbPassword, $dbDatabase);
if(mysqli_connect_errno()) echo 'Failed to connect to
MySQL: ' . mysqli_connect_error();
Beneath our initial connection function, we want to keep
the exact same query to insert the visitor's IP address into
our table so a record of their visit is tracked:
mysqli_query($mysqli, "REPLACE INTO visitors (ip_address) VALUES ('" .
    mysqli_real_escape_string($mysqli, $_SERVER['REMOTE_ADDR']) . "')");
Keep in the next part that retrieves all the applicable visitor
data. However, we will need to make a small change here:
$visitors = mysqli_query($mysqli, "SELECT ip_address FROM visitors
    WHERE UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(visited) <= '" .
    mysqli_real_escape_string($mysqli, $time_limit) . "'");

Getting all the data

The query above retrieves only one item of data from the
database: the IP address. Because we are now going to
display all the data, we need exactly that back: all the data.
MySQL has a great shortcut when it comes to asking for
every column in a table; instead of having to
manually type in each field name we can simply insert a *.
The * works like a wildcard, and MySQL interprets that as:
I want every column of data. Put simply, you're asking for
everything the table has. If, however, we wanted to limit
data, we'd simply write out each column name separated by
a comma. For example, ip_address, visited.
Let's now alter that query to give us every item of data
back from the database:
$visitors = mysqli_query($mysqli, "SELECT * FROM visitors
    WHERE UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(visited) <= '" .
    mysqli_real_escape_string($mysqli, $time_limit) . "'");
Remember here that we make use of UNIX_TIMESTAMP()
and NOW(). NOW() is a function in SQL that
returns the current date and time (in the format
YYYY-MM-DD HH:MM:SS), and UNIX_TIMESTAMP() converts a
timestamp to EPOCH. For more information on EPOCH, please
refer to the information box titled What's EPOCH when it's
at home?
Now we're at the point in the script where we want to
change the behaviour from the previous tutorial of just
outputting an integer (representing the number of visitors).
Instead, we want to also show a list of IP addresses and
timestamps of the visitor(s). So, then, let's start doing
that now.

Changing our output behaviour

Before, we made use of the function mysqli_num_rows() to
get the number of entries returned back from our SQL query.
We still want to keep this in, as it saves us time having to
manually write additional PHP code to count the number of
items in the results array, but now we also want to locally
store the database data so that we can display it easily on our
page(s). For this, we're going back to basics and will create an
array ($visitors_data) to hold our information. Our array will
consist of two items, total and visitors. The total item will
store the integer output from the mysqli_num_rows()
function, while visitors will contain another array of all the
visitor data. If we were to visualise this array, this is how the
structure would look:
$visitors_data = array(
    'total' => [integer],
    'visitors' => [array]
);
As you can see, this is extremely straightforward. Let's go
straight in to the bulk of our change and add the code that will
get and store the visitor data. It might look complicated now,
but we'll break it down into bite-sized pieces and will review
every line afterwards.
Directly below the above SQL query (where we made our *
change), add in the following code:
if(is_numeric(mysqli_num_rows($visitors)) && mysqli_num_rows($visitors) > 0)
{
    $visitors_data['total'] = mysqli_num_rows($visitors);
    while($online_visitors = mysqli_fetch_array($visitors))
    {
        // [] appends each row instead of overwriting the last one
        $visitors_data['visitors'][] = array(
            'ip_address' => $online_visitors['ip_address'],
            'visited' => $online_visitors['visited'],
        );
    }
}
The first line, with the if() statement, is relatively easy.
Here, we are checking that our SQL query returned a set of
rows; we do this by using the is_numeric() function
wrapped around our mysqli_num_rows() function. The
function is_numeric() simply returns TRUE or FALSE
(a boolean) based on the contents supplied to it, in this case
our mysqli_num_rows() function.
If the value inside is indeed numeric, the function returns
TRUE and, provided the row count is also greater than zero,
the code inside the curly braces is then executed.
The initial line within the if() braces assigns the integer value
(the number of rows returned) from the database query to
our $visitors_data['total'] array item. The next line,
containing the while() statement, is the most complicated
part of our update. The while() loop contains the crux of our
local data storage/assignment and makes use of the
massively helpful PHP function, mysqli_fetch_array().

Figure: Using phpMyAdmin, we can quickly view our visitor data without having to check it through the website.

Installing PHP on your Linux platform

Most of the latest distributions of Linux come
with PHP. Although you can run PHP scripts
from the command line, we'll be using the web
browser (and therefore the web server) for
this tutorial.
You can follow this tutorial by uploading your
PHP files to a web server on the internet (if you
have one). For me, though, I'm using a default
installation of Apache2 on my local Linux
machine. I find it easier and quicker to write and
test PHP on my local machine, instead of having
to upload files via FTP each time.
If you require installation and/or setup
instructions or guides for your local machine,
I recommend reading through the Installation
on Unix systems manual found on the official
PHP site, which is available at the following URL:
http://php.net/manual/en/install.unix.php
Alternatively, there are hundreds of
installation guides written for pretty much every
flavour of Linux.
Simply search Google for your distribution
if the official PHP guide doesn't tick all of
your boxes.

Using loops

Figure: The official online PHP reference has details of every single MySQLi function available: an absolute must!

Loops are helpful in PHP when you want to execute
a block of code repeatedly while a condition is met.
The code will continue to be run for as long as the condition
evaluates to TRUE. The way we achieve this with our
MySQL dataset is with the mysqli_fetch_array() function
and assignment.
The mysqli_fetch_array() function takes the returned
dataset from the SQL query, row by row, and allows you to
retrieve the columns/data from that via an array; the array
index is the field name of the respective column. In our
example, our database has two fields, ip_address and
visited. PHP will automatically create a temporary array with
these names as the array keys, and will associate the row data
with them. We then take the output from the function (an array)
and assign it to $online_visitors. This turns $online_visitors
into an array containing the row from the database.
Because the method of assigning data to a variable when
successful equates to TRUE in PHP terms, the code inside the
while() braces is then executed.
Once that code has finished, the while condition is
then re-evaluated and the entire process starts all over
again. When the condition returns FALSE, the while()
loop is then broken and any code following on from it is
then executed.

MySQL users in a nutshell

Users in MySQL work in a very similar way to users on your
Linux platform. When you first install MySQL, you are asked to
create a password for the root user. Once the installation is
complete, you can then connect to your server as root.
For local development, a lot of users run and test their
PHP application as root. It goes without saying that you
should never connect your application as root on the web.
I always create a local user that mimics the same user I will
be using on my public web server. This user has fewer
privileges, and so is limited in what they can perform.
Setting up a user correctly can also protect against more
serious forms of SQL injection attack. The permissions you give
your user should be only what they need to do the job.
You can easily create users in phpMyAdmin under the
Privileges tab. You can assign global permissions, or restrict
them to certain databases. Think carefully about their
needs and go from there.

Thankfully, PHP does all the hard work for us when
dealing with the dataset, and on each iteration of the while()
loop it automatically moves on to the next available row of
data (where applicable). The contents assigned to
$online_visitors are only temporary and available to that one
individual iteration of the loop; it's for this reason that we
store a copy of this array data inside another array, making it
easy for us to access the data at a later stage without needing
to communicate with MySQL.
The actual array assignment we see inside the curly
braces of our while() loop should look familiar to you.
As explained above, we're storing the actual visitor data as
an array inside the array item for $visitors_data['visitors'].
We're making use of a previously discussed method to
automatically append data to the end of an array, using [].
As soon as the while() loop has finished, and PHP has
looped through all returned rows of data from our SQL query,
our new $visitors_data array is now available to us and ready
to use on our page(s).
We've worked with arrays a lot so far in this series, but if
you're unsure of what's happening, feel free to refer back to
previous tutorials where this type of functionality is explained.
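If you'd like to watch the loop work without a database, the following sketch uses a plain array of made-up rows in place of the MySQL result. It is an imitation only: array_shift() stands in for mysqli_fetch_array(), which behaves in roughly the same way by handing back one row per call and nothing once the rows run out.

```php
<?php
// Stand-in for the rows MySQL would return (sample data, made up).
$rows = array(
    array('ip_address' => '192.0.2.10', 'visited' => '2012-01-01 12:00:00'),
    array('ip_address' => '192.0.2.20', 'visited' => '2012-01-01 12:03:30'),
);

$visitors_data = array('total' => count($rows), 'visitors' => array());

// array_shift() returns one row per call and NULL when none are left,
// which ends the loop.
while ($online_visitors = array_shift($rows)) {
    // [] appends; without it each row would overwrite the previous one.
    $visitors_data['visitors'][] = array(
        'ip_address' => $online_visitors['ip_address'],
        'visited'    => $online_visitors['visited'],
    );
}

print_r($visitors_data);
```

After the loop, $visitors_data['visitors'] holds one entry per row, in the order the rows were returned.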

Displaying our visitor data


It would be silly to go to all the hassle of retrieving and storing
each unique visitor item without showing it on our page.
Doing so is simple, and all you need to do is echo out which
pieces of information you want.
In order to display the total number of visitors, we know
that we just need to echo out the integer value returned from
the mysqli_num_rows() function. But this has changed
slightly from the previous tutorial and this value is now held in
a different place. In order to show the total, all you need to
write on any PHP page is the following:
<?php echo $visitors_data['total']; ?>
You could surround this with various HTML tags to create


What's EPOCH when it's at home?


EPOCH, or Unix Time, or even POSIX
Time, is simply the number of seconds
that have elapsed since midnight (UTC)
on 1 January, 1970. For example, the
EPOCH value for midnight (UTC) on
1 January, 2012 is 1325376000. So that's
one billion, 325 million and 376
thousand seconds since midnight on
1 January, 1970.
You might be wondering what the
importance, or relevance, of this data is

MySQL.com is a great resource for the latest news in the
world of MySQL. It's also home to official documentation.

an emphasis on the value; the choice is entirely yours. But
how do we go about displaying the actual visitors' data? Well,
by making use of the PHP construct foreach(), we can achieve
this with relative ease.
Semantically speaking, our visitor data would be
considered tabular data. As I'm always keen to try to use the
correct mark-up for the correct data, I will be showing the
visitor data on my site using an HTML table. It's nothing fancy,
but does the job nicely for me. We've used foreach()
before to loop through a set of data, so this
shouldn't be too unfamiliar to you:
<table border="1" cellpadding="5" cellspacing="0">
<tr>
<th>IP Address</th>
<th>Visited</th>
</tr>
<?php foreach($visitors_data['visitors'] as $visitor): ?>
<tr>
<td><?php echo $visitor['ip_address']; ?></td>
<td><?php echo $visitor['visited']; ?></td>
</tr>
<?php endforeach; ?>
</table>
Here, you'll notice that the PHP code is embedded inside
a few HTML table tags. Hopefully, you will already have some
HTML knowledge and this won't be too confusing for you.
Unfortunately, covering HTML is beyond the scope of this
tutorial, but if you're interested in learning more then there
are some fantastic tutorials and training sites to be found on
the internet.
The key part in this code is our foreach() loop. We are
looping through each item inside the
$visitors_data['visitors'] array (that we know contains all visitor
records) and assigning each row to a new temporary variable
called $visitor. The newly-created $visitor array consists of
two fields, ip_address and visited (replicating how our
MySQL table was structured).
As we loop through, the code between foreach and endforeach is
executed, creating a new table row. The first table column
outputs an IP address, followed by the next table column for
the visited time.
It's important to note that the timestamp we see in the
output is in the default formatting from MySQL. If you'd prefer

when you're programming in PHP?
Well, it allows us to easily
(and mathematically) work out various
date calculations.
You could easily determine the
number of seconds from one user, or
script action, to another. You might
even want to use it to add in some
time-sensitive locks.
EPOCH really does come in handy,
usually when you least expect it to!
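To make the arithmetic in the EPOCH box concrete, here is a small sketch in Perl (the language of the next chapter) using the core Time::Local module. The choice of dates is ours, purely for illustration:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# timegm() takes sec, min, hour, day, month (0-11) and year, all in UTC.
my $new_year_2012 = timegm( 0, 0, 0, 1, 0, 2012 );
my $new_year_2013 = timegm( 0, 0, 0, 1, 0, 2013 );

print "Midnight (UTC), 1 January 2012: $new_year_2012\n";

# Subtracting two epoch values gives the gap between them in seconds.
my $days = ( $new_year_2013 - $new_year_2012 ) / 86_400;
print "Days in 2012: $days\n";
```

Because both values are plain integers, working out the gap between two events is a simple subtraction.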

something a little bit fancier, then you can format this data
with the date() function in PHP.
This function has a multitude of formatting options, all of
which can be found on the official reference at
http://php.net/manual/en/function.date.php
This display method assumes that you are showing the
data on the same PHP file, or page, as the main MySQL
functionality. If this is not the case, you can simply include()
the main whosonline.php file (substitute accordingly if you've
changed the filename) on any page you wish to show the
data on (note that this must be a file ending in .php for it to
be parsed correctly). You'd then just embed the HTML table
code where necessary. When you execute the code and view
the output on your page, you'll notice that I'm far from
a designer and my table is, well, extremely bland and boring.
Why not make good use of some CSS and style the table to
fit around your website theme or design? You are free to
customise it as much as you need or want to.

Quick tip
Use a text editor that has syntax highlighting for PHP; it'll
help you quickly identify your code and specific parts, or
functions, inside it. There are free programs available too, so
have a look around and go with the one you prefer the look of.

Where to next?
Now that you've learnt the basics of retrieving data from
a MySQL query and looping through the records, why not
adapt this function to output some other types of data? You
could perhaps store the F1 2012 season calendar in a
MySQL database.
By using the DATE() function (in MySQL, not PHP)
in a WHERE clause, you could display only the races that
are left in the season. You might even decide to take things
a step further and introduce teams and driver data to the
mix, as well. You could achieve this by creating further
tables in a database and writing queries to retrieve specific
information from them.
By following the guide in the previous tutorial in this
chapter, which covered creating MySQL tables with
phpMyAdmin, you should find it relatively straightforward
to set up any required tables and fields. You can also use
phpMyAdmin to enter the data manually through
the Insert tab found on any table properties page. The most
important aspect of learning not just PHP, but any new
language, is to be creative and to start challenging yourself
more and more.
Push and drive your PHP skills forward and be creative.
Make use of all the available online resources, from tutorial
sites to forums. There are literally thousands, if not millions,
of people out there who are willing to help others out when
they get stuck. It's always a good idea to get involved in the
community, so dive in and have fun.

Modern Perl

Why should a modern GNU/Linux user
know the fundamentals of Perl? Because
if you want to automate a task, be it web
management or desktop customisation, Perl can
do it. Text is its speciality, but with the right modules
Perl can process every conceivable category of data.
Modern Perl: Track your reading
Modern Perl: Build a web app
Modern Perl: Adding to our app


Modern Perl:
Track your reading
Modern Perl makes it simple to write a database program (say, for example,
to keep tabs on your books) without using SQL. Dave Cross explains how.

In this article we will build a simple command-line program
that accesses a database. The program we are going to
write will keep track of a reading list. We'll tell it about the
books that we're reading or about to read, and it will display
that information in various lists. In the following tutorial we'll
make the program into a web application.
Firstly, we're going to need a database to store this
information in. I'm going to use MySQL as it's the most widely
available database system, but the same code will work with
minor amendments with any other relational database.
We'll store the data in two tables, author and book. In the
interests of keeping things simple during this tutorial, we'll
ignore books with multiple authors.
First we'll create a new database to contain the tables and
switch to that database:
create database if not exists books;
use books;
We'll also create a user for our application. You might want
to change the password. If you do, you'll also need to change
it in the get_schema subroutine:
create user 'books'@'localhost' identified by 'README';
grant all privileges on books.* to 'books'@'localhost';
The author table is very simple:
create table if not exists author (
id integer primary key auto_increment,
name varchar(100)
) engine innodb;
The engine innodb clause is important, as it means that we
can give these tables constraints that define the relationships
between them. We'll see that being used in the book table:
create table if not exists book (
id integer primary key auto_increment,
isbn char(10),
author integer,
title varchar(250),
started datetime,
ended datetime,
image_url varchar(250),
foreign key (author) references author (id)
) engine innodb;
The foreign key line at the end of the definition says that
the author column in the book table contains values that are
equal to the id column in the author table. So if Douglas
Adams has the id 1 in the author table then the record in the
book table for The Hitchhiker's Guide to the Galaxy will have a
1 in its author column. Splitting the author out into a
separate table means that we can store information about
several Douglas Adams books in the book table without
duplicating the information about the author. Avoiding data
duplication is called normalisation and is an important topic
in database design.
Having created our database, we now want to set up some
Perl code to talk to the database. We could use the DBI
(Database Interface) module and write raw SQL. But no one
likes writing SQL, so we're going to use Object Relational
Mapping (or ORM) to convert Perl code into SQL. This will make
our code much easier to write at
the cost of a small amount of set-up. The ORM we are going
to use is called DBIx::Class, so we'll need to ensure we have
that module installed. We'll also need a separate module
called DBIx::Class::Schema::Loader, which can generate Perl
libraries that are specific to our database. You can probably
install both of these libraries using your distribution's
packaging tools, but if they aren't available you can get them
both from CPAN.
DBIx::Class::Schema::Loader comes with a command-line
program called dbicdump, which will look at the tables in
your database and create the Perl code needed to manipulate
those tables. You run it like this:
$ dbicdump -o components='["InflateColumn::DateTime"]' \
Book dbi:mysql:database=books books README
The -o components option loads some extra functionality
that we'll see later on. Book is the name of the Perl module


You can get more information about DBIx::Class from the


website at http://dbix-class.org.

you want to create. Then there is a Perl DBI connection string,
which includes information about the type of database we are
talking about (mysql) and the actual database that we're
interested in (books). The last two arguments are the
username (books) and the password (README).
When you run that command you'll find a new file in your
current directory called Book.pm and a new directory called
Book. Within the Book directory you'll find another directory
called Result and within that there are two files called
Author.pm and Book.pm. If you look at the contents of these last two
files, you'll see code that closely matches the definitions of
the two tables in your database.

Fishing the Amazon


There's one more thing that we need to do before starting to
write our program. We'll be using the Amazon API to get
various details about the books in our database, and we need
to register for an API key in order to use the API. You can
register for a key at
www.amazon.com/gp/aws/registration/registration-form.html.
Once you've signed up you can go to the Security
Credentials part of the site to get your Access Key ID and
Secret Access Key. We recommend setting environment
variables to these values like this:
export AMAZON_KEY=[Your key here]
export AMAZON_SECRET=[Your secret key here]
It then becomes easy to access these from within a program.
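As a minimal sketch of reading such a variable from Perl: the built-in %ENV hash holds the environment, and the require_env helper below is a hypothetical name of our own, not part of the program in this article:

```perl
use strict;
use warnings;

# Hypothetical helper: return the named environment variable, or stop
# with a clear message if it has not been exported.
sub require_env {
    my ($name) = @_;
    my $value = $ENV{$name};
    die "Please set the $name environment variable\n"
        unless defined $value && length $value;
    return $value;
}

# Placeholder value so the sketch runs on its own; normally the shell
# would provide it via "export AMAZON_KEY=...".
local $ENV{AMAZON_KEY} = 'example-key';

my $key = require_env('AMAZON_KEY');
print "Key is ", length($key), " characters long\n";
```

Failing early with a readable message is friendlier than letting an undefined key reach the API call.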
Finally, we're ready to start looking at the program. We're
going to create a program called book that has four
sub-commands. Typing book add <ISBN> will allow us to add a
book to our reading list. Typing book start <ISBN> will flag
that we've started to read a book and book end <ISBN> will
flag that we've finished it.
At any time, typing book list (or just book without a
sub-command) will display the list of books in our database,
indicating which ones we are currently reading and which we
have finished. The start of the program looks like this:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
use Book;
use Net::Amazon;
use DateTime;
A lot of this will be common to every Perl program that you
write. The first line is the shebang line. This tells Linux
to run this program using perl.
The next two lines load two standard Perl libraries called
strict and warnings. Think of these as programming safety
nets. The most important thing that the strict library does is
to force you into declaring your variables. The warnings
library looks for a number of potentially unsafe programming
practices and displays a (non-fatal) warning if it finds any. No
serious Perl programmer writes programs without loading
these two libraries.
The third use statement is slightly different. It doesn't load
a module, but tells Perl that this program needs to be run on
a particular minimum version of Perl. We're forcing the use of
Perl 5.10 as we're going to use the say function that was
added in this version.
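To see the strict safety net in action, here is a small sketch that deliberately uses an undeclared variable. The snippet is wrapped in a string eval purely so that the error can be caught and shown; the variable name is made up:

```perl
use strict;
use warnings;
use 5.010;

# Compile a snippet containing an undeclared variable. The string eval
# traps the compile-time error and leaves the message in $@.
my $ok = eval q{
    use strict;
    $total = 42;    # never declared with "my"
    1;
};
my $error = $@;

say $ok ? 'Compiled cleanly' : "strict refused to compile it: $error";
```

Without strict, the misspelt or undeclared variable would silently spring into existence, which is exactly the kind of bug that is tedious to track down.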
The following three lines are back to loading libraries.
Book is the library that we created to talk to the database.
Net::Amazon is the library that we'll use to talk to the
Amazon API. And finally, DateTime is a powerful Perl library

CPAN: Perl's killer app


If you're programming in Perl then you
need to know about the Comprehensive
Perl Archive Network (or CPAN). On the
CPAN you'll find almost 100,000 extra
Perl modules that you can use in your
programs.
The CPAN is at www.cpan.org, but
most people use the search page at
http://search.cpan.org. A new project
called MetaCPAN (at http://metacpan.org)
aims to provide a
better interface and an API.

A large number of the most useful


CPAN modules have been repackaged
for popular Linux distributions, and
this will be the easiest way to install
most modules.
For example, if you want to install the
DateTime module on a Red Hat or
Fedora system you just need to run
sudo yum install perl-DateTime. On a
Debian or Ubuntu system that
command becomes sudo apt-get
install libdatetime-perl.

for the manipulation of dates and times.


Next we need to work out which of the sub-commands
has been invoked and run the appropriate code:
my %command = (
add => \&add,
list => \&list,
start => \&start,
end => \&end,
);
This statement defines the valid sub-commands that
our program will implement. It does this by setting up a hash
(or dictionary) called %command. The % at the start of a
variable name indicates that it's a hash. A hash is like a
look-up table. It has keys which are associated with values. In our
case, the keys are the names of the sub-commands and the
values are references to the subroutines which implement
those commands.
Putting an ampersand on the front of a subroutine gives
us a way to refer to the subroutine without executing it and a
backslash is the standard Perl syntax to get a reference to
something:
my $what = shift || 'list';
if (exists $command{$what}) {
    $command{$what}->(@ARGV);
} else {
    die "Invalid command: $what\n";
}
The next few lines deal with the command-line options
and calling the appropriate subroutine to do the work.
Command-line arguments to a Perl program are stored in an
array called @ARGV (the @ indicates an array in the same
way that a % indicates a hash). The shift function removes
the first element from an array and returns it. You'll notice
that we don't give shift an argument. That's because of its
special behaviour. If you call shift without an argument
outside of a subroutine then it will work on @ARGV by default.

Quick
tip
The language is
called Perl. The
program that
compiles Perl
programs is called
perl. Typing either
of these as PERL is
wrong.

For argument's sake


If we haven't been given a command-line argument then
@ARGV will be empty and shift will return a false value. In
this case we want to act as if the user gave us the
sub-command list. The || operator lets us do this. This is the
Boolean 'or' operator. It returns its left operand if that value is
true, otherwise it returns the right operand. So if there's a
value in @ARGV we get that, otherwise we get 'list'. The value
calculated from that expression is stored in $what (Perl
scalar variables begin with a $).
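The default-value idiom can be tried on its own. In this sketch the pick_command helper is a hypothetical name of ours, and it works on a copy of its arguments rather than on @ARGV so that it is easy to experiment with:

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: take the first item of an argument list,
# falling back to 'list' when there is nothing there.
sub pick_command {
    my @args = @_;
    return shift(@args) || 'list';
}

say pick_command( 'add', '0330258648' );    # a sub-command was given
say pick_command();                         # nothing given
```

Because || tests for truth rather than definedness, an argument of 0 would also be replaced by the default. Perl 5.10 added the // operator, which only falls back when the left-hand value is undefined.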
Having got the sub-command, we now need to know if it's


Quick tip
Perl comes with a lot of documentation which you can read
using the perldoc program. Alternatively, it's all online at
http://perldoc.perl.org.

a valid value. We do this by looking in the %command hash.
We use the exists function to see if $what matches one of
the keys in the hash. If it does, we call the appropriate
function; if not, we die with an appropriate error message.
Notice that as the hash contains subroutine references, we
need to call the subroutine using a dereferencing arrow. Also
notice that we pass what is left of @ARGV on to the function
that we are calling. In some cases it will be empty, but in
others it will contain the ISBN for a book.
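Putting the pieces together, a self-contained sketch of the dispatch table might look like this. The stand-in subroutines and the dispatch wrapper are our own simplifications, not the article's real implementations:

```perl
use strict;
use warnings;
use 5.010;

# Stand-in subroutines; the real ones talk to the database.
sub list  { return 'listing books' }
sub start { my ($isbn) = @_; return "starting $isbn" }

my %command = (
    list  => \&list,
    start => \&start,
);

# Hypothetical wrapper around the look-up-and-call step.
sub dispatch {
    my ( $what, @args ) = @_;
    die "Invalid command: $what\n" unless exists $command{$what};
    return $command{$what}->(@args);
}

say dispatch('list');
say dispatch( 'start', '0330258648' );

# An unknown sub-command dies; eval lets us show the message.
my $result = eval { dispatch('burn') };
say defined $result ? $result : "Rejected: $@";
```

Adding a new sub-command then only needs a new subroutine and one extra line in the hash.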
That's the main structure of the program complete. All we
need to do now is to implement the various subroutines that
do the actual work for the various sub-commands. Before
we start those, we'll write a useful utility subroutine that they
will all use:
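sub get_schema {
    # "return ... or die" would return before the die could ever run,
    # so store the schema first and return it afterwards.
    my $schema = Book->connect('dbi:mysql:database=books',
        'books', 'README')
        or die "Cannot connect to database\n";
    return $schema;
}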
All of the commands will need to communicate with the
database. When using DBIx::Class all communication with a
database is carried out through an object called a schema.
Our get_schema subroutine just connects to our books database
(using the Book module we created earlier). If it can't connect
for any reason, it just kills the program with an error message.

One for the books


The first sub-command we will look at is the one to add books
to the database:
sub add {
    my $isbn = shift || die "No ISBN to add\n";
    my $schema = get_schema();
    my $books_rs = $schema->resultset('Book');
    if ($books_rs->search({ isbn => $isbn })->count) {
        warn "ISBN $isbn already exists in db\n";
        return;
    }
    my $amz = Net::Amazon->new(
        token => $ENV{AMAZON_KEY},
        secret_key => $ENV{AMAZON_SECRET},
        locale => 'uk',
    ) or die "Cannot connect to Amazon\n";

Object Relational Mapping


Many programs are going to need a
persistent data store and in many
cases that will be a relational database
such as MySQL or SQLite. In order to
talk to these you'll need some kind of
database interface (such as Perl's DBI
module) and a lot of SQL scattered
throughout your code.
Object Relational Mapping (ORM)
allows you to write code that interacts
with a database at a higher level. You no
longer write SQL, you just manipulate
objects in your program and ORM
takes care of converting that into SQL.
Three concepts in Object Oriented
Programming (OOP) map rather well
onto matching concepts in relational
databases. In OOP, a class defines a
type of object (such as books), and
that's very similar to a table in a
database. A particular instance of a
class is an object (a particular book)
and that's like a row in a database table.
Finally, classes and objects have
attributes which are the individual
properties of the object (for example
title and author) and this is similar to
columns in a database.
ORM uses these similarities to map
data from relational databases into
OOP objects within your program. A
good ORM, such as DBIx::Class, will be
able to automatically generate the
classes from the metadata stored in
the database which describes the
various tables.

    my $resp = $amz->search(asin => $isbn);
    unless ($resp->is_success) {
        say 'Error: ', $resp->message;
        return;
    }
    my $book = $resp->properties;
    my $title = $book->ProductName;
    my $author_name = ($book->authors)[0];
    my $imgurl = $book->ImageUrlMedium;
    my $author = $schema->resultset('Author')->find_or_create({
        name => $author_name,
    });
    $author->add_to_books({
        isbn => $isbn,
        title => $title,
        image_url => $imgurl,
    });
    say "Added $title ($author_name)";
    return;
}
We need the ISBN of the book to add. This is
passed as a parameter into the subroutine. Perl passes
parameters into subroutines in an array called @_. In the
same way that shift works on @ARGV when called without
an argument outside of a subroutine, it works on @_ when
called without an argument inside a subroutine. If no value is
found, then the program dies.
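The following sketch shows shift reading from @_ inside a subroutine, along with the "value or die" check. The describe subroutine is a made-up example rather than part of the reading list program:

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical subroutine: shift with no argument reads from @_ here.
sub describe {
    my $isbn = shift || die "No ISBN given\n";
    return "Working on ISBN $isbn";
}

say describe('0330258648');

# Calling it with no arguments triggers the die; eval traps it so the
# sketch can carry on and show the message.
eval { describe(); 1 } or print "Died with: $@";
```

In the real program there is no eval around the call, so a missing ISBN ends the program with the message.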
Having got an ISBN, we first need to check that the book
isn't already in the database. We get a schema object and use
that to give us a resultset object for the book table. In
DBIx::Class, all manipulation of a specific table is done using
a resultset object. We can use the resultset's search method
to look for books with the same ISBN.
The search returns another resultset object and we can
use the count method on that to see how many books
already exist in the database with this given ISBN. Hopefully
there aren't any. But if there are, we can display an
appropriate message and return from the subroutine without
doing any more work.
If the book isn't already in the database, then we can add
it. But first we need to get more details from Amazon. We
create a Net::Amazon object, giving it the key and secret that
we got from Amazon. We also set the locale to 'uk' to indicate
that we want to use Amazon's UK data. We can then use the
search method on the Amazon object to look for products
with our ISBN. If the search is successful, we can get details
of the matching book from the returned object.
Having got the details of the book, we can extract various
interesting things from the object and store them in our
database. Notice that the authors method returns a list of
authors and we're only taking the first one.
To insert the book, we first look for the author in the
database by getting an author resultset and using the
find_or_create method to either find an existing author record or
create a new one.

Find the author


Once we have the author object we can use its add_to_books
method to add a new book related to that author. The
add_to_books method is one that was created automatically by
DBIx::Class::Schema::Loader when it created our classes. It
knew that this relationship between the tables existed

Taking it further

Once you've read a couple of books, your reading list
should look a bit like this.

because of the foreign key constraint that we created on the


book table.
We can now try adding a book to our database. Get the
ISBN of a book from Amazon and try running the command:
$ ./book add 0330258648
$ ./book list
The next sub-command we'll implement will be list, so
that we can see what is in our database. This looks complex,
but actually, it's rather repetitive as we print the list in three
sections (Reading, To Read and Read).
The only difference between the sections is the selection
criteria we use. Books being read have a value in the started
column but a null ended column. Books that have been read
have a value in ended. A book with a null started is still in the
to be read pile. To run these queries we use the search
method on a book resultset object. A null value in the
database is represented by the undef value in Perl. The
reading query looks like this:
foreach ($books_rs->search({
    started => { '!=', undef },
    ended => undef,
})) {
    say ' * ', $_->title, ' (', $_->author->name, ')';
}
The search arguments say that started is not null and
ended is null. For each book found by the query we print
the title and the author's name. Again, these methods are
created for us by DBIx::Class::Schema::Loader using
information it finds about the columns in the tables and the
relationships between tables. For the list of books read, the
query looks like this:
foreach ($books_rs->search({
    ended => { '!=', undef },
})) {
    say ' * ', $_->title, ' (', $_->author->name, ')';
}
And for the list of books still to read, it looks like this:
foreach ($books_rs->search({
    started => undef,
})) {
    say ' * ', $_->title, ' (', $_->author->name, ')';
}
The full version of the list subroutine is on the CD.
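The selection rules for the three sections can be tried without a database. This sketch uses plain hashes in place of DBIx::Class rows, with undef playing the part of NULL; the titles and dates are invented, and grep stands in for the search method:

```perl
use strict;
use warnings;
use 5.010;

# Made-up rows; undef plays the part of a NULL column.
my @books = (
    { title => 'Book A', started => '2012-01-02', ended => '2012-01-20' },
    { title => 'Book B', started => '2012-02-01', ended => undef },
    { title => 'Book C', started => undef,        ended => undef },
);

# The same three tests that the search arguments express.
my %section = (
    'Reading' => [ grep { defined $_->{started} && !defined $_->{ended} } @books ],
    'Read'    => [ grep { defined $_->{ended} } @books ],
    'To Read' => [ grep { !defined $_->{started} } @books ],
);

for my $name ( 'Reading', 'To Read', 'Read' ) {
    say $name;
    say ' * ', $_->{title} for @{ $section{$name} };
}
```

With DBIx::Class the filtering happens in the database instead, which matters once the table grows beyond a handful of rows.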

The beginning and the end


The last two sub-commands we need to implement are start
and end to indicate when we start and finish reading a book.

There's a lot to learn about Perl. Here
are some suggestions for places to go
for more information.
The Perl home page is at http://perl.org.
From there you can find links
to many other resources about Perl.
One of the best ways to read about
what is going on in the Perl world is to
follow the Perl Iron Man blog
aggregator at
http://ironman.enlightenedperl.org.
The definitive book about Perl is
called Programming Perl. The third

edition has been out for rather a long


time now, but a fourth edition is due to
be published later this year.
The best book for learning Perl is
called, unimaginatively, Learning Perl
and the sixth edition was published this
summer. There are two more books in
this series called Intermediate Perl and
Mastering Perl.
Perl user groups are known as Perl
Mongers. You can get in touch with
your nearest Perl Monger group by
visiting the website at http://pm.org.

They are very similar, so I'll just show the start one here:
sub start {
    my $schema = get_schema();
    my $books_rs = $schema->resultset('Book');
    my $isbn = shift || die "No ISBN to start\n";
    my ($book) = $books_rs->search({ isbn => $isbn });
    unless ($book) {
        die "ISBN $isbn not found in db\n";
    }
    $book->started(DateTime->now);
    $book->update;
    say 'Started to read ', $book->title;
}
A lot of this looks very standard by now. We get a schema
object and a book resultset object, and check we've
been given an ISBN. We use the search
method to get the book object from the database (and die if
it can't be found). Then we use the started method to update
that column and call the update method to save the changes
back to the database.
When we set up our database classes using
dbicdump we asked for an extra component called
InflateColumn::DateTime to be included. This is where we
see the advantage of that. It identifies any date and time
columns in the database and converts those values into
Perl DateTime objects in our program. So we can create
a Perl DateTime object using the class's now method
and DBIx::Class will automatically convert that into the
appropriate string to be stored in the database. The end
sub-command looks very similar to this, with only the
column name changed from started to ended.
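To give a feel for what that conversion saves us from doing by hand, here is a sketch that builds a MySQL-style datetime string from an epoch value using only core Perl. It does not use DBIx::Class or DateTime, and the epoch value is just an example:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# MySQL stores a datetime as text in the form YYYY-MM-DD HH:MM:SS.
# Build that string by hand from an epoch value, using UTC.
my $epoch  = 1_325_376_000;    # midnight (UTC), 1 January 2012
my $stored = strftime( '%Y-%m-%d %H:%M:%S', gmtime($epoch) );

print "$stored\n";
```

With InflateColumn::DateTime loaded, this kind of formatting (and the parsing in the other direction) happens behind the scenes whenever a date column is read or written.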

Quick tip
There's a useful program called perltidy, which will tidy up
Perl code. It's almost certainly available prepackaged for
your distribution.

Write your own


We now have a working system. We can add books, start
reading books, finish reading books and see what the current
state of our reading list is. Having already added a book to the
system above, try running the following commands:
$ ./book list
$ ./book start 0330258648
$ ./book list
$ ./book end 0330258648
$ ./book list
You'll see the book moving between the different sections of
the report.

Build a web app


Dancer is a Perl framework for building web applications, and Dave Cross
discovers it's an ideal way to expand his simple reading list program.

In the previous tutorial we built a simple command-line
program to manage a reading list. We could add books to
it and note when we started and finished reading them. At
any time the program would display a list of books that we
were reading, that we'd read and that were waiting in the pile.
Command-line programs aren't particularly pretty,
however. It would be nicer to display these lists on a web
page. Perl is, of course, a great language for doing this and in
this article we'll build a web application that displays our
reading list.
We'll be using the Dancer framework, as it's well suited to
the simple web page we're going to build. Dancer is one of a
number of Perl frameworks, though, so have a look at the box
for a brief discussion of some of the alternatives.
As well as the modules we installed in the previous tutorial,
we're going to need some more modules from CPAN
(http://metacpan.org). Not least of these is Dancer itself.
Fortunately, Dancer is available from the package repositories
of most major Linux distributions, so you'll just need to run
yum install perl-Dancer, apt-get install libdancer-perl or
something similar for your version of Linux.
The Dancer installation includes a number of useful
Dancer tools, but we'll also need one which is distributed
separately on CPAN. This is called Dancer::Plugin::DBIC and
will make it easy for us to take the DBIx::Class libraries that
we built in the previous article and use them with Dancer.

The first Dance


Dancer makes it easy to start writing a web application. Once
it's installed, you get a command-line program called dancer
which helps you to create the skeleton of an application.
All you need to do is type:
$ dancer -a BookWeb
This will create a directory called BookWeb and fill it with
the beginnings of a Dancer application. Move into this
directory and take a look at the files. We'll be editing these
later, but already Dancer has given us enough to demonstrate
a running program. One of the directories created was called
bin, and within that directory you'll see a single file called
app.pl. That's our web application, so let's run it:
$ ./bin/app.pl
[29957] core @0.000009> loading Dancer::Handler::Standalone handler in /usr/share/perl5/vendor_perl/Dancer/Handler.pm l. 41
[29957] core @0.000274> loading handler Dancer::Handler::Standalone in /usr/share/perl5/vendor_perl/Dancer.pm l. 366
>> Dancer 1.3072 server 29957 listening on http://0.0.0.0:3000
== Entering the development dance floor ...


If you open a browser and visit http://localhost:3000/
you'll see your application. It doesn't do very much right now,
but it looks pretty and contains useful links to pages that will
help you to learn more about Dancer.

Creating our web pages


The first thing that we're going to do is undo all of that nice
HTML formatting and replace it with our own pages.
The HTML is stored in two files. There's a layout file in
views/layouts/main.tt and the content of the page is in
views/index.tt. There's also a stylesheet in public/css/style.css.
Immediately, you can see a split in where Dancer
stores its output files. Files that are processed in some way to
produce output are stored under views. Static files, such as
stylesheets and images, are stored under public.
Open the file views/layouts/main.tt in a text editor.
All we're going to do here is remove the line that loads jQuery.
Our application won't get complex enough to use JavaScript,
so this is unnecessary.
Whilst editing the file, notice the <% content %> tag
that follows the HTML <body> tag. This is an example of a
Dancer Template tag. Tags are processed by Dancer and
replaced with other text. The <% content %> tag will be
replaced by the contents of whichever template is used for
the current request.
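As a toy illustration of the idea (not how Dancer's template engine is actually implemented), the following sketch swaps a content tag in a layout for the text of a page. Both strings are cut-down stand-ins for the real files under views/:

```perl
use strict;
use warnings;

# Toy layout and page content; the real files live under views/.
my $layout  = "<body>\n<% content %>\n</body>\n";
my $content = "<h1>BookWeb</h1>";

# Copy the layout, then swap the tag for the page content.
( my $page = $layout ) =~ s/<%\s*content\s*%>/$content/;

print $page;
```

The real template engine does far more than this, but the principle of a shared layout wrapped around per-page content is the same.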
In this example, we're dealing only with a single request
and that's a request for the index page of the application. The
template for that page is views/index.tt and that's the next
The default Dancer application is a good way for the
programmer to start learning what it's all about.
file we need to look at. It's really up to you how much you
change this file. I removed most of the text, left a few of the
<div> elements and changed the headers. My file ended up
looking like this:
<div id="page">
<div id="sidebar">
</div>
<div id="content">
<div id="header">
<h1>BookWeb</h1>
<h2>Here's your reading list</h2>
</div>
</div>
</div>

Web Frameworks in Perl


Writing a web application is a complex
process, but using a framework can
make the job much easier. There are a
number of web frameworks for Perl and
you can get them all from CPAN.
In this article we've used Dancer,
which is based on the Ruby framework,
Sinatra. It defines a web application as
a number of routes which are the HTTP
requests that the application will
respond to. This approach makes it
easy to get an application up and

running quickly. You can get more


information at http://perldancer.org.
The best known Perl web framework
is Catalyst, a very powerful and flexible
framework. Its behind a number of
well-known web applications. Visit
www.catalystframework.org.
Another alternative is Mojolicious,
which concentrates on making things
as simple as possible while not
compromising on power or flexibility.
Take a look at http://mojolicio.us.

At last some Perl


But this is supposed to be a Perl tutorial, so it's about time we
wrote some Perl code. We know that bin/app.pl is the file that
drives the program, but if you look in there you'll see that it's
very simple:
#!/usr/bin/env perl
use Dancer;
use BookWeb;
dance;
It loads the Dancer library and the BookWeb library, and
then calls Dancer's dance function. All of the real work goes
on in the BookWeb library, and that lives in the lib/BookWeb.pm
file. So let's have a look at that:
package BookWeb;
use Dancer ':syntax';
our $VERSION = '0.1';
get '/' => sub {
    template 'index';
};
true;
Again, there's not much there yet. But you can see how
Dancer responds to requests. In a Dancer application you
define a number of routes and the Dancer request handler
matches each incoming request against the definitions of
the routes and runs the code associated with the first route
that matches.
A route is a combination of an HTTP request type (GET,
POST, etc) and a path. Currently, we have only one route
defined, which handles a GET request to the root of our
application. Any request that doesn't match that definition
will be handled by Dancer's default "resource not found"
handler, which will send a 404 response to the browser.
If a request does match the route, then the code
associated with that route is run. In this case, that just
interprets the index template that we just cleared out. A lot of
Dancer's power is in the new keywords, such as template,
that it makes available to your application. The template
keyword hides quite a lot of work searching the filesystem to
find the templates, dealing with the expansion of variables and
embedding the content templates inside the layout template.
So, if we want to put useful things into our web page we
need to pass variables to the template call. Specifically, we
will want to pass in the lists of books that we're reading, have
read and are planning to read. The call will then look a bit like
the following:

template 'index', {
    reading => \@reading,
    read    => \@read,
    to_read => \@to_read,
};
We've added another parameter to the template call. This
parameter is a hash, which contains details of the data we
need while processing the template. The keys of the hash are
names that we can use within the template (reading, read
and to_read) and the values are the lists of books.
Strictly speaking, they are references to arrays of books, as you
can't store an array in a Perl hash, but you can store a
reference to an array. See the perldoc perlreftut manual page
for more explanation of this.

Getting data out of a database


The next thing that we need to do is to populate those
arrays. As we did in the previous article, we're going to
get that data from our database, and we're going to use the
Object Relational Mapper DBIx::Class in the same way as
we did last time.
Dancer has a number of plugins available on CPAN, and
one of them is called Dancer::Plugin::DBIC. That will give
us a schema keyword which allows us to get easy access
to the DBIx::Class schema object that we use to talk to
the database.
Once we've installed Dancer::Plugin::DBIC from CPAN, we
need to configure our Dancer application to use it. We do that
by editing the config.yml file, which is in the BookWeb
directory. At the end of this file, add the following lines:
plugins:
  DBIC:
    book:
      schema_class: Book
      dsn: dbi:mysql:database=books
      user: books
      pass: README
You'll recognise this as the connection information that we
used to talk to the database in the previous tutorial. The only
new information is the schema_class value, which is the
name of the Book.pm class that we created to enable us to
interact with the database. Our new application will need
access to that class and the easiest, if slightly low-tech, way to
achieve that is to copy Book.pm into the application's lib
directory. You'll also need to copy the Book sub-directory that
contains the other database classes (Book/Result/Author.pm
and Book/Result/Book.pm).

Quick tip
The Perl motto is "There's more than one way to do it".
So don't be too surprised if you find Perl code that is
written differently to my examples.
While you're editing config.yml you should also change
the template engine. Dancer comes with support for two
templating engines. The default engine is a simple one that
isn't quite powerful enough for our needs, so we need to
change to use the Template Toolkit. Do that by changing the
template section in config.yml to look like this:
# template: "simple"
template: "template_toolkit"
engines:
  template_toolkit:
    encoding: 'utf8'
    start_tag: '<%'
    end_tag: '%>'
Notice that we've changed the Template Toolkit's start and
end tags from [% and %] to <% and %>. That's
because our existing templates already contain these tags.

Quick tip
If you want advice on improving your Perl code, try the
perlcritic program that is probably available pre-packaged
for your distribution.

Templating the output


With these libraries in place we can finally write code that
accesses the database. Change BookWeb.pm so that our
route looks like this:
get '/' => sub {
    my $books_rs = schema->resultset('Book');
    my @reading = $books_rs->search({
        started => { '!=', undef },
        ended   => undef,
    });
    my (@read, @to_read);
    template 'index', {
        reading => \@reading,
        read    => \@read,
        to_read => \@to_read,
    };
};

We're using the schema keyword to access our schema object,
but the rest of the database code is exactly the same as the
code we used last time. We get a book resultset object and then
use its search method to get an array of the books we're
interested in. Books that we're currently reading have a
non-null value in the started column and a null value in
the ended column. We'll ignore the other two lists for
now and look at how we deal with the data inside the
index template.
Here's what the index.tt file looks like once we've added
code to handle the reading array.
<div id="page">
  <div id="sidebar">
  </div>
  <div id="content">
    <div id="header">
      <h1>BookWeb</h1>
      <h2>Here's your reading list</h2>
    </div>
    <h3>Reading</h3>
<% IF reading.size %>
    <ul>
<% FOREACH book IN reading %>
      <div class="book"><p><img src="<% book.image_url %>" />
      <a href="http://amazon.co.uk/dp/<% book.isbn %>"><% book.title %></a>
      <br />By <% book.author.name %></p>
      <p><% IF book.started %>Began reading: <% book.started.strftime('%d %b %Y') %>.<% END %>
      <% IF book.ended %>Finished reading: <% book.ended.strftime('%d %b %Y') %>.<% END %></p>
      </div>
<% END %>
    </ul>
<% ELSE %>
    <p>No books found.</p>
<% END %>
  </div>
</div>
The interesting bits are in the <% ... %> tags. You'll see an
if/else statement and a foreach loop. These are both standard
programming constructs that act as you'd expect.
The hash of variables that we passed into the
template call had a key called reading, and that becomes the
name that we use to refer to that data inside the template. As
the variable is an array, we can call a size method to see how
many elements it contains. If the array is empty, the if
condition is false (Perl treats zero as a false value) so we
execute the else code, which displays "No books found". If there
are books in the list then we iterate over the list, putting each
book in turn into a temporary variable called book.
Each of these book values is a DBIx::Class book object
like the ones we used in the previous article. And that means
they all have methods for each of the columns in the
database table. We can use those to print the title and the
author's name. We also use the ISBN to construct a link back
to the book's page on Amazon. Notice that the started
column returns a Perl DateTime object so that we can use
that class's strftime method to get a nicely formatted date.
In order to make the page look a little more attractive, we
can add the following tweaks to the stylesheet in
public/css/style.css:
img {
  float: left;
  margin: 0 5px 5px 0;
  clear: both;
}
.book, h3 {
  clear: both;
}

Once you've put a couple of months' effort into reading
the classics, your booklist will look something like this.

All we need to do now is to write the code to retrieve and
display the other two lists: the list of books we've read and
the list of books we haven't started. But that's going to get a
little repetitive, so before we do that let's make a few changes
to index.tt to make our life as easy as possible.
The Template Toolkit has the concept of macros. These
are a bit like functions in the templating world. They make it
easy to bundle up and re-use repetitive code. We can define
a showbook macro that defines how we display a book and
then call it whenever we want to show the details of a book.
Here's the macro:
<% MACRO showbook(book) BLOCK %>
<div class="book"><p><img src="<% book.image_url %>" />
<a href="http://amazon.co.uk/dp/<% book.isbn %>"><% book.title %></a>
<br />By <% book.author.name %></p>
<p><% IF book.started %>Began reading: <% book.started.strftime('%d %b %Y') %>.<% END %>
<% IF book.ended %>Finished reading: <% book.ended.strftime('%d %b %Y') %>.<% END %></p>
</div>
<% END %>

Re-usable bundles

It's exactly the same code as before, just bundled in the
MACRO ... BLOCK ... END syntax that makes it re-usable. With
this block added to the top of the index template we can write
the section that actually displays the books like this:
<h3>Reading</h3>
<% IF reading.size %>
<% FOREACH book IN reading %>
<% showbook(book) %>
<% END %>
<% ELSE %>
<p>No books found.</p>
<% END %>
<h3>Read</h3>
<% IF read.size %>
<% FOREACH book IN read %>
<% showbook(book) %>
<% END %>
<% ELSE %>
<p>No books found.</p>
<% END %>
<h3>To Read</h3>
<% IF to_read.size %>
<% FOREACH book IN to_read %>
<% showbook(book) %>
<% END %>
<% ELSE %>
<p>No books found.</p>
<% END %>

Arrays of light
Each section is exactly the same; they just work on different
arrays. All we need to do now is to fill in the values of the read
and to_read arrays that are passed to the template.
We wrote code that did this in the previous article and we
just need to replicate that in the route in BookWeb.pm. When
we've finished, the complete route definition will look
something like this:
get '/' => sub {
    my $books_rs = schema->resultset('Book');
    my @reading = $books_rs->search({
        started => { '!=', undef },
        ended   => undef,
    });
    my @read = $books_rs->search({
        ended => { '!=', undef },
    });
    my @to_read = $books_rs->search({
        started => undef,
    });
    template 'index', {
        reading => \@reading,
        read    => \@read,
        to_read => \@to_read,
    };
};

Next time

We still have to update the database using the book
command-line program we wrote last time, but we have an
attractive web version to display our current reading list.

The Template Toolkit

In this example we've used the
Template Toolkit to build the HTML
pages for our application. A templating
engine is an essential tool for building a
website of any complexity. A good
templating engine will make it easy to
separate the business logic of your
application from the logic that displays
the data to the users.
The Template Toolkit is generally
accepted as the most powerful and
flexible templating engine for Perl. It
has its own simplified display language,
but a plugin system gives you easy
access to much of the power of CPAN.
Version 2 of TT has been around for
several years, but version 3 is now
getting close to being released.
TT has a website at http://tt2.org
which is not only a useful resource,
but also a demonstration of the power
of the toolkit. There's also a book,
Perl Template Toolkit, which is published
by O'Reilly.

Routes in Dancer

We've said that Dancer is built around
the concept of routes, but what exactly
is a route and how does Dancer use them?
A route is a combination of three
things: an HTTP request method; a
request path; and some code. When
Dancer matches an incoming request
with the method and the path, it runs
the associated code. In our example,
this was the definition of the only route.
get '/' => sub { ... };
In this example, get is the HTTP
method, '/' is the path and the code is
defined in the subroutine. When Dancer
sees an HTTP GET request to the path
/ (ie the root of the website) the code
is run.
All parts of the route definition can
be more complex. You could match a
POST request with the post keyword or
any HTTP request method with any.
You can also access data passed in the
request path using code like this:
get '/hello/:name' => sub {
    return 'Hello ' . params->{name};
};

Modern Perl

Adding to our app


The power of web frameworks is in how they take care of standard features.
Dave Cross uses Dancer to add interactivity to his reading list program.

In the previous tutorial in this chapter, we added a web
front-end to our reading list program, but this interface
displayed only the contents of our database and we still
needed to use the command-line program to change the
data. In this tutorial, we'll fix that by adding interactivity to our
web application. By the end of this article, you won't need the
command-line program at all.
This will involve two major changes to the web app. Firstly,
we'll add actions to deal with adding books to the reading list
and starting and finishing books. But if you want to put your
reading list on a public website, you don't want just anyone to
be able to edit it, so we'll also implement a basic level of
authorisation and authentication.
As in the previous article, we'll find that Dancer will make
this all a lot easier than it would be doing it all from scratch.

How to read a book

The books application, with links allowing you
to maintain your reading list.

We'll start by adding routes to our application allowing us to
start and finish reading books. We'll do this before adding
books to the list, as these actions are simpler. We'll implement
these actions by adding new route definitions to the
BookWeb.pm file. Here's the definition of the start route:
# DateTime must be loaded (use DateTime;) near the top of BookWeb.pm
get '/start/:isbn' => sub {
    my $books_rs = schema->resultset('Book');
    my $book = $books_rs->find({ isbn => param('isbn') });

    if ($book) {
        $book->update({ started => DateTime->now });
    }
    return redirect '/';
};
Like all Dancer routes, this definition consists of an HTTP
action (in this case get), a path and some code to execute
when the first two items are matched. The path here is more
complex than the path that we saw last time, as it contains a
parameter. The URL that we want to use to start reading a
book looks like http://example.com/start/1930110006.
This will flag that you have started reading the book with
ISBN 1930110006.
Obviously, that ISBN value will change for different books,
so we need a way to capture that parameter and use it in our
code. In a Dancer route, you can match parameters with the
:name syntax that you see in our definition. You can have
more than one parameter defined in the route as long as they
are named and separated by slashes. You access these
parameters using Dancer's param function.
The rest of the code will look familiar to anyone who read
the first article in this chapter (beginning on p108) where we
wrote the command-line version of this program. We get a
resultset for our book table, search it for a book with the given
ISBN and then update the started column in that object to be
equal to the current date and time.
You might also remember that the DBIx::Class tool that we
are using for database access automatically converts
between Perl DateTime objects and date/time columns in
your database.
Notice that if we don't find a book with the given ISBN,
then we do nothing. It might be worth displaying an error
message at that point. Or, perhaps, redirecting to the add
action (which we haven't written yet).
Once we have updated the book record, we just use
Dancer's redirect function to redirect the browser back to the
main page of the application. The user will then see that the
chosen book has moved from the "To Read" list to the
"Reading" list.
The code for the end route is almost identical. Only the
path and the database column will differ. The path will be
/end/:isbn, and we'll need to update the ended column in
the database.

Adding new books


The next thing we need to do is to add new books to the list.
Again, we'll be repurposing code from the original command-line
program. As we need to go to Amazon for details of the
book, we need to create a Net::Amazon object. We'll need this
object in a couple of places, so we'll write a get_amazon()
subroutine that creates the object for us.
sub get_amazon {
    return Net::Amazon->new(
        token         => $ENV{AMAZON_KEY},
        secret_key    => $ENV{AMAZON_SECRET},
        associate_tag => $ENV{AMAZON_ASSTAG},
        locale        => 'uk',
    ) or die "Cannot connect to Amazon\n";
}
There's nothing complicated here. It's just calling the
constructor on the Net::Amazon class and returning the
object that is created. Annoyingly, Amazon has changed the
way that this works since I wrote the first article in this series.
See the "Amazon API changes" boxout for more details.
We can now define our add route. The path will be a similar
format to the start and end routes. The code looks like this:
get '/add/:isbn' => sub {
    my $author_rs = schema->resultset('Author');
    my $amz = get_amazon();
    # Search for the book at Amazon
    my $resp = $amz->search(asin => param('isbn'));
    unless ($resp->is_success) {
        die 'Error: ', $resp->message;
    }
    my $book = $resp->properties;
    my $title = $book->ProductName;
    my $author_name = ($book->authors)[0];
    my $imgurl = $book->ImageUrlMedium;
    # Find or create the author
    my $author = $author_rs->find_or_create({
        name => $author_name,
    });
    # Add the book to the author
    $author->add_to_books({
        isbn      => param('isbn'),
        title     => $title,
        image_url => $imgurl,
    });
    return redirect '/';
};
In this function, we need to talk to both the database and
Amazon, so the first thing we do is create an author
resultset and a Net::Amazon object. We then search Amazon
for the ISBN that we've been given. If we find it, we first create
an author record (or find the existing one if we already know
about this author) and then insert details of the book.
Once again, when we've finished, we just need to redirect to
the front page and the user will see their new book in the
"To Read" list.

Amazon API changes

It's rare for a big company such as
Amazon to make changes to its web
services API in such a way that it
breaks a lot of existing code. But,
unfortunately, that's exactly what
happened at some point after I wrote
the previous article in this series.
In the older version of the API you
needed a key and a secret. These
values were passed to Net::Amazon as
you created the object. Amazon has
now added a third mandatory
parameter: your Amazon Associates
ID. Like the other two parameters, you
can get this value from your Amazon
web services account information.
The Net::Amazon module checks that
you have given it all of the mandatory
parameters when you call its
constructor method. Older versions
checked for the key and the secret. But
once the API change was introduced
those parameters weren't enough and
any API calls were failing with an error
about the missing parameter. Version
0.61 of Net::Amazon adds the
associates ID to the list of mandatory
parameters. The new version of the call
is shown in the code in this article. I
recommend that you update your
version of Net::Amazon to avoid any
potential problems.

Quick tip
The best book about Perl is Programming Perl.
The fourth edition has just been published.

Adding links
That's all very well, but currently the only way to access our
new routes is by typing addresses, including the ISBNs, into
the location bar in your browser. That's hardly user-friendly.
Let's fix that by adding links to the list of books. In the file
views/index.tt, we have a macro called showbook which is
responsible for displaying an individual book in the main list.
We can edit that and have the links appear for every book.
Once the links have been added, the macro looks like this:
<% MACRO showbook(book) BLOCK %>
<div class="book"><p><img src="<% book.image_url %>" />
<a href="http://amazon.co.uk/dp/<% book.isbn %>"><% book.title %></a>
<br />By <% book.author.name %></p>
<p><% IF book.started %>Began reading: <% book.started.strftime('%d %b %Y') %>.<% END %>
<% IF book.ended %>Finished reading: <% book.ended.strftime('%d %b %Y') %>.<% END %></p>
<% IF book.started AND NOT book.ended -%>
<p><a href="/end/<% book.isbn %>">Finish book</a></p>
<% ELSIF NOT book.started -%>
<p><a href="/start/<% book.isbn %>">Start book</a></p>
<% END %>
</div>
<% END %>
Our additions are towards the end. If the book has a value in
the start date but no value in the end date then it must be in
the "Reading" list and we display a "Finish book" link. If it has no
start date then it must be in the "To Read" list and we display a
"Start book" link.
If we make these changes and start our application (with
bin/app.pl), you should see these links appearing next to the
books, assuming that you have books on the list. And
that brings us neatly to the next problem. We need a
better way to add books to the list. Let's do it by
searching Amazon.

Quick tip
There are a huge number of blogs dedicated to Perl
programming. Many of the best ones are collected
at http://mgnm.at/ironman.

Amazon exploration
The best place for a search box is in a sidebar that appears on
every page. Our sidebar is defined in views/layouts/main.tt.
Edit the sidebar div so it looks like this:
<div id="sidebar">
<form method="POST" action="/search"><p>Search Amazon:
<input name="search" value="<% search %>" /> <input
type="submit" value="Search" /></p></form>
</div>
That will put a search box on every page in our application.
But now we need to write code to carry out the search and
display the results. Notice in the form definition we've said
that the form sends a POST request to /search. That gives us
a couple of clues as to how our route definition should look:

post '/search' => sub {
    my $amz = get_amazon();
    my $resp = $amz->search(
        keyword => param('search'),
        mode    => 'books',
    );
    my %data;
    $data{search} = param('search');
    if ($resp->is_success) {
        $data{books} = [ $resp->properties ];
    } else {
        $data{error} = $resp->message;
    }
    template 'results', \%data;
};
We need a Net::Amazon object in order to search Amazon,
so we get that first. We can then use the same search
method as we used before, but with different arguments.
We tell Amazon that we're looking for a book and that the
keyword we're looking for is the search term that the user has
given us. If the search is successful then the books that
match are retrieved by calling the properties method on the
response object. We put that list in a hash called
%data, along with the text we searched for, and pass
that to the results template.
Which means we need to create a template called
views/results.tt. It looks like this:
<h1>BookWeb - Search Results</h1>
<% IF error -%>
<p class="error"><% error %></p>
<% ELSE %>
<p>You searched for: <b><% search %></b></p>
<% IF books.size %>
<ul>
<% FOREACH book IN books -%>
<li><b><% book.title %></b> (<% book.authors.list.0 %>) <a
href="/add/<% book.isbn %>">Add to list</a></li>
<% END %>
</ul>
<% ELSE %>
<p>Your search returned no results.</p>
<% END %>
<% END %>
There's a bit of code there for displaying an error if the search
failed and for displaying a "no results" message, but most of
the code is used to display a list of books that are returned
from Amazon. For each book in the list we display the title, the
author and a link to add the book to our reading list.
If you save these changes and restart the
application, you should find that you have a fully
functional website that now
allows you to do anything that our original command-line
program did. You can add new books to the list and tell the
system when you start and finish a book. The only problem is
that anyone else can do all of that too.

The search results page. Amazon seems to have a rather
liberal definition of "Perl".

Deploying your application

In the past two articles, we've been
using Dancer's built-in test web server
to run our web application. But if you
find the app to be useful, you'll
eventually want to deploy it on a real,
public web server.
There are a number of different
options. Dancer is built on top of a Perl
technology called PSGI, which is a
protocol that defines the interactions
between a web application and the web
hosting environment where the
application runs. If you have a
PSGI-compatible application, then it's simple
enough to deploy it in any PSGI-ready
web hosting environment. And as any
Dancer application is already PSGI
compatible, you can deploy it just
about anywhere. Details of some
common deployment scenarios are in
the Dancer::Deployment manual page,
which comes as part of the standard
Dancer distribution. Just enter perldoc
Dancer::Deployment at the command
line. For more details of PSGI (and
Plack, a reference implementation of
the specification), see the project's
website at http://plackperl.org.

Adding security
Presumably, you'd like to display your reading list to anyone
who's interested, but you'd prefer it if only you can update it.
For that we need to introduce some security. We're going to use
some really basic authentication, but I hope it will be obvious
how to extend it for use in the real world.
We're going to add the concept of a logged-in user,
and we're going to store whether the current user is logged in
or logged out using a session cookie. Support for sessions
comes as a part of the standard Dancer distribution, but in
order to store your session in a cookie, you will need to install
the extra Dancer::Session::Cookie module from CPAN.
Having installed the module, you need to configure it by
adding the following two lines to your config.yml file:
session: cookie
session_cookie_key: somerandomnonsense
The value of the cookie key can be any random string;
the more random the better. Mine isn't a great example.
In order to add session support, we need to add use
Dancer::Session to the list of modules near the top of
BookWeb.pm. Now we need to think about how our security
will work. I'm going to define a list of paths that are public.
Anyone can see those pages, but anyone trying to access
pages outside of this list will be prompted to log in if they
haven't already.
Dancer has the concept of a before hook, which is fired
before any route is run. That's a perfect place to check
whether the user is allowed to do whatever they are trying
to do:
my %public_path = map { $_ => 1 } ('/', '/login', '/search');
hook before => sub {


Python

Python is the Swiss Army knife of
programming languages. Its range of add-on
modules means you can do almost anything
quickly and easily. Here we're going to discover
how to create Clutter graphics, code a Gimp plugin
and have fun hacking Minecraft on a Raspberry Pi.
Python: Different types of data
Python: Code a system monitor
Python: Clutter animations
Python: Stream video
Python: Code a Gimp plugin
Python: Gimp snowflakes
Python: Make a Twitter client
Minecraft: Start hacking
Minecraft: Image wall importing
Minecraft: Make a trebuchet
Minecraft: Build a cannon

Python: Different types of data
Functions tell programs how to work, but it's data that they operate
on. Nick Veitch goes through the basics of data in Python.

In this article, we'll be covering the basic data types in
Python and the concepts that accompany them. In later
articles, we'll look at a few more advanced topics that
build on what we do here: data abstraction, fancy structures
such as trees, and more.

What is data?

While we're looking only at basic data types,
in real programs getting the wrong type can
cause problems, in which case you'll see
a TypeError.

In case you've hopped straight to this chapter, let's go back
to basics. In the world, and in the programs we write, there's
an amazing variety of different types of data. In a mortgage
calculator, for example, the value of the mortgage, the interest
rate and the term of the loan are all types of data; in a
shopping list program, there are all the types of food and the
list that stores them, each of which has its own kind of data.
The computer's world is a lot more limited. It doesn't know
the difference between all these data types, but that doesn't
stop it from working with them. The computer has a few basic
ones it can work with, and that you have to use creatively to
represent all the variety in the world.
We'll begin by highlighting three data types: first, we have
numbers. 10, 3 and 2580 are all examples of these. In
particular, these are ints, or integers. Python knows about
other types of numbers, too, including longs (long integers),
floats (such as 10.35 or 0.8413) and complex (complex
numbers). There are also strings, such as 'Hello World',
'Banana' and 'Pizza'. These are identified as a sequence of
characters enclosed within quotation marks. You can use
either double or single quotes. Finally, there are lists, such as
['Bananas', 'Oranges', 'Fish']. In some ways, these are like a
string, in that they are a sequence. What makes them
different is that the elements that make up a list can be of any
type. In this example, the elements are all strings, but you
could create another list that mixes different types, such as
['Bananas', 10, 'a']. Lists are identified by the square brackets
that enclose them, and each item or element within them is
separated by a comma.
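As a minimal sketch of the three types just described, the following script creates one of each. The variable names are our own, chosen purely for illustration, and the code is written so that it should behave the same under Python 2 and Python 3.

```python
# Three basic kinds of data: a number, a string and a list.
quantity = 10                               # an int
greeting = "Hello World"                    # a string in double quotes
fruit = 'Banana'                            # single quotes work just as well
shopping = ["Bananas", "Oranges", "Fish"]   # a list whose elements are strings
mixed = ["Bananas", 10, "a"]                # a list can mix different types

print(shopping)
print(mixed)
```

Note that the quotation marks and square brackets are part of how you write the values, not part of the data itself.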

Working with data


There are lots of things you can do with the different types of
data in Python. For instance, you can add, subtract, divide and
multiply two numbers and Python will return the result:
>>> 23 + 42
65
>>> 22 / 11
2
If you combine different types of numbers, such as an int
and a float, the value returned by Python will be of whatever
type retains the most detail. That is to say, if you add an int
and a float, the returned value will be a float.
You can test this by using the type() function. It returns
the type of whatever argument you pass to it.
>>> type(8)
<type 'int'>
>>> type(23.01)
<type 'float'>
>>> type(8 + 23.01)
<type 'float'>
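The same rule can be checked from a script rather than the interpreter prompt. This is a small sketch, not part of the original listing: the interpreter sessions in this article appear to come from Python 2, where dividing one int by another gives an int. Python 3 returns a float for / and uses the // operator for whole-number division, so the example sticks to forms that give the same answer on both.

```python
# Combining an int and a float gives a float, the type that keeps more detail.
total = 8 + 23.01
print(type(total) is float)      # True

# Whole-number division: // gives an int on both Python 2 and Python 3.
whole = 22 // 11
print(whole)                     # 2

# Converting one side explicitly guarantees a float result.
ratio = float(22) / 11
print(ratio)                     # 2.0
```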
You can also use the same operations on strings and lists,
but they have different effects. The + operator concatenates
(that is, joins together) two strings or two lists, while the *
operator repeats the contents of the string or list.
>>> "Hello " + "World"
'Hello World'
>>> ["Apples"] * 2
['Apples', 'Apples']
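The + operator only works when both sides are of compatible types. Trying to join a string and a number is one common way to meet a TypeError. The sketch below (with names invented for the example) shows the error being caught and the usual fix, which is to convert the number with the built-in str() function first.

```python
count = 3
try:
    label = "Apples: " + count       # a string plus an int is not allowed
except TypeError:
    # Convert the number to a string first, then concatenate.
    label = "Apples: " + str(count)

print(label)                         # Apples: 3
```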
Strings and lists also have their own special set of
operations, including slices. These enable you to select a
particular part of the sequence by its numerical index, which
begins from 0.
>>> word = "Hello"
>>> word[0]
'H'
>>> word[3]
'l'
>>> list = ["banana", "cake", "tiffin"]
>>> list[2]
'tiffin'
Indexes work in reverse, too. If you want to reference the last
element of a list or the last character in a string, you can use
the same notation with a -1 as the index. -2 will reference the
second-to-last character, -3 the third-to-last, and so on. Note that
when working backwards, the indexes don't start at 0.
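To make negative indexes and slices concrete, here is a small sketch written for Python 3 (an assumption; the listings above are Python 2). The variable name snacks is ours, chosen so that the built-in name list is not overwritten.

```python
word = "Hello"
snacks = ["banana", "cake", "tiffin"]

last_char = word[-1]        # last character of the string
second_last = snacks[-2]    # second-to-last element of the list

# A slice is written [start:stop]; the stop index is excluded.
first_three = word[0:3]
tail = snacks[1:]           # from index 1 to the end

print(last_char, second_last, first_three, tail)
```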

Methods
Lists and strings also have a range of other special
operations, each unique to that particular type. These are
known as methods. They're similar to functions such as
type() in that they perform a procedure. What makes them
different is that they're associated with a particular piece of
data, and hence have a different syntax for execution.
For example, among the list type's methods are append
and insert.
>>> list.append('chicken')
>>> list
['banana', 'cake', 'tiffin', 'chicken']
>>> list.insert(1, 'pasta')
>>> list
['banana', 'pasta', 'cake', 'tiffin', 'chicken']
As you can see, a method is invoked by placing a period
between the piece of data that you're applying the method to
and the name of the method. Then you pass any arguments
between round brackets, just as you would with a normal
function. It works the same with strings and any other data
object, too:
>>> word = 'HELLO'
>>> word.lower()
'hello'
There are lots of different methods that can be applied to
lists and strings, and to tuples and dictionaries (which we're
about to look at). To see the order of the arguments and the
full range of methods available, you'll need to consult the
Python documentation.

Variables
In the previous examples, we used the idea of variables to
make it easier to work with our data. Variables are a way to
name different values, that is, different pieces of data. They
make it easy to manage all the bits of data you're working with,
and greatly reduce the complexity of development (when you use
sensible names).
As we saw above, in Python you create a new variable with
an assignment statement. First comes the name of the
variable, then a single equals sign, followed by the piece of
data that you want to assign to that variable.
From that point on, whenever you use the name assigned
to the variable, you are referring to the data that you assigned
to it. In the examples, we saw this in action when we
referenced a single character in a string or the third
element in a list by appending index notation to the variable
name. You can also see this in action if you apply the type()
function to a variable name:
>>> type(word)
<type 'str'>
>>> type(list)
<type 'list'>

Other data types


There are two other common types of data that are used by
Python: tuples and dictionaries.
Tuples are very similar to lists: they're a sequence data
type, and they can contain elements of mixed types. The big
difference is that tuples are immutable (that is to say, once
you create a tuple you cannot change it) and that tuples are
identified by round brackets, as opposed to square brackets:
('bananas', 'tiffin', 'cereal'). Dictionaries are similar to a list or
a tuple in that they contain a collection of related items. They
differ in that the elements aren't indexed by numbers, but by
keys, and are created with curly brackets: {}. It's quite like an
English language dictionary. The key is the word that you're
looking up, and the value is the definition of the word.
With Python dictionaries, however, you can use any
immutable data type as the key (strings are immutable, too),
so long as it's unique within that dictionary. If you try to use
an already existing key, its previous association is forgotten
completely and that data lost for ever.
>>> english = {'free': 'as in beer', 'linux': 'operating system'}
>>> english['free']
'as in beer'
>>> english['free'] = 'as in liberty'
>>> english['free']
'as in liberty'
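The rule about immutable keys is easiest to see by trying it. The following is a minimal sketch in Python 3 (an assumption; the listing above is Python 2), and the grid dictionary is a made-up example rather than anything from the article.

```python
english = {"free": "as in beer", "linux": "operating system"}

# Assigning to an existing key replaces the old value.
english["free"] = "as in liberty"

# Tuples are immutable, so they are valid keys.
grid = {(0, 0): "origin", (1, 0): "east"}

# Lists are mutable, so using one as a key raises TypeError.
try:
    bad = {["a", "b"]: "nope"}
    list_key_error = None
except TypeError as exc:
    list_key_error = exc

print(english["free"], grid[(0, 0)], type(list_key_error).__name__)
```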

The Python
interpreter is a
great place to
experiment with
Python code and
see how different
data types work
together.

Looping sequences
One common operation that you may want to perform on any
of the sequence types is looping over their contents to apply
an operation to every element contained within. Consider this
small Python program:
list = ['banana', 'tiffin', 'burrito']
for item in list:
    print item
First, we created the list as we would normally, then we
used the for ... in construct to run the print statement
on each item in the list. The second word in that construct
doesn't have to be item; that's just a variable name that
gets assigned temporarily to each element contained within
the sequence specified at the end. We could have written
for letter in word and it would have worked just as well.
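Here is the same idea as a short Python 3 sketch (an assumption; the article's listing is Python 2, where print is a statement). It loops over a list and then over a string, to show that the loop variable can have any name.

```python
snacks = ["banana", "tiffin", "burrito"]
shouted = []
for item in snacks:
    print(item)
    shouted.append(item.upper())

# A string is a sequence too, so the same construct walks its characters.
letters = []
for letter in "Hello":
    letters.append(letter)

print(shouted, letters)
```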
That's all we have time to cover in this article, but with the
basic data types covered, we'll be ready to look at how you
can put this knowledge to use when modelling real-world
problems in later articles.
In the meantime, read the Python documentation to
become familiar with some of the other methods that it
provides for the data types we've looked at before. You'll find
lots of useful tools, such as sort and reverse!

Python: Code a
system monitor
Tidying up some code with Clutter, Nick Veitch takes you far from the
command line into a new realm of technicolour graphical possibilities.

It's a bit dark in here... Could be the promising start of a
3D adventure game, perhaps. Or your first Clutter effort!

We have touched on a few web-based wonders you
can build with Python elsewhere in this guidebook,
but we're going to do a rare thing now and cover
using a GUI to display stuff graphically to the user. One of the
reasons for this being unusual is that, for the most part, GUI code
gets very big very quickly, so a whole tutorial would be taken
up by just drawing a panel and a few buttons on the screen.
We're going to take a break from being so user-unfriendly
for a while, as for the next few pages we're going to be
building applications using the PyClutter library. If you don't
know much about Clutter, check the boxout (A Note About
Versions) over the page. For the first tutorial we're going to
build a small but useful utility to get to grips with how
Clutter and PyClutter work. As Clutter has a dearth of
documentation and examples, hopefully the code we will
cover here will give you an idea of how we can use it
practically within our Python web apps.
Our task here is to create an app that will show us the
current network speeds for our internet connection. Yes,
there are plenty of monitors out there, but this will be our
own, and delivered in about 70 lines of simple code.
The first thing you need to get to grips with in Clutter is the
basic terminology. Unlike other GUI toolkits, which usually
define objects like windows or panels, Clutter refers to the
visual area as a stage. To continue the analogy, objects that
appear on (or actually, in, but it sounds weird to say it) the
stage are called actors. It makes more sense when you start
coding it, and the names don't seem so strange after a while.
The thing about the actors is that they have more properties
than a standard widget because they actually exist in a 3D
environment, rather than a 2D one.

All the world's a stage

Anyway, enough hyperbabble; it will make more sense
when we write some code. Open up your standard Python
environment (mine is a Bash shell, but you can use some of
those fancy ones if you like), and let's create our very first
Clutter script:
>>> import clutter
>>> stage = clutter.Stage()
>>> stage.set_size(500,300)
>>> red=clutter.Color(255,0,0,255)
>>> black=clutter.Color(0,0,0,255)
>>> stage.set_color(black)
>>> stage.show_all()
When you're done, click on the Close gadget on the window
that opened. I know it didn't do anything amazing, but it does
have the potential to! Let's take a look at what just happened.
The first line obviously loaded the Clutter module. In turn,
Clutter opens a few more modules itself: back-end stuff that
links into display libraries to be able to put things on the
screen. Next up we created a stage object. The stage is like a
viewport: an area where your actor objects can play.
Setting the attributes is as simple as calling some
methods for the stage class, in this case a size and a colour.
The parameters for the size method are x and y dimensions,
and the colour is taken from the clutter.Color object (which
takes values for RGB and alpha). As with other GUI toolkits,
we should cause the object to be shown before any of it is
drawn on the screen, which is what the final command does.
But what of our actors, the objects that we want to show
on the screen? Let's add some text objects:
>>> a=clutter.Text()
>>> a.set_font_name('Sans 30')
>>> a.set_color(red)
>>> a.set_text('Hello World!')
>>> a.set_position(130,100)
>>> stage.add(a)
Now we've added a text object, our first actor. Hopefully it will
be fairly clear what the methods are doing: picking a font, a
colour, setting the text string and positioning it on the stage.
The final call in the code example adds the actor to the stage,
and until this point, you won't be able to see it. Now that it's
there though, you can continue to play around with it. Try
setting it to a different position or adding new colours.
As I mentioned earlier, the PyClutter documentation is
scanty, but we can gain some solace in the fact that Python
has good introspection. Try typing in dir(a) at this point to see
the methods and attributes available for this object.
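If you don't have Clutter to hand, the same introspection trick can be tried on any object. This is a minimal Python 3 sketch (an assumption; the article uses Python 2) that uses a plain string as a stand-in for the text actor.

```python
# dir() works on any object; a plain string stands in for the actor here.
word = "HELLO"
public_methods = [name for name in dir(word) if not name.startswith("_")]
print(public_methods[:5])

# getattr() fetches a method by name so that it can be called.
lower = getattr(word, "lower")
print(lower())
```

Filtering out the names that begin with an underscore hides the internal machinery and leaves the methods you are likely to call yourself.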
Our next step is to build a running script, but there's
something we haven't covered yet: for all the Clutter magic to
work properly, we should turn control of the application over
to the clutter.main() function, but we don't want to do that
without some way to exit the program. In such situations,
Python will catch Ctrl+C interrupts, so we will have no way of
quitting. The answer is to provide some keyboard events.
When the stage window is active, Clutter will receive
signals for keypresses. All we need to do is provide a callback
function that will process that event, and if the correct key
has been pressed, quit out of the main loop. You could also
assign other actions to some keys, like changing the colour of
the stage for example.
>>> def parseKeyPress(self, event):
...     if event.keyval == clutter.keysyms.q:
...         clutter.main_quit()
...     elif event.keyval == clutter.keysyms.r:
...         self.set_color(red)
...
>>> stage.connect('key-press-event', parseKeyPress)
>>> clutter.main()
When run in the interactive Python shell, the quit function will
not quit Python itself, or even destroy the application; it will
just return control to the Python shell. In the case of a running
script though, calling the clutter.main_quit() method will
effectively end the application, or at least the Clutter part of it.

The traditional
first app, though
rather daringly
we have left out
the comma.

Time to monitor something


Right, now we have the interface sorted out, how are we
going to build an amazing bandwidth monitor? We first need
to find out the speed of the network traffic. Whenever I am
confronted with a question about some piece of system
statistics, I always go and ask my old friend, proc. Yes, the
/proc pseudo filesystem is the repository of everything
you ever needed to know about a running Linux box. proc
is a huge sprawling mess of files, but the one we want is
/proc/net/dev. This lists all the network devices, and reading
the file will give you statistics on bytes in and out, packets,

A note about versions


The Clutter library, and consequently the Python module that
uses the Clutter library, has been updated recently to version
1.18.0. Normally updates may cause a few inconsistencies
between old versions and new versions of software, but in this
case there are fundamental differences between the code of
versions before and after 0.9. The PyClutter module and the
Clutter library should be available in your distro's repository,
but when you install it, make sure you have a version 0.9
(preferably 1.0) or above, otherwise I can guarantee you that
none of the code in this tutorial will work. If you think that's a
faff, you ought to try writing a tutorial and then discovering
the whole library changes...

Messing around in the interactive shell is a quick, safe
way to find out about Clutter objects and methods.


Quick
tip
Keeping track of
versions can be
a nightmare, but
most modules
store their
version number in
<modulename>.__version__. Not only
is this useful for
you to check, but
your applications
can check for a
compatible version
before they try and
do anything tricky.
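As a rough sketch of the check the tip describes, the helper below compares dotted version strings as tuples of integers. It is written for Python 3 (an assumption), the function names parse_version and is_compatible are ours, and the version strings are hypothetical. It does not cope with suffixes such as "1.0rc1".

```python
def parse_version(text):
    """Turn a dotted version string such as '1.0.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(found, minimum):
    """Return True if the found version is at least the minimum."""
    return parse_version(found) >= parse_version(minimum)


# Hypothetical version strings, for illustration only.
print(is_compatible("1.0.4", "0.9"))
print(is_compatible("0.8.2", "0.9"))
```

Comparing tuples rather than raw strings matters because, as text, "0.10" sorts before "0.9".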

dropped packets, errors and so on. The only things we are
interested in are the bytes sent and the bytes received. I know
that the number there is a total, and we wanted a speed, but
behold the power of proc: just open the file again and the
magic numbers will have changed. Now, I hope I am not going
too fast for you, but simple arithmetic should not be beyond
us. If we poll the file every second and subtract the old
number from the new number, everything should be fine.
All we really have to do is build a little function that will
read in the file, parse it for the information we want, and
compute the deltas. Before we leave we will save the old
number so we can subtract it next time. Here's how the
function should look, more or less:
devfile=open('/proc/net/dev','r')
for line in devfile.readlines():
    line=line.strip()
    if (line[:4] == 'eth0'):
        line=line[5:].split()
        print line[0], line[8]
Hopefully, this will make some sense to you without me
needing to draw diagrams. We read in the file and iterate
through the lines, looking for the one that begins eth0. It is
necessary to strip the line before searching because the
output is padded by an amount to make the tables line up.
When we have the correct line, we take off the interface part
and split the string up, so we have each of the numbers as part
of a list. The counts for bytes in and out happen to be at the 0
and 8 positions in this list. Here we have just printed them out;
you can type in the code and see what it gives you. All that
needs to be added to that is to convert the strings to integers
and store them so we can keep track of what's going on.
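If you are not on a Linux box, or just want to test the parsing without touching /proc, the sketch below runs the same logic over a sample string. It is Python 3 (an assumption), the sample numbers are made up, and the helper name read_counters is ours. It also splits on the colon rather than slicing off five characters, so interface names of other lengths would work too.

```python
# Made-up sample in the layout of /proc/net/dev: two header lines,
# then one line per interface. The numbers are illustrative only.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1200      10    0    0    0     0          0         0     1200      10    0    0    0     0       0          0
  eth0: 5000000    4000    0    0    0     0          0         0   250000    3000    0    0    0     0       0          0
"""


def read_counters(text, interface="eth0"):
    """Return (bytes_in, bytes_out) as ints for one interface, or None."""
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(":")
        if sep and name == interface:
            fields = rest.split()
            return int(fields[0]), int(fields[8])
    return None


print(read_counters(SAMPLE))
```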

Maths is your friend


The more detail-oriented of you might question whether we
take into account the length of time it takes this snippet of
code to run. If you want to time this code, go ahead; on my
development system it takes 0.0001 seconds to run. In case
you're interested, a complete command line app would look
something like this:
import time
lasttime=1
lastin=0
lastout=0
def getspeed():
    x=open('/proc/net/dev','r')
    for line in x.readlines():

Why should I care about Clutter?


Clutter is a GPL graphics and GUI
library that was originally developed by
the OpenedHand team. It was later sold
to Intel, which is committed to further
development and deployment.
The great thing about Clutter is that
it's a simple, fast and powerful way to
deliver 3D or 2D graphics on a number
of platforms. The back-end is essentially
OpenGL, but by using the Clutter library
developers can take advantage of a fast,
efficient and friendly way to develop
graphically rich apps without messing
around with more technical aspects of
the OpenGL libraries.
Clutter also forms an integral part of
Moblin, an attempt to deliver a powerful
graphical version of Linux to run on
mobile devices. Moblin, via Meego, lives
on in a fork called Mer, which is being
developed as the Sailfish OS and a new
smartphone by Jolla (www.jolla.com).

Numbers. Coloured numbers. That change. And monitor


things. This is also a pretty good start.

        line=line.strip()
        if (line[:4] == 'eth0'):
            line=line[5:].split()
            bin=int(line[0])
            bout=int(line[8])
            return (bin, bout)
while True:
    z=getspeed()
    timedelta=time.time()-lasttime
    lasttime=time.time()
    sin=(float(z[0]-lastin))/(1024*timedelta)
    sout=(float(z[1]-lastout))/(1024*timedelta)
    print sin, sout
    lastin=z[0]
    lastout=z[1]
    time.sleep(5)
This incorporates a timing function to more accurately
calculate the speeds, but bear in mind that we're only talking
about a couple of milliseconds, so it doesn't make a lot of
difference. It is useful, however, if we ever want to alter the
timing period elsewhere in the software.
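The arithmetic in that loop boils down to one line: bytes moved, divided by 1,024 and by the seconds elapsed. Here it is as a small Python 3 function (an assumption; the listing is Python 2). The name rate_kb_per_s and the sample figures are ours.

```python
def rate_kb_per_s(previous_bytes, current_bytes, seconds):
    """Average transfer rate in KB/s between two counter readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (current_bytes - previous_bytes) / (1024 * seconds)


# Illustrative numbers: 512,000 bytes arrive over a five-second interval.
print(rate_kb_per_s(1000000, 1512000, 5.0))
```

Pulling the sum out into its own function also makes it obvious that the very first reading, taken against a starting count of 0, will report a wildly inflated speed.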
Now what we have to do is to incorporate this functionality
into our Clutter application. We could just stick the loop at the
end of our program and fail to ever call the main Clutter loop.
We can still update the actor objects whenever we like, but
this would be a Bad Thing. The nicer way to do it is to give
liberty, autonomy and freedom back to the actors, but make
use of an animation timeline to control their text.
Timelines are covered in slightly more detail in the box
over the page, but to give you a brief summary, a timeline is
just a timer that counts to some value and then emits the
programmatic equivalent of a beep: a signal. The signal can
be caught and fed to a callback, and as well as the timeline
itself, you can supply other parameters to the call. For our
purposes, we can make the timer call a function that will test
the network speed and update our two actors.
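To show the shape of that signal-and-callback arrangement without needing Clutter installed, here is a toy stand-in in Python 3 (an assumption). ToyTimeline is a hypothetical class of our own and is not the Clutter API; it only mimics the idea that the emitter is passed first, followed by whatever extra arguments were supplied when connecting.

```python
class ToyTimeline:
    """A stand-in for illustration; it is not the Clutter Timeline class."""

    def __init__(self):
        self._handlers = {}

    def connect(self, signal, callback, *user_data):
        """Remember a callback and any extra arguments for a signal name."""
        self._handlers.setdefault(signal, []).append((callback, user_data))

    def emit(self, signal):
        """Call each handler with the emitter first, then its extra arguments."""
        for callback, user_data in self._handlers.get(signal, []):
            callback(self, *user_data)


updates = []


def update_labels(timeline, in_label, out_label):
    # In the real app this is where the two text actors would be updated.
    updates.append((in_label, out_label))


t = ToyTimeline()
t.connect("completed", update_labels, "in", "out")
t.emit("completed")  # a real timeline would emit this when its timer expires
t.emit("completed")
print(updates)
```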
The timeline is an object unto itself, but when we make
the connection between the timeline and the callback
function, we can pass along our text actor objects too, so the
callback function will be able to change them directly. Note
that if you're going for more complicated behaviours, this
doesn't preclude you from having other timers too: you
could set one up to change the colour of the objects every
second if you wanted, and it needn't interfere with the
timeline we have already created. Timelines can be used like
threads in a multithreaded app: they aren't quite as flexible,
but they are easier to manage and they make it easier to deal with
animated objects, because you can separate the business of
animating the object from the other interactions it has.
import clutter
import time
lasttime=1
lastbin=0
lastbout=0
black =clutter.Color(0,0,0,255)
red = clutter.Color(255, 0, 0, 255)
green =clutter.Color(0,255,0,255)
blue =clutter.Color(0,0,255,255)
def updatespeed(t, a, b):
    global lasttime, lastbin, lastbout
    f=open('/proc/net/dev','r')
    for line in f.readlines():
        line=line.strip()
        if (line[:4] == 'eth0'):
            line=line[5:].split()
            bin=int(line[0])
            bout=int(line[8])
            timedelta=time.time()-lasttime
            lasttime=time.time()
            speedin=round((bin-lastbin)/(1024*timedelta), 2)
            speedout=round((bout-lastbout)/(1024*timedelta), 2)
            lastbin, lastbout = bin, bout
            a.set_text(str(speedin)+'KB/s')
            xx, yy=a.get_size()
            a.set_position(int((300-xx)/2),int((100-yy)/2) )
            b.set_text(str(speedout)+'KB/s')
            xx, yy=b.get_size()
            b.set_position(int((300-xx)/2),int((100-yy)/2)+100 )
def parseKeyPress(self, event):
    # Parses the keyboard
    # As this is called by the stage object
    if event.keyval == clutter.keysyms.q:
        # if the user pressed q, quit the test
        clutter.main_quit()
    elif event.keyval == clutter.keysyms.r:
        # if the user pressed r, make the object red
        self.set_color(red)
    elif event.keyval == clutter.keysyms.g:
        # if the user pressed g, make the object green
        self.set_color(green)
    elif event.keyval == clutter.keysyms.b:
        # if the user pressed b, make the object blue
        self.set_color(blue)
    elif event.keyval == clutter.keysyms.Up:
        # up-arrow = make the object black
        self.set_color(black)
    print 'event processed', event.keyval
stage = clutter.Stage()
stage.set_size(300,200)
stage.set_color(blue)
stage.connect('key-press-event', parseKeyPress)
intext=clutter.Text()
intext.set_font_name('Sans 30')
intext.set_color(green)
stage.add(intext)

It's all about timing


The Clutter library uses objects called
timelines to do practically everything
that needs to be done while an
application is running. The timeline is
the heartbeat of your script, and makes
sure that everything at least makes a
good attempt at running together.
Timelines are used extensively for
controlling animations and effects
within Clutter, but you can also use
them as your own interrupts to call
routines every so often. It does this by
emitting signals for events such as
started, next-frame, completed and so
on. Each of these signals can be bound
to a callback function to control
something else.
Here is a short example you can type
into a Python shell:
>>> import clutter
>>> t=clutter.Timeline()
>>> t.set_duration(2000)
>>> t.set_loop(True)
>>> def ping(caller):
...     print caller
...
>>> t.connect('completed',ping)
9L
>>> t.start()
>>> <clutter.Timeline object at 0xb779639c (ClutterTimeline at 0x95b9860)>
Hopefully the methods of the timeline
object should be easy to follow. The
duration is set as a number of
milliseconds. The timeline is then set to
loop. Here we have created a simple
function called ping, which just prints
out the parameter it was called with.
Next, we connect the completed
signal to the ping function and
start the timeline running. Without any
further interaction, the ping function
will now be called every two seconds, as
the timeline completes, until you kill the
Python shell.

outtext=clutter.Text()
outtext.set_font_name('Sans 30')
outtext.set_color(red)
stage.add(outtext)
stage.show_all()
t=clutter.Timeline()
t.set_duration(5000)
t.set_loop(True)
t.connect('completed', updatespeed, intext, outtext)
t.start()
clutter.main()
Here we've brought together all the elements we have
explored in this tutorial. We have created a stage, populated it
with actors, and then used the timeline objects in Clutter to
make them update themselves at our whim. But so far we
have only scratched the surface of Clutter's graphical
capabilities. We haven't even learned about behaviours or
animations yet, never mind the alpha channel effects. Please
trust us that we will be including these in our next project.

The main
Clutter website
at www.clutter-project.org
doesn't have
much help for
Python users, but
there is lots of
background info
and plenty of C
documentation.

Python: Clutter
animations
The code master Nick Veitch gets his head in a spin with the help
of a news feed and some clever clutter animations.

Well, no news is good news, but at least we know what
internationally respected agency it's coming from.

In the last tutorial we had a look at the basics of Clutter as
we used it to build a network speed monitor. This time
we'll be looking at some of the very powerful animation
techniques used in Clutter, how to group objects, and a little
more about text actors. We will be doing this in the guise of
implementing a feed reader. There isn't enough space for us
to implement a complete multi-stream reader and explore
the animations, but we will be covering enough ground to get
you started on building such a beast, including fetching the
data from the feed and applying it to the Clutter objects.
For those of you who haven't been tempted by one of these
magnificent Python tutorials before, we usually try to do as
much as possible in the interactive mode of Python first. It is a
kinder, gentler environment than the normal mode in which
programs are run, as you can type things in and experiment.
The code listings in these cases include the Python prompt
>>> at the beginning of the line when you have something to
type in, and without it when the environment is giving you
some feedback, just as it appears on screen.

Getting fed
The very first thing we should look at is how to get the data
from our feed. We have come across the most excellent
Feedparser library for Python before, and we will be using it
again this time in order to retrieve the amazingly interesting
data for our app.
Before we do that though, we need to have a URL for a
feed. You can choose any you like. The best way to find out a
particular feed address is to go to the relevant page in a web
browser and look for the RSS syndication icon. More often
than not, this icon is linked to the feed address, so you can
just copy the link location with your browser or read it off the
status bar. You could use the TuxRadar feed, for example, at
www.tuxradar.com/rss. For our example we are going to use
the BBC news feed, for two reasons. The first is that it
supplies an image reference, which will be nice for our
experiments with Clutter textures, and the second is that it
gets updated with news stories pretty much constantly, which
makes it good for testing.
The BBC news feed is at
http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/world/rss.xml.


The Clutter website is worth checking every so often,


because new documentation and new and life-changing
versions of Clutter have a habit of appearing there.


>>> import feedparser
>>> f = feedparser.parse('http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/world/rss.xml')
>>> f
{'feed': {'lastbuilddate': u'Wed, 30 Dec 2010 19:11:25 GMT',
'subtitle': u'Get the latest BBC World News: international
news, features and analysis from Africa, Americas, South
Asia, Asia-Pacific, Europe and the Middle East.', 'language' ...
Actually, causing Python to display the variable we assigned
to the feed just spews out the whole content of the container
(we have truncated it in the output shown). To get to the bits
you are interested in, you need to use the keys. All of the
actual feed items are stored in a big list referenced by entries,
and contain the information for the item: summary,
timestamp, link URL and so on.
>>> f.entries[0].title
u'Nokia expands claim against Apple'
>>> f.entries[0].link
u'http://news.bbc.co.uk/1/hi/technology/8434132.stm'
>>> f.entries[0].updated_parsed
time.struct_time(tm_year=2009, tm_mon=12, tm_mday=29,
tm_hour=18, tm_min=0, tm_sec=46, tm_wday=2, tm_yday=364, tm_isdst=0)
>>> f.entries[0].updated
u'Wed, 30 Dec 2009 18:00:46 +0000'
>>> f.entries[0].summary
u'Nokia ramps up its legal fight against Apple, claiming that
almost all of its products infringe its patents.'
As you can see, we can get pretty much everything we need
to build a parser out of these elements. There is one optional
thing that we might want though: an image. The feed specs
allow for an image to be supplied as a channel ident for the
feed, which is passed along as a URL to the image and some
text describing it. This should be in the body of the feed, in an
element called image, so we can reference it with (in this
case), f.feed.image.href.
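To see where those fields live in the raw feed, it can help to pull apart a tiny RSS document by hand. The sketch below uses Python 3 and the standard library's xml.etree.ElementTree instead of Feedparser (both assumptions made so that it runs without extra installs or a network connection), and the feed contents are made-up placeholders.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 document; titles and URLs are placeholders.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <image>
      <url>http://example.com/logo.gif</url>
      <title>Example logo</title>
    </image>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
      <description>Summary of the first story.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
      <description>Summary of the second story.</description>
    </item>
  </channel>
</rss>
"""

channel = ET.fromstring(SAMPLE_RSS).find("channel")

# The image element is optional, so check before using it.
image = channel.find("image")
image_url = image.findtext("url") if image is not None else None

entries = [
    {
        "title": item.findtext("title"),
        "link": item.findtext("link"),
        "summary": item.findtext("description"),
    }
    for item in channel.findall("item")
]

print(image_url)
print(entries[0]["title"])
```

Feedparser does this digging for you and copes with the various feed formats, which is why the tutorial sticks with it.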
There are many ways we could use this image, but one of
the simplest is to just download it locally; then we can do
what we like with it. A great way to do this in Python is to use
the splendid urllib module. Among its ever-sharp
RSS and other feeds


There are plenty of different elements stuffed into an average
RSS feed. Apart from the feed image, we're only using ones
that are guaranteed to be present in any feed that you might
come across. If you want to know more about what things
might be there, you should take a look at the different
standards documents. Yes, to make things more confusing
there were different versions of RSS developed at different
times by largely different groups with very different ideas
about how things should go. Using the Feedparser module
smooths out a lot of the nonsense.
The Harvard website has a really useful tutorial on
constructing an RSS feed, which, cunningly for our purposes,
works as an equally good tutorial for ripping one apart too.
http://cyber.law.harvard.edu/rss/rss.html.

tools is the urlretrieve method, which will download the
content of a URL and save it to the system's temporary
storage (usually /tmp), before handing back a filename and a
bunch of HTTP info. The saved file should get cleaned up
when the temporary storage is hosed down, or we can delete
it when we quit if we want to be nice. Here it is:
>>> import urllib
>>> img, data = urllib.urlretrieve(f.feed.image.href)
>>> img
'/tmp/tmpTsCyDc.gif'
We'll do something useful with this file in just a moment.
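For readers on Python 3 (an assumption; the listing above is Python 2), urlretrieve lives in urllib.request. The sketch below fetches from a local file:// URL so that it runs without a network connection; the placeholder file and the variable names are ours, and a real feed image URL would be used in the same way.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

# Stand-in for a remote image: a small local file addressed by a file:// URL.
workdir = tempfile.mkdtemp()
source = Path(workdir) / "logo.gif"
source.write_bytes(b"GIF89a placeholder")

# Passing an explicit destination makes cleanup straightforward.
destination = os.path.join(workdir, "downloaded.gif")
saved_path, headers = urlretrieve(source.as_uri(), destination)

with open(saved_path, "rb") as handle:
    contents = handle.read()
print(saved_path == destination, len(contents))

# Being nice: remove the files when we are finished with them.
os.remove(saved_path)
os.remove(source)
os.rmdir(workdir)
```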

There are many


places you can
use as a source
of RSS feeds;
just look for the
little orange icon
with the three
white arcs in it.

Cluttering up
With that little excursion into the realm of feeds now out
of the way, we can get down to the serious business of
getting all cluttery. Now, in the last tutorial we introduced the
basic Clutter elements of the stage, actors and timelines. If
you haven't read it yet, it's a good idea to go back now and
take a look because we'll be building on the things we went
through there.
Anyhow, for now, here is the 'Previously on Lost' short
recap. A Clutter window is called a stage. This is where the
action happens, and by default in PyClutter is implemented in
a standard GTK window. The elements that appear on (in) the
stage are called actors, and can be anything from text to
simple shapes or pixmap textures.
The last one is what interests us this time. We will be using
text actors again, but Clutter allows us to import images
directly from files to use. Those of you with memories longer
than a Channel 4 ad break will no doubt remember that we
have such a file lurking around. Just to prove that it works, we
should set up a stage and all the usual Clutter, erm, clutter,
and then import our new pixmap.
>>> import clutter
>>> black=clutter.Color(0,0,0,255)
>>> white=clutter.Color(255,255,255,255)
>>> stage= clutter.Stage()
>>> stage.set_size(400,60)
>>> stage.set_color(white)
>>> stage.show_all()
>>> ident =clutter.texture_new_from_file(img)
>>> stage.add(ident)
This code just sets up a simple stage, sets the background
colour to white and then adds the pixmap image. You can see
from the code that unlike many graphic toolkits, we can add
the actor to the stage after the stage is already on display. You
should see something like, if not exactly the same as, the
image on the first page of this tutorial.
Foresight is a wonderful thing, which is why the pixmap fits
so nicely on the stage. As we didn't set a position for it, it just
comes in at (0,0), which is the top-left of the stage.

Getting animated

Quick
tip
Want to find a
complete list
of the Clutter
built-in animation
codes? Try the C
documentation,
which is more up
to date: http://clutter-project.org/docs/clutter/stable/clutter-ImplicitAnimations.html#ClutterAnimationMode.

Before we finish our masterful feed reader, now would be a
useful interlude during which to mess around with some
animation. Last time, we animated some text objects through
the use of the timeline, a powerful part of the Clutter magic
that gives us easy interrupts that we can use to animate all
sorts of items. However, since the 1.0 release of Clutter there
is an even more powerful way to animate objects. The Clutter
module now provides new methods for animating actors, and
the most powerful of these is one that we are going to
experiment with next.
A Clutter actor has an animate method. The arguments
it takes are an animation mode, followed by a duration
(in milliseconds) and then the properties and values to be
animated. Let's break that down a bit. The animation mode
can be defined, but Clutter has already built in several types
that can be referenced through the Clutter module. These act
as the tweening mechanisms for all the values you list.
Here is a selection:
CLUTTER_LINEAR
CLUTTER_EASE_IN_QUAD
CLUTTER_EASE_OUT_QUAD
CLUTTER_EASE_IN_OUT_QUAD
CLUTTER_EASE_IN_CUBIC
CLUTTER_EASE_OUT_EXPO
CLUTTER_EASE_IN_OUT_EXPO
CLUTTER_EASE_IN_OUT_CIRC
and there are many more. These models compute the
in-between phases of properties at any frame in the
animation. The linear mode is a linear progression, while the
others are variations producing different effects.
So, say we had an object at position 0,0 and we animated
it over 2,000 milliseconds to a position 100,0, animating only
the x position. With a linear animation mode it would be at
position 50,0 after one second. The animation takes the
Doc Holiday
Clutter is really great. The only thing
that sucks at the moment is the
documentation for the Python module.
Although the classes and methods are
largely identical to the C
implementation of Clutter (naturally),
there are a few subtle differences, and
sometimes the semantics of doing
something in Python can get you in a
muddle. Python's introspection tools
are useful for this, particularly the dir()
function, which you can call on any
object, even a module. Try it on Clutter
to see a list of the static types and
methods available: dir(clutter), or on a
class: dir(clutter.Text).

current values as the start point, and the supplied values


as the end point. Any property that is not supplied is not
animated. The supplied properties have to come in a pair with
the end value following, and any property can be animated.
Its easier with a few examples (I hope you still have your
stage open):
>>> ident.set_anchor_point(60,30)
>>> ident.set_position(60,30)
>>> ident.animate(clutter.LINEAR,2000,'rotation-angle-y',720)
<clutter.Animation object at 0xa3f861c (ClutterAnimation at 0xa4bb050)>
>>> ident.animate(clutter.LINEAR,2000,'rotation-angle-y',720)
<clutter.Animation object at 0xa3f85f4 (ClutterAnimation at 0xa4bb0c8)>
The second time we ran the rotation, nothing happened. That
is because the property value was the same: the value you
supply isn't the amount to animate by, it is the position the
object should end up in. The second time we ran the animation,
the object was already in that position, so it was animated;
it just didn't go anywhere. You can, of course, animate more
than one property at a time. Try this:
>>> ident.animate(clutter.LINEAR,2000,'x',100,'rotation-angle-y',360)
<clutter.Animation object at 0xa3f85f4 (ClutterAnimation at 0xa4bb2c8)>
>>> ident.animate(clutter.EASE_IN_SINE,2000,'x',0,'rotation-angle-y',0)
<clutter.Animation object at 0xa3f85f4 (ClutterAnimation at 0xa4bb3ae)>
This time the little logo did a dance and then returned to its
original position.
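The difference between an absolute end value and a relative change can be sketched without Clutter. FakeActor, animate_to and animate_by below are hypothetical names made up for this illustration; they only mimic the 'animate to this value' behaviour described above and are not part of the Clutter module.

```python
class FakeActor(object):
    """Hypothetical stand-in for an actor: it just stores property values."""

    def __init__(self):
        self.props = {'rotation-angle-y': 0.0}

    def animate_to(self, name, end_value):
        """Move a property to an absolute end value; return the distance covered."""
        distance = end_value - self.props[name]
        self.props[name] = end_value
        return distance


def animate_by(actor, name, delta):
    """Turn a relative change into the absolute target the actor expects."""
    return actor.animate_to(name, actor.props[name] + delta)


logo = FakeActor()
first = logo.animate_to('rotation-angle-y', 720)   # starts at 0, so it travels
second = logo.animate_to('rotation-angle-y', 720)  # already there, no movement
third = animate_by(logo, 'rotation-angle-y', 720)  # new absolute target: 1440
print(first, second, third)
```

With a real actor you would have to read the current value of the property before adding to it; how best to do that is left aside here.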

Just the facts


Well, spinning logos are all well and good, but what we really
want to see is the text for our headline and the summary of
the story sitting there in the space next to it. This will leave us
with three actors on the screen.
While Clutter can handle animating different actors
concurrently, it becomes a bit of a pain for us to have to
handle separate animations for the whole group of actors.
The key word there, if you hadn't picked up on it, is 'group'.
Clutter now supports containers, including a group container
(the others are rather like GTK stacking containers, and we
will no doubt use them some other time). First of all we'll
create a new group, then define the elements to go in it and
add them all. To save messiness we will import the ident
image again as a new object.
>>> group1= clutter.Group()
>>> ident1=clutter.texture_new_from_file(img)
>>> head1=clutter.Text()
>>> head1.set_position(130, 5)
>>> head1.set_color(blue)
>>> head1.set_text(f.entries[0].title)
>>> body1=clutter.Text()
>>> body1.set_max_length(75)
>>> body1.set_position(130, 22)
>>> body1.set_size(250, 100)
>>> body1.set_line_wrap(True)
>>> body1.set_text(f.entries[0].summary)
>>> group1.add(ident1, head1, body1)
>>> group1.show_all()
>>> stage.add(group1)

Python

In this code we have created a new ident image and two text
objects: one to represent the title of the news item (head1),
and one for the summary (body1). We have set the text for
them from the first entry we found in the feed and set them
up in a good position next to the ident image. Here we
seem to have set absolute positions for the head1 and
body1 text objects, but these positions actually remain
relative to the group. When we add actors to the group, it becomes
their parent object, rather than the stage as
before. So the positions of the objects are relative to the
group position (which starts off by default at 0,0).
You should try to remember this. For example:
>>> head1.get_position()
(130,5)
>>> group1.set_position(10,10)
>>> head1.get_position()
(130,5)
>>> group1.set_position(0,0)
You saw all the objects move along the screen when we
changed the group position, but don't fall into the trap of
thinking that the child objects think they have moved. They
haven't: they are still in the same place, but their world has
moved. The great thing about this is, of course, that we can
also animate a group. We need only supply one
transformation, because all the properties of the individual
actors are relative to the group.
>>> group1.animate(clutter.LINEAR,2000,'x',200,'y',30)
<clutter.Animation object at 0xa3f85f4 (ClutterAnimation at 0xa4bb2c8)>
>>> group1.animate(clutter.LINEAR,2000,'x',0,'y',0)
<clutter.Animation object at 0xa3f85f4 (ClutterAnimation at 0xa4bb4a8)>
This time all the elements moved smoothly as though they
were one, which (in terms of the group) they are. We can still
adjust them individually (change the text or move them), but
any transformations are relative to the group, not the stage.
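The relationship between a child's group-relative position and where it ends up on the stage can be sketched as simple addition. The function stage_position is our own name for illustration, and the sketch assumes the group has not been scaled or rotated, in which case the arithmetic would be more involved.

```python
def stage_position(group_pos, child_pos):
    """Where a child appears on the stage, given its group-relative position."""
    gx, gy = group_pos
    cx, cy = child_pos
    return (gx + cx, gy + cy)


# head1 always reports (130, 5), but its place on the stage follows the group.
at_origin = stage_position((0, 0), (130, 5))
after_move = stage_position((10, 10), (130, 5))
print(at_origin, after_move)
```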
So, now we have a group, we can add another, then
implement two animation modes to make the transition
between the different items from the news feed:
>>> group2= clutter.Group()
>>> ident2=clutter.texture_new_from_file(img)
>>> head2=clutter.Text()
>>> head2.set_position(130, 5)
>>> head2.set_color(blue)
>>> head2.set_text(f.entries[1].title)
>>> body2=clutter.Text()
>>> body2.set_max_length(75)
>>> body2.set_position(130, 22)
>>> body2.set_size(250, 100)
>>> body2.set_line_wrap(True)
>>> body2.set_text(f.entries[1].summary)
>>> group2.add(ident2, head2, body2)
>>> group2.hide()
>>> stage.add(group2)
>>> group1.animate(clutter.EASE_OUT_EXPO,4000,'x',800,'y',0,'rotation-angle-y',180)
<clutter.Animation object at 0x92ca61c (ClutterAnimation at 0x935a2a0)>
>>> group2.animate(clutter.EASE_OUT_EXPO,1,'x',400,'y',100,'rotation-angle-y',720)
<clutter.Animation object at 0x92ca66c (ClutterAnimation at 0x935a118)>
>>> group2.show()
>>> group2.animate(clutter.EASE_OUT_EXPO,3000,'x',0,'y',0,'rotation-angle-y',0)
<clutter.Animation object at 0x92ca16c (ClutterAnimation at 0x8f46028)>
Here we have set up a new group and hidden it before
adding it to the stage. Then, for convenience's sake, we
animate it out of the way and reveal it. As it is out of the
stage area, it can't be seen and doesn't interfere with the
first group. When we reveal it, it is still offstage, but the new
animation then puts it back in the proper position, and it
seems to glide through the ether to come to a perfect stop.
Hurrah! We are now expert animators to rival the likes of
Disney. Maybe.

That reveal in all its animated glory. Well, obviously you can't see it here,
but you can if you run the listing or type it in.

Moving on
It's looking good so far, but our feed reader could still do
with some extensions. There is very little in the way of error
checking, for instance (what if the feed is empty or the web
connection is lost?), and it only handles a single feed rather
than merging together several feeds.
Next time we'll be looking at more advanced methods
of animation, and stringing animations together. We
will also extend the basic building blocks of Clutter by
showing how to incorporate elements of the Cairo
graphics library.
Python: Stream video

Combine the video power of GStreamer with the graphical cunning
of Clutter and you get what Nick Veitch categorises as 'neat stuff'.

The primary purpose of Clutter is to make it easy to
create neat graphical interfaces. It adopts a sort of
fire-and-forget attitude to a lot of things, particularly
animation: once you've set up a sequence, you can just start
it and leave it to do its thing.
However, Clutter can't do this on its own. The main
module has neat animation stuff, but it's sorely lacking in
other areas. It only has a few primitive actor objects for a
start, which is why there's an extension that makes use of
Cairo, the 2D graphics library.
Another library that Clutter leans on is GStreamer, which
can do wonders with streaming media data. Indeed, when it
comes to multimedia, GStreamer must be one of the most-used
libraries, so it's no surprise that there's a Python module
for it. We will need to understand a small amount of the
GStreamer framework to use it in our application, but of
course we won't have space to cover all of it. Fortunately, it
isn't complicated. We just need something straightforward:
we have a URI for a video file or source of some description,
and we want to be able to play it.
First of all, we have to set up the usual Clutter stage:
>>> import gst
>>> import clutter
>>> import cluttergst
>>> stage=clutter.Stage()
>>> stage.set_title('Clutter_Streamer')
>>> stage.set_size(320,290)
>>> stage.set_color(clutter.Color(255,255,255,255) )
>>> stage.show()
>>>
If you've been following along with us for previous
episodes of this chapter, this should all be familiar. For
newbies, having imported the all-important modules (note
that we need gst as well as cluttergst), we create a Stage
object (a window, in Clutter-speak), give it a title, a size and
a background colour, and then display it. You should see a
white window.

All the world's a stage

Now things start to get interesting. Objects displayed on the
Clutter stage are called actors. There are several types
of actor. Some basic ones, such as the rectangle and text
objects, live inside the main Clutter module, but there are
other, special actors in some of the extra support libraries.
The VideoTexture object lives in the cluttergst module,
and is similar to the rectangle actor. It has many of the same
properties that we have investigated before, but it is special
because it links to a video player that it uses to update its own
texture. This is one of the powerful, useful things about
Clutter: once you have set up your actor, and maybe even
given it an animation to follow, it just carries on and does what
it was told without you having to nip back and check up on it.

You now have streaming video in a window, and you
haven't even written any sort of application yet.

The video player is known as a playbin, and is a sort of
encapsulated player. You give it a resource you want to play,
and it connects everything up. Let's do that now:
>>> vid=cluttergst.VideoTexture()
>>> playbin=vid.get_playbin()
>>> playbin.set_property('uri','mmsh://live.camstreams.com/cscamglobal16?MSWMExt=.asf')
>>>
First we set up our video texture actor, then we extract the
playbin object that was created. Before we can play anything,
we have to set the source. The best way to do this is by calling
the set_property() method for the playbin object, including the
property we want to set (the URI) and what we want to set
it to. In this example, we've used a web-based video source.
It's actually a traffic cam in Piccadilly Circus, London, which
streams live video (as opposed to ones that upload a new
image every five seconds). To begin with, you may wish to
eschew the dizzying delights of two dozen Transit vans trying
to negotiate the lights at the same time and use a local file. It
means there's one less thing to go wrong.
The URI can point to a file or anything else that qualifies as
a valid resource. GStreamer will figure out what it is and what
to do with it, assuming it has the correct plugins. If it doesn't,
you'll get an error message about not being able to play the
stream. Stick to something simple that you know will play,
such as a video file you've played successfully in Totem.
For files (or anything else), you need to prefix the URI with
the appropriate protocol. So, for example, a valid file URI
might be file:///home/evilnick/Videos/killmike.ogg.
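If you would rather not type file URIs by hand, a small helper can build one from an absolute path. This is only a sketch: file_uri is a name made up here, the second path is an invented example, and the import is the Python 3 location of quote() (in Python 2, which the interactive sessions in this tutorial use, the same function lives in the urllib module).

```python
from urllib.parse import quote


def file_uri(path):
    """Build a file:// URI from an absolute POSIX-style path."""
    if not path.startswith('/'):
        raise ValueError('expected an absolute path')
    # quote() escapes spaces and other unsafe characters but keeps '/'.
    return 'file://' + quote(path)


plain = file_uri('/home/evilnick/Videos/killmike.ogg')
spaced = file_uri('/tmp/my clip.ogg')
print(plain)
print(spaced)
```

The result is a string you could hand to the playbin as its uri property in place of the web address.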
Next we need to create another GStreamer object that we
can use to control the playbin object we just made:
>>> pipe=gst.Pipeline('pipe')
>>> pipe.add(playbin)
>>> stage.add(vid)
Having created the pipe object, we can connect it to the
playbin using the pipe's own add() method. The pipe is just
being used as a container for our player. Finally, we can add
the video texture object to the stage; that's the bit that's
updated when the stream is playing.

My stream won't play
If your GStreamer pipe quits with an error or, more
unusually, just doesn't do anything, then you may have a
codec problem. Your distro probably doesn't have all the
codecs installed, but you can get them by
installing a package called gstreamer_plugins_good, or
something similar. The other ones are the 'bad' and the
'ugly' (do you see the joke there?). Depending on your
location, you may be able to legally download and use these
extra plugins to see more types of stream.

You now have a white window with a video texture
displayed inside it, although you won't be able to see it
because you haven't turned it on yet.
>>> pipe.set_state(gst.STATE_PLAYING)
<enum GST_STATE_CHANGE_ASYNC of type
GstStateChangeReturn>
The pipe container is used to control the stream. Using
the predefined gst constant, gst.STATE_PLAYING, we turn
the player on. If you're playing from a file, you should see the
image instantly. If you see only part of it, you need to make
the stage bigger. We're still in interactive mode, so we can
adjust that using the stage.set_size() method.
>>> stage.set_size(640,480)
We can also change the state of the player to pause it:
>>> pipe.set_state(gst.STATE_PAUSED)
<enum GST_STATE_CHANGE_SUCCESS of type
GstStateChangeReturn>
>>> pipe.set_state(gst.STATE_PLAYING)
<enum GST_STATE_CHANGE_ASYNC of type
GstStateChangeReturn>
Note that the video stream does, in fact, pause. You might
have imagined that is what would happen with a file, but it
also happens on a live webcam feed.

Awww. There are some quite interesting cams if you search for them.
Or just use your own files.

Getting freaky
All we've done is create a Clutter version of a stream viewer
like Kaffeine, but with fewer options. However, our video
texture is a Clutter actor, so we can make it do things.
>>> vid.set_size(100,90)
>>> vid.set_size(100,290)
>>> vid.set_size(320,50)
The stream carries on playing even when we change the size, or move the actor around:
>>> vid.move_by(10,10)
>>> vid.move_by(10,10)
You can call pretty much all the standard actor methods on
the video. Previously we took a detailed look at the animation
methods, and these will work on a video texture too. As a brief
recap, the animate() method takes a value for the animation effect (a
Clutter constant that enumerates various styles), the duration
in milliseconds and a list of property/value pairs. The actor is
then transformed to these absolute values. For example:
>>> vid.animate(clutter.LINEAR, 1000, 'width', 640, 'height', 480, 'x', 0, 'y', 0)
<clutter.Animation object at 0x96f3884 (ClutterAnimation at 0xb4799a00)>
>>> vid.animate(clutter.LINEAR, 1000, 'width', 320, 'height', 240, 'x', 160, 'y', 120)
<clutter.Animation object at 0x96f4284 (ClutterAnimation at 0xb4799c00)>
Quick tip
Confusingly, the VideoTexture actor has a get_uri()
method that returns nothing. That's because it is
just a texture; the playbin object is the one that has
the URI data in it, and if you ever forget what feed
it's linked to, you can use playbin.get_property('uri').

If you want more of an idea what can be done with the
animation transforms, check out the previous tutorial. We can
now have some picture-in-picture fun with our little camera
experiment, but there's one more new element we need to
understand. First, we need to construct another video object,
just like the first, but with a different stream:
>>> vid2=cluttergst.VideoTexture()
>>> playbin2=vid2.get_playbin()
>>> playbin2.set_property('uri','mmsh://live.camstreams.com/cscamglobal5?MSWMExt=.asf')
>>> pipe2=gst.Pipeline('pipe2')
>>> pipe2.add(playbin2)
>>> stage.add(vid2)
>>> pipe2.set_state(gst.STATE_PLAYING)
>>> vid.set_position(0,0)
>>> vid.set_size(100,80)
>>> vid.set_depth(2)
You'll have come across most of this before, but the last
line is new. The concept of depth is part of the '2.5 dimensions'
of Clutter. Objects can be closer to or further away from the stage.
In a strange twist of logic, at least to my mind, depth is
positive out from the stage, and negative into it. So, if you
want to layer objects, as we have done here, you will want
the top object's depth to be greater than the lower object's.
This changes the way the actor is rendered, according to
the perspective settings, but for small values of depth, you
probably won't even notice. Do play around, though.
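As a toy model of the layering rule just described, the sketch below sorts actors by depth so that the last one in the list is the one nearest the viewer. The function paint_order and the dictionary of depths are made up for illustration; this is not how Clutter itself decides what to paint, only a way to picture 'greater depth ends up on top'.

```python
def paint_order(depths):
    """Return actor names from furthest back to nearest the viewer."""
    return sorted(depths, key=lambda name: depths[name])


# vid was given a depth of 2 above; vid2 is assumed to sit at the default of 0.
depths = {'vid': 2, 'vid2': 0}
order = paint_order(depths)
print(order)
```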
We now have everything we need to make a multichannel
streaming video browser. For this next bit of code, we'll build
an actual Python application, so fire up your friendliest text
editor and type in the following:
import clutter
import gst
import cluttergst

class videobrowser:
    def __init__(self):
        self.channel1 = 'mmsh://live.camstreams.com/cscamglobal16?MSWMExt=.asf'
        self.channel2 = 'mmsh://live.camstreams.com/cscamglobal5?MSWMExt=.asf'
        # initialize stage
        self.stage = clutter.Stage()
        self.stage.set_color(clutter.Color(255, 255, 255, 255))
        self.stage.set_size(640, 480)
        self.stage.set_title('LXF Traffic Watch - press t to toggle view')
        self.stage.connect('key-press-event', self.parseKeyPress)
        self.stage.connect('destroy', clutter.main_quit)
        # set up 2 video textures
        self.video1 = cluttergst.VideoTexture()
        self.playbin1 = self.video1.get_playbin()
        self.playbin1.set_property('uri', self.channel1)
        self.pipeline1 = gst.Pipeline('pipe1')
        self.pipeline1.add(self.playbin1)
        self.video1.set_position(0, 0)
        self.video1.set_size(640, 480)
        self.video1.set_depth(-2)
        self.stage.add(self.video1)
        self.pipeline1.set_state(gst.STATE_PLAYING)
        # second one
        self.video2 = cluttergst.VideoTexture()
        self.playbin2 = self.video2.get_playbin()
        self.playbin2.set_property('uri', self.channel2)
        self.pipeline2 = gst.Pipeline('pipe2')
        self.pipeline2.add(self.playbin2)
        self.video2.set_position(0, 0)
        self.video2.animate(clutter.LINEAR, 1000,
                            'rotation_angle_y', 0, 'rotation_angle_z', 0,
                            'width', 80, 'height', 60)
        self.stage.add(self.video2)
        self.video2.set_depth(0)
        self.pipeline2.set_state(gst.STATE_PLAYING)
        # display the stage and run
        self.stage.show_all()
        clutter.main()

    def parseKeyPress(self, stage, event):
        print 'parsekey got', self, event
        # do stuff when the user presses a key
        if event.keyval == clutter.keysyms.q:
            # if the user pressed q, quit the app
            clutter.main_quit()
        elif event.keyval == clutter.keysyms.t:
            # if the user pressed t, toggle video
            # which is in front?
            if self.video1.get_depth() == -2:
                # video2 is on top
                self.video2.animate(clutter.LINEAR, 300,
                                    'rotation_angle_y', 360, 'rotation_angle_z', 360,
                                    'width', 640, 'height', 480)
                self.video1.animate(clutter.LINEAR, 1000,
                                    'rotation_angle_y', 0, 'rotation_angle_z', 0,
                                    'width', 80, 'height', 60)
                self.video2.set_depth(-2)
                self.video1.set_depth(0)
            else:
                # video 1 is on top
                self.video1.animate(clutter.LINEAR, 300,
                                    'rotation_angle_y', 360, 'rotation_angle_z', 360,
                                    'width', 640, 'height', 480)
                self.video2.animate(clutter.LINEAR, 1000,
                                    'rotation_angle_y', 0, 'rotation_angle_z', 0,
                                    'width', 80, 'height', 60)
                self.video1.set_depth(-2)
                self.video2.set_depth(0)

if __name__ == '__main__':
    videobrowser()

One minute you are gazing out over Piccadilly Circus, the next you're in
Trafalgar Square or Tokyo or anywhere you can find a feed.

A word in your shell-like
The GStreamer objects will automatically play sound on any
video resources, and if you have two sources playing at the
same time, you'll just get noise. You can use gst functions to
change the sound output (and pipe it to different places if you
like), but for just adjusting the volume, the VideoTexture actor
from Clutter has a suitable method: set_audio_volume().

The init code sets up two video streams: one fills the
window, and the other sits above it as a smaller actor in the corner. After
a few seconds the feeds should spring to life and you'll see
double-decker buses and whatnot. The cunning part of the
code is the callback signal that traps keypresses. While the
window on screen has focus, any keypress will cause an
event. The stage.connect() method connects this signal to
the parseKeyPress() method we defined in the main application
class. If the toggle key has been pressed, the two video feeds
are animated into the opposite position. We have to
remember to set the depths again, so the new, small image
goes on top (the depth setting is how we determine which
actor is on top when the button is pressed).
Feel free to experiment, add screens and play with
animation calls. Clutter does the animation for you, so there's
no need to worry about doing anything once it's stopped. The
Q key is also trapped to give a clean exit from the app.
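The two branches of the toggle do the same job with the roles swapped, so the decision itself can be sketched on its own, away from Clutter. In this illustration FULL, THUMB and toggle are names of our own; the sizes and depths are the ones used in the listing, and the returned dictionary is simply the set of target values you might then pass on to each actor.

```python
# Target values taken from the listing: the big picture sits behind at
# depth -2, and the thumbnail sits in front at depth 0.
FULL = {'width': 640, 'height': 480, 'depth': -2}
THUMB = {'width': 80, 'height': 60, 'depth': 0}


def toggle(front):
    """Swap which of 'video1' and 'video2' is the thumbnail in front.

    Returns the new front video and the target values for both videos.
    """
    new_front = 'video2' if front == 'video1' else 'video1'
    new_back = front
    targets = {new_front: dict(THUMB), new_back: dict(FULL)}
    return new_front, targets


# The application starts with video2 as the small picture on top.
front, targets = toggle('video2')
print(front, targets)
```

Keeping the decision separate like this would also make it easier to extend the idea to more than two streams.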

The GStreamer site can help you if you want to do clever things with
your pipes and feeds.

Going further
It's possible to add further streams to this and cascade them
down one side. Instead of toggling between two positions, the
streams could all rotate around one spot. You might also want
to play more with the animations: the values given in the
animate method are absolute, rather than relative, which is
why the angular rotations are multiples of 360 degrees, so
the video texture ends up the right way round.
In the last two parts of this series, we looked at the
clutter.Text() actors. It would be pretty simple to add a text
actor here at the front to let you know which stream you're
looking at, and you can change the value of the text when the
screens are being switched without worrying about animating
that too. For streams with audio, I guess you would want to
mute the stream that is minimised. Check the boxout (A word
in your shell-like) for details about doing that.
For more tricks with GStreamer objects, you should also
check out the GStreamer documentation, which is a lot more
effusive than the PyClutter docs at the moment.

Flash! Ahh-ahh!
There was a time when any sort of live
video feed was just that: a nice
streaming socket you could clamp on to
and suck the goodness out of. These
days it seems that everyone would
rather you use their silly, customised
and Flash-based embedded players
instead. I'm not sure whether they do
this to get better metrics on users or to
protect content, but apart from
making it difficult to have many open at
a time in Firefox, it makes it hard to find
suitable live streams. Traffic cams are a
good bet, or if you wanted to be
naughty, well, some of the embedded
Flash players kindly show the URL of
the raw feed in the page source. But I
didn't tell you that. Shh!

Python: Code a Gimp plugin
Jonni Bidwell uses Python to add some extra features to the favourite open
source image-manipulation app, without even a word about Gimp masks.

Multitude of innuendoes aside, Gimp enables you to
extend its functionality by writing your own plugins.
If you wanted to be hardcore, then you would write
the plugins in C using the libgimp libraries, but that can be
pretty off-putting or rage-inducing. Mercifully, there exist
softcore APIs to libgimp so you can instead code the plugin
of your dreams in Gimp's own Script-Fu language (based on
Scheme), Tcl, Perl or Python. This tutorial will deal with the
last in this list, which is probably the most accessible of all these
languages, so even if you have no prior coding experience you
should still get something out of it.

Get started
On Linux, most packages will ensure that all the required
Python gubbins get installed alongside Gimp; your Windows
and Mac friends will have these included as standard since
version 2.8. You can check everything is ready by starting up
Gimp and checking for the Python-Fu entry in the Filters
menu. If it's not there, you'll need to check your installation. If
it is there, then go ahead and click on it. If all goes to plan this
should open up an expectant-looking console window, with a
prompt (>>>) hungry for your input. Everything that Gimp
can do is registered in something called the Procedure
Database (PDB). The Browse button in the console window
will let you see all of these procedures, what they operate on
and what they spit out. We can access every single one of
them in Python through the pdb object.
As a gentle introduction, let's see how to make a simple
blank image with the currently selected background colour.
This involves setting up an image, adding a layer to it, and
then displaying the image.
image = pdb.gimp_image_new(320,200,RGB)
layer = pdb.gimp_layer_new(image,320,200,RGB_IMAGE,'Layer0',100,NORMAL_MODE)
pdb.gimp_image_insert_layer(image,layer,None,0)
pdb.gimp_display_new(image)
So we have used four procedures: gimp_image_new(), which
requires parameters specifying width, height and image type
(RGB, GRAY or INDEXED); gimp_layer_new(), which works
on a previously defined image and requires width, height and
type data, as well as a name, opacity and combine mode;
gimp_image_insert_layer() to actually add the layer to the
image; and gimp_display_new(), which will display an image.
You need to add layers to your image before you can do
anything of note, since an image without layers is a pretty
ineffable object. You can look up more information about
these procedures in the Procedure Browser: try typing
gimp-layer-new into the search box, and you will see all the
different combine modes available. Note that in Python, the
hyphens in procedure names are replaced by underscores,
since hyphens are reserved for subtraction. The search box
will still understand you if you use underscores there, though.
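The naming rule is mechanical enough to write down. The two helpers below are our own illustrative names, not part of Gimp, and simply convert between the spelling shown in the Procedure Browser and the one you type after pdb. in the console.

```python
def python_name(procedure):
    """Procedure Browser spelling -> the name used on the pdb object."""
    return procedure.replace('-', '_')


def browser_name(name):
    """pdb spelling -> the hyphenated spelling shown in the browser."""
    return name.replace('_', '-')


as_python = python_name('gimp-layer-new')
as_browser = browser_name('gimp_image_insert_layer')
print(as_python, as_browser)
```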

Draw the line

You can customise lines and splodges to your heart's content, though frankly
doing this is unlikely to produce anything particularly useful.

All well and good, but how do we actually draw something?
Let's start with a simple line. First select a brush and a
foreground colour that will look nice on your background.
Then throw the following at the console:
pdb.gimp_pencil(layer,4,[80,100,240,100])
Great, a nicely centred line, just like you could draw with
the pencil tool. The first parameter that gimp_pencil() takes is just
the layer you want to draw on. The syntax specifying the
points is a little strange: first we specify the number of
coordinates, which is twice the number of points because
each point has an x and a y component; then we provide
a list of the form [x1, y1, ..., xn, yn]. Hence our example
draws a line from (80,100) to (240,100). The procedures
for selecting and adjusting colours, brushes and so forth
are in the PDB too:
pdb.gimp_context_set_brush('Cell 01')
pdb.gimp_context_set_foreground('#00a000')
pdb.gimp_context_set_brush_size(128)
pdb.gimp_paintbrush_default(layer,2,[160,100])
If you have the brush called 'Cell 01' available, then the
above code will draw a green splodge in the middle of your
canvas. If you don't, then you'll get an error message. You can
get a list of all the brushes available to you by calling
pdb.gimp_brushes_get_list(). The paintbrush tool is more
suited to these fancy brushes than the hard-edged pencil,
and if you look in the procedure browser at the function
gimp_paintbrush, you will see that you can configure
gradients and fades too. For simplicity, we have just used the
defaults/current settings here.
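The drawing procedures used above all want a coordinate count followed by a flat list, which is awkward to write by hand for more than a couple of points. As a small sketch, the helper below (flatten is our own name, not a Gimp function) turns a list of (x, y) pairs into that form.

```python
def flatten(points):
    """Turn [(x1, y1), ..., (xn, yn)] into (count, [x1, y1, ..., xn, yn])."""
    coords = [value for point in points for value in point]
    return len(coords), coords


# The same line as in the pencil example: from (80,100) to (240,100).
count, coords = flatten([(80, 100), (240, 100)])
print(count, coords)
```

Inside the Gimp console the two results could then be passed along as pdb.gimp_pencil(layer, count, coords).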
For the rest of this tutorial we will describe a slightly more
advanced plugin for creating bokeh effects in your own
pictures. 'Bokeh' derives from a Japanese word meaning
blur or haze, and in photography refers to the out-of-focus
effects caused by light sources outside of the depth of field. It
often results in uniformly coloured, blurred, disc-shaped
artefacts in the highlights of the image, which are reminiscent
of lens flare (think Star Trek: Into Darkness). The effect you
get in each case is a characteristic of the lens and the
aperture: depending on design, one may also see polygonal
and doughnut-shaped bokeh effects. For this exercise, we'll
stick with just circular ones.
Our plugin will have the user pinpoint light sources using
a path on their image, which we will assume to be single-layered.
They will specify disc diameter, blur radius, and hue
and saturation adjustments. The result will be two new layers:
a transparent top layer containing the bokeh discs, and a
layer with a blurred copy of the original image. The original
layer remains untouched beneath these two. By adjusting the
opacities of these two new layers, a more pleasing result may
be achieved. For more realistic bokeh effects, a part of the
image should remain in focus and be free of discs, so it may
be fruitful to erase parts of the blurred layer. Provided the
user doesn't rename layers, further applications of our
plugin will not burden them with further layers. This means
that one can apply the function many times with different
parameters and still have all the flare-effect discs on the same
layer. It is recommended to turn the blur parameter to zero
after the first iteration, since otherwise the user would just be
blurring the already blurred layer.

Applying our bokeh plugin has created a pleasing bokeh effect in
the highlights.

Our plugin creates a layer with bokeh discs and another with
a blurred copy of the image

Quick tip
For many, many more home-brewed plugins, check out
the Gimp Plugin Registry at http://registry.gimp.org
After initialising a few de rigueur variables, we set about
making our two new layers. For our blur layer, we copy our
original image and add a transparency channel. The bokeh
layer is created much as in the previous example.
blur_layer = pdb.gimp_layer_copy(timg.layers[0],1)
pdb.gimp_image_insert_layer(timg, blur_layer, None, 0)
bokeh_layer = pdb.gimp_layer_new(timg, width, height,
RGBA_IMAGE, "bokeh", 100, NORMAL_MODE)
pdb.gimp_image_insert_layer(timg, bokeh_layer, None, 0)
Our script's next task of note is to extract a list of points
from the user's chosen path. This is slightly non-trivial since a
general path could be quite a complicated object, with curves
and changes of direction and all sorts. Details are in the box
below, but don't worry: all you need to understand is that
the main for loop will proceed along the path in the order
drawn, extracting the coordinates of each component point
as two variables x and y.
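The point extraction can be tried outside Gimp with a made-up stroke. The sketch below assumes exactly the layout described in the boxout on paths: a tuple whose second entry is the length of the coordinate list and whose third entry is the list itself, with every point stored twice. The name path_points, the sample data and the placeholder first and last entries of the tuple are all invented for illustration, so check the real layout in the Procedure Browser before relying on the step value.

```python
def path_points(gpoints, step=4):
    """Collect (x, y) pairs from a stroke tuple shaped as the boxout describes."""
    count, coords = gpoints[1], gpoints[2]
    # Jump ahead by `step` entries each time to skip the repeated copies.
    return [(coords[j], coords[j + 1]) for j in range(0, count, step)]


# A made-up stroke with two points, each stored twice.
sample = (0, 8, [10.0, 20.0, 10.0, 20.0, 30.0, 40.0, 30.0, 40.0], False)
points = path_points(sample)
print(points)
```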


Having extracted the point information, our next challenge
is to get the local colour of the image there. The PDB function
for doing just that is called gimp_image_pick_color(). It has
a number of options, mirroring the dialog for the Colour
Picker tool. Our particular call has the program sample within
a 10-pixel radius of the point x,y and select the average colour.
This is preferable to just selecting the colour at that single
pixel, since it may not be indicative of its surroundings.
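To see why an averaged sample is more representative than a single pixel, here is a rough pure-Python model of sampling within a radius. It is not what gimp_image_pick_color() does internally; average_colour and the tiny test image are inventions for this sketch, and the image is just a list of rows of (r, g, b) tuples.

```python
def average_colour(pixels, x, y, radius):
    """Average the RGB values of all pixels within `radius` of (x, y).

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    """
    total = [0, 0, 0]
    n = 0
    for py, row in enumerate(pixels):
        for px, (r, g, b) in enumerate(row):
            # Keep only the pixels inside the sampling circle.
            if (px - x) ** 2 + (py - y) ** 2 <= radius ** 2:
                total[0] += r
                total[1] += g
                total[2] += b
                n += 1
    if n == 0:
        raise ValueError('no pixels inside the sampling circle')
    return tuple(channel // n for channel in total)


image = [
    [(0, 0, 0), (255, 255, 255), (0, 0, 0)],
    [(0, 0, 0), (100, 50, 0), (0, 0, 0)],
    [(0, 0, 0), (200, 100, 50), (0, 0, 0)],
]
single = average_colour(image, 1, 1, 0)
averaged = average_colour(image, 1, 1, 1)
print(single, averaged)
```

With a radius of zero only the centre pixel counts; widening the radius pulls in the neighbours and shifts the result towards the surrounding colours.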

Bring a bucket
To draw our appropriately-coloured disc on the bokeh layer,
we start somewhat counter-intuitively by drawing a black
disc. Rather than use the paintbrush tool, which would rely on
all possible users having consistent brush sets, we will make
our circle by bucket filling a circular selection. The selection is
achieved like so:
pdb.gimp_image_select_ellipse(timg, CHANNEL_OP_REPLACE, x - radius, y - radius, diameter, diameter)
There are a few constants that refer to various Gimp-specific
modes and other arcana. They are easily identified by their
shouty case. Here the second argument stands for the

Here are our


discs. If youre
feeling crazy you
could add blends
or gradients, but
uniform colour
works just fine.

Paths, vectors, strokes, points, images and drawables


Paths are stored in an object called vectors.
More specifically, the object contains a series of
strokes, each describing a section of the path.
Well assume a simple path without any curves,
so there is only a single stroke from which to
wrest our coveted points. In the code we refer to
this stroke as gpoints, which is really a tuple
that has a list of points as its third entry. Since
Python lists start at 0, the list of points is
accessed as gpoints[2]. This list takes the form
[x0,y0,x0,y0,x1,y1,x1,y1,...]. Each point is counted

138

twice, because in other settings the list needs to


hold curvature information. To avoid repetition,
we use the range() functions step parameter to
increment by 4 on each iteration, so that we get
the xs in positions 0, 4, 8 and the ys in positions
1, 5, 9. The length of the list of points is
bequeathed to us in the second entry of gpoints
for j in range(0,gpoints[1],4):
You will see a number of references to
variables timg and tdraw. These represent the
active image and layer (more correctly image
and drawable) at the time our function was
called. As you can imagine, they are quite handy
things to have around because so many tools
require at least an image and a layer to work on.
So handy, in fact, that when we come to register
our script in Gimp, we don't need to mention
them: it is assumed that you want to pass
them to your function. Layers and channels
make up the class called drawables. The
abstraction is warranted here since there is
much that can be applied equally well to both.

number 2, but it also signals that the current selection
should be replaced by the specified elliptical one.
The dimensions are specified by giving the top-left corner
of the box that encloses the ellipse, followed by that box's
width and height. We feather this selection by two pixels,
just to take the edge off, and then set the foreground colour
to black. Then we bucket fill this new selection in Behind
mode so as not to interfere with any other discs on the layer:
pdb.gimp_selection_feather(timg, 2)
pdb.gimp_context_set_foreground('#000000')
pdb.gimp_edit_bucket_fill_full(bokeh_layer, 0, BEHIND_MODE, 100, 0, False, True, 0, 0, 0)
And now the reason for using black: we are going to draw
the discs in additive colour mode. This means that regions of
overlapping discs will get brighter, in a manner which vaguely
resembles what goes on in photography. The trouble is,
additive colour doesn't really do anything on transparency,
so we fill the disc with black first, and then all the black is
undone by our new additive disc.
pdb.gimp_context_set_foreground(color)
pdb.gimp_edit_bucket_fill_full(bokeh_layer, 0, ADDITION_MODE, 100, 0, False, True, 0, 0, 0)
Once we've drawn all our discs in this way, we do a
Gaussian blur (if requested) on our copied layer. We said
that part of the image should stay in focus; you may want to
work on this layer later so that it is less opaque at regions of
interest. We deselect everything before we do the blur, since
otherwise we would just blur our most recently drawn disc.
if blur > 0:
    pdb.plug_in_gauss_iir2(timg, blur_layer, blur, blur)

Softly, softly
Finally we apply our lightness and saturation adjustments, and
set the bokeh layer to Soft-Light mode, so that lower layers are
illuminated beneath the discs. And just in case any black
survived the bucket fill, we use the Color-To-Alpha plugin to
squash it out.
pdb.gimp_hue_saturation(bokeh_layer, 0, 0, lightness, saturation)
pdb.gimp_layer_set_mode(bokeh_layer, SOFTLIGHT_MODE)
pdb.plug_in_colortoalpha(timg, bokeh_layer, '#000000')
And that just about summarises the guts of our script. You
will see from the code on the disc that there is a little bit of
housekeeping to take care of, namely grouping the whole
series of operations into a single undoable one, and restoring
any tool settings that are changed by the script. It is always
good to tidy up after yourself and leave things as you found
them. In the register() function, we set its menupath to
<Image>/Filters/My Filters/PyBokeh... so that if it registers
correctly you will have a My Filters menu in the Filters menu.
You could add any further scripts you come up with to this
menu to save yourself from cluttering up the already crowded
Filters menu. The example images show the results of a
couple of PyBokeh applications.

After we apply
the filter, things
get a bit blurry.
Changing the
opacity of the
layer will bring
back some detail.

To finish, group the operations


into a single undoable one, and
reset any changed tool settings
Critics may proffer otiose jibes about the usefulness of
this script, and indeed it would be entirely possible to do
everything it does by hand, possibly even in a better way.
That is, on some level at least, true for any Gimp script. But
this manual operation would be extremely laborious and
error-prone: you'd have to keep a note of the coordinates
and colour of the centre of each disc, and you'd have to be
incredibly deft with your circle superpositioning if you wanted
to preserve the colour addition.

Registering your plugin


In order to have your plugin appear in the Gimp
menus, it is necessary to define it as a Python
function and then use the register() function.
The tidiest way to do this is to save all the code
in an appropriately laid out Python script. The
general form of such a thing is:
#! /usr/bin/env python
from gimpfu import *

def myplugin(params):
    # code goes here
    pass

register(
    proc_name,   # e.g. "python_fu_linesplodge"
    blurb,       # e.g. "Draws a line and a splodge"
    help, author, copyright, date,
    menupath, imagetypes,
    params, results,
    function)    # myplugin

main()
The proc_name parameter specifies what
your plugin will be called in the PDB; python_fu
is actually automatically prepended so that all
Python plugins have their own branch in the
taxonomy. The menupath parameter specifies
what kind of plugin you're registering, and where
your plugin will appear in the Gimp menu: in our
case <Image>/Filters/Artistic/LineSplodge...
would suffice. imagetypes specifies what kind
of images the plugin works on, such as RGB*,
GRAY*, or simply "" (an empty string) if it doesn't
operate on any image, such as in our example.
The list params specifies the inputs to your
plugin: you can use special Python-Fu types here
such as PF_COLOR and PF_SPINNER to get nice
interfaces in which to input them. The results list
describes what your plugin outputs, if anything.
In our case (PF_IMAGE, "image", "LSImage")
would suffice. Finally, function is just the Python
name of our function as it appears in the code.
To ensure that Gimp finds and registers your
plugin next time it's loaded, save this file as (say)
myplugin.py in the plugins folder:
~/.gimp-2.8/plug-ins for Linux (ensure it is
executable with chmod +x myplugin.py) or
%USERPROFILE%\.gimp-2.8\plug-ins\ for Windows
users, replacing the version number as appropriate.


Python

Python: Gimp
snowflakes
Winter is here, so set your White Walker traps and snares, hunker down
and admire some fractal snowflakes with Jonni Bidwell.
there will be an entry called FractalFlake, so go ahead and
click it, if you haven't already, you impatient devil. You will be
greeted with a new dialogue. Don't worry about the Image
and Drawable options: these are irrelevant for plug-ins that
output new images. In fact, don't worry about any of the
options for your first snowflake, just go ahead and click OK
and watch the script work away.
When you're done admiring your handiwork, have a fiddle
with the parameters. Size corresponds to the pixel size of the
square canvas produced by the plugin (of which the
snowflake occupies about 60%). Minimum Length is the
length of each straight line in your fractal. This corresponds to
the base case for the recursion (see below): if you have a
larger image and a smaller minimum length, then the image
will take longer to produce. The random wobble will randomly
deviate the vertices of the snowflake by up to this number of
pixels, making for a more organic look, or a bit of a mess if you
set it too high.

Recursion, see recursion

Quick
tip
See more on
the Thue-Morse
connections on
a blog piece by
Zachary Abel
http://bit.ly/
ThueMorse


As I write this tutorial, the temperatures have begun
their steady decline, which seems a reasonable
excuse to draw some snowflakes. The canonical
snowflake of choice for programmers is based on a fractal
curve invented by the Swedish mathematician, Helge von
Koch. As well as making a pretty picture, this tutorial serves
as a gentle introduction to recursion, one of the trickier
programming paradigms.
Inside the ZIP archive at http://linuxformat.com/files/
ca2015.zip you'll find a file called kochflake.py. Assuming
you have the newest version (2.8) of Gimp installed, then
copy this file to your ~/.gimp-2.8/plug-ins folder and give it
the chmod +x treatment. If you don't have Gimp installed,
then it will certainly be in your distribution's repositories (it
will also be there if you do have it installed, incidentally) so
let fly with apt-get install, pacman -S, yum install, or
whatever is your weapon of choice.
When you start Gimp have a meander to the Filters menu,
and there you should find a new sub-menu called My Filters
(when you start playing with lots of plug-ins, the Filters menu
can easily get crowded, so it's a good idea to annex off your
custom additions in this way). Inside the My Filters menu

Now that you've satisfied all your snowflake-rendering desires


and are ready to learn what's going on behind the scenes,
let's take a deep breath and think about recursion. A recursive
function is one which calls itself. "Shenanigans!", I hear you
cry, "Only an infinite loop of despair could result from such a
confabulation". And you would be quite correct, were it not for
a (good) recursive function calling itself with different
parameters and having a non-recursive base case. The

Following steps derived from a binary sequence, our


Turtle can trace an ever more convincing von Koch curve.


Turtles and the Thue-Morse sequence


The von Koch curve is a little less messy to
program if you forget about geometry and think
like a turtle. If you have Python's turtle module
installed together with the tk graphical toolkit we
can do it in the following snippet of code:
import turtle

def von_koch(t, order, size):
    if order == 0:
        # base case: a plain straight line
        t.forward(size)
    else:
        # four sub-curves at one third scale, turning after each
        for angle in [60, -120, 60, 0]:
            von_koch(t, order - 1, size / 3.0)
            t.left(angle)

von_koch(turtle, 5, 400)
This enables us to see a connection with the
binary sequence obtained by starting with a 0

Fibonacci sequence example in our Mathematica tutorial
[see p88] provides a reasonable example (we won't say a
good example, because it's hideously inefficient). The j-th
Fibonacci number F(j) is defined as the sum of the (j-1)-th
and (j-2)-th Fibonacci numbers, so F(j) = F(j - 1) + F(j - 2). This
definition as it stands is not satisfactory until we specify
two initial Fibonacci numbers, traditionally F(0) := 0 and F(1)
:= 1. Armed with this knowledge, we can work out F(3) as the
sum F(2) + F(1). F(2) we don't know, but it is F(1) + F(0) = 1,
so F(3) = F(1) + F(0) + F(1) = 2.
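For the curious, the definition above translates almost word for word into Python. This is a minimal sketch rather than the Mathematica code from the other tutorial, and the function name fib is our own choice; it is every bit as hideously inefficient as promised, since each call spawns two more.

```python
def fib(j):
    """Return the j-th Fibonacci number straight from the recursive definition."""
    if j == 0:   # base case F(0) = 0
        return 0
    if j == 1:   # base case F(1) = 1
        return 1
    # recursive case: the function calls itself with smaller arguments
    return fib(j - 1) + fib(j - 2)


print(fib(3))  # 2, matching the working above
```

Without the two base cases the calls would never stop, which is exactly the infinite loop of despair that good recursion avoids.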
Another example which might be closer to some of your
hearts is a recursive directory listing: Print a list of files in the
current directory, then for each directory do the same,
possibly with some indentation. Here the base case is when
the current directory contains no subdirectories, and applying
the procedure will traverse down a directory and give a
lengthy listing of its entire contents.
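The directory listing just described might be sketched as follows. The function name list_tree and the two-space indentation are our own assumptions, and the sketch happily follows symbolic links, so a link that points back up the tree would need extra care.

```python
import os


def list_tree(path, depth=0):
    """Return the names under path, indenting each level by two spaces."""
    lines = []
    for name in sorted(os.listdir(path)):
        lines.append("  " * depth + name)
        full = os.path.join(path, name)
        if os.path.isdir(full):
            # recursive case: do the same for each subdirectory
            lines.extend(list_tree(full, depth + 1))
    # base case: with no subdirectories the loop never recurses
    return lines


if __name__ == "__main__":
    print("\n".join(list_tree(".")))
```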
So once we have our base cases sufficiently well-defined,
then recursion is all good. Granted, it's much easier for
computers to understand than humans, so have a peruse at
the step-by-step guide [see p142] to see how the von Koch
curve is constructed. The snowflake is just three of these
curves arranged around an equilateral triangle.

Understanding the code


The code contains a number of housekeeping lines which
might be initially distracting, so let's jump straight into the
fractalflake() function, which is where all the action is. All
Python-fu plugins accept the timg and tdraw arguments
(respectively an image and a drawable), even though they are
not relevant for functions like this which output new, rather
than acting on existing, images. So forgetting about these
arguments we have size, min_length and rnd which are
exactly what the user passes to GIMP via the initial dialogue.
Our first tasks are to set up an RGB image and a layer to
draw on, and start an undo group so that the whole process is
seen as one operation, not several hundred carefully directed
little lines. We also set up a temporary paintbrush, since
drawing lines 1 pixel thick is otherwise tricky, and set the
foreground colour to snow white.
This is all dealt with in the first nine lines of fractalflake().
After that we have our recursive step, drawStep(), which we
will skip over for a minute so we can see how it is called. The
code on line 52 refers to three points (ax,ay), (bx,by) and
(cx,cy). The first two are the base of our triangle, located 75%
of the way down the image and at 20% and 80% in the
horizontal direction. The point (cx,cy) is horizontally centered,
and up top, 25% down the page.

and adding the sequence's complement at each
stage. Thus 0, 01, 0110, 01101001, and so on. This
is known as the Thue-Morse sequence. And if we
interpret a 0 as an instruction to move the turtle
forward by one unit, and a 1 to rotate 60 degrees
counter-clockwise, then a term sufficiently far down
the Thue-Morse sequence begins to look uncannily
von Koch like, as in the image below-left.
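The doubling rule is easy to check for yourself. Here is a small sketch that builds the successive terms as strings; the function name thue_morse is ours, and turning the digits into turtle moves is left as an exercise.

```python
def thue_morse(stages):
    """Return the terms 0, 01, 0110, ... obtained by appending the complement."""
    seq = "0"
    terms = [seq]
    for _ in range(stages):
        # swap every 0 for a 1 and vice versa, then tack the result on the end
        complement = "".join("1" if bit == "0" else "0" for bit in seq)
        seq += complement
        terms.append(seq)
    return terms


print(thue_morse(3))  # ['0', '01', '0110', '01101001']
```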

We call our drawStep() function to draw three von Koch


fractals between these points, and there is where the magic
happens. So let's now delve into this function. We first
calculate the distance, using the Pythagorean theorem,
between the two points passed to drawStep, then we see if
this length is greater than the supplied min_length:
dy = y2 - y1
dx = x2 - x1
length = math.sqrt(dx ** 2 + dy ** 2)
if length > min_length:
First, pay attention to what happens if this isn't the case
(skipping over the mess down to line 48), that is, when the
points are sufficiently close together. This is our base case
and involves nothing more than drawing a straight line:
pdb.gimp_pencil(layer,4,[x1,y1,x2,y2])
The syntax is a little strange: the 4 refers to the number
of co-ordinates, hence twice the number of points. So now
we can tackle the recursive case. This looks ugly, but that's
just geometry. We define some new points (px,py), (qx,qy)
and (rx,ry): the first two divide the line segment into thirds,
and the latter (which is tricky to calculate) is located
perpendicular to the midpoint at a distance such that an
equilateral triangle is formed by these points.
if length > min_length:
    px = x1 + dx / 3.
    py = y1 + dy / 3.
    mpx = x1 + dx / 2.
    mpy = y1 + dy / 2.
    h = length / 3 * math.sqrt(3) / 2
    qx = px + dx / 3.
    qy = py + dy / 3.
    rx = mpx + h * (y1 - y2) / length
    ry = mpy + h * (x2 - x1) / length

The plug-in's output with a minimum line length of five.
Next, we consider if we are adding a random wobble. If so
we make a list of 10 random numbers in the required range,
if not we make a list of 10 zeros. We then do the recursive call
with the new points perturbed, if requested.
if rnd > 0:
    r = [random.randrange(0,rnd) for j in range(10)]
else:
    r = [0 for j in range(10)]
drawStep(x1 + 0,y1 + 0,px + r[2],py + r[3])
drawStep(px + r[2],py + r[3],rx + r[4],ry + r[5])
drawStep(rx + r[4],ry + r[5],qx + r[6],qy + r[7])
drawStep(qx + r[6],qy + r[7],x2 + 0,y2 + 0)
Notice that we don't let the random wobble affect the end
points (x1,y1) and (x2,y2). You are welcome to do this, using
x1 + r[0], y1 + r[1] and x2 + r[8] and y2 + r[9], but the
resulting curve will not be closed, which looks a bit odd.
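If you would like to poke at the recursion without starting Gimp, the same geometry can be run as plain Python. This is a sketch under stated assumptions rather than the plug-in's own code: the name koch_segments is made up, the wobble is left out, and instead of drawing with gimp_pencil it simply collects the line segments that would have been drawn.

```python
import math


def koch_segments(x1, y1, x2, y2, min_length):
    """Return the straight segments (x1, y1, x2, y2) of a von Koch curve."""
    dx = x2 - x1
    dy = y2 - y1
    length = math.sqrt(dx ** 2 + dy ** 2)
    if length <= min_length:
        # base case: short enough, so this is one straight line
        return [(x1, y1, x2, y2)]
    # points one third and two thirds of the way along the segment
    px, py = x1 + dx / 3.0, y1 + dy / 3.0
    qx, qy = x1 + 2 * dx / 3.0, y1 + 2 * dy / 3.0
    # apex of the equilateral triangle raised on the middle third
    h = length / 3.0 * math.sqrt(3) / 2
    rx = x1 + dx / 2.0 + h * (y1 - y2) / length
    ry = y1 + dy / 2.0 + h * (x2 - x1) / length
    corners = [(x1, y1), (px, py), (rx, ry), (qx, qy), (x2, y2)]
    segments = []
    for (ax, ay), (bx, by) in zip(corners, corners[1:]):
        # recursive case: four smaller curves, one per new edge
        segments.extend(koch_segments(ax, ay, bx, by, min_length))
    return segments


print(len(koch_segments(0, 0, 81, 0, 10)))  # 16
```

A line 81 units long with a minimum length of 10 is split twice, giving 16 segments, and each segment starts exactly where the previous one ended.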

And there you have it: barring some trivial housekeeping
tasks at the end of the function, fractals and snowflakes are
your oysters. There are all manner of other fractals you can
draw in this way. Trees and ferns are particularly popular.
The register() function at the end is used to register the
plug-in in the Gimp Procedure Database. The first argument
here is the main function name prefixed by python_fu. Then
we have fairly self-explanatory entries for a descriptive name,
a more verbose description, an author, a licence and a date.
After this we need to specify where in the menu structure the
plug-in will appear. If (as in our case) the plug-in has some
options then it is customary to indicate this by adding an
ellipsis at the end of the menu entry.
The next entry specifies which type of images the
plugin works on, and we have set this to an empty string
since it is irrelevant in our case. Then, more interestingly,
we have a list of arguments to pass to our function. The
PF_SPINNER type gives a neat way of entering an integer.
The first number is the default and then we have a triplet
consisting of the minimum, maximum and step size for
manipulating the spinner. The same structure works for
PF_SLIDER, which controls the input with a sliding bar.
Other useful types are PF_TOGGLE, for boolean (on or off)
options, as well as the self-explanatory PF_FONT,
PF_BRUSH and PF_LAYER.

Ordering up the perfect snowflake

Beginning at order 0 (or a line)

The von Koch curve of order 0 is just a humble straight line. There
really isn't that much more one can possibly say about it, aside from
it being rather peaceful and reminding us of Flatliners.

An order 1 curve (pointy)

If we divide this line into thirds and form an equilateral triangle with
the middle third we get the order 1 curve. This curve is made up of
four order 0 curves at one third scale.

Order 2 curve (emergent snowflake)

Now subdivide each order 0 curve, so that our order 2 curve is
constituted of four order 1 curves, or 16 order 0 curves. We can see
the familiar snowflake-edge pattern emerging.

Order 3 curve (or pretty)

The order 3 curve, composed of 64 order 0 curves. Things are
definitely getting more complicated, and you've probably got the idea
by now: the von Koch curve of order n has 4^n straight lines, and is pretty.


Python

Make
a Twitter client
Jonni Bidwell shows you how to do Twitter like a boss. A command-line
boss that accepts arguments and catches errors.

While prior studies have shown that a great deal
of Twitter content can be categorised as phatic,
pointless babble and self promotion, and a great
deal more is just plain spam, it is nevertheless a fact that
among all the chaff there is some highly informative and
up-to-the-minute wheat. Or information if you prefer.
Twitter gives developers access to its comprehensive
REST API, so that they can use custom applications to
interact with various tweety resources in a sensible manner.
While you could use the API directly in Python, you would
have to write a bunch of messy code to parse your queries
correctly or unwrap lengthy JSON responses. Mercifully,
all of this has been done for you in the python-twitter
module, available from all good distributions, or via pip
install if you want the latest version. Besides twittering,
we will also see how command line options are dealt with
using the argparse module, as well as how to do some
simple error catching.
In order to use the REST API, you must register as a Twitter
developer, so hop along to http://dev.twitter.com and
declare yourself with your regular Twitter credentials. Then
create a new application, populating the Name, Description
and Website fields with anything you like. Leave the Callback
URL field blank and click the create button. You will get
shouted at if your application's name contains "twitter", so
don't do that. Now go to the Permissions section of your
application and change its access to Read and Write. You want
to be able to post stuff after all. Now go to the API Keys
section and create an OAuth token. Now grab yourself a copy
of the code linked at the top of the page at www.linuxformat.
com/archives?issue=184, unzip it and populate config.py
with the API Key, API Secret and Access Token Secret
respectively. And now that the stage is set, let us see what
people are saying about us with a quick search. Run:
python twitter_api.py --search="\"linux format\""
Such slander! We have to escape the inner quotes so that
bash doesn't strip them, and the argument is
passed quoted to the Twitter search function as a phrase.
If you have a butcher's at the code, you will see that
twitter_api.py processes all the command line options
and calls the relevant functions in twitter_functions.py.
In the case of our search above, once we have set up our
API object, then all it takes is a call to api.GetSearch() and
we've got ourselves a list of 15 tweets on our chosen subject.
Tweets have their own class with a GetText() method for
extracting the content, but since this content could be in any
character encoding we use the helper function safe_print()
to force UTF-8 output where possible. You can read about all
the available API methods from the command line with
pydoc twitter.Api or you can visit the Google Code website
here: http://bit.ly/1jZ5qIl. Note that the argparse module
now replaces the old optparse module, providing a handy
means of parsing command line arguments.


Our search
for Windows XP
found a lot of
worried people
survival kit
indeed. Notice
how nicely
the unicode
characters
are printed.

Adding argparse
By importing the argparse module and creating an
ArgumentParser object and calling parse_args() your
program will get a -h or --help option for free. You can
see this in action by creating a file argtest.py with the
following contents:
import argparse
parser = argparse.ArgumentParser()
args = parser.parse_args()
Then run python argtest.py -h to see your free usage
message. As it stands this is not particularly useful, but once
we start adding arguments this will change. Arguments can
be positional (mandatory) or optional and we can add a
mandatory argument to argtest.py by inserting the following
just above the last line:
parser.add_argument("grr_arg", help="Repeat what you just told me")


Taking a REST
REST is short for Representational State
Transfer and refers to a set of principles for
gathering and sharing data rather than any
concrete protocol. Twitter implements two
major APIs, a RESTful one, which we will use,
and a streaming one, which we won't.
The streaming API provides low-latency
access to real-time data, which you can do all
sorts of fancy stuff with, but the RESTful API
provides a simple query and response
mechanism which suits our purposes just fine.

There are a couple of ways to authenticate
your application with Twitter. If you just want to
access public data, then there's an application-only
method. Otherwise you will need to use
OAuth tokens. This may seem slightly
convoluted for this simple personal-use
exercise, but userid/password authentication
was turned off last year. Proper OAuth2
authentication is a back and forth dance with a
few variations depending on the context.
Ultimately it asks the user if an app can use
their account, and if the user consents then an
access token is returned to the app via a
callback URL. Only the authenticated
application can use the token and it can be
revoked by the user at any time. The upshot is
that the application never gets to see the user's
credentials. In our simple situation we hardcode
the token to our developer account. If you were
making something distributable you would
never share the variable secret, and all the
access tokens would be requested dynamically.

Now when you run python argtest.py you will be given a
stern reprimand about too few arguments. If you run it with
the -h option, you will see that correct usage of your program
requires you to provide a value for grr_arg. We haven't added
any functionality for this option yet, but at least if we run our
program with an argument, eg python argtest.py foo, then
we no longer get an error, or indeed any output whatsoever.
The args namespace we created contains all the arguments
that our program expects, so we can use grr_arg by adding
the following to our file:
print "You argued: {}. Huh.".format(args.grr_arg)

Using arguments
More complicated arguments can easily be dealt with; for
example we could sum an arbitrarily long list of integers by
modifying the add_argument call like this:
parser.add_argument('integers', metavar='N', type=int,
nargs='+', help='some integers')
By default, arguments are assumed to be strings, so we
use the type= option to stipulate that integers are provided.
The metavar directive refers to how our argument is referred
to in the usage message, and nargs='+' refers to the fact
that one or more integers may be provided. We could make a
regular-ordinary program for summing two integers with
nargs=2, but where would be the fun in that? The arguments
provided are gathered into the list args.integers, so we can
process it like so:
print "The answer is {}.".format(sum(args.integers))
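Put together, the summing example might look like the sketch below. It is written so that it also runs under Python 3, and the build_parser() wrapper and the description text are our own additions. Handing parse_args() a list stands in for typing the numbers on the command line.

```python
import argparse


def build_parser():
    """Build a parser that accepts one or more integers."""
    parser = argparse.ArgumentParser(description="Sum some integers.")
    parser.add_argument('integers', metavar='N', type=int,
                        nargs='+', help='some integers')
    return parser


# The list plays the part of sys.argv[1:], e.g. "python argtest.py 1 2 39"
args = build_parser().parse_args(['1', '2', '39'])
print("The answer is {}.".format(sum(args.integers)))  # The answer is 42.
```

Calling parse_args() with no list at all makes it read the real command line instead.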
Our Twitter project works exclusively with optional
arguments. These creatures are preceded with dashes, often
having a long form, eg --verbosity, and a short form, say -v.
Our Twitter program has five options in total (not counting the
complementary --help option): --search, --trending-topics,
--user-tweets, --trending-tweets, and --woeid.
As it stands --woeid only affects the --trending-topics and
--trending-tweets options. While the argparse module could
easily handle grouping these arguments so that an error is
issued if you try and use --woeid with another option, it's
much easier to not bother and silently ignore the user's
superfluous input. Haven't we all seen enough errors?
For example, the search argument, which takes an
additional string argument (the thing you're searching for), is
described as follows:
parser.add_argument("-s", "--search",
    type=str,
    dest="search_term",
    nargs=1,
    help="Display tweets containing a particular string.")
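One thing worth trying for yourself is what such an option looks like once parsed. The sketch below reuses the definition above on its own, outside the Twitter code, so the parser here is a stand-in rather than the one in twitter_api.py.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-s", "--search",
                    type=str,
                    dest="search_term",
                    nargs=1,
                    help="Display tweets containing a particular string.")

# With nargs=1 the value arrives wrapped in a one-element list...
given = parser.parse_args(["--search", "linux format"])
print(given.search_term)

# ...and an option that was not supplied is simply None.
absent = parser.parse_args([])
print(absent.search_term)
```

The dest= name is what you use to get at the value afterwards, and testing it against None is an easy way to find out which options were actually used.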

Once we've built up all the arguments then we collate them
into a namespace with:
args = parser.parse_args()
so that our search term is accessible via args.search_term,
which we pass to search() in twitter_functions.py. This
function acquires a list of tweets via:
tweets = api.GetSearch(searchTerm)
and the following block prints them all out, prefixed by the
screen name of the individual responsible:
for tweet in tweets:
    print '@'+tweet.user.screen_name+': ',
    util.safe_print(tweet.GetText())

Our usage
instructions for
all the optional
arguments you
can use.

Trends near you


The original Python Twitter code originated from Boston, and
hard-coded the Where On Earth ID (WOEID) used by the
trendingTopics() function accordingly (it's 2367105). We can
forgive the authors clinging to their New England roots, but
for this tutorial we have added the --woeid option to see
what's hot elsewhere. This is an optional parameter and only
affects the trending topics/tweets functions. If you don't
provide it then results are returned based on global trends
using the GetTrendsCurrent() method of the API, rather than
GetTrendsWoeid(). You can use the WOEID looker upper at
http://zourbuth.com/tools/woeid to find the ID for a location.
For example, we can see what's going on in sunny Glasgow
by the invocation:
python twitter_api.py --trending-topics --woeid=21125
This only works for a few cities, so the wretched backwater
wasteland you call home may not have any trends associated


OpenHatch community
OpenHatch.org is a Boston-based
not-for-profit with the admirable and
noble goal of lowering the barriers into
open source development.
Its website provides a system for
matching volunteer contributors to
various community and education
projects and it runs numerous free
workshops imparting the skills
required to become a bona fide open
source contributor. Since 2011 it has

been running outreach events with a


particular focus on Python, but also
covering other software and striving to
get more women involved with
programming. In this tutorial we've
built on OpenHatch's Python code for
providing simple yet powerful
interaction with the Twitter social
networking platform. The original
code was developed for a Python
workshop in 2012 and we have

with it, which results in an error. You can test for this in the
Python interpreter as follows, where woeid is the WOEID of
your desired location:
import twitter_functions
test = twitter_functions.api.GetTrendsWoeid(woeid)
If you don't get an error ending with "Sorry, this page does
not exist", then all is well. We use Python's error catching to
fall back to the global trends function GetTrendsCurrent()
when this happens:
try:
    trending_topics = api.GetTrendsWoeid(woeid)
except twitter.TwitterError:
    trending_topics = api.GetTrendsCurrent()
It's prudent (but not necessarily essential, the catch-all
clause except: is entirely valid) to specify the exception that
you want to catch. If you aren't specific, however, confusion
and hairpulling may arise.
The common base exceptions include IOError,
for when file operations go wrong, and ImportError,
which is thrown up when you try and import something
that isn't there:
try:
    import sys, absent_module
except ImportError:
    print "the module is not there"
    sys.exit()
Modules will also provide their own exceptions, for example if

brought it up to date and expanded on


it for purposes of this tutorial. In
particular we now use the argparse
module rather than the deprecated
optparse. You can check out some
of the other great Python projects
from this and other events at the
official site (http://bit.ly/1fuabFI).
You could even use your mad
programming skillz to help out some
thoroughly worthy causes.

you try and do this tutorial without a network connection
you'll get an error from the urllib2 module. So we catch that
by wrapping the net-dependent functions. We can chain
except: clauses, so the next bit of the above code is:
except twitter.urllib2.URLError:
    print ("Error: Unable to connect to twitter, giving up")
    twitter.sys.exit()
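The fall-back pattern can be rehearsed without a Twitter account or even a network connection. In this sketch everything is a stand-in: LookupFailed plays the part of twitter.TwitterError, and local_trends() and global_trends() are hypothetical functions returning made-up data in place of the real API calls.

```python
class LookupFailed(Exception):
    """Hypothetical stand-in for twitter.TwitterError."""


def local_trends(woeid):
    """Pretend lookup that only knows about one location."""
    known = {21125: ["#glasgow"]}  # made-up data for illustration
    if woeid not in known:
        raise LookupFailed("Sorry, this page does not exist")
    return known[woeid]


def global_trends():
    """Pretend global lookup that always succeeds."""
    return ["#everywhere"]


def trends(woeid):
    try:
        return local_trends(woeid)
    except LookupFailed:
        # only the error we expect is caught; anything else still surfaces
        return global_trends()


print(trends(21125))
print(trends(1))
```

Because the except clause names a specific exception, a typo elsewhere in the function would still produce a proper traceback rather than being quietly swallowed.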
The userTweets() function is pretty straightforward,
so we'll just print the relevant segment here:
tweets = api.GetUserTimeline(screen_name=username)
for tweet in tweets:
    util.safe_print(tweet.GetText())

Unicode fixer
The function trendingTweets() is a little more complicated:
we need to first get a list of trending topics, and then for each
of these grab some tweets.
But there's a sting in the tail: sometimes the topics
returned will have funky unicode characters in them,
and these need to be sanitised before we can feed
them to our search function. Specifically, we need to use the
quote function of urllib2 to do proper escaping, otherwise it
will try and fail to ASCII-ize them.
trending_topics = api.GetTrendsCurrent()
for topic in trending_topics:
    print "**",topic.name
    esc_topic_name = twitter.urllib2.quote(topic.name.encode('utf8'))
    tweets = api.GetSearch(esc_topic_name)
    for tweet in tweets[:5]:
        print '@' + tweet.user.screen_name + ': ',
        util.safe_print(tweet.GetText())
    print '\n'
We've been a bit naughty in assuming that there will be at
least five tweets. The syntax for limiting the number of tweets
GetSearch returns seems to be in a state of flux, but since
these are trending it's reasonable that there will be plenty.
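To see what the escaping step does to a topic name, here is a small sketch that can be run on its own. It imports quote from urllib (Python 2) or urllib.parse (Python 3) rather than reaching it through the twitter module as the tutorial does; escape_topic() is a name made up for this example.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3


def escape_topic(name):
    # Encode to UTF-8 bytes first, then percent-escape them for a URL.
    return quote(name.encode("utf8"))


print(escape_topic(u"#caf\u00e9"))
print(escape_topic(u"plain"))
```

The hash sign and the accented letter both become percent-escaped bytes, while plain ASCII letters pass through untouched.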
And that completes our first foray into pythonic twittering.
We have developed the beginnings of a command-line Twitter
client: we have parsed options, caught exceptions and
sanitised strings. If your appetite is sufficiently whetted then
why not go further? You could add a --friends option to just
display tweets from your friends, a --post option to post stuff,
a --follow option, and really anything else you want.

For this tutorial we have added the --woeid option to see what's hot elsewhere.

You might not get exactly the same results as the website, but both methods
show that people appear to care about acorns.


Python

Minecraft:
Start hacking
Use Python on your Pi to merrily meddle with Minecraft, says Jonni Bidwell.

Arguably more fun than the generously provided
Wolfram Mathematica: Pi Edition is Mojang's
generously provided Minecraft: Pi Edition. The latter
is a cut-down version of the popular Pocket Edition, and as
such lacks any kind of life-threatening gameplay, but includes
more blocks than you can shake a stick at, and three types of
saplings from which said sticks can be harvested.
This means that there's plenty of stuff with which to
unleash your creativity, but all that clicking is hard work,
and by dint of the edition including an elegant Python API,
you can bring to fruition blocky versions of your wildest
dreams with just a few lines of code.

Don't try this at home, kids... actually, do try this at home.

Assuming you've got your Pi up and running, the first step
is downloading the latest version from http://pi.minecraft.net
to your home directory. The authors stipulate the use of
Raspbian, so that's what we'd recommend; your mileage
may vary with other distributions. Minecraft requires the X
server to be running, so if you're a boot-to-console type you'll
have to startx. Start LXTerminal and extract and run the
contents of the archive like so:
$ tar -xvzf minecraft-pi-0.1.1.tar.gz
$ cd mcpi
$ ./minecraft-pi
See how smoothly it runs? Towards the top-left corner you
can see your x, y and z co-ordinates, which will change as you
navigate the block-tastic environment. The x and z axes run
parallel to the floor, whereas the y dimension denotes altitude.
Each block (or voxel, to use the correct parlance) which
makes up the landscape is described by integer co-ordinates
and a BlockType. The floor doesn't really have any depth,
so is, instead, said to be made of tiles. Empty space has the
BlockType AIR, and there are about 90 other more tangible
substances, including such delights as GLOWING_OBSIDIAN
and TNT. Your player's co-ordinates, in contrast to those of
the blocks, have a decimal part since you're able to move
continuously within AIR blocks.
The API enables you to connect to a running Minecraft
instance and manipulate the player and terrain as befits your
megalomaniacal tendencies. In order to service these our first
task is to copy the provided library so that we don't mess with
the vanilla installation of Minecraft. We'll make a special folder
for all our mess called ~/picraft, and put all the API stuff in
~/picraft/minecraft. Open LXTerminal and issue the
following directives:
$ mkdir ~/picraft
$ cp -r ~/mcpi/api/python/mcpi ~/picraft/minecraft

Dude, where's my Steve?
Here we can see our intrepid character (Steve)
inside the block at (0,0,0). He can move around
inside that block, and a few steps in the x and z
directions will take Steve to the shaded blue block.
On this rather short journey he will be in more than
one block at times, but the Minecraft API's
getTilePos() function will choose the block which
contains most of him.
Subtleties arise when trying to translate standard
concepts, such as lines and polygons, from Euclidean
space into discrete blocks. A 2D version of this
problem occurs whenever you render any kind of
vector graphics. Say, for instance, you want to draw
a line between two points on the screen: unless
the line is horizontal or vertical, a decision has to be
made as to which pixels need to be coloured in.
The earliest solution to this was provided by Jack
Elton Bresenham in 1965, and we will generalise this
classic algorithm to three dimensions a little later in
this chapter.

Isometric projection makes Minecraft-world fit on this page.

Now, without further ado, let's make our first
Minecraftian modifications. We'll start by
running an interactive Python session
alongside Minecraft, so open another tab in
LXTerminal, start Minecraft and enter a world,
then Alt-Tab back to the terminal and open up
Python in the other tab. Do the following in the
Python tab:
import minecraft.minecraft as minecraft
import minecraft.block as block
mc = minecraft.Minecraft.create()
posVec = mc.player.getTilePos()
x = posVec.x
y = posVec.y
z = posVec.z
mc.postToChat(str(x) + " " + str(y) + " " + str(z))
Behold, our location is emblazoned on the
screen for a few moments (if not, you've made
a mistake). These co-ordinates refer to the
current block that your character occupies, and
so have no decimal point. Comparing these
with the co-ordinates at the top-left, you will
see that these are just the result of rounding
down those decimals to integers (e.g. -1.1 is
rounded down to -2). Your character's
co-ordinates are available via mc.player.getPos(), so in
some ways getTilePos() is superfluous, but it saves three
float to int coercions so we may as well use it. The API has
a nice class called Vec3 for dealing with three-dimensional
vectors, such as our player's position. It includes all the
standard vector operations such as addition and scalar
multiplication, as well as some other more exotic stuff that
will help us later on.
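Rounding down is not the same as chopping off the decimals, which matters for negative co-ordinates. The sketch below assumes that tile positions are the floor of the player position, as described above; tile_pos() is a made-up helper and not part of the Minecraft API.

```python
import math


def tile_pos(x, y, z):
    # Round each co-ordinate down (towards minus infinity), not towards zero.
    return (int(math.floor(x)), int(math.floor(y)), int(math.floor(z)))


print(tile_pos(-1.1, 64.0, 3.9))
print(int(-1.1))
```

Note that int(-1.1) truncates towards zero and gives -1, which would put Steve in the wrong block on the negative side of an axis.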
We can also get data on what our character is standing
on. Go back to your Python session and type:
curBlock = mc.getBlock(x, y - 1, z)
mc.postToChat(curBlock)
Here, getBlock() returns an integer specifying the block
type: 0 refers to air, 1 to stone, 2 to grass, and you can find all
the other block types in the file block.py in the
~/picraft/minecraft folder we created earlier. We subtract 1 from the y
value since we are interested in what's going on underfoot:
calling getBlock() on our current location should always
return 0, since otherwise we would be embedded inside
something solid or drowning.
As usual, running things in the Python interpreter is great
for playing around, but the grown-up way to do things is to
put all your code into a file. Create the file ~/picraft/gps.py
with the following code:
import minecraft.minecraft as minecraft
import minecraft.block as block
mc = minecraft.Minecraft.create()
oldPos = minecraft.Vec3()
while True:
    playerTilePos = mc.player.getTilePos()
    # Only report when the player has moved to a new block
    if playerTilePos != oldPos:
        oldPos = playerTilePos
        x = playerTilePos.x
        y = playerTilePos.y
        z = playerTilePos.z
        t = mc.getBlock(x, y - 1, z)
        mc.postToChat(str(x) + " " + str(y) + " " + str(z) + " " + str(t))
Now fire up Minecraft, enter a world, then open up a
terminal and run your program:
$ python gps.py
The result should be that your co-ordinates and the
BlockType of what you're stood on are displayed as you move
about. Once you've memorized all the BlockTypes (joke),
Ctrl+C the Python program to quit.
We have covered some of the passive options of the API,
but these are only any fun when used in conjunction with the
more constructive (or destructive) options. Before we sign off,
we'll cover a couple of these. As before, start Minecraft and a
Python session, import the Minecraft and block modules, and
set up the mc object:
posVec = mc.player.getTilePos()
x = posVec.x
y = posVec.y
z = posVec.z
for j in range(5):
    for k in range(x - 5, x + 5):
        # y + j starts the wall at the player's own altitude
        mc.setBlock(k, y + j, z + 1, 246)
Behold! A 10x5 wall of glowing obsidian has been erected
adjacent to your current location. We can also destroy blocks
by turning them into air. So we can make a tiny tunnel in our
obsidian wall like so:
mc.setBlock(x, y, z + 1, 0)
Assuming, of course, that you didn't move since inputting the
previous code.
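If you want to check which blocks a nested loop like this will touch before letting it loose on your world, the co-ordinates can be collected in a list first. This is just a sketch: wall_blocks() is a hypothetical helper, and it assumes the wall should rise from the player's own altitude.

```python
def wall_blocks(x, y, z, width=10, height=5):
    # Return the (x, y, z) positions of a wall one block away in z,
    # centred on the player and rising from the player's altitude.
    blocks = []
    for j in range(height):
        for k in range(x - width // 2, x + width // 2):
            blocks.append((k, y + j, z + 1))
    return blocks


blocks = wall_blocks(0, 10, 0)
print(len(blocks))
print(blocks[0], blocks[-1])
```

Each tuple could then be passed to mc.setBlock() along with a block ID such as 246.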
In the rest of this chapter, we'll see how to build and
destroy some serious structures, dabble with physics, rewrite
some of the laws thereof, and go a bit crazy within the
confines of our 256x256x256 world. Until then, try playing
with the mc.player.setPos() function. Teleporting is fun!

All manner of improbable structures can be yours.

Quick tip
Check out Martin O'Hanlon's website www.stuffaboutcode.com, which
includes some great examples of just what the API is capable of.

Minecraft: Image
wall importing
Have you ever wanted to reduce your pictures to 16-colour blocks? You
haven't? Tough: Jonni Bidwell is going to tell you how regardless.

Not some sort of bloodshot cloud, but a giant raspberry
floating in the sky. Just another day at the office.

Technology has spoiled us with 32-bit colour, multi-megapixel
imagery. Remember all those blocky
sprites from days of yore, when one had to invoke
something called one's imagination in order to visualise what
those giant pixels represented? In this tutorial we hark back
to those halcyon days from the comfort of Minecraft-world, as
we show you how to import and display graphics using blocks
of coloured wool. Also Python. And the Raspberry Pi.

The most colourful blocks in Minecraft are wool
(blockType 35): there are 16 different colours available, which
are selected using the blockData parameter. For this tutorial
we shall use these exclusively, but you could further develop
things to use some other blocks to add different colours to
your palette. The process of reducing an image's palette is
an example of quantization: information is removed from
the image and it becomes smaller. In order to perform this
colour quantization we first need to define our new restrictive
palette, which involves specifying the Red, Green and Blue
components for each of the 16 wool colours. This would be a
tedious process, involving importing an image of each wool
colour into Gimp and using the colour picker tool to obtain
the component averages, but fortunately someone has done
all the hard work already.
We also need to resize our image: Minecraft-world is only
256 blocks in each dimension, so since we will convert one

Standard setup
If you've used Minecraft: Pi Edition before
you'll be familiar with the drill, but if not this is
how to install Minecraft and copy the API for
use in your code.
We're going to assume you're using Raspbian,
and that everything is up to date. You can
download Minecraft from http://pi.minecraft.net,
then open a terminal and unzip the file as
follows (assuming you downloaded it to your
home directory):
$ tar -xvzf ~/minecraft-pi-0.1.1.tar.gz
All the files will be in a subdirectory called
mcpi. To run Minecraft you need to have first
started X, then from a terminal do:
$ cd ~/mcpi
$ ./minecraft-pi
It is a good idea to set up a working directory
for your Minecraft project, and to copy the API
there. The archive on the disc will extract into a
directory called mcimg, so you can extract it to
your home directory and then copy the API files
like so:
$ tar -xvzf mcimg.tar.gz
$ cp -r ~/mcpi/api/python/mcpi ~/mcimg/minecraft
For this tutorial we're going to use the PIL
(Python Imaging Library), which is old and
deprecated but is more than adequate for this
project's simple requirements. It can import your
.jpg and .png files, among others, so there's no
need to fiddle around converting images. Install it
as follows:
$ sudo apt-get install python-imaging

With just 16 colours, Steve can draw anything he wants (inaccurately).

pixel to one block our image must be at most 256 pixels in its
largest dimension. However, you might not want your image
taking up all that space, and blocks cannot be stacked more
than 64 high, so the provided code resizes your image to 64
pixels in the largest dimension, maintaining the original
aspect ratio. You can modify the maxsize variable to change
this behaviour, but the resultant image will be missing its top
if it is too tall.
The PIL module handles the quantization and resizing
with one-line simplicity, but we must first define the palette
and compute the new image size. The palette is given as a list
of RGB values, which we then pad out with zeroes so that it
has the 256 entries that an 8-bit palette requires. For
convenience, we will list our colours in order of the
blockData parameter.
mcPalette = [
221,221,221, # White
219,125,62, # Orange
179,80,188, # Magenta
107,138,201, # Light Blue
177,166,39, # Yellow
65,174,56, # Lime Green
208,132,153, # Pink
64,64,64, # Dark Grey
154,161,161, # Light Grey
46,110,137, # Cyan
126,61,181, # Purple
46,56,141, # Blue
79,50,31, # Brown
53,70,27, # Green
150,52,48, # Red
25,22,22, # Black
]
mcPalette.extend((0,0,0) * (256 - len(mcPalette) / 3))
Unfortunately the / 3 is missing from the code on the
disc, though it is a mistake without any real consequence
(phew). Padding out the palette in this manner does, however,
have the possibly unwanted side-effect of removing any really
black pixels from your image. This happens because their
value is closer to absolute black (with which we artificially
extended the palette) than the very slightly lighter colour of
the black wool. To work around this you can change the
(0,0,0) above to (25,22,22), so that there are no longer any
absolute blacks to match against. A reasonable hack if you're
working with a transparent image is to replace this value with
your image's background colour; then the transparent parts
will not get drawn. We make a new single-pixel dummy image
to hold this palette:
mcImagePal = Image.new("P", (1,1))
mcImagePal.putpalette(mcPalette)
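The disappearing black pixels can be illustrated with a toy nearest-colour search. This sketch assumes a plain squared-distance match in RGB space with ties going to the lowest index; PIL's own matching may differ in detail, and nearest_index() is a made-up helper. The first 15 palette entries are placeholders rather than the real wool colours.

```python
def nearest_index(colour, palette):
    # Return the index of the palette entry closest to colour, using the
    # squared distance in RGB space. Ties go to the lowest index.
    best, best_dist = 0, None
    for i, entry in enumerate(palette):
        dist = sum((a - b) ** 2 for a, b in zip(colour, entry))
        if best_dist is None or dist < best_dist:
            best, best_dist = i, dist
    return best


black_wool = (25, 22, 22)
wool = [(221, 221, 221)] * 15 + [black_wool]   # placeholder first 15 entries
padded_with_black = wool + [(0, 0, 0)] * 240
padded_with_wool = wool + [black_wool] * 240

print(nearest_index((0, 0, 0), padded_with_black))
print(nearest_index((0, 0, 0), padded_with_wool))
```

With absolute-black padding a pure black pixel lands on index 16, which is outside the 16 wool colours and so never gets drawn. With the padding changed to the black wool colour it lands on index 15.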
The provided archive includes the file test.png, which is in
fact the Scratch mascot, but you are encouraged to change the
filename in the first line below to one of your own images to
see how they survive the {res,quant}ize. You can always TNT the
bejesus out of them if you are not happy. To ensure the aspect
ratio is accurate we use a float in the division to avoid
rounding to an integer.
mcImage = Image.open("test.png")
width = mcImage.size[0]
height = mcImage.size[1]
ratio = height / float(width)
maxsize = 64
As previously mentioned, blocks in Minecraft-world do not
stack more than 64 high (perhaps for safety reasons). The
next codeblock proportionally resizes the image to 64 pixels
in its largest dimension.
if width > height:
    rwidth = maxsize
    rheight = int(rwidth * ratio)
else:
    rheight = maxsize
    rwidth = int(rheight / ratio)
If you have an image that is much wider than it is high,
then you may want to use more than 64 pixels in the
horizontal dimension and fix the height at 64. Replacing the
above block with just the two lines of the else clause would
achieve precisely this.
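The resizing arithmetic is easy to test in isolation. The following sketch wraps the same calculation in a function; fit_size() is a name invented for this example.

```python
def fit_size(width, height, maxsize=64):
    # Scale (width, height) so the larger side equals maxsize,
    # keeping the aspect ratio. float() avoids Python 2 integer division.
    ratio = height / float(width)
    if width > height:
        rwidth = maxsize
        rheight = int(rwidth * ratio)
    else:
        rheight = maxsize
        rwidth = int(rheight / ratio)
    return rwidth, rheight


print(fit_size(200, 100))
print(fit_size(100, 200))
```

A landscape image ends up 64 wide and a portrait image 64 high, with the other side scaled to match.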
Now we convert our image to the RGB colourspace, so as
not to confuse the quantize() method with transparency
information, and then force upon it our woollen palette and
new dimensions. You might get better results by doing the
resize first and the quantization last, but we prefer to keep
our operations in lexicographical order.
mcImage = mcImage.convert("RGB")
mcImage = mcImage.quantize(palette = mcImagePal)
mcImage = mcImage.resize((rwidth,rheight))
For simplicity, we will position our image close to Steve's
location, five blocks away and aligned in the x direction to be
precise. If Steve is close to the positive x edge of the world, or
if he is high on a hill, then parts of the image will sadly be lost.
Getting Steve's co-ordinates is a simple task:
playerPos = mc.player.getPos()
x = playerPos.x
y = playerPos.y
z = playerPos.z
Then it is a simple question of looping over both
dimensions of the new image, using the slow but trusty
getpixel() method to obtain an index into our palette, and
using the setBlock() function to draw the appropriate
colour at the appropriate place.
If your image has an alpha channel then getpixel() will
return None for the transparent pixels and no block will be
drawn. To change this behaviour one could add an else
clause to draw a default background colour. Image
co-ordinates start with (0,0) in the top-left corner, so to
avoid drawing upside-down we subtract the iterating variable
k from rheight.
for j in range(rwidth):
    for k in range(rheight):
        pixel = mcImage.getpixel((j,k))
        if pixel < 16:
            mc.setBlock(j + x + 5, rheight - k + y, z, 35, pixel)

Unlike in Doom, this strawberry/cacodemon doesn't spit fireballs at you.
This is good.
To do all the magic, start Minecraft and move Steve to
a position that befits your intended image. Then open a
terminal and run:
$ cd ~/mcimg
$ python mcimg.py
So that covers the code on the disc, but you can have a
lot of fun by expanding on this idea. A good start is probably
to put the contents of mcimg.py into a function. You might
want to give this function some arguments too. Something
like the following could be useful as it enables you to specify
the image file and the desired co-ordinates:
def drawImage(imgfile, x=None, y=None, z=None):
    if x is None:
        playerPos = mc.player.getPos()
        x = playerPos.x
        y = playerPos.y
        z = playerPos.z
If no co-ordinates are specified, then the player's position
is used. If you have a slight tendency towards destruction,
then you can use live TNT for the red pixels in your image.
Just replace the mc.setBlock line inside the drawing loop
with the following block:
if pixel == 14:
    # Palette index 14 is red: use live TNT (block 46, data 1) instead
    mc.setBlock(j + x + 5, rheight - k + y, z, 46, 1)
else:
    mc.setBlock(j + x + 5, rheight - k + y, z, 35, pixel)
If you don't like the resulting image, then it's good news,
everyone: it is highly unstable, and a few careful clicks on the
TNT blocks will either make some holes in it or reduce it to
dust. It depends how red your original image was.

That's right, Steve, go for the ankles. Let's see how fast he runs
without those!

While Minecraft proper has a whole bunch of colourful
blocks, including five different types of wooden planks and
stairs, six kinds of stone, emerald, and 16 colours of stained
glass, the Pi Edition is a little more restrictive. There are some
good candidates for augmenting your palette, though:
Blockname      Block ID   Red   Green   Blue
Gold           41         241   234     81
Lapis Lazuli   22         36    61      126
Sandstone      24         209   201     152
Ice            79         118   165     244
Diamond        57         116   217     212

We have hitherto had it easy insofar as the mcPalette
index aligned nicely with the coloured wool blockData
parameter. Now that we're incorporating different blockTypes
things are more complicated, so we need a lookup table to do
the conversion. Assuming we just tack these colours on to the
end of our existing mcPalette definition, like so:
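mcPalette = [
    # ... the 16 wool colours listed earlier go here ...
    241,234,81,   # Gold
    36,61,126,    # Lapis Lazuli
    209,201,152,  # Sandstone
    118,165,244,  # Ice
    116,217,212   # Diamond
]
mcPaletteLength = len(mcPalette) / 3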
then we can structure our lookup table as follows:
mcLookup = []
for j in range(16):
    mcLookup.append((35,j))
mcLookup += [(41,0),(22,0),(24,0),(79,0),(57,0)]
Thus the list mcLookup comprises the blockType and
blockData for each colour in our palette. And we now have
a phenomenal 31.25% more colours [gamut out of here - Ed]
with which to play. To use this in the drawing loop, use the
following code inside the for loops:
pixel = mcImage.getpixel((j,k))
if pixel < mcPaletteLength:
    bType = mcLookup[pixel][0]
    bData = mcLookup[pixel][1]
    mc.setBlock(j + x + 5, rheight - k + y, z, bType, bData)
In this manner you could add any blocks you like to your
palette, but be careful with the lava and water ones: their
pleasing orange and blue hues belie an inconvenient
tendency to turn into lava/waterfalls. Incidentally, lava and
water will combine to create obsidian. Cold, hard obsidian.
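The palette-index-to-block conversion can be exercised without Minecraft running. This sketch rebuilds the same table using the block IDs quoted in this tutorial; block_for() is a hypothetical helper that also guards against indices pointing into the palette's padding.

```python
WOOL = 35  # block ID of wool, as given in the text

# (block ID, block data) for each palette index: 16 wools, then the extras.
lookup = [(WOOL, colour) for colour in range(16)]
lookup += [(41, 0), (22, 0), (24, 0), (79, 0), (57, 0)]


def block_for(index):
    # Return (block ID, block data) for a palette index, or None if the
    # index points at the padding beyond the real colours.
    if 0 <= index < len(lookup):
        return lookup[index]
    return None


print(len(lookup))
print(block_for(14))
print(block_for(16))
print(block_for(200))
```

Returning None for padding indices gives the drawing loop an easy way to skip pixels that have no matching block.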

More dimensions
One of the earliest documentations of displaying
custom images in Minecraft: Pi Edition is Dav
Stott's excellent tutorial on displaying Ordnance
Survey maps, http://bit.ly/1lP20E5.
Two-dimensional images are all very well, but Steve
has a whole other axis to play with. To this end
the aforementioned Ordnance Survey team has
provided, for the full version of Minecraft, a world
comprising most of Great Britain, with each
block representing 50m. Its Danish counterpart
has also done similar, though parts of
Minecraft-Denmark were sabotaged by miscreants.
Another fine example is Martin O'Hanlon's
excellent 3D modelling project. This can import
.obj files (text files with vertex, face and texture
data) and display them in Minecraft: Pi Edition.
Read all about it at http://bit.ly/1sutoOS.
Of course, we also have a temporal
dimension, so you could expand this tutorial in
that direction, giving Steve some animated gifs
to jump around on. If you were to proceed
with this, then you'd probably have to make
everything pretty small: the drawing process
is slow and painful. Naturally, someone (Henry
Garden) has already taken things way too far
and has written Redstone, a Clojure interface
to Minecraft which enables movies to be
rendered. You can see the whole presentation,
including a blockified Simpsons title sequence,
at http://bit.ly/1sO0A2q.

Minecraft: Make
a trebuchet
Build your labour of love and then blow it sky high with pyrotechnic
Jonni Bidwell, a stash of TNT and an age-old siege machine.

Now that we're au fait with the basics of the API, it's
time to get crazy creative. Building a house is hard,
right? Wrong. With just a few lines of sweet Python
your dream home can be yours. Provided your dream home is
a fairly standard box construction, that is. If your dreams are
wilder, all it takes is more code. You will never have to worry
about planning permission, utility connection, chancery repair
contributions or accidentally digging up a neolithic burial
ground (unless you built it first).
It never actually rains in Minecraft Pi, so a flat-roof
construction will suit our purposes just fine. We kick
off proceedings by defining two corners for our house: v1 is
the block next to us in the x direction and one block higher
than our current altitude, whereas v2 is an aesthetically
pleasing distance away:
pos = mc.player.getTilePos()
v1 = minecraft.Vec3(1,1,0) + pos
v2 = v1 + minecraft.Vec3(10,4,6)
Now we create a solid stone cuboid between these
vertices and then hollow it out by making a smaller interior
cuboid full of fresh air:
mc.setBlocks(v1.x,v1.y,v1.z,v2.x,v2.y,v2.z,4)
mc.setBlocks(v1.x+1,v1.y,v1.z+1,v2.x-1,v2.y,v2.z-1,0)

Great, except our only means of egress and ingress is
via the rather generous skylight, and a proper floor wood
(geddit?) be nice. If you're standing in a fairly flat area, you'll
notice that the walls of your house are hovering one block
above ground level. This space is where our floor will go. If
your local topography is not so flat, then your house may be
embedded in a hill, or partly airborne, but don't worry: the
required terraforming or adjustments to local gravity will all
be taken care of. Let's make our rustic hardwood floor:
mc.setBlocks(v1.x,v1.y-1,v1.z,v2.x,v1.y -1,v2.z,5)
The windows are just another variation on this theme:
mc.setBlocks(v1.x,v1.y+1,v1.z+1,v1.x,v1.y+2,v1.z+3,102)
mc.setBlocks(v1.x+6,v1.y+1,v1.z,v1.x+8,v1.y+2,v1.z,102)
mc.setBlocks(v2.x,v1.y+1,v1.z+1,v2.x,v1.y+2,v1.z+3,102)
mc.setBlocks(v1.x+2,v1.y+1,v2.z,v1.x+4,v1.y+2,v2.z,102)
The roof uses the special half block 44, which has a few
different types. Setting the blockData to 2 makes it wooden,
matching our floor:
mc.setBlocks(v1.x,v2.y,v1.z,v2.x,v2.y,v2.z,44,2)
The door is a bit more complicated; the gory details are in
the Double door details box, but the following three lines do the job:
mc.setBlocks(v1.x+2,v1.y,v1.z,v1.x+3,v1.y,v1.z,64,3)
mc.setBlock(v1.x+2,v1.y+1,v1.z,64,8)
mc.setBlock(v1.x+3,v1.y+1,v1.z,64,9)
Having lovingly and laboriously constructed our new
property, the next step is to come up with new and inventive
ways of destroying it. We have already mentioned that TNT
can be made live, so that a gentle swipe with a sword (or
anything really) will cause it to detonate. It would be trivial to
use setBlocks() to fill your house with primed TNT, but we can
do much better. Readers, let me introduce my beta trebuchet.

A house. Now let's blow it up.

It doesn't look like much, but just you wait...

Rather than simulating a projectile moving through space
we will instead trace its parabolic trajectory with hovering
TNT. Detonating the origin of this trajectory will initiate a most
satisfying chain reaction, culminating in a big chunk of your
house being destroyed. First we will cover some basic
two-dimensional mechanics. In the absence of friction, a projectile
will trace out a parabola determined by the initial launch
velocity, the angle of launch and the local gravitational
acceleration, which on Earth is about 9.81 m/s².
As a gentle introduction, we will fiddle these constants so
that the horizontal distance covered by this arc is exactly 32
blocks and at its peak it will be 16 blocks higher than its
original altitude. If blocks were metres, then this fudge would
correspond to a muzzle velocity just shy of 20 m/s, and an
elevation of about 63 degrees. We will only worry about two
dimensions, so the arc will be traced along the z axis with the
x co-ordinate fixed just next to our door. This is all summed
up by the simple formula y = z(2 - z/16), which we implement
this way:
for j in range(33):
    height = v1.y + int(j*(2 - j/16.))
    mc.setBlock(v1.x+4,height,v1.z-j,46,1)
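As a rough check of the figures involved, the launch angle and speed for an arc with a range of 32 and a peak of 16 can be worked out from the standard frictionless projectile formulas. This is a back-of-envelope sketch that treats one block as one metre; the variable names are our own.

```python
import math

g = 9.81            # gravitational acceleration in m/s^2
horiz_range = 32.0  # horizontal distance covered, in blocks (metres)
peak = 16.0         # maximum height above the launch point

# For a frictionless projectile: peak / range = tan(angle) / 4.
angle = math.atan(4 * peak / horiz_range)
# range = v^2 * sin(2 * angle) / g, rearranged for the launch speed v.
speed = math.sqrt(horiz_range * g / math.sin(2 * angle))

print(round(math.degrees(angle), 1))
print(round(speed, 1))

# Integer heights of the arc y = z(2 - z/16), as used in the loop.
heights = [int(j * (2 - j / 16.0)) for j in range(33)]
print(max(heights))
```

By this calculation the angle comes out a little over 63 degrees and the speed a little under 20 m/s, and the integer arc does peak at 16 blocks before returning to its starting height.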
The final argument to setBlock() (the 1) sets the TNT to be live,
so have at it with your sword and enjoy the fireworks. Or maybe not: the
explosions will, besides really taxing the Pi's brain, cause
some TNT to fall, interrupting the chain reaction and
preserving our lovely house. We don't want that, so we instead
use the following code:
height = v1.y
oldheight = height  # needed for the first pass through the loop
ground = height - 1
j = 0
while ground <= height:
    mc.setBlocks(v1.x + 4,oldheight,v1.z - j,v1.x + 4,height,v1.z - j,46,1)
    j += 1
    oldheight = height
    height = v1.y + int(j * (2 - j / 16.))
    ground = mc.getHeight(v1.x + 4, v1.z - j)
This ensures that our parabola is gap-free and also
guards against the TNT arc-en-ciel terminating in mid-air.
We have dealt with this latter quandary using the getHeight()
function to determine ground level at each point in the arc,
and stop building when we reach it. Note that we have to
make the getHeight() call before we place the final TNT
block, since the height of the world is determined by the
uppermost non-air object, even if the said object is hovering.
If our construction exceeds the confines of the Minecraft
world, then you could just build another house in a better
situation, or you could change v1.z - j to max(-116,v1.z-j) in
the above loop, which would make a vertical totem of danger
right at the edge of the world. Now that we have our
trajectory, we can add the mighty siege engine:
z = v1.z - j - 1
mc.setBlocks(v1.x + 3, oldheight, z + 10, v1.x + 6, oldheight + 2, z + 7, 85)
mc.setBlocks(v1.x + 4, oldheight + 2, z + 12, v1.x + 4, oldheight + 2, z + 1, 5)
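The gap-free arc described earlier can be rehearsed without a running game by swapping the world for a simple function. In this sketch arc_columns() is a made-up helper, and ground_height is a hypothetical stand-in for mc.getHeight() that takes only the step number.

```python
def arc_columns(base_y, ground_height):
    # Trace y = z(2 - z/16) one column at a time. Each column spans the
    # previous height to the current one, so the arc has no gaps.
    # ground_height(j) is a hypothetical stand-in for mc.getHeight().
    columns = []
    j = 0
    height = old_height = base_y
    ground = height - 1
    while ground <= height:
        columns.append((j, min(old_height, height), max(old_height, height)))
        j += 1
        old_height = height
        height = base_y + int(j * (2 - j / 16.0))
        ground = ground_height(j)
    return columns


flat = arc_columns(10, lambda j: 9)   # flat ground one block below the start
print(len(flat))
print(flat[0], flat[16], flat[-1])
```

On flat ground the loop stops once the arc dips below ground level, and each column starts where the previous one finished.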
Up until this point, we have aligned everything along a
particular axis: our house (before you blew it up) faces the
negative z direction, which might be akin to facing south, and
this is also the direction along which our explosive parabola is
traced. Naturally, we could rotate everything 90 degrees and
the code would look much the same, modulo some
judicious permuting of x, y and z and +/-, though your house
would look a bit funny built on its side. Things get complicated

Quick tip
The trebuchet code was inspired by the amazing Martin O'Hanlon
and his Pi-based projects on www.stuffaboutcode.com.

Swiss cheese, baby! (Your house may need some repairs after this tutorial.)

Quick tip
You can do all the coding here in the interpreter, but copying errors
are frustrating. Thus it might be easier to put it all in a file called
house.py which you can execute with python house.py while Minecraft
is running.

if we want to shed the yoke of these grids and right angles, to
work instead with angles of our choosing. The problem is how
to approximate a straight line when our fundamental units are
blocks of fixed orientation rather than points.
A general three-dimensional drawline() function will prove
invaluable in your subsequent creations, enabling you to
create diverse configurations from parallelepipeds to
pentagrams. What is required is a 3D version of the classical
Bresenham algorithm. Pi guru Martin O'Hanlon's GitHub
contains several wonderful Minecraft: Pi Edition projects,
including a mighty cannon from which this beginner project
takes its inspiration. Martin has a whole Python drawing class,
which includes the aforementioned 3D line algorithm, but
once you understand the 2D version the generalisation is
reasonably straightforward.
Let us imagine we are in a Flatland-style Minecraft world in
which we wish to approximate the line in the (x,y) plane
connecting the points (-2,-2) and (4,1). This line has the
equation y = 0.5x - 1. The algorithm requires that the gradient
of the line is between 0 and 1, so in this case we are fine. If we
wanted a line with a different slope, then we could flip the axes
in such a way as to make it conform. The crux of the
algorithm is the fact that our pixel line will fill only one pixel
(block) per column, but multiple pixels per row. Thus as we
rasterize pixel by pixel in the x direction, our y co-ordinate will
either stay the same or increment by 1. Some naive Python
would then be:
dx = x1 - x0
dy = y1 - y0
y = y0
error = 0
grad = float(dy) / dx  # float() avoids integer division in Python 2
for x in range(x0, x1 + 1):
    plot(x,y)
    error = error + grad
    if error >= 0.5:
        y += 1
        error -= 1
where plot() is some imaginary plotting function and grad is
between 0 and 1. Thus we increment y whenever our error
term accumulates sufficiently, and the result is the image
which meets your gaze.
Bresenham's trick was to reduce all the calculations to
integer operations, which were far more amenable to 1960s
hardware. Nowadays we can do floating point calculations at
great speed, but it is still nice to appreciate these novel hacks.
The floating point variables grad and error arise due to the
division by dx, so if we multiply everything by this quantity
and work around this scaling, then we are good to go.
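One way the scaling might be carried out is sketched below: every quantity is multiplied by 2 * dx, so that neither grad nor the 0.5 threshold needs a fraction. This is our own rearrangement for illustration, not necessarily Bresenham's exact formulation, and line_points() is a made-up name. It only handles gradients between 0 and 1 with x0 no greater than x1.

```python
def line_points(x0, y0, x1, y1):
    # Integer-only line drawing for gradients between 0 and 1, x0 <= x1.
    # Every quantity from the floating-point version is scaled by 2 * dx,
    # so the test "error >= 0.5" becomes "error >= dx".
    dx = x1 - x0
    dy = y1 - y0
    y = y0
    error = 0
    points = []
    for x in range(x0, x1 + 1):
        points.append((x, y))
        error += 2 * dy
        if error >= dx:
            y += 1
            error -= 2 * dx
    return points


print(line_points(-2, -2, 4, 1))
```

For the example line from (-2,-2) to (4,1) this visits one block per column, stepping up in y on every second column.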
To get this working in three dimensions is not so much of
an abstractive jump: we first find which is the dominant axis
(the one with the largest change in co-ordinates) and flip
things around accordingly. We then move along the dominant axis
one block at a time, incrementing the co-ordinates of the
minor axes as required. We have to pay careful attention to
the sign of each co-ordinate change, which we store in the
variable ds. The ZSGN() function returns 1, -1 or 0 if its
argument is positive, negative or zero respectively; I have left
coding this as an exercise for the reader. We make extensive
use of a helper function minorList(a,j) which returns a copy
of the list a with the jth entry removed. We can code this using
a one-liner thanks to lambda functions and list slicing:
minorList = lambda a,j: a[:j] + a[j + 1:]
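Both helpers are small enough to try in the interpreter. The ZSGN() below is only one possible answer to the exercise, relying on Python treating True and False as 1 and 0; minorList is repeated so that the snippet runs on its own.

```python
def ZSGN(a):
    # Return 1, -1 or 0 according to the sign of a.
    return (a > 0) - (a < 0)


minorList = lambda a, j: a[:j] + a[j + 1:]

print([ZSGN(7), ZSGN(-3), ZSGN(0)])
print(minorList([10, 20, 30], 1))
```

Because slicing always builds a new list, minorList() leaves the list it was given unchanged.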
Our function getLine() will take two vertices, which we will
represent using three-element lists, and return a list of all the
vertices in the resulting 3D line. All of this is based on Martin's
code, for which we should all be grateful. The first part of it
initialises our vertex list and deals with the easy case where
both input vertices are the same...

Double door details

Putting doors into our house is our first
encounter with the additional blockData
parameter. This is an integer from 0 to 15 and
controls additional properties of blocks, such
as the colour of wool and whether or not TNT
is live. Our door occupies four blocks and is
aligned in the x direction. It's recessed slightly
back from the surrounding walls, closed, and
has the handles helpfully placed towards the
middle. These properties are controlled by
various bits of the blockData. We number the
four bits from the rightmost (least significant)
bit 0 to the leftmost bit 3, so that 8 is
represented in binary as 1000. Bit 3 is
set if the block is part of the top section of a
door. If this is the case, then bit 0 is the only
other bit of concern: it determines the
placement of the handles/hinges. Top sections
of doors thus have blockData 8 or 9.
For the bottom sections we have the
following bit assignments:
bit 3 ........... off
bit 2 ........... door is open
bit 1 ........... door recessed
bit 0 ........... alignment (off=x, on=z)
The top sections must be placed after the
bottom ones, since they inherit their
properties from their inferiors.
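The bit assignments in this box can be turned into a pair of small helpers. This is only a sketch: the function and constant names are invented for illustration and are not part of the Minecraft API, which simply expects the resulting integer as the extra data argument.

```python
# Bit values for bits 3, 2, 1 and 0, as numbered in the box
TOP, OPEN, RECESSED, BIT0 = 8, 4, 2, 1

def bottom_door_data(is_open=False, recessed=False, z_aligned=False):
    """Data value for the bottom half of a door (bit 3 stays off)."""
    data = 0
    if is_open:
        data |= OPEN
    if recessed:
        data |= RECESSED
    if z_aligned:
        data |= BIT0
    return data

def top_door_data(bit0=False):
    """Data value for the top half: bit 3 on, plus bit 0 for handle placement."""
    return TOP | (BIT0 if bit0 else 0)

# The door described above: aligned in x, recessed and closed
house_door = bottom_door_data(recessed=True)
```

Using format(value, '04b') on any of these results shows the four bits in the same left-to-right order as the box.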

Doors are always a good idea for those
wishing to avoid claustrophobia/death.


Here our line is just a single block:
    def getLine(v1, v2):
        # start with an empty list of vertices
        vertices = []
        if v1 == v2:
            vertices.append(v1)
After this it gets a bit ugly: we set up the previously
mentioned list of signs ds, and a list of absolute differences
(multiplied by two) a. The idx = line is technically bad form:
we want to find our dominant axis, thus the index of the
maximum entry in a. Using the index() method together with
max means that we are looping over our list twice, but since
this is such a short list we shan't worry, and it looks much
nicer this way. We refer to the dominant co-ordinates by X
and X2. Our list s is a re-arrangement of ds, with the dominant
co-ordinate at the beginning, and there are some other lists to
keep track of the errors. The variable aX holds twice the
absolute co-ordinate change along our dominant axis.
        else:
            ds = [ZSGN(v2[j] - v1[j]) for j in range(3)]
            a = [abs(v2[j]-v1[j]) << 1 for j in range(3)]
            idx = a.index(max(a))
            X = v1[idx]
            X2 = v2[idx]
            delta = a[idx] >> 1
            s = [ds[idx]] + minorList(ds,idx)
            minor = minorList(v1,idx)
            aminor = minorList(a,idx)
            dminor = [j - delta for j in aminor]
            aX = a[idx]
With all that set up we can delve into our main loop, in
which vertices are added, differences along minor axes
examined, errors recalculated, and major co-ordinates
incremented. Then we return a lovely list of vertices.
            loop = True
            while(loop):
                vertices.append(minor[:idx] + [X] + minor[idx:])
                if X == X2:
                    loop = False
                for j in range(2):
                    if dminor[j] >= 0:
                        minor[j] += s[j + 1]
                        dminor[j] -= aX
                    dminor[j] += aminor[j]
                X += s[0]
        return vertices
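Since the function arrives in three pieces, it may help to see them stitched together so that they can be tried out without a running copy of Minecraft. The version below is a sketch assembled from the fragments in the text. Two details are our own assumptions: ZSGN() is filled in with one possible implementation (the text leaves it as an exercise), and the loop flag is replaced by a plain break.

```python
def ZSGN(a):
    # 1, -1 or 0 depending on the sign of a (one possible implementation)
    return (a > 0) - (a < 0)

minorList = lambda a, j: a[:j] + a[j + 1:]

def getLine(v1, v2):
    """Return the list of [x, y, z] blocks on the line from v1 to v2."""
    vertices = []
    if v1 == v2:
        vertices.append(list(v1))
    else:
        ds = [ZSGN(v2[j] - v1[j]) for j in range(3)]
        a = [abs(v2[j] - v1[j]) << 1 for j in range(3)]
        idx = a.index(max(a))              # dominant axis
        X, X2 = v1[idx], v2[idx]
        delta = a[idx] >> 1
        s = [ds[idx]] + minorList(ds, idx)
        minor = minorList(v1, idx)
        aminor = minorList(a, idx)
        dminor = [j - delta for j in aminor]
        aX = a[idx]
        while True:
            # put the dominant co-ordinate back in its original slot
            vertices.append(minor[:idx] + [X] + minor[idx:])
            if X == X2:
                break
            for j in range(2):
                if dminor[j] >= 0:
                    minor[j] += s[j + 1]
                    dminor[j] -= aX
                dminor[j] += aminor[j]
            X += s[0]
    return vertices

beam = getLine([0, 0, 0], [4, 2, 0])
```

The line always contains one block per step along the dominant axis, so its length is the largest co-ordinate difference plus one.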
To conclude in style, we will test this function by making a
mysterious and precarious beam of wood next to where we
are standing as a fitting testament to your wondrous labours
this day, padawan.
    v1 = mc.player.getTilePos() + minecraft.Vec3(1,1,0)
    v2 = v1 + minecraft.Vec3(5,5,5)
    bline = getLine([v1.x,v1.y,v1.z],[v2.x,v2.y,v2.z])
    for j in bline:
        mc.setBlock(j[0],j[1],j[2],5)
Over the page, we'll look at creating a fully functioning
Minecraft cannon. Boom!

Now we can escape the gridlock and build at
whatever angles our heart desires.

Don't try this at home, kids.

Python

Minecraft:
Build a cannon
Learn some object-oriented Python and, just as importantly, blow more stuff
up as Jonni Bidwell continues his adventures with Minecraft: Pi Edition.
In the previous tutorial, your blockophilic author had some
fun building and demolishing a house, while you learned
about the Minecraft API, the Bresenham algorithm for
drawing blocky lines and how to fiddle with the bits of the
blockData value to make lovely doors and live TNT. If you've
skipped straight to this section, I advise you to go back and
try the trebuchet first. In this outing we'll learn some equally
valuable lessons and continue with the destructive theme,
this time by way of a controllable cannon courtesy of Martin
O'Hanlon (www.stuffaboutcode.com). All the code is on his
site (http://bit.ly/1u9D2bs) and you can run it from its
directory with a simple python minecraft-cannon.py.
The code runs to nearly 400 lines so we won't cover
everything, just the juicy bits. This project serves as a nice
introduction to object oriented programming, so first a few
words on this topic.
In programming parlance, an object is an instantiation of
a class. This definition is, to start with at least, unsatisfactory
at best and meaningless at worst. Think instead of objects as
a family of which all the standard programming constructs
(arrays, data types, functions: pretty much anything) are
distinct species. Thus you have likely already done some work
with objects, possibly without knowing it. A particular object
has some methods associated with it: for example we can
add, subtract, divide and multiply integers; compare and
concatenate strings; slice, curtail and append to lists, and so
on. These methods all come for free when we instantiate an
object. So when we do proper object oriented programming
we make a blueprint detailing our own custom methods for
our own bespoke objects. This blueprint is called a class, and
in Python the methods therein are defined as functions. If you
look at Martin's code, the first class we come across (line 33)
is called MinecraftDrawing. It's fairly lengthy, comprising six
methods, dealing with all the drawing primitives one could
hope for: the drawing of points, lines, spheres and faces.

Quick tip
Try experimenting with the velocity and blast radius
arguments to the cannon.fire() method on line 376 of
minecraft-cannon.py.

Taking advantage of objects

Steve watches nonchalantly as fiery obsidian hell is
unleashed on an unsuspecting, if rashly exposed, tree.

You might wonder at this point why exactly this objectification
of drawings is necessary, as opposed to just having a module
housing all the drawing functions. This is certainly a valid
concern: it is entirely possible to modularise this, but if you
look closely within the class block you will see that many of
the methods share the variable self.mc defined in the special
__init__() method. Thus if you were to extract all these
methods to independent functions, many of them would
have to take an extra mc parameter. For a single variable this
won't hurt, but you can easily imagine how the situation could
deteriorate. Object oriented programming helps us group
variables and functions (or data and behaviour) in a sensible
manner. If nothing else, you will have noted the proliferation of
the word self throughout each class. This name, conventionally
given as the first argument to a method, refers to the object
itself, so that the method has access to all of that object's
properties: the variables self.* specified in the __init__() method.
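The trade-off between the two styles can be seen in miniature below. This is a sketch only: FakeMinecraft is a made-up stand-in that records calls instead of talking to the game, and the Drawing class is far smaller than the real MinecraftDrawing.

```python
class FakeMinecraft:
    """Stand-in for the real connection object; it just records blocks."""
    def __init__(self):
        self.blocks = {}

    def setBlock(self, x, y, z, blockType):
        self.blocks[(x, y, z)] = blockType

# Module style: every function needs the connection passed in.
def drawPoint(mc, x, y, z, blockType):
    mc.setBlock(x, y, z, blockType)

# Class style: the connection is stored once, in __init__().
class Drawing:
    def __init__(self, mc):
        self.mc = mc

    def drawPoint(self, x, y, z, blockType):
        self.mc.setBlock(x, y, z, blockType)

    def drawRow(self, x0, x1, y, z, blockType):
        # methods can call each other without passing mc around
        for x in range(x0, x1 + 1):
            self.drawPoint(x, y, z, blockType)

mc = FakeMinecraft()
drawPoint(mc, 0, 0, 0, 1)      # mc supplied on every call
pen = Drawing(mc)              # mc supplied once
pen.drawRow(1, 3, 0, 0, 5)
```

With one shared variable the difference is cosmetic, but each further piece of shared state would add another parameter to every module-level function.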
The other three classes in the module (MinecraftBullet,
MinecraftCannon and CannonCommands) all define more
than one property, more than justifying their class-ification,
and refer to each other in a coherent manner as required.
The cannon is controlled by a simple command
interpreter provided by the cmd module, which lets you call
functions with arguments by wrapping them in directive
functions. This is a great example of Python taking something
which would otherwise be laborious and tedious and making
it child's play. All that is needed is a class which subclasses
the cmd.Cmd class and contains all the commands you
require the interpreter to understand. These commands are
defined as functions with names of the form do_*(); hence,
for example, the function describing the exit command is
named do_exit(). Any function which returns a true value will
exit the interpreter loop, thus all functions except do_exit() and
do_EOF() (called when Ctrl + D is inputted) don't return
anything. Since we have subclassed cmd.Cmd we have to call
its __init__() method manually to set up the interpreter, and we
also set up a custom introductory message and prompt here,
so that the CannonCommands class begins as follows:
    class CannonCommands(cmd.Cmd):
        def __init__(self):
            cmd.Cmd.__init__(self)
            self.prompt = "Stuffaboutcode.com Cannon >> "
            self.intro = "Minecraft Cannon - www.stuffaboutcode.com"
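To see the do_*() convention at work without the rest of the cannon, here is a minimal stand-alone interpreter. The class name, the tilt command's behaviour and the angle attribute are invented for illustration; only cmd.Cmd, onecmd() and cmdloop() come from the standard library.

```python
import cmd

class DemoCommands(cmd.Cmd):
    def __init__(self):
        cmd.Cmd.__init__(self)
        self.prompt = "Demo >> "
        self.intro = "Demo interpreter"
        self.angle = 0

    def do_tilt(self, line):
        "tilt [degrees]: set a hypothetical elevation angle"
        self.angle = int(line)      # returns nothing, so the loop carries on

    def do_exit(self, line):
        "exit: leave the interpreter"
        return True                 # a true value stops the loop

    def do_EOF(self, line):
        "Ctrl + D also exits"
        return True

shell = DemoCommands()
carry_on = shell.onecmd("tilt 45")  # run a single command, as the loop would
stop = shell.onecmd("exit")
# shell.cmdloop() would start the interactive prompt instead
```

The docstrings double as help text: typing help tilt at the prompt prints the string under do_tilt().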
With all this set up, we have a fully functional command
line, with history, line editing and even bash-like tab
completion. Furthermore, we even get a help command which
will print the docstrings for the do_* functions. "Delightful,
but how do I get my weapon?" I hear you interject, your
enthusiasm perhaps giving way to impatience. This is simply
a case of invoking the start command, which sets up the mc
object (put in the self namespace since it is shared amongst
the whole class) and instantiates a MinecraftCannon object
three blocks away from the player's current position. The
cannon itself is pretty simple: drawCannon() draws a 3x3
wooden base, and drawGun() draws 5 blocks of dark wool in
a line. The reason for having two functions here is that we can
change the angles of azimuth and elevation for the cannon,
necessitating its redrawing. This is achieved from the
interpreter using the commands rotate and tilt respectively,
which in turn call the setDirection() and setAngle() methods
of the cannon class. The coordinates of the end of the cannon
are calculated by considering the point on an appropriately
sized sphere centred at the fuse end of the cannon. Details
are in the Spherical Trigonometry box overleaf; paint over it
if trigonometry triggers youth-related trauma.
When the cannon is tilted or rotated the clearGun()
method is called, which draws over the gun with air blocks.
Then we calculate the new end-point of the cannon as
described in the box, and use mcDrawing.drawLine to draw
the appropriate line. The latter function calls getLine() (the
longest function in the module), which is an implementation
of the Bresenham line algorithm which we talked about last
issue. (If you missed it, don't worry, but if you worry get
yourself a back issue as instructed below.)
Steve wishes his balls were just a bit more destructive.

My first objects
As a gentle introduction to object oriented
programming, let's consider a simple music
library application. Here our objects will be the
library itself and our favourite tracks (whether
that's some rousing Brahms or the latest Israeli
psytrance anthems) and we will have a method
for adding tracks.
    class library:
        def __init__(self):
            self.lib = []
        def add(self,trackobj):
            self.lib += [trackobj]

    class track:
        def __init__(self,artist,title):
            self.artist = artist
            self.title = title
Here our library uses just the standard list
methods, and we bring it into fruition and
populate as follows:
    >>> mylib = library()
    >>> mylib.add(track("Tom Lehrer", "Poisoning pigeons in the park"))
    >>> mylib.add(track("Bill Bailey", "Beautiful ladies in danger"))
It's admittedly not much, but you get the idea.
You can (and should) add another special
method __repr__() which returns how these
objects are displayed; the default
representation is not so helpful:
    >>> mylib.lib[0]
    <__main__.track object at 0x7f12eaad9908>
So inside the track class you could add the
following lines:
        def __repr__(self):
            return("<Track> %s,%s" % (self.artist, self.title))
This will give you a slightly more informative
description in each case.

You have to fire this one manually, but
Steve don't care, he crazy.

So now we come to the best bit: firing the cannon. This
instantiates a MinecraftBullet object with velocity 1 (it moves
at 1 block per tick or hundredth of a second) and blast radius
3 (when it hits something an empty sphere of radius 3 will
result). The bullet itself is just a single block of glowing
obsidian, so drawing it and erasing it are straightforward:
see the one-line methods draw() and clear(). We have to
work out the velocities in three dimensions, which requires
similar trigonometry to that involved in drawing the gun
barrel. We then enter into a while loop, calling bullet.update()
once per tick, which will return True until a collision occurs.
Velocity in the negative y direction (due to gravity) increases
linearly with time, as we see in line 250:
    self.yVelocity = self.yStartVelocity + self.gravity * self.ticks
whereas in the x and z dimensions it remains constant as
friction is not worth bothering with. Velocity is measured in
blocks per tick, so the new position is calculated by
incrementing each component of the old position with the
corresponding component of the velocity (line 253) and then
rounded to integer co-ordinates (line 258). If the projectile is
moving slowly, then the rounding could result in its remaining
in situ across ticks, in which case there's no point wasting
effort redrawing it. We test that this is not the case with
    if matchVec3(newDrawPos, self.drawPos) == False:
and then proceed with the redrawing. This involves a simple
check that the new position is empty space:
    if self.mc.getBlock(newDrawPos.x, newDrawPos.y, newDrawPos.z) == block.AIR:
If this is the case then we disappear the old obsidian block,
update the draw position and redraw:
    self.clear()
    self.drawPos = minecraft.Vec3(newDrawPos.x, newDrawPos.y, newDrawPos.z)
    self.draw()
If not then we make our crater and change movedBullet
to False so that we exit the update loop:
    self.mcDrawing.drawSphere(newDrawPos, self.blastRadius, block.AIR)
    movedBullet = False

Spherical trigonometry
Given an azimuthal (horizontal) angle phi
and an angle of elevation (vertical) theta,
the point on a sphere centred on the
origin and having radius l is calculated by
trigonometry as shown in the diagram.
Here the blue dot represents a point
on the sphere and the black dot its
projection onto the x-z plane. Because
of how the angles are defined, the y
co-ordinate ends up with a slightly
tidier expression than the others,
which we can see in the function
findPointOnSphere() (line 24):
    def findPointOnSphere(cx, cy, cz, radius, phi, theta):
        x = cx + radius * math.cos(math.radians(theta)) * math.cos(math.radians(phi))
        z = cz + radius * math.cos(math.radians(theta)) * math.sin(math.radians(phi))
        y = cy + radius * math.sin(math.radians(theta))
The trig functions require angles to be
converted to radians (we assume that
the rotate and tilt commands take their
angles in degrees). You may recall that
there are exactly pi radians in 180
degrees, and they enjoy the property of
being an entirely dimensionless measure.

Nobody else in the Coding Academy 2015
team respects me for including this image.
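The trigonometry and the bullet's flight can be tried out together away from the game. What follows is a rough sketch rather than the code from minecraft-cannon.py: the function names and the trajectory() helper are our own, gravity is assumed to be a negative number of blocks per tick per tick, and collisions are ignored entirely.

```python
import math

def find_point_on_sphere(cx, cy, cz, radius, phi, theta):
    """Point at azimuth phi and elevation theta (both in degrees)."""
    x = cx + radius * math.cos(math.radians(theta)) * math.cos(math.radians(phi))
    z = cz + radius * math.cos(math.radians(theta)) * math.sin(math.radians(phi))
    y = cy + radius * math.sin(math.radians(theta))
    return x, y, z

def to_block(x, y, z):
    # round a floating point position to whole block co-ordinates
    return (int(round(x)), int(round(y)), int(round(z)))

def trajectory(start, phi, theta, speed, gravity, ticks):
    """Blocks visited over a number of ticks, skipping repeats."""
    # The velocity components are a point on a sphere of radius speed.
    vx, vy0, vz = find_point_on_sphere(0, 0, 0, speed, phi, theta)
    x, y, z = start
    path = [to_block(x, y, z)]
    for tick in range(1, ticks + 1):
        vy = vy0 + gravity * tick      # vertical velocity changes linearly
        x, y, z = x + vx, y + vy, z + vz
        block = to_block(x, y, z)
        if block != path[-1]:          # rounding left us in place: no redraw
            path.append(block)
    return path

path = trajectory((0, 0, 0), 45, 30, 1, -0.05, 20)
```

Lowering the speed argument below 1 makes the skipped ticks more frequent, which is exactly the situation the matchVec3() test guards against.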

Better explosions
Remember in our previous tutorial how we blew all that stuff
up using chain reactions from TNT? Now we're going to
upgrade our cannon fire using similar principles. Since the
only way to detonate TNT is by hitting it or by detonating
another block of TNT in its vicinity, we will have to manually
trigger the explosion, which means that the fire command
will work differently this time around. Specifically, it will now
add a block of TNT to the end of the barrel, leaving poor Steve
to light the blue touch paper and run. We need to monitor this
block and act swiftly when it is struck: a matter of delicate
timing, to be sure. Just before it explodes, we need another
TNT block to appear just in front of it, and so on, creating
the illusion of a moving/exploding/super nashwan fireball of
destruction. The code for this part of the exercise is called
tntcannon.py.
Once Steve hits the TNT and it starts flashing, the
getBlock() method actually detects it as air, which is a
convenient trigger for us to prepare to place the next block
in the chain. This waiting with bated breath is achieved using
the following loop, in which pass is the standard Python
do-nothing statement:
    while self.mc.getBlock(xt, yt, zt) == 46:
        pass
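A loop like this asks the game about the block as fast as Python can manage. One gentler variation, which is our own suggestion and not what tntcannon.py does, is to pause briefly between checks. The sketch below uses a made-up fake_get_block() in place of mc.getBlock() so that it can run without the game; the poll and timeout parameters are likewise hypothetical.

```python
import time

def wait_for_change(get_block, pos, block_id, poll=0.05, timeout=None):
    """Wait until get_block(*pos) stops returning block_id.

    Returns True if the block changed, False if the timeout expired first.
    """
    waited = 0.0
    while get_block(*pos) == block_id:
        if timeout is not None and waited >= timeout:
            return False
        time.sleep(poll)       # pause between checks instead of spinning
        waited += poll
    return True

# Stand-in for mc.getBlock: reports TNT (46) three times, then air (0).
readings = [46, 46, 46, 0]

def fake_get_block(x, y, z):
    return readings.pop(0) if len(readings) > 1 else readings[0]

changed = wait_for_change(fake_get_block, (1, 2, 3), 46, poll=0.001)
```

The price of the pause is a small delay in noticing the ignition, which matters here because the timings that follow are already delicate.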

From this moment of ignition we have a four-second
window in which to get someplace safe and ensure that the
next block is placed in a timely manner. The initial explosion
will instigate a chain reaction, and empirical studies reveal
that this reaction propagates at a rate of about a block every
0.3 seconds, so we shall synchronise the placing of new TNT
with this imaginary metronome. We will need to tweak the
trajectory slightly to avoid duplicates in the path (we
discussed this resulting from rounding a few paragraphs
ago, keep up!), which would otherwise spoil the show.
The upshot of all this is that we don't really have any
control over the velocity of our cannonball: it will travel at
roughly three blocks per second (one block every 0.3 seconds),
give or take. In the y direction, since we
want to preserve our modelling of gravity, we will use a trick
from last issue and draw a column of TNT in the event that
there is significant vertical movement. This way at least the
trajectory will be vaguely accurate even if the projectile
doesn't really accelerate, and moreover appears to elongate
and contract as it moves vertically.
The complete fire() method then looks like this:
    def fire(self, velocity, blastRadius):
        xt, yt, zt = findPointOnSphere(self.baseOfGun.x, self.baseOfGun.y,
                                       self.baseOfGun.z, self.lenghtOfGun,
                                       self.direction, self.angle)
        # draw the TNT trigger
        self.mcDrawing.drawPoint3d(xt, yt, zt, 46, 1)
        # support so that it doesn't fall when hit
        self.mcDrawing.drawPoint3d(xt, yt - 1, zt, block.WOOL.id, 15)
        # wait patiently for trigger
        while self.mc.getBlock(xt, yt, zt) == 46:
            pass
        time.sleep(3.6)
        startPos = minecraft.Vec3(xt, yt, zt)
        tntBullet = MinecraftTNTBullet(self.mc, startPos, self.direction, self.angle, 1)
        while not tntBullet.update():
            time.sleep(0.3)
Even crazy Steve is awed by the power of the new weapon.

Yeah, it don't always work right. If you can
fix it I'll let you paint my fence.

In much the same way as in the earlier part of the tutorial,
we will have an update() method for our object which will
calculate the new position and return False until we hit
something. It looks like this:
    def update(self):
        self.yVelocity += self.gravity
        oldPos = minecraft.Vec3(int(round(self.Posx)), int(round(self.Posy)), int(round(self.Posz)))
        self.Posx += self.xVelocity
        self.Posy += self.yVelocity
        self.Posz += self.zVelocity
        newPos = minecraft.Vec3(int(round(self.Posx)), int(round(self.Posy)), int(round(self.Posz)))
        if newPos != oldPos:
            if self.mc.getBlock(newPos.x, newPos.y, newPos.z) != block.AIR.id:
                return True
            if abs(newPos.x) > 128 or newPos.y > 128 or abs(newPos.z) > 128 or newPos.y < -10:
                # off the edge of the world or deep underground
                return True
            self.mc.setBlocks(newPos.x, newPos.y, newPos.z, newPos.x, oldPos.y, newPos.z, 46, 1)
            height = newPos.y
        return False
Since TNT is quite destructive it is possible that our bullet
could rip through quite a bit of scenery before finally coming
to rest, but we limit the damage by having the update()
method return True when the bullet's altitude drops below
-10. We also do this if it leaves the confines of the Minecraft
world, that is, if any of its coordinates gets larger in magnitude
than 128. Once the fireworks finally die down, we redraw our
cannon since it will have suffered some damage at launch
time, and then return the command prompt so that Steve can
have another pop at some innocent blocks.
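The two checks that end the flight, and the column trick, are easy to separate out and test on their own. The helpers below are illustrative only: their names and default limits are ours, taken from the numbers in update(), and tnt_column() assumes that setBlocks() fills every block between its two corners inclusive.

```python
def out_of_world(x, y, z, limit=128, floor=-10):
    """True once a position leaves the region that update() allows."""
    return abs(x) > limit or y > limit or abs(z) > limit or y < floor

def tnt_column(new_pos, old_y):
    """Blocks covered by a vertical fill from new_pos to height old_y."""
    x, y, z = new_pos
    low, high = min(y, old_y), max(y, old_y)
    return [(x, yy, z) for yy in range(low, high + 1)]

# A bullet that climbed from y=2 to y=5 in one step leaves four blocks of TNT
column = tnt_column((1, 5, 2), 2)
```

The longer the column, the faster the bullet was moving vertically, which is why the projectile appears to stretch and shrink in flight.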
Unfortunately, thanks to the unfathomable forces at
work when so many TNT blocks are exploding at once, there
is an element of randomness in all of this. As a result, blocks
can fly off in all directions or detonate at the wrong time,
culminating in the unfortunate side-effect of stopping the
reaction and leaving half a parabola's worth of TNT in the
sky. The sleep time values on lines 276 and 280 were chosen
in a fairly ad hoc manner, so possibly by tweaking these you
might be able to improve the situation. On the other hand,
no weapon is perfect, and this unreliability is merely a
reflection of this ineluctable truth. Have fun experimenting
and happy coding!
