
Digging Into Ruby Symbols 245/09/Saturday 12h14

Digging Into Ruby Symbols


- published to O'Reilly Ruby Blog on Dec 28, 2005
- re-published here on Jan 01, 2006
Steve Yegge

Lots of people have been discussing symbols in Ruby, and seem to have converged on
the explanation that symbols should be used whenever you're referring to a name (i.e.
an identifier or keyword, essentially), even if you're talking about a hypothetical name
that doesn't really exist in actual code yet.

I think this is the correct idiomatic usage, and it's a pretty good way to explain
symbols. But I also think it's going to feel a bit hollow or contrived to someone
coming to Ruby from a background in (say) JavaScript, Python, or even Java. If I
were them, I'd be thinking: "Um, OK. Intent, intent, intent. Got it. But... isn't a
program-source identifier a fairly abstract notion to reify as a first-class object type,
especially going so far as to give it a special syntax? And did I just use the word
'reify'? Geez."

I mean, Ruby symbols are right up there with numbers, strings, regexps and the like as
first-class lexical entities. I'm guessing that this feels like a really odd decision to a lot
of programmers. They might be comfortable with the "intent" explanation (which,
incidentally, is similar to why I tell people I like tuples in Python so much -- they
help me express tuple-ish intent better than a list). Comfortable, sure, but they're
probably not wholly satisfied. It still smells a little fishy.

Am I right?

I'd like to offer my own humble take on Ruby symbols, in the hope that it'll clear
things up a teeny bit more. Nothing I'm going to say in any way negates what folks
have concluded already, which is that symbols are best viewed as representing names
in program code, not as "lightweight strings".
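Ruby's own reflection methods already treat symbols as names in code. The small sketch below assumes Ruby 1.9 or later, since 1.8's `instance_methods` returned strings, and its single `fee` method exists only for illustration. It shows method names going in and coming back out as Symbol objects:

```ruby
# A throwaway class with one named method.
class BigMeanGiant
  def fee() "FEE" end
end

giant = BigMeanGiant.new

# Reflection methods hand back names as symbols...
p BigMeanGiant.instance_methods(false)  # prints [:fee]

# ...and accept symbols when you refer to a name.
p giant.respond_to?(:fee)               # prints true
p giant.send(:fee)                      # prints "FEE"

# Each name has exactly one Symbol object.
p "fee".to_sym.equal?(:fee)             # prints true
```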

Metaprogramming crash-course
Symbols as first-class objects are an idea that's usually associated with Lisp. I don't
want to force you to learn any Lisp, and I won't show you any Lisp today. But
hopefully I can give you the flavor of how symbols are used in Lisp by describing a
"hole" in Ruby that I hope will be fixed someday.

As a toy example, let's take a look at the following Ruby code, which dynamically
creates four methods and attaches them to an empty holder class, using eval.

#!/usr/bin/env ruby

# define a blank class as a holder for some methods
class BigMeanGiant
end

# Now add some silly-ish methods, using a flavor of eval.
# They're going to be instance methods, because it's as if
# we defined them inline inside the class definition above.
# When invoked, the giant yells the name of the method.
%w(fee fi fo fum).each do |name|
  BigMeanGiant.class_eval <<-EOS
    def #{name}()
      puts 'Giant says: #{name.upcase}!'
    end
  EOS
end

# invoke the methods, just for fun
begin
  g = BigMeanGiant.new
  g.fee
  g.fi
  g.fo
  g.fum
end

When you run this little program, it obligingly prints:

Giant says: FEE!
Giant says: FI!
Giant says: FO!
Giant says: FUM!

This program is roughly the "hello, world" of metaprogramming in Ruby. We've
written some code that generates code on the fly: in our case, four nearly identical
methods on BigMeanGiant called 'fee', 'fi', 'fo', and 'fum'. It's almost the same as if
we'd written the code like this instead:

#!/usr/bin/env ruby

class BigMeanGiant
  def fee() puts "Giant says: FEE!" end
  def fi()  puts "Giant says: FI!" end
  def fo()  puts "Giant says: FO!" end
  def fum() puts "Giant says: FUM!" end
end

# invoke the methods, just for fun
begin
  g = BigMeanGiant.new
  g.fee
  g.fi
  g.fo
  g.fum
end

Running this version of the program has the same output.

What did we do that for?

Although this isn't meant to be a lesson in metaprogramming, let's make sure we're all
on the same page here. The second version is clearer, right? Why would you ever do
the first version?

You almost certainly wouldn't do it in an example this small, but the DRY principle
tells us to avoid duplicating code. You can only get so far with function abstraction.
Without metaprogramming, you can't really compress the BigMeanGiant class much.
You might factor out some of the repetition with a helper function:

class BigMeanGiant
  def say(msg) puts "Giant says: #{msg}!" end
  def fee() say "FEE" end
  def fi()  say "FI" end
  def fo()  say "FO" end
  def fum() say "FUM" end
end

But it's not much of a savings, because you still have to write all the stubs. Imagine
you're writing an HTMLOutputter class, with one method for every HTML tag --
you'll have to write a few dozen stubs, which is more than just annoying. It's also
probably more error-prone, since you'll have so much code it'll be harder to spot
missed tags, duplicated tags, incorrect method bodies, and so on. And if you have to
go back and change them all in some minor way, your refactoring editor may or may
not be able to help, depending on what change you have in mind.

In short, having lots of similar-looking code is a Bad Thing.

To solve problems like this in Java, you either have to build elaborate and inevitably
awkward dispatching infrastructure, or you have to use external code generators, then
hack your build system to know how to generate and then use the generated code.

This, incidentally, is why you so often see generated code in large Java projects -- it's
because Java offers no language-level ways to deal with problems like this. And of
course, this is only one type of problem that's solved elegantly with
metaprogramming; there are many other classes of problem that are equally difficult
to implement cleanly in Java.

OK, we're all on the same page now, right? Generating code on the fly can lead to
cleaner, more maintainable code, assuming you use taste and good judgement and
blah blah blah. You get the idea.
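To return to the HTMLOutputter scenario, here is a minimal sketch of how the class_eval-and-template trick from the giant example removes the stubs. The class name, the four-tag list, and the method shape (take some content, return it wrapped in the tag) are assumptions for illustration. A real outputter would also need attributes, escaping, and many more tags.

```ruby
# Hypothetical outputter: one generated method per tag name.
class HTMLOutputter
  TAGS = %w(em strong li code)

  TAGS.each do |tag|
    # The heredoc is interpolated first, so each generated method
    # has its own tag name baked into its source text.
    class_eval <<-EOS
      def #{tag}(content)
        "<#{tag}>" + content.to_s + "</#{tag}>"
      end
    EOS
  end
end

out = HTMLOutputter.new
puts out.em("hello")        # prints <em>hello</em>
puts out.li(out.code("x"))  # prints <li><code>x</code></li>
```

Adding a tag is now a one-word change to `TAGS`. The trade-off is the one this article goes on to discuss: the tag methods exist only as strings until class_eval runs.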

The example explained

Continuing with my quest to get us all on the same page, let me make sure you
understand the code in the first example. The relevant part is this blob right here:

%w(fee fi fo fum).each do |name|
  BigMeanGiant.class_eval <<-EOS
    def #{name}()
      puts 'Giant says: #{name.upcase}!'
    end
  EOS
end

This weird-looking snippet, interpreted in English, is saying:

1. Make me a list of the strings "fee", "fi", "fo", and "fum".
2. For each one of those strings:
   - Substitute it into another string below, containing a Ruby method
     definition:
     - The first time, use it as the method name.
     - The second time, use it (uppercased) as what the Giant says.
   - Then call class_eval to turn it into a real method on the BigMeanGiant
     class.

Make sense? We're constructing method definitions in a loop, as strings, then passing
them to the Ruby interpreter to attach them to a class. It's not all that different from
putting the code in a Ruby source file, then invoking the interpreter; it's just that we're
controlling the process ourselves at runtime.
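You can pull these steps apart so that each stage is visible on its own. The sketch below builds all four code strings first. It then prints one of them, so you can see exactly what class_eval is going to receive, and only after that hands them to the interpreter. The `templates` variable is just an illustrative name.

```ruby
class BigMeanGiant
end

# Build the list of names and substitute each one into the template.
templates = %w(fee fi fo fum).map do |name|
  <<-EOS
    def #{name}()
      puts 'Giant says: #{name.upcase}!'
    end
  EOS
end

# Peek at the generated source before it becomes real code.
puts templates.first

# Hand each string to the interpreter to attach it to the class.
templates.each { |src| BigMeanGiant.class_eval(src) }

BigMeanGiant.new.fum   # prints Giant says: FUM!
```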

The argument to class_eval is a string. The string contains code. Before class_eval
gets hold of it, it's Pinocchio, wanting to be a Real Boy. class_eval is the fairy that
sends him off to Pleasure Island to be ridiculed and learn valuable lessons, or
whatever the interpreter does in its Big Black Box.

So far, so good. eval seems like a useful thing to have in your language, if you use it
with caution.

Trouble in Paradise
So let's say there's a bug in my generated methods. Maybe the giant isn't saying
anything, or he's saying the wrong thing. Let's say I'm having trouble figuring out the
bug by staring at my code-string, which is really just a template. It's not real code
until the interpreter finishes evaluating it and attaching it to the BigMeanGiant class.

So I fire up the debugger, and step through the code, and immediately notice a few
things:

1. The call to class_eval is atomic. The debugger just steps right over it.
2. Calls to the generated methods are also atomic.
3. I have no way of printing out the generated code.

In other words, your metaprogramming-generated code isn't "first class" in the same
way your normal source code is. It's not visible to the debugger, and it's not available
to other tools either. (For instance, rdoc lets you include the source code in the
generated documentation, but I don't think there's any easy way to have it know about
your eval-generated code.)
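Ruby does have one small built-in mitigation, although it doesn't help with stepping through or printing the generated code. class_eval, like eval, takes an optional file name and starting line number, and Ruby uses them in backtraces for errors raised inside the generated code. The file name in this sketch is made up; any label works.

```ruby
class BigMeanGiant
end

# The second and third arguments label the generated code for backtraces.
# "generated/big_mean_giant.rb" is an arbitrary, made-up name.
BigMeanGiant.class_eval(<<-EOS, "generated/big_mean_giant.rb", 1)
  def grumble()
    raise "the giant is grumpy"
  end
EOS

begin
  BigMeanGiant.new.grumble
rescue RuntimeError => e
  error_location = e.backtrace.first
  puts error_location   # begins with generated/big_mean_giant.rb:2
end
```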

There are some games you can play that might make some of these things achievable.
For instance, you might be able to override class_eval to store the original source
code (after the template substitution) in the class somewhere, and then provide an API
for getting at it for your favorite debugger. But to the best of my knowledge, it's not
something that's supported "out of the box" in Ruby, and it means that working with
generated code is harder than it really needs to be.
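Here is one way that kind of game might look. It is a sketch under stated assumptions, not an existing Ruby facility. `RememberGeneratedSource`, `eval_and_remember`, and `generated_source` are hypothetical names. Rather than overriding class_eval itself, the module wraps it and uses Ruby's real `method_added` hook to note which methods each source string defined.

```ruby
# Hypothetical mix-in (not part of Ruby): remembers the source string that
# defined each generated method, so a tool could ask for it later.
module RememberGeneratedSource
  # Method name (Symbol) => source string it was generated from.
  def generated_sources
    @generated_sources ||= {}
  end

  # Evaluate src with class_eval, marking it as the source being evaluated.
  def eval_and_remember(src)
    @pending_source = src
    class_eval(src)
  ensure
    @pending_source = nil
  end

  # Ruby calls this hook each time an instance method is defined.
  def method_added(name)
    generated_sources[name] = @pending_source if @pending_source
    super
  end

  def generated_source(name)
    generated_sources[name.to_sym]
  end
end

class BigMeanGiant
  extend RememberGeneratedSource
end

%w(fee fi fo fum).each do |name|
  BigMeanGiant.eval_and_remember <<-EOS
    def #{name}()
      puts 'Giant says: #{name.upcase}!'
    end
  EOS
end

puts BigMeanGiant.generated_source(:fo)
```

A debugger or documentation tool would still have to be taught to call `generated_source`. That is exactly the "not out of the box" problem.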

Even if I'm completely mistaken here, and someone comments with a way to print out
a generated method's source code (which would be pretty nifty), the whole experience
still falls remarkably short of the metaprogramming facilities in Lisp.

To clarify, let's peer more closely into the lifecycle of that generated code. There are
some distinct activities that rush right by us in Ruby, things we might actually want
some control over.

We really will make our way to symbols soon, promise.

Constructing the code string

We start with a string, which the first example has in a "here doc" -- one of Ruby's
genuine Perl-isms that you're free to view with suspicion. Python's syntax would be a
triple-quoted string, which I think is nicer, but what's done is done. Here's the string
again:

def #{name}()
  puts 'Giant says: #{name.upcase}!'
end

It could just as easily have been a normal, double-quoted string, even a one-liner:

"def #{name}() puts 'Giant says: #{name.upcase}!' end"

However, because dynamically generated code is notoriously tricky to debug, most of
the time you'll want to format code in template strings as clearly as possible.

I'm calling it a template because Ruby strings can contain inline expressions,
delimited with #{}. In Java you'd use string concatenation, e.g.:

"Giant says: " + getThingGiantSays() + "!"

Python has the printf-like % operator, and other languages have their own approaches.
The Ruby way is probably more readable if the substituted expressions are short;
using something like sprintf (which Ruby also has) will be better if there are long
expressions. Basically you want to do whatever makes the code template look as
much as possible like the code it's going to turn into.
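For a one-line template the two styles produce identical code, so the choice comes down to readability. Here is a quick check using Ruby's `format`, which is a synonym for `sprintf`:

```ruby
name = "fee"

# Inline #{} substitution...
interpolated = "def #{name}() puts 'Giant says: #{name.upcase}!' end"

# ...and sprintf-style substitution via Kernel#format.
formatted = format("def %s() puts 'Giant says: %s!' end", name, name.upcase)

puts interpolated == formatted   # prints true
puts formatted                   # prints def fee() puts 'Giant says: FEE!' end
```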

Here's Secret Observation #1: in Lisp, your code template isn't a string. It's a data
structure that represents the tokenized and partially-parsed code. If Ruby had this
feature, the BigMeanGiant example might look something like this:

%w(fee fi fo fum).each do |name|
  BigMeanGiant.class_eval START_CODE_TEMPLATE
    def #{name}()
      puts 'Giant says: #{name.upcase}!'
    end
  END_CODE_TEMPLATE
end

I put those big START/END tokens there in an attempt to make it clear that what's
inside them is NOT a real boy; it's Pinocchio, and it will take some major Good Fairy
work to make it real code.

But notice that the code inside the template is actually syntax-highlighted properly.
When it was all inside a string (heredoc, double-quoted, or otherwise -- it's still just a
string), it was all highlighted in light blue, which is what my editor tells me Strings
should look like. My editor was nice enough to highlight the substitution expressions
in brown, but you still need to realize they're substituted before the final string is used
as an argument to class_eval. But inside the CODE_TEMPLATE, we know it's
going to be code, so we can invoke the syntax-highlighter on it. Helps you see what's
going on more clearly. And auto-indenting, tagging, and other IDE functions will
work on it. Muuuuuch nicer than code in a string, wouldn't you agree?

Imagine that you could pass around one of those CODE_TEMPLATE doohickeys as
an object, one that actually represented the Pinocchio-code in a way that let you
traverse it and modify it before passing it off to eval. That seems like it could come in
quite handy, and in fact it does. For one thing, it makes it far easier to do meta-
metaprogramming, where you're writing code that generates those code templates. But
at a perhaps more mundane level, it makes it possible to create new syntactic
constructs in the Ruby language.

At this point, some people will cringe and shudder and proclaim: "Evil! What you just
said is Pure Evil!" Lots of programmers, maybe even most of them, are so irrationally
afraid of new syntax that they'd rather leaf through hundreds of pages of similar-
looking object-oriented calls than accept one new syntactic construct. I blogged about
this once, in an article called Language Trickery and EJB. That article actually
managed to convince a bunch of hardcore Java programmers that new syntax might
actually be a useful tool. Maybe it'll convince you too. If not, well, feel free to skip to
the next section.

It would actually take me too far afield to go through a detailed example of how
adding a new syntactic control-flow construct to Ruby could turn into a huge benefit
for your project. Imagine, though, that Ruby didn't have here-docs, and that you were
practically drooling with jealousy over Python's triple-quoted strings. If you're a Java
programmer, and you're not drooling purely out of habit, then you should definitely
drool over multi-line strings. It boggles the mind that they didn't include it as a
language feature, and in Java we wind up doing zillions of manual concatenations to
produce long strings (which usually by then look nothing like the thing they're trying
to represent.) Ah, me.

If Ruby didn't have here-docs, but Ruby had those CODE_TEMPLATE thingies and
one system hook that allowed you to control the evaluation of those templates, then
you could implement here-docs pretty easily. Because the code-to-be is represented
as a data structure, allowing you to quickly and easily filter out the #{}-substitution
elements, you could simply evaluate whatever's inside those elements, and not
evaluate anything else in the template. That's all they really do. And of course (much)
more sophisticated syntactic constructs are also possible, if you put in more work.

That's the kind of thing Lisp programmers do for breakfast before going and writing
their application code. And the funny thing is, it could really be super easy in Ruby --
maybe even easier than in Lisp. It's just that Ruby doesn't support it today.
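Ruby has no CODE_TEMPLATE objects, but you can fake the shape of the idea with a plain array. In this toy sketch, `TEMPLATE`, the `[:subst, ...]` markers, and `expand` are all invented for illustration. The literal parts are still just strings rather than parsed code, so the sketch only mimics the step of evaluating the substitution elements and nothing else:

```ruby
# Toy stand-in for a code template: literal text chunks plus
# [:subst, "expression"] elements marking what should be evaluated.
TEMPLATE = [
  "def ", [:subst, "name"], "() puts 'Giant says: ",
  [:subst, "name.upcase"], "!' end"
]

# Evaluate only the substitution elements, in the given binding,
# and leave every literal chunk untouched.
def expand(template, b)
  template.map do |part|
    if part.is_a?(Array) && part.first == :subst
      eval(part[1], b).to_s
    else
      part
    end
  end.join
end

name = "fum"
code = expand(TEMPLATE, binding)
puts code   # prints def fum() puts 'Giant says: FUM!' end
```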

If the hairs are all standing up on the back of your neck, and you're just recovering
from shock and trying to think of the dirtiest word you could possibly call me, well,
take a few deep breaths, nice and slow. It just means I'm a dog person and you're a cat
person, or something like that. Let's not bite each other. Many people (notably Paul
Graham in "On Lisp") have spent lots of effort explaining how this kind of
programming has to be treated with MUCH more deference and caution than ordinary
API programming. Language extensions and minilanguages can be extremely
powerful and useful -- imagine where we'd be without regular expressions, for
instance -- but they also require tons more care, documentation, and thought than
defining an ordinary function.

You're already sort of doing this kind of "language extension" programming every
time you call eval -- for that matter, you're doing it whenever you invoke a separate
code generator, or open up a class and add stuff to it, or use a tool like yacc or
ANTLR. We're completely surrounded by languages, large and small, all the way
down to the minilanguage you use for ordering coffee at Starbucks. It'd be hard to get
along without them.

Evaluation

Once you have that code template as an actual object, as opposed to a string that you
need to parse yourself, then you could do all sorts of things with it. For one thing, you
could pretty-print it. It's effectively in parse-tree format, so all you'd need to do is
decide the rules for line breaks and spacing between various token types. For another
thing, you could tell your debugger about it, which would allow you to inspect and
step through generated code. And evaluation -- the creation of actual code from your
template -- would no longer be the black box that it is in Ruby today (and in Python,
Perl and JavaScript, for that matter.) More control means more opportunities to
remove DRY violations, and do so in a way that has strong(er) long-term
maintainability characteristics. I mean, you have to admit, not being able to inspect or
step through your generated code makes maintenance a bit of a tricky proposition.
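Later Rubies (1.9 onward) ship Ripper in the standard library. Ripper is a parser you can call from Ruby code, and it turns a code string into nested arrays that you can walk or pretty-print. This sketch collects the leaf tokens of one generated method. `leaf_tokens` is a helper written for this example, and the exact tree shape varies a little between Ruby versions. Ripper is read-only, though. There's no standard way to hand a Ripper tree back to eval, so the evaluation black box remains.

```ruby
require 'ripper'

src = "def fee() puts 'Giant says: FEE!' end"

# Ripper.sexp returns nested arrays such as [:program, [[:def, ...]]].
tree = Ripper.sexp(src)

# Helper for this sketch: gather scanner-event leaves such as
# [:@ident, "fee", [1, 4]] as [type, text] pairs, recursing elsewhere.
def leaf_tokens(node, acc = [])
  return acc unless node.is_a?(Array)
  if node.first.is_a?(Symbol) && node.first.to_s.start_with?("@")
    acc << [node[0], node[1]]
  else
    node.each { |child| leaf_tokens(child, acc) }
  end
  acc
end

p tree.first   # prints :program
leaf_tokens(tree).each { |type, text| puts "#{type}  #{text}" }
```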

(Note: see the important correction Jim Weirich made in the comments section. --
steve)

Symbols at last
Those nonexistent code templates I've been referring to -- that is, objects (collections,
really) that represent snippets of code to be evaluated -- they're really just syntax trees
representing your source code. They're similar to the output you'd get from any
parser, including generated parsers from tools like ANTLR. Or maybe a more familiar
example is the XML DOM -- an object-tree representation of the parsed XML file.
You have to admit, working with a DOM is a lot more convenient than working with
a string containing raw XML. It's a huge difference, and it's a feature Lisp has that
Ruby mostly lacks, at least today. A set of features, really: it's a rich programming
domain.

In a system with first-class syntax trees represented as language entities, in a way that
allows you to interact with the lexer, parser, and evaluator (i.e. different components
of the Ruby interpreter), symbols make a whole lot more sense. A symbol is literally
an object that represents a name in the code tree. If you had a code template snippet
representing this code:

def fum() say "FUM" end

Then your syntax tree would contain a Symbol object for each token in the code
except for the string "FUM" (which would be a String), because that's just a string
and not a source identifier or keyword, and also except for the parens in the arg list,
but that's another long story that we don't have time for today.
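MRI 2.6 and later expose their own parse tree through `RubyVM::AbstractSyntaxTree`, which is documented as MRI-specific and experimental. It lines up with the description above: the names in the code come back as Symbols, and the literal comes back as a String. `collect_leaves` is a helper written for this sketch, and node layouts can change between releases.

```ruby
# MRI-only and experimental; assumes Ruby 2.6 or later.
root = RubyVM::AbstractSyntaxTree.parse(%q{def fum() say "FUM" end})

# Walk every node, sorting leaf children into Symbols and Strings.
def collect_leaves(node, symbols = [], strings = [])
  case node
  when RubyVM::AbstractSyntaxTree::Node
    node.children.each { |child| collect_leaves(child, symbols, strings) }
  when Array
    node.each { |child| collect_leaves(child, symbols, strings) }
  when Symbol
    symbols << node
  when String
    strings << node
  end
  [symbols, strings]
end

symbols, strings = collect_leaves(root)
p symbols   # method names, e.g. :fum and :say
p strings   # the literal "FUM"
```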

So Ruby's symbols are really a placeholder for grand things to come. Ruby is already
a very powerful, capable language, but it has some weaknesses in its ability to process
Ruby code at runtime. Your only real tool today is eval (which comes in several
flavors in Ruby, but that's irrelevant to our discussion), and it's a big black box. Once
your code template is handed over to the Good Fairy, crossing that magical line
between your program and the Ruby interpreter, you've lost it, and what you get back
is effectively an opaque binary blob wrapped in a thin Method (or UnboundMethod,
etc.) class that doesn't remember much about its original symbolic representation.
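You can see how little survives by asking the resulting Method object about itself. In this sketch, the name (a Symbol, fittingly), the owner, the arity, and a file/line location come back. The core Method class, however, has no method that returns the source text that was handed to class_eval.

```ruby
class BigMeanGiant
end

BigMeanGiant.class_eval <<-EOS
  def fee()
    puts 'Giant says: FEE!'
  end
EOS

m = BigMeanGiant.new.method(:fee)

# What survives the trip through class_eval:
p m.name              # prints :fee
p m.owner             # prints BigMeanGiant
p m.arity             # prints 0
p m.source_location   # a [file, line] pair describing the eval, not the code
```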

Well, that went on way too long. Was it helpful?

http://opal.cabochon.com/~stevey/blog-rants/digging-into-ruby-symbols.html