
Safe systems-level programming
C is still the language of choice for programming operating systems and other code that needs to be efficient and manage system resources. However, C code is often afflicted with buffer overflows and other memory corruption vulnerabilities. Some experimental programming languages (such as Vault and Cyclone) aim to give the same power as C while being much safer (eliminating buffer overflows, for instance). A PhD thesis could build on the lessons learnt from those languages to defend against more classes of attacks.

Concurrent programming and security
Some of the most subtle security flaws arise from race conditions in concurrent access to a critical resource. The time-of-check-to-time-of-use (TOCTOU) vulnerability class is a classic example (a minimal sketch appears at the end of this section). Race conditions and other concurrency problems are likely to become more prevalent with the growth of distributed computing and the spread of multi-core processors. A PhD could use types and logics for concurrency to address these problems. My current PhD student Horia Corcalciuc is working in this research area.

Defending against command injection attacks
The most widely known form of command injection attack is the SQL injection attack, the scourge of many websites. However, the phenomenon of command injection is quite general: it arises whenever data can be interpreted as code. Programming language research should be useful for understanding the problem and developing principled means of defence; in particular, formal language theory and type systems are relevant.

Type and effect systems
I have done some work on type and effect systems for control (continuations); it would be interesting to see whether these results fit into a broader picture of computational effects in general, using mathematical tools like modal and classical logics.

Principled access control in programming languages
Java and .NET use stack inspection to control access to system resources. It is not always easy to understand what this mechanism does and whether it correctly protects resources. A number of static access control systems have been proposed that can ensure access control at compile time rather than by dynamic checks. I am interested in type and effect systems for access control, which could be a good thesis topic.
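To make the TOCTOU pattern mentioned above concrete, here is a minimal Haskell sketch (the file path is made up, and the check uses doesFileExist from the directory package; this is an illustration of the flaw, not a fix):

    import Control.Monad (when)
    import System.Directory (doesFileExist)

    -- Time of check: test whether the file exists.
    -- Time of use: read it. Another process may delete or replace the
    -- file in between, so the check guarantees nothing at the use site.
    main :: IO ()
    main = do
      ok <- doesFileExist "/tmp/shared-config"             -- check
      when ok $
        readFile "/tmp/shared-config" >>= putStr           -- use (race window!)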

Type Inference
Hindley-Milner and beyond
Functional programming languages like Standard ML and Haskell use type inference to derive type information from expressions. The inferred types give the user a partial-correctness guarantee for her program. The first well-known and widely used type inference algorithm, algorithm W, is based on the Hindley-Milner type system and is part of the Standard ML programming language [29]. The type system of the Haskell programming language [16] extends the Hindley-Milner system with type and constructor classes. Type inference becomes more difficult but remains fully automatic (i.e., it requires no user annotations).
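As a small illustration (a Haskell sketch for this text, not drawn from the literature below): none of these definitions carries a type annotation, yet the compiler infers the principal types shown in the comments, and the polymorphic identity can be used at two different types.

    -- inferred: identity :: a -> a   (the principal type scheme)
    identity x = x

    -- inferred: double :: Num a => a -> a   (a type-class constraint,
    -- as in Haskell's extension of Hindley-Milner)
    double x = x + x

    -- let-polymorphism (here at the top level): one definition,
    -- two instantiations
    demo :: (Int, Bool)
    demo = (identity (3 :: Int), identity True)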

Literature:
1. Luis Damas and Robin Milner, Principal Type-Schemes for Functional Programs [9]
2. Robin Milner, A Theory of Type Polymorphism in Programming [28]
3. Oukseh Lee and Kwangkeun Yi, Proofs about a Folklore Let-polymorphic Type Inference Algorithm [27]
4. Mark Jones, A System of Constructor Classes: Overloading and Implicit Higher-Order Polymorphism [21]
5. Fritz Henglein, Type Inference with Polymorphic Recursion [20]
6. Tobias Nipkow and Christian Prehofer, Type Checking Type Classes [33]
7. Mark Jones, Typing Haskell in Haskell [22]

Object inference
Although automatic type inference has usually been considered in the context of functional languages, recent research focuses on automated type inference for object languages. The primary goal of object inference is to augment object-oriented languages with flow analyses that catch errors like "method not understood" and other exceptions that are not always caught at compile time.

Literature:
1. Jens Palsberg and Michael Schwartzbach, Object-Oriented Type Systems [35]
2. Jens Palsberg, Efficient Type Inference for Object Types [34]
3. Martin Abadi and Luca Cardelli, A Theory of Objects [1]

Garbage Collection Techniques


Basic Garbage Collection Techniques

Memory management is one of the most notorious sources of bugs in programs. In particular, dereferencing deallocated pointers can lead to exceptions or, worse, to unspecified and unpredictable program behaviour. Since the invention of Lisp in the 1960s, garbage collection has been used as a technique to take the burden of manual memory management off the programmer. Garbage collection was long thought to be very inefficient, even though many algorithms with different space/time trade-offs have been proposed and developed. Thanks to Java, garbage collection has become widely accepted, and this seminar should give the participants an idea of how and why it works (the essence of mark-and-sweep is sketched after the literature list).

Literature:
1. Paul Wilson, Uniprocessor Garbage Collection Techniques [48]
2. Richard Jones and Rafael Lins, Garbage Collection [23]
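The essence of the classical mark-and-sweep technique fits in a few lines. The following Haskell sketch (a deliberately toy heap representation, using the containers package) marks everything reachable from the roots and sweeps away the rest:

    import qualified Data.Map as M
    import qualified Data.Set as S

    type Addr = Int
    type Heap = M.Map Addr [Addr]  -- each object lists the objects it references

    -- Mark: everything transitively reachable from the roots is live.
    mark :: Heap -> [Addr] -> S.Set Addr
    mark heap = go S.empty
      where
        go seen []       = seen
        go seen (a : as)
          | a `S.member` seen = go seen as
          | otherwise         = go (S.insert a seen)
                                   (M.findWithDefault [] a heap ++ as)

    -- Sweep: reclaim every object that was not marked.
    sweep :: S.Set Addr -> Heap -> Heap
    sweep live = M.filterWithKey (\a _ -> a `S.member` live)

    gc :: [Addr] -> Heap -> Heap
    gc roots heap = sweep (mark heap roots) heap

For example, gc [1] (M.fromList [(1,[2]),(2,[]),(3,[2])]) keeps objects 1 and 2 but reclaims the unreachable object 3.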

Generational and distributed GC
Garbage collectors have become very efficient over the last few years. A particularly clever and fast technique is generational garbage collection. The underlying philosophy is that not all dynamically allocated objects have the same space behaviour: some are very short-lived, whereas others stay around for a long time, perhaps the whole lifetime of the program. The HotSpot Java virtual machine implements a variant of generational garbage collection, thereby improving drastically on earlier Java runtime systems. Garbage collection is also possible and necessary in parallel and distributed environments, where more than one process can access and mutate the object space. This makes manual memory management horrendously difficult and error-prone. Especially since the advent of the Internet, a lot of attention has been paid to extending garbage collection algorithms to a distributed setting. (A toy model of the generational idea follows the literature list.)

Literature:
1. The article by Wilson and the book by Jones and Lins (see above)
2. Andrew Appel, Simple Generational Garbage Collection and Fast Allocation [2]
3. David Plainfossé and Marc Shapiro, A Survey of Distributed Garbage Collection Techniques [36]
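A toy Haskell rendering of the generational idea (a sketch under strong simplifying assumptions: liveness is given as an oracle predicate, and every survivor is promoted immediately):

    -- Two generations: a small nursery for fresh objects and an old
    -- generation for objects that survived a minor collection.
    data Gens a = Gens { nursery :: [a], oldGen :: [a] } deriving Show

    -- A minor collection touches only the nursery: dead objects are
    -- reclaimed, survivors are promoted. The (usually much larger)
    -- old generation is not traversed at all -- that is the payoff.
    minorGC :: (a -> Bool) -> Gens a -> Gens a
    minorGC live (Gens young old) = Gens [] (filter live young ++ old)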

Continuations
Continuations and the CPS transformation
The term continuation covers a range of techniques and applications. Basically, a continuation can be understood as a context: the "rest" of the execution, the contents of the stack at some point during execution, a function expressing what comes after the current expression, and so on. Continuations can be captured as first-class function values, for instance in the Scheme programming language [24]. In Scheme, it is possible at any point in the program to catch the continuation of the current execution, name it, move it around and run it at some other point. Continuations are also related to a style of programming where every computation passes its result to a continuation, which is represented explicitly as a function. This transformation from direct style into so-called continuation-passing style (CPS) can be done automatically and is a useful compilation and program transformation technique, since it makes the flow of execution more explicit. Generally, handling and using continuations requires higher-order functions. (A small CPS example follows the literature lists below.)

Literature:
1. Daniel Friedman, Mitchell Wand and Christopher Haynes, Essentials of Programming Languages (chapters 8-10) [14]
2. John Hatcliff and Olivier Danvy, A Generic Account of Continuation-Passing Style [18]
3. Olivier Danvy and Andrzej Filinski, Representing Control: A Study of the CPS Transformation [10]

Applications for continuations
Continuations are a powerful abstraction mechanism, but what can you do with them? Continuations have proven extremely useful in implementing multiprocessing, concurrency and threads. A multiprocessing facility requires three features: elementary exclusion, data protection, and process saving. Continuations seem to be a good tool for implementing the latter two. The paper by Wand describes the implementation of a multiprocessing facility in Scheme using continuations. Draves, Bershad, Rashid and Dean use continuations to implement threads in the Mach 3.0 operating system (a microkernel developed at Carnegie Mellon), and the paper by Olin Shivers discusses the relationship between continuations and threads.

Literature:
1. Mitchell Wand, Continuation-Based Multiprocessing [47]
2. Richard Draves, Brian Bershad, Richard Rashid and Randall Dean, Using Continuations to Implement Thread Management and Communication in Operating Systems [11]
3. Olin Shivers, Continuations and Threads: Expressing Machine Concurrency Directly in Advanced Languages [38]
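To make the CPS transformation concrete, here is a minimal Haskell sketch (for illustration only; the papers above treat the transformation formally). The second factorial passes its result to an explicit continuation k instead of returning it:

    -- direct style
    fact :: Integer -> Integer
    fact 0 = 1
    fact n = n * fact (n - 1)

    -- continuation-passing style: the "rest of the computation" is the
    -- explicit functional argument k, so control flow is fully visible
    factCPS :: Integer -> (Integer -> r) -> r
    factCPS 0 k = k 1
    factCPS n k = factCPS (n - 1) (\r -> k (n * r))

    -- factCPS 5 id == fact 5 == 120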

Macro Languages
Everyone knows C's famous preprocessor, which allows the C hacker to #define all sorts of "neat" and "cool" stuff, usually with the goal of improving efficiency. Another notable goal of macro languages is to provide a language extension facility. Unfortunately, the use of macros in C is rather error-prone, because the meaning of a use of a macro depends on its context of use, whereas the meaning of a function depends only on the context of its definition. This can give rise to subtle bugs (the capture problem is sketched after the literature list). A lot of attention has been paid to how to design a proper macro language, i.e., what kind of features it may have and how its behaviour should be restricted with respect to the actual programming language it is built upon. Although most people agree on the necessity of macro languages, it is astonishing how neglected their design and implementation remain in existing languages.

Literature:
1. Luca Cardelli, Florian Matthes and Martin Abadi, Extensible Syntax with Lexical Scoping [6]
2. William Clinger and Jonathan Rees, Macros That Work [7]
3. Eugene Kohlbecker, Daniel Friedman, Matthias Felleisen and Bruce Duba, Hygienic Macro Expansion [25]
4. Eugene Kohlbecker and Mitchell Wand, Macro-by-Example: Deriving Syntactic Transformations from Their Specifications [26]
5. Claus Brabrand and Michael Schwartzbach, Growing Languages with Metamorphic Syntax Macros [4]
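The capture problem that hygienic macro expansion solves can be shown on a toy object language. In the following Haskell sketch (the AST, the names and the or-macro are all invented for illustration), a macro is expanded by naive substitution and breaks the user's lexical scoping:

    -- a tiny expression language
    data E = Lit Bool | Var String | Let String E E | If E E E

    eval :: [(String, Bool)] -> E -> Bool
    eval _   (Lit b)     = b
    eval env (Var x)     = maybe (error ("unbound " ++ x)) id (lookup x env)
    eval env (Let x e b) = eval ((x, eval env e) : env) b
    eval env (If c t e)  = if eval env c then eval env t else eval env e

    -- unhygienic expansion of (or a b):  let t = a in if t then t else b
    orMacro :: E -> E -> E
    orMacro a b = Let "t" a (If (Var "t") (Var "t") b)

    -- user code:  let t = True in (or False t)   -- should be True
    captured :: Bool
    captured = eval [] (Let "t" (Lit True) (orMacro (Lit False) (Var "t")))
    -- captured == False: the user's t was captured by the macro's binder.
    -- A hygienic expander would rename the macro's t to a fresh name.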

Parametrised Classes aka Mixins


Object-oriented programming, invented in the 60s, properly designed in the 70s and buzzed in the 80s, is usually associated with classes. Although all mainstream OO languages are effectively built on the notion of class, it has been recognised that the class concept suffers from some inherent limitations. One particular problem is the so-called extensibility problem: classes are a perfect match when the data domain is to be extended (sub-classing), but they prove less practical when new functionality is to be added. Recently, the concept of parametrised classes, or mixins, has been proposed to solve exactly that problem. A mixin is simply a class that is parametrised over its superclass. Although relatively unproblematic in a dynamically typed setting, mixins pose some interesting problems when put in a statically typed context. (A small functional encoding of mixins is sketched after the literature list.)

Literature:
1. Robert Bruce Findler and Matthew Flatt, Modular Object-Oriented Programming with Units and Mixins [12]
2. Matthew Flatt, Shriram Krishnamurthi and Matthias Felleisen, Classes and Mixins [13]
3. Gilad Bracha and William Cook, Mixin-Based Inheritance [5]
4. Viviana Bono, Amit Patel and Vitaly Shmatikov, A Core Calculus of Classes and Mixins [3]
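One way to see what "a class parametrised over its superclass" means is the standard functional encoding of classes as generators with open recursion. In this Haskell sketch (the record type and all names are invented for illustration), a mixin is a generator transformer, composed with a base generator and instantiated by tying the knot with fix:

    import Data.Char (toUpper)
    import Data.Function (fix)

    -- an "object" is a record of methods
    data Obj = Obj { greet :: String, shout :: String }

    -- a generator: a class body parametrised over self (late binding)
    base :: Obj -> Obj
    base self = Obj { greet = "hello"
                    , shout = map toUpper (greet self) }  -- calls via self!

    -- a mixin: a generator transformer, i.e. a class parametrised
    -- over its superclass
    excite :: (Obj -> Obj) -> (Obj -> Obj)
    excite super self = (super self) { greet = greet (super self) ++ "!" }

    obj :: Obj
    obj = fix (excite base)
    -- greet obj == "hello!"
    -- shout obj == "HELLO!"  -- base's shout sees the mixin's greet

The late binding through self is the point: the inherited shout method picks up the mixin's overridden greet, just as dynamic dispatch would in a class-based language.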

Generative Programming
When we program, we usually want to calculate a value of some kind, whether this results in interactive word-processing, playing Quake or calculating the particle stream in a supernova. Given appropriate extensions to the programming language, it is possible to generalise the concept of programming in such a way that it allows the user to calculate programs, not just values. In other words, the generative programming idiom adds code to its value domain.

MetaML
One particular approach to generative programming is present in the MetaML system. It extends the Standard ML programming language [29] with annotation constructs that produce code when executed. Moreover, it does this in a type-safe manner, i.e., the generated code can be proven type-safe by the MetaML type checker. Meta-programming has a lot of applications, and it is often used to implement aggressive optimisation techniques. (A toy sketch of the staging idea follows this section's literature lists.)

Literature:
1. Walid Taha and Tim Sheard, MetaML and Multi-Stage Programming with Explicit Annotations [45]
2. Walid Taha and Tim Sheard, Multi-Stage Programming with Explicit Annotations [44]
3. Eugenio Moggi, Walid Taha, Zino Benaissa and Tim Sheard, Idealised MetaML: Simpler, and More Expressive [32]

Partial Evaluation
Although the MetaML approach to generative programming is very flexible and allows extensive user control over program generation, it may be quite laborious to use as a general optimisation technique. Partial evaluation approaches the optimisation problem from a different angle. Instead of expecting manually inserted annotations in the input program, it simply tries to (partially) evaluate as much of the given program as possible when given some (usually not all) of its input. How the actual program is reduced depends on the partial evaluation (aka program specialisation) technique at hand. However, just like MetaML, the result of partial evaluation is a new program.

Literature:
1. Torben Mogensen, Partial Evaluation: Concepts and Applications [30]
2. John Hatcliff, An Introduction to Online and Offline Partial Evaluation Using a Simple Flowchart Language [17]
3. Torben Mogensen and Peter Sestoft, Partial Evaluation (Encyclopedia of Computer Science) [31]
4. Charles Consel and Olivier Danvy, Tutorial Notes on Partial Evaluation [8]

Supercompilation
The kinds of optimisation that can be performed by a partial evaluation system are limited. The power of a program specialiser depends on the level at which it decides whether to perform some static calculation or to produce code. A very powerful program specialisation technique is supercompilation, developed in the Soviet Union by Valentin Turchin long before partial evaluation was known in the West (the original article was in Russian!). A supercompiler starts from a global view of how the program is executed and then generalises just enough to make this supervising and code-generating process finite. A supercompiler is so powerful that it is capable of deriving the very efficient Knuth-Morris-Pratt string matching algorithm automatically from the naive implementation! Needless to say, supercompilation itself is quite expensive.

Literature:
1. Morten Sørensen and Robert Glück, Introduction to Supercompilation [42]
2. Morten Sørensen, Turchin's Supercompiler Revisited [43]
3. Robert Glück and Morten Sørensen, A Road-Map to Metacomputation by Supercompilation [15]
4. Valentin Turchin, The Concept of a Supercompiler [46]
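The flavour of staging can be seen in the classic power example. In this Haskell sketch, a small syntax tree stands in for MetaML's quoted code (this is a toy, not MetaML syntax): the exponent is known statically, so power "stages it away" and generates code for x^n.

    -- "code" represented as a small syntax tree
    data Code = CInt Int | CVar String | CMul Code Code

    render :: Code -> String
    render (CInt i)   = show i
    render (CVar v)   = v
    render (CMul a b) = "(" ++ render a ++ " * " ++ render b ++ ")"

    -- stage the exponent away: given a static n and code for x,
    -- produce code for x^n (what MetaML writes with <...> and ~)
    power :: Int -> Code -> Code
    power 0 _ = CInt 1
    power n x = CMul x (power (n - 1) x)

    -- render (power 3 (CVar "x")) == "(x * (x * (x * 1)))"

A partial evaluator aims at the same residual program, but derives it automatically by specialising the general power function to the static exponent, without any staging annotations.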

Constraint Programming in OZ
Another, perhaps somewhat unexpected, programming idiom is constraint programming. Next to "ordinary" programming, a constraint programming language allows one to express computations in terms of constraints. Applications of constraint programming are found in natural language processing and in combinatorial problems over integers. Another, rather different, application of constraints is the scheduling problem in high-level concurrent programming. The Oz programming language is a concurrent object-oriented programming language with constraints built in, providing for communication and synchronisation of concurrent computations. The Oz/Mozart system (http://www.mozart-oz.org) is an implementation of the Oz language. (A toy constraint-style example follows the literature list.)

Literature:
1. Gert Smolka, Problem Solving with Constraints and Programming [40]
2. Gert Smolka, The Oz Programming Model [39]
3. Martin Henz, Gert Smolka and Jörg Würtz, Object-Oriented Concurrent Constraint Programming in Oz [41]
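To give a feel for the constraint style (only a feel: this Haskell sketch is naive generate-and-test over a finite domain, whereas Oz prunes the search space by constraint propagation), the problem is stated as constraints over the domain rather than as an algorithm:

    -- find digits x, y with x + y == 10 and x < y
    solutions :: [(Int, Int)]
    solutions = [ (x, y) | x <- [0 .. 9], y <- [0 .. 9]
                         , x + y == 10, x < y ]
    -- solutions == [(1,9),(2,8),(3,7),(4,6)]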
