
CS323 Lecture: Language Evaluation Criteria

Last revised 1/6/09

Objectives:

1. To consider criteria for evaluating programming languages.

Introduction

   A. The design and evaluation of programming languages is a challenging area because - as we shall see - there is no such thing as a "best" language. Instead, existing languages are strong by some criteria and weak by others, so that the choice of a language for a particular purpose is tied to a decision as to which criteria are most important.

   B. We will now consider a number of criteria which can be used in developing a language design, or evaluating an existing one. The criteria listed below come from books by Alan Tucker and Ellis Horowitz, which have good chapters on this topic. I have grouped them somewhat differently, though, under three main categories:

      1. Criteria Relating to Ease of Using a Language
      2. Criteria Relating to Software Engineering
      3. Criteria Relating to Performance

   C. You should apply these criteria when you are doing language evaluations as part of your Mini-Projects.

I. Criteria Relating to Ease of Using a Language

   A. Programming languages are used by programmers to write programs. Thus, a good language should make it easy for a programmer to express what needs to be done. Several criteria contribute to making a language easy to use.

   B. The first criterion we will consider is WELL-DEFINEDNESS. Both the syntax and the semantics of the language should be clearly defined.

      1. Syntax answers the question "What forms does the language allow?"

         a. This is important, so that a programmer knows how to construct statements that will be accepted by the compiler. If the syntax definition is ambiguous, then the programmer may have to resort to trial and error. Even worse, different compilers for the same language may differ in their interpretation of an ambiguity, leading to portability problems.

         b. There are a number of notation systems which can be used to spell out the syntax of a language. (We will study these later.) Any one of these systems can be used to spell out syntax unambiguously, though some are more readable for humans than others are.

      2. Semantics answers the question "What does this form mean?"

         a. The importance of this for the programmer is obvious. Ambiguity here may again force the programmer to resort to trial and error.

            Example: a classic example of ambiguous semantics is the "dangling else" problem: should a construct of the form

               if (B1) if (B2) S1 else S2

            [ where B1 and B2 are boolean expressions and S1 and S2 are statements ] be interpreted as:

               if (B1)
                   if (B2)
                       S1
                   else
                       S2          -- else goes w/second if

            or as

               if (B1)
                   if (B2)
                       S1
               else
                   S2              -- else goes w/first if
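The two readings can be tried out in C. What follows is a minimal sketch, not part of the original notes: the function names and the return codes 0/1/2 are made up for illustration. It suggests that, without braces, a C compiler pairs the else with the nearest if, and that braces are needed to get the other reading. (Many compilers emit a warning for the first function, which is itself a comment on the construct.)

```c
#include <assert.h>

/* Returns 1 if S1 ran, 2 if S2 ran, 0 if neither ran (illustrative codes). */
int nearest_if(int b1, int b2)
{
    int ran = 0;
    if (b1)
        if (b2)
            ran = 1;    /* S1 */
        else            /* no braces: C pairs this else with if (b2) */
            ran = 2;    /* S2 */
    return ran;
}

/* Braces force the other reading: the else now belongs to if (b1). */
int outer_if(int b1, int b2)
{
    int ran = 0;
    if (b1) {
        if (b2)
            ran = 1;    /* S1 */
    } else {
        ran = 2;        /* S2 */
    }
    return ran;
}

/* Self-check: the two readings agree only when both conditions are true. */
void check_dangling_else(void)
{
    assert(nearest_if(1, 1) == 1 && outer_if(1, 1) == 1);
    assert(nearest_if(1, 0) == 2 && outer_if(1, 0) == 0);
    assert(nearest_if(0, 0) == 0 && outer_if(0, 0) == 2);
}
```

Calling check_dangling_else() from a test driver exercises both functions; the cases where b1 is false are the ones where a misread else silently changes behavior.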

            - Obviously, the problem can always be avoided by using braces (or their equivalent) around the inner if. But this is inconvenient, and programmers don't usually do this.
            - ALGOL handled this by forbidding the "then" part of an if from being another if. Should a programmer want a construct like this, he would be FORCED to use begin .. end.
            - Some newer languages require an explicit "end if" to terminate an if .. then .. else, also avoiding the problem (e.g. FORTRAN77, Modula-2, Ada).
            - Most languages resolve the ambiguity as Java does, by matching the else with the NEAREST if that has no else (the first interpretation above). But this rule is often not clearly stated in the language manual (though in the case of Java it is, but in a strange way!)

         b. Further, if the semantics of a language construct are ambiguous, and two compilers interpret the ambiguity differently, then when a program is ported from one compiler to another it may no longer run correctly - an even worse problem than that arising from a syntactic ambiguity, since the compiler will give the programmer no indication of the problem.

            Example: in a boolean expression like (i < max) && (x[i] > 0), is the second comparison done if the first fails? This is an important question if the second comparison would cause a run-time error when the first comparison is false.
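C is one language whose definition settles this question: the right operand of && is evaluated only when the left operand is nonzero. The sketch below is an illustration only; the helper name positive_at and its parameters are assumptions, not something from the notes.

```c
#include <assert.h>
#include <stddef.h>

/* Is x[i] a valid, positive element?  max is the number of elements in x. */
int positive_at(const int *x, size_t max, size_t i)
{
    assert(x != NULL || max == 0);   /* a null array is only allowed when empty */

    /* && short-circuits: x[i] is read only after i < max has succeeded,
       so an out-of-range index never touches memory outside the array. */
    return (i < max) && (x[i] > 0);
}
```

Had the language left the evaluation of the second operand unspecified, the call with an out-of-range index would be the one that behaves differently from compiler to compiler.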

            - Java addresses this question by saying that, in a case like this, the second comparison is NEVER done if the first fails.
            - Ada addresses this question by saying that both comparisons are ALWAYS done. It provides a different construct for use when one wants to avoid doing the second comparison: (i < max) and then (x[i] > 0)
            - Some other languages leave this question unresolved, and different compilers may produce different results. (In fact, in some cases the SAME compiler may handle this differently in different contexts - this led to some interesting problems with our old Pascal compiler!)

         c. Here, unfortunately, the descriptive tools are not as well developed as they are for syntax. We will survey some formal tools later; but often semantics are simply described in English.

   C. Another criterion is CONSISTENCY WITH COMMONLY USED NOTATION, or what Tucker calls EXPRESSIVITY. This point can be illustrated best by looking at some violations of this criterion.

      1. In writing mathematical expressions, the normal practice is to write the arithmetic operators in infix form - e.g. we write

            x + 1      (infix)

         instead of

            + x 1      (prefix)     or     x 1 +      (postfix)

         Most programming languages adhere to this convention, though some don't - notably LISP (prefix) and FORTH (postfix). This makes programs in these languages harder to read and write - though one eventually gets used to it.

      2. Again, in writing mathematical expressions, certain conventions are normally understood with regard to operator precedence. For example, 3 * x + 2 is normally understood to mean (3 * x) + 2. Most programming languages adhere to conventional rules of operator precedence, but some do not. For example, in APL the unparenthesized expression would be interpreted as 3 * (x + 2).

      3. In conventional mathematical notation, the = operator means the ASSERTION that two things are equal, as distinct from the assignment operation, which MAKES two things equal.

         a. Thus, Algol (and its descendants through Pascal) used = for comparison and := for assignment.

         b. Some other languages use = for both purposes, leading to possible misreadings. (Examples: BASIC, COBOL.)

         c. Other languages use = for assignment and something else for comparison, contrary to conventional mathematical usage. (Examples: FORTRAN, and the C descendants of ALGOL, including Java.)

   D. A language should have good facilities for INPUT-OUTPUT.

      1. This turns out to be one of the hardest parts of a language to design, because there is an inevitable dependence here on distinctive characteristics of different hardware devices. Facilities for interactive IO to terminals, for example, must be somewhat different from those for reading/writing disk files.

      2. One issue that languages handle quite differently is whether the IO facilities of a language should be part of the language definition itself, or provided by library procedures.

         a. FORTRAN, COBOL, and a number of others take the first approach, with the language including READ and WRITE statements.

         b. C and its descendants (including Java) and Ada (among others) take the second approach. IO operations are done by calling library procedures. A basic IO library is furnished with the language system, but a programmer is free to extend it to handle special needs.

      3. Another issue is facilities for FORMATTING output. COBOL is very strong on this count, and others (like FORTRAN) include good facilities, while other languages are quite weak. (For example, formatted output is very hard to do in Java.)

   E. A language should also be UNIFORM. That is, similar constructs should have similar meanings. Again, this can be illustrated by a counter-example:

      1. In the C family of languages (including Java), parameters to functions are normally passed by value.
         Thus, given the C function definition

            int f(int x)
            {
                x = 2 * x;
                return x;
            }

         and a call to the function

            int a = 2;
            int b = f(a);
            // a still has the value 2 here

         the assignment to the formal parameter x in f has no effect on the actual parameter a.

      2. But if the parameter is an array, then it is effectively passed by reference instead (what is actually passed is a pointer to its first element). Thus, given the very similar function definition:

            int f(int x[])
            {
                x[0] = 2 * x[0];
                return x[0];
            }

         and a call to the function

            int a[2];
            a[0] = 2;
            int b = f(a);
            // The value of a[0] is now 4!

         the assignment to the formal parameter x in f WILL ALTER the first element of the actual parameter a.

   F. A similar concept is that a language should be ORTHOGONAL. An orthogonal language is one which has a limited number of features, each of which can be understood by itself, and which can be combined in any way to produce a variety of results.

      1. This was a major design goal of ALGOL68, but has been less characteristic of most other languages.

      2. As a counter-example, consider the matter of types and functions in Pascal.

         a. In Pascal, a variable can be declared to be of any type, and a variable of a given type can be assigned a value of that type by an assignment statement.

         b. A function can take parameters of any type, but cannot return values of certain types. (Specifically, it cannot return an array or a record.)

         c. That is, to use Pascal, one must not only learn about types and functions, but must also learn the rule that functions cannot return certain types. These two otherwise unrelated concepts in the language interact in an unusual way. In a truly orthogonal language, functions would be able to return values of any type.

         d. The difficulty can be pictured this way:

                                          Use
                               | Variable  | Parameter  | Return value
               ----------------+-----------+------------+--------------
               Type Scalar     | OK        | OK         | OK
               ----------------+-----------+------------+--------------
                    Structured | OK        | OK         | No!
               ----------------+-----------+------------+--------------

            (In an orthogonal language, all six squares would be OK.)
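C has an irregularity of the same kind as the Pascal one pictured above: a function may return a struct by value, but it may not return an array. The following sketch, with made-up names (struct vec, doubled, N), shows the usual workaround of wrapping the array in a struct. It is offered only as an illustration of non-orthogonality, not as a recommended design.

```c
#include <assert.h>

#define N 3

/* An array wrapped in a struct: structs CAN be passed and returned by value. */
struct vec { int elem[N]; };

/* int doubled_array(const int a[N])[N];
   ^ a declaration like this is not legal C: functions cannot return arrays. */

struct vec doubled(struct vec v)
{
    for (int i = 0; i < N; i++)
        v.elem[i] *= 2;     /* v is a copy, so the caller's struct is untouched */
    return v;
}

/* Self-check: the wrapped array behaves like a by-value parameter. */
void check_doubled(void)
{
    struct vec a = { {1, 2, 3} };
    struct vec b = doubled(a);
    assert(b.elem[0] == 2 && b.elem[1] == 4 && b.elem[2] == 6);
    assert(a.elem[0] == 1);   /* unlike a bare array argument, a is unchanged */
}
```

The programmer has to learn three things here (arrays, structs, functions) plus the special rules for how they combine, which is exactly the cost the orthogonality criterion is about.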

   G. A debatable criterion is whether a language should be GENERAL - i.e. capable of tackling any type of problem.

      1. Carrying this too far can lead to failure. For example, in the late 1960's, IBM promoted a language called PL/I (Programming Language One) that was intended to replace FORTRAN, COBOL and ALGOL, among others, by incorporating facilities that would allow one to do everything one could do with FORTRAN and COBOL, with the elegance of ALGOL. This attempt, however, failed miserably; and though PL/I is still in use, few use it.

      2. In fact, some of the most useful languages are those that are specifically designed for a particular class of problems - e.g. those designed for programming numerically-controlled machine tools, or solving civil engineering problems, or the like.

      3. However, to gain wide use, a language does have to have broad usefulness for different kinds of problems, so generality is generally a good thing!

   H. Finally, a language should have good PEDAGOGY - it should be easy to learn.

      1. Several of the features we have already considered contribute to pedagogy - e.g. consistency with commonly used notation, uniformity, orthogonality.

      2. On the other hand, abundance of features tends to make a language hard to learn.

         a. That is, generality can sometimes conflict with pedagogy.

         b. However, it is possible to achieve a good balance between the two. One of the reasons for Pascal's popularity as a teaching language is that it is powerful, but relatively small. One can master the entire language in an introductory course sequence.

      3. For larger languages, one approach that has sometimes been taken to pedagogy is the development of subsets - i.e. smaller versions of the language that include necessary features while excluding minor ones. Perhaps the most thorough example of this is a set of subsets of a variant of PL/I, known as SP/1, SP/2, SP/3 ...
         - each of which includes more features than the preceding one. In the case of Ada, though, this approach was explicitly ruled out by the Department of Defense, which held the trademark on the name Ada.

II. Criteria Relating to Software Engineering

   A. Beyond ease of use, it is important that a programming language support the development of CORRECT software, even when writing large systems. The next group of criteria we consider pertains to support for good software engineering.

   B. It is important that the language make it difficult to make careless errors that go undetected by the compiler. Horowitz calls this characteristic RELIABILITY.

      1. Many early programming languages - and some recent ones - do not require that a variable be explicitly declared by the programmer.

         a. For example, in FORTRAN, a variable that is not explicitly declared is implicitly declared the first time it is used, with its type being determined by the first letter of its name. (Names beginning with I..N are integers; all others are real.)

         b. Why is this bad? ASK

         c. Consider what happens if a programmer makes a typographical error, misspelling the name of a variable. In a language that requires that all variables be declared, this will almost always be caught by the compiler as an "undeclared identifier" (unless the typo happens to come out the same as another variable). In languages like FORTRAN, though, the compiler will usually not catch such an error.

         d. Of course, it may be argued that the requirement of declaring every variable is an inconvenience for the programmer. But the inconvenience of having to track down a subtle bug due to a mistyped variable is even worse, so the declarations are worth it.

      2. A reliability feature related to the requirement of declaring variables is type checking.

         a. As you know, languages like Java check that the use of a variable is consistent with its declaration. For example, the following will be detected as erroneous:

               boolean b;
               System.out.println(b*2);

         b. Many languages do not do this. For example, consider the following legal FORTRAN program:

                     SUBROUTINE SUB(A)
                     INTEGER A(100)
                     ...
                     DO 10 I = 1,100
               10       A(I) = A(I) + 1
                     END
                     ...
                     REAL B, C
                     ...
                     CALL SUB(B)
                     ...

            The compiler will permit this, and on the first time through the loop the subroutine will treat the bit pattern for the real parameter B as if it were an integer, producing strange results from the addition. Even worse, on subsequent times through the loop it will move past the memory allocated to B. Thus, the operation on A(2) will probably be done on C, and the operations on A(3) .. A(100) on who knows what - perhaps even the code will be damaged!
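The first of these effects, treating the bit pattern of a real as if it were an integer, can be imitated deliberately in C. A C compiler is expected to complain if a pointer to float is passed where a pointer to int is wanted, so this sketch copies the bytes explicitly with memcpy. The helper name add_one_to_bits is hypothetical, and the expected behavior described afterwards assumes a 32-bit IEEE 754 float, which the C language itself does not guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Add 1 to the BIT PATTERN of a float, as if those bits were an integer.
   This mimics what the FORTRAN subroutine does to B on the first iteration. */
float add_one_to_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);   /* assumption: float occupies 32 bits */

    memcpy(&bits, &f, sizeof bits);    /* reinterpret the bytes, do not convert */
    bits += 1;                         /* integer addition on a real's bit pattern */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under the stated assumption, add_one_to_bits(1.0f) does not yield 2.0 but the next representable value just above 1.0, which is the kind of "strange result" the FORTRAN example produces without any warning.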

      3. Finally, the commenting conventions of a language are also a factor in reliability.

         a. Languages tend to approach commenting in one of three ways.

            i. In some languages, comments occupy entire lines unto themselves. For example, FORTRAN uses a C in column 1 to specify a comment line, and COBOL uses a * in column 7 for the same purpose.

            ii. Other languages use pairs of comment delimiters to bracket comments. A comment may start and end anywhere, so you can have comments in the middle of a line or extending over a full line or many lines. For example, Pascal uses (* and *) or { and } this way; C, Java and PROLOG use /* and */, etc.

            iii. Finally, some languages use a symbol to start a comment, which may appear anywhere on the line. The comment occupies the rest of that line - i.e. the end of the line closes the comment. For example, Lisp uses ; this way and Ada uses -- this way:

                  if DISCR < 0.0 then      -- roots are complex
                      S := sqrt(-DISCR)
                      ...

            iv. Also, there are languages that use multiple approaches - e.g. Java supports "remainder of line" comments using // in addition to bracketed /* ... */ comments.

         b. What convention is most reliable? Obviously, one will get differences of opinion on this point, but one approach does have the edge.

            i. The approach taken by FORTRAN and COBOL tends to discourage commenting, because a comment occupies an entire line. In particular, it is hard to associate a comment about what a variable is used for with its declaration - a good practice.

            ii. The approach taken by Java and many other languages suffers from the danger of mistyping the closing comment bracket. For example, almost everyone at some time has added some comments to a working program. If one mistypes the closing bracket (e.g. as * / or the like), then all of the code between that bracket and the end of the NEXT comment is "commented out". If the code happens to still be syntactically correct, a working program may cease to work without warning.

            iii. The designers of Ada studied typical errors made by programmers, and concluded that the convention they adopted was the most reliable.

   C. Another software engineering consideration is the language's support for MODULARITY. A large software project is typically constructed of modules, each of which interfaces with the rest of the system in certain well-defined ways.

      1. A subroutine facility, such as that found in FORTRAN or COBOL, or a block-structured procedure facility, as in ALGOL or Pascal, is one way to address this need; but such facilities are not as good as they might be.

      2. Some languages, such as Modula-2 and Ada, provide more sophisticated features to support modular software, as we shall see later in the case of Ada.

      3. One of the great strengths of object orientation is the modularity inherent in the notion of a class.

   D. A closely related issue is SUPPORT FOR SEPARATE COMPILATION.

      1. For small programs, it is common for the entire program to reside in a single file that is compiled as a single unit. But for larger programs, it is almost essential to allow the program to be spread over multiple files (perhaps 1000's) compiled separately. In this way, when a change is made, only the affected file(s) need to be recompiled.

      2. Many languages support this by adding a separate step to the program build process called LINKING.

         a. Each source file is compiled to produce an object file.

         b. A separate linking step combines all the object files, together with needed code from libraries, into a single executable file.

         c. Example: a command such as gcc or g++ actually invokes both of these steps.

         d. Of course, the longer the program, the more work the linking step has to do.

      3. To make this work, there must be a mechanism whereby a module being compiled can be aware of "public" features of other modules. This is often handled by splitting a module into two files.

         a. Example: the .h and .c/.cc files used by C/C++.

         b. Some languages use a single source file, but the compiler produces two object files - one containing just declaration information and the other the actual code. Other modules need to explicitly import the former, while the linker uses the latter. Examples: Ada, Modula

         c. Some languages include declaration information in a single output file produced by compilation that other modules can use via import. Example: Java .class files

         d. One tricky issue is ensuring that, when the interface of a module is changed, other modules that depend on it are recompiled. There is no totally general solution to this problem, short of a "clean build".
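The C convention of splitting a module into an interface file and an implementation file can be sketched as follows. To keep the example in one piece, both parts are shown in a single listing with comments marking where the split would fall; the file names counter.h and counter.c and all the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* ---- would live in counter.h: the "public" interface other modules see ---- */
void counter_reset(void);
void counter_increment(void);
int  counter_value(void);

/* ---- would live in counter.c: the implementation, compiled separately ---- */

/* static gives the variable internal linkage: the linker will not expose it
   to other object files, so it is private to this module. */
static int count = 0;

void counter_reset(void)
{
    count = 0;
}

void counter_increment(void)
{
    assert(count < INT_MAX);   /* guard against signed overflow */
    count++;
}

int counter_value(void)
{
    return count;
}
```

A client module would include only the declarations. If the body of counter_increment changes, only the implementation file needs recompiling; if a declaration changes, every client that includes the header must be recompiled too, which is the dependency problem noted in item d.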

   E. Another important consideration is the different DATA TYPES AND DATA STRUCTURING FACILITIES available in the language. Ideally, the type structure should be EXTENSIBLE, allowing the programmer to easily create and use new data types to fit the problem at hand.

      1. FORTRAN (through FORTRAN 77) is an example of a language that is particularly weak on this score, having only arrays as structured types - no records or pointer variables. Thus, what would be done with structs/classes in C-like languages has to be done with individual variables in FORTRAN, and linked structures can only be implemented by using arrays of nodes - dynamic storage allocation is not possible. There is no facility for declaring new data types, either. A number of other languages share this shortfall, including APL and BASIC.

      2. Most languages include at least arrays, records, and pointers as structuring facilities. Also, they typically include a data type creation facility, though the operations available on user-created types are limited. Thus, user-created types are "second-class citizens".

      3. Object-oriented languages such as Java carry this even further, of course.

      4. Some languages (e.g. Ada, C++) even allow the standard operators to be redefined for user-defined data types.

   F. Another consideration that is not as often considered in choosing a language is PROVABILITY - the extent to which the language lends itself to using formal methods to prove the correctness of a program.

      1. As you recall, it is possible to construct a program proof by embedding precondition and postcondition assertions into the program - e.g.

            /* data is a non-empty array of integers */
            max = data[0];
            for (int i = 1; i < data.length; i++)
                if (data[i] > max)
                    max = data[i];
            /* max is the largest element of the array data */

      2. It would be nice if a programming language would make construction of proofs like this fairly straightforward.
         Unfortunately, two characteristics found in many programming languages tend to make constructing proofs difficult.

         a. The goto statement complicates proofs, because one cannot be sure what preconditions apply to a statement if it can be reached in more than one way.

         b. The possibility of two variables being ALIASES for one another complicates proofs - e.g. under some circumstances the postcondition given below might not be valid:

               /* a == a0 && b == b0 */
               a++;
               /* a == a0 + 1 && b == b0 */

            In particular, if a and b are aliases, then the assertion b == b0 no longer holds.

      3. To facilitate proofs, some languages do not have a goto statement (e.g. Java, which not only does not have the goto, but also makes goto a reserved word one cannot use!) and others have sufficient control structure flexibility to make its use almost always unnecessary. A few languages also have mechanisms to prevent aliasing from occurring (though none that we will study in this course).

III. Criteria Relating to Performance

   A. Last on our list - but not unimportant - are criteria relating to how the language performs.

   B. First, we consider the performance of language translators. Ideally, the language should lend itself to FAST COMPILATION.

      1. In general, the more complex the syntax of the language, the longer a program of a given length will take to compile. This was rather dramatically illustrated by the compilers we had on our PDP-11/70. Student projects in Pascal and FORTRAN would compile quite quickly. Programs written in BASIC-PLUS-TWO would take much longer, and COBOL programs of the same length would seem to take forever!

      2. When developing large programs, it is nice to have a separate compilation facility that allows the program to be spread over several files. When a small change is made, only the affected file, and perhaps others that depend on it, need to be recompiled. Most current languages support this.

   C. Also important in many cases (but not all) is that the compiler produce EFFICIENT OBJECT CODE.

      1. In part, this is a matter of compiler technology; but some language features make this harder.

      2. Of course, this goal conflicts with the goal of fast compilation. An optimizing compiler spends extra time during compilation to produce better object code. This is nice for production software, but is not as pleasant during program development.

      3. Some languages are supported by two compiler versions - a fast "checkout" compiler that produces less than optimal code, but which can be used during debugging; and an optimizing compiler that is slower but produces production-quality code. Or, a single compiler may include a command line option to turn optimization on or off, or even to specify the degree of optimization desired - trading off compilation time for execution time.

         Example: PL/I implementations included both a "checkout" compiler and an optimizing compiler.

         Example: GNU C and other compilers include a -O switch with possible values 0 (no optimization) or 1, 2, 3 (increasing degrees of optimization).

         (Unfortunately, sometimes the two compilers, or the one compiler with different optimization settings, don't process the same language constructs in exactly the same way. Of course, this is more of a compiler problem than a language problem per se.)

   D. Last - but by no means least - we consider the matter of MACHINE INDEPENDENCE or PORTABILITY.

      1. One of the original reasons for adopting higher level languages was the desire to be able to move a program from one type of machine to another without rewriting it. To some extent, all higher-level languages achieve this goal; but some do much better than others.

      2. Most important for portability is the existence of a well-defined and accepted language standard.

         a. Many languages have been standardized by formal bodies like ANSI or ISO. For others, the original report by the language author may serve as a standard - though not as strong a one. Some languages have no clear-cut standard, though.

         b. A standard does not help much if different implementers of the language choose to go beyond the standard with various extensions, each in his own way. The classic example of this is BASIC. Despite the existence of an ANSI standard, no two implementations are the same, due to different extensions. (Actually, some also implement less than the standard.) Thus, it is important that the set of features contained in the standard be complete enough to help implementers resist the temptation to add incompatible extensions.

         c. In the case of Ada, the Department of Defense trademarked the name Ada, in order to ensure that ALL implementations handle exactly the same language, thus facilitating portability. (In order to use the name Ada, a compiler must pass a validation test that ensures it handles the language exactly as specified.)

      3. Standardization by itself is not enough, though, even when the standard is adhered to. Certain characteristics of the underlying machine have a way of showing up unavoidably in the implementation.

         a. For example, every machine has a basic word length which determines the range of integers that can be processed by regular machine instructions.

            i. Historically, microprocessor systems often used 16 bit integers, restricting the range of integers to -32768 .. 32767.

            ii. Many systems today use 32 bits, leading to integers ranging from roughly -2 billion to +2 billion.

            iii. Still other systems use 64 bits, leading to integers ranging from roughly -9 quintillion to +9 quintillion (about 9.2 x 10^18 in magnitude).

            iv. A program which relies on the range of integers available on one machine may not run correctly on another machine whose range of values is smaller. (A similar phenomenon arises with the range of values and precision of real numbers.)

         b. Again, different machines use different coding schemes to represent characters, with ASCII, EBCDIC, and Unicode having wide use. This can lead to problems, as follows:

            i. In ASCII, the codes for alphabetic characters are contiguous, without any gaps. This is not true in EBCDIC. For example, the EBCDIC code for I is 201, while that for J is 209. A program that relies on the letters being contiguous - such as a cipher program - will fail if ported to an EBCDIC machine.

            ii. In ASCII, upper case letters have codes 32 less than the corresponding lower case letters. On EBCDIC machines, their codes are 64 greater! A program that converts between lower and upper case letters may have a problem here.

         c. Java addresses these issues by stipulating as part of the standard that byte, short, int, and long are - respectively - 8, 16, 32 and 64 bit integers; and that characters are represented internally using Unicode.
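C leaves the size of int and the character code up to the implementation, but its standard library offers ways to avoid depending on either. The sketch below is only an illustration, with hypothetical helper names: it uses the fixed-width types from <stdint.h> (available since C99 on machines that have such types) so that arithmetic does not depend on the native word length, and toupper from <ctype.h> so that case conversion does not assume that letters differ by 32.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdint.h>

/* Multiply two 32-bit values without relying on how wide plain int is.
   The product of two 32-bit integers always fits in 64 bits. */
int64_t wide_product(int32_t a, int32_t b)
{
    return (int64_t)a * b;     /* widen BEFORE multiplying to avoid overflow */
}

/* Convert a letter to upper case without arithmetic on character codes,
   so the same source works whether the machine uses ASCII or EBCDIC. */
int to_upper_portable(int c)
{
    assert(c >= 0 && c <= UCHAR_MAX);   /* the domain toupper is defined on */
    return toupper(c);
}
```

Writing c - 32 in place of toupper(c) would give the same answer on an ASCII machine and the wrong one on an EBCDIC machine, and writing a * b with plain int operands would be safe only where int happens to be wide enough.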
