In this article, we provide an overview/review of the Java Collections API and how it evolved, concluding with a listing of best practices for using this API in your applications. The Collections API became a part of Java back when JDK 1.2 came on the scene, but many developers are still not familiar with it. Many of us are more than happy to continue using arrays and legacy collection classes like Hashtable and Vector that we are familiar with. The philosophy of "if it ain't broke, don't fix it" seems to hold. But the Collections API offers a structured paradigm for manipulating groups of objects in an ordered fashion, and employs design patterns (e.g., the iterator pattern) that make common programming tasks much simpler. With this in mind, let's review how the Collections API came to be, and offer best practices for its usage in Java applications.
If you are not familiar with the Collections API, or with the notion of collections as a programming construct, then you should definitely read the overview that follows. If you understand the fundamentals of Java Collections, know the difference between a Set, a List, and a Map, and have used these constructs in your own development work, then feel free to skip ahead to the Best Practices section of this article. But... it might still be worth your while to look through the overview anyway to gain some additional insight into the purpose behind the Collections API.
Historical Perspective
Before we go into detail about how these mathematical formalisms are represented in the Java Collections API, let's examine how "legacy" collection classes found in earlier versions of Java tried to provide similar functionality, and where they fall short.
Vector
The Vector class was a more flexible extension of the notion of arrays common in virtually all programming languages. Before the existence of a Vector class, programmers used arrays to hold groups of similar objects. You could access and replace elements of an array by their position (i.e., via an array subscript), but you could not add (or remove) elements from the array without going through various conniptions to do so (usually involving deleting and reallocating the array with a greater or smaller size). The Vector class allowed dynamic addition and removal of members (without going through the aforementioned conniptions), which was a big improvement over arrays. What was lost was type safety: any type of object could be added to a Vector, and there was no requirement that the elements of a Vector all be of the same type. It became the responsibility of the programmer to cast an accessed element of the Vector into its appropriate type.
int initialSize = 10 ;
Vector v = new Vector(initialSize) ;
String blahblah = "blah blah blah" ;
String question = "What was that?" ;
String answer = "It was me!" ;
v.add(blahblah) ;
v.add(question) ;
v.insertElementAt(answer, 0) ;
v.setElementAt("not blah blah anymore", 1) ;
String firstElement = (String) v.elementAt(0) ;
Enumeration
Even with all these changes, the process of traversing the members of a Vector still required the same coding structure as for arrays: a for-loop that walks through the element subscripts, ranging from zero to the size minus one, to access each element by its subscript.
for (int i = 0; i < v.size(); i++) {
    String elementValue = (String) v.elementAt(i) ;
    System.out.println("Element " + i + " = " + elementValue) ;
}
Enter the Enumeration interface, which encapsulated the process of walking through a group of entities in order according to the well-known Iterator pattern. The elements() method in both the Vector and Hashtable classes returns an Enumeration of the elements they contain. This meant that to traverse the elements of a Vector you could do the following:
Enumeration e = v.elements() ;
while (e.hasMoreElements()) {
    String elementValue = (String) e.nextElement() ;
    // No index variable exists here, so the element is printed on its own
    System.out.println("Element = " + elementValue) ;
}
This example demonstrates some of the efficiencies associated with using Enumerations. Note that there is no need for a "loop variable" to keep track of position within the array or Vector; all of the loop control occurs through the use of the hasMoreElements() and nextElement() methods. Ultimately, though, Enumerations do only half the job, especially when it is Enumerations that are returned by methods like getHeaderNames() in the Servlet API. If a "get" method returns an Enumeration over some collection of objects, all you can do with it is (as the name implies) enumerate through it. You cannot easily test for the presence or absence of an element in the Enumeration without walking through it element by element. Ideally, what should be returned from a method like getHeaderNames() is an object that provides methods that test for presence or absence of a particular element and that return an Iterator to walk through the members if desired: in other words, a Collection object with a well-defined set of semantics for accessing and manipulating elements.
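As a rough sketch of the workaround this implies, an Enumeration can be copied into a List so that callers regain contains() and iterator(). The class and method names below are hypothetical, and sampleHeaderNames() merely stands in for an API call such as getHeaderNames(); Collections.list(Enumeration) is assumed to be available (it was added in J2SE 1.4).

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Vector;

class EnumerationToCollection {

    // Copies whatever the Enumeration yields into a List, so callers can
    // ask questions such as contains() instead of walking it by hand.
    static List toList(Enumeration e) {
        return Collections.list(e);
    }

    // Hypothetical stand-in for an API method that hands back an Enumeration.
    static Enumeration sampleHeaderNames() {
        Vector names = new Vector();
        names.add("Accept");
        names.add("Host");
        names.add("User-Agent");
        return names.elements();
    }

    public static void main(String[] args) {
        List headerNames = toList(sampleHeaderNames());
        System.out.println(headerNames.contains("Host"));   // true
        System.out.println(headerNames.contains("Cookie")); // false
    }
}
```

The copy costs one pass over the Enumeration, which is the same pass a manual search would have needed anyway.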
Hashtable
Mapping functionality prior to the advent of the Java Collections API was provided by the Hashtable class. It included methods for adding or replacing a value in the Hashtable and associating it with a key (put(Object key, Object value)), for retrieving the value associated with a provided key (get(Object key)), for determining whether the Hashtable contains a particular value or key (contains(Object value) and containsKey(Object key)), and for asking how many elements the Hashtable contains (size()). Pretty much all the Java classes that were used for mapping were descendants of the Hashtable class, including the Properties class. (Hashtable extended the abstract Dictionary class.)
Hashtable ht = new Hashtable() ;
String name = "..." ;
CustomClass cc = new CustomClass(...) ;
ht.put(name, cc) ;
...
CustomClass cc2 = (CustomClass) ht.get(name) ;
if (ht.containsKey(name)) { ... }
The Collections API has at its root a Collection interface, which the Set and List interfaces extend. The Collection interface includes methods for addition and removal of elements, as expected. It also includes a contains(Object obj) method, which returns a boolean indicating whether the Collection contains an element equal to the given object, relieving programmers of the need to walk through a collection to determine if it contains a particular element. It also contains other methods to support more elaborate operations and tests, including a clear() method, a size() method, and an isEmpty() method. By definition, every Collection must include an iterator() method that returns an object that implements the Iterator interface, allowing traversal over the Collection's elements. The Iterator interface's hasNext() and next() methods correspond closely to the hasMoreElements() and nextElement() methods of the Enumeration interface. However, Iterators differ from Enumerations in that they also have a remove() method that deletes the element most recently returned by next(). As with the Collections themselves, the remove() method in an Iterator can throw an UnsupportedOperationException if the underlying Collection is unmodifiable.
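To make the difference from Enumeration concrete, here is a minimal sketch of removing elements during traversal with Iterator.remove(). The class and method names are made up for illustration, and raw (pre-generics) types are used to match the rest of this article.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class IteratorRemoveDemo {

    // Removes every String shorter than minLength while iterating.
    // Iterator.remove() deletes the element most recently returned by next().
    static void dropShortStrings(List words, int minLength) {
        Iterator it = words.iterator();
        while (it.hasNext()) {
            String word = (String) it.next();
            if (word.length() < minLength) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List words = new ArrayList();
        words.add("a");
        words.add("list");
        words.add("of");
        words.add("words");
        dropShortStrings(words, 3);
        System.out.println(words); // [list, words]
    }
}
```

Calling words.remove(...) directly inside the loop would instead risk a ConcurrentModificationException on the next call to next(), which is why the removal goes through the Iterator.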
Back and Forth
Lists, in addition to an Iterator, can also produce a ListIterator, which provides bidirectional traversal through a List's elements through the previous() and hasPrevious() methods, as well as providing information about position within the List via the nextIndex() and previousIndex() methods.
The Map interface, while not strictly a "collection" in the mathematical sense, plays well with the other members of the API by providing collection views of a Map's keys and values via the keySet() and values() methods. A Map can also be viewed as a Set of instances of the nested Map.Entry interface, one per key-value pair, and this "set view" is available via the entrySet() method. So although Map does not itself extend Collection, its contents are always reachable as one. Naturally, since these collection views are Collection objects, their elements can be traversed by producing an Iterator through the iterator() method, e.g.:
Iterator i = myMap.keySet().iterator();
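The entry-set view can be walked the same way. The following is an illustrative sketch only: the class name, the describe() helper, and the sample data are invented here, and StringBuilder and Integer.valueOf(int) assume Java 5 or later.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class MapViewsDemo {

    // Builds "key=value" lines by walking the entry-set view of a Map.
    static String describe(Map map) {
        StringBuilder out = new StringBuilder();
        Iterator it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry entry = (Map.Entry) it.next();
            out.append(entry.getKey()).append('=').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map ages = new TreeMap(); // TreeMap so the iteration order is predictable
        ages.put("bob", Integer.valueOf(41));
        ages.put("alice", Integer.valueOf(30));
        System.out.print(describe(ages));
        // alice=30
        // bob=41

        // The key-set view is live: removing a key from it removes the mapping.
        ages.keySet().remove("bob");
        System.out.println(ages.containsKey("bob")); // false
    }
}
```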
The following table summarizes what each of the various interfaces in the Collections API does, and delineates their associated implementations. This is only a summary, and the Java API documentation should be examined for more complete information.
Collection
    Methods: add(Object obj), addAll(Collection coll), remove(Object obj), removeAll(Collection coll), retainAll(Collection coll), contains(Object obj), containsAll(Collection coll), clear(), iterator(), size(), isEmpty(), toArray(), toArray(Object[] array)
    Implementations: AbstractCollection

Set
    Methods: same as Collection

SortedSet
    Methods: same as Set plus comparator(), first(), last(), headSet(Object toElement), tailSet(Object fromElement), subSet(Object fromElement, Object toElement)

List
    Methods: same as Collection plus add(int index, Object obj), addAll(int position, Collection coll), get(int position), set(int position, Object obj), remove(int index), indexOf(Object obj), lastIndexOf(Object obj), listIterator(), listIterator(int startPos)
    Implementations: AbstractSequentialList, ArrayList
    Legacy Implementations: Vector, Stack

Map
    Methods: put(Object key, Object value), putAll(Map map), get(Object key), remove(Object key), containsKey(Object obj), containsValue(Object obj), clear(), keySet(), values(), entrySet(), size(), isEmpty()
    Implementations: AbstractMap, HashMap
    Legacy Implementations: Dictionary, Hashtable

SortedMap
    Methods: same as Map plus comparator(), firstKey(), lastKey(), headMap(Object toKey), tailMap(Object fromKey), subMap(Object fromKey, Object toKey)
    Implementations: TreeMap

Iterator
    Methods: hasNext(), next(), remove()
    Legacy Implementations: Enumeration*

ListIterator
    Methods: same as Iterator plus hasPrevious(), previous(), nextIndex(), previousIndex()
Reminder
Remember that methods associated with modification of a collection (e.g., add(Object obj)) are considered "optional" in a Collection interface implementation: modification operations may be written to throw an UnsupportedOperationException.
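A small sketch of what an "optional" operation looks like in practice, using the unmodifiable wrapper from the Collections utility class. The class name and the tryAdd() helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class OptionalOperationDemo {

    // Tries to add an element and reports whether the collection accepted it.
    static boolean tryAdd(Collection target, Object element) {
        try {
            target.add(element);
            return true;
        } catch (UnsupportedOperationException e) {
            return false; // this implementation does not support add()
        }
    }

    public static void main(String[] args) {
        List modifiable = new ArrayList();
        List readOnly = Collections.unmodifiableList(modifiable);
        System.out.println(tryAdd(modifiable, "x")); // true
        System.out.println(tryAdd(readOnly, "y"));   // false
        System.out.println(readOnly.size());         // 1 (the view reflects the backing list)
    }
}
```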
Best Practices
Now that we've covered the fundamentals of the Java Collections API, let's examine the best practices for making use of it.
Arrays vs. Collections
Using Sets
Using Lists
Using Maps
Using Wrapper Classes
Bulk Methods for Addition, Removal, and Testing
Comparing and Sorting with Comparables and Comparators
Using the Static Utility Methods of the Collections Class
Writing Your Own Custom Classes
Using Collection Objects as Member Variables in JavaBeans
Using Sets
Use a Set when you have a "set" of objects that will not contain duplicates, where you're concerned more about presence/absence of a particular element value than order. In a HashSet, you can add and remove elements at will. Adding an element value that's already there does not change the set. The Set's Iterator will return the element values, but not in any guaranteed order.
Set group = new HashSet() ;
group.add(someValue) ;
group.add(anotherValue) ;
if (group.contains(anotherValue)) {
    // DO WONDROUS THINGS HERE
}
Iterator i = group.iterator() ;
while (i.hasNext()) {
    // Casting always necessary (MyClass is a placeholder for the element type)
    MyClass object = (MyClass) i.next() ;
    System.out.println(object.toString()) ;
}
Use a SortedSet if you want the Set's Iterator to return the element values in a natural sorted order (e.g., alphabetical, numeric). The add(Object obj) method will ensure that the element is put in the "right" place. Note that there is added cost in creating and using a SortedSet, because the addition and removal operations must maintain the ordering (in a TreeSet they take time proportional to the logarithm of the set's size, rather than the near-constant time of a HashSet). The primary implementation of the SortedSet interface is the TreeSet.
Set orderedGroup = new TreeSet() ;
orderedGroup.add("b") ;
orderedGroup.add("a") ;
orderedGroup.add("c") ;
Iterator i = orderedGroup.iterator() ;
while (i.hasNext()) {
    String s = (String) i.next() ;
    System.out.println(s) ;
}
// Result = "a", "b", "c"
You always have the option of taking a non-sorted Set and constructing a TreeSet from it. The Iterator from the resulting object will provide elements in natural sorted order. Remember, though, that there is a cost associated with creating a new object on the fly for this purpose. If you require that members always be iterated in a sorted order (not necessarily the order in which they were entered; use a List for that!), construct a TreeSet from the beginning. (The same principle holds true for Maps: there is a SortedMap interface and a corresponding TreeMap implementation that returns the Map's keys in a sorted order.)
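Both conversions can be sketched in a few lines. The class name, the sortedCopy() helpers, and the sample data are invented for illustration; the only library features relied on are the TreeSet(Collection) and TreeMap(Map) copy constructors.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class SortedViewsDemo {

    // Copies any Set into a TreeSet, which iterates in natural order.
    static SortedSet sortedCopy(Set unsorted) {
        return new TreeSet(unsorted);
    }

    // Copies any Map into a TreeMap, whose keys iterate in natural order.
    static SortedMap sortedCopy(Map unsorted) {
        return new TreeMap(unsorted);
    }

    public static void main(String[] args) {
        Set names = new HashSet();
        names.add("pear");
        names.add("apple");
        names.add("fig");
        System.out.println(sortedCopy(names)); // [apple, fig, pear]

        Map stock = new HashMap();
        stock.put("pear", "3 crates");
        stock.put("apple", "7 crates");
        System.out.println(sortedCopy(stock).firstKey()); // apple
    }
}
```

Each call builds a new sorted structure from scratch, which is the on-the-fly cost mentioned above.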
Using Lists
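Use a List when the order in which an element is added to the collection matters.
List myList = new ArrayList() ;   // any List implementation will do
myList.add("x") ;
myList.add("17b") ;
myList.add("aaa") ;               // starting contents: "x", "17b", "aaa"
myList.add(0, "goodbye") ;        // inserts "goodbye" before "x"
myList.add(0, "hello") ;          // inserts "hello" before "goodbye"
myList.remove(4) ;                // then deletes "aaa"
myList.set(3, "23j") ;            // and replaces "17b" with "23j"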
The Iterator associated with a List returns its items in the order that they were entered (taking into account additions/removals/replacements).
Iterator i = myList.iterator() ;
while (i.hasNext()) {
    String s = (String) i.next() ;
    System.out.println(s) ;
}
System.out.println((String) myList.get(2)) ;
// Result = "hello", "goodbye", "x", "23j", then "x"
Its ListIterator (accessible via the List.listIterator() method) supplies not only the standard hasNext() and next() methods associated with an Iterator, but also hasPrevious() and previous() methods if you want to go back and forth at will. The subList(from, to) method returns a List view of the portion of the original List running from index from (inclusive) to index to (exclusive). You can convert a List into an array using the toArray() method, which returns an Object[]. You can also convert an array to a List using the Arrays.asList(Object[]) method. If there is a good deal of insertion and deletion, use a LinkedList.
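A brief sketch tying these List facilities together. The class name and the reversedCopy() helper are hypothetical; everything else is the standard java.util API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ListToolsDemo {

    // Walks a List from the end to the start using a ListIterator.
    static List reversedCopy(List source) {
        List reversed = new ArrayList();
        // Position the iterator just past the last element, then step backwards.
        ListIterator it = source.listIterator(source.size());
        while (it.hasPrevious()) {
            reversed.add(it.previous());
        }
        return reversed;
    }

    public static void main(String[] args) {
        // Arrays.asList turns an array into a List; the copy makes it resizable.
        List letters = new ArrayList(Arrays.asList(new String[] { "a", "b", "c", "d" }));
        System.out.println(reversedCopy(letters)); // [d, c, b, a]

        List middle = letters.subList(1, 3);       // a view of "b" and "c"
        System.out.println(middle);                // [b, c]
        middle.set(0, "B");                        // writes through to letters
        System.out.println(letters);               // [a, B, c, d]

        Object[] asArray = letters.toArray();
        System.out.println(asArray.length);        // 4
    }
}
```

Because subList returns a view rather than a copy, changes made through it show up in the original List.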
Using Maps
If you want a lookup or mapping, use a Map implementation such as HashMap rather than the legacy Hashtable. Add key-value mappings using the put(key, value) method.
Map myMap = new HashMap() ;
myMap.put("id", new Integer(12345)) ;
myMap.put("name", "John Doe") ;
To traverse through the mappings, first get the keys using the keySet() method, which returns the keys as a Set. Once again, if you have a Map that does not employ natural ordering and you desire to go through the keys in natural sorted order, you can wrap the keys in a TreeSet.
Map myMap = new HashMap() ;
Set keys = new TreeSet(myMap.keySet()) ;
Iterator i = keys.iterator() ;
while (i.hasNext()) {
    // Cast key and value to their actual classes before using them
    Object key = i.next() ;
    Object value = myMap.get(key) ;
}
Only objects can be used as members in Collections. This means that Java primitives like int, long, and char can't be used as members in Collections, since strictly speaking they are not Java objects. Thus, in order to include a primitive in a Collection, it must be "wrapped" in its corresponding wrapper class.
List myList = new Vector(12) ;
int myNumber = 17 ;
myList.add(new Integer(myNumber)) ;

Map myMap = new HashMap() ;
int id = 123456 ;
String name = "Somebody's Name" ;
myMap.put(new Integer(id), name) ;
...
String retrievedName = (String) myMap.get(new Integer(123456)) ;
List availableOffers = ... ;
Set seenOffers = ... ;
// Delete all offers that have been seen from available offer list
availableOffers.removeAll(seenOffers) ;

Set myChosenItems = ... ;
Set requiredItems = ... ;
// Do something if not all required items were chosen
if (! myChosenItems.containsAll(requiredItems)) { ... }

mySet.addAll(someOtherSet) ;
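For readers who want something they can run, here is one possible filling-in of the fragment above. The class name, helper methods, and sample data are all assumptions made for illustration; only removeAll, containsAll, and addAll come from the Collection interface itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BulkOperationsDemo {

    // Returns the offers that have not been seen yet, leaving the inputs untouched.
    static List unseenOffers(List availableOffers, Set seenOffers) {
        List remaining = new ArrayList(availableOffers); // work on a copy
        remaining.removeAll(seenOffers);
        return remaining;
    }

    // True only if every required item was chosen.
    static boolean hasAllRequired(Set chosenItems, Set requiredItems) {
        return chosenItems.containsAll(requiredItems);
    }

    public static void main(String[] args) {
        List available = Arrays.asList(new String[] { "A", "B", "C" });
        Set seen = new HashSet(Arrays.asList(new String[] { "B" }));
        System.out.println(unseenOffers(available, seen)); // [A, C]

        Set chosen = new HashSet(Arrays.asList(new String[] { "x", "y" }));
        Set required = new HashSet(Arrays.asList(new String[] { "x", "z" }));
        System.out.println(hasAllRequired(chosen, required)); // false
        chosen.addAll(required);                              // bulk addition
        System.out.println(hasAllRequired(chosen, required)); // true
    }
}
```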
There are two ways to provide custom sorting of elements in a Collection:
1. create your own custom class, implementing the Comparable interface (and its required compareTo(Object obj) method), as well as overriding the equals(Object obj) and hashCode() methods, or
2. create your own Comparator class that performs a custom comparison between two objects.
By default, sorted collections (e.g., SortedSet) use the compareTo(Object obj) method of each object in the collection to determine ordering (i.e., where each object goes in the sequence). This method essentially compares the value of this object with that of another object. It returns:
o a negative number if this object is less than the other object,
o zero if the two objects are equal, or
o a positive number if this object is greater than the other object.
For this reason, all classes of objects added to sorted Collection objects (or used as keys in sorted Maps) must implement the Comparable interface, unless the collection was constructed with a Comparator. Implementing Comparable essentially requires that this compareTo method be implemented. This interface is implemented by the classes most commonly used in Collections, namely the String class and the various wrapper classes used to enclose Java primitives (e.g., Integer for int). Any custom classes of your own that you intend to use as members of sorted Collections should also implement this interface and provide an implementation of the compareTo method. In addition, you should override your class's equals(Object o) and hashCode() methods to ensure proper behavior of these objects within Collections. Both equals(Object o) and hashCode() have "default" implementations in the java.lang.Object class (from which all other classes are descended), but these implementations are based on object identity (two distinct instances are never equal, even if their contents match), which is rarely what you want for custom classes, especially if those classes are to be used as members of Collections.
As per Java language recommendations, the compareTo method should be "consistent with equals", meaning that for any comparand (other object) that causes an object's compareTo(Object o) method to return zero, the equals(Object o) method should return true, and vice versa. The hashCode() method is used to facilitate storage and lookup of objects stored as members of a hash-based collection (such as a HashSet or HashMap), and thus this method must always return the same value for two objects considered to be "equal" by the equals(Object o) method. You should explicitly implement this method in your custom classes, especially custom Collection classes.
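The three methods fit together roughly as follows. This is a sketch under stated assumptions: the Cheese class, its fields, and its ordering (by name, then sharpness) are invented for illustration, and the raw Comparable interface is used to match the pre-generics style of this article.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical value class whose compareTo, equals and hashCode agree with each other.
class Cheese implements Comparable {
    private final String name;
    private final int sharpness;

    Cheese(String name, int sharpness) {
        this.name = name;
        this.sharpness = sharpness;
    }

    // Natural order: by name, then by sharpness. Returns zero exactly when equals() is true.
    public int compareTo(Object other) {
        Cheese that = (Cheese) other; // ClassCastException for foreign types
        int byName = name.compareTo(that.name);
        if (byName != 0) {
            return byName;
        }
        return sharpness < that.sharpness ? -1 : (sharpness == that.sharpness ? 0 : 1);
    }

    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Cheese)) {
            return false;
        }
        Cheese that = (Cheese) other;
        return name.equals(that.name) && sharpness == that.sharpness;
    }

    // Equal objects must produce equal hash codes, so use the same fields as equals().
    public int hashCode() {
        return 31 * name.hashCode() + sharpness;
    }

    public String toString() {
        return name + "(" + sharpness + ")";
    }
}

class CheeseDemo {
    public static void main(String[] args) {
        Set hashed = new HashSet();
        hashed.add(new Cheese("gouda", 20));
        System.out.println(hashed.contains(new Cheese("gouda", 20))); // true

        Set sorted = new TreeSet();
        sorted.add(new Cheese("gouda", 20));
        sorted.add(new Cheese("cheddar", 80));
        sorted.add(new Cheese("gouda", 20)); // duplicate, ignored
        System.out.println(sorted); // [cheddar(80), gouda(20)]
    }
}
```

Without the equals and hashCode overrides, the HashSet lookup above would print false, because the two "gouda" objects are distinct instances.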
If you want a special kind of sorted ordering (e.g., case insensitive), you can build a custom Comparator class and construct a TreeSet using an instance of that Comparator class as an argument to the constructor.
public class MyCaseInsensitiveComparator implements Comparator {
    public int compare(Object o1, Object o2) throws ClassCastException {
        String s1 = ((String) o1).toLowerCase() ;
        String s2 = ((String) o2).toLowerCase() ;
        return s1.compareTo(s2) ;
    }
}

Set strings = new TreeSet(new MyCaseInsensitiveComparator()) ;
strings.add("acaa") ;
strings.add("aBAa") ;
strings.add("aaaa") ;
Iterator i = strings.iterator() ;
while (i.hasNext()) {
    String s = (String) i.next() ;
    System.out.println(s) ;
}
// Result = "aaaa", "aBAa", "acaa"
Set specialNumbers = new HashSet() ;
specialNumbers.add(new Integer(23)) ;
specialNumbers.add(new Integer(42)) ;
if (specialNumbers.contains(new Integer(42))) { ... }
Because the wrapper classes have equals(Object o) methods (and compareTo(Object o) methods) that return true (or zero) for another object containing the same primitive value, the test above will return true. It is a very good idea to ensure that any custom objects you intend to add to collections (as members, values, or keys) likewise override equals(Object o) and hashCode(), and implement the Comparable interface with its compareTo(Object o) method, to ensure proper behavior of these objects within collections.
Set immutableSet = Collections.unmodifiableSet(mySet) ;
Most Collection classes are by default not synchronized. You can make any Collection object synchronized by enclosing it in a synchronized wrapper (e.g., via Collections.synchronizedMap(Map m)). This essentially wraps the object in another object whose methods are all synchronized and delegate to the original. This is useful when seeking a synchronized Map class that behaves like a Hashtable (which is synchronized while classes like HashMap are not).
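A minimal sketch of the synchronized wrapper in use. The class and method names are hypothetical, and Integer.valueOf(int) assumes Java 5 or later. Note that the wrapper only makes individual calls atomic; iteration spans several calls, so the caller has to hold the wrapper's lock for the duration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class SynchronizedWrapperDemo {

    // Wraps a plain HashMap so that each individual method call is synchronized.
    static Map newSharedMap() {
        return Collections.synchronizedMap(new HashMap());
    }

    // Iteration is a sequence of calls, so lock on the wrapper while traversing.
    static int countEntries(Map shared) {
        int count = 0;
        synchronized (shared) {
            Iterator it = shared.keySet().iterator();
            while (it.hasNext()) {
                it.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        final Map shared = newSharedMap();
        Thread[] writers = new Thread[2];
        for (int t = 0; t < writers.length; t++) {
            final int offset = t * 1000; // each writer uses its own key range
            writers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000; i++) {
                        shared.put(Integer.valueOf(offset + i), "value");
                    }
                }
            });
            writers[t].start();
        }
        for (int t = 0; t < writers.length; t++) {
            writers[t].join();
        }
        System.out.println(countEntries(shared)); // 2000
    }
}
```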
If you want to write your own collection classes, there are two approaches:
1. Extend the abstract collection classes (AbstractSet, AbstractList, AbstractMap), since they have implemented most of the tricky interrelated methods already. Each of these classes leaves only a few specific methods to be implemented in the descendant concrete class.
2. Write a wrapper class by including a collection object as a member variable in your class, then proxying the implementation of all methods required by the collection interface you are using to operate on the underlying member collection object, overriding only when necessary to implement specific behavior.
public class MyPowerfulList implements List {

    private List m_innerList ;

    public MyPowerfulList(List p_list) {
        m_innerList = p_list ;
    }

    public boolean add(Object o) {
        // Delegate to the wrapped list
        return m_innerList.add(o) ;
    }

    ...

}
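The first approach, extending an abstract class, can be sketched like this. IntRangeList is a made-up example class, not part of the API; it relies on the fact that a read-only AbstractList subclass only needs get(int) and size(). Integer.valueOf(int) assumes Java 5 or later.

```java
import java.util.AbstractList;
import java.util.List;

// Hypothetical read-only List of the integers from start (inclusive) to end (exclusive).
class IntRangeList extends AbstractList {
    private final int start;
    private final int end;

    IntRangeList(int start, int end) {
        if (end < start) {
            throw new IllegalArgumentException("end < start");
        }
        this.start = start;
        this.end = end;
    }

    // One of the two methods AbstractList requires for a read-only list.
    public Object get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        return Integer.valueOf(start + index);
    }

    // The other required method.
    public int size() {
        return end - start;
    }
}

class IntRangeListDemo {
    public static void main(String[] args) {
        List range = new IntRangeList(3, 7);
        // toString(), contains() and indexOf() are all inherited.
        System.out.println(range);                              // [3, 4, 5, 6]
        System.out.println(range.contains(Integer.valueOf(5))); // true
        System.out.println(range.indexOf(Integer.valueOf(6)));  // 3
        // add() was not overridden, so AbstractList's default
        // implementation throws UnsupportedOperationException.
    }
}
```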
public class CheeseInformationService {

    private Set m_favoriteCheeses = new TreeSet() ;
    private Map m_cheeseSharpnessMap = new HashMap() ;

    public void addFavoriteCheese(String cheeseName) {
        m_favoriteCheeses.add(cheeseName) ;
    }

    public void removeFavoriteCheese(String cheeseName) {
        m_favoriteCheeses.remove(cheeseName) ;
    }

    public void setCheeseSharpness(String cheeseName, int sharpness) {
        m_cheeseSharpnessMap.put(cheeseName, new Integer(sharpness)) ;
    }

    public int getCheeseSharpness(String cheeseName) {
        return ((Integer) m_cheeseSharpnessMap.get(cheeseName)).intValue() ;
    }

    // BAD - Because it exposes the object to external manipulation
    public Map getCheeseSharpnessMap() { ... }

    // BAD - Replaces the member collection with a new collection object
    public void setFavoriteCheeses(Set p_set) { ... }

    // BETTER - Use xxxxAll methods to control bulk additions/removals
    public void addFavoriteCheeses(Set anotherCheeseSet) {
        m_favoriteCheeses.addAll(anotherCheeseSet) ;
    }
}

CheeseInformationService cis = ... ;
int sharp = cis.getCheeseSharpness("gouda") ;
cis.setCheeseSharpness("cheddar", 80) ;
Encapsulation
Remember that it is better to encapsulate this kind of functionality in a discrete method than to expose the underlying object to programmers and rely on them to perform the manipulations appropriately. Say, for instance, that the specification for setting cheese sharpness changes, requiring that some other operation be performed in addition to setting the value for an entry in the Map (e.g., perhaps a boolean flag needs to be set). Had the Map been exposed, you would now have to search your application's code base for all situations where cheese sharpness was set manually, and modify the code accordingly. You would also have to ensure that, in the future, programmers remember to set that boolean flag when manipulating the underlying Map directly. With an encapsulating setter, the change is made in one place:
public void setCheeseSharpness(String cheeseName, int sharpness) {
    m_cheeseSharpnessMap.put(cheeseName, new Integer(sharpness)) ;
    if (sharpness > 100) {
        m_hasVerySharpCheese = true ;
    }
}
...
cis.setCheeseSharpness("jalapeño pepper jack", 150) ;
public List getMyStuff(Map m) {
    List myList = new Vector() ;
    myList.addAll(m.values()) ;
    return myList.subList(1, myList.size() - 2) ;
}

HashMap hm = new HashMap() ;
...
List stuff = getMyStuff(hm) ;
Normally, inserting an item into a Map requires two arguments: the key and the value. By defining one of the object's attributes as an "ID" (like a primary key in a database table), you can write methods that add it to a Map with one argument. Remember that if the object's id is a primitive like an int, you must convert it to a wrapper object in order to use it as a key in a Map.
private Map m_entityMap = new HashMap() ;

public void addEntityToMap(Entity entity) {
    int id = entity.getId() ;
    m_entityMap.put(new Integer(id), entity) ;
}

public Entity getEntityFromMap(int id) {
    return (Entity) m_entityMap.get(new Integer(id)) ;
}
Collections can, of course, be populated from external sources such as databases. You can use a form of lazy initialization to get the required mapping from the database when the key-value mapping is explicitly requested for the first time, then save it in the Map for subsequent retrievals. In this case the getter is in essence a setter, too (since it "sets" the value when you try to "get" it), but you cannot set a value "manually". (Obviously this is a read-only implementation of this idea; there could be setter functions that modify the contents of this Map and persist them to the database when necessary.)
private Map m_entityMap = new HashMap() ;

public Entity getEntity(int id) {
    Integer idInteger = new Integer(id) ;
    Entity e ;
    if (! m_entityMap.containsKey(idInteger)) {