You are on page 1of 17

Lecture 2: Arrays, Linked Lists, and Recursion

31/7/2014
Topic: Stack and heap memory

Array
Contiguous uniform storage
Pros:
Constant time access to indexed memory location: O(1)
Memory efficient
Cons:
Resizing requires copying to new array
Worst case for sorted insert is must shuffle entire array

Linked List
Pros:
Insertion: O(1)
Cons:
Does not allow access via index must traverse entire list: O(n)
Uses more memory
Types:
Single linked list
Double linked list
o Easy to insert new node before a certain node

Recursion

Linear recursion
Tail recursion
Binary recursion
Multiple recursion

Lecture 3: Abstract Data Types


31/7/2014

Model of data structure that specifies:


o Type of data stored
o Operations supported and types of parameters
Specifies what each operation does, but not how

In Java: interface
Factory Pattern can be used to allow different ADT implementations to be
instantiated at runtime

Lecture 4 Counting Primitive Operations


4/8/2014
Primitive Operations
Assumed to take constant time
Big-O

f ( n ) is O ( g ( n ) )

f ( n ) cg ( n )

(Big Omega)

Describes a lower bound

Big-

n n0

cg ( n ) f ( n)

for

cn 0 such that

Describes an upper bound

Big-

if there are positive constants

(Big Theta)

c 1 g ( n ) f ( n ) c2 g(n)
Describes a tight bound

f (n) is asymptotically equal to

g(n)

Lecture 5: Asymptotic Analysis


7/8/2014

Lecture 6:
7/8/2014

Lecture 7: The Stack ADT and Array-based Stacks


11/8/2014
The stack ADT:

Insertion/deletion: LIFO
Main operations:
o push(object)
o pop()
Auxiliary operations:
o top()
o size()
o isEmpty()
For empty stack, pop and top return null

Java Virtual Machine (JVM)


Keeps track of chain of active methods with a stack
Performance and limitations
Let n be no. of elements, N >= n fixed sized of array
Space: O(N)
Time: O(1)
Trying to push element into full stack exception
Note: Using linked list instead would give space O(n)

Lecture 8: The Queue ADT and Array-based Queues


14/8/2014
Queue ADT:
Insertion/deletion: FIFO
Main operations:
o enqueue(object)
o dequeue()
Auxiliary operations:
o first()
o size()
o isEmpty()
Array-based queue
Array size N
Two vars keep track of front and size

Array location

r=( f +sz ) mod N

is the first empty slot past the rear of the

queue

List and Iterator ADTs and Amortisation

Growable Array-based array list:


push(o) adds o to the end of list
o When array is full, replace array with a larger one
Incremental strategy: Increase size by constant
Doubling strategy: double the size
o Worst time complexity: O(n). Best: O(1)
Amortized time of a push operation is the average time taken by push: T(n)/n
Turns out doubling strategy is better
Positional list ADT:
Accessor
o first()
o last()
o before(p)
o after(p)
Update
o addFirst(e)
o addLast(e)
o addBefore(p,e)
o addAfter(p,e)
o set(p,e)
o remove(p)
Natural implementation: doubly-linked list
Iterator
next()
hasNext()

Lecture 9: Trees
18/8/2014
Tree: Hierarchical structure of nodes with parent-child relation
Traversal
Preorder traversal
o A node is visited before its descendants

Postorder traversal

A node is visited after its descendants

Binary tree
Each node has at most two children: left child and right child
Proper binary tree: each node has exactly zero or two children
Inorder traversal
o A node is visited after its left subtree and before its right subtree

Lecture 10: Trees continued


22/8/2014
Evaluate Arithmetic Expressions
Euler Tour Traversal
Preorder, postorder, inorder traversals are special cases of this
Walk around tree and visit each node three times

Linked Structure for Trees

Node stores:
o Element
o Parent node
o Sequence of children nodes
For binary tree, node stores:
o Element
o Parent node
o Left child node
o Right child node
Node implement position ADT

Array-Based Binary Tree


Node v is stored at A[rank(v)]
o rank(root) = 0
o Left child: rank(node) = 2 * rank(parent(node)) + 1
o Right child: rank(node) = 2 * rank(parent(node)) + 2

O(2n ) , all nodes on right side

Worst case space:

May be ok to use if every row is filled. Saves a little space by not needing
links

Priority Queue
Priority Queue ADT
Entry: pair(key, value)
Main methods:
o insert(k, v)
o removeMin(): removes and returns entry with smallest key, or null if
empty
Additional methods:
o min()
o size()
o isEmpty()
Ordering

x y y x

Comparability:

Antisymmetric property:

Transitive property:

Entry ADT
Methods:
getKey()
getValue()

x y y x x= y

x y y z x y

Comparator ADT
Compare(x,y) returns i such that:

i< 0 if a<b

i=0 if a=b

i> 0 if a>b

Sequence-based priority queue


Unsorted:
o Insert: O(1)
o removeMin and min: O(n)
Sorted:
o Insert: O(n)
o removeMin and min: O(1)
Selection-Sort
PQ-sort function implemented with unsorted sequence
Running time
o n insert operations takes O(n)

Runs in

O ( n2 )

time

Insertion-Sort
PQ-sort function implemented with sorted sequence

Runs in

O(n2 ) time

In-place insertion-sort
Use swaps

Lecture 11: Heap


25/8/2014
Heap
A binary tree storing keys at nodes and satisfying properties:
o

Heap-Order: for every internal node

other than root,

key ( v ) key ( parent ( v ) )


o

Complete Binary Tree: Fill top to bottom, left to right.


depth

2i

nodes at

The last node of a heap is the rightmost node of maximum depth

Height of heap:

O ( log n )

Insertion: Insert after last node, then restore heap-order by swapping with
parent
Removal: Replace root with last node, then set new last node. Restore
heap-order (downheap)
Updating the last node:
o If previous left node is left child, go to right child. Otherwise:
o Go up from previous last node until left child or root
o If left child is reached, to the right child
o Go down left until a leaf is reached
o If root is reached, go left until new row
o

Have to traverse height twice at most, so

n
log
O

Heap-Sort

O ( n)

Space:

Insert and removeMin:

Size, isEmpty, min:

Heap-sort:

O ( log n )

O ( 1)

O ( n log n )

Array-based heap implementation


Can represent heap with n keys by array of length n
For node at rank I, left child 2i + 1, right child 2i + 2
Merging two heaps
Merge two heaps and key k by storing k as root with the two heaps as
subtrees, then perform downheap
Bottom-up heap construction

Constructs a heap from n given keys in

O(n)

time

Lecture 12: Adaptable Priority Queue


29/08/2014
Adaptable Priority Queue ADT:
replaceKey(e, k)
- returns previous key of entry e
replaceValue(e, v) - returns previous value

Location-aware list

Each entry stores:

o
o
o

key
value
position (or rank)

Maps and Hashtables


Map
Searchable collection of key-value entries
Multiple entries with the same key are not allowed
Map

ADT
get(k)
put(k,v)
remove(k)
size(), isEmpty()
entrySet()
keyset()
values()

Hash function
Usually specified as composition of hash code and compression:

keys integers [0, N1]

To minimise collisions when combining sequence of components, use


polynomial accumulation, e.g.

p ( z )=ao +a 1 z +a2 z 2+ z= prime number e . g . 33

Polynomials can be evaluated in

O(n)

time with Horners rule

Lecture 13
1/9/2014
Collision handling
Separate chaining: let each cell in table point to linked list of entries
Open addressing: place colliding item in different cell
Linear probing
o Place colliding item in next (circularly) available table cell
o Future collisions longer sequence of probes (inspections)
o remove(k): if entry (k,o) is found replace with DEFUNCT and return
element o, else return null
Double hashing
o Have secondary hash function d(k)

If collision, place in first available cell at

( h ( k ) + jd ( k ) ) mod N , j=1,2,3,
Performance of hashing

Worst case search/insert/remove on hash table is

O(n)

(i.e.

everything hashes to same spot)

=n /N

Load factor:

Assuming hash values are like random numbers, expected number of


probes for an insertion with open addressing is

1/(1 )

O(1)

Settle for average

In practice hashing is very fast provided load factor is not close to 100%

Set, Multiset, Multimap, Skip Lists


10/9/2014

Set
Set ADT

S T

addAll(T):

retainAll(T):

removeAll(T):

S T

ST

Implementations
Sorted list
o

Generic merging: Runs in


(>,<,==) run in

O ( n A +n B )

provided auxiliary methods

O ( 1)

Generic Merging

Auxiliary methods aIsLess, bIsLess, bothAreEqual can be replaced for e.g.


intersection, union

Runs in

O ( n A +n B )

provided auxiliary methods run in

Multimap
Similar to map, but can have multiple entries with same key

Skip Lists

O(1)

A data structure for maps that uses a randomized insertion algorithm

A series of lists

Sh

S0 , S1 , , Sh

contains only the two keys

E.g. search for 78

Implementation
Quad-node: stores entry, link to nodes prev, next, below, above
Define special keys PLUS_INF, MINUS_INF
Space Usage
h

Expected number of nodes:

2ni n 21i 2 n O ( n )
i=0
i=0

Height

With high probability, height is

log n
O )

Time
Search time proportional to number of drop-downs + number of scanforwards.

O(log n) , scan-forward also

Drop-down bounded by height:

Expected search, insertion and deletion time is

O(log n)

O(log n)

(with high

probability)

Binary search

Can perform nearest neighbour queries on an ordered map that is


implemented with an array, sorted by key
At each step, the number of candidate items is halved

O(log n)

Search tables
Ordered map implemented by sorted sequence

O ( log n )

Search using binary search:

Insert/remove:

Good for small maps or when search is the most common operation

O ( n)

Binary search tree


Binary tree storing keys/key-value entries at its internal nodes

Let u be In left subtree of v, w in right subtree.

External nodes do not store items

key ( u ) v key ( w )

Deletion

Performance
For ordered map with

Space used is

O ( n)

items implemented by binary search tree with height

Get, put and remove take

Height is

O(n)

O ( h ) time

in worst case (spread in a line) and

O(log n)

in best

case

Search Tree
8/9/2014

AVL Tree

Binary search tree with height balance: for every internal node v, heights
of children of v can differ by at most 1

n
log
O

Height of AVL tree storing

Fixing height-balance property after insertion: trinode restructuring

Search Trees
11/9/2014

Multi-way search tree

keys is

(2, 4) tree
A multi-way search tree with:
o Node-Size property: Every internal node has at most four children
o Depth property: All external nodes have the same depth

h log ( n+1 )

search takes

O(log n)

Splay tree
18/10/2014

Is a binary search tree, but after each operation do splaying


Does not enforce logarithmic upper bound on tree height
o

Worst-case time complexity of search, delete and inserts is

Amortised time complexity is

O(n)

O ( log n )

Efficient for looking up frequently accessed elements thanks to splaying

Splaying
Move a node to root using rotation
Performed after all operations
See slides for
o Which node to splay
o Zig/zag operations. Note: easier to understand with actual numbers
for nodes
Splaying costs
o O(h) rotations, each O(1). Still O(n) worst-case
o Amortised cost is O(log n)
Time T(n) needed to perform a series of n splay operations,
divided by n, i.e. T(n)/n

Red-Black Trees
18/9/2014

A red-black tree is a representation of a (2,4) tree by means of a binary


search tree whose nodes are coloured red or black
o Has same logarithmic worst-case asymptotic time complexity for
search, insert, removal
o Simpler implementation with a single node type
Is a binary search tree with:
o Root property: root is black
o External property: every leaf is black

o
o

Internal property: the children of red node are black


Depth property: All leaves have the same black depth

Height
Height O(log n)
o Proof: height is at most twice the height of associated (2,4) tree,
which is O(log n)
Search works the same way as for binary search tree. O(log n)
Insertion
Colour the newly inserted node red, unless root.
If there is double red,

Sorting & Selection


22/9/2014
Merge-sort
Divide and conquer
Like heap-sort, has O(n log n) time
Unlike heap-sort
o Does not use an auxiliary priority queue
o It accesses data in a sequential manner (suitable for sorting data on
a disk)
Algorithm
o

Divide: partition

into

S1

each.

Recur: recursively merge-sort

Conquer: merge

S 1 and

S2

and

S1

S2

and

of about n/2 elements

S2

into a unique sorted sequence

Merging two sorted sequences implemented by doubly linked lists, each


length n/2, is O(n). Height is O(log n). At each recursive call the sequence
length is halved. Thus total running time of merge-sort is O(n log n)

Quick-Sort

Is a randomised sorting algorithm based on divide-and-conquer paradigm


Algorithm
o Divide: pick a random element x (called pivot) and partition S into
L: elements less than x
E: elements equal x
G: elements greater than x
Note: each insert/remove is at beginning/end of sequence, so is
O(1). Thus partition takes O(n)
o Recur: sort L and G
o Conquer: join L, E and G

Worst-case running time is when pivot is min/max

Expected height is O(log n), work done at nodes of same depth is O(n)
Expected running time of quick-sort is O(n log n)

O(n )

In-place quick-sort
See slides

Shortest Paths
13/10/2014
Dijkstras Algorithm
Solves single-source shortest path problem
Similar to uniform cost search, but no goal vertex
Used on weighted graph with non-negative edge weights

Grow a cloud of vertices starting at source vertex s. Each step add


vertex with smallest distance label d

Edge relaxation:

min { d ( z ) , d (u )+ weight ( e ) } d ( z )

Text Processing Tries


20/10/2014
Trie

Trie: A compact data structure for representing a set of strings, e.g. all the
words in a text
Supports pattern matching queries in time proportional to the pattern size
Standard trie: ordered tree with each node labelled with a character.
Children alphabetically ordered
Compressed trie: Every internal node is at least degree 2. Obtained from
standard trie by compressing chains of redundant nodes.
See slides for time complexity

Compact representation
Compact representation of compressed trie, store ranges of indices at
nodes instead of strings.
Suffix Trie