
Array and string

An array is a basic data structure that represents a group of similar elements accessed by index. Arrays can be stored efficiently inside the computer and provide fast access to all their elements. Let us look at the advantages and drawbacks of arrays.

Advantages

No overhead per element. Any element of an array can be accessed in O(1) time by its index.

Drawbacks

The array data structure is not completely dynamic. Many programming languages allow arrays of arbitrary size to be allocated at run time (dynamically allocated arrays), but when this space is used up, a new array of greater size must be allocated and the old data copied to it. Inserting or deleting an element requires shifting O(n) elements on average, where n is the size of the array.

Static and dynamically-allocated arrays


There are two types of arrays, which differ in the method of allocation. A static array has a constant size and exists for the whole time the application is running. A dynamically allocated array is created while the program runs and may be deleted when it is no longer needed. Dynamically allocated arrays can be quite large, even larger than the amount of physical memory (on systems with virtual memory). Still, a dynamically allocated array cannot be resized in place. You can, however, expand an array as follows:

1. Create a new array of bigger size;
2. Copy data from the old array to the new one;
3. Free the memory occupied by the old array.
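The three expansion steps can be sketched in Java roughly as follows. The class name `ArrayGrowth` and the helper `grow` are hypothetical names chosen for illustration, and the sketch assumes the new length is not smaller than the old one. In Java the third step is implicit: the old array is reclaimed by the garbage collector once nothing refers to it.

```java
// A minimal sketch of "expanding" a fixed-size array by replacing it.
class ArrayGrowth {
    // Hypothetical helper: returns a new array of length newLength
    // that starts with the contents of old. Assumes newLength >= old.length.
    static int[] grow(int[] old, int newLength) {
        int[] bigger = new int[newLength];                // 1. create a new, bigger array
        System.arraycopy(old, 0, bigger, 0, old.length);  // 2. copy the old data into it
        return bigger;                                    // 3. the old array becomes garbage
    }

    public static void main(String[] args) {
        int[] data = {1, 5, 7};
        data = grow(data, 6);             // the variable now refers to the new array
        System.out.println(data.length);  // prints 6
    }
}
```

The caller must replace its reference with the returned array; the original array object itself never changes size.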

Fixed-size and dynamic arrays


As mentioned above, arrays cannot be resized; such an array is called a fixed-size array. But we can use a simple trick to construct a dynamic array, which can be resized. The idea is simple. Let us allocate some space for the dynamic array and logically divide it into two parts. One part contains the data and the other is free space. When a new element is added, the free space shrinks, and when an element is removed, it grows. This approach incurs overhead for the free space, but we keep all the advantages of arrays and gain the ability to change size dynamically. Some definitions for this kind of array follow.

A dynamic array has a capacity, which is the maximum number of elements it can contain. It also has a logical size, which indicates how many elements it actually contains. For instance, suppose we would like to find the minimum of the values a user enters. We allocate space to store 15 elements, but the user has entered only 5 numbers. In this example, the capacity of the array is 15 elements, but the logical size is 5 elements. When a dynamic array becomes full, it must be expanded by creating a new, larger array and copying the elements from the old array to the new one. Note that copying contiguous blocks of memory is well supported by hardware and can usually be done very efficiently.

Example. Dynamic array with capacity 10, logical size 5 (the first 5 cells hold the data; the remaining 5 cells are free space):

1 5 7 -8 4 0 -49 15 86 46

NB. Dynamic arrays are most often also dynamically allocated, so that they can be expanded. More information about dynamic arrays and their implementation can be found in the "Dynamic arrays" section below.

Connection with strings


We consider null-terminated strings here. Strings are similar to dynamic arrays, but their logical size is indicated by a null character. Therefore, the capacity is always one element more than the maximum logical size. The logical size of a string is called its length.

Example. ASCII string "Hello!", as represented inside the computer:

Character:  H    e    l    l    o    !    \0
Code:       72   101  108  108  111  33   0
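Java strings are not null-terminated, so the following is only a sketch that imitates the representation above with a byte array. The class `NullTerminated` and its `length` method are hypothetical names introduced for illustration. The length is found by scanning for the first zero byte.

```java
import java.nio.charset.StandardCharsets;

// A minimal sketch of a null-terminated string stored in a byte array.
class NullTerminated {
    // Counts the bytes that precede the first 0 byte. If no terminator
    // is present, the whole buffer is treated as the string.
    static int length(byte[] buffer) {
        int n = 0;
        while (n < buffer.length && buffer[n] != 0) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // "Hello!" followed by the terminating null character
        byte[] hello = {72, 101, 108, 108, 111, 33, 0};
        int len = length(hello);
        System.out.println(len);  // prints 6
        System.out.println(new String(hello, 0, len, StandardCharsets.US_ASCII));  // prints Hello!
    }
}
```

The buffer holds 7 bytes while the length is 6, which matches the rule that the capacity is one element more than the maximum logical size.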

Code snippets
The sample program finds the minimal value among those entered. Note that Java allows only dynamically allocated arrays.

Java
import java.util.Scanner;

public class Arrays {
    public static void main(String[] args) {
        Scanner keyboard = new Scanner(System.in);
        // dynamically allocated array
        int arr[] = new int[15];
        int n = 0;
        int value = 0;
        System.out.println("Enter values. Type \"-1\" to stop: ");
        while (n < 15 && value != -1) {
            value = keyboard.nextInt();
            keyboard.nextLine();
            if (value != -1) {
                arr[n] = value;
                n++;
            }
        }
        if (n == 0) {
            System.out.println("You have entered no values, bye!");
        } else {
            int minimum = arr[0];
            for (int i = 1; i < n; i++) {
                if (arr[i] < minimum)
                    minimum = arr[i];
            }
            System.out.print("The minimal value is " + minimum);
        }
    }
}

C++
#include <iostream>

using namespace std;

int main() {
    // static array
    int arr[15];
    int n = 0;
    int value = 0;
    cout << "Enter values. Type \"-1\" to stop: ";
    while (n < 15 && value != -1) {
        cin >> value;
        if (value != -1) {
            arr[n] = value;
            n++;
        }
    }
    if (n == 0) {
        cout << "You have entered no values, bye!";
    } else {
        int minimum = arr[0];
        for (int i = 1; i < n; i++) {
            if (arr[i] < minimum)
                minimum = arr[i];
        }
        cout << "The minimal value is " << minimum;
    }
    return 0;
}

Dynamic arrays
One of the problems that arises when working with the array data structure is that its size cannot be changed while the program runs. There is no straightforward solution, but we can encapsulate capacity management.

Internal representation
The idea is simple. The application allocates some amount of memory and logically divides it into two parts. One part contains the data and the other is free space. Initially all allocated space is free. While the data structure is in use, the boundary between the used and free parts moves. If there is no more free space to use, the storage is expanded by creating a new, larger array and copying the old contents to the new location. The dynamic array data structure has the following fields:

storage: dynamically allocated space to store data;
capacity value: size of the storage;
size value: size of the real data.

Functions

Capacity management: Ensure Capacity, Pack
Data access functions: Set, Get, InsertAt, RemoveAt

Code snippets
Java
There is no need to store capacity in Java. Use storage.length to get it.
public class DynamicArray {
    private int[] storage;
    private int size;

    public DynamicArray() {
        storage = new int[10];
        size = 0;
    }

    public DynamicArray(int capacity) {
        storage = new int[capacity];
        size = 0;
    }
}

C++
class DynamicArray {
private:
    int size;
    int capacity;
    int *storage;
public:
    DynamicArray() {
        capacity = 10;
        size = 0;
        storage = new int[capacity];
    }

    DynamicArray(int capacity) {
        this->capacity = capacity;
        size = 0;
        storage = new int[capacity];
    }

    ~DynamicArray() {
        delete[] storage;
    }
};

Capacity management: Ensure Capacity, Pack


The capacity management mechanism should be developed first, before we can add or remove values. The mechanism consists of two functions: ensure capacity and pack.

Ensure capacity
Before one or several values are added, we should ensure that we have enough capacity to store them. Do the following steps:

1. Check whether the current capacity is insufficient to store the new items;
2. Calculate the new capacity by the formula: newCapacity = (oldCapacity * 3) / 2 + 1. The algorithm reserves some free space in order not to resize the storage too often;
3. Check whether the new capacity is enough to store all new items and, if not, increase it to hold exactly the required number of items;
4. Allocate new storage and copy the contents from the old one to it;
5. Deallocate the old storage (in C++);
6. Change the capacity value.

The enlargement coefficient can be chosen arbitrarily, but it must be greater than one. The proposed value is 1.5, a common compromise between wasted space and the number of reallocations. Example. capacity = 6, size = 6, and we want to add 1 new item: the new capacity is (6 * 3) / 2 + 1 = 10.
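To see why reserving free space pays off, the growth formula can be isolated in a small Java sketch. The class `GrowthPolicy` and both method names are hypothetical and exist only for this illustration; the sketch counts reallocations instead of copying real data.

```java
// A minimal sketch of the growth rule, separated from any real storage.
class GrowthPolicy {
    // Capacity after growing: 1.5x plus one, but never below minCapacity.
    static int grownCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = (oldCapacity * 3) / 2 + 1;
        return Math.max(newCapacity, minCapacity);
    }

    // Counts how many times the storage would be reallocated while
    // appending n items one by one, starting from initialCapacity.
    static int countReallocations(int initialCapacity, int n) {
        int capacity = initialCapacity;
        int reallocations = 0;
        for (int size = 0; size < n; size++) {
            if (size + 1 > capacity) {
                capacity = grownCapacity(capacity, size + 1);
                reallocations++;
            }
        }
        return reallocations;
    }

    public static void main(String[] args) {
        System.out.println(grownCapacity(6, 7));          // prints 10
        System.out.println(countReallocations(10, 100));  // prints 6
    }
}
```

Starting from capacity 10, the capacities visited are 16, 25, 38, 58, 88 and 133, so 100 appends need only 6 reallocations rather than one per appended item.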

Pack
When items are removed, the amount of free space increases. If there are too few values in the dynamic array, the unused storage becomes just a waste of space. To save space, we develop a mechanism that reduces the capacity when it is excessive.

1. Check whether size is less than or equal to half of the capacity;
2. Calculate the new capacity by the formula: newCapacity = (size * 3) / 2 + 1. This leaves exactly the amount of space that would result if the storage had been trimmed to the size and ensure capacity had then been called;
3. Allocate new storage and copy the contents from the old one to it;
4. Deallocate the old storage (in C++);
5. Change the capacity value.

Example. capacity = 12, size = 6, do packing: the new capacity is (6 * 3) / 2 + 1 = 10.

The lower bound on size below which packing is done may vary. In the current example it is 0.5 of the capacity value. Commonly, pack is a private method that is called after a removal. The dynamic array interface also provides a trim method, which reduces the capacity to fit exactly the number of items in the array. It is called from outside the implementation, when you are sure that no more values will be added (for instance, when input from the user is over).
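The packing decision can be checked on a few numbers with a short Java sketch. The class `PackPolicy` and its methods are hypothetical names used only here; they compute capacities and do not touch any storage.

```java
// A minimal sketch of the pack and trim rules as pure functions.
class PackPolicy {
    // Capacity after a pack attempt: shrinks only when at most half
    // of the capacity is in use, otherwise stays unchanged.
    static int packedCapacity(int size, int capacity) {
        if (size <= capacity / 2) {
            return (size * 3) / 2 + 1;
        }
        return capacity;
    }

    // Capacity after trim: exactly the number of stored items.
    static int trimmedCapacity(int size) {
        return size;
    }

    public static void main(String[] args) {
        System.out.println(packedCapacity(6, 12));  // prints 10
        System.out.println(packedCapacity(7, 12));  // prints 12
        System.out.println(trimmedCapacity(6));     // prints 6
    }
}
```

With capacity 12, packing triggers at size 6 but not at size 7, which is the 0.5 threshold described above.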

Code snippets
Both Java and C++ provide efficient tools to copy memory, which are used in the implementations below.

Java
import java.util.Arrays;

public class DynamicArray {

    public void ensureCapacity(int minCapacity) {
        int capacity = storage.length;
        if (minCapacity > capacity) {
            int newCapacity = (capacity * 3) / 2 + 1;
            if (newCapacity < minCapacity)
                newCapacity = minCapacity;
            storage = Arrays.copyOf(storage, newCapacity);
        }
    }

    private void pack() {
        int capacity = storage.length;
        if (size <= capacity / 2) {
            int newCapacity = (size * 3) / 2 + 1;
            storage = Arrays.copyOf(storage, newCapacity);
        }
    }

    public void trim() {
        int newCapacity = size;
        storage = Arrays.copyOf(storage, newCapacity);
    }
}

C++
#include <cstring>

void DynamicArray::setCapacity(int newCapacity) {
    int *newStorage = new int[newCapacity];
    // copy only the elements that are actually in use
    std::memcpy(newStorage, storage, sizeof(int) * size);
    capacity = newCapacity;
    delete[] storage;
    storage = newStorage;
}

void DynamicArray::ensureCapacity(int minCapacity) {
    if (minCapacity > capacity) {
        int newCapacity = (capacity * 3) / 2 + 1;
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;
        setCapacity(newCapacity);
    }
}

void DynamicArray::pack() {
    if (size <= capacity / 2) {
        int newCapacity = (size * 3) / 2 + 1;
        setCapacity(newCapacity);
    }
}

void DynamicArray::trim() {
    int newCapacity = size;
    setCapacity(newCapacity);
}

Data access functions: Set, Get, InsertAt, RemoveAt


The dynamic array data structure encapsulates the underlying storage, so the interface must provide access functions to work with it. We can also add a range check to the access functions.

Range check
There is not much to say about the range check. The algorithm checks whether the index is inside the 0..size-1 range and, if not, throws an exception.

Get and set


Once we have ensured that the index is inside the proper range, we write a value to the storage or read a value from it.

InsertAt
This operation may require expanding the array, so the algorithm first invokes the ensure capacity method with a minimal capacity of size + 1. Then it shifts all elements from i to size - 1, where i is the insertion position, one element to the right. Note that if the new element is inserted after the last element in the array, no shifting is required. After shifting, it puts the value into the i-th element and increases size by one.

RemoveAt
Shift all elements from i + 1 to size - 1, where i is the removal position, one element to the left, so that the removed element is overwritten. Then decrease size by 1 and invoke the pack operation. Packing is done only if too few elements are left after the removal.
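The shifting in both operations can be traced on a plain array before looking at the full class. The following Java sketch leaves out capacity management on purpose and assumes the array has at least one free cell when inserting. The class `ShiftDemo` and its methods are hypothetical names for illustration; each method returns the new logical size.

```java
import java.util.Arrays;

// A minimal sketch of the element shifting behind InsertAt and RemoveAt.
class ShiftDemo {
    // Inserts value at index within storage[0..size) and returns the new size.
    // Assumes 0 <= index <= size and size < storage.length.
    static int insertAt(int[] storage, int size, int index, int value) {
        // move elements index..size-1 one cell to the right
        System.arraycopy(storage, index, storage, index + 1, size - index);
        storage[index] = value;
        return size + 1;
    }

    // Removes the element at index and returns the new size.
    // Assumes 0 <= index < size.
    static int removeAt(int[] storage, int size, int index) {
        // move elements index+1..size-1 one cell to the left
        System.arraycopy(storage, index + 1, storage, index, size - index - 1);
        return size - 1;
    }

    public static void main(String[] args) {
        int[] storage = {1, 5, 7, -8, 4, 0, 0, 0, 0, 0};  // capacity 10
        int size = 5;                                      // logical size 5
        size = insertAt(storage, size, 2, 99);
        System.out.println(Arrays.toString(Arrays.copyOf(storage, size)));
        size = removeAt(storage, size, 0);
        System.out.println(Arrays.toString(Arrays.copyOf(storage, size)));
    }
}
```

`System.arraycopy` handles overlapping source and destination ranges within the same array, which is why it can be used for shifting in either direction.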

Code snippets
Java
public class DynamicArray {

    private void rangeCheck(int index) {
        if (index < 0 || index >= size)
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
    }

    public void set(int index, int value) {
        rangeCheck(index);
        storage[index] = value;
    }

    public int get(int index) {
        rangeCheck(index);
        return storage[index];
    }

    public void removeAt(int index) {
        rangeCheck(index);
        int moveCount = size - index - 1;
        if (moveCount > 0)
            System.arraycopy(storage, index + 1, storage, index, moveCount);
        size--;
        pack();
    }

    public void insertAt(int index, int value) {
        if (index < 0 || index > size)
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        ensureCapacity(size + 1);
        int moveCount = size - index;
        if (moveCount > 0)
            System.arraycopy(storage, index, storage, index + 1, moveCount);
        storage[index] = value;
        size++;
    }
}

C++
#include <cstring>
#include <exception>

void DynamicArray::rangeCheck(int index) {
    if (index < 0 || index >= size)
        throw "Index out of bounds!";
}

void DynamicArray::set(int index, int value) {
    rangeCheck(index);
    storage[index] = value;
}

int DynamicArray::get(int index) {
    rangeCheck(index);
    return storage[index];
}

void DynamicArray::removeAt(int index) {
    rangeCheck(index);
    int moveCount = size - index - 1;
    // memmove is used because the source and destination ranges overlap
    if (moveCount > 0)
        std::memmove(storage + index, storage + (index + 1), sizeof(int) * moveCount);
    size--;
    pack();
}

void DynamicArray::insertAt(int index, int value) {
    if (index < 0 || index > size)
        throw "Index out of bounds!";
    ensureCapacity(size + 1);
    int moveCount = size - index;
    if (moveCount != 0)
        std::memmove(storage + index + 1, storage + index, sizeof(int) * moveCount);
    storage[index] = value;
    size++;
}

Singly-linked list
The linked list is a very important dynamic data structure. Basically, there are two types of linked list: the singly-linked list and the doubly-linked list. In a singly-linked list every element contains some data and a link to the next element, which holds the structure together. In a doubly-linked list, every node also contains a link to the previous node. A linked list can serve as the underlying data structure for a stack, a queue or a sorted list.

Example
Schematically, a singly-linked list can be pictured as a chain of cells, each holding a value and a link to the next cell.

Each cell is called a node of the singly-linked list. The first node is called the head, and it is a dedicated node: by knowing it, we can access every other node in the list. Sometimes the last node, called the tail, is also stored in order to speed up the add operation.

Operations on a singly-linked list


The concrete implementation of operations on a singly-linked list depends on the purpose it is used for. The sections below describe the common concepts that apply to every implementation:

Singly-linked list traversal
Adding a node
Removing a node

First, see how a singly-linked list is represented inside the computer.

Singly-linked list. Internal representation.


Every node of a singly-linked list contains the following information:

a value (user's data);
a link to the next element (auxiliary data).

Schematically, a node can be pictured as a cell with two fields: the value and the link to the next node.

The first node is called the head, and no other node points to it. The link to the head is usually stored in the class that provides an interface to the resulting data structure. For an empty list, head is set to NULL. It also makes sense to store a link to the last node, called the tail. Though no other node in the list can be reached from the tail (because we can move forward only), it speeds up the add operation when adding to the end of the list. When the list is big, this reduces the cost of adding to the end from O(n) to O(1), while the memory overhead is insignificant.
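The benefit of storing the tail can be made concrete with a small Java sketch. The class `TailDemo` and its members are hypothetical names for illustration only. Adding to the end through the tail takes a constant number of steps, while finding the last node from the head requires following every link.

```java
// A minimal sketch comparing "append via tail" with "walk from head".
class TailDemo {
    static class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    Node head;
    Node tail;

    // Appends in O(1): only the tail link and the old tail node are touched.
    void addLast(int value) {
        Node node = new Node(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
    }

    // Number of links followed to reach the last node starting from head.
    // This is the work an append would need if no tail link were stored.
    int stepsToLastFromHead() {
        int steps = 0;
        for (Node cur = head; cur != null && cur.next != null; cur = cur.next) {
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        TailDemo list = new TailDemo();
        for (int i = 0; i < 1000; i++) {
            list.addLast(i);
        }
        System.out.println(list.stepsToLastFromHead());  // prints 999
        System.out.println(list.tail.value);             // prints 999
    }
}
```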

Code snippets
Commonly, the whole structure of a singly-linked list is put into two classes. The main class, SinglyLinkedList, is the public interface, and SinglyLinkedListNode is meant for private use inside the main class. Because SinglyLinkedListNode is an auxiliary class, it is not necessary to encapsulate its fields (make them private). Notice that the SinglyLinkedList interface class may be replaced by another one, such as a Stack class, while the internal implementation of the stack remains a singly-linked list.

Java implementation
public class SinglyLinkedListNode {
    public int value;
    public SinglyLinkedListNode next;

    public SinglyLinkedListNode(int value) {
        this.value = value;
        next = null;
    }
}

public class SinglyLinkedList {
    private SinglyLinkedListNode head;
    private SinglyLinkedListNode tail;

    public SinglyLinkedList() {
        head = null;
        tail = null;
    }
}

C++ implementation
class SinglyLinkedListNode {
public:
    int value;
    SinglyLinkedListNode *next;

    SinglyLinkedListNode(int value) {
        this->value = value;
        next = NULL;
    }
};

class SinglyLinkedList {
private:
    SinglyLinkedListNode *head;
    SinglyLinkedListNode *tail;
public:
    SinglyLinkedList() {
        head = NULL;
        tail = NULL;
    }
};

Singly-linked list. Traversal.


Assume that we have a list with some nodes. Traversal is the most basic operation, and it appears as a part of almost every operation on a singly-linked list. For instance, an algorithm may traverse a singly-linked list to find a value, find a position for insertion, etc. For a singly-linked list, only forward traversal is possible.

Traversal algorithm
Beginning from the head,
1. check whether the end of the list has been reached; if so, stop;
2. perform the actions specific to the particular algorithm on the current node;
3. the current node becomes the previous one and the next node becomes the current one. Go to step 1.

Example
As an example, consider summing up the values in a singly-linked list.

For some algorithms tracking the previous node is essential, but for others, like this example, it is unnecessary. We show the common case here; a concrete algorithm can be adjusted to meet its individual requirements.
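Finding a value, mentioned above as another use of traversal, follows the same loop shape. The Java sketch below is an illustration under stated assumptions: the class `ListSearch`, its nested `Node` class and the helper `listOf` are hypothetical names, and the search does not need the previous node, so it is not tracked.

```java
// A minimal sketch of traversal used to search for a value.
class ListSearch {
    static class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    // Returns the position of the first node holding target, or -1 if absent.
    static int indexOf(Node head, int target) {
        int index = 0;
        for (Node current = head; current != null; current = current.next) {
            if (current.value == target) {
                return index;
            }
            index++;
        }
        return -1;  // the end of the list was reached without a match
    }

    // Builds a list from the given values and returns its head.
    static Node listOf(int... values) {
        Node head = null;
        for (int i = values.length - 1; i >= 0; i--) {
            Node node = new Node(values[i]);
            node.next = head;
            head = node;
        }
        return head;
    }

    public static void main(String[] args) {
        Node head = listOf(3, 8, 5);
        System.out.println(indexOf(head, 5));  // prints 2
        System.out.println(indexOf(head, 9));  // prints -1
    }
}
```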

Code snippets
Although we have two classes for the singly-linked list, the SinglyLinkedListNode class is used as storage only. The whole algorithm is implemented in the SinglyLinkedList class.

Java implementation
public class SinglyLinkedList {

    public int traverse() {
        int sum = 0;
        SinglyLinkedListNode current = head;
        SinglyLinkedListNode previous = null;
        while (current != null) {
            sum += current.value;
            previous = current;
            current = current.next;
        }
        return sum;
    }
}

C++ implementation
int SinglyLinkedList::traverse() {
    int sum = 0;
    SinglyLinkedListNode *current = head;
    SinglyLinkedListNode *previous = NULL;
    while (current != NULL) {
        sum += current->value;
        previous = current;
        current = current->next;
    }
    return sum;
}

Singly-linked list. Addition (insertion) operation.


Insertion into a singly-linked list has two special cases: inserting a new node before the head (at the very beginning of the list) and after the tail (at the very end of the list). In any other case, the new node is inserted in the middle of the list and so has a predecessor and a successor. All these cases are described below.

Empty list case


When the list is empty, which is indicated by the (head == NULL) condition, the insertion is quite simple. The algorithm sets both head and tail to point to the new node.

Add first
In this case, the new node is inserted right before the current head node.

It can be done in two steps:
1. Update the next link of the new node to point to the current head node.
2. Update the head link to point to the new node.

Add last
In this case, the new node is inserted right after the current tail node.

It can be done in two steps:
1. Update the next link of the current tail node to point to the new node.
2. Update the tail link to point to the new node.

General case
In the general case, the new node is always inserted between two nodes that are already in the list. Head and tail links are not updated in this case.

Such an insert can be done in two steps, provided the link to the "next" node has been remembered beforehand:
1. Update the link of the "previous" node to point to the new node.
2. Update the link of the new node to point to the "next" node.

Code snippets
All the cases shown above can be implemented in one function with two arguments: the node to insert after and the new node. For the add first operation, the arguments are (NULL, newNode). For the add last operation, the arguments are (tail, newNode). However, these specific operations (add first and add last) can also be implemented separately, in order to avoid unnecessary checks.

Java implementation
public class SinglyLinkedList {

    public void addLast(SinglyLinkedListNode newNode) {
        if (newNode == null)
            return;
        else {
            newNode.next = null;
            if (head == null) {
                head = newNode;
                tail = newNode;
            } else {
                tail.next = newNode;
                tail = newNode;
            }
        }
    }

    public void addFirst(SinglyLinkedListNode newNode) {
        if (newNode == null)
            return;
        else {
            if (head == null) {
                newNode.next = null;
                head = newNode;
                tail = newNode;
            } else {
                newNode.next = head;
                head = newNode;
            }
        }
    }

    public void insertAfter(SinglyLinkedListNode previous,
                            SinglyLinkedListNode newNode) {
        if (newNode == null)
            return;
        else {
            if (previous == null)
                addFirst(newNode);
            else if (previous == tail)
                addLast(newNode);
            else {
                SinglyLinkedListNode next = previous.next;
                previous.next = newNode;
                newNode.next = next;
            }
        }
    }
}

C++ implementation
void SinglyLinkedList::addLast(SinglyLinkedListNode *newNode) {
    if (newNode == NULL)
        return;
    else {
        newNode->next = NULL;
        if (head == NULL) {
            head = newNode;
            tail = newNode;
        } else {
            tail->next = newNode;
            tail = newNode;
        }
    }
}

void SinglyLinkedList::addFirst(SinglyLinkedListNode *newNode) {
    if (newNode == NULL)
        return;
    else {
        if (head == NULL) {
            newNode->next = NULL;
            head = newNode;
            tail = newNode;
        } else {
            newNode->next = head;
            head = newNode;
        }
    }
}

void SinglyLinkedList::insertAfter(SinglyLinkedListNode *previous,
                                   SinglyLinkedListNode *newNode) {
    if (newNode == NULL)
        return;
    else {
        if (previous == NULL)
            addFirst(newNode);
        else if (previous == tail)
            addLast(newNode);
        else {
            SinglyLinkedListNode *next = previous->next;
            previous->next = newNode;
            newNode->next = next;
        }
    }
}
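Traversal and insertion can be combined to keep a list sorted, one of the uses of linked lists mentioned earlier. The Java sketch below is a self-contained illustration, not part of the classes above: `SortedList`, its nested `Node` class, `insertSorted` and `toArray` are hypothetical names. It shows a case where tracking the previous node during traversal is essential.

```java
import java.util.Arrays;

// A minimal sketch of a sorted singly-linked list built by insertion.
class SortedList {
    static class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    Node head;
    Node tail;
    int count;

    // Inserts value so that the list stays in ascending order.
    void insertSorted(int value) {
        Node newNode = new Node(value);
        Node previous = null;
        Node current = head;
        // find the first node that is not smaller than the new value
        while (current != null && current.value < value) {
            previous = current;
            current = current.next;
        }
        newNode.next = current;       // link the new node to its successor
        if (previous == null) {
            head = newNode;           // empty list or "add first" case
        } else {
            previous.next = newNode;  // general case
        }
        if (current == null) {
            tail = newNode;           // empty list or "add last" case
        }
        count++;
    }

    // Copies the values into an array, in list order.
    int[] toArray() {
        int[] result = new int[count];
        int i = 0;
        for (Node cur = head; cur != null; cur = cur.next) {
            result[i++] = cur.value;
        }
        return result;
    }

    public static void main(String[] args) {
        SortedList list = new SortedList();
        for (int v : new int[]{5, 1, 9, 3}) {
            list.insertSorted(v);
        }
        System.out.println(Arrays.toString(list.toArray()));  // prints [1, 3, 5, 9]
    }
}
```

Here the new node's link is set before the predecessor's link, so the successor is never lost and no temporary variable is needed.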

Singly-linked list. Removal (deletion) operation.


There are four cases that can occur while removing a node. These cases are similar to the cases in the add operation. We have the same four situations, but the order of the algorithm's actions is reversed. Notice that the removal algorithm includes disposal of the deleted node, which may be unnecessary in languages with automatic garbage collection (e.g., Java).

List has only one node


When the list has only one node, which is indicated by the head pointing to the same node as the tail, the removal is quite simple. The algorithm disposes of the node pointed to by head (or tail) and sets both head and tail to NULL.

Remove first
In this case, the first node (the current head node) is removed from the list.

It can be done in two steps:
1. Update the head link to point to the node next to the head.
2. Dispose of the removed node.

Remove last
In this case, the last node (the current tail node) is removed from the list. This operation is a bit trickier than removing the first node, because the algorithm must first find the node that precedes the tail.

It can be done in three steps:
1. Update the tail link to point to the node before the tail. In order to find it, the list must be traversed first, beginning from the head.
2. Set the next link of the new tail to NULL.
3. Dispose of the removed node.

General case
In the general case, the node to be removed is always located between two list nodes. Head and tail links are not updated in this case.

Such a removal can be done in two steps:
1. Update the next link of the previous node to point to the node that follows the removed node.
2. Dispose of the removed node.
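The removal cases above can be combined with traversal to remove the first node that holds a given value. The following Java sketch is a self-contained illustration; the class `RemoveByValue`, its nested `Node` class and its methods are hypothetical names. No explicit disposal is needed, because Java reclaims unreachable nodes automatically.

```java
// A minimal sketch of removing a node by value from a singly-linked list.
class RemoveByValue {
    static class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    Node head;
    Node tail;

    void addLast(int value) {
        Node node = new Node(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
    }

    // Removes the first node holding value; returns false if there is none.
    boolean remove(int value) {
        Node previous = null;
        Node current = head;
        while (current != null && current.value != value) {
            previous = current;
            current = current.next;
        }
        if (current == null) {
            return false;                  // value not found
        }
        if (previous == null) {
            head = current.next;           // "remove first" case
        } else {
            previous.next = current.next;  // general case: bypass the node
        }
        if (current == tail) {
            tail = previous;               // "remove last" or single-node case
        }
        return true;
    }

    public static void main(String[] args) {
        RemoveByValue list = new RemoveByValue();
        list.addLast(1);
        list.addLast(2);
        list.addLast(3);
        System.out.println(list.remove(2));  // prints true
        System.out.println(list.remove(7));  // prints false
    }
}
```

Because the traversal already knows the predecessor when the match is found, removing the last node costs nothing extra here.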

Code snippets
All the cases shown above can be implemented in one function with a single argument: the node previous to the node to be removed. For the remove first operation, the argument is NULL. For the remove last operation, the argument is the node previous to the tail. However, it is better to implement these special cases (remove first and remove last) in separate functions. Notice that removing the first and the last node have different complexity: remove first takes O(1) time, while remove last takes O(n) time, because it needs to traverse the whole list.

Java implementation
public class SinglyLinkedList {

    public void removeFirst() {
        if (head == null)
            return;
        else {
            if (head == tail) {
                head = null;
                tail = null;
            } else {
                head = head.next;
            }
        }
    }

    public void removeLast() {
        if (tail == null)
            return;
        else {
            if (head == tail) {
                head = null;
                tail = null;
            } else {
                SinglyLinkedListNode previousToTail = head;
                while (previousToTail.next != tail)
                    previousToTail = previousToTail.next;
                tail = previousToTail;
                tail.next = null;
            }
        }
    }

    public void removeNext(SinglyLinkedListNode previous) {
        if (previous == null)
            removeFirst();
        else if (previous.next == tail) {
            tail = previous;
            tail.next = null;
        } else if (previous == tail)
            return;
        else {
            previous.next = previous.next.next;
        }
    }
}

C++ implementation
void SinglyLinkedList::removeFirst() {
    if (head == NULL)
        return;
    else {
        SinglyLinkedListNode *removedNode;
        removedNode = head;
        if (head == tail) {
            head = NULL;
            tail = NULL;
        } else {
            head = head->next;
        }
        delete removedNode;
    }
}

void SinglyLinkedList::removeLast() {
    if (tail == NULL)
        return;
    else {
        SinglyLinkedListNode *removedNode;
        removedNode = tail;
        if (head == tail) {
            head = NULL;
            tail = NULL;
        } else {
            SinglyLinkedListNode *previousToTail = head;
            while (previousToTail->next != tail)
                previousToTail = previousToTail->next;
            tail = previousToTail;
            tail->next = NULL;
        }
        delete removedNode;
    }
}

void SinglyLinkedList::removeNext(SinglyLinkedListNode *previous) {
    if (previous == NULL)
        removeFirst();
    else if (previous->next == tail) {
        SinglyLinkedListNode *removedNode = previous->next;
        tail = previous;
        tail->next = NULL;
        delete removedNode;
    } else if (previous == tail)
        return;
    else {
        SinglyLinkedListNode *removedNode = previous->next;
        previous->next = removedNode->next;
        delete removedNode;
    }
}
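As noted earlier, the list interface class may be replaced by another one, such as a stack, while the internal representation remains a singly-linked list. The Java sketch below illustrates this idea under stated assumptions: `IntStack` and its methods are hypothetical names, push corresponds to add first and pop to remove first, and no tail link is kept because a stack only works at the head.

```java
import java.util.NoSuchElementException;

// A minimal sketch of a stack whose storage is a singly-linked list.
class IntStack {
    private static class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    private Node head;  // the top of the stack

    // Push is "add first": the new node becomes the head.
    void push(int value) {
        Node node = new Node(value);
        node.next = head;
        head = node;
    }

    // Pop is "remove first": the head is unlinked and its value returned.
    int pop() {
        if (head == null) {
            throw new NoSuchElementException("stack is empty");
        }
        int value = head.value;
        head = head.next;
        return value;
    }

    boolean isEmpty() {
        return head == null;
    }

    public static void main(String[] args) {
        IntStack stack = new IntStack();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        System.out.println(stack.pop());  // prints 3
        System.out.println(stack.pop());  // prints 2
    }
}
```

Both operations touch only the head, so each runs in O(1) time regardless of the number of stored values.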
