
Depth First Search & Directed Acyclic Graphs

Charalampos Papamanthou cpap@csd.uoc.gr

Department of Computer Science, University of Crete
A Review for the Course Graph Algorithms, Winter 2004

1 Depth First Search

1.1 General

Depth First Search (DFS) is a systematic way of visiting the nodes of either a directed or an undirected graph. As with breadth-first search, DFS has many applications in Graph Theory and forms the main part of many graph algorithms. DFS visits the vertices of a graph in the following manner. It selects a starting vertex v. Then it chooses an incident edge (v, w) and searches recursively deeper in the graph whenever possible. It does not examine all incident edges one after the other at the same level; instead, it goes deeper and deeper into the graph until no such path exists. When all edges incident to v have been explored, the algorithm backtracks to explore edges leaving the vertex from which v was discovered.

1.2 The Algorithm

DFS has two main differences from BFS. First, it does not discover all vertices at distance k before discovering vertices at distance k + 1; instead, it goes one vertex deeper at a time. Additionally, the predecessor subgraph produced by DFS may be composed of several trees, because the search may be repeated from several sources. This predecessor subgraph forms a depth-first forest composed of several depth-first trees, and the edges in Eπ are called tree edges. The predecessor subgraph in DFS is the graph Gπ = (V, Eπ) where

Eπ = {(π[v], v) : v ∈ V and π[v] ≠ nil}.

On the other hand, the predecessor subgraph of BFS forms a single tree. The DFS procedure takes as input a graph G and outputs its predecessor subgraph in the form of a depth-first forest. In addition, it assigns two timestamps to each vertex: a discovery time and a finishing time. The algorithm initializes each vertex to white, to indicate that it is not discovered yet, and sets each vertex's parent to nil. The procedure begins by selecting one vertex u from the graph, setting its color to gray to indicate that the vertex is now discovered (but not finished) and assigning to it its discovery time. For each vertex v that belongs to the set Adj[u] and is still marked white, DFS-visit is called recursively, assigning to each vertex the appropriate discovery time d[v] (the time variable is incremented at each step). If no white descendant of v exists, then v becomes black and is assigned the appropriate finishing time, and the algorithm returns to the exploration of v's ancestor π[v]. If all of u's descendants are black, u becomes black; if there are no other white vertices in the graph, the algorithm reaches a finishing state, otherwise a new source vertex is selected from the remaining white vertices and the procedure continues as before. As mentioned above, the DFS procedure computes two timestamps for each vertex v: the discovery time, denoted d[v], and the finishing time, denoted f[v]. These timestamps are integers between 1 and 2n, since there is one discovery event and one finishing event for each of the n vertices. Next, we present the pseudocode of the algorithm.

Algorithm DFS(G)
1  for each vertex u ∈ V(G)
2      color[u] ← white
3      π[u] ← nil
4  time ← 0
5  for each vertex u ∈ V(G)
6      if color[u] == white
7          DFS-visit(u)

DFS-visit(u)
1  color[u] ← gray          {white vertex u has just been discovered}
2  d[u] ← time ← time + 1
3  for each vertex v ∈ Adj[u]          {explore edge (u, v)}
4      if color[v] == white
5          π[v] ← u
6          DFS-visit(v)
7  color[u] ← black          {blacken u; it is finished}
8  f[u] ← time ← time + 1

Procedure DFS works as follows. Lines 1-3 paint all vertices white and initialize their π field to nil. Line 4 resets the global time counter. Lines 5-7 check each vertex in V in turn and, when a white vertex is found, visit it using DFS-visit. Every time DFS-visit(u) is called in line 7, vertex u becomes the root of a new tree in the depth-first forest. When DFS returns, every vertex u has been assigned a discovery time d[u] and a finishing time f[u]. In each DFS-visit(u) call, vertex u is initially white. Line 1 paints u gray, and line 2 records the discovery time d[u] by incrementing and saving the global variable time. Lines 3-6 examine each vertex v adjacent to u and recursively visit v if it is white. As each vertex v ∈ Adj[u] is considered in line 3, we say that edge (u, v) is explored by the depth-first search. Finally, after every edge leaving u has been explored, lines 7-8 paint u black and record the finishing time in f[u]. The initialization part of DFS has time complexity O(n), as every vertex must be visited once so as to mark it white. The main (recursive) part of the algorithm has time complexity O(m), as every edge must be crossed (twice) during the examination of the adjacent vertices of every vertex. In total, the algorithm's time complexity is O(m + n). An example of the DFS execution can be seen in Figure 1.
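As an illustration of the pseudocode above, here is a minimal Python sketch of DFS with colors and timestamps. The dictionary-based graph representation and all identifiers are assumptions made for this example only.

WHITE, GRAY, BLACK = 0, 1, 2

def dfs(graph):
    # graph: dict mapping each vertex to the list of its adjacent vertices
    color = {u: WHITE for u in graph}
    pi = {u: None for u in graph}      # predecessor (parent) of each vertex
    d, f = {}, {}                      # discovery and finishing times
    time = 0

    def visit(u):
        nonlocal time
        color[u] = GRAY                # u has just been discovered
        time += 1
        d[u] = time
        for v in graph[u]:             # explore edge (u, v)
            if color[v] == WHITE:
                pi[v] = u
                visit(v)
        color[u] = BLACK               # u is finished
        time += 1
        f[u] = time

    for u in graph:                    # restart from every still-white vertex
        if color[u] == WHITE:
            visit(u)
    return d, f, pi

# Example run on a small directed graph (illustrative data).
g = {'u': ['v', 'x'], 'v': ['y'], 'x': ['v'], 'y': ['x'], 'w': ['y', 'z'], 'z': ['z']}
print(dfs(g))

Each call to visit corresponds to one DFS-visit call of the pseudocode, and the two increments of time per vertex produce the 2n timestamps discussed above.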


Figure 1: Panels (b)-(q) show the progress of the DFS algorithm for the graph of panel (a). Starting from node U we can either discover node V or Y. Suppose that we discover node V, which has a single outgoing edge to node W. W has no outgoing edges, so this node is finished, and we return to V. From V there is no other choice, so this node is also finished and we return to U. From node U we can continue to discover Y and its descendants, and the procedure continues similarly. At stage (l) we have discovered and finished nodes U, V, W, X, Y. Selecting node Q as a new starting node, we can discover the remaining nodes (in this case Z).

1.3 Some Properties

Depth-first search provides us with much information about the structure of the graph. Perhaps the most basic property of depth-first search is that the predecessor subgraph Gπ, as defined in the previous section, does indeed form a forest of trees, since the structure of the depth-first trees exactly mirrors the structure of the recursive calls of DFS-visit. This means that the algorithm sets u = π[v] if and only if DFS-visit(v) was called during a search of u's adjacency list. Another important property of DFS is that discovery and finishing times have a parenthesis structure. If we represent the discovery of a vertex u with a left parenthesis (u and its finishing with a right parenthesis u), then the history of discoveries and finishings makes a well-formed expression, in the sense that the parentheses are properly nested. The parenthesis structure p corresponding to the depth-first traversal of figure 2 is the following:

p = (s (z (y (x x) y) (w w) z) s) (t (v v) (u u) t)

Another way of stating the condition of the parenthesis structure is given in the following theorem.

Figure 2: A depth-first traversal with the corresponding discovery and finishing times.

Theorem 1.1 (Parenthesis Theorem). In any depth-first search of a (directed or undirected) graph G = (V, E), for any two vertices u and v, exactly one of the following three conditions holds:
- the intervals [d[u], f[u]] and [d[v], f[v]] are entirely disjoint,
- the interval [d[u], f[u]] is contained entirely within the interval [d[v], f[v]], and u is a descendant of v in the depth-first tree, or
- the interval [d[v], f[v]] is contained entirely within the interval [d[u], f[u]], and v is a descendant of u in the depth-first tree.

Proof. We begin with the case in which d[u] < d[v]. There are two subcases to consider, according to whether d[v] < f[u] or not. In the first subcase, d[v] < f[u], so v was discovered while u was still gray. This implies that v is a descendant of u. Moreover, since v was discovered more recently than u, all of its outgoing edges are explored, and v is finished, before the search returns to and finishes u. In this case, therefore, the interval [d[v], f[v]] is entirely contained within the interval [d[u], f[u]]. In the other subcase, f[u] < d[v]; since d[u] < f[u] and d[v] < f[v], we have d[u] < f[u] < d[v] < f[v], so the intervals [d[u], f[u]] and [d[v], f[v]] are disjoint. The case in which d[v] < d[u] is similar, with the roles of u and v reversed in the above argument.

Corollary 1.2 (Nesting of descendants' intervals). Vertex v is a proper descendant of vertex u in the depth-first forest for a (directed or undirected) graph G if and only if d[u] < d[v] < f[v] < f[u].

Proof. Immediate from the Parenthesis Theorem.

Theorem 1.3 (White Path Theorem). In a depth-first forest of a graph G = (V, E), vertex v is a descendant of vertex u if and only if at the time d[u] that the search discovers u, vertex v can be reached from u along a path consisting entirely of white vertices.

Proof. For the forward direction, assume that v is a descendant of u. Let w be any vertex on the path between u and v in the depth-first tree, so that w is a descendant of u. By Corollary 1.2, d[u] < d[w], and so w is white at time d[u]. For the reverse direction, suppose that vertex v is reachable from u along a path of white vertices at time d[u], but v does not become a descendant of u in the depth-first tree. Without loss of generality, assume that every other vertex along the path becomes a descendant of u. (Otherwise, let v be the closest vertex to u along the path that does not become a descendant of u.) Let w be the predecessor of v on the path, so that w is a descendant of u (w and u may in fact be the same vertex) and, by Corollary 1.2, f[w] ≤ f[u]. Note that v must be discovered after u is discovered, but before w is finished. Therefore, d[u] < d[v] < f[w] ≤ f[u]. Theorem 1.1 then implies that the interval [d[v], f[v]] is contained entirely within the interval [d[u], f[u]]. By Corollary 1.2, v must after all be a descendant of u.
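As a quick sanity check of Corollary 1.2, the following Python sketch verifies the nesting of intervals on the output (d, f, π) of a DFS run such as the one sketched earlier; the helper names are assumptions.

def is_descendant(pi, u, v):
    # True if v is a (not necessarily proper) descendant of u in the DFS forest
    while v is not None:
        if v == u:
            return True
        v = pi[v]
    return False

def check_nesting(d, f, pi):
    # v is a descendant of u exactly when [d[v], f[v]] nests inside [d[u], f[u]]
    for u in d:
        for v in d:
            nested = d[u] <= d[v] and f[v] <= f[u]
            assert nested == is_descendant(pi, u, v)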

1.4 Edge Classification

One of the most important properties of depth-first search is that the search can be used to classify the edges of the input graph G = (V, E). During DFS, the edges are separated into certain groups that reveal important information about the graph itself. For example, we will see that a directed graph is acyclic if and only if a depth-first search yields no back edges. We can define four edge types in terms of the depth-first forest Gπ produced by a depth-first search on a graph G.

1. Tree edges are edges in the depth-first forest Gπ. Edge (u, v) is a tree edge if v was first discovered by exploring edge (u, v). A tree edge always describes a relation between a node and one of its direct descendants. This implies that d[u] < d[v] (u's discovery time is less than v's discovery time), so a tree edge points from a low to a high node.

2. Back edges are those edges (u, v) connecting a vertex u to an ancestor v in a depth-first tree. Self-loops are considered to be back edges. Back edges describe descendant-to-ancestor relations, as they lead from high to low nodes.

3. Forward edges are those non-tree edges (u, v) connecting a vertex u to a descendant v in a depth-first tree. Forward edges describe ancestor-to-descendant relations, as they lead from low to high nodes.

4. Cross edges are all other edges. They can go between vertices in the same depth-first tree, as long as one vertex is not an ancestor of the other, or they can go between vertices in different depth-first trees. Cross edges link nodes with no ancestor-descendant relation and point from high to low nodes.

The DFS algorithm can be modified to classify edges as it encounters them. The key idea is that each edge (u, v) can be classified by the color of the vertex v that is reached when the edge is first explored (except that forward and cross edges are not distinguished):
- white indicates a tree edge,
- gray indicates a back edge, and
- black indicates a forward or cross edge.

It is important to point out, however, that in an undirected graph there may be some ambiguity in the edge classification, since (u, v) and (v, u) are really the same edge. In such a case, the edge is classified as the first type in the classification list that applies. This means that the edge is classified according to whichever of (u, v) or (v, u) is encountered first during the execution of the algorithm. We will now prove that forward and cross edges never occur in a depth-first search of an undirected graph.

Theorem 1.4. In a depth-first search of an undirected graph G, every edge of G is either a tree edge or a back edge.

Proof. Let (u, v) be an arbitrary edge of G, and suppose without loss of generality that d[u] < d[v]. Then v must be discovered and finished before we finish u, since v is on u's adjacency list. If the edge (u, v) is explored first in the direction from u to v, then (u, v) becomes a tree edge. If (u, v) is explored first in the direction from v to u, then (u, v) is a back edge, since u is still gray at the time the edge is first explored.

If we want to distinguish between back and cross edges, there is an easy way to do this. For every non-tree edge (u, v) that leads from a high node to a low node, we must determine whether v is an ancestor of u: starting from u, we traverse the depth-first tree to visit u's ancestors. If v is found then (u, v) is a back edge; otherwise, if we reach the root without having found v, (u, v) is a cross edge. This can easily be done in O(n) time, as we have to traverse at most n − 1 edges to identify the connection between two given nodes of a tree. Next, we present a related lemma.
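The color test described above can be sketched in Python as follows. The representation is the same assumed dictionary of adjacency lists; forward and cross edges are separated here with the standard discovery-time comparison rather than the O(n) ancestor walk described above.

WHITE, GRAY, BLACK = 0, 1, 2

def classify_edges(graph):
    color = {u: WHITE for u in graph}
    d = {}
    time = 0
    kind = {}                              # (u, v) -> 'tree' | 'back' | 'forward' | 'cross'

    def visit(u):
        nonlocal time
        color[u] = GRAY
        time += 1
        d[u] = time
        for v in graph[u]:
            if color[v] == WHITE:          # white target: tree edge
                kind[(u, v)] = 'tree'
                visit(v)
            elif color[v] == GRAY:         # gray target: back edge (v is an ancestor of u)
                kind[(u, v)] = 'back'
            elif d[u] < d[v]:              # black target discovered after u: forward edge
                kind[(u, v)] = 'forward'
            else:                          # black target discovered before u: cross edge
                kind[(u, v)] = 'cross'
        color[u] = BLACK

    for u in graph:
        if color[u] == WHITE:
            visit(u)
    return kind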

Figure 3: Three possible scenarios of edge classification with DFS.

Lemma 1.5. A directed graph G contains no cycles if and only if a depth-first search of G yields no back edges.

Proof. For the forward direction, assume that there is a back edge (u, v). Then vertex v is an ancestor of vertex u in the depth-first forest. There is thus a path from v to u in G, and the back edge (u, v) completes a cycle. For the reverse direction, suppose that G contains a cycle c. We show that a depth-first search of G yields a back edge. Let v be the first vertex to be discovered in c, and let (u, v) be the preceding edge in c. At time d[v], there is a path of white vertices from v to u. By the white-path theorem, vertex u becomes a descendant of v in the depth-first forest. Therefore (u, v) is a back edge.

In figure 3, you can see three possible scenarios of edge classification produced by a depth-first traversal of a directed graph. At this point, we must add that DFS can be implemented non-recursively, by simulating the system stack of the recursion with an explicit stack. In this way, one can also see the similarity between the two kinds of traversals: in BFS we use a queue, whereas in DFS we use a stack.
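The non-recursive implementation mentioned above can be sketched as follows; each stack entry keeps an iterator over the vertex's remaining neighbours, mimicking a recursion frame. All names are illustrative.

def dfs_iterative(graph):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in graph}
    d, f, pi = {}, {}, {u: None for u in graph}
    time = 0
    for s in graph:
        if color[s] != WHITE:
            continue
        color[s] = GRAY
        time += 1
        d[s] = time
        stack = [(s, iter(graph[s]))]          # explicit stack replaces recursion
        while stack:
            u, it = stack[-1]
            advanced = False
            for v in it:                       # resume scanning u's neighbours
                if color[v] == WHITE:
                    color[v] = GRAY
                    time += 1
                    d[v] = time
                    pi[v] = u
                    stack.append((v, iter(graph[v])))
                    advanced = True
                    break
            if not advanced:                   # all neighbours of u explored
                color[u] = BLACK
                time += 1
                f[u] = time
                stack.pop()
    return d, f, pi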

2 Directed Acyclic Graphs

2.1 General

Directed Acyclic Graphs (DAGs) comprise a very important class of graphs that plays a major role in graph applications. A DAG is a directed graph in which no path starts and ends at the same vertex (i.e. it has no cycles). It is also called an oriented acyclic graph. A DAG has at least one source (i.e. a node that has no incoming edges) and one sink (i.e. a node that has no outgoing edges). If it has more than one source, we can create a DAG with a single source by adding a new vertex which only has outgoing edges to the sources, thus becoming the new single (super-)source. The same method can be applied in order to create a single (super-)sink, which only has incoming edges from the initial sinks.

2.2 Topological Numberings and Sortings

Let G = (V, E) be a directed acyclic graph. A topological numbering is a function x : V → {1, . . . , n} such that

x(v) ≥ x(u) for every (u, v) ∈ E.    (2.1)

Respectively, we can talk about a topological sorting of a directed acyclic graph when equation 2.1 is satisfied as a strict inequality. Therefore, a topological sorting is a function y : V → {1, . . . , n} such that

y(v) > y(u) for every (u, v) ∈ E.    (2.2)

It is easy to see that a topological sorting defines a linear ordering of the vertices of G such that if G contains an edge (u, v), then u appears before v in the ordering. Note that a topological sorting assigns distinct integers to the vertices of G. Thus, a topological numbering can easily be derived from a topological sorting. Additionally, the numberings defined above can be extended to weighted numberings if we assign a positive weight w(e) to every edge e ∈ E and add the quantity w(e), e = (u, v), to the right-hand side of inequalities 2.1 and 2.2. Thus a weighted topological numbering of a directed acyclic graph G = (V, E) with a positive weight function w is a function r : V → {1, . . . , n} such that

r(v) ≥ r(u) + w(e) for every edge e = (u, v) ∈ E.    (2.3)

Note that the normal numberings can be derived from the weighted numberings if we set all the weights of the graph equal to zero. Additionally, we say that a weighted topological numbering is optimal if the quantity max_{u,v} |x(v) − x(u)| is minimized. In figure 4, we can see a directed acyclic graph G. The reader can verify that the vector x = [5 4 4 3 3 2 1] is a topological numbering and the vector y = [7 6 5 4 3 2 1] is a topological sorting of graph G. Note that the labels beside the nodes of figure 4 are the identity numbers of the nodes of the graph, and not a topological numbering or sorting.
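As a small illustration of definitions 2.1 and 2.2, the Python sketch below checks whether a given numbering of the vertices is a topological numbering or a topological sorting; the edge list used here is illustrative and is not the graph of figure 4.

def is_topological_numbering(edges, x, strict=False):
    # strict=False checks (2.1): x(v) >= x(u); strict=True checks (2.2): x(v) > x(u)
    for u, v in edges:
        if strict and not x[v] > x[u]:
            return False
        if not strict and not x[v] >= x[u]:
            return False
    return True

edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
x = {1: 1, 2: 2, 3: 2, 4: 2}
print(is_topological_numbering(edges, x))               # True: satisfies (2.1)
print(is_topological_numbering(edges, x, strict=True))  # False: edge (2, 4) violates (2.2)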


Figure 4: A directed acyclic graph.

2.2.1 Computing a Topological Sorting

As a topological numbering can easily be derived from a topological sorting, we will present two algorithms for the computation of a topological sorting. One very clever algorithm for the computation of the topological sorting of a directed acyclic graph G = (V, E) is based on bucket sorting. The algorithm works as follows. Our graph certainly has a source, i.e. a node with in-degree equal to zero. We build up a bucket list with max_{i∈V} {indeg(i)} + 1 entries; each entry i of the list is a linked list of the nodes k with indeg(k) = i. Each node in this list has pointers to its reachable nodes (i.e. the nodes t that can be reached by following the outgoing edges (i, t)). The algorithm removes, one by one, the nodes k with indeg(k) = 0, giving them an increasing topological sorting index and simultaneously decreasing the in-degree of their out-neighborhood by one. The algorithm goes on until every node gets an index. The main idea is that at each step we exclude a node which does not depend on any other. In the bucket structure of figure 5, this maps to removing an entry v1 from the 0th index of the bucket structure (which contains the sources of G), reducing by one the in-degree of the entry's adjacent nodes (which are easily found by following the pointers from v1) and re-allocating them in the bucket. This means that node v2 is moved to the list at the 0th index, thus becoming a source, v3 is moved to the list at the 1st index, and v4 is moved to the list at the 2nd index.


After we subtract the source from the graph, new sources may appear, because the edges of the source are excluded from the graph, or because there was more than one source in the graph to begin with (see figure 5). Thus the algorithm subtracts a source at every step and inserts it into a list, until no sources (and consequently no nodes) exist. The resulting list represents the topological sorting of the graph.


Figure 5: Bucket structure example.

It is easy to prove that the algorithm indeed produces a legal topological sorting, as every directed arc (u, v) is processed such that its origin u is always removed before its destination v. Thus v will never get a lower number than u, and the topological sorting will be legal. Figure 5 displays an example of the bucket structure during the algorithm's execution. Another algorithm for the computation of a topological sorting is based entirely on depth-first traversal:
- Perform DFS to compute the finishing times f[v] for each vertex v.
- As each vertex is finished, insert it at the front of a linked list.
- Return the linked list of vertices.

As far as the time complexity of the algorithm is concerned, DFS takes O(m + n) time and the insertion of each vertex into the linked list takes O(1) time. Thus topological sorting using DFS takes O(m + n) time; a sketch of both approaches in code is given after figure 6 below. Three important facts about topological sorting are:

1. Only directed acyclic graphs can have linear extensions, since any directed cycle is an inherent contradiction to a linear order of tasks. This means that it is impossible to determine a proper schedule for a set of tasks if all of the tasks depend on some previous task of the same set in a cyclic manner.

2. Every DAG can be topologically sorted, so there must always be at least one schedule for any reasonable precedence constraints among jobs.


3. DAGs typically allow many such schedules, especially when there are few constraints. Consider n jobs without any constraints: any of the n! permutations of the jobs constitutes a valid linear extension. That is, if there are no dependencies among tasks, so that we can perform them in any order, then any selection is legal.

In figure 6, we present an execution of the topological sorting algorithm using bucket sorting. Figure 7 shows the possible topological sorting results for three different DAGs.
Figure 6: Execution sequence of the topological sorting algorithm. (a) Select source 0. (b) Source 0 excluded. Resulting sources 1 and 2. Select source 1. (c) Source 1 excluded. No new sources. Select source 2. (d) Source 2 excluded. Resulting sources 3 and 4. Select source 3. (e) Source 3 excluded. No new sources. Select source 4.
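Below is a Python sketch of both approaches described above: a source-removal version in the spirit of the bucket algorithm (a queue of current sources and in-degree counters stand in for the explicit bucket list), and the DFS-based version that prepends each vertex to a list as it finishes. Names and the example graph are assumptions.

from collections import deque

def topological_sort_sources(graph):
    # Source removal: repeatedly take a node with in-degree 0 and delete it.
    indeg = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indeg[v] += 1
    sources = deque(u for u in graph if indeg[u] == 0)
    order = []
    while sources:
        u = sources.popleft()              # a node that depends on nothing
        order.append(u)
        for v in graph[u]:                 # its out-neighbours lose one incoming edge
            indeg[v] -= 1
            if indeg[v] == 0:
                sources.append(v)
    return order                           # shorter than len(graph) iff G has a cycle

def topological_sort_dfs(graph):
    # DFS-based: each vertex is inserted at the front of the list when it finishes.
    seen, order = set(), []
    def visit(u):
        seen.add(u)
        for v in graph[u]:
            if v not in seen:
                visit(v)
        order.insert(0, u)
    for u in graph:
        if u not in seen:
            visit(u)
    return order

g = {0: [1, 2], 1: [3], 2: [3, 4], 3: [], 4: []}
print(topological_sort_sources(g))         # [0, 1, 2, 3, 4]
print(topological_sort_dfs(g))             # [0, 2, 4, 1, 3]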

2.3 Single Source Shortest Path

The computation of the single source shortest path in a weighted directed acyclic graph G = (V, E) can be achieved in time O(m + n) by applying relaxation of its edges according to a topological sort of its vertices. Before the description of the algorithm, we give some basic definitions. We define the weight of a path p = v1 v2 . . . vn as

w(p) = Σ_{i=1}^{n−1} w(v_i, v_{i+1}).

Figure 7: Possible topological sorting results for three DAGs.

We also define the shortest-path weight from vertex u to vertex v by

δ(u, v) = min{w(p) : p is a path from u to v}, if there is a path from u to v, and δ(u, v) = ∞ otherwise.

A shortest path from vertex u to vertex v is then defined as any path p with weight w(p) = δ(u, v). There are several variants of the single-source shortest-paths problem:

1. the single-destination shortest-paths problem,
2. the single-pair shortest-path problem,
3. the all-pairs shortest-paths problem.

We can compute single-source shortest paths in a DAG using the main idea of the bucket-based topological sorting algorithm. Starting from a given node u, we set the minimum path weight δ(u, v) from u to every node v ∈ adj(u) to the weight w(e), e being the edge (u, v), and we remove u. Then we select one of these nodes as the new source u′ and repeat the same procedure; for each node z we encounter, we check whether δ(u, u′) + w(u′, z) < δ(u, z), and if this holds we set δ(u, z) = δ(u, u′) + w(u′, z). We must point out that graph G cannot contain any cycles (thus it is a DAG), as this would make it impossible to compute a topological sorting, which is the algorithm's first step. Also, the initial node u is converted into a source of the DAG by omitting all its incoming edges. The algorithm examines every node and every edge at most once (if there is a forest of trees, some nodes and some edges will not be visited), thus its complexity is O(m + n). The algorithm that will be presented uses the technique of relaxation. For each vertex v ∈ V, we maintain an attribute d[v], which is an upper bound on the weight of a shortest path from the source s to v. We call d[v] a shortest-path estimate, and we initialize the shortest-path estimates and predecessors during the initialization phase. The process of relaxing an edge (u, v) consists of testing whether we can improve the shortest path to v found so far by going through u. If the shortest path can be improved, then d[v] and π[v] are updated. A relaxation step may decrease the value of the shortest-path estimate d[v] and update v's predecessor field π[v]. Next, we present the algorithm that computes the shortest paths of a directed acyclic graph.

Algorithm DAG-SHORTEST-PATHS(G, s)
1  compute a topological sorting of G
2  for each vertex v ∈ V[G]
3      d[v] ← ∞
4      π[v] ← nil
5  d[s] ← 0
6  for each vertex u, taken in topologically sorted order
7      for each v ∈ adj(u)
8          if d[v] > d[u] + w(u, v)
9              d[v] ← d[u] + w(u, v)
10             π[v] ← u

Theorem 2.1. If a weighted directed graph G = (V, E) has a source vertex s and no cycles, then at the termination of the DAG-SHORTEST-PATHS(G, s) algorithm, d[v] = δ(s, v) for all vertices v ∈ V.

Proof. If v is not reachable from s, then it is obvious that d[v] = δ(s, v) = ∞. Now, suppose that v is reachable from s, so that there is a shortest path p = v0 v1 . . . vk, where v0 = s and vk = v. Because we process the vertices in topologically sorted order, the edges on p are relaxed in the order (v0, v1), (v1, v2), . . . , (vk−1, vk). Hence, d[vi] = δ(s, vi) at termination, for i = 0, 1, 2, . . . , k.
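A Python sketch of DAG-SHORTEST-PATHS follows; it reuses the DFS-based topological sort and a (neighbour, weight) adjacency representation, both assumptions. The example edge weights are those read off the caption of figure 8.

import math

def dag_shortest_paths(graph, s):
    # graph: vertex -> list of (neighbour, weight) pairs
    order, seen = [], set()
    def visit(u):                          # DFS-based topological sort
        seen.add(u)
        for v, _ in graph[u]:
            if v not in seen:
                visit(v)
        order.insert(0, u)
    for u in graph:
        if u not in seen:
            visit(u)

    d = {u: math.inf for u in graph}
    pi = {u: None for u in graph}
    d[s] = 0
    for u in order:                        # relax every edge leaving u
        for v, w in graph[u]:
            if d[v] > d[u] + w:
                d[v] = d[u] + w
                pi[v] = u
    return d, pi

g = {'A': [('B', 6), ('C', 2)], 'C': [('B', 3), ('D', 1)], 'D': [('B', 1)], 'B': []}
print(dag_shortest_paths(g, 'A')[0])       # {'A': 0, 'C': 2, 'D': 3, 'B': 4}, as in figure 8(e)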

2.4 Single Source Longest Path

The algorithm for the computation of shortest paths in a directed acyclic graph presented in the previous section can easily be modified in order to compute longest paths in a given graph G = (V, E). To achieve this, we must perform the following changes.

- We set d[u] ← 0 instead of d[u] ← ∞ during the initialization phase.
- We increase the value of the estimate d[v] by changing the direction of the inequality in the relaxation phase.

These changes are necessary because in the case of longest paths we check whether the path being examined is longer than the one previously discovered. In order for every comparison to work properly, we must initialize the distances of all vertices to 0. The time complexity of the algorithm is obviously the same as that of the shortest-path algorithm (i.e. O(m + n)). The pseudocode of the algorithm is similar to the one presented before for the shortest paths problem.

Algorithm DAG-LONGEST-PATHS(G, s)
1  compute a topological sorting of G
2  for each vertex v ∈ V[G]
3      d[v] ← 0
4      π[v] ← nil
5  d[s] ← 0
6  for each vertex u, taken in topologically sorted order
7      for each v ∈ adj(u)
8          if d[v] < d[u] + w(u, v)
9              d[v] ← d[u] + w(u, v)
10             π[v] ← u

At this point, we must comment on the nature of the longest path problem. This problem is in general a difficult one, as a polynomial-time algorithm is known only for the case of directed acyclic graphs. If we try to apply the algorithm to directed graphs with cycles, the computation breaks down, as ever longer walks can be produced by going around a cycle again and again. Hence, longest paths in directed graphs with cycles are not well defined. This case is similar to the shortest paths problem, where negative-weight cycles are not allowed. In fact, the longest path problem in an undirected graph is NP-complete, as we now prove.

Theorem 2.2. The Longest Path problem in undirected graphs is NP-complete.

Proof. In the longest path problem, we are given an undirected graph G = (V, E) and a positive integer k ≤ |V|, and we want to know whether G has a simple path with k or more edges. Obviously, Longest Path belongs to the class NP, as a solution of the problem can be verified in polynomial time (there is a succinct positive certificate). For the NP-completeness, we reduce Hamilton Path, which is a known NP-complete problem, to it. Given an instance G′ = (V′, E′) of Hamilton Path, count the number |V′| of nodes in G′ and output the instance G = G′, k = |V′| − 1 for Longest Path. Obviously, G has a simple path of length |V′| − 1 if and only if G′ has a Hamilton Path.

In figure 8, we present an example of the shortest path algorithm's execution in a directed acyclic graph.
Figure 8: Execution sequence of the shortest-path algorithm. (a) Initial graph; all shortest paths are initialized to ∞. (b) Starting from node A, we set the shortest paths to B and C to 6 and 2 respectively. (c) From node C, we can follow the edges to B and D. The new shortest path to B is 5 because w(AC) + w(CB) < w(AB). The shortest path to D is A → C → D and its weight is the sum w(AC) + w(CD) = 3. (d) From node D, we follow the edge to B. The new shortest path to B is 4 because w(ACD) + w(DB) < w(ACB). (e) From node B there is no edge we can follow. The algorithm has finished and the resulting shortest paths are A → C, A → C → D and A → C → D → B, with weights 2, 3 and 4 respectively.

Using the single-source longest-path algorithm on the same graph, the distance matrix d would be initialized to 0, and at every step of the algorithm we would compare the weight of every new path we discover to the content of the matrix and record the higher value. The resulting longest paths would be A → B, A → C and A → C → D, with weights 6, 2 and 3 respectively.

2.5 The Bellman-Ford Algorithm

The Bellman-Ford algorithm solves the single-source shortest paths problem in the general case, in which edge weights can be negative. Given a weighted directed graph G = (V, E) with source s and weight function w : E → R, the algorithm returns a boolean value indicating whether or not there is a negative-weight cycle that is reachable from the source. If there is such a cycle, the algorithm indicates that no solution exists. If not, the algorithm produces the shortest paths and their weights. Like Dijkstra's algorithm, it uses the technique of relaxation: it maintains an estimate d[v] of the weight of the shortest path from the source s to each vertex v ∈ V until it reaches the actual shortest-path weight δ(s, v).

Algorithm BELLMAN-FORD(G, W, s)
1  for each v ∈ V[G]
2      d[v] ← ∞
3      π[v] ← nil
4  d[s] ← 0
5  for i ← 1 to |V[G]| − 1
6      for each (u, v) ∈ E[G]
7          if d[v] > d[u] + w(u, v)
8              d[v] ← d[u] + w(u, v)
9              π[v] ← u
10 for each (u, v) ∈ E[G]
11     if d[v] > d[u] + w(u, v)
12         return FALSE
13 return TRUE

We will now apply the algorithm to the graph of figure 9; the execution of the algorithm is depicted in figure 10. The Bellman-Ford algorithm runs in O(mn) time, since the initialization takes O(n) time, each of the |V| − 1 passes over the edges takes O(m) time, and the final for loop takes O(m) time.


Figure 9: A directed graph with source A containing a cycle B → C → D → B with positive weight w = 1.
Figure 10: The sequence of steps of the Bellman-Ford algorithm for the graph of figure 9. (a) D[B] becomes 5 and D[C] becomes 3. (b) D[E] becomes 4 and D[C] becomes 1, since D[B] + w(B, C) < D[C]. (c) D[D] becomes 3 and D[E] becomes 3, since D[C] + w(C, E) < D[E]. (d) Nothing changes, since no shortest path can be improved. At this point there are no more relaxation operations to be performed and the algorithm returns the distance matrix D.

The main differences between this algorithm and Dijkstra's algorithm are the following:

1. Dijkstra's algorithm works only with non-negatively weighted graphs, while the Bellman-Ford algorithm also works with graphs containing negative edge weights, as long as there is no negative-weight cycle reachable from the source.

2. Dijkstra's algorithm at every step determines the shortest path to a new node and inserts this node into its special set. On the other hand, the Bellman-Ford algorithm uses no special set; at every step it updates the shortest-path estimates of the nodes adjacent to the node being processed at the current step. Thus, the shortest path to a node is not necessarily determined at an intermediate step, because a new shortest path to this node can be discovered at a later step.
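A Python sketch of BELLMAN-FORD over an edge list; the representation is an assumption, and the example edge weights below are only a reconstruction consistent with the captions of figures 9 and 10, not data taken verbatim from the text.

import math

def bellman_ford(vertices, edges, s):
    # edges: list of (u, v, w) triples; returns (no_negative_cycle, d, pi)
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):     # |V| - 1 passes over all edges
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                pi[v] = u
    for u, v, w in edges:                  # one more pass detects negative cycles
        if d[u] + w < d[v]:
            return False, d, pi
    return True, d, pi

# Edge list reconstructed from the captions of figures 9 and 10 (illustrative).
V = ['A', 'B', 'C', 'D', 'E']
E = [('A', 'B', 5), ('A', 'C', 3), ('B', 'C', -4), ('B', 'E', -1),
     ('C', 'D', 2), ('C', 'E', 2), ('D', 'B', 3)]
ok, dist, _ = bellman_ford(V, E, 'A')
print(ok, dist)                            # True {'A': 0, 'B': 5, 'C': 1, 'D': 3, 'E': 3}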

2.6 All Pairs Shortest Paths

2.6.1 A Matrix Multiplication-like Algorithm

In this section we will present an algorithm for the all-pairs shortest paths problem and we will see how it is connected to the classic algorithm used for matrix multiplication. In fact, the algorithm is a dynamic programming algorithm, and each major loop of the dynamic program invokes an operation that is very similar to matrix multiplication. First, we recall the structure of a shortest path: the subpaths contained in a shortest path are themselves shortest paths between the corresponding nodes. Suppose that our graph is represented by the adjacency matrix W = (w_ij). Now, let d_ij^(m) be the minimum weight of any path from vertex i to vertex j that contains at most m edges. Define d_ij^(0) = 0 if i = j and d_ij^(0) = ∞ otherwise. To compute d_ij^(m) as a function of d_ij^(m−1), we can set

d_ij^(m) = min_{1 ≤ k ≤ n} { d_ik^(m−1) + w_kj }.

But what are the actual shortest-path weights δ_ij? If the graph contains no negative cycles, then all shortest paths are simple and thus contain at most n − 1 edges. This means that a path from vertex i to vertex j with more than n − 1 edges cannot have less weight than a shortest path from i to j. The actual shortest-path weights are therefore given by

δ_ij = d_ij^(n−1) = d_ij^(n) = d_ij^(n+1) = . . .

The algorithm works as follows. It accepts as input the matrix W = (w_ij) and computes a series of matrices D^(1), D^(2), . . . , D^(n−1), where for m = 1, 2, . . . , n − 1 we have D^(m) = (d_ij^(m)). The final matrix D^(n−1) contains the actual shortest-path weights. Observe that since d_ij^(1) = w_ij for all vertices i, j ∈ V, we have D^(1) = W. The heart of the algorithm is the following procedure which, given the matrices D^(m−1) and W, returns the matrix D^(m); that is, it extends the shortest paths computed so far by one more edge.

Algorithm EXTEND-SHORTEST-PATHS(D^(m−1), W)
1  n ← rows[D^(m−1)]
2  for i ← 1 to n
3      for j ← 1 to n
4          d_ij^(m) ← ∞
5          for k ← 1 to n
6              d_ij^(m) ← min{ d_ij^(m), d_ik^(m−1) + w_kj }
7  return D^(m)

We can now see the relation to matrix multiplication. Suppose we want to compute the matrix product C = A · B of two n × n matrices A and B. Then, for i, j = 1, . . . , n, we have

c_ij = Σ_{k=1}^{n} a_ik · b_kj.

Observe that if we substitute d^(m−1) with a, w with b, d^(m) with c, min with + and + with ·, we turn line 6 of EXTEND-SHORTEST-PATHS(D^(m−1), W) into matrix multiplication. So it is easy to obtain the straightforward O(n³)-time algorithm for matrix multiplication.

Algorithm MATRIX-MULTIPLY(A, B)
1  n ← rows[A]
2  for i ← 1 to n
3      for j ← 1 to n
4          c_ij ← 0
5          for k ← 1 to n
6              c_ij ← c_ij + a_ik · b_kj
7  return C

Returning to the all-pairs shortest paths problem, we compute the shortest-path weights by extending shortest paths edge by edge. We define A ⊙ B to be the e-product of two n × n matrices A and B, i.e. the matrix returned by EXTEND-SHORTEST-PATHS(A, B). Similarly, we can define the n-th e-power A^n of a matrix A to be A^(n−1) ⊙ A. In this way, we can compute the sequence of n − 1 matrices as follows:

D^(1) = D^(0) ⊙ W = W
D^(2) = D^(1) ⊙ W = W^2
D^(3) = D^(2) ⊙ W = W^3
. . .
D^(n−1) = D^(n−2) ⊙ W = W^(n−1)

As we argued above, the matrix D^(n−1) = W^(n−1) contains the shortest-path weights. So, we have the following algorithm.

Algorithm SHOW-ALL-PAIRS-SHORTEST-PATHS(W)
1  n ← rows[W]
2  D^(1) ← W
3  for m ← 2 to n − 1
4      D^(m) ← EXTEND-SHORTEST-PATHS(D^(m−1), W)
5  return D^(n−1)

This algorithm runs in O(n⁴) time. We can, however, improve the running time to O(n³ log n) by applying some matrix multiplication properties (repeated squaring).

2.6.2 The Floyd-Warshall Algorithm

In this section we present a very efficient, simply programmed, and widely used algorithm that finds the shortest paths between all pairs of nodes, all at once. Furthermore, it has the important advantage over Dijkstra's algorithm of working when the arc weights are allowed to be negative, and it will in fact allow us to detect negative-cost cycles. The algorithm works with an n × n distance matrix (d_ij), initially set to the arc weights c_ij of the directed graph G = (V, E). For our purposes we assume that c_ii = ∞ for every i. Something very important for the operation of our algorithm is the triangle operation. Given an n × n distance matrix (d_ij), a triangle operation for a fixed node j is

d_ik = min{d_ik, d_ij + d_jk}   for all i, k = 1, . . . , n with i ≠ j and k ≠ j.

Note that we allow i = k. This operation replaces, for all i and k, the entry d_ik with the distance d_ij + d_jk if the latter is shorter.
Figure 11: The triangle operation for fixed j and all other i and k.

Next, we present a theorem that states the correctness of the algorithm.

Theorem 2.3. If we perform a triangle operation for successive values j = 1, . . . , n, each entry d_ik becomes equal to the length of the shortest path from i to k, assuming the weights c_ij ≥ 0.

Proof. We shall show by induction that after the triangle operation for j = j0 is executed, d_ik is

the length of the shortest path from i to k with intermediate nodes v ≤ j0, for all i and k. For the basis, j0 = 1, the claim is clear. Assume then that the inductive hypothesis is true for j = j0 − 1. We must prove that for j = j0 we have

d_ik = min{d_ik, d_{ij0} + d_{j0k}}.

We consider two different cases. If the shortest path from i to k with intermediate nodes v ≤ j0 does not pass through j0, then d_ik is unchanged, as the second argument of the min operation is at least as large; so d_ik satisfies the inductive hypothesis. On the other hand, if the shortest path from i to k with intermediate nodes v ≤ j0 does pass through j0, then d_ik is replaced by d_{ij0} + d_{j0k}. By the inductive hypothesis, d_{ij0} and d_{j0k} are both optimal distances with intermediate vertices v ≤ j0 − 1, so d_{ij0} + d_{j0k} is optimal with intermediate vertices v ≤ j0. This completes the proof.

Next, we present the main body of the algorithm.

Algorithm FLOYD-WARSHALL(W)
1  for i ← 1 to n
2      for j ← 1 to n
3          d_ij ← w_ij
4          π_ij ← null
5  for j ← 1 to n
6      for i ← 1 to n
7          for k ← 1 to n
8              if d_ik > d_ij + d_jk
9                  d_ik ← d_ij + d_jk
10                 π_ik ← j
11 return D

Note that the algorithm uses an extra matrix Π, in which the shortest-path predecessors are stored. It is easy to see that the algorithm runs in O(n³) time. We will now apply the algorithm to a graph G = (V, E) on five vertices, given by its 5 × 5 weight matrix W and the corresponding initial predecessor matrix Π. By applying the algorithm, i.e. by performing the triangle operation for j = 1, 2, . . . , 5, we obtain a sequence of matrix pairs (D, Π); after the last iteration, D contains the shortest-path distances between all pairs of vertices and Π the predecessors on the corresponding shortest paths.
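A Python sketch of FLOYD-WARSHALL on an adjacency-matrix input; 0-based indices and all names are assumptions, and the small input matrix is illustrative (it is not the five-vertex example above).

import math

def floyd_warshall(w):
    # w: n x n matrix, w[i][j] = weight of arc (i, j), math.inf if the arc is absent.
    # The text's variant sets w[i][i] = inf so that d[i][i] ends up as the length of
    # the shortest cycle through i; here the diagonal is simply copied as given.
    n = len(w)
    d = [row[:] for row in w]
    p = [[i if i != j and w[i][j] < math.inf else None for j in range(n)]
         for i in range(n)]
    for j in range(n):                     # triangle operation for every pivot j
        for i in range(n):
            for k in range(n):
                if d[i][j] + d[j][k] < d[i][k]:
                    d[i][k] = d[i][j] + d[j][k]
                    p[i][k] = j
    return d, p

inf = math.inf
w = [[0,   3,   8,   inf],
     [inf, 0,   inf, 1],
     [inf, 4,   0,   inf],
     [2,   inf, inf, 0]]
d, p = floyd_warshall(w)
print(d[2][0])                             # 7: path 2 -> 1 -> 3 -> 0 (4 + 1 + 2)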

2.6.3 Transitive Closure of a Directed Graph

Using a slightly modified version of the Floyd-Warshall algorithm, we can answer the question of whether there is a path that leads from node u to node v, i.e. whether node v is reachable from u. One way to do this is to run a DFS from node u and see whether v becomes a descendant of u in the DFS tree. This procedure, though, takes O(m + n) time, which may be too long if we want to repeat the check very often. Another method is to compute the transitive closure of the graph, which can then answer each reachability question in O(1) time. In order to compute the transitive closure of a graph G, we can use the Floyd-Warshall algorithm with the following modifications:
- Instead of the distance matrix D we use a transitive closure matrix T, which is initialized to the values of the adjacency matrix A of G.
- We change the relaxation part by substituting lines 8-10 of the FLOYD-WARSHALL(W) algorithm with the statement t_ik ← t_ik ∨ (t_ij ∧ t_jk).

Algorithm TRANSITIVE-CLOSURE(G)
1  for i ← 1 to n
2      for j ← 1 to n
3          t_ij ← a_ij
4  for j ← 1 to n
5      for i ← 1 to n
6          for k ← 1 to n
7              t_ik ← t_ik ∨ (t_ij ∧ t_jk)
8  return T

In the above algorithm, matrix A is the adjacency matrix of graph G, defined as follows:

a_ij = 1, if i = j or (i, j) ∈ E
a_ij = 0, if i ≠ j and (i, j) ∉ E

It is evident that the two algorithms presented are very similar. They are both based on a type of algebraic structure called a closed semiring.
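The corresponding Python sketch of TRANSITIVE-CLOSURE, using Boolean entries; 0-based indices and names are again assumptions.

def transitive_closure(adj):
    # adj: n x n 0/1 adjacency matrix; t[i][k] ends up True iff k is reachable from i
    n = len(adj)
    t = [[bool(adj[i][k]) or i == k for k in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(n):
            for k in range(n):
                t[i][k] = t[i][k] or (t[i][j] and t[j][k])
    return t

adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
print(transitive_closure(adj)[0][2])       # True: 0 -> 1 -> 2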

2.7 Program Evaluation and Review Technique (PERT)

2.7.1 General

Program evaluation and review technique (PERT) charts depict task, duration, and dependency information. Each chart starts with an initiation node from which the first task, or tasks, originate.

If multiple tasks begin at the same time, they all start from that node, branching or forking out from the starting point. Each task is represented by a line which states its name or other identifier, its duration, the number of people assigned to it, and in some cases the initials of the personnel assigned. The other end of the task line is terminated by another node which identifies the start of another task, or the beginning of any slack time, that is, waiting time between tasks. Each task is connected to its successor tasks in this manner, forming a network of nodes and connecting lines. The chart is complete when all final tasks come together at the completion node. When slack time exists between the end of one task and the start of another, the usual method is to draw a broken or dotted line between the end of the first task and the start of the next dependent task. A PERT chart may have multiple parallel or interconnecting networks of tasks.

If the scheduled project has milestones, checkpoints, or review points (all of which are highly recommended in any project schedule), the PERT chart will note that all tasks up to that point terminate at the review node. It should be noted at this point that project reviews, approvals, user reviews, and so forth all take time. This time should never be underestimated when drawing up the project plan. It is not unusual for a review to take one or two weeks; obtaining management and user approvals may take even longer. When drawing up the plan, be sure to include tasks for documentation writing, documentation editing, project report writing and editing, and report reproduction. These tasks are usually time-consuming, so do not underestimate how long it will take to complete them.

PERT charts are usually drawn on ruled paper with the horizontal axis indicating time period divisions in days, weeks, months, and so on. Although it is possible to draw a PERT chart for an entire project, the usual practice is to break the plan into smaller, more meaningful parts. This is very helpful if the chart has to be redrawn for any reason, such as skipped or incorrectly estimated tasks. Many PERT charts terminate at the major review points, such as at the end of the analysis. Many organizations include funding reviews in the project's life cycle. Where this is the case, each chart terminates at the funding review node. Funding reviews can affect a project in that they may either increase funding, in which case more people have to be made available, or decrease funding, in which case fewer people may be available. Obviously, more or fewer people will affect the length of time it takes to complete the project.

A PERT network can be modelled as a weighted acyclic digraph (directed graph) in which each edge represents an activity (task), and the edge weight represents the time needed to perform that activity. An acyclic graph must have (at least) one vertex with no predecessors and (at least) one with no successors, and we will call those vertices the start and stop vertices of the project. All the activities which have arrows into a node x must be completed before any activity out of node x can commence. At node x, we will want to compute two job times: the earliest time et(x) at which all activities terminating at x can be completed, and lt(x), the latest time at which activities terminating at x can be

completed so as not to delay the overall completion time of the project (the completion time of the stop node or, if there is more than one sink, the largest of their completion times).

2.7.2 Critical Path Method

The main interest in time scheduling problems is to compute the minimum time required to complete the project, based on the time constraints between tasks. This problem is analogous to finding the longest path of a PERT network (which is a DAG). This is because, in order for a task X to commence, every other task Yi on which it depends must be completed; to make sure this happens, X must wait for the slowest of the Yi to complete. For example, in Figure 12, task X must wait for Y1 to complete, because if it started immediately after Y2, then Y1, which is required for X, would not have enough time to finish its execution.
Figure 12: X must wait for Y1 to complete before it starts.

The longest path is also called the critical path, and this method of time scheduling is also called the critical path method. A simple example of a PERT diagram for the manufacturing of an electronic device is shown in Figure 13.


Figure 13: PERT diagram for the production of an electronic device.
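Finally, a Python sketch of the critical-path computation on an activity-on-edge network: the earliest times et(x) are longest-path values from the start vertex, and the latest times lt(x) are computed backwards from the stop vertex. The tiny example network is illustrative and is not the project of figure 13.

def pert_times(graph, start, stop):
    # graph: node -> list of (successor, duration) pairs; start must reach every node.
    order, seen = [], set()
    def visit(u):                          # topological order by DFS
        seen.add(u)
        for v, _ in graph[u]:
            if v not in seen:
                visit(v)
        order.insert(0, u)
    visit(start)

    et = {u: 0 for u in order}             # earliest times: longest path from start
    for u in order:
        for v, w in graph[u]:
            et[v] = max(et[v], et[u] + w)

    lt = {u: et[stop] for u in order}      # latest times: backwards from the stop node
    for u in reversed(order):
        for v, w in graph[u]:
            lt[u] = min(lt[u], lt[v] - w)
    return et, lt

g = {'s': [('a', 3), ('b', 2)], 'a': [('t', 2)], 'b': [('t', 4)], 't': []}
et, lt = pert_times(g, 's', 't')
print(et['t'])                             # minimum project duration: 6 (via s -> b -> t)
print([u for u in et if et[u] == lt[u]])   # nodes on the critical path: ['s', 'b', 't']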


GPSQL Advanced Topics


Note - This document is intended for individuals who currently have a solid working knowledge of GPSQL but want a more detailed look at the underlying technology, options and query building. Brief review - GPSQL is a front end for a Microsoft Access Database (MDB). Microsoft own front end application called Access is not required on the users system because the core data access objects libraries aka, DAO) is installed along with GPSeismic. MDB databases have the following limitations:

Item
Database size Number of Tables Number of characters in a password Number of characters in a user name or group name Number of concurrent users Number of characters in a table name Number of characters in a field name Number of fields in a table Number of characters in a Text field Number of characters in a Memo field 2 Gb 2048 14 20 255 64 64 99 255

Maximum sizes/numbers

65,535 / 1 Gb

Strategies for large databases - While the maximum file size would seem to allow MDB databases to handle almost any conceivable project, large databases below this size still cause resource problems (read sluggishness). What you can do to avoid this? One key point is that things slow down when you have many records in a table, particularly the POSTPLOT table which has over 50 fields. One thing you can do if you have quite a bit of culture is to keep this in a separate table. Do this by using one of the Append/Create Table routines in the Modifications menu. It s also possible to keep data such as culture in a separate database. Do this by selecting the Culture database in QuikView and using masks to filter normal production or opening the Culture database and using the MDB Table item in the Import menu. Another thing you can do is to use Compact/Repair occasionally, an action analogous to defragging a hard drive. Two others involve liberal use of Aliases and Table Links. Aliases are discussed in detail later in this document. Linking to a table allows you to use a table in another database as if it were just another table in your open database. However, while all reporting routines are available to you, practically all modification routines are unavailable. A recent change is the ability to carry out an action on multiple databases in one step. So if you maintain several databases instead of one extremely large database, the new multiple database actions could help. However, note that the actions you can take with multiple databases are limited to what you see here. Future development We will be moving to the ACCDB database format (which can support 32 and 64 bit architectures) in the next couple of years. This is essentially a recent change in MDB format. It is downward compatible meaning that MDB databases will be supported.

Shared mode notes If you are going to want to access the database using a couple of applications simultaneously, make sure that you have selected shared mode in Project Manager options. For example, if your intention is to launch QuikMap from GPSQL and give QuikMap the ability to update the database, this option has to be selected and in GPSQL, when you get to the field selection dialog, you have a second checkbox to check. The fields to null can be explained by an example. Lets say you move some points in QuikMap and update the database. The grid coordinates change but the latitude and longitude dont. By selecting the latitude and longitude fields here, any points that move result in erasure of the latitude and longitude field contents for those records. Useful Options - There are over a dozen options on the Miscellaneous dialog. Many were placed there to satisfy one or two users. In my opinion the most useful are these:

Create copy of database on startup This does what it says. The name of the backup is the

same name of the database but has a BU extension. Just change the extension to MDB if you need to. Otherwise the backup is re-written each time you open the database. You are prompted for whether to delete when you close the application.

Remember last field selections for each query If this is on, each query (0-99) remembers the last fields you used in the Field Selection dialog. The point is, this might or might not be right. If your query changed, you might be referencing a whole new table so the field selection would be wrong. If you always use the same queries, and IF you use non-default fields, this is for you.. Attempt to restore geodetic parameters when opening a database With this on, every time you
open a database, GPSQL looks for the project PRJ file and automatically changes the coordinate system and geoid model to whats inside.

Confirm/Lock Transformations - Turn these on and when you do something a coordinate

system (e. g., xy -> lat/lon conversion), if your coordinate system selection doesnt match what the current database coordinate system is, you get slapped on the wrist. User Interface (UI) Subtleties If you are training someone and want them to perform a number of routines in a particular order, you could write them down on a piece of paper, but a cooler way is to set this up in the ToDo list available in the Current Settings panel of the UI. Essentially, it starts the user out on the road to performing each function. Theres two text boxes at the bottom of the UI. You can enter coordinates here and launch Google Earth or your browser (and Google Maps) with the results.

Adding Fields To The Database You can click on a table, rightclick and select Add Field to add a field to a table. Generally, a field is numeric or alphanumeric (string type). If you add a string, you must specify a length. If you specify a numeric field, you need to choose what type. If you dont know, make it a double which can be very small or large values. If you need to change the length of a string field, it can be done without losing data. Note that when you create a project, there is a new feature that allows you to create a number of user fields. I personally feel you should limit the number of user fields and an option to them are often something called aliases which are discussed later in this document. A couple of options when adding a field is to initialize each record with a value or string or in the case of a long type, a unique value.

Advanced Query Building - Its assumed that the reader can and has built a number of queries and requires no instruction about constructing a query with multiple criteria. Therefore, this document skips over this aspect of query building. We will cover three items: 1) Query building tools that you might not realize are there, 2) Joins, and 3) Aliases. Distinct Key Word Lets make this brief. This isnt a very useful item. If you use it, and there are exact duplicates based on the field selection of your query, then only one record is returned. For example, consider this query: Select [POSTPLOT].`Track` From [POSTPLOT] You will get every record in the database. However, add the DISTINCT keyword, e. g.: Select Distinct [POSTPLOT].`Track` From [POSTPLOT] And you will get a list of unique Track numbers. You could possibly use it to find exact record duplicates and indicate that you need to purge the database. For example, consider the following: Select DISTINCT [POSTPLOT].`WGS84 Latitude`,[POSTPLOT].`WGS84 Longitude`,[POSTPLOT].`Survey Time (GMT)` From [POSTPLOT] If the number of records without the DISTINCT key word is greater than with it, you probably have to use the purge routine. Miscellaneous Query Helpers The query builder has a few helpful items to assist you in entering criteria using some seldom used keywords. These include Between and In. They are also useful in helping you identify fields without contents. Wildcard Characters One thing you might not know is that there are a couple of different flavors of SQL floating around. One flavor uses the asterisk (*) as a wildcard and a question mark

(?) as a single character wildcard. Another flavor uses the percentage sign (%) and the underscore (_) for the same wildcards. GPSeismic uses the former, however, when you write a query and display it in the display viewer (which uses the second flavor), our software analyzes the query and replaces any asterisks with percentage signs and any question marks with underscores before executing the query. At present, we don't do anything with underscores and percentage signs you might use in queries displayed by the display viewer because its difficult to figure out what context they are being used. Keep this in mind because you might get some unexpected results. So how do you write a query to find all records where a string field has a wildcard? You would bracket it. So instead of... Select [POSTPLOT].* From [POSTPLOT] Where [POSTPLOT].`Station (text)` Like '*?*' ...you would write... Select [POSTPLOT].* From [POSTPLOT] Where [POSTPLOT].`Station (text)` Like '*[?]*' The same holds true for any wildcard.

Group By queries Group By queries are often used to do what their name implies, that being to create groups of records based on some field. For example, you could group records by Survey Mode. There is a query that will actually display the totals but you cant build it from the query building dialog. There is a Table menu item that will build the Group By query for you. Lets say your pack operators are assigned specific GPS receivers so you can determine the operator by the receiver serial number. Let us further assume you want to see a total of all points grouped by receiver serial number (pack op). This is what you do: 1- select an unused query 2- click on the field Receiver SN on the main interface 3- Right click to display the Table menu and select Build Group By Query item 4- Configure the dialog above. Note the fact that you can add a Where clause from an existing query if you want. Once you are finished, press the OK button. SELECT [POSTPLOT].`Receiver SN` , COUNT( [POSTPLOT].`Receiver SN`) AS Total FROM [POSTPLOT] GROUP BY [POSTPLOT].`Receiver SN` Displaying the query will give you something like this, where you get all distinct occurrences of receivers and how many there were. Unfortunately, GPSQL will balk at creating a custom or special report, but you can always copy and paste or export a CSV file. Importantly, this is useful data that gives you how many points were shot by each pack op. You can modify the query, but you have to do it manually. Note that the Where clause goes right before the GROUP BY. Another useful Group By query is this one: SELECT [POSTPLOT].`Track` , COUNT( [POSTPLOT].`Track`) AS Total FROM [POSTPLOT] GROUP BY [POSTPLOT].`Track` This gives you a list of all tracks and how many bins for each. You might want to modify it with a Where clause to ensure culture and other records dont get included: SELECT [POSTPLOT].`Track`, COUNT([POSTPLOT].`Bin`) AS Total FROM [POSTPLOT] Where ([POSTPLOT].`Track`>0) GROUP BY [POSTPLOT].`Track` The following query is a tricky way to produce a list that allows you to spot duplicates: SELECT [POSTPLOT].`Station (value)`, [POSTPLOT].`Bin` , COUNT( [POSTPLOT].`Station (value)`) AS Total FROM [POSTPLOT] GROUP BY [POSTPLOT].`Station (value)`, [POSTPLOT].`Bin`

If there were no duplicates based on Station (value), there would be a count of 1 for each record. Click on the spreadsheet column head twice to order in a descending manner and the duplicates will rise to the top.

COUNT is something called an aggregate function. There are several others, including AVG, SUM, MAX and MIN. Sometimes they can be used effectively to isolate information, but they have to be defined manually; the Group By builder can't do it. For example, suppose we wanted a list of each receiver serial number and the highest PDOP recorded by each. This query would do it:

SELECT [POSTPLOT].`Receiver SN`,MAX([POSTPLOT].`PDOP`) AS MaxPdop FROM [POSTPLOT] Group By [POSTPLOT].`Receiver SN`

Change the MAX to AVG and we get a concise list of the average geometry each pack operator is utilizing for his overall work.
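For reference, the AVG variant just mentioned would look something like the sketch below; the alias name AvgPdop is simply my choice.

SELECT [POSTPLOT].`Receiver SN`,AVG([POSTPLOT].`PDOP`) AS AvgPdop FROM [POSTPLOT] Group By [POSTPLOT].`Receiver SN`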

Table Joins
Queries that employ table joins are useful for a number of reasons. For example, if you wanted to generate a report in which each record contained both the survey coordinates and the preplot coordinates, then a table join will do the trick:

Select [POSTPLOT].*,[PREPLOT].* From [POSTPLOT],[PREPLOT] Where [POSTPLOT].`Station (value)`=[PREPLOT].`Station (value)`

Notice the criteria, by which we only generate records where there is a match in the field Station (value). This join is an effective way not only to make the fields of both tables available to you, but also to ensure that you only have points that have been surveyed.

Table join downsides - There are only a couple of cautions. One is that if you try to execute a join without a field specified to join the tables on (Station (value) in the query above), the query will return a number of records equal to the number of records in one table times the number in the other. This means that if you have 50,000 records in both tables, the query will return 2.5 billion records!
The other downside is that table joins are not actually editable, so practically all modification routines will fail. One notable exception is updating a field using the Update dialog.

Left and Right Joins
The default join is called an Inner join; the Inner keyword is not required. There are other types of joins, called Left and Right, that look exactly like the query above but do contain the Left and Right keywords.
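For completeness, the inner join shown earlier can also be written with an explicit Join...On clause; this is a sketch only, and assumes it behaves the same as the Where form above.

Select [POSTPLOT].*,[PREPLOT].* From [POSTPLOT] Inner Join [PREPLOT] On [POSTPLOT].`Station (value)`=[PREPLOT].`Station (value)`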

Here is a Left join:

Select [POSTPLOT].*,[PREPLOT].* From [POSTPLOT] Left Join [PREPLOT] On [POSTPLOT].`Station (value)`=[PREPLOT].`Station (value)`

And here is a Right join:

Select [POSTPLOT].*,[PREPLOT].* From [POSTPLOT] Right Join [PREPLOT] On [POSTPLOT].`Station (value)`=[PREPLOT].`Station (value)`

As you can see, they are exactly the same except for the single keyword. However, what they do is totally different. The Left join will produce all records of the table listed immediately after the Select keyword, along with matching records of the other. If, for a particular record, there is no match, you still get the fields of the other table but they have no content. For example, let's say we had 4 records in the Postplot table and 2 in the Preplot table. A Left (Postplot) table join query would produce 4 records:

Postplot record 1 Postplot record 2 Postplot record 3 Postplot record 4

Preplot matching record 1 Preplot matching record 2

A Right join for the same tables would produce 2 records. In this case both Preplot and Postplot fields would be filled, because each existing Preplot record has a match. In GPSQL, tables are always listed alphabetically, therefore a Left join is always as described above, namely, all records of the Postplot table.

We could use this to our advantage to find all survey data that does not have a match in the Preplot table. This could be useful information because it essentially represents everything we surveyed without a preplot (no match). The key is to take the Left table join query and add some criteria that eliminates the matches. The query below will do that:

Select [POSTPLOT].*,[PREPLOT].* From [POSTPLOT] Left Join [PREPLOT] On [POSTPLOT].`Station (value)`=[PREPLOT].`Station (value)` Where ([PREPLOT].`Station (text)` Not Like '*')

Aliases
An alias is an expression in a query that appears to add a field to the table! However, it's really not a field, and this is a good thing because it allows you to enhance your reporting and other capabilities without increasing the size of the database. As our first example, let's say you have to produce a SEG file where each record has the format below:

R5121 51211189 02491672N074265872W25987965 3290073 0

This is a SEG file that has an R and the track number in the first few columns of the record. What do you do? You could make the file and go at it with a good text editor, but that's fairly labor intensive.

When you build a query, go to the Alias tab page and make it look like this:

Now when you use that query in any way, you will notice an extra field called MyCharacter that has an R in it:

This alias field is also available in the custom and seismic report builder like any other field. The actual query looks like this:

Select "R" AS `MyCharacter`,[POSTPLOT].* From [POSTPLOT]

You should note the syntax of an alias is

Expression AS Name
So in the above query, our expression was simply one character (surrounded by quotes) and we gave it the name MyCharacter. However, this is still not exactly what we want, because we want to see R5121, not just R. Here is where we step it up a notch and make the alias a bit more complex:

Notice we added an ampersand which is used to concatenate items, and the Track field.

This displays as:

And the query looks like this:

Select "R" & [POSTPLOT].`Track` AS `MyCharacter`,[POSTPLOT].* From [POSTPLOT]

Now you have everything you need to make the file. In the field selection dialog, swap MyCharacter with Descriptor. Here are a number of aliases and what they do:

Simple String Manipulation

Creates a field that is all uppercase. Use LCASE for lower case.
Select UCASE([POSTPLOT].`Descriptor`) AS `MyAlias`,[POSTPLOT].* From [POSTPLOT]

Creates a field that is the leftmost 4 characters of the Station (text) field. Use Right to get the rightmost characters.
Select Left([POSTPLOT].`Station (text)`,4) AS `MyAlias`,[POSTPLOT].* From [POSTPLOT]

Creates a field which is the three characters of the Station (text) field starting from the second character.
Select MID([POSTPLOT].`Station (text)`,2,3) AS `MyAlias`,[POSTPLOT].* From [POSTPLOT]

Removes the spaces from either side of a string field. There is also LTRIM and RTRIM to remove left and right spaces only.
Select TRIM([POSTPLOT].`Station (text)`) AS `MyAlias`,[POSTPLOT].* From [POSTPLOT]

Date Related Aliases

Creates a field that is the number of days ago the point was shot.
Select DateDiff("d",[POSTPLOT].`Survey Time (Local)`,Now) AS `MyAlias`,[POSTPLOT].* From [POSTPLOT]

Creates a field that is the Julian Day the point was shot.
Select DateDiff("d","01/01/12",[POSTPLOT].`Survey Time (Local)`) AS `MyAlias`,[POSTPLOT].* From [POSTPLOT]

Coordinate/Height Manipulation Aliases

Creates a field that is the difference between local height and DEM height (assuming the latter exists).
Select [POSTPLOT].`Local Height` - [POSTPLOT].`DEM Height` AS `MyAlias`,[POSTPLOT].* From [POSTPLOT]

There are two aliases here that use the offset fields and survey points to produce the original preplot.
Select [POSTPLOT].`Local Easting` - [POSTPLOT].`Offset (East)` AS `Preplot Easting`,[POSTPLOT].`Local Northing` - [POSTPLOT].`Offset (North)` AS `Preplot Northing`,[POSTPLOT].* From [POSTPLOT]

Yes, you can use aliases with table joins. There are two aliases here that produce the difference between the preplot and postplot coordinates.
Select [POSTPLOT].`Local Easting` - [PREPLOT].`Local Easting` AS `DX`,[POSTPLOT].`Local Northing` - [PREPLOT].`Local Northing` AS `DY`,[POSTPLOT].*,[PREPLOT].* From [POSTPLOT],[PREPLOT] Where [POSTPLOT].`Station (value)`=[PREPLOT].`Station (value)`

If you need an instant stub line, try this tactic. Create a query that isolates a desired line and then create an alias that manipulates the coordinates appropriately. Send to QuikMap, replacing the default coordinate fields with the alias.
Select [PREPLOT].`Local Northing` - 500 AS `MyAlias`,[PREPLOT].* From [PREPLOT] Where ([PREPLOT].`Track`=1203)

The following are examples of aliases using the IIF keyword. The syntax is like this:

IIF ( Some Expression , Do this if true, Do this if false)

So after the IIF keyword, there are three comma-delimited items in parentheses. If the expression is true, the alias displays the second argument, and if it's false, the alias displays the last argument.

Creates a field that says good dop or bad dop.
Select IIF([POSTPLOT].`PDOP`>2,"bad dop","good dop") AS `GoodDopBadDop`,[POSTPLOT].* From [POSTPLOT]

Creates a field that is whatever is in the survey mode (text) field unless the survey mode (value) field is 3.
Select IIF([POSTPLOT].`Survey Mode (value)`=3,"SuperFine",[POSTPLOT].`Survey Mode (text)`) AS `ReplaceField`,[POSTPLOT].* From [POSTPLOT]

A very complex example. There are two aliases here which display arrows indicating the cross line offset.
Select IIF ([POSTPLOT].`Offset (Crossline)`>.2,"->","") AS `OffsetRight`,IIF ([POSTPLOT].`Offset (Crossline)`<-.2,"<-","") AS `OffsetLeft`,[POSTPLOT].* From [POSTPLOT] Where (abs([POSTPLOT].`Offset (Crossline)`)>=.2) And ([POSTPLOT].`Track`>3000)

More IIF Aliases

As above but also indicates cross line distance.
Select IIF ([POSTPLOT].`Offset (Crossline)`>0,INT([POSTPLOT].`Offset (Crossline)`) & "->","") AS `OffsetRight`,IIF ([POSTPLOT].`Offset (Crossline)`<0,"<-" & abs(INT([POSTPLOT].`Offset (Crossline)`)),"") AS `OffsetLeft`,[POSTPLOT].* From [POSTPLOT] Where abs(INT([POSTPLOT].`Offset (Crossline)`)) > 10

As above but includes in-line information and is more verbose.
Select IIF ([POSTPLOT].`Offset (Crossline)`> 0,IIF([POSTPLOT].`Offset (Inline)`>0,"Offset to right and ahead","Offset to right and back"),IIF([POSTPLOT].`Offset (Inline)`>0,"Offset to left and ahead","Offset to left and back")) AS `Offset`,[POSTPLOT].* From [POSTPLOT] Where ([POSTPLOT].`Station (value)`>0)

A Couple More Using the IsNumeric and IsNull Keywords

This IIF alias does something based on whether the contents of a field are numeric.
Select IIF(IsNumeric([POSTPLOT].`Descriptor`), "OilWell",[POSTPLOT].`Descriptor`) AS `IsANumber`,[POSTPLOT].* From [POSTPLOT]

This IIF alias does something based on whether there is anything in a field.
Select IIF(IsNull([POSTPLOT].`Offset (Range)`),"undefined",Int([POSTPLOT].`Offset (Range)`)) AS `IsNothing`,[POSTPLOT].* From [POSTPLOT]

Here's a list of all keywords that I know work:

VAL (exp) - the value of the expression
CSTR (exp) - the string value of the expression
SIN (exp) - the trigonometric sine of the expression
COS (exp) - the trigonometric cosine of the expression
TAN (exp) - the trigonometric tangent of the expression
INT (exp) - integer portion, but converts -8.4 to -9, for example
FIX (exp) - integer portion, but converts -8.4 to -8, for example
SQR (exp) - square root of the expression
ABS (exp) - absolute value of the expression
LOG (exp) - natural logarithm of the expression
TRIM (exp) - removes any spaces left or right of the expression
RTRIM (exp) - removes any spaces to the right of the expression
LTRIM (exp) - removes any spaces to the left of the expression
MID (exp,i,j) - returns the number of characters specified by j, starting at i
IIF (exp,a,b) - returns a if the expression is true, otherwise returns b

DateDiff (format, date1, date2) - returns the difference between two times, where format is as shown below:

Year - yyyy
Quarter - q
Month - m
Day - d
Week - ww
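As a hedged example built from the formats above, a weeks-since-a-reference-date alias might look like the following sketch; the reference date and the alias name are placeholders.

Select DateDiff("ww","01/01/12",[POSTPLOT].`Survey Time (Local)`) AS `MyAlias`,[POSTPLOT].* From [POSTPLOT]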

String (exp1, exp2) - returns expression 2 a certain number of times dictated by expression 1
IsNumeric (exp) - returns -1 if the expression is a number (true) and 0 if not (false)
IsNull (exp) - returns -1 if the expression has no contents (true) and 0 if it does (false)
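A minimal sketch using the String keyword, assuming the behavior described above; the repeat count, the character and the alias name are all arbitrary choices here.

Select String(5,"0") & [POSTPLOT].`Station (text)` AS `MyAlias`,[POSTPLOT].* From [POSTPLOT]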

And finally, a slightly special case:

Exp MOD n - takes the expression, divides it by n and returns the remainder
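A minimal sketch of a MOD alias, assuming MOD can be used in an alias the same way as the other keywords above and that you want to flag every tenth station; only stations divisible by 10 get a zero in the alias field, and the alias name is arbitrary.

Select [POSTPLOT].`Station (value)` MOD 10 AS `MyAlias`,[POSTPLOT].* From [POSTPLOT]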

A Few Final Facts About Aliases

Expressions can be used elsewhere - One important thing to keep in mind is that what you have learned here about aliases translates well to the topic of updating a field. If you actually want to modify a field, you can do that by selecting your query, selecting the Update Field(s) item from the Modifications menu, choosing the field you want to modify and finally picking an option. One of those options is to make the selected field equivalent to an expression. The expression can involve any of the keywords we have seen here. So, for example, if you had actually added a string field to your Postplot table and wanted to make it equivalent to the Track and a preceding character, the expression might look like this:

"R" & [POSTPLOT].`Track`

Using an Alias to flag points in QuikMap
One thing you can't do is put the cart before the horse; another is to use the result of an alias to do something else. For example, if you try to write a query where the criteria involve an alias, it won't work. This won't work:

Select [POSTPLOT].* From [POSTPLOT] Where [POSTPLOT].`Local Height` -100 AS `MyHt` = 0

However, we did put a special function in QuikMap to deal with the result of aliases. Let's say you have an initial layer in QuikMap and you want to turn all points divisible by 4 to hit status. What you could do is display the points in the display viewer (using the Outputs dialog), and then build a query with an alias as shown here:

[PRIMARY].`Station` MOD 4

This produces a field where only the stations divisible by 4 are zero. The rest have values greater than zero.

From the File menu select Turn Current Query Records Hit Using Alias. This displays a dialog that allows you to specify which of the alias values to use for turning the points hit. There are some more complicated aliases you can create that allow you to pick out specific points. Recently, a client had interpolated half stations in QuikMap and wanted to isolate only 104.5, 106.5, 108.5, etc. The following alias created a field where only the desired records had values of zero:

([PRIMARY].`Station` - 104.5) MOD 2 = 0

Essentially, the alias takes each station, subtracts 104.5 from it, divides the result by 2 and returns the remainder.

Aliases can be used for all reporting purposes - Try to remember that aliases add flexibility to almost all reporting routines. A case in point is the production statistics routine, also known as the mileage calculator. Recently, a client requested that the entire routine be modified so as to include the BOL and EOL grid coordinates. This wasn't necessary, because when the field selection dialog is presented you can simply replace the default Descriptor field with an alias, that alias being something like:
INT ([POSTPLOT].`Local Easting`) & "/" & INT([POSTPLOT].`Local Northing`)
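Written out as a full query rather than just an expression for the field selection dialog, that alias might look like the following sketch (the alias name is arbitrary):

Select INT([POSTPLOT].`Local Easting`) & "/" & INT([POSTPLOT].`Local Northing`) AS `MyAlias`,[POSTPLOT].* From [POSTPLOT]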

Using GPSQL To Assist You In Creating Maps
GPSQL has a number of tools that allow you to create layers for maps. This essentially comes down to DXF and SHP files. If you are a GPArc or ArcView user, you probably want to stay with SHP files because these mapping applications render these types of files efficiently. There are two utilities to create multiple files quickly. The first described here creates SHP files only.

Multiple Map Utility (Custom DBF) - This utility allows you to create one point shape file for any or all GPSQL queries. The queries you select must not be table joins and must be either POSTPLOT or PREPLOT table queries. A shape file is made for each query you select (by checking the OK checkbox). The file name is specified in the third column. If you accidentally use the same name, the file made first will be overwritten, so be careful when naming files. The Station, Easting and Northing fields will ALWAYS be included in the DBF, so there is no need to select them using the '+' checkbox. You can select up to 15 additional fields. If you select more than 15, only the first 15 will be used. The NA checkbox will indicate any field that cannot be reliably entered in the DBF file (for example, binary fields). Remember that DBF field names will be different from the MDB field names in many cases. This is because DBF field names are limited to 10 characters and cannot have many of the special characters allowed in MDB databases. Once you have specified the files you want to create and a folder to create the files in, you can create all of the files with a single button press.

Multiple Map Utility - In order to use the Multiple Map Utility, you first select a database and a folder to create the files in, using the buttons at the bottom of the dialog. Then you construct up to twenty map layer configurations. The parameters for a map layer must include the selection of one of twenty queries (that you build), whether the map layer is SHP or DXF type, and several settings that are applicable to that type of file.

The number of bin digits is important if you intend to create line files identified by track. You can build a number of DXF and SHP point and line files. For either of these, options exist for whether to create selected points only (line ends, every nth, etc.).

SHP lines along tracks - The user must enter a value that dictates where to break the lines. For example, if you enter 10, then when the difference between two consecutive stations (ordered by their station numbers) differs by more than 10, this represents a break in the line. Note that because of the numbering associated with some oblique grids and the gap nature of brick patterns, you probably don't want to use this for those types of preplots.

SHP lines connecting postplot and preplot - If this is selected, it's assumed you selected a Postplot query that represents the points you want to connect to their Preplot counterparts. It also assumes that this is a standard GPSeismic database that contains the Preplot table and that it is populated. If it doesn't detect all of these conditions, it does nothing.

DXF points - As with SHP, options exist for whether to create selected points only. The user also specifies the symbol and all associated parameters (size, layer and associated text).

DXF lines along tracks - This is similar to SHP, but you have a number of line labeling options.

What the Token? - To my knowledge, this feature is only used by a couple of people, but just so you know how it works, here's the gist. If a query is constructed that has the string <Token> in it, and in configuring the map layer you indicate that the token string is OilWell, then when the query is executed, the word OilWell replaces the word <Token>. Why do this? Because if you want to configure a separate map layer and use a second string, say GasWell, you can. So in summary, it's a way you can use one query to produce several map layers. Confused? Don't be. I doubt you'll ever use it.

Creating queries - This is the standard query building dialog, so hopefully you don't have much of a problem here. Note that one query might actually be used for more than one map layer. For example, suppose you isolated the source points with a query. This one query could be used for the point layer, the lines along tracks layer and the postplot/preplot connecting line layer.

IMPORTANT - Press this after you have made any changes to a map layer configuration.

Are you feeling lucky? - If you are new to this utility and want to jumpstart yourself, after selecting your database and folder, press this button. It will instantly configure the utility for eight map layers consisting of postplot and preplot points, along-track lines, connecting lines and culture (as defined by alpha station names).

Daily use of the utility - Here's the deal. The first time you use this, you should create the preplot files, or perhaps you want to create the preplots using one of GPSQL's query routines. However, on a daily basis, you re-create all of the postplot data, re-writing the file you created the day before. This is a one-click procedure once the utility is configured. Hopefully, your mapping application is like GPArc in the sense that you create your map by rendering particular layers in a certain manner and then save this to a file which contains the rendering settings. After you recreate all your layers with the utility, you simply re-display your map using your rendering settings. GPArc is blissfully unaware that there are now more points in the files.

Creating SHP and DXF Files From Selected Queries

SHP Point - When you select a query and create a SHP point file, your options on the field selection dialog include point selection (BOL, EOL, ...) and whether you want all table fields to go into the underlying database.

SHP Line - There are three options for how to connect the points. One will create lines between each pair of points and allows you to enter a value which is used to break the line between two points whose station values differ by more than this amount. One option creates one line per track (requiring the correct number of bin digits be entered), and one option connects all the points in the query.

SHP Polygon - You have two choices here: create one polygon with all the points in the query, or one polygon per track. For the latter choice, your field procedures for surveying a number of lakes or archaeological sites, for example, would involve coming up with a numbering scheme like 1001, 1002, 1003, and so on for one feature, then 2001, 2002, 2003 for the next, and so on. You could then use 3 for the number of bin digits and GPSQL would create the individual polygons.

SHP Contours - This is a relatively new feature. You are first prompted for the name of the SHP file to create. On the field selection dialog, you need to specify the easting, northing and z-field. The latter is typically the height, but remember that you can use any field to represent 'z'. Some interesting plots can be made using precision, dop, etc. On the field selection dialog, you also enter the contour interval. You can leave this at 0 for automatic.

The second item entered is the number of grid nodes. This is an important value and requires some explanation. Behind the scenes, the routine must first create a digital elevation model (DEM) from the points in the query. The evenly spaced DEM is then used to create the contours. The best-case scenario is that the points in your query exhibit good spatial separation, that is, the entire area to be contoured is neither narrow (in any direction) nor bunched up (e.g., many points in one area and none in another). The grid node value is used to evenly divide up the spatial extents of your data. For example, with a value of 100, the 'behind the scenes' DEM consists of 10,000 grid nodes (100 x 100). From experience, the number of grid nodes should be fairly high (say 100 or more). The maximum is 2000 and will provide the highest resolution set of contours (at the expense of the time to create them).

DXF files - You can create DXF point or line files. For line files, you have a number of selections
for how to label the lines.

Selected point Annotations - This obscure routine will use the point selection options on the field selection dialog to place specific strings in a specified field. It will place the string BOL at the first one (or two) points of a track, EOL at the last one (or two), and the word BOX for all of the others. Why do this? Because some mapping applications (like GPArc) can conditionally label points based on the contents of certain fields of the underlying database. Therefore, this gives you a good way of labeling a point selection that would otherwise be impossible to label.

Vibe Data Processing Tools
GPSQL and QuikMap have several tools that allow you to deal with vibe data.

Import - There are import routines for the most common vibe files, including Pelton, Sercel and TigerNav. The tables resulting from the import are very similar to the Postplot table through the first dozen fields or so. You will also notice that if the file type supports it, there will be a vibe ID field and a sweeps field. There is provision for transforming coordinates on input and for applying a geoid height correction in order to obtain local heights. Caution: I personally think there is a conspiracy to include header records in some of these files that are difficult if not impossible to recognize. So try removing header records before importing.

Table averaging - When data is imported and you have a query to select your vibe data, you can select Append/Create Table Of Averaged Data. This averages all points with the same station name. However, there are two ways to come up with the average. One is to lump all coordinates together to come up with the average, and the other is to come up with averages for each individual vibe, and then average those averages. This latter method is the more logical one. On the field selection dialog (or Vibe Data Options dialog), there is also an option for whether to add some fields to the parent table indicating the distance of each position to the resulting average. This information is good for spotting problems, since a large value would normally indicate one or more positions were outliers. Regardless of what you choose to do here, you will still get similar information for each record in the table of averaged data.

QuikMap and the Multiple Comparison Layer - After the parent table and the averaged table exist, one of the most graphic things you can do is send two queries to QuikMap (in this case our parent table and our averaged table), and select File/Compare Offsets from the File menu in QuikMap. Once you do, it will be fairly obvious where there are problems. Also, there will be a File/Compare sub-menu item called Multiple that will allow you to display some diagnostic pointers to make it clear where certain groups have missing or additional positions and high COG statistics.

Duplicate manager - The Duplicate Manager is a useful tool for identifying trouble spots in the parent table vibe data. It's set up to use sweeps information and can summarize the time duration of the sweeps. It's possible to edit data in the Duplicate Manager, including deleting records.

Seismic file (Vibe Positions) - This is a relatively new feature that does not necessarily analyze imported vibe data, but rather creates SEG and other types of ASCII coordinate files. Specifically, you specify a query of positions and a template (TPL) file. The TPL file is a small file which defines points relative to a center point and includes a reference azimuth. What the routine does is use the offset information of the TPL file to create a group of points for each point in the query. Possible uses include generating preplots for all individual vibes or creating a file to compare to the vibe positions as recorded.

Query Actions You Should Know About

Production utilities - These are handy for determining production in both a tabular and a graphic manner. The Production Statistics utility has been around for a while and was formerly called the Mileage Calculator. It gives you a BOL/EOL summary for the selected query. The Production Calendar requires that you specify the start and stop dates for the selected query and a station interval (and of course the bin digits). There is an auto-select button that analyzes the query to determine the start and stop dates for you. The resulting calendar can have the total linear distance or line-by-line totals. If you have too many distinct lines, I'd opt for the totals only. Right click on the resulting calendar and you can display daily, weekly or monthly views. The Production QuikMap/SHP utility will require that you specify the colors you want particular days displayed in. Then you can create a QuikMap plot of your prospect colorized to match your selection. This is a great way to tell what was shot on what day(s). In the example here, we have used ramp colors but specified that one particular day was to be plotted in green. Below is the resulting QuikMap plot:

So the next time the client asks where the pack ops were working on a particular day, you can show them. This utility can also create a SHP file of polygons where the underlying database contains day of year, day of month, and day of week. In this manner the polygons can be rendered to the user's tastes. Other utilities include the PackOp Summary and Google Earth time KML files.

GPSQL/QuikMap And The Mapping Table
If you need to isolate records in the database that fall in distinctly different geographical areas, it can be hard (or impossible) to do using a single table query. The Mapping table is a special table that can be created by first sending a query to QuikMap. Once in QuikMap, you would lasso a desired subset of points and, when you display the popup menu, select the Graphical Query item. GPSQL will report that there is data to process. Say YES and a table called Mapping is created. The table contains two fields (Station (value) and Station (text)) which will initially contain all the stations you lassoed. Repeat the procedure and you will be allowed to append to the table. The significance of this is that you now have a means of performing a table join to isolate only the records you want.
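A minimal sketch of such a join, assuming you want the Postplot records for the stations you lassoed (the Mapping table and its field names are as described above):

Select [POSTPLOT].* From [POSTPLOT],[Mapping] Where [POSTPLOT].`Station (value)`=[Mapping].`Station (value)`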

Interpolation utilities

Interpolate - The standard interpolation routine is not new. You select a query, configure the field selection dialog and are then presented a dialog which shows a plan view and height profile of the data. Your query is normally one that isolates one specific track or line. The key items for interpolation are in the Preferences menu and the most important item is the increment. Other items are seldom used. Note that all interpolated points have a descriptor with the word interpolated in it. When you create interpolated points using the Postplot table, very few of the fields actually get populated. More often than not, only the station, grid coordinates and height fields are filled. However, there is a utility for filling those fields, described below.

Interpolate All - This bad boy interpolates all points within a query based on a user-supplied increment. The station gap dictates where not to interpolate points. For example, with a value of 100, if the difference between two consecutive stations differs by more than this amount, no points are interpolated between the two. This utility might be preferable to the one above; it depends on whether you need the sophistication of the interpolation dialog and all it brings to the table. Note that all interpolated points have a descriptor with the word interpolated in it.

Interpolate Heights - As the name implies, this utility allows you to interpolate heights. It's done graphically. You want to note that on the field selection dialog, the required fields are station, height and descriptor, and the latter field is used to write the string interpolated into.

Fill Postplot Table - This utility can be used to fill most of the blank fields in interpolated records. You really need to make sure your current coordinate system, geoid model and number of bin digits are specified correctly.

DEM Utilities
There are two basic things you can do: 1) create a DEM and 2) use a DEM to create an additional field with what the DEM says the height is.

Create A DEM - You select a query and, based on the easting, northing and height, a DEM is created. You do need to enter the number of grid nodes the DEM is to contain. The DEM could be pretty good if the points have good spatial extent. If they don't, the DEM is still created, but DEM heights in areas where there were no points are basically interpolated (if points exist to either side of the void). Note that there is no extrapolation.

Compute DEM Heights - Assuming you had a DEM, you can send the grid coordinates of a query
to it and have a field (DEM Heights) be populated with what the DEM says the height should be for each point. If the DEM is in a different coordinate system from the database system, you need to specify the coordinate system of the DEM on the Miscellaneous dialog. Otherwise, make sure the check box on the Miscellaneous dialog that says systems are identical is checked.

The DEM Height field will be populated with either the DEM height or a value of -9999, depending on whether the point fell inside or outside the DEM. This mechanism can be used to your advantage if you have a number of DEMs that cover your prospect. Send all the points to the first DEM and then write a query that isolates all records where the DEM Height = -9999. Using this query, keep running the same routine with different DEMs until there are no records in the query (which means every point was assigned a DEM height).

Google Earth
We already saw where you can enter some geographic coordinates into a textbox located in the status bar and launch Google Earth (if installed) or your browser with Google Maps. You can also select a query and create points or lines in Google Earth.

Tip - If you have many points, use lines rather than points. The behind-the-scenes stuff going on involves creating a file with an extension of KML and starting Google Earth with this file as an argument. The same thing could be done by finding the KML file and double clicking on it. Because Windows associates this extension type with Google Earth, it's launched and immediately displays what it finds in the file. The contents of the file should be latitudes and longitudes in the WGS84 datum, because this is the reference system for all of Google Earth's (and Google Maps') coordinates.

Tip - Configure the Google Earth settings from the Miscellaneous Settings dialog before creating files. However, remember that in Google Earth, you can turn labels on or off and size symbols once the map is displayed.

And Now The Rest Of The Story
As I look over the query actions and modifications GPSQL is capable of, I can't help but list a number of items that we should cover, if only briefly.

Creation Of Grid Definition Files
Grid definition files can be used for many operations, including re-binning and preplot design. Both GPSQL and QuikMap have an automatic grid definition builder, but it's not always going to give you the right answer. However, there are tests for whether the grid definition file is right, and with a few tweaks you can make the grid definition file perform flawlessly. One key point if you select a query for use with the automatic grid definition builder is that you had better be selecting preplots and just the source or receiver. Also, if any part of the preplots has been offset for culture, you want to avoid them, since they will cause the parameters to be bogus (or not computed at all). Another key point when the utility is finished is to look at the RMS value. The last thing the utility does after it has computed the grid definitions is to take a dozen random stations and, based on the station value, compute each station's coordinates. It compares these with the actual coordinates and then provides you with an average of the differences. A small number means the grid definition parameters appear to be working. The words "appear to be working" aren't a mistake. It is possible that the grid definition file created correct parameters for a grid unlike the one you wanted. What I mean by this is that the utility determined the number of bin digits incorrectly. Consider the grid definition below:

This looks fine if we are looking at lines 5001 and 5011 (stations 1351 to 1353). However, there is another grid that is entirely valid:

If we are looking at lines 500 and 501 (stations 11351 to 11353), then the above grid is fine. So, bottom line: if you want to get a valid grid definition file, create it, then look closely at the parameters and make sure the bin digits are what they are supposed to be. Also make sure you look at it visually in QuikMap. It's easy to see whether it's correct or not.

Use Of Grid Definition Files In Re-Binning
First, re-binning in GPSQL means coming up with the name of the station based on where it is located in the theoretical grid. GPSQL will minimally add a field called Station (rebinned) and place the station value there based on the station's coordinates.

Tip - Never, and I mean never, instruct your pack ops to re-number stations that they offset. Why? Well, they might get the station number wrong. Also, since there is no preplot for a re-numbered station, your offset information goes bye-bye. It is very confusing to have a number of manufactured surveyed stations to deal with. Trust me. I've been contracted out to go to projects for the sole reason of trying to resolve problems caused by this.

Ok, where were we? Right, re-binning. You want to note the available options on the dialog. You can add other fields, including an indicator for whether a station was re-binned and yet others that are offsets relative to the actual position. Don't get too uptight about doing this either. Remember that you can always delete the fields you create. Also, note carefully that with the original station number and the rebinned one, you are in a perfect position to create a SEG or other file with whatever station the client asks for.

Handling Duplicates
There have been quite a few requests for this, so let's take a quick look. The first way of spotting duplicates is through the display viewer. Once you select duplicate view from the dropdown in the toolbar and then click on a column head of any field, the query is sorted by that field. The other thing that happens is that in the panel at left, you will see a list of all duplicates based on the selected field. Right click on a row and select Go To and the spreadsheet will be scrolled to the point. What you do at this point is up to you. My preference would be not to delete one of the records, but rather to change it in some way so I don't consider it to be a valid record. Perhaps I would change the Station (value) to zero and place BAD in front of the Station (text) name.
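If you go the flagging route, the expression you would hand to the Update Field(s) option for the Station (text) field might be something like this sketch; the BAD prefix is just the convention suggested above.

"BAD" & [POSTPLOT].`Station (text)`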

The Duplicates Manager can also be used. We have seen that this utility lists all duplicates and does allow for deletion of points. Again, however, my preference would be to change the record, not delete it. The Purging Duplicates utility is a specialized utility that is really more suited for exact duplicates, i.e., ones that might be the result of processing the same data collector file twice. The utility relies on three user fields for determining what duplicates are. The defaults are latitude, longitude and time, the theory being that you can't be in the same place at the same time without it being an exact duplicate.

Importing Data
It's important to look at this topic because one real powerful tool when it comes to managing data is the table join. Consider getting a file of points that were drilled. Import that and do a table join, and you now have the ability to give a client a coordinate file of everything drilled or to produce a map of the same. The question is, how do I get the file in? You first have to ask yourself what format the file is in. Chances are it's an ASCII text file, but it could be an Excel XLS file. In either case, you are in good shape. For example purposes, let's assume we have information that includes the station drilled, the depth at which it was drilled and the charge amount. Let's first assume it's ASCII and looks like this:

50911226,34,6
50911227,33,6
50911228,34,6
50911229,34,7

Here we would use the ASCII File item in the Import menu. When all is said and done, the dialog would look like it does here. You might use different names for the fields, but chances are that the rest would be the same. Notice that I made all fields doubles. For the station, it's not necessary, but if you do make it a string field and hope to join it to the Postplot or Preplot tables on this field, the field size has to match. Also, if there is as much as a space in one station that is not in the other, you won't get a match. So it's much better and more reliable to stay with numbers, and in this case doubles, since the Postplot and Preplot Station (value) field is a double.

Of note for the ASCII import is that the order you place the fields in the list box is the order that the fields will be created in the new table. Also, each delimited field has to have its delimited place in the record specified, and delimited fields don't have to be sequential. In other words, even if the station was the 100th delimited field in a record, it wouldn't matter. We simply would indicate this when we add a field to the list. Finally, note that if we get a new file from the driller and it is to be appended, just specify the table to append to. If the data is a superset, then it's best just to delete the previous table and recreate it.

Now, you might get your data in the form of an Excel XLS file. This is usually no problem because there is an Excel import mechanism. You can also opt to make a comma delimited file from Excel and import it as above. If you do import directly from Excel, here's a brief explanation of how it's done.

After being asked for the table to append to (or create), you will get a special dialog from which you can open an XLS file. You would then click on the desired sheet and the dialog will now look like what you see here.

At this point, you would choose Define Field Names And Format from the Preferences menu and do two things: 1) give each field a name and 2) select the data types. Once you do this, from the main dialog you would elect to Place records in the import cache. The reason for this is that there may be more than one sheet in the XLS file, and you will have the opportunity to place the records from the various sheets into the cache. Once done, select Finish Importing from the File menu and the import will be completed.
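Tying this back to the table join idea that opened this topic: once the drilled data is imported, a sketch of the join might look like the query below. The table name DRILLED and its field name Station are hypothetical and must match whatever you actually created during import.

Select [POSTPLOT].*,[DRILLED].* From [POSTPLOT],[DRILLED] Where [POSTPLOT].`Station (value)`=[DRILLED].`Station`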

Importing And Using Swath Definition Tables
This is a specialized import feature whose purpose is to allow the user a way to isolate swaths when swaths are defined by low and high track values and low and high bin values. You start by developing a comma delimited file of:

swath number, low track, high track, low bin, high bin

Below could be a typical file which would be imported:

1,0,5000,1000,2000
2,0,5000,2000,3000
3,0,5000,3000,4000
4,0,5000,4000,5000
5,0,5000,5000,6000
6,0,5000,6000,7000
7,0,5000,7000,8000
8,0,5000,8000,9000
9,0,5000,9000,10000

In the above example, there are 9 swaths, as identified by the numbers 1-9. The first swath is defined by a low track of zero, a high track of 5000, a low bin of 1000 and a high bin of 2000. The remaining swaths are defined similarly.

Once imported, a table is created. For the example below, make sure you call the table Swaths. If the above data was imported, you would have five fields and nine records. The field names are:

Swath
LoTrack1
HiTrack1
LoBin1
HiBin1

The query builder is designed so that it is possible to drag and drop table fields into the specified value textbox. This allows 'BETWEEN' statements to be constructed based on multiple table fields. For example, if two tables are selected and called SWATHS and POSTPLOT, you could build criteria like:

Select [POSTPLOT].*,[Swaths].* From [POSTPLOT],[Swaths] Where [POSTPLOT].`Track` Between [Swaths].`LoTrack1` And [Swaths].`HiTrack1` And [POSTPLOT].`Bin` Between [Swaths].`LoBin1` And [Swaths].`HiBin1`

When you build such a query, you are cautioned to add a join to the table, but you don't have to. Trust me! And here is the same query for the Preplot table:

Select [PREPLOT].*,[Swaths].* From [PREPLOT],[Swaths] Where [PREPLOT].`Track` Between [Swaths].`LoTrack1` And [Swaths].`HiTrack1` And [PREPLOT].`Bin` Between [Swaths].`LoBin1` And [Swaths].`HiBin1`

And finally, here is how you can isolate only one swath:

Select [PREPLOT].*,[Swaths].* From [PREPLOT],[Swaths] Where [PREPLOT].`Track` Between [Swaths].`LoTrack1` And [Swaths].`HiTrack1` And [PREPLOT].`Bin` Between [Swaths].`LoBin1` And [Swaths].`HiBin1` And [Swaths].`Swath` = 4

Table To Table Appends
Here is a topic that pertains to importing data from other databases and moving data from one table to another table in the same database. If you are importing data and choose MDB Table, you will be prompted for the database to import from, the table in that database to append from, and the table in the current database to append to. The field structures of the two tables must be an exact match. It doesn't matter what the names are, but the fields have to agree in type (and, in the case of string fields, length) and order. The same holds true for table to table appends within the same database. In this case, you would choose Append/Create Table (wo/Mapping) from the Modifications menu and, again, the field types and order must match exactly between the two.

Now, what do you do when the tables don't match? This is probably more common than when they do. Fortunately, there is a mechanism for both importing tables and appending table data within the same database, even if they don't match. We will look at appending from one table to another within the same database, although the technique can be used in an almost identical fashion when importing.

At left is the dialog displayed when you append data with mapping. You select a table on the left, which is the 'from' table, and a table on the right, which is the 'to' table. The next step is to select a field on the left and one on the right and press the Map button. This places a number at the end of the 'from' field. This number indicates which field in the list on the right the data will be placed in. The auto-map button will simply look at the field names and automatically assign the field number based on a name match. This might be right or wrong; the user should always check. An UnMap button removes the field assignment number.

Re-Calculating Offsets
Only a quick mention about recalculating offsets. First, why would you have to do this? I can think of two reasons. The first is that when you processed a data collector file in QuikView, there were no preplots and thus no offsets. The other is that you had the wrong reference azimuth in QuikView, so the offsets might be present but are incorrect. There are three methods to compute offsets. If you base the new offsets on a grid definition file, you select a grid definition file and from this GPSQL computes the theoretical preplot location for each station based on the station value. This theoretical preplot and the actual survey coordinates are used to come up with the offsets. The other two methods require that you specify the preplot table where a match to the postplot can be found. Both of these methods require a reference azimuth. One option requires you to enter it and the other will obtain it from a grid definition file you specify.

Proximity Utilities
These utilities have been around for a while. In general, you specify either one or two queries so that you can find points that are within some user-entered distance of one another. I'm only mentioning these utilities because, in the case of the two-query proximity tests, an option was recently added to find the nearest single point. This could be useful in a situation where you have a query that defines hazard points and you need to find the one nearest survey point to each. In this example, what you would do is run the two-query proximity routine (2D or 3D) and select the hazard points for the first query and the survey points for the second. On the small dialog that comes up first and allows you to select the two queries, the first is on the left and the second is on the right. When you get to the field selection dialog, you want to enter a large threshold value and check the nearest only check box, which is a new option on this dialog. Note that if there is no point within the threshold, then no record is written. This means that you want a fairly large threshold if this option is utilized. One final note: when the first field selection dialog comes up, you might want to replace Station (value) with Station (text) if your hazard points have a distinctive alphanumeric name but values of zero. It will aid in the identification of the hazard.
