Professional Documents
Culture Documents
Vijay Kanabar
Boston University
Ron McFadyen
University of Winnipeg
Table of Contents 2
TABLE OF CONTENTS
TABLE OF CONTENTS.....................................................................................................................................2
Chapter 1.........................................................................................................................................................................1
SQL Architecture___________________________________1
Introduction..............................................................................................................................................................1
Client/Server Architecture.....................................................................................................................................1
Databases.................................................................................................................................................................3
The Relational Model............................................................................................................................................3
Structured Query Language (SQL)............................................................................................................................4
Chapter Two....................................................................................................................................................................6
Database Definition__________________________________6
The Library Database................................................................................................................................................6
Patron...................................................................................................................................................................8
Creating Tables.........................................................................................................................................................9
Altering a Table......................................................................................................................................................10
Dropping Tables.................................................................................................................................................10
Copying Tables...................................................................................................................................................11
Data Types.............................................................................................................................................................12
Predefined Types....................................................................................................................................................12
Numeric types.....................................................................................................................................................12
String types.........................................................................................................................................................13
SQL Data Types.....................................................................................................................................................13
The null value.....................................................................................................................................................14
Chapter Three................................................................................................................................................................15
Data Manipulation__________________________________15
Selecting Columns..................................................................................................................................................15
Selecting Rows ......................................................................................................................................................17
Arithmetic Operations.............................................................................................................................................18
Boolean Operators .................................................................................................................................................19
Special Operators for the WHERE clause ..........................................................................................................20
System Variables ..............................................................................................................................................23
Column Functions ..................................................................................................................................................23
Use of COUNT...................................................................................................................................................24
Ordering the Result ...........................................................................................................................................25
Grouping Data ......................................................................................................................................................26
Restricting groups using HAVING ....................................................................................................................28
Interaction of clauses in the SELECT .................................................................................................................29
Joins.......................................................................................................................................................................29
Outer Join...........................................................................................................................................................31
Nested Queries ......................................................................................................................................................32
Correlated Subqueries .......................................................................................................................................35
Exists..................................................................................................................................................................36
Use of ALL, ANY..............................................................................................................................................36
Set Operations........................................................................................................................................................37
Union.................................................................................................................................................................37
Difference .........................................................................................................................................................38
Chapter Four.................................................................................................................................................................39
Table of Contents 3
V i e w s__________________________________________45
Creating Views ......................................................................................................................................................45
Dropping views .................................................................................................................................................46
Advantages of Views..........................................................................................................................................46
Grouped Views .................................................................................................................................................46
Check Option ........................................................................................................................................................47
Chapter 1
SQL Architecture
Client/Server and Web Technology Architecture
Introduction
There are two types of architectures used to develop SQL applications. The first category is
the traditional mainframe application and the second is the client/server architecture.
2) Client/Server Architecture: With client/server architecture, the DBMS system and user
applications are partitioned into two or more logical segments or tiers. The first tier is the
front-end or client, which is responsible for the Graphical User Interface (GUI) and some
of the application business logic. The second tier (last tier) functions as the back-end
database server. This is typically a powerful computer and hosts the DBMS. A key
component of the client/server architecture is middleware. Middleware is software “glue”
that ties a client and server together. The partition of the client and database server
represents a functional division of work and also included a division of hardware and
software components amongst the various tiers.
Client/Server Architecture
1. One process plays the role of the client, the other plays the role of the server
The first generation client/server systems were two tiered architectures. Such architectures tend to
lose their efficiency when scaled up to support large-scale computing environments that service
hundreds of clients. Scalability is poor because the business logic (application program
components) resides in the front-end. This creates heavy network traffic between the client and
database server. Three tiered client/server architectures overcome this limitation.
1
INTRODUCTION 2
With three tiered architectures an application is typically divided into three logical components. The
client front-end is only responsible for the GUI and does no application programming. The
intermediate layer can also support a variety of services. For example the application server is
typically on the intermediate layer. Such a server typically shares application logic with other
servers or clients.
Since this book deals with the SQL-environment of typical client/server architectures we
introduce its key components next. An SQL-environment comprises:
One SQL-agent.
One SQL-implementation.
SQL-Agent: An SQL-agent is that which causes the execution of SQL-statements. In the case
of the direct invocation of SQL it is implementation-defined. Alternatively, it may consist of one
or more compilation units that, when executed, invoke externally invoked procedures in an
SQL-client module.
Executes SQL-statements received from the SQL-client, receiving and sending data as
required.
Maintains the state of the SQL-session, including the authorization identifier and certain
session defaults.
Databases
Most organizations today store their data resource in a database. So then, what is a database? A
database is a central reservoir for all data. A computerized database system facilitates sharing of
data (and information). Consider a library users database. Any branch of the same library should
be capable of checking your book in or out. They should also be in a position to answer questions
such as “How much do I owe in fines?” or “Is this book available?” In order to respond to such
questions the library database should be available to all the employees at all the branches.
Databases are queried and updated frequently - new data inserted, old removed, and existing data
changed; we use a Data Base Management System (DBMS) to facilitate these tasks. It enables us
to store, retrieve and manipulate the database.
The relational model is almost the exclusive choice for implementation and use on a
microcomputer. The model is simple, flexible and easy to use. We consider a database to be a
collection of tables. Each table is a two dimensional construct of rows and columns. The
mathematical concept comparable to the term table is relation--hence, the name relational
database.
Consider the sample table shown below - the first book has a call number 100, title Physics
Handbook and subject category Physics. The information for each book is stored in rows; rows are
sometimes referred to as records, and more formally in database terminology, rows are called
tuples. The columns define fields: Title field, Subject field, etc. A field is also called an attribute.
INTRODUCTION 4
In the relational model each row should have a unique identifier - a primary key, for example,
callno, is a unique identifier in the table above. When one gives a specific value for the
primary key, it identifies a single row. Generally speaking, one does not put information into a
database unless it is possible to recall it later. Each book, and each patron, of the library is
uniquely identified.
To define and access information in the database (to retrieve and update the data) we need a
language for expressing our requests. In this book we discuss the Structured Query
Language (SQL) for accessing data in a relational database.
Every DBMS has a data sub-language embedded in its architecture. This data sub- language is used to
communicate with the database. On the basis of the types of operations performed, the data sub language can
be classified into a data definition language DDL), a data manipulation language (DML), and a data control
language (DCL).
The DDL commands facilitate the creation and description of a database, the DML commands deal with
manipulation of the data, and the DCL commands are used to specify security constraints.
SQL is based on a data language initially designed for the IBM prototype relational database system called
System R. SQL is often pronounced sequel. SQL has been standardized by ISO and will continue to be
enhanced; when appropriate we make references to the standard as ISO SQL. It has been adopted as an
industry standard for relational database systems and such systems appear on microcomputers,
minicomputers, and mainframes. In some cases the same product is available on all these types. The number
of SQL database systems is enormous and growing; we cannot list all of them here, but the list includes IBM’s
SQL/DS, and DB2, Relational Technologies’ INGRES, Oracle Corporation’s Oracle, Cincom’s SUPRA, Ashton-
Tate’s dBASE IV.
2.DML: data manipulation language. DML is concerned with the retrieval and update of the
database. Example: Select * from syntax.
3.DCL: data control language. DCL is used to specify who can access data in the database
and what operations they can perform. Example: Grant Table syntax.
We have introduced the relational model. A relational database consists of tables; tables, of
course, are a very simple data structure which everyone is capable of understanding. Every
database system requires a language for accessing data; the language discussed in this text is
SQL. We discussed the term primary key. Every table should have a primary key if we are to
access the information in our database. A discussion of primary keys would lead to another
topic, structural integrity, which is part of the relational model. This is discussed in a later
chapter. Implementations of primary key integrity and referential integrity are beginning to
appear in commercial systems, but such aspects of the relational model are not currently part
of ISO SQL.
Chapter Two
Database Definition
The database consists of tables; each table is a two dimensional structure of rows and columns. A
row is sometimes referred to as a record and a column is sometimes referred to as a field (these
terms are carried over from data processing). The DDL is used to define (create) tables, views,
and indexes.
We create tables in accordance with our data management needs; as our needs grow it may be
necessary to add new fields to existing tables.
A view is a virtual table. It is a table defined in terms of other tables; it does not occupy any real
space in the database. The view mechanism is very powerful and useful, providing us a way to
simplify a user’s perception of the database or for providing some types of security.
Indexes provide the database system with efficient access paths to rows of tables. We create
indexes in order to improve the efficiency of our requests for data.
In this chapter we concentrate on the CREATE TABLE command for creating tables, and leave the
discussion of views and indexes to subsequent chapters.
To illustrate features of SQL we employ a sample database consisting of three tables used in a
library environment.
BOOK
6
DATA DEFINITION 7
PATRON
LOAN
Book
Loan
the loan id (loanid) key that uniquely identifies the loan row.
the date the book was returned (dateret); no value is assigned until the book is
returned
the fine is calculated for all books returned late, at the rate of 10 cents per day (no
value is assigned until the book is actually returned)
the attribute paid; if it’s value is yes it indicates that the fine has been paid.
The LOAN table is illustrated in Figure 2.1. You will note three separate cases:
Example 1: A book has not been returned. Fine accrues and has not been paid.
Example 2: Case where the book was returned before the due date. No Fines.
Example 3: The book was returned late. A fine was assessed, and paid.
Patron
We refer to users of the library as patrons; for each patron of the library we record information
in the PATRON table:
DATA DEFINITION 9
PATRON: userid name age
their name
their age
This database represents a subset of the information requirements of any library. The contents of our library
database is illustrated below:
Creating Tables
To create a table one uses CREATE TABLE command. The general syntax of CREATE TABLE
is:
In the above we have defined three columns for the BOOK table as CALLNO, TITLE,
SUBJECT. Call numbers are defined as numbers, titles and subjects are defined as character
strings of lengths 20 and 13 respectively. The specification of data type is required and is
restrictive. Once the book table is defined as above we could not have a book with call
number QA76.9 since QA76.9 is not a number. For book titles and subjects we must refer to
specific values by enclosing them in quotes, such as ‘Introduction to Database Systems’.
Suitable definitions for the PATRON table and the LOAN tables would be:
AGE NUMBER(2),
CONSTRAINT PATRON_PRIMARY_KEY PRIMARY KEY
(USERID));
Altering a Table
Most SQL systems include a command to allow one to include new columns in an existing table.
For instance, to add a new column for addresses to the PATRON table one would execute a
command such as:
Example:
The ALTER TABLE command can typically be used for other purposes such as changing the
column length or adding a property such as NOT NULL (these may only be allowed in very
restrictive cases).
Dropping Tables
To remove or drop a table from the database one uses the DROP TABLE command. To remove
the BOOK table one executes a command such as:
DATA DEFINITION 11
DROP TABLE book ;
This command is very powerful; all rows in the table are deleted and the table object
disappears.
Copying Tables
Some SQL systems, such as Oracle, allow the user to create a new table from an old one using
CREATE TABLE with a subquery. Queries (SELECT statements) are discussed in the next
chapter. To create a new table SENIORS (age 65 or older) from the PATRON table we could
execute the command:
AS
NAME VARCHAR2(14)
AGE NUMBER(2)
The following rows are automatically inserted into the Seniors table:
25 Das 67
35 Smith 72
The table inherited the columns of the PATRON table since we did not name any. In some
cases we would need to name the columns. For instance, from the LOAN table we could
derive a table giving the call number, user id, and the number of days the book was kept:
FROM loan;
Data Types
Every data value belongs to some data type. ISO SQL has several data types: character,
numeric, decimal, integer, small integer, float, real, and double precision. A SQL system may
not provide all these types, and may include others. Most systems provide additional types
that really are necessary (SQL might be considered deficient on this) such as MONEY, DATE,
and TIME.
Some SQL systems simplify the choices; Oracle, for instance, places all the numeric types
-numeric, integer, small integer, float, real, double precision - into one category: NUMBER. It
is best, when using a SQL system to refer to its documentation to determine the data types
available.
Predefined
Constructed
User-defined.
A user-defined data type is a schema object. A predefined data type is a data type provided by
the SQL-implementation. A data type is predefined even though the user is required (or
allowed) to provide certain parameters when specifying it (for example the precision of a
number).
A predefined data type is atomic. An atomic type is a data type whose values are not
composed of values of other data types. The existence of an operation (SUBSTRING,
EXTRACT) that is capable of selecting part of a string or datetime value does not imply that a
string or datetime is not atomic.
A constructed type is either atomic or composite. A composite type is a data type each of
whose values is composed of zero or more values, each of a declared data type.
Predefined Types
Numeric types
There are two classes of numeric type: exact numeric, which includes integer types and types with
specified precision and scale; and approximate numeric, which is essentially floating point, and for
DATA DEFINITION 13
which a precision may optionally be specified. Every number has a precision (number of digits),
and exact numeric types also have a scale (digits after the radix point). Arithmetic operations may
be performed on operands of different or the same numeric type, and the result is of a numeric
type that depends only on the numeric type of the operands. If the result cannot be represented
exactly in the result type, then whether it is rounded or truncated is implementation-defined. An
exception condition is raised if the result is outside the range of numeric values of the result type, or
if the arithmetic operation is not defined for the operands.
String types
A value of character type is a string (sequence) of characters drawn from some character
repertoire. The characters in a character string S are all drawn from the same character set CS. If
S is the value of some expression E, then CS is the character set specified for the declared type of
E.A character string type is either of fixed length, or of variable length up to some implementation-
defined maximum. A value of character large object (CLOB) type is a string of characters from
some character repertoire and is always associated with exactly one character set. A character
large object is of variable length, up to some implementation-defined maximum that is probably
greater than that of other character strings.
Either a character string or character large object may be specified as being based on a specific
character set by specifying CHARACTER SET in the data type; a particular character set chosen
by the implementation to be the national character set may be specified by specifying NATIONAL
CHARACTER, NATIONAL CHARACTER VARYING, or NATIONAL CHARACTER LARGE
OBJECT (or one of several syntactic equivalents) as the data type.
A value of bit string type is a string of bits (binary digits). A bit string type is either of fixed length, or
of variable length up to some implementation-defined maximum.
A value of binary string type (known as a binary large object, or BLOB) is a variable length
sequence of octets, up to an implementation-defined maximum.
Boolean type
A value of the Boolean data type is either true or false. The truth value of unknown is sometimes
represented by the null value.
Datetime types
There are three datetime types, each of which specifies values comprising datetime fields. A value
of data type TIMESTAMP comprises values of the datetime fields YEAR (between 0001 and
9999), MONTH, DAY, HOUR, MINUTE and SECOND. A value of data type TIME comprises
values of the datetime fields HOUR, MINUTE and SECOND. A value of data type DATE
comprises values of the datetime fields YEAR (between 0001 and 9999), MONTH and DAY. A
value of DATE is a valid Gregorian date. A value of TIME is a valid time of day.
TIMESTAMP and TIME may be specified with a number of (decimal) digits of fractional seconds
precision.
DATA DEFINITION 14
TIMESTAMP and TIME may also be specified as being WITH TIME ZONE, in which case every
value has associated with it a time zone displacement. In comparing values of a data type WITH
Interval types
A value of an interval type represents the duration of a period of time. There are two classes of
intervals. One class, called year-month intervals, has a datetime precision that includes a YEAR
field or a MONTH field, or both. The other class, called day-time intervals, has an express or
implied interval precision that can include any set of contiguous fields other than YEAR or MONTH.
Reference types
A reference type is a predefined data type, a value of which references (or points to) some site
holding a value of the referenced type. The only sites that may be so referenced are the rows of
Every data type includes a special value, called the null value, sometimes denoted by the
keyword NULL. This value differs from other values in the following respects:
— Since the null value is in every data type, the data type of the null value implied by the
keyword NULL cannot be inferred; hence NULL can be used to denote the null value only in
certain contexts, rather than everywhere that a literal is permitted.
ISO SQL specifies two additional properties that can be specified for a column: NOT NULL,
and UNIQUE. If we specify NOT NULL for a column, then whenever a row is inserted or
updated that column must have a value assigned to it. Null is a special term which applies
when a field has never been assigned a value; its value is unknown or missing; null is not the
same as blank characters or zero value.
If we specify UNIQUE for a column then every row in the table must have a unique value in
that column. DB2 does not allow the UNIQUE property to be specified in the table definition;
but one can achieve this by creating a unique index (see the chapter on indexing). We shall
not use UNIQUE in our examples.
Chapter Three
Data Manipulation
The SELECT command is primarily used to retrieve data from the database. It is also used
when creating a copy of a table, creating views, and can be used to specify rows for updating.
In this chapter we concentrate on its use for retrieving data.
FROM table-list
WHERE field-expression
GROUP BY group-fields
HAVING group-expression
ORDER BY field-list;
The result of the SELECT is a listing of data derived from some set of tables in the database.
The “field-list” specifies the fields to be listed, such as USERID, NAME, AGE. The data listed
is obtained from the set of tables (table-list) specified in the FROM clause. The “field-
expression” in the WHERE clause specifies a Boolean expression that rows in the “table-list”
must satisfy to be included in the listing. The GROUP BY clause is used when we wish to
summarize information in the underlying tables. For example, GROUP BY subject causes
the rows to be organized into groups, one group for each unique value of the subject field. The
HAVING clause is used to specify which groups are to be included. ORDER BY is used to
sequence the rows of the listing. The WHERE, GROUP BY, HAVING, and ORDER BY
clauses are optional.
Selecting Columns
TITLE
Physics Handbook
Database Systems
HTML
XML
Software Testing
E-Commerce
6 rows selected.
When an SQL system executes the above example it accesses the BOOK table.
Since the SELECT does not involve a WHERE clause all rows of BOOK are
accessed and from each row the system extracts and displays the title. There
are as many rows in the result as there are rows in BOOK; the result are
shown above.
Example 3.2 List the title and subject for each book.
TITLE SUBJECT
Physics Handbook Physics
Database Systems Computing
HTML Computing
XML Computing
Software Testing Computing
E-Commerce Business
6 rows selected.
FROM book;
b) SELECT *
FROM book;
6 rows selected.
SUBJECT
Physics
Computing
Computing
Computing
Computing
Business
6 rows selected.
The answer to this query has as many lines as there are books in the library. Since there are
very few subject areas the result appears awkward. To remove the redundancies from the
display we can use the DISTINCT key word; it causes redundant rows to be eliminated:
SUBJECT
Business
Computing
Physics
Selecting Rows
In the previous section we were concerned with listing one or more fields from a table; every row of the table
corresponded to a line in the listing. Now we consider retrieval of a subset of the rows of a table. To limit the
retrieval to specific rows of a table we include the WHERE clause in our commands. The WHERE clause gives
DATA MANIPULATION 18
the condition that a row must satisfy to be included; any row of the table not satisfying the condition is not
considered. In the following we discuss operators that may appear in conditions:
TITLE
Database Systems
HTML
XML
Software Testing
The condition subject = ‘Computing’ is evaluated for each row of BOOK. If the
expression evaluates to true then the row contributes to the result.
TITLE
Database Systems
Arithmetic Operations
The standard operators +, -, *, / are available for addition, subtraction, multiplication and division respectively.
These can be applied to the fields being retrieved or to fields within the WHERE clause. The use of
parentheses is encouraged to force, or clarify, the order of computation. Expressions in parentheses are
evaluated first. Unless parentheses or priorities dictate otherwise, an expression is evaluated from left to right. *
and / have equal priority; likewise + and - have equal priority. However, * and / have higher priority (hence
evaluated first) than + and -.
DATA MANIPULATION 19
Example 3.8 List loans where the fine is over 50 Canadian dollars.
SQL> SELECT *
2 FROM loan
3 WHERE (fine*1.5) > 50.00;
Boolean Operators
The condition specified in the WHERE clause can be a Boolean expression involving AND, OR,
and NOT. The priority of the Boolean operators from highest to lowest is NOT, AND, OR. Of
course parentheses can be used to clarify or force evaluation to be performed in a certain order.
CALLNO
500
600
DATA MANIPULATION 20
The results could be different if parenthesis is not used. (This is not evident in
our data set.)
LIKE is used when searching for the appearance of a particular character string.
IS NULL is used to test for a field not having been assigned any value.
IN is used to test for a field having a value contained in some set of values.
LIKE
LIKE is used with character data to determine the presence of a substring. Special notations
are available to specify unknown or irrelevant characters in the field being tested:
SQL> SELECT *
2 FROM book
3 WHERE title LIKE '%Database%';
In example 3.10 each BOOK title is examined to determine if it contains the character string
Database.
Example 3.11 List books with titles having an o as the second character.
SQL> SELECT *
2 FROM book
3 WHERE title LIKE '_o%';
BETWEEN
The between operator is used with numeric data to determine if some field lies in a certain
range.
Example 3.12 List books with call numbers between 200 and 400.
SQL> SELECT *
2 FROM book
SQL> SELECT *
2 FROM book
IS NULL
NULL is a keyword that must be used to determine whether or not a field has been assigned a
value. Note that in our database the date returned field is not assigned any value until a book
is returned. If a book is returned on time then no value is assigned to the fine or paid fields.
CALLNO
300
500
600
Example 3.14 List the books that have been returned by patron 100.
CALLNO
100
200
IS NOT NULL is the expression used to determine if a field of a row has been assigned a value. It
is not permitted to use the expressions DATERET = NULL, or DATERET <> NULL.
IN
The set of values used for comparison can be either explicitly specified or given as a subquery.
We illustrate the first case here and leave subqueries for a later section.
Example 3.15 List the names of patrons whose user id is 100, 200, 300, or
350.
NAME
Niall
King
Wong
The list of values used for comparison is enclosed in parentheses; the name of a patron is
listed if the userid is in the list.
TITLE
Database Systems
HTML
XML
Software Testing
E-Commerce
System Variables
SQL systems have system variables used to hold values of general use or interest. All systems
have the variable USER which holds the user id of the current logged on user. Special uses of
USER can be made if it also corresponds to a column of one or more tables in the database.
Certainly the system does make use of it to determine if the current user has the access privileges
pertinent to a request (see the section on DCL).
Systems like Oracle and DB2 have many other variables such as: CURRENT DATE, CURRENT TIME,
CURRENT TIMESTAMP, CURRENT TIMEZONE. Oracle has a variable SYSDATE which can be used to
determine the current time and/or date.
Example 3.17 List patrons who have outstanding books. We shall use
Oracle’s SYSDATE in this example. Any system that includes date as a data
type would likely permit arithmetic operations and comparisons.
USERID
10
10
Column Functions
SQL offers special functions, also called aggregate functions, to determine maximums, minimums,
averages, totals and counts for entire columns. These are MAX, SUM, AVG, and COUNT
respectively. Note that AVG and SUM are defined for numeric values only, and that their
computations exclude NULL values.
Example 3.18 What is the largest fine paid for an overdue book?
MAX(FINE)
DATA MANIPULATION 24
50
SUM(FINE)
110
Use of COUNT
ISO SQL specifies that COUNT can be used in just two ways. First, we can use COUNT(*) to
count the number of rows which satisfy a query. Second, we can use COUNT with DISTINCT to
count the number of unique values in a column. We illustrate this use of COUNT in the next
section.
COUNT(*)
6
COUNT(FINE)
3
COUNT(*)
4
DATA MANIPULATION 25
This use of COUNT, counting rows, does not involve any elimination of null values. That is, if a row
has all null values it will be counted.
Use of Distinct
ISO SQL permits the specification of DISTINCT with an aggregate function; the effect is to remove
duplicate values prior to the function being applied.
COUNT(DISTINCTSUBJECT)
----------------------
3
COUNT(DISTINCTUSERID)
---------------------
4
Whenever DISTINCT is used, the argument must be a simple field reference as shown above.
The ORDER BY clause is used to force the result to be ordered by one or more column values in
either ascending or descending order.
6 rows selected.
The default is ascending order; this could be specified explicitly by coding ORDER BY title ASC.
Instead of specifying a field name we can refer to it indirectly by its relative position in the field-list.
For the above we could have used: ORDER BY 2.
Example 3.26 List books in subject order and within subject order them by
call number.
SQL> SELECT *
2 FROM book
3 ORDER BY subject ASC, callno DESC;
6 rows selected.
Note that the result is ordered first by subject; biology is first, followed by business, then
computing and finally mathematics. Within each group the ordering is by call number.
Grouping Data
The GROUP BY clause is used to specify one or more fields that are to be used for organizing
rows into groups. Rows are grouped together that have the same value(s) for the specified field(s).
The only simple fields that can be displayed are the ones used for grouping; any result from other
fields must be specified using a column function. The column function will be applied to a group of
rows instead of to the entire table.
Example 3.27 For each patron list the number of books he has borrowed.
USERID COUNT(*)
10 2
20 1
30 1
40 1
2.display the user id for each group and a count of the number of rows in the group
A common error is to include a field in the listing that is not unique for the group. For example,
2 FROM loan
3 GROUP BY userid;
ERROR at line 1:
The above would be incorrect because callno is not single valued for a group. When GROUP
BY is used each element of the select list must be single valued; each element must either be
specified in the GROUP BY or be the result of a column function.
Example 3.28 For each patron list his total fines paid.
2 FROM loan
3 GROUP BY userid;
USERID SUM(FINE)
10 0
20 50
30 50
40 10
In the above we see some patrons have not paid a fine but are still listed. To exclude rows from
the grouping process (and from the SELECT) we specify the appropriate WHERE clause.
Example 3.29 For each patron who has paid a fine, list his total.
USERID SUM(FINE)
20 50
30 50
40 10
The only difference between the two preceding examples is that the latter does not report anything
for patrons who have not paid any fines.
In the foregoing examples all groups have been reported. To eliminate groups from the result we
use the HAVING clause specifying an appropriate group oriented boolean expression.
Example 3.30 List the patron id’s for those who have paid more than $30 in
fines on books with call numbers greater than 400.
USERID
30
DATA MANIPULATION 29
The previous example has used all components of the SELECT command. It is important to
understand the order in which the above SELECT is processed.
HAVING third.
The rows of LOAN are accessed; each row of LOAN which satisfies the WHERE clause is retained
for further processing (the grouping). So only those rows of LOAN which correspond to books with
call numbers larger than 400 are retained. These rows are organized into groups according to the
specifications in the GROUP BY clause. In our case the rows are grouped by userid. When a
HAVING clause is present each group must satisfy the condition in order to be displayed in the
result; in our case each group must have a total for fine that is greater than $30.
Joins
A SELECT which performs a join is one which specifies more than one table in the FROM clause.
That is, it corresponds to a situation where it is necessary to retrieve information from more than
one table.
Example 3.31 Suppose we wish to list the names of patrons, their id’s, and the call numbers
of the books they have borrowed. There are two tables we use to get the information: PATRON
and LOAN. What we need to do is specify that a row of PATRON should be matched with a row of
LOAN whenever they have the same value for the user id field. Note that one row of PATRON can
be matched with many rows of LOAN since a patron is expected to borrow many books. With SQL
it is necessary for us to specify this join condition explicitly:
Note that we have used prefixes for our field specifications. This is only necessary whenever
there would be ambiguity regarding the table where the field appears. In this case prefixes are
only necessary for the user id field.
DATA MANIPULATION 30
If we forgot to include the clause
our SELECT would produce a result referred to as a cartesian product. For our sample
database we would have 7*5=35 rows in the result, since PATRON has 7 rows and LOAN has
5 rows. Without the WHERE clause each row of PATRON would be joined with every row of
LOAN.
Example 3.32 Suppose we wish to list the names of patrons who have
books out on loan. The information needed to do this is found in two tables:
LOAN contains the record of books loaned out (specifically we are interested in
those rows where the DATERET column has no value) and PATRON contains
the name for each patron. We need to match up a selected row of LOAN with
the pertinent row of PATRON; this is done by requiring them to have the same
value for the USERID field.
NAME
DATA MANIPULATION 31
King
Niall
Jones
Aliases
Sometimes it is necessary and often convenient to use an alias (an alternate name) for a table.
An alias is specified in the FROM clause immediately following the actual table name. In the
next example we need the alias - a table is joined with itself - a self-join. It may be useful to
think of the join being performed between two copies of the same table.
Example 3.33 List each patrons name and the number of other patrons
older than him/her. In the following, one “copy” of the PATRON table is
referred to as A; the other as B.
SQL> SELECT a.name, COUNT(*)
2 FROM patron a, patron b
3 WHERE a.age < b.age
4 GROUP BY a.userid, a.name;
NAME COUNT(*)
Wong 4
Colin 3
King 5
Das 1
Niall 6
Jones 2
Note the GROUP BY clause in the example; it illustrates an important point regarding the
elements listed by the query. We have grouped the response using two fields: USERID and
NAME. In the SELECT clause we have included NAME and COUNT(*). If we grouped only
by USERID, then NAME would not be considered by SQL to be single-valued for each group.
As far as SQL is concerned, NAME could have a different value in each row of a group.
Everything included in the SELECT must be single-valued for a group when GROUP BY is
used.
Outer Join
ISO SQL does not provide for the outer join operation, but it is a useful form of join for many
situations. Some systems (such as Oracle) provide the outer join and other systems (such as
DB2) do not. Let us examine the very simple database request
For each book in the library list its title and the number of times it has been loaned out.
TITLE COUNT(*)
Physics Handbook 1
Database Systems 1
HTML 1
Software Testing 1
E-Commerce 1
This query does give us correct counts for those books that have been loaned out, but nothing is
listed for any book that has never been loaned out at all. Such books will not be matched to any
row of LOAN and therefore do not contribute to the result.
Let us consider now how Oracle and the outer join can be used to formulate a simple and correct
query. In the WHERE clause where we specify the join condition (“b.callno=l.callno”) Oracle allows
us to place a “(+)” to designate a column which is to be matched with an imaginary row of null’s if
there are no matches otherwise. Instead of specifying COUNT(*) we specify COUNT(l.userid):
TITLE COUNT(L.USERID)
Physics Handbook 1
Database Systems 1
HTML 1
XML 0
Software Testing 1
E-Commerce 1
6 rows selected.
For the above, if a row of BOOK is not matched on call number to a row of LOAN then it will be
matched to an imaginary row where all column values are null. Hence every row of BOOK is
matched to something and the title and a count will be produced. Note that Oracle permits
COUNT(l.userid) to be specified without DISTINCT. Oracle’s COUNT(...) ignores null’s and
produces a value of zero when a book is matched to a row of NULL’s.
Nested Queries
SQL allows us to nest one query inside another, but only in the WHERE clause and the HAVING
clause. ISO SQL permits a subquery only on the right hand side of an operator. The operators
available are IN, EXISTS, and the relational operators =, <>, >, >=, <, <=. When a subquery will
return just a single row (normally a single field value) SQL allows the use of a relational operator.
DATA MANIPULATION 33
When a subquery returns more than one row (a set of rows) a relational operator must be used
with ALL or ANY.
Use of IN
The IN operator is used to determine if some value is present in a list of values. That list can
be explicitly specified (covered previously) or generated by a subquery.
Example 3.34 List names and ages of patrons who have books out on loan.
We shall proceed by giving two separate queries to accomplish this and then
we specify it as one nested query. Consider the query
USERID
20
30
40
NAME AGE
Jones 41
Niall 17
King 21
These two queries can easily be combined; we replace the list (100,250,400) with the first
subquery:
3 WHERE userid IN
4 (SELECT userid FROM loan
5 WHERE callno IN
6 (SELECT callno FROM book
7 WHERE subject IN
8 ('Business','Computing')));
NAME
Wong
King
Niall
Jones
Let’s analyze the above SELECT; it is composed of three queries. The innermost query is a
simple one and we can consider it separately. That is, we can consider that the database system
replaces it by a list of call numbers that have been retrieved from the BOOK table. The “middle”
query retrieves user id’s where the person has borrowed a book in the computing or history
categories. The outer query lists the names of these people.
We use two examples to illustrate the use of the comparison operators with subqueries. To the
beginning SQL user, the need for subqueries in these cases is not obvious.
Example 3.36 Who has paid the largest fine? This is such a simple
question, but unfortunately it is not trivial to formulate the correct SQL query.
One may be first tempted to express the query as
However, this is not correct since MAX is a function which operates on the
whole table and hence is single-valued and userid is multi-valued - there are
multiple values, one for each row. We cannot mix these two types of
expressions.
MAX(FINE)
50
DATA MANIPULATION 35
This query becomes our nested query and all that we need to do is compare each fine to this
maximum value:
USERID
20
30
The ‘equals’ is permitted for testing with respect to the subquery since it is known that the
subquery returns just one value.
NAME
Smith
Correlated Subqueries
The previous subqueries have been simple in that there was no “interaction” between rows of the
outer table and rows of the inner table.
Example 3.38 Suppose we wish to list the names of patrons who have
borrowed more than one book. The PATRON table has the names of patrons
and we need to access the relevant rows of LOAN for each patron to determine
how many books have been borrowed.
1 SELECT name
2 FROM patron
3 WHERE 2 <= (SELECT COUNT(*)
4 FROM loan
5 WHERE loan.userid = patron.userid)
NAME
DATA MANIPULATION 36
--------------
Wong
We can think of the subquery being executed once for each row of PATRON. For each row of
PATRON the loan table is accessed and a count made of the number of rows retrieved. If
that count is greater than or equal to 2 then the patrons name is listed.
Exists
Sometimes one is only concerned with whether or not a subquery retrieves any rows. The
EXISTS operator evaluates to true or false according to whether a subquery retrieves at least
one or no rows respectively.
TITLE
Physics Handbook
Database Systems
HTML
Software Testing
E-Commerce
In this example a book title is listed only when the subquery retrieves one or more rows. Note that
Database Systems is not listed. To list the books which have not been borrowed at all one just
replaces the EXISTS above by NOT EXISTS.
These two qualifications can easily lead to difficulties and so we have left them
till last. Any of the comparison operators: =, <, >, <>, <=, >=, can be used
with ANY or ALL to test a specific value against a list of values. The following
outlines alternative approaches/operators.
original alternative
= ANY IN
<> ANY NOT IN
>= ALL = ... MAX(..) ...
<= ALL = ... MIN(..) ...
DATA MANIPULATION 37
Example 3.40 Determine the name of the oldest library patron. This was
done previously in example 37. Using >=ALL we can reformulate the query as
NAME
Smith
Set Operations
ISO SQL provides the UNION operator to combine the results of two SELECTs. Some SQL
implementations include DIFFERENCE or MINUS to subtract one result from another, and
INTERSECTION to determine the rows in common for two results
Union
These operators require their operands to be union compatible. This means that the columns
of one select must agree in number and type with the other select. The results of two
SELECTs can be combined using UNION.
Example 3.41 List the call numbers of books borrowed by user 200, and
users with id’s larger than 300.
CALLNO
300
600
The result of the above is the union of two sets - the union of the result of the two SELECTs.
SQL permits any number of SELECTs to be combined with UNION. Note that redundant rows
in the result were eliminated; if this is not desired then one specifies UNION ALL.
DATA MANIPULATION 38
Difference
CALLNO
400
Intersection
Example 3.43 Suppose we wish to determine which computing books have
been borrowed. We can proceed as above using two SELECTs, but now we
want to determine what rows (call numbers) the two queries have in common:
Modifying Rows
SET field-assignments
WHERE condition
The field assignments are of the form field = expression and are used to assign specific
values to the fields of a row. The expressions must be of the appropriate type for the data type
of the corresponding column. These expressions cannot be subqueries or involve the
aggregate operators (AVG, COUNT, ...). The WHERE clause is optional; if absent then the
UPDATE applies to all rows.
DATA UPDATING 40
7 rows updated.
Note that every row of PATRON is modified since there is no WHERE clause. Each row of
PATRON is modified according to the field assignments. There is only one field assignment:
AGE=AGE+10. The effect of this is to cause the current value of age in a row to be
incremented by 10; the value of this expression becomes the new value of the AGE field for
the row.
Example 4.2. Determine fines for books at the rate of one cent a day. In this example we
want to modify only the Loan row for userid 30.
aSQL> edit
Wrote file afiedt.buf
1 UPDATE loan
2 SET fine = (datedue - SYSDATE) * 0.01
3 WHERE dateret is NULL
4* AND userid = 30
SQL> /
1 row updated.
SQL> SELECT *
2 FROM loan
3 WHERE userid = 30;
Deleting Rows
DELETE
FROM table
WHERE condition;
The effect of DELETE is to remove rows from a table; the rows deleted are those that satisfy
the condition specified in the WHERE clause. The WHERE clause is optional; if absent then
all rows are deleted.
2 rows selected.
In this section we illustrate how new rows are appended to a table. There are two forms of the INSERT command
a) INSERT
b) INSERT
subquery;
The first form is used to insert a single row in a table; the second form is used to insert multiple rows that come
from one or more existing tables. If all fields are included and the values given in order, then the field list can be
omitted. Example 4.5 illustrates how one adds one row to a database.
8 rows selected.
DATA UPDATING 43
This method is clumsy and some other approach is needed if one is to add
several rows to a table. SQL systems typically provide a forms-based
interface, or a load utility, for entering many rows or to load a database.
Chapter Five
Views
All of our discussions to this point have dealt with tables. Views are similar to tables;
in many cases they can be used interchangeably. A view is defined using the
CREATE VIEW command and is defined in terms of tables or other views. A view
exists in definition only; it does not occupy space in the database as a table does; it is
materialized by the database system whenever it is used or referenced. There are
two reasons for using views: convenience and security.
Creating Views
The CREATE VIEW command is used to create a view; it has the general syntax:
AS select-specification
The column-specification-list is the list of columns that appear in the view. This
can be omitted when each column in the view is derived directly and unambiguously
from a column of an underlying table. The select-specification is the SELECT
statement which defines the view. The WITH CHECK OPTION is optional and is
concerned with security aspects of the view.
As an example, suppose we wanted to supply our patrons with a view of the LOAN
table consisting of the call number and date due columns for those books that are
currently out on loan. Consider the definition BOOKSONLOAN:
One can query the data using the SELECT syntax discussed in Chapter 3.
Dropping views
ISO SQL does not include a command to drop (remove) a view, but SQL systems
generally include a DROP statement for this purpose. So, to remove the view
OUT_BOOKS from the system one would execute:
Advantages of Views
The view mechanism of SQL is a very useful tool for providing a form of security in the
database system. A view represents a subset of some underlying tables. The users
perception is simplified and what may be sensitive data is excluded from a user’s view.
For example, it may be undesirable for people to know who has the book they want. If
somehow we can force users to use BOOKSONLOAN instead of LOAN then we have
excluded (hidden) the sensitive field userid in LOAN from them.
Grouped Views
A grouped view is one where the GROUP BY clause has been used in the definition.
Such views are very useful, but unfortunately they are subject to numerous
restrictions in ISO SQL. Consider the view:
View created.
ISO SQL does not allow one to use a WHERE clause, a GROUP BY clause, or a
HAVING clause with respect to a grouped view referenced in a SELECT. For
example,
is not acceptable since the SELECT references a grouped view and contains a
WHERE clause on that view. As well, if one of the tables/views in the FROM clause
of a SELECT references a grouped view then no other tables/views can be
referenced. For example,
is illegal since there are two tables/views in the FROM clause and one of these is a
grouped view. If a SELECT references a grouped view in it’s FROM clause, then an
associated WHERE, GROUP BY, or HAVING is not permitted. Some SQL
implementations are more liberal in their treatment of grouped views.
Check Option
The WITH CHECK OPTION phrase is optional, but certainly recommended if the
view could be used for updating the database. Regardless of whether or not the WITH
CHECK OPTION is specified the user of a view is restricted to retrieving only that
portion of the database specified in the view. However, that is not the case for the
update operations of INSERT and UPDATE. Consider the view
View created.
A SELECT command referencing this view can only access computing books, but it
is possible in the absence of WITH CHECK OPTION to insert non-computing books
or change a
1 row created.
and insert the history text into the database. This book, however vanishes from the
view; we cannot retrieve it using COMPUTING_BOOKS.
A similar situation exists for the UPDATE command. To avoid this kind of behavior
we just include the WITH CHECK OPTION when we define the view.
View created.
This option tells the system to check and reject updates (INSERTs and UPDATEs) for
rows that vanish from the view.
Some views are inherently non updatable. That is, for some views an update operation
does not make any sense and must be disallowed. For example, consider the update
operation
Updating the BOOK_COUNT field does not make any sense; it doesn’t even correspond
to a field in an underlying real table. What views can be updated sensibly, and which
cannot, is an area of current research in database systems. Consequently, SQL systems
are quite restrictive on what types of views can be updated. In general, a view is
updateable if it corresponds to a simple row and column subset of a single table. The
specific conditions that an updateable view must meet are:
In the future these restrictions will in all likelihood be relaxed; at the present
time some database systems have eased these restrictions. For instance,
DB2 allows a view to be updated even if the WHERE clause contains a
subquery as long as the FROM clause in the subquery and the “outer” FROM
clause reference different tables.
DATA UPDATING 49