
Assignment No. 1
Title of Assignment:
Implement a system using multivalued Attributes and
Inheritance in ORDBMS.

Relevant Theory / Literature Survey:


ORDBMS Definition
An object relational database is also called an object
relational database management system (ORDBMS). This system
simply puts an object-oriented front end on a relational
database (RDBMS). When applications interface to this type
of database, they normally interface as though the data is
stored as objects. However, the system converts the object
information into data tables with rows and columns and
handles the data the same as a relational database.
Likewise, when the data is retrieved, it must be reassembled
from simple data into complex objects.

About Oracle Objects and Object Types

Oracle object types are user-defined data types that make it
possible to model complex real-world entities such as
customers and purchase orders as unitary
entities--"objects"--in the database.

Oracle object technology is a layer of abstraction built on
Oracle's relational technology. New object types can be
created from any built-in database types and any previously
created object types, object references, and collection
types. Metadata for user-defined types is stored in a schema
that is available to SQL, PL/SQL, Java, and other published
interfaces.

Object types and related object-oriented features such as
variable-length arrays and nested tables provide higher-
level ways to organize and access data in the database.
Underneath the object layer, data is still stored in columns
and tables, but you are able to work with the data in terms
of the real-world entities--customers and purchase orders,
for example--that make the data meaningful. Instead of
thinking in terms of columns and tables when you query the
database, you can simply select a customer.

Internally, statements about objects are still basically
statements about relational tables and columns, and you can
continue to work with relational data types and store data
in relational tables as before. But now you have the option
to take advantage of object-oriented features too. You can
begin to use object-oriented features while continuing to
work with most of your data relationally, or you can go over
to an object-oriented approach entirely. For instance, you
can define some object data types and store the objects in
columns in relational tables. You can also create object
views of existing relational data to represent and access
this data according to an object model. Or you can store
object data in object tables, where each row is an object.

Advantages of Objects

In general, the object-type model is similar to the class
mechanism found in C++ and Java. Like classes, objects make
it easier to model complex, real-world business entities and
logic, and the reusability of objects makes it possible to
develop database applications faster and more efficiently.
By natively supporting object types in the database, Oracle
enables application developers to directly access the data
structures used by their applications. No mapping layer is
required between client-side objects and the relational
database columns and tables that contain the data. Object
abstraction and the encapsulation of object behaviors also
make applications easier to understand and maintain.

Below are listed several other specific advantages that
objects offer over a purely relational approach.
• Objects Can Encapsulate Operations Along with Data
• Objects Are Efficient
• Objects Can Represent Part-Whole Relationships
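
To illustrate the first point, here is a minimal sketch of an
object type that carries an operation along with its data
(the type name, attributes, and method are invented for
illustration):

CREATE TYPE account_t AS OBJECT (
  account_no NUMBER,
  balance    NUMBER,
  -- the operation travels with the data it works on
  MEMBER FUNCTION available_balance RETURN NUMBER );
/
CREATE TYPE BODY account_t AS
  MEMBER FUNCTION available_balance RETURN NUMBER IS
  BEGIN
    RETURN balance; -- a real type would apply business rules here
  END;
END;
/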

Basic Components of Oracle Objects

Object-Relational Elements

Object-relational functionality introduces a number of new
concepts and resources. These are briefly described in the
following sections.

Object Types

An object type is a kind of data type. You can use it in the
same ways that you use more familiar data types such as
NUMBER or VARCHAR2. For example, you can specify an object
type as the data type of a column in a relational table, and
you can declare variables of an object type. You use a
variable of an object type to contain a value of that object
type. A value of an object type is an instance of that type.
An object instance is also called an object.

Object types also have some important differences from the
more familiar data types that are native to a relational
database:

• A set of object types does not come ready-made with the
database. Instead, you define the object types you
want.
• Object types are not unitary: they have parts, called
attributes and methods.

You can think of an object type as a structural blueprint or
template and an object as an actual thing built according to
the template.

Type Inheritance

You can specialize an object type by creating subtypes that
have some added, differentiating feature, such as an
additional attribute or method. You create subtypes by
deriving them from a parent object type, which is called a
super type of the derived subtypes.

Subtypes and super types are related by inheritance: as
specialized versions of their parent, subtypes have all the
parent's attributes and methods plus any specializations
that are defined in the subtype itself. Subtypes and super
types connected by inheritance make up a type hierarchy.
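
A minimal sketch of such a hierarchy in Oracle SQL follows;
the subtype name and its extra attribute are invented for
illustration, and the parent type must be declared NOT FINAL
so that subtypes may be derived from it:

CREATE TYPE person_t AS OBJECT (
  name  VARCHAR2(30),
  phone VARCHAR2(20)
) NOT FINAL;
/
-- student_t inherits name and phone, and adds an attribute
CREATE TYPE student_t UNDER person_t (
  roll_no NUMBER );
/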

Objects

When you create a variable of an object type, you create an
instance of the type: the result is an object. An object has
the attributes and methods defined for its type. Because an
object instance is a concrete thing, you can assign values
to its attributes and call its methods.

Design Analysis / Implementation Logic:


Implementation:

Object Tables

An object table is a special kind of table in which each row
represents an object.

For example, the following statements create a person object
type and define an object table for person objects:

CREATE TYPE person AS OBJECT (
  name  VARCHAR2(30),
  phone VARCHAR2(20) );

CREATE TABLE person_table OF person;

You can view this table in two ways:

• As a single-column table in which each row is a person
object, allowing you to perform object-oriented
operations
• As a multi-column table in which each attribute of the
object type person, namely name and phone, occupies a
column, allowing you to perform relational operations

For example, you can execute the following instructions:

INSERT INTO person_table VALUES (
  'John Smith',
  '1-800-555-1212' );

SELECT VALUE(p) FROM person_table p
  WHERE p.name = 'John Smith';

The first statement inserts a person object into person_table,
treating person_table as a multi-column table. The second selects
from person_table as a single-column table, using the VALUE
function to return rows as object instances.

Varrays

An array is an ordered set of data elements. All elements of a
given array are of the same data type. Each element has an
index, which is a number corresponding to the element's
position in the array.

The number of elements in an array is the size of the array.
Oracle allows arrays to be of variable size, which is why
they are called varrays. You must specify a maximum size when
you declare the array type.

For example, the following statement declares an array type:

CREATE TYPE prices AS VARRAY(10) OF NUMBER(12,2);

The VARRAYs of type PRICES have no more than ten elements,
each of datatype NUMBER(12,2).

Creating an array type does not allocate space. It defines a
datatype, which you can use as:

• The datatype of a column of a relational table.
• An object type attribute.
• The type of a PL/SQL variable, parameter, or function
return value.
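
For example, a hypothetical table (names invented for
illustration) could use the prices type declared above as a
column, with the varray's elements supplied through the
type's constructor:

CREATE TABLE price_list (
  item_id     NUMBER,
  item_prices prices );

INSERT INTO price_list VALUES (
  1, prices(19.99, 24.50, 29.99) );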

A varray is normally stored in line, that is, in the same
tablespace as the other data in its row. If it is
sufficiently large, Oracle stores it as a BLOB.

A varray cannot contain LOBs. This means that a varray also
cannot contain elements of a user-defined type that has a
LOB attribute.

Nested Tables
A nested table is an unordered set of data elements, all of the
same datatype. It has a single column, and the type of that
column is a built-in type or an object type. If the column
in a nested table is an object type, the table can also be
viewed as a multi-column table, with a column for each
attribute of the object type.

For example, in the purchase order example, the following
statement declares the table type used for the nested tables
of line items:

CREATE TYPE lineitem_table AS TABLE OF lineitem;

A table type definition does not allocate space. It defines
a type, which you can use as

• The datatype of a column of a relational table.
• An object type attribute.
• A PL/SQL variable, parameter, or function return type.

When a column in a relational table is of nested table type,
Oracle stores the nested table data for all rows of the
relational table in the same storage table. Similarly, with
an object table of a type that has a nested table attribute,
Oracle stores nested table data for all object instances in
a single storage table associated with the object table.

For example, the following statement defines an object table
for the object type PURCHASE_ORDER:

CREATE TABLE purchase_order_table OF purchase_order
  NESTED TABLE lineitems STORE AS lineitems_table;

The second line specifies LINEITEMS_TABLE as the storage table
for the LINEITEMS attributes of all of the PURCHASE_ORDER
objects in PURCHASE_ORDER_TABLE.
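
Assuming, for illustration, that purchase_order is an object
type with an id attribute and a lineitems attribute of type
lineitem_table, and that lineitem holds an item name and a
quantity, rows could be inserted and unnested like this (the
TABLE expression exposes the stored line items as rows):

INSERT INTO purchase_order_table VALUES (
  purchase_order(1001,
    lineitem_table(
      lineitem('pencil', 12),
      lineitem('notebook', 3) ) ) );

SELECT li.*
  FROM purchase_order_table po, TABLE(po.lineitems) li;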

A convenient way to access the elements of a nested table
individually is to use a nested cursor.
Testing
• The object person is created with name & phone number
• The person_table is created using the object person
• The nested table purchase_order_table with purchase_order
and lineitems is created.

Conclusion:
Multivalued attributes and inheritance in ORDBMS are
implemented.

Assignment No. 2
Title of Assignment:
Implement K-Means Data Mining Clustering Algorithm.

Relevant Theory / Literature Survey: (Brief Theory Expected)

What is K-Means Clustering?

In simple words, it is an algorithm to classify or to group
your objects based on attributes/features into K groups,
where K is a positive integer. The grouping is done by
minimizing the sum of squares of distances between the data
points and the corresponding cluster centroid. Thus, the
purpose of K-means clustering is to classify the data.
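
Written out, with K clusters S_1, ..., S_K and centroid mu_j
for cluster S_j, the quantity being minimized is (in LaTeX
notation):

J = \sum_{j=1}^{K} \sum_{x_i \in S_j} \lVert x_i - \mu_j \rVert^2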

Step-by-step k-means clustering algorithm

Step 1. Begin with a decision on the value of k = the number
of clusters.

Step 2. Put any initial partition that classifies the data
into k clusters. You may assign the training samples
randomly, or systematically as follows:
1. Take the first k training samples as single-element
clusters.
2. Assign each of the remaining (N-k) training samples to
the cluster with the nearest centroid. After each
assignment, recompute the centroid of the gaining
cluster.

Step 3. Take each sample in sequence and compute its
distance from the centroid of each of the clusters. If a
sample is not currently in the cluster with the closest
centroid, switch this sample to that cluster and update the
centroids of the cluster gaining the new sample and the
cluster losing the sample.

Step 4. Repeat step 3 until convergence is achieved, that
is, until a pass through the training samples causes no new
assignments.

If the number of data points is less than the number of
clusters, then we assign each data point as the centroid of
a cluster. Each centroid will have a cluster number. If the
number of data points is bigger than the number of clusters,
then for each data point we calculate the distance to all
centroids and take the minimum distance. The data point is
said to belong to the cluster that has the minimum distance
from it.

Applications of K-mean clustering

There are many applications of K-means clustering, ranging
from unsupervised learning for neural networks to pattern
recognition, classification analysis, artificial
intelligence, image processing, and machine vision. In
principle, whenever you have several objects, each with
several attributes, and you want to classify the objects
based on the attributes, you can apply this algorithm.

Design Analysis / Implementation Logic:

Numerical Example of K-Means Clustering

The basic step of k-means clustering is simple. In the
beginning, we determine the number of clusters K and we
assume the centroid or center of these clusters. We can take
any random objects as the initial centroids, or the first K
objects in sequence can also serve as the initial centroids.
Then the k-means algorithm will do the three steps below
until convergence.
Iterate until stable (= no object moves group):
1. Determine the centroid coordinates
2. Determine the distance of each object to the centroids
3. Group the objects based on minimum distance
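
In the notation of the objective above, steps 1 and 3 of
each iteration are (in LaTeX notation):

\mu_j = \frac{1}{|S_j|} \sum_{x_i \in S_j} x_i
    (recompute each centroid as the mean of its members)

c(i) = \arg\min_{j} \lVert x_i - \mu_j \rVert
    (assign each object to its nearest centroid)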

Suppose we have several objects (4 types of medicines), and
each object has two attributes or features, as shown in the
table below. Our goal is to group these objects into K=2
groups of medicine based on the two features (weight index
and pH).

Object       Attribute 1 (X): weight index   Attribute 2 (Y): pH
Medicine A   1                               1
Medicine B   2                               1
Medicine C   4                               3
Medicine D   5                               4

Each medicine represents one point with two attributes (X,
Y) that we can represent as a coordinate in an attribute
space.

1. Initial value of centroids: Suppose we use medicine A and
medicine B as the first centroids. Let c1 and c2 denote the
coordinates of the centroids; then c1 = (1, 1) and
c2 = (2, 1).

2. Objects-centroids distance: we calculate the distance
between each cluster centroid and each object. Using
Euclidean distance, the distance matrix at iteration 0 is

D0 = | 0     1     3.61  5    |   (distances to c1 = (1, 1))
     | 1     0     2.83  4.24 |   (distances to c2 = (2, 1))

Each column in the distance matrix corresponds to one
object. The first row of the distance matrix holds the
distance of each object to the first centroid, and the
second row is the distance of each object to the second
centroid. For example, the distance from medicine
C = (4, 3) to the first centroid is
sqrt((4-1)^2 + (3-1)^2) = sqrt(13) = 3.61, and its distance
to the second centroid is
sqrt((4-2)^2 + (3-1)^2) = sqrt(8) = 2.83, etc.

3. Objects clustering: We assign each object based on the
minimum distance. Thus, medicine A is assigned to group 1,
medicine B to group 2, medicine C to group 2 and medicine D
to group 2. The element of the group matrix below is 1 if
and only if the object is assigned to that group:

G0 = | 1  0  0  0 |   (group 1)
     | 0  1  1  1 |   (group 2)

4. Iteration 1, determine centroids: Knowing the members of
each group, we now compute the new centroid of each group
based on these new memberships. Group 1 has only one member,
so its centroid remains at c1 = (1, 1). Group 2 now has
three members, so its centroid is the average coordinate of
the three members:
c2 = ((2+4+5)/3, (1+3+4)/3) = (11/3, 8/3).

5. Iteration 1, objects-centroids distances: The next step
is to compute the distance of all objects to the new
centroids. Similar to step 2, the distance matrix at
iteration 1 is

D1 = | 0     1     3.61  5    |   (distances to c1 = (1, 1))
     | 3.14  2.36  0.47  1.89 |   (distances to c2 = (11/3, 8/3))

6. Iteration 1, objects clustering: Similar to step 3, we
assign each object based on the minimum distance. Based on
the new distance matrix, we move medicine B to group 1,
while all the other objects remain where they are. The
group matrix is then

G1 = | 1  1  0  0 |   (group 1)
     | 0  0  1  1 |   (group 2)

7. Iteration 2, determine centroids: Now we repeat step 4 to
calculate the new centroid coordinates based on the
clustering of the previous iteration. Group 1 and group 2
both have two members, so the new centroids are
c1 = ((1+2)/2, (1+1)/2) = (1.5, 1) and
c2 = ((4+5)/2, (3+4)/2) = (4.5, 3.5).

8. Iteration 2, objects-centroids distances: Repeating step
2, we have the new distance matrix at iteration 2:

D2 = | 0.5   0.5   3.20  4.61 |   (distances to c1 = (1.5, 1))
     | 4.30  3.54  0.71  0.71 |   (distances to c2 = (4.5, 3.5))

9. Iteration 2, objects clustering: Again, we assign each
object based on the minimum distance:

G2 = | 1  1  0  0 |   (group 1)
     | 0  0  1  1 |   (group 2)

We obtain G2 = G1. Comparing the grouping of the last
iteration and this iteration reveals that the objects no
longer move between groups. Thus, the computation of the
k-means clustering has reached its stability and no more
iterations are needed. We get the final grouping as the
result:

Object       Feature 1 (X): weight index   Feature 2 (Y): pH   Group (result)
Medicine A   1                             1                   1
Medicine B   2                             1                   1
Medicine C   4                             3                   2
Medicine D   5                             4                   2
Testing:

When the user clicks the picture box to input a new data
point (X, Y), the program groups/clusters the data by
minimizing the sum of squares of distances between each data
point and the corresponding cluster centroid. Each dot
represents an object, and the coordinates (X, Y) represent
the two attributes of the object. The colour of the dot and
the label number represent the cluster.

Conclusion:
Thus, all the user data (X, Y) was grouped into three
clusters by minimizing the sum of squares of distances
between each data point and the corresponding cluster
centroid.

Assignment No. 3
Title of Assignment:
Design a Web-based application using ASP involving
Database.
Relevant Theory / Literature Survey:
The need for ASP

Why bother with ASP at all, when HTML can serve your needs?
If you want to display information, all you have to do is
fire up your favorite text editor, type in a few HTML tags,
and save it as an HTML file.
But wait – what if you want to display information that
changes? Supposing you’re writing a page that provides
constantly changing information to your visitors, for
example, weather reports, stock quotes, a list of your
girlfriends, etc, HTML can no longer keep up with the pace.
What you need is a system that can present dynamic
information. And ASP fits the bill perfectly.

What is Active Server Pages?

Active Server Pages (ASPs) are Web pages that contain
server-side scripts in addition to the usual mixture of text
and HTML tags. Server-side scripts are special commands you
put in Web pages that are processed before the pages are
sent from the server to the web-browser of someone who's
visiting your website. When you type a URL in the Address
box or click a link on a webpage, you're asking a web-server
on a computer somewhere to send a file to the web-browser
(also called a "client") on your computer. If that file is a
normal HTML file, it looks the same when your web-browser
receives it as it did before the server sent it. After
receiving the file, your web-browser displays its contents
as a combination of text, images, and sounds.

In the case of an Active Server Page, the process is
similar, except there's an extra processing step that takes
place just before the server sends the file. Before the
server sends the Active Server Page to the browser, it runs
all server-side scripts contained in the page. Some of these
scripts display the current date, time, and other
information. Others process information the user has just
typed into a form, such as a page in the website's
guestbook. And you can write your own code to put in
whatever dynamic information you want. To distinguish Active
Server Pages from normal HTML pages, Active Server
Pages are given the ".asp" extension.
Requirements to run ASP

Since the server must do additional processing on the ASP
scripts, it must have the ability to do so. The only servers
which support this facility are Microsoft Internet
Information Services & Microsoft Personal Web Server. Let us
look at both in detail, so that you can decide which one is
most suitable for you.

Internet Information Services


This is Microsoft’s web server designed for the Windows NT
platform. It can only run on Microsoft Windows NT 4.0,
Windows 2000 Professional, & Windows 2000 Server. The
current version is 5.0, and it ships as a part of the
Windows 2000 operating system.

Personal Web Server


This is a stripped-down version of IIS and supports most of
the features of ASP. It can run on all Windows platforms,
including Windows 95, Windows 98 & Windows Me. Typically,
ASP developers use PWS to develop their sites on their own
machines and later upload their files to a server running
IIS. If you are running Windows 9x or Me, your only option
is to use Personal Web Server 4.0.

The Object Model


ASP is a scripting environment revolving around its Object
Model. An Object Model is simply a hierarchy of objects from
which you may get services. In the case of ASP, all commands
are issued to certain inbuilt objects that correspond to the
Client Request, the Client Response, the Server, the Session
& the Application respectively. All of these are available
for global use:
Request: To get information from the user
Response: To send information to the user
Server: To control the Internet Information Server
Session: To store information about, and change settings
for, the user's current Web-server session
Application: To share application-level information and
control settings for the lifetime of the application

The Request and Response objects contain collections (bits
of information that are accessed in the same way). Objects
use methods to do some type of procedure (if you know any
object-oriented programming language, you know already what
a method is) and properties to store any of the object's
attributes (such as color, font, or size).
Design Analysis / Implementation Logic:

Implementation:

Database Connectivity

<HTML>
<HEAD>
</HEAD>
<BODY>
<%
Dim DB
Set DB = Server.CreateObject ("ADODB.Connection")
DB.Open("PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=" + _
        "C:\Databases\Students.mdb")

Dim RS
Set RS = Server.CreateObject ("ADODB.Recordset")
RS.Open "SELECT * FROM Students", DB
%>
</BODY>
</HTML>

The first few lines are the opening HTML tags for any page.
There’s no ASP code within them. The ASP block begins with
the statement,

Dim DB

which is a declaration of the variable that we are going to
use later on. The second line,

Set DB = Server.CreateObject ("ADODB.Connection")

does the following two things:
Firstly, the right-hand-side statement,
Server.CreateObject() is used to create an instance of a COM
object which has the ProgID ADODB.Connection. The Set
Statement then assigns this reference to our variable, DB.
Now, we use the object just created to connect to the
database using a Connection String.

The string,

"PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=" +
“C:\Databases\Students.mdb”

is a string expression that tells our object where to locate
the database, and more importantly, what type the database
is – whether it is an Access database, or a Sybase database,
or else, is it Oracle. (Please note that this is a
Connection String specific to Access 2000 databases. This
example does not use ODBC.)

If the DB.Open statement succeeds without an error, we have
a valid connection to our database under consideration. Only
after this can we begin to use the database.
The immediately following lines,
Dim RS
Set RS = Server.CreateObject ("ADODB.Recordset")

serve the same purpose as the lines for creating the
ADODB.Connection object. Only now we're creating an
ADODB.Recordset! Now,

RS.Open "SELECT * FROM Students", DB

is perhaps the most important line of this example. Given an
SQL statement, this line executes the query and assigns the
records returned to our Recordset object. The bare-minimum
syntax, as you can see, is pretty straightforward. Of
course, the Recordset.Open (...) method takes a couple more
arguments, but they are optional and would just complicate
things at this juncture.

Inserting Data into a Table


<HTML>
<HEAD>
<TITLE>Student Records</TITLE>
</HEAD>
<BODY>
<%
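' Note (assumption worth flagging): named ADO constants such
' as adModeReadWrite, adOpenStatic, and adLockPessimistic are
' not built into ASP; they are defined in adovbs.inc, which
' must be included (or the values defined by hand) for this
' page to run.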
Dim DB
Set DB = Server.CreateObject ("ADODB.Connection")
DB.Mode = adModeReadWrite
DB.Open("PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=" + _
        "C:\Databases\Students.mdb")

Dim RS
Set RS = Server.CreateObject ("ADODB.Recordset")
RS.Open "Students", DB, adOpenStatic, adLockPessimistic
RS.AddNew
RS ("FirstName") = "Kavitha"
RS ("LastName") = "Nair"
RS ("Email") = "kavitha@kavithanair.com"
RS ("DateOfBirth") = CDate("4 Feb, 1980")
RS.Update
%>
</BODY>
</HTML>

Updating Records
<HTML>
<HEAD>
<TITLE>Student Records</TITLE>
</HEAD>
<BODY>
<%
Dim DB
Set DB = Server.CreateObject ("ADODB.Connection")
DB.Mode = adModeReadWrite
DB.Open("PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=" + _
        "C:\Databases\Students.mdb")

Dim RS
Set RS = Server.CreateObject ("ADODB.Recordset")
RS.Open "SELECT * FROM Students WHERE FirstName = " & _
        "'Kavitha'", DB, adOpenStatic, adLockPessimistic

RS ("Email") = "mynewemail@kavithanair.com"
RS ("DateOfBirth") = CDate("4 Feb, 1980")
RS.Update
%>
</BODY>
</HTML>

Deleting Records
<HTML>
<HEAD>
<TITLE>Student Records</TITLE>
</HEAD>
<BODY>
<%
Dim DB
Set DB = Server.CreateObject (“ADODB.Connection”)
DB.Mode = adModeReadWrite
DB.Open("PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=" +
“C:\Databases\Students.mdb”)

DB.Execute (“DELETE * FROM Students WHERE FirstName =


‘Kavitha’”)

%>
</BODY>
</HTML>

Retrieving Data
<HTML>
<HEAD>
<TITLE>Student Records</TITLE>
</HEAD>
<BODY>
<%
Dim DB
Set DB = Server.CreateObject ("ADODB.Connection")
DB.Open("PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=" + _
        "C:\Databases\Students.mdb")

Dim RS
Set RS = Server.CreateObject ("ADODB.Recordset")
RS.Open "SELECT * FROM Students", DB

If RS.EOF And RS.BOF Then
    Response.Write "There are 0 records."
Else
    RS.MoveFirst
    While Not RS.EOF
        Response.Write RS.Fields ("FirstName")
        Response.Write RS.Fields ("LastName")
        Response.Write "<HR>"
        RS.MoveNext
    Wend
End If
%>
</BODY>
</HTML>
Testing:
1. Insert data into the Student database.
2. Establish the connectivity with the database.
3. Insert, delete, and update records, and retrieve data
from the database.

Conclusion:
A web-based application for student registration is
implemented with ASP. The application also performs adding a
new student, deleting a student, and modifying a student's
record.

Assignment No. 4
Title of Assignment:
To create a simple multi-dimensional cube.
Relevant Theory / Literature Survey:
Installation of Analysis Services for MS SQL 2000 is the
primary requirement. When MS SQL 2000 Analysis Services is
installed, Analysis Manager is also installed as a tool.

What is a Cube?

Cubes are the main objects in online analytic processing
(OLAP), a technology that provides fast access to data in a
data warehouse. A Cube is a set of data that is usually
constructed from a subset of a data warehouse and is
organized and summarized into a multidimensional structure
defined by a set of dimensions and measures. A Cube
provides an easy-to-use mechanism for querying data with
quick and uniform response times.

Every cube has a schema, which is the set of joined tables
in the data warehouse from which the cube draws its source
data. The central table in the schema is the fact table, the
source of the cube's measures. The other tables are
dimension tables, the sources of the cube's dimensions.

A cube is defined by the measures and dimensions that it
contains. For example, a cube for sales analysis includes
the measures Item_Sale_Price and Item_Cost and the
dimensions Store_Location, Product_Line, and Fiscal_Year.
This cube enables end users to separate Item_Sale_Price and
Item_Cost into various categories by Store_Location,
Product_Line, and Fiscal_Year.

Each cube dimension can contain a hierarchy of levels to
specify the categorical breakdown available to end users.
For example, the Store_Location dimension includes the level
hierarchy: Continent, Country, Region, State_Province, City,
Store_Number. Each level in a dimension is of finer
granularity than its parent. For example, continents contain
countries, and states or provinces contain cities.
Similarly, the hierarchy of the Fiscal_Year dimension
includes the levels Year, Quarter, Month, and Day.

Dimension levels are a powerful data modeling tool because
they allow end users to ask questions at a high level and
then expand a dimension hierarchy to reveal more detail.

Cubes are immediately subordinate to the database in the
object hierarchy. A database is a container for related
cubes and the objects they share. You must create a database
before you create a cube.

Data warehousing Objects

Fact tables and dimension tables are the two types of
objects commonly used in dimensional data warehouse schemas.

Fact tables are the large tables in your warehouse schema
that store business measurements. Fact tables typically
contain facts and foreign keys to the dimension tables. Fact
tables represent data, usually numeric and additive, that
can be analyzed and examined.

Dimension tables, also known as lookup or reference tables,
contain the relatively static data in the warehouse.
Dimension tables store the information you normally use to
constrain queries.

Star Schema
The star schema is the simplest data warehouse schema. It is
called a star schema because the diagram resembles a star,
with points radiating from a center. The center of the star
consists of one or more fact tables and the points of the
star are the dimension tables.
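
A minimal sketch of a star schema in SQL, using invented
table and column names that mirror the sales example above:

CREATE TABLE store_dim (
    store_id INT PRIMARY KEY,
    city     VARCHAR(40),
    state    VARCHAR(40),
    country  VARCHAR(40) );

CREATE TABLE time_dim (
    time_id  INT PRIMARY KEY,
    month    INT,
    quarter  INT,
    year     INT );

-- The fact table sits at the center of the star: measures
-- plus foreign keys pointing out to each dimension table.
CREATE TABLE sales_fact (
    store_id        INT REFERENCES store_dim (store_id),
    time_id         INT REFERENCES time_dim (time_id),
    item_sale_price NUMERIC(12,2),
    item_cost       NUMERIC(12,2) );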
Hierarchies
Hierarchies are logical structures that use ordered levels
as a means of organizing data. A hierarchy can be used to
define data aggregation.

Design Analysis / Implementation Logic:

The assignment includes
1. Prepare Analysis Services, as our environment, for the
cube model we intend to design;
2. Create the basic cube model;
3. Perform dimension design and other steps as part of the
cube creation process;
4. Save the model;
5. Design storage for the cube we have planned;
6. Process the cube and
7. Overview basic cube browse functionality.

Testing:
(Input/Output):
Conclusion:
A simple multi-dimensional cube is created and studied.

Assignment No. 5
Title of Assignment:
Study of LDAP (Lightweight Directory Access Protocol)

Relevant Theory / Literature Survey:


Directory Service

A Directory is like a database: you can put information in,
and later retrieve it. But it is specialized. Some typical
characteristics are: designed for reading more than writing,
offers a static view of the data, simple updates without
transactions. Directories are tuned to give quick-response
to high-volume lookup or search operations.

A Directory Service sports all of the above, plus a network
protocol used to access the directory, and perhaps also a
replication scheme and a data distribution scheme.

The Lightweight Directory Access Protocol (LDAP) is a
protocol for accessing online directory services. It runs
directly over TCP, and can be used to access directory
services back-ended by X.500, standalone LDAP directory
services or other kinds of directory servers.

X.500

LDAP was originally developed as a front end to X.500, the
OSI directory service. X.500 defines the Directory Access
Protocol (DAP) for clients to use when contacting directory
servers. DAP is a heavyweight protocol that runs over a full
OSI stack and requires a significant amount of computing
resources to run. LDAP runs directly over TCP and provides
most of the functionality of DAP at a much lower cost. This
use of LDAP makes it easy to access the X.500 directory.

X.500 in more depth

In X.500, the namespace is explicitly stated and is
hierarchical. Such namespaces require relatively complicated
management schemes. The naming model defined in X.500 is
concerned mainly with the structure of the entries in the
namespace, not the way the information is presented to the
user. Every entry in a X.500 Directory Information Tree, or
DIT, is a collection of attributes, each attribute composed
of a type element and one or more value elements.

The X.500 standard defines 17 object classes for directories
as a baseline. Being extensible, X.500 directories may
include other objects defined by implementors. The 17 basic
object classes include:
• Alias
• Country
• Locality
• Organization
• Organizational Unit
• Person

Objects in these object classes are defined by their
attributes. Some of the basic 40 attribute types include:

• Common Name (CN)
• Organization Name (O)
• Organizational Unit Name (OU)
• Locality Name (L)
• Street Address (SA)
• State or Province Name (S)
• Country (C)

Putting this all together, an unambiguous entry for an
addressee would be specified by its distinguished name, say
{C=US, O=Acme, OU=Sales, CN=Fred}

Sample X.500 hierarchy. Starting at the highest level, or
Root, we can traverse the tree to successively lower levels,
called Country, Organization, and Common Name, for instance.

Applications and users access the directory via a directory
user agent, or DUA. A DUA transfers the directory request to
a DSA, or Directory System Agent, via DAP, the Directory
Access Protocol. The directory itself is composed of one or
more DSAs. The DSAs can either communicate among themselves
to share directory information or may perform what is called
a referral, i.e., direct the DUA to use a specific DSA.
Referrals may occur when DSAs are not set up to exchange
directory information, perhaps due to lack of interworking
agreements between the administrators, or for security
reasons.

LDAP

The LDAP standard defines

• a network protocol for accessing information in the
directory. It defines the operations one may perform
e.g. search, add, delete, modify, change name. It also
defines how operations and data are conveyed.
• an information model defining the form and character of
the information
• a namespace defining how information is referenced and
organized
• an emerging distributed operation model defining how
data may be distributed and referenced (v3)
• Both the protocol itself and the information model are
extensible

Data Types

Almost any kind of data can be put into the directory: text,
photos, URLs, pointers to whatever, binary data, public-key
certificates.

Different types of data are held in attributes of different
types. Each attribute type has a particular syntax. The LDAP
standard describes a rich set of standard attribute types
and syntax (based on X.500's set). Plus, you may define your
own attributes, syntax, and even object classes -- you can
tailor your directory to your own site's specific needs.

The information model and namespace

They are based on Entries. An entry is simply a place where
one stores attributes. Each attribute has a type and one or
more values.

Entries themselves are "typed". This is accomplished by the
objectClass attribute.

The namespace is hierarchical, so it has the concept of
fully-qualified names called Distinguished Names (DN). For
example, a "test entry" stored under ou=people at
stanford.edu would have the DN "cn=test entry, ou=people,
dc=stanford, dc=edu".

Accessing an LDAP-based directory is accomplished by using a
combination of DN, filter, and scope. A base DN indicates
where in the hierarchy to begin the search. A filter
specifies attribute types, assertion values, and matching
criteria. A scope indicates what to search: the base DN
itself, one level below the base DN, the entire sub-tree
rooted at the base DN.
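
For example (using the invented names from later in this
section), a search with base DN dc=foobar,dc=com, scope sub,
and filter (uid=fsmith) would start at the top of the
foobar.com tree, examine the entire sub-tree beneath it, and
return every entry whose uid attribute is fsmith.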

How does LDAP work?

LDAP directory service is based on a client-server model.
One or more LDAP servers contain the data making up the LDAP
directory tree. An LDAP client connects to an LDAP server
and asks it a question. The server responds with the answer,
or with a pointer to where the client can get more
information (typically, another LDAP server). No matter
which LDAP server a client connects to, it sees the same
view of the directory; a name presented to one LDAP server
references the same entry it would at another LDAP server.
This is an important feature of a global directory service,
like LDAP.

Key Points of LDAP:

• LDAP is an extensible, vendor-independent, open, network
PROTOCOL standard: so accessing data is done
transparently across a highly heterogeneous network
(i.e. the Internet).
• An LDAP-based directory supports any type of data.
• You can configure an LDAP-based directory to play
essentially any role.
• The LDAP protocol directly supports various forms of
strong security (authentication, privacy, and
integrity) technology.
• Can use general-purpose directory technology, such as
LDAP, to glue together disparate facets of cyberspace,
e.g. email, security, white- & yellow-pages,
directories, collaborative tools, MBone, etc.

Individual LDAP records


What's in a name? The DN of an LDAP entry

All entries stored in an LDAP directory have a unique
"Distinguished Name," or DN. The DN for each LDAP entry is
composed of two parts: the Relative Distinguished Name (RDN)
and the location within the LDAP directory where the record
resides.

The RDN is the portion of your DN that is not related to the
directory tree structure. Most items that you'll store in an
LDAP directory will have a name, and the name is frequently
stored in the cn (Common Name) attribute. Since nearly
everything has a name, most objects you'll store in LDAP
will use their cn value as the basis for their RDN. If I'm
storing a record for my favorite oatmeal recipe, I'll be
using cn=Oatmeal Deluxe as the RDN of my entry.

• My directory's base DN is dc=foobar,dc=com
• I'm storing all the LDAP records for my recipes in
ou=recipes
• The RDN of my LDAP record is cn=Oatmeal Deluxe

Given all this, what's the full DN of the LDAP record for
this oatmeal recipe? Remember, it reads backwards - just
like a host name in DNS.

cn=Oatmeal Deluxe,ou=recipes,dc=foobar,dc=com

Now it's time to tackle the DN of a company employee. For
user accounts, you'll typically see a DN based either on the
cn or on the uid (User ID). For example, the DN for FooBar's
employee Fran Smith (login name: fsmith) might look like
either of these two formats:

uid=fsmith,ou=employees,dc=foobar,dc=com
(login-based)
LDAP (and X.500) use uid to mean "User ID", not to be
confused with the UNIX uid number. Most companies try to
give everyone a unique login name, so this approach makes
good sense for storing information about employees. You
don't have to worry about what you'll do when you hire the
next Fran Smith, and if Fran changes her name (marriage?
divorce? religious experience?), you won't have to change
the DN of the LDAP entry.

cn=Fran Smith,ou=employees,dc=foobar,dc=com
(name-based)
Here we see the Common Name (CN) entry used. In the case of
an LDAP record for a person, think of the common name as
their full name. One can easily see the downside to this
approach: if the name changes, the LDAP record has to "move"
from one DN to another. As indicated above, you want to
avoid changing the DN of an entry whenever possible.

An example of an individual LDAP entry.

Let's look at an example. We'll use the LDAP record of Fran
Smith, an employee from Foobar, Inc. The format of this
entry is LDIF, the format used when exporting and importing
LDAP directory entries.

dn: uid=fsmith, ou=employees, dc=foobar, dc=com
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
objectclass: foobarPerson
uid: fsmith
givenname: Fran
sn: Smith
cn: Fran Smith
cn: Frances Smith
telephonenumber: 510-555-1234
roomnumber: 122G
o: Foobar, Inc.
mailRoutingAddress: fsmith@foobar.com
mailhost: mail.foobar.com
userpassword: {crypt}3x1231v76T89N
uidnumber: 1234
gidnumber: 1200
homedirectory: /home/fsmith
loginshell: /usr/local/bin/bash

To start with, attribute values are stored with case intact,
but searches against them are case-insensitive by default.
Certain attributes (like password) are case-sensitive when
searching.

Let's break this entry down and look at it piece by piece.

dn: uid=fsmith, ou=employees, dc=foobar, dc=com

This is the full DN of Fran's LDAP entry, including the
whole path to the entry in the directory tree.
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
objectclass: foobarPerson

One can assign as many object classes as are applicable to
any given type of object. The person object class requires
that the cn (common name) and sn (surname) fields have
values. Object Class person also allows other optional
fields, including givenname, telephonenumber, and so on. The
object class organizationalPerson adds more options to the
values from person, and inetOrgPerson adds still more options
to that (including email information). Finally, foobarPerson
is Foobar's customized object class that adds all the custom
attributes they wish to track at their company.

uid: fsmith
givenname: Fran
sn: Smith
cn: Fran Smith
cn: Frances Smith
telephonenumber: 510-555-1234
roomnumber: 122G
o: Foobar, Inc.

As mentioned before, uid stands for User ID. Just translate
it in your head to "login" whenever you see it.

Note that there are multiple entries for the CN. As
mentioned above, LDAP allows some attributes to have
multiple values, with the number of values being arbitrary.
When would you want this? Let's say you're searching the
company LDAP directory for Fran's phone number. While you
might know her as Fran (having heard her spill her guts over
lunchtime margaritas on more than one occasion), the people
in HR may refer to her (somewhat more formally) as Frances.
Because both versions of her name are stored, either search
will successfully look up Fran's telephone number, email,
cube number, and so on.

mailRoutingAddress: fsmith@foobar.com
mailhost: mail.foobar.com

Like most companies on the Internet, Foobar uses Sendmail
for internal mail delivery and routing. Foobar stores all
users' mail routing information in LDAP, which is fully
supported by recent versions of Sendmail.

userpassword: {crypt}3x1231v76T89N
uidnumber: 1234
gidnumber: 1200
gecos: Frances Smith
homedirectory: /home/fsmith
loginshell: /usr/local/bin/bash

Note that Foobar's systems administrators store all the NIS
password map information in LDAP as well. At Foobar, the
foobarPerson object class adds this capability. Note that the
user password is stored in UNIX crypt format. The UNIX uid
is stored here as uidnumber.

Conclusion:
Thus, the Lightweight Directory Access Protocol is studied.

Assignment No. 6 (a)

Title: Case Study of SQL SERVER

What is SQL Server?

SQL Server 2000 is a family of products designed to meet
the data storage requirements of large data processing
systems and commercial Web sites, as well as meet the ease-
of-use requirements of individuals and small businesses. At
its core, SQL Server 2000 provides two fundamental services
to the emerging Microsoft .NET platform, as well as in the
traditional two-tier client/server environment. The first
service is the SQL Server service, which is a high-
performance, highly scalable relational database engine.
The second service is SQL Server 2000 Analysis Services,
which provides tools for analyzing the data stored in data
warehouses and data marts for decision support.
Microsoft SQL Server is a complete database and analysis
solution for rapidly delivering the next generation of
scalable Web applications.

SQL Server is a key component in supporting e-commerce,
line-of-business, and data warehousing applications, while
offering the scalability necessary to support growing,
dynamic environments.

SQL Server includes rich support for Extensible Markup
Language (XML) and other Internet language formats;
performance and availability features to ensure uptime; and
advanced management and tuning functionality to automate
routine tasks and lower the total cost of ownership.

The SQL Server 2000 Environment


The traditional client/server database environment consists
of client applications and a relational database management
system (RDBMS) that manages and stores the data. In this
traditional environment, the client applications that
provide the interface for users to access SQL Server 2000
are intelligent (or thick) clients, such as custom-written
Microsoft Visual Basic programs that access the data on SQL
Server 2000 directly using a local area network.
The emerging Microsoft .NET platform consists of highly
distributed, loosely connected, programmable Web services
executing on multiple servers. In this distributed,
decentralized environment, the client applications are thin
clients, such as Internet browsers, which access the data
on SQL Server 2000 through Web services such as Microsoft
Internet Information Services (IIS).
SQL Server 2000 Components
SQL Server 2000 provides a number of different types of
components. At the core are server components. These server
components are generally implemented as 32-bit Windows
services. SQL Server 2000 provides client-based graphical
tools and command-prompt utilities for administration.
These tools and utilities, as well as all other client
applications, use client communication components provided
by SQL Server 2000. The communication components provide
various ways in which client applications can access data
through communication with the server components. These
communication components are implemented as providers,
drivers, database interfaces, and Net-Libraries.

Server Components
The server components of SQL Server 2000 are normally
implemented as 32-bit Windows services. The SQL Server and
SQL Server Agent services may also be run as standalone
applications on any supported Windows operating system
platform.
The following table lists the server components and briefly describes
their function. It also specifies how the component is
implemented when multiple instances are used.
Table: Server Components and Their Functions

SQL Server service: The MSSQLServer service implements the
SQL Server 2000 database engine. There is one service for
each instance of SQL Server 2000.

Microsoft SQL Server 2000 Analysis Services service: The
MSSQLServerOLAPService service implements SQL Server 2000
Analysis Services. There is only one service, regardless of
the number of instances of SQL Server 2000.

SQL Server Agent service: The SQLServerAgent service
implements the agent that runs scheduled SQL Server 2000
administrative tasks. There is one service for each instance
of SQL Server 2000.

Microsoft Search service: Microsoft Search implements the
full-text search engine. There is only one service,
regardless of the number of instances of SQL Server 2000.

Microsoft Distributed Transaction Coordinator (MS DTC)
service: The Distributed Transaction Coordinator manages
distributed transactions between instances of SQL Server
2000. There is only one service, regardless of the number of
instances of SQL Server 2000.

Client-Based Graphical Tools


The following table lists the 32-bit graphical tools
provided by SQL Server 2000 and briefly describes their
function.

Table: Graphical Tools in SQL Server 2000

SQL Server Enterprise Manager: The primary administrative
tool for SQL Server. It provides a Microsoft Management
Console (MMC)-compliant user interface that helps you to
perform a variety of administrative tasks:
• Defining groups of servers running SQL Server
• Registering individual servers in a group
• Configuring all SQL Server options for each registered
server
• Creating and administering all SQL Server databases,
objects, logins, users, and permissions in each registered
server
• Defining and executing all SQL Server administrative
tasks on each registered server
• Designing and testing SQL statements, batches, and
scripts interactively by invoking SQL Query Analyzer
• Invoking the various wizards defined for SQL Server

SQL Query Analyzer: A graphical tool that helps you to
perform a variety of tasks:
• Creating queries and other SQL scripts and executing them
against SQL Server databases
• Creating commonly used database objects from predefined
scripts
• Copying existing database objects
• Executing stored procedures without knowing the
parameters

SQL Profiler: A tool that captures SQL Server events from a
server. The events are saved in a trace file that can later
be analyzed or used to replay a specific series of steps
when trying to diagnose a problem.

SQL Server Service Manager: A taskbar application used to
start, stop, pause, or modify SQL Server 2000 services.

SQL Server Agent: Runs on the server that is running
instances of SQL Server. SQL Server Agent is responsible for
the following tasks:
• Running SQL Server tasks that are scheduled to occur at
specific times or intervals
• Detecting specific conditions for which administrators
have defined an action, such as alerting someone through
pages or e-mail, or issuing a task that will address the
conditions
• Running replication tasks defined by administrators
The Relational Database Architecture
SQL Server 2000 data is stored in databases. Physically, a
database consists of two or more files on one or more
disks. This physical implementation is visible only to
database administrators, and is transparent to users. The
physical optimization of the database is primarily the
responsibility of the database administrator.
Logically, a database is structured into components that
are visible to users, such as tables, views, and stored
procedures. The logical optimization of the database (such
as the design of tables and indexes) is primarily the
responsibility of the database designer.

System and User Databases


Each instance of SQL Server 2000 has four system databases.
The following table lists each of these system databases and
briefly describes their function.
In addition, each instance of SQL Server 2000 has one or
more user databases. The pubs and Northwind user databases
are sample databases that ship with SQL Server 2000. Given
sufficient system resources, each instance of SQL Server
2000 can handle thousands of users working in multiple
databases simultaneously.
Table: System Databases in SQL Server 2000

master: Records all of the system-level information for a
SQL Server 2000 system, including all other databases, login
accounts, and system configuration settings.

tempdb: Stores all temporary tables and stored procedures
created by users, as well as temporary worktables used by
the relational database engine itself.

model: Serves as the template that is used whenever a new
database is created.

msdb: SQL Server Agent uses this system database for
scheduling alerts and jobs, and recording operators.
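
As a quick sketch, these databases can be listed by querying
the sysdatabases system table in master, which, as noted
above, records all other databases:

SELECT name
FROM master..sysdatabases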

Physical Structure of a Database


Each database consists of at least one data file and one
transaction log file. These files are not shared with any
other database. To optimize performance and to provide
fault tolerance, data and log files are typically spread
across multiple drives and frequently use a redundant array
of independent disks (RAID).
Extents and Pages
SQL Server 2000 allocates space from a data file for tables
and indexes in 64-KB blocks called extents. Each extent
consists of eight contiguous pages of 8 KB each. There are
two types of extents: uniform extents that are owned by a
single object, and mixed extents that are shared by up to
eight objects.
A page is the fundamental unit of data storage in SQL
Server 2000, with the page size being 8 KB. In general,
data pages store data in rows on each data page. The
maximum amount of data contained in a single row is 8060
bytes. Data rows are either organized in some kind of order
based on a key in a clustered index (such as zip code), or
stored in no particular order if no clustered index exists.
The beginning of each page contains a 96-byte header that
is used to store system information, such as the amount of
free space available on the page.

Transaction Log Files


The transaction log file resides in one or more separate
physical files from the data files and contains a series of
log records, rather than pages allocated from extents. To
optimize performance and aid in redundancy, transaction log
files are typically placed on separate disks from data
files, and are frequently mirrored using RAID.

Logical Structure of a Database


Data in SQL Server 2000 is organized into database objects
that are visible to users when they connect to a database.
The following table lists these objects and briefly
describes their function.
Table: Database Objects in SQL Server 2000

Tables: A table generally consists of columns and rows of
data in a format similar to that of a spreadsheet. Each row
in the table represents a unique record, and each column
represents a field within the record. A data type specifies
what type of data can be stored in a column.

Views: Views can restrict the rows or the columns of a table
that are visible, or can combine data from multiple tables
to appear like a single table. A view can also aggregate
columns.

Indexes: An index is a structure associated with a table or
view that speeds retrieval of rows from the table or view.
Table indexes are either clustered or nonclustered.
Clustering means the data is physically ordered based on the
index key.

Keys: A key is a column or group of columns that uniquely
identifies a row (PRIMARY KEY), defines the relationship
between two tables (FOREIGN KEY), or is used to build an
index.

User-defined data types: A user-defined data type is a
custom data type, based on a predefined SQL Server 2000 data
type. It is used to make a table structure more meaningful
to programmers and help ensure that columns holding similar
classes of data have the same base data type.

Stored procedures: A stored procedure is a group of
Transact-SQL statements compiled into a single execution
plan. The procedure is used for performance optimization and
to control access.

Constraints: Constraints define rules regarding the values
allowed in columns and are the standard mechanism for
enforcing data integrity.

Defaults: A default specifies what values are used in a
column in the event that you do not specify a value for the
column when you are inserting a row.

Triggers: A trigger is a special class of stored procedure
defined to execute automatically when an UPDATE, INSERT, or
DELETE statement is issued against a table or view.

User-defined functions: A user-defined function is a
subroutine made up of one or more Transact-SQL statements
used to encapsulate code for reuse. A function can have a
maximum of 1024 input parameters. User-defined functions can
be used in place of views and stored procedures.
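
A minimal sketch tying several of these objects together
(all names are invented for illustration):

CREATE TABLE Students (
    StudentID  int         NOT NULL PRIMARY KEY,    -- key
    LastName   varchar(40) NOT NULL,
    EnrolledOn datetime    DEFAULT (GETDATE()),     -- default
    Credits    int         CHECK (Credits >= 0)     -- constraint
)
GO

-- A view restricting the rows and columns that are visible
CREATE VIEW RecentStudents AS
    SELECT StudentID, LastName
    FROM Students
    WHERE EnrolledOn >= '20000101'
GO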

The Security Architecture

Logins, users, roles, and groups are the foundation for the
security mechanisms of SQL Server. Users who connect to SQL
Server must identify themselves by using a specific login
identifier (ID). Users can then see only the tables and
views that they are authorized to see and can execute only
the stored procedures and administrative functions that
they are authorized to execute. This system of security is
based on the IDs used to identify users.
Allocating Space for Tables and Indexes
Before SQL Server 2000 can store information in a table or
an index, free space must be allocated from within a data
file and assigned to that object. Free space is allocated
for tables and indexes in units called extents. An extent
is 64 KB of space, consisting of eight contiguous pages,
each 8 KB in size. There are two types of extents, mixed
extents and uniform extents. SQL Server 2000 uses mixed
extents to store small amounts of data for up to eight
objects within a single extent, whereas it uses uniform
extents to store data from a single object.
When a new table or index is created, SQL Server 2000
locates a mixed extent with a free page and allocates the
free page to the newly created object. A page contains data
for only one object. When an object requires additional
space, SQL Server 2000 allocates free space from mixed
extents until an object uses a total of eight pages.
Thereafter, SQL Server 2000 allocates a uniform extent to
that object. SQL Server 2000 will grow the data files in a
round-robin algorithm if no free space exists in any data
file and autogrow is enabled.
When SQL Server 2000 needs a mixed extent with at least one
free page, a Secondary Global Allocation Map (SGAM) page is
used to locate such an extent. Each SGAM page is a bitmap
covering 64,000 extents (approximately 4 GB) that is used
to identify allocated mixed extents with at least one free
page. Each extent in the interval that SGAM covers is
assigned a bit. The extent is identified as a mixed extent
with free pages when the bit is set to 1. When the bit is
set to 0, the extent is either a mixed extent with no free
pages, or the extent is a uniform extent.
When SQL Server 2000 needs to allocate an extent from free
space, a Global Allocation Map (GAM) page is used to locate
an extent that has not previously been allocated to an
object. Each GAM page is a bitmap that covers 64,000
extents, and each extent in the interval it covers is
assigned a bit. When the bit is set to 1, the extent is
free. When the bit is set to 0, the extent has already been
allocated.

Storing Index and Data Pages


In the absence of a clustered index, SQL Server 2000 stores
new data on any unfilled page in any available extent
belonging to the table into which the data is being
inserted. This disorganized collection of data pages is
called a heap. In a heap, the data pages are stored in no
specific order and are not linked together. In the absence
of either a clustered or a nonclustered index, SQL Server
2000 has to search the entire table to locate a record
within the table (using IAM pages to identify pages
associated with the table). On a large table, this complete
search is quite inefficient.
To speed this retrieval process, database designers create
indexes for SQL Server 2000 to use to find data pages
quickly. An index stores the value of an indexed column (or
columns) from a table in a B-tree structure. A B-tree
structure is a balanced hierarchical structure (or tree)
consisting of a root node, possible intermediate nodes, and
bottom-level leaf pages (nodes). All branches of the B-tree
have the same number of levels. A B-tree physically
organizes index records based on these key values. Each
index page is linked to adjacent index pages.
SQL Server 2000 supports two types of indexes, clustered
and nonclustered. A clustered index forces the physical
ordering of data pages within the data file based on the
key value used for the clustered index (such as last name
or zip code). The leaf level of a clustered index is the
data level. When a new data row is inserted into a table
containing a clustered index, SQL Server 2000 traverses the
B-tree structure and determines the location for the new
data row based on the ordering within the B-tree (moving
existing data and index rows as necessary to maintain the
physical ordering). See Figure 5.1.
The leaf level of a nonclustered index contains a pointer
telling SQL Server 2000 where to find the data row
corresponding to the key value contained in the
nonclustered index. When a new data row is inserted into a
table containing only a nonclustered index, a new index row
is entered into the B-tree structure, and the new data row
is entered into any page in the heap that has been
allocated to the table and contains sufficient free space.
See Figure 5.2.
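To make the two index types concrete, here is a sketch in
Transact-SQL (the table and column names are hypothetical):

-- Clustered index: physically orders the data pages by key value
CREATE CLUSTERED INDEX CI_Customers_LastName
    ON Customers (LastName)
GO
-- Nonclustered index: the leaf level stores pointers to the
-- data rows rather than the rows themselves
CREATE NONCLUSTERED INDEX NCI_Customers_ZipCode
    ON Customers (ZipCode)
GO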
ASSIGNMENT NO. 6b
TITLE: CASE STUDY of MYSQL

THEORY:

History
Overview of MySQL AB

MySQL AB is the company of the MySQL founders and main
developers. MySQL AB was originally established in Sweden
by David Axmark, Allan Larsson, and Michael “Monty”
Widenius.

The MySQL Web site (http://www.mysql.com/) provides the
latest information about MySQL and MySQL AB.

The “AB” part of the company name is the acronym for the
Swedish “aktiebolag,” or “stock company.” The name thus
corresponds roughly to “MySQL, Inc.”

Overview of the MySQL Database Management System

MySQL, the most popular Open Source SQL database management
system, uses the standard SQL interface. It is developed,
distributed, and supported by MySQL AB. MySQL is very
popular as a back end for web applications. The MySQL
engine can be accessed from most major
programming/scripting languages, such as Perl and PHP,
making it easy to develop applications.

- MySQL is a database management system.
- MySQL is a relational database management system.
- MySQL software is Open Source.
- The MySQL Database Server is very fast, reliable, and
easy to use.
- MySQL Server works in client/server or embedded systems.
- A large amount of contributed MySQL software is available.

The Main Features of MySQL

Internals and Portability:


• Written in C and C++.
• Tested with a broad range of different compilers.
• Works on many different platforms.
• APIs for C, C++, Java, Perl, PHP, Python, etc. are
available.
• Fully multi-threaded using kernel threads. It can easily
use multiple CPUs if they are available.
• Provides transactional and non-transactional storage
engines.
• Uses very fast B-tree disk tables (MyISAM) with index
compression.
• Relatively easy to add other storage engines. This is
useful if you want to add an SQL interface to an in-house
database.
• A very fast thread-based memory allocation system.
• Very fast joins using an optimized one-sweep multi-join.
• In-memory hash tables, which are used as temporary tables.
• SQL functions are implemented using a highly optimized
class library and should be as fast as possible. Usually
there is no memory allocation at all after query
initialization.
• The MySQL code is tested with Purify (a commercial memory
leakage detector) as well as with Valgrind, a GPL tool.
• The server is available as a separate program for use in
a client/server networked environment. It is also available
as a library that can be embedded (linked) into standalone
applications. Such applications can be used in isolation or
in environments where no network is available.

Data Types:
• Many data types: signed/unsigned integers 1, 2, 3, 4, and
8 bytes long, FLOAT, DOUBLE, CHAR, VARCHAR, TEXT, BLOB,
DATE, TIME, DATETIME, TIMESTAMP, YEAR, SET, ENUM, and
OpenGIS spatial types.
• Fixed-length and variable-length records.

Statements and Functions:


• Full operator and function support in the SELECT and
WHERE clauses of queries. For example:
mysql> SELECT CONCAT(first_name, ' ', last_name)
-> FROM citizen
-> WHERE income/dependents > 10000 AND age > 30;
• Full support for SQL GROUP BY and ORDER BY clauses.
Support for group functions (COUNT(), COUNT(DISTINCT ...),
AVG(), STD(), SUM(), MAX(), MIN(), and GROUP_CONCAT()).
• Support for LEFT OUTER JOIN and RIGHT OUTER JOIN with
both standard SQL and ODBC syntax.
• Support for aliases on tables and columns as required by
standard SQL.
• DELETE, INSERT, REPLACE, and UPDATE return the number of
rows that were changed (affected). It is possible to return
the number of rows matched instead by setting a flag when
connecting to the server.
• The MySQL-specific SHOW statement can be used to retrieve
information about databases, storage engines, tables, and
indexes. The EXPLAIN statement can be used to determine how
the optimizer resolves a query.
• Function names do not clash with table or column names.
For example, ABS is a valid column name. The only
restriction is that for a function call, no spaces are
allowed between the function name and the ‘(’ that follows
it.
• You can mix tables from different databases in the same
query.
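For instance (the database, table, and column names below
are invented for the example), one query can join tables
from two databases, and SHOW and EXPLAIN can be used to
inspect them:

mysql> SELECT o.order_id, c.name
    -> FROM shop.orders AS o
    -> JOIN crm.customers AS c ON o.customer_id = c.id;
mysql> SHOW TABLES FROM shop;
mysql> EXPLAIN SELECT * FROM shop.orders WHERE customer_id = 42;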

Security:
• A privilege and password system that is very flexible and
secure, and that allows host-based verification. Passwords
are secure because all password traffic is encrypted when
you connect to a server.

Scalability and Limits:


• Handles large databases. We use MySQL Server with
databases that contain 50 million records. We also know of
users who use MySQL Server with 60,000 tables and about
5,000,000,000 rows.
• Up to 64 indexes per table are allowed (32 before MySQL
4.1.2). Each index may consist of 1 to 16 columns or parts
of columns. The maximum index width is 1000 bytes (767 for
InnoDB); before MySQL 4.1.2, the limit is 500 bytes. An
index may use a prefix of a column for CHAR, VARCHAR, BLOB,
or TEXT column types.
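For example, a prefix index on the first 10 bytes of a
column might be created as follows (table and column names
are assumed for the example):

mysql> CREATE INDEX idx_name_prefix ON customer (name(10));
-- Only the first 10 bytes of the name column are stored
-- in the index.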

Connectivity:
• Clients can connect to the MySQL server using TCP/IP
sockets on any platform. On Windows systems in the NT
family (NT, 2000, XP, 2003, or Vista), clients can connect
using named pipes. On Unix systems, clients can connect
using Unix domain socket files.
• In MySQL 4.1 and higher, Windows servers also support
shared-memory connections if started with the --shared-
memory option. Clients can connect through shared memory by
using the --protocol=memory option.
• The Connector/ODBC (MyODBC) interface provides MySQL
support for client programs that use ODBC (Open Database
Connectivity) connections. For example, you can use MS
Access to connect to your MySQL server. Clients can be run
on Windows or Unix. MyODBC source is available. All ODBC
2.5 functions are supported, as are many others.
• The Connector/J interface provides MySQL support for Java
client programs that use JDBC connections. Clients can be
run on Windows or Unix. Connector/J source is available.
• MySQL Connector/NET enables developers to easily
create .NET applications that require secure, high-
performance data connectivity with MySQL. It implements the
required ADO.NET interfaces and integrates into ADO.NET
aware tools. Developers can build applications using their
choice of .NET languages. MySQL Connector/NET is a fully
managed ADO.NET driver written in 100% pure C#.

Localization:
• The server can provide error messages to clients in many
languages. See Section 5.11.2, “Setting the Error Message
Language”.
• Full support for several different character sets,
including latin1 (cp1252), german, big5, ujis, and more.
For example, the Scandinavian characters ‘å’, ‘ä’ and ‘ö’
are allowed in table and column names. Unicode support is
available as of MySQL 4.1.
• All data is saved in the chosen character set. All
comparisons for normal string columns are case-insensitive.
• Sorting is done according to the chosen character set
(using Swedish collation by default). It is possible to
change this when the MySQL server is started. To see an
example of very advanced sorting, look at the Czech sorting
code. MySQL Server supports many different character sets
that can be specified at compile time and runtime.
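For example, the connection character set can be changed at
runtime (utf8 is available as of MySQL 4.1):

mysql> SET NAMES 'utf8';
mysql> SHOW VARIABLES LIKE 'character_set%';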

Clients and Tools:


• MySQL Server has built-in support for SQL statements to
check, optimize, and repair tables. These statements are
available from the command line through the mysqlcheck
client. MySQL also includes myisamchk, a very fast command-
line utility for performing these operations on MyISAM
tables.
• All MySQL programs can be invoked with the --help or -?
options to obtain online assistance.
System Architecture

Transaction Management
Transaction Overview
Transaction - A sequence of executions of SQL
statements that can be treated as a single unit in which
all data changes can be committed or cancelled as a whole.
Most database servers offer two transaction management
modes:
• Auto Commit On: Each SQL statement is a transaction.
Data changes resulting from each statement are
automatically committed.
• Auto Commit Off: Transactions are explicitly started
and ended by the client program. Data changes are not
committed unless requested by the client program.
Most database servers support the following statements
for transaction management:
• Commit Statement - To commit all changes in the
current transaction.
• Rollback Statement - To rollback all changes in the
current transaction.
• Start Transaction Statement - To start a new
transaction.

Transactions are not explicitly started on the storage
engine level, but are instead implicitly started through
calls to either start_stmt() or external_lock(). If the
preceding methods are called and a transaction already
exists, the transaction is not replaced.
The storage engine stores transaction information in
per-connection memory and also registers the transaction in
the MySQL server to allow the server to later issue COMMIT
and ROLLBACK operations.
As operations are performed the storage engine will
have to implement some form of versioning or logging to
permit a rollback of all operations executed within the
transaction.
After work is completed, the MySQL server will call
either the commit() method or the rollback() method defined
in the storage engine's handlerton.
MySQL Support of Transaction Management

MySQL support of transaction management follows these
rules:

• Only two storage engines support transaction
management: InnoDB and BDB.
• The default storage engine, MyISAM, doesn't support
transaction management.
• To force a table to use a non-default storage engine,
you must specify the engine name in the "create table"
statement.

Statements related to transaction management:

SET AUTOCOMMIT = 0 | 1;
START TRANSACTION;
COMMIT;
ROLLBACK;

Note that:

• SET AUTOCOMMIT = 1 - Turns on the auto-commit option.
It also commits and terminates the current
transaction.
• SET AUTOCOMMIT = 0 - Turns off the auto-commit option.
It also starts a new transaction.
• By default, auto-commit option is turned on when a new
session is established.
• COMMIT - Commits the current transaction.
• ROLLBACK - Rolls back the current transaction.
• START TRANSACTION - Implicitly commits the current
transaction and starts a new one.
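Putting these statements together, a minimal transaction
session might look like this (the accounts table and its
columns are invented for the example):

mysql> SET AUTOCOMMIT = 0;
mysql> START TRANSACTION;
mysql> UPDATE accounts SET balance = balance - 100 WHERE id = 1;
mysql> UPDATE accounts SET balance = balance + 100 WHERE id = 2;
mysql> COMMIT;  -- or ROLLBACK; to cancel both updates as a unit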

Transaction Isolation Levels

The impact of a transaction in the current session is
simple. However, concurrent transactions in multiple
sessions may impact each other in many ways. Three
phenomena have been observed in concurrent transactions:

• Dirty Read - One transaction T1 reads uncommitted
changes from another transaction T2. If T2 performs a
rollback later, T1 may have used incorrect data from
the uncommitted changes.
• Non-Repeatable Read - One transaction T1 reads a row,
which is changed and committed by another transaction
T2 later. Now if T1 reads the same row again, the
result will be different from the first read.
• Phantom - One transaction T1 reads a set of rows that
satisfy a condition. Another transaction T2 then
inserts some new rows that satisfy the same condition.
If T1 repeats the same read, it will receive some
"phantom" rows.
To control and avoid these phenomena, the SQL standard
defines four transaction isolation levels:

• Read Uncommitted - This is the lowest isolation level.
All three phenomena are possible.
• Read Committed - Dirty Read is prevented. But Non-
Repeatable Read and Phantom are possible.
• Repeatable Read - Dirty Read and Non-Repeatable Read
are prevented. But Phantom is still possible.
• Serializable - This is the highest isolation level.
All three phenomena are prevented.

MySQL Support of Transaction Isolation Levels

• Transaction isolation levels are supported by the
InnoDB storage engine.
• The default isolation level is "Repeatable Read".
• The SET statement can be used to change the isolation
level for the next transaction: "SET TRANSACTION
ISOLATION LEVEL level_name".
• The SET statement can be used to change the isolation
level for the entire session, starting with the next
transaction: "SET SESSION TRANSACTION ISOLATION LEVEL
level_name".

Starting a Transaction

A transaction is started by the storage engine in
response to a call to either the start_stmt() or
external_lock() methods.
If there is no active transaction, the storage engine
must start a new transaction and register the transaction
with the MySQL server so that ROLLBACK or COMMIT can later
be called.

Implementing ROLLBACK

Of the two major transactional operations, ROLLBACK is
the more complicated to implement. All operations that
occurred during the transaction must be reversed so that
all rows are unchanged from before the transaction began.
To support ROLLBACK, create a method that matches this
definition:
int (*rollback)(THD *thd, bool all);
The method name is then listed in the rollback
(thirteenth) entry of the handlerton.
The THD parameter is used to identify the transaction
that needs to be rolled back, while the bool all parameter
indicates whether the entire transaction should be rolled
back or just the last statement.
Details of implementing a ROLLBACK operation will vary
by storage engine.

Implementing COMMIT

During a commit operation, all changes made during a
transaction are made permanent, and a rollback operation is
not possible after that. Depending on the transaction
isolation level used, this may be the first time such
changes are visible to other threads.
To support COMMIT, create a method that matches this
definition:
int (*commit)(THD *thd, bool all);
The method name is then listed in the commit (twelfth)
entry of the handlerton.
The THD parameter is used to identify the transaction
that needs to be committed, while the bool all parameter
indicates if this is a full transaction commit or just the
end of a statement that is part of the transaction.
Details of implementing a COMMIT operation will vary
by storage engine. If the server is in auto-commit mode,
the storage engine should automatically commit all read-
only statements such as SELECT. In a storage engine, "auto-
committing" works by counting locks. Increment the count
for every call to external_lock(), decrement when
external_lock() is called with an argument of F_UNLCK. When
the count drops to zero, trigger a commit.

Adding Support for Savepoints

To support savepoints, the storage engine declares in its
handlerton the amount of memory required to store savepoint
information. This should be a fixed size, preferably not
large, as the MySQL server will allocate space to store the
savepoint for all storage engines with each named
savepoint.
When a COMMIT or ROLLBACK operation occurs (with bool
all set to true), all savepoints are assumed to be
released. If the storage engine allocates resources for
savepoints, it should free them.
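From the SQL side, savepoints are used as sketched below
(the table is hypothetical; an engine that supports
savepoints, such as InnoDB, is assumed):

mysql> START TRANSACTION;
mysql> INSERT INTO audit_log (msg) VALUES ('step 1');
mysql> SAVEPOINT step1;
mysql> INSERT INTO audit_log (msg) VALUES ('step 2');
mysql> ROLLBACK TO SAVEPOINT step1;  -- undoes only 'step 2'
mysql> COMMIT;                       -- 'step 1' becomes permanent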

Indexing and Storage


Indexing

Indexes are a special system that databases use to
improve overall performance. By setting indexes on your
tables, you are telling MySQL to pay particular attention
to that column (in layman's terms). In fact, MySQL creates
extra files to store and track indexes efficiently.

MySQL allows up to 64 indexes per table (32 in versions
before 4.1.2), and each index can incorporate up to 16
columns. While a multicolumn index may not seem obvious, it
comes in handy for searches frequently performed on the
same set of multiple columns (e.g., first and last name, or
city and state).

Indexes are used to find rows with specific column
values quickly. Without an index, MySQL must begin with the
first row and then read through the entire table to find
the relevant rows. The larger the table, the more this
costs. If the table has an index for the columns in
question, MySQL can quickly determine the position to seek
to in the middle of the data file without having to look at
all the data. If a table has 1,000 rows, this is at least
100 times faster than reading sequentially. If you need to
access most of the rows, it is faster to read sequentially,
because this minimizes disk seeks.

Indexes are a way to increase performance and
efficiency in a database table. If you have a table with
many columns but you always search on one or two of those
columns, you can tell MySQL to index those columns. When
you do a search (or a sort) using an indexed column, the
MySQL engine only has to process the much smaller index
instead of the entire table to find the right rows. You can
also specify that an index is unique, which is an even
bigger performance benefit, because once the engine finds
the value it can stop: there cannot be another one like it.
You can add an index to a table with the command
alter table <table> add <index|unique> <index>
(<column>[,column2...]), as shown in the example below.
Indexing is a must on large tables; performance can be
horrible without indexes.
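A short example of that command (the table, index, and
column names are invented):

mysql> ALTER TABLE customer ADD INDEX idx_name (last_name, first_name);
mysql> ALTER TABLE customer ADD UNIQUE idx_email (email);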

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and
FULLTEXT) are stored in B-trees. Exceptions are that
indexes on spatial data types use R-trees, and that MEMORY
tables also support hash indexes.
MySQL uses indexes for these operations:

• To find the rows matching a WHERE clause quickly.
• To eliminate rows from consideration. If there is a
choice between multiple indexes, MySQL normally uses
the index that finds the smallest number of rows.
• To retrieve rows from other tables when performing
joins.
• To find the MIN() or MAX() value for a specific indexed
column key_col.
• The index also can be used for LIKE comparisons if the
argument to LIKE is a constant string that does not
start with a wildcard character.
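As an illustration of the LIKE rule in the last point
(table and column assumed from the earlier example):

-- Can use an index on last_name: the pattern has a constant prefix
mysql> SELECT * FROM customer WHERE last_name LIKE 'Smi%';
-- Cannot use the index: the pattern starts with a wildcard
mysql> SELECT * FROM customer WHERE last_name LIKE '%son';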

Sometimes MySQL does not use an index, even if one is
available. One circumstance under which this occurs is when
the optimizer estimates that using the index would require
MySQL to access a very large percentage of the rows in the
table. (In this case, a table scan is likely to be much
faster because it requires fewer seeks.) However, if such a
query uses LIMIT to retrieve only some of the rows, MySQL
uses an index anyway, because it can much more quickly find
the few rows to return in the result.
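EXPLAIN shows which choice the optimizer actually made; a
brief sketch, again assuming the customer table above:

mysql> EXPLAIN SELECT * FROM customer WHERE last_name > 'A';
-- In the output, type: ALL indicates a table scan was chosen,
-- while type: range with key: idx_name indicates the index is used.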

Storage

Data in MySQL is stored in files (or memory) using a
variety of different techniques. Each of these techniques
employs different storage mechanisms, indexing facilities,
and locking levels, and ultimately provides a range of
different functions and capabilities. By choosing a
different technique you can gain additional speed or
functionality benefits that improve the overall capability
of your application.

For example, if you work with a large amount of
temporary data, you may want to make use of the MEMORY
storage engine, which stores all of the table data in
memory. Alternatively, you may want a database that
supports transactions (to ensure data resilience).

Each of these different techniques and suites of
functionality within the MySQL system is referred to as a
storage engine (also known as a table type). By default,
MySQL comes with a number of different storage engines pre-
configured and enabled in the MySQL server. You can select
the storage engine to use on a server, database and even
table basis, providing you with the maximum amount of
flexibility when it comes to choosing how your information
is stored, how it is indexed and what combination of
performance and functionality you want to use with your
data.

This flexibility to choose how your data is stored and
indexed is a major reason why MySQL is so popular; other
database systems, including most of the commercial options,
support only a single type of database storage.
Unfortunately, the 'one size fits all' approach in these
other solutions means that either you sacrifice performance
for functionality or have to spend hours or even days
finely tuning your database. With MySQL, you can just
change the engine you are using.

Programmatically this is nothing special; it is normal
practice to divide a program into modules and layers. But
it is unique for a DBMS (Database Management System),
because a developer and even a DBA (Database Administrator)
is traditionally insulated from the physical storage
methods that the database server may employ. How the data
is stored really does not concern them, as the server just
takes care of everything. That being the case, a developer
or DBA could benefit from knowing a bit more about such
things, as it may help them to optimize applications. This
is an angle that may be applied to many aspects of database
servers, but here we will focus on the storage engines.

Why have storage engine layers? There are a number of
interrelated reasons:

Technology evolves. As new features are developed,
maintaining backward compatibility in the file format is
not always possible. Users would need to run a conversion
tool when they upgrade, or even dump/import their entire
dataset. This is obviously very inconvenient. It would be
much nicer if users could upgrade their server (for bug-
fixes and other new features) without also having to
migrate all their data. This means that a single version of
the server has to support multiple file formats.

For server developers, changes in the data storage code may
require related changes elsewhere in the server, and as
with all new code there is always the possibility of
introducing bugs. This calls for abstraction: changes in
the underlying code, to a large extent, should not affect
the code at higher levels.

Different applications have different requirements with
regard to data storage, and some of these requirements may
even conflict. Think of a banking application that requires
highly secure transaction processing, versus traffic
logging on a website. Typically, there are differences in
the number and balance of selects and updates, as well as
the need for transactions and isolation levels. There are
always trade-offs, and choices need to be made. With only
one mechanism available, most applications would just have
to make do with a solution that is probably not optimal for
them. While accepting that there is no single tool suitable
for every use, we think that there is something to be said
for a moderate "Swiss army knife" style approach. It would
be nice if a server can cater effectively to more than one
type of application.

Fundamentally, different storage media call for a different
approach. A hard disk has characteristics which differ
wildly from RAM, for instance. In a nutshell, a hard disk
can generally contain more data, but getting to it takes
longer. RAM is very fast, but there is a limited supply of
it. Some search algorithms are optimized for RAM, others
are optimized for disk-based storage. And did you know that
a Compact Flash card uses much more power when reading
data? That is an issue that definitely needs to be
considered for an embedded application. Who knows what
other new technologies we will see in the future.

MySQL's storage engine architecture addresses all these
aspects, and not by accident. It was a deliberate design
choice by Michael "Monty" Widenius, MySQL AB's CTO.

Let us look at a simplified high-level diagram of the MySQL
server architecture. The diagram shows four storage
engines, each with different characteristics:

• MyISAM is a disk-based storage engine. Aiming for very
low overhead, it does not support transactions.
• InnoDB is also disk based, but offers versioned, fully
ACID transactional capabilities. InnoDB requires more
disk space than MyISAM to store its data, and this
increased overhead is compensated by more aggressive
use of memory caching, in order to attain high speeds.
• Memory (formerly called "HEAP") is a storage engine
that utilizes only RAM. Special algorithms are used
that make optimal use of this environment. It is very
fast.
• NDB, the MySQL Cluster Storage engine, connects to a
cluster of nodes, offering high availability through
redundancy, high performance through fragmentation
(partitioning) of data across multiple node groups,
and excellent scalability through the combination of
these two. NDB uses main-memory only, with logging to
disk.

One of the things that differs per storage engine is the
locking and isolation mechanism, but most of the server
operates in the same way no matter what storage engine is
used: all the usual SQL commands are independent of the
storage engine. Naturally, the optimizer may need to make
different choices depending on the storage engine, but this
is all handled through a standardized interface (API) which
each storage engine supports.

So to a degree, the application does not need to know how
its data is stored. And it may not matter either, when the
demands are not very high. But for a larger dataset, or
with more demanding access requirements, it does become
increasingly important to make a conscious choice. And the
best news is that an application can use multiple storage
engines, as the selection can be made on a per-table basis.
Also, the server can convert tables between the different
formats using a simple ALTER TABLE command.
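For example (the table names are hypothetical):

mysql> ALTER TABLE traffic_log ENGINE = MyISAM;
mysql> ALTER TABLE accounts ENGINE = InnoDB;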

Default Storage Engine

If you use CREATE TABLE without specifying the ENGINE=...
option, the server will use the default. The default
storage engine is MyISAM. If you want to change the default
to, say, InnoDB, you can use the configuration directive
--default-storage-engine=InnoDB.

Something to be aware of is that if you create a table
specifying an engine type that is not enabled, MySQL will
automatically fall back to the default. From MySQL 4.1, a
warning is issued.
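A short sketch of choosing an engine explicitly at table
creation time (the table definition is invented for the
example):

mysql> CREATE TABLE session_cache (
    ->     id INT NOT NULL PRIMARY KEY,
    ->     data VARCHAR(255)
    -> ) ENGINE = MEMORY;

The server-wide default can likewise be set in my.cnf under
the [mysqld] section with default-storage-engine=InnoDB.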

Query Processing

The query processing steps:

1. Parser (builds the parse tree)
2. Preprocessor (checks semantics: that referenced tables
and columns exist)
3. Optimizer (generates the query execution plan)
o query transformation
o search for optimal execution plan
o plan is refined
4. Query sent to execution engine

A query has only a few pre-defined operations, which eases
the task of processing a query:

• access methods (whether table scan or index)
• where conditions
• joins
• union, group, etc

MySQL uses a left-deep linear plan for executing a query.
All of the tables fall into a single line. Many other
systems use a bushy plan, which is more tree-like. Timour
(a MySQL optimizer developer) illustrates this with a large
query containing five or six WHERE conditions, stepping
through how the query is parsed.

In optimizing a SQL statement there is quite a bit of
analysis of the cost of a query. The cost is calculated by
looking at things like how many times the disk will need to
be accessed, the number of pages per table, the length of
the rows and keys, and the data schema (key uniqueness,
etc.). Determining costs involves mathematical operations
using different methods. The type of storage engine isn't
considered in the cost.

MySQL 5.0 has greedy searching. It doesn't consider
everything; it just gets enough information to find a good
path and then moves on.

User Interfaces for MySQL- EMS MySQL Manager

• Full support of MySQL versions from 3.23 to 5.06
• New state-of-the-art graphical user interface
• Rapid database management and navigation
• Simple management of all MySQL objects
• Advanced data manipulation tools
• Powerful security management
• Excellent visual and text tools for query building
• Impressive data export and import capabilities
• Easy-to-use wizards performing MySQL services

Applications

• Billing Management
• Compliance & Risk Management
• Customer Relationship Management (CRM)
• Demand Chain Management (DCM)
• Education
• Enterprise Content Management (ECM)
• Enterprise Information Portal (EIP)
• Enterprise Resources Planning (ERP)
• Financials
• Government
• Healthcare
• Human Resources Management (HRMS)
• Inventory Management
• Manufacturing
• Messaging & Collaboration
• Order Management
• Payroll Management
• Point of Sale (POS)
• Project Management
• Purchasing Management
• Retail
• Supply Chain Management (SCM)

MySQL Usage
