Introduction to XML

Version 1.0
Introduction
•Extensible Markup Language (XML) is a language that represents data using a set of tags.
•Unlike HTML, which supports only a limited set of tags, XML lets us define our own tags.
•XML is used for most of the configuration files in a J2EE application.
•Example:
<name>
<first>John</first>
<last>Doe</last>
</name>
TCS Internal September 3, 2009
Rules for well-formed XML documents
•Every start-tag must have a matching end-tag or be a self-closing tag.
– Examples:
•<name>John</name> (start-tag and end-tag)
•<document-end /> (self-closing tag)
•Tags can't overlap; elements must be properly nested.
•An XML document can have only one root element.
•Element names must obey XML naming conventions.
•XML is case-sensitive.
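A parser rejects any document that breaks these rules, so they can be checked mechanically. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule.
samples = {
    "matched tags": "<name>John</name>",
    "self-closing tag": "<document-end />",
    "overlapping tags": "<a><b></a></b>",
    "two root elements": "<first/><last/>",
    "case mismatch": "<Name>John</name>",
}

results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'not well-formed'}")
```

Only the first two samples parse; the other three each break one rule (nesting, single root, case sensitivity).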
Attributes in Tags
•An XML tag can also include attributes.
•Attributes are simple name/value pairs associated with a tag.
•Attributes are used to give additional information about a tag.
•Example:
<name nickname="PM">
<first>Manmohan</first>
<last>Singh</last>
</name>
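As a rough illustration of how a program reads such name/value pairs, here is a sketch in Python using the standard-library xml.etree.ElementTree module. The attribute name title and the fallback text are hypothetical, added only to show what happens when an attribute is absent.

```python
import xml.etree.ElementTree as ET

doc = """<name nickname="PM">
  <first>Manmohan</first>
  <last>Singh</last>
</name>"""

name = ET.fromstring(doc)

# Attributes are exposed as a plain dictionary on the element.
print(name.attrib)

# get() returns the attribute value, or a fallback when it is missing.
nickname = name.get("nickname")
missing = name.get("title", "(none)")  # "title" is a hypothetical attribute
print(nickname, missing)
```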

XML Declaration
•A typical XML declaration looks like:
– <?xml version='1.0' encoding='UTF-16' standalone='yes'?>
•The XML declaration labels a document as XML and gives parsers some additional information.
•XML parsers are programs that parse an XML document and extract information from it.
•JAXP (Java API for XML Processing), which is part of J2EE, is used for XML parsing.
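To see what a parser extracts from the declaration, the sketch below uses Python's standard-library Expat binding (xml.parsers.expat) rather than the Java API named in the slide. The sample declares UTF-8 instead of UTF-16 because the literal bytes here are UTF-8; the handler and variable names are illustrative.

```python
from xml.parsers import expat

raw = b"<?xml version='1.0' encoding='UTF-8' standalone='yes'?><name>John</name>"

declaration = {}


def on_xml_decl(version, encoding, standalone):
    # standalone is 1 for 'yes', 0 for 'no', and -1 when the clause is omitted.
    declaration.update(version=version, encoding=encoding, standalone=standalone)


parser = expat.ParserCreate()
parser.XmlDeclHandler = on_xml_decl
parser.Parse(raw, True)

print(declaration)
```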

Special Characters
•Characters that are part of XML markup, such as '<' and '&', cannot be used directly as data.
•Within a document, these characters are represented by entity references:
– &amp; the & character
– &lt; the < character
– &gt; the > character
– &apos; the ' character
– &quot; the " character
•Example:
– To represent the text "tom & jerry" inside a cartoon element, we write:
<cartoon>tom &amp; jerry</cartoon>
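Programs that generate XML usually apply these replacements automatically. Here is a small sketch, assuming Python's standard-library xml.sax.saxutils helpers. By default escape() handles only &, < and >, so the quote mappings are passed explicitly; the second sample string is made up for illustration.

```python
from xml.sax.saxutils import escape, unescape

raw = "tom & jerry"

# Replace markup characters with entity references before embedding as data.
escaped = escape(raw)
element = f"<cartoon>{escaped}</cartoon>"
print(element)

# Quotes are only replaced when an extra mapping is supplied.
quoted = escape('a < b, "quoted" & \'single\'', {'"': "&quot;", "'": "&apos;"})
print(quoted)

# unescape() reverses the default replacements.
restored = unescape(escaped)
print(restored)
```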

Special Characters
•If special characters occur frequently, the text can be enclosed in a CDATA section.
•Example:
<script language="JavaScript">
<![CDATA[
function myFunction(){
if( 0 < 1 && 1 < 2)
alert("hello");
}
]]></script>
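A parser hands the content of a CDATA section to the application as ordinary text, with the markers removed and nothing inside interpreted as markup. The sketch below shows this with Python's standard-library xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

doc = """<script language="JavaScript"><![CDATA[
function myFunction(){
  if( 0 < 1 && 1 < 2)
    alert("hello");
}
]]></script>"""

script = ET.fromstring(doc)

# The '<' and '&&' inside the CDATA section arrive unchanged as text.
print(script.text)
```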

XML Namespaces
•With the namespace mechanism, more than one party can use the same tag name without conflict.
•In the following example, the tags person and name are used in two different XML vocabularies.
•The tags are told apart by the namespace prefixes mypers and yourpers.
•Example:
<mypers:person>
<mypers:name>
</mypers:name>
</mypers:person>
<yourpers:person>
<yourpers:name></yourpers:name>
</yourpers:person>
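In a complete document, each prefix must be bound to a namespace URI with an xmlns declaration, and a parser identifies elements by that URI rather than by the prefix. The sketch below uses Python's standard-library xml.etree.ElementTree; the wrapper element people, the example.com URIs and the names Alice and Bob are all hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<people xmlns:mypers="http://example.com/mypers"
        xmlns:yourpers="http://example.com/yourpers">
  <mypers:person><mypers:name>Alice</mypers:name></mypers:person>
  <yourpers:person><yourpers:name>Bob</yourpers:name></yourpers:person>
</people>"""

root = ET.fromstring(doc)

# ElementTree stores each tag as "{namespace-uri}local-name".
tags = [child.tag for child in root]
print(tags)

# Prefixes used in queries are local aliases; only the URIs have to match.
ns = {"mine": "http://example.com/mypers", "yours": "http://example.com/yourpers"}
my_names = [e.text for e in root.findall("mine:person/mine:name", ns)]
your_names = [e.text for e in root.findall("yours:person/yours:name", ns)]
print(my_names, your_names)
```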

Document Type Definitions (DTD)
•A document type definition allows the developer to create a set of rules that specify the legal contents of an XML file and place restrictions on it.
•If the XML document does not follow the rules, a validating XML parser reports errors.
•An XML document that conforms to the DTD is said to be a valid XML document.

DTD Advantages
•A single DTD ensures a common format for each XML document that references it.
•An application can use a DTD to validate the data (XML document) it receives from outside.
•A DTD helps with interoperability of XML data between various applications.

Anatomy of DTD
•A DTD consists of:
– Element declarations
•Used to define tags
– Attribute declarations
•Used to define attributes for a tag
– Notation declarations
•Used to associate external resources
– Entity declarations
•Used to represent replacement text
Element Declarations
•An element declaration consists of three parts:
– The ELEMENT keyword
– The element name
– The element content model

•Example:
– <!ELEMENT dinosaurs (carnivore, herbivore, omnivore)>
•The content model may be:
– Empty
– Element
– Mixed
– Any
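The three parts of an element declaration can be observed by asking a parser to report each declaration it reads. The sketch below uses the ElementDeclHandler callback of Python's standard-library Expat binding. Expat does not validate, so this only inspects the declarations. The declarations for carnivore, herbivore and omnivore are assumptions added to show one content model of each kind.

```python
from xml.parsers import expat

doc = b"""<!DOCTYPE dinosaurs [
  <!ELEMENT dinosaurs (carnivore, herbivore, omnivore)>
  <!ELEMENT carnivore (#PCDATA)>
  <!ELEMENT herbivore EMPTY>
  <!ELEMENT omnivore ANY>
]>
<dinosaurs/>"""

# Map Expat's content-model type codes to the kinds listed on the slide.
KIND = {
    expat.model.XML_CTYPE_EMPTY: "Empty",
    expat.model.XML_CTYPE_ANY: "Any",
    expat.model.XML_CTYPE_MIXED: "Mixed",
    expat.model.XML_CTYPE_SEQ: "Element (sequence)",
    expat.model.XML_CTYPE_CHOICE: "Element (choice)",
}

declared = {}


def on_element_decl(name, model):
    # model is a tuple: (type, quantifier, name, children)
    kind, _quantifier, _name, children = model
    declared[name] = (KIND[kind], [child[2] for child in children])


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(doc, True)

for name, (kind, children) in declared.items():
    print(name, kind, children)
```

Note that Expat classifies a text-only model such as (#PCDATA) as mixed content with no child elements.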
Examples of Element Declarations:
Example 1: Empty elements
Declaration:
<!ELEMENT Bool EMPTY> <!--DTD declaration of empty element-->
Usage:
<Bool Value="True"></Bool> <!--Usage with attribute in XML file-->

Example 2: Elements with Data
Declaration:
<!ELEMENT Month (#PCDATA)> <!--DTD declaration of an element-->
Usage:
<Month>April</Month> <!--Valid usage within XML file-->
<Month>This is a month</Month> <!--Valid usage within XML file-->
<Month> <!--Invalid usage within XML file, can't have children!-->
<January>Jan</January>
<March>March</March>
</Month>

Examples of Element Declarations
Example 3: Elements with Children
To specify that an element must have a single child element, include the child element's name within the parentheses.
<!ELEMENT House (Address)> <!--A house has a single address-->
<House> <!--Valid usage within XML file-->
<Address>1345 Preston Ave Charlottesville Va 22903</Address>
</House>
An element can have multiple children. A DTD describes multiple children using a sequence: a list of element names separated by commas. The XML file must contain one of each element, in the specified order.
<!--DTD declaration of an element-->
<!ELEMENT address (person, street, city, zip)>
<!ELEMENT person (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!--Valid usage within XML file-->
<address>
<person>John Doe</person>
<street>1234 Preston Ave.</street>
<city>Charlottesville, Va</city>
<zip>22903</zip>
</address>

Attribute Declarations
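•Used to declare the list of allowable attributes for a given element.
•Example:
– <!ATTLIST dinosaurs source CDATA #IMPLIED>
– Here dinosaurs is the element, source is the attribute name and CDATA is the attribute type. The syntax also requires a default declaration; #IMPLIED marks the attribute as optional.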
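The parts of an attribute declaration (element, attribute name, attribute type) can be read back from a parser. The sketch below uses the AttlistDeclHandler callback of Python's standard-library Expat binding. The #IMPLIED keyword is an assumption added so that the declaration is syntactically complete, and the attribute value "field notes" is made up.

```python
from xml.parsers import expat

doc = b"""<!DOCTYPE dinosaurs [
  <!ATTLIST dinosaurs source CDATA #IMPLIED>
]>
<dinosaurs source="field notes"/>"""

attributes = []


def on_attlist_decl(element, attribute, attr_type, default, required):
    # default is None for #IMPLIED; required is true only for #REQUIRED.
    attributes.append((element, attribute, attr_type, default, bool(required)))


parser = expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(doc, True)

print(attributes)
```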

Notation Declarations
•Used to associate external resources, such as non-XML data, with the application that handles them.
•Example:
– <!NOTATION jpg SYSTEM "iexplore.exe">

Entity declarations
•Entities are named references to replacement text, other XML markup, or even external files.
•Example:
– <!ENTITY asap "as soon as possible">
– Here asap is the entity name and "as soon as possible" is its replacement text.

Whenever the parser finds the entity reference &asap; in the document, it automatically replaces it with 'as soon as possible'.
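The replacement can be observed with any parser that reads the internal DTD subset. Here is a minimal sketch using Python's standard-library xml.etree.ElementTree; the memo element and its sentence are hypothetical. Note that the document refers to the entity as &asap;, with the leading ampersand and trailing semicolon.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE memo [
  <!ENTITY asap "as soon as possible">
]>
<memo>Please reply &asap;.</memo>"""

memo = ET.fromstring(doc)

# The entity reference has already been expanded by the parser.
print(memo.text)
```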

Document Object Model (DOM)
•DOM is an interface that programmers use to create XML documents, navigate through them, and add, modify or delete parts of them.
•DOM provides a logical view of the in-memory structure that represents an XML document as a hierarchy of nodes.
•Node is the primary object; it has a set of properties and methods that programmers use to manipulate a node in the XML document.
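As a rough illustration of navigating, adding, modifying and deleting nodes, the sketch below uses xml.dom.minidom, the DOM implementation in Python's standard library. The middle element and the replacement values are hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString("<name><first>John</first><last>Doe</last></name>")
root = doc.documentElement

# Navigate: every element, text run and attribute is a node in the tree.
child_names = [node.nodeName for node in root.childNodes]
print(child_names)

# Add: create a new element with a text child and insert it before <last>.
middle = doc.createElement("middle")  # hypothetical element
middle.appendChild(doc.createTextNode("Q"))
root.insertBefore(middle, root.lastChild)

# Modify: change the text node inside <first>.
root.firstChild.firstChild.data = "Jane"

# Delete: remove the <last> element.
root.removeChild(root.lastChild)

result = root.toxml()
print(result)
```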
Reference:
•Beginning XML, 3rd Edition, David Hunter et al., Wrox Publication, 2005.
