
HTML, XML, PDF

Pros and Cons

HTML : Pros
Simplicity and open standard
HTML is easy to learn because it is very simple. There are only a couple dozen tags, and fewer than half of them are used in most situations.
HTML browsers are cheap or free, and very powerful when combined with third-party add-ins and server-side content support.
HTML document browser interfaces are easy to build into existing products because of the simplicity of HTML.
It has become very evident to users that the hypertext link really does work across systems that are otherwise unrelated. Any page can link to any other publicly accessible page simply by entering the address.
There are some specialized structures in HTML, but they are mostly used to achieve a certain formatting look.

HTML: Cons
HTML is a very weak formatting tool that lacks even the most fundamental page-oriented formatting capabilities, such as hanging indents, white-space control, justification, kerning, and hyphenation. These gaps lead to highly variable coding for even simple designs.
HTML provides linking capabilities, but the linking is rudimentary: each link is one-to-one, and it requires an anchor on the target end in order to reach anything within the document.
Stability and versioning are an issue. Browser manufacturers have created non-standard extensions to the "standard" HTML tags, such as the "blink" and "center" tags, which lead to viewing problems.
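The one-to-one link and the target-side anchor can be illustrated with a minimal sketch using Python's standard `html.parser` module. The page content, the `example.org` address, and the `LinkCollector` class are made up for illustration; they are not part of the original slides.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect outgoing links and the anchors a page exposes as link targets."""

    def __init__(self):
        super().__init__()
        self.links = []    # href values: each link points at exactly one target
        self.anchors = []  # name/id values that a "#fragment" can address

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        for key in ("name", "id"):
            if key in attributes:
                self.anchors.append(attributes[key])


# Hypothetical page: one outgoing link into another document's section,
# and one named anchor that other pages could link to.
page = """
<p><a href="http://example.org/report.html#costs">See costs</a></p>
<h2><a name="summary">Summary</a></h2>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)
print(collector.anchors)
```

The link reaches the "costs" section only if the author of the target page placed an anchor with that name there, which is the limitation the slide describes.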

HTML: more cons


One tag set for all - not extensible
Limited, predefined data structures
No formal validation
Trades power for ease of use
Good for simple applications only
Handcrafted - links, navigation, indexing
Concentrates on form, not substance

PDF: Pros
PDF provides electronic pages with impressive page fidelity. Type, graphics, and color are all reproduced as they are on paper.
Solves the file-sharing problem between platforms
Hot links and other electronic object types, like movies and sounds, can be added to a PDF file.
New features are being added constantly by Adobe
Rights management tools and security features are built in

PDF: more pros
PDF files are cheap to create, and are used by many companies to deliver page-formatted information without the high cost of postage.
Since the end user gets something that looks very much like paper, training costs are low.

PDF: Cons
Proprietary and not open to outside development
Large file size and slow to download
PDF files are not nearly as flexible as other electronic formats, because the main goal is to recreate a paper page, not to deliver intelligent document structure to the user.
There is limited support for searching, although Adobe has products that can index many different PDF files for cross-document searching and navigating.

What about SGML?


Hard to learn
Costly to implement
Not web friendly in full form
Style support poor
Hard to get a fast start
Tooling up very expensive
Good linking (HyTime), hard to implement

XML
A structured markup meta-language
A subset of SGML designed for the Web
Designed to work with companion standards for linking and for styling
Web friendly: the next step in Web evolution
Overcomes limitations of HTML and CSS
Enabler for new Web applications

XML and the Web


Is extensible
Is quicker, easier and cheaper to implement
Preserves the structure of data
Supports complex nesting of structures
Has strong linking types - URLs and more
Provides comprehensive style features: CSS, XSL
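As a rough illustration of extensibility and nested structure, the sketch below parses a small document with Python's standard `xml.etree.ElementTree` module. The tag names (`article`, `authors`, `author`, and so on) and the sample data are invented for this example; XML lets an application define whatever tag set it needs.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are application-defined, not predefined.
doc = """
<article>
  <title>Markup formats compared</title>
  <authors>
    <author><first>Ada</first><last>Example</last></author>
    <author><first>Ben</first><last>Sample</last></author>
  </authors>
</article>
""".strip()

root = ET.fromstring(doc)

# The nesting survives parsing, so data can be addressed by its structure.
title = root.findtext("title")
last_names = [author.findtext("last") for author in root.iter("author")]

print(title)
print(last_names)
```

Because the structure is preserved, the program asks for "the last name of each author" rather than searching for text that merely looks like a name.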

XML and the Web


Is a standard approved by the W3C
XLink, XPointer and XSL are also standards
Incorporates best features of DSSSL, CSS, HyTime and TEI
Uses Unicode for universality
Works with Java and JavaScript
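The Unicode point can be sketched with a small round trip, again using Python's standard `xml.etree.ElementTree`. The element name `entry` and the sample text are assumptions chosen only to show text in several scripts surviving serialization.

```python
import xml.etree.ElementTree as ET

# Hypothetical element holding text in several scripts.
original = "Zürich, 東京, Αθήνα"
entry = ET.Element("entry")
entry.text = original

# Serialize to UTF-8 bytes, then parse the bytes back.
data = ET.tostring(entry, encoding="utf-8")
restored = ET.fromstring(data).text

print(restored == original)
```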

Does your project intend to use XML for data storage, for data exchange, or both?
What types or classes of data are to be stored or shared using XML?
Do standard XML DTDs or XML Schemas for the description of this data exist?
How will you create your XML documents? Will they be authored with a suitable editor or generated by software tools?
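For the "generated by software tools" case, a minimal sketch with Python's standard `xml.etree.ElementTree` might look like the following. The `build_record` helper and the `record`, `title` and `year` element names are hypothetical; a real project would follow whatever DTD or Schema it has agreed on.

```python
import xml.etree.ElementTree as ET


def build_record(title, year):
    """Build a small <record> element (hypothetical structure)."""
    record = ET.Element("record")
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "year").text = str(year)
    return record


record = build_record("Tags & structure", 2001)

# The library escapes special characters such as "&" during serialization.
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
# <record><title>Tags &amp; structure</title><year>2001</year></record>
```

Generating documents through a library rather than by string concatenation keeps the output well-formed, since escaping and tag closing are handled automatically.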

XML Application Models

XML Application

XML Questions to consider


Are you exchanging data or metadata through specific agreements with a defined number of partners?
Are you supplying metadata to specific services?
Do those services specify requirements for the syntax, structure and semantics of the data they accept?
Do those services specify conformance to XML DTDs or XML Schemas which they provide?

Or are you intending to make data or metadata available in a more "open" environment, with the expectation that it may be used by a potentially unlimited number of services?

XML

With whom do you need to share your data or metadata?
Is it appropriate to make that metadata available through OAI?

XML example
PubMed DTD
http://www.ncbi.nlm.nih.gov:80/entrez/query/static/publisher.htm
