Introduction to XML
Fundamental Principles
The following are the four fundamental characteristics of XML (Usdin & Graham, 1998):
- Separation of Content from Format: What a piece of information is should be managed separately from how the information is presented. Information should be identifiable by its appearance, its use in a particular application, its role in the document in which it is contained, and its nature. For example, "knowing that a phrase is in italic is useful; knowing that it is the title of a subsection of a paper is more useful; and knowing that it is a genus and species name is potentially more useful still."
- Hierarchical Data Structures: In XML, the data is assumed to be hierarchically organized, that is, a piece of information may contain other pieces of information and may be contained by yet another piece of information. Textual documents often exemplify this type of structure. For example, a book contains several chapters, each of which contains sections. Each section may have a heading, paragraphs and subsections, which also contain a heading and paragraphs.
- Embedded Tags: Data marked up with XML contains tags, words or phrases enclosed in angle brackets, which identify where each data structure begins and ends. Tags can also carry attributes, which provide information about the data the tags enclose. Example: <tag attribute="value">content</tag>
- User-Definable Structures: XML is not a fixed vocabulary but a method for creating customized tags. "XML assumes that users will create new tags as they create and work with documents, and that software such as browsers will have to display or process the content of these novel tags." Because it does not prescribe a standard tag set the way HTML does, XML is flexible and extensible.
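The four characteristics can be seen together in a small sketch written with Python's standard-library `xml.etree.ElementTree` module. Every tag name in it (`book`, `chapter`, `section`, `heading`, `para`, `species`) and both helper functions are hypothetical, invented for illustration. The markup records what each piece of information is, the elements nest hierarchically, a `genus` attribute describes the enclosed data, and the same content is rendered in two different ways.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every tag name below is invented for illustration.
# The markup records what each piece of information is (a heading, a species
# name), not how it should look.
DOC = """\
<book title="Field Notes">
  <chapter number="1">
    <section>
      <heading>Observations</heading>
      <para>We found <species genus="Quercus">Quercus robur</species> near the river.</para>
      <section>
        <heading>Follow-up</heading>
        <para>Samples were collected in spring.</para>
      </section>
    </section>
  </chapter>
</book>
"""

root = ET.fromstring(DOC)


def outline(elem, depth=0):
    """List the element hierarchy, one indented line per element, with attributes."""
    attrs = " ".join(f'{name}="{value}"' for name, value in elem.attrib.items())
    lines = ["  " * depth + (f"{elem.tag} {attrs}" if attrs else elem.tag)]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


def render(elem, species_style):
    """Flatten an element to text, applying species_style to <species> content.

    The document stays the same; only the presentation function changes.
    """
    parts = [elem.text or ""]
    for child in elem:
        inner = render(child, species_style)
        parts.append(species_style(inner) if child.tag == "species" else inner)
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    print("\n".join(outline(root)))
    para = root.find(".//para")  # first <para> in document order
    print(render(para, lambda s: s))         # plain text
    print(render(para, lambda s: f"*{s}*"))  # species names marked for italics
    # Because the markup says what the phrase is, it can be queried directly.
    print([s.get("genus") for s in root.iter("species")])
```

Swapping the presentation function changes how the species name looks without touching the document itself, which is the practical payoff of separating content from format.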
Goals and Standards
The W3C developed the XML specification, and the Extensible Markup Language (XML) 1.0 Recommendation outlines the following design goals for XML:
- It shall be straightforward to use XML over the Internet. Users must be able to view XML documents as quickly and easily as HTML documents. In practice, this will only be possible when XML browsers are as robust and widely available as HTML browsers, but the principle remains.
- XML shall support a wide variety of applications. XML should be useful to many kinds of applications: authoring, browsing, content analysis, etc. Although the initial focus is on serving structured documents over the web, the intent is not to define XML narrowly.
- XML shall be compatible with SGML. Most of the people involved in the XML effort come from organizations that have a large, in some cases staggering, amount of material in SGML. XML was designed pragmatically, to be compatible with existing standards while solving the relatively new problem of sending richly structured documents over the web.
- It shall be easy to write programs that process XML documents. The colloquial way of expressing this goal while the spec was being developed was that it ought to take about two weeks for a competent computer science graduate student to build a program that can process XML documents.
- The number of optional features in XML is to be kept to an absolute minimum, ideally zero. Optional features inevitably raise compatibility problems when users want to share documents and sometimes lead to confusion and frustration.
- XML documents should be human-legible and reasonably clear. If you don't have an XML browser and you've received a hunk of XML from somewhere, you ought to be able to look at it in your favorite text editor and actually figure out what the content means.
- The XML design should be prepared quickly. Standards efforts are notoriously slow. XML was needed immediately and was developed as quickly as possible.
- The design of XML shall be formal and concise. In many ways a corollary to the goal that programs processing XML be easy to write, this means that XML must be expressed in EBNF and must be amenable to modern compiler tools and techniques.
There are a number of technical reasons why the SGML grammar cannot be expressed in EBNF. Writing a proper SGML parser requires handling a variety of rarely used and difficult-to-parse language features; an XML parser does not have to.
- XML documents shall be easy to create. Although there will eventually be sophisticated editors to create and edit XML content, they won't appear immediately. In the interim, it must be possible to create XML documents in other ways: directly in a text editor, with simple shell and Perl scripts, etc.
- Terseness in XML markup is of minimal importance. Several SGML language features were designed to minimize the amount of typing required to manually key in SGML documents. These features are not supported in XML. From an abstract point of view, minimized documents are indistinguishable from their fully specified forms, but supporting minimization adds a considerable burden to the SGML parser (or the person writing it, anyway). In addition, most modern editors offer better facilities for defining shortcuts when entering text.
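Several of these goals (easy creation, easy processing, no markup minimization) can be illustrated with a minimal Python sketch that uses only the standard library (`xml.etree.ElementTree` and `xml.sax.saxutils.escape`). The `list` and `item` tags and the helper names are hypothetical. The sketch builds a document with plain string formatting, reads it back with a few lines of code, and shows a conforming XML parser rejecting two SGML conveniences that XML does not support: omitted end tags and unquoted attribute values.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def make_list_doc(items):
    """Build an XML document with plain string formatting, as a simple script might.

    escape() turns &, < and > into entity references so arbitrary text stays well-formed.
    """
    body = "".join(f"<item>{escape(text)}</item>" for text in items)
    return f"<list>{body}</list>"


def is_well_formed(text):
    """Return True if the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def item_texts(doc):
    """A complete, if tiny, XML-processing program: read back every <item>'s text."""
    return [item.text for item in ET.fromstring(doc).iter("item")]


if __name__ == "__main__":
    doc = make_list_doc(["salt & pepper", "a < b"])
    print(doc)
    print(item_texts(doc))
    # SGML-style minimization that XML deliberately drops:
    print(is_well_formed("<list><item>one<item>two</list>"))      # omitted end tags
    print(is_well_formed("<list><item id=1>one</item></list>"))  # unquoted attribute value
```

Because every element must be explicitly closed and every attribute value quoted, the parser never has to consult a DTD to work out where an element ends, which is part of what keeps XML processors small.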
References
Usdin, Tommie, & Graham, Tony. (1998). "XML: Not a Silver Bullet, But a Great Pipe Wrench." StandardView, 6(3), 125-132.