Learning XML

While SGML has a long history of use in specific user communities (the humanities, government documentation), its use as XML "on the Web" is still relatively rare. Support for integrated authoring and browsing à la HTML is simply not there yet. IE 4.0 does contain hooks for viewing XML through plug-ins, and both Netscape and Microsoft plan much greater support in later browser versions. For now, though, if you want to write and view XML documents, the available solutions might best be described as "clunky". In this lab, you'll get to try one such solution and experience that "clunkiness" first hand.

Assumptions

I am assuming that you all know HTML and are literate users of Windows machines and associated applications. In particular, I expect that you know how to use Notepad/Wordpad and Netscape/IE, and how to perform standard desktop manipulations like copying files and folders.

Goals

The goals of this exercise are for you to

Preliminaries

We must do some set-up first. The course folder is called XML on the G drive of your machine (G:\XML). Your home folder will be a folder called "XML" inside the TEMP folder on the C drive (C:\TEMP\XML).

C:

cd \TEMP\XML\examples

DIR

Using an XML parser

The XML parser we will use today is called Lark. Recall that an XML parser reads XML source code and decides whether or not it is well-formed. Validating parsers go one step further and check whether the structure of a given XML file is valid with respect to a particular DTD. By itself, Lark is a non-validating parser: it will only tell you whether the XML is well-formed or not.
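If you are curious what a well-formedness check looks like in code, here is a minimal sketch in Python. It is not Lark and is not part of this lab's tool set. It uses the parser in Python's standard library, which, like Lark, is non-validating. The function name check_well_formed and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses cleanly, else (False, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser is non-validating: it reports syntax problems only
        # and never compares the document against a DTD.
        return False, str(err)
    return True, None


# Hypothetical samples: the second one never closes <title>.
good = "<course><title>Learning XML</title></course>"
bad = "<course><title>Learning XML</course>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

A file that passes this kind of check is well-formed, but it may still be invalid with respect to a DTD. Deciding that takes a validating parser.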

Checking a well-formed, existing file: course.xml

You have copied over several existing XML files in the examples folder. Take one of these, "course.xml", and use Lark to check the XML for "well-formed-ness".

You have to do this in the MS-DOS window as follows:

jview G:\xml\lark\driver course.xml

If you really want to know what's going on with the above command, ask the instructor. Anyway, since this example is already "well-formed", you should get output something like the following:

Hello Tim

Lark V1.0 final beta Copyright (c) 1997-98 Tim Bray.

All rights reserved; the right to use these class files for any purpose

is hereby granted to everyone.

Parsing...

Done.

Translation: Lark says the file parses cleanly - no error messages are given - it's well-formed!

Checking a "messed-up" file, cd.xml

Now check the file cd.xml for well-formedness - you should get some error messages and output something like:

Hello Tim

Lark V1.0 final beta Copyright (c) 1997-98 Tim Bray.

All rights reserved; the right to use these class files for any purpose

is hereby granted to everyone.

Parsing...

Lark:/export/home/viles/xml/cd.xml:4:12:E:Fatal: Encountered </para> expected </em>

...assumed </em>

Lark:/export/home/viles/xml/cd.xml:19:11:E:Fatal: Encountered </document> expected </para>

...assumed </para>

...assumed </em>

Done.

Lark has found at least two errors, though there may be more.
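Each Lark message above names the file, then the line and column where the trouble was noticed. As a rough illustration of the same idea, here is a sketch using Python's standard-library parser rather than Lark. The sample text is made up to mimic the first cd.xml error (an em element still open when the para closes), and first_error is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def first_error(xml_text):
    """Return (line, column, message) for the first fatal error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return line, column, str(err)
    return None


# Made-up sample with the same kind of mistake as cd.xml:
# <em> is still open when </para> arrives.
broken = "<document>\n<para>Some <em>emphasized text</para>\n</document>"
fixed = "<document>\n<para>Some <em>emphasized</em> text</para>\n</document>"

print(first_error(broken))
print(first_error(fixed))
```

One difference worth noting: Lark assumes the missing end tag and keeps going, which is why it can report more than one error in a single run. The parser in this sketch stops at the first fatal error, so you would fix that one and run the check again.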

Recall the "rules" of well-formedness. Minimally, your XML markup should

Generating Displayable Content from XML

Well, it ain't easy, because the tool support is not there yet. Conceptually, we want to take the structural markup in the XML code and combine it with formatting instructions in a "style sheet" to produce a displayable product. Ideally, the web browser would handle this transparently, but right now there is little support for viewing XML in browsers.

Of course, rendering the XML in a prettified manner is only one of many things that you might want to do with that data. The process we will use to get prettified XML is to take three items:

  1. The XML document,
  2. A supplied stylesheet (written in XSL - eXtensible Style Language), and
  3. A conversion program, msxsl,

and use these to generate an HTML file that is palatable to browsers. The conversion program, msxsl, takes the XML document and the stylesheet and produces the HTML file. For the course.xml file, you would do this as follows (as always, in the MS-DOS window):

G:\xml\msxsl\msxsl -i course.xml -s course.xsl -o course.html

where the options to the program specify the XML file, the XSL file, and the HTML file respectively. Ain't this clunky?
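To make the three-ingredient idea concrete, here is a toy sketch in Python. It is not XSL and it is not how msxsl works internally. It only shows the general shape of the process: structural markup goes in, a separate table of formatting rules is applied, and HTML comes out. The STYLE table, the to_html function, and the sample document are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical "stylesheet": maps XML element names to HTML tags.
STYLE = {"document": "body", "title": "h1", "para": "p", "em": "i"}


def to_html(elem, style):
    """Recursively render an element and its children as HTML text."""
    tag = style.get(elem.tag, "span")  # unknown elements become <span>
    parts = [f"<{tag}>", escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child, style))
        # Text that follows a child element belongs to the parent.
        parts.append(escape(child.tail or ""))
    parts.append(f"</{tag}>")
    return "".join(parts)


source = (
    "<document><title>Learning XML</title>"
    "<para>XML is <em>not</em> hard.</para></document>"
)
html = to_html(ET.fromstring(source), STYLE)
print(html)
```

Because the formatting rules live in STYLE and not in the document, changing the look means editing the table, not the XML. That separation is the point of a style sheet.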

The choices for a style sheet syntax have still not been worked out completely by the marketplace or standards organizations. Both Netscape and IE currently support "Cascading Style Sheets", though Microsoft is pushing strongly for the adoption of Extensible Style Language (XSL) as a standard. Though agreement on the form and substance of XSL is far from reached, we will use XSL formatting rules in this lab because they fit well with our working tool set.

We have provided separate style sheets for each XML document here, though in practice it is likely that a single style sheet will be applied to many documents, not just a single one.

If all goes well, you should be able to load the resulting HTML file into your web browser for display.

Writing your own XML

Now you should be ready to write some XML from scratch - almost. If we were making pizza, then you now know how to order out. Now we'll get the Chef-Boy-ar-Dee package from Harris Teeter. The hard part, designing a DTD (making pizza dough from scratch), requires considerably more time than we have here.

Enough with that pizza metaphor. Now you can use the informal DTD we worked up in the in-class session to write your own XML document. The particular document of interest is a recent story about Microsoft from the Washington Post, and it's located at

C:\TEMP\XML\examples\microsoft.xml

Your task is to add well-formed XML markup to this file. Although the file has an XML extension, there is no markup in it. Use Notepad, Wordpad, Homesite, or some other text editor to add the markup. Remember the "well-formedness" rules we talked about in class.

When you think you are done, use the Lark parser to check it.

jview G:\xml\lark\driver microsoft.xml

Once you have well-formed XML, generate the HTML using the supplied stylesheet found in microsoft.xsl. The command to do this looks something like

G:\xml\msxsl\msxsl -i microsoft.xml -s microsoft.xsl -o microsoft.html

If successful, the file microsoft.html will have been created. Go ahead and load this file in your web browser to see what the combination of your markup and the supplied style sheet has yielded.

If you are feeling lucky ...

Try altering any of the supplied style sheets to change the appearance of the document. For example, consider setting paragraphs in a large font and the title in a small font (just to be crazy, eh?). Start with the cd.xsl stylesheet, as that one is the most straightforward. Note that all of the supplied stylesheets are very elementary. XSL is far more powerful and flexible than what you have seen here.

Further Reading

Books

The following books have been helpful in the preparation of this material.

  1. "XML: A Primer" by Simon St. Laurent, MIS Press, 1998. Available from Amazon Books.
  2. "The XML Handbook" by Charles F. Goldfarb and Paul Prescod, Prentice-Hall, 1998. Available from Amazon Books.
  3. "XML: Extensible Markup Language" by Elliotte Rusty Harold, IDG Books Worldwide, Inc., 1998. Available from Amazon Books.

Online

This course's XML page: http://ils.unc.edu/viles/xml/

Microsoft's XML page: http://www.microsoft.com/xml/default.asp

World Wide Web Consortium's Web Page: http://www.w3.org/XML/

Junglee's XML Reference List: http://www.junglee.com/tech/xml_sparchive.html

Oasis.org's XML Resource, maintained by Robin Cover: http://www.oasis-open.org/cover/xml.html