Learning XML
While SGML has a long history of use in specific user communities (Humanities, Government documentation), its use as XML "On The Web" is still relatively rare. Support for integrated authoring and browsing à la HTML is simply not there yet. IE 4.0 does contain hooks for XML viewing using plug-ins. Both Netscape and Microsoft plan much greater support in later browser versions. For now though, if you want to write and view XML documents, the available solutions might best be described as "clunky". In this lab, you'll get to try one such solution and experience that "clunkiness" firsthand.
Assumptions
I am assuming that you all know HTML and are literate users of Windows machines and associated applications. In particular, I expect that you know how to use Notepad/Wordpad, Netscape/IE, and perform standard desktop manipulations like copying files and folders.
Goals
The goals of this exercise are for you to
Preliminaries
We must do some set-up first. The course folder is called XML on the G drive of your machine (G:\XML). Your home folder will be a folder called "XML" on the C Drive inside the TEMP folder (C:\TEMP\XML).
C:
cd \TEMP\XML\examples
DIR
Using an XML parser
The XML parser we will use today is called Lark. Recall that an XML parser will read XML source code and decide whether or not it is well-formed. Validating parsers also check the structure of a given XML file against a particular DTD to see if it is valid. By itself, Lark is a non-validating parser -- it will only tell you whether the XML is well-formed or not.
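If you'd like to see the same idea outside of Lark, here is a minimal sketch in Python. It is not part of the lab tool set; it assumes Python's bundled expat parser, which, like Lark, is non-validating. The function name is_well_formed is made up for this illustration.

```python
import xml.parsers.expat


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses without a well-formedness error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Every start tag has a matching end tag: well-formed.
print(is_well_formed("<course><title>Learning XML</title></course>"))  # True
# <title> is never closed: not well-formed.
print(is_well_formed("<course><title>Learning XML</course>"))  # False
```

Note that neither check says anything about validity, since no DTD is consulted.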
Checking a well-formed, existing file: course.xml
You have copied over several existing XML files in the examples folder. Take one of these, "course.xml", and use Lark to check the XML for "well-formed-ness". You have to do this in the MS-DOS window as follows:
jview G:\xml\lark\driver course.xml
If you really want to know what's going on with the above command, ask the instructor. Anyway, since this example is already "well-formed", you should get output something like the following:
Hello Tim
Lark V1.0 final beta Copyright (c) 1997-98 Tim Bray.
All rights reserved; the right to use these class files for any purpose
is hereby granted to everyone.
Parsing...
Done.
Translation: Lark says the file parses cleanly. No error messages are given, so it's well-formed!
Checking a "messed-up" file, cd.xml
Now check the file cd.xml for well-formedness. You should get some error messages and output something like:
Hello Tim
Lark V1.0 final beta Copyright (c) 1997-98 Tim Bray.
All rights reserved; the right to use these class files for any purpose
is hereby granted to everyone.
Parsing...
Lark:/export/home/viles/xml/cd.xml:4:12:E:Fatal: Encountered </para> expected </em>
...assumed </em>
Lark:/export/home/viles/xml/cd.xml:19:11:E:Fatal: Encountered </document> expected </para>
...assumed </para>
...assumed </em>
Done.
Lark has found at least two errors, though there may be more.
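To see where this kind of message comes from, here is a rough Python sketch that reports the position of a mismatched tag. The sample text is hypothetical (it is not the lab's cd.xml), and first_error is a made-up helper name. One difference worth knowing: Lark guesses at a fix ("...assumed </em>") and keeps going, while the expat parser used here stops at the first error.

```python
import xml.parsers.expat

# Hypothetical sample, not the lab's cd.xml: <em> is opened on line 2
# but </para> arrives before </em>.
SAMPLE = """<document>
<para>An <em>unclosed emphasis</para>
</document>"""


def first_error(xml_text):
    """Return (line, column, message) for the first error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None


print(first_error(SAMPLE))
```

The first number in the result is the line on which the parser gave up, much like the 4 in "cd.xml:4:12" above.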
Recall the "rules" of well-formedness. Minimally, your XML markup should
Generating Displayable Content from XML
Well, it ain't easy, because the tool support is not there yet. Conceptually, we want to take the structural markup in the XML code and combine it with formatting instructions in a "style sheet" to produce a displayable product. Ideally, the web browser would handle this transparently, but right now there is little support for viewing XML in browsers.
Of course rendering the XML in a prettified manner is one of many things that you might want to do with that data. The process we will use to get prettified XML is to take three items:
and use these to generate an HTML file that is palatable to browsers. The conversion program, msxsl, takes the XML document and the stylesheet and produces the HTML file. For the course.xml file, you would do this as follows (as always, in the MS-DOS window):
G:\xml\msxsl\msxsl -i course.xml -s course.xsl -o course.html
where the options to the program specify the XML file, the XSL file, and the HTML file respectively. Ain't this clunky?
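Conceptually, the conversion step pairs each structural element with a formatting rule. The Python sketch below illustrates only that idea; it is not XSL and it is not how msxsl works internally. The STYLE mapping, the element names in it, and the to_html function are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from XML element names to HTML tags. It stands in
# for a style sheet and is not XSL syntax.
STYLE = {"document": "body", "title": "h1", "para": "p", "em": "i"}


def to_html(element):
    """Recursively render an element as HTML using the STYLE mapping."""
    tag = STYLE.get(element.tag, "span")  # unknown elements become <span>
    parts = [f"<{tag}>", escape(element.text or "")]
    for child in element:
        parts.append(to_html(child))
        # Text that follows the child's end tag belongs to this element.
        parts.append(escape(child.tail or ""))
    parts.append(f"</{tag}>")
    return "".join(parts)


source = "<document><title>Hi</title><para>Some <em>bold</em> text.</para></document>"
print(to_html(ET.fromstring(source)))
# prints: <body><h1>Hi</h1><p>Some <i>bold</i> text.</p></body>
```

Swapping in a different STYLE mapping changes the output without touching the XML, which is the whole point of keeping structure and formatting separate.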
The choices for a style sheet syntax have still not been worked out completely by the marketplace or standards organizations. There is current support for "Cascading Style Sheets" in both Netscape and IE, though Microsoft is pushing strongly for the adoption of Extensible Style Language (XSL) as a standard. Though agreement on the form and substance of XSL is far from reached, we will use XSL formatting rules in this lab because they fit well with our working tool set.
We have provided separate style sheets for each XML document here, though in practice it is likely that a single style sheet will be applied to many documents, not just a single one.
If all goes well, you should be able to load the resulting HTML file into your web browser for display.
Writing your own XML
Now you should be ready to write some XML from scratch - almost. If we were making pizza, then you know now how to order out. Now we'll get the Chef-Boy-ar-Dee package from Harris Teeter. The hard part, designing a DTD (making pizza dough from scratch), requires considerably more time than we have here.
Enough with that pizza metaphor. Now you can use the informal DTD we worked up in the in-class session to write your own XML document. The particular document of interest is a recent story about Microsoft from the Washington Post, and it's located at
C:\TEMP\XML\examples\microsoft.xml
Your task is to add well-formed XML markup to this file. Although the file has an XML extension, there is no markup in it. Use Notepad, Wordpad, Homesite, or some other text editor to add the markup. Remember the "well-formedness" rules we talked about in class.
When you think you are done, use the Lark parser to check it.
jview G:\xml\lark\driver microsoft.xml
Once you have well-formed XML, generate the HTML using the supplied stylesheet found in microsoft.xsl. The command to do this looks something like
G:\xml\msxsl\msxsl -i microsoft.xml -s microsoft.xsl -o microsoft.html
If successful, the file microsoft.html will have been created. Go ahead and load this file in your web browser to see what the combination of your markup and the supplied style sheet has yielded.
If you are feeling lucky ...
Try altering any of the supplied style sheets in order to alter the appearance of the document. For example, consider making paragraphs in a large font and the title in a small font (just to be crazy eh?). Start with the cd.xsl stylesheet, as that one is the most straightforward. Note that all of the supplied stylesheets are very elementary. XSL is far more powerful and flexible than what you have seen here.
Further Reading
Books
The following books have been helpful in the preparation of this material.
Online
This course's XML page:
http://ils.unc.edu/viles/xml/
Microsoft's XML page:
http://www.microsoft.com/xml/default.asp
World Wide Web Consortium's Web Page:
http://www.w3.org/XML/
Junglee's XML Reference List:
http://www.junglee.com/tech/xml_sparchive.html
Oasis.org's XML Resource, maintained by Robin Cover:
http://www.oasis-open.org/cover/xml.html