NanoXML/Java 2.1

Chapter 2. Retrieving Data From An XML Datasource

This chapter shows how to retrieve XML data from a standard data source. Such a source can be a file, an HTTP object or a text string. The method described in this chapter is the simplest way to retrieve XML data. More advanced ways are described in the next chapters.

2.1. A Very Simple Example
2.2. Analyzing The Data
2.3. Generating XML
2.4. Namespaces

2.1. A Very Simple Example

This section describes a very simple XML application. It parses XML data from a stream and dumps it "pretty-printed" to the standard output. While its use is very limited, it shows how to set up a parser and parse an XML document.

import net.n3.nanoxml.*;                              //1
import java.io.*;

public class DumpXML
{
    public static void main(String[] args)
        throws Exception
    {
        IXMLParser parser                             //2
            = XMLParserFactory
                .createDefaultXMLParser();
        IXMLReader reader                             //3
            = StdXMLReader.fileReader("test.xml");
        parser.setReader(reader);
        IXMLElement xml                               //4
            = (IXMLElement) parser.parse();
        XMLWriter writer                              //5
            = new XMLWriter(System.out);
        writer.write(xml);
    }
}

  1. The NanoXML classes are located in the package net.n3.nanoxml.
  2. This command creates an XML parser. The actual class of the parser is dependent on the value of the system property net.n3.nanoxml.XMLParser, which is by default net.n3.nanoxml.StdXMLParser.
  3. This command creates a "standard" reader which reads its data from the file called test.xml. Usually you can use StdXMLReader to feed the XML data to the parser. The default reader is able to set up HTTP connections when retrieving DTDs or entities from different machines. If necessary, you can supply your own reader, for example to provide support for PUBLIC identifiers.
  4. The XML parser now parses the data read from test.xml and creates a tree of parsed XML elements. The structure of those elements will be described in the next section.
  5. An XMLWriter can be used to dump a "pretty-printed" view of the parsed XML data on an output stream. In this case, we dump the read data to the standard output (System.out).
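
The property lookup described in note 2 can be pictured with plain JDK calls. This is only a sketch of the idea, not the factory's actual code: the class ParserLookup and its method are hypothetical names, and only the property name and its default value are taken from the text above.

```java
// Illustration only: mimics the lookup described in note 2 with plain JDK calls.
// ParserLookup and parserClassName are hypothetical names, not part of NanoXML.
class ParserLookup
{
    static final String PROPERTY = "net.n3.nanoxml.XMLParser";
    static final String DEFAULT_PARSER = "net.n3.nanoxml.StdXMLParser";

    // Returns the configured parser class name, or the default if none is set.
    static String parserClassName()
    {
        return System.getProperty(PROPERTY, DEFAULT_PARSER);
    }

    public static void main(String[] args)
    {
        System.out.println(parserClassName());
    }
}
```

Starting the JVM with -Dnet.n3.nanoxml.XMLParser=com.example.MyParser would make the lookup return that name instead (com.example.MyParser is a made-up class name).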

2.2. Analyzing The Data

You can easily traverse the logical tree generated by the parser. If you need to create your own object tree, you can create your custom builder, which is described in chapter 3.

The default XML builder, StdXMLBuilder, generates a tree of IXMLElement objects. Every such object has a name and can have attributes, #PCDATA content and child objects.

The following XML data:

<FOO attr1="fred" attr2="barney">
    <BAR a1="flintstone" a2="rubble">
        Some data.
    </BAR>
    <QUUX/>
</FOO>

is parsed to the following objects:

Element FOO:
    Attributes = { "attr1"="fred", "attr2"="barney" }
    Children = { BAR, QUUX }
    PCData = null

Element BAR:
    Attributes = { "a1"="flintstone", "a2"="rubble" }
    Children = {}
    PCData = "Some data."

Element QUUX:
    Attributes = {}
    Children = {}
    PCData = null
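
The object listing above can be mirrored in a few lines of plain Java to show what walking such a tree looks like. This is a sketch under stated assumptions: Node is a hypothetical stand-in for a parsed element and is not NanoXML's IXMLElement; the recursion is the only point being illustrated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: Node is a hypothetical stand-in for a parsed element,
// not NanoXML's IXMLElement.
class TreeSketch
{
    static class Node
    {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<String, String>();
        final List<Node> children = new ArrayList<Node>();
        String pcData;                      // null when the element has no text

        Node(String name) { this.name = name; }
    }

    // Builds the FOO/BAR/QUUX tree from the listing above.
    static Node sample()
    {
        Node foo = new Node("FOO");
        foo.attributes.put("attr1", "fred");
        foo.attributes.put("attr2", "barney");
        Node bar = new Node("BAR");
        bar.attributes.put("a1", "flintstone");
        bar.attributes.put("a2", "rubble");
        bar.pcData = "Some data.";
        foo.children.add(bar);
        foo.children.add(new Node("QUUX"));
        return foo;
    }

    // Appends one line per element, indented by depth, then recurses into children.
    static void dump(Node node, String indent, StringBuilder out)
    {
        out.append(indent).append(node.name).append(' ').append(node.attributes);
        if (node.pcData != null) {
            out.append(" \"").append(node.pcData).append('"');
        }
        out.append('\n');
        for (Node child : node.children) {
            dump(child, indent + "    ", out);
        }
    }

    public static void main(String[] args)
    {
        StringBuilder out = new StringBuilder();
        dump(sample(), "", out);
        System.out.print(out);
    }
}
```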

You can retrieve the name of an element using the method getFullName, thus:

FOO.getFullName() ==> "FOO"

You can enumerate the attribute keys using the method enumerateAttributeNames:

Enumeration names = FOO.enumerateAttributeNames();
while (names.hasMoreElements()) {
    System.out.print(names.nextElement());
    System.out.print(' ');
}
    ==> attr1 attr2

You can retrieve the value of an attribute using getAttribute:

FOO.getAttribute("attr1", null) ==> "fred"

The child elements can be enumerated using the method enumerateChildren:

Enumeration children = FOO.enumerateChildren();
while (children.hasMoreElements()) {
    IXMLElement child = (IXMLElement) children.nextElement();
    System.out.print(child.getFullName() + " ");
}
    ==> BAR QUUX

If the element contains parsed character data (#PCDATA) as its only child, you can retrieve that data using getContent:

BAR.getContent() ==> "Some data."

If an element contains both #PCDATA and XML elements as its children, the character data segments will be put in untitled XML elements (whose name is null).
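
The mixed-content rule can be pictured with a small stand-in model in which a null name marks a character data segment. The sketch below is an assumption-laden illustration: Item is a hypothetical class, not a NanoXML type, and the sample markup <P>Hello <B>world</B>!</P> is invented for the example. How NanoXML itself treats whitespace in such segments is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: Item is a hypothetical stand-in for an element,
// not a NanoXML class. A null name marks a character data segment.
class MixedContentSketch
{
    static class Item
    {
        final String name;      // null for a #PCDATA segment
        final String content;   // text of a segment or of a text-only element
        final List<Item> children = new ArrayList<Item>();

        Item(String name, String content)
        {
            this.name = name;
            this.content = content;
        }
    }

    // Concatenates all character data below the given item, in document order.
    static String text(Item item)
    {
        StringBuilder out = new StringBuilder();
        if (item.content != null) {
            out.append(item.content);
        }
        for (Item child : item.children) {
            out.append(text(child));
        }
        return out.toString();
    }

    // Models <P>Hello <B>world</B>!</P> (made-up sample data).
    static Item sample()
    {
        Item p = new Item("P", null);
        p.children.add(new Item(null, "Hello "));
        p.children.add(new Item("B", "world"));
        p.children.add(new Item(null, "!"));
        return p;
    }

    public static void main(String[] args)
    {
        System.out.println(text(sample()));   // prints "Hello world!"
    }
}
```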

IXMLElement contains many convenience methods for retrieving data and traversing the XML tree.

2.3. Generating XML

You can very easily create a tree of XML elements or modify an existing one.

To create a new tree, just create an IXMLElement object:

IXMLElement elt = new XMLElement("ElementName");

You can add an attribute to the element by calling setAttribute:

elt.setAttribute("key", "value");

You can add a child element to an element by calling addChild:

IXMLElement child = elt.createElement("Child");
elt.addChild(child);

Note that the child element is created by calling the method createElement. This ensures that the child instance is compatible with its new parent.

If an element has no children, you can add #PCDATA content to it using setContent:

child.setContent("Some content");

If the element does have children, you can add #PCDATA content to it by adding an untitled element, which you create by calling createPCDataElement:

IXMLElement pcdata = elt.createPCDataElement();
pcdata.setContent("Blah blah");
elt.addChild(pcdata);

When you have created or edited the XML element tree, you can write it out to an output stream or writer using an XMLWriter:

java.io.Writer output = ...;
IXMLElement xmltree = ...;
XMLWriter xmlwriter = new XMLWriter(output);
xmlwriter.write(xmltree);

2.4. Namespaces

As of version 2.1, NanoXML has support for namespaces. Namespaces allow you to attach a URI to the name of an element or an attribute. This URI allows you to make a distinction between similarly named entities coming from different sources. More information about namespaces can be found in the XML Namespaces recommendation.

Please note that a DTD has no support for namespaces. It is important to understand that an XML document can have only one DTD. Though the namespace URI is often presented as a URL, that URL is not a system ID for a DTD. The only function of a namespace URI is to provide a globally unique name.

As an example, let's have the following XML data:

<doc:book xmlns:doc="http://nanoxml.n3.net/book">
    <chapter xmlns="http://nanoxml.n3.net/chapter"
             title="Introduction" doc:id="chapter1"/>
</doc:book>

The doc:book top-level element uses the namespace "http://nanoxml.n3.net/book". The prefix is used as an alias for the namespace, which is defined in the attribute xmlns:doc. This prefix is defined for the doc:book element and its child elements.

The chapter element uses the namespace "http://nanoxml.n3.net/chapter". Because the namespace URI has been defined as the value of the xmlns attribute, the namespace is the default namespace for the chapter element. Default namespaces are inherited by the child elements, but only for their names. Attributes never have a default namespace.

The chapter element has an attribute doc:id, which is defined in the same namespace as doc:book because of the doc prefix.
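
The scoping rules in the three paragraphs above can be modelled by hand. The following sketch is not NanoXML code: NamespaceSketch, resolve and the scope maps are hypothetical names, and the maps simply list the declarations visible on each element of the example document.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a hand-written model of the scoping rules above.
// NamespaceSketch and its methods are hypothetical, not NanoXML API.
class NamespaceSketch
{
    // Key "" holds the default namespace; all other keys are prefixes.
    static String resolve(String fullName, boolean isAttribute,
                          Map<String, String> scope)
    {
        int colon = fullName.indexOf(':');
        if (colon >= 0) {
            return scope.get(fullName.substring(0, colon));
        }
        // Unprefixed attributes never pick up the default namespace.
        return isAttribute ? null : scope.get("");
    }

    // Declarations in scope on doc:book.
    static Map<String, String> bookScope()
    {
        Map<String, String> scope = new HashMap<String, String>();
        scope.put("doc", "http://nanoxml.n3.net/book");
        return scope;
    }

    // chapter inherits the doc prefix and adds a default namespace.
    static Map<String, String> chapterScope()
    {
        Map<String, String> scope = new HashMap<String, String>(bookScope());
        scope.put("", "http://nanoxml.n3.net/chapter");
        return scope;
    }

    public static void main(String[] args)
    {
        Map<String, String> scope = chapterScope();
        System.out.println(resolve("chapter", false, scope)); // element name
        System.out.println(resolve("title", true, scope));    // unprefixed attribute
        System.out.println(resolve("doc:id", true, scope));   // prefixed attribute
    }
}
```

In this model the title attribute resolves to no namespace at all, while doc:id resolves to the same URI as doc:book.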

NanoXML 2.1 offers some variants on the standard retrieval methods to allow the application to access the namespace information.

In the following examples, we assume the variable book to contain the doc:book element and the variable chapter to contain the chapter element.

To get the full name, which includes the namespace prefix, of the element, use getFullName:

book.getFullName() ==> "doc:book"
chapter.getFullName() ==> "chapter"

To get the short name, which excludes the namespace prefix, of the element, use getName:

book.getName() ==> "book"
chapter.getName() ==> "chapter"

For elements that have no associated namespace, getName and getFullName are equivalent.
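
The relationship between the full name and the short name comes down to dropping everything up to the colon. A minimal sketch with plain string handling follows; NameSketch and shortName are hypothetical names, and this is not a claim about how NanoXML implements getName.

```java
// Illustration only: plain string handling, not NanoXML's implementation.
// NameSketch and shortName are hypothetical names.
class NameSketch
{
    // Returns the part after the prefix, or the whole name if there is no prefix.
    static String shortName(String fullName)
    {
        int colon = fullName.indexOf(':');
        return colon < 0 ? fullName : fullName.substring(colon + 1);
    }

    public static void main(String[] args)
    {
        System.out.println(shortName("doc:book"));  // prints "book"
        System.out.println(shortName("chapter"));   // prints "chapter"
    }
}
```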

To get the namespace URI associated with the name of the element, use getNamespace:

book.getNamespace() ==> "http://nanoxml.n3.net/book"
chapter.getNamespace() ==> "http://nanoxml.n3.net/chapter"

If no namespace is associated with the name of the element, this method returns null.

You can get an attribute of an element using either its full name (which includes its prefix) or its short name together with its namespace URI, so the following two instructions are equivalent:

chapter.getAttribute("doc:id", null)
chapter.getAttribute("id", "http://nanoxml.n3.net/book", null)

Note that the title attribute of chapter has no namespace, even though the chapter element name has a default namespace.

You can create a new element which uses a namespace this way:

book = new XMLElement("doc:book", "http://nanoxml.n3.net/book");
chapter = book.createElement("chapter", "http://nanoxml.n3.net/chapter");

You can add an attribute which uses a namespace this way:

chapter.setAttribute("doc:id", "http://nanoxml.n3.net/book", chapterId);

Copyright ©2001 Marc De Scheemaecker, All Rights Reserved.
Last update: September 25th, 2001.