Reader Level : Basic
Recently I have been involved in a project that uses XML heavily, which gave me the opportunity to look into many Java and XML related technologies, libraries and parsers. Here I share some of the interesting libraries I dealt with. Interestingly, I have seen that very few developers know what each term (like “Reader”, “Parser”, “Builder” and “Factory”) means in the XML world. The idea of this article is to introduce the basic terms and some resources for starting a more in-depth dissection.
XML Parser Technology / Types :
Many refer to “XML parsers” as “XML APIs”. Whatever you call them, in the end everyone wants to read, process and build XML in some way or other. Though it is quite possible to treat an XML file as a sequence of characters and write a custom parser, that is not the recommended way if one needs to get the job done in an “easy” manner. In the XML world we often find two widely used parsing models: SAX (Simple API for XML) and DOM (Document Object Model). I am limiting the discussion to SAX and DOM.
SAX : SAX is an event-based parsing mechanism. As the “SAX parser” parses the XML input stream, it encounters events like startDocument, endDocument, startElement, endElement, etc., and the client program gets the callbacks. As this parser type does not load the XML document into memory, it is relatively light on resources. SAX is a READ-ONLY API (i.e. one cannot change any content of the XML file), and the client can traverse the document only sequentially. The SAX2 specification adds namespace support, filter chains, and querying and setting of parser features and properties. SAX parsers are sometimes also referred to as push parsers, as the parser pushes recognized tokens to the client.
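To make the callback idea concrete, here is a minimal sketch using only the SAX classes that ship with the JDK. The class name SaxEventsDemo and the sample XML are my own illustrations; the handler simply records each callback in the order the parser fires it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventsDemo {

    // Parses the XML text and returns the callbacks in the order the parser fired them.
    static List<String> collectEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<>();

        // The client only implements callbacks; the parser drives the traversal.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                events.add("startElement:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative input only; any well-formed XML works.
        for (String event : collectEvents("<order><item/><item/></order>")) {
            System.out.println(event);
        }
    }
}
```

Notice that nothing is kept in memory except what the handler itself decides to store, which is why SAX stays light on resources.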
DOM : DOM is a comprehensive API for XML documents. It lets clients navigate, retrieve, add, modify or delete content from the source XML. As opposed to SAX, DOM stores the entire content of the XML file in memory. As one can imagine, storing the XML document requires an object representation for Node, Element, Attribute, ProcessingInstruction, Comment and Text types, so it is relatively heavy on memory. Memory consumption is commonly quoted as around 5x the size of the XML, though the actual figure depends on the document and the implementation. DOM enables clients to access data randomly from the in-memory document. Before we go any further, it is important to understand that the current discussion is limited to Java technology. So, let's see a little about the most frequently used package from the SDK.
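As a first taste of that package, here is a minimal sketch of the DOM behaviour described above: random access, modify, add and delete on the in-memory tree. It uses only JDK classes; the class name DomEditDemo, the element name "item" and the attribute "qty" are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomEditDemo {

    // Loads the whole document into memory as a DOM tree.
    static Document load(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Edits the tree in place and returns the number of <item> elements afterwards.
    // Assumes the root has at least two <item> children.
    static int editItems(Document doc) {
        Element root = doc.getDocumentElement();
        NodeList items = root.getElementsByTagName("item");

        // Random access: jump straight to the second item and modify it.
        Element second = (Element) items.item(1);
        second.setAttribute("qty", "5");

        // Add a new element at the end.
        Element added = doc.createElement("item");
        added.setAttribute("qty", "1");
        root.appendChild(added);

        // Delete the first item.
        root.removeChild(items.item(0));

        // Query again so the count reflects the edited tree.
        return root.getElementsByTagName("item").getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = load("<order><item qty='1'/><item qty='2'/></order>");
        System.out.println("items after edit: " + editItems(doc));
    }
}
```

None of this is possible with SAX alone, because SAX never holds the document for you to go back to.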
The Java API for XML Processing (JAXP) enables applications to parse, transform, validate and query XML documents using an API that is independent of a particular XML processor implementation. JAXP provides a pluggability layer to enable vendors to provide their own implementations without introducing dependencies in application code. Using this software, application and tool developers can build fully-functional XML-enabled Java applications for e-commerce, application integration, and web publishing.
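The pluggability layer is easiest to see by asking the JAXP factories which implementation they actually resolved to. The sketch below only prints class names; JaxpProvidersDemo is a name I made up, and the output depends on your JDK and classpath, so I am not showing any expected values.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

public class JaxpProvidersDemo {

    // Returns the concrete classes behind the three main JAXP factories.
    static String[] providerClassNames() {
        return new String[] {
            DocumentBuilderFactory.newInstance().getClass().getName(),
            SAXParserFactory.newInstance().getClass().getName(),
            TransformerFactory.newInstance().getClass().getName()
        };
    }

    public static void main(String[] args) {
        // Application code depends only on the javax.xml.* types;
        // the implementation behind them can change without a recompile.
        for (String name : providerClassNames()) {
            System.out.println(name);
        }
    }
}
```

The application never names a vendor class. A different implementation can be selected from outside the code, for example by setting the system property javax.xml.parsers.DocumentBuilderFactory to the vendor's factory class name.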
JAXP is a standard component in the Java platform. An implementation of JAXP 1.3 is included in J2SE 5.0 and an implementation of JAXP 1.4 is in Java SE 6. JAXP 1.4 is a maintenance release of JAXP 1.3 with support for the Streaming API for XML (StAX). JAXP 1.3 contained five JAR files: jaxp-api.jar, sax.jar, dom.jar, xercesImpl.jar, and xalan.jar. The packaging reflected the technologies covered, as well as the sources used in JAXP 1.3. In JAXP 1.4, these technologies and the newly added StAX package have been tightly integrated into the JAXP RI (reference implementation).
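Since StAX only gets a passing mention here, a short sketch may help place it next to SAX: with StAX the client pulls the next event when it wants one, instead of receiving callbacks. This assumes Java SE 6 or later; StaxPullDemo and the sample XML are illustrative names only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxPullDemo {

    // Returns the element names in document order, pulled one event at a time.
    static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            // The client drives the loop and asks for the next event itself.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<order><item/><note/></order>"));
    }
}
```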
Parser Implementations :
Xerces-J : The Xerces Java Parser 1.4.4 supports the XML 1.0 recommendation and contains advanced parser functionality, such as support for the W3C’s XML Schema recommendation version 1.0, DOM Level 2 version 1.0, and SAX Version 2, in addition to supporting the industry-standard DOM Level 1 and SAX version 1 APIs. This release includes full support for the W3C XML Schema Recommendation, except for limitations as described on their website.
Because this parser is very often used in conjunction with other XML technologies, such as XSLT processors, which also rely on standard APIs like DOM and SAX, xerces.jar was split into two JAR files:
- xml-apis.jar contains the DOM Level 3, SAX 2.0.2 and JAXP 1.3 APIs;
- xercesImpl.jar contains the implementation of these APIs as well as the XNI API.
XPath Implementations :
Jaxen : Jaxen is an open source XPath library written in Java. It is adaptable to many different object models, including DOM, XOM, dom4j, and JDOM. It is also possible to write adapters that treat non-XML trees, such as compiled Java byte code or Java beans, as XML, thus enabling you to query these trees with XPath too.
Saxon : Saxon is a full-featured library for the XSLT 2.0, XQuery 1.0, and XPath 2.0 Recommendations. Saxon comes in two packages: Saxon-B implements the “basic” conformance level for XSLT 2.0 and XQuery, while Saxon-SA is a schema-aware XSLT and XQuery processor. Both packages are available on both platforms (Java and .NET). Saxon-B is an open source product available from the Saxon project site; Saxon-SA is a commercial product available from Saxonica Limited, with a free 30-day evaluation license.
Xalan : Xalan-Java fully implements XSL Transformations (XSLT) Version 1.0 and the XML Path Language (XPath) Version 1.0. XSLT is the first part of the XSL stylesheet language for XML. It includes the XSL Transformation vocabulary and XPath, a language for addressing parts of XML documents. Xalan implements the Java API for XML Processing (JAXP) 1.3, including its XPath API, and builds on SAX 2 and DOM Level 3. It may be configured to work with any XML parser, such as Xerces-Java, that implements JAXP 1.3.
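Each of these libraries has its own API, but XPath 1.0 and XSLT 1.0 can also be tried through the vendor-neutral JAXP interfaces, which is what the sketch below does. It runs against whatever implementation the JDK resolves (not necessarily a standalone Xalan, Jaxen or Saxon); XPathXsltDemo, the sample XML and the stylesheet are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathXsltDemo {

    // Evaluates an XPath 1.0 expression against the XML text and returns the string result.
    static String evaluate(String expression, String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    // Applies an XSLT 1.0 stylesheet to the XML text and returns the output as a string.
    static String transform(String xslt, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><item name='ink'/><item name='pen'/></order>";

        // XPath: address parts of the document.
        System.out.println(evaluate("count(/order/item)", xml));
        System.out.println(evaluate("/order/item[2]/@name", xml));

        // XSLT: the stylesheet itself uses XPath in its match/select attributes.
        String xslt =
                "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
              + "<xsl:output method='text'/>"
              + "<xsl:template match='/'>"
              + "<xsl:for-each select='order/item'><xsl:value-of select='@name'/>;</xsl:for-each>"
              + "</xsl:template>"
              + "</xsl:stylesheet>";
        System.out.println(transform(xslt, xml));
    }
}
```

Because the code touches only javax.xml.xpath and javax.xml.transform, putting a different JAXP-compliant processor on the classpath should not require changes to it.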
Java XML Document Builders :
Do NOT confuse builders with parsers. A builder basically uses the default/underlying parser, takes its output (SAX events or an org.w3c.dom.Document) and converts it to a library-specific document type (e.g. org.dom4j.Document or org.jdom.Document). DOM4J seems to be quite advanced in terms of functionality for a Java developer, while the JDOM API seems quite simple to work with.
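To show the builder/parser split without pulling in JDOM or dom4j, here is a toy builder written against the JDK only. It is not how either library is implemented; SimpleNode is a hypothetical stand-in for a library-specific element type, and TinyBuilderDemo is a name I made up. The point is that the builder does no parsing itself: it listens to the underlying SAX parser and assembles its own tree.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TinyBuilderDemo {

    // Hypothetical library-specific node type (stand-in for a JDOM or dom4j element).
    static final class SimpleNode {
        final String name;
        final List<SimpleNode> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        SimpleNode(String name) {
            this.name = name;
        }
    }

    // The "builder": delegates parsing to the SAX parser JAXP supplies
    // and converts the event stream into its own tree type.
    static SimpleNode build(String xml) throws Exception {
        final Deque<SimpleNode> open = new ArrayDeque<>();
        final SimpleNode[] root = new SimpleNode[1];

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                SimpleNode node = new SimpleNode(qName);
                if (open.isEmpty()) {
                    root[0] = node;                    // first element is the root
                } else {
                    open.peek().children.add(node);    // attach to the current parent
                }
                open.push(node);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (!open.isEmpty()) {
                    open.peek().text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        SimpleNode order = build("<order><item>ink</item><item>pen</item></order>");
        for (SimpleNode item : order.children) {
            System.out.println(item.name + " = " + item.text);
        }
    }
}
```

Swap the parser underneath and the builder's tree type stays the same, which is why the two terms should be kept apart.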
JDOM : JDOM is, quite simply, a Java representation of an XML document. JDOM provides a way to represent that document for easy and efficient reading, manipulation, and writing. It has a straightforward API, is lightweight and fast, and is optimized for the Java programmer. It's an alternative to DOM and SAX, although it integrates well with both. Most importantly, it uses the Java Collections API. I hope that makes it easy for a Java programmer 🙂 .
As I understand it, JDOM relies on Jaxen as its default XPath library, but we can also use any XPath library of our choice, such as Xalan.
DOM4J : dom4j is an easy to use, open source library for working with XML, XPath and XSLT on the Java platform using the Java Collections Framework and with full support for DOM, SAX and JAXP.