Java SAX Parser Example

The SAX parser in Java provides an API for parsing XML documents. It differs from the DOM parser in that it does not load the complete XML document into memory; instead, it reads the document sequentially and reports what it finds as events.

SAX Parser

javax.xml.parsers.SAXParser provides methods to parse an XML document using event handlers. This class wraps an org.xml.sax.XMLReader (available through getXMLReader()) rather than implementing that interface itself, and it provides overloaded parse() methods that read an XML document from a File, an InputStream, a SAX InputSource or a String URI.

The parser only reports events; turning those events into data is the job of a handler class, which we need to write ourselves. To create a handler we implement the org.xml.sax.ContentHandler interface. This interface contains callback methods that are invoked when a parsing event occurs, for example startDocument(), endDocument(), startElement(), endElement() and characters().

org.xml.sax.helpers.DefaultHandler provides a default, do-nothing implementation of the ContentHandler interface, and we can extend this class to create our own handler. Extending it is advisable because we usually need to implement only a few of the callback methods, which keeps our code cleaner and easier to maintain.
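As a minimal sketch of that idea, the handler below overrides a single callback and inherits the empty defaults for everything else. The class name ElementCounter and the helper countElements() are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts elements by overriding only startElement().
class ElementCounter extends DefaultHandler {
    private int count;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        count++; // invoked once for every opening tag
    }

    int getCount() {
        return count;
    }

    // Hypothetical convenience method: parses an XML string through the InputSource overload.
    static int countElements(String xml) {
        try {
            ElementCounter handler = new ElementCounter();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getCount();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Calling ElementCounter.countElements("<a><b/><c>text</c></a>") returns 3, one for each of the elements a, b and c.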

SAX parser Example

Let’s jump to the SAX parser example program now; I will explain the different features in detail later on.

employees.xml

We have an XML file stored somewhere in the file system, and by looking at it we can see that it contains a list of Employee elements. Every Employee has an id attribute and the fields age, name, gender and role.
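For illustration, a document matching this description might look like the sample embedded in the sketch below, which can also write it to a temporary file. The root element name Employees, the class name SampleEmployeesXml and every value in the data are assumptions made up for this example; attribute values use single quotes only to avoid escaping inside the Java string.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: holds a made-up employees.xml and writes it to disk.
class SampleEmployeesXml {

    // The root element name and all values are assumptions chosen for illustration.
    static final String XML =
            "<?xml version='1.0' encoding='UTF-8'?>\n"
            + "<Employees>\n"
            + "  <Employee id='1'>\n"
            + "    <age>29</age>\n"
            + "    <name>Asha</name>\n"
            + "    <gender>Female</gender>\n"
            + "    <role>Java Developer</role>\n"
            + "  </Employee>\n"
            + "  <Employee id='2'>\n"
            + "    <age>35</age>\n"
            + "    <name>Ravi</name>\n"
            + "    <gender>Male</gender>\n"
            + "    <role>Manager</role>\n"
            + "  </Employee>\n"
            + "</Employees>\n";

    // Writes the sample to a temporary file and returns its path.
    static Path writeToTempFile() {
        try {
            Path file = Files.createTempFile("employees", ".xml");
            Files.write(file, XML.getBytes(StandardCharsets.UTF_8));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```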

We will use the SAX parser to parse this XML and create a list of Employee objects.

Here is the Employee class representing the Employee element from the XML.
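A minimal sketch of such a class could look as follows. The field types (int for id and age, String for the rest) and the toString() format are assumptions; only the field names come from the XML described above.

```java
// Hypothetical bean mirroring the id attribute and the age, name, gender and role fields.
class Employee {
    private int id;
    private int age;
    private String name;
    private String gender;
    private String role;

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getGender() { return gender; }
    public void setGender(String gender) { this.gender = gender; }

    public String getRole() { return role; }
    public void setRole(String role) { this.role = role; }

    // The output format is an assumption; it only needs to be readable when printed.
    @Override
    public String toString() {
        return "Employee:: ID=" + id + " Name=" + name + " Age=" + age
                + " Gender=" + gender + " Role=" + role;
    }
}
```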

Let’s create our own SAX parser handler class by extending the DefaultHandler class.

MyHandler contains the list of Employee objects as a field with only a getter method. The Employee objects are added to it in the event handler methods. We also have an Employee field that is used to build each Employee object; once all of its fields are set, the object is added to the employee list.
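One way such a handler might be written is sketched below. It is not a definitive implementation: the nested Employee class is a compact stand-in so that the block compiles on its own, the flag names and the parse() helper are hypothetical, and the code assumes that every Employee element carries a numeric id attribute and that the age, name, gender and role elements only appear inside an Employee.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class MyHandler extends DefaultHandler {

    // Compact stand-in for the Employee class so that this sketch compiles on its own.
    static class Employee {
        int id;
        int age;
        String name;
        String gender;
        String role;
    }

    private final List<Employee> empList = new ArrayList<>();
    private Employee emp;                              // employee currently being built
    private StringBuilder data = new StringBuilder();  // text of the current element
    private boolean bAge, bName, bGender, bRole;       // which field is being read

    // Only a getter is exposed; the list is filled by the callbacks below.
    public List<Employee> getEmpList() {
        return empList;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("Employee")) {
            emp = new Employee();
            emp.id = Integer.parseInt(attributes.getValue("id"));
        } else if (qName.equals("age")) {
            bAge = true;
        } else if (qName.equals("name")) {
            bName = true;
        } else if (qName.equals("gender")) {
            bGender = true;
        } else if (qName.equals("role")) {
            bRole = true;
        }
        data = new StringBuilder(); // start collecting text afresh for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        data.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = data.toString().trim();
        if (bAge) {
            emp.age = Integer.parseInt(text);
            bAge = false;
        } else if (bName) {
            emp.name = text;
            bName = false;
        } else if (bGender) {
            emp.gender = text;
            bGender = false;
        } else if (bRole) {
            emp.role = text;
            bRole = false;
        }
        if (qName.equals("Employee")) {
            empList.add(emp); // all fields are set once the end tag is reached
        }
    }

    // Hypothetical convenience wrapper around SAXParser.parse(InputStream, DefaultHandler).
    static List<Employee> parse(String xml) {
        try {
            MyHandler handler = new MyHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.getEmpList();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

The element names are compared against qName because a parser obtained from a default SAXParserFactory is not namespace-aware, so localName is typically empty.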

SAX parser methods to override

The important methods to override are startElement(), endElement() and characters().

When SAXParser parses the document and finds a start element, the startElement() method is called. We override this method to set boolean variables that identify which element is currently being read.

We also use this method to create a new Employee object every time an Employee start element is found. Note how the id attribute is read here to set the id field of the Employee object.

The characters() method is called when SAXParser finds character data inside an element. Note that the SAX parser may split the data into multiple chunks and call characters() several times (see the documentation of the characters() method of the ContentHandler interface). That’s why we use a StringBuilder and collect the data with its append() method.
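The effect of chunking can be sketched without a real parser by calling the callbacks by hand. In the hypothetical TextCollector below, the demo() method simulates a parser that delivers the text "Java Developer" in two pieces; the class and method names are assumptions made for this illustration.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects the text of the element currently being read.
class TextCollector extends DefaultHandler {
    private final StringBuilder data = new StringBuilder();
    private String lastText;

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append rather than assign: a single call may deliver only part of the text.
        data.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lastText = data.toString(); // the text is complete only when the element ends
        data.setLength(0);          // reset the buffer for the next element
    }

    String getLastText() {
        return lastText;
    }

    // Simulates a parser that reports "Java Developer" in two chunks.
    static String demo() {
        TextCollector handler = new TextCollector();
        char[] buffer = "Java Developer".toCharArray();
        handler.characters(buffer, 0, 5); // "Java "
        handler.characters(buffer, 5, 9); // "Developer"
        handler.endElement("", "", "role");
        return handler.getLastText();
    }
}
```

TextCollector.demo() returns "Java Developer". A handler that assigned the text inside characters() instead of appending would keep only the last chunk, "Developer".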

The endElement() method is where we use the StringBuilder data to set the properties of the Employee object, and where we add the Employee object to the list whenever we find an Employee end tag.

Below is the test program that uses MyHandler to parse the above XML into a list of Employee objects.

Here is the output of the above program.

SAXParserFactory provides factory methods to get a SAXParser instance. We pass a File object to the parse() method along with a MyHandler instance to handle the callback events.
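The factory and the File overload of parse() can be sketched in isolation as follows. To keep the block self-contained it uses a small anonymous handler that records element names instead of MyHandler; FileParseDemo, elementNames() and writeTemp() are hypothetical names, and the one-line XML document is made up.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class FileParseDemo {

    // Parses the given file and returns the element names in document order.
    static List<String> elementNames(File xmlFile) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            parser.parse(xmlFile, handler); // the File overload of parse()
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse " + xmlFile, e);
        }
        return names;
    }

    // Writes the XML to a temporary file so that there is a File to parse.
    static File writeTemp(String xml) {
        try {
            File file = File.createTempFile("employees", ".xml");
            file.deleteOnExit();
            Files.write(file.toPath(), xml.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File file = writeTemp("<Employees><Employee id='1'/></Employees>");
        System.out.println(elementNames(file)); // prints [Employees, Employee]
    }
}
```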

SAXParser is a little confusing at first, but if you are working with a large XML document, it provides a more memory-efficient way to read XML than the DOM parser, because the document is never held in memory as a whole. That’s all for the SAX parser in Java.

Reference: SAXParser, DefaultHandler

By admin
