Saturday, December 16, 2006

Sorting XML in Java

Recently I had a requirement to sort an XML document based on the tag names in the document.
You can sort it using XSLT, but this post tells you how to sort the XML nodes through Java.

Let's extend the DOMUtil class, which has some basic utility methods. It is available as com.sun.org.apache.xerces.internal.util.DOMUtil in the JDK and as org.apache.xerces.util.DOMUtil in Apache Xerces. I'm going to extend it by adding a method called sortChildNodes().

This method sorts the children of the given node in ascending or descending order with the given Comparator, and recurses into the child nodes up to the specified depth.



 1  package com.googlepages.aanand.dom;
 2
 3  import java.util.ArrayList;
 4  import java.util.Collections;
 5  import java.util.Comparator;
 6  import java.util.Iterator;
 7  import java.util.List;
 8
 9  import org.w3c.dom.Node;
10  import org.w3c.dom.NodeList;
11  import org.w3c.dom.Text;
12
13  import com.sun.org.apache.xerces.internal.util.DOMUtil;
14
15  public class DOMUtilExt extends DOMUtil {
16
17      /**
18       * Sorts the children of the given node up to the specified depth if
19       * available
20       *
21       * @param node -
22       *            node whose children will be sorted
23       * @param descending -
24       *            true for sorting in descending order
25       * @param depth -
26       *            depth up to which to sort in DOM
27       * @param comparator -
28       *            comparator used to sort, if null a default NodeName
29       *            comparator is used.
30       */
31      public static void sortChildNodes(Node node, boolean descending,
32              int depth, Comparator comparator) {
33
34          List nodes = new ArrayList();
35          NodeList childNodeList = node.getChildNodes();
36          if (depth > 0 && childNodeList.getLength() > 0) {
37              for (int i = 0; i < childNodeList.getLength(); i++) {
38                  Node tNode = childNodeList.item(i);
39                  sortChildNodes(tNode, descending, depth - 1,
40                          comparator);
                    // Skip empty (whitespace-only) text nodes
41                  if ((!(tNode instanceof Text))
42                          || (tNode instanceof Text && ((Text) tNode)
43                                  .getTextContent().trim().length() > 0))
44                  {
                        nodes.add(tNode);
45                  }
46              }
47              Comparator comp = (comparator != null) ? comparator
48                      : new DefaultNodeNameComparator();
49              if (descending)
50              {
51                  // if descending is true, get the reverse ordered comparator
52                  Collections.sort(nodes, Collections.reverseOrder(comp));
53              } else {
54                  Collections.sort(nodes, comp);
55              }
56
57              for (Iterator iter = nodes.iterator(); iter.hasNext();) {
58                  Node element = (Node) iter.next();
59                  node.appendChild(element);
60              }
61          }
62
63      }
64
65  }
66
67  class DefaultNodeNameComparator implements Comparator {
68
69      public int compare(Object arg0, Object arg1) {
70          return ((Node) arg0).getNodeName().compareTo(
71                  ((Node) arg1).getNodeName());
72      }
73
74  }


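I also leave the empty text nodes out of the sort list, so they are not deleted but end up ahead of the sorted nodes, which are re-appended at the end. If descending is set to true, a reverse-ordering comparator is obtained from the Collections utility class.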
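
To see what re-appending the sorted list actually does to the DOM, here is a minimal sketch. The class and method names are hypothetical, and it uses only the standard JAXP parser rather than DOMUtil, so treat it as an illustration and not as part of the utility. It relies on the documented behaviour of appendChild: a node that is already in the tree is first removed and then added at the end.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of DOMUtilExt.
class AppendChildMoveSketch {

    // Sorts the element children of the root by name, re-appends them,
    // and returns the node names of ALL resulting children in document order.
    static String reappendElementsSorted(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Node root = doc.getDocumentElement();

        // Copy the element children into a list; text nodes are left out.
        List<Node> elements = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                elements.add(children.item(i));
            }
        }
        elements.sort(Comparator.comparing(Node::getNodeName));

        // appendChild on an existing child moves it to the end of the parent.
        for (Node element : elements) {
            root.appendChild(element);
        }

        // Text nodes report "#text" as their node name.
        List<String> names = new ArrayList<>();
        children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return String.join(" ", names);
    }

    public static void main(String[] args) {
        System.out.println(reappendElementsSorted("<root> <b/> <a/> </root>"));
    }
}
```

For the input in main, the three whitespace text nodes are still there and come first, followed by a and b. No child is duplicated; the child count stays at five.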

The utility uses a default NodeName comparator if no comparator is specified. It sorts based on the names of the nodes in the DOM.
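
As a rough illustration of name-based sorting in both directions, the sketch below sorts only the direct element children of a node. It is a simplified stand-in and not the sortChildNodes() method itself: there is no recursion, the class and method names are made up for the example, and it uses generics and the standard JAXP parser so that it runs without DOMUtil.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of DOMUtilExt.
class NodeNameSortSketch {

    // Parses an XML string into a DOM Document (checked exceptions are wrapped).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Sorts the direct element children of parent by node name.
    static void sortByName(Node parent, boolean descending) {
        Comparator<Node> byName = Comparator.comparing(Node::getNodeName);
        if (descending) {
            // Same trick as in the utility: wrap the comparator to reverse it.
            byName = Collections.reverseOrder(byName);
        }
        List<Node> elements = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                elements.add(children.item(i));
            }
        }
        elements.sort(byName);
        for (Node element : elements) {
            parent.appendChild(element); // moves the existing child to the end
        }
    }

    // Returns the names of the element children, separated by spaces.
    static String childNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                names.add(children.item(i).getNodeName());
            }
        }
        return String.join(" ", names);
    }

    public static void main(String[] args) {
        Document doc = parse("<root><cherry/><apple/><banana/></root>");
        sortByName(doc.getDocumentElement(), false);
        System.out.println(childNames(doc.getDocumentElement()));
        sortByName(doc.getDocumentElement(), true);
        System.out.println(childNames(doc.getDocumentElement()));
    }
}
```

Note that it sorts the element children of the root element, not the children of the Document node itself.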

Writing a Comparator implementation is very simple. For example, you may want to sort a document based on an attribute in the XML document.



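import java.util.Comparator;

import org.w3c.dom.Element;
import org.w3c.dom.Node;

class MyComparator3 implements Comparator {

    public int compare(Object arg0, Object arg1) {
        if (arg0 instanceof Element && arg1 instanceof Element) {
            // Both are elements: compare their "id" attribute values as strings
            return ((Element) arg0).getAttribute("id").compareTo(
                    ((Element) arg1).getAttribute("id"));
        } else {
            // Otherwise fall back to comparing the node names
            return ((Node) arg0).getNodeName().compareTo(
                    ((Node) arg1).getNodeName());
        }
    }

}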
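
One thing to keep in mind is that comparing attribute values as strings puts "10" before "9". If the attribute holds a number or a date-time, the comparator can parse the value first. The following is only a sketch under a few assumptions: the class name, the "at" attribute in the sample, and the ISO-8601 local date-time format (for example 2006-12-16T10:00:00) are all made up for illustration, and every element is assumed to carry a parseable value.

```java
import java.io.StringReader;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of DOMUtilExt.
class AttributeComparatorSketch {

    // Orders elements by the numeric value of their "id" attribute.
    static Comparator<Element> byNumericId() {
        return Comparator.comparingLong(
                (Element e) -> Long.parseLong(e.getAttribute("id")));
    }

    // Orders elements by a date-time attribute; assumes ISO-8601 local
    // date-time values such as 2006-12-16T10:00:00.
    static Comparator<Element> byDateTime(String attributeName) {
        return Comparator.comparing(
                (Element e) -> LocalDateTime.parse(e.getAttribute(attributeName)));
    }

    // Sorts the element children of the root with the given comparator and
    // returns their "id" values in the resulting order.
    static String sortedIds(String xml, Comparator<Element> order) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<Element> elements = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                elements.add((Element) child);
            }
        }
        elements.sort(order);

        List<String> ids = new ArrayList<>();
        for (Element element : elements) {
            root.appendChild(element); // re-append in sorted order
            ids.add(element.getAttribute("id"));
        }
        return String.join(" ", ids);
    }

    public static void main(String[] args) {
        String xml = "<log>"
                + "<entry id=\"10\" at=\"2006-12-16T10:00:00\"/>"
                + "<entry id=\"9\" at=\"2006-12-17T09:00:00\"/>"
                + "<entry id=\"100\" at=\"2006-12-15T08:00:00\"/>"
                + "</log>";
        System.out.println(sortedIds(xml, byNumericId()));
        System.out.println(sortedIds(xml, byDateTime("at")));
    }
}
```

Both parse calls throw an unchecked exception on a missing or malformed value, so a real comparator would probably need a fallback for elements without the attribute.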



It's a very simple class for sorting the nodes any way you want. Please leave a comment if you spot a problem with the utility.

13 comments:

Bala said...

hi Aanand,

this piece of code is quite useful.
One minor doubt ... in line num 59, u do a appendchild to the passed nodes.

Wont this generate a duplicate childnodes? Did I miss a nodes.clear or something before?

Thanks
Balaji

Indian Lycan said...

hi Bala,

ur doubt is very valid. But the java DOM api says something like this: If the node being added is already present, then it is first removed...
reference: http://java.sun.com/j2se/1.4.2/docs/api/org/w3c/dom/Node.html#appendChild(org.w3c.dom.Node)

Thanks
Aanand

Anonymous said...

Nice, just what I needed, thanks!

Archana said...

I am able to sort based on attribute names using xsl:sort. Will i be able to sort based on attribute name in java as well?

Archana said...

And Aanand thanks for your piece of code..Its beautiful and very useful!

Indian Lycan said...

Thanks Archana. And yes you can sort based on attribute by writing a custom comparator.

Naveen said...

It works great thanks

I got below exception


Exception in thread "main" org.w3c.dom.DOMException: HIERARCHY_REQUEST_ERR: An attempt was made to insert a node where it is not permitted.
at org.apache.xerces.dom.CoreDocumentImpl.insertBefore(Unknown Source)
at org.apache.xerces.dom.NodeImpl.appendChild(Unknown Source)
at test.DOMUtilExt.sortChildNodes(DOMUtilExt.java:55)
at test.DOMUtilExt.main(DOMUtilExt.java:66)


But

node.removeChild(element); // Put this line
node.appendChild(element);

solved the problem

Mike Piye said...

Hi,
I have been using the example to fulfill one of my need to sort the xml file. It is really great to find the complete help as a guide for my development. I would like to ask you to please help to sort the file with node name and not the attribute as you have explained using the attribute in the node.

Thanks and best regards,
Murtaza

Terkhen said...

Thank you for your example! :)

$ure$h @votla said...

Thank you very much Anand. it is very helpful.

Suresh

Mamta said...

Hi Anand

My requirement is to sort the xml based on datetime attribute.how to achieve that.Any help on this.
