Convert JSP pages to JSP documents (JSPX) with Jsp2x

Submitted by Hannes Schmidt on Thu, 01/17/2008 - 19:01.

Jsp2X is a command line utility for batch conversion of JSP pages to JSP documents, i.e. JSPs in well-formed XML syntax (aka JSPX, see chapter 5 of the JavaServer Pages™ 1.2 Specification and chapter 6 of the JavaServer Pages™ 2.0 Specification). It is written in Java and incorporates a parser derived from a combined JSP+XHTML grammar using the ANTLR parser generator. It tries very hard to create JSPX output that is portable across engines. Jsp2X was designed to be used in an iterative fashion in which it alerts the user to potential problems in the input.


Version 1.2 of the JSP standard introduced the notion of JSP documents, which are simply JSP files in well-formed XML syntax. Files in traditional JSP format, also known as JSP pages, contain a more or less free-form tag soup for which parsers are difficult to write and which is therefore hard to digest in an automated manner. It took a long time until the various JSP engine vendors agreed on what was valid JSP and what wasn't. I usually prefer the Jetty servlet container for testing a web application during development because it starts up quickly, which reduces the time it takes to switch between coding and testing. When I later deploy that application to Resin, I am bewildered to see Resin reject JSPs that worked flawlessly in Jetty. An upgrade to Resin 3.0.23 fixed many discrepancies, but I still end up tweaking my JSP pages to make them work in both containers.

JSP documents are well-formed XML. XML has a strict and precise (albeit verbose) syntax. There are plenty of parsers and other tools available for XML. Making your JSP files XML-compliant therefore opens a world of possibilities for further processing. For example, I haven't found a single JSP editor that correctly formats and highlights anything but the simplest pages. With JSP documents these problems have a trivial solution: use your favorite XML editor.

Another annoying trait of JSP pages is that the JSP engine preserves insignificant whitespace. A JSP parser only parses what looks like a JSP tag or a directive, even if the text in between is well-formed XML. For that reason it can't detect and remove whitespace that would be considered insignificant by XML or HTML standards. This unnecessarily increases the size of the emitted HTML. The more JSP code is factored out into tag files or included JSP fragments, the more insignificant whitespace is generated and sent to the browser. In JSP documents, on the other hand, it is very easy to detect and drop insignificant whitespace. In fact, if the JSP engine uses an XML parser to read the input, the parser will take care of whitespace on behalf of the engine. To give you a rough idea of the potential savings: after I converted all 70+ JSP pages and tag files of a well-factored 100k SLOC web application to JSP documents, the average size of the HTML output decreased by 50% to 75%!

Taking into account that the template text in most JSP pages is in fact XHTML or HTML, the JSP committee realized that it isn't a very long road from a JSP page to a well-formed XML document. They only had to get rid of the leniency in the JSP parser and come up with alternatives for crazy constructs like <a href="<c:url …>"> . This thought process led to the definition of JSP documents in the JSP standard at a time when millions of JSP pages had already been written and deployed. This is where Jsp2X comes in. It is a tool that assists in the conversion of JSP pages to JSP documents, a process that is generally straightforward but tends to be tedious and has the potential to introduce subtle errors when executed by hand.

To understand what Jsp2X does you need to keep in mind that, unlike a JSP engine, Jsp2X parses both the JSP tags and the template text in between those tags. In that respect Jsp2X incorporates a more complex parser than what you'd find in a typical JSP engine (luckily, I had a very powerful and yet easy-to-use tool at hand: ANTLR, a robust LL(*) parser generator). More importantly, Jsp2X can successfully parse the template text in your JSP pages only if it is reasonably correct XHTML. Jsp2X doesn't expect fully well-formed XML template text. It only requires that all tags are nested properly and that empty tags are closed correctly. There is no need for a single root element: Jsp2X will create one on the fly if necessary.

Where can I get it?

The latest binary and source distributions can be downloaded from this page. To compile the sources you need Maven version 2.0.7 and JDK 1.6.0_02. Older Maven 2.0 releases >= 2.0.4 may work as well and a recent 1.5 JDK should be fine, too. Jsp2X is released under the LGPL.

The usage of the binary distribution is described in section Usage.

The source code repository is hosted at Google Code.

What exactly does it do?

A conversion of a single JSP page requires a number of different transformations. The following is a hopefully complete list:

  • Jsp2X writes the converted input to an output file whose name is derived from the input file. The extension of the output file name is mapped according to what the JSP standard lists as standard extensions for JSP pages/documents, tag files and fragments (also see Usage).
  • Jsp2X adds four very short utility tag files to the converted project. They have the jspx: prefix and contain functionality that would otherwise clutter the converted JSP document.
  • Jsp2X wraps the JSP page in a <jsp:root> tag.
  • Jsp2X wraps JSP fragments into a <jspx:fragment> tag. <jsp:root> tags in fragments are disallowed so I had to come up with another tag that is transparent with respect to the generated output and that can be used to collect the potentially many top-level elements of a fragment underneath a single top-level element (a requirement of XML well-formedness).
  • Jsp2X converts all taglib declarations to namespace references on the new root element (<jsp:root> or <jspx:fragment>). Unused taglibs are omitted. Jsp2X even detects taglibs that are declared in a fragment that is included by the JSP page to be converted. JSP page authors often move their taglib declarations to a separate file that is then included at the top of every JSP page.
  • Jsp2X escapes special XML characters in the input. Keep in mind that a JSP document is parsed twice, once by the JSP engine's XML parser and once on the client side by the browser's HTML/XHTML parser. If you wanted to display a literal < on a page, it was sufficient to put the HTML entity &lt; into the JSP page because the entity had no special meaning to the JSP parser. A JSP document would have to read &amp;lt; to get the desired effect. The JSP engine's XML parser will substitute &amp; with & such that the browser gets the intended &lt; and renders that as < . Jsp2X does the necessary escaping for you.
  • Jsp2X wraps template text in <jsp:text> tags, excluding insignificant whitespace.
  • Jsp2X escapes HTML comments and converts JSP comments to XML comments with the intended effect that HTML comments will end up in the output whereas JSP comments do not.
  • Jsp2X wraps scriptlets and expressions in <jsp:scriptlet> and <jsp:expression> tags respectively.
  • Jsp2X inserts escaped HTML comments into the body of elements with empty bodies to prevent them from being collapsed into empty elements: <td></td> becomes <td>&lt;!----&gt;</td> . This is definitely noisy but I found no other way to prevent the JSP engine's XML parser from collapsing empty element bodies. One of the goals for Jsp2X was to preserve the intent of a JSP page as much as possible. Luckily, a typical HTML page doesn't contain that many empty elements so the added syntactic noise will be minimal.
  • Jsp2X tries to detect and convert dynamic attribute constructs. The detection of these constructs is not bullet-proof because Jsp2X does not have a full-blown EL expression parser. Instead it uses regexes to detect the most common cases. The table below lists the supported cases (with additional whitespace and indentation for clarity).
    JSP page                                  JSP document
    <foo x="<bar …>">                         <jspx:element name="foo">
                                                  <jspx:attribute name="x"><bar …></jspx:attribute>
    <foo <c:if test="…">x="…"</c:if>>         <jspx:element name="foo">
                                                  <c:if test="…">
                                                      <jspx:attribute name="x">…</jspx:attribute>
                                                  </c:if>
    <foo ${condition ? 'x="…"' : ''}>         <jspx:element name="foo">
                                                  <c:if test="${condition}">
                                                      <jspx:attribute name="x">…</jspx:attribute>
                                                  </c:if>
    <foo ${condition ? '' : 'x="…"'}>         <jspx:element name="foo">
                                                  <c:if test="${!(condition)}">
                                                      <jspx:attribute name="x">…</jspx:attribute>
                                                  </c:if>
    <foo ${condition ? 'x="…"' : 'y="…"'}>    <jspx:element name="foo">
                                                  <c:choose>
                                                      <c:when test="${condition}">
                                                          <jspx:attribute name="x">…</jspx:attribute>
                                                      </c:when>
                                                      <c:otherwise>
                                                          <jspx:attribute name="y">…</jspx:attribute>
                                                      </c:otherwise>
                                                  </c:choose>
  • Jsp2X rewrites the file extension in references to an included file as long as the included file is also listed as an input file. This is why you should convert all JSP files in a single invocation of Jsp2X. If you don't, Jsp2X will not be able to rewrite references to converted files.
  • Jsp2X converts DOCTYPE declarations to <jsp:output> elements.
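
The escaping step from the list above can be sketched in a few lines of Java. This is only an illustration of the idea and not Jsp2X's actual code; the class name TemplateTextEscaper and the method escapeForJspx are made up for this example, and the sketch assumes it is handed plain text between tags.

```java
// Hypothetical helper, not part of Jsp2X: escapes template text so that it
// survives the JSP engine's XML parser unchanged.
final class TemplateTextEscaper {

    // Replaces the characters that are special in XML character data.
    // '&' is escaped like any other character, so an existing entity such
    // as "&lt;" becomes "&amp;lt;" and reaches the browser as "&lt;".
    static String escapeForJspx(String templateText) {
        StringBuilder out = new StringBuilder(templateText.length() + 16);
        for (int i = 0; i < templateText.length(); i++) {
            char c = templateText.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeForJspx("a &lt; b"));
    }
}
```

Because the ampersand is escaped in the same single pass as everything else, entities that were already present in the JSP page are escaped exactly once more, which is what the double parsing of a JSP document calls for.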

You might notice the use of <jspx:element> and <jspx:attribute> tags where you'd expect JSP's built-in <jsp:element> and <jsp:attribute> tags. The reason is that the built-in mechanism doesn't work for conditional attributes (something I consider a blatant oversight in the standard). For example,

<jsp:element …><c:if …><jsp:attribute …>…</jsp:attribute></c:if></jsp:element>

doesn't work because the attribute element applies to the <c:if> tag, not the <jsp:element> tag. It is in accordance with the standard but the standard should have been written to accommodate this very common use case. Jsp2X creates several tag files with custom tags that have similar functionality to <jsp:element> , <jsp:attribute> and <jsp:body> but work for conditional attributes:

<jspx:element name="foo"><c:if …><jspx:attribute name="bar">…</jspx:attribute></c:if></jspx:element>

Another difference is that <jspx:element> distinguishes between empty tags and tags with empty bodies. For example, a JSP page with

<jspx:element name="foo"><jsp:body/></jspx:element>

will emit <foo></foo> and

<jspx:element name="foo"></jspx:element> or <jspx:element name="foo"/>

will emit <foo/> . The jsp: variant would have emitted <foo/> in either case. This is XML-compliant but violates HTML (not XHTML), in which <div></div> and <div/> are treated differently. The latter is actually disallowed and its effect differs from browser to browser. Firefox treats it like an opening <div> and implicitly closes it at the end of the parent tag, e.g.

<td><div class="a"/><div>foo</div></td> is treated like

<td><div class="a"><div>foo</div></div></td> .

IE7 simply ignores everything after the <div/> .

The use of Jsp2X's custom <jspx:element> instead of the built-in <jsp:element> assists in creating output that is more likely to preserve the JSP page author's intent. It also enables the use of HTML (albeit a somewhat stricter dialect of it) as opposed to restricting the template text to pure XHTML.
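
To make the regex-based detection of conditional attributes more concrete, here is a rough Java sketch for a single case: an EL ternary that emits an attribute when the condition holds and nothing otherwise. The pattern and the names ConditionalAttribute and convert are assumptions made for illustration. Jsp2X's real regexes are not reproduced here and may well differ.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Jsp2X's actual code: recognizes the common
// ${condition ? 'name="value"' : ''} idiom found inside an opening tag.
final class ConditionalAttribute {

    // Group 1: the EL condition, group 2: attribute name, group 3: attribute value.
    private static final Pattern PATTERN = Pattern.compile(
            "\\$\\{\\s*(.+?)\\s*\\?\\s*'([\\w:-]+)=\"([^\"]*)\"'\\s*:\\s*''\\s*\\}");

    // Returns the JSP document form of the construct, or an empty Optional
    // if the input is not the one shape this sketch understands.
    static Optional<String> convert(String el) {
        Matcher m = PATTERN.matcher(el.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(
                "<c:if test=\"${" + m.group(1) + "}\">"
                + "<jspx:attribute name=\"" + m.group(2) + "\">" + m.group(3)
                + "</jspx:attribute></c:if>");
    }

    public static void main(String[] args) {
        System.out.println(convert("${isBold ? 'class=\"bold\"' : ''}").orElse("no match"));
    }
}
```

A regex like this is cheap to write but easy to fool, for example by a condition that itself contains a nested ternary, which is exactly why the detection cannot be bullet-proof without a real EL parser.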


Requirements

  • mandatory: JDK 5 or higher
  • recommended: JSP files named with standardized extensions ( .tag , .jsp and .jspf ).
  • recommended: Access to the complete set of all JSP files that comprise the web application (i.e. everything underneath the WEB-INF directory).
  • recommended: The include directives in every input JSP page should use context-relative URIs to refer to other JSP files (as in /WEB-INF/jsp/taglibs.jspf ).


Usage

Jsp2X is distributed as an executable JAR file. It is invoked as follows:

# java -jar <path to distribution jar> …

Invoking it with --help shows the command line options.

# java -jar jsp2x-VERSION-bin.jar --help
Jsp2X [--help] [-c|--clobber] [(-o|--output) <output>] file1 file2 … fileN

Converts JSP pages to JSP documents (well-formed XML files with JSP tags).

[--help]
Prints this help message.

[-c|--clobber]
Overwrite output files even if they already exist.

[(-o|--output) <output>]
The path to the output folder. By default output files and logs are
created in the same directory as the input file.

file1 file2 … fileN
One or more paths to JSP files. Should not be absolute paths.

Unless you specify --clobber , Jsp2X will never overwrite existing files. For every input file it will create a converted output file and possibly a log file in the same directory as the input file unless the --output switch is specified. With --output <path> , output files are written to a directory structure underneath the directory specified by <path>. The directory structure will mimic that of the input files, and non-existing directories will be created on the fly as required. The base name of the output file will be derived from the input file using the following mapping between standard JSP page extensions and standard JSP document extensions:

Input extension    Output extension
jsp                jspx
tag                tagx
jspf               jspx

If the input file's extension doesn't match any of the ones listed in the table above, Jsp2X will generate the output file name simply by appending .xml to the input file name.
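
The naming rule can be written down as a small Java sketch. The class and method names (OutputNames, outputName) are invented for this example and the code is not taken from the Jsp2X sources. It only mirrors the mapping and the .xml fallback described above, and it uses Map.of, so it assumes Java 9 or later.

```java
import java.util.Map;

// Hypothetical sketch of the extension mapping described above; the names
// are made up and this is not Jsp2X's actual code.
final class OutputNames {

    // Standard JSP page extensions mapped to JSP document extensions.
    private static final Map<String, String> EXTENSIONS =
            Map.of("jsp", "jspx", "tag", "tagx", "jspf", "jspx");

    // Derives the output file name from the input file name.
    static String outputName(String inputPath) {
        int slash = inputPath.lastIndexOf('/');
        int dot = inputPath.lastIndexOf('.');
        if (dot > slash) { // the dot belongs to the file name, not a directory
            String mapped = EXTENSIONS.get(inputPath.substring(dot + 1));
            if (mapped != null) {
                return inputPath.substring(0, dot + 1) + mapped;
            }
        }
        // Unknown or missing extension: append .xml to the whole name.
        return inputPath + ".xml";
    }

    public static void main(String[] args) {
        System.out.println(outputName("WEB-INF/jsp/index.jsp"));
        System.out.println(outputName("page.html"));
    }
}
```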

Input file paths should always be relative paths. They must be relative paths if --output is specified. Relative paths may start with './' but they don't need to, e.g. ./foo/bar.jsp is treated as equivalent to foo/bar.jsp . JSP pages may include other JSP fragments. Jsp2X can handle this as long as the value of every include directive's file attribute points to the included file when the current working directory is prepended to it. In other words,

  • you should run Jsp2X from within the webapp directory of your source tree (usually src/main/webapp ) and
  • your JSP pages should use context-relative URIs to refer to the included fragment, e.g. /WEB-INF/jsp/taglibs.jspf .

In all other cases Jsp2X will emit a warning and the conversion result might be incomplete.
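
The path handling described in this section can be sketched with java.nio.file as follows. IncludePaths, normalizeInput and resolveInclude are hypothetical names chosen for this illustration. How Jsp2X implements this internally is not shown here and may be different.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not Jsp2X's actual code.
final class IncludePaths {

    // Normalizes a relative input path so that "./foo/bar.jsp" and
    // "foo/bar.jsp" compare equal.
    static Path normalizeInput(String inputPath) {
        return Paths.get(inputPath).normalize();
    }

    // Resolves a context-relative include URI such as
    // "/WEB-INF/jsp/taglibs.jspf" against the webapp directory, which is
    // expected to be the current working directory when Jsp2X runs.
    static Path resolveInclude(Path webappDir, String contextRelativeUri) {
        if (!contextRelativeUri.startsWith("/")) {
            throw new IllegalArgumentException(
                    "not a context-relative URI: " + contextRelativeUri);
        }
        return webappDir.resolve(contextRelativeUri.substring(1)).normalize();
    }

    public static void main(String[] args) {
        System.out.println(normalizeInput("./foo/bar.jsp"));
        System.out.println(resolveInclude(Paths.get(""), "/WEB-INF/jsp/taglibs.jspf"));
    }
}
```

Comparing the resolved include path with the normalized input paths is one way a converter could decide whether an included file is among the files being converted.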

A typical conversion session might look like this:

# cd src/main/webapp
# find -name "*.tag" -or -name "*.jsp" -or -name "*.jspf" |
  xargs java -jar jsp2x-VERSION-bin.jar --clobber
# cd ../../..

Jsp2X will print the total number of input files and the number of successfully converted input files. You will find as many log files as there are input files for which the conversion was unsuccessful. Read the log files and tweak the input pages or come running to me if you think you found a bug.

When converting the JSP pages in Provider Portal, I used a slightly more elaborate approach that yielded better diffs in SVN. The key to that approach is that I first renamed the JSP pages to their JSP document counterparts in one commit, then replaced the content of each renamed file with its converted form in a second commit. The diff of the second commit lists all modifications made by Jsp2X, allowing you to later go back and see what exactly it did. Here's a transcript of my conversion session (before you copy-and-paste it, make sure you understand what's going on with all those find commands):

  1. Convert all JSP files into a separate temporary directory:
    # cd src/main/webapp
    # find -name "*.tag" -or -name "*.jsp" -or -name "*.jspf" |
      xargs java -jar jsp2x-VERSION-bin.jar --clobber --output temp
  2. Use find to generate a script that renames all JSP files:
    # find \( -name "*.jsp" -or -name "*.jspf" \) -and -printf "svn rename %p %p\\n" |
      sed -r "s/jspf?\$/jspx/" | bash
    # find -name "*.tag" -and -printf "svn rename %p %p\\n" | sed -r "s/tag\$/tagx/" | bash
    # svn commit -m "…"
  3. Use find to generate another script that copies the converted files from the temporary directory to the real one:
    # cd temp/WEB-INF
    # find \( -name "*.tagx" -or -name "*.jspx" \) -and -printf  | sed s/\\/\\.\\//\\// | bash
    # cd ../..
    # rm -r temp
    # svn commit -m "…"

How it works

Jsp2X is split into four main parts: the parser, the transformer, the dumper and the main class with some glue code for command line and file management. The parser was the hardest part to get right because, unlike a true JSP page parser, it can't just scan the template text for JSP constructs. The transformer needs a complete tree structure of the input, including the tags in the template text. So the parser has to scan for markup in the template text and for JSP constructs at the same time. The input is not just simple markup with elements, attributes and some text. JSP constructs can literally occur anywhere in the document. The parser needs to accept input like this:

<a href="<c:url value="foo"/>" ${isBold ? 'class="bold"' : ''}>

This is an <a> element with an href attribute whose value is a <c:url> tag which has attributes of its own. Next to the href attribute there is an EL expression with a conditional class attribute. I refer to these constructs as being recursive because tags are allowed within tags (this is different from elements occurring in the body of other elements). Also note the nesting of the quotation marks. As you can see, parsing this is not trivial. Luckily, I had a very powerful tool at hand: ANTLR. Given the grammar of an input language, ANTLR generates the source code of a class that can parse the input language and turn it into an in-memory tree representing the input. So as long as you can come up with a grammar for the desired input, ANTLR generates a program that parses the input for you. ANTLR can generate source code for Java, C#, C and other languages. It supports complex LL(*) grammars (a large subset of the context-free languages, if you know who Chomsky is) in which the decision about which grammar rule to apply cannot be made by just looking a constant number of tokens ahead (it uses backtracking in conjunction with memoization to alleviate the exponential cost of backtracking). I am an ANTLR newbie so I expect my JSP grammar to have deficiencies.

The transformer is a simple recursive tree walker that can change, delete and add nodes during the walk. Most of the work is done in a first pass. It also detects and converts the aforementioned recursion in attributes and tags. The second pass combines consecutive PCDATA (i.e. text) nodes and escapes XML entities. The third pass attempts to detect insignificant whitespace. For example, it converts

<td>
    Foo
</td>

to

<td>
    <jsp:text>Foo</jsp:text>
</td>

The difference between the two fragments is that the first one would cause the JSP engine to emit HTML output that includes the whitespace:

<td>
    Foo
</td>

The second fragment on the other hand would emit

<td>Foo</td>

This is because the whitespace around "Foo" became whitespace-only text between tags and can be safely eliminated by the JSP engine. The text child of the <td> element in the first fragment contains both whitespace and non-whitespace. The JSP standard says that in JSP documents only text that exclusively consists of whitespace can be eliminated.
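
The rule in the last two sentences can be illustrated with a short Java sketch. It works on a single text node represented as a string, which is a simplification: Jsp2X operates on a tree, and the names WhitespacePass, isWhitespaceOnly and wrapSignificantText are invented for this example.

```java
// Hypothetical sketch of the whitespace pass, not Jsp2X's actual code.
final class WhitespacePass {

    // The four characters XML treats as whitespace.
    private static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // True if a JSP engine may drop this text node in a JSP document.
    static boolean isWhitespaceOnly(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isXmlWhitespace(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Wraps the significant core of a text node in <jsp:text>, which leaves
    // the surrounding whitespace as whitespace-only text between tags.
    static String wrapSignificantText(String text) {
        int start = 0;
        int end = text.length();
        while (start < end && isXmlWhitespace(text.charAt(start))) {
            start++;
        }
        while (end > start && isXmlWhitespace(text.charAt(end - 1))) {
            end--;
        }
        if (start == end) {
            return text; // whitespace-only: nothing to wrap
        }
        return text.substring(0, start)
                + "<jsp:text>" + text.substring(start, end) + "</jsp:text>"
                + text.substring(end);
    }

    public static void main(String[] args) {
        System.out.println(wrapSignificantText("\n    Foo\n"));
    }
}
```

After wrapping, the leading and trailing runs are separate whitespace-only text nodes, so the engine is free to discard them while the wrapped text is emitted verbatim.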

The dumper is a very simple XML serializer. After the transformer has done its work, the tree is basically in XML form and serializing it is a trivial task. ANTLR supports tree parsing to some extent so I used that mechanism for the dumper.
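
For readers who have not written one, a serializer of this kind can be as small as the following Java sketch. It uses a hand-rolled Node class instead of ANTLR's tree classes, so it shows the idea rather than the ANTLR tree-parsing mechanism the dumper actually uses. Node, Dumper and their methods are names made up for this example, and text and attribute values are assumed to have been escaped by an earlier pass.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tree node: either an element (name set) or a text node (text set).
final class Node {
    final String name;
    final String text;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<Node> children = new ArrayList<>();

    private Node(String name, String text) {
        this.name = name;
        this.text = text;
    }

    static Node element(String name) { return new Node(name, null); }
    static Node text(String text) { return new Node(null, text); }

    Node attr(String key, String value) { attributes.put(key, value); return this; }
    Node add(Node child) { children.add(child); return this; }
}

// Hypothetical serializer, not Jsp2X's actual dumper.
final class Dumper {

    static String dump(Node root) {
        StringBuilder out = new StringBuilder();
        write(root, out);
        return out.toString();
    }

    // Depth-first walk that writes start tag, children and end tag.
    private static void write(Node node, StringBuilder out) {
        if (node.name == null) {
            out.append(node.text); // text is assumed to be escaped already
            return;
        }
        out.append('<').append(node.name);
        for (Map.Entry<String, String> a : node.attributes.entrySet()) {
            out.append(' ').append(a.getKey())
               .append("=\"").append(a.getValue()).append('"');
        }
        if (node.children.isEmpty()) {
            out.append("/>");
            return;
        }
        out.append('>');
        for (Node child : node.children) {
            write(child, out);
        }
        out.append("</").append(node.name).append('>');
    }
}
```

Note that this sketch collapses childless elements into empty tags, which is the very behavior that makes the placeholder comments in empty element bodies necessary.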

There's not much to say about the main class, except maybe that it uses a neat little command line parser called JSAP.

jsp2x-0.9.1-SNAPSHOT-bin.jar (530.26 KB)
Submitted by Anonymous on Sun, 07/17/2011 - 03:36.
I could solve the include issue, at least for me. However, I found another issue: if an include file is empty, no include statement is written to the output. So if you have empty includes for some reason, make sure they contain at least a <%@ page %> directive before converting. A jar with the patch can be found in the corresponding issue I created on Google Code. For the empty-includes issue I have created another issue there. My problem is solved :-)
Submitted by Anonymous on Sun, 07/17/2011 - 02:20.
I did try to remove any leading whitespace before the include directive as suggested, but I still get an error message. I have created an issue with the log output.
Submitted by Hannes Schmidt on Sun, 06/26/2011 - 11:38.
Makes total sense to me. I didn't know about that possibility - or didn't care to investigate it. Another wheel reinvented. Thanks for the suggestion. -- Hannes
Submitted by Anonymous on Fri, 06/10/2011 - 09:07.
Nice piece of work! I seriously considered using it but it would be an overkill for my needs. anyway, being you I would consider reusing an existing JSP parser instead of creating a new one unless there is some problem with that. I've just blogged about how to easily reuse Tomcat's Jasper to transform JSP into whatever you want (based on its internal, hierarchical representation of tags or 'nodes' on the page), see here.
Submitted by Anonymous on Mon, 10/04/2010 - 07:16.
I have the same issue. It seems that if you put an include directive outside the JSP page's root element (as per the bundle example) then it works. So it's OK for pulling in fragments containing more directives, for example. But if you put the include directive inside the main body of the JSP page, like for a footer or some other standard body content, then it spits out an error. I can't find a workaround, which is a shame, because otherwise it's exactly what I need.
Submitted by Hannes Schmidt on Tue, 04/07/2009 - 19:29.

Hi Shailesh,

can you check whether removing the leading fixes the problem:

<%@ include file="common/footer.jsp" %>

I am also surprised that removing the @ sign worked for you. Doesn't that generate a jsp:scriptlet element instead of a jsp:directive.include element?

-- Hannes

Submitted by Anonymous on Tue, 04/07/2009 - 05:44.
This is really good work, thanks a lot for making this available. I am not sure if this worked for you, but I couldn't get any of the native jsp include directives to work correctly with this converter, I had to remove the "@" sign in the beginning. Here's an example line if you want to try it out:
  • <%@ include file="/common/footer.jsp" %>
Cheers. Shailesh
Submitted by Hannes Schmidt on Mon, 04/06/2009 - 23:46.

No Maven repository yet but at least the source is hosted at Google Code.

-- Hannes

Submitted by Anonymous on Fri, 02/13/2009 - 14:35.

It'd be nice if you could include a tld in the built jar so I can just add the jar and reference the tag library.

This is how I currently use it in my project:

1. Add jar via Maven dependency (manually uploaded to my own Maven repo for now until an official one is available)
a. Alternatively just drop the jar into WEB-INF/lib

2. Extract the WEB-INF/tags/jspx folder from the jar into your own project WEB-INF/tags/jspx

3. Create a tld file called jspx.tld and place it in your WEB-INF/tlds folder (see below for what I used)

4. In your jspx page import the library by adding the following to your jsp:root

5. Use the tags with (less than sign) jspx:element ...

Below is the jspx.tld I created. It should be placed in META-INF/jspx.tld of the jar. I hope it posts completely since I can't attach it. If not then just e-mail me at website2008 -AT- rodneybeede -dot.- com.

<?xml version="1.0" encoding="UTF-8" ?>
<taglib xmlns=""







Submitted by Hannes Schmidt on Fri, 02/13/2009 - 13:57.
I totally agree. I see what I can do over the weekend. -- Hannes
Submitted by Anonymous on Fri, 02/13/2009 - 12:37.
It'd be nice if the Maven Central repository had this project. Or at least some type of Maven repository for it.
Submitted by Anonymous on Mon, 01/12/2009 - 15:44.
Thanks, it's OK now! I put this inside my web.xml:
<jsp-config>
  <jsp-property-group>
    <url-pattern>*.jsp</url-pattern>
    <is-xml>true</is-xml>
  </jsp-property-group>
</jsp-config>
seb ^^
Submitted by Hannes Schmidt on Mon, 01/12/2009 - 00:32.

JSPX is XML. The correct way to comment in XML is <!-- -->. Tomcat's XML parser is supposed to drop the XML comments before further processing the document. Could it be that Tomcat isn't recognizing your JSPX files as such and instead treating them as JSP?

Check if setting is-xml to true helps (inside the <servlet> element of your web.xml).


Also, read Appendix D of the JSP 2.0 spec linked at the top of this article.

-- Hannes

Submitted by Anonymous on Sun, 01/11/2009 - 11:50.
Hello, the tag <%-- comments --%> is converted into <!-- comments --> in JSPX. When I run my JSPX file on Apache Tomcat 6, the server still tries to interpret the "comments". How do I really write a comment in JSPX? Thank you.
Submitted by Hannes Schmidt on Wed, 08/20/2008 - 22:39.
Hi Leon, I'd be interested in your patch. Can you email me at dp2 AT -- Hannes
Submitted by Anonymous on Tue, 08/19/2008 - 19:21.
Ironically, the comment system bombs on unescaped "<" symbol too :) So, one problem I ran into is that the tool bombs on "<" symbol in inline JavaScript. I think this wouldn't be that hard to fix in ANTLR grammar (Eclipse JSP editor parses it correctly), and I am willing to contribute a patch. Where do I send it? Thanks, Leon
Submitted by Anonymous on Tue, 08/19/2008 - 19:17.
I've been playing with this tool for a couple of days, and it worked surprisingly well on a few insane JSP pages I almost gave up refactoring. One problem I ran into is that the tool bombs on " Thanks, Leon
Submitted by Hannes Schmidt on Sun, 06/22/2008 - 09:31.
That sounds like a good idea. I won't have time to add it but patches are always welcome. -- Hannes
Submitted by Anonymous on Sat, 06/14/2008 - 23:15.
I'm talking about the implementation of jspx:attribute
Submitted by Anonymous on Sat, 06/14/2008 - 23:14.
I noticed that you didn't fully duplicate the behaviour, as there doesn't seem to be support for trim=true. Let me suggest that if you're going to add this, you also add normalize="true", where normalizing will take all the whitespace of the attribute value and replace it with a single space.
Submitted by Hannes Schmidt on Thu, 02/14/2008 - 12:28.
There you go! -- Hannes
Submitted by Hannes Schmidt on Wed, 02/13/2008 - 12:21.
I will release them under the LGPL this week. Please check back later. -- Hannes
Submitted by Anonymous on Mon, 02/11/2008 - 07:52.
Hi. I'm interested in the sources, where can I get them? Br, Timo.