Decompilation Dava

McGill's "Dava" Java Decompiler

The Sable group at McGill University, under the leadership of Professor Laurie Hendren, is working on a framework called Soot for analysing and optimising Java bytecode. An offshoot of this work, though not described directly on the Soot page, is the Dava decompiler. This is mostly the work of Jerome Miecznikowski, whose master's thesis is available. A paper published at the Decompilation Workshop at WCRE 2001 describes the restructuring algorithm used. See also their tech report Decompiling Java Bytecode: Problems, Traps and Pitfalls. Also of interest to decompilation researchers is their paper on types: Efficient Inference of Static Types for Java Bytecode. You can get a working version (with a little effort) from the website. See DecompilationDavaTest for tests.

Making Dava

There is a version of Dava in the latest release of Soot, but it is undocumented and predates some enhancements mentioned in their paper. According to the author, the new version is ready and only needs to be integrated into a current version of Soot. That integration was expected in early 2003. These instructions and tests relate to the old version.

Extremely brief installation instructions can be found on the web page of the main author (Jerome Miecznikowski), which was at http://www.sable.mcgill.ca/~jerome/public, in the file INSTALL (still accessible via archive.org). To use Dava, you just install Soot and use some special command line options. You can download the full source version (5MB), or the version with just the classes directory (2MB). Note that the Jasmin required is a modified version of the standard Jasmin "assembler for Java", so make sure you use the one provided.

The main problem I had in getting it going was setting the CLASSPATH environment variable correctly. This worked for me, in this order: 1) the current directory (.), 2) the path to the rt.jar file (including rt.jar itself in the path), 3) the path to the Soot classes directory, and 4) the path to the jasmin classes directory.
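
As a minimal sketch of that ordering, assuming a POSIX shell and a Soot 1.2.4 tree unpacked under a hypothetical SOOT_HOME directory (both paths below are assumptions and will differ between systems), the four components can be assembled like this:

```shell
# Assumed locations; adjust both to match your own installation.
SOOT_HOME=/opt/soot-1.2.4            # hypothetical unpack directory
RT_JAR=/usr/java/jre/lib/rt.jar      # the path includes rt.jar itself

# Order follows the text: current directory, rt.jar,
# Soot classes directory, Jasmin classes directory.
CLASSPATH=".:$RT_JAR:$SOOT_HOME/soot/classes:$SOOT_HOME/jasmin/classes"
export CLASSPATH
```

Exporting the variable makes it visible to any java process started from the same shell, so no -cp option is needed on the command line.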

To use Dava, after the CLASSPATH is set up as above, cd to the directory with the top level .class file to be decompiled, and run

% java soot.Main --dava foo     or
% java soot.Main --dava --app foo

as per the INSTALL instructions. Note: if you get warnings about phantom library classes, your CLASSPATH is probably not set correctly. There is more about the rt.jar file in the Soot tutorial. You should not need to unjar the rt.jar file. I found it convenient to set up a shell script to set the classpath and add the soot.Main --dava part automatically. Mine looks like this (all on one line):

java -Xmx128M -cp .:/usr/java/jre/lib/rt.jar:
/home/44/emmerik/soot-1.2.4/soot/classes/:
/home/44/emmerik/soot-1.2.4/jasmin/classes:. soot.Main --dava "$@"
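
Since the phantom class warnings mentioned above point to a CLASSPATH problem, one quick check is whether every entry actually exists on disk. The following is only a sketch for a POSIX shell; check_classpath is a hypothetical helper name, not something shipped with Soot or Dava.

```shell
# check_classpath (hypothetical helper): print each colon-separated
# entry of its argument that does not exist, and return non-zero if
# any entry is missing.
check_classpath() {
    missing=0
    old_ifs=$IFS
    IFS=:                      # split the argument on ':' only
    for entry in $1; do
        if [ ! -e "$entry" ]; then
            echo "missing: $entry"
            missing=1
        fi
    done
    IFS=$old_ifs               # restore normal word splitting
    return $missing
}
```

It could be used as a guard in front of the decompiler, for example: check_classpath "$CLASSPATH" && java soot.Main --dava foo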

About Dava

As you can read in the various papers, in particular Decompiling Java Bytecode: Problems, Traps and Pitfalls, not all Java decompilers are created equal. In particular, four of the most popular ones completely failed when faced with optimised bytecode (you can use Soot as an optimiser, as well as its many other uses). Also, Dava has been tested on bytecode generated from relatively exotic source languages, such as Haskell, Eiffel, ML, Ada, and even Fortran. A good Java decompiler should be able to produce correct, compilable code for any verifiable bytecode, and they claim that Dava comes close to this (certainly a lot closer than the other decompilers they tested, including two commercial decompilers). However, the more recent open source decompiler JODE seems to fare better than the decompilers they tested, and a little better than Dava as well.

The main problems that good decompilers solve are:

  • finding the types of local variables (including references to objects),
  • creating "stack variables" as needed, and
  • sorting out the control flow issues associated with gotos, breaks, and the like, so that these can be implemented in pure Java.

There are other problems, associated with exceptions, class literals, package and class resolution, etc., which they have solved along the way.


CategoryDecompilation