History Of Decompilation 3

Program-Transformation.Org: The Program Transformation Wiki

History of Decompilation (2000-present)

University of London's Asm21toc reverse compiler, 2000.

This assembly language decompiler for Digital Signal Processing (DSP) code was written in a compiler-compiler called rdp [ JSW00 ]. The authors note that DSP is one of the last areas where assembly language is still commonly used. As usual, the decompilation from assembly language is considerably easier than from executable code; in fact the authors "doubt the usefulness" of decompiling from binary files.

Proof-Directed De-compilation of Low-Level Code, 2001.

Katsumata and Ohori published a paper [ KO01 ] on decompilation based on proof theoretical methods. The input is Jasmin, essentially Java assembly language. The output is an ML-like simply typed functional language. Their example shows an iterative implementation of the factorial function transformed into two functions (an equivalent recursive implementation). Their approach is to treat each instruction as a constructive proof representing its computation. While not immediately applicable to a general decompiler, their work may have application where proof of correctness (usually of a compilation) is required.

In [ Myc01 ], Mycroft compares his Type-Based decompilation with this work. Structuring to loops and conditionals is not attempted by either system. He concludes that the two systems produce very similar results in the areas where they overlap, but that they have different strengths and weaknesses.

Computer Security Analysis through Decompilation and High-Level Debugging, 2001.

Cifuentes et al suggested dynamic decompilation as a way to provide a powerful tool for security work. The main idea is that the security analyst is only interested in one small piece of code at one time, and so high level code could be generated "on the fly". One problem with traditional (static) decompilation is that it is difficult to determine the range of possible values of variables; by contrast, a dynamic decompiler can provide at least one value (the current value) with no effort [ CWVE01 ].

Type Propagation in IDA Pro Disassembler, 2001.

Guilfanov describes the type propagation system in the popular disassembler IDA Pro Ida Pro. The types of parameters to library calls are captured from system header files. The parameter types for commonly used libraries are saved in files called type libraries. Assignments to parameter locations are annotated with comments with the name and type of the parameter. This type information is propagated to other parts of the disassembly, including all known callers. At present, no attempt is made to find the types for other variables not associated with the parameters of any library calls [ Gui01 ].

DisC, by Satish Kumar, 2001.

This decompiler is designed to read only programs written in Turbo C version 2.0 or 2.01; it is an example of a compiler specific decompiler. There is no significant advantage to this approach, since general techniques are not much more difficult to implement. It is an interesting observation that since most aspects of decompilation are ultimately pattern matching in some sense, the difference between pattern matching decompilers and general ones is essentially one of the generality of the patterns. http://www.debugmode.com/dcompile/disc.htm.

ndcc decompiler, 2002.

André Janz modified the dcc decompiler to read 32-bit Windows Portable Executable (PE) files. The intent was to use the modified decompiler to analyse malware. The author states that a rewrite would be needed to fully implement the 80386 instruction set. Even so, reasonable results were obtained, while retaining dcc's severe limitations [ Jan02 ].

The Anatomizer Decompiler, circa 2002.

K. Morisada released a decompiler for Windows 32-bit programs, which appears to be comparable in capability to the REC compiler for that platform. See also AnatomizerDecompiler and http://jdi.at.infoseek.co.jp.

Analysis of Virtual Method Invocation for Binary Translation, 2002.

Tröger and Cifuentes show a method of analysing indirect call instructions. If such a call implements a virtual method call and is correctly identified, various important aspects of the call are extracted. The technique as presented is limited to one basic block; as a result, it fails for some less common cases. [ TC02 ].

Boomerang, 2002.

This is an open source decompiler, with several front ends (two are well developed) and a C back end. It uses an internal representation based on the Static Single Assignment form, and pioneers dataflow-based type analysis. At the time of writing, it is still limited to quite small (toy) binary programs. http://boomerang.sourceforge.net.

Desquirr, 2002.

This is an IDA Pro Plugin, written by David Eriksson as part of his Master's thesis. It decompiles one function at a time to the IDA output window. While not intended to be a serious decompiler, it illustrates what can be done with the help of a powerful disassembler and about 5000 lines of C++ code. Because a disassembler does not carry semantics for machine instructions, each supported processor requires a module to decode instruction semantics and addressing modes. The X86 and ARM processors are supported. Conditionals and loops are emitted as gotos, there is some simple switch analysis, and some recovery of parameters and returns is implemented. http://desquirr.sourceforge.net/desquirr.

Analyzing Memory Accesses in x86 Executables, 2004.

Balakrishnan and Reps from the University of Wisconsin have developed a framework for analysing binary programs that they call Codesurfer/x86. The aim is to produce intermediate representations that are similar to those that can be created for a program written in a high-level language. The binary is first disassembled with the IDA Pro disassembler. A plug-in for IDA Pro called Connector, provided by Grammatech Inc, interfaces to the rest of the tool. Value-set Analysis (VSA) is used to produce an intermediate representation, which is presented in a source code analysis tool called CodeSurfer (sold separately by Grammatech Inc for C/C++ source code analysis) [ BR04, RBLT05 ]. Codesurfer/x86 may be commercially available soon.

R. Falke's Type Analysis for Decompilers, 2004

Raimar Falke, in his Diploma Thesis (German) for the Technical University of Dresden, implemented an adaption of Mycroft's type constraint theory in a decompiler called YaDeC. He extended it to handle arrays. To keep the project manageable, he used objdump for the front end, ignored floating point types, assumed only stack parameters, and so on. An English translation of the paper's summary can be found here.

History Of Decompilation 1 (1960-1979)
History Of Decompilation 2 (1980-1999)


References

JSW00
A. Johnstone, E. Scott, and T. Womack. Reverse Compilation for Digital Signal Processors: a Working Example. In Proceedings of the Hawaii International Conference on System Sciences. IEEE-CS Press, January 2000.

KO01
S. Katumata and A. Ohori. Proof-directed de-compilation of low-level code. In European Symposium on Programming, volume 2028 of Lecture Notes in Computer Science, pages 352-366. Springer-Verlag 2001.

Myc01
A. Mycroft. Comparing type-based and proof-directed decompilation. In Proceedings of the Working Conference on Reverse Engineering, pages 362-367, Stuttgart Germany. IEEE-CS Press 2001.

CWVE01
C. Cifuences, T. Waddington, and M. Van Emmerik. Computer Security Analysis through Decompilation and High Level Debugging. In Proceedings of the Working Conference on Reverse Engineering, pages 375-380, Stuttgart Germany. IEEE-CS Press 2001.

Gui01
I. Guilfanov. A simple type system for program reengineering. In Proceedings of the Working Conference on Reverse Engineering, pages 357-361, Stuttgart Germany. IEEE-CS Press 2001.

Jan02
André Janz. Experimente mit einem Decompiler im Hinblick auf die forensische Informatik, 2002. http://agn-www.informatik.uni-hamburg.de/papers/doc/diparb_andre_janz.pdf .

TC02
J. Tröger and C. Cifuentes. Analysis of virtual method invocation for binary translation. In Proceedings of the Working Conference on Reverse Engineering, Richmond, Virginia; pages 65-74. IEEE-CS Press, 2002.

BR04
G. Balakrishnan and T. Reps. Analyzing Memory Accesses in x86 Executables. In Proceedings of Compiler Construction (LNCS 2985) pages 5-23. Springer-Verlag April 2004.

RBLT05
T. Reps, G. Balakrishnan, J. Lim and T. Teilelbaum. A Next-Generation Platform for Analysing Executables. To appear in Proc. 3rd Asian Symposium on Programming Languages and Systems, Springer-Verlag 2005.