This information is pieced together from a few sources. I hope it is still accurate.

In about 1985, Jim Reuter wrote decomp, a decompiler for the Vax BSD 4.2 (a.out format) which took as input object files with symbolic information and produced C-like programs. The nature of this decompiler was to port the Empire game to the VMS environment, given that source code was not available. The decompiler was freely available on the Internet from . Since that is no longer available, I have attached a copy below.

Decomp worked only on object (.o or .obj) files, and made use of the symbol table information present in these files to find the entry points to functions, determine data used in the program, and the names of that data. Subroutines were decompiled one at a time, in the following way: a control flow graph of basic blocks was built and optimised by the removal of arcs leading to intermediate unconditional branches. Control flow analysis was performed in the graph to find high-level control constructs, converting the control flow graph into a tree of generic constructs. The algorithm used by this analysis was taken from the struct program, a program that structures graphs produced by Fortran programs, which was based on the structuring algorithm described by B.Baker in [ Bake77 ]. Finally, the generic constructs in the tree were converted to C-specific constructs, and code was generated. The final output programs required manual modifications to place the arguments on the procedure's argument list, and determine that a subroutine returned a value (i.e. was a function). This decompiler was written in about 5 man-months [ Reut91 ].

Sample programs were written and compiled in C in a Vax BSD 4.2 machine. The resulting C programs are not compilable, and require some hand editing. The programs have the correct control structures, due to the structuring algorithm implemented, and the right data type of variables, due to the embedded symbol table in the object code. The names of library routines and procedures, and the user's program entry point are also known from the symbol table; therefore, no extraneous procedures (e.g. compiler start up code, library routines) are decompiled. The need for a data flow analysis stage is vital, though, as neither expressions, actual arguments, nor function return value are determined. An interprocedural data flow analysis would eliminate much of the hand-editing required to recompile the output programs.

You can read his DecompReadMe file for more information.


Revision: r1.2 - 10 Oct 2005 - 01:38 - MikeVanEmmerik
Copyright © 1999-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback