Decompilation Process

Program-Transformation.Org: The Program Transformation Wiki

The Decompilation Process

The main problems in decompilation are separating code from data (i.e. obtaining a complete disassembly of the program), reconstructing control structures, and recovering high-level data types. To disassemble a larger share of the program automatically, a decompiler can use knowledge about the particular compiler and libraries that were used to build the file being decompiled. Library code can be identified through signatures; examples are dcc's dccSign (see postscript paper) and IDA's FLIRT library support.
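
As a rough illustration of signature-based library identification, the sketch below compares the first bytes of a function against a table of byte patterns in which some positions are wildcards (for example, a call target that changes at link time). It is not the dccSign or FLIRT algorithm or file format. The table name SIGNATURES, the function names, and the byte patterns are all made up for this example.

```python
from typing import Dict, List, Optional

# Hypothetical signature table: function name -> pattern of leading bytes.
# None is a wildcard for a byte that may vary (e.g. a relocated address).
# The byte values are invented for illustration only.
SIGNATURES: Dict[str, List[Optional[int]]] = {
    "lib_func_a": [0x55, 0x8B, 0xEC, 0x8B, 0x45, 0x08],
    "lib_func_b": [0x55, 0x8B, 0xEC, 0xE8, None, None, None, None, 0x5D, 0xC3],
}


def matches(pattern: List[Optional[int]], code: bytes, offset: int) -> bool:
    """Return True if pattern matches code at offset, ignoring wildcards."""
    if offset + len(pattern) > len(code):
        return False
    return all(p is None or p == code[offset + i] for i, p in enumerate(pattern))


def identify(code: bytes, offset: int) -> Optional[str]:
    """Return the name of the first signature matching at offset, if any."""
    for name, pattern in SIGNATURES.items():
        if matches(pattern, code, offset):
            return name
    return None


# A one-byte prefix followed by a body that fits the lib_func_b pattern.
EXAMPLE_CODE = bytes([0x90, 0x55, 0x8B, 0xEC, 0xE8, 0x10, 0x20, 0x30, 0x40, 0x5D, 0xC3])
```

With this table, identify(EXAMPLE_CODE, 1) reports lib_func_b, because the four bytes after 0xE8 are treated as wildcards. A function recognised this way can be labelled and skipped rather than decompiled.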

The following are the main steps in converting executable programs into a procedural-based high-level language (HLL):

  • Decode the binary-file format.
  • Decode the machine instructions into assembly code for that machine. Extra smarts are needed to handle indirect transfers of control such as indirect calls and indexed jumps. If the targets of these are not all known, the decompilation will be incomplete for that procedure. Alternatively, human intervention may be required.
  • Perform semantic analysis to recover some low-level data types such as long variables, and to simplify the decoded instructions based on their semantics.
  • Store the information in a suitable intermediate representation. If a suitable intermediate language is used, the next two steps can be applied to any assembly language to generate code in any procedural HLL.
  • Perform data flow analysis to remove low-level aspects of the intermediate representation that do not exist in HLLs, e.g. registers, condition codes, stack references.
  • Perform control flow analysis to recover the control structures available in each procedure (i.e. loops, conditionals and their nesting level).
  • Perform type analysis to recover HLL data types such as arrays and structures. Recovery of classes requires extra analysis. Note: this is one of the hardest steps and may need human intervention.
  • Generate HLL code from the transformed intermediate code.
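
The control flow analysis step in the list above can be sketched for the simplest case, finding loops. The example below uses the textbook dominator-based approach: an edge whose target dominates its source is a back edge, and the nodes that can reach the source without passing through the target form the loop body. This is a minimal sketch, not the method of any particular decompiler. It assumes the control flow graph is a dict mapping each basic block name to its successor list, that every block appears as a key, and that every block is reachable from the entry. All names here are hypothetical.

```python
from typing import Dict, List, Set

CFG = Dict[str, List[str]]


def predecessors(cfg: CFG) -> Dict[str, Set[str]]:
    """Invert the successor map."""
    preds: Dict[str, Set[str]] = {n: set() for n in cfg}
    for node, succs in cfg.items():
        for succ in succs:
            preds[succ].add(node)
    return preds


def dominators(cfg: CFG, entry: str) -> Dict[str, Set[str]]:
    """Iterate to a fixed point: dom(n) = {n} | intersection of dom(preds)."""
    nodes = set(cfg)
    preds = predecessors(cfg)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def natural_loops(cfg: CFG, entry: str) -> Dict[str, Set[str]]:
    """Map each loop header to the set of blocks in its loop body."""
    dom = dominators(cfg, entry)
    preds = predecessors(cfg)
    loops: Dict[str, Set[str]] = {}
    for tail, succs in cfg.items():
        for header in succs:
            if header not in dom[tail]:
                continue  # not a back edge
            body = {header}
            stack = [tail]
            while stack:  # walk backwards from the tail, stopping at the header
                n = stack.pop()
                if n not in body:
                    body.add(n)
                    stack.extend(preds[n])
            loops.setdefault(header, set()).update(body)
    return loops


# Shape of a simple while loop: entry -> cond -> (body -> cond | exit).
EXAMPLE_CFG: CFG = {
    "entry": ["cond"],
    "cond": ["body", "exit"],
    "body": ["cond"],
    "exit": [],
}
```

For EXAMPLE_CFG, natural_loops(EXAMPLE_CFG, "entry") finds one loop headed by cond that contains cond and body. A structuring pass could then emit that loop as a while statement. Recovering conditionals and nesting levels requires further analysis that is not shown here.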

-- MikeVanEmmerik - 20 Nov 2001