There may be a few machine code patterns that are impossible to decompile automatically. These would therefore require expert human intervention to decompile successfully.
Some features, such as the original comments, variable names and function names, can never be recovered, although very powerful analyses may be able to suggest good alternatives (perhaps even better than the original, in some cases). These unrecoverable features are not the subject of this page, since correct decompilations can be generated (even if usually less readable than the original) without recovering the original comments or names.
Reference combined with casts
This is an example of a machine code pattern that I initially thought could not be decompiled automatically. However, an automatic decompiler can still produce correct, if less readable, code for it.
Consider a reference to a memory variable. In machine code, an expression such as sp-K, where sp is the stack pointer register and K is a constant, takes the address of a local variable, and that address could be passed as a parameter to a library function. It may not be known whether the callee will define or use the implied memory expression (in the example, m[sp-K], where m[x] means the memory at address x). The memory location may also be the coalescing of two or more original variables of different types, so that in the most readable decompilation the location would be split into separate variables in the decompiled output. Usually, the decompiler would be able to use the type of the reference to decide which of several live ranges the reference belongs to, i.e. which original variable the expression is the address of.
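As a rough illustration of how such coalescing can arise, the sketch below shows source code in which two locals of different types have non-overlapping live ranges, so a compiler is free to give both the same stack slot. The names read_float, read_int and original_source are hypothetical and only serve the illustration; whether a given compiler actually shares the slot depends on the compiler and its options.

```c
/* Hypothetical library routines that receive the address of a local. */
static void read_float(float *out) { *out = 1.5f; }   /* defines *out */
static void read_int(int *out) { *out = 42; }         /* defines *out */

int original_source(void) {
    int total = 0;
    {
        float f;                    /* live only inside this block */
        read_float(&f);
        total += (int)(f * 2.0f);   /* adds 3 */
    }
    {
        int i;                      /* live range does not overlap f's */
        read_int(&i);
        total += i;                 /* adds 42 */
    }
    return total;                   /* 45 */
}
```

If f and i end up at the same offset, the machine code contains two references of the form sp-K with the same K, and only the types expected by the callees hint that they were once two variables.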
For example, consider this fragment:
lea eax, [esp-24]
push eax
push 4
call malloc
push eax
call use-as-char-star
lea eax, [esp-24]
push eax
call use-as-float
lea eax, [esp-24]
push eax
call use-as-int
In the above example, the four bytes at m[esp-24] are used as a char*, as a float, and as an int. Are they all the same variable with at least two casts? Or does the call to use-as-float overwrite the original variable, and therefore reference a different variable from the one in the first reference? Here are two of several possible outputs:
char** px;                   char** px;
float* py;
int* pz;
px = (char**) malloc(4);     px = (char**) malloc(4);
use-as-char-star(px);        use-as-char-star(px);
use-as-float(py);            use-as-float((float*)px);
use-as-int(pz);              use-as-int((int*)px); /* or use-as-int(pz)? */
However, the second decompilation (the right-hand column), while arguably less readable because of the casts, is exactly as valid as the input program, so an automatic decompiler can always generate it and produce correct, working output. A user who wants to can then try to improve the program towards the original version by splitting the variable, at the risk that the program may no longer work.
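The risk in splitting can be sketched in C. The example below assumes, purely for illustration, that one callee defines the slot and a later one only uses it; define_as_int, use_as_int, single_variable, split_variables and union slot are hypothetical names, not part of the original example. The slot is sized with a union so that the sketch also works where a pointer is wider than four bytes.

```c
#include <stdlib.h>

/* One slot big enough for any of the three views used in the example. */
union slot { char *s; float f; int i; };

/* Hypothetical callees: the first defines the slot, the second only uses it. */
static void define_as_int(int *p) { *p = 7; }
static int use_as_int(const int *p) { return *p; }

/* Single variable with casts: both calls receive the same address,
   exactly as in the machine code. */
int single_variable(void) {
    char **px = malloc(sizeof(union slot));
    if (px == NULL) return -1;
    define_as_int((int *)px);
    int seen = use_as_int((int *)px);   /* sees the stored 7 */
    free(px);
    return seen;
}

/* Split variables: a wrong guess that the second call starts a new
   live range. Both locals are zero-initialised only so that the
   sketch has defined behaviour. */
int split_variables(void) {
    int y = 0;
    int z = 0;
    define_as_int(&y);
    return use_as_int(&z);              /* no longer sees the 7 */
}
```

Here single_variable() returns 7 and split_variables() returns 0. The cast-based version preserves the behaviour of the input whichever callee defines or uses the slot, whereas the split version is only correct if the guess about live ranges happens to be right.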
--
MikeVanEmmerik - 08 May 2006