Decompilation Faq Refutation

Program-Transformation.Org: The Program Transformation Wiki
Question 38.4 of the C++ FAQ LITE demonstrates a typical negative reaction to the basic question "how can I decompile a C++ executable file?". I don't mean any offense to the maintainers of this particular FAQ, or of any of the many naysayers; this appears to be the generally accepted wisdom. Hopefully, we can all learn something from picking apart the arguments one by one.

The original FAQ question is 38.4 How can I decompile an executable program back into C++ source code?.

Their answer is reproduced below in red; my refutation follows in default colours.

You gotta be kidding, right?

Certainly not. There are plenty of legitimate reasons to perform decompilation. See WhyDecompilation.

Here are a few of the many reasons this is not even remotely feasible:

  • What makes you think the program was written in C++ to begin with?

I don't have to think that at all. It doesn't matter what language the original program was compiled in, within reason. The semantics of the original program are there, all decompilation does is translate that binary form of the program to another form, which happens to be source code; for the purposes of refuting this FAQ answer, C++ source code.

  • Even if you are sure it was originally written (at least partially) in C++, which one of the gazillion C++ compilers produced it?

Again, it doesn't matter. Different compilers will produce different executable output, certainly; each of these is equivalent to all the others (assuming the compilers are competent), and each is equivalent to the original source code.

  • Even if you know the compiler, which particular version of the compiler was used?

It doesn't matter.

  • Even if you know the compiler's manufacturer and version number, what compile-time options were used?

It doesn't matter. All the possible outputs are equivalent. Optimisation usually doesn't hurt either; often it makes the decompiler's job slightly easier because there is a little bit less executable code to work through, and a little less of the detail in the executable file is thrown away. Optimisation can reduce the readability somewhat when the compiler inlines function calls. Even so, the result is still correct and readable, just (usually) with more repetition. Such inlining can be undone, perhaps automatically, if necessary and/or desirable.

  • Even if you know the compiler's manufacturer and version number and compile-time options, what third party libraries were linked-in, and what was their version?

It doesn't matter, as long as the decompiler has access to the signatures for the libraries (assuming that they are statically linked. Dynamically linked libraries are no problem; the executable file has to provide the name of the library function, so all the decompiler needs is a prototype of the function, to get its parameters and return types.) If this information is not available, then usually the library function will get decompiled too. Not what you want, but with a little knowledge of the original program's domain, it should not be difficult to figure out a prototype for each library function.

  • Even if you know all that stuff, most executables have had their debugging information stripped out, so the resulting decompiled code will be totally unreadable.

The original comments will be missing, certainly, as will the original variable names. So you will probably end up with generic variable names such as local2 and global17. However, that's far from unreadable, if the structure of the program is good (proper for loops, if/then/else, switch, minimal gotos, proper data types, etc. All these can be achieved.) Even if you call this output unreadable, if the decompiler has done its job correctly, the output will be recompilable. This makes a whole set of applications for decompilation possible. For example, you can recompile the unreadable output with various optimisations specific to your requirements. You may be able to port the program to another platform with minimal understanding of how it works. You may find it easier to spot malware in C++ code with no comments or labels than it is to spot it in a disassembly with no comments or labels. And so on.

It is possible (though admittedly it seems unlikely at this stage) that some form of artificial intelligence will be able to insert meaningful comments and/or variable names, based on design patterns detected in the program. Such automatically generated comments and names could in some cases be more valuable (since they will presumably mostly be correct) than the original comments, which in some cases are out of date and may even be misleading. Sometimes even no information is better than wrong information.

  • Even if you know everything about the compiler, manufacturer, version number, compile-time options, third party libraries, and debugging information, the cost of writing a decompiler that works with even one particular compiler and has even a modest success rate at generating code would be significant on the par with writing the compiler itself from scratch.

Certainly, writing a decompiler is non trivial. I would not recommend it for those without detailed knowledge of assembly language, for example. But why write a decompiler that is specific to one compiler, manufacturer, etc? Machine code is a language; it just happens to be a binary language. General program transformation techniques can be used to transform programs written in machine language into equivalent programs that are in C++. A good compiler doesn't restruct itself to a particular version number of programmer, does it?

But the biggest question is not how you can decompile someone's code, but why do you want to do this?

Again, see WhyDecompilation. Plus, it doesn't have to be somone else's code, as you indicate below (recovering your own lost source code).

If you're trying to reverse-engineer someone else's code, shame on you; go find honest work.

Why does it have to be someone else's code? It's been estimated that about 3% to 5% of the world's source code is missing. Added to that, some of the original compilers are no longer available, or the hardware to run them is no longer convenient. (Consider trying to run a DOS based compiler that requires XMS memory... could you get it to run conveniently? It was only about 10 years ago that such compilers were in regular use.)

However, what's to stop the owner of some executable from coming to me, asking me to decompile his or her program for them? Why can't I provide this service, which may save their company from bankrupcy, or save them money by enabling them to run an old program until they have time to develop a replacement for it?

Also, there are legal reasons to reverse engineer someone else's work. Consider interoperability, like a driver that doesn't work on my system. In most countries, I can reverse engineer the code to the extent necessary for interoperability (in this case, to fix the driver). Some countries also have provisions for studying algorithms for research purposes, and there is also the largely untested area of abandoned software.

Yes, decompilation can be used for illegal purposes. But compilers can be used to compile malware, video recorders can be used to make illegal copies of copyrighted movies, and knives can be used for unlawful killing. Should all these things be banned too? No, because they all have "substantial non-infringing uses".

If you're trying to recover from losing your own source, the best suggestion I have is to make better backups next time.

So you've never had a backup fail? You back up everything you think you'll need? Nobody ever leaves your company? You will never get taken over by another company? Accidents happen. Sometimes, source code isn't missing, but you can't find the right source code to match a particular executable. This is basically what happened to a company that approached me.

I have a better suggestion: use backups by all means, but when they fail, and they will from time to time, there is an option: decompilation. Certainly, it is not perfect, certainly not yet, but let's not dismiss the idea outright because you think it's an inherently bad thing.

(Don't bother writing me email saying there are legitimate reasons for decompiling; I didn't say there weren't.)

Good, then you agree that there are substantial non infringing uses for decompilation. The Sony Betamax case tells us that technology with substantial non infringing uses is legal. Yes, there are those who would prefer to outlaw certain things, such as Peer to Peer software distribution programs, but they seem to want to shoot the messenger rather than address the real concerns (in the case of P2P, how to sensibly license copyrighted materials; in the case of decompilation, how to secure proprietary code distributed in binary form. For a long time, they could pretend that executable programs are not readable; with improvements in decompilation, this is no longer true.)

See also BobStoutRefutation.

-- MikeVanEmmerik - 07 Jul 2005; updated 01 Sep 2005.