The Boomerang Decompiler and Tests

Boomerang is an attempt at a complete, retargetable decompiler, released under a BSD style (open source) license.

Introduction

It was intended to be highly modular, so that different parts of the decompiler can be replaced with experimental modules. (As of mid 2003, this has largely not been realised.) It will one day be interactive, a la IDA Pro, because some things (not just variable names and comments, though these are obviously very important) require expert intervention.

As of August 2003, there is code that can be downloaded (CVS only; nothing will be released for many months). It's now at the stage where it can decompile correctly several small Pentium and SPARC programs. The dataflow design is finally complete; it uses a combination of fairly standard SSA (Static Single Assignment) and a small theorem prover. It recovers parameters and return values fairly well.

The next major step is that of type analysis.

The SourceForge page is http://boomerang.sourceforge.net.

Tests

Fibo

The original source code for Fibo is:
#include < stdio.h >
int fib (int x)
{
    if (x > 1)
        return (fib(x - 1) + fib(x - 2));
    else return (x);
}
int main (void)
{   int number, value;

    printf ("Input number: ");
    scanf ("%d", &number);
    value = fib(number);
    printf("fibonacci(%d) = %d\n", number, value);
    return (0);
}
As of August 2003, this is the output from Boomerang:
int main()
{
int local0;
...
int local9;
    printf("Input number: ");
    scanf("%d", &local7);
    if (local7 <= 1) {
        local9 = local7;
    } else {
        local9 = fib(local7 - 1);
        local8 = fib(local7 - 2);
        local9 = local8 + local9;
    }
    printf("fibonacci(%d) = %d\n", local7, local9);
    local9 = 0;
    return local9;
}

int fib(int param8)
{
int local0;
...
int local7;
    if (param8 <= 1) {
        local7 = param8;
    } else {
        local7 = fib(param8 - 1);
        local6 = fib(param8 - 2);
        local7 = local6 + local7;
    }
    return local7;
}

Boomerang is finally able to process Fibo. The outout compiles and runs correctly with no modifications. There are a few more local variables that I would like, but this is a minor point, and could be fixed soon anyway. The Pentium and Sparc versions both decompile to very similar code, despit the very different code at the machine language level.

TwoProc

Here is a very simple test program:
int proc1(int a, int b) {
    return a + b;
}

int main() {
    printf("%i\n", proc1(3, 4));
}
The Boomerang output is
int main()
{
int local0;
...
int local5;
    local5 = proc1(4, 3);
    local5 = printf("%i\n", local5);
    return local5;
}

int proc1(int param4, int param7)
{
int local0;
int local1;
    local1 = param4 + param7;
    return local1;
}
Again, the code recompiles and runs correctly with no modifications. Again, there are too many local variables. At one stage, with global dataflow analysis, main transformed to essentially "printf("%i\n", 7)", but this no longer happens (dataflow is local to a procedure again).

Switch

Original C source code:
int main(int argc) {
    switch(argc) {
        case 2:
            printf("Two!\n"); break;
        case 3:
            printf("Three!\n"); break;
        case 4:
            printf("Four!\n"); break;
        case 5:
            printf( "Five!\n"); break;
        case 6:
            printf( "Six!\n"); break;
        case 7:
            printf( "Seven!\n"); break;
        default:
            printf( "Other!\n"); break;
    }
    return 0;
}

Here is the Boomerang output (August 2003):

int main(int argc, char** argv, int param4)
{
int local0;
...
int local5;
    if (argc - 2 > 5) {
        local2 = 134517890;
    } else {
        switch(local0) {
        case 2:
            local2 = 134517848;
        break;
        case 3:
            local2 = 134517854;
        break;
        case 4:
            local2 = 134517862;
        break;
        case 5:
            local2 = 134517869;
        break;
        case 6:
            local2 = 134517876;
        break;
        case 7:
            local2 = 134517882;
        break;
        }
    }
    printf(local2, local5, param4, argc, argv);
    local4 = 0;
    return local4;
}

Now that the first parameter to printf is a variable, the strings are not converted. Type analysis (or even simple type propagation) will fix this problem. There is also a problem that the switch variable is not assigned to; this is code from UQBT that has not been converted correctly (it won't be hard to fix). It would be nice to convert the if statement near the start into a "default" case. Again, there are too many locals.


CategoryDecompilation

Revision: r1.6 - 20 Aug 2003 - 01:56 - MikeVanEmmerik
Copyright © 1999-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback