---+ The PILER Decompilation System

%TOC%

The PILER decompilation system is documented in [ [[http://www.program-transformation.org/twiki/bin/view/Transform/HistoryOfDecompilation1#RefBarb74][Barb74]] ]. However, that document is probably only available in microfiche form, and would be difficult for many to access. As the first attempt at a general decompiler, I thought it was worth summarising in a little more detail.

The PILER decompiler system follows (almost) the classic three phase structure:

   * Interpretation of the subject (input) program's "commands" (the Interpreter)
   * Analysis of the source program for function and data usage (the Analyser)
   * Expression of the source program in the target language (the Converter)

Each of these phases is a totally separate program, and could be run on three different machines! The interpreter phase was always run on the source machine.

Piler.png

A reproduction of Figure 2 of [Barb74], complete with upper case text.

---++ The "interpreter" phase

The "interpreter" phase (not a program interpreter as we would use the term now) ran on the source machine, and had the input program loaded with it (just insert the cards into the deck!). The interpreter therefore used the loader and operating system of the source machine. The output of this phase (the first intermediate representation) was called Micro Form. This was supposed to be general, but it reflected the narrow variety of machines at that time. Instead of making this a clean interface, they decided that Micro Form could be in any form that was convenient for the interpreter phase, as long as the subroutine RMR (Read Microform Record) in the Analyser phase could read it.

The interpreter phase had three subphases: loader, analysis (not the same as the analyser phase; this subphase just attempts to separate code from data), and interpretation. The analysis subphase generated several data structures: a memory map, symbol table, subroutine table, data reference table, and an input/output buffer table. It seems that the opcode field was often used to decide whether a word was an instruction, data, or I/O control. "Secondary microforms" were appended for indexed instructions and the like.

---++ Micro Form

Microform was the low level intermediate representation. It was a three-address form.
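[Barb74] does not give the actual record layout, but to make the "three-address" claim concrete, here is a minimal sketch in Python of what a Microform record might have carried. The field names are my assumptions, not PILER's:

<verbatim>
from dataclasses import dataclass

@dataclass
class MicroformRecord:
    # Hypothetical field names: [Barb74] does not give the actual layout.
    op: str      # abstract operation, e.g. "LOAD", "ADD", "STORE", "JUMP"
    src1: str    # first source operand (register, memory cell, or constant)
    src2: str    # second source operand, or "-" if unused
    dest: str    # destination operand

# A one-address machine instruction such as ADD X (accumulator += cell X)
# would interpret into an explicit three-address record:
rec = MicroformRecord(op="ADD", src1="ACC", src2="X", dest="ACC")
</verbatim>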
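The only fixed point of the interface was the RMR subroutine: the stored form of the Microform could be anything the interpreter found convenient, so long as RMR could decode it. A sketch of that contract, reusing the record type above; the one-record-per-line text encoding is an invention for the sketch:

<verbatim>
from typing import Iterator

def rmr(path: str) -> Iterator[MicroformRecord]:
    """Read Microform Records (RMR).

    Only this subroutine needs to understand the stored format; the rest
    of the Analyser just sees a stream of records. Assumed encoding here:
    one record per line, "OP SRC1 SRC2 DEST", unused operands written "-".
    """
    with open(path) as f:
        for line in f:
            op, src1, src2, dest = line.split()
            yield MicroformRecord(op, src1, src2, dest)
</verbatim>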
---++ The analyser phase

This phase could be run on any computer, not necessarily the source or target machine. The basic procedures were (from [Barb74]):

   * First level analysis. Memory mapping for separation of program instructions and data. Division of the program into logical blocks. Preliminary "timing analysis". Flagging of indexed or modified branch addresses and operation codes. Subroutine analysis.
   * Data analysis. Historical use of data locations and registers. Formats of data.
   * Second level analysis. Analysis of modified addresses and operation codes, and completion of first level analysis and data analysis for any additional segments. Program reduction.
   * "Timing analysis". Assignment of computational priorities based on data usage requirements.
   * Third level analysis. Reduction of the program to the functional level, rather than the instruction level.
   * Preparation of flowchart blocks.
   * Preparation of the Intermediate Form subject program.

The following programs were part of the analyser phase:

   * Reader (contains the Read Microform Record (RMR) subroutine)
   * Data handler
   * Analyser
   * Flow charter
   * Program modifier (accepts directives from the user to modify the subject program)

The Reader was overlaid by the analyser; its job was to read the Microform.

Some of the analyses included jump elimination and crude loop analysis. "Register usage analysis" seems to have been a crude form of dataflow analysis, but pattern based! There were 9 patterns (somewhat high level, e.g. any load followed by another load to the same register). Most patterns seemed to be aimed at removing registers from the intermediate form; those not removed were assigned "pseudo memory locations". A sketch of how such a pattern might be applied follows.
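This sketch works over the MicroformRecord stream from earlier. Only the load-after-load pattern is taken from [Barb74]; the rest (register names beginning with "R", the PSEUDO_ naming) is my invention:

<verbatim>
def apply_register_patterns(records: list[MicroformRecord]) -> list[MicroformRecord]:
    """Pattern-based register removal, in the spirit of PILER's 9 patterns."""
    out: list[MicroformRecord] = []
    for rec in records:
        # Pattern from [Barb74]: a load into a register immediately followed
        # by another load into the same register makes the first load dead
        # (provided the second load does not itself read that register).
        if (out
                and rec.op == "LOAD" and out[-1].op == "LOAD"
                and out[-1].dest == rec.dest
                and rec.dest not in (rec.src1, rec.src2)):
            out.pop()  # discard the dead load
        out.append(rec)

    # Registers that survive the patterns get "pseudo memory locations".
    def pseudo(operand: str) -> str:
        # Assumption: register operands are named "R0", "R1", ...
        return "PSEUDO_" + operand if operand.startswith("R") else operand

    return [MicroformRecord(r.op, pseudo(r.src1), pseudo(r.src2), pseudo(r.dest))
            for r in out]
</verbatim>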
In summary, the analyses were not very general, but with the simple machines of the day, they seemed to get away with it.

The "timing analysis" does not appear to be associated with the execution time of instructions, but rather with the sequence of uses of data. Perhaps it had to do with splitting live ranges, or with somehow coping with self modifying code. It seems to have been associated with the "precedence hierarchy" of logical blocks (basic blocks), to "free the program from programmer- and hardware- dictated arbitrariness", and to "locate elements which depend on their relative position in the program flow for their content or meaning".

The intermediate form has such high level concepts as indexes (indexed variables, such as arrays). However, it falls short of structuring actual DO (etc.) loops; that is the job of the Converter.

The Data Handler seems to have been an internal housekeeping program, handling the details of overlapped I/O, look-ahead, "re-linking" records after a deletion or insertion, and so on.

It was possible to make changes to the program by "modifying" the flowchart (the Program Modifier). This was achieved with commands (input on punched cards); the following is a subset:

<verbatim>
SELECT Block Identifier.
INSERT Block Identifier IN PATH ______.
EXIT PATH ______ TO Block Identifier.
DATA Flow chart symbol,Size,Data Block.
</verbatim>

If the inserted block was a subroutine:

<verbatim>
RETURN PATH ______ TO Block Identifier.
</verbatim>

(Subroutines seemed to have multiple exits in those days.)

<verbatim>
DELETE Block Identifier.
DELETE PATH ______.
</verbatim>

Flow chart modification affected only the intermediate form.

---++ Intermediate Form

Intermediate Form is the high level intermediate representation. It was designed with the following languages as likely targets: FORTRAN, COBOL, ALGOL, JOVIAL, and PL/1. To give a flavour, here are some of the codes:

<verbatim>
Mathematical Symbols
  001 Plus               010 Reciprocal
  002 Minus              011 Exponent
  003 Unary minus        012 Boolean OR
  004 Inverse subtract   013 AND
  005 Multiply           014 EXTRACT
  006 Divide             015 EXCLUSIVE OR
  007 Inverse divide     016 EQUIVALENT
  020 Equals             017 CONDITIONAL

Internal Functions
  140 I/O                Subfields:
                           01 Test       10 Input
                           02 Position   20 Close
                           04 Output     40 Open
  040 Round              050 Modulo
  041 Normalise          051 Remainder
  042 Adjoin             052 Absolute value
  043 Largest integer    053 Set ON (-,1)
  044 Smallest integer   054 Set OFF (+,0)
  045 Maximum
  046 Minimum
  100 Test               Subfields:
  101 Compare              01 Match
  102 Search               02 Mismatch
  103 Sort                 03 Max value in array
  104 Enter                04 Min value in array
  105 Move                 05 Greater than comparand
                           06 Greater or equal to comparand
                           07 Less or equal to comparand
                           10 Between limits of 2 comparands
                           11 Next lower than comparand
                           12 Next higher than comparand
  120 Internal procedure
  130 External procedure

Operand types
  01 Constant
  02 Literal
  04 Variable
  05 Subscripted variable
  06 Subscript
  10 Function
  11-16 Parameters as 01-06 above
  20 I/O list
  30 Format list
</verbatim>

---++ The Converter phase

This phase wrote the high level language. Apparently there were so many dialects of COBOL in those days that they suggested the Converter should handle all the variations. They wrote "skeletal converters" for FORTRAN and COBOL (they claim that most of their effort was spent on the analysis phase).

---++ Conclusion

Although this appears to have been a genuine attempt at a flexible, general system, it seems to have assumed many characteristics of machines common in its day, but not common now. It is a real shame that it was never properly finished (as far as I can tell), e.g. that more interpreter phases were never written.

-- Main.MikeVanEmmerik - 07 Jul 2003