---+ The PILER Decompilation System

%TOC%

The PILER decompilation system is documented in [ [[http://www.program-transformation.org/twiki/bin/view/Transform/HistoryOfDecompilation1#RefBarb74][Barb74]] ]. However, that document is probably only available in microfiche form, and would be difficult for many to access. As the first attempt at a general decompiler, I thought it was worth summarising in a little more detail.

The PILER decompiler system follows (almost) the classic three phase structure:

   * Interpretation of the subject (input) program's "commands" (the Interpreter)
   * Analysis of the source program for function and data usage (the Analyser)
   * Expression of the source program in the target language (the Converter)

Each of these phases is a totally separate program, and could be run on three different machines! The interpreter phase was always run on the source machine.

Piler.png

A reproduction of Figure 2 of [Barb74], complete with upper case text.

---++ The "interpreter" phase

The "interpreter" phase (not a program interpreter as we would use the term now) ran on the source machine, and had the input program loaded with it (just insert the cards into the deck!). The interpreter therefore used the loader and operating system of the source machine. The output of this phase (the first intermediate representation) was called Micro Form. This was supposed to be general, but it reflected the narrow variety of machines at that time. Instead of making this a clean interface, they decided that Micro Form could be in any form that was convenient for the interpreter phase, as long as the subroutine RMR (Read Microform Record) in the Analyser phase could read it.

The interpreter phase had three subphases: loader, analysis (not the same as the analyser phase; this subphase just attempts to separate code from data), and interpretation. The analysis subphase generated several data structures: a memory map, symbol table, subroutine table, data reference table, and an input/output buffer table. It seems that the opcode field was often used to decide whether a word was an instruction, data, or I/O control. "Secondary microforms" were appended for indexed instructions and the like.

---++ Micro Form

Microform was the low level intermediate representation. It was a three-address form.
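[Barb74] does not give the actual record layout, but to make the "three-address" claim concrete, here is a minimal sketch in Python of what a Microform record might have carried. The field names are my assumptions, not PILER's:

<verbatim>
from dataclasses import dataclass

@dataclass
class MicroformRecord:
    # Hypothetical field names: [Barb74] does not give the actual layout.
    op: str      # abstract operation, e.g. "LOAD", "ADD", "STORE", "JUMP"
    src1: str    # first source operand (register, memory cell, or constant)
    src2: str    # second source operand, or "-" if unused
    dest: str    # destination operand

# A one-address machine instruction such as ADD X (accumulator += cell X)
# would interpret into an explicit three-address record:
rec = MicroformRecord(op="ADD", src1="ACC", src2="X", dest="ACC")
</verbatim>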
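The only fixed point of the interface was the RMR subroutine: the stored form of the Microform could be anything the interpreter found convenient, so long as RMR could decode it. A sketch of that contract, reusing the record type above; the one-record-per-line text encoding is an invention for the sketch:

<verbatim>
from typing import Iterator

def rmr(path: str) -> Iterator[MicroformRecord]:
    """Read Microform Records (RMR).

    Only this subroutine needs to understand the stored format; the rest
    of the Analyser just sees a stream of records. Assumed encoding here:
    one record per line, "OP SRC1 SRC2 DEST", unused operands written "-".
    """
    with open(path) as f:
        for line in f:
            op, src1, src2, dest = line.split()
            yield MicroformRecord(op, src1, src2, dest)
</verbatim>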
---++ The analyser phase

This phase could be run on any computer, not necessarily the source or target machine. The basic procedures were (from [Barb74]):

   * First level analysis. Memory mapping for separation of program instructions and data. Division of the program into logical blocks. Preliminary "timing analysis". Flagging of indexed or modified branch addresses and operation codes. Subroutine analysis.
   * Data analysis. Historical use of data locations and registers. Formats of data.
   * Second level analysis. Analysis of modified addresses and operation codes, and completion of first level analysis and data analysis for any additional segments. Program reduction.
   * "Timing analysis". Assignment of computational priorities based on data usage requirements.
   * Third level analysis. Reduction of the program to the functional level, rather than the instruction level.
   * Preparation of flowchart blocks.
   * Preparation of the Intermediate Form subject program.

The following programs were part of the analyser phase:

   * Reader (contains the Read Microform Record (RMR) subroutine)
   * Data handler
   * Analyser
   * Flow charter
   * Program modifier (accepts directives from the user to modify the subject program)

The Reader was overlaid by the analyser; its job was to read the Microform.

Some of the analyses included jump elimination and crude loop analysis. "Register usage analysis" seems to have been a crude form of dataflow analysis, but pattern based! There were 9 patterns (somewhat high level, e.g. any load followed by another load to the same register). Most patterns seemed to be aimed at removing registers from the intermediate form; those not removed were assigned "pseudo memory locations". A sketch of how such a pattern might be applied follows.
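This sketch works over the MicroformRecord stream from earlier. Only the load-after-load pattern is taken from [Barb74]; the rest (register names beginning with "R", the PSEUDO_ naming) is my invention:

<verbatim>
def apply_register_patterns(records: list[MicroformRecord]) -> list[MicroformRecord]:
    """Pattern-based register removal, in the spirit of PILER's 9 patterns."""
    out: list[MicroformRecord] = []
    for rec in records:
        # Pattern from [Barb74]: a load into a register immediately followed
        # by another load into the same register makes the first load dead
        # (provided the second load does not itself read that register).
        if (out
                and rec.op == "LOAD" and out[-1].op == "LOAD"
                and out[-1].dest == rec.dest
                and rec.dest not in (rec.src1, rec.src2)):
            out.pop()  # discard the dead load
        out.append(rec)

    # Registers that survive the patterns get "pseudo memory locations".
    def pseudo(operand: str) -> str:
        # Assumption: register operands are named "R0", "R1", ...
        return "PSEUDO_" + operand if operand.startswith("R") else operand

    return [MicroformRecord(r.op, pseudo(r.src1), pseudo(r.src2), pseudo(r.dest))
            for r in out]
</verbatim>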
In summary, the analyses were not very general, but with the simple machines of the day, they seemed to get away with it.

The "timing analysis" does not appear to be associated with the execution time of instructions, but rather with the sequence of uses of data. Perhaps it had to do with splitting live ranges, or with somehow coping with self modifying code. It seems to have been associated with the "precedence hierarchy" of logical blocks (basic blocks), to "free the program from programmer- and hardware- dictated arbitrariness", and to "locate elements which depend on their relative position in the program flow for their content or meaning".

The intermediate form has such high level concepts as indexes (indexed variables, such as arrays). However, it falls short of structuring actual DO (etc.) loops; that is the job of the Converter.

The Data Handler seems to have been an internal housekeeping program, handling the details of overlapped I/O, look-ahead, "re-linking" records after a deletion or insertion, and so on.

It was possible to make changes to the program by "modifying" the flowchart (the Program Modifier). This was achieved with commands (input on punched cards); the following is a subset:

<verbatim>
SELECT Block Identifier.
INSERT Block Identifier IN PATH ______.
EXIT PATH ______ TO Block Identifier.
DATA Flow chart symbol,Size,Data Block.
</verbatim>

If the inserted block was a subroutine:

<verbatim>
RETURN PATH ______ TO Block Identifier.
</verbatim>

(Subroutines seemed to have multiple exits in those days.)

<verbatim>
DELETE Block Identifier.
DELETE PATH ______.
</verbatim>

Flow chart modification affected only the intermediate form.

---++ Intermediate Form

Intermediate Form is the high level intermediate representation. It was designed with the following languages as likely targets: FORTRAN, COBOL, ALGOL, JOVIAL, and PL/1. To give a flavour, here are some of the codes:

<verbatim>
Mathematical Symbols
  001 Plus               010 Reciprocal
  002 Minus              011 Exponent
  003 Unary minus        012 Boolean OR
  004 Inverse subtract   013 AND
  005 Multiply           014 EXTRACT
  006 Divide             015 EXCLUSIVE OR
  007 Inverse divide     016 EQUIVALENT
  020 Equals             017 CONDITIONAL

Internal Functions
  140 I/O                Subfields:
                           01 Test       10 Input
                           02 Position   20 Close
                           04 Output     40 Open
  040 Round              050 Modulo
  041 Normalise          051 Remainder
  042 Adjoin             052 Absolute value
  043 Largest integer    053 Set ON (-,1)
  044 Smallest integer   054 Set OFF (+,0)
  045 Maximum
  046 Minimum
  100 Test               Subfields:
  101 Compare              01 Match
  102 Search               02 Mismatch
  103 Sort                 03 Max value in array
  104 Enter                04 Min value in array
  105 Move                 05 Greater than comparand
                           06 Greater or equal to comparand
                           07 Less or equal to comparand
                           10 Between limits of 2 comparands
                           11 Next lower than comparand
                           12 Next higher than comparand
  120 Internal procedure
  130 External procedure

Operand types
  01 Constant
  02 Literal
  04 Variable
  05 Subscripted variable
  06 Subscript
  10 Function
  11-16 Parameters as 01-06 above
  20 I/O list
  30 Format list
</verbatim>

---++ The Converter phase

This phase wrote the high level language. Apparently there were so many dialects of COBOL in those days that they suggested the Converter should handle all the variations. They wrote "skeletal converters" for FORTRAN and COBOL (they claim that most of their effort was spent on the analysis phase).

---++ Conclusion

Although this appears to have been a genuine attempt at a flexible, general system, it seems to have assumed many characteristics of machines common in its day, but not common now. It is a real shame that it was never properly finished (as far as I can tell), e.g. that more interpreter phases were never written.

-- Main.MikeVanEmmerik - 07 Jul 2003