Program Transformation

Program-Transformation.Org: The Program Transformation Wiki

(See also ModelTransformation)

A Definition

A program is a structured object with semantics. The structure allows us to transform a program. The semantics gives us the means to compare programs and to reason about the validity of transformations. Semantics includes the extensional and intensional behaviour of a program. A programming language is a collection of programs. This is a rather broad definition that is designed to cover proper programming languages (with an operational interpretation), but also data formats and domain-specific languages.

Programming languages can be clustered into classes with structural and/or semantic similarities. One of the aims of a general framework for program transformation is to define transformations that are reusable accross as wide a range of languages as possible. For example, the notion of variable and variable binding is shared by all programming languages. Transformations dealing with variables such as bound variable renaming, substitution, and unification can be defined in a very generic manner for all languages at once.

Program transformation is the act of changing one program into another. The term program transformation is also used for a formal description of an algorithm that implements program transformation. The language in which the program being transformed and the resulting program are written are called the source and target languages, respectively.

Program transformation is known under many different names

A Taxonomy of Program Transformation

Program transformation is used in many areas of software engineering, including compiler construction, software visualization, documentation generation, and automatic software renovation. In all these applications we can distinguish two main scenarios, i.e., ones where the source and target language are different (translations) from scenarios where they are the same (rephrasings). These main scenarios can be refined into a number of typical sub-scenarios based on their effect on the level of abstraction of a program and to what extent they preserve the semantics of a program. This refinement results in the following TransformationTaxonomy.

Translation
Rephrasing
- ProgramNormalization
- ProgramOptimization
- ProgramRefactoring
  - ProgramObfuscation
- ProgramReflection^?
- SoftwareRenovation / ReEngineering

Translation

In a translation scenario a program is transformed from a source language into a program in a different target language. Translation scenarios can be distinguished by their effect on the level of abstraction of a program. Although translations aim at preserving the extensional semantics of a program, it is usually not possible to retain all information across a translation. Examples of translation scenarios are synthesis, migration, reverse engineering, and analysis. Their relations are illustrated by the diagram in the following figure:

Program Synthesis

ProgramSynthesis is a class of transformations that lower the level of abstraction of a program. In ProgramRefinement an implementation is derived from a high-level specification such that the implementation satisfies the specification. Compilation is a form of synthesis in which a program in a high-level language is transformed to machine code. This translation is usually achieved in several phases in which a high-level language is first translted into an intermediate representation. Instruction selection then translates the intermediate representation into machine instructions. Other examples of synthesis are parser and pretty-printer generation from context-free grammars.

Program Migration

In ProgramMigration a program is transformed to another language at the same level of abstraction. This can be a translation between dialects, for example, transforming a Fortran77 program to an equivalent Fortran90 program or a translation from one language to another, e.g., porting a Pascal program to C.

In some cases you may want to keep programming with the source language (Fortran77 in the example), and migrate your program (to Fortran90) each time you compile. It can be useful when you program in some kind of "old but very powerful source language" that you don't want to throw away, but you don't want to maintain the compiler.

Reverse Engineering

The purpose of ReverseEngineering is to extract from a low-level program a high-level program or specification, or at least some higher-level aspects. Reverse engineering raises the level of abstraction and is the dual of synthesis. Examples of reverse engineering are decompilation in which an object program is translated into a high-level program, architecture extraction in which the design of a program is derived, documentation generation, and software visualization in which some aspect of a program is depicted in an abstract way.

Program Analysis

ProgramAnalysis reduces a program to one aspect such as its control-flow. Analysis can thus be considered a transformation to a sub-language or an aspect language.

Rephrasing

Rephrasings are transformations that transform a program into a different program in the same language, i.e., source and target language are the same. In general, rephrasings try to say the same thing in different words thereby aiming at improving some aspect of the program, which entails that they change the semantics of the program. The main sub-scenarios of rephrasing are normalization, optimization, refactoring, and renovation.

Program Normalization

A normalization reduces a program to a program in a sub-language, with the purpose of decreasing its syntactic complexity. Desugaring is a kind of normalization in which some of the constructs (syntactic sugar) of a language are eliminated by translating them into more fundamental contructs. For example, the Haskell language definition describes for many constructs how they can be desugared to a kernel language. Other examples are module flattening and the definition of EBNF constructs in pure BNF as is done for example in the SDF2 normalizer . Simplification is a more general kind of normalization in which a program is reduced to a normal (standard) form, without necessarily removing simplified constructs. For example, consider canonicalization of intermediate representation and algebraic simplification of expressions. Note that normal form does not necessarily correspond to being a normal form with respect to a set of rewrite rules.

Program Optimization

An optimization is a transformation that improves the run-time and/or space performance of a program. Examples of optimization are fusion, inlining, constant propagation, constant folding, common-subexpression elimination, and dead code elimination.

Program Refactoring

ProgramRefactoring is a transformation that improves the design of a program by restructuring it such that it becomes easier to understand while preserving its functionality. ProgramObfuscation is a transformation that makes a program harder to understand by renaming variables, inserting dead code, etc. Obfuscation is done to hide the business rules embedded in software by making it harder to reverse engineer the program.

Program Reflection^?

ProgramReflection^? is a transformation that changes a program to compute some aspect of the program itself. For instance, one can transform a program such that, in addition to its usual behaviour, the program also calculates a trace of its own execution. Other uses might include the generation of self-profiling programs.

Software Renovation

In software renovation the extensional behaviour of a program is changed in order to repair an error or to bring it up to date with respect to changed requirements. Examples are repairing a Y2K bug, or converting a program to deal with the Euro.