Accepted Papers
ACM SIGPLAN 2010 Workshop on Partial Evaluation and Program Manipulation
The following papers (in no particular order) will be presented at the workshop.
Regular research papers:
- Christopher Brown and Simon Thompson. Clone Detection and Elimination for Haskell.
Abstract: Duplicated code is a well-known problem in software maintenance and refactoring, and several studies have shown that duplicated code results in projects that are difficult to maintain. Code clones tend to increase program size and make maintenance and code understanding more complex and time-consuming.
This paper presents a new technique for the detection and removal of duplicated Haskell code. The system is implemented within the refactoring framework of the Haskell Refactorer (HaRe), and uses an Abstract Syntax Tree (AST)-based approach. Detection of duplicate code is automatic, while elimination is semi-automatic, allowing the user to step through the clone removal. The system is explained, then an example is given to show how it works in practice.
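As a hypothetical illustration of the kind of transformation involved (the function names below are invented for this sketch, not taken from the paper), two near-identical definitions can be abstracted into a shared function:
```haskell
-- Before: two clones differing only in the exponent they use.
sumSquares :: [Int] -> Int
sumSquares xs = foldr (+) 0 (map (^ 2) xs)

sumCubes :: [Int] -> Int
sumCubes xs = foldr (+) 0 (map (^ 3) xs)

-- After clone elimination: the common pattern is extracted, and the
-- original definitions become one-liners calling it.
sumPowers :: Int -> [Int] -> Int
sumPowers n xs = foldr (+) 0 (map (^ n) xs)

sumSquares', sumCubes' :: [Int] -> Int
sumSquares' = sumPowers 2
sumCubes'   = sumPowers 3
```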
- Adrian Riesco and Juan Rodriguez-Hortala. Programming with Singular and Plural Non-deterministic Functions.
Abstract: Non-strict non-deterministic functions are one of the most distinctive features of functional-logic languages. Traditionally, two semantic alternatives have been considered for this kind of function: call-time choice and run-time choice. While the former is the standard choice of modern implementations of FLP, the latter lacks some basic properties---mainly compositionality---that have prevented its use in practical FLP implementations. Recently, a new compositional plural semantics for FLP has been proposed. Although this semantics allows an elegant encoding of some problems---in particular those involving an implicit manipulation of sets of values---call-time choice still remains the best option for many common programming patterns.
In this paper we explore the expressive possibilities of combining singular and plural non-determinism. After formalizing the intended semantics by means of a logic calculus, we present several significant examples that exploit the capabilities of the semantics. These examples have been developed and tested in a Maude-based prototype whose implementation is outlined.
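The classic coin example shows the difference between the two choices. As a rough analogy only (the paper's setting is functional-logic programming and its prototype is Maude-based; this list-monad rendering in Haskell is merely suggestive):
```haskell
import Control.Applicative (liftA2)

-- A non-deterministic value with two results, modelled as a list.
coin :: [Int]
coin = [0, 1]

-- Call-time (singular) choice: the argument is chosen once and the
-- chosen value is shared, so doubling the coin yields only 0 and 2.
doubleCallTime :: [Int]
doubleCallTime = do x <- coin; return (x + x)   -- [0,2]

-- Run-time (plural) choice: each occurrence of the argument may
-- choose independently, so 0+1 and 1+0 also appear.
doubleRunTime :: [Int]
doubleRunTime = liftA2 (+) coin coin            -- [0,1,1,2]
```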
- Andrew Moss and Dan Page. Bridging the Gap Between Symbolic and Efficient AES Implementations.
Abstract: The Advanced Encryption Standard (AES) is a symmetric block cipher used to encrypt information in many applications. As a result of its widespread use, there have been many published results showing efficient software implementations for a wide range of architectures. The most efficient of these programs are written using very low-level techniques: platform-dependent assembly is used to schedule instructions, and most of the cipher is precomputed into lookup tables expressed as numerical constants. The need to resort to such low-level techniques can be seen as a failure to provide suitably efficient high-level languages to the cryptographic community. This paper investigates the language features necessary to express AES in a natural formulation, and the program transformations necessary to produce efficient but portable code.
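A small Haskell sketch (ours, not from the paper) of the kind of precomputation the abstract alludes to: doubling in GF(2^8), the field step inside AES's MixColumns, turned into a table of numerical constants:
```haskell
import Data.Array (Array, listArray, (!))
import Data.Bits (shiftL, testBit, xor)
import Data.Word (Word8)

-- Doubling in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1.
xtime :: Word8 -> Word8
xtime b = shiftL b 1 `xor` (if testBit b 7 then 0x1b else 0x00)

-- The same function precomputed into a 256-entry table; efficient
-- implementations bake tables like this (and much larger ones
-- combining SubBytes and MixColumns) into the program as constants.
xtimeTable :: Array Word8 Word8
xtimeTable = listArray (0, 255) [ xtime b | b <- [0 .. 255] ]

mulBy2 :: Word8 -> Word8
mulBy2 b = xtimeTable ! b
```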
- Johannes Rudolph and Peter Thiemann. Mnemonics: Type-safe Bytecode Generation at Run Time.
Abstract: Mnemonics is a Scala library for generating method bodies in JVM bytecode at run time. Mnemonics supports a large subset of the JVM instructions, for which the static typing of the generator guarantees the well-formedness of the generated bytecode.
The library exploits a number of advanced features of Scala's type system (type inference with bounded polymorphism, objects with type members, implicit parameters, and reflection) to guarantee that the compiler accepts only legal combinations of instructions at compile time. Additional instructions can be supported at the price of a check at the generator's run time. In either case, bytecode verification of generated code is guaranteed to succeed.
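The library itself is in Scala; the underlying idea can be suggested in Haskell (a sketch of the technique, not the paper's API) by indexing instructions with the shape of the operand stack, so that ill-formed instruction sequences are type errors:
```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}
import Data.Kind (Type)

-- An instruction transforms one operand-stack shape into another.
data Instr (before :: [Type]) (after :: [Type]) where
  IConst :: Int -> Instr s (Int ': s)           -- push an int
  IAdd   :: Instr (Int ': Int ': s) (Int ': s)  -- pop two, push sum
  ISwap  :: Instr (a ': b ': s) (b ': a ': s)

-- Sequencing only typechecks when the stack shapes line up.
data Code (before :: [Type]) (after :: [Type]) where
  Done :: Code s s
  (:>) :: Instr s t -> Code t u -> Code s u
infixr 5 :>

ok :: Code s (Int ': s)
ok = IConst 1 :> IConst 2 :> IAdd :> Done

-- bad = IConst 1 :> IAdd :> Done   -- rejected: IAdd needs two ints
```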
- Martin Hofmann and Emanuel Kitzelmann. I/O Guided Detection of List Catamorphisms.
Abstract: Inductive programming (IP), usually defined as a search in a space of hypotheses, is an inherently exponential problem. To constrain the search space, program templates have always been one of the first choices. In previous approaches to incorporating program schemes, either an (often very well informed) expert user has to provide a template in advance, or templates are applied simply on suspicion, regardless of whether they fit the target. Instead of fitting the data to a template, we present an approach that fits a template to the data. We propose to utilise universal properties of higher-order functions to detect the appropriateness of a certain template from the input/output examples. We use this technique to introduce catamorphisms on lists in our inductive programming system IgorII.
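The idea can be hinted at in Haskell (a toy sketch; fitsFoldr is an invented helper, not the system's interface): if the examples are consistent with some foldr g e, the fold template is worth instantiating:
```haskell
-- Do the input/output examples agree with the catamorphism foldr g e?
fitsFoldr :: Eq b => (a -> b -> b) -> b -> [([a], b)] -> Bool
fitsFoldr g e examples = and [ foldr g e xs == y | (xs, y) <- examples ]

-- Examples for 'length' fit the template with g _ n = n + 1 and e = 0.
lengthExamples :: [(String, Int)]
lengthExamples = [("", 0), ("a", 1), ("ab", 2), ("abc", 3)]

detected :: Bool
detected = fitsFoldr (\_ n -> n + 1) 0 lengthExamples   -- True
```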
- Xin Li and Mizuhito Ogawa. Conditional Weighted Pushdown Systems and Applications.
Abstract: Pushdown systems are well understood as abstract models of programs with (recursive) procedures. Reps et al. recently extended pushdown systems into weighted pushdown systems, which serve as a generalized framework for solving certain kinds of meet-over-all-paths problems in program analysis. In this paper, we extend weighted pushdown systems to conditional weighted pushdown systems by further specifying conditions under which a pushdown transition rule can be applied, and show that model checking problems on conditional weighted pushdown systems can be reduced to those on weighted pushdown systems.
There are various applications of conditional weighted pushdown systems in the analysis of programs with object-oriented features, for which weighted pushdown systems do not provide sufficient precision. As an example, we lift a fundamental stack-based points-to analysis for Java, originally designed in the framework of weighted pushdown systems, into a more precise counterpart based on conditional weighted pushdown systems. In addition to context-sensitivity in terms of valid call paths, the lifted points-to analysis algorithm further enjoys context-sensitivity with respect to object-oriented features, including call graph construction, heap abstraction, and heap access. These context-sensitive properties are shown to be crucial to the analysis precision in practice.
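A very rough Haskell sketch (invented for this summary; in the actual framework, conditions are regular languages over the stack and are handled symbolically, not as predicates) of what distinguishes a conditional rule from a plain weighted one:
```haskell
-- A weighted pushdown rule <p, gamma> -> <p', w'>, carrying a weight.
data Rule state sym weight = Rule
  { lhs    :: (state, sym)    -- control state and top-of-stack symbol
  , rhs    :: (state, [sym])  -- successor state and replacement word
  , weight :: weight          -- dataflow transfer function
    -- The extra ingredient of a *conditional* rule: the rule fires
    -- only if the rest of the stack satisfies a condition.
  , cond   :: [sym] -> Bool
  }
```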
- Hugh Anderson and Siau-Cheng Khoo. Regular Approximation and Bounded Domains for Size-Change Termination.
Abstract: The size-change principle, devised by Lee, Jones and Ben-Amram, provides an effective method of determining program termination for recursive functions. It relies on a regular approximation to the call structure of the program, operates only over variables whose “size” is well-founded, and ignores the conditionals and return values in the program. The main contribution of our paper is twofold: firstly, we improve size-change termination analysis by using better regular approximations to program flow, and secondly, we extend the analysis beyond the original well-founded variables to include integer variables. In addition, we pay attention to program conditionals that are expressed by linear constraints and support the analysis of functions in which the return values are relevant to termination. Our analysis is entirely mechanical, exploits the decidability and expressive power of affine constraints and significantly extends the set of programs that are size-change terminating.
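To give the flavour of the underlying machinery (a minimal sketch, not the paper's implementation): size-change graphs relate parameter positions across a call, and composing them along call paths drives the termination test:
```haskell
import Data.List (nub)

data Label = Strict | NonStrict deriving (Eq, Show)

-- Edges between caller and callee parameter positions: Strict means
-- the value definitely decreases, NonStrict that it does not grow.
type SCG = [(Int, Label, Int)]

-- Composition along consecutive calls: a decrease survives if it
-- happens on either leg of the path.
compose :: SCG -> SCG -> SCG
compose g h =
  nub [ (a, l1 `meet` l2, c) | (a, l1, b) <- g, (b', l2, c) <- h, b == b' ]
  where
    meet Strict _ = Strict
    meet _ Strict = Strict
    meet _ _      = NonStrict

-- The size-change criterion: in the closure of all graphs under
-- 'compose', every idempotent graph must carry a strict self-edge
-- (i, Strict, i) -- some value shrinks on every infinite call path.
```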
- Stefan Holdermans and Jurriaan Hage. Making Stricterness More Relevant.
Abstract: Adapting a strictness analyser to have it take into account explicit strictness annotations can be a tricky business. Straightforward extensions of analyses based on relevance typing are likely to either be unsafe or fail to pick the fruits of increases in strictness that are introduced through annotations. We propose a more involved adaptation of relevance typing that can be used to derive strictness analyses that are both safe and effective in the presence of explicit strictness annotations. The resulting type system provides a firm foundation for implementations of type-based strictness analysers in compilers for lazy programming languages such as Haskell and Clean.
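A small Haskell example (ours, for illustration) of the phenomenon the abstract describes: an explicit annotation makes a function strict in an argument that the body alone would classify as irrelevant:
```haskell
-- Without the annotation, x is not needed to produce the result, so
-- an analysis based purely on relevance would call f lazy in x.
f :: Int -> Int -> Int
f x y = x `seq` y + 1

-- The 'seq' makes f strict in x: f undefined 0 now diverges. A safe
-- analyser must account for this, and an effective one should also
-- propagate the extra strictness to the call sites of f.
```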
- Nabil el Boustani and Jurriaan Hage. Corrective Hints for Type Incorrect Generic Java Programs.
Abstract: Since version 1.5, generics (parametric polymorphism) have been part of the Java language. Experience with implementations of the Java Language Specification such as EJC and javac has shown that the type error messages provided by these tools leave more than a little to be desired. Type error messages are often uninformative and sometimes expose artifacts of the type checking process. Evidently, providing good type error messages for a language as large and complex as present-day Java is not easy.
To alleviate the problem, we describe a number of heuristics that suggest fixes for generic method invocations in Generic Java, and illustrate their effect by means of examples. The heuristics are part of an extension to the original type checking process that has been implemented in the JastAdd Extensible Java Compiler.
- Évelyne Contejean, Pierre Courtieu, Julien Forest, Andrei Paskevich, Olivier Pons and Xavier Urbain. A3PAT, an Approach for Certified Automated Termination Proofs.
Abstract: Software engineering, automated reasoning, rule-based programming and specification often make use of rewriting systems for which termination, among other properties, may have to be ensured. This paper presents the approach developed in Project A3PAT to discover, and moreover to certify with full automation, termination proofs for term rewriting systems.
It consists of two developments: the Coccinelle library formalises numerous rewriting techniques and termination criteria for the Coq proof assistant; the CiME3 rewriting tool translates termination proofs (discovered by itself or other tools) into traces that are certified by Coq assisted by Coccinelle.
The abstraction level of our formalisation allowed us to weaken the premises of some theorems known in the literature, thus yielding new termination criteria, such as an extension of the powerful sub-term criterion (for which we propose the first full Coq formalisation). Techniques employed in CiME3 also improve on previous work on the formalisation and analysis of dependency graphs.
- José Pedro Magalhães, Stefan Holdermans, Johan Jeuring and Andres Löh. Optimizing Generics Is Easy!
Abstract: Datatype generic programming increases program reliability by reducing code duplication and enhancing reusability and modularity. Several generic programming libraries for Haskell have been developed in the past few years. These libraries have been compared in detail with respect to expressiveness, extensibility, typing issues, etc., but performance comparisons have been brief, limited, and preliminary. It is widely believed that generic programs run slower than hand-written code. In this paper we present an extensive benchmarking suite for generic functions and analyze the potential for automatic code optimization at compilation time. Our benchmark confirms that generic programs, when compiled with the standard optimization flags of the Glasgow Haskell Compiler, are substantially slower than their hand-written counterparts. However, we also find that the optimization capabilities of GHC can be used to further optimize generic functions, sometimes achieving the same efficiency as hand-written code.
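The gap the benchmark measures can be pictured with a toy example (using the syb library for concreteness; the paper compares several generic programming libraries):
```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics (Data, Typeable, everything, mkQ)

data Tree = Leaf Int | Node Tree Tree deriving (Data, Typeable)

-- Generic version: a traversal over any Data value, summing the Ints
-- it contains, written once for all datatypes.
gsum :: Tree -> Int
gsum = everything (+) (0 `mkQ` (id :: Int -> Int))

-- Hand-written version: direct recursion, the code one hopes the
-- compiler can optimize the generic version down to.
hsum :: Tree -> Int
hsum (Leaf n)   = n
hsum (Node l r) = hsum l + hsum r
```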
- Arun Lakhotia, Davidson Boccardo, Anshuman Singh and Aleardo Manacero Júnior. Context-Sensitive Analysis of Obfuscated x86 Executables.
Abstract: We introduce a method for context-sensitive analysis of binaries that may have obfuscated procedure call and return operations. These binaries may use direct stack operators instead of the native `call' and `ret' instructions to achieve equivalent behavior. Since definitions of context-sensitivity and algorithms for context-sensitive analysis have thus far been based on the specific semantics associated with procedure call and return operations, classic interprocedural analyses cannot be used reliably for analyzing programs in which these operations cannot be discerned. A new notion of context-sensitivity is introduced that is based on the state of the stack at any instruction. While changes in calling-context are associated with transfer of control, and hence can be reasoned about in terms of paths in an interprocedural control flow graph (ICFG), the same is not true of changes in stack-context. An abstract interpretation based framework is developed to reason about stack-context and to derive analogues of call-strings based methods for context-sensitive analysis using stack-context. The method presented is used to create a context-sensitive version of Venable et al.'s algorithm for detecting obfuscated calls. Experimental results show that the context-sensitive version of the algorithm generates more precise results and is also computationally more efficient than its context-insensitive counterpart.
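The obfuscation in question rests on a simple equivalence, sketched here in Haskell (our illustration; real binaries also need the address arithmetic fixed up, which this ignores):
```haskell
type Addr = Int

data Instr = Call Addr | Ret | Push Addr | Jmp Addr | Other String
  deriving (Eq, Show)

-- 'call f' pushes a return address and transfers control, so an
-- obfuscator can replace it with an explicit push and jump. The two
-- forms are behaviourally equivalent, which is why analyses keyed
-- to the 'call'/'ret' opcodes themselves break down.
obfuscateCall :: Addr -> Instr -> [Instr]
obfuscateCall retAddr (Call f) = [Push retAddr, Jmp f]
obfuscateCall _       i        = [i]
```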
- Fritz Henglein. Optimizing Relational Algebra Operations Using Generic Equivalence Discriminators and Lazy Products.
Abstract: We show how to efficiently implement in-memory execution of generic map-filter-product queries, generalizations of select-project-join (SPJ) queries in relational algebra, based on a combination of two novel techniques: generic discrimination-based joins and lazy (symbolic) products.
Discrimination-based joins employ equivalence discriminators, a partition-like generic function we introduce here. A discriminator partitions a list of values according to a user-specified equivalence relation on the keys the values are associated with. Equivalence relations can be specified in a rich embedded language. We show how discriminators can be implemented generically (by structural recursion on equivalence expressions), purely functionally, and efficiently (in worst-case linear time) on top of the array-based basic multiset discrimination algorithm of Paige et al. In contrast to hashing, discrimination is fully abstract (it depends only on which equivalences hold on its inputs), and in contrast to comparison-based sorting, it does not require an ordering relation on its inputs. In particular, it is applicable to references (pointers). Furthermore, it has better asymptotic computational complexity than both sorting and hashing.
We represent cross-products and unions lazily (symbolically) as formal pairs of the argument sets (relations). This allows the selection operation to recognize on the fly whenever it is applied to a cross-product, in which case it can invoke an efficient equijoin implementation. In particular, queries can still be formulated naively, using filter, map and product without an explicit join operation, yet garner the advantages of efficient join algorithms during evaluation.
The techniques subsume many of the optimization techniques based on relational algebra equalities, without the need for a query preprocessing phase. They require no indexes and behave purely functionally. Together they can be considered a form of symbolic execution of set expressions that automates and encapsulates dynamic program transformation of such expressions and leads to asymptotic performance improvements in many cases. Being dynamic means that there is no proper phase distinction between program transformation and execution: they are interleaved, which admits exploiting run-time data to control the transformation.
Full source code in Haskell extended with Generalized Algebraic Data Types (GADTs) is included. GADTs are used to represent small embedded languages for sets (relations), functions (projections), predicates and equivalence expressions in a type-safe manner. The code included is only intended for operations on in-memory data. Only joins for equivalences are addressed, not joins for ordering relations.
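To convey the shape of the interface (a simplified sketch of the idea, not the paper's code: the base case below groups via an IntMap rather than the worst-case linear Paige-style bucket arrays the paper builds on):
```haskell
{-# LANGUAGE GADTs #-}
import qualified Data.IntMap.Strict as IM

-- A small embedded language of equivalence relations on keys.
data Equiv k where
  EqInt  :: Equiv Int                        -- equality on ints
  EqPair :: Equiv a -> Equiv b -> Equiv (a, b)
  EqBy   :: (a -> b) -> Equiv b -> Equiv a   -- compare through a view

-- disc e kvs: group the values whose keys are equivalent under e.
disc :: Equiv k -> [(k, v)] -> [[v]]
disc EqInt kvs =
  IM.elems (IM.fromListWith (flip (++)) [ (k, [v]) | (k, v) <- kvs ])
disc (EqBy f e) kvs = disc e [ (f k, v) | (k, v) <- kvs ]
disc (EqPair ea eb) kvs =
  concat [ disc eb grp
         | grp <- disc ea [ (a, (b, v)) | ((a, b), v) <- kvs ] ]

-- An equijoin then falls out: tag both relations, discriminate their
-- union on the join key, and pair up the survivors in each group.
```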
- Michele Baggi, María Alpuente, Demis Ballis and Moreno Falaschi. A Fold/Unfold Transformation Framework for Rewrite Theories extended to CCT.
Abstract: Many transformation systems for program optimization, program synthesis, and program specialization are based on fold/unfold transformations. In this paper, we present a fold/unfold-based transformation framework for rewriting logic theories which is based on narrowing. To the best of our knowledge, this is the first fold/unfold transformation framework which allows one to deal with functions, rules, equations, sorts, and algebraic laws (such as commutativity and associativity). We provide correctness results for the transformation system w.r.t. the semantics of ground reducts. Moreover, we show how our transformation technique can be naturally applied to implement a Code Carrying Theory (CCT) system. CCT is an approach for securing the delivery of code from a producer to a consumer where only a certificate (usually in the form of assertions and proofs) is transmitted from the producer to the consumer, who can check its validity and then extract executable code from it. Within our framework, the certificate consists of a sequence of transformation steps which can be applied to a given consumer specification in order to automatically synthesize safe code in agreement with the original requirements. We also provide an implementation of the program transformation framework in the high-performance rewriting logic language Maude; an experimental evaluation of the system highlights the potential of our approach.
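For readers unfamiliar with the technique, the classic functional-programming instance of fold/unfold (shown here in Haskell; the paper works at the level of rewriting logic theories in Maude) fuses two traversals into one:
```haskell
square :: Int -> Int
square x = x * x

-- Original definition: two list traversals.
sumSquares :: [Int] -> Int
sumSquares xs = sum (map square xs)

-- Unfold 'sum' and 'map' on the two list constructors, simplify, then
-- fold the recursive occurrence back into sumSquares': one traversal.
sumSquares' :: [Int] -> Int
sumSquares' []       = 0
sumSquares' (x : xs) = square x + sumSquares' xs
```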
Tool demonstration papers:
- Florian Haftmann. From Higher-Order Logic to Haskell: There and Back Again.
Abstract: We present two tools which, used together, allow one to reason about (a certain class of) Haskell programs. One is the code generator of the proof assistant Isabelle, which turns specifications formulated in Isabelle's higher-order logic into executable Haskell source text; the other is Haskabelle, a tool to translate programs written in (a subset of) Haskell into Isabelle specifications. The translation from Isabelle to Haskell thereby profits directly from the rigorous correctness approach of a proof assistant: generated Haskell programs are always partially correct with respect to the specification they are generated from.
- Martin Hofmann. Igor2 - an Analytical Inductive Functional Programming System.
Abstract: Igor2 is an implemented prototype for constructing recursive functional programs from a few non-recursive, possibly non-ground example equations describing a subset of the input/output (I/O) behaviour of a function to be implemented. Starting from an initial, overly general program hypothesis, several refinement operators are applied stepwise to compute successor hypotheses. Organised as a uniform-cost search, the hypothesis with the lowest cost is expanded and replaced by its successors until the best hypothesis does not contain any unbound variables.
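A typical run (our hypothetical example, written in Haskell notation) starts from a handful of non-recursive equations and ends with a recursive program covering them:
```haskell
-- Input: example equations describing the target's I/O behaviour.
--   last [a]       = a
--   last [a, b]    = b
--   last [a, b, c] = c
--
-- Output: a recursive hypothesis without unbound variables that
-- subsumes all of the examples.
last' :: [a] -> a
last' [x]      = x
last' (_ : xs) = last' xs
```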
- Ivan Lazar Miljenovic. The SourceGraph Program.
Abstract: As software has increased in size and complexity, a range of tools have been developed to assist programmers in analysing the structure of their code. One of the key concepts used for such analysis is the call graph, which represents which entities in a code base call other entities. However, most tools which use call graphs are limited either to visualisation for documentation purposes (such as Doxygen) or to dynamic analysis that finds areas to optimise using profiling tools such as gprof.
SourceGraph is a new tool which takes a different approach to call-graph-based software analysis for projects written in Haskell. It creates a static call graph directly from the source code and then uses it to perform static analysis using graph-theoretic techniques, with the aim of helping the programmer understand how the different parts of their program interact with each other. Whilst still a work in progress, it can already be used to find possible problems in the code base, such as unreachable areas and cycles or cliques in the function calls, as well as other useful information. SourceGraph thus provides programmers not only with various ways of visualising their software, but also helps them understand what their code is doing and how to improve it.
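One of the graph-theoretic checks mentioned, cycle detection, is easy to picture (a toy sketch using Data.Graph from the containers library, not SourceGraph's own code):
```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- A toy call graph: (node label, key, keys of callees).
callGraph :: [(String, String, [String])]
callGraph =
  [ ("main",  "main",  ["parse", "eval"])
  , ("parse", "parse", ["lexer"])
  , ("lexer", "lexer", [])
  , ("eval",  "eval",  ["apply"])
  , ("apply", "apply", ["eval"])   -- eval/apply: mutual recursion
  , ("dead",  "dead",  [])         -- unreachable from main
  ]

-- Non-trivial strongly connected components are the call cycles;
-- here there is exactly one, the eval/apply pair.
cycles :: [[String]]
cycles = [ fs | CyclicSCC fs <- stronglyConnComp callGraph ]
```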
- Elvira Albert, Miguel Gomez-Zamalloa and German Puebla. PET: A Partial Evaluation-based Test Case Generation Tool for Java Bytecode.
Abstract: PET is a prototype Partial Evaluation-based Test case generation tool for a subset of Java bytecode programs. It performs white-box test generation by means of two consecutive Partial Evaluations (PE). The first PE decompiles the Java bytecode program into an equivalent CLP (Constraint Logic Programming) counterpart. The second PE generates a test-case generator from the CLP program. This generator captures interesting test coverage criteria and is able to generate further test cases on demand. For the first PE, PET incorporates an existing tool which decompiles bytecode to CLP. The main contribution of this work is the implementation of the second PE and a proof of concept of the approach. This has required the development of a partial evaluator for CLP with appropriate control strategies to ensure the required coverage criteria and to generate test-case generators. PET can be downloaded as free software from its web site, where a repository of examples and a web interface are also provided. Though PET has to be extended to be applicable to larger programs, we argue that it provides some evidence that the approach can be of practical interest.