Invited Talk by Oege de Moor
Joint work with Elnar Hajiyev, and Mathieu Verbaere
Abstract
Code queries are useful for enforcing coding conventions, navigating a large code base, and
for identifying locations to refactor. The program understanding community has long advocated
the use of a relational database to facilitate such code queries, but relational queries over code
have not found widespread use.
We argue this is due to a lack of scalability (it takes too long to evaluate queries), and a lack
of a modern query language with all the tool support that entails (writing code queries in SQL
can take many pages). In this talk we demonstrate a new system that goes some way towards
alleviating both problems.
First, we demonstrate IQL, an object-oriented query language. It allows the concise expression of
complex queries; due to its object-oriented features, it is easy to extend and modify existing
queries. Another benefit of object-orientation is that it gives natural tool support, for instance
for auto-completion.
The semantics of IQL is defined by translation to Datalog. Datalog is a logic programming language
like Prolog, but it lacks data structures. As a consequence, all queries in Datalog are guaranteed
to terminate, and they have a very simple semantics, enabling aggressive optimisations.
While Datalog has been extensively studied in the theoretical database community, it lacks
a proper type system. We introduce a type system with subtyping to account for the class hierarchy
of IQL. Type inference is achieved via an abstract interpretation that maps each relation between runtime
values to a relation between binding sites.
We implement Datalog itself via a compiler to procedural SQL; a configuration file allows us one
target different relational database systems as the backend. The compiler performs many
optimisations, ranging from straightforward constant propagation to a form of the well-known
`magic sets' transformation. It also performs type specialisation, based on the abstract
interpretation in the type inference algorithm.