Research program
AIDA - Abstract Interpretation Design and Applications
University Co-ordinator
Università degli Studi di VERONA -
INFORMATICA - VERONA(VR)
Research Unit Leader
Roberto GIACOBAZZI
Description
The team of Verona has matured a great experience in abstract interpretation theory and practice, in particular in the fields of systematic abstract domain design, in static program analysis, in abstract model checking, and in language-based security. The common research theme in our unit is the design and systematic derivation of abstract domains modeling relevant properties of the system under inspection. We plan to contribute to the AIDA project in this direction in the field of static program analysis, language-based security, verification of hybrid systems by abstract interpretation, abstract model checking and abstract testing. WP1: "Static Program Analysis"Task VR1.1: "Calculational design of abstract domain transformers". We are interested in the definition of a general framework for the calculational design of abstract domain transformers. The main idea to approach this problem is to use the same abstract interpretation framework but now lifted higher-order: the object of discourse are domains instead of program state descriptions. The use of abstract interpretation in higher types will show the potential of abstract interpretation methods to fully design versatile program analyzers, namely parametric on the properties that they want to induce or to preserve under approximation. This research has a foundational perspective for WP1, WP2, and WP6. One of the main aspect of this algebra of domain transformers is the characterization of the reversability of operators. This plays a key role in order to associate with each domain refinement operation a corresponding domain simplification or even compressor. We plan to generalize the results in [GR98unif] in order to study how to make a refinement reversable. This corresponds to specify new higher-order operation making transformers reversible. Task VR1.2: "Static analysis of Java bytecode"We want to continue the development of a generic static analyser for the Java bytecode. The interest of such a tool stems from the fact that Java bytecode is extensively used for Internet and smart cards programming, so that formal and automatic proofs of its correctness and security or automatic techniques for its optimisation have practical significance. A static analyzer for Java bytecode based on abstract interpretation, called Julia [Julia] has been implemented in Verona and it is available under the GNU Public Licence. There are some points of the Julia analyzer that need to be improved. Its efficiency must receive further attention in order to be able to analyse larger and larger applications. In particular we will investigate how the possibility of splitting the analysis in different threads of execution can be exploited in order to make the execution of Julia parallel or distributed, so that the analyser will exploit a multiprocessor environment or work in a grid-computing context. This will be studied in the context of new domains for escape, class and non-interference analyses (see Tasks VR1.4 and VR2.1). A further point to develop in Julia is the front-end to the analyser, so that the user can easily control the analyser and specify the points in the program where to focus the analysis. Together with a technique based on a watchpoint semantics developed in Task VR1.3, this will provide a fast and localised answer to the queries of the user about the abstract behaviour of the program under analysis. Task VR1.3: "Watchpoint analysis"The size and complexity of current software is a real challenge to the application of abstract interpretation and static analyses in general. Since the cost of the analysis increases with the size of the program, many real-world applications cannot be analysed due to lack of computational resources. To cope with this problem, the precision of the analyses is often tuned down. Typically, flow or control sensitivity is sacrified. The goal of our research will be then 1) to show how the watchpoint semantics can be applied to a complex and unstructured language such as the Java bytecode, where dynamic binding of classes and exception handling are ubiquitous; 2) to instantiate it for class and escape analyses, hence obtaining two focused static analyses for non-trivial Java bytecode programs. To solve point 1, we will reconstruct the hidden structure of the Java bytecode as a graph of pieces of code, similar to the dominators graph in compiling technology. Note that the traditional techniques of decompilation cannot be used because they reconstruct a Java code where goto's are still ubiquitous. Moreover, we have no guarantee that the code we are analysing results from the compilation of Java. To solve point 2, we will use the abstract domains already defined in [SJ03,HS02,HS02b] and adapt them to the stack-based philosophy of the Java Virtual Machine. We will consider stack elements as variables, and describe each bytecode through a transfer function on states made of local variables and stack variables.Task VR1.4: "Constraint-based escape analysis"Constraints have been extensively used for the static analysis of object-oriented code. However, they have only been used up to now to compile a program into a constraint whose solution express correct abstract information for the program. This leads to fast but flat analyses, which are both flow and control insensitive. Our goal here is to use constraints to express the abstract transfer functions of instructions (or bytecodes) in order to improve on escape analysis. In particular we want an implementation of the escape analyses defined in [HS02,HS02b] and of some of the class analyses defined in [SJ03] through set-constraints [DPPR00]. Since abstract tranfer functions for these analyses work on sets of creation points of objects (or classes of objects, respectively), set constraints look as the natural solution, instead of a sub-symbolic representation in terms of binary decision diagrams. For instance, an instruction such as v:=w, which copies the value of w into that of v, can be abstractly modeled by a transfer function which copies the creation points (or the possible classes) of w into those of v. However, constraints must be kept small and easy to manipulate, otherwise the analysis will not scale up to large programs. A possible choice is to use constraints built from union and intersection only. This choice might impose some restriction on the representation of the abstract transfer function, so that sub-optimal representations might be used instead of the optimal ones devised in [HS02,HS02b,SJ03]. For instance, transfer fucntions that test the (non-)emptyness of a set cannot be precisely expressed and must be approximated.WP2: "Security"Task VR2.1: "Abstract non-interference"The notion of abstract non-interference in [GM04] is general and in principle applicable to any language-based security problems. One of the most interesting aspects of abstract non-interference is that it is a semantic notion, therefore by refining the semantics we could refine the considered notion of secrecy. In particular this would allow to reveal also covert channels [SM03] due to non-termination, timing channels due to different termination time of computations and probabilistic channels due to different probabilities of computations. We plan to extend abstract non-interference also to concurrent and multi-threaded languages, where the interleaving of processes adds new possible interferences. This extension requires the definition of a bisimulation semantics which is parametric on the attacker specified as an abstraction. This would be particularly useful for applying abstract non-interference to check concrete security protocols, or to derive safe conditions for protocol execution. In order to check abstract non-interference for real programs we can take several ways. A first approach is to define a sound syntax-based proof system for abstract non interference, this would allow also to define proof-carryng code (PCC) architecture for abstract non-interference [N97]. In this case the code producer may create an abstract non-interference certificate that attests to the fact that the code secrecy cannot be violated by the corresponding model of the attacker. Then the code consumer may validate the certificate to check that the foreign code is secure for the corresponding model of attacker. The implementation of this technology requires an appropriate choice of a logical framework [HHP93] for specifying abstractions. This would allow programs to carry the proof of their abstract non-interference and hosting process to efficiently verify the carried proof. Another important applicative aspect of abstract non-interference is program slicing [W84]. Usually the slicing criteria used for slicing programs have to consider the dependecy relations among variables. We can note that if a variable depend on another variable then there is an interference. This means that we could study how to use the existing slicing algorithm in order to check abstract non-interference, or conversely use abstract non-interference to weaken slicing criteria. Task VR2.2: "Models for control/data flow"We are interested in the use of boolean functions to model control/data flow in language-based security. In particular we are interested in the systematic construction of domains for information flow by combining boolean functions and dependence analysis. We are interested in studying the link between information flow analysis and groundness analysis of logic programs. From the information flow point of view, we can consider ground terms as objects to which information cannot flow. This approach can be further generalized to model abstract non-interference. We want to check if we can generalize the techniques in [GM04] to model abstract information flow dependencies. Moreover we are interested in extending the techniques presented in [GGM04] to a richer programming languages with procedures, classes, recursion etc. A good candidate language for applying information flow analysis is Java programing language. We are interested in extending Julia in order to become a bytecode verifier for abstract non-interference properties of Java bytecode. This requires advanced abstract domain technologies in order to combine boolean functions with type-like information for modeling abstract non-interference. This is particularly critical in modeling implicit flows because in the case of the Java bytecode the scope of the two branches of a test can have arbitrary shape, or even include the test itself.WP3: "Testing and debugging"Task VR3.1: "Abstract testing"Our interest in program testing is mainly connected with the non-interference (Task VR2.1). Testing is an important activity for checking the correctness of system implementations. The specification is a prescription of what the system should do: the goal of testing is to check, by means of testing, whether the implemented system indeed satisfies this prescription. This kind of testing is called "conformance testing". Starting from ISO 9646 concepts, conformance is interpreted as the satisfaction of a set of conformance requirements, which are derived from the specification of a system. Given a formal specification, the set of conformance requirements defined by the specification can be obtained through a particular relation. These requirements are a set of properties that all conforming implementations should satisfy. Many testing techniques are based on fault-injection, namely they make environment perturbations, considered as faults, and observe what happen in the execution of the implementation. For us there is a relation between these kind of testing methods and some notions of non-interference. Studying these relations would allow to model testing inside the framework of abstract non-interference, providing both a relation between abstract testing [CC00test] and abstract non-interference [GM04] and a formal semantics-based framework for handling test regression in program testing.WP4: "Hybrid and real-time Systems"Task VR4.1: "Hybrid system verification"In this workpackage we are interested mainly in automatic verification of hybrid systems by abstract model checking. The discretization of a hybrid automaton can be seen in the context of abstract interpretation as a complete abstraction of a hybrid systems, and so can be seen just as a special case of abstract interpretation. Model chacking is interested in complete abstractions, an exception being abstract model checking [CGL94,CGJL00] that aims at preserving only interesting formulas; however, complete abstractions are very expensive from the computational point of view, and the theory of abstract interpretation could be a fruitful starting point for deriving new kinds of abstractions and verification techniques (e.g., widening and narrowing) that are computationally efficient although not necessarily complete. We are interested in studying the classification of hybrid systems due to Henzingher et al.[HM00] in the context of abstract interpretation, by proving that the classes of Henzinger are related by abstractions and that each class can be seen as a complete abstraction of the concrete model of an hybrid automata. The interest in this approach is that it makes applicable the results in [GRS00] on making abstractions complete to derive constructions of symbolic transition systems from the logics that should be preserved. Indeed, abstract interpretation provides us with the most efficient constructions, while the constructions used in standard model checking are simply the best construction that we were able to build based on experience. We would like to see how widening and narrowing could be used as a basis for new aproximate model checking algorithms. In particular we are interested in the use of fractals as a basis for wideding in rectangular hybrid systems. Finally, we would like to see how the results that we derive for hybrid systems can be extended to stochastic hybrid systems.WP6: "Verification".In this workpackage we are interested in systematic domain transformer applications in order to both simplify abstract model checking and achieve strong preservation. The activity in this workpackage is an application of the techniques developed in Task VR1.1.Task VR6.1: "Abstract model checking compression"In [GM03] an algorithm for compressing abstract domains relatively to a completeness refinement has been introduced. This could be important for example in abstract model checking since in [GQ01,RT04] the problem of making models strong preserving has been characterized as a completeness refinement. This implies that compressing models corresponds to add all the possible counterexample, by abstracting as much as possible the transition system yet having the same original complete refinement. At this point the standard counterexample guided refinement [CGJL00] for a given formula would refine the abstract transition system only for making the model strong preserving for the given formula. In this way we obtain the most abstract system which is strong preserving for the given formula.Task VR6.2: "Semantic refinement"A symmetric aspect related with the abstract domain refinement technique is semantics refinement. By this we mean the problem of transforming semantics, viz. Scott continuous functions, such that their approximation with respect to a given fixed domain is complete. This is important in order to chraracterize which are the operations for which a given abstract domain is complete. The most important application of this technology is in abstract model checking, in order to derive the most concrete (strongest) logic for which abstract model checking under a given fixed abstraction is complete or strong preserving. This would provide a logic-based characterization of a taxonomy of abstractions in abstract model checking under the strong preservation requirement.