Vai al contenuto| Home page|

   Ti trovi in: HOME »Programmi, progetti e risultati »I progetti »PRIN - Programmi di ricerca di Rilevante Interesse Nazionale»Programma di ricerca
INIZIO_TESTO_DA_INDICIZZARE

RESEARCH PROGRAM

italiano - inglese
Similar research programs:
Scientific and education field classification
International Patent Classification
Geographical classification
Keywords
ABSTRACT INTERPRETATION, STATIC ANALYSIS, ABSTRACT DOMAINS, SYSTEM VERIFICATION, FORMAL SEMANTICS

AIDA2007 - Abstract Interpretation Design and Applications

Università degli Studi di Padova
Abstract
Information sciences and technologies face the great challenge of developing effective methodologies, technologies, and tools for the requirement specification, architecture design, implementation, verification and test, deployment, maintenance and protection in workings and evolutions of composable, reliable, flexible, scalable and robust software. Scientific breakthroughs and innovation are required to maintain and gain market shares. Abstraction, as formalized by abstract interpretation, is mandatory in the development, integration, verification, analysis, testing and protection processes. The aim of the AIDA2007 project -- which is the continuation of the past successful AIDA2004 project -- is to strengthen the Italian research in abstract interpretation, by promoting abstract interpretation as a basic enabling technology required to build dependable software systems. Abstract interpretation will serve as the foundational theory to design methodologies and to build tools to ensure the interoperability, composability, scalability, reliability, robustness, dependability, security and autonomous self-adaptation of software. This theory has been proved to be a key technology for sequential, constraint, parallel (synchronous or asynchronous), distributed and mobile software as found in software and services -- whether centralized, distributed or embedded -- and has been recently shown to scale up for large industrial applications. AIDA2007 will focus on both basic design methodologies and three key applications in: static program analysis, code protection and system verification. In the basic methodologies we will consider the systematic design of optimal domanis for model checking, the design of abstract domains for points-to and aliasing analysis of C and Java programs, the design of scalable and precise numeric abstract domains, the development of systematic methods for measuring the precision of analyses by improving domains with probabilistic and statistical information, and semantic transformations for achieving completeness. In the context of static program analysis, we will attack the problems of: causality in concurrent and distributed systems, speculative optimization, termination and complexity analysis by abstract interpretation of Java and C code, modularity in numerical abstractions, language extensions by annotation in code design and analysis, and precise analyses of string cleanness. In code protection we will attack the problem of information concealing in programming languages, in particular for software watermarking, copyright protection, code obfuscation against reverse engineering and malware detection in presence of metamorphic (obfuscated) malware. In the context of automatic system verification we will investigate abstraction refinement algorithms for abstract model checking, transformations of temporal languages for achieving strong preservation against a given abstraction, and efficient algorithms based on abstract interpretation for computing behavioural equivalences in process algebras. <<<

Principal Investigator
Francesco Ranzato Università degli Studi di PADOVA
Research Objectives
The outstanding progress of software and hardware technology is at the origin of the modern information and communication society. On the one hand these technologies have provided one of the greatest economical and cultural revolution in the last century, on the other hand they represent the Achille heel of our modern society, becuase software and hardware reliability, trustworthiness, safety and security are often hard to prove. A consequence of this hazard is the increasing importance and cost of software and hardware validation. However, traditional validation methods do not scale up easily, have increasing costs, are slow and so extend time to market and do not guarantee optimal quality.

Abstract interpretation is a general theory and practice for the semantic approximation of dynamic system's behaviour. A profound unity of all well known software validation methods such as static program analysis, (abstract) model checking, type and effect systems is clearly exhibited by their formalization in terms of abstract interpretation. They all aim at obtaining safe but approximate information about the behaviour of a computer system by systematically abstracting the semantics of its specification, model or program. Abstract interpretation not only formalizes these various abstractions but also provides guidelines for their refinement and combination, and it is also the base technology for the design of tools for verification of such properties at various levels of abstraction. In the last decade this technology has gone well beyond the standard academic circles, showing a great potential as a cutting-edge technology for industrial applications. Examples are: the AbsInt SME (www.absint.com) commercializing aiT an abstract interpretation-based tool that is able to determine precise upper bounds on the execution time of real-time programs running on processors used in embedded systems from aeronautics and automotive industries; the Polyspace (www.polyspace.com) component in MathWorks packages providing a semantics-based tool for the automatic analysis of embedded software in Ada, C and C++, for debugging run-time errors; the "Astrée" static analyzer (www.astree.ens.fr) which was able to verify automatically the absence of run-time errors in a safety-critical, real-time, synchronous, flight control program counting over 500.000 lines of C code embedded in civil European aircrafts (e.g., AIRBUS A380 being the latest). Another sign of the times is provided by the GlobalGCC (GGCC) project within the ITEA programme (Information Technology for European Advancement), whose goal is to extend the GNU Compiler Collection (GCC) with program-wide static analysis techniques. While the time is ripe, there are still significant problems to be solved before this revolution in (critical) software development can happen on a larger scale.

The goal of the AIDA2007 project is to strengthen the Italian research community in abstract interpretation, by promoting abstract interpretation as a basic enabling technology required to build dependable software systems. Our goal is to mobilize, unify, galvanize and strengthen the abstract interpretation community in Italy, with the main goal of creating a virtual laboratory with common scientific objectives, both in basic and applied research. This will be achieved by a better integration of the existing research capacities on abstract interpretation in Italy, notably in Padova, Parma and Verona, including the leading Italian research groups in abstract interpretation. Abstract interpretation will serve as the foundational theory to design methodologies and to build tools to ensure the interoperability, composability, scalability, reliability, robustness, dependability, security and autonomous self-adaptation of software.

As far as basic abstract interpretation research is concerned, we are interested in abstract domain and semantics design. It is well known that the the properties of an abstract interpretation strongly depend upon the properties of the underlying semantics and the properties of the chosen abstraction. We will approach this fundamental problem by studying abstractions and semantics from the viewpoint of the properties the resulting abstract interpretation has to meet. This is the case in the systematic design of optimal domains for model checking, the design of abstract domains for points-to and aliasing analysis of C and Java programs, the design of scalable and precise numerical abstract domains, the development of systematic methods for measuring the precision of an analysis by improving domains with probabilistic and statistical information, and semantic transformations for achieving completeness. The common aspect in all these tasks is that they all boil down to a refinement process of abstract domains. The well established theory of abstract domain refinement and manipulation will provide a common infrastructure in order to develop new abstract domains that will enrich the state-of-the-art in this major field.

As far as the domain of applications is concerned, we are interested in exploring new directions in static program analysis, code protection and system verification. In static program analysis, we are mostly interested in the new directions recently provided by complexity and termination analysis of C and Java code, and in causality analysis. We will study techniques to automatically prove the termination or bound the complexity of program fragments by exploiting suitable metric information in abstract domains. A similar domain-theoretic approach will be used to model causality in concurrent and distributed systems, in particular in CCS and pi-calculus, by modeling dependencies between process actions, both in terms of their spatial structure and their communication abilities. While static analysis is a well established domain where abstract interpretation plays a key role since decades, this is not the case in system verification and in the emerging field of code protection. In system verification we are again interested in abstraction refinement algorithms, in language transformations and in efficient algorithms for computing behavioural equivalences. Our goal here is to investigate how abstract interpretation can be applied to the context of CounterExample-Guided Abstraction Refinements (CEGAR) of abstract models, providing more precise algorithms for abstraction refinements in abstract model checking. Symmetrically, we will study language refinements and transformations avoiding spurious counterexamples. These research lines have the common goal of exploiting abstract interpretation in order to provide more precise verification algorithms for large systems. In code protection, we will consider both the risks connected with code hacking (malicious software attack) and theft (malicious host attack). Malicious host attacks include sensible information leakage, knowledge extraction by static and dynamic analysis, program decomposition for code reuse, integrity corruption. Malicious software attacks include code hacking by program deformation, namely inserting malware behaviour by vulnerability exploitation. On the side of malicious host attacks, software piracy and software tampering are the major types of attacks. Software watermarking and code obfuscation are promising defense techniques against, respectively, software piracy and malicious reverse engineering. We are interesed in designing new robust techniques for code obfuscation and software watermarking by abstract interpretation. While these techniques can be used against reverse engineering and copyright infringement, they can also provide the ground base for malicious software attacks. The threat of malware attacks is an unavoidable problem in today computer security. We are interested in going beyond known syntactic signature-based detection towards more semantics-based and property-based detection methodologies. <<<
First Results
The overall goal of the AIDA2007 project proposal is to advance the state-of-the-art of research on abstract interpretation, both on foundational and applicative levels. This will be carried out by investigating some specific tasks within the four WPs that constitute AIDA2007. The following description provides a summary of the expected results for each single task.

WP1: Abstract Interpretation Design

- Task PD-1.1: Abstract domains for model checking
Disjunctive abstract domains turn out to be particularly useful in abstract model checking, e.g. in predicate abstraction [MFHRS06] and in algorithms for computing behavioural equivalences [RT07b]. We plan to investigate how disjunctive abstract domains can be useful in abstraction refinement algorithms. Since disjunctive abstract domains can be characterized as abstract domains that are complete for set unions, we also expect to characterize classes of complete abstract domains that can be advantageously used in abstract model checking, e.g. abstract domains that are complete for some temporal operator.

- Task PR-1.1: Abstract domains for points-to and aliasing analysis of C and Java programs
The approach put forward in [EGH94] for stack pointer analysis of C programs allows for the definition of a flow- and context-sensitive analysis that is able to capture both possible and definite aliasing. The specification in [EGH94] is rather informal and, hence, a precise reasoning on the correctness and precision of the approach is difficult. We plan to re-design, properly formalize and implement a variant of this domain, in particular considering as reference concrete semantics the one specified by the C language standard.
For aliasing analysis in Java programs, we will identify a common conceptual framework where most of the proposals in the literature may be cast. We plan to reformulate the most common variants of aliasing and shape analysis as abstractions of a common denotational, flow and context sensitive semantics. We expect, as a result, the ability to formally evaluate the relative strength and weakness of the various approaches, as well as the ability to decouple the abstract domains representing the dynamic data structures from the algorithms computing the abstract behavior.

- Task PR-1.2: Scalable and precise numeric abstract domains
There is a vast body of literature concerning the design of numeric abstract domains, ranging from simple non-relational properties to sophisticated relational properties. Clearly, issues arise when the efficiency and scalability of the simpler proposals has to be paired with the precision of the more sophisticated ones, so that most of the current research, looking for appropriate tradeoffs, lies in between of the two extremes mentioned above. We will attack this problem by (1) designing approximated operators on the existing domains and (2) designing new combinations of existing domains and of the corresponding operators for their effective integration.

Task VR-1.1: Completeness-driven semantic transformations
We expect the following results: (1) To complete the study of the key problem of completeness in abstract interpretation by adding semantic transformers to the already known abstract domain transformers; (2) Transforming semantics means transforming programs [CC02]. This means that with each transformed complete semantics we will have a corresponding program deformation whose abstract interpretation is complete. (3) The dual problem of making semantics maximally incomplete will correspond to let the deformed code be maximally capacious for holding watermarks.

- Task VR-1.2: Measuring precision in numerical domains
We will investigate the lattice structure of probabilistic abstract interpretations inherited by the Birkhoff-von Neumann lattice of projections, a measure of the relative precision via an associated operator norm and the statistical interpretation of the number expressing the relative precision. An important point in this investigation is the representation of the semantics of a program via a tensor product of the vector spaces associated to the domain of each program variable. This establishes a connection between the notion of relational dependencies in program analysis and that of correlation and independence in statistics.


WP2: Static Analysis by Abstract Interpretation

- Task PD-2.1: Causal semantics for concurrent and distributed systems
We developed in [CVY07] a compositional event structure semantics for the internal pi-calculus, that captures the essential features of the causal dependencies created by both synchronous and asynchronous name passing. We plan to extend this approach to free name passing, and we expect to obtain a fully abstract denotational semantics for the full pi-calculus. We are also interested in studying similar compositional semantics for multithreaded imperative languages, that, compared to process algebras, focus on the state rather than on the control of a thread.

- Task PD-2.2: Static analysis of causal properties
In the literature a number of equivalence notions over event structures exist. We expect to provide logical characterizations of the equivalences over event structures, in particular by investigating the connection between the distribution observation expressed in the spatial logics for the pi-calculus and the one expressible through event structures.
The model checking problem over event structures is an intrinsically difficult task, since in general event structures are infinite models even for finite state processes. We expect that the abstract interpretation approach can be fruitfully applied for defining an abstract model checking framework over event structures.

- Task PR-2.1: Definition of a modular and generic analysis framework
We will study the problems related to the design and implementation of a generic framework for static analysis of imperative languages in widespread use, which will be later instantiated to obtain analyses to approximate pointer aliasing and the values of numeric and Boolean variables. The main innovations of such a framework will be flexibility and generality, both in terms of the analyzed properties and of the analyzed language.

- Task PR-2.2: Annotation of programs
We will investigate the specification of a suitable assertion language that should enable the programmer to encode in the program itself the logic conditions that guarantee that the program is unaffected by some given class of errors and/or it respects the bounds imposed by the resource allocation policies. The availability of an expressive assertion language will also be exploited to enable communication the other way around, from the analyzer to the programmer. The expected final result is a highly integrated development environment, where feedback coming from the comparison of expected and computed properties can start a synergistic refinement process, benefiting both the application development and debugging phases.

- Task PR-2.3: Analysis of string cleanness
We will study a transformational approach to the problem of ensuring cleanness of string operations in C programs. The approach consists in transforming a C program P using strings into another program, P', where arrays of characters and pointers are substituted by ad hoc structures and where assertions are included. The transformation will have the following properties: (1) it will be fully automatic; (2) if P' cannot violate any of its assertions, then all the executions of P do not result in any string manipulation errors. Strings will be replaced by structures that abstract their semantics by means of integer variables and Boolean variables. This idea originates from the works of Dor, Ellenbogen, Rodeh and Sagiv [DRS01, DRS03].

- Task PR-2.4: Analysis of termination and complexity
We expect to design a termination and complexity analysis for C code as follows:
1) Identify control loops: All the sources of possible non-termination in a C program are identified: explicit loops, recursive function calls and backwards jumps.
2) Identify loops in data: By exploiting the information provided by the pointer analysis, possible loops in data structures are examined. This information is used in step 3 below.
3) Generation of constraints: a) on the values of numeric variables; b) on the maximal length of a chain of pointers followed starting from each variable (path-length analysis).
4) Synthesis of ranking functions: The generated constraints are elaborated through automatic techniques so as to try and find a termination function (norm) for the program.

- Task VR-2.1: Static analysis for termination ad complexity
We expect to improve and extend our Julia and BinTerm tools in a number of different ways. Our current termination analysis based on Julia and Binterm is already able to prove termination of small Java bytecode programs, also dealing with data structures. We plan to extend it to the analysis of large, real programs. To that purpose, preliminary analyses will be improved so that they become both faster and more precise. We expect to improve the efficiency of the Parma Polyhedra Library for our purposes by considering the special shape of our polyhedra, where most input and output variables keep the same value. We expect to develop a static analysis of sharing, cyclicity and path-length for concurrent Java programs. We also expect to integrate in our current termination analysis automatic proofs of deadlock-freeness.

- Task VR-2.2: Speculative optimisation
Code parallelisation represents one of today's major challenges. When a compiler has to parallelise two pieces of code it has to consider all potential dependencies. Current techniques are over-conservative: if a dependency cannot be proved, the compiler assumes that it does occur. As a consequence, many opportunities for parallelisation are missed in the code generation. Contrary to this "worst-case" conservative approach, speculative threading is an emerging technique which allows for some incorrect thread parallelisation by postponing the check to some later time. We expect to apply probabilistic abstract interpretation to support such "optimistic" approach.


WP3: Code Protection by Abstract Interpretation

- Task VR-3.1: Signature generalization
We expect to develop a signature generalization process that detects metamorphic malware variants. Let S be a Extended Finite State Automata (EFSA) modeling the malware signature and let O be an obfuscation used by malware writers. If the EFSA model is closed for transformation O then the generalization process returns Closure(S,O) that can be used to detect all possible variants of S obtained through O. The class of transformations for which EFSA is closed corresponds to the class of obfuscations that our generalization can handle.

- Task VR-3.2: Predicate obfuscation
We are interested in investigating the possibility of hiding a precise predicate of a particular program (or class of programs). We expect to define obfuscation of predicates, i.e. the program transformations that aim at protecting a particular information about a program (or a class of programs). We aim at characterizing the predicates that is possible to hide in order to provide a more useful description of the potentiality and limits of code obfuscation.

- Task VR-3.3: Hiding software watermarks in loop structures
Our goal is to inlay inside loops watermarks that are iteratively constructed. At embedding time we plan to unfold the subject loop to find out which iteration is the most appropriate for promoting the construction of our watermark. Then we make our watermark sensible only to the promoter and we can embed it in the folded loop. Since the marked loop is folded, one can hardly guess which iteration should be the promoter. At extraction time, we plan to use a semantic understanding of loop unfolding together with an extraction key to display just the promoter followed by as many iterations as needed to compute our watermark.

- Task VR-3.4: Hiding watermarks in completeness holes
Consider the following simple statement C: x=a*b multiplying a and b and storing the result in x. A (passive) observer inspecting the sign of program values has a complete understanding of the sign of the result by knowing the sign of a and b. An automated program sign analysis that replaces concrete computations with approximated ones -- i.e. the rule of signs -- is therefore able to catch, with no loss of precision, the intended sign behaviour of C because the sign abstraction is complete for integer multiplication. A transformation t replacing C with t(C): x=b; while b !=0 {x = a+x; b = b-1;} obfuscates the sign information for the observer. This because the rule of signs is incomplete for integer addition as signs loose integer magnitude. Thus, while steganography can be seen as hiding information in completeness holes, code obfuscation can be analogously seen as transforming programs in such a way that possible observers cannot use complete abstractions for code analysis unless cost expensive analysis are used. We expect to show that most software watermarking techniques are special cases of watermark insertions that exploit incompleteness of defined observers.


WP4: System Verification by Abstract Interpretation

- Task PD-4.1: Abstraction Refinement Algorithms
Model checkers based on CounterExample-Guided Abstraction Refinement (CEGAR) deal with counterexamples that are paths of abstract states. Recently, Cousot, Ganty and Raskin [CGR07] put forward an innovative abstraction refinement algorithm CGR that is driven by abstract fixpoints rather than by abstract counterexamples. We aim at investigating how the general approach of abstract interpretation can be applied to the context of counterexample-guided refinements of abstract models. We expect to generalize the CEGAR approach to general abstract models specified by abstract interpretation and to generic languages. We also plan to study the role of complete abstract domains in abstraction refinement algorithms. We expect that completeness refinements can be useful for designing or improving abstraction refinement algorithms, in particular in the context of fixpoint-guided algorithms like CGR.

- Task PD-4.2: Language Transformations
Given an abstract model A, we plan to study the problem of systematically modifying, through minimal refinements or simplifications, the operators that define some specification language L (e.g. OR and AU) in order to obtain a transformed language L' such that the abstract model checking on A is strongly preserving for L'. We plan to attack this kind of problems by using typical abstract interpretation techniques, in particular abstract domain refinements and simplifications. We intend to apply this technique to the branching vs linear problem for the expressive power of temporal languages. By generalizing the characterization in [Mai00], we expect to systematically transform LTL to a maximal sub-language L' that characterizes the branching expressive power of ACTL.

- Task PD-4.3: Algorithms for Computing Behavioural Equivalences
We recently designed a new simulation equivalence algorithm [RT07b] that improves the best known time bound for this problem. This procedure exploits typical abstract interpretation tools like abstract domain refinements and disjunctive abstract domains. An experimental evaluation has shown the effectiveness of this algorithm. We expect (1) to provide an efficient symbolic version based on BDDs of this algorithm and (2) to generalize and adapt the techniques used for designing this new simulation equivalence algorithm to other behavioural equivalences like branching bisimulation.
<<<
Timescale
24 months
National and international background
Abstract interpretation is well known as a general and unifying theory for approximating the formal semantics of computational systems. Invented by Cousot and Cousot [CC77, CC79] in the late 1970s as a unifying framework for designing and validating static program analyses, abstract interpretation has increasingly gained popularity as a general methodology for describing and formalizing approximate computations in such diverse areas of computer science as static program analysis, program transformation, system and program verification by model checking, type systems, analysis of mobile and distributed systems, security, constraint solving, etc. As a remarkable recent example of application of abstract interpretation, ASTRÉE (http://www.astree.ens.fr/) is a static program analyzer firmly based on abstract interpretation theory that was able to prove the absence of any run-time errors in the primary flight control software of the Airbus A340 fly-by-wire system (132,000 lines of rather special C code). We are approaching the point where static analyzers based on abstract interpretation can be included in the development environments of mainstream imperative programs.

The objectives of the project AIDA2007 are structured as four WorkPackages (WPs) that have a focus on specific issues of designing and applying abstract interpretation.

WP1: Abstract Interpretation Design
WP2: Static Analysis by Abstract Interpretation
WP3: Code Protection by Abstract Interpretation
WP4: System Verification by Abstract Interpretation

The members of AIDA2007 are well known in the international research community on abstract interpretation and have a strong research background and experience on designing and applying abstract interpretation, in particular on the subjects of the four WPs of AIDA2007.

WP1: Abstract Interpretation Design

Early work by Cousot and Cousot [CC79] laid the foundation for a theory of abstract domains and abstract domain transformers. After this fundamental work, a number of operators have been designed to systematically improve precision, reduce complexity and modularize analyses by refining, simplifying and decomposing abstract domains [CFGPR97, FGR96, GR97, GR99]. Completeness plays a key role in abstract interpretation, and in particular in many problems concerning abstract domain refinements. Complete abstractions can be always constructively designed as least complete refinements and the greatest complete restrictions of any abstract domain [GRS00]. No corresponding transformers are known for semantics, i.e. transformations of observable semantics making them suitable for completeness under a given fixed abstraction. Completeness has been recently considered also as a model for non-interference and information leakage: "What you lose in precision is what you leak in observation" [BGM07,GM05]. This relies upon the generalization of Goguen and Meseguer [GM84] non-interference by abstract interpretation provided by abstract non-interference [GM04]. The idea is to consider attackers as static program analyzers (i.e. abstract interpretations) whose task is to reveal properties of secret data by statically analyzing public resources. The most powerful attacker unable to disclose confidential properties and the maximal amount of information disclosed by a program under attack have been characterized as abstract domain transformers. A substantial effort in the last decade has also been devoted to the problem of systematically refining domains in abstract model checking and in predicate abstraction. The members of AIDA2007 studied a number of issues related to the problem of achieving strong preservation of abstract model checking by abstract domain refinements [GQ01, RT04, RT05a, RT06, RT07a]. <br />
Aliasing/pointer analysis in imperative languages is a useful, sometimes essential step to improve the precision of several other analyses. Despite the vast literature on the subject, it is currently very difficult to obtain reasonably precise results while using a reasonable amount of resources for the analysis [Ber07, Hin01]. Flow- and context-insensitive pointer analyses in use for optimized compilation, while being able to scale to millions of lines of code, do not provide enough precision for the purposes of program verification. On the other hand, flow- and/or context sensitive analyses proposed in the literature either are unable to capture all the constructs of languages such as C [Ber07], or seem to have been abandoned for some (unknown) reason [Deu94,EGH94]. Preliminary results on the analysis of numerical properties of real C programs clearly showed that there are problems in both scalability and precision of the analysis. A blind use of the domain of convex polyhedra implies non scalability, while resorting to intervals or even octagons sometimes does not allow to acceptably reduce the number of false alarms (e.g., for string cleanness analysis). Solving these problems of scalability and precision is thus a priority.

WP2: Static Analysis by Abstract Interpretation

It is widely acknowledged that software defects significantly contribute to the high costs of software development [RTI02]. Moreover, defects that are detected only after the product has been released have an higher cost, both for producers and consumers. Software defects whose consequences are particularly expensive are those that introduce system vulnerabilities. After the thorough analysis of thousands of reports about computer systems vulnerabilities, CERT, the authoritative center for computer security (http://www.cert.org/), concluded that most of the actually exploited security problems are due to a limited number of causes. The most frequent ones originate from programming mistakes: out of bounds accesses to buffers, overflows in operations on native integers, string manipulation errors and other errors related to memory management. It is therefore important to study, on the one hand, software design methodologies that can help preventing these problems and, on the other hand, software validation strategies that, being aware of these problems, do help in their detection and removal. In most cases, of course, a software product unaffected by the aforementioned problems can still have defects that impact the user, as other and more abstract properties must be considered to uncover higher-level flaws. However, the absence of "low level" defects is clearly a prerequisite to the reliability of any higher-level analysis. In order to have an impact on this state of affairs, AIDA2007 focuses its research on mainstream imperative languages such as C (with possible extensions to C++) and Java.

Most of the vulnerabilities occurring in C programs stem from string manipulation errors. The absence of such errors, i.e., the string cleanness, is a prerequisite for secure programs that have a predictable behavior and are robust against malicious attacks. Programs in C are particularly prone to such errors due to the lack of a native string data type. Static analysis techniques for the detection of buffer overflows have been proposed, but work remains to be done in order to make them effective [DRS03, Ell04]. Termination analysis for C programs is a main topic in current static analysis research. The C language is widely used to develop embedded devices, often including critical aspects (in industrial control, automotive industry, etc.), where the proper termination behavior of each functional component is essential. Imagine a device whose goal is to maintain a certain type of interaction with the environment. This can be conceived as an infinite loop consisting of several phases (read sensors, compute a proper reaction, send commands to actuators), whose termination failure would cause malfunctions as bad as classical system failures. Another example comes from device drivers, where a termination failure can cause the whole operating system to hang. For this reason Microsoft started a research project to study automatic termination analysis of C programs: drivers for the Microsoft operating systems are developed by hundreds of hardware producers and their automatic verification is crucial to the US software giant [CPR06]. Once termination is guaranteed, complexity analysis is the logical step forward to ensure, e.g., the hard deadlines are met by software systems. Termination and complexity were studied for imperative programs, restricted to primitive values [CPR06, GST06] or in the case of numerical iterative programs by expressing program semantics in polynomial form [C05]. A great challenge is to extend these methods to programs using dynamically heap-allocated data and complex destructive updates. Recent developments provide a precise model of the heap, with pair-sharing [SS05] and cyclicity [RS06], which are now implemented in Julia [Sp05] by exploiting the PPL [BHZ06]. Similarly, complexity analysis of Java-like languages determines the complexity classes of the methods by inductive reasoning [AAGPZ07].

Quantitative approaches to program analysis aim at developing techniques which provide approximate answers together with numerical estimates of the approximation introduced by the analysis. Probabilistic abstract interpretation [DW01] recasts classical abstract interpretation in a probabilistic setting, providing, in addition to the standard worst-case analysis [M00], also an average-case analysis [DHP06]. In probabilistic abstract interpretation, domains are normed linear spaces, abstractions are defined via linear operators and concretisation via the notion of a linear generalised inverse. Metric closeness provides a quantitative numerical estimate of the precision of the abstraction.

AIDA2007 is also interested in the development of static analysis techniques based on causal models of concurrent and distributed systems. In particular, in the context of process algebras, non-interleaving models for CCS and the pi-calculus have been formalized as Petri nets, generalized labelled transition systems, and event structures [Win82, CVY07]. Moreover, [BS98, DP99, Cas95] show how behavioral equivalences can be used to capture the dependencies between process actions, both in terms of their spatial structure and their communication abilities.

WP3: Code Protection by Abstract Interpretation

Code obfuscation and software steganography are emerging technologies for code protection in modern software design and engineering [CTL98, CT02]. Most of the existing code steganography techniques -- watermarking, fingerprinting and birth-marking -- are classified either as static or dynamic, according to the nature of their extraction processes [NCT02]. The distinction between the static and dynamic nature of code steganography is often too rough to provide an accurate model of attackers, which may combine static and dynamic code analysis techniques as well as other analysis methodologies such as abstract interpretation and model checking [CC04]. An impossibility result is known for code obfuscation [B01] under the assumption that the original and transformed program have identical behaviours and where any transformed program becomes unintelligible to any adversary. In practical contexts these constraints are often relaxed. Practical approaches to make reverse engineering hard can either confuse the disassembly process [LD05] or the decompilation process [CT98, CTL98]. A first attempt towards a semantics-based foundation of code obfuscation is in [DPG04]. The idea is that an obfuscator hides all those properties that are not preserved by the transformation. This approach as been applied to opaque predicate insertion [DPG05, DPG06] where the ability of an attacker in disclosing opaque predicates is reduced to a completeness problem of the abstraction that models the attacker [DPG05,DPG06]. Misuse detection classifies a program as infected by a malware when the malware signature, i.e. the sequence of instructions characterizing the malware, occurs syntactically in the program [Szor05]. In general, signature-based algorithms detect known malware, but they are ineffective against unknown malicious programs or obfuscated code in metamorphic malware [N97]. The basic idea of metamorphism is that each successive transformation of a malware changes the syntax in order to foil misuse detection [J02, SF01]. Recently, a semantics-aware malware detector [CJ2005], a malware detector based on standard model checking [K05], and an abstract interpretation-based malware detector for semantic signatures [DCJD2007] have been proposed in order to make malware detection insensitive to metamorphism.

WP4: System Verification by Abstract Interpretation

One objective of AIDA2007 is to apply abstract interpretation to automatic system verification based on model checking [CES86, CGP99]. In particular, we are interested in studying the relationship between abstract interpretation and abstract model checking. The abstract model checking approach [CGL94, CGP99] provides a solution to the critical issue of model checking scalability, the well-known state explosion problem [CGP99]. The common methodology of any approach to abstract model checking is to run the verification algorithm on a reduced abstract model A that approximates some properties of interest of the concrete system model MA should be strongly preserving: a specification formula f holds on M iff f is satisfied on the abstract model A. However, most abstract models are not strongly preserving [CGP99]. Thus, strong preservation is achieved by performing some refinement action of an initial rough abstract model. The standard solution to the strong preservation problem consists in abstract model refinements that are driven by a given specification formula. In this context, CounterExample-Guided Abstraction Refinement (CEGAR), as pionereed by Clarke et al. [CGJLV03], has become the standard methodology. Well-known examples of model checkers that follows the CEGAR approach are BLAST [HJMS03], MAGIC [CGJLV03] and SLAM [BR02]. The relationship between abstract interpretation and abstract model checking has already been investigated [CC99, CC00, CGL94, DGG97], in particular the members of AIDA2007 investigated the relationship between strong preservation in abstract model checking and completeness in abstract interpretation [GR06, RT04, RT05a, RT06, RT07a].

Remark: The bibliographic citations in each section of this document "Modello A" refer to the bibliography included in the three documents "Modello B" for Padova, Parma and Verona units. <<<