Vai al contenuto| Home page|

   Ti trovi in: HOME »Programmi, progetti e risultati »I progetti »PRIN - Programmi di ricerca di Rilevante Interesse Nazionale»Programma di ricerca
INIZIO_TESTO_DA_INDICIZZARE

RESEARCH PROGRAM

italiano - inglese
Similar research programs:
Scientific and education field classification
International Patent Classification
Bibliografia
[1] Chomicki J. and Toman D. (Eds.) (2005). Proc. 12th Int. Symp. on Temporal Representation and Reasoning (Burlington, Vermont, USA).

[2] Cohn A. nd Mrk D. (Eds.) (2005). Proc. Int. Conf. on Spatial Information Theory (Ellicottville, NY, USA).

[3] Rabiner L. (1989). "A tutorial on hidden markov models and selected applications in speech recognition". Proceedings of IEEE, 77, 57-286.

[4] Regazzoni C. and De Natale F. (Eds.) (2005). Proc. IEEE Int. Conf. on Image Processing (Genoa, Italy).

[5] Andrienko G., Malerba D., May M., and Teisseire M. (Eds.) (2005). Proc. ECML/PKDD Workshop on Mining Spatio-Temporal Data (Porto, Portugal).

[6] Durbin R., Eddy S., Krogh A., and Mitchison G. (1998). Biological sequence analysis, Cambridge Univ. Press, UK.

[7] Valdes A. (Ed.) (2005). Proc. Int. Symp. on Recent Advances in Intrusion Detection (Seattle, Washington, USA).

[8] Shahabi C. and Boucelma O. (Eds.) (2005). Proc. 13th Int. Symp. of ACM GIS (Bremen, Germany).

[9] Bayardo R. and Bennett K. (Eds.) (2005). Proc. 1th ACM Int. Conf. on Knowledge Discovery and Data Mining (Chicago, IL, USA).

[10] Muggleton S. and De Raedt L. (1994). “Inductive logic programming: theory and methods”. J. of Logic Programming, 19/20, 629–679.

[11] Levinson S. (1986). "Continuous variable duration hidden markov models for automatic speech recognition". Computer Speech and Language, 1, 29-45.

[12] Bonafonte A.B.J. (1996). “Duration modeling with expanded hmm applied to speech recognition”. citeseer.ist.psu.edu/507067.html

[13] Fine S., Singer Y., and Tishby N. (1998). "The hierarchical hidden markov model: Analysis and applications". Machine Learning, 32, 41-62.

[14] Murphy K. and Paskin M. (2001). "Linear time inference in hierarchical HMMs". Advances in NIPS, vol. 14.

[15] Xie L., Chang S., Divakaran A., and Sun H. (2002). "Learning hierarchical hidden Markov models for video structure discovery", T.R. 2002-006. ADVENT Group, Columbia Univ., New York, NY.

[16] Galassi U., Giordana A., Botta M. (2006). "Hierarchical Hidden Markov Models for Profile Learning". Fundamenta Informaticae. To appear.

[17] Li J., Gray R.M., Olshen R. A. (2000. "Multiresolution image classification by hierarchical modeling with two dimensional hidden Markov models". IEEE Trans. on Information Theory, 46, 1826-41.

[18] Joshi D., Li J. and Wang J. Z. (2006). "A Computationally Efficient Approach to the Estimation of Two- and Threedimensional Hidden Markov Models,'' IEEE Trans. on Image Processing. To appear.

[19] Feldman D. and Crutchfield J. (1998). "Measures of Statistical Complexity: Why?". Phys. Lett. A, 238, 244-252.

[20] Li W, (1991). "On the Relationship Between Complexity and Entropy for Markov Chains and Regular Languages". Complex Systems, 5, 381-399.

[21] W. Hamscher, L. Console and J. de Kleer (Eds.) (1992). Readings in Model-Based Diagnosis. Morgan Kaufmann.

[22] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. Teneketzis. (1995). “Diagnosability of discrete event system”. IEEE Trans. on Automatic Control, 40, 1555-1575.

[23] L.Console, C. Picardi, D.Theseider Dupré (2003). “Temporal Decision Trees: Model-based Diagnosis of Dynamic Systems On-Board”. J. Artif. Intell. Res., 19, pp. 469-512.

[24] M.Botta, A.Giordana, L.Saitta and M.Sebag (2003). “Relational Learning as Search in a Critical Region”. Int. J. of Machine Learning Research, 4, 431-463.

[25] H. Ade, L. D. Raedt, and M. Bruynooghe (1995). “Declarative bias for specific to-general ILP systems”. Machine Learning, 20, 19–154.

[26] F. Zelezny, A. Srinivasan, D. Page (2004). “A Monte Carlo Study of Randomised Restarted Search in ILP”, In Proc. ILP-04, pp. 341-358.

[27] J.H. Gennari, P. Langley, and D. Fisher (1989). “Model of incremental concept formation”. Artificial Intelligence, 40, 11-61.

[28] S. Wrobel (1994). Concept Formation and Knowledge Revision, Kluwer Academic Publ.

[29] F. Richter (Ed.) (2006). In Order to Learn: How ordering processes and sequencing effects in machines illuminate human learning and vice-versa. Cambridge Univ. Press. To appear.

[30] Cornuéjols A. (2006). "Machine Learning : The necessity of order". In In Order to Learn: How ordering processes and sequencing effects in machines illuminate human learning and vice-versa, F. Richter. (Ed.), Cambridge University Press. To appear.

[31] Holte R. and Choueiry B. (2003). "Abstraction and reformulation in artificial intelligence". Phil. Trans. of the Royal Society - B, vol. 358, 1197 - 1204.

[32] Saitta L. (Ed.) (2003). The Abstraction Path, Special Issue of the Phil. Trans. of the Royal Soc. - Series B, vol. 358.

[33] Giunchiglia F., and Walsh T. (1992). "A Theory of Abstraction". Artificial Intelligence, 56, 323-390.

[34] Nayak P.P., and Levy A.Y. (1995). "A Semantic Theory of Abstraction". In Proc. IJCAI-95 (Montréal, Canada), pp. 196-202.

[35] Struss P. (1997). "Model-Based and Qualitative Reasoning: An Introduction". Ann. Math. Artif. Intell., 19, 355-381.

[36] Zucker J-D. (2003). "A grounded theory of abstraction in artificial intelligence", Phil. Trans. of the Royal Society - B, vol. 358, 1293 - 1309.

[37] Wneck J., and Michalski R. (1994). "Hypothesis-Driven Constructive Induction in AQ17-HCI: A Method and Experiments". Machine Learning, 14, 139-168.

[38] costruct induc recente

[39] Muggleton S., and Buntine W. (1988). "Machine Invention of First-Order Predicates by Inverting Resolution", In Proc. 5th Int. Conf. on Machine Learning (Ann Arbor, MI), pp. 339-352.

[40] I. Mozetic (1991). “Hierarchical Model-Based Diagnosis”. Int. J. of Man-Machine Studies, 35, 329-362.

[41] L. Console and D. Theseider Dupré (1994). “Abductive Reasoning with Abstraction Axioms”. LNCS, vol. 810, 98-112.

[42] K. Autio and R Reiter (1998). “Structural Abstraction in Model-Based Diagnosis”. In Proc. European Conf. on Artificial Intelligence, pp. 269-273.

[43] G. Provan (2001). “Hierarchical Model-Based Diagnosis”. In Proc. Int. Workshop on Principles of Diagnosis.

[44] L. Chittaro and R. Ranon (2004). “Hierarchical model-based diagnosis based on structural abstraction”. Artificial Intelligence, 155, 147-182.

[45] M. Sachenbacher, P. Struss (2005). “Task-dependent qualitative domain abstraction”. Artificial Intelligence, 162, 121-143.

[46] G. Torta and P. Torasso (2003). “Automatic Abstraction in Component-Based Diagnosis Driven by System Observability”. In Proc. Int. Joint Conference on Artificial Intelligence (Acapulco, Mexico), pp. 394-402.

[47] P.Torasso, G. Torta (2005). “Automatic Abstraction of Time-Varying System Models for Model Based Diagnosis”, LNAI, vol. 3698, 176-190.

[48] S. Ferilli, T. M. A. Basile, N. Di Mauro, and F. Esposito (2005). “On the Learnability of Abstraction Theories from Observations for Relational Learning”. LNAI, vol. 3720, 120-132.

[49] Saitta L. and Zucker J.D. (2000). "Abstraction and Phase Transition". In Proc. of the Int. Workshop on Abstraction, Reformulation and Approximation (Horseshoe Bay, TX), pp. 132-143.

[50] M. Gaines, W. Lisowski, S. Press, and N. Shapiro (1980). “Authentication by Keystroke Timing: some preliminary results”. Report R-256-NSF. Rand Corporation.

[51] M. Brown and S. Rogers (1993). “User identification via keystroke characteristics of typed names using neural networks”. Int. J. of Man-Machine Studies, 39, 999-1014.

[52] M. Obaidat and B. Sadoun (1997). “Verification of computer users using keystroke dynamics”. IEEE Trans. on Systems, Man, and Cybernetics - B, 27, 261-269.

[53] P. Dowland, and S. Furnell (2004). “A Long-term Trial of Keystroke Profiling using Digraph, Trigraph and Keyword Latencies”. In Proc. 19th Int. Conf. on Information Security (Toulouse, France).

[54] D. Gunetti and C. Picardi (2005). “Keystroke Analysis of free Text”. ACM Trans. on Information and System Security, 8, 312-347.
Keywords
MACHINE LEARNING, TEMPORAL/SPATIAL DATA, ABSTRACTION, HIDDEN MARKOV MODELS, MODEL-BASED DIAGNOSIS, INDUCTIVE LOGIC PROGRAMMING, INCREMENTAL LEARNING, HIERARCHICAL MODELS, 2D MULTIRESOLUTION HMM

Learning Hierarchical, Abstract Models from Temporal or Spatial Data

Abstract
The project is situated in the field of Machine Learning and Data Mining, specifically mining temporal or (one-dimensional or 2-dimensional) spatial data, with the possible help of available domain theories. The learning process is aimed at acquiring knowledge to be used in classification-related tasks (classification, diagnosis, monitoring, indexing, categorization, ...).

The overall goal of the project is to design a methodology, specifically targeting spatio/temporal data, that learns and uses knowledge fast, i.e., by keeping the computational requirements as low as possible, in order to allow scalability of both learning and performance systems well beyond the current frontiers.

To achieve this goal, a powerful and general tool has been identified in "abstraction", i.e., the mapping of data and knowledge from the original representation space to a "simpler" one, where problems are possibly easier to solve.

A substantial theoretical and experimental evidence indicates that a "good" abstraction allows problems to be solved with a computational complexity of orders of magnitude less, in the abstract space, than in the original one. However, the same evidence tells that a "good" abstraction is by no means easy to discover. Hence, the project, in order to maintain feasibility, shall limit the notion of abstraction to two special cases, that proved to be not only sufficiently powerful to cover a broad range of problems, but also automatically learnable from data (and possibly background knowledge): hierarchy building and component aggregation.

Hierarchy building and component aggregation are particularly well suited to spatio/temporal data: they allow interesting subsequences to be identified in discrete strings, objects to be individuated in images, subsystems to be defined within complex devices.

On the other side, logical languages, equipped with the two mentioned abstraction operators, will support automatic learning of abstract (aggregate) components and component behaviours for on-line diagnosis and monitoring of complex devices, as well as provide the basis for incremental learning strategies. The integration of abstraction operators in relational learning (ILP) promises to be effective in focalizing its expensive search for good hypotheses.

A number of applications will be used to test the ability of the considered approaches and their proposed extensions:

* Identification of users from their keyboard typying style and mouse mouvements, a useful problem for computer intrusion detection,
* Identification of transcriptional regulatory elements within promoter regions, a relevant problem in DNA analysis,
* Classification of medical images, specifically bilateral mammographies used in tumor mass screening,
* Categorization of color pictures, supporting automatic indexing for content-based retrieval,
* Harmonic analysis of tonal music, a relevant problem for its execution.

In addition, a generator of artificial datasets is available to perform an extensive experimentation under controlled conditions. <<<

Principal Investigator
Lorenza Saitta Università degli Studi del PIEMONTE ORIENTALE "Amedeo Avogadro"-Vercelli
Research Objectives
The project is situated in the field of Data Mining and Knowledge Discovery from temporally/spatially structured data and possibly background knowledge.

The activity of building models from data and/or discovering regularities is driven essentially by two motivations: acquisition of new knowledge, per se, and construction of systems devoted to specific tasks. This project essentially relies on the second motivation; in particular, the task aimed at is classification in a broad sense, including labelling, diagnosis, monitoring, identification, and recognition.

Starting from this perspective, the project targets three main goals:

Goal 1 - Automating the process of extracting knowledge and/or building models of 1D or 2D data (temporally or spatially ordered), possibly with the help of pre-existing knowledge, by using Machine Learning techniques.

Goal 2 - Increasing the scalability of both the knowledge learning step and the performance step using hierarchies, abstraction and background knowledge to cope with the restrictions imposed by limited resources.

Goal 3 - Comparing alternative approaches of model building in search of commonalities, possibly devising a generalized methodology, applicable across different domains.

Many real-world applications generate large amounts of data internally structured according to a temporal and/or spatial ordering. Examples include alarms/events originated in complex devices, web server logs, keystrokes and mouse movement timings, biological and molecular sequences, medical records, workflow process logs, and sensor signals. Even though discovery and analysis of structures inside temporal/spatial data have been the target of intensive studies in statistics and data mining, many successful solutions were ad-hoc (e.g., analysis of biological sequences). Moreover, new emergent issues are requiring a growing attention, and pose new challenges calling for more research. The reasons for these new needs are multifold:

° The amount and complexity of accumulated data transcends the power of current learning and analysis tools.

° Increasingly many applications require real-time answers in response to data analysis. Then decision are to be taken with limited resources (bandwidth, computational power, memory) and incomplete information.

° New challenging applications are emerging: for instance, monitoring remote unmanned premises for security and defence, or checking personal health using wearable sensor-embedded garments.

Most current pattern discovery tools are not well equipped to meet these new challenges, and they must be improved in a variety of ways. First of all, temporal/spatial data may hinder regularities that are visible only at a particular scale. In biological sequence, for instance, complex regulatory subsequences may be constituted by correlated patterns separated by possibly long gaps, and these meaningful subsequences can only be found through a “hierarchical” approach.

On the other hand, raw data are often too rich to be useful, because regularities may be hidden amid a lot of irrelevant details. Then, the useful information may either be masked by irrelevancies (and hence impossible to be seen), or may require an excessive amount of computation (and hence, again impossible to be discovered). In this case “abstracting” data, i.e., throwing away unuseful aspects of the data and keeping only the relevant ones, can be a viable solution to the problem. Typical cases are computer logs, where gigabits of information are stored every day (most of which not being useful to identify an intrusion attempt), or measures coming from a sensor networks (only a small subset of which are often sufficient to discover an anomaly). In other applications there is also the need not only to search for a suitable scale, but to work on several scales at the same time (multi-scale analysis), as in environmental studies starting from geographical data, for example.

Luckily enough, substantial help can be provided by prior knowledge; in fact, structural and functional models of devices may be available, as well as some information on event probability distributions in time and space, or even cognitive hypotheses about human perceptual processing (for instance, “Gestalt” theory). Prior knowledge proved to be very useful in interpreting data and results, in selecting appropriate pieces of information, in guiding learning, all of this contributing both to the effectiveness and to the efficiency of the discovery and use of knowledge.

When handling spatio/temporal data, discovery algorithms usually take advantage of the implicit internal order among structures/events. The assumption of a total order allows a significant limitation of the computational needs in the discovery task. However, in several applications temporally and/or spatially arranged data may come from different sources, and only “partial order” among observed events is guaranteed. Examples of this situation are series of signals coming from different sensors, or logs from computer activities recorded in parallel. In several applications it is impossible to ignore that no total order among events exists, because interesting patterns may only exist across data streams. Again, prior information about relative time and/or space location of events may be available in the form of constraints.

Even if the majority of the research has concerned 1-dimensional (1D) sequences, data requiring modelling in more than one dimension (such as images or videos) are becoming increasingly important. The extension of algorithms to 2D or 3D is not a trivial one. For instance, new “distance” measures are required to cluster images than to cluster sequences. (Distance measures are needed both to discover regularities and to match models to data in a graded manner.) At the same time, formal characterization of the “complexity” of 1D-patterns does not scale immediately and unambiguously to more dimensions. (Complexity measures are useful both to define data predictability and to evaluate the cost of handling data.) Things get worse if multi-scale analysis and multi-resolution modelling are required.

Concerning “Goal 1”, more often than not machine learning is not a choice but a must: the amount and complexity of the data to be handled makes impossible to discover regularities manually. However, both learning and using the knowledge may suffer from severe computational complexity problems, which need to be explicitly addressed. Here comes into play “Goal 2”, which is deeply entangled with the first one. In the development/adaptation of algorithms, the attention will be theoretically and experimentally focused on those choices that reduce the computational needs, hopefully without sacrificing quality of the results. The use of hierarchical and abstract models, the introduction of suitable biases, the exploitation of previous knowledge, and the application of incremental learning, all go towards the direction of building more efficient algorithms. Finally, “Goal 3” goes into the direction of economizing efforts, trying to render as re-usable as possible, across domains, the developed methodologies. Even though complex data of different nature may require specialized algorithms, methodologies share common points (for instance, hierarchy building and abstraction) that can possibly be generalized, at the very least suggesting guidelines for approaching new problems, if not offering ready solutions.

The development of new/augmented algorithms shall be guided by a number of experimental testbeds, both artificial (constructed on purpose to test specific working hypotheses) and natural; among the last ones, biomedical and biometrical sequences, computer logs, music and image archives, and sensor signals will be made available in the project. <<<
Timescale
24 months
National and international background
Dealing with time and space is not new in Artificial Intelligence (AI). Since long time researchers have developed various types of temporal and spatial logics [1,2]. However, those approaches were meant to address theoretical issues concerning reasoning rather than to deal with large masses of data. In Pattern Recognition, also, analysis of time series or signal sequences (like speech) [3], as well as image processing [4] have reached a remarkable degree of sophistication. Most of these approaches, however, typically exploited handcrafted models and knowledge provided by human experts.

More recently, when Machine Learning reached a sufficient maturity to cope with complex data, and, particularly, after the wide spread of Data Mining techniques, temporal and/or spatial data are becoming a primary source for automatic model learning [5]. More specifically, sequences from Molecular Biology (DNA and proteins analysis [6]), computer logs (intrusion detection [7]), Geographical Information Systems (GIS) [8], and images [9] are by now preferred targets for machine learning and knowledge retrieval and discovery.

In AI two broad classes of approaches have been used to represent models of 1-dimensional (1D) or 2-dimensional (2D) data: formal languages (grammars, automata) and logical languages (subsets of predicate logic). In the formal language class we are interested in Hidden Markov Models (HMM), which have proved to be particularly well suited to handle discrete sequences [3]. In the logical class, we are interested in Inductive Logic Programming (ILP) for learning [10], and in logical formalizations able to capture the temporal evolution of Discrete Event systems as used in Model-Based Diagnosis and Monitoring, which is one of the focuses of this project.

HIDDEN MARKOV MODELS

A Hidden Markov Models (HMM) L is a stochastic finite-state automaton [3], with a set Q of states. In each state a symbol from a discrete alphabet V can be emitted according to a state-dependent probability distribution. Another probability distribution describes in which state the process starts. Efficient, dynamic programming algorithms, working in quadratic time in the number |Q| of states [3], exist for learning and matching HMMs. Learning usually consists in handcrafting the structure of the HMM, and then estimating the probabilities involved (the “parameters” of L).

A first problem that emerges with HMMs is that even a quadratic algorithm becomes impractical when |Q| becomes large. One way to overcome this problem is to restrict the possible structures of the model, forcing many of the parameters to be 0. The representation power of the resulting HHM is restricted, but may still be sufficient to cover a broad range of applications. This is the case of Profile Hidden Markov Model [6], widely used in DNA analysis.

A second problem is related to the way finite (not istantaneous) durations of events are modelled. Usually (self-)loops are introduced to this effect [6], but they have two undesirable consequences: the duration lengths can only have an exponentially-decreasing probability distribution (which may be unrealistic in many practical situations), and sparse events, separated by long gaps become difficult to uncover. Hidden Semi-Markov Models [11] and Expanded State Markov Models [12] have been proposed as solutions. Unfortunately, both methods cause a significant increase in complexity.

A method to solve both problems at the same time is to introduce a hierarchy in the HMM, obtaining thus a Hierarchical Hidden Markov Model (HHMM) [13]. In an HHMM the number of parameters to estimate may strogly decrease, because chains of elementary events collapse into a single (composite) event, which can be handled as a single item at a higher level in the hierarchy. On the other hand, HHMMs solve the problem of duration modelling by implicitly combining a Hidden Semi-Markov Model with an Expanded State HMM.

Efficient algorithms for training HHMMs are available. In [13] the Baum-Welch algorithm is extended to the HHMMs. A linear (approximated) algorithm has been derived by mapping a HHMM into a Dynamic Bayesian Network [14]. Finally, a multi-strategy method for incrementally adjusting the structure of an HHMM by adding and deleting states on line in presented in [15], but it requires a non empty initial model to be provided. A fully automated method for learning both structure and parameters of an HHMM is described in [16].

Interesting extensions to HHMMs have been proposed in recent years, namely 2D- and 3D-multiresolution Markov Models [17], suitable to deal with images and video. Multiresolution models (and in particular 2D-HMMs) have been introduced to overcome the over-localization of conventional block-based classification algorithms. A classifier based on the multiresolution model maintains tractability by representing context information hierarchically. Efficient algorithms have been described to estimate parameters of 2D and 3D-multiresolution Markov models [18]. Extending 1D Markov models to 2D-multiresolution ones is not immediate. In fact, due to the increased number of degrees of freedom, there is usually more than one way to perform the extension (for instance, to decide the order in which a grid of states must be traversed).

When matching a model to temporal and/or spatial data, numerical functions are to be used to quantify the degree of match between the model and the data, or the similarity between two patterns. One possibility is to use probability, for instance a Bayesian framework of analysis. However, several other types of non-stochastic similarity/distance have been proposed, both for sequences (e.g., the editing or the Manhattan distances) and for images [4].

To find an adequate distance/similarity measure for a specific problem is still an open problem, because measures tend to be application-dependent. An analogous debate goes on for the measure of pattern complexity [19]: both information-theoretic and non-probabilistic definitions have been introduced, but no one became a received view. In particular, complexity measures with a clear definition for 1D patterns (sequences matched by a 1D HMM) do not generalize well to more dimensions (2D HMM), as more than one generalization is possible [20]. A comparison among alternative measures of similarity and complexity, with respect to the type of patterns they apply to, would be most useful.

LOGICAL APPROACHES

Logical representation of knowledge is pervasive in AI. We will consider here only issues connected to model-based diagnosis and representation of time and space for machine learning, which are the central topics in this project.

The notion of model-based diagnosis emerged as a primary concept in the formal characterization of diagnosis, at the end of the 80's [21]: in order to uncover and identify faults in a system, it may be necessary to provide a model of both its nominal and its faulty behavior. Model-based diagnosis is quite powerful, but the price to be paid for this power is a high computational complexity.

Computational complexity is particularly harmful when the system to be diagnosed generates signals over a finite time interval. In this case, on-line diagnosis/monitoring may be required. One of the best known approaches to on-line diagnosis of discrete event systems is DIAGNOSER [22], which assumes that the system model can be compiled into an automaton; this automaton can efficiently track the system status as observations become available. The main problem with the DIAGNOSER approach is the feasibility of the compilation, since DIAGNOSER can have a number of states exponential in the number of global system states. Another approach for dealing with on-line diagnosis has been proposed in [23], and involves learning discrimination trees that take care also of temporal aspects.

In the Machine Learning landscape, Inductive Logic Programming (ILP) [10] allows logical descriptions of objects to be manipulated and generates human-understandable knowledge. Thus, in principle, ILP represents an interesting approach to handle spatial/temporal relations. Again, ILP representational richness has to be paid with possibly prohibitive computational complexity [24]. A means to partly obviate this drawback it to use strong biases [25]. A way to speed up search is to use randomization and restarting of the search. Recently, it has been shown [26] that restarting may significantly decrease the mean search cost or increase the quality of the learned knowledge in ILP.

Concerning temporal data, learning may require that the adopted framework be able to deal with evolution and change in time. Hence, the development and exploitation of incremental techniques can be a key for success in this setting. A few theoretical works [27] and some learning systems (e.g, [28]) have attempted to deal with the progressive availability of observations in time, which remains, nevertheless, a difficult problem [29,30].

TAMING COMPLEXITY : ABSTRACTION AND HIERARCHIES

If we look more closely at the introduction of a hierarchical structure into a basic HMM, and at the multiresolution representation of signal/images, we recognize that both extensions are nothing else that special types of abstraction mappings. The term "abstraction" has been used in AI in a wide range of contexts [31], but it is not just an AI notion, as it occurs in many disciplines. A broad view of abstraction in different fields is provided in [32].

The common idea underlying abstraction refers to the human capacity to focus on "simpler" descriptions. Various theories have been proposed in the attempt to capture a general notion of abstraction. These theories are either syntactic [33], or semantic [34], but all of them fail to characterize the practical aspects of the abstraction process. This failure to produce a satisfactory general definition suggested to adopt a less ambitious approach, trying to define more limited notions of abstractions, i.e., "practical" theories of abstraction [35,36], each suited to a specific domain.

In Machine Learning, the “constructive induction” approach [37,38], which tries to augment the initial language with newly "invented" predicates [39], is a form of abstraction. These new predicates can be defined in terms of the original ones or suggested by a domain theory.

Abstraction has also been used in model-based diagnosis. One of the main approach, starting from the seminal work by Mozetic [40], involves the use of hierarchical models: a model at a higher level is a more abstract representation of the system in terms of components and number of component behaviors. Several components can be aggregated into a subsystem, and diagnosis is performed at the level of subsystems. In [41] the use of abstraction as a mechanism for improving the understandability of explanations has been analysed. However, some diagnoses can be lost with the introduction of even simple and intuitive abstraction axioms to a domain model [42]. Also, the benefits introduced by an abstract model may strongly depend on the kind of diagnostic algorithm that uses the model [43]. Abstraction mechanisms have also been extended for taking into consideration contextual situations [44].

The possibility of automatically learning hierarchical models at more abstract level is a very important, but also a very challenging task. So far, the number of developed solutions is quite limited. A significant contribution comes from [45], where a method for the automatic abstraction of the behavioral modes of the components of the diagnosed system has been defined. A further step in structural abstraction has been proposed in [46], where structural abstractions have been automatically learned from a detailed description of a system and from its observability. Learning hierarchical and abstract models for systems evolving over time has not yet been approached in the area of monitoring and diagnosis, with the exception of the work [47].

Also in ILP the introduction of abstraction can be beneficial [48]. For instance, abstraction has been used to reduce complexity of learning in really difficult problems [49].

One of the application domain that seems particularly well suited to compare alternative methods is the identification of users from keystroke dynamics and mouse mouvements. The ability to ascertain personal identity through the typing rhythms shown by individuals at the computer keyboards is known as "keystroke analysis". Keystroke dynamics are a form of biometry, like fingerprints, voice and handwritten signatures, and the main applications are in the field of Computer Security, to grant access to controlled resources, or to spot intrusions.

The beginning of keystroke analysis is traditionally traced back to [50]. The study achieved impressive performance, but was quite limited in the number of individuals involved. Over the next years the trend was to lower input requirements from large passages down to short phrases or set of words (e.g., [51]). Later on also keystrokes duration has been incorporated [52] and exploited by a variety of machine learning algorithms. More recently, research has moved toward keystroke analysis of free text, which is a much more challenging task [53,54]. In [16] a Hierarchical Hidden Markov Models has been used to profile typists. This has been one of the first works showing that typing habits can be described in terms of hierarchical models useful to discriminate among individuals. <<<