Vai al contenuto| Home page|

   Ti trovi in: HOME »Programmi, progetti e risultati »I progetti »PRIN - Programmi di ricerca di Rilevante Interesse Nazionale»Programma di ricerca»Unità di ricerca
INIZIO_TESTO_DA_INDICIZZARE

UNITA' DI RICERCA

italiano - english

Research program

Algorithms for the Next Generation Internet and Web: Methodologies, Design, and Experiments
University Co-ordinator
Università di PISA - INFORMATICA - PISA(PI)
Research Unit Leader
Fabrizio LUCCIO
Description
The proposed research is described below for each task of the project involving this Unit. We shall perform the tasks jointly with other research units according to the common framework described in Model A of the project proposal.

Task 1.1 - Models and Algorithmic Paradigms

Phase 1

We will consider the following black hole search problems: Establish which conditions (number of agents and level of network awareness) and limitations can make a harmful host be identifiable in several network topologies. Design new protocols for searching the locations of the black holes in common interconnection networks. Devise strategies to make agents tolerant to the destructive process resident in the harmful host.

Clearly, we want to design efficient solutions for the above problems. Our main measures for evaluating the efficiency of the solutions are the number of agents, and the total number of moves performed by the agents. The proposed approaches to solve the problems are based on fault-tolerant distributed computing and load-balancing in mobile computing. Alternative approaches can introduce a finer tuning of the parameters in the proposed model, such as the degree of the damage in a harmful host.

Expected results:
Design of efficient algorithms to locate the black hole in several topologies.

Phase 2

In the second phase of the project, we aim at studying what kind of tasks the agents con accomplish if they are aware of the presence of a black hole in the network. In particular, we want to analyze some of the basic tasks mobile agents usually perform in a distributed environment, and analyze how the presence of the black hole affects the success of their operations. One of these task is, for instance, the rendezvous: it consists in having all the agents gather at the same node; upon arriving there, each agent terminally sets its variable to arrived; there is no a priori restriction on which node will become the rendezvous point. To our knowledge, all previous investigations assume synchronous agents, and this assumption is crucial for the correctness of these solutions; as already mentioned, in our work the network is populated by totally asynchronous agents.

Expected results:
Design of efficient algorithms to allow the agents to solve common tasks in presence of a black hole.

Task 2.1 - Algorithms for Internet Routing

Phase 1

IP lookup and classification in routing: We plan to study new packet classification and IP lookup data structures. The objective is to use small amounts of memory (so as to fit in fast cache memory) and to allow fast queries and fast updates, possibly exploring knowledge from strings algorithms and computational geometry.

Algorithms for shortest path routing: After studying the state-of-art in fault tolerant persistent routing tables, we will study the problem of the distributed computation of 1-fault tolerant routing tables. In particular, we compute the best swap edge for each failing edge (u,v) which minizes the distance from u to the root of SPT. We expect to obtain an efficient distributed algorithm, that should not be more expensive than the distibuted construction of the "all shortest path trees", and the new routing algorithm that the message have to follow in presence of fault. For this problem no distributed algorithms, suitable for a distributed system as Internet, exist.

Expected results:
Efficient distributed algorithms to compute swap edges minimizing the path to the root of SPT.

Phase 2 IP lookup and classification in routing: We plan to collect real data for routers taken from heterogeneous data bases available in Internet. We then simulate and experiments our lookup data structures by comparing the experimental results to those predicted by simulation.

Algorithms for shortest path routing: In the second phase of the project, we plan to study more complicate problems in which the selection of the swap edge is not function of a single node as before, but of all the nodes in the disconnected subtree. For instance, we might ask to select the swap edge so that the sum of the distances to the root of the SPT is minimized. Another interesting reasearch problem regards the possibility of choosing the swap edge so that the diameter of the new tree is minimum. We would like to obtain distributed solution with the same information complexity.

Expected results:
Fast IP routing and classification schemes.

Task 3.1 - Algorithms for Web Searching

Phase 1

Mining of frequent/interesting patterns: In order to overcome the possible exponential size of the output, we concentrate on maximal patterns (according to a suitable notion of maximality) and, among them, on those whose occurrences cannot be recovered from those of others. Tractability results on such subsets already exist for the case of a single input sequence when the goal is to detect overrepresented patterns within it. When dealing with patterns that are regular expressions, a lot of recent work has been dedicated to approximate regular expression search (e.g. [Myers92]) and some efforts (e.g. [CGR03]) have been made in the direction of indexing regular expression databases for the subsequent efficient answering of queries. We plan to investigate some more powerful indices that can exploit the power of restricted classes of regular expressions.

Novel IR-tools for mining Web data: We aim at studying the web snippet clustering by deploying our knowledge on indexing data structures and algorithms for information retrieval. In order to compete with Vivisimo, our algorithmic solution should have the following specialties, missing in the other scientific approaches: (1) the sentences labeling the hierarchy of clusters should have variable length, (2) they should not be substrings of the snippets, and (3) the clusters should be overlapping and labeled in a meaningful way.

Text indexing and compression: In addition to carrying on an analysis and validation of the techniques for compressed text indexing in [FM00,GV00], we plan to perform experiments to test their performance in the practical setting. Also, we want to set up publicly available software libraries for data indexing, compressing and mining in order to bridge the gap between theory and practice. In particular we aim at deploying our deep knowledge in the field of indexing compressed data to the case of XML files. Here compression and indexing may play a crucial role being XML files much large, redundant and structured. Actually we aim at extending the functionalities and properties of the FM-index to this kind of data. We also want to design new data structures for storing strings, such as XML paths, URLs and IP addresses.

Expected results:
Efficient methods for using patterns to detect text similarity.
Web-snippet clustering engine based on commodity components.

Phase 2

Mining of frequent/interesting patterns: We plan to focus on applications on intrusion detection for security in Internet systems. These can be individuated by detecting repeated patterns (corresponding to intrusion attempts) scattered in the log files of the system transactions. In this case the patterns present a structure according to being scattered as mentioned above.

Novel IR-tools for mining Web data: We aim at engineering the solution devised in Phase I and propose some mathematical and "humanly-based" models of comparison of the obtained results (here semantic affinity of the items in the clusters may play a crucial role). Our results should be compared against Vivisimo, the best known available solution for that problem.

Text indexing and compression: We provide a new general boosting technique for Textual Data Compression. Qualitatively, it takes a good compression algorithm and turns it into an algorithm with a better compression performance guarantee. More precisely, it can turn, in optimal linear time, any memoryless compressor into a compressor that uses the ``best possible'' contexts. Technically, our boosting technique builds upon three main ingredients: the Burrows-Wheeler Transform (BWT), the Suffix Tree data structure, and a greedy algorithm to process them. Further theoretical studies and experiments on the boosting technique are required to establish whether it is appealing also in the real setting. We also plan to investigate new techniques for managing arbitrarily long strings in the cache-oblivious model. These investigations will be accompanied by experimental tests on real text collections.

Expected results:
Efficient data structures for (text) indexing.
Booster tool for compressing data.