Vai al contenuto| Home page|

   Ti trovi in: HOME »Programmi, progetti e risultati »I progetti »PRIN - Programmi di ricerca di Rilevante Interesse Nazionale»Programma di ricerca»Unità di ricerca
INIZIO_TESTO_DA_INDICIZZARE

UNITA' DI RICERCA

italiano - english

Research program

Algorithms for the Next Generation Internet and Web: Methodologies, Design, and Experiments
University Co-ordinator
Università degli Studi di PADOVA - INGEGNERIA DELL'INFORMAZIONE - PADOVA(PD)
Research Unit Leader
Gianfranco BILARDI
Description
Below, we describe the research plan for each task of the project in which this research unit is involved. Task 1.1 Models and Algorithmic ParadigmsGeneral goal: Our research within this task aims at the development of a framework for the design of parametrized algorithms that can adapt to run efficiently on a variety of platforms. Motivation: On modern computing platforms, algorithmic efficiency crucially depends on a proper exploitation of the capabilities of the memory hierarchy and of the communication network. Such capabilities vary considerably across platforms. Adaptive algorithms have built-in parameters whose judicious choice allows them to match the structure of the platform and best harness its computational potential. The goal is to reduce software development and tuning costs while retaining the ability to effectively use the available computing infrastructure. Adaptive algorithms are particularly attractive in a computational grid environment, where machine cycles may be delivered on demand, on whichever platform happens to be available at the time of the request. Context: To put the theme into perspective, it is useful to review a number of concrete examples of parameters that can play a role in algorithm/code adaptation. (a) A switch selecting between algorithmic strategies, e.g., between mergesort and quicksort. Such a switch tends to have global consequences on all aspects of performance. (b) An input-size threshold below which a recursive algorithm switches to a "direct" method. This kind of parameter typically affects the number of instructions. (c) The number of iterations that are unrolled in a loop. Different values of this parameter may result in different register allocation and instruction scheduling on the part of the compiler. (d) Parameters that globally regulate the schedule of operations, such as the size of blocks in a matrix computation. These parameters impact in particular the temporal locality properties; quantitatively speaking, it alters the stack-distance distribution of memory accesses. (e) Parameters that regulate the data layout. An example is the choice between row-major and column-major ordering for two-dimensional arrays, which affects space locality. Another example is the memory offset between data structures (typically obtained by padding) which affects the cache misses due to limited associativity. Problem Structure: There are two major stages in the development of an adaptive package: (1) Parametrization: the identification of a suitable vector of parameters, P, which, by varying in a suitable range, realizes a spectrum of tradeoffs among different machine-resource requirements. (2) Tuning: given a target machine M, the choice of a specific value P* for the parameter vector, with the objective to minimize the running time T(M,P). In most of the existing packages, algorithm parametrization and tuning are pursued in an ad hoc, problem dependent manner. Our efforts will be directed at the development of a framework that can guide both the parametrization stage and the tuning stage. Given the complexity of the problem, realistically, we can expect to take some steps in this direction rather than providing a complete solution.PROPOSED APPROACH to PARAMETRIZATION. We propose to develop a parametric model of computation where the model parameters reflect relevant architectural features. Then, algorithm design can aim at optimizing performance as a function of these features, which become algorithmic parameters. For example, the block size in a matrix computation can be chosen as a function of cache size, if the latter is a parameter of the model. The advantage of this approach is that it enforces an explicit link between algorithmic parameters and architectural ones. While highly desirable, this may not be always feasible. For example, the memory displacement between data structures, although known to affect conflict misses, is hard to formulate in terms of the number of associativity classes of the cache. Hence, in general, we will assume that the parameter vector can be decomposed as P=(P_AI,P_AU), where P_AI are the architecturally interpreted parameters and P_AU are the architecturally uninterpreted ones. As the framework evolves, the boundary between these components can change. In developing the architecturally parametrized model of computation, we plan to proceed in two phases. Phase 1: We will begin by constructing a combination of known models, already widely used in the algorithm design literature. Specifically, we aim at including the following features: - different access time for different levels of the memory hierarchy [AA+97]; - block transfer capability for both internal and external memory [VS94]; - pipelinability of irregular accesses [BEP01,BEP02]; - number of processors and bandwidth/latency properties of submachine clusters [BF+01]. Phase 2: In a second phase, we will explore avenues to extend the model in order to capture further relevant aspects such as register renaming; number, type, and latency of functional units; degree of simultaneous multithreading; finite cache associativity; branch predictors; automatic prefetching (typically, on arithmetic progressions of addresses), structure of interconnection topologies for multiprocessors (beyond bandwidth/latency properties).PROPOSED APPROACH to TUNING. Two diametrically opposed approaches can be considered for the choice of parameter values on a given platform. (1) Empirical search: for each P in a suitable subset of the parameter space, the running time T(M,P) is measured by actually executing the program on M; the value P* that yields the smallest T is chosen. (2) Analytical search: a model is used to obtain an estimate T_est(M,P) as an analytically formulated function of P, which is minimized to yield the choice P*. Empirical search, currently the method of choice in most packages, can be quite expensive due to generally large parameter spaces and to the non negligible time required to measure T(M,P) at a point P. Time constraints will inevitably impose resorting to heuristic search, which in turn leads to a suboptimal parameter choices. The advantage is that it can be implemented even when there is very little understanding of how parameters affect performance. Analytical search, by itself, will also lead to suboptimal solutions both because of a model-induced gap (not all facets of the real machine can be captured in a tractable model) and an analysis gap (even within the model, only an estimate of T is usually obtainable). The potential advantage lies in a substantial reduction of the tuning effort. We plan to explore a combined approach, using analytical techniques to identify a small region around the optimal parameter value and then empirically searching that region. Intuitively, this approach will benefit both search time and quality of solution. In particular, we expect analytical techniques to be very effective in handling the P_AI part of the parameters, whereas empirical search will probably be crucial for the P_AU part in addition to be of help in refining the search for the optimal value of P_AI. We will test this intuition experimentally. We plan to organize the activity in the following two phases: Phase 1: Microbenchmarking. The goal is to develop a suite of microbenchmarks for the automatic extraction of architectural parameters from the target platform, to be fed to the model for the analytical estimate of algorithm parameters. Here, we will build on our own past experience as well as on the efforts of other groups with which we are actively collaborating (in particular, the X-Ray project at Cornell University). Phase 2: Performance Modeling. The goal is to develop accurate timing functions, using code metrics measurable through the hardware performance counters. We will extend a methodology already developed in [AP+00] and successfully applied to shared memory symmetric multiprocessors. CASE STUDIES. We plan a few case studies, pursuing parametrization and tuning of algorithms for specific problems. This process will provide a testbed as well as valuable feedback for shaping the general framework. Selected Dynamic Programming Algorithms. Their choice is motivated both by their key role in several application areas (voice recognition, optimal control, computational biology) and by the fact that they are largely unexplored in the parametrized context, hence have a chance to expose new phenomena. Linear System Solvers for Finite Element Methods. Their choice is motivated by their key role in the solution of PDEs in many areas of science and engineering as well as by a consolidated experience of the research group, developed over a five-year collaboration with structural engineers, on multiphysics models for a number of application domains.Expected partial results of Phase 11. An Architecturally Parametrized Model of Computation (APMC).2. A microbenchmarking infrastructure.3. Parametrized version of dynamic programming and linear system solver algorithms.Expected partial results of Phase 21. Enhancements to the APMC.2. Timing functions to guide parameter tuning.3. Experimental evaluation, on 3 different hardware platforms, of the parametrized algorithms developed as case studies.Task 3.1: Algorithms for Web Searching The wealth of information increasingly available on the Web and in other repositories of digital content has recently triggered a considerable effort aimed at the development of efficient algorithms for several challenging data mining problems. Among the key problems in this field is the discovery of frequent patterns in large datasets, such as collections of transactions or sequences. The problem arises in such diverse fields as weblog and web data analysis, market/customer analysis, bioinformatics, among the others. More precisely, the problem requires to determine all those patterns that occur in a number of input objects with a frequency greater than or equal to a specified fixed threshold. For many practical applications, however, finding a significant threshold value that yields the most interesting/useful output is extremely difficult: too low a threshold could provide a huge number of scarcely relevant patterns, while a high threshold would likely exclude important ones. Threshold sensitivity can be reduced by combining frequency with other `interestingness' criteria, which help guide the search only towards the most relevant patterns. This idea has given rise to mining approaches such as the discovery of maximal, closed or top-k frequent patterns. A number of algorithms for the above mentioned problems has been proposed in recent years (see Section 2.4 for a number of relevant references). These algorithms feature a variety of alternative strategies (e.g., breadth-first vs depth-first exploration) and employ a wealth of data structures (e.g., lists, bit vectors, tries). Often, the choice of a particular strategy/data structure depends on the specific characteristics of the input dataset which are described only in qualitative terms and estimated through simple heuristics. In all the works in the literature, the relative performance of different algorithms is assessed exclusively through experiments by comparing sheer running times on a small number of benchmark datasets. As a result, the outcome of the comparisons may vary according to the specific implementations developed for the algorithms, the platforms they are run on, and the datasets employed. Moreover, in most cases the selected datasets are of small to medium size, unlike those arising in many real applications, hence the influence of accesses to the lower levels of the memory hierarchy (such as external memory) is not properly evaluated. PHASE 1: In the first year, we plan to carry out a detailed study of the most relevant approaches recently proposed for mining frequent patterns satisfying additional interestingness criteria. Our goal is the development of a rigorous parametrized framework to assess the performance of these mining algorithms with respect to uniform and quantitative metrics of performance. Our aim is to obtain a more objective evaluation of the proposed algorithms than what is avaliable in today's literature. In particular, we aim at identifying both algorithmic and architectural parameters that have a strong influence on the algorithm's time and space requirements, and at combining these parameters into effective cost models. We will also set up an experimental testbed to validate the framework on a number of algorithms selected among the most promising ones proposed in the literature. Expected partial results of Phase 1 1. Development of a parametrized framework to assess the performance of algorithms for mining frequent/interesting patterns. 2. Set up of an experimental testbed and validation of the framework on prominent case studies. PHASE 2: In the second year, we plan to develop external-memory algorithms for mining frequent/interesting patterns. The efficiency of these algorithms crucially relies on the adoption of compact data structures to hold (projections of) the input dataset and candidate patterns generated along the way, and the use of pruning strategies to avoid an exhaustive search of the exponentially large pattern space. These issues become particularly challenging when other interstingness measures are combined with frequency, since the management of intermediate data, when kept out-of-core, could easily become a performance bottleneck due to frequent disk accesses. The work done in the first phase will help us identify the most effective pattern generation and candidate-pruning strategies. Moreover, we will devote particular attention to the selection of suitable compressed data structures and to performance optimization with respect to large input sizes, which force the data structures to reside on external memory. To this purpose we will investigate whether the use of compressed tries, which was shown to be very effective for mining frequent itemsets in main memory, can also be beneficial in the external memory setting. Expected partial results of Phase 2 1. External-memory algorithms for mining interesting/frequent patterns in large datasets.Interaction among Tasks/Units. The work of this unit on parametrized algorithms (within Task 1.1) ill benefit from interactions with several other parts of the project, especially those devoted to the design of algorithms for both the internal and the external memory hierarchy, and those with a significant algorithm engineering component.The data mining research (within Task 3.1) will on the one hand draw from the general parametrization principles explored in Task 1.1 and on the other interact with aspects developed by the other units under the same task, in particular, with those working on compressed text indexing tools and data structures for storing strings.