RESEARCH PROGRAM

Bibliography
[A98]B.Adelberg.NoDoSE.A tool for semi-automatically extracting structured and semistructured data from text documents.SIGMOD98
[ABS04]S.Amer-Yahia,C.Botev,J.Shanmugasundaram.TeXQuery: A Full-Text Search Extension to XQuery.WWW04
[AG03]A.Arasu,H.Garcia-Molina.Extracting Structured Data from Web Pages.SIGMOD03
[AVF+98]S.Abiteboul,V.Vianu,B.Fordham,Y.Yesha.Relational Transducers for Electronic Commerce.PODS98
[BBC+98]P.Bernstein et al.The Asilomar report on database research.1998
[BCD+03]D.Berardi,D.Calvanese,G.De Giacomo,M.Lenzerini,M.Mecella.Automatic Composition of E-Services that Export their Behavior.ICSOC03
[BCG+05]D.Berardi,D.Calvanese,G.De Giacomo,R.Hull,M.Mecella.Automatic composition of transition based semantic web services with messaging.VLDB05
[BFG01]R.Baumgartner,S.Flesca,G.Gottlob.Supervised Wrapper Generation with Lixto. VLDB01
[BFHS03]T.Bultan,X.Fu,R.Hull,J.Su.Conversation specification: a new approach to design and analysis of e-service composition.WWW03
[BLR97]C.Beeri,A.Y.Levy,M.C.Rousset.Rewriting queries using views in description logics.PODS97
[C03]A.Calì.Reasoning in data integration systems: Why LAV and GAV are siblings.ISMIS03
[CC02]A.Calì,D.Calvanese.Optimized querying of integrated data over the Web.EISIC02
[CCDL01]A.Calì,D.Calvanese,G.De Giacomo,M. Lenzerini.Accessing data integration systems through conceptual schemas.ER01
[CDL01a]D.Calvanese,G.De Giacomo,M.Lenzerini.A framework for ontology integration.SWWS01
[CDL01b] D. Calvanese, G. De Giacomo, M. Lenzerini. Ontology of integration and integration of ontologies. Description Logic Workshop 2001
[CDLV00]D.Calvanese,G.De Giacomo,M.Lenzerini,M.Y.Vardi.View-based query processing and constraint satisfaction.LICS00
[CGL+98]D.Calvanese,G.De Giacomo,M.Lenzerini,D.Nardi,R.Rosati.Information integration: Conceptual modeling and reasoning support.CoopIS98
[CHK01]V.Christophides,R.Hull,A.Kumar.Querying and Splicing of XML Workflows.
[CM01]V.Crescenzi,G.Mecca,P.Merialdo.RoadRunner: Towards Automatic Data Extraction from Large Web Sites.VLDB01
[CM04]V.Crescenzi,G.Mecca.Automatic information extraction from large websites. J. of the ACM,51(5),2004
[CS01]F.Casati,M.Shan.Dynamic and adaptive composition of e-services. In Information Systems 26(3),2001
[DCFS04]S.Das,E.I.Chong,G.Eadon,J.Srinivasan.Supporting Ontology-Based Semantic matching in RDBMS.VLDB04
[DDH01]A.Doan,P.Domingos,A.Y.Halevy.Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach.SIGMOD01
[DHM04]X.Dong,A.Y.Halevy,J.Madhavan,E.Nemes,J.Zhang.Similarity Search for Web Services. VLDB04
[DL97]O.M.Duschka,A.Y.Levy.Recursive plans for information gathering. IJCAI97
[DLN05]A.Deutsch,B.Ludascher,A.Nash.Rewriting queries using views with access patterns under integrity constraints.ICDT05
[DR02]H.H.Do,E.Rahm.COMA-A System for Flexible Combination of Schema Matching Approaches.VLDB02.
[DSV04]A.Deutsch,L.Sui,V.Vianu.Specification and verification of data-driven web services.PODS04
[E02]C.M.Eastman.30,000 hits may be better than 300: Precision anomalies in Internet searches. J. ASIST 53,11,2002
[ECJL+99] D.W.Embley, M.D.Campbell, Y.S.Jiang, S.W.Liddle, Y.K.Ng, D.Quass, R.D.Smith. Conceptual-model-based data extraction from multiple-record Web pages. Data Knowl.Eng.99
[F98] D.Freitag. Information extraction from html: Application of a general learning approach. AAAI98
[FBS04]X.Fu,T.Bultan,J.Su.Analysis of interacting BPEL web services.2004.
[FGK02]D.Florescu,A.Gruenhagen,D.Kossmann.XL: A Programming Language for Web Service Specification and Composition.WWW02
[FKMP03]R.Fagin,P.G.Kolaitis,R.J.Miller,L.Popa.Data exchange: Semantics and query answering.ICDT03
[FLM99]M.Friedman,A.Y.Levy,T.Millstein. Navigational plans for data integration.AAAI99
[FLMS99]D.Florescu,A.Y.Levy,I.Manolescu,D.Suciu.Query optimization in the presence of limited access patterns.SIGMOD99
[GLR00]F.Goasdoue,V.Lattes,M.C.Rousset.The use of CARIN language and algorithms for information integration: the picsel system.Int. J. on Cooperative Information Systems,2000
[GRGK97]V.N.Gudivada,V.V.Raghavan,W.I.Grosky,R.Kasanagottu.Information retrieval on the World Wide Web. IEEE Internet Comput.Sept-Oct,1997
[GWG96]S.Gauch,G.Wang,M.Gomez.Profusion: Intelligent fusion from multiple, different search engines. J. Univ. Comput.Sci.2,9,Sept. 1996
[H04]Y.Halevy.Structures, Semantics and Statistics.VLDB04
[HBCS03]R.Hull,M.Benedikt,V.Christophides,J.Su.E-Services: A Look Behind the Curtain.PODS03
[HDIM03]A.Halevy,O.E.A.Doan,Z.Ives,J.Madhavan.Crossing the structure chasm.CIDR03
[J00]B.J.Jansen.The effect of query complexity on web searching results.Inf. Res.,6(1),2000.
[JLVV99]M.Jarke,M.Lenzerini,Y.Vassiliou,P.Vassiliadis,editors.Fundamentals of Data Warehouses.Springer,1999.
[JQC+00]M.Jarke,V.Quix,D.Calvanese,M.Lenzerini,E.Franconi,S.Ligoudistiano,P.Vassiliadis,Y.Vassiliou.Concept based design of data warehouses: The DWQ demonstrators.SIGMOD00
[L02]M.Lenzerini.Data integration: A theoretical perspective.PODS02
[LC00]C.Li,E.Chang.Query planning with limited source capabilities.ICDE00
[LC01]C.Li,E.Chang.On answering queries in the presence of limited access patterns.ICDT01.
[LGMK04]K.Lerman,C.Gazen,S.Minton,C.A.Knoblock.Populating the semantic web.AAAI04 Workshop on Advances in Text Extraction and Mining
[LSK95]A.Y.Levy,D.Srivastava,T.Kirk.Data model and query evaluation in global information systems. J. of Intelligent Information Systems,5,1995.
[LT02]W.Lucas,H.Topi.Form And Function: The Impact Of Query Term And Operator Usage On Web Search Results. J. Asist 53,2,2002
[MBR01]J.Madhavan,P.A.Bernstein,E.Rahm.Generic Schema Matching with Cupid.VLDB01.
[MBR05]J.Madhavan,P.A.Bernstein,A.H.Doan,A.H.Halevy.Corpus-based Schema Matching.ICDE05.
[MGR02]S.Melnik,H.Garcia-Molina,E.Rahm.Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching.ICDE02
[MM03]S.A.McIlraith,D.L.Martin.Bringing semantics to Web Services.IEEE Intelligent Systems,18(1):90-93,2003
[MMK99]I.Muslea,S.Minton,C.A.Knoblock.A hierarchical approach to wrapper induction.Conference on Autonomous Agents 1999.
[MS02]S.A.McIlraith,T.Cao Son.Adapting Golog for composition of Semantic Web services.KR02
[MZ98]T.Milo,S.Zohar.Using Schema Matching to Simplify Heterogeneous Data Translation. VLDB98
[N04]N.F.Noy.Semantic Integration.A Survey Of Ontology-Based Approaches.SIGMOD Record33(4),2004
[NL04]A.Nash,B.Ludascher.Processing first-order queries under limited access patterns.PODS04
[RB01]E.Rahm,P.A.Bernstein.A survey of approaches to automatic schema matching.VLDB J. 10(4),2001
[RSU95]A.Rajaraman,Y.Sagiv,J.D.Ullman.Answering queries using templates with binding patterns.PODS95
[S99]S.Soderland.Learning information extraction rules for semistructured and free text. Mach. Learn.99.
[SDWG95]M.A.Sheldon,A.Duda,R.Weiss,D.K.Gifford.Discover: A resource discovery system based on content routing.WWW95.
[SO00]W.Sadiq,M.Orlowska.Analyzing Process Models Using Graph Reduction.In Information Systems 25(2):2000
[SPW+04]E.Sirin,B.Parsia,D.Wu,J.A.Hendler,D.S.Nau.Htn planning for web service composition using shop2. Journal of Web Semantics,1(4),2004
[SS04]S.Staab,R.Studer(Editors).Handbook on Ontologies, Springer 2004
[TP04]P.Traverso,M.Pistore.Automated composition of semantic web services into executable processes. Semantic Web Conference04
[U97]J.D.Ullman.Information integration using logical views.ICDT97
[UG04]M.Uschold,M.Gruninger. Ontologies and Semantics for Seamless Connectivity. SIGMOD Record 33(4),2004
[VMP04]Y.Velegrakis,R.J.Miller,L.Popa.Preserving mapping consistency under schema changes. VLDB J.13(3),2004
[WGST04]G.Weikum,J.Graupmann,R.Schenkel,M.Theobald.Towards a Statistically Semantic Web.ER2004
[WMB94]I.H.Witten,A.Moffat,T.C.Bell.Managing Gigabytes: Compressing and Indexing Documents and Images.Van Nostrand Reinhold,New York,1994.
[WYDM04]W.Wu,C.T.Yu,A.Doan,W.Meng.An Interactive Clustering-based Approach to Integrating Source Query interfaces on the Deep Web.SIGMOD04
Keywords
WEB SERVICE, WEB SEARCH, JOIN, ONTOLOGIES, WRAPPER

New technologies and tools for the integration of Web search services

Politecnico di Milano
Abstract
The current evolution of the Web is characterized by an increasing number of search engines and query interfaces, ranging from generic ones (Google) to domain-specific ones (geo-localization services or on-line catalogs). Meanwhile, wrapping technology is evolving so as to enable the development of specialized services extracting content from data-intensive Web sites (wrappers of sites delivering bond quotes), and exposing them as Web Services.

While an increasing number of search services are becoming available on the Web, they still work in isolation; their intrinsic limit is the inability to support complex queries ranging over multiple domains. Queries such as “search all vegetarian restaurants close to Milan” require combining search engines specialized in different domains, such as geographic locations and restaurants. The focus of this research proposal is to contribute to the development of a new-generation search engine (NGS) that integrates known services and provides the user with a single interface.

The focus of the project is on technology integration and on the development of new algorithms for matching search requests to independent services. This proposal is not concerned with search engine methods per se, but rather with improving the overall power of search engines by combining techniques from various fields of research (specifically: keyword-driven concept matching, user-driven query optimization, wrapper development and maintenance). The recent availability of service-enabled and XML-related technologies, and the recent possibility of using ontological knowledge in the context of data mapping (as suggested in Semantic Web research), make such a project feasible and timely.

The project has five tasks: (a) infrastructure design, required for registering each search service and describing its local schema in terms of the ontological knowledge; (b) search-time support, for query submission and refinement and for delivering results to the user; (c) wrapper development, focused on automatic wrapper generation; (d) query reformulation, focused on determining, by deduction, the set of services relevant to a given query; and (e) search optimization, building the best strategy for “joining” search engines. The proposed project has a duration of two years and involves about 250 man-months.

The project is proposed by three units with complementary skills; every unit has recognized international experience and brings to the Consortium a technology required by the project, specifically: wrapping technology, deductive technology, and data management technology for the Web.

Principal Investigator
Stefano Ceri, Politecnico di Milano
Research Objectives
The current evolution of the Web is characterized by an increasing number of search engines and query interfaces, ranging from generic ones (Google) to domain-specific ones (geo-localization services or on-line catalogs). Meanwhile, wrapping technology is evolving so as to enable the development of specialized services extracting content from data-intensive Web sites (wrappers of sites delivering bond quotes), and exposing them as Web Services. While each search engine or wrapper interface can be used separately to issue focused queries, their intrinsic limit is the inability to support complex queries ranging over multiple domains. At the current state of the art, such queries can be answered only through the deep involvement of a knowledgeable user, who inspects services one at a time to determine which are relevant to the given request, and then possibly feeds the results of one search as input to the next. However, users do not want to be bothered by distinctions among many heterogeneous data sources, and wish to have a single system for querying them; moreover, while they may accept interacting multiple times when their query is rather complex, they certainly do not want to “cut-and-paste” query results into query inputs, as such an approach is time-consuming and error-prone.

The focus of this research proposal is to contribute to the development of a new-generation search engine (NGS) that integrates known services and provides the user with a single interface. The availability of Web Service and XML-related technologies makes such a project feasible, and we believe that promising results can be achieved within the timeframe of the project. The focus of the project is on technology integration and on the development of new algorithms for matching search requests to independent services. This proposal is not concerned with search engine methods per se, but rather with improving the overall power of distinct search engines by combining techniques from various fields of research (specifically: optimization of ranked queries, wrapper development and maintenance, use of reasoning).

Description of the problem

Our main objective is to develop and study a framework that offers a common interface to several search services specialized in different domains. The following are examples of queries that address orthogonal dimensions.

Query 1: Find a good vegetarian restaurant at approximately 30 miles from Milan.
Query 2: Find all VLDB authors from Politecnico di Torino.


These queries are conjunctions of simpler queries over independent dimensions. The first requires a geographic service and a restaurant service; the second requires combining a Web Service about publications with one about the staff of a Faculty. From a technological viewpoint, the research question can be rephrased as follows: how can we integrate Web Services, query engines, and wrappers in order to enable such complex searches?

Service registration

In order to build a common interface to search services, they must be introduced and semantically described (service registration). We consider a scenario in which the NGS exports a Global Ontology of the concepts relevant to the domain of interest. Each service is described by the local schema of the queries it is able to support and by a mapping from that local schema to the Global Ontology.

The combination of local schema and mapping is necessary in order to describe search capabilities, and will be used, together with the semantic constraints expressed in the Global Ontology, for routing sub-queries to the appropriate service.

Our first research objective is to provide a flexible registration scheme that allows new services to be added; specifically, we use existing schema mappings and the semantic constraints in the Global Ontology to infer new mapping properties when a local schema is added, so as to simplify registration and at the same time make it maximally effective.
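
As a rough illustration of the registration scheme described above, the following Python sketch keeps a registry in which each service declares a local schema and a mapping from its local attributes to concepts of the Global Ontology. All names here (ServiceDescription, ServiceRegistry, the concept strings, the two sample services) are hypothetical and are not taken from the proposal; the inference of new mapping properties is not modelled, and routing is reduced to checking which services cover the concepts of a sub-query.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    """A registered search service (hypothetical structure)."""
    name: str
    local_schema: list[str]      # attributes exposed by the service
    mapping: dict[str, str]      # local attribute -> Global Ontology concept


@dataclass
class ServiceRegistry:
    services: dict[str, ServiceDescription] = field(default_factory=dict)

    def register(self, service: ServiceDescription) -> None:
        # Reject mappings that mention attributes outside the local schema.
        unknown = set(service.mapping) - set(service.local_schema)
        if unknown:
            raise ValueError(f"mapping uses unknown attributes: {sorted(unknown)}")
        self.services[service.name] = service

    def services_for(self, concepts: set[str]) -> list[str]:
        """Names of the services whose mapping covers all given concepts."""
        return sorted(
            s.name for s in self.services.values()
            if concepts <= set(s.mapping.values())
        )


# Two illustrative services (assumed, for the example only).
registry = ServiceRegistry()
registry.register(ServiceDescription(
    name="geo",
    local_schema=["place", "lat", "lon"],
    mapping={"place": "Location", "lat": "Latitude", "lon": "Longitude"},
))
registry.register(ServiceDescription(
    name="restaurants",
    local_schema=["title", "cuisine", "town"],
    mapping={"title": "Restaurant", "cuisine": "Cuisine", "town": "Location"},
))

print(registry.services_for({"Location"}))
print(registry.services_for({"Cuisine", "Location"}))
```

In this toy setting both services can answer a sub-query about locations, while only the restaurant service covers a sub-query that also involves the cuisine.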

Source Wrapping

A relevant portion of the Web consists of data-intensive Web sites, which can be rich and up-to-date information sources. However, since these sites deliver data only as HTML pages, it is not easy to build applications that compute over these sources. To include them in our framework, we need to “wrap” their pages and build a search interface. This process aims at extracting the data from the original pages and organizing it into XML. In order to develop a scalable system, the generation of wrappers should be automatic. Several approaches have been proposed for wrapper generation. However, these approaches are based on formalisms lacking in expressivity, which limits the quality of the data extracted from real-life Web sites.

Our second research objective is to enhance wrapper generation in order to improve the availability of Web-supported data sources, with a Web Service interface that imitates the interface of “conventional” search engines.

Query Specification and Reformulation

Users specify their requests to the NGS system via a query language that allows them to express their goals by referring to concepts of the Global Ontology. To capture the semantics of requests, the language should provide means to combine concepts, similarly to what is done, e.g., in Relational Databases with Conjunctive Queries. The choice of a query language is an important research step.

The query language and the Global Ontology hide the specificity of the services. Thus, for queries such as those above, the query processing component should infer that the search requires the composition of two specific services and should then indicate how the services should be invoked. This context poses an additional difficulty: Web Services cannot be queried freely, but only via their exposed methods.

We also envisage an interaction to guide the user’s query refinement. The system might, e.g., propose to replace query components or suggest the use of certain attributes, perhaps after showing partial results.

Generalizing, the NGS system receives as input a user query and infers the set of relevant services; it then outputs local results, together with information enabling Web Service composition. Our third research objective is thus to “decompose” a user query into a chain of Web Service calls paired by means of ontological knowledge.

Join of two Search Services

The proposed query answering strategy uses as its building block the join of the results produced by two Web Services; such a join takes place between ranked lists of XML result items. Results are returned by search engines in ranking order and in batches, where each batch requires a request-response pair. The “joiner” must evaluate the matching pairs and return them “up” in the chain; results are obtained by composing pairs of entries according to a known composition strategy.

Our fourth research objective is to build a collection of effective and efficient join methods, and strategies for selecting the “best” method; this problem is a variant of the well-known “join optimization problem” of relational databases, but with totally new methods and metrics adapted to Web Services. Methods should output join results in ranking order, serve a batch of user requests as early as possible, and minimize the overall cost of the computation. In many applications such minimization is achieved by reducing the number of request-response pairs.
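
To make the requirements above concrete, here is a minimal Python sketch of one possible join method over two ranked, batched result lists. It is not the method proposed in the project: it is a simple threshold-based rank join, and several points are assumptions made only for illustration, namely that items are (key, score) pairs, that each service returns scores in non-increasing order, that pairs match on equal keys, and that the composition strategy is the sum of the two scores. The helper batched merely simulates a service; all names are hypothetical.

```python
import heapq

NEG_INF, INF = float("-inf"), float("inf")


def batched(items, size):
    """Simulate a service that returns its ranked results in batches."""
    def fetch(batch_no):
        return items[batch_no * size:(batch_no + 1) * size]
    return fetch


def rank_join(fetch_left, fetch_right, k):
    """Return (top-k joined pairs in score order, number of service calls)."""
    fetchers = (fetch_left, fetch_right)
    seen = ([], [])                # items received so far, per side
    next_batch = [0, 0]
    exhausted = [False, False]
    candidates = []                # heap of (-combined score, key)
    results, calls, side = [], 0, 0

    def bound(unseen):
        # Best score a pair could still reach if it uses an item
        # not yet fetched from side `unseen`.
        other = 1 - unseen
        if exhausted[unseen] or (exhausted[other] and not seen[other]):
            return NEG_INF
        if not seen[unseen] or not seen[other]:
            return INF
        return seen[unseen][-1][1] + seen[other][0][1]

    while True:
        threshold = max(bound(0), bound(1))
        # Emit the pairs that no future pair can outrank.
        while candidates and len(results) < k and -candidates[0][0] >= threshold:
            neg_score, key = heapq.heappop(candidates)
            results.append((key, -neg_score))
        if len(results) >= k or threshold == NEG_INF:
            return results, calls
        if exhausted[side]:
            side = 1 - side
        batch = fetchers[side](next_batch[side])   # one request-response pair
        calls += 1
        next_batch[side] += 1
        if not batch:
            exhausted[side] = True
        for key, score in batch:
            for other_key, other_score in seen[1 - side]:
                if key == other_key:
                    heapq.heappush(candidates, (-(score + other_score), key))
            seen[side].append((key, score))
        side = 1 - side                            # alternate between services


# Illustrative data (assumed): integer scores, already ranked.
left_items = [("a", 90), ("b", 80), ("c", 50), ("d", 10)]
right_items = [("c", 95), ("a", 70), ("e", 60), ("b", 20)]

top, calls = rank_join(batched(left_items, 2), batched(right_items, 2), k=1)
print(top, calls)
```

With these inputs the best pair is produced after three request-response pairs, before either list has been read completely; asking for more results forces further calls. Different fetching policies (here simple alternation) trade the number of calls against how early results can be emitted, which is the kind of choice the join optimization is meant to address.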

In conclusion, by achieving the four objectives described above, we develop a new-generation search engine system which exhibits flexibility and user-friendliness. The proposed research will also produce measures of performance at various levels, including precision and recall of users’ queries; expressiveness of the deductive process used for query decomposition; effectiveness and performance of the join methods.
Timescale
24 months
National and international background
The general problem of searching the Web with more powerful tools than current search engines is described in detail in [WGST04]. This is just one further formulation of a problem which has been posed many times, and addressed each time with respect to the technological state of the art. As an example, eight years ago the database community considered the issue in the Asilomar report [BBC+98].

Search Engines and Information Retrieval

Search Engines, among the most sophisticated and useful resources available on the Internet, assist the user in the task of rapidly and effectively navigating the Web. To some extent, the problem of finding information on the Web can be rephrased as the problem of knowing where search engines are, what they are designed to retrieve, and how to use them. Two different types of engines have been developed so far: large-scale and specific search engines. Large-scale engines exemplify the trade-off between breadth and quality, while the specific ones are more likely to quickly focus a search in one particular area.
Information Retrieval systems are software tools which help users in the task of finding documents contained in a specific corpus or database. Such systems are also widely used on the Web, for scholarly as well as recreational purposes. The most popular information retrieval technique involves combining the full text of all documents within a corpus into an inverted index. Search engines commonly rank results with the tf-idf scheme (term frequency times inverse document frequency), which exploits two properties of natural-language text to perform accurate retrieval: how often a term occurs in a document and how rare it is across the corpus [WMB94].
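
The following Python sketch illustrates the two ingredients just mentioned, an inverted index and tf-idf ranking, on a toy corpus. It is a minimal version under stated assumptions: whitespace tokenization, raw term counts as term frequency, and log(N / document frequency) as the inverse document frequency; real engines use many refinements. The documents and function names are invented for the example.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Inverted index: term -> {document id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Rank documents by the sum of tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))   # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy corpus (assumed data).
docs = {
    "d1": "vegetarian restaurant in milan",
    "d2": "restaurant guide",
    "d3": "milan milan hotels",
}
index = build_index(docs)
print(search(index, len(docs), "vegetarian milan"))
```

In this corpus the rare term "vegetarian" contributes more to the score of d1 than the repeated but more common term "milan" contributes to d3, so d1 is ranked first.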
There have been few studies comparing the retrieval results of different search engines using different query formulations [GRGK97, E02, J00]. [GRGK97] presents comparisons using unrelated queries. Lucas and Topi [LT02] use eight search topics from which naive and expert queries were formulated and submitted to various Web search engines to evaluate relevancy. [E02] explores the precision of search engines using a variety of topics and query formulations, noting that precision did not necessarily improve with the use of the advanced query operators.
In the last decade many search tools have been developed.

Matching

The problem of automatically matching schemas has been extensively studied in the literature (see [RB01] and [HDIM03] for surveys). Several approaches have been proposed that try to capture clues about the semantics of the schemas and suggest matches based on them. Such methods include linguistic analysis [MZ98], structural analysis [DDH01, MBR01, MGR02], and previous matching experience [DR02, VMP04]. However, the problem of matching against Web services differs from schema matching in two significant ways. First, the granularity of the search is different: Web service matching should be complete, while schema matching looks for similar components in two given schemas. Second, the schema of a Web service provides less information than a database schema, and therefore traditional schema matching techniques cannot be easily adapted to our context.

In semantic query processing a user specifies the output of a query in terms of concepts familiar to him/her (which may not be the same as the names specified in the database schema) and the system figures out how to produce that output. Many approaches have been proposed to this general problem; recently the use of ontologies [SS04] has gained popularity in this context [DCFS04] and, more generally, in the context of semantic matching [N04, UG04].

A nice introduction to the need for a “statistically semantic” Web has been given in [WGST04]. In particular, it has been shown that the analysis of a large number of structures and mappings in a particular domain is a powerful approach for discovering semantic mappings [H04]. The intuition behind this approach is that statistics computed over a large number of structures can be used to provide hints about the semantics of the symbols used in these structures. Then, these statistics can be leveraged to predict when two symbols, from disparate structures, are meant to represent the same domain concept [DHM04]. In [WYDM04] the authors collectively match a number of related Web forms by clustering their fields. In [DHM04] it is shown how a corpus of Web services can be used to find clusters of parameter names corresponding to concepts. Recently, the corpus-based approach has been applied to the schema matching problem [MBR05].

Wrappers

Wrappers address the need for extracting information from Web sites. The first studies [A98, ECJL+99, F98, MMK99, S99] led to the development of semi-automated systems, capable of extracting information automatically only after a training phase performed with user intervention. Among such systems, Lixto [BFG01] is particularly relevant.
In order to alleviate the need for human intervention, several researchers have developed techniques to automatically infer Web wrappers [CM01, AG03, LGMK04]. These techniques are based on the observation that pages from data-intensive Web sites can be grouped in classes sharing a common structure.
Arasu and Garcia-Molina have proposed an algorithm, called EXALG, for extracting structured data from a collection of Web pages generated by encoding data from a database into a common template [AG03]. To discover the underlying template that generated the pages, EXALG uses so-called Large and Frequently occurring EQuivalence classes (LFEQs), i.e., sets of words that have a similar occurrence pattern in the input pages.
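
The idea of grouping words by their occurrence pattern can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the EXALG algorithm of [AG03]: it assumes that pages are plain whitespace-separated token strings, that two tokens are equivalent when they occur the same number of times in every page, and that a class is kept when it has at least min_size tokens and appears in at least min_support pages. The function name, the thresholds, and the sample pages are all hypothetical.

```python
from collections import defaultdict


def equivalence_classes(pages, min_size=2, min_support=2):
    """Group tokens that share the same per-page occurrence counts."""
    tokenized = [page.split() for page in pages]
    vocabulary = {tok for toks in tokenized for tok in toks}

    groups = defaultdict(list)
    for tok in vocabulary:
        # Occurrence vector: how many times the token appears in each page.
        vector = tuple(toks.count(tok) for toks in tokenized)
        groups[vector].append(tok)

    classes = []
    for vector, toks in groups.items():
        support = sum(1 for count in vector if count > 0)
        if len(toks) >= min_size and support >= min_support:   # large and frequent
            classes.append(sorted(toks))
    return sorted(classes)


# Three pages produced by the same (assumed) template.
pages = [
    "Name : Alice | Age : 30",
    "Name : Bob | Age : 41",
    "Name : Carol | Age : 25",
]
classes = equivalence_classes(pages)
print(classes)
```

On these pages the template words "Name", "Age" and "|" fall into one class because each occurs exactly once per page, whereas the data values occur in a single page and are discarded. The colon occurs twice per page and therefore ends up in a class of its own, which hints at why a realistic algorithm needs more than raw counts.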
Lerman et al. have developed an approach for the automatic extraction and segmentation of records from Web tables [LGMK04]. Their approach relies on a specific pattern that occurs in many Web sites for presenting lists of items: an index page containing a list of short summaries, one for each item, each of which includes a link leading to a page with the details of the specific item. Their approach exploits the redundant information of this pattern and is based on constraint satisfaction problems (CSP) and on probabilistic inference techniques.
RoadRunner is a system for the automatic generation of Web wrappers [CM01] for the extraction of data from data-intensive Web sites, i.e., HTML-based sites with large amounts of data and a fairly regular structure. In these sites, pages are usually generated automatically by scripts that encode data extracted from a back-end database into HTML pages. A nice property of these sites is that pages generated by the same script share a common structure. RoadRunner leverages this feature by exploiting similarities and differences among pages generated by the same script in order to automatically create a wrapper. The core of RoadRunner is Match, an unsupervised algorithm to infer a wrapper from a set of sample pages. Match iteratively refines a wrapper, by parsing pages of the sample set and generalizing the wrapper whenever the parsing process fails. It has been proved that Match can produce exactly one solution in polynomial time for a specific class of languages, called Prefix Mark-up Languages, that nicely abstract the organization of data in HTML pages [CM04].

Query languages for XML and full text search

The W3C (World Wide Web Consortium) promotes two standard textual languages to express XML document transformations and queries over XML data, XSLT and XQuery respectively. The ancestors of XQuery have diverse expressive powers; XSLT and XQuery, instead, are both Turing complete (they both support recursion and composition of user-defined functions). Moreover, XQuery is a fully functional language, in which expressions can be substituted by their results and computations can be decomposed into partial transformations. Therefore, views are supported “by construction”.

One of the key benefits of XML is its ability to represent a combination of structured and unstructured (text) data. Many real XML data repositories already contain such a mix of structured and text data, for example the IEEE INEX data collection, ACM SIGMOD Record, and Shakespeare's plays.

TeXQuery [ABS04] is a powerful full-text search extension to XQuery, which provides a rich set of fully composable full-text search primitives, such as Boolean connectives, phrase matching, proximity distance, stemming and thesauri. TeXQuery also enables users to seamlessly query over both structured and full-text data by embedding TeXQuery primitives in XQuery, and vice versa. Finally, TeXQuery supports a flexible scoring construct that can be used to score query results based on full-text predicates. TeXQuery is also the precursor of the XQuery 1.0 and XPath 2.0 Full-Text language currently being developed by the W3C Full-Text Task Force (FTTF).

Web Services and their Integration

The basic language for describing Web services is WSDL (Web Services Description Language). The composition of Web services to constitute complex conversations is not supported by WSDL, but is the main purpose of several standardization proposals, logically positioned at the top of the so-called Web service standard stack, the most recent of which are BPEL4WS (Business Process Execution Language for Web Services), WSCL (Web Services Conversation Language), and WSCI (Web Service Choreography Interface). Several XML-based languages for encoding workflows have been proposed (see [CHK01] for a discussion). XL [FGK02] is an XML-based programming language that allows both defining and combining services. Among other Web service composition proposals, [CS01] describes E-Flow, developed at HP.
An example of formalization of Web services is offered by [BCD+03]; the article describes a framework defining a set of elementary actions, and a set of Web services whose implementation relies only on these actions. Hull, Benedikt, Christophides, and Su [HBCS03] give a broader view of the Web services scenario, using contributions from the theory of computation to assess which Web service properties can be automatically inferred. Several works provide formalisms and verification procedures for ensuring that a specific conversation complies with a given coordination protocol [AVF+98, SO00].

Several works on automatic composition of Web services are found in the literature [MM03, MS02, SPW+04]. Most results are based on the idea of sequentially composing the available services, which are considered as black boxes, and hence atomically executed. In [BCG+05], services are modelled as Mealy machines equipped with a queue, and they exchange information according to the conversational model [BFHS03]. Other approaches to service composition are based on guarded automata [FBS04], nondeterministic transition systems [TP04], and relational databases with a tree of Web pages [DSV04].

Investigation on Web service integration can take advantage of the large corpus of research carried out in information integration [CDL01b, CDL01a]. For surveys on data integration, see [JLVV99, L02]. Information integration consists in providing the user with a unified view, called global schema or mediated schema, of a heterogeneous set of data sources. Several research works have addressed the fundamental problem of how to specify the correspondence, called mapping, between the global schema (global ontology) and the local information services (local ontologies); see, e.g., [LSK95, CGL+98, GLR00, JQC+00]. For defining the mapping in an appropriate way, the notion of query is crucial, since it is very likely that concepts of the global schema and those of the local services need to be related to each other by making use of queries (i.e., views). This leads us to observe that the problem of information integration is tightly related to the problem of view-based query processing in data integration [BLR97, CDLV00]. Two basic approaches have been used to specify the relation between the sources and the global schema [U97]. The first approach, called global-as-view, requires that the global schema is expressed in terms of the concepts in the sources. The second approach, called local-as-view, requires the global schema to be specified independently from the sources. The relationships between the global schema and the sources are established by defining every source as a view over the global schema. The latter approach favours the extensibility of the integration system, and provides a more appropriate setting for its maintenance: for example, adding a new source to the system only requires providing the definition of the source, and does not necessarily involve changes in the mediated schema. On the contrary, in the global-as-view approach, adding a new source typically requires changing the definition of the concepts in the global schema. Interestingly, approaches that mix the global-as-view and local-as-view perspectives have also been proposed [FLM99, CDL01a, FKMP03, C03]. Besides the complication of dealing with the mapping, in the presence of local and global ontologies specified in an expressive language, reasoning on the ontologies is needed [CCDL01].

Another topic related to Web service integration is that of answering relational queries in the presence of access limitations. Access limitations are present, for example, when a data-intensive Web site offers access to its information through a Web form; in such a case, usually some of the fields need to be filled in with some value in order to query the underlying database. When several such sources are to be integrated, query processing becomes complicated [RSU95, LC00, FLMS99, DL97, NL04, DLN05], since in this case the known techniques for query answering are in general not sufficient. As shown in [RSU95, LC00, LC01], query answering in the presence of access limitations in general requires the evaluation of a recursive query plan, which can be suitably expressed in Datalog. Since source accesses are costly, an important issue is how to minimize the number of accesses to the sources while still being guaranteed to obtain all possible answers to a query. [LC00, LC01] discuss several optimizations that can be made at compile time, during query plan generation. However, the presented techniques are not applicable in the case where user queries and view definitions are arbitrary conjunctive queries. A technique for optimizing query answering for the full class of conjunctive queries is presented in [CC02], together with a runtime optimization technique.
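
The need for a recursive plan can be seen in a small Python sketch. It is a naive fixpoint computation written only for illustration, not one of the optimized techniques of the cited papers: starting from the constants available in the query, it keeps calling every source with every known value of the right abstract domain until no new access is possible. The Source class, the two sources, and their contents are hypothetical.

```python
from itertools import product


class Source:
    """A relation that can be accessed only by binding its input columns."""

    def __init__(self, name, domains, inputs, rows):
        self.name = name
        self.domains = domains    # abstract domain of each column
        self.inputs = inputs      # positions that must be bound in a call
        self._rows = rows         # hidden content, visible only through call()
        self.accesses = 0

    def call(self, binding):
        """Return the rows matching the given values for the input columns."""
        self.accesses += 1
        return [row for row in self._rows
                if all(row[pos] == val for pos, val in zip(self.inputs, binding))]


def extract_all(sources, initial_values):
    """Retrieve every tuple reachable from the initial constants."""
    known = {dom: set(vals) for dom, vals in initial_values.items()}
    facts = {s.name: set() for s in sources}
    tried = set()
    progress = True
    while progress:                       # iterate until a fixpoint is reached
        progress = False
        for s in sources:
            pools = [sorted(known.get(s.domains[pos], set())) for pos in s.inputs]
            for binding in product(*pools):
                if (s.name, binding) in tried:
                    continue
                tried.add((s.name, binding))
                progress = True
                for row in s.call(binding):
                    facts[s.name].add(row)
                    # Values returned by a call may enable new calls.
                    for dom, val in zip(s.domains, row):
                        known.setdefault(dom, set()).add(val)
    return facts


# Assumed data: both sources require their first column as input.
affiliation = Source("affiliation", ("institution", "person"), (0,),
                     [("PoliTo", "Rossi"), ("PoliTo", "Bianchi"), ("PoliMi", "Verdi")])
coauthor = Source("coauthor", ("person", "person"), (0,),
                  [("Rossi", "Verdi"), ("Verdi", "Neri"), ("Gialli", "Blu")])

facts = extract_all([affiliation, coauthor], {"institution": {"PoliTo"}})
print(facts)
print(affiliation.accesses, coauthor.accesses)
```

Here the coauthor source must be called with values that it returned itself (Verdi, then Neri), which is what makes the plan recursive; tuples that cannot be reached from the initial constant are never obtained. The sketch also shows why minimizing accesses matters, since several of the calls return nothing.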

In the project we also expect to benefit from results of research initiatives such as OWL and WSMO. The OWL Web Ontology Language is a language designed for use by applications that need to process the content of information instead of just presenting information to humans. OWL facilitates greater machine interpretability of Web content. WSMO (Web Service Modeling Ontology) describes various aspects related to semantic Web services. WSMO takes the Web Service Modeling Framework (WSMF) as a starting point; it refines and extends this framework and develops a formal ontology and language. WSMF consists of four different main elements for describing semantic Web services: (1) ontologies that provide the terminology used by other elements, (2) goals that define the problems that should be solved by Web services, (3) Web service descriptions that define various aspects of a Web service, and (4) mediators which bypass interoperability problems.
Discover provides both query refinement and query routing to over 500 WAIS sites [SDWG95]. The system suggests modifications to the user’s query so that useless results can be avoided. The MetaCrawler [SE95] project at the University of Washington integrates a set of general Web search engines and dispatches queries to each of them.
The ProFusion system combines many features of the other engines [GWG96].