RESEARCH UNIT
Research program
New technologies and tools for the integration of Web search services
University Co-ordinator
Libera Università di BOLZANO - SCIENZE E TECNOLOGIE INFORMATICHE
Research Unit Leader
Diego Calvanese
Description
The proposed project aims to build the infrastructure for NGS (Next Generation Search) services and is organized in five tasks:
T1: Infrastructure design, focused on the design of an infrastructure for registering Web Services and Web Wrappers. Resources are registered together with their local schema, and their input/output message tags are mapped to the concepts of a Global Ontology.
T2: Search-time support, which supports the execution of user queries by providing facilities for query submission and refinement, support for the join execution strategy, and result materialization.
T3: Wrapper development, enabling the searching and extraction of information from Web data with a Web Service interface that imitates the interface of “conventional” search engines.
T4: Query reformulation, determining the set of services relevant to a user query and the conditions for their pairwise join. Reformulation considers the constraints expressed by the Global Ontology, the local schemas of the services, and the mappings between the Global and the local schemas.
T5: Search optimization, determining the “best” join execution strategy for the XML fragments returned as results by search engines. This task is also responsible for defining the join methods and the performance metrics according to different application scenarios.
T1 and T2 will be jointly conducted by the three operative units. Each of the remaining tasks is performed primarily by one operative unit; this unit is responsible for T4. In the following, we describe T1, T2, and then (extensively) T4.
Task T1: Infrastructure design
All the content sources considered in this project are accessed by means of Web Services. It is reasonable to assume that the result of a Web Service call is an XML document whose structure is not only compliant with the WSDL of the invoked service, but also reflects some appropriate strategy for effectively “representing and publishing” the retrieved piece of information. However, a WSDL interface is purely syntactical and as such inappropriate for composition; this consideration motivates the research into the so-called “semantic Web”, aiming at enriching Web Services with ontological content so as to support arbitrary Web Service composition.
Thus, “registering” one such Web Service means, essentially, describing the conceptual properties representing the content that can be extracted from the service, and then describing the meaning of each “output element” in terms of its tags and in terms of the “typical semantics” of an output element produced by the service.
In particular, during the registration phase, Web Services are mapped to the NGS Global schema, represented by an Ontology, which we assume to be formulated in a W3C standard Ontology language, such as OWL or one of its less expressive variants. This is done by providing links from both the input and output specification of the Service to concepts of the Ontology.
In this research, we propose to define a generic scheme for registering Web Services that enables the storage of meta-data describing: (a) the Web Service syntax (request/response), (b) the semantics of tags in the request, (c) the semantics of tags in the response. Such meta-data are then linked to concepts of the Ontology describing the domain. The important aspect is that, as a result of such registration, it becomes possible to compare the “output elements” produced by two distinct services (or by subsequent invocations of the same service) by going beyond simple equality testing to more complex reasoning tasks, such as subsumption checking between concepts.
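The registration scheme and the subsumption-based comparison can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `ServiceRegistration` record, the toy `IS_A` hierarchy, and the service names and endpoints are hypothetical, and a real system would delegate subsumption checking to an OWL reasoner rather than to a hand-written graph walk.

```python
from dataclasses import dataclass, field

# Hypothetical is-a hierarchy: concept -> direct, more general concepts.
IS_A = {
    "Professor": {"Researcher"},
    "Researcher": {"Person"},
    "Author": {"Person"},
}


def subsumes(general, specific):
    """Return True if `specific` equals `general` or is one of its descendants."""
    stack, seen = [specific], set()
    while stack:
        concept = stack.pop()
        if concept == general:
            return True
        if concept in seen:
            continue
        seen.add(concept)
        stack.extend(IS_A.get(concept, ()))
    return False


@dataclass
class ServiceRegistration:
    name: str
    endpoint: str                                      # (a) syntax: where requests are sent
    request_tags: dict = field(default_factory=dict)   # (b) request tag -> concept
    response_tags: dict = field(default_factory=dict)  # (c) response tag -> concept


def comparable_tags(a, b):
    """Pairs of response tags whose concepts are related by subsumption."""
    pairs = []
    for tag_a, concept_a in a.response_tags.items():
        for tag_b, concept_b in b.response_tags.items():
            if subsumes(concept_a, concept_b) or subsumes(concept_b, concept_a):
                pairs.append((tag_a, tag_b))
    return pairs


# Two hypothetical registrations with different tag vocabularies.
bib = ServiceRegistration("bib", "http://example.org/bib",
                          {"q": "Author"}, {"person": "Researcher"})
staff = ServiceRegistration("staff", "http://example.org/staff",
                            {"dept": "Department"},
                            {"prof": "Professor", "room": "Room"})
matches = comparable_tags(bib, staff)
```

Here the tags `person` and `prof` are reported as comparable although their names and concepts differ, because `Professor` is subsumed by `Researcher`; plain equality testing on tags or concepts would miss this pair.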
The “scalability” of our approach (i.e., ability to support multiple sources) depends on the ease of registration of a new service. Therefore, while in the first phase of the research the emphasis will be on providing effective manual linkage of a few services (e.g., Google, Amazon, DBLP, and some wrapped sources so as to support our test queries), we will next investigate the possibility of semi-automatic support for service registration by tools that rely on automated reasoning capabilities.
To deal with the necessity of adapting and extending the Ontology so as to accommodate new information needs, we will draw from the experience gained in Information Integration, where, when several Data Sources are expected to be added, good scalability is achieved by expressing local schemas as queries over the global schema. Analogously, in the context of NGS, one can, e.g., express the output of a given Service as a query over concepts of the Ontology.
In this context, two different policies for adapting and extending the Ontology can be considered: (i) A Service Driven Extension, occurring when newly registered Web Services refer to or are better represented by concepts which are not yet represented in the Ontology; in this case the Ontology is to be augmented so as to represent the semantic knowledge carried by the Web Service. (ii) A Query Driven Extension, occurring when user queries require information that does not yet have a semantic counterpart in the Ontology; in this case we may consider adding concepts to the Ontology so as to match newly expressed user needs, and subsequently mapping to the Ontology those Services whose output satisfies the newly added concepts.
In this research, we propose to study such policies in the context of NGS, and specifically to investigate how to support them through automated reasoning.
Task T2: Search-time support
This task consists of querying one source and then storing its results for subsequent processing. The query is performed by invoking a suitable Web Service (request part) and then managing the response and retrieving the results according to an interface that enables the partial loading of the first N entries of the result, where N is a parameter established at calling time. This task is trivial if the results are provided as plain XML records and if the Web Service interface allows controlling the number of entries returned as result, as in the cases of Google or Amazon Web Services.
However, in general, Web Services may not have sophisticated control provisions and they may return unbounded amounts of information. In such cases, the task is responsible for making good use of available resources.
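A minimal sketch of such partial loading follows, assuming a hypothetical paged service interface `fetch_page(offset, limit)`; neither this signature nor the `fake_service` stand-in comes from the project. Pages are requested lazily, so no request is issued beyond what is needed to obtain the first N entries, even if the underlying result is very large.

```python
from itertools import islice


def fetch_first_n(fetch_page, n, page_size=10):
    """Return at most n result entries, requesting pages only as needed."""
    def entries():
        offset = 0
        while True:
            page = fetch_page(offset, page_size)
            if not page:          # an empty page means the result is exhausted
                return
            yield from page
            offset += len(page)
    return list(islice(entries(), n))


# Hypothetical stand-in for a remote service holding 25 entries.
def fake_service(offset, limit):
    data = [f"result-{i}" for i in range(25)]
    return data[offset:offset + limit]


first_twelve = fetch_first_n(fake_service, 12, page_size=5)
```

With a page size of 5 and N = 12, only three pages are requested; the remaining entries are never transferred.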
Moreover, this task is also responsible for managing the answers which are provided in formats other than XML records, and for aligning them to the standard format used for later processing of the query, while at the same time keeping the reference from the aligned record to the result returned by the service, which is probably of interest to the user.
An example of this functionality is the transformation required for “reading” a map provided in a graphical format with XML annotations regarding points on the map, and then for managing the semantics of specific queries, such as requesting the extraction of locations which are “within a given distance” from a given point. In such a case, while the query could even be expressed graphically on the map, the system must be able to respond not in terms of a graphic subset of the map, but rather in terms of the items which fall inside that area and represent “locations” (e.g., city names or zip codes), so as to enable the composition of this result with other results. Moreover, “closeness” to a point has to be used for ranking the results before putting them in an XML format which is compatible with the other partial results.
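The map example can be sketched as follows, under several assumptions: annotated points are taken to be (name, latitude, longitude) triples, distance is computed with the haversine great-circle formula, and the output element names `locations` and `location` are hypothetical. The coordinates are approximate and serve only as illustration.

```python
import math
import xml.etree.ElementTree as ET


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def locations_within(points, center, max_km):
    """Names of the points within max_km of center, ranked by closeness."""
    hits = []
    for name, lat, lon in points:
        d = haversine_km(center[0], center[1], lat, lon)
        if d <= max_km:
            hits.append((d, name))
    hits.sort()
    return [name for _, name in hits]


def to_xml(names):
    """Serialize ranked location names into a simple XML fragment."""
    root = ET.Element("locations")
    for rank, name in enumerate(names, start=1):
        ET.SubElement(root, "location", rank=str(rank)).text = name
    return ET.tostring(root, encoding="unicode")


# Approximate coordinates, for illustration only.
points = [
    ("Milano", 45.4642, 9.1900),
    ("Trento", 46.0748, 11.1217),
    ("Bolzano", 46.4983, 11.3548),
    ("Merano", 46.6713, 11.1594),
]
nearby = locations_within(points, center=(46.4983, 11.3548), max_km=60)
fragment = to_xml(nearby)
```

The answer is a ranked list of named locations rather than a cropped image, so it can be joined with the XML results of other services.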
This task is also responsible for capturing the interaction with the users in order to improve the iterative execution of searches. It is well known that the interaction with search engines is typically an iterative process, where users perform several iterations of the search by altering the choice of input terms based upon the results of the previous iteration, until they are satisfied; normally, this process converges to a “better” result. Well known techniques of information retrieval allow the user to further condition the search by indicating, in the results, the elements that are either highly relevant or irrelevant.
We believe that user input may be very useful to improve search strategies, and therefore we plan to spend the final period of the project experimenting with and testing various alternatives for user involvement. We plan to enable users to indicate which retrieved concepts better represent their intended meaning, and we plan to trace these concepts back to the inputs presented to the given search services, so as to repeat such searches with improved input. The “tracing” can either be automatic or be helped by users, by means of suitable interactions. In addition, interaction may be used to confirm conjectural matches (e.g., the matches of concepts such as “professor”, “researcher”, “author”, if not explicitly supported by the vocabulary).
Task T4: Query Reformulation Techniques
Task T4 has the goal of reformulating a user query into a set of queries to be issued to the available services. Furthermore, it will provide additional information to be used for determining the join operations on the component queries that are necessary to reconstruct the correct answer to the original query. In the environment we have in mind, a query is specified by means of a conjunction of terms, each referring to a concept in the Global Ontology.
The Global Ontology itself serves several purposes in the NGS architecture. On the one hand, it presents to the user a unified view of the available information exported by the underlying services, providing the vocabulary in terms of which to formulate requests. This allows users to ignore how the information is actually distributed over the services accessible through the system, so that in their requests they can concentrate on the desired semantic conditions rather than on system-specific aspects. On the other hand, the Global Ontology serves as a target for mapping the information in the local schemas of the available services, thus providing a uniform interface also for the description of the services’ inputs and outputs. We consider mappings of this kind (termed LAV, for Local-as-View, in the context of Data Integration), since they provide great flexibility in a dynamic environment, where new Web Services may repeatedly join the system. Finally, the Global Ontology encodes several kinds of constraints that are known to hold in the application domain, e.g., is-a, part-of, disjointness, or more general relationships between concepts. Such constraints should be considered during the query answering process, since they can contribute to expanding the set of answers. For example, if we assume that the Global Ontology stores the information that each VLDB author does research in databases, then a query asking for database researchers should also return VLDB authors, even if they are not explicitly reported to be database researchers by any service in the system. Hence, the query reformulation algorithm should include in the reformulation of such a query the access to the service that returns the VLDB authors, if such a service is available in the system.
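The VLDB example can be made concrete with a small sketch that expands a query concept along inclusion constraints before selecting services. The concept names, the `SUBSUMED_BY` table, and the service names are hypothetical, and only plain is-a constraints are handled; this illustrates the idea and is not the reformulation algorithm to be developed in the project.

```python
# Hypothetical inclusion constraints: every instance of the key concept
# is also an instance of each listed concept.
SUBSUMED_BY = {
    "VLDBAuthor": {"DatabaseResearcher"},
    "DatabaseResearcher": {"Researcher"},
}

# Hypothetical registry: concept -> services returning instances of it.
SERVICES = {
    "VLDBAuthor": ["vldb_author_service"],
    "DatabaseResearcher": ["db_people_service"],
}


def specializations(concept):
    """The concept itself plus every concept whose instances it contains."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in SUBSUMED_BY.items():
            if child not in result and parents & result:
                result.add(child)
                changed = True
    return result


def relevant_services(concept):
    """Services to access in order to answer a query asking for `concept`."""
    return sorted(
        service
        for c in specializations(concept)
        for service in SERVICES.get(c, [])
    )


plan = relevant_services("DatabaseResearcher")
```

A query for database researchers thus also reaches the service returning VLDB authors, while a query for VLDB authors alone does not access the more general service.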
Each local schema of a service managed by the NGS system specifies, on the one hand, the vocabulary of terms that can be interpreted by the service, and in terms of which a user request that needs to access that service has to be reformulated. On the other hand, the local schema may also introduce access limitations, which are typical of web sources, even when they are wrapped. For example, considering a web service provided by an online bookstore, it will not be possible to ask such a service to return all book-author pairs for which it stores information. Instead, the service may accept a request asking for all books of a given author. Such limitations need to be considered when accessing a service. Also, the reformulation algorithm needs to take them into account in order to avoid generating reformulations that would violate them. Moreover, it would be desirable to generate a query plan that is able to overcome the limitations but still provides meaningful answers.
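One common way to model such access limitations is to mark which arguments of a service must be bound before it can be called. The sketch below assumes this model; the `ServiceCall` record and the service names are hypothetical. It orders the calls of a plan so that every input is bound when its call is issued, and reports when no such order exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceCall:
    name: str
    inputs: tuple    # variables that must be bound before the call
    outputs: tuple   # variables bound by the call


def executable_order(calls, bound):
    """Order calls so that each call's inputs are bound; None if impossible."""
    bound = set(bound)
    remaining = list(calls)
    order = []
    while remaining:
        ready = next((c for c in remaining if set(c.inputs) <= bound), None)
        if ready is None:
            return None   # some call can never have its inputs bound
        order.append(ready)
        bound |= set(ready.outputs)
        remaining.remove(ready)
    return order


# Hypothetical bookstore services with access limitations.
books_by_author = ServiceCall("books_by_author", ("author",), ("title",))
price_by_title = ServiceCall("price_by_title", ("title",), ("price",))

with_author = executable_order([price_by_title, books_by_author], {"author"})
without_author = executable_order([price_by_title, books_by_author], set())
```

When the user supplies an author, the plan is reordered so that titles are obtained before prices are requested; with no bound variable, the request for all book-author pairs is rejected, matching the bookstore example. The greedy choice is sufficient here because binding variables never disables a call that was already executable.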
The development of query reformulation algorithms for the NGS setting requires dealing with all aspects mentioned above, namely the constraints in the Global Ontology, the LAV mappings from the local schemas to the Global Ontology, and the limitations in accessing services. It should be noted that, while partial solutions for the different sub-problems are known, query reformulation under the combination of all the above-mentioned features has not been addressed before. The investigation of this problem will require combining automated reasoning over ontologies with query reformulation techniques.
First, we note that, in the presence of LAV mappings, the query cannot be directly reformulated in terms of the mapped services. However, we can exploit the possibility of transforming LAV mappings into GAV (Global-as-View) mappings, in which each concept of the Global Ontology is directly mapped over the services. This requires introducing suitable constraints in the global schema (here, the Global Ontology) that encode what was originally expressed in LAV and cannot be conveniently stated in GAV. Therefore, the overall set of constraints will eventually contain the constraints that were initially present in the Global Ontology, plus those that were introduced by the LAV-to-GAV transformation. In this respect, the application of techniques for query answering in the presence of constraints (particularly, constraints of this specific kind) will prove useful. Furthermore, LAV rewriting has not been studied in the context of access limitations, and further insight will be gained on the interaction between these two aspects. It is known that query rewriting in the presence of access limitations may generate recursive queries and that, in turn, the problem of answering general recursive queries in the presence of constraints is undecidable. However, it remains to be investigated how recursive queries originating from access limitations, in which recursion occurs in a restricted form, interact with the kinds of constraints present in the NGS setting, and whether this interaction would still lead to undecidability. Possibly, suitable restrictions on the allowed queries and/or constraints will need to be introduced, so as to achieve decidability and efficiency of query processing. Finally, we plan to adapt known techniques for optimizing query plans under access limitations, which exploit the specific structure of the query or of the schemas exposed by the services, so as to also take into account constraints at the global level.
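To see why access limitations lead to recursion, consider a service that can only be called with a bound input and whose outputs are themselves valid inputs. Collecting everything obtainable then requires feeding outputs back as inputs until no new value appears. The sketch below assumes a hypothetical co-author service backed by an in-memory table; it illustrates the source of recursion only and ignores ontology constraints entirely.

```python
def reachable_values(seed_values, service):
    """All values obtainable by repeatedly calling `service` on known values."""
    known = set(seed_values)
    frontier = list(seed_values)
    while frontier:
        value = frontier.pop()
        for new_value in service(value):
            if new_value not in known:
                known.add(new_value)
                frontier.append(new_value)   # outputs become inputs of later calls
    return known


# Hypothetical service: given an author, return that author's co-authors.
COAUTHORS = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": [],
    "D": ["E"],
}


def coauthor_service(author):
    return COAUTHORS.get(author, [])


found = reachable_values({"A"}, coauthor_service)
```

Starting from author `A`, the loop discovers `B` and then `C`, but never `D` or `E`: under access limitations, some stored data may simply be unreachable from the constants available in the query.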
Workplan
The team will contribute to the following deliverables:
D1.1. Technology-oriented state-of-the-art concerning Web Services, including the choice of the Web services to be used in the project and of their ontological domains (M3)
D1.2. Definition of the Web Service Registration platform architecture (M6)
D2.1. Definition of the protocols for web service invocation and for storing partial results (M6)
D4.1. Definition of global/local mapping schemes and definition of their link to a standard domain ontology (M8)
D2.2. First running prototype supporting two sources, simple join methods, and no user interaction (M12)
D4.2. Design of advanced query reformulation algorithms that take into account ontological constraints and access limitations on wrapped services (M18)
D2.3. Second running prototype supporting multiple sources, several join methods, and user interaction (M22)
D1.3. Experimentation and evaluation (M24)



