RESEARCH PROGRAM
Research Units
Similar research programs:
- 1 - Linguistic Atlas of Sicily. The Geolinguistic-Anthropological Dimension in Sicily: Cartographic Representation, Diatopic Lexicography, Iconic Representation, Archives and Data Warehouse
- 2 - Learning Hierarchical, Abstract Models from Temporal or Spatial Data
- 3 - Parlare italiano: theoretical and applied linguistic proposals.
- 4 - Cryptographic databases
- 5 - The geomatics in support of the actions of Government of the territory
- 6 - Parlare italiano: an observatory of linguistic usages.
- 7 - Web Ram: Web Retrieval and Mining
- 8 - The Third Greece and the West
- 9 - Integrated Methodologies for the Survey, Drawing, Modelling of Urban and Architectural Environment.
- 10 - New technologies and tools for the integration of Web search services
Scientific and education field classification
Geographical classification
- Region: Sicilia
Keywords
GEOLINGUISTICS; ATLAS; SICILY; DATABASE
Informatics and geolinguistic research: ALS: micro-areal atlases and database use
Università degli Studi di Palermo
Abstract
The ALS working group is simultaneously developing several components, conceived as integrated, dynamic work sites open to the disciplines that combine linguistics and computer technology (corpus linguistics, data banks, computational linguistics, etc.). The programme as a whole cannot disregard the computerised design of the entire geolinguistic project, nor the creation of a large digital archive of spoken dialects and of regional Italian.
The current perspective is oriented towards structuring different areal or sectoral micro-atlases to which these new approaches will be applied. The phonetic, iconic and photographic data will undergo complete computerised standardization. Automatic, data-driven language model construction techniques will be studied to enhance the system's capabilities.
Principal Investigator
Giovanni RUFFINO Università degli Studi di PALERMO
Research Objectives
The aim of the ALS working group is to develop simultaneously several components, conceived as integrated, dynamic work sites open to the disciplines that combine linguistics and computer technology (corpus linguistics, data banks, computational linguistics, etc.). The programme as a whole cannot disregard the computerised design of the entire geolinguistic project, nor the creation of a large digital archive of spoken dialects and of regional Italian.
The current perspective is oriented towards structuring different areal or sectoral micro-atlases to which these new approaches will be applied. The phonetic, iconic and photographic data will undergo complete computerised standardization. Automatic, data-driven language model construction techniques will be studied to enhance the system's capabilities.
Ethnic-dialectal section.
The exploration and geolinguistic documentation of the island's traditional lexicon and ethnographic practices will concentrate on the following aims:
Children's games
Completion of the module, using the large amount of material already gathered at seventy survey points.
Lexicon of food culture
The aim is to treat the ethno-dialectal data (lexicon, procedures, traditional food rules) by relating them to the new cultural contexts, to emerging "tastes", and to new rules and practices.
As for editorial output, a first module on the bread lexicon is to be produced (including material associated with "schiacciate e focacce"), organized as a cartographic series of particularly productive concepts and a dedicated lexicon.
Anthroponomastic micro-atlas
The aim is to gather a repertoire of popular anthroponyms at 100 survey points. It will be published as a series of typological maps and as a lexicon of popular anthroponyms.
Interpretative parameters that consider creative and motivational processes in a spatial dimension will be applied to both (qualitative and quantitative variation of the anthroponomastic lexicon across the different sub-areas), going beyond a simple alphabetical or semantic presentation. The material will be classified according to the distinction between connotative and denotative anthroponyms.
SOCIOLINGUISTIC SECTION
The analysis in this sector will concentrate on single areas or on single portions of the questionnaire, in order to verify the usefulness of the database (BD) maps for complex materials and of the project design as a whole. We will concentrate on two areas that pose different problems.
Micro-atlas of the metropolitan area of Palermo.
The aim is to verify how the selected areas react to the models of linguistic behaviour coming from the regional capital, and to see whether they are receptive to linguistic novelty from the city or adopt defensive strategies. Important in this spatial perspective is the interplay between the linguistic dynamics observed inside the city and those outside it, which are driven by the city acting as a centre of irradiation.
Microatlas of the Madonie
The points covered by the areal atlas are: Gangi, Geraci, Caltavuturo, Cefalù, Sclafani Bagni and Scillato. This area lends itself to an organic treatment. In particular, the materials allow a cross-comparison between the perceptions of the speakers, who identify with their own town and, more generally, with the Madonie area, and their actual linguistic production, which shows some dynamic changes.
Syntactical micro atlas
The aim is to analyse the translations from Sicilian into Italian and to show that pan-Italian constructions are often reinterpreted and extended by the speakers. In this perspective, the current labelling system will be checked repeatedly, while in a second phase a first quantitative interpretation of the phenomena will be made, correlating them with the quality of the input and with the interactional dynamics that produced them.
Regional Italian
The final goal of the module is to use the gathered data in order to: delineate the areas in which innovations are spreading; identify the irradiation centres and the classes of speakers (young people, educated people) that lead and spread linguistic change; trace the directions and possible outcomes of inter-linguistic convergence; build a vast repertoire of the systematic forms of contact between Italian and dialect; observe whether some lexemes have undergone resemantization or a partial change of their previous semantic features.
Informatic standardization
The aims are:
1) The finalization of prototype software for the representation of diatopic (geographic) and diastratic (social) data.
2) The testing and demonstration of the multimedia products developed within the working group.
3) The publication of the results obtained in the ALS, including an analysis of the future prospects for the permanent preservation of Sicily's linguistic and cultural heritage, for the technical innovation of its formats of representation, and for the definition of a common protocol for network distribution.
Automatic language model construction
The activity will be organized around the following aims:
1) segmentation of speech signals and determination of the classification probabilities for six basic phonetic classes, namely vowels, nasals, semivowels, liquids, fricatives, and plosives;
2) investigation of sub-symbolic methodologies for automatic, data-driven language model construction. Modules will be developed that can be easily integrated into standard, state-of-the-art toolkits for speech recognition, such as HTK or SPHINX. To reach this objective, a robust estimator will be designed to efficiently extract a parametric estimate of the corpus population;
3) design of systems for complex query generation, execution, and result visualization. The queries will be executed on the ALS corpus. Algorithms will be studied to enable semantic querying: starting from a user-provided set of keywords, the system will automatically "expand" the original query by adding other terms that are conceptually or semantically correlated.
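The query expansion described in aim 3 can be illustrated with a minimal sketch. It assumes one simple technique, co-occurrence counting, which the programme does not name; the function `expand_query` and the toy documents are hypothetical and are not taken from the ALS corpus.

```python
from collections import Counter


def expand_query(keywords, documents, top_n=3):
    """Append the terms that most often co-occur with the query keywords.

    Co-occurrence counting is an assumed, illustrative technique, not
    necessarily the method studied in the project.
    """
    query = {k.lower() for k in keywords}
    cooccurrence = Counter()
    for text in documents:
        tokens = set(text.lower().split())
        # Only documents that contain at least one query term contribute.
        if tokens & query:
            cooccurrence.update(tokens - query)
    # Rank by descending count, then alphabetically for a stable order.
    ranked = sorted(cooccurrence.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(keywords) + [term for term, _ in ranked[:top_n]]


# Made-up toy documents, for illustration only.
docs = [
    "pane forno farina",
    "pane forno lievito",
    "gioco trottola bambini",
]
expanded = expand_query(["pane"], docs, top_n=2)
print(expanded)
```

A real system would also need stop-word filtering and a better association measure than raw counts; the sketch only shows the shape of the keyword-to-expanded-query step.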
Timescale
24 months
National and international background
The design of the Linguistic Atlas of Sicily has been established for many years, thanks to systematic comparison with, and the active contribution of, the national and international scientific community, which has worked intensively on variationist atlas studies covering the Romance area and other areas. The most recent stage of the theoretical and methodological debate, and the most useful exchange of research experience, took place in Palermo during the seminar "Percorsi di geografia linguistica. Esperienze italiane e europee" (23-24 March 2005), to which all the experts currently involved in linguistic geography were invited: Lorenzo Massobrio (Turin University), Atlante Linguistico Italiano (ALI); Tullio Telmon and Sabina Canobbio (Turin University), Atlante Linguistico ed Etnografico del Piemonte Occidentale (ALEPO); Franco Lurà (Centro di dialettologia e di etnografia, Bellinzona), Lessico dialettale della Svizzera italiana (LSI); Annalisa Nesi and Giuliano Giannelli (Siena University), Atlante Lessicale Toscano (ALT); Carla Marcato (Udine University), Atlante Storico Linguistico del Friuli (ASLEF); Alberto Sobrero (Lecce University), Nuovo Atlante dei Dialetti e dell'Italiano Regionale (NADIR); Saverio Favre (Aosta University) and Federica Diémoz (Neuchâtel), Atlas des Patois Valdôtains (APV); Edgar Radtke (Heidelberg University), Atlante Linguistico della Campania (AlCam); John B. Trumper and Marta Maddalon (Cosenza University), Atlante Linguistico Etnografico della Calabria (ALECAL); Roland Bauer and Hans Goebl (Salzburg University), Atlante del Ladino centrale e dei Dialetti limitrofi (ALD); Thomas Krefeld (Munich University), Atlante sintattico dell'italiano meridionale: Calabria (AsiCa); Immacolata Tempesta and Alessandra Schena (Lecce), Archivio Pugliese Linguistico Informatico (APLI); Manuel González (Santiago de Compostela University), Atlas Lingüístico Galego (ALG); Pilar Garcia Mouton (Madrid University), Atlas Lingüístico de Castilla-La Mancha (AleCman); Andres Kristol and Federica Diémoz (Neuchâtel University), Atlas Linguistique Audiovisuel du Francoprovençal Valaisan (ALAVAL); Harald Thun (Kiel), Atlas Lingüístico Guaraní-Románico (ALGR) and Atlas lingüístico Diatópico y Diastrático del Uruguay (ADDU); Michel Contini (Grenoble University), Atlante Linguistico Romanzo (Alir).
Profitable contacts were also made at two other conferences in Palermo, attended by many Italian and foreign experts: "Parlare oggi. Dinamiche linguistiche nell'Italia contemporanea" (25 February-2 March 2002) and "Gli italiani e la lingua. A quarant'anni dalla pubblicazione della Storia linguistica dell'Italia unita di Tullio De Mauro" (13-14 June 2003). Moreover, in the same period, members of the ALS research team, especially the director Giovanni Ruffino and the heads of the social variation section, Mari D'Agostino and Antonio Pennisi, were invited to present the theoretical framework, the method, the first results, and the technical data entry and analysis at round tables, international meetings and conferences (Kiel, Deutscher Romanistentag; Copenhagen; Bellinzona; Malta).
In the last two years, accounts of the ALS were presented in Sappada (I dialetti e la montagna, 2003; Dialetti in città, 2004), Naples (International conference of SLI, Il parlato italiano), Procida (Lingua e dialetto nell'Italia del 2000. Dinamiche sociolinguistiche in atto e diversità regionali), Lecce (GISCEL conference, Il linguaggio dall'infanzia all'adolescenza: tra italiano, dialetto e italiano L2), Udine (Città plurilingui. Lingue e culture a confronto in situazioni urbane), Rome (International conference SILFI), Turin and Novi Ligure (Alir conference, Atlanti linguistici: dall'indagine onomasiologica alla ricerca motivazionale), Bergamo (SLI International conference, Ecologia linguistica) and Copenhagen (SILFI International Conference), and they appeared in journals (Bollettino dell'Atlante Linguistico Italiano, Bollettino dell'Atlante linguistico campano, Rivista Italiana di Dialettologia, Bollettino del Centro di Studi filologici e linguistici siciliani) and in publications in this field (for instance in "Geolinguistica. Trabajos europeos", edited by P. Garcia Mouton, Madrid 1994).
The resonance of the project's innovation and complexity at the international level led Pavia University to award the prestigious "Premio Giovani" to the whole research group for the works that have accompanied the ALS. Special attention has also been paid to relations with the mass-circulation press (see in particular the extensive coverage occupying the whole of page VII of "La Repubblica", Palermo edition, 15 September 2004) and with regional and local, educational and political institutions. In agreement with the regional office for cultural heritage, arrangements have begun for the creation of a digital archive of spoken Sicilian that will allow long-term preservation.
In the ethnological and variational parts of the atlas, the primary aim has not changed: to bring out, on the one hand, the links between material culture and dialectal culture and, on the other, the relations between linguistic structure and socio-spatial structure. This is how the intention to document the island's entire linguistic repertoire, in its complex diachronic and synchronic stratification, should be read.
The Local Unit that deals with the section on the Gallo-Italic dialects, which have been present in Sicily since the 11th century and are still vital today, has at its disposal: 1. a rich database on the dialects of San Fratello, Montalbano Elicona, Nicosia, Sperlinga, Piazza Armerina and Aidone; 2. sociolinguistic surveys carried out within the ALS project in the Gallo-Italic centres; 3. paremiological data drawn from special surveys carried out in most of these centres.
Within both sections, the theoretical references call for avoiding a chaotic atomism of the datum, as well as its representation as an isolated and casual event. The model for gathering and analysing data (especially in the base and sociolinguistic surveys) is intended to reveal the communication processes that generated the data themselves and the processes by which linguistic and cultural knowledge is transmitted across generations.
Concerning the ethno-linguistic section, good results have been obtained thanks to the training of a group of researchers. They have already completed the survey on traditional children's games and are now ready, after a period of trial surveys, to gather material on traditional food culture.
In the ethno-linguistic section, an important innovation was introduced in the first phase of data gathering: a very extensive questionnaire that gave ample space to the ergological domains under study, followed by supplementary inquiries and questionnaires focusing on the more strongly marked phenomena that are most subject to geolinguistic variability. This path, already completed for the traditional games section, will also be followed for the food practices section. This last block of inquiries will be accompanied by a socio-alimentary questionnaire on personal, individual and community habits; the materials used will be returned to the territory (institutions, schools, etc.) to foster broader social, anthropological and cultural reflection (see Ruffino 2005 in the bibliography). The ALS has already started trials of the base questionnaire, experimenting with approaches that differ from the traditional ones, that derive from theoretical positions and that have gradually been clarified along the way. This section of the Atlas aims to provide information on the resistance or mobility of the classical isoglosses on which the classification of the island's dialects is based; for this reason a team has already selected the concepts, drawing on the rich bibliography of the field and on the materials of Sicilian atlas studies.
As far as the socio-variational module is concerned, once the planned inquiries were finished (around 50 survey points and micro-areas, some of which are over-sampled), some aspects of popular and/or regional Italian that were more significant for sociolinguistic analysis were examined (agreement, adverbs, personal pronouns, past tenses in narrative texts).
Thanks to the cross-generational structure of the sample, it was possible to select the data for adolescents and to document the possible loss of dialectal structures and of the ability to produce dialectal speech, so as to assess the quality of the bilingualism and semilingualism of which they are the carriers, relating these data to spatial and socio-cultural variation.
In the socio-variational questionnaire, all the questions elicited through translation were analysed, to evaluate whether this technique, which is traditional and indispensable for all atlas projects and for gathering synchronic linguistic materials, makes it possible to reconstruct the speakers' linguistic system.
Another field analysed in the sociolinguistic module concerns the attribution of a phenomenon to regional Italian (IR) according to perceptual criteria. The initial assumption was that the more a phenomenon enters a popular norm (even if it is in fact a Sicilian and/or southern phenomenon), the more it is to be considered regional; conversely, the more a phenomenon is viewed with normative suspicion, the more it should be considered exclusively popular (if it is accepted only by uneducated or less educated classes) or dialectal (if use and acceptability diverge in both classes of speakers). The subsequent inquiries were carried out with elicitation techniques based on the matched-guise technique and an acceptability test. At the end of this first, summary analysis of the data, it is possible to state that the variable "acceptability", if used systematically for gathering regionalisms, bears on the very definition of the IR category, which seems to exist more in the perception of linguists than in that of speakers.
The incoming data, phonetically transcribed according to phonetic and conversational transcription rules, are entered in a database that goes beyond simple quantitative querying and makes it possible to retrieve the interaction. The productive passages have been marked from a conversational point of view and, so as not to lose the spontaneous language actually spoken during the inquiries, texts with dialogic speech were included. Another important addition is the provision of fields on the informant's reactions, in order to reconstruct how the moment of the inquiry was perceived and the communicative dynamics that emerged.
All atlas projects are nowadays conceived according to a precise programme for computerizing the linguistic data banks kept in their archives. Linguistic atlases already completed have been converted into digital formats; others, while still being completed, have organized a selection of materials on the basis of their data (texts, sounds, photographic images, films, statistical formulations of numerical vectors, etc.) and are studying specific strategies of cartographic representation.
The results achieved in this field are manifold; in particular, the ALS stands out for the development of cartographic projection software able to provide the following outputs:
a) Polarization: contrastive comparison of two different single data items, located on the map by graphical units that differ in form and size;
b) Irradiation: given a certain linguistic point, and having calculated the index value as the average over the speakers at that point (or another statistical measure of the average), the procedure looks for all nearby points whose values are similar within a prescribed range;
c) Graphical metamaps: parallel comparison of several phenomena at the same point, located on the map by a collection of graphical units diversified by function and size;
d) Process map: projection of diverse indexes measured on the basis of generational succession and located on the map by collections of graphical units diversified by function and size;
e) A procedure for statistical monitoring of the digital vectors included in c);
f) An interface connecting the program for analysing the domain of spoken language with the program for two- and three-dimensional graphic projection of these data on maps.
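The irradiation output in b) can be sketched as a simple neighbourhood search. This is only an illustration under stated assumptions: the ALS software itself is not described in detail, so the function `irradiation`, the planar coordinates, the Euclidean distance, and the `radius` and `tolerance` parameters are all hypothetical, as are the point labels and index values.

```python
from math import hypot


def irradiation(points, centre, radius, tolerance):
    """Return the points near `centre` whose index value is similar to it.

    `points` maps a label to (x, y, index_value). A point qualifies when it
    lies within `radius` of the centre (Euclidean distance, an assumption)
    and its index differs from the centre's by at most `tolerance`.
    """
    cx, cy, centre_value = points[centre]
    selected = []
    for name, (x, y, value) in points.items():
        if name == centre:
            continue
        is_near = hypot(x - cx, y - cy) <= radius
        is_similar = abs(value - centre_value) <= tolerance
        if is_near and is_similar:
            selected.append(name)
    return sorted(selected)


# Hypothetical survey points: (x, y, average index over the speakers).
points = {
    "A": (0.0, 0.0, 0.60),
    "B": (1.0, 0.0, 0.55),
    "C": (0.0, 2.0, 0.90),
    "D": (5.0, 5.0, 0.61),
}
neighbours = irradiation(points, "A", radius=3.0, tolerance=0.1)
print(neighbours)
```

Here point C is close but too different in value, and point D is similar in value but too far away, so only B is returned.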
Within the framework of the project, a research unit will investigate artificial intelligence-based techniques for information retrieval over complex speech and spoken-language data. The research aims to add several features, including intelligent data retrieval capabilities, automatic word spotting and transcription alignment, to the functionalities of the digital repository developed for the Linguistic Atlas of Sicily. The proposed activity builds on the state of the art in the following three research areas: i) information retrieval; ii) language modelling techniques; iii) speech recognition techniques.
i) Information retrieval (IR) deals with automatically retrieving only those documents that satisfy the user's information need, minimizing the quantity of irrelevant information. Search engines are the main example of an IR system. Manual query expansion has been studied by many researchers; however, the choice of the new terms to be added to the initial query depends on the particular expansion technique implemented.
ii) Language modelling techniques
Speech can be considered as a sequence of words produced by a source affected by noise factors. Several techniques have been designed to correctly recover single spoken words from speech sounds. Single-word recognition should therefore be supported by a language model (i.e. a model of the word source) that takes into account the syntactic and semantic context of speech, thus allowing prediction of the source's behaviour (i.e. the sequence of words it produces).
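As a rough illustration of what a model of the word source looks like, the sketch below estimates a bigram model by maximum likelihood. The bigram choice, the helper names `train_bigram`, `probability` and `predict_next`, and the toy sentences are assumptions made for the example; the programme does not specify which kind of language model will be built.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts


def probability(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev), without smoothing."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[word] / total if total else 0.0


def predict_next(counts, prev):
    """Most likely next word after `prev`, or None if `prev` is unseen."""
    followers = counts.get(prev)
    return followers.most_common(1)[0][0] if followers else None


# Made-up toy corpus, for illustration only.
corpus = ["the bread is good", "the bread is warm", "the game is old"]
model = train_bigram(corpus)
p = probability(model, "the", "bread")
next_word = predict_next(model, "the")
print(p, next_word)
```

In a recognizer, such probabilities would be used to rank competing word hypotheses; a practical model would also need smoothing for unseen word pairs.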
iii) Speech recognition techniques. With the increasing use of data-driven learning frameworks, such as hidden Markov models (HMMs) and artificial neural networks (ANNs), automatic speech recognition (ASR) has seen fast technological progress in recent years. State-of-the-art research on ASR is conducted by a research group at the Center for Signal and Image Processing at the Georgia Institute of Technology. This research unit has a long-established, ongoing scientific collaboration with the above-mentioned research group.



