COllaboration, CLassification, Incrémentalité et COnnaissances

Presentation

Introducing COCLICO


Data mining is an important step in the process that leads from data to knowledge. For example, understanding the processes and development of more or less anthropic systems at various spatial and temporal scales (urbanization pressure on land, biodiversity loss, etc.) from satellite or other data has become a major component of fields such as environmental studies and urban planning. However, current analysis techniques are less and less able to cope with the avalanche of heterogeneous data, which are often incomplete or inaccurate and are increasingly supplied as continuous streams.

Moreover, while the characteristics of mining methods are generally well known and understood by the statistician or computer scientist, this is rarely true of the end user. Quite often the user has to try several algorithms with different parameters to determine which best suits the question, and must also take into account the indeterminacy of many unsupervised classification methods. It is also necessary to take into account the variable quality of raw or preprocessed data, the robustness of learning methods to noise, and the sensitivity of results to changes in the methods or parameters of data acquisition and construction, in order to suggest more appropriate strategies for data cleaning and preprocessing. Finally, because the data are supplied continuously, a dynamic dimension is added, along with the need for incremental learning in a changing environment.

There is currently no surefire way to choose the best method and its parameters, as this choice is strongly tied to the application domain, to a priori knowledge about it, and to the data to be processed. One approach increasingly proposed to circumvent this problem rests on the intuition that the methods are complementary, or can at least corroborate one another. Mechanisms for confronting and unifying results from different methods and data can then be used to provide the user with a relevant summary.
A promising avenue in this area is based on collaboration between different methods.
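
To make the idea of confronting and unifying results more concrete, here is a minimal sketch of one well-known way to merge several clusterings of the same objects: count how often each pair of objects ends up in the same cluster (a co-association matrix), then group objects that are co-clustered by a majority of methods. This is an illustration only, not the collaboration mechanism developed in COCLICO; the function names, the majority threshold, and the toy label vectors are all assumptions.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster.

    `partitions` is a list of label lists, one per (hypothetical) method,
    all describing the same objects in the same order.
    """
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Merge objects linked by pairs that agree in more than `threshold`
    of the partitions (connected components via union-find)."""
    n = len(partitions[0])
    agree = co_association(partitions)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if agree[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Toy example: three hypothetical methods label the same six objects.
# Label values are arbitrary; only co-membership matters.
results = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
consensus = consensus_clusters(results)
```

In this toy case the second method disagrees about the third object and the third method splits off the last one, but the majority vote still recovers two groups of three objects. A real collaborative scheme would go further, for instance by letting methods revise their own results in light of the others.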

Nevertheless, we learn better when what we study relates to what we already know and when the objective of the task is known and understood: it is not desirable for data to be interpreted by someone unfamiliar with the topic. The interpretation process therefore often requires a thematic expert, which is unfortunately very time-consuming. Reducing this cost by bringing expert knowledge directly into the process requires modeling and formalizing the classes and objects of the real world, defining their possible representations in the data space, and finally studying and building mechanisms for extracting and labeling these objects with respect to this knowledge.

COCLICO is a research project that aims to study and propose a generic method for innovative multi-scale analysis of large volumes of spatio-temporal data supplied as a stream of highly variable quality. It implements a multi-strategy approach in which incremental collaboration between different data mining methods is guided by knowledge of both the thematic field (Geosciences, Geography), formalized in ontologies, and the analysis domain (knowledge of the methods), while guaranteeing a final quality objective that takes into account the quality of both the data and the knowledge.