Belief AnalYsis and Evidence Sampling Object Oriented Libraries for database Knowledge Extraction,
Web and Text Mining
Funding programme: L.46/82 Fondo Speciale per la Ricerca Applicata, Ministero dell’Università e della Ricerca Scientifica e Tecnologica
Start – end year: 2001 – 2003
Growing demand from society and industry for communication, data processing and analysis, together with the drastic fall in the cost of both processing tools and data storage media (hard disks and CD-ROMs), has led to the collection of huge volumes of data and to the opportunity to exploit them for cognitive and decision-making purposes.
Exploiting such data depends on the ability to analyse them coherently and efficiently, in order to extract both qualitative and quantitative information and knowledge that can be used to better understand the mechanisms governing the field to which they refer.
The need for sophisticated, efficient and effective automatic computation tools, able to probe large amounts of heterogeneous data that are often characterised by missing observations and high dimensionality, led to the formulation and development of a particularly attractive and effective computational approach known as “Data Mining” (DM) or “Knowledge Discovery in Databases” (KDD).
Several sophisticated and efficient commercial DM environments are already on the market, aimed in particular at the analysis of well-structured, quantitative data. These environments nevertheless present significant drawbacks:
- functionality for analysing semi-structured or unstructured data, such as textual data and web formats (HTML, XML), is rarely available;
- their price is very high, and they do not let the user access the core of the inferential and knowledge extraction algorithms, so the tool cannot be customised for the problem or domain under analysis;
- commercially available environments do not offer support functions for the on-line performance assessment of the inferential models built during the complex DM process;
- Bayesian methodologies are implemented in only some commercial environments.
The core idea of the product, whose prototype was developed in the BAYES project, is to combine recent object-oriented design technologies (UML, the Unified Modelling Language) with the strong computational and automatic extraction capabilities of recent findings in Artificial Intelligence and Machine Learning, in order to obtain comprehensive and flexible libraries that give the end user great autonomy.
The BAYES project enabled the design and development of a group of software libraries with innovative functionality for data management and visualisation, pre-processing and feature selection, data sharing, computational engines, monitoring and updating, model export, reporting and warehousing, and led to the realisation of a DM prototype and a Web Mining (WM) prototype. These prototypes exploit computational models drawn from Artificial Neural Networks, Bayesian Belief Networks and Bayesian Classifiers, which make it possible to build specialised software for automatic knowledge extraction from quantitative and textual data.
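As a rough illustration of what a Bayesian classifier applied to textual data can look like, the following minimal sketch implements a multinomial naive Bayes classifier with add-one smoothing in plain Python. It is not taken from the BAYES libraries: the function names (tokenize, train, classify), the toy training sentences and the two labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case a text and split it on whitespace (deliberately simplistic)."""
    return text.lower().split()


def train(samples):
    """Collect class counts and per-class word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest posterior log-probability."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        # Log prior: share of training documents carrying this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Log likelihood of the word given the label, add-one smoothed.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data; sentences and labels are illustrative only.
training_data = [
    ("machine downtime on the assembly line", "manufacturing"),
    ("quality control of machined parts", "manufacturing"),
    ("broken links on the home page", "web"),
    ("search engine indexes the page", "web"),
]
model = train(training_data)
predicted = classify("assembly line quality", model)
print(predicted)
```

Working in log-probabilities avoids numerical underflow on longer texts, and the smoothing keeps a word that is absent from one class from driving that class's probability to zero.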
The BAYES libraries have been integrated into a programming environment that allows the easy and effective assembly of the different components that make up a complete DM or WM process. The addition of a configuration language to the libraries makes this objective achievable.
All sectors require processing capacity and automatic knowledge analysis and extraction; the BAYES prototypes have been functionally tested in two domains consistent with Milano Ricerche's experience: manufacturing and web/Internet.