
 


VisualMine

Visual Data Mining Environment

Introduction

The well-known challenge that all major companies in the retail, services and financial sectors will face over the coming years is that of exploiting the enormous amount of information held in their legacy systems in order to improve their competitiveness. Data in legacy systems are typically managed only by operational applications: what is needed now is a way to turn such data into structured information and, finally, into knowledge.

Data Warehousing processes turn data into structured information. These processes provide the basic coherent, clean and business-oriented information repository on which Knowledge Discovery strategies can be applied. Today these strategies are mainly based on two kinds of tools: Query and Reporting tools, providing basic access to the data, and Data Mining tools able to (semi)automatically identify patterns in the information repositories.

It is the Knowledge Discovery process, supported by Data Mining techniques, which can provide the real targeted competitive advantage. This advantage consists in turning information into knowledge to support any strategic and operational decision making.

Regardless of the specific context, typical database analysis activities focus on one or more of the following targets:

· synthesize information;
· understand data relationships;
· identify patterns;
· classify information (clustering);
· identify, understand, forecast dynamical trends; and
· understand geographical distributions.

For instance, Marketing Departments are usually interested in customer classification and customer spending profiling (pattern identification), while at the strategic decision level there is a generic need to synthesize information and to understand the global status and trends of organizations.

 

The role of Data Visualization in Knowledge Discovery

Existing Data Mining tools provide a wide range of algorithms, most of which are derived from the fields of Statistics, Artificial Intelligence and Neural Networks. These algorithms can support data analysts in the analysis activities listed above. However, most of these tools require deep technical and statistical knowledge on the part of the Data Analyst, as well as a clear comprehension of the data. Moreover, even though many techniques are semi-automatic, the overall process of data selection, preparation and result interpretation is slow and expensive.

Advanced 3-Dimensional Data Visualization provides a new way of analyzing large quantities of information and provides benefits both to Data Analysts and to Decision Makers.

3D visualization enables Data Analysts to analyze large quantities of information (millions of entities) quickly, aiding rapid understanding of data distributions and rapid detection of patterns. This activity naturally takes place before the detailed, specific analysis, which may employ Statistical and Data Mining tools. In addition, visualization is a powerful tool for understanding the results of Data Mining techniques such as regression, clustering and multivariate analysis.

For instance, this is the way VisualMine is employed at the Ufficio Italiano dei Cambi (UIC), the part of the Italian central bank in charge of analyzing financial transactions and detecting anomalies related to money laundering. At UIC, analysts first visualize large transaction datasets and quickly identify anomalies that need further analysis. Such analysis is usually performed by drilling down to a more detailed level and by using statistical tools. The results of the statistical analysis are fed back into the visualization system to obtain a complete understanding of the phenomena.

3D visualization introduces a brand new opportunity for Decision Makers: it brings Knowledge Discovery and Data Mining to their desktop. Sophisticated statistical analysis and data mining tools, based on mathematical and logical formalisations, have typically been employed up to now simply because they were the only way to deal with large amounts of information. Their numeric nature, though, is not a requirement at the strategic level: in most cases, the knowledge needed at this level is of a qualitative nature. Understanding data distributions, identifying trends, analyzing temporal evolution: all need qualitative answers for strategic purposes. 3D Data Visualization provides an easy-to-use and economical way to build qualitative knowledge.

For example, in a strategic Marketing department a large amount of customer-centered information can be loaded into VisualMine and then interactively analyzed against any number of personal, geographical and behavioral variables. Once the data are loaded, even an end user without specific technical skills can perform powerful analyses. In a very short time, geographic or 3D scatter visualizations can provide figures describing the distribution of the customer base over the territory, or with respect to any combination of personal and behavioral variables. Automatic aggregations allow an analyst to move transparently to the required level of synthesis, or to drill down to the most detailed information. In this way data are quickly and interactively understood, and qualitative knowledge is acquired.


VisualMine: Visual Data Mining Environment

VisualMine is a Knowledge Discovery tool based on advanced 3D visualization. It was developed by A.I.S. in the context of the European Union funded project DBInspector. The objective of the project was to provide tools to support the analysis of a large financial transactions database managed by UIC (Ufficio Italiano dei Cambi, part of the Italian central bank) in order to detect anomalies related to money laundering. The identification of such anomalies is a typical Data Mining problem. The fundamental task is to identify anomalous patterns in a large amount of information (each month every Italian financial institution delivers to UIC information on the transactions that exceed $15,000).

VisualMine combines the most advanced 3D graphical capabilities with a totally flexible and easy-to-use management environment. Data can be loaded from files or directly from RDBMSs; they can then be interactively selected and mapped to a choice of tens of different visualization metaphors based on geographical or virtual 3D spaces.

VisualMine was developed using AVS/Express, the leading visual programming environment in the Scientific Visualization field. It integrates data access modules supporting ODBC, Oracle, Informix, Sybase and other popular RDBMSs, statistical analysis libraries and a sophisticated visualization subsystem.

The high level structure of VisualMine is depicted in the following Figure. A more detailed description of each product component is provided in the following paragraphs.

 

Data Access Module

This module has been built on the basis of the Agent concept. In this context, an Agent is a piece of software that accesses and possibly processes data. Agents are stored in an Agents Repository, from which users can select and run them. Currently supported Agents enable file and RDBMS access (SQL over ODBC). All the main data management capabilities are supported from within the environment: in addition to data extraction, data and table management is possible.

Data Access Agents can be created by SQL programmers and then employed by end users. A key feature of this environment is its ability to manage and transform large datasets. Only a few exhaustive extraction queries are usually needed: once the data are loaded into the environment, they can be manipulated by selecting, aggregating and visualizing the information interactively.

A standardized data flow connects the Data Access module to the Statistical Processing and Visualization Modules. This standardized data flow is common to any Agent and makes both processing and visualization software totally independent from the specific nature of the data.
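As a rough illustration of the Agent concept and of the standardized data flow, the following Python sketch wraps an SQL query as an Agent, stores it in a repository and runs it by name. All names here (DataFlow, AgentRepository, make_sql_agent, the transfers table) are hypothetical and are not part of VisualMine; an in-memory SQLite database stands in for an ODBC-accessible RDBMS.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class DataFlow:
    """Hypothetical standardized data flow: column names plus rows."""
    columns: List[str]
    rows: List[Tuple]


class AgentRepository:
    """Hypothetical repository from which users select and run Agents."""

    def __init__(self) -> None:
        self._agents: Dict[str, Callable[[], DataFlow]] = {}

    def register(self, name: str, agent: Callable[[], DataFlow]) -> None:
        self._agents[name] = agent

    def run(self, name: str) -> DataFlow:
        return self._agents[name]()


def make_sql_agent(conn: sqlite3.Connection, query: str) -> Callable[[], DataFlow]:
    """Wrap an SQL query (written once by an SQL programmer) as an Agent."""
    def agent() -> DataFlow:
        cursor = conn.execute(query)
        columns = [description[0] for description in cursor.description]
        return DataFlow(columns, cursor.fetchall())
    return agent


# In-memory SQLite stands in for the RDBMS; the values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (branch TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?)",
    [("A", "1996-01", 20000.0), ("A", "1996-02", 18000.0), ("B", "1996-01", 95000.0)],
)

repo = AgentRepository()
repo.register(
    "branch_totals",
    make_sql_agent(
        conn,
        "SELECT branch, SUM(amount) AS total FROM transfers "
        "GROUP BY branch ORDER BY branch",
    ),
)

# An end user only needs the Agent's name; the result always has the same shape.
flow = repo.run("branch_totals")
print(flow.columns)
print(flow.rows)
```

Because every Agent returns the same DataFlow shape, downstream processing and visualization code in this sketch never needs to know whether the rows came from a file or from a database.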

 

Processing Module

The Processing Module provides a set of mathematical and statistical functions for data preparation and pre-processing.

Statistical functions primarily support time series analysis and multivariate analysis. They help to give graphical emphasis to information patterns (in particular, anomalous patterns) and to correlations among multiple variables.

Simple and joint frequency distributions are computed in order to segment and classify data, as well as to understand variable dependencies. Various filters (such as smoothing, thresholding and data aggregation) are provided to obtain significant visual representations by eliminating data inhomogeneities.
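The kinds of pre-processing mentioned above can be sketched in a few lines of Python. This is only an illustration under assumed names and data: the functions frequency, joint_frequency, moving_average and threshold are hypothetical, and the moving average is just one possible smoothing filter, not necessarily the one VisualMine implements.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def frequency(values: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Simple frequency distribution of one variable."""
    return dict(Counter(values))


def joint_frequency(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> Dict[Tuple, int]:
    """Joint frequency distribution of two variables observed together."""
    return dict(Counter(zip(xs, ys)))


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Centred moving average; the window shrinks at the edges of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def threshold(series: Sequence[float], minimum: float) -> List[float]:
    """Keep only the values at or above the given minimum."""
    return [value for value in series if value >= minimum]


# Illustrative data only.
segments = ["retail", "retail", "corporate", "retail"]
regions = ["north", "south", "north", "north"]
amounts = [1.0, 2.0, 9.0, 2.0, 1.0]

segment_counts = frequency(segments)
segment_by_region = joint_frequency(segments, regions)
smoothed = moving_average(amounts, window=3)
kept = threshold(amounts, minimum=2.0)
```

The joint distribution is what exposes dependencies between variables: a cell that is far more populated than the two simple distributions would suggest hints at a relationship worth visualizing.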

Interpolation functions are employed to emphasize data clusters. Visual clustering is also obtained by applying isolines, isosurfaces, isovolumes and orthogonal cutting planes.
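The text does not say which interpolation scheme is used. Purely as an illustration, the sketch below applies inverse distance weighting (an assumption, chosen for its simplicity) to turn scattered samples into a regular grid, which is the kind of field on which isolines could then be drawn. The function names idw and interpolate_grid are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, value)


def idw(x: float, y: float, samples: Sequence[Sample], power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate of the value at (x, y)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # Exactly on a sample: return its value unchanged.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def interpolate_grid(
    samples: Sequence[Sample], xs: Sequence[float], ys: Sequence[float]
) -> List[List[float]]:
    """Evaluate the interpolant on a regular grid (one row per y value)."""
    return [[idw(x, y, samples) for x in xs] for y in ys]


# Illustrative samples only.
samples = [(0.0, 0.0, 0.0), (2.0, 0.0, 10.0)]
field = interpolate_grid(samples, xs=[0.0, 1.0, 2.0], ys=[0.0])
```

Dense groups of similar samples produce plateaus in such a field, which is why interpolation tends to make clusters stand out visually.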


Visualization Module

Composed of a set of specific three dimensional viewers, the Visualization Module allows the user to interactively build visual data representations. A simple interaction provides a flexible way to obtain different representations.

In the Mapping Window depicted in Figure 1, two input lists are provided: a list of available viewers and a list of variables describing the input data flow obtained from the Data Access and Processing modules. Once a specific viewer has been selected, a third list appears on the right, containing a set of graphically representable entities, such as axes, colour or geographic region. The user can then build a specific visual representation by creating association pairs, each relating an input variable to a graphical entity. Once this process is complete, the system builds the required visual representation.

Any required change in this mapping, as well as the selection of new representations based on the same data, can be done through this window.
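The association pairs created in the Mapping Window can be thought of as a simple variable-to-entity dictionary. The following Python sketch illustrates the idea; the viewer names, entity names and the build_representation function are hypothetical and do not reflect VisualMine's actual interface.

```python
from typing import Dict, List

# Hypothetical viewers and the graphical entities each one exposes.
VIEWER_ENTITIES: Dict[str, List[str]] = {
    "scatter3d": ["x_axis", "y_axis", "z_axis", "colour", "size"],
    "geographic": ["region", "height", "colour"],
}


def build_representation(
    viewer: str,
    associations: Dict[str, str],
    records: List[Dict[str, object]],
) -> List[Dict[str, object]]:
    """Apply variable -> entity association pairs to every input record."""
    allowed = VIEWER_ENTITIES[viewer]
    entities = list(associations.values())
    unknown = [entity for entity in entities if entity not in allowed]
    if unknown:
        raise ValueError(f"viewer {viewer!r} has no entities {unknown}")
    if len(set(entities)) != len(entities):
        raise ValueError("each graphical entity can be mapped only once")
    return [
        {entity: record[variable] for variable, entity in associations.items()}
        for record in records
    ]


# Illustrative customer records only.
customers = [
    {"age": 34, "income": 41000, "spend": 1200, "segment": "A"},
    {"age": 58, "income": 67000, "spend": 300, "segment": "B"},
]

associations = {"age": "x_axis", "income": "y_axis", "spend": "z_axis", "segment": "colour"}
glyphs = build_representation("scatter3d", associations, customers)
```

Changing the representation then amounts to editing the dictionary (or choosing another viewer) and rebuilding, without touching the data.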

 Figure 1.

 

Three main sets of viewers are available within the Visualization Module:

- traditional 2D Business Charts
- 3D business viewers
- geographic 3D viewers


Business Charts

A full set of traditional charts is available for the analysis of limited amounts of information. Usually these charts are employed when drilling down to a detailed level, after having analyzed large datasets using more advanced 3D representations.

The VisualMine chart library includes 2D scatter, line, bar, pie and area charts.

Figure 2.


3D Visualizations

3D visualizations in VisualMine have been developed to support multivariate analysis, cluster analysis, pattern identification and dynamic analysis through the animation available on any variable.

For example, the 3D scatter representation enables the simultaneous visualization of up to twenty variables. Through the use of colour, object dimensions and other graphical artifacts, it emphasizes relationships between variables, clusters and anomalies. Animations support dynamic analysis over time variables, as well as the visualization of what-if simulations.

Cutting planes, isolines and isosurfaces allow the user to enrich the visualization with new detailed or structured information.

A sample of demographic data from a population of credit card owners is shown in the two following Figures. In Figure 3a, customer profiling variables are mapped to the axes, colour and object dimensions, providing an immediate perception of the customer clusters and segments. In Figure 3b, isovolumes identify four distinct clusters. Support for exporting datasets belonging to computed or user-selected 3D clusters is under development.

Figure 3 (a, b).


In Figure 4a the same dataset is further analyzed using orthogonal planes, on which additional variables are mapped, and in Figure 4b using isosurfaces, which connect points having the same value. Anomalies and clusters are once more identified, making it possible to study the relationships between additional variables and to deepen the analysis.


Figure 4.


In another example, shown in the following Figures, a set of bank branches is analyzed in order to detect anomalies possibly related to money laundering operations. Bank branches are represented on the X axis, while their behaviour over time is represented on the Y axis.

In Figures 5a and 5b, the height of each bar represents the total money flow towards the bank (inflows), while the colour is associated with the outflows (blue corresponds to low values, red to high values). Stripes are employed in Figure 5b, with the same associations. Anomalous values are immediately perceived, leading to further analysis. Detailed data on the banks can be retrieved simply by clicking on the 3D objects.
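The encoding described above (bar height from inflows, colour from outflows on a blue-to-red scale) can be sketched as follows. The min-max scaling and the linear colour ramp are assumptions made for illustration, as are the function names and the data; the source does not specify how VisualMine scales values.

```python
from typing import Dict, List, Sequence, Tuple


def normalise(values: Sequence[float]) -> List[float]:
    """Min-max scale values to the range 0..1 (all zeros if they are constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(value - lo) / (hi - lo) for value in values]


def blue_to_red(t: float) -> Tuple[int, int, int]:
    """Linear RGB ramp: t = 0 gives pure blue, t = 1 gives pure red."""
    return (int(255 * t), 0, int(255 * (1.0 - t)))


# (branch, month, inflow, outflow): illustrative values only.
observations = [
    ("branch_1", "1996-01", 120.0, 30.0),
    ("branch_1", "1996-02", 130.0, 35.0),
    ("branch_2", "1996-01", 110.0, 25.0),
    ("branch_2", "1996-02", 900.0, 400.0),
]

heights = normalise([inflow for _, _, inflow, _ in observations])
colours = [blue_to_red(t) for t in normalise([outflow for _, _, _, outflow in observations])]

# One bar per (branch, month) cell: branch on X, time on Y.
bars: List[Dict[str, object]] = [
    {"x": branch, "y": month, "height": height, "colour": colour}
    for (branch, month, _, _), height, colour in zip(observations, heights, colours)
]
```

In this toy dataset the last observation becomes the tallest, reddest bar, which is the kind of visual outlier an analyst would click on to drill down.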

Figure 5 (a, b).


Geographic 3D Visualizations

Whenever a geographic reference exists in the data, a visual analysis of geographical distribution can be useful in order to understand phenomena. VisualMine supports the traditional thematic mapping technology, as well as specialized visualizations, such as those employed to represent financial flows. As usual, animations are supported for all the representations, on any variable.

With the first technique, the analysis of geographically distributed data can be performed at different levels of detail, from continents to municipalities. Multiple maps and windows allow the user to compare different variables or different levels of detail, as shown in Figure 6.

In Figure 6, macroscopic variables referring to banks in the south of Italy are visualized at municipality and regional level. The height of each territory is proportional to the number of bonds handled in that area, while its colour reflects the number of accounts. In this way, large financial investments in areas with low economic activity may emerge.
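Moving between levels of detail amounts to re-aggregating the same records under a different geographic key. The Python sketch below rolls municipality-level figures up to regions and derives a relative height per territory; the roll_up and extrusion_heights functions and all the numbers are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, List

# Illustrative values only; they are not taken from any real dataset.
municipal: List[Dict[str, object]] = [
    {"region": "Sicilia", "municipality": "Palermo", "bonds": 120, "accounts": 900},
    {"region": "Sicilia", "municipality": "Catania", "bonds": 80, "accounts": 700},
    {"region": "Puglia", "municipality": "Bari", "bonds": 60, "accounts": 650},
]


def roll_up(records: List[Dict[str, object]], level: str) -> Dict[str, Dict[str, int]]:
    """Sum bonds and accounts per territory at the requested level of detail."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"bonds": 0, "accounts": 0})
    for record in records:
        territory = record[level]
        totals[territory]["bonds"] += record["bonds"]
        totals[territory]["accounts"] += record["accounts"]
    return dict(totals)


def extrusion_heights(totals: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Height of each territory, proportional to its bonds (tallest = 1.0)."""
    peak = max(values["bonds"] for values in totals.values())
    return {territory: values["bonds"] / peak for territory, values in totals.items()}


regional = roll_up(municipal, "region")
by_town = roll_up(municipal, "municipality")
regional_heights = extrusion_heights(regional)
```

The same pair of functions serves both maps in a side-by-side comparison: only the level argument changes.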



Financial flows are represented in Figure 7 as coloured arcs, which visualize money transfer operations in southern Italian areas. The height and colour of the arcs are proportional to the total monthly amount of transfers from the Palermo province to other Italian areas: through animation, a large amount of historical data can be analyzed.

 


For further information please e-mail webmaster@ais.it
Copyright © 1996 Artificial Intelligence Software SpA
Last modified: December 6, 1996