Data mining techniques (8 articles)

Predicting metaheuristic performance on graph coloring problems using data mining

Smith-Miles, K., Wreford, B., Lopes, L., & Insani, N. (2013). In Hybrid metaheuristics (pp. 417-432). Springer, Berlin, Heidelberg.

A metaheuristic approach is introduced to determine which algorithms perform best on complex problems. As a case study, 5000 different graphs are randomly generated and two different graph coloring algorithms are applied to them. Viscovery SOMine is used to order the graphs by 16 graph measures and to analyze how the performance of the algorithms depends on these measures.

A discussion on visual interactive data exploration using self-organizing maps

Moehrmann, J., Burkovski, A., Baranovskiy, E., Heinze, G. A., Rapoport, A., & Heidemann, G. (2011, June). In International Workshop on Self-Organizing Maps (pp. 178-187). Springer, Berlin, Heidelberg.

This article provides an overview of state-of-the-art software tools for self-organizing map-based visual data exploration. Viscovery SOMine receives the best grades for data preprocessing and interaction with the map, and above-average grades for interaction with data, visualization, and label assignment.

Meta-learning of instance selection for data summarization

Smith-Miles, K. A., & Islam, R. M. (2011). In Meta-Learning in Computational Intelligence (pp. 77-95). Springer, Berlin, Heidelberg.

This article analyzes how to select a smaller subset from a large data set without losing much information. An instance selection method using k-means clustering is applied to 112 classification data sets with different compression rates. Viscovery SOMine is used to cluster the data sets with respect to their statistical properties and to analyze the classification accuracy of a naive Bayes classifier with respect to the compression rate. This model enables the optimal compression rate to be predicted for new data sets.
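
The idea of instance selection via k-means can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that one representative (the instance nearest each centroid) is kept per cluster, and the names `kmeans`, `select_instances`, and `keep_fraction` are hypothetical. The article's own definition of the compression rate may differ from the fraction used here.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic start (evenly spaced input points)."""
    n = len(points)
    centroids = [points[(i * n) // k] for i in range(k)]
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assign == assign:
            break  # assignments are stable, so the centroids are too
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assign


def select_instances(points, keep_fraction):
    """Return indices of one representative per cluster.

    keep_fraction is a hypothetical parameter: the share of instances to keep.
    """
    n = len(points)
    k = max(1, min(n, round(keep_fraction * n)))
    centroids, assign = kmeans(points, k)
    selected = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:  # empty clusters contribute no representative
            selected.append(
                min(members, key=lambda i: math.dist(points[i], centroids[c]))
            )
    return sorted(selected)
```

A classifier would then be trained on the selected subset only, and its accuracy compared across several values of `keep_fraction`.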

Generalising algorithm performance in instance space: a timetabling case study

Smith-Miles, K., & Lopes, L. (2011, January). In International Conference on Learning and Intelligent Optimization (pp. 524-538). Springer, Berlin, Heidelberg.

The performance of two timetabling algorithms is studied on a mix of 21 real-world and 8178 computer-generated university course timetabling problems. The timetabling problems are characterized by 21 meta-features (such as number of courses, number of rooms, and graph-theoretical measures) and clustered with Viscovery SOMine. The resulting model shows how real-world and computer-generated problems differ and which algorithm performs better on which kind of problem in terms of the meta-features.

Characteristic-based clustering for time series data

Wang, X., Smith, K., & Hyndman, R. (2006). Data Mining and Knowledge Discovery, 13(3), 335-364.

This paper proposes a feature-engineering method for clustering time series based on their structural characteristics. Viscovery SOMine's SOM-Ward algorithm is used alongside complete linkage, k-means and fuzzy c-means clustering to test the generated features.
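
Characteristic-based clustering replaces each series with a fixed-length vector of summary measures, so that series of different lengths become comparable. The sketch below illustrates the principle only: the four measures (mean, standard deviation, linear trend slope, lag-1 autocorrelation) and the function name `global_features` are assumptions chosen for brevity, not the feature set used in the paper.

```python
import statistics


def global_features(series):
    """Map a time series of any length >= 2 to a fixed-length feature tuple.

    Returns (mean, population std, OLS trend slope, lag-1 autocorrelation).
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)

    # Ordinary least squares slope of the series against time 0..n-1.
    t_mean = (n - 1) / 2
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (x - mean) for t, x in enumerate(series)) / sxx

    # Lag-1 autocorrelation; defined as 0.0 for a constant series.
    denom = sum((x - mean) ** 2 for x in series)
    if denom:
        acf1 = sum(
            (series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)
        ) / denom
    else:
        acf1 = 0.0
    return (mean, std, slope, acf1)
```

The resulting tuples, usually rescaled to a common range first, can be passed to any clustering method that works on ordinary feature vectors.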

Data visualization of asymmetric data using Sammon mapping and applications of self-organizing maps

Li, H. (2005).

The performance of several software implementations of methods based on self-organizing maps is evaluated. Viscovery SOMine is found to be helpful in determining the number of clusters and recovering the cluster structure of data sets. A genocide and politicide data set is analyzed using Viscovery SOMine, followed by an analysis of public and private college data sets with the goal of identifying the schools that offer the best value.

A scalable method for time series clustering

Wang, X., Smith, K. A., Hyndman, R., & Alahakoon, D. (2004). Technical Report, Monash University.

Global measures to compare (long) time series are introduced. The self-organizing map is used for additional dimension reduction and, finally, the time series are clustered using Viscovery's SOM-Ward algorithm.

A comparison of software implementations of SOM clustering procedures

Li, H., Golden, B., Wasil, E., & Zantek, P. (2002). In Intelligent Engineering Systems Through Artificial Neural Networks, 12 (pp. 447-452). 12th Artificial Neural Networks in Engineering Conference, St. Louis, MO.

This review presentation compares the clustering capabilities of Viscovery SOMine with those of SOM_PAK and of k-means clustering as implemented in the SPSS Clementine software package. The Ward algorithm of Viscovery SOMine and its modified version achieved the best cluster recovery rates and Rand statistic values of all methods considered.
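
For readers unfamiliar with the Rand statistic, the following sketch computes the plain (unadjusted) Rand index between a known grouping and a clustering result. Whether the authors used this form or an adjusted variant is not stated here, so this is an assumption, and the function name `rand_index` is hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which two partitions agree.

    A pair counts as an agreement if both partitions put the two points in
    the same group, or both put them in different groups.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

The value is 1.0 when the two partitions are identical up to a renaming of the cluster labels, and it decreases as more pairs are grouped inconsistently.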