Abstract
Data and dimension reduction techniques hold promise for representing data in easily understandable formats, as their wide range of applications has shown. Data reduction summarizes data by compressing information into fewer partitions, whereas dimension reduction provides low-dimensional overviews of similarity relations in data. Both techniques thus provide means for exploratory data analysis (EDA). From a broader perspective, EDA is only one of many approaches in data mining, and data mining in turn is only one step of knowledge discovery. To provide a holistic view in a top-down manner, we start from the broader concepts and end with discussions of data and dimension reductions and their combination. As the aim of Chap. 5 is to compare early dimension reduction methods, this chapter also focuses on more detailed presentations of so-called first-generation methods, including Multidimensional Scaling (MDS), Sammon's mapping and the Self-Organizing Map (SOM).
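To make the first-generation methods named above concrete, classical (Torgerson) MDS recovers low-dimensional coordinates from pairwise distances via double centering of the squared-distance matrix and an eigendecomposition. The sketch below is illustrative only; the function name and toy data are our own assumptions, not taken from the chapter.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n points in k dimensions
    given an n x n matrix of pairwise distances D."""
    n = D.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Eigendecomposition of B; keep the k largest non-negative eigenvalues
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]
    scales = np.sqrt(np.maximum(vals[idx], 0.0))
    return vecs[:, idx] * scales  # n x k coordinates

# Toy usage: four points on a line are recovered in one dimension,
# so the pairwise distances of the embedding match D exactly.
X = np.array([[0.0], [1.0], [2.0], [4.0]])
D = np.abs(X - X.T)
Y = classical_mds(D, k=1)
```

Because the toy data are intrinsically one-dimensional, the single retained eigenvector reproduces the original distances up to sign.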
Keywords
- Dimension Reduction
- Exploratory Data Analysis
- Reference Vector
- Locally Linear Embedding
- Information Visualization
The eye, which is called the window of the soul, is the principal means by which the central sense can most completely and abundantly appreciate the infinite works of nature
– Leonardo da Vinci
Notes
- 1.
There are several software implementations of the SOM. The seminal packages—SOM_PAK, the SOM Toolbox for Matlab, Nenet, etc.—are no longer regularly updated or adapted to their environments. Of the newer implementations, Viscovery SOMine provides the means needed for interactive exploratory analysis. The most recent addition to the list is the interactive, web-based implementation provided by infolytika (http://risklab.fi/demo/macropru/); for a description, see Sarlin (2014a). For a practical discussion of SOM software and an early version of the implementation in Viscovery SOMine, see Deboeck (1998b, a). See also Moehrmann et al. (2011) for a comparison of SOM implementations. The first analyses of this book were performed in the Viscovery SOMine 5.1 package due to its easily interpretable visual representation and its interaction features, not least when introducing it to practitioners in general and policymakers in particular. Recently, the packages available in the statistical computing environment R have improved significantly, in particular regarding the visualization of SOM outputs. Thus, the final parts of the research in this book, including the figures, have been produced in R. Moreover, the above-mentioned interface by infolytika provides an interactive implementation of the R-based models.
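As a rough illustration of what such SOM packages compute, the following is a minimal online SOM training loop. This is a sketch under assumed defaults (grid size, Gaussian neighborhood, linear decay schedules), not the algorithm of any particular package mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(X, rows=5, cols=5, epochs=20, sigma0=2.0, alpha0=0.5):
    """Minimal online SOM: fit a rows x cols grid of reference vectors
    to data X (n_samples x n_features)."""
    n, d = X.shape
    W = rng.random((rows * cols, d))  # reference vectors, random init
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)])
    T = epochs * n
    t = 0
    for _ in range(epochs):
        for x in X[rng.permutation(n)]:
            # Matching phase: best-matching unit by Euclidean distance
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))
            # Neighborhood radius and learning rate decay linearly over time
            sigma = sigma0 * (1 - t / T) + 1e-3
            alpha = alpha0 * (1 - t / T)
            # Update phase: pull the BMU and its grid neighbors toward x
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2 * sigma ** 2))
            W += alpha * h[:, None] * (x - W)
            t += 1
    return W

# Usage: fit a small map to random two-dimensional data
X = rng.random((100, 2))
W = train_som(X)
```

After training, each data point can be assigned to its best-matching unit, and the reference vectors summarize the data much as a vector quantizer would, with the added topological ordering of the grid.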
- 2.
In the literature, learning of the SOM has been defined across the entire spectrum of supervision. For instance, van Heerden and Engelbrecht (2008) define semi-supervised SOMs as similar to supervised ones, except that the class labels are not included in the matching phase (Eq. 4.9); the semi-supervised version herein corresponds to their supervised SOM. However, as the SOM is never fully supervised, we stick to the distinction between an unsupervised and a semi-supervised version.
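The distinction can be made concrete: with class information concatenated to the input vectors, the variants differ only in whether the label columns enter the best-matching-unit computation. A small sketch, where the function names and toy vectors are illustrative assumptions:

```python
import numpy as np

def bmu_unsupervised(W, x, d):
    """Match on the first d (input) columns only: label columns are
    carried along in the reference vectors but do not affect matching."""
    return int(np.argmin(((W[:, :d] - x[:d]) ** 2).sum(axis=1)))

def bmu_semisupervised(W, x):
    """Match on the full concatenated vector [inputs, class indicators],
    so class information steers the matching phase as well."""
    return int(np.argmin(((W - x) ** 2).sum(axis=1)))

# Two units with 2 input dimensions and 1 class-indicator dimension:
W = np.array([[0.0, 0.0, 1.0],   # unit 0: close in inputs, other class
              [0.3, 0.0, 0.0]])  # unit 1: farther in inputs, same class
x = np.array([0.1, 0.0, 0.0])
# The two definitions can select different best-matching units for x.
```

Here the input-only matching picks the nearest unit in data space, while the semi-supervised matching lets the class indicator override that choice, which is exactly the difference the note describes.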
References
Anand, S., & Buchner, A. (1998). Decision support using data mining. London: Financial Times Management.
Baddeley, A., & Logie, R. (1999). Working memory: The multiple-component model. In A. Miyake & P. Shah (Eds.), Models of working memory (pp. 28–61). New York: Cambridge University Press.
Barreto, G. (2007). Time series prediction with the self-organizing map: A review. In P. Hitzler & B. Hammer (Eds.), Perspectives on neural-symbolic integration. Heidelberg: Springer-Verlag.
Bederson, B., & Shneiderman, B. (2003). The craft of information visualization: Readings and reflections. San Francisco, CA: Morgan Kaufmann.
Belkin, M., & Niyogi, P. (2001). Laplacian eigenmaps and spectral techniques for embedding and clustering. In T. Dietterich, S. Becker & Z. Ghahramani (Eds.), Advances in neural information processing systems (Vol. 14, pp. 585–591). Cambridge, MA: MIT Press.
Bertin, J. (1983). Semiology of graphics. Madison, WI: The University of Wisconsin Press.
Bezdek, J. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Bishop, C., Svensson, M., & Williams, C. (1998). GTM: The generative topographic mapping. Neural Computation, 10(1), 215–234.
Cabena, P., Hadjinian, P., Stadler, R., Verhees, J., & Zanasi, A. (1998). Discovering data mining: From concepts to implementation. New Jersey: Prentice Hall.
Card, S., Mackinlay, J., & Shneiderman, B. (1999). Readings in information visualization, using vision to think. San Diego, CA: Academic Press.
Card, S., Robertson, G., & Mackinlay, J. (1991). The information visualizer, an information workspace. In Proceedings of CHI ’91, ACM Conference on Human Factors in Computing Systems, New Orleans (pp. 181–188).
Chapelle, O., Schölkopf, B., & Zien, A. (Eds.). (2006). Semi-supervised learning. Cambridge, MA: MIT Press.
Chen, L., & Buja, A. (2009). Local multidimensional scaling for nonlinear dimension reduction, graph drawing and proximity analysis. Journal of the American Statistical Association, 104, 209–219.
Cottrell, M., & Letrémy, P. (2005). Missing values: Processing with the Kohonen algorithm. In Proceedings of Applied Stochastic Models and Data Analysis (ASMDA 05), Brest, France (pp. 489–496).
Cox, T., & Cox, M. (2001). Multidimensional scaling. Boca Raton, Florida: Chapman & Hall/CRC.
Deboeck, G. (1998a). Best practices in data mining using self-organizing maps. In G. Deboeck & T. Kohonen (Eds.), Visual explorations in finance with self-organizing maps (pp. 201–229). Berlin: Springer-Verlag.
Deboeck, G. (1998b). Software tools for self-organizing map. In G. Deboeck & T. Kohonen (Eds.), Visual explorations in finance with self-organizing maps (pp. 179–194). Berlin: Springer-Verlag.
Demartines, P., & Hérault, J. (1997). Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets. IEEE Transactions on Neural Networks, 8, 148–154.
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (Series B), 39(1), 1–38.
Dunn, J. (1973). A fuzzy relative of the isodata process and its use in detecting compact, well-separated clusters. Cybernetics and Systems, 3, 32–57.
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In E. Simoudis, J. Han & U. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD 96) (pp. 226–231). AAAI Press.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996a). From data mining to knowledge discovery: An overview. In U. Fayyad, G. Piatetsky-Shapiro, P. Smyth & R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining (pp. 1–34). Menlo Park, CA: AAAI Press / The MIT Press.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996b). The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39(11), 27–34.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996c). Knowledge discovery and data mining: Towards a unifying framework. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR (pp. 82–88).
Fekete, J.-D., van Wijk, J., Stasko, J., & North, C. (2008). The value of information visualization. In Information visualization: Human-centered issues and perspectives (pp. 1–18). Springer.
Forte, J., Letrémy, P., & Cottrell, M. (2002). Advantages and drawbacks of the batch Kohonen algorithm. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN 02), Bruges, Belgium (pp. 223–230).
Frawley, W., Piatetsky-Shapiro, G., & Matheus, C. (1992). Knowledge discovery in databases: An overview. AI Magazine, 13(3), 57–70.
Gisbrecht, A., Hofmann, D., & Hammer, B. (2012). Discriminative dimensionality reduction mappings. In Proceedings of the International Symposium on Intelligent Data Analysis (pp. 126–138). Helsinki, Finland: Springer-Verlag.
Haroz, S., & Whitney, D. (2012). How capacity limits of attention influence information visualization effectiveness. IEEE Transactions on Visualization and Computer Graphics, 18(12), 2402–2410.
Havre, S., Hetzler, B., & Nowell, L. (2000). Themeriver: Visualizing theme changes over time. In Proceedings of the IEEE Symposium on Information Visualization (pp. 115–123).
Hoaglin, D., Mosteller, F., & Tukey, J. (1983). Understanding robust and exploratory data analysis. New York: Wiley.
Jain, A. (2010). Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8), 651–666.
Jain, A., Murty, M., & Flynn, P. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264–323.
Kaser, O., & Lemire, D. (2007). Tag-cloud drawing: Algorithms for cloud visualization. In Proceedings of the Tagging and Metadata for Social Information Organization Workshop, Banff, Alberta, Canada.
Keim, D. (2001). Visual exploration of large data sets. Communications of the ACM, 44(8), 38–44.
Keim, D., Kohlhammer, J., Ellis, G., & Mansmann, F. (2010). Mastering the information age: Solving problems with visual analytics. Goslar: Eurographics Association.
Keim, D., & Kriegel, H.-P. (1996). Visualization techniques for mining large databases: A comparison. IEEE Transactions on Knowledge and Data Engineering, 8(6), 923–938.
Keim, D., Mansmann, F., Schneidewind, J., & Ziegler, H. (2006). Challenges in visual data analysis. In Proceedings of the IEEE International Conference on Information Visualization (IV 06) (pp. 9–16). London, UK: IEEE Computer Society.
Keim, D., Mansmann, F., & Thomas, J. (2009). Visual analytics: How much visualization and how much analytics? SIGKDD Explorations, 11(2), 5–8.
Koffka, K. (1935). Principles of Gestalt psychology. London: Routledge & Kegan Paul.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59–69.
Kohonen, T. (1991). The hypermap architecture. In T. Kohonen, K. Mäkisara, O. Simula & J. Kangas (Eds.), Artificial neural networks (Vol. II, pp. 1357–1360). Amsterdam, Netherlands: Elsevier.
Kohonen, T. (2001). Self-organizing maps (3rd ed.). Berlin: Springer-Verlag.
Kruskal, J. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1–27.
Kurgan, L., & Musilek, P. (2006). A survey of knowledge discovery and data mining process models. The Knowledge Engineering Review, 21(1), 1–24.
Lampinen, J., & Oja, E. (1992). Clustering properties of hierarchical self-organizing maps. Journal of Mathematical Imaging and Vision, 2(2–3), 261–272.
Larkin, J., & Simon, H. (1987). Why a diagram is (sometimes) worth ten thousand words. Cognitive Science, 11, 65–99.
Lee, J., & Verleysen, M. (2007). Nonlinear dimensionality reduction. Information science and statistics series. Heidelberg, Germany: Springer-Verlag.
Lin, X. (1997). Map displays for information retrieval. Journal of the American Society for Information Science, 48(1), 40–54.
Linde, Y., Buzo, A., & Gray, R. (1980). An algorithm for vector quantizer design. IEEE Transactions on Communications, 28(1), 84–95.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (pp. 281–297). Berkeley, CA: University of California Press.
Moehrmann, J., Burkovski, A., Baranovskiy, E., Heinze, G., Rapoport, A., & Heideman, G. (2011). A discussion on visual interactive data exploration using self-organizing maps. In J. Laaksonen & T. Honkela (Eds.), Proceedings of the 8th International Workshop on Self-Organizing Maps (pp. 178–187). Helsinki, Finland: Springer-Verlag.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6), 559–572.
Pölzlbauer, G. (2004). Survey and comparison of quality measures for self-organizing maps. In Proceedings of the 5th Workshop on Data Analysis (WDA 2004), Sliezsky dom, Vysoké Tatry, Slovakia (pp. 67–82).
Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.
Rubin, D. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley & Sons.
Sammon, J. (1969). A non-linear mapping for data structure analysis. IEEE Transactions on Computers, 18(5), 401–409.
Sarlin, P. (2014a). Macroprudential oversight, risk communication and visualization. arXiv:1404.4550.
Shannon, C., & Weaver, W. (1963). A mathematical theory of communication. Champaign: University of Illinois Press.
Shearer, C. (2000). The CRISP-DM model: The new blueprint for data mining. Journal of Data Warehousing, 15(4), 13–19.
Shepard, R. (1962). The analysis of proximities: Multidimensional scaling with an unknown distance function (Parts I and II). Psychometrika, 27, 125–140, 219–246.
Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the IEEE Symposium on Visual Languages, Boulder, CO (pp. 336–343).
Tenenbaum, J., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2319–2323.
Thomas, J., & Cook, K. (2005). Illuminating the path: Research and development agenda for visual analytics. Los Alamitos: IEEE Press.
Torgerson, W. S. (1952). Multidimensional scaling: I. theory and method. Psychometrika, 17, 401–419.
Treisman, A. (1985). Preattentive processing in vision. Computer Vision, Graphics and Image Processing, 31(2), 156–177.
Tufte, E. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Tukey, J. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
van der Maaten, L., & Hinton, G. (2008). Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
van Heerden, W., & Engelbrecht, A. (2008). A comparison of map neuron labeling approaches for unsupervised self-organizing feature maps. In Proceedings of the IEEE International Joint Conference on Neural Networks (pp. 2139–2146). Hong Kong: IEEE Computer Society.
Venna, J., & Kaski, S. (2006). Local multidimensional scaling. Neural Networks, 19, 889–899.
Vesanto, J., Himberg, J., Alhoniemi, E., & Parhankangas, J. (2000). SOM toolbox for Matlab 5. Technical Report A57, Helsinki University of Technology.
Ward, J. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
Ware, C. (2004). Information visualization: Perception for design. San Francisco, CA: Morgan Kaufmann.
Ware, C. (2005). Visual queries: The foundation of visual thinking. In S. Tergan & T. Keller (Eds.), Knowledge and information visualization (pp. 27–35). Berlin, Germany: Springer.
Weinberger, K., & Saul, L. (2005). Unsupervised learning of image manifolds by semidefinite programming. International Journal of Computer Vision, 70(1), 77–90.
Wismüller, A. (2009). A computational framework for non-linear dimensionality reduction and clustering. In J. Principe & R. Miikkulainen (Eds.), Proceedings of the Workshop on Self-Organizing Maps (WSOM 09) (pp. 334–343). St. Augustine, Florida, USA: Springer.
Yin, H. (2008). The self-organizing maps: Background, theories, extensions and applications. In J. Fulcher & L. Jain (Eds.), Computational intelligence: A compendium (pp. 715–762). Heidelberg, Germany: Springer-Verlag.
Young, G., & Householder, A. S. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3, 19–22.
Zhang, J., & Liu, Y. (2005). SVM decision boundary based discriminative subspace induction. Pattern Recognition, 38(10), 1746–1758.
Zhang, L., Stoffel, A., Behrisch, M., Mittelstädt, S., Schreck, T., Pompl, R., et al. (2012). Visual analytics for the big data era—a comparative review of state-of-the-art commercial systems. In Proceedings of the IEEE Conference on Visual Analytics Science and Technology (VAST), Seattle, WA (pp. 173–182).
© 2014 Springer-Verlag Berlin Heidelberg
Cite this chapter
Sarlin, P. (2014). Data and Dimension Reduction. In: Mapping Financial Stability. Computational Risk Management. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54956-4_4
Print ISBN: 978-3-642-54955-7
Online ISBN: 978-3-642-54956-4