FAQ

Find below answers to frequently asked questions relating to typical Viscovery applications.


Viscovery runs on Windows 7, Windows Server 2008, Windows 8, Windows 8.1, Windows Server 2012, Windows Server 2016, Windows Server 2019, and Windows 10. Both 32-bit and 64-bit versions are supported.


Yes. Viscovery is available both as 32-bit and 64-bit software.


Yes, you can download a trial version of Viscovery SOMine from here.


For technical support, send an email to support@viscovery.net.


Viscovery is only available as a download package.


The manuals, like the Viscovery software itself, are available in English and Japanese.


There are 3 to 4 patch releases per year that mostly include bug fixes. One minor feature release is planned every year. Major feature releases are planned every 2 years.


The most popular site is probably the UCI KDD Archive from University of California with the UCI Machine Learning Repository cited therein.

Many other sites also offer a variety of data sets. Here are a few to get you started:


View the application demos and the software demo of Viscovery SOMine on the Viscovery website. Further examples with tips and tricks are part of our training courses.


First, familiarize yourself with self-organizing maps (SOMs) and how to read and interpret them.

To try out Viscovery tools, download the free basic version of Viscovery SOMine.

Watch the software demo of Viscovery SOMine, which shows all steps in the process of map creation.

Using a simple and small data set that you are familiar with, repeat the workflows step by step with your data.

Contact support@viscovery.net for open questions.

For more involved applications, Viscovery also offers consulting support and training courses.


Yes, you can. All you need to know is how to read a SOM.

In Viscovery, users are shielded from the technology, as they are guided by an easy-to-use workflow-oriented interface. Proven default settings have been established so that novice users can get useful results. Of course, the more the user understands the process and the technology, the more he or she can control the process.

Even though you do not have to be a SOM expert, a basic knowledge of data mining is necessary to be able to work with SOMs in a useful manner. In particular, “garbage in – garbage out”, the first paradigm, is as true for data mining with SOMs as it is for other data mining methods. Equally important, “know your data”, the second paradigm, holds true for SOMs as well as for any other data mining methods.


You can use Viscovery even if you do not understand much about statistics. The unique visualization of the resulting maps can easily be understood by non-statisticians. For statistically skilled users, Viscovery provides a variety of statistical tools to evaluate data in addition to the SOM.

Keep in mind that even though you do not know much about statistics, you do have to know a lot about your data before you can produce meaningful results with SOMs.


Viscovery provides numerous preprocessing functions, including the following:

  • Transformations of variables
  • Replacements of values
  • Treatment of (multi-valued) nominal attributes
  • Definition of new attributes depending on existing ones
  • Handling of missing values
  • Outlier treatment
  • Removal of data records
  • Sampling of data

The statistical analysis functions provided by Viscovery include the following:

  • Descriptive statistics
  • Correlation analysis
  • Histograms
  • Frequency tables
  • Box plots
  • Scatter plots
  • Principal components analysis
  • Regression analyses

Viscovery reads the following input data formats:

  • Tab-separated text files
  • Excel files (*.xls, *.xlsx)
  • SPSS files (*.sav)
  • XML files (*.xml) in a Viscovery specific format
  • Database tables where ODBC drivers are available

The data should be organized in rows and columns, such that each column represents an attribute and each row represents a data record. The first row should contain the names of the attributes.


To ensure that a text file containing data with international characters is interpreted correctly, prepare it as a UTF-8 or UTF-16 (“Unicode”) text file with a byte order mark (BOM). Data prepared in this way can contain any combination of international characters. If a text file does not start with a BOM, it is treated as text encoded according to the system code page: the data can contain only characters of the operating system language.
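
As a minimal sketch, assuming the input file is prepared with Python rather than exported from another tool, the standard `utf-8-sig` codec writes the BOM automatically. The helper names and any file name passed to them are hypothetical and not part of Viscovery.

```python
import csv

def write_tab_separated_with_bom(path, header, rows):
    """Write a tab-separated text file as UTF-8 with a leading byte order mark."""
    # "utf-8-sig" is Python's standard codec that prepends the UTF-8 BOM (EF BB BF).
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)   # first row: attribute names
        writer.writerows(rows)    # one row per data record

def starts_with_utf8_bom(path):
    """Return True if the file begins with the UTF-8 byte order mark."""
    with open(path, "rb") as handle:
        return handle.read(3) == b"\xef\xbb\xbf"
```

Checking the first three bytes with `starts_with_utf8_bom` is a quick way to confirm that a file from another source will not fall back to the system code page.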


Viscovery can handle any amount of data your computer is able to process. Applications can include many thousands of variables and millions of records. However, if you are using Viscovery SOMine without the Enterprise Data module (Viscovery SOMine version 6 and earlier: if you are using the Basic or Expert Edition), up to 100,000 data records and 100 attributes can be processed.


There are 2 data types in Viscovery: “values” used for numeric attributes and “text” used for nominal attributes or labels.

The preprocessing function datetime can be used to convert a text that represents a date or time or both to a numeric value.


Yes. Text attributes can either be declared as nominal attributes, or they remain unprocessed and are only copied from the input to the output (e.g., the key attribute and labels are always text attributes).

The preprocessing function datetime can be used to convert a text that represents a date or time or both to a numeric value.


Yes. To process date or time values, make sure that the values are stored as text and that months are indicated numerically. When importing the data, define a new attribute in the Define New Attributes dialog by specifying a formula that uses the datetime function. This function converts the date and time represented in the text value to a metric value.


Viscovery version 6 computes the and and or operators differently than previous versions when at least one of the operands is a missing value.

  • The and operator evaluates operands in order, stopping when the first zero (false) value is encountered. In previous versions of Viscovery, the and operator also stopped when a missing value was encountered and returned a missing value in this case. Version 6 returns a missing value only if no zero value is encountered after the first missing value.
  • The or operator evaluates operands in order, stopping when the first non-zero (true) value is encountered. In previous versions, missing values were simply ignored. Version 6 returns a missing value only if no non-zero value is encountered after the first missing value.
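
The version 6 behavior described above can be illustrated with a small Python sketch. It is not Viscovery formula syntax: representing a missing value as `None` and returning 1 or 0 for true or false are assumptions made for this example.

```python
def and_v6(*operands):
    """Logical 'and' where None stands for a missing value."""
    saw_missing = False
    for value in operands:
        if value is None:
            saw_missing = True   # keep scanning: a later zero still decides the result
        elif value == 0:
            return 0             # the first zero (false) ends the evaluation
    # No zero was found: the result is missing if any operand was missing.
    return None if saw_missing else 1

def or_v6(*operands):
    """Logical 'or' where None stands for a missing value."""
    saw_missing = False
    for value in operands:
        if value is None:
            saw_missing = True   # keep scanning: a later non-zero still decides the result
        elif value != 0:
            return 1             # the first non-zero (true) ends the evaluation
    # No non-zero value was found: the result is missing if any operand was missing.
    return None if saw_missing else 0
```

For example, `and_v6(1, None, 0)` yields 0 because a zero follows the missing value, whereas `and_v6(1, None)` yields a missing value.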

No. The key may consist of numbers and may also be defined as a numerical attribute. However, it is best to define it as a text attribute, regardless of whether it consists of numbers or characters. This avoids problems when the numbers representing the key have more digits than the defined number of significant digits.


Viscovery dedicates a workflow step for this purpose: You define which values of a (text) attribute Viscovery should recognize. Viscovery represents nominal values by generating numerical columns for nominal attributes, where the column value is set either to 0 or 1 depending on which nominal value the data record contains for the attribute.
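
A rough Python sketch of this kind of encoding follows. The function name, the record layout (a list of dictionaries), and the choice to mark unrecognized values as `None` in every generated column are assumptions for illustration, not Viscovery's internal representation.

```python
def encode_nominal(records, attribute, recognized_values):
    """Return one 0/1 column per recognized value of a nominal attribute."""
    columns = {f"{attribute}: {value}": [] for value in recognized_values}
    for record in records:
        actual = record.get(attribute)
        for value in recognized_values:
            if actual not in recognized_values:
                cell = None              # value was not defined: treat as missing
            else:
                cell = 1 if actual == value else 0
            columns[f"{attribute}: {value}"].append(cell)
    return columns
```

With the recognized values "male" and "female", an attribute "Gender" becomes the two numerical columns "Gender: male" and "Gender: female".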


In the attribute pictures of the map window as well as in the Group Profile window, the values of the binary attributes that were derived from the nominal attributes are between 0 and 1. Of course, nobody can be only partly “Gender: male” or “Profession: Public Officer”. The values represent the mean at this node and can be interpreted as a proportion (such as a percentage). If, for example, “Profession: Public Officer” = 0.345, then about 1/3 of the people in the corresponding group (or node) have the Profession: Public Officer (i.e., exactly 34.5%).


All nominal attributes that have been defined in the Viscovery data mart (i.e., split up in their values) can be used for map training and, therefore, also for segmentations.


You would use transformations to treat outliers such that the values will become more evenly distributed.


If an attribute exhibits a positively skewed distribution, you may want to try logarithmic transformation. However, in most cases, the sigmoid transformation is appropriate.


The automatic offset of the logarithmic transformation has been changed for Viscovery version 6. The new automatic offset is suitable in many more cases than the value used in previous versions. To reproduce the transformation from an earlier version, look up the automatic offset value used in the old workflow step (available in the step report), inactivate the automatic offset in version 6, and enter the offset that was used in the old project.


It is best to apply transformations to the attributes that exhibit outliers, but you could also replace all outlying values with upper or lower boundary values.

Another option is to remove the data records with outliers if you want to exclude these values from the scope of your analysis.
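
The two alternatives to a transformation can be sketched in Python as follows. The function names and the explicit `lower` and `upper` boundaries are hypothetical; in Viscovery these operations are configured in the preprocessing workflow, not written as code.

```python
def clip_to_bounds(values, lower, upper):
    """Replace values outside [lower, upper] with the nearest boundary value."""
    return [min(max(value, lower), upper) for value in values]

def drop_outlier_records(records, attribute, lower, upper):
    """Keep only the records whose attribute value lies within [lower, upper]."""
    return [r for r in records if lower <= r[attribute] <= upper]
```

Clipping keeps every record but flattens the extremes, while dropping records removes them from the scope of the analysis altogether.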


Data records with missing values or invalid entries are recognized by Viscovery and treated appropriately in the analysis. For numerical attributes, all entries that are not numbers will be treated as missing. For nominal text attributes, all values that you did not define will be treated as missing.

The basic operation with a SOM is to look up the best-matching node. If an input data record is not complete (has missing values), then the look-up is limited to the available values. That is, the SOM is treated as if the nodes were shorter vectors (in math speak: the SOM is projected into the data space that consists of the available values) and then the lookup is conducted in this reduced map. This happens for each individual record.

It is possible to substitute missing values with the lookup values of the matching nodes. When a data mart is exported, the missing attribute values of the data mart records can even be replaced by the node values of the corresponding nodes in a SOM.
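
The restricted lookup and the substitution of missing values can be sketched as follows. This is an illustration only: nodes and records are plain lists of already scaled numbers, `None` marks a missing value, and both function names are made up for this example.

```python
def best_matching_node(nodes, record):
    """Index of the closest node, comparing only the values the record provides."""
    available = [i for i, value in enumerate(record) if value is not None]
    if not available:
        raise ValueError("record has no available values")

    def squared_distance(node):
        # Distance in the reduced space spanned by the available attributes.
        return sum((node[i] - record[i]) ** 2 for i in available)

    return min(range(len(nodes)), key=lambda n: squared_distance(nodes[n]))

def fill_missing_from_node(nodes, record):
    """Replace missing values with the values of the best-matching node."""
    node = nodes[best_matching_node(nodes, record)]
    return [node[i] if value is None else value for i, value in enumerate(record)]
```

Because the set of available values is determined per record, two records with different gaps are matched in different reduced spaces of the same map.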


If the 10% existing values are more or less evenly distributed in the data set, it should be ok (e.g., if you have demographic data just for a part of your customers).

If the missing values in one attribute systematically depend on the values of other attributes that are used for map training, you need to keep this in mind when you interpret the map. You should definitely not give too much priority to such an attribute, especially if you prioritize only a few attributes.


The scaling is necessary to overcome the different orders of magnitude of the different attributes. Only when attributes have been scaled to, for example, variance=1, can values be compared across different attributes to calculate a (meaningful) Euclidean distance between two points.


In both cases, the mean value is subtracted first from each value so that the new mean of the scaled values is 0.

  • For variance scaling, the result will be divided by the standard deviation of the attribute. Thus, the new variance of the scaled values will always be 1.
  • For range scaling, the result will be multiplied by 8/(max-min), where "max" and "min" are the maximum and minimum values of the variable; consequently, the new range (i.e., difference between maximum and minimum) is always 8.
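
Both scalings can be written down in a few lines of Python. This is a sketch of the formulas as stated above; the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
import statistics

def variance_scale(values):
    """Subtract the mean, then divide by the standard deviation (new variance: 1)."""
    mean = statistics.fmean(values)
    stddev = statistics.pstdev(values)   # assumption: population standard deviation
    return [(value - mean) / stddev for value in values]

def range_scale(values):
    """Subtract the mean, then multiply by 8/(max-min) (new range: 8)."""
    mean = statistics.fmean(values)
    factor = 8 / (max(values) - min(values))
    return [(value - mean) * factor for value in values]
```

Both functions expect an attribute with at least two distinct values; a constant attribute would lead to a division by zero.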

Range scaling provides additional means to handle outliers compared to variance scaling.

Outliers can influence the layout of the map during training, and outliers might be overrepresented in the resulting map. This effect can be mitigated by using range scaling because the maximum value will not exceed 8.


If the range of the attribute (i.e., the difference between maximum and minimum value) is smaller than 8 times the standard deviation, variance scaling is used, otherwise range scaling is applied. This heuristic is based on the fact that in a normal distribution, 99.73% of all data are located within the interval of [–3*stddev, +3*stddev]. Thus, values outside of the interval [–4*stddev, +4*stddev] are supposed to be extreme outliers and thus range scaling is used.
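
The heuristic amounts to a single comparison, sketched here in Python. The function name and the returned strings are hypothetical, and the population standard deviation is again an assumption.

```python
import statistics

def choose_scaling(values):
    """Pick variance scaling unless the range reaches 8 standard deviations."""
    value_range = max(values) - min(values)
    stddev = statistics.pstdev(values)   # assumption: population standard deviation
    return "variance" if value_range < 8 * stddev else "range"
```

An attribute that consists of 99 zeros and a single value of 100 has a standard deviation of about 9.95, so its range of 100 exceeds 8 standard deviations and range scaling would be selected.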


The SOM algorithm starts out in the space spanned by the two largest principal component eigenvectors. The nodes are evenly distributed over this plane and initialized with the corresponding values. The data records (also called input vectors) will be matched to the node with the shortest Euclidean distance (i.e., the best matching node). The weight vector of this node as well as of the neighboring nodes will then be pulled towards the input vector. The closer the node to the best matching node, the “stronger” it will be pulled. Finally, when all data records have been presented several times, the nodes represent the data distribution.

In each learning cycle of Viscovery, the updates from all data records are accumulated and applied at once (“Batch-SOM”). Moreover, the number of nodes grows from cycle to cycle, from an initially small size to the final size (i.e., number of nodes).
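
To make the batch idea concrete, here is a generic textbook-style sketch of a single batch cycle in Python. It is not Viscovery's implementation: the Gaussian neighborhood, the `radius` parameter, and the function name are assumptions, and the principal-plane initialization and the growing of the map are left out.

```python
import math

def batch_som_cycle(nodes, positions, records, radius):
    """One batch cycle: match all records first, then update all nodes at once."""
    def best_match(record):
        return min(
            range(len(nodes)),
            key=lambda n: sum((w - x) ** 2 for w, x in zip(nodes[n], record)),
        )

    # Step 1: find the best matching node of every record using the old weights.
    winners = [best_match(record) for record in records]

    # Step 2: each node becomes a neighborhood-weighted mean of all records.
    updated = []
    for j, weights in enumerate(nodes):
        numerator = [0.0] * len(weights)
        denominator = 0.0
        for record, winner in zip(records, winners):
            grid_dist_sq = sum((a - b) ** 2 for a, b in zip(positions[j], positions[winner]))
            pull = math.exp(-grid_dist_sq / (2 * radius ** 2))  # closer nodes are pulled more
            denominator += pull
            for k, x in enumerate(record):
                numerator[k] += pull * x
        if denominator > 0:
            updated.append([value / denominator for value in numerator])
        else:
            updated.append(list(weights))   # no measurable pull: keep the old weights
    return updated
```

Here `nodes` holds the weight vectors and `positions` the grid coordinates of the nodes. Because the winners are determined before any weight changes, the order in which the records are presented has no effect on the result of a cycle.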


The training time is roughly proportional to the number of attributes, to the number of data records, and to the number of nodes. Moreover, the number of training cycles and, in general, the training schedule have an essential influence on the map creation time. Thus, it can take from a second up to several hours.


This is a very important issue in the creation of any SOM (and, actually, for data modeling in general). Giving a priority to an attribute means assigning it a particular importance for the application. Internally, the priority is a relative scaling factor applied on top of the variance or range scaling. Prioritizing an attribute formally gives it a weight other than 0. Attributes with a higher priority have a stronger influence on the ordering of the SOM data representation. As a consequence, clusters tend to emerge orthogonally with respect to that attribute.

You may want to include attributes in your map without prioritizing them. These attributes do not contribute to the ordering of the map. Nevertheless, it makes sense to include them, so you can see the distribution of their values over the map.


There is no difference as long as all attributes are prioritized by the same value. Only the relative factors between the priorities are decisive, not the absolute numbers.
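
A small Python sketch illustrates why only the ratios matter. It assumes that the priority simply multiplies each scaled attribute before the Euclidean distance is computed; both function names are hypothetical.

```python
def weighted_squared_distance(node, record, priorities):
    """Squared Euclidean distance after multiplying each attribute by its priority."""
    return sum((p * (w - x)) ** 2 for p, w, x in zip(priorities, node, record))

def best_match_with_priorities(nodes, record, priorities):
    """Index of the node with the smallest priority-weighted distance."""
    return min(
        range(len(nodes)),
        key=lambda n: weighted_squared_distance(nodes[n], record, priorities),
    )
```

Multiplying all priorities by the same constant multiplies every distance by the same factor, so the best matching node stays the same. Setting a priority to 0 removes that attribute from the distance entirely.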


In most applications, the final map includes no more than 15 attributes that contribute to the order of the map. Keep in mind, the more attributes you prioritize, the less each one of the attributes will be ordered in the map. The more attributes correlate with each other, the more of them you can prioritize without disrupting the order of the map. If there are many highly correlated attributes, you may use several of them for the map training while turning on Correlation Compensation (which gives each of them a smaller priority in an automated manner). Nevertheless, you should lower the priorities for this group of highly correlated attributes.


A rule of thumb prevalent in the literature suggests that the number of nodes should be the same as the number of data records divided by 10: on average, 10 records match each node. In most practical cases, however, no less than 500 and no more than 5000 nodes are used, even if the mentioned relation is not observed. Viscovery can also handle SOMs that contain many more nodes than records in the data set. In this case, the SOM contains empty nodes, which do not disturb the ordering; the map simply looks nicer.

In contrast, it does not make sense to use more than 2000 records per node when performing segmentation or data exploration. The SOM is an abstraction of the data distribution and will, thus, look very much the same whether 5000 or 500 records per node have been used. Therefore, the smaller data sample will do the same job. For prediction and scoring models, however, all records available are generally used because non-linear prediction models depend on the local information in the nodes.
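
The rule of thumb and the practical limits can be combined into a rough starting value, sketched below. Treating the limits of 500 and 5000 nodes as hard bounds is a simplification for this example, and the function name is made up.

```python
def suggested_node_count(record_count, minimum=500, maximum=5000):
    """Records divided by 10, kept within the commonly used range of map sizes."""
    return max(minimum, min(maximum, record_count // 10))
```

For 20,000 records this gives 2000 nodes, while very small and very large data sets end up at the lower and upper bounds.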


The tension reflects the rigidity of the map. The higher the tension, the less closely the map approximates the data. A larger tension makes a smoother map, which is less specific at the nodes. A smaller tension yields a map that tends to follow outliers and noise. The default of 0.5 is adequate for almost all applications.


The quality of the map is determined less by performance indicators than by its suitability for your application. The goal is not to approximate the data most perfectly (for example, modeling every outlier and noise in the map), but rather to have a smooth and averaging representation of the data that gives you insight into the dependences among the attributes and leads to new findings.

Viscovery does compute overall Quantization and Distortion errors. These values can be viewed in the Production Journal (version 5.2 and earlier: in the Description of the Map History available in the File menu). Comparing these values for different maps makes sense only if the maps were trained from the same data and roughly the same attribute set.


Which map is best depends on the goal of your analysis. In addition, superior maps have ordered attributes and a representation that reflects your application task. However, whether and to what extent all attributes can be ordered at the same time depends on the data and their dependences.


A map can never be wrong. Everything a map reveals is correct and is intrinsic to the data. It might just happen that some characteristics of the data do not show very clearly because of a disadvantageous priority setting.


Sure, it can make sense to create a map if the intrinsic dimension of the data distribution is non-trivial.


Finding appropriate priorities is an iterative process. Depending on the goal of your analysis, you would usually start with setting the priorities of all attributes shown in the map (i.e., attributes pertinent to the question you want to answer) to 1 to create your first map. It is often useful initially to not prioritize more than about 30 attributes at once. Non-zero priority values are typically between 0.3 and 1.5.

Examine the map and make corrections with the following:

  • Deselecting attributes (or giving them a priority of 0) that seem not to contain relevant information;
  • Selecting and prioritizing attributes that you had not included before;
  • Raising priorities of interesting attributes;
  • Lowering priorities of attributes that seem to disturb the interesting order of the map.

Deltas for raising and lowering priorities are suggested to be between 0.3 and 1 (if you started out with 1).

However, the process of finding an optimal priority setting requires some intuition and will become faster and easier the more experienced you are.


First of all, attributes that you definitely do not want to see in the map should not be included in the data mart.

If you have many data records (for example, more than 100,000), you may want to use only a sample of your data for map creation.

You can create samples of your data set by saving the data mart in the last step of the Create Data Mart workflow; then use that data mart for training.

If you are still in the process of finding appropriate priorities, you should create maps with 500 nodes only. This number can be raised in the process of generating the final map.

For initial attempts, the training schedule “Fast” is sufficient and much faster (as the name suggests).

By following these suggestions, you can speed up map creation. Once you have found the attributes you want to use for map creation and an appropriate priority setting, you might want to recreate the final map with a bigger sample (or even all data records), with more nodes (up to 2000 nodes) and using the “Normal” or “Accurate” training schedule. You may finally also want to include attributes with priority 0, which should not contribute to the map ordering, to see their distribution over the map.


The colors correspond to numerical values of the attributes. The scale at the bottom of each attribute picture in the map window shows the correspondence between the displayed colors and the numerical values of the corresponding attribute. You can also consult "Understanding SOM visualization" of the SOM technology page on the Viscovery website.


This is because the colors represent the node values, and each node has a value. Before the actual training starts, all nodes are initialized with the corresponding values of the principal plane and thus get an initial node value. Later, during the training process, the node values gradually adapt to the data records matching them. However, each data record that matches a node influences not only the value of the node itself but also the values of the neighboring nodes (which might not have any match among the data records).


Each node in the map represents a micro cluster, which is shown as a little hexagon.


The dots at either end of the color scale indicate that there are numerical values of the attribute that are outside of the displayed range.


Short answer: It is neither the goal nor a guarantee of the SOM training algorithm to achieve this property.

Long answer: After the training process, the map nodes are kept constant and are not updated anymore. At this point, the nodes are considered a representation of the model data. To compute group profiles and statistical information on subsets of data records, all data records are matched into the nodes one final time. There is no guarantee that the mean of the data records that matched into a particular node is equal to the node’s values. In fact, it would be just a rare coincidence.


All values shown are in original scale. The scaled values are hidden from the user and only used in the background when computing the map. Viscovery generally presents attributes in their original scaling so that the user does not need to be concerned with the inverse scaling or transformations.


The colors represent the values contained in a node. Thus for a binary attribute such as gender, green matches a value of 0.5 (i.e., 50% of the data records in the node are female, the other 50% are male). Of course all colors are possible depending on the ratio of males to females in a node. Binary attributes given high priority might be displayed in mostly blue and red. Binary attributes given low priority might be displayed with a number of colors, as the data may not necessarily be ordered according to gender.


The reference group is necessary for comparing two group ranges and is used to compute group profiles (as shown in the Group Profile window).

To define a reference group, select a group range using the Group Range drop-down list in the toolbar and then select a node, a set of nodes, or a cluster in the map pictures. To specify the selection as the reference group, select Set Reference Group from the Edit menu.

Then modify the group range, selected nodes or clusters to compare the new group range to the reference group.

In Viscovery versions 5.2 and earlier, it was not possible to define an arbitrary reference group. Instead, the group profile behaved as if Entire map was selected for the reference group.


After selecting the first group of nodes, select Set Reference Group from the Edit menu. When you then select a second group of nodes, the Group Profile window displays a comparison of the profile values from the second group and the reference group (i.e., the first group).


In all cases but one, the bars are absolute values, whose meaning is specified by the selection in the Group Descriptive sub-menu of the View menu (version 5.2 and earlier: in the Select Statistics drop-down list) and refer to the selected range. But since the attributes might have very different scales, the absolute values are often not comparable.

Only if Profile is selected, the bars do not show absolute values. In this case, the bars reflect the deviation of the mean of the selected range from the mean of the reference group. To get comparable measures, the deviations of means are divided by standard deviations of the reference group: i.e., if the bar is short, the mean of the selected range does not differ very much from the reference group's mean in terms of the standard deviation. In the bar chart of the Group Profile window, it can easily be seen which attributes make up the group’s profile (i.e., differ most from the rest of the population exhibiting a long bar).
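
The profile value of a single attribute can be sketched as a standardized difference of means. The function name is hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

def profile_value(selected_values, reference_values):
    """Deviation of the selected mean from the reference mean, in reference standard deviations."""
    deviation = statistics.fmean(selected_values) - statistics.fmean(reference_values)
    return deviation / statistics.pstdev(reference_values)   # assumption: population stddev
```

If the reference group has a mean of 5 and a standard deviation of 2, a selection with a mean of 8 gets a profile value of 1.5, regardless of the unit in which the attribute is measured.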


This bar chart only shows attributes for which the mean of the selected range differs significantly from the mean of the reference group. To see more or fewer attributes in the bar chart, the confidence level used in the Charts page can be changed in the Preferences dialog from the File menu. If you want to see a bar for all attributes regardless of their confidence, choose “don’t use” as confidence level.

In Viscovery versions prior to version 6, this setting can be found in the View page of the Preferences dialog.


The table in the Clusters window shows cluster means for both options. However, the Group Profile window behaves differently for Mean and Profile:

If you choose Profile, the bar chart shows the deviation of the mean of the selected range from the mean of the reference group. The unit is standard deviations of the reference group: i.e., if the bar is short, the mean of the selected range does not differ very much from the mean of the reference group in terms of the standard deviation.

If you choose Mean, the bar chart actually shows the mean attribute values of the selected range.


The Cluster Characteristics window (available from version 6) provides an automatic analysis of all clusters in the current segmentation. The sorted cluster descriptions provide an overview of the properties of each cluster. Each cluster is described by the variables that exhibit the strongest contribution to differentiating it from the entire data set. The median of absolute profile values key figure (displayed in the Clusters window) helps identify interesting clusters.


Box plots, scatter plots, and other statistical features are available in a context-sensitive manner throughout Viscovery. You can use these functions over arbitrary selections of a map and also at each workflow step by choosing Statistics from the context menu (i.e., right-click while the cursor is on a workflow step).

For box plots, choose the second-to-last tab in the statistics window and select all attributes for which you want to see the box plots. The box plots show the median as a white line inside the colored box, the box from the lower to the upper quartile, the whiskers at +/-1.5 times the box length, and outliers denoted by colored lines outside the whiskers.
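
The quantities shown in a box plot can be reproduced approximately with Python's standard library. This is a sketch: the quartile interpolation method ("inclusive") is an assumption and may differ from the one used by Viscovery, so results can deviate slightly for small samples.

```python
import statistics

def box_plot_summary(values):
    """Median, quartiles, whisker limits, and outliers of one attribute."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1
    lower_limit = q1 - 1.5 * box_length   # whisker limit below the box
    upper_limit = q3 + 1.5 * box_length   # whisker limit above the box
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return {
        "median": median,
        "lower_quartile": q1,
        "upper_quartile": q3,
        "whisker_limits": (lower_limit, upper_limit),
        "outliers": outliers,
    }
```

Values beyond the whisker limits are the ones a box plot would draw individually as outliers.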

For scatter plots, choose the last tab in the statistics window and select one attribute for the x-axis as well as one for the y-axis. The scatter plots show the distribution of one attribute in terms of any other one. Usually, the points in the scatter plot are color-coded according to the number of records represented by the point. This can be changed with the Color-coding according to drop-down list.


The number of data records that match a node is called frequency and is shown in the frequency picture of the map window.

  1. Choose Attributes… from the Map menu (version 5.2 and earlier: from the View menu).
  2. Check the second-to-last entry, Frequency.
  3. Leave the dialog by clicking “OK”.
  4. Click the node whose number of matching data records you would like to know.
  5. In the frequency picture that appears in the map window, move the mouse over the arrow in the color scale to see the number of data records contained in that node.

Alternatively, you can find the frequency in the list of the Group Profile window (last entry), if you choose the range Node.


Click on the node so it becomes the currently active node, which is indicated by a blinking cursor. On the color scale, you see a small black triangle that points down to the corresponding value of the current node. You can read off the exact value of this node by moving the mouse pointer over the triangle.


Yes, of course. If you want to show the node values of any attribute displayed over the respective node, do the following:

  1. Select the nodes at which you would like to show the node values.
  2. Copy the selection.
  3. Switch to label mode.
  4. Paste the selection as labels while you choose the attribute whose node values you would like to show at these nodes.

If you want to show attribute values as labels in the map, you would need to import labels from the source data file with the following steps:

  1. Open the source data file.
  2. Select and copy the rows with the records from which you would like to import the labels.
  3. In Viscovery switch to Label Mode.
  4. Paste them into the map.

Alternatively, you may use the Import feature from the File menu of Viscovery to import Labels for all data records of a data file.


You can sort the attributes by similarity in the Attributes dialog. This results in map pictures arranged by visual similarity.


The fastest and easiest option is the following:

  1. Open the Data Records dialog.
  2. Look up the record in question.
  3. Double-click it.

The cursor will then be placed on the node that contains that record.

There are several other options to locate a specific data record in the map:

  1. Open the source data file and select the headline and the record in question.
  2. Copy the two selected lines.
  3. Switch to Selection Mode.
  4. Paste the record.

The best matching node containing this data record will be selected.

Alternatively, after copying the data record and the headline, you can

  1. Switch to Label Mode.
  2. Paste the identifier (key attribute) as labels such that the key will appear over the best matching node containing the data record.

You could copy several data records at once and paste their keys as labels to the map.


  1. Prepare a file that contains only the data records you would like to locate.
  2. Import labels from this file.

Yes, you can always select the rows of tables you would like to export and use copy and paste to export them into other programs. If you use Copy while the map window is active, the image of selected attribute pictures is copied to the clipboard. Use Paste Special in the destination application to insert either the graphics or a textual representation of the data visible in the graphics.

You can also export the attribute pictures of the map directly as a Windows meta file (WMF) graphic using the Export sub-menu of the File menu.


Yes, you can. You can use the export functionality of Viscovery to export all map node values to a text file directly, or only the values of nodes that contain labels, are selected, or are located along a path. If the corresponding mode is turned on, you can also copy the corresponding node values from the map to the clipboard.


Yes. Choose the Selection mode in the SOM. Select the nodes to which you want to add labels. Copy the selection into a spreadsheet. Add a column named “Label” and enter the label you wish. In this case, the new column will contain the label at each row. Copy all rows and the headline.

In the Viscovery map switch to label mode. Paste the copied records from the spreadsheet. The labels should appear at the nodes you previously selected.


This was only a problem with version 5.2 and earlier. As of version 6, labels are shifted so that they are not truncated.

If you cannot upgrade to the new version, you have to adjust the location of labels manually: Switch to label mode (select Label Mode from the Edit menu) and drag the half-visible labels inwards. For long labels you should consider writing them in two or more lines, which will be centered above the node.


Yes. You can edit step titles and descriptions while the steps are being processed by opening the step Properties and changing the title and adding a comment.


When you point the mouse at a workflow step, the differences between the step and the step above it are shown in a popup. Additionally, the differences are listed in the step properties. No differences are listed for the first workflow step in a branch.


If you are using Viscovery SOMine 7.0 or later, right-click a yellow Import Data step of the Preprocess workflow, then select Import Preprocessing Protocol to mass-import your attribute descriptions.

In earlier versions of Viscovery software, use the Import Descriptions button in the Select Attributes page of the Import Data step to mass-import your attribute descriptions.


The computation of an optimal local regression is an iterative process. Starting with a set of priorities (specified by the user), a map is trained, from which a better set of priorities is computed according to certain criteria. With this new set, another map is trained and the priorities are refined again. These are the iterations.

Training cycles are the operations by which one of these maps is trained. The training cycles can be different in each iteration. They depend on the principal components of the (transformed and scaled) data, which in turn depend on the priorities.


No, priorities cannot be local. “Local” always refers to a single node. Priorities are always related to variables as a whole.


The receptive fields do not influence the map ordering. They determine which data records are used for computing significant local regressions at each node.


A white node in the coefficient picture of an attribute means that this attribute was not used in the stepwise regression at this node.


A self-organizing map (SOM, also referred to as a Kohonen map) is an ordered representation of multi-dimensional data in two-dimensional space, which simplifies complexity and reveals relationships among the variables. The intuitive visualization of SOMs is easily understood by non-technicians as well, providing a communication platform for business, statisticians, and IT. Read more about SOMs at SOM technology.


Self-organizing maps are used for the following tasks:

  • Data representation
  • Data exploration
  • Dependency analysis
  • Clustering
  • Segmentation
  • Classification
  • Non-linear prediction
  • Scoring

Your first stop could be our article on SOM technology. You can also follow the links to our extensive list of suggested reading, including online resources and printed material.


Please refer to SOM technology to learn more about the interpretation of SOM visualization.


As with all data mining software, you should know how to deal with data and how preprocessing can influence the results. Thus, some basic statistics knowledge is useful. Also, you should know how to read and interpret a SOM (see question above). Since this is a rather intuitive task, you will be able to understand SOMs within a few minutes.


The SOM-Ward clustering is based on the SOM-Ward distance, which is a variant of the Ward distance.

The Ward distance between two clusters is defined as

d_xy := n_x * n_y / (n_x + n_y) * norm(mean_x - mean_y)^2

where n_x and n_y are the numbers of data points and mean_x and mean_y the centers of gravity of the clusters x and y; norm() is the Euclidean norm.

The SOM-Ward distance is defined as

d'_xy := d_xy, if clusters x and y are adjacent in the SOM

d'_xy := infinity, otherwise

Thus, the SOM-Ward distance observes the topological location of the clusters. In particular, two clusters that are not adjacent in the SOM are never considered to be merged.
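The two distances can be illustrated with a minimal Python sketch. It is not Viscovery code: the function names, the representation of clusters as lists of points, and the `adjacent` flag are assumptions made for this example. It also assumes that non-adjacent clusters receive an infinite distance, which is what "never considered to be merged" implies.

```python
import math

def ward_distance(cluster_x, cluster_y):
    """Ward distance between two clusters given as lists of equal-length points."""
    n_x, n_y = len(cluster_x), len(cluster_y)
    dim = len(cluster_x[0])
    # Centers of gravity of both clusters.
    mean_x = [sum(p[i] for p in cluster_x) / n_x for i in range(dim)]
    mean_y = [sum(p[i] for p in cluster_y) / n_y for i in range(dim)]
    # Squared Euclidean norm of the difference of the centers.
    squared_norm = sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))
    return n_x * n_y / (n_x + n_y) * squared_norm

def som_ward_distance(cluster_x, cluster_y, adjacent):
    """SOM-Ward distance; `adjacent` states whether the clusters touch in the SOM (assumed input)."""
    if not adjacent:
        return math.inf  # assumption: non-adjacent clusters can never be merged
    return ward_distance(cluster_x, cluster_y)

x = [(0.0, 0.0), (2.0, 0.0)]   # center (1, 0)
y = [(4.0, 0.0), (6.0, 0.0)]   # center (5, 0)
print(ward_distance(x, y))                      # 2*2/4 * 4^2 = 16.0
print(som_ward_distance(x, y, adjacent=False))  # inf
```

A clustering procedure built on this would repeatedly merge the pair of clusters with the smallest finite distance.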


For detailed information, see The SOM-Ward cluster algorithm.

Here are the exact formulas for the indicator I(c) of c clusters:

I'(c) := [ mu( c ) / mu( c+1 ) ] - 1

I(c) := max(0, I'(c)) * 100

mu(c) := d(c) * c^(-beta)

where d(c) is the Ward distance that was used to merge c clusters into c-1 clusters, and 3 <= c < number of nodes. beta is the linear regression coefficient for the “data points” [ ln(c), ln(d(c)) ] (where 2 <= c <= number of nodes). This is because the d(c) “behave” like c^(-beta).

Furthermore, we define I(1) := 0 and I(2) := 0. For SOM-Ward clusters, we additionally define I(c) := 0 for inversions at c clusters, i.e. if d(c) < d(c+1).

The idea behind this is that when d(c) is high, but d(c+1) is low, c clusters is a good clustering because the next merge step (resulting in c-1 clusters) would result in a high variance within the clusters.
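The following Python sketch shows one way to compute the indicator from a list of merge distances. It is an illustration under stated assumptions, not the Viscovery implementation: the function name and the dictionary input are hypothetical, and the exponent is read as the least-squares slope of ln(d(c)) against ln(c), so that mu(c) is d(c) with the fitted power-law trend divided out.

```python
import math

def cluster_indicator(d, som_ward=False):
    """Indicator I(c) from merge distances d = {c: d(c)} for c = 2..n (no gaps, all d(c) > 0)."""
    cs = sorted(d)
    xs = [math.log(c) for c in cs]
    ys = [math.log(d[c]) for c in cs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Least-squares slope of ln d(c) against ln c (assumed reading of beta).
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    # mu(c): merge distance with the fitted power-law trend removed.
    mu = {c: d[c] * c ** (-slope) for c in cs}
    indicator = {1: 0.0, 2: 0.0}  # defined as 0 in the text
    for c in cs:
        if c < 3 or c + 1 not in mu:
            continue  # the formula applies for 3 <= c < n only
        inversion = som_ward and d[c] < d[c + 1]
        raw = mu[c] / mu[c + 1] - 1.0
        indicator[c] = 0.0 if inversion else max(0.0, raw) * 100.0
    return indicator

# Hypothetical merge distances: merging below 5 clusters is five times more expensive.
d = {c: (5.0 if c <= 4 else 1.0) * c ** -2.0 for c in range(2, 11)}
ind = cluster_indicator(d)
print(max(ind, key=ind.get))  # cluster count with the highest indicator
```

In this made-up example the jump in merge cost between 5 and 4 clusters makes 4 clusters stand out, which matches the idea described above.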

A further matter is how the (SOM-) Ward distance matrix is initialized: We consider the frequencies at each node (the number of data points that match at each node).


Do include it in the training data, but set the priority of this attribute to 0. This way it does not influence the map order.


Identifying appropriate priorities is an iterative process, which depends on the analysis and business objectives of the project.

After choosing initial priorities, the process cycles between map creation, evaluation, and refinement of the priorities until the result is satisfactory.

For more thorough information see the following four FAQ entries.


This procedure is called explorative data mining.

Explorative data mining is used for data visualization, to gain new insights into data, and to find dependencies; there is no specific target value or predetermined objective. The iterative process of finding priorities is as follows:

The initial priorities for explanatory attributes are set to 1, for the remaining attributes to 0.

In a first iteration, attributes that appear disrupted are removed from the model by setting their priority to 0 or by deselecting them. Additional attributes may also be added in this iteration.

In subsequent iterations, the priorities are changed gradually (in steps of about 0.3) until the most important attributes show ordered areas. Attributes that develop interesting features may be given a higher priority.


The iterative process of finding priorities for the definition of clusters and segmentations to answer special questions (e.g., buying patterns, customer profiles) is as follows:

To start, all attributes that are correlated with the objective (e.g. for buying patterns, attributes such as "buys product A", "Turnover product A", etc.) are prioritized with 1; the remaining attributes with 0.

In several iterations, the priorities of attributes that have been prioritized very low but nevertheless appear slightly sorted are increased, and the priorities of attributes that are sorted very strongly but disturb the order of other attributes are decreased.


To start, all attributes that could have an influence on the target value, and also the target value itself, are prioritized with 1.

In a first iteration, the priority of the target value is adapted (usually decreased) so that there are several areas with a high target value. A complete separation into just two areas, where one area has only a high target value and the other area has only a low target value, should be avoided.

Using the group profiles, the SOM is evaluated to detect the most influential attributes of the areas with a specific (low or high) target value.

In subsequent iterations, the priorities of the most influential attributes are increased, and/or the priorities of the other attributes are decreased (in steps of approx. 0.2).


Set the priorities of the score value and the score groups to 1 and the priorities of the other attributes to values between 0.01 and 0.1.

Deactivate Compensate Correlations.


Priorities are the weights of the attributes, which influence the order of the map. The higher the priority of an attribute, the higher its influence on the map and the more ordered the attribute will appear.


There are some rules of thumb:

  • No more than 10-15 attributes should be prioritized (have a non-zero priority). The more attributes are prioritized, the lower the potential for all attributes to be sorted well.
  • If there are correlated attributes, more attributes may be prioritized.
  • Never prioritize only nominal or binary attributes! This crowds the data records into a few nodes and leaves many nodes with no data records.
  • The more distinct values a nominal attribute has, the lower the priority of this attribute should be set.
  • Attributes with many missing values should be prioritized low.
  • If an attribute A only makes sense if attribute B has a specific value (e.g. “time since last insurance case” only makes sense if “number of insurance cases” is > 0), then B should always be prioritized when attribute A is prioritized.
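Some of these rules of thumb can be checked mechanically before a map is trained. The Python sketch below is a hypothetical helper, not part of Viscovery: the function name, the dictionaries, and the type labels "nominal", "binary", and "numeric" are assumptions chosen for illustration. It covers only the rules on the number of prioritized attributes, on nominal or binary attributes, and on dependent attributes.

```python
def check_priorities(priorities, attribute_types, requires=None, max_prioritized=15):
    """Return warnings for a priority assignment, following the rules of thumb.

    priorities:      {attribute name: priority}
    attribute_types: {attribute name: "nominal" | "binary" | "numeric"} (assumed labels)
    requires:        {attribute A: attribute B} where A only makes sense given B
    """
    requires = requires or {}
    prioritized = {name for name, p in priorities.items() if p > 0}
    warnings = []
    if len(prioritized) > max_prioritized:
        warnings.append(
            f"{len(prioritized)} attributes are prioritized; "
            f"the rule of thumb is at most {max_prioritized}"
        )
    if prioritized and all(
        attribute_types.get(name) in ("nominal", "binary") for name in prioritized
    ):
        warnings.append("only nominal or binary attributes are prioritized")
    for dependent, prerequisite in requires.items():
        if dependent in prioritized and prerequisite not in prioritized:
            warnings.append(f"'{dependent}' is prioritized but '{prerequisite}' is not")
    return warnings

print(check_priorities(
    {"time_since_last_case": 1.0, "number_of_cases": 0.0},
    {"time_since_last_case": "numeric", "number_of_cases": "numeric"},
    requires={"time_since_last_case": "number_of_cases"},
))
```

The remaining rules (many distinct nominal values, many missing values) call for lower priorities rather than a yes/no check, so they are left to the analyst here.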

If this option is chosen, internal scaling factors (not visible to the user) are used to reduce the influence of strongly correlated attributes. Without Compensate Correlations, these attributes would have a larger influence on the map ordering than they deserve. When Compensate Correlations is enabled, other attributes have a better chance to be ordered.

This option is activated by default and should generally be used, especially in the following cases:

  • When several attributes are strongly correlated
  • For general data visualization
  • When making predictions

Prioritization is just one factor of scaling. The scaling of input data consists of three factors:

  • Variance or Range Scaling P_S (internal, always)
  • Correlation Compensation P_CC (internal, can be turned off)
  • Priorities P_P (defined by the user)

The total scaling is computed as P = P_S * P_CC * P_P.
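As a small worked example of this product, the Python sketch below multiplies the three factors for one attribute. The function name and the numeric values are made up for illustration; the internal factors are not visible to the user, so real values cannot be read off in this way.

```python
def total_scaling(variance_scaling, correlation_compensation, priority):
    """Total scaling P as the product of the three factors named above."""
    return variance_scaling * correlation_compensation * priority

# Hypothetical internal factors for one attribute.
print(total_scaling(0.5, 0.8, 2.0))  # priority 2 doubles the attribute's scaling
print(total_scaling(0.5, 0.8, 0.0))  # 0.0: priority 0 removes the attribute's influence
```

Because the factors are multiplied, a priority of 0 always yields a total scaling of 0, whatever the internal factors are. This is why an attribute with priority 0 does not influence the map order.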