Recognition of Western style musical genres using machine learning techniques

https://doi.org/10.1016/j.eswa.2009.03.050

Abstract

This study uses machine learning (ML) techniques to classify and cluster different Western music genres. Three artificial neural network models (multi-layer perceptron neural network [MLP], probabilistic neural network [PNN] and self-organizing maps neural network [SOM]), along with support vector machines (SVM), are compared to two standard statistical methods (linear discriminant analysis [LDA] and cluster analysis [CA]). The variable sets considered are average frequencies, variance frequencies, maximum frequencies, amplitude or loudness of the sound, and the median of the location of the 15 highest peaks in the periodogram. The results show that machine learning models outperform traditional statistical techniques in classifying and clustering different music genres due to the robustness and flexibility of their modeling algorithms. The study also shows how it is possible to identify various dimensions of music genres by uncovering complex patterns in the multidimensional data.

Introduction

Musical genres are widely used to segment and describe titles, both by the music industry and by consumers. The segmentation of music genres is also an important aspect of many multimedia retrieval systems (Khan & Al-Khatib, 2006). Music segmentation refers to the process of breaking up an audio stream into temporal segments by applying a boundary detection criterion such as texture, note, instrument, rhythm pattern or overall structure (Dillon, 2003).

Musical genres segmentation has become an increasingly important issue, as it has many applications in professional media production, automatic speech recognition (ASR) systems, audio archive management, commercial music usage, content-based browsing and audio retrieval systems (Lin & Chen, 2005). However, despite recent major breakthroughs, segmenting music genres according to overall sound similarity has remained an unsolved problem (Pampalk, Dixon, & Widmer, 2004). In this paper we address this problem using ML techniques. More specifically, the aim of this study is two-fold: (1) to investigate the influence of various factors on music genres segmentation; and (2) to compare the classification and clustering performance of ML against more traditional techniques such as LDA and CA within the context of music genres segmentation.

This paper is organized as follows. Section 2 surveys related research work carried out in music genres classification and clustering. Section 3 presents the methodology used to conduct the analysis. Section 4 presents the experimental results. Finally, Section 5 highlights the major research implications and limitations. Possible avenues for future research are also explored.

Related work

Several researchers have recently addressed the problem of speech/music classification. Khan and Al-Khatib (2006) used a sample of approximately 2.25 h of speech, 2.72 h of music and 0.62 h of speech/music data representing five languages: American English, Urdu, Japanese, Spanish and Hebrew. They applied three different classification frameworks to classify speech and music: an MLP neural network, a radial basis functions (RBF) neural network and a hidden Markov model (HMM). The authors found that

Methodology

Three musical genres were used in this study: classical, rock and new wave. Classical music refers to music rooted in the traditions of Western liturgical and secular music, encompassing a broad period from the 9th century to present times. The instruments used in classical music were mostly invented before the mid-19th century and codified in the 18th and 19th centuries. Classical music can take on the form of the concerto, symphony, opera, dance music, etc. (Lebrecht, 1996). The rock music is

Multi-layer perceptron neural network

MLP was first developed to mimic the functioning of the brain. It consists of interconnected nodes referred to as processing elements that receive, process, and transmit information. MLP consists of three types of layers: the first layer is known as the input layer and corresponds to the problem input variables with one node for each input variable. The second layer is known as the hidden layer and is useful in capturing non-linear relationships among variables. The final layer is known as the
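The layered structure described above can be sketched as a single forward pass. This is a minimal illustration and not the configuration used in the study: the number of inputs (one per variable set listed in the abstract), the hidden-layer size, the tanh and softmax activations, the random weights and the names `init_layer`, `dense`, `forward` and `final_layer` are all assumptions made for the example. Only the three genre classes come from the text.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a fully connected layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases):
    """Weighted sum of the inputs plus a bias, computed once per node in the layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, hidden_layer, final_layer):
    """Input layer -> hidden layer (non-linear) -> final layer (class probabilities)."""
    hidden = [math.tanh(z) for z in dense(features, *hidden_layer)]
    return softmax(dense(hidden, *final_layer))


rng = random.Random(0)
N_FEATURES = 5  # assumption: one input node per variable set named in the abstract
N_HIDDEN = 4    # assumption: arbitrary hidden-layer size
N_GENRES = 3    # classical, rock and new wave

hidden_layer = init_layer(N_FEATURES, N_HIDDEN, rng)
final_layer = init_layer(N_HIDDEN, N_GENRES, rng)

example = [0.2, -1.0, 0.5, 0.1, 0.7]  # made-up, already-scaled feature values
probs = forward(example, hidden_layer, final_layer)
print(probs)
```

The weights here are untrained, so the printed probabilities carry no meaning. In practice the weights would be fitted to labelled examples, for instance by backpropagation.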

Implications, limitations and future research

Our results confirm the theoretical work by Hecht-Nielsen (1989), who has shown that ML techniques can learn input–output relationships to the point of making perfect forecasts with the data on which the network is trained. However, perfect forecasts with the training data do not guarantee optimal forecasts with the testing data due to differences in the two data sets. The superior performance of the ML techniques can be traced to their inherent non-linearity. This makes ML techniques ideal for

References (89)

  • M. Hajmeer et al. A probabilistic neural approach for modeling and classification of bacterial growth/no-growth data. Journal of Microbiological Methods (2002)
  • I. Hmeidi et al. Performance of KNN and SVM classifiers on full word Arabic articles. Advanced Engineering Informatics (2008)
  • M. Kiang et al. Selecting the right MBA schools: An application of self-organizing map networks. Expert Systems with Applications (2008)
  • B. Koetz et al. Multi-source land cover classification for forest management based on imaging spectrometry and LIDAR data. Forest Ecology and Management (2008)
  • E. Laskari et al. Studying the performance of artificial neural networks on problems related to cryptography. Nonlinear Analysis: Real World Applications (2006)
  • S. Lek et al. Artificial neural networks as a tool in ecological modeling: An introduction. Ecological Modelling (1999)
  • X. Li et al. Predicting motor vehicle crashes using support vector machine models. Accident Analysis and Prevention (2008)
  • D. Moreno et al. Self-organizing maps could improve the classification of Spanish mutual funds. European Journal of Operational Research (2006)
  • Y. Shan et al. Application of probabilistic neural network in the clinical diagnosis of cancers based on clinical chemistry data. Analytica Chimica Acta (2002)
  • H. Silver et al. Analysis of cognitive performance in schizophrenia patients and healthy individuals with unsupervised clustering models. Psychiatry Research (2008)
  • K. Smith et al. Neural networks in business: Techniques and applications for the operations researcher. Computers & Operations Research (2000)
  • D. Specht. Probabilistic neural networks. Neural Networks (1990)
  • J. Vesanto. SOM-based data visualisation methods. Intelligent Data Analysis (1999)
  • C. Vijayakumar et al. Segmentation and grading of brain tumors on apparent diffusion coefficient images using self-organizing maps. Computerized Medical Imaging and Graphics (2007)
  • S. Wilson. Algorithm architectures for patient dependent seizure detection. Clinical Neurophysiology (2006)
  • C. Xue et al. Study of probabilistic neural networks to classify the active compounds in medicinal plants. Journal of Pharmaceutical and Biomedical Analysis (2005)
  • Z. Yang et al. Probabilistic neural networks in bankruptcy prediction. Journal of Business Research (1999)
  • B. Yu et al. A comparative study for content-based dynamic spam classification using four machine learning algorithms. Knowledge-Based Systems (2008)
  • J. Zahavi et al. Applying neural computing to target marketing. Journal of Direct Marketing (1997)
  • Alyuda Research Company (2003). NeuroIntelligence user manual (Version...
  • M. Anandarajan et al. Bankruptcy prediction of financially stressed firms: An examination of the predictive accuracy of artificial neural networks. International Journal of Intelligent Systems in Accounting, Finance & Management (2001)
  • A. Audrain-Pontevia. Kohonen self-organizing maps: A neural approach for studying the links between attributes and overall satisfaction in a services context. Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior (2006)
  • E. Baranoff et al. A semi-parametric stochastic spline model as a managerial tool for potential insolvency. Journal of Risk and Insurance (2000)
  • M. Bensic et al. Modeling small-business credit scoring by using logistic regression, neural networks and decision trees. Intelligent Systems in Accounting, Finance and Management (2005)
  • C. Bishop. Neural Networks for Pattern Recognition (1999)
  • L. Canetta et al. Applying two-stage SOM-based clustering approaches to industrial data analysis. Production Planning & Control (2005)
  • Chang, C., & Lin, C. (2001). LIBSVM: A library for support vector machines. Software available at:...
  • D. Cook et al. Interactive and dynamic graphics for data analysis with R and GGobi (2007)
  • R. Dillon. Classifying musical performance by statistical analysis of audio cues. Journal of New Music Research (2003)
  • Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D., & Weingessel, A. (2005). E1071: Misc. functions of the department of...
  • C. Ding et al. User modeling for personalized web search with self-organizing map. Journal of the American Society for Information Science and Technology (2007)
  • R. Frota et al. Anomaly detection in mobile communication network using the self-organizing map. Journal of Intelligent and Fuzzy Systems (2007)
  • D. Gerbec et al. Allocation of the load profiles to consumers using probabilistic neural networks. IEEE Transactions on Power Systems (2005)
  • H. Ghaziri et al. Self-organizing feature maps for the vehicle routing problem with backhauls. Journal of Scheduling (2006)