Recognition of Western style musical genres using machine learning techniques
Introduction
Musical genres are widely used to segment and describe titles, both by the music industry and by consumers. The segmentation of music genres is also an important aspect of many multimedia retrieval systems (Khan & Al-Khatib, 2006). Music segmentation refers to the process of breaking up an audio stream into temporal segments by applying a boundary detection criterion such as texture, note, instrument, rhythm pattern or overall structure (Dillon, 2003).
Music genre segmentation has become an increasingly important issue, as it has many applications in professional media production, automatic speech recognition (ASR) systems, audio archive management, commercial music usage, content-based browsing and audio retrieval systems (Lin & Chen, 2005). However, despite recent major breakthroughs, segmenting music genres according to overall sound similarity has remained an unsolved problem (Pampalk, Dixon, & Widmer, 2004). In this paper we address this problem using machine learning (ML) techniques. More specifically, the aim of this study is two-fold: (1) to investigate the influence of various factors on music genre segmentation; and (2) to compare the classification and clustering performance of ML against more traditional techniques such as LDA and CA within the context of music genre segmentation.
This paper is organized as follows. Section 2 surveys related research work carried out in music genres classification and clustering. Section 3 presents the methodology used to conduct the analysis. Section 4 presents the experimental results. Finally, Section 5 highlights the major research implications and limitations. Possible avenues for future research are also explored.
Related work
Several researchers have recently addressed the problem of speech/music classification. Using a sample of approximately 2.25 h of speech, 2.72 h of music and 0.62 h of mixed speech/music data representing five languages (American English, Urdu, Japanese, Spanish and Hebrew), Khan and Al-Khatib (2006) used three different classification frameworks to classify speech and music: a multi-layer perceptron (MLP) neural network, a radial basis function (RBF) neural network and a hidden Markov model (HMM). The authors found that
Methodology
Three musical genres were used in this study: classical, rock and new wave. Classical music refers to music rooted in the traditions of Western liturgical and secular music, encompassing a broad period from the 9th century to present times. The instruments used in classical music were mostly invented before the mid-19th century and codified in the 18th and 19th centuries. Classical music can take on the form of the concerto, symphony, opera, dance music, etc. (Lebrecht, 1996). The rock music is
Multi-layer perceptron neural network
MLP was first developed to mimic the functioning of the brain. It consists of interconnected nodes referred to as processing elements that receive, process, and transmit information. MLP consists of three types of layers: the first layer is known as the input layer and corresponds to the problem input variables with one node for each input variable. The second layer is known as the hidden layer and is useful in capturing non-linear relationships among variables. The final layer is known as the
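The layered structure described above can be sketched as a single forward pass in plain Python. This is only an illustrative sketch, not the network used in the study: the layer sizes, the tanh and softmax functions, the random untrained weights and the input values are all assumptions made for the example, and the third layer is taken to be the usual output layer with one node per genre, since the excerpt breaks off before describing it.

```python
import math
import random


def init_layer(n_inputs, n_nodes, rng):
    """Create one fully connected layer: a weight row and a bias per node."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_nodes)]
    biases = [0.0] * n_nodes
    return weights, biases


def layer_forward(inputs, weights, biases, activation):
    """Each node sums its weighted inputs, adds a bias, then applies the activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(features, hidden, output):
    """Input layer -> hidden layer (non-linear tanh) -> output layer (softmax)."""
    hidden_out = layer_forward(features, *hidden, activation=math.tanh)
    scores = layer_forward(hidden_out, *output, activation=lambda z: z)
    return softmax(scores)


GENRES = ["classical", "rock", "new wave"]  # the three genres named in the study

rng = random.Random(0)        # fixed seed so the sketch is repeatable
N_FEATURES, N_HIDDEN = 4, 5   # hypothetical sizes, not taken from the paper
hidden = init_layer(N_FEATURES, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, len(GENRES), rng)

features = [0.2, -1.3, 0.7, 0.05]  # made-up values, one per input node
probs = mlp_forward(features, hidden, output)
predicted = GENRES[probs.index(max(probs))]
print(dict(zip(GENRES, probs)), predicted)
```

Because the weights are random, the predicted genre here is meaningless; training would adjust the weights and biases so that the output probabilities match labelled examples. The tanh in the hidden layer is what lets the network capture non-linear relationships among the input variables.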
Implications, limitations and future research
Our results confirm the theoretical work of Hecht-Nielsen (1989), who showed that ML techniques can learn input–output relationships to the point of making perfect forecasts with the data on which the network is trained. However, perfect forecasts with the training data do not guarantee optimal forecasts with the testing data, because the two data sets differ. The superior performance of the ML techniques can be traced to their inherent non-linearity. This makes ML techniques ideal for
References (89)
Speech/music segmentation using entropy and dynamism features in a HMM classification framework. Speech Communication (2003).
Dynamic classification for video stream using support vector machine. Applied Soft Computing (2008).
Supervised pattern recognition in food analysis. Journal of Chromatography A (2007).
A roadmap for future neural networks research in auditing and risk assessment. International Journal of Accounting Information Systems (2002).
Classification of magnetic resonance brain images using wavelets as input to support vector machines and neural network. Biomedical Signal Processing and Control (2006).
Application of neural networks to an emerging market: Forecasting and trading the Taiwan Stock Index. Computers & Operations Research (2003).
A hybrid SARIMA and support vector machines in forecasting the production values of the machinery industry in Taiwan. Expert Systems with Applications (2007).
Towards fair ranking of Olympics achievements: The case of Sydney 2000. Computers & Operations Research (2006).
Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques. Expert Systems with Applications (2008).
The use of data mining and neural networks for forecasting stock market returns. Expert Systems with Applications (2005).
A probabilistic neural approach for modeling and classification of bacterial growth/no-growth data. Journal of Microbiological Methods.
Performance of KNN and SVM classifiers on full word Arabic articles. Advanced Engineering Informatics.
Selecting the right MBA schools: An application of self-organizing map networks. Expert Systems with Applications.
Multi-source land cover classification for forest management based on imaging spectrometry and LIDAR data. Forest Ecology and Management.
Studying the performance of artificial neural networks on problems related to cryptography. Nonlinear Analysis: Real World Applications.
Artificial neural networks as a tool in ecological modeling: An introduction. Ecological Modeling.
Predicting motor vehicle crashes using support vector machine models. Accident Analysis and Prevention.
Self-organizing maps could improve the classification of Spanish mutual funds. European Journal of Operational Research.
Application of probabilistic neural network in the clinical diagnosis of cancers based on clinical chemistry data. Analytica Chimica Acta.
Analysis of cognitive performance in schizophrenia patients and healthy individuals with unsupervised clustering models. Psychiatry Research.
Neural networks in business: Techniques and applications for the operations researcher. Computers & Operations Research.
Probabilistic neural networks. Neural Networks.
SOM-based data visualisation methods. Intelligent Data Analysis.
Segmentation and grading of brain tumors on apparent diffusion coefficient images using self-organizing maps. Computerized Medical Imaging and Graphics.
Algorithm architectures for patient dependent seizure detection. Clinical Neurophysiology.
Study of probabilistic neural networks to classify the active compounds in medicinal plants. Journal of Pharmaceutical and Biomedical Analysis.
Probabilistic neural networks in bankruptcy prediction. Journal of Business Research.
A comparative study for content-based dynamic spam classification using four machine learning algorithms. Knowledge-Based Systems.
Applying neural computing to target marketing. Journal of Direct Marketing.
Bankruptcy prediction of financially stressed firms: An examination of the predictive accuracy of artificial neural networks. International Journal of Intelligent Systems in Accounting, Finance & Management.
Kohonen self-organizing maps: A neural approach for studying the links between attributes and overall satisfaction in a services context. Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior.
A semi-parametric stochastic spline model as a managerial tool for potential insolvency. Journal of Risk and Insurance.
Modeling small-business credit scoring by using logistic regression, neural networks and decision trees. Intelligent Systems in Accounting, Finance and Management.
Neural Networks for Pattern Recognition.
Applying two-stage SOM-based clustering approaches to industrial data analysis. Production Planning & Control.
Interactive and dynamic graphics for data analysis with R and GGobi.
Classifying musical performance by statistical analysis of audio cues. Journal of New Music Research.
User modeling for personalized web search with self-organizing map. Journal of the American Society for Information Science and Technology.
Anomaly detection in mobile communication network using the self-organizing map. Journal of Intelligent and Fuzzy Systems.
Allocation of the load profiles to consumers using probabilistic neural networks. IEEE Transactions on Power Systems.
Self-organizing feature maps for the vehicle routing problem with backhauls. Journal of Scheduling.