sine verbis

Subjective Metadata in Music Classification:

Genre Prediction via Hierarchical Agglomerative Clustering

Automated music genre prediction has many interesting uses, including the music recommendation algorithms found in modern music applications such as iTunes' Genius Playlists and Spotify's Suggested Tracks. Most of the algorithms behind these services are proprietary and are used only by the companies that created them.

Many of these algorithms are known to use sentiment analysis on song lyrics to make recommendations, but this approach has its own costs. For one, a recommendation service must store the lyrics for the massive catalog of songs it may recommend, which creates storage constraints.

We hope to find a less data-intensive way of classifying songs into genres (the first and simplest step in grouping songs by similarity) by using song metadata that numerically encodes subjective attributes such as energy and danceability. By using these subjective attributes, we hope to recover some of the classification quality that is lost when an important part of the data set, the lyrics, is removed.

From SVM applications we borrowed the idea of using a feature mapping to make data linearly separable in a higher-dimensional space: we created a transformation function that maps variables in our data set so that their relationships take on a linear shape. This let us explore more thoroughly how the variables in our data set relate to one another and to our key attributes of danceability, valence, and energy.
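The idea of a feature mapping that turns a nonlinear relationship into a linear one can be illustrated with a small sketch. The authors' actual transformation function is not given here, so the map x → x² (`feature_map`) and the synthetic target values are hypothetical stand-ins. The sketch measures linearity with the Pearson correlation coefficient before and after the mapping.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def feature_map(x):
    """Hypothetical transformation (not the authors' function): square the raw value."""
    return x * x

# Synthetic data: the target depends on x quadratically, so it is not linear in x.
xs = [i / 2 for i in range(-6, 7)]   # -3.0, -2.5, ..., 3.0
ys = [3 * x * x + 1 for x in xs]

raw_r = pearson(xs, ys)                              # linear fit in raw space
mapped_r = pearson([feature_map(x) for x in xs], ys)  # linear fit after mapping
print(f"raw r = {raw_r:.3f}, mapped r = {mapped_r:.3f}")
# → raw r = 0.000, mapped r = 1.000
```

In the raw space the relationship is invisible to a linear measure. After the mapping it becomes exactly linear, which is the effect the transformation function aims for on real song metadata.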

We implemented kNN with and without our transformation function to predict the danceability, valence, and energy of a given song, and compared the two sets of predictions to check the correctness of our transformation function.
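A minimal sketch of this comparison, assuming kNN regression (predicting an attribute as the mean of the k nearest neighbors' values) with Euclidean distance and leave-one-out mean absolute error as the comparison measure. The toy songs, the `square` transformation, and the helper names `knn_predict` and `loo_mae` are hypothetical and only illustrate the procedure.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def identity(v):
    return v

def knn_predict(train_X, train_y, query, k, transform=identity):
    """Predict a continuous attribute as the mean of the k nearest neighbors' values."""
    q = transform(query)
    dists = sorted((euclidean(transform(x), q), y) for x, y in zip(train_X, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

def loo_mae(X, y, k, transform=identity):
    """Leave-one-out mean absolute error of kNN regression."""
    errors = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = knn_predict(train_X, train_y, X[i], k, transform)
        errors.append(abs(pred - y[i]))
    return sum(errors) / len(errors)

def square(v):
    """Hypothetical transformation applied to every feature before measuring distance."""
    return tuple(c * c for c in v)

# Made-up songs: (feature_1, feature_2) -> danceability in [0, 1].
X = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8), (0.5, 0.5)]
y = [0.2, 0.25, 0.8, 0.85, 0.5]

print("MAE without transform:", loo_mae(X, y, k=2))
print("MAE with transform:   ", loo_mae(X, y, k=2, transform=square))
```

Passing the transformation as a parameter keeps the two runs identical except for the feature space, so any difference in error can be attributed to the transformation.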

Our more formal definitions of danceability, energy, and valence follow from these explorations, in which we used kNN and our transformation function to predict energy, valence, and danceability, guided by our understanding of how these attributes correlate with the rest of a song's metadata.

Definitions:

  • Danceability: The ease with which a listener can dance to the song using a modern dance style. Danceable songs typically have a consistent, upbeat rhythm and high energy.
  • Valence: Low valence corresponds to sadness/negativity and high valence corresponds to happiness/positivity.
  • Energy: How stimulating a song is.

Our HAC implementation in MATLAB clustered the song data set using the single-link (minimum distance), complete-link (maximum distance), and average-link (average distance) methods. The distance metric was Euclidean distance, computed on each song's transformed values and valence.
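The three linkage rules differ only in how they turn pairwise point distances into a distance between clusters. The following is a naive Python sketch of agglomerative clustering, not the authors' MATLAB code. It starts from singleton clusters and repeatedly merges the closest pair until `n_clusters` remain. The 2-D points stand in for hypothetical (transformed value, valence) pairs. MATLAB's `linkage` function, for comparison, accepts 'single', 'complete', and 'average' as method names.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def linkage_distance(c1, c2, points, method):
    """Distance between two clusters (lists of point indices) under a linkage rule."""
    d = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if method == "single":
        return min(d)              # closest cross-cluster pair
    if method == "complete":
        return max(d)              # farthest cross-cluster pair
    if method == "average":
        return sum(d) / len(d)     # mean over all cross-cluster pairs
    raise ValueError(f"unknown linkage method: {method}")

def hac(points, n_clusters, method="single"):
    """Naive O(n^3) agglomerative clustering: merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage_distance(clusters[ab[0]], clusters[ab[1]], points, method),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]            # combinations yields a < b, so index a stays valid
    return clusters

# Hypothetical (transformed value, valence) pairs for four songs.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(hac(points, 2, "average"))
# → [[0, 1], [2, 3]]
```

On well-separated data like this, all three methods agree. On real data, single-link tends to form long chains, while complete-link favors compact clusters.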

A sample cluster is shown to the left. About 60% of the songs in this cluster belong to genres that are similar to one another.

Veena Calambur

La Vesha Parker

Daniel Hanggi

Javier Ortiz