
Music Recommendation System

Preview

Build a music recommendation system based on audio features of 170,000 digital songs using Association Rule and Clustering in Python.


Introduction

Large companies such as Spotify and Apple Music have already adopted machine learning in their services, including AI-based song recommendation. A good recommendation algorithm can increase user satisfaction and engagement with a music streaming service and help the service stay profitable against its competitors.

Objectives

We will look at what a music recommendation system is, explore machine learning models that can be used to build one, and answer the following business and technical questions.

  1. Which model can be applied to create the music recommendation system?
  2. How do the models recommend songs to users?

Process

  1. Exploratory Data Analysis (EDA)
    • Explore the given data set to understand its basic structure, the meaning of its values, and the importance of each feature for building a music recommendation system.
  2. Modeling
    • Build music recommendation models from the audio features, based on Association Rules and Clustering.
    • The modeling part is implemented in the following steps.
    1. Data Preprocessing
      • Association Rule: Discretization, One-Hot Encoding
      • Clustering: Normalization, Dimensionality Reduction using PCA and t-SNE
    2. Data Modeling
      • Association Rule: apriori(), association_rules()
      • Clustering: K-means clustering, DBSCAN clustering, Hierarchical (ward linkage) and Agglomerative clustering, and Spectral clustering
    3. Model Evaluation
      • Association Rule: Finding the best metric among the following: Confidence, Lift, and Support
      • Clustering: Calculating Avg. Silhouette Scores and Calinski-Harabasz Scores
  3. Preliminary Analysis
    • Draw preliminary conclusions from the research above, implement the application (the recommendation systems), and suggest improvements for further study.
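
To make the Association Rule path above concrete, here is a minimal, dependency-free sketch. It discretizes two audio features into low/medium/high, treats each song as a set of items (equivalent to the one-hot columns that equal 1), and computes support, confidence, and lift for one rule. The toy songs, the equal-width bin edges, and the helper names (`discretize`, `support`, `rule_metrics`) are assumptions for illustration only; the post relies on `apriori()` and `association_rules()` and does not state how the three categories were cut.

```python
def discretize(value, lo, hi):
    """Map a numeric value to 'low' / 'medium' / 'high' using three
    equal-width bins over [lo, hi] (the binning rule is an assumption)."""
    width = (hi - lo) / 3
    if value < lo + width:
        return "low"
    if value < lo + 2 * width:
        return "medium"
    return "high"


# Hypothetical toy songs with features already in [0, 1].
songs = [
    {"energy": 0.90, "valence": 0.8},
    {"energy": 0.80, "valence": 0.9},
    {"energy": 0.20, "valence": 0.1},
    {"energy": 0.85, "valence": 0.3},
]

# One "transaction" per song: the set of items such as "energy=high".
# This carries the same information as a one-hot encoded row.
transactions = [
    {f"{name}={discretize(value, 0.0, 1.0)}" for name, value in song.items()}
    for song in songs
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    rule_support = support(antecedent | consequent)
    confidence = rule_support / support(antecedent)
    lift = confidence / support(consequent)
    return rule_support, confidence, lift


s, conf, lift = rule_metrics({"energy=high"}, {"valence=high"})
print(f"support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
# prints: support=0.50 confidence=0.67 lift=1.33
```

A lift above 1 means the two items appear together more often than they would if they were independent, which is why lift is often read alongside support and confidence rather than in isolation.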

Conclusions

  1. Exploratory Data Analysis (EDA)
    • Handle Outliers: The numeric values in the audio features are not on the same scale. To get better model output, we address this in the data preprocessing part by removing unnecessary columns or applying normalization.
    • Explore the Values: There are 34,088 artists in the data set. The songs were released from 1921 to 2020, most of them between the 1950s and the 2010s.
  2. Modeling
    1. Data Preprocessing
      • Association Rule: Used all the audio features, converted the numeric values into three categories (high, medium, and low), and then applied one-hot encoding.
      • Clustering: Based on the results of EDA, selected the following features: valence, energy, instrumentalness, key, liveness, speechiness, and year_group. Normalized these values with MinMaxScaler() and conducted dimensionality reduction using PCA and t-SNE.
        • PCA: image
        • t-SNE: image
    2. Data Modeling
      • Association Rule: Calculated support, confidence, and lift values.
      • Clustering: Applied K-means clustering, DBSCAN clustering, Hierarchical (ward linkage) and Agglomerative clustering, and Spectral clustering.

    3. Model Evaluation
      • Association Rule: Compared the results produced by the following metrics.
        • lift, confidence, support, and lift + confidence + support combined
        • The best metric: support
      • Clustering: Calculated Avg. Silhouette Scores and Calinski-Harabasz Scores.
        • Based on the results, K-Means is the best model for building the clustering model.

            - Avg. Silhouette Score (0.503): On a scale from -1 to 1, a data point in the K-means clusters is well matched to its own cluster and poorly matched to neighboring clusters.
            - Calinski-Harabasz Score (74711.500): A higher score indicates denser, better-separated clusters, and this result compares favorably with the other models.
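
For intuition about what an average silhouette score measures, the following is a small hand-rolled sketch rather than the code used in this post. For each point it compares `a`, the mean distance to the other members of its own cluster, with `b`, the mean distance to the nearest other cluster, and averages `(b - a) / max(a, b)` over all points. The toy points, labels, and the function name `average_silhouette` are assumptions for illustration; in practice scikit-learn's `silhouette_score` computes the same quantity.

```python
import math


def average_silhouette(points, labels):
    """Average silhouette coefficient for points with cluster labels.
    Assumes at least two clusters; singleton clusters score 0 by convention."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the *other* members of the point's own cluster
        # (the point's zero distance to itself is in the sum, so divide by n - 1).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two tight, well-separated groups (hypothetical data).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = average_silhouette(points, [0, 0, 1, 1])  # sensible clustering
bad = average_silhouette(points, [0, 1, 0, 1])   # clusters cut across the groups
print(round(good, 3), round(bad, 3))
```

The sensible labeling scores close to 1 and the mismatched labeling scores below 0, which is the contrast the metric is designed to expose when comparing clustering models.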
          
  3. Preliminary Analysis
    • The music recommendation system based on K-means clustering produced relevant results.
      • Association Rule
        image
      • Clustering
        image
    • 1. Which model can be applied to create the music recommendation system?
      • The K-Means clustering model can be applied to build an optimized music recommendation system based on the audio features of digital songs.
    • 2. How do the models recommend songs to users?
      • It partitions the data set, which contains the audio features (valence, energy, instrumentalness, key, liveness, speechiness, and year_group), into 5 clusters. Based on these clusters, the music recommendation system finds the best-matching cluster for each new input.
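
The cluster lookup described above can be sketched as a nearest-centroid search. This is a simplified illustration, not the post's implementation: the centroids, the two-feature space, the three clusters, the song titles, and the function names (`nearest_cluster`, `recommend`) are all hypothetical, whereas the post uses seven features and 5 clusters. With scikit-learn, the `predict()` method of a fitted `KMeans` model performs the same nearest-centroid assignment.

```python
import math

# Hypothetical centroids in a normalized (valence, energy) space.
centroids = {
    0: (0.2, 0.2),  # low valence, low energy
    1: (0.8, 0.8),  # high valence, high energy
    2: (0.2, 0.8),  # low valence, high energy
}

# Hypothetical catalog mapping each song title to its cluster label.
catalog = {"Song A": 0, "Song B": 1, "Song C": 1, "Song D": 2}


def nearest_cluster(features):
    """Return the label of the centroid closest to `features`
    (Euclidean distance), i.e. the best cluster for a new input."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def recommend(features, limit=5):
    """Recommend up to `limit` catalog songs from the input's nearest cluster."""
    cluster = nearest_cluster(features)
    return [title for title, label in catalog.items() if label == cluster][:limit]


# A new song with high valence and fairly high energy, already normalized.
print(recommend((0.9, 0.7)))
# prints: ['Song B', 'Song C']
```

New inputs must be normalized with the same scaling that was fitted on the training data before the distance comparison, otherwise features on larger scales dominate the result.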
This post is licensed under CC BY 4.0 by the author.