May 25, 2021

Recommendation systems are built to suggest items that are relevant to a particular user or item. On e-commerce websites like Amazon we get product recommendations, and on YouTube we get video recommendations. Nowadays, recommendation systems are used on many more content-rich websites, such as news, movie, and blog sites.

Here is our own attempt at building a Natural Language Processing (NLP) based movie recommendation system using BERT. You can also refer to or copy our Colab file to follow the steps.

This recommender system recommends movies based on several movie features, not just the description. We have considered

  • genres
  • original_language
  • production_countries
  • tagline
  • original_title
  • keywords
  • cast
  • director
  • adult
  • release_date
  • status

from the dataset as our features. The system measures the similarity between movies based on these features.

We have used “The Movies Dataset”, available on Kaggle.

The dataset covers the 45,000 movies listed in the Full MovieLens Dataset, all released on or before July 2017. It includes cast, crew, plot keywords, budget, revenue, posters, release dates, languages, production companies, countries, TMDB vote counts, and vote averages.

Below are the steps to download the dataset, filter it, and process it.

Install the kaggle package and check that the installation works.

Upload the kaggle.json file, which you can download from your Kaggle account.

Change the permissions of the file and export the username and key.

Download and unzip the dataset.
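The credential steps above can also be done from Python inside the notebook. Here is a minimal sketch, assuming kaggle.json has the usual `username` and `key` fields. The function name `load_kaggle_credentials` is our own, and the dataset slug in the closing comment is an assumption rather than something stated in this post.

```python
import json
import os
import stat


def load_kaggle_credentials(path: str = "kaggle.json") -> str:
    """Restrict kaggle.json to its owner and export its credentials."""
    # Owner read/write only, so the API token is not readable by other users.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    with open(path, encoding="utf-8") as fh:
        creds = json.load(fh)
    # The Kaggle client picks up these two environment variables.
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
    return creds["username"]


# In a notebook cell (dataset slug assumed, not taken from this post):
#   !pip install kaggle
#   !kaggle datasets download -d rounakbanik/the-movies-dataset --unzip
```

Call `load_kaggle_credentials()` once after uploading the file and before running the download command.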

We will use the movies_metadata.csv, keywords.csv, and credits.csv files.

We will first extract and filter data from movies_metadata.csv. To do that, import the required library and read the CSV.

From the columns of movies_metadata.csv, we will consider ‘id’, ‘genres’, ‘original_language’, ‘production_countries’, ‘tagline’, ‘original_title’, ‘adult’, ‘release_date’, and ‘status’.

The ‘status’ column takes values such as ‘Released’, ‘Rumored’, and ‘Post Production’. We filter on this column and keep only the movies whose status is ‘Released’.
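The column selection and status filter could look roughly like this. It is a sketch on a toy frame standing in for `pd.read_csv("movies_metadata.csv")`. The helper name `keep_released` is our own, and the comparison is case-insensitive so it works whether the value is stored as "Released" or "released".

```python
import pandas as pd

METADATA_COLUMNS = [
    "id", "genres", "original_language", "production_countries",
    "tagline", "original_title", "adult", "release_date", "status",
]


def keep_released(movies: pd.DataFrame) -> pd.DataFrame:
    """Keep the selected columns and only the rows whose status is 'released'."""
    selected = movies[METADATA_COLUMNS]
    # Missing statuses become "" and are therefore filtered out.
    is_released = selected["status"].fillna("").str.lower() == "released"
    return selected[is_released].reset_index(drop=True)


# Toy data (made up) in place of the real CSV.
toy = pd.DataFrame({col: ["x", "y", "z"] for col in METADATA_COLUMNS})
toy["status"] = ["Released", "Rumored", None]
toy["budget"] = [1, 2, 3]  # a column we do not use
released = keep_released(toy)
print(released["status"].tolist())
```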

Next, let’s process the movie ‘genres’. Each cell is a stringified list of dictionaries, for example [{'id': 16, 'name': 'Animation'}]. We will extract just the genre names from it.

After genres, we process ‘production_countries’. Each cell is again a stringified list of dictionaries, for example [{'iso_3166_1': 'US', 'name': 'United States of America'}]. We will extract just the production country names.

Now we will read keywords.csv and extract the keywords from it.
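Extracting names from ‘genres’ and ‘production_countries’ comes down to parsing a stringified list of dictionaries. Here is a small sketch with a hypothetical helper, `extract_names`. We assume the ‘keywords’ column of keywords.csv follows the same layout, so the same helper would apply there.

```python
import ast


def extract_names(raw) -> list:
    """Return the 'name' of every dict in a stringified list of dicts."""
    if not isinstance(raw, str):
        return []  # NaN or other missing cell
    try:
        items = ast.literal_eval(raw)  # safe parsing of Python literals only
    except (ValueError, SyntaxError):
        return []  # malformed cell
    if not isinstance(items, list):
        return []
    return [item["name"] for item in items
            if isinstance(item, dict) and "name" in item]


print(extract_names("[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"))
```

With pandas this could be applied per column, for example `df["genres"].apply(extract_names)`.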

Next, we will read credits.csv and extract the movie cast data.

Let’s extract the director data too.
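One way to pull cast and director names out of credits.csv is sketched below. It assumes the ‘cast’ and ‘crew’ cells are stringified lists of dictionaries with a ‘name’ key, and that crew entries carry a ‘job’ key. The helper names and the default of three cast members are our own choices for illustration.

```python
import ast


def _parse(raw) -> list:
    """Parse a stringified list of dicts; return [] for missing or bad cells."""
    if not isinstance(raw, str):
        return []
    try:
        items = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return []
    return items if isinstance(items, list) else []


def top_cast(raw_cast, limit: int = 3) -> list:
    """Names of the first `limit` cast members, in the order they are listed."""
    return [p["name"] for p in _parse(raw_cast)[:limit] if "name" in p]


def director(raw_crew):
    """Name of the first crew member whose job is 'Director', else None."""
    for person in _parse(raw_crew):
        if person.get("job") == "Director":
            return person.get("name")
    return None


# Trimmed-down example cells (real cells have more keys per entry).
cast_cell = "[{'name': 'Tom Hanks', 'order': 0}, {'name': 'Tim Allen', 'order': 1}]"
crew_cell = "[{'job': 'Screenplay', 'name': 'Joss Whedon'}, {'job': 'Director', 'name': 'John Lasseter'}]"
print(top_cast(cast_cell), director(crew_cell))
```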

Now we will merge all three tables.
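The merge can be sketched with pandas as an inner join on the movie ‘id’. The helper `merge_on_id` is hypothetical. It coerces ‘id’ to integers first, on the assumption that movies_metadata.csv stores ids as text and may contain a few malformed values.

```python
import pandas as pd


def merge_on_id(movies: pd.DataFrame, keywords: pd.DataFrame,
                credits: pd.DataFrame) -> pd.DataFrame:
    """Inner-join the three tables on the movie id."""
    frames = []
    for frame in (movies, keywords, credits):
        frame = frame.copy()
        # Non-numeric ids become NaN and are dropped, so all keys share one dtype.
        frame["id"] = pd.to_numeric(frame["id"], errors="coerce")
        frame = frame.dropna(subset=["id"]).astype({"id": "int64"})
        # Keep one row per movie so the join does not multiply rows.
        frames.append(frame.drop_duplicates(subset="id"))
    return frames[0].merge(frames[1], on="id").merge(frames[2], on="id")
```

An inner join keeps only the movies present in all three files, which fits the later step of dropping incomplete rows.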

After merging the data, replace blank values with NaN and drop them.
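The cleaning step could look like the sketch below. We assume that dropping means removing every row that has at least one missing value. The helper name `drop_incomplete` is our own.

```python
import numpy as np
import pandas as pd


def drop_incomplete(df: pd.DataFrame) -> pd.DataFrame:
    """Turn empty or whitespace-only strings into NaN, then drop rows with NaN."""
    cleaned = df.replace(r"^\s*$", np.nan, regex=True)
    return cleaned.dropna().reset_index(drop=True)
```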

Save the filtered data to a CSV file and read the new file back.

Now we will combine the columns that we will embed later into a single column.

Let’s also create an index column for movies.

Create two functions as given below
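The three steps above (combined column, index column, and two helpers) might look like the sketch below. The helpers shown map between a movie title and its index. That is our guess at what the two functions do, and all names here are hypothetical.

```python
import pandas as pd

FEATURES = [
    "genres", "original_language", "production_countries", "tagline",
    "original_title", "keywords", "cast", "director", "adult",
    "release_date", "status",
]


def add_combined_value(df: pd.DataFrame, features=FEATURES) -> pd.DataFrame:
    """Add a 'combined_value' text column and a running 'index' column."""
    out = df.copy()
    # Join the chosen feature columns into one space-separated string per row.
    out["combined_value"] = out[list(features)].astype(str).agg(" ".join, axis=1)
    out["index"] = range(len(out))
    return out


def title_from_index(df: pd.DataFrame, index: int) -> str:
    """Look up a movie title by its index value."""
    return df.loc[df["index"] == index, "original_title"].iloc[0]


def index_from_title(df: pd.DataFrame, title: str) -> int:
    """Look up a movie's index by title (raises IndexError if the title is unknown)."""
    return int(df.loc[df["original_title"] == title, "index"].iloc[0])
```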

Once we have the combined data, we will use Sentence Transformers to compute BERT embeddings. Install the library and download a pretrained model; we have used the ‘bert-base-nli-mean-tokens’ model. You can use any other pretrained model from the list in the Sentence Transformers documentation.

Compute the embeddings of the ‘combined_value’ column.
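A sketch of the embedding step with the sentence-transformers package (`pip install sentence-transformers`) follows. The function names are our own. The model is loaded lazily because the first call downloads the pretrained weights.

```python
import numpy as np


def load_bert_encoder(model_name: str = "bert-base-nli-mean-tokens"):
    """Load a pretrained Sentence Transformers model (downloads on first use)."""
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_name)


def embed_combined_values(texts, encoder) -> np.ndarray:
    """Encode each combined feature string into one fixed-size vector.

    `encoder` is any object with an `encode(list_of_str)` method, such as
    the model returned by `load_bert_encoder()`.
    """
    return np.asarray(encoder.encode(list(texts)))
```

In the notebook this would be called as `embed_combined_values(df["combined_value"], load_bert_encoder())`, which gives one row of the result per movie.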

After encoding, we have one vector per movie. Next we need to find the similarity between those vectors. To do so we use cosine similarity, which scores a pair of vectors between -1 and 1: 1 means the vectors point in the same direction (complete similarity), 0 means they are unrelated, and negative values mean they point in opposite directions.
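Cosine similarity between vectors a and b is their dot product divided by the product of their lengths. A minimal NumPy sketch of the full pairwise matrix is shown below on toy 2-D vectors, not real embeddings. The function name is ours, and scikit-learn's `cosine_similarity` could be used instead.

```python
import numpy as np


def cosine_similarity_matrix(vectors) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Scale every row to unit length; the clip avoids division by zero.
    unit = vectors / np.clip(norms, 1e-12, None)
    # The dot product of unit vectors is the cosine of the angle between them.
    return unit @ unit.T


toy_vectors = [[1, 0], [0, 1], [1, 1], [-1, 0]]
similarity = cosine_similarity_matrix(toy_vectors)
print(np.round(similarity, 3))
```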

Now, when the user enters a movie name, the system compares that movie’s similarity scores against all other movies and recommends the highest-scoring ones.
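The lookup itself can be sketched as a sort over one row of the similarity matrix. The function name, the placeholder titles, and the scores below are made up for illustration. They are not results from our model.

```python
import numpy as np


def recommend(title: str, titles: list, similarity, top_k: int = 3) -> list:
    """Return the `top_k` titles most similar to `title`, excluding itself."""
    query = titles.index(title)  # raises ValueError for an unknown title
    scores = np.asarray(similarity, dtype=float)[query]
    # Sort indices from highest to lowest similarity.
    order = np.argsort(-scores, kind="stable")
    return [titles[i] for i in order if i != query][:top_k]


# Placeholder titles and made-up similarity scores.
toy_titles = ["A", "B", "C", "D"]
toy_similarity = [
    [1.0, 0.9, 0.1, 0.5],
    [0.9, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
]
print(recommend("A", toy_titles, toy_similarity, top_k=2))
```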

Below is a sample recommendation.
Input: Toy Story
Recommendations: Toys, The Lego Movie, Big Hero 6

Do share your feedback on our recommendation systems.

Do contact us if you are looking for a recommendation system for your own use case. We offer reliable Natural Language Processing based solutions and Chatbot Development services.
