Recommendation systems are built to generate recommendations for particular item. On ecommerce websites like Amazon, we get product recommendations and on youtube, we get video recommendations. Nowadays, recommendations systems are being used on many more content rich websites like news, movies, blogs, etc.
The dataset has 45,000 movies listed in the Full MovieLens Dataset. The dataset includes movies released on or before July 2017. It has cast, crew, plot keywords, budget, revenue, posters, release dates, languages, production companies, countries, TMDB vote counts and vote averages.
Steps to download the dataset, perform filtration, and other processing.
Install the kaggle and check as below
Upload kaggle.json file which you can get from your kaggle account
Change the permission of the file and export the username and key
Download and unzip the dataset
We will use movies_metadata.csv, keywords.csv and credits.csv file
We will first extract and filter data from movies_metadata.csv file, for that import the library and read CSV.
Below are the columns of movie_metadata.csv, from which we will consider ‘id’, ‘genres’, ‘original_language’, ‘production_countries’, ‘tagline’, ‘original_title’, ‘adult’, ‘release_date’, ‘status’
Below are the unique values of the status. Filter the ‘status’ column, we will only consider those movies whose status is ‘released’.
Let’s filter the movie ‘genres’. Below is the format of the ‘genres’. We will filter and fetch the movie genre names from the data.
After genres, now filter ‘production_countries’, Below is the format of ‘production_countries’. We will filter and fetch the production country name.
Now we will read keywords.csv and filter the keywords from it.
Next, we will read ‘credits.csv’, and filter movie cast data.
Let’s filter the director data too.
Now we will merge all three tables.
After merging the data. Replace blank with NAN and drop those fields.
Save the filtered data to CSV and read the new CSV
Now we will combine the columns on which we’ll later perform embedding.
Let’s also create an index column for movies.
Create two functions as given below
Once we get the combined data, we will use Sentence transformers for BERT Embedding. Install the library and download a pretrained model, we have used ‘bert-base-nli-mean-tokens’ pretrained model. You can provide any pre-trained model, here is the list
Perform embedding of ‘combined_value’
After calculating the encoding we will get our vectors, now we need to find similarities between those vectors. To do so we will use Cosine Similarity, which provides a similarity score between two vectors, where 0 indicates no similarity and 1 indicates complete similarity.
Now when the user enters the movie name, the system compares the entered movie score with other scores and recommends the movie to the user.
Below is a snippet of the movie recommendation. Input: Toy Story Recommendation: Toys The Lego Movie Big Hero 6
Do share your feedback on our recommendation systems.
Pragnakalp Techlabs: Your trusted partner in Python, AI, NLP, Generative AI, ML, and Automation. Our skilled experts have successfully delivered robust solutions to satisfied clients, driving innovation and success.