I've been following Jeremy Howard's course on Practical Deep Learning with FastAI. Recently I learned about collaborative filtering. In the course we trained a movie recommender using the small version of the MovieLens dataset; in this post, we'll train on the 25 million rating version in just 5 minutes.
The interactive Jupyter notebook version is here.
What is Collaborative Filtering?
Imagine a person who likes thin, non-spicy pizzas with mild cheese, another person who likes thick, spicy pizzas with medium cheese, and a third person who is somewhere in between.
Easy to picture, right? Now imagine that we have a table consisting of the pizzas and the people's ratings. Note that the only thing we have for each pizza and each person is a number, an ID. We don't know the name, and we don't know whether a pizza is spicy or thick or thin. We just know a number! The table we've got looks like this:
Pizza | Person | Rating |
---|---|---|
5 | 12 | 4/5 |
845 | 02 | 1/5 |
32 | 124 | 0/5 |
Collaborative filtering lets us figure out which qualities each pizza has (density, spice level, cheese amount) just by looking at these numbers and ratings. That's not all: it even lets us recommend pizzas our users have never tried, with high confidence that they will like them.
How the hell does it work?
The magic suddenly makes sense when you learn the trick. It’s simple. First, we try to guess these factors for pizzas, for example:
- Pizza number 5:
density: thin (2/5)
spice: high (4.5/5)
cheese amount: plenty (3.5/5)
Then we do the same for the people:
- Person number 12:
Likes thin: 1/5
Likes Spicy: 4/5
Likes cheese: 4/5
(These numbers are not accurate, just guesses)
SGD Comes Again!
Now, based on the ratings each user gave to different pizzas, we use stochastic gradient descent (SGD) to optimize these numbers until they fit the ratings. There is one difference from my example though: I wrote down named qualities like density, spiciness, and cheese amount. In reality, we don't even know what these qualities are! The model just finds them by clustering the users and the pizzas they like. We only specify how many qualities (we call them latent factors here) we want to learn.
There might be an obvious question lingering in your mind right now: how do we know how far our predictions are from the ground truth, so that we can optimize them using SGD?
The Loss Function:
Going back to our guesses, let’s convert them to arrays:
- Pizza number 5: [2, 4.5, 3.5]
- Person number 12: [1, 4, 4]
Now if we take the dot product of these two arrays (multiply them element by element and add up the results), the higher the result, the better this pizza matches this user's taste, and vice versa. Comparing that prediction with the rating the user actually gave is our loss, the number SGD tries to shrink.
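To make this concrete, here's a minimal sketch in plain PyTorch (using the made-up pizza numbers above) of how a prediction, the loss, and one SGD step would look:

import torch

# Made-up latent factors from the pizza example (these are the numbers SGD will learn)
pizza_5   = torch.tensor([2.0, 4.5, 3.5], requires_grad=True)  # density, spice, cheese
person_12 = torch.tensor([1.0, 4.0, 4.0], requires_grad=True)  # likes thin, spicy, cheesy

actual_rating = torch.tensor(4.0)  # the 4/5 rating from the table

# Prediction: the dot product of the two factor arrays
prediction = (pizza_5 * person_12).sum()

# Loss: how far the prediction is from the rating the user actually gave
loss = (prediction - actual_rating) ** 2
loss.backward()

# One SGD step: nudge both factor arrays in the direction that reduces the loss
lr = 0.01
with torch.no_grad():
    pizza_5   -= lr * pizza_5.grad
    person_12 -= lr * person_12.grad

In the real model the raw dot product is squashed into the valid rating range (that's the y_range argument we'll pass to the learner below), and this update is repeated over millions of ratings, for every pizza and every person at once.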
After optimizing this, we can recommend users pizzas they have not tried based on their previous ratings (their taste profile). Wait, doesn’t that sound familiar? (Spotify?)
The MovieLens 25M Dataset
Now let’s apply the same thing to a real-world example. We have a dataset consisting of 25 million records, but instead of pizzas, we have movies and the ratings for each movie by many users.
We want to recommend users movies that we guess they’ll like based on their previous ratings. That’s what we’re doing in this notebook.
Loading the Dataset
import pandas as pd

path = '/kaggle/input/movielens-25m-dataset/ml-25m/ratings.csv'
df = pd.read_csv(
    path,
    names=['user', 'movie', 'rating', 'timestamp'],
    header=None,
    usecols=(0, 1, 2, 3),
    low_memory=False
).drop(0)  # row 0 is the CSV's original header row, so we drop it
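A quick sanity check on the size of the DataFrame (the print below is just for illustration):

print(f"Number of records: {len(df):,}")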
Number of records: 25,000,095
Data Preview
Our data is very similar to the pizza examples: user ID, movie ID, and the rating.
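For example, df.head() shows the first few rows with exactly those columns (plus the timestamp column, which we load but never use):

df.head()  # columns: user, movie, rating, timestamp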
Appending Movie Titles
moviesPath = '/kaggle/input/movielens-25m-dataset/ml-25m/movies.csv'
# Read movie IDs and titles, naming the columns to match our ratings DataFrame
movies = pd.read_csv(moviesPath, usecols=(0, 1), names=('movie', 'title'))
# Merge on the shared 'movie' column so every rating gets its movie title
df = df.merge(movies)
Training the Model
Now we convert the ratings to numeric values (they’re strings by default).
df['rating'] = pd.to_numeric(df['rating'], errors='coerce')
I’m not sure if it’s necessary, but I wanted to make sure we’re using GPU if available:
from fastai.collab import *  # brings in torch and the fastai callbacks used below

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
Creating the DataLoaders object with a batch size of 4096:
dls = CollabDataLoaders.from_df(df, item_name='title', bs=4096, device=device, valid_pct=0.1)
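To double-check the DataLoaders, you can preview one batch of (user, title, rating) rows:

dls.show_batch()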
Remember the pizza example?
In reality, we don’t even know what these qualities are! The model just finds them by clustering the users and the pizzas they like. We just specify how many qualities (we call them latent factors here) we want to predict.
Here it is: we want to learn 100 latent factors for each user and each movie:
learn = collab_learner(dls, n_factors=100, y_range=(0.5, 5.5)).to_fp16()
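For the curious, the default model behind collab_learner is an embedding dot product with per-user and per-movie biases (fastai calls it EmbeddingDotBias). Here's a rough sketch of the same idea in plain PyTorch; it's just to show the structure, not fastai's exact code:

import torch
from torch import nn

class DotProductBias(nn.Module):
    "Sketch of a collaborative filtering model: embeddings, dot product, biases, y_range."
    def __init__(self, n_users, n_movies, n_factors=100, y_range=(0.5, 5.5)):
        super().__init__()
        self.user_factors  = nn.Embedding(n_users, n_factors)
        self.movie_factors = nn.Embedding(n_movies, n_factors)
        self.user_bias  = nn.Embedding(n_users, 1)
        self.movie_bias = nn.Embedding(n_movies, 1)
        self.y_range = y_range

    def forward(self, x):
        users, movies = x[:, 0], x[:, 1]
        # Dot product of user and movie latent factors, just like the pizza example
        res = (self.user_factors(users) * self.movie_factors(movies)).sum(dim=1)
        # Plus a learned bias for each user and each movie
        res = res + self.user_bias(users).squeeze(-1) + self.movie_bias(movies).squeeze(-1)
        # Squash the result into the valid rating range
        lo, hi = self.y_range
        return torch.sigmoid(res) * (hi - lo) + lo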
Setting up callbacks for saving best model and early stopping:
callbacks = [
SaveModelCallback(monitor='valid_loss', fname='best'),
EarlyStoppingCallback(monitor='valid_loss', patience=2)
]
Learning Rate Finder

learn.lr_find(suggest_funcs=(slide, valley))
Suggested LRs: slide=0.0331, valley=0.0109
Training
Great, now let’s train!
learn.fit_one_cycle(2, 0.003, wd=1e-3, cbs=callbacks)
In my case, two epochs took 4 minutes and 21 seconds.
Getting Recommendations
If you're a bit familiar with machine learning, you might know what a bias is. FastAI's model learns a bias for each user and each movie, to account, for example, for users who always rate low or movies that almost everyone enjoys.
Now, if we sort our movies by their learned bias, we can find out which movies people like regardless of their taste profile (for example, Planet Earth is a documentary, but even people who don't usually like documentaries enjoy it).
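Here's a sketch of how to pull those biases out, assuming the default EmbeddingDotBias model (the movie biases live in learn.model.i_bias and the title vocabulary in dls.classes['title']):

# Learned per-movie biases: one number per title in the vocabulary
movie_bias = learn.model.i_bias.weight.squeeze()

# Highest-bias movies: liked even by users whose taste profile doesn't match them
top_idxs = movie_bias.argsort(descending=True)[:5]
print([dls.classes['title'][int(i)] for i in top_idxs])

# Lowest-bias movies: disliked even by users whose taste profile does match them
bottom_idxs = movie_bias.argsort()[:5]
print([dls.classes['title'][int(i)] for i in bottom_idxs])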
Movies with the highest bias:
- Planet Earth II (2016)
- Planet Earth (2006)
- Band of Brothers (2001)
- Shawshank Redemption, The (1994)
- Cosmos: A Spacetime Odyssey
Movies with the lowest bias:
- SuperBabies: Baby Geniuses 2 (2004)
- Glitter (2001)
- From Justin to Kelly (2003)
- Son of the Mask (2005)
- Barney’s Great Adventure (1998)
Conclusion:
It's been super fun to work on this project. I really should thank Jeremy Howard and the team behind the course and FastAI itself, who made learning this so enjoyable. I hope I've contributed something useful to you as well. Let me know your feedback in the comments section.