Data sources
Every film record in the archive combines data from three public sources, each credited wherever its data appears.
- TMDB (The Movie Database): the backbone. Provides film titles, overviews, genres, keywords, release dates, runtimes, posters, budget/revenue, and the community-driven vote average together with its vote count. This site uses the TMDB API but is not endorsed or certified by TMDB. https://www.themoviedb.org
- IMDb Non-Commercial Datasets: audience ratings and cast/crew. We pull title.ratings, title.principals, and name.basics from the IMDb non-commercial data export. This site is non-commercial. https://developer.imdb.com/non-commercial-datasets/
- Letterboxd: indie-community rating signal (where available). Community averages reflect an audience that skews toward cinephile and indie taste. https://letterboxd.com
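The IMDb export ships as gzipped, tab-separated files (for example title.ratings.tsv.gz) with a header row and the literal string \N marking missing values. The loader below is a minimal sketch, assuming the files have already been downloaded locally; the function and constant names are hypothetical, not part of any IMDb tooling.

```python
import csv
import gzip
from typing import Iterator

IMDB_NULL = r"\N"  # IMDb's marker for a missing or null field

def load_imdb_tsv(path: str) -> Iterator[dict]:
    """Stream rows from a gzipped IMDb dataset file as dicts keyed by the header row."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: IMDb fields are unquoted, and some (e.g. the characters
        # column in title.principals) contain literal double quotes that must
        # not be treated as CSV quoting.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Replace IMDb's \N marker with Python's None; other values stay strings.
            yield {key: (None if value == IMDB_NULL else value)
                   for key, value in row.items()}
```

Values come back as strings, so numeric columns such as averageRating need an explicit float() before use. A typical call would be `ratings = {r["tconst"]: r for r in load_imdb_tsv("title.ratings.tsv.gz")}`.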
How we filter films
We start from a snapshot of 1.4 million film records. We retain rows where:
- Release year is 2015–2025 and
- Genres include at least one of Comedy, Drama, or Romance and
- Keywords include at least one of: quarantine, pandemic, covid, coronavirus, lockdown, isolation, breakup, break-up, divorce, long distance, relationship, couple, indie, roommate, apartment.
That rule set produces 347 films in the current archive.
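The three rules combine with a logical AND. A minimal sketch of that filter follows, assuming a hypothetical Film record with set-valued genres and keywords, exact (case-insensitive) keyword matches, and an inclusive year range; the snapshot's real schema and matching details may differ.

```python
from dataclasses import dataclass, field

@dataclass
class Film:
    # Hypothetical record shape; field names are illustrative assumptions.
    title: str
    release_year: int
    genres: set[str] = field(default_factory=set)
    keywords: set[str] = field(default_factory=set)

TARGET_GENRES = {"Comedy", "Drama", "Romance"}
TARGET_KEYWORDS = {
    "quarantine", "pandemic", "covid", "coronavirus", "lockdown", "isolation",
    "breakup", "break-up", "divorce", "long distance", "relationship",
    "couple", "indie", "roommate", "apartment",
}

def keep(film: Film) -> bool:
    """Return True only if the film passes all three rules."""
    in_years = 2015 <= film.release_year <= 2025  # inclusive range (assumed)
    has_genre = bool(film.genres & TARGET_GENRES)  # any one target genre
    # Assumed: exact keyword match after lowercasing, no partial matching.
    has_keyword = bool({k.lower() for k in film.keywords} & TARGET_KEYWORDS)
    return in_years and has_genre and has_keyword

def filter_films(films: list[Film]) -> list[Film]:
    return [f for f in films if keep(f)]
```

Because matching here is exact, a keyword such as "long distance relationship" would not match "long distance"; substring matching would be a different, looser rule.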
How we join data
TMDB rows include an imdb_id field, which holds the same tt-prefixed identifier that the IMDb datasets call tconst. We left-join IMDb ratings on that key, and IMDb principals (director + top-billed cast) on the same key, with names resolved from name.basics via nconst. Letterboxd ratings, where available, are joined on (title, release_year): an imperfect match that we accept for the indie-community signal it adds.
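The joins can be sketched with plain dictionaries. This is a minimal illustration, not the site's pipeline: tconst, nconst, ordering, category, and primaryName are IMDb dataset column names, while the function names, the TOP_BILLED cut-off, and the title normalization used for the Letterboxd key are hypothetical assumptions.

```python
TOP_BILLED = 3  # hypothetical cut-off for "top-billed cast"

def left_join_imdb(tmdb_rows, ratings, principals, names):
    """Attach IMDb ratings and credits to TMDB rows, keyed on imdb_id == tconst.

    ratings:    dict tconst -> rating dict (e.g. averageRating, numVotes)
    principals: iterable of dicts with tconst, ordering, nconst, category
    names:      dict nconst -> primaryName (from name.basics)
    """
    credits = {}
    # Sort by billing order so cast lists come out top-billed first.
    for p in sorted(principals, key=lambda p: int(p["ordering"])):
        credits.setdefault(p["tconst"], []).append(p)

    joined = []
    for row in tmdb_rows:
        key = row.get("imdb_id")  # may be missing in TMDB
        people = credits.get(key, [])
        joined.append({
            **row,  # left join: every TMDB row is kept
            "imdb_rating": ratings.get(key),  # None when IMDb has no match
            "directors": [names.get(p["nconst"]) for p in people
                          if p["category"] == "director"],
            "cast": [names.get(p["nconst"]) for p in people
                     if p["category"] in ("actor", "actress")][:TOP_BILLED],
        })
    return joined

def normalize_title(title: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(title.lower().split())

def join_letterboxd(rows, letterboxd):
    """letterboxd: dict (normalized title, year) -> rating (hypothetical shape)."""
    return [
        {**row, "letterboxd_rating":
            letterboxd.get((normalize_title(row["title"]), row["release_year"]))}
        for row in rows
    ]
```

Keying on a normalized (title, year) pair shows why that join is imperfect: two different films with the same title and year can collide, and a retitled release can miss its match entirely.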
How similarity is scored
Each film’s thematic match score against The End of Us is computed as:
score = 100 × (0.4 × Jaccard(genres) + 0.6 × Jaccard(keywords))
Keywords are weighted higher than genres because two films can share a genre and feel nothing alike. In practice, only the subject film scored against itself reaches 100%; real matches cluster in the 20–50% range.
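The formula translates directly into code. The sketch below treats genres and keywords as sets and assumes that the Jaccard index of two empty sets is 0 (the formula itself leaves that case open); the example sets are hypothetical, not the actual TMDB data for The End of Us.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty (assumed)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_score(genres: set, keywords: set,
                ref_genres: set, ref_keywords: set) -> float:
    """Thematic match score in [0, 100]: 40% genre overlap, 60% keyword overlap."""
    return 100 * (0.4 * jaccard(genres, ref_genres)
                  + 0.6 * jaccard(keywords, ref_keywords))

if __name__ == "__main__":
    # Hypothetical reference sets, for illustration only.
    ref_genres = {"Drama", "Romance"}
    ref_keywords = {"lockdown", "breakup", "couple", "apartment"}
    # Genres: 1 shared of 3 total -> 1/3. Keywords: 2 shared of 5 total -> 0.4.
    # Score = 100 * (0.4 * 1/3 + 0.6 * 0.4), about 37.3.
    print(round(match_score({"Drama", "Comedy"}, {"lockdown", "couple", "roommate"},
                            ref_genres, ref_keywords), 1))
```

Comparing the reference sets with themselves gives Jaccard values of 1 for both terms, and therefore a score of 100.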
How often we refresh
Film metadata is captured in a weekly sync and frozen between syncs. The End of Us (2021) is a finished work whose metadata is not expected to change, so the reference point for every match score stays stable by design. If you spot a data error, tell us.