Alex Ndungu

CTO + Software Engineer + ML Engineer

Backend systems, machine learning retrieval, and clean product-minded engineering for teams that care about reliability.

GitHub · LinkedIn · LeetCode · alexmeta517@gmail.com

Content-Based Recommendation Engine

Movie Recommendation System

A 3-stage content-based recommendation pipeline — metadata extraction, vector representation, and cosine similarity scoring — that generates explainable suggestions with no user interaction data required.

3: Pipeline stages
Content-based: No user data needed
Explainable: Traceable recommendations

Problem statement

Recommendation systems need a meaningful content representation before similarity becomes useful, especially when user interaction data is unavailable. The challenge is in the feature engineering: what you extract and how you represent it determines recommendation quality more than the similarity algorithm.
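
As a small illustration of that point, the sketch below scores the same pair of movies twice with the same cosine similarity function and changes only which metadata fields are extracted. The two-movie catalog, the genre strings, and the helper names `featurize` and `cosine` are hypothetical, chosen for the example rather than taken from the project.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def featurize(movie, fields):
    """Join the chosen metadata fields and count whitespace-separated tokens."""
    return Counter(" ".join(movie[f] for f in fields).split())

# Hypothetical metadata records (illustrative values only).
alien = {"genres": "horror sci-fi", "director": "ridley scott"}
blade_runner = {"genres": "sci-fi thriller", "director": "ridley scott"}

# Same similarity function, different feature extraction.
genres_only = cosine(featurize(alien, ["genres"]), featurize(blade_runner, ["genres"]))
with_director = cosine(
    featurize(alien, ["genres", "director"]),
    featurize(blade_runner, ["genres", "director"]),
)
print(genres_only, with_director)
```

Adding the director field raises the score for this pair without touching the similarity code, which is the sense in which extraction choices drive the result.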

Architecture breakdown

I built a 3-stage pipeline that cleans and extracts metadata features, transforms them into comparable vector representations per title, then applies cosine similarity to identify nearest neighbors — keeping every step explicit and inspectable.

  • 3-stage pipeline: metadata cleaning + feature extraction → vector representation → cosine similarity scoring
  • Content-based approach requiring zero user interaction data, with recommendations driven purely by metadata signals
  • Explainable output: every suggestion traceable to shared feature vectors rather than opaque collaborative filtering
  • Lightweight, iteration-friendly design showing feature engineering as the core recommendation lever
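
The three stages above can be sketched with pandas and scikit-learn roughly as follows. This is a minimal sketch under assumptions, not the project's code: the toy catalog, the `genres` and `keywords` columns, the function names, and the choice of `CountVectorizer` for the vector stage are all illustrative, since the page does not say which fields or vectorizer were used.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalog; real column names and data are assumptions.
movies = pd.DataFrame({
    "title": ["Alien", "Aliens", "Notting Hill", "Love Actually"],
    "genres": ["Horror Sci-Fi", "Action Sci-Fi", "Comedy Romance", "Comedy Romance"],
    "keywords": ["spaceship monster", "spaceship monster marines",
                 "bookshop actress", "christmas london"],
})

def extract_features(df):
    """Stage 1: clean metadata and merge it into one token string per title."""
    cleaned = df[["genres", "keywords"]].fillna("").apply(lambda col: col.str.lower())
    return cleaned["genres"] + " " + cleaned["keywords"]

def vectorize(text):
    """Stage 2: turn each token string into a sparse count vector."""
    # Split on whitespace only so hyphenated tokens such as "sci-fi" stay whole.
    vectorizer = CountVectorizer(token_pattern=r"[^\s]+")
    return vectorizer.fit_transform(text), vectorizer

def recommend(title, df, matrix, top_n=2):
    """Stage 3: rank the other titles by cosine similarity to the query title."""
    idx = df["title"].tolist().index(title)
    scores = cosine_similarity(matrix[idx], matrix).ravel()
    ranked = scores.argsort()[::-1]  # highest score first
    return [(df["title"].iloc[i], round(float(scores[i]), 3))
            for i in ranked if i != idx][:top_n]

features = extract_features(movies)
matrix, vectorizer = vectorize(features)
print(recommend("Alien", movies, matrix, top_n=1))
```

Each stage returns a plain object (a Series, a sparse matrix, a list of tuples), so the intermediate results can be printed and inspected on their own.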

Tech stack explanation

Python, Pandas, scikit-learn, Feature engineering, Similarity modeling

System diagram

[ Movie Metadata ]
      |
      v
[ Feature Extraction ]
      |
      v
[ Vector Representation ]
      |
      v
[ Similarity Engine ]
      |
      v
[ Recommended Movies ]

Key challenges

A content-based recommender that processes movie metadata through 3 explicit pipeline stages: feature extraction, vector representation, and similarity scoring. The design prioritizes explainability — every suggestion is traceable to specific shared metadata signals rather than opaque collaborative filtering.

  • Demonstrated recommender system design as a 3-stage pipeline problem, not just a model-fitting exercise.
  • Built a content-based system that requires no user data, showing recommender fundamentals without collaborative filtering assumptions.
  • Made every recommendation step explicit and inspectable, prioritizing explainability over black-box accuracy.
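
One simple way to make a suggestion traceable is to list the metadata tokens that the query title and the recommended title have in common. The sketch below assumes each title's features are stored as a whitespace-separated string; the `explain` helper and the sample strings are hypothetical.

```python
def explain(features_a, features_b):
    """Return the metadata tokens two titles share, sorted for stable output."""
    return sorted(set(features_a.split()) & set(features_b.split()))

# Hypothetical feature strings for a query title and one recommended title.
query = "horror sci-fi spaceship monster"
suggestion = "action sci-fi spaceship monster marines"

shared = explain(query, suggestion)
print(shared)
```

The shared tokens are exactly the dimensions that contribute to the cosine score under a count-based representation, so the explanation and the ranking come from the same data.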

What I learned

Interpretability is a strength when explaining recommendation quality.
Feature representation is the real engine of content-based systems.
Simple recommenders can still produce strong recommendations when the pipeline is carefully designed.