CASE STUDY

Film Mood Concierge

Designing an Emotion-Aware Retrieval System for the Present Moment

We’ve taught machines to predict what we’ll click — but not how we feel when we arrive.


You open a streaming app knowing what you want in spirit — something witty, atmospheric, escapist, tense, romantic, or quietly absorbing — and still find yourself scrolling. Not because you lack taste, but because today’s systems were never designed to accept emotional intent as input.


The Film Mood Concierge began as an experiment in a different kind of personalization: one that treats mood as a faster, more precise interface than genre, history, or algorithmic guesswork.


Instead of asking users to browse categories or decode recommendations, it invites them to describe how they want to feel — and responds with films that match.


This project explores what happens when emotion becomes a design primitive rather than a side effect.





Systems That Understand Attention — Not Intent


Modern streaming platforms excel at scale and prediction. Netflix, HBO, and others are deeply optimized around engagement — but surprisingly blunt when it comes to intent.


There is no way to say:

— “I want something sharp and dialogue-driven.”

— “I’m in the mood for something dreamy and transportive.”

— “I want tension, but not violence.”


Instead, users navigate proxies: genres, trending rows, algorithmic blends of past behavior. These tools assume that what you watched before accurately represents what you want now.


The result isn’t just emotional mismatch — it’s friction.


Across conversations with Gen Z and Millennial users, one desire came up repeatedly: “Why can’t I just tell it the vibe?”


The Film Mood Concierge is not about slowing people down. It’s about meeting them faster and seeing them more clearly — with fewer clicks, less scanning, and greater precision.


While this prototype lives in cinema, the underlying model applies anywhere intent is emotional, contextual, and illegible to the systems meant to serve it.






Goal

Build a cinematic MVP that translates emotional language into highly aligned film results — reducing discovery friction while preserving taste and intentionality.


Success Criteria

— A fully functional end-to-end prototype (input → results)

— Emotional resonance over optimization

— Trust through restraint

— Credible interpretation of emotional intent without over-explanation


Hypotheses

— Users can express emotional intent more precisely than genre or historical preference.

— Semantic similarity alone is insufficient for emotional alignment; emotional context must be modeled explicitly.

— Fewer, intentionally chosen results increase trust more than exhaustive recall.

— Editorial framing improves confidence by signaling judgment rather than optimization.




















Emotion as Signal, Not Sentiment


Emotion is often framed as soft or subjective. In practice, it’s one of the richest sources of intent.


Neuroscience shows that emotion directs decision-making. Without it, people struggle to choose — even when information is abundant. In that sense, emotion isn’t opposed to intelligence; it enables it.


The Film Mood Concierge treats emotional language as high-value input. The system doesn’t guess. It listens, interprets, and responds — without explaining itself or demanding engagement.





From Semantic Retrieval to Curated Emotional Judgment


The Film Mood Concierge architecture evolved through multiple phases as real usage exposed the limits of simpler approaches. Each iteration surfaced a different failure mode — and clarified what emotional alignment actually requires in practice.


Rather than collapsing these learnings into a single “final system,” this project intentionally documents the progression. The final architecture reflects earned judgment, not theoretical correctness.


Phase 1 — Semantic Retrieval (Embeddings-Only)


The initial system relied on semantic retrieval. Each film was embedded using a combination of tags and synopsis text, and user queries were converted into embeddings to retrieve nearest neighbors via vector search.
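The Phase 1 pipeline can be sketched in a few lines. This is a minimal illustration, not the production system: `embed` stands in for whatever sentence-embedding model was actually used, and the titles, tags, and vector size are invented for the example.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real sentence-embedding model (hypothetical)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

# Index: one vector per film, built from tags + synopsis text.
films = {
    "Before Sunrise": "romance walking talking dreamy intimate",
    "Nightcrawler":   "crime obsession night los angeles unsettling",
}
index = {title: embed(desc) for title, desc in films.items()}

def retrieve(query: str, k: int = 5) -> list[str]:
    """Embed the query and return the k nearest films by cosine similarity."""
    q = embed(query)
    return sorted(index, key=lambda t: float(q @ index[t]), reverse=True)[:k]

print(retrieve("wistful rainy day vibes", k=2))
```

Because nothing in this loop models how a film feels to watch, a semantically adjacent but emotionally wrong title can rank highly, which is exactly the failure the audit surfaced.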


This approach worked well for broad semantic relevance. Queries like “wistful rainy day vibes” reliably surfaced films clustered around romance, introspection, or atmosphere.


However, a critical flaw emerged during early quality audits: semantic similarity does not guarantee emotional correctness. Films such as Nightcrawler could surface for “happy” queries — not because the system was broken, but because it understood meaning, not human intent.


Insight: Understanding what a film is about is not the same as understanding how it feels to watch — or when it should be offered.


Phase 2 — Emotional Metadata + Mood Profiles


To address this gap, the catalog was augmented with explicit emotional metadata. Each film was tagged with a small set of properties describing the viewing experience itself — including intensity, primary emotional tone, aftereffect, and experiential qualities.


In parallel, the system introduced 12 canonical mood profiles, mapping user language into structured emotional intent. Lightweight rule-based routing ensured that films violating obvious emotional boundaries were excluded.
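The guardrail idea can be sketched as metadata plus hard constraints per mood. The field names, thresholds, and example films below are illustrative assumptions, not the actual schema or the actual 12 mood profiles.

```python
from dataclasses import dataclass

@dataclass
class EmotionalProfile:
    # Metadata describing the viewing experience, not the plot.
    intensity: int     # 1 (gentle) .. 5 (overwhelming)
    primary_tone: str  # e.g. "warm", "bleak", "tense"
    aftereffect: str   # e.g. "soothed", "unsettled", "energized"

# Two of the canonical mood profiles: each maps a user-facing mood
# to hard constraints a film must satisfy to be offered at all.
MOOD_RULES = {
    "calm":    {"max_intensity": 2, "banned_aftereffects": {"unsettled"}},
    "hopeful": {"max_intensity": 4, "banned_aftereffects": {"unsettled", "hollow"}},
}

def passes_guardrails(mood: str, film: EmotionalProfile) -> bool:
    """Rule-based routing: exclude films that violate the mood's boundaries."""
    rules = MOOD_RULES[mood]
    if film.intensity > rules["max_intensity"]:
        return False
    return film.aftereffect not in rules["banned_aftereffects"]

nightcrawler = EmotionalProfile(intensity=4, primary_tone="tense", aftereffect="unsettled")
totoro = EmotionalProfile(intensity=1, primary_tone="warm", aftereffect="soothed")

assert not passes_guardrails("calm", nightcrawler)
assert passes_guardrails("calm", totoro)
```

The point of the sketch is the shape of the mechanism: boundaries are enforced deterministically, before any similarity search runs.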


This dramatically improved emotional safety and correctness. “Hopeful” no longer surfaced destabilizing films; “calm” respected low intensity and gentle aftereffects.


But a new issue surfaced: the system became too safe. Closely related moods (e.g., calm vs. relaxed) produced nearly identical results. The experience felt flattened — technically correct, but editorially dull.


Insight: Emotional correctness without discrimination produces predictability. The product began to resemble a lookup table rather than a curator.


Phase 3 — Deterministic Scoring, Heuristics, and Role Differentiation


The next iteration introduced structured differentiation within emotionally valid candidates.


Films were grouped into three conceptual roles:


— Anchor — the most canonical emotional fit

— Adjacent — same mood, different texture

— Edge — bolder, but still emotionally correct


A deterministic scoring layer ranked candidates based on alignment with emotional metadata, while explicit heuristics (phrase matching, token cues, and boolean signals) resolved intent before embeddings were reintroduced.


Unlike Phase 1, embeddings now mapped back to the constrained mood system — dramatically reducing failure paths while preserving nuance.
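One plausible shape for the deterministic scoring and role split is sketched below. The weights, fields, and candidate films are assumptions for illustration; the real scoring layer presumably uses more signals.

```python
def score(film: dict, target: dict) -> float:
    """Deterministic alignment score against the mood's target metadata."""
    s = 2.0 if film["primary_tone"] == target["primary_tone"] else 0.0
    s -= abs(film["intensity"] - target["intensity"]) * 0.5
    s += 1.0 if film["aftereffect"] in target["welcome_aftereffects"] else 0.0
    return s

def assign_roles(candidates: list[dict], target: dict) -> dict:
    """Rank emotionally valid candidates, then split them into roles."""
    ranked = sorted(candidates, key=lambda f: score(f, target), reverse=True)
    return {
        "anchor":   ranked[0],    # most canonical emotional fit
        "adjacent": ranked[1:3],  # same mood, different texture
        "edge":     ranked[3:5],  # bolder, still emotionally correct
    }

target = {"primary_tone": "warm", "intensity": 2, "welcome_aftereffects": {"soothed"}}
candidates = [
    {"title": "A", "primary_tone": "warm",  "intensity": 2, "aftereffect": "soothed"},
    {"title": "B", "primary_tone": "warm",  "intensity": 3, "aftereffect": "soothed"},
    {"title": "C", "primary_tone": "warm",  "intensity": 1, "aftereffect": "energized"},
    {"title": "D", "primary_tone": "tense", "intensity": 2, "aftereffect": "soothed"},
    {"title": "E", "primary_tone": "bleak", "intensity": 4, "aftereffect": "hollow"},
]
roles = assign_roles(candidates, target)
```

Because the scoring is deterministic, the same query always produces the same ranking, which is precisely why the repetition problem described next appears at scale.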


This phase delivered the most emotionally accurate results to date. However, scale revealed another limitation: repetition. With a curated catalog of ~250 films, the system consistently surfaced the same “dominant” matches. Despite dozens of valid candidates, discovery collapsed into gravity wells.


Insight: Accuracy alone creates stagnation. Discovery systems require controlled entropy.


Phase 4 — Curated Judgment via a Constrained LLM Reranker


The final architectural layer introduces a constrained LLM judge — not as a generator, but as a selector.


The judge operates only on a pre-filtered, emotionally valid candidate set. It cannot invent films, override guardrails, or rely on popularity signals. Its role is to exercise judgment where deterministic logic plateaus: distinguishing between near-equals, assigning roles (Anchor / Adjacent / Edge), and introducing freshness without breaking trust.
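The constraint is enforceable mechanically, not just by prompting. A minimal sketch of the wrapper, assuming `llm_pick` is whatever prompted-model call returns an ordered list of titles:

```python
def judge_selection(candidates: list[str], llm_pick) -> list[str]:
    """
    Constrained reranking: the LLM acts as a selector over a pre-filtered,
    emotionally valid candidate set. Any pick outside that set is rejected,
    and the pipeline falls back to the deterministic order.
    """
    allowed = set(candidates)
    try:
        picks = llm_pick(candidates)
    except Exception:
        return candidates[:3]
    validated = [p for p in picks if p in allowed]
    # If the model hallucinated or drifted, trust the deterministic ranking.
    return validated[:3] if len(validated) >= 3 else candidates[:3]

# A judge that invents a film gets overruled:
hallucinating = lambda c: ["Some Invented Film", c[1], c[0]]
print(judge_selection(["A", "B", "C", "D"], hallucinating))
```

The fallback path is what makes the judge safe to deploy: the worst case is the Phase 3 behavior, never an invented or out-of-bounds result.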


This hybrid approach avoids the known failure modes of LLM-only systems — hallucination, cultural averaging, and tonal drift — while solving the rigidity of purely deterministic pipelines.


Key distinction:

The LLM does not decide what is right.

It decides which right answer to serve.


Final System Behavior


The resulting architecture combines:

— Deterministic guardrails for emotional safety

— Lightweight scoring for stability and speed

— A judgment layer for nuance, variety, and editorial texture


Together, these layers transform retrieval into curation. The system behaves less like a recommendation engine — and more like a human concierge with rules, memory, and restraint.








































When the Tool Stops Serving the Product


I initially assumed Framer could support both UI and logic, based on its success with a prior MVP. That assumption introduced friction quickly.


After multiple failed attempts with CMS filtering and unstable overrides, it became clear the tool was constraining the product’s core interaction. I pivoted to a headless architecture: Framer as presentation, Supabase as logic and data.


The shift restored momentum and preserved design quality. The lesson was straightforward: strong products require strong boundaries between form and function.





Designing for Momentum Over Purity


To validate the experience quickly:


— Film metadata was made public-read

— Framer queried Supabase directly

— JWT verification was temporarily disabled


Because the data was static and non-personal, the risk was minimal. In return, iteration speed increased and latency decreased. These decisions were intentional, documented, and scoped for future hardening.
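In practice, “Framer queried Supabase directly” means a plain PostgREST-style read against the public table. The sketch below builds (but does not send) such a request; the project URL, key, table name, and column names are placeholders, not the real ones.

```python
from urllib.request import Request

SUPABASE_URL = "https://example-project.supabase.co"  # hypothetical project URL
ANON_KEY = "public-anon-key"                          # placeholder, not a real key

def films_request(mood_tag: str) -> Request:
    """Build a public-read query against a hypothetical films table.

    Uses PostgREST's `cs` (array-contains) filter syntax, which Supabase
    exposes under /rest/v1/.
    """
    url = (f"{SUPABASE_URL}/rest/v1/films"
           f"?select=title,mood_tags&mood_tags=cs.{{{mood_tag}}}")
    return Request(url, headers={"apikey": ANON_KEY})

req = films_request("calm")
print(req.full_url)
```

Shipping with only the anon key and no JWT verification is exactly the scoped trade-off described above: acceptable for static, non-personal data, and flagged for hardening later.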
















Why This Feels Different


This is not a utility interface — it’s an editorial one.


— Radial spotlight gradients and film-grain overlays establish tone

— Typography and spacing prioritize legibility and calm

— Long descriptions scroll smoothly across devices

— Rankings are hidden to avoid false precision

— Empty states guide without overwhelming


The goal was confidence, not stimulation. Users should feel oriented, not marketed to.





















FMC Functionality


— Fully functional end-to-end prototype delivered in ~35–40 focused build hours, including ~10 hours of architectural iteration and judgment-layer design

— Curated film catalog expanded from 20 → 250 titles

— Responsive experience across desktop, tablet, and mobile

— Sub-5s latency for mood-to-results flow

— Emotion-aware retrieval with deterministic guardrails

— Judgment-based selection layer producing differentiated Anchor / Adjacent / Edge results

— Cinematic UI polish consistent across devices


Early user feedback consistently pointed to the same outcome: faster decisions, stronger emotional alignment, and reduced browsing fatigue. Users reported enjoying the act of re-querying different moods — not to browse endlessly, but to explore how the system interpreted subtle shifts in intent.


What registered as most compelling was not volume or novelty, but confidence. The system returned fewer results, chosen with visible care. The experience felt intentional rather than optimized — less like an algorithm guessing, more like a concierge responding.





When “Relevant” Still Feels Wrong

Once the system was functional end-to-end, I ran a structured quality audit across a range of emotional queries — including happy, calm, anxious, focused, sad, and angry. At this stage, the architecture was already semantically robust and emotionally constrained.


And yet, something was still off.


The system was technically correct — but occasionally emotionally untrustworthy. Certain results matched the language of the query while missing the lived reality of the moment. A thriller could appear for “happy.” A heavy, destabilizing film could surface for “contemplative.” None of these were bugs in the traditional sense.


They revealed a deeper issue: relevance is not the same as appropriateness.


The Difference Between Matching and Judgment


Early phases of the system optimized for correctness:

— semantic similarity

— emotional boundary enforcement

— rule-based safety


Those layers were necessary — but insufficient.


What they could not resolve were near-neighbor decisions: moments where many films were emotionally valid, yet only a few felt right to offer now. Without a mechanism for judgment, the system defaulted to repetition. Certain films became gravitational centers — repeatedly selected not because they were uniquely correct, but because they were safely dominant.


This created a subtle but important failure mode:

— the catalog was deep

— the experience felt shallow


Insight: A discovery system that never risks variation eventually signals a lack of taste.


Trust, Entropy, and Editorial Responsibility


Emotional trust is fragile. One emotionally wrong recommendation can collapse confidence instantly — especially in moments when users are seeking comfort, steadiness, or reassurance.


At the same time, a system that never explores feels inert.


This tension clarified the true editorial responsibility of the product:

— too much freedom introduces emotional risk

— too much constraint erodes discovery


The system needed a way to introduce controlled entropy — variation without volatility, freshness without betrayal.


This reframed the problem entirely. Taste was no longer a ranking issue. It was a judgment problem.


Why Judgment Couldn’t Be Deterministic Alone


Deterministic logic excels at boundaries:

— what must never be shown

— what clearly fits

— what must be excluded


It breaks down when choices are all “acceptable.”


At that point, selection becomes contextual, comparative, and interpretive. The system must decide:

— which film should anchor the moment

— which should offer an adjacent texture

— which can gently stretch the user without destabilizing them


Without this layer, results stagnate. With too much freedom, trust erodes.


This insight directly motivated the introduction of a constrained judgment layer — not to generate content, but to curate among emotionally valid options with intention.


Editorial Integrity as a System Property


After this shift, Film Mood Concierge stopped behaving like a semantic search tool and began behaving like a concierge.


The system no longer asked:

“Which films match?”


It asked:

“Which of these is the right one to offer now?”


Editorial integrity emerged not from a single rule or model, but from the interaction of:

— emotional constraints

— deterministic stability

— and judgment under limits


This balance — between safety and exploration — became the defining quality of the product.





Designing Film Mood Concierge clarified a deeper insight behind the product: emotion isn’t a feature to optimize around — it’s an operating condition.


The system didn’t fall short because it misunderstood films. It failed when it misunderstood people in specific moments. Semantic relevance could explain what a film was about, but not when it should be offered. When a product returns something emotionally wrong — a psychologically dark thriller for “happy,” or a destabilizing film for “calm” — trust collapses immediately.


This isn’t a cinema-specific problem. Any AI product that interfaces with human emotion — whether in media, enterprise, health, or creativity — faces the same risk. Without an explicit emotional operating system, systems can be technically correct and still fundamentally unreliable.


By treating mood as a legitimate interface and modeling emotional context directly, Film Mood Concierge demonstrated that fewer options, chosen with care, outperform infinite choice. Restraint wasn’t a limitation — it became the product.





From Browsing to Alignment


The Film Mood Concierge isn't about movies.


It's a proof of concept for a different kind of system — one that treats emotional context as a first-class input rather than a signal to smooth over.


Every failure mode this architecture encountered pointed to the same root cause: systems that understand language but not moment. That can match meaning but not read readiness. That optimize for relevance while remaining blind to appropriateness.


The solution wasn't more intelligence. It was better constraints, applied in the right sequence, with judgment reserved for the decisions that logic cannot resolve.


Restraint wasn't a limitation. It became the architecture.


As AI systems converge on similar capabilities, the differentiation will come from exactly this: the ability to interpret nuance without noise, to introduce variation without breaking trust, to know not just what fits — but what fits now.


Emotion isn't an edge case. It's the operating condition everything else runs on.

