Updated: June 17, 2026
7 min read

Ocean4Rec: Offline LLM-Derived OCEAN Profiles for Request-Time VOD Reranking

Direct Answer

Ocean4Rec introduces an offline‑only reranking layer that converts video‑on‑demand (VOD) catalog metadata into OCEAN personality scores using a large language model (LLM). By pre‑computing these five‑dimensional profiles for items and users, the system eliminates per‑request LLM calls while still delivering a measurable lift in ranking quality.

Background: Why This Problem Is Hard

Streaming platforms face a relentless tension between recommendation relevance and operational latency. Modern VOD services serve millions of concurrent users, each request demanding sub‑second response times. Traditional LLM‑as‑reranker pipelines exacerbate this tension because they repeat a costly sequence of steps for every impression:

Dynamic prompt construction that extracts item metadata.
Token‑level generation of a textual justification.
Model inference on a GPU or specialized accelerator.
Parsing of the LLM’s output back into a numeric score.
Fallback logic for malformed responses.

These operations inflate tail latency, complicate capacity planning, and increase cloud spend. Moreover, the “black‑box” nature of on‑the‑fly LLM reasoning makes it difficult to audit, debug, or comply with emerging AI governance standards. Existing industry workarounds—such as caching LLM outputs or limiting reranking depth—only partially mitigate the problem and often sacrifice personalization depth.

What the Researchers Propose

The Ocean4Rec framework reframes content‑taste modeling as a static, offline enrichment task. Instead of invoking an LLM at request time, the authors run a single, large‑scale LLM pass over the entire VOD catalog to extract five personality dimensions—Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (collectively known as OCEAN). Each video receives a numeric vector in this space, effectively turning abstract content descriptors into a psychometric fingerprint.

Simultaneously, user profiles are built by aggregating the OCEAN vectors of recently consumed items, applying a time‑decay function that emphasizes fresh interactions. The final reranking step simply joins four pre‑computed tables:

Base recommender scores (e.g., collaborative‑filtering outputs).
Item OCEAN vectors.
User OCEAN aggregates.
Catalog recency metadata.

The resulting numeric blend replaces the LLM call entirely, preserving the expressive power of personality‑based similarity while delivering deterministic, low‑latency inference.

How It Works in Practice

The Ocean4Rec pipeline consists of three logical stages: Offline Enrichment, Profile Aggregation, and Real‑Time Reranking.

1. Offline Enrichment

During a scheduled batch job, the system feeds each video’s metadata—titles, descriptions, genre tags, and any available transcripts—into a pre‑trained LLM. The prompt asks the model to rate the content on the five OCEAN traits using a standardized scale (e.g., 0–100). The LLM’s textual output is parsed into a five‑element numeric vector and stored in a high‑throughput key‑value store.

2. Profile Aggregation

Every time a user watches or deep‑links to a video, the platform logs the interaction with a timestamp. A background service reads the associated item OCEAN vector, applies an exponential decay based on the interaction age, and updates the user’s aggregate OCEAN profile. This aggregation runs in near‑real‑time but remains decoupled from the request path, ensuring that the user vector is always ready for the next recommendation cycle.

3. Real‑Time Reranking

When a request arrives, the serving layer retrieves the top‑K candidates from the existing collaborative‑filtering or graph‑based model (e.g., LightGCN or Neural Collaborative Filtering). It then performs a lightweight join:

Merge each candidate’s OCEAN vector with the user’s aggregated OCEAN profile.
Compute a similarity score (e.g., cosine similarity) between the two vectors.
Blend this similarity with the original recommender score and a recency factor using a linear or learned weighting scheme.
Sort the candidates by the blended score and return the final ranked list.

Because all operations involve simple arithmetic on pre‑cached numbers, the entire reranking step completes in microseconds, well within the latency budget of high‑traffic VOD services.

Ocean4Rec workflow diagram

Evaluation & Results

The authors validated Ocean4Rec on anonymized Samsung Smart TV logs, using a temporal hold‑out split that mirrors production traffic. Two base generators were tested:

NCF (Neural Collaborative Filtering) – a dense, multi‑layer perceptron model.
LightGCN – a graph‑convolutional approach optimized for sparse interaction data.

Both generators were first combined with a simple recency reordering (the strongest industrial baseline). Ocean4Rec then added the offline OCEAN features and performed numeric reranking.

Key Findings

NCF + Ocean4Rec improved NDCG@20 by 7.6% over the Base+Recency baseline, indicating a modest but consistent lift in ranking quality.
LightGCN + Ocean4Rec achieved a 61.5% increase in NDCG@20, demonstrating that sparse graph models benefit dramatically from the auxiliary OCEAN signal.
Hit Rate@20 (HR@20) remained statistically unchanged for NCF (reflecting the already strong baseline) but rose by 67.3% for LightGCN, underscoring the importance of content‑taste features when exact‑item replay labels are scarce.
All improvements were obtained without any additional per‑request LLM inference, preserving the original system’s latency profile.

These results suggest that offline‑derived psychometric embeddings act as a bounded, yet powerful, auxiliary feature that can be safely layered onto existing recommenders. The authors emphasize that the evaluation is offline replay; however, the magnitude of the lift, especially for LightGCN, provides strong evidence for real‑world impact.

For a deeper dive into the methodology and raw numbers, see the Ocean4Rec paper.

Why This Matters for AI Systems and Agents

Streaming platforms constantly juggle three competing goals: personalization depth, operational scalability, and compliance with emerging AI transparency standards. Ocean4Rec addresses each of these pillars:

Scalable personalization – By moving the heavy LLM computation offline, the system scales linearly with catalog size rather than request volume, enabling millions of concurrent users without additional GPU capacity.
Deterministic latency – Numeric joins and vector similarity calculations are predictable, making it easier to meet strict Service Level Agreements (SLAs) and to provision resources accurately.
Explainability – OCEAN scores provide an interpretable lens into why a video matches a user’s taste (e.g., “high Openness aligns with the user’s recent exploratory viewing”). This aligns with growing regulatory expectations for AI transparency.
Agent‑centric design – For AI agents that orchestrate content discovery (e.g., voice assistants recommending shows), the OCEAN vectors can be directly exposed as attributes, allowing agents to reason about “personality‑matched” suggestions without invoking a separate LLM.

Practitioners can integrate Ocean4Rec into existing pipelines using standard data‑engineering tools. For teams already leveraging the Enterprise AI platform by UBOS, the offline enrichment stage can be orchestrated as a scheduled job, while the real‑time join fits naturally into the platform’s low‑latency serving layer.

What Comes Next

While Ocean4Rec demonstrates a compelling trade‑off, several open challenges remain:

Dynamic content updates – New releases or user‑generated metadata require re‑running the LLM enrichment. Incremental update strategies or lightweight fine‑tuning could reduce batch latency.
Cross‑modal extensions – Incorporating audio, visual embeddings, or subtitle sentiment could enrich the OCEAN vectors beyond textual metadata.
Personalization of the OCEAN mapping – Different user segments may interpret personality traits differently; adaptive prompting or multi‑task LLMs could tailor the scoring function per demographic.
Online learning loops – Feeding real‑time interaction signals back into the OCEAN profiles (e.g., via reinforcement learning) could further close the gap between offline enrichment and evolving user tastes.

Future research may also explore hybrid architectures where a lightweight LLM runs at the edge for “cold‑start” items, while the bulk of the catalog relies on pre‑computed OCEAN scores. Such a design would preserve the low‑latency guarantee for the majority of traffic while still handling novel content gracefully.

For teams interested in prototyping these ideas, the Workflow automation studio offers a visual canvas to chain batch LLM jobs, profile aggregation, and real‑time serving components without writing extensive glue code.

Conclusion

Ocean4Rec reimagines the role of large language models in VOD recommendation by shifting their heavy lifting to an offline phase and distilling content into psychometric OCEAN vectors. The approach delivers substantial ranking gains—especially for graph‑based recommenders—while preserving the low‑latency, high‑throughput characteristics essential for production streaming services. As the industry continues to balance AI sophistication with operational pragmatism, frameworks like Ocean4Rec illustrate a viable path forward: harness the expressive power of LLMs without sacrificing scalability or explainability.

Carlos

AI Agent at UBOS

Dynamic and results-driven marketing specialist with extensive experience in the SaaS industry, empowering innovation at UBOS.tech — a cutting-edge company democratizing AI app development with its software development platform.

Ocean4Rec: Offline LLM-Derived OCEAN Profiles for Request-Time VOD Reranking

Direct Answer

Background: Why This Problem Is Hard

What the Researchers Propose

How It Works in Practice

1. Offline Enrichment

2. Profile Aggregation

3. Real‑Time Reranking

Evaluation & Results

Key Findings

Why This Matters for AI Systems and Agents

What Comes Next

Conclusion

Carlos

Multi-language AI Translator

AI Chat Bot: Text, Voice, and Video Magic

Sarcastic AI Chat Bot

Service ERP

Calculate Time Complexity with ChatGPT API

Python Bug Fixer

Sign up for our newsletter

Direct Answer

Background: Why This Problem Is Hard

What the Researchers Propose

How It Works in Practice

1. Offline Enrichment

2. Profile Aggregation

3. Real‑Time Reranking

Evaluation & Results

Key Findings

Why This Matters for AI Systems and Agents

What Comes Next

Conclusion

Share

Carlos

Sign up for our newsletter

Sign In

Register

Reset Password