The Candidate Pipeline Lifecycle
The Candidate Pipeline is the heart of the "For You" feed, responsible for transforming a massive global corpus of posts into a personalized, ranked list for a specific user. This guide walks you through the lifecycle of a post as it moves from raw data to a ranked candidate in your feed.
The process is orchestrated by the Home Mixer and follows four distinct phases: Retrieval, Hydration, Scoring, and Selection.
Step 1: Retrieval (Finding Potential Candidates)
The lifecycle begins with gathering a "pool" of potential posts from two distinct sources. This reduces the search space from millions of posts down to a few thousand.
- In-Network (Thunder): The system queries the `Thunder` service to retrieve recent posts from accounts the user follows. It uses the `GetInNetworkPosts` gRPC call to fetch these posts as "LightPosts."
- Out-of-Network (Phoenix Retrieval): This identifies relevant posts from people the user doesn't follow. It uses a Two-Tower Model:
  - User Tower: Encodes the user's history and actions into a normalized vector.
  - Candidate Tower: Projects global posts and their authors into the same embedding space.
  - ANN Search: The system performs an Approximate Nearest Neighbor search (dot-product similarity) to find posts that align with the user's representation.
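```python
# Conceptual retrieval logic (user_tower and ann_index are illustrative, not runnable as-is)
user_representation = user_tower(user_history_hashes)  # normalized user embedding
top_k_posts = ann_index.query(user_representation, k=1000)  # ANN search by dot product
```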
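To make the two-tower idea concrete, here is a minimal, self-contained sketch. It rests on several assumptions not taken from this guide: the "user tower" is a hypothetical mean-pool-and-normalize stand-in for a learned network, candidate-tower outputs are assumed to be precomputed unit vectors, the embeddings are toy 3-dimensional values, and an exact brute-force search replaces the approximate (ANN) index so the result is easy to check by hand.

```python
import math
from typing import Dict, List, Sequence, Tuple

def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so the dot product behaves like cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def user_tower(history_embeddings: List[List[float]]) -> List[float]:
    """Hypothetical user tower: mean-pool the history embeddings, then L2-normalize.

    A real user tower is a learned network; mean pooling only stands in for it here.
    """
    dim = len(history_embeddings[0])
    count = len(history_embeddings)
    pooled = [sum(e[i] for e in history_embeddings) / count for i in range(dim)]
    return l2_normalize(pooled)

def top_k_by_dot_product(
    user_vec: Sequence[float], candidate_vecs: Dict[int, List[float]], k: int
) -> List[Tuple[int, float]]:
    """Exact (brute-force) top-k by dot product; an ANN index approximates this at scale."""
    scored = [(post_id, dot(user_vec, vec)) for post_id, vec in candidate_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings; candidate-tower outputs are assumed precomputed and normalized.
history = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
candidates = {
    101: l2_normalize([1.0, 0.1, 0.0]),
    102: l2_normalize([0.0, 1.0, 0.0]),
    103: l2_normalize([0.0, 0.0, 1.0]),
}
user_vec = user_tower(history)
top_ids = [post_id for post_id, _ in top_k_by_dot_product(user_vec, candidates, k=2)]
print(top_ids)  # → [101, 102]
```

Because both towers emit normalized vectors into a shared space, scoring a candidate reduces to a single dot product, which is what lets an ANN index search a very large corpus quickly.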
Step 2: Hydration (Enriching the Candidates)
Raw post IDs aren't enough for high-precision ranking. The Hydrator phase enriches candidates with the metadata required by the Grok-based transformer.
- Feature Gathering: The system fetches author information, post content metadata, and the user's recent action sequence (likes, replies, shares).
- Hashing: To eliminate hand-engineered features, entities such as users, posts, and authors are converted into hashes. The `RecsysBatch` object is constructed containing:
  - `user_hashes`: Hashed identifiers for the user.
  - `history_post_hashes` / `history_author_hashes`: The last $N$ posts the user interacted with, and their authors.
  - `candidate_post_hashes` / `candidate_author_hashes`: The hashes for the posts (and their authors) currently being evaluated.
  - `history_actions`: The specific types of engagement (like, reply, etc.) associated with each history item.
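The hydration output can be sketched as a plain data structure. Only the `RecsysBatch` field names come from this guide; everything else is a hypothetical illustration: the `hash_id` helper (BLAKE2b folded into a fixed number of buckets), `NUM_BUCKETS`, `HISTORY_LEN` standing in for $N$, the `ACTION_IDS` vocabulary, and left-padding with a reserved `PAD` value.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Sequence, Tuple

NUM_BUCKETS = 1_000_003  # hypothetical embedding-table size
HISTORY_LEN = 4          # hypothetical N: number of history slots
PAD = 0                  # hypothetical padding value for empty history slots
ACTION_IDS = {"none": 0, "like": 1, "reply": 2, "share": 3}  # hypothetical action vocabulary

def hash_id(namespace: str, raw_id: int) -> int:
    """Map a raw ID to a stable bucket in [1, NUM_BUCKETS); 0 is reserved for PAD."""
    digest = hashlib.blake2b(f"{namespace}:{raw_id}".encode(), digest_size=8).digest()
    return 1 + int.from_bytes(digest, "big") % (NUM_BUCKETS - 1)

@dataclass
class RecsysBatch:
    user_hashes: List[int]
    history_post_hashes: List[int]
    history_author_hashes: List[int]
    history_actions: List[int]
    candidate_post_hashes: List[int]
    candidate_author_hashes: List[int]

def pad(values: List[int], length: int) -> List[int]:
    """Keep the most recent `length` entries and left-pad with PAD."""
    values = values[-length:]
    return [PAD] * (length - len(values)) + values

def build_batch(
    user_id: int,
    history: Sequence[Tuple[int, int, str]],  # (post_id, author_id, action), oldest first
    candidates: Sequence[Tuple[int, int]],    # (post_id, author_id)
) -> RecsysBatch:
    return RecsysBatch(
        user_hashes=[hash_id("user", user_id)],
        history_post_hashes=pad([hash_id("post", p) for p, _, _ in history], HISTORY_LEN),
        history_author_hashes=pad([hash_id("author", a) for _, a, _ in history], HISTORY_LEN),
        history_actions=pad([ACTION_IDS[act] for _, _, act in history], HISTORY_LEN),
        candidate_post_hashes=[hash_id("post", p) for p, _ in candidates],
        candidate_author_hashes=[hash_id("author", a) for _, a in candidates],
    )

batch = build_batch(
    user_id=42,
    history=[(1, 10, "like"), (2, 11, "reply")],
    candidates=[(500, 20), (501, 21)],
)
print(batch.history_actions)  # → [0, 0, 1, 2]
```

Namespacing the hash input ("post:", "author:") keeps a post and an author that happen to share a numeric ID from landing in the same bucket by construction, and a stable hash (rather than Python's per-process salted `hash`) keeps bucket assignments consistent across runs.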
Step 3: Scoring (Predicting Engagement)
Once hydrated, candidates are passed to Phoenix, the Grok-based ranking model. This stage predicts the probability of the user engaging with each post.
- Embedding Lookup: Hashes are converted into dense embeddings using `RecsysEmbeddings`.
- Transformer Processing: The model uses a Grok-1-based transformer architecture. Unlike a standard language model, which applies causal attention across the whole sequence, it uses a specialized attention mask:
  - User/history positions use causal attention (attending only to the past).
  - Candidate positions attend to the entire user history to understand context, but do not attend to other candidates in the same batch.
- Logit Generation: The model outputs a set of logits representing various engagement types (e.g., probability of a Like, probability of a Retweet).
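The attention rule above can be expressed as a boolean mask. This is a minimal sketch under assumptions not stated in the guide: the sequence is laid out as user/history positions first with candidates appended after them, and each candidate is assumed to attend to its own position in addition to the history.

```python
from typing import List

def build_attention_mask(num_context: int, num_candidates: int) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if position i may attend to position j.

    Positions 0..num_context-1 hold user/history tokens (assumed layout);
    the remaining num_candidates positions hold candidate posts.
    """
    total = num_context + num_candidates
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        if i < num_context:
            # User/history: causal attention over this and earlier positions.
            for j in range(i + 1):
                mask[i][j] = True
        else:
            # Candidate: sees the entire user/history context...
            for j in range(num_context):
                mask[i][j] = True
            # ...and itself (assumption), but never another candidate.
            mask[i][i] = True
    return mask

mask = build_attention_mask(num_context=3, num_candidates=2)
for row in mask:
    print("".join("1" if allowed else "." for allowed in row))
# 1....
# 11...
# 111..
# 1111.
# 111.1
```

Blocking candidate-to-candidate attention means a post's score does not depend on which other posts happen to share its batch, so candidates can be scored together without influencing one another.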
```python
# The scoring model combines history and candidate data
output = phoenix_ranker(
    embeddings=recsys_embeddings,
    batch=recsys_batch,
    mask=recsys_attn_mask,
)
# Final Score = weighted_sum(output.logits)
```
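The final-score comment can be made concrete with a small helper. This sketch assumes each engagement type is an independent binary head, so each logit is turned into a probability with a sigmoid before weighting; the weights in `ENGAGEMENT_WEIGHTS` are hypothetical placeholders, since this guide does not publish the real values.

```python
import math
from typing import Mapping

# Hypothetical business weights; the real values are not given in this guide.
ENGAGEMENT_WEIGHTS: Mapping[str, float] = {
    "like": 1.0,
    "reply": 5.0,
    "retweet": 2.0,
}

def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def final_score(
    logits: Mapping[str, float], weights: Mapping[str, float] = ENGAGEMENT_WEIGHTS
) -> float:
    """Convert each engagement logit to a probability, then take the weighted sum.

    Engagement types without a configured weight are ignored.
    """
    return sum(
        weights[action] * sigmoid(logit)
        for action, logit in logits.items()
        if action in weights
    )

score = final_score({"like": 0.0, "reply": -2.0, "retweet": -1.0})
print(round(score, 4))
```

Weighting probabilities rather than raw logits keeps each term bounded between 0 and its weight, so a single extreme logit cannot dominate the combined score.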
Step 4: Filtering and Selection
The final phase transforms scores into the actual feed the user sees.
- Scoring & Weighting: A `Scorer` applies business-logic weights to the model's raw engagement probabilities. For example, a "Reply" might be weighted more heavily than a "Like."
- Filtering: The `Filter` module removes content that shouldn't be shown, such as:
  - Duplicate posts or near-duplicate media.
  - Blocked or muted content.
  - Content already seen by the user in recent sessions.
- Selection: The `Selector` takes the top $N$ filtered results and prepares the final `ScoredPostsResponse` to be sent to the client.
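The filter-then-select stage can be sketched as two small functions. The `ScoredPost` type and both function names are hypothetical illustrations, not the pipeline's actual `Filter` or `Selector` interfaces. The sketch handles exact duplicate post IDs, blocked or muted authors, and previously seen posts; near-duplicate media detection is out of scope here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

@dataclass(frozen=True)
class ScoredPost:
    """Hypothetical stand-in for a candidate after weighted scoring."""
    post_id: int
    author_id: int
    score: float

def filter_candidates(
    candidates: Iterable[ScoredPost],
    blocked_or_muted_authors: Set[int],
    seen_post_ids: Set[int],
) -> List[ScoredPost]:
    """Drop exact duplicates, blocked/muted authors, and already-seen posts."""
    kept: List[ScoredPost] = []
    ids_in_batch: Set[int] = set()
    for post in candidates:
        if post.post_id in ids_in_batch:
            continue  # exact duplicate: keep the first occurrence only
        if post.author_id in blocked_or_muted_authors:
            continue  # blocked or muted content
        if post.post_id in seen_post_ids:
            continue  # already seen in a recent session
        ids_in_batch.add(post.post_id)
        kept.append(post)
    return kept

def select_top_n(posts: List[ScoredPost], n: int) -> List[ScoredPost]:
    """Return the n highest-scoring posts, best first."""
    return sorted(posts, key=lambda p: p.score, reverse=True)[:n]

candidates = [
    ScoredPost(1, 100, 0.90),
    ScoredPost(2, 200, 0.80),  # author 200 is muted
    ScoredPost(3, 100, 0.70),  # already seen
    ScoredPost(1, 100, 0.90),  # duplicate of post 1
    ScoredPost(4, 300, 0.95),
    ScoredPost(5, 400, 0.10),
]
feed = select_top_n(filter_candidates(candidates, {200}, {3}), n=2)
print([p.post_id for p in feed])  # → [4, 1]
```

Filtering before selection ensures the top $N$ slots are filled only with eligible posts, rather than taking the top $N$ first and then returning a shorter feed after removals.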
Summary of the Flow
| Phase | Component | Input | Output |
| :--- | :--- | :--- | :--- |
| Retrieval | Thunder / Phoenix Retrieval | User ID | ~1,000–2,000 Candidate IDs |
| Hydration | Hydrator | Candidate IDs | RecsysBatch (Hashes + Metadata) |
| Scoring | Phoenix Ranker | RecsysBatch | Predicted Engagement Logits |
| Selection | Scorer / Filter / Selector | Raw Logits | Final Ranked Feed (Top ~100 posts) |