System Architecture
Architecture Overview
The X "For You" feed is generated through a multi-stage pipeline orchestrated by Home Mixer. The pipeline narrows a corpus of billions of posts down to a final, ranked selection of roughly 100 posts delivered to a user's timeline.
The process follows four main stages:
- Candidate Sourcing: Retrieving the best posts from In-Network and Out-of-Network sources.
- Scoring: Ranking candidates using a Grok-based heavy transformer model.
- Filtering: Applying safety, visibility, and fatigue filters.
- Mixing: Combining posts with other content (like ads or follows) for final delivery.
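The four stages above can be sketched as a single function. This is a minimal illustration in Python, not Home Mixer's actual code: `run_pipeline`, its parameters, and the dictionary-based post representation are all hypothetical names chosen for the example.

```python
from typing import Callable, Iterable, List

Post = dict  # hypothetical post representation: {"id": ..., ...}


def run_pipeline(
    sources: Iterable[Callable[[], List[Post]]],
    score: Callable[[Post], float],
    filters: Iterable[Callable[[Post], bool]],
    mix: Callable[[List[Post]], List[Post]],
    limit: int = 100,
) -> List[Post]:
    filters = list(filters)

    # 1. Candidate sourcing: merge every source, de-duplicating by post id.
    candidates = {}
    for source in sources:
        for post in source():
            candidates.setdefault(post["id"], post)

    # 2. Scoring: attach a relevance score to each candidate.
    scored = [(score(post), post) for post in candidates.values()]

    # 3. Filtering: keep only posts that pass every filter.
    kept = [(s, p) for s, p in scored if all(f(p) for f in filters)]

    # 4. Mixing: rank by score, truncate, and hand the result to the mixer.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return mix([post for _, post in kept[:limit]])
```

Passing the stages in as callables keeps the sketch independent of how each stage is actually implemented.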
Step 1: Request Orchestration via Home Mixer
Home Mixer acts as the central conductor for the recommendation engine. Built in Rust, it provides a gRPC interface that receives a request and manages the concurrent calls to various subsystems.
Feed generation begins with a call to the ScoredPostsService.
// The Home Mixer orchestration entry point
pub struct HomeMixerServer {
    pub service: pb::scored_posts_service_server::ScoredPostsServiceServer<HomeMixerServer>,
}
When a request arrives, Home Mixer performs Query Hydration, gathering the user's recent engagement history and features to pass to the ranking models.
Step 2: Candidate Sourcing (Thunder & Phoenix)
The system retrieves candidates from two primary channels simultaneously to ensure a mix of familiar and discovery-based content.
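To illustrate what querying both channels simultaneously looks like, here is a small Python sketch using `asyncio.gather`. It is only an analogy for the concurrent calls Home Mixer makes: the function names and their bodies are hypothetical stand-ins, not the real Thunder or Phoenix clients.

```python
import asyncio
from typing import List


async def fetch_in_network(user_id: int) -> List[str]:
    # Hypothetical stand-in for a call to Thunder.
    await asyncio.sleep(0.01)
    return [f"in:{user_id}:a", f"in:{user_id}:b"]


async def fetch_out_of_network(user_id: int) -> List[str]:
    # Hypothetical stand-in for a call to Phoenix retrieval.
    await asyncio.sleep(0.01)
    return [f"out:{user_id}:x"]


async def source_candidates(user_id: int) -> List[str]:
    # Both requests are in flight at the same time; gather returns
    # the results in the order the awaitables were passed.
    in_net, out_net = await asyncio.gather(
        fetch_in_network(user_id),
        fetch_out_of_network(user_id),
    )
    return in_net + out_net
```

With this shape, total sourcing latency is bounded by the slower of the two channels rather than by their sum.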
In-Network Retrieval (Thunder)
Thunder provides real-time access to posts from the accounts a user follows. It maintains a massive in-memory PostStore that is updated via Kafka streams.
- Logic: It fetches the list of people you follow (via the Strato Client) and queries the PostStore for their most recent, relevant posts.
- Performance: Uses a Semaphore to limit concurrent requests and maintain low latency.
// Interface for fetching in-network candidates
let request = GetInNetworkPostsRequest {
    user_id: 12345,
    include_replies: true,
    max_results: 500,
    ..Default::default()
};
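The combination of an in-memory store and a semaphore described above can be sketched in Python as follows. This is a toy model under stated assumptions, not Thunder's implementation: the class layout, method names, and the choice to block (rather than reject) when all slots are busy are illustrative, and `ingest` stands in for whatever the Kafka consumer does.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


class PostStore:
    """Toy in-memory store mapping author_id -> [(created_at, post_id)]."""

    def __init__(self, max_concurrent: int = 2) -> None:
        self._by_author = defaultdict(list)
        self._lock = threading.Lock()
        # At most `max_concurrent` queries run at once; others wait.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def ingest(self, author_id: int, post_id: str, created_at: int) -> None:
        # Assumed to be driven by a stream consumer in a real system.
        with self._lock:
            self._by_author[author_id].append((created_at, post_id))

    def recent_posts(self, followed_ids: Iterable[int], max_results: int) -> List[str]:
        with self._slots:
            with self._lock:
                posts = [
                    entry
                    for author in followed_ids
                    for entry in self._by_author.get(author, [])
                ]
            # Newest first, then truncate to the requested size.
            posts.sort(reverse=True)
            return [post_id for _, post_id in posts[:max_results]]
```

Capping concurrency this way trades a little queueing under load for more predictable per-request latency.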
Out-of-Network Retrieval (Phoenix)
For content outside your social circle, Phoenix uses a "Two-Tower" Grok-based retrieval model.
- User Tower: Encodes your interests and history into a vector.
- Candidate Tower: Projects global posts into the same vector space.
- ANN Search: Performs an Approximate Nearest Neighbor search to find posts that align with your vector.
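The two-tower idea can be illustrated with a few lines of Python. Two simplifications are assumed here and should not be read as Phoenix's design: the "user tower" is just an average of history embeddings, and the search is an exact brute-force scan rather than a true approximate nearest neighbor index. All function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def user_tower(history_embeddings: List[Vector]) -> Vector:
    # Stand-in for a learned encoder: average the engaged-post embeddings.
    dim = len(history_embeddings[0])
    count = len(history_embeddings)
    mean = [sum(e[i] for e in history_embeddings) / count for i in range(dim)]
    return normalize(mean)


def top_k(user_vec: Vector, candidates: Dict[str, Vector], k: int) -> List[str]:
    # Exact cosine-similarity search. An ANN index would approximate
    # this ranking without scanning every candidate.
    scored = [
        (sum(u * c for u, c in zip(user_vec, normalize(vec))), post_id)
        for post_id, vec in candidates.items()
    ]
    scored.sort(reverse=True)
    return [post_id for _, post_id in scored[:k]]
```

Because both towers write into the same vector space, candidate vectors can be computed ahead of time and only the user vector needs to be produced per request.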
Step 3: Global Ranking with the Phoenix Transformer
Once candidates are gathered, they are passed to the Phoenix Ranker. This is where the Grok-based transformer performs heavy-duty inference to predict engagement.
The Input Batch
The ranker takes a RecsysBatch, which includes user history, author information, and the candidate posts themselves. Unlike traditional recommendation models, the ranker uses hash-based embeddings to represent entities without massive, static lookup tables.
# Example of the data structure passed to the Phoenix Ranker
from typing import NamedTuple

import jax.typing

class RecsysBatch(NamedTuple):
    user_hashes: jax.typing.ArrayLike  # User identity
    history_post_hashes: jax.typing.ArrayLike  # What you recently engaged with
    history_actions: jax.typing.ArrayLike  # Like, Reply, Retweet, etc.
    candidate_post_hashes: jax.typing.ArrayLike  # The posts to be scored
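One common way to build hash-based embeddings is to hash each entity ID into a small, fixed-size table several times and combine the rows. The sketch below shows that general technique in plain Python. The table size, number of hash functions, choice of hash, and summation are all assumptions made for illustration, and may differ from what Phoenix does.

```python
import hashlib
import random
from typing import List

NUM_BUCKETS = 1000  # fixed table size, independent of how many IDs exist
DIM = 4             # embedding width (tiny, for illustration)
NUM_HASHES = 2      # each ID is hashed this many times

# Randomly initialised table standing in for learned parameters.
_rng = random.Random(0)
TABLE = [[_rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]


def bucket(entity_id: int, seed: int) -> int:
    # Deterministic hash of (seed, id) into a table row index.
    digest = hashlib.blake2b(f"{seed}:{entity_id}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_BUCKETS


def embed(entity_id: int) -> List[float]:
    # Sum the rows selected by each hash function. Two IDs that collide
    # under one hash are unlikely to collide under all of them.
    rows = [TABLE[bucket(entity_id, seed)] for seed in range(NUM_HASHES)]
    return [sum(column) for column in zip(*rows)]
```

The table never grows as new users or posts appear, which is the property the paragraph above contrasts with static per-entity lookup tables.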
Transformer Ranking
The model applies a specialized attention mask (make_recsys_attn_mask) that allows candidate posts to "attend" to your engagement history. This ensures the model understands the context of why a post might be relevant based on your past behavior.
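A mask of this kind can be sketched as a boolean matrix where entry `[i][j]` says whether position `i` may attend to position `j`. The function below is a hypothetical illustration, not the actual `make_recsys_attn_mask`: it assumes the sequence is laid out as context (user plus history) followed by candidates, that context positions attend causally, and that each candidate sees the full context and itself but not other candidates.

```python
from typing import List


def toy_recsys_attn_mask(num_context: int, num_candidates: int) -> List[List[bool]]:
    n = num_context + num_candidates
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < num_context:
                # Context positions: causal attention within the context.
                mask[i][j] = j <= i
            else:
                # Candidate positions: all of the context, plus themselves.
                mask[i][j] = j < num_context or j == i
    return mask
```

Under these assumptions, a candidate's score does not depend on which other candidates happen to share its batch.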
The output is a set of logits, one per engagement type (Like, Reply, Retweet, etc.), from which the predicted probability of each action is derived. These are combined into a final score using a weighted formula.
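As a rough illustration of such a weighted combination, the sketch below converts each logit to a probability with a sigmoid and takes a weighted sum. The weights, the action names used as keys, and the choice of independent sigmoids are assumptions for the example. The source does not state the actual formula or its coefficients.

```python
import math
from typing import Dict

# Hypothetical weights, chosen only to make the example concrete.
WEIGHTS: Dict[str, float] = {"like": 1.0, "reply": 2.0, "retweet": 1.5}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def final_score(logits: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    # Weighted sum of per-action engagement probabilities.
    return sum(w * sigmoid(logits[action]) for action, w in weights.items())
```

For instance, with every logit at 0 each probability is 0.5, so the score is half the sum of the weights.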
Step 4: Filtering and Pipeline Finalization
The final stage of the architecture ensures the quality and diversity of the feed. The Candidate Pipeline applies several modular filters:
- Visibility Filtering: Removes posts from blocked or muted accounts.
- Integrity Filtering: Filters out low-quality content or "spammy" posts.
- Author Diversity: Ensures your feed isn't dominated by a single account.
- Fatigue Filtering: Prevents you from seeing the same post repeatedly across different sessions.
// Logic within the candidate-pipeline for final refinement
pub mod filter {
    pub trait Filter {
        fn evaluate(&self, post: &ScoredPost) -> bool;
    }
}
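The modular filter idea can be sketched in Python with plain predicates that mirror the `evaluate` method above. Everything here is illustrative: the `ScoredPost` fields, the helper names, and the per-author cap are assumptions, and the real pipeline may enforce author diversity differently (for example by down-weighting rather than dropping).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class ScoredPost:
    post_id: str
    author_id: int
    score: float


Filter = Callable[[ScoredPost], bool]  # True means "keep this post"


def visibility_filter(blocked_authors: Set[int]) -> Filter:
    return lambda post: post.author_id not in blocked_authors


def fatigue_filter(seen_post_ids: Set[str]) -> Filter:
    return lambda post: post.post_id not in seen_post_ids


def apply_filters(
    posts: Iterable[ScoredPost], filters: List[Filter], max_per_author: int = 2
) -> List[ScoredPost]:
    kept: List[ScoredPost] = []
    per_author = {}
    for post in sorted(posts, key=lambda p: p.score, reverse=True):
        if not all(f(post) for f in filters):
            continue
        # Author diversity depends on what was already kept, so it is
        # handled here rather than as a stateless per-post predicate.
        count = per_author.get(post.author_id, 0)
        if count >= max_per_author:
            continue
        per_author[post.author_id] = count + 1
        kept.append(post)
    return kept
```

Keeping each filter as an independent predicate means new rules can be added to the list without touching the loop.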
Once filtered, the results are returned to Home Mixer, which packages the top-ranked posts into the final response for the client.