Creating Query & Candidate Hydrators
Overview of Hydration
In the X For You algorithm, hydration is the process of enriching "thin" objects (such as a user ID or a post ID) with the full context required by the Phoenix ranking model. Phoenix is based on the Grok transformer architecture, so it needs detailed sequences of user actions and high-dimensional embeddings to predict engagement accurately.
Hydration is split into two distinct phases:
- Query Hydration: Enriching the request with user-specific data (e.g., recent likes, follows, and account settings).
- Candidate Hydration: Enriching the retrieved posts with content features and author information.
Implementing a Query Hydrator
Query hydrators run at the start of the Home Mixer pipeline. They transform a raw request into a "Hydrated Query" that contains the user's engagement history.
1. Define the Hydrator Logic
Create a new module in home-mixer/src/query_hydrators/ that implements the hydration trait. Your hydrator should fetch data from internal services (like Strato or a Key-Value store).
    use async_trait::async_trait;
    use xai_home_mixer_proto::UserContext;

    #[async_trait]
    pub trait QueryHydrator {
        async fn hydrate(&self, user_id: i64) -> anyhow::Result<UserContext>;
    }

    pub struct UserActionHydrator {
        // Clients for internal services
        pub action_store_client: ActionStoreClient,
    }

    #[async_trait]
    impl QueryHydrator for UserActionHydrator {
        async fn hydrate(&self, user_id: i64) -> anyhow::Result<UserContext> {
            // Fetch the last 200 actions (likes, replies, etc.) for the user
            let actions = self.action_store_client.get_recent_actions(user_id, 200).await?;
            Ok(UserContext {
                user_id,
                action_sequence: actions,
                // Additional features used by the Phoenix Transformer
            })
        }
    }
2. Register with Home Mixer
Open home-mixer/lib.rs and ensure your hydrator is initialized within the HomeMixerServer.
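The source does not show how HomeMixerServer stores its hydrators, so the following is only one possible shape for the registration step. Every name in it (MixerServer, HydratedQuery, RecentActions, Following) is hypothetical, and the trait is synchronous so that the sketch compiles with the standard library alone; the hydrators in this guide are async.

```rust
// Hypothetical sketch: none of these types come from the real home-mixer crate.

#[derive(Debug, Default, Clone, PartialEq)]
pub struct HydratedQuery {
    pub user_id: i64,
    pub action_sequence: Vec<String>,
    pub following: Vec<i64>,
}

/// Each hydrator fills in the part of the query that it owns.
pub trait QueryHydrator {
    fn name(&self) -> &'static str;
    fn hydrate(&self, query: &mut HydratedQuery) -> Result<(), String>;
}

pub struct RecentActions;

impl QueryHydrator for RecentActions {
    fn name(&self) -> &'static str {
        "recent_actions"
    }

    fn hydrate(&self, query: &mut HydratedQuery) -> Result<(), String> {
        // Stand-in for a call to an action store.
        query.action_sequence = vec!["like".to_string(), "reply".to_string()];
        Ok(())
    }
}

pub struct Following;

impl QueryHydrator for Following {
    fn name(&self) -> &'static str {
        "following"
    }

    fn hydrate(&self, query: &mut HydratedQuery) -> Result<(), String> {
        // Stand-in for a call to a follow-graph service.
        query.following = vec![7, 42];
        Ok(())
    }
}

pub struct MixerServer {
    query_hydrators: Vec<Box<dyn QueryHydrator>>,
}

impl MixerServer {
    /// Registration happens once, when the server is constructed.
    pub fn new() -> Self {
        Self {
            query_hydrators: vec![Box::new(RecentActions), Box::new(Following)],
        }
    }

    /// Runs every registered hydrator against a fresh query.
    pub fn hydrate_query(&self, user_id: i64) -> HydratedQuery {
        let mut query = HydratedQuery { user_id, ..Default::default() };
        for hydrator in &self.query_hydrators {
            // A failing hydrator leaves its fields at their defaults.
            if let Err(err) = hydrator.hydrate(&mut query) {
                eprintln!("query hydrator {} failed: {}", hydrator.name(), err);
            }
        }
        query
    }
}
```

Calling `MixerServer::new().hydrate_query(1)` in this sketch returns a query whose action_sequence and following fields have both been filled in. Adding a hydrator means adding one entry to the vector in `new`.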
Implementing a Candidate Hydrator
Candidate hydrators run in parallel over the posts returned by the retrieval stages (Thunder and Phoenix Retrieval).
1. Create the Hydrator
Candidate hydrators focus on fetching post-level metadata, such as the author's ID, media types, and engagement counts.
    use std::sync::Arc;

    use async_trait::async_trait;
    use xai_candidate_pipeline::hydrator::CandidateHydrator;
    use xai_home_mixer_proto::CandidatePost;

    pub struct PostFeatureHydrator {
        pub feature_repo: Arc<FeatureRepository>,
    }

    #[async_trait]
    impl CandidateHydrator<CandidatePost> for PostFeatureHydrator {
        async fn hydrate(&self, candidates: Vec<CandidatePost>) -> Vec<CandidatePost> {
            // Batch fetch features to minimize RPC overhead
            let post_ids: Vec<i64> = candidates.iter().map(|c| c.id).collect();
            let features = self.feature_repo.batch_get_features(post_ids).await;

            // Copy features onto each candidate; posts without features pass through unchanged
            candidates.into_iter().map(|mut c| {
                if let Some(f) = features.get(&c.id) {
                    c.author_id = f.author_id;
                    c.has_video = f.has_video;
                }
                c
            }).collect()
        }
    }
2. Batching and Concurrency
To maintain low latency, the CandidatePipeline automatically handles the distribution of candidates across your hydrators.
- Batching: Always implement batch-lookup methods in your clients.
- Timeouts: Set strict timeouts for hydration to ensure a slow metadata service doesn't block the entire feed.
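The timeout advice above can be sketched with the standard library alone. The helper below is hypothetical (batch_lookup_with_timeout and PostFeatures are not names from the source): it runs a batch lookup on a worker thread and waits for at most a fixed budget. In an async hydrator the usual tool would instead be a timeout wrapper such as `tokio::time::timeout`, assuming the service runs on Tokio, which the source does not state.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical feature record; the real feature type is not shown in this guide.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PostFeatures {
    pub author_id: i64,
    pub has_video: bool,
}

/// Runs `lookup` on a worker thread and waits at most `budget` for the result.
/// On timeout (or if the worker panics) it returns an empty map, so callers
/// keep whatever default features the candidates already carry.
pub fn batch_lookup_with_timeout<F>(
    post_ids: Vec<i64>,
    budget: Duration,
    lookup: F,
) -> HashMap<i64, PostFeatures>
where
    F: FnOnce(Vec<i64>) -> HashMap<i64, PostFeatures> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver has already given up, the send fails and is ignored.
        let _ = tx.send(lookup(post_ids));
    });
    rx.recv_timeout(budget).unwrap_or_default()
}
```

Note that the worker thread is not cancelled when the budget expires; it finishes in the background and its result is dropped. The whole batch goes through a single lookup call, which keeps the per-request RPC count independent of the number of candidates.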
Mapping Hydrated Data to Phoenix
The data you fetch during hydration must eventually map to the RecsysBatch used by the JAX/Haiku model. The Phoenix model expects the following keys to be populated during the hydration phase:
| Hydrator Type | Phoenix Feature | Description |
| :--- | :--- | :--- |
| Query | user_hashes | Unique identifiers for the requesting user. |
| Query | history_actions | The sequence of actions (Like, RT, Click). |
| Candidate | candidate_post_hashes | Unique hashes for the post content. |
| Candidate | candidate_author_hashes | Unique hashes for the post author. |
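To make the mapping in the table concrete, here is a minimal sketch of collecting hydrated values into one structure keyed by the four feature names. PhoenixInputs, QueryFeatures, and CandidateFeatures are hypothetical Rust-side types, the integer widths are assumptions, and the real RecsysBatch layout on the model side may differ.

```rust
/// Hypothetical Rust-side mirror of the four keys listed in the table.
#[derive(Debug, Default, PartialEq)]
pub struct PhoenixInputs {
    pub user_hashes: Vec<u64>,
    pub history_actions: Vec<u32>,
    pub candidate_post_hashes: Vec<u64>,
    pub candidate_author_hashes: Vec<u64>,
}

/// Output of query hydration (assumed shape).
pub struct QueryFeatures {
    pub user_hash: u64,
    pub action_ids: Vec<u32>,
}

/// Output of candidate hydration for one post (assumed shape).
pub struct CandidateFeatures {
    pub post_hash: u64,
    pub author_hash: u64,
}

/// Flattens per-candidate structs into parallel arrays, one per feature key.
/// Index i of the two candidate arrays always refers to the same post.
pub fn to_phoenix_inputs(
    query: &QueryFeatures,
    candidates: &[CandidateFeatures],
) -> PhoenixInputs {
    PhoenixInputs {
        user_hashes: vec![query.user_hash],
        history_actions: query.action_ids.clone(),
        candidate_post_hashes: candidates.iter().map(|c| c.post_hash).collect(),
        candidate_author_hashes: candidates.iter().map(|c| c.author_hash).collect(),
    }
}
```

Keeping the candidate arrays index-aligned matters because a ranking score returned for position i has to be attributed back to the post at position i.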
Example: Embedding Lookup
The Phoenix model uses RecsysEmbeddings to process the hydrated hashes. Ensure your hydrator provides the exact IDs used in the model's hash tables:
    // Inside a Candidate Hydrator
    let post_hash = calculate_hash(post.id);
    // This hash is used by Phoenix to look up embeddings in the transformer
    candidate.post_hash = post_hash;
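The source does not define calculate_hash, so the following is only an illustration of the general idea: a deterministic hash of the ID, reduced to an index into a fixed-size embedding table. The choice of FNV-1a, the num_buckets parameter, and the two-argument signature are all assumptions; a real hydrator must use whatever function and table size the model was trained with.

```rust
/// 64-bit FNV-1a over the little-endian bytes of `id`.
/// Stand-in only: the hash function Phoenix actually uses is not shown here.
pub fn fnv1a_64(id: i64) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    let mut hash = OFFSET_BASIS;
    for byte in id.to_le_bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Maps an ID to a bucket in a table of `num_buckets` embeddings.
/// `num_buckets` is a hypothetical parameter for this sketch.
pub fn calculate_hash(id: i64, num_buckets: u64) -> u64 {
    assert!(num_buckets > 0, "embedding table must have at least one bucket");
    fnv1a_64(id) % num_buckets
}
```

The property that matters is determinism across processes and releases: the same post ID must always land in the same bucket, at serving time and at training time. That is also why this sketch avoids `std::collections::hash_map::DefaultHasher`, whose algorithm is not guaranteed to stay the same between Rust versions.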
Best Practices
- Fail Open: If a hydrator fails (e.g., a timeout), the pipeline should continue with default features rather than returning an error to the user.
- Deduplication: The HomeMixer orchestration layer automatically deduplicates candidates from different sources before hydration. Do not implement deduplication logic within the hydrator itself.
- Observability: Use the metrics module in thunder or home-mixer to track the latency of each hydration step. The Phoenix model is computationally expensive, so saving time in hydration is critical for overall performance.
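The fail-open and observability practices can be combined in one small wrapper. The sketch below is an assumption-laden illustration: HydrationMetrics and run_step are hypothetical names, and the in-memory vectors stand in for the real metrics module, whose API is not shown in this guide.

```rust
use std::time::{Duration, Instant};

/// Hypothetical in-memory stand-in for a metrics sink.
#[derive(Debug, Default)]
pub struct HydrationMetrics {
    pub latencies: Vec<(&'static str, Duration)>,
    pub failures: Vec<&'static str>,
}

/// Times one hydration step and fails open: on error the step is recorded
/// as a failure and the default value of `T` is returned instead.
pub fn run_step<T: Default, E>(
    name: &'static str,
    metrics: &mut HydrationMetrics,
    step: impl FnOnce() -> Result<T, E>,
) -> T {
    let start = Instant::now();
    let result = step();
    // Latency is recorded for successes and failures alike.
    metrics.latencies.push((name, start.elapsed()));

    match result {
        Ok(value) => value,
        Err(_) => {
            metrics.failures.push(name);
            T::default()
        }
    }
}
```

A step that returns `Err` therefore yields an empty or zeroed feature value rather than aborting the request, while the failure still shows up in the recorded metrics.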