Creating Query & Candidate Hydrators

Overview of Hydration

In the X For You algorithm, hydration is the process of enriching "thin" objects (such as a user ID or a post ID) with the full context required by the Phoenix ranking model. Because Phoenix is based on the Grok transformer architecture, it needs detailed sequences of user actions and high-dimensional embeddings to predict engagement accurately.

Hydration is split into two distinct phases:

Query Hydration: Enriching the request with user-specific data (e.g., recent likes, follows, and account settings).
Candidate Hydration: Enriching the retrieved posts with content features and author information.
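
To make the "thin" versus "hydrated" distinction concrete, the sketch below models both phases as plain data types. All struct and field names here are hypothetical and chosen for illustration; the real types come from the generated proto definitions and will differ.

```rust
/// A "thin" candidate as it might leave retrieval: only an ID is known.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ThinCandidate {
    pub post_id: i64,
}

/// The same candidate after candidate hydration. Fields are `Option`
/// so that a failed or skipped lookup stays representable.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct HydratedCandidate {
    pub post_id: i64,
    pub author_id: Option<i64>,
    pub has_video: Option<bool>,
}

/// The request after query hydration: the user ID plus user-specific data.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct HydratedQuery {
    pub user_id: i64,
    pub recent_action_post_ids: Vec<i64>,
}

impl HydratedCandidate {
    /// Starts from the thin object; hydrators fill in the rest later.
    pub fn from_thin(thin: &ThinCandidate) -> Self {
        HydratedCandidate {
            post_id: thin.post_id,
            ..Default::default()
        }
    }
}
```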

Implementing a Query Hydrator

Query hydrators run at the start of the Home Mixer pipeline. They transform a raw request into a "Hydrated Query" that contains the user's engagement history.

1. Define the Hydrator Logic

Create a new module in home-mixer/src/query_hydrators/ that implements the hydration trait. Your hydrator should fetch its data from internal services (such as Strato or a key-value store).

use async_trait::async_trait;
use xai_home_mixer_proto::UserContext;

// `ActionStoreClient` is assumed to be defined elsewhere in the crate and
// brought into scope here.

#[async_trait]
pub trait QueryHydrator {
    async fn hydrate(&self, user_id: i64) -> anyhow::Result<UserContext>;
}

pub struct UserActionHydrator {
    // Client for the internal service that stores user actions
    pub action_store_client: ActionStoreClient,
}

#[async_trait]
impl QueryHydrator for UserActionHydrator {
    async fn hydrate(&self, user_id: i64) -> anyhow::Result<UserContext> {
        // Fetch the last 200 actions (likes, replies, etc.) for the user;
        // `?` propagates a client error to the caller
        let actions = self.action_store_client.get_recent_actions(user_id, 200).await?;

        Ok(UserContext {
            user_id,
            action_sequence: actions,
            // Additional features used by the Phoenix Transformer would be set here.
            // The remaining fields fall back to their defaults, which assumes
            // `UserContext` implements `Default`.
            ..Default::default()
        })
    }
}

2. Register with Home Mixer

Open home-mixer/lib.rs and ensure your hydrator is initialized within the HomeMixerServer.
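
As a rough illustration of what registration could look like, the sketch below keeps a list of boxed hydrators on the server and runs each one when a request arrives. It is a simplified, synchronous stand-in using only the standard library: the register and hydrate_query methods, the name method on the trait, and the in-memory actions_by_user map are all assumptions made for this example, not the actual HomeMixerServer API.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the hydrated user context (hypothetical shape).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct UserContext {
    pub user_id: i64,
    pub action_sequence: Vec<String>,
}

/// Synchronous stand-in for the async hydration trait: each hydrator
/// fills in the part of the context it owns.
pub trait QueryHydrator {
    fn name(&self) -> &'static str;
    fn hydrate(&self, ctx: &mut UserContext) -> Result<(), String>;
}

/// Reads actions from an in-memory map instead of a real service client.
pub struct UserActionHydrator {
    pub actions_by_user: HashMap<i64, Vec<String>>,
}

impl QueryHydrator for UserActionHydrator {
    fn name(&self) -> &'static str {
        "user_action"
    }

    fn hydrate(&self, ctx: &mut UserContext) -> Result<(), String> {
        // Unknown users simply get an empty action sequence.
        ctx.action_sequence = self
            .actions_by_user
            .get(&ctx.user_id)
            .cloned()
            .unwrap_or_default();
        Ok(())
    }
}

/// Hypothetical server that owns the registered query hydrators.
pub struct HomeMixerServer {
    query_hydrators: Vec<Box<dyn QueryHydrator>>,
}

impl HomeMixerServer {
    pub fn new() -> Self {
        Self { query_hydrators: Vec::new() }
    }

    /// Builder-style registration, called once at startup.
    pub fn register(mut self, hydrator: Box<dyn QueryHydrator>) -> Self {
        self.query_hydrators.push(hydrator);
        self
    }

    /// Runs every registered hydrator against a fresh context.
    pub fn hydrate_query(&self, user_id: i64) -> UserContext {
        let mut ctx = UserContext { user_id, ..Default::default() };
        for hydrator in &self.query_hydrators {
            if let Err(err) = hydrator.hydrate(&mut ctx) {
                // Log and continue so one failing hydrator does not fail the request.
                eprintln!("query hydrator {} failed: {}", hydrator.name(), err);
            }
        }
        ctx
    }
}
```

A hydrator that is never registered is never invoked, which is why this step matters: the trait implementation alone has no effect on the pipeline.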

Implementing a Candidate Hydrator

Candidate hydrators run in parallel with one another, and each receives the batch of posts returned by the retrieval stages (Thunder and Phoenix Retrieval).

1. Create the Hydrator

Candidate hydrators focus on fetching post-level metadata, such as the author's ID, media types, and engagement counts.

use std::sync::Arc;

use async_trait::async_trait;
use xai_candidate_pipeline::hydrator::CandidateHydrator;
use xai_home_mixer_proto::CandidatePost;

// `FeatureRepository` is assumed to be defined elsewhere in the crate.
pub struct PostFeatureHydrator {
    pub feature_repo: Arc<FeatureRepository>,
}

#[async_trait]
impl CandidateHydrator<CandidatePost> for PostFeatureHydrator {
    async fn hydrate(&self, candidates: Vec<CandidatePost>) -> Vec<CandidatePost> {
        // Batch fetch features to minimize RPC overhead
        let post_ids: Vec<i64> = candidates.iter().map(|c| c.id).collect();
        // The result is assumed to be a map keyed by post ID
        let features = self.feature_repo.batch_get_features(post_ids).await;

        candidates.into_iter().map(|mut c| {
            // Candidates with no features pass through unchanged
            if let Some(f) = features.get(&c.id) {
                c.author_id = f.author_id;
                c.has_video = f.has_video;
            }
            c
        }).collect()
    }
}

2. Batching and Concurrency

To maintain low latency, the CandidatePipeline automatically handles the distribution of candidates across your hydrators.

Batching: Always implement batch-lookup methods in your clients.
Timeouts: Set strict timeouts for hydration to ensure a slow metadata service doesn't block the entire feed.
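
The two guidelines above can be combined in one helper. The following is a minimal sketch that uses only the standard library: it splits the IDs into chunks, looks each chunk up on its own thread, and stops waiting once a shared deadline passes. The function name batch_lookup_with_timeout, the PostFeatures struct, and the lookup closure are hypothetical; in the async pipeline itself the same idea would more likely be expressed with a runtime facility such as tokio::time::timeout.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical per-post features returned by a metadata service.
#[derive(Debug, Clone, PartialEq)]
pub struct PostFeatures {
    pub author_id: i64,
    pub has_video: bool,
}

/// Looks up `post_ids` in chunks of at most `chunk_size`, one thread per
/// chunk, and waits at most `timeout` overall. Chunks that miss the
/// deadline are left out of the result instead of blocking the caller.
pub fn batch_lookup_with_timeout<F>(
    post_ids: &[i64],
    chunk_size: usize,
    timeout: Duration,
    lookup: F,
) -> HashMap<i64, PostFeatures>
where
    F: Fn(Vec<i64>) -> HashMap<i64, PostFeatures> + Send + Clone + 'static,
{
    let (tx, rx) = mpsc::channel();
    let mut pending = 0usize;

    // `chunks` panics on a size of zero, so clamp to at least one.
    for chunk in post_ids.chunks(chunk_size.max(1)) {
        let tx = tx.clone();
        let lookup = lookup.clone();
        let ids = chunk.to_vec();
        thread::spawn(move || {
            // A send error only means the receiver already gave up.
            let _ = tx.send(lookup(ids));
        });
        pending += 1;
    }
    drop(tx);

    let deadline = Instant::now() + timeout;
    let mut merged = HashMap::new();
    while pending > 0 {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(part) => {
                merged.extend(part);
                pending -= 1;
            }
            // Deadline reached (or all workers gone): return what we have.
            Err(_) => break,
        }
    }
    merged
}
```

Posts missing from the returned map are the ones whose lookup did not finish in time; the caller can leave those candidates with default features.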

Mapping Hydrated Data to Phoenix

The data you fetch during hydration must eventually map to the RecsysBatch used by the JAX/Haiku model. The Phoenix model expects the following keys to be populated during the hydration phase:

Example: Embedding Lookup

The Phoenix model uses RecsysEmbeddings to process the hydrated hashes. Ensure your hydrator provides the exact IDs used in the model's hash tables:

// Inside a candidate hydrator, for each `candidate`
let post_hash = calculate_hash(candidate.id);
// Phoenix uses this hash to look up the post's embedding in its hash tables
candidate.post_hash = post_hash;
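
The body of calculate_hash is not shown in this guide. As an illustration only, the sketch below maps an ID to one or more bucket indices with a deterministic mixing function. The SplitMix64 finalizer, the bucket count, and the multi-hash signature are assumptions made for this example; a real hydrator must reuse the exact hash function and table sizes the model was trained with, otherwise the lookups will land on unrelated embeddings.

```rust
/// Hypothetical embedding-table size used for this example.
pub const NUM_BUCKETS: u64 = 1 << 20;

/// SplitMix64 finalizer: a deterministic 64-bit mixing function.
/// It stands in for whatever hash the model was actually trained with.
fn mix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Maps an ID to `num_hashes` bucket indices, one per seeded hash,
/// each in the range `0..num_buckets`.
pub fn calculate_hashes(id: i64, num_hashes: u64, num_buckets: u64) -> Vec<u64> {
    assert!(num_buckets > 0, "num_buckets must be positive");
    (0..num_hashes)
        // Mixing the seed first gives each hash an independent-looking stream.
        .map(|seed| mix64((id as u64) ^ mix64(seed)) % num_buckets)
        .collect()
}
```

Because the function depends only on its inputs, the same post ID always yields the same indices across processes and restarts, which is the property the embedding lookup relies on.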

Best Practices

Fail Open: If a hydrator fails (e.g., a timeout), the pipeline should continue with default features rather than returning an error to the user.
Deduplication: The Home Mixer orchestration layer automatically deduplicates candidates from different sources before hydration, so do not implement deduplication logic inside a hydrator.
Observability: Use the metrics module in thunder or home-mixer to track the latency of each hydration step. The Phoenix model is computationally expensive, so saving time in hydration is critical for overall performance.
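
The fail-open and observability practices can share one wrapper. The sketch below times a hydration step and substitutes default features when the step returns an error. The names hydrate_fail_open, StepReport, and PostFeatures are hypothetical, and the eprintln! call is a placeholder for a call into the project's metrics module, whose API is not shown here.

```rust
use std::fmt::Display;
use std::time::{Duration, Instant};

/// Hypothetical post-level features; `Default` supplies the fallback values.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PostFeatures {
    pub author_id: i64,
    pub has_video: bool,
}

/// Outcome of one hydration step, including how long it took.
#[derive(Debug)]
pub struct StepReport<T> {
    pub value: T,
    pub elapsed: Duration,
    pub failed: bool,
}

/// Runs `step`, measures its latency, and fails open: on error the
/// default value is returned and the failure is recorded, not propagated.
pub fn hydrate_fail_open<T, E, F>(step_name: &str, step: F) -> StepReport<T>
where
    T: Default,
    E: Display,
    F: FnOnce() -> Result<T, E>,
{
    let start = Instant::now();
    let result = step();
    let elapsed = start.elapsed();

    match result {
        Ok(value) => StepReport { value, elapsed, failed: false },
        Err(err) => {
            // Placeholder for a metrics counter and latency histogram.
            eprintln!("hydrator {} failed after {:?}: {}", step_name, elapsed, err);
            StepReport { value: T::default(), elapsed, failed: true }
        }
    }
}
```

Returning the elapsed time and the failed flag alongside the value lets the caller export both latency and error rate per hydrator without changing the hydrator itself.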