Adding New Candidate Sources
Adding a new candidate source allows you to inject content from new providers—such as a specialized news service, a niche media vertical, or a new ML retrieval model—directly into the Home Mixer orchestration layer.
In the X recommendation architecture, candidate sources are responsible for providing an initial pool of posts that the Phoenix ranker then scores and filters.
Overview
Integrating a new source involves three primary steps (an optional fourth step, candidate hydration, is covered afterwards):
- Defining the Source Logic: Creating a module to handle the request to your content provider.
- Implementing the Source Trait: Standardizing how candidates are returned to the pipeline.
- Registering with Home Mixer: Adding the source to the orchestration logic in home-mixer/lib.rs.
Step 1: Define the Source Logic
New sources should be implemented within the home-mixer/sources/ directory. Your source needs to communicate with your third-party provider, typically via gRPC or a fast KV store.
Create a new file in that directory, e.g., home-mixer/sources/my_new_source.rs:
use async_trait::async_trait;
use xai_home_mixer_proto as pb;
use crate::sources::CandidateSource;

pub struct MyNewSource {
    client: MyThirdPartyClient, // Your gRPC or API client
}

impl MyNewSource {
    pub fn new(client: MyThirdPartyClient) -> Self {
        Self { client }
    }
}
Step 2: Implement the CandidateSource Trait
The CandidateSource trait ensures your source can be plugged into the asynchronous candidate pipeline. Your implementation must return a vector of candidate posts, each of which includes at least the post_id and the author_id.
#[async_trait]
impl CandidateSource for MyNewSource {
    async fn fetch_candidates(
        &self,
        query: &pb::HomeMixerRequest,
    ) -> anyhow::Result<Vec<pb::CandidatePost>> {
        // 1. Extract user features from the request
        let user_id = query.user_id;

        // 2. Fetch data from your third-party provider
        let response = self.client.get_recommendations(user_id).await?;

        // 3. Map the response to the Home Mixer candidate format
        let candidates = response.items.into_iter().map(|item| {
            pb::CandidatePost {
                post_id: item.id,
                author_id: item.author_id,
                source_id: "my_new_source".to_string(),
                ..Default::default()
            }
        }).collect();

        Ok(candidates)
    }
}
Step 3: Register the Source in Home Mixer
Once the source logic is ready, you must register it in the HomeMixerServer. This happens in home-mixer/lib.rs (or your server initialization logic).
- Initialize the client: Add your third-party client to the server struct.
- Add to the Pipeline: Include the source in the candidate_pipeline execution list.
use std::sync::Arc;

// Inside HomeMixerServer::new()
pub async fn new() -> Self {
    // Assumes `connect` returns a Result; fail fast at startup if the provider is unreachable.
    let my_client = MyThirdPartyClient::connect("http://my-service:50051")
        .await
        .expect("failed to connect to my-service");
    let my_source = MyNewSource::new(my_client);

    // Add to the list of active sources
    let sources: Vec<Arc<dyn CandidateSource>> = vec![
        Arc::new(ThunderSource::new()), // In-Network
        Arc::new(PhoenixSource::new()), // Out-of-Network
        Arc::new(my_source),            // Your new source
    ];

    Self { sources, ..Default::default() }
}
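To make the registration step concrete, here is a minimal, self-contained sketch of how a list of registered sources might be queried and merged. It is not the Home Mixer implementation: `Candidate`, `Source`, `StaticSource`, `FailingSource`, and `gather` are hypothetical names, and the trait is synchronous so the example runs with only the standard library, whereas the real CandidateSource trait is async.

```rust
use std::sync::Arc;

/// Simplified stand-in for `pb::CandidatePost` (hypothetical; the real type has more fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub post_id: u64,
    pub author_id: u64,
    pub source_id: String,
}

/// Synchronous stand-in for the async `CandidateSource` trait (assumption for this sketch).
pub trait Source: Send + Sync {
    fn fetch(&self, user_id: u64) -> Result<Vec<Candidate>, String>;
}

/// Hypothetical source that returns a fixed set of post IDs.
pub struct StaticSource {
    pub name: &'static str,
    pub post_ids: Vec<u64>,
}

impl Source for StaticSource {
    fn fetch(&self, _user_id: u64) -> Result<Vec<Candidate>, String> {
        Ok(self
            .post_ids
            .iter()
            .map(|&post_id| Candidate {
                post_id,
                author_id: 0,
                source_id: self.name.to_string(),
            })
            .collect())
    }
}

/// Hypothetical source that always fails, used to show the fallback path.
pub struct FailingSource;

impl Source for FailingSource {
    fn fetch(&self, _user_id: u64) -> Result<Vec<Candidate>, String> {
        Err("provider unavailable".to_string())
    }
}

/// Query every registered source in order and merge the results.
/// A source that returns an error contributes no candidates.
pub fn gather(sources: &[Arc<dyn Source>], user_id: u64) -> Vec<Candidate> {
    let mut pool = Vec::new();
    for source in sources {
        match source.fetch(user_id) {
            Ok(mut found) => pool.append(&mut found),
            Err(err) => eprintln!("candidate source failed: {err}"),
        }
    }
    pool
}
```

Because every source sits behind the same trait object, adding a new one is a one-line change to the `vec![...]` that builds the list, which is the property the registration step relies on.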
Step 4: Add Candidate Hydration (Optional)
If your new source only returns IDs, but the ranking model requires more data (like the full text or media types), you must ensure your candidates pass through the Hydrator stage.
In home-mixer/candidate_hydrators/, ensure your candidates are being sent to the QueryHydrator. This component fills in the missing features needed for the Phoenix ranking model:
// Example of ensuring source compatibility in the hydrator
pub async fn hydrate_batch(batch: Vec<CandidatePost>) {
    // The Phoenix transformer requires hashes for ranking
    // This step is handled automatically if the source_id is registered
}
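The stub above leaves the hydration logic out. As a rough illustration of what filling in missing fields can look like, the following sketch hydrates ID-only candidates from an in-memory lookup. Everything here is an assumption made for the example: the `Candidate` and `PostRecord` types, the `HashMap` standing in for a post store, and the choice to drop candidates that have no record. The actual hydrators in Home Mixer may behave differently.

```rust
use std::collections::HashMap;

/// Hypothetical candidate whose optional fields may be missing when it leaves the source.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub post_id: u64,
    pub text: Option<String>,
    pub has_media: Option<bool>,
}

/// Hypothetical record held by the post store.
pub struct PostRecord {
    pub text: String,
    pub has_media: bool,
}

/// Fill in any missing fields from `store`.
/// Candidates with no matching record are dropped (a design choice of this sketch).
pub fn hydrate_batch(batch: Vec<Candidate>, store: &HashMap<u64, PostRecord>) -> Vec<Candidate> {
    batch
        .into_iter()
        .filter_map(|mut candidate| {
            // `?` returns None from the closure, which removes the candidate.
            let record = store.get(&candidate.post_id)?;
            if candidate.text.is_none() {
                candidate.text = Some(record.text.clone());
            }
            if candidate.has_media.is_none() {
                candidate.has_media = Some(record.has_media);
            }
            Some(candidate)
        })
        .collect()
}
```

Fields that the source already populated are left untouched, so a source that returns richer data does not pay for a second lookup overwriting it.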
Best Practices
- Set Timeouts: Third-party providers should never block the main feed. Always wrap your source requests in a timeout (typically < 50ms).
- Handle Failures Gracefully: If your source fails, return an empty Vec. The Home Mixer is designed to continue with other sources (like Thunder or Phoenix) rather than returning an error to the user.
- Limit Batch Size: Only return the top 100–200 candidates from your source. Providing too many candidates can degrade the performance of the ranking stage.
- Monitoring: Add a unique label for your source_id in the metrics collector so you can track the "survival rate" of your candidates through the scoring and filtering stages.
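The practices above can be combined in a small wrapper. The sketch below is standard-library only and uses a worker thread with `recv_timeout` to enforce the deadline; in the async pipeline, `tokio::time::timeout` would be the more natural tool. The names `fetch_bounded` and `survival_rate` are hypothetical, candidates are reduced to bare post IDs, and the budget and cap are passed in by the caller rather than taken from any real configuration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run `fetch` on a worker thread and wait at most `budget` for its result.
/// A timeout or an error yields an empty Vec, and a successful result is
/// truncated to `max_candidates`.
pub fn fetch_bounded<F>(fetch: F, budget: Duration, max_candidates: usize) -> Vec<u64>
where
    F: FnOnce() -> Result<Vec<u64>, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already have given up; a failed send is ignored.
        let _ = tx.send(fetch());
    });

    match rx.recv_timeout(budget) {
        Ok(Ok(mut ids)) => {
            ids.truncate(max_candidates);
            ids
        }
        // Provider error, timeout, or a worker that exited without sending:
        // fall back to no candidates.
        Ok(Err(_)) | Err(_) => Vec::new(),
    }
}

/// Fraction of a source's candidates still present after scoring and filtering.
/// Returns 0.0 when the source returned nothing.
pub fn survival_rate(returned: usize, survived: usize) -> f64 {
    if returned == 0 {
        0.0
    } else {
        survived as f64 / returned as f64
    }
}
```

Note that this thread-based version does not cancel the slow request; it only stops waiting for it. An async timeout drops the pending future instead, which is one reason to prefer it in the real pipeline.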