Configuring Pipeline Side Effects
In the X recommendation engine, Side Effects are actions triggered during or after the candidate pipeline execution that do not directly affect the ranking of the current feed. Instead, they are used for external system updates, such as logging engagement data for model training, updating caches, or emitting real-time metrics.
This guide walks you through implementing and configuring side effects within the Home Mixer and Candidate Pipeline.
1. Defining a New Side Effect
Side effects are implemented in the candidate-pipeline and home-mixer services. To create a new side effect, you must implement the execution logic that handles the processed candidates.
Step 1: Implement the Side Effect Logic
In the candidate-pipeline/src/side_effect/ directory, define your side effect logic. Most side effects act upon the ScoredCandidate objects.
// Example: A side effect for logging feature data to Kafka
pub struct FeatureLoggingSideEffect {
producer: KafkaProducer,
}
impl SideEffect for FeatureLoggingSideEffect {
fn execute(&self, candidates: &[ScoredPost], metadata: &QueryMetadata) {
let log_entries = candidates.iter().map(|post| {
// Transform post data into a log message
serialize_for_grok_training(post, metadata)
}).collect();
self.producer.send_batch(log_entries);
}
}
2. Registering the Side Effect in the Pipeline
Once defined, the side effect must be registered within the Home Mixer orchestration layer so the system knows when to trigger it.
Step 2: Update the Side Effect Registry
In home-mixer/side_effects.rs, add your new implementation to the registry. The registry ensures that the side effects are initialized with the necessary clients (e.g., Kafka producers, Redis clients).
pub fn get_active_side_effects(params: &PipelineParams) -> Vec<Box<dyn SideEffect>> {
let mut effects = Vec::new();
if params.enable_feature_logging {
effects.push(Box::new(FeatureLoggingSideEffect::new()));
}
if params.enable_candidate_caching {
effects.push(Box::new(CandidateCacheSideEffect::new()));
}
effects
}
3. Configuring Execution Behavior
Side effects can be executed synchronously (blocking the response) or asynchronously (background tasks).
Step 3: Set Execution Policy
Configure how the pipeline handles the side effect in your configuration files (e.g., params.rs).
- Synchronous: Use for critical paths like "Read-through Caching" where the next stage depends on the side effect's completion.
- Asynchronous: Use for "Fire-and-forget" actions like logging or metrics. This is the recommended setting for most side effects to minimize latency for the user.
// In home-mixer/params.rs
pub const SIDE_EFFECT_TIMEOUT_MS: u64 = 50;
pub const ASYNC_SIDE_EFFECT_BUFFER_SIZE: usize = 1000;
4. Common Use Cases
Training Data Logging
The most frequent use of side effects in the x-algorithm is logging data for the Phoenix (Grok-based) transformer. When a user is served a feed, a side effect logs the "impressions"—the posts shown and the features associated with them. This data is later joined with user engagement (likes, replies) to retrain the model.
Candidate Caching
To reduce load on the Thunder (In-Network) and Phoenix Retrieval (Out-of-Network) sources, side effects can be used to cache retrieved candidates for a specific user.
// Example: Side effect to cache candidates for subsequent pagination
impl SideEffect for CandidateCacheSideEffect {
fn execute(&self, candidates: &[ScoredPost], metadata: &QueryMetadata) {
self.redis_client.set_ex(
format!("feed_cache:{}", metadata.user_id),
candidates,
300 // 5 minute TTL
);
}
}
5. Monitoring Side Effects
Since side effects often interact with external infrastructure (Kafka, Memcached, etc.), it is critical to monitor their health.
- Latency: Monitor the time taken for
execute()to complete. - Drop Rate: If using asynchronous side effects, monitor the buffer fill rate. If the buffer is full, the system will drop side effects to prioritize serving the feed.
- Failure Count: Track gRPC or network errors when the side effect communicates with external services.
Metrics are automatically exported via the metrics_port defined in main.rs and can be visualized in your standard dashboards under the home_mixer_side_effect_* namespace.