Configuring Pipeline Side Effects

In the X recommendation engine, Side Effects are actions triggered during or after the candidate pipeline execution that do not directly affect the ranking of the current feed. Instead, they are used for external system updates, such as logging engagement data for model training, updating caches, or emitting real-time metrics.

This guide walks you through implementing and configuring side effects within the Home Mixer and Candidate Pipeline.

1. Defining a New Side Effect

Side effects are implemented in the candidate-pipeline and home-mixer services. To create a new side effect, you must implement the execution logic that handles the processed candidates.

Step 1: Implement the Side Effect Logic

In the candidate-pipeline/src/side_effect/ directory, define your side effect logic. Most side effects act upon the ScoredCandidate objects.

// Example: A side effect for logging feature data to Kafka
pub struct FeatureLoggingSideEffect {
    producer: KafkaProducer,
}

impl SideEffect for FeatureLoggingSideEffect {
    fn execute(&self, candidates: &[ScoredPost], metadata: &QueryMetadata) {
        let log_entries = candidates.iter().map(|post| {
            // Transform post data into a log message
            serialize_for_grok_training(post, metadata)
        }).collect();
        
        self.producer.send_batch(log_entries);
    }
}

2. Registering the Side Effect in the Pipeline

Once defined, the side effect must be registered within the Home Mixer orchestration layer so the system knows when to trigger it.

Step 2: Update the Side Effect Registry

In home-mixer/side_effects.rs, add your new implementation to the registry. The registry ensures that the side effects are initialized with the necessary clients (e.g., Kafka producers, Redis clients).

pub fn get_active_side_effects(params: &PipelineParams) -> Vec<Box<dyn SideEffect>> {
    let mut effects = Vec::new();

    if params.enable_feature_logging {
        effects.push(Box::new(FeatureLoggingSideEffect::new()));
    }

    if params.enable_candidate_caching {
        effects.push(Box::new(CandidateCacheSideEffect::new()));
    }

    effects
}

3. Configuring Execution Behavior

Side effects can be executed synchronously (blocking the response) or asynchronously (background tasks).

Step 3: Set Execution Policy

Configure how the pipeline handles the side effect in your configuration files (e.g., params.rs).

Synchronous: Use for critical paths like "Read-through Caching" where the next stage depends on the side effect's completion.
Asynchronous: Use for "Fire-and-forget" actions like logging or metrics. This is the recommended setting for most side effects to minimize latency for the user.

// In home-mixer/params.rs
pub const SIDE_EFFECT_TIMEOUT_MS: u64 = 50;
pub const ASYNC_SIDE_EFFECT_BUFFER_SIZE: usize = 1000;

4. Common Use Cases

Training Data Logging

The most frequent use of side effects in the x-algorithm is logging data for the Phoenix (Grok-based) transformer. When a user is served a feed, a side effect logs the "impressions"—the posts shown and the features associated with them. This data is later joined with user engagement (likes, replies) to retrain the model.

Candidate Caching

To reduce load on the Thunder (In-Network) and Phoenix Retrieval (Out-of-Network) sources, side effects can be used to cache retrieved candidates for a specific user.

// Example: Side effect to cache candidates for subsequent pagination
impl SideEffect for CandidateCacheSideEffect {
    fn execute(&self, candidates: &[ScoredPost], metadata: &QueryMetadata) {
        self.redis_client.set_ex(
            format!("feed_cache:{}", metadata.user_id),
            candidates,
            300 // 5 minute TTL
        );
    }
}

5. Monitoring Side Effects

Since side effects often interact with external infrastructure (Kafka, Memcached, etc.), it is critical to monitor their health.

Latency: Monitor the time taken for execute() to complete.
Drop Rate: If using asynchronous side effects, monitor the buffer fill rate. If the buffer is full, the system will drop side effects to prioritize serving the feed.
Failure Count: Track gRPC or network errors when the side effect communicates with external services.

Metrics are automatically exported via the metrics_port defined in main.rs and can be visualized in your standard dashboards under the home_mixer_side_effect_* namespace.