Creating Custom Filters
Filtering is a critical stage in the X recommendation pipeline. While the Phoenix transformer ranks posts based on engagement probability, filters ensure the final feed adheres to safety guidelines, user preferences, and business logic (e.g., removing blocked content or duplicate posts).
This guide walks you through creating and integrating a custom filter within the home-mixer and candidate-pipeline architecture.
Overview of the Filtering Stage
Filters are executed within the Home Mixer orchestration layer. They typically operate on ScoredPost objects after retrieval and ranking. A filter's job is to evaluate a batch of candidates and return a subset that meets specific criteria.
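As an illustration of the "batch in, subset out" contract, here is a minimal sketch of a duplicate-removal filter. It is not the actual home-mixer code: the ScoredPost struct below is a hypothetical stand-in with only two fields, and remove_duplicates is a name introduced for this example. It also assumes the batch arrives in rank order.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for ScoredPost; the real type is not reproduced here.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredPost {
    pub post_id: u64,
    pub score: f64,
}

/// Keeps the first occurrence of each post ID and drops later duplicates.
/// The input order (assumed to be rank order) is preserved.
pub fn remove_duplicates(candidates: Vec<ScoredPost>) -> Vec<ScoredPost> {
    let mut seen = HashSet::new();
    candidates
        .into_iter()
        // `insert` returns false when the ID was already in the set.
        .filter(|post| seen.insert(post.post_id))
        .collect()
}
```

The function takes ownership of the batch and returns a new vector, which is the same shape used by the apply method later in this guide.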
Step 1: Define Your Filter Logic
Filters in the X algorithm are implemented in Rust. To create a custom filter, you define a struct that implements the filtering logic.
In this example, we will create a LanguageFilter that removes posts not matching the user's preferred languages.
Create the Filter Struct
Navigate to home-mixer/src/filters/ (or the equivalent directory in your local setup) and create a new file, e.g., language_filter.rs.
use xai_home_mixer_proto::LightPost;

pub struct LanguageFilter {
    allowed_languages: Vec<String>,
}

impl LanguageFilter {
    pub fn new(languages: Vec<String>) -> Self {
        Self {
            allowed_languages: languages,
        }
    }

    /// Returns true if the post should be kept, false if it should be filtered out.
    pub fn filter_post(&self, post: &LightPost) -> bool {
        // If no language is specified for the post, we keep it by default
        if post.language.is_empty() {
            return true;
        }
        self.allowed_languages.contains(&post.language)
    }
}
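If the allowed list grows, or if language tags may differ in case between sources, a set-backed variant is one option. The following is a minimal sketch under stated assumptions: LightPost is a local stand-in with only a language field (the real generated type is not reproduced), NormalizedLanguageFilter is a hypothetical name, and whether upstream tags need case normalization at all is an assumption.

```rust
use std::collections::HashSet;

/// Local stand-in for the generated LightPost type; only the field used here.
#[derive(Debug, Default, Clone)]
pub struct LightPost {
    pub language: String,
}

/// Hypothetical variant of LanguageFilter backed by a set of lowercase tags.
pub struct NormalizedLanguageFilter {
    allowed_languages: HashSet<String>,
}

impl NormalizedLanguageFilter {
    /// Lowercases each tag once at construction time.
    pub fn new<I, S>(languages: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: AsRef<str>,
    {
        let allowed_languages = languages
            .into_iter()
            .map(|lang| lang.as_ref().to_ascii_lowercase())
            .collect();
        Self { allowed_languages }
    }

    /// Returns true if the post should be kept.
    pub fn filter_post(&self, post: &LightPost) -> bool {
        // Posts without a language tag are kept, matching the filter above.
        if post.language.is_empty() {
            return true;
        }
        self.allowed_languages
            .contains(&post.language.to_ascii_lowercase())
    }
}
```

For a handful of languages the Vec in the original struct is perfectly adequate; the set mainly helps when the list is long.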
Step 2: Implement the Pipeline Integration
Filters must handle batches of posts efficiently. You will typically implement a function that processes a vector of candidates.
impl LanguageFilter {
    pub fn apply(&self, posts: Vec<LightPost>) -> Vec<LightPost> {
        posts
            .into_iter()
            .filter(|post| self.filter_post(post))
            .collect()
    }
}
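When the caller also needs to know how many posts were dropped, an in-place variant is one possibility. This is a sketch, not part of the original filter: apply_in_place is a hypothetical helper, and LightPost is again a local stand-in so the example compiles on its own.

```rust
/// Local stand-in for the generated LightPost type; only the field used here.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct LightPost {
    pub language: String,
}

pub struct LanguageFilter {
    allowed_languages: Vec<String>,
}

impl LanguageFilter {
    pub fn new(languages: Vec<String>) -> Self {
        Self { allowed_languages: languages }
    }

    /// Same rule as in Step 1: untagged posts are kept.
    pub fn filter_post(&self, post: &LightPost) -> bool {
        post.language.is_empty() || self.allowed_languages.contains(&post.language)
    }

    /// Hypothetical helper: filters the batch in place and returns the
    /// number of posts removed. `Vec::retain` preserves the relative
    /// order of the posts that survive.
    pub fn apply_in_place(&self, posts: &mut Vec<LightPost>) -> usize {
        let before = posts.len();
        posts.retain(|post| self.filter_post(post));
        before - posts.len()
    }
}
```

The returned count can be passed to whatever metric the pipeline uses, and the batch is filtered without allocating a second vector.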
Step 3: Register the Filter in Home Mixer
To activate the filter, integrate it into the CandidatePipeline or the HomeMixerServer. This usually happens in home-mixer/lib.rs or in the specific pipeline configuration.
- Open home-mixer/src/filters/mod.rs.
- Expose your new module:
pub mod language_filter;
- Initialize the filter in the pipeline setup logic:
use crate::filters::language_filter::LanguageFilter;
// Inside your pipeline execution logic
let lang_filter = LanguageFilter::new(vec!["en".to_string(), "es".to_string()]);
// Apply the filter to the candidate set
let filtered_candidates = lang_filter.apply(initial_candidates);
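The hard-coded language list is enough for a first integration, but the filter is meant to follow the user's preferred languages. Below is a sketch of building it per request. It rests on assumptions: UserFeatures and its preferred_languages field are hypothetical (the real hydrated user type is not shown in this guide), filter_for_user is a name introduced here, and LightPost is a local stand-in.

```rust
/// Local stand-in for the generated LightPost type; only the field used here.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct LightPost {
    pub language: String,
}

/// Hypothetical pre-hydrated user features; field name is an assumption.
#[derive(Debug, Default, Clone)]
pub struct UserFeatures {
    pub preferred_languages: Vec<String>,
}

pub struct LanguageFilter {
    allowed_languages: Vec<String>,
}

impl LanguageFilter {
    pub fn new(languages: Vec<String>) -> Self {
        Self { allowed_languages: languages }
    }

    pub fn filter_post(&self, post: &LightPost) -> bool {
        post.language.is_empty() || self.allowed_languages.contains(&post.language)
    }

    pub fn apply(&self, posts: Vec<LightPost>) -> Vec<LightPost> {
        posts.into_iter().filter(|post| self.filter_post(post)).collect()
    }
}

/// Applies the language filter for one request.
/// Design choice (an assumption): a user with no stated preference skips
/// the filter entirely, because an empty allow-list would otherwise remove
/// every post that carries a language tag.
pub fn filter_for_user(user: &UserFeatures, candidates: Vec<LightPost>) -> Vec<LightPost> {
    if user.preferred_languages.is_empty() {
        return candidates;
    }
    LanguageFilter::new(user.preferred_languages.clone()).apply(candidates)
}
```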
Step 4: Testing Your Filter
It is essential to verify that your filter correctly handles edge cases, such as empty post lists or missing metadata.
Create a test block in your filter file:
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_language_filtering() {
        let filter = LanguageFilter::new(vec!["en".to_string()]);

        let mut post_en = LightPost::default();
        post_en.language = "en".to_string();

        let mut post_fr = LightPost::default();
        post_fr.language = "fr".to_string();

        assert!(filter.filter_post(&post_en));
        assert!(!filter.filter_post(&post_fr));
    }
}
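The test above covers the basic match and mismatch. The edge cases mentioned at the start of this step (empty post lists, missing language metadata) could be covered along the following lines. The sketch is self-contained, so it redefines a stand-in LightPost and the filter from Steps 1 and 2; in the real filter file only the test functions would be needed. The module and test names are illustrative.

```rust
/// Local stand-in for the generated LightPost type; only the field used here.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct LightPost {
    pub language: String,
}

pub struct LanguageFilter {
    allowed_languages: Vec<String>,
}

impl LanguageFilter {
    pub fn new(languages: Vec<String>) -> Self {
        Self { allowed_languages: languages }
    }

    pub fn filter_post(&self, post: &LightPost) -> bool {
        post.language.is_empty() || self.allowed_languages.contains(&post.language)
    }

    pub fn apply(&self, posts: Vec<LightPost>) -> Vec<LightPost> {
        posts.into_iter().filter(|post| self.filter_post(post)).collect()
    }
}

#[cfg(test)]
mod edge_case_tests {
    use super::*;

    #[test]
    fn empty_batch_stays_empty() {
        let filter = LanguageFilter::new(vec!["en".to_string()]);
        assert!(filter.apply(Vec::new()).is_empty());
    }

    #[test]
    fn post_without_language_is_kept() {
        let filter = LanguageFilter::new(vec!["en".to_string()]);
        // LightPost::default() leaves `language` as an empty string.
        assert!(filter.filter_post(&LightPost::default()));
    }

    #[test]
    fn apply_preserves_order_of_kept_posts() {
        let filter = LanguageFilter::new(vec!["en".to_string(), "es".to_string()]);
        let posts = ["es", "fr", "en"]
            .iter()
            .map(|lang| LightPost { language: lang.to_string() })
            .collect();
        let kept: Vec<String> = filter.apply(posts).into_iter().map(|p| p.language).collect();
        assert_eq!(kept, vec!["es", "en"]);
    }
}
```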
Best Practices for Custom Filters
- Performance: Filters run on every request. Avoid heavy I/O operations (like database lookups) inside the filter_post loop. Use pre-hydrated user features instead.
- Observability: Use the metrics module to track how many posts are removed by your filter. This helps identify if a filter is being too aggressive.
// Example metric increment
metrics::FILTERED_POSTS_COUNT.with_label_values(&["language_filter"]).inc();
- Statelessness: Design filters to be stateless whenever possible to ensure they can be executed in parallel across the candidate set.
- Ordering: Remember that filters are destructive: a post removed by one filter never reaches the next. If you have multiple filters, run the cheapest and most restrictive ones first and the most expensive ones last, so that costly checks see fewer candidates.
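The observability and ordering points can be combined in one sketch: run the filters in a fixed order, cheapest first, and record how many posts each one removes. Everything here is hypothetical and introduced for illustration: the PostFilter trait, the run_filters helper, BlockedAuthorFilter, and the stand-in LightPost with an author_id field. A plain vector of counts stands in for the metrics module.

```rust
use std::collections::HashSet;

/// Local stand-in for the generated LightPost type; fields are assumptions.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct LightPost {
    pub author_id: u64,
    pub language: String,
}

/// Hypothetical common interface so filters can be chained.
pub trait PostFilter {
    fn name(&self) -> &'static str;
    fn keep(&self, post: &LightPost) -> bool;
}

pub struct BlockedAuthorFilter {
    pub blocked: HashSet<u64>,
}

impl PostFilter for BlockedAuthorFilter {
    fn name(&self) -> &'static str {
        "blocked_author_filter"
    }
    fn keep(&self, post: &LightPost) -> bool {
        !self.blocked.contains(&post.author_id)
    }
}

pub struct LanguageFilter {
    pub allowed_languages: Vec<String>,
}

impl PostFilter for LanguageFilter {
    fn name(&self) -> &'static str {
        "language_filter"
    }
    fn keep(&self, post: &LightPost) -> bool {
        post.language.is_empty() || self.allowed_languages.contains(&post.language)
    }
}

/// Runs the filters in slice order and returns the surviving posts plus
/// a (filter name, posts removed) pair for each filter.
pub fn run_filters(
    filters: &[Box<dyn PostFilter>],
    mut posts: Vec<LightPost>,
) -> (Vec<LightPost>, Vec<(&'static str, usize)>) {
    let mut removed = Vec::with_capacity(filters.len());
    for filter in filters {
        let before = posts.len();
        posts.retain(|post| filter.keep(post));
        removed.push((filter.name(), before - posts.len()));
    }
    (posts, removed)
}
```

Because each filter only sees what earlier filters kept, the per-filter counts depend on the order; a filter that reports zero removals late in the chain may still be doing useful work when run first.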