Implementing Custom Post Filters
Post filtering is a critical stage in the recommendation pipeline that occurs after ranking. While the Phoenix model determines the relevance of a post, filters apply business logic and safety rules (such as removing duplicate content, hiding blocked authors, or enforcing age restrictions) before the feed is returned to the user.
In the x-algorithm repository, filters are primarily managed within the Home Mixer and Candidate Pipeline components.
Overview of the Filter Interface
Filters are implemented as modular units within the Rust-based orchestration layer. A filter typically takes a list of candidate posts and the user's context as input, returning a filtered list or a bitmask of allowed posts.
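To make the "filtered list or bitmask" contract concrete, here is a minimal sketch of what such an interface could look like. The `PostFilter` trait, the `HideSensitive` filter, and the fields on `LightPost` and `FilterContext` are all hypothetical stand-ins for illustration; the actual trait in candidate-pipeline/src/filter.rs may be shaped differently.

```rust
// Hypothetical stand-ins for the real types; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct LightPost {
    pub post_id: u64,
    pub is_sensitive: bool,
}

pub struct FilterContext {
    pub user_age: i32,
}

/// Hypothetical filter interface, not the trait defined in the repository.
pub trait PostFilter {
    /// One flag per candidate, in input order: `true` means "keep".
    fn allowed_mask(&self, context: &FilterContext, posts: &[LightPost]) -> Vec<bool>;

    /// Derives the filtered list from the mask, preserving input order.
    fn apply(&self, context: &FilterContext, posts: Vec<LightPost>) -> Vec<LightPost> {
        let mask = self.allowed_mask(context, &posts);
        posts
            .into_iter()
            .zip(mask)
            .filter(|(_, keep)| *keep)
            .map(|(post, _)| post)
            .collect()
    }
}

/// Example filter: drops sensitive posts regardless of the user.
pub struct HideSensitive;

impl PostFilter for HideSensitive {
    fn allowed_mask(&self, _context: &FilterContext, posts: &[LightPost]) -> Vec<bool> {
        posts.iter().map(|post| !post.is_sensitive).collect()
    }
}
```

In this sketch, a mask lets the caller combine several filters or log which candidates were rejected, while the default `apply` covers the common case of simply wanting the surviving posts.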
Key Locations
- Home Mixer Filters: home-mixer/filters/
- Candidate Pipeline Logic: candidate-pipeline/src/filter.rs
Step 1: Create a New Filter
To implement a custom filter, you need to define a struct that implements the filtering logic. For this guide, we will create an AgeGatedFilter that removes sensitive content for users under 18.
Define the Filter Structure
Create a new file or add to the existing filters module:
    use xai_home_mixer_proto::LightPost;
    use crate::filters::FilterContext;

    /// Removes sensitive posts for users below a minimum age.
    pub struct AgeGatedFilter {
        min_age_requirement: i32,
    }

    impl AgeGatedFilter {
        pub fn new(min_age: i32) -> Self {
            Self { min_age_requirement: min_age }
        }
    }
Step 2: Implement the Filtering Logic
The filtering logic evaluates each LightPost against the FilterContext (which contains user metadata).
    impl AgeGatedFilter {
        pub fn filter(&self, context: &FilterContext, posts: Vec<LightPost>) -> Vec<LightPost> {
            // If the user meets the age requirement, no filtering is needed
            if context.user_age >= self.min_age_requirement {
                return posts;
            }

            // Filter out posts marked as sensitive/NSFW
            posts
                .into_iter()
                .filter(|post| !post.is_sensitive)
                .collect()
        }
    }
Step 3: Register the Filter in the Home Mixer
Once the filter is defined, it must be added to the execution pipeline within the HomeMixerServer. This usually happens in the orchestration layer where candidate sets are processed.
Locate the filter registry in home-mixer/lib.rs or the server.rs implementation:
    // Inside your service orchestration logic
    let age_filter = AgeGatedFilter::new(18);

    // Apply the filter to the retrieved candidates
    let final_posts = age_filter.filter(&user_context, ranked_candidates);
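When more than one filter is registered, they are typically applied one after another. The following is a minimal sketch of such a chain, not the registry used by the Home Mixer: `PostFilter`, `FilterChain`, and the simplified `LightPost` and `FilterContext` types are hypothetical and exist only to make the example self-contained.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for the real types; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct LightPost {
    pub post_id: u64,
    pub is_sensitive: bool,
}

pub struct FilterContext {
    pub user_age: i32,
}

/// Hypothetical common interface so filters can be stored together.
pub trait PostFilter {
    fn filter(&self, context: &FilterContext, posts: Vec<LightPost>) -> Vec<LightPost>;
}

pub struct AgeGatedFilter {
    min_age_requirement: i32,
}

impl AgeGatedFilter {
    pub fn new(min_age: i32) -> Self {
        Self { min_age_requirement: min_age }
    }
}

impl PostFilter for AgeGatedFilter {
    fn filter(&self, context: &FilterContext, posts: Vec<LightPost>) -> Vec<LightPost> {
        if context.user_age >= self.min_age_requirement {
            return posts;
        }
        posts.into_iter().filter(|post| !post.is_sensitive).collect()
    }
}

pub struct DeduplicationFilter;

impl PostFilter for DeduplicationFilter {
    fn filter(&self, _context: &FilterContext, posts: Vec<LightPost>) -> Vec<LightPost> {
        let mut seen_ids = HashSet::new();
        // `insert` returns false for an ID that is already present.
        posts.into_iter().filter(|post| seen_ids.insert(post.post_id)).collect()
    }
}

/// Hypothetical registry: runs filters in the order they were added.
pub struct FilterChain {
    filters: Vec<Box<dyn PostFilter>>,
}

impl FilterChain {
    pub fn new() -> Self {
        Self { filters: Vec::new() }
    }

    pub fn with(mut self, filter: impl PostFilter + 'static) -> Self {
        self.filters.push(Box::new(filter));
        self
    }

    /// Feeds the output of each filter into the next one.
    pub fn run(&self, context: &FilterContext, posts: Vec<LightPost>) -> Vec<LightPost> {
        self.filters
            .iter()
            .fold(posts, |remaining, f| f.filter(context, remaining))
    }
}
```

With this shape, registration becomes `FilterChain::new().with(DeduplicationFilter).with(AgeGatedFilter::new(18))`, and the order of the `with` calls is the order of execution.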
Practical Example: Content Deduplication
Deduplication filters are often used to ensure a user doesn't see the same post multiple times if it was retrieved by both Thunder (In-Network) and Phoenix (Out-of-Network) sources.
Implementation
This filter tracks seen IDs and removes duplicates from the stream.
    use std::collections::HashSet;
    use xai_home_mixer_proto::LightPost;

    pub struct DeduplicationFilter;

    impl DeduplicationFilter {
        /// Keeps the first occurrence of each post ID and preserves input order.
        pub fn filter(posts: Vec<LightPost>) -> Vec<LightPost> {
            let mut seen_ids = HashSet::new();
            posts
                .into_iter()
                .filter(|post| {
                    // Returns true if the ID was not already present
                    seen_ids.insert(post.post_id)
                })
                .collect()
        }
    }
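Because the first occurrence of an ID wins, the order in which sources are merged decides which copy of a duplicated post survives. The sketch below illustrates this with two candidate lists; `merge_and_dedup` and the one-field `LightPost` are hypothetical, and whether the real pipeline prefers the in-network copy is an assumption made for the example.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the real type; only the ID matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct LightPost {
    pub post_id: u64,
}

/// Concatenates both sources (in-network first) and keeps the first
/// occurrence of each post ID, preserving the merged order.
pub fn merge_and_dedup(
    in_network: Vec<LightPost>,
    out_of_network: Vec<LightPost>,
) -> Vec<LightPost> {
    let mut seen_ids = HashSet::new();
    in_network
        .into_iter()
        .chain(out_of_network)
        .filter(|post| seen_ids.insert(post.post_id))
        .collect()
}
```

If post 2 is returned by both sources, the copy from `in_network` is kept and the later copy from `out_of_network` is dropped.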
Best Practices for Custom Filters
- Performance First: Filters run on every request. Avoid making external network calls (like DB lookups) inside a filter. Use the Query Hydration stage to fetch all necessary user data into the FilterContext beforehand.
- Order Matters: Place the most selective filters (those that remove the most posts, like "Blocked Authors") early in the chain to reduce the workload for subsequent, more complex filters.
- Stat Reporting: Use the metrics module to track how many posts each filter removes. This is vital for debugging "empty feed" issues.
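    // Example of reporting filter drops.
    // dropped_count is the number of posts this filter removed
    // (input length minus output length).
    metrics::FILTER_DROPPED_POSTS.with_label_values(&["age_gated"]).inc_by(dropped_count);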
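The drop count itself can be computed by comparing the candidate count before and after a filter runs. The following sketch uses only the standard library so it stays self-contained: `DropStats` is a hypothetical in-memory counter standing in for the metrics module, and the `LightPost` fields are assumptions.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real type; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct LightPost {
    pub post_id: u64,
    pub is_sensitive: bool,
}

/// Hypothetical in-memory counter, keyed by filter label.
#[derive(Default)]
pub struct DropStats {
    dropped: HashMap<&'static str, usize>,
}

impl DropStats {
    /// Runs `filter` on `posts` and records how many posts it removed.
    pub fn record<F>(&mut self, label: &'static str, posts: Vec<LightPost>, filter: F) -> Vec<LightPost>
    where
        F: FnOnce(Vec<LightPost>) -> Vec<LightPost>,
    {
        let before = posts.len();
        let kept = filter(posts);
        // saturating_sub guards against a filter that returns more posts than it got.
        let dropped_count = before.saturating_sub(kept.len());
        *self.dropped.entry(label).or_insert(0) += dropped_count;
        kept
    }

    /// Total posts dropped so far under `label` (0 if never recorded).
    pub fn dropped(&self, label: &str) -> usize {
        self.dropped.get(label).copied().unwrap_or(0)
    }
}
```

Wrapping each filter call this way keeps the counting logic in one place, so a filter that suddenly drops every candidate shows up under its own label.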
Testing Your Filter
Add unit tests to ensure your logic handles edge cases, such as empty candidate lists or missing user metadata.
    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn test_age_gated_filter_removes_sensitive_content() {
            let filter = AgeGatedFilter::new(18);
            let context = FilterContext { user_age: 16 };
            let posts = vec![
                LightPost { post_id: 1, is_sensitive: false },
                LightPost { post_id: 2, is_sensitive: true },
            ];

            let filtered = filter.filter(&context, posts);

            assert_eq!(filtered.len(), 1);
            assert_eq!(filtered[0].post_id, 1);
        }
    }
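The edge cases mentioned above can be covered in the same way. The "missing user metadata" case only arises if the age can be absent, so the sketch below assumes a hypothetical variant of `FilterContext` in which `user_age` is an `Option<i32>`, and it treats an unknown age as under the limit (fail closed). Both the type change and that policy are assumptions, not behavior confirmed by the repository.

```rust
// Hypothetical stand-ins for the real types; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct LightPost {
    pub post_id: u64,
    pub is_sensitive: bool,
}

/// Assumed variant where the age may be missing.
pub struct FilterContext {
    pub user_age: Option<i32>,
}

pub struct AgeGatedFilter {
    min_age_requirement: i32,
}

impl AgeGatedFilter {
    pub fn new(min_age: i32) -> Self {
        Self { min_age_requirement: min_age }
    }

    pub fn filter(&self, context: &FilterContext, posts: Vec<LightPost>) -> Vec<LightPost> {
        // Fail closed: a missing age is treated as not meeting the requirement.
        let meets_requirement = context
            .user_age
            .map_or(false, |age| age >= self.min_age_requirement);
        if meets_requirement {
            return posts;
        }
        posts.into_iter().filter(|post| !post.is_sensitive).collect()
    }
}

#[cfg(test)]
mod edge_case_tests {
    use super::*;

    #[test]
    fn empty_candidate_list_stays_empty() {
        let filter = AgeGatedFilter::new(18);
        let context = FilterContext { user_age: Some(16) };
        assert!(filter.filter(&context, Vec::new()).is_empty());
    }

    #[test]
    fn missing_age_removes_sensitive_content() {
        let filter = AgeGatedFilter::new(18);
        let context = FilterContext { user_age: None };
        let posts = vec![
            LightPost { post_id: 1, is_sensitive: false },
            LightPost { post_id: 2, is_sensitive: true },
        ];
        let filtered = filter.filter(&context, posts);
        assert_eq!(filtered.len(), 1);
        assert_eq!(filtered[0].post_id, 1);
    }
}
```

Failing closed means a hydration failure cannot expose sensitive posts to a minor, at the cost of over-filtering for adults whose age could not be loaded.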