Profiling System Performance
Maintaining a low-latency "For You" feed requires identifying bottlenecks in the retrieval and ranking pipeline. The x-algorithm codebase provides built-in support for profiling both the Rust-based infrastructure (Thunder, Home Mixer) and the Python-based transformer models (Phoenix).
This guide walks you through enabling the profiling servers and capturing performance traces.
Enabling the Profiling Server
The Rust services include an optional profiling server that can be toggled via CLI arguments. When enabled, it provides an endpoint for CPU and memory profiling.
1. Start Thunder with Profiling
To profile the in-network retrieval service (Thunder), use the `--enable-profiling` flag. By default, this spawns a profiling server on port 3000.
```shell
./target/release/thunder \
  --grpc-port 5001 \
  --http-port 8080 \
  --enable-profiling \
  --is-serving
```
2. Configure Home Mixer Metrics
Home Mixer uses a dedicated metrics port to export telemetry data that can be consumed by Prometheus or visualized via Grafana.
```shell
./target/release/home-mixer \
  --grpc-port 5002 \
  --metrics-port 9090 \
  --reload-interval-minutes 60
```
Profiling the Rust Pipeline
Once the services are running with profiling enabled, you can use standard tools like `pprof` or the internal `xai_profiling` interface to capture data.
Capturing a CPU Profile
While the system is under load (e.g., during a load test or a high-traffic period), capture a 30-second CPU profile:
- Identify the endpoint: the profiling server usually listens at `http://localhost:3000/debug/pprof/profile`.
- Generate the trace:

  ```shell
  # Capture a 30-second profile and save it to a file
  curl -o cpu_profile.pb.gz "http://localhost:3000/debug/pprof/profile?seconds=30"
  ```

- Visualize: use `pprof` to generate a flame graph:

  ```shell
  pprof -http=:8081 cpu_profile.pb.gz
  ```
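The capture step is also easy to script, for example to grab profiles at regular intervals during a load test. A minimal sketch, assuming only the endpoint above; the helper name is ours:

```python
from urllib.parse import urlencode

def pprof_capture_url(host: str, port: int, seconds: int) -> str:
    """Build the pprof CPU-profile capture URL for a given duration."""
    query = urlencode({"seconds": seconds})
    return f"http://{host}:{port}/debug/pprof/profile?{query}"

url = pprof_capture_url("localhost", 3000, 30)
print(url)  # http://localhost:3000/debug/pprof/profile?seconds=30

# Download the profile (requires the service to be running with --enable-profiling):
# import urllib.request
# urllib.request.urlretrieve(url, "cpu_profile.pb.gz")
```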
Identifying Memory Bottlenecks
In Thunder, the `PostStore` maintains a large in-memory cache of recent posts. If you notice high memory usage, check the `PostStore` stats logger, which is started automatically in serving mode:
- Look for "PostStore stats" in the application logs.
- Monitor the `GET_IN_NETWORK_POSTS_EXCLUDED_SIZE` metric to see whether too many candidates are being filtered out early, which indicates inefficient retrieval.
Profiling Phoenix (JAX/Python)
The Phoenix ranking and retrieval models are built using JAX and Haiku. Performance issues here usually stem from XLA compilation or GPU/TPU utilization.
1. Using the JAX Profiler
To profile the `RecsysModel` during inference, use the `jax.profiler` API. This lets you see the execution time of specific transformer layers.
```python
import jax

# Option 1: start the profiler server so a client (e.g. TensorBoard)
# can connect and capture a trace on demand
jax.profiler.start_server(port=9999)

# Option 2: annotate a specific execution block so it appears
# as a named region in the captured trace
with jax.profiler.TraceAnnotation("inference_step"):
    output = model.apply(params, batch_embeddings)
```
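One caveat when timing JAX code by hand: dispatch is asynchronous, so wall-clock timing around `model.apply` alone measures dispatch, not device execution. A minimal helper that waits for the result before stopping the clock (illustrative; it works with any result exposing `block_until_ready`):

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run fn, forcing JAX async dispatch to complete before stopping the clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    # JAX arrays expose block_until_ready(); wait for the device computation
    # to finish so the measurement covers execution, not just dispatch.
    if hasattr(result, "block_until_ready"):
        result.block_until_ready()
    elapsed = time.perf_counter() - start
    return result, elapsed

# Usage (model, params, batch_embeddings as in the snippet above):
# output, seconds = timed_call(model.apply, params, batch_embeddings)
```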
2. Analyzing XLA Compilation
If the first request to the "For You" feed is extremely slow, it is likely due to XLA compilation. To debug this:
- Set the environment variable `XLA_FLAGS="--xla_dump_to=/tmp/xla_dump"`.
- Run the model.
- Analyze the generated HLO files to identify expensive operations or fusion failures in the transformer block.
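A dump directory can contain hundreds of files, so as a rough triage it helps to sort them by size and inspect the heavyweight modules first. File naming inside the dump is XLA-internal; this sketch only looks at sizes:

```python
import os

def largest_dump_files(dump_dir: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n largest files in an XLA dump directory as (path, bytes)."""
    sizes = []
    for root, _dirs, files in os.walk(dump_dir):
        for name in files:
            path = os.path.join(root, name)
            sizes.append((path, os.path.getsize(path)))
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes[:top_n]

# Usage after a run with XLA_FLAGS="--xla_dump_to=/tmp/xla_dump":
# for path, size in largest_dump_files("/tmp/xla_dump"):
#     print(f"{size:>10} {path}")
```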
Monitoring Real-time Metrics
Both Thunder and Home Mixer export high-level metrics that should be monitored during profiling to correlate performance spikes with specific system behaviors.
| Metric | Description | Service |
| :--- | :--- | :--- |
| `GET_IN_NETWORK_POSTS_DURATION` | Total latency for Thunder retrieval. | Thunder |
| `IN_FLIGHT_REQUESTS` | Number of concurrent requests being processed. | Thunder |
| `BATCH_PROCESSING_TIME` | Time spent deserializing Kafka tweet events. | Thunder |
| `REJECTED_REQUESTS` | Requests dropped due to the semaphore limit. | Thunder |
Adjusting Concurrency Limits
If you see a high number of `REJECTED_REQUESTS` despite low CPU usage, you may need to increase the `--max-concurrent-requests` argument in Thunder. This setting controls the semaphore that prevents service exhaustion during retrieval spikes.
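The load-shedding mechanism itself can be illustrated with a small sketch. This is not Thunder's actual Rust implementation, just the semaphore-based admission pattern expressed in Python:

```python
import asyncio

class AdmissionControl:
    """Reject requests beyond a concurrency cap (illustrative sketch of the
    mechanism behind --max-concurrent-requests, not the real implementation)."""

    def __init__(self, max_concurrent: int):
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self.rejected = 0  # analogous to a REJECTED_REQUESTS counter

    async def handle(self, work):
        if self._semaphore.locked():
            # All permits in use: shed load instead of queueing.
            self.rejected += 1
            return None
        async with self._semaphore:
            return await work()

async def demo():
    ac = AdmissionControl(max_concurrent=2)

    async def slow_request():
        await asyncio.sleep(0.05)
        return "ok"

    results = await asyncio.gather(*(ac.handle(slow_request) for _ in range(5)))
    return results, ac.rejected

results, rejected = asyncio.run(demo())
print(rejected)  # 3 of the 5 concurrent requests were shed
```

Rejecting up front rather than queueing keeps tail latency bounded: callers get a fast failure they can retry instead of waiting behind a growing backlog.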