Ranking with Grok Transformers
The final stage of the recommendation pipeline uses Phoenix, a ranking model built on the Grok-1 transformer architecture. Unlike traditional recommendation systems that rely on thousands of hand-engineered features, Phoenix uses a sequence-based transformer approach to understand the context of user engagement.
This guide walks you through configuring the Phoenix model, preparing input data, and executing a ranking pass.
1. Configure the Transformer Architecture
The Phoenix model is highly configurable via the TransformerConfig and PhoenixRankingModelConfig classes. You must define the dimensions of the embedding space and the depth of the transformer layers.
```python
from grok import TransformerConfig
from phoenix.recsys_model import PhoenixRankingModelConfig

# Define the transformer hyperparameters
model_config = TransformerConfig(
    emb_size=768,         # Embedding dimension
    num_layers=12,        # Number of transformer blocks
    num_q_heads=12,       # Number of query heads
    num_kv_heads=4,       # Number of KV heads (for Grouped Query Attention)
    widening_factor=4,    # Expansion factor for the MLP block
    attn_output_multiplier=0.125,
)

# Apply to the Ranking configuration
ranking_config = PhoenixRankingModelConfig(
    model=model_config,
    history_seq_len=128,    # Maximum number of past actions to consider
    candidate_seq_len=200,  # Number of posts to rank per request
)
```
2. Prepare the Input Batch
Phoenix processes data using two main structures: RecsysBatch (raw feature hashes) and RecsysEmbeddings (vectors retrieved from feature stores).
Identifying Features via Hashes
The system uses hash-based identities for users, authors, and posts. This allows the model to map sparse IDs to dense vectors without storing massive vocabulary tables.
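The exact hashing scheme is not specified here, but the `[..., 2]` trailing dimension in the batch shapes below suggests each entity is identified by a pair of 64-bit hashes. A hypothetical sketch of such a mapping, assuming a SHA-256-based split (illustration only, not the production scheme):

```python
import hashlib

def id_to_hash_pair(entity_id: str) -> tuple:
    """Illustrative only: derive a pair of 64-bit hashes from a string ID.

    The real Phoenix hashing scheme is not documented here; this sketch
    just shows why each entity occupies a [..., 2] slot in the batch.
    """
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "little")   # first 64-bit hash
    h2 = int.from_bytes(digest[8:16], "little") # second 64-bit hash
    return (h1, h2)

pair = id_to_hash_pair("user_12345")  # deterministic pair of 64-bit ints
```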
```python
from phoenix.recsys_model import RecsysBatch

# Create a batch representing the user and candidates
batch = RecsysBatch(
    user_hashes=user_id_hashes,           # [B, 2]
    history_post_hashes=past_post_ids,    # [B, history_seq_len, 2]
    history_actions=past_actions,         # [B, history_seq_len] (e.g., Like, Reply)
    candidate_post_hashes=candidate_ids,  # [B, candidate_seq_len, 2]
    candidate_author_hashes=author_ids,   # [B, candidate_seq_len, 2]
)
```
Hydrating Embeddings
Before inference, you must hydrate the batch with embeddings that have already been retrieved from the feature stores.
```python
from phoenix.recsys_model import RecsysEmbeddings

embeddings = RecsysEmbeddings(
    user_embeddings=user_vectors,
    history_post_embeddings=past_post_vectors,
    candidate_post_embeddings=candidate_post_vectors,
    # ... other required author and action embeddings
)
```
3. Apply the Recommendation Attention Mask
A critical step in ranking is the attention mask. In a standard language model, attention is purely causal (left-to-right). In Phoenix ranking, we use a specialized mask generated by make_recsys_attn_mask:
- User History: Maintains causal attention (each action only sees previous actions).
- Candidates: Each candidate post attends to the entire user history to gain context.
- Independence: Candidate posts cannot attend to each other. This ensures that a post's score is independent of the other candidates in the same batch.
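The three rules above can be sketched as a boolean mask in NumPy. This mirrors the semantics of make_recsys_attn_mask but is not the library implementation; the layout assumption (user token first, then history, then candidates) follows the sequence described in this section:

```python
import numpy as np

def recsys_attn_mask(history_len: int, candidate_len: int) -> np.ndarray:
    """Sketch of the mask described above. True = position may attend.

    Layout assumption: index 0 is the user token, followed by
    history_len history tokens, then candidate_len candidate tokens.
    """
    prefix = 1 + history_len          # user token + history
    total = prefix + candidate_len
    # Causal attention over the user + history prefix.
    mask = np.tril(np.ones((total, total), dtype=bool))
    # Candidates: clear the causal pattern, then allow the full prefix
    # plus self-attention only -- never other candidates.
    mask[prefix:, :] = False
    mask[prefix:, :prefix] = True
    mask[prefix:, prefix:] = np.eye(candidate_len, dtype=bool)
    return mask

m = recsys_attn_mask(history_len=3, candidate_len=2)  # 6x6 mask
```

With this shape, swapping one candidate in the pool cannot change another candidate's attention pattern, which is what makes per-post scores batch-independent.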
```python
from grok import make_recsys_attn_mask

# Generate the mask for the combined sequence
# total_seq_len = 1 (user) + history_seq_len + candidate_seq_len
mask = make_recsys_attn_mask(
    seq_len=total_seq_len,
    candidate_start_offset=history_seq_len + 1,
)
```
4. Execute the Ranking Pass
With the configuration and data prepared, you can run the model to generate engagement probabilities. The model returns logits, which represent the likelihood of different user actions (Like, Retweet, Reply, etc.).
```python
import haiku as hk

def ranking_forward(batch, embeddings):
    model = ranking_config.make()
    return model(batch, embeddings)

# Transform for JAX/Haiku execution
ranking_fn = hk.transform(ranking_forward)

# Run inference
logits = ranking_fn.apply(params, rng, batch, embeddings)
```
5. Scoring and Final Re-ranking
The output logits are multi-dimensional. Each dimension corresponds to a specific engagement type. To produce the final "For You" feed, the system applies a weighted sum to these logits:
- Positive Weights: Applied to "Like," "Retweet," and "Long View" probabilities.
- Negative Weights: Applied to "Report," "Not Interested," or "Mute" probabilities.
The Home Mixer component (written in Rust) receives these scores and performs the final sort before serving the feed to the user.
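The weighted-sum step can be sketched as follows. The weight values and action names here are assumptions for illustration; the actual production weights are not given in this guide:

```python
import math

# Illustrative engagement weights (assumed values, not production ones).
ACTION_WEIGHTS = {
    "like": 1.0,
    "retweet": 1.5,
    "long_view": 0.5,
    "report": -4.0,
    "not_interested": -2.0,
}

def score(logits: dict) -> float:
    """Weighted sum of per-action probabilities (sigmoid of each logit)."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    return sum(w * sigmoid(logits[a]) for a, w in ACTION_WEIGHTS.items())

# Toy per-candidate logits, one dict per engagement head.
candidates = {
    "post_a": {"like": 2.0, "retweet": 0.5, "long_view": 1.0,
               "report": -5.0, "not_interested": -3.0},
    "post_b": {"like": -1.0, "retweet": -2.0, "long_view": 0.0,
               "report": 1.0, "not_interested": 0.5},
}
ranked = sorted(candidates, key=lambda p: score(candidates[p]), reverse=True)
```

In production this final sort happens in the Rust-based Home Mixer rather than in Python; the sketch only shows the scoring arithmetic.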
Troubleshooting Model Performance
- Low Relevance: Check history_seq_len. If the user's action sequence is too short, the transformer may lack sufficient context to build an accurate user representation.
- Inconsistent Scores: Ensure make_recsys_attn_mask is applied correctly. If candidates are allowed to attend to each other, their scores will fluctuate depending on which other posts are in the candidate pool.
- High Latency: Adjust num_kv_heads in the TransformerConfig. Using Grouped Query Attention (GQA) significantly reduces memory bandwidth requirements during ranking.
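To see why fewer KV heads helps latency, a back-of-envelope KV-cache sizing using the configuration from this guide (assuming fp16 values; the formula and byte counts are assumptions, not measured numbers):

```python
# KV cache size = 2 (K and V) * layers * seq_len * kv_heads * head_dim * bytes.
emb_size, num_layers = 768, 12
num_q_heads, num_kv_heads = 12, 4
seq_len = 1 + 128 + 200          # user + history + candidates = 329
bytes_per_val = 2                # fp16
head_dim = emb_size // num_q_heads

def kv_cache_bytes(kv_heads: int) -> int:
    return 2 * num_layers * seq_len * kv_heads * head_dim * bytes_per_val

mha_bytes = kv_cache_bytes(num_q_heads)   # multi-head: KV heads == Q heads
gqa_bytes = kv_cache_bytes(num_kv_heads)  # GQA: 4 KV heads
```

With 4 KV heads instead of 12, the cache (and the memory traffic to read it each step) shrinks by 3x per request, which is where the latency win comes from.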