From Logs to Numbers: Google’s RLM Predicts System Performance from Raw Text
Google's RLM treats regression as language modeling, letting compact LLMs predict cluster performance directly from serialized logs and configs with high accuracy and uncertainty estimates.
Why predicting industrial system performance is hard
Predicting performance for large-scale industrial systems—such as Google’s Borg compute clusters—has historically relied on heavy domain-specific feature engineering and rigid tabular representations. Logs, configuration files, heterogeneous hardware inventories, and nested job descriptions are difficult to flatten into fixed tensors, which makes models brittle and slow to adapt when workloads or hardware change.
Turning regression into language modeling
Google's Regression Language Model (RLM) reframes regression as a text-generation task. Instead of hand-crafting numeric features, the entire system state—configuration, logs, workload profiles, hardware specs—is serialized into structured text (YAML, JSON, or similar) and fed to an encoder-decoder LLM as a prompt. The model then generates the numeric target as text (for example, efficiency metrics like MIPS per GCU).
This text-to-text formulation removes the need for predefined feature sets, normalization rules, and rigid encoding schemes. Any state that can be expressed as text becomes a first-class input: nested fields, heterogeneous lists, and evolving attributes are all naturally supported.
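To make this concrete, here is a minimal sketch of how a cluster snapshot might be serialized into a prompt/target training pair. The field names and values are illustrative, not the actual Borg schema, and stdlib `json` stands in for whichever structured-text format (YAML, JSON, or similar) is actually used.

```python
import json

# Hypothetical snapshot of a cluster's state: nested, heterogeneous fields.
# Names and numbers are illustrative, not the real Borg schema.
system_state = {
    "cell": "cell-a",
    "hardware": [
        {"platform": "gcu-v1", "machines": 1200},
        {"platform": "gcu-v2", "machines": 300},
    ],
    "workload": {
        "jobs": 4821,
        "priority_mix": {"prod": 0.62, "batch": 0.38},
    },
    "window": "2024-05-01T00:00/1h",
}

# Serialize the whole state as structured text; no feature engineering,
# normalization rules, or fixed schema required.
prompt = json.dumps(system_state, indent=2)

# The target is emitted as text too, so a training example is simply
# (serialized state, stringified metric).
target_text = "23.7"  # e.g. a MIPS-per-GCU reading for this window
example = {"input": prompt, "target": target_text}
print(example["input"][:120], "->", example["target"])
```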
Model architecture and training details
RLM uses a relatively compact encoder-decoder LLM (around 60M parameters) trained from scratch on string pairs of serialized system state and numeric outcome. Training optimizes next-token cross-entropy over sequences that contain both the serialized input state and a textual representation of the numeric target.
Numeric outputs are handled via a custom tokenization scheme optimized for floating-point values (for example, a P10 mantissa-sign-exponent encoding) so that real-valued targets are represented accurately within the model vocabulary. RLMs can be trained from random initialization, focusing learning capacity on correlating system-state text with numeric outcomes rather than on general language modeling priors.
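The paper's exact token vocabulary is not reproduced here, but the sketch below shows the general shape of a sign/mantissa/exponent encoding for floats, assuming a fixed number of mantissa digits; token names such as `<E1>` are made up for illustration, and edge cases (e.g. rounding across a power of ten) are glossed over.

```python
# A minimal sketch of a sign/mantissa/exponent text encoding for floats,
# in the spirit of the P-style scheme described above. Token names are
# illustrative, not the paper's actual vocabulary.
def encode_float(x: float, mantissa_digits: int = 4) -> list[str]:
    if x == 0.0:
        return ["<+>", *["<0>"] * mantissa_digits, "<E0>"]
    sign = "<+>" if x > 0 else "<->"
    x = abs(x)
    exponent = 0
    # Normalize the mantissa into [1, 10).
    while x >= 10.0:
        x /= 10.0
        exponent += 1
    while x < 1.0:
        x *= 10.0
        exponent -= 1
    digits = f"{x:.{mantissa_digits - 1}f}".replace(".", "")[:mantissa_digits]
    return [sign, *[f"<{d}>" for d in digits], f"<E{exponent}>"]

print(encode_float(23.7))      # ['<+>', '<2>', '<3>', '<7>', '<0>', '<E1>']
print(encode_float(-0.00041))  # ['<->', '<4>', '<1>', '<0>', '<0>', '<E-4>']
```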
Adaptation, sequence length, and uncertainty
RLMs are few-shot adaptable: pretrained models can be fine-tuned on a few hundred to a few thousand examples and quickly adapt to new clusters, configurations, or time windows. They support long input sequences (thousands of tokens), which allows full observation of complex system states without aggressive truncation.
Crucially, the text-generation framing naturally supports uncertainty quantification: by sampling multiple outputs per input, RLMs provide distributions over possible numeric outcomes. This captures both aleatoric uncertainty (inherent randomness) and epistemic uncertainty (due to limited observability), enabling probabilistic simulation and Bayesian optimization.
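As a rough illustration, such a distribution can be read off by decoding the same prompt many times and summarizing the sampled values; `decode_sample` below is a hypothetical stand-in for one stochastic decode of the RLM followed by parsing the generated string back into a float.

```python
import statistics

# Hypothetical interface: decode_sample(prompt) runs one stochastic decode
# of the model and returns the predicted metric as a float.
def predict_distribution(decode_sample, prompt: str, n_samples: int = 64) -> dict:
    samples = [decode_sample(prompt) for _ in range(n_samples)]
    deciles = statistics.quantiles(samples, n=10)
    return {
        "mean": statistics.fmean(samples),
        "stdev": statistics.stdev(samples),
        "p10": deciles[0],
        "p90": deciles[-1],
        "samples": samples,
    }

# Usage: dist = predict_distribution(my_decode_fn, serialized_state)
# The spread of the samples reflects both inherent workload randomness and
# the model's own uncertainty about unfamiliar configurations.
```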
Performance on Google Borg
When tested on Borg cluster data, RLMs achieved extremely strong correlations with ground truth metrics—Spearman rank correlations up to 0.99 and about 0.9 on average—and reduced mean squared error by roughly two orders of magnitude compared to tabular baselines. The model's density estimation capabilities make it suitable as a universal simulator or digital twin, offering fast, uncertainty-aware predictions for infrastructure optimization.
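For readers who want to run the same kind of evaluation on their own data, the snippet below shows how Spearman rank correlation and mean squared error are typically computed; the arrays are placeholders, not Borg measurements.

```python
import numpy as np
from scipy.stats import spearmanr

# Placeholder predictions and ground truth, purely for illustration.
y_true = np.array([21.4, 18.9, 25.1, 30.2, 27.8])
y_pred = np.array([20.9, 19.3, 24.6, 31.0, 27.1])

rho, _ = spearmanr(y_true, y_pred)            # rank correlation, as reported in the paper
mse = float(np.mean((y_true - y_pred) ** 2))  # mean squared error vs. baselines
print(f"Spearman rho={rho:.3f}, MSE={mse:.3f}")
```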
Applications and implications
Potential applications span cloud and compute cluster optimization, manufacturing and IoT pipeline simulation, and scientific experiments where inputs are complex, textually described states. By removing feature engineering bottlenecks and supporting rapid adaptation, RLMs can speed up simulation, make optimization workflows more robust, and enable real-time feedback loops for next-generation industrial AI.
For full technical details, datasets, and code, see the arXiv paper: https://arxiv.org/abs/2506.21718