X-Cell
Overview
X-Cell is Xaira Therapeutics' virtual cell model: a diffusion language model trained on perturbative, causal single-cell data. First-generation virtual cell models such as scGPT and Geneformer used next-token prediction and were trained on descriptive, observational data. X-Cell instead treats all gene tokens symmetrically, with no positional ordering. The diffusion formulation is therefore better suited to gene expression data, where genes have no natural sequential order, and it lets the model predict counterfactual responses to perturbations rather than merely describe observed cells.
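To make the contrast with next-token prediction concrete, here is a minimal sketch of one common way a diffusion language model corrupts its input: masking tokens independently at random. X-Cell's actual noising process is not described here, so the masking scheme, the `<mask>` token, the `corrupt` and `next_token_targets` helpers, and the toy gene/expression-bin pairs are all assumptions for illustration only.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not taken from X-Cell


def corrupt(tokens, mask_rate, rng):
    """Mask each token independently with probability mask_rate.

    The decision for a token never depends on its index, so every gene
    is treated the same way wherever it happens to sit in the list.
    """
    return [MASK if rng.random() < mask_rate else tok for tok in tokens]


def next_token_targets(tokens):
    """For contrast: next-token prediction pairs each prefix with the
    token that follows it, which forces a choice of gene order."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Hypothetical cell: gene symbols paired with discretized expression bins.
cell = [("GATA1", 3), ("TP53", 1), ("MYC", 5), ("ACTB", 7)]

rng = random.Random(0)
noisy = corrupt(cell, mask_rate=0.5, rng=rng)

# A denoising model would be trained to recover the masked entries
# from the unmasked ones, in any order.
masked_positions = [i for i, tok in enumerate(noisy) if tok == MASK]
```

Shuffling `cell` changes the training pairs produced by `next_token_targets` but leaves the distribution of masked sets from `corrupt` unchanged, which is the sense in which an order-free objective fits unordered gene expression data.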
X-Cell integrates five sources of prior biological knowledge via cross-attention: LLM text embeddings, ESM2 protein language model representations, STRING protein interaction network data, DepMap cancer dependency data, and JUMP-Cell Painting morphological profiles. The flagship variant, X-Cell-Ultra, has 4.9 billion parameters, about 92 times as many as scGPT's 53 million. Performance follows a scaling law with a tight fit (R² = 0.971 on the Replogle-Nadig dataset), and out-of-distribution generalization improves at larger scale. X-Cell outperforms Cell2Sentence, STATE, and scGPT on all perturbation prediction metrics.
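The cross-attention step can be sketched as follows. This is a generic single-head scaled dot-product cross-attention with no learned projections, not X-Cell's actual layer. The function names, the two-dimensional vectors, and the residual update are assumptions chosen to keep the example small.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Each query (a gene token) attends over the keys (prior-knowledge
    embeddings) and returns a weighted average of the values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        dim = len(values[0])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


# Hypothetical toy inputs: two gene tokens and three prior-knowledge
# embeddings (stand-ins for, e.g., text, protein, and network features).
gene_tokens = [[1.0, 0.0], [0.0, 1.0]]
prior_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

context = cross_attention(gene_tokens, prior_embeddings, prior_embeddings)

# Residual update: each gene token keeps its own state and adds the
# prior-knowledge context it attended to.
updated = [[g + c for g, c in zip(gene, ctx)] for gene, ctx in zip(gene_tokens, context)]
```

The gene tokens act as queries and the prior-knowledge embeddings as keys and values, so external knowledge flows into the cell representation without being added to the token sequence itself.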