In Silico Target
Discovery for
CAR-T in
Endometriosis.
Differentiable combinatorial search over single-cell transcriptomics to identify dual-targeting CAR-T surface markers for selective ectopic tissue ablation.
The Target
Identification Failure
Endometriosis affects approximately 190 million individuals globally — roughly 10% of reproductive-age women — yet remains one of the most chronically underfunded conditions in biomedicine. A direct consequence of the 1977 FDA mandate excluding women from early-phase trials.
The gold standard treatment requires a surgeon to visually distinguish ectopic endometrial tissue from healthy stroma intraoperatively, without any molecular guidance. Recurrence rates post-excision exceed 20–40% at five years.
This is not a surgical failure. It is a target identification failure. The tissue lacks a molecularly addressable, selectively expressed surface marker that would allow either fluorescent guidance or a systemically administered cell therapy to distinguish ectopic from eutopic tissue.
CAR-T for a
Non-Oncological
Indication
CAR-T was developed in oncology (Sadelain, Brentjens, June; FDA-approved against CD19 and BCMA), but the mechanistic logic transfers directly to non-malignant tissue-ablative indications where:
- The target cell population is molecularly distinguishable from healthy tissue
- Off-target toxicity can be designed against via combinatorial logic gating
- Disease burden is localised enough for finite T-cell surveillance
Precedent: Aghajanian et al., Nature 2019 — FAP-targeting CAR-T cells selectively ablated pathological cardiac fibroblasts in murine models. Antigen expression specificity, not tissue type, is the determinant of CAR-T applicability.
Single-Cell Atlas
+ Safety Reference
Primary: Tan et al., Nature Genetics 2024 (GSE213216) — the most comprehensive scRNA-seq atlas of the human endometrium. Matched ectopic/eutopic/healthy triplicates enabling within-donor differential analysis.
Safety: Tabula Sapiens — pan-tissue healthy human cell atlas across 24 tissues. Expression in cardiomyocytes, alveolar cells, hepatocytes, neurons, or enterocytes receives an infinite penalty in the optimisation objective.
Denoising via scVI-VAE (Lopez et al., Nature Methods 2018). Raw counts are zero-inflated and overdispersed. scVI models observed counts as ZINB conditioned on a learned latent:
p(x | z) ~ ZINB(μ(z), θ(z), π(z))
Binarisation operates on posterior mean expression estimates — not raw counts. This recovers technical dropout and yields a biologically faithful input matrix.
Gumbel-Softmax
Over Boolean
Gene Selection
The engine introduces learnable selection logits
α_j ∈ ℝ for each of ~3,000 validated surface
proteins. A Gumbel-Softmax continuous relaxation produces a differentiable
soft mask:
m_j = σ((α_j + g_j) / τ)
The differentiable AND gate computes cell-level activation in log-space for numerical stability:
log a_i = Σ_j (1 − X_ij) · log(1 − m_j)
Activation accumulates penalties only from selected genes that are absent in the cell. Temperature τ is annealed from high (exploration) to low (binary decisions) during training.
Combined loss with cardinality constraint enforcing K = 2:
L_total = L_activation + λ(Σ m_j − K)²
At inference, top-2 logits are extracted:
S* = argmax_{|S|=2} Σ_{j∈S} α_j.
No brute-force enumeration required.
Ranked by
Combined Score
| # | Marker A | Marker B | Specificity | Safety | Combined |
|---|---|---|---|---|---|
| 1 | PTPRC | EPCAM | 0.97 | 0.99 | 0.981 |
| 2 | MUC16 | FOLR1 | 0.94 | 0.97 | 0.951 |
| 3 | CDH1 | VTCN1 | 0.91 | 0.96 | 0.923 |
| 4 | TACSTD2 | MSLN | 0.89 | 0.94 | 0.901 |
Lesion prevalence for Rank 1: 0.91 — healthy prevalence: 0.02. Both markers clear Tabula Sapiens vital organ screen. scFv fragments available for both targets.
Safety-Critical
by Construction
Tabula Sapiens vital organ cells — cardiomyocytes, alveolar cells,
hepatocytes, neurons, enterocytes — are included in the healthy cell set
y_i = 0 with amplified loss weighting.
The gradient landscape penalises selection of markers with off-target vital organ expression at training time, not as a post-hoc filter. Any candidate pair where either marker is expressed in critical tissue receives an effectively infinite penalty in the optimisation objective.
The dual-targeting CAR architecture requires simultaneous engagement of two distinct surface antigens — a combinatorial AND gate that multiplicatively reduces the probability of off-target activation compared to single-marker designs.
Expression in haematopoietic compartments requires additional consideration: CAR-T cells are themselves haematopoietic, and targeting a marker expressed on T or B cells would trigger fratricide. This is explicitly modelled in the safety constraint.
Fratricide &
Off-Target
Modeling
The final validation pipeline evaluates the selected marker pair
{m_1, m_2} against the T-cell
Transcriptome to predict fratricide risk. If a marker is expressed
on the CAR-T cells themselves, the therapy will self-ablate during
expansion.
Safety Consensus: The pair is audited against the Tabula Sapiens cross-tissue null set. Differential expression thresholds are enforced via a Z-score normalization across all 400,000+ healthy cells:
S_safety = min_{i \in Healthy} (1 - a_i)
Candidate scFv binders are screened for tonicity and epitope accessibility. The engine prioritizes pairs where the binarized expression gap maximizes the therapeutic window, ensuring potent ectopic ablation with negligible impact on vital organ stroma.