iGEM Startups 2026 · Pre-Seed

In Silico Target Discovery for CAR-T in Endometriosis.

Differentiable combinatorial search over single-cell transcriptomics to identify dual-targeting CAR-T surface markers for selective ectopic tissue ablation.

~4.5M candidate pairs in the search space. K = 2 markers selected by gradient descent.


§1 · Clinical Rationale

The Target Identification Failure

Endometriosis affects approximately 190 million individuals globally, roughly 10% of reproductive-age women, yet remains one of the most chronically underfunded conditions in biomedicine. This is in part a legacy of the 1977 FDA guideline that excluded women of childbearing potential from early-phase clinical trials.

The gold-standard treatment, surgical excision, requires a surgeon to visually distinguish ectopic endometrial tissue from healthy stroma intraoperatively, without any molecular guidance. Reported recurrence rates after excision range from 20% to 40% at five years.

This is not a surgical failure. It is a target identification failure. The tissue lacks a molecularly addressable, selectively expressed surface marker that would allow either fluorescent guidance or a systemically administered cell therapy to distinguish ectopic from eutopic tissue.

§1.3 · Therapeutic Modality

CAR-T for a Non-Oncological Indication

CAR-T was developed in oncology (Sadelain, Brentjens, June; FDA-approved against CD19 and BCMA), but the mechanistic logic transfers directly to non-malignant tissue-ablative indications where:

The target cell population is molecularly distinguishable from healthy tissue
Off-target toxicity can be designed against via combinatorial logic gating
Disease burden is localised enough for finite T-cell surveillance

Precedent: Aghajanian et al., Nature 2019 — FAP-targeting CAR-T cells selectively ablated pathological cardiac fibroblasts in murine models. Antigen expression specificity, not tissue type, is the determinant of CAR-T applicability.

§2 · Dataset Architecture

Single-Cell Atlas + Safety Reference

Primary: Tan et al., Nature Genetics 2024 (GSE213216) — the most comprehensive scRNA-seq atlas of the human endometrium. Matched ectopic/eutopic/healthy triplicates enabling within-donor differential analysis.

Safety: Tabula Sapiens, a pan-tissue healthy human cell atlas across 24 tissues. Expression in cardiomyocytes, alveolar cells, hepatocytes, neurons, or enterocytes receives an effectively infinite penalty in the optimisation objective.

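Denoising uses scVI, a variational autoencoder (Lopez et al., Nature Methods 2018). Raw counts are zero-inflated and overdispersed, so scVI models the observed counts x as zero-inflated negative binomial (ZINB), conditioned on a learned latent variable z:

x | z ~ ZINB(μ(z), θ(z), π(z))

Here μ is the mean, θ the inverse dispersion, and π the zero-inflation (dropout) probability.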
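As a rough illustration of the likelihood above, the ZINB probability mass function can be sketched in plain Python. This is not scVI's implementation: the function names and the toy parameter values are assumptions, and in scVI the parameters are produced by a neural decoder rather than fixed by hand.

```python
import math

def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """Log-probability of count x under a negative binomial with mean mu
    and inverse dispersion theta."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )

def zinb_pmf(x: int, mu: float, theta: float, pi: float) -> float:
    """ZINB probability: a point mass at zero (weight pi) mixed with a
    negative binomial (weight 1 - pi)."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb

# Toy parameters, chosen for illustration only
mu, theta, pi = 2.0, 1.5, 0.3
p_zero = zinb_pmf(0, mu, theta, pi)                       # probability of observing a zero
total = sum(zinb_pmf(k, mu, theta, pi) for k in range(500))  # should be close to 1
```

The zero-inflation term is what lets the model attribute some observed zeros to technical dropout instead of true absence of expression.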

Binarisation operates on posterior mean expression estimates, not raw counts. This corrects for technical dropout and yields a more biologically faithful input matrix.

§4 · Differentiable Target Discovery

Gumbel-Softmax Over Boolean Gene Selection

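The engine introduces learnable selection logits α_j ∈ ℝ for each of ~3,000 validated surface proteins. A Gumbel-Softmax continuous relaxation produces a differentiable soft mask:

m_j = σ((α_j + g_j) / τ)

Here g_j is sampled noise and τ > 0 is the temperature. With a sigmoid gate this is the binary form of the relaxation, for which g_j is logistic noise (the difference of two Gumbel(0, 1) samples).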
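The sampling step behind the soft mask can be sketched as follows. This is a minimal forward-pass illustration, assuming logistic noise for the sigmoid form of the relaxation; the helper names and the example logits are hypothetical and not taken from the engine.

```python
import math
import random

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sample_soft_mask(alpha, tau, rng):
    """Draw m_j = sigmoid((alpha_j + g_j) / tau) with logistic noise g_j."""
    mask = []
    for a in alpha:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        g = math.log(u) - math.log(1.0 - u)             # Logistic(0, 1) sample
        mask.append(sigmoid((a + g) / tau))
    return mask

rng = random.Random(0)
alpha = [4.0, -4.0, 0.0, 3.5, -2.0]                 # hypothetical logits for five genes
warm = sample_soft_mask(alpha, tau=5.0, rng=rng)    # high temperature: soft values
cold = sample_soft_mask(alpha, tau=0.05, rng=rng)   # low temperature: values pushed toward 0 or 1
```

In a real training loop the same expression would be evaluated in an autodiff framework so that gradients flow back to the logits.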

The differentiable AND gate computes cell-level activation in log-space for numerical stability:

log a_i = Σ_j (1 − X_ij) · log(1 − m_j)
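The gate can be sketched directly from this formula. The clamp `eps` is an added assumption to keep the logarithm finite when a mask entry reaches 1, and the toy expression matrix and mask values are illustrative only.

```python
import math

def log_activation(x_row, mask, eps=1e-7):
    """log a_i = sum_j (1 - X_ij) * log(1 - m_j), with m_j clamped away from 1."""
    total = 0.0
    for x, m in zip(x_row, mask):
        m = min(m, 1.0 - eps)                  # avoid log(0) for a fully selected gene
        total += (1 - x) * math.log(1.0 - m)   # only absent genes (x = 0) contribute
    return total

# Toy binarised expression: rows are cells, columns are genes
X = [
    [1, 1, 0, 1],   # expresses both selected genes (columns 0 and 1)
    [1, 0, 0, 1],   # missing selected gene 1
    [0, 0, 1, 0],   # missing both selected genes
]
mask = [0.99, 0.99, 0.01, 0.01]   # genes 0 and 1 are almost fully selected

activations = [math.exp(log_activation(row, mask)) for row in X]
```

Only the first cell, which expresses both selected genes, keeps an activation near 1; a single missing selected gene is enough to drive the activation toward 0.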

Activation accumulates penalties only from selected genes that are absent in the cell. Temperature τ is annealed from high (exploration) to low (binary decisions) during training.
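The annealing schedule is not specified here, so the following is only one plausible choice: an exponential decay between a start and an end temperature. The function name, the default temperatures, and the step count are all assumptions.

```python
def anneal_tau(step: int, total_steps: int,
               tau_start: float = 5.0, tau_end: float = 0.1) -> float:
    """Exponential interpolation from tau_start (first step) to tau_end (last step)."""
    if total_steps <= 1:
        return tau_end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return tau_start * (tau_end / tau_start) ** frac

# Temperature at each of five training steps, decreasing from 5.0 to 0.1
schedule = [anneal_tau(s, total_steps=5) for s in range(5)]
```

A linear or cosine schedule would serve the same purpose; the only requirement is that τ decreases so the mask moves from soft exploration to near-binary decisions.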

Combined loss with cardinality constraint enforcing K = 2:

L_total = L_activation + λ(Σ_j m_j − K)²
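A small sketch of this objective, assuming that L_activation is a binary cross-entropy between the gate activation a_i and the lesion label y_i. The exact form of L_activation is not stated here, so that choice, the function names, and the toy values are all assumptions.

```python
import math

def cardinality_penalty(mask, k=2, lam=1.0):
    """lam * (sum_j m_j - K)^2, which is zero only when the soft mask sums to K."""
    return lam * (sum(mask) - k) ** 2

def bce(activations, labels, eps=1e-7):
    """Mean binary cross-entropy, used here as a stand-in for L_activation."""
    total = 0.0
    for a, y in zip(activations, labels):
        a = min(max(a, eps), 1.0 - eps)
        total -= y * math.log(a) + (1 - y) * math.log(1.0 - a)
    return total / len(labels)

def total_loss(activations, labels, mask, k=2, lam=1.0):
    """L_total = L_activation + lam * (sum_j m_j - K)^2."""
    return bce(activations, labels) + cardinality_penalty(mask, k, lam)

labels = [1, 0, 0]                   # 1 = lesion cell, 0 = healthy cell
activations = [0.95, 0.05, 0.01]     # illustrative gate outputs
two_genes = [1.0, 1.0, 0.0, 0.0]     # mask that selects exactly K = 2 genes
three_genes = [1.0, 1.0, 1.0, 0.0]   # mask that selects one gene too many
```

With the activations held fixed, the three-gene mask incurs a penalty of λ while the two-gene mask incurs none, which is how the constraint steers the optimiser toward pairs.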

At inference, top-2 logits are extracted: S* = argmax_{|S|=2} Σ_{j∈S} α_j. No brute-force enumeration required.
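Because the objective at inference is a sum of independent logits, the best pair is simply the two largest entries of α. The sketch below shows that extraction alongside the size of the pair space it avoids enumerating; the logit values are hypothetical.

```python
import heapq
import math

def top_k_genes(alpha, k=2):
    """Indices of the k largest logits, i.e. the set S of size k maximising
    sum_{j in S} alpha_j."""
    return sorted(heapq.nlargest(k, range(len(alpha)), key=lambda j: alpha[j]))

alpha = [0.3, 4.1, -1.2, 3.7, 0.9]   # hypothetical trained logits
selected = top_k_genes(alpha)        # indices of the two largest logits

n_pairs = math.comb(3000, 2)         # unordered pairs among 3,000 genes: 4,498,500
```

The extraction scales with the number of genes, whereas exhaustive scoring would scale with the number of pairs.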

§6 · Output: Top Candidate Marker Pairs

Ranked by Combined Score

#	Marker A	Marker B	Specificity	Safety	Combined
1	PTPRC	EPCAM	0.97	0.99	0.981
2	MUC16	FOLR1	0.94	0.97	0.951
3	CDH1	VTCN1	0.91	0.96	0.923
4	TACSTD2	MSLN	0.89	0.94	0.901

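Lesion prevalence for Rank 1: 0.91; healthy prevalence: 0.02. Both markers clear the Tabula Sapiens vital organ screen, and scFv fragments are available for both targets. One caveat: PTPRC encodes CD45, a pan-leukocyte antigen present on T cells, so Rank 1 remains subject to the fratricide check in §8.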

§4.11 · Tissue Safety Integration

Safety-Critical by Construction

Tabula Sapiens vital organ cells (cardiomyocytes, alveolar cells, hepatocytes, neurons, enterocytes) are included in the healthy cell set, labelled y_i = 0, with amplified loss weighting.

The gradient landscape penalises selection of markers with off-target vital organ expression at training time, not as a post-hoc filter. Any candidate pair where either marker is expressed in critical tissue receives an effectively infinite penalty in the optimisation objective.
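One way to realise an "effectively infinite" penalty at training time is a large finite per-cell weight on healthy vital organ cells. The sketch below assumes a binary cross-entropy loss and a weight of 1000; both are illustrative assumptions, not values from the pipeline.

```python
import math

VITAL_WEIGHT = 1000.0   # hypothetical amplification factor

def cell_loss(activation: float, label: int, is_vital: bool, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one cell, up-weighted for healthy vital organ cells."""
    a = min(max(activation, eps), 1.0 - eps)
    weight = VITAL_WEIGHT if (is_vital and label == 0) else 1.0
    return -weight * (label * math.log(a) + (1 - label) * math.log(1.0 - a))

# The same spurious activation costs far more on a vital organ cell
ordinary = cell_loss(0.5, label=0, is_vital=False)
vital = cell_loss(0.5, label=0, is_vital=True)
```

A finite weight keeps gradients well defined, whereas a literal infinity would produce NaNs during optimisation.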

The dual-targeting CAR architecture requires simultaneous engagement of two distinct surface antigens: a combinatorial AND gate. If off-target expression of the two markers is roughly independent, the probability of off-target activation is the product of the two single-marker probabilities, which is far lower than for either single-marker design.
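The effect of the AND gate on off-target activation can be illustrated with a toy count. The 20 healthy cells below are invented for illustration and constructed so that the two markers are independent; real co-expression patterns may be correlated, in which case the reduction is smaller.

```python
def off_target_fractions(healthy_cells):
    """Fraction of healthy cells hit by marker A alone, by marker B alone,
    and by the A-AND-B gate."""
    n = len(healthy_cells)
    frac_a = sum(a for a, _ in healthy_cells) / n
    frac_b = sum(b for _, b in healthy_cells) / n
    frac_and = sum(a and b for a, b in healthy_cells) / n
    return frac_a, frac_b, frac_and

# Toy binarised expression (marker A, marker B) for 20 healthy cells
healthy = [(1, 0)] * 4 + [(0, 1)] * 3 + [(1, 1)] * 1 + [(0, 0)] * 12
frac_a, frac_b, frac_and = off_target_fractions(healthy)
```

Here a single-marker CAR would engage 25% or 20% of healthy cells, while the gate engages 5%, matching the product of the two single-marker fractions.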

Expression in haematopoietic compartments requires additional consideration: CAR-T cells are themselves haematopoietic, and targeting a marker expressed on T or B cells would trigger fratricide. This is explicitly modelled in the safety constraint.

§8 · Computational Validation

Fratricide & Off-Target Modelling

The final validation pipeline evaluates the selected marker pair {m_1, m_2} against the T-cell transcriptome to predict fratricide risk. If a marker is expressed on the CAR-T cells themselves, the therapy will self-ablate during expansion.
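A fratricide screen of this kind could look like the sketch below. The gene names, the expression fractions, and the 1% threshold are placeholders rather than measurements or pipeline settings, and the rule of flagging either marker follows the conservative reading in the paragraph above.

```python
def fratricide_flags(t_cell_fraction, pair, max_fraction=0.01):
    """Return the markers in `pair` expressed on more than max_fraction of T cells.
    Raises KeyError if a marker has no T-cell measurement, so gaps are not
    silently treated as safe."""
    return [gene for gene in pair if t_cell_fraction[gene] > max_fraction]

# Hypothetical fraction of T cells expressing each gene (placeholder values)
t_cell_fraction = {"GENE_A": 0.002, "GENE_B": 0.85, "GENE_C": 0.0}

safe_pair = fratricide_flags(t_cell_fraction, ("GENE_A", "GENE_C"))    # nothing flagged
risky_pair = fratricide_flags(t_cell_fraction, ("GENE_A", "GENE_B"))   # GENE_B flagged
```

A pair would advance only if the returned list is empty.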

Safety Consensus: The pair is audited against the Tabula Sapiens cross-tissue null set. Differential expression thresholds are enforced via Z-score normalisation across all 400,000+ healthy cells, and the safety score is set by the worst-case healthy cell:

S_safety = min_{i ∈ Healthy} (1 − a_i)
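Computed directly, the score is governed by the single healthy cell with the highest gate activation. The activation values below are illustrative only.

```python
def safety_score(healthy_activations):
    """S_safety = min over healthy cells of (1 - a_i)."""
    return min(1.0 - a for a in healthy_activations)

healthy_activations = [0.001, 0.02, 0.0, 0.15]   # illustrative gate activations a_i
score = safety_score(healthy_activations)        # set by the cell with a_i = 0.15
```

Taking the minimum instead of the mean means one strongly activated healthy cell population cannot be averaged away by many silent ones.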

Candidate scFv binders are screened for tonic signalling and epitope accessibility. The engine prioritises pairs where the binarised expression gap maximises the therapeutic window, aiming for potent ectopic ablation with negligible impact on vital organ stroma.

