The Dataset
Scientific grounding for the atlas we're mining. The Human Endometriosis Cell Atlas: 54 patient samples and 15.7 GB of scRNA-seq data across ectopic lesions, eutopic endometrium, and healthy controls.
Dataset overview
The Human Endometriosis Cell Atlas (Nature Genetics, 2024) provides single-cell RNA sequencing data across lesion, eutopic and unaffected tissue. This enables precise identification of lesion-specific expression signatures at single-cell resolution, the granularity required for safe CAR-T target discovery.
The atlas represents the largest and most comprehensive scRNA-seq dataset for endometriosis to date. Critically, it includes matched eutopic tissue from the same patients, allowing us to identify markers that distinguish ectopic from eutopic endometrium, not just from control tissue.
Data structure
Each sample is a sparse matrix in Cell Ranger HDF5 format. After loading, the structure is:
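The exact loaded structure depends on the loader. As a rough illustration of what "sparse matrix" means here, the sketch below mimics the compressed sparse column layout we assume for a Cell Ranger HDF5 file (genes as rows, cells as columns, only non-zero counts stored). The gene names, barcodes and counts are placeholders, not atlas data.

```python
# Toy stand-in for the arrays stored in a Cell Ranger HDF5 matrix.
# Assumption: compressed sparse column (CSC) layout, genes x cells.
toy_matrix = {
    "shape": (3, 4),                 # (n_genes, n_cells)
    "data": [5, 1, 2, 7, 3],         # non-zero UMI counts
    "indices": [0, 2, 1, 0, 2],      # gene (row) index of each count
    "indptr": [0, 2, 3, 3, 5],       # slice boundaries for each cell (column)
    "features": ["GENE_A", "GENE_B", "GENE_C"],          # placeholder names
    "barcodes": ["cell_1", "cell_2", "cell_3", "cell_4"],  # placeholder barcodes
}


def counts_for_cell(matrix, cell_idx):
    """Return {gene: count} for one cell, reading only its non-zero entries."""
    start = matrix["indptr"][cell_idx]
    end = matrix["indptr"][cell_idx + 1]
    genes = matrix["indices"][start:end]
    values = matrix["data"][start:end]
    return {matrix["features"][g]: v for g, v in zip(genes, values)}


def sparsity(matrix):
    """Fraction of matrix entries that are zero (and therefore not stored)."""
    n_genes, n_cells = matrix["shape"]
    return 1 - len(matrix["data"]) / (n_genes * n_cells)


print(counts_for_cell(toy_matrix, 0))  # {'GENE_A': 5, 'GENE_C': 1}
print(counts_for_cell(toy_matrix, 2))  # {} (a cell with no stored counts)
```

Storing only the non-zero entries is what keeps 54 samples manageable, since most genes have zero counts in any given cell.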
Integration across 54 samples requires batch-aware modelling. Sample batch is passed as a covariate to scVI, which learns to disentangle biological variation from technical batch effects. This is critical: naïve integration would confound tissue type with sample preparation batch.
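A minimal sketch of the batch-aware setup, assuming scvi-tools is installed and the samples have already been concatenated into one AnnData object with raw counts in `.X`. The function name, the `sample` column name and the latent dimension are illustrative assumptions, not the project's actual configuration.

```python
def integrate_samples(adata, batch_key="sample", n_latent=10, max_epochs=None):
    """Fit scVI with the sample identifier registered as the batch covariate.

    `adata` is expected to hold raw counts in `.X` and a per-cell column
    `adata.obs[batch_key]` naming the sample each cell came from.
    """
    # Imported lazily because scvi-tools is a heavy dependency.
    import scvi

    # Register the batch column so the model conditions on it.
    scvi.model.SCVI.setup_anndata(adata, batch_key=batch_key)

    # n_latent is an arbitrary illustrative choice, not a tuned value.
    model = scvi.model.SCVI(adata, n_latent=n_latent)
    model.train(max_epochs=max_epochs)

    # Store the batch-corrected embedding for downstream clustering.
    adata.obsm["X_scVI"] = model.get_latent_representation()
    return model
```

The embedding in `adata.obsm["X_scVI"]` can then be used for neighbour graphs and clustering in place of uncorrected principal components.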
Validation strategy
Literature cross-check
Top candidate markers are checked against existing endometriosis literature. Known markers (e.g. CA-125, VEGF) should appear in top-ranked pairs as a sanity check. Novel candidates are flagged for further investigation.
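The sanity check can be sketched as a simple set lookup. This assumes the ranking is a list of gene-symbol pairs and that CA-125 and VEGF are looked up under the gene symbols MUC16 and VEGFA. The function name, the cut-off and the example pairs are hypothetical placeholders.

```python
# Gene symbols assumed for the known markers: CA-125 is encoded by MUC16,
# and VEGF is taken here to mean VEGFA.
KNOWN_MARKERS = {"MUC16", "VEGFA"}


def known_markers_in_top(ranked_pairs, known=KNOWN_MARKERS, top_k=50):
    """Split known markers into those found / missing in the top_k pairs."""
    top_genes = {gene for pair in ranked_pairs[:top_k] for gene in pair}
    found = known & top_genes
    missing = known - top_genes
    return found, missing


# Placeholder ranking, best pair first (not real results).
example_ranking = [
    ("MUC16", "GENE_X"),
    ("GENE_Y", "GENE_Z"),
    ("VEGFA", "GENE_W"),
]

found, missing = known_markers_in_top(example_ranking, top_k=2)
print(sorted(found))    # ['MUC16']
print(sorted(missing))  # ['VEGFA']
```

A non-empty `missing` set is a prompt to look at the ranking more closely before trusting the novel candidates.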
scFv availability
For each candidate gene, we check whether existing single-chain variable fragments (scFvs) are available in the literature or antibody databases. Without an existing scFv, engineering the CAR's antigen-binding domain is harder, though not impossible.
Surface protein confirmation
CAR-T targets must be surface-accessible proteins. mRNA expression in scRNA-seq doesn't guarantee surface protein expression. Top candidates require protein-level validation (IHC, flow cytometry) in lesion tissue.
Tabula Sapiens orthogonal check
Independently of the optimisation penalty, we manually inspect expression heatmaps for each top pair across all Tabula Sapiens cell types. No surprises allowed before anything goes near a lab.
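A small helper can point the manual inspection at the cell types worth a closer look. The sketch below assumes a table of mean expression per cell type and treats a pair as a concern when both genes exceed a threshold in the same healthy cell type. The threshold, gene names and values are hypothetical, and the helper supplements the heatmap review rather than replacing it.

```python
def flag_coexpressing_cell_types(pair, mean_expr, threshold=0.5):
    """Return the cell types where both genes of a pair exceed the threshold.

    `mean_expr` maps cell type -> {gene: mean expression}. Genes absent
    from a cell type's entry are treated as not expressed.
    """
    gene_a, gene_b = pair
    flagged = [
        cell_type
        for cell_type, expr in mean_expr.items()
        if expr.get(gene_a, 0.0) >= threshold and expr.get(gene_b, 0.0) >= threshold
    ]
    return sorted(flagged)


# Placeholder values, not taken from Tabula Sapiens.
toy_mean_expr = {
    "hepatocyte": {"GENE_A": 0.1, "GENE_B": 2.0},
    "fibroblast": {"GENE_A": 1.2, "GENE_B": 0.9},
    "T cell": {"GENE_A": 0.0, "GENE_B": 0.0},
}

print(flag_coexpressing_cell_types(("GENE_A", "GENE_B"), toy_mean_expr))
# ['fibroblast']
```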