GSE213216 · Nature Genetics 2024 · 10x Genomics

The Dataset

Scientific grounding for the atlas we're mining. The Human Endometriosis Cell Atlas: 54 patient samples, 15.7 GB of scRNA-seq data across ectopic lesions, eutopic endometrium, and healthy controls.

Dataset overview

54 patient samples
Across ectopic lesions, eutopic endometrium, and unaffected control tissue.

15.7 GB raw data
Cell Ranger output: sparse barcodes × genes matrices, one per sample.

3 tissue types
Ectopic lesion, eutopic endometrium, unaffected peritoneal control.

The Human Endometriosis Cell Atlas (Nature Genetics, 2024) provides single-cell RNA sequencing data across ectopic lesions, eutopic endometrium, and unaffected tissue. This makes it possible to identify lesion-specific expression signatures at single-cell resolution, which is the granularity required for safe CAR-T target discovery.

The atlas is the largest and most comprehensive scRNA-seq dataset for endometriosis to date. Critically, it includes matched eutopic tissue from the same patients, which allows us to identify markers that distinguish ectopic lesions from eutopic endometrium, not just from control tissue.
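A minimal sketch of the ectopic-versus-eutopic contrast that matched tissue makes possible. It ranks genes by the difference in detection rate (the fraction of cells with a nonzero count) between two tissue types. Detection-rate difference is a deliberately simple stand-in chosen for illustration, not the differential-expression method used by the atlas or by this pipeline; the function names, tissue labels and toy counts are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy layout: each cell is (tissue_type, {gene: UMI count}).
# Real counts would come from adata.layers['counts'] and adata.obs['tissue_type'].
Cell = Tuple[str, Dict[str, int]]


def detection_rate(cells: List[Cell], tissue: str, gene: str) -> float:
    """Fraction of cells from `tissue` with a nonzero count for `gene`."""
    group = [counts for t, counts in cells if t == tissue]
    if not group:
        return 0.0
    return sum(1 for counts in group if counts.get(gene, 0) > 0) / len(group)


def rank_contrast(
    cells: List[Cell],
    genes: List[str],
    target: str = "ectopic",
    reference: str = "eutopic",
) -> List[Tuple[str, float]]:
    """Rank genes by detection rate in `target` minus detection rate in `reference`."""
    scores = [
        (g, detection_rate(cells, target, g) - detection_rate(cells, reference, g))
        for g in genes
    ]
    # Largest target-over-reference difference first; ties broken by gene name.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


cells = [
    ("ectopic", {"GENE_A": 3, "GENE_B": 1}),
    ("ectopic", {"GENE_A": 2}),
    ("eutopic", {"GENE_B": 4}),
    ("eutopic", {"GENE_A": 0, "GENE_B": 2}),
]
print(rank_contrast(cells, ["GENE_A", "GENE_B"]))  # [('GENE_A', 1.0), ('GENE_B', -0.5)]
```

Calling `rank_contrast(cells, genes, reference="control")` gives the ectopic-versus-control contrast for comparison. A real analysis would also need a statistical test that accounts for cells from the same patient not being independent.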

54 samples · ~15.7 GB · 10x Genomics scRNA-seq · Cell Ranger output · Ectopic lesions (endometriomas + peritoneal) · Eutopic endometrium · Control tissue · Nature Genetics 2024 · NCBI GEO: GSE213216

Data structure

Each sample is stored as a sparse matrix in Cell Ranger's HDF5 format. After loading, QC, and scVI integration, the combined AnnData object has this structure:

# AnnData structure after QC and scVI integration
adata.shape              # (N_cells, 33_538): cells × genes
adata.obs.columns        # ['patient_id', 'tissue_type', 'batch', 'n_genes', 'pct_mt']
adata.var.columns        # ['gene_name', 'gene_id', 'highly_variable']
adata.layers['counts']   # raw UMI counts (sparse)
adata.obsm['X_scVI']     # scVI latent representation, shape (N_cells, 20)
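The `n_genes` and `pct_mt` columns in `obs` are per-cell QC metrics. Cell Ranger's HDF5 file stores each sample's matrix column-compressed (CSC), with genes as rows and one column per barcode. Assuming the `data`, `indices` and `indptr` arrays have already been read out (for example with h5py) into plain lists, the two metrics can be sketched as follows. The `MT-` prefix for mitochondrial genes is the human gene-symbol convention, and `qc_metrics` is a hypothetical helper; in practice scanpy's `sc.pp.calculate_qc_metrics` would normally compute these, under different column names.

```python
from typing import List, Tuple


def qc_metrics(
    data: List[int],
    indices: List[int],
    indptr: List[int],
    gene_names: List[str],
    mt_prefix: str = "MT-",
) -> List[Tuple[int, float]]:
    """Return (n_genes, pct_mt) for each barcode of a genes x barcodes CSC matrix.

    Barcode j's nonzero counts are data[indptr[j]:indptr[j + 1]], and the
    matching gene (row) indices are indices[indptr[j]:indptr[j + 1]].
    """
    metrics = []
    for j in range(len(indptr) - 1):
        n_genes, total, mt = 0, 0, 0
        for k in range(indptr[j], indptr[j + 1]):
            count = data[k]
            if count == 0:
                continue  # skip explicitly stored zeros
            n_genes += 1
            total += count
            if gene_names[indices[k]].startswith(mt_prefix):
                mt += count
        metrics.append((n_genes, 100.0 * mt / total if total else 0.0))
    return metrics


# Toy matrix: 3 genes x 3 barcodes; the third barcode has no counts.
genes = ["MT-CO1", "ACTB", "MUC16"]
data, indices, indptr = [5, 15, 3, 1], [0, 1, 1, 2], [0, 2, 4, 4]
print(qc_metrics(data, indices, indptr, genes))  # [(2, 25.0), (2, 0.0), (0, 0.0)]
```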

Integration across 54 samples requires batch-aware modelling. Sample batch is passed to scVI as a covariate, which lets the model disentangle biological variation from technical batch effects. This is critical: naïve integration would confound tissue type with sample-preparation batch.
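Before integration it helps to see how tissue types are spread across batches. A minimal sketch, assuming each cell's `(batch, tissue_type)` pair has been pulled from `adata.obs`: it cross-tabulates the two and lists batches that contain a single tissue type, since within those batches tissue and batch cannot be told apart from the data alone. The function names and toy labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tissue_by_batch(obs: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Cross-tabulate cells as {batch: Counter({tissue_type: n_cells})}."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for batch, tissue in obs:
        table[batch][tissue] += 1
    return dict(table)


def single_tissue_batches(obs: List[Tuple[str, str]]) -> List[str]:
    """Batches whose cells all come from one tissue type.

    Within such a batch, batch and tissue type are indistinguishable, so an
    integration model has to rely on other batches to separate them.
    """
    return sorted(b for b, counts in tissue_by_batch(obs).items() if len(counts) == 1)


# Hypothetical per-cell (batch, tissue_type) pairs, as in adata.obs.
obs = [
    ("b1", "ectopic"), ("b1", "eutopic"),
    ("b2", "ectopic"), ("b2", "ectopic"),
    ("b3", "control"),
]
print(single_tissue_batches(obs))  # ['b2', 'b3']
```

In scvi-tools the covariate itself is registered at setup time, e.g. `scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")` followed by `scvi.model.SCVI(adata, n_latent=20)`, which would match the d=20 latent space listed in the structure.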

Validation strategy

V1

Literature cross-check

Top candidate markers are checked against the existing endometriosis literature. Known markers (e.g. CA-125, encoded by MUC16, and VEGF) should appear among the top-ranked pairs as a sanity check. Novel candidates are flagged for further investigation.
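The sanity check can be sketched as a split of the top-ranked pairs into those containing a known marker and novel ones. A minimal sketch, assuming pairs are ranked best-first and named by gene symbol; `KNOWN_MARKERS`, `literature_check` and the cutoff `top_k` are hypothetical, and reading "VEGF" as the gene VEGFA is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

# Assumed mapping from gene symbols in the count matrix to literature marker names.
# CA-125 is the protein product of MUC16; reading "VEGF" as VEGFA is an assumption.
KNOWN_MARKERS: Dict[str, str] = {"MUC16": "CA-125", "VEGFA": "VEGF"}


@dataclass
class LiteratureCheck:
    known: List[Pair]  # top pairs containing at least one known marker
    novel: List[Pair]  # top pairs with no known marker, flagged for follow-up

    @property
    def passed(self) -> bool:
        """Sanity check: at least one known marker ranks among the top pairs."""
        return bool(self.known)


def literature_check(ranked_pairs: List[Pair], top_k: int = 10) -> LiteratureCheck:
    known: List[Pair] = []
    novel: List[Pair] = []
    for pair in ranked_pairs[:top_k]:
        if any(gene in KNOWN_MARKERS for gene in pair):
            known.append(pair)
        else:
            novel.append(pair)
    return LiteratureCheck(known=known, novel=novel)


ranked = [("MUC16", "GENE_X"), ("GENE_Y", "GENE_Z"), ("VEGFA", "GENE_Y")]
result = literature_check(ranked, top_k=2)
print(result.passed, result.novel)  # True [('GENE_Y', 'GENE_Z')]
```

A failed check (no known marker near the top) points to a problem upstream in the ranking rather than to a discovery.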

V2

scFv availability

For each candidate gene, we check whether single-chain variable fragments (scFvs) against it are already described in the literature or in antibody databases. If no scFv exists, engineering the CAR's antigen-binding domain is harder, though not impossible.
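Because a missing scFv makes a candidate harder rather than impossible, one option is to deprioritise such genes instead of dropping them. A minimal sketch: `scfv_registry` stands in for a hypothetical lookup compiled from the literature and antibody databases, and its contents here are made up.

```python
from typing import Dict, List, Tuple


def prioritise_by_scfv(
    candidates: List[str], scfv_registry: Dict[str, List[str]]
) -> List[Tuple[str, bool]]:
    """Annotate ranked candidates with scFv availability; available ones first.

    Candidates without a known scFv are kept, only moved down the list.
    sorted() is stable, so the original ranking is preserved within each group.
    """
    annotated = [(gene, bool(scfv_registry.get(gene))) for gene in candidates]
    return sorted(annotated, key=lambda item: not item[1])


# Hypothetical registry: gene -> scFv identifiers found for it.
registry = {"GENE_B": ["scFv-clone-1"], "GENE_C": []}
print(prioritise_by_scfv(["GENE_A", "GENE_B", "GENE_C"], registry))
# [('GENE_B', True), ('GENE_A', False), ('GENE_C', False)]
```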

V3

Surface protein confirmation

CAR-T targets must be surface-accessible proteins. mRNA expression in scRNA-seq doesn't guarantee surface protein expression. Top candidates require protein-level validation (IHC, flow cytometry) in lesion tissue.
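This rule can be tracked as a per-candidate status that separates mRNA-only evidence from protein-level confirmation. A minimal sketch, assuming a hypothetical set of genes annotated as cell-surface proteins and a hypothetical record of which protein-level assays have detected each gene in lesion tissue; the status strings are illustrative.

```python
from typing import Dict, Set

# Protein-level assays named in the validation strategy.
PROTEIN_METHODS = {"IHC", "flow cytometry"}


def surface_status(
    gene: str, surface_genes: Set[str], protein_evidence: Dict[str, Set[str]]
) -> str:
    """Classify a candidate; protein_evidence is assumed to hold lesion-tissue assays only."""
    if gene not in surface_genes:
        return "not surface-annotated: unsuitable as a CAR-T target"
    if protein_evidence.get(gene, set()) & PROTEIN_METHODS:
        return "confirmed: protein detected in lesion tissue"
    return "pending: mRNA only, needs IHC or flow cytometry"


surface = {"GENE_A", "GENE_B"}
evidence = {"GENE_A": {"IHC"}}
for g in ["GENE_A", "GENE_B", "GENE_C"]:
    print(g, surface_status(g, surface, evidence))
```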

V4

Tabula Sapiens orthogonal check

Independently of the optimisation penalty, we manually inspect expression heatmaps for each top pair across all Tabula Sapiens cell types. No surprises are allowed before anything goes near a lab.
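A scripted pre-screen can produce a shortlist of cell types to look at first in the heatmaps; it complements the manual inspection rather than replacing it. A minimal sketch, assuming per-cell-type mean expression has already been computed from Tabula Sapiens, and that a pair triggers when both genes (AND logic) or either gene (OR logic) exceed a threshold. The gating logic, threshold and toy values are assumptions, not the pipeline's settings.

```python
from typing import Dict, List, Tuple


def off_target_cell_types(
    mean_expr: Dict[str, Dict[str, float]],
    pair: Tuple[str, str],
    threshold: float,
    logic: str = "and",
) -> List[str]:
    """Cell types whose mean expression would trigger the CAR pair.

    logic="and": both genes above threshold (assumed AND-gated pair).
    logic="or": either gene above threshold.
    """
    if logic not in ("and", "or"):
        raise ValueError("logic must be 'and' or 'or'")
    gene_a, gene_b = pair
    flagged = []
    for cell_type, expr in sorted(mean_expr.items()):
        a = expr.get(gene_a, 0.0) > threshold
        b = expr.get(gene_b, 0.0) > threshold
        hit = (a and b) if logic == "and" else (a or b)
        if hit:
            flagged.append(cell_type)
    return flagged


# Hypothetical per-cell-type means for one candidate pair.
mean_expr = {
    "hepatocyte": {"GENE_A": 0.1, "GENE_B": 2.0},
    "fibroblast": {"GENE_A": 1.5, "GENE_B": 1.2},
    "T cell": {},
}
print(off_target_cell_types(mean_expr, ("GENE_A", "GENE_B"), 1.0))              # ['fibroblast']
print(off_target_cell_types(mean_expr, ("GENE_A", "GENE_B"), 1.0, logic="or"))  # ['fibroblast', 'hepatocyte']
```

Under OR logic the shortlist grows, reflecting that a single antigen is then enough to trigger killing.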