Methods · Technical Architecture

The Pipeline

A complete technical description of the computational pipeline, from raw Cell Ranger output to ranked dual-marker candidates. This reads like a methods section.

01

Data Ingestion & Quality Control

Raw Cell Ranger outputs (raw_feature_bc_matrix.h5) are parsed for each of the 54 patient samples. Quality control removes empty droplets and dying cells using three filters: a minimum gene count (at least 200 genes detected), a maximum mitochondrial read fraction (at most 20% of UMIs), and a minimum total UMI count (at least 500).

Filtered cells are exported as .h5ad files (AnnData format) for downstream processing. Metadata (patient ID, tissue type, sample batch) are embedded in adata.obs.

Figure: histogram of gene counts per cell, with the QC filter threshold marked. Legend: filtered (removed), kept, threshold.
QC filter criteria
$$\text{keep cell } i \iff n_{\text{genes}}(i) \geq 200 \;\wedge\; \frac{\text{MT-UMI}_i}{\text{total-UMI}_i} \leq 0.2 \;\wedge\; \text{total-UMI}_i \geq 500$$
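The filter criteria above can be sketched as a small predicate over per-cell summary values. This is a minimal illustration in plain Python, not the pipeline's actual code: the `CellStats` container, the `keep_cell` function, and the example cells are hypothetical names and values introduced here. Only the three thresholds come from the criteria stated above.

```python
from dataclasses import dataclass

# Thresholds taken from the QC filter criteria stated in the text.
MIN_GENES = 200
MAX_MT_FRACTION = 0.2
MIN_TOTAL_UMI = 500


@dataclass
class CellStats:
    """Per-cell summary values (hypothetical container, for illustration only)."""
    n_genes: int    # number of genes detected in the cell
    mt_umi: int     # UMIs assigned to mitochondrial genes
    total_umi: int  # total UMIs in the cell


def keep_cell(cell: CellStats) -> bool:
    """Return True only if the cell passes all three QC filters."""
    if cell.total_umi < MIN_TOTAL_UMI:
        return False  # checked first, so the division below never sees zero
    mt_fraction = cell.mt_umi / cell.total_umi
    return cell.n_genes >= MIN_GENES and mt_fraction <= MAX_MT_FRACTION


# Made-up example cells, one per outcome.
cells = [
    CellStats(n_genes=1500, mt_umi=120, total_umi=4000),  # passes all filters
    CellStats(n_genes=150, mt_umi=10, total_umi=600),     # too few genes
    CellStats(n_genes=900, mt_umi=700, total_umi=2000),   # mitochondrial fraction too high
    CellStats(n_genes=250, mt_umi=5, total_umi=300),      # too few UMIs
]
kept = [keep_cell(c) for c in cells]
```

All three conditions are joined by a logical AND, so a cell that fails any single filter is removed. In a scanpy-based workflow the same thresholds would typically be applied as boolean masks over adata.obs rather than cell by cell.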
scanpy · AnnData · min_genes=200 · max_MT=20% · 54 samples