Theoretical Framework · Differentiable Combinatorial Optimisation (possibly novel)

The Mathematics

A complete derivation of our differentiable dual-marker optimisation framework. We believe the specific combination (a Gumbel-Softmax relaxation over gene-expression binarisations paired with a jointly differentiable whole-body safety penalty) has not appeared in the literature.


1. Problem formulation

We want to find a pair of surface markers $(g_1, g_2)$ from the transcriptome that are maximally expressed on ectopic lesion cells and minimally expressed on any healthy tissue in the whole human body. The combinatorial search space is intractably large for brute force.

1.1 · Discrete marker selection problem (NP-hard in general)
$$\argmax_{(g_1, g_2) \in \mathcal{G}^2,\; g_1 \neq g_2} \;\; \mathcal{S}(g_1, g_2) - \lambda \cdot \mathcal{P}(g_1, g_2)$$
where $\mathcal{G}$ is the full gene vocabulary ($|\mathcal{G}| \approx 33{,}000$), $\mathcal{S}$ is a specificity score over the lesion atlas, $\mathcal{P}$ is a safety penalty from Tabula Sapiens, and $\lambda \in \R^+$ is a regularisation coefficient weighting safety against specificity. The number of candidate pairs is $\binom{|\mathcal{G}|}{2} \approx 5.4 \times 10^8$, so enumeration is infeasible.

We therefore require a continuous relaxation that preserves gradient flow through the selection operator.
1.2 · Gene expression matrix (input space)
$$X \in \R^{N \times G}, \quad X_{ij} = \text{denoised expression of gene } j \text{ in cell } i$$
$N$ = number of cells after QC, $G = |\mathcal{G}|$. Each entry is the scVI-denoised mean $\mu_{ij}$, not the raw count. This matters: raw counts are zero-inflated and unsuitable for gradient-based optimisation, whereas the denoised output of the scVI VAE provides a smooth, continuous signal.
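For concreteness, a minimal sketch of how $X$ could be assembled with scvi-tools; the `adata` object, layer name, and batch key are placeholders, and the exact arguments of `get_normalized_expression` should be checked against the installed scvi-tools version.

```python
import scvi
import torch

# Placeholder: `adata` is an AnnData object with raw counts in adata.layers["counts"].
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="sample")
model = scvi.model.SCVI(adata, n_latent=30)
model.train()

# Denoised expression (cells x genes); this is the matrix X used throughout.
denoised = model.get_normalized_expression(adata, library_size=1e4)
X = torch.tensor(denoised.to_numpy(), dtype=torch.float32)  # shape (N, G)
```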

2. Continuous relaxation of gene selection

Instead of selecting a hard indicator $z_j \in \{0,1\}$ per gene, we learn a continuous attention weight vector $\alpha \in \Delta^{G-1}$ (the $(G-1)$-simplex). This converts the combinatorial problem into a differentiable objective.

2.1 · Soft selection via learnable logits (continuous proxy)
$$\alpha = \softmax\!\left(\frac{\theta}{\tau}\right), \quad \theta \in \R^G, \quad \tau > 0$$
$\theta$ is a learnable logit vector initialised uniformly. $\tau$ is a temperature hyperparameter. At high $\tau$, $\alpha$ is near-uniform (exploration). As $\tau \to 0$, $\alpha$ concentrates mass on a single gene (exploitation / hard selection). We anneal $\tau$ during training.

The effective gene expression under the soft selection is the weighted sum:
$$\tilde{x}^{(k)} = X^\top \alpha^{(k)} \in \R^N, \quad k \in \{1,2\}$$
We maintain two independent logit vectors $\theta^{(1)}, \theta^{(2)}$ for the two markers, with an orthogonality constraint to prevent degenerate solutions where both select the same gene.
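A minimal PyTorch sketch of the soft selection with toy dimensions (the real problem has $N$ in the tens of thousands and $G \approx 33{,}000$); `X` here is random stand-in data:

```python
import torch

N, G = 1_000, 2_000                     # toy sizes for illustration
X = torch.randn(N, G)                   # stand-in for the scVI-denoised matrix
tau = 1.0

# Two independent logit vectors, one per marker; uniform init -> near-uniform alpha.
theta = [torch.zeros(G, requires_grad=True) for _ in range(2)]

def soft_select(theta_k, tau):
    # alpha lies on the (G-1)-simplex; high tau spreads mass, low tau concentrates it.
    return torch.softmax(theta_k / tau, dim=0)

alpha = [soft_select(t, tau) for t in theta]
x_tilde = [X @ a for a in alpha]        # effective per-cell expression, shape (N,)
```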
2.2 · Orthogonality constraint (anti-degeneracy)
$$\mathcal{L}_{\text{ortho}} = \left(\alpha^{(1)} \cdot \alpha^{(2)}\right)^2 = \left(\sum_{j=1}^G \alpha^{(1)}_j \, \alpha^{(2)}_j\right)^2$$
This penalty is zero when $\alpha^{(1)} \perp \alpha^{(2)}$ in $\R^G$, i.e. when the two soft selections place probability mass on non-overlapping genes. This forces the model to discover genuinely distinct markers rather than two copies of the same gene.
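In code the penalty is a one-liner (restated with stand-in soft selections so the snippet stands alone):

```python
import torch

G = 2_000
alpha1 = torch.softmax(torch.randn(G), dim=0)   # stand-in soft selections
alpha2 = torch.softmax(torch.randn(G), dim=0)

# Squared overlap: zero exactly when the two selections put mass on disjoint genes.
l_ortho = (alpha1 * alpha2).sum() ** 2
```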

3. Joint objective function

3.1 · Full loss (main result, novel claim)
$$\mathcal{L}(\theta^{(1)}, \theta^{(2)}) = -\mathcal{S}_{\text{soft}}(\alpha^{(1)}, \alpha^{(2)}) + \lambda_1 \mathcal{P}_{\text{safety}}(\alpha^{(1)}, \alpha^{(2)}) + \lambda_2 \mathcal{L}_{\text{ortho}}(\alpha^{(1)}, \alpha^{(2)})$$
This objective is fully differentiable with respect to $\theta^{(1)}, \theta^{(2)}$ through all three terms. Gradient descent in the continuous space $\R^G \times \R^G$ efficiently navigates the combinatorial gene-pair landscape.
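A self-contained sketch of the full objective; the argument names (`X_lesion`, `X_ts`, the index sets, the organ weights) are illustrative, and the specificity and safety terms follow sections 3.2 and 5:

```python
import torch

def total_loss(alpha1, alpha2, X_lesion, ecto_idx, ctrl_idx,
               X_ts, organ_idx, organ_w, lam1=1.0, lam2=1.0):
    """All three terms are differentiable w.r.t. the logits behind alpha1 / alpha2."""
    # Soft AND gate over the lesion atlas (section 3.2).
    s1 = torch.sigmoid(X_lesion @ alpha1)
    s2 = torch.sigmoid(X_lesion @ alpha2)
    gate = s1 * s2
    s_soft = gate[ecto_idx].mean() - gate[ctrl_idx].mean()

    # Whole-body safety: product of organ-level mean activations (section 5).
    t1 = torch.sigmoid(X_ts @ alpha1)
    t2 = torch.sigmoid(X_ts @ alpha2)
    p_safety = sum(w * t1[idx].mean() * t2[idx].mean()
                   for idx, w in zip(organ_idx, organ_w))

    # Anti-degeneracy term (section 2.2).
    l_ortho = (alpha1 * alpha2).sum() ** 2

    return -s_soft + lam1 * p_safety + lam2 * l_ortho
```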
3.2 · Specificity score $\mathcal{S}_{\text{soft}}$ (lesion vs control)
$$\mathcal{S}_{\text{soft}} = \frac{1}{|\mathcal{C}_{\text{ecto}}|} \sum_{i \in \mathcal{C}_{\text{ecto}}} \sigma\!\left(\tilde{x}^{(1)}_i\right) \cdot \sigma\!\left(\tilde{x}^{(2)}_i\right) - \frac{1}{|\mathcal{C}_{\text{ctrl}}|} \sum_{i \in \mathcal{C}_{\text{ctrl}}} \sigma\!\left(\tilde{x}^{(1)}_i\right) \cdot \sigma\!\left(\tilde{x}^{(2)}_i\right)$$
where $\sigma = \sigmoid$, $\mathcal{C}_{\text{ecto}}$ is the set of ectopic lesion cells and $\mathcal{C}_{\text{ctrl}}$ is the set of control (unaffected) cells. The product $\sigma(\tilde{x}^{(1)}_i) \cdot \sigma(\tilde{x}^{(2)}_i)$ approximates the joint AND-gate probability that both markers are expressed in cell $i$. This is the key innovation: the AND-gate structure is made end-to-end differentiable.
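A tiny worked example of the soft AND gate, to make the behaviour concrete: the product is near 1 only when both markers are strongly expressed, and it remains differentiable everywhere.

```python
import torch

# Three hypothetical cells: (both markers high), (one high, one low), (both low).
x1 = torch.tensor([3.0, 3.0, -3.0], requires_grad=True)   # marker 1
x2 = torch.tensor([3.0, -3.0, -3.0], requires_grad=True)  # marker 2
gate = torch.sigmoid(x1) * torch.sigmoid(x2)
print(gate)   # approx [0.907, 0.045, 0.002]
```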

4. Gumbel-Softmax reparameterisation

To recover a hard, discrete pair selection at the end of training while maintaining gradient flow during training, we use the Gumbel-Softmax (Concrete distribution) trick, originally developed for discrete variational autoencoders. Here we apply it to combinatorial biological search, a context we have not seen in the literature.

4.1 · Gumbel noise injection (stochastic relaxation)
$$\hat{\alpha}^{(k)}_j = \frac{\exp\!\left((\theta^{(k)}_j + g_j) / \tau\right)}{\sum_{l=1}^G \exp\!\left((\theta^{(k)}_l + g_l) / \tau\right)}, \quad g_j \sim \text{Gumbel}(0,1)$$
Gumbel noise $g_j = -\log(-\log(U_j))$, $U_j \sim \text{Uniform}(0,1)$. This has the property that $\argmax_j (\theta_j + g_j)$ is distributed as a categorical sample from $\softmax(\theta)$. Combined with the straight-through estimator (STE), this gives a low-variance (though biased) gradient estimate through the discrete argmax.
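A minimal sampler, with the uniform draw clamped away from 0 and 1 for numerical safety; note that `torch.nn.functional.gumbel_softmax(logits, tau=..., hard=...)` provides an equivalent built-in, including the straight-through variant of section 4.3.

```python
import torch

def gumbel_softmax_sample(theta, tau, eps=1e-10):
    # g = -log(-log(U)), U ~ Uniform(0,1): standard Gumbel(0,1) noise.
    u = torch.rand_like(theta).clamp_(eps, 1.0 - eps)
    g = -torch.log(-torch.log(u))
    return torch.softmax((theta + g) / tau, dim=-1)

theta = torch.randn(2_000)
alpha_hat = gumbel_softmax_sample(theta, tau=0.5)   # stochastic, sums to 1
```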
4.2 · Temperature annealing schedule (convergence)
$$\tau(t) = \tau_0 \cdot \exp\!\left(-\eta \cdot t\right), \quad t = 0, 1, \ldots, T$$
We anneal from $\tau_0 = 1.0$ to $\tau_{\min} = 0.05$ over $T = 500$ gradient steps, with $\eta = \log(\tau_0 / \tau_{\min}) / T$. At $\tau_{\min}$, $\hat{\alpha}^{(k)}$ is approximately one-hot, and we take $\argmax_j \theta^{(k)}_j$ as the final hard selection.
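The schedule in code, with the decay rate derived from the endpoints as above:

```python
import math

tau0, tau_min, T = 1.0, 0.05, 500
eta = math.log(tau0 / tau_min) / T

def tau_at(t):
    # Exponential decay from tau0 at t = 0 down to tau_min at t = T.
    return max(tau0 * math.exp(-eta * t), tau_min)

assert abs(tau_at(0) - tau0) < 1e-12 and abs(tau_at(T) - tau_min) < 1e-12
```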
4.3 · Straight-through estimator (STE; gradient flow)
$$z^{(k)} = \text{one\_hot}\!\left(\argmax_j \hat{\alpha}^{(k)}_j\right), \quad \frac{\partial \mathcal{L}}{\partial \hat{\alpha}^{(k)}} \;\approx\; \frac{\partial \mathcal{L}}{\partial z^{(k)}}$$
In the forward pass, we evaluate the loss with the hard one-hot $z^{(k)}$. In the backward pass, we treat the argmax as the identity, so the gradient with respect to the soft $\hat{\alpha}^{(k)}$ is taken to be the gradient with respect to $z^{(k)}$ and flows on to $\theta^{(k)}$ through the softmax. This allows gradient descent while maintaining discrete semantics at inference.
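The standard implementation trick is to build a tensor that equals the hard one-hot in the forward pass but carries the soft sample's gradient; a sketch (`torch.nn.functional.gumbel_softmax(..., hard=True)` bundles the same idea):

```python
import torch
import torch.nn.functional as F

def straight_through(alpha_hat):
    # Forward: hard one-hot. Backward: identity through the argmax, so gradients
    # follow the soft sample alpha_hat.
    z_hard = F.one_hot(alpha_hat.argmax(dim=-1), alpha_hat.shape[-1]).to(alpha_hat.dtype)
    return z_hard - alpha_hat.detach() + alpha_hat

theta = torch.randn(2_000, requires_grad=True)
alpha_hat = torch.softmax(theta / 0.5, dim=-1)
z = straight_through(alpha_hat)                 # one-hot in the forward pass
(z * torch.arange(2_000.0)).sum().backward()    # yet theta still receives gradients
```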

5. Whole-body safety penalty

The safety term cross-references expression against Tabula Sapiens, a whole-body human cell atlas. Any marker pair with non-trivial expression in critical organs incurs a large penalty. This term is incorporated directly into the loss and is differentiable with respect to $\alpha$: the safety signal backpropagates into the marker selection.

5.1 · Organ-wise expression projection (Tabula Sapiens interface)
$$e^{(k)}_o = \frac{1}{|T_o|} \sum_{i \in T_o} \sigma\!\left(\tilde{x}^{(k)}_i\right), \quad o \in \mathcal{O}_{\text{critical}}$$
where $T_o$ is the set of Tabula Sapiens cells annotated to organ $o$, and $\mathcal{O}_{\text{critical}} = \{\text{heart, lung, brain, liver, kidney, \ldots}\}$. This projects the soft gene selection onto the expression level in each critical organ.
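A sketch of the projection with a toy Tabula Sapiens slice (random data, hypothetical organ labels):

```python
import torch

n_ts, G, n_organs = 5_000, 2_000, 5
X_ts = torch.randn(n_ts, G)                      # stand-in for Tabula Sapiens
organ_of_cell = torch.randint(0, n_organs, (n_ts,))

alpha1 = torch.softmax(torch.randn(G), dim=0)
act1 = torch.sigmoid(X_ts @ alpha1)              # per-cell soft expression

# e_o^(1): mean activation over the cells annotated to each critical organ o.
e1 = torch.stack([act1[organ_of_cell == o].mean() for o in range(n_organs)])
```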
5.2 · Joint safety penalty (AND-gate safety)
$$\mathcal{P}_{\text{safety}}(\alpha^{(1)}, \alpha^{(2)}) = \sum_{o \in \mathcal{O}_{\text{critical}}} w_o \cdot \underbrace{e^{(1)}_o \cdot e^{(2)}_o}_{\text{AND expression in organ } o}$$
The AND-gate structure mirrors the CAR-T targeting logic: toxicity only occurs if both markers are co-expressed. We penalise the joint expression probability, not individual expression. $w_o$ are organ-specific weights ($w_{\text{brain}} = w_{\text{heart}} = 10$, others $= 1$). The full gradient $\partial \mathcal{P}_{\text{safety}} / \partial \theta^{(k)}$ flows back through both $e^{(1)}_o$ and $e^{(2)}_o$.
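Given the organ-level means for both markers, the penalty itself is a weighted dot product (illustrative organ indices for the up-weighted brain and heart):

```python
import torch

n_organs = 5
e1 = torch.rand(n_organs)          # organ-level means for marker 1 (section 5.1)
e2 = torch.rand(n_organs)          # organ-level means for marker 2
w = torch.ones(n_organs)
w[0], w[1] = 10.0, 10.0            # hypothetical positions of brain and heart

p_safety = (w * e1 * e2).sum()     # penalises joint (AND) expression per organ
```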

6. Gradient derivation

6.1 · Gradient of loss w.r.t. logits (chain rule)
$$\frac{\partial \mathcal{L}}{\partial \theta^{(k)}_j} = \left(\frac{\partial \mathcal{L}}{\partial \alpha^{(k)}}\right)^\top \frac{\partial \alpha^{(k)}}{\partial \theta^{(k)}_j}$$
The Jacobian of the softmax has a closed form. For the soft selection $\alpha = \softmax(\theta/\tau)$:
$$\frac{\partial \alpha_j}{\partial \theta_l} = \frac{1}{\tau}\left(\alpha_j \delta_{jl} - \alpha_j \alpha_l\right) = \frac{\alpha_j(\delta_{jl} - \alpha_l)}{\tau}$$
So the full gradient update is:
$$\frac{\partial \mathcal{L}}{\partial \theta^{(k)}} = \frac{1}{\tau}\left(\operatorname{diag}(\alpha^{(k)}) - \alpha^{(k)} {\alpha^{(k)}}^\top\right) \frac{\partial \mathcal{L}}{\partial \alpha^{(k)}}$$
This is a standard softmax gradient, but note that at low $\tau$ it becomes numerically unstable. We use log-space computations and gradient clipping with $\|\nabla\|_{\max} \leq 1.0$.
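A quick numerical check of the closed-form Jacobian against autograd, in a toy dimension:

```python
import torch

G, tau = 6, 0.5
theta = torch.randn(G, dtype=torch.double)

softmax_fn = lambda t: torch.softmax(t / tau, dim=0)
J_autograd = torch.autograd.functional.jacobian(softmax_fn, theta)

alpha = softmax_fn(theta)
J_closed = (torch.diag(alpha) - torch.outer(alpha, alpha)) / tau

assert torch.allclose(J_autograd, J_closed, atol=1e-10)
```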
6.2 · Update rule (Adam optimiser)
$$\theta^{(k)}_{t+1} = \theta^{(k)}_t - \eta_t \cdot \hat{m}_t \oslash \left(\sqrt{\hat{v}_t} + \epsilon\right)$$
We use the Adam optimiser with $\eta = 3 \times 10^{-3}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$. $\hat{m}_t, \hat{v}_t$ are bias-corrected first and second moment estimates. Training runs for $T = 500$ steps on a single A100 GPU, completing in under 2 minutes.
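A skeleton of the optimisation loop under these hyperparameters; the data are random stand-ins and the safety term is dropped so the snippet stays self-contained (in practice it enters the loss exactly as in section 3.1):

```python
import math
import torch

G, N = 2_000, 1_000
X = torch.randn(N, G)                        # stand-in for the lesion atlas
ecto, ctrl = torch.arange(0, 400), torch.arange(400, N)   # toy index sets

theta1 = torch.zeros(G, requires_grad=True)
theta2 = torch.zeros(G, requires_grad=True)
opt = torch.optim.Adam([theta1, theta2], lr=3e-3, betas=(0.9, 0.999), eps=1e-8)

tau0, tau_min, T = 1.0, 0.05, 500
decay = math.log(tau0 / tau_min) / T

for t in range(T):
    tau = max(tau0 * math.exp(-decay * t), tau_min)
    a1 = torch.softmax(theta1 / tau, dim=0)
    a2 = torch.softmax(theta2 / tau, dim=0)
    gate = torch.sigmoid(X @ a1) * torch.sigmoid(X @ a2)

    s_soft = gate[ecto].mean() - gate[ctrl].mean()
    l_ortho = (a1 * a2).sum() ** 2
    loss = -s_soft + l_ortho                 # + lam1 * p_safety in the full objective

    opt.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_([theta1, theta2], max_norm=1.0)
    opt.step()

g1, g2 = theta1.argmax().item(), theta2.argmax().item()   # final hard pair
```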

7. Novelty claim

We believe this formulation is novel. Specifically, the combination of (1) Gumbel-Softmax relaxation applied to transcriptomic gene selection, (2) a jointly differentiable AND-gate specificity objective, and (3) a whole-body safety penalty backpropagating through cross-atlas expression projections, as a single end-to-end differentiable system, does not, to our knowledge, appear in the existing literature. We are actively reviewing to confirm or falsify this.

Prior work comparison (literature review)
EXISTING WORK
• Gumbel-Softmax: discrete VAEs (Jang et al. 2017)
• Differentiable gene selection: feature attribution (SHAP, integrated gradients)
• Marker discovery: differential expression (DESeq2, Seurat)
• CAR-T target identification: manual + fold-change filters

OUR CONTRIBUTION
• Gumbel-Softmax applied to gene space for target discovery
• AND-gate specificity made differentiable end-to-end
• Cross-atlas safety penalty backpropagating into selection
• Joint optimisation of specificity + safety in single loss

OPEN QUESTION

If you know of prior work combining differentiable relaxations with cross-atlas safety constraints for biological target discovery, please flag it. We want to understand exactly what, if anything, is new here before making any claims in writing.