Theoretical Framework · Differentiable Combinatorial Optimisation
Possibly novel
The Mathematics
A complete derivation of our differentiable dual-marker optimisation framework.
We believe the specific combination – a Gumbel-Softmax relaxation over gene expression binarisations
with a jointly differentiable whole-body safety penalty – has not appeared in the literature.
We want to find a pair of surface markers $(g_1, g_2)$ from the transcriptome that
are maximally expressed on ectopic lesion cells and
minimally expressed on any healthy tissue in the whole human body.
The combinatorial search space is intractably large for brute force.
$$\argmax_{(g_1, g_2) \in \mathcal{G}^2,\; g_1 \neq g_2} \;\; \mathcal{S}(g_1, g_2) - \lambda \cdot \mathcal{P}(g_1, g_2)$$
where $\mathcal{G}$ is the full gene vocabulary ($|\mathcal{G}| \approx 33{,}000$),
$\mathcal{S}$ is a specificity score over the lesion atlas,
$\mathcal{P}$ is a safety penalty from Tabula Sapiens, and
$\lambda \in \R^+$ is a regularisation coefficient weighting safety against specificity.
The number of candidate pairs is $\binom{|\mathcal{G}|}{2} \approx 5.4 \times 10^8$, so enumeration is infeasible.
We therefore require a continuous relaxation that preserves gradient flow through the selection operator.
$$X \in \R^{N \times G}, \quad X_{ij} = \text{denoised expression of gene } j \text{ in cell } i$$
$N$ = number of cells after QC, $G = |\mathcal{G}|$. Each entry is the scVI-denoised
latent mean $\mu_{ij}$, not the raw count. This matters: raw counts are zero-inflated
and not suitable for gradient-based optimisation. The denoised representation from
the scVI-VAE provides a smooth, continuous signal.
Instead of selecting a hard indicator $z_j \in \{0,1\}$ per gene, we learn a continuous
attention weight vector $\alpha \in \Delta^{G-1}$ (the $(G-1)$-simplex).
This converts the combinatorial problem into a differentiable objective.
$$\alpha = \softmax\!\left(\frac{\theta}{\tau}\right), \quad \theta \in \R^G, \quad \tau > 0$$
$\theta$ is a learnable logit vector initialised uniformly. $\tau$ is a temperature hyperparameter.
At high $\tau$, $\alpha$ is near-uniform (exploration). As $\tau \to 0$, $\alpha$ concentrates
mass on a single gene (exploitation / hard selection). We anneal $\tau$ during training.
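A minimal NumPy sketch of the temperature-scaled softmax; the four logits are toy values chosen purely to illustrate the exploration/exploitation behaviour at high and low $\tau$:

```python
import numpy as np

def soft_selection(theta, tau):
    """Temperature-scaled softmax over gene logits (numerically stable)."""
    z = theta / tau
    z = z - z.max()          # log-space shift avoids overflow
    e = np.exp(z)
    return e / e.sum()

theta = np.array([0.5, 2.0, -1.0, 0.3])   # toy logits

hot = soft_selection(theta, tau=10.0)   # high tau: near-uniform (exploration)
cold = soft_selection(theta, tau=0.05)  # low tau: near one-hot (exploitation)
```

At $\tau = 10$ the weights stay close to uniform; at $\tau = 0.05$ essentially all mass sits on the argmax gene.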
The effective gene expression under the soft selection is the weighted sum:
$$\tilde{x}^{(k)} = X \alpha^{(k)} \in \R^N, \quad k \in \{1,2\}$$
We maintain two independent logit vectors $\theta^{(1)}, \theta^{(2)}$ for the two markers,
with an orthogonality constraint to prevent degenerate solutions where both select the same gene.
$$\mathcal{L}_{\text{ortho}} = \left(\alpha^{(1)} \cdot \alpha^{(2)}\right)^2 = \left(\sum_{j=1}^G \alpha^{(1)}_j \, \alpha^{(2)}_j\right)^2$$
This penalty is zero when $\alpha^{(1)} \perp \alpha^{(2)}$ in $\R^G$, i.e. when the two
soft selections place probability mass on non-overlapping genes. This forces the model to
discover genuinely distinct markers rather than two copies of the same gene.
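The weighted expression under each soft selection and the orthogonality penalty can be sketched in NumPy; the shapes and the gamma-distributed stand-in for scVI-denoised expression are illustrative assumptions, not the real atlas:

```python
import numpy as np

rng = np.random.default_rng(1)
N, G = 100, 50                          # toy cell and gene counts
X = rng.gamma(2.0, 1.0, size=(N, G))    # stand-in for denoised expression

def softmax(theta, tau=1.0):
    z = theta / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

alpha1 = softmax(rng.normal(size=G))    # soft selection for marker 1
alpha2 = softmax(rng.normal(size=G))    # soft selection for marker 2

# Effective per-cell expression under each soft selection, shape (N,)
x1 = X @ alpha1
x2 = X @ alpha2

# Orthogonality penalty: squared overlap of the two selection distributions
L_ortho = (alpha1 @ alpha2) ** 2
```

Because both $\alpha$ vectors live on the simplex, their dot product lies in $[0, 1]$, so the penalty is bounded and vanishes only for disjoint support.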
$$\mathcal{L}(\theta^{(1)}, \theta^{(2)}) = -\mathcal{S}_{\text{soft}}(\alpha^{(1)}, \alpha^{(2)}) + \lambda_1 \mathcal{P}_{\text{safety}}(\alpha^{(1)}, \alpha^{(2)}) + \lambda_2 \mathcal{L}_{\text{ortho}}(\alpha^{(1)}, \alpha^{(2)})$$
This objective is fully differentiable with respect to $\theta^{(1)}, \theta^{(2)}$
through all three terms. Gradient descent in the continuous space $\R^G \times \R^G$
efficiently navigates the combinatorial gene-pair landscape.
$$\mathcal{S}_{\text{soft}} = \frac{1}{|\mathcal{C}_{\text{ecto}}|} \sum_{i \in \mathcal{C}_{\text{ecto}}} \sigma\!\left(\tilde{x}^{(1)}_i\right) \cdot \sigma\!\left(\tilde{x}^{(2)}_i\right) - \frac{1}{|\mathcal{C}_{\text{ctrl}}|} \sum_{i \in \mathcal{C}_{\text{ctrl}}} \sigma\!\left(\tilde{x}^{(1)}_i\right) \cdot \sigma\!\left(\tilde{x}^{(2)}_i\right)$$
where $\sigma = \sigmoid$, $\mathcal{C}_{\text{ecto}}$ is the set of ectopic lesion cells
and $\mathcal{C}_{\text{ctrl}}$ is the set of control (unaffected) cells.
The product $\sigma(\tilde{x}^{(1)}_i) \cdot \sigma(\tilde{x}^{(2)}_i)$ approximates the
joint AND-gate probability that both markers are expressed in cell $i$.
This is the key innovation: the AND-gate structure is made end-to-end differentiable.
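A hedged sketch of the soft AND-gate specificity, using random normals as stand-ins for $\tilde{x}^{(k)}$ and arbitrary index sets for the ectopic and control populations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_specificity(x1, x2, ecto_idx, ctrl_idx):
    """Differentiable AND-gate specificity: mean joint activation on
    ectopic cells minus mean joint activation on control cells."""
    joint = sigmoid(x1) * sigmoid(x2)   # per-cell soft AND probability
    return joint[ecto_idx].mean() - joint[ctrl_idx].mean()

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)               # toy effective expression, marker 1
x2 = rng.normal(size=200)               # toy effective expression, marker 2
ecto = np.arange(0, 100)                # hypothetical ectopic cell indices
ctrl = np.arange(100, 200)              # hypothetical control cell indices
score = soft_specificity(x1, x2, ecto, ctrl)
```

Since each joint term lies in $(0, 1)$, the score is bounded in $(-1, 1)$ and is antisymmetric under swapping the two populations.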
To recover a hard, discrete pair selection at the end of training while maintaining
gradient flow during training, we use the Gumbel-Softmax (concrete distribution)
trick, originally developed for discrete variational autoencoders. Here we apply it
to combinatorial biological search, a context we have not seen in the literature.
$$\hat{\alpha}^{(k)}_j = \frac{\exp\!\left((\theta^{(k)}_j + g_j) / \tau\right)}{\sum_{l=1}^G \exp\!\left((\theta^{(k)}_l + g_l) / \tau\right)}, \quad g_j \sim \text{Gumbel}(0,1)$$
Gumbel noise $g_j = -\log(-\log(U_j))$, $U_j \sim \text{Uniform}(0,1)$.
This has the property that $\argmax_j (\theta_j + g_j)$ is distributed as a categorical
sample from $\softmax(\theta)$. Combined with the straight-through estimator (STE),
this yields a biased but low-variance gradient estimate through the discrete argmax.
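The Gumbel-max property can be checked empirically. The toy logits below are the logs of target probabilities $(0.7, 0.2, 0.1)$, so the argmax of each perturbed sample should follow that categorical distribution:

```python
import numpy as np

def gumbel_softmax_sample(theta, tau, rng):
    """One Gumbel-Softmax sample over the simplex."""
    u = rng.uniform(1e-12, 1.0, size=theta.shape)  # avoid log(0)
    g = -np.log(-np.log(u))                        # Gumbel(0,1) noise
    z = (theta + g) / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(3)
theta = np.log(np.array([0.7, 0.2, 0.1]))  # logits of target categorical

counts = np.zeros(3)
for _ in range(5000):
    a = gumbel_softmax_sample(theta, tau=0.1, rng=rng)
    counts[a.argmax()] += 1                # argmax is invariant to the softmax
freq = counts / counts.sum()
```

With 5000 draws, the empirical argmax frequencies land close to $(0.7, 0.2, 0.1)$, matching $\softmax(\theta)$.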
$$\tau(t) = \tau_0 \cdot \exp\!\left(-\eta \cdot t\right), \quad t = 0, 1, \ldots, T$$
We anneal from $\tau_0 = 1.0$ to $\tau_{\min} = 0.05$ over $T = 500$ gradient steps,
with $\eta = \log(\tau_0 / \tau_{\min}) / T$. At $\tau_{\min}$, $\hat{\alpha}^{(k)}$
is approximately one-hot, and we take $\argmax_j \theta^{(k)}_j$ as the final hard selection.
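The annealing schedule itself is a one-liner; the constants match those stated above:

```python
import numpy as np

T, tau0, tau_min = 500, 1.0, 0.05
eta = np.log(tau0 / tau_min) / T           # decay rate hitting tau_min at t = T

taus = tau0 * np.exp(-eta * np.arange(T + 1))  # tau(t) for t = 0..T
```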
$$z^{(k)} = \text{one\_hot}\!\left(\argmax_j \hat{\alpha}^{(k)}_j\right), \quad \frac{\partial \mathcal{L}}{\partial \theta^{(k)}} \approx \frac{\partial \mathcal{L}}{\partial \hat{\alpha}^{(k)}}$$
In the forward pass, we use the hard one-hot $z^{(k)}$ for evaluation.
In the backward pass, we pass gradients as if we had used the soft $\hat{\alpha}^{(k)}$.
This allows gradient descent while maintaining discrete semantics at inference.
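A NumPy sketch of the forward-pass identity behind the straight-through trick. NumPy has no autograd, so the backward behaviour is only indicated in comments; in an autograd framework the same expression with a detach gives hard values forward and soft gradients backward:

```python
import numpy as np

def one_hot(j, size):
    z = np.zeros(size)
    z[j] = 1.0
    return z

def straight_through(alpha_soft):
    """Forward value of the straight-through estimator.
    In an autograd framework one would write
        z = z_hard + (alpha_soft - alpha_soft.detach())
    so the forward value is exactly z_hard while gradients
    flow through alpha_soft in the backward pass."""
    z_hard = one_hot(int(np.argmax(alpha_soft)), alpha_soft.size)
    # The detach trick is a no-op on forward values:
    return z_hard + (alpha_soft - alpha_soft)

alpha = np.array([0.1, 0.7, 0.2])   # toy soft selection
z = straight_through(alpha)
```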
The safety term cross-references expression against Tabula Sapiens,
the whole-human cell atlas. Any marker pair with non-trivial expression in critical organs
incurs a large penalty. This is incorporated directly into the loss and is differentiable
with respect to $\alpha$ โ the safety signal backpropagates into the marker selection.
$$e^{(k)}_o = \frac{1}{|T_o|} \sum_{i \in T_o} \sigma\!\left(\tilde{x}^{(k)}_i\right), \quad o \in \mathcal{O}_{\text{critical}}$$
where $T_o$ is the set of Tabula Sapiens cells annotated to organ $o$,
and $\mathcal{O}_{\text{critical}} = \{\text{heart, lung, brain, liver, kidney, \ldots}\}$.
This projects the soft gene selection onto the expression level in each critical organ.
$$\mathcal{P}_{\text{safety}}(\alpha^{(1)}, \alpha^{(2)}) = \sum_{o \in \mathcal{O}_{\text{critical}}} w_o \cdot \underbrace{e^{(1)}_o \cdot e^{(2)}_o}_{\text{AND expression in organ } o}$$
The AND-gate structure mirrors the CAR-T targeting logic: toxicity only occurs if
both markers are co-expressed. We penalise the joint expression probability,
not individual expression. $w_o$ are organ-specific weights ($w_{\text{brain}} = w_{\text{heart}} = 10$,
others $= 1$). The full gradient $\partial \mathcal{P}_{\text{safety}} / \partial \theta^{(k)}$
flows back through both $e^{(1)}_o$ and $e^{(2)}_o$.
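A sketch of the organ-weighted safety penalty on synthetic atlas cells; the organ index sets, cell counts, and expression values are invented for illustration, and the brain/heart weights follow the values stated above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def safety_penalty(x1, x2, organ_idx, weights):
    """Weighted AND-gate expression penalty over critical organs.
    organ_idx maps each organ to an index array into the atlas cells."""
    total = 0.0
    for organ, idx in organ_idx.items():
        e1 = sigmoid(x1[idx]).mean()   # mean soft expression of marker 1
        e2 = sigmoid(x2[idx]).mean()   # mean soft expression of marker 2
        total += weights[organ] * e1 * e2
    return total

rng = np.random.default_rng(4)
x1, x2 = rng.normal(size=300), rng.normal(size=300)   # toy expression
organ_idx = {"brain": np.arange(0, 100),
             "heart": np.arange(100, 200),
             "lung": np.arange(200, 300)}
weights = {"brain": 10.0, "heart": 10.0, "lung": 1.0}
P = safety_penalty(x1, x2, organ_idx, weights)
```

Each product $e^{(1)}_o e^{(2)}_o$ lies in $(0, 1)$, so the penalty is bounded above by $\sum_o w_o$.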
$$\frac{\partial \mathcal{L}}{\partial \theta^{(k)}_j} = \left(\frac{\partial \mathcal{L}}{\partial \alpha^{(k)}}\right)^\top \frac{\partial \alpha^{(k)}}{\partial \theta^{(k)}_j}$$
The Jacobian of the softmax has a closed form. For the soft selection $\alpha = \softmax(\theta/\tau)$:
$$\frac{\partial \alpha_j}{\partial \theta_l} = \frac{1}{\tau}\left(\alpha_j \delta_{jl} - \alpha_j \alpha_l\right) = \frac{\alpha_j(\delta_{jl} - \alpha_l)}{\tau}$$
So the full gradient update is:
$$\frac{\partial \mathcal{L}}{\partial \theta^{(k)}} = \frac{1}{\tau}\left(\operatorname{diag}(\alpha^{(k)}) - \alpha^{(k)} {\alpha^{(k)}}^\top\right) \frac{\partial \mathcal{L}}{\partial \alpha^{(k)}}$$
This is a standard softmax gradient, but note that at low $\tau$ it becomes numerically unstable.
We use log-space computations and gradient clipping with $\|\nabla\|_{\max} \leq 1.0$.
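The closed-form Jacobian can be validated against central finite differences; the toy logits and $\tau = 0.5$ are arbitrary test values:

```python
import numpy as np

def softmax(theta, tau):
    z = theta / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def softmax_jacobian(theta, tau):
    """Closed form: (diag(alpha) - alpha alpha^T) / tau."""
    a = softmax(theta, tau)
    return (np.diag(a) - np.outer(a, a)) / tau

theta = np.array([0.3, -1.2, 0.8, 0.0])
tau = 0.5
J = softmax_jacobian(theta, tau)

# Central finite-difference check of each column
eps = 1e-6
J_fd = np.zeros_like(J)
for l in range(theta.size):
    d = np.zeros_like(theta)
    d[l] = eps
    J_fd[:, l] = (softmax(theta + d, tau) - softmax(theta - d, tau)) / (2 * eps)
```

Each column of the Jacobian sums to zero, reflecting that the softmax output always sums to one.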
$$\theta^{(k)}_{t+1} = \theta^{(k)}_t - \eta_t \cdot \hat{m}_t \oslash \left(\sqrt{\hat{v}_t} + \epsilon\right)$$
We use the Adam optimiser with $\eta = 3 \times 10^{-3}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$,
$\epsilon = 10^{-8}$. $\hat{m}_t, \hat{v}_t$ are bias-corrected first and second moment estimates.
Training runs for $T = 500$ steps on a single A100 GPU, completing in under 2 minutes.
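A single bias-corrected Adam step with the hyperparameters above can be sketched as follows; the gradient vector is a toy input:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=3e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moments."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.zeros(4)
m, v = np.zeros(4), np.zeros(4)
grad = np.array([1.0, -2.0, 0.5, 0.0])
theta, m, v = adam_step(theta, grad, m, v, t=1)
```

On the first step the bias-corrected update reduces to roughly $-\eta \cdot \operatorname{sign}(\nabla)$ for nonzero gradient entries, which is what makes Adam scale-invariant per coordinate.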
We believe this formulation is novel.
Specifically: the combination of (1) Gumbel-Softmax relaxation applied to transcriptomic
gene selection, (2) a jointly differentiable AND-gate specificity objective, and
(3) a whole-body safety penalty backpropagating through cross-atlas expression projections –
as a single end-to-end differentiable system – does not appear in the existing literature.
We are actively reviewing to confirm or falsify this.
EXISTING WORK
• Gumbel-Softmax: discrete VAEs (Jang et al. 2017)
• Differentiable gene selection: feature attribution (SHAP, integrated gradients)
• Marker discovery: differential expression (DESeq2, Seurat)
• CAR-T target identification: manual + fold-change filters
OUR CONTRIBUTION
• Gumbel-Softmax applied to gene space for target discovery
• AND-gate specificity made differentiable end-to-end
• Cross-atlas safety penalty backpropagating into selection
• Joint optimisation of specificity + safety in single loss
OPEN QUESTION
If you know of prior work combining differentiable relaxations with cross-atlas safety constraints for biological target discovery, please flag it. We want to understand exactly what, if anything, is new here before making any claims in writing.