CASE STUDY — CACO-2 PERMEABILITY PREDICTION

Pure physics matches the trained-ML state-of-the-art for Caco-2 permeability. Zero training data.

On the Therapeutics Data Commons (TDC) caco2_wang scaffold-stratified test set (182 molecules), the FluxMateria Caco-2 predictor reaches mean absolute error 0.277 log units — within 0.001 of the published state-of-the-art held by trained machine-learning models. It is a deterministic physics computation. It consumes zero Caco-2 training labels at build time.

TDC test MAE (log units): 0.277
Published TDC state-of-the-art MAE (log units): 0.276
Spearman rank correlation: 0.86
Max absolute error (log units): 0.74
Caco-2 labels consumed at build: 0

The challenge

Caco-2 cell permeability is the standard in vitro proxy for oral intestinal absorption and is widely used in pharmaceutical discovery programs. A prediction that is reliable, interpretable, and fast enough for portfolio-scale screening lets medicinal chemists triage scaffolds before a single in vitro plate is ordered, reduces early ADMET spend, and informs which compounds advance to in vivo studies.

The benchmark Caco-2 absorption problem combines several distinct physical mechanisms in a single endpoint. A compound’s apparent permeability across the Caco-2 monolayer depends on passive transcellular diffusion (lipid partition), paracellular tight-junction transport (small flexible molecules), P-glycoprotein and MATE1 efflux at the apical brush border, several active uptake transporters (LAT1 for amino-acid-like substrates, SGLT for sugar-like compounds, PEPT1 for di- and tri-peptides and beta-lactams), and recognised intramolecular bonding chemistry (anthranilic acid salt bridges, 4-quinolone-3-carboxylate chelates, macrocycle foldability) that affects how much polar surface is actually solvent-exposed. Each mechanism applies to a different scaffold class. A predictor needs to cover all of them.

Trained machine-learning models can fit the public Caco-2 benchmarks well by memorising scaffold neighbourhoods from their training distributions. They offer little explanation of why a compound is high- or low-permeability, and their reliability is tied to the chemistry of their training set.

The question

Can a pure-physics Caco-2 predictor — with no machine learning, no training on Caco-2 outcome data, and a deterministic computation that always returns the same answer for the same input — reach the trained-ML state-of-the-art on a public scaffold-stratified benchmark while keeping every prediction mechanism-attributable?

Study design

The benchmark is the Therapeutics Data Commons (TDC) caco2_wang scaffold-stratified test set: 182 molecules whose Bemis-Murcko scaffolds do not appear in the published 728-molecule training-and-validation split. The TDC ADMET group reports state-of-the-art on this set as mean absolute error around 0.276 log units, held by trained ML models that fit the training-and-validation split.
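A scaffold-stratified split keeps every molecule that shares a scaffold on the same side of the train/test boundary. The sketch below illustrates that idea only; it is not TDC's exact procedure. It assumes the scaffold key for each molecule has already been computed elsewhere (for example with a cheminformatics toolkit), and the molecule ids, scaffold strings, and function name are hypothetical.

```python
from collections import defaultdict


def scaffold_split(scaffold_of, test_fraction=0.2):
    """Split molecule ids so that no scaffold appears in both sets.

    scaffold_of: dict mapping molecule id -> precomputed scaffold key
    (assumed input; this sketch does not compute scaffolds itself).
    """
    groups = defaultdict(list)
    for mol_id, scaffold in scaffold_of.items():
        groups[scaffold].append(mol_id)

    # Largest scaffold groups first; ties broken by key so the result is deterministic.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    n_total = len(scaffold_of)
    train_cap = n_total - int(round(test_fraction * n_total))

    train, test = [], []
    for _scaffold, members in ordered:
        # A whole scaffold group goes to one side, never both.
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Hypothetical toy input: five molecules over three scaffolds.
EXAMPLE = {
    "m1": "c1ccccc1",
    "m2": "c1ccccc1",
    "m3": "c1ccncc1",
    "m4": "c1ccncc1",
    "m5": "C1CCCCC1",
}
train_ids, test_ids = scaffold_split(EXAMPLE, test_fraction=0.2)
```

With a 0.2 test fraction, the 728 and 182 molecule counts quoted above correspond to a split of this general shape over 910 molecules.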

The FluxMateria Caco-2 predictor was developed without using the TDC training-and-validation split. The TDC scaffold-stratified test set was held out as a single evaluation cohort. No per-compound tuning was applied.

TDC scaffold-split test (182 molecules)

Public Therapeutics Data Commons benchmark
Bemis-Murcko-scaffold-stratified split
Measured log P_app range −7.50 to −3.90 log cm/s
Held-out single-shot evaluation

Internal cross-cohort (800 molecules)

Sampled from internal 40,974-compound Caco-2 database
Every TDC SMILES removed by exact-string difference
Broader, noisier label set spanning multiple assay protocols
Reported as cross-cohort evidence on wider chemistry
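The exact-string exclusion described for the internal cohort can be sketched as a set difference over SMILES strings. This is a minimal illustration with a hypothetical function name and toy inputs, not the FluxMateria pipeline. Note the general limitation of exact-string matching: two different SMILES strings can encode the same molecule, so a string filter only removes identical spellings unless both lists were written in the same canonical form.

```python
def exclude_exact(internal_smiles, benchmark_smiles):
    """Drop every internal SMILES that appears verbatim in the benchmark list."""
    benchmark = set(benchmark_smiles)
    return [s for s in internal_smiles if s not in benchmark]


# Toy inputs (hypothetical). "CCO" and "OCC" are both valid SMILES for ethanol.
internal = ["CCO", "OCC", "c1ccccc1"]
benchmark = ["CCO"]

remaining = exclude_exact(internal, benchmark)
# "OCC" survives the filter even though it is the same molecule as "CCO",
# because the comparison is on strings, not on structures.
```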

Results overview

FluxMateria reaches mean absolute error 0.277 log units on the TDC caco2_wang scaffold-stratified test set, versus the published state-of-the-art of 0.276. The predictor is a deterministic physics computation; it consumes zero Caco-2 training labels at build time.

MAE on TDC test set: 0.277 (0.001 above the trained-ML SOTA)
Spearman correlation: 0.86 (strong rank ordering)
Max absolute error: 0.74 (bounded tail across 182 compounds)

FluxMateria Caco-2 predictor (pure physics) 0.277

Published TDC state-of-the-art (trained ML) 0.276

FluxMateria reaches the trained-ML state-of-the-art with zero Caco-2 training data.
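For readers who want to check how the three headline metrics are defined, here is a minimal pure-Python sketch of mean absolute error, maximum absolute error, and Spearman rank correlation (computed as Pearson correlation on average ranks). The input pairs are the twelve representative compounds from the mechanism table later in this case study, not the 182-molecule test set, so the outputs will not reproduce the headline figures. Function names are illustrative.

```python
import math

# (measured, predicted) log P_app pairs copied from the representative-compound table.
PAIRS = [
    (-4.92, -4.91),  # Tryptophanamide
    (-4.86, -4.85),  # Serotonin
    (-4.78, -5.09),  # Sparfloxacin
    (-6.60, -6.66),  # Ceftriaxone
    (-5.30, -5.32),  # Cyclic hexapeptide
    (-5.63, -5.78),  # Digoxin
    (-4.52, -5.06),  # Amfenac
    (-4.60, -4.69),  # Pseudostrychnine
    (-4.74, -4.21),  # Chlorprothixene
    (-5.15, -4.90),  # Astemizole
    (-4.60, -5.21),  # 2h research compound
    (-4.51, -5.11),  # EF5264
]


def mean_absolute_error(measured, predicted):
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def max_absolute_error(measured, predicted):
    return max(abs(m - p) for m, p in zip(measured, predicted))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


measured = [m for m, _ in PAIRS]
predicted = [p for _, p in PAIRS]
print(f"MAE         = {mean_absolute_error(measured, predicted):.3f}")
print(f"max |error| = {max_absolute_error(measured, predicted):.2f}")
print(f"Spearman    = {spearman(measured, predicted):.2f}")
```

On these twelve rows the MAE is about 0.265 and the largest absolute error is about 0.61 log units.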

What each mechanism delivers

The state-of-the-art result is not a single trick. It is the cumulative effect of recognised mechanisms from the canonical Caco-2 absorption literature, each addressing a different scaffold class. The table below lists one representative compound per mechanism with its measured and predicted log P_app (log cm/s). Absolute residuals in these rows range from 0.01 to 0.61 log units, so not every row is a near-exact match.

Mechanism class	Representative compound	Measured (log cm/s)	Predicted (log cm/s)	Why it matters
LAT1 large-amino-acid transporter	Tryptophanamide	−4.92	−4.91	Captures aromatic-amino-acid-mimetic uptake. Otherwise under-predicted by 1 log unit on passive-only models.
SERT / OCT-like monoamine	Serotonin	−4.86	−4.85	Small monoamine-aromatic compounds get a transport context naive logD physics misses.
4-quinolone-3-COOH intramol chelate	Sparfloxacin	−4.78	−5.09	The intramolecular hydrogen bond between the C4 ring oxo and the C3 carboxylic acid explains how fluoroquinolones absorb despite a free acid group.
PEPT1 (beta-lactam class)	Ceftriaxone	−6.60	−6.66	Penicillins and cephalosporins use the peptide transporter at the apical Caco-2 brush border. Recognising the four-membered cyclic amide (the beta-lactam ring) identifies the class.
Macrocycle intramol H-bond foldability	Cyclic hexapeptide (cyclosporin class)	−5.30	−5.32	Cyclic peptides fold to bury polar amide backbones, behaving as far less polar than 2D-TPSA suggests. Bjorn-Bohlin / Veber 2002 chameleon behaviour.
SGLT-like sugar-transport pattern	Digoxin	−5.63	−5.78	Glycosides with cis-diol patterns gain transport context that pure-passive models miss.
Anthranilic / β-amino-acid salt bridge	Amfenac	−4.52	−5.06	Aniline ortho to carboxyl forms an intramolecular salt-bridge chelate that masks the acid functionality.
Rigid alkaloid cage shielding	Pseudostrychnine	−4.60	−4.69	Highly-fused rigid cages bury polar atoms in the molecular interior. 2D-TPSA over-counts solvent-exposed surface for these structures.
P-gp efflux with foldability competition	Chlorprothixene	−4.74	−4.21	Lipophilic basic amines engage P-gp, reducing apical-to-basolateral flux. Foldable substrates partially escape recognition.
MATE1 / OCT1 cation efflux	Astemizole	−5.15	−4.90	Compact lipophilic monocations are recognised by MATE1 at the apical brush border — a distinct mechanism from P-gp.
Halogen lipophilic membrane bonding	2h research compound (2×Cl + 3×F)	−4.60	−5.21	F and Cl substituents bond with phospholipid carbonyls. RDKit logP under-counts this membrane-favorable effect for poly-halogenated compounds.
Acyloxymethyl P-gp-bypass prodrug	EF5264 (chimeric ester scaffold)	−4.51	−5.11	Designed P-gp bypasser. The acyloxymethyl linker is the canonical chemistry signature, identified by SMARTS.

The pattern. Each mechanism comes from the canonical Caco-2 absorption literature. None requires fitting to the test set. Together they cover the chemistry that matters: passive transcellular, paracellular, active uptake (LAT1, SGLT, PEPT1, SERT/OCT-like), efflux (P-gp, MATE1), and the intramolecular-bonding chemistry that determines effective polar surface. The result is a predictor that gets the right answer for the right reason.

Why scaffold-out generalisation matters

Trained-ML Caco-2 models are fit on the TDC training-and-validation split (728 molecules) before evaluation on the 182-molecule scaffold-stratified test set. The split is scaffold-based, so trained models cannot trivially memorise the test scaffolds. They can, however, memorise scaffold neighbourhoods: chemistries close to the training distribution will be predicted well; chemistries far from the training distribution may be predicted poorly without any signal that confidence has dropped.

The FluxMateria predictor sees no Caco-2 training data at build time. There is no training distribution. Test-set performance reflects how well canonical Caco-2 absorption mechanisms generalise, not how well the chemistry happens to overlap with a training corpus.

This matters most for novel chemistry — the exact regime where decision-makers most need a reliable signal. A pure-physics predictor that reaches state-of-the-art on a scaffold-stratified benchmark gives a signal whose generalisation properties match the mechanism inventory, not the historical training distribution.

Pure-physics predictor

Deterministic — same input always returns same output.
Zero Caco-2 training labels consumed at build time.
Every prediction is mechanism-attributable to a published Caco-2 absorption pathway.
Generalises to chemistries far from the training corpus because there is no training corpus.
Predictably honest at the edge: a prediction is flagged when logP, TPSA, or MW falls outside the calibrated range.
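The edge-of-range flag in the list above can be sketched as a simple descriptor range check. The numeric limits below are hypothetical placeholders chosen only so the example runs; the case study does not state the actual calibrated range, and the names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptorRange:
    low: float
    high: float


# HYPOTHETICAL limits for illustration only; not FluxMateria's calibrated range.
RANGES = {
    "logp": DescriptorRange(-2.0, 6.0),
    "tpsa": DescriptorRange(0.0, 200.0),   # square angstroms
    "mw": DescriptorRange(100.0, 1000.0),  # daltons
}


def out_of_range_flags(descriptors):
    """Return the names of descriptors that fall outside their range."""
    flags = []
    for name, rng in RANGES.items():
        value = descriptors[name]
        if not (rng.low <= value <= rng.high):
            flags.append(name)
    return flags


in_range = out_of_range_flags({"logp": 3.1, "tpsa": 75.0, "mw": 350.0})
extreme = out_of_range_flags({"logp": 7.5, "tpsa": 250.0, "mw": 1200.0})
```

A caller would attach the returned flag list to the prediction so that a reviewer can see which descriptor triggered the warning.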

Trained-ML reference

Fit on the TDC training-and-validation split (728 molecules).
Strong on chemistry close to the training distribution.
Feature importance is a model-internal score, not a published Caco-2 absorption mechanism.
Confidence behaviour on far-from-training chemistry is opaque without explicit out-of-distribution probes.
State-of-the-art MAE 0.276 log units on the scaffold-stratified test set.

What this means for ADMET screening

Three structural implications follow from a pure-physics predictor reaching the trained-ML state-of-the-art on a public Caco-2 benchmark.

1. Mechanism attribution is now table stakes.

Every Caco-2 prediction comes with a mechanism trace: which transporter applies, which intramolecular-bonding chemistry was detected, which efflux pathway was triggered. Medicinal chemists can design around liabilities because they know what those liabilities are. Safety reviewers can challenge calls because the reasoning is explicit.
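One way to picture a mechanism trace is as a small record attached to each prediction. The sketch below is a hypothetical data structure, not the FluxMateria output schema; the class names, field names, and text layout are assumptions. The example values come from the Chlorprothixene row of the mechanism table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MechanismHit:
    category: str  # e.g. "uptake", "efflux", "intramolecular"
    name: str      # e.g. "P-gp", "LAT1"
    note: str      # short human-readable reason


@dataclass
class Caco2Prediction:
    compound: str
    log_papp: float
    mechanisms: List[MechanismHit] = field(default_factory=list)

    def trace(self) -> str:
        """Render the prediction and every mechanism that contributed to it."""
        header = f"{self.compound}: predicted log P_app {self.log_papp:.2f}"
        if not self.mechanisms:
            return header + " (no mechanism flags)"
        lines = [f"  [{hit.category}] {hit.name}: {hit.note}" for hit in self.mechanisms]
        return "\n".join([header] + lines)


prediction = Caco2Prediction(
    compound="Chlorprothixene",
    log_papp=-4.21,
    mechanisms=[MechanismHit("efflux", "P-gp", "lipophilic basic amine engages P-gp")],
)
print(prediction.trace())
```

Keeping the trace as structured data rather than free text lets a reviewer filter predictions by mechanism, for example to list every compound flagged for efflux.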

2. Generalisation is built in, not measured after the fact.

A predictor that does not consume Caco-2 training data does not have a training-distribution boundary. The test-set MAE is a generalisation result, not a held-out interpolation result. Novel scaffolds get a signal whose reliability is governed by the mechanism inventory, not by how close the chemistry is to historical compounds.

3. The benchmark race converges; the workflow value diverges.

When public benchmark performance is at parity between a trained-ML SOTA and a pure-physics predictor, the differentiator becomes everything downstream of the number: mechanism attribution, reasoning trace, out-of-distribution honesty, and deterministic behaviour.

Reproducibility

The benchmark evidence package and methodology note are available for independent scientific review.

Caco-2 benchmark summary (JSON): headline claim, primary public comparison, cross-cohort evidence, scaffold-class coverage, and interpretation policy.
Benchmark matrix (CSV): mode-by-mode metrics, including rows, MAE, percentile errors, bias, Spearman, Pearson, and throughput.
Per-compound results (CSV): every compound in the TDC test set with measured value, FluxMateria prediction, and absolute error.
Compound-class coverage (CSV): scaffold and transport classes addressed, with representative compound, measured value, and prediction.
Methodology note (Markdown): benchmark scope, evaluation modes, metrics, interpretation policy, and use boundary.
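An independent reviewer could audit the per-compound CSV by recomputing each absolute error from the measured and predicted columns and then recomputing the MAE. The sketch below assumes a hypothetical column layout (compound, measured, predicted, abs_error); the real file's headers may differ. The sample rows reuse three compounds from the mechanism table, with the abs_error column computed here for illustration.

```python
import csv
import io

# Hypothetical column names; sample values taken from the mechanism table.
SAMPLE_CSV = """compound,measured,predicted,abs_error
Tryptophanamide,-4.92,-4.91,0.01
Sparfloxacin,-4.78,-5.09,0.31
Ceftriaxone,-6.60,-6.66,0.06
"""


def audit_per_compound(csv_text, tolerance=0.005):
    """Recompute |measured - predicted| per row and the overall MAE.

    Returns (mae, mismatches), where mismatches lists compounds whose
    reported abs_error differs from the recomputed value by more than
    the tolerance (to allow for rounding in the file).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = 0.0
    mismatches = []
    for row in rows:
        error = abs(float(row["measured"]) - float(row["predicted"]))
        total += error
        if abs(error - float(row["abs_error"])) > tolerance:
            mismatches.append(row["compound"])
    return total / len(rows), mismatches


mae, mismatches = audit_per_compound(SAMPLE_CSV)
print(f"recomputed MAE = {mae:.3f}, mismatched rows = {mismatches}")
```

Run against the full 182-row file, the recomputed MAE should agree with the reported 0.277 log units if the file and the headline figure are consistent.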

See the Caco-2 predictor in the full ADMET context

The Caco-2 result is one endpoint in a unified ADMET pipeline that returns absorption, distribution, metabolism, excretion, and toxicity profiles for any compound.
