Caco-2 Permeability Benchmark

State-of-the-art — pure physics, zero fitting

FluxMateria matches the trained-ML state-of-the-art on Caco-2 permeability without consuming any Caco-2 training labels.

On the Therapeutics Data Commons (TDC) caco2_wang scaffold-stratified test set (182 molecules), the FluxMateria predictor reaches a mean absolute error (MAE) of 0.277 log units, within 0.001 of the published TDC state-of-the-art MAE of 0.276. The predictor is a deterministic physics computation (the same SMILES always returns the same prediction) and consumes zero Caco-2 training labels at build time. On the same test set, Spearman correlation is 0.86, Pearson correlation is 0.88, and the maximum absolute error on any single compound is 0.74 log units.

0.277: mean absolute error on the TDC caco2_wang test set (log units)

0.276: published TDC state-of-the-art reference MAE (log units)

0.86: Spearman rank correlation on the test set

53%: share of test compounds with absolute error ≤ 0.276 log units (the state-of-the-art MAE)

0: Caco-2 training labels consumed at build time
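
The headline figures are standard regression metrics over (measured, predicted) pairs. The minimal Python sketch below shows how they can be recomputed; it is not FluxMateria's evaluation code. It assumes that the "within threshold" share means absolute error ≤ 0.276 log units, that P90 uses linear interpolation (NumPy's default convention), and that Spearman correlation is Pearson correlation on tie-averaged ranks. All function names are illustrative.

```python
import math
from statistics import mean


def rank_average(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def headline_metrics(measured, predicted, threshold=0.276):
    residuals = [p - m for m, p in zip(measured, predicted)]  # predicted minus measured
    abs_err = sorted(abs(r) for r in residuals)
    return {
        "mae": mean(abs_err),
        "bias": mean(residuals),
        "p90_abs_error": percentile(abs_err, 0.90),
        "max_abs_error": abs_err[-1],
        "frac_within_threshold": sum(e <= threshold for e in abs_err) / len(abs_err),
        "pearson": pearson(measured, predicted),
        "spearman": pearson(rank_average(measured), rank_average(predicted)),
    }


# Toy values for illustration only, not benchmark data.
print(headline_metrics([-5.0, -5.5, -6.0, -4.5], [-5.1, -5.4, -6.3, -4.5]))
```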

Benchmark evidence

The headline claim is anchored to the comparable TDC public scaffold-split test set, then triangulated against a much larger internal development cohort that excludes all TDC SMILES.

Evidence layer	Metric	FluxMateria result	Comparator / context	Interpretation
TDC `caco2_wang` scaffold-split test (n=182)	Mean absolute error	0.277 log units	Published TDC state-of-the-art reference 0.276 on the same scaffold split, held by trained ML.	Pure physics matches the trained-ML state-of-the-art with no Caco-2 training data.
TDC `caco2_wang` test: rank quality	Spearman / Pearson correlation	0.860 / 0.880	Strong rank ordering for portfolio prioritisation use cases.	Useful for ranking and triaging candidate lists, not only for absolute prediction.
TDC `caco2_wang` test: tail behavior	P90 / max absolute error	0.54 / 0.74 log units	Tail is bounded, with no catastrophic outliers.	Predictable tail behavior is critical for screening use cases.
TDC `caco2_wang` test: bias	Predicted minus measured	−0.002 log units	Mean bias is essentially zero.	The remaining error is scatter around the measured values, not a global offset that a constant calibration shift could remove.
Internal development cohort (n=800)	Mean absolute error	1.16 log units	Baseline on the same cohort: 1.88, a reduction of 0.72 log units.	The improvement over the baseline carries over to the broader cohort, with no sign of test-set overfitting.
Throughput	Molecules per second	97 / s	Single CPU thread, no acceleration.	A 10,000-compound library takes about 1.7 minutes (10,000 / 97 ≈ 103 s).
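
The throughput row converts to screening time by simple division. This sketch makes the arithmetic explicit; it assumes a constant single-thread rate and ignores file I/O and SMILES parsing overhead. `screening_time` is a hypothetical helper name.

```python
def screening_time(n_molecules: int, rate_per_second: float = 97.0) -> float:
    """Estimated single-thread wall-clock time in seconds at a fixed rate."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return n_molecules / rate_per_second


seconds = screening_time(10_000)
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")  # prints "103 s (~1.7 min)"
```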

Benchmark interpretation: The result shows that a pure-physics, deterministic computation can match trained-ML state-of-the-art accuracy on a public scaffold-stratified Caco-2 permeability benchmark. Because the predictor consumes no Caco-2 training data, it has no training scaffolds to memorise, so the scaffold-split test result reflects true scaffold-out generalisation rather than memorisation of scaffold neighborhoods seen in training.

Scaffold and transport coverage

The pure-physics predictor covers the canonical Caco-2 absorption mechanisms documented in the literature: passive transcellular diffusion, paracellular tight-junction transport, P-glycoprotein and MATE1 efflux, several active uptake transporters, and recognised intramolecular-bonding effects on effective polar surface area.

Transport / mechanism family	Substrate class	Example compound	Measured (log cm/s)	Predicted (log cm/s)
Passive transcellular	Drug-like neutrals	Caffeine, Aspirin, Antipyrine	~ −5.0 to −5.5	Within 0.3 log unit
Paracellular tight-junction	Small flexible molecules	Atenolol-class	~ −5.5 to −6.0	Within 0.3 log unit
LAT1 large neutral amino acid transporter	Tryptophan / phenylalanine mimetics	Tryptophanamide	−4.92	−4.91
SERT / OCT-like monoamine	Small monoamine + aromatic	Serotonin	−4.86	−4.85
4-quinolone-3-carboxylate intramolecular chelate	Fluoroquinolone antibiotics	Sparfloxacin	−4.78	−5.09
PEPT1 peptide transporter	Beta-lactams (penicillins, cephalosporins)	Ceftriaxone	−6.60	−6.66
Macrocycle intramolecular H-bond foldability	Cyclic peptides (cyclosporin class)	Cyclic hexapeptide 09	−5.30	−5.32
SGLT-like sugar transport	Glycoside drugs	Digoxin	−5.63	−5.78
Anthranilic / β-amino-acid salt bridge	Anthranilic acid analogs	Amfenac	−4.52	−5.06
P-gp efflux with foldability competition	Lipophilic basic amines	Chlorprothixene	−4.74	−4.21
MATE1 / OCT1 cation efflux	Compact lipophilic mono-cations	Astemizole	−5.15	−4.90
Rigid alkaloid cage shielding	Strychnine / morphinan class	Pseudostrychnine	−4.60	−4.69
Halogen lipophilic membrane bonding	Polyhalogenated lipophilics	2h research compound	−4.60	−5.21
Acyloxymethyl P-gp-bypass prodrug	Chimeric ester linkers	EF5264	−4.51	−5.11
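
To see how close each representative compound lands, the absolute error |predicted − measured| can be computed row by row. The sketch below copies the measured and predicted values from the coverage table; the passive and paracellular rows are omitted because they are reported as ranges, and the shortened family labels and helper names are illustrative.

```python
# (mechanism family, measured, predicted) in log cm/s, copied from the coverage table
ROWS = [
    ("LAT1", -4.92, -4.91),
    ("SERT / OCT-like monoamine", -4.86, -4.85),
    ("4-quinolone-3-carboxylate chelate", -4.78, -5.09),
    ("PEPT1", -6.60, -6.66),
    ("Macrocycle H-bond foldability", -5.30, -5.32),
    ("SGLT-like sugar transport", -5.63, -5.78),
    ("Anthranilic / beta-amino-acid salt bridge", -4.52, -5.06),
    ("P-gp efflux with foldability competition", -4.74, -4.21),
    ("MATE1 / OCT1 cation efflux", -5.15, -4.90),
    ("Rigid alkaloid cage shielding", -4.60, -4.69),
    ("Halogen lipophilic membrane bonding", -4.60, -5.21),
    ("Acyloxymethyl P-gp-bypass prodrug", -4.51, -5.11),
]


def abs_errors(rows):
    """Absolute error per family, rounded to the table's two-decimal precision."""
    return {name: round(abs(pred - meas), 2) for name, meas, pred in rows}


def fraction_within(rows, tol):
    """Share of rows whose rounded absolute error is at most tol log units."""
    errs = abs_errors(rows)
    return sum(e <= tol for e in errs.values()) / len(errs)


for name, err in sorted(abs_errors(ROWS).items(), key=lambda kv: kv[1]):
    print(f"{err:.2f}  {name}")
```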

Mechanism attribution. Every prediction carries a documented mechanism attribution. A user can ask why a compound was ranked as low or high permeability and get an answer rooted in the canonical Caco-2 absorption literature rather than a model-internal feature-importance score. This is the structural advantage of pure-physics prediction over trained ML models.

Generalisation evidence

The TDC scaffold-split test set is one cohort. A separate internal cohort — sampled from a much larger 40,974-compound Caco-2 database with all TDC SMILES removed — provides cross-cohort evidence on broader chemistry.
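
The page does not specify how the 800-compound cohort was hash-sampled. One common way to get a reproducible, order-independent sample is to drop the excluded SMILES and keep the entries with the smallest hash values. The sketch below assumes SHA-256, a hypothetical salt string, and SMILES that are already canonicalised, so exact string matching stands in for structural identity; `hash_key` and `hash_sample` are illustrative names.

```python
import hashlib


def hash_key(smiles: str, salt: str = "caco2-cohort") -> int:
    """Deterministic 256-bit key for a SMILES string (the salt is a hypothetical choice)."""
    digest = hashlib.sha256(f"{salt}:{smiles}".encode("utf-8")).hexdigest()
    return int(digest, 16)


def hash_sample(database, excluded, k):
    """Remove excluded SMILES, deduplicate, then keep the k smallest hash keys.

    Assumes SMILES are pre-canonicalised; a real pipeline would canonicalise
    both lists first so that equivalent structures compare equal.
    """
    excluded_set = set(excluded)
    candidates = sorted({s for s in database if s not in excluded_set}, key=hash_key)
    return candidates[:k]


# Toy inputs for illustration only.
print(hash_sample(["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCCl", "O"], ["CCO", "O"], 3))
```

Sorting by a keyed hash makes the sample independent of input order and reproducible from the inputs alone, without storing a random seed.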

Cohort	Source	Size	FluxMateria MAE (log units)
TDC `caco2_wang` test	Public scaffold-stratified test split	182	0.277
Internal cross-cohort	40,974-compound Caco-2 database minus all TDC SMILES, hash-sampled to 800	800	1.161

Why cross-cohort MAE is higher. The 40,974-compound Caco-2 database aggregates data from many assay protocols (P_app and P_eff endpoints, retentate- versus receiver-side readouts, and a range of pH conditions). The TDC caco2_wang test set is a curated single-protocol subset with substantially lower label noise. The cross-cohort result is reported to show that, on much broader chemistry than the curated benchmark set, the predictor still improves on the baseline (MAE 1.16 versus 1.88 log units).

Output coverage

The Caco-2 predictor returns the evidence safety teams need to decide what to do next.

Output layer	What the reviewer sees	Why it matters
log P_app prediction	Numeric apparent permeability in log cm/s.	Supports portfolio ranking against permeability thresholds.
Permeability class	High / medium / low classification with thresholds.	Aligns with industry triage workflows.
Efflux risk	P-glycoprotein efflux probability and class (low / moderate / high).	Separates intrinsic membrane permeability from active efflux concerns.
Route attribution	Calibration route used (high_perm / standard / low_perm).	Supports route-aware uncertainty quantification.
Conformal interval	90% conformal prediction interval around the point prediction.	Quantifies prediction uncertainty for governance review.
Mechanism drivers	Human-readable explanation of which mechanisms contributed.	Makes the prediction reviewable and challengeable.
Flags	Out-of-applicability warnings (extreme logP, high TPSA, large MW).	Honest signaling when chemistry is at the edge of the model.
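
The page does not say which nonconformity score or calibration data back the 90% interval. For orientation, the generic split-conformal recipe uses absolute residuals on a held-out calibration set: the interval half-width is the ⌈(n+1)(1−α)⌉-th smallest residual, with α = 0.10 for 90% coverage. The sketch computes one half-width per calibration route, which is only a hypothetical reading of the route-aware uncertainty row; all function names are illustrative.

```python
import math


def conformal_quantile(cal_abs_residuals, alpha=0.10):
    """Split-conformal half-width: the ceil((n + 1)(1 - alpha))-th smallest residual."""
    n = len(cal_abs_residuals)
    # round() guards against float noise such as 18.000000000000004 -> ceil 19
    k = math.ceil(round((n + 1) * (1 - alpha), 9))
    if k > n:
        return math.inf  # too few calibration points for the requested coverage
    return sorted(cal_abs_residuals)[k - 1]


def conformal_interval(point, half_width):
    """Symmetric interval around a point prediction (log cm/s)."""
    return point - half_width, point + half_width


def route_quantiles(cal_records, alpha=0.10):
    """cal_records: iterable of (route, measured, predicted); one half-width per route."""
    by_route = {}
    for route, measured, predicted in cal_records:
        by_route.setdefault(route, []).append(abs(predicted - measured))
    return {route: conformal_quantile(res, alpha) for route, res in by_route.items()}


# Hypothetical usage with a calibration set of (route, measured, predicted) tuples:
# half_widths = route_quantiles(calibration_records)
# lo, hi = conformal_interval(-5.2, half_widths["standard"])
```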

Download benchmark package

Machine-readable results and methodology for independent scientific review.

Caco-2 benchmark evidence package

Summary JSON

Headline claim, primary public comparison, cross-cohort evidence, scaffold-class coverage, and interpretation policy.


Benchmark matrix CSV

Mode-by-mode metrics: row count, MAE, percentile errors, bias, Spearman, Pearson, throughput.


Per-compound CSV

Every compound in the TDC test set with measured value, FluxMateria prediction, and absolute error.


Compound-class coverage CSV

Scaffold and transport classes addressed, with representative compound, measured value, and prediction.


Methodology note

Benchmark scope, evaluation modes, metrics, interpretation policy, and use boundary.

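
Reviewers can check the per-compound file against the headline MAE. The sketch below assumes hypothetical column names (`smiles`, `measured`, `predicted`, `abs_error`); rename them to match the header of the actual file. It recomputes each absolute error, reports rows whose stored value disagrees, and returns the overall MAE.

```python
import csv
from statistics import mean


def check_per_compound(lines, tol=1e-3):
    """Recompute MAE and flag rows whose stored abs_error column disagrees.

    `lines` is any iterable of CSV text lines, e.g. an open file handle.
    Column names are hypothetical placeholders.
    """
    errors = []
    mismatches = []
    for row in csv.DictReader(lines):
        err = abs(float(row["predicted"]) - float(row["measured"]))
        if abs(err - float(row["abs_error"])) > tol:
            mismatches.append(row["smiles"])
        errors.append(err)
    return {"n": len(errors), "mae": mean(errors), "mismatched_rows": mismatches}
```

Passing a file opened with `newline=""` works because `csv.DictReader` accepts any iterable of lines. If the file matches the headline figures, `n` should be 182 and `mae` should round to 0.277.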

How to read the claim

What we claim

State-of-the-art Caco-2 permeability prediction from pure physics on the public TDC caco2_wang scaffold-stratified test set.
Mean absolute error 0.277 log units; published TDC state-of-the-art reference 0.276.
Zero parameters fitted to Caco-2 data — the predictor is a deterministic physics computation.
Mechanism attribution for every prediction — rooted in canonical Caco-2 absorption literature.
Fast enough for portfolio-scale screening: ~97 molecules / second on a single CPU thread.

Boundary conditions

The Caco-2 predictor supports screening, prioritisation, and scientific review — it does not replace in vitro Caco-2 assays for regulatory submission.
The Caco-2 monolayer is a proxy for healthy small-intestinal epithelium; it does not model blood-brain-barrier, inflamed-bowel, or skin permeability.
The TDC test set is a curated single-protocol subset; broader real-world Caco-2 cohorts include more label noise.
Predictions at the edge of the model (extreme logP, very high TPSA, very large MW) are flagged in the output.
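
The edge-of-model flags can be sketched as simple range checks on precomputed descriptors. The cut-offs below are hypothetical placeholders, not FluxMateria's limits (the page does not state them); the 140 Å² TPSA value echoes a commonly used oral-bioavailability rule of thumb. `ApplicabilityLimits` and `applicability_flags` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicabilityLimits:
    # Hypothetical cut-offs for illustration only.
    logp_min: float = -2.0
    logp_max: float = 6.0
    tpsa_max: float = 140.0  # Å²
    mw_max: float = 900.0    # Da


def applicability_flags(logp, tpsa, mw, limits=ApplicabilityLimits()):
    """Return out-of-applicability warnings for precomputed logP, TPSA and MW."""
    flags = []
    if not limits.logp_min <= logp <= limits.logp_max:
        flags.append("extreme_logp")
    if tpsa > limits.tpsa_max:
        flags.append("high_tpsa")
    if mw > limits.mw_max:
        flags.append("large_mw")
    return flags


print(applicability_flags(logp=7.5, tpsa=200.0, mw=1200.0))
```

A frozen dataclass keeps the limits immutable, so the shared default instance cannot be modified by accident between calls.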

Review the Caco-2 predictor in context

The detailed Caco-2 benchmark should be read alongside the full ADMET benchmark and the Caco-2 case study.

