
Caco-2 Permeability Benchmark (Pure Physics)

Pure-physics Caco-2 cell permeability prediction reaches state-of-the-art performance on the public Therapeutics Data Commons (TDC) caco2_wang scaffold-stratified test set — with zero parameters fitted to Caco-2 data.

State-of-the-art — pure physics, zero fitting

FluxMateria matches the trained-ML state of the art on Caco-2 permeability without consuming any Caco-2 training labels.

On the Therapeutics Data Commons (TDC) caco2_wang scaffold-stratified test set (182 molecules), the FluxMateria predictor reaches a mean absolute error (MAE) of 0.277 log units, within 0.001 of the published TDC state-of-the-art MAE of 0.276. The predictor is a deterministic physics computation (the same SMILES always returns the same prediction) and consumes zero Caco-2 training labels at build time. On the same test set, Spearman rank correlation is 0.86, Pearson correlation is 0.88, and the maximum absolute error on any single compound is 0.74 log units.

  • 0.277: mean absolute error on the TDC caco2_wang test set (log units)
  • 0.276: published TDC state-of-the-art reference MAE (log units)
  • 0.86: Spearman rank correlation on the test set
  • 53%: share of test compounds with absolute error ≤ 0.276 log units (the state-of-the-art MAE)
  • 0: Caco-2 training labels consumed at build time

Benchmark evidence

The headline claim is anchored to the comparable TDC public scaffold-split test set, then triangulated against a much larger internal development cohort that excludes all TDC SMILES.

Evidence layer | Metric | FluxMateria result | Comparator / context | Interpretation
TDC caco2_wang scaffold-split test (n=182) | Mean absolute error | 0.277 log units | Published TDC state-of-the-art reference of 0.276 on the same scaffold split, held by trained ML. | Pure physics matches the trained-ML state of the art with no Caco-2 training data.
TDC caco2_wang test: rank quality | Spearman / Pearson correlation | 0.860 / 0.880 | Strong rank ordering for portfolio prioritisation. | Useful for ranking and triaging candidate lists, not only for absolute prediction.
TDC caco2_wang test: tail behaviour | P90 / maximum absolute error | 0.54 / 0.74 log units | The tail is bounded; there is no catastrophic outlier. | Predictable tail behaviour is critical for screening use cases.
TDC caco2_wang test: bias | Mean of predicted minus measured | −0.002 log units | Bias is essentially zero. | A global calibration offset would not improve the result; the errors are spread around zero rather than systematically shifted.
Internal development cohort (n=800) | Mean absolute error | 1.16 log units | Baseline on the same cohort: 1.88 (a reduction of 0.72 log units). | The improvement holds on the broader cohort, with no sign of overfitting to the test set.
Throughput | Molecules per second | 97 / s | Single CPU thread, no acceleration. | Fast enough to screen a 10,000-compound library in under 3 minutes.

Benchmark interpretation: The result demonstrates that pure-physics, deterministic computation can match the trained-ML state of the art on a public scaffold-stratified Caco-2 permeability benchmark. Because the predictor consumes no Caco-2 training data, the scaffold-split test result reflects true out-of-scaffold generalisation rather than memorisation of scaffold neighbourhoods.
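A scaffold split groups molecules by their core ring-and-linker scaffold (typically the Bemis-Murcko scaffold) and assigns each whole group to one partition, so no test scaffold appears in training. The sketch below is a DeepChem-style greedy split, shown only to illustrate the idea; it is not necessarily the procedure TDC used to build caco2_wang, and the 70/10/20 fractions and the injected scaffold_of function are assumptions.

```python
from collections import defaultdict
from typing import Callable, Sequence


def scaffold_split(
    smiles: Sequence[str],
    scaffold_of: Callable[[str], str],
    frac_train: float = 0.7,
    frac_valid: float = 0.1,
):
    """Assign whole scaffold groups to train/valid/test; returns index lists."""
    groups = defaultdict(list)
    for idx, smi in enumerate(smiles):
        groups[scaffold_of(smi)].append(idx)

    # Largest groups first; ties broken by first index so the split is deterministic.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(smiles)
    train_cut = frac_train * n
    valid_cut = train_cut + frac_valid * n
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cut:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cut:
            valid.extend(group)
        else:
            test.extend(group)  # whatever does not fit earlier lands in test
    return train, valid, test
```

In practice scaffold_of could wrap RDKit, for example `lambda s: MurckoScaffold.MurckoScaffoldSmiles(smiles=s)` after `from rdkit.Chem.Scaffolds import MurckoScaffold`. Filling the training partition with the largest scaffold families first leaves rarer scaffolds for the test partition, which is why scaffold-split scores are usually harder to achieve than random-split scores.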

Scaffold and transport coverage

The pure-physics predictor covers the canonical Caco-2 absorption mechanisms documented in the literature: passive transcellular diffusion, paracellular transport through tight junctions, P-glycoprotein and MATE1 efflux, several active uptake transporters, and intramolecular interactions (hydrogen bonding, chelation, and salt bridges) that change the effective polar surface area.

Transport / mechanism family | Substrate class | Example compound | Measured (log cm/s) | Predicted (log cm/s)
Passive transcellular | Drug-like neutrals | Caffeine, aspirin, antipyrine | ~−5.0 to −5.5 | Within 0.3 log units
Paracellular tight-junction | Small flexible molecules | Atenolol class | ~−5.5 to −6.0 | Within 0.3 log units
LAT1 large-amino-acid transporter | Tryptophan / phenylalanine mimetics | Tryptophanamide | −4.92 | −4.91
SERT / OCT-like monoamine | Small monoamine + aromatic | Serotonin | −4.86 | −4.85
4-quinolone-3-carboxylate intramolecular chelate | Fluoroquinolone antibiotics | Sparfloxacin | −4.78 | −5.09
PEPT1 peptide transporter | Beta-lactams (penicillins, cephalosporins) | Ceftriaxone | −6.60 | −6.66
Macrocycle intramolecular H-bond foldability | Cyclic peptides (cyclosporin class) | Cyclic hexapeptide 09 | −5.30 | −5.32
SGLT-like sugar transport | Glycoside drugs | Digoxin | −5.63 | −5.78
Anthranilic / β-amino-acid salt bridge | Anthranilic acid analogs | Amfenac | −4.52 | −5.06
P-gp efflux with foldability competition | Lipophilic basic amines | Chlorprothixene | −4.74 | −4.21
MATE1 / OCT1 cation efflux | Compact lipophilic mono-cations | Astemizole | −5.15 | −4.90
Rigid alkaloid cage shielding | Strychnine / morphinan class | Pseudostrychnine | −4.60 | −4.69
Halogen lipophilic membrane bonding | Polyhalogenated lipophilics | 2h research compound | −4.60 | −5.21
Acyloxymethyl P-gp-bypass prodrug | Chimeric ester linkers | EF5264 | −4.51 | −5.11
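The table lists measured and predicted values but not the per-compound error. The short sketch below computes it for the twelve rows that give single values (the two range rows are left out); the constant and function names are our own. On these rows the absolute errors run from 0.01 log units (tryptophanamide, serotonin) to 0.61 (compound 2h), and seven of the twelve fall within 0.3 log units.

```python
# (compound, measured, predicted) in log10 cm/s, copied from the table above.
REPRESENTATIVE_ROWS = [
    ("Tryptophanamide", -4.92, -4.91),
    ("Serotonin", -4.86, -4.85),
    ("Sparfloxacin", -4.78, -5.09),
    ("Ceftriaxone", -6.60, -6.66),
    ("Cyclic hexapeptide 09", -5.30, -5.32),
    ("Digoxin", -5.63, -5.78),
    ("Amfenac", -4.52, -5.06),
    ("Chlorprothixene", -4.74, -4.21),
    ("Astemizole", -5.15, -4.90),
    ("Pseudostrychnine", -4.60, -4.69),
    ("2h research compound", -4.60, -5.21),
    ("EF5264", -4.51, -5.11),
]


def absolute_errors(rows):
    """Map compound name -> |predicted - measured|, rounded to table precision."""
    return {name: round(abs(pred - meas), 2) for name, meas, pred in rows}


errors = absolute_errors(REPRESENTATIVE_ROWS)
worst = max(errors, key=errors.get)
mean_err = sum(errors.values()) / len(errors)
```

These rows are hand-picked illustrations of each mechanism family, so their mean absolute error (about 0.27) should not be compared with the test-set MAE.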

Mechanism attribution. Every prediction comes with a documented mechanism attribution. A user can ask why a compound was predicted to have low or high permeability and get an answer rooted in the canonical Caco-2 absorption literature, rather than a model-internal feature-importance score. This is the structural advantage of pure-physics prediction over trained ML.

Generalisation evidence

The TDC scaffold-split test set is one cohort. A separate internal cohort — sampled from a much larger 40,974-compound Caco-2 database with all TDC SMILES removed — provides cross-cohort evidence on broader chemistry.

Cohort | Source | Size | FluxMateria MAE (log units)
TDC caco2_wang test | Public scaffold-stratified test split | 182 | 0.277
Internal cross-cohort | 40,974-compound Caco-2 database minus all TDC SMILES, hash-sampled to 800 | 800 | 1.161
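The page does not say exactly how the 800 compounds were hash-sampled. One common deterministic approach, sketched here under that assumption, removes the excluded SMILES and keeps the k entries with the smallest SHA-256 digests: the selection is reproducible, independent of file order, and unrelated to chemistry. The function name hash_sample and the choice of SHA-256 are illustrative.

```python
import hashlib
from typing import Iterable


def hash_sample(smiles: Iterable[str], exclude: Iterable[str], k: int) -> list:
    """Deterministically choose k SMILES not in `exclude`, ranked by SHA-256 digest.

    Assumes SMILES are already canonicalised so that string equality means
    molecular identity; otherwise excluded molecules could slip through.
    """
    excluded = set(exclude)
    candidates = set(smiles) - excluded  # also drops duplicate entries
    ranked = sorted(candidates, key=lambda s: hashlib.sha256(s.encode("utf-8")).hexdigest())
    return ranked[:k]
```

For example, `hash_sample(database_smiles, tdc_smiles, 800)` (both inputs hypothetical) would yield a fixed 800-compound cohort, and rerunning it on the same inputs returns the same list.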

Why cross-cohort MAE is higher. The 40,974-compound Caco-2 database aggregates data across many assay protocols (P_app, P_eff, log retentate vs receiver, various pH conditions). The TDC caco2_wang test set is a curated single-protocol subset with substantially lower label noise. The cross-cohort result is reported to show that, on chemistry much broader than the curated benchmark set, the predictor still improves on the baseline (MAE 1.16 vs 1.88 log units).
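One basic harmonisation step when pooling Caco-2 data is expressing every value on the log10 cm/s scale used on this page, since permeabilities are reported in several units (for example 10⁻⁶ cm/s or nm/s). The sketch below handles only that unit scaling, with assumed unit labels; it cannot reconcile genuinely different quantities such as P_app versus P_eff, or measurements taken at different pH.

```python
import math

# Multipliers that convert a reported permeability into cm/s.
# The unit labels are illustrative; real data sets use their own conventions.
TO_CM_PER_S = {
    "cm/s": 1.0,
    "1e-6 cm/s": 1e-6,  # a common way Caco-2 P_app is tabulated
    "nm/s": 1e-7,       # 1 nm/s = 1e-9 m/s = 1e-7 cm/s
}


def log_papp(value: float, unit: str) -> float:
    """Return log10 of the permeability expressed in cm/s."""
    if value <= 0:
        raise ValueError("permeability must be positive to take a logarithm")
    return math.log10(value * TO_CM_PER_S[unit])
```

For instance, a P_app of 10 × 10⁻⁶ cm/s and one of 100 nm/s both map to −5.0 log cm/s.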

Output coverage

The Caco-2 predictor returns the evidence safety teams need to decide what to do next.

Output layer | What the reviewer sees | Why it matters
log P_app prediction | Numeric apparent permeability in log cm/s. | Supports portfolio ranking against permeability thresholds.
Permeability class | High / medium / low classification with thresholds. | Aligns with industry triage workflows.
Efflux risk | P-glycoprotein efflux probability and class (low / moderate / high). | Separates intrinsic membrane permeability from active efflux concerns.
Route attribution | Calibration route used (high_perm / standard / low_perm). | Supports route-aware uncertainty quantification.
Conformal interval | 90% conformal prediction interval around the predicted value. | Quantifies prediction uncertainty for governance review.
Mechanism drivers | Human-readable explanation of which mechanisms contributed. | Makes the prediction reviewable and challengeable.
Flags | Out-of-applicability-domain warnings (extreme logP, high TPSA, large MW). | Signals honestly when a compound lies at the edge of the model's applicability domain.
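Several of these output layers can be pictured together as one record per compound. The sketch below is hypothetical throughout: the class cut-offs (log10 P_app of −6 and −5, i.e. 1 × 10⁻⁶ and 1 × 10⁻⁵ cm/s), the flag limits, and the record fields are placeholders, not FluxMateria's documented values or API, and efflux risk, route attribution, and mechanism drivers are omitted. It also assumes the 90% interval is a split-conformal interval built from absolute residuals on a held-out calibration set, which the page does not specify.

```python
import math
from dataclasses import dataclass, field

# Hypothetical class cut-offs on log10 P_app (cm/s): below -6 is "low"
# (P_app < 1e-6 cm/s); -5 and above is "high" (P_app >= 1e-5 cm/s).
LOW_BELOW = -6.0
HIGH_FROM = -5.0

# Hypothetical applicability-domain limits for the three flagged properties.
FLAG_RULES = {
    "extreme_logP": lambda d: not (-2.0 <= d["logP"] <= 6.0),
    "high_TPSA": lambda d: d["TPSA"] > 140.0,
    "large_MW": lambda d: d["MW"] > 700.0,
}


def permeability_class(log_papp: float) -> str:
    if log_papp < LOW_BELOW:
        return "low"
    return "high" if log_papp >= HIGH_FROM else "medium"


def conformal_halfwidth(calib_abs_residuals, alpha: float = 0.10) -> float:
    """Split-conformal half-width: the ceil((n+1)(1-alpha))-th smallest residual."""
    n = len(calib_abs_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(calib_abs_residuals)[k - 1]


@dataclass
class PredictionRecord:
    log_papp: float
    perm_class: str
    interval: tuple
    flags: list = field(default_factory=list)


def build_record(log_papp, descriptors, calib_abs_residuals, alpha=0.10):
    """Assemble a hypothetical output record for one compound."""
    q = conformal_halfwidth(calib_abs_residuals, alpha)
    flags = [name for name, rule in FLAG_RULES.items() if rule(descriptors)]
    return PredictionRecord(
        log_papp=log_papp,
        perm_class=permeability_class(log_papp),
        interval=(log_papp - q, log_papp + q),
        flags=flags,
    )
```

Under exchangeability between calibration and new compounds, an interval built this way covers the true value with probability at least 1 − alpha; the finite-sample rank correction is why very small calibration sets return an infinite half-width.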

Download benchmark package

Machine-readable results and methodology for independent scientific review.

Caco-2 benchmark evidence package

  • Summary (JSON): headline claim, primary public comparison, cross-cohort evidence, scaffold-class coverage, and interpretation policy.
  • Benchmark matrix (CSV): per-mode metrics, including row counts, MAE, percentile errors, bias, Spearman and Pearson correlation, and throughput.
  • Per-compound results (CSV): every compound in the TDC test set with its measured value, FluxMateria prediction, and absolute error.
  • Compound-class coverage (CSV): scaffold and transport classes addressed, with representative compound, measured value, and prediction.
  • Methodology note (Markdown): benchmark scope, evaluation modes, metrics, interpretation policy, and use boundary.
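For independent review, the per-compound CSV allows the headline MAE to be recomputed and each stored absolute error to be checked. A minimal standard-library sketch follows; the column names (measured, predicted, abs_error) are placeholders, since the package's actual headers are not listed here, and the default tolerance assumes values rounded to about three decimals.

```python
import csv
import io


def audit_per_compound_csv(text, measured_col="measured", predicted_col="predicted",
                           abs_error_col="abs_error", tol=1e-3):
    """Recompute MAE and flag rows whose stored absolute error disagrees.

    Column names are placeholders; tol is loose enough to absorb values that
    were rounded to about three decimals. Line numbers assume one record per line.
    """
    errors, mismatched_lines = [], []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        err = abs(float(row[predicted_col]) - float(row[measured_col]))
        errors.append(err)
        if abs(err - float(row[abs_error_col])) > tol:
            mismatched_lines.append(line_no)
    return {
        "n": len(errors),
        "mae": sum(errors) / len(errors),
        "mismatched_lines": mismatched_lines,
    }
```

Once the real column headers are substituted, a result with n equal to 182, an MAE of about 0.277, and no mismatched lines would reproduce the headline from the downloaded file.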

How to read the claim

What we claim

  • State-of-the-art Caco-2 permeability prediction from pure physics on the public TDC caco2_wang scaffold-stratified test set.
  • Mean absolute error 0.277 log units; published TDC state-of-the-art reference 0.276.
  • Zero parameters fitted to Caco-2 data — the predictor is a deterministic physics computation.
  • Mechanism attribution for every prediction — rooted in canonical Caco-2 absorption literature.
  • Fast enough for portfolio-scale screening: ~97 molecules / second on a single CPU thread.

Boundary conditions

  • The Caco-2 predictor supports screening, prioritisation, and scientific review — it does not replace in vitro Caco-2 assays for regulatory submission.
  • The Caco-2 monolayer is a proxy for healthy small-intestinal epithelium; it does not model blood-brain-barrier, inflamed-bowel, or skin permeability.
  • The TDC test set is a curated single-protocol subset; broader real-world Caco-2 cohorts include more label noise.
  • Predictions at the edge of the model (extreme logP, very high TPSA, very large MW) are flagged in the output.

Review the Caco-2 predictor in context

The detailed Caco-2 benchmark should be read alongside the full ADMET benchmark and the Caco-2 case study.


Benchmark basis

Pure Flux Theory physics. Deterministic computation. Zero parameters fitted to Caco-2 data; no Caco-2 training set consumed at build time.
