# FluxMateria Caco-2 Permeability Benchmark Methodology

Date: 2026-05-17

## Scope

This package documents the public-facing benchmark evidence for the FluxMateria Caco-2 cell permeability predictor. The predictor is a pure-physics deterministic model: every coefficient traces to a fundamental geometric constant, and the model consumes no Caco-2 training labels at build time.

The benchmark claim is:

- FluxMateria reaches mean absolute error (MAE) 0.277 log units on the Therapeutics Data Commons (TDC) `caco2_wang` scaffold-stratified test set.
- The published TDC state-of-the-art on the same set is MAE 0.276, held by trained machine-learning models.
- FluxMateria comes within 0.001 log units of that figure with zero parameters fitted to Caco-2 data: the prediction is a deterministic physics computation.

## Endpoint

Apparent permeability (P_app) across the human colon adenocarcinoma (Caco-2) cell monolayer, reported as log10(P_app) with P_app in cm/s. Caco-2 is the de facto in vitro proxy for intestinal absorption of oral drugs and is used across pharmaceutical discovery.
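
As a small illustration of the unit convention, the sketch below converts between P_app in cm/s and its base-10 logarithm. It assumes the logarithm is base 10, and the function names `log_papp` and `papp_from_log` are hypothetical helpers, not part of the FluxMateria platform.

```python
import math


def log_papp(papp_cm_per_s: float) -> float:
    """Convert an apparent permeability in cm/s to log10 units."""
    if papp_cm_per_s <= 0:
        raise ValueError("P_app must be a positive value in cm/s")
    return math.log10(papp_cm_per_s)


def papp_from_log(log_value: float) -> float:
    """Convert a log10(P_app) value back to cm/s."""
    return 10.0 ** log_value


# Example: a compound measured at 1e-6 cm/s sits at about -6 log units.
print(log_papp(1e-6))
# Example: the low end of the reported test-set range, -7.50, in cm/s.
print(papp_from_log(-7.5))
```

On this scale a difference of 1 log unit corresponds to a tenfold change in P_app.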

## Evaluation Set

- Therapeutics Data Commons `caco2_wang/test.csv`, 182 molecules.
- Scaffold-stratified split: test compounds share no Bemis-Murcko scaffold with the published training split.
- Measured log P_app values span -7.50 to -3.90 log cm/s, mean -5.33, standard deviation 0.69.

The TDC `caco2_wang/train_val.csv` split (728 molecules) was **intentionally not used** during predictor development. The TDC scaffold-stratified test set was held out as a single evaluation cohort.

## Cross-Cohort Validation

A separate internal cohort of 800 molecules was drawn from an internal 40,974-compound Caco-2 database, with every TDC SMILES removed by SMILES-set difference. This cohort is reported to provide cross-cohort evidence on broader chemistry beyond the curated TDC test set.

Cross-cohort metric:
- MAE 1.161 log units on a broader cohort that aggregates across multiple assay protocols (P_app, P_eff, log retentate vs receiver, various pH conditions) with substantially more label noise than the curated TDC test set.
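
The SMILES-set difference used to build this cohort can be sketched as follows. This is a minimal illustration, not the actual pipeline: it assumes both SMILES collections were already canonicalized the same way upstream, and the name `remove_overlap` and the duplicate-dropping behavior are assumptions made for the example.

```python
def remove_overlap(internal_smiles, tdc_smiles):
    """Return internal SMILES that do not appear in the TDC set.

    Plain string comparison is used, so two different spellings of the
    same molecule are only caught if both inputs were canonicalized
    identically beforehand (assumed here). Input order is preserved and
    repeated internal entries are kept once (an assumption of this sketch).
    """
    tdc = set(tdc_smiles)
    seen = set()
    kept = []
    for smi in internal_smiles:
        if smi in tdc or smi in seen:
            continue
        seen.add(smi)
        kept.append(smi)
    return kept


# Toy example with made-up inputs: benzene is dropped because it is in the TDC list.
internal = ["CCO", "c1ccccc1", "CCO", "CC(=O)O"]
tdc = ["c1ccccc1"]
print(remove_overlap(internal, tdc))
```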

## Metrics

- **MAE**: mean absolute error in log units (lower is better).
- **Median AE**: median absolute error, robust to outliers.
- **P90 AE**: 90th-percentile absolute error, a measure of tail behavior.
- **Max AE**: maximum absolute error.
- **Bias**: signed mean of (predicted − measured). Near zero indicates no systematic shift.
- **Spearman**: rank correlation between prediction and measurement (ordering quality).
- **Pearson**: linear correlation between prediction and measurement.
- **Within-SOTA threshold count**: number of compounds whose absolute error is ≤ 0.276 (the SOTA MAE itself).
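
The metrics listed above could be computed from paired predictions and measurements along the following lines. This is a standard-library sketch under stated assumptions: the percentile uses linear interpolation between order statistics, Spearman is computed as Pearson on average ranks, and `benchmark_metrics` and its helpers are hypothetical names rather than the package's own code.

```python
import math
import statistics


def _ranks(values):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def _pearson(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def _percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..100) on pre-sorted data."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def benchmark_metrics(predicted, measured, threshold=0.276):
    """Compute the benchmark metrics; errors are predicted minus measured."""
    errors = [p - m for p, m in zip(predicted, measured)]
    abs_err = sorted(abs(e) for e in errors)
    return {
        "mae": statistics.fmean(abs_err),
        "median_ae": statistics.median(abs_err),
        "p90_ae": _percentile(abs_err, 90),
        "max_ae": abs_err[-1],
        "bias": statistics.fmean(errors),
        "spearman": _pearson(_ranks(predicted), _ranks(measured)),
        "pearson": _pearson(predicted, measured),
        "within_threshold": sum(e <= threshold for e in abs_err),
    }


# Toy values for illustration only; these are not benchmark data.
print(benchmark_metrics([-5.0, -6.0, -4.5, -7.0], [-5.2, -5.9, -4.4, -6.0]))
```

Different percentile conventions give slightly different P90 values on small cohorts, so the convention should be stated alongside the number.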

## Interpretation

The headline claim is that pure-physics computation can match a trained-ML state-of-the-art on a public scaffold-stratified Caco-2 permeability benchmark. Two structural points follow:

1. The FluxMateria predictor has no Caco-2 training set, so there is no risk of data leakage from the test set's scaffold neighborhood. The MAE of 0.277 therefore reflects scaffold-out generalization.

2. Every prediction has a documented mechanism rooted in the canonical Caco-2 absorption literature. These mechanisms include passive transcellular diffusion, the paracellular tight-junction route, Henderson-Hasselbalch ionization, intramolecular H-bonding for cyclic peptides, P-glycoprotein efflux, MATE1 / OCT1 organic-cation transport, the LAT1 large-neutral-amino-acid transporter, SGLT sugar transport, the PEPT1 peptide transporter, and the 4-quinolone-3-carboxylate intramolecular chelate, among others. Each result is therefore mechanism-attributable.
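
To make one of the listed mechanisms concrete, the sketch below evaluates the textbook Henderson-Hasselbalch neutral fraction for a single ionizable group. It illustrates the general relationship only; how FluxMateria implements its ionization term is not described here, and the function name `neutral_fraction` and the example pKa are assumptions.

```python
def neutral_fraction(pka: float, ph: float, kind: str) -> float:
    """Fraction of a monoprotic acid or base that is un-ionized at a given pH.

    Textbook Henderson-Hasselbalch form for one ionizable group:
      acid: 1 / (1 + 10^(pH - pKa))
      base: 1 / (1 + 10^(pKa - pH))
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    if kind == "base":
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")


# Example: a carboxylic acid with an assumed pKa of 4.4 at pH 7.4
# is about 0.1% neutral, since pH - pKa = 3.
print(neutral_fraction(4.4, 7.4, "acid"))
```

Because the neutral species is usually the one that crosses a lipid membrane passively, a small neutral fraction tends to lower passive transcellular permeability.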

## Use Boundary

The predictor is designed for portfolio-scale screening, scaffold prioritization, and scientific review of oral-absorption properties. It is not a substitute for:

- Regulatory in vitro Caco-2 assays for submission filings.
- Human oral bioavailability prediction in isolation (which additionally requires solubility, metabolism, and first-pass effects).
- Disease-specific permeability claims (the Caco-2 monolayer models a healthy small-intestine epithelium, not inflamed bowel, the blood-brain barrier, or skin).

## Reproducibility

- The predictor is available through the FluxMateria platform Caco-2 endpoint. The endpoint accepts a SMILES string and returns the apparent permeability prediction with mechanism attribution, conformal interval, and permeability class.
- Test cohort source: `caco2_wang/test.csv` (Therapeutics Data Commons ADMET group benchmark, public).
- The predictor is deterministic: identical SMILES input returns identical output. A full per-compound prediction trace is part of the published benchmark evidence package.
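
The determinism claim can be spot-checked with a small harness like the one below. It is a sketch only: `predict` stands for any hypothetical callable that wraps the Caco-2 endpoint and returns a comparable result for a SMILES string, and the real client interface is not specified in this document.

```python
def check_deterministic(predict, smiles_list, repeats=3):
    """Return the SMILES whose output changed across repeated calls.

    `predict` is a hypothetical callable (SMILES string -> result) that the
    caller supplies; an empty return list means no instability was observed.
    """
    unstable = []
    for smi in smiles_list:
        first = predict(smi)
        # Compare every further call against the first result.
        if any(predict(smi) != first for _ in range(repeats - 1)):
            unstable.append(smi)
    return unstable


# Stand-in predictor for illustration; a real run would call the endpoint.
def fake_predict(smiles):
    return round(-4.0 - 0.05 * len(smiles), 3)


print(check_deterministic(fake_predict, ["CCO", "c1ccccc1"]))
```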

## Throughput

Local single-thread throughput on the 182-compound test cohort is approximately 97 molecules per second. At that rate a 10,000-compound library takes about 103 seconds, so portfolio-scale screening completes in under 3 minutes on a single CPU thread.
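
The screening-time estimate follows directly from the throughput figure, and the throughput itself can be measured with a simple timing loop. The sketch below shows both; `predict` is again a hypothetical callable supplied by the caller, and `screening_seconds` and `measure_throughput` are illustrative names.

```python
import time


def screening_seconds(n_compounds: int, molecules_per_second: float) -> float:
    """Estimated wall-clock seconds to screen a library at a fixed rate."""
    return n_compounds / molecules_per_second


def measure_throughput(predict, smiles_list) -> float:
    """Single-thread molecules per second for a caller-supplied predictor."""
    start = time.perf_counter()
    for smi in smiles_list:
        predict(smi)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very fast runs or coarse timers.
    return float("inf") if elapsed == 0 else len(smiles_list) / elapsed


# 10,000 compounds at 97 molecules per second, converted to minutes.
print(screening_seconds(10_000, 97) / 60)
```

A measured rate depends on hardware and on the mix of molecule sizes in the cohort, so the 97 molecules per second figure should be read as specific to the 182-compound test set.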
