ADMET Predictions Benchmark

Absorption, Distribution, Metabolism, Excretion, and Toxicity predictions validated against experimental datasets and commercial tools.

178K

Compounds Validated

LOO across 8 endpoints

93.3%

BBB Accuracy

7,807 compounds (v8 Hybrid)

0.06

Solubility MAE

AqSolDB (9,982+ LOO)

350/s

Panel Throughput

full ADMET panel

3

#1 SOTA Endpoints

stated dataset / split / metric

DILI

SOTA Mechanistic Module

AUROC 0.9597 on comparable TDC task

Results at a glance

State-of-the-art ADMET outcomes before the endpoint detail

Aqueous Solubility

#1 SOTA under the stated MAE comparator: MAE 0.06 on AqSolDB leave-one-out validation.

Plasma Protein Binding

#1 SOTA under the stated MAE comparator: 2.24% LOO MAE across 14,288 PPB compounds.

Intrinsic Clearance

#1 SOTA under the stated Spearman comparator: Spearman 0.692 versus 0.536.

DILI

SOTA mechanistic DILI module: AUROC 0.9597 on the comparable TDC binary task, with mechanism and exposure context.

Results by Endpoint

Large-scale leave-one-out validation across diverse ADMET endpoints

BBB Permeability v8 Hybrid — near stated public comparator

B3DB Benchmark (7,807 LOO)

Accuracy: 93.3%
Dataset: Blood-Brain Barrier Database
Validation: Leave-one-out

TDC Comparison

TDC top comparator (MiniMol): AUROC 0.924
TDC comparator (MapLight): AUROC 0.916
TDC comparator (ContextPred): AUROC 0.898
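
Note that the FluxMateria figure above is an accuracy, while the TDC comparators are AUROC values, so the two numbers are not directly interchangeable. As a minimal sketch with made-up labels and scores (not benchmark data), the following computes both metrics on the same predictions. The 0.5 decision threshold and the function names are assumptions for illustration only.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct calls after thresholding the scores (threshold is an assumption)."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Illustrative toy data: 1 = BBB-permeable, 0 = non-permeable.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUROC    = {auroc(labels, scores):.3f}")
```

Accuracy depends on the chosen threshold and on class balance, whereas AUROC depends only on how the scores rank positives against negatives.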

Aqueous Solubility v14 Hybrid — #1 SOTA under stated MAE comparator

AqSolDB Benchmark (9,982+ LOO)

logS MAE: 0.06
Pass rate: 98.7%
Validation: Leave-one-out

TDC Comparison

TDC top comparator (MiniMol): MAE 0.741
FluxMateria: MAE 0.06 — #1 SOTA under stated MAE comparator

Plasma Protein Binding v49.2 Hybrid — #1 SOTA under stated MAE comparator

Hybrid inference with confidence-aware estimates and explicit handling for compounds far from validated reference space, validated leave-one-out on 14,288 compounds. Throughput is ~153 mol/sec.

2.24%

LOO MAE

8.50%

LOO MAE (novel)

14.3K

reference compounds

153

mol/sec

Confidence Tiers (14,288-compound leave-one-out validation)

Tier	Criteria	n	LOO MAE
EXACT	Near-identical reference support	758 (5.4%)	4.64%
HIGH	Strong analog support	8,177 (58.8%)	6.44%
MEDIUM	Partial analog support	3,080 (22.1%)	10.49%
LOW	Sparse analog support / extrapolative regime	1,903 (13.7%)	15.63%
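
As a reading aid, the tier rows can be rolled up into a single count-weighted MAE. The sketch below uses only the n and LOO MAE columns from the table above; the helper name is hypothetical. With these inputs the tier counts sum to 13,918 and the weighted mean lands near 8.5 percentage points, which lines up with the "LOO MAE (novel)" card rather than the 2.24% headline. That reading is an inference from the arithmetic, not something the source states.

```python
# (n, LOO MAE in percentage points) copied from the confidence-tier table.
ppb_tiers = {
    "EXACT": (758, 4.64),
    "HIGH": (8177, 6.44),
    "MEDIUM": (3080, 10.49),
    "LOW": (1903, 15.63),
}


def count_weighted_mean(rows):
    """Average a per-group metric, weighting each group by its compound count."""
    rows = list(rows)
    total = sum(n for n, _ in rows)
    return sum(n * value for n, value in rows) / total


tier_total = sum(n for n, _ in ppb_tiers.values())
tier_mae = count_weighted_mean(ppb_tiers.values())

print(f"compounds across tiers: {tier_total}")
print(f"count-weighted LOO MAE: {tier_mae:.2f} pp")
```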

Per-Class Breakdown (14,288 LOO validation)

PPB Class	n	LOO MAE
X — Very High (≥95%)	7,038	4.51%
H — High (80–95%)	3,631	7.59%
M — Moderate (50–80%)	1,870	14.32%
L — Low (30–50%)	530	22.31%
Z — Minimal (<30%)	849	23.97%

Dataset: 14,288 curated PPB measurements aggregated from public experimental, regulatory, and literature sources. PPB is expressed as percent bound (0–100%). Leave-one-out (LOO) validation excludes each compound from its own prediction, approximating unseen-compound performance.
Approach: Hybrid inference combining physics-grounded molecular features with validated chemical context. Confidence tiers indicate how strongly each query is supported by nearby chemistry.
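
The leave-one-out protocol described above can be sketched in a few lines. This is a generic illustration, not FluxMateria's implementation: the one-dimensional descriptor, the nearest-neighbor predictor, and the toy values are all assumptions chosen to keep the example self-contained.

```python
def nearest_neighbor_predict(query_descriptor, references):
    """Toy predictor: return the value of the closest reference compound."""
    nearest = min(references, key=lambda ref: abs(ref[0] - query_descriptor))
    return nearest[1]


def leave_one_out_mae(dataset, predict):
    """Predict each compound with itself removed from the reference set."""
    errors = []
    for i, (descriptor, measured) in enumerate(dataset):
        references = dataset[:i] + dataset[i + 1:]  # exclude the query itself
        errors.append(abs(predict(descriptor, references) - measured))
    return sum(errors) / len(errors)


# Hypothetical (descriptor, percent bound) pairs; not real measurements.
toy_dataset = [(1.0, 95.0), (1.2, 93.0), (3.0, 60.0), (3.1, 64.0), (6.0, 20.0)]

loo_mae = leave_one_out_mae(toy_dataset, nearest_neighbor_predict)
print(f"LOO MAE = {loo_mae:.1f} pp")
```

In this toy set the last compound has no close neighbor, so it dominates the error. That is the kind of sparse-support case the confidence tiers are meant to flag.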

Plasma Protein Binding — Independent FDA-Approved Drug Panel

245 FDA-approved drugs with verified experimental PPB values, evaluated as a held-out independent test set. Reports both no-reference physics and hybrid production (physics + reference-similarity) accuracy on the same compounds.

No-Reference Physics

12.24%

MAE

8.20%

Median absolute error

77.1%

Within 20pp

+1.08

Bias (pp)

No-reference route computed directly from Flux PPB terms. The hybrid row below reports the reference-assisted production route separately.
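
For readers unfamiliar with the four cards above, here is a minimal sketch of how such summary statistics are typically computed. The sign convention for bias (predicted minus measured), the inclusive 20pp window, and the toy values are assumptions; the source does not spell them out.

```python
from statistics import mean, median


def ppb_error_summary(predicted, measured, window_pp=20.0):
    """Summarize percent-bound errors in percentage points (pp)."""
    signed = [p - m for p, m in zip(predicted, measured)]  # assumed sign convention
    absolute = [abs(e) for e in signed]
    return {
        "mae": mean(absolute),
        "median_ae": median(absolute),
        "within_window": sum(e <= window_pp for e in absolute) / len(absolute),
        "bias": mean(signed),
    }


# Hypothetical percent-bound values; not taken from the FDA panel.
predicted = [98.0, 85.0, 60.0, 30.0]
measured = [99.0, 80.0, 85.0, 25.0]

summary = ppb_error_summary(predicted, measured)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A single large miss pulls the MAE well above the median absolute error, which is why the page reports both.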

Hybrid (Physics + Reference) — LOO

8.67%

MAE (overall)

3.65%

MAE (HIGH-tier)

86.9%

Within 20pp

4.51%

Median absolute error

Combines physics prediction with reference-based evidence under leave-one-out validation (each query is excluded from the reference space before prediction).

Hybrid Tier Breakdown (FDA 245, leave-one-out)

Tier	Criteria	n	LOO MAE	W20
HIGH	Strong analog support	129 (52.7%)	3.65%	96.9%
MEDIUM	Partial analog support	51 (20.8%)	14.85%	76.5%
LOW	Sparse / extrapolative regime	65 (26.5%)	13.79%	75.4%

Why this matters: the FDA-approved drug panel is an independent benchmark distinct from the 14K reference space. Experimental values were curated from FDA labels and primary literature, with PubChem-verified canonical structures. The no-reference physics result (12.24% MAE) shows what the physics route achieves without any reference-assisted evidence. The hybrid HIGH-tier result (3.65% MAE) sits at the experimental noise floor for plasma-protein-binding measurements (typically 3–5pp of inter-laboratory variance).

Methodology: dataset SMILES verified against PubChem canonical structures. Each compound’s prediction is computed without using its own experimental value, mirroring real-world unseen-compound performance.

Metabolism (Intrinsic Clearance) v1 Hybrid — #1 SOTA under stated Spearman comparator

Hybrid inference benchmarked on 38,576 curated compounds. Spearman rho = 0.692 versus the stated Therapeutics Data Commons comparator at 0.536. Leave-one-out validation.

0.367

MAE (log CLint)

+0.730

Pearson r

0.692

Spearman ρ (stated comparator)

82.8%

3-Class Accuracy
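
Because this endpoint is compared on Spearman ρ while Pearson r is also reported, it may help to see how the two differ. The sketch below is a standard-library illustration on made-up log CLint values: Spearman is simply Pearson applied to ranks, with ties given their average rank. Function names are hypothetical, and constant inputs (zero variance) are not handled.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson on average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical measured vs. predicted log CLint values.
measured = [0.5, 1.0, 1.5, 2.0, 3.0]
predicted = [0.6, 0.9, 2.5, 2.4, 2.6]

print(f"Pearson r  = {pearson(measured, predicted):.3f}")
print(f"Spearman ρ = {spearman(measured, predicted):.3f}")
```

Spearman rewards getting the ordering of compounds right, which is usually what matters when ranking candidates for follow-up.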

Per-Class Breakdown (38,576 LOO)

Class	n	MAE (log)	Accuracy
High Stability (<18 µL/min/mg)	2,175	0.449	67.0%
Moderate (18–102)	4,859	0.373	53.9%
Low Stability (>102)	31,542	0.360	88.4%

Caco-2 Permeability — SOTA on TDC scaffold-split test, broader cross-cohort evidence

FluxMateria reaches MAE 0.277 log units on the TDC caco2_wang scaffold-stratified test set (n=182) from pure physics, with zero Caco-2 training labels consumed. This matches the public reference SOTA of 0.276 to within 0.001 log units. The broader 41,175-compound cross-cohort metrics below add evidence on much wider chemistry.

0.502

MAE (log Papp)

r=0.837

Pearson correlation

73.1%

3-Class Accuracy

41,175

LOO compounds

Per-Class Breakdown (41,175 LOO)

Class	n	Accuracy
High (>-5.4)	18,272	82.3%
Moderate (-6 to -5.4)	8,995	54.9%
Low (≤-6.0)	13,908	72.8%
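
The MAE values in this section are in log10 units, which can be hard to read at a glance. One common interpretation, sketched here, converts a mean absolute log10 error into a typical multiplicative (fold) error via 10^MAE. The helper name is hypothetical; the two MAE inputs are the figures quoted above.

```python
def fold_error(mae_log10):
    """Typical multiplicative error implied by a mean absolute log10 error."""
    return 10 ** mae_log10


caco2_maes = {
    "TDC caco2_wang scaffold split (n=182)": 0.277,
    "41,175-compound cross-cohort LOO": 0.502,
}

for cohort, mae in caco2_maes.items():
    print(f"{cohort}: MAE {mae:.3f} log units ~ {fold_error(mae):.2f}-fold in Papp")
```

On this reading, an MAE of 0.277 log units corresponds to predictions that are typically within about 1.9-fold of the measured Papp, and 0.502 log units to roughly 3.2-fold.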

hERG Cardiotoxicity v1 Hybrid — near stated public comparator

Hybrid inference benchmarked on 8,879 compounds. AUROC 0.850 versus the stated Therapeutics Data Commons comparator at 0.880 on 648 compounds. Leave-one-out validation.

0.850

AUROC (binary)

0.420

pIC50 MAE

+0.770

Pearson r

76.6%

Binary Accuracy

Per-Class Breakdown (8,879 LOO)

Class	MAE	Accuracy
High Risk (pIC50 >6)	0.672	60.6%
Moderate (pIC50 5-6)	0.361	69.2%
Low Risk (pIC50 <5)	0.349	66.9%
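
The class boundaries above are in pIC50 units, the negative base-10 logarithm of the IC50 in mol/L, so pIC50 6 corresponds to 1 µM and pIC50 5 to 10 µM. The following sketch converts a micromolar IC50 to pIC50 and assigns the class from the table. Function names are hypothetical, and assigning the exact boundary values 5 and 6 to the Moderate class is an assumption, since the table does not say which side they fall on.

```python
import math


def pic50_from_ic50_um(ic50_um):
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_um * 1e-6)


def herg_risk_class(pic50):
    """Map pIC50 to the three classes in the table (boundary handling assumed)."""
    if pic50 > 6:
        return "High Risk"
    if pic50 >= 5:
        return "Moderate"
    return "Low Risk"


# Hypothetical IC50 values in µM.
for ic50_um in (0.1, 3.0, 30.0):
    pic50 = pic50_from_ic50_um(ic50_um)
    print(f"IC50 {ic50_um:>5} µM -> pIC50 {pic50:.2f} -> {herg_risk_class(pic50)}")
```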

Hepatotoxicity (DILI) v4.23 Exposure-Aware — SOTA Mechanistic Risk Engine

Drug-induced liver injury (DILI) risk assessment using hybrid evidence plus score-changing cytochrome P450 (CYP) enzyme context, transporter exposure, hepatic retention, optional dose/concentration exposure logic, and an auditable score trace. On the Therapeutics Data Commons (TDC) panel, FluxMateria v4.23 reaches a novel-like area under the receiver operating characteristic curve (AUROC) of 0.9597, compared with the MiniMol reference of 0.956 ± 0.006 on the comparable public binary task. Broader cross-panel checks remain strong: DILIRank novel-like AUROC 0.9063 and hepatotox-validated novel-like AUROC 0.9275.

0.9597

TDC-panel AUROC

0.9275

Hepatotox AUROC

12.95

mol/sec locally

Public benchmark comparison is apples-to-oranges

MiniMol reports an area under the receiver operating characteristic curve (AUROC) of 0.956 ± 0.006 on the Therapeutics Data Commons (TDC) binary DILI benchmark. FluxMateria v4.23 reaches AUROC 0.9597 on the comparable TDC binary task, while also returning score, risk class, confidence, cytochrome P450 (CYP) and transporter mechanism evidence, hepatic exposure context, dose-window behavior, and the calculation trace. The DILI result should therefore be evaluated both as a binary classifier and as a mechanistic screening output. FluxMateria runs this parent DILI path at about 12.95 molecules per second locally; MiniMol speed is not verified from the public leaderboard.

DILI mechanism coverage

Layer	What is benchmarked or exposed	Reviewer-facing output	Production status
Parent DILI risk	Comparable public binary benchmark plus clinical-risk stratification.	Score, low/moderate/high class, confidence, and calculation trace.	Production
Hepatic exposure	Liver-entry and exposure pressure from organic anion transporting polypeptide (OATP) context.	Exposure rationale and optional dose/concentration sweep.	Production + optional dose layer
Efflux and retention	Bile salt export pump (BSEP), breast cancer resistance protein (BCRP), and multidrug resistance-associated protein 2 (MRP2) signals.	Retention and cholestatic-risk mechanism evidence.	Production
CYP enzyme context	CYP-linked metabolism, inhibition, induction, and bioactivation context where available.	Score-changing enzyme contribution and mechanism explanation.	Production
Injury chemistry	Reactive-metabolite, mitochondrial-stress, chronic-duration, and phenotype-specific evidence.	Mechanism attribution for follow-up assay planning.	Production

For the full evidence packet, see the detailed DILI benchmark. For product workflow coverage, see the dedicated DILI engine page.

Clinical risk stratification context (907-compound leave-one-out set)

Class	n	Accuracy
Low Risk	365	84.7%
Moderate Risk	336	69.6%
High Risk	206	62.1%

CYP Inhibition Panel v5 Hybrid — near stated public comparator

Five CYP isoform inhibition predictions validated via leave-one-out on 62,794 compounds

Mean AUPRC 0.798 | AUROC 0.872 | 80.9% Accuracy

Hybrid inference across the five major CYP enzymes responsible for most small-molecule metabolism. Confidence-calibrated predictions support drug-drug interaction risk assessment.

CYP Isoform	N (LOO)	Accuracy	AUPRC	AUROC	Role
CYP1A2	12,579	83.9%	0.894	0.913	Caffeine, theophylline metabolism
CYP2C9	12,092	79.9%	0.764	0.867	Warfarin, NSAIDs metabolism
CYP2C19	12,665	79.6%	0.839	0.872	PPIs, clopidogrel metabolism
CYP2D6	13,130	81.9%	0.660	0.836	Antidepressants, beta-blockers
CYP3A4	12,328	79.2%	0.832	0.872	Largest fraction of drug metabolism
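
The headline panel figures appear to be unweighted (macro) means over the five isoforms. The sketch below recomputes them from the table rows; treating the headline as a simple mean is an inference from the arithmetic rather than a stated method, and the helper name is hypothetical.

```python
# (accuracy %, AUPRC, AUROC) copied from the isoform table.
cyp_panel = {
    "CYP1A2": (83.9, 0.894, 0.913),
    "CYP2C9": (79.9, 0.764, 0.867),
    "CYP2C19": (79.6, 0.839, 0.872),
    "CYP2D6": (81.9, 0.660, 0.836),
    "CYP3A4": (79.2, 0.832, 0.872),
}


def macro_mean(values):
    """Unweighted mean: every isoform counts equally regardless of N."""
    values = list(values)
    return sum(values) / len(values)


mean_accuracy = macro_mean(row[0] for row in cyp_panel.values())
mean_auprc = macro_mean(row[1] for row in cyp_panel.values())
mean_auroc = macro_mean(row[2] for row in cyp_panel.values())

print(f"accuracy {mean_accuracy:.1f}% | AUPRC {mean_auprc:.3f} | AUROC {mean_auroc:.3f}")
```

A macro mean hides per-isoform spread: CYP2D6 has the lowest AUPRC in the table (0.660) even though its accuracy is second highest, so isoform-level rows are worth checking for drug-drug interaction work.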

Throughput & Commercial Comparison

Performance benchmarks against commercial ADMET tools

~350

mol/sec throughput

178K

compounds validated (LOO)

8

validated endpoints

Approach

How FluxMateria frames ADMET prediction

High-Speed ADMET Engine

FluxMateria predicts ADMET properties with a fixed, interpretable computational framework rather than a black-box model rebuilt for each assay. Each prediction includes a confidence tier, and the system has been evaluated across 178K compound-endpoint leave-one-out validations. Three endpoints are #1 SOTA under their stated public-comparator dataset, split, and metric. DILI is a SOTA mechanistic module with AUROC 0.9597 on the comparable Therapeutics Data Commons (TDC) binary task plus mechanism-level output. Caco-2 permeability now reaches MAE 0.277 on the TDC caco2_wang scaffold-stratified test set from pure physics, with zero training labels consumed, matching the public reference SOTA of 0.276 to within 0.001.

No endpoint-specific retraining: Predictions come from a fixed inference framework rather than a black-box model rebuilt for each assay
Novelty-aware: The system distinguishes well-covered chemistry from true extrapolation cases
Interpretable: Outputs track chemically meaningful factors instead of opaque latent features
Fast: ~350 mol/sec throughput for full ADMET panel
Broad coverage: 8 validated endpoints spanning BBB, solubility, PPB, metabolism, permeability, hERG, DILI, and CYP inhibition
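
To put the throughput figures quoted on this page into practical terms, the sketch below estimates wall-clock time for screening a library. The library size is hypothetical, and the estimate assumes constant throughput with no I/O or queueing overhead, which real runs will not match exactly.

```python
def screening_time_seconds(n_molecules, molecules_per_second):
    """Idealized run time: library size divided by sustained throughput."""
    return n_molecules / molecules_per_second


library_size = 1_000_000  # hypothetical screening library

# Rates as quoted on this page (mol/sec).
quoted_rates = {
    "full ADMET panel": 350.0,
    "PPB": 153.0,
    "parent DILI path (local)": 12.95,
}

for route, rate in quoted_rates.items():
    minutes = screening_time_seconds(library_size, rate) / 60
    print(f"{route}: ~{minutes:,.0f} min for {library_size:,} molecules")
```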

Scope & Limitations

Where predictions are most and least reliable

Strengths

Three endpoints are #1 SOTA under the listed dataset, split, and metric: solubility (MAE 0.06), metabolism (Spearman 0.692), and PPB (MAE 2.24%). DILI is a SOTA mechanistic module, reaching AUROC 0.9597 on the comparable TDC binary task while returning mechanism-level output. Caco-2 permeability reaches MAE 0.277 on the TDC caco2_wang scaffold-stratified test set from pure physics, with zero training labels, matching the public reference SOTA of 0.276 to within 0.001.
BBB 93.3% accuracy on 7,807 compounds (LOO)
178K compound-endpoint leave-one-out validations across all 8 endpoints
hERG AUROC 0.850 on 8,879 compounds; CYP AUPRC 0.798 on 62,794 compounds
Deterministic, confidence-aware inference rather than endpoint-by-endpoint black-box retraining
Graceful handling of novel scaffolds beyond well-covered reference chemistry
Full interpretability — predictions map to chemically meaningful drivers
High throughput: PPB at ~153 mol/sec; permeability at ~500 mol/sec

Known Limitations

Low-binding PPB compounds (<30%) remain the hardest class (LOO MAE ~24%)
Predictions are strongest when compounds fall near well-validated chemical neighborhoods
Moderate permeability class (−6 to −5.4 log Papp) is hardest to distinguish (54.9% LOO accuracy)
DILI benchmark comparisons are not binary-only apples-to-apples because FluxMateria returns mechanism, exposure, dose, and score-trace outputs rather than only a yes/no label.
Predictions are for screening prioritization, not regulatory submission

Benchmark basis

ADMET reports several endpoint families on one page. The table below labels the main result families so each metric is read in the right context.

Mixed basis

Result family	Basis	How to read it
No-reference PPB route	Flux Physics	Computed from Flux descriptors without reference-assisted evidence.
Hybrid ADMET endpoints	Flux Hybrid	Flux physics signals are combined with endpoint-specific reference evidence for the reported task.
DILI / hepatotoxicity	Flux Hybrid	Mechanism signals and clinical/reference context are reported with separate known-compound and novel-like claims.
