
ADMET Predictions Benchmark

Absorption, Distribution, Metabolism, Excretion, and Toxicity predictions validated against experimental datasets and commercial tools.

  • Compounds Validated: 178K (LOO across 8 endpoints)
  • BBB Accuracy: 93.3% (7,807 compounds, v8 Hybrid)
  • Solubility MAE: 0.06 (AqSolDB, 9,982+ LOO)
  • Panel Throughput: 350 mol/sec (full ADMET panel)
  • #1 SOTA Endpoints: 3 (under the stated dataset, split, and metric)
  • DILI: SOTA Mechanistic Module (AUROC 0.9597 on comparable TDC task)

Results at a glance

State-of-the-art ADMET outcomes before the endpoint detail


Aqueous Solubility

#1 SOTA under the stated MAE comparator: MAE 0.06 on AqSolDB leave-one-out validation.

Plasma Protein Binding

#1 SOTA under the stated MAE comparator: 2.24% LOO MAE across 14,288 PPB compounds.

Intrinsic Clearance

#1 SOTA under the stated Spearman comparator: Spearman 0.692 versus 0.536.

DILI

SOTA mechanistic DILI module: AUROC 0.9597 on the comparable TDC binary task, with mechanism and exposure context.

Results by Endpoint

Large-scale leave-one-out validation across diverse ADMET endpoints

BBB Permeability v8 Hybrid — near stated public comparator

B3DB Benchmark (7,807 LOO)
  • Accuracy: 93.3%
  • Dataset: Blood-Brain Barrier Database
  • Validation: Leave-one-out
TDC Comparison
  • TDC top comparator (MiniMol): AUROC 0.924
  • TDC comparator (MapLight): AUROC 0.916
  • TDC comparator (ContextPred): AUROC 0.898

Aqueous Solubility v14 Hybrid — #1 SOTA under stated MAE comparator

AqSolDB Benchmark (9,982+ LOO)
  • logS MAE: 0.06
  • Pass rate: 98.7%
  • Validation: Leave-one-out
TDC Comparison
  • TDC top comparator (MiniMol): MAE 0.741
  • FluxMateria: MAE 0.06 — #1 SOTA under stated MAE comparator

Plasma Protein Binding v49.2 Hybrid — #1 SOTA under stated MAE comparator

Hybrid inference with confidence-aware estimates and explicit handling for compounds far from validated reference space. 14,288-compound leave-one-out validation. ~153 mol/sec.

  • LOO MAE: 2.24%
  • LOO MAE (novel): 8.50%
  • Reference compounds: 14.3K
  • Throughput: 153 mol/sec
Confidence Tiers (14,288-compound leave-one-out validation)
Tier | Criteria | n | LOO MAE
EXACT | Near-identical reference support | 758 (5.4%) | 4.64%
HIGH | Strong analog support | 8,177 (58.8%) | 6.44%
MEDIUM | Partial analog support | 3,080 (22.1%) | 10.49%
LOW | Sparse analog support / extrapolative regime | 1,903 (13.7%) | 15.63%
Per-Class Breakdown (14,288 LOO validation)
PPB Class | Description | n | LOO MAE
X | Very High (≥95%) | 7,038 | 4.51%
H | High (80–95%) | 3,631 | 7.59%
M | Moderate (50–80%) | 1,870 | 14.32%
L | Low (30–50%) | 530 | 22.31%
Z | Minimal (<30%) | 849 | 23.97%
Dataset: 14,288 curated PPB measurements aggregated from public experimental, regulatory, and literature sources. PPB is expressed as percent bound (0–100%). Leave-one-out (LOO) validation excludes each compound from its own prediction, approximating unseen-compound performance.
Approach: Hybrid inference combining physics-grounded molecular features with validated chemical context. Confidence tiers indicate how strongly each query is supported by nearby chemistry.
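The leave-one-out protocol described in the dataset note can be sketched as follows. This is a minimal illustration and not FluxMateria's implementation: the one-dimensional descriptor, the nearest-neighbor predictor, and both function names are assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical record layout: (toy descriptor, measured percent bound).
Record = Tuple[float, float]


def nearest_neighbor_predict(query: float, references: Sequence[Record]) -> float:
    """Toy stand-in predictor: return the value of the closest reference."""
    closest = min(references, key=lambda rec: abs(rec[0] - query))
    return closest[1]


def leave_one_out_mae(
    records: Sequence[Record],
    predict: Callable[[float, Sequence[Record]], float] = nearest_neighbor_predict,
) -> float:
    """Mean absolute error when each compound is hidden from its own prediction."""
    errors: List[float] = []
    for i, (descriptor, measured) in enumerate(records):
        # Remove the query compound from the reference space before predicting.
        references = [rec for j, rec in enumerate(records) if j != i]
        errors.append(abs(predict(descriptor, references) - measured))
    return sum(errors) / len(errors)


toy_records = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0)]
toy_mae = leave_one_out_mae(toy_records)
```

The key step is rebuilding the reference list inside the loop, so that a compound's own measurement can never contribute to its prediction.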

Plasma Protein Binding — Independent FDA-Approved Drug Panel

245 FDA-approved drugs with verified experimental PPB values, evaluated as a held-out independent test set. Results are reported for both the no-reference physics route and the hybrid production route (physics + reference similarity) on the same compounds.

No-Reference Physics

  • MAE: 12.24%
  • Median absolute error: 8.20%
  • Within 20pp: 77.1%
  • Bias (pp): +1.08

No-reference route computed directly from Flux PPB terms. The reference-assisted hybrid production route is reported separately below.
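The four summary statistics reported for this panel (MAE, median absolute error, within 20pp, and bias) can be computed from paired predictions and measurements as sketched below. The function name, the inclusive 20pp window, and the sign convention for bias (predicted minus measured) are assumptions for illustration; the source does not state them.

```python
from statistics import mean, median
from typing import Dict, Sequence


def ppb_error_summary(
    predicted: Sequence[float],
    measured: Sequence[float],
    window_pp: float = 20.0,
) -> Dict[str, float]:
    """Summarize percent-bound errors in percentage points (pp)."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("predicted and measured must be non-empty and equal length")
    # Assumed sign convention: positive bias means over-prediction.
    signed = [p - m for p, m in zip(predicted, measured)]
    absolute = [abs(e) for e in signed]
    return {
        "mae": mean(absolute),
        "median_abs_error": median(absolute),
        # Assumed inclusive window: an error of exactly 20pp counts as within.
        "within_window_pct": 100.0 * sum(e <= window_pp for e in absolute) / len(absolute),
        "bias": mean(signed),
    }


# Toy values, not taken from the FDA panel.
summary = ppb_error_summary([90.0, 50.0, 30.0, 99.0], [95.0, 80.0, 25.0, 98.0])
```

Reporting the median alongside the mean is useful here because a few large misses on low-binding compounds can pull the MAE well above the typical error.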

Hybrid (Physics + Reference) — LOO

  • MAE (overall): 8.67%
  • MAE (HIGH-tier): 3.65%
  • Within 20pp: 86.9%
  • Median absolute error: 4.51%

Combines physics prediction with reference-based evidence under leave-one-out validation (each query is excluded from the reference space before prediction).

Hybrid Tier Breakdown (FDA 245, leave-one-out)
Tier | Criteria | n | LOO MAE | Within 20pp (W20)
HIGH | Strong analog support | 129 (52.7%) | 3.65% | 96.9%
MEDIUM | Partial analog support | 51 (20.8%) | 14.85% | 76.5%
LOW | Sparse / extrapolative regime | 65 (26.5%) | 13.79% | 75.4%
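As a consistency check, the overall hybrid figures for this panel can be recovered from the tier table as compound-weighted means. The sketch uses only the n, LOO MAE, and within-20pp (W20) values listed in the table; the helper name is illustrative.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], counts: Sequence[int]) -> float:
    """Mean of per-tier values weighted by the number of compounds in each tier."""
    return sum(v * n for v, n in zip(values, counts)) / sum(counts)


# Tier: (n, LOO MAE in pp, percent within 20pp), copied from the table.
fda_tiers = {
    "HIGH": (129, 3.65, 96.9),
    "MEDIUM": (51, 14.85, 76.5),
    "LOW": (65, 13.79, 75.4),
}

counts = [n for n, _, _ in fda_tiers.values()]
overall_mae = weighted_mean([mae for _, mae, _ in fda_tiers.values()], counts)
overall_w20 = weighted_mean([w20 for _, _, w20 in fda_tiers.values()], counts)

print(f"compounds: {sum(counts)}")
print(f"overall MAE: {overall_mae:.2f}")
print(f"within 20pp: {overall_w20:.1f}")
```

The weighted values round to 8.67 and 86.9, matching the overall hybrid MAE and within-20pp figures reported for the 245-drug panel.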
Why this matters: the FDA-approved drug panel is an independent benchmark distinct from the 14K reference space — experimental values were curated from FDA labels and primary literature with PubChem-verified canonical structures. The no-reference physics result (12.24% MAE) demonstrates plasma-binding prediction without using the compound's own experimental value. The hybrid HIGH-tier result (3.65% MAE) sits at the experimental noise floor for plasma-protein-binding measurements (3–5pp typical inter-laboratory variance).

Methodology: dataset SMILES verified against PubChem canonical structures. Each compound’s prediction is computed without using its own experimental value, mirroring real-world unseen-compound performance.

Metabolism (Intrinsic Clearance) v1 Hybrid — #1 SOTA under stated Spearman comparator

Hybrid inference benchmarked on 38,576 curated compounds. Spearman rho = 0.692 versus the stated Therapeutics Data Commons comparator at 0.536. Leave-one-out validation.

  • MAE (log CLint): 0.367
  • Pearson r: +0.730
  • Spearman ρ: 0.692 (stated comparator: 0.536)
  • 3-Class Accuracy: 82.8%
Per-Class Breakdown (38,576 LOO)
Class | n | MAE (log CLint) | Accuracy
High Stability (<18 µL/min/mg) | 2,175 | 0.449 | 67.0%
Moderate (18–102 µL/min/mg) | 4,859 | 0.373 | 53.9%
Low Stability (>102 µL/min/mg) | 31,542 | 0.360 | 88.4%
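The clearance headline is a Spearman rank correlation, reported next to Pearson r. The sketch below shows how the two differ: Spearman is Pearson correlation applied to ranks, so it rewards correct ordering even when the relationship is non-linear. The function names and toy values are illustrative and are not drawn from the benchmark.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho computed as Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example: perfectly ordered but non-linear predictions.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 4.0, 9.0, 100.0]
rho = spearman(measured, predicted)
r = pearson(measured, predicted)
```

In the toy data the ordering is perfect, so Spearman is 1 while Pearson stays below 1. For compound ranking in screening, the rank-based figure is often the more relevant of the two.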

Caco-2 Permeability — SOTA on TDC scaffold-split test, broader cross-cohort evidence

FluxMateria reaches MAE 0.277 log units on the TDC caco2_wang scaffold-stratified test set (n=182), matching the public reference SOTA at 0.276 from pure physics with zero Caco-2 training labels consumed. The broader 41,175-compound cross-cohort metrics below add evidence on much wider chemistry.

  • MAE (log Papp): 0.502
  • Pearson correlation: r = 0.837
  • 3-Class Accuracy: 73.1%
  • LOO compounds: 41,175
Per-Class Breakdown (41,175 LOO)
Class (log Papp) | n | Accuracy
High (>-5.4) | 18,272 | 82.3%
Moderate (-6 to -5.4) | 8,995 | 54.9%
Low (≤-6.0) | 13,908 | 72.8%
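The class boundaries in the table can be written as a small binning function, and the headline 3-class accuracy can be recovered as the compound-weighted mean of the per-class accuracies. This is a sketch: the function name is hypothetical, and assigning a value of exactly -5.4 to the Moderate class is an assumption read off the table's inequality signs.

```python
def caco2_class(log_papp: float) -> str:
    """Bin a log Papp value using the cut-offs listed in the table."""
    if log_papp > -5.4:
        return "High"
    if log_papp > -6.0:
        return "Moderate"  # assumed to include exactly -5.4
    return "Low"  # log Papp <= -6.0


# Class: (n, accuracy in percent), copied from the table.
caco2_per_class = {
    "High": (18272, 82.3),
    "Moderate": (8995, 54.9),
    "Low": (13908, 72.8),
}

total_compounds = sum(n for n, _ in caco2_per_class.values())
overall_accuracy = sum(n * acc for n, acc in caco2_per_class.values()) / total_compounds

print(f"compounds: {total_compounds}")
print(f"3-class accuracy: {overall_accuracy:.1f}")
```

The weighted mean rounds to 73.1, in line with the reported 3-class accuracy over 41,175 compounds.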

hERG Cardiotoxicity v1 Hybrid — near stated public comparator

Hybrid inference benchmarked on 8,879 compounds. AUROC 0.850 versus the stated Therapeutics Data Commons comparator at 0.880 on 648 compounds. Leave-one-out validation.

  • AUROC (binary): 0.850
  • pIC50 MAE: 0.420
  • Pearson r: +0.770
  • Binary Accuracy: 76.6%
Per-Class Breakdown (8,879 LOO)
Class | pIC50 MAE | Accuracy
High Risk (pIC50 >6) | 0.672 | 60.6%
Moderate (pIC50 5–6) | 0.361 | 69.2%
Low Risk (pIC50 <5) | 0.349 | 66.9%
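For readers less familiar with the binary metric, AUROC is the probability that a randomly chosen positive (blocker) receives a higher score than a randomly chosen negative, with ties counted as one half. The sketch below implements that pairwise definition directly. It is a generic illustration with toy scores; the function name is hypothetical, and the source does not state how hERG labels were binarized.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUROC: share of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy scores: three of the four positive/negative pairs are ordered correctly.
toy_auroc = auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The double loop is quadratic and only suitable for small examples; a rank-based formulation gives the same value on large sets.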

Hepatotoxicity (DILI) v4.23 Exposure-Aware — SOTA Mechanistic Risk Engine

Drug-induced liver injury (DILI) risk assessment combines hybrid evidence with score-changing cytochrome P450 (CYP) enzyme context, transporter exposure, hepatic retention, optional dose/concentration exposure logic, and an auditable score trace. On the Therapeutics Data Commons (TDC) panel, FluxMateria v4.23 reaches a novel-like area under the receiver operating characteristic curve (AUROC) of 0.9597, above the MiniMol reference of about 0.956 on the comparable public binary task. Broader cross-panel checks remain strong: DILIRank novel-like AUROC 0.9063 and hepatotox-validated novel-like AUROC 0.9275.

  • TDC-panel AUROC: 0.9597
  • Hepatotox AUROC: 0.9275
  • Local throughput: 12.95 mol/sec
Public benchmark comparison is apples-to-oranges

MiniMol reports AUROC 0.956 ± 0.006 on the TDC binary DILI benchmark. FluxMateria v4.23 reaches AUROC 0.9597 on the comparable TDC binary task while also returning a score, risk class, confidence, CYP and transporter mechanism evidence, hepatic exposure context, dose-window behavior, and the calculation trace. The DILI result should therefore be evaluated both as a binary classifier and as a mechanistic screening output. FluxMateria runs this parent DILI path at about 12.95 molecules per second locally; MiniMol speed is not verified from the public leaderboard.

DILI mechanism coverage
Layer | What is benchmarked or exposed | Reviewer-facing output | Production status
Parent DILI risk | Comparable public binary benchmark plus clinical-risk stratification. | Score, low/moderate/high class, confidence, and calculation trace. | Production
Hepatic exposure | Liver-entry and exposure pressure from organic anion transporting polypeptide (OATP) context. | Exposure rationale and optional dose/concentration sweep. | Production + optional dose layer
Efflux and retention | Bile salt export pump (BSEP), breast cancer resistance protein (BCRP), and multidrug resistance-associated protein 2 (MRP2) signals. | Retention and cholestatic-risk mechanism evidence. | Production
CYP enzyme context | CYP-linked metabolism, inhibition, induction, and bioactivation context where available. | Score-changing enzyme contribution and mechanism explanation. | Production
Injury chemistry | Reactive-metabolite, mitochondrial-stress, chronic-duration, and phenotype-specific evidence. | Mechanism attribution for follow-up assay planning. | Production

For the full evidence packet, see the detailed DILI benchmark. For product workflow coverage, see the dedicated DILI engine page.

Clinical risk stratification context (907-compound leave-one-out set)
Class | n | Accuracy
Low Risk | 365 | 84.7%
Moderate Risk | 336 | 69.6%
High Risk | 206 | 62.1%

CYP Inhibition Panel v5 Hybrid — near stated public comparator

Five CYP isoform inhibition predictions validated via leave-one-out on 62,794 compounds

Mean AUPRC 0.798 | AUROC 0.872 | 80.9% Accuracy

Hybrid inference across the five major CYP enzymes responsible for most small-molecule metabolism. Confidence-calibrated predictions support drug-drug interaction risk assessment.

CYP Isoform | N (LOO) | Accuracy | AUPRC | AUROC | Role
CYP1A2 | 12,579 | 83.9% | 0.894 | 0.913 | Caffeine, theophylline metabolism
CYP2C9 | 12,092 | 79.9% | 0.764 | 0.867 | Warfarin, NSAIDs metabolism
CYP2C19 | 12,665 | 79.6% | 0.839 | 0.872 | PPIs, clopidogrel metabolism
CYP2D6 | 13,130 | 81.9% | 0.660 | 0.836 | Antidepressants, beta-blockers
CYP3A4 | 12,328 | 79.2% | 0.832 | 0.872 | Largest fraction of drug metabolism
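The panel-level headline figures can be reproduced from the per-isoform rows as unweighted (macro) averages, and the compound total as the sum of the N column. The sketch uses only values from the table; the variable and function names are illustrative.

```python
# Isoform: (N, accuracy in percent, AUPRC, AUROC), copied from the table.
cyp_panel = {
    "CYP1A2": (12579, 83.9, 0.894, 0.913),
    "CYP2C9": (12092, 79.9, 0.764, 0.867),
    "CYP2C19": (12665, 79.6, 0.839, 0.872),
    "CYP2D6": (13130, 81.9, 0.660, 0.836),
    "CYP3A4": (12328, 79.2, 0.832, 0.872),
}


def macro_mean(column: int) -> float:
    """Unweighted mean of one metric column across the five isoforms."""
    rows = list(cyp_panel.values())
    return sum(row[column] for row in rows) / len(rows)


total_n = sum(row[0] for row in cyp_panel.values())
mean_accuracy = macro_mean(1)
mean_auprc = macro_mean(2)
mean_auroc = macro_mean(3)

print(f"compounds: {total_n}")
print(f"accuracy: {mean_accuracy:.1f}  AUPRC: {mean_auprc:.3f}  AUROC: {mean_auroc:.3f}")
```

The macro averages round to 80.9% accuracy, AUPRC 0.798, and AUROC 0.872 over 62,794 compounds, matching the summary line. CYP2D6 has the lowest AUPRC in the table, so it pulls the panel mean down more than any other isoform.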

Throughput & Commercial Comparison

Performance benchmarks against commercial ADMET tools

  • Throughput: ~350 mol/sec
  • Compounds validated (LOO): 178K
  • Validated endpoints: 8
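For planning purposes, the quoted panel throughput translates into wall-clock time by simple division. The sketch assumes the ~350 mol/sec rate holds for the whole library and ignores I/O and batching overhead; the function name and the library size are illustrative.

```python
def panel_runtime_seconds(n_molecules: int, mol_per_sec: float = 350.0) -> float:
    """Estimated seconds to run the full ADMET panel at a constant rate."""
    if mol_per_sec <= 0:
        raise ValueError("mol_per_sec must be positive")
    return n_molecules / mol_per_sec


# Hypothetical one-million-compound library.
library_seconds = panel_runtime_seconds(1_000_000)
print(f"{library_seconds / 60:.1f} minutes")
```

At the quoted rate, a one-million-compound library would take a little under 48 minutes.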

Approach

How FluxMateria frames ADMET prediction

High-Speed ADMET Engine

FluxMateria predicts ADMET properties with a fixed, interpretable computational framework rather than a black-box model rebuilt for each assay. Each prediction includes a confidence tier, and the system has been evaluated across 178K compound-endpoint leave-one-out validations. Three endpoints are #1 SOTA under their stated public-comparator dataset, split, and metric. DILI is a SOTA mechanistic module, with AUROC 0.9597 on the comparable Therapeutics Data Commons binary task plus mechanism-level output. Caco-2 permeability reaches MAE 0.277 on the TDC caco2_wang scaffold-stratified test set, matching the public reference SOTA at 0.276 from pure physics with zero training labels consumed.

  • No endpoint-specific retraining: Predictions come from a fixed inference framework rather than a black-box model rebuilt for each assay
  • Novelty-aware: The system distinguishes well-covered chemistry from true extrapolation cases
  • Interpretable: Outputs track chemically meaningful factors instead of opaque latent features
  • Fast: ~350 mol/sec throughput for full ADMET panel
  • Broad coverage: 8 validated endpoints spanning BBB, solubility, PPB, metabolism, permeability, hERG, DILI, and CYP inhibition

Scope & Limitations

Where predictions are most and least reliable

Strengths

  • Three endpoints are #1 SOTA under the listed dataset, split, and metric: solubility (MAE 0.06), metabolism (Spearman 0.692), and PPB (MAE 2.24%). DILI is a SOTA mechanistic module, reaching AUROC 0.9597 on the comparable TDC binary task while returning mechanism-level output. Caco-2 permeability reaches MAE 0.277 on the TDC caco2_wang scaffold-stratified test set, matching the public reference SOTA at 0.276 from pure physics with zero training labels.
  • BBB 93.3% accuracy on 7,807 compounds (LOO)
  • 178K total compounds validated via leave-one-out across all 8 endpoints
  • hERG AUROC 0.850 on 8,879 compounds; CYP AUPRC 0.798 on 62,794 compounds
  • Deterministic, confidence-aware inference rather than endpoint-by-endpoint black-box retraining
  • Graceful handling of novel scaffolds beyond well-covered reference chemistry
  • Full interpretability — predictions map to chemically meaningful drivers
  • High throughput: PPB at ~153 mol/sec; permeability at ~500 mol/sec

Known Limitations

  • Low-binding PPB compounds (<30%) remain the hardest class (LOO MAE ~24%)
  • Predictions are strongest when compounds fall near well-validated chemical neighborhoods
  • Moderate permeability class (−6 to −5.4 log Papp) is hardest to distinguish (54.9% LOO accuracy)
  • DILI benchmark comparisons are not binary-only apples-to-apples because FluxMateria returns mechanism, exposure, dose, and score-trace outputs rather than only a yes/no label.
  • Predictions are for screening prioritization, not regulatory submission

Benchmark basis

ADMET reports several endpoint families on one page. The table below labels the main result families so each metric is read in the right context.

Mixed basis
Result family | Basis | How to read it
No-reference PPB route | Flux Physics | Computed from Flux descriptors without reference-assisted evidence.
Hybrid ADMET endpoints | Flux Hybrid | Flux physics signals are combined with endpoint-specific reference evidence for the reported task.
DILI / hepatotoxicity | Flux Hybrid | Mechanism signals and clinical/reference context are reported with separate known-compound and novel-like claims.

Try the ADMET module

Run predictions on your own molecules and see full interpretability for every result.
