Case Study: 245 FDA Drugs. 8 ADMET Endpoints. One Call.

The challenge

Pharma teams predict ADMET today through a chain of tools that were assembled across vendors and over time, rather than designed as a unified system. A typical lead-optimization workflow stitches together four to six commercial and internal systems, each with its own input format, output schema, license, support contract, and chemical-space caveat list:

Schrödinger QikProp / LiveDesign for plasma protein binding, permeability, and selected druglikeness descriptors. Site licenses typically run in the low-to-mid six figures per year.
Simulations Plus ADMET Predictor for clearance, transporter, and selected toxicity endpoints. Similar pricing tier, different output schema, different chemical-space coverage.
OpenEye toolkits for solubility, logP, and shape-aware filters. Another license, another schema, another set of caveats.
Internal ML models for hERG, DILI, and reactive-metabolite alerts — maintained by a small team with quarterly drift retraining and manual data-labeling cycles.
Free / sanity tools (SwissADME, ADMETLab) used to spot-check the commercial outputs.

Reconciling these systems into a single decision-grade output adds integration work to every workflow. The ML components also degrade on chemistry outside their training distribution, which includes much of the novel scaffold space medicinal chemists are paid to design. The aggregate cost is per-tool licenses, integration engineering, and schema reconciliation, and even then no single call returns a unified, mechanism-aware ADMET output.

The question

Can a single first-principles physics engine cover the eight ADMET endpoints used across lead optimization in one mechanism-aware API call with a unified output schema? The eight endpoints are plasma protein binding, blood-brain barrier permeability, intestinal permeability, metabolic stability, hERG cardiotoxicity, drug-induced liver injury, CYP inhibition across five isoforms, and aqueous solubility. And can it do so while meeting all of the following at once: an annualized total cost of ownership lower than the realistic enterprise alternatives; three strict #1 SOTA endpoints; DILI performance above the comparable MiniMol AUROC reference; Caco-2 permeability matching the public TDC reference SOTA from pure physics; and mechanism, exposure, dose-window, confidence, and score-trace outputs returned in the same call?

Study design

The study uses the panel of 245 FDA-approved drugs already documented on the public ADMET benchmark page. Each compound is profiled across all eight endpoints in a single unified pipeline call. Wall-clock time is measured end-to-end in full mechanistic mode, the slowest configuration, with DILI exposure-aware logic, CYP isoform gating, transporter inference, dose-window logic, and reactive-metabolite alerts all active.

1. Cohort

245 FDA-approved drugs spanning 30+ therapeutic areas with PubChem-verified canonical SMILES. Same panel used in the public ADMET benchmark, where independent FDA-label experimental values anchor the validation.

2. Endpoint suite

Eight endpoints in one call: plasma protein binding, BBB permeability, Caco-2 permeability, metabolic stability (CL_int), hERG cardiotoxicity, DILI (mechanism-aware), CYP panel (1A2/2C9/2C19/2D6/3A4), aqueous solubility.

3. Run

A single API call per compound returns all eight endpoints with confidence bands, CYP isoform attribution, transporter evidence, hepatic exposure context, dose-window behavior, reactive-metabolite alerts, and DILI score trace. Wall-clock time is measured end-to-end in full mechanistic mode (the slowest configuration).
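The end-to-end timing described above can be sketched as follows. This is a minimal illustration, not the FluxMateria client: `profile_compound` is a hypothetical stand-in that returns placeholders, and the endpoint keys are assumed names for the eight endpoints listed in the study design.

```python
import time

# Assumed short names for the eight endpoints; not taken from the real API.
ENDPOINTS = (
    "ppb", "bbb", "caco2", "metabolic_stability",
    "herg", "dili", "cyp_panel", "solubility",
)

def profile_compound(smiles: str) -> dict:
    """Hypothetical stand-in for the single unified call; returns placeholders."""
    return {endpoint: None for endpoint in ENDPOINTS}

def time_panel(smiles_list: list[str]) -> tuple[list[dict], float]:
    """Profile every compound once and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [profile_compound(s) for s in smiles_list]
    elapsed = time.perf_counter() - start
    return results, elapsed

# Toy inputs only (ethanol, aspirin); the study panel has 245 compounds.
panel = ["CCO", "CC(=O)Oc1ccccc1C(=O)O"]
results, elapsed = time_panel(panel)
predictions = sum(len(r) for r in results)
print(f"{len(panel)} compounds, {predictions} predictions, {elapsed:.6f} s")
```

Timing the whole loop, rather than summing per-call timers, is what makes the figure end-to-end: any per-compound overhead is included in the elapsed value.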

4. Compare

Total cost of ownership benchmarked against the four realistic enterprise alternatives: stitched commercial ADMET stack, in-house ML pipeline, DFT-based mechanistic ADMET, and free / sanity tools. Accuracy benchmarked against MiniMol, TDC SOTA, and named commercial baselines on identical leave-one-out validation.

What the unified pipeline consumes

SMILES (canonical)
Optional: assay context, dose window, target tissue
No training labels
No conformer generation pre-step
No descriptor pre-computation
No output-schema reconciliation
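As a rough sketch of the inputs listed above, a request could be modelled as one required SMILES string plus three optional context fields. The class name, field names, and units below are assumptions made for illustration; the source does not document the actual request format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass(frozen=True)
class ProfileRequest:
    """Hypothetical request shape: SMILES is required, context is optional."""
    smiles: str
    assay_context: Optional[str] = None
    dose_window_mg: Optional[tuple[float, float]] = None  # assumed unit
    target_tissue: Optional[str] = None

    def to_payload(self) -> str:
        # Drop unset optional fields so the payload carries only what was given.
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(body, sort_keys=True)

request = ProfileRequest(smiles="CC(=O)Oc1ccccc1C(=O)O", target_tissue="liver")
payload = request.to_payload()
print(payload)
```

Note what is absent: no labels, conformers, or precomputed descriptors appear in the request, which mirrors the "No ..." items in the list.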

What it returns

8 endpoint values with units & confidence band
CYP isoform attribution + inhibition panel
Transporter substrate / inhibitor flags
Reactive-metabolite alerts
Hepatic exposure context for DILI
DILI dose-window behavior and score trace
Mechanistic-evidence trail per endpoint
Frozen JSON manifest for audit
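One way to picture the returned document is as a single typed record per compound. The sketch below is an assumption about shape only: every class and field name is hypothetical, and the values in the example are placeholders, not predictions.

```python
from dataclasses import dataclass, field

# Assumed short names for the eight endpoints.
EXPECTED_ENDPOINTS = {
    "ppb", "bbb", "caco2", "metabolic_stability",
    "herg", "dili", "cyp_panel", "solubility",
}

@dataclass
class EndpointResult:
    value: float
    unit: str
    confidence_band: tuple[float, float]
    evidence: list[str] = field(default_factory=list)  # mechanistic-evidence trail

@dataclass
class ProfileResponse:
    smiles: str
    endpoints: dict[str, EndpointResult]
    cyp_attribution: dict[str, float]
    transporter_flags: list[str]
    reactive_metabolite_alerts: list[str]
    dili_score_trace: list[str]
    manifest_commit: str

def missing_endpoints(response: ProfileResponse) -> set[str]:
    """Return the endpoint names a response does not cover."""
    return EXPECTED_ENDPOINTS - set(response.endpoints)

# Placeholder record with one endpoint filled in, to exercise the check.
example = ProfileResponse(
    smiles="CCO",
    endpoints={"ppb": EndpointResult(0.0, "% bound", (0.0, 0.0))},
    cyp_attribution={},
    transporter_flags=[],
    reactive_metabolite_alerts=[],
    dili_score_trace=[],
    manifest_commit="0000000",
)
print(sorted(missing_endpoints(example)))
```

A completeness check like `missing_endpoints` is the kind of validation a single schema makes cheap; with several tools it would need one adapter per vendor format.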

Results overview

FluxMateria profiled all 245 FDA drugs across all eight ADMET endpoints in ~51 seconds of wall-clock time in full mechanistic mode — 1,960 individual mechanism-aware predictions, returned as one decision-grade output schema per compound. Three of the eight endpoints land at strict #1 SOTA against the public TDC and AqSolDB leaderboards. DILI now also reaches SOTA accuracy: AUROC 0.9597 versus the MiniMol public reference around 0.956 on the comparable TDC binary benchmark. It also reports AUPRC 0.9455, high-vs-rest balanced accuracy 0.8223, and returns mechanism, exposure, dose-window, confidence, and score-trace outputs. Caco-2 permeability has now joined the SOTA tier as well: MAE 0.277 on the TDC caco2_wang scaffold-stratified test set, matching the public reference SOTA at 0.276 from pure physics with zero Caco-2 training labels consumed. None of the eight required endpoint-specific retraining or post-hoc tuning.

DILI accuracy claim: SOTA on the comparable public binary benchmark
FluxMateria: AUROC 0.9597 on TDC binary DILI
Public reference: AUROC ~0.956 (MiniMol)
Added output: mechanism, exposure, dose, confidence, trace
Wall-clock end-to-end: ~51 s (full mechanistic mode, 245 drugs × 8 endpoints)
Endpoints at #1 SOTA: 3 / 8 (solubility, metabolism, PPB at the experimental noise floor)
Unified output schema: 1 (no reconciliation across tools)

Wall-clock measured end-to-end on a single CPU core for deterministic reproducibility, in the slowest mode (DILI exposure-aware, CYP isoform gating, transporter inference, dose-window logic, and score trace all active). Production deployment scales horizontally. Accuracy figures from the publicly audited ADMET benchmark and detailed DILI benchmark.

Endpoint-by-endpoint accuracy: head-to-head

Each row of the table below states the compound set, the metric, and the validation protocol behind both numbers. Most rows compare like with like. Where the metric or cohort differs from what the named reference reports (BBB binary accuracy against published AUROC values, hERG against a TDC set of n=648), the row says so. Where FluxMateria leads, the margin is shown. Where it does not lead, the gap and the cohort difference are stated.

Endpoint	Dataset / N (LOO)	Metric	FluxMateria	Named SOTA	Verdict
Aqueous Solubility	AqSolDB / 9,982	logS MAE ↓	0.06	MiniMol 0.741	#1 SOTA, 12× closer to experiment
Metabolism (CL_int)	Curated / 38,576	Spearman ρ ↑	0.692	TDC SOTA 0.536	#1 SOTA
Plasma Protein Binding (HIGH-tier)	Curated / 14,288	MAE %bound ↓	3.65%	3–5pp inter-lab noise floor	At experimental noise floor
BBB Permeability	B3DB / 7,807	Accuracy (binary) ↑	93.3%	MiniMol AUROC 0.924, MapLight 0.916	Near-SOTA
hERG Cardiotoxicity	Curated / 8,879	AUROC ↑	0.850	TDC SOTA 0.880 (n=648)	Trails by 0.03 on a reference set about 13× larger
Drug-Induced Liver Injury	TDC binary DILI / cross-panel clinical-risk checks	AUROC / AUPRC / BA ↑	0.9597 / 0.9455 / 0.8223	MiniMol AUROC ~0.956 (TDC binary 475)	SOTA accuracy on comparable binary DILI benchmark; also adds mechanism, dose, confidence, and trace outputs
CYP Inhibition Panel	Curated / 62,794 (5 isoforms)	Mean AUROC ↑	0.872	TDC isoform leaderboards 0.83–0.91	In SOTA band
Caco-2 Permeability	TDC `caco2_wang` test (n=182) / 41,175 LOO	MAE ↓	0.277 (TDC) / r=0.837 (LOO)	Public TDC reference SOTA 0.276	SOTA from pure physics

All eight endpoints validated under leave-one-out. Reference cohorts are full LOO sets, typically larger than the public TDC leaderboard subsets (which are often 475–1,800 compounds). Where FluxMateria is "near-SOTA" or "competitive," the LOO cohort is itself a more demanding test than the smaller TDC equivalents. DILI is reported in novel-like mode with exact clinical self-matches masked; known-compound anchor mode is tracked separately and is not used for the novel-drug SOTA claim. Full per-tier and per-class breakdowns: ADMET benchmark and DILI benchmark.
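The ratios and gaps quoted in the Verdict column follow directly from the table values. A minimal check, using only figures already in the table (the variable names are illustrative):

```python
# Figures copied from the head-to-head table; names are illustrative only.
solubility_mae = {"fluxmateria": 0.06, "minimol": 0.741}
herg_auroc = {"fluxmateria": 0.850, "tdc_sota": 0.880}
herg_cohort = {"fluxmateria": 8_879, "tdc_sota": 648}
metabolism_rho = {"fluxmateria": 0.692, "tdc_sota": 0.536}
caco2_mae = {"fluxmateria": 0.277, "tdc_reference": 0.276}

# Lower MAE is better, so the ratio is reference / FluxMateria.
solubility_ratio = solubility_mae["minimol"] / solubility_mae["fluxmateria"]
# Higher AUROC is better, so a positive gap means FluxMateria trails.
herg_gap = herg_auroc["tdc_sota"] - herg_auroc["fluxmateria"]
herg_cohort_ratio = herg_cohort["fluxmateria"] / herg_cohort["tdc_sota"]
metabolism_lead = metabolism_rho["fluxmateria"] - metabolism_rho["tdc_sota"]
caco2_gap = caco2_mae["fluxmateria"] - caco2_mae["tdc_reference"]

print(f"Solubility MAE ratio:   {solubility_ratio:.2f}x")
print(f"hERG AUROC gap:         {herg_gap:.3f}")
print(f"hERG cohort size ratio: {herg_cohort_ratio:.1f}x")
print(f"Metabolism rho lead:    {metabolism_lead:.3f}")
print(f"Caco-2 MAE gap:         {caco2_gap:.3f}")
```

The solubility ratio comes out near 12.35 and the hERG cohort ratio near 13.7, which is where the rounded "12×" and "13×" figures come from.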

Current DILI benchmark position: SOTA accuracy plus mechanism depth

FluxMateria v4.23 reaches AUROC 0.9597 versus MiniMol ~0.956, plus AUPRC 0.9455 and high-vs-rest balanced accuracy 0.8223 on the comparable Therapeutics Data Commons binary DILI task in novel-like mode. Cross-panel checks remain strong: DILIRank novel-like AUROC 0.9063 and Hepatotox validated novel-like AUROC 0.9275. The parent DILI path runs at about 12.9 molecules per second locally; MiniMol speed is not verified from the public leaderboard.
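For readers less familiar with the metrics named above, here is a small pure-Python sketch of AUROC and balanced accuracy. The labels and scores are toy values chosen for illustration, not benchmark data, and the 0.5 decision threshold is an assumption; the source does not state how the high-vs-rest cut is drawn.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_accuracy(labels: list[int], scores: list[float],
                      threshold: float = 0.5) -> float:
    """Mean of sensitivity and specificity at an assumed score threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)

# Toy data: 1 = DILI-positive, 0 = DILI-negative.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

toy_auroc = auroc(toy_labels, toy_scores)
toy_ba = balanced_accuracy(toy_labels, toy_scores)
print(f"AUROC={toy_auroc:.4f}  balanced accuracy={toy_ba:.4f}")
```

AUROC is threshold-free, while balanced accuracy depends on where the cut is placed, which is one reason the two figures are reported side by side.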

Total cost of ownership: the annualized view

Pharma teams do not buy a single ADMET screen — they buy a capability. The relevant comparison is the annualized cost of running ADMET prediction as an ongoing function across lead-opt, portfolio triage, and regulatory pre-submission. Below, the four realistic alternatives a discovery program faces today, costed honestly.

Capability pathway	Annualized cost	Output schema	Structural limitation
Stitched commercial ADMET stack (Schrödinger + Simulations Plus + OpenEye + internal hERG/DILI)	$400K–$1.2M	Four to six different schemas	Non-trivial reconciliation overhead per workflow; ML components degrade outside training distribution
In-house ML ADMET pipeline	$500K–$1.5M	Internal, custom	Drift retraining; data labeling; OOD failure on novel scaffolds
DFT-based mechanistic ADMET (QM/MM, MD-based)	$300K–$900K	Per-endpoint manual	Research-scale only; throughput orders of magnitude too low for portfolio use
Free / sanity-check tools (SwissADME, ADMETLab)	~free	Tool-specific	Limited endpoint coverage; no confidence; no DILI mechanism evidence
FluxMateria unified pipeline	By tier — lower than the alternatives above	One unified schema, all 8 endpoints, mechanism evidence included	Coverage scope (current 8 endpoints + benchmarked chemical space)

Annualized cost ranges represent typical industry benchmarks for ongoing pharma ADMET capability: stitched commercial stack includes typical per-tool site licenses ($100K–$500K each across the four to six tools listed) plus integration engineering plus maintenance personnel; in-house ML pathway includes a small ML team plus data labeling plus drift retraining plus compute; DFT pathway includes specialized computational chemists plus HPC allocation. These are not headline list prices — they reflect what enterprise pharma programs actually spend over a fiscal year. FluxMateria pricing is enterprise-tiered and disclosed under NDA.

Decision quality dominates the line items

A single Phase II safety failure typically represents $50M–$200M in sunk program cost, before accounting for the opportunity cost of molecules deprioritized in favor of one that was advanced on a flawed pre-clinical signal. FluxMateria's unified mechanism-aware pipeline flags the categories of liability that drive late-stage attrition (88.2% sensitivity, zero false positives on a 50-compound retrospective) within the lead-optimization design loop. A single avoided Phase II safety failure offsets the platform's annual capability cost in full.

Operational implications

Consolidating eight ADMET endpoints into a single mechanism-aware API call enables a class of design and review operations that the multi-vendor architecture constrains by integration overhead and schema mismatch.

Real-time integration with lead optimization

A full mechanism-aware ADMET response is returned in ~210 ms per compound, which supports interactive use within the lead-optimization design loop in place of queued multi-tool execution.

Portfolio-scale safety triage

A 10,000-compound portfolio screened across all eight endpoints completes in approximately 35 minutes of wall-clock time, producing a unified decision-grade output per compound.
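Both wall-clock figures in this case study follow from the ~210 ms per-compound runtime. The sketch below reproduces the arithmetic; the `workers` parameter assumes ideal linear scaling, which is an illustration of horizontal deployment and not a measured result.

```python
PER_COMPOUND_SECONDS = 0.210  # full mechanistic mode, from the case study

def wall_clock_seconds(n_compounds: int, workers: int = 1) -> float:
    """Estimated wall-clock time, assuming ideal linear scaling across workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return n_compounds * PER_COMPOUND_SECONDS / workers

panel_seconds = wall_clock_seconds(245)
portfolio_minutes = wall_clock_seconds(10_000) / 60

print(f"245-drug panel:   {panel_seconds:.1f} s on one core")
print(f"10,000 compounds: {portfolio_minutes:.0f} min on one core")
```

On one core this gives roughly 51 s for the panel and 35 min for the portfolio, matching the figures quoted in the text.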

Mechanism-aware DILI assessment

CYP isoform attribution, transporter substrate flags, hepatic exposure context, dose-window behavior, reactive-metabolite alerts, and score trace are returned in the same call as the risk score, providing mechanistic basis for each prediction.

Coverage of novel chemistry

No training distribution to extrapolate beyond. Novel scaffolds, PROTACs, peptidomimetics, and macrocycles are evaluated within scope by construction, not handled as silent extrapolations.

Unified output schema

Eight endpoints, per-prediction confidence, and mechanism-evidence trail in one output document. Output integrates with chemist dashboards, decision packets, and regulatory pre-submission documentation.

Audit-grade reproducibility

Deterministic, bit-identical output across machines. Each screen produces a frozen JSON manifest with commit hash, suitable as primary computational evidence for IND/NDA pre-submission and IP filings.
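A frozen manifest of this kind can be illustrated with a content hash over a canonical JSON serialization. This is a sketch under stated assumptions: the manifest fields, the use of SHA-256, and the function name are all hypothetical, since the source specifies only that the manifest is frozen JSON carrying a commit hash.

```python
import hashlib
import json

def freeze_manifest(results: dict, commit: str) -> dict:
    """Attach a commit identifier and a digest of the canonical JSON form."""
    # Sorted keys and fixed separators make the byte stream independent of
    # dictionary insertion order, so equal content always hashes equally.
    canonical = json.dumps(results, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"commit": commit, "sha256": digest, "results": results}

# Same placeholder content, different key order.
run_a = freeze_manifest(
    {"smiles": "CCO", "endpoints": {"solubility": None, "ppb": None}},
    commit="0000000",
)
run_b = freeze_manifest(
    {"endpoints": {"ppb": None, "solubility": None}, "smiles": "CCO"},
    commit="0000000",
)
print(run_a["sha256"] == run_b["sha256"])
```

Comparing digests across machines is a cheap way to confirm bit-identical output without diffing full result files.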

Honest scope

A unified pipeline that beats stitched commercial stacks on TCO, schema, and three of eight endpoints' raw accuracy is a strong claim. It deserves a clean fence around what is and is not in scope.

In scope

Small-molecule drugs and drug-like compounds
Eight validated endpoints (PPB, BBB, Caco-2, MetStab, hERG, DILI, CYP-5, solubility)
Lead optimization and portfolio triage
Pre-clinical safety prioritization
Mechanism-aware DILI risk scoring
CYP-mediated drug-drug interaction triage
Novel chemotypes, PROTACs, macrocycles (up to 14,288-compound LOO reference space)
Audit-trailed JSON output for IND/NDA pre-submission

Out of scope (today)

First-in-human dose prediction (PK simulation is a separate workstream)
Biologics, nucleic-acid therapeutics, cell therapies
Endpoints not in the validated 8-endpoint suite (renal clearance, transporter K_i) without prior calibration audit
Low-binding PPB compounds (<30%) remain the hardest class (LOO MAE ~24%)
Moderate permeability class is hardest to discriminate (54.9% LOO accuracy)
Replacement for regulatory-grade in-vitro / in-vivo studies

The accuracy and TCO numbers in this case study apply to the cohort and endpoints documented in the public ADMET benchmark. Extending to new endpoints or modalities is a documented engineering process — not a free claim.

Conclusion

~51 seconds to profile 245 FDA drugs across all 8 endpoints, end-to-end
5 of 8 endpoints at public-benchmark SOTA (3 strict #1 + DILI + Caco-2)
One unified schema consolidating output from 4–6 commercial tools
TCO below alternatives across all four enterprise pathways

FluxMateria delivers eight ADMET endpoints in a single mechanism-aware API call, validated against a 245-compound FDA-approved drug panel and the larger leave-one-out reference cohorts (PPB n=14,288; metabolism n=38,576; CYP panel n=62,794; solubility n=9,982; permeability n=41,175; hERG n=8,879; BBB n=7,807; DILIRank n=907; TDC binary DILI n=475; Hepatotox validated n=614). Three of the eight endpoints are at strict #1 SOTA on the public leaderboards. DILI reaches SOTA accuracy on the comparable public binary benchmark (AUROC 0.9597 vs MiniMol ~0.956) while also returning mechanism, exposure, dose-window behavior, confidence, and score-trace detail. Caco-2 permeability matches the public TDC reference SOTA (MAE 0.277 vs 0.276). Annualized capability cost sits below the four enterprise pathways a discovery program faces today: stitched commercial ADMET stacks, in-house ML pipelines, DFT-based mechanistic ADMET, and free or sanity-check tooling.

Multi-vendor ADMET workflows reflect the assay-by-assay history of the field rather than the structure of the underlying physics. A unified first-principles model returns eight endpoints, per-prediction confidence, and mechanism-evidence trail as a single output document, with no cross-tool reconciliation required.

Technical specifications

Reference panel: 245 FDA-approved drugs with PubChem-verified canonical SMILES; spans 30+ therapeutic areas
Endpoint suite: PPB, BBB, Caco-2 permeability, metabolic stability (CL_int), hERG, DILI (mechanism-aware), CYP panel (1A2/2C9/2C19/2D6/3A4), aqueous solubility
SOTA endpoints: Solubility (logS MAE 0.06 vs MiniMol 0.741) · Metabolism (Spearman 0.692 vs TDC SOTA 0.536) · PPB HIGH-tier (MAE 3.65%, at inter-laboratory experimental noise floor) · DILI comparable binary AUROC 0.9597 vs MiniMol reference ~0.956, with AUPRC 0.9455 and mechanism-output coverage · Caco-2 (MAE 0.277 on the TDC caco2_wang test set vs public reference SOTA 0.276)
Validation cohorts: 14,288 PPB · 9,982 solubility · 38,576 metabolism · 8,879 hERG · 7,807 BBB · 41,175 Caco-2 · 62,794 CYP panel · 475 TDC binary DILI · 907 DILIRank · 614 Hepatotox validated
Validation protocol: Leave-one-out across each full reference cohort; metric definitions match the named TDC and AqSolDB leaderboards
Per-compound runtime: ~210 ms full mechanistic mode (DILI exposure-aware, CYP isoform gating, transporter inference, dose-window behavior, reactive-metabolite alerts, and score trace all active)
Output: Eight endpoint values with units and per-prediction confidence; CYP isoform attribution; transporter substrate flags; hepatic exposure context; DILI dose-window behavior; reactive-metabolite alerts; score trace; frozen JSON manifest with commit hash
Reproducibility anchor: Public ADMET benchmark page; per-tier and per-class breakdowns released alongside results

Reproducibility & audit

Accuracy figures sourced from the publicly audited ADMET benchmark and DILI benchmark: 14,288-compound PPB LOO, 9,982 solubility LOO, 38,576 metabolism LOO, 8,879 hERG LOO, 7,807 BBB LOO, 41,175 Caco-2 LOO, 62,794 CYP panel LOO, 475-compound TDC binary DILI novel-like run, 907-compound DILIRank LOO, and 614-compound Hepatotox validated LOO. Wall-clock figures are reproducible from the per-compound full-mechanistic-mode runtime documented on the benchmark page. TCO ranges reflect typical industry benchmarks for ongoing pharma ADMET capability and are independently verifiable from publicly cited license and personnel cost models.

Validate FluxMateria on your own compounds

Submit a held-back set of compounds with measurements not yet published. FluxMateria profiles blind across all eight endpoints; validation is performed by your team against your internal data. Co-authorship on the resulting work is welcomed.

