DILI Benchmark — FluxMateria

State-of-the-art benchmark position

FluxMateria is state-of-the-art for mechanistic DILI prediction.

On the comparable Therapeutics Data Commons (TDC) binary drug-induced liver injury (DILI) task, FluxMateria reaches an area under the receiver operating characteristic curve (AUROC) of 0.9597, versus a MiniMol public reference of about 0.956. FluxMateria also returns outputs that binary benchmark entries do not: risk score, class, confidence, mechanism attribution, hepatic exposure context, optional dose-window behavior, and a calculation trace.

0.9597: AUROC on the comparable TDC binary DILI benchmark

~0.956: MiniMol public AUROC reference on the binary benchmark

0.9275: Hepatotox validated novel-like AUROC

12.95 molecules per second: FluxMateria parent DILI path, run locally

Figure: FluxMateria DILI mechanism coverage infographic showing molecular structure input, hepatic exposure, efflux and retention, cytochrome P450 enzyme context, injury chemistry, and parent DILI risk output. Caption: mechanism coverage supporting the benchmark (exposure in, clearance out, enzyme context, injury chemistry, parent risk, and review-ready trace).

Benchmark evidence

The headline claim is anchored to the comparable public binary task, then stress-tested with broader clinical-risk panels.

Evidence layer	Metric	FluxMateria result	Comparator / context	Interpretation
Comparable public binary benchmark	AUROC	0.9597	MiniMol reference around 0.956 on the TDC binary DILI task.	FluxMateria clears the public binary reference while providing richer outputs.
DILIRank clinical-transfer panel	Novel-like AUROC	0.9063	Clinical-risk oriented drug list, separate from the binary benchmark framing.	Supports transfer from benchmark labels to clinically interpretable risk ordering.
Hepatotox validated panel	Novel-like AUROC	0.9275	Independent hepatotoxicity validation context.	Confirms that the engine is not only matching one benchmark split.
Clinical risk stratification context	3-class accuracy; 3-class balanced accuracy; high-vs-rest balanced accuracy (907-compound leave-one-out risk stratification)	0.7696; 0.7677; 0.8397	Three-class risk view rather than a binary yes/no benchmark.	Useful for review workflow; not the primary binary state-of-the-art claim.
Throughput	Molecules per second	12.95 locally for the parent DILI path	MiniMol speed is not verified from the public leaderboard.	Fast enough for portfolio-scale triage and interactive safety review.

Benchmark interpretation: This is intentionally not a binary-only apples-to-apples claim. The binary AUROC comparison establishes public benchmark position; the added mechanism, exposure, dose, confidence, and trace outputs establish why FluxMateria is a mechanistic safety engine rather than just another classifier.
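
For readers who want to check an AUROC figure against their own predictions, the metric can be sketched as the probability that a randomly chosen positive compound scores higher than a randomly chosen negative one. The function name, the toy labels, and the toy scores below are illustrative assumptions; they are not benchmark data and not FluxMateria code.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative counts as half a win
    (the Mann-Whitney U formulation of AUROC).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data only: 1 = DILI concern, 0 = no concern.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.92, 0.81, 0.40, 0.35, 0.30, 0.10]
toy_auroc = auroc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in the number of compounds, which is acceptable for panels of a few hundred rows; a rank-based implementation would be the usual choice at larger scale.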

Full benchmark matrix

The table below separates known-compound behavior from novel-like leave-one-out behavior. The public state-of-the-art claim uses the novel-like Therapeutics Data Commons (TDC) rows, where exact clinical self-matches are masked.

Panel	Mode	Rows	Speed	3-class accuracy	3-class balanced accuracy	High-vs-rest balanced accuracy	High-vs-rest AUROC	High-vs-rest AUPRC (area under precision-recall curve)	DILI-concern AUROC	Exact anchors
TDC DILI public	Novel-like leave-one-out	475	12.8806 molecules/second	n/a	n/a	0.8223	0.9597	0.9455	0.9597	0
TDC DILI raw	Novel-like leave-one-out	475	12.9478 molecules/second	n/a	n/a	0.8223	0.9597	0.9455	0.9597	0
DILIRank	Novel-like leave-one-out	907	11.3889 molecules/second	0.7696	0.7677	0.8397	0.9063	0.7355	0.9157	0
Hepatotox validated	Novel-like leave-one-out	614	13.6910 molecules/second	0.6857	0.6648	0.8077	0.9275	0.8594	0.9394	0
TDC DILI public	Known-compound production	475	12.8170 molecules/second	n/a	n/a	0.9237	0.9932	0.9948	0.9932	454
DILIRank	Known-compound production	907	8.5056 molecules/second	0.9713	0.9740	0.9897	0.9976	0.9965	0.9987	488
Hepatotox validated	Known-compound production	614	12.6597 molecules/second	0.8648	0.8519	0.9278	0.9850	0.9795	0.9880	575
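
The accuracy columns in the matrix can be read with the following minimal sketch, which assumes balanced accuracy is the unweighted mean of per-class recall and that the high-vs-rest view collapses the low and moderate classes into one. The helper names and toy labels are hypothetical and are not taken from the benchmark package.

```python
from typing import List, Sequence


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def high_vs_rest(labels: Sequence[str]) -> List[str]:
    """Collapse a three-class labelling into high versus everything else."""
    return ["high" if y == "high" else "rest" for y in labels]


# Toy labels only, not benchmark data.
toy_true = ["high", "high", "moderate", "low", "low", "low"]
toy_pred = ["high", "moderate", "moderate", "low", "low", "high"]

three_class = balanced_accuracy(toy_true, toy_pred)
binary = balanced_accuracy(high_vs_rest(toy_true), high_vs_rest(toy_pred))
```

Because each class contributes equally regardless of size, balanced accuracy is less flattering than plain accuracy when one risk class dominates a panel.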

Protocol note: Novel-like leave-one-out mode is the correct proxy for new drug candidates because exact clinical self-matches are removed. Known-compound production mode is useful for reference-drug reproducibility and user-facing known-drug behavior, but it is not the public novel-drug state-of-the-art claim.
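
The difference between the two modes can be illustrated with a deliberately simple stand-in. The sketch below assumes a similarity-weighted neighbor scorer, a hypothetical `Compound` record, and a reading of "exact anchors" as reference entries sharing the query's identifier; none of this is the FluxMateria engine, it only shows how masking exact self-matches changes what the scorer can see.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Compound:
    key: str                     # hypothetical structure identifier
    features: Tuple[float, ...]  # hypothetical descriptor vector
    label: int                   # 1 = DILI concern, 0 = no concern


def similarity(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """Toy similarity: 1 for identical vectors, falling toward 0 with distance."""
    dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + dist)


def score(query: Compound, reference: List[Compound]) -> float:
    """Similarity-weighted mean of reference labels (stand-in scorer)."""
    weights = [similarity(query.features, r.features) for r in reference]
    return sum(w * r.label for w, r in zip(weights, reference)) / sum(weights)


def evaluate(panel: List[Compound], novel_like: bool) -> List[Tuple[float, int]]:
    """Return (score, exact_anchor_count) per compound in the panel."""
    results = []
    for query in panel:
        if novel_like:
            # Mask exact self-matches so the query is treated as unseen.
            reference = [r for r in panel if r.key != query.key]
        else:
            # Known-compound mode: the query may match itself in the reference.
            reference = list(panel)
        anchors = sum(1 for r in reference if r.key == query.key)
        results.append((score(query, reference), anchors))
    return results


toy_panel = [
    Compound("A", (0.0, 0.0), 0),
    Compound("B", (1.0, 0.0), 1),
    Compound("C", (0.9, 0.1), 1),
]
novel = evaluate(toy_panel, novel_like=True)
known = evaluate(toy_panel, novel_like=False)
```

In this toy setup the known-compound scores sit closer to the true labels because each query can anchor on its own record, which mirrors why the known-compound rows in the matrix report higher metrics than the novel-like rows.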

Mechanism signal audit

For the 475-row TDC public novel-like run, the engine records which mechanism layers contributed or constrained the parent score. These counts are included so reviewers can see that the benchmark is not a single opaque binary score.

Mechanism family	Observed count in TDC public novel-like run	Scientific role
CYP3A4 (cytochrome P450 family 3 subfamily A member 4) induction	7	Enzyme induction context.
CYP3A4 time-dependent inhibition	19	Mechanism-based enzyme inhibition context.
CYP2E1 (cytochrome P450 family 2 subfamily E member 1) contribution	9	Small-molecule bioactivation and acetaminophen-like pathway context.
CYP2B6 (cytochrome P450 family 2 subfamily B member 6) contribution	5	Metabolic activation and enzyme-specific liver-risk contribution.
CYP2C8 (cytochrome P450 family 2 subfamily C member 8) contribution	5	Primary-metabolizer and acyl-glucuronide-adjacent context.
OATP1B1 (organic anion transporting polypeptide 1B1) uptake	144	Liver-entry context.
Liver exposure circuit	144	Combined hepatic uptake, exposure, retention, and clearance context.
UGT2B7 (UDP glucuronosyltransferase family 2 member B7) acyl context	12	Detox and acyl-glucuronide context.
OATP1B3 (organic anion transporting polypeptide 1B3) uptake	53	Liver-entry context.
BCRP (breast cancer resistance protein) efflux	172	Efflux context.
BCRP inhibitor context	135	Transporter inhibition context.
CYP1A2 (cytochrome P450 family 1 subfamily A member 2) bioactivation	4	Aromatic bioactivation context.
CYP1A2 induction	51	Aryl hydrocarbon receptor-linked induction context.
Targeted chemistry deltas	19	High-specificity injury-chemistry support from residual failure-cluster review.
Targeted chemistry floors	6	Minimum-risk support for validated high-specificity injury families.
Aromatic cluster deltas	18	Near-threshold aromatic liver-risk support.
Aromatic cluster floors	11	Minimum-risk support for validated aromatic failure clusters.
Mechanism-coherence floors	45	Support when independent evidence layers agree on the same liver-risk direction.
Confidence-combiner caps	7	False-positive control when broad backbone risk lacks sufficient mechanism support.
Small inert caps	2	False-positive control for low-process small-molecule chemistry.
Low-backbone mechanism rescues	4	Allows high-specificity mechanism evidence to surface even when the initial clinical-neighbor backbone is low.
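
A count table like the one above could be produced by tallying mechanism flags across per-molecule traces. The sketch below assumes a hypothetical trace shape (a dictionary with a `mechanisms` list) and counts each family at most once per molecule; the actual trace schema is not described in this document.

```python
from collections import Counter
from typing import Dict, Iterable, List


def tally_mechanisms(traces: Iterable[Dict[str, List[str]]]) -> Counter:
    """Count how many molecules carry each mechanism family."""
    counts: Counter = Counter()
    for trace in traces:
        # set() ensures a family is counted once per molecule,
        # even if it appears several times in that molecule's trace.
        counts.update(set(trace["mechanisms"]))
    return counts


# Toy traces only; the family names are borrowed from the table for readability.
toy_traces = [
    {"mechanisms": ["OATP1B1 uptake", "BCRP efflux"]},
    {"mechanisms": ["BCRP efflux", "CYP1A2 induction", "BCRP efflux"]},
    {"mechanisms": []},
]
toy_counts = tally_mechanisms(toy_traces)
```

Under this counting scheme the totals can exceed the number of molecules, since one molecule may carry several flags. That is consistent with the table, where the counts sum to more than the 475 rows in the run.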

Descriptor validation

Reusable mechanism descriptors were validated as separable mechanism signals before they were used inside the parent DILI engine. This is a signal-quality check on the descriptors, not an evaluation of a final binary classifier.

Descriptor family	Status	Positive mean	Control mean	Separation	Minimum positive	Maximum control
Hepatobiliary residence pressure	Pass	0.4623	0.1486	0.3137	0.4049	0.1636
Selective retention specificity	Pass	0.3488	0.1221	0.2267	0.3061	0.1669
Polar antimetabolite stress	Pass	0.7391	0.1239	0.6152	0.6268	0.2543
Net reactive-metabolite pressure	Pass	0.2811	0.0814	0.1997	0.2140	0.1236
Mitochondrial and beta-oxidation pressure	Pass	0.5165	0.0764	0.4401	0.4716	0.1877

Descriptor audit: The descriptor validation covered 38 probe cases at 1.07 cases per second. Seventeen additional parent process probes passed as high-specificity chemistry checks, with controls kept below the high-risk band.
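
The Separation column can be reproduced from the table as positive mean minus control mean. The sketch below also checks a no-overlap condition (minimum positive above maximum control); that condition is an assumption about what "Pass" could mean, since the pass criterion itself is not stated here. The values are copied from the descriptor table; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptorRow:
    family: str
    positive_mean: float
    control_mean: float
    min_positive: float
    max_control: float

    @property
    def separation(self) -> float:
        """Positive mean minus control mean, rounded to the table's precision."""
        return round(self.positive_mean - self.control_mean, 4)

    @property
    def no_overlap(self) -> bool:
        """Assumed criterion: every positive probe scores above every control."""
        return self.min_positive > self.max_control


ROWS = [
    DescriptorRow("Hepatobiliary residence pressure", 0.4623, 0.1486, 0.4049, 0.1636),
    DescriptorRow("Selective retention specificity", 0.3488, 0.1221, 0.3061, 0.1669),
    DescriptorRow("Polar antimetabolite stress", 0.7391, 0.1239, 0.6268, 0.2543),
    DescriptorRow("Net reactive-metabolite pressure", 0.2811, 0.0814, 0.2140, 0.1236),
    DescriptorRow("Mitochondrial and beta-oxidation pressure", 0.5165, 0.0764, 0.4716, 0.1877),
]

separations = [row.separation for row in ROWS]
all_separated = all(row.no_overlap for row in ROWS)
```

All five rows satisfy the assumed no-overlap condition, and the computed separations match the published column.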

Output coverage beyond binary classification

FluxMateria returns the evidence safety teams need to decide what to do next.

Output layer	What the reviewer sees	Why it matters
Risk score and class	Numeric score, low/moderate/high class, and confidence.	Supports portfolio ranking and safety-governance thresholds.
Hepatic exposure	Organic anion transporting polypeptide (OATP) uptake context and exposure pressure.	Separates structural hazard from likely liver access.
Retention and efflux	Bile salt export pump (BSEP), breast cancer resistance protein (BCRP), and multidrug resistance-associated protein 2 (MRP2) evidence.	Highlights cholestatic and hepatobiliary-retention concerns.
Enzyme context	Cytochrome P450 (CYP) metabolism, inhibition, induction, and bioactivation context where available.	Connects DILI risk to drug-drug interaction and metabolic-liability review.
Injury chemistry	Reactive-metabolite, mitochondrial-stress, chronic-duration, and phenotype-specific evidence.	Gives toxicology teams a plausible follow-up assay direction.
Score trace	Structured calculation trace from baseline evidence to final class.	Makes the call reviewable, challengeable, and reproducible.
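
As an illustration of why a structured trace makes a call reproducible, the sketch below models a result record whose final score can be replayed from a baseline and a list of signed steps. Every name, the additive-delta model, and the class cutoffs are hypothetical assumptions chosen for the example; the actual output schema and scoring logic are not specified in this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TraceStep:
    layer: str    # e.g. "hepatic exposure" (hypothetical layer name)
    delta: float  # signed change applied to the running score
    note: str     # human-readable reason for the change


@dataclass
class DiliResult:
    risk_score: float
    risk_class: str  # "low", "moderate", or "high"
    confidence: float
    trace: List[TraceStep] = field(default_factory=list)


def classify(score: float) -> str:
    """Map a score to a class using hypothetical cutoffs (0.33 and 0.66)."""
    if score >= 0.66:
        return "high"
    if score >= 0.33:
        return "moderate"
    return "low"


def replay(baseline: float, trace: List[TraceStep]) -> float:
    """Recompute the final score from the baseline, clamping to [0, 1]."""
    score = baseline
    for step in trace:
        score = min(1.0, max(0.0, score + step.delta))
    return score


toy_trace = [
    TraceStep("hepatic exposure", +0.25, "uptake transporter context"),
    TraceStep("injury chemistry", +0.20, "reactive-metabolite evidence"),
    TraceStep("confidence cap", -0.05, "limited mechanism support"),
]
final_score = replay(0.30, toy_trace)
toy_result = DiliResult(final_score, classify(final_score), 0.8, toy_trace)
```

A reviewer who disagrees with one step can remove or adjust it and replay the rest, which is the practical sense in which a trace makes a call challengeable.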

Download benchmark package

Sanitized machine-readable results and methodology for independent scientific review.

DILI benchmark evidence package

Summary (JSON): Headline claim, primary public comparison, cross-panel validation, output coverage, and interpretation policy.

Benchmark matrix (CSV): Panel-by-panel metrics: rows, mode, speed, balanced accuracy, AUROC, AUPRC, and exact-anchor policy.

Mechanism signal counts (CSV): Counts for cytochrome P450, transporter, exposure, chemistry, confidence, and rescue layers in the public novel-like run.

Descriptor validation (CSV): Mechanism-signal separation checks for residence, retention, antimetabolite, reactive-metabolite, and mitochondrial pressure descriptors.

Methodology note (Markdown): Benchmark scope, evaluation modes, reported metrics, interpretation policy, and use boundary.

How to read the claim

What we claim

State-of-the-art mechanistic DILI prediction engine.
Comparable binary benchmark AUROC above the MiniMol public reference.
Richer output than binary classifiers: mechanisms, exposure, dose-window, confidence, and trace.
Fast enough for screening and review workflows: about 12.95 molecules per second locally.

Boundary conditions

Prediction supports screening, prioritization, and scientific review; it does not replace regulated toxicology studies.
MiniMol speed is not verified from public leaderboard material.
Three-class clinical risk stratification is a separate workflow from the binary benchmark comparison.
Novel chemistry should be interpreted with the reported confidence and follow-up evidence needs.

Review the DILI engine in context

The detailed DILI benchmark should be read alongside the full ADMET benchmark and the product DILI page.

