DILI Benchmark: Mechanistic Engine
Detailed validation evidence for the FluxMateria drug-induced liver injury engine: binary benchmark position, cross-panel transfer, throughput, and mechanism-output coverage.
On the comparable Therapeutics Data Commons (TDC) binary drug-induced liver injury (DILI) task, FluxMateria reaches an area under the receiver operating characteristic curve (AUROC) of 0.9597, versus a MiniMol public reference of approximately 0.956. FluxMateria also returns outputs that binary benchmark entries do not: risk score, class, confidence, mechanism attribution, hepatic exposure context, optional dose-window behavior, and a calculation trace.
The headline claim is anchored to the comparable public binary task, then stress-tested with broader clinical-risk panels.
| Evidence layer | Metric | FluxMateria result | Comparator / context | Interpretation |
|---|---|---|---|---|
| Comparable public binary benchmark | AUROC | 0.9597 | MiniMol reference around 0.956 on the TDC binary DILI task. | FluxMateria clears the public binary reference while providing richer outputs. |
| DILIRank clinical-transfer panel | Novel-like AUROC | 0.9063 | Clinical-risk oriented drug list, separate from the binary benchmark framing. | Supports transfer from benchmark labels to clinically interpretable risk ordering. |
| Hepatotox validated panel | Novel-like AUROC | 0.9275 | Independent hepatotoxicity validation context. | Confirms that the engine is not only matching one benchmark split. |
| Clinical risk stratification context | 907-compound leave-one-out risk stratification | 3-class accuracy 0.7696; 3-class balanced accuracy 0.7677; high-vs-rest balanced accuracy 0.8397 | Three-class risk view rather than a binary yes/no benchmark. | Useful for review workflow; not the primary binary state-of-the-art claim. |
| Throughput | Molecules per second | 12.95 locally for the parent DILI path | MiniMol speed is not verified from the public leaderboard. | Fast enough for portfolio-scale triage and interactive safety review. |
Benchmark interpretation: The claim is intentionally broader than a binary-only, like-for-like comparison. The binary AUROC comparison establishes the public benchmark position; the added mechanism, exposure, dose, confidence, and trace outputs are what make FluxMateria a mechanistic safety engine rather than just another classifier.
The table below separates known-compound behavior from novel-like leave-one-out behavior. The public state-of-the-art claim uses the novel-like Therapeutics Data Commons (TDC) rows, where exact clinical self-matches are masked.
| Panel | Mode | Rows | Speed (molecules/second) | 3-class accuracy | 3-class balanced accuracy | High-vs-rest balanced accuracy | High-vs-rest AUROC | High-vs-rest AUPRC (area under precision-recall curve) | DILI-concern AUROC | Exact anchors |
|---|---|---|---|---|---|---|---|---|---|---|
| TDC DILI public | Novel-like leave-one-out | 475 | 12.8806 | n/a | n/a | 0.8223 | 0.9597 | 0.9455 | 0.9597 | 0 |
| TDC DILI raw | Novel-like leave-one-out | 475 | 12.9478 | n/a | n/a | 0.8223 | 0.9597 | 0.9455 | 0.9597 | 0 |
| DILIRank | Novel-like leave-one-out | 907 | 11.3889 | 0.7696 | 0.7677 | 0.8397 | 0.9063 | 0.7355 | 0.9157 | 0 |
| Hepatotox validated | Novel-like leave-one-out | 614 | 13.6910 | 0.6857 | 0.6648 | 0.8077 | 0.9275 | 0.8594 | 0.9394 | 0 |
| TDC DILI public | Known-compound production | 475 | 12.8170 | n/a | n/a | 0.9237 | 0.9932 | 0.9948 | 0.9932 | 454 |
| DILIRank | Known-compound production | 907 | 8.5056 | 0.9713 | 0.9740 | 0.9897 | 0.9976 | 0.9965 | 0.9987 | 488 |
| Hepatotox validated | Known-compound production | 614 | 12.6597 | 0.8648 | 0.8519 | 0.9278 | 0.9850 | 0.9795 | 0.9880 | 575 |
Protocol note: Novel-like leave-one-out mode is the correct proxy for new drug candidates because exact clinical self-matches are removed. Known-compound production mode is useful for reference-drug reproducibility and user-facing known-drug behavior, but it is not the public novel-drug state-of-the-art claim.
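As a minimal sketch of how the novel-like columns could be reproduced from per-compound scores, the Python below masks each compound's exact self-match before scoring, then computes AUROC and balanced accuracy. The helper names (`novel_like_scores`, `auroc`, `balanced_accuracy`), the `score_fn` callback, and the decision threshold are all hypothetical; none of them is FluxMateria's actual API, and the toy labels and scores are illustrative only.

```python
from typing import Callable, List, Sequence


def novel_like_scores(
    compounds: Sequence[str],
    reference: Sequence[str],
    score_fn: Callable[[str, Sequence[str]], float],
) -> List[float]:
    """Score each compound with its own exact match masked out of the reference set."""
    # score_fn is a hypothetical stand-in for whatever model produces a risk score.
    return [score_fn(c, [r for r in reference if r != c]) for c in compounds]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> float:
    """Mean of sensitivity and specificity at an assumed decision threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return 0.5 * (tp / n_pos + tn / n_neg)


if __name__ == "__main__":
    # Toy data, not benchmark data: 1 = high concern, 0 = rest.
    toy_labels = [1, 1, 1, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
    print(f"AUROC: {auroc(toy_labels, toy_scores):.4f}")
    print(f"Balanced accuracy: {balanced_accuracy(toy_labels, toy_scores, 0.5):.4f}")
```

The pairwise AUROC is quadratic in the number of rows, which is acceptable at panel sizes of a few hundred compounds. A rank-based implementation would be preferable for larger sets.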
For the 475-row TDC public novel-like run, the engine records which mechanism layers contributed or constrained the parent score. These counts are included so reviewers can see that the benchmark is not a single opaque binary score.
| Mechanism family | Observed count in TDC public novel-like run | Scientific role |
|---|---|---|
| CYP3A4 (cytochrome P450 family 3 subfamily A member 4) induction | 7 | Enzyme induction context. |
| CYP3A4 time-dependent inhibition | 19 | Mechanism-based enzyme inhibition context. |
| CYP2E1 (cytochrome P450 family 2 subfamily E member 1) contribution | 9 | Small-molecule bioactivation and acetaminophen-like pathway context. |
| CYP2B6 (cytochrome P450 family 2 subfamily B member 6) contribution | 5 | Metabolic activation and enzyme-specific liver-risk contribution. |
| CYP2C8 (cytochrome P450 family 2 subfamily C member 8) contribution | 5 | Primary-metabolizer and acyl-glucuronide-adjacent context. |
| OATP1B1 (organic anion transporting polypeptide 1B1) uptake | 144 | Liver-entry context. |
| Liver exposure circuit | 144 | Combined hepatic uptake, exposure, retention, and clearance context. |
| UGT2B7 (UDP glucuronosyltransferase family 2 member B7) acyl context | 12 | Detox and acyl-glucuronide context. |
| OATP1B3 (organic anion transporting polypeptide 1B3) uptake | 53 | Liver-entry context. |
| BCRP (breast cancer resistance protein) efflux | 172 | Efflux context. |
| BCRP inhibitor context | 135 | Transporter inhibition context. |
| CYP1A2 (cytochrome P450 family 1 subfamily A member 2) bioactivation | 4 | Aromatic bioactivation context. |
| CYP1A2 induction | 51 | Aryl hydrocarbon receptor-linked induction context. |
| Targeted chemistry deltas | 19 | High-specificity injury-chemistry support from residual failure-cluster review. |
| Targeted chemistry floors | 6 | Minimum-risk support for validated high-specificity injury families. |
| Aromatic cluster deltas | 18 | Near-threshold aromatic liver-risk support. |
| Aromatic cluster floors | 11 | Minimum-risk support for validated aromatic failure clusters. |
| Mechanism-coherence floors | 45 | Support when independent evidence layers agree on the same liver-risk direction. |
| Confidence-combiner caps | 7 | False-positive control when broad backbone risk lacks sufficient mechanism support. |
| Small inert caps | 2 | False-positive control for low-process small-molecule chemistry. |
| Low-backbone mechanism rescues | 4 | Allows high-specificity mechanism evidence to surface even when the initial clinical-neighbor backbone is low. |
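To put the counts above on a common scale, the short sketch below expresses a few of them as a fraction of the 475-row run. It assumes each count is the number of rows in which that layer was recorded, which the table does not state explicitly. Families can overlap on the same compound, so the fractions are not expected to sum to one. The `coverage` helper is a hypothetical name.

```python
TOTAL_ROWS = 475  # rows in the TDC public novel-like run

# A subset of the counts from the mechanism table, copied verbatim.
observed_counts = {
    "BCRP efflux": 172,
    "OATP1B1 uptake": 144,
    "BCRP inhibitor context": 135,
    "CYP1A2 induction": 51,
    "Mechanism-coherence floors": 45,
    "Small inert caps": 2,
}


def coverage(count: int, total: int = TOTAL_ROWS) -> float:
    """Fraction of rows in which a layer was recorded (assumed per-row counts)."""
    if not 0 <= count <= total:
        raise ValueError("count must be between 0 and total")
    return count / total


if __name__ == "__main__":
    for family, count in sorted(observed_counts.items(), key=lambda kv: -kv[1]):
        print(f"{family}: {count}/{TOTAL_ROWS} = {coverage(count):.1%}")
```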
Reusable mechanism descriptors were validated as separable mechanism signals before they were used inside the parent DILI engine. This is a signal-quality check, not a final binary classifier.
| Descriptor family | Status | Positive mean | Control mean | Separation | Minimum positive | Maximum control |
|---|---|---|---|---|---|---|
| Hepatobiliary residence pressure | Pass | 0.4623 | 0.1486 | 0.3137 | 0.4049 | 0.1636 |
| Selective retention specificity | Pass | 0.3488 | 0.1221 | 0.2267 | 0.3061 | 0.1669 |
| Polar antimetabolite stress | Pass | 0.7391 | 0.1239 | 0.6152 | 0.6268 | 0.2543 |
| Net reactive-metabolite pressure | Pass | 0.2811 | 0.0814 | 0.1997 | 0.2140 | 0.1236 |
| Mitochondrial and beta-oxidation pressure | Pass | 0.5165 | 0.0764 | 0.4401 | 0.4716 | 0.1877 |
Descriptor audit: The descriptor validation covered 38 probe cases at 1.07 cases per second. Seventeen additional parent process probes passed as high-specificity chemistry checks, with controls kept below the high-risk band.
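The Separation column in the descriptor table is the positive mean minus the control mean, which the sketch below recomputes from the table values. It also checks a stricter condition: that the weakest positive still scores above the strongest control. That condition is an assumed pass rule for illustration; the source does not state the actual criterion behind the Pass status. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptorAudit:
    name: str
    positive_mean: float
    control_mean: float
    min_positive: float
    max_control: float

    @property
    def separation(self) -> float:
        """Gap between group means, as reported in the Separation column."""
        return self.positive_mean - self.control_mean

    @property
    def margin(self) -> float:
        """Gap between the weakest positive and the strongest control."""
        return self.min_positive - self.max_control


# Values copied from the descriptor table.
AUDITS = [
    DescriptorAudit("Hepatobiliary residence pressure", 0.4623, 0.1486, 0.4049, 0.1636),
    DescriptorAudit("Selective retention specificity", 0.3488, 0.1221, 0.3061, 0.1669),
    DescriptorAudit("Polar antimetabolite stress", 0.7391, 0.1239, 0.6268, 0.2543),
    DescriptorAudit("Net reactive-metabolite pressure", 0.2811, 0.0814, 0.2140, 0.1236),
    DescriptorAudit("Mitochondrial and beta-oxidation pressure", 0.5165, 0.0764, 0.4716, 0.1877),
]


def passes(audit: DescriptorAudit) -> bool:
    """Assumed rule: no control probe scores as high as any positive probe."""
    return audit.margin > 0


if __name__ == "__main__":
    for a in AUDITS:
        print(f"{a.name}: separation={a.separation:.4f}, margin={a.margin:.4f}, pass={passes(a)}")
```

Under this assumed rule all five families pass, with Net reactive-metabolite pressure showing the smallest margin between its weakest positive (0.2140) and strongest control (0.1236).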
FluxMateria returns the evidence safety teams need to decide what to do next.
| Output layer | What the reviewer sees | Why it matters |
|---|---|---|
| Risk score and class | Numeric score, low/moderate/high class, and confidence. | Supports portfolio ranking and safety-governance thresholds. |
| Hepatic exposure | Organic anion transporting polypeptide (OATP) uptake context and exposure pressure. | Separates structural hazard from likely liver access. |
| Retention and efflux | Bile salt export pump (BSEP), breast cancer resistance protein (BCRP), and multidrug resistance-associated protein 2 (MRP2) evidence. | Highlights cholestatic and hepatobiliary-retention concerns. |
| Enzyme context | Cytochrome P450 (CYP) metabolism, inhibition, induction, and bioactivation context where available. | Connects DILI risk to drug-drug interaction and metabolic-liability review. |
| Injury chemistry | Reactive-metabolite, mitochondrial-stress, chronic-duration, and phenotype-specific evidence. | Gives toxicology teams a plausible follow-up assay direction. |
| Score trace | Structured calculation trace from baseline evidence to final class. | Makes the call reviewable, challengeable, and reproducible. |
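One way a reviewer might replay a score trace is sketched below. The step kinds mirror the delta, floor, and cap vocabulary used in the mechanism table, but the record layout, field names, step ordering, and clipping to the range 0 to 1 are all assumptions made for illustration. The engine's real trace format is not documented here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TraceStep:
    layer: str    # hypothetical label for the contributing evidence layer
    kind: str     # "delta", "floor", or "cap"
    value: float


def replay_trace(baseline: float, steps: Iterable[TraceStep]) -> float:
    """Apply each step in order and return the final score, clipped to [0, 1]."""
    score = baseline
    for step in steps:
        if step.kind == "delta":
            score += step.value            # additive adjustment
        elif step.kind == "floor":
            score = max(score, step.value)  # raise to a minimum risk
        elif step.kind == "cap":
            score = min(score, step.value)  # limit to a maximum risk
        else:
            raise ValueError(f"unknown step kind: {step.kind!r}")
    return min(1.0, max(0.0, score))


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from any benchmark run.
    example_steps = [
        TraceStep("Targeted chemistry delta", "delta", 0.15),
        TraceStep("Mechanism-coherence floor", "floor", 0.60),
        TraceStep("Confidence-combiner cap", "cap", 0.55),
    ]
    print(f"Final score: {replay_trace(0.30, example_steps):.2f}")
```

Keeping each step as an explicit record means a disputed call can be recomputed with one layer removed, which is the kind of challenge a structured trace is meant to support.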
Sanitized machine-readable results and methodology are provided for independent scientific review.
The detailed DILI benchmark should be read alongside the full ADMET benchmark and the product DILI page.
The engine combines Flux physics signals with endpoint-specific reference evidence. Reported metrics describe the performance of that combined prediction route.