New SOTA on Drug-Induced Liver Injury Prediction: AUROC 0.9597 With Full Mechanism Trace
FluxMateria now reports state-of-the-art results for mechanistic drug-induced liver injury (DILI) prediction: an area under the receiver operating characteristic curve (AUROC) of 0.9597 on the comparable Therapeutics Data Commons (TDC) binary benchmark, plus exposure, mechanism, dose, confidence, and score-trace outputs that binary classifiers do not provide.
The claim
FluxMateria is positioned as a state-of-the-art (SOTA) mechanistic DILI prediction engine: it clears the comparable public binary benchmark reference while returning the evidence a safety team needs to review hepatic exposure, transport, enzyme context, injury chemistry, dose sensitivity, and confidence.
Drug-induced liver injury is one of the hardest safety problems in discovery because the final clinical label is rarely caused by a single property. A molecule can look acceptable on a simple structural alert and still become risky when high liver exposure, transporter retention, metabolic activation, chronic dosing, and patient context align. A one-bit classifier can help triage, but it does not explain the mechanism that should drive the next experiment.
That is the reason we rebuilt DILI inside FluxMateria as a mechanism-resolved safety engine rather than a standalone yes/no endpoint. The engine still competes on the public binary benchmark, but the output is designed for nomination review: score, risk class, confidence, mechanism attribution, dose-window behavior, and a trace of why the final score moved.
AUROC on the comparable TDC binary DILI benchmark: 0.9597
Area under precision-recall curve (AUPRC) on the same comparable task: 0.9455
Hepatotoxicity cross-panel AUROC in novel-like mode: 0.9275
Parent DILI path throughput, measured locally: 12.95 molecules per second
Why the binary benchmark still matters
The comparable public reference point is the TDC binary DILI benchmark. FluxMateria reaches AUROC 0.9597 on that task, compared with the MiniMol public reference around AUROC 0.956. That establishes competitive benchmark position on the same kind of yes/no problem that most public leaderboards measure.
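For readers less familiar with the two metrics quoted here, the following is a minimal sketch of how AUROC and AUPRC can be computed for any binary classifier. It is not FluxMateria's evaluation code: the function names, the labels, and the scores are illustrative assumptions, and the toy data has nothing to do with the TDC benchmark. AUROC is computed as the probability that a randomly chosen positive outranks a randomly chosen negative, and AUPRC is estimated with average precision.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a common step-wise estimate of AUPRC.

    Tied scores are ranked in input order here, so the result can depend
    on ordering when ties exist; the toy data below has no ties.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at each recovered positive
    return precision_sum / total_pos


# Toy data only (hypothetical): 1 = liver-injury concern, 0 = no concern.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
print(f"AUROC = {toy_auroc:.4f}, AUPRC (average precision) = {toy_ap:.4f}")
```

Both metrics depend only on how the scores rank the compounds, which is why they can be compared across models that use different score scales, provided the dataset and split are the same.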
But that is not the whole claim. MiniMol and similar public models are primarily evaluated as binary classifiers. FluxMateria is evaluated there too, then goes further: it returns mechanistic evidence that a binary benchmark does not ask for. The comparison is therefore deliberately broader than a like-for-like binary comparison. The binary metric shows that FluxMateria is not trading away accuracy for interpretability; the mechanism surface is the added capability.
Where FluxMateria wins
We reviewed the public landscape before writing this claim. The competitive field falls into four practical categories: public benchmark models; broad absorption, distribution, metabolism, excretion, and toxicity (ADMET) software suites; quantitative systems toxicology (QST) and physiologically based pharmacokinetic (PBPK) liver-simulation tools; and expert-alert or web-screening tools.
| Category | Public examples | Best public interpretation |
| --- | --- | --- |
| Public benchmark machine learning (ML) models | MiniMol | Direct public benchmark comparison: FluxMateria AUROC 0.9597 on the comparable TDC binary DILI task versus MiniMol around AUROC 0.956, plus richer output. |
| Broad ADMET suites | ADMET Predictor, related commercial ADMET platforms | Not a public same-dataset accuracy claim. FluxMateria's advantage is one high-throughput DILI output that combines score, hepatic exposure, transporter context, enzyme context, injury chemistry, dose behavior, confidence, and trace. |
| QST / PBPK liver simulation | DILIsym and related liver-safety simulation workflows | Complementary. These tools are valuable for detailed study design and simulation once exposure, dose, species, and protocol information are available. FluxMateria is earlier, faster compound-level triage. |
| Expert-alert and web-screening tools | Derek Nexus, ADMETlab, ProTox, pkCSM, vNN-ADMET | FluxMateria's advantage is integrated mechanism depth and auditability: not just an alert or endpoint label, but the reasoning path behind liver-risk movement. |
Against public machine learning (ML) benchmark models such as MiniMol, the clean win is measurable: FluxMateria reaches AUROC 0.9597 on the comparable TDC binary DILI task, above the MiniMol public reference around AUROC 0.956, while returning a richer safety review document. That is the strongest direct benchmark statement because the metric and task are public.
Against broad ADMET software suites, the win is workflow shape. Mature platforms can cover many ADMET endpoints, and some include liver-safety modules. FluxMateria's DILI advantage is that the binary risk score, hepatic exposure, transporter context, enzyme context, injury chemistry, dose-window behavior, confidence, and audit trace are returned together in the same high-throughput screening output. For a discovery team, that means fewer stitched outputs and less manual reconciliation before a safety meeting.
Against QST and PBPK liver-simulation tools, the comparison is complementary rather than adversarial. Those tools remain valuable when a program already has dose, exposure, species, and protocol detail and needs deep simulation. FluxMateria is positioned earlier: fast compound-level triage, novel-molecule review, and mechanism prioritization before teams decide which expensive assays or simulations deserve attention.
Against structural-alert and web-screening tools, the win is decision depth. A simple alert can flag concern, but it usually cannot tell whether the concern is coming from uptake, retention, enzyme-linked activation, chronic exposure pressure, or weak historical similarity. FluxMateria is designed to preserve that chain of reasoning.
FluxMateria DILI coverage is organized as a reviewable safety circuit: exposure in, clearance out, enzyme context, injury chemistry, dose-window sensitivity, confidence, and score trace.
What the engine actually returns
The current DILI workflow combines the parent risk score with a mechanism circuit. That circuit includes hepatic exposure and liver-entry context; organic anion transporting polypeptide (OATP) uptake; efflux or retention context for the bile salt export pump (BSEP), breast cancer resistance protein (BCRP), and multidrug resistance-associated protein 2 (MRP2); cytochrome P450 (CYP) enzyme induction and inhibition signals; reactive-metabolite and mitochondrial-stress chemistry; chronic-duration pressure; and optional dose-window behavior.
For a reviewer, the important point is not that every layer is present in every molecule. The important point is that the final score is not opaque. If the score rises because liver entry and retention align with metabolic activation, that appears in the output. If a compound has a binary DILI concern but no strong mechanistic support, that uncertainty is surfaced rather than hidden.
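To make the idea of a non-opaque score concrete, here is a minimal sketch of what a traceable result record could look like. Everything in it is a hypothetical illustration: the class names, the field names, the layer labels, the additive way the trace is combined, and the 0.05 support threshold are assumptions made for this example, not FluxMateria's actual schema or scoring method.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TraceStep:
    """One hypothetical mechanism layer and the score shift attributed to it."""
    layer: str    # illustrative label, e.g. "hepatic_exposure"
    delta: float  # signed change applied to the running score
    note: str = ""


@dataclass
class DiliReport:
    """Hypothetical report: a base score plus an auditable list of adjustments."""
    base_score: float
    confidence: float
    trace: List[TraceStep] = field(default_factory=list)

    @property
    def final_score(self) -> float:
        # Assumption: adjustments combine additively and are clamped to [0, 1].
        raw = self.base_score + sum(step.delta for step in self.trace)
        return min(1.0, max(0.0, raw))

    def top_drivers(self, n: int = 3) -> List[TraceStep]:
        """Return the n layers that moved the score the most, in either direction."""
        return sorted(self.trace, key=lambda s: abs(s.delta), reverse=True)[:n]

    def has_mechanistic_support(self, threshold: float = 0.05) -> bool:
        """True if at least one layer raised the score by the threshold or more."""
        return any(step.delta >= threshold for step in self.trace)


# Illustrative values only; these are not measured results.
supported = DiliReport(
    base_score=0.40,
    confidence=0.80,
    trace=[
        TraceStep("hepatic_exposure", 0.15, "high predicted liver entry"),
        TraceStep("bsep_retention", 0.10, "efflux inhibition signal"),
        TraceStep("cyp_bioactivation", 0.20, "reactive-metabolite alert"),
        TraceStep("dose_window", -0.05, "low projected dose"),
    ],
)
unsupported = DiliReport(base_score=0.60, confidence=0.30)

for step in supported.top_drivers(2):
    print(f"{step.layer}: {step.delta:+.2f} ({step.note})")
print("unsupported concern flagged:", not unsupported.has_mechanistic_support())
```

The second record shows the case described above: a compound with a binary concern but an empty trace, which a reviewer can detect directly instead of inferring it from a bare score.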
Why this matters for enterprise safety review
Discovery teams do not only need to rank compounds. They need to decide which molecule to synthesize next, which assay to run next, and which safety issue should be escalated before a nomination meeting. A binary DILI label cannot answer those questions. It can say "concern" or "no concern," but it cannot tell whether the concern is driven by liver exposure, canalicular retention, enzyme-linked bioactivation, chronic dosing pressure, or weak historical similarity.
FluxMateria is built for that decision context. DILI sits beside the other ADMET endpoints in the same profile, with shared confidence handling and decision-packet export. The result is a safety review artifact, not just a scalar prediction.
How to read the SOTA statement
The precise claim is: FluxMateria is state-of-the-art for mechanistic DILI prediction because it reaches AUROC 0.9597 on the comparable public binary benchmark, above the MiniMol public reference around AUROC 0.956, while also returning mechanism, exposure, dose, confidence, and trace outputs. MiniMol speed is not verified from public leaderboard material, so we do not claim a speed comparison against MiniMol. FluxMateria's measured parent DILI path runs at about 12.95 molecules per second locally.
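As a rough guide to what the quoted throughput means for a screening campaign, the sketch below converts 12.95 molecules per second into wall-clock time for a library of a given size. It assumes a single local process running at a constant rate, with no batching, parallelism, or start-up overhead; the function names are illustrative and the library sizes are arbitrary examples.

```python
MOLECULES_PER_SECOND = 12.95  # local parent DILI path rate quoted in this article


def screening_time_seconds(n_molecules: int, rate: float = MOLECULES_PER_SECOND) -> float:
    """Estimated wall-clock seconds to score n_molecules at a constant rate."""
    if n_molecules < 0:
        raise ValueError("n_molecules must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return n_molecules / rate


def format_duration(seconds: float) -> str:
    """Format a duration as hours, minutes, and seconds, rounded to the nearest second."""
    minutes, secs = divmod(int(round(seconds)), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Arbitrary example library sizes.
for size in (10_000, 1_000_000):
    estimate = screening_time_seconds(size)
    print(f"{size:>9,} molecules: about {format_duration(estimate)}")
```

Under these assumptions a 10,000-compound library takes roughly 13 minutes and a million-compound library roughly 21.5 hours on one local process; real runs will differ with hardware and parallelism.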
The known-compound production mode can score even higher when exact clinical anchors are allowed, but that is not the novel-drug claim. For novel chemistry, the more relevant evidence is the novel-like DILIRank transfer result and the hepatotoxicity cross-panel result, where the engine still preserves strong discrimination while maintaining reviewable mechanism output.
Plain-language summary
FluxMateria now matches or slightly exceeds the strongest public binary DILI benchmark reference, but the bigger advance is that it explains the risk. Instead of only saying "toxic" or "not toxic," it shows which liver-exposure, transporter, enzyme, injury-chemistry, dose, and confidence signals led to the score.
Where to verify the numbers
The detailed DILI benchmark page includes the public comparison, cross-panel validation, speed measurement, mechanism-output coverage, and downloadable benchmark package. The ADMET benchmark page shows the DILI result in the context of the broader FluxMateria ADMET suite.
This article describes a computational screening and prioritization system. It is not a regulatory toxicology determination, clinical safety claim, or replacement for in-vitro or in-vivo safety studies. Public benchmark comparisons are limited to the stated datasets, metrics, and operating modes.
Review DILI risk with mechanism context
FluxMateria exposes DILI risk inside the full ADMET panel, with benchmark evidence, mechanism attribution, confidence, and exportable decision packets for reproducible safety review.