ADMET
- BBB: 93.3% accuracy (7,807 LOO, v8 Hybrid) ✓
- Solubility: 0.06 logS MAE (9,982+ LOO; #1 SOTA under stated MAE comparator) ✓
- CYP Panel: AUPRC 0.798, 80.9% acc (62,794 LOO, v5 Hybrid) ✓
- CYP3A4 inducer: 0.9350 balanced accuracy on primary external holdout ✓
- Caco-2 permeability: pure-physics MAE 0.277 on TDC caco2_wang test (n=182) vs published SOTA 0.276; MAE 0.502, 73.1% acc on broader 41,175 LOO cohort ✓
- Metabolism: Spearman 0.692, 82.8% acc (38,576 LOO; #1 SOTA under stated Spearman comparator) ✓
- PPB: 2.24% LOO MAE (14,288 LOO; #1 SOTA under stated MAE comparator) ✓
- hERG: AUROC 0.850 (8,879 LOO, v1 Hybrid) ✓
- DILI: SOTA mechanistic module with TDC-panel novel-like AUROC 0.9597, plus mechanism, exposure, and score-trace reporting ✓
178K compounds were validated with a leave-one-out (LOO) protocol across 8 endpoints. Four endpoints reach public-benchmark state of the art (SOTA) under the listed dataset, split, and metric: Solubility, Metabolism, PPB, and Caco-2 permeability. The Caco-2 result matches the trained-ML SOTA on TDC caco2_wang from pure physics with zero training labels. DILI is a SOTA mechanistic module that reaches AUROC 0.9597 on the comparable TDC binary task while also returning mechanism-level output.
DILI benchmark note: FluxMateria v4.23 reaches an area under the receiver operating characteristic curve (AUROC) of 0.9597 on the comparable Therapeutics Data Commons (TDC) binary DILI benchmark, versus roughly 0.956 for the MiniMol reference. The result carries more than a binary-only comparison because FluxMateria also returns risk class, score, cytochrome P450 (CYP) and transporter mechanisms, exposure context, dose-window behavior, and a score trace. FluxMateria runs this parent DILI path at about 12.95 molecules per second locally; MiniMol speed could not be verified from the public leaderboard.
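For readers unfamiliar with the leave-one-out (LOO) protocol referenced above, the sketch below holds out each compound in turn, predicts it from the remaining ones, and averages the absolute errors. The nearest-neighbour predictor, the 1-D descriptor, and the toy data are illustrative assumptions; this is not the FluxMateria model or its validation harness.

```python
def loo_mae(xs, ys, k=2):
    """Leave-one-out MAE using a k-nearest-neighbour stand-in model.

    xs: 1-D descriptor values (hypothetical), ys: measured targets.
    """
    n = len(ys)
    total_abs_error = 0.0
    for i in range(n):
        # Train on everything except compound i.
        train = [(abs(xs[j] - xs[i]), ys[j]) for j in range(n) if j != i]
        train.sort(key=lambda pair: pair[0])
        # Predict the held-out compound as the mean of its k nearest neighbours.
        prediction = sum(y for _, y in train[:k]) / k
        total_abs_error += abs(prediction - ys[i])
    return total_abs_error / n


# Toy data: linear target on an evenly spaced descriptor.
xs = [float(i) for i in range(10)]
ys = [2.0 * x for x in xs]
score = loo_mae(xs, ys, k=2)
print(f"LOO MAE: {score:.3f}")
```

Every compound is scored exactly once by a model that never saw it, which is what makes the LOO counts in the list above equal to the cohort sizes.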
🔬 Materials
Universal 16 (strict + OOF): <1%
Universal strict runtime: 2.741 ms
Band Gap Benchmark
1,048 materials
Overall MAE: 0.237 eV
Metals (exp = 0): 0.130 eV
Non-metals (exp > 0): 0.320 eV
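The metal / non-metal split can be illustrated with a short sketch. Absolute error in eV is the natural metric here because a percentage error is undefined when the experimental gap is 0. The `mae` helper and the four data points are hypothetical and are not benchmark data.

```python
def mae(pairs):
    """Mean absolute error over (experimental, predicted) pairs, in eV."""
    return sum(abs(pred - exp) for exp, pred in pairs) / len(pairs)


# Hypothetical (experimental, predicted) band gaps in eV.
data = [(0.0, 0.1), (0.0, 0.2), (1.1, 1.4), (3.4, 3.0)]

metals = [p for p in data if p[0] == 0]      # exp = 0
nonmetals = [p for p in data if p[0] > 0]    # exp > 0

overall = mae(data)
# The overall MAE is the count-weighted mean of the two subset MAEs.
weighted = (len(metals) * mae(metals) + len(nonmetals) * mae(nonmetals)) / len(data)

print(f"overall={overall:.3f} metals={mae(metals):.3f} non-metals={mae(nonmetals):.3f}")
```

Reporting both subsets alongside the overall figure shows whether the headline number is dominated by the easier exp = 0 cases.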
Core Holdout (5 properties)
Lower MAPE is better
FLUX S2 (family holdout): 1.17%
FLUX S3 (interaction holdout): 1.38%
AFLOW S2: 36.1%
JARVIS S2: 10.9%
Matbench S2: 18.4%
Universal benchmark (16 properties)
Strict + out-of-family
All 16 strict: <1%
All 16 out-of-family: <1%
Worst OOF scenario: 0.894%
Runtime mean: 2.7 ms
Gemstone color match: 19/19
What this means:
FLUX now has two primary validated tracks: near-1% strict holdout error on core thermo-mechanics (with external apples-to-apples baselines), and sub-1% strict plus out-of-family performance across a 16-property universal runtime path. It also includes a curated mini-benchmark showing defect-context color flexibility for real-time UI exploration.
Battery Electrochemistry
Holdout Voltage MAE: 0.149 V
End-to-End Workflow: 26.8 s
- Calibrated holdout benchmark tracks capacity, voltage, transport, cycle, electrolyte, interface, cost, and manufacturing together ✓
- Energy-dense cobalt-free screen lifts LiMnO2 to the top ✓
- High-voltage frontier screen lifts LiNiPO4 to the top ✓
- Fast-charge and cycle-life screens surface transport- and stability-led families instead of a single default winner ✓
- The same pipeline yields different leaders for bulk, interface, battery-native, and build questions ✓
This benchmark validates the battery-native decision layer as a screening and prototype-handoff engine, not as a replacement for electrochemical lab validation.
Catalyst Scoring
Full-Stack Throughput: 35 / s
- Measured through the full production scoring path, not a simplified shortcut ✓
- All 96 benchmark references were FLUX-enriched in the published public run ✓
- The corrected API path now clears FT activity, FT support, ammonia support, and WGS ordering together ✓
- Inverse search converges to real industrial catalyst families and chemically serious exclusion lanes ✓
- The public benchmark and catalyst case study now share the same API-only narrative ✓
This benchmark validates FluxMateria as a catalyst ranking and inverse-discovery engine. Physical synthesis, reactor testing, and long-run deactivation work still remain the next laboratory step.
Activation Barrier Prediction
- Predicts surface-reaction activation barriers from Flux energy and topology terms ✓
- 29 published literature reactions: N₂, H₂, O₂, CO dissociation and C-H activation across 13 transition metals ✓
- Matches single-method DFT accuracy (PBE ~0.20-0.30 eV) at analytical speed: microseconds per prediction ✓
- 100% within 0.5 eV for N₂, H₂, and O₂ dissociation families ✓
- Feeds the catalyst-scoring and microkinetics layers for end-to-end catalyst discovery ✓
Production-ready for catalyst screening, ranking, and inverse discovery. Quantitative turnover-frequency prediction is at the edge of usefulness at this MAE; the same is true of single-method DFT.
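To see why absolute turnover frequencies are sensitive at this accuracy level, the sketch below converts a barrier error into a multiplicative rate error through the Arrhenius exponential, exp(ΔEa / (kB·T)). The 0.25 eV error and the 500 K temperature are illustrative choices (0.25 eV sits inside the PBE range quoted above), and a constant pre-exponential factor is assumed.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def rate_error_factor(barrier_error_ev, temperature_k):
    """Multiplicative error in an Arrhenius rate caused by an error in Ea.

    Assumes the pre-exponential factor is unaffected by the barrier error.
    """
    return math.exp(barrier_error_ev / (K_B_EV * temperature_k))


# Illustrative case: 0.25 eV barrier error at 500 K.
factor = rate_error_factor(0.25, 500.0)
print(f"rate is off by a factor of about {factor:.0f}")
```

Under these assumptions the rate shifts by a factor of a few hundred, while relative rankings between catalysts with similar error are far less affected, which is consistent with the screening and ranking scope stated above.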
d-Band Center Descriptor
Beats ML vs linear / kNN / RF
- Central descriptor in transition-metal catalysis, predicted from Flux atomic and surface descriptors ✓
- 100-case benchmark: 27 pure TMs, 41 facets, 32 binary alloys, all with published literature targets ✓
- Facet-specific MAE of 0.154 eV across (111), (100), (110), (211), (0001) and stepped surfaces ✓
- Outperforms linear regression, k-NN, and random-forest baselines fitted on the same atomic descriptors ✓
- Cross-validated against five independent literature sources (HN14, K04, GN09, N95, CM20) ✓
The d-band descriptor feeds downstream into the catalyst scoring and inverse-discovery layers. Production-ready for transition-metal catalysis workflows; rare-earth and Pt-3d skin alloys remain known weak spots.
🧲 Curie Temperature
- 4.6% MAPE across 107 materials from composition with magnetic closure, branch overrides, and calibration notes ✓
- 17 families: ferrites, rare-earth intermetallics, double perovskites, manganites… ✓
- 89% within 5%, 96% within 10% of experimental Tc ✓
- Near-zero bias (−0.03%): no systematic over- or under-prediction ✓
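The three summary statistics quoted above (MAPE, share within a tolerance, and signed bias) can be computed from the same list of percentage errors. The sketch below is a minimal illustration; the `summarize` helper and the four (experimental, predicted) Tc pairs are hypothetical and are not taken from the 107-material set.

```python
def summarize(pairs, tolerances=(5.0, 10.0)):
    """Summarize signed percentage errors for (experimental, predicted) pairs."""
    pct = [100.0 * (pred - exp) / exp for exp, pred in pairs]
    n = len(pct)
    return {
        # Mean absolute percentage error: magnitude only.
        "mape": sum(abs(p) for p in pct) / n,
        # Mean signed percentage error: positive means over-prediction.
        "bias": sum(pct) / n,
        # Share of cases (in %) whose absolute error is within each tolerance.
        "within": {t: 100.0 * sum(abs(p) <= t for p in pct) / n for t in tolerances},
    }


# Hypothetical Curie temperatures in K: (experimental, predicted).
toy = [(1000, 1020), (500, 480), (800, 880), (400, 368)]
stats = summarize(toy)
print(stats)
```

A bias near zero alongside a non-zero MAPE, as in this toy set, means the errors scatter on both sides of experiment rather than leaning one way.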
🧪 DFT Cross-Check
Band gap MAPE: 7.6% (PBE 45.1%)
Magnetic moment MAPE: 3.6% (PBE 9.0%)
Bulk modulus median (all 15): 0.7%
Mean speedup vs DFT: ~20,000×
- Head-to-head with GPAW PBE on 15 canonical materials run locally on identical inputs ✓
- Three-layer comparison covering lattice, band gap, magnetic moment, and bulk modulus ✓
- Engine band gap median 1.2% vs PBE 50.7%; engine bulk modulus median 0.7% (MAPE 6.0%) across all 15 materials ✓
- Reproducible: manifest, DFT settings, and per-material results downloadable as JSON / CSV / MD ✓
The 15 materials are Si, Ge, GaAs, GaN, ZnO, MgO, TiO2, NaCl, Al, Cu, Fe, Ni, graphite, h-BN, and MoS2. Two tiers are shipped (single-point and EOS-derived B); a third (DFPT phonons and dielectric function) is on the roadmap.
⚡ Carrier Mobility
- Electron mobility μe at 300 K predicted from composition using production transport physics ✓
- 4 families: III-V, II-VI, IV-VI, elemental semiconductors ✓
- 22 of 23 materials within ±15% of experiment; SiC is the only edge case ✓
- Balanced signed errors: no systematic over- or under-prediction ✓
Atomic & Magnetic Properties
- Electronegativity 2.5% MAPE (75 elements), ionization energy 1.7% (27), electron affinity 1.0% (28) ✓
- Magnetic moment: 100% pass (84/84 materials); metallic intermetallics 3.1% MAPE ✓
- Saturation magnetization: 100% pass (10/10 materials at ±50% tolerance) ✓
- Atomic properties and magnetic subproperties carry separate basis notes ✓
Spectroscopy
- UV-Vis: 6.2% mean error, 50 molecules, 6 categories ✓
- IR: <1% error, 32 NIST molecules validated ✓
- NMR: 0.3-0.5 ppm MAE, 10 SDBS molecules, 5 nuclei ✓
Mechanism Discovery
- 336/336 experimental test cases (SN1/SN2/E1/E2/E1cb) ✓
- 10,000 random physical consistency tests ✓
- Head-to-head comparison with DFT (B3LYP) ✓
- Every prediction traceable and reproducible ✓
MechanismOS
Real-time Mechanism Steering
SILVER Arrhenius Ea: 94.86%
Evidence pack: audit-ready export
- SOTA real-time mechanism steering: control surfaces, pathway boundaries, constraint optimizer, and evidence-pack export ✓
- GOLD: 152/154 direct measured activation barriers passed under fixed benchmark criteria ✓
- SILVER: 1255/1323 Arrhenius-derived barrier checks passed ✓
- Official experimental source provenance documented per benchmark tier ✓
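As background for the SILVER tier, an Arrhenius-derived barrier is an activation energy backed out of measured rate constants rather than measured directly. The sketch below uses the two-temperature form Ea = R·ln(k2/k1) / (1/T1 − 1/T2), and also reproduces the headline pass rates from the counts listed above. The two-point method, the helper names, and the synthetic rate constants are assumptions for illustration; the source does not specify how the benchmark barriers were extracted.

```python
import math

R = 8.314462618  # gas constant in J/(mol K)


def arrhenius_ea(k1, t1, k2, t2):
    """Activation energy in kJ/mol from rate constants at two temperatures (K)."""
    return R * math.log(k2 / k1) / (1.0 / t1 - 1.0 / t2) / 1000.0


def pass_rate(passed, total):
    """Pass rate in percent."""
    return 100.0 * passed / total


def synthetic_rate(temperature_k, ea_kj_mol=80.0, prefactor=1.0e13):
    """Synthetic rate constant generated from a known barrier (illustrative)."""
    return prefactor * math.exp(-ea_kj_mol * 1000.0 / (R * temperature_k))


recovered = arrhenius_ea(synthetic_rate(300.0), 300.0, synthetic_rate(350.0), 350.0)
print(f"recovered Ea: {recovered:.2f} kJ/mol")
print(f"SILVER: {pass_rate(1255, 1323):.2f}%  GOLD: {pass_rate(152, 154):.2f}%")
```

The second print line shows that the 94.86% SILVER figure is simply 1255 passes out of 1323 checks.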
🧪 Synthesis Planning
- 29 reaction-type barriers at 3.1% MAE (100% pass rate) ✓
- 200 specific reactions at <1% MAE (72 exact matches) ✓
- 15 disconnection SMARTS patterns validated ✓
- All barriers fully auditable and reproducible ✓
🔥 Reaction Enthalpy
NEW
- 157 reactions from NIST WebBook at 3.5% MAPE, 10.0 kJ/mol MAE ✓
- 12 categories, including combustion, radical, formation, halogen, nitrogen, and ozone ✓
- Hess’s law with documented species resolution + universal bond engine ✓
- Phase notation such as C(s), C(g), and H2O(l) disambiguates reference states ✓
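The role of phase notation in a Hess's-law calculation can be shown with methane combustion. The sketch below sums formation enthalpies of products minus reactants, keyed by species and phase. The values are rounded textbook formation enthalpies at 298 K, and the `reaction_enthalpy` helper is a hypothetical illustration rather than the FluxMateria species resolver or bond engine.

```python
# Standard formation enthalpies in kJ/mol at 298 K (rounded textbook values).
DELTA_HF = {
    "CH4(g)": -74.8,
    "O2(g)": 0.0,
    "CO2(g)": -393.5,
    "H2O(l)": -285.8,
    "H2O(g)": -241.8,
}


def side_total(side):
    """Sum of coefficient * formation enthalpy for one side of a reaction."""
    return sum(coeff * DELTA_HF[species] for species, coeff in side.items())


def reaction_enthalpy(reactants, products):
    """Hess's law: products minus reactants, in kJ/mol."""
    return side_total(products) - side_total(reactants)


reactants = {"CH4(g)": 1, "O2(g)": 2}
to_liquid = reaction_enthalpy(reactants, {"CO2(g)": 1, "H2O(l)": 2})
to_vapor = reaction_enthalpy(reactants, {"CO2(g)": 1, "H2O(g)": 2})

print(f"CH4 + 2 O2 -> CO2 + 2 H2O(l): {to_liquid:.1f} kJ/mol")
print(f"CH4 + 2 O2 -> CO2 + 2 H2O(g): {to_vapor:.1f} kJ/mol")
```

The two results differ by 88 kJ/mol, which is twice the vaporization enthalpy of water, so an unlabeled "H2O" would leave the reference state, and therefore the answer, ambiguous.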
⚡ Electron Transfer
Tunneling Enhancement: 2–3×
Decay constant match: literature
- Marcus rate constants with FLUX tunneling corrections ✓
- Through-bond decay constant matches literature ranges ✓
- Normal, activationless, and inverted Marcus regimes ✓
- All coupling deterministic and traceable ✓
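For orientation, the three Marcus regimes named above follow from the standard non-adiabatic Marcus expression, k = (2π/ħ)·|H|²·(4πλkBT)^(−1/2)·exp(−(ΔG° + λ)² / (4λkBT)), with an electronic coupling that decays exponentially with distance. The sketch below implements only that textbook form. It does not include the FLUX tunneling corrections, and the coupling prefactor, decay constant β, distance, and reorganization energy are hypothetical inputs.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV s
K_B_EV = 8.617333262e-5      # Boltzmann constant in eV/K


def coupling(h0_ev, beta_per_angstrom, distance_angstrom):
    """Electronic coupling with exponential distance decay: H = H0 * exp(-beta * r / 2)."""
    return h0_ev * math.exp(-beta_per_angstrom * distance_angstrom / 2.0)


def marcus_rate(delta_g_ev, lambda_ev, coupling_ev, temperature_k=298.15):
    """Non-adiabatic Marcus rate constant in 1/s (energies in eV)."""
    kt = K_B_EV * temperature_k
    prefactor = (2.0 * math.pi / HBAR_EV_S) * coupling_ev ** 2
    prefactor /= math.sqrt(4.0 * math.pi * lambda_ev * kt)
    return prefactor * math.exp(-((delta_g_ev + lambda_ev) ** 2) / (4.0 * lambda_ev * kt))


def regime(delta_g_ev, lambda_ev, tol=1e-9):
    """Classify the driving force relative to the reorganization energy."""
    if abs(delta_g_ev + lambda_ev) <= tol:
        return "activationless"
    return "normal" if -delta_g_ev < lambda_ev else "inverted"


# Hypothetical inputs: H0 = 0.01 eV, beta = 1.0 per angstrom, r = 10 angstrom, lambda = 1.0 eV.
h = coupling(0.01, 1.0, 10.0)
for dg in (-0.5, -1.0, -1.5):
    print(f"dG = {dg:+.1f} eV  {regime(dg, 1.0):<15} k = {marcus_rate(dg, 1.0, h):.3e} 1/s")
```

The rate peaks where −ΔG° equals λ and falls off on both sides, and because the rate scales with the square of the coupling it decays with distance as exp(−βr).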
🧪 Solvation
FreeSolv Accuracy: SOTA-level
Native Non-Water Carriers: 4
- SOTA-level explicit hydration benchmark: 0.3295 kcal/mol MAE on 642 FreeSolv cases ✓
- Official packet includes summary JSON, case CSV/JSON, and methodology ✓
- Water externally benchmarked; methanol, ethanol, acetonitrile, and DMSO tracked ✓
🧬 BioTarget
Pearson r (CASF-2016): 0.772
- Binding affinity: Pearson r = 0.772 on CASF-2016 (270 complexes) ✓
- MoA prediction: 91% accuracy on ChEMBL validation ✓
- Target identification: AUC 0.980 ✓
- Selectivity profiling: planned ⏳
⚛ Chemistry
- No-fit experimental/reference benchmark: 1,483 validated scalar targets, 0.176% weighted raw MAPE ✓
- Bond lengths: 453 bonds (391 single + 62 multiple), 0.079% mean error ✓
- Bond energies: 908 bonds, 0.289% mean error, 870/906 within 1.0% ✓
- Flux-encoded bond-family formulas with published benchmark provenance notes ✓
- Coverage: 24 p-block + 30 d-block + 10 s-block elements ✓
↻ Torsion Barriers
MAE (99 rotors): 1.06 kJ/mol
More accurate than Sage 2.2 / GAFF2 / MMFF94: 9–13×
- 1.06 kJ/mol MAE across 99 experimentally-measured rotational barriers, zero training data ✓
- Same-set head-to-head: 9.2× better than OpenFF Sage 2.2, 10.4× better than GAFF2, 13.3× better than MMFF94 ✓
- 66% of cases within ±1 kJ/mol, 85% within ±2 kJ/mol of experiment ✓
- Covers alkanes, ethers, amines, peptide ω, esters, acrylates, halides, X-X rotors and aromatic carbonyls ✓