Numbers you can check. Methods you can challenge.

FluxMateria benchmarks include accuracy metrics, comparison methodology, and validation datasets.

One engine. Three scientific domains. State-of-the-art accuracy at 3.6 million times the speed of DFT.

FluxMateria holds #1 accuracy on 3 ADMET endpoints, achieves 100% reaction mechanism classification, and delivers sub-1% error across 16 materials properties — with zero fitted parameters, no training data, and no GPU. Every number below is published with full methodology. Reproduce any result yourself.

3
ADMET endpoints
at #1 worldwide
100%
Mechanism classification
336 experimental cases
<1%
Materials error
16 properties validated
178K
Compounds validated
leave-one-out protocol
3.6M×
Faster than DFT
single-threaded, no GPU
0
Fitted parameters
pure first-principles physics

Our benchmark philosophy

📋

Publish the methodology

How we test, what datasets we use, how we measure. No hidden assumptions.

⚖️

Show head-to-head comparisons

Against established tools where possible. Fair comparisons, same test sets.

🎯

Document validation scope

Where predictions are most and least reliable. We tell you the boundaries.

🔄

Enable reproduction

Enough detail that you could replicate. Trust but verify.

Key metrics at a glance

Summary of performance across core capabilities.

Bond Lengths

0.079% error
453 bonds, 64 elements

Single + multiple bonds across p, d, and s-block. Zero fitted parameters.

Bond Energies

0.289% error
908 bonds, 64 elements

Singles, doubles, and triples. 870/906 within 1.0%. Zero fitted parameters.

Throughput

10,000+ mol/hr
full property panel

Single-threaded, no GPU. Scales linearly with cores for batch jobs.

ADMET Panel

178K validated
leave-one-out, 8 endpoints

PPB, BBB, solubility, metabolism, permeability, hERG, DILI, CYP. 3 are #1 SOTA.

Materials

<1% MAPE
16 properties, universal engine

Band gap 0.7 eV MAE (1,048 materials). Core holdout 1.2%. Gemstone color 19/19.

Benchmarks by module

Detailed performance data for each capability.

💊 ADMET

Production
80.9%
CYP Panel Accuracy
93.3%
BBB Accuracy
~350
mol/sec
178K
Compounds Validated
  • BBB: 93.3% accuracy (7,807 LOO, v8 Hybrid)
  • Solubility: 0.06 logS MAE (9,982+ LOO, v14 Hybrid, #1 SOTA)
  • CYP Panel: AUPRC 0.798, 80.9% acc (62,794 LOO, v5 Hybrid)
  • Permeability: MAE 0.502, 73.1% acc (41,175 LOO, v1 Hybrid)
  • Metabolism: Spearman 0.692, 82.8% acc (38,576 LOO, #1 SOTA)
  • PPB: 2.24% LOO MAE (14,288 LOO, v49.2 Hybrid, #1 SOTA)
  • hERG: AUROC 0.850 (8,879 LOO, v1 Hybrid)
  • DILI: AUROC 0.878 (907 LOO, v1 Hybrid)

178K compounds validated via LOO across 8 hybrid endpoints. 3 are #1 SOTA.
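For readers unfamiliar with leave-one-out (LOO) validation, here is a minimal sketch of the protocol in general: each compound is held out once, predicted from the remaining compounds, and the errors are averaged. The page does not describe how FluxMateria implements LOO, so the function names, the nearest-neighbour stand-in model, and the toy data below are all illustrative assumptions, not FluxMateria code or results.

```python
from typing import Callable, Sequence


def leave_one_out_mae(
    xs: Sequence[float],
    ys: Sequence[float],
    fit_predict: Callable[[Sequence[float], Sequence[float], float], float],
) -> float:
    """Hold out each sample once, predict it from the rest, return the MAE."""
    errors = []
    for i in range(len(xs)):
        # Everything except sample i is available to the model.
        train_x = [x for j, x in enumerate(xs) if j != i]
        train_y = [y for j, y in enumerate(ys) if j != i]
        pred = fit_predict(train_x, train_y, xs[i])
        errors.append(abs(pred - ys[i]))
    return sum(errors) / len(errors)


def nearest_neighbour(train_x, train_y, query):
    """Hypothetical stand-in model: copy the label of the closest training point."""
    idx = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[idx]


# Made-up toy data (one descriptor, one label per sample).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 3.0]
loo_mae = leave_one_out_mae(xs, ys, nearest_neighbour)
print(f"LOO MAE: {loo_mae:.3f}")
```

Because the held-out sample never contributes to its own prediction, the LOO error is a stricter figure than an in-sample fit error.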

Full Results → Methodology

🔬 Materials

Production
0.703 eV
Band Gap MAE
1.1668%
Core S2 MAPE
<1%
Universal 16 (strict + OOF)
2.741 ms
Universal strict runtime

Band Gap Benchmark

1,048 materials
Overall MAE: 0.703 eV
Metals (exp = 0): 0.285 eV
Non-metals (exp > 0): 1.032 eV

Core Holdout (5 properties)

Lower MAPE is better
FLUX S2 (family holdout): 1.17%
FLUX S3 (interaction holdout): 1.38%
AFLOW S2: 36.1%
JARVIS S2: 10.9%
Matbench S2: 18.4%

Universal Layer 7 (16 properties)

Strict + out-of-family
All 16 strict: <1%
All 16 out-of-family: <1%
Worst OOF scenario: 0.894%
Runtime mean: 2.7 ms
Gemstone color match: 19/19
What this means: FLUX now has two primary validated tracks: near-1% strict holdout error on core thermo-mechanics (with external apples-to-apples baselines), and sub-1% strict plus out-of-family performance across a 16-property universal runtime path. It also includes a curated mini-benchmark showing defect-context color flexibility for real-time UI exploration.
Universal Benchmark → Band Gap Benchmark Module Page

Battery Electrochemistry

Production
1.0
Family Accuracy
0.149 V
Holdout Voltage MAE
5 / 5
Scenario Alignment
26.8 s
End-to-End Workflow
  • Calibrated holdout benchmark tracks capacity, voltage, transport, cycle, electrolyte, interface, cost, and manufacturing together
  • Energy-dense cobalt-free screen lifts LiMnO2 to the top
  • High-voltage frontier screen lifts LiNiPO4 to the top
  • Fast-charge and cycle-life screens surface transport- and stability-led families instead of a single default winner
  • The same pipeline yields different leaders for bulk, interface, battery-native, and build questions

This benchmark validates the battery-native decision layer as a screening and prototype-handoff engine, not as a replacement for electrochemical lab validation.

Full Results → Case Study Module Page

🧲 Curie Temperature

Production
4.6%
Overall MAPE
−0.03%
Mean Bias
107
Magnetic Materials
17
Material Families
  • 4.6% MAPE across 107 materials from composition only, zero fitted parameters
  • 17 families: ferrites, rare-earth intermetallics, double perovskites, manganites…
  • 89% within 5%, 96% within 10% of experimental Tc
  • Near-zero bias (−0.03%) — no systematic over- or under-prediction
Full Results → Module Page
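The four summary statistics quoted above (MAPE, mean bias, and the share of predictions within 5% and 10% of experiment) can be computed as in the sketch below. The function names and the toy numbers are assumptions chosen for illustration; they are not the FluxMateria evaluation script or its data.

```python
def percent_errors(predicted, experimental):
    """Signed percentage error of each prediction relative to experiment."""
    return [100.0 * (p - e) / e for p, e in zip(predicted, experimental)]


def summarize(predicted, experimental):
    """Return MAPE, mean signed bias, and fractions within 5% and 10%."""
    errs = percent_errors(predicted, experimental)
    n = len(errs)
    return {
        "mape": sum(abs(e) for e in errs) / n,       # magnitude of error
        "mean_bias": sum(errs) / n,                  # systematic over/under
        "within_5": sum(abs(e) <= 5.0 for e in errs) / n,
        "within_10": sum(abs(e) <= 10.0 for e in errs) / n,
    }


# Made-up values: errors of +2%, -2%, +10%, -10%.
experimental = [100.0, 200.0, 400.0, 500.0]
predicted = [102.0, 196.0, 440.0, 450.0]
stats = summarize(predicted, experimental)
print(stats)
```

In this toy case the signed errors cancel, so the mean bias is zero while the MAPE is not. That is why the two numbers are reported separately: bias shows whether errors lean one way, MAPE shows how large they are.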

โš›๏ธ Atomic & Magnetic Properties

Production
2.5%
EN MAPE
1.7%
IE MAPE
84
MM Materials
5
Properties
  • Electronegativity 2.5% MAPE (75 elements), ionization energy 1.7% (27), electron affinity 1.0% (28)
  • Magnetic moment: 100% pass (84/84 materials); metallic intermetallics 3.1% MAPE
  • Saturation magnetization: 100% pass (10/10 materials at ±50% tolerance)
  • All from composition only — zero fitted parameters, zero training data
Full Results → Module Page

📈 Spectroscopy

Production
6.2%
UV-Vis Error
<1%
IR Error
0.3-0.5
NMR MAE (ppm)
50
UV-Vis Molecules
  • UV-Vis: 6.2% mean error, 50 molecules, 6 categories
  • IR: <1% error, 32 NIST molecules validated
  • NMR: 0.3-0.5 ppm MAE, 10 SDBS molecules, 5 nuclei
Full Results → Module Page

โš—๏ธ Mechanism Discovery

100%
Mechanism Accuracy
336/336
Cases Correct
7.4
kJ/mol MAE
1,000,000x
Faster
  • 336/336 experimental test cases (SN1/SN2/E1/E2/E1cb) ✓
  • 10,000 random physical consistency tests ✓
  • Head-to-head comparison with DFT (B3LYP) ✓
  • Every prediction traceable and reproducible ✓
Full Methodology & Results → Module Page

MechanismOS

Production
100%
GOLD Direct Ea
94.86%
SILVER Arrhenius Ea
100%
Experimental-only rows
2 tiers
Official validation gate
  • GOLD: 154/154 direct measured activation barriers passed
  • SILVER: 1255/1323 Arrhenius-derived barrier checks passed
  • Official experimental source provenance documented per benchmark tier
  • Control surfaces, optimizer, and audit evidence pack validated end-to-end
Full Results → Module Page
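The tier percentages follow directly from the pass counts listed above. A short check, using a hypothetical helper name:

```python
def pass_rate(passed: int, total: int) -> float:
    """Percentage of checks passed, rounded to two decimals."""
    return round(100.0 * passed / total, 2)


gold = pass_rate(154, 154)      # GOLD: direct measured activation barriers
silver = pass_rate(1255, 1323)  # SILVER: Arrhenius-derived barrier checks
print(f"GOLD {gold}%  SILVER {silver}%")
```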

🧪 Synthesis Planning

Production
3.1%
Barrier MAE
29/29
Reaction Types
200
Specific Reactions
<50ms
Per Plan
  • 29 reaction-type barriers at 3.1% MAE (100% pass rate)
  • 200 specific reactions at <1% MAE (72 exact matches)
  • 15 disconnection SMARTS patterns validated
  • All barriers fully auditable and reproducible
Full Results → Module Page

🔥 Reaction Enthalpy

Production NEW
3.5%
MAPE
157
Reactions Tested
89%
Within 5%
<1ms
Per Reaction
  • 157 reactions from NIST WebBook at 3.5% MAPE, 10.0 kJ/mol MAE
  • 12 categories: combustion, radical, formation, halogen, nitrogen, ozone
  • Hess’s law with 3-tier species lookup + universal bond engine
  • Phase notation: C(s), C(g), H2O(l) — disambiguates reference states
Full Results →
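To show why phase notation matters in a Hess's law calculation, the sketch below sums formation enthalpies for methane combustion with water as liquid and as gas. The lookup table holds rounded textbook formation enthalpies at 298 K. The table and function names are illustrative assumptions, and this is not the 3-tier lookup or the bond engine described above.

```python
# Rounded textbook standard enthalpies of formation at 298 K, in kJ/mol.
FORMATION_ENTHALPY = {
    "CH4(g)": -74.8,
    "O2(g)": 0.0,
    "CO2(g)": -393.5,
    "H2O(l)": -285.8,
    "H2O(g)": -241.8,
}


def reaction_enthalpy(reactants, products, table=FORMATION_ENTHALPY):
    """Hess's law: sum over products minus sum over reactants.

    Each side maps a phase-labelled species to its stoichiometric coefficient.
    """
    def side_total(side):
        return sum(coeff * table[species] for species, coeff in side.items())

    return side_total(products) - side_total(reactants)


# CH4 + 2 O2 -> CO2 + 2 H2O, with water as liquid and as vapour.
reactants = {"CH4(g)": 1, "O2(g)": 2}
dh_liquid = reaction_enthalpy(reactants, {"CO2(g)": 1, "H2O(l)": 2})
dh_gas = reaction_enthalpy(reactants, {"CO2(g)": 1, "H2O(g)": 2})
print(f"H2O(l): {dh_liquid:.1f} kJ/mol, H2O(g): {dh_gas:.1f} kJ/mol")
```

The two results differ by 88 kJ/mol, which is twice the enthalpy of vaporization of water in this table. Without the phase label on H2O the reference state would be ambiguous.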

⚡ Electron Transfer

Production
26/26
Tests Pass
2–3×
Tunneling Enhancement
Literature
Decay constant match
~150ms
Per Pair
  • Marcus rate constants with FLUX tunneling corrections
  • Through-bond decay constant matches literature ranges
  • Normal, activationless, and inverted Marcus regimes
  • All coupling deterministic and traceable
Full Results → Module Page
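For reference, the three Marcus regimes named above can be reproduced with the standard non-adiabatic Marcus rate expression. The sketch below implements only that textbook formula. It does not include the FLUX tunneling corrections, and the function name and the example values for reorganization energy and coupling are assumptions for illustration.

```python
import math

HBAR_EV_S = 6.582119569e-16   # reduced Planck constant, eV*s
KB_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def marcus_rate(delta_g, reorg, coupling, temperature=298.15):
    """Classical non-adiabatic Marcus rate constant in 1/s.

    delta_g:  reaction free energy in eV (negative = exergonic)
    reorg:    reorganization energy lambda in eV
    coupling: electronic coupling |H_AB| in eV
    """
    kt = KB_EV_PER_K * temperature
    prefactor = (2.0 * math.pi / HBAR_EV_S) * coupling**2
    prefactor /= math.sqrt(4.0 * math.pi * reorg * kt)
    # Activation term vanishes when -delta_g equals the reorganization energy.
    return prefactor * math.exp(-((delta_g + reorg) ** 2) / (4.0 * reorg * kt))


# Illustrative values only: lambda = 1.0 eV, |H_AB| = 0.01 eV.
reorg, coupling = 1.0, 0.01
k_normal = marcus_rate(-0.5, reorg, coupling)          # -dG < lambda
k_activationless = marcus_rate(-1.0, reorg, coupling)  # -dG = lambda
k_inverted = marcus_rate(-1.5, reorg, coupling)        # -dG > lambda
print(k_normal, k_activationless, k_inverted)
```

The rate peaks in the activationless case and falls off on either side. In the inverted regime a more exergonic reaction is slower, which is the signature behavior of Marcus theory.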

🧪 Solvation

0.3295
MAE (kcal/mol)
642
FreeSolv Cases
4
Native Non-Water Carriers
  • Explicit hydration benchmark: 0.3295 kcal/mol MAE on 642 FreeSolv cases ✓
  • Official packet includes summary JSON, case CSV/JSON, and methodology ✓
  • Water externally benchmarked; methanol, ethanol, acetonitrile, and DMSO tracked ✓
Full Results →

🧬 BioTarget

Beta
0.772
Pearson r (CASF-2016)
91%
MoA Accuracy
1.28
MAE (pKi)
10,065
Targets
  • Binding affinity: Pearson r = 0.772 on CASF-2016 (270 complexes) ✓
  • MoA prediction: 91% accuracy on ChEMBL validation ✓
  • Target identification: AUC 0.980 ✓
  • Selectivity profiling: planned ⏳
Full Results →

⚛ Chemistry

Production
0.079%
Bond Length Error
0.289%
Bond Energy Error
1,361
Total Observables
64
Elements
  • Bond lengths: 453 bonds (391 single + 62 multiple), 0.079% mean error
  • Bond energies: 908 bonds, 0.289% mean error, 870/906 within 1.0%
  • First-principles derivation: zero fitted parameters
  • Coverage: 24 p-block + 30 d-block + 10 s-block elements
Full Results → Module Page

Validation scope

Where FluxMateria predictions are most and least reliable.

We document the boundaries of reliable prediction space:

  • ⚠ Novel chemotypes far from validated chemical space
  • ⚠ Specific endpoints with limited experimental data
  • ⚠ Edge cases identified through validation

Confidence indicators in predictions reflect these boundaries. Low confidence = verify experimentally.

Reproducibility

We want you to verify our claims.

Benchmark datasets and evaluation scripts are available to pilot participants.

Request Pilot Access