CASE STUDY — PUBLIC BAND-GAP BENCHMARK

0.237 eV band-gap MAE across 1,048 materials — without training on a single one.

One fixed predictor. Composition input only. About a millisecond per query. Zero training data, zero fitted parameters. Composition-only machine-learning models report roughly 0.29–0.35 eV MAE on the standard MatBench experimental-gap task (a different, 4,604-material cohort). On this 1,048-material cohort, the pipeline delivers 0.237 eV, in the same accuracy band as the state of the art, without ever seeing a training example.

0.237 eV: band-gap MAE, 1,048 materials
0.320 eV: semiconductors / insulators (n = 587)
~1 ms: per-material wall time
Zero: fitted parameters (first-principles only)

The question this benchmark answers

The question

Can a physics-derived predictor, with no training data and no fitted parameters, reach the accuracy of modern machine-learning band-gap models on a diverse public cohort — while running fast enough to actually screen the combinatorial space of inorganic materials?

The space of plausible inorganic compositions is on the order of 10^8–10^12. DFT cannot screen that: it takes hours to days per material. ML can run at interactive speed, but only after training on a labelled data set, typically itself generated by DFT, and only reliably inside the chemistries that set covers. A predictor that delivers DFT-competitive accuracy from composition alone, at millisecond speed, would let researchers scan the whole space before any heavyweight calculation runs.

We measured that predictor against an experimentally grounded 1,048-material cohort to find out.

The cohort

1,048 inorganic compounds with experimental band gaps, sourced from the Materials Project database. The cohort spans chalcogenides, oxides, halides, pnictides, intermetallics, hydrides, and carbides, with experimental gaps from 0 eV (metals) up to ~12 eV (wide-gap fluorides). Both metals and semiconductors are included so the predictor is tested on the full classification problem, not only on the wide-gap subset where physics models tend to look best.

Segment | n | MAE (eV) | Notes
All materials | 1,048 | 0.237 | Headline benchmark metric
Metals (exp = 0) | 461 | 0.130 | Exact-zero handling
Semiconductors / insulators (exp > 0) | 587 | 0.320 | The hard part: arbitrary experimental value
Chalcogenide | 352 | 0.332 | Sulfides, selenides, tellurides
Intermetallic / Other | 260 | 0.072 | No-anion intermetallics
Complex Oxide | 257 | 0.294 | Ternary and higher oxides
Pnictide | 86 | 0.195 | Nitrides, phosphides, arsenides
Halide | 74 | 0.222 | F / Cl / Br / I anions
Binary Oxide | 19 | 0.102 | Simple AxOy oxides

Full benchmark data — per-compound experimental, predicted, and absolute-error values for all 1,048 entries — are downloadable as JSON and CSV from the materials benchmark page.
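The segment MAEs in the table can be recomputed from the per-compound download. A minimal sketch, assuming the CSV exposes columns named `exp_gap`, `pred_gap` (both in eV) and `family`; those column names are hypothetical, so check the real header first. The toy rows are placeholders, not benchmark data.

```python
import csv
import io
from collections import defaultdict


def segment_maes(rows):
    """Return {segment: (n, MAE in eV)} for all materials, the metal / non-metal
    split and each chemical family. Assumes hypothetical columns 'exp_gap',
    'pred_gap' (eV) and 'family'."""
    abs_errors = defaultdict(list)
    for row in rows:
        exp_gap = float(row["exp_gap"])
        err = abs(float(row["pred_gap"]) - exp_gap)
        abs_errors["All materials"].append(err)
        # Metals are identified by an experimental gap of exactly 0 eV.
        split = "Metals (exp = 0)" if exp_gap == 0 else "Semiconductors / insulators (exp > 0)"
        abs_errors[split].append(err)
        abs_errors[row["family"]].append(err)
    return {seg: (len(errs), sum(errs) / len(errs)) for seg, errs in abs_errors.items()}


# Placeholder rows for illustration only; NOT benchmark data.
TOY_CSV = """formula,exp_gap,pred_gap,family
X1,0.0,0.10,Intermetallic / Other
X2,1.42,1.30,Pnictide
X3,3.37,3.00,Binary Oxide
"""

results = segment_maes(csv.DictReader(io.StringIO(TOY_CSV)))
for segment, (n, mae) in results.items():
    print(f"{segment}: n = {n}, MAE = {mae:.3f} eV")
```

Because each MAE is a sum of absolute errors divided by a count, the overall MAE is exactly the count-weighted mean of the metal and non-metal segment MAEs.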

Where 0.237 eV sits among other methods

Reading a band-gap MAE number in isolation is misleading. The right question is: at this accuracy level, what computational resources and training data did it cost? Every other method that reaches the ~0.2–0.3 eV accuracy band pays for it in one of two currencies — a large labelled training set, or hours-to-days of supercomputer time per material. This benchmark pays neither.

Landscape of band-gap predictors

The table consolidates published numbers across four method classes. Sources are linked inline; see also the consolidated reference list at the end of the case study.

Method (representative work) | Best reported MAE | Speed / query | Input | Training data | Fitted parameters

First-principles DFT: no training data, but the gap problem is structural
DFT-PBE / LDA screening (standard GGA) | ~0.5–1.0 eV (~40–50% systematic under-prediction) | Minutes–hours | Crystal structure + pseudopotentials + SCF settings | None | Functional choice
Hybrid DFT (HSE06) | ~0.2–0.4 eV | Hours–days | Crystal structure + pseudopotentials + SCF settings | None | Mixing parameter
GW (many-body) | ~0.1–0.2 eV | Days–weeks | Crystal structure + pseudopotentials + SCF settings | None | Convergence parameters

Graph neural networks: require both the crystal structure AND a large labelled training set
EOSnet (2025) | 0.163 eV (on MP DFT-PBE gaps) | ~1 s (after training) | Composition + crystal structure | ~10^4–10^5 materials | ~10^6 weights
ALIGNN (Choudhary & DeCost, 2021) | 0.218 eV (on MP DFT-PBE gaps) | ~1 s | Composition + crystal structure | ~10^5 materials | ~10^6 weights
MEGNet (Chen et al., 2019) | ~0.32 eV | ~1 s | Composition + crystal structure | ~70,000 materials | ~10^5–10^6 weights
CGCNN (Xie & Grossman, 2018) | ~0.39 eV | ~1 s | Composition + crystal structure | ~10^5 materials | ~10^5 weights

Composition-only ML: no crystal structure, but every entry needs a labelled training set
Darwin 1.5, LLaMA-based LLM (2024) | 0.287 eV (#1 on MatBench expt_gap) | ~seconds | Composition | ~49,000 labelled materials + 6M materials-science papers | ~7×10^9 LLM weights
MODNet (De Breuck et al., 2021) | 0.333 eV | ~1 s | Composition | ~4,200 materials (5-fold CV on Zhuo) | ~10^5 weights
CrabNet (Wang et al., 2021) | 0.346 eV | ~1 s | Composition | ~4,200 materials | ~10^5 attention weights
Simplest learned baseline, one fitted parameter per element (arXiv 2501.02932, Jan 2025) | 0.575 eV (most direct comparator: composition-only, no structure, no ML weights, no DFT) | ~ms | Composition | ~4,200 materials | ~80 (1 per element)

Pure-physics composition-only: no training data, no fitted parameters
Phillips–Van Vechten ionicity theory (1970s) | Family-dependent | Instant | Composition | None | None
Zaanen–Sawatzky–Allen (1985) | Family-dependent (transition-metal-compound oriented) | Instant | Composition | None | None
FluxMateria (this benchmark) | 0.237 eV (on 1,048 experimental materials; metals, semiconductors and insulators on one predictor) | ~1 ms | Composition | Zero | Zero

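Reported MAE values come from the methods' own publications or the MatBench leaderboard for the matbench_expt_gap task (Zhuo et al. 2018 cohort, 4,604 experimental gaps from composition). The EOSnet and ALIGNN figures are errors against MP DFT-PBE gaps rather than experiment, and no other listed method was evaluated on this 1,048-material cohort, so the table compares accuracy bands rather than giving head-to-head rankings. DFT timings are screening-mode estimates on a single CPU; ML timings are inference-only (training cost amortised separately).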

The honest punch line

Every other method in the table with a benchmarked MAE either needs the crystal structure (DFT, ALIGNN, EOSnet, MEGNet, CGCNN) or a labelled training set of roughly 4,000–50,000 materials (Darwin, CrabNet, MODNet). The closest published composition-only comparator without ML weights that we could find, a January 2025 arXiv paper that fits a single parameter per element, still trains on ~4,200 labelled materials and reports 0.575 eV MAE, more than twice this benchmark's error. The Phillips–Van Vechten and Zaanen–Sawatzky–Allen physics frameworks that inspired this space were never benchmarked on a 1,000+ material heterogeneous cohort; they were demonstrated on narrow AB-compound or transition-metal families.

What this means

0.237 eV on 1,048 materials, composition-only, with zero training and zero fitted parameters, appears unprecedented in the published literature. Modern ML reaches the same accuracy band, but pays for it with thousands to ~10^5 labelled training materials (and up to ~7×10^9 weights), and most often a crystal-structure prerequisite. Hybrid DFT and GW reach it too, but at hours to weeks per material, roughly six to nine orders of magnitude more wall time than ~1 ms. FluxMateria reaches it with neither.

How 0.237 eV maps to real screening use cases

A 0.237 eV mean absolute error is not the same thing across applications. Below is what this accuracy band buys you for the most common materials-design tasks.

Use case | Target gap range | Required accuracy | What 0.237 eV gives you
Metal vs. semiconductor classification | Binary | ~0.05 eV cutoff | 77% of 2,624 metals correctly classified on the broader audit set
Wide-gap power electronics | > 3 eV | ~0.5 eV | Within accuracy band
Transparent conductors | > 3.3 eV | ~0.5 eV | Within accuracy band
Photovoltaic absorbers | 1.0–1.8 eV | ~0.3 eV | At the edge: first-pass filter, case-by-case validation
Thermoelectric narrow-gap | 0.2–0.5 eV | ~0.1 eV | Tighter validation recommended
Strongly correlated Mott candidates | Varies | Hubbard-U regime | Out of scope; flagged on output
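The classification figure in the first row is a simple rate over experimental metals. A sketch, assuming a material counts as "classified metallic" when its predicted gap falls strictly below the ~0.05 eV cutoff (whether the audit uses < or ≤ is not stated); the function name and the placeholder inputs are illustrative only.

```python
def metal_recall(exp_gaps, pred_gaps, cutoff_ev=0.05):
    """Share of experimental metals (exp gap == 0 eV) whose predicted gap falls
    strictly below cutoff_ev, i.e. that the predictor also calls metallic."""
    metal_preds = [pred for exp, pred in zip(exp_gaps, pred_gaps) if exp == 0]
    if not metal_preds:
        raise ValueError("input contains no experimental metals")
    return sum(pred < cutoff_ev for pred in metal_preds) / len(metal_preds)


# Placeholder values: four experimental metals and one semiconductor.
rate = metal_recall([0.0, 0.0, 0.0, 0.0, 1.1], [0.0, 0.03, 0.20, 0.01, 1.0])
print(f"{rate:.0%} of metals classified as metals")
```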

The required-accuracy column is a practical engineering tolerance, not a mathematical bound. The honest framing: the predictor is a first-pass filter that clears ~99% of bad candidates in milliseconds, before any DFT, hybrid-DFT, or wet-lab synthesis cost is incurred. For the tightest applications (thermoelectrics, narrow-gap photovoltaics, Mott physics), treat the output as a candidate list, then validate each finalist with a more expensive method.

Why this benchmark matters for materials discovery

The point

A pure-physics predictor that reaches ML-equivalent accuracy at millisecond speed, on zero training data, changes what is possible in materials discovery.

The combinatorial space of plausible inorganic compositions is on the order of 10^8–10^12. You cannot DFT-screen that. You cannot ML-screen it either without first generating a training set that itself requires DFT.

~10^8
Compositions screenable in one CPU-day

At ~1 ms per query, one core screens about 8.6 × 10^7 compositions per day (86,400 s / 1 ms), which puts an entire combinatorial family within reach of a single desktop, not an HPC cluster.
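The throughput figure is plain arithmetic. A sketch, assuming ideal scaling across cores and ignoring I/O and formula-parsing overhead; `compositions_per_day` is an illustrative helper, not part of the pipeline.

```python
import math

SECONDS_PER_DAY = 86_400


def compositions_per_day(seconds_per_query: float, cores: int = 1) -> float:
    """Queries completed in 24 h, assuming ideal parallel scaling and no I/O overhead."""
    return cores * SECONDS_PER_DAY / seconds_per_query


single_core = compositions_per_day(1e-3)        # ~1 ms per query on one core
orders_vs_hour_dft = math.log10(3_600 / 1e-3)   # a 1-hour DFT run versus ~1 ms
print(f"{single_core:.2e} compositions per core-day")
print(f"~{orders_vs_hour_dft:.1f} orders of magnitude faster than a 1-hour DFT run")
```

At this rate, 10^9 compositions take roughly 12 core-days (10^9 / 8.64 × 10^7 ≈ 11.6), i.e. about one day on a 12-core desktop.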

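0
Training-distribution edges

The same physics applies to never-synthesised compositions as to canonical semiconductors, so there is no silent degradation outside a training distribution. The physical scope limits still apply: strongly correlated Mott candidates are flagged on output.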

100%
Reproducibility

Deterministic predictions, no random seeds, no model checkpoints to lose. Same input today and in five years → same output.

The working hypothesis

Screen the whole space. Filter on a target band-gap window. Hand a short candidate list to the more expensive validation stack — experimental synthesis or accurate first-principles calculation. The 0.237 eV MAE result is the experimental check that the filter actually works.
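The screen-then-filter workflow fits in a few lines. This is a hypothetical illustration: `Candidate` and `screen` are not part of any published API, and widening the target window by the 0.237 eV benchmark MAE is one possible design choice, not the pipeline's documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    formula: str
    predicted_gap_ev: float


def screen(candidates, window_ev, margin_ev=0.237):
    """Keep candidates whose predicted gap lies in the target window widened by
    margin_ev on each side (default: the benchmark MAE), so borderline cases go
    to expensive validation instead of being dropped."""
    low, high = window_ev
    return [c for c in candidates if low - margin_ev <= c.predicted_gap_ev <= high + margin_ev]


# Hypothetical candidates screened for a photovoltaic absorber window (1.0-1.8 eV).
pool = [Candidate("cand-A", 0.70), Candidate("cand-B", 0.85), Candidate("cand-C", 1.40),
        Candidate("cand-D", 2.00), Candidate("cand-E", 2.10)]
shortlist = screen(pool, window_ev=(1.0, 1.8))
print([c.formula for c in shortlist])
```

Because MAE is an average rather than a worst-case bound, a wider margin trades a longer validation list for fewer missed hits; with the default margin, the photovoltaic window keeps anything predicted between 0.763 and 2.037 eV.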

What this benchmark does NOT claim

Beyond the benchmark: continuous alloys and doping

The 0.237 eV MAE is what the headline benchmark measures, but for industrial users much of the predictor's value lies in continuous compositions. Every working semiconductor device ships as an alloy or a doped composition, not a stoichiometric endpoint. The same single predictor handles both directly, under the same zero-training, zero-fitted-parameter contract.

Alloy band-gap engineering

Type a fractional composition and the predictor decomposes it into integer-stoichiometry endpoints, evaluates each via the same composition-only path that gives 0.237 eV on the cohort above, and interpolates with Vegard's law. No system-specific bowing constants; no fit. Sample verified outputs:
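For a mixed-cation alloy such as In0.53Ga0.47As, the Vegard-style step reduces to a composition-weighted average of the endpoint gaps. A minimal sketch, assuming a single shared anion and endpoint gaps supplied by the caller; `parse_formula`, `vegard_gap` and the placeholder gaps are hypothetical and are neither FluxMateria outputs nor its actual decomposition algorithm.

```python
import re


def parse_formula(formula: str) -> dict[str, float]:
    """Parse a flat formula such as 'In0.53Ga0.47As' into {element: amount}.
    A missing subscript counts as 1; parentheses and hydrates are not handled."""
    amounts: dict[str, float] = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        amounts[element] = amounts.get(element, 0.0) + (float(count) if count else 1.0)
    return amounts


def vegard_gap(formula: str, anion: str, endpoint_gaps_ev: dict[str, float]) -> float:
    """Linear interpolation for A_x B_(1-x) C: each binary endpoint (AC, BC) is
    weighted by its cation's share of the cation sublattice. No bowing term."""
    amounts = parse_formula(formula)
    cations = {el: n for el, n in amounts.items() if el != anion}
    total = sum(cations.values())
    return sum(n / total * endpoint_gaps_ev[el] for el, n in cations.items())


# Round placeholder endpoint gaps (eV), NOT real InAs / GaAs values.
gap = vegard_gap("In0.53Ga0.47As", anion="As", endpoint_gaps_ev={"In": 1.0, "Ga": 2.0})
print(f"{gap:.2f} eV")  # 0.53 * 1.0 + 0.47 * 2.0 = 1.47
```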

Composition | Application | FluxMateria | Experiment
In0.53Ga0.47As | 1.55 µm telecom photonics, lattice-matched to InP | 0.73 eV | 0.74 eV
Hg0.4Cd0.6Te | SWIR infrared detector | 0.78 eV | 0.60 eV
In0.8Ga0.2As | High-electron-mobility transistor channel | 0.54 eV | 0.50 eV
Zr1.33Ta0.67N1.63O1.89 | Bi-axis quaternary oxynitride | 2.52 eV | 2.48 eV
Bi0.04Te0.06Pb0.98Se0.98 | Bi/Te-codoped PbSe thermoelectric | 0.19 eV | 0.28 eV

Doping → carrier concentration and Fermi level

A dilute dopant in a fractional formula is auto-classified as donor, acceptor, or isoelectronic by comparing the dopant's bonding signature to the host site it replaces. The module returns the activation energy, the carrier concentration n or p at the user's temperature, the Fermi-level offset from the band edge, and the Burstein–Moss band-filling shift, all in the same millisecond call:
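Two textbook relations sit underneath this kind of output: the dopant density follows from the dopant's site fraction, and in the non-degenerate (Boltzmann) limit with full ionisation, n ≈ N_D and E_c − E_F = k_B T ln(N_c / n). A sketch under those assumptions, not the module's implementation; the Si figures (about 5.0 × 10^22 atoms/cm^3 and N_c ≈ 2.8 × 10^19 cm^-3 at 300 K) are commonly quoted textbook values, and the 0.1 N_c degeneracy threshold is an assumed rule of thumb.

```python
import math
from typing import Optional

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def dopant_density_cm3(dopant_fraction: float, host_sites_cm3: float) -> float:
    """Dopant density from the dopant's share of host sites (e.g. 0.001 for Si0.999P0.001)."""
    return dopant_fraction * host_sites_cm3


def fermi_offset_ev(n_cm3: float, nc_cm3: float, temperature_k: float) -> Optional[float]:
    """E_c - E_F for an n-type semiconductor, assuming full ionisation (n = N_D)
    and Boltzmann statistics: n = N_c * exp(-(E_c - E_F) / kT).
    Returns None once n exceeds 0.1 * N_c (assumed threshold), where the
    non-degenerate approximation is unreliable."""
    if n_cm3 > 0.1 * nc_cm3:
        return None
    return K_B_EV * temperature_k * math.log(nc_cm3 / n_cm3)


SI_SITES_CM3 = 5.0e22  # approximate Si atomic density
SI_NC_300K = 2.8e19    # commonly quoted Si N_c at 300 K

light = fermi_offset_ev(dopant_density_cm3(1e-6, SI_SITES_CM3), SI_NC_300K, 300.0)
heavy = fermi_offset_ev(dopant_density_cm3(1e-3, SI_SITES_CM3), SI_NC_300K, 300.0)
print(f"x = 1e-6: E_c - E_F = {light:.3f} eV")
print(f"x = 1e-3 (Si0.999P0.001): {heavy}")
```

For Si0.999P0.001 the sketch gives N_D ≈ 5 × 10^19 cm^-3, above N_c, so it returns None: that n+ case is degenerate, which is the regime where a Burstein–Moss correction becomes relevant.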

Doped composition | Application | Type | Notes
In1.95Sn0.05O3 | ITO transparent conductor | n-type | Sn donor on In site
Zn0.98Al0.02O | AZO transparent conductor | n-type | Al donor on Zn site
Ga0.999Mg0.001N | p-GaN LED layer | p-type | Mg acceptor on Ga site
Ga0.999Si0.001N | n-GaN LED contact | n-type | Si donor on Ga site
Cd0.99Cu0.01Te | p-CdTe solar absorber | p-type | Cu acceptor on Cd site
Si0.999P0.001 | n+ Si | n-type | P donor on Si site
Zn0.99Mn0.01O | Dilute magnetic semiconductor | Isoelectronic | Mn matches Zn host valency; no carriers

Practical applications enabled

Photovoltaic absorber design (CIGS, perovskite tandems), visible-spectrum LED engineering (InGaN, AlGaInP), telecom photonics (InGaAsP at 1.3 / 1.55 µm), IR detector cutoff tuning (HgCdTe), power-electronics doped channels (n-GaN, p-GaN, doped β-Ga2O3), transparent conductors (ITO, AZO, ATO), thermoelectric alloy optimisation, and dilute magnetic semiconductors — all from a composition string + temperature, in roughly a millisecond per query.

Carrier concentration: 99.0% within a factor of 3 on 197 Hall measurements

Beyond type classification, the predictor is scored against a curated 197-entry Hall-measurement cohort spanning Si, Ge, GaAs, InP, InAs, InSb, GaSb, GaN, AlAs, AlSb, GaP, ZnO, ZnS, ZnSe, ZnTe, CdS, CdSe, CdTe, PbS, PbSe, PbTe, SiC, diamond, ITO, AZO, FTO, and their compensated combinations. Values are curated from Sze, Madelung, Adachi, Pearton, Look, and the Ioffe Institute NSM archive.

100%: type classification (n / p / intrinsic)
99.0%: within a factor of 3 of measured n (novel-material mode)
0.000 dex: median log10 error
197: Hall-measurement entries

Mobility & conductivity

The same single call returns electron and hole mobilities and the full conductivity σ = q(n μ_n + p μ_p) at any temperature. 100% of predictions land within a factor of 3 of literature mobilities across Si, Ge, GaAs, InP, InAs, GaN, CdTe, ZnO, and other canonical semiconductors. Pair this with the carrier-concentration output to size a transistor channel, rank transparent conductors, or triage thermoelectric leads.
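The conductivity step is a direct product of carrier density, charge and mobility. A minimal sketch with hypothetical carrier and mobility values; with concentrations in cm^-3 and mobilities in cm^2/(V·s), σ comes out in S/cm.

```python
Q_E = 1.602176634e-19  # elementary charge in coulombs (exact SI value)


def conductivity_s_per_cm(n_cm3: float, mu_n: float, p_cm3: float = 0.0, mu_p: float = 0.0) -> float:
    """sigma = q (n mu_n + p mu_p); concentrations in cm^-3, mobilities in cm^2/(V s)."""
    return Q_E * (n_cm3 * mu_n + p_cm3 * mu_p)


sigma = conductivity_s_per_cm(1e17, 1000.0)  # hypothetical n-type channel, holes neglected
print(f"sigma = {sigma:.2f} S/cm, resistivity = {1 / sigma:.4f} ohm cm")
```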

Band-edge alignment for heterojunctions

Pass any two compositions and the predictor returns the conduction-band offset ΔEc, valence-band offset ΔEv, and the Type I / II / III alignment classification — the key inputs for HEMT, LED quantum well, multi-junction solar, and IR-detector stack design. ~88% type classification accuracy on canonical heterostructure benchmarks (AlGaAs / GaAs, AlGaN / GaN, CdSe / ZnS, InAs / GaSb broken-gap, SiGe, CdTe / ZnTe, etc.) with eV magnitudes inside typical DFT band-offset uncertainty (~0.3 eV).
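The Type I / II / III label follows from where the two band edges sit on a common energy scale. A sketch using the standard straddling / staggered / broken-gap definitions, not the pipeline's own code; the edge energies in the usage lines are hypothetical, and the "B minus A" sign convention for the offsets is an assumption (conventions vary between papers).

```python
from typing import NamedTuple


class Edges(NamedTuple):
    vbm_ev: float  # valence-band maximum on a common (e.g. vacuum) energy scale
    cbm_ev: float  # conduction-band minimum on the same scale


def align(a: Edges, b: Edges):
    """Return (delta_ec, delta_ev, alignment type) for heterojunction A/B.
    Offsets are taken as B minus A."""
    delta_ec = b.cbm_ev - a.cbm_ev
    delta_ev = b.vbm_ev - a.vbm_ev
    if a.cbm_ev < b.vbm_ev or b.cbm_ev < a.vbm_ev:
        kind = "III"  # broken gap: one side's CBM lies below the other's VBM
    elif (a.vbm_ev <= b.vbm_ev and b.cbm_ev <= a.cbm_ev) or (
        b.vbm_ev <= a.vbm_ev and a.cbm_ev <= b.cbm_ev
    ):
        kind = "I"  # straddling: one gap nested inside the other
    else:
        kind = "II"  # staggered
    return delta_ec, delta_ev, kind


print(align(Edges(-5.0, -4.0), Edges(-5.5, -3.5)))  # hypothetical nested pair
print(align(Edges(-5.0, -4.8), Edges(-4.5, -4.0)))  # hypothetical broken-gap pair
```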

Ab initio on materials that don't exist yet

A separate "novel-material" benchmark mode disables every measured-property lookup the module would normally use and forces the predictor to derive the entire bundle (band gap, carrier concentration, mobility, conductivity, band-edge alignment) from composition alone. On the same 197-entry Hall cohort and the same heterostructure benchmark set, the novel-material mode matches or slightly exceeds the known-material accuracy on every metric:

Metric | Known-material accuracy | Novel-material accuracy
Hall carrier within a factor of 3 | 98.5% | 99.0%
Hall carrier type classification | 100% | 100%
Mobility within a factor of 3 | 100% | 100%
Band-offset Type I / II / III | 87.5% | 87.5%

This is the contract that separates this pipeline from interpolation-based surrogates: the prediction for a never-measured composition is generated by the same physics that scores against measured materials, and lands at the same accuracy. The known / novel split is exposed as a benchmark mode so users can verify it on any composition.

Reproduce and read the data

View benchmark page Download JSON Download CSV Materials module

See also the DFT cross-check case study for a head-to-head comparison on 15 canonical materials with a local PBE-DFT setup, and the semiconductor mobility atlas for a full screening-application walkthrough.

References

Comparison numbers in the landscape table above are sourced from the methods' own publications and the publicly maintained MatBench leaderboards. The full list:

  1. MatBench leaderboard (composition-only experimental band gap): matbench_v0.1 matbench_expt_gap. The canonical leaderboard for composition-only experimental band-gap prediction; 5-fold CV on 4,604 materials from the Zhuo et al. (2018) cohort.
  2. Zhuo, Y., Mansouri Tehrani, A., Brgoch, J. “Predicting the Band Gaps of Inorganic Solids by Machine Learning.” J. Phys. Chem. Lett. 9(7), 1668–1673 (2018). DOI: 10.1021/acs.jpclett.8b00124. The source dataset used by MatBench expt_gap.
  3. arXiv:2501.02932 (Jan 2025). “Predicting band gap from chemical composition: A simple learned model for a material property with atypical statistics.” The closest direct comparator: composition-only, no structure, one fitted parameter per element, MAE = 0.575 eV on the Zhuo 4,603-material cohort.
  4. DARWIN 1.5 (2024). “Large Language Models as Materials Science Adapted Learners.” arXiv:2412.11970. Current #1 on MatBench expt_gap (0.287 eV); LLaMA-2-based, fine-tuned on 6M materials-science papers and 21 experimental datasets covering 49,256 materials.
  5. Wang, A.Y.-T. et al. “CrabNet: Compositionally restricted attention-based network for materials property predictions.” npj Comput. Mater. 7, 77 (2021).
  6. De Breuck, P.-P., Hautier, G., Rignanese, G.-M. “Materials property prediction for limited datasets enabled by feature selection and joint learning with MODNet.” npj Comput. Mater. 7, 83 (2021).
  7. Choudhary, K., DeCost, B. “Atomistic Line Graph Neural Network for improved materials property predictions.” npj Comput. Mater. 7, 185 (2021). arXiv:2106.01829. ALIGNN (uses crystal structure).
  8. Chen, C., Ye, W., Zuo, Y., Zheng, C., Ong, S.P. “Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals.” Chem. Mater. 31(9), 3564–3572 (2019). MEGNet.
  9. Xie, T., Grossman, J.C. “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties.” Phys. Rev. Lett. 120, 145301 (2018). CGCNN.
  10. EOSnet (2025). Embedded overlap structures for materials GNN; 0.163 eV MAE on MP DFT-PBE band gaps. Available on PubMed Central.
  11. Phillips, J.C. “Ionicity of the Chemical Bond in Crystals.” Rev. Mod. Phys. 42(3), 317–356 (1970). Phillips–Van Vechten ionicity framework, the canonical first-principles composition-only band-gap model for binary AB compounds.
  12. Zaanen, J., Sawatzky, G.A., Allen, J.W. “Band gaps and electronic structure of transition-metal compounds.” Phys. Rev. Lett. 55(4), 418–421 (1985). The Mott–Hubbard / charge-transfer classification used in our pipeline mechanism tagging.
  13. Materials Project, materialsproject.org. Source of all DFT-PBE / DFT-HSE band-gap reference values used in the cross-check case study.

The 1,048-material cohort used for this benchmark is sourced from Materials Project and includes 461 metallic (Eg = 0) and 587 non-metallic systems. The full per-material predictions and errors are available via the JSON / CSV download links above.