CASE STUDY — DFT CROSS-CHECK

We ran first-principles DFT against the engine, locally, on 15 canonical materials.

Plane-wave PBE in GPAW — the same DFT package the open-source materials community uses every day — on Si, Ge, GaAs, GaN, ZnO, MgO, TiO₂, NaCl, Al, Cu, Fe, Ni, graphite, h-BN, and MoS₂. Same fixed inputs. Three-layer comparison: engine vs DFT, engine vs experiment, DFT vs experiment.

0.2%

Lattice MAPE (median 0.1%)

7.6%

Band gap MAPE (DFT-PBE 45.1%)

3.6%

Magnetic moment MAPE (PBE 9.0%)

~25,000×

Mean speedup vs DFT

The challenge

FluxMateria takes a chemical formula and returns 40+ material properties from first-principles physics, in milliseconds, with zero fitted parameters. Predictions match experiment to within a few percent across thousands of compounds. But the obvious skeptic question is: are those numbers real, or do they look right because we curated the dataset?

The cleanest answer is to put the engine head-to-head with first-principles density functional theory — the workhorse ab-initio method that has carried materials science for thirty years — on a fixed, externally-specified material set. Fixed materials, fixed DFT settings, no per-material adjustments.

The question

On a canonical 15-material validation set spanning the families commonly used in solid-state DFT benchmarks — Si, Ge, GaAs, GaN, ZnO, MgO, TiO₂, NaCl, Al, Cu, Fe, Ni, graphite, h-BN, MoS₂ — how does FluxMateria's composition-only prediction compare to plane-wave PBE-DFT, side by side, on lattice constant, band gap, and magnetic moment?

The setup

We installed GPAW 25.7 and ASE 3.28 in a clean WSL2 Ubuntu environment and built a benchmark harness that runs both engines on the same manifest. For each material the harness records lattice constant, cell volume, total energy, band gap, magnetic moment, and wall-clock time. Three numbers come out of every row: engine error vs DFT, engine error vs experiment, and DFT error vs experiment.
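The three per-row numbers can be sketched as follows. This is a minimal illustration, not the harness itself: the `Row` class, the field names, and the placeholder values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One manifest row for one property (hypothetical schema)."""
    material: str
    engine: float       # value predicted by the engine
    dft: float          # value computed by DFT
    experiment: float   # experimental reference value


def pct_error(value: float, reference: float) -> float:
    """Signed percent error of `value` relative to `reference`."""
    return 100.0 * (value - reference) / reference


def three_layer(row: Row) -> dict:
    """Return the three comparison layers for a single row."""
    return {
        "engine_vs_dft": pct_error(row.engine, row.dft),
        "engine_vs_experiment": pct_error(row.engine, row.experiment),
        "dft_vs_experiment": pct_error(row.dft, row.experiment),
    }


# Placeholder numbers chosen only to exercise the arithmetic,
# not taken from the benchmark.
example = Row("placeholder", engine=10.2, dft=9.5, experiment=10.0)
layers = three_layer(example)
```

Keeping the errors signed at the row level preserves the direction of each miss; absolute values are only needed when aggregating into MAPE or a median.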

Layer 1 — engine vs DFT

Method agreement

Are we converging on the same answer as DFT?

Both methods run on the same manifest. Where they agree, the engine is converging on the same answer this PBE setup gives, from a fundamentally different (and dramatically faster) computational path.

Layer 2 — engine vs experiment

Engine accuracy

How close to reality is FluxMateria?

This is the headline accuracy claim. Experimental lattice constants and band gaps come from Madelung 2004 and the CRC Handbook, 95th ed.

Layer 3 — DFT vs experiment

DFT's ceiling

What can plane-wave PBE actually deliver?

PBE has well-known systematic errors: band gaps underestimated by 30–50% on wide-gap insulators, and no van der Waals treatment for layered materials. Reporting it side by side keeps everyone honest.

DFT settings are deliberately ordinary: PBE exchange-correlation, 200 eV plane-wave cutoff, 6×6×6 k-point mesh (6×6×4 for hexagonal cells), Fermi–Dirac smearing. Magnetic metals (Fe, Ni) use spin-polarised PBE with a band buffer to converge cleanly. This is a standard, inexpensive PBE screening setup, the kind of first-pass DFT used for rapid materials triage, not a fully converged hybrid-functional or GW reference. The accuracy claims on this page are against this specific PBE setup, not against DFT in general. At Tier 1 the DFT lattice is fixed to the experimental input (no relaxation); at Tier 2 the lattice is relaxed via a 7-point equation-of-state scan.
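As a rough sketch, the per-material settings described above could be selected like this. The function name, the dictionary keys, and the assignment of materials to the hexagonal group (wurtzite GaN and ZnO, plus the layered graphite, h-BN, and MoS₂ cells) are assumptions for illustration, not the harness's actual schema. The smearing width and the size of the band buffer are not stated on this page, so they are left out.

```python
# Hypothetical settings helper; keys and names are illustrative only.
HEXAGONAL = {"GaN", "ZnO", "graphite", "h-BN", "MoS2"}  # assumed grouping
SPIN_POLARISED = {"Fe", "Ni"}                            # magnetic metals


def dft_settings(material: str) -> dict:
    """Return the fixed screening settings for one manifest entry."""
    return {
        "xc": "PBE",
        "cutoff_eV": 200,
        # 6x6x4 for hexagonal cells, 6x6x6 otherwise
        "kpts": (6, 6, 4) if material in HEXAGONAL else (6, 6, 6),
        "smearing": "fermi-dirac",
        "spin_polarised": material in SPIN_POLARISED,
    }


si = dft_settings("Si")
gan = dft_settings("GaN")
fe = dft_settings("Fe")
```

The only per-material branches are the k-point mesh and spin polarisation; everything else is constant, which is what "no per-material adjustments" amounts to in practice.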

The results

Property	FluxMateria vs experiment	DFT (PBE) vs experiment	Verdict
Lattice constant a	MAPE 0.2% · median 0.1%	Reference (lattice fixed at experiment)	Strong composition-only lattice prediction
Band gap	MAPE 7.6% · median 1.2%	MAPE 45.1% · median 50.7%	Engine beats PBE
Magnetic moment	MAPE 3.6% (Fe, Ni)	MAPE 9.0% (Fe, Ni)	Engine beats PBE on Fe/Ni
Speed per material	~3 ms (per call)	seconds–minutes (PBE) · hours (hybrid / GW)	~25,000× measured

What the numbers say

Lattice constant. 14 of 15 materials match experiment to within 1%; only TiO₂-rutile sits just outside that band at 1.1%. The structural channel is tight: median error 0.1%, MAPE 0.2%. Earlier passes had remaining error concentrated in wurtzite in-plane lattice (GaN, ZnO ~6%) and layered systems (graphite, h-BN ~8%, MoS₂-2H +32%); the latest structural-geometry refinements bring all of those under 1%.

Band gap. Engine median error 1.2%; DFT-PBE median 50.7%. The engine matches Si, Ge, GaAs, GaN, ZnO, MoS₂, and NaCl to within 0–7% of experiment. PBE's well-known wide-gap underestimate shows up at MgO (3.13 eV vs 7.83 eV experimental, −60%), h-BN (3.84 eV vs 5.96 eV, −36%), and ZnO (0.93 eV vs 3.37 eV, −72%); the engine does not inherit that systematic error. The remaining engine outliers are MgO (−26.6%, wide-gap ionic class under audit) and h-BN (−33.3%).
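The three PBE percentages quoted above follow directly from the gap values. A short check, using only the numbers in this paragraph (the variable and function names are illustrative):

```python
# (PBE gap, experimental gap) in eV, as quoted in the paragraph above.
pbe_gaps = {
    "MgO": (3.13, 7.83),
    "h-BN": (3.84, 5.96),
    "ZnO": (0.93, 3.37),
}


def signed_pct_error(value: float, reference: float) -> float:
    """Signed percent error of `value` relative to `reference`."""
    return 100.0 * (value - reference) / reference


errors = {
    name: signed_pct_error(pbe, exp) for name, (pbe, exp) in pbe_gaps.items()
}

for name, err in errors.items():
    print(f"{name}: {err:+.1f}%")
# MgO: -60.0%
# h-BN: -35.6%
# ZnO: -72.4%
```

These three materials are only the worst cases called out in the text, so their average is not the 45.1% MAPE reported for the full set of finite-gap materials.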

Magnetic moment. Fe: experiment 2.22 μ_B, engine 2.26 (+1.9%), DFT 2.20 (−1.0%). Ni: experiment 0.62 μ_B, engine 0.65 (+5.4%), DFT 0.72 (+16.8%). Both engine moments come from a single composition-only call. This is a direct DFT-vs-engine comparison on a known-hard property: spin-polarised PBE on ferromagnetic transition metals is not a cheap calculation. On Fe, DFT is slightly closer to experiment than the engine (−1.0% vs +1.9%); on Ni, the engine is much closer (+5.4% vs +16.8%), which is what drives the lower engine MAPE across the pair.

Speed. The DFT side of the Tier 2 EOS sweep took about 22 minutes on a modern laptop CPU. The engine completed the same 15-material panel in 1.4 seconds total wall time, with typical per-material calls around 3 ms. The benchmark page reports the per-material speedup at ~25,000× (or ~950× if you charge the engine's one-time 600 ms first-call import to the 15-material panel).
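The panel-level figure can be reproduced from the two wall times above. This sketch assumes the 1.4 s total already includes the one-time first-call import, which is how the ~950× figure reads. It does not try to reproduce the ~25,000× per-material mean, which depends on per-material timings not listed here.

```python
# Wall times quoted in the paragraph above.
DFT_PANEL_SECONDS = 22 * 60     # about 22 minutes for the Tier 2 EOS sweep
ENGINE_PANEL_SECONDS = 1.4      # engine total, assumed to include the import
N_MATERIALS = 15

panel_speedup = DFT_PANEL_SECONDS / ENGINE_PANEL_SECONDS
mean_dft_seconds_per_material = DFT_PANEL_SECONDS / N_MATERIALS

print(f"panel-level speedup: ~{panel_speedup:.0f}x")
print(f"mean DFT time per material: {mean_dft_seconds_per_material:.0f} s")
# panel-level speedup: ~943x
# mean DFT time per material: 88 s
```

The roughly 943× result rounds to the ~950× quoted for the whole panel.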

What this means

The benchmark establishes four things on a fixed, reproducible material set, against this specific PBE screening setup:

Engine predicts lattice constants from composition alone with 0.1% median error (0.2% MAPE across all 15 materials). 14 of 15 are within 1% (TiO₂-rutile sits just outside at 1.1%). At Tier 1 the DFT side is fixed to the experimental lattice, so DFT has 0% error by construction; the comparison here is engine vs experiment, not engine vs DFT, but it demonstrates that the engine reaches DFT-grade structural accuracy from a chemical formula alone.
Engine band gaps beat this PBE setup on the same set. 7.6% MAPE vs 45.1% MAPE on the 10 materials with finite gaps. This isn't a fair-fight statement about hybrid functionals or GW — it's a head-to-head against a standard PBE screening setup, on the canonical materials the field uses to validate.
Engine magnetic moments beat this PBE setup on Fe and Ni. 3.6% MAPE vs 9.0% MAPE. Small-N (2 magnetic materials in this benchmark), but the moments come from the same composition-only call.
Engine bulk modulus is stable where fast-PBE EOS is noisy. Median 0.7% off experiment, MAPE 6.0% across all 15 materials. The same fast-PBE EOS produces noisy B values for several materials at this DFT cost (Cu 1036 GPa vs 140 experimental, MgO 922 vs 160). Production-quality DFT would recover these at higher wall-time cost; we did not run that comparison.
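For context on the bulk-modulus item, the sketch below shows how B relates to the curvature of an equation-of-state scan. It evaluates a third-order Birch–Murnaghan curve on a 7-point volume grid and recovers B = V·d²E/dV² from the three central points by central difference. That is a stand-in for a proper least-squares fit, not the harness's method. The ±6% strain range and the synthetic parameters (V0, B0, B0') are assumptions chosen for illustration.

```python
EV_PER_A3_TO_GPA = 160.2176634  # 1 eV/Å^3 expressed in GPa


def birch_murnaghan(v, e0, v0, b0, b0_prime):
    """Third-order Birch-Murnaghan energy E(V); b0 in eV/Å^3."""
    eta = (v0 / v) ** (2.0 / 3.0)
    return e0 + 9.0 * v0 * b0 / 16.0 * (
        (eta - 1.0) ** 3 * b0_prime + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta)
    )


def seven_point_volumes(v0, max_strain=0.06):
    """Seven evenly spaced volumes centred on v0 (strain range assumed)."""
    return [v0 * (1.0 + max_strain * (i - 3) / 3.0) for i in range(7)]


def curvature_bulk_modulus_gpa(volumes, energies):
    """Estimate B = V * d2E/dV2 from the three central scan points.

    Assumes an evenly spaced scan whose middle point sits at the minimum.
    """
    mid = len(volumes) // 2
    dv = volumes[mid + 1] - volumes[mid]
    d2e = (energies[mid + 1] - 2.0 * energies[mid] + energies[mid - 1]) / dv**2
    return volumes[mid] * d2e * EV_PER_A3_TO_GPA


# Synthetic material: V0 = 20 Å^3, B0 = 100 GPa, B0' = 4 (illustrative only).
V0, B0_GPA, B0_PRIME = 20.0, 100.0, 4.0
volumes = seven_point_volumes(V0)
energies = [
    birch_murnaghan(v, -10.0, V0, B0_GPA / EV_PER_A3_TO_GPA, B0_PRIME)
    for v in volumes
]
b_est = curvature_bulk_modulus_gpa(volumes, energies)
```

On this noise-free synthetic curve the estimate lands within about 1% of the input B0. Because B depends on a second derivative of the energy, small energy errors at the scan points are amplified, which is one reason an under-converged scan can return unstable values.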

No per-material fitting or ML training is used in this benchmark. The engine consumes a chemical formula, runs first-principles physics, and returns the same predictions any caller would get from the public API. The DFT side has full crystal-structure input and ran on a laptop of the kind a graduate student would use. The 15-material manifest, the DFT settings, and the full numerical results are downloadable on the benchmark page.

What's next

The shipped scope is Tier 1 (single-point lattice / band gap / magnetic moment) plus Tier 2 (relaxed lattice + bulk modulus from a 7-point Birch–Murnaghan equation-of-state). Two extensions are in scope on the roadmap:

Tier 2B — anisotropic relaxation for wurtzite (GaN, ZnO), tetragonal (TiO₂), bcc-magnetic (Fe), and vdW-bonded layered (graphite, h-BN, MoS₂) cells where isotropic strain can't capture independent c/a relaxation
Tier 3 — DFPT-derived phonon spectra, sound velocity, dielectric function, and elastic constants C₁₁, C₁₂, C₄₄: the most expensive end of the DFT toolchain, where the engine's composition-only milliseconds-per-material story is the most striking against DFT's hours-per-material

Layered materials and wurtzite in-plane geometry, previously flagged for engine refinement, are now under 1% lattice error after the structural-geometry refinements landed. Remaining residuals concentrate in wide-gap ionic band gaps (MgO) and the bulk modulus of layered materials (h-BN out-of-plane), both under audit in the next iteration.

See the full benchmark

Per-material errors across both tiers, three-layer scorecard, full DFT settings, and downloadable JSON / CSV / Markdown artifacts.
