CASE STUDY — ENTERPRISE TCO / MATERIALS

5,008 DFT-grade property predictions. 13.5 seconds. The science that makes it possible.

313 materials × 16 properties, end-to-end. Accuracy audited against the public materials-universal benchmark: 1.17% MAPE on family holdout, 9 to 31 times more accurate than AFLOW, JARVIS, and MatBench on the same hard split. Total cost of ownership: dramatically lower than the realistic alternatives a discovery team faces today.

  • 5,008 property predictions
  • 13.5 s end-to-end wall-clock
  • 1.17% MAPE, family holdout
  • 9–31× more accurate than alternatives
  • 0 training labels required

The challenge

Materials discovery teams in industry and academia face a structural bottleneck that has nothing to do with ideas and everything to do with the cost of getting accurate property predictions. None of the four standard pathways (DFT on HPC, in-house ML pipelines, stitched commercial stacks, and database lookup) scales gracefully with the questions a modern discovery program needs to ask.

The practical effect is that programs constrain the scope of their screening campaigns to fit available compute capacity and integration overhead. Many lines of inquiry remain unattempted because the cost structure of the underlying tools rules them out.

The question

Can a first-principles physics engine — built from a fundamental scientific derivation rather than fitted to data — deliver DFT-grade accuracy across a broad property suite at workflow speed, with a unified output schema, and a total cost of ownership lower than the realistic alternatives an enterprise discovery program faces today?

Study design

The study reproduces a generic 16-property screen across the FluxMateria materials reference cohort — the same 313 materials and 16 properties used in the publicly audited materials-universal benchmark. Each material is evaluated on the full property suite. Wall-clock time is measured end-to-end, including I/O. The run is deterministic: identical inputs produce bit-identical outputs across machines.

1. Cohort

313 materials spanning 38 structural categories — FCC and BCC metals, HCP intermetallics, III-V and II-VI semiconductors, oxides, halides, perovskites, nitrides, transition-metal carbides, chalcogenides, and more. Same set used in the materials-universal benchmark.

2. Property suite

16 properties: crystal structure, lattice constant, atomic volume, band gap, optical band gap, dielectric constant, refractive index, reflectivity, hardness, Cv (mol and mass), Cp, thermal expansion, melting point, density. Audited strict and out-of-family aggregates: all under 1% MAPE.

3. Run

End-to-end execution measured on a single CPU core for deterministic reproducibility. Median per-prediction runtime 2.7 ms (universal engine). Total: 313 × 16 = 5,008 individual property evaluations, wall-clock from launch to JSON output.
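
The run total can be cross-checked from the figures quoted above. The sketch below is only an arithmetic sanity check: it treats the 2.7 ms median as if it were the mean per-prediction time, which is an assumption, since the two need not coincide and I/O overhead is not itemized separately.

```python
# Figures quoted in this case study.
N_MATERIALS = 313
N_PROPERTIES = 16
MEDIAN_RUNTIME_S = 2.7e-3  # 2.7 ms median per prediction

n_predictions = N_MATERIALS * N_PROPERTIES

# Assumption: the median is used as a stand-in for the mean runtime.
estimated_wall_clock_s = n_predictions * MEDIAN_RUNTIME_S

print(n_predictions)                     # 5008
print(round(estimated_wall_clock_s, 1))  # 13.5
```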

4. Compare

Total cost of ownership benchmarked against the four realistic enterprise alternatives: DFT on HPC, in-house ML pipelines, stitched commercial stacks, and database lookup. Accuracy benchmarked against AFLOW, JARVIS, and MatBench on identical hard hold-outs.

What the engine consumes

  • Material composition (formula) only
  • No experimental property labels
  • No DFT-derived training data
  • No basis-set or functional choice
  • No transition-state or k-point convergence sweep

What the engine returns

  • 16 numerical properties with units
  • Per-prediction confidence band
  • Family classification + scope flag
  • Deterministic output (bit-identical re-runs)
  • Frozen JSON manifest with commit hash for audit
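
To make the input and output lists above concrete, here is a minimal sketch of what one output record could look like. The actual FluxMateria schema is not published in this case study, so every class name, field name, and value below is a hypothetical illustration, not the engine's real format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PropertyPrediction:
    # Hypothetical fields: one numerical property with units and a confidence band.
    name: str
    value: float
    unit: str
    confidence_low: float
    confidence_high: float


@dataclass(frozen=True)
class MaterialRecord:
    # Hypothetical fields mirroring the bullets above.
    formula: str          # the only input the engine consumes
    family: str           # family classification
    in_scope: bool        # scope flag
    commit: str           # commit hash the run is sealed to
    predictions: tuple    # tuple of PropertyPrediction


def to_manifest_json(record: MaterialRecord) -> str:
    # Sorted keys and fixed separators give a stable serialization,
    # so identical records always produce identical bytes.
    return json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))


# Placeholder numbers for illustration only; these are not engine output.
example = MaterialRecord(
    formula="NaCl",
    family="rocksalt halide",
    in_scope=True,
    commit="f4fb848",
    predictions=(
        PropertyPrediction("lattice_constant", 5.64, "angstrom", 5.60, 5.68),
    ),
)

print(to_manifest_json(example))
```

Serializing with sorted keys is one simple way to keep re-runs byte-comparable; whether the engine does this is not stated in the source.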

Results overview

FluxMateria completed all 5,008 property predictions in 13.5 seconds of wall-clock time, with accuracy matching the publicly audited materials-universal benchmark: 1.17% MAPE on family holdout. On the same hard split, the named alternatives report 36.07% (AFLOW), 10.92% (JARVIS), and 18.42% (MatBench) — FluxMateria is between 9 and 31 times more accurate than each of them.

  • 13.5 s wall-clock end-to-end (313 materials × 16 properties)
  • 2.7 ms median per prediction (universal 16-property engine)
  • 1.17% MAPE on family holdout (9–31× more accurate than alternatives)

Wall-clock measured on a single CPU core for deterministic reproducibility. Production deployment scales horizontally. Accuracy figure from publicly audited materials-universal benchmark, sealed to commit f4fb848.

Total cost of ownership: the annualized view

Enterprise discovery programs do not buy a single screen — they buy a capability. The relevant comparison is the annualized cost of running materials property prediction as an ongoing function: licenses, personnel, compute, integration, and the engineering cycles spent keeping the pipeline alive. Below, the four realistic alternatives a materials program faces today, costed honestly.

Capability pathway | Annualized cost | Time to a 5,000-result screen | Structural limitation
DFT on HPC cluster | $300K–$900K | Weeks–months wall-clock | Cluster queueing; postdocs to operate
In-house ML pipeline | $500K–$1.5M | Seconds (post-training) | Out-of-distribution failure on novel chemistry; drift retraining
Stitched commercial stack (Schrödinger + databases + internal ML) | $350K–$700K | Days–weeks (mix of lookup + DFT) | Integration overhead; lookup-only for known compounds
Database lookup only | ~free | Seconds, but only if compound exists | Falls back to full DFT for novel compositions
FluxMateria enterprise | By tier; lower than the alternatives above | 13.5 seconds, real-time turnaround | Scope (current property suite + benchmarked families)

Annualized cost ranges represent typical industry benchmarks for ongoing capability: HPC pathway includes cluster allocation plus 1–2 computational personnel; ML pathway includes a small ML team plus data labeling plus drift retraining; stitched commercial stack includes typical pharma/materials site licenses ($200K–$500K range) plus integration engineering plus residual cluster compute for novel compositions. These are not headline list prices; they reflect what enterprise programs actually spend over a fiscal year. FluxMateria pricing is enterprise-tiered and disclosed under NDA.

Decision quality dominates the line items

A single materials program steered into a dead end by a wrong DFT or ML prediction can consume the full annual capability budget of any of the pathways above. A program steered to the correct candidate via a 1.17% MAPE first-principles screen recovers that budget through experimental and personnel cycles avoided downstream.

Accuracy: validation against the public benchmark

The accuracy of the engine is validated against the publicly audited materials-universal benchmark — same engine, same property suite, same cohort, frozen to commit f4fb848. The numbers below come directly from the frozen JSON manifest released on that page.

Test | FluxMateria | AFLOW adapter | JARVIS adapter | MatBench adapter
S2 family holdout (overall MAPE) | 1.17% | 36.07% | 10.92% | 18.42%
S3 interaction holdout (overall MAPE) | 1.38% | 35.36% | 10.94% | 18.42%
Universal 16-property strict (worst property) | 0.947% | NA | NA | NA
Crystal structure mismatch | 0.0% | NA | NA | NA

Source: materials_physics_external_family_holdout_2026-02-24.json, commit f4fb848. NA on adapter rows = the named adapter does not score those tests.

In short: 9 to 31 times more accurate than the named alternatives on the same hard split. Not a small accuracy concession in exchange for speed — an accuracy gain on top of the speed gain.
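
The 9 to 31 times range can be reproduced from the S2 family-holdout figures quoted above. The sketch below assumes "times more accurate" means the ratio of overall MAPE values, and uses the textbook definition of MAPE; how the benchmark aggregates errors across properties and folds is not specified here, so the `mape` helper is illustrative only.

```python
def mape(predicted, reference):
    """Mean absolute percentage error, in percent (textbook definition)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) / abs(r) for p, r in zip(predicted, reference))
    return 100.0 * total / len(reference)


# Overall MAPE on the S2 family holdout, as quoted in this case study.
S2_FAMILY_HOLDOUT_MAPE = {
    "FluxMateria": 1.17,
    "AFLOW": 36.07,
    "JARVIS": 10.92,
    "MatBench": 18.42,
}

baseline = S2_FAMILY_HOLDOUT_MAPE["FluxMateria"]

# Assumption: "N times more accurate" is read as the ratio of MAPE values.
ratios = {
    name: round(value / baseline, 1)
    for name, value in S2_FAMILY_HOLDOUT_MAPE.items()
    if name != "FluxMateria"
}

print(ratios)  # {'AFLOW': 30.8, 'JARVIS': 9.3, 'MatBench': 15.7}
```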

Operational implications

First-principles accuracy at workflow latencies enables a class of design and review operations that conventional pathways constrain by compute budget or output-schema mismatch.

Inverse search at scale

Spec-driven materials discovery over millions of trial compositions per session. Candidates are filtered to an experiment-ready shortlist before wet-lab or DFT cycles are committed.

Millisecond-scale per-property latency

Median per-property runtime of 2.7 ms supports interactive iteration within the design loop, in place of queued execution against an HPC cluster.

Coverage of novel chemistry

No training distribution to extrapolate beyond. Compositions outside any prior dataset are evaluated within scope by construction, not handled as silent extrapolations.

Audit-grade reproducibility

Deterministic, bit-identical output across machines. No basis-set, functional, or k-point dependency. Every run produces a frozen JSON manifest with commit hash, suitable as primary computational evidence for IP filings and regulatory pre-submission.
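
One way a reviewer might check the bit-identical claim is to compare cryptographic digests of the manifests produced by two independent runs. This is a generic sketch using Python's standard library, not a FluxMateria tool; the file names in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifests_match(path_a, path_b):
    """True when both files have the same SHA-256 digest (byte-identical content)."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

For example, `manifests_match("run_machine_a.json", "run_machine_b.json")` would return `True` only if the two runs wrote byte-for-byte identical manifests.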

DFT redirected to confirmation

FluxMateria functions as the first-pass filter; DFT and wet-lab are reserved for the top candidates that survive. Existing HPC and laboratory budgets shift from speculative screening to high-confidence confirmation.

Unified output schema

Sixteen properties, one engine, one output schema with per-prediction confidence bands. No cross-tool reconciliation required. Output integrates with chemist dashboards, decision packets, and inverse-search workflows.

Honest scope

DFT-grade accuracy at workflow speed is a strong claim. It deserves a clean fence around what is and is not in scope.

In scope

  • FCC, BCC, HCP metals and intermetallics
  • III-V, II-VI, IV-VI semiconductors
  • Diamond and zincblende network solids
  • Rocksalt, fluorite, perovskite oxides
  • Halides, nitrides, chalcogenides
  • Spinel and double-perovskite structures
  • Magnetic intermetallics and ferrites (separate Curie benchmark)
  • 16 properties: structural, electronic, optical, thermal, mechanical

Out of scope (today)

  • Strongly-correlated electron systems (heavy fermion, Mott insulator) beyond benchmarked families
  • Defect-mediated properties beyond the curated gemstone-color mini-benchmark
  • Surface and interface energetics (separate workstreams)
  • Replacing single-crystal X-ray diffraction for structure determination
  • Properties not in the validated 16-property suite without prior calibration audit

The accuracy and cost numbers in this case study apply to the cohort and property suite documented in the materials-universal benchmark. Extending to new families is a documented engineering process — not a free claim.

Conclusion

  • 13.5 seconds to predict 16 properties across 313 materials, end-to-end
  • 1.17% MAPE on hard family holdout, publicly audited
  • 9–31× more accurate than AFLOW, JARVIS, MatBench on the same split
  • TCO below alternatives across DFT HPC, in-house ML, and stitched commercial stacks

FluxMateria delivers DFT-grade accuracy across a 16-property materials suite at workflow latencies, with no fitting and no training data. On the publicly audited family-holdout split, overall MAPE is 1.17% — between 9 and 31 times closer to experiment than the AFLOW, JARVIS, and MatBench adapters scored on the same materials. Annualized capability cost sits below the four enterprise pathways a materials program faces today: high-throughput DFT on cluster, in-house ML pipelines, stitched commercial stacks, and database lookup with DFT fall-back.

Accuracy and throughput are co-derived outputs of a single first-principles physics model. The model does not require functional, basis-set, or k-point selection; both metrics are consequences of the same axiomatic derivation, validated against external public datasets and frozen to a downloadable JSON manifest.

Technical specifications

Reference cohort: 313 materials across 38 structural categories
Property suite: 16 properties spanning structural, electronic, optical, thermal, and mechanical outputs
Validation protocol: S2 family holdout (19 folds, 171 unseen formulas); S3 interaction holdout (15 folds, 175 unseen formulas)
Public adapters scored: AFLOW, JARVIS, MatBench, on identical strict splits
Reported overall MAPE: FluxMateria 1.17% · AFLOW 36.07% · JARVIS 10.92% · MatBench 18.42%
Per-prediction runtime: 2.7 ms median, universal 16-property engine; deterministic single-CPU-core wall-clock for reproducibility
Output: Per-property values with units and confidence band; frozen JSON manifest with commit hash
Reproducibility anchor: f4fb848fd7fa55be1b68d4e7592f1330553f1112, snapshot 2026-02-24

Reproducibility & audit

Accuracy figures sealed to commit f4fb848fd7fa55be1b68d4e7592f1330553f1112, snapshot 2026-02-24. The materials-universal benchmark page links the frozen JSON manifest, the per-fold strict scoring outputs, and the external adapter scoring on identical hold-outs. Wall-clock figures are reproducible from the per-prediction runtime numbers reported here. TCO ranges reflect typical industry benchmarks for ongoing capability and are independently verifiable from publicly cited license and personnel cost models.

Validate FluxMateria on your own materials

Submit a held-back set of compositions with measurements not yet published. FluxMateria runs blind; validation is performed by your team against your internal data. Co-authorship on the resulting work is welcomed.
