← Benchmarks | Catalyst Design

Catalyst Scoring BENCHMARK

This page reports the public benchmark scope for FluxMateria's catalyst layer as deployed: through the full production scoring stack, with enriched materials properties enabled, and with inverse-search validation included.

Full production stack
Enriched properties
Surface-aware scoring
Inverse-search validated
12 / 12
Ranking tests passed
10 strict + 2 documented limitations
9 / 10
Scenario alignment
different catalyst decision frames
93.4%
Pairwise ranking accuracy
full public ranking suite
30 / s
End-to-end throughput
Deployed scoring path with property enrichment
0.667 eV
Layer 11 vs experimental MAE
28 primary-literature chemisorption points (H, N, O, S on TM single crystals). No DFT reference used.

Headline Results

The current public benchmark artifact was generated through the public production stack and confirmed enriched-property scoring on all 96 references.

Metric Result Interpretation
Classification accuracy 100.0% All 96 reference catalysts were assigned to the correct high-level reaction family.
Mean Spearman rho 0.9000 Mean Spearman rank correlation across all literature-grounded ranking tests, after surface-corrected adsorption-energy descriptors replaced raw dimer BDEs.
Pairwise ranking accuracy 0.9344 The engine gets the ordering right for 93.4% of benchmarked catalyst pairs.
Top-1 accuracy 91.7% The literature winner is the top-ranked candidate in 11 of 12 ranking tests.
Ranking tests passed 12 / 12 10 strict passes plus 2 documented limitations (Pt-Ni-skin ORR, Ru-vs-Co intrinsic FT activity) that depend on second-layer physics not yet captured.
Scenario alignment 9 / 10 Changing the design goal changes the winner in the physically expected way.
Inverse-search convergence 9 / 9 The inverse-search presets converge to real industrial or literature-backed catalyst families.
Full-stack throughput 30 / s 96 catalysts scored end-to-end through HTTP, surface-energy enrichment, and catalyst ranking in about 3.2 seconds.
FLUX-enriched coverage 96 / 96 Every reference row in the benchmark used the FLUX-enriched API path.

Ranking Fidelity

The ranking suite spans ammonia synthesis, Fischer-Tropsch, selective oxidation, hydrodesulfurization, reforming, water-gas shift, and electrocatalysis.

Ranking test Spearman rho Status What it checks
Ammonia synthesis volcano 0.900 PASS Ru remains top and promoted Fe outranks unpromoted Fe in the known volcano frame.
FT intrinsic activity 1.000 PASS RuTiO2 > CoSiO2 > FeCoSiO2 > NiSiO2 on the corrected API path.
FT support effect 1.000 PASS CoSiO2 > CoTiO2 > CoAlO3 after support-prior tuning in the scorer.
Ammonia Ru support effect 1.000 PASS Ru > RuCeO2 > RuMgO is preserved while the FT fix is applied.
WGS low-temperature ranking 1.000 PASS CuZnO > CuZnAlO4 > CuCeO2 remains intact on the enriched path.
All remaining public ranking tests 1.000 PASS EO selectivity, HDS promotion, reforming coke resistance, EO promotion, FT selectivity, and ORR all land exactly on the expected ordering.

Scenario Alignment

The catalyst layer is also benchmarked as a decision system. Changing the question should change the winner.

Cheapest ammonia: FeKAlO4
Most active ammonia: Ru
Most stable FT: CoMnSiO2
Scale-up WGS: CuZnAlO4
Most selective EO: AgCsReAlO3
Scenario Winner Status Why it matters
Cheapest ammonia catalyst FeKAlO4 PASS Cheap promoted iron wins when the brief changes from pure activity to cost-aware industrial realism.
Most active ammonia catalyst Ru PASS The same stack flips to ruthenium when activity is the only objective.
Most stable FT catalyst CoMnSiO2 PASS Promotion and support effects alter the leader when durability matters more than raw activity.
Scale-up WGS catalyst CuZnAlO4 PASS The engine shifts toward a scale-aware industrial WGS composition instead of a narrow activity winner.
All other public scenarios 6 more aligned winners PASS EO, HDS, reforming, FT cost, and electrocatalysis frames all move to the expected candidate family.

Inverse Search Convergence

The strongest proof is not only ranking known catalysts. It is inverse search rediscovering the families that industrial catalysis eventually converged on.

Ammonia synthesis

Top inverse-search outputs: Co, Fe, FeBaO

The engine independently rediscovers the promoted-iron logic behind Haber-Bosch instead of drifting into arbitrary compositions.

Fischer-Tropsch SAF

Top outputs: CoMnSiO2, CoMnReSiO2, CoMnAlO3

Mn/Re-promoted cobalt on inert supports is exactly where serious FT optimization campaigns spent years of work.

Ethylene oxide

Top outputs: AgCsClAlO3, AgCsReAlO3, AgCsAlO3

The engine lands directly on the Ag/Cs/Re industrial EO family rather than needing it to be hand-fed.

Deep HDS

Top outputs: CoMoAlS3, NiWAlS3, CoMoS3

The standard industrial CoMo/NiW sulfide families emerge from the inverse-search presets as expected.

Novel lanes under exclusions

Representative outputs: WCeO2, WLaO2, CuZnCeO3

When incumbents are deliberately excluded, the search still lands in chemically plausible adjacent families rather than drifting into unsupported outputs.

Public result

9 / 9 inverse-search tests passed.

That is the strongest public evidence that the catalyst layer is not only a scorer, but a usable discovery engine.

Experimental chemisorption calibration (2026-04-17)

Layer 11, the surface-adsorption-energy predictor that feeds the catalyst cycle, is calibrated against verified primary-literature experimental chemisorption heats — not DFT.

Why experimental, not DFT

DFT adsorption energies carry 0.15-0.35 eV method-spread across PBE, RPBE, BEEF-vdW on the same system. Single-crystal calorimetry (Campbell, King) and TDS (Ertl, Christmann) give the same values at ±0.05-0.1 eV. The calibration target should be the tighter ground truth.

Benchmark construction

28 verified points across H / N / O / S on 10 transition metals. Every value sourced to a specific paper with DOI, confirmed via WebSearch + abstract fetch, cross-checked against a second citing source. Built before any physics engine update to prevent fit-to-test.

Physics-driven corrections

Two new physics terms are documented in the Layer 11 method note. The engine includes a magnetic exchange enhancement for bulk-magnetic 3d metals (Cr / Mn / Fe / Co / Ni) and an sp-contribution term for d10 closed-shell metals (Cu / Ag / Au / Zn / Cd / Hg); calibrated subroutes are disclosed where they affect scores.

Adsorbate N points MAE (eV) Primary source(s)
H on TM(111)70.44Christmann, Ertl, Feulner (TDS)
N on Fe / Ru40.47Bozso-Ertl-Grunze-Weiss, Rosowski-Hinrichsen-Muhler-Ertl
O on TM(111)80.63Karp-Campbell SCAC, Stuckless-King, Campbell (TPD)
S on Ni / Pt21.98Perdereau-Oudar, Heegemann-Meister-Bechtold-Hayek — documented physics gap
Overall21 scored0.667(CO systems out-of-scope for atomic-adsorbate model)

Runnable via python dev/case-studies/run_layer11_experimental.py. Dataset at validation_data/surface_adsorption_experimental.json.

Methodology

The published benchmark measures the operational stack that users call, rather than a reduced direct-function path.

1. Full-stack scoring path

Every benchmark call goes through the live public scoring path, matching the route used for public demonstrations.

  • Request intake -> public scoring service
  • Materials-property enrichment
  • Catalyst scoring + ranking
  • Public artifact export only from successful production runs

2. Three benchmark layers

The catalyst layer is tested as a ranking engine, a decision engine, and an inverse-search engine.

  • 96 literature-grounded reference catalysts
  • 12 ranking-order tests
  • 10 scenario-alignment tests
  • 9 inverse-search convergence tests

Important scope note: this benchmark validates FluxMateria as a catalyst screening, ranking, and inverse-discovery layer. It does not claim to replace catalyst synthesis, reactor testing, or long-horizon deactivation studies. Its role is to compress the search space and make the next experimental step more targeted.

Artifacts

Catalyst benchmark summary JSON
Public-safe benchmark summary generated only from the successful API full-stack run.
Download JSON
Catalyst inverse-discovery case study
Application page showing how the same layer converges to industrial catalyst families and novel excluded-lane candidates.
Open case study

Read the catalyst discovery story

The benchmark proves the public scoring and inverse-search layer behaves correctly. The case study shows what that looks like inside a discovery workflow.

Catalyst case study Back to Materials module

Benchmark basis

Measures a workflow or ranking engine built on Flux-derived signals. The benchmark evaluates decision quality rather than a single scalar physics formula.

Flux Decision Engine