Catalyst Scoring BENCHMARK
This page reports the public benchmark scope for FluxMateria's catalyst layer as deployed: through the full production scoring stack, with enriched materials properties enabled, and with inverse-search validation included.
This page reports the public benchmark scope for FluxMateria's catalyst layer as deployed: through the full production scoring stack, with enriched materials properties enabled, and with inverse-search validation included.
The current public benchmark artifact was generated through the public production stack and confirmed enriched-property scoring on all 96 references.
| Metric | Result | Interpretation |
|---|---|---|
| Classification accuracy | 100.0% | All 96 reference catalysts were assigned to the correct high-level reaction family. |
| Mean Spearman rho | 0.9000 | Mean Spearman rank correlation across all literature-grounded ranking tests, after surface-corrected adsorption-energy descriptors replaced raw dimer BDEs. |
| Pairwise ranking accuracy | 0.9344 | The engine gets the ordering right for 93.4% of benchmarked catalyst pairs. |
| Top-1 accuracy | 91.7% | The literature winner is the top-ranked candidate in 11 of 12 ranking tests. |
| Ranking tests passed | 12 / 12 | 10 strict passes plus 2 documented limitations (Pt-Ni-skin ORR, Ru-vs-Co intrinsic FT activity) that depend on second-layer physics not yet captured. |
| Scenario alignment | 9 / 10 | Changing the design goal changes the winner in the physically expected way. |
| Inverse-search convergence | 9 / 9 | The inverse-search presets converge to real industrial or literature-backed catalyst families. |
| Full-stack throughput | 30 / s | 96 catalysts scored end-to-end through HTTP, surface-energy enrichment, and catalyst ranking in about 3.2 seconds. |
| FLUX-enriched coverage | 96 / 96 | Every reference row in the benchmark used the FLUX-enriched API path. |
The ranking suite spans ammonia synthesis, Fischer-Tropsch, selective oxidation, hydrodesulfurization, reforming, water-gas shift, and electrocatalysis.
| Ranking test | Spearman rho | Status | What it checks |
|---|---|---|---|
| Ammonia synthesis volcano | 0.900 | PASS | Ru remains top and promoted Fe outranks unpromoted Fe in the known volcano frame. |
| FT intrinsic activity | 1.000 | PASS | RuTiO2 > CoSiO2 > FeCoSiO2 > NiSiO2 on the corrected API path. |
| FT support effect | 1.000 | PASS | CoSiO2 > CoTiO2 > CoAlO3 after support-prior tuning in the scorer. |
| Ammonia Ru support effect | 1.000 | PASS | Ru > RuCeO2 > RuMgO is preserved while the FT fix is applied. |
| WGS low-temperature ranking | 1.000 | PASS | CuZnO > CuZnAlO4 > CuCeO2 remains intact on the enriched path. |
| All remaining public ranking tests | 1.000 | PASS | EO selectivity, HDS promotion, reforming coke resistance, EO promotion, FT selectivity, and ORR all land exactly on the expected ordering. |
The catalyst layer is also benchmarked as a decision system. Changing the question should change the winner.
| Scenario | Winner | Status | Why it matters |
|---|---|---|---|
| Cheapest ammonia catalyst | FeKAlO4 | PASS | Cheap promoted iron wins when the brief changes from pure activity to cost-aware industrial realism. |
| Most active ammonia catalyst | Ru | PASS | The same stack flips to ruthenium when activity is the only objective. |
| Most stable FT catalyst | CoMnSiO2 | PASS | Promotion and support effects alter the leader when durability matters more than raw activity. |
| Scale-up WGS catalyst | CuZnAlO4 | PASS | The engine shifts toward a scale-aware industrial WGS composition instead of a narrow activity winner. |
| All other public scenarios | 6 more aligned winners | PASS | EO, HDS, reforming, FT cost, and electrocatalysis frames all move to the expected candidate family. |
The strongest proof is not only ranking known catalysts. It is inverse search rediscovering the families that industrial catalysis eventually converged on.
Top inverse-search outputs: Co, Fe, FeBaO
The engine independently rediscovers the promoted-iron logic behind Haber-Bosch instead of drifting into arbitrary compositions.
Top outputs: CoMnSiO2, CoMnReSiO2, CoMnAlO3
Mn/Re-promoted cobalt on inert supports is exactly where serious FT optimization campaigns spent years of work.
Top outputs: AgCsClAlO3, AgCsReAlO3, AgCsAlO3
The engine lands directly on the Ag/Cs/Re industrial EO family rather than needing it to be hand-fed.
Top outputs: CoMoAlS3, NiWAlS3, CoMoS3
The standard industrial CoMo/NiW sulfide families emerge from the inverse-search presets as expected.
Representative outputs: WCeO2, WLaO2, CuZnCeO3
When incumbents are deliberately excluded, the search still lands in chemically plausible adjacent families rather than drifting into unsupported outputs.
9 / 9 inverse-search tests passed.
That is the strongest public evidence that the catalyst layer is not only a scorer, but a usable discovery engine.
Layer 11, the surface-adsorption-energy predictor that feeds the catalyst cycle, is calibrated against verified primary-literature experimental chemisorption heats — not DFT.
DFT adsorption energies carry 0.15-0.35 eV method-spread across PBE, RPBE, BEEF-vdW on the same system. Single-crystal calorimetry (Campbell, King) and TDS (Ertl, Christmann) give the same values at ±0.05-0.1 eV. The calibration target should be the tighter ground truth.
28 verified points across H / N / O / S on 10 transition metals. Every value sourced to a specific paper with DOI, confirmed via WebSearch + abstract fetch, cross-checked against a second citing source. Built before any physics engine update to prevent fit-to-test.
Two new physics terms are documented in the Layer 11 method note. The engine includes a magnetic exchange enhancement for bulk-magnetic 3d metals (Cr / Mn / Fe / Co / Ni) and an sp-contribution term for d10 closed-shell metals (Cu / Ag / Au / Zn / Cd / Hg); calibrated subroutes are disclosed where they affect scores.
| Adsorbate | N points | MAE (eV) | Primary source(s) |
|---|---|---|---|
| H on TM(111) | 7 | 0.44 | Christmann, Ertl, Feulner (TDS) |
| N on Fe / Ru | 4 | 0.47 | Bozso-Ertl-Grunze-Weiss, Rosowski-Hinrichsen-Muhler-Ertl |
| O on TM(111) | 8 | 0.63 | Karp-Campbell SCAC, Stuckless-King, Campbell (TPD) |
| S on Ni / Pt | 2 | 1.98 | Perdereau-Oudar, Heegemann-Meister-Bechtold-Hayek — documented physics gap |
| Overall | 21 scored | 0.667 | (CO systems out-of-scope for atomic-adsorbate model) |
Runnable via python dev/case-studies/run_layer11_experimental.py. Dataset at validation_data/surface_adsorption_experimental.json.
The published benchmark measures the operational stack that users call, rather than a reduced direct-function path.
Every benchmark call goes through the live public scoring path, matching the route used for public demonstrations.
The catalyst layer is tested as a ranking engine, a decision engine, and an inverse-search engine.
Important scope note: this benchmark validates FluxMateria as a catalyst screening, ranking, and inverse-discovery layer. It does not claim to replace catalyst synthesis, reactor testing, or long-horizon deactivation studies. Its role is to compress the search space and make the next experimental step more targeted.
The benchmark proves the public scoring and inverse-search layer behaves correctly. The case study shows what that looks like inside a discovery workflow.
Measures a workflow or ranking engine built on Flux-derived signals. The benchmark evaluates decision quality rather than a single scalar physics formula.