# Materials Physics Benchmark Snapshot (Public Export)

Date: February 24, 2026  
Snapshot ID: `materials_physics_benchmark_snapshot_2026-02-24`

## Freeze Metadata

| Field | Value |
|---|---|
| Git commit | `f4fb848fd7fa55be1b68d4e7592f1330553f1112` |
| Git short commit | `f4fb848f` |
| Generated at (UTC) | `2026-02-24T20:50:56.039717+00:00` |
| Machine-readable freeze artifact | `benchmarks/data/materials_physics_benchmark_snapshot_2026-02-24.json` |

Security note: Internal script names, local paths, and command-line details have been removed from this public export.

## Public Artifact Hashes (SHA256)

| Artifact Label | SHA256 |
|---|---|
| `manifest_family_holdout` | `ba61990460b34b72b372d3e11eadebf3c2276c54905bad2534376be17e3f14c5` |
| `manifest_interaction_holdout` | `7001d92b44d0221497c5a50cb8866349267e0eba8aecd8bf8067f88a1acb0540` |
| `validation_dataset` | `91c03ff42391f47439541dd02431e2912a62b6ce5d25e165d00224662be0a63d` |
| `adapter_aflow` | `c411af4e6bb84474a6048124ca36d13f6f418e3ff942747196923b1bdbea15bd` |
| `adapter_jarvis` | `559302f88458e0d176cba982b7257b19ce941209835f62f92e24059c2f6b7e63` |
| `adapter_matbench` | `13dd30858e19ac083a119de016fea7edaef871348b4380c8269ed635af127043` |
| `adapter_manifest` | `5f20c57693785f211c8188d04c54a35e881a8d5d575bea485bd1f518fc44140a` |
| `runtime_engine_module` | `136105ba8041730076350a8d3d25b2c2f35028cf24d7768868e0785dd5bc8e1d` |
| `offline_rule_derivation_module` | `d69ae02e39fcec8f6aa9ce6e26362cce8d3b3060ed95965d8b479163daf0ac42` |
| `scoring_module` | `10adac3cbd38aa804c4be19492f3392f427f3857ad427ad85ce0d385697d2b14` |
| `output_strict_s2` | `73cf358893501d30727dbe8045f16d31a3c5152c2313db832bec5765f01abdd1` |
| `output_strict_s3` | `6cd990f9efb6541a80e5e7188aee305b597827797b755f84723d9c77eaa81fb8` |
| `output_external_s2` | `2fbd81455a0540883d94915a2f2b1233378cbb5d9efe7893166a3b71fe96a1e9` |
| `output_external_s3` | `49c54489c68396d4faeed78ed6448f756879e78d8e9d9002083fcaa676bbe9bf` |
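A published hash is only useful if it can be checked against a local copy of the artifact. The following is a minimal sketch of such a check using Python's standard `hashlib`. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and this export does not publish the mapping from artifact labels to file names, so any path passed in is an assumption on the reader's side.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hash, ignoring case and padding."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A local copy of an artifact would be passed together with the SHA256 value from its row in the table above; a `False` result means the file differs from the frozen artifact.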

## Evaluation Workflow (Public Summary)

1. Strict scoring on family holdout (S2) with FLUX.
2. Strict scoring on interaction holdout (S3) with FLUX.
3. Strict scoring on both S2 and S3 for FLUX alongside the external baseline adapters (AFLOW, JARVIS, Matbench).

## Strict FLUX Results

The property columns (B through kappa) report per-property MAPE; runtime is the mean per call.

| Protocol | Overall MAPE | B | theta_D | v_sound | density | kappa | Runtime mean |
|---|---:|---:|---:|---:|---:|---:|---:|
| S2 family holdout | 1.1668% | 0.9056% | 1.3283% | 1.4405% | 1.2231% | 0.9185% | 0.3294 ms/call |
| S3 interaction holdout | 1.3750% | 1.0149% | 1.5636% | 1.6081% | 1.3398% | 1.3473% | 0.3205 ms/call |
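For readers unfamiliar with the metric, the sketch below computes MAPE under its standard definition, the mean of `|predicted - reference| / |reference|` expressed in percent. This export does not state the exact formula used by the scoring module or how the Overall column is aggregated across properties, so the function is an assumption for illustration only and does not attempt to reproduce the table values. The name `mape_percent` is hypothetical.

```python
from typing import Sequence


def mape_percent(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent, under the standard definition."""
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in reference):
        # The relative error is undefined when the reference value is zero.
        raise ValueError("reference values must be non-zero")
    total = sum(abs(p - r) / abs(r) for p, r in zip(predicted, reference))
    return 100.0 * total / len(reference)
```

With made-up inputs, predictions of `[110, 90]` against references of `[100, 100]` are each off by 10%, so the function returns approximately `10.0`.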

## External Apples-to-Apples Results

S2 family holdout:

| Model | Overall MAPE | B | theta_D | v_sound | density | kappa |
|---|---:|---:|---:|---:|---:|---:|
| FLUX | 1.1668% | 0.9056% | 1.3283% | 1.4405% | 1.2231% | 0.9185% |
| AFLOW adapter | 36.0693% | 14.3580% | 15.6790% | 29.5091% | 5.5695% | 122.9277% |
| JARVIS adapter | 10.9230% | 15.4191% | NA | NA | 6.3425% | NA |
| Matbench adapter | 18.4214% | 18.4214% | NA | NA | NA | NA |

S3 interaction holdout:

| Model | Overall MAPE | B | theta_D | v_sound | density | kappa |
|---|---:|---:|---:|---:|---:|---:|
| FLUX | 1.3750% | 1.0149% | 1.5636% | 1.6081% | 1.3398% | 1.3473% |
| AFLOW adapter | 35.3566% | 14.5881% | 14.1047% | 30.0588% | 5.5493% | 119.6271% |
| JARVIS adapter | 10.9356% | 15.4816% | NA | NA | 6.2780% | NA |
| Matbench adapter | 18.4247% | 18.4247% | NA | NA | NA | NA |

## Coverage Caveat

Coverage differs across baselines and properties. An adapter reports `NA` for any property channel it does not support, so it cannot be compared on that channel, and its Overall MAPE covers only the channels it does report.
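One way to respect this caveat is to compare two models only on the channels both report. The sketch below does so for the S2 family holdout, with values copied from the table above and `None` standing in for `NA`. The names `S2_RESULTS`, `shared_channels`, and `compare_on_shared` are hypothetical and are not part of the benchmark's tooling.

```python
from typing import Dict, List, Optional, Tuple

PROPERTIES = ["B", "theta_D", "v_sound", "density", "kappa"]

# Per-property MAPE (%) from the S2 family holdout table; None stands for NA.
S2_RESULTS: Dict[str, Dict[str, Optional[float]]] = {
    "FLUX": {"B": 0.9056, "theta_D": 1.3283, "v_sound": 1.4405,
             "density": 1.2231, "kappa": 0.9185},
    "AFLOW adapter": {"B": 14.3580, "theta_D": 15.6790, "v_sound": 29.5091,
                      "density": 5.5695, "kappa": 122.9277},
    "JARVIS adapter": {"B": 15.4191, "theta_D": None, "v_sound": None,
                       "density": 6.3425, "kappa": None},
    "Matbench adapter": {"B": 18.4214, "theta_D": None, "v_sound": None,
                         "density": None, "kappa": None},
}


def shared_channels(results: Dict[str, Dict[str, Optional[float]]],
                    model_a: str, model_b: str) -> List[str]:
    """List the properties for which both models report a value."""
    return [p for p in PROPERTIES
            if results[model_a][p] is not None and results[model_b][p] is not None]


def compare_on_shared(results: Dict[str, Dict[str, Optional[float]]],
                      model_a: str, model_b: str) -> Dict[str, Tuple[float, float]]:
    """Pair up the two models' MAPE values on shared channels only."""
    return {p: (results[model_a][p], results[model_b][p])
            for p in shared_channels(results, model_a, model_b)}
```

For example, `shared_channels(S2_RESULTS, "FLUX", "JARVIS adapter")` yields `["B", "density"]`, so a FLUX versus JARVIS comparison would be limited to those two columns rather than the Overall MAPE figures.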
