The proof: same cohort, same protocol, four methods
Three modern empirical force fields (OpenFF Sage 2.2.0, GAFF2, and MMFF94) were
evaluated on the same 98 independent experimental rotors as FluxMateria, with the
same relaxed-scan protocol: 24 angles per rotor, a harmonic torsion restraint, full minimisation of
all non-torsion degrees of freedom, and barrier = max − min energy over the scan. No cherry-picking. No training-test
overlap. The results, side by side:

| Force field | Fitted params | MAE (kJ/mol) | RMSE (kJ/mol) | Within ±2 kJ/mol | FLUX ratio |
|---|---|---|---|---|---|
| FluxMateria (this work) | 0 | 0.83 | 1.16 | 90 % | n/a |
| OpenFF Sage 2.2.0 (Boothroyd 2023) | hundreds | 9.73 | 12.94 | 16 % | 11.6× |
| GAFF2 (He 2020, AmberTools) | ~1,500 | 11.04 | 17.32 | 24 % | 13.2× |
| MMFF94 (Halgren 1996, RDKit) | ~5,000 | 14.08 | 34.20 | 25 % | 16.8× |

Each of these three force fields reports ~1–1.5 kJ/mol MAE on its own
training-and-test set. On the strictly independent set of 98 experimentally measured rotational
barriers used here, a generalisation gap is exposed: MAE rises to 9.7–14.1 kJ/mol,
and only 16–25 % of cases land within 2 kJ/mol of experiment.
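As a minimal sketch of how the quantities in this comparison are defined, the Python below computes a barrier as max − min over a 24-point scan and then the MAE, RMSE, and within-tolerance fraction over a set of barriers. The function names, the toy cosine profile, and the four example barrier values are hypothetical illustrations, not data or code from this benchmark. Treating a case that lands exactly on the 2 kJ/mol threshold as "within" is also an assumption.

```python
import math


def barrier_from_scan(energies_kj_mol):
    """Barrier height as max - min energy over the scanned torsion angles."""
    if not energies_kj_mol:
        raise ValueError("scan is empty")
    return max(energies_kj_mol) - min(energies_kj_mol)


def summarize(predicted, reference, tol=2.0):
    """MAE, RMSE, and fraction of cases with |error| <= tol (all in kJ/mol)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Assumption: the threshold is inclusive (|error| == tol counts as within).
    within = sum(1 for e in errors if abs(e) <= tol) / n
    return {"mae": mae, "rmse": rmse, "within_tol": within}


# Hypothetical threefold torsion profile sampled at 24 angles (15 degree steps).
V3 = 12.0  # illustrative amplitude in kJ/mol, not a measured value
scan = [0.5 * V3 * (1.0 + math.cos(math.radians(3 * 15 * k))) for k in range(24)]
toy_barrier = barrier_from_scan(scan)

# Hypothetical predicted and experimental barriers (kJ/mol), for illustration only.
predicted = [10.0, 13.0, 5.0, 20.0]
reference = [11.0, 12.0, 5.0, 16.0]
stats = summarize(predicted, reference)

print(f"toy barrier: {toy_barrier:.2f} kJ/mol")
print(f"MAE {stats['mae']:.2f}, RMSE {stats['rmse']:.2f}, "
      f"within 2 kJ/mol: {stats['within_tol']:.0%}")
```

RMSE weights large misses more heavily than MAE, which is why a few badly predicted rotors can push RMSE well above MAE, as in the MMFF94 row of the table.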
Several force fields were not available for a local same-set run: OPLS-AA (Jorgensen 1996,
MAE ~1.5 kJ/mol on own set), OPLS-3e / OPLS-4 (Schrödinger
commercial, MAE ~0.6–0.85 kJ/mol on own set), AMBER ff14SB (Maier 2015,
~1.5 kJ/mol on protein backbone), and CHARMM CGenFF (Vanommeslaeghe 2010,
~1–2 kJ/mol on own set). These carry 1,000–5,000 fitted torsion parameters and
report their accuracies on chemistry within their own training distribution.
The headline
Three modern empirical force fields evaluated on the same 98 experimental rotors return MAEs of
9.73 (OpenFF Sage 2.2.0), 11.04 (GAFF2), and 14.08 (MMFF94) kJ/mol. FluxMateria stays at
0.83 kJ/mol on the same set with no torsion training data and no empirical torsion
table. Predictions are deterministic, and the same evaluation applies whether or not the
rotor has been observed before.