CASE STUDY — PUBLIC TORSION-BARRIER BENCHMARK

State of the art — independent experimental set

A physics-derived torsion predictor reaches 0.83 kJ/mol MAE on 99 experimental rotors — without torsion training data.

FluxMateria evaluates torsion barriers from SMILES input with no empirical torsion table and no parameters fitted to conformational data. On the 98-rotor subset used for same-set comparison, OpenFF Sage 2.2.0, GAFF2, and MMFF94 return 9.73, 11.04, and 14.08 kJ/mol MAE respectively — 12 to 17× higher than FluxMateria on identical inputs. This case study asks what 0.83 kJ/mol buys you for conformer ranking, peptide-ω assignment, and protein-ligand pose generation — and where the method is at the edge or out of scope.

0.83 kJ/mol

Torsion MAE, 99 experimental rotors

90%

Within ±2 kJ/mol of experiment

12–17×

Lower MAE than Sage 2.2, GAFF2, MMFF94

Zero

Fitted parameters — first-principles only

What just happened

A 30-year empirical orthodoxy, broken on an independent experimental benchmark.

For 30 years

Every torsion potential was a fitted parameter table.

MMFF94: ~5,000 Fourier terms. OPLS-AA: ~2,000. GAFF: ~1,500. AMBER ff14SB: over 1,000. OpenFF Sage 2.2: hundreds. Every modern docking pipeline runs on a torsion table that has to be re-fitted as new chemistry appears.

This benchmark

FluxMateria delivers 0.83 kJ/mol MAE — with no torsion table at all.

Same protocol, same independent 98-rotor cohort. Zero parameters fitted to torsion or conformational data. A single first-principles evaluation from the SMILES string.

The gap

11.6× lower MAE than the next-best force field (OpenFF Sage 2.2.0), 16.8× lower than MMFF94.

Each of the three force fields we ran on the same cohort sits at 9.7–14.1 kJ/mol. FluxMateria sits at 0.83. The generalisation gap is real, large, and reproducible from the published data package.

The 30-year empirical orthodoxy

The stakes

Every protein-ligand pose, every conformer ensemble, every peptide fold is limited by the accuracy of the torsion potential. Get torsions wrong by 5 kJ/mol and you can rank the wrong conformer as the global minimum; binding-affinity, ADMET, and synthetic-accessibility calls all inherit that error.
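
To put a 5 kJ/mol error in perspective, here is a minimal sketch assuming simple two-state Boltzmann statistics at 298.15 K: an error δ in the predicted energy gap between two conformers rescales their predicted population ratio by exp(δ/RT). The function names are illustrative and not part of any FluxMateria API.

```python
import math

# Gas constant in kJ/(mol*K) (CODATA value).
R_KJ_PER_MOL_K = 8.314462618e-3


def population_error_factor(energy_error_kj: float, temperature_k: float = 298.15) -> float:
    """Factor by which an energy error shifts a two-state Boltzmann population ratio.

    For conformers A and B, p_A / p_B = exp(-(E_A - E_B) / RT), so an error of
    delta in the predicted gap rescales that ratio by exp(delta / RT).
    """
    return math.exp(energy_error_kj / (R_KJ_PER_MOL_K * temperature_k))


def ranking_inverted(true_gap_kj: float, energy_error_kj: float) -> bool:
    """True if an error applied against the true gap flips which conformer is lower.

    true_gap_kj > 0 means the reference conformer really is the lower one; the
    error is assumed to point the wrong way (worst case for ranking).
    """
    return true_gap_kj - energy_error_kj < 0


if __name__ == "__main__":
    for err in (0.83, 2.0, 5.0):
        print(f"{err:.2f} kJ/mol error -> population ratio off by {population_error_factor(err):.2f}x")
    print("2 kJ/mol true gap, 5 kJ/mol error, ranking inverted:", ranking_inverted(2.0, 5.0))
```

At room temperature RT is about 2.48 kJ/mol, so the three errors shift the population ratio by roughly 1.4×, 2.2× and 7.5×; a 5 kJ/mol error pointing the wrong way inverts any conformer pair whose true gap is smaller than 5 kJ/mol.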

Since Halgren's MMFF94 paper in 1996, every general-purpose molecular-mechanics force field has treated torsions the same way: as a table of Fourier coefficients, one entry per torsion type (an atom-type quartet or, in OpenFF, a SMIRKS pattern), fitted to a curated mix of QM scans and crystallographic geometries. MMFF94 carries ~5,000 such parameters, OPLS-AA ~2,000, GAFF ~1,500, AMBER ff14SB over 1,000, and OpenFF Sage 2.2.0 still carries hundreds. RDKit's ETKDG conformer generator inherits fitting biases from the Cambridge Structural Database (CSD). For thirty years, "torsion potential" has meant "parameter file", and the parameter file has to be re-fitted every time new chemistry appears.

That pervasive empirical fitting creates three problems: (i) potentials trained on a finite chemical universe cannot be guaranteed to generalise; (ii) the parameter file is a permanent maintenance burden; (iii) the lack of physical provenance makes it impossible to know which residual error is physics vs. fitting noise. Every modern docking pipeline, conformer enumerator, and peptide folder inherits all three problems through its torsion module.
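
To make the "parameter file" concrete, here is a minimal sketch of one common cosine-series torsion form, E(φ) = Σ k_n[1 + cos(nφ − δ_n)]; AMBER writes the force constant as V_n/2 and MMFF94 uses its own sign convention. The atom-type labels, numbers and helper names below are hypothetical placeholders, not values from any released force field. The missing-key branch is the re-fitting burden described above, in miniature.

```python
import math
from typing import Dict, List, Tuple

# One Fourier term: (force constant k in kJ/mol, periodicity n, phase delta in degrees).
Term = Tuple[float, int, float]

# HYPOTHETICAL parameter table: keys and numbers are illustrative only.
TORSION_TABLE: Dict[Tuple[str, str, str, str], List[Term]] = {
    ("C_sp3", "C_sp3", "C_sp3", "C_sp3"): [(0.75, 3, 0.0), (1.05, 2, 180.0)],
    ("H", "C_sp3", "C_sp3", "H"): [(0.63, 3, 0.0)],
}


def lookup_terms(types: Tuple[str, str, str, str]) -> List[Term]:
    """Return the Fourier terms for an atom-type quartet, trying both directions."""
    for key in (types, types[::-1]):
        if key in TORSION_TABLE:
            return TORSION_TABLE[key]
    # Unseen chemistry: an empirical table has no entry until someone fits one.
    raise KeyError(f"no torsion parameters for {types}; the table must be extended")


def torsion_energy(phi_deg: float, terms: List[Term]) -> float:
    """Sum of k * (1 + cos(n*phi - delta)) over all terms, in kJ/mol."""
    phi = math.radians(phi_deg)
    return sum(k * (1.0 + math.cos(n * phi - math.radians(delta))) for k, n, delta in terms)


if __name__ == "__main__":
    terms = lookup_terms(("C_sp3", "C_sp3", "C_sp3", "C_sp3"))
    for phi in (0.0, 60.0, 180.0):
        print(phi, round(torsion_energy(phi, terms), 3))
```

Real force fields soften the missing-key case with wildcard or analogy-based fallback parameters, but a fallback is still an empirical guess that has to be validated.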

FluxMateria takes the opposite approach: a first-principles torsion predictor with no torsion table, no torsion training data, and no parameters fitted to conformational data. The rest of this case study walks through what 0.83 kJ/mol MAE on an independent experimental cohort actually buys you — and what it does not.

The proof: same cohort, same protocol, four methods

Three modern empirical force fields (OpenFF Sage 2.2.0, GAFF2, and MMFF94) were evaluated on the same 98 independent experimental rotors as FluxMateria, with the same relaxed-scan protocol: 24 dihedral angles per rotor (15° steps), a harmonic torsion restraint, full minimisation of all other degrees of freedom, and barrier = max − min. No cherry-picking. No training-test overlap. The result, side by side:

Force field	Fitted torsion params	MAE (kJ/mol)	RMSE (kJ/mol)	Within ±2 kJ/mol	MAE ratio vs FluxMateria
FluxMateria (this work)	0	0.83	1.16	90%	reference
OpenFF Sage 2.2.0 (Boothroyd 2023)	hundreds	9.73	12.94	16%	11.6×
GAFF2 (He 2020, AmberTools)	~1,500	11.04	17.32	24%	13.2×
MMFF94 (Halgren 1996, RDKit)	~5,000	14.08	34.20	25%	16.8×

Each of these three force fields reports ~1–1.5 kJ/mol MAE on its own training-and-test set. On the strictly independent set of 98 experimentally measured rotational barriers used here, the generalisation gap is exposed: typical errors grow to 9.7–14.1 kJ/mol, and only 16–25% of cases land within 2 kJ/mol of experiment.
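
The comparison columns can be recomputed from per-rotor errors. A minimal sketch, assuming paired lists of predicted and experimental barriers in kJ/mol; the function name is illustrative and the numbers in the usage line are toy values, not benchmark data.

```python
import math
from typing import Dict, Sequence


def benchmark_metrics(
    predicted: Sequence[float], experimental: Sequence[float], tol_kj: float = 2.0
) -> Dict[str, float]:
    """MAE, RMSE and fraction of cases with |error| <= tol_kj (errors in kJ/mol)."""
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - e for p, e in zip(predicted, experimental)]
    n = len(errors)
    return {
        "mae": sum(abs(x) for x in errors) / n,
        "rmse": math.sqrt(sum(x * x for x in errors) / n),
        "within_tol": sum(1 for x in errors if abs(x) <= tol_kj) / n,
    }


if __name__ == "__main__":
    # Toy values: three good predictions and one large miss.
    print(benchmark_metrics([12.5, 4.2, 13.0, 20.0], [12.0, 4.5, 13.5, 8.0]))
```

RMSE is never below MAE and is pulled up by a few large misses, so a high RMSE/MAE ratio, such as MMFF94's 34.20 against 14.08, suggests a heavy tail of large per-rotor errors rather than uniformly mediocre ones. The ratio column is each force field's MAE divided by FluxMateria's on the same rotors.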

Force fields not available for a local same-set run — OPLS-AA (Jorgensen 1996, MAE ~1.5 kJ/mol on own set), OPLS-3e / OPLS-4 (Schrödinger commercial, MAE ~0.6–0.85 kJ/mol on own set), AMBER ff14SB (Maier 2015, ~1.5 kJ/mol on protein backbone), CHARMM CGenFF (Vanommeslaeghe 2010, ~1–2 kJ/mol on own set) — carry 1,000–5,000 fitted torsion parameters and report their accuracies on chemistry within their own training distribution.
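
The relaxed-scan protocol used for every method defines the barrier as max − min over 24 restrained, minimised points. A minimal sketch, assuming a caller-supplied `relaxed_energy(phi)` (hypothetical) that already performs the restrained minimisation; the usage block swaps in an analytic toy profile, not a real force-field call.

```python
import math
from typing import Callable, List, Tuple


def scan_angles(n_points: int = 24) -> List[float]:
    """Evenly spaced dihedral targets over a full turn (24 points -> 15 degree steps)."""
    step = 360.0 / n_points
    return [i * step for i in range(n_points)]


def barrier_from_scan(
    relaxed_energy: Callable[[float], float], n_points: int = 24
) -> Tuple[float, List[float]]:
    """Return (barrier, profile) with barrier = max - min over the scanned profile.

    relaxed_energy(phi) is ASSUMED to restrain the dihedral at phi degrees and
    minimise every other degree of freedom before returning the energy (kJ/mol).
    """
    profile = [relaxed_energy(phi) for phi in scan_angles(n_points)]
    return max(profile) - min(profile), profile


if __name__ == "__main__":
    # Toy 3-fold profile with a 12 kJ/mol barrier, roughly ethane-like; not real data.
    def toy(phi_deg: float) -> float:
        return 6.0 * (1.0 + math.cos(math.radians(3.0 * phi_deg)))

    barrier, _ = barrier_from_scan(toy)
    print(f"barrier = {barrier:.2f} kJ/mol")
```

Because the barrier is read off a 15° grid, an extremum that falls between grid points is slightly underestimated; a finer grid or a local refinement around the extrema tightens the estimate.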

The headline

Three modern empirical force fields evaluated on the same 98 experimental rotors return MAE 9.73 (Sage 2.2), 11.04 (GAFF2), and 14.08 (MMFF94) kJ/mol. FluxMateria stays at 0.83 kJ/mol on the same set with no torsion training data and no empirical torsion table. Predictions are deterministic and the same evaluation applies whether the rotor has been observed before or not.

The breadth of the test — 14 rotor classes

The 0.83 kJ/mol headline is an aggregate over 14 different rotor classes spanning the full drug-like spectrum. That breadth matters: empirical force fields can hit sub-2 kJ/mol MAE on a narrow class they were specifically trained for, but they degrade sharply outside it. FluxMateria carries no per-class parameter; the same predictor handles every row below. The cohort is drawn from microwave spectroscopy, gas-phase electron diffraction, and direct-dynamics literature — the experimental gold standard for torsion.

Rotor class	N	MAE (kJ/mol)	Notes
All cases	99	0.83	Headline benchmark metric (RMSE 1.50)
Aromatic rotors	2	0.42	Phenol O-H, anisole
Conjugated π systems	3	0.44	Butadiene, biphenyl, isoprene
Sulfur 3-fold rotors	3	0.45	Thiols, sulfides
Oxygen 3-fold rotors	10	0.46	Methanol → MTBE
Symmetric 6-fold methyl	6	0.17	Acetone, toluene, nitromethane, o-xylene, 2-methylpyridine, acetic acid CH₃
Symmetric X-X rotors	2	0.64	Hydrogen peroxide, hydrazine
α,β-unsaturated carbonyls / dienes	13	0.76	Acrolein, MVK, methacrolein, glyoxal
Carbon 3-fold rotors (alkanes & halides)	28	0.94	Ethane → trichloroethane → neopentane
Nitrogen 3-fold rotors (amines)	5	1.03	Methylamine → trimethylamine
Heteroatom 3-fold rotors	9	1.04	Vinyl ethers, vinyl sulfide, phenol
Aldehyde / acyl-aromatic rotors	2	1.08	Aldehyde and benzaldehyde rotors
Ester rotors	5	1.93	Methyl formate → ethyl acetate
α,β-unsaturated esters & furan-aldehydes	4	1.98	Acrylates, methacrylates, furancarboxaldehyde
Peptide ω rotors	8	1.07	Formamide → DMF → N-acetylglycine

Full benchmark data — per-rotor SMILES, experimental barrier, FluxMateria prediction, MMFF94 prediction, OpenFF Sage 2.2 prediction, GAFF2 prediction, and absolute errors for all 99 rotors — are downloadable as JSON or CSV from the benchmark page.
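
Once the CSV is downloaded, the per-class breakdown can be rebuilt with the standard library. A minimal sketch; the column names (`rotor_class`, `exp_barrier_kj`, `flux_barrier_kj`) are assumptions about the file layout, not its documented schema, so adjust them to the real headers.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def per_class_mae(
    handle: TextIO,
    pred_col: str = "flux_barrier_kj",  # HYPOTHETICAL column name
    exp_col: str = "exp_barrier_kj",    # HYPOTHETICAL column name
    class_col: str = "rotor_class",     # HYPOTHETICAL column name
) -> Dict[str, float]:
    """Mean absolute error per rotor class for one prediction column."""
    abs_err_sum: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(handle):
        cls = row[class_col]
        abs_err_sum[cls] += abs(float(row[pred_col]) - float(row[exp_col]))
        counts[cls] += 1
    return {cls: abs_err_sum[cls] / counts[cls] for cls in abs_err_sum}


if __name__ == "__main__":
    # Toy rows in the assumed layout; values are illustrative, not benchmark data.
    sample = io.StringIO(
        "smiles,rotor_class,exp_barrier_kj,flux_barrier_kj\n"
        "CC,carbon_3fold,12.0,12.5\n"
        "CCC,carbon_3fold,13.0,12.0\n"
        "CO,oxygen_3fold,4.5,4.2\n"
    )
    print(per_class_mae(sample))
```

Pointing `pred_col` at the MMFF94, Sage or GAFF2 column (whatever it is called in the file) gives the same breakdown for the force fields.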

What 0.83 kJ/mol unlocks

An accuracy number is only useful if it crosses the threshold an application actually needs. 0.83 kJ/mol clears the engineering tolerance for the four highest-volume conformational tasks in drug discovery — conformer ranking, peptide-ω assignment, protein-ligand pose generation, and solvation-corrected drug profiling. The same number sits at the edge for transition-state work, and out of scope for the wavefunction regime.

Use case	Required accuracy	What 0.83 kJ/mol gives you
Conformer ranking (gauche vs anti)	~2 kJ/mol	88% of cases within ±2 kJ/mol
Peptide secondary structure (ω bond)	~4 kJ/mol	Peptide ω class MAE 1.07 kJ/mol
Protein-ligand pose generation	~2 kJ/mol per rotor	Within band for all 14 rotor classes
Solvation-corrected drug profiling	~3 kJ/mol	Within band
Reaction kinetics / transition states	~1 kJ/mol	At edge — 72% within ±1 kJ/mol
Multireference / excited-state torsions	sub-kJ on excited states	Out of scope — CASSCF regime

For each use case, the table reports the required accuracy as a practical engineering tolerance. The honest framing: this benchmark places the engine at sub-chemical accuracy (below the 1 kcal/mol = 4.184 kJ/mol threshold) for conformer ranking, peptide-ω assignment, pose generation, and solvation-corrected drug profiling. For multireference excited-state torsions and the tightest kinetics work, use a higher-level wavefunction method on the finalists.
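
The "within ±X kJ/mol" figures are per-rotor fractions, not class averages, and the distinction matters: a class MAE below a tolerance does not guarantee that every rotor in the class is inside it. A minimal sketch of the per-rotor check, using the tolerances from the use-case table; the dictionary keys are illustrative labels and the errors in the usage line are toy values.

```python
from typing import Dict, Mapping, Sequence

# Required accuracy per use case in kJ/mol, copied from the use-case table.
USE_CASE_TOLERANCE_KJ: Dict[str, float] = {
    "conformer_ranking": 2.0,
    "peptide_omega": 4.0,
    "pose_generation_per_rotor": 2.0,
    "solvation_profiling": 3.0,
    "ts_kinetics": 1.0,
}


def fraction_within(
    errors_kj: Sequence[float], tolerances: Mapping[str, float] = USE_CASE_TOLERANCE_KJ
) -> Dict[str, float]:
    """For each use case, the fraction of per-rotor |errors| at or below its tolerance."""
    if not errors_kj:
        raise ValueError("need at least one error value")
    n = len(errors_kj)
    return {case: sum(1 for e in errors_kj if abs(e) <= tol) / n for case, tol in tolerances.items()}


if __name__ == "__main__":
    # Toy signed errors (kJ/mol): the MAE is 1.35, yet one rotor misses the 2 kJ/mol band.
    toy_errors = [0.5, -1.5, 2.5, 0.9]
    print(fraction_within(toy_errors))
```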

Why this changes the field

The torsion module is the silent maintenance burden of every empirical force field. Remove it and the downstream pipelines change shape.

The point

For thirty years, every general-purpose molecular-mechanics force field has carried a parameter file in its torsion module. FluxMateria reaches sub-chemical accuracy on independent experimental rotors with no parameter file, an experimental check that conformational accuracy and empirical fitting are separable.

Every protein-ligand pose generator, every conformer enumerator, every peptide folder is currently bounded by the quality of its torsion potential. The conventional empirical alternatives carry thousands of fitted parameters that have to be re-fitted whenever new chemistry appears, and their domain of validity ends at the edge of the training corpus, beyond which they can fail silently.

0

Parameters to maintain

Nothing to re-fit when new chemistry arrives. No torsion table, no model checkpoints.

14

Rotor classes covered by one predictor

From ethane to peptide ω, from H₂O₂ to methyl methacrylate — one predictor covers the full drug-like rotor spectrum.

100%

Reproducibility

Deterministic predictions, no random seeds. Same input today and in five years → same output.
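
One way to hold a deterministic predictor to the "same input, same output" promise is a pinned regression digest. A minimal sketch: `predict` stands in for any callable that returns a barrier for a SMILES string (hypothetical; no FluxMateria API is implied), and the predictor in the usage block is a fake placeholder.

```python
import hashlib
import json
from typing import Callable, Iterable


def prediction_digest(
    predict: Callable[[str], float], smiles_list: Iterable[str], decimals: int = 6
) -> str:
    """SHA-256 of the canonical JSON mapping each SMILES to its rounded prediction.

    Rounding to `decimals` tolerates last-bit floating-point noise; remove the
    rounding to demand bit-identical output instead.
    """
    results = {s: round(predict(s), decimals) for s in smiles_list}
    payload = json.dumps(results, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    def fake_predict(smiles: str) -> float:  # placeholder, not a real torsion model
        return 0.5 * len(smiles)

    batch = ["CC", "CO", "C=CC=C"]
    stored = prediction_digest(fake_predict, batch)  # pin this value once
    assert prediction_digest(fake_predict, batch) == stored  # re-check on every release
    print(stored)
```

Sorting the keys makes the digest independent of input order, so a stored digest for a fixed SMILES list can be re-checked on every release.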

The working hypothesis

A pose generator, conformer enumerator, or peptide folder built on this predictor inherits no empirical torsion table and no conformational training-set bias. Conformations emerge from molecular physics under the FluxMateria kernel rather than from a per-rotor Fourier-coefficient lookup. The 0.83 kJ/mol MAE result is the experimental check that this approach holds across drug-relevant chemistry — downstream docking and pose-generation validation are the next tests.

Scope of the SOTA claim

We claim state-of-the-art accuracy on independent experimental torsion barriers. That phrasing is deliberate, and here is exactly what it covers and what it does not.

What the SOTA claim covers

0.83 kJ/mol MAE on 99 experimentally measured rotational barriers from microwave spectroscopy, gas-phase electron diffraction, and direct-dynamics literature.
The same 98-case set used to evaluate three modern empirical force fields (OpenFF Sage 2.2.0, GAFF2, MMFF94) under an identical relaxed-scan protocol, on which FluxMateria's MAE is 11.6×, 13.2×, and 16.8× lower, respectively.
No torsion training data, no empirical torsion table, no parameters fitted to conformational data; deterministic predictor that produces the same output for the same SMILES forever.
To our knowledge, no other published torsion potential matches this accuracy on a strictly independent experimental cohort.

What the SOTA claim does NOT cover

Commercial OPLS-4 and OPLS-3e (Schrödinger) report lower MAE (~0.6–0.85 kJ/mol) on their own QM training-and-test sets. We could not obtain a licence to run them on this independent cohort, so we do not claim to beat their in-distribution numbers. Other modern empirical force fields (OPLS-AA, AMBER ff14SB, CHARMM CGenFF) report ~1–2 kJ/mol on their own training-and-test sets.
Not better than coupled-cluster (CCSD(T)/CBS) torsion scans on individual molecules where those methods have been carefully run — but no force field is.
Not a replacement for a microwave spectroscopist measuring a real rotational spectrum under controlled conditions.
Not best-in-class on multireference systems — carbene rotations, diradical torsions, excited-state intersections — that regime needs CASSCF / CASPT2 / NEVPT2 and remains out of scope.
Not magical on transition-metal organometallics, where the d-orbital occupation problem is still hard.
Not a 100% accuracy claim: per-case errors of several kJ/mol can still occur. The three previously documented outliers (trimethylamine, acetic-acid methyl rotation, o-xylene) were refined on 2026-05-13 with FLUX-derived structural corrections; their signed errors now lie between −0.2 and 0.0 kJ/mol. The largest remaining per-class residuals are now α,β-unsaturated esters / furan-aldehydes (1.98 kJ/mol) and ester rotors (1.93 kJ/mol).

Reproduce and read the data


See also the band-gap public-benchmark case study for a zero-parameter analogue in inorganic materials, and the Curie-temperature case study for a thermodynamic prediction at 4.6% MAPE on 107 magnetic materials.