Independent Validation

Independent validation starts here.

FluxMateria’s public benchmarks are the starting point. We invite external researchers to choose blind datasets, define metrics, and score frozen predictions independently.

How a validation study works

Small, scoped tests produce clearer evidence than broad claims.

  1. Choose a validation track.
  2. Define the property and metric.
  3. Freeze the dataset before any predictions are generated.
  4. Withhold target values from FluxMateria where possible.
  5. Receive frozen predictions.
  6. Score independently.
  7. Document the outcome.
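The steps above can be read as a commit-then-score workflow. Below is a minimal Python sketch of that workflow under several assumptions. It assumes the frozen dataset is CSV text, that "freezing" is recorded as a SHA-256 digest, and that mean absolute error is the pre-registered metric. The function names `freeze` and `score_frozen` are hypothetical, and the values are toy numbers. None of this is FluxMateria's actual packet format.

```python
import csv
import hashlib
import io


def freeze(text: str) -> str:
    """Hypothetical freeze step: a SHA-256 digest committing to the exact dataset text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def score_frozen(dataset_csv: str, digest: str,
                 predictions: dict[str, float],
                 targets: dict[str, float]) -> float:
    """Check the dataset is unchanged since freezing, then score with MAE (assumed metric)."""
    if freeze(dataset_csv) != digest:
        raise ValueError("dataset changed after freezing")
    ids = [row["id"] for row in csv.DictReader(io.StringIO(dataset_csv))]
    missing = [i for i in ids if i not in predictions]
    if missing:
        raise ValueError(f"no frozen prediction for rows: {missing}")
    # Targets stay with the validator until this point (step 4); scoring is independent (step 6).
    return sum(abs(predictions[i] - targets[i]) for i in ids) / len(ids)


# Toy values for illustration only.
dataset = "id,smiles\nm1,CC\nm2,CO\n"
digest = freeze(dataset)              # step 3: freeze before prediction
preds = {"m1": 1.54, "m2": 1.43}      # step 5: frozen predictions received
truth = {"m1": 1.53, "m2": 1.42}      # step 4: validator-held targets
print(round(score_frozen(dataset, digest, preds, truth), 3))
```

Publishing the digest before predictions are made lets anyone confirm later that the scored rows are the rows that were frozen.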

Choose a validation track

Each track is designed around a narrow first test, frozen predictions, and independently scored results.

Chemistry Core

Bond and thermochemistry tests

Bond lengths, bond dissociation energies, reaction enthalpies, thermochemical quantities, and spectroscopy-adjacent properties.

  • Best for: molecular reference datasets and thermochemical compilations.
  • First test: 20–100 blind bond lengths or bond energies.
  • Output: versioned predictions with units, confidence, and status labels.
Materials Holdout

External materials splits

Band gaps, densities, elastic constants, thermal properties, electrochemical properties, catalyst descriptors, and magnetic properties.

  • Best for: materials informatics, DFT benchmarking, and experimental materials labs.
  • First test: out-of-family holdout set chosen by the external group.
  • Output: row-level predictions and family-level error analysis.
Life Science / ADMET

Endpoint and selectivity tests

Solubility, PPB, BBB, CYP, hERG, DILI, permeability, target identification, binding affinity, and selectivity endpoints.

  • Best for: CADD, cheminformatics, pharmacology, and translational modeling teams.
  • First test: one endpoint, one split, one pre-registered metric.
  • Output: scores or classes with confidence, plus mechanistic annotations when available.
Reaction Mechanisms

Mechanisms, barriers, and ranking

SN1/SN2/E1/E2/E1cb classification, activation barriers, mechanism ranking, catalyst scoring, and microkinetic inputs.

  • Best for: physical organic chemistry, catalysis, and reaction informatics groups.
  • First test: a blind mechanism or activation-barrier set with defined scope.
  • Output: mechanisms, barriers, confidence, units, and module versions.
Experimental Validation

Predictions before measurement

External measurement of a molecular, materials, catalytic, spectroscopic, or biological property predicted before the result is known.

  • Best for: labs able to measure a clear property or rank a small batch.
  • First test: one property class, one small batch, one experimental readout.
  • Output: frozen predictions ready for comparison against measurements.
Evidence Packets

Templates and scoring structure

Packet templates define scope, input rows, target files, output schemas, scoring metrics, module labels, and version manifests.

  • Best for: turning a validation idea into a frozen, scoreable packet.
  • First step: request a packet for one track and one property.
  • Output: a small package that can be scored independently.

Prediction basis labels

Benchmark pages disclose the route behind reported results so validators can compare like with like.

Label | Meaning
Flux Physics | Computed from Flux physical terms for the reported endpoint. Benchmark references are used to score accuracy, not to look up each prediction.
Flux-Calibrated Physics | Flux physical model with a fixed endpoint calibration applied before evaluation.
Flux Hybrid | Flux physics signals combined with endpoint-specific reference evidence for tasks where local chemical context is part of the production route.
Flux Decision Engine | Flux scoring, ranking, or selection workflow evaluated against benchmark outcomes.
Flux Preview | Early public result or demonstration that is useful context but not the primary benchmark claim.
Mixed basis | Aggregate result set containing more than one route; row-level labels identify the basis for each result family.
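Comparing like with like means scoring each basis label separately rather than averaging across routes. Below is a minimal Python sketch of that idea. It assumes each scored row carries one basis label from the table above plus a predicted and an observed value. The `Row` type and `mae_by_basis` function are hypothetical, and the numbers are toy values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    row_id: str
    basis: str        # a prediction basis label, e.g. "Flux Physics"
    predicted: float
    observed: float


def mae_by_basis(rows: list[Row]) -> dict[str, float]:
    """Group rows by basis label and compute mean absolute error within each group."""
    groups: dict[str, list[float]] = defaultdict(list)
    for r in rows:
        groups[r.basis].append(abs(r.predicted - r.observed))
    return {basis: sum(errs) / len(errs) for basis, errs in groups.items()}


# Toy rows for illustration only.
rows = [
    Row("a", "Flux Physics", 2.0, 1.5),
    Row("b", "Flux Physics", 3.0, 3.5),
    Row("c", "Flux Hybrid", 1.0, 1.25),
]
print(mae_by_basis(rows))  # {'Flux Physics': 0.5, 'Flux Hybrid': 0.25}
```

Pooling all three rows would instead give a single MAE of about 0.42. That single number would blend two routes, which is the situation the Mixed basis label is meant to make visible.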

What validators receive

Frozen packet

Dataset templates, target-file templates, prediction-output schema, metric definitions, and a version manifest.
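A prediction-output schema can be checked mechanically before any scoring begins. Below is a minimal Python sketch of such a check. It assumes each prediction row carries a value, a unit, a confidence in [0, 1], a status label, and the packet version it was made against. The field names, the `Manifest` type, the `check_row` function, and the example status labels are all hypothetical placeholders, not FluxMateria's published schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    packet_version: str
    unit: str                      # unit the packet's metric expects (assumed single unit)
    allowed_status: frozenset[str]


# Hypothetical required fields for one prediction row.
REQUIRED = ("row_id", "value", "unit", "confidence", "status", "packet_version")


def check_row(row: dict, manifest: Manifest) -> list[str]:
    """Return the schema problems found in one prediction row (empty list if it conforms)."""
    problems = [f"missing field: {k}" for k in REQUIRED if k not in row]
    if problems:
        return problems
    if not isinstance(row["value"], (int, float)):
        problems.append("value is not numeric")
    if row["unit"] != manifest.unit:
        problems.append(f"unit {row['unit']!r} != {manifest.unit!r}")
    conf = row["confidence"]
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence outside [0, 1]")
    if row["status"] not in manifest.allowed_status:
        problems.append(f"unknown status {row['status']!r}")
    if row["packet_version"] != manifest.packet_version:
        problems.append("prediction made against a different packet version")
    return problems


manifest = Manifest("v1", "angstrom", frozenset({"in-domain", "out-of-domain"}))
good = {"row_id": "m1", "value": 1.54, "unit": "angstrom", "confidence": 0.8,
        "status": "in-domain", "packet_version": "v1"}
print(check_row(good, manifest))                    # []
print(check_row({**good, "unit": "pm"}, manifest))  # ["unit 'pm' != 'angstrom'"]
```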

Clear policies

Authorship, acknowledgement, confidentiality, and negative-results policies, agreed before predictions are generated.

Authorship policy · Negative-results policy

Scoped claims

Module status labels and declared boundaries so a result can be interpreted as in-domain, out-of-domain, or boundary-defining.

Start with one property and one metric

Positive, mixed, and negative results are all useful when the scope is clear.