Battery Electrochemistry Benchmark

Family accuracy: 1.0 (6 calibrated holdout references)
Voltage MAE: 0.149 V (calibrated holdout)
Scenario alignment: 5 / 5 (public battery decision cases)
End-to-end workflow: 26.8 s (latest local battery study)

Methodology

Two benchmark layers were used because a serious battery engine should be accurate and also change its answer when the engineering question changes.

1. Calibrated holdout benchmark

The battery layer was evaluated against known cathode-family references using calibration rows plus an untouched holdout slice.

16 total reference materials
10 calibration rows
6 holdout rows
Errors tracked: capacity, voltage, transport, cycle life, electrolyte, interface, cost, and manufacturing
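
The 16 / 10 / 6 split above can be sketched as a seeded shuffle. The source does not say how the holdout rows were chosen, so the random selection, the function name `split_references`, and the placeholder IDs below are assumptions made for illustration only.

```python
import random


def split_references(reference_ids, n_holdout, seed=0):
    """Split reference IDs into disjoint calibration and holdout lists.

    Assumption: holdout rows are picked by a seeded shuffle. The benchmark's
    actual selection rule is not published.
    """
    ids = sorted(reference_ids)  # fixed starting order keeps the seed reproducible
    if not 0 < n_holdout < len(ids):
        raise ValueError("n_holdout must leave at least one row on each side")
    random.Random(seed).shuffle(ids)
    holdout = sorted(ids[:n_holdout])
    calibration = sorted(ids[n_holdout:])
    return calibration, holdout


# Placeholder IDs, not the real reference materials.
references = [f"ref_{i:02d}" for i in range(16)]
calibration, holdout = split_references(references, n_holdout=6)
print(len(calibration), len(holdout))  # 10 6
```

Whatever the selection rule, the property that matters is that no holdout row is ever seen during calibration, which is what the disjoint split enforces.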

2. Scenario stress test

The same layer was then tested across different battery decision frames instead of only a single blended score.

Energy-dense cobalt-free screen
High-voltage frontier screen
Fast-charge transport screen
Cycle-life conservative screen
Immediate build handoff

Important scope note: this benchmark validates FluxMateria's battery-native decision layer as a screening, triage, and prototype-handoff engine. It does not claim complete electrochemical or wet-lab validation. The point is to reduce the search space to better-supported build candidates and make the next physical experiment sharper.

Calibrated Holdout Summary

The holdout result checks whether the battery layer is directionally and quantitatively aligned with known battery families before it is used for novel ranking.

Metric	Result	Interpretation
Family accuracy	1.0	Every holdout reference was assigned to the correct high-level battery family.
Capacity MAE	0.812 mAh/g	Specific-capacity estimates stayed close to nominal literature-aligned reference values.
Voltage MAE	0.149 V	Average-voltage prediction remained within a tight screening-grade range on holdout materials.
Transport MAE	0.1427	Transport and rate-readiness signals were directionally consistent with known family behavior.
Cycle-life MAE	0.0642	Cycle/degradation heuristics stayed close to the nominal reference scoring used for calibration.
Electrolyte MAE	0.09	Electrolyte compatibility stayed well aligned with the known chemistry tradeoffs in the holdout slice.
Interface MAE	0.0883	Interface-readiness scoring tracked the reference set closely enough for shortlist triage.
Cost / manufacturing MAE	0.0372 / 0.0765	Cost and practical build signals stayed stable enough to support the handoff layer.
Energy-rank Spearman	0.9429	The model preserved the energy-ordering structure of the holdout references.
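
The three metric types in the table (accuracy, MAE, Spearman rank correlation) follow standard definitions, sketched below in plain Python. The function names and the example values are illustrative assumptions. They are not FluxMateria's code or benchmark data.

```python
def family_accuracy(predicted, reference):
    """Fraction of rows whose predicted family label matches the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over all rows."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up energy ordering for six rows with one adjacent pair swapped.
reference_energy = [1, 2, 3, 4, 5, 6]
predicted_energy = [1, 2, 3, 4, 6, 5]
print(round(spearman(reference_energy, predicted_energy), 4))  # 0.9429
```

For six untied rows the rank formula reduces to 1 - S/35, where S is the sum of squared rank differences, so a single adjacent swap (S = 2) gives about 0.9429. The reported value is consistent with that pattern, although the source does not list per-row ranks.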

Scenario Alignment

The battery layer was then stressed across different engineering questions. All five public scenarios aligned with the intended family or material outcome.

Scenario	Primary metric	Observed top result	Margin	Why it matters
Energy-dense cobalt-free screen	battery_readiness_score	LiMnO2	5.2	The energy-focused screen lifted layered manganese oxide instead of collapsing back to cobalt-heavy chemistry.
High-voltage frontier	voltage_surrogate_V	LiNiPO4	0.2 V	The voltage layer correctly pushed very-high-voltage phosphate chemistry to the top when voltage itself was the question.
Fast-charge transport	rate_capability_proxy	LiMn2O4	0.006	The screen surfaced a 3D spinel transport leader, with Li4Ti5O12 essentially tied as the other valid 3D transport winner.
Cycle-life conservative	cycle_life_proxy	Li4Ti5O12	0.124	The long-life framing favored the most stability-oriented chemistry rather than the highest-energy candidate.
Immediate build handoff	prototype_handoff_priority_score	Li4Ti5O12	4.7	The handoff layer separated the best immediate prototype package from the highest-upside chemistry.
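
The source does not define the Margin column. One plausible reading is the gap between the top candidate and the runner-up on each scenario's primary metric. The sketch below works under that assumption, and it treats a scenario as aligned when the leader falls in a set of accepted outcomes. The names `top_with_margin` and `is_aligned`, the candidate labels, and the scores are all hypothetical.

```python
def top_with_margin(scores):
    """Return (leader, margin), where margin = best score - runner-up score.

    Assumptions: higher is better, and margin means the gap to the runner-up.
    """
    if len(scores) < 2:
        raise ValueError("need at least two candidates to compute a margin")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (leader, best), (_, second) = ranked[0], ranked[1]
    return leader, best - second


def is_aligned(leader, accepted_outcomes):
    """A scenario counts as aligned if its leader is one of the accepted outcomes."""
    return leader in accepted_outcomes


# Made-up scores for a single scenario; not benchmark data.
scores = {"candidate_a": 0.81, "candidate_b": 0.80, "candidate_c": 0.55}
leader, margin = top_with_margin(scores)
print(leader, round(margin, 3))
print(is_aligned(leader, {"candidate_a", "candidate_b"}))
```

Accepting a set of outcomes rather than a single name handles near-ties such as the fast-charge row, where the table reports a margin of 0.006 and names two valid winners.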

Pipeline Outcome

The same battery workflow produced different leaders as more battery-native logic was added. That is a feature, not a bug.

Bulk: LiNiO2
Interface: LiMnPO4
Battery-native: LiMnO2
Build: Li4Ti5O12

What this means: FluxMateria is not behaving like a single-score materials ranker. The battery layer changes its answer when the engineering question changes. That is exactly what a usable battery decision engine should do. Bulk energy density, interface readiness, balanced electrochemistry, and immediate prototyping are related questions, but they are not the same question.
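
A minimal way to see why the leader changes per stage is to rank one candidate pool on a different metric at each stage. The stage names below mirror the four labels above, but the metric names, candidates, and numbers are hypothetical and are not taken from the benchmark.

```python
def leaders_by_stage(candidates, stage_metric):
    """Map each stage to the candidate with the highest value on that stage's metric."""
    return {
        stage: max(candidates, key=lambda name: candidates[name][metric])
        for stage, metric in stage_metric.items()
    }


# Hypothetical candidate pool with made-up scores in [0, 1].
candidates = {
    "candidate_a": {"bulk_energy": 0.9, "interface": 0.4, "balanced": 0.6, "build": 0.3},
    "candidate_b": {"bulk_energy": 0.6, "interface": 0.8, "balanced": 0.5, "build": 0.5},
    "candidate_c": {"bulk_energy": 0.7, "interface": 0.6, "balanced": 0.8, "build": 0.4},
    "candidate_d": {"bulk_energy": 0.3, "interface": 0.5, "balanced": 0.6, "build": 0.9},
}

# Assumed one-metric-per-stage mapping; the real stages may combine several signals.
stage_metric = {
    "Bulk": "bulk_energy",
    "Interface": "interface",
    "Battery-native": "balanced",
    "Build": "build",
}

leaders = leaders_by_stage(candidates, stage_metric)
for stage, name in leaders.items():
    print(f"{stage}: {name}")
```

A single blended score would collapse these four columns into one ordering and report one winner. Keeping the stages separate is what lets each engineering question surface its own leader.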

Download Benchmark Package

Public benchmark materials for independent review and reuse.

Battery benchmark summary JSON

Headline holdout metrics, scenario alignment, and pipeline winner summary.

Public benchmark summary

Reader-facing markdown summary of the holdout and scenario benchmark.

Battery case study

End-to-end application of the same battery layer inside a real candidate-ranking workflow.

Scope and Limitations

What this benchmark supports

Battery-family-aware triage instead of generic materials ranking
Directionally useful screening across energy, voltage, transport, cycle, and build-readiness questions
Fast shortlist compression for prototype planning
Decision-layer validation before lab work

What this benchmark does not claim

It is not a substitute for real electrochemical testing.
It does not prove commercial superiority of any one candidate.
It does not replace cell build, cycling, safety, or manufacturing validation.
It does not document the full internal workflow; the public benchmark is intentionally narrower.
