Public battery cell test data, harmonized and sealed in one schema (Parquet + JSON).
celljar reads raw files from 9 published sources - ORNL Leaf, HNEI Kollmeyer, MATR (Severson 2019), CLO (Attia 2020), BILLS eVTOL, MOHTAT 2021, NASA PCoE, SNL Preger, Naumann - and writes them to a canonical schema with four entities: cell_metadata + test_metadata (JSON), timeseries + cycle_summary (Parquet). Query all sources via one SQL statement (DuckDB / pandas / Polars).
Scope: harmonization only. celljar focuses on measurements - unit conversion and schema normalization. It deliberately leaves fitting and modeling to downstream tools that specialize in those steps.
The full harmonized bundle lives at huggingface.co/datasets/mihnathul/celljar. Query it directly - no clone needed:
import duckdb
df = duckdb.sql("""
SELECT * FROM 'https://huggingface.co/datasets/mihnathul/celljar/resolve/main/timeseries.parquet'
WHERE test_id = 'ORNL_LEAF_2013_HPPC_25C'
""").df()Pandas and Polars work the same way against the HuggingFace URL.
Browser viewer - clone the repo (a PyPI release is on the roadmap):
git clone https://github.com/mihnathul/celljar.git
cd celljar
pip install -e ".[viewer]"
streamlit run apps/viewer.py # fetches from HuggingFace by defaultPin a release for reproducibility: CELLJAR_HF_REVISION=v0.2.1 streamlit run apps/viewer.py.
Regenerate locally from raw sources: same setup, then python examples/demo_end_to_end.py and CELLJAR_LOCAL=1 streamlit run apps/viewer.py.
| Source | Chemistry | Cells | Test types | Raw data |
|---|---|---|---|---|
| ORNL Leaf 2013 | mixed (LMO/NCA pouch) | 1 | HPPC × 3 temperatures | bundled |
| HNEI (Kollmeyer) | NCA (Panasonic NCR18650PF) | 1 | HPPC, drive cycle, capacity_check, cycle_aging | download |
| MATR (Severson 2019) | LFP (A123 18650) | 119 | Cycling-to-failure | download |
| CLO (Attia 2020) | LFP (A123 18650) | 45 | Cycling, BO-optimized fast-charge | download |
| BILLS / eVTOL (Bills 2023) | NMC (Sony US18650VTC6) | 22 | Drive cycle (flight-duty) + RPTs | download |
| MOHTAT (Mohtat 2021) | NMC (UMich NMC532 pouch) | 31 | Cycle aging + synchronous expansion | download |
| NASA PCoE | LCO (vendor undisclosed, 2.0 Ah 18650) | 34 | Cycle aging | download |
| SNL Preger 2020 | LFP / NMC / NCA grid (18650) | 87 | Cycle aging across T × DoD × C-rate | download |
| Naumann 2018/2020 | LFP / graphite | 17 calendar + 17 cycle | Calendar + cycle aging (summary-only) | download |
Four entities joined by cell_id and test_id:
cell_metadata.json hardware (chemistry, capacity, form factor)
test_metadata.json protocol, SOH, provenance, license
timeseries.parquet V / I / T per-sample + signed running coulomb count (∫I dt)
cycle_summary.parquet per-cycle aggregates (capacity, R_DC, …) for aging studies
Conventions: SI units. Timestamps relative. Missing data is explicit null. Current is positive = charge (into the cell), negative = discharge.
Authoritative field list + types in schemas/ (JSON Schema). Pandera mirrors at runtime in celljar/harmonize/harmonize_schema.py.
-- Single test's timeseries
SELECT timestamp_s, voltage_V, current_A, temperature_C
FROM 'data/harmonized/timeseries.parquet'
WHERE test_id = 'ORNL_LEAF_2013_HPPC_25C'
ORDER BY timestamp_s;-- Cross-source filter - same query works across all sources
SELECT cell_id, test_id, temperature_C_min
FROM 'data/harmonized/tests/*.json'
WHERE test_type = 'hppc' AND temperature_C_min = 25;Same patterns from Python via duckdb.sql(...).df() or pl.read_parquet(..., filters=[...]).
Parameterization · modeling · aging studies · cross-source analysis.
Out of scope: field/fleet telemetry; ML cycling-life prediction (use BatteryLife (KDD 2025) - 990 cells, 18 baselines). OCV/R0 extractors, ECM/SPM/DFN fitting, ML modeling all live in separate companion repos.
celljar tries to fit alongside, not replace, the other excellent tools in this space:
- Battery Data Commons - registry indexing 300+ public battery datasets. Great for discovery; celljar complements it by providing a harmonized data layer for a subset of those sources.
- Iontech (Shiyun Liu) - curated index of open-source battery monitoring & modeling datasets (RWTH home-storage, NREL failure databank, Stanford second-life, etc.) with paper links. Another good starting point for discovering datasets celljar hasn't yet harmonized.
- BatteryLife / BatteryML - cycling-to-failure ML benchmark (KDD 2025). Optimized for lifetime-prediction ML; celljar keeps the full V/I/T timeseries that physics-based parameterization (ECM/SPM/DFN) needs.
- More sources (CALCE, RWTH, HUST, Tongji, XJTU; Ecker 2015 + Chen 2020 for DFN parameterization)
- PyPI release (
pip install celljar) - SOH methodology iteration
- BDF-export converter
See CONTRIBUTING.md. Issues, ideas, and PRs welcome.
The science here belongs to the original authors; celljar simply puts their data in one place with a shared schema. Please cite their papers when you use the data, and, if it's helpful, celljar alongside.
- celljar code (this repository): MIT (
LICENSE). - Harmonized bundle (packaging, schema, derived fields): CC-BY-4.0.
- Upstream raw data retains each publisher's original license - see per-source provenance in
data/raw/<source>/.
To make attribution easy, every test_metadata row carries its own source_doi, source_citation, source_license, and source_license_url. You can pull the references for any analysis with one query:
import duckdb
duckdb.sql("""
SELECT DISTINCT source_doi, source_citation, source_license
FROM 'data/harmonized/tests/*.json'
WHERE test_id IN ('ORNL_LEAF_2013_HPPC_25C', 'HNEI_NCA_HPPC_25C')
""").df()If you'd like to cite celljar:
@software{celljar,
author = {Mihna Neerulpan},
title = {celljar: Public Battery Test Dataset Harmonization with a Canonical Schema},
year = {2026},
url = {https://github.com/mihnathul/celljar},
}celljar exists because of the labs and authors who designed, ran, and openly published these experiments - work that took years of careful instrumentation and analysis. Thank you to:
Phillip Kollmeyer (HNEI) · G. Wiggins, S. Allu, H. Wang (ORNL) · K. Severson, P. Attia et al. (MATR, CLO; Stanford / MIT / TRI) · A. Bills et al. (BILLS; CMU) · P. Mohtat et al. (UMich) · B. Saha, K. Goebel (NASA PCoE) · Y. Preger et al. (Sandia) · M. Naumann et al. (TUM) · M. Ecker et al. (RWTH Aachen)