🔬 LeMat-GenBench: A Unified Benchmark for Generative Models of Crystalline Materials

Generative machine learning models hold great promise for accelerating materials discovery, particularly through the inverse design of inorganic crystals, which enables an unprecedented exploration of chemical space. Yet without a standardized evaluation framework, it is difficult to evaluate, compare, and further develop these models in a meaningful way.

LeMat-GenBench introduces a unified benchmark for generative models of crystalline materials, with standardized evaluation metrics for meaningful model comparison, diverse tasks, and this leaderboard to encourage and track community progress.

📄 Paper: arXiv | 💻 Code: GitHub | 📧 Contact: siddharth.betala-ext [at] entalpic.ai, alexandre.duval [at] entalpic.ai

Group	Metrics	Direction
Validity	Valid, Charge Neutral, Distance Valid, Plausibility Valid	↑ Higher is better
Uniqueness & Novelty	Unique, Novel	↑ Higher is better
Energy Metrics	E Above Hull, Formation Energy, Relaxation RMSD (with std)	↓ Lower is better
Stability	Stable, Unique in Stable, SUN	↑ Higher is better
Metastability	Metastable, Unique in Metastable, MSUN	↑ Higher is better
Distribution	JS Distance, MMD, FID	↓ Lower is better
Diversity	Element, Space Group, Atomic Site, Crystal Size	↑ Higher is better
HHI	HHI Production, HHI Reserve	↓ Lower is better
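
The direction column above determines how any two scores should be compared. A minimal sketch of that rule in Python follows; the names `METRIC_GROUPS`, `DIRECTION`, `is_better`, and `rank` are hypothetical helpers introduced here for illustration and are not part of the benchmark's code. The metric names are transcribed from the table, and the example scores in the usage note are made up.

```python
# Metric groups and their directions, transcribed from the table above.
# "higher" means larger values are better; "lower" means smaller values are better.
METRIC_GROUPS = {
    "Validity": (["Valid", "Charge Neutral", "Distance Valid", "Plausibility Valid"], "higher"),
    "Uniqueness & Novelty": (["Unique", "Novel"], "higher"),
    "Energy Metrics": (["E Above Hull", "Formation Energy", "Relaxation RMSD"], "lower"),
    "Stability": (["Stable", "Unique in Stable", "SUN"], "higher"),
    "Metastability": (["Metastable", "Unique in Metastable", "MSUN"], "higher"),
    "Distribution": (["JS Distance", "MMD", "FID"], "lower"),
    "Diversity": (["Element", "Space Group", "Atomic Site", "Crystal Size"], "higher"),
    "HHI": (["HHI Production", "HHI Reserve"], "lower"),
}

# Flatten to a metric -> direction lookup.
DIRECTION = {
    metric: direction
    for metrics, direction in METRIC_GROUPS.values()
    for metric in metrics
}


def is_better(metric: str, a: float, b: float) -> bool:
    """Return True if score `a` beats score `b` on `metric`."""
    return a > b if DIRECTION[metric] == "higher" else a < b


def rank(metric: str, scores: dict[str, float]) -> list[str]:
    """Return model names sorted best-first for `metric`."""
    return sorted(scores, key=scores.get, reverse=(DIRECTION[metric] == "higher"))
```

For example, with made-up scores, `rank("FID", {"model_a": 0.4, "model_b": 0.1})` puts `model_b` first because lower FID is better, while `rank("Novel", ...)` would sort in the opposite order.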

GenBench Leaderboard

MatterGen 📄 ⚡	MP-20	95.7%	95.1%	70.5%	2.0%	33.4%	0.2%	15.0%	0.183400	-0.702000	0.387800
PLaID++ 📄 ⚡	MP-20	96.0%	77.8%	24.2%	12.4%	60.7%	1.0%	7.6%	0.085400	-0.497500	0.128600
DiffCSP 📄	MP-20	95.7%	94.8%	66.2%	2.3%	29.8%	0.1%	8.5%	0.274700	-0.636700	0.585700
WyFormer-DFT 📄 ⚡	MP-20	95.2%	95.0%	66.4%	3.7%	24.8%	0.4%	7.8%	0.270800	-0.666000	0.417300
CrystaLLM-pi 📄 ✅	LeMat-Bulk	93.6%	93.0%	26.4%	2.1%	31.6%	0.3%	7.4%	0.269100	-0.083400	0.219300
DiffCSP++ 📄	MP-20	95.3%	95.1%	62.0%	1.0%	26.4%	0.2%	5.0%	0.409300	-0.518900	0.693300
SymmCD 📄	MP-20	73.4%	73.0%	47.0%	1.4%	18.6%	0.1%	2.4%	0.876100	-0.019500	0.872000
LLaMat2 📄		84.4%	81.4%	30.0%	0.7%	34.7%	0.1%	2.1%	0.439500	-0.472600	0.536300
WyFormer 📄 ⚡	MP-20	93.4%	93.0%	66.4%	0.5%	15.7%	0.1%	1.9%	0.498800	-0.430600	0.812100
CrystalFormer 📄	MP-20	69.9%	69.4%	31.8%	1.4%	28.8%	0.0%	3.1%	0.703900	-0.168900	0.658500
ADiT 📄	MP-20	90.6%	87.8%	26.0%	0.4%	36.5%	0.0%	1.0%	0.333300	-0.733900	0.379400
CrystalFlow 📄		80.7%	80.7%	69.4%	0.2%	4.0%	0.0%	0.4%	0.966700	0.060900	1.182200
LLaMat3 📄		15.4%	15.2%	10.5%	0.1%	2.1%	0.0%	0.2%	1.706700	0.742300	1.005700
Crystal-GFN 📄	MP-20	51.7%	51.7%	51.7%	0.0%	0.0%	0.0%	0.0%	2.085800	-1.303600	1.866500
AFLOW ◆		91.4%	91.4%	21.5%	9.3%	30.4%	0.0%	0.7%	0.345400	-0.189500	0.123800
Alexandria ★		93.4%	93.4%	0.0%	2.3%	27.4%	0.0%	0.0%	0.438800	-0.183200	0.142500
OQMD ★		96.8%	96.4%	0.0%	5.3%	29.1%	0.0%	0.0%	0.393800	-0.227500	0.140900

Symbol Legend:

📄 Paper available (click to view)
✅ Model output verified
⚡ Structures were already relaxed
★ Contributes to LeMat-Bulk reference dataset (in-distribution)
◆ Out-of-distribution relative to LeMat-Bulk reference dataset

A verified submission means the results were produced from a submitted model rather than from submitted CIF files.

Models marked as baselines appear below the separator line at the bottom of the table.
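
When working with a text export of the leaderboard, the symbols in the first cell of each row can be separated from the model name using the legend above. The sketch below is one way to do that; the flag names in `LEGEND` and the function `parse_model_cell` are hypothetical and chosen here for illustration only.

```python
# Symbol -> flag name, following the Symbol Legend above.
# The flag names are illustrative assumptions, not identifiers from the benchmark.
LEGEND = {
    "📄": "paper_available",
    "✅": "verified",
    "⚡": "pre_relaxed",
    "★": "in_distribution_reference",
    "◆": "out_of_distribution_reference",
}


def parse_model_cell(cell: str) -> tuple[str, dict[str, bool]]:
    """Split a cell such as 'MatterGen 📄 ⚡' into (model name, flags)."""
    flags = {name: False for name in LEGEND.values()}
    name_parts = []
    for token in cell.split():
        # Some sources append an emoji variation selector (U+FE0F); drop it.
        token = token.replace("\ufe0f", "")
        if token in LEGEND:
            flags[LEGEND[token]] = True
        elif token:
            name_parts.append(token)
    return " ".join(name_parts), flags


name, flags = parse_model_cell("MatterGen 📄 ⚡")
print(name, [flag for flag, is_set in flags.items() if is_set])
```

Keeping the legend in a single dictionary means a new symbol only needs one added entry, and unknown tokens are simply treated as part of the model name.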