[1] Cerqueira, T. F. et al. Identification of Novel Cu, Ag, and Au Ternary Oxides from
Global Structural Prediction. Chemistry of Materials 27, 4562–4573 (2015).
[2] Zhu, B. & Scanlon, D. O. Predicting Lithium Iron Oxysulfides for Battery Cathodes.
ACS Applied Energy Materials 5, 575–584 (2022).
[3] Harper, A. F., Evans, M. L. & Morris, A. J. Computational Investigation of Copper
Phosphides as Conversion Anodes for Lithium-Ion Batteries. Chemistry of Materials
32, 6629–6639 (2020).
[4] Oganov, A. R., Pickard, C. J., Zhu, Q. & Needs, R. J. Structure prediction drives
materials discovery. Nature Reviews Materials 4, 331–348 (2019).
[5] Oganov, A. R. Modern Methods of Crystal Structure Prediction (John Wiley & Sons,
2011).
[6] Pickard, C. J. & Needs, R. J. High-Pressure Phases of Silane. Physical Review Letters
97, 045504 (2006).
[7] Pickard, C. J. & Needs, R. J. Ab initio random structure searching. Journal of Physics:
Condensed Matter 23, 053201 (2011).
[8] Oganov, A. R. & Glass, C. W. Crystal structure prediction using ab initio evolutionary
techniques: Principles and applications. The Journal of Chemical Physics
124, 244704 (2006).
[9] Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine
learning for molecular and materials science. Nature 559, 547–555 (2018).
[10] Podryabinkin, E. V., Tikhonov, E. V., Shapeev, A. V. & Oganov, A. R. Accelerating
crystal structure prediction by machine-learning interatomic potentials with active
learning. Physical Review B 99, 064114 (2019).
[11] Choudhary, K. et al. Recent advances and applications of deep learning methods in
materials science. npj Computational Materials 8, 59 (2022).
[12] Goodfellow, I. et al. Generative Adversarial Nets. In Ghahramani, Z.,
Welling, M., Cortes, C., Lawrence, N. & Weinberger, K. (eds.) Advances
in Neural Information Processing Systems, vol. 27 (Curran Associates, Inc.,
2014). URL https://proceedings.neurips.cc/paper_files/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf.
[13] Court, C. J., Yildirim, B., Jain, A. & Cole, J. M. 3-D Inorganic Crystal Structure
Generation and Property Prediction via Representation Learning. Journal of
Chemical Information and Modeling 60, 4518–4535 (2020).
[14] Xie, T., Fu, X., Ganea, O.-E., Barzilay, R. & Jaakkola, T. Crystal Diffusion
Variational Autoencoder for Periodic Material Generation. arXiv preprint
arXiv:2110.06197 (2021).
[15] Yan, D., Smith, A. D. & Chen, C.-C. Structure prediction and materials design with
generative neural networks. Nature Computational Science 3, 572–574 (2023).
[16] Alverson, M. et al. Generative adversarial networks and diffusion models in material
discovery. Digital Discovery 3, 62–80 (2024).
[17] Chen, L., Zhang, W., Nie, Z., Li, S. & Pan, F. Generative models for inverse design
of inorganic solid materials. Journal of Materials Informatics 1, 4 (2021).
[18] Cao, Y. et al. A Comprehensive Survey of AI-Generated Content (AIGC): A History
of Generative AI from GAN to ChatGPT. arXiv preprint arXiv:2303.04226 (2023).
[19] Vaswani, A. et al. Attention Is All You Need. Advances in Neural Information
Processing Systems 30 (2017).
[20] Radford, A., Narasimhan, K., Salimans, T., Sutskever, I. et al. Improving
Language Understanding by Generative Pre-Training. Tech. Rep.,
OpenAI (2018). URL https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
[21] Introducing ChatGPT. https://openai.com/blog/chatgpt. OpenAI Blog.
Accessed: 2024-10-07.
[22] Liu, Y. et al. Generative artificial intelligence and its applications in materials science:
Current situation and future perspectives. Journal of Materiomics 9, 798–816
(2023).
[23] Bran, A. M., Cox, S., White, A. D. & Schwaller, P. ChemCrow: Augmenting large-
language models with chemistry tools. arXiv preprint arXiv:2304.05376 (2023).
[24] Jablonka, K. M., Schwaller, P., Ortega-Guerrero, A. & Smit, B. Leveraging large
language models for predictive chemistry. Nature Machine Intelligence 6, 161–169
(2024).
[25] Xie, T. et al. Large Language Models as Master Key: Unlocking the Secrets of
Materials Science with GPT. arXiv preprint arXiv:2304.02213 (2023).
[26] Fu, N. et al. Material transformers: deep learning language models for generative
materials design. Machine Learning: Science and Technology 4, 015001 (2023).
[27] Jablonka, K. M. et al. 14 examples of how LLMs can transform materials science
and chemistry: a reflection on a large language model hackathon. Digital Discovery
2, 1233–1250 (2023).
[28] Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research
with large language models. Nature 624, 570–578 (2023).
[29] Flam-Shepherd, D. & Aspuru-Guzik, A. Language models can generate molecules,
materials, and protein binding sites directly in three dimensions as XYZ, CIF, and
PDB files. arXiv preprint arXiv:2305.05708 (2023).
[30] Hall, S. R., Allen, F. H. & Brown, I. D. The crystallographic information file (CIF):
a new standard archive file for crystallography. Acta Crystallographica Section A:
Foundations of Crystallography 47, 655–685 (1991).
[31] Chen, M. et al. Generative Pretraining from Pixels. In International Conference on
Machine Learning, 1691–1703 (PMLR, 2020).
[32] Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. Graph Networks as a Universal
Machine Learning Framework for Molecules and Crystals. Chemistry of Materials
31, 3564–3572 (2019).
[33] Toshniwal, S., Wiseman, S., Livescu, K. & Gimpel, K. Chess as a Testbed for
Language Model State Tracking. In Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 36, 11385–11393 (2022).
[34] Li, K. et al. Emergent World Representations: Exploring a Sequence Model Trained
on a Synthetic Task. In The Eleventh International Conference on Learning Representations
(2023). URL https://openreview.net/forum?id=DeG07_TcZvT.
[35] Coulom, R. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search.
In International Conference on Computers and Games, 72–83 (Springer, 2006).
[36] Browne, C. B. et al. A Survey of Monte Carlo Tree Search Methods. IEEE Transactions
on Computational Intelligence and AI in Games 4, 1–43 (2012).
[37] Brown, T. et al. Language Models are Few-Shot Learners. Advances in Neural
Information Processing Systems 33, 1877–1901 (2020).
[38] Antunes, L. M., Grau-Crespo, R. & Butler, K. T. Distributed representations of
atoms and materials for machine learning. npj Computational Materials 8, 44 (2022).
[39] Onwuli, A., Hegde, A. V., Nguyen, K. V., Butler, K. T. & Walsh, A. Element
similarity in high-dimensional materials representations. Digital Discovery 2, 1558–
1564 (2023).
[40] Jiao, R. et al. Crystal Structure Prediction by Joint Equivariant Diffusion. arXiv
preprint arXiv:2309.04475 (2023).
[41] Jiao, R., Huang, W., Liu, Y., Zhao, D. & Liu, Y. Space Group Constrained Crystal
Generation. arXiv preprint arXiv:2402.03992 (2024).
[42] Yang, M. et al. Scalable Diffusion for Materials Generation. arXiv preprint
arXiv:2311.09235 (2023).
[43] Gruver, N. et al. Fine-Tuned Language Models Generate Stable Inorganic Materials
as Text. arXiv preprint arXiv:2402.04379 (2024).
[44] Touvron, H. et al. LLaMA: Open and Efficient Foundation Language Models. arXiv
preprint arXiv:2302.13971 (2023).
[45] Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T. & Ronneberger, O. 3D U-Net:
Learning Dense Volumetric Segmentation from Sparse Annotation. In Medical Image
Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International
Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II 19, 424–432
(Springer, 2016).
[46] Ho, J. et al. Video Diffusion Models. Advances in Neural Information Processing
Systems 35, 8633–8646 (2022).
[47] Castelli, I. E. et al. New cubic perovskites for one- and two-photon water splitting
using the computational materials repository. Energy & Environmental Science 5,
9034–9043 (2012).
[48] Castelli, I. E. et al. Computational screening of perovskite metal oxides for optimal
solar light capture. Energy & Environmental Science 5, 5814–5819 (2012).
[49] Pickard, C. J. AIRSS Data for Carbon at 10GPa and the C+N+H+O System
at 1GPa. https://archive.materialscloud.org/record/2020.0026/v1
(2020).
[50] Jain, A. et al. Commentary: The Materials Project: A materials genome approach
to accelerating materials innovation. APL Materials 1, 011002 (2013).
[51] Baird, S. mp-time-split. https://github.com/sparks-baird/mp-time-split (Accessed in 2024).
[52] Mazet, T., Welter, R. & Malaman, B. A study of the new ferromagnetic YbMn6Sn6
compound by magnetization and neutron diffraction measurements. Journal of Magnetism
and Magnetic Materials 204, 11–19 (1999).
[53] Pamplin, B. A systematic method of deriving new semiconducting compounds by
structural analogy. Journal of Physics and Chemistry of Solids 25, 675–684 (1964).
[54] Davies, D. W. et al. Computational Screening of All Stoichiometric Inorganic Materials.
Chem 1, 617–627 (2016).
[55] Zagorac, D., Müller, H., Ruehl, S., Zagorac, J. & Rehme, S. Recent developments
in the Inorganic Crystal Structure Database: theoretical crystal structure data and
related features. Journal of Applied Crystallography 52, 918–925 (2019).
[56] Hyde, P. et al. Lithium Intercalation into the Excitonic Insulator Candidate
Ta2NiSe5. Inorganic Chemistry 62, 12027–12037 (2023).
[57] Ponou, S., Lidin, S. & Mudring, A.-V. Optimization of Chemical Bonding through
Defect Formation and Ordering–The Case of Mg7Pt4Ge4. Inorganic Chemistry 62,
8519–8529 (2023).
[58] González-López, J., Cockcroft, J. K., Fernández-González, A., Jimenez, A. & Grau-Crespo, R.
Crystal structure of cobalt hydroxide carbonate Co2CO3(OH)2: density
functional theory and X-ray diffraction investigation. Acta Crystallographica Section
B: Structural Science, Crystal Engineering and Materials 73, 868–873 (2017).
[59] Speech Understanding Systems. Summary of Results of the Five-Year Research Effort
at Carnegie-Mellon University. Tech. Rep. 1529, Carnegie-Mellon Univ Pittsburgh
PA Dept Of Computer Science (1977).
[60] Chaffin, A., Claveau, V. & Kijak, E. PPL-MCTS: Constrained Textual Generation
Through Discriminator-Guided MCTS Decoding. In Carpuat, M., de Marneffe, M.
& Ruíz, I. V. M. (eds.) Proceedings of the 2022 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies,
NAACL 2022, Seattle, WA, United States, July 10–15, 2022, 2953–2967
(Association for Computational Linguistics, 2022).
[61] Rosin, C. D. Multi-armed Bandits with Episode Context. Annals of Mathematics
and Artificial Intelligence 61, 203–230 (2011).
[62] Silver, D. et al. Mastering the game of Go with deep neural networks and tree search.
Nature 529, 484–489 (2016).
[63] Choudhary, K. & DeCost, B. Atomistic Line Graph Neural Network for improved
materials property predictions. npj Computational Materials 7, 185 (2021).
[64] Kusaba, M., Liu, C. & Yoshida, R. Crystal structure prediction with machine
learning-based element substitution. Computational Materials Science 211, 111496
(2022).
[65] Wei, L. et al. TCSP: a Template-Based Crystal Structure Prediction Algorithm for
Materials Discovery. Inorganic Chemistry 61, 8431–8439 (2022).
[66] Fredericks, S., Parrish, K., Sayre, D. & Zhu, Q. PyXtal: A Python library for crystal
structure generation and symmetry analysis. Computer Physics Communications
261, 107810 (2021).
[67] Avery, P. & Zurek, E. RandSpg: An open-source program for generating atomistic
crystal structures with specific spacegroups. Computer Physics Communications
213, 208–216 (2017).
[68] Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85
(2023).
[69] Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press,
2018).
[70] Ziegler, D. M. et al. Fine-Tuning Language Models from Human Preferences. arXiv
preprint arXiv:1909.08593 (2019).
[71] Illustrating Reinforcement Learning from Human Feedback (RLHF).
https://huggingface.co/blog/rlhf. Accessed: 2023-07-05.
[72] Kang, S. et al. Accelerated identification of equilibrium structures of multicomponent
inorganic crystals using machine learning potentials. npj Computational Materials
8, 108 (2022).
[73] Chen, C. & Ong, S. P. A universal graph deep learning interatomic potential for the
periodic table. Nature Computational Science 2, 718–728 (2022).
[74] Pausewang, G. & Rüdorff, W. Über Alkali-oxofluorometallate der Übergangsmetalle.
A′3MeOxF6-x-Verbindungen mit x = 1, 2, 3. Zeitschrift für anorganische und allgemeine
Chemie 364, 69–87 (1969).
[75] Hegde, V. I. et al. Quantifying uncertainty in high-throughput density functional
theory: A comparison of AFLOW, Materials Project, and OQMD. Physical Review
Materials 7, 053805 (2023).
[76] Ye, W., Lei, X., Aykol, M. & Montoya, J. H. Novel inorganic crystal structures
predicted using autonomous simulation agents. Scientific Data 9, 302 (2022).
[77] Antunes, L. M. et al. Machine Learning Approaches for Accelerating the Discovery of
Thermoelectric Materials. In Machine Learning in Materials Informatics: Methods
and Applications, 1–32 (ACS Publications, 2022).
[78] Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials Design and
Discovery with High-Throughput Density Functional Theory: The Open Quantum
Materials Database (OQMD). JOM 65, 1501–1509 (2013).
[79] Draxl, C. & Scheffler, M. The NOMAD laboratory: from data sharing to artificial
intelligence. Journal of Physics: Materials 2, 036001 (2019).
[80] Ong, S. P. et al. Python Materials Genomics (pymatgen): A robust, open-source
python library for materials analysis. Computational Materials Science 68, 314–319
(2013).
[81] Liu, P. J. et al. Generating Wikipedia by Summarizing Long Sequences. In 6th
International Conference on Learning Representations, ICLR 2018, Vancouver, BC,
Canada, April 30 - May 3, 2018, Conference Track Proceedings (2018).
[82] Togo, A. & Tanaka, I. Spglib: a software library for crystal symmetry search. arXiv
preprint arXiv:1808.01590 (2018).
[83] Ward, L. et al. Matminer: An open source toolkit for materials data mining. Computational
Materials Science 152, 60–69 (2018).
[84] Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made
simple. Physical Review Letters 77, 3865 (1996).
[85] Jain, A. et al. A high-throughput infrastructure for density functional theory calculations.
Computational Materials Science 50, 2295–2310 (2011).
[86] Horton, M. et al. Crystal Toolkit: A Web App Framework to Improve Usability and
Accessibility of Materials Science Research Algorithms. arXiv preprint
arXiv:2302.06147 (2023).
[87] Antunes, L., Butler, K. & Grau-Crespo, R. Supporting data for: Crystal Structure
Generation with Autoregressive Large Language Modeling (2024). URL https://doi.org/10.5281/zenodo.10642388.
[88] Creative Commons Attribution 4.0 License. https://creativecommons.org/licenses/by/4.0/.
Accessed: 2023-06-26.
[89] Antunes, L. lantunes/CrystaLLM: CrystaLLM v1.0 (2024). URL https://doi.org/10.5281/zenodo.13883399.