AI Molecular Dynamics Dataset for Electrochemical Interfaces
Source: nature.com
Understanding Electrochemical Interfaces
Understanding atomic-scale structures at electrochemical interfaces is important for advancing electrochemistry research and applications. Although experiments offer detailed microscopic insights, their complexity and inefficiency can limit large-scale data generation. Computational methods, such as ab initio molecular dynamics and machine learning-accelerated molecular dynamics, provide an efficient way to gain microscopic information.
However, computational interface studies often share research data in isolation via private repositories. This has resulted in fragmented knowledge, reduced data accessibility, and limited opportunities for cross-study comparisons. To address these issues, ElectroFace, an artificial intelligence-accelerated ab initio molecular dynamics dataset for electrochemical interfaces, has been introduced. ElectroFace compiles, visualizes, and provides open access to interface data to promote collaboration and accelerate progress.
The Significance of Electrochemical Interfaces
Electrochemical interfaces are found everywhere in nature and are important in geochemistry, energy, environmental chemistry, and materials science. For example, in geochemistry, the stability of colloidal clay suspensions and ion adsorption are important for the geological disposal of nuclear waste. Understanding the structures and protonation states of clay edges in contact with water under different pH conditions is crucial.
In hydrogen production, water reduction and oxidation reactions occur at oxide-water or metal-water interfaces. Understanding their reaction mechanisms depends largely on a microscopic understanding of the interface structures. However, obtaining an atomic-scale picture of interfaces is challenging because of their complexity.
Experimental and Theoretical Methods
Experimental methods like X-ray reflectivity can reveal the structures of surface terminations and hydration layers for electrochemical interfaces. However, these methods cannot directly detect hydrogen atoms due to their low masses and charge densities, resulting in a loss of information about the hydrogen bond network in interfacial water.
Vibrational spectroscopy methods, such as infrared and Raman spectroscopy, can examine the OH vibrational mode, which is related to the strength of hydrogen bonding. However, these techniques face challenges in directly investigating interfacial water due to signal interference from bulk water. Theoretical simulation techniques like molecular dynamics (MD) can directly provide detailed insight into interface structures.
A limitation of MD simulations using classical force fields is their inability to accurately describe the interactions between atoms at interfaces. The ab initio molecular dynamics (AIMD) method addresses this by treating the solid and liquid phases at the same level of electronic-structure theory while accounting for water dynamics. However, the high computational cost of AIMD restricts the accessible time scales for interfaces to tens of picoseconds, which is insufficient for equilibrating interface structures.
Machine learning potential methods extend the time scale of AIMD simulations to the nanosecond scale while maintaining ab initio accuracy. This approach is called machine learning-accelerated molecular dynamics (MLMD) or artificial intelligence-accelerated ab initio molecular dynamics (AI2MD).
ElectroFace Dataset
An AI2MD dataset for electrochemical interfaces (ElectroFace) has been created, containing over 60 distinct AIMD and MLMD trajectories. The dataset includes trajectories for the charge-neutral interfaces of 2D materials, zinc-blende-type semiconductors, oxides, and metals. Currently, MLMD trajectories are available for the Pt(111), SnO2(110), GaP(110), r-TiO2(110), CoO(100), and CoO(111) interfaces.
The electrochemical interfaces community can use this dataset to build interface models that include electric double layers or counter ions, prepare initial datasets for building machine learning potentials, serve as benchmarks of properties obtained at ab initio accuracy, and gain insight into solid-liquid interfaces by comparing different interfaces.
For simulating interfaces, initial structures are constructed by cleaving a bulk material along a selected facet to generate a slab-vacuum model. The slab is symmetric along the surface normal direction and stoichiometric. The slab thickness is determined through convergence tests of band alignment and water molecule adsorption energy. An orthorhombic box is created, filled with water molecules using the PACKMOL package to achieve a water density of 1 g/cm3, and equilibrated using classical MD simulations with the SPC/E force field.
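The water-filling step can be illustrated with a short calculation. The sketch below estimates how many water molecules an orthorhombic box needs to reach 1 g/cm3, which is the count one would then pass to a packing tool such as PACKMOL. The function name and the box dimensions are hypothetical and chosen only for illustration; they are not taken from ElectroFace.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
WATER_MOLAR_MASS = 18.015     # g/mol


def n_water_molecules(a, b, c, density=1.0):
    """Number of water molecules that fill an a x b x c box.

    a, b, c : box edge lengths in angstrom
    density : target mass density in g/cm^3
    """
    volume_cm3 = a * b * c * 1e-24            # 1 angstrom^3 = 1e-24 cm^3
    moles = density * volume_cm3 / WATER_MOLAR_MASS
    return round(moles * AVOGADRO)


# Hypothetical box dimensions (angstrom), for illustration only.
n = n_water_molecules(12.0, 12.0, 20.0)
print(n)  # 96
```

At this density each water molecule occupies roughly 30 cubic angstroms, so the count scales linearly with the box volume.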
The slab and water box are then merged to create an interface model. Surface under-coordinated atoms in the slab are saturated with adsorbed water molecules. A 5-picosecond AIMD simulation is performed to ensure the water density in the bulk regions is 1.0 g/cm3 within a 5% error margin. This step is repeated until the requirement is met. The last structure is used as the initial structure for a 20–30 ps AIMD simulation.
All AIMD trajectories are generated using the CP2K/QUICKSTEP code with the Perdew-Burke-Ernzerhof (PBE) functional and Grimme D3 dispersion correction. The orbitals are represented in a Gaussian-type double-ζ basis with one set of polarization functions (DZVP), and core electrons are described by analytic Goedecker-Teter-Hutter (GTH) pseudopotentials. MD simulations are performed in the NVT ensemble at 330 K with a 0.5 fs time step using a Nosé-Hoover thermostat.
MLMD trajectories are generated using the LAMMPS code with machine learning potentials (MLPs) trained using the DeePMD-kit code and the concurrent learning packages DP-GEN and ai2-kit. Initial datasets are prepared by extracting 50–100 structures from an AIMD trajectory and expanded through iterative processes of Training, Exploration, Screening, and Labeling. The iterative process ends when 99% of sampled structures are categorized into the good group over two consecutive iterations.
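The stopping rule for the iterative process can be written down compactly. The sketch below assumes each iteration yields one label per sampled structure; only the "good" label comes from the description above, while the other label name, the function names, and the example fractions are hypothetical. It does not reproduce the DP-GEN or ai2-kit code.

```python
def good_fraction(labels):
    """Fraction of sampled structures labeled "good" in one iteration."""
    return sum(1 for label in labels if label == "good") / len(labels)


def has_converged(history, threshold=0.99, consecutive=2):
    """True when the last `consecutive` iterations all reach the threshold.

    history : per-iteration good fractions, oldest first
    """
    if len(history) < consecutive:
        return False
    return all(fraction >= threshold for fraction in history[-consecutive:])


# Hypothetical good fractions from four iterations.
history = [0.62, 0.91, 0.993, 0.995]
print(has_converged(history))  # True

# One iteration with 99 "good" structures and one other (hypothetical label).
print(good_fraction(["good"] * 99 + ["poor"]))
```

Requiring two consecutive iterations guards against stopping on a single exploration run that happened to sample only familiar configurations.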
ElectroFace includes 69 trajectories of charge-neutral aqueous interfaces and associated AIMD/MLMD input files. Force and velocity information are included where applicable. For MLMD trajectories, machine learning potentials and training datasets are provided. All data within ElectroFace can be accessed via https://dataverse.ai4ec.ac.cn/.
Individual entries are named as “IF-
The AIMD and MLMD methods have been applied to various solid-liquid interfaces. For the interfaces in ElectroFace, DFT parameters and machine learning potentials were validated in their respective papers. Water densities in interface models and proton transfer events at interfaces were also validated. An algorithm for automatic detection of proton transfer events is implemented, with pathways displayed on the website https://ai2db.ai4ec.ac.cn/electroface.
Interface models are validated through water density profiles, ensuring water densities in the bulk regions are 1.0 g/cm3 within a 5% error margin. Four interfaces were selected from ElectroFace as examples: graphene(001)-water, Pt(111)-water, InP(110)-water, and TiO2(001)-water, representing 2D-material-water, metal-water, zinc-blende-water, and oxide-water interfaces, respectively. AIMD production runs typically last for 20–30 picoseconds. MLMD trajectories are provided for several interfaces, extending MD simulations to the nanosecond scale.
A proton-tracking algorithm, implemented in the ai2-kit package, identifies proton transfer pathways in MD trajectories. As input, it takes the trajectories together with the indices of potential proton donors and the list of proton-acceptor elements. The results are written to files containing the coordinates of the proton indicators and the proton transfer paths. Trajectory visualization assists in validating the algorithm.
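To make the idea of proton tracking concrete, the sketch below follows a single hydrogen atom and records a transfer whenever its nearest acceptor atom changes between frames. This is a deliberately simplified, nearest-neighbor illustration and is not the ai2-kit algorithm or its API; the function names, the input layout, the atom indices, and the coordinates are all hypothetical.

```python
import math


def min_image_distance(p, q, box):
    """Distance between points p and q in an orthorhombic periodic box."""
    total = 0.0
    for pi, qi, length in zip(p, q, box):
        d = pi - qi
        d -= length * round(d / length)   # wrap to the nearest periodic image
        total += d * d
    return math.sqrt(total)


def nearest_acceptor(proton, acceptors, box):
    """Index of the acceptor atom closest to the proton.

    acceptors : dict mapping atom index -> (x, y, z)
    """
    return min(acceptors,
               key=lambda idx: min_image_distance(proton, acceptors[idx], box))


def proton_transfer_path(frames, box):
    """Return a list of (frame_number, old_acceptor, new_acceptor) events.

    frames : list of (proton_xyz, acceptors) pairs, one per frame
    """
    events = []
    previous = None
    for i, (proton, acceptors) in enumerate(frames):
        owner = nearest_acceptor(proton, acceptors, box)
        if previous is not None and owner != previous:
            events.append((i, previous, owner))
        previous = owner
    return events


# Hypothetical three-frame trajectory: a proton moves from atom 7 to atom 12.
box = (10.0, 10.0, 10.0)
oxygens = {7: (1.0, 1.0, 1.0), 12: (3.6, 1.0, 1.0)}
frames = [((1.9, 1.0, 1.0), oxygens),
          ((2.2, 1.0, 1.0), oxygens),
          ((2.6, 1.0, 1.0), oxygens)]
print(proton_transfer_path(frames, box))  # [(2, 7, 12)]
```

A production algorithm would also need to filter out rapid back-and-forth rattling of a proton between two acceptors, which this sketch counts as separate events.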
MD trajectories are provided in GROMACS XTC format, and PBC lattice vectors and element information are provided in PDB format. The data are also available at https://ai2db.ai4ec.ac.cn/. For training machine learning potentials, extracting 50–100 structures from a trajectory and recomputing forces is recommended. AIMD trajectories are generated using CP2K, and MLMD trajectories are generated using LAMMPS driven by machine learning potentials trained using DeePMD-kit. Water density analysis is performed using ECToolkits, and proton transfer analysis is carried out using ai2-kit.
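One simple way to pick the recommended 50–100 structures is to take frames at even intervals across the trajectory. The sketch below computes such frame indices; the function name is hypothetical, and the frame count of 40,000 assumes a 20 ps run saved at every 0.5 fs step, which may not match how a given ElectroFace trajectory was written. The selected indices could then be used with any trajectory reader that supports XTC and PDB files.

```python
def evenly_spaced_frames(n_frames, n_select):
    """Indices of n_select frames spread evenly over n_frames frames."""
    if not 0 < n_select <= n_frames:
        raise ValueError("n_select must be between 1 and n_frames")
    if n_select == 1:
        return [0]
    step = (n_frames - 1) / (n_select - 1)
    return [round(i * step) for i in range(n_select)]


# Assumed trajectory length: 20 ps at 0.5 fs per saved frame = 40,000 frames.
indices = evenly_spaced_frames(40000, 50)
print(len(indices), indices[0], indices[-1])  # 50 0 39999
```

Even spacing keeps the selected structures as decorrelated as the trajectory length allows, which is useful when each structure requires a new force calculation.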