I want to train MatterGen on my own dataset so that it generates new crystal structures conditioned on a target bandgap. To prepare the data, I created a CSV file in which each row contains the name of the compound (or its folder), the full text of the CIF file (not just a path to it), and the corresponding DFT-computed bandgap. My goal is for the model to learn structure–property relationships and generate novel materials with the desired electronic properties.

My first question: is this data format sufficient for training MatterGen, or are additional preprocessing steps or formatting requirements needed before the model can learn from the data and generate materials conditioned on the bandgap?

My second question: once my custom dataset is ready, how should I write the training command? For reference, this is the command I have, with data_module set to mp_20:

mattergen-train data_module=mp_20 ~trainer.logger
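
To make the CSV layout described above concrete, here is a minimal Python sketch of how such a file can be written, read back, and sanity-checked using only the standard library. The column names (`name`, `cif`, `bandgap`), the helper functions, and the toy CIF text are all hypothetical placeholders for illustration; they are not taken from MatterGen, and the column names MatterGen actually expects may well differ.

```python
import csv
import io

# Hypothetical column names mirroring the layout described in the post.
# These are NOT confirmed to be the names MatterGen expects.
COLUMNS = ["name", "cif", "bandgap"]


def write_rows(rows, handle):
    """Write rows (dicts keyed by COLUMNS) as CSV, header first."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def read_rows(handle):
    """Read the CSV back, converting the bandgap column to float."""
    rows = []
    for row in csv.DictReader(handle):
        row["bandgap"] = float(row["bandgap"])
        rows.append(row)
    return rows


def check_row(row):
    """Return a list of human-readable problems found in one row."""
    problems = []
    if not row["name"].strip():
        problems.append("empty name")
    # Cheap plausibility checks only; this does not validate the CIF.
    if not row["cif"].lstrip().startswith("data_"):
        problems.append("cif text does not start with a data_ block")
    if "_cell_length_a" not in row["cif"]:
        problems.append("cif text has no _cell_length_a tag")
    if row["bandgap"] < 0:
        problems.append("negative bandgap")
    return problems


# Toy, truncated CIF text and a placeholder bandgap, for illustration only.
EXAMPLE_ROW = {
    "name": "example_compound",
    "cif": "data_example\n_cell_length_a 5.0\n_cell_length_b 5.0\n",
    "bandgap": 1.5,
}

if __name__ == "__main__":
    buffer = io.StringIO(newline="")
    write_rows([EXAMPLE_ROW], buffer)
    buffer.seek(0)
    for row in read_rows(buffer):
        print(row["name"], row["bandgap"], check_row(row))
```

Because the `cif` column holds multi-line text, the `csv` module wraps that field in quotes when writing and restores the embedded newlines when reading, so the CIF content survives the round trip unchanged.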