Abstract
Machine learning and artificial intelligence promise to accelerate research and understanding across many scientific disciplines. Harnessing the power of these techniques requires aggregating scientific data. In tandem, the importance of open data for reproducibility and scientific transparency is gaining recognition, and data are increasingly available through digital repositories. Combining data from disparate collection efforts, however, requires interoperable and adaptable standards for data description and storage. Synthesizing experiences in astronomy, high-energy physics, earth science, and neuroscience, we contend that the open-source software (OSS) model provides significant benefits for creating and adapting standards. We highlight the issues that result, such as balancing flexibility against stability and adopting new computing paradigms and technologies, which must be considered from both the user and developer perspectives to ensure pathways for recognition and sustainability. We recommend supporting and recognizing the development and maintenance of OSS data standards and software consistent with widely adopted scientific tools.