Merged
140 changes: 79 additions & 61 deletions README.md
@@ -6,105 +6,123 @@

[![PyPI](https://img.shields.io/pypi/v/foundry_ml.svg)](https://pypi.python.org/pypi/foundry_ml)
[![Tests](https://github.com/MLMI2-CSSI/foundry/actions/workflows/tests.yml/badge.svg)](https://github.com/MLMI2-CSSI/foundry/actions/workflows/tests.yml)
[![Tests](https://github.com/MLMI2-CSSI/foundry/actions/workflows/python-publish.yml/badge.svg)](https://github.com/MLMI2-CSSI/foundry/actions/workflows/python-publish.yml)
[![NSF-1931306](https://img.shields.io/badge/NSF-1931306-blue)](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1931306&HistoricalAwards=false)
[<img src="https://img.shields.io/badge/view-documentation-blue">](https://ai-materials-and-chemistry.gitbook.io/foundry/)

**Foundry-ML** simplifies access to machine learning-ready datasets in materials science and chemistry.

Foundry-ML simplifies the discovery and usage of ML-ready datasets in materials science and chemistry providing a simple API to access even complex datasets.
* Load ML-ready data with just a few lines of code
* Work with datasets in local or cloud environments.
* Publish your own datasets with Foundry to promote community usage
* (in progress) Run published ML models without hassle
- **Search & Load** - Find and use curated datasets with a few lines of code
- **Understand** - Rich schemas describe what each field means
- **Cite** - Automatic citation generation for publications
- **Publish** - Share your datasets with the community
- **AI-Ready** - MCP server for Claude and other AI assistants

Learn more and see our available datasets on [Foundry-ML.org](https://foundry-ml.org/)
## Quick Start

```bash
pip install foundry-ml
```

```python
from foundry import Foundry

# Connect
f = Foundry()

# Search
results = f.search("band gap", limit=5)

# Documentation
Information on how to install and use Foundry is available in our documentation [here](https://ai-materials-and-chemistry.gitbook.io/foundry/v/docs/).
# Load
dataset = results.iloc[0].FoundryDataset
X, y = dataset.get_as_dict()['train']

DLHub documentation for model publication and running information can be found [here](https://dlhub-sdk.readthedocs.io/en/latest/servable-publication.html).
# Understand
schema = dataset.get_schema()
print(schema['fields'])

# Quick Start
Install Foundry-ML via command line with:
`pip install foundry_ml`
# Cite
print(dataset.get_citation())
```

## Cloud Environments

You can use the following code to import and instantiate Foundry-ML, then load a dataset.
For Google Colab or remote Jupyter:

```python
from foundry import Foundry
f = Foundry(index="mdf")
f = Foundry(no_browser=True, no_local_server=True)
```

## CLI

f = f.load("10.18126/e73h-3w6n", globus=True)
```bash
foundry search "band gap"
foundry schema 10.18126/abc123
foundry --help
```
*NOTE*: If you run locally and don't want to install the [Globus Connect Personal endpoint](https://www.globus.org/globus-connect-personal), just set the `globus=False`.

If running this code in a notebook, a table of metadata for the dataset will appear:
## AI Agent Integration

<img width="903" alt="metadata" src="https://user-images.githubusercontent.com/16869564/197038472-0b6ae559-4a6b-4b20-88e5-679bb6eb4f5c.png">
```bash
foundry mcp install # Add to Claude Code
```

We can use the data with `f.load_data()` and specifying splits such as `train` for different segments of the dataset, then use matplotlib to visualize it.
## Documentation

```python
res = f.load_data()
- [Getting Started](https://ai-materials-and-chemistry.gitbook.io/foundry/quickstart)
- [User Guide](https://ai-materials-and-chemistry.gitbook.io/foundry/)
- [API Reference](https://ai-materials-and-chemistry.gitbook.io/foundry/api/foundry)
- [Examples](./examples)

## Features

| Feature | Description |
|---------|-------------|
| Search | Find datasets by keyword, DOI, or browse catalog |
| Load | Automatic download, caching, and format conversion |
| PyTorch/TensorFlow | `dataset.get_as_torch()`, `dataset.get_as_tensorflow()` |
| CLI | Terminal-based workflows |
| MCP Server | AI assistant integration |
| HuggingFace Export | Publish to HuggingFace Hub |

imgs = res['train']['input']['imgs']
desc = res['train']['input']['metadata']
coords = res['train']['target']['coords']
## Available Datasets

n_images = 3
offset = 150
key_list = list(res['train']['input']['imgs'].keys())[0+offset:n_images+offset]
Browse datasets at [Foundry-ML.org](https://foundry-ml.org/) or:

fig, axs = plt.subplots(1, n_images, figsize=(20,20))
for i in range(n_images):
axs[i].imshow(imgs[key_list[i]])
axs[i].scatter(coords[key_list[i]][:,0], coords[key_list[i]][:,1], s = 20, c = 'r', alpha=0.5)
```python
f = Foundry()
f.list(limit=20) # See available datasets
```
<img width="595" alt="Screen Shot 2022-10-20 at 2 22 43 PM" src="https://user-images.githubusercontent.com/16869564/197039252-6d9c78ba-dc09-4037-aac2-d6f7e8b46851.png">

[See full examples](./examples)
## How to Cite

# How to Cite
If you find Foundry-ML useful, please cite the following [paper](https://doi.org/10.21105/joss.05467)
If you use Foundry-ML, please cite:

```
```bibtex
@article{Schmidt2024,
doi = {10.21105/joss.05467},
url = {https://doi.org/10.21105/joss.05467},
year = {2024}, publisher = {The Open Journal},
year = {2024},
publisher = {The Open Journal},
volume = {9},
number = {93},
pages = {5467},
author = {Kj Schmidt and Aristana Scourtas and Logan Ward and Steve Wangen and Marcus Schwarting and Isaac Darling and Ethan Truelove and Aadit Ambadkar and Ribhav Bose and Zoa Katok and Jingrui Wei and Xiangguo Li and Ryan Jacobs and Lane Schultz and Doyeon Kim and Michael Ferris and Paul M. Voyles and Dane Morgan and Ian Foster and Ben Blaiszik},
title = {Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science}, journal = {Journal of Open Source Software}
author = {Kj Schmidt and Aristana Scourtas and Logan Ward and others},
title = {Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science},
journal = {Journal of Open Source Software}
}
```

# Contributing
Foundry is an Open Source project and we encourage contributions from the community. To contribute, please fork from the `main` branch and open a Pull Request on the `main` branch. A member of our team will review your PR shortly.
## Contributing

## Developer notes
In order to enforce consistency with external schemas for the metadata and datacite structures ([contained in the MDF data schema repository](https://github.com/materials-data-facility/data-schemas)) the `dc_model.py` and `project_model.py` pydantic data models (found in the `foundry/jsonschema_models` folder) were generated using the [datamodel-code-generator](https://github.com/koxudaxi/datamodel-code-generator/) tool. In order to ensure compliance with the flake8 linting, the `--use-annotated` flag was passed to ensure regex patterns in `dc_model.py` were specified using pydantic's `Annotated` type vs the soon-to-be-deprecated `constr` type. The command used to run the datamodel-code-generator looks like:
```
datamodel-codegen --input dc.json --output dc_model.py --use-annotated
```
Foundry is open source. To contribute:

# Primary Support
This work was supported by the National Science Foundation under NSF Award Number: 1931306 "Collaborative Research: Framework: Machine Learning Materials Innovation Infrastructure".
1. Fork from `main`
2. Make your changes
3. Open a Pull Request

# Other Support
Foundry-ML brings together many components in the materials data ecosystem. Including [MAST-ML](https://mastmldocs.readthedocs.io/en/latest/), the [Data and Learning Hub for Science](https://www.dlhub.org) (DLHub), and the [Materials Data Facility](https://materialsdatafacility.org) (MDF).
See [CONTRIBUTING.md](docs/how-to-contribute/contributing.md) for details.

## MAST-ML
This work was supported by the National Science Foundation (NSF) SI2 award No. 1148011 and DMREF award number DMR-1332851
## Support

## The Data and Learning Hub for Science (DLHub)
This material is based upon work supported by Laboratory Directed Research and Development (LDRD) funding from Argonne National Laboratory, provided by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357.
https://www.dlhub.org
This work was supported by the National Science Foundation under NSF Award Number: 1931306 "Collaborative Research: Framework: Machine Learning Materials Innovation Infrastructure".

## The Materials Data Facility
This work was performed under financial assistance award 70NANB14H012 from U.S. Department of Commerce, National Institute of Standards and Technology as part of the [Center for Hierarchical Material Design (CHiMaD)](http://chimad.northwestern.edu). This work was performed under the following financial assistance award 70NANB19H005 from U.S. Department of Commerce, National Institute of Standards and Technology as part of the Center for Hierarchical Materials Design (CHiMaD). This work was also supported by the National Science Foundation as part of the [Midwest Big Data Hub](http://midwestbigdatahub.org) under NSF Award Number: 1636950 "BD Spokes: SPOKE: MIDWEST: Collaborative: Integrative Materials Design (IMaD): Leverage, Innovate, and Disseminate".
https://www.materialsdatafacility.org
Foundry integrates with [Materials Data Facility](https://materialsdatafacility.org), [DLHub](https://www.dlhub.org), and [MAST-ML](https://mastmldocs.readthedocs.io/).
88 changes: 66 additions & 22 deletions docs/README.md
@@ -1,43 +1,87 @@
# Getting started with Foundry
# Introduction

![](.gitbook/assets/foundry-purple%20%283%29.png)
<p align="center">
<img src=".gitbook/assets/foundry-purple%20%283%29.png" alt="Foundry" width="400">
</p>

## What is Foundry?
**Foundry-ML** is a Python library that simplifies access to machine learning-ready datasets in materials science and chemistry.

Foundry is a Python package that simplifies the discovery and usage of machine-learning ready datasets in materials science and chemistry. Foundry provides software tools that make it easy to load these datasets and work with them in local or cloud environments. Further, Foundry provides a dataset specification, and defined curation flows, that allow users to create new datasets for the community to use through this same interface.
## Features

## Installation
- **Search & Discover** - Find datasets by keyword or browse the catalog
- **Rich Metadata** - Understand datasets before downloading with detailed schemas
- **Easy Loading** - Get data in Python, PyTorch, or TensorFlow format
- **Automatic Caching** - Fast subsequent access after first download
- **Publishing** - Share your own datasets with the community
- **AI Integration** - MCP server for AI assistant access
- **CLI** - Terminal-based workflows

Foundry can be installed on any operating system with Python with pip
## Quick Example

```text
pip install foundry-ml
```python
from foundry import Foundry

# Connect
f = Foundry()

# Search for datasets
results = f.search("band gap", limit=5)

# Load a dataset
dataset = results.iloc[0].FoundryDataset
X, y = dataset.get_as_dict()['train']

# Get citation for your paper
print(dataset.get_citation())
```
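
The `X, y = dataset.get_as_dict()['train']` line above assumes the dataset has a `train` split. A minimal sketch of a more defensive wrapper follows; `load_split` is a hypothetical helper, not part of Foundry, and it assumes only what the example shows: `get_as_dict()` returns a mapping from split name to an `(inputs, targets)` pair. A stand-in object is used so the sketch runs without network access.

```python
from types import SimpleNamespace


def load_split(dataset, split="train"):
    """Return (inputs, targets) for one split of a Foundry-style dataset.

    Hypothetical helper: assumes dataset.get_as_dict() maps split names
    to (inputs, targets) pairs, as in the Quick Example.
    """
    data = dataset.get_as_dict()
    if split not in data:
        # Report which splits exist instead of failing with a bare KeyError.
        raise KeyError(f"split {split!r} not found; available: {sorted(data)}")
    inputs, targets = data[split]
    return inputs, targets


# Stand-in for a real FoundryDataset, used only to exercise the helper.
fake_dataset = SimpleNamespace(
    get_as_dict=lambda: {"train": ([[1.0], [2.0]], [0, 1])}
)

X, y = load_split(fake_dataset)
print(len(X), len(y))
```

With a real dataset, the call would be `load_split(results.iloc[0].FoundryDataset)`, assuming the return shape matches the one shown above.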

### Globus
## Installation

Foundry uses the Globus platform for authentication, search, and to optimize some data transfer operations. Follow the steps below to get set up.
```bash
pip install foundry-ml
```

* [Create a free account.](https://app.globus.org) You can create a free account here with your institutional credentials or with free IDs \(GlobusID, Google, ORCID, etc\).
* [Set up a Globus Connect Personal endpoint ](https://www.globus.org/globus-connect-personal)_**\(optional\)**_. While this step is optional, some Foundry capabilities will work more efficiently when using GCP.
For cloud environments (Colab, remote Jupyter):

## Project Support
```python
f = Foundry(no_browser=True, no_local_server=True)
```
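
If the same notebook runs both locally and in the cloud, the two flags can be chosen at runtime. The sketch below is one possible heuristic, not something Foundry provides: `foundry_auth_kwargs` is a hypothetical helper, and the Colab and SSH checks are assumptions about how to detect a remote session. Only the `no_browser` and `no_local_server` argument names come from the example above.

```python
import os
import sys


def foundry_auth_kwargs(modules=None, environ=None):
    """Pick Foundry() keyword arguments for the current environment.

    Hypothetical helper. Returns the cloud-friendly flags when the session
    looks like Google Colab or an SSH connection, and no flags otherwise.
    """
    modules = sys.modules if modules is None else modules
    environ = os.environ if environ is None else environ

    # Heuristic: Colab runtimes normally have google.colab already imported.
    in_colab = "google.colab" in modules
    # Heuristic: SSH sessions usually set SSH_CONNECTION.
    over_ssh = "SSH_CONNECTION" in environ

    if in_colab or over_ssh:
        return {"no_browser": True, "no_local_server": True}
    return {}


print(foundry_auth_kwargs())
```

The result would then be passed as `Foundry(**foundry_auth_kwargs())`. Passing the flags explicitly, as in the example above, remains the simpler choice when the environment is known in advance.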

This work was supported by the National Science Foundation under NSF Award Number: 1931306 "Collaborative Research: Framework: Machine Learning Materials Innovation Infrastructure".
## What's Next?

### Other Support
<table>
<tr>
<td>

Foundry brings together many components in the materials data ecosystem. Including MAST-ML, the Data and Learning Hub for Science \(DLHub\), and The Materials Data Facility \(MDF\).
**Getting Started**
- [Installation](installation.md)
- [Quick Start](quickstart.md)

#### MAST-ML
</td>
<td>

This work was supported by the National Science Foundation \(NSF\) SI2 award No. 1148011 and DMREF award number DMR-1332851
**User Guide**
- [Searching](guide/searching.md)
- [Loading Data](guide/loading-data.md)
- [ML Frameworks](guide/ml-frameworks.md)

#### The Data and Learning Hub for Science \(DLHub\)
</td>
<td>

This material is based upon work supported by Laboratory Directed Research and Development \(LDRD\) funding from Argonne National Laboratory, provided by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357. [https://www.dlhub.org](https://www.dlhub.org)
**Features**
- [CLI](features/cli.md)
- [MCP Server](features/mcp-server.md)
- [HuggingFace](features/huggingface.md)

#### The Materials Data Facility
</td>
</tr>
</table>

This work was performed under financial assistance award 70NANB14H012 from U.S. Department of Commerce, National Institute of Standards and Technology as part of the [Center for Hierarchical Material Design \(CHiMaD\)](http://chimad.northwestern.edu). This work was performed under the following financial assistance award 70NANB19H005 from U.S. Department of Commerce, National Institute of Standards and Technology as part of the Center for Hierarchical Materials Design \(CHiMaD\). This work was also supported by the National Science Foundation as part of the [Midwest Big Data Hub](http://midwestbigdatahub.org) under NSF Award Number: 1636950 "BD Spokes: SPOKE: MIDWEST: Collaborative: Integrative Materials Design \(IMaD\): Leverage, Innovate, and Disseminate". [https://www.materialsdatafacility.org](https://www.materialsdatafacility.org)
## Project Support

This work was supported by the National Science Foundation under NSF Award Number: 1931306 "Collaborative Research: Framework: Machine Learning Materials Innovation Infrastructure".

Foundry brings together components from:
- [Materials Data Facility (MDF)](https://materialsdatafacility.org)
- [Data and Learning Hub for Science (DLHub)](https://www.dlhub.org)
- [MAST-ML](https://mastmldocs.readthedocs.io/)
50 changes: 42 additions & 8 deletions docs/SUMMARY.md
@@ -1,14 +1,48 @@
# Table of contents
# Table of Contents

* [Getting started with Foundry](README.md)
## Getting Started

## How to contribute
* [Introduction](README.md)
* [Installation](installation.md)
* [Quick Start](quickstart.md)

* [Contribution Process](how-to-contribute/contributing.md)
* [Contributor Covenant](how-to-contribute/code_of_conduct.md)
## User Guide

---
* [Searching for Datasets](guide/searching.md)
* [Loading Data](guide/loading-data.md)
* [Using with ML Frameworks](guide/ml-frameworks.md)
* [Dataset Schemas](guide/schemas.md)

* [Sphinx Autogenerated documentation - markdown](sphinx-autogenerated-documentation.md)
* [foundry package — Foundry\_test 1.1 documentation - HTML AUTOGENERATION](foundry-package-foundry_test-1.1-documentation-html-autogeneration.md)
## Features

* [Command Line Interface](features/cli.md)
* [MCP Server (AI Agents)](features/mcp-server.md)
* [HuggingFace Integration](features/huggingface.md)
* [Error Handling](features/errors.md)

## Concepts

* [Overview](concepts/overview.md)
* [Foundry Datasets](concepts/foundry-datasets.md)
* [Data Packages](concepts/foundry-data-packages.md)

## Publishing

* [Publishing Datasets](publishing/publishing-datasets.md)
* [Metadata Reference](publishing/metadata-reference.md)

## Reference

* [API Reference](api/foundry.md)
* [CLI Reference](api/cli-reference.md)
* [Configuration](api/configuration.md)

## Community

* [Contributing](how-to-contribute/contributing.md)
* [Code of Conduct](how-to-contribute/code_of_conduct.md)

## Support

* [Troubleshooting](support/troubleshooting.md)
* [FAQ](support/faq.md)