fair2wise/matkg

MatKG: Materials Knowledge Graph Framework

Overview

MatKG is a knowledge graph framework for materials science, with a particular focus on organic photovoltaics (OPV). It provides a structured way to represent materials, their properties, processing methods, and experimental techniques within a unified graph-based data model.

The knowledge graph is designed to capture:

  • Materials and their properties
  • Devices containing materials (e.g., photovoltaic cells)
  • Processing methods for materials
  • Experimental techniques for measuring properties
  • Relationships between all these entities

This KG was generated almost entirely by Claude Code, with minimal manual guidance, using the LinkML-KG-AI Workflow. It was seeded with David's initial ChatGPT-generated triples.

See CONTRIBUTING.md for how to use the workflow.

Installation

Once published to PyPI (TODO), MatKG can be installed with:

pip install matkg

Or directly from the repository:

git clone https://github.com/cmungall/matkg.git
cd matkg
uv venv
source .venv/bin/activate
uv pip install -e .

This project uses uv for package management and virtual environments.

Usage

Pydantic Classes

Here's a simple example of creating a materials knowledge graph using the Pydantic models:

from matkg.schema.matkg_schema import (
    Graph,
    Material,
    MaterialProperty,
    MaterialHasProperty
)

# Create materials
silicon = Material(
    id="MATKG:Silicon",
    name="Silicon",
    description="A semiconductor material",
    formula="Si",
    type="matkg:Material"
)

# Create properties
band_gap = MaterialProperty(
    id="MATKG:BandGap",
    name="Band Gap",
    description="Energy difference between valence and conduction bands",
    unit="eV",
    quantifiable=True,
    type="matkg:MaterialProperty"
)

# Create associations between materials and properties
silicon_band_gap = MaterialHasProperty(
    id="MATKG:Association001",
    subject=silicon.id,
    predicate="matkg:MaterialHasProperty",
    object=band_gap.id,
    provided_by="doi:10.1234/silicon.bandgap",
    has_evidence="Measured using UV-vis spectroscopy"
)

# Create the graph
kg = Graph(
    things=[silicon, band_gap],
    associations=[silicon_band_gap]
)

# Serialize the graph to JSON
with open("my_materials_kg.json", "w") as f:
    f.write(kg.model_dump_json(indent=2))
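
Once the graph is serialized, it can be read back with the standard library alone. The sketch below assumes the JSON has top-level `things` and `associations` lists, mirroring the `Graph` fields used above. The helper name `properties_of` is hypothetical and not part of MatKG.

```python
import json


def properties_of(graph: dict, material_id: str) -> list[str]:
    """Return the names of properties linked to a material by MaterialHasProperty edges."""
    # Index node names by id so edge objects can be resolved to readable names
    names_by_id = {t["id"]: t.get("name", t["id"]) for t in graph.get("things", [])}
    return [
        names_by_id.get(a["object"], a["object"])
        for a in graph.get("associations", [])
        if a.get("subject") == material_id
        and a.get("predicate") == "matkg:MaterialHasProperty"
    ]


# Example data shaped like the serialized graph (assumed layout)
graph = json.loads("""
{
  "things": [
    {"id": "MATKG:Silicon", "name": "Silicon", "type": "matkg:Material"},
    {"id": "MATKG:BandGap", "name": "Band Gap", "type": "matkg:MaterialProperty"}
  ],
  "associations": [
    {"id": "MATKG:Association001", "subject": "MATKG:Silicon",
     "predicate": "matkg:MaterialHasProperty", "object": "MATKG:BandGap"}
  ]
}
""")

print(properties_of(graph, "MATKG:Silicon"))
```

To run this against a real file, replace the inline string with `json.load(open("my_materials_kg.json"))`.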

Loading Data from YAML Files

MatKG can build a knowledge graph from YAML files:

from matkg.build_kg import build_kg

# Build a knowledge graph from YAML files in the 'kg' directory
build_kg(kg_dir="kg", output_file="matkg_graph.json")

The YAML files should follow this convention:

  • Node/entity files are prefixed with Thing_ (e.g., Thing_Material.yaml)
  • Edge/association files are prefixed with Assoc_ (e.g., Assoc_MaterialHasProperty.yaml)
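
The naming convention above can be checked before building. The following is a minimal sketch, not the logic used by `build_kg`: the function `classify_kg_files` is hypothetical, and it assumes the files sit directly in the directory with a `.yaml` extension.

```python
import tempfile
from pathlib import Path


def classify_kg_files(kg_dir: str) -> dict[str, list[str]]:
    """Group YAML file names in kg_dir into node files, edge files, and others."""
    groups: dict[str, list[str]] = {"things": [], "associations": [], "other": []}
    for path in sorted(Path(kg_dir).glob("*.yaml")):
        if path.name.startswith("Thing_"):
            groups["things"].append(path.name)
        elif path.name.startswith("Assoc_"):
            groups["associations"].append(path.name)
        else:
            # Files that follow neither prefix are reported rather than ignored
            groups["other"].append(path.name)
    return groups


# Demonstration on a temporary directory with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Thing_Material.yaml", "Assoc_MaterialHasProperty.yaml", "notes.yaml"):
        (Path(tmp) / name).touch()
    demo = classify_kg_files(tmp)

print(demo)
```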

YAML File Format

Node YAML Example (Thing_Material.yaml):

---
# Materials Knowledge Graph - Materials
- id: MATKG:P3HT
  name: P3HT
  description: "Poly(3-hexylthiophene-2,5-diyl), a semiconducting polymer used in organic electronics"
  formula: "(C10H14S)n"
  type: matkg:Material

Edge YAML Example (Assoc_MaterialHasProperty.yaml):

---
# Materials Knowledge Graph - Material-Property Associations
- id: MATKG:Assoc_P3HT-BandGap
  subject: MATKG:P3HT
  predicate: matkg:MaterialHasProperty
  object: MATKG:BandGap
  provided_by: "doi:10.1021/ma0518786"
  has_evidence: "Value measured as 2.0 eV using UV-vis spectroscopy"
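
Because edges refer to nodes only by `id`, an edge file can point at a node that no node file defines. The sketch below shows one way to detect such dangling references. It works on plain dictionaries shaped like the YAML records above. The helper `find_dangling_edges` is hypothetical, and whether MatKG performs a comparable check is not stated here.

```python
def find_dangling_edges(nodes: list[dict], edges: list[dict]) -> list[str]:
    """Return ids of edges whose subject or object is not a known node id."""
    known_ids = {n["id"] for n in nodes}
    return [
        e["id"]
        for e in edges
        if e["subject"] not in known_ids or e["object"] not in known_ids
    ]


# Records mirroring the two YAML examples above
nodes = [{"id": "MATKG:P3HT", "name": "P3HT", "type": "matkg:Material"}]
edges = [
    {
        "id": "MATKG:Assoc_P3HT-BandGap",
        "subject": "MATKG:P3HT",
        "predicate": "matkg:MaterialHasProperty",
        "object": "MATKG:BandGap",
    }
]

# MATKG:BandGap is not defined in `nodes`, so the edge is reported
dangling = find_dangling_edges(nodes, edges)
print(dangling)

# After adding the property node, nothing is reported
nodes.append({"id": "MATKG:BandGap", "name": "Band Gap", "type": "matkg:MaterialProperty"})
resolved = find_dangling_edges(nodes, edges)
print(resolved)
```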

DuckDB Database Integration

MatKG can store and query knowledge graph data using DuckDB, a high-performance analytical database.

Loading Data into DuckDB

# Create a DuckDB database from a JSON knowledge graph
make db-load

# Or manually using linkml-store
linkml-store -d db/matkg.ddb store kg/matkg_graph.json

Querying the Database

The database has two main collections: things (nodes) and associations (edges).

# View database schema
make db-schema

# Or manually
linkml-store -d db/matkg.ddb schema

Query Examples

Query all materials:

# Using the linkml-store CLI
linkml-store -d db/matkg.ddb -c things query -w "type: matkg:Material"

Query associations for a specific material:

# Get all associations for P3HT
linkml-store -d db/matkg.ddb -c associations query -w "subject: MATKG:P3HT"

Filter by predicate type:

# Get all material-property relationships
linkml-store -d db/matkg.ddb -c associations query -w "predicate: matkg:MaterialHasProperty"
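
Each `-w` filter in the queries above names a field and the value it must equal. As an in-memory illustration of that exact-match behaviour (not linkml-store's implementation, and ignoring any richer operators it may support), a hypothetical `match_where` helper could look like this:

```python
def match_where(records: list[dict], where: dict) -> list[dict]:
    """Keep records whose fields equal every key/value pair in `where`."""
    return [r for r in records if all(r.get(k) == v for k, v in where.items())]


# Sample association records using ids from the examples in this README
associations = [
    {"id": "MATKG:Assoc_P3HT-BandGap", "subject": "MATKG:P3HT",
     "predicate": "matkg:MaterialHasProperty", "object": "MATKG:BandGap"},
    {"id": "MATKG:Association001", "subject": "MATKG:Silicon",
     "predicate": "matkg:MaterialHasProperty", "object": "MATKG:BandGap"},
]

# Analogue of: -w "subject: MATKG:P3HT"
p3ht_edges = match_where(associations, {"subject": "MATKG:P3HT"})

# Analogue of: -w "predicate: matkg:MaterialHasProperty"
property_edges = match_where(associations, {"predicate": "matkg:MaterialHasProperty"})

print(len(p3ht_edges), len(property_edges))
```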

Data Export

Export query results to various formats:

# Export to CSV
linkml-store -d db/matkg.ddb -c things query -w "type: matkg:Material" -O csv -o materials.csv

# Export to YAML
linkml-store -d db/matkg.ddb -c associations query -w "subject: MATKG:P3HT" -O yaml -o p3ht_associations.yaml
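
If the records are already in Python, the standard `csv` module can produce a comparable export. This is a sketch under that assumption. `rows_to_csv` is a hypothetical helper, and the column order it produces may differ from what the CLI writes.

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize a list of flat records to CSV text with a header row."""
    # Use the union of keys, in first-seen order, as the header
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing fields are written as empty cells
    return buffer.getvalue()


materials = [
    {"id": "MATKG:P3HT", "name": "P3HT", "formula": "(C10H14S)n"},
    {"id": "MATKG:Silicon", "name": "Silicon", "formula": "Si"},
]
csv_text = rows_to_csv(materials)
print(csv_text)
```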

Schema

MatKG is built using LinkML, a flexible schema definition language from which multiple artifacts can be generated, including:

  • JSON Schema
  • OWL Ontology
  • GraphQL Schema
  • Python Pydantic Classes
  • SHACL Shapes

The core entity types in MatKG include:

  • Material: Substances like polymers, small molecules, etc.
  • ChemicalEntity: Chemical compounds with properties like molecular weight
  • Device: Physical devices composed of materials (e.g., solar cells)
  • MaterialProperty: Properties of materials (e.g., band gap, conductivity)
  • ElectronicProperty: Specific electronic properties (e.g., HOMO, LUMO)
  • ProcessingMethod: Methods for processing materials (e.g., spin coating)
  • ExperimentalTechnique: Measurement techniques (e.g., UV-vis spectroscopy)

The core relationship types include:

  • MaterialHasProperty: Links materials to their properties
  • DeviceContainsMaterial: Links devices to their component materials
  • MaterialProcessedBy: Links materials to processing methods
  • PropertyMeasuredBy: Links properties to measurement techniques
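
Each relationship type above implies an expected subject type and object type. The sketch below encodes those pairs in a lookup table and checks an edge against it. Only `matkg:MaterialHasProperty` appears as a predicate in this README, so the other three CURIEs are assumed by analogy. The names `RELATION_SIGNATURES` and `edge_matches_signature` are hypothetical. Subclasses such as ElectronicProperty are not handled, since the comparison is an exact type match.

```python
# Expected (subject type, object type) per predicate; CURIEs other than
# matkg:MaterialHasProperty are assumptions made for illustration
RELATION_SIGNATURES = {
    "matkg:MaterialHasProperty": ("matkg:Material", "matkg:MaterialProperty"),
    "matkg:DeviceContainsMaterial": ("matkg:Device", "matkg:Material"),
    "matkg:MaterialProcessedBy": ("matkg:Material", "matkg:ProcessingMethod"),
    "matkg:PropertyMeasuredBy": ("matkg:MaterialProperty", "matkg:ExperimentalTechnique"),
}


def edge_matches_signature(edge: dict, type_by_id: dict[str, str]) -> bool:
    """Check that an edge's subject and object have the types its predicate expects."""
    signature = RELATION_SIGNATURES.get(edge["predicate"])
    if signature is None:
        return False  # unknown predicate
    subject_type, object_type = signature
    return (
        type_by_id.get(edge["subject"]) == subject_type
        and type_by_id.get(edge["object"]) == object_type
    )


type_by_id = {
    "MATKG:P3HT": "matkg:Material",
    "MATKG:BandGap": "matkg:MaterialProperty",
}
good_edge = {"subject": "MATKG:P3HT", "predicate": "matkg:MaterialHasProperty",
             "object": "MATKG:BandGap"}
# Subject and object swapped: the types no longer fit the predicate
bad_edge = {"subject": "MATKG:BandGap", "predicate": "matkg:MaterialHasProperty",
            "object": "MATKG:P3HT"}

print(edge_matches_signature(good_edge, type_by_id))
print(edge_matches_signature(bad_edge, type_by_id))
```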

Contributing

See CONTRIBUTING.md for information on how to contribute to the project.

License

Copyright (c) 2025, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

(1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

(2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

(3) Neither the name of the University of California, Lawrence Berkeley National Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code (“Enhancements”) to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such Enhancements or derivative works thereof, in binary and source code form.

See the LICENSE file for details.
