diff --git a/.gitignore b/.gitignore
index 7fd27471..645b6210 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,3 +34,7 @@ tests/data_*.h5
 tests/data_*/
 tests/tmp.*
 tests/.coverage
+
+# local dev artifact
+uv.lock
+.venv/
diff --git a/skills/dpdata-driver/SKILL.md b/skills/dpdata-driver/SKILL.md
new file mode 100644
index 00000000..132c36c7
--- /dev/null
+++ b/skills/dpdata-driver/SKILL.md
@@ -0,0 +1,142 @@
+---
+name: dpdata-driver
+description: Use dpdata Python Driver plugins to label systems (energies/forces/virials) via System.predict(), list available drivers, and build Driver objects (ase/deepmd/gaussian/sqm/hybrid). Use when working with dpdata Python API (not CLI) and you need driver-based energy/force prediction, plugin registration keys, or examples of using dpdata with ASE calculators or DeePMD models.
+---
+
+# dpdata-driver
+
+Use dpdata “driver plugins” to **label** a `dpdata.System` (predict energies/forces/virials) and obtain a `dpdata.LabeledSystem`.
+
+## Key idea
+
+- A **Driver** converts an unlabeled `System` into a `LabeledSystem` by computing:
+  - `energies` (required)
+  - `forces` (optional but common)
+  - `virials` (optional)
+
+In dpdata, this is exposed as:
+
+- `System.predict(*args, driver="dp", **kwargs) -> LabeledSystem`
+
+`driver` can be:
+
+- a **string key** (plugin name), e.g. `"ase"`, `"dp"`, `"gaussian"`
+- a **Driver object**, e.g. `Driver.get_driver("ase")(...)`
+
+## List supported driver keys (runtime)
+
+When unsure what drivers exist in *this* dpdata version/env, query them at runtime:
+
+```python
+from dpdata.driver import Driver
+
+print(sorted(Driver.get_drivers().keys()))
+```
+
+In the current repo state, keys include:
+
+- `ase`
+- `dp` / `deepmd` / `deepmd-kit`
+- `gaussian`
+- `sqm`
+- `hybrid`
+
+(Exact set depends on dpdata version and installed extras.)
+
+## Minimal workflow
+
+```python
+import dpdata
+from dpdata.system import System
+
+sys = System("input.xyz", fmt="xyz")
+ls = sys.predict(driver="ase", calculator=...)  # returns dpdata.LabeledSystem
+```
+
+### Verify you got a labeled system
+
+```python
+assert "energies" in ls.data
+# optional:
+# assert "forces" in ls.data
+# assert "virials" in ls.data
+```
+
+## Example: use the ASE driver with an ASE calculator (runnable)
+
+This is the easiest *fully runnable* example because it doesn’t require external QM software.
+
+Dependencies (recommended): use `uv`.
+
+Option A (one-off invocation):
+
+```bash
+uv run --with dpdata --with numpy --with ase python3 your_script.py
+```
+
+Option B (recommended for shareable scripts): declare dependencies in the script via inline metadata, then run `uv run script.py`.
+See: https://docs.astral.sh/uv/guides/scripts/#inline-metadata
+
+Script:
+
+```python
+import numpy as np
+from ase.calculators.emt import EMT
+from dpdata.system import System
+
+# write a tiny molecule
+open("tmp.xyz", "w").write("""2\n\nH 0 0 0\nH 0 0 0.74\n""")
+
+sys = System("tmp.xyz", fmt="xyz")
+ls = sys.predict(driver="ase", calculator=EMT())
+
+print("energies", np.array(ls.data["energies"]))
+print("forces shape", np.array(ls.data["forces"]).shape)
+if "virials" in ls.data:
+    print("virials shape", np.array(ls.data["virials"]).shape)
+else:
+    print("virials: ")
+```
+
+## Example: pass a Driver object instead of a string
+
+```python
+from ase.calculators.emt import EMT
+from dpdata.driver import Driver
+from dpdata.system import System
+
+sys = System("tmp.xyz", fmt="xyz")
+ase_driver = Driver.get_driver("ase")(calculator=EMT())
+ls = sys.predict(driver=ase_driver)
+```
+
+## Hybrid driver
+
+Use `driver="hybrid"` to sum energies/forces/virials from multiple drivers.
+
+The `HybridDriver` accepts `drivers=[ ... ]` where each item is either:
+
+- a `Driver` instance
+- a dict like `{"type": "sqm", ...}` (type is the driver key)
+
+Example (structure only; may require external executables):
+
+```python
+from dpdata.driver import Driver
+
+hyb = Driver.get_driver("hybrid")(
+    drivers=[
+        {"type": "sqm", "qm_theory": "DFTB3"},
+        {"type": "dp", "dp": "frozen_model.pb"},
+    ]
+)
+# ls = sys.predict(driver=hyb)
+```
+
+## Notes / gotchas
+
+- Many drivers require extra dependencies or external programs:
+  - `dp` requires `deepmd-kit` + a model file
+  - `gaussian` requires Gaussian and a valid executable (default `g16`)
+  - `sqm` requires AmberTools `sqm`
+- If you just need file format conversion, use the existing **dpdata CLI** skill instead.
diff --git a/skills/dpdata-plugin/SKILL.md b/skills/dpdata-plugin/SKILL.md
new file mode 100644
index 00000000..ef9dba2f
--- /dev/null
+++ b/skills/dpdata-plugin/SKILL.md
@@ -0,0 +1,113 @@
+---
+name: dpdata-plugin
+description: Create and install dpdata plugins (especially custom Format readers/writers) using Format.register(...) and pyproject.toml entry_points under 'dpdata.plugins'. Use when extending dpdata with new formats or distributing plugins as separate Python packages.
+---
+
+# dpdata-plugin
+
+dpdata loads plugins in two ways:
+
+1. **Built-in plugins** in `dpdata.plugins.*` (imported automatically)
+1. **External plugins** exposed via Python package entry points: `dpdata.plugins`
+
+This skill focuses on **external plugin packages**, the recommended way to add new formats without modifying dpdata itself.
+
+## What can be extended?
+
+Most commonly: add a new **Format** (file reader/writer) via:
+
+```python
+from dpdata.format import Format
+
+
+@Format.register("myfmt")
+class MyFormat(Format): ...
+```
+
+## How dpdata discovers plugins
+
+dpdata imports `dpdata.plugins` during normal use (e.g. `dpdata.system` imports it). That module:
+
+- imports every built-in module in `dpdata/plugins/*.py`
+- then loads all **entry points** in group `dpdata.plugins`
+
+So an external plugin package only needs to ensure that importing the entry-point target triggers the `@Format.register(...)` side effects.
+
+## Minimal external plugin package (based on plugin_example/)
+
+### 1) Create a new Python package
+
+Example layout:
+
+```text
+dpdata_random/
+  pyproject.toml
+  dpdata_random/
+    __init__.py
+```
+
+### 2) Implement and register your Format
+
+In `dpdata_random/__init__.py` (shortened example):
+
+```python
+from __future__ import annotations
+
+import numpy as np
+from dpdata.format import Format
+
+
+@Format.register("random")
+class RandomFormat(Format):
+    def from_system(self, N, **kwargs):
+        return {
+            "atom_numbs": [20],
+            "atom_names": ["X"],
+            "atom_types": np.zeros(20, dtype=int),
+            "cells": np.repeat(np.eye(3)[None, ...], N, axis=0) * 100.0,
+            "coords": np.random.rand(N, 20, 3) * 100.0,
+            "orig": np.zeros(3),
+            "nopbc": False,
+        }
+```
+
+Return dicts must match dpdata’s expected schema (cells/coords/atom_names/atom_types/...).
+
+### 3) Expose an entry point
+
+In `pyproject.toml`:
+
+```toml
+[project]
+name = "dpdata_random"
+version = "0.0.0"
+dependencies = ["numpy", "dpdata"]
+
+[project.entry-points.'dpdata.plugins']
+random = "dpdata_random:RandomFormat"
+```
+
+Any importable target works; this pattern points directly at the class.
+
+### 4) Install and test
+
+In a clean env (recommended via `uv`):
+
+```bash
+uv run --with dpdata --with numpy python3 - <<'PY'
+import dpdata
+from dpdata.format import Format
+
+# importing dpdata will load entry points (dpdata.plugins)
+print('random' in Format.get_formats())
+PY
+```
+
+If it prints `True`, your plugin was discovered.
+
+## Debug checklist
+
+- Did you install the plugin package into the same environment where you run dpdata?
+- Does `pyproject.toml` contain `[project.entry-points.'dpdata.plugins']`?
+- Does importing the entry point module/class execute the `@Format.register(...)` decorator?
+- If using `uv run`, remember each command runs in its own environment unless you’re in a `uv` project (or you rely on `uv run --with ...`).
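
Note on the debug checklist in `skills/dpdata-plugin/SKILL.md`: the first two checklist items can be checked without importing dpdata at all. The sketch below is an assumption about how one might do that, not something this diff adds. It uses only the standard-library `importlib.metadata` to list whatever entry points the current environment advertises in the `dpdata.plugins` group. The helper name `list_plugin_entry_points` is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import entry_points

# Entry-point group named in the SKILL.md; everything else here is illustrative.
GROUP = "dpdata.plugins"


def list_plugin_entry_points(group: str = GROUP) -> list[tuple[str, str]]:
    """Return sorted (name, target) pairs for installed entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are filtered with .select(group=...)
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


found = list_plugin_entry_points()
if found:
    for name, target in found:
        print(f"{name} -> {target}")
else:
    print(f"no entry points registered in group {GROUP!r}")
```

An empty result suggests the plugin package is not installed in the interpreter running the check (or its `pyproject.toml` lacks the entry-point table). A listed entry point with `'random' in Format.get_formats()` still `False` would instead point at the import/registration step.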