feat(pt_expt): add dp compress support for pt_expt backend #5323
wanghan-iapcm wants to merge 2 commits into deepmodeling:master
Conversation
Implement model compression (embedding net tabulation) for the pt_expt backend, replacing embedding net forward passes with polynomial lookup tables via C++ custom ops (`tabulate_fusion_se_*`).

- Register fake tensor implementations for custom ops to enable torch.export/make_fx tracing through compressed forward paths
- Create pt_expt DPTabulate subclass using serialized type detection instead of isinstance checks against pt-specific classes
- Add enable_compression() and compressed call() to all descriptors: se_e2_a, se_r, se_t, se_t_tebd, dpa1, se_atten_v2, dpa2 (hybrid delegates to sub-descriptors automatically)
- Add compress entry point and command dispatch in main.py
- Add test_compressed_forward tests for all descriptor types
- Initialize self.compress = False in dpmodel descriptor __init__
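The overall shape of the change can be sketched with a toy model. All names below are illustrative stand-ins, not the actual deepmd-kit API: `enable_compression()` precomputes the embedding net on a grid and sets a `compress` flag, and `call()` then branches to the table-based path (the real custom ops do polynomial interpolation rather than the nearest-node lookup used here).

```python
import bisect


class ToyDescriptor:
    """Illustrative stand-in for a descriptor with a tabulated forward path."""

    def __init__(self) -> None:
        self.compress = False  # mirrors self.compress = False in dpmodel __init__
        self.table_x: list[float] = []
        self.table_y: list[float] = []

    def _embedding_net(self, r: float) -> float:
        # Stand-in for the expensive embedding-net forward pass.
        return r * r + 1.0

    def enable_compression(self, lower: float, upper: float, stride: float) -> None:
        # Precompute the embedding net on a grid; the real code builds spline tables.
        x = lower
        while x <= upper:
            self.table_x.append(x)
            self.table_y.append(self._embedding_net(x))
            x += stride
        self.compress = True

    def call(self, r: float) -> float:
        if self.compress:
            # Nearest-node lookup; the real ops do polynomial interpolation.
            i = min(bisect.bisect_left(self.table_x, r), len(self.table_x) - 1)
            return self.table_y[i]
        return self._embedding_net(r)


d = ToyDescriptor()
exact = d.call(2.0)           # uncompressed path
d.enable_compression(0.0, 5.0, 0.5)
approx = d.call(2.0)          # table-based path
```

The key design property is that compression is a pure runtime optimization: the same `call()` entry point serves both paths, gated on the flag.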
```python
class DPTabulate(DPTabulatePT):
```
Check failure (Code scanning / CodeQL): Missing call to superclass `__init__` during object initialization (Error)
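CodeQL's "missing call to superclass `__init__`" alert typically fires on patterns like the following generic sketch (unrelated to the actual DPTabulate internals, which per the review comments deliberately call `BaseTabulate.__init__` directly and skip `DPTabulatePT.__init__`):

```python
class Base:
    def __init__(self) -> None:
        self.ready = True  # state the subclass silently loses


class BadChild(Base):
    def __init__(self) -> None:
        # No super().__init__() call: Base.ready is never set.
        self.extra = 1


class GoodChild(Base):
    def __init__(self) -> None:
        super().__init__()  # initialize Base state first
        self.extra = 1


bad = BadChild()
good = GoodChild()
```

When skipping a parent initializer is intentional, a comment explaining which ancestor's `__init__` is called instead usually satisfies reviewers, though the static alert may still need suppression.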
```python
def call(
    self,
    coord_ext: torch.Tensor,
    atype_ext: torch.Tensor,
    nlist: torch.Tensor,
    mapping: torch.Tensor | None = None,
    fparam: torch.Tensor | None = None,
) -> Any:
```
Check warning (Code scanning / CodeQL): Signature mismatch in overriding method (Warning)
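The "signature mismatch in overriding method" warning means an override declares a different parameter list than the base class, which weakens substitutability. A minimal generic illustration (not the deepmd class hierarchy):

```python
from typing import Any


class BaseDescriptor:
    def call(self, coord: Any, atype: Any, nlist: Any) -> Any:
        return "base"


class WidenedDescriptor(BaseDescriptor):
    # Extra parameters with defaults keep calls through the base type working,
    # but the declared contracts now diverge, which is what CodeQL flags.
    def call(self, coord: Any, atype: Any, nlist: Any, mapping: Any = None) -> Any:
        return "sub"


def run(d: BaseDescriptor) -> Any:
    # Code written against the base contract only knows three arguments.
    return d.call(1, 2, 3)


result = run(WidenedDescriptor())
```

Because the added parameters are optional, this is usually a benign warning at runtime; aligning the base signature (or suppressing the alert) resolves it.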
The same CodeQL warning (Signature mismatch in overriding method) is reported for three further identical `call()` overrides.
```python
# type_embedding.call() returns (ntypes+1) x tebd_dim (with padding)
full_embd = self.type_embedding.call()
nt, t_dim = full_embd.shape
ng = self.se_atten.neuron[-1]
```
Check notice (Code scanning / CodeQL): Unused local variable (Note)
```python
# type_embedding.call() returns (ntypes+1) x tebd_dim (with padding)
full_embd = self.type_embedding.call()
nt, t_dim = full_embd.shape
ng = self.repinit.neuron[-1]
```
Check notice (Code scanning / CodeQL): Unused local variable (Note)
```python
"""Register a fake implementation, silently skipping if already registered."""
try:
    torch.library.register_fake(op_name)(fn)
except RuntimeError:
```
Check notice (Code scanning / CodeQL): Empty except (Note)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: adae1e005e
```python
if isinstance(min_nbor_dist, (int, float)):
    min_nbor_dist_val = float(min_nbor_dist)
else:
    min_nbor_dist_val = float(min_nbor_dist.item())
```
Persist computed min_nbor_dist before enabling compression
When min_nbor_dist is missing, this block computes min_nbor_dist_val from the training data but never assigns it back to the model. model.enable_compression(...) then reads self.get_min_nbor_dist() (via the dpmodel base implementation) and still receives None, so dp --pt_expt compress -t ... can fail for models that rely on --training-script to provide neighbor statistics.
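The isinstance branch above normalizes a scalar-or-tensor value to a plain float. The same duck-typed coercion can be sketched without torch by checking for an `.item()` method; the helper name `to_float` and the `FakeScalarTensor` class are hypothetical, introduced only for illustration:

```python
from typing import Any


def to_float(value: Any) -> float:
    """Coerce a Python number or a 0-d tensor-like object to float (hypothetical helper)."""
    if isinstance(value, (int, float)):
        return float(value)
    # Tensor-like: 0-d tensors expose .item() returning a Python scalar.
    return float(value.item())


class FakeScalarTensor:
    """Minimal stand-in for a 0-d tensor."""

    def __init__(self, v: float) -> None:
        self._v = v

    def item(self) -> float:
        return self._v


a = to_float(0.5)
b = to_float(FakeScalarTensor(0.25))
```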
```python
new_model_dict = model.serialize()
data = {
```
Keep compression state when writing the output .pte
Serializing the in-memory model right after enable_compression() drops the tabulation state, because descriptor serialize() paths in dpmodel (for example DescrptSeA.serialize) do not include runtime fields like compress, compress_data, or compress_info. Since deserialize_to_file rebuilds from that dict, the exported file reverts to the uncompressed descriptor path even though compression was enabled just before export.
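The failure mode described above, a serialize/deserialize round-trip silently dropping runtime-only state, can be reproduced with a toy model (illustrative names, not the dpmodel API):

```python
class ToyModel:
    def __init__(self, neuron: int) -> None:
        self.neuron = neuron
        self.compress = False        # runtime-only state
        self.compress_data = None    # runtime-only state

    def enable_compression(self) -> None:
        self.compress = True
        self.compress_data = [1.0, 2.0]

    def serialize(self) -> dict:
        # Only persists constructor fields, analogous to descriptor serialize()
        # paths that omit compress/compress_data/compress_info.
        return {"neuron": self.neuron}

    @classmethod
    def deserialize(cls, data: dict) -> "ToyModel":
        return cls(neuron=data["neuron"])


m = ToyModel(8)
m.enable_compression()
restored = ToyModel.deserialize(m.serialize())
# restored.compress is False again: the compression state was lost in transit.
```

The fix direction suggested by the review is to either include the compression buffers in the serialized dict or re-enable compression after deserialization, before export.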
📝 Walkthrough

Adds a compression feature: dpmodel descriptors gain a `compress` flag (initialized to False in `__init__`), and the pt_expt backend gains tabulation-based compressed forward paths.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant CLI as CLI (main.py)
    participant Entrypoint as Compress Entrypoint
    participant Model as Deserialized Model
    participant Descriptor as Descriptor Module
    participant Tabulate as DPTabulate
    participant Buffers as compress_data/compress_info
    User->>CLI: deepmd compress input.pte output.pte
    CLI->>Entrypoint: enable_compression(input, output, ...)
    Entrypoint->>Model: load & deserialize .pte
    Entrypoint->>Model: set/get min_nbor_dist
    Entrypoint->>Descriptor: call enable_compression(min_nbor_dist,...)
    Descriptor->>Tabulate: build DPTabulate table
    Tabulate->>Descriptor: return table buffers
    Descriptor->>Buffers: _store_compress_data() (save tables & info)
    Descriptor->>Buffers: _store_type_embd_data() (optional)
    Descriptor->>Descriptor: set self.compress = True
    Entrypoint->>Model: re-serialize compressed model
    Entrypoint->>User: save output.pte
```
```mermaid
sequenceDiagram
    participant Input as Input Tensors
    participant Descriptor as Descriptor.call
    participant Check as compress flag
    participant Compressed as _call_compressed
    participant Op as Custom tabulate_fusion_* op
    participant Buffers as compress_data/compress_info
    participant Output as Output
    Input->>Descriptor: coord_ext, atype_ext, nlist
    Descriptor->>Check: if self.compress?
    alt compressed
        Check->>Compressed: run _call_compressed
        Compressed->>Buffers: load compress_data/compress_info
        Compressed->>Op: invoke tabulate_fusion_* with buffers
        Op->>Compressed: return accumulated env/type outputs
        Compressed->>Output: assemble final tensor (+ aux)
    else uncompressed
        Check->>Descriptor: run original call path
        Descriptor->>Output: return original outputs
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🧹 Nitpick comments (14)
deepmd/dpmodel/descriptor/se_atten_v2.py (1)
199-199: Redundant initialization, but harmless.
`DescrptSeAttenV2` explicitly calls `DescrptDPA1.__init__()`, which already sets `self.compress = False` (line 347 in dpa1.py). Re-setting it here is redundant but ensures the child class explicitly controls its compression state for clarity.
source/tests/pt_expt/descriptor/test_se_e2_a.py (1)
135-163: Also trace/export the compressed path.
This only validates eager execution after `enable_compression()`. The new fake-op registrations are there specifically to make `torch.export`/`make_fx` work on compressed models, so please add one compressed export/tracing assertion here or in a shared helper.
source/tests/pt_expt/descriptor/test_dpa1.py (1)
151-160: Keep an attention-enabled case in this compression test.
`attn_layer=0` only exercises the no-attention configuration, while the rest of this file uses `attn_layer=2`. Reusing a nonzero layer count here would make the new compression coverage much closer to the DPA1 path this feature is targeting. If compression is intentionally limited to `attn_layer=0`, please assert that explicitly.
deepmd/pt_expt/entrypoints/main.py (1)
198-210: Add a smoke test for the new `compress` dispatch.
This branch depends on parser-provided fields (`step`, `frequency`, `training_script`) that are easy to mis-wire and aren't exercised anywhere in this diff. A tiny `main()` dispatch test would catch CLI regressions early.
deepmd/pt_expt/descriptor/se_e2_a.py (3)
172-192: Add `strict=True` to zip for safety.
Same recommendation for the non-type_one_side branch.
♻️ Suggested fix

```diff
 else:
     atype_loc = atype_ext[:, :nloc].reshape(nfnl)
     for embedding_idx, (compress_data_ii, compress_info_ii) in enumerate(
-        zip(self.compress_data, self.compress_info)
+        zip(self.compress_data, self.compress_info, strict=True)
     ):
```
133-139: Prefix unused variable with underscore.
The `diff` variable is unpacked but never used.
♻️ Suggested fix

```diff
-        rr, diff, ww = self.env_mat.call(
+        rr, _diff, ww = self.env_mat.call(
             coord_ext,
             atype_ext,
             nlist,
             self.davg[...],
             self.dstd[...],
         )
```
154-171: Add `strict=True` to zip for safety.
Since `compress_data` and `compress_info` are created together with matching lengths, adding `strict=True` serves as a runtime assertion that catches potential bugs if the invariant is ever broken.
♻️ Suggested fix

```diff
 if self.type_one_side:
     for embedding_idx, (compress_data_ii, compress_info_ii) in enumerate(
-        zip(self.compress_data, self.compress_info)
+        zip(self.compress_data, self.compress_info, strict=True)
     ):
```
deepmd/pt_expt/descriptor/dpa1.py (2)
176-182: Prefix unused variable with underscore.
The `diff` variable is unpacked but never used.
♻️ Suggested fix

```diff
-        rr, diff, sw = self.se_atten.env_mat.call(
+        rr, _diff, sw = self.se_atten.env_mat.call(
             coord_ext,
             atype_ext,
             nlist,
             self.se_atten.mean[...],
             self.se_atten.stddev[...],
         )
```
63-65: Consider using explicit exception instead of assert for validation.
Using `assert` for input validation is problematic because assertions can be disabled with Python's `-O` flag. For production code validation that should always run, prefer an explicit `if`/`raise`.
♻️ Suggested fix

```diff
-        assert not self.se_atten.resnet_dt, (
-            "Model compression error: descriptor resnet_dt must be false!"
-        )
+        if self.se_atten.resnet_dt:
+            raise ValueError("Model compression error: descriptor resnet_dt must be false!")
```
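The difference matters because `python -O` strips assert statements entirely, while an explicit raise always runs. A self-contained comparison with generic validators (not the dpa1 code itself):

```python
def validate_with_assert(resnet_dt: bool) -> None:
    # Disappears under `python -O`: the check would be silently skipped.
    assert not resnet_dt, "Model compression error: descriptor resnet_dt must be false!"


def validate_with_raise(resnet_dt: bool) -> None:
    # Always enforced, regardless of interpreter flags.
    if resnet_dt:
        raise ValueError(
            "Model compression error: descriptor resnet_dt must be false!"
        )


caught: list[type] = []
for fn in (validate_with_assert, validate_with_raise):
    try:
        fn(True)
    except (AssertionError, ValueError) as e:
        caught.append(type(e))
```

Under a normal (non-optimized) interpreter both validators reject the bad input; only the raise-based one is guaranteed to do so in every deployment.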
deepmd/pt_expt/descriptor/se_r.py (2)
122-129: Prefix unused variable with underscore.
The `diff` variable is unpacked but never used.
♻️ Suggested fix

```diff
-        rr, diff, ww = self.env_mat.call(
+        rr, _diff, ww = self.env_mat.call(
             coord_ext,
             atype_ext,
             nlist,
             self.davg[...],
             self.dstd[...],
             True,
         )
```
139-153: Add `strict=True` to zip for safety.
♻️ Suggested fix

```diff
 for ii, (compress_data_ii, compress_info_ii) in enumerate(
-    zip(self.compress_data, self.compress_info)
+    zip(self.compress_data, self.compress_info, strict=True)
 ):
```
deepmd/pt_expt/utils/tabulate.py (1)
40-47: Avoid mutable default argument and function call in defaults.
Using a mutable list `[]` as a default argument and calling `ActivationFn("tanh")` in the signature can cause subtle bugs. The mutable default is shared across all calls, and the function call default is evaluated once at definition time.
♻️ Suggested fix

```diff
 def __init__(
     self,
     descrpt: Any,
     neuron: list[int],
     type_one_side: bool = False,
-    exclude_types: list[list[int]] = [],
-    activation_fn: ActivationFn = ActivationFn("tanh"),
+    exclude_types: list[list[int]] | None = None,
+    activation_fn: ActivationFn | None = None,
 ) -> None:
+    if exclude_types is None:
+        exclude_types = []
+    if activation_fn is None:
+        activation_fn = ActivationFn("tanh")
     # Call BaseTabulate.__init__ directly (skip DPTabulatePT.__init__)
```
deepmd/pt_expt/descriptor/se_t_tebd.py (1)
159-165: Prefix unused variable with underscore.
The `diff` variable is unpacked but never used. Prefix it with an underscore to indicate it's intentionally ignored and satisfy the linter.
♻️ Suggested fix

```diff
-        rr, diff, sw = self.se_ttebd.env_mat.call(
+        rr, _diff, sw = self.se_ttebd.env_mat.call(
             coord_ext,
             atype_ext,
             nlist,
             self.se_ttebd.mean[...],
             self.se_ttebd.stddev[...],
         )
```
deepmd/pt_expt/utils/tabulate_ops.py (1)
26-29: Consider adding a comment explaining the side-effect access.
The static analysis tool flags this as a useless expression (B018), but it intentionally triggers lazy loading of the custom op library. Adding a brief comment would clarify intent and silence the warning.
💡 Suggested improvement

```diff
 # Ensure the custom op library is loaded (if available)
 try:
-    torch.ops.deepmd
+    torch.ops.deepmd  # Access triggers lazy library loading
 except AttributeError:
     pass
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepmd/pt_expt/descriptor/dpa2.py`:
- Line 215: Unpackings in dpa2.py are creating unused variables causing RUF059;
change the unpack of nlist.shape in the scope where nframes, nloc, nnei =
nlist.shape to use a prefixed unused name for the third value (e.g., nframes,
nloc, _nnei = nlist.shape) and likewise rename the unused variable diff at the
other occurrence to _diff so intent is explicit; update both occurrences
referenced in the file and then run ruff check . and ruff format . before
committing.
In `@deepmd/pt_expt/descriptor/se_t.py`:
- Line 133: Rename the unused unpacked variable diff from the call rr, diff, ww
= self.env_mat.call(...) to rr, _diff, ww = self.env_mat.call(...) in the
compressed forward path to satisfy RUF059, and update the subsequent loop that
currently uses zip(param_list_a, param_list_b) to zip(param_list_a,
param_list_b, strict=True) (use the exact parameter list variable names present
around line 155) to enforce equal-length iteration and satisfy B905; make these
two edits inside the same compressed forward function/method where
self.env_mat.call and the zip loop occur (preserving all other logic).
In `@deepmd/pt_expt/entrypoints/compress.py`:
- Around line 91-98: The call to model.enable_compression currently passes
arguments in the wrong order so extrapolate is being used as min_nbor_dist;
replace the call to model.enable_compression(...) so that min_nbor_dist_val is
supplied as the first positional argument (per the enable_compression signature
in descriptor/se_t.py), followed by extrapolate, stride, stride * 10, and
check_frequency; update the invocation at the model.enable_compression(...) site
to pass min_nbor_dist_val first to ensure correct table bounds.
In `@source/tests/pt_expt/descriptor/test_dpa2.py`:
- Around line 243-244: The tuple unpacking of self.nlist.shape assigns unused
variables nf and nloc causing a RUF059 lint error; change the unpack to use
placeholders (e.g., "_, _, nnei = self.nlist.shape" or "_, _, nnei = ..." ) so
only nnei is kept and rng = np.random.default_rng(GLOBAL_SEED) remains
unchanged—update the line that currently reads "nf, nloc, nnei =
self.nlist.shape" to use underscores for the unused members.
---
Nitpick comments:
In `@deepmd/dpmodel/descriptor/se_atten_v2.py`:
- Line 199: Remove the redundant assignment to self.compress in
DescrptSeAttenV2.__init__; DescrptDPA1.__init__ already initializes
self.compress = False, so delete the line "self.compress = False" in the
DescrptSeAttenV2 constructor to avoid duplication while preserving behavior.
In `@deepmd/pt_expt/descriptor/dpa1.py`:
- Around line 176-182: The unpacked variable diff returned by
self.se_atten.env_mat.call(...) is unused; change the unpacking to prefix that
element with an underscore (e.g., _diff or _) so it's clear it is intentionally
ignored — update the tuple assignment rr, diff, sw =
self.se_atten.env_mat.call(...) to rr, _diff, sw =
self.se_atten.env_mat.call(...) (references: rr, diff/_diff, sw,
self.se_atten.env_mat.call, coord_ext, atype_ext, nlist, self.se_atten.mean,
self.se_atten.stddev).
- Around line 63-65: Replace the runtime-only assert with an explicit validation
that always runs: check the boolean flag self.se_atten.resnet_dt in the
constructor or init path in dpa1.py and if it is True raise a clear exception
(e.g., ValueError or RuntimeError) with the same message "Model compression
error: descriptor resnet_dt must be false!" so the validation is enforced even
when Python optimizations are enabled.
In `@deepmd/pt_expt/descriptor/se_e2_a.py`:
- Around line 172-192: The zip over self.compress_data and self.compress_info
used in the loop (zip(self.compress_data, self.compress_info) in the block that
iterates embedding_idx and calls torch.ops.deepmd.tabulate_fusion_se_a) should
be made safe by adding strict=True so mismatched lengths raise immediately;
update that zip call and the analogous zip call in the non-type_one_side branch
to zip(self.compress_data, self.compress_info, strict=True) to ensure length
mismatches fail fast and prevent silent truncation.
- Around line 133-139: The unpacked return from self.env_mat.call assigns rr,
diff, ww but diff is unused; update the tuple unpacking at the env_mat.call
invocation to prefix the unused value (e.g., rr, _diff, ww or rr, _, ww) so the
unused variable is clearly marked, and verify no other references to diff remain
in the surrounding code (look for rr, diff, ww in the call site and any
subsequent uses).
- Around line 154-171: In the loop gated by self.type_one_side that currently
does "for embedding_idx, (compress_data_ii, compress_info_ii) in
enumerate(zip(self.compress_data, self.compress_info)):", change the zip call to
use strict=True so the runtime asserts the two sequences have identical lengths
(i.e., zip(self.compress_data, self.compress_info, strict=True)); update the
loop header where compress_data and compress_info are enumerated (referenced by
embedding_idx, compress_data_ii, compress_info_ii) so any mismatch will raise
immediately while leaving the rest of the logic (including calls to
torch.ops.deepmd.tabulate_fusion_se_a and accumulation into xyz_scatter)
unchanged.
In `@deepmd/pt_expt/descriptor/se_r.py`:
- Around line 122-129: The unpacked variable named diff from the call to
self.env_mat.call (assigned as rr, diff, ww) is unused; change the unpacking to
underscore-prefixed name(s) (e.g., rr, _diff, ww or rr, _, ww) so the unused
value is clearly marked. Update the assignment site where self.env_mat.call is
invoked (in the function/method containing rr, diff, ww and using coord_ext,
atype_ext, nlist, self.davg, self.dstd) to use the underscore-prefixed variable
and leave the rest of the logic unchanged.
- Around line 139-153: The loop zips self.compress_data and self.compress_info
without strict checking; change the zip call in the for loop inside the method
(the for ii, (compress_data_ii, compress_info_ii) in
enumerate(zip(self.compress_data, self.compress_info))) to use zip(...,
strict=True) so mismatched lengths raise immediately and avoid silent
misalignment when producing mm/ss/xyz_scatter in that block (refer to variables
compress_data_ii, compress_info_ii, sec, ss, and the
torch.ops.deepmd.tabulate_fusion_se_r call to locate the code).
In `@deepmd/pt_expt/descriptor/se_t_tebd.py`:
- Around line 159-165: The unpacking from self.se_ttebd.env_mat.call currently
assigns rr, diff, sw but diff is never used; update the unpacking in the caller
of self.se_ttebd.env_mat.call (where rr, diff, sw are assigned) to prefix the
unused variable with an underscore (e.g., rr, _diff, sw or rr, _, sw) so the
intent to ignore it is clear to linters and readers; leave rr and sw as-is and
ensure any references to the old name aren’t used elsewhere.
In `@deepmd/pt_expt/entrypoints/main.py`:
- Around line 198-210: Add a small smoke test that exercises the "compress"
dispatch in main: call the main entrypoint (or the function that reads
FLAGS/FLAGS.command) with FLAGS.command set to "compress" and set the
parser-provided fields FLAGS.step, FLAGS.frequency, FLAGS.training_script (and
FLAGS.INPUT/FLAGS.output) to safe dummy values, then assert that
enable_compression (imported in the snippet) is invoked (by patching/mocking it)
or that main returns/executes without error; this ensures the dispatch that
calls enable_compression(input_file=..., stride=..., extrapolate=...,
check_frequency=..., training_script=...) is wired correctly and prevents CLI
regressions.
In `@deepmd/pt_expt/utils/tabulate_ops.py`:
- Around line 26-29: The try/except that accesses torch.ops.deepmd is
intentionally used to trigger lazy loading of the custom op library, so add a
short comment immediately above that block clarifying this side-effect (e.g.,
"Access torch.ops.deepmd to force lazy-load the custom op library; keep except
to avoid import error") and optionally append a linter suppression (e.g., #
noqa: B018) to the line to silence the static-analysis warning while leaving the
existing try/except (torch.ops.deepmd and the AttributeError handler) unchanged.
In `@deepmd/pt_expt/utils/tabulate.py`:
- Around line 40-47: The __init__ signature for the class uses a mutable default
for exclude_types and calls ActivationFn("tanh") at definition time; change the
parameters to use exclude_types: Optional[list[list[int]]] = None and
activation_fn: Optional[ActivationFn] = None, then inside __init__ set
self.exclude_types = [] if exclude_types is None else exclude_types (or a
shallow copy) and set self.activation_fn = activation_fn if activation_fn is not
None else ActivationFn("tanh"); update any references to use these instance
attributes (referencing the __init__ method, parameter names exclude_types and
activation_fn, and ActivationFn) to avoid shared mutable state and premature
evaluation.
In `@source/tests/pt_expt/descriptor/test_dpa1.py`:
- Around line 151-160: The test instantiates DescrptDPA1 with attn_layer=0 which
only covers the no-attention path; update the instantiation to use a nonzero
attention layer (e.g., attn_layer=2) to exercise the attention-enabled
compression path consistent with other tests, or if compression is intentionally
limited to no-attention, add an explicit assertion or comment near the
DescrptDPA1 creation (referencing DescrptDPA1 and attn_layer) that
verifies/declares attn_layer==0 and documents the intentional limitation.
In `@source/tests/pt_expt/descriptor/test_se_e2_a.py`:
- Around line 135-163: The test_compressed_forward currently only checks eager
outputs after dd0.enable_compression(0.5); add a compressed export/tracing check
by exporting or tracing the compressed DescrptSeA instance (e.g., using
torch.export or torch.fx.make_fx on dd0 after enable_compression) and run the
traced/exported graph on the same coord_ext/atype_ext/nlist inputs, then assert
the traced/export outputs match result_ref/result_cmp (use
torch.testing.assert_close with the same atol/rtol); locate the test function
test_compressed_forward and the DescrptSeA instance/enable_compression call to
insert this additional export/trace + assertion.
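The shape of the requested check is "run the original and the traced/exported copy on the same inputs and assert the outputs agree within tolerance". A torch-free sketch of just that comparison structure (in the real test the second function would come from `torch.export`/`make_fx` on the compressed module, and the comparison would use `torch.testing.assert_close`; the functions here are made-up stand-ins):

```python
import math


def eager_forward(x: float) -> float:
    # Stand-in for the compressed descriptor's eager forward.
    return math.tanh(x) * 2.0


def traced_forward(x: float) -> float:
    # Stand-in for the exported/traced copy of the same computation.
    return 2.0 * math.tanh(x)


# Same inputs through both paths, outputs compared within tolerance.
for x in (-1.0, 0.0, 0.5, 3.0):
    assert math.isclose(eager_forward(x), traced_forward(x), rel_tol=1e-12)
```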
📒 Files selected for processing (26)
- deepmd/dpmodel/descriptor/dpa1.py
- deepmd/dpmodel/descriptor/dpa2.py
- deepmd/dpmodel/descriptor/se_atten_v2.py
- deepmd/dpmodel/descriptor/se_e2_a.py
- deepmd/dpmodel/descriptor/se_r.py
- deepmd/dpmodel/descriptor/se_t.py
- deepmd/dpmodel/descriptor/se_t_tebd.py
- deepmd/pt_expt/descriptor/dpa1.py
- deepmd/pt_expt/descriptor/dpa2.py
- deepmd/pt_expt/descriptor/se_atten_v2.py
- deepmd/pt_expt/descriptor/se_e2_a.py
- deepmd/pt_expt/descriptor/se_r.py
- deepmd/pt_expt/descriptor/se_t.py
- deepmd/pt_expt/descriptor/se_t_tebd.py
- deepmd/pt_expt/entrypoints/compress.py
- deepmd/pt_expt/entrypoints/main.py
- deepmd/pt_expt/utils/__init__.py
- deepmd/pt_expt/utils/tabulate.py
- deepmd/pt_expt/utils/tabulate_ops.py
- source/tests/pt_expt/descriptor/test_dpa1.py
- source/tests/pt_expt/descriptor/test_dpa2.py
- source/tests/pt_expt/descriptor/test_se_atten_v2.py
- source/tests/pt_expt/descriptor/test_se_e2_a.py
- source/tests/pt_expt/descriptor/test_se_r.py
- source/tests/pt_expt/descriptor/test_se_t.py
- source/tests/pt_expt/descriptor/test_se_t_tebd.py
| then the rest (g1_shape_transform, repformers, etc.) proceeds normally.
| """
| use_three_body = self.use_three_body
| nframes, nloc, nnei = nlist.shape
Fix Ruff RUF059 unused unpacked variables to avoid lint/CI noise.
Line [215] (nnei) and Line [327] (diff) are unpacked but never used. Prefix both with _ to make intent explicit.
♻️ Proposed fix
- nframes, nloc, nnei = nlist.shape
+ nframes, nloc, _nnei = nlist.shape
...
- rr, diff, sw = self.repinit.env_mat.call(
+ rr, _diff, sw = self.repinit.env_mat.call(
As per coding guidelines, **/*.py: Always run ruff check . and ruff format . before committing changes or CI will fail.
Also applies to: 327-327
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 215-215: Unpacked variable nnei is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepmd/pt_expt/descriptor/dpa2.py` at line 215, Unpackings in dpa2.py are
creating unused variables causing RUF059; change the unpack of nlist.shape in
the scope where nframes, nloc, nnei = nlist.shape to use a prefixed unused name
for the third value (e.g., nframes, nloc, _nnei = nlist.shape) and likewise
rename the unused variable diff at the other occurrence to _diff so intent is
explicit; update both occurrences referenced in the file and then run ruff check
. and ruff format . before committing.
| ) -> Any:
| """Compressed forward using tabulate_fusion_se_t custom op."""
| # env_mat: nf x nloc x nnei x 4
| rr, diff, ww = self.env_mat.call(
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Verify configured Python target is compatible with zip(strict=...)
fd -HI 'pyproject.toml|setup.cfg|tox.ini|.ruff.toml'
rg -n "requires-python|python_requires|target-version" || true
# 2) Re-scan this file for the flagged patterns
rg -nP 'rr,\s*diff,\s*ww\s*=' deepmd/pt_expt/descriptor/se_t.py || true
rg -nP 'zip\s*\(\s*self\.compress_data\s*,\s*self\.compress_info\s*\)' deepmd/pt_expt/descriptor/se_t.py || true
Repository: deepmodeling/deepmd-kit

🏁 Script executed:
cat -n deepmd/pt_expt/descriptor/se_t.py | sed -n '130,140p'

🏁 Script executed:
cat -n deepmd/pt_expt/descriptor/se_t.py | sed -n '152,160p'

🏁 Script executed:
cat -n deepmd/pt_expt/descriptor/se_t.py | sed -n '125,180p'

🏁 Script executed:
# Check if 'diff' variable is used after line 133 in the _call_compressed function
rg -A 50 'def _call_compressed' deepmd/pt_expt/descriptor/se_t.py | grep -n 'diff'

🏁 Script executed:
# Verify that both ww and diff are unused in the function
rg -A 60 'def _call_compressed' deepmd/pt_expt/descriptor/se_t.py | grep -E '\b(ww|diff)\b'

🏁 Script executed:
# Check the definition and usage of compress_data and compress_info
rg -B 5 -A 5 'compress_data|compress_info' deepmd/pt_expt/descriptor/se_t.py | head -80

🏁 Script executed:
# Verify ww is indeed used in the return statement
rg -A 65 'def _call_compressed' deepmd/pt_expt/descriptor/se_t.py | tail -10
Address Ruff RUF059 and B905 violations in the compressed forward path.
Line 133 unpacks diff but never uses it; rename to _diff to indicate intentional non-use.
Line 155 should use zip(..., strict=True) to enforce that both parameter lists have equal length and satisfy the B905 check.
♻️ Proposed fix
- rr, diff, ww = self.env_mat.call(
+ rr, _diff, ww = self.env_mat.call(
  for embedding_idx, (compress_data_ii, compress_info_ii) in enumerate(
-     zip(self.compress_data, self.compress_info)
+     zip(self.compress_data, self.compress_info, strict=True)
  ):
As per coding guidelines, **/*.py: Always run ruff check . and ruff format . before committing changes or CI will fail.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- rr, diff, ww = self.env_mat.call(
+ rr, _diff, ww = self.env_mat.call(
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 133-133: Unpacked variable diff is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepmd/pt_expt/descriptor/se_t.py` at line 133, Rename the unused unpacked
variable diff from the call rr, diff, ww = self.env_mat.call(...) to rr, _diff,
ww = self.env_mat.call(...) in the compressed forward path to satisfy RUF059,
and update the subsequent loop that currently uses zip(param_list_a,
param_list_b) to zip(param_list_a, param_list_b, strict=True) (use the exact
parameter list variable names present around line 155) to enforce equal-length
iteration and satisfy B905; make these two edits inside the same compressed
forward function/method where self.env_mat.call and the zip loop occur
(preserving all other logic).
| # 3. Enable compression
| log.info("Enabling compression...")
| model.enable_compression(
|     extrapolate,
|     stride,
|     stride * 10,
|     check_frequency,
| )
Critical bug: min_nbor_dist_val is not passed to enable_compression.
The computed min_nbor_dist_val is never used. According to the descriptor's enable_compression signature (see deepmd/pt_expt/descriptor/se_t.py lines 30-37), min_nbor_dist must be the first positional argument. Currently, extrapolate is being passed as the first argument, which will be interpreted as min_nbor_dist, causing incorrect table bounds.
🐛 Proposed fix
# 3. Enable compression
log.info("Enabling compression...")
model.enable_compression(
+ min_nbor_dist_val,
extrapolate,
stride,
stride * 10,
check_frequency,
)
📝 Committable suggestion
| # 3. Enable compression
| log.info("Enabling compression...")
| model.enable_compression(
|     min_nbor_dist_val,
|     extrapolate,
|     stride,
|     stride * 10,
|     check_frequency,
| )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepmd/pt_expt/entrypoints/compress.py` around lines 91 - 98, The call to
model.enable_compression currently passes arguments in the wrong order so
extrapolate is being used as min_nbor_dist; replace the call to
model.enable_compression(...) so that min_nbor_dist_val is supplied as the first
positional argument (per the enable_compression signature in
descriptor/se_t.py), followed by extrapolate, stride, stride * 10, and
check_frequency; update the invocation at the model.enable_compression(...) site
to pass min_nbor_dist_val first to ensure correct table bounds.
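The misbinding this bug produces is easy to reproduce with a toy function mirroring the quoted signature (the body here is a hypothetical stand-in, not the real descriptor method). It also shows why calling with keywords is safer: a forgotten `min_nbor_dist` then fails loudly instead of silently shifting every argument:

```python
def enable_compression(
    min_nbor_dist: float,
    table_extrapolate: float = 5,
    table_stride_1: float = 0.01,
    table_stride_2: float = 0.1,
    check_frequency: int = -1,
) -> dict:
    """Toy stand-in that just records how the arguments were bound."""
    return {
        "min_nbor_dist": min_nbor_dist,
        "table_extrapolate": table_extrapolate,
        "check_frequency": check_frequency,
    }


# Positional call that omits the first argument: extrapolate (5.0) lands
# in min_nbor_dist with no error raised, which is the reported bug.
bad = enable_compression(5.0, 0.01, 0.1, -1)
assert bad["min_nbor_dist"] == 5.0  # wrong semantics, silently accepted

# With keywords, forgetting min_nbor_dist is a hard TypeError instead.
try:
    enable_compression(table_extrapolate=5.0)
    missing_detected = False
except TypeError:
    missing_detected = True
assert missing_detected
```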
| rng = np.random.default_rng(GLOBAL_SEED)
| nf, nloc, nnei = self.nlist.shape
Remove the unused shape tuple members.
Line 244 leaves nf and nloc unused, and Ruff is already flagging this as RUF059. Use _ placeholders so this test doesn't fail the Python lint gate.
🧹 Minimal fix
- nf, nloc, nnei = self.nlist.shape
+ _, _, nnei = self.nlist.shape
📝 Committable suggestion
| rng = np.random.default_rng(GLOBAL_SEED)
| _, _, nnei = self.nlist.shape
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 244-244: Unpacked variable nf is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
[warning] 244-244: Unpacked variable nloc is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@source/tests/pt_expt/descriptor/test_dpa2.py` around lines 243 - 244, The
tuple unpacking of self.nlist.shape assigns unused variables nf and nloc causing
a RUF059 lint error; change the unpack to use placeholders (e.g., "_, _, nnei =
self.nlist.shape" or "_, _, nnei = ..." ) so only nnei is kept and rng =
np.random.default_rng(GLOBAL_SEED) remains unchanged—update the line that
currently reads "nf, nloc, nnei = self.nlist.shape" to use underscores for the
unused members.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #5323 +/- ##
==========================================
+ Coverage 82.29% 82.37% +0.07%
==========================================
Files 775 778 +3
Lines 77627 78311 +684
Branches 3676 3675 -1
==========================================
+ Hits 63887 64512 +625
- Misses 12566 12625 +59
Partials 1174 1174
☔ View full report in Codecov by Sentry.
… compression pipeline
- torch_module decorator auto-registers dpmodel base classes in the converter registry, so auto-wrapped parent objects (e.g. atomic models) get pt_expt sub-components with enable_compression support
- Store min_nbor_dist as a buffer on the model (property with -1.0 sentinel), matching the pt backend behavior for serialization
- Clean up compress entry point: remove dead fallback, set min_nbor_dist on model before calling enable_compression
- Add end-to-end model compression tests: model.enable_compression, freeze→compress→eval pipeline, min_nbor_dist round-trip
♻️ Duplicate comments (1)
deepmd/pt_expt/entrypoints/compress.py (1)
91-97: ⚠️ Potential issue | 🔴 Critical
Critical bug: min_nbor_dist is not passed to enable_compression.
The enable_compression method on descriptors (see deepmd/pt_expt/descriptor/se_e2_a.py:29-36) expects min_nbor_dist as the first positional argument:
def enable_compression(
    self,
    min_nbor_dist: float,
    table_extrapolate: float = 5,
    table_stride_1: float = 0.01,
    table_stride_2: float = 0.1,
    check_frequency: int = -1,
) -> None:
Currently, extrapolate is passed as the first argument, which will be interpreted as min_nbor_dist, causing incorrect table bounds during compression.
🐛 Proposed fix
# 3. Enable compression
log.info("Enabling compression...")
model.enable_compression(
+   min_nbor_dist,
    extrapolate,
    stride,
    stride * 10,
    check_frequency,
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepmd/pt_expt/entrypoints/compress.py` around lines 91 - 97, The call to model.enable_compression is passing arguments in the wrong order — the descriptor method enable_compression expects min_nbor_dist as the first parameter. Update the call to include the min_nbor_dist variable as the first argument (so the call becomes model.enable_compression(min_nbor_dist, extrapolate, stride, stride * 10, check_frequency)), ensuring the parameters align with the enable_compression signature in descriptor/se_e2_a.py.
Verify each finding against the current code and only fix it if needed. In `@deepmd/pt_expt/entrypoints/compress.py` around lines 91 - 97, The call to model.enable_compression is passing arguments in the wrong order — the descriptor method enable_compression expects min_nbor_dist as the first parameter. Update the call to include the min_nbor_dist variable as the first argument (so the call becomes model.enable_compression(min_nbor_dist, extrapolate, stride, stride * 10, check_frequency)), ensuring the parameters align with the enable_compression signature in descriptor/se_e2_a.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@deepmd/pt_expt/entrypoints/compress.py`:
- Around line 91-97: The call to model.enable_compression is passing arguments
in the wrong order — the descriptor method enable_compression expects
min_nbor_dist as the first parameter. Update the call to include the
min_nbor_dist variable as the first argument (so the call becomes
model.enable_compression(min_nbor_dist, extrapolate, stride, stride * 10,
check_frequency)), ensuring the parameters align with the enable_compression
signature in descriptor/se_e2_a.py.
📒 Files selected for processing (4)
- deepmd/pt_expt/common.py
- deepmd/pt_expt/entrypoints/compress.py
- deepmd/pt_expt/model/make_model.py
- source/tests/pt_expt/model/test_model_compression.py
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c343a10521
| meta/fake tensor implementations for custom ops.
| """
| # --- tabulate_fusion_se_a ---
| if hasattr(torch.ops.deepmd, "tabulate_fusion_se_a"):
Register fake tabulate ops after custom ops are loaded
_register_fake_if_available() only installs fake kernels when hasattr(torch.ops.deepmd, ...) is already true, but in pt_expt the module is imported early via deepmd.pt_expt.common._ensure_registrations() before deepmd.pt loads the custom library (deepmd/pt/__init__.py). In that import order every check is skipped and registration is never retried, so later symbolic tracing in deserialize_to_file(..., tracing_mode="symbolic") can hit torch.ops.deepmd.tabulate_fusion_* without fake/meta implementations and fail during compression export.
| data = {
|     "model": model.serialize(),
|     "model_def_script": model_dict.get("model_def_script"),
| }
Preserve min_nbor_dist when writing compressed .pte metadata
The output payload omits min_nbor_dist even though this command reads that field on input (model_dict.get("min_nbor_dist")), and BaseModel.serialize() does not include model-level neighbor stats. As a result, a once-compressed .pte can lose neighbor-distance metadata in model.json, so re-running dp --pt_expt compress on that artifact may incorrectly require --training-script again.
pre-commit.ci autofix
Summary
tabulate_fusion_se_*), significantly speeding up inference. Supported descriptors: se_e2_a, se_r, se_t, se_t_tebd, dpa1, se_atten_v2, dpa2 (hybrid delegates automatically).
Infrastructure:
deepmd/pt_expt/utils/tabulate_ops.py— Registertorch.library.register_fakefor all 5 custom ops to enabletorch.export/make_fxtracing through compressed forward pathsdeepmd/pt_expt/utils/tabulate.py—DPTabulatesubclass that detects descriptor type via serialized data (avoidsisinstancechecks against pt-specific classes)deepmd/pt_expt/entrypoints/compress.py— Entry point: load.pte→ deserialize →enable_compression()→ re-export.pteDescriptors: Each gets
enable_compression()+@cast_precisioncall()override with compressed branch using the appropriate custom op.dpmodel: Initialize
self.compress = Falsein all descriptor__init__methods.Summary by CodeRabbit
New Features
Utilities
Tests