
Refactor finetune #247

Merged
zjgemi merged 11 commits into deepmodeling:master from zjgemi:refactor-finetune
Aug 5, 2024

Conversation

Collaborator

@zjgemi zjgemi commented Jul 31, 2024

Summary by CodeRabbit

  • New Features

    • Introduced a streamlined approach to finetuning within the workflow, integrating parameters directly without separate step creation.
  • Bug Fixes

    • Revised logic for model initialization to improve clarity and reduce complexity.
  • Refactor

    • Removed outdated classes and methods related to modifying training scripts, simplifying the PrepRunDPTrain class.
    • Enhanced decision-making processes by eliminating redundant conditions across various functions.
    • Updated test cases to remove references to outdated "finetune" operations, refining the testing strategy.
    • Adjusted test cases and configurations to reflect the removal of "finetune" references and enhance clarity.
  • Chores

    • Cleaned up the logic for handling "finetune" patterns in specific utility functions for better maintainability.


coderabbitai bot commented Jul 31, 2024

Walkthrough

The recent updates to the dpgen2 project simplify finetuning management and parameter handling by eliminating the explicit finetuning step and integrating it into existing workflows. This streamlining enhances maintainability, reduces complexity in training command generation, and improves the handling of optional parameters across various modules, ultimately leading to clearer and more efficient code.

Changes

Files/Directories and change summary (see the sketch below the list):
  • dpgen2/entrypoint/submit.py: Removed make_finetune_step and integrated finetuning into workflow_concurrent_learning, updating return values to streamline the workflow.
  • dpgen2/flow/dpgen_loop.py: Added make_next_optional_parameter to consistently set finetune_mode to "no" during scheduling.
  • dpgen2/op/run_dp_train.py: Simplified model initialization logic by removing redundant checks, clarifying the conditions for model training.
  • dpgen2/superop/prep_run_dp_train.py: Removed the ModifyTrainScript class and related methods, simplifying the PrepRunDPTrain class and its parameter handling.
  • dpgen2/utils/dflow_query.py: Eliminated checks for "finetune" patterns, streamlining key identification logic in the regex functions.
  • tests/mocked_ops.py: Removed the MockedModifyTrainScript class and its execute method, simplifying test structure and dependencies.
  • tests/test_prep_run_dp_train.py: Removed tests for ModifyTrainScript, refocusing on MockedPrepDPTrain and MockedRunDPTrain and simplifying the test suite.
  • tests/utils/test_dflow_query.py: Removed "finetune" entries from dpgen_keys and the associated test cases, streamlining expected outputs.
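To make the refactored control flow concrete, here is a minimal sketch of the pattern described in the list above. The names workflow_concurrent_learning, finetune_mode, and make_next_optional_parameter come from this summary; the helper make_dpgen_step and the config layout are assumptions for illustration only.

```python
# Hypothetical sketch only: the helper below and the config layout are
# illustrative assumptions, not the actual dpgen2 code.

def make_dpgen_step(config, finetune_mode):
    """Stand-in for building the dpgen superop step (details omitted)."""
    return {"name": "dpgen-step", "finetune_mode": finetune_mode}

def workflow_concurrent_learning(config):
    do_finetune = config.get("do_finetune", False)
    # First iteration: run the train step in finetune mode if requested.
    finetune_mode = "finetune" if do_finetune else "no"
    dpgen_step = make_dpgen_step(config, finetune_mode=finetune_mode)
    # No separate finetune step is created; a single step is returned.
    return dpgen_step

def make_next_optional_parameter(optional_parameter):
    # When scheduling the next iteration, force finetune_mode back to "no",
    # so later iterations train from the previous iteration's models instead.
    return {**optional_parameter, "finetune_mode": "no"}
```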

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Workflow
    participant Model

    User->>Workflow: Start Training
    Workflow->>Model: Initialize Model
    alt Finetuning Enabled
        Model->>Workflow: Check finetune parameters
        Workflow->>Model: Set to finetune mode
    end
    Workflow->>Model: Execute Training
    Model->>Workflow: Training Results
    Workflow->>User: Return Results

Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 33bac05 and ef494e9.

Files selected for processing (2)
  • dpgen2/op/run_dp_train.py (2 hunks)
  • tests/op/test_run_dp_train.py (5 hunks)
Files skipped from review as they are similar to previous changes (2)
  • dpgen2/op/run_dp_train.py
  • tests/op/test_run_dp_train.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

 },
 )
-return dpgen_step, finetune_step
+return dpgen_step

Fix the return type of workflow_concurrent_learning.

The return type should be Step instead of Tuple[Step, Optional[Step]] to match the updated logic.

- ) -> Tuple[Step, Optional[Step]]:
+ ) -> Step:
Tools
GitHub Check: pyright

[failure] 603-603: Expression of type "Step" cannot be assigned to return type "Tuple[Step, Step | None]"; "Step" is incompatible with "Tuple[Step, Step | None]" (reportGeneralTypeIssues)

zjgemi added 2 commits July 31, 2024 17:24
Signed-off-by: zjgemi <liuxin_zijian@163.com>
@zjgemi zjgemi changed the title from "Refactor finetune" to "[WIP] Refactor finetune" on Jul 31, 2024
zjgemi added 2 commits August 1, 2024 09:06
Signed-off-by: zjgemi <liuxin_zijian@163.com>
Signed-off-by: zjgemi <liuxin_zijian@163.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Outside diff range, codebase verification and nitpick comments (1)
dpgen2/entrypoint/args.py (1)

377-382: Clarify the documentation string for do_finetune.

The documentation string for do_finetune is clear but can be further simplified for better readability.

-        "Finetune the pretrained model during the first iteration. If it is set to True, then in the first iteration, "
-        'the internal flag finetune_mode is set to "finetune". In this step, we finetune the pretrained model in the '
-        'train step. After that, in the following iterations, init_model_policy is forced to be "yes", the flag '
-        'finetune_mode is set as "no", which means we use `--init-frz-model` or `--init-model` to train based on '
-        'models from the previous iteration. The "do_finetune" flag is set to False by default, while the internal '
-        'flag finetune_mode is set to "no", which means anything related to finetuning will not be done.'
+        "Finetune the pretrained model during the first iteration. If set to True, the internal flag `finetune_mode` "
+        'is set to "finetune" during the first iteration to finetune the pretrained model. In subsequent iterations, '
+        'the `init_model_policy` is forced to "yes" and `finetune_mode` is set to "no", using `--init-frz-model` or '
+        '`--init-model` to train based on models from the previous iteration. By default, `do_finetune` is False, '
+        'and `finetune_mode` is "no", meaning finetuning is not performed.'

new_template_script = step.outputs.parameters["template_script"].value
expected_list = [{"foo": "bar"} for i in range(self.numb_models)]
assert new_template_script == expected_list
assert new_template_script == self.template_script


Remove unnecessary assertion.

The assertion assert new_template_script == self.template_script is redundant as it does not provide additional value.

-        assert new_template_script == self.template_script

zjgemi added 3 commits August 1, 2024 09:17
Signed-off-by: zjgemi <liuxin_zijian@163.com>
Signed-off-by: zjgemi <liuxin_zijian@163.com>

codecov bot commented Aug 1, 2024

Codecov Report

Attention: Patch coverage is 72.72727% with 3 lines in your changes missing coverage. Please review.

Project coverage is 83.75%. Comparing base (f5c5d95) to head (ef494e9).
Report is 22 commits behind head on master.

Files with missing lines | Patch % | Lines
dpgen2/entrypoint/submit.py | 40.00% | 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #247      +/-   ##
==========================================
- Coverage   83.84%   83.75%   -0.10%     
==========================================
  Files          97       97              
  Lines        5368     5318      -50     
==========================================
- Hits         4501     4454      -47     
+ Misses        867      864       -3     

☔ View full report in Codecov by Sentry.

@zjgemi zjgemi changed the title from "[WIP] Refactor finetune" to "Refactor finetune" on Aug 1, 2024

@wanghan-iapcm wanghan-iapcm left a comment


LGTM

-elif (
-    do_init_model or finetune_mode == "train-init"
-) and not init_model_with_finetune:
+elif do_init_model and not init_model_with_finetune:
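To illustrate the simplified branching under review: after this change there is one branch per training mode, and the extra finetune_mode == "train-init" check is gone. The helper below is a hedged, simplified stand-in for the command construction in run_dp_train.py, not the exact implementation; only the DeePMD-kit flags --finetune and --init-frz-model, already mentioned in the do_finetune documentation above, are taken from the source.

```python
# Illustrative only: a simplified stand-in, not the dpgen2 implementation.
def make_train_command_sketch(
    dp_command,                # e.g. ["dp"]
    train_script_name,         # e.g. "input.json"
    do_init_model,             # continue from the previous iteration's model
    init_model,                # path to the previous or pretrained model
    init_model_with_finetune,  # init-model steps that themselves finetune
    finetune_mode,             # "finetune" or "no" after this refactor
):
    if finetune_mode == "finetune":
        # First iteration with do_finetune=True: finetune the pretrained model.
        return dp_command + ["train", train_script_name, "--finetune", str(init_model)]
    elif do_init_model and not init_model_with_finetune:
        # Later iterations: initialize from the previous iteration's frozen model.
        return dp_command + ["train", train_script_name, "--init-frz-model", str(init_model)]
    # Otherwise train from scratch.
    return dp_command + ["train", train_script_name]
```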


it seems that _make_train_command_old is not used anymore, we may simply remove the function.

zjgemi (Collaborator, Author)


_make_train_command_old has been removed.

Signed-off-by: zjgemi <liuxin_zijian@163.com>
@zjgemi zjgemi merged commit 777c7fa into deepmodeling:master Aug 5, 2024
@coderabbitai coderabbitai bot mentioned this pull request Dec 8, 2025
