⚡️ Speed up function apply_tweaks by 99% in PR #9467 (block-code-on-tweaks) #9468

Closed
codeflash-ai[bot] wants to merge 3 commits into main from codeflash/optimize-pr9467-2025-08-21T12.02.06

codeflash-ai Bot commented Aug 21, 2025

⚡️ This pull request contains optimizations for PR #9467

If you approve this dependent PR, these changes will be merged into the original PR branch block-code-on-tweaks.

This PR will be automatically closed if the original PR is merged.


📄 99% (0.99x) speedup for apply_tweaks in src/backend/base/langflow/processing/process.py

⏱️ Runtime : 3.83 milliseconds → 1.92 milliseconds (best of 168 runs)
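
The headline figure can be checked against the two runtimes. The sketch below assumes the speedup is computed as original time over optimized time, minus one; that formula is an assumption that happens to reproduce the numbers reported here, not something this page states. The function name `speedup` is illustrative.

```python
def speedup(original_ms: float, optimized_ms: float) -> float:
    """Fractional speedup, assuming the definition original / optimized - 1."""
    return original_ms / optimized_ms - 1.0


# Runtimes reported above, in milliseconds.
ratio = speedup(3.83, 1.92)
print(f"{ratio:.0%} ({ratio:.2f}x)")
```

Under this reading, a "99% speedup" means the optimized function takes roughly half the original time.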

📝 Explanation and details

Here is a highly optimized rewrite of your program, focusing on major runtime bottlenecks visible in your line profiler, while preserving behavioral and stylistic constraints.

Key optimizations:

  • validate_and_repair_json: The repair_json call is the dominant bottleneck. If the input is already valid and parseable with json.loads, you can skip the expensive repair step.
  • apply_tweaks:
    • Move invariants like template_data[tweak_name] and 'type' to local variables once per loop to avoid unnecessary repeated lookups.
    • Unify/flatten inner checks and avoid redundant dictionary accesses.
    • Hoist the repeated check for "file" type: do it once per tweak rather than once per subkey.
    • Use local variables for looked-up dictionary entries to avoid repeated hash table ops.
    • Slight loop unrolling and early continue for the most frequent paths.
    • Avoid unnecessary checks: the presence of tweak_name in template_data is checked once at the start, and not inside branches.
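
A minimal sketch of the fast path described for validate_and_repair_json, under stated assumptions: the real function's signature and its repair library are not shown on this page, so the repair step is passed in as a parameter and `naive_repair` is a hypothetical stand-in. Returning the original string when repair fails follows the test docstrings further down ("invalid JSON string tweak returns the string"), not the actual source.

```python
import json
from typing import Any, Callable


def parse_with_fast_path(text: str, repair: Callable[[str], str]) -> Any:
    """Parse JSON, calling the expensive repair step only when plain parsing fails."""
    try:
        # Fast path: already-valid JSON never reaches the repair function.
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        # Slow path: repair first, then parse the repaired text.
        return json.loads(repair(text))
    except json.JSONDecodeError:
        # Assumed fallback: hand the original string back unchanged.
        return text


def naive_repair(text: str) -> str:
    """Hypothetical stand-in for a real repair routine: appends one closing brace."""
    return text + "}"
```

Because valid input is handled entirely by `json.loads`, the repair cost is only paid for malformed strings.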

Code below:


Summary of improvements:

  • validate_and_repair_json now avoids calling repair_json on already-valid JSON strings, drastically reducing unnecessary repair time.
  • apply_tweaks now pulls tpl_entry and its "type" to local variables, saving dictionary lookups.
  • Inner "file" type check is performed once per entry, not per subkey.
  • Short-circuit continues remove redundant checks.
  • All dictionary gets are batched/hoisted for speed.

Behavior and structure remain unchanged, so all logging, warnings, returns, and type annotations are preserved per your requirements.
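
The hoisting described above can be sketched as follows. This is not the PR's code, which is not reproduced on this page; `apply_tweaks_sketch` is a hypothetical reconstruction from the bullet points and the test docstrings below, and it omits the logging and JSON repair that the real function is said to perform.

```python
import json
from typing import Any


def apply_tweaks_sketch(node: dict[str, Any], tweaks: dict[str, Any]) -> None:
    """Apply tweaks in place, looking up each template entry and its type once."""
    template = node.get("data", {}).get("node", {}).get("template")
    if not isinstance(template, dict):
        return  # nothing to tweak
    for name, value in tweaks.items():
        entry = template.get(name)  # single lookup, reused in every branch
        if entry is None or name == "code":
            continue  # unknown keys and the protected "code" field are skipped
        entry_type = entry.get("type")
        is_file = entry_type == "file"  # computed once per tweak
        if entry_type == "NestedDict":
            if isinstance(value, str):
                try:
                    value = json.loads(value)
                except json.JSONDecodeError:
                    pass  # assumed: keep the raw string if it cannot be parsed
            entry["value"] = value
        elif isinstance(value, dict):
            for key, sub_value in value.items():
                entry["file_path" if is_file else key] = sub_value
        else:
            entry["file_path" if is_file else "value"] = value
```

The point of the sketch is the shape of the loop: `entry` and `is_file` are local variables, so no branch repeats the `template[name]` or `["type"]` lookup.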

Correctness verification report:

Test | Status
⏪ Replay Tests | 🔘 None Found
⚙️ Existing Unit Tests | 8 Passed
🔎 Concolic Coverage Tests | 🔘 None Found
🌀 Generated Regression Tests | 41 Passed
📊 Tests Coverage | 100.0%
⚙️ Existing Unit Tests and Runtime
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import copy
import json
from typing import Any

# imports
import pytest  # used for our unit tests
from langflow.processing.process import apply_tweaks

# --------------------------
# Unit tests for apply_tweaks
# --------------------------

# --------------------------
# 1. Basic Test Cases
# --------------------------

def test_basic_value_override():
    """Test simple value override for a non-nested, non-file field."""
    node = {
        "data": {
            "node": {
                "template": {
                    "foo": {"type": "str", "value": "bar"},
                }
            }
        }
    }
    tweaks = {"foo": "baz"}
    apply_tweaks(node, tweaks)

def test_basic_file_type_override():
    """Test file type override sets 'file_path' instead of 'value'."""
    node = {
        "data": {
            "node": {
                "template": {
                    "myfile": {"type": "file", "file_path": "/tmp/old.txt"},
                }
            }
        }
    }
    tweaks = {"myfile": "/tmp/new.txt"}
    apply_tweaks(node, tweaks)

def test_basic_nested_dict_override():
    """Test NestedDict type with a valid JSON string tweak."""
    node = {
        "data": {
            "node": {
                "template": {
                    "settings": {"type": "NestedDict", "value": {"a": 1}},
                }
            }
        }
    }
    tweaks = {"settings": '{"b":2}'}
    apply_tweaks(node, tweaks)

def test_basic_dict_tweak_updates_keys():
    """Test that a dict tweak updates keys for a non-file, non-NestedDict type."""
    node = {
        "data": {
            "node": {
                "template": {
                    "params": {"type": "dict", "value": 1, "other": 2},
                }
            }
        }
    }
    tweaks = {"params": {"value": 42, "other": 99}}
    apply_tweaks(node, tweaks)

def test_basic_dict_tweak_file_type():
    """Test dict tweak for file type sets 'file_path' for each key."""
    node = {
        "data": {
            "node": {
                "template": {
                    "myfile": {"type": "file", "file_path": "/tmp/old.txt"},
                }
            }
        }
    }
    tweaks = {"myfile": {"some": "value", "another": "test"}}
    apply_tweaks(node, tweaks)

def test_tweak_key_not_in_template():
    """Test that tweaks for keys not in template are ignored."""
    node = {
        "data": {
            "node": {
                "template": {
                    "foo": {"type": "str", "value": "bar"},
                }
            }
        }
    }
    tweaks = {"not_in_template": "baz"}
    before = copy.deepcopy(node)
    apply_tweaks(node, tweaks)
    assert node == before  # unknown tweak keys must leave the node untouched

# --------------------------
# 2. Edge Test Cases
# --------------------------

def test_template_data_missing():
    """Test when 'template' is missing from the node."""
    node = {
        "data": {
            "node": {
                # no 'template' key
            }
        }
    }
    tweaks = {"foo": "baz"}
    before = copy.deepcopy(node)
    apply_tweaks(node, tweaks)
    assert node == before  # no template, so nothing should change

def test_template_data_not_dict():
    """Test when 'template' is not a dict (e.g., None)."""
    node = {
        "data": {
            "node": {
                "template": None
            }
        }
    }
    tweaks = {"foo": "baz"}
    before = copy.deepcopy(node)
    apply_tweaks(node, tweaks)
    assert node == before  # a non-dict template should be left as is

def test_tweak_code_field_ignored():
    """Test that tweaks to 'code' field are ignored for security."""
    node = {
        "data": {
            "node": {
                "template": {
                    "code": {"type": "str", "value": "print('hello')"},
                }
            }
        }
    }
    tweaks = {"code": "print('hacked')"}
    before = copy.deepcopy(node)
    apply_tweaks(node, tweaks)
    assert node == before  # the 'code' field must not be overridden

def test_nested_dict_invalid_json_string():
    """Test NestedDict with an invalid JSON string tweak returns the string."""
    node = {
        "data": {
            "node": {
                "template": {
                    "settings": {"type": "NestedDict", "value": {"a": 1}},
                }
            }
        }
    }
    tweaks = {"settings": '{"b":2'}  # invalid JSON
    apply_tweaks(node, tweaks)

def test_tweak_value_is_dict_for_nested_dict_type():
    """Test NestedDict with a dict as tweak value (should set as is)."""
    node = {
        "data": {
            "node": {
                "template": {
                    "settings": {"type": "NestedDict", "value": {"a": 1}},
                }
            }
        }
    }
    tweaks = {"settings": {"c": 3}}
    apply_tweaks(node, tweaks)

def test_file_type_with_int_tweak():
    """Test file type with an int tweak (should set as file_path)."""
    node = {
        "data": {
            "node": {
                "template": {
                    "myfile": {"type": "file", "file_path": "/tmp/old.txt"},
                }
            }
        }
    }
    tweaks = {"myfile": 123456}
    apply_tweaks(node, tweaks)

def test_non_str_dict_tweak_for_non_file():
    """Test dict tweak for non-file type with non-string keys."""
    node = {
        "data": {
            "node": {
                "template": {
                    "params": {"type": "dict", "value": 1, "other": 2},
                }
            }
        }
    }
    tweaks = {"params": {0: "zero", 1: "one"}}
    apply_tweaks(node, tweaks)

def test_empty_tweaks_dict():
    """Test that an empty tweaks dict does not change the node."""
    node = {
        "data": {
            "node": {
                "template": {
                    "foo": {"type": "str", "value": "bar"},
                }
            }
        }
    }
    tweaks = {}
    before = copy.deepcopy(node)
    apply_tweaks(node, tweaks)
    assert node == before  # empty tweaks must not change the node

def test_empty_template_dict():
    """Test that an empty template dict does not error."""
    node = {
        "data": {
            "node": {
                "template": {}
            }
        }
    }
    tweaks = {"foo": "baz"}
    before = copy.deepcopy(node)
    apply_tweaks(node, tweaks)
    assert node == before  # an empty template has nothing to tweak

# --------------------------
# 3. Large Scale Test Cases
# --------------------------

def test_large_number_of_fields():
    """Test node with many fields in template and tweaks."""
    N = 500
    node = {
        "data": {
            "node": {
                "template": {
                    f"field{i}": {"type": "str", "value": f"old{i}"} for i in range(N)
                }
            }
        }
    }
    tweaks = {f"field{i}": f"new{i}" for i in range(N)}
    apply_tweaks(node, tweaks)
    for i in range(N):
        pass

def test_large_nested_dict_fields():
    """Test node with many NestedDict fields and tweaks as JSON strings."""
    N = 200
    node = {
        "data": {
            "node": {
                "template": {
                    f"nested{i}": {"type": "NestedDict", "value": {"old": i}} for i in range(N)
                }
            }
        }
    }
    tweaks = {f"nested{i}": json.dumps({"new": i}) for i in range(N)}
    apply_tweaks(node, tweaks)
    for i in range(N):
        pass

def test_large_file_type_fields():
    """Test node with many file type fields and tweaks."""
    N = 300
    node = {
        "data": {
            "node": {
                "template": {
                    f"file{i}": {"type": "file", "file_path": f"/tmp/old{i}.txt"} for i in range(N)
                }
            }
        }
    }
    tweaks = {f"file{i}": f"/tmp/new{i}.txt" for i in range(N)}
    apply_tweaks(node, tweaks)
    for i in range(N):
        pass

def test_large_mixed_types():
    """Test node with a mix of types and tweaks."""
    N = 100
    node = {
        "data": {
            "node": {
                "template": {
                    f"str{i}": {"type": "str", "value": f"old{i}"} for i in range(N)
                } | {
                    f"file{i}": {"type": "file", "file_path": f"/tmp/old{i}.txt"} for i in range(N)
                } | {
                    f"nested{i}": {"type": "NestedDict", "value": {"old": i}} for i in range(N)
                }
            }
        }
    }
    tweaks = {
        **{f"str{i}": f"new{i}" for i in range(N)},
        **{f"file{i}": f"/tmp/new{i}.txt" for i in range(N)},
        **{f"nested{i}": json.dumps({"new": i}) for i in range(N)}
    }
    apply_tweaks(node, tweaks)
    for i in range(N):
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from __future__ import annotations

import copy
import json
from typing import Any

# imports
import pytest  # used for our unit tests
from langflow.processing.process import apply_tweaks

# unit tests

# ------------------ BASIC TEST CASES ------------------

def test_basic_value_update():
    """Test updating a simple value field."""
    node = {
        "data": {
            "node": {
                "template": {
                    "param1": {"type": "str", "value": "old"}
                }
            }
        }
    }
    tweaks = {"param1": "new"}
    apply_tweaks(node, tweaks)

def test_basic_file_type_update():
    """Test updating a file type field."""
    node = {
        "data": {
            "node": {
                "template": {
                    "file_param": {"type": "file", "file_path": "old/path.txt"}
                }
            }
        }
    }
    tweaks = {"file_param": "new/path.txt"}
    apply_tweaks(node, tweaks)

def test_basic_nested_dict_update():
    """Test updating a NestedDict field with a JSON string."""
    node = {
        "data": {
            "node": {
                "template": {
                    "nested": {"type": "NestedDict", "value": {"a": 1}}
                }
            }
        }
    }
    tweaks = {"nested": '{"b": 2, "c": 3}'}
    apply_tweaks(node, tweaks)

def test_basic_dict_update_for_non_nested():
    """Test updating a field with a dict (should update keys inside)."""
    node = {
        "data": {
            "node": {
                "template": {
                    "param": {"type": "str", "value": "old", "extra": 1}
                }
            }
        }
    }
    tweaks = {"param": {"value": "new", "extra": 2}}
    apply_tweaks(node, tweaks)

def test_basic_dict_update_for_file_type():
    """Test updating a file type with a dict (should map to file_path)."""
    node = {
        "data": {
            "node": {
                "template": {
                    "file_param": {"type": "file", "file_path": "old/path.txt", "meta": "foo"}
                }
            }
        }
    }
    tweaks = {"file_param": {"file_path": "new/path.txt", "meta": "bar"}}
    apply_tweaks(node, tweaks)

def test_ignore_nonexistent_tweak_key():
    """Test that tweaks for keys not in template are ignored."""
    node = {
        "data": {
            "node": {
                "template": {
                    "param": {"type": "str", "value": "old"}
                }
            }
        }
    }
    tweaks = {"not_in_template": "value"}
    apply_tweaks(node, tweaks)

def test_ignore_code_field_override():
    """Test that the 'code' field cannot be overridden."""
    node = {
        "data": {
            "node": {
                "template": {
                    "code": {"type": "str", "value": "print('hi')"}
                }
            }
        }
    }
    tweaks = {"code": "malicious_code"}
    original = copy.deepcopy(node)
    apply_tweaks(node, tweaks)
    assert node == original  # the 'code' field must not be overridden

# ------------------ EDGE TEST CASES ------------------

def test_template_data_missing():
    """Test when template data is missing."""
    node = {
        "data": {
            "node": {
                # No 'template' key
            }
        }
    }
    tweaks = {"param": "value"}
    original = copy.deepcopy(node)
    apply_tweaks(node, tweaks)
    assert node == original  # no template, so nothing should change

def test_template_data_not_dict():
    """Test when template data is not a dictionary."""
    node = {
        "data": {
            "node": {
                "template": "not_a_dict"
            }
        }
    }
    tweaks = {"param": "value"}
    original = copy.deepcopy(node)
    apply_tweaks(node, tweaks)
    assert node == original  # a non-dict template should be left as is

def test_tweak_value_is_dict_for_nested_dict_type():
    """Test when tweak value is dict for NestedDict type (should set as-is)."""
    node = {
        "data": {
            "node": {
                "template": {
                    "nested": {"type": "NestedDict", "value": {"a": 1}}
                }
            }
        }
    }
    tweaks = {"nested": {"b": 2}}
    apply_tweaks(node, tweaks)

def test_tweak_value_is_invalid_json_for_nested_dict():
    """Test when tweak value is invalid JSON for NestedDict (should leave as string)."""
    node = {
        "data": {
            "node": {
                "template": {
                    "nested": {"type": "NestedDict", "value": {"a": 1}}
                }
            }
        }
    }
    tweaks = {"nested": "{invalid json"}
    apply_tweaks(node, tweaks)

def test_tweak_value_is_none():
    """Test when tweak value is None."""
    node = {
        "data": {
            "node": {
                "template": {
                    "param": {"type": "str", "value": "old"}
                }
            }
        }
    }
    tweaks = {"param": None}
    apply_tweaks(node, tweaks)

def test_tweak_value_is_empty_string():
    """Test when tweak value is an empty string."""
    node = {
        "data": {
            "node": {
                "template": {
                    "param": {"type": "str", "value": "old"}
                }
            }
        }
    }
    tweaks = {"param": ""}
    apply_tweaks(node, tweaks)

def test_tweak_value_is_empty_dict():
    """Test when tweak value is an empty dict for a non-NestedDict type."""
    node = {
        "data": {
            "node": {
                "template": {
                    "param": {"type": "str", "value": "old", "foo": "bar"}
                }
            }
        }
    }
    tweaks = {"param": {}}
    apply_tweaks(node, tweaks)

def test_tweak_value_is_list():
    """Test when tweak value is a list (should set as value)."""
    node = {
        "data": {
            "node": {
                "template": {
                    "param": {"type": "str", "value": "old"}
                }
            }
        }
    }
    tweaks = {"param": [1, 2, 3]}
    apply_tweaks(node, tweaks)

def test_file_type_with_dict_and_extra_keys():
    """Test file type with dict tweak having extra keys."""
    node = {
        "data": {
            "node": {
                "template": {
                    "file_param": {"type": "file", "file_path": "old/path.txt", "desc": "old"}
                }
            }
        }
    }
    tweaks = {"file_param": {"file_path": "new/path.txt", "desc": "new", "other": 123}}
    apply_tweaks(node, tweaks)

def test_multiple_tweaks():
    """Test updating multiple fields at once."""
    node = {
        "data": {
            "node": {
                "template": {
                    "a": {"type": "str", "value": 1},
                    "b": {"type": "file", "file_path": "foo"},
                    "c": {"type": "NestedDict", "value": {"x": 1}}
                }
            }
        }
    }
    tweaks = {"a": 2, "b": "bar", "c": '{"y": 2}'}
    apply_tweaks(node, tweaks)

# ------------------ LARGE SCALE TEST CASES ------------------

def test_large_number_of_template_fields():
    """Test with a large number of template fields to ensure scalability."""
    template = {f"param{i}": {"type": "str", "value": i} for i in range(500)}
    node = {
        "data": {
            "node": {
                "template": template
            }
        }
    }
    tweaks = {f"param{i}": i * 2 for i in range(500)}
    apply_tweaks(node, tweaks)
    for i in range(500):
        pass

def test_large_number_of_tweaks_with_some_invalid():
    """Test with a large number of tweaks, some of which do not exist in the template."""
    template = {f"param{i}": {"type": "str", "value": i} for i in range(500)}
    node = {
        "data": {
            "node": {
                "template": template
            }
        }
    }
    tweaks = {f"param{i}": i * 2 for i in range(400)}
    tweaks.update({f"notparam{i}": "foo" for i in range(100)})
    apply_tweaks(node, tweaks)
    for i in range(400):
        pass
    for i in range(400, 500):
        pass

def test_large_nested_dicts():
    """Test with large NestedDict fields."""
    template = {f"nested{i}": {"type": "NestedDict", "value": {"a": i}} for i in range(100)}
    node = {
        "data": {
            "node": {
                "template": template
            }
        }
    }
    tweaks = {f"nested{i}": '{"b": %d}' % (i*2) for i in range(100)}
    apply_tweaks(node, tweaks)
    for i in range(100):
        pass

def test_large_file_type_dicts():
    """Test with many file type fields, using dict tweaks."""
    template = {f"file{i}": {"type": "file", "file_path": f"path{i}.txt", "meta": f"old{i}"} for i in range(100)}
    node = {
        "data": {
            "node": {
                "template": template
            }
        }
    }
    tweaks = {f"file{i}": {"file_path": f"newpath{i}.txt", "meta": f"new{i}"} for i in range(100)}
    apply_tweaks(node, tweaks)
    for i in range(100):
        pass

def test_large_mix_of_types():
    """Test a large template with a mix of str, file, and NestedDict types."""
    template = {}
    tweaks = {}
    for i in range(300):
        template[f"str{i}"] = {"type": "str", "value": i}
        tweaks[f"str{i}"] = i + 1
    for i in range(100):
        template[f"file{i}"] = {"type": "file", "file_path": f"path{i}.txt"}
        tweaks[f"file{i}"] = f"newpath{i}.txt"
    for i in range(50):
        template[f"nested{i}"] = {"type": "NestedDict", "value": {"a": i}}
        tweaks[f"nested{i}"] = '{"b": %d}' % (i+10)
    node = {"data": {"node": {"template": template}}}
    apply_tweaks(node, tweaks)
    for i in range(300):
        pass
    for i in range(100):
        pass
    for i in range(50):
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
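
Since apply_tweaks mutates the node in place, one way to compare an original and an optimized implementation is to run each on its own deep copy and compare both the return values and the resulting nodes. The sketch below illustrates that idea only; it is not Codeflash's actual mechanism, and `check_equivalent`, `set_value_slow`, and `set_value_fast` are hypothetical names with toy bodies.

```python
import copy
from typing import Any, Callable

Node = dict[str, Any]
TweakFn = Callable[[Node, dict[str, Any]], Any]


def check_equivalent(original: TweakFn, optimized: TweakFn,
                     node: Node, tweaks: dict[str, Any]) -> None:
    """Raise AssertionError if the two functions return or mutate differently."""
    node_a, node_b = copy.deepcopy(node), copy.deepcopy(node)
    result_a = original(node_a, copy.deepcopy(tweaks))
    result_b = optimized(node_b, copy.deepcopy(tweaks))
    if result_a != result_b:
        raise AssertionError("return values differ")
    if node_a != node_b:
        raise AssertionError("in-place mutations differ")


def set_value_slow(node: Node, tweaks: dict[str, Any]) -> None:
    """Toy baseline: repeats the template lookup on every access."""
    for name, value in tweaks.items():
        if name in node["template"]:
            node["template"][name]["value"] = value


def set_value_fast(node: Node, tweaks: dict[str, Any]) -> None:
    """Toy optimized version: hoists the lookups into local variables."""
    template = node["template"]
    for name, value in tweaks.items():
        entry = template.get(name)
        if entry is not None:
            entry["value"] = value


sample = {"template": {"a": {"type": "str", "value": 1}}}
check_equivalent(set_value_slow, set_value_fast, sample, {"a": 2, "zzz": 3})
```

Deep-copying the inputs keeps the two runs independent, so a mutation made by one implementation cannot mask a difference in the other.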

To edit these changes git checkout codeflash/optimize-pr9467-2025-08-21T12.02.06 and push.

ogabrielluiz and others added 3 commits August 21, 2025 08:48
…n-tweaks`)

@codeflash-ai codeflash-ai Bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Aug 21, 2025
coderabbitai Bot commented Aug 21, 2025

Important

Review skipped

Bot user detected.

Base automatically changed from block-code-on-tweaks to main August 25, 2025 18:21
@codeflash-ai codeflash-ai Bot closed this Aug 25, 2025

codeflash-ai Bot commented Aug 25, 2025

This PR has been automatically closed because the original PR #9297 by Empreiteiro was closed.

@codeflash-ai codeflash-ai Bot deleted the codeflash/optimize-pr9467-2025-08-21T12.02.06 branch August 25, 2025 18:50