⚡️ Speed up function `post_process_raw` by 35% in PR #9321 (`loguru-to-structlog`) by codeflash-ai[bot] · Pull Request #9477 · langflow-ai/langflow

codeflash-ai · 2025-08-21T18:28:28Z

⚡️ This pull request contains optimizations for PR #9321

If you approve this dependent PR, these changes will be merged into the original PR branch loguru-to-structlog.

This PR will be automatically closed if the original PR is merged.

📄 35% (0.35x) speedup for `post_process_raw` in `src/backend/base/langflow/schema/artifact.py`

⏱️ Runtime : 12.7 milliseconds → 9.37 milliseconds (best of 66 runs)
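The headline percentage can be checked against the two runtimes. The sketch below assumes the speedup is defined as the time saved divided by the optimized runtime. That definition is inferred from the numbers, not stated in the report.

```python
# Runtimes reported above, in milliseconds.
original_ms = 12.7
optimized_ms = 9.37

# Assumed definition: time saved relative to the optimized runtime.
speedup = (original_ms - optimized_ms) / optimized_ms

# Prints a value a little above 0.355, in line with the reported 35% (0.35x).
print(f"{speedup:.4f}")
```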

📝 Explanation and details

Key optimizations:

- In `_to_list_of_dicts`, checks are re-ordered: `isinstance(item, BaseModel)` is much faster than `hasattr`. If not a BaseModel but has `model_dump`, we fall back to that (preserves behavior if both exist).
- In `post_process_raw`, `type(raw) is DataFrame` for the most common path is marginally faster than `isinstance`.
- For `dict`/`BaseModel` union check, perform the faster `type(raw) is dict` first, then `isinstance` for `BaseModel`.
- Attribute/function lookups (`serialize`, `str`) are cached as local variables in `_to_list_of_dicts` to avoid repeated global lookups (micro-optimization).
- All comments and naming/style preserved as required.
- No behavioral changes whatsoever; all branch logic and exception handling remain exactly as before.
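The re-ordered checks and local-variable caching described above can be sketched as follows. This is a minimal illustration, not the code from `artifact.py`: `BaseModel` is a dependency-free stand-in for the Pydantic class, `DuckModel` and the `serialize` helper are hypothetical, and the final `str()` branch is an assumption based on the comments in the generated tests.

```python
from collections import OrderedDict


class BaseModel:
    """Stand-in for pydantic.BaseModel so the sketch has no dependencies."""

    def __init__(self, **fields):
        self._fields = fields

    def model_dump(self):
        return dict(self._fields)


class DuckModel:
    """Hypothetical object: not a BaseModel, but it exposes model_dump()."""

    def model_dump(self):
        return {"duck": True}


def serialize(obj):
    # Hypothetical placeholder for the project's serializer.
    return obj.model_dump()


def to_list_of_dicts(items):
    # Bind the names used inside the loop to locals once.
    serialize_ = serialize
    str_ = str
    out = []
    for item in items:
        if isinstance(item, BaseModel):    # cheaper check runs first
            out.append(serialize_(item))
        elif hasattr(item, "model_dump"):  # fallback for duck-typed objects
            out.append(item.model_dump())
        else:                              # assumed: stringify everything else
            out.append(str_(item))
    return out


converted = to_list_of_dicts([BaseModel(x=1), DuckModel(), 42])
print(converted)  # [{'x': 1}, {'duck': True}, '42']

# An exact type check does not match subclasses, unlike isinstance.
ordered = OrderedDict(a=1)
print(type(ordered) is dict, isinstance(ordered, dict))  # False True
```

Because `type(raw) is dict` is false for `dict` subclasses, an exact-type fast path matches `isinstance` only when subclasses never reach it or when an `isinstance` fallback follows it.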

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⏪ Replay Tests | 🔘 None Found |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 39 Passed |
| 📊 Tests Coverage | 80.0% |

🌀 Generated Regression Tests and Runtime

from typing import Any

# imports
import pytest  # used for our unit tests
from langflow.schema.artifact import post_process_raw


# Mocks and minimal stubs for dependencies
class ArtifactType:
    STREAM = type("STREAM", (), {"value": "stream"})
    ARRAY = type("ARRAY", (), {"value": "array"})
    OBJECT = type("OBJECT", (), {"value": "object"})
    UNKNOWN = type("UNKNOWN", (), {"value": "unknown"})

class DummyLogger:
    def debug(self, *args, **kwargs):
        pass

logger = DummyLogger()

# Minimal DataFrame stub
class DataFrame:
    def __init__(self, data):
        self._data = data
    def to_dict(self, orient="records"):
        # Only support "records"
        if orient != "records":
            raise ValueError("Only 'records' orient supported in stub")
        return self._data

# Minimal BaseModel stub
class BaseModel:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def dict(self):
        return self.__dict__
    def model_dump(self):
        return self.__dict__

# Dummy custom encoder for jsonable_encoder
CUSTOM_ENCODERS = {}

def jsonable_encoder(obj, custom_encoder=None):
    # Simple encoder: just return dict if possible
    if hasattr(obj, "dict"):
        return obj.dict()
    elif isinstance(obj, dict):
        return obj
    raise TypeError("Cannot encode")
from langflow.schema.artifact import post_process_raw

# ------------------- UNIT TESTS -------------------

# 1. Basic Test Cases

def test_stream_artifact_type_returns_empty_string():
    # Should always return "" for stream, regardless of input
    result, art_type = post_process_raw("some raw data", ArtifactType.STREAM.value)


def test_array_artifact_type_with_list_of_dicts():
    # List of dicts should be returned as is
    data = [{"x": 1}, {"x": 2}]
    result, art_type = post_process_raw(data, ArtifactType.ARRAY.value)

def test_array_artifact_type_with_list_of_basemodels():
    # List of BaseModel should be serialized
    class MyModel(BaseModel):
        pass
    data = [MyModel(foo=1), MyModel(bar=2)]
    result, art_type = post_process_raw(data, ArtifactType.ARRAY.value)

def test_unknown_artifact_type_with_dict():
    # Should jsonable_encode and set artifact_type to OBJECT
    data = {"a": 1, "b": 2}
    result, art_type = post_process_raw(data, ArtifactType.UNKNOWN.value)

def test_unknown_artifact_type_with_basemodel():
    # Should jsonable_encode and set artifact_type to OBJECT
    class MyModel(BaseModel):
        pass
    model = MyModel(foo="bar")
    result, art_type = post_process_raw(model, ArtifactType.UNKNOWN.value)

def test_unknown_artifact_type_with_none():
    # Should return default message, artifact_type unchanged
    result, art_type = post_process_raw(None, ArtifactType.UNKNOWN.value)

def test_unknown_artifact_type_with_non_dict_non_basemodel():
    # Should return default message, artifact_type unchanged
    result, art_type = post_process_raw(123, ArtifactType.UNKNOWN.value)

# 2. Edge Test Cases

def test_array_artifact_type_with_empty_list():
    # Should handle empty list gracefully
    result, art_type = post_process_raw([], ArtifactType.ARRAY.value)

def test_array_artifact_type_with_list_of_mixed_types():
    # Should serialize BaseModel, str() everything else
    class MyModel(BaseModel):
        pass
    data = [MyModel(foo=1), 42, "hello", {"bar": 2}]
    result, art_type = post_process_raw(data, ArtifactType.ARRAY.value)

def test_array_artifact_type_with_list_of_empty_dicts():
    result, art_type = post_process_raw([{}, {}], ArtifactType.ARRAY.value)

def test_array_artifact_type_with_non_iterable():
    # Should raise TypeError because _to_list_of_dicts expects iterable
    with pytest.raises(TypeError):
        post_process_raw(42, ArtifactType.ARRAY.value)

def test_unknown_artifact_type_with_unserializable_object():
    # Should catch exception in jsonable_encoder and return default message
    class Unserializable:
        pass
    # Patch jsonable_encoder to raise
    def bad_encoder(obj, custom_encoder=None):
        raise Exception("fail")
    global jsonable_encoder
    orig_encoder = jsonable_encoder
    jsonable_encoder = bad_encoder
    try:
        result, art_type = post_process_raw(Unserializable(), ArtifactType.UNKNOWN.value)
    finally:
        jsonable_encoder = orig_encoder

def test_array_artifact_type_with_list_of_none():
    # Should str(None) for each element
    result, art_type = post_process_raw([None, None], ArtifactType.ARRAY.value)

def test_unknown_artifact_type_with_falsey_non_none():
    # Should return default message for e.g. empty string
    result, art_type = post_process_raw("", ArtifactType.UNKNOWN.value)

def test_array_artifact_type_with_list_of_lists():
    # Should str() each sublist
    result, art_type = post_process_raw([[1,2], [3,4]], ArtifactType.ARRAY.value)

def test_array_artifact_type_with_list_of_objects_with_dict_method():
    # Should serialize using dict()
    class HasDict:
        def dict(self):
            return {"foo": "bar"}
    data = [HasDict()]
    result, art_type = post_process_raw(data, ArtifactType.ARRAY.value)

# 3. Large Scale Test Cases

def test_array_artifact_type_with_large_list_of_basemodels():
    # Should serialize all BaseModel instances correctly
    class MyModel(BaseModel):
        pass
    data = [MyModel(idx=i) for i in range(500)]
    result, art_type = post_process_raw(data, ArtifactType.ARRAY.value)
    for i, item in enumerate(result):
        pass


def test_array_artifact_type_with_large_list_of_mixed_types():
    # Should str() all non-BaseModel, serialize BaseModel
    class MyModel(BaseModel):
        pass
    data = [MyModel(idx=i) if i % 2 == 0 else {"foo": i} for i in range(1000)]
    result, art_type = post_process_raw(data, ArtifactType.ARRAY.value)
    for i in range(1000):
        if i % 2 == 0:
            pass
        else:
            pass

def test_unknown_artifact_type_with_large_dict():
    # Should handle large dicts as well
    data = {f"key_{i}": i for i in range(900)}
    result, art_type = post_process_raw(data, ArtifactType.UNKNOWN.value)
    for i in range(900):
        pass

def test_array_artifact_type_with_large_list_of_none():
    result, art_type = post_process_raw([None] * 999, ArtifactType.ARRAY.value)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from typing import Any

# imports
import pytest
from langflow.schema.artifact import post_process_raw


# Simulate ArtifactType enum
class ArtifactType:
    STREAM = type("EnumVal", (), {"value": "stream"})
    ARRAY = type("EnumVal", (), {"value": "array"})
    OBJECT = type("EnumVal", (), {"value": "object"})
    UNKNOWN = type("EnumVal", (), {"value": "unknown"})

# Simulate DataFrame for array processing
class DataFrame:
    def __init__(self, data):
        self._data = data
    def to_dict(self, orient="records"):
        # Simulate pandas DataFrame to_dict
        if orient != "records":
            raise ValueError("Only 'records' orient is supported in this stub")
        return list(self._data)

# Simulate Pydantic BaseModel
class BaseModel:
    def __init__(self, **kwargs):
        self._data = kwargs
    def dict(self):
        return dict(self._data)
    def __eq__(self, other):
        return isinstance(other, BaseModel) and self._data == other._data

# Simulate a model with .model_dump (like Pydantic v2)
class ModelDump:
    def __init__(self, **kwargs):
        self._data = kwargs
    def model_dump(self):
        return dict(self._data)
    def __eq__(self, other):
        return isinstance(other, ModelDump) and self._data == other._data

# Simulate custom encoders and jsonable_encoder
CUSTOM_ENCODERS = {}
def jsonable_encoder(obj, custom_encoder=None):
    # Just return dict for BaseModel or ModelDump, else the object itself
    if isinstance(obj, BaseModel):
        return obj.dict()
    elif isinstance(obj, dict):
        return obj
    elif hasattr(obj, "model_dump"):
        return obj.model_dump()
    else:
        raise TypeError("Cannot encode")

# Simulate logger
class LoggerStub:
    def debug(self, msg, exc_info=None):
        pass
logger = LoggerStub()
from langflow.schema.artifact import post_process_raw

# ----------------------
# Unit tests start here
# ----------------------

# 1. Basic Test Cases

def test_stream_type_returns_empty_string():
    # Should always return empty string for stream type, regardless of input
    result, art_type = post_process_raw("anything", ArtifactType.STREAM.value)


def test_array_type_with_list_of_basemodel():
    # Should serialize each BaseModel in the list
    arr = [BaseModel(x=1), BaseModel(x=2)]
    result, art_type = post_process_raw(arr, ArtifactType.ARRAY.value)

def test_array_type_with_list_of_model_dump():
    arr = [ModelDump(y=3), ModelDump(y=4)]
    result, art_type = post_process_raw(arr, ArtifactType.ARRAY.value)

def test_array_type_with_list_of_primitives():
    arr = [1, "a", 3.5]
    result, art_type = post_process_raw(arr, ArtifactType.ARRAY.value)

def test_unknown_type_with_basemodel():
    bm = BaseModel(a=5)
    result, art_type = post_process_raw(bm, ArtifactType.UNKNOWN.value)

def test_unknown_type_with_dict():
    d = {"foo": "bar"}
    result, art_type = post_process_raw(d, ArtifactType.UNKNOWN.value)

def test_unknown_type_with_model_dump():
    md = ModelDump(z=7)
    # Should fallback to default message as not isinstance(md, BaseModel)
    result, art_type = post_process_raw(md, ArtifactType.UNKNOWN.value)

def test_unknown_type_with_non_serializable_object():
    class NotSerializable:
        pass
    obj = NotSerializable()
    result, art_type = post_process_raw(obj, ArtifactType.UNKNOWN.value)

def test_unknown_type_with_none():
    # Should just return None, artifact_type unchanged
    result, art_type = post_process_raw(None, ArtifactType.UNKNOWN.value)

# 2. Edge Test Cases

def test_array_type_with_empty_list():
    arr = []
    result, art_type = post_process_raw(arr, ArtifactType.ARRAY.value)

def test_array_type_with_list_of_mixed_types():
    arr = [BaseModel(a=1), 2, "foo", ModelDump(b=3)]
    result, art_type = post_process_raw(arr, ArtifactType.ARRAY.value)



def test_array_type_with_list_of_unserializable_objects():
    class Unserializable:
        def __str__(self): return "Unserializable"
    arr = [Unserializable()]
    result, art_type = post_process_raw(arr, ArtifactType.ARRAY.value)

def test_unknown_type_with_integer():
    # Should fallback to default message
    result, art_type = post_process_raw(123, ArtifactType.UNKNOWN.value)

def test_unknown_type_with_string():
    result, art_type = post_process_raw("hello", ArtifactType.UNKNOWN.value)

def test_array_type_with_tuple():
    arr = (BaseModel(a=1), 2)
    result, art_type = post_process_raw(arr, ArtifactType.ARRAY.value)


def test_array_type_with_large_list_of_basemodel():
    arr = [BaseModel(idx=i) for i in range(1000)]
    result, art_type = post_process_raw(arr, ArtifactType.ARRAY.value)


def test_array_type_with_large_list_of_primitives():
    arr = list(range(1000))
    result, art_type = post_process_raw(arr, ArtifactType.ARRAY.value)

def test_array_type_with_large_mixed_types():
    arr = [BaseModel(a=i) if i % 2 == 0 else i for i in range(1000)]
    result, art_type = post_process_raw(arr, ArtifactType.ARRAY.value)
    for i, item in enumerate(result):
        if i % 2 == 0:
            pass
        else:
            pass

def test_unknown_type_with_large_dict():
    d = {str(i): i for i in range(1000)}
    result, art_type = post_process_raw(d, ArtifactType.UNKNOWN.value)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-pr9321-2025-08-21T18.28.22` and push.

- Replaced instances of loguru logger with `langflow.logging.logger` across multiple files.
- Updated logging calls to use asynchronous methods where applicable (e.g., `await logger.awarning`).
- Ensured consistent logging practices throughout the codebase by standardizing the logger import.
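The switch to awaited log calls can be pictured with a small stand-in. This sketch does not use the project's `langflow.logging.logger`. `StubAsyncLogger` and `delete_file` are hypothetical, and only the method name `awarning` comes from the commit message above.

```python
import asyncio


class StubAsyncLogger:
    """Hypothetical stand-in that only mimics an awaitable `awarning` method."""

    def __init__(self):
        self.records = []

    async def awarning(self, event, **fields):
        # Yield to the event loop once, then record the event.
        await asyncio.sleep(0)
        self.records.append(("warning", event, fields))


logger = StubAsyncLogger()


async def delete_file(path):
    # Hypothetical caller: an async code path awaits the async log method
    # instead of calling a synchronous logger.warning(...).
    await logger.awarning("file missing", path=path)


asyncio.run(delete_file("/tmp/example.txt"))
print(logger.records)
```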

…omponents
- Updated logging calls in to use async logger methods for error handling and debugging.
- Modified to utilize async logging for error messages during file deletion.
- Changed logging in , , and other agent-related files to use async methods for error and debug messages.
- Refactored logging in various components including , , , and others to ensure consistent use of async logging.
- Updated , , and to replace synchronous logging with asynchronous counterparts.
- Ensured all logging changes maintain the original message structure while enhancing performance with async capabilities.

…c_info for better error tracing
- Updated logging statements in AssemblyAI components (e.g., assemblyai_get_subtitles, assemblyai_lemur, assemblyai_list_transcripts, etc.) to use `logger.debug` with `exc_info=True` for improved error context.
- Modified logging in various helper and utility functions to enhance error reporting.
- Ensured consistent logging practices across the codebase for better maintainability and debugging.
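The effect of `exc_info=True` can be shown with the standard library's `logging` module, used here only as a stand-in for the project's logger. The logger name and the `parse_number` helper are hypothetical.

```python
import io
import logging

# Capture log output in memory so the result can be inspected.
stream = io.StringIO()
log = logging.getLogger("exc_info_sketch")
log.setLevel(logging.DEBUG)
log.addHandler(logging.StreamHandler(stream))
log.propagate = False


def parse_number(text):
    try:
        return int(text)
    except ValueError:
        # exc_info=True attaches the active exception's traceback to the record.
        log.debug("could not parse %r", text, exc_info=True)
        return None


result = parse_number("not-a-number")
captured = stream.getvalue()
print(captured)
```

The captured text holds the debug message followed by the full `ValueError` traceback, which is the extra error context the commit message refers to.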

coderabbitai · 2025-08-21T18:28:35Z

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

sonarqubecloud · 2025-08-21T18:30:13Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

codeflash-ai · 2025-08-25T18:50:49Z

This PR has been automatically closed because the original PR #9297 by Empreiteiro was closed.

ogabrielluiz and others added 27 commits August 1, 2025 17:59

refactor: Enhance logging configuration with structured logging and buffer support

e4e41a1

feat: Add structlog dependency for enhanced logging support

a4021bd

refactor: Update ruff dependency to version 0.12.7 and remove unused pylint references

0517ac6

refactor: Add missing docstring rule to ruff configuration

c7cf175

[autofix.ci] apply automated fixes

9394abf

[autofix.ci] apply automated fixes (attempt 2/3)

d335758

Merge branch 'main' into loguru-to-structlog

3a068f3

[autofix.ci] apply automated fixes

0d9e6d2

Merge branch 'main' into loguru-to-structlog

28426ca

fix: update logger calls to use async methods in DatabaseService

55e6b0b

fix: update logger calls to use async methods in initialize_database and session_getter

64d98b4

fix: update logger calls to use async methods in LangflowRunnerExperimental

573deb7

fix: update logger calls to use async methods across various services

9c0f8a6

[autofix.ci] apply automated fixes

70f3a1c

fix: update logger calls to use async methods in various components

49ab10d

feat: add InterceptHandler to route standard logging messages to structlog

48e8e3b

refactor: remove async_file parameter from logger configuration

80fe199

fix: correct log level mapping and enhance log rotation validation

535304b

refactor: remove unused logging import and streamline schema imports

43c389c

refactor: remove InterceptHandler from logger configuration to avoid recursion

41ad0d2

refactor: enhance test coverage for logger module with comprehensive test cases

3225445

refactor: add rule to ignore mutable objects without __hash__ method in linter

e4923e9

fix various lint issues

ba00840

codeflash-ai Bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Aug 21, 2025

codeflash-ai Bot mentioned this pull request Aug 21, 2025

feat: migrate from loguru to structlog #9321

Merged

Base automatically changed from loguru-to-structlog to main August 22, 2025 18:20

codeflash-ai Bot closed this Aug 25, 2025

codeflash-ai Bot deleted the codeflash/optimize-pr9321-2025-08-21T18.28.22 branch August 25, 2025 18:50

codeflash-ai[bot] wants to merge 27 commits into main from codeflash/optimize-pr9321-2025-08-21T18.28.22
