
⚡️ Speed up function hamilton_filter by 21%#128

Open
codeflash-ai[bot] wants to merge 1 commit into main from
codeflash/optimize-hamilton_filter-mkphed51

Conversation


@codeflash-ai codeflash-ai Bot commented Jan 22, 2026

📄 21% (0.21x) speedup for hamilton_filter in quantecon/_filter.py

⏱️ Runtime : 1.62 milliseconds → 1.34 milliseconds (best of 35 runs)

📝 Explanation and details

The optimized code achieves a 20% speedup through two key memory allocation optimizations in the p is not None branch:

Key Optimizations

1. Matrix Initialization Efficiency (np.ones → np.empty + explicit assignment)

  • Original: X = np.ones((T-p-h+1, p+1)) initializes all elements to 1.0
  • Optimized: X = np.empty((T-p-h+1, p+1)) allocates uninitialized memory, then X[:, 0] = 1.0 sets only the first column
  • Why faster: np.empty() avoids unnecessary initialization of memory that will be overwritten in the loop. Since columns 1 through p are immediately assigned lag values, only column 0 needs initialization to 1.0.
  • Line profiler evidence: X matrix creation time drops from 371,019ns to 48,136ns (87% reduction)
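The difference between the two allocation strategies can be sketched as follows (an illustrative example with made-up sizes, not the library's code; the lag-column layout mirrors the algorithm described in the generated tests below):

```python
import numpy as np

# Illustrative sizes: T observations, p lags, horizon h (assumed T - p - h + 1 > 0)
T, p, h = 100, 4, 8
y = np.random.default_rng(0).standard_normal(T)
nrows = T - p - h + 1

# Original approach: every element initialized to 1.0, then mostly overwritten
X_ones = np.ones((nrows, p + 1))
for j in range(1, p + 1):
    X_ones[:, j] = y[p - j : p - j + nrows]  # lag-j column

# Optimized approach: uninitialized allocation; only column 0 needs the 1.0 fill
X_empty = np.empty((nrows, p + 1))
X_empty[:, 0] = 1.0
for j in range(1, p + 1):
    X_empty[:, j] = y[p - j : p - j + nrows]

# Both constructions yield the same regressor matrix
assert np.array_equal(X_ones, X_empty)
```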

2. Trend Array Construction (np.append → pre-allocated np.empty with slicing)

  • Original: trend = np.append(np.zeros(p+h-1)+np.nan, X@b) creates two temporary arrays and concatenates them
  • Optimized: Pre-allocates trend = np.empty(T), then fills sections directly via trend[:p+h-1] = np.nan and trend[p+h-1:] = X@b
  • Why faster: Eliminates array copying overhead from np.append(). Direct slicing assignment into pre-allocated memory is significantly more efficient than creating intermediate arrays and concatenating.
  • Line profiler evidence: Trend construction time drops from 436,874ns to 156,172ns total (28,587ns + 41,635ns + 85,950ns), a 64% reduction
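A minimal sketch of the two constructions (sizes are illustrative; `fitted` stands in for the X@b tail):

```python
import numpy as np

T, p, h = 20, 2, 4
fitted = np.arange(T - (p + h - 1), dtype=float)  # stand-in for X @ b

# Original: builds a NaN-prefix temporary, then a concatenating copy
trend_append = np.append(np.zeros(p + h - 1) + np.nan, fitted)

# Optimized: one allocation, two slice assignments, no intermediate arrays
trend_slice = np.empty(T)
trend_slice[:p + h - 1] = np.nan
trend_slice[p + h - 1:] = fitted

# Same result, NaN positions included
assert np.array_equal(trend_append, trend_slice, equal_nan=True)
```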

Test Case Performance

The optimizations particularly benefit:

  • Regression-based cases (p specified): 25-45% faster across test cases (e.g., test_regression_based_filter_p_equals_1 shows 45% improvement)
  • Small to medium datasets: Most pronounced gains when the matrix operations dominate runtime
  • Random walk cases (p=None): Minimal impact since these don't use the optimized code paths

Impact Assessment

Since function_references are unavailable, we cannot determine the exact calling context. However, these optimizations benefit any workload that:

  • Uses the regression-based Hamilton filter (p specified)
  • Processes multiple time series in a pipeline
  • Calls this function repeatedly (e.g., in parameter sweeps or Monte Carlo simulations)

The optimizations maintain identical numerical results and algorithmic behavior—they purely improve memory efficiency without changing the mathematical operations.
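To see the allocation difference in isolation, a generic `timeit` micro-benchmark can be used (a sketch with illustrative sizes and repeat counts, not the PR's line-profiler setup):

```python
import timeit
import numpy as np

# Illustrative shape for the regressor matrix; real sizes depend on T, p, h
shape = (1000, 5)
reps = 2000

# Fully initialized allocation
t_ones = timeit.timeit(lambda: np.ones(shape), number=reps)

def make_empty():
    X = np.empty(shape)  # uninitialized allocation
    X[:, 0] = 1.0        # only the constant column needs a fill
    return X

t_empty = timeit.timeit(make_empty, number=reps)
print(f"np.ones: {t_ones:.4f}s, np.empty + column fill: {t_empty:.4f}s")
```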

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 2 Passed
🌀 Generated Regression Tests 38 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
⚙️ Click to see Existing Unit Tests
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
test_filter.py::test_hamilton_filter 151μs 134μs 13.1%✅
🌀 Click to see Generated Regression Tests
import math  # for isnan and isclose checks

import numba  # required because the function under test uses numba.njit
import numpy as np  # used to construct expected numeric results
import pytest  # used for our unit tests
from quantecon._filter import hamilton_filter

def test_random_walk_basic_small():
    """
    Basic test for the random-walk (p is None) branch with a very small
    sequence where h = 1. We check exact values for cycle and trend,
    and that NaNs occur where expected.
    """
    data = [1.0, 2.0, 4.0]  # simple small sequence
    # call with h=1 and default p (None -> random walk)
    cycle, trend = hamilton_filter(data, h=1, p=None) # 25.1μs -> 24.7μs (1.27% faster)
    # h=1: first element undefined; afterwards cycle[t] = y[t] - y[t-1], trend[t] = y[t-1]
    assert math.isnan(cycle[0]) and math.isnan(trend[0])
    assert math.isclose(cycle[1], 1.0) and math.isclose(trend[1], 1.0)
    assert math.isclose(cycle[2], 2.0) and math.isclose(trend[2], 2.0)

def test_p_specified_linear_regression_matches_numpy_solution():
    """
    Test the p >= 0 branch by constructing a simple linear deterministic
    sequence y_t = intercept + slope * t. We compute the expected trend for
    the fitted tail using numpy linear algebra mirroring the algorithm,
    and compare the results elementwise (handling NaNs properly).
    """
    # deterministic linear data: intercept=2.0, slope=3.0
    T = 5
    intercept = 2.0
    slope = 3.0
    y = np.array([intercept + slope * t for t in range(T)], dtype=np.float64)

    h = 1
    p = 1  # single lag auto-regression / OLS variable count 2 (constant + lag)

    # run the function under test
    cycle, trend = hamilton_filter(y, h=h, p=p) # 73.9μs -> 51.0μs (44.7% faster)

    # now compute expected values using numpy (mirroring the algorithm)
    # compute nrows as in function
    nrows = T - int(p) - int(h) + 1
    if nrows <= 0:
        # If there are no rows, both outputs should be all NaN
        for i in range(T):
            assert math.isnan(cycle[i]) and math.isnan(trend[i])
        return

    # Build X (nrows x (p+1)) exactly as the function does
    X = np.empty((nrows, int(p) + 1), dtype=np.float64)
    X[:, 0] = 1.0
    for j in range(1, int(p) + 1):
        start = int(p) - j
        for i in range(nrows):
            X[i, j] = y[start + i]

    # build y_rhs
    start_rhs = int(p) + int(h) - 1
    y_rhs = y[start_rhs:start_rhs + nrows]

    # compute coefficients via normal equations (X'X)^{-1} X'y
    XTX = X.T.dot(X)
    XTy = X.T.dot(y_rhs)
    b = np.linalg.solve(XTX, XTy)

    # fitted tail
    fitted = X.dot(b)

    # build expected trend: prefix p+h-1 set to NaN, then fitted assigned
    expected_trend = np.empty(T, dtype=np.float64)
    prefix = int(p) + int(h) - 1
    for i in range(prefix):
        expected_trend[i] = np.nan
    for i in range(nrows):
        expected_trend[p + h - 1 + i] = fitted[i]

    expected_cycle = y - expected_trend

    # Compare elementwise: NaNs must be in the same positions; numbers compared with isclose
    for i in range(T):
        if math.isnan(expected_trend[i]):
            assert math.isnan(trend[i]) and math.isnan(cycle[i])
        else:
            assert math.isclose(trend[i], expected_trend[i], rel_tol=1e-9)
            assert math.isclose(cycle[i], expected_cycle[i], rel_tol=1e-9)

def test_input_types_and_p_as_float_casting_behavior():
    """
    Verify the function accepts integer-like inputs and float-like p values
    that are integral (p passed as 1.0) by ensuring results match the same
    run with p as integer.
    """
    data = [0, 1, 4, 9, 16]  # integer inputs (will be converted to floats)
    h = 1
    p_int = 1
    p_float = 1.0

    # run with p as int
    cycle_int, trend_int = hamilton_filter(data, h=h, p=p_int)
    # run with p as float (should behave identically because int(p) is used)
    cycle_float, trend_float = hamilton_filter(data, h=h, p=p_float)
    for i in range(len(data)):
        if math.isnan(trend_int[i]):
            assert math.isnan(trend_float[i])
        else:
            assert math.isclose(trend_int[i], trend_float[i])
            assert math.isclose(cycle_int[i], cycle_float[i])

def test_large_scale_random_walk_efficiency_and_correctness():
    """
    Large-scale test for random-walk branch: construct a long deterministic
    sequence (here under 1000 elements to keep test resource usage reasonable)
    and verify the vectorized relation cycle[t] = y[t] - y[t-h] holds for all
    appropriate indices. We use numpy to form the expected result and then
    assert equality using a boolean check wrapped by Python's assert.
    """
    T = 800  # large but below 1000 as per instructions
    # choose a monotonic sequence so differences are easy to reason about
    y = np.linspace(0.0, float(T - 1), T, dtype=np.float64)
    h = 12  # a typical horizon (e.g., months)

    cycle, trend = hamilton_filter(y, h=h, p=None) # 21.4μs -> 29.2μs (26.7% slower)

    # expected cycle: NaN for first h elements, then y[i] - y[i-h]
    expected_cycle = np.empty_like(y)
    expected_cycle[:h] = np.nan
    expected_cycle[h:] = y[h:] - y[:-h]

    # expected trend: y - expected_cycle
    expected_trend = y - expected_cycle

    # Use numpy allclose for numeric part and explicit NaN checks for the prefix.
    # We assert using Python's assert with a bool conversion.
    # Check prefix NaNs
    for i in range(h):
        assert math.isnan(cycle[i]) and math.isnan(trend[i])

    # For the remainder, use numpy's allclose and convert to bool for assert
    numeric_ok_cycle = bool(np.allclose(cycle[h:], expected_cycle[h:], rtol=1e-12, atol=0.0))
    numeric_ok_trend = bool(np.allclose(trend[h:], expected_trend[h:], rtol=1e-12, atol=0.0))
    assert numeric_ok_cycle and numeric_ok_trend
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import numpy as np
import pytest
from quantecon._filter import hamilton_filter

class TestHamiltonFilterBasicFunctionality:
    """Test basic functionality of hamilton_filter under normal conditions."""

    def test_simple_linear_trend_random_walk(self):
        """Test basic random walk case (p=None) with simple linear data."""
        # Create simple linear data
        data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
        h = 2
        cycle, trend = hamilton_filter(data, h) # 21.2μs -> 21.1μs (0.445% faster)
        # First h cycle values are NaN; afterwards cycle[t] = data[t] - data[t-h]
        assert np.all(np.isnan(cycle[:h]))
        assert np.allclose(cycle[h:], data[h:] - data[:-h])

    def test_random_walk_with_different_h(self):
        """Test random walk with h=4."""
        data = np.array([10.0, 15.0, 12.0, 18.0, 20.0, 22.0, 21.0, 25.0, 27.0, 30.0])
        h = 4
        cycle, trend = hamilton_filter(data, h) # 20.3μs -> 19.7μs (2.75% faster)
        
        # First h values should be NaN in cycle
        for i in range(h):
            assert np.isnan(cycle[i])

    def test_regression_based_filter_p_equals_1(self):
        """Test regression-based filter with p=1."""
        data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
        h = 2
        p = 1
        cycle, trend = hamilton_filter(data, h, p=p) # 75.3μs -> 51.9μs (45.0% faster)
        
        # Check that cycle + trend reconstructs original data
        reconstructed = cycle + trend
        # Use allclose to handle NaN values properly
        valid_mask = ~np.isnan(cycle)
        assert np.allclose(reconstructed[valid_mask], data[valid_mask])

    def test_regression_based_filter_p_equals_2(self):
        """Test regression-based filter with p=2."""
        data = np.array([5.0, 10.0, 8.0, 12.0, 15.0, 13.0, 18.0, 20.0, 19.0, 22.0])
        h = 1
        p = 2
        cycle, trend = hamilton_filter(data, h, p=p) # 68.9μs -> 54.9μs (25.4% faster)
        
        # Check that cycle and trend reconstruct the data where defined
        reconstructed = cycle + trend
        valid_mask = ~np.isnan(cycle)
        assert np.allclose(reconstructed[valid_mask], data[valid_mask])

    def test_output_types(self):
        """Test that output types are numpy arrays."""
        data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
        h = 2
        cycle, trend = hamilton_filter(data, h) # 24.4μs -> 24.4μs (0.143% slower)
        assert isinstance(cycle, np.ndarray)
        assert isinstance(trend, np.ndarray)

    def test_list_input_conversion(self):
        """Test that function works with Python lists as input."""
        data_list = [1.0, 2.0, 3.0, 4.0, 5.0]
        data_array = np.array(data_list)
        h = 1
        
        cycle_from_list, trend_from_list = hamilton_filter(data_list, h) # 20.7μs -> 20.6μs (0.578% faster)
        cycle_from_array, trend_from_array = hamilton_filter(data_array, h) # 7.22μs -> 7.24μs (0.207% slower)
        
        # Results should be identical
        valid_mask = ~np.isnan(cycle_from_list)
        assert np.allclose(cycle_from_list[valid_mask], cycle_from_array[valid_mask])
        assert np.allclose(trend_from_list[valid_mask], trend_from_array[valid_mask])

    

To edit these changes, run git checkout codeflash/optimize-hamilton_filter-mkphed51 and push.


@codeflash-ai codeflash-ai Bot requested a review from aseembits93 January 22, 2026 13:22
@codeflash-ai codeflash-ai Bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Jan 22, 2026