codeflash-ai bot commented on Dec 9, 2025

📄 7% (0.07x) speedup for `zsqrt` in `pandas/core/window/common.py`

⏱️ Runtime: 6.71 milliseconds → 6.25 milliseconds (best of 43 runs)

📝 Explanation and details

The optimization replaces direct mask-based assignment (`result[mask] = 0`) with vectorized conditional operations: for DataFrames it uses `result.where(~mask, other=0)`, and for arrays it uses `np.where(mask, 0, result)`. A simplified before/after sketch follows the list below.

**Key Performance Improvements:**

1. **Vectorized operations**: Both `where` and `np.where` are implemented in C and optimized for element-wise operations, avoiding the Python-level overhead that can occur with direct assignment on masked arrays.

2. **Memory efficiency**: The `where` operations create new arrays more efficiently than in-place assignment, which can trigger additional memory allocations and copying in pandas DataFrames.

3. **DataFrame optimization**: The original `result[mask] = 0` on DataFrames is particularly slow (706μs per hit in the profiler) because it goes through pandas indexing machinery. The optimized `result.where(~mask, other=0)` reduces this to 603μs per hit, a 14% improvement on the hottest line.
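
For reference, here is a minimal before/after sketch of the change, assuming `zsqrt` keeps the shape it has in `pandas/core/window/common.py` (simplified to use the public `pd.DataFrame` check rather than pandas' internal ABCs; not the verbatim source):

```python
import numpy as np
import pandas as pd


def zsqrt_before(x):
    # Original approach: take the sqrt, then zero out negative inputs via
    # direct mask assignment, which routes through pandas indexing machinery
    # when x is a DataFrame.
    with np.errstate(all="ignore"):
        result = np.sqrt(x)
        mask = x < 0
    if isinstance(x, pd.DataFrame):
        if mask.to_numpy().any():
            result[mask] = 0
    else:
        if mask.any():
            result[mask] = 0
    return result


def zsqrt_after(x):
    # Optimized approach: same sqrt and mask, but the zeroing is done with
    # vectorized conditional selection instead of in-place assignment.
    with np.errstate(all="ignore"):
        result = np.sqrt(x)
        mask = x < 0
    if isinstance(x, pd.DataFrame):
        if mask.to_numpy().any():
            result = result.where(~mask, other=0)
    else:
        if mask.any():
            result = np.where(mask, 0, result)
    return result
```

In both variants, negative inputs (including `-inf`) map to `0.0`, `NaN` propagates through the square root unchanged, and positive values and `+inf` pass through `np.sqrt` as usual, which is the behaviour the generated tests below rely on.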

**Function Usage Context:**
The `zsqrt` function is called in exponentially weighted moving window calculations for computing standard deviation and correlation in `pandas/core/window/ewm.py`. These are common statistical operations that may be called repeatedly in financial analysis or time-series processing, making the 7% overall speedup meaningful.
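
As a rough illustration (not taken from the PR), the following kind of call exercises `zsqrt` indirectly through the EWM std/corr path; the data and window parameters are arbitrary:

```python
import numpy as np
import pandas as pd

# Synthetic price-like data; any float DataFrame/Series works.
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    {"a": rng.normal(size=10_000).cumsum(), "b": rng.normal(size=10_000).cumsum()}
)

# Per the description above, EWM std and corr pass through zsqrt internally.
ewm_std = prices.ewm(span=20).std()
ewm_corr = prices["a"].ewm(span=20).corr(prices["b"])
print(ewm_std.tail())
print(ewm_corr.tail())
```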

**Test Case Performance:**
The optimization shows consistent improvements on DataFrame operations (8-11% faster for most DataFrame tests) and mixed results on plain arrays. The largest gains appear in DataFrame-heavy workloads, which aligns with the function's use in EWM calculations that typically operate on DataFrame columns.
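
A hedged micro-benchmark along these lines can reproduce the DataFrame-path difference locally (not part of the PR; the data shape and repeat count are arbitrary, and absolute timings depend on machine and pandas version):

```python
import numpy as np
import pandas as pd
from timeit import timeit

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(1_000, 2)), columns=["a", "b"])
result = np.sqrt(df.abs())  # stand-in for the sqrt result inside zsqrt
mask = df < 0


def zero_with_setitem():
    out = result.copy()  # copy so each run starts from the same frame
    out[mask] = 0        # original strategy: in-place mask assignment
    return out


def zero_with_where():
    return result.where(~mask, other=0)  # optimized strategy


print("setitem:", timeit(zero_with_setitem, number=2_000))
print("where:  ", timeit(zero_with_where, number=2_000))
```

Note that the `copy()` inside `zero_with_setitem` adds a small constant cost, so treat the comparison as directional rather than exact.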

**Correctness verification report:**

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 40 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime

```python
from __future__ import annotations

import numpy as np
import pandas as pd

# imports
import pytest  # used for our unit tests
from pandas.core.window.common import zsqrt

# unit tests

# 1. Basic Test Cases


def test_basic_positive_array():
    # Test with a numpy array of positive values
    arr = np.array([1, 4, 9, 16])
    expected = np.array([1.0, 2.0, 3.0, 4.0])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 27.5μs -> 26.9μs (2.50% faster)


def test_basic_mixed_array():
    # Test with a numpy array containing positive, zero, and negative values
    arr = np.array([-1, 0, 1, 4])
    expected = np.array([0.0, 0.0, 1.0, 2.0])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 23.5μs -> 26.1μs (9.97% slower)


def test_basic_dataframe():
    # Test with a pandas DataFrame of positive and negative values
    df = pd.DataFrame({"a": [4, -9, 0], "b": [1, 16, -25]})
    expected = pd.DataFrame({"a": [2.0, 0.0, 0.0], "b": [1.0, 4.0, 0.0]})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 467μs -> 423μs (10.3% faster)


# 2. Edge Test Cases


def test_edge_nan_input():
    # Test with NaN input
    arr = np.array([np.nan, 4, -1])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 29.4μs -> 31.1μs (5.57% slower)


def test_edge_inf_input():
    # Test with inf input
    arr = np.array([np.inf, -np.inf])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 21.5μs -> 22.7μs (5.28% slower)


def test_edge_empty_array():
    # Test with empty array
    arr = np.array([])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 17.2μs -> 16.6μs (3.95% faster)


def test_edge_empty_dataframe():
    # Test with empty DataFrame
    df = pd.DataFrame({"a": [], "b": []})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 141μs -> 143μs (1.30% slower)


def test_edge_boolean_array():
    # Test with boolean array
    arr = np.array([True, False])
    expected = np.array([1.0, 0.0])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 28.8μs -> 27.5μs (4.71% faster)


def test_edge_dataframe_with_nan():
    # DataFrame with NaN values
    df = pd.DataFrame({"a": [np.nan, -1, 4]})
    expected = pd.DataFrame({"a": [np.nan, 0.0, 2.0]})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 459μs -> 425μs (8.09% faster)


def test_edge_dataframe_with_inf():
    # DataFrame with inf values
    df = pd.DataFrame({"a": [np.inf, -np.inf, 9]})
    expected = pd.DataFrame({"a": [np.inf, 0.0, 3.0]})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 438μs -> 398μs (9.96% faster)


def test_edge_non_numeric_input():
    # Test with non-numeric input should raise TypeError
    with pytest.raises(TypeError):
        zsqrt("string")  # 15.4μs -> 15.7μs (2.00% slower)


# 3. Large Scale Test Cases


def test_large_array():
    # Test with a large array of mixed values
    arr = np.concatenate([np.arange(-500, 0), np.arange(0, 500)])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 27.9μs -> 28.9μs (3.61% slower)


def test_large_array_with_nan_inf():
    # Large array with NaN and inf values
    arr = np.random.randn(1000)
    arr[::100] = np.nan
    arr[::200] = np.inf
    arr[1::100] = -np.inf
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 34.8μs -> 35.6μs (2.31% slower)


def test_large_dataframe_with_nan_inf():
    # Large DataFrame with NaN and inf values
    size = 1000
    df = pd.DataFrame({"a": np.random.randn(size), "b": np.random.randn(size)})
    df.loc[::100, "a"] = np.nan
    df.loc[::200, "b"] = np.inf
    df.loc[1::100, "a"] = -np.inf
    expected_a = df["a"].copy()
    expected_a[df["a"] < 0] = 0.0
    expected_a = np.sqrt(expected_a)
    expected_a[df["a"] < 0] = 0.0
    expected_b = df["b"].copy()
    expected_b[df["b"] < 0] = 0.0
    expected_b = np.sqrt(expected_b)
    expected_b[df["b"] < 0] = 0.0
    expected = pd.DataFrame({"a": expected_a, "b": expected_b})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 420μs -> 378μs (11.3% faster)


# Additional edge cases for completeness


def test_single_element_array_negative():
    # Single-element array, negative value
    arr = np.array([-7])
    expected = np.array([0.0])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 25.5μs -> 28.9μs (11.9% slower)


def test_single_element_array_positive():
    # Single-element array, positive value
    arr = np.array([49])
    expected = np.array([7.0])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 20.1μs -> 18.8μs (7.04% faster)


def test_dataframe_single_row():
    # DataFrame with a single row
    df = pd.DataFrame({"a": [25], "b": [-36]})
    expected = pd.DataFrame({"a": [5.0], "b": [0.0]})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 457μs -> 418μs (9.35% faster)


def test_dataframe_single_column():
    # DataFrame with a single column
    df = pd.DataFrame({"a": [1, -1, 4, -4]})
    expected = pd.DataFrame({"a": [1.0, 0.0, 2.0, 0.0]})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 437μs -> 399μs (9.58% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

```python
import numpy as np
import pandas as pd

# imports
# function to test
from pandas.core.window.common import zsqrt

# unit tests

# ----------------------
# Basic Test Cases
# ----------------------


def test_basic_positive_array():
    # Simple array of positive numbers
    arr = np.array([0, 1, 4, 9, 16])
    expected = np.array([0, 1, 2, 3, 4])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 25.3μs -> 24.4μs (3.73% faster)


def test_basic_mixed_array():
    # Array with positive, zero, and negative numbers
    arr = np.array([-4, -1, 0, 1, 4])
    expected = np.array([0, 0, 0, 1, 2])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 23.7μs -> 25.7μs (7.82% slower)


def test_basic_single_scalar_positive():
    # Scalar positive value
    arr = np.array([25])
    expected = np.array([5])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 18.9μs -> 17.6μs (7.24% faster)


def test_basic_single_scalar_negative():
    # Scalar negative value
    arr = np.array([-9])
    expected = np.array([0])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 21.6μs -> 23.1μs (6.62% slower)


def test_basic_dataframe_positive():
    # DataFrame with only positive values
    df = pd.DataFrame({"a": [0, 1, 4], "b": [9, 16, 25]})
    expected = pd.DataFrame({"a": [0, 1, 2], "b": [3, 4, 5]})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 144μs -> 143μs (0.707% faster)


def test_basic_dataframe_mixed():
    # DataFrame with negative and positive values
    df = pd.DataFrame({"a": [-1, 0, 1], "b": [4, -9, 16]})
    expected = pd.DataFrame({"a": [0, 0, 1], "b": [2, 0, 4]})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 453μs -> 416μs (9.03% faster)


# ----------------------
# Edge Test Cases
# ----------------------


def test_edge_empty_array():
    # Empty array should return empty array
    arr = np.array([])
    expected = np.array([])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 19.9μs -> 19.6μs (1.24% faster)


def test_edge_empty_dataframe():
    # Empty DataFrame should return empty DataFrame
    df = pd.DataFrame({"a": [], "b": []})
    expected = pd.DataFrame({"a": [], "b": []})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 136μs -> 137μs (0.919% slower)


def test_edge_all_negative_array():
    # All negative values
    arr = np.array([-1, -2, -3])
    expected = np.array([0, 0, 0])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 25.3μs -> 27.1μs (6.70% slower)


def test_edge_all_negative_dataframe():
    # All negative values in DataFrame
    df = pd.DataFrame({"a": [-1, -2], "b": [-3, -4]})
    expected = pd.DataFrame({"a": [0, 0], "b": [0, 0]})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 455μs -> 416μs (9.40% faster)


def test_edge_all_zero_array():
    # All zeros
    arr = np.array([0, 0, 0])
    expected = np.array([0, 0, 0])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 22.0μs -> 20.4μs (8.01% faster)


def test_edge_all_zero_dataframe():
    # All zeros in DataFrame
    df = pd.DataFrame({"a": [0, 0], "b": [0, 0]})
    expected = pd.DataFrame({"a": [0, 0], "b": [0, 0]})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 142μs -> 141μs (0.345% faster)


def test_edge_nan_array():
    # Array with NaN values
    arr = np.array([np.nan, 4, -1])
    expected = np.array([np.nan, 2, 0])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 22.2μs -> 26.1μs (15.0% slower)


def test_edge_nan_dataframe():
    # DataFrame with NaN values
    df = pd.DataFrame({"a": [np.nan, 4, -1], "b": [9, np.nan, -16]})
    expected = pd.DataFrame({"a": [np.nan, 2, 0], "b": [3, np.nan, 0]})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 453μs -> 414μs (9.20% faster)


def test_edge_inf_array():
    # Array with inf values
    arr = np.array([np.inf, -np.inf, 4])
    expected = np.array([np.inf, 0, 2])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 23.4μs -> 25.6μs (8.87% slower)


def test_edge_inf_dataframe():
    # DataFrame with inf values
    df = pd.DataFrame({"a": [np.inf, -np.inf], "b": [4, -9]})
    expected = pd.DataFrame({"a": [np.inf, 0], "b": [2, 0]})
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 591μs -> 558μs (5.91% faster)


def test_edge_dtype_int_array():
    # Integer dtype array
    arr = np.array([0, 1, 4, -1], dtype=int)
    expected = np.array([0, 1, 2, 0])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 25.2μs -> 26.4μs (4.34% slower)


def test_edge_dtype_float_array():
    # Float dtype array
    arr = np.array([0.0, 1.0, 4.0, -1.0])
    expected = np.array([0.0, 1.0, 2.0, 0.0])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 20.9μs -> 21.4μs (2.45% slower)


def test_edge_2d_array():
    # 2D array
    arr = np.array([[0, 1], [4, -1]])
    expected = np.array([[0, 1], [2, 0]])
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 33.3μs -> 33.6μs (0.880% slower)


def test_edge_dataframe_with_index_and_columns():
    # DataFrame with custom index and columns
    df = pd.DataFrame([[4, -1], [9, 0]], index=["x", "y"], columns=["a", "b"])
    expected = pd.DataFrame([[2, 0], [3, 0]], index=["x", "y"], columns=["a", "b"])
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 458μs -> 418μs (9.64% faster)


# ----------------------
# Large Scale Test Cases
# ----------------------


def test_large_array_with_nan_inf():
    # Large array with NaN and Inf scattered
    arr = np.zeros(1000)
    arr[::100] = np.nan
    arr[1::100] = np.inf
    arr[2::100] = -np.inf
    arr[3::100] = -1
    expected = np.zeros(1000)
    expected[::100] = np.nan
    expected[1::100] = np.inf
    expected[2::100] = 0
    expected[3::100] = 0
    codeflash_output = zsqrt(arr)
    result = codeflash_output  # 29.8μs -> 32.3μs (7.80% slower)
    # Check nan, inf, and zeros at the expected positions
    for i in range(0, 1000, 100):
        assert np.isnan(result[i]) and np.isnan(expected[i])
        assert result[i + 1] == expected[i + 1] == np.inf
        assert result[i + 2] == expected[i + 2] == 0
        assert result[i + 3] == expected[i + 3] == 0


def test_large_dataframe_with_nan_inf():
    # Large DataFrame with NaN and Inf scattered
    size = 1000
    df = pd.DataFrame({"a": np.zeros(size), "b": np.ones(size), "c": -np.ones(size)})
    df.loc[::100, "a"] = np.nan
    df.loc[1::100, "b"] = np.inf
    df.loc[2::100, "c"] = -np.inf
    expected = pd.DataFrame(
        {"a": np.zeros(size), "b": np.ones(size), "c": np.zeros(size)}
    )
    expected.loc[::100, "a"] = np.nan
    expected.loc[1::100, "b"] = np.inf
    expected.loc[2::100, "c"] = 0
    codeflash_output = zsqrt(df)
    result = codeflash_output  # 466μs -> 418μs (11.7% faster)
    for i in range(0, size, 100):
        # Spot-check the NaN, inf, and zeroed positions against the expected frame
        assert np.isnan(result.loc[i, "a"]) and np.isnan(expected.loc[i, "a"])
        assert result.loc[i + 1, "b"] == expected.loc[i + 1, "b"] == np.inf
        assert result.loc[i + 2, "c"] == expected.loc[i + 2, "c"] == 0


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

To edit these changes, run `git checkout codeflash/optimize-zsqrt-miy5i2oe` and push.

codeflash-ai bot requested a review from mashraf-222 on Dec 9, 2025, 05:39
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels on Dec 9, 2025