codeflash-ai bot commented Dec 9, 2025

📄 17% (1.17x) speedup for ttfFontProperty in lib/matplotlib/font_manager.py

⏱️ Runtime: 850 microseconds → 724 microseconds (best of 41 runs)
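The headline percentage follows from the two runtimes. It is the relative speedup, which corresponds to a somewhat smaller relative reduction in runtime:

$$
\frac{850\ \mu\mathrm{s}}{724\ \mu\mathrm{s}} \approx 1.174 \;(\text{about } 17.4\%\ \text{faster}),
\qquad
\frac{850 - 724}{850} \approx 0.148 \;(\text{about } 14.8\%\ \text{less runtime})
$$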

📝 Explanation and details

The optimized code achieves a 17% speedup through two key optimizations that reduce redundant work in regex pattern matching and string operations:

1. Pre-compiled Regular Expressions
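The original code kept string patterns in _weight_regexes and passed them to re.fullmatch() and re.search() inside get_weight(). Those module-level functions do not recompile a pattern on every call, because the re module caches recently compiled patterns, but each call still pays for a cache lookup (keyed on the pattern and flags) and the surrounding argument handling before any matching starts. The optimization compiles every pattern once at module load time and stores the compiled pattern objects, so get_weight() calls their fullmatch()/search() methods directly. This helps because get_weight() runs for every font and, when it falls back to name matching, may try many patterns in a row.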
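A minimal sketch of the difference, assuming a hypothetical, trimmed pattern table (WEIGHT_PATTERNS and both function names are illustrative, not matplotlib's actual _weight_regexes or get_weight()). Both functions return the weight of the first pattern that matches. The second skips the re module's per-call cache lookup by calling methods on pattern objects compiled once at import time.

```python
import re
import timeit

# Hypothetical subset of (pattern, weight) pairs in the spirit of matplotlib's
# _weight_regexes; the real table is longer. Order matters: first match wins.
WEIGHT_PATTERNS = [
    ("thin", 100),
    ("extralight", 200),
    ("light", 300),
    ("regular", 400),
    ("medium", 500),
    ("semibold", 600),
    ("extrabold", 800),
    ("bold", 700),  # after "semibold"/"extrabold" so those match first
    ("black", 900),
]

# Compiled once at import time, with the case-insensitive flag baked in.
WEIGHT_REGEXES = [(re.compile(p, re.IGNORECASE), w) for p, w in WEIGHT_PATTERNS]


def weight_from_style_uncompiled(style):
    """Original approach: string patterns go through re.search on every call."""
    style = style.replace(" ", "")
    for pattern, weight in WEIGHT_PATTERNS:
        # re.search looks the pattern up in re's internal cache before matching.
        if re.search(pattern, style, re.IGNORECASE):
            return weight
    return None


def weight_from_style_compiled(style):
    """Optimized approach: call .search() directly on pre-compiled patterns."""
    style = style.replace(" ", "")
    for regex, weight in WEIGHT_REGEXES:
        if regex.search(style):
            return weight
    return None


if __name__ == "__main__":
    # A name that matches nothing forces every pattern to be tried.
    for fn in (weight_from_style_uncompiled, weight_from_style_compiled):
        seconds = timeit.timeit(lambda: fn("Unusual"), number=20_000)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

Running the file as a script prints rough timings for a name that matches no pattern, which is where the difference should be largest; absolute numbers depend on the Python version and machine.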

2. Optimized String Operations

  • Replaced str.find(...) >= 0 with the in operator for the substring checks on style keywords such as 'oblique', 'italic', and 'regular'; the result is the same, but in avoids a method lookup and call
  • Used tuples instead of lists where no mutation is needed (e.g., the styles variable), which slightly reduces allocation overhead
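  • Converted the keyword list literals in the stretch checks' any() calls to tuples; the gain here is likely negligible, since CPython typically already compiles a constant list literal used as a loop or comprehension iterable into a tuple constant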
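The string changes can be sketched as follows. This is a simplified, hypothetical version of the style and stretch checks: the function and constant names are illustrative, the keyword tuples are assumptions modeled on the stretch keywords the tests use, and matplotlib's own code has additional branches (such as a semi-condensed case). It shows in-operator substring tests and tuple keyword groups passed to any().

```python
# Keyword groups stored as module-level tuples: immutable and built once.
CONDENSED_WORDS = ("narrow", "condensed", "cond")
EXPANDED_WORDS = ("wide", "expanded", "extended")


def classify_style(sfnt4, sfnt2, italic_flag=False):
    """Pick a CSS-like style from lower-cased name-table strings.

    Uses `needle in haystack` instead of `haystack.find(needle) >= 0`;
    both give the same answer, but `in` avoids a method lookup and call.
    """
    if "oblique" in sfnt4:
        return "oblique"
    if "italic" in sfnt4:
        return "italic"
    if "regular" in sfnt2:
        return "normal"
    # Fall back to the font's italic style flag.
    return "italic" if italic_flag else "normal"


def classify_stretch(sfnt4):
    """Simplified stretch check: the first keyword group that matches wins."""
    if any(word in sfnt4 for word in CONDENSED_WORDS):
        return "condensed"
    if any(word in sfnt4 for word in EXPANDED_WORDS):
        return "expanded"
    return "normal"
```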

Performance Impact Analysis
Based on its call sites, ttfFontProperty is used on text-rendering paths in matplotlib's Cairo and SVG backends, where it extracts font properties from the fonts being drawn. The SVG backend in particular calls it inside its loop over mathtext glyphs, so the optimization is most relevant for:

  • Mathematical text rendering with multiple fonts/glyphs
  • Applications that render large amounts of text
  • Font-heavy visualizations where font property extraction is repeatedly performed

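The generated tests show the largest gains in cases without an OS/2 table, where weight detection falls back to PostScript or style-name matching and may evaluate several regex patterns per call (27.8% to 83.2% faster; the UltraLight case gains least, presumably because its name matches early in the pattern list). When an OS/2 table is present, get_weight() returns usWeightClass before any pattern is tried, so those fonts benefit only from the string-handling changes: roughly 3-6.5% in most such tests, with 16.0% for test_edge_sfnt4_semi_condensed and a within-noise 0.8% slowdown for test_basic_normal_font. This suggests that the per-call overhead of passing string patterns to the module-level re functions was a significant cost on the fallback path.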
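To see why the gains concentrate in the fallback cases, here is a simplified, hypothetical version of the weight fallback order the generated tests exercise (OS/2 weight, then PostScript weight name, then style names, then the bold flag) that also counts regex evaluations. It is a sketch, not matplotlib's get_weight(); get_weight_sketch and its trimmed pattern table are assumptions for illustration.

```python
import re

# Hypothetical, trimmed pattern table (pre-compiled, case-insensitive).
_WEIGHT_REGEXES = [
    (re.compile(pattern, re.IGNORECASE), weight)
    for pattern, weight in [("light", 300), ("regular", 400), ("bold", 700), ("black", 900)]
]


def get_weight_sketch(os2, ps_weight, styles, bold_flag):
    """Return (weight, regex_evaluations) for a simplified fallback chain."""
    evaluations = 0
    # 1. OS/2 table: returns before any regex runs.
    if os2 and os2["version"] != 0xFFFF:
        return os2["usWeightClass"], evaluations
    # 2. PostScript weight name must match a pattern exactly.
    if ps_weight is not None:
        name = ps_weight.replace(" ", "")
        for regex, weight in _WEIGHT_REGEXES:
            evaluations += 1
            if regex.fullmatch(name):
                return weight, evaluations
    # 3. Style names only need to contain a pattern.
    for style in styles:
        name = style.replace(" ", "")
        for regex, weight in _WEIGHT_REGEXES:
            evaluations += 1
            if regex.search(name):
                return weight, evaluations
    # 4. Last resort: the bold style flag, else 500.
    return (700 if bold_flag else 500), evaluations
```

For example, get_weight_sketch(None, None, ["Unusual"], False) tries all four patterns before returning (500, 4), whereas a font with a valid OS/2 table returns with zero regex evaluations.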

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 10 Passed |
| 🌀 Generated Regression Tests | 138 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 94.4% |
🌀 Generated Regression Tests and Runtime
# imports
from matplotlib.font_manager import ttfFontProperty


# Simulate FontEntry as returned by ttfFontProperty
class FontEntry:
    def __init__(self, fname, name, style, variant, weight, stretch, size):
        self.fname = fname
        self.name = name
        self.style = style
        self.variant = variant
        self.weight = weight
        self.stretch = stretch
        self.size = size

    def __eq__(self, other):
        return (
            isinstance(other, FontEntry)
            and self.fname == other.fname
            and self.name == other.name
            and self.style == other.style
            and self.variant == other.variant
            and self.weight == other.weight
            and self.stretch == other.stretch
            and self.size == other.size
        )

    def __repr__(self):
        return (
            f"FontEntry(fname={self.fname!r}, name={self.name!r}, style={self.style!r}, "
            f"variant={self.variant!r}, weight={self.weight!r}, stretch={self.stretch!r}, size={self.size!r})"
        )


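# Mirror FreeType's style-flag bits (FT_STYLE_FLAG_ITALIC = 1 << 0,
# FT_STYLE_FLAG_BOLD = 1 << 1). ttfFontProperty checks style_flags against
# matplotlib's own ft2font values, so these must match them.
class ft2font:
    ITALIC = 1
    BOLD = 2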


# --- Test helpers ---


class DummyFont:
    """A dummy FT2Font-like object for testing."""

    def __init__(
        self,
        fname,
        family_name,
        style_name,
        style_flags=0,
        scalable=True,
        sfnt=None,
        os2=None,
        ps_font_info=None,
    ):
        self.fname = fname
        self.family_name = family_name
        self.style_name = style_name
        self.style_flags = style_flags
        self.scalable = scalable
        self._sfnt = sfnt or {}
        self._os2 = os2
        self._ps_font_info = ps_font_info

    def get_sfnt(self):
        return self._sfnt

    def get_sfnt_table(self, name):
        if name == "OS/2":
            return self._os2
        return None

    def get_ps_font_info(self):
        if self._ps_font_info is None:
            raise ValueError("No PS font info")
        return self._ps_font_info


# --- Basic Test Cases ---


def test_basic_normal_font():
    # Test a regular font with normal style and weight
    sfnt = {
        (1, 0, 0, 2): b"Regular",
        (1, 0, 0, 4): b"Regular",
    }
    font = DummyFont(
        fname="font.ttf",
        family_name="Arial",
        style_name="Regular",
        style_flags=0,
        scalable=True,
        sfnt=sfnt,
        os2={"version": 1, "usWeightClass": 400},
    )
    codeflash_output = ttfFontProperty(font)
    entry = codeflash_output  # 14.2μs -> 14.3μs (0.830% slower)


def test_edge_os2_table_missing():
    # Test font with missing OS/2 table; weight should fall back to the style_name match
    font = DummyFont(
        fname="fontos2.ttf",
        family_name="NoOS2Font",
        style_name="Bold",
        style_flags=ft2font.BOLD,
        scalable=True,
        sfnt={},
        os2=None,
    )
    codeflash_output = ttfFontProperty(font)
    entry = codeflash_output  # 28.0μs -> 16.6μs (68.3% faster)


def test_edge_ps_font_info_weight():
    # Test font with PS font info weight
    font = DummyFont(
        fname="fontps.ttf",
        family_name="PSFont",
        style_name="Regular",
        style_flags=0,
        scalable=True,
        sfnt={},
        os2=None,
        ps_font_info={"weight": "Black"},
    )
    codeflash_output = ttfFontProperty(font)
    entry = codeflash_output  # 30.7μs -> 17.0μs (80.0% faster)


def test_edge_style_name_weight_regex():
    # Test style_name matching regexes for weight
    font = DummyFont(
        fname="fontul.ttf",
        family_name="UltraLightFont",
        style_name="UltraLight",
        style_flags=0,
        scalable=True,
        sfnt={},
        os2=None,
    )
    codeflash_output = ttfFontProperty(font)
    entry = codeflash_output  # 19.3μs -> 15.1μs (27.8% faster)


def test_edge_sfnt4_semi_condensed():
    # Test how 'Demi Cond' in sfnt4 is classified (the string also contains 'cond')
    sfnt = {
        (1, 0, 0, 4): b"Demi Cond",
    }
    font = DummyFont(
        fname="fontdc.ttf",
        family_name="SemiCondensedFont",
        style_name="Regular",
        style_flags=0,
        scalable=True,
        sfnt=sfnt,
        os2={"version": 1, "usWeightClass": 400},
    )
    codeflash_output = ttfFontProperty(font)
    entry = codeflash_output  # 12.8μs -> 11.0μs (16.0% faster)


def test_edge_sfnt4_multiple_stretch_keywords():
    # Test sfnt4 with both a condensed and an expanded keyword; the first
    # matching branch of ttfFontProperty's if/elif chain decides the stretch
    sfnt = {
        (1, 0, 0, 4): b"Condensed Wide",
    }
    font = DummyFont(
        fname="fontcw.ttf",
        family_name="WideCondensedFont",
        style_name="Regular",
        style_flags=0,
        scalable=True,
        sfnt=sfnt,
        os2={"version": 1, "usWeightClass": 400},
    )
    codeflash_output = ttfFontProperty(font)
    entry = codeflash_output  # 12.1μs -> 11.4μs (6.47% faster)


def test_edge_style_flags_bold_and_italic():
    # Test font with both bold and italic style_flags
    font = DummyFont(
        fname="fontbi.ttf",
        family_name="BoldItalicFont",
        style_name="Bold Italic",
        style_flags=ft2font.BOLD | ft2font.ITALIC,
        scalable=True,
        sfnt={},
        os2=None,
    )
    codeflash_output = ttfFontProperty(font)
    entry = codeflash_output  # 30.0μs -> 19.1μs (57.1% faster)


def test_edge_empty_style_name():
    # Test font with empty style_name; weight should fall back to style_flags
    font = DummyFont(
        fname="fontempty.ttf",
        family_name="EmptyStyleFont",
        style_name="",
        style_flags=ft2font.BOLD,
        scalable=True,
        sfnt={},
        os2=None,
    )
    codeflash_output = ttfFontProperty(font)
    entry = codeflash_output  # 29.3μs -> 16.0μs (83.2% faster)


def test_edge_unknown_weight():
    # Test font with an unrecognized style name; weight should fall back to 500
    font = DummyFont(
        fname="fontunk.ttf",
        family_name="UnknownWeightFont",
        style_name="Unusual",
        style_flags=0,
        scalable=True,
        sfnt={},
        os2=None,
    )
    codeflash_output = ttfFontProperty(font)
    entry = codeflash_output  # 31.1μs -> 17.5μs (78.3% faster)


# --- Large Scale Test Cases ---


def test_large_many_fonts():
    # Test many fonts in a loop for scalability
    for i in range(100):  # Limit to 100 for speed
        sfnt = {
            (1, 0, 0, 2): b"Regular",
            (1, 0, 0, 4): b"Regular",
        }
        font = DummyFont(
            fname=f"font_{i}.ttf",
            family_name=f"FontFamily{i}",
            style_name="Regular",
            style_flags=0,
            scalable=True,
            sfnt=sfnt,
            os2={"version": 1, "usWeightClass": 400},
        )
        codeflash_output = ttfFontProperty(font)
        entry = codeflash_output  # 379μs -> 360μs (5.34% faster)


def test_large_varied_weights():
    # Test large number of fonts with varied weights and style_names
    weights = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
    style_names = [
        "Thin",
        "UltraLight",
        "Light",
        "Regular",
        "Medium",
        "SemiBold",
        "Bold",
        "ExtraBold",
        "Black",
        "UltraBlack",
    ]
    for i, (w, sn) in enumerate(zip(weights, style_names)):
        font = DummyFont(
            fname=f"fontw_{i}.ttf",
            family_name=f"WeightFamily{i}",
            style_name=sn,
            style_flags=0,
            scalable=True,
            sfnt={},
            os2={"version": 1, "usWeightClass": w},
        )
        codeflash_output = ttfFontProperty(font)
        entry = codeflash_output  # 48.6μs -> 45.7μs (6.25% faster)


def test_large_sfnt_entries():
    # Test font with large number of sfnt entries
    sfnt = {}
    for i in range(50):
        sfnt[(1, 0, 0, i)] = b"Regular"
        sfnt[(3, 1, 0x0409, i)] = b"Regular".decode("latin-1").encode("utf_16_be")
    font = DummyFont(
        fname="fontlarge.ttf",
        family_name="LargeFont",
        style_name="Regular",
        style_flags=0,
        scalable=True,
        sfnt=sfnt,
        os2={"version": 1, "usWeightClass": 400},
    )
    codeflash_output = ttfFontProperty(font)
    entry = codeflash_output  # 15.4μs -> 14.9μs (3.05% faster)


def test_large_stretch_keywords():
    # Test many fonts with different stretch keywords in sfnt4
    stretch_keywords = [
        ("Condensed", "condensed"),
        ("Expanded", "expanded"),
        ("Wide", "expanded"),
        ("Demi Cond", "semi-condensed"),
        ("Normal", "normal"),
        ("", "normal"),
    ]
    for i, (sfnt4_val, expected_stretch) in enumerate(stretch_keywords):
        sfnt = {(1, 0, 0, 4): sfnt4_val.encode("latin-1")}
        font = DummyFont(
            fname=f"fontstr_{i}.ttf",
            family_name=f"StretchFont{i}",
            style_name="Regular",
            style_flags=0,
            scalable=True,
            sfnt=sfnt,
            os2={"version": 1, "usWeightClass": 400},
        )
        codeflash_output = ttfFontProperty(font)
        entry = codeflash_output  # 36.1μs -> 34.5μs (4.91% faster)


def test_large_style_flags_combinations():
    # Test all combinations of style_flags for bold and italic
    for bold in [0, ft2font.BOLD]:
        for italic in [0, ft2font.ITALIC]:
            style_flags = bold | italic
            font = DummyFont(
                fname=f"fontflags_{bold}_{italic}.ttf",
                family_name="FlagFont",
                style_name="Regular",
                style_flags=style_flags,
                scalable=True,
                sfnt={},
                os2=None,
            )
            codeflash_output = ttfFontProperty(font)
            entry = codeflash_output


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-ttfFontProperty-miy5u6nq and push.

codeflash-ai bot requested a review from mashraf-222 December 9, 2025 05:49
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Dec 9, 2025