⚡️ Speed up function find_react_components by 26% in PR #1561 (add/support_react)#1682

Open
codeflash-ai[bot] wants to merge 1 commit into add/support_react from codeflash/optimize-pr1561-2026-02-27T00.23.56

Conversation


@codeflash-ai codeflash-ai bot commented Feb 27, 2026

⚡️ This pull request contains optimizations for PR #1561

If you approve this dependent PR, these changes will be merged into the original PR branch add/support_react.

This PR will be automatically closed if the original PR is merged.


📄 26% (0.26x) speedup for find_react_components in codeflash/languages/javascript/frameworks/react/discovery.py

⏱️ Runtime: 7.34 milliseconds → 5.82 milliseconds (best of 250 runs)

📝 Explanation and details

Runtime improvement: the optimized code reduces end-to-end runtime from ~7.34 ms to ~5.82 ms — a 26% speedup — by removing Python-level work and repeated allocations in the hot path.

What changed (concrete optimizations)

  • Cached source bytes: added an lru_cache-backed _encode_source(source) so repeated source.encode("utf-8") calls reuse the same bytes object instead of allocating/encoding every time.
  • Faster hook extraction: replaced the Python-level regex iteration + seen-set loop with HOOK_EXTRACT_RE.findall(...) then list(dict.fromkeys(...)) to deduplicate while preserving first-seen order. This shifts most work into C (re.findall and dict construction) and removes per-match Python bookkeeping.
  • Cheap early-exit for memo checks: added a fast substring check ("memo(" and "React.memo") to skip the more expensive AST-parent walk and repeated slice+decode operations when memo is not present in the source.
  • Minor micro-alloc reduction: switched some ephemeral lists to tuples where appropriate (e.g., memo_patterns) and removed duplicated encode calls elsewhere.

Why these changes speed things up

  • Avoiding repeated .encode calls eliminates expensive per-function memory allocations and Python function-call overhead. The original profiler showed significant time in source.encode() sites (e.g., _extract_props_type, _function_returns_jsx). Caching the encoded bytes eliminates these hotspots when the same source string is inspected multiple times (typical when scanning many functions in one file).
  • Using regex.findall and dict.fromkeys moves the heavy lifting into C implementations (re engine and dict internals), cutting Python loop/branch overhead. The line profiler shows _extract_hooks_used time dropped substantially.
  • The substring check for memo presence is O(n) at C speed and avoids the common-case cost of doing tree/parent inspection and repeated byte-slicing/decoding for every function when memo is not used in the file.
  • Together these changes reduce per-function overhead in the main loop of find_react_components, which is where most time is spent for large files.
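The `findall` + `dict.fromkeys` idiom from the hook-extraction bullet looks like this in isolation. The regex below is an illustrative stand-in; the PR does not show the actual `HOOK_EXTRACT_RE` pattern.

```python
import re

# Illustrative stand-in for HOOK_EXTRACT_RE: matches useXxx calls,
# optionally with a type-parameter list before the parentheses.
HOOK_EXTRACT_RE = re.compile(r"\b(use[A-Z]\w*)\s*(?:<[^>]*>)?\s*\(")

src = "useA(); useB<Type>(); useA(); useState(0);"

# findall does the matching loop in C; dict.fromkeys deduplicates
# while preserving first-seen order (dicts keep insertion order).
hooks = list(dict.fromkeys(HOOK_EXTRACT_RE.findall(src)))
print(hooks)  # ['useA', 'useB', 'useState']
```

Compared with a Python-level `for match in finditer(...)` loop with a `seen` set, this keeps both the matching and the deduplication inside C implementations.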

How this affects real workloads / hot paths

  • find_react_components is used during project-wide discovery and in downstream analyzers (see integration tests). When scanning large files with many functions (the realistic hot path), per-function overhead dominates; these changes reduce that overhead, so the largest wins are for big files or many functions in a single source (the annotated large-scale tests show the biggest improvement: ~34% in that test).
  • Small files or single-function files still benefit (microsecond-level wins) but the biggest impact is when the analyzer processes hundreds of functions in one source — exactly the scenario exercised by the large-scale annotated test and the integration flows that call find_react_components.

Which tests / cases benefit most

  • Large-scale detection and deduping tests (thousands of functions, many repeated hook patterns) get the largest absolute wins because of eliminated allocations and cheaper hook extraction.
  • Any test or real workload that repeatedly slices/decodes source bytes for props/memo detection benefits from the cached encoded bytes.
  • Small, early-exit scenarios (files with "use server") are unaffected functionally and still return quickly.

Behavioral/implementation notes and trade-offs

  • Semantics preserved: the changes do not change detection logic; they only change how data is extracted (same regex, same tree checks).
  • Memory trade-off: lru_cache(maxsize=32) will keep recent encoded source bytes alive (small, bounded memory increase). This is an intentional and reasonable trade-off for eliminating repeated encodings in the common case of scanning many functions from the same file.
  • The early substring check is conservative: it only avoids the AST/decoding work when memo-like identifiers are absent; when present, the full checks still run so detection correctness is unchanged.

Summary

  • Primary benefit: 26% runtime reduction (7.34 ms → 5.82 ms) by cutting Python-level loops and repeated allocations in the hot path.
  • Changes are low-risk, preserve behavior, and give the biggest improvements on large files and workloads that scan many functions in the same source (the common case for project analysis).

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 10 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests
from pathlib import Path

# imports
import pytest  # used for our unit tests
from codeflash.languages.javascript.frameworks.react import discovery
from codeflash.languages.javascript.frameworks.react.discovery import (
    ComponentType, ReactComponentInfo, find_react_components)
from codeflash.languages.javascript.treesitter import (FunctionNode,
                                                       TreeSitterAnalyzer)

# NB: We will create small helper objects to emulate the minimal parts of a tree-sitter Node
# that the discovery module expects. These are minimal, focused on the attributes/methods used
# by the code under test (child_by_field_name, children, type, parent, start_byte, end_byte,
# next_named_sibling). We keep these helpers extremely small and deterministic so tests remain stable.

class _MiniNode:
    """Minimal node-like object implementing the small API that discovery._node_contains_jsx
    and related helpers expect. This is not a full tree-sitter Node; it's a tiny structural
    shim used only inside tests to exercise logic that inspects node.type and children.
    """
    def __init__(self, type_, children=None, parent=None, start_byte=0, end_byte=0):
        self.type = type_
        # children should be a list of _MiniNode
        self.children = children or []
        self.parent = parent
        self._children_by_field = {}
        self.start_byte = start_byte
        self.end_byte = end_byte
        # next_named_sibling is occasionally used; set by test if needed
        self.next_named_sibling = None

    def child_by_field_name(self, name: str):
        # Support only the "body", "parameters", and "function" lookups used in discovery
        return self._children_by_field.get(name)

    def set_child_field(self, name: str, node):
        self._children_by_field[name] = node
        if node is not None:
            node.parent = self

# Helper to construct a FunctionNode using the real dataclass from the project.
def _make_function_node(
    *,
    name: str,
    is_method: bool = False,
    is_arrow: bool = False,
    source_text: str = "",
    node: _MiniNode | None = None,
    start_line: int = 1,
    end_line: int = 1,
) -> FunctionNode:
    """Create a FunctionNode dataclass instance with sensible defaults."""
    # Fill all required dataclass fields. We keep many flags False for simplicity.
    return FunctionNode(
        name=name,
        node=node or _MiniNode("function_declaration"),
        start_line=start_line,
        end_line=end_line,
        start_col=0,
        end_col=0,
        is_async=False,
        is_method=is_method,
        is_arrow=is_arrow,
        is_generator=False,
        class_name=None,
        parent_function=None,
        source_text=source_text,
        doc_start_line=None,
        is_exported=False,
    )

def test_skips_file_with_use_server_directive():
    # If the file begins with a "use server" directive, find_react_components must return an empty list
    src = "'use server'\n\nexport function MyComp() { return <div/> }"
    analyzer = TreeSitterAnalyzer("javascript")  # real analyzer instance; we don't use its parsing here
    # We don't need to monkeypatch analyzer because the function returns early
    codeflash_output = find_react_components(src, Path("some_file.tsx"), analyzer); result = codeflash_output # 2.84μs -> 2.83μs (0.035% faster)
    assert result == []

def test_finds_hook_functions_without_parsing_jsx():
    # Hooks are recognized purely by name (useXxx) and not by return type, so we can detect them
    src = """
    export function useMyThing() {
        const [s, setS] = useState(0);
        useEffect(() => {});
    }
    """
    analyzer = TreeSitterAnalyzer("javascript")
    # Monkeypatch the analyzer to return a single FunctionNode representing the hook.
    # Using the actual TreeSitterAnalyzer instance (real class) but assigning a custom function
    # as an attribute is a lightweight test-time configuration and keeps downstream logic deterministic.
    fn = _make_function_node(
        name="useMyThing",
        is_method=False,
        is_arrow=False,
        source_text=src,
        start_line=2,
        end_line=6,
    )

    # Replace the find_functions method on the real analyzer instance with a deterministic lambda.
    analyzer.find_functions = lambda source, include_methods=False, include_arrow_functions=True, require_name=True: [fn]

    codeflash_output = find_react_components(src, Path("hooks.tsx"), analyzer); comps = codeflash_output # 18.7μs -> 15.6μs (19.8% faster)
    comp = comps[0]

def test_non_pascal_and_method_functions_are_excluded():
    # Create three functions:
    # - non-pascal function name (should be ignored)
    # - a PascalCase but marked as method (should be ignored)
    # - a valid hook (should be included)
    src = """
    function myfunc() { return <div/> }  // lowercase name - not a component
    class C { MyMethod() { return <span/> } } // method - should be skipped by include_methods=False in analyzer
    function useThing() { useLayoutEffect(() => {}); } // hook - should be included
    """
    analyzer = TreeSitterAnalyzer("javascript")

    non_pascal = _make_function_node(name="myfunc", is_method=False, source_text="function myfunc() { }")
    method_like = _make_function_node(name="MyMethod", is_method=True, source_text="class C { MyMethod(){ } }")
    hook = _make_function_node(name="useThing", is_method=False, source_text="function useThing(){ useLayoutEffect(); }")

    # Ensure analyzer returns all three; find_react_components should filter appropriately
    analyzer.find_functions = lambda source, include_methods=False, include_arrow_functions=True, require_name=True: [
        non_pascal,
        method_like,
        hook,
    ]

    codeflash_output = find_react_components(src, Path("edge.tsx"), analyzer); comps = codeflash_output # 17.7μs -> 14.9μs (19.2% faster)
    assert [c.function_name for c in comps] == ["useThing"]

def test_detects_memo_wrapping_by_source_string_and_jsx_return():
    # This test verifies that memo-detection works both via AST parent call_expression
    # and via textual patterns like "memo(MyComp)".
    # We will rely on the textual pattern because creating full call_expression parents is more involved.
    src_template = """
    import React, { memo } from "react";

    export const MyComp = (props: Props) => <div>{props.children}</div>;

    export default memo(MyComp);
    """
    analyzer = TreeSitterAnalyzer("javascript")

    # Construct a node structure where node.child_by_field_name("body") returns a node
    # which contains a child with a JSX type so _function_returns_jsx returns True.
    jsx_node = _MiniNode("jsx_element", children=[])
    body_node = _MiniNode("parenthesized_expression", children=[jsx_node])
    func_node = _MiniNode("arrow_function")
    func_node.set_child_field("body", body_node)

    fn = _make_function_node(
        name="MyComp",
        is_method=False,
        is_arrow=True,
        source_text='(props: Props) => <div>{props.children}</div>',
        node=func_node,
        start_line=3,
        end_line=3,
    )

    # Analyzer returns our single arrow component
    analyzer.find_functions = lambda source, include_methods=False, include_arrow_functions=True, require_name=True: [fn]

    codeflash_output = find_react_components(src_template, Path("memoed.tsx"), analyzer); comps = codeflash_output # 17.0μs -> 16.0μs (5.99% faster)
    comp = comps[0]

def test_large_scale_many_hooks_and_components():
    # Build a large list of FunctionNode objects to ensure find_react_components handles many items.
    # We'll create 800 hooks (useXxx) and 200 components (PascalCase). Hooks are easier because they don't
    # require JSX detection. For components, we'll mark them arrow functions and provide a simple body node
    # containing a jsx_element type so _function_returns_jsx sees them as returning JSX.
    analyzer = TreeSitterAnalyzer("javascript")

    functions = []

    # Generate 800 hooks named useHook0 .. useHook799
    for i in range(800):
        name = f"useHook{i}"
        src_text = f"function {name}() {{ useState(); }}"
        fn = _make_function_node(name=name, is_method=False, is_arrow=False, source_text=src_text, start_line=1 + i, end_line=1 + i)
        functions.append(fn)

    # Generate 200 PascalCase components named Comp0 .. Comp199
    for i in range(200):
        name = f"Comp{i}"
        # create a node with a body that contains a jsx_element to indicate JSX return
        jsx_node = _MiniNode("jsx_element")
        body_node = _MiniNode("block", children=[jsx_node])
        func_node = _MiniNode("function_declaration")
        func_node.set_child_field("body", body_node)
        src_text = f"function {name}() {{ return <div>{i}</div>; }}"
        fn = _make_function_node(name=name, is_method=False, is_arrow=False, source_text=src_text, node=func_node, start_line=1000 + i, end_line=1000 + i)
        functions.append(fn)

    # Keep insertion order deterministic (hooks first, then components)
    functions = list(functions)

    # Monkeypatch analyzer.find_functions to return our pre-built list.
    analyzer.find_functions = lambda source, include_methods=False, include_arrow_functions=True, require_name=True: functions

    src = "// big file with many functions\n" + "\n".join(f"// filler line {i}" for i in range(10))
    codeflash_output = find_react_components(src, Path("bigfile.tsx"), analyzer); comps = codeflash_output # 4.76ms -> 3.55ms (34.2% faster)

    # Verify some invariants: count of hooks and components
    hook_count = sum(1 for c in comps if c.component_type == ComponentType.HOOK)
    arrow_or_func_count = sum(1 for c in comps if c.component_type in (ComponentType.ARROW, ComponentType.FUNCTION))
    assert hook_count == 800
    assert arrow_or_func_count == 200

    # Spot-check a few entries for correctness
    names = {c.function_name for c in comps}
    assert "useHook0" in names
    assert "Comp199" in names
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from pathlib import Path

# Import the module and symbols under test.
import codeflash.languages.javascript.frameworks.react.discovery as discovery
# imports
import pytest  # used for our unit tests
from codeflash.languages.javascript.frameworks.react.discovery import (
    ComponentType, ReactComponentInfo, find_react_components)
# Import the real analyzer and FunctionNode dataclass from the project's treesitter module.
from codeflash.languages.javascript.treesitter import (FunctionNode,
                                                       TreeSitterAnalyzer)

def test_basic_function_component_detection(monkeypatch):
    """
    Basic positive case:
    - A PascalCase standalone function that 'returns JSX' should be reported as a FUNCTION component.
    - Hook calls inside the function source_text should be extracted (unique, in first-seen order).
    - Props type and memo detection are read via patched extractors (to avoid requiring a real tree-sitter Node).
    """
    # Simple source and fake file path
    source = "/* some file */\nconst x = 1;\n"
    file_path = Path("components/MyComp.tsx")

    # Create a real TreeSitterAnalyzer instance (language string is accepted by ctor).
    analyzer = TreeSitterAnalyzer("javascript")

    # Build a FunctionNode instance using the real dataclass constructor.
    # We deliberately set node to None because we will patch internal helpers that would otherwise
    # attempt to access tree-sitter Node APIs.
    fn = FunctionNode(
        name="MyComp",  # PascalCase => candidate component
        node=None,
        start_line=10,
        end_line=20,
        start_col=0,
        end_col=0,
        is_async=False,
        is_method=False,  # standalone function
        is_arrow=False,
        is_generator=False,
        class_name=None,
        parent_function=None,
        # Include source text containing hook calls; the regex should capture useState and useCustom
        source_text="const [s, setS] = useState(0);\nuseCustom<TypeParam>();\nreturn <div/>;",
    )

    # Patch analyzer.find_functions to return our single FunctionNode.
    monkeypatch.setattr(analyzer, "find_functions", lambda *args, **kwargs: [fn])

    # Patch internal helpers to avoid needing a real tree-sitter Node structure.
    # _function_returns_jsx: treat presence of attribute _returns_jsx as the determinant.
    monkeypatch.setattr(discovery, "_function_returns_jsx", lambda f, s, a: getattr(f, "_returns_jsx", True))
    # _extract_props_type: read from an attribute on the FunctionNode if present.
    monkeypatch.setattr(discovery, "_extract_props_type", lambda f, s, a: getattr(f, "_props_type", None))
    # _is_wrapped_in_memo: read from attribute if present.
    monkeypatch.setattr(discovery, "_is_wrapped_in_memo", lambda f, s: getattr(f, "_is_memo", False))

    # Call the function under test.
    codeflash_output = find_react_components(source, file_path, analyzer); components = codeflash_output # 16.9μs -> 15.3μs (10.6% faster)

    comp = components[0]

def test_hook_is_classified_as_hook_and_not_component(monkeypatch):
    """
    A function named with the 'useXxx' pattern that is not a method should be classified as a HOOK.
    Hooks should be reported with component_type == ComponentType.HOOK and returns_jsx == False.
    """
    source = ""
    file_path = Path("hooks/useThing.ts")
    analyzer = TreeSitterAnalyzer("javascript")

    fn = FunctionNode(
        name="useThing",
        node=None,
        start_line=1,
        end_line=3,
        start_col=0,
        end_col=0,
        is_async=False,
        is_method=False,
        is_arrow=False,
        is_generator=False,
        class_name=None,
        parent_function=None,
        source_text="const val = useState(0); useMemo(()=>{});",
    )

    # Ensure analyzer returns our function
    monkeypatch.setattr(analyzer, "find_functions", lambda *args, **kwargs: [fn])

    # Patch helpers similarly to previous test.
    monkeypatch.setattr(discovery, "_function_returns_jsx", lambda f, s, a: getattr(f, "_returns_jsx", False))
    monkeypatch.setattr(discovery, "_extract_props_type", lambda f, s, a: None)
    monkeypatch.setattr(discovery, "_is_wrapped_in_memo", lambda f, s: False)

    codeflash_output = find_react_components(source, file_path, analyzer); components = codeflash_output # 12.2μs -> 11.2μs (8.39% faster)
    comp = components[0]

def test_skips_server_component_file_without_parsing(monkeypatch):
    """
    Files that begin with a 'use server' directive should be skipped entirely,
    and the analyzer.find_functions should not be called.
    """
    # Source that triggers server directive detection in the first five lines
    source = "'use server';\nimport React from 'react';\nfunction Ignored() { return <div/> }"
    file_path = Path("app/page.tsx")
    analyzer = TreeSitterAnalyzer("javascript")

    # If find_functions gets called, fail the test (it should not be called)
    def fail_if_called(*args, **kwargs):
        pytest.fail("analyzer.find_functions should not be called for server component files")

    monkeypatch.setattr(analyzer, "find_functions", fail_if_called)

    codeflash_output = find_react_components(source, file_path, analyzer); components = codeflash_output # 2.85μs -> 2.79μs (2.15% faster)
    assert components == []

def test_non_pascal_case_and_methods_are_ignored(monkeypatch):
    """
    Ensure functions that are not PascalCase or that are class methods are ignored.
    """
    source = ""
    file_path = Path("misc/file.tsx")
    analyzer = TreeSitterAnalyzer("javascript")

    fn_lowercase = FunctionNode(
        name="notPascal",
        node=None,
        start_line=1,
        end_line=2,
        start_col=0,
        end_col=0,
        is_async=False,
        is_method=False,
        is_arrow=False,
        is_generator=False,
        class_name=None,
        parent_function=None,
        source_text="return <span/>;",
    )

    fn_method = FunctionNode(
        name="MyMethod",
        node=None,
        start_line=3,
        end_line=4,
        start_col=0,
        end_col=0,
        is_async=False,
        is_method=True,  # method inside class -> should be ignored
        is_arrow=False,
        is_generator=False,
        class_name="SomeClass",
        parent_function=None,
        source_text="return <div/>;",
    )

    # Analyzer returns both functions
    monkeypatch.setattr(analyzer, "find_functions", lambda *args, **kwargs: [fn_lowercase, fn_method])

    # Patch helpers to treat both as returning JSX if asked
    monkeypatch.setattr(discovery, "_function_returns_jsx", lambda f, s, a: True)
    monkeypatch.setattr(discovery, "_extract_props_type", lambda f, s, a: None)
    monkeypatch.setattr(discovery, "_is_wrapped_in_memo", lambda f, s: False)

    codeflash_output = find_react_components(source, file_path, analyzer); components = codeflash_output # 4.69μs -> 4.68μs (0.235% faster)
    assert components == []

def test_large_scale_detection_and_hook_deduping(monkeypatch):
    """
    Large-scale test: create up to 1000 FunctionNode entries and verify:
    - Only PascalCase standalone functions that 'return JSX' are reported.
    - Hook extraction deduplicates calls and preserves first-seen order.
    - The collection scales to 1000 functions in a deterministic manner.
    """
    num_total = 1000
    # We'll make half of them valid PascalCase components, half invalid
    num_valid = num_total // 2

    source = "// large file\n"
    file_path = Path("big/Many.tsx")
    analyzer = TreeSitterAnalyzer("javascript")

    functions = []
    for i in range(num_total):
        if i < num_valid:
            # Valid PascalCase component names: Comp0, Comp1, ...
            name = f"Comp{i}"
            is_method = False
            is_arrow = (i % 3 == 0)  # some arrows, some normal functions
            # craft source_text to include hook calls with duplicates and generics: useA, useB, useA
            source_text = "useA(); useB<Type>(); useA(); return <div/>;"
            # Mark attributes that our patched helpers will consult
            fn = FunctionNode(
                name=name,
                node=None,
                start_line=i * 2 + 1,
                end_line=i * 2 + 2,
                start_col=0,
                end_col=0,
                is_async=False,
                is_method=is_method,
                is_arrow=is_arrow,
                is_generator=False,
                class_name=None,
                parent_function=None,
                source_text=source_text,
            )
            # set helper attributes that patched helpers will read
            setattr(fn, "_returns_jsx", True)
            setattr(fn, "_props_type", None)
            setattr(fn, "_is_memo", False)
        else:
            # Invalid names (not PascalCase) or methods
            idx = i - num_valid
            name = f"not_comp_{idx}"
            fn = FunctionNode(
                name=name,
                node=None,
                start_line=i * 2 + 1,
                end_line=i * 2 + 2,
                start_col=0,
                end_col=0,
                is_async=False,
                is_method=(idx % 5 == 0),  # some will be methods
                is_arrow=False,
                is_generator=False,
                class_name=None,
                parent_function=None,
                source_text="useA(); return <div/>;",
            )
            # even if they have _returns_jsx True, they should be ignored by PascalCase check or is_method
            setattr(fn, "_returns_jsx", True)
            setattr(fn, "_props_type", None)
            setattr(fn, "_is_memo", False)

        functions.append(fn)

    # Patch analyzer to return our synthetic functions list
    monkeypatch.setattr(analyzer, "find_functions", lambda *args, **kwargs: functions)

    # Patch internal helpers to use the attributes we set on FunctionNode objects.
    monkeypatch.setattr(discovery, "_function_returns_jsx", lambda f, s, a: getattr(f, "_returns_jsx", False))
    monkeypatch.setattr(discovery, "_extract_props_type", lambda f, s, a: getattr(f, "_props_type", None))
    monkeypatch.setattr(discovery, "_is_wrapped_in_memo", lambda f, s: getattr(f, "_is_memo", False))

    codeflash_output = find_react_components(source, file_path, analyzer); components = codeflash_output # 2.49ms -> 2.18ms (14.0% faster)
    assert len(components) == num_valid

    # Components come back in input order, so the first is Comp0
    first_comp = components[0]
    assert first_comp.function_name == "Comp0"
    # All returned items should be either FUNCTION or ARROW depending on is_arrow flag
    for comp_info in components:
        assert comp_info.component_type in (ComponentType.FUNCTION, ComponentType.ARROW)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-pr1561-2026-02-27T00.23.56 and push.

@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Feb 27, 2026