⚡️ Speed up method TreeSitterAnalyzer.has_return_statement by 10% in PR #1561 (add/support_react)#1670
Open
codeflash-ai[bot] wants to merge 1 commit into add/support_react from
Conversation
Primary benefit — runtime: The optimized version reduces average wall-clock time by ~10% (1.31 ms → 1.19 ms). The win comes from cutting per-call allocations and repeated attribute lookups inside the traversal hot loop, savings that compound when traversing large ASTs or calling the routine repeatedly.
What changed (specific optimizations)
- Pulled the small, constant collection of function-like node types out to a module-level frozenset (_FUNC_LIKE_NODE_TYPES). That avoids re-creating the tuple on every _node_has_return call.
- Reused local variables for child lists inside the loop (body_children = body_node.children and child_nodes = current.children) to avoid repeated attribute lookups.
- Kept traversal logic identical; only micro-optimizations that reduce Python overhead were applied.
Why this speeds things up (Python-level rationale)
- Allocation cost: In the original code func_types = ("function_declaration", ...) creates a new tuple on every call. Even tiny allocations matter when the traversal loops many times. Moving it to a module-level constant removes that allocation.
- Membership cost: Using a frozenset for func-like node types makes "current.type in func_types" an O(1) hash lookup instead of a tiny linear scan; for very hot loops this helps and avoids per-call construction overhead.
- Attribute lookup reduction: Accessing attributes like current.children or body_node.children repeatedly triggers attribute lookups (and possibly C-to-Python boundary work for tree-sitter Node objects). Binding them once to a local name reduces that overhead inside the loop.
- Lower Python interpreter overhead: Fewer temporary objects and fewer attribute lookups reduce work inside the hot while/stack loop, which is where most time is spent (the annotated tests and profiler show the traversal loop dominates).
How this affects workloads (when you see the benefit)
- Best for deep/large ASTs and repeated calls: Annotated tests show the largest wins on the deep chain and large-tree tests (16–20% faster), exactly the scenarios where the traversal iterates many nodes and the per-iteration savings add up.
- Small, trivial functions: A few micro-benchmark tests on very small inputs show negligible differences, and a couple are slightly slower by a few percent; this is expected measurement noise on tiny inputs, where the traversal does almost no work, and is a reasonable trade-off given the overall runtime improvement on realistic workloads.
- Hot path safe: The change is local, preserves behavior, and benefits any code that repeatedly calls has_return_statement or walks big trees.
Trade-offs
- No behavioral changes were introduced. A small number of microbench cases are a bit slower (annotated tests show a few 3–4% regressions), but those are tiny inputs where the traversal work is minimal. This small regression is an acceptable trade-off for the measurable runtime improvement on realistic workloads.
Summary
- Primary gain: ~10% overall runtime improvement (1.31 ms → 1.19 ms).
- How: remove per-call tuple allocation, use a module-level frozenset, and reduce repeated attribute lookups in the hot traversal loop.
- Good for: large/deep AST traversal and high-call-rate scenarios; negligible or acceptable differences for tiny inputs.
⚡️ This pull request contains optimizations for PR #1561
If you approve this dependent PR, these changes will be merged into the original PR branch add/support_react.
📄 10% (0.10x) speedup for TreeSitterAnalyzer.has_return_statement in codeflash/languages/javascript/treesitter_utils.py
⏱️ Runtime: 1.31 milliseconds → 1.19 milliseconds (best of 250 runs)
📝 Explanation and details
✅ Correctness verification report:
🌀 Click to see Generated Regression Tests
To edit these changes, run
git checkout codeflash/optimize-pr1561-2026-02-25T21.27.24 and push.