⚡️ Speed up method TreeSitterAnalyzer.has_return_statement by 10% in PR #1561 (add/support_react)#1670
Open
codeflash-ai[bot] wants to merge 1 commit into add/support_react from
Conversation
Primary benefit — runtime: The optimized version reduces average wall-clock time by ~10% (1.31 ms → 1.19 ms). The win comes from cutting per-call allocations and repeated attribute lookups inside the traversal hot loop, savings that compound when traversing large ASTs or calling the routine repeatedly.
What changed (specific optimizations)
- Pulled the small, constant collection of function-like node types out to a module-level frozenset (_FUNC_LIKE_NODE_TYPES). That avoids re-creating the tuple on every _node_has_return call.
- Reused local variables for child lists inside the loop (body_children = body_node.children and child_nodes = current.children) to avoid repeated attribute lookups.
- Kept traversal logic identical; only micro-optimizations that reduce Python overhead were applied.
Why this speeds things up (Python-level rationale)
- Allocation cost: In the original code func_types = ("function_declaration", ...) creates a new tuple on every call. Even tiny allocations matter when the traversal loops many times. Moving it to a module-level constant removes that allocation.
- Membership cost: Using a frozenset for func-like node types makes "current.type in func_types" an O(1) hash lookup instead of a tiny linear scan; for very hot loops this helps and avoids per-call construction overhead.
- Attribute lookup reduction: Accessing attributes like current.children or body_node.children repeatedly triggers attribute lookups (and possibly C-to-Python boundary work for tree-sitter Node objects). Binding them once to a local name reduces that overhead inside the loop.
- Lower Python interpreter overhead: Fewer temporary objects and fewer attribute lookups reduce work inside the hot while/stack loop, which is where most time is spent (the annotated tests and profiler show the traversal loop dominates).
How this affects workloads (when you see the benefit)
- Best for deep/large ASTs and repeated calls: Annotated tests show the largest wins on the deep chain and large-tree tests (16–20% faster), exactly the scenarios where the traversal iterates many nodes and the per-iteration savings add up.
- Small, trivial functions: A few micro-benchmark tests on very small inputs show negligible differences, and a couple are slightly slower by a few percent; this is expected measurement noise on tiny inputs, where the traversal does almost no work, and is a reasonable trade-off given the overall runtime improvement on realistic workloads.
- Hot path safe: The change is local, preserves behavior, and benefits any code that repeatedly calls has_return_statement or walks big trees.
Trade-offs
- No behavioral changes were introduced. A small number of microbench cases are a bit slower (annotated tests show a few 3–4% regressions), but those are tiny inputs where the traversal work is minimal. This small regression is an acceptable trade-off for the measurable runtime improvement on realistic workloads.
Summary
- Primary gain: ~10% overall runtime improvement (1.31 ms → 1.19 ms).
- How: remove per-call tuple allocation, use a module-level frozenset, and reduce repeated attribute lookups in the hot traversal loop.
- Good for: large/deep AST traversal and high-call-rate scenarios; negligible or acceptable differences for tiny inputs.
⚡️ This pull request contains optimizations for PR #1561
If you approve this dependent PR, these changes will be merged into the original PR branch add/support_react.
📄 10% (0.10x) speedup for TreeSitterAnalyzer.has_return_statement in codeflash/languages/javascript/treesitter_utils.py
⏱️ Runtime: 1.31 milliseconds → 1.19 milliseconds (best of 250 runs)
📝 Explanation and details
✅ Correctness verification report:
🌀 Click to see Generated Regression Tests
To edit these changes, run
git checkout codeflash/optimize-pr1561-2026-02-25T21.27.24 and push.