⚡️ Speed up function find_react_components by 26% in PR #1561 (add/support_react) #1682
Open

codeflash-ai[bot] wants to merge 1 commit into add/support_react from codeflash/optimize-pr1561-2026-02-27T00.23.56
Runtime improvement: the optimized code reduces end-to-end runtime from ~7.34 ms to ~5.82 ms — a 26% speedup — by removing Python-level work and repeated allocations in the hot path.
What changed (concrete optimizations)
- Cached source bytes: added an lru_cache-backed _encode_source(source) so repeated source.encode("utf-8") calls reuse the same bytes object instead of allocating/encoding every time.
- Faster hook extraction: replaced the Python-level regex iteration + seen-set loop with HOOK_EXTRACT_RE.findall(...) then list(dict.fromkeys(...)) to deduplicate while preserving first-seen order. This shifts most work into C (re.findall and dict construction) and removes per-match Python bookkeeping.
- Cheap early-exit for memo checks: added a fast substring check ("memo(" and "React.memo") to skip the more expensive AST-parent walk and repeated slice+decode operations when memo is not present in the source.
- Minor allocation reduction: switched some ephemeral lists to tuples where appropriate (e.g., memo_patterns) and removed duplicate encode calls elsewhere.
Why these changes speed things up
- Avoiding repeated .encode calls eliminates expensive per-function memory allocations and Python function-call overhead. The original profiler showed significant time in source.encode() sites (e.g., _extract_props_type, _function_returns_jsx). Caching the encoded bytes eliminates these hotspots when the same source string is inspected multiple times (typical when scanning many functions in one file).
- Using regex.findall and dict.fromkeys moves the heavy lifting into C implementations (re engine and dict internals), cutting Python loop/branch overhead. The line profiler shows _extract_hooks_used time dropped substantially.
- The substring check for memo presence is O(n) at C speed and avoids the common-case cost of doing tree/parent inspection and repeated byte-slicing/decoding for every function when memo is not used in the file.
- Together these changes reduce per-function overhead in the main loop of find_react_components, which is where most time is spent for large files.
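The hook-extraction change can be verified to be behavior-preserving: the old Python-level loop and the new C-level path produce the same first-seen-order result. A sketch with an assumed simplified pattern:

```python
import re

HOOK_RE = re.compile(r"\buse[A-Z]\w*")  # illustrative pattern, not the real one
src = "useState(); useEffect(); useState(); useMemo();"

# Original style: per-match Python bookkeeping with a seen set
seen, hooks_old = set(), []
for m in HOOK_RE.finditer(src):
    name = m.group(0)
    if name not in seen:
        seen.add(name)
        hooks_old.append(name)

# Optimized style: matching (re.findall) and dedup (dict.fromkeys) run in C;
# dicts preserve insertion order, so first-seen order is kept.
hooks_new = list(dict.fromkeys(HOOK_RE.findall(src)))

assert hooks_old == hooks_new == ["useState", "useEffect", "useMemo"]
```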
How this affects real workloads / hot paths
- find_react_components is used during project-wide discovery and in downstream analyzers (see integration tests). When scanning large files with many functions (the realistic hot path), per-function overhead dominates; these changes reduce that overhead, so the largest wins come from big files or sources with many functions. The annotated large-scale test shows the biggest improvement, at ~34%.
- Small or single-function files still benefit (microsecond-level wins), but the biggest impact comes when the analyzer processes hundreds of functions in one source, exactly the scenario exercised by the large-scale annotated test and the integration flows that call find_react_components.
Which tests / cases benefit most
- Large-scale detection and deduping tests (thousands of functions, many repeated hook patterns) get the largest absolute wins because of eliminated allocations and cheaper hook extraction.
- Any test or real workload that repeatedly slices/decodes source bytes for props/memo detection benefits from the cached encoded bytes.
- Small, early-exit scenarios (files with "use server") are unaffected functionally and still return quickly.
Behavioral/implementation notes and trade-offs
- Semantics preserved: the changes do not change detection logic; they only change how data is extracted (same regex, same tree checks).
- Memory trade-off: lru_cache(maxsize=32) will keep recent encoded source bytes alive (small, bounded memory increase). This is an intentional and reasonable trade-off for eliminating repeated encodings in the common case of scanning many functions from the same file.
- The early substring check is conservative: it only avoids the AST/decoding work when memo-like identifiers are absent; when present, the full checks still run so detection correctness is unchanged.
Summary
- Primary benefit: 26% runtime reduction (7.34 ms → 5.82 ms) by cutting Python-level loops and repeated allocations in the hot path.
- Changes are low-risk, preserve behavior, and give the biggest improvements on large files and workloads that scan many functions in the same source (the common case for project analysis).
⚡️ This pull request contains optimizations for PR #1561
If you approve this dependent PR, these changes will be merged into the original PR branch add/support_react.

📄 26% (0.26x) speedup for find_react_components in codeflash/languages/javascript/frameworks/react/discovery.py

⏱️ Runtime: 7.34 milliseconds → 5.82 milliseconds (best of 250 runs)

📝 Explanation and details
To edit these changes, run `git checkout codeflash/optimize-pr1561-2026-02-27T00.23.56` and push.