Skip to content

⚡️ Speed up function _add_behavior_instrumentation by 18% in PR #1655 (feat/add/void/func)#1672

Open
codeflash-ai[bot] wants to merge 1 commit intofeat/add/void/funcfrom
codeflash/optimize-pr1655-2026-02-26T00.34.13
Open

⚡️ Speed up function _add_behavior_instrumentation by 18% in PR #1655 (feat/add/void/func)#1672
codeflash-ai[bot] wants to merge 1 commit intofeat/add/void/funcfrom
codeflash/optimize-pr1655-2026-02-26T00.34.13

Conversation

@codeflash-ai
Copy link
Contributor

@codeflash-ai codeflash-ai bot commented Feb 26, 2026

⚡️ This pull request contains optimizations for PR #1655

If you approve this dependent PR, these changes will be merged into the original PR branch feat/add/void/func.

This PR will be automatically closed if the original PR is merged.


📄 18% (0.18x) speedup for _add_behavior_instrumentation in codeflash/languages/java/instrumentation.py

⏱️ Runtime : 7.79 milliseconds 6.59 milliseconds (best of 212 runs)

📝 Explanation and details

Primary benefit — runtime: the change reduces median runtime from 7.79 ms to 6.59 ms (≈18% speedup). This is the reason the optimization was accepted.

What changed (concrete optimizations)

  • Avoid the expensive join + parse work when the target function name is absent:
    • Original always built body_text = "\n".join(body_lines) and checked func_name in body_text (which forces the allocation of the joined string).
    • Optimized does a cheap per-line substring scan (for ln in body_lines: if func_name in ln: ...) and only performs the join / imports / tree-sitter parse when there’s actually a candidate. This avoids allocating/joining and skipping parser imports for the common case where the function isn’t present.
  • Reduce repeated encodings:
    • Original repeatedly called line.encode("utf8") in multiple places.
    • Optimized pre-encodes body lines once into line_bytes_list and reuses those bytes for all byte-length and slicing computations.
  • Simpler method-name heuristic:
    • _extract_test_method_name replaced the more complex modifier/type-specific scanning with a single find("(") and then take the immediate preceding token. This eliminates multiple find/split loops and early-exit complexity while keeping the same fallback to the regex patterns.
  • Avoid per-call allocations in trivial helper:
    • _is_test_annotation no longer constructs a set literal every call (original used return next_char in {" ", "("}), replaced by simple equality checks (next_char == " " or next_char == "("), avoiding ephemeral allocation.

Why these changes produce the speedup (Python performance rationale)

  • Avoiding allocations and heavy string ops is one of the most effective micro-optimizations in Python. Building a large joined string and encoding it is relatively expensive; bypassing that work when unnecessary reduces both CPU and memory churn.
  • Pre-encoding lines avoids repeated encode() calls which were happening in hot loops (encoding is not free). Doing the work once and reusing the result reduces the per-call overhead for byte <-> char offset work.
  • The simplified method-name heuristic cuts the number of find/split/regex attempts in the common cases, reducing work per test method extraction.
  • Removing the per-call set construction in _is_test_annotation removes a tiny allocation that was being done thousands of times (the profiler shows this function is hit heavily).

Profiler evidence

  • _extract_test_method_name total time dropped (profilers show meaningful reduction).
  • wrap_target_calls_with_treesitter now does a cheap per-line scan first; the costly join/parse path is entered far less often.
  • Overall _add_behavior_instrumentation wall-time reduced in tests that exercise many methods or frequently call instrumentation.

Impact on workloads and hot paths

  • This function is used when instrumenting Java tests; function_references and tests show it’s invoked frequently (including large-scale instrumentation loops). The biggest wins are:
    • Large sources with many test methods where the target func_name is not present (the "func_name absent" case): avoids tree-sitter work and big string joins — these tests show the largest relative gains (see annotated_tests: large-scale tests report ~16–26% improvement).
    • Common small-to-medium inputs where many methods are scanned: lower per-method overhead for extracting names and checking annotations.
  • Small regressions can occur for tiny inputs where the per-line scan and extra checks slightly outweigh savings (annotated_tests shows one is_void= True low-volume case ~13% slower). This is an acceptable trade-off because the common, larger scenarios and hot paths (bulk instrumentation) benefit significantly.

Behavior and correctness

  • No change to AST parsing or the instrumentation output logic: the optimizer only adds cheap pre-checks and reduces string/encoding churn. Fallback regex parsing remains the same for ambiguous method signatures.
  • Tests in annotated_tests (including scale tests) validate correctness; improvements are performance-focused and preserve functionality.

Good fit for test cases

  • Best for tests that:
    • Instrument files where func_name is often absent (per-line check short-circuits heavy work).
    • Instrument many methods in one file (pre-encoded lines and lighter name-extraction add up).
  • Small single-method files may see tiny or negligible differences; extremely tiny tests with specific flags (is_void example) may show a small regression but this is outweighed by large-scale gains.

Summary

  • The optimization reduces unnecessary string allocations and repeated encodes, simplifies the common-case method-name heuristic, and avoids entering the expensive tree-sitter path unless needed. These targeted reductions of high-frequency work produced the observed ~18% runtime improvement while keeping behavior intact; small regressions on tiny edge cases are a reasonable trade-off for the consistent speed gains on real workloads.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 13 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 97.5%
🌀 Click to see Generated Regression Tests
import re

import pytest  # used for our unit tests
from codeflash.languages.java.instrumentation import \
    _add_behavior_instrumentation

def test_add_imports_and_behavior_for_simple_test():
    # Simple source with class and one @Test method; func_name absent so no AST parsing required.
    source = (
        "package com.example;\n"
        "public class MyTest {\n"
        "    @Test\n"
        "    public void simpleTest() {\n"
        "        // test body\n"
        "    }\n"
        "}\n"
    )

    # Call the instrumentation function: class_name arbitrary, func_name not present in body to avoid tree-sitter parse.
    codeflash_output = _add_behavior_instrumentation(source, class_name="MyTest", func_name="nonexistentFunction"); result = codeflash_output # 21.5μs -> 19.2μs (11.9% faster)

def test_existing_imports_not_duplicated_and_inserted_after_last_import():
    # Source already contains one of the import statements. The function should not duplicate it,
    # and should add missing imports after the existing import block.
    source = (
        "package com.example;\n"
        "import java.sql.Connection;\n"
        "import some.other.Lib;\n"
        "public class AnotherTest {\n"
        "    @Test\n"
        "    public void testOne() {\n"
        "    }\n"
        "}\n"
    )

    codeflash_output = _add_behavior_instrumentation(source, class_name="AnotherTest", func_name="foo"); result = codeflash_output # 21.2μs -> 18.6μs (14.0% faster)
    # Ensure imports appear before the class declaration (class line should come after the import block)
    # Find index positions to verify ordering
    idx_import = result.find("import java.sql.DriverManager;")
    idx_class = result.find("public class AnotherTest")

def test_annotation_variants_and_non_matching_annotations():
    # Ensure @Test annotations with parameters are recognized, and @TestOnly is ignored.
    source = (
        "public class EdgeAnnotations {\n"
        "    @Test(timeout = 1000)\n"
        "    public void withTimeout() {\n"
        "    }\n\n"
        "    @TestOnly\n"
        "    public void shouldNotInstrument() {\n"
        "    }\n"
        "}\n"
    )

    codeflash_output = _add_behavior_instrumentation(source, class_name="EdgeAnnotations", func_name="none"); result = codeflash_output # 21.4μs -> 19.0μs (13.0% faster)

def test_multiline_method_signature_extraction_and_indentation_preserved():
    # Test method signature split across lines to validate _extract_test_method_name behavior.
    source = (
        "public class MultiLineSig {\n"
        "    @Test\n"
        "    public void\n"
        "    multiLineTest() {\n"
        "        // body\n"
        "    }\n"
        "}\n"
    )

    codeflash_output = _add_behavior_instrumentation(source, class_name="MultiLineSig", func_name="absent"); result = codeflash_output # 19.8μs -> 17.6μs (12.3% faster)

    # Indentation: the behavior lines should be indented (method has 4 spaces + indent logic adds more)
    # Check that the behavior marker ('// Codeflash behavior instrumentation') appears with leading spaces on its line
    match = re.search(r"^\s+// Codeflash behavior instrumentation", result, re.MULTILINE)

def test_many_test_methods_scale_to_1000_methods():
    # Build a class with 1000 simple @Test methods to evaluate scalability.
    n = 1000
    lines = ["package scale;\n", "public class BigTest {\n"]
    for i in range(1, n + 1):
        lines.append("    @Test\n")
        # method signature on one line for simplicity
        lines.append(f"    public void test_{i}() {{\n")
        lines.append("        int x = 1;\n")
        lines.append("    }\n\n")
    lines.append("}\n")
    source = "".join(lines)

    # Call instrumentation; choose func_name absent to avoid tree-sitter dependency
    codeflash_output = _add_behavior_instrumentation(source, class_name="BigTest", func_name="no_such_function"); result = codeflash_output # 5.84ms -> 5.01ms (16.7% faster)

    # Count occurrences of the iteration variable pattern to ensure n instrumentations were added.
    # Each instrumented method includes a unique "int _cf_iter{iter_id} = {iter_id};" line.
    iter_var_occurrences = len(re.findall(r"\bint _cf_iter(\d+) = \d+;", result))

def test_is_void_flag_results_in_no_result_variable_declaration():
    # When is_void=True, wrapper should not attempt to declare per-call result variable.
    source = (
        "public class VoidTest {\n"
        "    @Test\n"
        "    public void voidTest() {\n"
        "        // no calls to instrument (func absent)\n"
        "    }\n"
        "}\n"
    )

    # Use a func_name that does not appear, but set is_void True to ensure branch handling in instrumentation call.
    codeflash_output = _add_behavior_instrumentation(source, class_name="VoidTest", func_name="absent", is_void=True); result = codeflash_output # 57.6μs -> 66.6μs (13.4% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest  # used for our unit tests
from codeflash.languages.java.instrumentation import \
    _add_behavior_instrumentation

def test_basic_imports_and_instrumentation_single_test():
    # A simple Java source with a package, an existing import, a public test method.
    source = (
        "package com.example;\n"
        "import java.util.List;\n"
        "public class MyTest {\n"
        "@Test\n"
        "public void myTest() {\n"
        "    // body without the target function name\n"
        "}\n"
        "}\n"
    )

    # Run the instrumentation function
    codeflash_output = _add_behavior_instrumentation(source, class_name="MyTest", func_name="nonexistentFunc"); result = codeflash_output # 21.3μs -> 18.9μs (12.8% faster)

def test_no_imports_before_class_inserts_imports_and_blank_line():
    # Source with no prior imports; class declaration should cause imports to be added before it.
    source = (
        "class SimpleTest {\n"
        "@Test\n"
        "void t() {\n"
        "}\n"
        "}\n"
    )

    codeflash_output = _add_behavior_instrumentation(source, class_name="SimpleTest", func_name="doesNotExist"); result = codeflash_output # 18.9μs -> 15.8μs (19.8% faster)

    # After instrumentation, the first lines should be our import statements (three lines)
    lines = result.splitlines()

def test_multiple_method_signatures_extracted_correctly():
    # Construct a source with several @Test methods using different declaration styles.
    source = (
        "public class VariedSignatures {\n"
        "@Test\n"
        "public void testOne() {\n"
        "}\n"
        "@Test\n"
        "void packagePrivate() {\n"
        "}\n"
        "@Test\n"
        "private static String abc123() {\n"
        "    return \"ok\";\n"
        "}\n"
        "}\n"
    )

    codeflash_output = _add_behavior_instrumentation(source, class_name="VariedSignatures", func_name="X_not_present"); result = codeflash_output # 34.5μs -> 28.3μs (21.9% faster)

def test_does_not_instrument_testonly_annotation():
    # Methods annotated with @TestOnly should NOT be treated as @Test and therefore not instrumented.
    source = (
        "public class NonTestAnnotation {\n"
        "@TestOnly\n"
        "void notATest() {\n"
        "}\n"
        "}\n"
    )

    codeflash_output = _add_behavior_instrumentation(source, class_name="NonTestAnnotation", func_name="nope"); result = codeflash_output # 7.32μs -> 7.37μs (0.665% slower)

def test_existing_imports_are_not_duplicated():
    # If the source already contains one of the imports, the function should not add a duplicate.
    source = (
        "import java.sql.Connection;\n"
        "public class HasConnectionImport {\n"
        "@Test\n"
        "public void single() {\n"
        "}\n"
        "}\n"
    )

    codeflash_output = _add_behavior_instrumentation(source, class_name="HasConnectionImport", func_name="absent"); result = codeflash_output # 18.7μs -> 16.1μs (16.3% faster)

def test_empty_source_returns_empty_string():
    # An empty source should be handled gracefully and return an empty string.
    codeflash_output = _add_behavior_instrumentation("", class_name="Empty", func_name="none"); result = codeflash_output # 3.35μs -> 3.30μs (1.52% faster)

def test_many_test_methods_instrumented_large_scale():
    # Create a class with many @Test methods (stress test up to several hundred).
    N = 300  # size within the allowed up-to-1000 range for large-scale testing
    lines = ["public class ManyTests {"]
    for i in range(N):
        # Each method has an @Test annotation and a unique method name that does NOT contain the target func_name.
        lines.append("@Test")
        lines.append(f"void t{i}() {{")
        lines.append("    // no matching function name inside body")
        lines.append("}")
    lines.append("}")
    source = "\n".join(lines) + "\n"

    codeflash_output = _add_behavior_instrumentation(source, class_name="ManyTests", func_name="absent_func"); result = codeflash_output # 1.71ms -> 1.36ms (25.8% faster)

    # Ensure we added a behavior instrumentation block for each method by checking for the test-name assignments.
    for idx in range(1, N + 1):
        expected = f'String _cf_test{idx} = "t{idx-1}";'

    # Also check that the marker comment appears exactly N times (one per method instrumentation start).
    marker_count = result.count("// Codeflash behavior instrumentation")
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-pr1655-2026-02-26T00.34.13 and push.

Codeflash Static Badge

Primary benefit — runtime: the change reduces median runtime from 7.79 ms to 6.59 ms (≈18% speedup). This is the reason the optimization was accepted.

What changed (concrete optimizations)
- Avoid the expensive join + parse work when the target function name is absent:
  - Original always built body_text = "\n".join(body_lines) and checked func_name in body_text (which forces the allocation of the joined string).
  - Optimized does a cheap per-line substring scan (for ln in body_lines: if func_name in ln: ...) and only performs the join / imports / tree-sitter parse when there’s actually a candidate. This avoids allocating/joining and skipping parser imports for the common case where the function isn’t present.
- Reduce repeated encodings:
  - Original repeatedly called line.encode("utf8") in multiple places.
  - Optimized pre-encodes body lines once into line_bytes_list and reuses those bytes for all byte-length and slicing computations.
- Simpler method-name heuristic:
  - _extract_test_method_name replaced the more complex modifier/type-specific scanning with a single find("(") and then take the immediate preceding token. This eliminates multiple find/split loops and early-exit complexity while keeping the same fallback to the regex patterns.
- Avoid per-call allocations in trivial helper:
  - _is_test_annotation no longer constructs a set literal every call (original used return next_char in {" ", "("}), replaced by simple equality checks (next_char == " " or next_char == "("), avoiding ephemeral allocation.

Why these changes produce the speedup (Python performance rationale)
- Avoiding allocations and heavy string ops is one of the most effective micro-optimizations in Python. Building a large joined string and encoding it is relatively expensive; bypassing that work when unnecessary reduces both CPU and memory churn.
- Pre-encoding lines avoids repeated encode() calls which were happening in hot loops (encoding is not free). Doing the work once and reusing the result reduces the per-call overhead for byte <-> char offset work.
- The simplified method-name heuristic cuts the number of find/split/regex attempts in the common cases, reducing work per test method extraction.
- Removing the per-call set construction in _is_test_annotation removes a tiny allocation that was being done thousands of times (the profiler shows this function is hit heavily).

Profiler evidence
- _extract_test_method_name total time dropped (profilers show meaningful reduction).
- wrap_target_calls_with_treesitter now does a cheap per-line scan first; the costly join/parse path is entered far less often.
- Overall _add_behavior_instrumentation wall-time reduced in tests that exercise many methods or frequently call instrumentation.

Impact on workloads and hot paths
- This function is used when instrumenting Java tests; function_references and tests show it’s invoked frequently (including large-scale instrumentation loops). The biggest wins are:
  - Large sources with many test methods where the target func_name is not present (the "func_name absent" case): avoids tree-sitter work and big string joins — these tests show the largest relative gains (see annotated_tests: large-scale tests report ~16–26% improvement).
  - Common small-to-medium inputs where many methods are scanned: lower per-method overhead for extracting names and checking annotations.
- Small regressions can occur for tiny inputs where the per-line scan and extra checks slightly outweigh savings (annotated_tests shows one is_void= True low-volume case ~13% slower). This is an acceptable trade-off because the common, larger scenarios and hot paths (bulk instrumentation) benefit significantly.

Behavior and correctness
- No change to AST parsing or the instrumentation output logic: the optimizer only adds cheap pre-checks and reduces string/encoding churn. Fallback regex parsing remains the same for ambiguous method signatures.
- Tests in annotated_tests (including scale tests) validate correctness; improvements are performance-focused and preserve functionality.

Good fit for test cases
- Best for tests that:
  - Instrument files where func_name is often absent (per-line check short-circuits heavy work).
  - Instrument many methods in one file (pre-encoded lines and lighter name-extraction add up).
- Small single-method files may see tiny or negligible differences; extremely tiny tests with specific flags (is_void example) may show a small regression but this is outweighed by large-scale gains.

Summary
- The optimization reduces unnecessary string allocations and repeated encodes, simplifies the common-case method-name heuristic, and avoids entering the expensive tree-sitter path unless needed. These targeted reductions of high-frequency work produced the observed ~18% runtime improvement while keeping behavior intact; small regressions on tiny edge cases are a reasonable trade-off for the consistent speed gains on real workloads.
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Feb 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants