feat: add language version support across multiple language implement…#1680

Open
HeshamHM28 wants to merge 5 commits into omni-java from feat/java/wire-language-version

Conversation


@HeshamHM28 HeshamHM28 commented Feb 26, 2026

No description provided.

…nguage_version

Make language_version the single source of truth for version info across
all languages. PythonSupport.language_version now returns platform.python_version()
instead of None. All API payloads use language_version as canonical, with
python_version kept only as a backward-compat shim for the backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
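A minimal sketch of what the commit describes, assuming the class and payload shapes (the real code lives in codeflash; `build_payload` is a hypothetical helper for illustration):

```python
import platform

class PythonSupport:
    """Sketch of the support class described above."""

    @property
    def language_version(self) -> str:
        # Canonical source of version info: the running interpreter,
        # e.g. "3.12.1", instead of the previous None.
        return platform.python_version()

def build_payload(support: PythonSupport) -> dict:
    # language_version is canonical; python_version is kept only as a
    # backward-compat shim for the backend.
    version = support.language_version
    return {"language_version": version, "python_version": version}
```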
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Ubuntu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Ubuntu and others added 2 commits February 26, 2026 23:58
The Java Fibonacci E2E test was failing because AI-generated tests called
fibonacci(92)/fibonacci(93) against the naive recursive implementation,
which hangs forever. Since all tests run in a single Maven process, this
caused a 120s timeout that killed ALL tests, including the fast ones,
preventing any baseline from being established.

Fix: inject @Timeout(30) on each @Test method during instrumentation.
Individual hanging tests now get killed by JUnit without blocking others.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The default SAME_THREAD mode uses Thread.interrupt(), which is silently
ignored by CPU-bound code such as naive recursive fibonacci. SEPARATE_THREAD
runs the test in a new thread and fails it with a TimeoutException when the
deadline passes, which actually works for tight computational loops.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
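The JUnit detail aside, the underlying idea translates directly: run the work on a separate thread and let the caller enforce the deadline itself, since CPU-bound code never notices a polite interrupt. A rough Python analogy (not the project's code; `run_with_deadline` and `naive_fib` are made-up names):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def naive_fib(n: int) -> int:
    # CPU-bound recursion that never checks for interruption, analogous to
    # the naive Java fibonacci the generated tests were calling.
    return n if n < 2 else naive_fib(n - 1) + naive_fib(n - 2)

def run_with_deadline(fn, arg, seconds: float):
    # Run the work on a separate thread; when the deadline passes the caller
    # observes a timeout even though the worker itself keeps spinning,
    # mirroring how SEPARATE_THREAD fails the test with TimeoutException.
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, arg).result(timeout=seconds)
    except FutureTimeout:
        return None  # deadline exceeded; the runaway thread is abandoned
    finally:
        pool.shutdown(wait=False)

print(run_with_deadline(naive_fib, 10, 5.0))  # fast case completes: 55
```

A call like `run_with_deadline(naive_fib, 30, 0.05)` returns None almost immediately, even though the worker thread keeps computing in the background.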
Comment on lines +615 to +648
    result_lines: list[str] = []
    import_added = False
    for line in lines:
        result_lines.append(line)
        # Insert after the last JUnit import line
        if not import_added and line.strip().startswith("import org.junit.jupiter.api."):
            # Peek ahead: if the next non-empty line is NOT another import, insert here
            result_lines.append(timeout_import)
            import_added = True
    if not import_added:
        # Fallback: insert before the first import
        result_lines2: list[str] = []
        for line in result_lines:
            if not import_added and line.strip().startswith("import "):
                result_lines2.append(timeout_import)
                import_added = True
            result_lines2.append(line)
        result_lines = result_lines2
    source = "\n".join(result_lines)
    # Deduplicate: the import may appear twice if multiple junit imports existed
    source = source.replace(f"{timeout_import}\n{timeout_import}", timeout_import)

    # Add @Timeout after each @Test annotation (only if not already present)
    lines = source.split("\n")
    result_lines = []
    for i, line in enumerate(lines):
        result_lines.append(line)
        stripped = line.strip()
        if _is_test_annotation(stripped):
            # Check if the next non-blank line is already @Timeout
            next_idx = i + 1
            while next_idx < len(lines) and not lines[next_idx].strip():
                next_idx += 1
            if next_idx >= len(lines) or not lines[next_idx].strip().startswith("@Timeout"):
⚡️Codeflash found 20% (0.20x) speedup for _add_per_test_timeout in codeflash/languages/java/instrumentation.py

⏱️ Runtime: 3.86 milliseconds → 3.22 milliseconds (best of 250 runs)

📝 Explanation and details

Runtime improvement (primary): the optimized version reduces average wall time from 3.86 ms to 3.22 ms — a 19% speedup — by cutting redundant string work and reducing list/appends during the import/annotation pass.

What changed (concrete optimizations)

  • Precompute stripped lines once: computed stripped_lines = [line.strip() for line in lines] and then reuse it for all checks. The original code called line.strip() repeatedly inside the main loop and inside the blank-line skipping loop; the optimized code does that work once.
  • Simplified import insertion: instead of building result_lines and doing a second pass to insert the import, the optimized code finds the insertion index with a single enumerate scan and uses list.insert. This removes multiple temporary lists, extra append calls and condition checks from the common path.
  • Use index-based iteration and reuse length: iterate by index over lines and reuse stripped_lines[i] (and n = len(lines)), which avoids repeated attribute lookups and redundant strip operations while scanning for the next non-blank line.
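The first two techniques above can be sketched in isolation (standalone toy version; `insert_timeout_import` is an invented name, not the module's API):

```python
# Strip each line exactly once up front, then locate the import insertion
# point with a single scan and use list.insert instead of rebuilding the
# list with repeated appends.
def insert_timeout_import(lines: list[str], timeout_import: str) -> list[str]:
    stripped = [ln.strip() for ln in lines]  # one strip per line, reused below
    insert_at = None
    for i, s in enumerate(stripped):
        if s.startswith("import org.junit.jupiter.api."):
            insert_at = i + 1  # slot just after the first JUnit import
            break
    if insert_at is None:
        # Fallback: before the first import of any kind
        for i, s in enumerate(stripped):
            if s.startswith("import "):
                insert_at = i
                break
    if insert_at is not None and timeout_import not in lines:
        lines.insert(insert_at, timeout_import)
    return lines
```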

Why that yields a speedup

  • .strip() and string operations dominate cost when processing many lines. By doing strip once per line instead of many times, we convert many small repeated CPU-bound string operations into one predictable pass (the profiler shows that repeated stripping was a major hotspot).
  • Reducing intermediate list building and append churn (result_lines/result_lines2) lowers Python-level overhead (function calls, list resizing, and per-append work), which is significant when the source has many lines or many @Test annotations.
  • The while-loop that skips blank lines now checks precomputed stripped strings, so skipping is cheaper (no repeated .strip() calls per iteration).

Behavioral/Dependency changes

  • Behavior is preserved: the insertion logic and idempotency checks are the same. The only trade-off is an upfront allocation for stripped_lines (one list of len(lines)), which is negligible for typical file sizes and pays back when the source has many lines.

Trade-offs and when to expect regressions

  • Small files with only a few lines may see tiny regressions in several microbenchmarks because of the small upfront cost of building stripped_lines and the index-finding work; the annotated tests show a few cases that are slower by a few microseconds. This is an acceptable trade-off for the overall runtime improvement because the optimization shines when there are many lines (the large-scale test with 1000 tests shows ~25–28% faster runs).
  • Memory: slight increase (one extra list of stripped strings). This is linear in number of lines and typically small compared to the source text itself.

Who benefits most

  • Hot paths that process large Java sources or files with many @Test annotations (e.g., batch instrumentation of many tests) will benefit the most — shown by the large-scale test where runtime drops from ~1.38 ms to ~1.10 ms on the first pass and stays efficient on the idempotent second pass.
  • If this function is used frequently across many files (file-at-a-time transforms), the per-file reduction accumulates into meaningful CPU/time savings.

Summary

  • Primary win: 19% overall runtime improvement (3.86 ms -> 3.22 ms).
  • Key techniques: eliminate repeated strip() calls by precomputing stripped_lines, and reduce list/appending churn when inserting imports by finding an insertion index and using insert.
  • Result: faster processing on large inputs (best-case improvements visible in the annotated large-scale tests) with only a small upfront allocation and negligible change in behavior.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests: 🔘 None Found
🌀 Generated Regression Tests: 18 Passed
⏪ Replay Tests: 🔘 None Found
🔎 Concolic Coverage Tests: 🔘 None Found
📊 Tests Coverage: 100.0%
🌀 Generated Regression Tests:
import re

import pytest  # used for our unit tests
# Import the function under test from the real module where it is defined.
from codeflash.languages.java.instrumentation import (
    _PER_TEST_TIMEOUT_SECONDS, _add_per_test_timeout)

def test_inserts_import_and_timeout_annotation_after_test():
    # A minimal Java source with a JUnit Test import and a single @Test method.
    source = (
        "package com.example;\n"
        "import org.junit.jupiter.api.Test;\n"
        "\n"
        "public class ExampleTest {\n"
        "    @Test\n"
        "    public void doesSomething() {\n"
        "    }\n"
        "}\n"
    )

    # Run the function under test with default timeout.
    codeflash_output = _add_per_test_timeout(source); result = codeflash_output # 10.00μs -> 10.4μs (4.32% slower)

def test_does_not_duplicate_import_if_already_present():
    # If the source already contains the Timeout import, it should not be duplicated.
    source = (
        "import org.junit.jupiter.api.Timeout;\n"
        "import org.junit.jupiter.api.Test;\n"
        "\n"
        "class T {\n"
        "  @Test\n"
        "  void x() {}\n"
        "}\n"
    )
    codeflash_output = _add_per_test_timeout(source); result = codeflash_output # 6.82μs -> 7.34μs (7.11% slower)

def test_ignores_test_like_annotations_not_exactly_test():
    # Ensure @TestOnly, @TestFactory, @TestTemplate are NOT considered @Test and do not get @Timeout.
    source = (
        "import org.junit.jupiter.api.Test;\n"
        "class C {\n"
        "  @TestOnly\n"
        "  void a() {}\n"
        "\n"
        "  @TestFactory\n"
        "  void b() {}\n"
        "\n"
        "  @TestTemplate\n"
        "  void c() {}\n"
        "\n"
        "  @Test\n"
        "  void realTest() {}\n"
        "}\n"
    )
    codeflash_output = _add_per_test_timeout(source); result = codeflash_output # 12.5μs -> 12.3μs (0.891% faster)

def test_various_forms_of_test_annotation_recognized():
    # Test the different possible syntactic forms of the @Test annotation:
    # exactly '@Test', '@Test(', and '@Test ' (with a space), and variations with parameters.
    source = (
        "import org.junit.jupiter.api.Test;\n"
        "class V {\n"
        "  @Test\n"
        "  void a() {}\n"
        "\n"
        "  @Test()\n"
        "  void b() {}\n"
        "\n"
        "  @Test ( )\n"
        "  void c() {}\n"
        "\n"
        "  @Test(timeout = 5000)\n"
        "  void d() {}\n"
        "}\n"
    )
    codeflash_output = _add_per_test_timeout(source); result = codeflash_output # 13.6μs -> 13.1μs (4.07% faster)

def test_existing_timeout_no_additional_annotation_added():
    # If a @Timeout annotation already follows the @Test (possibly with blank lines in between),
    # the function must not insert another @Timeout.
    source = (
        "import org.junit.jupiter.api.Test;\n"
        "import org.junit.jupiter.api.Timeout;\n"
        "class E {\n"
        "  @Test\n"
        "\n"
        "  @Timeout(value = 5, threadMode = Timeout.ThreadMode.SEPARATE_THREAD)\n"
        "  void alreadyTimedOut() {}\n"
        "}\n"
    )
    codeflash_output = _add_per_test_timeout(source); result = codeflash_output # 6.52μs -> 6.99μs (6.72% slower)

def test_when_no_imports_present_still_adds_annotations_but_no_import():
    # If the file contains no import lines at all, the function will not add the import,
    # but it will still add @Timeout annotations after @Test.
    source = (
        "package p;\n"
        "class N {\n"
        "  @Test\n"
        "  void t() {}\n"
        "}\n"
    )
    codeflash_output = _add_per_test_timeout(source); result = codeflash_output # 9.60μs -> 9.47μs (1.37% faster)

def test_deduplicates_import_added_after_multiple_junit_imports():
    # If multiple junit.jupiter imports exist in a row, the algorithm may try to insert
    # the timeout import multiple times; it should deduplicate adjacent duplicates.
    source = (
        "import org.junit.jupiter.api.Assertions;\n"
        "import org.junit.jupiter.api.Test;\n"
        "import org.junit.jupiter.api.Test;\n"
        "class D {\n"
        "  @Test\n"
        "  void t1() {}\n"
        "}\n"
    )
    codeflash_output = _add_per_test_timeout(source); result = codeflash_output # 8.88μs -> 9.33μs (4.82% slower)

def test_large_scale_many_tests_and_idempotency():
    # Build a large Java source with a single junit Test import and many @Test methods.
    num_tests = 1000  # Use up to 1000 as requested
    header = "package big;\nimport org.junit.jupiter.api.Test;\n\npublic class BigTest {\n"
    methods = []
    for i in range(num_tests):
        # Each method has an @Test annotation and a trivial method body.
        methods.append(f"    @Test\n    public void test{i}() {{}}\n")
    footer = "}\n"
    big_source = header + "\n".join(methods) + footer

    # Apply the transformation once.
    codeflash_output = _add_per_test_timeout(big_source); result = codeflash_output # 1.38ms -> 1.10ms (25.5% faster)

    # Now apply the transformation again to the already-transformed output.
    # The second application should be idempotent: no new @Timeout lines should be added.
    codeflash_output = _add_per_test_timeout(result); result2 = codeflash_output # 1.21ms -> 1.09ms (10.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import re  # used for pattern checks in assertions

import pytest  # used for our unit tests
# import the function and constant from the real module under test
from codeflash.languages.java.instrumentation import (
    _PER_TEST_TIMEOUT_SECONDS, _add_per_test_timeout)

# Helper: construct the canonical import and annotation strings used by the implementation.
_TIMEOUT_IMPORT = "import org.junit.jupiter.api.Timeout;"
_TIMEOUT_ANNOTATION = f"@Timeout(value = {_PER_TEST_TIMEOUT_SECONDS}, threadMode = Timeout.ThreadMode.SEPARATE_THREAD)"

def test_adds_import_and_timeout_after_junit_import_and_test_annotation():
    # Simple source with a junit Test import and a single @Test annotation.
    src = "\n".join(
        [
            "package example;",
            "import org.junit.jupiter.api.Test;",
            "public class Example {",
            "    @Test",
            "    public void testOne() {}",
            "}",
        ]
    )
    # Run the function under test.
    codeflash_output = _add_per_test_timeout(src); out = codeflash_output # 8.73μs -> 9.60μs (9.09% slower)

def test_does_not_duplicate_import_if_already_present_and_adds_annotations():
    # Source already contains the timeout import and a @Test. We should not add a second import.
    src = "\n".join(
        [
            "package example;",
            _TIMEOUT_IMPORT,
            "import org.junit.jupiter.api.Test;",
            "public class Example2 {",
            "@Test",
            "void method() {}",
            "}",
        ]
    )

    codeflash_output = _add_per_test_timeout(src); out = codeflash_output # 6.11μs -> 6.82μs (10.4% slower)

def test_ignores_non_test_annotations_like_testonly_and_testfactory():
    # Ensure annotations starting with @Test but not being @Test proper are ignored.
    src = "\n".join(
        [
            "import org.junit.jupiter.api.Test;",
            "public class Example3 {",
            "    @TestOnly",
            "    public void helper() {}",
            "    @TestFactory",
            "    public void factory() {}",
            "    @Test",
            "    public void realTest() {}",
            "}",
        ]
    )

    codeflash_output = _add_per_test_timeout(src); out = codeflash_output # 9.85μs -> 10.2μs (3.71% slower)

def test_recognizes_parameterized_test_declarations_and_preserves_spacing():
    # Tests with parentheses after @Test should be recognized.
    src = "\n".join(
        [
            "import org.junit.jupiter.api.Test;",
            "public class Example4 {",
            "    @Test(expected = RuntimeException.class)",
            "    public void throwsException() {}",
            "    @Test(timeout = 5000)",
            "    public void hasTimeoutAttr() {}",
            "}",
        ]
    )

    codeflash_output = _add_per_test_timeout(src); out = codeflash_output # 9.66μs -> 9.96μs (3.01% slower)

def test_does_not_add_timeout_if_already_present_after_test():
    # If a @Timeout is already present after @Test (possibly separated by blank lines),
    # the function should not insert another @Timeout.
    src = "\n".join(
        [
            "import org.junit.jupiter.api.Test;",
            "import org.junit.jupiter.api.Timeout;",
            "public class Example5 {",
            "    @Test",
            "    @Timeout(value = 10, threadMode = Timeout.ThreadMode.SEPARATE_THREAD)",
            "    public void alreadyTimeout() {}",
            "}",
        ]
    )

    codeflash_output = _add_per_test_timeout(src); out = codeflash_output # 5.69μs -> 6.17μs (7.81% slower)

def test_inserts_import_before_first_import_when_no_junit_imports_present():
    # When there are imports, but none from org.junit.jupiter.api., the timeout import should be inserted before the first import.
    src = "\n".join(
        [
            "package example;",
            "import java.util.List;",
            "import com.example.Foo;",
            "public class Example6 {",
            "    @Test",
            "    public void t() {}",
            "}",
        ]
    )

    codeflash_output = _add_per_test_timeout(src); out = codeflash_output # 9.96μs -> 10.1μs (1.67% slower)

    # The timeout import should be inserted before the first import line.
    first_import_index = out.index("import java.util.List;")
    inserted_index = out.index(_TIMEOUT_IMPORT)

def test_no_import_added_when_no_import_lines_exist_but_annotation_is_added():
    # If the source contains no import lines at all, the function will not add the timeout import,
    # but it should still add the per-test timeout annotation after @Test.
    src = "\n".join(
        [
            "package example;",
            "public class Example7 {",
            "    @Test",
            "    void lonelyTest() {}",
            "}",
        ]
    )

    codeflash_output = _add_per_test_timeout(src); out = codeflash_output # 8.49μs -> 8.86μs (4.18% slower)

def test_preserves_indentation_and_inserts_before_blank_lines():
    # If there are blank lines between @Test and the method, the function should still insert @Timeout
    # immediately after @Test and before the blank lines, preserving indentation.
    src = "\n".join(
        [
            "import org.junit.jupiter.api.Test;",
            "public class Example8 {",
            "        @Test",  # 8 spaces indentation
            "",
            "        public void spacedTest() {}",
            "}",
        ]
    )

    codeflash_output = _add_per_test_timeout(src); out = codeflash_output # 8.44μs -> 8.92μs (5.39% slower)

def test_large_scale_insertion_for_many_tests():
    # Large-scale test: create a source with a single junit import and many @Test annotations to verify performance
    # and correctness at scale. Use 1000 annotations as required by the spec.
    n = 1000
    lines = ["package big;"]
    # include one junit import to anchor insertion
    lines.append("import org.junit.jupiter.api.Test;")
    lines.append("public class BigTests {")
    # add many test methods each with an @Test annotation
    for i in range(n):
        lines.append("    @Test")
        lines.append(f"    public void test{i}() {{}}")
    lines.append("}")
    src = "\n".join(lines)

    codeflash_output = _add_per_test_timeout(src); out = codeflash_output # 1.14ms -> 888μs (27.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally, run: git merge codeflash/optimize-pr1680-2026-02-27T00.19.12

Suggested change:
Original:

    result_lines: list[str] = []
    import_added = False
    for line in lines:
        result_lines.append(line)
        # Insert after the last JUnit import line
        if not import_added and line.strip().startswith("import org.junit.jupiter.api."):
            # Peek ahead: if the next non-empty line is NOT another import, insert here
            result_lines.append(timeout_import)
            import_added = True
    if not import_added:
        # Fallback: insert before the first import
        result_lines2: list[str] = []
        for line in result_lines:
            if not import_added and line.strip().startswith("import "):
                result_lines2.append(timeout_import)
                import_added = True
            result_lines2.append(line)
        result_lines = result_lines2
    source = "\n".join(result_lines)
    # Deduplicate: the import may appear twice if multiple junit imports existed
    source = source.replace(f"{timeout_import}\n{timeout_import}", timeout_import)
    # Add @Timeout after each @Test annotation (only if not already present)
    lines = source.split("\n")
    result_lines = []
    for i, line in enumerate(lines):
        result_lines.append(line)
        stripped = line.strip()
        if _is_test_annotation(stripped):
            # Check if the next non-blank line is already @Timeout
            next_idx = i + 1
            while next_idx < len(lines) and not lines[next_idx].strip():
                next_idx += 1
            if next_idx >= len(lines) or not lines[next_idx].strip().startswith("@Timeout"):

Suggested:

    # Insert after the last JUnit import line
    # (original logic inserts after the first junit import encountered;
    # preserve that behavior: insert after first "import org.junit.jupiter.api." occurrence)
    idx_first_junit = None
    for idx, line in enumerate(lines):
        if line.strip().startswith("import org.junit.jupiter.api."):
            idx_first_junit = idx
            break
    if idx_first_junit is not None:
        lines.insert(idx_first_junit + 1, timeout_import)
    else:
        # Fallback: insert before the first import
        idx_first_import = None
        for idx, line in enumerate(lines):
            if line.strip().startswith("import "):
                idx_first_import = idx
                break
        if idx_first_import is not None:
            lines.insert(idx_first_import, timeout_import)
        # else: no imports present, do not add
    source = "\n".join(lines)
    # Deduplicate: the import may appear twice if multiple junit imports existed
    source = source.replace(f"{timeout_import}\n{timeout_import}", timeout_import)
    # Add @Timeout after each @Test annotation (only if not already present)
    lines = source.split("\n")
    stripped_lines = [line.strip() for line in lines]
    result_lines = []
    n = len(lines)
    for i in range(n):
        line = lines[i]
        result_lines.append(line)
        stripped = stripped_lines[i]
        if _is_test_annotation(stripped):
            # Check if the next non-blank line is already @Timeout
            next_idx = i + 1
            while next_idx < n and not stripped_lines[next_idx]:
                next_idx += 1
            if next_idx >= n or not stripped_lines[next_idx].startswith("@Timeout"):


Behavior-mode instrumentation captured function return values as Object,
causing compilation errors when substituted back into generic contexts
(e.g. List<Long>.add(func())). Fixed by extracting the target function's
return type via tree-sitter and casting the Object variable back to the
correct type when _infer_array_cast_type cannot determine it from context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
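In miniature, the cast repair can be thought of as a string rewrite driven by the extracted return type. A hypothetical sketch (helper name and shapes invented for illustration, not the actual instrumentation code):

```python
from typing import Optional

def cast_captured_value(var_name: str, return_type: Optional[str]) -> str:
    # Behavior-mode instrumentation captures the return value into an
    # Object-typed variable; when the declared return type is known
    # (e.g. recovered via tree-sitter), cast the variable back so the
    # substituted expression still compiles in generic contexts.
    if return_type is None or return_type == "void":
        return var_name  # no usable type information; leave the Object as-is
    return f"(({return_type}) {var_name})"
```

So a captured `ret0` substituted into `List<Long>.add(...)` would be emitted as `((List<Long>) ret0)` instead of a bare `Object`, avoiding the compilation error.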
@HeshamHM28 force-pushed the feat/java/wire-language-version branch from 2f9d026 to 15e0d16 on February 27, 2026 at 02:15
2 participants