fix: resolve Maven test execution blockers for open-source Java repos by mashraf-222 · Pull Request #1663 · codeflash-ai/codeflash

mashraf-222 · 2026-02-25T16:10:06Z

Problems fixed

Three independent bugs in the Java test execution pipeline blocked all optimization attempts across Commons Lang, Guava, and RoaringBitmap (14 total attempts, 0 successes):

Maven validation plugins reject generated files — Apache Rat, Checkstyle, SpotBugs, PMD, Enforcer, and japicmp reject codeflash-generated *__perfinstrumented.java files for missing license headers, naming violations, etc. This blocked all 4 Commons Lang functions.
Surefire fails in multi-module builds — In projects like Guava, -Dtest=X applies globally across all modules built with -am. When a dependency module has no tests matching the filter, surefire fails with "No tests matching pattern". This blocked all 3 Guava functions.
Missing imports in AI-generated tests — AI-generated test code uses standard library classes (e.g., Arrays.fill()) without the import statement, causing "cannot find symbol" compilation errors. This blocked RoaringBitmap functions after the JPMS fix.

Root causes

_run_maven_tests() and _compile_tests() invoke Maven without skipping validation/analysis plugins. These plugins are irrelevant for generated test code but fail hard on it.
-DfailIfNoTests=false only covers modules with zero test sources. When -Dtest=X filter matches zero tests in a module that has test sources, the separate -Dsurefire.failIfNoSpecifiedTests=false property is needed.
The AI test generation produces syntactically correct Java that references common stdlib classes, but the instrumentation pipeline doesn't ensure the imports are present.

Solutions implemented

Added _MAVEN_VALIDATION_SKIP_FLAGS constant with skip flags for Rat, Checkstyle, SpotBugs, PMD, Enforcer, and japicmp. Applied to both _run_maven_tests() and _compile_tests().
Added -Dsurefire.failIfNoSpecifiedTests=false alongside the existing -DfailIfNoTests=false in the multi-module command construction.
Added ensure_common_java_imports() that detects usage of 12 common Java stdlib classes (Arrays, List, HashMap, BigDecimal, etc.) and auto-adds missing imports using the existing _add_import() helper. Called during test instrumentation.

Code changes

codeflash/languages/java/test_runner.py:
- Added _MAVEN_VALIDATION_SKIP_FLAGS module constant (6 skip flags)
- Extended cmd in _run_maven_tests() and _compile_tests() with skip flags
- Added -Dsurefire.failIfNoSpecifiedTests=false to multi-module -pl command
codeflash/languages/java/instrumentation.py:
- Added _COMMON_JAVA_IMPORTS dict mapping 12 class names to import statements
- Added ensure_common_java_imports() function using existing _add_import() helper
- Called in instrument_generated_java_test() after assertion transformation

Testing

E2E validations run (6 total):

Repo	Function	Previous error	Result
Default Java project	`Fibonacci.fibonacci`	N/A (baseline)	Pipeline completed fully
Aerospike	`Utf8.encodedLength`	N/A (baseline)	5 candidates evaluated, pipeline clean
Commons Lang	`StringUtils.containsAny`	Apache Rat rejection	Rat bypassed successfully; downstream assertion transform issue (separate bug)
Guava	`Strings.repeat`	"No tests matching pattern"	Surefire error resolved; build timeout on large project (separate issue)
RoaringBitmap	`Util.unsignedBinarySearch`	"cannot find symbol: Arrays"	Import added correctly; downstream type mismatch (separate bug)
QuestDB	`Numbers.ceilPow2`	Token limit exceeded	Confirmed still token-limited (known architectural limitation, out of scope)

Impact

Unblocks Java optimization on projects that use Maven validation plugins (most Apache Foundation projects) and multi-module Maven builds (Guava, Spring, etc.). The import fix reduces compilation failures for AI-generated tests across all Java projects.

Remaining downstream issues (assertion transform type mismatches, build timeouts for very large projects) are separate bugs for future PRs.

Maven plugins like Apache Rat, Checkstyle, SpotBugs, PMD, Enforcer, and japicmp reject generated instrumented Java files (e.g. missing license headers). Skip these validation plugins during test compilation and execution since they are irrelevant for generated test code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

In multi-module projects like Guava, -Dtest=X filter matches zero tests in dependency modules built with -am, causing "No tests matching pattern" failures. Adding -Dsurefire.failIfNoSpecifiedTests=false allows modules with no matching tests to pass while still running the correct tests in the target module. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ests AI-generated test code sometimes uses standard library classes like Arrays, List, HashMap etc. without the corresponding import statement, causing compilation failures. Add ensure_common_java_imports() that detects usage of common classes and adds missing imports automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 · 2026-02-25T16:15:19Z

E2E Validation Report

Full optimization runs on the 4 open-source Java repos that were previously blocked, plus 2 baseline validations.

Baseline validations

1. Default Java project — `Fibonacci.fibonacci` (`--no-pr`)

cd /home/ubuntu/code/codeflash/code_to_optimize/java/
uv run codeflash --file src/main/java/com/example/Fibonacci.java --function fibonacci --verbose --no-pr

Result: Pipeline completed fully — test generation, instrumentation, compilation, behavior capture, benchmarking all succeeded. No optimization found (candidate changed semantics, failed correctness).

Key logs — validation skip flags applied

Running Maven command: /usr/bin/mvn verify -fae -B
  -Drat.skip=true -Dcheckstyle.skip=true -Dspotbugs.skip=true
  -Dpmd.skip=true -Denforcer.skip=true -Djapicmp.skip=true
  -DargLine=--add-opens java.base/java.util=ALL-UNNAMED ...
  -Dtest=com.example.FibonacciTest__perfinstrumented,...

Maven verify completed with return code 0
JaCoCo exec file exists: .../target/jacoco.exec
JaCoCo XML report exists: .../target/site/jacoco/jacoco.xml

Full tail of log

Candidate: 13
Java comparison: DIFFERENT (3 invocations, 2 diffs)
Repair counter reached 3, skipping repair
Test results did not match the test results of the original code ❌
────────────────────────────────────────────────────────────────
everything done, exiting
No best optimizations found for function Fibonacci.fibonacci
❌ No optimizations found.

2. Aerospike — `Utf8.encodedLength` (with PR creation)

cd /home/ubuntu/code/aerospike-client-java/
uv run codeflash --file client/src/com/aerospike/client/util/Utf8.java --function encodedLength --verbose

Result: Full pipeline completed — 5 candidates evaluated, all compiled and tested. No optimization found (best candidate: -1.4% / 0.986X).

Key logs — multi-module with surefire fix

Running Maven command: /usr/bin/mvn verify -fae -B
  -Drat.skip=true -Dcheckstyle.skip=true -Dspotbugs.skip=true
  -Dpmd.skip=true -Denforcer.skip=true -Djapicmp.skip=true
  -DargLine=--add-opens java.base/java.util=ALL-UNNAMED ...
  -pl test -am -DfailIfNoTests=false
  -Dsurefire.failIfNoSpecifiedTests=false -DskipTests=false
  -Dtest=com.aerospike.client.util.Utf8Test__perfinstrumented,...

Maven verify completed with return code 0
JaCoCo exec file exists
JaCoCo XML report exists

Full tail of log — all 5 candidates benchmarked

Total optimized code 5 runtime (ns): 408599

Candidate #5 - Runtime Information ⌛
├── Summed runtime: 409 microseconds (measured over 190 loops)
├── Speedup percentage: -1.4%
└── Speedup ratio: 0.986X

everything done, exiting
No best optimizations found for function Utf8.encodedLength
❌ No optimizations found.

Bug-specific validations on previously-blocked repos

3. Commons Lang — `StringUtils.containsAny` (Phase 1: Rat skip fix)

cd /home/ubuntu/code/commons-lang
uv run codeflash --file src/main/java/org/apache/commons/lang3/StringUtils.java --function containsAny --verbose --no-pr

Previous error: Apache Rat plugin rejected generated *__perfinstrumented.java files — "Unexpected count for UNAPPROVED, limit is [0,0]". Blocked all 4 Commons Lang function attempts.

After fix: Rat skip flag applied. No Rat errors in the entire log. The previous blocker is fully resolved.

Key logs — Rat and Enforcer plugins skipped

Running Maven command: /usr/bin/mvn verify -fae -B
  -Drat.skip=true -Dcheckstyle.skip=true -Dspotbugs.skip=true
  -Dpmd.skip=true -Denforcer.skip=true -Djapicmp.skip=true
  -DargLine=--add-opens java.base/java.util=ALL-UNNAMED ...
  -Dmaven.test.failure.ignore=true
  -Dtest=org.apache.commons.lang3.StringUtilsTest__perfinstrumented,...

[INFO] Skipping Rule Enforcement.
[INFO] Skipping Rule Enforcement.
[INFO] Apache Rat (check) skipped due to system property 'rat.skip'.
[INFO] Skipping remote resources execution.

Previous run (without fix) would have failed here with:

[ERROR] Too many unapproved licenses: 2
  ...StringUtilsTest__perfinstrumented.java !?????
  ...StringUtilsTest__perfinstrumented_2.java !?????

Downstream issue — assertion transformer compilation errors (separate bug)

[ERROR] COMPILATION ERROR :
StringUtilsContainsTest__existing_perfinstrumented.java:[190,9] no suitable method found...
StringUtilsContainsTest__existing_perfinstrumented.java:[235,9] no suitable method found...
StringUtilsContainsTest__existing_perfinstrumented.java:[280,9] no suitable method found...
... (68 errors total across multiple lines)

(default-testCompile) on project commons-lang3: Compilation failure

This is a separate bug — the assertion transformer doesn't handle overloaded containsAny(CharSequence, char...) signatures. Not related to this PR.

Full tail of log

[BENCHMARK-DONE] Got 0 benchmark results

Overall test results for original code
├── ⚙️ Existing Unit Tests - Passed: 23, Failed: 0
├── 🎨 Inspired Regression Tests - Passed: 0, Failed: 0
├── 🌀 Generated Regression Tests - Passed: 0, Failed: 0
├── ⏪ Replay Tests - Passed: 0, Failed: 0
└── 🔎 Concolic Coverage Tests - Passed: 0, Failed: 0

The overall summed benchmark runtime of the original function is 0, couldn't run tests.
Failed to run the tests for the original function, skipping optimization
Failed to establish a baseline for the original code.
❌ No optimizations found.

4. Guava — `Strings.repeat` (Phase 2: surefire fix)

cd /home/ubuntu/code/guava
uv run codeflash --file guava/src/com/google/common/base/Strings.java --function repeat --verbose --no-pr

Previous error: "No tests matching pattern 'StringsTest__perfinstrumented' were executed! (Set -Dsurefire.failIfNoSpecifiedTests=false to ignore this error.)". Blocked all 3 Guava function attempts.

After fix: -Dsurefire.failIfNoSpecifiedTests=false applied. No "No tests matching pattern" errors. The previous blocker is fully resolved.

Key logs — surefire property applied in multi-module build

Running Maven command: ./mvnw verify -fae -B
  -Drat.skip=true -Dcheckstyle.skip=true -Dspotbugs.skip=true
  -Dpmd.skip=true -Denforcer.skip=true -Djapicmp.skip=true
  -DargLine=--add-opens java.base/java.util=ALL-UNNAMED ...
  -Dmaven.test.failure.ignore=true
  -pl guava-tests -am -DfailIfNoTests=false
  -Dsurefire.failIfNoSpecifiedTests=false -DskipTests=false
  -Dtest=com.google.common.base.StringsTest__perfinstrumented,...

Previous run (without fix) would have failed here with:

[ERROR] No tests matching pattern "StringsTest__perfinstrumented"
  were executed! (Set -Dsurefire.failIfNoSpecifiedTests=false to
  ignore this error.)

Downstream issue — build timeout (separate issue)

Maven verify completed with return code: -2
Maven verify had non-zero return code: -2. Coverage data may be incomplete.

Test log - STDOUT : Test execution timed out after 120 seconds

JaCoCo exec file not found - JaCoCo agent may not have run
JaCoCo XML report not found - verify phase may not have completed

Nonzero return code -2 when running tests in
  .../StringsTest__perfinstrumented.java,
  .../StringsTest__perfinstrumented_2.java.

Skipping test case. (x9 times)

This is a timeout issue — Guava's multi-module Maven verify phase exceeds the 120s default timeout. Not related to this PR.

Full tail of log

Test log - STDOUT : Test execution timed out after 120 seconds

Couldn't run any tests for original function repeat. Skipping optimization.
Failed to establish a baseline for the original code - behavioral tests failed.
❌ No optimizations found.

5. RoaringBitmap — `Util.unsignedBinarySearch` (Phase 3: auto-import fix)

cd /home/ubuntu/code/RoaringBitmap
uv run codeflash --file roaringbitmap/src/main/java/org/roaringbitmap/Util.java --function unsignedBinarySearch --verbose --no-pr

Previous error: "cannot find symbol: variable Arrays" in UtilTest__perfinstrumented_2.java. AI-generated test used Arrays.fill() without import java.util.Arrays;.

After fix: import java.util.Arrays; auto-added by ensure_common_java_imports(). No "cannot find symbol" errors for any stdlib class. The previous blocker is fully resolved.

Key logs — import auto-added, original source already has Arrays import

# Original source file (Util.java) already imports Arrays:
import static java.lang.Long.numberOfTrailingZeros;
import java.util.Arrays;

# Generated test files have imports added:
import org.junit.jupiter.api.*;
import static org.junit.jupiter.api.Assertions.*;

# Maven command with skip flags:
Running Maven command: /usr/bin/mvn verify -fae -B
  -Drat.skip=true -Dcheckstyle.skip=true -Dspotbugs.skip=true
  -Dpmd.skip=true -Denforcer.skip=true -Djapicmp.skip=true ...

Previous run (without fix) would have failed here with:

[ERROR] cannot find symbol
  symbol:   variable Arrays
  location: class UtilTest__perfinstrumented_2

Downstream issue — assertion transformer type mismatch (separate bug)

[ERROR] COMPILATION ERROR :
UtilTest__perfinstrumented.java:[234,18] incompatible types:
  java.lang.Object cannot be converted to int
UtilTest__perfinstrumented.java:[640,18] incompatible types:
  java.lang.Object cannot be converted to int
UtilTest__perfinstrumented.java:[867,18] incompatible types:
  java.lang.Object cannot be converted to int
UtilTest__perfinstrumented.java:[1099,18] incompatible types:
  java.lang.Object cannot be converted to int
UtilTest__perfinstrumented.java:[1145,18] incompatible types:
UtilTest__perfinstrumented.java:[1255,18] incompatible types:
UtilTest__perfinstrumented.java:[1488,18] incompatible types:
UtilTest__perfinstrumented.java:[1534,18] incompatible types:
UtilTest__perfinstrumented.java:[1580,18] incompatible types:
UtilTest__perfinstrumented.java:[1626,18] incompatible types:
UtilTest__perfinstrumented.java:[1799,18] incompatible types:
UtilTest__perfinstrumented.java:[1909,18] incompatible types:
UtilTest__perfinstrumented.java:[1957,18] incompatible types:
UtilTest__perfinstrumented.java:[2005,18] incompatible types:

[INFO] BUILD FAILURE

This is a separate bug — the assertion transformer returns Object where primitive int is needed. Not related to this PR.

Full tail of log

UtilTest__perfinstrumented.java,
UtilTest__perfinstrumented_2.java failed to run, skipping

Test log - STDOUT :  STDERR :
JaCoCo XML file not found at path: .../target/site/jacoco/jacoco.xml

Test Coverage Results
├── Main Function: Util.unsignedBinarySearch: 0.00%
└── Total Coverage: 0.00%

Couldn't run any tests for original function unsignedBinarySearch. Skipping optimization.
Failed to establish a baseline for the original code - behavioral tests failed.
❌ No optimizations found.

6. QuestDB — `Numbers.ceilPow2` (known limitation, out of scope)

cd /home/ubuntu/code/questdb
uv run codeflash --file core/src/main/java/io/questdb/std/Numbers.java --function ceilPow2 --verbose --no-pr

Result: Token limit exceeded — confirmed. Known architectural limitation (Numbers.java is 3,308 lines).

Full log output

Optimizing function 1 of 1: Numbers.ceilPow2 (in Numbers.java)
Function Trace ID: 6438f7ce-3176-45a9-af9d-ac84d8bef918

Imported type skeleton token budget exceeded, stopping
Read-writable code has exceeded token limit, cannot proceed
❌ No optimizations found.

Summary table

Repo	Function	Previous blocker	Fix applied	Blocker resolved?	New downstream issue
Default Java	`Fibonacci.fibonacci`	N/A	All 3 fixes	N/A	None — full pipeline pass
Aerospike	`Utf8.encodedLength`	N/A	All 3 fixes	N/A	None — 5 candidates evaluated
Commons Lang	`StringUtils.containsAny`	Apache Rat rejection	Phase 1: skip flags	Yes	Assertion transform type errors (separate bug)
Guava	`Strings.repeat`	Surefire "no tests matching"	Phase 2: surefire property	Yes	Build timeout on large project (separate issue)
RoaringBitmap	`Util.unsignedBinarySearch`	Missing `Arrays` import	Phase 3: auto-imports	Yes	Assertion transform `Object→int` cast (separate bug)
QuestDB	`Numbers.ceilPow2`	Token limit exceeded	N/A (out of scope)	N/A	Token limit (known architectural limitation)

All 3 targeted blockers are confirmed resolved. Downstream issues are independent bugs for follow-up work.

HeshamHM28 · 2026-02-25T19:56:36Z

codeflash/languages/java/instrumentation.py

+    "Random": "import java.util.Random;",
+    "BigDecimal": "import java.math.BigDecimal;",
+    "BigInteger": "import java.math.BigInteger;",
+}


What if AI generates code that requires different libraries to be imported, such as Optional, Stack, etc.? We need to fix it at the root.

Good call — you're right that hardcoding a fixed set of imports in the CLI is a band-aid. I've addressed this at the root:

Root fix in AI service: codeflash-internal#2443 adds ensure_java_stdlib_imports() — a comprehensive Java stdlib import postprocessing step (analogous to Python's add_missing_imports) that runs before tree-sitter validation in the testgen pipeline. It covers 90+ stdlib classes across java.util, java.math, java.io, java.nio, java.time, and java.util.concurrent (including Optional, Stack, and all the others you mentioned).

CLI band-aid removed: The latest commit on this branch removes _COMMON_JAVA_IMPORTS, ensure_common_java_imports(), and the orphaned _add_import() entirely — since the AI service now handles imports correctly before the code ever reaches the CLI.

…g Object The assertion transformer always declared `Object _cf_resultN = call()` when replacing assertions, losing the actual return type. This caused compilation failures when the result was used in a context expecting a primitive type (e.g., int, boolean). Now infers the return type from assertion context: - assertEquals(int_literal, call()) -> int - assertTrue/assertFalse(call()) -> boolean - assertEquals("string", call()) -> String - Falls back to Object when type can't be determined Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Multi-module projects like Guava require more time for the Maven verify phase which runs compilation + instrumentation + test execution. The 120s minimum was causing timeouts for large projects. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 · 2026-02-25T20:19:39Z

Round 2 E2E Validation Results

Added 2 new fixes in commits 4549220c and 342a9c5b:

New Fixes

Fix 4: Assertion type inference (remove_asserts.py)

The assertion transformer now infers Java return types from assertion context instead of hardcoding Object
assertEquals(55, call()) → int _cf_result = call() (was Object)
assertTrue(call()) → boolean _cf_result = call() (was Object)
Falls back to Object when type can't be inferred (safe default)
Resolves: The Object→int cast compilation failures seen in RoaringBitmap round 1

Fix 5: Maven timeout increase (test_runner.py)

Increased coverage-enabled verify timeout from 120s to 300s
Multi-module projects (Guava) need more time for verify phase

E2E Results (8 optimizations across 4 repos)

Repo	Function	Result	Notes
commons-lang	`StringUtils.containsAny`	Compilation failure	Instrumentation syntax error in existing test
commons-lang	`StringUtils.countMatches`	Compilation failure	Same instrumentation bug
guava	`Strings.repeat`	Compilation failure	`@CheckReturnValue` violation in perf-only test
guava	`Ascii.toLowerCase`	Compilation failure	Instrumentation syntax error
roaringbitmap	`Util.unsignedBinarySearch`	Full pipeline success	No speedup found, but pipeline completes cleanly
roaringbitmap	`Util.cardinalityInBitmapRange`	Python crash	`ValueError` parsing parameterized test names with `int[]` types
questdb	`Numbers.ceilPow2`	Token limit	Known limitation (large file)
questdb	`Misc.free`	Compilation failure	Instrumentation syntax error in existing test

Key Progress

RoaringBitmap unsignedBinarySearch now completes the full pipeline — validates the assertion type inference fix works
Timeout fix deployed but wasn't the blocking issue for this round (compilation errors occur before timeout)

Remaining Bugs (not in this PR's scope)

Instrumentation syntax errors in complex existing test files — generates broken Java in several repos
@CheckReturnValue not handled in perf-only instrumentation — Guava-specific
Parameterized test name parsing crashes on array types (int[]) in test method signatures

Runtime improvement (primary): the optimized version cuts the measured wall-clock time from ~11.9 ms to ~5.23 ms (≈127% speedup). Most of the previous time was spent parsing the entire argument list for JUnit value assertions; the profiler shows _split_top_level_args accounted for the dominant portion of runtime. What changed (specific optimizations): - Introduced _extract_first_arg that scans args_str once and stops as soon as the first top-level comma is encountered instead of calling _split_top_level_args to produce the full list. - The new routine keeps parsing state inline (depth, in_string, escape handling) and builds only the first-argument string (one small list buffer) rather than accumulating all arguments into a list of substrings. - Early-trimming and early-return avoid unnecessary work when the first argument is empty or when there are no commas. Why this is faster (mechanics): - Less work: in common cases we only need the first top-level argument to infer the expected type. Splitting all top-level arguments does O(n) work and allocates O(m) substrings for the entire argument list; extracting only the first arg is usually much cheaper (O(k) where k is length up to first top-level comma). - Fewer allocations: avoids creating many intermediate strings and list entries, which reduces Python object overhead and GC pressure. - Better branch locality: the loop exits earlier in the typical case (simple literals), so average time per call drops significantly — this shows up strongly in the large-loop and many-arg tests. Behavioral impact and trade-offs: - Semantics are preserved for the intended use: the function only needs the first argument to infer the return type, so replacing a full-split with a single-arg extractor keeps correctness for all existing tests. - Microbenchmarks for very trivial cases (e.g., assertTrue/assertFalse) show tiny per-call regressions (a few tens of ns) in some test samples; this is a reasonable trade-off for the substantial end-to-end runtime improvement, especially since the optimized code targets the hot path (value-assertion type inference) where gains are largest. When this helps most: - Calls with long argument lists or many nested/comma-containing constructs (nested generics, long sequences of arguments) — see the huge improvements in tests like large number of args and nested generics. - Hot loops and repeated inference (many_inferences_loop_stress, repeated_inference) — fewer allocations and earlier exits compound into large throughput gains. In short: the optimization reduces unnecessary parsing and allocations by only extracting what is required (the first top-level argument), which directly reduced CPU time and memory churn and produced the measured ~2x runtime improvement while keeping behavior for the intended use-cases.

codeflash-ai · 2026-02-25T20:29:33Z

⚡️ Codeflash found optimizations for this PR

📄 127% (1.27x) speedup for `JavaAssertTransformer._infer_return_type` in `codeflash/languages/java/remove_asserts.py`

⏱️ Runtime : 11.9 milliseconds → 5.23 milliseconds (best of 230 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up method JavaAssertTransformer._infer_return_type by 127% in PR #1663 (fix/java-maven-test-execution-bugs) #1667

If you approve, it will be merged into this PR (branch fix/java-maven-test-execution-bugs).

Runtime improvement (primary): the optimized version runs ~11% faster overall (10.3ms -> 9.23ms). Line-profiles show the hot work (argument splitting and literal checks) is measurably reduced. What changed (concrete): - Added a fast-path in _split_top_level_args: if the args string contains none of the "special" delimiters (quotes, braces, parens), we skip the character-by-character parser and return either args_str.split(",") or [args_str]. - Moved several literal/cast regexes into __init__ as precompiled attributes (self._FLOAT_LITERAL_RE, self._DOUBLE_LITERAL_RE, self._LONG_LITERAL_RE, self._INT_LITERAL_RE, self._CHAR_LITERAL_RE, self._cast_re) and replaced re.match(...) for casts with self._cast_re.match(...). Why this speeds things up: - str.split is implemented in C and is orders of magnitude faster than a Python-level loop that iterates characters, manages stack depth, and joins fragments. The fast-path catches the common simple cases (no nested parentheses/quotes/generics) and lets the interpreter use the highly-optimized C split, which is why very large comma-separated inputs show the biggest wins (e.g., the 1000-arg test goes from ~1.39ms to ~67.5μs). - Precompiling regexes removes repeated compilation overhead and lets .match be executed directly on a compiled object. The original code used re.match(...) in-place for cast detection which implicitly compiles the pattern or goes through the module-level cache; using a stored compiled pattern is cheaper and eliminates that runtime cost. - Combined, these changes reduce the time spent inside _split_top_level_args and _type_from_literal (the line profilers show reduced wall time for those functions), producing the measured global runtime improvement. Behavioral/compatibility notes: - The fast-path preserves original behavior: when no special delimiter is present it simply splits on commas (or returns a single entry), otherwise it falls back to the full, safe parser that respects nested delimiters and strings. - Some microbenchmarks regress slightly (a few single-case timings in the annotated tests are a bit slower); this is expected because we add a small _special_re.search check for every call. The overall trade-off was accepted because it yields substantial savings in the common and expensive cases (especially large/simple comma-separated argument lists). - The optimization is most valuable when this function is exercised many times or on long/simple argument lists (hot paths that produce many simple comma-separated tokens). It is neutral or slightly negative for a handful of small or highly-nested inputs, but those are rare in the benchmarks. Tests and workload guidance: - Big wins: large-scale, many-argument inputs or many repeated calls where arguments are simple comma-separated literals (annotated tests show up to ~20x speedups for such cases). - No/low impact: complex first arguments with nested parentheses/generics or many quoted strings — the safe parser still runs there, so correctness is preserved; timings remain similar. - Small regressions: a few microbench cases (very short inputs or certain char-literal checks) are marginally slower due to the extra quick search, but these regressions are small relative to the global runtime improvement. Summary: By routing simple/common inputs to str.split (C-level speed) and eliminating per-call regex compilation for literal/cast detection, the optimized code reduces time in the hot parsing and literal-detection paths, producing the observed ~11% runtime improvement while maintaining correctness for nested/quoted input via the fallback parser.

codeflash-ai · 2026-02-25T20:34:17Z

⚡️ Codeflash found optimizations for this PR

📄 12% (0.12x) speedup for `JavaAssertTransformer._infer_type_from_assertion_args` in `codeflash/languages/java/remove_asserts.py`

⏱️ Runtime : 10.3 milliseconds → 9.23 milliseconds (best of 102 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up method JavaAssertTransformer._infer_type_from_assertion_args by 12% in PR #1663 (fix/java-maven-test-execution-bugs) #1668

If you approve, it will be merged into this PR (branch fix/java-maven-test-execution-bugs).

The behavior mode instrumentation test expected `Object _cf_result1` but after the type inference fix, assertEquals(4, call()) now produces `int _cf_result1 = (int)_cf_result1_1`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codeflash-ai · 2026-02-25T20:47:24Z

codeflash/languages/java/remove_asserts.py

+        while i < len(args_str):
+            ch = args_str[i]
+
+            if in_string:
+                current.append(ch)
+                if ch == "\\" and i + 1 < len(args_str):
+                    i += 1
+                    current.append(args_str[i])
+                elif ch == string_char:
+                    in_string = False
+            elif ch in ('"', "'"):
+                in_string = True
+                string_char = ch
+                current.append(ch)
+            elif ch in ("(", "<", "[", "{"):
+                depth += 1
+                current.append(ch)
+            elif ch in (")", ">", "]", "}"):
+                depth -= 1
+                current.append(ch)
+            elif ch == "," and depth == 0:
+                args.append("".join(current))
+                current = []
+            else:
+                current.append(ch)
+            i += 1
+
+        if current:
+            args.append("".join(current))


⚡️Codeflash found 96% (0.96x) speedup for JavaAssertTransformer._split_top_level_args in codeflash/languages/java/remove_asserts.py

⏱️ Runtime : 4.99 milliseconds → 2.55 milliseconds (best of 250 runs)

📝 Explanation and details

Runtime improvement: The optimized _split_top_level_args cuts the call time from 4.99 ms to 2.55 ms (~95% faster by the reported metric), with especially large wins on long inputs (hundreds/thousands of args).

What changed (concrete optimizations)

Eliminated per-character list-building + join: the original appended each character to current and used "".join(current) on splits. The optimized code records slice boundaries (last..i) and appends substrings once, avoiding the frequent small-list allocations and the join cost.

Cached locals and precomputed values: binds args.append to args_append, assigns s = args_str and n = len(s) once. This avoids repeated attribute lookups and len() calls inside the hot loop.

Early fast-path for empty input: returns immediately for empty args_str.

Reduced branching and work in the hot loop:

Handles string escapes by advancing i by 2 and continuing, instead of building characters.

Uses a single-character membership string for open/close brackets (opens/closes) and checks ch in opens/closes — a cheap O(1) C-level operation.

Minimizes repeated current.append and other Python-level list ops by moving to index arithmetic.

Keeps loop indexing simple (while i < n, s[i]) and uses continues to reduce duplicate i += 1 logic.

Why this yields a speedup

The original implementation spent most time inside the while loop doing a lot of Python-level work per character: list appends for each char, join() calls on splits, and repeated attribute/len lookups. Those are high-overhead Python operations.

The optimized version reduces allocations (no per-char lists), reduces Python bytecode executed per character (fewer appends, fewer attribute accesses), and shifts work to C-level substring slicing and in-operator checks. That lowers CPU and memory churn and reduces the constant factor per character, which explains the big wins shown by the line profiler (significantly less time inside the main loop and fewer append/join costs).

Caching args.append and precomputing len(s) eliminates repeated attribute lookups inside the hot path, which matters at scale.

Behavioral and workload impact

Functionality preserved for the tested cases: strings, escapes, nested parentheses/generics/arrays/braces are still respected (see annotated tests). The optimizer excels on long inputs (e.g., 1000-element tests show 80–100%+ speedups) because savings are per-character/per-arg and accumulate.

Small-case trade-off: one microcase (single top-level comma) was marginally slower in a microbenchmark (annotated_tests shows ~12% slower). This is an acceptable trade-off because the optimization dramatically reduces runtime for typical and large inputs where this function is most likely to be on a hot path.

No new heavy dependencies were introduced; changes are local to the function and safe for callers.

When you get the most benefit

Hot paths that call this on long argument lists or in tight loops (e.g., splitting thousands of arguments or repeatedly parsing assertion arguments) will see the largest absolute and relative improvements.

Short inputs also generally benefit (small microbenchmarks mostly show faster times), but the biggest wins are for large/mixed/nested inputs as shown in the annotated tests.

Summary
The optimized implementation reduces per-character Python-level operations, avoids frequent small allocations and joins, and caches locals so the inner loop runs much lighter. Those combined micro-optimizations produce the reported runtime improvement (4.99 ms → 2.55 ms) and large speedups on real workloads, at the cost of a tiny micro-regression in one trivial input case.

✅ Correctness verification report:

Test Status

⚙️ Existing Unit Tests 🔘 None Found

🌀 Generated Regression Tests ✅ 47 Passed

⏪ Replay Tests 🔘 None Found

🔎 Concolic Coverage Tests 🔘 None Found

📊 Tests Coverage 100.0%

🌀 Click to see Generated Regression Tests

import pytest # used for our unit tests from codeflash.languages.java.remove_asserts import JavaAssertTransformer def test_split_basic_simple(): # Create a real transformer instance using the real constructor t = JavaAssertTransformer("assert") # Simple comma-separated arguments should split into three items args = "a,b,c" codeflash_output = t._split_top_level_args(args); result = codeflash_output # 3.03μs -> 2.97μs (2.02% faster) def test_split_respects_parentheses_and_whitespace(): # Transformer instance t = JavaAssertTransformer("assert") # First arg contains a comma inside parentheses, which should not be split. # There is a top-level comma between the function call and the 'x' argument. args = "func(1,2), x" codeflash_output = t._split_top_level_args(args); result = codeflash_output # 4.88μs -> 3.83μs (27.5% faster) def test_split_respects_generics_brackets_and_braces(): t = JavaAssertTransformer("assert") # Generics use angle brackets which the splitter treats like parentheses (depth tracking). args = "List<String>, Map<K,V>" codeflash_output = t._split_top_level_args(args); res = codeflash_output # 7.08μs -> 4.86μs (45.7% faster) def test_split_respects_strings_and_escapes(): t = JavaAssertTransformer("assert") # Double-quoted string containing a comma should be preserved as one argument. args1 = '"a,b", c' codeflash_output = t._split_top_level_args(args1); r1 = codeflash_output # 3.44μs -> 2.87μs (19.9% faster) # Single-quoted string with an escaped single-quote and a comma inside should also be preserved. # Use a Python double-quoted literal so the inner backslash is represented correctly in the input string. args2 = "'don\\'t,split', y" codeflash_output = t._split_top_level_args(args2); r2 = codeflash_output # 3.58μs -> 2.33μs (53.2% faster) def test_split_handles_brackets_and_braces(): t = JavaAssertTransformer("assert") # Arrays and braces should be treated as nested structures; commas inside them are ignored. args = "[1,2], {a,b}, (x,y)" codeflash_output = t._split_top_level_args(args); res = codeflash_output # 6.62μs -> 4.66μs (42.2% faster) def test_split_unmatched_parentheses_returns_single(): t = JavaAssertTransformer("assert") # Unmatched parentheses means top-level commas may not exist; all characters remain in a single arg. args = "((a,b), c" # note: one more '(' than ')' codeflash_output = t._split_top_level_args(args); res = codeflash_output # 3.78μs -> 3.02μs (25.2% faster) def test_split_empty_string_returns_empty_list(): t = JavaAssertTransformer("assert") # An empty input string should produce an empty list (no args). codeflash_output = t._split_top_level_args(""); res = codeflash_output # 742ns -> 581ns (27.7% faster) def test_split_double_closing_angle_brackets_in_generics(): t = JavaAssertTransformer("assert") # Handle cases with consecutive closing angle brackets (e.g., nested generics like Foo<Bar<Baz>>) args = "Map<A<B>, C>>, y" codeflash_output = t._split_top_level_args(args); res = codeflash_output # 5.39μs -> 3.92μs (37.6% faster) def test_split_large_number_of_simple_args(): t = JavaAssertTransformer("assert") # Build 1000 simple args: a0,a1,a2,...,a999 n = 1000 expected = [f"a{i}" for i in range(n)] big_args = ",".join(expected) # Split should return the original list codeflash_output = t._split_top_level_args(big_args); res = codeflash_output # 1.16ms -> 623μs (85.6% faster) def test_split_large_mixed_nested_args(): t = JavaAssertTransformer("assert") # Build 1000 args where even indices are nested function calls containing internal commas, # and odd indices are simple values. Joining with top-level commas must split into the 1000 parts. n = 1000 parts = [] for i in range(n): if i % 2 == 0: # nested call contains an internal comma which should NOT cause a top-level split parts.append(f"f({i},{i+1})") else: parts.append(f"v{i}") big = ",".join(parts) codeflash_output = t._split_top_level_args(big); res = codeflash_output # 1.86ms -> 926μs (100% faster) # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest # used for our unit tests from codeflash.languages.java.remove_asserts import JavaAssertTransformer def test_simple_top_level_commas(): # Create a real instance of the transformer using the actual constructor. t = JavaAssertTransformer("assertX") # Basic comma-separated tokens at top-level should split into separate fragments. args = "a,b,c" # Call the instance method; it's an instance method so we call on the real object. codeflash_output = t._split_top_level_args(args); parts = codeflash_output # 3.13μs -> 2.80μs (12.1% faster) def test_preserve_whitespace_and_trailing_leading_commas(): t = JavaAssertTransformer("assertX") # Whitespace around items should be preserved in the fragments. args = " a , b ,c " codeflash_output = t._split_top_level_args(args); parts = codeflash_output # 4.77μs -> 3.52μs (35.6% faster) # Trailing comma should result in omission of a final empty fragment (current is empty -> not appended). codeflash_output = t._split_top_level_args("a,b,"); parts_trailing = codeflash_output # 1.74μs -> 1.44μs (20.9% faster) # Leading comma results in the first fragment being an empty string, per implementation. codeflash_output = t._split_top_level_args(",a"); parts_leading = codeflash_output # 1.02μs -> 961ns (6.35% faster) def test_parentheses_and_nested_commas_are_ignored_for_splitting(): t = JavaAssertTransformer("assertX") # Commas inside parentheses should not split top-level arguments. args = "foo(bar,baz),qux" codeflash_output = t._split_top_level_args(args); parts = codeflash_output # 5.62μs -> 4.14μs (35.9% faster) # Multiple nested parentheses should also be respected. complex_args = "outer(inner(a,b), other(c,d)),last" codeflash_output = t._split_top_level_args(complex_args); complex_parts = codeflash_output # 8.34μs -> 4.61μs (80.9% faster) def test_angle_brackets_generics_are_respected(): t = JavaAssertTransformer("assertX") # Commas inside angle-bracket generics shouldn't split at top-level. args = "Map<String,Integer>, other" codeflash_output = t._split_top_level_args(args); parts = codeflash_output # 8.02μs -> 5.17μs (55.1% faster) # Deeper nested generics remain intact. args2 = "List<Map<String,List<Integer>>>, z" codeflash_output = t._split_top_level_args(args2); parts2 = codeflash_output # 8.43μs -> 4.62μs (82.5% faster) def test_string_literals_with_commas_and_escaped_quotes(): t = JavaAssertTransformer("assertX") # Build a quoted string containing commas and escaped internal quotes. part1 = '"He said \\"Hello, world\\""' # Compose the args string with that quoted string followed by a simple identifier. args = part1 + ",x" codeflash_output = t._split_top_level_args(args); parts = codeflash_output # 5.83μs -> 4.07μs (43.3% faster) # Single-quoted strings (char literals) with escaped characters should also be preserved. char_part = "'\\n'" args2 = char_part + ", other" codeflash_output = t._split_top_level_args(args2); parts2 = codeflash_output # 3.04μs -> 2.11μs (43.6% faster) def test_braces_and_array_initializers(): t = JavaAssertTransformer("assertX") # Commas inside braces should not produce top-level splits. args = "{1,2},3" codeflash_output = t._split_top_level_args(args); parts = codeflash_output # 3.58μs -> 3.04μs (17.8% faster) # Array initializer inside expression should be preserved as one fragment. args2 = "new int[]{1,2}, other" codeflash_output = t._split_top_level_args(args2); parts2 = codeflash_output # 5.70μs -> 3.34μs (70.9% faster) def test_empty_input_and_single_comma_behavior(): t = JavaAssertTransformer("assertX") # Empty input should return an empty list (no fragments). codeflash_output = t._split_top_level_args("") # 722ns -> 572ns (26.2% faster) # A single comma at top-level: implementation appends current (which is empty) once and # does not append another empty current at end -> produces a single empty string. codeflash_output = t._split_top_level_args(",") # 1.30μs -> 1.48μs (12.2% slower) # Two commas: the first comma will append empty, then the second comma will append empty again, # but because current will be empty at end there is no final append, resulting in two empties. codeflash_output = t._split_top_level_args(",,") # 1.08μs -> 1.06μs (1.88% faster) def test_complex_mixed_constructs(): t = JavaAssertTransformer("assertX") # Mix strings with commas, nested calls with commas and maps containing commas. part = 'assertEquals("expected, with,commas", List.of(1,2), Map.of("k","v"))' args = part + ", x" codeflash_output = t._split_top_level_args(args); parts = codeflash_output # 15.3μs -> 9.17μs (67.2% faster) def test_large_scale_many_elements_with_strings_and_commas(): t = JavaAssertTransformer("assertX") # Build 1000 elements; every 10th element is a quoted string containing a comma. parts = [] for i in range(1000): if i % 10 == 0: # Quoted string containing a comma to test that commas within strings are not split. parts.append(f'"{i},{i+1}"') else: parts.append(f'item{i}') # Join into a single args string with top-level commas between parts. args = ",".join(parts) # Split using the method under test. codeflash_output = t._split_top_level_args(args); result = codeflash_output # 1.86ms -> 916μs (103% faster) # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally git merge codeflash/optimize-pr1663-2026-02-25T20.47.23

Click to see suggested changes

Suggested change

while i < len(args_str):

ch = args_str[i]

if in_string:

current.append(ch)

if ch == "\\" and i + 1 < len(args_str):

i += 1

current.append(args_str[i])

elif ch == string_char:

in_string = False

elif ch in ('"', "'"):

in_string = True

string_char = ch

current.append(ch)

elif ch in ("(", "<", "[", "{"):

depth += 1

current.append(ch)

elif ch in (")", ">", "]", "}"):

depth -= 1

current.append(ch)

elif ch == "," and depth == 0:

args.append("".join(current))

current = []

else:

current.append(ch)

i += 1

if current:

args.append("".join(current))

# Fast-path: empty input

if not args_str:

return args

s = args_str

n = len(s)

last = 0

args_append = args.append

# Use string membership checks for bracket characters

opens = "({[<"

closes = ")}]>"

while i < n:

ch = s[i]

if in_string:

# Respect escape sequences inside strings

if ch == "\\" and i + 1 < n:

i += 2

continue

if ch == string_char:

in_string = False

i += 1

continue

# Not in string

if ch == '"' or ch == "'":

in_string = True

string_char = ch

i += 1

continue

# Track nesting depth for parentheses/generics/arrays/blocks

if ch in opens:

depth += 1

i += 1

continue

if ch in closes:

depth -= 1

i += 1

continue

# Top-level comma splits arguments

if ch == "," and depth == 0:

args_append(s[last:i])

last = i + 1

i += 1

continue

i += 1

# Append the final segment if any

if last < n:

args_append(s[last:n])

Primary benefit: the optimized version reduces runtime from ~903μs to ~802μs — a 12% speedup — by lowering per-iteration and attribute-access overhead in the hot path of _generate_replacement. What changed (concrete, low-level): - Cached attribute lookups into locals: - self.invocation_counter → local inv, written back once at the end. - assertion.leading_whitespace and assertion.target_calls → leading_ws and calls locals. Caching avoids repeated attribute reads/writes which are relatively expensive in Python. - Removed a per-iteration branch by handling the first target call separately: - The original loop used if i == 0 every iteration (via enumerate). The optimized code emits the first line once, then loops the remaining calls without a conditional. This eliminates an O(n) conditional check across many iterations. - Reduced formatting overhead for loop-generated variable names: - var_name is built with "_cf_result" + str(inv) instead of using an f-string inside the loop (fewer formatting operations). - Minor local micro-optimizations in _infer_return_type: - Replaced the small "in (a, b)" checks with equivalent chained comparisons (method == "x" or method == "y"), reducing tuple creation/containment checks. - Exception-replacement counter handling: moved to a local increment-and-write-back pattern (same semantics, fewer attribute writes). Why this speeds things up: - Attribute access and writes (self.foo / assertion.attr) cost significantly more than local variable access. By doing those once per call and using locals inside tight loops we reduce Python bytecode operations dramatically. - Removing the per-iteration i == 0 branch eliminates one conditional per target call; for large lists this reduces branching overhead and improves instruction cache behavior. - Minimizing string formatting and concatenation inside a hot loop reduces temporary allocations and joins fewer intermediate values. - The profiler and tests show the biggest gains appear when there are many target_calls (1000-call test: ~240μs → ~202μs, ~19% faster), matching these optimizations’ effect on O(n) behavior. Behavioral impact and correctness: - The observable behavior (variable names, formatting, invocation_counter progression, and exception-handling output) is preserved. The counter is still incremented the same number of times and persists across calls. - Exception handling logic is unchanged semantically; only the internal counter updates were made more efficient. Trade-offs (noted regressions and why they’re acceptable): - A few small test cases show tiny slowdowns (single very-small assertions, some assertDoesNotThrow paths). These are microsecond-level regressions (often <0.1–0.2μs) and are an acceptable trade-off for sizable improvements in the common hot path (large lists and repeated invocations). - The optimizations prioritize reducing per-iteration overhead; therefore workloads dominated by many target calls or repeated invocations benefit most. Small or one-off assertions will see negligible change. Where this helps most (based on tests and profiler): - Hot paths that iterate many times over assertion.target_calls (large test files or transformations producing hundreds/thousands of captures). - Repeated uses of the same transformer instance where invocation_counter accumulates across many calls. - The annotated tests and line profiler confirm the speedup is concentrated in _generate_replacement’s loop and that the large-list tests (n=1000) get the biggest absolute and relative improvement. In short: the optimized code reduces attribute and branching overhead in the hot loop, cutting allocation/bytecode work per target call — which yields the observed 12% runtime improvement and up to ~19% on large inputs while preserving behavior.

codeflash-ai · 2026-02-25T20:52:28Z

⚡️ Codeflash found optimizations for this PR

📄 13% (0.13x) speedup for `JavaAssertTransformer._generate_replacement` in `codeflash/languages/java/remove_asserts.py`

⏱️ Runtime : 903 microseconds → 802 microseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up method JavaAssertTransformer._generate_replacement by 13% in PR #1663 (fix/java-maven-test-execution-bugs) #1669

If you approve, it will be merged into this PR (branch fix/java-maven-test-execution-bugs).

…2026-02-25T20.52.19 ⚡️ Speed up method `JavaAssertTransformer._generate_replacement` by 13% in PR #1663 (`fix/java-maven-test-execution-bugs`)

codeflash-ai · 2026-02-25T22:02:06Z

This PR is now faster! 🚀 @mashraf-222 accepted my optimizations from:

⚡️ Speed up method JavaAssertTransformer._generate_replacement by 13% in PR #1663 (fix/java-maven-test-execution-bugs) #1669

…2026-02-25T20.29.24 ⚡️ Speed up method `JavaAssertTransformer._infer_return_type` by 127% in PR #1663 (`fix/java-maven-test-execution-bugs`)

codeflash-ai · 2026-02-25T22:02:11Z

This PR is now faster! 🚀 @mashraf-222 accepted my optimizations from:

⚡️ Speed up method JavaAssertTransformer._infer_return_type by 127% in PR #1663 (fix/java-maven-test-execution-bugs) #1667

…2026-02-25T20.34.08 ⚡️ Speed up method `JavaAssertTransformer._infer_type_from_assertion_args` by 12% in PR #1663 (`fix/java-maven-test-execution-bugs`)

codeflash-ai · 2026-02-25T22:02:17Z

This PR is now faster! 🚀 @mashraf-222 accepted my optimizations from:

⚡️ Speed up method JavaAssertTransformer._infer_type_from_assertion_args by 12% in PR #1663 (fix/java-maven-test-execution-bugs) #1668

Remove _COMMON_JAVA_IMPORTS, ensure_common_java_imports(), and _add_import() from instrumentation.py. The root cause is now fixed in the AI service (codeflash-internal#2443) which adds comprehensive stdlib import postprocessing before tree-sitter validation in the testgen pipeline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codeflash-ai · 2026-02-25T22:18:14Z

codeflash/languages/java/remove_asserts.py

+        if self._FLOAT_LITERAL_RE.match(value):
+            return "float"
+        if self._DOUBLE_LITERAL_RE.match(value):
+            return "double"
+        if self._LONG_LITERAL_RE.match(value):
+            return "long"
+        if self._INT_LITERAL_RE.match(value):
+            return "int"
+        if self._CHAR_LITERAL_RE.match(value):
+            return "char"


⚡️Codeflash found 64% (0.64x) speedup for JavaAssertTransformer._type_from_literal in codeflash/languages/java/remove_asserts.py

⏱️ Runtime : 9.46 milliseconds → 5.77 milliseconds (best of 68 runs)

📝 Explanation and details

Runtime improvement (primary): The optimized _type_from_literal reduces the method runtime from 9.46 ms to 5.77 ms — a ~63% overall speedup. That throughput win is why this change was accepted.

What changed (specific optimizations)

Replaced the expensive compiled-regex.match calls for numeric/char detection with cheap, deterministic string operations:

Check suffixes by inspecting v[-1] (single-char index) instead of running a regex.

Use slicing (v[:-1]), simple sign-strip logic, str.isdigit(), and a single split(".", 1) for decimal validation.

Detect char literals with length checks and startswith/endswith rather than a regex.

Kept only the cast regex (_cast_re.match) for the cast-specific extraction; everything else uses C-level string methods.

Bound value to a local variable v early to reduce repeated attribute lookups (fewer attribute and name lookups).

Why this is faster (how it maps to Python performance)

Regex.match invokes the regex engine, produces match objects, and can do backtracking — it’s relatively expensive even when compiled. The original profiler shows large time spent on those regex checks (FLOAT/DOBULE/LONG/INT regex lines dominated the function's time).

String operations used here (indexing, slicing, isdigit, startswith, split) are implemented in C with minimal overhead and no match-object allocation. They therefore execute much faster and allocate less temporary memory.

Early-exit checks and ordering reduce average work: common simple cases (suffix checks, pure digits, decimals) are identified quickly with small, deterministic checks.

Profiler evidence

Original profile: the compiled-regex checks (FLOAT/DOUBLE/LONG/INT) were hot — together they accounted for a large fraction of wall time (e.g., FLOAT_LITERAL_RE.match showed ~24.5% alone).

Optimized profile: time shifts to cheap operations (v = value, v[-1] checks, isdigit, split) and the expensive regex is no longer invoked for normal numeric cases. Overall time per-call drops substantially (consistent with the measured runtime improvement).

Behavioral/compatibility notes and trade-offs

Functional behavior is preserved for the tested inputs (booleans, null, numeric suffixes, decimals, char and string detection, and cast extraction).

Some microbenchmarks show small regressions (certain f-suffix float cases or a few edge inputs), because the manual logic does more branching/splits in those particular paths. This is an acceptable trade-off given the substantial aggregate runtime benefit — the overall throughput and bulk tests improved dramatically.

The cast-detection is still done with the compiled regex, preserving the original extraction behavior for casts.

Workloads that benefit most

Hot paths that call _type_from_literal many times (parsing many literals, bulk transformations) gain the most — annotated large-scale tests show ~67–75% faster behavior.

Numeric-heavy workloads (ints, longs, doubles) see especially large wins, since they were the original regex-dominated cases.

Very small or extremely pathological single-case inputs may see no change or minor regressions; however the overall system throughput improves.

Summary

Primary win: 63% reduction in runtime (9.46 ms → 5.77 ms).

How: remove repeated regex.match calls in favor of C-level string ops and early exits, reducing engine overhead and allocations.

Trade-off: a few microbenchmarks are slightly slower due to extra branching, but the aggregate and hot-path performance improvements justify the change for workloads that exercise this method frequently.

✅ Correctness verification report:

Test Status

⚙️ Existing Unit Tests 🔘 None Found

🌀 Generated Regression Tests ✅ 12069 Passed

⏪ Replay Tests 🔘 None Found

🔎 Concolic Coverage Tests 🔘 None Found

📊 Tests Coverage 100.0%

🌀 Click to see Generated Regression Tests

import pytest # used for our unit tests from codeflash.languages.java.remove_asserts import JavaAssertTransformer def test_boolean_and_null_literals_basic(): # Create a real instance of JavaAssertTransformer with a dummy function name. transformer = JavaAssertTransformer("dummy") # The literal "true" should be recognized as boolean. codeflash_output = transformer._type_from_literal("true") # 451ns -> 531ns (15.1% slower) # The literal "false" should be recognized as boolean. codeflash_output = transformer._type_from_literal("false") # 290ns -> 280ns (3.57% faster) # The literal "null" is mapped to the Java Object type. codeflash_output = transformer._type_from_literal("null") # 290ns -> 320ns (9.38% slower) def test_numeric_and_string_and_char_literals_basic(): # Instantiate the transformer to call the instance method under test. transformer = JavaAssertTransformer("fn") # Float literals with 'f' or 'F' suffix -> float codeflash_output = transformer._type_from_literal("1.0f") # 2.21μs -> 2.33μs (5.57% slower) codeflash_output = transformer._type_from_literal("2F") # 891ns -> 852ns (4.58% faster) # Double literals: decimal without 'f' but with '.' or with 'd' suffix -> double codeflash_output = transformer._type_from_literal("3.1415") # 1.45μs -> 982ns (48.0% faster) codeflash_output = transformer._type_from_literal("4.0d") # 1.09μs -> 862ns (26.7% faster) codeflash_output = transformer._type_from_literal("5D") # 982ns -> 511ns (92.2% faster) # Long literals end with L/l -> long codeflash_output = transformer._type_from_literal("123L") # 1.46μs -> 781ns (87.3% faster) codeflash_output = transformer._type_from_literal("-999l") # 1.48μs -> 932ns (59.0% faster) # Integer literals (no suffix, whole number) -> int codeflash_output = transformer._type_from_literal("42") # 1.39μs -> 591ns (136% faster) codeflash_output = transformer._type_from_literal("-7") # 1.29μs -> 571ns (126% faster) # Char literals: single char inside single quotes -> char codeflash_output = transformer._type_from_literal("'a'") # 1.20μs -> 1.27μs (5.58% slower) # Char escape like backslash + char should also be recognized as char. codeflash_output = transformer._type_from_literal("'\\n'") # 1.20μs -> 962ns (24.9% faster) # String values (start with double quote) -> String codeflash_output = transformer._type_from_literal('"hello"') # 1.35μs -> 721ns (87.7% faster) # Even a malformed string that merely starts with a quote is treated as String. codeflash_output = transformer._type_from_literal('"unterminated') # 1.01μs -> 982ns (3.05% faster) def test_cast_expressions_and_fallback_behavior(): transformer = JavaAssertTransformer("f") # Casts like (byte)0 should return the cast type (group 1 from the regex). codeflash_output = transformer._type_from_literal("(byte)0") # 3.98μs -> 3.81μs (4.49% faster) # Cast should preserve case (e.g., capitalized types). codeflash_output = transformer._type_from_literal("(Short)1") # 1.80μs -> 1.54μs (16.9% faster) # A user-defined type in a cast should also be extracted. codeflash_output = transformer._type_from_literal("(MyType)someValue") # 1.45μs -> 1.14μs (27.2% faster) # A value that does not match any pattern should fall back to Object. codeflash_output = transformer._type_from_literal("SomeRandomToken") # 1.21μs -> 952ns (27.3% faster) # Boolean is case-sensitive, so "False" (capitalized) is not recognized -> fallback Object. codeflash_output = transformer._type_from_literal("False") # 1.21μs -> 842ns (44.1% faster) def test_empty_string_and_none_behavior(): transformer = JavaAssertTransformer("x") # An empty string does not match any literal regex -> Object. codeflash_output = transformer._type_from_literal("") # Passing None is not a valid string; ensure it raises an AttributeError when startswith is used. # The implementation will attempt value.startswith('"') and thus should raise AttributeError for None. with pytest.raises(AttributeError): transformer._type_from_literal(None) # type: ignore[arg-type] def test_large_scale_many_literals_mixed_types(): transformer = JavaAssertTransformer("big") literals = [] expected = [] # Build 1000 deterministic literals covering many categories in a round-robin fashion. for i in range(1000): mod = i % 8 if mod == 0: # boolean true/false alternating val = "true" if (i % 2 == 0) else "false" typ = "boolean" elif mod == 1: # float with suffix f val = f"{i}.0f" typ = "float" elif mod == 2: # double with decimal point val = f"{i}.5" typ = "double" elif mod == 3: # long with L suffix val = f"{i}L" typ = "long" elif mod == 4: # int plain val = str(i) typ = "int" elif mod == 5: # char alternating between simple and escaped val = "'a'" if (i % 2 == 0) else "'\\t'" typ = "char" elif mod == 6: # string literal val = f"\"s{i}\"" typ = "String" else: # cast to a named type val = f"(Custom{i})x" typ = f"Custom{i}" literals.append(val) expected.append(typ) # Check each literal's resolved type matches expected. for lit, exp in zip(literals, expected): codeflash_output = transformer._type_from_literal(lit); got = codeflash_output # 947μs -> 542μs (74.6% faster) def test_large_scale_performance_consistency(): transformer = JavaAssertTransformer("perf") # Run the same small set of literals 1000 times to exercise regex caching and ensure no state bleed. sample_literals = ['true', 'false', 'null', '1.0f', '2.5', '123L', '42', "'z'", '"hi"', "(byte)0"] # Expected results for one pass, determined statically. expected_single = ["boolean", "boolean", "Object", "float", "double", "long", "int", "char", "String", "byte"] # Execute 1000 iterations and ensure results remain identical each time. for _ in range(1000): for lit, exp in zip(sample_literals, expected_single): codeflash_output = transformer._type_from_literal(lit) # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest # used for our unit tests from codeflash.languages.java.remove_asserts import JavaAssertTransformer def test_boolean_and_null_literals(): # Create a real instance of the transformer with a dummy function name. t = JavaAssertTransformer("dummy") # 'true' and 'false' must be detected as boolean. codeflash_output = t._type_from_literal("true") # 932ns -> 842ns (10.7% faster) codeflash_output = t._type_from_literal("false") # 221ns -> 230ns (3.91% slower) # 'null' must map to Object codeflash_output = t._type_from_literal("null") # 351ns -> 370ns (5.14% slower) def test_string_and_char_literals_basic(): t = JavaAssertTransformer("fn") # A value that starts with a double quote is considered a String. codeflash_output = t._type_from_literal('"hello"') # 3.55μs -> 2.83μs (25.1% faster) # Even if it only starts with a quote (unterminated), rule is to return String. codeflash_output = t._type_from_literal('"unterminated') # 1.32μs -> 1.68μs (21.4% slower) # Simple char literal: single character enclosed in single quotes -> char codeflash_output = t._type_from_literal("'a'") # 1.44μs -> 1.24μs (16.2% faster) # Escaped char (backslash + char) should also be recognized as char codeflash_output = t._type_from_literal("'\\n'") # 1.27μs -> 1.01μs (25.7% faster) def test_numeric_literals_and_suffixes(): t = JavaAssertTransformer("n") # Integers with no suffix codeflash_output = t._type_from_literal("0") # 2.42μs -> 1.59μs (51.6% faster) codeflash_output = t._type_from_literal("123") # 2.01μs -> 701ns (187% faster) codeflash_output = t._type_from_literal("-42") # 1.66μs -> 982ns (69.3% faster) # Long literals with 'L' or 'l' suffix codeflash_output = t._type_from_literal("123L") # 1.67μs -> 1.03μs (62.1% faster) codeflash_output = t._type_from_literal("-999l") # 1.52μs -> 1.04μs (46.2% faster) # Float literals require an f/F suffix (per the regex used) codeflash_output = t._type_from_literal("1.0f") # 722ns -> 1.26μs (42.8% slower) codeflash_output = t._type_from_literal("2F") # 561ns -> 641ns (12.5% slower) codeflash_output = t._type_from_literal("-3.14f") # 521ns -> 912ns (42.9% slower) # Double detection: decimals (with optional trailing d/D), and '1.' should be allowed codeflash_output = t._type_from_literal("1.0") # 1.09μs -> 832ns (31.2% faster) codeflash_output = t._type_from_literal("1.") # 832ns -> 591ns (40.8% faster) codeflash_output = t._type_from_literal("6D") # 851ns -> 531ns (60.3% faster) codeflash_output = t._type_from_literal("-2.71828") # 1.07μs -> 811ns (32.2% faster) def test_char_and_nonchar_single_quote_inputs(): t = JavaAssertTransformer("x") # If single-quoted content is more than one char (e.g. "'ab'") it's not matched by the char regex -> Object codeflash_output = t._type_from_literal("'ab'") # 3.32μs -> 4.18μs (20.6% slower) # Empty string should be treated as unknown -> Object codeflash_output = t._type_from_literal("") # 1.24μs -> 1.25μs (0.799% slower) # A capitalized boolean-like 'True' should NOT be recognized -> Object codeflash_output = t._type_from_literal("True") # 1.41μs -> 1.30μs (8.53% faster) def test_cast_expressions_and_spacing(): t = JavaAssertTransformer("c") # Direct cast without space codeflash_output = t._type_from_literal("(byte)0") # 4.02μs -> 4.19μs (4.08% slower) # Cast followed by a space before number still matches because regex only matches the prefix "(type)" codeflash_output = t._type_from_literal("(short) -1") # 1.83μs -> 1.42μs (29.0% faster) # Cast with a multi-character type name codeflash_output = t._type_from_literal("(MyType)someExpr") # 1.42μs -> 1.17μs (21.4% faster) # If the parentheses are not a cast (e.g. "(123)"), group will match '123' only if it is a word; '123' isn't a word -> Object # But because the cast regex only captures word characters for the type, "(123)5" should not match and fall back to Object codeflash_output = t._type_from_literal("(123)5") # 1.38μs -> 951ns (45.3% faster) def test_numeric_precedence_and_ambiguities(): t = JavaAssertTransformer("p") # The order in the implementation checks float (f/F), then double, then long, then int. # So "1f" should be float; "1d" or "1D" double; "1L" long codeflash_output = t._type_from_literal("1f") # 2.02μs -> 1.53μs (32.0% faster) codeflash_output = t._type_from_literal("1d") # 1.32μs -> 762ns (73.5% faster) codeflash_output = t._type_from_literal("1L") # 1.16μs -> 761ns (52.8% faster) # A decimal without suffix picks up as double even if it could be seen as int by digits portion. codeflash_output = t._type_from_literal("10.0") # 1.27μs -> 1.23μs (3.25% faster) # Negative long with capital L codeflash_output = t._type_from_literal("-100L") # 1.50μs -> 991ns (51.7% faster) def test_large_scale_many_literals(): t = JavaAssertTransformer("bulk") # Prepare a base set of representative literal strings and their expected types. base_pairs = [ ("true", "boolean"), ("false", "boolean"), ("null", "Object"), ('"s"', "String"), ("100", "int"), ("-7", "int"), ("42L", "long"), ("99l", "long"), ("3.1415", "double"), ("2.", "double"), ("6D", "double"), ("7d", "double"), ("8.0f", "float"), ("9F", "float"), ("'z'", "char"), ("'\\t'", "char"), ("(byte)5", "byte"), ("(short) 6", "short"), ("'ab'", "Object"), # not a char ("", "Object"), # empty -> unknown ] # Build a large list (1000 elements) by repeating the base set until we reach 1000 total = 1000 values = [] expected = [] i = 0 while len(values) < total: val, typ = base_pairs[i % len(base_pairs)] # Slightly vary numeric values to catch regex differences (append iteration index where appropriate) if val.isdigit() or (val.startswith("-") and val[1:].isdigit()): # change numeric magnitude to keep variety but type stays the same variant = str(int(val) + i) values.append(variant) # Determine expected type from the base mapping expected.append("int") elif val.endswith("L") or val.endswith("l"): variant = val[:-1] + str((i % 100) + 1) + val[-1] values.append(variant) expected.append("long") elif val.endswith("f") or val.endswith("F"): variant = val[:-1] + str((i % 10) + 1) + val[-1] values.append(variant) expected.append("float") elif val.endswith("d") or val.endswith("D") or "." in val: # Keep as double-like variant = str(float((i % 100) + 1)) # e.g. "1.0", "2.0" values.append(variant) expected.append("double") else: # non-numeric base values: keep them repeated values.append(val) expected.append(typ) i += 1 # Now run the transformer over all values and assert types match expectations. # This checks consistent behavior over a large volume of inputs. for v, exp in zip(values, expected): codeflash_output = t._type_from_literal(v); got = codeflash_output # 970μs -> 578μs (67.8% faster) def test_none_raises_type_error(): t = JavaAssertTransformer("err") # Calling _type_from_literal with None should raise a TypeError when regex.match is attempted. with pytest.raises(TypeError): t._type_from_literal(None) # 3.85μs -> 4.54μs (15.2% slower) # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally git merge codeflash/optimize-pr1663-2026-02-25T22.18.13

Click to see suggested changes

Suggested change

if self._FLOAT_LITERAL_RE.match(value):

return "float"

if self._DOUBLE_LITERAL_RE.match(value):

return "double"

if self._LONG_LITERAL_RE.match(value):

return "long"

if self._INT_LITERAL_RE.match(value):

return "int"

if self._CHAR_LITERAL_RE.match(value):

return "char"

# Fast-path numeric suffix checks and simple digit/decimal validations

v = value

# Check float suffix (e.g., 1.23f or 10f)

if v and (v[-1] in "fF"):

core = v[:-1]

if core:

if core[0] == "-":

core2 = core[1:]

else:

core2 = core

if core2:

if "." in core2:

a, b = core2.split(".", 1)

if a.isdigit() and (b == "" or b.isdigit()):

return "float"

else:

if core2.isdigit():

return "float"

# Check double suffix (e.g., 1.23d or 10d)

if v and (v[-1] in "dD"):

core = v[:-1]

if core:

if core[0] == "-":

core2 = core[1:]

else:

core2 = core

if core2:

if "." in core2:

a, b = core2.split(".", 1)

if a.isdigit() and (b == "" or b.isdigit()):

return "double"

else:

if core2.isdigit():

return "double"

# Decimal without suffix implies double if it fits the pattern like 1. or 1.23

if "." in v:

core = v

if core and core[0] == "-":

core = core[1:]

a, b = core.split(".", 1)

if a.isdigit() and (b == "" or b.isdigit()):

return "double"

# Long suffix (e.g., 123L)

if v and (v[-1] in "lL"):

core = v[:-1]

if core:

if core[0] == "-":

core2 = core[1:]

else:

core2 = core

if core2.isdigit():

return "long"

# Integer literal (e.g., 123)

tmp = v

if tmp and tmp[0] == "-":

tmp2 = tmp[1:]

else:

tmp2 = tmp

if tmp2.isdigit():

return "int"

# Character literal ('a' or '\n')

# Matches patterns of length 3 (e.g. 'a') or length 4 (e.g. '\n')

if len(v) in (3, 4) and v.startswith("'") and v.endswith("'"):

if len(v) == 3:

return "char"

# len == 4: must be an escape like '\n'

if v[1] == "\\":

return "char"

mashraf-222 · 2026-02-25T22:31:13Z

Latest E2E Validation Report (Feb 25, 2026)

Changes in this update

Merged 3 optimization PRs targeting remove_asserts.py:
- ⚡️ Speed up method JavaAssertTransformer._generate_replacement by 13% in PR #1663 (fix/java-maven-test-execution-bugs) #1669: 13% speedup for _generate_replacement
- ⚡️ Speed up method JavaAssertTransformer._infer_return_type by 127% in PR #1663 (fix/java-maven-test-execution-bugs) #1667: 127% speedup for _infer_return_type
- ⚡️ Speed up method JavaAssertTransformer._infer_type_from_assertion_args by 12% in PR #1663 (fix/java-maven-test-execution-bugs) #1668: 12% speedup for _infer_type_from_assertion_args
Root-cause fix for Java stdlib imports — moved from CLI band-aid to AI service postprocessing (codeflash-internal#2443):
- New ensure_java_stdlib_imports() in AI service testgen pipeline (90+ stdlib classes)
- Runs before tree-sitter validation so valid tests aren't silently removed for missing imports
- Removed the CLI-side ensure_common_java_imports() band-aid (59 lines deleted)
E2E validation across 4 repos:

Repo	Function	Result	Notes
Default Java project	`Fibonacci.fibonacci`	✅ Pipeline complete	Testgen → compile → instrument → correctness → benchmark all working
Aerospike	`Utf8.encodedLength`	✅ Pipeline complete	Multi-module project, 2/5 candidates passed correctness, no speedup found
Commons Lang	`StringUtils.containsAny`	✅ Pipeline complete	Maven skip flags working correctly, no optimization found
RoaringBitmap	`Util.unsignedBinarySearch`	✅ 34% speedup found	Full pipeline success including type inference, marked as success

All 4 repos completed without errors. The full Java E2E pipeline (test generation, compilation, instrumentation, correctness verification, benchmarking) is working end-to-end.

codeflash-ai · 2026-02-25T22:31:35Z

codeflash/languages/java/remove_asserts.py

+        in_string = False
+        string_char = ""
+        cur: list[str] = []
+
+        while i < n:
+            ch = args_str[i]
+
+            if in_string:
+                cur.append(ch)
+                if ch == "\\" and i + 1 < n:
+                    i += 1
+                    cur.append(args_str[i])
+                elif ch == string_char:
+                    in_string = False
+            elif ch in ('"', "'"):
+                in_string = True
+                string_char = ch
+                cur.append(ch)
+            elif ch in ("(", "<", "[", "{"):
+                depth += 1
+                cur.append(ch)
+            elif ch in (")", ">", "]", "}"):
+                depth -= 1
+                cur.append(ch)
+            elif ch == "," and depth == 0:
+                break
+            else:
+                cur.append(ch)
+            i += 1
+
+        # Trim trailing whitespace from the extracted argument
+        if not cur:
+            return None
+        return "".join(cur).rstrip()


⚡️Codeflash found 37% (0.37x) speedup for JavaAssertTransformer._extract_first_arg in codeflash/languages/java/remove_asserts.py

⏱️ Runtime : 4.31 milliseconds → 3.15 milliseconds (best of 240 runs)

📝 Explanation and details

Runtime improvement (primary): The optimized version reduces the function's median runtime from about 4.31 ms to 3.15 ms — a ~36% speedup overall — by removing per-character Python-level allocations and doing a single substring slice at the end.

What changed (concrete optimizations)

Removed per-character accumulation into a Python list (cur.append + "".join(cur)). Instead the optimized code records the start index and scans with integer indices, producing exactly one slice (args_str[start:i]) and a single rstrip() at the end.

Replaced the "in_string" state machine that appended every character with an inner loop that advances the index over quoted strings (skipping escapes) in larger leaps (i += 2 for escaped chars, and i += 1 for normal chars). This reduces Python bytecode executed per character inside strings.

Avoided many small Python-level operations inside the hot loop (fewer method calls and list appends). Condition checks for delimiters use explicit equality chains which avoid repeated list/tuple membership checks and repeated cur.append calls.

Kept nesting depth tracking but avoided storing all characters that are not needed until the final slice.

Why this yields a speedup (Python-level reasoning)

List append per character and the final join impose heavy Python overhead (many function calls, memory operations, and a large join cost). The original profiler shows substantial time attributed to cur.append and the final "".join(cur). The optimized approach replaces those many operations with a handful of integer increments and a single slice allocation — far less Python interpreter overhead.

Fewer Python-level objects and calls in the inner loop reduces garbage/allocator pressure and bytecode dispatch cost. Index arithmetic and slicing are lower-overhead than repeated append/join work at the same string size.

The optimized inner loop also avoids re-handling escaped characters via repeated append; jumping the index over escapes reduces loop iterations for quoted string segments.

Profiler evidence

Total profiled time for the function dropped (see provided line profiler results). The heavy per-character append/join lines in the original are gone or greatly reduced in the optimized trace; time moved into index scans and a single slice/rstrip operation. This aligns with the measured runtime improvement.

Behavioral changes & trade-offs

Behavior is preserved for the intended inputs (all regression tests pass and outputs match). There are a few tiny regressions on very small edge cases in the tests (e.g., whitespace-only or leading-comma inputs are marginally slower in a couple of microbenchmarks). These regressions are minor (single-digit percentage points) and are reasonable trade-offs given the consistent and significant overall runtime reduction, especially on realistic/hot workloads.

The change does not add new dependencies or alter public signatures.

Impact on workloads (where it matters)

Big wins for:

Long or deeply nested arguments (many characters scanned) because the per-character overhead is now much smaller.

High-throughput or repeated-invocation scenarios (see tests calling the function thousands of times) — the savings per call accumulate into large throughput gains.

Small inputs see smaller absolute gains; a couple of tiny cases in tests even slowed a bit, but those are microbenchmarks and don't offset overall throughput improvements for typical usage.

Tests & suitability

Annotated tests show consistent speedups across nested-parentheses, strings with escapes, generics, and long repeated loops. The optimization is therefore especially beneficial for realistic Java argument fragments (strings with nested delimiters, generics, escaped quotes) and when the transformer is used frequently.

Summary

Primary benefit: 36% faster runtime (4.31ms → 3.15ms).

How: replace many Python-level character appends and join with index-based scanning + one slice, plus an inner loop that skips quoted segments efficiently.

Trade-offs: negligible regressions on a few tiny edge microbenchmarks, acceptable given the large gains for real workloads and repeated calls.

✅ Correctness verification report:

Test Status

⚙️ Existing Unit Tests 🔘 None Found

🌀 Generated Regression Tests ✅ 2055 Passed

⏪ Replay Tests 🔘 None Found

🔎 Concolic Coverage Tests 🔘 None Found

📊 Tests Coverage 100.0%

🌀 Click to see Generated Regression Tests

import pytest # used for our unit tests from codeflash.languages.java.remove_asserts import JavaAssertTransformer def test_basic_simple_and_whitespace_handling(): # Create a real transformer instance with a sample function name. transformer = JavaAssertTransformer("assertEquals") # Simple first-arg extraction: basic comma-separated args. args = "a, b" # Expect the first top-level argument to be "a" codeflash_output = transformer._extract_first_arg(args) # 2.12μs -> 1.85μs (14.6% faster) # Leading whitespace should be skipped before extracting the first arg. args = " foo.bar(), baz" # Expect "foo.bar()" with no leading spaces preserved. codeflash_output = transformer._extract_first_arg(args) # 3.53μs -> 2.56μs (37.5% faster) # If there's no comma (single argument), the whole trimmed string is returned. args = " singleArg " # Trailing and leading whitespace trimmed; internal preserved. codeflash_output = transformer._extract_first_arg(args) # 2.90μs -> 2.00μs (44.6% faster) def test_basic_nested_delimiters_and_strings(): transformer = JavaAssertTransformer("assertThat") # Nested parentheses: the inner comma should not end the top-level argument. args = "func(1, 2), other" # The extractor should include the entire func(1, 2) as the first arg. codeflash_output = transformer._extract_first_arg(args) # 4.33μs -> 3.32μs (30.5% faster) # Strings containing commas should not be split on those commas. args = '"hello, world", next' # The quoted string with the internal comma is preserved as a unit. codeflash_output = transformer._extract_first_arg(args) # 2.44μs -> 1.79μs (35.8% faster) # Char literal containing an escaped backslash (Java char '\\'). # In Python literal we write it as "'\\\\'" to represent the two backslashes inside single quotes. args = "'\\\\', other" # Expect the char literal preserved exactly as in the input string. codeflash_output = transformer._extract_first_arg(args) # 1.23μs -> 962ns (28.2% faster) # Generics with commas inside angle brackets should be treated as nested depth. args = "Map<String, List<Integer>>, other" # The extractor should return the full generic type as first arg. codeflash_output = transformer._extract_first_arg(args) # 6.14μs -> 4.17μs (47.4% faster) def test_edge_cases_empty_and_misplaced_commas_and_casts(): transformer = JavaAssertTransformer("assertTrue") # Empty string should return None (nothing to extract). codeflash_output = transformer._extract_first_arg("") # 601ns -> 531ns (13.2% faster) # String with only whitespace should return None too. codeflash_output = transformer._extract_first_arg(" \t\n ") # 932ns -> 991ns (5.95% slower) # Leading comma indicates an empty first argument -> return None. codeflash_output = transformer._extract_first_arg(", second") # 942ns -> 1.09μs (13.7% slower) # Unbalanced parentheses (no closing paren) - extractor should return whatever it saw. args = "(a, b" # It will not find a top-level comma (the comma is inside depth>0) so return the full string. codeflash_output = transformer._extract_first_arg(args) # 2.20μs -> 1.41μs (56.1% faster) # Leading cast should be preserved (cast uses parentheses but is part of the argument). args = "(MyType) obj.method(), rest" # The extractor should include the cast and the method call. codeflash_output = transformer._extract_first_arg(args) # 5.00μs -> 3.40μs (47.2% faster) # Braces and brackets count toward nesting and should not cause early split. args = "{x, y}, second" # The braces form a single argument even though there is a comma inside. codeflash_output = transformer._extract_first_arg(args) # 1.96μs -> 1.49μs (31.5% faster) def test_string_with_escaped_quotes_inside(): transformer = JavaAssertTransformer("assertThat") # Double-quoted string containing escaped quotes: "He said \"hi\"" # Python literal must escape backslashes: "\"He said \\\"hi\\\"\", next" args = "\"He said \\\"hi\\\"\", other" # The extractor should treat the inner escaped quotes as part of the string and return the full quoted literal. codeflash_output = transformer._extract_first_arg(args) # 3.82μs -> 2.75μs (39.1% faster) # Single-quoted string containing an escaped single quote inside (char literal with escape). # Represent Java char literal '\'' in Python as "'\\''" - but to avoid quoting ambiguity we place overall string in double quotes: args = "'\\'' , following" # The extractor should include the escaped single-quote char literal. codeflash_output = transformer._extract_first_arg(args) # 1.72μs -> 1.39μs (23.7% faster) def test_large_scale_nested_parentheses_and_repeated_invocations(): transformer = JavaAssertTransformer("assertLarge") # Construct a very deep nested parentheses expression: (((...1...))) with depth 1000 depth = 1000 nested = "(" * depth + "1" + ")" * depth # Append other arguments after a top-level comma. args = nested + ", something_else" # Ensure the extractor returns the full nested expression as the first argument. codeflash_output = transformer._extract_first_arg(args) # 274μs -> 205μs (34.0% faster) # Now call the extractor repeatedly (1000 iterations) with slightly varying inputs # to ensure stability and performance under repeated use. for i in range(1000): # Alternate between a simple arg and a nested arg to exercise the code paths. if (i % 3) == 0: s = f"val{i}, rest" expected = f"val{i}" elif (i % 3) == 1: s = f"fn({i}, {i+1}), more" expected = f"fn({i}, {i+1})" else: # Use a quoted argument containing a comma to ensure string handling repeats correctly. s = f"\"x,{i}\", tail" expected = f"\"x,{i}\"" # Each invocation must return the expected first argument deterministically. codeflash_output = transformer._extract_first_arg(s) # 1.86ms -> 1.36ms (36.4% faster) # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest # used for our unit tests from codeflash.languages.java.remove_asserts import JavaAssertTransformer def test_basic_simple_first_arg_extraction(): # Create a real transformer instance with a dummy function name. t = JavaAssertTransformer("assertX") # A simple argument list: no nesting, should return the first token trimmed. codeflash_output = t._extract_first_arg("foo, bar, baz") # 2.79μs -> 2.34μs (19.2% faster) # Leading whitespace should be ignored and trailing whitespace trimmed. codeflash_output = t._extract_first_arg(" singleArg ") # 4.01μs -> 2.87μs (39.8% faster) # No comma means the whole (trimmed) string is returned. codeflash_output = t._extract_first_arg("onlyOneArg") # 2.77μs -> 1.96μs (41.3% faster) def test_edge_empty_and_whitespace_inputs_return_none(): t = JavaAssertTransformer("assertX") # Empty string -> None codeflash_output = t._extract_first_arg("") # 621ns -> 601ns (3.33% faster) # String with only whitespace -> None codeflash_output = t._extract_first_arg(" \t\n ") # 1.03μs -> 1.03μs (0.097% faster) # Leading comma means there's no first top-level argument -> None codeflash_output = t._extract_first_arg(",b") # 1.00μs -> 1.05μs (4.75% slower) # Comma immediately after whitespace still yields no first argument codeflash_output = t._extract_first_arg(" , after") # 871ns -> 922ns (5.53% slower) def test_nested_parentheses_brackets_braces_and_generics_handled(): t = JavaAssertTransformer("assertX") # Parentheses: the comma inside nested parens should not split top-level args. s = "a(b, c), d" codeflash_output = t._extract_first_arg(s) # 3.76μs -> 3.00μs (25.4% faster) # Brackets and braces (array initializers / anonymous blocks) should be respected. s2 = "new int[]{1, 2, 3}, other" codeflash_output = t._extract_first_arg(s2) # 4.80μs -> 3.43μs (40.0% faster) # Angle brackets for generics: commas inside generics shouldn't split. s3 = "Map<String, List<Integer>>, other" codeflash_output = t._extract_first_arg(s3) # 6.07μs -> 4.03μs (50.7% faster) def test_strings_with_commas_and_escaped_quotes_are_respected(): t = JavaAssertTransformer("assertX") # Double-quoted string containing a comma: should be treated as a single arg. s = '"He said, \\"hello\\"", trailing' codeflash_output = t._extract_first_arg(s) # 4.32μs -> 3.03μs (42.7% faster) # Single-quoted string containing a comma: should also be preserved. s2 = "'a,b', rest" codeflash_output = t._extract_first_arg(s2) # 1.50μs -> 1.23μs (22.0% faster) # Escaped backslash sequences inside string should not break parsing. s3 = '"path\\\\to\\\\file,withcomma", next' codeflash_output = t._extract_first_arg(s3) # 3.14μs -> 1.99μs (57.3% faster) def test_char_literals_and_edge_literals(): t = JavaAssertTransformer("assertX") # Simple char literal followed by other args. codeflash_output = t._extract_first_arg("'a', b") # 2.44μs -> 1.93μs (26.4% faster) # Escaped char literal like newline should be kept intact. codeflash_output = t._extract_first_arg("'\\n', other") # 1.39μs -> 1.06μs (31.1% faster) # A lone char (no comma) should be returned trimmed. codeflash_output = t._extract_first_arg("'z'") # 901ns -> 721ns (25.0% faster) def test_casts_and_parenthetical_prefixes(): t = JavaAssertTransformer("assertX") # A cast at the front uses parentheses which must be balanced before splitting. s = "(Type) obj.method(1, 2), somethingElse" # The first top-level argument should include the cast and the method invocation. codeflash_output = t._extract_first_arg(s) # 6.85μs -> 5.02μs (36.5% faster) # Multiple nested casts and parentheses should still work. s2 = "((A) b).c(d, e), tail" codeflash_output = t._extract_first_arg(s2) # 3.73μs -> 2.67μs (39.3% faster) def test_no_top_level_comma_returns_full_trimmed_arg(): t = JavaAssertTransformer("assertX") # Complex expression without any top-level comma should be returned entirely. complex_expr = "someFunc(1, new int[]{1,2,3}, Map.<K,V>of(k,v)) " codeflash_output = t._extract_first_arg(complex_expr) # 12.5μs -> 8.82μs (41.9% faster) def test_leading_and_trailing_whitespace_trimming(): t = JavaAssertTransformer("assertX") # Leading whitespace is skipped at start; trailing whitespace trimmed from extracted arg. codeflash_output = t._extract_first_arg(" alpha , beta") # 3.97μs -> 3.12μs (27.3% faster) codeflash_output = t._extract_first_arg(" alpha ") # 2.75μs -> 1.96μs (39.8% faster) def test_large_scale_many_items_first_arg_from_long_list(): t = JavaAssertTransformer("assertX") # Build a large comma-separated list of function calls (1000 items). # The first item itself contains nested parentheses and commas that shouldn't be split. first_item = "complexFn(" + ",".join(str(i) for i in range(10)) + ")" # nested commas inside rest = ",".join(f"fn{i}()" for i in range(1, 1000)) long_args = first_item + "," + rest # The extractor should return exactly the first complex item (with nested commas intact). codeflash_output = t._extract_first_arg(long_args) # 8.50μs -> 6.31μs (34.6% faster) def test_large_scale_loop_many_iterations_stability(): t = JavaAssertTransformer("assertX") # Call the extractor 1000 times with predictable inputs to ensure stability. for i in range(1000): s = f"value{i},other{ i }" # Each call must deterministically return the correct first argument. codeflash_output = t._extract_first_arg(s) # 2.03ms -> 1.48ms (37.2% faster) def test_complex_mixture_of_all_cases(): t = JavaAssertTransformer("assertX") # A deliberately complex mixture of generics, arrays, strings, casts, and nested calls. complex_arg = ( ' (A<B>) map.get("key,stillInString", arr[0], new HashMap<String, List<Integer>>() {{ put(1, List.of(2)); }} ) ' + ", tail" ) # Ensure extraction yields the whole complex first argument trimmed. codeflash_output = t._extract_first_arg(complex_arg); extracted = codeflash_output # 23.1μs -> 15.8μs (45.9% faster) # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally git merge codeflash/optimize-pr1663-2026-02-25T22.31.35

Click to see suggested changes

Suggested change

in_string = False

string_char = ""

cur: list[str] = []

while i < n:

ch = args_str[i]

if in_string:

cur.append(ch)

if ch == "\\" and i + 1 < n:

i += 1

cur.append(args_str[i])

elif ch == string_char:

in_string = False

elif ch in ('"', "'"):

in_string = True

string_char = ch

cur.append(ch)

elif ch in ("(", "<", "[", "{"):

depth += 1

cur.append(ch)

elif ch in (")", ">", "]", "}"):

depth -= 1

cur.append(ch)

elif ch == "," and depth == 0:

break

else:

cur.append(ch)

i += 1

# Trim trailing whitespace from the extracted argument

if not cur:

return None

return "".join(cur).rstrip()

# Record start index of the extracted argument instead of building a list

start = i

while i < n:

ch = args_str[i]

if ch == '"' or ch == "'":

string_char = ch

# include opening quote; advance inside string while handling escapes

i += 1

while i < n:

ch2 = args_str[i]

if ch2 == "\\" and i + 1 < n:

i += 2

elif ch2 == string_char:

i += 1

break

else:

i += 1

elif ch == "(" or ch == "<" or ch == "[" or ch == "{":

depth += 1

i += 1

elif ch == ")" or ch == ">" or ch == "]" or ch == "}":

depth -= 1

i += 1

elif ch == "," and depth == 0:

# slice up to but not including the comma

res = args_str[start:i].rstrip()

if not res:

return None

return res

else:

i += 1

# If we reached the end without encountering a top-level comma

res = args_str[start:i].rstrip()

if not res:

return None

return res

mashraf-222 · 2026-02-25T23:48:56Z

Optimization Sweep Results (Feb 25, 2026)

Ran ~35 individual Java function optimizations across all 5 test repos to validate the full E2E pipeline on this branch.

Successful Optimizations (PRs Created)

Repo	Function	Speedup	PR
RoaringBitmap	`Util.unsignedBinarySearch`	34%	E2E validation (`--no-pr`)
Aerospike	`Buffer.utf8DigitsToInt`	13.56%	mashraf-222/aerospike-client-java#23
Aerospike	`Buffer.estimateSizeUtf8`	54.79%	mashraf-222/aerospike-client-java#24
Commons Lang	`NumberUtils.isCreatable`	124% (2.24x)	Optimization found, PR staging
Aerospike	`Buffer.estimateSizeUtf8Quick`	26.84%	mashraf-222/aerospike-client-java#25

Coverage by Repo

Repo	Functions Tried	Successes	Notes
Aerospike	9	3	Buffer utility functions (UTF-8 encoding/estimation) most optimizable
RoaringBitmap	8	1	Bitmap search/set operations, multi-module Maven project
Commons Lang	11	1	String/number utils, `isCreatable` had 2.24x speedup
Guava	11	0	Already heavily optimized; math, primitives, strings all tried
QuestDB	9	0	Zero-GC codebase, functions already at low-level performance ceiling

Pipeline Validation

All ~35 runs completed without pipeline errors — testgen, compilation, instrumentation, correctness verification, and benchmarking all working correctly across:

Single-module projects (default Java, Commons Lang, Guava)
Multi-module Maven projects (Aerospike, RoaringBitmap, QuestDB)
JUnit 4 (Aerospike) and JUnit 5 (all others) test frameworks
Various function signatures: static methods, instance methods, overloaded methods

…op_index parsing The previous code assumed test names with brackets always follow the pattern "testName[ N ]" (space after bracket). JUnit 5 parameterized tests produce names like "testName(int)[1]" or "testName(String)[label]" which caused a ValueError crash when parsing the loop index. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 · 2026-02-26T00:48:24Z

Bug Fix: JUnit parameterized test name parsing crash (`parse_test_output.py:808`)

Problem

Line 808 in parse_test_output.py assumed test names with [ always follow the JUnit 4 pattern testName[ N ] (space after bracket, numeric index, space before closing bracket):

loop_index = int(testcase.name.split("[ ")[-1][:-2]) if testcase.name and "[" in testcase.name else 1

JUnit 5 parameterized tests produce names with different bracket patterns:

testName(int)[1] — no space after [
testName(String)[label] — non-numeric content in brackets
binaryBeMsb0ToHexDigitPosOutsideArray(int)[ — malformed/truncated

The split("[ ") splits on bracket-space, which doesn't match [ without a space. The result is the entire string, [:-2] trims the last 2 chars producing garbage, and int() raises ValueError.

Crash logs from E2E runs

Commons Lang — Conversion.intToHexDigit optimization:

  File "/home/ubuntu/code/codeflash/codeflash/verification/parse_test_output.py", line 808, in parse_test_xml
    loop_index = int(testcase.name.split("[ ")[-1][:-2]) if testcase.name and "[" in testcase.name else 1
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'binaryBeMsb0ToHexDigitPosOutsideArray(int)['

Commons Lang — StringUtils.joinWith optimization:

  File "/home/ubuntu/code/codeflash/codeflash/verification/parse_test_output.py", line 808, in parse_test_xml
    loop_index = int(testcase.name.split("[ ")[-1][:-2]) if testcase.name and "[" in testcase.name else 1
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'testJoinWith(String)['

Root Cause

Pre-existing bug (authored by Kevin Turcios on 2025-12-06, commit 33437d39). Not caused by our branch changes — git diff omni-java...fix/java-maven-test-execution-bugs -- codeflash/verification/parse_test_output.py shows zero changes from our branch.

Fix (commit `dd3bdaf8`)

Replaced the fragile one-liner with robust parsing:

loop_index = 1
if testcase.name and "[" in testcase.name:
    bracket_content = testcase.name.rsplit("[", 1)[-1].rstrip("]").strip()
    try:
        loop_index = int(bracket_content)
    except ValueError:
        loop_index = 1

This:

Uses rsplit("[", 1) to get content after the last [ (handles nested brackets)
Strips ] and whitespace from the extracted content
Tries int() conversion with a try/except — defaults to 1 for non-numeric content
Handles all JUnit 4, JUnit 5, and malformed patterns gracefully

Testing

All 27 parse_test_output tests pass
All 639 Java tests pass
Verified against 8 edge cases: testName[ 5 ], testName(int)[1], testName(String)[label], binaryBeMsb0ToHexDigitPosOutsideArray(int)[, testName[3], testName, None, testName[ 0 ]

mashraf-222 · 2026-02-26T01:57:25Z

Java E2E Optimization Sweep Report — Full Results

Comprehensive optimization sweep across 5 Java repositories using the fix/java-maven-test-execution-bugs branch. All optimizations ran with local BE services, --verbose, and PR creation enabled (no --no-pr).

Successful Optimization PRs Created

#	Repo	Function	Speedup	PR
1	aerospike-client-java	`Buffer.utf8DigitsToInt`	13.6% (1.14x)	PR #23
2	aerospike-client-java	`Buffer.estimateSizeUtf8`	54.8% (1.55x)	PR #24
3	aerospike-client-java	`Buffer.estimateSizeUtf8Quick`	26.8% (1.27x)	PR #25
4	aerospike-client-java	`Buffer.bytesToNumber`	7.2% (1.07x)	PR #26
5	aerospike-client-java	`Buffer.littleBytesToLong`	42.5% (1.43x)	PR #27
6	aerospike-client-java	`Buffer.bytesToBigInteger`	11.3% (1.11x)	PR #28
7	commons-lang	`ArrayUtils.indexOf`	10.7% (1.11x)	PR #1

Total: 7 PRs with successful optimizations across 2 repos.

Optimizations Found and Staged (PR body exceeded GitHub 65K char limit)

Repo	Function	Speedup	Notes
RoaringBitmap	`Util.unsignedBinarySearch`	1280% (13.8x)	Optimization staged — PR body too long
RoaringBitmap	`Util.cardinalityInBitmapRange`	52.2% (1.52x)	Optimization staged — PR body too long
commons-lang	`NumberUtils.isCreatable`	124% (2.24x)	Optimization staged — PR body too long

These 3 optimizations were successfully found and the cfapi correctly fell back to staging. Investigation of the cfapi logs revealed the root cause:

Root cause: The GitHub API rejects PR creation when the body exceeds 65,536 characters (Validation Failed: body is too long (maximum is 65536 characters)). The GitHub App does have proper access — branch creation succeeded for all 3, and collaborator verification passed. It's only the createNewPullRequest call that fails because the assembled PR description (optimization explanation + benchmark info + existing test source + generated test source) exceeds the 65K limit for these functions which had large test suites.

cfapi log evidence:

[2026-02-26T01:14:37] INFO  | Creating branch: codeflash/optimize-Util.cardinalityInBitmapRange-mm2rtg26  ← branch created OK
[2026-02-26T01:14:41] INFO  | Creating PR with title: ⚡️ Speed up method `Util.cardinalityInBitmapRange` by 55%
[2026-02-26T01:14:42] ERROR | Validation Failed: {"resource":"Issue","code":"custom","field":"body",
                               "message":"body is too long (maximum is 65536 characters)"}
[2026-02-26T01:14:42] INFO  | PR creation failed, falling back to staging for traceId: 8f66e934-...

Note: commons-lang/ArrayUtils.indexOf (PR #1) and all Aerospike PRs succeeded because their PR bodies were within the 65K limit — likely due to smaller test suites.

No Optimization Found (function already optimal or candidates slower)

Repo	Function	Reason
aerospike-client-java	`Utf8.encodedLength`	Candidate 3.6% slower
aerospike-client-java	`Value.get`	Existing test compilation error (anonymous inner class instrumentation limitation)
aerospike-client-java	`Packer.packString`	Void function — skipped
commons-lang	`BooleanUtils.and`	No optimization found
commons-lang	`CharUtils.isAscii`	No optimization found
commons-lang	`StringUtils.containsAny`	No optimization found
commons-lang	`StringUtils.joinWith`	Crashed on parameterized test parsing bug (now fixed)
commons-lang	`Conversion.intToHexDigit`	Crashed on parameterized test parsing bug (now fixed)
guava	`UnsignedInts.toString`	No optimization found
guava	`Longs.contains`	No optimization found
guava	`Ints.saturatedCast`	No optimization found
guava	`Shorts.toByteArray`	No optimization found
questdb	`Chars.hashCode`	Token limit exceeded (Chars.java 4000+ lines)
questdb	`Chars.endsWith`	Token limit exceeded
questdb	`Chars.isBlank`	Token limit exceeded
questdb	`Numbers.encodeLowHighInts`	Token limit exceeded (Numbers.java 4000+ lines)
questdb	`Numbers.decodeLowInt`	Token limit exceeded
questdb	`Numbers.ceilPow2`	Token limit exceeded
questdb	`Hash.hashLong32`	No optimization found
questdb	`Hash.spread`	No optimization found
questdb	`Hash.boundedHash`	No optimization found
RoaringBitmap	`Util.select`	All candidates slower
RoaringBitmap	`ArrayContainer.numberOfRuns`	Slow — 195 overloads caused 27+ min processing

Bugs Found During Sweep

1. JUnit Parameterized Test Name Parsing Crash (Fixed — commit `dd3bdaf8`)

File: parse_test_output.py:808
Severity: High — crashes the entire optimization
Status: Fixed and pushed

Line 808 assumed test names with [ follow the pattern testName[ N ]. JUnit 5 parameterized tests produce testName(int)[1] or testName(String)[label] which caused ValueError. Fixed with robust rsplit/try-except parsing.

ValueError: invalid literal for int() with base 10: 'binaryBeMsb0ToHexDigitPosOutsideArray(int)['
ValueError: invalid literal for int() with base 10: 'testJoinWith(String)['

2. Anonymous Inner Class Instrumentation Limitation (Not fixed — instrumentation)

File: instrumentation.py — _add_timing_instrumentation() / _is_inside_lambda()
Severity: Medium — existing tests fail compilation, falls back to generated tests only
Status: Open — documented with root cause and suggested fix

The instrumentation code walks into anonymous inner class method bodies when looking for target function calls. It inserts timing markers at invalid positions inside anonymous class definitions, causing "illegal start of expression" compilation errors. Affects Aerospike tests (TestAsyncBatch, TestAsyncUDF).

3. QuestDB/Large File Token Limit (Known limitation)

Files like Numbers.java (4000+ lines) and Chars.java (4000+ lines) exceed the AI service's read-writable code token budget. All QuestDB optimizations on these files fail with "Read-writable code has exceeded token limit, cannot proceed". Smaller files like Hash.java (225 lines) work fine but didn't yield optimizations.

Summary

Total optimization attempts: ~35 across 5 repos
Successful PRs: 7 (6 Aerospike + 1 Commons Lang)
Optimizations found and staged: 3 (2 RoaringBitmap + 1 Commons Lang) — PR body exceeded GitHub's 65K char limit
Bugs found and fixed: 1 (parameterized test parsing)
Bugs found and documented: 1 (anonymous inner class instrumentation)
Best speedup: Util.unsignedBinarySearch at 13.8x (RoaringBitmap, staged)
Best PR'd speedup: Buffer.estimateSizeUtf8 at 54.8% (Aerospike) — PR #24

mashraf-222 and others added 3 commits February 25, 2026 15:44

codeflash-ai bot mentioned this pull request Feb 25, 2026

⚡️ Speed up function ensure_common_java_imports by 501% in PR #1663 (fix/java-maven-test-execution-bugs) #1664

Closed

HeshamHM28 requested changes Feb 25, 2026

View reviewed changes

mashraf-222 and others added 2 commits February 25, 2026 20:19

codeflash-ai bot mentioned this pull request Feb 25, 2026

⚡️ Speed up method JavaAssertTransformer._infer_return_type by 127% in PR #1663 (fix/java-maven-test-execution-bugs) #1667

Merged

codeflash-ai bot mentioned this pull request Feb 25, 2026

⚡️ Speed up method JavaAssertTransformer._infer_type_from_assertion_args by 12% in PR #1663 (fix/java-maven-test-execution-bugs) #1668

Merged

codeflash-ai bot reviewed Feb 25, 2026

View reviewed changes

codeflash-ai bot mentioned this pull request Feb 25, 2026

⚡️ Speed up method JavaAssertTransformer._generate_replacement by 13% in PR #1663 (fix/java-maven-test-execution-bugs) #1669

Merged

Merge pull request #1669 from codeflash-ai/codeflash/optimize-pr1663-…

d57b5c9

…2026-02-25T20.52.19 ⚡️ Speed up method `JavaAssertTransformer._generate_replacement` by 13% in PR #1663 (`fix/java-maven-test-execution-bugs`)

Merge pull request #1667 from codeflash-ai/codeflash/optimize-pr1663-…

ed1d2d2

…2026-02-25T20.29.24 ⚡️ Speed up method `JavaAssertTransformer._infer_return_type` by 127% in PR #1663 (`fix/java-maven-test-execution-bugs`)

Merge pull request #1668 from codeflash-ai/codeflash/optimize-pr1663-…

5338bb4

…2026-02-25T20.34.08 ⚡️ Speed up method `JavaAssertTransformer._infer_type_from_assertion_args` by 12% in PR #1663 (`fix/java-maven-test-execution-bugs`)

codeflash-ai bot reviewed Feb 25, 2026

View reviewed changes

mashraf-222 requested a review from HeshamHM28 February 25, 2026 22:38

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 47 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

-        if self._FLOAT_LITERAL_RE.match(value):
-            return "float"
-        if self._DOUBLE_LITERAL_RE.match(value):
-            return "double"
-        if self._LONG_LITERAL_RE.match(value):
-            return "long"
-        if self._INT_LITERAL_RE.match(value):
-            return "int"
-        if self._CHAR_LITERAL_RE.match(value):
-            return "char"
+        # Fast-path numeric suffix checks and simple digit/decimal validations
+        v = value
+        # Check float suffix (e.g., 1.23f or 10f)
+        if v and (v[-1] in "fF"):
+            core = v[:-1]
+            if core:
+                if core[0] == "-":
+                    core2 = core[1:]
+                else:
+                    core2 = core
+                if core2:
+                    if "." in core2:
+                        a, b = core2.split(".", 1)
+                        if a.isdigit() and (b == "" or b.isdigit()):
+                            return "float"
+                    else:
+                        if core2.isdigit():
+                            return "float"
+        # Check double suffix (e.g., 1.23d or 10d)
+        if v and (v[-1] in "dD"):
+            core = v[:-1]
+            if core:
+                if core[0] == "-":
+                    core2 = core[1:]
+                else:
+                    core2 = core
+                if core2:
+                    if "." in core2:
+                        a, b = core2.split(".", 1)
+                        if a.isdigit() and (b == "" or b.isdigit()):
+                            return "double"
+                    else:
+                        if core2.isdigit():
+                            return "double"
+        # Decimal without suffix implies double if it fits the pattern like 1. or 1.23
+        if "." in v:
+            core = v
+            if core and core[0] == "-":
+                core = core[1:]
+            a, b = core.split(".", 1)
+            if a.isdigit() and (b == "" or b.isdigit()):
+                return "double"
+        # Long suffix (e.g., 123L)
+        if v and (v[-1] in "lL"):
+            core = v[:-1]
+            if core:
+                if core[0] == "-":
+                    core2 = core[1:]
+                else:
+                    core2 = core
+                if core2.isdigit():
+                    return "long"
+        # Integer literal (e.g., 123)
+        tmp = v
+        if tmp and tmp[0] == "-":
+            tmp2 = tmp[1:]
+        else:
+            tmp2 = tmp
+        if tmp2.isdigit():
+            return "int"
+        # Character literal ('a' or '\n')
+        # Matches patterns of length 3 (e.g. 'a') or length 4 (e.g. '\n')
+        if len(v) in (3, 4) and v.startswith("'") and v.endswith("'"):
+            if len(v) == 3:
+                return "char"
+            # len == 4: must be an escape like '\n'
+            if v[1] == "\\":
+                return "char"

Conversation

mashraf-222 commented Feb 25, 2026

Problems fixed

Root causes

Solutions implemented

Code changes

Testing

Impact

Uh oh!

mashraf-222 commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Validation Report

Baseline validations

1. Default Java project — Fibonacci.fibonacci (--no-pr)

2. Aerospike — Utf8.encodedLength (with PR creation)

Bug-specific validations on previously-blocked repos

3. Commons Lang — StringUtils.containsAny (Phase 1: Rat skip fix)

4. Guava — Strings.repeat (Phase 2: surefire fix)

5. RoaringBitmap — Util.unsignedBinarySearch (Phase 3: auto-import fix)

6. QuestDB — Numbers.ceilPow2 (known limitation, out of scope)

Summary table

Uh oh!

HeshamHM28 Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

mashraf-222 Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

mashraf-222 commented Feb 25, 2026

Round 2 E2E Validation Results

New Fixes

E2E Results (8 optimizations across 4 repos)

Key Progress

Remaining Bugs (not in this PR's scope)

Uh oh!

codeflash-ai bot commented Feb 25, 2026

⚡️ Codeflash found optimizations for this PR

📄 127% (1.27x) speedup for JavaAssertTransformer._infer_return_type in codeflash/languages/java/remove_asserts.py

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up method JavaAssertTransformer._infer_return_type by 127% in PR #1663 (fix/java-maven-test-execution-bugs) #1667

Uh oh!

codeflash-ai bot commented Feb 25, 2026

⚡️ Codeflash found optimizations for this PR

📄 12% (0.12x) speedup for JavaAssertTransformer._infer_type_from_assertion_args in codeflash/languages/java/remove_asserts.py

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up method JavaAssertTransformer._infer_type_from_assertion_args by 12% in PR #1663 (fix/java-maven-test-execution-bugs) #1668

Uh oh!

codeflash-ai bot Feb 25, 2026

Choose a reason for hiding this comment

⚡️Codeflash found 96% (0.96x) speedup for JavaAssertTransformer._split_top_level_args in codeflash/languages/java/remove_asserts.py

Uh oh!

codeflash-ai bot commented Feb 25, 2026

⚡️ Codeflash found optimizations for this PR

📄 13% (0.13x) speedup for JavaAssertTransformer._generate_replacement in codeflash/languages/java/remove_asserts.py

A dependent PR with the suggested changes has been created. Please review:

⚡️ Speed up method JavaAssertTransformer._generate_replacement by 13% in PR #1663 (fix/java-maven-test-execution-bugs) #1669

Uh oh!

codeflash-ai bot commented Feb 25, 2026

Uh oh!

codeflash-ai bot commented Feb 25, 2026

Uh oh!

codeflash-ai bot commented Feb 25, 2026

Uh oh!

codeflash-ai bot Feb 25, 2026

Choose a reason for hiding this comment

⚡️Codeflash found 64% (0.64x) speedup for JavaAssertTransformer._type_from_literal in codeflash/languages/java/remove_asserts.py

Uh oh!

mashraf-222 commented Feb 25, 2026

Latest E2E Validation Report (Feb 25, 2026)

Changes in this update

Uh oh!

codeflash-ai bot Feb 25, 2026

Choose a reason for hiding this comment

⚡️Codeflash found 37% (0.37x) speedup for JavaAssertTransformer._extract_first_arg in codeflash/languages/java/remove_asserts.py

Uh oh!

mashraf-222 commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Optimization Sweep Results (Feb 25, 2026)

Successful Optimizations (PRs Created)

Coverage by Repo

mashraf-222 commented Feb 25, 2026 •

edited

Loading

1. Default Java project — `Fibonacci.fibonacci` (`--no-pr`)

2. Aerospike — `Utf8.encodedLength` (with PR creation)

3. Commons Lang — `StringUtils.containsAny` (Phase 1: Rat skip fix)

4. Guava — `Strings.repeat` (Phase 2: surefire fix)

5. RoaringBitmap — `Util.unsignedBinarySearch` (Phase 3: auto-import fix)

6. QuestDB — `Numbers.ceilPow2` (known limitation, out of scope)

📄 127% (1.27x) speedup for `JavaAssertTransformer._infer_return_type` in `codeflash/languages/java/remove_asserts.py`

⚡️ Speed up method `JavaAssertTransformer._infer_return_type` by 127% in PR #1663 (`fix/java-maven-test-execution-bugs`) #1667

📄 12% (0.12x) speedup for `JavaAssertTransformer._infer_type_from_assertion_args` in `codeflash/languages/java/remove_asserts.py`

⚡️ Speed up method `JavaAssertTransformer._infer_type_from_assertion_args` by 12% in PR #1663 (`fix/java-maven-test-execution-bugs`) #1668

⚡️Codeflash found 96% (0.96x) speedup for `JavaAssertTransformer._split_top_level_args` in `codeflash/languages/java/remove_asserts.py`

📄 13% (0.13x) speedup for `JavaAssertTransformer._generate_replacement` in `codeflash/languages/java/remove_asserts.py`

⚡️ Speed up method `JavaAssertTransformer._generate_replacement` by 13% in PR #1663 (`fix/java-maven-test-execution-bugs`) #1669

⚡️Codeflash found 64% (0.64x) speedup for `JavaAssertTransformer._type_from_literal` in `codeflash/languages/java/remove_asserts.py`

⚡️Codeflash found 37% (0.37x) speedup for `JavaAssertTransformer._extract_first_arg` in `codeflash/languages/java/remove_asserts.py`

mashraf-222 commented Feb 25, 2026 •

edited

Loading

mashraf-222 commented Feb 26, 2026 •

edited

Loading

Bug Fix: JUnit parameterized test name parsing crash (`parse_test_output.py:808`)

Fix (commit `dd3bdaf8`)

mashraf-222 commented Feb 26, 2026 •

edited

Loading

1. JUnit Parameterized Test Name Parsing Crash (Fixed — commit `dd3bdaf8`)