fix: reject vacuous equivalence and deserialization error false matches in Java comparator by mashraf-222 · Pull Request #1651 · codeflash-ai/codeflash

mashraf-222 · 2026-02-24T19:57:55Z

Problems fixed

The Java Comparator (Comparator.java) could return equivalent=true despite having zero evidence of actual equivalence. Two distinct correctness gaps existed:

Issue 1 — Vacuous equivalence: If both SQLite databases were empty (e.g., test infrastructure failed before any function ran), or if every invocation was skipped due to placeholder or deserialization errors, the diffs list stayed empty and equivalent=true was returned. No guard existed anywhere in the chain — not in the Java comparator, not in the Python wrapper, and not in the function optimizer. This could silently approve a broken optimization that produced no return values at all.

Issue 2 — DeserializationError false match: When Kryo deserialization fails (e.g., class version mismatch after optimization changed imports), safeDeserialize() returns a synthetic Map{"__type":"DeserializationError","error":"..."}. If deserialization failed identically on both sides, the two error maps would pass compareMaps() as equivalent — even though the actual return values were never compared. This is analogous to comparing two broken thermometers that both show "ERR" and concluding the temperature is the same.

Both gaps are modeled after guards already present in the Python equivalence pipeline: equivalence.py:32-33 rejects empty results, equivalence.py:145-146 rejects all-timeout scenarios, and comparator.py:103-107 rejects PicklePlaceholderAccessError.

Root causes

The equivalence check was diffs.isEmpty() with no minimum-comparison requirement. An empty diff list is the default state before any comparisons run, so zero comparisons trivially satisfied the condition.
safeDeserialize() converts deserialization failures into a Map with a __type key, but no code checked for this sentinel value before feeding it into compareMaps(). Two identical error maps passed deep comparison as a legitimate match.
There was no observability into what the comparator actually did — no counters for how many invocations were compared vs skipped, making these silent failures invisible in logs.

Solutions implemented

Defense-in-depth across both Java and Python layers:

Java Comparator (Comparator.java):

Added three tracking counters: actualComparisons, skippedPlaceholders, skippedDeserializationErrors
Changed the equivalence condition from diffs.isEmpty() to diffs.isEmpty() && actualComparisons > 0 — equivalence now requires at least one real comparison
Added isDeserializationError() helper that detects synthetic error maps before they reach compareMaps()
Invocations with deserialization errors on either side are skipped (not counted as comparisons, not counted as diffs)
All three counters are included in the JSON output and in a stderr diagnostic line

Python wrapper (comparator.py):

Independent zero-comparisons guard: if actualComparisons == 0, return NOT equivalent regardless of what Java reported
Backward compatible: old JARs that don't output actualComparisons default to -1, which doesn't trigger the guard
Enhanced logging with all new counter fields for debugging
Fixed stale docstring that said "Gson" when the implementation uses Kryo

Testability refactor:

Extracted compareDatabases() from main() as a package-private static method, since main() calls System.exit() which kills the test runner
main() is now a thin wrapper that calls compareDatabases(), prints the result, and exits

Code changes

`codeflash-java-runtime/src/main/java/com/codeflash/Comparator.java`

Extracted core logic from main() into static String compareDatabases(String, String) for testability
Added isDeserializationError(Object) helper — checks if an object is a synthetic {"__type":"DeserializationError",...} map
Added actualComparisons, skippedPlaceholders, skippedDeserializationErrors counters
Changed equivalence condition: diffs.isEmpty() && actualComparisons > 0
Added all counters to JSON output fields
Added [codeflash-comparator] stderr diagnostic line with all counter values

`codeflash/languages/java/comparator.py`

Extract actualComparisons, skippedPlaceholders, skippedDeserializationErrors from Java JSON output
Added zero-comparisons guard: if actual_comparisons == 0: return False, [] with WARNING log
Added stderr debug logging for Java comparator output
Updated info log to include all new counter fields
Fixed docstring: "Gson" → "Kryo"

`codeflash-java-runtime/src/test/java/com/codeflash/ComparatorCorrectnessTest.java` (new)

9 test cases exercising all correctness scenarios via temporary SQLite databases:

testEmptyDatabases — vacuous equivalence guard (zero rows)
testAllPlaceholderSkips — all invocations skipped due to KryoPlaceholder
testDeserializationErrorSkipped — corrupted Kryo data triggers deserialization error skip
testMixedRealAndPlaceholder — mix of real comparisons and placeholder skips
testNormalHappyPath — standard matching return values
testNormalMismatch — differing return values produce diffs
testVoidMethodsBothNull — void methods count as real comparisons
testOneSideEmpty — asymmetric failure (one DB empty)
testIsDeserializationError — unit test for the helper method

Testing

Unit tests:

mvn test (in codeflash-java-runtime/)
Tests run: 236, Failures: 0, Errors: 0, Skipped: 0 — BUILD SUCCESS

All 9 new tests pass alongside 227 existing tests.

E2E validation — Fibonacci (--no-pr):

[codeflash-comparator] total=10 compared=10 skipped_placeholders=0 skipped_deser_errors=0 diffs=0 equivalent=true
Java comparison: equivalent (10 invocations, 10 compared, 0 placeholder skips, 0 deser skips, 0 diffs)

Optimization found and marked as success.

E2E validation — Aerospike Utf8.encodedLength (with PR):

[codeflash-comparator] total=3 compared=3 skipped_placeholders=0 skipped_deser_errors=0 diffs=0 equivalent=true
Java comparison: equivalent (3 invocations, 3 compared, 0 placeholder skips, 0 deser skips, 0 diffs)

Completed without errors (no optimization found — function already optimal).

E2E validation — Calculator.calculateStats (Map return type, --no-pr):

[codeflash-comparator] total=1 compared=1 skipped_placeholders=0 skipped_deser_errors=0 diffs=0 equivalent=true
Java comparison: equivalent (1 invocations, 1 compared, 0 placeholder skips, 0 deser skips, 0 diffs)

Complex object (Map<String, Double>) through Kryo round-trip. Optimization found and succeeded.

E2E validation — NetworkStats.getConnectionInfo (Socket in return value, --no-pr):

[codeflash-comparator] total=2 compared=0 skipped_placeholders=2 skipped_deser_errors=0 diffs=0 equivalent=false
Java comparison: no actual comparisons performed (total=2, skipped_placeholders=2, skipped_deser_errors=0). Treating as NOT equivalent.

Confirmed the placeholder skip path and the Python zero-comparisons guard both work correctly in a real E2E flow.

Impact

Prevents silently accepting broken optimizations that produce no return values or only deserialization errors
Adds observability into the comparison process — every run now logs exactly how many invocations were compared, skipped, or diffed
Backward compatible — old JARs without the new fields continue to work with the updated Python wrapper
Modeled after existing Python-side guards for consistency across language pipelines

🤖 Generated with Claude Code

…es in Java comparator Guard against two correctness gaps where the comparator could return equivalent=true with zero evidence: empty databases and identical DeserializationError maps. Add tracking counters and stderr logging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…guard The test previously expected empty databases to return equivalent=True, which was the exact bug being fixed. Updated to assert equivalent=False. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude · 2026-03-03T19:42:15Z

The resources JAR used by CI unit tests was stale and didn't include the Comparator changes from this PR. This caused test_comparator_empty_table to fail on all CI runners. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The unit-tests workflow relied on a pre-committed JAR binary in resources/ which could become stale when Comparator.java changes. Now the workflow builds the JAR from source and installs it to the local Maven repo, matching what java-e2e and fibonacci-nogit already do. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 requested review from HeshamHM28 and misrasaurabh1 and removed request for misrasaurabh1 February 24, 2026 20:01

mashraf-222 and others added 2 commits March 3, 2026 19:40

test: update empty-table test to expect not-equivalent after vacuous …

e6ca15c

…guard The test previously expected empty databases to return equivalent=True, which was the exact bug being fixed. Updated to assert equivalent=False. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 force-pushed the fix/java-comparator-vacuous-equivalence branch from fde20ed to e6ca15c Compare March 3, 2026 19:41

mashraf-222 and others added 2 commits March 3, 2026 19:59

github-actions bot added the workflow-modified This PR modifies GitHub Actions workflows label Mar 3, 2026

mashraf-222 merged commit dc7caa3 into omni-java Mar 3, 2026
25 of 31 checks passed

mashraf-222 deleted the fix/java-comparator-vacuous-equivalence branch March 3, 2026 20:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: reject vacuous equivalence and deserialization error false matches in Java comparator#1651

fix: reject vacuous equivalence and deserialization error false matches in Java comparator#1651
mashraf-222 merged 4 commits intoomni-javafrom
fix/java-comparator-vacuous-equivalence

mashraf-222 commented Feb 24, 2026

Uh oh!

claude bot commented Mar 3, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mashraf-222 commented Feb 24, 2026

Problems fixed

Root causes

Solutions implemented

Code changes

codeflash-java-runtime/src/main/java/com/codeflash/Comparator.java

codeflash/languages/java/comparator.py

codeflash-java-runtime/src/test/java/com/codeflash/ComparatorCorrectnessTest.java (new)

Testing

Impact

Uh oh!

claude bot commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

`codeflash-java-runtime/src/main/java/com/codeflash/Comparator.java`

`codeflash/languages/java/comparator.py`

`codeflash-java-runtime/src/test/java/com/codeflash/ComparatorCorrectnessTest.java` (new)

claude bot commented Mar 3, 2026 •

edited

Loading