[WIP] Array theory improvements by tautschnig · Pull Request #8905 · diffblue/cbmc

tautschnig · 2026-03-28T19:41:58Z

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Skip generating Ackermann constraints for arrays that are derived from other arrays via with (store), if, array_of, array constants, array_comprehension, typecast, or let expressions. For a derived array such as x = y with [k := v], the Ackermann constraint i1 = i2 => x[i1] = x[i2] is already implied by: (1) the with constraint: k != j => x[j] = y[j], and (2) the Ackermann constraint on the base array y. This is the read-over-weakeq optimisation from the theory of weakly equivalent arrays (Christ & Hoenicke, 2014). The same reasoning applies to if, array_of, and other derived array expressions, all of which already have constraints connecting them element-wise to their constituent arrays. With 5 stores to the same unbounded array the Ackermann constraint count drops from 110 to 60; with 40 stores it drops from 63180 to 31980 (approximately 50% reduction in all cases). Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

collect_arrays() and add_array_constraints() restricted member expressions to only those whose struct operand was a symbol or nondet_symbol. With --arrays-uf-always, member expressions can have more complex struct operands (index, nested member, struct literal) when arrays are embedded in structs. These are base array expressions that need no special constraint generation, just like symbols. The overly strict invariant caused crashes on tests like Array_operations2, array-bug-6230, byte_update18, and bounds_check1 when run with --arrays-uf-always. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

When lower_byte_operators() is applied to array equalities and index expressions, the result can contain member-of-struct-literal expressions (e.g., member(struct{...}, A)) that the array theory treats as opaque base arrays. These expressions are trivially simplifiable to the actual array value by simplify_expr. Without simplification, the array theory registers these as unconstrained arrays, leading to spurious counterexamples. With simplification, member(struct_literal, field) is reduced to the field value, allowing the array theory to properly constrain it. Also handle the case where simplification fully resolves an index expression (e.g., index into a constant array with constant index), so the result is no longer an index_exprt. This fixes byte_update18 and array-bug-6230 when run with --arrays-uf-always. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

When arrays are cast between types with different element sizes (e.g., SIMD reinterpretation int32[4] <-> int64[2]), the array theory's element-wise constraint a[i]=b[i] is incorrect because elements don't correspond one-to-one. The bitvector-level conversion handles these casts correctly as bitwise copies. Three changes: 1. collect_arrays: don't unify typecast arrays when element types differ, since they have different index domains. 2. add_array_constraints: skip generating typecast constraints when element types differ. 3. convert_index: when indexing into a typecast with different element types, lower to byte_extract on the source array instead of using the array decision procedure. This fixes SIMD1 when run with --arrays-uf-always. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

…-always Add a KNOWNBUG test descriptor that runs Array_operations2 with --arrays-uf-always, documenting the root cause: the array theory's union-find transitively unifies structurally different arrays that share the same initial value literal. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

After lower_byte_operators and simplify_expr, an array equality may simplify to a non-equal expression (e.g., true/false). Check the result before calling to_equal_expr to avoid an invariant violation. This fixes crashes in address_space_size_limit3, byte_update14, and array-cell-sensitivity7 when run with --arrays-uf-always. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Add a cbmc-arrays-uf-always CMake test profile that runs all CORE regression tests with --arrays-uf-always, following the same pattern as the existing cbmc-cprover-smt2 and cbmc-new-smt-backend profiles. Introduce two new test tags: - broken-arrays-uf-always: tests with known incorrect results (Array_operations2 union-find conflation, trace-values format) - thorough-arrays-uf-always: tests too slow for CI (bounds_check1, union/union_large_array) Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Change the with-expression and if-expression array constraints from eager to lazy (deferred to refinement). Previously only Ackermann constraints were lazy, making --refine-arrays largely ineffective since the bulk of array constraints were still added eagerly. With this change, --refine-arrays defers with, if, and Ackermann constraints, adding them only when the refinement loop detects they are violated by the current model. This can dramatically reduce the number of constraints passed to the SAT solver. Benchmark on bounds_check1 with --arrays-uf-always: Without --refine-arrays: 83s, 287MB With --refine-arrays: 2s, 233MB Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Add a test descriptor that runs bounds_check1 with both flags, documenting the performance improvement from lazy with/if/Ackermann constraints (~2s vs ~85s without --refine-arrays). Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Add a cbmc-arrays-uf-always-refine-arrays CMake test profile that runs all CORE tests with both flags. This documents the performance improvement from lazy with/if/Ackermann constraints. Introduce broken-refine-arrays tag for tests with pre-existing --refine-arrays issues (MiniSat eliminated-variable crashes): Multi_Dimensional_Array6, address_space_size_limit3, array_of_bool_as_bitvec (SMT output), Array_UF23 (constraint counts) Total test execution time comparison: --arrays-uf-always: 84s (1093 tests, 71 skipped) --arrays-uf-always --refine-arrays: 75s (1090 tests, 75 skipped) Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

MiniSat's SimpSolver eliminates propositional variables during preprocessing. When --refine-arrays adds clauses incrementally in the refinement loop, it can reference variables that were eliminated, causing an assertion failure in addClause_. This is the same issue that --refine-strings already works around. Disable the simplifier when --refine-arrays is active, matching the existing pattern for --refine-strings. Fixes the crash in Multi_Dimensional_Array6 and address_space_size_limit3 when run with --arrays-uf-always --refine-arrays. Remove the broken-refine-arrays tag from address_space_size_limit3 since it now passes. The remaining broken-refine-arrays tests (Multi_Dimensional_Array6, Array_UF23, array_of_bool_as_bitvec) have pre-existing --refine-arrays issues unrelated to the simplifier: - Multi_Dimensional_Array6/Array_UF23: refinement loop does not converge (insufficient constraint activation in arrays_overapproximated) - array_of_bool_as_bitvec: --refine-arrays overrides --smt2 --outfile Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Making with/if constraints lazy for --refine-arrays caused the refinement loop to not converge: the arrays_overapproximated() check evaluates each lazy constraint against the current SAT model, but with/if constraints can appear satisfied by coincidence when the solver freely assigns array element values. This led to spurious counterexamples (Multi_Dimensional_Array6) and infinite loops (Array_UF23). Only Ackermann constraints are suitable for lazy refinement because their activation depends on index equality, which is correctly determined from the bitvector encoding in the SAT model. The main performance benefit of --refine-arrays comes from lazy Ackermann constraints (e.g., bounds_check1: 83s -> 2s), so keeping with/if eager does not significantly impact the speedup. Remove broken-refine-arrays tag from Multi_Dimensional_Array6 since it now passes correctly. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

lower_byte_extract_array_vector expands byte_extract of an array type into an element-by-element array_exprt. For large arrays (e.g., char[100000] in a union), this creates N sub-expressions that are each recursively lowered and simplified, resulting in O(N^2) behavior. When the array size exceeds MAX_FLATTENED_ARRAY_SIZE (1000), use array_comprehension_exprt instead. This path already exists for arrays with non-constant size; now it is also used for large constant-size arrays. This reduces the lowering from O(N) expressions to O(1). Performance on union_large_array (char[100000] in a union): Before: >120s with --arrays-uf-always After: 2.3s with --arrays-uf-always Remove thorough-arrays-uf-always tag from union_large_array.desc since the test now completes in 2 seconds. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

The invariant violation for member expressions with index struct operands was already fixed in commit db83e74d86 ("Accept member expressions with arbitrary struct operands in array solver"). Promote the test to CORE and update the description to reflect that this is a passing test. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

When indexing an array of structs containing array members through a nondeterministic index (e.g., a[i].d[0]), the expression member(index(outer_array, i), field) is an array that the array theory treats as an opaque base array, generating no constraints for it. This causes the bitvector solver to leave the element values unconstrained, producing spurious counterexamples. Fix: in convert_index, when the array operand is a member expression whose struct operand is not a symbol/nondet_symbol and the array type has a known finite size, bypass the array decision procedure and fall through to the bounded-array encoding. This lets the bitvector solver directly extract the member field bits from the struct bitvector (which is properly constrained by the outer array's theory), then select the element based on the index. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

… arrays Add a KNOWNBUG test for a spurious counterexample when --arrays-uf-always is used with an array of structs containing array members with 65+ elements, accessed through a nondeterministic index. Structs with array members of 64 or fewer elements work correctly (covered by the arrays-uf-always-member-soundness test). Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

When --arrays-uf-always is used, symbol arrays have two disconnected representations in the bitvector solver: 1. Map literals (from convert_symbol) - used when the full bitvector is needed, e.g., in struct literals 2. Element-wise free variables (from convert_index via array theory) - used for individual element access The array theory constrains the element-wise free variables (via with, if, Ackermann constraints) but not the map literals. When a symbol array appears inside a struct that is an element of another array, the struct's bitvector uses the unconstrained map literals, causing spurious counterexamples. Fix: in boolbv_set_equality_to_true, when processing an equality for a symbol of unbounded array type with known finite size (up to MAX_FLATTENED_ARRAY_SIZE elements), connect the symbol's map literals to the element-wise bitvectors by calling convert_bv on index expressions for each element and using map.set_literals to equate them. This ensures that the array theory's element-wise constraints propagate to the map literals used in struct contexts. The equality is still passed to the array theory for element-wise constraint generation. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

Add TIMEOUT 1200 to the set_tests_properties call in both copies of the add_test_pl_profile macro (regression/CMakeLists.txt and regression/libcprover-cpp/CMakeLists.txt). This ensures CTest will kill and report as failed any regression test that exceeds 20 minutes, preventing CI jobs from hanging indefinitely on tests that time out. Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>

tautschnig and others added 18 commits March 4, 2026 15:13

tautschnig self-assigned this Mar 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Array theory improvements#8905

[WIP] Array theory improvements#8905
tautschnig wants to merge 18 commits intodiffblue:developfrom
tautschnig:array-fixes

tautschnig commented Mar 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tautschnig commented Mar 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant