Skip to content

[WIP] Array theory improvements#8905

Draft
tautschnig wants to merge 18 commits intodiffblue:developfrom
tautschnig:array-fixes
Draft

[WIP] Array theory improvements#8905
tautschnig wants to merge 18 commits intodiffblue:developfrom
tautschnig:array-fixes

Conversation

@tautschnig
Copy link
Copy Markdown
Collaborator

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

tautschnig and others added 18 commits March 4, 2026 15:13
Skip generating Ackermann constraints for arrays that are derived from
other arrays via with (store), if, array_of, array constants,
array_comprehension, typecast, or let expressions.

For a derived array such as x = y with [k := v], the Ackermann
constraint i1 = i2 => x[i1] = x[i2] is already implied by:
  (1) the with constraint: k != j => x[j] = y[j], and
  (2) the Ackermann constraint on the base array y.
This is the read-over-weakeq optimisation from the theory of weakly
equivalent arrays (Christ & Hoenicke, 2014).

The same reasoning applies to if, array_of, and other derived array
expressions, all of which already have constraints connecting them
element-wise to their constituent arrays.

With 5 stores to the same unbounded array the Ackermann constraint
count drops from 110 to 60; with 40 stores it drops from 63180 to
31980 (approximately 50% reduction in all cases).

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
collect_arrays() and add_array_constraints() restricted member
expressions to only those whose struct operand was a symbol or
nondet_symbol. With --arrays-uf-always, member expressions can have
more complex struct operands (index, nested member, struct literal)
when arrays are embedded in structs. These are base array expressions
that need no special constraint generation, just like symbols.

The overly strict invariant caused crashes on tests like
Array_operations2, array-bug-6230, byte_update18, and bounds_check1
when run with --arrays-uf-always.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
When lower_byte_operators() is applied to array equalities and index
expressions, the result can contain member-of-struct-literal
expressions (e.g., member(struct{...}, A)) that the array theory
treats as opaque base arrays. These expressions are trivially
simplifiable to the actual array value by simplify_expr.

Without simplification, the array theory registers these as
unconstrained arrays, leading to spurious counterexamples. With
simplification, member(struct_literal, field) is reduced to the
field value, allowing the array theory to properly constrain it.

Also handle the case where simplification fully resolves an index
expression (e.g., index into a constant array with constant index),
so the result is no longer an index_exprt.

This fixes byte_update18 and array-bug-6230 when run with
--arrays-uf-always.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
When arrays are cast between types with different element sizes (e.g.,
SIMD reinterpretation int32[4] <-> int64[2]), the array theory's
element-wise constraint a[i]=b[i] is incorrect because elements don't
correspond one-to-one. The bitvector-level conversion handles these
casts correctly as bitwise copies.

Three changes:
1. collect_arrays: don't unify typecast arrays when element types
   differ, since they have different index domains.
2. add_array_constraints: skip generating typecast constraints when
   element types differ.
3. convert_index: when indexing into a typecast with different element
   types, lower to byte_extract on the source array instead of using
   the array decision procedure.

This fixes SIMD1 when run with --arrays-uf-always.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
…-always

Add a KNOWNBUG test descriptor that runs Array_operations2 with
--arrays-uf-always, documenting the root cause: the array theory's
union-find transitively unifies structurally different arrays that
share the same initial value literal.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
After lower_byte_operators and simplify_expr, an array equality may
simplify to a non-equal expression (e.g., true/false). Check the
result before calling to_equal_expr to avoid an invariant violation.

This fixes crashes in address_space_size_limit3, byte_update14, and
array-cell-sensitivity7 when run with --arrays-uf-always.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Add a cbmc-arrays-uf-always CMake test profile that runs all CORE
regression tests with --arrays-uf-always, following the same pattern
as the existing cbmc-cprover-smt2 and cbmc-new-smt-backend profiles.

Introduce two new test tags:
- broken-arrays-uf-always: tests with known incorrect results
  (Array_operations2 union-find conflation, trace-values format)
- thorough-arrays-uf-always: tests too slow for CI
  (bounds_check1, union/union_large_array)

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Change the with-expression and if-expression array constraints from
eager to lazy (deferred to refinement). Previously only Ackermann
constraints were lazy, making --refine-arrays largely ineffective
since the bulk of array constraints were still added eagerly.

With this change, --refine-arrays defers with, if, and Ackermann
constraints, adding them only when the refinement loop detects they
are violated by the current model. This can dramatically reduce the
number of constraints passed to the SAT solver.

Benchmark on bounds_check1 with --arrays-uf-always:
  Without --refine-arrays: 83s, 287MB
  With --refine-arrays:     2s, 233MB

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Add a test descriptor that runs bounds_check1 with both flags,
documenting the performance improvement from lazy with/if/Ackermann
constraints (~2s vs ~85s without --refine-arrays).

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Add a cbmc-arrays-uf-always-refine-arrays CMake test profile that
runs all CORE tests with both flags. This documents the performance
improvement from lazy with/if/Ackermann constraints.

Introduce broken-refine-arrays tag for tests with pre-existing
--refine-arrays issues (MiniSat eliminated-variable crashes):
  Multi_Dimensional_Array6, address_space_size_limit3,
  array_of_bool_as_bitvec (SMT output), Array_UF23 (constraint counts)

Total test execution time comparison:
  --arrays-uf-always:                  84s (1093 tests, 71 skipped)
  --arrays-uf-always --refine-arrays:  75s (1090 tests, 75 skipped)

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
MiniSat's SimpSolver eliminates propositional variables during
preprocessing. When --refine-arrays adds clauses incrementally in
the refinement loop, it can reference variables that were eliminated,
causing an assertion failure in addClause_.

This is the same issue that --refine-strings already works around.
Disable the simplifier when --refine-arrays is active, matching the
existing pattern for --refine-strings.

Fixes the crash in Multi_Dimensional_Array6 and address_space_size_limit3
when run with --arrays-uf-always --refine-arrays. Remove the
broken-refine-arrays tag from address_space_size_limit3 since it now
passes.

The remaining broken-refine-arrays tests (Multi_Dimensional_Array6,
Array_UF23, array_of_bool_as_bitvec) have pre-existing --refine-arrays
issues unrelated to the simplifier:
- Multi_Dimensional_Array6/Array_UF23: refinement loop does not converge
  (insufficient constraint activation in arrays_overapproximated)
- array_of_bool_as_bitvec: --refine-arrays overrides --smt2 --outfile

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Making with/if constraints lazy for --refine-arrays caused the
refinement loop to not converge: the arrays_overapproximated() check
evaluates each lazy constraint against the current SAT model, but
with/if constraints can appear satisfied by coincidence when the
solver freely assigns array element values. This led to spurious
counterexamples (Multi_Dimensional_Array6) and infinite loops
(Array_UF23).

Only Ackermann constraints are suitable for lazy refinement because
their activation depends on index equality, which is correctly
determined from the bitvector encoding in the SAT model.

The main performance benefit of --refine-arrays comes from lazy
Ackermann constraints (e.g., bounds_check1: 83s -> 2s), so keeping
with/if eager does not significantly impact the speedup.

Remove broken-refine-arrays tag from Multi_Dimensional_Array6 since
it now passes correctly.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
lower_byte_extract_array_vector expands byte_extract of an array
type into an element-by-element array_exprt. For large arrays
(e.g., char[100000] in a union), this creates N sub-expressions
that are each recursively lowered and simplified, resulting in
O(N^2) behavior.

When the array size exceeds MAX_FLATTENED_ARRAY_SIZE (1000), use
array_comprehension_exprt instead. This path already exists for
arrays with non-constant size; now it is also used for large
constant-size arrays. This reduces the lowering from O(N)
expressions to O(1).

Performance on union_large_array (char[100000] in a union):
  Before: >120s with --arrays-uf-always
  After:  2.3s with --arrays-uf-always

Remove thorough-arrays-uf-always tag from union_large_array.desc
since the test now completes in 2 seconds.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
The invariant violation for member expressions with index struct operands
was already fixed in commit db83e74d86 ("Accept member expressions with
arbitrary struct operands in array solver"). Promote the test to CORE
and update the description to reflect that this is a passing test.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
When indexing an array of structs containing array members through a
nondeterministic index (e.g., a[i].d[0]), the expression
member(index(outer_array, i), field) is an array that the array theory
treats as an opaque base array, generating no constraints for it. This
causes the bitvector solver to leave the element values unconstrained,
producing spurious counterexamples.

Fix: in convert_index, when the array operand is a member expression
whose struct operand is not a symbol/nondet_symbol and the array type
has a known finite size, bypass the array decision procedure and fall
through to the bounded-array encoding. This lets the bitvector solver
directly extract the member field bits from the struct bitvector
(which is properly constrained by the outer array's theory), then
select the element based on the index.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
… arrays

Add a KNOWNBUG test for a spurious counterexample when --arrays-uf-always
is used with an array of structs containing array members with 65+
elements, accessed through a nondeterministic index. Structs with array
members of 64 or fewer elements work correctly (covered by the
arrays-uf-always-member-soundness test).

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
When --arrays-uf-always is used, symbol arrays have two disconnected
representations in the bitvector solver:
1. Map literals (from convert_symbol) - used when the full bitvector
   is needed, e.g., in struct literals
2. Element-wise free variables (from convert_index via array theory) -
   used for individual element access

The array theory constrains the element-wise free variables (via with,
if, Ackermann constraints) but not the map literals. When a symbol
array appears inside a struct that is an element of another array, the
struct's bitvector uses the unconstrained map literals, causing
spurious counterexamples.

Fix: in boolbv_set_equality_to_true, when processing an equality for
a symbol of unbounded array type with known finite size (up to
MAX_FLATTENED_ARRAY_SIZE elements), connect the symbol's map literals
to the element-wise bitvectors by calling convert_bv on index
expressions for each element and using map.set_literals to equate
them. This ensures that the array theory's element-wise constraints
propagate to the map literals used in struct contexts.

The equality is still passed to the array theory for element-wise
constraint generation.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
Add TIMEOUT 1200 to the set_tests_properties call in both copies of the
add_test_pl_profile macro (regression/CMakeLists.txt and
regression/libcprover-cpp/CMakeLists.txt). This ensures CTest will kill
and report as failed any regression test that exceeds 20 minutes,
preventing CI jobs from hanging indefinitely on tests that time out.

Co-authored-by: Kiro (autonomous agent) <kiro-agent@users.noreply.github.com>
@tautschnig tautschnig self-assigned this Mar 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant