Skip to content

Use extractbits to extract non-byte-fields from unions#7861

Open
tautschnig wants to merge 1 commit intodiffblue:developfrom
tautschnig:bugfixes/union-bit-fields
Open

Use extractbits to extract non-byte-fields from unions#7861
tautschnig wants to merge 1 commit intodiffblue:developfrom
tautschnig:bugfixes/union-bit-fields

Conversation

@tautschnig
Copy link
Copy Markdown
Collaborator

Do not use byte_extract to extract members that aren't multiples of bytes. Instead, use extractbits in that case.

Issue was found by CSmith (test generated with random seed 1692379469), regression test generated via C-Reduce.

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig marked this pull request as draft August 22, 2023 08:01
@tautschnig tautschnig force-pushed the bugfixes/union-bit-fields branch from adff789 to 513f15a Compare March 26, 2026 18:52
@tautschnig tautschnig marked this pull request as ready for review March 27, 2026 14:01
Copilot AI review requested due to automatic review settings March 27, 2026 14:01
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes union-member extraction for C bit-fields whose widths aren’t a multiple of a byte by switching from byte_extract to extractbits in those cases, and adds a regression test reproducing the reported failure.

Changes:

  • Use extractbits_exprt (instead of make_byte_extract) when extracting non-byte-multiple ID_c_bit_field union members.
  • Improve debug type formatting for ID_c_bit_field.
  • Add a reduced CSmith-derived regression under regression/cbmc/Bitfields6.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/util/format_type.cpp Adds formatting for c_bit_field types.
src/goto-symex/field_sensitivity.cpp Uses extractbits for non-byte-sized bit-field union component extraction during field-sensitive assignments.
src/goto-programs/rewrite_union.cpp Rewrites union member access to use extractbits for non-byte-sized bit-fields (and keeps byte_extract otherwise).
regression/cbmc/Bitfields6/test.desc New regression test descriptor expecting verification failure output/exit code.
regression/cbmc/Bitfields6/main.c Reduced reproducer featuring a union with non-byte-sized bit-fields.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +489 to +493
if(
comp.type().id() != ID_c_bit_field ||
to_c_bit_field_type(comp.type()).get_width() %
config.ansi_c.char_width ==
0)
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This switches union component extraction from make_byte_extract (endianness-aware) to extractbits_exprt (bit indices count from the least-significant bit and are not affected by endianness). For big-endian targets this can change the meaning/bit ordering of extracted c_bit_field components compared to a direct union member access. Consider an approach that preserves endianness (e.g., byte-based extraction followed by bit slicing, or an explicit endianness mapping).

Copilot uses AI. Check for mistakes.
return os << format(mathematical_function.codomain());
}
else if(id == ID_c_bit_field)
return os << "c_bit_field[" << to_c_bit_field_type(type).get_width() << ']';
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format_type prints c_bit_field as just c_bit_field[<width>], which drops the underlying type (signedness) even though c_bit_field_typet carries it. Including the underlying_type in the formatted output would make diagnostics/debug logs less ambiguous.

Suggested change
return os << "c_bit_field[" << to_c_bit_field_type(type).get_width() << ']';
{
const auto &c_bit_field_type = to_c_bit_field_type(type);
return os << "c_bit_field " << format(c_bit_field_type.underlying_type())
<< '[' << c_bit_field_type.get_width() << ']';
}

Copilot uses AI. Check for mistakes.
Comment on lines +90 to +94
if(
expr.type().id() != ID_c_bit_field ||
to_c_bit_field_type(expr.type()).get_width() %
config.ansi_c.char_width ==
0)
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching union member rewrites from make_byte_extract to extractbits_exprt changes endianness handling: make_byte_extract is endianness-aware, while extractbits indices are defined from the least-significant bit and are not affected by endianness. On big-endian configurations this is likely to produce different bit ordering than a union member access / the previous rewrite, so the rewritten expression may be semantically wrong. Consider preserving endianness by extracting via a byte-based operation and then slicing/adjusting bits, or otherwise applying the appropriate endianness mapping for the extracted c_bit_field.

Copilot uses AI. Check for mistakes.
Comment on lines +101 to +103
expr =
extractbits_exprt(op, from_integer(0, c_index_type()), expr.type());
}
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrite_union now rewrites some union member accesses to extractbits_exprt, but restore_union_rec only undoes byte_extract_{le,be}. This means counterexample/user-facing expression output may no longer be restored back to u.member for non-byte-sized bitfields. Consider extending restore_union_rec (and updating the function comment) to also recognize the extractbits-based rewrite pattern.

Copilot uses AI. Check for mistakes.
Do not use byte_extract to extract members that aren't multiples of
bytes. Instead, use extractbits in that case.

Issue was found by CSmith (test generated with random seed 1692379469),
regression test generated via C-Reduce.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@tautschnig tautschnig force-pushed the bugfixes/union-bit-fields branch from 513f15a to 0d2d0da Compare March 30, 2026 16:30
@tautschnig tautschnig requested a review from martin-cs as a code owner March 30, 2026 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants