Use extractbits to extract non-byte-fields from unions#7861
Use extractbits to extract non-byte-fields from unions#7861tautschnig wants to merge 1 commit intodiffblue:developfrom
Conversation
adff789 to
513f15a
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes union-member extraction for C bit-fields whose widths aren’t a multiple of a byte by switching from byte_extract to extractbits in those cases, and adds a regression test reproducing the reported failure.
Changes:
- Use
extractbits_exprt(instead ofmake_byte_extract) when extracting non-byte-multipleID_c_bit_fieldunion members. - Improve debug type formatting for
ID_c_bit_field. - Add a reduced CSmith-derived regression under
regression/cbmc/Bitfields6.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/util/format_type.cpp | Adds formatting for c_bit_field types. |
| src/goto-symex/field_sensitivity.cpp | Uses extractbits for non-byte-sized bit-field union component extraction during field-sensitive assignments. |
| src/goto-programs/rewrite_union.cpp | Rewrites union member access to use extractbits for non-byte-sized bit-fields (and keeps byte_extract otherwise). |
| regression/cbmc/Bitfields6/test.desc | New regression test descriptor expecting verification failure output/exit code. |
| regression/cbmc/Bitfields6/main.c | Reduced reproducer featuring a union with non-byte-sized bit-fields. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if( | ||
| comp.type().id() != ID_c_bit_field || | ||
| to_c_bit_field_type(comp.type()).get_width() % | ||
| config.ansi_c.char_width == | ||
| 0) |
There was a problem hiding this comment.
This switches union component extraction from make_byte_extract (endianness-aware) to extractbits_exprt (bit indices count from the least-significant bit and are not affected by endianness). For big-endian targets this can change the meaning/bit ordering of extracted c_bit_field components compared to a direct union member access. Consider an approach that preserves endianness (e.g., byte-based extraction followed by bit slicing, or an explicit endianness mapping).
| return os << format(mathematical_function.codomain()); | ||
| } | ||
| else if(id == ID_c_bit_field) | ||
| return os << "c_bit_field[" << to_c_bit_field_type(type).get_width() << ']'; |
There was a problem hiding this comment.
format_type prints c_bit_field as just c_bit_field[<width>], which drops the underlying type (signedness) even though c_bit_field_typet carries it. Including the underlying_type in the formatted output would make diagnostics/debug logs less ambiguous.
| return os << "c_bit_field[" << to_c_bit_field_type(type).get_width() << ']'; | |
| { | |
| const auto &c_bit_field_type = to_c_bit_field_type(type); | |
| return os << "c_bit_field " << format(c_bit_field_type.underlying_type()) | |
| << '[' << c_bit_field_type.get_width() << ']'; | |
| } |
| if( | ||
| expr.type().id() != ID_c_bit_field || | ||
| to_c_bit_field_type(expr.type()).get_width() % | ||
| config.ansi_c.char_width == | ||
| 0) |
There was a problem hiding this comment.
Switching union member rewrites from make_byte_extract to extractbits_exprt changes endianness handling: make_byte_extract is endianness-aware, while extractbits indices are defined from the least-significant bit and are not affected by endianness. On big-endian configurations this is likely to produce different bit ordering than a union member access / the previous rewrite, so the rewritten expression may be semantically wrong. Consider preserving endianness by extracting via a byte-based operation and then slicing/adjusting bits, or otherwise applying the appropriate endianness mapping for the extracted c_bit_field.
src/goto-programs/rewrite_union.cpp
Outdated
| expr = | ||
| extractbits_exprt(op, from_integer(0, c_index_type()), expr.type()); | ||
| } |
There was a problem hiding this comment.
rewrite_union now rewrites some union member accesses to extractbits_exprt, but restore_union_rec only undoes byte_extract_{le,be}. This means counterexample/user-facing expression output may no longer be restored back to u.member for non-byte-sized bitfields. Consider extending restore_union_rec (and updating the function comment) to also recognize the extractbits-based rewrite pattern.
Do not use byte_extract to extract members that aren't multiples of bytes. Instead, use extractbits in that case. Issue was found by CSmith (test generated with random seed 1692379469), regression test generated via C-Reduce. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
513f15a to
0d2d0da
Compare
Do not use byte_extract to extract members that aren't multiples of bytes. Instead, use extractbits in that case.
Issue was found by CSmith (test generated with random seed 1692379469), regression test generated via C-Reduce.