Skip to content

Test for-next (regular, SELF kvm)#1624

Open
kdave wants to merge 10000 commits into
ci-kvmfrom
for-next
Open

Test for-next (regular, SELF kvm)#1624
kdave wants to merge 10000 commits into
ci-kvmfrom
for-next

Conversation

@kdave
Copy link
Copy Markdown
Member

@kdave kdave commented Mar 5, 2026

Keep this open, the build tests are on self-hosted workers.

@kdave
Copy link
Copy Markdown
Member Author

kdave commented Mar 5, 2026

Re #1623 .

@kdave kdave changed the title Test for-next (regular, GH kvm) 2 Test for-next (regular, GH kvm) Mar 5, 2026
@kdave kdave closed this Mar 5, 2026
@kdave kdave reopened this Mar 5, 2026
@kdave kdave force-pushed the ci-kvm branch 2 times, most recently from 69fc6c9 to 98bf7e7 Compare March 5, 2026 23:30
@kdave kdave force-pushed the for-next branch 3 times, most recently from 934d926 to 0cb5a8a Compare March 13, 2026 11:46
@kdave kdave force-pushed the ci-kvm branch 3 times, most recently from 2cd3911 to c09d7cb Compare March 13, 2026 19:11
@kdave kdave force-pushed the for-next branch 8 times, most recently from c61e262 to daed989 Compare March 17, 2026 16:00
@kdave kdave force-pushed the for-next branch 4 times, most recently from 9dc51d6 to d76ed94 Compare March 19, 2026 13:21
fdmanana and others added 30 commits May 24, 2026 03:01
…ted_trans()

We are not committing a transaction there, plus in subsequent patches we
want to change the argument for the trace event to be a transaction handle
instead of fs_info and in this context we don't have a transaction handle
(struct btrfs_trans_handle, only a struct btrfs_transaction). So remove the
call to the trace point.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
…action()

We are not committing a transaction there, plus in subsequent patches we
want to change the argument for the trace event to be a transaction handle
instead of fs_info and in this context we don't have a transaction handle
(struct btrfs_trans_handle, only a struct btrfs_transaction). So remove the
call to the trace point.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
…vent

The transaction commit tracepoint prints fs_info->generation as if it
were the ID of the committed transaction but this does not always match
that ID. This is because the trace point is called in the transaction
commit path after the transaction is in the TRANS_STATE_COMPLETED state,
which means another transaction may have already started (which can happen
as soon as the transaction state was set to TRANS_STATE_UNBLOCKED), in
which case fs_info->generation was incremented and does not correspond
to the committed transaction anymore.

So fix this by passing a transaction handle to the trace event instead of
fs_info. This will also allow later for the trace event to dump other
useful information about the transaction.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Include the in_fsync value from the transaction handle so that we can know
if a transaction commit was triggered by a fsync call.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While tracing it's useful to know not just when a transaction is committed
but also when one is aborted. So add a trace event for transaction aborts.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While tracing it's useful to know not just when a transaction is committed
or aborted, but also when a new one is started. So add a trace event for
transaction starts.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently the trace event is fired only when a transaction is fully
complete (its state is TRANS_STATE_COMPLETED). However during a
transaction commit we go through several states and as soon as the
state reaches TRANS_STATE_UNBLOCKED, another transaction can start.
Therefore it's useful to track every transaction state changed during
the commit of a transaction, so that we can see if a new transaction
is started before the current one is completed. Add the transaction
state to the transaction commit event and call the event everytime
we change the transaction state during commit.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
…c_file()

The value of 'ret' can never be greater than zero when we reach the end of
btrfs_sync_file() but we have this ternary operator converting any such
value into -EIO. This logic exists since the first fsync implementation,
added in 2007 by commit 8fd1779 ("Btrfs: early fsync support"), when
all that fsync did was simply to commit a transaction, but even a call to
btrfs_commit_transaction() could never return a value greater than zero.

So stop checking for a greater than zero value and assert that 'ret' is
never greater than zero, to catch any eventual regression during future
development.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we can skip logging the inode during fsync, we check for writeback
errors in the inode's mapping by calling filemap_check_wb_err() and then
jump to the 'out_release_extents' label, which in turn jumps to the 'out'
label under which we check again for a writeback error by calling
file_check_and_advance_wb_err(). So the filemap_check_wb_err() ends up
being redundant. This happens since commit 333427a ("btrfs: minimal
conversion to errseq_t writeback error reporting on fsync").

Remove the filemap_check_wb_err() call.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we only have a trace event for when a fsync operation starts,
but this alone is not very helpful. Add a trace event for when fsync
finishes, which reports its return value, so that using tracing we can
see which other trace events happened in between (several will be added
soon for inode logging steps) and even measure execution time.

So rename the existing trace event btrfs_sync_file to
btrfs_sync_file_enter and add the trace event btrfs_sync_file_exit.
The naming is similar to what ext4 does (ext4_sync_file_enter and
ext4_sync_file_exit) and with similar information reported.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_log_inode_parent() is one of the most important steps called during
a fsync operation as well as during rename and link operations on inodes
that were previously logged. Add trace events for when entering and
exiting that function.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We use this unnamed enum for the log mode and then pass it around log
functions as an int type with the odd name "inode_only" which suggests a
boolean. So add a name to the enum and change the type everywhere to that
enum and rename the parameters to something more clear - "log_mode".
Also move the enum into tree-log.h - it will be used later by new trace
events.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_log_inode() is one of the most important steps called during a fsync,
as well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_log_all_parents() is an important step called during a fsync, as
well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
log_all_new_ancestors() is an important step called during a fsync, as
well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
log_new_dir_dentries() is an important step called during a fsync, as
well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
add_conflicting_inode() is an important step called during a fsync, as
well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
log_conflicting_inodes() is an important step called during a fsync, as
well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
…play

In overwrite_item():

  There's no point in printing the root's ID if the assertion fails, since
  it can only be BTRFS_TREE_LOG_OBJECTID if it fails.

In log_new_delayed_dentries():

  There's no point in using a verbose assertion to print the value of
  ctx->logging_new_delayed_dentries because it's a boolean, so if the
  assertion fails we know its value is true (1).

So convert them to simpler assertion to make the code less verbose.
It also slightly reduces the object size, at least on x86_64 using
Debian's gcc 14.2.0-19 (if CONFIG_BTRFS_ASSERT is enabled in the kernel
config, which is the case for SUSE distributions for example).

Before:

  $ size fs/btrfs/btrfs.ko
     text	   data	    bss	    dec	    hex	filename
  2028244	 197176	  15624	2241044	 223214	fs/btrfs/btrfs.ko

After:

  $ size fs/btrfs/btrfs.ko
     text	   data	    bss	    dec	    hex	filename
  2028228	 197176	  15624	2241028	 223204	fs/btrfs/btrfs.ko

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
log_new_delayed_dentries() is an important step called during a fsync, as
well as during rename and link operations on inodes that were previously
logged. Add trace events for when entering and exiting that function.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_unlink_dir() is an important operation that affects inode
logging and is called during unlink and rename operations. Add a trace
event for it to help debug issues.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_snapshot_destroy() is an important operation that affects
inode logging and is called during subvolume/snapshot deletion as well as
during rmdir. Add a trace event for it to help debug issues.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_new_subvolume() is an important operation that affects
inode logging and is called during subvolume creation. Add a trace event
for it to help debug issues.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_log_new_name() is an important function that affects inode logging
and is called during link and rename operations. Add trace events for when
entering and exiting that function.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_sync_log() is one of the main functions called during a fsync.
Add trace events for when entering and exiting that function.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Print the type of the inode (directory, regular file, symlink, etc) to
facilitate debugging.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
sb_write_pointer() reads the super block from the block device page cache
using read_cache_page_gfp(). This has the same race with BLKBSZSET as the
one fixed by commit 3f29d66 ("btrfs: sync read disk super and set
block size").

Take the mapping invalidate lock around read_cache_page_gfp() to
serialize the read against block size changes.

Signed-off-by: KangNing Liao <lkangn.kernel@gmail.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the beginning of the loop, we try to obtain a locked delayed ref head,
if 'locked_ref' is currently NULL, by calling btrfs_select_ref_head(),
which can return an error pointer. If the error pointer is -EAGAIN we do
a continue and go back to the beginning of the loop, which will not try
again to call btrfs_select_ref_head() since 'locked_ref' is no longer
NULL but it's ERR_PTR(-EAGAIN), and then we do:

   spin_lock(&locked_ref->lock);

against a ERR_PTR(-EAGAIN) value, generating an invalid pointer
dereference.

Fix this by ensuring that 'locked_ref' is set to NULL when
btrfs_select_ref_head() returns ERR_PTR(-EAGAIN) and incrementing 'count'
as well, to prevent infinite looping. We do this by doing a goto to the
bottom of the loop that already sets 'locked_ref' to NULL and does a
cond_resched(), with an increment to 'count' right before the goto.
These measures were in place before the refactoring in commit 0110a4c
("btrfs: refactor __btrfs_run_delayed_refs loop") but were unintentionally
lost afterwards.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/ag8ARRwykv8bpJ87@stanley.mountain/
Fixes: 0110a4c ("btrfs: refactor __btrfs_run_delayed_refs loop")
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
…on()

In preparation to encode more information to the error value add a step
that verifies if the value is valid (i.e. < 0). This works for
compile-time and runtime (in debugging mode).

The compile-time check recognizes direct constants and defines an array
type. An invalid condition leads to negative array size which is caught
by compiler.

The runtime check constructs the array type from the condition and only
verifies the correct size, as we don't need to tweak the size to be
negative.

The sizeof() expressions do not generate any code. In the debugging
config the warning adds about 9KiB of btrfs.ko code size.

The array size trick is needed as we can't use static_array(), not even
with __builtin_constant_p().

Sample error message:

In file included from inode.c:40:
inode.c: In function ‘__cow_file_range_inline’:
transaction.h:261:26: error: size of unnamed array is negative
  261 |         (void)sizeof(char[-!(__builtin_constant_p(error) ? (error) < 0 : 1)]);  \
      |                          ^
transaction.h:275:9: note: in expansion of macro ‘VERIFY_NEGATIVE_ERROR’
  275 |         VERIFY_NEGATIVE_ERROR(error);                           \
      |         ^~~~~~~~~~~~~~~~~~~~~
inode.c:665:17: note: in expansion of macro ‘btrfs_abort_transaction’
  665 |                 btrfs_abort_transaction(trans, 17);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: David Sterba <dsterba@suse.com>
Optimize the btrfs_abort_transaction() for size as it (by our
convention) must be put right after the error condition is detected.
The exact file:line is reported so there's a portion that must be
inlined. As this is cold code it bloats functions. In previous patch
"btrfs: move transaction abort message to __btrfs_abort_transaction()"
the error message was moved to the common helper, saving like 20KiB of
btrfs.ko and several instructions per call site and some stack space.

There's little left to be optimized, we need to keep the atomic
test_and_set_bit() and to convey that as 'first hit' to
__btrfs_abort_transaction().

Right now it's a bool, which takes 8 bytes on stack for each call but
it's 1 bit of information. We can encode that to some of the other
parameters.

For that let's use the 'error' parameter, by convention it's negative
errno so we can reliably detect if it's the first hit or a later error.
Also the negation is usually implemented by a single instruction (NEG on
x86_64) so the resulting object code is kept short.

This reduces btrfs.ko by 8K and stack in several functions by 8 bytes.

Cumulative effect with the other commit is -30K of btrfs.ko. While the
encoding is an implementation detail, it's contained within the API.
Making the transaction abort calls very light is desired.

Signed-off-by: David Sterba <dsterba@suse.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.