Conversation
Member
Author
|
Re #1623 . |
69fc6c9 to
98bf7e7
Compare
934d926 to
0cb5a8a
Compare
2cd3911 to
c09d7cb
Compare
c61e262 to
daed989
Compare
9dc51d6 to
d76ed94
Compare
…ted_trans() We are not committing a transaction there, plus in subsequent patches we want to change the argument for the trace event to be a transaction handle instead of fs_info and in this context we don't have a transaction handle (struct btrfs_trans_handle, only a struct btrfs_transaction). So remove the call to the trace point. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
…action() We are not committing a transaction there, plus in subsequent patches we want to change the argument for the trace event to be a transaction handle instead of fs_info and in this context we don't have a transaction handle (struct btrfs_trans_handle, only a struct btrfs_transaction). So remove the call to the trace point. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
…vent The transaction commit tracepoint prints fs_info->generation as if it were the ID of the committed transaction but this does not always match that ID. This is because the trace point is called in the transaction commit path after the transaction is in the TRANS_STATE_COMPLETED state, which means another transaction may have already started (which can happen as soon as the transaction state was set to TRANS_STATE_UNBLOCKED), in which case fs_info->generation was incremented and does not correspond to the committed transaction anymore. So fix this by passing a transaction handle to the trace event instead of fs_info. This will also allow later for the trace event to dump other useful information about the transaction. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Include the in_fsync value from the transaction handle so that we can know if a transaction commit was triggered by a fsync call. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
While tracing it's useful to know not just when a transaction is committed but also when one is aborted. So add a trace event for transaction aborts. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
While tracing it's useful to know not just when a transaction is committed or aborted, but also when a new one is started. So add a trace event for transaction starts. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Currently the trace event is fired only when a transaction is fully complete (its state is TRANS_STATE_COMPLETED). However during a transaction commit we go through several states and as soon as the state reaches TRANS_STATE_UNBLOCKED, another transaction can start. Therefore it's useful to track every transaction state changed during the commit of a transaction, so that we can see if a new transaction is started before the current one is completed. Add the transaction state to the transaction commit event and call the event everytime we change the transaction state during commit. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
…c_file() The value of 'ret' can never be greater than zero when we reach the end of btrfs_sync_file() but we have this ternary operator converting any such value into -EIO. This logic exists since the first fsync implementation, added in 2007 by commit 8fd1779 ("Btrfs: early fsync support"), when all that fsync did was simply to commit a transaction, but even a call to btrfs_commit_transaction() could never return a value greater than zero. So stop checking for a greater than zero value and assert that 'ret' is never greater than zero, to catch any eventual regression during future development. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
If we can skip logging the inode during fsync, we check for writeback errors in the inode's mapping by calling filemap_check_wb_err() and then jump to the 'out_release_extents' label, which in turn jumps to the 'out' label under which we check again for a writeback error by calling file_check_and_advance_wb_err(). So the filemap_check_wb_err() ends up being redundant. This happens since commit 333427a ("btrfs: minimal conversion to errseq_t writeback error reporting on fsync"). Remove the filemap_check_wb_err() call. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Currently we only have a trace event for when a fsync operation starts, but this alone is not very helpful. Add a trace event for when fsync finishes, which reports its return value, so that using tracing we can see which other trace events happened in between (several will be added soon for inode logging steps) and even measure execution time. So rename the existing trace event btrfs_sync_file to btrfs_sync_file_enter and add the trace event btrfs_sync_file_exit. The naming is similar to what ext4 does (ext4_sync_file_enter and ext4_sync_file_exit) and with similar information reported. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_log_inode_parent() is one of the most important steps called during a fsync operation as well as during rename and link operations on inodes that were previously logged. Add trace events for when entering and exiting that function. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
We use this unnamed enum for the log mode and then pass it around log functions as an int type with the odd name "inode_only" which suggests a boolean. So add a name to the enum and change the type everywhere to that enum and rename the parameters to something more clear - "log_mode". Also move the enum into tree-log.h - it will be used later by new trace events. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_log_inode() is one of the most important steps called during a fsync, as well as during rename and link operations on inodes that were previously logged. Add trace events for when entering and exiting that function. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_log_all_parents() is an important step called during a fsync, as well as during rename and link operations on inodes that were previously logged. Add trace events for when entering and exiting that function. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
log_all_new_ancestors() is an important step called during a fsync, as well as during rename and link operations on inodes that were previously logged. Add trace events for when entering and exiting that function. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
log_new_dir_dentries() is an important step called during a fsync, as well as during rename and link operations on inodes that were previously logged. Add trace events for when entering and exiting that function. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
add_conflicting_inode() is an important step called during a fsync, as well as during rename and link operations on inodes that were previously logged. Add trace events for when entering and exiting that function. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
log_conflicting_inodes() is an important step called during a fsync, as well as during rename and link operations on inodes that were previously logged. Add trace events for when entering and exiting that function. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
…play
In overwrite_item():
There's no point in printing the root's ID if the assertion fails, since
it can only be BTRFS_TREE_LOG_OBJECTID if it fails.
In log_new_delayed_dentries():
There's no point in using a verbose assertion to print the value of
ctx->logging_new_delayed_dentries because it's a boolean, so if the
assertion fails we know its value is true (1).
So convert them to simpler assertion to make the code less verbose.
It also slightly reduces the object size, at least on x86_64 using
Debian's gcc 14.2.0-19 (if CONFIG_BTRFS_ASSERT is enabled in the kernel
config, which is the case for SUSE distributions for example).
Before:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
2028244 197176 15624 2241044 223214 fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
2028228 197176 15624 2241028 223204 fs/btrfs/btrfs.ko
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
log_new_delayed_dentries() is an important step called during a fsync, as well as during rename and link operations on inodes that were previously logged. Add trace events for when entering and exiting that function. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_unlink_dir() is an important operation that affects inode logging and is called during unlink and rename operations. Add a trace event for it to help debug issues. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_snapshot_destroy() is an important operation that affects inode logging and is called during subvolume/snapshot deletion as well as during rmdir. Add a trace event for it to help debug issues. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_new_subvolume() is an important operation that affects inode logging and is called during subvolume creation. Add a trace event for it to help debug issues. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_log_new_name() is an important function that affects inode logging and is called during link and rename operations. Add trace events for when entering and exiting that function. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_sync_log() is one of the main functions called during a fsync. Add trace events for when entering and exiting that function. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Print the type of the inode (directory, regular file, symlink, etc) to facilitate debugging. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
sb_write_pointer() reads the super block from the block device page cache using read_cache_page_gfp(). This has the same race with BLKBSZSET as the one fixed by commit 3f29d66 ("btrfs: sync read disk super and set block size"). Take the mapping invalidate lock around read_cache_page_gfp() to serialize the read against block size changes. Signed-off-by: KangNing Liao <lkangn.kernel@gmail.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
In the beginning of the loop, we try to obtain a locked delayed ref head, if 'locked_ref' is currently NULL, by calling btrfs_select_ref_head(), which can return an error pointer. If the error pointer is -EAGAIN we do a continue and go back to the beginning of the loop, which will not try again to call btrfs_select_ref_head() since 'locked_ref' is no longer NULL but it's ERR_PTR(-EAGAIN), and then we do: spin_lock(&locked_ref->lock); against a ERR_PTR(-EAGAIN) value, generating an invalid pointer dereference. Fix this by ensuring that 'locked_ref' is set to NULL when btrfs_select_ref_head() returns ERR_PTR(-EAGAIN) and incrementing 'count' as well, to prevent infinite looping. We do this by doing a goto to the bottom of the loop that already sets 'locked_ref' to NULL and does a cond_resched(), with an increment to 'count' right before the goto. These measures were in place before the refactoring in commit 0110a4c ("btrfs: refactor __btrfs_run_delayed_refs loop") but were unintentionally lost afterwards. Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/linux-btrfs/ag8ARRwykv8bpJ87@stanley.mountain/ Fixes: 0110a4c ("btrfs: refactor __btrfs_run_delayed_refs loop") Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
…on()
In preparation to encode more information to the error value add a step
that verifies if the value is valid (i.e. < 0). This works for
compile-time and runtime (in debugging mode).
The compile-time check recognizes direct constants and defines an array
type. An invalid condition leads to negative array size which is caught
by compiler.
The runtime check constructs the array type from the condition and only
verifies the correct size, as we don't need to tweak the size to be
negative.
The sizeof() expressions do not generate any code. In the debugging
config the warning adds about 9KiB of btrfs.ko code size.
The array size trick is needed as we can't use static_array(), not even
with __builtin_constant_p().
Sample error message:
In file included from inode.c:40:
inode.c: In function ‘__cow_file_range_inline’:
transaction.h:261:26: error: size of unnamed array is negative
261 | (void)sizeof(char[-!(__builtin_constant_p(error) ? (error) < 0 : 1)]); \
| ^
transaction.h:275:9: note: in expansion of macro ‘VERIFY_NEGATIVE_ERROR’
275 | VERIFY_NEGATIVE_ERROR(error); \
| ^~~~~~~~~~~~~~~~~~~~~
inode.c:665:17: note: in expansion of macro ‘btrfs_abort_transaction’
665 | btrfs_abort_transaction(trans, 17);
| ^~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: David Sterba <dsterba@suse.com>
Optimize the btrfs_abort_transaction() for size as it (by our convention) must be put right after the error condition is detected. The exact file:line is reported so there's a portion that must be inlined. As this is cold code it bloats functions. In previous patch "btrfs: move transaction abort message to __btrfs_abort_transaction()" the error message was moved to the common helper, saving like 20KiB of btrfs.ko and several instructions per call site and some stack space. There's little left to be optimized, we need to keep the atomic test_and_set_bit() and to convey that as 'first hit' to __btrfs_abort_transaction(). Right now it's a bool, which takes 8 bytes on stack for each call but it's 1 bit of information. We can encode that to some of the other parameters. For that let's use the 'error' parameter, by convention it's negative errno so we can reliably detect if it's the first hit or a later error. Also the negation is usually implemented by a single instruction (NEG on x86_64) so the resulting object code is kept short. This reduces btrfs.ko by 8K and stack in several functions by 8 bytes. Cumulative effect with the other commit is -30K of btrfs.ko. While the encoding is an implementation detail, it's contained within the API. Making the transaction abort calls very light is desired. Signed-off-by: David Sterba <dsterba@suse.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Keep this open, the build tests are on self-hosted workers.