Commit dbc2a8c9 authored by Dennis Zhou, committed by David Sterba

btrfs: add async discard implementation overview



Give a brief overview for how async discard is implemented.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
parent 9ddf648f
+39 −0
@@ -12,6 +12,45 @@
#include "discard.h"
#include "free-space-cache.h"

/*
 * This contains the logic to handle async discard.
 *
 * Async discard manages trimming of free space outside of transaction commit.
 * Discarding is done by managing the block_groups on an LRU list based on free
 * space recency.  Two passes are used: the first prioritizes discarding
 * extents, and the second gives trimming in the bitmaps the best opportunity
 * to coalesce.
 * The block_groups are maintained on multiple lists to allow for multiple
 * passes with different discard filter requirements.  A delayed work item is
 * used to manage discarding with timeout determined by a max of the delay
 * incurred by the iops rate limit, the byte rate limit, and the max delay of
 * BTRFS_DISCARD_MAX_DELAY.
 *
 * Note, this only keeps track of block_groups that are explicitly for data.
 * Mixed block_groups are not supported.
 *
 * The first list is special and manages discarding of fully free block groups.
 * This is necessary because we issue a final trim for a fully free block
 * group after forgetting it.  When a block group becomes unused, instead of
 * being added directly to the unused_bgs list, it is added to this first list.
 * From there, once it is fully discarded, we place it onto the unused_bgs
 * list.
 *
 * The in-memory free space cache serves as the backing state for discard.
 * Consequently, there is no persistence.  We opt to load all the
 * block groups in as not discarded, so the mount case degenerates to the
 * crashing case.
 *
 * As the free space cache uses bitmaps, there exists a tradeoff between
 * ease/efficiency for find_free_extent() and the accuracy of discard state.
 * Here we opt to let untrimmed regions merge with everything while only letting
 * trimmed regions merge with other trimmed regions.  This can cause
 * overtrimming, but the coalescing benefit seems to be worth it.  Additionally,
 * bitmap state is tracked as a whole.  If we're able to fully trim a bitmap,
 * the trimmed flag is set on the bitmap.  Otherwise, if an allocation comes in,
 * this resets the state and we will retry trimming the whole bitmap.  This is a
 * tradeoff between discard state accuracy and the cost of accounting.
 */
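
The merge rule in the last paragraph of the comment can be illustrated with a small standalone sketch. This is not the kernel code: `struct region_sketch`, `can_merge_sketch()` and `try_merge_sketch()` are hypothetical names made up for illustration, and the sketch assumes the rule is applied from the point of view of the region being inserted into the free space cache.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a free space extent; not the kernel struct. */
struct region_sketch {
	uint64_t start;
	uint64_t len;
	bool trimmed;	/* true if this range has already been discarded */
};

/*
 * An untrimmed incoming region may merge with any adjacent neighbor.
 * A trimmed incoming region may only merge with a trimmed neighbor.
 */
bool can_merge_sketch(const struct region_sketch *incoming,
		      const struct region_sketch *neighbor)
{
	bool touches_left = neighbor->start + neighbor->len == incoming->start;
	bool touches_right = incoming->start + incoming->len == neighbor->start;

	if (!touches_left && !touches_right)
		return false;
	return !incoming->trimmed || neighbor->trimmed;
}

/*
 * Absorb the neighbor into the incoming region when the rule allows it.
 * The merged region keeps the incoming region's trimmed state, so an
 * untrimmed region that absorbs a trimmed one is trimmed again later
 * (the overtrimming mentioned in the comment).
 */
bool try_merge_sketch(struct region_sketch *incoming,
		      const struct region_sketch *neighbor)
{
	if (!can_merge_sketch(incoming, neighbor))
		return false;
	if (neighbor->start < incoming->start)
		incoming->start = neighbor->start;
	incoming->len += neighbor->len;
	return true;
}
```

In this sketch, an untrimmed region that absorbs a trimmed neighbor stays untrimmed, so the whole merged range is discarded on a later pass. That trades some redundant discards for larger contiguous free extents.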

/* This is an initial delay to give some chance for block reuse */
#define BTRFS_DISCARD_DELAY		(120ULL * NSEC_PER_SEC)
#define BTRFS_DISCARD_UNUSED_DELAY	(10ULL * NSEC_PER_SEC)
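
As a rough illustration of how the two delays and the special first list might fit together, here is a minimal standalone sketch. It is written under assumptions and is not the kernel implementation: the `*_sketch` types and functions, the list enum, and the `untrimmed_bytes` field are all hypothetical. It also assumes, based only on the macro name, that `BTRFS_DISCARD_UNUSED_DELAY` applies to block groups queued on the list for fully free block groups.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC			1000000000ULL
#define BTRFS_DISCARD_DELAY		(120ULL * NSEC_PER_SEC)
#define BTRFS_DISCARD_UNUSED_DELAY	(10ULL * NSEC_PER_SEC)

/* Hypothetical list identifiers; the real code keeps actual list heads. */
enum bg_list_sketch {
	BG_LIST_NONE,
	BG_LIST_DISCARD_UNUSED,	/* the special first list */
	BG_LIST_DISCARD,	/* a regular discard list */
	BG_LIST_UNUSED_BGS,	/* handed over for block group removal */
};

struct block_group_sketch {
	enum bg_list_sketch list;
	uint64_t discard_eligible_ns;	/* earliest time discard may start */
	uint64_t untrimmed_bytes;	/* free space not yet discarded */
};

/* Queue a block group and give its blocks some time to be reused. */
void queue_for_discard_sketch(struct block_group_sketch *bg,
			      uint64_t now_ns, bool unused)
{
	bg->list = unused ? BG_LIST_DISCARD_UNUSED : BG_LIST_DISCARD;
	bg->discard_eligible_ns = now_ns +
		(unused ? BTRFS_DISCARD_UNUSED_DELAY : BTRFS_DISCARD_DELAY);
}

bool discard_eligible_sketch(const struct block_group_sketch *bg,
			     uint64_t now_ns)
{
	return now_ns >= bg->discard_eligible_ns;
}

/*
 * An unused block group only moves on to unused_bgs after it has been
 * fully discarded.
 */
void finish_discard_sketch(struct block_group_sketch *bg)
{
	if (bg->list == BG_LIST_DISCARD_UNUSED && bg->untrimmed_bytes == 0)
		bg->list = BG_LIST_UNUSED_BGS;
}
```

The point of the sketch is the ordering: an unused block group waits on the first list, gets discarded, and only then becomes a candidate for removal through unused_bgs.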