Merge branch 'master' (2ff47823) · Commits · 戴 / test

Documentation/block/barrier.txt

0 → 100644

+271 −0

Original line number	Diff line number	Diff line
		I/O Barriers
		============
		Tejun Heo <htejun@gmail.com>, July 22 2005

		I/O barrier requests are used to guarantee ordering around the barrier
		requests. Unless you're crazy enough to use disk drives for
		implementing synchronization constructs (wow, sounds interesting...),
		the ordering is meaningful only for write requests for things like
		journal checkpoints. All requests queued before a barrier request
		must be finished (made it to the physical medium) before the barrier
		request is started, and all requests queued after the barrier request
		must be started only after the barrier request is finished (again,
		made it to the physical medium).

		In other words, I/O barrier requests have the following two properties.

		1. Request ordering

		Requests cannot pass the barrier request. Preceding requests are
		processed before the barrier and following requests after.

		Depending on what features a drive supports, this can be done in one
		of the following three ways.

		i. For devices which have queue depth greater than 1 (TCQ devices) and
		support ordered tags, block layer can just issue the barrier as an
		ordered request and the lower level driver, controller and drive
		itself are responsible for making sure that the ordering constraint is
		met. Most modern SCSI controllers/drives should support this.

		NOTE: SCSI ordered tag isn't currently used due to limitation in the
		SCSI midlayer, see the following random notes section.

		ii. For devices which have queue depth greater than 1 but don't
		support ordered tags, block layer ensures that the requests preceding
		a barrier request finish before issuing the barrier request. Also,
		it defers requests following the barrier until the barrier request is
		finished. Older SCSI controllers/drives and SATA drives fall in this
		category.

		iii. Devices which have queue depth of 1. This is a degenerate case
		of ii. Just keeping issue order suffices. Ancient SCSI
		controllers/drives and IDE drives are in this category.
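
The drain approach in case ii can be sketched as a toy userspace model.
This is only an illustration under stated assumptions: struct drain_queue,
may_issue(), issue() and complete() are hypothetical names, not block
layer interfaces. A barrier is issued only once nothing is in flight, and
nothing else is issued until the barrier has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of drain-based ordering (case ii). */
struct drain_queue {
	unsigned int in_flight;   /* requests issued but not yet completed */
	bool barrier_in_flight;   /* a barrier has been issued and is pending */
};

/* Returns true if a request of the given kind may be issued right now. */
bool may_issue(const struct drain_queue *q, bool is_barrier)
{
	assert(q != NULL);
	if (q->barrier_in_flight)
		return false;             /* defer everything behind the barrier */
	if (is_barrier)
		return q->in_flight == 0; /* drain preceding requests first */
	return true;
}

/* Records that a request has been handed to the device. */
void issue(struct drain_queue *q, bool is_barrier)
{
	assert(may_issue(q, is_barrier));
	q->in_flight++;
	if (is_barrier)
		q->barrier_in_flight = true;
}

/* Records that the device has finished a request. */
void complete(struct drain_queue *q, bool is_barrier)
{
	assert(q != NULL && q->in_flight > 0);
	q->in_flight--;
	if (is_barrier)
		q->barrier_in_flight = false;
}
```

With a queue depth of 1 (case iii) in_flight never exceeds one, so the
same checks reduce to keeping issue order.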

		2. Forced flushing to physical medium

		Again, if you're not gonna do synchronization with disk drives (dang,
		it sounds even more appealing now!), the reason you use I/O barriers
		is mainly to protect filesystem integrity when power failure or some
		other events abruptly stop the drive from operating and possibly make
		the drive lose data in its cache. So, I/O barriers need to guarantee
		that requests actually get written to non-volatile medium in order.

		There are four cases.

		i. No write-back cache. Keeping requests ordered is enough.

		ii. Write-back cache but no flush operation. There's no way to
		guarantee physical-medium commit order. Devices of this kind can't do
		I/O barriers.

		iii. Write-back cache and flush operation but no FUA (forced unit
		access). We need two cache flushes - before and after the barrier
		request.

		iv. Write-back cache, flush operation and FUA. We still need one
		flush to make sure requests preceding a barrier are written to medium,
		but post-barrier flush can be avoided by using FUA write on the
		barrier itself.


		How to support barrier requests in drivers
		------------------------------------------

		All barrier handling is done inside block layer proper. All low level
		drivers have to do is implement their prepare_flush_fn and use one of
		the following two functions to indicate what barrier type they support
		and how to prepare flush requests. Note that the term 'ordered' is
		used to indicate the whole sequence of performing barrier requests
		including draining and flushing.

		typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq);

		int blk_queue_ordered(request_queue_t *q, unsigned ordered,
		                      prepare_flush_fn *prepare_flush_fn,
		                      unsigned gfp_mask);

		int blk_queue_ordered_locked(request_queue_t *q, unsigned ordered,
		                             prepare_flush_fn *prepare_flush_fn,
		                             unsigned gfp_mask);

		The only difference between the two functions is whether or not the
		caller is holding q->queue_lock on entry. The latter expects the
		caller to be holding the lock.

		@q                : the queue in question
		@ordered          : the ordered mode the driver/device supports
		@prepare_flush_fn : this function should prepare @rq such that it
		                    flushes cache to physical medium when executed
		@gfp_mask         : gfp_mask used when allocating data structures
		                    for ordered processing

		For example, SCSI disk driver's prepare_flush_fn looks like the
		following.

		static void sd_prepare_flush(request_queue_t *q, struct request *rq)
		{
			memset(rq->cmd, 0, sizeof(rq->cmd));
			rq->flags |= REQ_BLOCK_PC;
			rq->timeout = SD_TIMEOUT;
			rq->cmd[0] = SYNCHRONIZE_CACHE;
		}

		The following seven ordered modes are supported. The following table
		shows which mode should be used depending on what features a
		device/driver supports. In the leftmost column of table,
		QUEUE_ORDERED_ prefix is omitted from the mode names to save space.

		The table is followed by description of each mode. Note that in the
		descriptions of QUEUE_ORDERED_DRAIN*, '=>' is used whereas '->' is
		used for QUEUE_ORDERED_TAG* descriptions. '=>' indicates that the
		preceding step must be complete before proceeding to the next step.
		'->' indicates that the next step can start as soon as the previous
		step is issued.

		                write-back cache    ordered tag    flush    FUA
		-----------------------------------------------------------------------
		NONE            yes/no              N/A            no       N/A
		DRAIN           no                  no             N/A      N/A
		DRAIN_FLUSH     yes                 no             yes      no
		DRAIN_FUA       yes                 no             yes      yes
		TAG             no                  yes            N/A      N/A
		TAG_FLUSH       yes                 yes            yes      no
		TAG_FUA         yes                 yes            yes      yes
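
Read as a decision procedure, the table might be sketched like this. It is
a minimal userspace illustration, not kernel code: struct dev_features,
enum ordered_mode and pick_ordered_mode() are hypothetical names, and the
enum values are not the kernel's QUEUE_ORDERED_* definitions. It assumes a
device without a write-back cache always wants DRAIN or TAG, so the
"yes/no" in the NONE row is only modelled for the write-back, no-flush
case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the QUEUE_ORDERED_* modes in the table. */
enum ordered_mode {
	ORDERED_NONE,
	ORDERED_DRAIN,
	ORDERED_DRAIN_FLUSH,
	ORDERED_DRAIN_FUA,
	ORDERED_TAG,
	ORDERED_TAG_FLUSH,
	ORDERED_TAG_FUA,
};

/* Hypothetical summary of what one device/driver pair supports. */
struct dev_features {
	bool write_back_cache;
	bool ordered_tag;
	bool flush;
	bool fua;
};

/* Walks the table columns left to right and returns the matching row. */
enum ordered_mode pick_ordered_mode(const struct dev_features *f)
{
	assert(f != NULL);

	/* No write-back cache: ordering alone is enough. */
	if (!f->write_back_cache)
		return f->ordered_tag ? ORDERED_TAG : ORDERED_DRAIN;

	/* Write-back cache without a flush operation: no barriers. */
	if (!f->flush)
		return ORDERED_NONE;

	/* Write-back cache with flush: FUA saves the post-barrier flush. */
	if (f->ordered_tag)
		return f->fua ? ORDERED_TAG_FUA : ORDERED_TAG_FLUSH;
	return f->fua ? ORDERED_DRAIN_FUA : ORDERED_DRAIN_FLUSH;
}
```

A driver would pass the resulting mode as the 'ordered' argument of
blk_queue_ordered(); the mapping from these illustrative values to the
real constants is left out on purpose.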


		QUEUE_ORDERED_NONE
		I/O barriers are not needed and/or supported.

		Sequence: N/A

		QUEUE_ORDERED_DRAIN
		Requests are ordered by draining the request queue and cache
		flushing isn't needed.

		Sequence: drain => barrier

		QUEUE_ORDERED_DRAIN_FLUSH
		Requests are ordered by draining the request queue and both
		pre-barrier and post-barrier cache flushings are needed.

		Sequence: drain => preflush => barrier => postflush

		QUEUE_ORDERED_DRAIN_FUA
		Requests are ordered by draining the request queue and
		pre-barrier cache flushing is needed. By using FUA on barrier
		request, post-barrier flushing can be skipped.

		Sequence: drain => preflush => barrier

		QUEUE_ORDERED_TAG
		Requests are ordered by ordered tag and cache flushing isn't
		needed.

		Sequence: barrier

		QUEUE_ORDERED_TAG_FLUSH
		Requests are ordered by ordered tag and both pre-barrier and
		post-barrier cache flushings are needed.

		Sequence: preflush -> barrier -> postflush

		QUEUE_ORDERED_TAG_FUA
		Requests are ordered by ordered tag and pre-barrier cache
		flushing is needed. By using FUA on barrier request,
		post-barrier flushing can be skipped.

		Sequence: preflush -> barrier
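
The sequences above follow one pattern: an optional drain, an optional
preflush, the barrier, and an optional postflush, joined by '=>' for
drain modes and '->' for tag modes. A small sketch that rebuilds the
sequence strings from those three choices, assuming a hypothetical
struct ordered_desc that is not the kernel's encoding of the modes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative decomposition of an ordered mode. */
struct ordered_desc {
	bool by_tag;     /* true: ordered tag ('->'), false: drain ('=>') */
	bool preflush;   /* flush the cache before the barrier */
	bool postflush;  /* flush after the barrier (not needed with FUA) */
};

/*
 * Writes the sequence for @d into @buf and returns what snprintf()
 * returns, i.e. the length the full string needs.
 */
int format_sequence(const struct ordered_desc *d, char *buf, size_t len)
{
	const char *sep;

	assert(d != NULL && buf != NULL);
	sep = d->by_tag ? " -> " : " => ";

	return snprintf(buf, len, "%s%s%s%s%s%s%s",
			d->by_tag ? "" : "drain",
			d->by_tag ? "" : sep,
			d->preflush ? "preflush" : "",
			d->preflush ? sep : "",
			"barrier",
			d->postflush ? sep : "",
			d->postflush ? "postflush" : "");
}
```

For example, { .by_tag = false, .preflush = true, .postflush = false }
corresponds to QUEUE_ORDERED_DRAIN_FUA and yields
"drain => preflush => barrier".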


		Random notes/caveats
		--------------------

		* SCSI layer currently can't use TAG ordering even if the drive,
		controller and driver support it. The problem is that SCSI midlayer
		request dispatch function is not atomic. It releases the queue lock and
		switches to the SCSI host lock during issue, and it's possible (and,
		given enough time, likely) that requests change their relative positions. Once
		this problem is solved, TAG ordering can be enabled.

		* Currently, no matter which ordered mode is used, there can be only
		one barrier request in progress. All I/O barriers are held off by
		block layer until the previous I/O barrier is complete. This doesn't
		make any difference for DRAIN ordered devices, but, for TAG ordered
		devices with very high command latency, passing multiple I/O barriers
		to low level might be helpful if they are very frequent. Well, this
		certainly is a non-issue. I'm writing this just to make clear that no
		two I/O barriers are ever passed to the low-level driver at once.

		* Completion order. Requests in ordered sequence are issued in order
		but not required to finish in order. Barrier implementation can
		handle out-of-order completion of ordered sequence. IOW, the requests
		MUST be processed in order but the hardware/software completion paths
		are allowed to reorder completion notifications - eg. current SCSI
		midlayer doesn't preserve completion order during error handling.

		* Requeueing order. Low-level drivers are free to requeue any request
		after they removed it from the request queue with
		blkdev_dequeue_request(). As barrier sequence should be kept in order
		when requeued, generic elevator code takes care of putting requests in
		order around barrier. See blk_ordered_req_seq() and
		ELEVATOR_INSERT_REQUEUE handling in __elv_add_request() for details.

		Note that block drivers must not requeue preceding requests while
		completing latter requests in an ordered sequence. Currently, no
		error checking is done against this.

		* Error handling. Currently, block layer will report error to upper
		layer if any of requests in an ordered sequence fails. Unfortunately,
		this doesn't seem to be enough. Look at the following request flow.
		QUEUE_ORDERED_TAG_FLUSH is in use.

		 [0] [1] [2] [3] [pre] [barrier] [post] < [4] [5] [6] ... >
		                                          still in elevator

		Let's say request [2], [3] are write requests to update file system
		metadata (journal or whatever) and [barrier] is used to mark that
		those updates are valid. Consider the following sequence.

		i. Requests [0] ~ [post] leave the request queue and enter
		low-level driver.
		ii. After a while, unfortunately, something goes wrong and the
		drive fails [2]. Note that any of [0], [1] and [3] could have
		completed by this time, but [pre] couldn't have been finished
		as the drive must process it in order and it failed before
		processing that command.
		iii. Error handling kicks in and determines that the error is
		unrecoverable and fails [2], and resumes operation.
		iv. [pre] [barrier] [post] get processed.
		v. BOOM power fails

		The problem here is that the barrier request is supposed to indicate
		that filesystem update requests [2] and [3] made it safely to the
		physical medium and, if the machine crashes after the barrier is
		written, filesystem recovery code can depend on that. Sadly, that
		isn't true in this case anymore. IOW, the success of an I/O barrier
		should also be dependent on success of some of the preceding requests,
		where only upper layer (filesystem) knows what 'some' is.

		This can be solved by implementing a way to tell the block layer which
		requests affect the success of the following barrier request and
		making lower level drivers resume operation on error only after
		block layer tells it to do so.

		As the probability of this happening is very low and it requires a
		faulty drive, implementing the fix is probably overkill. But, still,
		it's there.
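
The proposed fix could look roughly like the following sketch. Nothing
like this exists in the block layer described here: struct req_result,
its barrier_depends flag and barrier_status() are all hypothetical, and
the sketch only shows the rule "a barrier succeeds only if every request
the filesystem marked as required also succeeded".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one request that preceded the barrier. */
struct req_result {
	bool barrier_depends; /* upper layer marked it as required */
	int error;            /* 0 on success, negative on failure */
};

/*
 * Returns the status to report for the barrier: the error of the first
 * failed request it depends on, otherwise the barrier's own status.
 */
int barrier_status(const struct req_result *preceding, size_t n,
		   int barrier_error)
{
	size_t i;

	assert(preceding != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (preceding[i].barrier_depends && preceding[i].error)
			return preceding[i].error;
	return barrier_error;
}
```

In the flow above, [2] and [3] would carry barrier_depends and the
failure of [2] would fail the barrier, even though [pre], [barrier] and
[post] themselves completed.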

		* In previous drafts of barrier implementation, there was fallback
		mechanism such that, if FUA or ordered TAG fails, less fancy ordered
		mode can be selected and the failed barrier request is retried
		automatically. The rationale for this feature was that as FUA is
		pretty new in ATA world and ordered tag was never used widely, there
		could be devices which report to support those features but choke when
		actually given such requests.

		This was removed for two reasons: 1. it's overkill; 2. it's
		impossible to implement properly when TAG ordering is used, as low
		level drivers resume after an error automatically. If it's ever
		needed, adding it back and modifying low level drivers accordingly
		shouldn't be difficult.

Documentation/filesystems/fuse.txt

+63 −0

		@@ -86,6 +86,62 @@ Mount options
		The default is infinite. Note that the size of read requests is
		limited anyway to 32 pages (which is 128kbyte on i386).

		Sysfs
		~~~~~

		FUSE sets up the following hierarchy in sysfs:

		/sys/fs/fuse/connections/N/

		where N is an increasing number allocated to each new connection.

		For each connection the following attributes are defined:

		'waiting'

		The number of requests which are waiting to be transferred to
		userspace or being processed by the filesystem daemon. If there is
		no filesystem activity and 'waiting' is non-zero, then the
		filesystem is hung or deadlocked.

		'abort'

		Writing anything into this file will abort the filesystem
		connection. This means that all waiting requests will be aborted and
		an error returned for all aborted and new requests.

		Only a privileged user may read or write these attributes.
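
A minimal sketch of aborting connection N from a C program, assuming the
sysfs layout described above. The helper names fuse_abort_path() and
fuse_abort_connection() are made up for this example, and the value
written is arbitrary since writing anything triggers the abort.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "/sys/fs/fuse/connections/<conn>/abort"; returns snprintf()'s result. */
int fuse_abort_path(char *buf, size_t len, unsigned int conn)
{
	assert(buf != NULL);
	return snprintf(buf, len, "/sys/fs/fuse/connections/%u/abort", conn);
}

/* Writes to the 'abort' attribute; returns 0 on success, -1 on failure. */
int fuse_abort_connection(unsigned int conn)
{
	char path[64];
	FILE *fp;
	int n = fuse_abort_path(path, sizeof(path), conn);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fp = fopen(path, "w");	/* fails unless the caller is privileged */
	if (!fp)
		return -1;

	if (fputs("1\n", fp) == EOF) {
		fclose(fp);
		return -1;
	}
	return fclose(fp) == 0 ? 0 : -1;
}
```

The connection number has to be found by the caller, for instance by
listing /sys/fs/fuse/connections/.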

		Aborting a filesystem connection
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

		It is possible to get into certain situations where the filesystem is
		not responding. Reasons for this may be:

		a) Broken userspace filesystem implementation

		b) Network connection down

		c) Accidental deadlock

		d) Malicious deadlock

		(For more on c) and d) see later sections)

		In any of these cases it may be useful to abort the connection to
		the filesystem. There are several ways to do this:

		- Kill the filesystem daemon. Works in case of a) and b)

		- Kill the filesystem daemon and all users of the filesystem. Works
		in all cases except some malicious deadlocks

		- Use forced umount (umount -f). Works in all cases but only if
		filesystem is still attached (it hasn't been lazy unmounted)

		- Abort filesystem through the sysfs interface. Most powerful
		method, always works.

		How do non-privileged mounts work?
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

		@@ -313,3 +369,10 @@ faulted with get_user_pages(). The 'req->locked' flag indicates
		when the copy is taking place, and interruption is delayed until
		this flag is unset.

		Scenario 3 - Tricky deadlock with asynchronous read
		---------------------------------------------------

		The same situation as above, except thread-1 will wait on page lock
		and hence it will be uninterruptible as well. The solution is to
		abort the connection with forced umount (if mount is attached) or
		through the abort attribute in sysfs.

Documentation/video4linux/CARDLIST.tuner

+1 −0

		@@ -68,3 +68,4 @@ tuner=66 - LG NTSC (TALN mini series)
		tuner=67 - Philips TD1316 Hybrid Tuner
		tuner=68 - Philips TUV1236D ATSC/NTSC dual in
		tuner=69 - Tena TNF 5335 MF
		tuner=70 - Samsung TCPN 2121P30A

MAINTAINERS

+5 −3

		@@ -1696,11 +1696,13 @@ M: mtk-manpages@gmx.net
		W: ftp://ftp.kernel.org/pub/linux/docs/manpages
		S: Maintained

		MARVELL MV64340 ETHERNET DRIVER
		MARVELL MV643XX ETHERNET DRIVER
		P: Dale Farnsworth
		M: dale@farnsworth.org
		P: Manish Lachwani
		L: linux-mips@linux-mips.org
		M: mlachwani@mvista.com
		L: netdev@vger.kernel.org
		S: Supported
		S: Odd Fixes for 2.4; Maintained for 2.6.

		MATROX FRAMEBUFFER DRIVER
		P: Petr Vandrovec

Makefile

+24 −15

		VERSION = 2
		PATCHLEVEL = 6
		SUBLEVEL = 15
		EXTRAVERSION =
		SUBLEVEL = 16
		EXTRAVERSION =-rc1
		NAME=Sliding Snow Leopard

		# DOCUMENTATION
		@@ -106,12 +106,13 @@ KBUILD_OUTPUT := $(shell cd $(KBUILD_OUTPUT) && /bin/pwd)
		$(if $(KBUILD_OUTPUT),, \
		$(error output directory "$(saved-output)" does not exist))

		.PHONY: $(MAKECMDGOALS)
		.PHONY: $(MAKECMDGOALS) cdbuilddir
		$(MAKECMDGOALS) _all: cdbuilddir

		$(filter-out _all,$(MAKECMDGOALS)) _all:
		cdbuilddir:
		$(if $(KBUILD_VERBOSE:1=),@)$(MAKE) -C $(KBUILD_OUTPUT) \
		KBUILD_SRC=$(CURDIR) \
		KBUILD_EXTMOD="$(KBUILD_EXTMOD)" -f $(CURDIR)/Makefile $@
		KBUILD_EXTMOD="$(KBUILD_EXTMOD)" -f $(CURDIR)/Makefile $(MAKECMDGOALS)

		# Leave processing to above invocation of make
		skip-makefile := 1
		@@ -262,6 +263,13 @@ export quiet Q KBUILD_VERBOSE
		# cc support functions to be used (only) in arch/$(ARCH)/Makefile
		# See documentation in Documentation/kbuild/makefiles.txt

		# as-option
		# Usage: cflags-y += $(call as-option, -Wa$(comma)-isa=foo,)

		as-option = $(shell if $(CC) $(CFLAGS) $(1) -Wa,-Z -c -o /dev/null \
		-xassembler /dev/null > /dev/null 2>&1; then echo "$(1)"; \
		else echo "$(2)"; fi ;)

		# cc-option
		# Usage: cflags-y += $(call cc-option, -march=winchip-c6, -march=i586)

		@@ -337,8 +345,9 @@ AFLAGS := -D__ASSEMBLY__

		# Read KERNELRELEASE from .kernelrelease (if it exists)
		KERNELRELEASE = $(shell cat .kernelrelease 2> /dev/null)
		KERNELVERSION = $(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)

		export VERSION PATCHLEVEL SUBLEVEL KERNELRELEASE \
		export VERSION PATCHLEVEL SUBLEVEL KERNELRELEASE KERNELVERSION \
		ARCH CONFIG_SHELL HOSTCC HOSTCFLAGS CROSS_COMPILE AS LD CC \
		CPP AR NM STRIP OBJCOPY OBJDUMP MAKE AWK GENKSYMS PERL UTS_MACHINE \
		HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
		@@ -433,6 +442,7 @@ export KBUILD_DEFCONFIG
		config %config: scripts_basic outputmakefile FORCE
		$(Q)mkdir -p include/linux
		$(Q)$(MAKE) $(build)=scripts/kconfig $@
		$(Q)$(MAKE) .kernelrelease

		else
		# ===========================================================================
		@@ -542,7 +552,7 @@ export INSTALL_PATH ?= /boot
		# makefile but the arguement can be passed to make if needed.
		#

		MODLIB := $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)
		MODLIB = $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)
		export MODLIB


		@@ -783,12 +793,10 @@ endif
		localver-full = $(localver)$(localver-auto)

		# Store (new) KERNELRELASE string in .kernelrelease
		kernelrelease = \
		$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)$(localver-full)
		kernelrelease = $(KERNELVERSION)$(localver-full)
		.kernelrelease: FORCE
		$(Q)rm -f .kernelrelease
		$(Q)echo $(kernelrelease) > .kernelrelease
		$(Q)echo " Building kernel $(kernelrelease)"
		$(Q)rm -f $@
		$(Q)echo $(kernelrelease) > $@


		# Things we need to do before we recursively start building the kernel
		@@ -898,7 +906,7 @@ define filechk_version.h
		)
		endef

		include/linux/version.h: $(srctree)/Makefile FORCE
		include/linux/version.h: $(srctree)/Makefile .config FORCE
		$(call filechk,version.h)

		# ---------------------------------------------------------------------------
		@@ -1301,9 +1309,10 @@ checkstack:
		$(PERL) $(src)/scripts/checkstack.pl $(ARCH)

		kernelrelease:
		@echo $(KERNELRELEASE)
		$(if $(wildcard .kernelrelease), $(Q)echo $(KERNELRELEASE), \
		$(error kernelrelease not valid - run 'make *config' to update it))
		kernelversion:
		@echo $(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)
		@echo $(KERNELVERSION)

		# FIXME Should go into a make.lib or something
		# ===========================================================================
