"Definition" is misspelled "defintion" in several comments; this
patch fixes them. No code changes.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
If the session is reset during state recovery, the state manager thread can
sleep on the slot_tbl_waitq causing a deadlock.
Add a completion framework to the session. Have the state manager thread set
a new session state (NFS4CLNT_SESSION_DRAINING) and wait for the session slot
table to drain.
Signal the state manager thread in nfs41_sequence_free_slot when the
NFS4CLNT_SESSION_DRAINING bit is set and the session is drained.
Reported-by: Trond Myklebust <trond@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch modifies scsi_host_template->change_queue_depth so that
it takes an argument indicating why it is being called. This will be
used so that if a LLD needs to do some extra processing when
handling queue fulls or later ramp ups, it can do so.
This is a simple port of the drivers setting a change_queue_depth
callback. In the patch I just have these LLDs adjust the queue depth
if the user was requesting it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
[Vasu.Dev: v2
Also converted pmcraid_change_queue_depth and then verified
all modules compile using "make allmodconfig" for any new build
warnings on X86_64.
Updated original description after combing two original
patches from Mike to make this patch git bisectable.]
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
[jejb: fixed up 53c700]
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
With CLONE_IO, parent's io_context->nr_tasks is incremented, but never
decremented whenever copy_process() fails afterwards, which prevents
exit_io_context() from calling IO schedulers exit functions.
Give a task_struct to exit_io_context(), and call exit_io_context() instead of
put_io_context() in copy_process() cleanup path.
Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Most users (except sh_mobile_lcdcfb.c) get their framebuffer from
vmalloc. Setting VM_IO is not necessary as the memory obtained
from vmalloc is System RAM type and is not susceptible to PCI memory
constraints.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jaya Kumar <jayakumar.lkml@gmail.com>
Provide common routine for the transition of operational state for a leaf
device during a root device transition.
Signed-off-by: Patrick Mullaney <pmullaney@novell.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using remote wakeup and delayed transmission to allow
online device to go into usb autosuspend.
Minimal alternate support for devices that don't support
remote wakeup.
Signed-off-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ata_set_lba_range_entries used the variable max for two different things
which was confusing. Make the function take a buffer size in bytes as
argument and return the used buffer size upon completion.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Our current TRIM payload is a single sector that can accommodate 64 *
65535 blocks being unmapped. Report this value in the Block Limits
Maximum Unmap LBA count field.
If a storage device supports TRIM and the DRAT and RZAT bits are set,
report TPRZ=1 in Read Capacity(16).
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch cleans up the VPD code by creating preprocessor definitions
and using them in the place of hardcoded constants.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel sometimes makes RPC calls to services that aren't running.
Because the kernel's RPC client always assumes the hard retry semantic
when reconnecting a connection-oriented RPC transport, the underlying
reconnect logic takes a long while to time out, even though the remote
may have responded immediately with ECONNREFUSED.
In certain cases, like upcalls to our local rpcbind daemon, or for NFS
mount requests, we'd like the kernel to fail immediately if the remote
service isn't reachable. This allows another transport to be tried
immediately, or the pending request can be abandoned quickly.
Introduce a per-request flag which controls how call_transmit_status()
behaves when request transmission fails because the server cannot be
reached.
We don't want soft connection semantics to apply to other errors. The
default case of the switch statement in call_transmit_status() no
longer falls through; the fall through code is copied to the default
case, and a "break;" is added.
The transport's connection re-establishment timeout is also ignored for
such requests. We want the request to fail immediately, so the
reconnect delay is skipped. Additionally, we don't want a connect
failure here to further increase the reconnect timeout value, since
this request will not be retried.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
reorder nfs4_sequence_args to remove 8 bytes of padding on 64 bit
builds.
The size of this structure drops to 24 bytes from 32 and reduces the
text size of nfs.ko.
On my x86_64 size reports
text data bss
2.6.32-rc5 200996 8512 432 209940 33414 nfs.ko
+patch 200884 8512 432 209828 333a4 nfs.ko
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
commit 8ec1e0ebe26087bfc5c0394ada5feb5758014fc8
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:16:35 2009 +0100
ipv4: add sysctl to accept packets with local source addresses
Change fib_validate_source() to accept packets with a local source address when
the "accept_local" sysctl is set for the incoming inet device. Combined with the
previous patches, this allows to communicate between multiple local interfaces
over the wire.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 68144d350f4f6c348659c825cde6a82b34c27a91
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:05:25 2009 +0100
net: fib_rules: add oif classification
Support routing table lookup based on the flow's oif. This is useful to
classify packets originating from sockets bound to interfaces differently.
The route cache already includes the oif and needs no changes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 229e77eec406ad68662f18e49fda8b5d366768c5
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:05:23 2009 +0100
net: fib_rules: rename ifindex/ifname/FRA_IFNAME to iifindex/iifname/FRA_IIFNAME
The next patch will add oif classification, rename interface related members
and attributes to reflect that they're used for iif classification.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were never able to get docs for this out of Toshiba for years. Dave
Barnes produced a NetBSD driver however and from that we can fill in the
needed tables.
As we correct the PCI identifiers a bit also update the old ide generic driver
at the same time so it stays compiling.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
o This is basic implementation of blkio controller cgroup interface. This is
the common interface visible to user space and should be used by different
IO control policies as we implement those.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There are two spare field in the header common to all GFS2
metadata. One is just the right size to fit a journal id
in it, and this patch updates the journal code so that each
time a metadata block is modified, we tag it with the journal
id of the node which is performing the modification.
The reason for this is that it should make it much easier to
debug issues which arise if we can tell which node was the
last to modify a particular metadata block.
Since the field is updated before the block is written into
the journal, each journal should only contain metadata which
is tagged with its own journal id. The one exception to this
is the journal header block, which might have a different node's
id in it, if that journal was recovered by another node in the
cluster.
Thus each journal will contain a record of which nodes recovered
it, via the journal header.
The other field in the metadata header could potentially be
used to hold information about what kind of operation was
performed, but for the time being we just zero it on each
transaction so that if we use it for that in future, we'll
know that the information (where it exists) is reliable.
I did consider using the other field to hold the journal
sequence number, however since in GFS2's journaling we write
the modified data into the journal and not the original
data, this gives no information as to what action caused the
modification, so I think we can probably come up with a better
use for those 64 bits in the future.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Sending a message to userspace in a generic format to warn
of events (e.g. quota exceeded) in the quota subsystem is
a generically useful feature. This patch makes some minor
changes to the send_message function from dquot.c renaming
it quota_send_message, moving it to quota.c and exporting it
for use by filesystems which do not use the dquot code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is required for cluster filesystems which want to use
cached ACLs so that they can invalidate the cache when
required.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
The discard ioctl is used by mkfs utilities to clear a block device
prior to putting metadata down. However, not all devices return zeroed
blocks after a discard. Some drives return stale data, potentially
containing old superblocks. It is therefore important to know whether
discarded blocks are properly zeroed.
Both ATA and SCSI drives have configuration bits that indicate whether
zeroes are returned after a discard operation. Implement a block level
interface that allows this information to be bubbled up the stack and
queried via a new block device ioctl.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Add support for the ATA TRIM command in libata. We translate a WRITE SAME 16
command with the unmap bit set into an ATA TRIM command and export enough
information in READ CAPACITY 16 and the block limits EVPD page so that the new
SCSI layer discard support will driver this for us.
Note that I hardcode the WRITE_SAME_16 opcode for now as the patch to introduce
the symbolic is not in 2.6.32 yet but only in the SCSI tree - as soon as it is
merged we can fix it up to properly use the symbolic name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
If ATA device failed FLUSH, it means that the device failed to write
out some amount of data and the error needs to be reported to upper
layers. As retries can't recover the lost data, FLUSH failures need to
be reported immediately in general.
However, if FLUSH fails due to transmission errors, the FLUSH needs to
be retried; otherwise, filesystems may switch to RO mode and/or raid
array may drop a drive for a random transmission glitch.
This condition can be rather easily reproduced on certain ahci
controllers which go through a PHY event after powersave mode switch +
ext4 combination. Powersave mode switch is often closely followed by
flush from the filesystem failing the FLUSH with ATA bus error which
makes the filesystem code believe that data is lost and drop to RO
mode. This was reported in the following bugzilla bug.
http://bugzilla.kernel.org/show_bug.cgi?id=14543
This patch makes libata EH retry FLUSH if it wasn't failed by the
device.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch moves s390 processor status word into the base kvm_run
struct and keeps it up-to date on all userspace exits.
The userspace ABI is broken by this, however there are no applications
in the wild using this. A capability check is provided so users can
verify the updated API exists.
Cc: stable@kernel.org
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This new IOCTL exports all yet user-invisible states related to
exceptions, interrupts, and NMIs. Together with appropriate user space
changes, this fixes sporadic problems of vmsave/restore, live migration
and system reset.
[avi: future-proof abi by adding a flags field]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
These happen when we trap an exception when another exception is being
delivered; we only expect these with MCEs and page faults. If something
unexpected happens, things probably went south and we're better off reporting
an internal error and freezing.
Signed-off-by: Avi Kivity <avi@redhat.com>
Usually userspace will freeze the guest so we can inspect it, but some
internal state is not available. Add extra data to internal error
reporting so we can expose it to the debugger. Extra data is specific
to the suberror.
Signed-off-by: Avi Kivity <avi@redhat.com>
Obviously, people tend to extend this header at the bottom - more or
less blindly. Ensure that deprecated stuff gets its own corner again by
moving things to the top. Also add some comments and reindent IOCTLs to
make them more readable and reduce the risk of number collisions.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.
Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.
This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and
more importantly, never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.
[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.
A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future. Thus this patch
adds special support for the Xen HVM MSR.
I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files. When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.
I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.
[jan: fix i386 build warning]
[avi: future proof abi with a flags field]
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing
once the cpu detects pause-based looping.
Signed-off-by: "Zhai, Edwin" <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).
Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.
To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.
So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Maintain back mapping from irqchip/pin to gsi to speedup
interrupt acknowledgment notifications.
[avi: build fix on non-x86/ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Use gsi indexed array instead of scanning all entries on each interrupt
injection.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This removes assumptions that max GSIs is smaller than number of pins.
Sharing is tracked on pin level not GSI level.
[avi: no PIC on ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>