Commit Graph

2047 Commits

Author SHA1 Message Date
Boaz Harrosh
5e383255da pnfs-obj: Must return layout on IO error
commit fe0fe83585f88346557868a803a479dfaaa0688a upstream.

As mandated by the standard. In case of an IO error, a pNFS
objects layout driver must return it's layout. This is because
all device errors are reported to the server as part of the
layout return buffer.

This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
is done, through a bit flag on the pnfs_layoutdriver_type->flags
member. The flag is set by the layout driver that wants a
layout_return preformed at pnfs_ld_{write,read}_done in case
of an error.
(Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
 because this code is never called outside of pnfs.c and pnfs IO
 paths)

Without this patch 3.[0-2] Kernels leak memory and have an annoying
WARN_ON after every IO error utilizing the pnfs-obj driver.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25 17:25:03 -08:00
Boaz Harrosh
8edf7c1351 pnfs-obj: pNFS errors are communicated on iodata->pnfs_error
commit 5c0b4129c07b902b27d3f3ebc087757f534a3abd upstream.

Some time along the way pNFS IO errors were switched to
communicate with a special iodata->pnfs_error member instead
of the regular RPC members. But objlayout was not switched
over.

Fix that!
Without this fix any IO error is hanged, because IO is not
switched to MDS and pages are never cleared or read.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25 17:25:02 -08:00
Jeff Layton
dbbef3cbde nfs: fix regression in handling of context= option in NFSv4
commit 8a0d551a59ac92d8ff048d6cb29d3a02073e81e8 upstream.

Setting the security context of a NFSv4 mount via the context= mount
option is currently broken. The NFSv4 codepath allocates a parsed
options struct, and then parses the mount options to fill it. It
eventually calls nfs4_remote_mount which calls security_init_mnt_opts.
That clobbers the lsm_opts struct that was populated earlier. This bug
also looks like it causes a small memory leak on each v4 mount where
context= is used.

Fix this by moving the initialization of the lsm_opts into
nfs_alloc_parsed_mount_data. Also, add a destructor for
nfs_parsed_mount_data to make it easier to free all of the allocations
hanging off of it, and to ensure that the security_free_mnt_opts is
called whenever security_init_mnt_opts is.

I believe this regression was introduced quite some time ago, probably
by commit c02d7adf.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25 17:24:36 -08:00
Andy Adamson
58a902db88 NFSv4.1: fix backchannel slotid off-by-one bug
commit 61f2e5106582d02f30b6807e3f9c07463c572ccb upstream.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-25 17:24:36 -08:00
Trond Myklebust
746b9ba617 NFSv4.1: Ensure that we handle _all_ SEQUENCE status bits.
commit 111d489f0fb431f4ae85d96851fbf8d3248c09d8 upstream.

Currently, the code assumes that the SEQUENCE status bits are mutually
exclusive. They are not...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 14:13:52 -08:00
Trond Myklebust
ee6dfa64be NFS: Prevent 3.0 from crashing if it receives a partial layout
This is a backport of critical parts of
commit 7c24d9489f "NFSv4.1: File layout only supports whole file layouts"

It prevents the file layout driver from (incorrectly) using
partial layouts, but ignores the part of the referenced commmit that
relies on additional machinery to change the LAYOUTGET request
based on layout driver.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:46 -08:00
Jeff Layton
124e35242a nfs: when attempting to open a directory, fall back on normal lookup (try #5)
commit 1788ea6e3b2a58cf4fb00206e362d9caff8d86a7 upstream.

commit d953126 changed how nfs_atomic_lookup handles an -EISDIR return
from an OPEN call. Prior to that patch, that caused the client to fall
back to doing a normal lookup. When that patch went in, the code began
returning that error to userspace. The d_revalidate codepath however
never had the corresponding change, so it was still possible to end up
with a NULL ctx->state pointer after that.

That patch caused a regression. When we attempt to open a directory that
does not have a cached dentry, that open now errors out with EISDIR. If
you attempt the same open with a cached dentry, it will succeed.

Fix this by reverting the change in nfs_atomic_lookup and allowing
attempts to open directories to fall back to a normal lookup

Also, add a NFSv4-specific f_ops->open routine that just returns
-ENOTDIR. This should never be called if things are working properly,
but if it ever is, then the dprintk may help in debugging.

To facilitate this, a new file_operations field is also added to the
nfs_rpc_ops struct.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26 09:09:59 -08:00
Trond Myklebust
6c36547293 VFS: Fix the remaining automounter semantics regressions
commit 815d405ceff0d6964683f033e18b9b23a88fba87 upstream.

The concensus seems to be that system calls such as stat() etc should
not trigger an automount.  Neither should the l* versions.

This patch therefore adds a LOOKUP_AUTOMOUNT flag to tag those lookups
that _should_ trigger an automount on the last path element.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ Edited to leave out the cases that are already covered by LOOKUP_OPEN,
  LOOKUP_DIRECTORY and LOOKUP_CREATE - all of which also fundamentally
  force automounting for their own reasons   - Linus ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:36:22 -08:00
Jeff Layton
60eb9e7d1a nfs: don't try to migrate pages with active requests
commit 2da956523526e440ef4f4dd174e26f5ac06fe011 upstream.

nfs_find_and_lock_request will take a reference to the nfs_page and
will then put it if the req is already locked. It's possible though
that the reference will be the last one. That put then can kick off
a whole series of reference puts:

nfs_page
   nfs_open_context
      dentry
          inode

If the inode ends up being deleted, then the VFS will call
truncate_inode_pages. That function will try to take the page lock, but
it was already locked when migrate_page was called. The code
deadlocks.

Fix this by simply refusing the migration request if PagePrivate is
already set, indicating that the page is already associated with an
active read or write request.

We've had a customer test a backported version of this patch and
the preliminary results seem good.

Cc: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:35:54 -08:00
Jeff Layton
363d0372b3 nfs: don't redirty inode when ncommit == 0 in nfs_commit_unstable_pages
commit 3236c3e1adc0c7ec83eaff1de2d06746b7c5bb28 upstream.

commit 420e3646 allowed the kernel to reduce the number of unnecessary
commit calls by skipping the commit when there are a large number of
outstanding pages.

However, the current test in nfs_commit_unstable_pages does not handle
the edge condition properly. When ncommit == 0, then that means that the
kernel doesn't need to do anything more for the inode. The current test
though in the WB_SYNC_NONE case will return true, and the inode will end
up being marked dirty. Once that happens the inode will never be clean
until there's a WB_SYNC_ALL flush.

Fix this by immediately returning from nfs_commit_unstable_pages when
ncommit == 0.

Mike noticed this problem initially in RHEL5 (2.6.18-based kernel) which
has a backported version of 420e3646. The inode cache there was growing
very large. The inode cache was unable to be shrunk since the inodes
were all marked dirty. Calling sync() would essentially "fix" the
problem -- the WB_SYNC_ALL flush would result in the inodes all being
marked clean.

What I'm not clear on is how big a problem this is in mainline kernels
as the writeback code there is very different. Either way, it seems
incorrect to re-mark the inode dirty in this case.

Reported-by: Mike McLean <mikem@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:35:50 -08:00
Trond Myklebust
b04a57568d Revert "NFS: Ensure that writeback_single_inode() calls write_inode() when syncing"
commit 59b7c05fffba030e5d9e72324691e2f99aa69b79 upstream.

This reverts commit b80c3cb628.

The reverted commit was rendered obsolete by a VFS fix: commit
5547e8aac6 (writeback: Update dirty flags in
two steps). We now no longer need to worry about writeback_single_inode()
missing our marking the inode for COMMIT in 'do_writepages()' call.

Reverting this patch, fixes a performance regression in which the inode
would continuously get queued to the dirty list, causing the writeback
code to unnecessarily try to send a COMMIT.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:35:49 -08:00
Trond Myklebust
e25d2c749d NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resets
commit 910ac68a2b80c7de95bc8488734067b1bb15d583 upstream.

If the client is in the process of resetting the session when it receives
a callback, then returning NFS4ERR_DELAY may cause a deadlock with the
DESTROY_SESSION call.

Basically, if the client returns NFS4ERR_DELAY in response to the
CB_SEQUENCE call, then the server is entitled to believe that the
client is busy because it is already processing that call. In that
case, the server is perfectly entitled to respond with a
NFS4ERR_BACK_CHAN_BUSY to any DESTROY_SESSION call.

Fix this by having the client reply with a NFS4ERR_BADSESSION in
response to the callback if it is resetting the session.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-29 13:29:08 -07:00
Trond Myklebust
f4bc412bc2 NFSv4.1: Fix the callback 'highest_used_slotid' behaviour
commit 55a673990ec04cf63005318bcf08c2b0046e5778 upstream.

Currently, there is no guarantee that we will call nfs4_cb_take_slot() even
though nfs4_callback_compound() will consistently call
nfs4_cb_free_slot() provided the cb_process_state has set the 'clp' field.
The result is that we can trigger the BUG_ON() upon the next call to
nfs4_cb_take_slot().

This patch fixes the above problem by using the slot id that was taken in
the CB_SEQUENCE operation as a flag for whether or not we need to call
nfs4_cb_free_slot().
It also fixes an atomicity problem: we need to set tbl->highest_used_slotid
atomically with the check for NFS4_SESSION_DRAINING, otherwise we end up
racing with the various tests in nfs4_begin_drain_session().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-29 13:29:08 -07:00
Boaz Harrosh
b861a2580d pnfs-obj: Bug when we are running out of bio
commit 20618b21da0796115e81906d24ff1601552701b7 upstream.

When we have a situation that the number of pages we want
to encode is bigger then the size of the bio. (Which can
currently happen only when all IO is going to a single device
.e.g group_width==1) then the IO is submitted short and we
report back only the amount of bytes we actually wrote/read
and all is fine. BUT ...

There was a bug that the current length counter was advanced
before the fail to add the extra page, and we come to a situation
that the CDB length was one-page longer then the actual bio size,
which is of course rejected by the osd-target.

While here also fix the bio size calculation, in the case
that we received more then one group of devices.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-29 13:29:08 -07:00
Boaz Harrosh
3be1216c9e pnfs-obj: Fix the comp_index != 0 case
commit 9af7db3228acc286c50e3a0f054ec982efdbc6c6 upstream.

There were bugs in the case of partial layout where olo_comp_index
is not zero. This used to work and was tested but one of the later
cleanup SQUASHMEs broke it and was not tested since.

Also add a dprint that specify those received layout parameters.
Everything else was already printed.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-29 13:29:07 -07:00
Trond Myklebust
c14acb19a4 NFS: Fix spurious readdir cookie loop messages
commit 0c0308066ca53fdf1423895f3a42838b67b3a5a8 upstream.

If the directory contents change, then we have to accept that the
file->f_pos value may shrink if we do a 'search-by-cookie'. In that
case, we should turn off the loop detection and let the NFS client
try to recover.

The patch also fixes a second loop detection bug by ensuring
that after turning on the ctx->duped flag, we read at least one new
cookie into ctx->dir_cookie before attempting to match with
ctx->dup_cookie.

Reported-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04 21:58:40 -07:00
Trond Myklebust
1fcb9d4b03 NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation()
commit ed1e6211a0a134ff23592c6f057af982ad5dab52 upstream.

nfs_mark_return_delegation() is usually called without any locking, and
so it is not safe to dereference delegation->inode. Since the inode is
only used to discover the nfs_client anyway, it makes more sense to
have the callers pass a valid pointer to the nfs_server as a parameter.

Reported-by: Ian Kent <raven@themaw.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04 21:58:40 -07:00
Peng Tao
40a3e9966c pnfs: use lwb as layoutcommit length
commit 3557c6c3be5b2ca0b11365db7f8a813253eb520b upstream.

Using NFS4_MAX_UINT64 will break current protocol.

[Needed in v3.0]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04 21:58:37 -07:00
Peng Tao
892cd4a38f pnfs: let layoutcommit handle a list of lseg
commit a9bae5666d0510ad69bdb437371c9a3e6b770705 upstream.

There can be multiple lseg per file, so layoutcommit should be
able to handle it.

[Needed in v3.0]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04 21:58:37 -07:00
Peng Tao
f45c1d4647 pnfs: save layoutcommit cred at layout header init
commit 9fa4075878a5faac872a63f4a97ce79c776264e9 upstream.

No need to save it for every lseg.
No need to save it at every pnfs_set_layoutcommit.

[Needed in v3.0]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04 21:58:37 -07:00
Peng Tao
a14f191cda pnfs: save layoutcommit lwb at layout header
commit acff5880539fe33897d016c0f3dcf062e67c61b6 upstream.

No need to save it for every lseg.

[Needed in v3.0]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-04 21:58:37 -07:00
Linus Torvalds
201f92e2ca Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  SUNRPC: Fix use of static variable in rpcb_getport_async
  NFSv4.1: update nfs4_fattr_bitmap_maxsz
  SUNRPC: Fix a race between work-queue and rpc_killall_tasks
  pnfs: write: Set mds_offset in the generic layer - it is needed by all LDs
2011-07-13 14:34:08 -07:00
Andy Adamson
e5012d1f38 NFSv4.1: update nfs4_fattr_bitmap_maxsz
Attribute IDs assigned in RFC 5661 now require three bitmaps.
Fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs.

Signed-off-by: Andy Adamson <andros@netapp.com>
Cc:stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-11 19:14:38 -04:00
David Howells
c902ce1bfb FS-Cache: Add a helper to bulk uncache pages on an inode
Add an FS-Cache helper to bulk uncache pages on an inode.  This will
only work for the circumstance where the pages in the cache correspond
1:1 with the pages attached to an inode's page cache.

This is required for CIFS and NFS: When disabling inode cookie, we were
returning the cookie and setting cifsi->fscache to NULL but failed to
invalidate any previously mapped pages.  This resulted in "Bad page
state" errors and manifested in other kind of errors when running
fsstress.  Fix it by uncaching mapped pages when we disable the inode
cookie.

This patch should fix the following oops and "Bad page state" errors
seen during fsstress testing.

  ------------[ cut here ]------------
  kernel BUG at fs/cachefiles/namei.c:201!
  invalid opcode: 0000 [#1] SMP
  Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
  RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  RSP: 0018:ffff88002ce6dd00  EFLAGS: 00010282
  RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
  RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
  R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
  R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
  FS:  00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
  Stack:
   0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
   ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
   ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
  Call Trace:
   cachefiles_lookup_object+0x78/0xd4 [cachefiles]
   fscache_lookup_object+0x131/0x16d [fscache]
   fscache_object_work_func+0x1bc/0x669 [fscache]
   process_one_work+0x186/0x298
   worker_thread+0xda/0x15d
   kthread+0x84/0x8c
   kernel_thread_helper+0x4/0x10
  RIP  cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  ---[ end trace 1d481c9af1804caa ]---

I tested the uncaching by the following means:

 (1) Create a big file on my NFS server (104857600 bytes).

 (2) Read the file into the cache with md5sum on the NFS client.  Look in
     /proc/fs/fscache/stats:

	Pages  : mrk=25601 unc=0

 (3) Open the file for read/write ("bash 5<>/warthog/bigfile").  Look in proc
     again:

	Pages  : mrk=25601 unc=25601

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-07 13:21:56 -07:00
Boaz Harrosh
2bea038c52 pnfs: write: Set mds_offset in the generic layer - it is needed by all LDs
In current pnfs tree, all the layouts set mds_offset in their
.write_pagelist member.
mds_offset is only used by generic layer and should be handled by it.

This patch is for upstream. It is needed in this -rc series to fix a
bug in objects layout_commit.

I'll send patches for objects and blocks to be
squashed into current pnfs tree.

TODO: It looks like the read path needs the same patch.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-28 14:12:11 -04:00
Bryan Schumaker
1650add235 NFS: Fix decode_secinfo_maxsz
I initially did the calculation in bytes, and not words

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-21 11:54:07 -04:00
Trond Myklebust
19982ba856 NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_test
And document what is going on there...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-21 11:54:06 -04:00
Trond Myklebust
8f7d5efbef NFSv4.1: Fix some issues with pnfs_generic_pg_test
1. If the intention is to coalesce requests 'prev' and 'req' then we
   have to ensure at least that we have a layout starting at
   req_offset(prev).

2. If we're only requesting a minimal layout of length desc->pg_count,
   we need to test the length actually returned by the server before
   we allow the coalescing to occur.

3. We need to deal correctly with (pgio->lseg == NULL)

4. Fixup the test guarding the pnfs_update_layout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-21 11:54:05 -04:00
Benny Halevy
19345cb299 NFSv4.1: file layout must consider pg_bsize for coalescing
Otherwise we end up overflowing the rpc buffer size on the receive end.

Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-20 16:12:26 -04:00
Boaz Harrosh
df18d127f4 pnfs-obj: No longer needed to take an extra ref at add_device
Andy's last device_cache patches, already take an extra
reference on the newly inserted device_id. So we can remove it
from obj-io.

Without this patch the device_ids are leaked.

Andy's patches are not in Linus tree yet. So I'm not sure if they are
scheduled for this Kernel or the next. This patch should be added as
part of these.

CC: Andy Adamson <andros@netapp.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-19 14:49:51 -04:00
Trond Myklebust
ee7b75fc4f NFSv4: Fix a readdir regression
Commit 7ebb9315 (NFS: use secinfo when crossing mountpoints) introduces
a regression when decoding an NFSv4 readdir entry that sets the
rdattr_error field.
By treating the resulting value as if it is a decoding error, the current
code may cause us to skip valid readdir entries.

Reported-by: Andy Adamson <andros@netapp.com>
Cc: stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-16 13:24:47 -04:00
Fred Isaman
9e2dfdb308 nfs4.1: mark layout as bad on error path in _pnfs_return_layout
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 14:36:33 -04:00
Fred Isaman
ea0ded748b nfs4.1: prevent race that allowed use of freed layout in _pnfs_return_layout
mark_matching_lsegs_invalid could put the last ref to the layout, so
the get_layout_hdr needs to be called first.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 12:39:23 -04:00
Benny Halevy
1ed3a8539a NFSv4.1: need to put_layout_hdr on _pnfs_return_layout error path
We always get a reference on the layout header and we rely on
nfs4_layoutreturn_release to put it.  If we hit an allocation error
before starting the rpc proc we bail out early without dereferncing
the layout header properly.

Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:52:15 -04:00
David Howells
c7fd06228b NFS: (d)printks should use %zd for ssize_t arguments
(d)printks should use %zd for ssize_t arguments not %ld, otherwise they might
get a warning.  I see the following with MN10300.

fs/nfs/objlayout/objlayout.c: In function 'objlayout_read_done':
fs/nfs/objlayout/objlayout.c:294: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Trond Myklebust <Trond.Myklebust@netapp.com>
cc: linux-nfs@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:31 -04:00
Benny Halevy
d771e3a43e NFSv4.1: fix break condition in pnfs_find_lseg
The break condition to skip out of the loop got broken when cmp_layout
was change.  Essentially, we want to stop looking once we know no layout
on the remainder of the list can match the first byte of the looked-up
range.

Reported-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:31 -04:00
Fred Isaman
a2e1d4f2e5 nfs4.1: fix several problems with _pnfs_return_layout
_pnfs_return_layout had the following problems:

- it did not call pnfs_free_lseg_list on all paths
- it unintentionally did a forgetful return when there was no outstanding io
- it raced with concurrent LAYOUTGETS

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:30 -04:00
Andy Adamson
cec765cf58 NFSv4.1: allow zero fh array in filelayout decode layout
Signed-off-by: Andy Adamson <andros@netapp.com>
cc:stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:30 -04:00
Andy Adamson
533eb4611c NFSv4.1: allow nfs_fhget to succeed with mounted on fileid
Commit 28331a46d8 "Ensure we request the
ordinary fileid when doing readdirplus"
changed the meaning of NFS_ATTR_FATTR_FILEID which used to be set when
FATTR4_WORD1_MOUNTED_ON_FILED was requested.

Allow nfs_fhget to succeed with only a mounted on fileid when crossing
a mountpoint or a referral.

Ask for the fileid of the absent file system if mounted_on_fileid is not
supported.

Signed-off-by: Andy Adamson <andros@netapp.com>
cc:stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:29 -04:00
Trond Myklebust
1d92a08da2 NFSv4.1: Fix a refcounting issue in the pNFS device id cache
When we add something to the global device id cache, we need to bump the
reference count, so that the cache itself holds a reference.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:29 -04:00
Benny Halevy
c9c30dd5f7 NFSv4.1: deprecate headerpadsz in CREATE_SESSION
We don't support header padding yet so better off ditching it

Reported-by: Sid Moore <learnmost@gmail.com>
Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:28 -04:00
Peng Tao
0f66b5984d NFS41: do not update isize if inode needs layoutcommit
nfs_update_inode will update isize if there is no queued pages. For pNFS,
layoutcommit is supposed to change file size on server, the same effect as queued
pages. nfs_update_inode may be called when dirty pages are written back (nfsi->npages==0)
but layoutcommit is not sent, and it will change client file size according to server
file size. Then client ends up losing what it just writes back in pNFS path.
So we should skip updating client file size if file needs layoutcommit.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Cc: stable@kernel.org   [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:28 -04:00
Weston Andros Adamson
9e3bd4e24e NFS: fix umount of pnfs filesystems
Unmounting a pnfs filesystem hangs using filelayout and possibly others.
This fixes the use of the rcu protected node by making use of a new 'tmpnode'
for the temporary purge list. Also, the spinlock shouldn't be held when calling
synchronize_rcu().

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:23:02 -04:00
Linus Torvalds
cd1acdf172 Merge branch 'pnfs-submit' of git://git.open-osd.org/linux-open-osd
* 'pnfs-submit' of git://git.open-osd.org/linux-open-osd: (32 commits)
  pnfs-obj: pg_test check for max_io_size
  NFSv4.1: define nfs_generic_pg_test
  NFSv4.1: use pnfs_generic_pg_test directly by layout driver
  NFSv4.1: change pg_test return type to bool
  NFSv4.1: unify pnfs_pageio_init functions
  pnfs-obj: objlayout_encode_layoutcommit implementation
  pnfs: encode_layoutcommit
  pnfs-obj: report errors and .encode_layoutreturn Implementation.
  pnfs: encode_layoutreturn
  pnfs: layoutret_on_setattr
  pnfs: layoutreturn
  pnfs-obj: osd raid engine read/write implementation
  pnfs: support for non-rpc layout drivers
  pnfs-obj: define per-inode private structure
  pnfs: alloc and free layout_hdr layoutdriver methods
  pnfs-obj: objio_osd device information retrieval and caching
  pnfs-obj: decode layout, alloc/free lseg
  pnfs-obj: pnfs_osd XDR client implementation
  pnfs-obj: pnfs_osd XDR definitions
  pnfs-obj: objlayoutdriver module skeleton
  ...
2011-05-29 14:10:13 -07:00
Linus Torvalds
f1d1c9fa8f Merge branch 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  SUNRPC: Support for RPC over AF_LOCAL transports
  SUNRPC: Remove obsolete comment
  SUNRPC: Use AF_LOCAL for rpcbind upcalls
  SUNRPC: Clean up use of curly braces in switch cases
  NFS: Revert NFSROOT default mount options
  SUNRPC: Rename xs_encode_tcp_fragment_header()
  nfs,rcu: convert call_rcu(nfs_free_delegation_callback) to kfree_rcu()
  nfs41: Correct offset for LAYOUTCOMMIT
  NFS: nfs_update_inode: print current and new inode size in debug output
  NFSv4.1: Fix the handling of NFS4ERR_SEQ_MISORDERED errors
  NFSv4: Handle expired stateids when the lease is still valid
  SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback...
2011-05-29 11:20:02 -07:00
Boaz Harrosh
9342077011 pnfs-obj: pg_test check for max_io_size
Implement pg_test vector to test for max IO sizes. We calculate
a max_io_size member only once, and cache it in lseg so to not
do so on every page insert.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[simplify logic]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 21:03:08 +03:00
Boaz Harrosh
5b36c7dc41 NFSv4.1: define nfs_generic_pg_test
By default, unless pnfs is used coalesce pages until pg_bsize
(rsize or wsize) is reached.

pnfs layout drivers define their own pg_test methods that use
pnfs_generic_pg_test and need to define their own I/O size
limits (e.g. based on the file stripe size).

[Move a check from nfs_pageio_do_add_request to nfs_generic_pg_test]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 21:02:42 +03:00
Benny Halevy
89a58e32d9 NFSv4.1: use pnfs_generic_pg_test directly by layout driver
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:56:55 +03:00
Benny Halevy
18ad0a9f2c NFSv4.1: change pg_test return type to bool
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:56:54 +03:00
Benny Halevy
dfed206b88 NFSv4.1: unify pnfs_pageio_init functions
Use common code for pnfs_pageio_init_{read,write} and use
a common generic pg_test function.

Note that this function always assumes the the layout driver's
pg_test method is implemented.

[Fix BUG]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:56:43 +03:00