If the MMC_SEND_STATUS command is not successful, we should not return
a zero status word, but instead allow the caller to know positively
that an error occurred.
Convert the open-coded get_card_status() to use the helper function,
and provide definitions for the card state field.
Change-Id: Icfd6258af78a89c21abac386c556153fa3fac364
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Allow the board file to pass a boot info string through the
platform data that is appended to the /proc/last_kmsg file.
Change-Id: I37065fafb09676085465c93384d8e176fdd942d6
Signed-off-by: Colin Cross <ccross@android.com>
commit 04da85b86188f224cc9b391b5bdd92a3ba20ffcf upstream.
The struct ftrace_hash was declared within CONFIG_FUNCTION_TRACER
but was referenced outside of it.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
commit f75159e9936143177b442afc78150b7a7ad8aa07 upstream.
The IEEE 1588 standard defines two kinds of messages, event and general
messages. Event messages require time stamping, and general do not. When
using UDP transport, two separate ports are used for the two message
types.
The BPF designed to recognize event messages incorrectly classifies L2
general messages as event messages. This commit fixes the issue by
extending the filter to check the message type field for L2 PTP packets.
Event messages are be distinguished from general messages by testing
the "general" bit.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d670ec13178d0fd8680e6742a2bc6e04f28f87d8 upstream.
David reported:
Attached below is a watered-down version of rt/tst-cpuclock2.c from
GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or
similar.
Run it several times, and you will see cases where the main thread
will measure a process clock difference before and after the nanosleep
which is smaller than the cpu-burner thread's individual thread clock
difference. This doesn't make any sense since the cpu-burner thread
is part of the top-level process's thread group.
I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
64-bit binaries).
For example:
[davem@boricha build-x86_64-linux]$ ./test
process: before(0.001221967) after(0.498624371) diff(497402404)
thread: before(0.000081692) after(0.498316431) diff(498234739)
self: before(0.001223521) after(0.001240219) diff(16698)
[davem@boricha build-x86_64-linux]$
The diff of 'process' should always be >= the diff of 'thread'.
I make sure to wrap the 'thread' clock measurements the most tightly
around the nanosleep() call, and that the 'process' clock measurements
are the outer-most ones.
---
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>
static pthread_barrier_t barrier;
static void *chew_cpu(void *arg)
{
pthread_barrier_wait(&barrier);
while (1)
__asm__ __volatile__("" : : : "memory");
return NULL;
}
int main(void)
{
clockid_t process_clock, my_thread_clock, th_clock;
struct timespec process_before, process_after;
struct timespec me_before, me_after;
struct timespec th_before, th_after;
struct timespec sleeptime;
unsigned long diff;
pthread_t th;
int err;
err = clock_getcpuclockid(0, &process_clock);
if (err)
return 1;
err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
if (err)
return 1;
pthread_barrier_init(&barrier, NULL, 2);
err = pthread_create(&th, NULL, chew_cpu, NULL);
if (err)
return 1;
err = pthread_getcpuclockid(th, &th_clock);
if (err)
return 1;
pthread_barrier_wait(&barrier);
err = clock_gettime(process_clock, &process_before);
if (err)
return 1;
err = clock_gettime(my_thread_clock, &me_before);
if (err)
return 1;
err = clock_gettime(th_clock, &th_before);
if (err)
return 1;
sleeptime.tv_sec = 0;
sleeptime.tv_nsec = 500000000;
nanosleep(&sleeptime, NULL);
err = clock_gettime(th_clock, &th_after);
if (err)
return 1;
err = clock_gettime(my_thread_clock, &me_after);
if (err)
return 1;
err = clock_gettime(process_clock, &process_after);
if (err)
return 1;
diff = process_after.tv_nsec - process_before.tv_nsec;
printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
process_before.tv_sec, process_before.tv_nsec,
process_after.tv_sec, process_after.tv_nsec, diff);
diff = th_after.tv_nsec - th_before.tv_nsec;
printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
th_before.tv_sec, th_before.tv_nsec,
th_after.tv_sec, th_after.tv_nsec, diff);
diff = me_after.tv_nsec - me_before.tv_nsec;
printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
me_before.tv_sec, me_before.tv_nsec,
me_after.tv_sec, me_after.tv_nsec, diff);
return 0;
}
This is due to us using p->se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.
This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().
Aside of that it makes the function safe on 32 bit systems. The old
code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Tested-by: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 6e6938b6d3130305a5960c86b1a9b21e58cf6144 upstream.
sync(2) is performed in two stages: the WB_SYNC_NONE sync and the
WB_SYNC_ALL sync. Identify the first stage with .tagged_writepages and
do livelock prevention for it, too.
Jan's commit f446daaea9 ("mm: implement writeback livelock avoidance
using page tagging") is a partial fix in that it only fixed the
WB_SYNC_ALL phase livelock.
Although ext4 is tested to no longer livelock with commit f446daaea9,
it may due to some "redirty_tail() after pages_skipped" effect which
is by no means a guarantee for _all_ the file systems.
Note that writeback_inodes_sb() is called by not only sync(), they are
treated the same because the other callers also need livelock prevention.
Impact: It changes the order in which pages/inodes are synced to disk.
Now in the WB_SYNC_NONE stage, it won't proceed to write the next inode
until finished with the current inode.
Acked-by: Jan Kara <jack@suse.cz>
CC: Dave Chinner <david@fromorbit.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 8efcc57dedfebc99c3cd39564e3fc47cd1a24b75 upstream.
This needs to be an out of band value for the register and on this device
registers are 16 bit so we must shift left one to the 17th bit.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 938f97bcf1bdd1b681d5d14d1d7117a2e22d4434 upstream.
Thomas earlier submitted a fix to limit the RTC PIE freq, but
picked 5000Hz out of the air. Willy noticed that we should
instead use the 8192Hz max from the rtc man documentation.
Cc: Willy Tarreau <w@1wt.eu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 24d406a6bf736f7aebdc8fa0f0ec86e0890c6d24 upstream.
tty_operations->remove is normally called like:
queue_release_one_tty
->tty_shutdown
->tty_driver_remove_tty
->tty_operations->remove
However tty_shutdown() is called from queue_release_one_tty() only if
tty_operations->shutdown is NULL. But for pty, it is not.
pty_unix98_shutdown() is used there as ->shutdown.
So tty_operations->remove of pty (i.e. pty_unix98_remove()) is never
called. This results in invalid pty_count. I.e. what can be seen in
/proc/sys/kernel/pty/nr.
I see this was already reported at:
https://lkml.org/lkml/2009/11/5/370
But it was not fixed since then.
This patch is kind of a hackish way. The problem lies in ->install. We
allocate there another tty (so-called tty->link). So ->install is
called once, but ->remove twice, for both tty and tty->link. The fix
here is to count both tty and tty->link and divide the count by 2 for
user.
And to have ->remove called, let's make tty_driver_remove_tty() global
and call that from pty_unix98_shutdown() (tty_operations->shutdown).
While at it, let's document that when ->shutdown is defined,
tty_shutdown() is not called.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 284fb68d00c56e971ed01e0b4bac5ddd4d1b74ab upstream.
Replace/remove use of RIO v.1.2 registers/bits that are not
forward-compatible with newer versions of RapidIO specification.
RapidIO specification v.1.3 removed Write Port CSR, Doorbell CSR,
Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.
Use of removed (since RIO v.1.3) register bits affects users of
currently available 1.3 and 2.x compliant devices who may use not so
recent kernel versions.
Removing checks for unsupported bits makes corresponding routines
compatible with all versions of RapidIO specification. Therefore,
backporting makes stable kernel versions compliant with RIO v.1.3 and
later as well.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Chul Kim <chul.kim@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add input_register callback which gets called after
hid_configure_usage is called for all the reports
and before the input device is registered. This allows
individual drivers to do extra work like input mapping just
before device registration.
Based on discussions with David Herrmann <dh.herrmann@googlemail.com>
Change-Id: Idab6fb4f7b1e5e569bd0410967288717e9d34c98
Signed-off-by: Jaikumar Ganesh <jaikumarg@android.com>
It is possible that while one driver has already suspended,
another driver calls otg_id_notify() because it has not yet been
suspended. It would then be possible for the suspended driver's
detect callback to be called. This is undesirable.
Introduce new otg_id_suspend/otg_id_resume functions that
keep a suspended count, and if a notification happens while
someone is suspended, that notification is deferred until
all the drivers are resumed. If the notification happens before
the last driver is suspended, that suspend will be aborted
and once the final driver resumes through otg_id_resume, the
notification will be delivered.
Change-Id: I32fd32bec65e366e5f97a25c15255d94773b85b3
Signed-off-by: Dima Zavin <dima@android.com>
There is no need to take a wakelock for delayed lazy disable
work, it will be cancelled in the suspend handler and force
disabled. Only take the wakelock when the detect work is
queued, and make sure to drop the wakelock if the work is
cancelled.
Change-Id: I1e507a5f98848954ea21d45e23b6192c3132a349
Signed-off-by: Colin Cross <ccross@android.com>
commit be27425dcc516fd08245b047ea57f83b8f6f0903 upstream.
I ran into a couple of programs which broke with the new Linux 3.0
version. Some of those were binary only. I tried to use LD_PRELOAD to
work around it, but it was quite difficult and in one case impossible
because of a mix of 32bit and 64bit executables.
For example, all kind of management software from HP doesnt work, unless
we pretend to run a 2.6 kernel.
$ uname -a
Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux
$ hpacucli ctrl all show
Error: No controllers detected.
$ rpm -qf /usr/sbin/hpacucli
hpacucli-8.75-12.0
Another notable case is that Python now reports "linux3" from
sys.platform(); which in turn can break things that were checking
sys.platform() == "linux2":
https://bugzilla.mozilla.org/show_bug.cgi?id=664564
It seems pretty clear to me though it's a bug in the apps that are using
'==' instead of .startswith(), but this allows us to unbreak broken
programs.
This patch adds a UNAME26 personality that makes the kernel report a
2.6.40+x version number instead. The x is the x in 3.x.
I know this is somewhat ugly, but I didn't find a better workaround, and
compatibility to existing programs is important.
Some programs also read /proc/sys/kernel/osrelease. This can be worked
around in user space with mount --bind (and a mount namespace)
To use:
wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
gcc -o uname26 uname26.c
./uname26 program
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 6d3321e8e2b3bf6a5892e2ef673c7bf536e3f904 upstream.
MTRR rendezvous sequence using stop_one_cpu_nowait() can potentially
happen in parallel with another system wide rendezvous using
stop_machine(). This can lead to deadlock (The order in which
works are queued can be different on different cpu's. Some cpu's
will be running the first rendezvous handler and others will be running
the second rendezvous handler. Each set waiting for the other set to join
for the system wide rendezvous, leading to a deadlock).
MTRR rendezvous sequence is not implemented using stop_machine() as this
gets called both from the process context aswell as the cpu online paths
(where the cpu has not come online and the interrupts are disabled etc).
stop_machine() works with only online cpus.
For now, take the stop_machine mutex in the MTRR rendezvous sequence that
gets called from an online cpu (here we are in the process context
and can potentially sleep while taking the mutex). And the MTRR rendezvous
that gets triggered during cpu online doesn't need to take this stop_machine
lock (as the stop_machine() already ensures that there is no cpu hotplug
going on in parallel by doing get_online_cpus())
TBD: Pursue a cleaner solution of extending the stop_machine()
infrastructure to handle the case where the calling cpu is
still not online and use this for MTRR rendezvous sequence.
fixes: https://bugzilla.novell.com/show_bug.cgi?id=672008
Reported-by: Vadim Kotelnikov <vadimuzzz@inbox.ru>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/20110623182056.807230326@sbsiddha-MOBL3.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
At times, it is necessary for boards to provide some additional information
as part of panic logs. Provide information on the board hardware as part
of panic logs.
It is safer to print this information at the very end in case something
bad happens as part of the information retrieval itself.
To use this, set global mach_panic_string to an appropriate string in the
board file.
Change-Id: Id12cdda87b0cd2940dd01d52db97e6162f671b4d
Signed-off-by: Nishanth Menon <nm@ti.com>
[ Upstream commit d8873315065f1f527c7c380402cf59b1e1d0ae36 ]
Pktgen attempts to transmit shared skbs to net devices, which can't be used by
some drivers as they keep state information in skbs. This patch adds a flag
marking drivers as being able to handle shared skbs in their tx path. Drivers
are defaulted to being unable to do so, but calling ether_setup enables this
flag, as 90% of the drivers calling ether_setup touch real hardware and can
handle shared skbs. A subsequent patch will audit drivers to ensure that the
flag is set properly
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Jiri Pirko <jpirko@redhat.com>
CC: Robert Olsson <robert.olsson@its.uu.se>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.
MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)
Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation. So the periodic
regeneration and 8-bit counter have been removed. We compute and
use a full 32-bit sequence number.
For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.
Reported-by: Dan Kaminsky <dan@doxpara.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We are going to use this for TCP/IP sequence number and fragment ID
generation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some otg_id handlers can detect what's connected but can't detect a change.
This allows that handler to pass off the waiting for ID change to a proxy.
Change-Id: Ib38b750c3da4bffc35e37b620ecee37c5d64d31f
Signed-off-by: Erik Gilling <konkers@android.com>
-- init rb nodes in ion_handle_create
-- in ion_handle_destroy, check that a node belongs to a tree before removing
it (safety check, does not happen right now)
-- mark as static functions used only inside ion.c
-- update comments to ion_share() with a relevant blurb from the implementation
-- other minor updates/typo fixes to comments
Signed-off-by: Iliyan Malchev <malchev@google.com>
commit 0c0308066ca53fdf1423895f3a42838b67b3a5a8 upstream.
If the directory contents change, then we have to accept that the
file->f_pos value may shrink if we do a 'search-by-cookie'. In that
case, we should turn off the loop detection and let the NFS client
try to recover.
The patch also fixes a second loop detection bug by ensuring
that after turning on the ctx->duped flag, we read at least one new
cookie into ctx->dir_cookie before attempting to match with
ctx->dup_cookie.
Reported-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2efaca927f5cd7ecd0f1554b8f9b6a9a2c329c03 upstream.
I haven't reproduced it myself but the fail scenario is that on such
machines (notably ARM and some embedded powerpc), if you manage to hit
that futex path on a writable page whose dirty bit has gone from the PTE,
you'll livelock inside the kernel from what I can tell.
It will go in a loop of trying the atomic access, failing, trying gup to
"fix it up", getting succcess from gup, go back to the atomic access,
failing again because dirty wasn't fixed etc...
So I think you essentially hang in the kernel.
The scenario is probably rare'ish because affected architecture are
embedded and tend to not swap much (if at all) so we probably rarely hit
the case where dirty is missing or young is missing, but I think Shan has
a piece of SW that can reliably reproduce it using a shared writable
mapping & fork or something like that.
On archs who use SW tracking of dirty & young, a page without dirty is
effectively mapped read-only and a page without young unaccessible in the
PTE.
Additionally, some architectures might lazily flush the TLB when relaxing
write protection (by doing only a local flush), and expect a fault to
invalidate the stale entry if it's still present on another processor.
The futex code assumes that if the "in_atomic()" access -EFAULT's, it can
"fix it up" by causing get_user_pages() which would then be equivalent to
taking the fault.
However that isn't the case. get_user_pages() will not call
handle_mm_fault() in the case where the PTE seems to have the right
permissions, regardless of the dirty and young state. It will eventually
update those bits ... in the struct page, but not in the PTE.
Additionally, it will not handle the lazy TLB flushing that can be
required by some architectures in the fault case.
Basically, gup is the wrong interface for the job. The patch provides a
more appropriate one which boils down to just calling handle_mm_fault()
since what we are trying to do is simulate a real page fault.
The futex code currently attempts to write to user memory within a
pagefault disabled section, and if that fails, tries to fix it up using
get_user_pages().
This doesn't work on archs where the dirty and young bits are maintained
by software, since they will gate access permission in the TLB, and will
not be updated by gup().
In addition, there's an expectation on some archs that a spurious write
fault triggers a local TLB flush, and that is missing from the picture as
well.
I decided that adding those "features" to gup() would be too much for this
already too complex function, and instead added a new simpler
fixup_user_fault() which is essentially a wrapper around handle_mm_fault()
which the futex code can call.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix some nits Darren saw, fiddle comment layout]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: Shan Hai <haishan.bai@gmail.com>
Tested-by: Shan Hai <haishan.bai@gmail.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Darren Hart <darren.hart@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a9bae5666d0510ad69bdb437371c9a3e6b770705 upstream.
There can be multiple lseg per file, so layoutcommit should be
able to handle it.
[Needed in v3.0]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 93b37905f70083d6143f5f4dba0a45cc64379a62 upstream.
Between open(2) of a /dev/fw* and the first FW_CDEV_IOC_GET_INFO
ioctl(2) on it, the kernel already queues FW_CDEV_EVENT_BUS_RESET events
to be read(2) by the client. The get_info ioctl is practically always
issued right away after open, hence this condition only occurs if the
client opens during a bus reset, especially during a rapid series of bus
resets.
The problem with this condition is twofold:
- These bus reset events carry the (as yet undocumented) @closure
value of 0. But it is not the kernel's place to choose closures;
they are privat to the client. E.g., this 0 value forced from the
kernel makes it unsafe for clients to dereference it as a pointer to
a closure object without NULL pointer check.
- It is impossible for clients to determine the relative order of bus
reset events from get_info ioctl(2) versus those from read(2),
except in one way: By comparison of closure values. Again, such a
procedure imposes complexity on clients and reduces freedom in use
of the bus reset closure.
So, change the ABI to suppress queuing of bus reset events before the
first FW_CDEV_IOC_GET_INFO ioctl was issued by the client.
Note, this ABI change cannot be version-controlled. The kernel cannot
distinguish old from new clients before the first FW_CDEV_IOC_GET_INFO
ioctl.
We will try to back-merge this change into currently maintained stable/
longterm series, and we only document the new behaviour. The old
behavior is now considered a kernel bug, which it basically is.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: <stable@kernel.org>
commit 17dd759c67f21e34f2156abcf415e1f60605a188 upstream.
Currently skb_gro_header_slow unconditionally resets frag0 and
frag0_len. However, when we can't pull on the skb this leaves
the GRO fields in an inconsistent state.
This patch fixes this by only resetting those fields after the
pskb_may_pull test.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* uid handling
- Limit UID impersonation to processes with a gid in AID_NET_BW_ACCT.
This affects socket tagging, and data removal.
- Limit stats lookup to own uid or the process gid is in AID_NET_BW_STATS.
This affects stats lookup.
* allow pacifying the module
Setting passive to Y/y will make the module return immediately on
external stimulus.
No more stats and silent success on ctrl writes.
Mainly used when one suspects this module of misbehaving.
Change-Id: I83990862d52a9b0922aca103a0f61375cddeb7c4
Signed-off-by: JP Abgrall <jpa@google.com>
Allow for sched_domain spans that overlap by giving such domains their
own sched_group list instead of sharing the sched_groups amongst
each-other.
This is needed for machines with more than 16 nodes, because
sched_domain_node_span() will generate a node mask from the
16 nearest nodes without regard if these masks have any overlap.
Currently sched_domains have a sched_group that maps to their child
sched_domain span, and since there is no overlap we share the
sched_group between the sched_domains of the various CPUs. If however
there is overlap, we would need to link the sched_group list in
different ways for each cpu, and hence sharing isn't possible.
In order to solve this, allocate private sched_groups for each CPU's
sched_domain but have the sched_groups share a sched_group_power
structure such that we can uniquely track the power.
Reported-and-tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-08bxqw9wis3qti9u5inifh3y@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The RCU_BOOST commits for TREE_PREEMPT_RCU introduced an other-task
write to a new RCU_READ_UNLOCK_BOOSTED bit in the task_struct structure's
->rcu_read_unlock_special field, but, as noted by Steven Rostedt, without
correctly synchronizing all accesses to ->rcu_read_unlock_special.
This could result in bits in ->rcu_read_unlock_special being spuriously
set and cleared due to conflicting accesses, which in turn could result
in deadlocks between the rcu_node structure's ->lock and the scheduler's
rq and pi locks. These deadlocks would result from RCU incorrectly
believing that the just-ended RCU read-side critical section had been
preempted and/or boosted. If that RCU read-side critical section was
executed with either rq or pi locks held, RCU's ensuing (incorrect)
calls to the scheduler would cause the scheduler to attempt to once
again acquire the rq and pi locks, resulting in deadlock. More complex
deadlock cycles are also possible, involving multiple rq and pi locks
as well as locks from multiple rcu_node structures.
This commit fixes synchronization by creating ->rcu_boosted field in
task_struct that is accessed and modified only when holding the ->lock
in the rcu_node structure on which the task is queued (on that rcu_node
structure's ->blkd_tasks list). This results in tasks accessing only
their own current->rcu_read_unlock_special fields, making unsynchronized
access once again legal, and keeping the rcu_read_unlock() fastpath free
of atomic instructions and memory barriers.
The reason that the rcu_read_unlock() fastpath does not need to access
the new current->rcu_boosted field is that this new field cannot
be non-zero unless the RCU_READ_UNLOCK_BLOCKED bit is set in the
current->rcu_read_unlock_special field. Therefore, rcu_read_unlock()
need only test current->rcu_read_unlock_special: if that is zero, then
current->rcu_boosted must also be zero.
This bug does not affect TINY_PREEMPT_RCU because this implementation
of RCU accesses current->rcu_read_unlock_special with irqs disabled,
thus preventing races on the !SMP systems that TINY_PREEMPT_RCU runs on.
Maybe-reported-by: Dave Jones <davej@redhat.com>
Maybe-reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
`make headers_check` complains that
linux-2.6/usr/include/linux/sdla.h:116: userspace cannot reference
function or variable defined in the kernel
this is due to that there is no such a kernel function,
void sdla(void *cfg_info, char *dev, struct frad_conf *conf, int quiet);
I don't know why we have it in a kernel header, so remove it.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the x86_64 idle notifiers originally by Andi Kleen and Venkatesh
Pallipadi to generic.
Change-Id: Idf29cda15be151f494ff245933c12462643388d5
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Todd Poynor <toddpoynor@google.com>
MTP_SEND_FILE_WITH_HEADER ioctl allows sending a file with the 12 byte header
prepended at the beginning.
This is to allow MTP to use a single packet for the data phase instead of two.
Signed-off-by: Mike Lockwood <lockwood@android.com>
There is no software fallback implemented for SCTP or FCoE checksumming,
and so it should not be passed on by software devices like bridge or bonding.
For VLAN devices, this is different. First, the driver for underlying device
should be prepared to get offloaded packets even when the feature is disabled
(especially if it advertises it in vlan_features). Second, devices under
VLANs do not get replaced without tearing down the VLAN first.
This fixes a mess I accidentally introduced while converting bonding to
ndo_fix_features.
NETIF_F_SOFT_FEATURES are removed from BOND_VLAN_FEATURES because they
are unused as of commit 712ae51afd.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
On reading the ext_csd for the first time (in 1 bit mode), save the
ext_csd information needed for bus width compare.
On every pass we make re-reading the ext_csd, compare the data
against the saved ext_csd data.
This fixes a regression introduced in 3.0-rc1 by 08ee80cc39
("mmc: core: eMMC bus width may not work on all platforms"), which
incorrectly assumed we would be re-reading the ext_csd at resume-
time.
Signed-off-by: Philip Rakity <prakity@marvell.com>
Tested-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Rather than using explicit euid == 0 checks when trying to move
tasks into a cgroup via CFS, move permission checks into each
specific cgroup subsystem. If a subsystem does not specify a
'allow_attach' handler, then we fall back to doing our checks
the old way.
Use the 'allow_attach' handler for the 'cpu' cgroup to allow
non-root processes to add arbitrary processes to a 'cpu' cgroup
if it has the CAP_SYS_NICE capability set.
This version of the patch adds a 'allow_attach' handler instead
of reusing the 'can_attach' handler. If the 'can_attach' handler
is reused, a new cgroup that implements 'can_attach' but not
the permission checks could end up with no permission checks
at all.
Change-Id: Icfa950aa9321d1ceba362061d32dc7dfa2c64f0c
Original-Author: San Mehat <san@google.com>
Signed-off-by: Colin Cross <ccross@android.com>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/mm: Fix memory_block_size_bytes() for non-pseries
mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header
The macro MIN_MEMORY_BLOCK_SIZE is currently defined twice in two .c
files, and I need it in a third one to fix a powerpc bug, so let's
first move it into a header
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
The original xt_quota in the kernel is plain broken:
- counts quota at a per CPU level
(was written back when ubiquitous SMP was just a dream)
- provides no way to count across IPV4/IPV6.
This patch is the original unaltered code from:
http://sourceforge.net/projects/xtables-addons
at commit e84391ce665cef046967f796dd91026851d6bbf3
Change-Id: I19d49858840effee9ecf6cff03c23b45a97efdeb
Signed-off-by: JP Abgrall <jpa@google.com>