Commit Graph

12045 Commits

Ido Yariv
0198f84095 genirq: Fix race condition when stopping the irq thread
commit 550acb19269d65f32e9ac4ddb26c2b2070e37f1c upstream.

In irq_wait_for_interrupt(), the should_stop member is verified before
setting the task's state to TASK_INTERRUPTIBLE and calling schedule().
If kthread_stop() sets should_stop and wakes up the process after the
irq thread has checked should_stop but before the task's state is
changed, the irq thread might never exit:

kthread_stop                    irq_wait_for_interrupt
------------                    ----------------------

                                 ...
...                              while (!kthread_should_stop()) {
kthread->should_stop = 1;
wake_up_process(k);
wait_for_completion(&kthread->exited);
...
                                     set_current_state(TASK_INTERRUPTIBLE);

                                     ...

                                     schedule();
                                 }

Fix this by checking if the thread should stop after modifying the
task's state.
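
A minimal sketch of the fixed ordering (illustrative, not the verbatim
patch): publish the task state first, then re-check the stop condition,
so the wakeup issued by kthread_stop() can no longer be lost.

    static int irq_wait_for_interrupt(struct irqaction *action)
    {
        set_current_state(TASK_INTERRUPTIBLE);

        while (!kthread_should_stop()) {
            if (test_and_clear_bit(IRQTF_RUNTHREAD,
                                   &action->thread_flags)) {
                __set_current_state(TASK_RUNNING);
                return 0;
            }
            /* The state was set before the check above, so a wakeup
             * between the check and schedule() simply makes
             * schedule() return immediately. */
            schedule();
            set_current_state(TASK_INTERRUPTIBLE);
        }
        __set_current_state(TASK_RUNNING);
        return -1;
    }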

[ tglx: Simplified it a bit ]

Signed-off-by: Ido Yariv <ido@wizery.com>
Link: http://lkml.kernel.org/r/1322740508-22640-1-git-send-email-ido@wizery.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:46 -08:00
Jeff Ohlstein
24ee8bfeb1 hrtimer: Fix extra wakeups from __remove_hrtimer()
commit 27c9cd7e601632b3794e1c3344d37b86917ffb43 upstream.

__remove_hrtimer() attempts to reprogram the clockevent device when
the timer being removed is the next to expire. However,
__remove_hrtimer() reprograms the clockevent *before* removing the
timer from the timerqueue and thus when hrtimer_force_reprogram()
finds the next timer to expire it finds the timer we're trying to
remove.

This is especially noticeable when the system switches to NOHz mode
and the system tick is removed: the timer tick is gone, but the
clockevent is still programmed to wake up one tick later anyway.

Silence the extra wakeup by removing the timer from the timerqueue
before calling hrtimer_force_reprogram() so that we actually program
the clockevent for the next timer to expire.

This was broken by 998adc3 "hrtimers: Convert hrtimers to use
timerlist infrastructure".

Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
Link: http://lkml.kernel.org/r/1321660030-8520-1-git-send-email-johlstei@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:28 -08:00
Hector Palacios
e1ef77bdad timekeeping: add arch_offset hook to ktime_get functions
commit d004e024058a0eaca097513ce62cbcf978913e0a upstream.

ktime_get and ktime_get_ts were calling timekeeping_get_ns() but
were not then calling arch_gettimeoffset(), so architectures using
this mechanism returned 0 ns when calling these functions.

This happened, for example, when running Busybox's ping, which calls
syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts) and eventually
reaches ktime_get. As a result the returned ping travel time was zero.
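
A sketch of the kind of change described, against the 3.0-era
ktime_get() (illustrative only; names as in that era's
kernel/time/timekeeping.c):

    do {
        seq = read_seqbegin(&xtime_lock);
        secs = xtime.tv_sec + wall_to_monotonic.tv_sec;
        nsecs = xtime.tv_nsec + wall_to_monotonic.tv_nsec;
        nsecs += timekeeping_get_ns();
        /* the missing piece: fold in the architecture offset for
         * platforms implementing arch_gettimeoffset() */
        nsecs += arch_gettimeoffset();
    } while (read_seqretry(&xtime_lock, seq));

    return ktime_add_ns(ktime_set(secs, 0), nsecs);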

Signed-off-by: Hector Palacios <hector.palacios@digi.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:28 -08:00
Michal Hocko
953d0c888e cgroup_freezer: fix freezing groups with stopped tasks
commit 884a45d964dd395eda945842afff5e16bcaedf56 upstream.

2d3cbf8b (cgroup_freezer: update_freezer_state() does incorrect state
transitions) removed is_task_frozen_enough and replaced it with a simple
frozen call. This, however, breaks freezing for a group with stopped tasks
because those cannot be frozen and so the group remains in CGROUP_FREEZING
state (update_if_frozen doesn't count stopped tasks) and never reaches
CGROUP_FROZEN.

Let's add is_task_frozen_enough back and use it at the original locations
(update_if_frozen and try_to_freeze_cgroup). Semantically we consider
stopped tasks as frozen enough so we should consider both cases when
testing frozen tasks.
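
The restored helper is essentially a two-way test; a sketch consistent
with the description above (not necessarily the verbatim patch):

    static inline int is_task_frozen_enough(struct task_struct *task)
    {
        /* a stopped or traced task that is no longer freezing is
         * treated as frozen enough */
        return frozen(task) ||
                (task_is_stopped_or_traced(task) && !freezing(task));
    }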

Testcase:
mkdir /dev/freezer
mount -t cgroup -o freezer none /dev/freezer
mkdir /dev/freezer/foo
sleep 1h &
pid=$!
kill -STOP $pid
echo $pid > /dev/freezer/foo/tasks
echo FROZEN > /dev/freezer/foo/freezer.state
while true
do
	cat /dev/freezer/foo/freezer.state
	[ "`cat /dev/freezer/foo/freezer.state`" = "FROZEN" ] && break
	sleep 1
done
echo OK

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tomasz Buchert <tomasz.buchert@inria.fr>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:27 -08:00
Edward Donovan
1c8ca629b7 genirq: fix regression in irqfixup, irqpoll
commit 52553ddffad76ccf192d4dd9ce88d5818f57f62a upstream.

Commit fa27271bc8d2("genirq: Fixup poll handling") introduced a
regression that broke irqfixup/irqpoll for some hardware configurations.

Amidst reorganizing 'try_one_irq', that patch removed a test that
checked for 'action->handler' returning IRQ_HANDLED, before acting on
the interrupt.  Restoring this test brings back the functionality lost
since 2.6.39.  In the current set of tests, it must come after 'action'
is set and precede the '!action->next' test to take effect.

With this and my previous patch to irq/spurious.c, c75d720fca8a, all
IRQ regressions that I have encountered are fixed.
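
A hedged sketch of where the restored test sits in try_one_irq()
(shape of the change, not the verbatim diff):

    action = desc->action;
    if (!action || !(action->flags & IRQF_SHARED) ||
        (action->flags & __IRQF_TIMER) ||
        /* restored test: if the handler claims the interrupt,
         * do not treat it as misrouted */
        (action->handler(irq, action->dev_id) == IRQ_HANDLED) ||
        !action->next)
        goto out;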

Signed-off-by: Edward Donovan <edward.donovan@numble.net>
Reported-and-tested-by: Rogério Brito <rbrito@ime.usp.br>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 08:52:27 -08:00
Heiko Carstens
45db69a9fd nohz: Remove "Switched to NOHz mode" debugging messages
When performing cpu hotplug tests, the kernel printk log buffer gets
flooded with pointless "Switched to NOHz mode..." messages, which can
push more interesting information out of the buffer before a dump is
analyzed.
Since switching to NOHz mode can be assumed to simply work, just remove
the printk.

Change-Id: I1746f8c0119a512055716c3fd77a966b735ca49b
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Link: http://lkml.kernel.org/r/20110823112046.GB2540@osiris.boeblingen.de.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
2011-12-07 13:12:14 -08:00
Linux Build Service Account
219a067fc9 Merge "Revert "power: wakelock: don't set check_done flag if aborting"" into msm-3.0 2011-12-02 21:29:05 -08:00
Colin Cross
06e857251e ARM: cpu_pm: Add cpu power management notifiers
During some CPU power modes entered during idle, hotplug and
suspend, peripherals located in the CPU power domain, such as
the GIC, localtimers, and VFP, may be powered down.  Add a
notifier chain that allows drivers for those peripherals to
be notified before and after they may be reset.

Signed-off-by: Colin Cross <ccross@android.com>
[santosh.shilimkar@ti.com: Rebased against 3.1-rc3]
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Change-Id: I6e076344b268869d12033f57321f3e7cf23b05e8
Signed-off-by: Ashwin Chaugule <ashwinc@codeaurora.org>
2011-11-29 16:42:33 -05:00
Patrick Cain
b599ff7fcb Revert "power: wakelock: don't set check_done flag if aborting"
This reverts commit 7ef4dbaa2a.

Revert "power: wakelock: BUG if wakelock is taken very late"

This reverts commit 57b34bbdca.

Change-Id: Ie1288b5e6c899ac4419c55b91cb024b8093b5ffe
Signed-off-by: Patrick Cain <pcain@codeaurora.org>
2011-11-28 14:41:25 -08:00
Edward Donovan
2cecc3d5df genirq: Fix irqfixup, irqpoll regression
commit c75d720fca8a91ce99196d33adea383621027bf2 upstream.

commit d05c65fff0 ("genirq: spurious: Run only one poller at a time")
introduced a regression, leaving the boot options 'irqfixup' and
'irqpoll' non-functional. The patch placed tests in each function, to
exit if the function is already running. The test in 'misrouted_irq'
exited when it should have proceeded, effectively disabling
'misrouted_irq' and 'poll_spurious_irqs'.

The check for an already running poller needs to be "!= 1" not "== 1"
as "1" is the value when the first poller starts running.

Signed-off-by: Edward Donovan <edward.donovan@numble.net>
Cc: maciej.rutecki@gmail.com
Link: http://lkml.kernel.org/r/1320175784-6745-1-git-send-email-edward.donovan@numble.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26 09:09:51 -08:00
Linux Build Service Account
949f865008 Merge "hrtimer: remove timerqueue node before reprogramming clockevent device" into msm-3.0 2011-11-23 14:51:07 -08:00
Arve Hjønnevåg
452d440ab2 Fix "time: Catch invalid timespec sleep values in __timekeeping_inject_sleeptime" to compile on 3.0
Change-Id: I1225f279cda04dedbfb7f853f6b58f1032bd6d2b
2011-11-22 16:49:43 -08:00
John Stultz
cf70c6a400 time: Catch invalid timespec sleep values in __timekeeping_inject_sleeptime
Arve suggested making sure we catch possible negative sleep time
intervals that could be passed into timekeeping_inject_sleeptime.
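
A sketch of the kind of guard this adds (illustrative; the exact check
may differ):

    /* Reject negative or otherwise malformed sleep intervals before
     * they are folded into the timekeeping state. */
    if (!timespec_valid(delta)) {
        printk(KERN_WARNING
               "__timekeeping_inject_sleeptime: Invalid sleep delta value!\n");
        return;
    }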

CC: Arve Hjønnevåg <arve@android.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-11-22 16:49:29 -08:00
Jeff Ohlstein
ead6b51050 hrtimer: remove timerqueue node before reprogramming clockevent device
Currently the __remove_hrtimer function attempts to reprogram the
clockevent device so that we don't wake up unnecessarily from a timer we
have deleted. However, it does the reprogramming before actually
removing the timer in question from the timerqueue, so the reprogramming
turns into a no-op. This causes an extra wakeup every time we remove the
timer the clockevent is currently programmed for. It is especially
noticeable when the system goes idle and we switch to NOHZ mode, as the
system will always wake up one extra time when the system tick is
removed.

Change-Id: If8656bbf85694228f279923fa86bd798d23e0f49
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
2011-11-17 16:25:08 -08:00
Bryan Huntsman
d074fa2796 Merge remote-tracking branch 'common/android-3.0' into msm-3.0
* common/android-3.0: (570 commits)
  misc: remove kernel debugger core
  ARM: common: fiq_debugger: dump sysrq directly to console if enabled
  ARM: common: fiq_debugger: add irq context debug functions
  net: wireless: bcmdhd: Call init_ioctl() only if was started properly for WEXT
  net: wireless: bcmdhd: Call init_ioctl() only if was started properly
  net: wireless: bcmdhd: Fix possible memory leak in escan/iscan
  cpufreq: interactive governor: default 20ms timer
  cpufreq: interactive governor: go to intermediate hi speed before max
  cpufreq: interactive governor: scale to max only if at min speed
  cpufreq: interactive governor: apply intermediate load on current speed
  ARM: idle: update idle ticks before call idle end notifier
  input: gpio_input: don't print debounce message unless flag is set
  net: wireless: bcm4329: Skip dhd_bus_stop() if bus is already down
  net: wireless: bcmdhd: Skip dhd_bus_stop() if bus is already down
  net: wireless: bcmdhd: Improve suspend/resume processing
  net: wireless: bcmdhd: Check if FW is Ok for internal FW call
  tcp: Don't nuke connections for the wrong protocol
  ARM: common: fiq_debugger: make uart irq be no_suspend
  net: wireless: Skip connect warning for CONFIG_CFG80211_ALLOW_RECONNECT
  mm: avoid livelock on !__GFP_FS allocations
  ...

Conflicts:
	arch/arm/mm/cache-l2x0.c
	arch/arm/vfp/vfpmodule.c
	drivers/mmc/core/host.c
	kernel/power/wakelock.c
	net/bluetooth/hci_event.c

Signed-off-by: Bryan Huntsman <bryanh@codeaurora.org>
2011-11-16 13:52:50 -08:00
Dan Carpenter
f450df8004 PM / Suspend: Off by one in pm_suspend()
commit 528f7ce6e439edeac38f6b3f8561f1be129b5e91 upstream.

In enter_state() we use "state" as an offset for the pm_states[]
array.  The pm_states[] array only has PM_SUSPEND_MAX elements so
this test is off by one.
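
Sketched, the off-by-one looks like this (illustrative): pm_states[]
has PM_SUSPEND_MAX entries, so valid indices are strictly below
PM_SUSPEND_MAX.

    int pm_suspend(suspend_state_t state)
    {
        /* '<', not '<=': the last valid index into pm_states[]
         * is PM_SUSPEND_MAX - 1 */
        if (state > PM_SUSPEND_ON && state < PM_SUSPEND_MAX)
            return enter_state(state);
        return -EINVAL;
    }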

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:37:17 -08:00
Oleg Nesterov
43742b14ff ptrace: don't clear GROUP_STOP_SIGMASK on double-stop
[This does not correspond to any specific patch in the upstream tree as it was
fixed accidentally by rewriting the code in the 3.1 release]

https://bugzilla.redhat.com/show_bug.cgi?id=740121

1. Luke Macken triggered WARN_ON(!(group_stop & GROUP_STOP_SIGMASK))
   in do_signal_stop().

   This is because do_signal_stop() clears the GROUP_STOP_SIGMASK part
   unconditionally but doesn't update it if task_is_stopped().

2. Looking at this problem I noticed that WARN_ON_ONCE(!ptrace) is
   not right, a stopped-but-resumed tracee can clone the untraced
   thread in the SIGNAL_STOP_STOPPED group, the new thread can start
   another group-stop.

   Remove this warning, we need more fixes to make it true.

Reported-by: Luke Macken <lmacken@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:36:23 -08:00
Ian Campbell
09bb52774e genirq: Add IRQF_RESUME_EARLY and resume such IRQs earlier
commit 9bab0b7fbaceec47d32db51cd9e59c82fb071f5a upstream.

This adds a mechanism to resume selected IRQs during syscore_resume
instead of dpm_resume_noirq.

Under Xen we need to resume IRQs associated with IPIs early enough
that the resched IPI is unmasked and we can therefore schedule
ourselves out of the stop_machine where the suspend/resume takes
place.

This issue was introduced by 676dc3cf5b "xen: Use IRQF_FORCE_RESUME".

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:35:54 -08:00
Steven Rostedt
12cb3e734a tracing: Fix returning of duplicate data after EOF in trace_pipe_raw
commit 436fc280261dcfce5af38f08b89287750dc91cd2 upstream.

The trace_pipe_raw handler holds a cached page from the time the file
is opened to the time it is closed. The cached page is used to handle
the case of the user space buffer being smaller than what was read from
the ring buffer. The left over buffer is held in the cache so that the
next read will continue where the data left off.

After EOF is returned (no more data in the buffer), the index of
the cached page is set to zero. If a user app reads the page again
after EOF, the check in the buffer will see that the cached page
is less than page size and will return the cached page again. This
causes reading trace_pipe_raw again after EOF to return duplicate
data, making the output look as though time went backwards when the
data is in fact just repeated.

The fix is to not reset the index right after all data is read
from the cache, but to reset it after all data is read and more
data exists in the ring buffer.
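
The shape of the fix in tracing_buffers_read(), greatly simplified
(hedged sketch, not the verbatim diff): the cached-page index is reset
only once fresh data has actually been read.

    if (info->read < PAGE_SIZE)
        goto read;  /* serve the remainder of the cached page */

    ret = ring_buffer_read_page(buffer, &info->spare, count, cpu, 0);
    if (ret < 0)
        return trace_empty(iter) ? 0 : ret;  /* index left untouched */

    info->read = 0;  /* reset only now that fresh data exists */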

Reported-by: Jeremy Eder <jeder@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:35:53 -08:00
hank
c7f65094f7 time: Change jiffies_to_clock_t() argument type to unsigned long
commit cbbc719fccdb8cbd87350a05c0d33167c9b79365 upstream.

The parameter's original type is long. On an i386 architecture, it can
easily be larger than 0x80000000, causing this function to convert it
to a sign-extended u64 type.

Change the type to unsigned long so we get the correct result.
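
A minimal illustration of the sign-extension problem on a 32-bit long
(hypothetical values):

    long x = (long)0x80000001UL;      /* negative on 32-bit i386 */
    u64 wide = (u64)x;                /* 0xffffffff80000001: sign-extended */
    unsigned long ux = 0x80000001UL;
    u64 ok = (u64)ux;                 /* 0x0000000080000001: correct */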

Signed-off-by: hank <pyu@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
[ build fix ]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:35:52 -08:00
Jiri Kosina
3cdf240310 kmod: prevent kmod_loop_msg overflow in __request_module()
commit 37252db6aa576c34fd794a5a54fb32d7a8b3a07a upstream.

Due to the post-increment in the condition on kmod_loop_msg in
__request_module(), the system log can be spammed by far more than 5
instances of the 'runaway loop' message if the number of events
triggering it causes kmod_loop_msg to overflow.

Fix that by making sure we never increment it past the threshold.
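
The shape of the fix (illustrative): only increment while still below
the threshold, so the counter can never wrap back under it.

    if (kmod_loop_msg < 5) {
        printk(KERN_ERR
               "request_module: runaway loop modprobe %s\n",
               module_name);
        kmod_loop_msg++;  /* never incremented past the threshold */
    }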

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11 09:35:48 -08:00
Patrick Cain
afcffe54c3 Revert "power: wakelock: don't set check_done flag if aborting"
This reverts commit 7ef4dbaa2a.

Revert "power: wakelock: BUG if wakelock is taken very late"

This reverts commit 57b34bbdca.

Change-Id: Id4e05b191cf7781b8e0d0bed42f1c136b1ac428e
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
2011-11-08 23:21:57 -08:00
Patrick Cain
b811aa7237 Revert "power: wakelock: reset wakelock checked early"
This reverts commit fa3a6d5572.

Signed-off-by: Patrick Cain <pcain@codeaurora.org>
2011-11-04 14:25:36 -07:00
Colin Cross
2bb3e31015 Merge commit 'v3.0.8' into android-3.0 2011-10-27 15:01:19 -07:00
Abhijeet Dharmapurikar
fa3a6d5572 power: wakelock: reset wakelock checked early
Currently we reset msm_suspend_check_done in resume_noirq and set it
in the suspend_noirq stage right after the wakelock checks are done.

An issue was discovered recently where a wakelock could be taken much
earlier in the resume sequence - a tasklet was scheduled right in the
arch-specific resume path, and this tasklet ran as soon as local
interrupts were enabled (the I bit in the CPSR for arm processors). The
tasklet ended up grabbing a wakelock, thus causing a BUG.

To avoid this situation reset msm_suspend_check_done before enabling local
interrupts using the syscore ops.

Change-Id: I12766cb759134185e9727829f71893934492cc5f
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
2011-10-25 20:14:44 -07:00
Peter Zijlstra
607ce3ed1c cputimer: Cure lock inversion
commit bcd5cff7216f9b2de0a148cc355eac199dc6f1cf upstream.

There's a lock inversion between the cputimer->lock and rq->lock;
notably the two callchains involved are:

 update_rlimit_cpu()
   sighand->siglock
   set_process_cpu_timer()
     cpu_timer_sample_group()
       thread_group_cputimer()
         cputimer->lock
         thread_group_cputime()
           task_sched_runtime()
             ->pi_lock
             rq->lock

 scheduler_tick()
   rq->lock
   task_tick_fair()
     update_curr()
       account_group_exec()
         cputimer->lock

Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and
the second one is keeping up-to-date.

This problem was introduced by e8abccb7193 ("posix-cpu-timers: Cure
SMP accounting oddities").

Cure the problem by removing the cputimer->lock and rq->lock nesting.
This leaves concurrent enablers doing duplicate work, but the time
wasted should be of the same order as what would otherwise be wasted
spinning on the lock, and the greater-than assignment filter should
ensure we preserve monotonicity.

Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/1318928713.21167.4.camel@twins
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-25 07:10:14 +02:00
Linus Torvalds
60635529f6 Avoid using variable-length arrays in kernel/sys.c
commit a84a79e4d369a73c0130b5858199e949432da4c6 upstream.

The size is always valid, but variable-length arrays generate worse code
for no good reason (unless the function happens to be inlined and the
compiler sees the length for the simple constant it is).

Also, there seems to be some code generation problem on POWER, where
Henrik Bakken reports that register r28 can get corrupted under some
subtle circumstances (interrupt happening at the wrong time?).  That all
indicates some seriously broken compiler issues, but since variable
length arrays are bad regardless, there's little point in trying to
chase it down.

"Just don't do that, then".

Reported-by: Henrik Grindal Bakken <henribak@cisco.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-25 07:10:14 +02:00
Grant Likely
971a0ee566 dt/irq: add irq_domain_generate_simple() helper
irq_domain_generate_simple() is an easy way to generate an irq translation
domain for simple irq controllers.  It assumes a flat 1:1 mapping from
hardware irq number to an offset from the first linux irq number assigned
to the controller.
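
A hedged usage sketch (the match table, base address, and irq base
below are hypothetical):

    static const struct of_device_id my_intc_match[] __initconst = {
        { .compatible = "vendor,my-intc" },  /* hypothetical binding */
        { }
    };

    static void __init my_intc_init_domain(void)
    {
        /* flat 1:1: hwirq N of the controller at 0x10000000 maps
         * to linux irq 32 + N */
        irq_domain_generate_simple(my_intc_match, 0x10000000, 32);
    }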

Change-Id: I0820754314a0c15b1cc9881bc38b75b0de2509b2
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Sathish Ambley <sambley@codeaurora.org>
2011-10-24 10:42:45 -07:00
Grant Likely
6469dfb4ee irq: add irq_domain translation infrastructure
This patch adds irq_domain infrastructure for translating from
hardware irq numbers to linux irqs.  This is particularly important
for architectures adding device tree support because the current
implementation (excluding PowerPC and SPARC) cannot handle
translation for more than a single interrupt controller.  irq_domain
supports device tree translation for any number of interrupt
controllers.

This patch converts x86, Microblaze, ARM and MIPS to use irq_domain
for device tree irq translation.  x86 is untested beyond compiling it;
irq_domain is enabled for MIPS and Microblaze, but the old behaviour is
preserved until the core code is modified to actually register an
irq_domain.  On ARM it works and is required for much of the new
ARM device tree board support.

PowerPC has /not/ been converted to use this new infrastructure.  It
is still missing some features before it can replace the virq
infrastructure already in powerpc (see documentation on
irq_domain_map/unmap for details).  Followup patches will add the
missing pieces and migrate PowerPC to use irq_domain.

SPARC has its own method of managing interrupts from the device tree
and is unaffected by this change.

Change-Id: Ia5fa674a97c85e2fc8e30275753b1494a97bd1d9
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Sathish Ambley <sambley@codeaurora.org>
2011-10-24 10:42:45 -07:00
Linux Build Service Account
e9baa79ea0 Merge "genirq: chip: set pending only for edge interrupts" into msm-3.0 2011-10-24 03:04:12 -07:00
Abhijeet Dharmapurikar
b615b0e3d4 genirq: chip: set pending only for edge interrupts
The IRQS_PENDING flag is meant to record an edge interrupt trigger event
when that interrupt is disabled.

When an edge triggered interrupt is enabled, check_irq_resend() retriggers
that irq and resets the flag to zero if set. Note that check_irq_resend()
only does this for edge triggered interrupts.

For level triggered interrupts it is expected that the interrupt remains
active and doesn't need this PENDING flag assistance from software for
re-triggering it.

However, the handle_fasteoi_irq flow handler sets the PENDING flag even
for a disabled level interrupt. This has an adverse effect if that level
interrupt is marked wakeup: the suspend code sees the pending flag on a
wakeup interrupt and aborts suspend, whereas check_irq_resend does not
reset it to 0 (as it is a level interrupt). The end result is that the
PENDING flag on this level triggered wakeup interrupt never clears and
the system keeps aborting suspend.

Fix this by setting IRQS_PENDING flag only for edge interrupts in the
handle_fasteoi_irq.
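
A hedged sketch of the change inside handle_fasteoi_irq()
(illustrative, not the verbatim patch):

    /* Disabled or no action: mask it and bail. Record a pending
     * event only for edge interrupts; a level interrupt stays
     * asserted in hardware and needs no software re-trigger. */
    if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data))) {
        if (irqd_get_trigger_type(&desc->irq_data) & IRQ_TYPE_EDGE_BOTH)
            desc->istate |= IRQS_PENDING;
        mask_irq(desc);
        goto out;
    }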

CRs-Fixed: 314344
Change-Id: I775d40f434f9309fd9672bae372b0f0fb5b91627
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
2011-10-21 20:32:18 -07:00
Amar Singhal
f49d99bc41 rq_stats: Doing rq_stats calculation in the scheduler tick.
With this change, we do the average run queue statistics calculation
in the scheduler tick itself, which avoids needing an extra timer for
the same purpose. Doing the calculation in the scheduler tick also
avoids the bias introduced when it is done from a workqueue.

Change-Id: I854d90acc05cc7a7226487be5555976826d8c837
Signed-off-by: Amar Singhal <asinghal@codeaurora.org>
2011-10-20 14:35:49 -07:00
Steven Rostedt
9374622a99 ftrace: Fix regression where ftrace breaks when modules are loaded
commit f7bc8b61f65726ff98f52e286b28e294499d7a08 upstream.

Enabling the function tracer to trace all functions, then loading a
module, and then disabling function tracing will cause ftrace to fail.

This can also happen by enabling function tracing on the command line:

  ftrace=function

If modules are loaded during boot up and you then disable function
tracing with 'echo nop > current_tracer', you will trigger a bug in
ftrace that makes it shut itself down.

The reason is, the new ftrace code keeps ref counts of all ftrace_ops that
are registered for tracing. When one or more ftrace_ops are registered,
all the records that represent the functions that the ftrace_ops will
trace have a ref count incremented. If this ref count is not zero,
when the code modification runs, that function will be enabled for tracing.
If the ref count is zero, that function will be disabled from tracing.

To make sure the accounting was working, FTRACE_WARN_ON()s were added
to updating of the ref counts.

If the ref count hits its max (> 2^30 ftrace_ops added), or if
the ref count goes below zero, a FTRACE_WARN_ON() is triggered which
disables all modification of code.

Since it is common for an ftrace_ops to trace all functions in the
kernel, instead of creating > 20,000 hash items for the ftrace_ops, the
hash count is just set to zero, which represents that the ftrace_ops is
to trace all functions. This is where the issues arise.

If you enable function tracing to trace all functions and then add a
module, the module's function records do not get their ref counts
updated. When the function tracer is disabled, all function record ref
counts are decremented. Since the module's records never had their ref
counts incremented, they go below zero and the FTRACE_WARN_ON() is
triggered.

The solution to this is rather simple. When modules are loaded and
their functions are added to the ftrace pool, look to see if any
ftrace_ops are registered that trace all functions. And for those,
update the ref count for the module's function records.
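
A hypothetical sketch of that logic at module-record insertion time
(ops_traces_mod() standing for a helper that tests for an empty filter
hash):

    if (mod) {
        struct ftrace_ops *ops;

        /* if any enabled ftrace_ops traces everything, start the
         * module's new records with a ref count of 1 instead of 0 */
        for (ops = ftrace_ops_list; ops != &ftrace_list_end;
             ops = ops->next) {
            if (ops->flags & FTRACE_OPS_FL_ENABLED &&
                ops_traces_mod(ops))
                ref = 1;
        }
    }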

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-10-16 14:14:55 -07:00
Steven Rostedt
d7f04c486e ftrace: Fix regression of :mod:module function enabling
commit 43dd61c9a09bd413e837df829e6bfb42159be52a upstream.

The new code that allows different utilities to pick and choose
what functions they trace broke the :mod: hook that allows users
to trace only functions of a particular module.

The reason is that the :mod: hook bypasses the hash that is setup
to allow individual users to trace their own functions and uses
the global hash directly. But if the global hash has not been
set up, it will cause a bug:

echo '*:mod:radeon' > /sys/kernel/debug/set_ftrace_filter

produces:

 [drm:drm_mode_getfb] *ERROR* invalid framebuffer id
 [drm:radeon_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip
 BUG: unable to handle kernel paging request at ffffffff8160ec90
 IP: [<ffffffff810d9136>] add_hash_entry+0x66/0xd0
 PGD 1a05067 PUD 1a09063 PMD 80000000016001e1
 Oops: 0003 [#1] SMP Jul  7 04:02:28 phyllis kernel: [55303.858604] CPU 1
 Modules linked in: cryptd aes_x86_64 aes_generic binfmt_misc rfcomm bnep ip6table_filter hid radeon r8169 ahci libahci mii ttm drm_kms_helper drm video i2c_algo_bit intel_agp intel_gtt

 Pid: 10344, comm: bash Tainted: G        WC  3.0.0-rc5 #1 Dell Inc. Inspiron N5010/0YXXJJ
 RIP: 0010:[<ffffffff810d9136>]  [<ffffffff810d9136>] add_hash_entry+0x66/0xd0
 RSP: 0018:ffff88003a96bda8  EFLAGS: 00010246
 RAX: ffff8801301735c0 RBX: ffffffff8160ec80 RCX: 0000000000306ee0
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880137c92940
 RBP: ffff88003a96bdb8 R08: ffff880137c95680 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff81c9df78
 R13: ffff8801153d1000 R14: 0000000000000000 R15: 0000000000000000
 FS: 00007f329c18a700(0000) GS:ffff880137c80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffff8160ec90 CR3: 000000003002b000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process bash (pid: 10344, threadinfo ffff88003a96a000, task ffff88012fcfc470)
 Stack:
  0000000000000fd0 00000000000000fc ffff88003a96be38 ffffffff810d92f5
  ffff88011c4c4e00 ffff880000000000 000000000b69f4d0 ffffffff8160ec80
  ffff8800300e6f06 0000000081130295 0000000000000282 ffff8800300e6f00
 Call Trace:
  [<ffffffff810d92f5>] match_records+0x155/0x1b0
  [<ffffffff810d940c>] ftrace_mod_callback+0xbc/0x100
  [<ffffffff810dafdf>] ftrace_regex_write+0x16f/0x210
  [<ffffffff810db09f>] ftrace_filter_write+0xf/0x20
  [<ffffffff81166e48>] vfs_write+0xc8/0x190
  [<ffffffff81167001>] sys_write+0x51/0x90
  [<ffffffff815c7e02>] system_call_fastpath+0x16/0x1b
 Code: 48 8b 33 31 d2 48 85 f6 75 33 49 89 d4 4c 03 63 08 49 8b 14 24 48 85 d2 48 89 10 74 04 48 89 42 08 49 89 04 24 4c 89 60 08 31 d2
 RIP [<ffffffff810d9136>] add_hash_entry+0x66/0xd0
  RSP <ffff88003a96bda8>
 CR2: ffffffff8160ec90
 ---[ end trace a5d031828efdd88e ]---

Reported-by: Brian Marete <marete@toshnix.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-16 14:14:55 -07:00
Peter Zijlstra
249cf808ba posix-cpu-timers: Cure SMP wobbles
commit d670ec13178d0fd8680e6742a2bc6e04f28f87d8 upstream.

David reported:

  Attached below is a watered-down version of rt/tst-cpuclock2.c from
  GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
  similar.

  Run it several times, and you will see cases where the main thread
  will measure a process clock difference before and after the nanosleep
  which is smaller than the cpu-burner thread's individual thread clock
  difference.  This doesn't make any sense since the cpu-burner thread
  is part of the top-level process's thread group.

  I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
  64-bit binaries).

  For example:

  [davem@boricha build-x86_64-linux]$ ./test
  process: before(0.001221967) after(0.498624371) diff(497402404)
  thread:  before(0.000081692) after(0.498316431) diff(498234739)
  self:    before(0.001223521) after(0.001240219) diff(16698)
  [davem@boricha build-x86_64-linux]$

  The diff of 'process' should always be >= the diff of 'thread'.

  I make sure to wrap the 'thread' clock measurements the most tightly
  around the nanosleep() call, and that the 'process' clock measurements
  are the outer-most ones.

  ---
  #include <unistd.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <fcntl.h>
  #include <string.h>
  #include <errno.h>
  #include <pthread.h>

  static pthread_barrier_t barrier;

  static void *chew_cpu(void *arg)
  {
	  pthread_barrier_wait(&barrier);
	  while (1)
		  __asm__ __volatile__("" : : : "memory");
	  return NULL;
  }

  int main(void)
  {
	  clockid_t process_clock, my_thread_clock, th_clock;
	  struct timespec process_before, process_after;
	  struct timespec me_before, me_after;
	  struct timespec th_before, th_after;
	  struct timespec sleeptime;
	  unsigned long diff;
	  pthread_t th;
	  int err;

	  err = clock_getcpuclockid(0, &process_clock);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
	  if (err)
		  return 1;

	  pthread_barrier_init(&barrier, NULL, 2);
	  err = pthread_create(&th, NULL, chew_cpu, NULL);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(th, &th_clock);
	  if (err)
		  return 1;

	  pthread_barrier_wait(&barrier);

	  err = clock_gettime(process_clock, &process_before);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &me_before);
	  if (err)
		  return 1;

	  err = clock_gettime(th_clock, &th_before);
	  if (err)
		  return 1;

	  sleeptime.tv_sec = 0;
	  sleeptime.tv_nsec = 500000000;
	  nanosleep(&sleeptime, NULL);

	  err = clock_gettime(th_clock, &th_after);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &me_after);
	  if (err)
		  return 1;

	  err = clock_gettime(process_clock, &process_after);
	  if (err)
		  return 1;

	  diff = process_after.tv_nsec - process_before.tv_nsec;
	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 process_before.tv_sec, process_before.tv_nsec,
		 process_after.tv_sec, process_after.tv_nsec, diff);
	  diff = th_after.tv_nsec - th_before.tv_nsec;
	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 th_before.tv_sec, th_before.tv_nsec,
		 th_after.tv_sec, th_after.tv_nsec, diff);
	  diff = me_after.tv_nsec - me_before.tv_nsec;
	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 me_before.tv_sec, me_before.tv_nsec,
		 me_after.tv_sec, me_after.tv_nsec, diff);

	  return 0;
  }

This is due to us using p->se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.

This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().

Aside from that, it makes the function safe on 32-bit systems. The old
code added t->se.sum_exec_runtime unprotected; sum_exec_runtime is a
64-bit value and could be changed on another cpu at the same time.
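
The core of the cure, sketched (illustrative): sum per-thread runtime
via task_sched_runtime(), which locks the thread's runqueue and
includes time accrued since the last tick.

    do {
        times->utime = cputime_add(times->utime, t->utime);
        times->stime = cputime_add(times->stime, t->stime);
        /* was: times->sum_exec_runtime += t->se.sum_exec_runtime; */
        times->sum_exec_runtime += task_sched_runtime(t);
    } while_each_thread(tsk, t);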

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Tested-by: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-16 14:14:51 -07:00
Simon Kirby
4e41ce6988 sched: Fix up wchan borkage
commit 6ebbe7a07b3bc40b168d2afc569a6543c020d2e3 upstream.

Commit c259e01a1ec ("sched: Separate the scheduler entry for
preemption") contained a boo-boo wrecking wchan output: it forgot to
put the new schedule() function in the __sched section, so schedule()
does not get properly ignored by things like wchan.
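
The nature of such a fix, sketched (illustrative): __sched places a
function in the .sched.text section, which the wchan stack walker
skips.

    /* functions marked __sched are ignored when computing wchan */
    asmlinkage void __sched schedule(void)
    {
        __schedule();
    }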

Tested-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110923000346.GA25425@hostway.ca
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-16 14:14:51 -07:00
Shawn Bohrer
113f8b8f99 sched/rt: Migrate equal priority tasks to available CPUs
commit 3be209a8e22cedafc1b6945608b7bb8d9887ab61 upstream.

Commit 43fa5460fe ("sched: Try not to
migrate higher priority RT tasks") also introduced a change in behavior
which keeps RT tasks on the same CPU if there is an equal priority RT
task currently running even if there are empty CPUs available.

This can cause unnecessary wakeup latencies, and can prevent the
scheduler from balancing all RT tasks across available CPUs.

This change causes an RT task to search for a new CPU on wakeup if an
equal priority RT task is already running there.  Lower priority tasks
will still have to wait on higher priority tasks, but the system should
still balance out, because whenever both a high and a low priority RT
task are on a given CPU, the high priority task can wake up while the
low priority task is running and force it to search for a better
runqueue.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315837684-18733-1-git-send-email-sbohrer@rgmadvisors.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-16 14:14:51 -07:00
Linux Build Service Account
c37683a299 Merge "power: wakelock: don't set check_done flag if aborting" into msm-3.0 2011-10-14 14:39:38 -07:00
Abhijeet Dharmapurikar
7ef4dbaa2a power: wakelock: don't set check_done flag if aborting
If there is a wakelock held the wakelock driver rejects the suspend
by returning EAGAIN(-11). In this case since the suspend is rejected
in suspend_noirq callback, the resume_noirq callback is not called on
the wakelock driver, leading to the check_done flag not being cleared
and causing a BUG for the next wakelock request.

Don't set the check_done flag if suspend_noirq aborts suspend.

Change-Id: Iddd12cefd9a30020784416b4e5bec7fe3f7fc0e6
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
2011-10-14 09:22:49 -07:00
Linux Build Service Account
17e081ddcd Merge "Revert "ARM: Make low-level printk work"" into msm-3.0 2011-10-13 05:28:40 -07:00
Stepan Moskovchenko
9049d3635f Revert "ARM: Make low-level printk work"
This reverts commit ffdcd796e23c86d2cfeb25cb2d140f11d5fd6411.
This feature is replaced by passing 'earlyprintk' on the
kernel command line.

Change-Id: I2d4f2812e39b1c7afc061f106863b63710762fa7
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2011-10-11 13:25:03 -07:00
Abhijeet Dharmapurikar
25bcca80f9 genirq: fix handle_nested_irq for lazy disable
When lazy disabling is implemented and an interrupt is disabled, the
genirq code ends up marking it as IRQ_DISABLED in the descriptor while
the interrupt stays enabled in the controller.  If the interrupt fires
after disabling, the flow handlers, namely handle_level_irq and
handle_edge_irq, mask the interrupt in the controller.

This is not the case with handle_nested_irq. The interrupt stays enabled in
the controller and if it were a level interrupt it keeps firing only to be
ignored by handle_nested_irq.

Update handle_nested_irq to mask such an interrupt.
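
A hedged sketch of the update (illustrative, not the verbatim patch):

    action = desc->action;
    if (unlikely(!action || irqd_irq_disabled(&desc->irq_data))) {
        desc->istate |= IRQS_PENDING;
        /* new: actually mask the line in the controller so a
         * disabled level interrupt stops re-firing */
        mask_irq(desc);
        goto out_unlock;
    }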

CRs-Fixed: 300931
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>

Conflicts:

	kernel/irq/chip.c
2011-10-11 09:59:29 -07:00
Peter Foley
32a41b2cd1 kernel: prevent unnecessary rebuilding due to config_data.gz
When IKCONFIG is built in, 'make oldconfig' will cause the kernel to be
relinked even if .config didn't change. This happens because
config_data.gz depends on .config. This patch changes the if_changed
to a filechk so that config_data.h is only rebuilt when the contents
have actually changed.

Change-Id: I0c907b5312e1059352a0afff688d8e015dec6bed
Signed-off-by: Peter Foley <pefoley2@verizon.net>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2011-10-07 10:46:08 -07:00
Maya Spivak
2aba4e8b5e printk: Don't allow cpu to get console lock during hotplugging
The flush of the console takes unnecessary time during the cpu
hotplug up operation.  It can delay the rcu synchronize_sched
operation, which in turn delays the hotplug operation. The flush
delays rcu synchronize_sched during scheduling domain creation
by up to 100 ms by interrupting the move to a quiescent state.
This change delays the flush of the console to later in the
notification chain of cpu_online. At the point in the cpu_up
operation where the flush now occurs, other tasks can already
be scheduled on the cpu that just came up.

Signed-off-by: Maya Spivak <mspivak@codeaurora.org>
2011-10-05 10:28:09 -07:00
Abhijeet Dharmapurikar
57b34bbdca power: wakelock: BUG if wakelock is taken very late
There have been a handful of corner cases where a driver calls
wake_lock very late in the suspend sequence but the suspend proceeds
anyway and causes issues.

Add code to BUG if a driver grabs a wakelock after the suspend_noirq
callbacks are called; this will aid correct wakelock usage in new drivers and
give us callstacks for misbehaving drivers.

Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
2011-10-04 17:14:23 -07:00
Abhijeet Dharmapurikar
ef2293e734 genirq: explicitly mask a freed irq
When an interrupt is freed, the shutdown or the disable callback
is called for that interrupt. These callbacks might not be implemented
or, even if they are, might not mask the interrupt.

Explicitly mask the interrupt when it is freed. If not masked, the
interrupt could trigger, set the pending bit in the irq controller
and cause unnecessary wakeup or exits from idle power collapse.
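
A hedged sketch of the idea in __free_irq() (illustrative):

    /* Last action gone: shut the interrupt down, then mask it
     * explicitly, since .irq_shutdown()/.irq_disable() alone may
     * leave the line enabled in the controller. */
    if (!desc->action) {
        irq_shutdown(desc);
        mask_irq(desc);
    }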

Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>

Conflicts:

	kernel/irq/manage.c
2011-10-03 16:19:25 -07:00
Maya Spivak
e0473b4aaf partition_sched_domains: Do not destroy old sched domain on cpu_up
This is safe on a cpu_up only. Although a reader may still have
access to the old scheduling domain data, the data indicates that
the new CPU is not up, and therefore the only limitation is the
new CPU will not be schedulable by that reader until that reader
receives the new data.

Signed-off-by: Maya Spivak <mspivak@codeaurora.org>
2011-10-03 16:18:46 -07:00
Maya Spivak
3d321a3570 cpu-hotplug: Add the function 'cpu_hotplug_inprogress'
Allows a caller to detect whether a cpu hotplug operation
is in progress. This is useful for optimizing code paths
based on this condition.

Signed-off-by: Maya Spivak <mspivak@codeaurora.org>
2011-10-03 16:17:26 -07:00
Thomas Tuttle
d5b1a08d0d workqueue: lock cwq access in drain_workqueue
commit fa2563e41c3d6d6e8af437643981ed28ae0cb56d upstream.

Take cwq->gcwq->lock to avoid racing between drain_workqueue checking to
make sure the workqueues are empty and cwq_dec_nr_in_flight decrementing
and then incrementing nr_active when it activates a delayed work.

We discovered this when a corner case in one of our drivers resulted in
us trying to destroy a workqueue in which the remaining work would
always requeue itself again in the same workqueue.  We would hit this
race condition and trip the BUG_ON on workqueue.c:3080.
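
A sketch of the fix in drain_workqueue() (close to, though not
necessarily, the verbatim patch): sample the emptiness condition under
the gcwq lock.

    spin_lock_irq(&cwq->gcwq->lock);
    drained = !cwq->nr_active && list_empty(&cwq->delayed_works);
    spin_unlock_irq(&cwq->gcwq->lock);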

Signed-off-by: Thomas Tuttle <ttuttle@chromium.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:31 -07:00
Geert Uytterhoeven
79e72e1b97 genirq: Make irq_shutdown() symmetric vs. irq_startup again
commit ed585a651681e822089087b426e6ebfb6d3d9873 upstream.

If an irq_chip provides .irq_shutdown(), but neither of .irq_disable() or
.irq_mask(), free_irq() crashes when jumping to NULL.
Fix this by only trying .irq_disable() and .irq_mask() if there's no
.irq_shutdown() provided.

This revives the symmetry with irq_startup(), which tries .irq_startup(),
.irq_enable(), and irq_unmask(), and makes it consistent with the comment for
irq_chip.irq_shutdown() in <linux/irq.h>, which says:

 * @irq_shutdown:	shut down the interrupt (defaults to ->disable if NULL)

This is also how __free_irq() behaved before the big overhaul, cfr. e.g.
3b56f0585f ("genirq: Remove bogus conditional"),
where the core interrupt code always overrode .irq_shutdown() to
.irq_disable() if .irq_shutdown() was NULL.
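
The resulting fallback chain in irq_shutdown(), sketched (close to,
though not necessarily, the final code):

    if (desc->irq_data.chip->irq_shutdown)
        desc->irq_data.chip->irq_shutdown(&desc->irq_data);
    else if (desc->irq_data.chip->irq_disable)
        desc->irq_data.chip->irq_disable(&desc->irq_data);
    else
        desc->irq_data.chip->irq_mask(&desc->irq_data);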

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
Link: http://lkml.kernel.org/r/1315742394-16036-2-git-send-email-geert@linux-m68k.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03 11:40:27 -07:00