This patch adds the kernel portions needed to implement
RFC 5082 Generalized TTL Security Mechanism (GTSM).
It is a lightweight security measure against forged
packets causing DoS attacks (for BGP).
This is already implemented the same way in BSD kernels.
For the necessary Quagga patch
http://www.gossamer-threads.com/lists/quagga/dev/17389
Description from Cisco
http://www.cisco.com/en/US/docs/ios/12_3t/12_3t7/feature/guide/gt_btsh.html
It does add one byte to each socket structure, but I did
a little rearrangement to reuse a hole (on 64 bit), but it
does grow the structure on 32 bit
This should be documented on ip(4) man page and the Glibc in.h
file also needs update. IPV6_MINHOPLIMIT should also be added
(although BSD doesn't support that).
Only TCP is supported, but could also be added to UDP, DCCP, SCTP
if desired.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The list macros use LIST_POISON1 and LIST_POISON2 as undereferencable
pointers in order to trap erronous use of freed list_heads. Unfortunately
userspace can arrange for those pointers to actually be dereferencable,
potentially turning an oops to an expolit.
To avoid this allow architectures (currently x86_64 only) to override
the default values for these pointers with truly-undereferencable values.
This is easy on x86_64 as the virtual address space is large and contains
areas that cannot be mapped.
Other 64-bit architectures will likely find similar unmapped ranges.
[ingo: switch to 0xdead000000000000 as the unmapped area]
[ingo: add comments, cleanup]
[jaswinder: eliminate sparse warnings]
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch series adds generic support for creating and extracting
LZO-compressed kernel images, as well as support for using such images on
the x86 and ARM architectures, and support for creating and using
LZO-compressed initrd and initramfs images.
Russell King said:
: Testing on a Cortex A9 model:
: - lzo decompressor is 65% of the time gzip takes to decompress a kernel
: - lzo kernel is 9% larger than a gzip kernel
:
: which I'm happy to say confirms your figures when comparing the two.
:
: However, when comparing your new gzip code to the old gzip code:
: - new is 99% of the size of the old code
: - new takes 42% of the time to decompress than the old code
:
: What this means is that for a proper comparison, the results get even better:
: - lzo is 7.5% larger than the old gzip'd kernel image
: - lzo takes 28% of the time that the old gzip code took
:
: So the expense seems definitely worth the effort. The only reason I
: can think of ever using gzip would be if you needed the additional
: compression (eg, because you have limited flash to store the image.)
:
: I would argue that the default for ARM should therefore be LZO.
This patch:
The lzo compressor is worse than gzip at compression, but faster at
extraction. Here are some figures for an ARM board I'm working on:
Uncompressed size: 3.24Mo
gzip 1.61Mo 0.72s
lzo 1.75Mo 0.48s
So for a compression ratio that is still relatively close to gzip, it's
much faster to extract, at least in that case.
This part contains:
- Makefile routine to support lzo compression
- Fixes to the existing lzo compressor so that it can be used in
compressed kernels
- wrapper around the existing lzo1x_decompress, as it only extracts one
block at a time, while we need to extract a whole file here
- config dialog for kernel compression
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Tested-by: Wu Zhangjin <wuzhangjin@gmail.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Russell King <rmk@arm.linux.org.uk>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes the sparse warning:
fs/ext4/super.c:2390:40: warning: symbol 'i' shadows an earlier one
fs/ext4/super.c:2368:22: originally declared here
Using 'i' in a macro is dubious practice.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
All callers of the stacking functions use 512-byte sector units rather
than byte offsets. Simplify the code so the stacking functions take
sectors when specifying data offsets.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
DM does not want to know about partition offsets. Add a partition-aware
wrapper that DM can use when stacking block devices.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Discard alignment reporting for partitions was incorrect. Update to
match the algorithm used elsewhere.
The alignment can be negative (misaligned). Fix format string
accordingly.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: Fix kernel-doc format error in kgdb.h
blackfin,kgdb: Do not put PC in gdb_regs into retx.
blackfin,kgdb,probe_kernel: Cleanup probe_kernel_read/write
maccess,probe_kernel: Allow arch specific override probe_kernel_(read|write)
linux-next-20081022//include/linux/kgdb.h:308): duplicate section name 'Description'
and fix typos in that file's kernel-doc comments.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Some archs such as blackfin, would like to have an arch specific
probe_kernel_read() and probe_kernel_write() implementation which can
fall back to the generic implementation if no special operations are
needed.
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This is to be used together with switch technologies, like RFC3069,
that where the individual ports are not allowed to communicate with
each other, but they are allowed to talk to the upstream router. As
described in RFC 3069, it is possible to allow these hosts to
communicate through the upstream router by proxy_arp'ing.
This patch basically allow proxy arp replies back to the same
interface (from which the ARP request/solicitation was received).
Tunable per device via proc "proxy_arp_pvlan":
/proc/sys/net/ipv4/conf/*/proxy_arp_pvlan
This switch technology is known by different vendor names:
- In RFC 3069 it is called VLAN Aggregation.
- Cisco and Allied Telesyn call it Private VLAN.
- Hewlett-Packard call it Source-Port filtering or port-isolation.
- Ericsson call it MAC-Forced Forwarding (RFC Draft).
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
When working with FDPIC, there are many shared mappings of read-only
code regions between applications (the C library, applet packages like
busybox, etc.), but the current do_mmap_pgoff() function will issue an
icache flush whenever a VMA is added to an MM instead of only doing it
when the map is initially created.
The flush can instead be done when a region is first mmapped PROT_EXEC.
Note that we may not rely on the first mapping of a region being
executable - it's possible for it to be PROT_READ only, so we have to
remember whether we've flushed the region or not, and then flush the
entire region when a bit of it is made executable.
However, this also affects the brk area. That will no longer be
executable. We can mprotect() it to PROT_EXEC on MPU-mode kernels, but
for NOMMU mode kernels, when it increases the brk allocation, making
sys_brk() flush the extra from the icache should suffice. The brk area
probably isn't used by NOMMU programs since the brk area can only use up
the leavings from the stack allocation, where the stack allocation is
larger than requested.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cleanup only.
setup_arch(), doesn't care care if ACPI initialization succeeded
or failed, so delete acpi_boot_table_init()'s return value.
Signed-off-by: Len Brown <len.brown@intel.com>
The previous patches added the use of print_fmt string and changes
the trace_define_field() function to also create the fields and
format output for the event format files.
text data bss dec hex filename
5857201 1355780 9336808 16549789 fc879d vmlinux
5884589 1351684 9337896 16574169 fce6d9 vmlinux-orig
The above shows the size of the vmlinux after this patch set
compared to the vmlinux-orig which is before the patch set.
This saves us 27k on text, 1k on bss and adds just 4k of data.
The total savings of 24k in size.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4B273D4D.40604@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
In the clean up of having all events call one specific function,
the syscall event init was changed to call this helper function.
With the new print_fmt updates, the syscalls need to do special
initializations. This patch converts the syscall events to call
its own init function again.
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This is part of a patch set that removes the show_format method
in the ftrace event macros.
The print_fmt field is added to hold the string that shows
the print_fmt in the event format files. This patch only adds
the field but it is currently not used. Later patches will use
this field to enable us to remove the show_format field
and function.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4B273D3E.2000704@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This patch introduces an interface to process data objects
in parallel. The parallelized objects return after serialization
in the same order as they were before the parallelization.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
sysfs_remove_group() waits for sysfs attributes to be removed, therefore
we do not need to worry about driver-specific attributes being accessed
after driver has been detached from the device. In fact, attempts to take
serio->drv_mutex in attribute methods may lead to the following deadlock:
sysfs_read_file()
fill_read_buffer()
sysfs_get_active_two()
psmouse_attr_show_helper()
serio_pin_driver()
serio_disconnect_driver()
mutex_lock(&serio->drv_mutex);
<--------> mutex_lock(&serio_drv_mutex);
psmouse_disconnect()
sysfs_remove_group(... psmouse_attr_group);
....
sysfs_deactivate();
wait_for_completion();
Fix this by removing calls to serio_[un]pin_driver() and functions themselves
and using driver-private mutexes to serialize access to attribute's set()
methods that may change device state.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Use the per cpu allocator functionality to avoid per cpu arrays in struct zone.
This drastically reduces the size of struct zone for systems with large
amounts of processors and allows placement of critical variables of struct
zone in one cacheline even on very large systems.
Another effect is that the pagesets of one processor are placed near one
another. If multiple pagesets from different zones fit into one cacheline
then additional cacheline fetches can be avoided on the hot paths when
allocating memory from multiple zones.
Bootstrap becomes simpler if we use the same scheme for UP, SMP, NUMA. #ifdefs
are reduced and we can drop the zone_pcp macro.
Hotplug handling is also simplified since cpu alloc can bring up and
shut down cpu areas for a specific cpu as a whole. So there is no need to
allocate or free individual pagesets.
V7-V8:
- Explain chicken egg dilemmna with percpu allocator.
V4-V5:
- Fix up cases where per_cpu_ptr is called before irq disable
- Integrate the bootstrap logic that was separate before.
tj: Build failure in pageset_cpuup_callback() due to missing ret
variable fixed.
Reviewed-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
ringbuffer*.c are the last users of local.h.
Remove the include from modules.h and add it to ringbuffer files.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Use cpu ops to deal with the per cpu data instead of a local_t. Reduces memory
requirements, cache footprint and decreases cycle counts.
The this_cpu_xx operations are also used for !SMP mode. Otherwise we could
not drop the use of __module_ref_addr() which would make per cpu data handling
complicated. this_cpu_xx operations have their own fallback for !SMP.
V8-V9:
- Leave include asm/module.h since ringbuffer.c depends on it. Nothing else
does though. Another patch will deal with that.
- Remove spurious free.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Tejun Heo <tj@kernel.org>
It turns out that some PCI devices require extra delays when changing
power state from D3 to D0 (and the other way around). Although this
is against the PCI specification, we can handle it quite easily by
allowing drivers to define arbitrary D3 delays for devices known to
require extra time for switching power states.
Introduce additional field d3_delay in struct pci_dev and use it to
store the value of the device's D0->D3 delay, in miliseconds. Make
the PCI PM core code use the per-device d3_delay unless
pci_pm_d3_delay is greater (in which case the latter is used).
[This also allows the driver to specify d3_delay shorter than the
10 ms required by the PCI standard if the device is known to be able
to handle that.]
Make the sky2 driver set d3_delay to 150 for devices handled by it.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=14730 which is a
listed regression from 2.6.30.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current HID code doesn't properly handle HID joysticks which have
larger number of buttons than what fits into current range reserved
for BTN_JOYSTICK.
One such joystick reported to not work properly is Saitek X52 Pro
Flight System.
We can't extend the range to fit more buttons in, because of backwards
compatibility reasons.
Therefore this patch introduces a new BTN_TRIGGER_HAPPY range, and
uses these to map the buttons which are over BTN_JOYSTICK limit.
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> [for the input.h part]
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
We want to be sure that compiler fetches the limit variable only
once, so add helpers for fetching current and maximal resource
limits which do that.
Add them to sched.h (instead of resource.h) due to circular dependency
sched.h->resource.h->task_struct
Alternative would be to create a separate res_access.h or similar.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: James Morris <jmorris@namei.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
It is an internal function. Move it inside __KERNEL__ ifdef, along
with task_struct declaration.
Then we get:
--- /usr/include/linux/resource.h 2009-09-14 15:09:29.000000000 +0200
+++ usr/include/linux/resource.h 2010-01-04 11:30:54.000000000 +0100
@@ -3,8 +3,6 @@
#include <linux/time.h>
-struct task_struct;
-
/*
* Resource control/accounting header file for linux
*/
@@ -70,6 +68,5 @@
*/
#include <asm/resource.h>
-int getrusage(struct task_struct *p, int who, struct rusage *ru);
#endif
***********
include/linux/Kbuild is untouched, since unifdef is run even on
headers-y nowadays.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
This patch adds the flag CAN_CTRLMODE_ONE_SHOT. It is used as mask
or flag in the "struct can_ctrlmode".
It allows userspace via netlink to set a CAN controller into the special
"one-shot" mode. In this mode, if supported by the CAN controller, it
tries only once to deliver a CAN frame and aborts it if an error
(e.g.: arbitration lost) happens.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we relax the reiserfs lock to avoid creating unwanted
dependencies against others locks while grabbing these,
we want to ensure it has not been taken recursively, otherwise
the lock won't be really relaxed. Only its depth will be decreased.
The unwanted dependency would then actually happen.
To prevent from that, add a reiserfs_lock_check_recursive() call
in the places that need it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firewire, ieee1394: update Kconfig help
firewire, ieee1394: update MAINTAINERS entries
firewire: ohci: always use packet-per-buffer mode for isochronous reception
firewire: cdev: fix another memory leak in an error path
firewire: fix use of multiple AV/C devices, allow multiple FCP listeners
Comments from Stefan:
Distributors who still ship the old stack (ieee1394, ohci1394,
raw1394, sbp2, eth1394 and more) should now switch to the new one
(firewire-core, firewire-ohci, firewire-sbp2, firewire-net). In the
first iteration, those distributors might want to ship the old stack
also (but blacklisted) as a fallback for their users if unforeseen
problems with the newer replacement drivers are encountered.
The older FireWire stack contains several known problems which are
not going to be fixed; instead, those issues are addressed by the new
stack. An incomplete list of these issues is kept in bugzilla:
http://bugzilla.kernel.org/show_bug.cgi?id=10046
We have a guide on migration from the older to the newer stack:
http://ieee1394.wiki.kernel.org/index.php/Juju_Migration
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Fix sign fields in ftrace_define_fields_##call()
tracing/syscalls: Fix typo in SYSCALL_DEFINE0
tracing/kprobe: Show sign of fields in trace_kprobe format files
ksym_tracer: Remove trace_stat
ksym_tracer: Fix race when incrementing count
ksym_tracer: Fix to allow writing newline to ksym_trace_filter
ksym_tracer: Fix to make the tracer work
tracing: Kconfig spelling fixes and cleanups
tracing: Fix setting tracer specific options
Documentation: Update ftrace-design.txt
Documentation: Update tracepoint-analysis.txt
Documentation: Update mmiotrace.txt
crash_kexec gets called before kmsg_dump(KMSG_DUMP_OOPS) if
panic_on_oops is set, so the kernel log buffer is not stored
for this case.
This patch adds a KMSG_DUMP_KEXEC dump type which gets called
when crash_kexec() is invoked. To avoid getting double dumps,
the old KMSG_DUMP_PANIC is moved below crash_kexec(). The
mtdoops driver is modified to handle KMSG_DUMP_KEXEC in the
same way as a panic.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Since hibernation assumes power loss, we should fully reinitialize
PHYs (including platform fixups), as if PHYs were just attached.
This patch factors phy_init_hw() out of phy_attach_direct(), then
converts mdio_bus to dev_pm_ops and adds an appropriate restore()
callback.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI: introduce kernel parameter acpi_sleep=sci_force_enable
ACPI: WMI: Survive BIOS with duplicate GUIDs
dell-wmi - fix condition to abort driver loading
wmi: check find_guid() return value to prevent oops
dell-wmi, hp-wmi, msi-wmi: check wmi_get_event_data() return value
ACPI: hp-wmi, msi-wmi: clarify that wmi_install_notify_handler() returns an acpi_status
dell-wmi: sys_init_module: 'dell_wmi'->init suspiciously returned 21, it should
ACPI video: correct error-handling code
ACPI video: no warning message if "acpi_backlight=vendor" is used
ACPI: fix ACPI=n allmodconfig build
thinkpad-acpi: improve Kconfig help text
thinkpad-acpi: update volume subdriver documentation
thinkpad-acpi: make volume subdriver optional
thinkpad-acpi: don't fail to load the entire module due to ALSA problems
thinkpad-acpi: don't take the first ALSA slot by default
Introduce kernel parameter acpi_sleep=sci_force_enable
some laptop requires SCI_EN being set directly on resume,
or else they hung somewhere in the resume code path.
We already have a blacklist for these laptops but we still need
this option, especially when debugging some suspend/resume problems,
in case there are systems that need this workaround and are not yet
in the blacklist.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Patch up how we claim metadata blocks for quota purposes
ext4: Ensure zeroout blocks have no dirty metadata
ext4: return correct wbc.nr_to_write in ext4_da_writepages
ext4: Update documentation to correct the inode_readahead_blks option name
jbd2: don't use __GFP_NOFAIL in journal_init_common()
ext4: flush delalloc blocks when space is low
fs-writeback: Add helper function to start writeback if idle
ext4: Eliminate potential double free on error path
ext4: fix unsigned long long printk warning in super.c
ext4, jbd2: Add barriers for file systems with exernal journals
ext4: replace BUG() with return -EIO in ext4_ext_get_blocks
ext4: add module aliases for ext2 and ext3
ext4: Don't ask about supporting ext2/3 in ext4 if ext4 is not configured
ext4: remove unused #include <linux/version.h>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI/cardbus: Add a fixup hook and fix powerpc
PCI: change PCI nomenclature in drivers/pci/ (non-comment changes)
PCI: change PCI nomenclature in drivers/pci/ (comment changes)
PCI: fix section mismatch on update_res()
PCI: add Intel 82599 Virtual Function specific reset method
PCI: add Intel USB specific reset method
PCI: support device-specific reset methods
PCI: Handle case when no pci device can provide cache line size hint
PCI/PM: Propagate wake-up enable for PCIe devices too
vgaarbiter: fix a typo in the vgaarbiter Documentation