Normally, each connection needs a unique identity. Conntrack zones allow
to specify a numerical zone using the CT target, connections in different
zones can use the same identity.
Example:
iptables -t raw -A PREROUTING -i veth0 -j CT --zone 1
iptables -t raw -A OUTPUT -o veth1 -j CT --zone 1
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add TCP support, which is mandated by RFC3261 for all SIP elements.
SIP over TCP is similar to UDP, except that messages are delimited
by Content-Length: headers and multiple messages may appear in one
packet.
Signed-off-by: Patrick McHardy <kaber@trash.net>
When using TCP multiple SIP messages might be present in a single packet.
A following patch will parse them by setting the dptr to the beginning of
each message. The NAT helper needs to reload the dptr value after mangling
the packet however, so it needs to know the offset of the message to the
beginning of the packet.
Signed-off-by: Patrick McHardy <kaber@trash.net>
The static initial tables are pretty large, and after the net
namespace has been instantiated, they just hang around for nothing.
This commit removes them and creates tables on-demand at runtime when
needed.
Size shrinks by 7735 bytes (x86_64).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
The respective xt_table structures already have most of the metadata
needed for hook setup. Add a 'priority' field to struct xt_table so
that xt_hook_link() can be called with a reduced number of arguments.
So should we be having more tables in the future, it comes at no
static cost (only runtime, as before) - space saved:
6807373->6806555.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Rewrite COMPAT_XT_ALIGN in terms of dummy structure hack.
Compat counters logically have nothing to do with it.
Use ALIGN() macro while I'm at it for same types.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
There is compat_u64 type which deals with different u64 type alignment
on different compat-capable platforms, so use it and removed some
hardcoded assumptions.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds GSO/checksum offload to af_packet sockets using
virtio_net_hdr. Based on Rusty's patch to add this support to tun.
It allows GSO/checksum offload to be enabled when using raw socket
backend with virtio_net.
Adds PACKET_VNET_HDR socket option to prepend virtio_net_hdr in the
receive path and process/skip virtio_net_hdr in the send path. This
option is only allowed with SOCK_RAW sockets attached to ethernet
type devices.
v2 updates
----------
Michael's Comments
- Perform length check in packet_snd() when GSO is off even when
vnet_hdr is present.
- Check for SKB_GSO_FCOE type and return -EINVAL
- don't allow tx/rx ring when vnet_hdr is enabled.
Herbert's Comments
- Removed ethernet specific code.
- protocol value is assumed to be passed in by the caller.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many drivers do this in them manually. Now they can use this function.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the similar helpers as those already done for uc list.
However multicast lists are no list_head lists but "mademanually". The three
macros added by this patch will make the transition of mc_list to list_head
smooth in two steps:
1) convert all drivers to use these macros (with the original iterator of type
"struct dev_mc_list")
2) once all drivers are converted, convert list type and iterators to "struct
netdev_hw_addr" in one patch.
>From now on, drivers can (and should) use "netdev_for_each_mc_addr" to iterate
over the addresses with iterator of type "struct netdev_hw_addr". Also macros
"netdev_mc_count" and "netdev_mc_empty" to read list's length. This is the state
which should be reached in all drivers.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ifdef out
struct proto_ops::compat_ioctl
struct proto_ops::compat_setsockopt
struct proto_ops::compat_getsockopt
to make structures smaller on COMPAT=n kernels.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to use macvlan with qemu and other tools that require
a tap file descriptor, the macvtap driver adds a small backend
with a character device with the same interface as the tun
driver, with a minimum set of features.
Macvtap interfaces are created in the same way as macvlan
interfaces using ip link, but the netif is just used as a
handle for configuration and accounting, while the data
goes through the chardev. Each macvtap interface has its
own character device, simplifying permission management
significantly over the generic tun/tap driver.
Cc: Patrick McHardy <kaber@trash.net>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: David S. Miller" <davem@davemloft.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Or Gerlitz <ogerlitz@voltaire.com>
Cc: netdev@vger.kernel.org
Cc: bridge@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes it possible to hook into the macvlan driver
from another kernel module. In particular, the goal is
to extend it with the macvtap backend that provides
a tun/tap compatible interface directly on the macvlan
device.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the vlan and macvlan drivers, the start_xmit function forwards
data to the dev_queue_xmit function for another device, which may
potentially belong to a different namespace.
To make sure that classification stays within a single namespace,
this resets the potentially critical fields.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new target for the raw table, which can be used to specify conntrack
parameters for specific connections, f.i. the conntrack helper.
The target attaches a "template" connection tracking entry to the skb, which
is used by the conntrack core when initializing a new conntrack.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Support initializing selected parameters of new conntrack entries from a
"conntrack template", which is a specially marked conntrack entry attached
to the skb.
Currently the helper and the event delivery masks can be initialized this
way.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add two masks for conntrack end expectation events to struct nf_conntrack_ecache
and use them to filter events. Their default value is "all events" when the
event sysctl is on and "no events" when it is off. A following patch will add
specific initializations. Expectation events depend on the ecache struct of
their master conntrack.
Signed-off-by: Patrick McHardy <kaber@trash.net>
On Tue, Feb 02, 2010 at 02:57:14PM -0800, Greg KH (gregkh@suse.de) wrote:
> > There are at least two ways to fix it: using a big cannon and a small
> > one. The former way is to disable notification registration, since it is
> > not used by anyone at all. Second way is to check whether calling
> > process is root and its destination group is -1 (kind of priveledged
> > one) before command is dispatched to workqueue.
>
> Well if no one is using it, removing it makes the most sense, right?
>
> No objection from me, care to make up a patch either way for this?
Getting it is not used, let's drop support for notifications about
(un)registered events from connector.
Another option was to check credentials on receiving, but we can always
restore it without bugs if needed, but genetlink has a wider code base
and none complained, that userspace can not get notification when some
other clients were (un)registered.
Kudos for Sebastian Krahmer <krahmer@suse.de>, who found a bug in the
code.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's currently no way for a virtio driver to ask for unused
buffers, so it has to keep a list itself to reclaim them at shutdown.
This is redundant, since virtio_ring stores that information. So
add a new hook to do this.
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Almost all igmp functions accessing inet->mc_list are protected by
rtnl_lock(), but there is one exception which is ip_mc_sf_allow(),
so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group
remove an entry while ip_mc_sf_allow is running causing a crash.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces three macros to work with uc list from net drivers.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After netdev_ops compat code HAVE_* macros aren't needed, in fact
they _will_ result in compile breakage for out of tree drivers.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ieee80211_hdrlen() should account account new HT Control field in 802.11
data frame header introduced by IEEE 802.11n standard.
According to 802.11n-2009 HT Control field is present in data frames
when both of following are met:
1. It is QoS data frame.
2. Order bit is set in Frame Control field.
The change might be totally compatible with legacy non-11n aware frames,
because 802.11-2007 standard states that "all QoS STAs set this subfield
to 0".
Signed-off-by: Andriy V. Tkachuk <andrit@ukr.net>
Acked-by : Benoit Papillault <benoit.papillault@free.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: x86: Add support for the ANY bit
perf: Change the is_software_event() definition
perf: Honour event state for aux stream data
perf: Fix perf_event_do_pending() fallback callsite
perf kmem: Print usage help for unknown commands
perf kmem: Increase "Hit" column length
hw-breakpoints, perf: Fix broken mmiotrace due to dr6 by reference change
perf timechart: Use tid not pid for COMM change
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Reassign prev and switch_count when reacquire_kernel_lock() fail
sched: Fix vmark regression on big machines
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
USB: isp1362: fix build failure on ARM systems via irq_flags cleanup
USB: isp1362: better 64bit printf warning fixes
USB: fix usbstorage for 2770:915d delivers no FAT
USB: Fix level of isp1760 Reloading ptd error message
USB: FHCI: avoid NULL pointer dereference
USB: Fix duplicate sysfs problem after device reset.
USB: add speed values for USB 3.0 and wireless controllers
USB: add missing delay during remote wakeup
USB: EHCI & UHCI: fix race between root-hub suspend and port resume
USB: EHCI: fix handling of unusual interrupt intervals
USB: Don't use GFP_KERNEL while we cannot reset a storage device
USB: fix bitmask merge error
usb: serial: fix memory leak in generic driver
USB: serial: fix USB serial fix kfifo_len locking
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
fs/bio.c: fix shadows sparse warning
drbd: The kernel code is now equivalent to out of tree release 8.3.7
drbd: Allow online resizing of DRBD devices while peer not reachable (needs to be explicitly forced)
drbd: Don't go into StandAlone mode when authentification failes because of network error
drivers/block/drbd/drbd_receiver.c: correct NULL test
cfq-iosched: Respect ioprio_class when preempting
genhd: overlapping variable definition
block: removed unused as_io_context
DM: Fix device mapper topology stacking
block: bdev_stack_limits wrapper
block: Fix discard alignment calculation and printing
block: Correct handling of bottom device misaligment
drbd: check on CONFIG_LBDAF, not LBD
drivers/block/drbd: Correct NULL test
drbd: Silenced an assert that could triggered after changing write ordering method
drbd: Kconfig fix
drbd: Fix for a race between IO and a detach operation [Bugz 262]
drbd: Use drbd_crypto_is_hash() instead of an open coded check
The is_software_event() definition always confuses me because its an
exclusive expression, make it an inclusive one.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't
enabled, leading to many cache misses on large machines as we traverse
looking for an idle shared cache to wake to. Change the enabler of
select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the
sibling domain level.
Reported-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1262612696.15495.15.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Borislav Petkov reports issues with duplicate sysfs endpoint files after a
resume from a hibernate. It turns out that the code to support alternate
settings under xHCI has issues when a device with a non-default alternate
setting is reset during the hibernate:
[ 427.681810] Restarting tasks ...
[ 427.681995] hub 1-0:1.0: state 7 ports 6 chg 0004 evt 0000
[ 427.682019] usb usb3: usb resume
[ 427.682030] ohci_hcd 0000:00:12.0: wakeup root hub
[ 427.682191] hub 1-0:1.0: port 2, status 0501, change 0000, 480 Mb/s
[ 427.682205] usb 1-2: usb wakeup-resume
[ 427.682226] usb 1-2: finish reset-resume
[ 427.682886] done.
[ 427.734658] ehci_hcd 0000:00:12.2: port 2 high speed
[ 427.734663] ehci_hcd 0000:00:12.2: GetStatus port 2 status 001005 POWER sig=se0 PE CONNECT
[ 427.746682] hub 3-0:1.0: hub_reset_resume
[ 427.746693] hub 3-0:1.0: trying to enable port power on non-switchable hub
[ 427.786715] usb 1-2: reset high speed USB device using ehci_hcd and address 2
[ 427.839653] ehci_hcd 0000:00:12.2: port 2 high speed
[ 427.839666] ehci_hcd 0000:00:12.2: GetStatus port 2 status 001005 POWER sig=se0 PE CONNECT
[ 427.847717] ohci_hcd 0000:00:12.0: GetStatus roothub.portstatus [1] = 0x00010100 CSC PPS
[ 427.915497] hub 1-2:1.0: remove_intf_ep_devs: if: ffff88022f9e8800 ->ep_devs_created: 1
[ 427.915774] hub 1-2:1.0: remove_intf_ep_devs: bNumEndpoints: 1
[ 427.915934] hub 1-2:1.0: if: ffff88022f9e8800: endpoint devs removed.
[ 427.916158] hub 1-2:1.0: create_intf_ep_devs: if: ffff88022f9e8800 ->ep_devs_created: 0, ->unregistering: 0
[ 427.916434] hub 1-2:1.0: create_intf_ep_devs: bNumEndpoints: 1
[ 427.916609] ep_81: create, parent hub
[ 427.916632] ------------[ cut here ]------------
[ 427.916644] WARNING: at fs/sysfs/dir.c:477 sysfs_add_one+0x82/0x96()
[ 427.916649] Hardware name: System Product Name
[ 427.916653] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:12.2/usb1/1-2/1-2:1.0/ep_81'
[ 427.916658] Modules linked in: binfmt_misc kvm_amd kvm powernow_k8 cpufreq_ondemand cpufreq_powersave cpufreq_userspace freq_table cpufreq_conservative ipv6 vfat fat
+8250_pnp 8250 pcspkr ohci_hcd serial_core k10temp edac_core
[ 427.916694] Pid: 278, comm: khubd Not tainted 2.6.33-rc2-00187-g08d869a-dirty #13
[ 427.916699] Call Trace:
The problem is caused by a mismatch between the USB core's view of the
device state and the USB device and xHCI host's view of the device state.
After the device reset and re-configuration, the device and the xHCI host
think they are using alternate setting 0 of all interfaces. However, the
USB core keeps track of the old state, which may include non-zero
alternate settings. It uses intf->cur_altsetting to keep the endpoint
sysfs files for the old state across the reset.
The bandwidth allocation functions need to know what the xHCI host thinks
the current alternate settings are, so original patch set
intf->cur_altsetting to the alternate setting 0. This caused duplicate
endpoint files to be created.
The solution is to not set intf->cur_altsetting before calling
usb_set_interface() in usb_reset_and_verify_device(). Instead, we add a
new flag to struct usb_interface to tell usb_hcd_alloc_bandwidth() to use
alternate setting 0 as the currently installed alternate setting.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Borislav Petkov <petkovbb@googlemail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 541cd3ee00 ("phylib: Fix deadlock
on resume") caused TI DaVinci EMAC ethernet driver to oops upon resume:
PM: resume of devices complete after 237.098 msecs
Restarting tasks ... done.
kernel BUG at kernel/workqueue.c:354!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
[...]
Backtrace:
[<c002c598>] (__bug+0x0/0x2c) from [<c0052a54>] (queue_delayed_work_on+0x74/0xf8)
[<c00529e0>] (queue_delayed_work_on+0x0/0xf8) from [<c0052b30>] (queue_delayed_work+0x2c/0x30)
The oops pops up because TI DaVinci EMAC driver detaches PHY on
suspend and attaches it back on resume. Attaching makes phylib call
phy_start_machine() that initializes a workqueue. On the other hand,
PHY's resume routine will call phy_start_machine() again, and that
will cause the oops since we just destroyed the already scheduled
workqueue.
This patch fixes the issue by moving workqueue initialization to
phy_device_create().
p.s. We don't see this oops with ucc_geth and gianfar drivers because
they perform a fine-grained suspend, i.e. they just stop the PHYs
without detaching.
Reported-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Tested-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch abstracts out the CNF area code from tmio_mmc which
is not present in all hardware that can use this driver. This
is required so that we can support non-toshiba based hardware.
ASIC3 support by Philipp Zabel
Signed-off-by: Ian Molton <ian@mnementh.co.uk>
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The constants used to specify ISINK ramp times for WM835x had the
wrong shifts so that the on times applied to the off ramp and vice
versa. The masks for the bitfields are correct.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: stable@kernel.org
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>