commit 7b8c57cad53a07a2e058c07406f5720990ccee0c Author: Greg Kroah-Hartman Date: Sat Dec 10 19:09:59 2016 +0100 Linux 4.8.14 commit 1670d1584701d691db9a8bbabc5c54e0428eb56c Author: Tobias Brunner Date: Tue Nov 29 17:05:25 2016 +0100 esp6: Fix integrity verification when ESN are used commit a55e23864d381c5a4ef110df94b00b2fe121a70d upstream. When handling inbound packets, the two halves of the sequence number stored on the skb are already in network order. Fixes: 000ae7b2690e ("esp6: Switch to new AEAD interface") Signed-off-by: Tobias Brunner Acked-by: Herbert Xu Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit b3e9d498293926d96446d04696e044a900b1fa1f Author: Tobias Brunner Date: Tue Nov 29 17:05:20 2016 +0100 esp4: Fix integrity verification when ESN are used commit 7c7fedd51c02f4418e8b2eed64bdab601f882aa4 upstream. When handling inbound packets, the two halves of the sequence number stored on the skb are already in network order. Fixes: 7021b2e1cddd ("esp4: Switch to new AEAD interface") Signed-off-by: Tobias Brunner Acked-by: Herbert Xu Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit be5339492b2919a910bf06630a861d5da8a478bd Author: Miroslav Urbanek Date: Mon Nov 21 15:48:21 2016 +0100 flowcache: Increase threshold for refusing new allocations commit 6b226487815574193c1da864f2eac274781a2b0c upstream. The threshold for OOM protection is too small for systems with large number of CPUs. Applications report ENOBUFs on connect() every 10 minutes. The problem is that the variable net->xfrm.flow_cache_gc_count is a global counter while the variable fc->high_watermark is a per-CPU constant. Take the number of CPUs into account as well. Fixes: 6ad3122a08e3 ("flowcache: Avoid OOM condition under preasure") Reported-by: Lukáš Koldrt Tested-by: Jan Hejl Signed-off-by: Miroslav Urbanek Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman commit 3a116fa8c95d4c16f864475c4fdd395fd2a6cce4 Author: Eli Cooper Date: Thu Dec 1 10:05:12 2016 +0800 Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()" commit 80d1106aeaf689ab5fdf33020c5fecd269b31c88 upstream. This reverts commit ae148b085876fa771d9ef2c05f85d4b4bf09ce0d ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"). skb->protocol is now set in __ip_local_out() and __ip6_local_out() before dst_output() is called. It is no longer necessary to do it for each tunnel. Signed-off-by: Eli Cooper Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 686182870c6a7dce62c432aeba4aaedd113322e7 Author: Eli Cooper Date: Thu Dec 1 10:05:10 2016 +0800 ipv4: Set skb->protocol properly for local output commit f4180439109aa720774baafdd798b3234ab1a0d2 upstream. When xfrm is applied to TSO/GSO packets, it follows this path: xfrm_output() -> xfrm_output_gso() -> skb_gso_segment() where skb_gso_segment() relies on skb->protocol to function properly. This patch sets skb->protocol to ETH_P_IP before dst_output() is called, fixing a bug where GSO packets sent through a sit tunnel are dropped when xfrm is involved. Signed-off-by: Eli Cooper Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e67bd82fb79d04426f8f5f8bc52c31e9fbb8b69d Author: Eli Cooper Date: Thu Dec 1 10:05:11 2016 +0800 ipv6: Set skb->protocol properly for local output commit b4e479a96fc398ccf83bb1cffb4ffef8631beaf1 upstream. When xfrm is applied to TSO/GSO packets, it follows this path: xfrm_output() -> xfrm_output_gso() -> skb_gso_segment() where skb_gso_segment() relies on skb->protocol to function properly. This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called, fixing a bug where GSO packets sent through an ipip6 tunnel are dropped when xfrm is involved. Signed-off-by: Eli Cooper Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 22d94c3266601ac14ff70c76e14ea05d2263b914 Author: Linus Torvalds Date: Tue Dec 6 16:18:14 2016 -0800 Don't feed anything but regular iovec's to blk_rq_map_user_iov commit a0ac402cfcdc904f9772e1762b3fda112dcc56a0 upstream. In theory we could map other things, but there's a reason that function is called "user_iov". Using anything else (like splice can do) just confuses it. Reported-and-tested-by: Johannes Thumshirn Cc: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit e4a6c61c6b71165f78da54ceb05e96f57a716034 Author: Al Viro Date: Mon Oct 10 13:57:37 2016 -0400 constify iov_iter_count() and iter_is_iovec() commit b57332b4105abf1d518d93886e547ee2f98cd414 upstream. [stable note, need this to prevent build warning in commit a0ac402cfcdc904f9772e1762b3fda112dcc56a0] Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman commit 907bc3181c9a3dcf9207f7fcdb8b413b07d8625b Author: Andreas Larsson Date: Wed Nov 9 10:43:05 2016 +0100 sparc32: Fix inverted invalid_frame_pointer checks on sigreturns [ Upstream commit 07b5ab3f71d318e52c18cc3b73c1d44c908aacfa ] Signed-off-by: Andreas Larsson Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 360e257f8cf3cba8af649db060b5c4e2883c7616 Author: Thomas Tai Date: Fri Nov 11 16:41:00 2016 -0800 sparc64: fix compile warning section mismatch in find_node() [ Upstream commit 87a349f9cc0908bc0cfac0c9ece3179f650ae95a ] A compile warning is introduced by a commit to fix the find_node(). This patch fix the compile warning by moving find_node() into __init section. Because find_node() is only used by memblock_nid_range() which is only used by a __init add_node_ranges(). find_node() and memblock_nid_range() should also be inside __init section. Signed-off-by: Thomas Tai Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2f02dcb673a4e25bf27c114ad2753bc43dd7416a Author: Thomas Tai Date: Thu Nov 3 09:19:01 2016 -0700 sparc64: Fix find_node warning if numa node cannot be found [ Upstream commit 74a5ed5c4f692df2ff0a2313ea71e81243525519 ] When booting up LDOM, find_node() warns that a physical address doesn't match a NUMA node. WARNING: CPU: 0 PID: 0 at arch/sparc/mm/init_64.c:835 find_node+0xf4/0x120 find_node: A physical address doesn't match a NUMA node rule. Some physical memory will be owned by node 0.Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc3 #4 Call Trace: [0000000000468ba0] __warn+0xc0/0xe0 [0000000000468c74] warn_slowpath_fmt+0x34/0x60 [00000000004592f4] find_node+0xf4/0x120 [0000000000dd0774] add_node_ranges+0x38/0xe4 [0000000000dd0b1c] numa_parse_mdesc+0x268/0x2e4 [0000000000dd0e9c] bootmem_init+0xb8/0x160 [0000000000dd174c] paging_init+0x808/0x8fc [0000000000dcb0d0] setup_arch+0x2c8/0x2f0 [0000000000dc68a0] start_kernel+0x48/0x424 [0000000000dcb374] start_early_boot+0x27c/0x28c [0000000000a32c08] tlb_fixup_done+0x4c/0x64 [0000000000027f08] 0x27f08 It is because linux use an internal structure node_masks[] to keep the best memory latency node only. However, LDOM mdesc can contain single latency-group with multiple memory latency nodes. If the address doesn't match the best latency node within node_masks[], it should check for an alternative via mdesc. The warning message should only be printed if the address doesn't match any node_masks[] nor within mdesc. To minimize the impact of searching mdesc every time, the last matched mask and index is stored in a variable. Signed-off-by: Thomas Tai Reviewed-by: Chris Hyser Reviewed-by: Liam Merwick Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ad02ec7d37a26ec051c94a7a5f696bb05a08c8f2 Author: Alexander Duyck Date: Thu Dec 1 07:27:57 2016 -0500 ipv4: Drop suffix update from resize code [ Upstream commit a52ca62c4a6771028da9c1de934cdbcd93d54bb4 ] It has been reported that update_suffix can be expensive when it is called on a large node in which most of the suffix lengths are the same. The time required to add 200K entries had increased from around 3 seconds to almost 49 seconds. In order to address this we need to move the code for updating the suffix out of resize and instead just have it handled in the cases where we are pushing a node that increases the suffix length, or will decrease the suffix length. Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length") Reported-by: Robert Shearman Signed-off-by: Alexander Duyck Reviewed-by: Robert Shearman Tested-by: Robert Shearman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0b1c601d367f7316470bcad95fcb8185ac53111a Author: Alexander Duyck Date: Thu Dec 1 07:27:52 2016 -0500 ipv4: Drop leaf from suffix pull/push functions [ Upstream commit 1a239173cccff726b60ac6a9c79ae4a1e26cfa49 ] It wasn't necessary to pass a leaf in when doing the suffix updates so just drop it. Instead just pass the suffix and work with that. Since we dropped the leaf there is no need to include that in the name so the names are updated to node_push_suffix and node_pull_suffix. Finally I noticed that the logic for pulling the suffix length back actually had some issues. Specifically it would stop prematurely if there was a longer suffix, but it was not as long as the original suffix. I updated the code to address that in node_pull_suffix. Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length") Suggested-by: Robert Shearman Signed-off-by: Alexander Duyck Reviewed-by: Robert Shearman Tested-by: Robert Shearman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit cd8a6c0e95bf59003c131f442e7eafded396627e Author: Alexander Duyck Date: Tue Nov 15 05:46:12 2016 -0500 ipv4: Fix memory leak in exception case for splitting tries [ Upstream commit 3114cdfe66c156345b0ae34e2990472f277e0c1b ] Fix a small memory leak that can occur where we leak a fib_alias in the event of us not being able to insert it into the local table. Fixes: 0ddcf43d5d4a0 ("ipv4: FIB Local/MAIN table collapse") Reported-by: Eric Dumazet Signed-off-by: Alexander Duyck Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a8780378b39e43452e750fb4d7715b06abf9f032 Author: Alexander Duyck Date: Tue Nov 15 05:46:06 2016 -0500 ipv4: Restore fib_trie_flush_external function and fix call ordering [ Upstream commit 3b7093346b326e5d3590c7d49f6aefe6fa5b2c9a, the FIB offload removal didn't occur in 4.8 so that part of this patch isn't here. However we still need to fib_unmerge() bits. ] The patch that removed the FIB offload infrastructure was a bit too aggressive and also removed code needed to clean up us splitting the table if additional rules were added. Specifically the function fib_trie_flush_external was called at the end of a new rule being added to flush the foreign trie entries from the main trie. I updated the code so that we only call fib_trie_flush_external on the main table so that we flush the entries for local from main. This way we don't call it for every rule change which is what was happening previously. Fixes: 347e3b28c1ba2 ("switchdev: remove FIB offload infrastructure") Reported-by: Eric Dumazet Cc: Jiri Pirko Signed-off-by: Alexander Duyck Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5ff5e5c06c25b91335cc32e25315dbb450b71838 Author: Kees Cook Date: Mon Dec 5 10:34:38 2016 -0800 net: ping: check minimum size on ICMP header length [ Upstream commit 0eab121ef8750a5c8637d51534d5e9143fb0633f ] Prior to commit c0371da6047a ("put iov_iter into msghdr") in v3.19, there was no check that the iovec contained enough bytes for an ICMP header, and the read loop would walk across neighboring stack contents. Since the iov_iter conversion, bad arguments are noticed, but the returned error is EFAULT. Returning EINVAL is a clearer error and also solves the problem prior to v3.19. This was found using trinity with KASAN on v3.18: BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffffffc071077da0 Read of size 8 by task trinity-c2/9623 page:ffffffbe034b9a08 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() page dumped because: kasan: bad access detected CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: G BU 3.18.0-dirty #15 Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT) Call trace: [] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90 [] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171 [< inline >] __dump_stack lib/dump_stack.c:15 [] dump_stack+0x7c/0xd0 lib/dump_stack.c:50 [< inline >] print_address_description mm/kasan/report.c:147 [< inline >] kasan_report_error mm/kasan/report.c:236 [] kasan_report+0x380/0x4b8 mm/kasan/report.c:259 [< inline >] check_memory_region mm/kasan/kasan.c:264 [] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507 [] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15 [< inline >] memcpy_from_msg include/linux/skbuff.h:2667 [] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674 [] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714 [] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749 [< inline >] __sock_sendmsg_nosec net/socket.c:624 [< inline >] __sock_sendmsg net/socket.c:632 [] sock_sendmsg+0x124/0x164 net/socket.c:643 [< inline >] SYSC_sendto net/socket.c:1797 [] SyS_sendto+0x178/0x1d8 net/socket.c:1761 CVE-2016-8399 Reported-by: Qidan He Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f818e5d86aef49c067296d29f1e277c7ee1713e8 Author: Eric Dumazet Date: Fri Dec 2 09:44:53 2016 -0800 net: avoid signed overflows for SO_{SND|RCV}BUFFORCE [ Upstream commit b98b0bc8c431e3ceb4b26b0dfc8db509518fb290 ] CAP_NET_ADMIN users should not be allowed to set negative sk_sndbuf or sk_rcvbuf values, as it can lead to various memory corruptions, crashes, OOM... Note that before commit 82981930125a ("net: cleanups in sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF and SO_RCVBUF were vulnerable. This needs to be backported to all known linux kernels. Again, many thanks to syzkaller team for discovering this gem. Signed-off-by: Eric Dumazet Reported-by: Andrey Konovalov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bfecf9017f8bc373d2982edb81c11f46536ecacf Author: Sabrina Dubroca Date: Fri Dec 2 16:49:29 2016 +0100 geneve: avoid use-after-free of skb->data [ Upstream commit 5b01014759991887b1e450c9def01e58c02ab81b ] geneve{,6}_build_skb can end up doing a pskb_expand_head(), which makes the ip_hdr(skb) reference we stashed earlier stale. Since it's only needed as an argument to ip_tunnel_ecn_encap(), move this directly in the function call. Fixes: 08399efc6319 ("geneve: ensure ECN info is handled properly in all tx/rx paths") Signed-off-by: Sabrina Dubroca Reviewed-by: John W. Linville Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4daa2c73eb05aaa93a4409823e2861beb3eb1ed9 Author: Michal Kubeček Date: Fri Dec 2 09:33:41 2016 +0100 tipc: check minimum bearer MTU [ Upstream commit 3de81b758853f0b29c61e246679d20b513c4cfec ] Qian Zhang (张谦) reported a potential socket buffer overflow in tipc_msg_build() which is also known as CVE-2016-8632: due to insufficient checks, a buffer overflow can occur if MTU is too short for even tipc headers. As anyone can set device MTU in a user/net namespace, this issue can be abused by a regular user. As agreed in the discussion on Ben Hutchings' original patch, we should check the MTU at the moment a bearer is attached rather than for each processed packet. We also need to repeat the check when bearer MTU is adjusted to new device MTU. UDP case also needs a check to avoid overflow when calculating bearer MTU. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Michal Kubecek Reported-by: Qian Zhang (张谦) Acked-by: Ying Xue Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1ff3209a21c8d88bc477259b70e3a9acf692ae4d Author: Chris Brandt Date: Thu Dec 1 13:32:14 2016 -0500 sh_eth: remove unchecked interrupts for RZ/A1 [ Upstream commit 33d446dbba4d4d6a77e1e900d434fa99e0f02c86 ] When streaming a lot of data and the RZ/A1 can't keep up, some status bits will get set that are not being checked or cleared which cause the following messages and the Ethernet driver to stop working. This patch fixes that issue. irq 21: nobody cared (try booting with the "irqpoll" option) handlers: [] sh_eth_interrupt Disabling IRQ #21 Fixes: db893473d313a4ad ("sh_eth: Add support for r7s72100") Signed-off-by: Chris Brandt Acked-by: Sergei Shtylyov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bbf913d774b71a3198dd2d2c83b4955b55ed72ba Author: Florian Fainelli Date: Thu Dec 1 09:45:45 2016 -0800 net: bcmgenet: Utilize correct struct device for all DMA operations [ Upstream commit 8c4799ac799665065f9bf1364fd71bf4f7dc6a4a ] __bcmgenet_tx_reclaim() and bcmgenet_free_rx_buffers() are not using the same struct device during unmap that was used for the map operation, which makes DMA-API debugging warn about it. Fix this by always using &priv->pdev->dev throughout the driver, using an identical device reference for all map/unmap calls. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit accb7c99fc0f496ec17ce00d291db8ce1d2e9a13 Author: Kristian Evensen Date: Thu Dec 1 14:23:17 2016 +0100 cdc_ether: Fix handling connection notification [ Upstream commit d5c83d0d1d83b3798c71e0c8b7c3624d39c91d88 ] Commit bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling") introduced a work-around in usbnet_cdc_status() for devices that exported cdc carrier on twice on connect. Before the commit, this behavior caused the link state to be incorrect. It was assumed that all CDC Ethernet devices would either export this behavior, or send one off and then one on notification (which seems to be the default behavior). Unfortunately, it turns out multiple devices sends a connection notification multiple times per second (via an interrupt), even when connection state does not change. This has been observed with several different USB LAN dongles (at least), for example 13b1:0041 (Linksys). After bfe9b9d2df66, the link state has been set as down and then up for each notification. This has caused a flood of Netlink NEWLINK messages and syslog to be flooded with messages similar to: cdc_ether 2-1:2.0 eth1: kevent 12 may have been dropped This commit fixes the behavior by reverting usbnet_cdc_status() to how it was before bfe9b9d2df66. The work-around has been moved to a separate status-function which is only called when a known, affect device is detected. v1->v2: * Do not open-code netif_carrier_ok() (thanks Henning Schild). * Call netif_carrier_off() instead of usb_link_change(). This prevents calling schedule_work() twice without giving the work queue a chance to be processed (thanks Bjørn Mork). Fixes: bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling") Reported-by: Henning Schild Signed-off-by: Kristian Evensen Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 34457543e20375f09af9db214c387079d94f2a5f Author: Artem Savkov Date: Thu Dec 1 14:06:04 2016 +0100 ip6_offload: check segs for NULL in ipv6_gso_segment. [ Upstream commit 6b6ebb6b01c873d0cfe3449e8a1219ee6e5fc022 ] segs needs to be checked for being NULL in ipv6_gso_segment() before calling skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference: [ 97.811262] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc [ 97.819112] IP: [] ipv6_gso_segment+0x119/0x2f0 [ 97.825214] PGD 0 [ 97.827047] [ 97.828540] Oops: 0000 [#1] SMP [ 97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod [ 97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1 [ 97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010 [ 97.941806] task: ffff880129a1c080 task.stack: ffffc90001bcc000 [ 97.947720] RIP: 0010:[] [] ipv6_gso_segment+0x119/0x2f0 [ 97.956251] RSP: 0018:ffff88012fc43a10 EFLAGS: 00010207 [ 97.961557] RAX: 0000000000000000 RBX: ffff8801292c8700 RCX: 0000000000000594 [ 97.968687] RDX: 0000000000000593 RSI: ffff880129a846c0 RDI: 0000000000240000 [ 97.975814] RBP: ffff88012fc43a68 R08: ffff880129a8404e R09: 0000000000000000 [ 97.982942] R10: 0000000000000000 R11: ffff880129a84076 R12: 00000020002949b3 [ 97.990070] R13: ffff88012a580000 R14: 0000000000000000 R15: ffff88012a580000 [ 97.997198] FS: 0000000000000000(0000) GS:ffff88012fc40000(0000) knlGS:0000000000000000 [ 98.005280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 98.011021] CR2: 00000000000000cc CR3: 0000000126c5d000 CR4: 00000000000006e0 [ 98.018149] Stack: [ 98.020157] 00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e [ 98.027584] 0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3 [ 98.035010] ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98 [ 98.042437] Call Trace: [ 98.044879] [ 98.046803] [] ? tg3_start_xmit+0x84a/0xd60 [tg3] [ 98.053156] [] skb_mac_gso_segment+0xb0/0x130 [ 98.059158] [] __skb_gso_segment+0x73/0x110 [ 98.064985] [] validate_xmit_skb+0x12d/0x2b0 [ 98.070899] [] validate_xmit_skb_list+0x42/0x70 [ 98.077073] [] sch_direct_xmit+0xd0/0x1b0 [ 98.082726] [] __dev_queue_xmit+0x486/0x690 [ 98.088554] [] ? cpumask_next_and+0x35/0x50 [ 98.094380] [] dev_queue_xmit+0x10/0x20 [ 98.099863] [] br_dev_queue_push_xmit+0xa7/0x170 [bridge] [ 98.106907] [] br_forward_finish+0x41/0xc0 [bridge] [ 98.113430] [] ? nf_iterate+0x52/0x60 [ 98.118735] [] ? nf_hook_slow+0x6b/0xc0 [ 98.124216] [] __br_forward+0x14c/0x1e0 [bridge] [ 98.130480] [] ? br_dev_queue_push_xmit+0x170/0x170 [bridge] [ 98.137785] [] br_forward+0x9d/0xb0 [bridge] [ 98.143701] [] br_handle_frame_finish+0x267/0x560 [bridge] [ 98.150834] [] br_handle_frame+0x174/0x2f0 [bridge] [ 98.157355] [] ? sched_clock+0x9/0x10 [ 98.162662] [] ? sched_clock_cpu+0x72/0xa0 [ 98.168403] [] __netif_receive_skb_core+0x1e5/0xa20 [ 98.174926] [] ? timerqueue_add+0x59/0xb0 [ 98.180580] [] __netif_receive_skb+0x18/0x60 [ 98.186494] [] process_backlog+0x95/0x140 [ 98.192145] [] net_rx_action+0x16d/0x380 [ 98.197713] [] __do_softirq+0xd1/0x283 [ 98.203106] [] do_softirq_own_stack+0x1c/0x30 [ 98.209107] [ 98.211029] [] do_softirq+0x50/0x60 [ 98.216166] [] netif_rx_ni+0x33/0x80 [ 98.221386] [] tun_get_user+0x487/0x7f0 [tun] [ 98.227388] [] tun_sendmsg+0x4b/0x60 [tun] [ 98.233129] [] handle_tx+0x282/0x540 [vhost_net] [ 98.239392] [] handle_tx_kick+0x15/0x20 [vhost_net] [ 98.245916] [] vhost_worker+0x9e/0xf0 [vhost] [ 98.251919] [] ? vhost_umem_alloc+0x40/0x40 [vhost] [ 98.258440] [] ? do_syscall_64+0x67/0x180 [ 98.264094] [] kthread+0xd9/0xf0 [ 98.268965] [] ? kthread_park+0x60/0x60 [ 98.274444] [] ret_from_fork+0x25/0x30 [ 98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 <41> 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66 [ 98.299425] RIP [] ipv6_gso_segment+0x119/0x2f0 [ 98.305612] RSP [ 98.309094] CR2: 00000000000000cc [ 98.312406] ---[ end trace 726a2c7a2d2d78d0 ]--- Signed-off-by: Artem Savkov Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit cef222d40f2e22e5938bac009bb5d8b526cc44ef Author: Philip Pettersson Date: Wed Nov 30 14:55:36 2016 -0800 packet: fix race condition in packet_set_ring [ Upstream commit 84ac7260236a49c79eede91617700174c2c19b0c ] When packet_set_ring creates a ring buffer it will initialize a struct timer_list if the packet version is TPACKET_V3. This value can then be raced by a different thread calling setsockopt to set the version to TPACKET_V1 before packet_set_ring has finished. This leads to a use-after-free on a function pointer in the struct timer_list when the socket is closed as the previously initialized timer will not be deleted. The bug is fixed by taking lock_sock(sk) in packet_setsockopt when changing the packet version while also taking the lock at the start of packet_set_ring. Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") Signed-off-by: Philip Pettersson Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 17941a9d6409e03b41d5757b36cb8a273ce082d8 Author: Arnaldo Carvalho de Melo Date: Mon Nov 28 12:36:58 2016 -0300 GSO: Reload iph after pskb_may_pull [ Upstream commit a510887824171ad260cc4a2603396c6247fdd091 ] As it may get stale and lead to use after free. Acked-by: Eric Dumazet Cc: Alexander Duyck Cc: Andrey Konovalov Fixes: cbc53e08a793 ("GSO: Add GSO type for fixed IPv4 ID") Signed-off-by: Arnaldo Carvalho de Melo Acked-by: Alexander Duyck Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ff0d7874078d27f4b52275de997882e070ad96f4 Author: Eric Dumazet Date: Mon Nov 28 06:26:49 2016 -0800 net/dccp: fix use-after-free in dccp_invalid_packet [ Upstream commit 648f0c28df282636c0c8a7a19ca3ce5fc80a39c3 ] pskb_may_pull() can reallocate skb->head, we need to reload dh pointer in dccp_invalid_packet() or risk use after free. Bug found by Andrey Konovalov using syzkaller. Signed-off-by: Eric Dumazet Reported-by: Andrey Konovalov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 023cd33ece3751b5c265331f5ceb45e69dc62189 Author: Cyrille Pitchen Date: Mon Nov 28 14:40:55 2016 +0100 net: macb: fix the RX queue reset in macb_rx() [ Upstream commit a0b44eea372b449ef9744fb1d90491cc063289b8 ] On macb only (not gem), when a RX queue corruption was detected from macb_rx(), the RX queue was reset: during this process the RX ring buffer descriptor was initialized by macb_init_rx_ring() but we forgot to also set bp->rx_tail to 0. Indeed, when processing the received frames, bp->rx_tail provides the macb driver with the index in the RX ring buffer of the next buffer to process. So when the whole ring buffer is reset we must also reset bp->rx_tail so the driver is synchronized again with the hardware. Since macb_init_rx_ring() is called from many locations, currently from macb_rx() and macb_init_rings(), we'd rather add the "bp->rx_tail = 0;" line inside macb_init_rx_ring() than add the very same line after each call of this function. Without this fix, the rx queue is not reset properly to recover from queue corruption and connection drop may occur. Signed-off-by: Cyrille Pitchen Fixes: 9ba723b081a2 ("net: macb: remove BUG_ON() and reset the queue to handle RX errors") Acked-by: Nicolas Ferre Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 25d9b4bb64ea964769087fc5ae09aee9c838d759 Author: Herbert Xu Date: Mon Dec 5 15:28:21 2016 +0800 netlink: Do not schedule work from sk_destruct [ Upstream commit ed5d7788a934a4b6d6d025e948ed4da496b4f12e ] It is wrong to schedule a work from sk_destruct using the socket as the memory reserve because the socket will be freed immediately after the return from sk_destruct. Instead we should do the deferral prior to sk_free. This patch does just that. Fixes: 707693c8a498 ("netlink: Call cb->done from a worker thread") Signed-off-by: Herbert Xu Tested-by: Andrey Konovalov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f5dad3473d83e7b45eee4e9cc017a65b904cc38c Author: Herbert Xu Date: Mon Nov 28 19:22:12 2016 +0800 netlink: Call cb->done from a worker thread [ Upstream commit 707693c8a498697aa8db240b93eb76ec62e30892 ] The cb->done interface expects to be called in process context. This was broken by the netlink RCU conversion. This patch fixes it by adding a worker struct to make the cb->done call where necessary. Fixes: 21e4902aea80 ("netlink: Lockless lookup with RCU grace...") Reported-by: Subash Abhinov Kasiviswanathan Signed-off-by: Herbert Xu Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 360d6a23e79d68c570a9346c1ad07854d1abb434 Author: Amir Vadai Date: Mon Nov 28 12:56:40 2016 +0200 net/sched: pedit: make sure that offset is valid [ Upstream commit 95c2027bfeda21a28eb245121e6a249f38d0788e ] Add a validation function to make sure offset is valid: 1. Not below skb head (could happen when offset is negative). 2. Validate both 'offset' and 'at'. Signed-off-by: Amir Vadai Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit aa239369bdfa73290be0f743a22c637cfee1967b Author: Nikita Yushchenko Date: Mon Nov 28 09:48:48 2016 +0300 net: dsa: fix unbalanced dsa_switch_tree reference counting [ Upstream commit 7a99cd6e213685b78118382e6a8fed506c82ccb2 ] _dsa_register_switch() gets a dsa_switch_tree object either via dsa_get_dst() or via dsa_add_dst(). Former path does not increase kref in returned object (resulting into caller not owning a reference), while later path does create a new object (resulting into caller owning a reference). The rest of _dsa_register_switch() assumes that it owns a reference, and calls dsa_put_dst(). This causes a memory breakage if first switch in the tree initialized successfully, but second failed to initialize. In particular, freed dsa_swith_tree object is left referenced by switch that was initialized, and later access to sysfs attributes of that switch cause OOPS. To fix, need to add kref_get() call to dsa_get_dst(). Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation") Signed-off-by: Nikita Yushchenko Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9a747927394998cab12a8462dbd735e8b1f9670d Author: Daniel Borkmann Date: Sun Nov 27 01:18:01 2016 +0100 net, sched: respect rcu grace period on cls destruction [ Upstream commit d936377414fadbafb4d17148d222fe45ca5442d4 ] Roi reported a crash in flower where tp->root was NULL in ->classify() callbacks. Reason is that in ->destroy() tp->root is set to NULL via RCU_INIT_POINTER(). It's problematic for some of the classifiers, because this doesn't respect RCU grace period for them, and as a result, still outstanding readers from tc_classify() will try to blindly dereference a NULL tp->root. The tp->root object is strictly private to the classifier implementation and holds internal data the core such as tc_ctl_tfilter() doesn't know about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root is only checked for NULL in ->get() callback, but nowhere else. This is misleading and seemed to be copied from old classifier code that was not cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic: fix NULL pointer dereference") moved tp->root initialization into ->init() routine, where before it was part of ->change(), so ->get() had to deal with tp->root being NULL back then, so that was indeed a valid case, after d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg() in packet classifiers"); but the NULLifying was reintroduced with the RCUification, but it's not correct for every classifier implementation. In the cases that are fixed here with one exception of cls_cgroup, tp->root object is allocated and initialized inside ->init() callback, which is always performed at a point in time after we allocate a new tp, which means tp and thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()). Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy() handler, same for the tp which is kfree_rcu()'ed right when we return from ->destroy() in tcf_destroy(). This means, the head object's lifetime for such classifiers is always tied to the tp lifetime. The RCU callback invocation for the two kfree_rcu() could be out of order, but that's fine since both are independent. Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here means that 1) we don't need a useless NULL check in fast-path and, 2) that outstanding readers of that tp in tc_classify() can still execute under respect with RCU grace period as it is actually expected. Things that haven't been touched here: cls_fw and cls_route. They each handle tp->root being NULL in ->classify() path for historic reasons, so their ->destroy() implementation can stay as is. If someone actually cares, they could get cleaned up at some point to avoid the test in fast path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a !head should anyone actually be using/testing it, so it at least aligns with cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable destruction (to a sleepable context) after RCU grace period as concurrent readers might still access it. (Note that in this case we need to hold module reference to keep work callback address intact, since we only wait on module unload for all call_rcu()s to finish.) This fixes one race to bring RCU grace period guarantees back. Next step as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone") to get the order of unlinking the tp in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving RCU_INIT_POINTER() before tcf_destroy() and let the notification for removal be done through the prior ->delete() callback. Both are independant issues. Once we have that right, we can then clean tp->root up for a number of classifiers by not making them RCU pointers, which requires a new callback (->uninit) that is triggered from tp's RCU callback, where we just kfree() tp->root from there. Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf") Fixes: 9888faefe132 ("net: sched: cls_basic use RCU") Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU") Fixes: 77b9900ef53a ("tc: introduce Flower classifier") Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier") Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU") Reported-by: Roi Dayan Signed-off-by: Daniel Borkmann Cc: Cong Wang Cc: John Fastabend Cc: Roi Dayan Cc: Jiri Pirko Acked-by: John Fastabend Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a9437ebc69f5962670153ed653c6c9499aee643d Author: Florian Fainelli Date: Tue Nov 22 11:40:58 2016 -0800 net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change [ Upstream commit 76da8706d90d8641eeb9b8e579942ed80b6c0880 ] In case the link change and EEE is enabled or disabled, always try to re-negotiate this with the link partner. Fixes: 450b05c15f9c ("net: dsa: bcm_sf2: add support for controlling EEE") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ddf053431efe305992af23ae733927d0e9e6b461 Author: Eric Dumazet Date: Tue Nov 22 09:06:45 2016 -0800 udplite: call proper backlog handlers [ Upstream commit 30c7be26fd3587abcb69587f781098e3ca2d565b ] In commits 93821778def10 ("udp: Fix rcv socket locking") and f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite was forgotten. This leads to crashes if UDPlite header is pulled twice, which happens starting from commit e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Bug found by syzkaller team, thanks a lot guys ! Note that backlog use in UDP/UDPlite is scheduled to be removed starting from linux-4.10, so this patch is only needed up to linux-4.9 Fixes: 93821778def1 ("udp: Fix rcv socket locking") Fixes: f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb") Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Signed-off-by: Eric Dumazet Reported-by: Andrey Konovalov Cc: Benjamin LaHaise Cc: Herbert Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7b0aa75be3abfdef8ed39377b4f7f925d8c04ab5 Author: Paolo Abeni Date: Tue Nov 22 16:57:40 2016 +0100 ipv6: bump genid when the IFA_F_TENTATIVE flag is clear [ Upstream commit 764d3be6e415b40056834bfd29b994dc3f837606 ] When an ipv6 address has the tentative flag set, it can't be used as source for egress traffic, while the associated route, if any, can be looked up and even stored into some dst_cache. In the latter scenario, the source ipv6 address selected and stored in the cache is most probably wrong (e.g. with link-local scope) and the entity using the dst_cache will experience lack of ipv6 connectivity until said cache is cleared or invalidated. Overall this may cause lack of connectivity over most IPv6 tunnels (comprising geneve and vxlan), if the first egress packet reaches the tunnel before the DaD is completed for the used ipv6 address. This patch bumps a new genid after that the IFA_F_TENTATIVE flag is cleared, so that dst_cache will be invalidated on next lookup and ipv6 connectivity restored. Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device") Fixes: 468dfffcd762 ("geneve: add dst caching support") Acked-by: Hannes Frederic Sowa Signed-off-by: Paolo Abeni Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 58c8cc33de6caa707cec2008782f918432668962 Author: Zhang Shengju Date: Sat Nov 19 23:28:32 2016 +0800 rtnl: fix the loop index update error in rtnl_dump_ifinfo() [ Upstream commit 3f0ae05d6fea0ed5b19efdbc9c9f8e02685a3af3 ] If the link is filtered out, loop index should also be updated. If not, loop index will not be correct. Fixes: dc599f76c22b0 ("net: Add support for filtering link dump by master device and kind") Signed-off-by: Zhang Shengju Acked-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 84df56749f48aed274bbfd2db6b6fb9dd540ff6b Author: Guillaume Nault Date: Fri Nov 18 22:13:00 2016 +0100 l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind() [ Upstream commit 32c231164b762dddefa13af5a0101032c70b50ef ] Lock socket before checking the SOCK_ZAPPED flag in l2tp_ip6_bind(). Without lock, a concurrent call could modify the socket flags between the sock_flag(sk, SOCK_ZAPPED) test and the lock_sock() call. This way, a socket could be inserted twice in l2tp_ip6_bind_table. Releasing it would then leave a stale pointer there, generating use-after-free errors when walking through the list or modifying adjacent entries. BUG: KASAN: use-after-free in l2tp_ip6_close+0x22e/0x290 at addr ffff8800081b0ed8 Write of size 8 by task syz-executor/10987 CPU: 0 PID: 10987 Comm: syz-executor Not tainted 4.8.0+ #39 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 ffff880031d97838 ffffffff829f835b ffff88001b5a1640 ffff8800081b0ec0 ffff8800081b15a0 ffff8800081b6d20 ffff880031d97860 ffffffff8174d3cc ffff880031d978f0 ffff8800081b0e80 ffff88001b5a1640 ffff880031d978e0 Call Trace: [] dump_stack+0xb3/0x118 lib/dump_stack.c:15 [] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156 [< inline >] print_address_description mm/kasan/report.c:194 [] kasan_report_error+0x1f6/0x4d0 mm/kasan/report.c:283 [< inline >] kasan_report mm/kasan/report.c:303 [] __asan_report_store8_noabort+0x3e/0x40 mm/kasan/report.c:329 [< inline >] __write_once_size ./include/linux/compiler.h:249 [< inline >] __hlist_del ./include/linux/list.h:622 [< inline >] hlist_del_init ./include/linux/list.h:637 [] l2tp_ip6_close+0x22e/0x290 net/l2tp/l2tp_ip6.c:239 [] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415 [] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422 [] sock_release+0x8d/0x1d0 net/socket.c:570 [] sock_close+0x16/0x20 net/socket.c:1017 [] __fput+0x28c/0x780 fs/file_table.c:208 [] ____fput+0x15/0x20 fs/file_table.c:244 [] task_work_run+0xf9/0x170 [] do_exit+0x85e/0x2a00 [] do_group_exit+0x108/0x330 [] get_signal+0x617/0x17a0 kernel/signal.c:2307 [] do_signal+0x7f/0x18f0 [] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156 [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 [] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259 [] entry_SYSCALL_64_fastpath+0xc4/0xc6 Object at ffff8800081b0ec0, in cache L2TP/IPv6 size: 1448 Allocated: PID = 10987 [ 1116.897025] [] save_stack_trace+0x16/0x20 [ 1116.897025] [] save_stack+0x46/0xd0 [ 1116.897025] [] kasan_kmalloc+0xad/0xe0 [ 1116.897025] [] kasan_slab_alloc+0x12/0x20 [ 1116.897025] [< inline >] slab_post_alloc_hook mm/slab.h:417 [ 1116.897025] [< inline >] slab_alloc_node mm/slub.c:2708 [ 1116.897025] [< inline >] slab_alloc mm/slub.c:2716 [ 1116.897025] [] kmem_cache_alloc+0xc8/0x2b0 mm/slub.c:2721 [ 1116.897025] [] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1326 [ 1116.897025] [] sk_alloc+0x38/0xae0 net/core/sock.c:1388 [ 1116.897025] [] inet6_create+0x2d7/0x1000 net/ipv6/af_inet6.c:182 [ 1116.897025] [] __sock_create+0x37b/0x640 net/socket.c:1153 [ 1116.897025] [< inline >] sock_create net/socket.c:1193 [ 1116.897025] [< inline >] SYSC_socket net/socket.c:1223 [ 1116.897025] [] SyS_socket+0xef/0x1b0 net/socket.c:1203 [ 1116.897025] [] entry_SYSCALL_64_fastpath+0x23/0xc6 Freed: PID = 10987 [ 1116.897025] [] save_stack_trace+0x16/0x20 [ 1116.897025] [] save_stack+0x46/0xd0 [ 1116.897025] [] kasan_slab_free+0x71/0xb0 [ 1116.897025] [< inline >] slab_free_hook mm/slub.c:1352 [ 1116.897025] [< inline >] slab_free_freelist_hook mm/slub.c:1374 [ 1116.897025] [< inline >] slab_free mm/slub.c:2951 [ 1116.897025] [] kmem_cache_free+0xc8/0x330 mm/slub.c:2973 [ 1116.897025] [< inline >] sk_prot_free net/core/sock.c:1369 [ 1116.897025] [] __sk_destruct+0x32b/0x4f0 net/core/sock.c:1444 [ 1116.897025] [] sk_destruct+0x44/0x80 net/core/sock.c:1452 [ 1116.897025] [] __sk_free+0x53/0x220 net/core/sock.c:1460 [ 1116.897025] [] sk_free+0x23/0x30 net/core/sock.c:1471 [ 1116.897025] [] sk_common_release+0x28c/0x3e0 ./include/net/sock.h:1589 [ 1116.897025] [] l2tp_ip6_close+0x1fe/0x290 net/l2tp/l2tp_ip6.c:243 [ 1116.897025] [] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415 [ 1116.897025] [] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422 [ 1116.897025] [] sock_release+0x8d/0x1d0 net/socket.c:570 [ 1116.897025] [] sock_close+0x16/0x20 net/socket.c:1017 [ 1116.897025] [] __fput+0x28c/0x780 fs/file_table.c:208 [ 1116.897025] [] ____fput+0x15/0x20 fs/file_table.c:244 [ 1116.897025] [] task_work_run+0xf9/0x170 [ 1116.897025] [] do_exit+0x85e/0x2a00 [ 1116.897025] [] do_group_exit+0x108/0x330 [ 1116.897025] [] get_signal+0x617/0x17a0 kernel/signal.c:2307 [ 1116.897025] [] do_signal+0x7f/0x18f0 [ 1116.897025] [] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156 [ 1116.897025] [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 [ 1116.897025] [] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259 [ 1116.897025] [] entry_SYSCALL_64_fastpath+0xc4/0xc6 Memory state around the buggy address: ffff8800081b0d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8800081b0e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff8800081b0e80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ^ ffff8800081b0f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8800081b0f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== The same issue exists with l2tp_ip_bind() and l2tp_ip_bind_table. Fixes: c51ce49735c1 ("l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case") Reported-by: Baozeng Ding Reported-by: Andrey Konovalov Tested-by: Baozeng Ding Signed-off-by: Guillaume Nault Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7f8b251a09859816e52a21b3f77eeb0dc579cd1a Author: Sabrina Dubroca Date: Fri Nov 18 15:50:39 2016 +0100 rtnetlink: fix FDB size computation [ Upstream commit f82ef3e10a870acc19fa04f80ef5877eaa26f41e ] Add missing NDA_VLAN attribute's size. Fixes: 1e53d5bb8878 ("net: Pass VLAN ID to rtnl_fdb_notify.") Signed-off-by: Sabrina Dubroca Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c39caa8f80c0f927963f4b8470a8b4ae2d2064ee Author: WANG Cong Date: Thu Nov 17 15:55:26 2016 -0800 af_unix: conditionally use freezable blocking calls in read [ Upstream commit 06a77b07e3b44aea2b3c0e64de420ea2cfdcbaa9 ] Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read") converts schedule_timeout() to its freezable version, it was probably correct at that time, but later, commit 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets") breaks the strong requirement for a freezable sleep, according to commit 0f9548ca1091: We shouldn't try_to_freeze if locks are held. Holding a lock can cause a deadlock if the lock is later acquired in the suspend or hibernate path (e.g. by dpm). Holding a lock can also cause a deadlock in the case of cgroup_freezer if a lock is held inside a frozen cgroup that is later acquired by a process outside that group. The pipe_lock is still held at that point. So use freezable version only for the recvmsg call path, avoid impact for Android. Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets") Reported-by: Dmitry Vyukov Cc: Tejun Heo Cc: Colin Cross Cc: Rafael J. Wysocki Cc: Hannes Frederic Sowa Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bdc5c63e4b38eba5fa4765bc24234cb6554e6430 Author: Jeremy Linton Date: Thu Nov 17 09:14:25 2016 -0600 net: sky2: Fix shutdown crash [ Upstream commit 06ba3b2133dc203e1e9bc36cee7f0839b79a9e8b ] The sky2 frequently crashes during machine shutdown with: sky2_get_stats+0x60/0x3d8 [sky2] dev_get_stats+0x68/0xd8 rtnl_fill_stats+0x54/0x140 rtnl_fill_ifinfo+0x46c/0xc68 rtmsg_ifinfo_build_skb+0x7c/0xf0 rtmsg_ifinfo.part.22+0x3c/0x70 rtmsg_ifinfo+0x50/0x5c netdev_state_change+0x4c/0x58 linkwatch_do_dev+0x50/0x88 __linkwatch_run_queue+0x104/0x1a4 linkwatch_event+0x30/0x3c process_one_work+0x140/0x3e0 worker_thread+0x60/0x44c kthread+0xdc/0xf0 ret_from_fork+0x10/0x50 This is caused by the sky2 being called after it has been shutdown. A previous thread about this can be found here: https://lkml.org/lkml/2016/4/12/410 An alternative fix is to assure that IFF_UP gets cleared by calling dev_close() during shutdown. This is similar to what the bnx2/tg3/xgene and maybe others are doing to assure that the driver isn't being called following _shutdown(). Signed-off-by: Jeremy Linton Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a75684ab7a128a88918d3bebada03b0348903952 Author: Paolo Abeni Date: Wed Nov 16 16:26:46 2016 +0100 ip6_tunnel: disable caching when the traffic class is inherited [ Upstream commit b5c2d49544e5930c96e2632a7eece3f4325a1888 ] If an ip6 tunnel is configured to inherit the traffic class from the inner header, the dst_cache must be disabled or it will foul the policy routing. The issue is apprently there since at leat Linux-2.6.12-rc2. Reported-by: Liam McBirnie Cc: Liam McBirnie Acked-by: Hannes Frederic Sowa Signed-off-by: Paolo Abeni Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1b079d5b9fc18b8af94f415f637b13be77ff04b5 Author: WANG Cong Date: Wed Nov 16 10:27:02 2016 -0800 net: check dead netns for peernet2id_alloc() [ Upstream commit cfc44a4d147ea605d66ccb917cc24467d15ff867 ] Andrei reports we still allocate netns ID from idr after we destroy it in cleanup_net(). cleanup_net(): ... idr_destroy(&net->netns_ids); ... list_for_each_entry_reverse(ops, &pernet_list, list) ops_exit_list(ops, &net_exit_list); -> rollback_registered_many() -> rtmsg_ifinfo_build_skb() -> rtnl_fill_ifinfo() -> peernet2id_alloc() After that point we should not even access net->netns_ids, we should check the death of the current netns as early as we can in peernet2id_alloc(). For net-next we can consider to avoid sending rtmsg totally, it is a good optimization for netns teardown path. Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids") Reported-by: Andrei Vagin Cc: Nicolas Dichtel Signed-off-by: Cong Wang Acked-by: Andrei Vagin Signed-off-by: Nicolas Dichtel Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 65dfc8b4547fdc4077ab563359767ea5ace020f5 Author: Florian Fainelli Date: Tue Nov 15 15:58:15 2016 -0800 net: dsa: b53: Fix VLAN usage and how we treat CPU port [ Upstream commit e47112d9d6009bf6b7438cedc0270316d6b0370d ] We currently have a fundamental problem in how we treat the CPU port and its VLAN membership. As soon as a second VLAN is configured to be untagged, the CPU automatically becomes untagged for that VLAN as well, and yet, we don't gracefully make sure that the CPU becomes tagged in the other VLANs it could be a member of. This results in only one VLAN being effectively usable from the CPU's perspective. Instead of having some pretty complex logic which tries to maintain the CPU port's default VLAN and its untagged properties, just do something very simple which consists in neither altering the CPU port's PVID settings, nor its untagged settings: - whenever a VLAN is added, the CPU is automatically a member of this VLAN group, as a tagged member - PVID settings for downstream ports do not alter the CPU port's PVID since it now is part of all VLANs in the system This means that a typical example where e.g: LAN ports are in VLAN1, and WAN port is in VLAN2, now require having two VLAN interfaces for the host to properly terminate and send traffic from/to. Fixes: Fixes: a2482d2ce349 ("net: dsa: b53: Plug in VLAN support") Reported-by: Hartmut Knaack Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f959eb5074396852a4f366de3bee4793149f4c63 Author: Eric Dumazet Date: Tue Nov 15 22:24:12 2016 -0800 virtio-net: add a missing synchronize_net() [ Upstream commit 963abe5c8a0273a1cf5913556da1b1189de0e57a ] It seems many drivers do not respect napi_hash_del() contract. When napi_hash_del() is used before netif_napi_del(), an RCU grace period is needed before freeing NAPI object. Fixes: 91815639d880 ("virtio-net: rx busy polling support") Signed-off-by: Eric Dumazet Cc: Jason Wang Cc: Michael S. Tsirkin Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8070f33be6c4f7a947db40658a116ef135b54e74 Author: Eric Dumazet Date: Mon Nov 14 16:28:42 2016 -0800 gro_cells: mark napi struct as not busy poll candidates [ Upstream commit e88a2766143a27bfe6704b4493b214de4094cf29 ] Rolf Neugebauer reported very long delays at netns dismantle. Eric W. Biederman was kind enough to look at this problem and noticed synchronize_net() occurring from netif_napi_del() that was added in linux-4.5 Busy polling makes no sense for tunnels NAPI. If busy poll is used for sessions over tunnels, the poller will need to poll the physical device queue anyway. netif_tx_napi_add() could be used here, but function name is misleading, and renaming it is not stable material, so set NAPI_STATE_NO_BUSY_POLL bit directly. This will avoid inserting gro_cells napi structures in napi_hash[] and avoid the problematic synchronize_net() (per possible cpu) that Rolf reported. Fixes: 93d05d4a320c ("net: provide generic busy polling to all NAPI drivers") Signed-off-by: Eric Dumazet Reported-by: Rolf Neugebauer Reported-by: Eric W. Biederman Acked-by: Cong Wang Tested-by: Rolf Neugebauer Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman