1. 18 Feb, 2017 22 commits
    • tcp: don't annotate mark on control socket from tcp_v6_send_response() · 0d4c19ee
      Pablo Neira authored
      commit 92e55f412cffd016cc245a74278cb4d7b89bb3bc upstream.
      
      Unlike ipv4, this control socket is shared by all cpus so we cannot use
      it as scratchpad area to annotate the mark that we pass to ip6_xmit().
      
      Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket
      family caches the flowi6 structure in the sctp_transport structure, so
      we cannot use to carry the mark unless we later on reset it back, which
      I discarded since it looks ugly to me.
      
      Fixes: bf99b4ded5f8 ("tcp: fix mark propagation with fwmark_reflect enabled")
      Suggested-by: 's avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: 's avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: fix mark propagation with fwmark_reflect enabled · 7c4c32a2
      Pau Espin Pedrol authored
      commit bf99b4ded5f8a4767dbb9d180626f06c51f9881f upstream.
      
      Otherwise, RST packets generated by the TCP stack for non-existing
      sockets always have mark 0.
      The mark from the original packet is assigned to the netns_ipv4/6
      socket used to send the response so that it can get copied into the
      response skb when the socket sends it.
      
      Fixes: e110861f ("net: add a sysctl to reflect the fwmark on replies")
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: 's avatarPau Espin Pedrol <pau.espin@tessares.net>
      Signed-off-by: 's avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • igmp, mld: Fix memory leak in igmpv3/mld_del_delrec() · 16a3fbe5
      Hangbin Liu authored
      [ Upstream commit 9c8bb163ae784be4f79ae504e78c862806087c54 ]
      
      In function igmpv3/mld_add_delrec() we allocate pmc and put it in
      idev->mc_tomb, so we should free it when we don't need it in del_delrec().
      But I removed kfree(pmc) incorrectly in latest two patches. Now fix it.
      
      Fixes: 24803f38 ("igmp: do not remove igmp souce list info when ...")
      Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when ...")
      Reported-by: 's avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: 's avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mld: do not remove mld souce list info when set link down · 53a76d63
      Hangbin Liu authored
      [ Upstream commit 1666d49e1d416fcc2cce708242a52fe3317ea8ba ]
      
      This is an IPv6 version of commit 24803f38 ("igmp: do not remove igmp
      souce list..."). In mld_del_delrec(), we will restore back all source filter
      info instead of flush them.
      
      Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
      we should not remove source list info when set link down. Remove
      igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in
      ipv6_mc_down().
      
      Also clear all source info after igmp6_group_dropped() instead of in it
      because ipv6_mc_down() will call igmp6_group_dropped().
      Signed-off-by: 's avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • l2tp: do not use udp_ioctl() · 5b1bb4cb
      Eric Dumazet authored
      [ Upstream commit 72fb96e7bdbbdd4421b0726992496531060f3636 ]
      
      udp_ioctl(), as its name suggests, is used by UDP protocols,
      but is also used by L2TP :(
      
      L2TP should use its own handler, because it really does not
      look the same.
      
      SIOCINQ for instance should not assume UDP checksum or headers.
      
      Thanks to Andrey and syzkaller team for providing the report
      and a nice reproducer.
      
      While crashes only happen on recent kernels (after commit
      7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")), this
      probably needs to be backported to older kernels.
      
      Fixes: 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")
      Fixes: 85584672 ("udp: Fix udp_poll() and ioctl()")
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Reported-by: 's avatarAndrey Konovalov <andreyknvl@google.com>
      Acked-by: 's avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: dsa: Do not destroy invalid network devices · 12758a28
      Florian Fainelli authored
      [ Upstream commit 382e1eea2d983cd2343482c6a638f497bb44a636 ]
      
      dsa_slave_create() can fail, and dsa_user_port_unapply() will properly check
      for the network device not being NULL before attempting to destroy it. We were
      not setting the slave network device as NULL if dsa_slave_create() failed, so
      we would later on be calling dsa_slave_destroy() on a now free'd and
      unitialized network device, causing crashes in dsa_slave_destroy().
      
      Fixes: 83c0afae ("net: dsa: Add new binding implementation")
      Signed-off-by: 's avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ping: fix a null pointer dereference · a700cf26
      WANG Cong authored
      [ Upstream commit 73d2c6678e6c3af7e7a42b1e78cd0211782ade32 ]
      
      Andrey reported a kernel crash:
      
        general protection fault: 0000 [#1] SMP KASAN
        Dumping ftrace buffer:
           (ftrace buffer empty)
        Modules linked in:
        CPU: 2 PID: 3880 Comm: syz-executor1 Not tainted 4.10.0-rc6+ #124
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        task: ffff880060048040 task.stack: ffff880069be8000
        RIP: 0010:ping_v4_push_pending_frames net/ipv4/ping.c:647 [inline]
        RIP: 0010:ping_v4_sendmsg+0x1acd/0x23f0 net/ipv4/ping.c:837
        RSP: 0018:ffff880069bef8b8 EFLAGS: 00010206
        RAX: dffffc0000000000 RBX: ffff880069befb90 RCX: 0000000000000000
        RDX: 0000000000000018 RSI: ffff880069befa30 RDI: 00000000000000c2
        RBP: ffff880069befbb8 R08: 0000000000000008 R09: 0000000000000000
        R10: 0000000000000002 R11: 0000000000000000 R12: ffff880069befab0
        R13: ffff88006c624a80 R14: ffff880069befa70 R15: 0000000000000000
        FS:  00007f6f7c716700(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000004a6f28 CR3: 000000003a134000 CR4: 00000000000006e0
        Call Trace:
         inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
         sock_sendmsg_nosec net/socket.c:635 [inline]
         sock_sendmsg+0xca/0x110 net/socket.c:645
         SYSC_sendto+0x660/0x810 net/socket.c:1687
         SyS_sendto+0x40/0x50 net/socket.c:1655
         entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      This is because we miss a check for NULL pointer for skb_peek() when
      the queue is empty. Other places already have the same check.
      
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Reported-by: 's avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: 's avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: 's avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • packet: round up linear to header len · 82849541
      Willem de Bruijn authored
      [ Upstream commit 57031eb794906eea4e1c7b31dc1e2429c0af0c66 ]
      
      Link layer protocols may unconditionally pull headers, as Ethernet
      does in eth_type_trans. Ensure that the entire link layer header
      always lies in the skb linear segment. tpacket_snd has such a check.
      Extend this to packet_snd.
      
      Variable length link layer headers complicate the computation
      somewhat. Here skb->len may be smaller than dev->hard_header_len.
      
      Round up the linear length to be at least as long as the smallest of
      the two.
      Reported-by: 's avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: 's avatarWillem de Bruijn <willemb@google.com>
      Acked-by: 's avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: introduce device min_header_len · 6ebde312
      Willem de Bruijn authored
      [ Upstream commit 217e6fa24ce28ec87fca8da93c9016cb78028612 ]
      
      The stack must not pass packets to device drivers that are shorter
      than the minimum link layer header length.
      
      Previously, packet sockets would drop packets smaller than or equal
      to dev->hard_header_len, but this has false positives. Zero length
      payload is used over Ethernet. Other link layer protocols support
      variable length headers. Support for validation of these protocols
      removed the min length check for all protocols.
      
      Introduce an explicit dev->min_header_len parameter and drop all
      packets below this value. Initially, set it to non-zero only for
      Ethernet and loopback. Other protocols can follow in a patch to
      net-next.
      
      Fixes: 9ed988cd ("packet: validate variable length ll headers")
      Reported-by: 's avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: 's avatarWillem de Bruijn <willemb@google.com>
      Acked-by: 's avatarEric Dumazet <edumazet@google.com>
      Acked-by: 's avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sit: fix a double free on error path · 4cd03621
      WANG Cong authored
      [ Upstream commit d7426c69a1942b2b9b709bf66b944ff09f561484 ]
      
      Dmitry reported a double free in sit_init_net():
      
        kernel BUG at mm/percpu.c:689!
        invalid opcode: 0000 [#1] SMP KASAN
        Dumping ftrace buffer:
           (ftrace buffer empty)
        Modules linked in:
        CPU: 0 PID: 15692 Comm: syz-executor1 Not tainted 4.10.0-rc6-next-20170206 #1
        Hardware name: Google Google Compute Engine/Google Compute Engine,
        BIOS Google 01/01/2011
        task: ffff8801c9cc27c0 task.stack: ffff88017d1d8000
        RIP: 0010:pcpu_free_area+0x68b/0x810 mm/percpu.c:689
        RSP: 0018:ffff88017d1df488 EFLAGS: 00010046
        RAX: 0000000000010000 RBX: 00000000000007c0 RCX: ffffc90002829000
        RDX: 0000000000010000 RSI: ffffffff81940efb RDI: ffff8801db841d94
        RBP: ffff88017d1df590 R08: dffffc0000000000 R09: 1ffffffff0bb3bdd
        R10: dffffc0000000000 R11: 00000000000135dd R12: ffff8801db841d80
        R13: 0000000000038e40 R14: 00000000000007c0 R15: 00000000000007c0
        FS:  00007f6ea608f700(0000) GS:ffff8801dbe00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 000000002000aff8 CR3: 00000001c8d44000 CR4: 00000000001426f0
        DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
        Call Trace:
         free_percpu+0x212/0x520 mm/percpu.c:1264
         ipip6_dev_free+0x43/0x60 net/ipv6/sit.c:1335
         sit_init_net+0x3cb/0xa10 net/ipv6/sit.c:1831
         ops_init+0x10a/0x530 net/core/net_namespace.c:115
         setup_net+0x2ed/0x690 net/core/net_namespace.c:291
         copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
         create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
         unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
         SYSC_unshare kernel/fork.c:2281 [inline]
         SyS_unshare+0x64e/0xfc0 kernel/fork.c:2231
         entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      This is because when tunnel->dst_cache init fails, we free dev->tstats
      once in ipip6_tunnel_init() and twice in sit_init_net(). This looks
      redundant but its ndo_uinit() does not seem enough to clean up everything
      here. So avoid this by setting dev->tstats to NULL after the first free,
      at least for -net.
      Reported-by: 's avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: 's avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sctp: avoid BUG_ON on sctp_wait_for_sndbuf · 00eff2eb
      Marcelo Ricardo Leitner authored
      [ Upstream commit 2dcab598484185dea7ec22219c76dcdd59e3cb90 ]
      
      Alexander Popov reported that an application may trigger a BUG_ON in
      sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
      waiting on it to queue more data and meanwhile another thread peels off
      the association being used by the first thread.
      
      This patch replaces the BUG_ON call with a proper error handling. It
      will return -EPIPE to the original sendmsg call, similarly to what would
      have been done if the association wasn't found in the first place.
      Acked-by: 's avatarAlexander Popov <alex.popov@linux.com>
      Signed-off-by: 's avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: 's avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: avoid infinite loop in tcp_splice_read() · 0f895f51
      Eric Dumazet authored
      [ Upstream commit ccf7abb93af09ad0868ae9033d1ca8108bdaec82 ]
      
      Splicing from TCP socket is vulnerable when a packet with URG flag is
      received and stored into receive queue.
      
      __tcp_splice_read() returns 0, and sk_wait_data() immediately
      returns since there is the problematic skb in queue.
      
      This is a nice way to burn cpu (aka infinite loop) and trigger
      soft lockups.
      
      Again, this gem was found by syzkaller tool.
      
      Fixes: 9c55e01c ("[TCP]: Splice receive support.")
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Reported-by: 's avatarDmitry Vyukov  <dvyukov@google.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: tcp: add a missing tcp_v6_restore_cb() · 1e340bb2
      Eric Dumazet authored
      [ Upstream commit ebf6c9cb23d7e56eec8575a88071dec97ad5c6e2 ]
      
      Dmitry reported use-after-free in ip6_datagram_recv_specific_ctl()
      
      A similar bug was fixed in commit 8ce48623 ("ipv6: tcp: restore
      IP6CB for pktoptions skbs"), but I missed another spot.
      
      tcp_v6_syn_recv_sock() can indeed set np->pktoptions from ireq->pktopts
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Reported-by: 's avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip6_gre: fix ip6gre_err() invalid reads · ae1768bb
      Eric Dumazet authored
      [ Upstream commit 7892032cfe67f4bde6fc2ee967e45a8fbaf33756 ]
      
      Andrey Konovalov reported out of bound accesses in ip6gre_err()
      
      If GRE flags contains GRE_KEY, the following expression
      *(((__be32 *)p) + (grehlen / 4) - 1)
      
      accesses data ~40 bytes after the expected point, since
      grehlen includes the size of IPv6 headers.
      
      Let's use a "struct gre_base_hdr *greh" pointer to make this
      code more readable.
      
      p[1] becomes greh->protocol.
      grhlen is the GRE header length.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Reported-by: 's avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netlabel: out of bound access in cipso_v4_validate() · 66cdd434
      Eric Dumazet authored
      [ Upstream commit d71b7896886345c53ef1d84bda2bc758554f5d61 ]
      
      syzkaller found another out of bound access in ip_options_compile(),
      or more exactly in cipso_v4_validate()
      
      Fixes: 20e2a864 ("cipso: handle CIPSO options correctly when NetLabel is disabled")
      Fixes: 446fda4f ("[NetLabel]: CIPSOv4 engine")
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Reported-by: 's avatarDmitry Vyukov  <dvyukov@google.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Acked-by: 's avatarPaul Moore <paul@paul-moore.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv4: keep skb->dst around in presence of IP options · f5b54446
      Eric Dumazet authored
      [ Upstream commit 34b2cef20f19c87999fff3da4071e66937db9644 ]
      
      Andrey Konovalov got crashes in __ip_options_echo() when a NULL skb->dst
      is accessed.
      
      ipv4_pktinfo_prepare() should not drop the dst if (evil) IP options
      are present.
      
      We could refine the test to the presence of ts_needtime or srr,
      but IP options are not often used, so let's be conservative.
      
      Thanks to syzkaller team for finding this bug.
      
      Fixes: d826eb14 ("ipv4: PKTINFO doesnt need dst reference")
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Reported-by: 's avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: use a work queue to defer net_disable_timestamp() work · d5b6fd77
      Eric Dumazet authored
      [ Upstream commit 5fa8bbda38c668e56b0c6cdecced2eac2fe36dec ]
      
      Dmitry reported a warning [1] showing that we were calling
      net_disable_timestamp() -> static_key_slow_dec() from a non
      process context.
      
      Grabbing a mutex while holding a spinlock or rcu_read_lock()
      is not allowed.
      
      As Cong suggested, we now use a work queue.
      
      It is possible netstamp_clear() exits while netstamp_needed_deferred
      is not zero, but it is probably not worth trying to do better than that.
      
      netstamp_needed_deferred atomic tracks the exact number of deferred
      decrements.
      
      [1]
      [ INFO: suspicious RCU usage. ]
      4.10.0-rc5+ #192 Not tainted
      -------------------------------
      ./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
      critical section!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 0
      2 locks held by syz-executor14/23111:
       #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock
      include/net/sock.h:1454 [inline]
       #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>]
      rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
       #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook
      include/linux/netfilter.h:201 [inline]
       #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>]
      __ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160
      
      stack backtrace:
      CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
       lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
       rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
       ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
       __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
       mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
       atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
       __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
       static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
       net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
       sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
       __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sock_wfree+0xae/0x120 net/core/sock.c:1645
       skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put include/net/inet_frag.h:133 [inline]
       nf_ct_frag6_gather+0x1106/0x3840
      net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook include/linux/netfilter.h:212 [inline]
       __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
       rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x600 net/socket.c:848
       do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
       do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
       vfs_writev+0x87/0xc0 fs/read_write.c:911
       do_writev+0x110/0x2c0 fs/read_write.c:944
       SYSC_writev fs/read_write.c:1017 [inline]
       SyS_writev+0x27/0x30 fs/read_write.c:1014
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x445559
      RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
      RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
      RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
      R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
      BUG: sleeping function called from invalid context at
      kernel/locking/mutex.c:752
      in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
      INFO: lockdep is turned off.
      CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
       ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
       __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
       mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
       atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
       __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
       static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
       net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
       sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
       __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sock_wfree+0xae/0x120 net/core/sock.c:1645
       skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put include/net/inet_frag.h:133 [inline]
       nf_ct_frag6_gather+0x1106/0x3840
      net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook include/linux/netfilter.h:212 [inline]
       __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
       rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x600 net/socket.c:848
       do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
       do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
       vfs_writev+0x87/0xc0 fs/read_write.c:911
       do_writev+0x110/0x2c0 fs/read_write.c:944
       SYSC_writev fs/read_write.c:1017 [inline]
       SyS_writev+0x27/0x30 fs/read_write.c:1014
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x445559
      
      Fixes: b90e5794 ("net: dont call jump_label_dec from irq context")
      Suggested-by: 's avatarCong Wang <xiyou.wangcong@gmail.com>
      Reported-by: 's avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: fix 0 divide in __tcp_select_window() · ca876dff
      Eric Dumazet authored
      [ Upstream commit 06425c308b92eaf60767bc71d359f4cbc7a561f8 ]
      
      syszkaller fuzzer was able to trigger a divide by zero, when
      TCP window scaling is not enabled.
      
      SO_RCVBUF can be used not only to increase sk_rcvbuf, also
      to decrease it below current receive buffers utilization.
      
      If mss is negative or 0, just return a zero TCP window.
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Reported-by: 's avatarDmitry Vyukov  <dvyukov@google.com>
      Acked-by: 's avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim() · e6fbace8
      Dan Carpenter authored
      [ Upstream commit 63117f09c768be05a0bf465911297dc76394f686 ]
      
      Casting is a high precedence operation but "off" and "i" are in terms of
      bytes so we need to have some parenthesis here.
      
      Fixes: fbfa743a9d2a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
      Signed-off-by: 's avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: 's avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: fix ip6_tnl_parse_tlv_enc_lim() · a7fe4e5d
      Eric Dumazet authored
      [ Upstream commit fbfa743a9d2a0ffa24251764f10afc13eb21e739 ]
      
      This function suffers from multiple issues.
      
      First one is that pskb_may_pull() may reallocate skb->head,
      so the 'raw' pointer needs either to be reloaded or not used at all.
      
      Second issue is that NEXTHDR_DEST handling does not validate
      that the options are present in skb->data, so we might read
      garbage or access non existent memory.
      
      With help from Willem de Bruijn.
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Reported-by: 's avatarDmitry Vyukov  <dvyukov@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/sched: matchall: Fix configuration race · 6c8556f6
      Yotam Gigi authored
      [ Upstream commit fd62d9f5c575f0792f150109f1fd24a0d4b3f854 ]
      
      In the current version, the matchall internal state is split into two
      structs: cls_matchall_head and cls_matchall_filter. This makes little
      sense, as matchall instance supports only one filter, and there is no
      situation where one exists and the other does not. In addition, that led
      to some races when filter was deleted while packet was processed.
      
      Unify that two structs into one, thus simplifying the process of matchall
      creation and deletion. As a result, the new, delete and get callbacks have
      a dummy implementation where all the work is done in destroy and change
      callbacks, as was done in cls_cgroup.
      
      Fixes: bf3994d2 ("net/sched: introduce Match-all classifier")
      Reported-by: 's avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: 's avatarYotam Gigi <yotamg@mellanox.com>
      Acked-by: 's avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: Fix kernel panic at security_sock_rcv_skb · adf86d59
      Eric Dumazet authored
      [ Upstream commit f1712c73714088a7252d276a57126d56c7d37e64 ]
      
      Zhang Yanmin reported crashes [1] and provided a patch adding a
      synchronize_rcu() call in can_rx_unregister()
      
      The main problem seems that the sockets themselves are not RCU
      protected.
      
      If CAN uses RCU for delivery, then sockets should be freed only after
      one RCU grace period.
      
      Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
      ease stable backports with the following fix instead.
      
      [1]
      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<ffffffff81495e25>] selinux_socket_sock_rcv_skb+0x65/0x2a0
      
      Call Trace:
       <IRQ>
       [<ffffffff81485d8c>] security_sock_rcv_skb+0x4c/0x60
       [<ffffffff81d55771>] sk_filter+0x41/0x210
       [<ffffffff81d12913>] sock_queue_rcv_skb+0x53/0x3a0
       [<ffffffff81f0a2b3>] raw_rcv+0x2a3/0x3c0
       [<ffffffff81f06eab>] can_rcv_filter+0x12b/0x370
       [<ffffffff81f07af9>] can_receive+0xd9/0x120
       [<ffffffff81f07beb>] can_rcv+0xab/0x100
       [<ffffffff81d362ac>] __netif_receive_skb_core+0xd8c/0x11f0
       [<ffffffff81d36734>] __netif_receive_skb+0x24/0xb0
       [<ffffffff81d37f67>] process_backlog+0x127/0x280
       [<ffffffff81d36f7b>] net_rx_action+0x33b/0x4f0
       [<ffffffff810c88d4>] __do_softirq+0x184/0x440
       [<ffffffff81f9e86c>] do_softirq_own_stack+0x1c/0x30
       <EOI>
       [<ffffffff810c76fb>] do_softirq.part.18+0x3b/0x40
       [<ffffffff810c8bed>] do_softirq+0x1d/0x20
       [<ffffffff81d30085>] netif_rx_ni+0xe5/0x110
       [<ffffffff8199cc87>] slcan_receive_buf+0x507/0x520
       [<ffffffff8167ef7c>] flush_to_ldisc+0x21c/0x230
       [<ffffffff810e3baf>] process_one_work+0x24f/0x670
       [<ffffffff810e44ed>] worker_thread+0x9d/0x6f0
       [<ffffffff810e4450>] ? rescuer_thread+0x480/0x480
       [<ffffffff810ebafc>] kthread+0x12c/0x150
       [<ffffffff81f9ccef>] ret_from_fork+0x3f/0x70
      Reported-by: 's avatarZhang Yanmin <yanmin.zhang@intel.com>
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Acked-by: 's avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 14 Feb, 2017 2 commits
  3. 09 Feb, 2017 2 commits
    • can: bcm: fix hrtimer/tasklet termination in bcm op removal · a150e087
      Oliver Hartkopp authored
      commit a06393ed03167771246c4c43192d9c264bc48412 upstream.
      
      When removing a bcm tx operation either a hrtimer or a tasklet might run.
      As the hrtimer triggers its associated tasklet and vice versa we need to
      take care to mutually terminate both handlers.
      Reported-by: 's avatarMichael Josenhans <michael.josenhans@web.de>
      Signed-off-by: 's avatarOliver Hartkopp <socketcan@hartkopp.net>
      Tested-by: 's avatarMichael Josenhans <michael.josenhans@web.de>
      Signed-off-by: 's avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • svcrpc: fix oops in absence of krb5 module · a3d72952
      J. Bruce Fields authored
      commit 034dd34ff4916ec1f8f74e39ca3efb04eab2f791 upstream.
      
      Olga Kornievskaia says: "I ran into this oops in the nfsd (below)
      (4.10-rc3 kernel). To trigger this I had a client (unsuccessfully) try
      to mount the server with krb5 where the server doesn't have the
      rpcsec_gss_krb5 module built."
      
      The problem is that rsci.cred is copied from a svc_cred structure that
      gss_proxy didn't properly initialize.  Fix that.
      
      [120408.542387] general protection fault: 0000 [#1] SMP
      ...
      [120408.565724] CPU: 0 PID: 3601 Comm: nfsd Not tainted 4.10.0-rc3+ #16
      [120408.567037] Hardware name: VMware, Inc. VMware Virtual =
      Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
      [120408.569225] task: ffff8800776f95c0 task.stack: ffffc90003d58000
      [120408.570483] RIP: 0010:gss_mech_put+0xb/0x20 [auth_rpcgss]
      ...
      [120408.584946]  ? rsc_free+0x55/0x90 [auth_rpcgss]
      [120408.585901]  gss_proxy_save_rsc+0xb2/0x2a0 [auth_rpcgss]
      [120408.587017]  svcauth_gss_proxy_init+0x3cc/0x520 [auth_rpcgss]
      [120408.588257]  ? __enqueue_entity+0x6c/0x70
      [120408.589101]  svcauth_gss_accept+0x391/0xb90 [auth_rpcgss]
      [120408.590212]  ? try_to_wake_up+0x4a/0x360
      [120408.591036]  ? wake_up_process+0x15/0x20
      [120408.592093]  ? svc_xprt_do_enqueue+0x12e/0x2d0 [sunrpc]
      [120408.593177]  svc_authenticate+0xe1/0x100 [sunrpc]
      [120408.594168]  svc_process_common+0x203/0x710 [sunrpc]
      [120408.595220]  svc_process+0x105/0x1c0 [sunrpc]
      [120408.596278]  nfsd+0xe9/0x160 [nfsd]
      [120408.597060]  kthread+0x101/0x140
      [120408.597734]  ? nfsd_destroy+0x60/0x60 [nfsd]
      [120408.598626]  ? kthread_park+0x90/0x90
      [120408.599448]  ret_from_fork+0x22/0x30
      
      Fixes: 1d658336 "SUNRPC: Add RPC based upcall mechanism for RPCGSS auth"
      Cc: Simo Sorce <simo@redhat.com>
      Reported-by: 's avatarOlga Kornievskaia <kolga@netapp.com>
      Tested-by: 's avatarOlga Kornievskaia <kolga@netapp.com>
      Signed-off-by: 's avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 04 Feb, 2017 14 commits
    • net: dsa: Bring back device detaching in dsa_slave_suspend() · 9f42bc4f
      Florian Fainelli authored
      [ Upstream commit f154be241d22298d2b63c9b613f619fa1086ea75 ]
      
      Commit 448b4482 ("net: dsa: Add lockdep class to tx queues to avoid
      lockdep splat") removed the netif_device_detach() call done in
      dsa_slave_suspend() which is necessary, and paired with a corresponding
      netif_device_attach(), bring it back.
      
      Fixes: 448b4482 ("net: dsa: Add lockdep class to tx queues to avoid lockdep splat")
      Signed-off-by: 's avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: 's avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lwtunnel: Fix oops on state free after encap module unload · e972cce0
      Robert Shearman authored
      [ Upstream commit 85c814016ce3b371016c2c054a905fa2492f5a65 ]
      
      When attempting to free lwtunnel state after the module for the encap
      has been unloaded an oops occurs:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      IP: lwtstate_free+0x18/0x40
      [..]
      task: ffff88003e372380 task.stack: ffffc900001fc000
      RIP: 0010:lwtstate_free+0x18/0x40
      RSP: 0018:ffff88003fd83e88 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff88002bbb3380 RCX: ffff88000c91a300
      [..]
      Call Trace:
       <IRQ>
       free_fib_info_rcu+0x195/0x1a0
       ? rt_fibinfo_free+0x50/0x50
       rcu_process_callbacks+0x2d3/0x850
       ? rcu_process_callbacks+0x296/0x850
       __do_softirq+0xe4/0x4cb
       irq_exit+0xb0/0xc0
       smp_apic_timer_interrupt+0x3d/0x50
       apic_timer_interrupt+0x93/0xa0
      [..]
      Code: e8 6e c6 fc ff 89 d8 5b 5d c3 bb de ff ff ff eb f4 66 90 66 66 66 66 90 55 48 89 e5 53 0f b7 07 48 89 fb 48 8b 04 c5 00 81 d5 81 <48> 8b 40 08 48 85 c0 74 13 ff d0 48 8d 7b 20 be 20 00 00 00 e8
      
      The problem is after the module for the encap can be unloaded the
      corresponding ops is removed and is thus NULL here.
      
      Modules implementing lwtunnel ops should not be allowed to unload
      while there is state alive using those ops, so grab the module
      reference for the ops on creating lwtunnel state and of course release
      the reference when freeing the state.
      
      Fixes: 1104d9ba443a ("lwtunnel: Add destroy state operation")
      Signed-off-by: 's avatarRobert Shearman <rshearma@brocade.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: Specify the owning module for lwtunnel ops · 89c25886
      Robert Shearman authored
      [ Upstream commit 88ff7334f25909802140e690c0e16433e485b0a0 ]
      
      Modules implementing lwtunnel ops should not be allowed to unload
      while there is state alive using those ops, so specify the owning
      module for all lwtunnel ops.
      Signed-off-by: 's avatarRobert Shearman <rshearma@brocade.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • af_unix: move unix_mknod() out of bindlock · 93ff5e03
      WANG Cong authored
      [ Upstream commit 0fb44559ffd67de8517098b81f675fa0210f13f0 ]
      
      Dmitry reported a deadlock scenario:
      
      unix_bind() path:
      u->bindlock ==> sb_writer
      
      do_splice() path:
      sb_writer ==> pipe->mutex ==> u->bindlock
      
      In the unix_bind() code path, unix_mknod() does not have to
      be done with u->bindlock held, since it is a pure fs operation,
      so we can just move unix_mknod() out.
      Reported-by: 's avatarDmitry Vyukov <dvyukov@google.com>
      Tested-by: 's avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: 's avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: mpls: Fix multipath selection for LSR use case · ad864d9f
      David Ahern authored
      [ Upstream commit 9f427a0e474a67b454420c131709600d44850486 ]
      
      MPLS multipath for LSR is broken -- always selecting the first nexthop
      in the one label case. For example:
      
          $ ip -f mpls ro ls
          100
                  nexthop as to 200 via inet 172.16.2.2  dev virt12
                  nexthop as to 300 via inet 172.16.3.2  dev virt13
          101
                  nexthop as to 201 via inet6 2000:2::2  dev virt12
                  nexthop as to 301 via inet6 2000:3::2  dev virt13
      
      In this example incoming packets have a single MPLS labels which means
      BOS bit is set. The BOS bit is passed from mpls_forward down to
      mpls_multipath_hash which never processes the hash loop because BOS is 1.
      
      Update mpls_multipath_hash to process the entire label stack. mpls_hdr_len
      tracks the total mpls header length on each pass (on pass N mpls_hdr_len
      is N * sizeof(mpls_shim_hdr)). When the label is found with the BOS set
      it verifies the skb has sufficient header for ipv4 or ipv6, and find the
      IPv4 and IPv6 header by using the last mpls_hdr pointer and adding 1 to
      advance past it.
      
      With these changes I have verified the code correctly sees the label,
      BOS, IPv4 and IPv6 addresses in the network header and icmp/tcp/udp
      traffic for ipv4 and ipv6 are distributed across the nexthops.
      
      Fixes: 1c78efa8 ("mpls: flow-based multipath selection")
      Acked-by: 's avatarRobert Shearman <rshearma@brocade.com>
      Signed-off-by: 's avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bridge: netlink: call br_changelink() during br_dev_newlink() · 74423145
      Ivan Vecera authored
      [ Upstream commit b6677449dff674cf5b81429b11d5c7f358852ef9 ]
      
      Any bridge options specified during link creation (e.g. ip link add)
      are ignored as br_dev_newlink() does not process them.
      Use br_changelink() to do it.
      
      Fixes: 13323516 ("bridge: implement rtnl_link_ops->changelink")
      Signed-off-by: 's avatarIvan Vecera <cera@cera.cz>
      Reviewed-by: 's avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: initialize max window for a new fastopen socket · 0c687a73
      Alexey Kodanev authored
      [ Upstream commit 0dbd7ff3ac5017a46033a9d0a87a8267d69119d9 ]
      
      Found that if we run LTP netstress test with large MSS (65K),
      the first attempt from server to send data comparable to this
      MSS on fastopen connection will be delayed by the probe timer.
      
      Here is an example:
      
           < S  seq 0:0 win 43690 options [mss 65495 wscale 7 tfo cookie] length 32
           > S. seq 0:0 ack 1 win 43690 options [mss 65495 wscale 7] length 0
           < .  ack 1 win 342 length 0
      
      Inside tcp_sendmsg(), tcp_send_mss() returns max MSS in 'mss_now',
      as well as in 'size_goal'. This results the segment not queued for
      transmition until all the data copied from user buffer. Then, inside
      __tcp_push_pending_frames(), it breaks on send window test and
      continues with the check probe timer.
      
      Fragmentation occurs in tcp_write_wakeup()...
      
      +0.2 > P. seq 1:43777 ack 1 win 342 length 43776
           < .  ack 43777, win 1365 length 0
           > P. seq 43777:65001 ack 1 win 342 options [...] length 21224
           ...
      
      This also contradicts with the fact that we should bound to the half
      of the window if it is large.
      
      Fix this flaw by correctly initializing max_window. Before that, it
      could have large values that affect further calculations of 'size_goal'.
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Signed-off-by: 's avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: 's avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: addrconf: Avoid addrconf_disable_change() using RCU read-side lock · 79453ab8
      Kefeng Wang authored
      [ Upstream commit 03e4deff4987f79c34112c5ba4eb195d4f9382b0 ]
      
      Just like commit 4acd4945 ("ipv6: addrconf: Avoid calling
      netdevice notifiers with RCU read-side lock"), it is unnecessary
      to make addrconf_disable_change() use RCU iteration over the
      netdev list, since it already holds the RTNL lock, or we may meet
      Illegal context switch in RCU read-side critical section.
      Signed-off-by: 's avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lwtunnel: fix autoload of lwt modules · e9db042d
      David Ahern authored
      [ Upstream commit 9ed59592e3e379b2e9557dc1d9e9ec8fcbb33f16]
      
      Trying to add an mpls encap route when the MPLS modules are not loaded
      hangs. For example:
      
          CONFIG_MPLS=y
          CONFIG_NET_MPLS_GSO=m
          CONFIG_MPLS_ROUTING=m
          CONFIG_MPLS_IPTUNNEL=m
      
          $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
      The ip command hangs:
      root       880   826  0 21:25 pts/0    00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
          $ cat /proc/880/stack
          [<ffffffff81065a9b>] call_usermodehelper_exec+0xd6/0x134
          [<ffffffff81065efc>] __request_module+0x27b/0x30a
          [<ffffffff814542f6>] lwtunnel_build_state+0xe4/0x178
          [<ffffffff814aa1e4>] fib_create_info+0x47f/0xdd4
          [<ffffffff814ae451>] fib_table_insert+0x90/0x41f
          [<ffffffff814a8010>] inet_rtm_newroute+0x4b/0x52
          ...
      
      modprobe is trying to load rtnl-lwt-MPLS:
      
      root       881     5  0 21:25 ?        00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS
      
      and it hangs after loading mpls_router:
      
          $ cat /proc/881/stack
          [<ffffffff81441537>] rtnl_lock+0x12/0x14
          [<ffffffff8142ca2a>] register_netdevice_notifier+0x16/0x179
          [<ffffffffa0033025>] mpls_init+0x25/0x1000 [mpls_router]
          [<ffffffff81000471>] do_one_initcall+0x8e/0x13f
          [<ffffffff81119961>] do_init_module+0x5a/0x1e5
          [<ffffffff810bd070>] load_module+0x13bd/0x17d6
          ...
      
      The problem is that lwtunnel_build_state is called with rtnl lock
      held preventing mpls_init from registering.
      
      Given the potential references held by the time lwtunnel_build_state it
      can not drop the rtnl lock to the load module. So, extract the module
      loading code from lwtunnel_build_state into a new function to validate
      the encap type. The new function is called while converting the user
      request into a fib_config which is well before any table, device or
      fib entries are examined.
      
      Fixes: 745041e2 ("lwtunnel: autoload of lwt modules")
      Signed-off-by: 's avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: fix harmonize_features() vs NETIF_F_HIGHDMA · 948e137a
      Eric Dumazet authored
      [ Upstream commit 7be2c82cfd5d28d7adb66821a992604eb6dd112e ]
      
      Ashizuka reported a highmem oddity and sent a patch for freescale
      fec driver.
      
      But the problem root cause is that core networking stack
      must ensure no skb with highmem fragment is ever sent through
      a device that does not assert NETIF_F_HIGHDMA in its features.
      
      We need to call illegal_highdma() from harmonize_features()
      regardless of CSUM checks.
      
      Fixes: ec5f0615 ("net: Kill link between CSUM and SG features.")
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Cc: Pravin Shelar <pshelar@ovn.org>
      Reported-by: 's avatar"Ashizuka, Yuusuke" <ashiduka@jp.fujitsu.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receiving · 1e7cbb41
      Jason Wang authored
      [ Upstream commit 6391a4481ba0796805d6581e42f9f0418c099e34 ]
      
      Commit 501db511397f ("virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on
      xmit") in fact disables VIRTIO_HDR_F_DATA_VALID on receiving path too,
      fixing this by adding a hint (has_data_valid) and set it only on the
      receiving path.
      
      Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
      Signed-off-by: 's avatarJason Wang <jasowang@redhat.com>
      Acked-by: 's avatarRolf Neugebauer <rolf.neugebauer@docker.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net sched actions: fix refcnt when GETing of action after bind · b260a714
      Jamal Hadi Salim authored
      [ Upstream commit 0faa9cb5b3836a979864a6357e01d2046884ad52 ]
      
      Demonstrating the issue:
      
      .. add a drop action
      $sudo $TC actions add action drop index 10
      
      .. retrieve it
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 0 installed 29 sec used 29 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      ... bug 1 above: reference is two.
          Reference is actually 1 but we forget to subtract 1.
      
      ... do a GET again and we see the same issue
          try a few times and nothing changes
      ~$ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 0 installed 31 sec used 31 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      ... lets try to bind the action to a filter..
      $ sudo $TC qdisc add dev lo ingress
      $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
        u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10
      
      ... and now a few GETs:
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 3 bind 1 installed 204 sec used 204 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 4 bind 1 installed 206 sec used 206 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 5 bind 1 installed 235 sec used 235 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      .... as can be observed the reference count keeps going up.
      
      After the fix
      
      $ sudo $TC actions add action drop index 10
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 1 bind 0 installed 4 sec used 4 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 1 bind 0 installed 6 sec used 6 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC qdisc add dev lo ingress
      $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
        u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 1 installed 32 sec used 32 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 1 installed 33 sec used 33 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      Fixes: aecc5cef ("net sched actions: fix GETing actions")
      Signed-off-by: 's avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ax25: Fix segfault after sock connection timeout · 2d6b61ec
      Basil Gunn authored
      [ Upstream commit 8a367e74c0120ef68c8c70d5a025648c96626dff ]
      
      The ax.25 socket connection timed out & the sock struct has been
      previously taken down ie. sock struct is now a NULL pointer. Checking
      the sock_flag causes the segfault.  Check if the socket struct pointer
      is NULL before checking sock_flag. This segfault is seen in
      timed out netrom connections.
      
      Please submit to -stable.
      Signed-off-by: 's avatarBasil Gunn <basil@pacabunga.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip6_tunnel: Account for tunnel header in tunnel MTU · c7a5df92
      Jakub Sitnicki authored
      [ Upstream commit 02ca0423fd65a0a9c4d70da0dbb8f4b8503f08c7 ]
      
      With ip6gre we have a tunnel header which also makes the tunnel MTU
      smaller. We need to reserve room for it. Previously we were using up
      space reserved for the Tunnel Encapsulation Limit option
      header (RFC 2473).
      
      Also, after commit b05229f4 ("gre6: Cleanup GREv6 transmit path,
      call common GRE functions") our contract with the caller has
      changed. Now we check if the packet length exceeds the tunnel MTU after
      the tunnel header has been pushed, unlike before.
      
      This is reflected in the check where we look at the packet length minus
      the size of the tunnel header, which is already accounted for in tunnel
      MTU.
      
      Fixes: b05229f4 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
      Signed-off-by: 's avatarJakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>