1. 28 Nov, 2016 1 commit
  2. 13 Nov, 2016 1 commit
    • bpf: Add test for bpf_redirect to ipip/ip6tnl · 90e02896
      Martin KaFai Lau authored
      The test creates two netns, ns1 and ns2.  The host (the default netns)
      has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.
      
          ping VIPs from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)
      
      The test is to have ns1 pinging VIPs configured at the loopback
      interface in ns2.
      
      The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
      at lo@ns2). [Note: 0x66 => 102].
      
      At ns1, the VIPs are routed _via_ the host.
      
      At the host, bpf programs are installed at the veth to redirect packets
      from a veth to the ipip/ip6tnl.  The test is configured in a way so
      that both ingress and egress can be tested.
      
      At ns2, the ipip/ip6tnl dev is configured with the local and remote address
      specified.  The return path is routed to the dev ipip/ip6tnl.
      
      During egress test, the host also locally tests pinging the VIPs to ensure
      that bpf_redirect at egress also works for the direct egress (i.e. not
      forwarding from dev ve1 to ve2).
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
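The "[Note: 0x66 => 102]" remark says the two VIPs share the same final byte. A minimal user-space sketch of that check follows. It is not part of the test script, and `vip_last_bytes_match` is a hypothetical helper name used only for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the last byte of the IPv4 address
 * equals the last byte of the IPv6 address, 0 if they differ, and -1 if
 * either string fails to parse. */
static int vip_last_bytes_match(const char *v4, const char *v6)
{
	struct in_addr a4;
	struct in6_addr a6;

	if (inet_pton(AF_INET, v4, &a4) != 1 ||
	    inet_pton(AF_INET6, v6, &a6) != 1)
		return -1;

	/* s_addr is stored in network byte order, so byte 3 is the last octet. */
	const uint8_t *b4 = (const uint8_t *)&a4.s_addr;

	return b4[3] == a6.s6_addr[15];
}
```

Called with "10.10.1.102" and "2401:face::66", the helper compares 102 against 0x66, which are the same value.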
  3. 29 Oct, 2016 1 commit
    • bpf: fix samples to add fake KBUILD_MODNAME · 96a8eb1e
      Daniel Borkmann authored
      Some of the sample files are causing issues when they are loaded with tc
      and cls_bpf, meaning tc bails out while trying to parse the resulting ELF
      file as program/map/etc sections are not present, which can be easily
      spotted with readelf(1).
      
      Currently, BPF samples are including some of the kernel headers and mid
      term we should change them to refrain from this, really. When dynamic
      debugging is enabled, we bail out due to undeclared KBUILD_MODNAME, which
      is easily overlooked in the build as clang spills this along with other
      noisy warnings from various header includes, and llc still generates an
      ELF file with mentioned characteristics. For just playing around with BPF
      examples, this can be a bit of a hurdle to take.
      
      Just add a fake KBUILD_MODNAME as a band-aid to fix the issue, same is
      done in xdp*_kern samples already.
      
      Fixes: 65d472fb ("samples/bpf: add 'pointer to packet' tests")
      Fixes: 6afb1e28 ("samples/bpf: Add tunnel set/get tests.")
      Fixes: a3f74617 ("cgroup: bpf: Add an example to do cgroup checking in BPF")
      Reported-by: Chandrasekar Kannan <ckannan@console.to>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 10 Oct, 2016 1 commit
  5. 29 Sep, 2016 1 commit
    • bpf: allow access into map value arrays · 48461135
      Josef Bacik authored
      Suppose you have a map array value that is something like this
      
      struct foo {
      	unsigned iter;
      	int array[SOME_CONSTANT];
      };
      
      You can easily insert this into an array, but you cannot modify the contents of
      foo->array[] after the fact.  This is because we have no way to verify we won't
      go off the end of the array at verification time.  This patch provides a start
      for this work.  We accomplish this by keeping track of a minimum and maximum
      value a register could be while we're checking the code.  Then at the time we
      try to do an access into a MAP_VALUE we verify that the maximum offset into that
      region is a valid access into that memory region.  So in practice, code such as
      this
      
      unsigned index = 0;
      
      if (foo->iter >= SOME_CONSTANT)
      	foo->iter = index;
      else
      	index = foo->iter++;
      foo->array[index] = bar;
      
      would be allowed, as we can verify that index will always be between 0 and
      SOME_CONSTANT-1.  If you wish to use signed values you'll have to have an extra
      check to make sure the index isn't less than 0, or do something like index %=
      SOME_CONSTANT.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
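The min/max tracking described in this commit can be modelled in plain user-space C. This is a rough sketch, not the verifier's code: `struct reg_range`, `range_after_lt_check`, and `map_value_access_ok` are hypothetical names, and `SOME_CONSTANT` is given an arbitrary value here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SOME_CONSTANT 16	/* arbitrary value chosen for this sketch */

struct foo {
	unsigned iter;
	int array[SOME_CONSTANT];
};

/* Hypothetical model of the bounds tracked for one register. */
struct reg_range {
	uint64_t min_value;
	uint64_t max_value;
};

/* Narrow the range on the branch where "reg < limit" is known to hold. */
static struct reg_range range_after_lt_check(struct reg_range r, uint64_t limit)
{
	if (limit > 0 && r.max_value > limit - 1)
		r.max_value = limit - 1;
	return r;
}

/* Accept an element access only if the largest possible index still lands
 * inside the map value. Counting whole slots avoids overflow when the
 * index is unbounded. */
static bool map_value_access_ok(struct reg_range index, uint64_t base_off,
				uint64_t elem_size, uint64_t value_size)
{
	if (elem_size == 0 || base_off > value_size)
		return false;

	uint64_t slots = (value_size - base_off) / elem_size;

	return index.max_value < slots;
}
```

An index with an unknown range is rejected, while the same index after a comparison against `SOME_CONSTANT` has a maximum of `SOME_CONSTANT - 1` and is accepted.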
  6. 27 Sep, 2016 2 commits
  7. 23 Sep, 2016 4 commits
  8. 21 Sep, 2016 1 commit
  9. 20 Sep, 2016 1 commit
  10. 17 Sep, 2016 2 commits
  11. 09 Sep, 2016 4 commits
    • rpmsg: Allow callback to return errors · 4b83c52a
      Bjorn Andersson authored
      Some rpmsg backends support holding on to and redelivering messages upon
      failed handling of them, so provide a way for the callback to report an
      error and allow the backends to handle this.
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
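As a loose illustration of why an error-returning callback is useful, the sketch below shows a backend retrying delivery while the callback reports failure. Every name here (`msg_cb_t`, `deliver_with_retry`, `flaky_cb`) is hypothetical and is not the rpmsg API. The retry policy is also an assumption.

```c
#include <stddef.h>

/* Hypothetical callback type: 0 means handled, negative means failure. */
typedef int (*msg_cb_t)(const void *data, size_t len, void *priv);

/* Hypothetical backend step: redeliver the same message until the callback
 * accepts it or max_tries is exhausted. Returns the number of attempts used
 * on success, or -1 if every attempt failed. */
static int deliver_with_retry(msg_cb_t cb, const void *data, size_t len,
			      void *priv, int max_tries)
{
	for (int attempt = 1; attempt <= max_tries; attempt++) {
		if (cb(data, len, priv) == 0)
			return attempt;
	}
	return -1;
}

/* Example callback that fails until the counter behind priv reaches zero. */
static int flaky_cb(const void *data, size_t len, void *priv)
{
	int *failures_left = priv;

	(void)data;
	(void)len;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -1;
	}
	return 0;
}
```

With a void callback the backend has no signal to act on. With an int return it can choose to hold on to the message and try again.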
    • rpmsg: Clean up rpmsg device vs channel naming · 92e1de51
      Bjorn Andersson authored
      The struct representing an rpmsg device is called rpmsg_channel, while
      the variable name used throughout is rpdev. With the communication
      happening on endpoints, it's clearer to just call this a "device" in a
      public API.
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
    • rpmsg: rpmsg_send() operations takes rpmsg_endpoint · 2a48d732
      Bjorn Andersson authored
      The rpmsg_send() operations have been taking a rpmsg_device, but this
      forces users of secondary rpmsg_endpoints to use the rpmsg_sendto()
      interface - by extracting source and destination from the given data
      structures. If we instead pass the rpmsg_endpoint to these functions a
      service can use rpmsg_sendto() to respond to messages, even on secondary
      endpoints.
      
      In addition this would allow us to support operations on multiple
      channels in future backends that do not support off-channel
      operations.
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
    • bpf: fix range propagation on direct packet access · 2d2be8ca
      Daniel Borkmann authored
      LLVM can generate code that tests for direct packet access via
      skb->data/data_end in a way that currently gets rejected by the
      verifier, example:
      
        [...]
         7: (61) r3 = *(u32 *)(r6 +80)
         8: (61) r9 = *(u32 *)(r6 +76)
         9: (bf) r2 = r9
        10: (07) r2 += 54
        11: (3d) if r3 >= r2 goto pc+12
         R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
         R9=pkt(id=0,off=0,r=0) R10=fp
        12: (18) r4 = 0xffffff7a
        14: (05) goto pc+430
        [...]
      
        from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv
                       R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp
        24: (7b) *(u64 *)(r10 -40) = r1
        25: (b7) r1 = 0
        26: (63) *(u32 *)(r6 +56) = r1
        27: (b7) r2 = 40
        28: (71) r8 = *(u8 *)(r9 +20)
        invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0)
      
      The reason why this gets rejected despite a proper test is that we
      currently call find_good_pkt_pointers() only in case where we detect
      tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and
      derived, for example, from a register of type pkt(id=Y,off=0,r=0)
      pointing to skb->data. find_good_pkt_pointers() then fills the range
      in the current branch to pkt(id=Y,off=0,r=Z) on success.
      
      For above case, we need to extend that to recognize pkt_end >= rX
      pattern and mark the other branch that is taken on success with the
      appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers().
      Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the
      only two practical options to test for from what LLVM could have
      generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=)
      that we would need to take into account as well.
      
      After the fix:
      
        [...]
         7: (61) r3 = *(u32 *)(r6 +80)
         8: (61) r9 = *(u32 *)(r6 +76)
         9: (bf) r2 = r9
        10: (07) r2 += 54
        11: (3d) if r3 >= r2 goto pc+12
         R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
         R9=pkt(id=0,off=0,r=0) R10=fp
        12: (18) r4 = 0xffffff7a
        14: (05) goto pc+430
        [...]
      
        from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv
                       R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp
        24: (7b) *(u64 *)(r10 -40) = r1
        25: (b7) r1 = 0
        26: (63) *(u32 *)(r6 +56) = r1
        27: (b7) r2 = 40
        28: (71) r8 = *(u8 *)(r9 +20)
        29: (bf) r1 = r8
        30: (25) if r8 > 0x3c goto pc+47
         R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56
         R9=pkt(id=0,off=0,r=54) R10=fp
        31: (b7) r1 = 1
        [...]
      
      Verifier test cases are also added in this work, one that demonstrates
      the mentioned example here and one that tries a bad packet access for
      the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0),
      pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two
      with both test variants (>, >=).
      
      Fixes: 969bf05e ("bpf: direct packet access")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
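The two test shapes discussed in this commit are logically equivalent bounds checks. Only the branch that carries the safe access differs. The sketch below shows both in plain user-space C. `struct pkt` is a hypothetical stand-in for the skb->data / skb->data_end pair, and the offsets 54 and 20 are taken from the trace above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the skb->data / skb->data_end pair. */
struct pkt {
	const uint8_t *data;
	const uint8_t *data_end;
};

/* "rX > pkt_end" shape: bail out on the taken branch, so the access sits
 * on the fall-through path. */
static int read_byte20_gt(struct pkt p)
{
	if (p.data + 54 > p.data_end)
		return -1;
	return p.data[20];
}

/* "pkt_end >= rX" shape: the access sits on the taken branch instead.
 * This is the pattern the fix teaches the verifier to recognize. */
static int read_byte20_ge(struct pkt p)
{
	if (p.data_end >= p.data + 54)
		return p.data[20];
	return -1;
}
```

In user space the computed pointer `data + 54` must stay within the underlying buffer, so a short packet is best modelled by moving `data_end` inside a larger buffer.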
  12. 02 Sep, 2016 2 commits
  13. 20 Aug, 2016 1 commit
    • samples/bpf: Add tunnel set/get tests. · 6afb1e28
      William Tu authored
      The patch creates sample code exercising bpf_skb_{set,get}_tunnel_key,
      and bpf_skb_{set,get}_tunnel_opt for GRE, VXLAN, and GENEVE.  A native
      tunnel device is created in a namespace to interact with a lwtunnel
      device out of the namespace, with metadata enabled.  The bpf_skb_set_*
      program is attached to tc egress and bpf_skb_get_* is attached to ingress
      qdisc.  A ping between two tunnels is used to verify correctness and
      the result of bpf_skb_get_* printed by bpf_trace_printk.
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 13 Aug, 2016 4 commits
  15. 11 Aug, 2016 1 commit
  16. 07 Aug, 2016 1 commit
  17. 04 Aug, 2016 4 commits
  18. 26 Jul, 2016 2 commits
    • samples/bpf: Add test/example of using bpf_probe_write_user bpf helper · cf9b1199
      Sargun Dhillon authored
      This example shows using a kprobe to act as a dnat mechanism to divert
      traffic for arbitrary endpoints. It rewrites the arguments to a syscall
      while they're still in userspace, and before the syscall has a chance
      to copy the argument into kernel space.
      
      Although this is an example, it also acts as a test because the mapped
      address is 255.255.255.255:555 -> real address, and that's not a legal
      address to connect to. If the helper is broken, the example will fail
      on the intermediate steps, as well as the final step to verify the
      rewrite of userspace memory succeeded.
      Signed-off-by: Sargun Dhillon <sargun@sargun.me>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Add bpf_probe_write_user BPF helper to be called in tracers · 96ae5227
      Sargun Dhillon authored
      This allows user memory to be written to during the course of a kprobe.
      It shouldn't be used to implement any kind of security mechanism
      because of TOC-TOU attacks, but rather to debug, divert, and
      manipulate execution of semi-cooperative processes.
      
      Although it uses probe_kernel_write, we limit the address space
      the probe can write into by checking the space with access_ok.
      We do this as opposed to calling copy_to_user directly, in order
      to avoid sleeping. In addition we ensure the thread's current fs
      / segment is USER_DS and the thread isn't exiting nor a kernel thread.
      
      Given this feature is meant for experiments, and it has a risk of
      crashing the system and running programs, we print a warning
      when a proglet that attempts to use this helper is installed,
      along with the pid and process name.
      Signed-off-by: Sargun Dhillon <sargun@sargun.me>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 21 Jul, 2016 2 commits
  20. 20 Jul, 2016 2 commits
    • bpf: add sample for xdp forwarding and rewrite · 764cbcce
      Brenden Blanco authored
      Add a sample that rewrites and forwards packets out on the same
      interface. Observed single core forwarding performance of ~10Mpps.
      
      Since the mlx4 driver under test recycles every single packet page, the
      perf output shows almost exclusively just the ring management and bpf
      program work. Slowdowns are likely occurring due to cache misses.
      Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Add sample for adding simple drop program to link · 86af8b41
      Brenden Blanco authored
      Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX
      hook of a link. With the drop-only program, observed single core rate is
      ~20Mpps.
      
      Other tests were run, for instance without the dropcnt increment or
      without reading from the packet header, the packet rate was mostly
      unchanged.
      
      $ perf record -a samples/bpf/xdp1 $(</sys/class/net/eth0/ifindex)
      proto 17:   20403027 drops/s
      
      ./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4
      Running... ctrl^C to stop
      Device: eth4@0
      Result: OK: 11791017(c11788327+d2689) usec, 59622913 (60byte,0frags)
        5056638pps 2427Mb/sec (2427186240bps) errors: 0
      Device: eth4@1
      Result: OK: 11791012(c11787906+d3106) usec, 60526944 (60byte,0frags)
        5133311pps 2463Mb/sec (2463989280bps) errors: 0
      Device: eth4@2
      Result: OK: 11791019(c11788249+d2769) usec, 59868091 (60byte,0frags)
        5077431pps 2437Mb/sec (2437166880bps) errors: 0
      Device: eth4@3
      Result: OK: 11795039(c11792403+d2636) usec, 59483181 (60byte,0frags)
        5043067pps 2420Mb/sec (2420672160bps) errors: 0
      
      perf report --no-children:
       26.05%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_process_rx_cq
       17.84%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_alloc_frags
        5.52%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_free_frag
        4.90%  swapper      [kernel.vmlinux]  [k] poll_idle
        4.14%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
        2.78%  ksoftirqd/0  [kernel.vmlinux]  [k] __free_pages_ok
        2.57%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
        2.51%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq
        1.94%  ksoftirqd/0  [kernel.vmlinux]  [k] percpu_array_map_lookup_elem
        1.45%  swapper      [mlx4_en]         [k] mlx4_en_alloc_frags
        1.35%  ksoftirqd/0  [kernel.vmlinux]  [k] free_one_page
        1.33%  swapper      [kernel.vmlinux]  [k] intel_idle
        1.04%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5c5
        0.96%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c58d
        0.93%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c6ee
        0.92%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c6b9
        0.89%  ksoftirqd/0  [kernel.vmlinux]  [k] __alloc_pages_nodemask
        0.83%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c686
        0.83%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5d5
        0.78%  ksoftirqd/0  [mlx4_en]         [k] mlx4_alloc_pages.isra.23
        0.77%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5b4
        0.77%  ksoftirqd/0  [kernel.vmlinux]  [k] net_rx_action
      
      machine specs:
       receiver - Intel E5-1630 v3 @ 3.70GHz
       sender - Intel E5645 @ 2.40GHz
       Mellanox ConnectX-3 @40G
      Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
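The pktgen figures above can be cross-checked with simple arithmetic. The sketch below assumes each reported bps value is the packet rate times the 60-byte frame size times 8, which matches the numbers shown. `pps_to_bps` and `total_pps` are hypothetical helper names.

```c
#include <stdint.h>

/* Bits per second for a given packet rate and frame size in bytes
 * (assumption: no preamble or inter-frame gap is counted). */
static uint64_t pps_to_bps(uint64_t pps, uint64_t frame_bytes)
{
	return pps * frame_bytes * 8;
}

/* Aggregate packet rate across all sender threads. */
static uint64_t total_pps(const uint64_t *pps, int n)
{
	uint64_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += pps[i];
	return sum;
}
```

For the four threads listed, 5056638 pps at 60 bytes works out to 2427186240 bps, and the four rates sum to 20310447 pps. That is in line with the roughly 20 Mpps drop rate reported by the receiver.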
  21. 19 Jul, 2016 1 commit
  22. 14 Jul, 2016 1 commit