IPv6 Neighbor Discovery Responder for KVM VPS

Junxiao Shi

2021-04-10

I Want IPv6 for Docker

I'm playing with Docker these days, and I want IPv6 in my Docker containers. The best guide for enabling IPv6 in Docker is how to enable IPv6 for Docker containers on Ubuntu 18.04. The first method in that article assigns private IPv6 addresses to containers, and uses IPv6 NAT similar to how Docker handles IPv4 NAT. I quickly got it working, but I noticed an undesirable behavior: Network Address Translation (NAT) changes the source port number of outgoing UDP datagrams, even if there's a port forwarding rule for inbound traffic; consequently, a UDP flow with the same source and destination ports is being recognized as two separate flows.

$ docker exec nfd nfdc face show 262
    faceid=262
    remote=udp6://[2001:db8:f440:2:eb26:f0a9:4dc3:1]:6363
     local=udp6://[fd00:2001:db8:4d55:0:242:ac11:4]:6363
congestion={base-marking-interval=100ms default-threshold=65536B}
       mtu=1337
  counters={in={25i 4603d 2n 1179907B} out={11921i 14d 0n 1506905B}}
     flags={non-local permanent point-to-point congestion-marking}
$ docker exec nfd nfdc face show 270
    faceid=270
    remote=udp6://[2001:db8:f440:2:eb26:f0a9:4dc3:1]:1024
     local=udp6://[fd00:2001:db8:4d55:0:242:ac11:4]:6363
   expires=0s
congestion={base-marking-interval=100ms default-threshold=65536B}
       mtu=1337
  counters={in={11880i 0d 0n 1498032B} out={0i 4594d 0n 1175786B}}
     flags={non-local on-demand point-to-point congestion-marking}

The second method in that article allows every container to have a public IPv6 address. It avoids NAT and the problems that come with it, but requires the host to have a routed IPv6 subnet. However, routed IPv6 is hard to come by on KVM servers, because virtualization platform such as Virtualizor does not support routed IPv6 subnets, but can only provide on-link IPv6.

On-Link IPv6 vs Routed IPv6

So what's the difference between on-link IPv6 and routed IPv6, anyway? It differs in how the router at the previous hop is configured to reach a destination IP address.

Let me explain in IPv4 terms first:

|--------| 192.0.2.1/24       |--------| 198.51.100.1/24    |-----------|
| router |--------------------| server |--------------------| container |
|--------|       192.0.2.2/24 |--------|    198.51.100.2/24 |-----------|
            (192.0.2.16-23/24)    |
                                  | 192.0.2.17/28           |-----------|
                                  \-------------------------| container |
                                              192.0.2.18/28 |-----------|

The server has on-link IP address 192.0.2.2.
- The router knows this IP address is on-link because it is in the 192.0.2.0/24 subnet that is configured on the router interface.
- To deliver a packet to 192.0.2.2, the router sends an ARP query of 192.0.2.2 to learn the server's MAC address, which should be responded by the server.
The server has routed IP subnet 198.51.100.0/24.
- The router must be configured to know: 198.51.100.0/24 is reachable via 192.0.2.2.
- To deliver a packet to 198.51.100.2, the router first queries its routing table and finds the above entry, then sends an ARP query to learn the MAC address of 192.0.2.2 which should be responded by the server, and finally delivers the packet to the learned MAC address.
The main difference is what IP address is enclosed in the ARP query:
- If the destination IP address is an on-link IP address, the ARP query contains the destination IP address itself.
- If the destination IP address is in a routed subnet, the ARP query contains the nexthop IP address, as determined by the routing table.
If I want to assign an on-link IPv4 address (e.g. 192.0.2.18/28) to a container, the server should be made to answer ARP queries for that IP address so that the router would deliver packets to the server, and then forwards these packets to the container.
- This technique is called ARP proxy, in which the server responds to ARP queries on behalf of the container.

The situation is a bit more complex in IPv6 because each network interface can have multiple IPv6 addresses, but the same concept applies. Instead of Address Resolution Protocol (ARP), IPv6 uses Neighbor Discovery Protocol that is part of ICMPv6. A few terminology differs:

IPv4	IPv6
ARP	Neighbor Discovery Protocol (NDP)
ARP query	ICMPv6 Neighbor Solicitation
ARP reply	ICMPv6 Neighbor Advertisement
ARP proxy	NDP proxy

If I want to assign an on-link IPv6 address to a container, the server should respond to neighbor solicitations for that IP address, so that the router would deliver packets to the server. After that, the server's Linux kernel could route the packet to the container's bridge, as if the destination IPv6 address was in a routed subnet.

NDP Proxy Daemon to the Rescue, I Hope?

ndppd, or NDP Proxy Daemon, is a program that listens for neighbor solicitations on a network interface and responds with neighbor advertisements. It is often recommended for dealing with the scenario when the server has only on-link IPv6 but we need a routed IPv6 subnet.

I installed ndppd on one of my servers, and it worked as expected with this configuration:

proxy uplink {
  rule 2001:db8:fbc0:2:646f:636b:6572::/112 {
    auto
  }
}

I can start up a Docker container with a public IPv6 address. It can reach the IPv6 Internet, and can be ping-ed from outside.

$ docker network create --ipv6 --subnet=172.26.0.0/16 \
  --subnet=2001:db8:fbc0:2:646f:636b:6572::/112 ipv6exposed
118c3a9e00595262e41b8cb839a55d1bc7bc54979a1ff76b5993273d82eea1f4

$ docker run -it --rm --network ipv6exposed \
  --ip6 2001:db8:fbc0:2:646f:636b:6572:d002 alpine

# wget -q -O- https://www.cloudflare.com/cdn-cgi/trace | grep ip
ip=2001:db8:fbc0:2:646f:636b:6572:d002

However, when I repeated the same setup on another KVM server, things didn't go well: the container cannot reach the IPv6 Internet at all.

$ docker run -it --rm --network ipv6exposed \
  --ip6 2001:db8:f440:2:646f:636b:6572:d003 alpine

/ # ping -c 4 ipv6.google.com
PING ipv6.google.com (2607:f8b0:400a:809::200e): 56 data bytes

--- ipv6.google.com ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss

What's Wrong with ndppd?

Why ndppd works on the first server, but does not work on the second server? What's the difference? We need to go deeper, so I turned to tcpdump.

On the first server, I see:

$ sudo tcpdump -pi uplink icmp6
19:13:17.958191 IP6 2001:db8:fbc0::1 > ff02::1:ff72:d002:
    ICMP6, neighbor solicitation, who has 2001:db8:fbc0:2:646f:636b:6572:d002, length 32
19:13:17.958472 IP6 2001:db8:fbc0:2::2 > 2001:db8:fbc0::1:
    ICMP6, neighbor advertisement, tgt is 2001:db8:fbc0:2:646f:636b:6572:d002, length 32

The neighbor solicitation from the router comes from a global IPv6 address.
The server responds with a neighbor advertisement from its global IPv6 address. Note that this address differs from the container's address.
IPv6 works in the container.

On the second server, I see:

$ sudo tcpdump -pi uplink icmp6
00:07:53.617438 IP6 fe80::669d:99ff:feb1:55b8 > ff02::1:ff72:d003:
    ICMP6, neighbor solicitation, who has 2001:db8:f440:2:646f:636b:6572:d003, length 32
00:07:53.617714 IP6 fe80::216:3eff:fedd:7c83 > fe80::669d:99ff:feb1:55b8:
    ICMP6, neighbor advertisement, tgt is 2001:db8:f440:2:646f:636b:6572:d003, length 32

The neighbor solicitation from the router comes from a link-local IPv6 address.
The server responds with a neighbor advertisement from its link-local IPv6 address.
IPv6 does not work in the container.

Since IPv6 has been working on the second server for IPv6 addresses assigned to the server itself, I added a new IPv6 address and captured its NDP exchange:

$ sudo tcpdump -pi uplink icmp6
00:29:39.378544 IP6 fe80::669d:99ff:feb1:55b8 > ff02::1:ff00:a006:
    ICMP6, neighbor solicitation, who has 2001:db8:f440:2::a006, length 32
00:29:39.378581 IP6 2001:db8:f440:2::a006 > fe80::669d:99ff:feb1:55b8:
    ICMP6, neighbor advertisement, tgt is 2001:db8:f440:2::a006, length 32

The neighbor solicitation from the router comes from a link-local IPv6 address, same as above.
The server responds with a neighbor advertisement from the target global IPv6 address.
IPv6 works on the server from this address.

In IPv6, each network interface can have multiple IPv6 addresses. When the Linux kernel responds to a neighbor solicitation in which the target address is assigned to the same network interface, it uses that particular address as the source address. On the other hand, ndppd transmits neighbor advertisements via a PF_INET6 socket and does not specify the source address. In this case, some complicated rules for default address selection come into play.

One of these rules is preferring a source address that has the same scope as the destination address (i.e. the router). On my first server, the router uses a global address, and the server selects a global address as the source address on its neighbor advertisement. On my second server, the router uses a link-local address, and the server selects a link-local address, too.

In an unfiltered network, the router wouldn't care where the neighbor advertisements come from. However, when it comes to a KVM server on Virtualizor, the hypervisor would treat such packets as attempted IP spoofing attacks, and drop them via ebtables rules. Consequently, the neighbor advertisement never reaches the router, and the router has no way to know how to reach the container's IPv6 address.

ndpresponder: NDP Responder for KVM VPS

I tried a few tricks such as deprecating the link-local address, but none of them worked. Thus, I made my own NDP responder that sends neighbor advertisements from the target address.

ndpresponder is a Go program using the GoPacket library.

The program opens an AF_PACKET socket, with a BPF filter for ICMPv6 neighbor solicitation messages.
When a neighbor solicitation arrives, it checks the target address against a user-supplied IP range.
If the target address is in the range used for Docker containers, the program constructs an ICMPv6 neighbor advertisement messages and transmits it through the same AF_PACKET socket.

A major difference from ndppd is that, the source IPv6 address on a neighbor advertisement message is always set to the same value as the target address of the neighbor solicitation, so that the message wouldn't be dropped by the hypervisor. This is made possible because I'm sending the message via an AF_PACKET socket, instead of the AF_INET6 socket used by ndppd.

ndpresponder operates similarly as ndppd in "static" mode. It does not forward neighbor advertisements to the destination subnet like ndppd does in its "auto" mode, but this feature isn't important on a KVM server.

If ndppd doesn't seem to work on your KVM VPS, give ndpresponder a try! Head to my GitHub repository for installation and usage instructions: https://github.com/yoursunny/ndpresponder

Join the discussion: LowEndSpirit LowEndTalk

Tags: Docker Go IPv6 Linux hosting