Exadata versus IPv6

Recently one of my customers received a complaint from their DNS administrators: our Exadatas were doing 40,000 DNS requests per minute. We like our DNS admins, so we had a look at these requests and what was causing them. I started by firing up tcpdump on one of the bonded client interfaces of a random compute node:

[root@dm01db01 ~]# tcpdump -i bondeth0 -s 0 port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bondeth0, link-type EN10MB (Ethernet), capture size 65535 bytes
15:41:04.937009 IP dm0101.domain.local.59868 > dnsserver01.domain.local:  53563+ AAAA? dm0101-vip.domain.local. (41)
15:41:04.937287 IP dm0101.domain.local.46672 > dnsserver01.domain.local:  44056+ PTR? 8.18.68.10.in-addr.arpa. (41)
15:41:04.938409 IP dnsserver01.domain.local > dm0101.domain.local.59868:  53563* 0/1/0 (116)
15:41:04.938457 IP dm0101.domain.local.56576 > dnsserver01.domain.local:  45733+ AAAA? dm0101-vip.domain.local.domain.local. (54)
15:41:04.939547 IP dnsserver01.domain.local > dm0101.domain.local.46672:  44056* 1/1/1 PTR dnsserver01.domain.local. (120)
15:41:04.940204 IP dnsserver01.domain.local > dm0101.domain.local.56576:  45733 NXDomain* 0/1/0 (129)
15:41:04.940237 IP dm0101.domain.local.9618 > dnsserver01.domain.local:  64639+ A? dm0101-vip.domain.local. (41)
15:41:04.941912 IP dnsserver01.domain.local > dm0101.domain.local.9618:  64639* 1/1/1 A dm0101-vip.domain.local (114)

So what are we seeing here? There are a bunch of AAAA requests to the DNS server and only one A record request. The strangest thing, of course, is the requests with the doubled domain name extension. If we zoom in on those AAAA record requests we see the following; here is the request:

15:41:04.937009 IP dm0101.domain.local.59868 > dnsserver01.domain.local:  53563+ AAAA? dm0101-vip.domain.local. (41)

And here is our answer:

15:41:04.938409 IP dnsserver01.domain.local > dm0101.domain.local.59868:  53563* 0/1/0 (116)

The interesting part is in the answer of the DNS server: with 0/1/0 the DNS server tells me that for this lookup it found 0 answer resource records, 1 authority resource record, and 0 additional resource records. So it could not resolve my VIP name in DNS. Now if we look at the A record request:

15:41:04.945697 IP dm0101.domain.local.10401 > dnsserver01.domain.local:  37808+ A? dm0101-vip.domain.local. (41)

and the answer:

15:41:04.947249 IP dnsserver01.domain.local > dm0101.domain.local.10401:  37808* 1/1/1 A dm0101-vip.domain.local (114)

Now by looking at the 1/1/1 in the answer we can see that I got 1 answer record in return (the first 1), so the DNS server does know the IP for dm0101-vip.domain.local when an A record is requested. What is going on here? The answer is simple: AAAA records are IPv6 DNS requests, and our DNS servers are not configured for IPv6 name resolution, so they reject these requests. And what about those weird double domain names like dm0101-vip.domain.local.domain.local? When Linux requests a DNS record, the following happens:

1. Linux issues a DNS request for dm0101-vip.domain.local; because IPv6 is enabled, it issues an AAAA request.
2. The DNS server is not configured for IPv6 requests and discards the request.
3. Linux retries the request, looks into resolv.conf, and appends the search domain; we now have dm0101-vip.domain.local.domain.local.
4. Once again, the DNS server discards this request.
5. Linux retries the AAAA request once more, appending the domain name again: dm0101-vip.domain.local.domain.local.domain.local.
6. The DNS server discards this AAAA request as well.
7. Linux now falls back to IPv4 and issues an A request: dm0101-vip.domain.local.
8. The DNS server understands this and replies.
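The retry sequence above can be sketched in Python. This is only a toy model of the resolver's search-list behavior, not glibc's actual implementation; `dns_server` is a hypothetical stub standing in for a DNS server that knows A records but rejects AAAA requests:

```python
# Toy model of the resolver retry behavior described in the steps above.
# The real logic lives in the libc resolver; this only illustrates how the
# search domain from /etc/resolv.conf gets appended on each failed AAAA
# lookup before the resolver falls back to an IPv4 (A record) request.

SEARCH_DOMAIN = "domain.local"   # the "search" entry in /etc/resolv.conf
MAX_RETRIES = 3                  # how often the search domain is appended

def lookup(name, record_type, dns_server):
    """Query, appending the search domain on each failure; if all AAAA
    attempts fail, fall back to an A record request for the bare name."""
    attempts = []
    query = name
    for _ in range(MAX_RETRIES):
        attempts.append((record_type, query))
        if dns_server(query, record_type):       # got an answer record back
            return query, attempts
        query = query + "." + SEARCH_DOMAIN      # retry with domain appended
    if record_type == "AAAA":                    # all IPv6 lookups failed
        resolved, more = lookup(name, "A", dns_server)
        return resolved, attempts + more
    return None, attempts

# Hypothetical DNS server: answers A record requests only, no AAAA (IPv6).
def dns_server(query, record_type):
    return record_type == "A" and query == "dm0101-vip.domain.local"

resolved, attempts = lookup("dm0101-vip.domain.local", "AAAA", dns_server)
for rt, q in attempts:
    print(rt, q)
# Three discarded AAAA requests (with 0, 1, and 2 extra search domains
# appended), then one successful A request -- matching the tcpdump output.
```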

This happens because Exadata comes with IPv6 enabled on both the InfiniBand and Ethernet interfaces:

[root@dm0101 ~]# ifconfig bond0;ifconfig bond1
bond0     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:192.168.100.1  Bcast:192.168.100.255  Mask:255.255.255.0
          inet6 addr: fe80::221:2800:13f:2673/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
          RX packets:226096104 errors:0 dropped:0 overruns:0 frame:0
          TX packets:217747947 errors:0 dropped:55409 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:320173078389 (298.1 GiB)  TX bytes:176752381042 (164.6 GiB)

bond1     Link encap:Ethernet  HWaddr 00:21:28:84:16:49  
          inet addr:10.18.1.10  Bcast:10.18.1.255  Mask:255.255.255.0
          inet6 addr: fe80::221:28ff:fe84:1649/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:14132063 errors:2 dropped:0 overruns:0 frame:2
          TX packets:7334898 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:2420637835 (2.2 GiB)  TX bytes:3838537234 (3.5 GiB)

[root@dm0101 ~]# 

Let’s disable IPv6; my client is not using IPv6 on its internal network anyway (like most companies, I assume). You can edit /etc/modprobe.conf to prevent the module from being loaded at boot time by adding the following 2 lines to modprobe.conf:

alias ipv6 off
install ipv6 /bin/true

Then add the entry below to /etc/sysconfig/network:

IPV6INIT=no

Reboot the host and let’s look at what we see after it is up again:

[root@dm0103 ~]# cat /proc/net/if_inet6
00000000000000000000000000000001 01 80 10 80       lo
fe8000000000000002212800013f111f 08 40 20 80    bond0
fe80000000000000022128fffe8e5f6a 02 40 20 80     eth0
fe80000000000000022128fffe8e5f6b 09 40 20 80    bond1
[root@dm0103 ~]# ifconfig bond0;ifconfig bond1
bond0     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:192.168.100.3  Bcast:192.168.100.255  Mask:255.255.255.0
          inet6 addr: fe80::221:2800:13f:111f/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
          RX packets:318265 errors:0 dropped:0 overruns:0 frame:0
          TX packets:268072 errors:0 dropped:16 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:433056862 (412.9 MiB)  TX bytes:190905039 (182.0 MiB)

bond1     Link encap:Ethernet  HWaddr 00:21:28:8E:5F:6B  
          inet addr:10.18.1.12  Bcast:10.18.1.255  Mask:255.255.255.0
          inet6 addr: fe80::221:28ff:fe8e:5f6b/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:10256 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5215 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1559169 (1.4 MiB)  TX bytes:1350653 (1.2 MiB)

[root@dm0103 ~]# 

So disabling the ipv6 module through modprobe.conf did not do the trick. What did bring up the IPv6 stack?

[root@dm0103 ~]# lsmod | grep ipv6
ipv6 291277 449 bonding,ib_ipoib,ib_addr,cnic

The InfiniBand stack brought up IPv6. We can disable IPv6 at the kernel level:

[root@dm0103 ~]# sysctl -a | grep net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 0
[root@dm0103 ~]# echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
[root@dm0103 ~]# sysctl -a | grep net.ipv6.conf.all.disable_ipv6 
net.ipv6.conf.all.disable_ipv6 = 1
[root@dm0103 ~]# cat /proc/net/if_inet6
[root@dm0103 ~]# 
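For the record, making this setting survive a reboot would normally mean putting it in /etc/sysctl.conf. Shown here only as a sketch, since, as we will see below, running an Exadata compute node without IPv6 turns out to be a bad idea:

```
# /etc/sysctl.conf -- disable IPv6 on all interfaces at boot (sketch only)
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
```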

Now that we are running this Exadata compute node without IPv6, we can check whether we still have InfiniBand connectivity. On a cell, start an ibping server and use ibstat to get the port GUIDs:

[root@dm01cel01 ~]# ibstat -p
0x00212800013ea3bf
0x00212800013ea3c0
[root@dm01cel01 ~]# ibping -S

On our IPv6-disabled host, start an ibping to one of those port GUIDs:

[root@dm0103 ~]# ibping -c 4 -v -G 0x00212800013ea3bf
ibwarn: [14476] ibping: Ping..
Pong from dm01cel01.oracle.vxcompany.local.(none) (Lid 6): time 0.148 ms
ibwarn: [14476] ibping: Ping..
Pong from dm01cel01.oracle.vxcompany.local.(none) (Lid 6): time 0.205 ms
ibwarn: [14476] ibping: Ping..
Pong from dm01cel01.oracle.vxcompany.local.(none) (Lid 6): time 0.247 ms
ibwarn: [14476] ibping: Ping..
Pong from dm01cel01.oracle.vxcompany.local.(none) (Lid 6): time 0.139 ms
ibwarn: [14476] report: out due signal 0

--- dm01cel01.oracle.vxcompany.local.(none) (Lid 6) ibping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 4001 ms
rtt min/avg/max = 0.139/0.184/0.247 ms
[root@dm0103 ~]# 

So we have InfiniBand connectivity; let’s see how Oracle reacts:

[root@dm0103 ~]# crsctl stat res -t

And now we play the waiting game… well, basically it never comes back. Looking at it with strace, it tries to read from 2 network sockets and hangs at:

[pid 15917] poll([{fd=3, events=POLLIN|POLLRDNORM}, {fd=4, events=POLLIN|POLLRDNORM}], 2, -1

Which points to 2 file descriptors it can’t read from:

[root@dm0103 ~]# ls -altr /proc/15917/fd
total 0
dr-xr-xr-x 7 root root  0 Feb  3 18:37 ..
lrwx------ 1 root root 64 Feb  3 18:37 4 -> socket:[3447070]
lrwx------ 1 root root 64 Feb  3 18:37 3 -> socket:[3447069]
lrwx------ 1 root root 64 Feb  3 18:37 2 -> /dev/pts/0
lrwx------ 1 root root 64 Feb  3 18:37 1 -> /dev/pts/0
lrwx------ 1 root root 64 Feb  3 18:37 0 -> /dev/pts/0
dr-x------ 2 root root  0 Feb  3 18:37 .
[root@dm0103 ~]# 

There is a dependency between IPv6 and CRS on Exadata: disabling IPv6 will cripple your clusterware. There is no real solution for this problem; we need IPv6 on an Exadata and can’t disable it. However, we can easily reduce the number of IPv6 DNS lookups by extending /etc/hosts, adding the hostnames, VIP names, etc. of all hosts in our cluster to the hosts file on every compute node. Unfortunately we can’t do this on our cell servers, because Oracle does not want us to go ‘messing’ with them, so there you have to live with it for now.
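As a sketch, the extra /etc/hosts entries for a two-node cluster would look something like this. The VIP addresses and the second node’s address here are hypothetical; the SCAN name is best left in DNS, since it should round-robin over multiple addresses:

```
# client network hostnames and VIPs (hypothetical addresses)
10.18.1.10   dm0101.domain.local       dm0101
10.18.1.11   dm0102.domain.local       dm0102
10.18.1.20   dm0101-vip.domain.local   dm0101-vip
10.18.1.21   dm0102-vip.domain.local   dm0102-vip
```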


7 thoughts on “Exadata versus IPv6”

  1. Interesting; I took a peek at a sample Exadata DB server and see the AAAA requests refused too, though more like 20/minute than 4000/minute. A quick look at MOS shows that an enhancement request (12748296) was submitted in 2011 to remove IPv6 by default. It was justified on the basis of security against rogue DHCPv6 servers (though Exadata doesn’t use DHCP afaik), and also on the high number of DNS lookups.

    However even 4000 DNS requests/minute is only 67/second, and especially if they return rejections, it’s very low overhead for a modern DNS server. If this is causing problems for your DNS, I’d think that there’s something seriously broken in your DNS. If you have to, you could even set up highly-available caching DNS servers to serve these Exadata requests.

    Cheers!

    Marc

    • Hi Marc,

      At this customer it was 40,000 DNS requests per minute, not 4000 :-) The unusually high number of DNS requests also comes from the fact that they use Exadata as a consolidation platform: they have 50 instances running per compute node. Besides the enhancement request there is also an unpublished bug about the same issue.

      The DNS requests are not causing real issues on the DNS server side; the admins just noticed the unusually high number of requests. The issue for me is not really the IPv6 DNS requests but the fact that disabling IPv6 (which was my first idea) is just generally a bad idea.

      • Ahh, got it. But like you say, even at 40k requests per minute, an ordinary server should be able to handle the load without trouble.

        Good to know about the IPv6 issues, especially since the enhancement request included procedures for disabling it. Have you seen MOS note 1259924.1, by the way? It says to set interface=ipv4 in $GRID_HOME/opmn/conf/ons.config if you have IPv6 disabled.

        Cheers!

        Marc

      • Being one of the friendly DNS administrators in this case: we see hardly any CPU load caused by this. DNS response time is around a millisecond, so server performance is not impacted; that would only be the case if the DNS requests remained unanswered and you ran into DNS client timeouts.

        Still, it would be nice to have the cause of this abundant pollution of the logging removed. If DNS issues were to arise in the future, they would be much harder to isolate in a flood of irrelevant requests.
        Also, in reporting on the utilization of our DNS environment, this obfuscates the health of the environment and of client requests.

  2. Pingback: Exadata versus IPv6 « ytd2525
