The One and the Many

DNS Lookup Failure in Go and The GNU C Library

Background

Recently I was writing a program to look up the IP addresses associated with an SPF record. I wrote it in Go, as it's my current favoured all-around language and I like to use it where I can.

I encountered a strange issue that I initially attributed to a bug in Go. I would try to look up the A record of a certain hostname, and my Go program would return the error "no such host".

I thought this was a bug in Go's library itself. Why? Because I could look up this host just fine using dig. I searched Go's bug tracker to see whether anyone else had run into the issue, and found one that, on the surface, sounded like what I had hit:

Issue 6336: net: ResolveIPAddr triggers glibc bug writing to wrong fd

I ended up putting this program away and writing Go off as immature in this area.

Investigation

One weekend I decided: well, even if it is a bug in Go, maybe I can fix it and contribute the fix upstream! So I set out to trace what was happening in more detail. (Before that, I looked for a bug report that matched what I was seeing more closely, since on review the report above did not describe exactly what I was encountering. I could not find one, which made me suspicious.)

(Note: my Go environment is built from a checkout of its source tree, based on the Go 1.3.3 release.)

The call that was giving me the "no such host" error was net.LookupIP(), which takes a single argument: the host to look up.

Reading Go's source, I found that this call eventually hands its main work to the C library function getaddrinfo(3) (in pkg/net/lookup_unix.go, lookupIP() calls cgoLookupIP()). This C library function was returning the error code EAI_NONAME (the node or service is not known).

Seeing this, I thought that maybe Go was calling the function in some unusual way that I did not understand.

I wrote a C program that called this function directly, emulating the parameters Go was passing to it. No matter what I did, the host would not resolve.
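A minimal sketch of that kind of direct test, assuming a POSIX system. The helper name lookup_host and the hints it uses (AF_UNSPEC, SOCK_STREAM) are my assumptions for illustration; they are not a reproduction of the exact parameters Go's cgo resolver passes. The function returns getaddrinfo's status unchanged so that EAI_NONAME can be told apart from other failures.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Resolve `host` with getaddrinfo(3) and store the number of results in
 * *count. Returns getaddrinfo's status code unchanged.
 * The hints below are assumptions for this sketch, not Go's exact hints.
 */
static int lookup_host(const char *host, int flags, int *count)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM; /* one result per address, not per socket type */
    hints.ai_flags = flags;

    *count = 0;
    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        /* Report the failure, flagging EAI_NONAME explicitly. */
        fprintf(stderr, "%s: %s%s\n", host, gai_strerror(rc),
                rc == EAI_NONAME ? " (EAI_NONAME)" : "");
        return rc;
    }
    for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next)
        (*count)++;
    freeaddrinfo(res);
    return 0;
}
```

Calling lookup_host("1._spf.citigroup.com", 0, &n) performs a real DNS lookup; with the glibc described here that is the call that came back with EAI_NONAME, though other C libraries or versions may behave differently. Passing AI_NUMERICHOST keeps the lookup local, which is a convenient way to exercise the error path without touching the network.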

So it seemed I had found a bug in glibc, which I was skeptical about: finding a bug there would be pretty surprising to me. Assuming I must be wrong, I downloaded my distribution's glibc source package and traced through getaddrinfo(3). Eventually I found that a character in the hostname was causing the lookup to be rejected.

Specifically, resolv/res_comp.c contains a function called res_hnok(), which is apparently used to determine whether a hostname is valid.

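It uses this macro (among others) to decide whether a character is valid at the start or end of a label; a second macro, middlechar(), accepts any borderchar() character or a hyphen for the positions in between: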

#define borderchar(c) (alphachar(c) || digitchar(c))

Now, the hostname I was testing with has an underscore in it. So I decided to test my theory (mostly because I found the glibc code hard to read with confidence). I changed the above macro to this:

#define underscorechar(c) ((c) == 0x5f)
#define borderchar(c) (alphachar(c) || digitchar(c) || underscorechar(c))

I then recompiled and installed glibc. Afterwards, my test program was able to look up the hostname just fine! (I did this on a Terminal image, as I did not want to risk reinstalling glibc on my local machine. I later found a way to test glibc without installing it globally, but this worked too.)
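To make the rule concrete, here is a standalone sketch in C of the check as I understand res_hnok() to apply it: the first and last character of each dot-separated label must be a "border" character (letter or digit), and interior characters may additionally be a hyphen. This is not glibc's code; the names hostname_ok, border_char and middle_char are hypothetical, and the allow_underscore flag models the patched borderchar() macro. Because the middle-character test is built on the border test, allowing "_" at the border also allows it inside a label, as in glibc.

```c
#include <stdbool.h>

/* Character classes modelled on the macros in glibc's resolv/res_comp.c. */
static bool alpha_char(int c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
static bool digit_char(int c) { return c >= '0' && c <= '9'; }

/* May this character start or end a label? */
static bool border_char(int c, bool allow_underscore)
{
    return alpha_char(c) || digit_char(c) || (allow_underscore && c == '_');
}

/* May this character appear in the interior of a label? */
static bool middle_char(int c, bool allow_underscore)
{
    return border_char(c, allow_underscore) || c == '-';
}

/*
 * Return true if `name` passes a res_hnok()-style hostname check.
 * allow_underscore == false models the stock rule; true models the
 * patched borderchar() macro.
 */
static bool hostname_ok(const char *name, bool allow_underscore)
{
    int prev = '.'; /* treat the start of the string as a label boundary */
    for (const char *p = name; *p != '\0'; p++) {
        int ch = (unsigned char)p[0];
        int next = (unsigned char)p[1];
        if (ch != '.') { /* dots only separate labels, so they are not checked */
            bool at_border = (prev == '.' || next == '.' || next == '\0');
            bool ok = at_border ? border_char(ch, allow_underscore)
                                : middle_char(ch, allow_underscore);
            if (!ok)
                return false;
        }
        prev = ch;
    }
    return true;
}
```

With allow_underscore false, "1._spf.citigroup.com" is rejected at the "_" that begins the "_spf" label; with it true the name passes, consistent with the patched glibc resolving it.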

As to whether this is a bug... I think it is not. I think glibc is being strict about what counts as a valid hostname, and that it is actually correct. However, the behaviour was quite surprising to me and not at all obvious.

It is surprising primarily because every other tool I tried (dig, PHP, Perl) was able to look up this hostname (though apparently some versions of Internet Explorer and Android reject it, as glibc does). Of the things I tested, only Go and this C function failed. Perhaps this could be made clearer with a separate error code for this situation, or with a mention in the documentation.
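One way the clearer error suggested above might look, as a purely hypothetical sketch: a wrapper that, when getaddrinfo(3) reports EAI_NONAME, checks whether the name contains characters outside letters, digits, hyphens and dots, and if so returns a distinct status. EAI_HYPOTHETICAL_BADNAME, refine_gai_error and describe_gai_error are invented for illustration; neither glibc nor POSIX defines them. The character scan is deliberately cruder than res_hnok(), since it ignores where in a label a character appears.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical status code; an arbitrary value chosen for this sketch only. */
#define EAI_HYPOTHETICAL_BADNAME 1000

/* True if every character is a letter, digit, hyphen or dot. */
static bool ldh_only(const char *name)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.";
    return name[strspn(name, allowed)] == '\0';
}

/*
 * Refine a getaddrinfo() status: if the lookup failed with EAI_NONAME and
 * the name contains characters outside the letter-digit-hyphen set, report
 * that the name itself looks invalid rather than simply "not known".
 */
static int refine_gai_error(const char *host, int rc)
{
    if (rc == EAI_NONAME && !ldh_only(host))
        return EAI_HYPOTHETICAL_BADNAME;
    return rc;
}

/* Human-readable message, falling back to gai_strerror() for real codes. */
static const char *describe_gai_error(int rc)
{
    if (rc == EAI_HYPOTHETICAL_BADNAME)
        return "name contains characters not allowed in a hostname";
    return gai_strerror(rc);
}
```

The wrapper leaves every other status untouched, so existing callers that only check for a non-zero result would keep working; only code that inspects the specific value would see the new case.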

Invalid hostname?

The hostname in this case was

1._spf.citigroup.com

It has multiple A records (which is what I wanted to know about). To rule out any issue with their nameservers, I also created a similar hostname on my own domain (freedns.afraid.org's web interface allowed it).

As you may be aware (though clearly I was not at the time), this is not technically a valid hostname. It is apparently valid as the name of records such as SRV or TXT records, since those names are domain names rather than hostnames.

What makes a valid hostname? Specifically, the name of an Internet host, not just any valid DNS name. Let's go to the RFCs!

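Restricting ourselves to the definition of a hostname (as opposed to an arbitrary DNS label), the hostname RFCs (RFC 952, and RFC 1123, which relaxed it to allow a label to begin with a digit) say nothing about "_" being a valid character: a hostname may contain only letters, digits, hyphens and the dots separating its labels.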

So it seems that glibc and Go are right after all!