The One and the Many

Socket timeouts and binary strings in Ruby

I recently wrote a small program in Ruby. This program makes DNS queries and reads and parses the answers. Rather than using a library, I implemented the protocol at the socket level and encoded and decoded the DNS messages myself. I had not yet done any binary or socket programming in Ruby prior to this project, so it was a learning experience. In this post I talk about enforcing timeouts on a TCP socket, and encoding to and decoding from binary strings.

I am not a Ruby expert, so there may be approaches superior to what I describe. I appreciate any feedback!

Networks and timeouts

One thing I've learned is that one should never trust there will be sane timeouts when working with networks. I have seen connections hang around for hours waiting on network I/O, so applying timeouts is important to avoid such things. I always think about whether an action I take should have a timeout. For network interaction a timeout is almost always essential.

My DNS program's purpose is debugging servers that behave bizarrely, so it is crucial it not get hung up waiting on a server for an inordinate amount of time.

How do we enforce a timeout on I/O? A common way is to use select(2) (or something similar like poll(2) or epoll(7)) to wait until the socket is ready, and then use non-blocking I/O. I found that this pattern is available in Ruby.

Connecting

The first step (well, second if you need to do name resolution!) is to connect to the server. In my program I use TCP, so there is a connection handshake step. We do not want to wait on this for too long! Happily there is a class method on Socket called Socket::tcp which lets us set a timeout when connecting.

In this example I try to connect to 127.0.0.1:53 with a 5 second timeout:

require 'socket'

ip = '127.0.0.1'
port = 53
local_host = nil
local_port = nil
timeout = 5
sock = Socket.tcp(ip, port, local_host, local_port,
                  connect_timeout: timeout)
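It helps to decide up front what happens when the connection attempt fails. Here is a minimal sketch. The helper name connect_with_timeout is my own invention, and I am assuming that a timed-out connect surfaces as Errno::ETIMEDOUT, which is what I have seen; check this against your Ruby version.

```ruby
require 'socket'

# Hypothetical helper: returns a connected Socket, or nil if the attempt
# timed out or the server refused the connection.
def connect_with_timeout(ip, port, timeout)
  Socket.tcp(ip, port, connect_timeout: timeout)
rescue Errno::ETIMEDOUT, Errno::ECONNREFUSED => e
  # Assumption: these are the two failures we care about. Other errors
  # (for example an unreachable host) still propagate to the caller.
  warn "connect to #{ip}:#{port} failed: #{e.class}"
  nil
end
```

Returning nil keeps the calling code simple when all it needs to know is whether the server was reachable in time.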

Writing

Once we're connected, we want to do something with the connection, such as reading and writing!

Socket is a subclass of BasicSocket, which is a subclass of IO. The IO class provides a class method for select, IO::select. Using this method we can read from and write to our socket while enforcing a timeout.

If we want to write data to the socket, we can use IO::select with IO#write_nonblock, like so:

# Wait until there is an IO object ready (to write in this case).
ios_ready = IO.select([], [sock], [], timeout)
if ios_ready.nil?
  # Timeout! Try again, or give up.
end

begin
  bytes_written = sock.write_nonblock(buf)
rescue IO::WaitWritable
  # Write would block (EWOULDBLOCK) or we should try again (EAGAIN).
  # Nothing was written in this case. We could go try again here!
end

# We may have a partial write. In that case, we should either try again to
# write the remainder, or give up.

I like to set the timeout parameter to 1 second, and call IO::select up to as many times as the number of seconds I want to wait in total. If IO::select times out or a write succeeds, I decrement the total time I'm willing to wait by 1. This means I will call IO::select up to 5 times if I have a 5 second timeout. The reason I do this is to account for partial writes. It may be my socket becomes ready for writing 1 second into my 5 second timeout, but I only succeed in writing a portion of my buffer. Should I give up? Or wait 5 seconds more? With the 1 second approach I can wait another second and try to write the rest, and so on. Something like this:

def send_with_timeout(sock, buf, timeout)
  # Out of time. Nothing (more) was written.
  if timeout <= 0
    return 0
  end

  # Wait up to 1 second for socket to be writable.
  rdy = IO.select([], [sock], [], 1)

  # Socket isn't ready within 1 second. Try again. Decrease how long we are
  # now willing to wait by 1 second.
  if rdy.nil?
    return send_with_timeout(sock, buf, timeout - 1)
  end

  # Try to write. May have a partial write.
  begin
    n = sock.write_nonblock(buf)
  rescue IO::WaitWritable
    # The write would block after all, so nothing was written this round.
    n = 0
  end

  # Wrote it all? Then we're done.
  if n == buf.bytesize
    return n
  end

  # Try again to write whatever is left. Decrease how long we will wait by 1
  # second. The remainder is the byte range [n, bytesize).
  newbuf = buf.byteslice(n...buf.bytesize)

  return n + send_with_timeout(sock, newbuf, timeout - 1)
end
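The one-second slices above only approximate the total wait, since a round that returns early still costs a full second of budget. A variation on the same idea is to track a deadline on the monotonic clock and pass IO::select whatever time remains. This is a sketch of that alternative, not what my program does, and write_until_deadline is a made-up name.

```ruby
require 'socket'

# Hypothetical helper: write as much of buf as possible before the deadline.
# Returns the number of bytes actually written.
def write_until_deadline(sock, buf, timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  written = 0

  while written < buf.bytesize
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    break if remaining <= 0

    # Wait only for as long as we have left. nil means select timed out,
    # in which case the next pass through the loop will hit the break.
    next if IO.select([], [sock], [], remaining).nil?

    begin
      # Send the part of the buffer that has not been written yet.
      written += sock.write_nonblock(buf.byteslice(written..-1))
    rescue IO::WaitWritable
      # Not writable after all. Loop around and select again.
    end
  end

  written
end
```

The caller compares the return value with buf.bytesize to tell a complete write from a timeout.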

Reading

To read from the socket, we can use a similar approach to the one we used when writing. Instead of IO#write_nonblock, use IO#read_nonblock:

# Wait for the socket to become readable
ios_ready = IO.select([sock], [], [], timeout)
if ios_ready.nil?
  # Timeout! Try again, or give up.
end

# Try to read.
begin
  buf = sock.read_nonblock(read_size)
rescue IO::WaitReadable
  # Read would block (EWOULDBLOCK) or we should try again (EAGAIN).
  # Call select() again, or give up.
end

Usually we will want to repeatedly call IO::select until we have exceeded the amount of time we want to wait, or until we have read however much we hope to read. It is useful to be able to know when the message ends so we can stop immediately when we see the end.

As with my write example, I like to call IO::select with a timeout of 1 second. If it succeeds or times out, I decrement from a total number of seconds I am willing to wait. This is to account for partial reads.
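Here is a minimal sketch of that read loop. The name read_with_timeout and the idea of asking for a fixed number of bytes are assumptions for the example; a real DNS reader would first read the length prefix and then ask for that many bytes.

```ruby
require 'socket'

# Hypothetical helper: read up to `want` bytes, spending at most `timeout`
# one-second rounds. Returns whatever was read, which may be fewer bytes
# than requested.
def read_with_timeout(sock, want, timeout)
  buf = String.new # empty binary (ASCII-8BIT) string

  while buf.bytesize < want && timeout > 0
    # Each round costs one second of the budget, whether or not it reads.
    timeout -= 1

    # Wait up to 1 second for the socket to become readable.
    next if IO.select([sock], [], [], 1).nil?

    begin
      buf << sock.read_nonblock(want - buf.bytesize)
    rescue IO::WaitReadable
      # Not readable after all. Select again on the next round.
    rescue EOFError
      # The peer closed the connection. Nothing more will arrive.
      break
    end
  end

  buf
end
```

Note the EOFError case: read_nonblock raises it at end of file, so a server that hangs up early ends the loop right away instead of using up the whole budget.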

Binary strings

It is important to be able to work with binary data. It is often necessary when working with network protocols and files. In my DNS program, I construct the messages in a binary format described in RFC 1035 (section 4).

Ruby supports this well. Here are a few methods I found particularly useful:

What if we have variables we want to encode into a binary string suitable to be sent to our socket? We can use Array#pack. This method is the opposite of String#unpack. You provide it with a format string describing how to convert each element in the array into binary. In this example I convert 15 and 2 to 16-bit unsigned integers in network (big-endian) byte order, which is what the 'n' directive means:

irb(main):014:0> [15, 2].pack('nn')
=> "\x00\x0F\x00\x02"

Using these String and Array methods it is easy to convert data to and from binary. This is a pattern you can see in other languages as well. For example, Perl also has a pack function!
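To show both directions together, here is a small sketch that packs a DNS header and unpacks it again. The header layout (six 16-bit fields) is from RFC 1035 section 4.1.1, and the two-byte length prefix used over TCP is from section 4.2.2. The ID and flag values are arbitrary ones picked for illustration.

```ruby
# Encode a DNS header: six 16-bit big-endian fields.
id      = 0x1234 # arbitrary example query ID
flags   = 0x0100 # only the RD (recursion desired) bit set
qdcount = 1      # one question; the three other counts are zero
header  = [id, flags, qdcount, 0, 0, 0].pack('n6')

# Over TCP, each message is prefixed with its length as a 16-bit integer.
framed = [header.bytesize].pack('n') + header

# Decode: String#unpack reverses Array#pack.
length = framed.unpack('n').first                  # => 12
fields = framed.byteslice(2, length).unpack('n6')  # => [4660, 256, 1, 0, 0, 0]
```

Using byteslice rather than ordinary indexing keeps the offsets in bytes, which is what matters when picking a binary message apart.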

Ruby's documentation

Doing this project exposed me to the state of Ruby documentation. I primarily relied on ruby-doc.org which apparently is the standard for the core and standard library (ruby-lang.org links to it).

A gripe I have with the documentation is understanding which errors/exceptions a function may raise. For example, consider the documentation for