Socket timeouts and binary strings in Ruby

January 12, 2017

I recently wrote a small program in Ruby. This program makes DNS queries and reads and parses the answers. Rather than using a library, I implemented the protocol at the socket level and encoded and decoded the DNS messages myself. I had not yet done any binary or socket programming in Ruby prior to this project, so it was a learning experience. In this post I talk about enforcing timeouts on a TCP socket, and encoding to and decoding from binary strings.

I am not a Ruby expert, so there may be approaches superior to what I describe. I appreciate any feedback!

Networks and timeouts

One thing I've learned is that one should never trust there will be sane timeouts when working with networks. I have seen connections hang around for hours waiting on network I/O, so applying timeouts is important to avoid such things. I always think about whether an action I take should have a timeout. For network interaction a timeout is almost always essential.

My DNS program's purpose is debugging servers that behave bizarrely, so it is crucial it not get hung up waiting on a server for an inordinate amount of time.

How do we enforce a timeout on I/O? A common way is to use select(2) (or something similar like poll(2) or epoll(7)) to wait until the socket is ready, and then use non-blocking I/O. I found that this pattern is available in Ruby.

Connecting

The first step (well, second if you need to do name resolution!) is to connect to the server. In my program I use TCP, so there is a connection handshake step. We do not want to wait on this for too long! Happily there is a class method on Socket called Socket::tcp which lets us set a timeout when connecting.

In this example I try to connect to 127.0.0.1:53 with a 5 second timeout:

require 'socket'

ip = '127.0.0.1'
port = 53
local_host = nil
local_port = nil
timeout = 5 # seconds
sock = Socket.tcp(ip, port, local_host, local_port,
                  connect_timeout: timeout)
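
What happens when the timeout expires? As far as I can tell, Socket.tcp raises Errno::ETIMEDOUT in that case. Here is a minimal sketch of wrapping the call so a failed connection becomes a nil return value. The helper name connect_with_timeout is hypothetical (it is not from my program), and it only handles two error classes; anything else, such as an unreachable host, still propagates.

```ruby
require 'socket'

# Hypothetical helper: returns a connected Socket, or nil if the connection
# attempt timed out or was refused. Other errors are left to propagate.
def connect_with_timeout(host, port, timeout)
  Socket.tcp(host, port, connect_timeout: timeout)
rescue Errno::ETIMEDOUT, Errno::ECONNREFUSED => e
  warn "connect to #{host}:#{port} failed: #{e.class}"
  nil
end

# Example use:
#   sock = connect_with_timeout('127.0.0.1', 53, 5)
#   abort 'could not connect' if sock.nil?
```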

Writing

Once we're connected, we want to do something with the connection, such as reading and writing!

Socket is a subclass of BasicSocket, which is itself a subclass of IO. I found that the IO class provides a class method for select, IO::select. Using this method we can read from and write to our socket while enforcing a timeout.

If we want to write data to the socket, we can use IO::select with IO#write_nonblock, like so:

# Wait until there is an IO object ready (to write in this case).
ios_ready = IO.select([], [sock], [], timeout)
if ios_ready.nil?
  # Timeout! Try again, or give up.
end

begin
  bytes_written = sock.write_nonblock(buf)
rescue IO::WaitWritable
  # Write would block (EWOULDBLOCK) or we should try again (EAGAIN).
  # We could go try again here! Remember to check for a partial write.
end

# We may have a partial write. In that case, we should either try again to
# write the remainder, or give up.

I like to set the timeout parameter to 1 second, and call IO::select up to as many times as the number of seconds I want to wait in total. If IO::select times out or a write succeeds, I decrement the total time I'm willing to wait by 1. This means I will call IO::select up to 5 times if I have a 5 second timeout. The reason I do this is to account for partial writes. It may be my socket becomes ready for writing 1 second into my 5 second timeout, but I only succeed in writing a portion of my buffer. Should I give up? Or wait 5 seconds more? With the 1 second approach I can wait another second and try to write the rest, and so on. Something like this:

def send_with_timeout(sock, buf, timeout)
  # Out of time: report that nothing more was written.
  return 0 if timeout <= 0

  # Wait up to 1 second for socket to be writable.
  rdy = IO.select([], [sock], [], 1)

  # Socket isn't ready within 1 second. Try again. Decrease how long we are
  # now willing to wait by 1 second.
  return send_with_timeout(sock, buf, timeout - 1) if rdy.nil?

  # Try to write. May have a partial write.
  begin
    n = sock.write_nonblock(buf)
  rescue IO::WaitWritable
    # Not writable after all. Count it as zero bytes written.
    n = 0
  end

  # Wrote it all? Then we're done.
  return n if n == buf.bytesize

  # Try again to write whatever is left, bytes [n, bytesize). Decrease how
  # long we will wait by 1 second.
  newbuf = buf.byteslice(n...buf.bytesize)

  n + send_with_timeout(sock, newbuf, timeout - 1)
end
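
A different way to handle partial writes is to track a single deadline instead of counting 1 second slices. This is a sketch of that alternative, not what my program does: the name write_with_deadline is made up, and it assumes a monotonic clock is available through Process.clock_gettime.

```ruby
require 'socket'

# Hypothetical helper: keep writing until all of buf is sent or the overall
# deadline passes. Returns the number of bytes actually written.
def write_with_deadline(sock, buf, timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  written = 0

  while written < buf.bytesize
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    break if remaining <= 0

    # Wait only as long as we have left. On timeout, loop and re-check.
    next if IO.select(nil, [sock], nil, remaining).nil?

    begin
      # Write whatever has not been sent yet. May be a partial write.
      written += sock.write_nonblock(buf.byteslice(written..-1))
    rescue IO::WaitWritable
      # Not writable after all. Go back to select.
    end
  end

  written
end
```

The caller compares the return value with buf.bytesize to find out whether everything was sent.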

Reading

To read from the socket, we can use a similar approach to the one we used when writing. Instead of IO#write_nonblock, use IO#read_nonblock:

# Wait for the socket to become readable
ios_ready = IO.select([sock], [], [], timeout)
if ios_ready.nil?
  # Timeout! Try again, or give up.
end

# Try to read.
begin
  buf = sock.read_nonblock(read_size)
rescue IO::WaitReadable
  # Read would block (EWOULDBLOCK) or we should try again (EAGAIN).
  # Call select() again, or give up.
end

Usually we will want to repeatedly call IO::select until we have exceeded the amount of time we want to wait, or until we have read however much we hope to read. It is useful to be able to know when the message ends so we can stop immediately when we see the end.

As with my write example, I like to call IO::select with a timeout of 1 second. If it succeeds or times out, I decrement from a total number of seconds I am willing to wait. This is to account for partial reads.
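
Knowing where the message ends is straightforward for DNS over TCP, because RFC 1035 (section 4.2.2) prefixes each message with a two byte length field. Here is a rough sketch of reading one message that way. The helper names read_exactly and read_dns_tcp_message are hypothetical, and I use a single deadline per call rather than 1 second slices to keep the example short.

```ruby
require 'socket'

# Hypothetical helper: read exactly size bytes, or return nil if the
# timeout (in seconds) expires or the peer closes the connection first.
def read_exactly(sock, size, timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  data = ''.b # binary buffer

  while data.bytesize < size
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    return nil if remaining <= 0

    # On a select timeout, loop so the deadline check above gives up.
    next if IO.select([sock], nil, nil, remaining).nil?

    begin
      data << sock.read_nonblock(size - data.bytesize)
    rescue IO::WaitReadable
      # Not readable after all. Go back to select.
    rescue EOFError
      return nil
    end
  end

  data
end

# Hypothetical helper: read one length-prefixed DNS message. The timeout
# applies to the prefix and to the body separately.
def read_dns_tcp_message(sock, timeout)
  prefix = read_exactly(sock, 2, timeout)
  return nil if prefix.nil?

  length = prefix.unpack('n').first
  read_exactly(sock, length, timeout)
end
```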

Binary strings

It is important to be able to work with binary data. It is often necessary when working with network protocols and files. In my DNS program, I construct the messages in a binary format described in RFC 1035 (section 4).

Ruby supports this well. Here are a few methods I found particularly useful:

String#bytes lets us access the individual bytes in a string. This is as opposed to accessing the characters in a string using String#[], where a single character may be made up of multiple bytes depending on the character and the encoding.
```
irb(main):001:0> a = "abc"
=> "abc"
irb(main):002:0> a.bytes
=> [97, 98, 99]
irb(main):003:0> b = "😀"
=> "😀"
irb(main):004:0> b.bytes
=> [240, 159, 152, 128]
irb(main):005:0> b[0]
=> "😀"
irb(main):006:0> b.bytes[0]
=> 240
```
String#bytesize tells us the length of the string in bytes. This is as opposed to relying on String#length which tells us how many characters are in the string.
```
irb(main):003:0> b = "😀"
=> "😀"
irb(main):005:0> b.length
=> 1
irb(main):006:0> b.bytesize
=> 4
```

String#byteslice lets us take a portion of the string. We can either provide a single index, two indices, or a range.

irb(main):003:0> b = "😀"
=> "😀"
irb(main):007:0> b.byteslice(1...b.bytesize)
=> "\x9F\x98\x80"
irb(main):008:0> b.byteslice(1...b.bytesize).bytes
=> [159, 152, 128]

String#unpack takes a string containing binary data and decodes it into an array of fields according to a template string you provide. For example, if we received a block of bytes from our socket containing two 16-bit unsigned integers, how do we pull them out? One way is to call String#unpack with n in the template: each n decodes one 16-bit unsigned integer in network (big-endian) byte order. There are many other template characters you can use to decode the data differently.
```
irb(main):012:0> a = "\x00\x0f\x00\x02"
=> "\u0000\u000F\u0000\u0002"
irb(main):013:0> a.unpack('nn')
=> [15, 2]
```

What if we have variables we want to encode into a binary string suitable to be sent to our socket? We can use Array#pack. This method is the opposite of String#unpack. You provide it with a template string describing how to convert each element in the array into binary. In this example I convert 15 and 2 to 16-bit unsigned integers in network byte order:

irb(main):014:0> [15, 2].pack('nn')
=> "\x00\x0F\x00\x02"

Using these String and Array methods it is easy to convert data to and from binary. This is a pattern you can see in other languages as well. For example, Perl also has a pack function!
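
To tie these methods back to DNS, here is a small sketch of building a query and decoding a header. It follows the layout in RFC 1035 section 4.1: a header of six 16-bit fields, then a question made of a name (length-prefixed labels ending in a zero byte), a type and a class. The function names are hypothetical, and this is a simplified illustration rather than the code from my program. It only builds A record queries in class IN and does no validation.

```ruby
# Encode a domain name as length-prefixed labels ending in a zero byte.
def encode_name(name)
  labels = name.split('.').map { |label| [label.bytesize, label].pack('Ca*') }
  labels.join.b + "\x00".b
end

# Build a query for an A record (type 1) in class IN (1).
def build_query(id, name)
  flags = 0x0100 # only the "recursion desired" bit set
  # Header fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
  header = [id, flags, 1, 0, 0, 0].pack('n6')
  question = encode_name(name) + [1, 1].pack('nn')
  header + question
end

# Decode the six header fields from the first 12 bytes of a message.
def decode_header(message)
  id, flags, qdcount, ancount, nscount, arcount = message.unpack('n6')
  { id: id, flags: flags, qdcount: qdcount,
    ancount: ancount, nscount: nscount, arcount: arcount }
end

query = build_query(0x1234, 'example.com')
header = decode_header(query)
puts "id=#{header[:id]} qdcount=#{header[:qdcount]} bytes=#{query.bytesize}"
```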

Ruby's documentation

Doing this project exposed me to the state of Ruby documentation. I primarily relied on ruby-doc.org which apparently is the standard for the core and standard library (ruby-lang.org links to it).

A gripe I have with the documentation is that it is hard to tell which errors/exceptions a method may raise. For example, consider the documentation for BasicSocket#shutdown. I discovered that it can raise exceptions only by running into them. Nothing in its documentation states what they can be. The documentation refers me to shutdown(2), so perhaps I should infer the errors from that manpage. Even so, I am not sure which constant Ruby would translate each error to (maybe there is a standard naming scheme to translate C errors to Ruby constants?). This appears to be the norm for the documentation. I often found myself wanting to know more than the documentation said about errors and return values. This might be a good area to contribute to Ruby!
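
One way to explore this without relying on the documentation is to trigger the error and inspect it. Ruby represents failed system calls as subclasses of SystemCallError under the Errno module, named after the C constant. The sketch below calls shutdown on a socket that was never connected. The helper name try_shutdown is hypothetical, and which error you get (I would expect Errno::ENOTCONN) depends on the platform.

```ruby
require 'socket'

# Hypothetical helper: returns the exception raised by shutdown, or nil if
# the call succeeded.
def try_shutdown(sock)
  sock.shutdown
  nil
rescue SystemCallError => e
  e
end

sock = Socket.new(:INET, :STREAM) # never connected
err = try_shutdown(sock)
puts "#{err.class} (errno #{err.errno})" if err
sock.close
```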

One thing I like about the documentation is that it is broken down by Ruby version. I developed using Ruby 2.3.3 and I viewed the documentation for this version through ruby-doc.org. For other languages, seeing documentation for a particular version is not always possible through the language's online documentation. My current favoured language, Go, does not offer this capability in its standard library documentation. I know it is possible to see the documentation from your locally installed version of Go, but it is often convenient to visit the language's website.
