Shedding some light on UUID version 4 in Ruby and Rails

Ruby's standard library and Rails's PostgreSQL adapter both generate version 4 UUIDs by default. In Rails migrations this can be changed by passing a different default:

default: 'uuid_generate_v1()'

Ruby's stdlib, by contrast, only supports version 4.

Another interesting difference is in the implementations:

In Ruby's stdlib the uuid method lives in SecureRandom:
  # SecureRandom.uuid generates a random v4 UUID (Universally Unique IDentifier).
  #
  # The version 4 UUID is purely random (except the version).
  # It doesn't contain meaningful information such as MAC addresses, timestamps, etc.
  #
  # See RFC 4122 for details of UUID.
  #
  def uuid
    ary = random_bytes(16).unpack("NnnnnN")
    ary[2] = (ary[2] & 0x0fff) | 0x4000
    ary[3] = (ary[3] & 0x3fff) | 0x8000
    "%08x-%04x-%04x-%04x-%04x%08x" % ary
  end

source-code
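To see the two bit masks at work, here is a minimal sketch that reads the version and variant back out of a generated UUID. The helper names uuid_version and rfc4122_variant? are made up for this post and are not part of SecureRandom.

```ruby
require "securerandom"

# Hypothetical helper: the version is the first hex digit of the third group,
# i.e. index 12 once the dashes are removed.
def uuid_version(uuid)
  uuid.delete("-")[12].to_i(16)
end

# Hypothetical helper: the RFC 4122 variant sets the top two bits of the
# fourth group to binary 10.
def rfc4122_variant?(uuid)
  (uuid.delete("-")[16].to_i(16) & 0b1100) == 0b1000
end

uuid = SecureRandom.uuid
puts uuid
puts uuid_version(uuid)     # 4
puts rfc4122_variant?(uuid) # true
```

Everything else in the string is random, which is why the fourth group always starts with 8, 9, a or b.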

The Rails version relies on PostgreSQL's uuid-ossp extension:
  # By default, this will use the +uuid_generate_v4()+ function from the
  # +uuid-ossp+ extension, which MUST be enabled on your database.
  def primary_key(name, type = :primary_key, options = {})
    return super unless type == :uuid
    options[:default] = options.fetch(:default, 'uuid_generate_v4()')
    options[:primary_key] = true
    column name, type, options
  end

source-code
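The interesting detail in that method is options.fetch: the v4 function is only used when no :default key is given at all. A small plain-Ruby sketch of just that option handling, with a made-up function name (uuid_primary_key_options is not a Rails API):

```ruby
# Hypothetical stand-in for the option handling shown above; it only mimics
# the Hash#fetch logic and does not touch a database.
def uuid_primary_key_options(options = {})
  options = options.dup
  options[:default] = options.fetch(:default, "uuid_generate_v4()")
  options[:primary_key] = true
  options
end

puts uuid_primary_key_options[:default]                               # uuid_generate_v4()
puts uuid_primary_key_options(default: "uuid_generate_v1()")[:default] # uuid_generate_v1()
puts uuid_primary_key_options(default: nil)[:default].inspect          # nil
```

Because fetch only falls back when the key is missing, an explicit default: nil is kept as nil instead of being replaced by the v4 function.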

The only real difference is how the random numbers are generated in each library, which is quite interesting to investigate. Let's start with the Ruby stdlib.

Ruby standard library implementation
module SecureRandom
  if defined? OpenSSL::Random
    def self.gen_random(n)
      @pid = 0 unless defined?(@pid)
      pid = $$
      unless @pid == pid
        now = Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)
        ary = [now, @pid, pid]
        OpenSSL::Random.random_add(ary.join("").to_s, 0.0)
        @pid = pid
      end
      return OpenSSL::Random.random_bytes(n)
    end
  else
    def self.gen_random(n)
      ret = Random.raw_seed(n)
      unless ret
        raise NotImplementedError, "No random device"
      end
      unless ret.length == n
        raise NotImplementedError, "Unexpected partial read from random device: only #{ret.length} for #{n} bytes"
      end
      ret
    end
  end
end

source-code

It starts with OpenSSL::Random, which can be described as:

OpenSSL cannot generate truly random numbers directly. The choices are to use a cryptographically secure PRNG with a good random seed (i.e. with OS harvested data from effectively random hardware events); or use a real hardware RNG. [1]

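If OpenSSL is not present, it falls back to Random.raw_seed. Despite living on the Random class, raw_seed does not draw from Ruby's modified Mersenne Twister (the PRNG behind Random.new, with a period of 2**19937-1); it returns bytes from the platform's random source, such as /dev/urandom.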
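Whichever branch is taken, the UUID formatting stays the same and only the source of the 16 bytes changes. A rough sketch of that idea follows; uuid_v4_from is a name I made up, and the sketch assumes Random.urandom is available (Ruby 2.5 and later, where Random.raw_seed was renamed), falling back to raw_seed otherwise.

```ruby
require "securerandom"

# Hypothetical helper: format 16 bytes from any source as a version 4 UUID,
# using the same masks as SecureRandom.uuid.
def uuid_v4_from(bytes)
  raise ArgumentError, "need exactly 16 bytes" unless bytes.bytesize == 16

  ary = bytes.unpack("NnnnnN")
  ary[2] = (ary[2] & 0x0fff) | 0x4000 # version 4 in the top nibble
  ary[3] = (ary[3] & 0x3fff) | 0x8000 # RFC 4122 variant in the top two bits
  "%08x-%04x-%04x-%04x-%04x%08x" % ary
end

# Bytes straight from the platform's random source.
os_bytes = Random.respond_to?(:urandom) ? Random.urandom(16) : Random.raw_seed(16)

puts uuid_v4_from(SecureRandom.random_bytes(16))
puts uuid_v4_from(os_bytes)
puts uuid_v4_from("\x00".b * 16) # 00000000-0000-4000-8000-000000000000
```

The all-zero input makes it easy to see that only six bits of the result are fixed.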

uuid-ossp extension implementation

Looking at the uuid-ossp source code, we find:

Datum
uuid_generate_v4(PG_FUNCTION_ARGS)
{
  return uuid_generate_internal(UUID_MAKE_V4, NULL, NULL, 0);
}

and the relevant part of uuid_generate_internal:

case 4:         /* random uuid */
default:
{
  #ifdef HAVE_UUID_E2FS
    uuid_t      uu;
    uuid_generate_random(uu);
    uuid_unparse(uu, strbuf);
  #endif
   break;
}

As we can see there's a call to uuid_generate_random - what can this be?

uuid_generate_random(3) - Linux man page

Its description:

The uuid_generate_random function forces the use of the all-random UUID format, even if a high-quality random number generator (i.e., /dev/urandom) is not available, in which case a pseudo-random generator will be substituted. Note that the use of a pseudo-random generator may compromise the uniqueness of UUIDs generated in this fashion.

It is interesting that if /dev/urandom is not available, this too falls back to a PRNG. The wording hides a nuance, or even a contradiction, since /dev/urandom is itself a csPRNG (cryptographically secure pseudorandom number generator) [3].
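The behaviour the man page describes can be pictured with a few lines of Ruby. This is only an illustration of the "try the device, otherwise substitute a PRNG" idea, not what libuuid actually does internally; random_bytes_with_fallback and its device parameter are invented for this sketch.

```ruby
# Hypothetical sketch: read n bytes from a random device, and substitute a
# non-cryptographic PRNG when the device cannot be read.
def random_bytes_with_fallback(n, device: "/dev/urandom")
  bytes = begin
    File.open(device, "rb") { |f| f.read(n) }
  rescue SystemCallError
    nil # device missing or unreadable
  end
  return bytes if bytes && bytes.bytesize == n

  Random.new.bytes(n) # Mersenne Twister: fine for uniqueness, not for secrets
end

puts random_bytes_with_fallback(16).unpack1("H*")
puts random_bytes_with_fallback(16, device: "/no/such/device").unpack1("H*")
```

Both calls return 16 bytes, and the caller cannot tell which source produced them, which is exactly why the man page adds its warning.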

Conclusions

This started as simple curiosity about which versions of UUID I could use. Note that this is not a critique of truly random generators vs PRNGs, as the RFC clearly states:

The version 4 UUID is meant for generating UUIDs from truly-random or pseudo-random numbers. [0]

which makes both implementations correct.

It makes sense for the uuid method to live in SecureRandom in Ruby, since it only implements version 4 and by default it uses OpenSSL::Random. The only minor gripe I have is that Ruby's stdlib has no general-purpose UUID library supporting all the versions [2].

Regarding the implementations, it should be noted that they differ, and if the quality of randomness is important one should investigate further. For example, when OpenSSL is missing Ruby falls back to Random.raw_seed, which at first look appears quite solid as it draws on /dev/urandom [6], though what backs it depends on the platform.

For UUIDs, both implementations should yield usable random UUIDs even if the library falls back to a non-csPRNG algorithm. In the end there's a small gotcha when it comes to avoiding UUID collisions [4]:

Distributed applications generating UUIDs at a variety of hosts must be willing to rely on the random number source at all hosts. If this is not feasible, the namespace variant should be used.
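To put a number on the collision risk, here is a back-of-the-envelope sketch using the birthday bound. It assumes all 122 random bits of a version 4 UUID are uniformly random on every host, which is precisely the assumption the RFC quote warns about; collision_probability is a name made up for this post.

```ruby
# A version 4 UUID has 128 bits, 6 of which are fixed (version and variant).
UUID_V4_SPACE = 2**122

# Birthday-bound estimate n(n-1)/(2N). It is an upper bound on the real
# probability and only a good approximation while the result is small.
def collision_probability(n)
  [Rational(n * (n - 1), 2 * UUID_V4_SPACE).to_f, 1.0].min
end

puts collision_probability(1_000_000_000)     # a billion UUIDs
puts collision_probability(1_000_000_000_000) # a trillion UUIDs
```

With a weak or badly seeded generator the effective space is much smaller than 2**122, so these figures are a best case rather than a guarantee.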

[0] - RFC-4122
[1] - Why OpenSSL can't use /dev/random directly? question
[2] - I usually use uuidtools
[3] - Myths about /dev/urandom
[4] - uuid-collisions
[5] - bonus read GoodPracticeRNG
[6] - source-code
