Is Rubinius 2.2.3 production ready?

Ruby 2.1 was released this Christmas, great news everyone! It sports a better GC (RGenGC, a generational GC), hierarchical method caching, some small syntax changes and non-experimental refinements. All in all, one can expect a 5% to 15% performance increase, which is quite awesome.

As I was reading the comments in the Hacker News thread about the release, one of them caught my eye: we need a JIT in the MRI Ruby VM. OK, but it looks like everyone forgets about Rubinius, which has for some time sported an LLVM-based JIT, native threads, a low-pause generational garbage collector and almost perfect support for C extensions.

Yes, we also have JRuby. It might just be the fastest implementation, and coupled with a server like TorqueBox it's even faster. The main issue: some gems with C extensions need to be swapped for JRuby-friendly alternatives, but as one can see later in the article, that's just old thinking at play; things are much better now, as most gems support JRuby without any issue.

So we have this awesome middle ground between Ruby VMs: it has a JIT and native threads, yet it supports C extensions without a problem (although, empirically at least, some more exotic gems might fail to install), and somehow it gets ignored?

The plan is simple: take one production Rails 4.0.2 app with all its dependencies, convert it to Rubinius, install Puma, do some benchmarking and then, if all is good, deploy to staging.

The setup

Rubinius has extracted most of the standard library into gems, so in order to properly boot any Ruby script that uses it, one needs this in the Gemfile:

gem 'racc'
gem 'rubysl'
gem 'puma'

Notes:

  • rubysl is the rather cryptic name of the Ruby Standard Library gem
  • racc, a LALR(1) parser generator maintained by tenderlove, is also a hard requirement for Rubinius; without it Rails won't boot
  • puma is the best server choice for Rubinius, as it is thread-based and can take advantage of native threads

Gems that won't work with Rubinius (I'll update this list if I find more):

# gem 'oj'

Remember to comment them out or again Rails won't boot up.
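If you switch VMs often, an alternative to commenting gems in and out is Bundler's platforms blocks, which install a gem only on the matching Ruby implementation. A sketch of what that could look like for this app; the gems are the ones mentioned in this post, while the rails version line and the exact grouping are my assumptions:

```ruby
# Gemfile (sketch): one file for MRI, Rubinius and JRuby.
source 'https://rubygems.org'

gem 'rails', '4.0.2' # assumed from the app described above
gem 'puma'

# Rubinius only: the extracted standard library plus racc.
platforms :rbx do
  gem 'racc'
  gem 'rubysl'
end

# MRI only: oj fails on Rubinius (see the list above).
platforms :mri do
  gem 'oj'
end

# :ruby covers C Ruby (MRI) and Rubinius, but not JRuby.
platforms :ruby do
  gem 'pg'
end

# JRuby needs the JDBC adapter instead of pg.
platforms :jruby do
  gem 'activerecord-jdbcpostgresql-adapter'
end
```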

For installing and switching VMs I'm using good old RVM, with the latest versions of Rubinius (2.2.3) and Ruby (2.1.0).

On to the benchmarks!

For obvious reasons I can't share the source code of the app; at some point I might create a public repository with something meaty for testing. These benchmarks cover one special case only (and were partly done for fun), so before making any decisions based on them, DO run your own tests first.

The Rails app is actually an API, so several of the usual controller modules are disabled (i.e. Streaming, DataStreaming, Rendering, RequestForgeryProtection), as is the Sprockets railtie.

ApacheBench config

I used plain ApacheBench, version 2.3. It's clearly not the ideal benchmarking tool (one should use siege or something similar), but for a quick glance like this one it does the job nicely.

The command to start it up:
ab -n400 -c16 -T'application/json' http://localhost:3000/entries

unicorn.rb on Ruby 2.1.0

# config/unicorn.rb
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
timeout 15
preload_app true

before_fork do |server, worker|
  # et cetera
end

The command to start it up:
unicorn_rails -c config/unicorn.rb -p 3000

Results after some runs:

Concurrency Level:      16
Time taken for tests:   6.769 seconds
Complete requests:      400
Failed requests:        0
Write errors:           0
Total transferred:      3611600 bytes
HTML transferred:       3346800 bytes
Requests per second:    59.09 [#/sec] (mean)
Time per request:       270.766 [ms] (mean)
Time per request:       16.923 [ms] (mean, across all concurrent requests)
Transfer rate:          521.03 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.0      0       5
Processing:    49  267  36.8    270     331
Waiting:       44  266  36.8    269     330
Total:         49  267  36.3    270     331

Percentage of the requests served within a certain time (ms)
  50%    270
  66%    283
  75%    288
  80%    293
  90%    308
  95%    315
  98%    324
  99%    327
 100%    331 (longest request)

puma.rb on Rubinius 2.2.3

# config/puma.rb
threads 8,32
workers 1

preload_app!

on_worker_boot do
  # et cetera
end

The command to start it up:
puma -C config/puma.rb -b tcp://localhost:3000

Results after several runs (so that the JIT can do its magic):

Concurrency Level:      16
Time taken for tests:   9.383 seconds
Complete requests:      400
Failed requests:        0
Write errors:           0
Total transferred:      3590400 bytes
HTML transferred:       3346800 bytes
Requests per second:    42.63 [#/sec] (mean)
Time per request:       375.311 [ms] (mean)
Time per request:       23.457 [ms] (mean, across all concurrent requests)
Transfer rate:          373.69 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:    84  371 105.4    348     731
Waiting:       83  363 104.1    338     728
Total:         84  371 105.3    348     732

Percentage of the requests served within a certain time (ms)
  50%    348
  66%    390
  75%    431
  80%    458
  90%    526
  95%    571
  98%    640
  99%    683
 100%    732 (longest request)

puma.rb on JRuby 1.7.9

# config/puma.rb
threads 8,32

preload_app!

on_worker_boot do
  ActiveSupport.on_load(:active_record) do
    ActiveRecord::Base.establish_connection
  end
end

The command to start it up:
puma -C config/puma.rb -b tcp://localhost:3000

Gems that need replacement:

# gem 'pg'
gem 'activerecord-jdbcpostgresql-adapter'

Results after several runs (so that the JIT can do its magic):

Concurrency Level:      16
Time taken for tests:   4.019 seconds
Complete requests:      400
Failed requests:        0
Write errors:           0
Total transferred:      3590400 bytes
HTML transferred:       3346800 bytes
Requests per second:    99.53 [#/sec] (mean)
Time per request:       160.760 [ms] (mean)
Time per request:       10.048 [ms] (mean, across all concurrent requests)
Transfer rate:          872.42 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:    35  158  31.5    157     389
Waiting:       34  151  27.1    149     261
Total:         36  158  31.5    157     389

Percentage of the requests served within a certain time (ms)
  50%    157
  66%    166
  75%    173
  80%    177
  90%    189
  95%    204
  98%    232
  99%    260
 100%    389 (longest request)
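All of ab's summary lines derive from a few raw totals, which makes it easy to sanity-check a run. A small Ruby sketch that recomputes them for the three runs above; the formulas are my reading of how ab's reported figures relate to each other (not taken from ab's source), and AbRun is just a helper struct for this example:

```ruby
# Recomputes ApacheBench's derived summary lines from the raw totals of a run.
AbRun = Struct.new(:label, :seconds, :requests, :concurrency, :bytes, keyword_init: true) do
  # "Requests per second": completed requests / total time taken.
  def requests_per_second
    requests / seconds
  end

  # "Time per request (mean)": the average wait seen by each concurrent client.
  def ms_per_request
    concurrency * seconds * 1000.0 / requests
  end

  # "Time per request (mean, across all concurrent requests)".
  def ms_per_request_across_all
    seconds * 1000.0 / requests
  end

  # "Transfer rate": total bytes transferred, in Kbytes (1024 bytes) per second.
  def kbytes_per_second
    bytes / 1024.0 / seconds
  end
end

# Raw totals copied from the three ab reports above.
runs = [
  AbRun.new(label: "unicorn / MRI 2.1.0", seconds: 6.769, requests: 400, concurrency: 16, bytes: 3_611_600),
  AbRun.new(label: "puma / Rubinius 2.2.3", seconds: 9.383, requests: 400, concurrency: 16, bytes: 3_590_400),
  AbRun.new(label: "puma / JRuby 1.7.9", seconds: 4.019, requests: 400, concurrency: 16, bytes: 3_590_400)
]

runs.each do |r|
  puts format("%-22s %6.2f req/s  %6.1f ms/req  %5.1f ms/req (all)  %6.1f KB/s",
              r.label, r.requests_per_second, r.ms_per_request,
              r.ms_per_request_across_all, r.kbytes_per_second)
end
```

The small differences from ab's own figures (e.g. 270.76 vs 270.766 ms) come from ab using a more precise total time than the 6.769 s it prints.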

Conclusion

The poor Rubinius performance might be related to the racc gem, which can be really slow, as detailed in this GitHub thread.

14 req/s vs 60 req/s (I disabled caching, and the app produces lots of ActiveRecord objects, which is why the numbers are rather low) makes Rubinius, for now, not a good choice for this particular Rails app.

Update

Thanks to headius, I've revised the benchmarks:
- apparently my VM had access to only one core (hence the initially abysmal performance of Rubinius and JRuby); I bumped it to four
- updated all the benchmarks and added JRuby 1.7.9

   jruby 99.53 #################################
   cruby 59.09 ###################
rubinius 42.63 ##############

As the chart shows, JRuby wins by an impressive margin (roughly 1.7x the throughput of MRI and 2.3x that of Rubinius), and since it needed only one gem change, I think it deserves to be pushed to staging.
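The chart is easy to reproduce. A sketch that redraws it and prints each VM's throughput relative to MRI; the scale of one # per 3 req/s (truncated) is my guess at how the bars were drawn, and RESULTS and bar are names made up for this example:

```ruby
# Requests per second from the updated ab runs.
RESULTS = { "jruby" => 99.53, "cruby" => 59.09, "rubinius" => 42.63 }.freeze

# Assumed scale: one '#' per 3 req/s, rounded down.
def bar(rps, per_hash: 3.0)
  "#" * (rps / per_hash).floor
end

width = RESULTS.keys.map(&:length).max
RESULTS.each do |name, rps|
  puts "#{name.rjust(width)} #{format('%.2f', rps)} #{bar(rps)}"
end

# Throughput relative to MRI (cruby), the baseline.
baseline = RESULTS["cruby"]
RESULTS.each do |name, rps|
  puts format("%s: %.2fx MRI's throughput", name, rps / baseline)
end
```

With that scale the bars come out at 33, 19 and 14 characters.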

How to reduce the size of your VMs

As the host OS I use Windows 8 (just upgraded to 8.1). Why, one might ask? It's rather simple: Steam. After some dreadful searching and configuring, I managed to install the basics for writing and deploying code: Vim, Git, PuTTY and a decent console replacement, Console2 [0].

Of course those aren't enough, as 90% of my coding is *nix-dependent and I wouldn't run it on anything else. Actually, I'm still thinking of dropping Windows for something like Arch Linux [1]: I fiddled with it and enjoyed all the low-level stuff that just doesn't exist in Ubuntu Desktop.

Needless to say, I use lots of VMs. I currently have three images that I use daily: Ubuntu Desktop, Ubuntu Server and an old Windows XP (well, that one more monthly than daily). The problem with using just a few VMs for many projects is that it involves a lot of disk thrashing (e.g. git operations, creating and deleting lots of files).

What I did not realize was that my VM images were growing bigger by the day, while the disk space used inside the guests stayed mostly the same, since I kept deleting old files and cleaning up. For reference, these were the initial sizes:

  • Ubuntu desktop: 19.7 GB
  • Ubuntu server: 8.0 GB
  • Windows XP: 7.9 GB

While thrashing the SSD with all those writes and deletes, I kept expanding the virtual disks, even though the files were deleted inside the VM. The cause: deleting a file only marks its blocks as free without zeroing them, so compacting the images didn't yield any results.

The VirtualBox manual describes the --compact option of VBoxManage modifyhd like this: it compacts disk images, i.e. removes blocks that contain only zeroes. This shrinks a dynamically allocated image again, reducing the physical size of the image without affecting the logical size of the virtual disk. Compaction works both for base images and for diff images created as part of a snapshot.

The fix was clear: from time to time, one needs to zero out the free space inside the guest. After some quick research I found easy ways to do it on both Linux and Windows.
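To see why zeroing matters, here's a toy model in Ruby. It is my own simplification, not how VirtualBox actually stores or compacts a VDI: an image is a list of blocks, compaction throws away the all-zero ones, and deleting a file inside the guest changes nothing at the block level.

```ruby
# Toy model only: a dynamically allocated image as a list of fixed-size blocks.
ZERO_BLOCK = ("\0" * 4).b

def zero_block?(block)
  block.each_byte.all?(&:zero?)
end

# "Compacting" keeps just the blocks that hold at least one non-zero byte.
def compact(image)
  image.reject { |block| zero_block?(block) }
end

# Zeroing the free space overwrites every block the guest no longer uses.
def zero_free_blocks(image, free_indexes)
  image.each_with_index.map { |block, i| free_indexes.include?(i) ? ZERO_BLOCK : block }
end

image = ["data", "old!", "temp", "more"].map(&:b) # four allocated blocks
free = [1, 2] # the guest deleted the files in blocks 1 and 2, but deleting
              # only updates filesystem metadata: the old bytes stay put

puts compact(image).size                         # prints 4: nothing to reclaim
puts compact(zero_free_blocks(image, free)).size # prints 2: only live data remains
```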

On Linux

The first option uses a tool like secure-delete, tuned for a very fast run:

sudo apt-get install secure-delete
sfill -f -z -l -l -I -v /

Where:

-f  fast (and insecure mode): no /dev/urandom, no synchronize mode.
-z  last wipe writes zeros, not random data.
-l  lessens the security (use twice for total insecure mode)
-I  just wipe space, not inodes
-v  verbose mode.

Or a much simpler way:

sudo dd if=/dev/zero of=/bigemptyfile bs=4096k
sync
sudo rm -f /bigemptyfile

This fills all the free space with a file full of zeros, which is then deleted; dd stopping with a "No space left on device" error is expected here. I haven't tested this much and it might cause issues, but it is faster than the first option.
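For the curious, here's the same dd idea as a Ruby sketch. It makes the flush step explicit: if the file is deleted while its zeros are still sitting in the page cache, they may never be written to disk, which is why fsync is called before deleting (the shell equivalent is running sync before rm). The name zero_free_space and its max_bytes/chunk_size parameters are made up for illustration; max_bytes exists only so it can be tried safely.

```ruby
require "tmpdir"

# Fill the free space under +dir+ with zeros, flush them to disk, then delete
# the file so the space is free again, but zeroed. Hypothetical helper.
def zero_free_space(dir, max_bytes: nil, chunk_size: 4 * 1024 * 1024)
  path = File.join(dir, "bigemptyfile")
  chunk = ("\0" * chunk_size).b
  written = 0
  File.open(path, "wb") do |f|
    begin
      written += f.write(chunk) while max_bytes.nil? || written < max_bytes
    rescue Errno::ENOSPC
      # Expected when there's no cap: the disk is now full.
    end
    begin
      # Push the zeros out to disk before the file is deleted.
      f.fsync
    rescue Errno::ENOSPC
      # Some filesystems report the full disk here instead; nothing to do.
    end
  end
  written
ensure
  File.delete(path) if path && File.exist?(path)
end

# Safe demo: zero 1 MiB inside a throwaway directory and report the bytes written.
Dir.mktmpdir { |dir| puts zero_free_space(dir, max_bytes: 1024 * 1024, chunk_size: 64 * 1024) }

# The real thing would run as root inside the guest, e.g. zero_free_space("/").
```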

On Windows

On Windows XP, one just has to download the Sysinternals Suite [2] and run sdelete -z c: from the command prompt (-z zeroes the free space, c: being the drive to clean).

Once all of this is done, shut down the VMs and compact the virtual disks. With VirtualBox, just run VBoxManage modifyhd thedisk.vdi --compact from the console; with VMware, well, just click around until you find it.

OK, time for results:

  • Ubuntu desktop: 11.0 GB ~ -44%
  • Ubuntu server: 4.3 GB ~ -46%
  • Windows XP: 6.6 GB ~ -16%

That's a whopping 13.7 GB reclaimed on my main 128 GB SSD, which is also my system drive. Not bad, not bad at all!
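The savings arithmetic, spelled out in Ruby (sizes in GB, copied from the two lists above):

```ruby
# Image sizes in GB: [before, after] zeroing the free space and compacting.
SIZES = {
  "Ubuntu desktop" => [19.7, 11.0],
  "Ubuntu server"  => [8.0, 4.3],
  "Windows XP"     => [7.9, 6.6]
}.freeze

# Percentage of the original image size that was reclaimed.
def percent_saved(before, after)
  (before - after) / before * 100
end

total_saved = SIZES.values.sum { |before, after| before - after }

SIZES.each do |name, (before, after)|
  puts format("%-14s %4.1f GB -> %4.1f GB  (-%.0f%%)", name, before, after, percent_saved(before, after))
end
puts format("Total reclaimed: %.1f GB", total_saved)
```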

Credits:

Chunked transfer encoding in Rails (streaming)

Anyone who has written a little PHP knows what the flush() family of functions does. The ideal scenario for chunked transfer encoding [0] is a page with something costly to render, e.g. the three most recent articles on a blog. Why, one might ask?

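It's rather simple: in a normal request, where the server responds with a Content-Length header, the server has to render the whole page before it can send the first byte (it needs to know the length up front), so the browser can't start loading the assets et al. until the entire page has been rendered.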

With the Transfer-Encoding: chunked header, the server sends the rendered page back to the browser in chunks as they become ready. In the case of Rails, it starts with the layout and sends out the <head> part first, including the references to assets like JS and CSS.

It's clear how this helps rendering on the client side: the browser gets the first chunk containing the <head>, immediately starts loading the assets, and meanwhile waits for the rest of the response. Of course, modern browsers include lots of micro-optimizations that might already do something similar, but this still remains good practice.
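On the wire, chunked transfer coding is simple: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-size chunk marks the end of the body. A minimal Ruby sketch of encoding and decoding it; it does not handle trailers, which real parsers must, and the two page fragments are made-up examples:

```ruby
# Encodes body parts with HTTP/1.1 chunked transfer coding:
# "<size in hex>\r\n<data>\r\n" per chunk, then a zero-size last chunk.
def chunked_encode(parts)
  # Skip empty parts: a zero-size chunk would be read as the end of the body.
  chunks = parts.reject(&:empty?).map { |part| "#{part.bytesize.to_s(16)}\r\n#{part}\r\n" }
  chunks.join + "0\r\n\r\n"
end

# Decodes such a body back into its chunks (no trailer support).
def chunked_decode(body)
  body = body.b # work on bytes so sizes and offsets line up
  chunks = []
  pos = 0
  loop do
    line_end = body.index("\r\n", pos) or raise ArgumentError, "missing chunk-size line"
    size = body.byteslice(pos, line_end - pos).to_i(16) # to_i stops at any ";ext"
    break if size.zero?
    start = line_end + 2
    chunks << body.byteslice(start, size)
    pos = start + size + 2 # skip the data and its trailing CRLF
  end
  chunks
end

parts = [
  '<html><head><link rel="stylesheet" href="/application.css"></head>',
  "<body><!-- the three most recent articles --></body></html>"
]
wire = chunked_encode(parts)
puts wire.inspect
p chunked_decode(wire) == parts # prints true
```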

Implementation-wise, you just need to add something like this to your controller actions:

class YourController < ApplicationController
  def index
    @articles = Article.most_recent
    render stream: true
  end
  # other controller logic
end
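Under the hood, streaming relies on Rack allowing the response body to be any object that responds to each, so the server can write out each piece as soon as it's produced. A plain-Ruby sketch of that idea, not Rails' actual implementation; APP and the render_articles helper are hypothetical, and no rack gem is needed to run it:

```ruby
# Hypothetical stand-in for the costly part of the page (e.g. recent articles).
def render_articles
  sleep 0.1 # pretend this is slow database and template work
  "<ul><li>Article 1</li><li>Article 2</li><li>Article 3</li></ul>"
end

# A Rack-style app: [status, headers, body], where body only needs #each.
# The Enumerator is lazy, so the <head> chunk is yielded before
# render_articles even starts.
APP = lambda do |_env|
  body = Enumerator.new do |out|
    out << '<html><head><link rel="stylesheet" href="/application.css"></head><body>'
    out << render_articles
    out << "</body></html>"
  end
  [200, { "content-type" => "text/html" }, body]
end

# Play the server: "send" each chunk as soon as the body yields it.
started = Time.now
_status, _headers, body = APP.call({})
body.each do |chunk|
  puts format("+%.2fs  sent %d bytes", Time.now - started, chunk.bytesize)
end
```

The first chunk goes out right away, and only the later ones wait for the slow part.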

The latest version of Unicorn (4.x) supports chunked responses by default [1]. You can also tune the listener in your unicorn_config.rb with something like:

# :tcp_nopush  when true, prevents partial TCP frames from being sent out
#              (false here, so each chunk goes out as soon as it's written)
# :tcp_nodelay disables Nagle's algorithm on TCP sockets if true
# ENV["PORT"] is set by Heroku; fall back to 3000 when it's missing
port = Integer(ENV["PORT"] || 3000)
listen port, tcp_nopush: false, tcp_nodelay: true
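One thing to watch out for in this kind of port lookup: ENV["PORT"].to_i || 3000 never falls back to 3000, because nil.to_i is 0 and 0 is truthy in Ruby. A quick check (buggy_port and fixed_port are hypothetical helpers that take a hash in place of ENV):

```ruby
# Original pattern: nil.to_i is 0, and 0 is truthy, so `|| 3000` never applies.
def buggy_port(env)
  env["PORT"].to_i || 3000
end

# Apply the fallback before converting; Integer() also rejects junk like "abc".
def fixed_port(env)
  Integer(env["PORT"] || 3000)
end

p buggy_port({})                   # prints 0
p fixed_port({})                   # prints 3000
p fixed_port({ "PORT" => "5000" }) # prints 5000
```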

There are also some quirks when streaming in Rails, because the template rendering order is inverted [2]:

When streaming, rendering happens top-down instead of inside-out. Rails starts with the layout, and the template is rendered later, when its yield is reached.

tl;dr: use provide instead of content_for. With content_for, the layout keeps waiting in case more content_for calls append to the same block, which defeats the purpose of streaming, and multiple content_for calls concatenate their values.

There's also a “small” issue with the New Relic agent and Heroku: you need to disable browser instrumentation or else you'll get a blank page [3]. Thankfully, the fix is rather trivial:

# config/newrelic.yml
  browser_monitoring:
    # By default the agent automatically injects
    # the monitoring JavaScript into web pages
    # Turn this to false
    auto_instrument: false

There's also ActionController::Live, which can be used to build a simple Rails 4 chat application [4][5].

Marian Posăceanu


rubyist@okapistudio