Ruby 2.2.0 Preview 1 quick Rails benchmarks

Great news everyone! Ruby 2.2 preview 1 has been released! I'm really curious about the Incremental GC and Symbol GC, so let's run some quick Rails benchmarks on a normal Rails API app.

First off let's install the preview via RVM:

rvm install ruby-2.2.0-preview1

After fiddling around for about five minutes trying to find a part of the application that doesn't fail under the Preview, I settled on the simple /profiles endpoint, which just renders all profiles as JSON. Using the trusty wrk, I fired up a quick bench:

wrk -t10 -c10 -d20s http://localhost:8080/profiles

The results are as follows:

Ruby 2.1.2p95

Running 20s test @ http://localhost:8080/profiles
  10 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   255.02ms   25.10ms 372.80ms   67.61%
    Req/Sec     3.21      0.70     5.00     71.13%
  771 requests in 20.01s, 4.40MB read
Requests/sec:     38.53
Transfer/sec:    225.31KB
------------------------------
50%,252 ms
90%,285 ms
99%,328 ms
99.999%,372 ms

Ruby 2.2.0preview1

Running 20s test @ http://localhost:8080/profiles
  10 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   253.27ms   30.64ms 344.75ms   64.21%
    Req/Sec     3.34      0.70     5.00     89.63%
  786 requests in 20.02s, 4.49MB read
Requests/sec:     39.26
Transfer/sec:    229.60KB
------------------------------
50%,251 ms
90%,291 ms
99%,329 ms
99.999%,344 ms

I'm not really sure how to interpret them yet, but it seems that under the Preview we get a slight improvement, though within the margin of error. At this point I don't think this is the best benchmark for the Preview, as we don't use views and thus Rails won't bloat up the memory with Strings.
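To put "within the margin of error" into rough numbers, here is a back-of-the-envelope check using the figures from the two wrk runs above. It is a heuristic of my own (comparing the gap between mean latencies with the reported standard deviations), not a proper significance test.

```ruby
# Headline numbers copied from the two wrk runs above.
RUBY_21  = { rps: 38.53, latency_avg_ms: 255.02, latency_sd_ms: 25.10 }.freeze
RUBY_22P = { rps: 39.26, latency_avg_ms: 253.27, latency_sd_ms: 30.64 }.freeze

# Relative change from old to new, in percent.
def percent_change(old, new)
  (new - old) / old * 100.0
end

# Rough noise check: is the gap between the mean latencies smaller than
# the smaller of the two reported standard deviations?
def within_noise?(a, b)
  gap = (a[:latency_avg_ms] - b[:latency_avg_ms]).abs
  gap < [a[:latency_sd_ms], b[:latency_sd_ms]].min
end

puts format("throughput: %+.2f%%", percent_change(RUBY_21[:rps], RUBY_22P[:rps]))
puts format("latency:    %+.2f%%", percent_change(RUBY_21[:latency_avg_ms], RUBY_22P[:latency_avg_ms]))
puts(within_noise?(RUBY_21, RUBY_22P) ? "latency gap is inside one stdev" : "latency gap exceeds one stdev")
```

That works out to about +1.9% throughput and a 1.75 ms lower mean latency, against a standard deviation of 25 ms or more, so a couple more runs per VM would be needed before calling it a real improvement.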

On the memory usage side we have 65 MB vs 75 MB (Preview vs. 2.1), so in this scenario we clearly have a winner.

note: this was measured using OS X's Activity Monitor after wrk finished the benchmark, and it's the average size of the unicorn workers.
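Activity Monitor works, but a scriptable measurement is easier to repeat. A minimal sketch, assuming a Unix-like system where `ps -o rss= -p PID` prints the resident set size in kilobytes (the case for both OS X and Linux procps). The helper names are hypothetical, and so is the `pgrep` pattern in the usage comment: adjust it to however your unicorn workers show up in the process list.

```ruby
# Resident set size of one process in KB, as reported by ps.
# Raises ArgumentError if the process doesn't exist (ps prints nothing).
def rss_kb(pid)
  out = `ps -o rss= -p #{Integer(pid)}`
  Integer(out.strip)
end

# Average size in MB of a list of RSS readings given in KB.
def average_mb(sizes_kb)
  return 0.0 if sizes_kb.empty?
  sizes_kb.inject(0, :+) / sizes_kb.size.to_f / 1024
end

# Usage (hypothetical pattern, run after wrk finishes):
#   pids = `pgrep -f 'unicorn.*worker'`.split.map(&:to_i)
#   puts format("%.1f MB", average_mb(pids.map { |pid| rss_kb(pid) }))
```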

Issues

Bundler and all the gems installed without issue, but in some cases I got silent failures. The benchmarks were run on an actual working production Rails 4.0.x app with around 25 gems. Still, all of the gems installed and I could boot Rails with unicorn and benchmark the simpler endpoints, which is great.

Conclusion

TBD: this is a work in progress; I will update it with more information.

Improve Rails performance by adding a few gems

After working with Rails for some time you start nitpicking about how to improve it. This is the first in a series of articles on how to improve Rails's performance, even if only marginally.

I'll focus on a bunch of gems that speed up, in some cases considerably, small parts of Rails like HTML escaping, String#blank? and the JSON utils.

Benchmarking methodology

Methodology is a strong word for just running wrk a couple of times from the console, but I'm not searching for the holy grail here, just a rough idea.

I switched from the old Apache ab to wrk:

wrk is a modern HTTP benchmarking tool capable of generating significant
load when run on a single multi-core CPU.

wrk -t10 -c10 -d10s http://localhost:3000

This runs a benchmark for 10 seconds using 10 threads while keeping 10 HTTP connections open, which should suffice here (the runs below used 2 threads and 10 connections against /sidechannels/bench). Just remember to benchmark your actual app to see the real improvements.
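Since "a couple of times in the console" is the whole methodology, it helps to script the repetition and average the results. A minimal sketch, assuming the `wrk` binary is on your PATH and that its summary looks like the outputs in this article; `parse_wrk` and `bench` are hypothetical helper names, and the defaults mirror the 2-thread, 10-connection runs below.

```ruby
require 'open3'

# Unit suffixes wrk uses for latency, converted to milliseconds.
LATENCY_SCALE = { "us" => 0.001, "ms" => 1.0, "s" => 1000.0 }.freeze

# Extracts requests/sec and average latency (ms) from wrk's text summary.
def parse_wrk(output)
  rps = output[%r{Requests/sec:\s+([\d.]+)}, 1]
  latency = output.match(/Latency\s+([\d.]+)(us|ms|s)\b/)
  raise ArgumentError, "unrecognised wrk output" unless rps && latency

  { rps: rps.to_f, latency_ms: latency[1].to_f * LATENCY_SCALE.fetch(latency[2]) }
end

# Runs wrk several times against the same URL and averages the results.
def bench(url, runs: 3, threads: 2, connections: 10, duration: "10s")
  results = Array.new(runs) do
    out, status = Open3.capture2("wrk", "-t#{threads}", "-c#{connections}", "-d#{duration}", url)
    raise "wrk failed: #{status}" unless status.success?
    parse_wrk(out)
  end
  {
    rps: results.sum { |r| r[:rps] } / runs,
    latency_ms: results.sum { |r| r[:latency_ms] } / runs
  }
end

# Usage: p bench("http://localhost:3000/sidechannels/bench", runs: 5)
```

Passing the arguments to `Open3.capture2` as a list (instead of backticks) avoids shell quoting issues with URLs that contain `?` or `&`.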

The escape_utils gem

Speed up all HTML escaping via the lovely escape_utils gem. To use it in Rails, add an initializer that patches things up:

begin
  require 'escape_utils/html/rack' # to patch Rack::Utils
  require 'escape_utils/html/erb' # to patch ERB::Util
  require 'escape_utils/html/cgi' # to patch CGI
  require 'escape_utils/html/haml' # to patch Haml::Helpers
rescue LoadError
  Rails.logger.info 'Escape_utils is not in the gemfile'
end
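Before touching Rails it can be useful to time HTML escaping on its own. A rough sketch: the markup is a fragment of the test page below repeated 20 times (an arbitrary choice), `ERB::Util.html_escape` is the standard-library escaper the initializer above patches, and `EscapeUtils.escape_html` is only timed when the escape_utils gem is installed and still exposes it. Newer Ruby versions ship a C-based escaper in the standard library, so the gap may well be smaller today than in these Rails numbers.

```ruby
require 'erb'
require 'benchmark'

# escape_utils is optional here: time it only if the gem is installed.
begin
  require 'escape_utils'
rescue LoadError
  # not installed; only the standard-library escaper gets timed
end

# A fragment of the test page, repeated to make a bigger payload.
MARKUP = '<li><a href="/tags"><i class="ss-standard ss-tag"></i>tags</a></li>' * 20

# Ruby's standard-library HTML escaper (the ERB::Util that escape_utils/html/erb patches).
def stdlib_escape(str)
  ERB::Util.html_escape(str)
end

# Prints the wall-clock time for running the block `iterations` times.
def time_it(label, iterations)
  seconds = Benchmark.realtime { iterations.times { yield } }
  puts format("%-24s %.3fs", label, seconds)
end

ITERATIONS = 2_000
time_it("ERB::Util.html_escape", ITERATIONS) { stdlib_escape(MARKUP) }
if defined?(EscapeUtils) && EscapeUtils.respond_to?(:escape_html)
  time_it("EscapeUtils.escape_html", ITERATIONS) { EscapeUtils.escape_html(MARKUP) }
end
```

The outputs of the two escapers are not compared on purpose: some escape_utils versions also escape `/`, so the strings can legitimately differ.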

The logic to test it:

def escape_utils
  @escape_me = <<-HTML
    <body class="application articles_show">
      <!-- Responsive navigation
      ==================================================== -->
      <div class="container">
        <nav id="nav">
      <ul>
        <li><a href="/"><i class="ss-standard ss-home"></i>home</a></li>
        <li><a href="/home/about"><i class="ss-standard ss-info"></i>about</a></li>
        <li><a href="/contact"><i class="ss-standard ss-ellipsischat"></i>contact</a></li>
        <li><a href="/home/projects"><i class="ss-standard ss-fork"></i>projects</a></li>
        <li><a href="/tags"><i class="ss-standard ss-tag"></i>tags</a></li>
        <li><a href="/articles?query=code"><i class="ss-standard ss-search"></i>search</a></li>
      </ul>
    </nav>
    <a href="#" class="ss-standard ss-list" id="nav-toggle" aria-hidden="true"></a>
  HTML

  render inline: "Hello  world <%= @escape_me %>"
end

With standard Rails:

Running 10s test @ http://localhost:3000/sidechannels/bench
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    35.40ms    3.55ms  64.70ms   91.98%
    Req/Sec   142.19     11.68   164.00     83.12%
  2837 requests in 10.00s, 4.92MB read
Requests/sec:    283.61
Transfer/sec:    503.34KB

With the escape_utils gem:

Running 10s test @ http://localhost:3000/sidechannels/bench
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    34.06ms    3.89ms  63.92ms   89.10%
    Req/Sec   148.65     13.36   180.00     75.94%
  2960 requests in 10.00s, 5.46MB read
Requests/sec:    295.98
Transfer/sec:    558.72KB

The fast_blank gem

Under the impression that the blank? method is too slow? Say no more and just try the fast_blank gem!

fast_blank is a simple extension which provides a fast implementation of Active Support's String#blank? function

Just add gem 'fast_blank' to your Gemfile and it should speed up the String#blank? method quite nicely, as described in this article. For testing I added this code:

  def fast_blank_test
    n = 1000

    strings = [
      "",
      "\r\n\r\n  ",
      "this is a test",
      "   this is a longer test",
      "   this is a longer test
      this is a longer test
      this is a longer test
      this is a longer test
      this is a longer test"
    ]

    # bmbm runs a rehearsal pass before the measured one; results print to the server's stdout
    Benchmark.bmbm do |x|
      strings.each do |s|
        x.report("blank? (#{s.length} chars):") do
          n.times { s.blank? }
        end
      end
    end

    render nothing: true
  end

With standard Rails:

Running 10s test @ http://localhost:3000/sidechannels/bench
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.40s   207.72ms   1.58s    92.68%
    Req/Sec     3.10      2.11     6.00     53.66%
  69 requests in 10.01s, 33.08KB read
Requests/sec:      6.90
Transfer/sec:      3.31KB

With the fast_blank gem:

Running 10s test @ http://localhost:3000/sidechannels/bench
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.33s   179.56ms   1.41s    93.33%
    Req/Sec     3.07      0.80     4.00     40.00%
  72 requests in 10.00s, 34.52KB read
Requests/sec:      7.20
Transfer/sec:      3.45KB
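fast_blank is a C extension, but the semantics it speeds up are easy to state in plain Ruby. A minimal sketch, assuming Active Support's definition of a blank string (empty, or only characters matched by the `[[:space:]]` class); `blank_string?` is a hypothetical name, not Active Support's API.

```ruby
# Active Support treats a string as blank when it is empty or consists
# only of whitespace as matched by [[:space:]] (which in Ruby also
# covers Unicode whitespace for UTF-8 strings).
BLANK_RE = /\A[[:space:]]*\z/

# Hypothetical plain-Ruby equivalent of String#blank?.
def blank_string?(str)
  # the empty check is cheap and skips the regex for the most common case
  str.empty? || BLANK_RE === str
end

["", "\r\n\r\n  ", "this is a test", "   this is a longer test"].each do |s|
  puts "#{s.inspect.ljust(28)} #{blank_string?(s)}"
end
```

The regex has to scan every leading whitespace character, which is why a hand-written C loop like fast_blank's can win, especially on long strings with a lot of leading spaces.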

The oj gem

# oj gem
gem 'oj'
gem 'oj_mimic_json' # we need this for Rails 4.1.x

The test logic is simple: just serialize all articles into JSON:

class SidechannelsController < ApplicationController
  def oj
    render json: Article.all
  end
end

With standard Rails serializers:

Running 10s test @ http://localhost:3000/sidechannels/bench
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   108.37ms    5.12ms 134.90ms   83.33%
    Req/Sec    45.76      3.60    55.00     57.69%
  922 requests in 10.00s, 57.41MB read
Requests/sec:     92.17
Transfer/sec:      5.74MB

With oj gem:

Running 10s test @ http://localhost:3000/sidechannels/bench
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    78.06ms    4.43ms  92.83ms   81.31%
    Req/Sec    63.64      5.33    71.00     64.49%
  1277 requests in 10.00s, 79.83MB read
Requests/sec:    127.65
Transfer/sec:      7.98MB
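To see how much of that gain comes from the encoder itself, the JSON step can be timed outside Rails. A rough sketch: `ARTICLES` is a hypothetical stand-in for `Article.all` (plain hashes, so the `as_json` work Rails does on real models is skipped), and `Oj.dump(..., mode: :compat)` is only timed when the oj gem is installed.

```ruby
require 'json'
require 'benchmark'

begin
  require 'oj'
rescue LoadError
  # oj not installed; only the stdlib JSON encoder is timed
end

# Hypothetical stand-in for Article.all: an array of plain hashes.
ARTICLES = Array.new(500) do |i|
  { "id" => i, "title" => "Article #{i}", "body" => "lorem ipsum " * 50, "published" => i.even? }
end

def encode_stdlib(records)
  JSON.generate(records)
end

# :compat mode aims to produce the same JSON the json gem would.
def encode_oj(records)
  Oj.dump(records, mode: :compat)
end

n = 50
puts format("JSON.generate %.3fs", Benchmark.realtime { n.times { encode_stdlib(ARTICLES) } })
if defined?(Oj)
  puts format("Oj.dump       %.3fs", Benchmark.realtime { n.times { encode_oj(ARTICLES) } })
end
```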

Using jemalloc

OK, this is not really a gem; if you want to dig into it, do check out my gist. In initial testing it didn't yield much of a performance gain, at least for my use case.

note: it will be included by default in Ruby at some point.

update: do try the jemalloc gem by kzk:

gem install jemalloc

je -v rails s
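When trying allocators it's easy to end up benchmarking the wrong one, so a quick check helps. A heuristic sketch under two assumptions: a Ruby built with the `--with-jemalloc` configure option (added in Ruby 2.2, as far as I know) lists `-ljemalloc` in `RbConfig::CONFIG["MAINLIBS"]`, and a wrapper like `je` injects the library via `LD_PRELOAD` (Linux) or `DYLD_INSERT_LIBRARIES` (OS X). I haven't verified je's internals, so treat `:not_found` as inconclusive.

```ruby
require 'rbconfig'

# Heuristic check for jemalloc in the current Ruby process.
#   :linked    -> Ruby itself appears to be built against jemalloc
#   :preloaded -> a jemalloc library is injected via the loader env vars
#   :not_found -> neither hint is present (jemalloc may still be in use)
def jemalloc_hint(config = RbConfig::CONFIG, env = ENV)
  return :linked if config.fetch("MAINLIBS", "").include?("jemalloc")

  preload = env.values_at("LD_PRELOAD", "DYLD_INSERT_LIBRARIES").compact
  return :preloaded if preload.any? { |libs| libs.include?("jemalloc") }

  :not_found
end

puts jemalloc_hint
```

Dropping it into a Rails initializer (or `rails runner`) started via `je` is one way to confirm the preload actually took.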

Dig into your Rails app

Fear not and use MiniProfiler with the awesome FlameGraphs by Sam Saffron.

Conclusion

Depending on what your app does, you might want to add some of these gems to your Gemfile. I usually add them all just for good measure (though you might want to check your RAM usage and have a full test suite before doing this).

The oj gem is just great for a Rails-based JSON API where you can drop the views and just serialize using representers or your pattern of choice.

Is Rubinius 2.2.3 production ready?

Ruby 2.1 was released this Christmas, great news everyone! It sports a better GC (RGenGC, a generational GC), hierarchical method caching, some small syntax changes and non-experimental refinements. All in all one can expect a 5% to 15% performance increase, which is quite awesome.

As I was reading the comments in the Hacker News thread related to the release, one of them caught my eye: we need a JIT in the MRI Ruby VM. OK, but it looks like everyone forgets about Rubinius, which has sported for some time an LLVM-based JIT, native threads, a low-pause generational garbage collector and almost perfect support for C extensions.

Yes, we also have JRuby. It might just be the fastest implementation, and when coupled with a server like TorqueBox it's even faster. The main issue: some C libraries need to be swapped out, but as one can see later in the article this was just old thinking at play; things are clearly much better now, as most gems support JRuby without any issue.

So we have this awesome middle ground, so to speak, between Ruby VMs: it has a JIT and supports C extensions without a problem (though, empirically at least, some more exotic gems might fail to install), yet somehow it gets ignored?

The plan is simple: take one production Rails 4.0.2 app with all its dependencies, convert it to Rubinius, install Puma and do some benchmarking, then, if all is good, deploy to staging.

The setup

Rubinius extracted most of the standard library into gems, so in order to properly boot any Ruby script that uses them one needs this in the Gemfile:

gem 'racc'
gem 'rubysl'
gem 'puma'

Notes:

  • rubysl: a rather cryptic name for the Ruby standard library gem
  • racc: an LALR(1) parser generator maintained by tenderlove; it is also a hard requirement for Rubinius, or else Rails won't boot
  • puma: the best server choice for Rubinius, as it supports native threads

Gems that won't work with Rubinius (I'll update this if I find more):

# gem 'oj'

Remember to comment them out, or again Rails won't boot.

For installing and switching VMs I'm using good old RVM with the latest versions of Rubinius (2.2.3) and Ruby (2.1.0).

On to the benchmarks!

OK, for obvious reasons I can't share the source code of the app; at some point I might create a public repository with something meaty for testing. These benchmarks cover one special case only and are partly just for fun, so before making any decisions based on them DO TEST on your own first.

The Rails app is actually an API, so most of the non-API parts are disabled (e.g. the Streaming, DataStreaming, Rendering and RequestForgeryProtection modules), as is the sprockets railtie.

ApacheBench config

I used the simple ApacheBench, Version 2.3. It's clearly not the ideal tool for benchmarking (one should use siege or something similar), but for a quick glance like this test it fits the job nicely.

The command to start it up:
ab -n400 -c16 -T'application/json' http://localhost:3000/entries

unicorn.rb on Ruby 2.1.0

# config/unicorn.rb
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
timeout 15
preload_app true

before_fork do |server, worker|
  # et cetera
end

The command to start it up:
unicorn_rails -c config/unicorn.rb -p 3000

Results after some runs:

Concurrency Level:      16
Time taken for tests:   6.769 seconds
Complete requests:      400
Failed requests:        0
Write errors:           0
Total transferred:      3611600 bytes
HTML transferred:       3346800 bytes
Requests per second:    59.09 [#/sec] (mean)
Time per request:       270.766 [ms] (mean)
Time per request:       16.923 [ms] (mean, across all concurrent requests)
Transfer rate:          521.03 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.0      0       5
Processing:    49  267  36.8    270     331
Waiting:       44  266  36.8    269     330
Total:         49  267  36.3    270     331

Percentage of the requests served within a certain time (ms)
  50%    270
  66%    283
  75%    288
  80%    293
  90%    308
  95%    315
  98%    324
  99%    327
 100%    331 (longest request)
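ab's summary lines are all derived from three inputs (requests, concurrency and total time), which makes it easy to sanity-check a run. A small worked example with the unicorn numbers above; the tiny difference from ab's 270.766 ms comes from ab using its unrounded elapsed time. The last line is Little's law (requests in flight = throughput × latency), a standard queueing result, which should come out at the -c16 concurrency.

```ruby
# Inputs from the unicorn run above.
REQUESTS    = 400
CONCURRENCY = 16
TOTAL_SECS  = 6.769

# "Requests per second": completed requests / wall-clock time.
def requests_per_sec(requests, secs)
  requests / secs
end

# "Time per request (mean)": how long each of the concurrent clients waited per request.
def ms_per_request(requests, concurrency, secs)
  concurrency * secs * 1000.0 / requests
end

# "Time per request (mean, across all concurrent requests)": wall time / requests.
def ms_across_all(requests, secs)
  secs * 1000.0 / requests
end

rps = requests_per_sec(REQUESTS, TOTAL_SECS)
latency_ms = ms_per_request(REQUESTS, CONCURRENCY, TOTAL_SECS)

puts format("req/s:                 %.2f", rps)
puts format("time per request:      %.2f ms", latency_ms)
puts format("across all concurrent: %.2f ms", ms_across_all(REQUESTS, TOTAL_SECS))
puts format("requests in flight:    %.2f", rps * latency_ms / 1000.0)
```

Plugging in the Rubinius (9.383 s) and JRuby (4.019 s) totals from the runs below gives 42.63 and 99.53 req/s, matching what ab reports.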

puma.rb on Rubinius 2.2.3

# config/puma.rb
threads 8,32
workers 1

preload_app!

on_worker_boot do
  # et cetera
end

The command to start it up:
puma -C config/puma.rb -b tcp://localhost:3000

Results after several runs (so that the JIT can do its magic):

Concurrency Level:      16
Time taken for tests:   9.383 seconds
Complete requests:      400
Failed requests:        0
Write errors:           0
Total transferred:      3590400 bytes
HTML transferred:       3346800 bytes
Requests per second:    42.63 [#/sec] (mean)
Time per request:       375.311 [ms] (mean)
Time per request:       23.457 [ms] (mean, across all concurrent requests)
Transfer rate:          373.69 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:    84  371 105.4    348     731
Waiting:       83  363 104.1    338     728
Total:         84  371 105.3    348     732

Percentage of the requests served within a certain time (ms)
  50%    348
  66%    390
  75%    431
  80%    458
  90%    526
  95%    571
  98%    640
  99%    683
 100%    732 (longest request)

puma.rb on JRuby 1.7.9

# config/puma.rb
threads 8,32

preload_app!

on_worker_boot do
  ActiveSupport.on_load(:active_record) do
    ActiveRecord::Base.establish_connection
  end
end

The command to start it up:
puma -C config/puma.rb -b tcp://localhost:3000

Gems that need replacement:

# gem 'pg'
gem 'activerecord-jdbcpostgresql-adapter'
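If you'd rather keep a single Gemfile for all three VMs than comment gems in and out, Bundler's `platforms` blocks can do the switching. A sketch based only on the gems mentioned in this article; version pins beyond Rails are omitted, and `:rbx` is the platform name Bundler used for Rubinius at the time, so double-check it against your Bundler version.

```ruby
# Gemfile (sketch): one file for MRI, Rubinius and JRuby
source 'https://rubygems.org'

gem 'rails', '4.0.2'
gem 'puma'

# C-extension Postgres driver for the C-based VMs
platforms :mri, :rbx do
  gem 'pg'
end

# MRI-only gems (oj doesn't work on Rubinius, per above)
platforms :mri do
  gem 'oj'
  gem 'unicorn'
end

# Standard library gems Rubinius needs to boot Rails
platforms :rbx do
  gem 'racc'
  gem 'rubysl'
end

# JDBC adapter replaces pg on JRuby
platforms :jruby do
  gem 'activerecord-jdbcpostgresql-adapter'
end
```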

Results after several runs (so that the JIT can do its magic):

Concurrency Level:      16
Time taken for tests:   4.019 seconds
Complete requests:      400
Failed requests:        0
Write errors:           0
Total transferred:      3590400 bytes
HTML transferred:       3346800 bytes
Requests per second:    99.53 [#/sec] (mean)
Time per request:       160.760 [ms] (mean)
Time per request:       10.048 [ms] (mean, across all concurrent requests)
Transfer rate:          872.42 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:    35  158  31.5    157     389
Waiting:       34  151  27.1    149     261
Total:         36  158  31.5    157     389

Percentage of the requests served within a certain time (ms)
  50%    157
  66%    166
  75%    173
  80%    177
  90%    189
  95%    204
  98%    232
  99%    260
 100%    389 (longest request)

Conclusion

The poor Rubinius performance might be related to the racc gem, which might be really slow, as detailed in this GitHub thread.

14 req/s vs 60 req/s in my initial runs (I disabled caching, and the app produces lots of ActiveRecord objects, which is why the numbers are rather low) makes Rubinius, for now, not a good choice for this particular Rails app.

Update

Thanks to headius I've revised the benchmarks:
- apparently my VM had access to only one core (hence the initial abysmal performance of Rubinius and JRuby), so I bumped it to four
- updated all benchmarks and added JRuby 1.7.9

   jruby 99.53 #################################
   cruby 59.09 ###################
rubinius 42.63 ##############

As the chart above shows, JRuby is the winner by an impressive margin (99.53 vs 59.09 req/s for CRuby, roughly 68% more throughput), and since it needed only one gem change, I think it deserves to be pushed to staging.
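To regenerate the chart as the numbers change, here is a small sketch. The scale of one `#` per 3 req/s is my reading of the chart, not something stated above; with it the bar lengths come out at 33, 19 and 14. It also prints throughput relative to CRuby. `chart_line` and `relative_to` are hypothetical helper names.

```ruby
# Mean requests/sec from the three ab runs above.
RESULTS = { "jruby" => 99.53, "cruby" => 59.09, "rubinius" => 42.63 }.freeze
REQS_PER_MARK = 3.0 # one '#' per ~3 req/s

# Right-aligned label, the number, then a bar scaled by REQS_PER_MARK.
def chart_line(name, rps)
  format("%8s %.2f %s", name, rps, "#" * (rps / REQS_PER_MARK).floor)
end

# Percent difference of every VM against the chosen baseline.
def relative_to(baseline, results = RESULTS)
  base = results.fetch(baseline)
  results.transform_values { |rps| ((rps / base - 1) * 100).round(1) }
end

RESULTS.each { |name, rps| puts chart_line(name, rps) }
relative_to("cruby").each { |name, pct| puts format("%-8s %+.1f%% vs cruby", name, pct) }
```

That puts JRuby at about +68% and Rubinius at about -28% relative to CRuby on this app.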