Method of the Month 1: Ruby's sort vs. sort_by

Ruby’s Array has 2 methods for sorting: sort and sort_by. Both methods sort your array (obviously) but they do it in slightly different ways.

Background

This article is based on a talk I gave at the Ann Arbor Ruby Brigade in August. The idea for this talk was conjured up at eRubyCon during a discussion between Gayle Craig and me. We decided we would each give a brief talk about the differences between sort and sort_by at our next user group meetings. Amazingly, we both followed through.

sort

A Ruby array’s sort method is pretty straightforward. If called without a block, it uses each objects’ comparison operator (<=>) to return a new array with all the elements sorted. If called with a block, you may specify your own comparator as shown below:

[1, 4, 3, 7, 2, 9].sort                   # => [1, 2, 3, 4, 7, 9]
[1, 4, 3, 7, 2, 9].sort {|a, b| b <=> a}  # => [9, 7, 4, 3, 2, 1]

Note that the block for sort accepts two parameters. The return value of the block must be -1, 0, or 1. -1 indicates that the first item is less than the second, 0 indicates they are equal, and 1 indicates that the first item is greater than the second one.

sort_by

The sort_by method is slightly different. Its block only takes one parameter instead of two. Your array is then sorted by the results of each of the block calls. For example:

[1, 4, 3, 7, 2, 9].sort_by {|i| -i}             # => [9, 7, 4, 3, 2, 1]
%w{ abcd ab abc abcde }.sort_by {|i| i.length}  # => ["ab", "abc", "abcd", "abcde"]

This is accomplished using something called a Schwartzian Transform.

The Schwartzian Transform

All a schwartzian transform does is turn each record of your array into a tuple with the first element being the item on which you wish to sort. That array of tuples is then sorted. Since the first item in the tuple gets precedence, it is effectively the same as sorting the array by the results of the sort_by block. Once sorted, the first element of the tuple is removed, leaving your original array in sorted order. Below is an example implementation in pure ruby:

module Enumerable
  # This is essentially the implementation of sort_by
  # It's written in C in MRI, so it's a little faster
  def my_sort_by
    self.map {|i| [yield(i), i]}.sort.map {|j| j[1]}
  end
end

Why choose one over the other?

If you’re sorting a small array, it really doesn’t matter which sort method you choose. Use whichever is easiest to code and understand. For larger arrays, however, the performance differences between sort and sort_by can be substantial.

You generally want to use sort_by over sort whenever comparisons are relatively expensive. Note that in Ruby, method dispatch is often enough to count as an expensive comparison. Unless you’re sorting things that end up as C primitives in MRI (like an Integer) you’ll likely want to use sort_by.

The performance varies so greatly because of the Schwartzian Transform. Using sort, each pairwise comparison that must be performed calls the <=> operator. In the case of sort_by, the comparison “calculation” is only done once at the beginning, and then the array is sorted on an Integer primitive much more quickly.

I hope this was helpful to someone. It’s probably more than anyone has written on the difference between these two methods.

comments powered by Disqus