Method of the Month 1: Ruby's sort vs. sort_by
Ruby’s Array has 2 methods for sorting: sort
and sort_by
. Both
methods sort your array (obviously) but they do it in slightly different ways.
Background
This article is based on a talk I gave at the Ann Arbor Ruby
Brigade in August. The idea for this talk was
conjured up at eRubyCon during a discussion
between Gayle Craig and me. We
decided we would each give a brief talk about the differences between
sort
and sort_by
at our next user group meetings. Amazingly, we
both followed through.
sort
A Ruby array’s sort
method is pretty straightforward. If called
without a block, it uses each objects’ comparison operator (<=>
) to
return a new array with all the elements sorted. If called with a
block, you may specify your own comparator as shown below:
[1, 4, 3, 7, 2, 9].sort # => [1, 2, 3, 4, 7, 9]
[1, 4, 3, 7, 2, 9].sort {|a, b| b <=> a} # => [9, 7, 4, 3, 2, 1]
Note that the block for sort
accepts two parameters. The return
value of the block must be -1, 0, or 1. -1 indicates that the first
item is less than the second, 0 indicates they are equal, and 1
indicates that the first item is greater than the second one.
sort_by
The sort_by
method is slightly different. Its block only takes one
parameter instead of two. Your array is then sorted by the results of
each of the block calls. For example:
[1, 4, 3, 7, 2, 9].sort_by {|i| -i} # => [9, 7, 4, 3, 2, 1]
%w{ abcd ab abc abcde }.sort_by {|i| i.length} # => ["ab", "abc", "abcd", "abcde"]
This is accomplished using something called a Schwartzian Transform.
The Schwartzian Transform
All a schwartzian transform does is turn each record of your array
into a tuple with the first element being the item on which you wish
to sort. That array of tuples is then sorted. Since the first item
in the tuple gets precedence, it is effectively the same as sorting
the array by the results of the sort_by
block. Once sorted, the
first element of the tuple is removed, leaving your original array in
sorted order. Below is an example implementation in pure ruby:
module Enumerable
# This is essentially the implementation of sort_by
# It's written in C in MRI, so it's a little faster
def my_sort_by
self.map {|i| [yield(i), i]}.sort.map {|j| j[1]}
end
end
Why choose one over the other?
If you’re sorting a small array, it really doesn’t matter which sort
method you choose. Use whichever is easiest to code and understand.
For larger arrays, however, the performance differences between sort
and sort_by
can be substantial.
You generally want to use sort_by
over sort
whenever comparisons
are relatively expensive. Note that in Ruby, method dispatch is often
enough to count as an expensive comparison. Unless you’re sorting
things that end up as C primitives in MRI (like an Integer) you’ll
likely want to use sort_by
.
The performance varies so greatly because of the Schwartzian
Transform. Using sort
, each pairwise comparison that must be
performed calls the <=>
operator. In the case of sort_by
, the
comparison “calculation” is only done once at the beginning, and then
the array is sorted on an Integer primitive much more quickly.
I hope this was helpful to someone. It’s probably more than anyone has written on the difference between these two methods.