Subscribe to XML Feed
18 Nov 2009

Method of the Month 1: Ruby's sort vs. sort_by

Ruby’s Array has 2 methods for sorting: sort and sort_by. Both methods sort your array (obviously) but they do it in slightly different ways.

Background

This article is based on a talk I gave at the Ann Arbor Ruby Brigade in August. The idea for this talk was conjured up at eRubyCon during a discussion between Gayle Craig and me. We decided we would each give a brief talk about the differences between sort and sort_by at our next user group meetings. Amazingly, we both followed through.

sort

A Ruby array’s sort method is pretty straightforward. If called without a block, it uses each objects’ comparison operator (<=>) to return a new array with all the elements sorted. If called with a block, you may specify your own comparator as shown below:

[1, 4, 3, 7, 2, 9].sort                   # => [1, 2, 3, 4, 7, 9]
[1, 4, 3, 7, 2, 9].sort {|a, b| b <=> a}  # => [9, 7, 4, 3, 2, 1]

Note that the block for sort accepts two parameters. The return value of the block must be -1, 0, or 1. -1 indicates that the first item is less than the second, 0 indicates they are equal, and 1 indicates that the first item is greater than the second one.

sort_by

The sort_by method is slightly different. Its block only takes one parameter instead of two. Your array is then sorted by the results of each of the block calls. For example:

[1, 4, 3, 7, 2, 9].sort_by {|i| -i}             # => [9, 7, 4, 3, 2, 1]
%w{ abcd ab abc abcde }.sort_by {|i| i.length}  # => ["ab", "abc", "abcd", "abcde"]

This is accomplished using something called a Schwartzian Transform.

The Schwartzian Transform

All a schwartzian transform does is turn each record of your array into a tuple with the first element being the item on which you wish to sort. That array of tuples is then sorted. Since the first item in the tuple gets precedence, it is effectively the same as sorting the array by the results of the sort_by block. Once sorted, the first element of the tuple is removed, leaving your original array in sorted order. Below is an example implementation in pure ruby:

module Enumerable
  # This is essentially the implementation of sort_by
  # It's written in C in MRI, so it's a little faster
  def my_sort_by
    self.map {|i| [yield(i), i]}.sort.map {|j| j[1]}
  end
end

Why choose one over the other?

If you’re sorting a small array, it really doesn’t matter which sort method you choose. Use whichever is easiest to code and understand. For larger arrays, however, the performance differences between sort and sort_by can be substantial.

You generally want to use sort_by over sort whenever comparisons are relatively expensive. Note that in Ruby, method dispatch is often enough to count as an expensive comparison. Unless you’re sorting things that end up as C primitives in MRI (like an Integer) you’ll likely want to use sort_by.

The performance varies so greatly because of the Schwartzian Transform. Using sort, each pairwise comparison that must be performed calls the <=> operator. In the case of sort_by, the comparison “calculation” is only done once at the beginning, and then the array is sorted on an Integer primitive much more quickly.

I hope this was helpful to someone. It’s probably more than anyone has written on the difference between these two methods.

View Comments
07 Apr 2009

Walk Like a Duck

Introduction

In a duck-typed language like Ruby, it’s very important that you actually use duck typing. This is especially important when you’re designing a library or other code that could interact with objects that you don’t control. I have come across a few libraries lately that don’t follow duck typing conventions and have caused unexpected behavior when I’ve used them.

In this article, I’m going to pick on Shoulda. Thoughtbots, please do not take offense! I’m a fan, and I figured you guys could take it.

Duck Typing

it's a duck!

Before I get into the details, I’ll say a word or two about duck typing for those who may need a refresher. The principle of duck typing basically says that if something “walks like a duck and talks like a duck” it probably is a duck. What this really means is that when you’re interacting with other objects, you should not care what they are, but rather how they behave. If you have an object that responds to #each, who cares if it’s an Array or Set or a custom collection?

Implementing Duck Typing

You should rarely ever use #is_a? on an object. You should be using #respond_to? instead. The whole point in duck typing is that you don’t care what kind of object you’ve received. The only thing you care about is that the object does what you want it to do. Let me show you an example from Shoulda.

The following code is from lib/shoulda/assertions.rb in Shoulda’s git repository. assert_contains is a custom assertion for Shoulda that allows you to easily test a collection for the presence of a particular element.

def assert_contains(collection, x, extra_msg = "")
  collection = [collection] unless collection.is_a?(Array)
  msg = "#{x.inspect} not found in #{collection.to_a.inspect} #{extra_msg}"
  case x
  when Regexp
    assert(collection.detect { |e| e =~ x }, msg)
  else         
    assert(collection.include?(x), msg)
  end
end

If you look at the first line of the method definition, you’ll notice that they are calling is_a?(Array) on the “collection” this is passed into the function. The code I was writing was using a Set instead of an Array. The trouble is, if you pass anything other than an array, that line will wrap whatever you passed in an array. So I ended up with an Array with a Set inside of it, which caused the rest of the assertion code to fail.

The solution to this problem is very easy. If you look at my fork, all I’ve done is change that one line (and add some tests, of course). Now we only wrap the collection in an array if it doesn’t respond to include?. It’ll work with an Array still. It’ll work with a Set. It’ll work with your custom data structure that can detect the presence of objects. That’s the power of duck typing.

def assert_contains(collection, x, extra_msg = "")
  collection = [collection] unless collection.respond_to? :include?
  msg = "#{x.inspect} not found in #{collection.to_a.inspect} #{extra_msg}"
  case x
  when Regexp
    assert(collection.detect { |e| e =~ x }, msg)
  else         
    assert(collection.include?(x), msg)
  end
end

Conclusion

Whenever you’re checking for an actual class of an object rather than examining how it behaves, take a second to think if you can duck type it instead. It will make your code more generic, and make it easier for others to use.

Again, thank you to Thoughtbot for Shoulda and I’m sorry I singled you out. Shoulda is great, and just happened to be in the wrong place at the wrong time for me to start a rant about duck typing.

View Comments
17 Mar 2009

Automatically log in on merb-auth account creation

Recently I was adding new user registration to tasteb.in and I wanted my users to be automatically logged in as soon as they created their account. I’m using merb for tasteb.in, and consequently merb-auth for authentication. It turns out it’s really easy to implement, but it took me a while to get it right. All you need to do is add the following to your action that creates the new user:

session.user = @user

That’s it. All you must do is set the user in the session to the newly created user. I hope this saves somebody else some time.

View Comments
04 Mar 2009

Fix Missing Disqus Comment Form

I had a bit of trouble with Disqus where it would only display a link saying “Discuss this topic on Disqus” instead of displaying an actual comment form. It turns out that this problem was caused by a mismatch between the URL configured in Disqus and the url where my blog was actually accessible. So if you’re missing a Disqus comment form, make sure that the URL listed under Settings -> Website URL in Disqus is the actual URL where your site resides.

View Comments

Older Posts

tasteb.in is up! 01 Mar 2009 Comments
Play in the Sandbox 14 Feb 2009 Comments
How to learn Ruby 05 Feb 2009 Comments
Sinatra block parameters! 04 Feb 2009 Comments
Run specs with autotest, er, autospec! 26 Jan 2009 Comments
Gearing up for CodeMash 06 Jan 2009 Comments
Commit a linear git history to subversion 04 Jan 2009 Comments
Use Capistrano to set your production database password 03 Jan 2009 Comments
Resolutions 02 Jan 2009 Comments
Ruby Hoedown Videos Online 22 Aug 2008 Comments
Make RubyMate work with MacPorts Ruby 18 Jun 2008 Comments
Install Passenger Nginx From Scratch On Ubuntu 01 May 2008 Comments
Fix empty sake tasks 10 Feb 2008 Comments
Rails 2.0 is out 07 Dec 2007 Comments
Using other MySQL Storage Engines 03 Dec 2007 Comments
RubyInject 27 Nov 2007 Comments
shoulda 29 Sep 2007 Comments
Test behavior, not implementation! 21 Sep 2007 Comments
I can has patch? 18 Sep 2007 Comments
Caboose Sample App 10 Sep 2007 Comments
Railsify 07 Sep 2007 Comments
Massive list of Rails development tips from Peepcode 07 Sep 2007 Comments
Predefined models? No different than any others. 31 Aug 2007 Comments
Installing a Rails stack on Mac OS X with MacPorts 28 Aug 2007 Comments
First Post 27 Aug 2007 Comments