tl;dr: You can improve performance by storing false instead of null for timestamps you want to query in Mongo.

Soft deletion

It’s relatively common, at least for Rails developers, to have a column deleted_at on models to “soft delete” some data. It’s handy for cases like a customer writing in and complaining about something that they definitely didn’t delete and need back, right now, please. For the sake of this example, let’s say that we have the following model:

class Bonus
  include Mongoid::Document
  include Mongoid::Timestamps

  # nil means the bonus is live; a timestamp records when it was soft deleted
  field :deleted_at, type: DateTime

  index({ deleted_at: 1 })
end

When we mark a bonus as deleted, we simply store the current timestamp in the deleted_at field and then we know exactly when it was deleted. Very handy.
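
For illustration, the deletion itself might be wrapped in a small helper like this (soft_delete! is a sketch of ours, not something Mongoid provides):

class Bonus
  # ...

  # Record when the bonus was deleted instead of destroying the document.
  def soft_delete!
    update!(deleted_at: Time.current)
  end
end

bonus.soft_delete!
bonus.deleted_at # => the time the bonus was soft deleted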

The problem

We do queries that look like Bonus.where(deleted_at: nil) pretty frequently. You’d think this query would align well with our index, but Mongo’s schema-less nature means it has to match both documents where deleted_at is explicitly set to null and documents where deleted_at isn’t set at all.
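
To see this concretely, here’s a sketch using the raw driver collection (assumes an otherwise empty collection; not how you’d create these documents in practice):

# Both of these documents match the query { deleted_at: nil }:
Bonus.collection.insert_one(deleted_at: nil) # field present, explicitly null
Bonus.collection.insert_one({})              # field absent entirely

Bonus.where(deleted_at: nil).count # => 2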

This limits how effectively Mongo can use our index. MongoDB is tracking this issue, so hopefully this won’t be true for much longer.

Seeing the details with explain

We have about 13.7 million bonuses in our staging database. Let’s run explain to get some insight into how Mongo executes this query:

explanation = Bonus.where(deleted_at: nil).explain

This returns a large hash, but two of its values are particularly interesting. The first, explanation['queryPlanner']['winningPlan'], shows the plan Mongo chose to execute the query. It looks something like:

{
  "stage" => "FETCH",
  "filter" => {
    "deleted_at" => { "$eq" => nil }
  },
  "inputStage" => {
    "stage" => "IXSCAN",
    "keyPattern" => { "deleted_at" => 1 },
    "indexName" => "deleted_at_1",
    # ...
  }
}

We can verify here that Mongo is using our index to the best of its ability. Note the filter on the FETCH stage, though: because equality to null can’t be answered from the index keys alone, Mongo still has to re-check deleted_at on every document it fetches.

The other interesting value for our purposes is explanation['executionStats']['executionTimeMillis'], which, on our under-powered staging database, is a whopping 76,822 - nearly 77 seconds.
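
Pulling it out of the hash we captured above:

explanation['executionStats']['executionTimeMillis'] # => 76822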

Accommodating Mongo

By storing false instead of null in Mongo, however, we can see great performance improvements. In the Bonus model, let’s change the line declaring the deleted_at field to:

field :deleted_at, type: DateTime, default: false

We’ll also change our query to Bonus.where(deleted_at: false).

The immediate problem is that Mongoid coerces false into nil when storing it on a field with the type DateTime. We’ve made a gem that convinces Mongoid to allow storing false instead: mongoid-falsehoods. Add gem "mongoid-falsehoods" to your Gemfile, bundle, and you’ll be on your way.
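
That is:

# Gemfile
gem "mongoid-falsehoods"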

You’ll also need to migrate your existing data to have false instead of null:

Bonus.where(deleted_at: nil).update_all(deleted_at: false)

If you’re migrating a large data set, you’ll want to update your code to handle either nil or false values (e.g. Bonus.where(:deleted_at.in => [nil, false])) before running the migration, or you’ll see wonky results.
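
One way to handle that transition window is a scope that matches both representations (the not_deleted name here is our own, not something the gem provides):

class Bonus
  # ...

  # Transitional scope: matches documents still storing null as well as
  # those already migrated to false.
  scope :not_deleted, -> { where(:deleted_at.in => [nil, false]) }
end

Bonus.not_deleted.count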

Our results after migrating

The results we see on our staging server after this migration are as follows:

explanation = Bonus.where(deleted_at: false).explain
explanation['executionStats']['executionTimeMillis']

This returns 15,722 now. That’s an improvement of 61,100 ms - over a minute shaved off the query time. Great success.

Granted, in production we won’t be fetching that much data from the database at once, but you’ll still see nice improvements provided that you have the appropriate indexes set up.

Conclusion

The performance improvements possible by doing something that feels dirty - storing false where we really mean nil - are pretty great. I was very surprised that this was necessary, and the trick does still feel dirty, but... I’ll take the performance win. Hopefully Mongo soon makes indexes play nicer with null and this hack will no longer be necessary.

Go check out mongoid-falsehoods and definitely open a PR if there’s anything that we missed!