Beginner
This post is part of my Stepping Up Rails series.
Ahh, the elusive N+1 query. Perhaps you’ve heard of him from your engineering pit, cursed under the breath of engineers or beshrewed out load when such an engineer encounters an endpoint (or web request) that’s slow, slow, slow. The N+1 query is very often the culprit.
See, in web development, we like things fast, fast, fast. That is, we want our webpages to load really fast. (And typically that means 200ms or faster).
Let’s say you have a database of teachers and students. Your teachers have many students.
In Rails, this might be expressed as a Rails class Teacher
that has_many :students
.
Let’s load the students in the HAML via the association.
When viewed in the browser, it might look something like so:
Finally, if you examine the Rails console, you’ll see Rails makes 3 queries: 1) to load up the teacher, 2) to count the number of students (this is because I explicitly asked for the count of students) , and 3) it the loads all of the associated students in the final query.
Let me propose making this slightly more complex to demonstrate how to create an N+1 query. Remember, N+1 queries are not something we want. I’m going to show you how to create an N+1 query and then how to fix it.
Students have many report_cards
. In this simple example, we’re going to be querying for the last report card for each student.
We have some HAML that looks like this:
Here I want to show an unordered list of the name of the students and the last report card for each student.
When we run this, we’ll see this in our browser:
In our console, we will see:
Take a look at that nasty N+1 query. See how repetitive it is? For these seven records, it might seem fast now. But as your app grows, your database server will strain and struggle to keep up.
Won’t Fly With N+1 Queries
While it might look trivial now, those “ReportCard Load
” lines are telling you something important: Your app won’t scale with N+1 queries.
As a result of the object relational mapping (ORM) we are doing with Active Record, we will do something called pre-fetching. That’s when we tell Rails to fetch all of the ReportCard
s we will need to display quickly in one single query.
As you can see from the query, without telling Rails to side-load the related data, Rails must loop through the Student results as it fetches each of the “last” report cards for the student, causing it to query 8 times, 7 times for the report cards and once for the student (thus, “n + 1”). In fact, it does it the other way around, but we call the phenomenon “n+1” and not “1+n”.
As the developer you must know N+1 Queries
Using an ActiveRecord helper method called :includes(:____)
we will chain this instruction to load the associated objects onto our ActiveRecord query.
(Do not confuse this with the ruby command include
[singular], which is to load a Ruby file other than the one you are executing—they are unrelated: include is Ruby and looks like this
include ‘byebug’
.includes
, on the other hand (notice the dot ‘.’) comes after an Active-Relation (also known as an “AREL”), like so:
@teacher.students.includes(:report_cards)
This directive tells Active Record to smartly fetch the student data in “as few queries as possible.” You will note I’ve put this in quotation marks because Rails actually does some advanced stuff under the hood to choose one of two loading strategies: JOINing the table in the original query or pre-fetching when the data is needed in a single, composite query for all of the associated Students.
Notice in this example that Rails makes precisely two queries to the database (and remember, each query has overhead, or a time that it takes for the webserver to communicate with the database server and for the database server to process the request): once for the students and then again for all of the report cards.
SELECT `students`.* FROM `students` WHERE `students`.`teacher_id` = 4 [["teacher_id", 4]]
ReportCard Load (0.6ms) SELECT `report_cards`.* FROM `report_cards` WHERE `report_cards`.`student_id` IN (22, 23, 24, 25, 26, 27, 28)
Boom! Your N+1 query is gone. Take special note that Rails has queried for the students associated with teacher_id
4 and found that they all have ids of 22, 23, 24, 25, 26, 27, and 28. Since we only need those report cards records for the seven associated students, Rails has concatenated the ids of the students in the second query, to form the WHERE clause. As in: WHERE
… IN (22, 23, 24, 25, 26, 27, 28)
This technique is called pre-fetching or eager loading. We’re telling Rails that we’ll need the associated report_cards
on each student when we are looping through the students.
Prefetching fixes N+1 queries
The key to fixing your N+1 queries is to realize that Active Record relations are instantiated in one place and then invoked “just-in-time,” that is, typically when the view is rendered.
Now, finally, here there one more thing to note: AR has three different methods for loading your associations: includes
, eager_load
, and preload
. A fourth method, joins
, allows you to join a table in your SQL without necessarily loading the associated records.
Includes will load all of related the records referenced as arguments in the includes
method. Typically includes will load the records like you see above (in a separate query using WHERE … IN (…) syntax)
However, if you have a where
or order
clause that references a relationship, includes
will instead us a left outer join.
Rails has two ways of dealing with N+1s: 1) “big join” and 2) “separate queries.”
.includes
— uses either strategy #1 or #2 depending on whether or not there is a where
or order
clause that references a relationship. .eager_load
always uses strategy #1 and .preload
uses #2.
I recommend that you always start with .includes
. When your app gets large and your queries are slow, then you will be forced to optimize, which means measure and tweak your queries to make them faster. During this process is when you will experiment with .eager_load
vs. .preload
and determine on a case-by-case basis which one is most optimized for you.
I hope you’ve enjoyed this brief introduction to N+1 queries.