(Not always) Cool methods to be careful with when working with Rails

João Paulo Lethier
4 min read · Aug 30, 2022


Every Ruby developer has probably heard someone say, at least a couple of times, that Rails is magic. People usually say this because of the cool methods Rails provides that make our lives as developers easier, sometimes without us even knowing what happens inside them. That is good, because those methods help a lot in keeping our code readable, but not knowing what happens when a method is called can lead to unexpected behavior at a larger scale. Let's try to understand three of those methods and their issues.

pluck

pluck is a helpful Active Record method that makes it easier to get an array of values from a specific column of a table. However, some uses of it are dangerous and can hurt database query performance. Looking at the RuboCop cops, we can see what looks like a conflict: there is a cop that recommends using pluck instead of map, but also a cop that recommends using select instead of pluck.

The issue with pluck, as the RuboCop cop points out, arises when it is used inside a where condition. For example, if we have code like Book.where(author_id: Author.brazilian.pluck(:id)) , what will happen is:

  • It will first send the authors query to the database, running something like SELECT authors.id FROM authors WHERE ... .
  • After the authors query runs, Active Record will load all the ids returned by the database into an in-memory array.
  • Lastly, it will send a second query to the database for the books, passing the array returned by the first query as a parameter, running something like SELECT * FROM books WHERE books.author_id IN (1, 2, 3, ..., 100, 101) .

Beyond the unnecessary first query, which could be avoided by using a subquery, this approach can result in memory issues if the first query returns a huge list that needs to be held in memory.

A better approach in this scenario is to replace pluck with the select method, which results in a single query to the database, something like SELECT * FROM books WHERE books.author_id IN (SELECT authors.id FROM authors WHERE ...) . This avoids both the unnecessary query and the memory usage that come with using pluck inside where.

ids

As we can see in the Rails docs, this is essentially a shortcut for pluck(:id) (it plucks the table's primary key), so it has the same issues when used inside a where clause. However, RuboCop recommends it outside a where clause, as we can see in the PluckId cop.

find_each

Depending on how many database records we need to deal with, querying everything at once can lead to a memory issue. For example, if we need to iterate over a books table that has millions of records, using Book.all.each will load all those millions of records into memory and only then iterate over the in-memory array. One way to avoid this is the find_each method, which avoids loading all records into memory at once by splitting them across several batch queries.

This is really helpful, but using find_each without thinking about the batch size can lead to another issue: splitting the result into too many batches, which means too many queries being executed against the database. Going back to the books example, let's say we have 10 million books and have to iterate over every one of them. Using find_each with its default batch_size of 1000 will lead to about 10,000 queries against our database. If we know we have enough memory to handle batches of 10,000 records and call find_each with that batch_size instead of the default, we end up with about 1,000 queries, 9,000 fewer for the same task.
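To make the arithmetic concrete, here is a minimal plain-Ruby sketch. The batch_queries helper is hypothetical (it is not part of Rails), and it only approximates the count: it does not model any extra empty query that the batching may issue at the very end.

```ruby
# Hypothetical helper: approximate number of batch queries needed to walk
# `total_records` rows when each query loads at most `batch_size` rows.
def batch_queries(total_records, batch_size)
  (total_records + batch_size - 1) / batch_size # integer ceiling division
end

total_books = 10_000_000

default_queries = batch_queries(total_books, 1_000)   # find_each default batch size
larger_queries  = batch_queries(total_books, 10_000)  # larger, memory-permitting batch size

puts "batch_size 1000:  #{default_queries} queries"
puts "batch_size 10000: #{larger_queries} queries"
puts "queries saved:    #{default_queries - larger_queries}"
```

In a Rails app, the larger batch would be requested with Book.find_each(batch_size: 10_000) { |book| ... } , assuming the Book model from the example.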

This is a bigger issue when the query is slow. The slower the query, the more important it is to balance batch size against memory usage, so that we hit the database with fewer slow queries.

In my opinion, the Rails magic is a great and helpful thing: it makes us more productive and helps us maintain good-quality code that is maintainable and readable. However, it is important to try to understand what is happening under the hood when using those methods, especially when we start working on high-scale projects.
