Active Record Pitfalls
Introduction
I will start out by saying that I LOVE Active Record. I love the peace of mind that comes with not having to decide from the outset which technology drives the database layer of my application. I LOVE the ability to grab and manipulate data from the database using pure Ruby, freeing my mind from having to shift between two different modes of thinking. I LOVE all the tools provided by such a mature ORM. While there is much to love about Active Record, the very things we love create a false sense of confidence, allowing us to fall into bad habits that plague our applications. This is especially true for novice web developers. Ruby on Rails' initial claim to fame was its ability to get an application off the ground quickly, which made it one of the most popular frameworks for people starting their venture into web development. Ruby on Rails is great at hiding a lot of the initial growing pains of an application, but this abstraction fosters many bad habits in growing developers. Many of these bad habits revolve around Active Record.
Pitfall #1 - Using Active Record without first understanding basic SQL
Unfortunately, many Ruby on Rails developers feel that it is unnecessary to learn SQL. This belief can be seen as a compliment to Active Record's expressiveness. Through its very familiar object-oriented style, developers can manipulate the database to their heart's desire. Anything that cannot be readily expressed through Active Record's mapping can then be handled in pure Ruby. Programmers find themselves in a very comfortable and welcoming environment. It is easy to see why developers would rather forget that the database layer is part of their application and instead live in a picture-perfect world of objects.
Almost every single pitfall due to Active Record can be traced back to neglecting SQL in some form or another. While it is true that you can create complex applications without writing a line of SQL, a big benefit of learning SQL comes from understanding how your Ruby code is translated before it eventually interacts with the database. By gaining a deeper understanding of the language that interacts with the database, you also expand your realm of possibilities.
In reality, there is no need to run away from this new language, as Active Record can serve as one of the best tools to learn and explore SQL. Once you learn the basics, Active Record can serve as your training wheels, allowing you to explore while falling back to a familiar and safe environment. While you learn the ins and outs of SQL, make it a habit to imagine each interaction as raw SQL before you reach for an Active Record query method. As you become more comfortable, you can start visualizing queries coming together with each method you chain through the Active Record query interface. Eventually you will reach a point where you realize that Active Record is the limiting factor that prevents you from fully expressing yourself. When that time comes, my advice is to resist the urge to ditch Active Record. Express as much as you can through the ORM and fall back to SQL only when it is truly necessary to write efficient code.
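One way to practice this habit is to watch a query take shape as methods are chained. The sketch below uses a toy, hypothetical ToyRelation class written in plain Ruby. It is not how Active Record is implemented; it only illustrates how each chained call might contribute one clause to the final SQL string. In a real Rails application, calling to_sql on a relation shows the SQL that Active Record actually generates.

```ruby
# Toy illustration only: ToyRelation is a hypothetical class, not part of
# Active Record. Each chained call returns a new object with one more clause.
class ToyRelation
  def initialize(table, wheres: [], order: nil, limit: nil)
    @table = table
    @wheres = wheres
    @order = order
    @limit = limit
  end

  # Adds a condition to the WHERE clause (conditions are joined with AND).
  def where(condition)
    copy(wheres: @wheres + [condition])
  end

  # Sets the ORDER BY clause.
  def order(column)
    copy(order: column)
  end

  # Sets the LIMIT clause.
  def limit(count)
    copy(limit: count)
  end

  # Assembles the clauses collected so far into a single SQL string.
  def to_sql
    sql = "SELECT * FROM #{@table}"
    sql += " WHERE #{@wheres.join(' AND ')}" unless @wheres.empty?
    sql += " ORDER BY #{@order}" if @order
    sql += " LIMIT #{@limit}" if @limit
    sql
  end

  private

  # Builds a new relation with the current clauses plus the given changes.
  def copy(**changes)
    current = { wheres: @wheres, order: @order, limit: @limit }
    ToyRelation.new(@table, **current.merge(changes))
  end
end

base = ToyRelation.new("users")
recent_admins = base.where("role = 'admin'")
                    .where("active = 1")
                    .order("created_at DESC")
                    .limit(5)

puts base.to_sql
puts recent_admins.to_sql
```

Reading the two printed statements side by side is the exercise: every method in the chain should correspond to a clause you could have written by hand.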
Pitfall #2 - Relying too heavily on objects produced by Active Record
One of the most important features of a good ORM is its ability to convert data gathered from the database into intelligently mapped objects. In the case of Active Record, each object contains many utility methods that make it easier to work with. While these methods are extremely useful, there are many instances when they can be detrimental to the overall performance of our application.
The most common instance where we rely on Active Record objects too heavily is when we need a specific attribute from a model. The default approach is to query with the conditions and let Active Record create an object for every row in the table that matches them. While you might not see much of an adverse effect in smaller applications, as your program scales in size and complexity you will start to see performance change considerably for the worse.
To combat this problem, it is often advisable to grab only the attributes you need, especially when all you need are unmodified versions of those attributes. You do not need to build bloated objects when all you want to do is display information to the user. Weigh the benefits of having all of the methods afforded to you by Active Record against the performance gained by lessening the workload of your application. If you need to modify many values received from the database, consider creating plain Ruby classes with whatever methods you need to transform those values. You are not bound to the objects created by default through Active Record. As funny as it might sound, plain Ruby classes and objects are often overlooked, especially when using a framework like Ruby on Rails.
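As a rough sketch of this idea, suppose only two attributes are needed for display. In Active Record, pluck returns plain arrays of values instead of full model objects. The example below stands in for that result with a hard-coded array and wraps each row in a small plain Ruby object. The User model, its name and signup_year columns, and the UserSummary class are all hypothetical.

```ruby
# Hypothetical stand-in for what a call such as
#   User.where(active: true).pluck(:name, :signup_year)
# might return: one small array per row, with no model objects involved.
rows = [
  ["ada", 2019],
  ["grace", 2021]
]

# A plain Ruby value object holding only what the view needs.
UserSummary = Struct.new(:name, :signup_year) do
  # Capitalizes the stored name for display.
  def display_name
    name.capitalize
  end

  # Builds the text shown to the user.
  def label
    "#{display_name} (member since #{signup_year})"
  end
end

summaries = rows.map { |name, year| UserSummary.new(name, year) }
summaries.each { |summary| puts summary.label }
```

The presentation logic lives in a class you control, and nothing here carries the weight of a full Active Record object.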
Pitfall #3 - Using Ruby to manipulate items from the database
Ruby is a joy to work with. Much of its power derives from its simplicity and expressiveness. It is an example to other programming languages that you do not need to sacrifice simplicity for usability. While Ruby is great to work with, there are many instances when it is not the right choice for the task at hand. Part of growing as a developer is discerning the right tool for the problem you are trying to solve. When it comes to manipulating information from the database, Ruby is usually not the right answer.
Many Ruby on Rails developers rely too heavily on Ruby when manipulating data. Developers will use the database as a place to store and retrieve information while delegating all of their parsing to the programming language. Then they turn around, complain that their application is slow, and blame the framework for a problem that can be overcome with a bit of knowledge. The reality is that Ruby is slower than many programming languages, especially when it comes to modifying data. If you are using a framework like Ruby on Rails, you are sacrificing speed for convenience. The overhead that makes Ruby on Rails comparatively slower than its competitors is also what allows for the almost stress-free environment it provides. Realizing and understanding this trade-off allows you to sidestep it, and nowhere is this more evident than when using Active Record.
Once again, this is an instance where furthering your understanding of the inner workings of the database can help. Instead of using Ruby to parse data, use your understanding of SQL to delegate this job to the database. The database is specifically optimized to handle the storage and processing of data, while Ruby serves as a generalized tool that allows you to tackle a wide array of tasks. Learn how to use subqueries and joins to get just the data that you need. Apply indexes strategically to your tables so that you can get to the data you need quickly. Use the various functions afforded to you by your specific database to shape the data into the form that is closest to your desired output. Use grouping to format your data in a way that your application can easily understand. Before you use Ruby to modify your data, ask yourself if you can delegate this job to the database. You will be surprised at what you can do if you make it a priority to involve the database layer in your data processing.
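To make the contrast concrete, the sketch below computes a total per customer twice over the same hypothetical data: once in Ruby after loading every row, and once as the SQL that could be handed to the database instead. The orders table and its columns are assumptions made for illustration, and the SQL string is only printed here, not executed.

```ruby
# Hypothetical rows, as if every record of an `orders` table had been loaded.
orders = [
  { customer_id: 1, amount_cents: 1500 },
  { customer_id: 2, amount_cents: 700 },
  { customer_id: 1, amount_cents: 250 }
]

# Ruby-side processing: every row travels to the application first,
# and the grouping and summing happen in Ruby.
totals_in_ruby = orders
  .group_by { |order| order[:customer_id] }
  .transform_values { |group| group.sum { |order| order[:amount_cents] } }

# Database-side processing: the same result expressed as one grouped query,
# so only one row per customer would need to leave the database.
totals_sql = <<~SQL
  SELECT customer_id, SUM(amount_cents) AS total_cents
  FROM orders
  GROUP BY customer_id
SQL

p totals_in_ruby
puts totals_sql
```

In Active Record, a grouped sum of this kind can be written as Order.group(:customer_id).sum(:amount_cents), which returns a hash keyed by customer id, assuming an Order model backed by such a table.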
Pitfall #4 - Using multiple queries when one query would suffice
You are creating an application and you realize that a specific page is taking too long to load. Using your logs, you figure out that your application is querying the database excessively for a given request. You see that the same table is queried multiple times, the only difference being the value of the key in the condition. This is a classic case of the N+1 query problem.
By default, Active Record lazy-loads associations when a query is made. When you try to access a child record associated with a parent model, a separate query is fired. As the number of parent records grows, a separate query is made for each parent whose child records you access. Once again, with the abstraction of the database layer and the ease of access to any associated record, it is easy to fall into this trap and completely cripple your server's performance.
By definition, an ORM translates code that is understood in the application layer into code that can be understood by the database. In order to translate between two different languages, assumptions must be made, since the range of expression differs between the two. Since Active Record is an ORM, it has to make many assumptions about the way you use it. It is your job to learn what those assumptions are and explicitly tell Active Record what it needs to do. In the case of our previous example, Active Record assumes that you only need the records that you explicitly asked for.
The above problem can be alleviated by planning ahead: decide which associated records you are going to use before making the initial request, and eager-load them the instant you fire your initial query. Eager loading is the act of telling an ORM up front which associated data to fetch along with the query that will be fired. Your goal is to grab all the data you need using the fewest queries possible, as long as doing so is efficient. There are a few instances where you will be better off doing multiple queries, but we will dive deeper into that subject in the next article.
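The difference in query count can be sketched without a real database. The example below uses a hypothetical FakeDB class, an in-memory stand-in that simply counts how many "queries" it receives, to compare loading comments one post at a time with loading them all at once. In Active Record, eager loading of this kind is typically requested with includes, for example Post.includes(:comments), where the Post model and its comments association are assumed.

```ruby
# Toy in-memory "database" that counts how many queries it receives.
# FakeDB and its methods are hypothetical; they only mimic query counting.
class FakeDB
  attr_reader :query_count

  def initialize(posts, comments)
    @posts = posts
    @comments = comments
    @query_count = 0
  end

  # Stands in for: SELECT * FROM posts
  def all_posts
    @query_count += 1
    @posts
  end

  # Stands in for: SELECT * FROM comments WHERE post_id = ?
  def comments_for(post_id)
    @query_count += 1
    @comments.select { |comment| comment[:post_id] == post_id }
  end

  # Stands in for: SELECT * FROM comments WHERE post_id IN (...)
  def comments_for_all(post_ids)
    @query_count += 1
    @comments.select { |comment| post_ids.include?(comment[:post_id]) }
  end
end

posts = [{ id: 1 }, { id: 2 }, { id: 3 }]
comments = [
  { post_id: 1, body: "a" },
  { post_id: 1, body: "b" },
  { post_id: 3, body: "c" }
]

# Lazy loading: one query for the posts, then one more per post (N+1).
lazy_db = FakeDB.new(posts, comments)
lazy_counts = lazy_db.all_posts.map { |post| lazy_db.comments_for(post[:id]).size }

# Eager loading: one query for the posts, one for all of their comments.
eager_db = FakeDB.new(posts, comments)
eager_posts = eager_db.all_posts
grouped = eager_db.comments_for_all(eager_posts.map { |post| post[:id] })
                  .group_by { |comment| comment[:post_id] }
eager_counts = eager_posts.map { |post| grouped.fetch(post[:id], []).size }

puts "lazy:  #{lazy_db.query_count} queries"
puts "eager: #{eager_db.query_count} queries"
```

Both approaches produce the same comment counts; only the number of round trips differs, and the lazy version grows with every additional post.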
Conclusion
While Active Record is one of the most useful components of the Ruby on Rails stack, it is not without its faults. Most of the problems you encounter can be avoided by learning the inner workings of your database, especially the language that interacts with it, namely your chosen flavor of SQL. With the correct mindset, your exploration of either subject will yield growth in the other, opening up a world of data expression that was not previously available to you. It will also allow you to use Ruby to communicate efficiently with the database.
In the second part of this article, we will go through each pitfall and demonstrate how we can tackle it from a programmatic point of view.