Last month we participated in the Django meetup organized by the Django Bulgaria group. Several people presented to the community on topics related to Django development. Jordan from IO Era spoke about Django and what you should (not) optimize. The presentation, the discussion, and the event were great, and we would like to share a few takeaways in this blog post.

What is optimization?

Is it about making your website faster? Some people will claim that it is about fully utilizing the CPU, or maybe about reducing memory consumption. Or maybe it is about making your code less crappy. People with a mathematical background might come up with a more scientific definition of what optimization is. And project managers will surely claim that what really needs to be optimized is the performance of the software development team, as developers' time is pretty expensive. All of those claims are correct, but they still do not answer the main questions: what is optimization, what can be optimized, and how should we optimize?

Optimization is a modification aimed at improving efficiency. Of course, that does not mean that you can start making spontaneous modifications, hoping that one of them will optimize your system. You can truly optimize only what you can measure, and nothing more.
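As a minimal sketch of "measure first", the snippet below times two ways of doing the same job with Python's standard timeit module. The functions join_with_plus and join_with_join and the helper measure are hypothetical stand-ins for a "before" and an "after" version of your own code; they are not from the talk.

```python
import timeit


def join_with_plus(n):
    """Build a string with repeated concatenation (hypothetical 'before' version)."""
    result = ""
    for i in range(n):
        result += str(i)
    return result


def join_with_join(n):
    """Build the same string with str.join (hypothetical 'after' version)."""
    return "".join(str(i) for i in range(n))


def measure(func, n=1000, repeat=5, number=100):
    """Return the best wall-clock time in seconds over several runs."""
    return min(timeit.repeat(lambda: func(n), repeat=repeat, number=number))


before = measure(join_with_plus)
after = measure(join_with_join)
print(f"before: {before:.4f}s  after: {after:.4f}s")
```

Which version wins depends on your interpreter and data, which is exactly why the numbers should decide and not intuition.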

Speaking about optimizations, it is important to understand that in many cases they are about trade-offs. Yes, you might want to fully optimize CPU usage, but that might conflict with memory usage, or vice versa. Sometimes an optimization requires so much effort that it does not make sense to proceed with it. Then you are optimizing your time by simply skipping it. Optimizations depend on the business case you are working with. Optimizing something that runs on an embedded system makes sense, but it probably does not if you are operating on more powerful machinery. You can never claim that you are done with that system optimization task in your backlog: optimization is an ongoing process, and you can literally never move that task to done.

What is performance?

Many people are tempted to say "performance" when you ask "What is optimization?". While the answer is not completely wrong, we should differentiate between the two terms. In general, performance is the way your system behaves in different situations. You can think of performance as the set of metrics you are optimizing. That could be throughput (requests per second), response time, client-side rendering time, and even the overall stability of your system. You can hardly claim that an unstable system is performing well. Make it clear what you mean when speaking about performance.

What is scalability?

Scalability is the ability of your system to scale. A scalable system can grow when peak traffic kicks in and shrink when the extra capacity is no longer needed. There are two types of scalability: vertical and horizontal. Think of vertical scaling as adding more RAM to your computer (which may require you to stop the computer, leading to downtime in your business operations). Think of horizontal scaling as adding more computers to your system and then distributing the traffic between them. You may buy new computers when you scale up, or sell your computers when you scale down. No downtime.

Let’s speak about Django

Now that we are all aligned on the terminology, let's touch on Django. While learning a new framework, every developer asks: "Does Django (replace with another framework) scale?". It is one of the most popular questions on StackOverflow for every single framework, including Django. The answer is no. Django does not scale. By itself. What makes Django scale are technologies like Redis, Nginx, PostgreSQL, HAProxy and so on. And surprise - these are the same technologies that make other frameworks scale, be it Rails, CakePHP, or anything else.

Speaking about Django, we should not forget that it is a high-level framework. Yes, it is appealing that the level of abstraction is high, and that it is easy to make changes. But we should not forget that we are riding high. QuerySet method calls are translated to SQL queries. Templating engine calls are translated to Python code. Enjoy the high-level goodies, and understand the low-level details.

Know your tools

In line with the previous point about understanding low-level details, it is essential to know which tools to use, how to use them, and when. When dealing with optimization problems, you will definitely have to analyze your code. One of the established tools is Django Debug Toolbar. Make sure to use the right tools for each step in your workflow, be it debugging, profiling, log forwarding, continuous integration, testing or anything else. Before proceeding with any optimization, make sure that you have metrics to back it up. Spontaneous optimization leads to unexpected results. And do not guess too much.
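Django Debug Toolbar is the tool named above. As a framework-free sketch of the same idea (collect numbers before touching anything), here is Python's standard cProfile module applied to a made-up request handler. The functions slow_part, fast_part and handle_request are hypothetical and exist only for this illustration.

```python
import cProfile
import io
import pstats


def slow_part():
    """Hypothetical hot spot: a CPU-heavy loop."""
    return sum(i * i for i in range(200_000))


def fast_part():
    """Hypothetical cheap step."""
    return 42


def handle_request():
    """Hypothetical request handler that calls both steps."""
    return slow_part() + fast_part()


# Profile one call of the handler.
profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Render the five entries with the highest cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report shows where the time actually goes, so the guess "it must be the template rendering" can be replaced with a measurement.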

Know the ORM

In our experience, most bottlenecks occur at the ORM level.

When speaking about the Django QuerySet, there are two things that are crucial to understand:
1. QuerySet is Lazy - it will not call the database unless needed
2. QuerySet is Immutable - every time you call filter(), exclude(), or other method, it will construct a new QuerySet
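
The two properties above can be sketched with a toy class. ToyQuerySet and fake_database are hypothetical names made up for this example; this is not Django's implementation, only a small model of "lazy" and "immutable" that runs without a database.

```python
class ToyQuerySet:
    """A toy stand-in for a lazy, immutable query object (not Django's code)."""

    def __init__(self, source, predicates=()):
        self._source = source              # callable that simulates a database hit
        self._predicates = tuple(predicates)
        self._result_cache = None          # filled on first evaluation

    def filter(self, predicate):
        # Immutable: return a NEW object and leave the current one untouched.
        return ToyQuerySet(self._source, self._predicates + (predicate,))

    def _fetch(self):
        # Lazy: the "database" is only hit here, and only once per object.
        if self._result_cache is None:
            rows = self._source()
            self._result_cache = [
                row for row in rows if all(p(row) for p in self._predicates)
            ]
        return self._result_cache

    def __iter__(self):
        return iter(self._fetch())


db_hits = []


def fake_database():
    """Record each simulated database hit and return some rows."""
    db_hits.append(1)
    return [1, 2, 3, 4, 5, 6]


base = ToyQuerySet(fake_database)
evens = base.filter(lambda n: n % 2 == 0)
big_evens = evens.filter(lambda n: n > 2)

hits_before_iteration = len(db_hits)   # chaining filters did not touch the database
result = list(big_evens)               # iteration triggers the single fetch
hits_after_iteration = len(db_hits)
```

Chaining two filters costs nothing until the result is iterated, and each filter call leaves the object it was called on unchanged.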

Accessing a foreign key on a model instance can trigger an additional database query for each access, which may slow down your overall system. select_related and prefetch_related help you load the data you need with fewer queries.
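
To show the extra queries at the SQL level without needing a Django project, here is a sketch using Python's standard sqlite3 module. The author and book tables and the run helper are hypothetical. The first pattern issues one query per row to look up the related record; the second uses a single JOIN, which is roughly what select_related does for foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
""")

query_log = []


def run(sql, params=()):
    """Execute a query, remember that it ran, and return all rows."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


# Pattern 1: one query for the books, then one more query per book.
naive = []
for _id, title, author_id in run("SELECT id, title, author_id FROM book ORDER BY id"):
    (name,) = run("SELECT name FROM author WHERE id = ?", (author_id,))[0]
    naive.append((title, name))
naive_queries = len(query_log)

# Pattern 2: a single JOIN returns the same data in one round trip.
query_log.clear()
joined = run(
    "SELECT book.title, author.name FROM book "
    "JOIN author ON author.id = book.author_id ORDER BY book.id"
)
joined_queries = len(query_log)

print(naive_queries, joined_queries)
```

With three books the difference is small, but the first pattern grows by one query for every additional row.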

Use count()

When dealing with large datasets (and even not-so-large ones), the database is likely to count faster than your Python code. Prefer QuerySet.count() over len(QuerySet). Both execute a database query, but len() has to fetch every row and build model instances before counting, so its overhead is much bigger.
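
The difference can be sketched at the SQL level with Python's standard sqlite3 module. The item table is hypothetical. Counting in the database sends back a single integer, while counting in Python first transfers every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO item (payload) VALUES (?)",
    [("x" * 100,) for _ in range(10_000)],
)

# Let the database count: only one number crosses the boundary.
(db_count,) = conn.execute("SELECT COUNT(*) FROM item").fetchone()

# Count in Python: every row is fetched and held in memory first.
rows = conn.execute("SELECT id, payload FROM item").fetchall()
python_count = len(rows)

print(db_count, python_count)
```

Both approaches give the same number; they differ in how much data has to move to produce it.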

Don’t use count()

Yes, databases are faster at counting, but what if you also need the rows? count() on a QuerySet that has not been evaluated yet runs its own COUNT query, and looping over the QuerySet afterwards runs another one. If you would like to count and loop through the same QuerySet, len(QuerySet) might be the better option: it evaluates the QuerySet once, and the loop then reuses the cached results.
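
A toy illustration of the query counts involved, again with sqlite3. CachedRows is a hypothetical wrapper written for this example, not Django's QuerySet; it only keeps track of how many queries each usage pattern sends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)", [("a",), ("b",), ("c",)])


class CachedRows:
    """Toy wrapper that remembers fetched rows and counts its queries."""

    def __init__(self, connection):
        self.connection = connection
        self.queries = 0
        self._rows = None

    def count(self):
        # Toy behaviour: always ask the database for the count.
        self.queries += 1
        return self.connection.execute("SELECT COUNT(*) FROM item").fetchone()[0]

    def _fetch(self):
        # Fetch the rows once and reuse them afterwards.
        if self._rows is None:
            self.queries += 1
            self._rows = self.connection.execute("SELECT id, name FROM item").fetchall()
        return self._rows

    def __len__(self):
        return len(self._fetch())

    def __iter__(self):
        return iter(self._fetch())


# count() first, then loop: two queries.
first = CachedRows(conn)
total_via_count = first.count()
names_first = [name for _id, name in first]
count_then_loop = first.queries

# len() first, then loop: one query, the loop reuses the cached rows.
second = CachedRows(conn)
total_via_len = len(second)
names_second = [name for _id, name in second]
len_then_loop = second.queries

print(count_then_loop, len_then_loop)
```

When the rows are needed anyway, fetching them once and counting in memory saves a round trip.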

Add database indexes

Database indexes work much like the table of contents in a book. When you are looking for something in a book, it is much faster to check the contents and go directly to the right page than to skim the whole book page by page until you find it. A common rule of thumb in Django is to index every field that you use in filter(), exclude() or order_by(). Database indexes boost the read performance of your database, and most web applications mostly read.
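
The effect of an index on a lookup can be observed with SQLite's EXPLAIN QUERY PLAN, available through Python's standard sqlite3 module. The article table, the index name article_slug_idx and the plan helper are hypothetical; the exact wording of the plan differs between SQLite versions, so the sketch only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, slug TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO article (slug, body) VALUES (?, ?)",
    [(f"post-{i}", "text") for i in range(1000)],
)


def plan(sql, params=()):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


query = "SELECT id FROM article WHERE slug = ?"

# Without an index the database has to scan the whole table.
plan_without_index = plan(query, ("post-42",))

# After adding an index on the filtered column, the lookup uses it.
conn.execute("CREATE INDEX article_slug_idx ON article (slug)")
plan_with_index = plan(query, ("post-42",))

print(plan_without_index)
print(plan_with_index)
```

Checking the plan before and after is a cheap way to confirm that an index is actually used by the query you care about.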

Do not add database indexes

While it might seem tempting to add database indexes, do not do it right away simply because you heard it is the right thing to do. Database indexes are hard to manage, and they slow down write operations to your database. To continue the book analogy: every time you update a page, you now have to update the table of contents as well.

Know your database

To sum up the previous few points, it is very important to know how databases work. Know how to optimize your queries. Know what database indexes are and use them wisely. Know when to replicate and what replication means. When working at a high level, the low-level details are essential.

Caching

Cache - as much as possible

Why should we bother calculating something twice if we already know the most likely outcome? Do your system a favor and cache as much as possible. Start by caching QuerySets, proceed with view-level caching, memoization, and caching in templates, and maybe even introduce Varnish (which is great). The performance gains of caching can be tremendous.
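
Memoization, one of the techniques listed above, can be sketched with functools.lru_cache from Python's standard library. The function expensive_report is a hypothetical stand-in for any slow, repeatable computation.

```python
from functools import lru_cache

calls = []


@lru_cache(maxsize=128)
def expensive_report(year):
    """Hypothetical slow computation; records each real execution."""
    calls.append(year)
    return sum(i * i for i in range(year))


first = expensive_report(2024)    # computed
second = expensive_report(2024)   # served from the cache
info = expensive_report.cache_info()

print(len(calls), info.hits, info.misses)
```

The second call returns the stored result without running the function body again.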

Do not cache

There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.
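
A small sketch of why invalidation is the hard part. The dictionaries prices and cache and the functions get_price and set_price are hypothetical; prices stands in for the database. If a write forgets to invalidate, readers keep getting the old value.

```python
prices = {"book": 10}   # stands in for the database
cache = {}


def get_price(name):
    """Read through the cache, falling back to the 'database'."""
    if name not in cache:
        cache[name] = prices[name]
    return cache[name]


def set_price(name, value, invalidate=True):
    """Write to the 'database' and, optionally, drop the cached copy."""
    prices[name] = value
    if invalidate:
        cache.pop(name, None)


initial = get_price("book")               # cached from now on

set_price("book", 12, invalidate=False)   # write without invalidation
stale = get_price("book")                 # still the old value

set_price("book", 15)                     # write with invalidation
fresh = get_price("book")                 # the new value

print(initial, stale, fresh)
```

Every cached value needs an answer to the question "what makes this wrong, and who removes it when that happens?".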

Use Django Compressor

Because it is great. It automatically minifies your static files, lets you work only in the templates, integrates with CDNs, and is extremely easy to use compared to alternatives where you have to build the pipeline manually.

Do not use Django Compressor

Especially if you are not reading the docs and if you have complex logic in your templates. During offline compression you do not have access to the template context, and it is very likely that your code will work fine in development and break on a multi-server setup with offline compression enabled. Let me reiterate: know your tools.

Omega

Do not trust absolute statements

Many software developers will tell you what to do and what not to do. Some of them may even seem convincing. But never trust their absolute statements 100%. Maybe your business case is different. Before proceeding with any optimization, measure and prove it yourself.

Do not solve problems that you don’t have

We already discussed that in the blog post about the Meteor Unconference in Berlin, and it is 100% valid when speaking about optimization problems.

Love the documentation

Many of the problems highlighted here are already described in the Django documentation. A lot of people put a great deal of effort into making that documentation available to us for free. Make sure to read it.

"Premature optimization is the root of all evil"

Donald Knuth.

Presentation and Links

Check out the full video of the presentation, the slides, and some code samples with a demo optimization project: Video @ YouTube, Slides @ GitHub Pages, Source @ GitHub.