How to Scale Drupal

 

One of the first questions any CEO, CTO, CIO or VC new to the Drupal content management system asks is: will Drupal scale up to handle a large amount of traffic? The short answer is that it depends on how you set it up. Configured correctly, Drupal can scale to handle millions of page views per day. High-traffic websites such as The Economist, The Emmys, Al Jazeera and Men’s Health all run Drupal as their backend content management system. Under the hood, Drupal is like any complex web application: it has multiple layers, and each layer has to be configured and optimized correctly for the whole to scale.
 
So how do you scale up Drupal quickly?
 
Scaling Drupal calls for a phased, incremental approach that starts with the simplest techniques and progresses to the more complex. Since Drupal is a classic two-tier web application with a web server as the front end and a database server as the back end, the most effective way to improve its performance is to reduce the number of requests that reach the database server. As with most web apps built on a database server, disk input/output (I/O) is usually the slowest part of the system. Fortunately, that is starting to change with the introduction of solid-state drives, which greatly reduce the latency of the disk subsystem. Their drawbacks are cost and limited availability in low-cost VPS environments. Because disk I/O remains a major factor in scalability and performance, reducing database requests continues to deliver a high ROI when performance tuning Drupal.
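To make the idea of "fewer database requests" concrete, here is a minimal sketch in Python of a time-limited page cache. It is not Drupal's implementation; the PageCache class, the load_from_db callback and the 900-second lifetime are hypothetical names and values chosen only for illustration.

```python
import time


class PageCache:
    """Illustrative in-memory cache; NOT Drupal's actual page cache."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds      # how long a cached page stays fresh
        self.clock = clock          # injectable clock, handy for testing
        self.store = {}             # key -> (time stored, value)
        self.db_hits = 0            # counts how often the database was needed

    def get(self, key, load_from_db):
        """Return the cached value, calling load_from_db only on a miss."""
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]         # fresh copy in memory, no database request
        self.db_hits += 1           # miss or expired entry: go to the database
        value = load_from_db(key)
        self.store[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = PageCache(ttl_seconds=900)  # 15 minutes, an assumed lifetime

    def render(path):
        # Stand-in for the expensive, database-backed page build.
        return "<html>" + path + "</html>"

    for _ in range(1000):
        cache.get("/node/1", render)
    print("database hits:", cache.db_hits)
```

However many times the same page is requested within the lifetime, the database-backed function runs only once; that is the effect each caching layer in the phases below is trying to achieve.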
 
With reducing disk I/O at the core of the performance tuning strategy, here are the most effective ways to improve the performance and scalability of Drupal:
 
Phase 1 -- The easy stuff
 
  1. Turn on page caching inside Drupal under the performance settings.
  2. Set minimum page cache lifetime to 15 minutes or longer.
  3. Enable page compression (this should be on by default) inside Drupal under the performance settings.
  4. Turn on the optimize CSS Files setting inside Drupal under the performance settings.
  5. Turn on the optimize JavaScript Files setting inside Drupal under the performance settings.
  6. Stop web crawlers (bots) from chewing up resources by editing the robots.txt file and adding a crawl delay (e.g. User-agent: *, Crawl-delay: 10, Disallow: /archive, with each directive on its own line).
  7. Optimize your image sizes. Run the YSlow or Page Speed tools in Firebug to identify pages with slow content.
  8. Disable and uninstall all unnecessary add-on modules. Make sure you are using only well-tested, popular modules that have no open issues around performance and scalability.
  9. Try to keep your website as “stateless” as possible: the more of your users that are anonymous, the better. Drupal's page cache serves anonymous visitors only, so large numbers of logged-in users make the site less stateless and result in more database requests.
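As a quick check on item 6, the sketch below uses Python's standard urllib.robotparser module to confirm how a crawler that honors these directives would read them. The robots.txt contents mirror the example in the list; the crawler name "ExampleBot" and the paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The directives from item 6, one per line, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /archive
"""


def parse_robots(text):
    """Parse robots.txt content from a string rather than fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = parse_robots(ROBOTS_TXT)
    # "ExampleBot" is a hypothetical crawler name.
    print("crawl delay:", rules.crawl_delay("ExampleBot"))
    print("may fetch /archive/2010:", rules.can_fetch("ExampleBot", "/archive/2010"))
    print("may fetch /node/1:", rules.can_fetch("ExampleBot", "/node/1"))
```

Keep in mind that these directives are advisory: a crawler has to choose to respect them.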
Phase 2 -- The intermediate stuff
 
  1. Increase the amount of RAM available to the server. If you are on shared hosting, move Drupal to a virtual private server (VPS) where you have full control over the amount of RAM, CPU, disk and network bandwidth available.
  2. Make sure your hosting provider is providing enough bandwidth especially if you are not using a CDN.
  3. Raise the amount of RAM available to Drupal (the memory_limit setting in php.ini) to 128MB or greater. The more modules you install, the more memory you will need.
  4. Raise the max_execution_time setting in php.ini from 60 to 120 seconds, or higher if necessary. This will prevent long-running processes such as cron runs from timing out.
  5. Tune the Apache MaxClients setting. To estimate the value, use this formula: RAM available to Apache / average memory used per Apache process = MaxClients.
  6. Turn off page compression inside Drupal and enable compression at the server level instead, using the Apache module mod_deflate (or mod_gzip on older Apache versions).
  7. Turn on browser caching by enabling and correctly configuring the Apache mod_expires module.
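The MaxClients formula in item 5 is simple arithmetic, sketched here in Python. The reserved_mb parameter (memory set aside for the operating system and other services) and the sample figures are assumptions added for illustration; measure the real per-process size on your own server.

```python
def estimate_max_clients(total_ram_mb, avg_apache_process_mb, reserved_mb=0):
    """Estimate Apache MaxClients as usable RAM / average process size.

    reserved_mb is an illustrative extra: RAM kept back for the OS,
    the database, or anything else sharing the machine.
    """
    if avg_apache_process_mb <= 0:
        raise ValueError("average process size must be positive")
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0:
        raise ValueError("no RAM left for Apache after the reservation")
    # Round down: one process too many can push the server into swap.
    return int(usable_mb // avg_apache_process_mb)


if __name__ == "__main__":
    # Assumed figures: 4 GB server, 1 GB reserved, 50 MB per Apache process.
    print(estimate_max_clients(4096, 50, reserved_mb=1024))
```

Rounding down is deliberate, since overshooting the limit costs far more (swapping) than leaving a little memory unused.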
Phase 3 -- The complex stuff
 
  1. Consider upgrading to Drupal 7 or using Pressflow (a fork of Drupal).
  2. If you have a lot of static content (CSS, JavaScript and images), install the CDN module and put your static content on a content delivery network (CDN) like Amazon CloudFront, Akamai or EdgeCast.
  3. Identify slow database queries and the modules that issue them with the Devel module. Either replace the slow modules with faster alternatives or tune their queries. In some cases, tune the database tables themselves (add indexes or convert to a storage engine with row-level locking, such as InnoDB).
  4. Give your web server and database server more hardware resources (RAM, CPU and disk) by putting each on its own dedicated server, either virtual or physical.
  5. Install and configure an opcode cache like APC (Alternative PHP Cache). Opcode caches save the Web server from having to read, parse and compile PHP files on every request.
  6. Install and configure Memcached, preferably on a dedicated server with plenty of RAM. Memcached lets Drupal keep its cache tables in memory, which avoids hits to the database.
  7. Consider installing and configuring an HTTP accelerator such as Varnish. Varnish is a “reverse proxy cache”. Varnish handles serving static files and anonymous page-views for your site much faster and at higher volumes than Apache, in the neighborhood of 3,000 requests per second. 
  8. As an alternative to Varnish, consider the Boost module, which provides similar functionality through static page caching.
  9. Scale out by setting up a web farm (multiple web servers) behind a load balancer that distributes incoming requests across the web servers in round-robin fashion.
  10. Scale up the database server hardware. Increase the number of CPU cores, the amount of RAM, and the speed of the disk storage array to maximize read throughput and minimize latency.
  11. Consider leveraging Galera Cluster for MySQL. Galera is a synchronous multi-master cluster for MySQL/InnoDB databases.
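Round-robin load balancing, mentioned in item 9, simply hands each new request to the next server in the list and wraps around at the end. A minimal sketch in Python follows; the RoundRobinBalancer class and the server names are hypothetical, and a production load balancer would also handle health checks and failover.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation (illustrative only)."""

    def __init__(self, backends):
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")
        # cycle() repeats the list endlessly in the same order.
        self._rotation = cycle(self._backends)

    def next_backend(self):
        """Return the backend that should receive the next request."""
        return next(self._rotation)


if __name__ == "__main__":
    # Hypothetical web server names.
    balancer = RoundRobinBalancer(["web1", "web2", "web3"])
    for request_number in range(1, 8):
        print(request_number, "->", balancer.next_backend())
```

This works best when the web servers are interchangeable, which is another reason to keep the site as stateless as possible.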
As you can see from the list above, scaling Drupal can range from changing a few configuration settings to diving into the intricacies of SQL query optimization. The approach you take will depend upon your budget, performance goals and the type of content you host on your website.
 
Moreover, scaling Drupal successfully requires a diverse set of skills, ranging from basic web development tasks like LAMP stack setup to advanced DBA-level SQL tuning. In addition to implementing the recommendations above, you’ll need to consider your performance and load testing tools and environments. At a minimum, you’ll want to use a load testing tool like JMeter to put your server under simulated load while profiling the impact of your changes on the server’s hardware resources.
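To show the kind of measurement a load test produces, here is a rough Python sketch that fires concurrent requests and summarizes their latencies. It is not a substitute for JMeter; the function names, the request counts and the example URL are all assumptions made for illustration.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_url(url, timeout=10):
    """Fetch one page and return its HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
        return response.status


def timed_call(fetch):
    """Run fetch() once and return how long it took, in seconds."""
    start = time.perf_counter()
    fetch()
    return time.perf_counter() - start


def run_load(fetch, total_requests, concurrency):
    """Call fetch() total_requests times using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, fetch) for _ in range(total_requests)]
        return [future.result() for future in futures]


def summarize(latencies):
    """Return count, mean and 95th percentile (nearest-rank method)."""
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100   # integer ceiling of 0.95 * n
    return {
        "count": len(ordered),
        "mean": sum(ordered) / len(ordered),
        "p95": ordered[rank - 1],
    }


if __name__ == "__main__":
    # Placeholder URL: point this at a test environment, never production.
    target = "http://localhost/"
    results = run_load(lambda: fetch_url(target), total_requests=200, concurrency=20)
    print(summarize(results))
```

Run the same test before and after each tuning change, and watch CPU, RAM and disk activity on the server while it runs, so that you can attribute any improvement to the change you just made.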
 
If you are looking for more in-depth resources on how to scale Drupal, here’s our short list of recommended reading:
 
Drupal High Performance Group: http://groups.drupal.org/high-performance
Drupal caching, speed and performance: http://drupal.org/node/326504