Supporting High-Traffic Websites Using Drupal Pressflow

Segue has developed websites for multiple clients using the Drupal open-source Content Management System (CMS). Drupal has proven to be an excellent platform for website development because of its ease of use and its many modules, which can extend a site’s capabilities. Despite these past successes, Segue recently ran into difficulties using Drupal to meet the demands of one customer’s high-traffic, high-availability websites.

The following were the key challenges at project initiation:

  • The infrastructure supporting the sites frequently crashed during traffic surges
  • The existing sites were on Drupal 5.x and needed to be migrated to 6.x; however, several “hacks” in the legacy site codebase hindered a smooth upgrade
  • The client had decided to migrate hosting to a newly designed infrastructure, built specifically to address the shortcomings of the original environment, which had too few front-end and back-end servers to handle the anticipated traffic
  • The existing look and feel of the sites had to be preserved to maintain brand continuity for the audience, even though one of the sites was being moved from a proprietary CMS developed in-house
  • Aggressive delivery timelines had been established to meet the customer’s broader operational needs

We followed our typical Drupal implementation path, and the migration of the sites went relatively smoothly. Early on, it appeared that we had addressed the traffic concerns: the sites were initially more stable and handled an increased server load. After a short time, however, it became clear that the sites were still prone to crashing when the number of daily hits rose into the hundreds of thousands. (We delivered over 40 million page views in the first week after the first site’s migration, which seemed to indicate success, but it was short-lived.)

After further analysis, we determined that the best solution (and an emerging industry standard) was Pressflow, which has the following features:

  • Specialized distribution of Drupal, designed for high-traffic sites (and as the name would suggest, frequently used by media sites)
  • Optimized for MySQL, which was the back-end in use (as a tradeoff, Pressflow does not support the use of other databases)
  • Released roughly in parallel to Drupal core – new releases of core are tested, optimized, and released to make enhancements from recent Drupal core releases available to Pressflow sites
  • Drop-in replacement for “vanilla” core (in theory, all Drupal contrib modules will work with Pressflow core)
  • Introduces “Lazy Session Initiation” (anonymous users no longer need records in the sessions table)
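
To make the last feature more concrete, here is a minimal Python sketch of the general idea behind lazy session creation. It is not Pressflow’s code (Pressflow is written in PHP); the class names `LazySessionStore` and `LazySession` and the in-memory `rows` dictionary standing in for the sessions table are hypothetical, chosen only for illustration.

```python
import uuid


class LazySessionStore:
    """Toy session store; `rows` stands in for the sessions table (assumption)."""

    def __init__(self):
        self.rows = {}

    def open(self, cookie_id=None):
        # Opening a session never writes to the table.
        return LazySession(self, cookie_id)


class LazySession:
    def __init__(self, store, session_id=None):
        self.store = store
        # Only keep the id if a row already exists for it.
        self.session_id = session_id if session_id in store.rows else None

    def get(self, key, default=None):
        # Reads on a session with no row simply return the default.
        if self.session_id is None:
            return default
        return self.store.rows[self.session_id].get(key, default)

    def set(self, key, value):
        # The first write allocates an id and creates the row.
        if self.session_id is None:
            self.session_id = uuid.uuid4().hex
            self.store.rows[self.session_id] = {}
        self.store.rows[self.session_id][key] = value


store = LazySessionStore()

# An anonymous visitor only reads, so no row is created.
anonymous = store.open()
anonymous.get("uid")

# A visitor who logs in writes to the session, which creates the row.
member = store.open()
member.set("uid", 42)
```

In this sketch, a surge of anonymous readers adds nothing to the table; only visitors who actually store something (for example, by logging in) cost a write.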

Relaunching the sites with Pressflow drastically improved performance and stability. Both improved further once we implemented and configured Varnish, a caching reverse proxy server, which allowed the existing hardware infrastructure to scale further.
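
The role of a caching reverse proxy can be sketched in a few lines of Python. This is a conceptual illustration only, not Varnish configuration and not the setup used on this project: the `CachingProxy` class, the 60-second lifetime, and the rule "cache GET requests that carry no cookie whose name starts with SESS" are all assumptions made for the example.

```python
import time


class CachingProxy:
    """Toy reverse proxy: caches GET responses for requests without a session cookie."""

    def __init__(self, backend, ttl_seconds=60, session_prefix="SESS"):
        self.backend = backend                # callable standing in for the web servers
        self.ttl = ttl_seconds                # assumed cache lifetime
        self.session_prefix = session_prefix  # assumed session cookie name prefix
        self.cache = {}                       # path -> (expiry time, body)

    def handle(self, method, path, cookies, now=None):
        now = time.time() if now is None else now
        has_session = any(name.startswith(self.session_prefix) for name in cookies)
        cacheable = method == "GET" and not has_session

        if cacheable:
            entry = self.cache.get(path)
            if entry is not None and entry[0] > now:
                return entry[1]               # cache hit: backend is not contacted

        body = self.backend(method, path, cookies)
        if cacheable:
            self.cache[path] = (now + self.ttl, body)
        return body


calls = []


def backend(method, path, cookies):
    calls.append(path)
    return f"page for {path}"


proxy = CachingProxy(backend, ttl_seconds=60)
proxy.handle("GET", "/news", {}, now=0)                 # miss: goes to the backend
proxy.handle("GET", "/news", {}, now=10)                # hit: served from cache
proxy.handle("GET", "/news", {"SESSabc": "1"}, now=20)  # session cookie: bypasses cache
proxy.handle("GET", "/news", {}, now=100)               # entry expired: backend again
```

Under these assumptions, the fewer visitors who carry a session cookie, the larger the share of requests that can be answered without reaching the web servers, which is one reason a proxy cache pairs well with the lazy sessions described above.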

Minor tweaks were needed to address specific functional requirements of the client’s sites, including maintaining accurate read count data and serving video from specific non-standard video providers. Even so, Segue was able to provide our customer with sites that can handle hundreds of thousands of hits each day without crashing. The new setup also limits the number of servers needed to support the sites, reducing the customer’s recurring hosting costs.