PHP Classes

Accelerate Page Accesses Throttling Background Tasks: Unusual Site Speedup Techniques: Part 2



Categories: PHP Tutorials, PHP Performance

Web server machines usually do many other things besides serving Web pages. Often they have background tasks running that may significantly slow down the Web server processes that deliver the pages to the site users.

This article is the second part of the Unusual Site Speedup series. It focuses on how to make sure that Web server processes run as fast as possible by slowing down background tasks that may be taking too many machine resources.





Contents

Introduction

Server side background processes

CPU Throttling of background processes

CPU throttling in PHP

The system monitor class

Throttling external programs

Other server excessive load problems

Conclusion


Introduction

More often than not, the speed of a site is a major factor in determining its success. When I mention the speed of a site, I am talking about the user perception of whether the site pages are being served fast enough or not.

In the previous article I talked about one important factor that often seriously affects the user perception of the speed of a site: the presence of content from external sites, such as advertising and widgets, that slows down the loading of pages.

In that article I presented a technique that I am using to keep external content from affecting the user perception of the site speed. In this article I am addressing another factor that may also affect the user perception of site speed, but this time it is related to the server side environment.

Server side background processes

Being able to serve dynamic Web pages fast enough does not depend on just the Web server and the code of the language that generates the pages.

On a Web server machine there are usually many processes running in parallel. Sometimes those other processes have nothing to do with serving Web pages. Let's call those unrelated processes background processes.

Often those background processes greatly outnumber the Web server processes. Sometimes they consume even more CPU and other machine resources than the actual Web server processes.

Background processes may even perform tasks that are important for the site but do not influence the speed of Web page delivery. Examples of such tasks are delivering newsletters, running backups, crawling the site pages to update the search engine index, and other maintenance tasks that have to be performed regularly on the site.

The problem is that all those processes competing for the available CPU cores often end up affecting the speed of generation and delivery of Web pages. Therefore it is very important to make sure those other processes do not significantly affect the performance of the main Web server processes. Otherwise the user perception of site speed will noticeably suffer.

CPU Throttling of background processes

One obvious way to keep background processes from affecting the speed of Web server processes is to buy or rent more server machines and push those heavy tasks to separate machines. That could solve the problem, but obviously it would make the site more expensive to maintain.

Actually, moving heavy background processes to separate machines is the way to go if those processes consume too much CPU and other resources all day long. However, that is often not the case.

For instance, if you have a newsletter to deliver to the registered users of your site, it usually takes minutes, or hours at most, to deliver it to all users. The script that performs the delivery of the newsletter may actually take a lot of CPU until it is done. On the other hand, the newsletter delivery may not be urgent. Therefore, if we slow it down once in a while, it will probably not be a problem. In the worst case, the users will get the newsletter somewhat later.

The technique of controlling the speed of a program by slowing it down when necessary is called CPU throttling.

CPU throttling in PHP

One way to slow down the delivery of newsletters, or any repetitive process running in the background, is to insert code in the repetitive loops that pauses the script for a while. In PHP that can be done, for instance, with the sleep() function. That function simply waits for a given number of seconds, effectively pausing the PHP script. During that time the OS will give the CPU to other processes that need it.

Now that we know how to pause a PHP script, the only remaining question is when we should do it, so that the background script does not consume so much CPU that it affects the main Web server processes. The answer to that question is the secret that this article reveals.

First you need to understand how operating systems schedule processes. Most modern operating systems implement what is called preemptive multi-tasking. When several processes are running in the system, they all need the CPU to execute their code, so they have to wait in a queue for their turn.

Even if your machine has multiple CPUs or multiple CPU cores, there are usually more processes needing the CPU to run than there are CPUs. Therefore, the OS has to rotate the available CPU cores among all running processes. The way a preemptive multi-tasking OS does it is to allocate a time slot for each process. When the time slot ends, the OS suspends that process and allocates another time slot to the next process needing the CPU.

When the machine is running too many heavy processes, the average number of processes waiting to get their CPU core time slot is high. Therefore, we need to reduce the number of processes waiting for the CPU to lower the load of the machine.

In OSes like Linux there is an easy way to determine the current average number of processes running or waiting for the CPU. Linux makes the load average statistics available through the file /proc/loadavg (on Linux the figures also count processes blocked in uninterruptible waits, such as disk I/O). This is not a real file stored on a disk. It is a system variable exposed in the /proc virtual file system, which you can read using regular file access calls. In PHP you can use fopen() and fread(), or file_get_contents(), to access it.
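As a minimal sketch of reading that file, the code below parses the three load averages out of /proc/loadavg. The function names parse_load_averages() and read_load_averages() are hypothetical helpers made up for this illustration. They are not part of the system monitor class.

```php
<?php
// Sketch only: parse_load_averages() and read_load_averages() are
// hypothetical helper names, not part of the system monitor class.

/**
 * Parse the contents of /proc/loadavg, for example
 * "0.42 0.36 0.30 1/123 4567", into the 1, 5 and 15 minute averages.
 * Returns null when the text does not have the expected format.
 */
function parse_load_averages(string $contents): ?array
{
    $fields = preg_split('/\s+/', trim($contents));
    if (count($fields) < 3
        || !is_numeric($fields[0])
        || !is_numeric($fields[1])
        || !is_numeric($fields[2])) {
        return null;
    }
    return [
        1  => (float) $fields[0],
        5  => (float) $fields[1],
        15 => (float) $fields[2],
    ];
}

/** Read the current load averages, or null if they are unavailable. */
function read_load_averages(string $path = '/proc/loadavg'): ?array
{
    $contents = @file_get_contents($path);
    return $contents === false ? null : parse_load_averages($contents);
}
```

On a Linux machine, read_load_averages()[1] would give the average for the last minute. PHP also has a built-in sys_getloadavg() function that returns the same three values as an array, which may be simpler when it is available on your system.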

The system monitor class

So the solution to implement CPU throttling in PHP scripts under Linux is quite simple. We just read the loadavg file regularly. If it says the current average load is too high, we call the PHP sleep() function to wait for a while. Then we check the loadavg file again. If the CPU load is still too high, we wait again, and so on until the load is low enough.
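The check and sleep cycle described above can be sketched as a small function. This is only an illustration under stated assumptions: wait_for_low_load() is a hypothetical name, and the load reader and the sleep function are passed in as parameters so the loop can be tried without a loaded machine. It is not the code of the system monitor class.

```php
<?php
// Sketch only: wait_for_low_load() is a hypothetical helper.

/**
 * Pause while the load reported by $readLoad is above $limit.
 * $readLoad must return a number, or null when the load cannot be
 * read (in that case the function does not wait).
 * Returns how many times the script had to pause.
 */
function wait_for_low_load(
    callable $readLoad,
    float $limit,
    int $pauseSeconds = 60,
    ?callable $sleep = null
): int {
    $sleep = $sleep ?? 'sleep'; // default to the PHP sleep() function
    $pauses = 0;
    while (($load = $readLoad()) !== null && $load > $limit) {
        $sleep($pauseSeconds);
        ++$pauses;
    }
    return $pauses;
}

// Possible use inside a work loop, reading the one minute average
// with the built-in sys_getloadavg() function:
//
// wait_for_low_load(function () {
//     $avg = sys_getloadavg();
//     return $avg === false ? null : $avg[0];
// }, 4.0);
```

Treating an unreadable load as "do not wait" is a design choice of this sketch. A stricter script might prefer to stop with an error instead.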

To simplify this process, many years ago I developed a class named system monitor that does all the checking of the CPU load for you. I have just published the system monitor class on the PHPClasses site, so everybody can benefit from it.

As usual, it is made available under the BSD license. So you are free to use, distribute and modify it at will, as long as you preserve the original copyright notices. I just ask you to preserve any URLs that you find in the class file, so other people may figure out where to get the latest updates to the class.

The way to use this class to throttle your scripts is very simple. There is a function named Throttle. That function returns a value that tells whether the system is under excessive load. It considers a reference load limit that you may define by setting the cpu_load_limit class variable.

If the Throttle function tells us that the CPU load is too high, we just call the sleep() function and repeat the process until the CPU load becomes low again.

So, the only remaining factor to decide is the reference load limit value, above which the CPU load is considered too high and below which it is considered low enough.

I use the following criterion. I take the number of CPU cores available in the machine and multiply it by 2. So, if your server machine has 1 CPU with 2 cores, the CPU load limit should be set to 2 x 2 = 4. That means that if there are, on average, 2 or more processes competing for each CPU core, the background processes should sleep for a while.
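The rule above can be sketched in a few lines. The helper names are hypothetical, and the sketch assumes the common Linux layout of /proc/cpuinfo in which every logical processor has its own "processor : N" line. On machines with hyper-threading that count may be higher than the number of physical cores.

```php
<?php
// Sketch only: count_cpu_cores() and default_load_limit() are
// hypothetical helper names.

/** Count the "processor" entries in the text of /proc/cpuinfo. */
function count_cpu_cores(string $cpuinfo): int
{
    return (int) preg_match_all('/^processor\s*:/m', $cpuinfo);
}

/** Load limit following the rule above: number of cores times 2. */
function default_load_limit(int $cores, int $factor = 2): int
{
    // Assume at least one core when the count could not be determined.
    return max(1, $cores) * $factor;
}

// Possible use on Linux:
//
// $cpuinfo = (string) @file_get_contents('/proc/cpuinfo');
// $limit = default_load_limit(count_cpu_cores($cpuinfo));
```

The resulting value is what you would assign to the cpu_load_limit variable of the class.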

Other factors that you may want to tune are the periods that determine for how long your process should sleep when the load is too high and how often you should check the CPU load to determine if the system is too heavy or light enough.

The Linux loadavg file provides 3 CPU load average statistics: one for the last minute and two others for the last 5 and 15 minutes. By default the system monitor class uses the one for the last minute, but you may configure it to change that.

Given the CPU load average of the last minute, it seems reasonable to poll the CPU load variable at least once every 60 seconds. You may poll it more often if you want to react faster to system load surges. It also seems reasonable to sleep for 60 seconds when the system load is too high, but you may also wait less time if you do not want the background processes to be delayed too much.
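If your work loop iterates many times per second, you probably do not want to read the load on every iteration. One possible way to limit the polling rate is sketched below. LoadPoller is a hypothetical class written only for this illustration, and the 60 second default simply mirrors the interval suggested above.

```php
<?php
// Sketch only: LoadPoller is a hypothetical class, not part of the
// system monitor class.

final class LoadPoller
{
    private $readLoad;
    private $interval;
    private $lastCheck = null;
    private $lastLoad = null;

    public function __construct(callable $readLoad, int $intervalSeconds = 60)
    {
        $this->readLoad = $readLoad;
        $this->interval = $intervalSeconds;
    }

    /**
     * Return the load, reading it again only when the polling interval
     * has elapsed since the previous read. $now is a Unix timestamp
     * and defaults to the current time.
     */
    public function load(?int $now = null): ?float
    {
        $now = $now ?? time();
        if ($this->lastCheck === null
            || $now - $this->lastCheck >= $this->interval) {
            $this->lastLoad = ($this->readLoad)();
            $this->lastCheck = $now;
        }
        return $this->lastLoad;
    }
}
```

A shorter interval reacts faster to load surges, at the cost of reading the load more often.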

Your CPU intensive task loop may look like this:


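$monitor = new system_monitor_class;

/* Load limit: 2 CPU cores x 2 */
$monitor->cpu_load_limit = 4;

/* Work loop starts here */
for($i = $start; $i < $end; ++$i)
{
    /* do some heavy work here, say
       deliver a bunch of newsletter messages */

    /* Wait here while the system is under excessive load */
    for(;;)
    {
        /* A false return value means the check itself failed */
        if(!$monitor->Throttle($cause))
            die('Fatal error: '.$monitor->error);

        /* No reason to throttle: go back to work */
        if($cause == THROTTLE_CAUSE_NONE)
            break;

        sleep(60);
    }
}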

Throttling external programs

Throttling PHP scripts becomes easy following the instructions above. However, you can only apply this method to scripts whose code you control. If you rely on an external program that performs a heavy task but you do not control its code, perhaps because it was written in another language, this solution will not work for throttling that external program.

The PHPClasses site started having that problem because the internal search engine relies on an external program named HtDig. This is a very old Open Source program written in C++. It is used to crawl and index the whole site. That process takes a long time because there are over 50,000 pages to crawl.

Maybe it would be better to develop a replacement search engine in PHP, so it would be easier to throttle the crawling code. Unfortunately that is not a viable solution right now, as it would take a long time to redo the integration of HtDig with PHPClasses search related features.

Fortunately I figured out a different approach. In Linux, there is an easy way to put currently running programs to sleep for a while, as long as you are running as the same OS user that started the program or as the root user.

The idea consists of sending a signal named SIGSTOP to the running background process. That suspends the process until it gets another signal named SIGCONT. In PHP you can send signals to processes started with proc_open() using the proc_terminate() function. Despite its name, that function does not necessarily terminate the program. Just specify the right signal to send.

If you want to start and control the execution of a process in PHP and leave the program running in the background, you can use the proc_open() function.
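A minimal sketch of the signal approach with proc_open() and proc_terminate() follows. It is not the code of the system monitor class, and all helper names are hypothetical. It assumes Linux and PHP 7.4 or later, because it passes the command to proc_open() as an array. In that form the program is started directly instead of through a shell, so the signals reach the program itself. The numeric fallbacks 19 and 18 are the usual Linux values of SIGSTOP and SIGCONT on x86 and ARM, used only when the pcntl constants are not defined.

```php
<?php
// Sketch only: all function names below are hypothetical helpers.

function stop_signal(): int
{
    return defined('SIGSTOP') ? constant('SIGSTOP') : 19;
}

function cont_signal(): int
{
    return defined('SIGCONT') ? constant('SIGCONT') : 18;
}

/** Start a program in the background, discarding its input and output. */
function start_background(array $command)
{
    $descriptors = [
        0 => ['file', '/dev/null', 'r'],
        1 => ['file', '/dev/null', 'w'],
        2 => ['file', '/dev/null', 'w'],
    ];
    return proc_open($command, $descriptors, $pipes);
}

function suspend_process($process): bool
{
    return proc_terminate($process, stop_signal());
}

function resume_process($process): bool
{
    return proc_terminate($process, cont_signal());
}

/**
 * While the program runs, suspend it when the load is above $limit
 * and resume it when the load drops again. $readLoad returns a number
 * or null when the load cannot be read.
 */
function throttle_process($process, callable $readLoad, float $limit, int $pauseSeconds = 60): void
{
    $suspended = false;
    while (proc_get_status($process)['running']) {
        $load = $readLoad();
        $tooHigh = $load !== null && $load > $limit;
        if ($tooHigh && !$suspended) {
            $suspended = suspend_process($process);
        } elseif (!$tooHigh && $suspended) {
            $suspended = !resume_process($process);
        }
        sleep($pauseSeconds);
    }
}
```

When the command is passed to proc_open() as a single string, PHP runs it through a shell, so the signal may suspend the shell rather than the program you meant to throttle.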

Recently I improved the system monitor class to use these functions and automate the whole process of running an external program from PHP and throttling its execution. All you need to do is call the ThrottleExecute function, passing it the command line of the external program you want to run. You can pass additional parameters to control other aspects, like capturing the output of the command.

Your external program throttling script may look like this:


$monitor = new system_monitor_class;
$monitor->cpu_load_limit = 4;

/* Command line of the external program to run and throttle */
$command = 'path_to_external_command command parameters';

$parameters = array(
    'Command' => $command,
    /* capture the command output as a string */
    'CaptureOutput' => 'string',
);

if(!$monitor->ThrottleExecute($parameters))
    die('Fatal error: '.$monitor->error);

/* A non-zero exit status means the command failed */
if($parameters['Status'] != 0)
    die('the command failed for some reason');

do_something_with_output($parameters['Output']);

Other server excessive load problems

If you have accessed the PHPClasses site in recent weeks, specifically since the OpenID authentication was implemented, you may have had difficulty accessing it during certain periods of the day. One of the causes was excessive load from too many background processes running at the same time without being throttled.

Fortunately the system monitor class solved most of the issues, but that took me time because it is not easy to figure out which parts of the scripts running in the background need to be throttled. Things are much better now, but there may still be moments of the day when the server load is too high. I will continue to monitor the server and figure out what else needs to be throttled.

Another issue that was causing overwhelming server load, but was not related to excessive CPU usage, was due to a configuration mistake. I would like to comment on it because it was so subtle that it took me weeks to realize what was wrong. So it may happen to you as well.

It started happening when I launched the OpenID authentication site and then the JSClasses site. I thought it would be a good idea for security reasons to use a different database user for each of the sites, even though they were all running on the same server.

I thought that if some cracker invaded one site and manipulated its database, he would not be able to access the databases of the other sites, as they would not allow the same database user to access them. The idea seems wise in terms of security but the consequences were catastrophic in terms of server resource usage.

The problem is that the sites use persistent database connections to access the database server. Persistent connections are good to save connection time overhead. However, if you use different users to access the same database server, PHP establishes a separate connection for each database user.

Therefore the Web server running PHP established 3 times more database connections than it would have if all 3 sites used the same database user. As a consequence, all those database connections were even crashing the database server during peak access hours. The database server would only recover after a few minutes.

I was noticing too many database connections but could not figure out why, because the number of Web server processes running was only 1/3 of the number of connections. Unfortunately it was only after a few weeks that I noticed that each Web server process was establishing a separate connection for each database user.
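The arithmetic behind this can be sketched in a few lines. The function name and the figure of 100 Web server processes are hypothetical, chosen only to illustrate the multiplication. They are not measurements from the PHPClasses server.

```php
<?php
// Sketch only: a back-of-the-envelope estimate with hypothetical numbers.

/**
 * Upper bound on persistent connections: each Web server process may
 * keep one open connection per distinct database user it has served.
 */
function max_persistent_connections(int $serverProcesses, int $databaseUsers): int
{
    return $serverProcesses * $databaseUsers;
}

// Hypothetical figure: 100 Web server processes.
echo max_persistent_connections(100, 1), "\n"; // one shared database user
echo max_persistent_connections(100, 3), "\n"; // one database user per site
```

With one shared user the bound equals the number of Web server processes. With three users it is three times as high, which matches the ratio described above.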

I fixed it quickly and made all the sites use the same database user, but it was frustrating to realize the problem only several weeks after it started. A bad consequence for the content of the site, besides the frustration of the users, is that the number of submitted packages dropped dramatically. Site contributors were giving up submitting their packages to the site. Now the contribution rate is almost back to previous levels, but it is definitely a hard learned lesson.

I suppose most PHP Web hosting companies have similar problems because customers sharing the same Web server have to use different database users. In their case they cannot allow their customers to use the same database user, or else the customers would be able to steal each other's data.

Usually they push their clients to upgrade their plans to a VPS or a dedicated server once their sites start taking too many server resources, but I wonder if there would not be a better solution at the PHP or Web server configuration level. Maybe, for instance, automatically closing unused database connections after a while, even if they were started as persistent connections. I just could not figure out any option to do that for MySQL. I think there is something like that for Oracle in the PHP configuration, though.

Conclusion

Well, this was yet another article on Unusual Site Speedup Techniques. If you were having similar load problems in your Web servers, I hope the throttling solution helps you to balance and squeeze more processing work into your Web server machines. It certainly did for the PHPClasses site.

The system load checking solution works well under Linux because that is what the PHPClasses and many other sites use in their servers. Therefore I did not bother to find a solution that also works under Windows or other Unix-like OSes besides Linux. If you know how to check the CPU load average or suspend and resume processes on those other systems using PHP, just let me know. Maybe I can update the system monitor class to make it useful to more PHP developers.

For now that is all. Until the next article of the Unusual Site Speedup Techniques series, feel free to send your questions and opinions, posting a comment to this article.




Comments:

6. operating system scheduling priority - Eike (2010-10-27 07:53)
Using operating system scheduling priority... - 1 reply

5. Very good article - carlos eduardo (2010-10-27 01:53)
which is a summary?... - 1 reply

4. yeah right - Sergey Shilko (2010-10-27 01:49)
too unusual... - 1 reply

3. Excellent idea! - Moises Jafet (2010-10-26 16:46)
Thanks Manuel for sharing your experiences!... - 0 replies

2. Accelleration - claudio klemp (2010-10-26 16:45)
server accelleration, ajax, compression... - 0 replies

1. I see the difference in site speed - Barton Phillips (2010-10-25 21:47)
The site seems to be much faster now... - 1 reply


Trackbacks:

1. PHPClasses.org Blog: Throttling Background Tasks: Unusual Site Speedup Techniques: Part 2 (2010-10-26 07:24)
On the PHPClasses.org blog Manuel Lemos has posted part two of his look at techniques to help speed up your site - a few things that you maybe hadn’t thought of before...


