PHP Classes

Accelerate scripts running multiple tasks in parallel using asynchronous programming: Unusual Site Speedup Techniques: Part 3


Author:


Categories: PHP Tutorials, PHP Performance

Asynchronous programming allows developers to write faster code by running multiple tasks in parallel.

This contrasts with traditional synchronous programming, in which a program starts a task and does nothing but wait for it to finish before proceeding to the next task.

This article explains what asynchronous programming is and how you can implement it in PHP to develop code that executes tasks much faster.





Contents

Introduction

What is asynchronous programming?

Implementing asynchronous programming in practice

Asynchronous programming in JavaScript

Server side asynchronous programming in JavaScript with Node.js

Asynchronous programming in PHP

Conclusions


Introduction

Many of the lengthy tasks that server side scripts execute are tasks in which the scripts wait for some other program, hardware device or remote computer to respond.

For instance, when your script accesses a database, it has to establish a connection and then execute database queries. During most of the time your script spends establishing connections and executing queries, it does nothing else: it just waits for the request data to be sent to the database server and then for the database server to respond.

What if, instead of just waiting for the database server to receive or respond to requests, your script could do something else in parallel? That is the premise of asynchronous programming.

What is asynchronous programming?

Imagine that you are cooking dinner. Let's say your dinner will consist of a nice soup as a starter, roast beef as the main dish and, at the end, cake for dessert.

You can prepare and cook each part of your dinner one by one. But since each part has to be cooked on the stove or in the oven, instead of just sitting and waiting while one part is cooking, you could prepare another part and put it on the stove or in the oven at the same time.

You could call this practice of cooking multiple parts of your dinner in parallel "asynchronous cooking". It is logical and more efficient, as you can finish cooking your dinner in much less time if you manage the cooking of multiple dinner parts in parallel.

Asynchronous programming works in a similar way. Imagine that you have a Web site page built from content retrieved from a database by running several queries. Usually you run one query, wait for the response, retrieve the results into some variables, run another query, wait for the response, retrieve the results into some variables, and so on. Only at the end do you assemble all the data to generate the resulting HTML page.

Now imagine that, instead of waiting for the results, you could immediately start the next query while the database has not yet returned the response to the first one. You could gain a lot of time, for precisely the same reason you gain time by cooking multiple dinner parts at the same time.

Implementing asynchronous programming in practice

In low level programming languages you need to make calls to the underlying operating system to know when an event your program is waiting for finally happens, so you can trigger the appropriate action. If your program is waiting for multiple types of events, it may run a loop that periodically polls different system resources to check whether any of those events happened.

It would be like checking the different parts of your dinner cooking on the stove or in the oven to see if any of them is done before you move on to the next cooking steps.

Higher level programming languages usually provide a simpler way to implement asynchronous programming. Usually you call some function to register a callback function, which is invoked when some event happens. You can register multiple callback functions to be called when different types of events happen.

Implicitly there is some event loop running to check when any of the events your program is interested in have happened, but you do not have to take care of that directly in your program. The runtime environment runs the event loop and calls your callback functions when necessary.
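To illustrate what the runtime does behind the scenes, here is a toy event loop written in JavaScript. The functions on(), emit() and runLoop() are hypothetical names chosen for this sketch; real runtimes such as browsers and Node.js implement their event loops natively and learn about events from the operating system, not from an emit() call. The sketch only shows the two halves of the idea: programs register callbacks, and a loop dispatches each pending event to the callbacks registered for it.

```javascript
// Toy event loop (hypothetical, for illustration only).
const handlers = {}; // event type -> list of registered callback functions
const queue = [];    // events that happened but were not dispatched yet

// Register a callback for an event type, similar in spirit to addEventListener().
function on(type, callback) {
  (handlers[type] = handlers[type] || []).push(callback);
}

// Record that an event happened. In a real runtime the operating system reports this.
function emit(type, data) {
  queue.push({ type: type, data: data });
}

// Dispatch queued events in order until none are left,
// calling every callback registered for each event type.
function runLoop() {
  while (queue.length > 0) {
    const event = queue.shift();
    (handlers[event.type] || []).forEach(function (callback) {
      callback(event.data);
    });
  }
}

const log = [];
on('query-done', function (rows) { log.push('got ' + rows); });
on('file-read', function (text) { log.push('read ' + text); });
emit('file-read', 'config');
emit('query-done', '3 rows');
runLoop();
console.log(log); // [ 'read config', 'got 3 rows' ]
```

Note that the program code never checks for events itself: it only registers callbacks, and the loop decides when to call them.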

Asynchronous programming in JavaScript

One example of a high level programming language that provides built-in support for asynchronous programming is JavaScript. On the browser side you often define which code is called when different types of events happen.

You can do that, for instance, by assigning the ON attributes of page tags (ONCLICK, ONMOUSEOVER, ONCHANGE, etc.).


<div id="mydiv" onclick="alert('Click!');">Click me!</div>

Alternatively, you can call methods of page element objects, like addEventListener, to register event callback functions.


<div id="mydiv">Click me!</div>

<script type="text/javascript">
document.getElementById('mydiv').addEventListener(
  'click',
  function()
  {
    alert('Click!');
  },
  false);
</script>

Server side asynchronous programming in JavaScript with Node.js

The examples above show how it works on the browser side. Now you may wonder: server side scripts usually just run a clear set of instructions and then exit, so does it make sense to use asynchronous programming in server side applications?

As I explained above, it makes sense when you need to run several tasks that could run in parallel. In the earlier example of retrieving the results of multiple queries, asynchronous programming could make the whole job finish faster.
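To make the multiple queries example concrete, here is a minimal sketch in server side JavaScript for Node.js, presented in the next paragraphs. It assumes a hypothetical fakeQuery() function that stands in for a real database driver by answering after a delay set with setTimeout(); runSequential() and runParallel() are also illustrative names, not part of any library. Run one after another, the queries take roughly the sum of their delays; started all at once, they take roughly as long as the slowest one.

```javascript
// Hypothetical stand-in for a database driver: "answers" a query after `ms`
// milliseconds. A real driver would call back when the server responds.
function fakeQuery(name, ms, callback) {
  setTimeout(function () {
    callback(null, name + ' rows');
  }, ms);
}

// Sequential, like a synchronous script: start each query only after
// the previous one has answered.
function runSequential(queries, callback) {
  const results = [];
  function next(i) {
    if (i === queries.length) return callback(null, results);
    fakeQuery(queries[i].name, queries[i].ms, function (error, rows) {
      if (error) return callback(error);
      results.push(rows);
      next(i + 1);
    });
  }
  next(0);
}

// Parallel: start every query at once and call back when the last one
// has answered, keeping the results in the original query order.
function runParallel(queries, callback) {
  const results = new Array(queries.length);
  let pending = queries.length;
  let failed = false;
  if (pending === 0) return callback(null, results);
  queries.forEach(function (query, i) {
    fakeQuery(query.name, query.ms, function (error, rows) {
      if (failed) return; // an earlier query already reported an error
      if (error) {
        failed = true;
        return callback(error);
      }
      results[i] = rows;
      pending -= 1;
      if (pending === 0) callback(null, results);
    });
  });
}

const queries = [
  { name: 'articles', ms: 100 },
  { name: 'comments', ms: 100 },
  { name: 'authors', ms: 100 }
];

let started = Date.now();
runSequential(queries, function (error, results) {
  console.log('sequential', results, (Date.now() - started) + ' ms');
  started = Date.now();
  runParallel(queries, function (error, results) {
    console.log('parallel', results, (Date.now() - started) + ' ms');
  });
});
```

The parallel version needs a little bookkeeping (the pending counter and the result index) that the sequential version does not, which is part of the price of asynchronous code.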

A higher level approach of using callback functions to handle events on the server side would be great. JavaScript is traditionally used on the browser side, but for several years now there have been JavaScript solutions for developing server side scripts that use event callback handlers to implement asynchronous programming.

One such solution is Node.js. Node.js is a framework that runs on top of the Google V8 JavaScript engine, the same JavaScript engine that ships with the Google Chrome browser. V8 is very fast because it uses advanced techniques to compile JavaScript into machine code native to your CPU.

But V8 does not need the Google Chrome browser. It is a JavaScript engine that can be embedded in other programs, and Node.js is one such program. Node.js extends V8 with many JavaScript modules for operations that are more typical of server side programming, like accessing files and communicating over a network, while add-on modules give access to databases.

Nowadays, Node.js comes with so many modules that you can implement most of the tasks you would implement in PHP for server side applications. To some extent you can see Node.js as the PHP for JavaScript: you can develop the same server side applications in JavaScript with Node.js as you can with PHP.

Actually, that was one of the topics discussed in the first episode of the Lately in JavaScript podcast, which takes place once a month on the JSClasses site.

The big difference between Node.js and PHP is that with Node.js, by default, everything is asynchronous. This means that any Node.js function your program would otherwise have to wait for returns immediately, and Node.js invokes a callback function passed by your program when the actual task ends.

For instance, if you want to open a file, read from or write to a file, list a directory, query a DNS server to resolve some address, send a request to a remote server, open a connection to a database server, execute a database query, etc., you always have to pass a callback function to handle the result.
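The following sketch, which uses only the standard Node.js fs module, shows that such a call returns before its work is done: the statement after fs.stat() runs first, and the callback runs later, once the file system information is available. The steps array exists only to record the order of events.

```javascript
const fs = require('fs');

const steps = [];

// fs.stat() only starts the work and returns at once. Node.js calls the
// callback later, after the code that follows the call has already run.
fs.stat('.', function (error, stats) {
  if (error) throw error;
  steps.push('callback: is directory = ' + stats.isDirectory());
  console.log(steps); // [ 'stat() returned', 'callback: is directory = true' ]
});

steps.push('stat() returned');
```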

Here is a simple example that uses Node.js to list the current directory.


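var fs = require('fs');

// readdir() returns immediately; Node.js calls the function below later,
// passing either an error or an array with the names of the directory entries.
fs.readdir(".", function(error, files)
{
  if(error)
    throw error;
  files.forEach(function(file)
  {
    console.log("Path: " + file);
  });
});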


If you want to take a look at a more complex situation and see how it compares with traditional synchronous programming done in PHP, you may want to check this Node.js module that validates an e-mail address by checking the destination SMTP server. It is a port of an e-mail address validation class that I wrote in PHP many years ago.

Asynchronous programming in PHP

Unlike Node.js, in PHP everything is synchronous by default. This means that when you call any function, PHP waits for the function to finish its job and return the result.

Currently PHP does not provide any built-in support for asynchronous programming using callbacks. There is some support for implementing asynchronous programming in PHP with the function stream_select. This function lets PHP scripts poll opened files or network sockets to see whether reading or writing data would block. Scripts can also tell the function to wait up to a given timeout until data is received or a stream is ready to send more data.

This is clearly insufficient for full blown asynchronous programming in PHP. Node.js supports many more types of asynchronous operations, like file system operations (listing files, checking permissions, etc.) as well as accessing some types of databases.

The limitation is not in the PHP language itself. It is rather that the PHP extensions could provide built-in asynchronous support, but currently they don't.

An alternative solution for asynchronous programming in PHP is to use Gearman. For those not familiar with it, Gearman is a middleware program that acts as a server that takes calls to functions implemented by worker programs running in the background.

The worker programs themselves can be written in PHP, so they can implement any functionality you need. They can run on the same machine or on remote machines. You may want to read this article written by César Rodas about how to use Gearman with PHP to learn more about Gearman.

One useful Gearman feature that can enable asynchronous programming in PHP is support for calling jobs asynchronously. This means that when a job is called, the calling script does not have to wait for the job to finish. Collecting the results of asynchronous jobs is possible but tricky, so this solution is not ideal.

Another solution is to use the PHP pcntl extension to fork new processes that run PHP code in parallel. While this may work to some extent, it is expensive, because forking new processes costs significant CPU time and memory.

Conclusions

As you may understand by now, asynchronous programming, while possible, is currently not a natural thing to do in PHP. As I mentioned above, the limitation is not in the PHP language itself. Some PHP extensions would need to evolve to support asynchronous programming in a more natural way, like you see in Node.js.

PHP got closure support in version 5.3. Closures are basically anonymous functions that can be defined right in the call to a function that takes a callback handler. This was a good step in PHP's evolution, as it makes callback based asynchronous programming more developer friendly. Previously you usually had to declare named functions to use them as callbacks.

On the other hand, asynchronous programming using callbacks, as you see in Node.js, is tricky. It completely changes the way you are used to thinking about the control flow of your code. For instance, if you want to perform a loop in which what you do on each iteration depends on the result of an event that triggers a call to a callback function, that is hard to implement for developers who are used to writing only synchronous code.
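As an illustration of that difficulty, here is a sketch of such a loop written with callbacks in Node.js. It assumes a hypothetical fetchPage() function that delivers one page of results asynchronously and an empty page when there is no more data. Because each iteration depends on the previous result, the loop cannot be a plain while statement; it becomes a function that schedules its own next iteration from inside the callback.

```javascript
// Hypothetical asynchronous data source: delivers page `n` of results after a
// short delay. Pages 0 to 2 hold two items each; page 3 is empty (end of data).
function fetchPage(n, callback) {
  setTimeout(function () {
    const items = n < 3 ? ['item' + (2 * n), 'item' + (2 * n + 1)] : [];
    callback(null, items);
  }, 10);
}

// With a blocking fetchPage() this would be an ordinary while loop. With
// callbacks, each iteration must be started from inside the previous
// iteration's callback, because it depends on that iteration's result.
function fetchAll(callback) {
  const all = [];
  function step(n) {
    fetchPage(n, function (error, items) {
      if (error) return callback(error);
      if (items.length === 0) return callback(null, all); // loop exit condition
      all.push(...items);
      step(n + 1); // next iteration
    });
  }
  step(0);
}

fetchAll(function (error, items) {
  if (error) throw error;
  console.log(items.length + ' items: ' + items.join(', '));
  // prints "6 items: item0, item1, item2, item3, item4, item5"
});
```

Errors also have to be passed along by hand at every step, since a try/catch around the first call would not catch anything thrown inside a later callback.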

There are techniques to work around those oddities of asynchronous programming, but they are outside the scope of this article. Here is an article that gives you an idea of how to deal with the control flow challenges of asynchronous programming.

It takes time to get used to asynchronous programming with callbacks and to become productive developing software that way.

In sum, I would say it will take some time until asynchronous programming becomes a more natural thing in PHP. Still, there are a few use cases in which it is worth doing, even with PHP's current limited support, all for the sake of optimizing to the extreme the performance of tasks that can be parallelized, just like cooking your dinner in the least time possible by putting multiple types of food in the oven or on the stove at the same time.





Comments:

2. Async programming in PHP on Windows - Richard Quadling (2010-11-21 20:05)
Multiple, "threaded" asynchronous, IPC and Windows. All with PHP... - 2 replies

1. libevent - jpaulo69 (2010-11-06 01:12)
PHP has this nice implementation of libevent... - 7 replies

4. the limitation is from php language - Yiyu Jia (2010-11-02 22:27)
the limitation is from PHP language itself.... - 3 replies

3. Leverage JS and Native PHP for Asynchronous Programming - David Gibbons (2010-11-01 23:42)
Leverage JS and Native PHP for Asynchronous Programming... - 1 reply

5. batch processing - Joe Fox (2010-10-29 22:12)
Curl has multi-fetch support... - 1 reply


