Rafał Wrzeszcz - Wrzasq.pl

Pork - fork() for PHP

Tuesday, 10 January 2012, 10:24

Multi-processing in PHP

You always hear how many things modern computers can do at once, how well they multi-task… but when it comes to your simple mailing script, it hangs while sending a few mails and can't go any further. How is it that you run your application on a powerful server machine, yet your script's flow is completely single-tasking? Well, machines and operating systems support multi-tasking, but you have to write your code in a specific way in order to benefit from it - you have to split your program into multiple separate tasks that can be executed in parallel. If you are a PHP developer and you need something for multi-tasking, here I give you Pork - fork() for PHP.

Parallel processing

Before we dive into an ocean full of processes from the thin river of a single process, let's take a brief look at how it all works. Without going too deep, it all comes down to the fact that your program is a process run by the CPU. It is a set of instructions that must be executed one by one, in the proper order. So the machine can't go further with your code until it finishes the previous instruction. This is what the execution of a process looks like.

In order to do many tasks at once and stop worrying about hanging jobs, you need to tell the system which instructions belong to a single flow and which can be executed without waiting - you need to group your routines into smaller sets that can be executed independently.

Multi-process versus multi-threading

There are two ways of doing that - multi-threading and multi-processing. The difference is critical. A thread is a sequence of instructions, while a process is a complete running instance of a program. Put simply - threads belong to processes. Each process has at least one thread, and usually just one.

Multithreading

A thread is an atomic part of your program - all its instructions are executed in order and each one has to wait for the previous one. If you define more threads, they can be executed in parallel. Since all your threads belong to your program, they are part of the same process - which means they share the same memory and are "aware" of each other at the code level.

Multiprocessing

The other way is to create processes. The difference is that creating a new process creates a completely new running process in the operating system - it's as if you ran a second instance of your program. The new process has its own, separate memory; execution of all processes is completely separate and at the code level they don't see the effects of each other's work.

For multi-tasking, threads are usually the more convenient way: you define the flow of your program and all of it works on the same data. When you use multiple processes, you additionally have to take care of inter-process communication.
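To make the "separate memory" point concrete, here is a minimal sketch using PHP's pcntl extension directly (CLI only). The helper name runInChild() is made up for this illustration and has nothing to do with Pork. The child changes a variable, but the parent never sees the change, because the child works on its own copy.

```php
<?php
// Sketch only: requires the pcntl extension, which is available in CLI builds.

// Hypothetical helper: run $task in a forked child and wait for it to finish.
function runInChild(callable $task)
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('Could not fork');
    }
    if ($pid === 0) {
        // Child process: run the task on a private copy of the parent's memory.
        $task();
        exit(0);
    }
    // Parent process: block until the child has exited.
    pcntl_waitpid($pid, $status);
}

$counter = 1;

runInChild(function () use (&$counter) {
    $counter = 100; // modifies the child's copy only
});

// The parent's variable is untouched by what the child did.
echo $counter, "\n";
```

If the two sides need to exchange data, they have to do it explicitly, for example through a file, a pipe or a socket.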

State of parallel processing in PHP

The problem is that in PHP you don't have multi-threading. PHP was designed as a language for generating content on a web server. So the web server (Apache, for example) takes care of handling threads and multiple requests at once, while PHP is just run for the single flow of a single request.

And if you need to write a service, you will rarely choose PHP - rather Python, JavaScript (Node.js), or just C/C++. But sometimes you simply have to (or want to) write it in PHP. Then you are stuck with multi-processing.

Here is where Pork comes in

And if you have come to that point, this is what Pork can do for you - it handles process management, allowing you to write your code for parallel execution without all that mess with forking, exit codes and BSE.

class MyProcess extends Pork\Process
{
    public function main()
    {
        // do something costly
    }
}

$process = new MyProcess();
// put that job in background
$process->start();

// continue your script flow - MyProcess::main() is executed in background

In a nutshell, this is pretty much the full basic use of Pork: define a process and call its start() method, which spawns a separate process that executes in the background. All the required process handling is defined in the Pork\Process class. To define your own process tasks, you have to use this class in one of the following ways.

Subclassing

The first one is visible in the example above - define your own processes as classes derived from Pork\Process. It has one abstract method - main() - which it calls after spawning the new process. So simply define a class and define your task as its public function main().
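If you wonder what such a base class has to do behind the scenes, below is a rough sketch of the pattern built on raw pcntl calls. MiniProcess and Job are hypothetical names invented for this example; this is not Pork's actual implementation, just the general shape of "fork, run main() in the child, remember the PID in the parent".

```php
<?php
// Sketch only: requires the pcntl extension (CLI). Not Pork's real code.

abstract class MiniProcess
{
    private $pid;

    // Subclasses put their task here; an integer return value becomes the exit code.
    abstract protected function main();

    public function start()
    {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Could not fork');
        }
        if ($pid === 0) {
            // Child: run the task and leave, never returning to the caller's flow.
            $code = $this->main();
            exit(is_int($code) ? $code : 0);
        }
        // Parent: remember the child's PID and return immediately.
        $this->pid = $pid;
    }

    // Blocks until the child ends and returns its exit code.
    public function wait()
    {
        pcntl_waitpid($this->pid, $status);
        return pcntl_wexitstatus($status);
    }
}

class Job extends MiniProcess
{
    protected function main()
    {
        usleep(100000); // pretend to do something costly
        return 7;
    }
}

$job = new Job();
$job->start();
$code = $job->wait();

echo "exit code: $code\n";
```

All the branching on the fork result lives in one place, which is exactly the kind of mess you don't want to repeat in every script.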

Callback

The other way is to use one of the pre-defined subclasses, one of which is Pork\Process\CallbackProcess. With it you can just pass a callback function to be executed in a separate process:

$process = new Pork\Process\CallbackProcess();
$process->setCallback(function()
{
    // don't waste my time
});
// put that job in background
$process->start();

// continue your script flow

Signals

Great, we now have plenty of new processes. But what to do with them? The basic way to control process execution is to send it a signal. There are many different signals and we won't dive into them too deeply - if you want, you can read man 7 signal.

The Pork\Process class provides a signal() method that allows you to send any signal you want to the target process. For the most common cases, though, there are pre-defined methods. stop() sends SIGTERM - a signal that tells the process that it should stop. kill() sends SIGKILL - which immediately terminates the process (you should prefer stop(), since SIGKILL doesn't give the process a chance to handle shutdown, like closing I/O handles or dumping data, but if you really need it, kill() is there for you). hup() sends SIGHUP, which is widely used as a soft-restart signal (for example, daemons reload their configuration and restart their listeners).

class MyProcess extends Pork\Process\ContinousProcess
{
    public function main()
    {
        declare (ticks=1);

        while (!$this->shutdown) {
            // do something here
        }

        // normal exit after shutdown
        return \Pork\Process::EXIT_NORMAL;
    }
}

$process = new MyProcess();
// put that job in background
$process->start();

// continue your script flow

// time to end
$process->stop();
// wait until process ends
sleep(1);
// something went wrong
if ($process->isRunning()) {
    $process->kill();
}
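For comparison, this is roughly what sending a signal looks like without any wrapper, using the pcntl and posix extensions directly. It is only a sketch of the mechanism that stop() is described as using, not Pork's code. The child installs no handler, so SIGTERM simply terminates it, and the parent can see that in the wait status.

```php
<?php
// Sketch only: requires the pcntl and posix extensions (CLI).

$pid = pcntl_fork();
if ($pid === -1) {
    fwrite(STDERR, "fork failed\n");
    exit(1);
}
if ($pid === 0) {
    // Child: a long-running job with no signal handlers installed.
    sleep(30);
    exit(0);
}

// Parent: politely ask the child to terminate.
posix_kill($pid, SIGTERM);

// Reap the child and inspect how it ended.
pcntl_waitpid($pid, $status);
$killedBySignal = pcntl_wifsignaled($status);
$signal = $killedBySignal ? pcntl_wtermsig($status) : null;

echo $killedBySignal ? "terminated by signal $signal\n" : "exited on its own\n";
```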

declare (ticks=1)

Did you notice the strange declare (ticks=1) line? You thought declare is completely useless in PHP? It isn't! When it comes to multiprocessing it's critical! PHP is an interpreted language and no matter what your code looks like, it always runs on top of a high-level interpreter. If a signal arrives and its handler is supposed to modify your object, PHP needs a point at which it can interrupt your code and run that handler - this is done using ticks. If you omit this line, then no matter what logic you implement, it will be completely blind to all external changes.
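Here is a small self-contained sketch of that mechanism with plain pcntl, outside of Pork. The script registers a SIGTERM handler, sends the signal to itself and then loops on a flag, much like the $this->shutdown loop above. The $iterations guard is only there so that the sketch cannot spin forever if the handler never runs.

```php
<?php
// Sketch only: requires the pcntl and posix extensions (CLI).
declare(ticks=1); // lets PHP run pending signal handlers between statements

$shutdown = false;

// The handler only flips a flag; the main loop decides what to do about it.
pcntl_signal(SIGTERM, function ($signo) use (&$shutdown) {
    $shutdown = true;
});

// Send SIGTERM to ourselves to simulate an external stop request.
posix_kill(posix_getpid(), SIGTERM);

// Guarded loop: ends as soon as the handler has run, or after the safety limit.
$iterations = 0;
while (!$shutdown && $iterations < 1000000) {
    $iterations++;
}

echo $shutdown ? "shutdown requested\n" : "signal was never handled\n";
```

Newer PHP versions offer alternatives to ticks: calling pcntl_signal_dispatch() yourself inside the loop (PHP 5.3 and later), or enabling pcntl_async_signals(true) (PHP 7.1 and later).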

Waiting for processes

OK, we know how to terminate a child process. But what if you just want to wait until it finishes? You can use the wait() method for that:

class MyProcess extends Pork\Process
{
    public function main()
    {
        sleep(3);
    }
}

$process = new MyProcess();
// put that job in background
$process->start();

// continue your script flow

// wait for the background process to finish its job
$process->wait();
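Underneath, waiting for a child boils down to the waitpid system call. The sketch below uses raw pcntl (not Pork) to show both flavours: a non-blocking check with WNOHANG, which is one way an isRunning()-style test could be done, and a blocking wait that also gives you the exit code.

```php
<?php
// Sketch only: requires the pcntl extension (CLI).

$pid = pcntl_fork();
if ($pid === -1) {
    fwrite(STDERR, "fork failed\n");
    exit(1);
}
if ($pid === 0) {
    // Child: work for a second, then exit with a recognisable code.
    sleep(1);
    exit(3);
}

// Non-blocking check: returns 0 while the child is still running.
$early = pcntl_waitpid($pid, $status, WNOHANG);

// Blocking wait: returns the child's PID once it has ended.
$reaped = pcntl_waitpid($pid, $status);
$exitCode = pcntl_wifexited($status) ? pcntl_wexitstatus($status) : null;

echo "still running on first check: ", ($early === 0 ? "yes" : "no"), "\n";
echo "exit code: ", var_export($exitCode, true), "\n";
```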

Daemons

Great, so we know how to run simple processes. But when it comes to services, you probably want to write a daemon - a self-running process that, once started, keeps running in the background of the operating system no matter whether you log out of the shell etc., and just interacts with its environment by responding to commands you send to it. No worries, there is the Pork\Process\Daemon class! Instances of this class are daemonized after starting (you can also set user and group IDs for the target process).

class MyDaemon extends Pork\Process\Daemon
{
    public function run()
    {
        declare (ticks=1);

        while (!$this->shutdown) {
            // do something here that will run forever, until you end a process
        }
    }
}

$process = new MyDaemon();
$process->start();

echo $process->getPid(), "\n";

Important note - when you run a daemon, it is no longer a child process of your program. The essence of daemonization is that the daemon process obtains its own session. It means you can't use the wait() method on a daemon process: since it's no longer your child, the method will hang forever.
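To illustrate what "obtaining its own session" means, here is a sketch of the classic double-fork technique with raw pcntl and posix calls. I am not claiming this is how Pork\Process\Daemon does it; spawnDaemon() and the temporary report file are made up for the example. The daemon writes its PID and session ID to a file, so the caller can see that the session differs from its own.

```php
<?php
// Sketch only: requires the pcntl and posix extensions (CLI). Not Pork's real code.

// Hypothetical helper: runs $task in a detached process; returns in the caller only.
function spawnDaemon(callable $task)
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('First fork failed');
    }
    if ($pid > 0) {
        // Caller: reap the short-lived intermediate process and carry on.
        pcntl_waitpid($pid, $status);
        return;
    }

    // Intermediate process: start a new session, detached from the caller's.
    if (posix_setsid() === -1) {
        exit(1);
    }
    $pid = pcntl_fork();
    if ($pid !== 0) {
        // Fork failed (-1) or we are the intermediate parent: either way, leave.
        exit($pid === -1 ? 1 : 0);
    }

    // Daemon: a grandchild of the caller, which therefore cannot wait() for it.
    $task();
    exit(0);
}

$report = tempnam(sys_get_temp_dir(), 'daemon');
$callerSid = posix_getsid(posix_getpid());

spawnDaemon(function () use ($report) {
    // The daemon reports its own PID and session ID through a file.
    file_put_contents($report, posix_getpid() . ' ' . posix_getsid(posix_getpid()));
});

// Poll for up to about five seconds until the report has been written.
for ($i = 0; $i < 50 && filesize($report) === 0; $i++) {
    usleep(100000);
    clearstatcache();
}
$parts = array_map('intval', explode(' ', (string) file_get_contents($report))) + [0, 0];
list($daemonPid, $daemonSid) = $parts;
unlink($report);

echo "daemon PID $daemonPid runs in session $daemonSid, caller is in session $callerSid\n";
```

Note that the caller only learns the daemon's PID because the daemon published it somewhere, which leads straight to the next problem.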

Control

Even greater, we have a daemon running! So… how to control it? It goes into the background and I can't do anything with it; will it run forever? The key to managing daemons is their PID. Once you daemonize a process, the only way to control it is to send signals using its PID. So your controlling script must have some persistent control handle. Of course, when you re-run your control script it loses all its data, so it must store the daemon PID somewhere. This is what the process control interface offers. You can plug a process control instance into a process and it will get all notifications of process status changes (start/stop), while giving you the possibility to attach to an already running process.

PID file

The most common way to control a daemon is using a pidfile - a file that stores the PID of the running daemon. If the file exists and its PID is valid, it is used as the PID of the already running daemon. If the file doesn't exist, or the PID is not valid (the daemon crashed, leaving a stale pidfile behind), it means the daemon is not running. This strategy is implemented by the Pork\Control\PidFile class. Simply pass a pidfile path to the constructor and it will handle everything for you!

class MyDaemon extends Pork\Process\Daemon
{
    public function run()
    {
        declare (ticks=1);
        while (!$this->shutdown) {
            // do something
        }
    }
}

$control = new Pork\Control\PidFile('pid');

$process = new MyDaemon();
$process->setControl($control);

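// fall back to an empty command when the script is run without arguments
switch (isset($_SERVER['argv'][1]) ? $_SERVER['argv'][1] : '') {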
    case 'start':
        $process->start();
        break;

    case 'stop':
        $process->stop();
        break;

    case 'kill':
        $process->kill();
        break;

    case 'restart':
        $process->restart();
        break;

    case 'reload':
        $process->hup();
        break;
}

It's pretty much a fully featured daemon!
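The pidfile check described above can be sketched in a few lines with the posix extension. readRunningPid() is a hypothetical helper written for this post, not part of Pork\Control\PidFile. It relies on the fact that sending signal 0 delivers nothing and only tests whether the target process can be signalled.

```php
<?php
// Sketch only: requires the posix extension. Not Pork's real code.

// Hypothetical helper: returns the PID stored in $pidFile if that process
// is alive, or null if the file is missing, malformed or stale.
function readRunningPid($pidFile)
{
    if (!is_file($pidFile)) {
        return null; // no pidfile: daemon is not running
    }
    $pid = (int) trim((string) file_get_contents($pidFile));
    if ($pid <= 0) {
        return null; // garbage in the file
    }
    // Signal 0 sends nothing; it only checks that the process can be signalled.
    // Simplification: a process owned by another user also counts as "not running".
    if (!posix_kill($pid, 0)) {
        return null; // stale pidfile, e.g. left behind after a crash
    }
    return $pid;
}

// Demo: this script is certainly running, so its own PID must be reported.
$pidFile = tempnam(sys_get_temp_dir(), 'pid');
file_put_contents($pidFile, posix_getpid());
$found = readRunningPid($pidFile);
unlink($pidFile);

// The file is gone now, so the same check reports "not running".
$missing = readRunningPid($pidFile);

echo "found: ", var_export($found, true), ", missing: ", var_export($missing, true), "\n";
```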

Bonus features

The Process class (Pork\Process) implements many additional features, like PHP magic methods and serialization handling, as well as useful methods used by Zend Framework - for example toArray() and toJson().

__invoke

The Pork\Process class implements the PHP magic __invoke() method, which allows you to call a process instance directly. You can, for example, use your process instance as a callback for event observers:

// same as just $process();
call_user_func($process);
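If you have never used __invoke(), this tiny sketch shows the language feature itself. InvokableTask is a hypothetical stand-in, and the assumption that invoking the object simply delegates to start() is mine; the post does not say what Pork's __invoke() does internally.

```php
<?php
// Sketch only: a hypothetical stand-in class, not Pork\Process.

class InvokableTask
{
    public $started = 0;

    public function start()
    {
        $this->started++;
    }

    // Assumption for this sketch: invoking the object delegates to start().
    public function __invoke()
    {
        $this->start();
    }
}

$task = new InvokableTask();

$task();                            // direct call syntax
call_user_func($task);              // the same thing through the callback API
$isCallable = is_callable($task);   // objects with __invoke() count as callables

echo $task->started, "\n";
```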

__call

The Process class also handles all undefined method calls whose names start with sig, treating them as signal-sending methods. It means that you don't need to have methods defined for all signals - as long as there is a SIG*-named constant that matches the called method name, the signal will be sent:

$process->sigstop();

// you can even use your own constant, as long as its name starts with SIG
define('SIGMY', SIGUSR1);
$process->sigmy();
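A mechanism like this can be built with __call(), defined() and constant(). The sketch below is my own illustration of the idea, not Pork's source: SignalProxy is a hypothetical class and, to keep the example runnable without any extension, it hands the resolved signal number to a callback instead of really sending it.

```php
<?php
// Sketch only: a hypothetical class, not Pork\Process.

class SignalProxy
{
    private $sender;

    // $sender receives the resolved signal number; a real implementation
    // would send it to the target process instead.
    public function __construct(callable $sender)
    {
        $this->sender = $sender;
    }

    public function __call($name, array $arguments)
    {
        // sigterm() -> SIGTERM, sigmy() -> SIGMY, and so on.
        $constant = strtoupper($name);
        if (strncmp($constant, 'SIG', 3) !== 0 || !defined($constant)) {
            throw new BadMethodCallException("Undefined method: $name()");
        }
        return call_user_func($this->sender, constant($constant));
    }
}

// A custom constant, defined by hand so the sketch does not depend on pcntl.
define('SIGMY', 10);

$sent = [];
$proxy = new SignalProxy(function ($signal) use (&$sent) {
    $sent[] = $signal;
    return true;
});

$proxy->sigmy();

// Anything that does not map to a SIG* constant is rejected.
try {
    $proxy->doesNotExist();
    $rejected = false;
} catch (BadMethodCallException $e) {
    $rejected = true;
}

echo implode(',', $sent), "\n";
```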

Contribute!

Hey, does Pork seem interesting to you? It's open-source (GNU LGPL-3 licensed) and published on GitHub - fork Pork! You don't have to ask for permission, just fork it and code.

Tags: Pork, Scripts, Code, PHP