Unit Tests for PHP code … Without WordPress Loading – Part 1

In this post, we delve deep into the topic of unit tests.

Unit tests are a kind of automated software test in which the things under test are single “units” of the whole application.

When we talk about PHP applications, a single “unit” is very likely a function or, in the case of object-oriented code, a class.

In this article I’m not going to explore “academic” definitions and technical descriptions of what unit tests are and what differentiates them from other kinds of tests, but if you want to dig deeper into the concept, a good starting point is this article by my fellow Inpsyder Thorsten Frommen.

100% Unit Tests Juice

If I had to explain in a nutshell what unit tests are, I would say:

Take the smallest portion of code that can be extracted from the rest of the application, execute it, and get some feedback about whether the obtained result is the expected result.

In his now-famous “TestFrameworkInATweet”, Mathias Verraes provides a function that can be used as a very simple, concise and clean way to run unit tests for PHP code.

A slightly modified version of it (the original is linked above) looks like this:

<?php
function it($m,$p){ $d=debug_backtrace(0)[0];
    is_callable($p) and $p=$p();
    global $e;$e=$e||!$p;
    $o="\e[3".($p?"2m✔":"1m✘")." It $m\e[0m";
    echo $p?"$o\n":"$o FAIL in: {$d['file']} #{$d['line']}\n";
}

register_shutdown_function(function(){global $e;$e and die(1);});

Now that tweets can be 280 characters long, this still fits in a tweet (with spaces and everything), and I can tell you that this tweet-sized code can be everything you need to run unit tests.

A usage example to run unit tests:

<?php
// the file that contains the 9 lines of code above
require_once __DIR__.'/test-framework-in-a-tweet.php';

it( 'should sum two numbers.', 1 + 1 === 2 );

it( 'should display an X for a failing test.', 1 + 1 === 3 );

it( 'should append to an ArrayIterator.', function () {
    $iterator = new ArrayIterator();
    $iterator->append( 'test' );
    return
        count( $iterator ) === 1
        && $iterator->current() === 'test';
} );

The (colored!) output will be:

✔ It should sum two numbers.
✘ It should display an X for a failing test. FAIL in: /path/to/test-file.php #3
✔ It should append to an ArrayIterator.

It ends with exit code 1 when failures occur. This makes the framework easy to integrate with other quality control tools.

Logic-wise, this “mini framework” accepts a description of the test and a “predicate”, that is, something we declare to be true, and then tells us whether what we declared is actually true.
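To make that logic easier to read, here is a de-golfed sketch of the same idea. The function name check and the variable $all_passed are made up for this illustration, and the sketch leaves out the colors, the failure location and the exit code of the tweet-sized version.

```php
<?php
// Hypothetical, readable variant of the tweet-sized function (names are illustrative).
function check( string $description, $predicate ): bool {
    // A callable predicate is executed; anything else is used as-is.
    $passed = (bool) ( is_callable( $predicate ) ? $predicate() : $predicate );

    // Print a check mark or a cross followed by the description.
    echo ( $passed ? '✔' : '✘' ) . " It {$description}" . PHP_EOL;

    return $passed;
}

$all_passed = check( 'should sum two numbers.', 1 + 1 === 2 );
```

Everything else the tweet-sized version does (colors, file and line of the failure, exit code) is reporting around this single check.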

This is all there is to unit tests.

A library like PHPUnit with its 41.000+ lines of code and its 7000+ classes is little more than just “ornament” and helpers around this core concept.

Directions To Userland

In the previous example, we tested just a couple of arithmetic operations and a single class provided by PHP, which means that we actually tested that PHP has no bugs in those things.

No dependencies are involved; in fact, no userland code is involved in what we tested (besides the “framework” itself).

However, with our micro framework we can effectively test userland code.

Take this class:

<?php
namespace MyCompany\MyPlugin;

final class Email {

    private $email;

    public function __construct( string $email ) {
        if ( ! filter_var( $email, FILTER_VALIDATE_EMAIL ) ) {
            // Leading backslash: the exception class lives in the global namespace.
            throw new \InvalidArgumentException(
                "{$email} is not a valid email."
            );
        }
        $this->email = $email;
    }

    public function __toString(): string {
        return $this->email;
    }

    public function equals( Email $email ): bool {
        return (string) $this === (string) $email;
    }
}

It is an example of a value object. We can easily test it with our framework:

<?php
require_once '/path/to/test-framework-in-a-tweet.php';
require_once '/path/to/code/src/MyCompany/MyPlugin/Email.php';


it( 'should fail for invalid emails.', function () {
    try {
        new MyCompany\MyPlugin\Email( 'foo' );
    } catch ( InvalidArgumentException $e ) {
        return true;
    }
} );


it( 'should equal a string when casted to string.', function () {
    $email = new MyCompany\MyPlugin\Email( 'foo@example.com' );
    return (string) $email === 'foo@example.com';
} );


it( 'should equal another instance by value.', function () {
    $email = new MyCompany\MyPlugin\Email( 'foo@example.com' );
    return $email->equals( new MyCompany\MyPlugin\Email( 'foo@example.com' ) );
} );

and the “green” results:

✔ It should fail for invalid emails.
✔ It should equal a string when casted to string.
✔ It should equal another instance by value.

We proved that our code works well without the help of any “bloated” test framework.

WordPress Into The Mix: WordPress and PHP Unit Tests

All examples above contain just “pure” PHP: no external library is required to execute the code or the tests.

But what about WordPress? Many WordPress developers are used to writing unit tests for their plugins in the same way WordPress core developers write the WP PHPUnit Test Suite. A WP-CLI command exists to scaffold plugin tests written in that way.

If our Email class from above were part of a WordPress plugin, and we used that workflow, we would:

  • have a quite “expensive” scaffolding procedure
  • need to set up a database
  • need to load the whole WordPress environment

all of this to test something that, as we saw, can be effectively and completely tested with a 280-character test framework.

So, does it make sense to run unit tests in the “WordPress way” for code that does not use WordPress functionalities even if it is part of a plugin (or theme)? The answer is surely no. If there’s no WordPress code involved, use your test framework of choice (PHPUnit, phpspec, Codeception… or even our “TestFrameworkInATweet“) and just test the thing.

But now, let’s see a more interesting example:

<?php
namespace MyCompany\MyPlugin;

function register_product_cpt() {
    register_post_type( 'product', [ /* ...bunch of args... */ ] );
}

If we try to test this function without loading WordPress we will fail hard, because it uses a WordPress function that is not defined when WordPress is not loaded.
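To make that failure concrete, here is a minimal sketch, assuming PHP 7 or later (where calling an undefined function throws an Error). The empty arguments array and the $message variable are placeholders for this illustration, not the real CPT configuration.

```php
<?php
namespace MyCompany\MyPlugin;

function register_product_cpt() {
    // register_post_type() is only defined once WordPress is loaded.
    register_post_type( 'product', [] );
}

try {
    register_product_cpt();
    $message = 'no error';
} catch ( \Error $error ) {
    // Without WordPress, PHP reports a call to an undefined function.
    $message = $error->getMessage();
}

echo $message, PHP_EOL;
```

With the micro framework, the uncaught error would simply stop the test run before any result is printed.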

Does it mean that we are “doomed” to load WordPress to test this function? Are we really going to scaffold an entire WordPress installation (including configuration and database) even if our code needs pretty much nothing of that? Do we really need to load the whole WordPress environment with its hundreds of thousands of lines of code and side effects for every single test when the test target is a function that uses just one simple WordPress function?

Luckily no. Three times no.

Unit or Not Unit: Is That The Question?

Let’s imagine for a moment that we want to use our micro framework to test the register_product_cpt function in the context of WordPress. What would that look like?

Maybe something like this:

<?php
require_once __DIR__.'/test-framework-in-a-tweet.php';
// This loads the whole WP environment
require_once '/path/to/wordpress/wp-load.php';


it( 'should register product post type.', function () {

    $exists_before = post_type_exists( 'product' );

    MyCompany\MyPlugin\register_product_cpt();

    $exists_after = post_type_exists( 'product' );

    return $exists_before === false && $exists_after === true;
} );

There are developers better than me who will tell you that the test above is not a unit test: because it actually executes code from WordPress, it is an integration test, one that checks that our custom code integrates well with WordPress.

However, there are other developers better than me who will tell you that the test above is a unit test, because the system under test, i.e. the thing we are testing, is still a single unit of the application (the register_product_cpt function) and what we are doing is a “state verification”, that is, we are comparing the state of the application before and after a single unit has been executed.

If you don’t have an idea (yet) of which of the two factions is right, don’t worry, you have time for that, because for the purpose of this article it is not relevant at all.

What is certain is that to run the tests in this way we need to set up a WordPress environment (configure the application, prepare a database, trigger side effects like cron tasks…) and we need to wait while WordPress loads hundreds of files even if we use just one function…

…but in the end, it works, doesn’t it?

The real question is: what are we testing here?

Our register_product_cpt function is logicless. It is a wrapper around a WordPress function.

If the test passes we are effectively testing that:

  1. our function calls the WordPress function in the proper way
  2. the WordPress function does what we expect it to do

But do we really need to test that the WordPress function does its job? Isn’t that the WP core developers’ job? When testing our plugin, shouldn’t we test only our plugin?

Let’s change perspective for a moment. In all our code we assume that PHP works. Writing a line like this:

return $exists_before === false && $exists_after === true;

we trust PHP to return a value when we type return, and we trust it to evaluate the logic properly… in the code we write every day, we trust PHP to do things much more complex than this.

When a test we wrote fails, it is definitely possible that it fails because of a PHP bug (PHP is code, and code without bugs does not exist). But we are not concerned about PHP failing in our unit tests.

Considering that we are writing a WordPress plugin, and WordPress is a given part of our application infrastructure just like PHP, couldn’t we do the same for WordPress? Couldn’t we assume WordPress works, just like we do for PHP?

If the answer to the above questions is “yes”, and we could assume that WordPress works without the burden of loading it, wouldn’t that be great?

What about rewriting the test like this:

<?php
function post_type_exists( $post_type ) {
    return array_key_exists( "post_type_{$post_type}", $GLOBALS );
}

function register_post_type( $post_type, $args ) {
    $GLOBALS[ "post_type_{$post_type}" ] = $args;
}


it( 'should register product post type.', function () {
  
    $exists_before = post_type_exists( 'product' );
  
    MyCompany\MyPlugin\register_product_cpt();
  
    $exists_after = post_type_exists( 'product' );
  
    return $exists_before === FALSE && $exists_after === TRUE;
} );


it( 'should register product as not hierarchical.', function () {
  
    MyCompany\MyPlugin\register_product_cpt();
  
    global $post_type_product;
  
    return
        is_array( $post_type_product )
        && array_key_exists( 'hierarchical', $post_type_product )
        && $post_type_product[ 'hierarchical' ] === FALSE;
} );

Because this time we are not loading WordPress, we can write our own simplified version of post_type_exists and register_post_type.

In fact, with the code above we are able to test that register_product_cpt:

  • actually calls register_post_type passing the expected CPT name
  • passes to register_post_type arguments in which "hierarchical" is set to the expected false.

That’s possible because we purposely designed the “fake” WordPress functions to facilitate the tests.
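The same idea can be sketched with a single registry array instead of one global per post type, which keeps all recorded calls in one place. Everything here is hypothetical and only for illustration: the stub_post_types key, the non-namespaced my_plugin_register_product_cpt stand-in, and its arguments.

```php
<?php
// Hypothetical stubs: every registration is recorded in one global array.
function register_post_type( $post_type, $args = [] ) {
    $GLOBALS['stub_post_types'][ $post_type ] = $args;
}

function post_type_exists( $post_type ) {
    return isset( $GLOBALS['stub_post_types'][ $post_type ] );
}

// Hypothetical stand-in for the plugin function under test.
function my_plugin_register_product_cpt() {
    register_post_type( 'product', [ 'hierarchical' => false ] );
}

// State verification: compare the recorded state before and after the call.
$exists_before = post_type_exists( 'product' );
my_plugin_register_product_cpt();
$exists_after = post_type_exists( 'product' );
```

Because the stub stores the arguments it received, a test can also inspect them, for example the "hierarchical" value, without WordPress being involved at all.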

Unlike when we run tests that load WordPress, now we are just assuming that the real WordPress functions work well, but we are testing that our functions interact well with WordPress functions. That is exactly what we had planned to do! Moreover, we also removed the burden of loading WordPress.

It appears to be a win-win situation… and it is too good to be something I invented myself.

Don’t Stub Your Toe

The technique used above is nothing new; it is actually as old as unit tests. A “fake” version of third-party code (in our case WordPress) written for the sole purpose of testing custom code is known as a “stub”.

As we have seen, stubs can be a great help in testing code. But they have an issue: they are code, too. And like any other piece of code, they need to be written and then maintained, and they might have bugs (only code that does not exist has no bugs).

This is why, when writing stubs, it is very important to keep them as short and simple as possible. In fact, reducing their length and complexity to the minimum is the only way to minimize the maintenance effort and the number of bugs.

If the only way to test some code is to write long and complex stubs, then there’s something wrong either in the tests or in the code, or maybe in that case it’s just better to use the real code and not the stubs.

Actually, some experienced developers you trust might tell you to never write stubs and to always use the real code, and that doing so will not make your unit tests any less “unit”. As I already said, the debate “that is a unit test” vs. “that is not a unit test” might be of some interest, but not for this article.

The thing is that when dealing with WordPress, because of the very nature of WordPress, loading the whole WordPress environment, with all the side effects and the effort it brings (in terms of configuration burden and performance), is something that is better avoided when writing unit tests, especially during development. For me, it is not a matter of correctness or purity of unit tests, but just a matter of convenience.

If a class of 200 lines of code uses no WordPress code except a single call to trailingslashit, and to properly test it I need to write 20 tests, I have two choices:

  1. prepare an ad-hoc WordPress installation including database and configuration, then every time I need to run my test suite, load that WordPress environment 20 times, once for each test
  2. write once a stub of 1 line.
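To give an idea of the second option, the one-line stub could be as small as the sketch below. It mirrors what the real function does as far as I understand it (strip trailing slashes and backslashes, then append a single slash) and is an illustration, not code taken from WordPress.

```php
<?php
// Hypothetical stub: strips trailing slashes/backslashes, then appends exactly one slash.
function trailingslashit( $string ) {
    return rtrim( $string, '/\\' ) . '/';
}

$path = trailingslashit( '/path/to/plugin' );
```

A stub this small leaves very little room for bugs and hardly any maintenance work.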

It is quite obvious (to me) which of the two is more convenient. Maybe this was an unfair comparison, because in real-world WordPress plugins one normally uses more than just a single WP function, but loading WordPress will hardly ever be more convenient, especially if we consider that:

  • the bigger the plugin, the more times WordPress would be loaded, with all the related performance concerns
  • (spoiler) there are options to avoid writing stubs by hand

Don’t get me wrong. At some point, it is important to also run tests that load WordPress, and it is essential for plugins with heavy UI integration, but the speed and the ease of running tests for WordPress plugins without loading WordPress, in my opinion, always pay off in terms of increased productivity and reduced frustration. Moreover, my experience is that when developers learn about testing plugins without loading WordPress, the amount of tested code they write increases dramatically.

Back to stubs: they are great, but they are hard.

They need to be written, they need to be maintained, and they need extra care during development, because a badly written stub might cause an application to be tested against bad assumptions, making it fail hard when run with the real code (and I have never met anyone who tested the stubs used in tests).

All of those are arguments for the “don’t do stubs” people, and all of those are reasons to limit the number of stubs and to make them very simple and very short.

Let’s OOP for the best

Isolation from third-party code is not the only issue when dealing with unit tests. Often there’s code that performs IO operations on the filesystem, a database, external services and so on. These operations are awkward to test for many reasons (unavailability of the target, hard-to-reproduce state, performance issues, and so on).

Even developers who are usually against “fake code” techniques tend to classify the tests in which these kinds of operations are actually performed as integration or system tests (and not unit tests), and everyone generally agrees not to execute real production code in unit tests in such cases.

Moreover, there are other kinds of operations that only apply in non-deterministic situations (e.g. at a specific time of the day), and it is important to be able to test these operations in unit tests in a deterministic way.

These issues are as old as software development, and the ways developers have solved them (or tried to) in the past involved either introducing special flags in the production code to allow testing or using stubs.

Object oriented programming (OOP), when done right, provides an effective approach to this issue.

When there’s an object that is awkward to test (because of IO operations or non-deterministic nature), its behavior can be “hidden” behind an interface, so that it is possible to write a different, simplified implementation of the same interface just for tests.

When OOP is done right, other software entities rely on the interface and not on the implementation, so we are able to run tests using this other simplified implementation and the other entities would not even notice.

What’s described in the paragraph above is very similar to the concept of “stub” we already saw, and actually, the simplified implementation described there is to all intents and purposes a stub object. However, the function stubs we saw previously relied on the fact that the original function was not available.

For stub objects, it is not an issue if the original implementation is loaded, because the stub is an alternative implementation that we can use instead of the original.

Some code, so as not to lose the habit:

<?php
interface Clock {

    public function hours(): int;
    public function minutes(): int;
    public function seconds(): int;
}

final class SystemClock implements Clock {

    public function hours(): int {
        return (int) ( new DateTime() )->format( 'G' );
    }

    public function minutes(): int {
        return (int) ( new DateTime() )->format( 'i' );
    }

    public function seconds(): int {
        return (int) ( new DateTime() )->format( 's' );
    }
}

function maybeGong( Clock $clock ) {

    if ( $clock->minutes() === 0 && $clock->seconds() === 0 ) {
        $h     = $clock->hours();
        $gongs = $h > 0 && $h <= 12 ? $h : abs( $h - 12 );
        print rtrim( str_repeat( 'GONG ', $gongs ) );
    }
}

And its test:

<?php
require_once __DIR__.'/test-framework-in-a-tweet.php';
require_once '/path/to/clock-and-gong.php'; // The code above

class ClockStub implements Clock {

    private $hours;
    private $minutes;

    public function __construct( int $hours, int $minutes ) {
        $this->hours   = $hours;
        $this->minutes = $minutes;
    }

    public function hours(): int {
        return $this->hours;
    }

    public function minutes(): int {
        return $this->minutes;
    }

    public function seconds(): int {
        return 0;
    }
}


it( 'should gong 12 times at midnight.', function () {
    ob_start();
    maybeGong( new ClockStub( 0, 0 ) );
    return ob_get_clean() === "GONG GONG GONG GONG GONG GONG GONG GONG GONG GONG GONG GONG";
} );


it( "should gong 3 times at 3 o'clock.", function () {
    ob_start();
    maybeGong( new ClockStub( 3, 0 ) );
    return ob_get_clean() === "GONG GONG GONG";
} );


it( 'should not gong at 10 past 3.', function () {
    ob_start();
    maybeGong( new ClockStub( 3, 10 ) );
    return ob_get_clean() === "";
} );

Closing Remarks

In the code above we were able to test the behavior of the maybeGong function even though it is a classic example of non-deterministic behavior.

It was possible because the maybeGong function accepts as its argument an interface that we could replace with a deterministic implementation. If the function had created an instance of SystemClock inside its body, testing it would have been practically impossible, unless one wanted to wait for the exact second in which the system clock reached the next full hour to run the test.
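For contrast, this is roughly what such a coupled version could look like. It is a hypothetical counter-example written for this comparison: the name maybeGongCoupled is made up, and it returns the string instead of printing it only to keep the sketch short.

```php
<?php
// Hypothetical counter-example: the current time is read inside the function,
// so a test has no way to decide which time the function sees.
function maybeGongCoupled(): string {
    $now = new DateTime();

    // Anything other than the exact full hour produces no gong.
    if ( (int) $now->format( 'i' ) !== 0 || (int) $now->format( 's' ) !== 0 ) {
        return '';
    }

    $h     = (int) $now->format( 'G' );
    $gongs = $h > 0 && $h <= 12 ? $h : abs( $h - 12 );

    return rtrim( str_repeat( 'GONG ', $gongs ) );
}

// The result depends entirely on the moment the script runs.
$result = maybeGongCoupled();
```

The only thing a test can assert here is that the result is either empty or a series of gongs, which says nothing about whether the number of gongs is right for a given hour.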

The lesson here is that the code we want to test must not be coupled to specific implementations, but depend on them only through their interfaces. This is one of the main reasons some code is called “testable” and some other code is not. But that’s a story for another Christmas, maybe.

This new kind of OOP stub actually solves some issues, but other stub issues are still there: in the last example, even if quite trivial, writing the test required a stub of 22 lines of code, which now has to be maintained. For example, if in the next version of the software the Clock interface changes, the stub needs to change as well.

Is there any other effective way? Is there something we could do to reduce the maintenance burden of ad-hoc stubs? And if stubs are an object thing, how do we deal with the thousands of WordPress functions when WordPress is not loaded?