This is a technical article, but it's not the main focus. It's designed to make you go "huh?" or "what?" Most importantly however, even if for a split second, I want this article to make you go "I wonder…" or "what if?".
Also, for the remainder of this article, imagine that we don't care about performance - there are many articles on the web about performant PHP and this article is the opposite of that.
In computer science, a stream is a sequence of data elements made available over time. Think of it as items on a conveyor belt that are processed as they arrive - most streams, like conveyor belts, end once there are no more items but they can run continuously if need be.
If you write PHP applications that deal with the input and output of data, you use streams. They're a core function in PHP and pretty unavoidable for any developer. Streams in PHP are a way of generalizing file, network, data compression, and other operations which share a common set of functions and uses: a stream is resource object which can be read from and written to in a linear fashion.
If you've used Guzzle or sockets, accessed the filesystem, or even dealt with a $_POST
request,
you've used streams. PHP is known for its low barrier-of-entry so the vast majority of stream handling is
abstracted away from the developer.
An example of this is file_get_contents('hello.txt')
which implicitly means
file_get_contents('file://hello.txt')
- fetching the specified URI using the
file://
protocol.
Known as stream wrappers, PHP supports the following protocols natively: file
, http
,
ftp
, php
, zlib
, bzip2
, data
,
glob
, phar
. The following protocols are also supported if their associated
extensions are installed: zip
, ssh2
, rar
, ogg
,
expect
. If you've come across the s3://
protocol while using the
AWS SDK, you've come across a stream wrapper. PHP lets us
define out own stream wrappers in userland PHP.
Introducing Brainf*ck!
Obviously not exactly the best name to use (especially at an international conference where this article was
given as a talk) but it's perhaps the world's most well known esoteric language. Let's assume we have a
script hello.bf
:
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
If, for some arcane, horrendous reason, we wanted to write parts of our application in BF, and require it in
our application the file://
protocol is implicitly used when we require
it in our
application and the contents of the file will get dumped byte-for-byte because the PHP interpreter doesn't
find any PHP code.
However, we can substitute the implicit protocol for explicitly using a custom stream wrapper in userland PHP to manipulate the input before including the result in our application.
Here we have the boilerplate code for a stream wrapper: the class definition and a static method to register it with PHP:
<?php
class BfStreamWrapper
{
/** @var resource $context */
public $context;
public static function register(string $filterName): void
{
stream_wrapper_register($filterName, static::class);
}
}
Stream wrappers don't extend or implement anything in PHP core, but we need to implement a minimum of 5 methods to process an incoming stream.
The first is stream_open()
; we're just checking that the file specified exists and taking note
of its path.
private $uri;
public function stream_open(string $path, string $mode, int $options, &$opened_path): bool
{
$realpath = $this->removeProtocolFromPath($path, static::WRAPPER_PROTOCOL);
if (file_exists('file://' . $realpath)) {
$this->uri = $opened_path = $realpath;
return true;
}
return false;
}
stream_stat()
is something that's called each time to collect information, it's obviously
recommended that you return some useful information about the stream here but technically it's not actually
required; in this example we're just returning an empty array:
public function stream_stat(): array
{
return [];
}
Streams are read from and written to in chunks, meaning stream_read()
could be invoked multiple
times. This method is more complicated than the rest because it needs to keep track of how much data has
flowed out. But simplified, the first line is the Brainf*ck script getting executed; every line thereafter
is dealing with returning the buffered script output in chunks as they're requested by the stream user.
private $output;
private $pointer = 0;
private $eof = false;
public function stream_read(int $readNumBytes)
{
$this->execute();
$remainingBytes = strlen($this->output) - $this->pointer;
if ($remainingBytes > 0) {
$buffer = substr($this->output, $this->pointer, $readNumBytes);
$this->pointer += $readNumBytes;
return $buffer;
}
$this->eof = true;
return false;
}
The stream_eof()
method returns a boolean value indicating whether the previous
stream_read()
has been called enough times to reach the end of the stream.
private $eof = false;
public function stream_eof(): bool
{
return $this->eof;
}
stream_close()
is just cleaning up the internal state - in other protocols this might be where
you close a connection or delete a lock file.
private $uri;
private $output;
private $pointer = 0;
private $eof = false;
public function stream_close(): bool
{
// Clear internal buffers.
$this->uri = null;
$this->output = null;
$this->pointer = 0;
$this->eof = false;
return true;
}
Finally, for completeness, the powerhouse of stream wrapper is the method that executes Brainf*ck scripts.
In this example I'm using Anthony Ferrara's library.
We take the contents of the Brainf*ck script, pump it through the language runtime, and save the result as
a string in the $output
class property.
private $input = [];
private $output;
private function execute(): void
{
if (is_string($output)) {
return;
}
$result = (new \Brainfuck\Language)->run(
file_get_contents('file://' . $this->uri),
$this->input
);
$this->output = implode('', array_map(function (int $ord): string {
// Brainf*ck returns result in bytes (8-bit integers). Convert to ASCII.
return chr($ord);
}, $result));
}
However, requiring a script using our new stream wrapper doesn't give us a change to provide the script with any input, which is kind of important for some scripts.
We can accept an input string, and save it in a class property to use when we execute the script. Unfortunately in our example this means that we can only accept input before we execute our script, which is whenever we attempt to read the output.
private $input = [];
public function stream_write(string $data): int
{
// No point in recording input if script has already been executed.
if (is_string($this->output)) {
return 0;
}
$count = 0;
foreach (str_split($data) as $chr) {
// Brainf*ck takes bytes (8-bit integers) as input.
$this->input[] = ord($chr);
$count++;
}
return $count;
}
Now it's time to use it! After registering our stream filter, we can open up a stream handle, pump in some
input, and get the output of our script! Notice how if our script needs input, we can't require
the script like we would a PHP file.
Next up in the PHP streams arsenal is stream filters. They're simpler than wrappers: just a method to modify data as it passes through - this does mean that it cannot deal with both input and output so this is a perfect time to introduce source code transformation!
Here we have the boilerplate for a stream filter, the class definition and a static method to register it with PHP. Unlike wrappers, stream filters extend a core PHP class and only need to implement one method.
<?php
class BfStreamFilter extends \php_user_filter
{
public static function register(string $filterName): void
{
stream_filter_register($filterName, self::class);
}
public function filter($in, $out, &$consumed, $closing): int
{
while ($bucket = stream_bucket_make_writeable($in)) {
$this->input .= $bucket->data;
}
if ($closing || feof($this->stream)) {
$consumed = strlen($this->input);
$bucket = stream_bucket_new(
$this->stream,
// Has to be static because userland doesn't control instantiation.
static::$twig->render('bf_closure.twig.php', [
'bf_php_value' => var_export($this->input, true),
])
);
stream_bucket_append($out, $bucket);
return \PSFS_PASS_ON;
}
return \PSFS_FEED_ME;
}
}
Again, streams are processed in chunks so the vast majority of this method is dealing with pulling in the contents of the stream, possibly over multiple invocations… But the main logic of this method is still simply just data in and data out:
<?php
public function filter($in, $out, &$consumed, $closing): int
{
// Data in...
$this->input .= $bucket->data;
// Data out...
static::$twig->render('bf_closure.twig.php', [
'bf_php_value' => var_export($this->input, true),
])
}
Almost every time I've had to output XML from an application, I've used Twig rather than deal with encoding data structures. Why not do it for PHP, too?
<?php declare(strict_types=1);
{% Brainf*ck Closure Code Template %}
return function (array $input = []): string {
$result = (new \Brainfuck\Language)->run(
{{ bf_php_value }},
$input
);
return implode('', array_map(
function (int $ord): string {
// BF returns result in bytes (8-bit
// integers). Convert to ASCII.
return chr($ord);
},
$result
));
};
Similar to before, we register our stream filter, except this time we attach our stream filter to an already
existing stream handle. When we read the contents of the stream we get back valid PHP code that we can
eval()
!
<?php
BfStreamFilter::register('bf');
$handle = fopen('/app/hello.bf', 'r+');
stream_filter_append($handle, 'bf');
$closureCode = stream_get_contents($handle);
$closure = eval('?>' . $closureCode);
$output = $closure($input);
Attaching a filter to a stream handle and evaluating the result each time isn't ideal - and most importantly
we can't attach a stream filter to a require
statement. That's where one of the most underrated
features of PHP comes in: the php://
stream wrapper… in particular the php://filter
meta-wrapper.
The php://filter
meta-wrapper allows us to act upon any other stream resource
(/resource=
), while attaching any number of filters to manipulate the stream when reading it
(/read=
) as well as when writing to it (/write=
).
Most importantly, this allows us to specify a resource and attach a filter to it all in one string which we can pass to require. This one string saves us from having to open up a handle, attaching a filter, and evaluating the result of the stream. This one feature of PHP is the basis of this entire article.
<?php
BfStreamFilter::register('bf');
$script = '/app/hello.bf';
// php://filter/read=bf/resource=file:///app/hello.bf
$closure = require 'php://filter'
. '/read=bf'
. '/resource=file://' . $script;
$output = $closure($input);
At Dutch PHP Conference 2016 I saw Alexander Lisachenko talk about the Go! AOP framework he created, which is the main inspiration for this article. I'll give an extremely brief idea of what the project does, then dive straight into the internals to figure out what's going on.
Go! AOP is a framework for developing PHP application using aspect-oriented programming: separating out logic when developing and joining it back together at runtime.
To ensure that I explain the framework accurately, I'll use the same examples as the creator of this framework and I recommend you look up Lisachenko's previous presentations on this subject.
Let's say we have a method that creates a new user in our application. As far as business logic goes, all that's required for creating a new user is pretty simple:
<?php
function createNewUser(string $email, string $password): UserInterface {
$user = new User($email, $password);
$this->entityManager->persist($user);
$this->entityManager->flush();
return $user;
}
But in the real world things are never this simple:
<?php
function createNewUser(string $email, string $password): UserInterface {
if (!$this->security->isGranted('ROLE_ADMIN')) {
throw new AccessDeniedException;
}
try {
$user = new User($email, $password);
$this->entityManager->persist($user);
$this->entityManager->flush();
} catch (ORMException $e) {
$this->logger->error('Could not persist new user.', ['email' => $email]);
throw $e;
}
$this->eventDispatcher->dispatch(new UserCreatedEvent($email));
$this->logger->info('User created successfully.', ['email' => $email]);
return $user;
}
While this example may not be unwieldy once everything has been factored in, it does demonstrate how even the simplest things need to consider many different cross-cutting concerns. More complex situations can get unwieldy very quickly.
Aspect-oriented programming is about keeping your method simple and interrupting the execution flow of the application at specific points to add logic. You keep logic separated, and inject each piece to where it needs to be. These injected pieces of logic are called pointcuts.
<?php
function createNewUser(string $email, string $password): UserInterface {
$user = new User($email, $password);
$this->entityManager->persist($user);
$this->entityManager->flush();
return $user
}
/** @Before(pointcut=”public UserService->*(*)”) */
function checkCurrentUserCanCreateUsers(MethodInvocation $method): void {
if (!$this->security->isGranted('ROLE_ADMIN')) {
throw new AccessDeniedException;
}
}
/** @AfterThrow(pointcut=”public UserService->createNewUser(*)”) */
function handleNewUserDatabaseError(MethodInvocation $method): void {
$this->logger->error('Could not persist new user.', ['email' => $method->getArguments()[0]]);
throw $e;
}
/** @After(pointcut=”public UserService->createNewUser(*)”) */
function onNewUserSuccessfullyCreated(MethodInvocation $method): void {
$email = $method->getArguments()[0];
$this->eventDispatcher->dispatch(new UserCreatedEvent($email));
$this->logger->info('User created successfully.', ['email' => $email]);
}
We can even decorate the original method and make modifications to the result if we want to.
<?php
/** @Around(pointcut=”public UserService->createNewUser(*)”) */
function logTimeTakenToCreateUser(MethodInvocation $method): UserInterface {
$start = microtime(true);
$result = $method->proceed();
$duration = microtime(true) - $start;
$this->logger->log(sprintf('Creating a user took %f seconds', $duration));
return new TimedUserCreation($result, $duration);
}
The majority of the AOP's logic is in the form of source transformers. By using AOP in your application, you're entering the realm of meta-programming: your application transforms its own source code when including it. So what is AOP doing inside that transformer?
MyClass
.
nikic/php-parser
.
All of this happens while you're writing PHP applications normally without worrying about loading classes or compiling into usable code. As this article is all about PHP streams, it may not come as a surprise that the main logic behind the AOP framework is built around PHP's stream filters.
A huge amount of complexity is held within this process, enough for an entirely different presentation, so for the sake of simplicity, we'll simplify AOP's source transforming process into a single function call. From now on, we'll use the following function as an alias to implementing and registering a stream filter because, quite frankly, implementing a stream filter didn't fit on the original slides!
$executableCodeContents = transformCodeFilter($sourceCodeFileOnDisk);
new MyClass
⇢ ??? ⇢transformCodeFilter()
But if you try to create a new instance of MyClass
, how does AOP manipulate Composer to pass
the source code file through the transformCodeFilter()
function before getting PHP to execute
the contents?
new MyClass
/var/www/MyClass.php
php://filter/read=go.source.transforming.loader/resource=file:///var/www/MyClass.php
PHP then applies the go.source.transforming.loader
filter when reading the contents of the URI
/var/www/MyClass.php
from the file://
system.
So what kind of sneaky magic is this? 🐍
spl_autoload_functions()
function:[Composer\Autoload\ClassLoader, MyAppLoader]
.
[AopComposerLoader(Composer\Autoload\ClassLoader), MyAppLoader]
.
So what does this look like in code?
This is our overloader. Since it decorates Composer we accept it as the first argument, and it's always useful to know the name of the filter we should be applying.
<?php
class AutoloaderOverloader
{
private $composer;
private $filterName;
protected function __construct(ComposerClassLoader $composer, string $filterName) {
$this->composer = $composer;
$this->filterName = $filterName;
}
}
The autoloader part: all of the heavy lifting is still done by Composer, all we do it rewrite the file path
so that the php://
stream wrapper applies a filter when reading the file.
<?php
class AutoloaderOverloader
{
// ...
public function loadClass($class) {
$file = $this->composer->findFile($class);
$compiledFile = $this->cache->findCachedFile($file)
?: 'php://filter/read=' . $this->filterName . '/resource=file://' . $file;
include $compiledFile;
}
}
We've already established that we can generate code using templates, so if
$this->composer->findFile($class)
did not return a file, this would be the place to do it.
Next, the overloader part, we add a static helper method to initialize our overloader.
<?php
class AutoloaderOverloader
{
// ...
public static function init(string $filterName): void
{
$loaders = spl_autoload_functions();
foreach ($loaders as &$loader) {
// Unregister each loader, so they can be re-registered in the same order.
spl_autoload_unregister($loader);
if (is_array($loader) && $loader[0] instanceof ComposerClassLoader) {
// Replace Composer autoloader with our hijacking autoloader.
$loader[0] = new self($loader[0], $filterName);
}
}
// Processed all the loaders, re-register them with PHP in their original order.
foreach ($loaders as $loader) {
spl_autoload_register($loader);
}
}
}
Just like we described before in less technical terms, we:
foreach (spl_autoload_functions() as &$loader)
if (is_array($loader) && $loader[0] instanceof ComposerClassLoader)
$loader[0] = new self($loader[0], $filterName);
spl_autoload_unregister($loader);
andforeach ($loaders as $loader) spl_autoload_register($loader);
To make all this come to life, we: load Composer, register our stream filter, and initialise our autoloader overloader.
Now we carry on developing as normal - any class loaded after initializing our autoloader will have its source code passed through our stream filter!
<?php // index.php
require_once __DIR__ . '/vendor/autoload.php';
stream_filter_register('my_compiler_filter', MyCompilerFilter::class);
AutoloaderOverloader::init('my_compiler_filter');
That was a lot of code to throw at you, but that's it for our overloader!
There are other projects out there that implement autoloader overloading.
Some evaluate the source code with eval($executableCode)
but evaluating strings as code using the eval()
function is usually disabled on
hardened PHP installations, or at least highly discouraged.
Others write the generated code to disk before requiring the new file, But this won't work on read-only filesystems, such as sandboxes containers where cache is already warmed up on deployment.
Using stream filters is perhaps the most elegant. It's built into PHP's core functionality and just works -
it has been there since at least PHP 5.0
. If you're already into meta-programming by
manipulating source code, streams aren't going to be a challenge.
Now that we know we can modify the PHP code that actually gets executed by PHP on-the-fly from within the application we are running, what's next? What important work can we do given all this power?
We could implement PHP 7.4's short closures in PHP 7.3 using Christopher Pitt's Preprocessor library. His Preprocessor library takes invalid PHP code and turns it into valid PHP code:
<?php
// Pre
$oddLetterCountWords = array_filter($fruits, ($fruit) => {
return strlen($fruit) % 2;
});
// Compiled
$oddLetterCountWords = array_filter($fruits, function ($fruit) {
return strlen($fruit) % 2;
});
It's a pretty unremarkable change, but we're transpiling invalid code into something that PHP will understand. Building on the knowledge we've already gone over, implementing this is a two-step process: include the library as a dependency and then use it inside your stream filter.
<?php
# composer require pre/short-closures:^0.8
// Alias for implementing and registering a stream filter.
public function transformCodeFilter($sourceFile) {
return \Pre\Plugin\parse(file_get_contents($sourceFile));
}
We've established that as long as we return a string containing valid PHP code, we can accept pretty much any input to manipulate even if that input is nothing. Even encrypted source code!
<?php exit('Source Code Protected.'); ?>
aFcZG1BJF0UNAxVSBAgAEREcERF/HxwJRRJTVQlSeSoADhlFHhUADREANR9QLyYMGwYXAxVMABxYeHMF
B0VUOxxNFQAbCz8mTw4fCwtLOhw1O1QdA2YBGhpEAFQaCg0pIABTGwoXUwRVbioKH0EdHFRkCAMAGxhU
NwBOBxcMGR4AHnNbb05DUlkAAUIYAQYAFRobERcMTw1POzpHOh4GGEVBIUUfGhFTFQBXFwYEBwBTH0xD
ADMLF1AGHVMLTw8qTUVBTlQAVE9SFhEWBxxFAhxXRTwGAQkfGlMRQEJoFgMZHU9FdwwdCAEPc0FSeQBJ
UwATZQk=
Which can be unencrypted on-the-fly into something resembling a normal PHP file.
<?php declare(strict_types=1);
namespace App\Controller;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;
class DefaultController
{
publicfunction __invoke(Request $request): Response {
return new Response('Hello, World!');
}
}
Once again, we have a fairly simple-ish stream filter method to what once seemed like something that could only happen using extensions.
<?php
function transformCodeFilter(string $sourceFile): string {
$source = file_get_contents($sourceFile);
return strpos($source, PREAMBLE_TEXT) !== 0
? $source
: xorStringWithKey(
base64_decode(substr($source, strlen(PREAMBLE_TEXT))),
ENCRYPTION_KEY
);
}
We grab the source code just like every other transformation we've done, and if we don't need to decrypt the source code we immediately return it untouched. But if we do detect the source code needs decrypting, by looking for the exit statement at the beginning of the file (called the preamble) we pass it to another function to decrypt it.
<?php
const PREAMBLE_TEXT = "<?php exit('Source Code Protected'); ?>\n\n";
// The "encryption" key used by GitHub Enterprise. See:
// http://blog.orange.tw/2017/01/bug-bounty-github-enterprise-sql-injection.html
const ENCRYPTION_KEY = 'This obfuscation is intended to discourage GitHub Enterprise customers from making modifications to the VM. We know this \'encryption\' is easily broken.';
As for the decryption function itself, it iterates each byte of the source file and xor
's it
with the corresponding byte in an encryption key. This isn't AES - it's meant to be simple. It also happens
to be how GitHub protects their Enterprise source code (people always going to reverse engineer something if
they really want, XOR encryption should be enough prove intent in court of law).
<?php
function xorStringWithKey(string $str, string $key): string {
for ($i = 0; $i < strlen($str); $i++) {
$str[$i] .= $str[$i] ^ $key[$i % strlen($key)];
}
return $str;
}
Remember that this is an experiment. Caching the transformation result from decrypting source code would be completely pointless. Even with the Just-In-Time compiler coming in PHP8, source transformation or manipulating ASTs on the fly is never going to be fast.
But that's not the point.
You can do some amazing things in PHP that it was never designed to do.
stream_select()
function.
There are so many projects out there that push PHP beyond the limits of what people thought was possible for the language and the likelihood that the authors of these projects specifically knew what they were going to build ahead of time is very low.
Question what's possible. Experiment with ideas. Have fun, like you did when you first became a developer.
You won't know what's possible until you push past the limits of the language. Be curious, I hope you finish this article thinking it didn't go into enough detail, and that you're curious to know more.
Create terrible things, because if you don't the amazing things you create will just be average.
Since I wrote this talk, Lisachenko has come out with yet another crazy framework for PHP.
Z-Engine uses FFI to access internal structures of PHP
itself; it loads the definitions of native PHP structures, like zend_class_entry
,
zval
, etc and manipulates them in runtime.
If you can use streams to execute encrypted source code, imagine the possibilities of manipulating the internals of the PHP engine itself!