Circular References

Circular references have been a long-standing issue in PHP. The issue is caused by the fact that PHP uses a reference-counted memory allocation mechanism for its internal variables. This causes problems for longer-running scripts (such as an application server or the eZ Components test suite), as the memory is not freed until the end of the request. Not everybody is aware of what exactly the problem is, so here is a small introduction to circular references in PHP.

In PHP, a refcount value is kept for every variable container (zval). Those containers are pointed to from a function's symbol table, which contains the names of all the variables in the function's scope. Every variable, array element or object property that points to a zval increases its refcount by one. The refcount of a zval container is decreased by one whenever you call unset() on a variable name that points to it, or when a variable goes away because the function in which it was used ends. Once the refcount reaches zero, the container is freed. For a more thorough explanation about references, please see the article on this that I wrote for php|architect some time ago.
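To make the counting visible, here is a minimal sketch. It uses an object rather than a plain value, only because a destructor lets us observe the exact moment at which the last pointer goes away. The Tracked class and its $destroyed counter are made up for this illustration.

```php
<?php
// Hypothetical helper class: counts how many instances have been destroyed.
class Tracked
{
    public static $destroyed = 0;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

$a = new Tracked();   // one variable name points to the object
$b = $a;              // a second variable name points to the same object

unset($a);            // one pointer gone, $b still keeps the object alive
$afterFirstUnset = Tracked::$destroyed;

unset($b);            // last pointer gone: the object is freed immediately
$afterSecondUnset = Tracked::$destroyed;

var_dump($afterFirstUnset, $afterSecondUnset);
```

The first counter value is read while $b still points to the object, so the destructor has not run yet. The second is read after the last pointer is removed, at which point the destructor has run.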

The problems with circular references all start by creating an array or an object:

<?php
$tree = array( 'one' );
?>

This creates a single array container in memory, pointed to by the variable name $tree, with a refcount of 1.

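Now we add a new element to the array that points back to the array itself. Note the reference assignment: a plain assignment ($tree[] = $tree;) would only append a copy of the array and would not create a cycle.

<?php
$tree[] = &$tree;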
?>

This creates a circular reference.

Two things now point to the array container: the $tree variable, and the second element of the array itself. Because of that, the container's refcount is 2.
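A quick way to convince yourself that the array really contains itself is to walk into the second element a few levels deep. The sketch below also shows, for contrast, what happens with a plain (by-value) assignment, which only stores a copy. The variable name $copy is mine and only there for the comparison.

```php
<?php
// Reference assignment: the second element points back to the same array.
$tree = array('one');
$tree[] = &$tree;

// We can descend through element 1 as often as we like and still find 'one'.
$deepValue = $tree[1][1][1][0];

// By-value assignment: the second element is a snapshot of the array
// as it was before the append, so there is no cycle.
$copy = array('one');
$copy[] = $copy;

$copyHasCycle = isset($copy[1][1]);   // the snapshot has only one element
$copyInnerCount = count($copy[1]);

var_dump($deepValue, $copyHasCycle, $copyInnerCount);
```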

The next step, the one that actually creates the problem, is to unset() the $tree variable. As I mentioned before, an unset() on a variable name decreases the refcount of the variable container that the variable points to, in this case the array value. The refcount drops from 2 to 1, and the only thing still pointing to the container is its own second element.

Note that there is no variable pointing to the array container anymore. Because the refcount never reaches zero, PHP has no way of freeing this data, and the memory leak is born. PHP does, however, remove all allocated variables at the end of each request, so it is not a hard memory leak, but it is still annoying enough for long-running scripts and daemons written in PHP.
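The leak can be made visible by repeating the three steps in a loop and watching memory_get_usage(). This is a rough sketch, not a benchmark: the iteration count is arbitrary, and the exact number of bytes depends on the PHP version. The gc_disable() call is an assumption about the environment: it only exists in PHP 5.3 and later, where it switches off the automatic cycle collector that would otherwise hide the effect on long runs.

```php
<?php
// Assumption: PHP 5.3 or later. Turn off automatic cycle collection so the
// behaviour of plain reference counting is what we measure.
if (function_exists('gc_disable')) {
    gc_disable();
}

$before = memory_get_usage();

for ($i = 0; $i < 1000; $i++) {
    $tree = array('one');
    $tree[] = &$tree;   // refcount of the container becomes 2
    unset($tree);       // refcount drops to 1: unreachable, but never freed
}

$after = memory_get_usage();
$leaked = $after - $before;

echo "Bytes still allocated after the loop: ", $leaked, "\n";
```

Every iteration leaves one orphaned array behind, so the reported usage keeps growing with the number of iterations.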

Luckily, there are a few solutions to this problem. One is to use a new garbage collection algorithm; another is to augment the reference counting system with some cyclic tracing. The latter is currently being undertaken by David Wang, one of the Google Summer of Code students. He mentioned he is making good progress and I can hardly wait until I can play with it :)
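For readers on a PHP version that ships a cycle collector (PHP 5.3 and later expose it through gc_enable() and gc_collect_cycles()), the effect of cyclic tracing on top of reference counting can be sketched as follows. This is only an illustration of the observable behaviour, not a description of how the collector is implemented. The Node class and its counter are hypothetical names.

```php
<?php
// Hypothetical class: holds a pointer to itself and counts destructions.
class Node
{
    public $self;
    public static $destroyed = 0;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_enable();                 // make sure the cycle collector is switched on

$node = new Node();
$node->self = $node;         // the object now points to itself
unset($node);                // refcount drops to 1, object is unreachable

$beforeCollect = Node::$destroyed;   // reference counting alone did not free it

gc_collect_cycles();         // ask the collector to trace and free cycles

$afterCollect = Node::$destroyed;    // the orphaned object has been destroyed

var_dump($beforeCollect, $afterCollect);
```

The destructor only runs once the collector has been asked to look for cycles, which is exactly the gap that plain reference counting leaves open.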

Comments

With MDB2, I ran into a situation where I needed circular references. MDB2 can be extended with modules. These module instances need to be attached to the MDB2 instance so that they can be used like so: $mdb2->manager->listTables();

The manager property stores the instance of the manager module.

Now obviously inside the manager module we need to be able to execute a query, which in turn requires that the manager module have a reference to the MDB2 instance. Bad, ugly etc.

So my solution is still not pretty, though I guess it could get cleaned up a bit by making more use of OO constructs. So what I do is that I actually store a reference to all MDB2 instances into a global array. In the constructor of all modules I do not pass in the MDB2 instance but instead I pass the key to the global array that points to the MDB2 instance. Of course this global array is cleaned up via a destructor.

Lukas: maybe you should make some kind of singleton, which would store references to those objects? I think Symfony does it this way (sfContext class).

Also, you can add some kind of destructor to the singleton, which would effectively deinitialize it on user request…

Derick: yup. I am impatiently waiting to see the result of David's work too.

But my other thought is that in most cases one can easily solve problems without circular references, and in most cases they are a sign of bad design.

Actually I am already leveraging this global array to provide a "singleton". I put that in quotes, because you might obviously require access to multiple different databases at the same time. However, for the purposes I described, it's a lot more performant and reliable to use the approach I described instead of going through the normal singleton end user API I also provide.

Great to hear that.. I'm tired of manually calling __destruct() methods :)
