Extending and Embedding PHP

By Sara Golemon
Publisher: Sams
Pub Date: May 30, 2006
Print ISBN-10: 0-672-32704-X
Print ISBN-13: 978-0-672-32704-9
Pages: 456

In just a few years PHP has rapidly evolved from a small niche language to a powerful web development tool. Now in use on over 14 million Web sites, PHP is more stable and extensible than ever. However, there is no documentation on how to extend PHP; developers seeking to build PHP extensions and increase the performance and functionality of their PHP applications are left to word of mouth and muddling through PHP internals without systematic, helpful guidance. Although the basics of extension writing are fairly easy to grasp, the more advanced features have a tougher learning curve that can be very difficult to overcome. This is common at any moderate to high-traffic site, forcing the company to hire talented, and high-priced, developers to increase performance. With Extending and Embedding PHP, Sara Golemon puts writing extensions within the grasp of every PHP developer, while guiding the reader through the tricky internals of PHP.

Table of Contents
Copyright
Foreword
About the Author
We Want to Hear from You!
Reader Services
Introduction
Chapter 1. The PHP Life Cycle It All Starts with the SAPI Starting Up and Shutting Down Life Cycles Zend Thread Safety Summary
Chapter 2. Variables from the Inside Out Data Types Data Values Data Creation Data Storage Data Retrieval Data Conversion Summary
Chapter 3. Memory Management Memory Reference Counting Summary
Chapter 4. Setting Up a Build Environment Building PHP Configuring PHP for Development Compiling on UNIX Compiling on Win32 Summary
Chapter 5.
Your First Extension Anatomy of an Extension Building Your First Extension Building Statically Functional Functions Summary
Chapter 6. Returning Values The return_value Variable Returning Values by Reference Summary
Chapter 7. Accepting Parameters Automatic Type Conversion with zend_parse_parameters() Arg Info and Type-hinting Summary
Chapter 8. Working with Arrays and HashTables Vectors Versus Linked Lists Zend Hash API zval* Array API Summary
Chapter 9. The Resource Data Type Complex Structures Persistent Resources The Other refcounter Summary
Chapter 10. PHP4 Objects The Evolution of the PHP Object Type Implementing Classes Working with Instances Summary
Chapter 11. PHP5 Objects Evolutionary Leaps Methods Properties Interfaces Handlers Summary
Chapter 12. Startup, Shutdown, and a Few Points in Between Cycles Exposing Information Through MINFO Constants Extension Globals Userspace Superglobals Summary
Chapter 13. INI Settings Declaring and Accessing INI Settings Summary
Chapter 14. Accessing Streams Streams Overview Opening Streams Accessing Streams Static Stream Operations Summary
Chapter 15. Implementing Streams PHP Streams Below the Surface Wrapper Operations Implementing a Wrapper Manipulation Inspection Summary
Chapter 16. Diverting the Stream Contexts Filters Summary
Chapter 17. Configuration and Linking Autoconf Looking for Libraries Enforcing Module Dependencies Speaking the Windows Dialect Summary
Chapter 18. Extension Generators ext_skel PECL_Gen Summary
Chapter 19. Setting Up a Host Environment The Embed SAPI Building and Compiling a Host Application Re-creating CLI by Wrapping Embed Reusing Old Tricks Summary
Chapter 20. Advanced Embedding Calling Back into PHP Dealing with Errors Initializing PHP Overriding INI_SYSTEM and INI_PERDIR Options Capturing Output Extending and Embedding at Once Summary
Appendix A.
A Zend API Reference Parameter Retrieval Classes Objects Exceptions Execution INI Settings Array Manipulation Hash Tables Resources/Lists Linked Lists Memory Constants Variables Miscellaneous API Function Summary
Appendix B. PHPAPI Core PHP Streams API Extension APIs Summary
Appendix C. Extending and Embedding Cookbook Skeletons Code Pantry Summary
Appendix D. Additional Resources Open Source Projects Places to Look for Help Summary
Index

Copyright

Extending and Embedding PHP
Copyright © 2006 by Sams Publishing

All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Nor is any liability assumed for damages resulting from the use of the information contained herein.

Library of Congress Catalog Card Number: 2004093741
Printed in the United States of America
First Printing: June 2006
09 08 07 06 4 3 2 1

Trademarks

All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Sams Publishing cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

Warning and Disclaimer

Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. The information provided is on an "as is" basis. The author and the publisher shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book.
Bulk Sales

Sams Publishing offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales. For more information, please contact
U.S. Corporate and Government Sales
1-800-382-3419
corpsales@pearsontechgroup.com

For sales outside of the U.S., please contact
International Sales
international@pearsoned.com

Acquisitions Editors: Betsy Brown, Shelley Johnston
Development Editor: Damon Jordan
Managing Editor: Charlotte Clapp
Project Editor: Dan Knott
Copy Editor: Kate Givens
Indexer: Erika Millen
Proofreader: Susan Eldridge
Technical Editor: Brian France
Publishing Coordinator: Vanessa Evans
Multimedia Developer: Dan Scherf
Interior Designer: Gary Adair
Cover Designer: Alan Clements
Page Layout: Juli Cook

Dedication

To my partner Angela, who waited with patience and constancy while I ignored her night after night making this title a reality. And to my family, who gave me strength, courage, and confidence, and made me the person I am today.

Foreword

If you had told me when I submitted my first patch to the PHP project that I'd be writing a book on the topic just three years later, I'd have called you something unpleasant and placed you on /ignore. However, the culture surrounding PHP development is so welcoming, and so thoroughly entrapping, that looking back my only question is "Why aren't there more extension developers?"

The short (easy) answer, of course, is that while PHP's documentation of userspace syntax and functions is, in every way, second to none, the documentation of its internals is far from complete and consistently out of date. Even now, the march of progress towards full Unicode support in PHP6 is introducing dozens of new API calls and changing the way everyone from userspace scripters to core developers looks at strings and binary safety.

The response from those of us working on PHP who are most familiar with its quirks is usually, "Use the source."
To be fair, that's a valid answer because nearly every method in the core, and the extensions (both bundled and PECL), are generously peppered with comments and formatted according to strict, well followed standards that are easy to read...once you're used to it. But where do new developers start? How do they find out what PHP_LONG_MACRO_NAME() does? And what, precisely, is the difference between a zval and a pval? (Hint: There isn't one; they're the same variable type). This book aims to bring the PHP internals a step closer to the level of accessibility that has made the userspace language so popular. By exposing the well planned and powerful APIs of PHP and the Zend Engine, we'll all benefit from a richer pool of talented developers both from the commercial ranks and within the open source community. About the Author Sara Golemon is a self-described terminal geek (pun intended). She has been involved in the PHP project as a core developer for nearly four years and is best known for approaching the language "a little bit differently than everyone else"; a quote you're welcome to take as either praise or criticism. She has worked as a programmer/analyst at the University of California, Berkeley for the past six years after serving the United States District Courts for several years prior. Sara is also the developer and lead maintainer of a dozen PECL extensions as well as libssh2, a non-PHP related project providing easy access to the SSH2 protocol. At the time of this writing, she is actively involved with migrating the streams layer for Unicode compatibility in PHP6. We Want to Hear from You! As the reader of this book, you are our most important critic and commentator. We value your opinion and want to know what we're doing right, what we could do better, what areas you'd like to see us publish in, and any other words of wisdom you're willing to pass our way. 
You can email or write me directly to let me know what you did or didn't like about this book, as well as what we can do to make our books stronger.

Please note that I cannot help you with technical problems related to the topic of this book, and that due to the high volume of mail I receive, I might not be able to reply to every message.

When you write, please be sure to include this book's title and author as well as your name and phone or email address. I will carefully review your comments and share them with the author and editors who worked on the book.

Email: opensource@samspublishing.com
Mail: Mark Taber, Associate Publisher, Sams Publishing, 800 East 96th Street, Indianapolis, IN 46240 USA

Reader Services

Visit our website and register this book at www.samspublishing.com/register for convenient access to any updates, downloads, or errata that might be available for this book.

Introduction

Should You Read This Book?

You probably picked this book off the shelf because you have some level of interest in the PHP language. If you are new to programming in general and are looking to get into the industry with a robust but easy-to-use language, this is not the title for you. Have a look at PHP and MySQL Web Development or Teach Yourself PHP in 24 Hours. Both titles will get you accustomed to using PHP and have you writing applications in no time.

After you become familiar with the syntax and structure of PHP scripts, you'll be ready to delve into this title. Encyclopedic knowledge of the userspace functions available within PHP won't be necessary, but it will help to know what wheels don't need reinventing, and what proven design concepts can be followed.

Because the PHP interpreter was written in C, its extension and embedding API was written from a C language perspective. Although it is certainly possible to extend from or embed into another language, doing so is outside of the scope of this book.
Knowing basic C syntax, datatypes, and pointer management is vital. It will be helpful if you are familiar with autoconf syntax. Don't worry about it if you aren't; you'll only need to know a few basic rules of thumb to get by and you'll be introduced to these rules in Chapters 17, "Configuration and Linking" and 18, "Extension Generators." Why Should You Read This Book? This book aims to teach you how to do two things. First, it will show you how to extend the PHP language by adding functions, classes, resources, and stream implementations. Second, it will teach you how to embed the PHP language itself into other applications, making them more versatile and useful to your users and customers. Why Would You Want to Extend PHP? There are four common reasons for wanting to extend PHP. By far, the most common reason is to link against an external library and expose its API to userspace scripts. This motivation is seen in extensions like mysql, which links against the libmysqlclient library to provide the mysql_*() family of functions to PHP scripts. These types of extensions are what developers are referring to when they describe PHP as "glue." The code that makes up the extension performs no significant degree of work on its own; rather, it creates an interpretation bridge between PHP's extension API and the API exposed by the library. Without this, PHP and libraries like libmysqlclient would not be able to communicate on a common level. Figure I.1 shows how this type of extension bridges the gap between third-party libraries and the PHP core. Figure I.1. Glue Extensions Another common reason to extend PHP is performing special internal operations like declaring superglobals, which cannot be done from userspace because of security restrictions or design limitations. Extensions such as apd (Advanced PHP Debugger) and runkit perform this kind of "internal only" work by exposing bits of the virtual machine's execution stack that are ordinarily hidden from view. 
Coming in third is the sheer need for speed. PHP code has to be tokenized, compiled, and stepped through in a virtual machine environment, which can never be as fast as native code. Certain utilities (known as Opcode Caches) can allow scripts to skip the tokenization and compilation step on repeated execution, but they can never speed up the execution step. By translating it to C code, the maintainer sacrifices some of the ease of design that makes PHP so powerful, but gains a speed increase on the order of several multiples. Lastly, a script author may have put years of work into a particularly clever subroutine and now wants to sell it to another party, but doesn't want to reveal the source code. One approach would be to use an opcode encryption program; however, this approach is more easily decoded than a machine code extension. After all, in order to be useful to the licensed party, their PHP build must, at some point, have access to the compiled bytecode. After the decrypted bytecode is in memory, it's a short road to extracting it to disk and displaying the code. Bytecode, in turn, is much easier to parse into source script than a native binary. What's worse, rather than having a speed advantage, it's actually slightly slower because of the decryption phase. What Does Embedding Actually Accomplish? Let's say you've written an entire application in a nice, fast, lean, compiled language like C. To make the application more useful to your users or clients, you'd like to provide a means for them to script certain behaviors using a simple high-level language where they don't have to worry about memory management, or pointers, or linking, or any of that complicated stuff. If the usefulness of such a feature isn't immediately obvious, consider what your office productivity applications would be without macros, or your command shell without batch files. What sorts of behavior would be impossible in a web browser without JavaScript? 
Would you be able to capture the magic Hula-Hoop and rescue the prince without being able to program your F1 key to fire a triple shot from your rocket launcher at just the right time to defeat the angry monkey? Well, maybe, but your thumbs would hurt. So let's say you want to build customizable scripting into your application; you could write your own compiler, build an execution framework, and spend thousands of hours debugging it, or you could take a ready-made enterprise class language like PHP and embed its interpreter right into your application. Tough choice, isn't it? What's Inside? This book is split into three primary topics. First you'll be reintroduced to PHP from the inside out in Part I, "Getting to Know PHP All Over Again." You'll see how the building blocks of the PHP interpreter fit together, and learn how familiar concepts from userspace map to their internal representations. In Part II, "Extensions", you'll start to construct a functional PHP extension and learn how to use additional features of the PHPAPI. By the end of this section, you should be able to translate nearly any PHP script to faster, leaner C code. You'll also be ready to link against external libraries and perform actions not possible from userspace. In Part III, "Embedding", you'll approach PHP from the opposite angle. Here, you'll start with an ordinary application and add PHP scripting support into it. You'll learn how to leverage safe_mode and other security features to execute user-supplied code safely, and coordinate multiple requests simultaneously. Finally, you'll find a set of appendices containing a reference guide to API calls, solutions to common problems, and where to find existing extensions to crib from. PHP Versus Zend The first thing you need to know about PHP is that it's actually made up of five separate pieces shown in Figure I.2. Figure I.2. Anatomy of PHP. 
At the bottom of the heap is the SAPI (Server API) layer, which coordinates the lifecycle process you'll see in Chapter 1, "The PHP Life Cycle." This layer is what interfaces to web servers like Apache (through mod_php5.so) or the command line (through bin/php). In Part III, you'll be linking against the embed SAPI, which operates at this layer.

Above the SAPI layer is the PHP Core. The core provides a binding layer for key events and handles certain low-level operations like file streams, error handling, and startup/shutdown triggering.

Right next to the core you'll find the Zend Engine, which parses and compiles human-readable scripts into machine-readable bytecode. Zend also executes that bytecode inside a virtual machine where it reads and writes userspace variables, manages program flow, and periodically passes control to one of the other layers, such as during a function call. Zend also provides per-request memory management and a robust API for environment manipulation.

Lying above PHP and Zend is the extension layer, where you'll find all the functions available from userspace. Several of these extensions (such as standard, pcre, and session) are compiled in by default and are often not even thought of as extensions. Others are optionally built into PHP using ./configure options like --with-mysql or --enable-sockets, or built as shared modules and then loaded in the php.ini with extension= or in userspace scripts using the dl() function. You'll be developing in this layer in Part II, and again in Part III when you start to perform simultaneous embedding and extending.

Wrapped up around and threaded through all of this is the TSRM (Thread Safe Resource Management) layer. This portion of the PHP interpreter is what allows a single instance of PHP to execute multiple independent requests at the same time without stepping all over each other.
Fortunately most of this layer is hidden from view through a range of macro functions that you'll gradually come to be familiar with through the course of this book. What Is an Extension? An extension is a discrete bundle of code that can be plugged into the PHP interpreter in order to provide additional functionality to userspace scripts. Extensions typically export at least one function, class, resource type, or stream implementation, often a dozen or more of these in some combination. The most widely used extension is the standard extension, which defines more than 500 functions, 10 resource types, 2 classes, and 5 stream wrappers. This extension, along with the zend_builtin_functions extension, is always compiled into the PHP interpreter regardless of any other configuration options. Additional extensions, such as session, spl, pcre, mysql, and sockets, are enabled or disabled with configuration options, or built separately using the phpize tool. One structure that each extension (or module) shares in common is the zend_module_entry struct defined in the PHP source tarball under Zend/zend_modules.h. This structure is the "start point" where PHP introduces itself to your extension and defines the startup and shutdown methods used by the lifecycle process described in Chapter 1 (see Figure I.3). This structure also references an array of zend_function_entry structures, defined in Zend/zend_API.h. This array, as the data type suggests, lists the built-in functions exported by the extension. Figure I.3. PHP extension entry point. You'll examine this structure in more depth starting with Chapter 6, "Returning Values," when you begin to build a functioning extension. How Is Embedding Accomplished with PHP? Ordinarily, the PHP interpreter is linked into a process that shuttles script requests into the interpreter and passes the results back out. 
The CLI SAPI does this in the form of a thin wrapper between the interpreter and the command line shell, while the Apache SAPI exports the right hooks as an apxs module.

It might be tempting to embed PHP into your application using a custom-written SAPI module. Fortunately, it's completely unnecessary! Since version 4.3, the standard PHP distribution has included a SAPI called embed, which allows the PHP interpreter to act like an ordinary dynamic link library that you can include in any application.

In Part III, you'll see how any application can leverage the power and flexibility of PHP code through the use of this simple and concise library.

Terms Used Throughout This Book

PHP
Refers to the PHP interpreter as a whole, including Zend, TSRM, the SAPI layer, and any extensions.

PHP Core
A smaller subset of the PHP interpreter as defined in the "PHP Versus Zend" section earlier in this chapter.

Zend
The Zend Engine, which handles parsing, compiling, and executing script opcodes.

PEAR
The PHP Extension and Application Repository. The PEAR project (http://pear.php.net) is the official home for community-generated, free, open source projects. PEAR houses several hundred object-oriented classes written in PHP script, providing drop-in solutions to common programming tasks. Despite its name, PEAR does not include C-language PHP extensions.

PECL
The PHP Extension Code Library, pronounced "pickle." PECL (http://pecl.php.net) is the C-code offshoot of the PEAR project that uses many of the same packaging, deployment, and installation systems. PECL packages are usually PHP extensions, but may include Zend extensions or SAPI implementations.

PHP extension
Also known as a module. A discrete bundle of compiled code defining userspace-accessible functions, classes, stream implementations, constants, ini options, and specialized resource types. Anywhere you see the term extension used elsewhere in the text, you may assume it is referring to a PHP extension.
Zend extension
A variant of the PHP extension used by specialized systems such as OpCode caches and encoders. Zend extensions are beyond the scope of this book.

Userspace
The environment and API library visible to scripts actually written in the PHP language. Userspace has no access to PHP internals or data structures not explicitly granted to it by the workings of the Zend Engine and the various PHP extensions.

Internals (C-space)
Engine and extension code. This term is used to refer to all those things that are not directly accessible to userspace code.

Chapter 1. The PHP Life Cycle

In a common web server environment, you'll never explicitly start the PHP interpreter; you'll start Apache or some other web server that will load PHP and process scripts as needed, that is, as .php documents are requested.

It All Starts with the SAPI

Though it may look very different, the CLI binary actually behaves just the same way. A php command entered at the system prompt starts up the "command line sapi," which acts like a mini web server designed to service a single request. When the script is done running, this mini PHP web server shuts down and returns control to the shell.

Starting Up and Shutting Down

This startup and shutdown process happens in two separate startup phases and two separate shutdown phases. One cycle is for the PHP interpreter as a whole to perform an initial setup of structures and values that will persist for the life of the SAPI. The second is for transient settings that only last as long as a single page request.

During the initial startup, before any request has been made, PHP calls every extension's MINIT (Module Initialization) method. Here, extensions are expected to declare constants, define classes, and register resource, stream, and filter handlers that all future script requests will use. Features such as these, which are designed to exist across all requests, are referred to as being persistent.
A common MINIT method might look like the following:

    /* Initialize the myextension module
     * This will happen immediately upon SAPI startup
     */
    PHP_MINIT_FUNCTION(myextension)
    {
        /* Globals: Chapter 12 */
    #ifdef ZTS
        ts_allocate_id(&myextension_globals_id,
            sizeof(php_myextension_globals),
            (ts_allocate_ctor) myextension_globals_ctor,
            (ts_allocate_dtor) myextension_globals_dtor);
    #else
        myextension_globals_ctor(&myextension_globals TSRMLS_CC);
    #endif

        /* REGISTER_INI_ENTRIES() refers to a global
         * structure that will be covered in
         * Chapter 13 "INI Settings"
         */
        REGISTER_INI_ENTRIES();

        /* define('MYEXT_MEANING', 42); */
        REGISTER_LONG_CONSTANT("MYEXT_MEANING", 42,
            CONST_CS | CONST_PERSISTENT);

        /* define('MYEXT_FOO', 'bar'); */
        REGISTER_STRING_CONSTANT("MYEXT_FOO", "bar",
            CONST_CS | CONST_PERSISTENT);

        /* Resources: chapter 9 */
        le_myresource = zend_register_list_destructors_ex(
            php_myext_myresource_dtor, NULL,
            "My Resource Type", module_number);
        le_myresource_persist = zend_register_list_destructors_ex(
            NULL, php_myext_myresource_dtor,
            "My Resource Type", module_number);

        /* Stream Filters: Chapter 16 */
        if (FAILURE == php_stream_filter_register_factory("myfilter",
                &php_myextension_filter_factory TSRMLS_CC)) {
            return FAILURE;
        }

        /* Stream Wrappers: Chapter 15 */
        if (FAILURE == php_register_url_stream_wrapper("myproto",
                &php_myextension_stream_wrapper TSRMLS_CC)) {
            return FAILURE;
        }

        /* Autoglobals: Chapter 12 */
    #ifdef ZEND_ENGINE_2
        if (zend_register_auto_global("_MYEXTENSION",
                sizeof("_MYEXTENSION") - 1,
                NULL TSRMLS_CC) == FAILURE) {
            return FAILURE;
        }
        zend_auto_global_disable_jit("_MYEXTENSION",
            sizeof("_MYEXTENSION") - 1 TSRMLS_CC);
    #else
        if (zend_register_auto_global("_MYEXTENSION",
                sizeof("_MYEXTENSION") - 1
                TSRMLS_CC) == FAILURE) {
            return FAILURE;
        }
    #endif

        return SUCCESS;
    }

After a request has been made, PHP sets up an operating environment including a symbol table (where variables are stored) and synchronizes per-directory configuration values.
PHP then loops through its extensions again, this time calling each one's RINIT (Request Initialization) method. Here, an extension may reset global variables to default values, prepopulate variables into the script's symbol table, or perform other tasks such as logging the page request to a file. RINIT can be thought of as a kind of auto_prepend_file directive for all scripts requested.

An RINIT method might be expected to look like this:

    /* Run at the start of every page request */
    PHP_RINIT_FUNCTION(myextension)
    {
        zval *myext_autoglobal;

        /* Initialize the autoglobal variable
         * declared in the MINIT function
         * as an empty array.
         * This is equivalent to performing:
         * $_MYEXTENSION = array();
         */
        ALLOC_INIT_ZVAL(myext_autoglobal);
        array_init(myext_autoglobal);
        /* Symbol table key lengths include the terminating NUL */
        zend_hash_add(&EG(symbol_table), "_MYEXTENSION",
            sizeof("_MYEXTENSION"),
            (void**)&myext_autoglobal,
            sizeof(zval*), NULL);

        return SUCCESS;
    }

After a request has completed processing, either by reaching the end of the script file or by exiting through a die() or exit() statement, PHP starts the cleanup process by calling each extension's RSHUTDOWN (Request Shutdown) method. RSHUTDOWN corresponds to auto_append_file in much the same way as RINIT corresponds to auto_prepend_file. The most important difference between RSHUTDOWN and auto_append_file, however, is that RSHUTDOWN will always be executed, whereas a call to die() or exit() inside the userspace script will skip any auto_append_file.

Any last-minute tasks that need to be performed can be handled in RSHUTDOWN before the symbol table and other resources are destroyed. After all RSHUTDOWN methods have completed, every variable in the symbol table is implicitly unset(), during which all non-persistent resource and object destructors are called in order to free resources gracefully.
    /* Run at the end of every page request */
    PHP_RSHUTDOWN_FUNCTION(myextension)
    {
        zval **myext_autoglobal;

        if (zend_hash_find(&EG(symbol_table), "_MYEXTENSION",
                sizeof("_MYEXTENSION"),
                (void**)&myext_autoglobal) == SUCCESS) {
            /* Do something meaningful
             * with the values of the
             * $_MYEXTENSION array
             */
            php_myextension_handle_values(myext_autoglobal
                TSRMLS_CC);
        }

        return SUCCESS;
    }

Finally, when all requests have been fulfilled and the web server or other SAPI is ready to shut down, PHP loops through each extension's MSHUTDOWN (Module Shutdown) method. This is an extension's last chance to unregister handlers and free persistent memory allocated during the MINIT cycle.

    /* This module is being unloaded.
     * Constants and functions will be
     * automatically purged;
     * persistent resources, class entries,
     * and stream handlers must be
     * manually unregistered.
     */
    PHP_MSHUTDOWN_FUNCTION(myextension)
    {
        UNREGISTER_INI_ENTRIES();
        php_unregister_url_stream_wrapper("myproto" TSRMLS_CC);
        php_stream_filter_unregister_factory("myfilter" TSRMLS_CC);

        return SUCCESS;
    }

Life Cycles

Each PHP instance, whether started from an init script or from the command line, follows a series of events involving both the Request/Module Init/Shutdown events covered previously and the actual execution of the scripts themselves. How many times, and how frequently, each startup and shutdown phase is executed depends on the SAPI in use. The four most common SAPI configurations are CLI/CGI, Multiprocess Module, Multithreaded Module, and Embedded.

CLI Life Cycle

The CLI (and CGI) SAPI is fairly unique in its single-request life cycle; however, the Module versus Request steps are still cycled in discrete loops. Figure 1.1 shows the progression of the PHP interpreter when called from the command line for the script test.php.

Figure 1.1. Requests cycles versus engine life cycle.
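The ordering shown in Figure 1.1 can also be expressed as a short stand-alone sketch. Nothing below uses the PHP API: toy_sapi, record(), and toy_sapi_run() are hypothetical names invented for illustration, and the strings "SCRIPT" and "APPEND" merely stand in for the requested script and an auto_append_file. Under those assumptions, the driver fires the four hooks in the order described earlier and skips the appended file when the script exits early.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: a log of life-cycle events in the order they fire.
 * None of these names exist in PHP. */
typedef struct {
    char log[256];
} toy_sapi;

/* Append one event name to the log, separated by a space. */
static void record(toy_sapi *s, const char *event)
{
    /* Room for: existing text + one space + event + terminating NUL */
    assert(strlen(s->log) + strlen(event) + 2 <= sizeof(s->log));
    if (s->log[0] != '\0') {
        strcat(s->log, " ");
    }
    strcat(s->log, event);
}

/* Run one module cycle containing `requests` request cycles.
 * A nonzero early_exit models a script that calls exit() or die(). */
const char *toy_sapi_run(toy_sapi *s, int requests, int early_exit)
{
    int i;

    s->log[0] = '\0';
    record(s, "MINIT");              /* once per SAPI startup */
    for (i = 0; i < requests; i++) {
        record(s, "RINIT");          /* once per request */
        record(s, "SCRIPT");
        if (!early_exit) {
            record(s, "APPEND");     /* auto_append_file is skipped on exit() */
        }
        record(s, "RSHUTDOWN");      /* always runs */
    }
    record(s, "MSHUTDOWN");          /* once per SAPI shutdown */
    return s->log;
}
```

Calling toy_sapi_run(&s, 1, 0) models the single-request CLI case and yields MINIT RINIT SCRIPT APPEND RSHUTDOWN MSHUTDOWN; passing 1 for early_exit drops APPEND but keeps RSHUTDOWN.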
The Multiprocess Life Cycle

The most common configuration of PHP embedded into a web server is using PHP built as an APXS module for Apache 1, or Apache 2 using the Pre-fork MPM. Many other web server configurations fit into this same category, which will be referred to as the multiprocess model through the rest of this book.

It's called the multiprocess model because when Apache starts up, it immediately forks several child processes, each of which has its own process space and functions independently from one another. Within a given child, the life cycle of that PHP instance looks immediately familiar, as shown in Figure 1.2. The only variation here is that multiple requests are sandwiched between a single MINIT/MSHUTDOWN pair.

Figure 1.2. Individual process life cycle.

This model does not allow any one child to be aware of data owned by another child, although it does allow children to die and be replaced at will without compromising the stability of any other child. Figure 1.3 shows multiple children of a single Apache invocation and the calls to each of their MINIT, RINIT, RSHUTDOWN, and MSHUTDOWN methods.

Figure 1.3. Multiprocess life cycles.

The Multithreaded Life Cycle

Increasingly, PHP is being seen in a number of multithreaded web server configurations such as the ISAPI interface to IIS and the Apache 2 Worker MPM. Under a multithreaded web server only one process runs at any given time, but multiple threads execute within that process space simultaneously. This avoids several bits of overhead, including the repeated calls to MINIT/MSHUTDOWN, allows true global data to be allocated and initialized only once, and potentially opens the door for multiple requests to deterministically share information. Figure 1.4 shows the parallel process flow that occurs within PHP when run from a multithreaded web server such as Apache 2.

Figure 1.4. Multithreaded life cycles.
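The isolation property of the multiprocess model can be observed without PHP at all. The following is a minimal POSIX sketch, not Apache or PHP code: request_counter stands in for a global variable owned by an extension, and child_serves() and parent_counter() are hypothetical helpers. A forked child counts the requests it handles, yet the parent's copy of the counter never changes.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for an extension's process-wide global (illustrative only). */
static int request_counter = 0;

/* Fork one child, let it "serve" the given number of requests, and
 * return the child's final counter value as reported through its exit
 * status. Returns -1 if the child could not be created or reaped. */
int child_serves(int requests)
{
    pid_t pid;
    int status = 0;

    /* An exit status only carries 8 bits */
    assert(requests >= 0 && requests <= 255);

    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        int i;
        /* Child: modifies only its own copy of the global */
        for (i = 0; i < requests; i++) {
            request_counter++;
        }
        _exit(request_counter);
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* The parent's view of the same variable. */
int parent_counter(void)
{
    return request_counter;
}
```

After child_serves(3) returns 3, parent_counter() still reports 0, and a second child again starts counting from the parent's untouched value. Threads, by contrast, share one address space, so an unprotected global of this kind would be visible to every thread at once.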
The Embed Life Cycle

Recalling that the Embed SAPI is just another SAPI implementation following the same rules as the CLI, APXS, or ISAPI interfaces, it's easy to imagine that the life cycle of a request will follow the same basic path: Module Init => Request Init => Request => Request Shutdown => Module Shutdown. Indeed, the Embed SAPI follows each of these steps in perfect time with its siblings.

What makes the Embed SAPI appear unique is that the request may be fed in multiple script segments that function as part of a single whole request. Control will also pass back and forth between PHP and the calling application multiple times under most configurations.

Although an Embed request may consist of one or more code elements, embed applications are subject to the same request isolation requirements as web servers. In order to process two or more simultaneous embed environments, your application will either need to fork like Apache 1 or thread like Apache 2. Attempting to process two separate request environments within a single non-threaded process space will lead to unexpected, and certainly undesired, results.

Zend Thread Safety

When PHP was in its infancy, it ran as a single process CGI and had no concern for thread safety because no process space could outlive a single request. An internal variable could be declared in the global scope and accessed or changed at will without consequence so long as its contents were properly initialized. Any resources that weren't cleaned up properly would be released when the CGI process terminated.

Later on, PHP was embedded into multiprocess web servers like Apache. A given internal variable could still be defined globally and safely accessed by the active request so long as it was properly initialized at the start of each request and cleaned up at the end because only one request per process space could ever be active at one time.
At this point per-request memory management was added to keep resource leaks from growing out of control.

As single-process multithreaded web servers started to appear, however, a new approach to handling global data became necessary. Eventually this would emerge as a new layer called TSRM (Thread Safe Resource Management).

Thread-Safe Versus Non-Thread-Safe Declaration

In a simple non-threaded application, you would most likely declare global variables by placing them at the top of your source file. The compiler would then allocate a block of memory in your program's data segment to hold that unit of information.

In a multithreaded application where each thread needs its own version of that data element, it's necessary to allocate a separate block of memory for each thread. A given thread then picks the correct block of memory when it needs to access its data, and works from that pointer.

Thread-Safe Data Pools

During an extension's MINIT phase, the TSRM layer is notified how much data will need to be stored by that extension using one or more calls to the ts_allocate_id() function. TSRM adds that byte count to its running total of data space requirements, and returns a new, unique identifier for that segment's portion of the thread's data pool.

    typedef struct {
        int sampleint;
        char *samplestring;
    } php_sample_globals;
    int sample_globals_id;

    PHP_MINIT_FUNCTION(sample)
    {
        ts_allocate_id(&sample_globals_id,
            sizeof(php_sample_globals),
            (ts_allocate_ctor) php_sample_globals_ctor,
            (ts_allocate_dtor) php_sample_globals_dtor);
        return SUCCESS;
    }

When it comes time to access that data segment during a request, the extension requests a pointer from the TSRM layer for the current thread's resource pool, offset by the appropriate index suggested by the resource ID returned by ts_allocate_id().

Put another way, in terms of code flow, the following statement

    SAMPLE_G(sampleint) = 5;

is one that you might see in the module associated with the previous MINIT statement.
Under a thread-safe build, this statement expands through a number of intermediary macros to the following:

    (((php_sample_globals*)(*((void ***)tsrm_ls))[sample_globals_id-1])->sampleint) = 5;

Don't be concerned if you have trouble parsing that statement; it's so well integrated into the PHPAPI that some developers never bother to learn how it works.

When Not to Thread

Because accessing global resources within a thread-safe build of PHP involves the overhead of looking up the correct offset into the right data pool, it ends up being slower than its non-threaded counterpart, in which data is simply plucked out of a true global whose address is computed at compile time.

Consider the prior example again, this time under a non-threaded build:

    typedef struct {
        int sampleint;
        char *samplestring;
    } php_sample_globals;
    php_sample_globals sample_globals;

    PHP_MINIT_FUNCTION(sample)
    {
        php_sample_globals_ctor(&sample_globals TSRMLS_CC);
        return SUCCESS;
    }

The first thing you'll notice here is that rather than declaring an int to identify a reference to a globals struct declared elsewhere, you're simply defining the structure right in the process's global scope. This means that the SAMPLE_G(sampleint) = 5; statement from before only needs to expand out as sample_globals.sampleint = 5;. Simple, fast, and efficient.

Non-threaded builds also have the advantage of process isolation so that if a given request encounters completely unexpected circumstances, it can bail all the way out or even segfault without bringing the entire web server to its knees. In fact, Apache's MaxRequestsPerChild directive is designed to take advantage of this effect by deliberately killing its children every so often and spawning fresh ones in their place.

Agnostic Globals Access

When creating an extension, you won't necessarily know whether the environment it gets built for will require thread safety or not.
Fortunately, part of the standard set of include files that you'll use conditionally defines the ZTS preprocessor token. When PHP is built for thread safety, either because the SAPI requires it or through the --enable-maintainer-zts option, this value is automatically defined and can be tested with the usual set of directives such as #ifdef ZTS.

As you saw a moment ago, it only makes sense to allocate space in the thread safety pool if the pool actually exists, and it will only exist if PHP was compiled for thread safety. That's why in the previous examples it's wrapped in checks for ZTS, with a non-threaded alternative being called for non-ZTS builds.

In the PHP_MINIT_FUNCTION(myextension) example you saw much earlier in this chapter, #ifdef ZTS was used to conditionally call the correct version of global initialization code. For ZTS mode it used ts_allocate_id() to populate the myextension_globals_id variable, and non-ZTS mode just called the initialization method for myextension_globals directly. These two variables would have been declared in your extension's source file using a Zend macro:

    ZEND_DECLARE_MODULE_GLOBALS(myextension);

which automatically handles testing for ZTS and declaring the correct host variable of the appropriate type depending on whether ZTS is enabled.

When it comes time to access these global variables, you'll use a self-defined macro like SAMPLE_G() shown earlier. In Chapter 12, you'll learn how to design this macro to expand to the correct form depending on whether ZTS is enabled.

Threading Even When You Don't Have To

A normal PHP build has thread safety turned off by default and only enables it if the SAPI being built is known to require thread safety, or if thread safety is explicitly turned on by a ./configure switch.

Given the speed issues with global lookups and the lack of process isolation, you might wonder why anyone would deliberately turn the TSRM layer on when it's not required.
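The lookup cost behind that question can be made concrete with a standalone model of per-thread data pools. What follows is a toy sketch, not the TSRM implementation: toy_allocate_id(), toy_new_thread(), toy_free_thread(), and TOY_G() are hypothetical stand-ins, and real TSRM also runs constructors and destructors, grows its tables on demand, and locks shared state.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_IDS 8

/* Size requested for each registered segment, indexed by id - 1. */
static size_t toy_sizes[TOY_MAX_IDS];
static int toy_id_count = 0;

/* Register a segment of `size` bytes; ids are handed out starting at 1. */
static int toy_allocate_id(int *id, size_t size)
{
    assert(size > 0 && toy_id_count < TOY_MAX_IDS);
    toy_sizes[toy_id_count] = size;
    *id = ++toy_id_count;
    return *id;
}

/* Build one thread's pool: a table of zeroed blocks, one per registered id.
 * Ids registered after this call get no block in this simplified model. */
static void ***toy_new_thread(void)
{
    void ***ls = malloc(sizeof(void **));
    int i;

    if (ls == NULL) abort();
    *ls = calloc(TOY_MAX_IDS, sizeof(void *));
    if (*ls == NULL) abort();
    for (i = 0; i < toy_id_count; i++) {
        (*ls)[i] = calloc(1, toy_sizes[i]);
        if ((*ls)[i] == NULL) abort();
    }
    return ls;
}

static void toy_free_thread(void ***ls)
{
    int i;

    for (i = 0; i < TOY_MAX_IDS; i++) {
        free((*ls)[i]);   /* free(NULL) is a no-op for unused slots */
    }
    free(*ls);
    free(ls);
}

/* Same shape as the expanded statement shown earlier: find this thread's
 * table, index it by id - 1, cast the block, then select a member. */
#define TOY_G(ls, id, type, field) \
    (((type)(*((void ***)(ls)))[(id) - 1])->field)

/* Example payload, modeled on php_sample_globals. */
typedef struct {
    int sampleint;
    char *samplestring;
} toy_sample_globals;
```

Every TOY_G() access costs a pointer dereference, an index, and a cast before it reaches the member, whereas the non-threaded form is a single fixed address. In exchange, two thread contexts created with toy_new_thread() can hold different values for the same id without interfering with each other.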
For the most part, it's extension and SAPI developers (like you're about to become) who turn thread safety on in order to ensure that new code will run correctly in all environments.

When thread safety is enabled, a special pointer, called tsrm_ls, is added to the prototype of many internal functions. It's this pointer that allows PHP to differentiate the data associated with one thread from another. You may recall seeing it used with the SAMPLE_G() macro under ZTS mode earlier in this chapter. Without it, an executing function wouldn't know whose symbol table to look up and set a particular value in; it wouldn't even know which script was being executed, and the engine would be completely unable to track its internal registers. This one pointer keeps one thread handling a page request from running right over the top of another.

The way this pointer parameter is optionally included in prototypes is through a set of defines. When ZTS is disabled, these defines all evaluate to blank; when it's turned on, however, they look like the following:

    #define TSRMLS_D     void ***tsrm_ls
    #define TSRMLS_DC    , void ***tsrm_ls
    #define TSRMLS_C     tsrm_ls
    #define TSRMLS_CC    , tsrm_ls

A non-ZTS build would see the first line in the following code as having two parameters, an int and a char*. Under a ZTS build, on the other hand, the prototype contains three parameters: an int, a char*, and a void***. When your program calls this function, it will need to pass in that parameter, but only for ZTS-enabled builds. The second line in the following code shows how the CC macro accomplishes exactly that.

    int php_myext_action(int action_id, char *message TSRMLS_DC);
    php_myext_action(42, "The meaning of life" TSRMLS_CC);

By including this special variable in the function call, php_myext_action will be able to use the value of tsrm_ls together with the MYEXT_G() macro to access its thread-specific global data.
On a non-ZTS build, tsrm_ls will be unavailable, but that's okay because MYEXT_G(), and other similar macros, will have no use for it.

Now imagine that you're working on a new extension and you've got the following function that works beautifully under your local build using the CLI SAPI, and even when you compile it using the apxs SAPI for Apache 1:

    static int php_myext_isset(char *varname, int varname_len)
    {
        zval **dummy;

        if (zend_hash_find(EG(active_symbol_table),
                           varname, varname_len + 1,
                           (void**)&dummy) == SUCCESS) {
            /* Variable exists */
            return 1;
        } else {
            /* Undefined variable */
            return 0;
        }
    }

Satisfied that everything is working well, you package up your extension and send it to another office to be built and run on the production servers. To your dismay, the remote office reports that the extension failed to compile.

It turns out that they're using Apache 2.0 in threaded mode, so their build of PHP has ZTS enabled. When the compiler encountered your use of the EG() macro, it tried to find tsrm_ls in the local scope and couldn't because you never declared it and never passed it to your function.

The fix is simple of course; just add TSRMLS_DC to the declaration of php_myext_isset() and toss a TSRMLS_CC onto every line that calls it. Unfortunately, the production team in the remote office is a little less certain of your extension's quality now and would like to put off the rollout for another couple of weeks. If only this problem could have been caught sooner!

That's where --enable-maintainer-zts comes in. By adding this one switch to your ./configure statement when building PHP, your build will automatically include ZTS even if your current SAPI, such as CLI, doesn't require it. With this switch enabled, you can catch this common and unnecessary programming mistake before your code ever leaves your machine.

Note

In PHP4, the --enable-maintainer-zts flag was known as --enable-experimental-zts; be sure to use the correct flag for your version of PHP.
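The way an optional parameter appears and disappears can be reproduced outside of PHP with a handful of stand-in macros. The sketch below is a simplified model rather than the Zend definitions: MY_ZTS, MY_TSRMLS_DC, MY_TSRMLS_CC, MY_G(), my_globals, my_record_action(), and my_demo() are all hypothetical names, and the context is a plain struct pointer instead of the real void*** table.

```c
#include <assert.h>

/* Set to 0 to model a non-thread-safe build (hypothetical switch). */
#define MY_ZTS 1

typedef struct {
    int last_action;
} my_globals;

#if MY_ZTS
/* Threaded model: the context travels as an extra trailing parameter. */
# define MY_TSRMLS_DC  , my_globals *my_ls
# define MY_TSRMLS_CC  , my_ls
# define MY_G(field)   (my_ls->field)
#else
/* Non-threaded model: the macros vanish and a true global is used. */
static my_globals my_globals_storage;
# define MY_TSRMLS_DC
# define MY_TSRMLS_CC
# define MY_G(field)   (my_globals_storage.field)
#endif

/* Needs the context because it touches MY_G(); drop MY_TSRMLS_DC here
 * and the file compiles only when MY_ZTS is 0. */
static int my_record_action(int action_id MY_TSRMLS_DC)
{
    MY_G(last_action) = action_id;
    return MY_G(last_action);
}

static int my_demo(void)
{
    int result;
#if MY_ZTS
    my_globals thread_data = { 0 };
    my_globals *my_ls = &thread_data;   /* this "thread's" context */
#endif

    result = my_record_action(42 MY_TSRMLS_CC);
    assert(result == MY_G(last_action));
    return result;
}
```

With MY_ZTS set to 0 the same source compiles to a two-line function that writes to a true global. Removing MY_TSRMLS_DC from my_record_action() reproduces the failure described above: the mistake goes unnoticed in the non-threaded configuration and becomes a compile error the moment the threaded one is built.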
Finding a Lost tsrm_ls

Occasionally, it's just not possible to pass the tsrm_ls pointer into a function that needs it. Usually this is because your extension is interfacing with a library that uses callbacks and doesn't provide room for an abstract pointer to be passed back. Consider the following piece of code:

    void php_myext_event_callback(int eventtype, char *message)
    {
        zval *event;

        /* $event = array('type'=>$eventtype,
                          'message'=>$message) */
        MAKE_STD_ZVAL(event);
        array_init(event);
        add_assoc_long(event, "type", eventtype);
        add_assoc_string(event, "message", message, 1);
        /* $eventlog[] = $event; */
        add_next_index_zval(EXT_G(eventlog), event);
    }

    PHP_FUNCTION(myext_startloop)
    {
        /* The eventlib_loopme() function,
         * exported by an external library,
         * waits for an event to happen,
         * then dispatches it to the
         * callback handler specified.
         */
        eventlib_loopme(php_myext_event_callback);
    }

Although not all of this code segment will make sense yet, you will notice right away that the callback function uses the EXT_G() macro, which is known to need the tsrm_ls pointer under threaded builds. Changing the function prototype will do no good because the external library has no notion of PHP's thread-safety model, nor should it. So how can tsrm_ls be recovered in such a way that it can be used?

The solution comes in the form of a Zend macro called TSRMLS_FETCH(). When placed at the top of a code segment, this macro will perform a lookup based on the current threading context, and declare a local copy of the tsrm_ls pointer.

Although it will be tempting to use this macro everywhere and not bother with passing tsrm_ls via function calls, it's important to note that a TSRMLS_FETCH() call takes a fair amount of processing time to complete. Not noticeable on a single iteration certainly, but as your thread count increases, and the number of instances in which you call TSRMLS_FETCH() grows, your extension will gradually begin to show this bottleneck for what it is.
Be sure to use it sparingly.

Note

To ensure compatibility with C++ compilers, be sure to place TSRMLS_FETCH() (and all variable declarations, for that matter) at the top of a given block scope before any statements. Because the TSRMLS_FETCH() macro itself can resolve in a couple of different ways, it's best to make this the last variable declared within a given declaration header.

Summary

In this chapter you glimpsed several of the concepts that you'll explore in later chapters. You also built a foundation for understanding what goes on, not only under the hood of the extensions you'll come to build, but behind the scenes of the Zend Engine and TSRM layer, which you'll take advantage of as you embed and deploy PHP in your applications.

Chapter 2. Variables from the Inside Out

One thing every programming language shares in common is a means to store and retrieve information; PHP is no exception. Although many languages require all variables to be declared beforehand and that the type of information they will hold be fixed, PHP permits the programmer to create variables on the fly and store any type of information that the language is capable of expressing. When the stored information is needed, it is automatically converted to whatever type is appropriate at the time.

Because you've used PHP from the userspace side already, this concept, known as loose typing, shouldn't be unfamiliar to you. In this chapter, you'll look at how this information is encoded internally by PHP's parent language, C, which requires strict typecasting.

Of course, encoding data is only half of the equation. To keep track of all these pieces of information, each one needs a label and a container. From the userspace realm, you'll recognize these concepts as variable names and scope.

Data Types

The fundamental unit of data storage in PHP is known as the zval, or Zend Value.
It's a small, four-member struct defined in Zend/zend.h with the following format:

    typedef struct _zval_struct {
        zvalue_value value;
        zend_uint refcount;
        zend_uchar type;
        zend_uchar is_ref;
    } zval;

It should be a simple matter to intuit the basic storage type for most of these members: unsigned integer for refcount, and unsigned character for type and is_ref. The value member, however, is actually a union structure defined, as of PHP5, as:

    typedef union _zvalue_value {
        long lval;
        double dval;
        struct {
            char *val;
            int len;
        } str;
        HashTable *ht;
        zend_object_value obj;
    } zvalue_value;

This union allows Zend to store the many different types of data a PHP variable is capable of holding in a single, unified structure. Zend currently defines the eight data types listed in Table 2.1.

Table 2.1. Data Types Used by Zend/PHP

Type Value / Purpose

IS_NULL
This type is automatically assigned to uninitialized variables upon their first use and can also be explicitly assigned in userspace using the built-in NULL constant. This variable type provides a special "non-value," which is distinct from a Boolean FALSE or an integer 0.

IS_BOOL
Boolean variables can have one of two possible states, either TRUE or FALSE. Conditional expressions in userspace control structures (if, while, ternary, for) are implicitly typecast to Boolean during evaluation.

IS_LONG
Integer data types in PHP are stored using the host system's signed long data type. On most 32-bit platforms this yields a storage range of -2147483648 to +2147483647. With a few exceptions, whenever a userspace script attempts to store an integer value outside of this range, it is automatically converted to a double-precision floating point type (IS_DOUBLE).

IS_DOUBLE
Floating point data types use the host system's signed double data type. Floating point numbers are not stored with exact precision; rather, a formula is used to express the value as a fraction of limited precision (mantissa) times 2 raised to a certain power (exponent). This representation allows the computer to store a wide range of values (positive or negative) from as small as 2.225x10^(-308) to an upper limit of around 1.798x10^308 in only 8 bytes. Unfortunately, numbers that evaluate to exact figures in decimal don't always store cleanly as binary fractions. For example, the decimal expression 0.5 evaluates to an exact binary figure of 0.1, while decimal 0.8 becomes a repeating binary representation of 0.1100110011.... When converted back to decimal, the truncated binary digits yield a slightly offset value because they are not able to store the entire figure. Think of it like trying to express the number 1/3 as a decimal: 0.333333 comes very close, but it's not precise, as evidenced by the fact that 3 * 0.333333 is not 1.0. This imprecision often leads to confusion when dealing with floating point numbers on computers. (These range limits are based on common 32-bit platforms; range may vary from system to system.)

IS_STRING
PHP's most universal data type is the string, which is stored in just the way an experienced C programmer would expect. A block of memory, sufficiently large to hold all the bytes/characters of the string, is allocated and a pointer to that string is stored in the host zval. What's worth noting about PHP strings is that the length of the string is always explicitly stated in the zval structure. This allows strings to contain NULL bytes without being truncated. This aspect of PHP strings will be referred to hereafter as binary safety because it makes them safe to contain any type of binary data. Note that the amount of memory allocated for a given PHP string is always, at minimum, its length plus one. This last byte is populated with a terminating NULL character so that functions that do not require binary safety can simply pass the string pointer through to their underlying method.

IS_ARRAY
An array is a special purpose variable whose sole function is to carry around other variables. Unlike C's notion of an array, a PHP array is not a vector of a uniform data type (such as zval arrayofzvals[];). Instead, a PHP array is a complex set of data buckets linked into a structure known as a HashTable. Each HashTable element (bucket) contains two relevant pieces of information: label and data. In the case of PHP arrays, the label is the associative or numeric index within the array, and the data is the variable (zval) to which that key refers.

IS_OBJECT
Objects take the multi-element data storage of arrays and go one further by adding methods, access modifiers, scoped constants, and special event handlers. As an extension developer, building object-oriented code that functions equally well in PHP4 and PHP5 presents a special challenge because the internal object model has changed so much between Zend Engine 1 (PHP4) and Zend Engine 2 (PHP5).

IS_RESOURCE
Some data types simply cannot be mapped to userspace. For example, stdio's FILE pointer or libmysqlclient's connection handle can't be simply mapped to an array of scalar values, nor would they make sense if they could. To shield the userspace script writer from having to deal with these issues, PHP provides a generic resource data type. The details of how resources are implemented will be covered in Chapter 9, "The Resource Data Type"; for now just be aware that they exist.

The IS_* constants listed in Table 2.1 are stored in the type element of the zval struct and determine which part of the value element of the zval struct should be looked at when examining its value.

The most obvious way to inspect the value of type would probably be to dereference it from a given zval as in the following code snippet:

    void describe_zval(zval *foo)
    {
        if (foo->type == IS_NULL) {
            php_printf("The variable is NULL");
        } else {
            php_printf("The variable is of type %d", foo->type);
        }
    }

Obvious, but wrong. Well, not wrong, but certainly not the preferred approach.
The Zend header files contain a large block of zval access macros that extension authors are expected to use when examining zval data. The primary reason for this is to avoid incompatibilities when and if the engine's API changes, but as a side benefit the code often becomes easier to read. Here's that same code snippet again, this time using the Z_TYPE_P() macro:

    void describe_zval(zval *foo)
    {
        if (Z_TYPE_P(foo) == IS_NULL) {
            php_printf("The variable is NULL");
        } else {
            php_printf("The variable is of type %d", Z_TYPE_P(foo));
        }
    }

The _P suffix to this macro indicates that the parameter passed contains a single level of indirection. Two more macros exist in this set, Z_TYPE() and Z_TYPE_PP(), which expect parameters of type zval (no indirection) and zval** (two levels of indirection), respectively.

Note

In this example a special output function, php_printf(), was used to display a piece of data. This function is syntactically identical to stdio's printf() function; however, it handles special processing for web server SAPIs and takes advantage of PHP's output buffering mechanism. You'll learn more about this function and its cousin PHPWRITE() in Chapter 5, "Your First Extension."

Data Values

As with type, the value of zvals can be inspected using a triplet of macros. These macros also begin with Z_, and optionally end with _P or _PP depending on their degree of indirection.

For the simple scalar types, Boolean, long, and double, the macros are short and consistent: BVAL, LVAL, and DVAL.

    void display_values(zval boolzv, zval *longpzv, zval **doubleppzv)
    {
        if (Z_TYPE(boolzv) == IS_BOOL) {
            php_printf("The value of the boolean is: %s\n",
                       Z_BVAL(boolzv) ? "true" : "false");
        }
        if (Z_TYPE_P(longpzv) == IS_LONG) {
            php_printf("The value of the long is: %ld\n",
                       Z_LVAL_P(longpzv));
        }
        if (Z_TYPE_PP(doubleppzv) == IS_DOUBLE) {
            php_printf("The value of the double is: %f\n",
                       Z_DVAL_PP(doubleppzv));
        }
    }

String variables, because they contain two attributes, have a pair of macro triplets representing the char* (STRVAL) and int (STRLEN) elements:

    void display_string(zval *zstr)
    {
        if (Z_TYPE_P(zstr) != IS_STRING) {
            php_printf("The wrong datatype was passed!\n");
            return;
        }
        PHPWRITE(Z_STRVAL_P(zstr), Z_STRLEN_P(zstr));
    }

The array data type is stored internally as a HashTable* that can be accessed using the ARRVAL triplet: Z_ARRVAL(zv), Z_ARRVAL_P(pzv), Z_ARRVAL_PP(ppzv). When looking through old code in the PHP core and PECL modules, you might encounter the HASH_OF() macro, which expects a zval*. This macro is generally the equivalent of the Z_ARRVAL_P() macro; however, it is deprecated and should not be used in new code.

Objects represent complex internal structures and have a number of access macros: OBJ_HANDLE, which returns the handle identifier, OBJ_HT for the handler table, OBJCE for the class definition, OBJPROP for the property HashTable, and OBJ_HANDLER for manipulating a specific handler method in the OBJ_HT table. Don't worry about the meaning of these various object macros just yet; they'll be covered in detail in Chapter 10, "PHP4 Objects," and Chapter 11, "PHP5 Objects."

Within a zval, a resource data type is stored as a simple integer that can be accessed with the RESVAL triplet. This integer is passed on to the zend_fetch_resource() function, which looks up the registered resource from its numeric identifier. The resource data type will be covered in depth in Chapter 9.

Data Creation

Now that you've seen how to pull data out of a zval, it's time to create some of your own.
Although a zval could be simply declared as a direct variable at the top of a function, it would make the variable's data storage local and it would have to be copied in order to leave the function and reach userspace.

Because you will almost always want zvals that you create to reach userspace in some form, you'll want to allocate a block of memory for it and assign that block to a zval* pointer. Once again the "obvious" solution of using malloc(sizeof(zval)) is not the right answer. Instead you'll use another Zend macro: MAKE_STD_ZVAL(pzv). This macro will allocate space in an optimized chunk of memory near other zvals, automatically handle out-of-memory errors (which you'll explore further in the next chapter), and initialize the refcount and is_ref properties of your new zval.

Note

In addition to MAKE_STD_ZVAL(), you will often see another zval* creation macro used in PHP sources: ALLOC_INIT_ZVAL(). This macro only differs from MAKE_STD_ZVAL() in that it initializes the data type of the zval* to IS_NULL.

Once data storage space is available, it's time to populate your brand-new zval with some information. After reading the section on data values earlier, you're probably all primed to use those Z_TYPE_P() and Z_SOMEVAL_P() macros to set up your new variable. Seems the "obvious" solution, right?

Again, obviousness falls short!

Zend exposes yet another set of macros for setting zval* values. Following are these new macros and how they expand to the ones you're already familiar with.

    ZVAL_NULL(pzv);                 Z_TYPE_P(pzv) = IS_NULL;

Although this macro doesn't provide any savings over using the more direct version, it's included for completeness.

    ZVAL_BOOL(pzv, b);              Z_TYPE_P(pzv) = IS_BOOL;
                                    Z_BVAL_P(pzv) = b ? 1 : 0;
    ZVAL_TRUE(pzv);                 ZVAL_BOOL(pzv, 1);
    ZVAL_FALSE(pzv);                ZVAL_BOOL(pzv, 0);

Notice that any non-zero value provided to ZVAL_BOOL() will result in a truth value.
This makes sense of course, because any non-zero value typecast to Boolean in userspace will exhibit the same behavior. When hardcoding values into internal code, it's considered good practice to explicitly use the value 1 for truth. The macros ZVAL_TRUE() and ZVAL_FALSE() are provided as a convenience and can sometimes aid code readability.

    ZVAL_LONG(pzv, l);              Z_TYPE_P(pzv) = IS_LONG;
                                    Z_LVAL_P(pzv) = l;
    ZVAL_DOUBLE(pzv, d);            Z_TYPE_P(pzv) = IS_DOUBLE;
                                    Z_DVAL_P(pzv) = d;

The basic scalar macros are as simple as they come. Set the zval's type, and assign a numeric value to it.

    ZVAL_STRINGL(pzv,str,len,dup);  Z_TYPE_P(pzv) = IS_STRING;
                                    Z_STRLEN_P(pzv) = len;
                                    if (dup) {
                                        Z_STRVAL_P(pzv) =
                                            estrndup(str, len);
                                    } else {
                                        Z_STRVAL_P(pzv) = str;
                                    }
    ZVAL_STRING(pzv, str, dup);     ZVAL_STRINGL(pzv, str,
                                                 strlen(str), dup);

Here's where zval creation starts to get interesting. Strings, like arrays, objects, and resources, need to allocate additional memory for their data storage. You'll explore the pitfalls of memory management in the next chapter; for now, just notice that a dup value of 1 will allocate new memory and copy the string's contents (estrndup() adds the terminating NULL byte itself), while a value of 0 will simply point the zval at the already existing string data.

    ZVAL_RESOURCE(pzv, res);        Z_TYPE_P(pzv) = IS_RESOURCE;
                                    Z_RESVAL_P(pzv) = res;

Recall from earlier that a resource is stored in a zval as a simple integer that refers to a lookup table managed by Zend. The ZVAL_RESOURCE() macro therefore acts much like the ZVAL_LONG() macro, but using a different type.

Data Storage

You've used PHP from the userspace side of things, so you're already familiar with the concept of an array. Any number of PHP variables (zvals) can be dropped into a single container (array) and be given names (labels) in the form of numbers or strings.

What's hopefully not surprising is that every single variable in a PHP script can be found in an array.
When you create a variable by assigning a value to it, Zend stores that value into an internal array known as a symbol table.

One symbol table, the one that defines the global scope, is initialized upon request startup just before extension RINIT methods are called, and then destroyed after script completion and subsequent RSHUTDOWN methods have executed.

When a userspace function or object method is called, a new symbol table is allocated for the life of that function or method and is defined as the active symbol table. If current script execution is not in a function or method, the global symbol table is considered active.

Taking a look at the execution globals structure (defined in Zend/zend_globals.h), you'll find the following two elements defined:

    struct _zend_execution_globals {
        ...
        HashTable symbol_table;
        HashTable *active_symbol_table;
        ...
    };

The symbol_table, accessed as EG(symbol_table), is always the global variable scope, much like the $GLOBALS variable in userspace always corresponds to the global scope for PHP scripts. In fact, the $GLOBALS variable is just a userspace wrapper around the EG(symbol_table) variable seen from the internals.

The other part of this pair, active_symbol_table, is similarly accessed as EG(active_symbol_table), and represents whatever variable scope is active at the time.

The key difference to notice here is that EG(symbol_table), unlike nearly every other HashTable you'll use and encounter while working with the PHP and Zend APIs, is a direct variable. Nearly all functions that operate on HashTables, however, expect an indirect HashTable* as their parameter. Therefore, you'll have to take the address of EG(symbol_table) with an ampersand, as in &EG(symbol_table), when using it.
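The difference between the direct member and the pointer member is easy to see in a reduced model. This sketch assumes nothing from the Zend headers: toy_table, toy_executor_globals, TOY_EG(), and the three helper functions are hypothetical names that imitate only the shape of the real structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for HashTable; only an element count is kept. */
typedef struct {
    int num_elements;
} toy_table;

typedef struct {
    toy_table  symbol_table;         /* direct member: storage lives here */
    toy_table *active_symbol_table;  /* pointer member: refers to a scope */
} toy_executor_globals;

static toy_executor_globals toy_eg;

/* Non-threaded style accessor, in the spirit of EG(). */
#define TOY_EG(field) (toy_eg.field)

/* Like most table functions, this one expects a pointer. */
static int toy_table_count(const toy_table *ht)
{
    assert(ht != NULL);
    return ht->num_elements;
}

/* Outside any function call, the global table is the active one. */
static void toy_enter_global_scope(void)
{
    TOY_EG(active_symbol_table) = &TOY_EG(symbol_table);
}

/* Inside a function call, a separate table of locals becomes active. */
static void toy_enter_function_scope(toy_table *locals)
{
    assert(locals != NULL);
    TOY_EG(active_symbol_table) = locals;
}
```

Passing TOY_EG(active_symbol_table) to toy_table_count() works as written because it is already a pointer, whereas TOY_EG(symbol_table) must be prefixed with an ampersand. Omitting it is a type error the compiler will reject, which makes this an easy mistake to notice.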
Consider the following two code blocks, which are functionally identical:

In PHP:

    <?php $foo = 'bar'; ?>

In C:

    {
        zval *fooval;

        MAKE_STD_ZVAL(fooval);
        ZVAL_STRING(fooval, "bar", 1);
        ZEND_SET_SYMBOL(EG(active_symbol_table), "foo", fooval);
    }

First, a new zval was allocated using MAKE_STD_ZVAL() and its value was initialized to the string "bar". Then a new macro, which roughly equates with the assignment operator (=), combines that value with a label (foo), and adds it to the active symbol table. Because no userspace function is active at the time, EG(active_symbol_table) == &EG(symbol_table), which ultimately means that this variable is stored in the global scope.

Data Retrieval

In order to retrieve a variable from userspace, you'll need to look in whatever symbol table it's stored in. The following code segment shows using the zend_hash_find() function for this purpose:

    {
        zval **fooval;

        if (zend_hash_find(EG(active_symbol_table),
                           "foo", sizeof("foo"),
                           (void**)&fooval) == SUCCESS) {
            php_printf("Got the value of $foo!");
        } else {
            php_printf("$foo is not defined.");
        }
    }

A few parts of this example should look a little funny. Why is fooval defined to two levels of indirection? Why is sizeof() used for determining the length of "foo"? Why is &fooval, which would evaluate to a zval***, cast to a void**? If you asked yourself all three of these questions, pat yourself on the back.

First, it's worth knowing that HashTables aren't only used for userspace variables. The HashTable structure is so versatile that it's used all over the engine and in some cases it makes perfect sense to want to store a non-pointer value. A HashTable bucket is a fixed size, however, so in order to store data of any size, a HashTable will allocate a block of memory to wrap the data being stored. In the case of variables, it's a zval* being stored, so the HashTable storage mechanism allocates a block of memory big enough to hold a pointer.
The HashTable's bucket uses that new pointer to carry around the zval* and you effectively wind up with a zval** inside the HashTable. The reason for storing a zval* when HashTables are clearly capable of storing a full zval will be covered in the next chapter.

When trying to retrieve that data, the HashTable only knows that it has a pointer to something. In order to populate that pointer into a calling function's local storage, the calling function will naturally pass the address of its local pointer, resulting in a variable of indeterminate type with two levels of indirection (such as void**). Knowing that your "indeterminate type" in this case is zval*, you can see where the type being passed into zend_hash_find() will look different to the compiler, having three levels of indirection rather than two. This is done on purpose here, so a simple typecast is added to the function call to silence compiler warnings.

The reason sizeof() was used in the previous example was to include the terminating NULL byte in the "foo" constant used for the variable's label. Using 4 here would have worked equally well; however, it is discouraged because changes to the label name may affect its length, and it's much easier to find places where the length is hard-coded if it contains the label text that's being replaced anyway. (strlen("foo")+1) could have also solved this problem; however, some compilers do not optimize this step and the resulting binary might end up performing a pointless string length loop. What would be the fun in that?

If zend_hash_find() locates the item you're looking for, it populates the dereferenced pointer provided with the address of the bucket pointer it allocated when the requested data was first added to the HashTable and returns an integer value matching the SUCCESS constant. If zend_hash_find() cannot locate the data, it leaves the pointer untouched and returns an integer value matching the FAILURE constant.
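The two levels of indirection and the sizeof() length convention can be tried outside the engine with a deliberately tiny model. Everything here is an assumption made for illustration: toy_entry, toy_find(), TOY_SUCCESS, and TOY_FAILURE are hypothetical names, the "table" holds a single entry, and it stores an int* where the engine would store a zval*. To stay within strict C typing, the sketch receives the result in a plain void* and converts it afterwards instead of casting at the call site the way the engine code does.

```c
#include <assert.h>
#include <string.h>

#define TOY_SUCCESS 0
#define TOY_FAILURE -1

/* Hypothetical one-entry "table" that wraps a caller-supplied pointer. */
typedef struct toy_entry {
    const char *key;
    size_t      key_len;  /* includes the terminating NUL, as in the text */
    int        *data;     /* the stored pointer (a zval* in the real engine) */
} toy_entry;

/* On success, *dest receives the ADDRESS of the stored pointer. */
int toy_find(toy_entry *entry, const char *key, size_t key_len, void **dest)
{
    if (entry->key_len == key_len && memcmp(entry->key, key, key_len) == 0) {
        *dest = &entry->data;
        return TOY_SUCCESS;
    }
    return TOY_FAILURE;   /* *dest is left untouched */
}

int toy_lookup_demo(void)
{
    int        value = 42;
    toy_entry  entry = { "foo", sizeof("foo"), &value };
    void      *slot = NULL;
    int      **found;

    /* sizeof("foo") counts the terminator, so it equals strlen("foo") + 1 */
    assert(sizeof("foo") == strlen("foo") + 1);

    if (toy_find(&entry, "foo", sizeof("foo"), &slot) == TOY_FAILURE) {
        return -1;
    }
    found = (int **)slot;
    return **found;       /* two dereferences to reach the value */
}
```

Because the lookup hands back the address of the stored pointer rather than a copy of it, the caller could also replace what the entry points to, which is the property the engine relies on when it stores zval* values.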
In the case of userspace variables stored in a symbol table, SUCCESS or FAILURE effectively means that the variable is or is not set.

Data Conversion

Now that you can fetch variables from symbol tables, you'll want to do something with them. A direct, but painful, approach might be to examine the variable and perform a specific action depending on type. A simple switch statement like the following might work:

    void display_zval(zval *value)
    {
        switch (Z_TYPE_P(value)) {
            case IS_NULL:
                /* NULLs are echoed as nothing */
                break;
            case IS_BOOL:
                if (Z_BVAL_P(value)) {
                    php_printf("1");
                }
                break;
            case IS_LONG:
                php_printf("%ld", Z_LVAL_P(value));
                break;
            case IS_DOUBLE:
                php_printf("%f", Z_DVAL_P(value));
                break;
            case IS_STRING:
                PHPWRITE(Z_STRVAL_P(value), Z_STRLEN_P(value));
                break;
            case IS_RESOURCE:
                php_printf("Resource #%ld", Z_RESVAL_P(value));
                break;
            case IS_ARRAY:
                php_printf("Array");
                break;
            case IS_OBJECT:
                php_printf("Object");
                break;
            default:
                /* Should never happen in practice,
                 * but it's dangerous to make assumptions */
                php_printf("Unknown");
                break;
        }
    }

Yeah, right, simple. Compared with the ease of <?php echo $value; ?> it's not hard to imagine this code becoming unmanageable. Fortunately, the very same routine used by the engine when a script performs the action of echoing a variable is also available to an extension or embed environment. Using one of the convert_to_*() functions exported by Zend, this sample could be reduced to simply:

    void display_zval(zval *value)
    {
        convert_to_string(value);
        PHPWRITE(Z_STRVAL_P(value), Z_STRLEN_P(value));
    }

As you can probably guess, there is a collection of functions for converting to most of the data types. One notable exception is convert_to_resource(), which wouldn't make sense because resources are, by definition, incapable of mapping to a real userspace expressible value.

It's good if you're worried about the fact that the convert_to_string() call irrevocably changed the value of the zval passed into the function.
In a real code segment this would typically be a bad idea, and of course it's not what the engine does when echoing a variable. In the next chapter you'll take a look at ways of using the convert functions to safely change a value's contents to something usable without destroying its existing contents.

Summary

In this chapter you looked at the internal representation of PHP variables. You learned to distinguish types, set and retrieve values, and add variables into symbol tables and fetch them back out. In the next chapter you'll build on this knowledge by learning how to make copies of a zval, how to destroy them when they're no longer needed, and most importantly, how to avoid making copies when you don't need to.

You'll also take a look at Zend's per-request memory management layer, and examine persistent versus non-persistent allocations. By the end of the next chapter you'll have the solid foundation necessary to begin creating a working extension and experimenting with your own code variations.

Chapter 3. Memory Management

One of the most jarring differences between a managed language like PHP and an unmanaged language like C is control over memory pointers.

Memory

In PHP, populating a string variable is as simple as <?php $str = 'hello world'; ?> and the string can be freely modified, copied, and moved around. In C, on the other hand, although you could start with a simple static string such as char *str = "hello world";, that string cannot be modified because it lives in program space. To create a manipulable string, you'd have to allocate a block of memory and copy the contents in using a function such as strdup().

    {
        char *str;

        str = strdup("hello world");
        if (!str) {
            fprintf(stderr, "Unable to allocate memory!");
        }
    }

For reasons you'll explore through the course of this chapter, the traditional memory management functions (malloc(), free(), strdup(), realloc(), calloc(), and so on) are almost never used directly by the PHP source code.
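For readers who want to see the "manipulable string" idea end to end, here is a minimal sketch in plain C using only the traditional allocator. The helper name dup_upper() is hypothetical, and it uses malloc() plus a manual copy rather than strdup() so that it depends only on the standard C library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated, uppercased copy of src, or NULL if the
 * allocation fails. The caller owns the result and must free() it. */
char *dup_upper(const char *src)
{
    size_t len, i;
    char  *copy;

    assert(src != NULL);

    len  = strlen(src);
    copy = malloc(len + 1);          /* +1 for the terminating NUL */
    if (copy == NULL) {
        return NULL;
    }
    for (i = 0; i < len; i++) {
        /* Writing is safe here: the copy lives on the heap,
         * not in read-only program space. */
        copy[i] = (char)toupper((unsigned char)src[i]);
    }
    copy[len] = '\0';
    return copy;
}
```

A caller would pass a literal such as "hello world", use the returned buffer, and then release it with free(). Forgetting that last step is exactly the kind of leak the following sections discuss.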
Free the Mallocs

Memory management on nearly all platforms is handled in a request and release fashion. An application says to the layer above it (usually the operating system) "I want some number of bytes of memory to use as I please." If there is space available, the operating system offers it to the program and makes a note not to give that chunk of memory out to anyone else.

When the application is done using the memory, it's expected to give it back to the OS so that it can be allocated elsewhere. If the program doesn't give the memory back, the OS has no way of knowing that it's no longer being used and can be allocated again by another process. If a block of memory is not freed, and the owning application has lost track of it, then it's said to have "leaked" because it's simply no longer available to anyone.

In a typical client application, small infrequent leaks are sometimes tolerated with the knowledge that the process will end after a short period of time and the leaked memory will be implicitly returned to the OS. This is no great feat as the OS knows which program it gave that memory to, and it can be certain that the memory is no longer needed when the program terminates.

With long running server daemons, including web servers like Apache and by extension mod_php, the process is designed to run for much longer periods, often indefinitely. Because the OS can't clean up memory usage while the process is still running, any degree of leakage, no matter how small, will tend to build up over time and eventually exhaust all system resources.

Consider the userspace stristr() function; in order to find a string using a case-insensitive search, it actually creates a lowercase copy of both the haystack and the needle, and then performs a more traditional case-sensitive search to find the relative offset. After the offset of the string has been located, however, it no longer has use for the lowercase versions of the haystack and needle strings.
If it didn't free these copies, then every script that used stristr() would leak some memory every time it was called. Eventually the web server process would own all the system memory, but not be able to use it.

The ideal solution, I can hear you shouting, is to write good, clean, consistent code, and that's absolutely true. In an environment like the PHP interpreter, however, that's only half the solution.

Error Handling

To let userspace scripts, and the extension functions they rely on, bail out of an active request, a means needs to exist to jump out of that request entirely. The way this is handled within the Zend Engine is to set a bailout address at the beginning of a request, and then on any die() or exit() call, or on encountering any critical error (E_ERROR), perform a longjmp() to that bailout address.

Although this bailout process simplifies program flow, it almost invariably means that resource cleanup code (such as free() calls) will be skipped and memory could get leaked. Consider this simplified version of the engine code that handles function calls:

    void call_function(const char *fname, int fname_len TSRMLS_DC)
    {
        zend_function *fe;
        char *lcase_fname;

        /* PHP function names are case-insensitive.
         * To simplify locating them in the function tables,
         * all function names are implicitly
         * translated to lowercase */
        lcase_fname = estrndup(fname, fname_len);
        zend_str_tolower(lcase_fname, fname_len);

        if (zend_hash_find(EG(function_table), lcase_fname, fname_len + 1,
                           (void **)&fe) == SUCCESS) {
            zend_execute(fe->op_array TSRMLS_CC);
        } else {
            php_error_docref(NULL TSRMLS_CC, E_ERROR,
                             "Call to undefined function: %s()", fname);
        }
        efree(lcase_fname);
    }

When the php_error_docref() line is encountered, the internal error handler sees that the error level is critical and invokes longjmp() to interrupt the current program flow and leave call_function() without ever reaching the efree(lcase_fname) line.
Again, you're probably thinking that the efree() line could just be moved above the php_error_docref() line, but what about the code that called this call_function() routine in the first place? Most likely fname itself was an allocated string and you can't free that before it has been used in the error message.

Note

The php_error_docref() function is an internals equivalent to trigger_error(). The first parameter is an optional documentation reference that will be appended to docref.root if such is enabled in php.ini. The third parameter can be any of the familiar E_* family of constants indicating severity. The fourth and later parameters follow printf() style formatting and variable argument lists.

Zend Memory Manager

The solution to memory leaks during request bailout is the Zend Memory Management (ZendMM) layer. This portion of the engine acts in much the same way the operating system would normally act, allocating memory to calling applications. The difference is that it is low enough in the process space to be request-aware so that when one request dies, it can perform the same action the OS would perform when a process dies. That is, it implicitly frees all the memory owned by that request. Figure 3.1 shows ZendMM in relation to the OS and the PHP process.

Figure 3.1. Zend Memory Manager replaces system calls for per-request allocations.

In addition to providing implicit memory cleanup, ZendMM also controls the per-request memory usage according to the php.ini setting: memory_limit. If a script attempts to ask for more memory than is available to the system as a whole, or more than is remaining in its per-request limit, ZendMM will automatically issue an E_ERROR message and begin the bailout process. An added benefit of this is that the return value of most memory allocation calls doesn't need to be checked because failure results in an immediate longjmp() to the shutdown part of the engine.
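The interplay between a bailout and request-bound allocations can be sketched in standalone C. This is a toy model under loud assumptions, not ZendMM: toy_emalloc(), toy_efree(), toy_bailout(), toy_request_shutdown(), and toy_run_request() are hypothetical names, and the bookkeeping is a small fixed-size table where the real memory manager is far more sophisticated.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BLOCKS 64

/* Hypothetical per-request bookkeeping; the real ZendMM differs. */
static void   *request_blocks[MAX_BLOCKS];
static jmp_buf bailout_address;

/* Stand-in for a fatal error: jump straight back to the request loop. */
static void toy_bailout(void)
{
    longjmp(bailout_address, 1);
}

/* Allocate a block and remember that it belongs to the current request. */
void *toy_emalloc(size_t size)
{
    int i;

    for (i = 0; i < MAX_BLOCKS; i++) {
        if (request_blocks[i] == NULL) {
            void *ptr = malloc(size);
            if (ptr == NULL) {
                toy_bailout();       /* callers never see a NULL return */
            }
            request_blocks[i] = ptr;
            return ptr;
        }
    }
    toy_bailout();                   /* tracking table is full */
    return NULL;                     /* not reached */
}

/* Explicit release: forget the block, then hand it back. */
void toy_efree(void *ptr)
{
    int i;

    if (ptr == NULL) {
        return;
    }
    for (i = 0; i < MAX_BLOCKS; i++) {
        if (request_blocks[i] == ptr) {
            request_blocks[i] = NULL;
            free(ptr);
            return;
        }
    }
}

/* Request shutdown: free whatever is still tracked and report how many. */
int toy_request_shutdown(void)
{
    int i, reclaimed = 0;

    for (i = 0; i < MAX_BLOCKS; i++) {
        if (request_blocks[i] != NULL) {
            free(request_blocks[i]);
            request_blocks[i] = NULL;
            reclaimed++;
        }
    }
    return reclaimed;
}

/* A handler that copies its argument, then may hit a "fatal error". */
static void toy_handler(const char *name, int fail)
{
    char *copy = toy_emalloc(strlen(name) + 1);

    strcpy(copy, name);
    if (fail) {
        toy_bailout();               /* skips the toy_efree() below */
    }
    toy_efree(copy);
}

/* Runs one request; returns the number of blocks reclaimed implicitly. */
int toy_run_request(const char *name, int fail)
{
    assert(name != NULL);
    if (setjmp(bailout_address) == 0) {
        toy_handler(name, fail);
    }
    return toy_request_shutdown();
}
```

A request that completes normally leaves nothing for shutdown to reclaim, while a request that bails out leaves exactly one block, which the shutdown step frees on the handler's behalf. That is the behavior the text attributes to ZendMM, reduced to its simplest form.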
Hooking itself in between PHP internal code and the OS's actual memory management layer is accomplished by nothing more complex than requiring that all memory allocated internally is requested using an alternative set of functions. For example, rather than allocate a 16-byte block of memory using malloc(16), PHP code will use emalloc(16). In addition to performing the actual memory allocation task, ZendMM will flag that block with information concerning what request it's bound to so that when a request bails out, ZendMM can implicitly free it.

Often, memory needs to be allocated for longer than the duration of a single request. These types of allocations, called persistent allocations because they persist beyond the end of a request, could be performed using the traditional memory allocators because these do not add the additional per-request information used by ZendMM. Sometimes, however, it's not known until runtime whether a particular allocation will need to be persistent or not, so ZendMM exports a set of helper macros that act just like the other memory allocation functions, but have an additional parameter at the end to indicate persistence.

If you genuinely want a persistent allocation, this parameter should be set to one, in which case the request will be passed through to the traditional malloc() family of allocators. If runtime logic has determined that this block does not need to be persistent, however, this parameter may be set to zero, and the call will be channeled to the per-request memory allocator functions.

For example, pemalloc(buffer_len, 1) maps to malloc(buffer_len), whereas pemalloc(buffer_len, 0) maps to emalloc(buffer_len) using the following #define in Zend/zend_alloc.h:

    #define pemalloc(size, persistent) \
        ((persistent)?malloc(size): emalloc(size))

Each of the allocator functions found in ZendMM can be found below along with their more traditional counterparts.
Table 3.1 shows each of the allocator functions supported by ZendMM and their e/pe counterparts:

Table 3.1. Traditional versus PHP-specific allocators

    Allocator function                         e/pe counterpart
    -----------------------------------------  ------------------------------------------------------------
    void *malloc(size_t count);                void *emalloc(size_t count);
                                               void *pemalloc(size_t count, char persistent);
    void *calloc(size_t nmemb, size_t size);   void *ecalloc(size_t nmemb, size_t size);
                                               void *pecalloc(size_t nmemb, size_t size, char persistent);
    void *realloc(void *ptr, size_t count);    void *erealloc(void *ptr, size_t count);
                                               void *perealloc(void *ptr, size_t count, char persistent);
    char *strdup(const char *str);             char *estrdup(const char *str);
                                               char *pestrdup(const char *str, char persistent);
    void free(void *ptr);                      void efree(void *ptr);
                                               void pefree(void *ptr, char persistent);

You'll notice that even pefree() requires the persistency flag. This is because at the time that pefree() is called, it doesn't actually know if ptr was a persistent allocation or not. Calling free() on a non-persistent allocation could lead to a messy double free, whereas calling efree() on a persistent one will most likely lead to a segmentation fault as the memory manager attempts to look for management information that doesn't exist. Your code is expected to remember whether the data structure it allocated was persistent or not.

In addition to the core set of allocator functions, a few additional and quite handy ZendMM specific functions exist:

    void *estrndup(void *ptr, int len);

Allocate len+1 bytes of memory and copy len bytes from ptr to the newly allocated block. The behavior of estrndup() is roughly the following:

    void *estrndup(void *ptr, int len)
    {
        char *dst = emalloc(len + 1);

        memcpy(dst, ptr, len);
        dst[len] = 0;   /* always terminate the copy */
        return dst;
    }

The terminating NULL byte implicitly placed at the end of the buffer here ensures that any function that uses estrndup() for string duplication doesn't need to worry about passing the resulting buffer to a function that expects NULL-terminated strings, such as printf().
When using estrndup() to copy non-string data, this last byte is essentially wasted, but more often than not, the convenience outweighs the minor inefficiency.

    void *safe_emalloc(size_t size, size_t count, size_t addtl);
    void *safe_pemalloc(size_t size, size_t count, size_t addtl, char persistent);

The amount of memory allocated by these functions is the result of ((size * count) + addtl). You may be asking, "Why an extra function at all? Why not just use emalloc/pemalloc and do the math myself?" The reason comes in the name: safe. Although the circumstances leading up to it would be exceedingly unlikely, it's possible that the end result of such an equation might overflow the integer limits of the host platform. This could result in an allocation for a negative number of bytes, or worse, a positive number that is significantly smaller than what the calling program believed it requested. safe_emalloc() avoids this type of trap by checking for integer overflow and explicitly failing if such an overflow occurs.

Note

Not all memory allocation routines have a p* counterpart. For example, there is no pestrndup(), and safe_pemalloc() does not exist prior to PHP 5.1. Occasionally you'll need to work around these gaps in the Zend API.

The persistence-aware counterpart to free() and efree() is:

    void pefree(void *ptr, char persistent);

Reference Counting

Careful memory allocation and freeing is vital to the long-term performance of a multi-request process like PHP, but it's only half the picture. In order for a server that handles thousands of hits per second to function efficiently, each request needs to use as little memory as possible and perform the bare minimum amount of unnecessary data copying.

Consider the following PHP code snippet:

    <?php
        $a = 'Hello World';
        $b = $a;
        unset($a);
    ?>

After the first call, a single variable has been created, and a 12-byte block of memory has been assigned to it holding the string 'Hello World' along with a trailing NULL. Now look at the next two lines: $b is set to the same value as $a, and then $a is unset (freed).
If PHP treated every variable assignment as a reason to copy variable contents, an extra 12 bytes would need to be copied for the duplicated string and additional processor load would be consumed during the data copy. This action starts to look ridiculous when the third line has come along and the original variable is unset, making the duplication of data completely unnecessary. Now take that one further and imagine what could happen when the contents of a 10MB file are loaded into two variables. That could take up 20MB where 10 would have been sufficient. Would the engine waste so much time and memory on such a useless endeavor?

You know PHP is smarter than that.

Remember that variable names and their values are actually two different concepts within the engine. The value itself is a nameless zval* holding, in this case, a string value. It was assigned to the variable $a using zend_hash_add(). What if two variable names could point to the same value?

    {
        zval *helloval;

        MAKE_STD_ZVAL(helloval);
        ZVAL_STRING(helloval, "Hello World", 1);
        zend_hash_add(EG(active_symbol_table), "a", sizeof("a"),
                      &helloval, sizeof(zval*), NULL);
        zend_hash_add(EG(active_symbol_table), "b", sizeof("b"),
                      &helloval, sizeof(zval*), NULL);
    }

At this point you could actually inspect either $a or $b and see that they both contain the string "Hello World". Unfortunately, you then come to the third line: unset($a);. In this situation, unset() doesn't know that the data pointed to by the $a variable is also in use by another one, so it just frees the memory blindly. Any subsequent accesses to $b will be looking at already freed memory space and cause the engine to crash. Hint: You don't want to crash the engine.

This is solved by the third of a zval's four members: refcount. When a variable is first created and set, its refcount is initialized to 1 because it's assumed to only be in use by the variable it is being created for.
When your code snippet gets around to assigning helloval to $b, it needs to increase that refcount to 2 because the value is now "referenced" by two variables:

    {
        zval *helloval;

        MAKE_STD_ZVAL(helloval);
        ZVAL_STRING(helloval, "Hello World", 1);
        zend_hash_add(EG(active_symbol_table), "a", sizeof("a"),
                      &helloval, sizeof(zval*), NULL);
        ZVAL_ADDREF(helloval);
        zend_hash_add(EG(active_symbol_table), "b", sizeof("b"),
                      &helloval, sizeof(zval*), NULL);
    }

Now when unset() deletes the $a copy of the variable, it can see from the refcount parameter that someone else is interested in that data and it should actually just decrement the refcount and otherwise leave it alone.

Copy on Write

Saving memory through refcounting is a great idea, but what happens when you only want to change one of those variables? Consider this code snippet:

    <?php
        $a = 1;
        $b = $a;
        $b += 5;
    ?>

Looking at the logic flow you would of course expect $a to still equal 1, and $b to now be 6. At this point you also know that Zend is doing its best to save memory by having $a and $b refer to the same zval after the second line, so what happens when the third line is reached and $b must be changed?

The answer is that Zend looks at refcount, sees that it's greater than one, and separates it.
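The bookkeeping behind sharing and separating can be modelled outside the engine with a miniature value type. The sketch below is an illustration only, under stated assumptions: toy_val, toy_new(), toy_share(), toy_release(), and toy_separate() are hypothetical names, the value is a bare long, and the "variables" are plain C pointers standing in for symbol table entries.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature value: a long plus a reference count. */
typedef struct toy_val {
    long value;
    int  refcount;
} toy_val;

/* Create a value owned by exactly one variable. */
toy_val *toy_new(long value)
{
    toy_val *v = malloc(sizeof(*v));

    if (v == NULL) {
        return NULL;
    }
    v->value    = value;
    v->refcount = 1;
    return v;
}

/* Share the value with one more variable: no data is copied. */
toy_val *toy_share(toy_val *v)
{
    assert(v != NULL);
    v->refcount++;
    return v;
}

/* Drop one variable's claim; free the value when nobody is left. */
void toy_release(toy_val *v)
{
    assert(v != NULL && v->refcount > 0);
    if (--v->refcount == 0) {
        free(v);
    }
}

/* Copy on write: if the value behind *slot is shared, give *slot a
 * private copy first. Returns 0 on success, -1 if allocation fails. */
int toy_separate(toy_val **slot)
{
    toy_val *copy;

    if ((*slot)->refcount < 2) {
        return 0;                 /* sole owner: nothing to separate */
    }
    copy = toy_new((*slot)->value);
    if (copy == NULL) {
        return -1;
    }
    (*slot)->refcount--;          /* the old value loses one owner */
    *slot = copy;
    return 0;
}
```

Following the userspace snippet, a caller would create a value of 1 for one variable, share it with a second, call toy_separate() on the second variable's slot, and only then add 5. The first variable keeps its value of 1 and each value ends up with a refcount of 1.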
Separation in the Zend engine is the process of destroying a reference pair and is the opposite of the process you just saw:

    zval *get_var_and_separate(char *varname, int varname_len TSRMLS_DC)
    {
        zval **varval, *varcopy;

        if (zend_hash_find(EG(active_symbol_table), varname, varname_len + 1,
                           (void**)&varval) == FAILURE) {
            /* Variable doesn't actually exist; fail out */
            return NULL;
        }
        if ((*varval)->refcount < 2) {
            /* varname is the only actual reference,
             * no separating to do */
            return *varval;
        }
        /* Otherwise, make a copy of the zval* value */
        MAKE_STD_ZVAL(varcopy);
        *varcopy = **varval;
        /* Duplicate any allocated structures within the zval* */
        zval_copy_ctor(varcopy);

        /* Remove the old version of varname.
         * This will decrease the refcount of varval in the process */
        zend_hash_del(EG(active_symbol_table), varname, varname_len + 1);

        /* Initialize the reference count of the
         * newly created value and attach it to
         * the varname variable */
        varcopy->refcount = 1;
        varcopy->is_ref = 0;
        zend_hash_add(EG(active_symbol_table), varname, varname_len + 1,
                      &varcopy, sizeof(zval*), NULL);
        /* Return the new zval* */
        return varcopy;
    }

Now that the engine has a zval* that it knows is only owned by the $b variable, it can convert it to a long and increment it by 5 according to the script's request.

Change on Write

The concept of reference counting also creates a new possibility for data manipulation in the form of what userspace scripters actually think of in terms of "referencing". Consider the following snippet of userspace code:

    <?php
        $a = 1;
        $b = &$a;
        $b += 5;
    ?>

Being experienced in the ways of PHP code, you'll instinctively recognize that the value of $a will now be 6 even though it was initialized to 1 and never (directly) changed. This happens because when the engine goes to increment the value of $b by 5, it notices that $b is a reference to $a and says, "It's okay for me to change the value without separating it, because I want all reference variables to see the change."

But how does the engine know?
Simple, it looks at the fourth and final element of the zval struct: is_ref. This is just a simple on/off bit value that defines whether the value is, in fact, part of a userspace-style reference set. In the previous code snippet, when the first line is executed, the value created for $a gets a refcount of 1, and an is_ref value of 0 because it's only owned by one variable ($a), and no other variables have a change-on-write reference to it. At the second line, the refcount element of this value is incremented to 2 as before, except that this time, because the script included an ampersand to indicate full reference, the is_ref element is set to 1.

Finally, at the third line, the engine once again fetches the value associated with $b and checks if separation is necessary. This time the value is not separated because of a check not included earlier. Here's the refcount check portion of get_var_and_separate() again, with an extra condition:

    if ((*varval)->is_ref || (*varval)->refcount < 2) {
        /* varname is the only actual reference,
         * or it's a full reference to other variables
         * either way: no separating to be done */
        return *varval;
    }

This time, even though the refcount is 2, the separation process is short-circuited by the fact that this value is a full reference. The engine can freely modify it with no concern about the values of other variables appearing to change magically on their own.

Separation Anxiety

With all this copying and referencing, there are a couple of combinations of events that can't be handled by clever manipulation of is_ref and refcount. Consider this block of PHP code:

    <?php
        $a = 1;
        $b = $a;
        $c = &$a;
    ?>

Here you have a single value that needs to be associated with three different variables, two in a change-on-write full reference pair, and the third in a separable copy-on-write context. Using just is_ref and refcount to describe this relationship, what values will work?

The answer is: none.
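One way to convince yourself of that answer is to restate the separation check as a standalone predicate and try both settings of is_ref on a value shared by three names. The function name toy_must_separate() is hypothetical; it only restates the is_ref/refcount condition shown earlier and is not an engine function.

```c
#include <assert.h>

/* Restates the check shown above: a write must separate only when the
 * value is shared (refcount > 1) and is not a full reference. */
int toy_must_separate(int refcount, int is_ref)
{
    assert(refcount >= 1);
    return !is_ref && refcount > 1;
}
```

With three names on one value the refcount is 3. If is_ref is 0, the predicate says every writer separates, including the two names that were supposed to stay linked as a reference pair. If is_ref is 1, no writer separates, so a write through the copy-on-write name would leak into the reference pair. Neither setting describes both relationships at once, so a single value cannot serve all three names.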
In this case, the value must be duplicated into two discrete zval*s, even though both will contain the exact same data (see Figure 3.2).

Figure 3.2. Forced separation on reference.

Similarly, the following code block will cause the same conflict and force the value to separate into a copy (see Figure 3.3).

    <?php
        $a = 1;
        $b = &$a;
        $c = $a;
    ?>

Figure 3.3. Forced separation on copy.

Notice that in both cases, $b is associated with the original zval object because at the time separation occurs, the engine doesn't know the name of the third variable involved in the operation.

Summary

PHP is a managed language. On the userspace side of things, this careful control of resources and memory means easier prototyping and fewer crashes. After you delve under the hood though, all bets are off and it's up to the responsible developer to maintain the integrity of the runtime environment.

Chapter 4. Setting Up a Build Environment

By now you probably already have a version of PHP installed on at least one system and you've been using it to develop web-based applications. You might have downloaded the Win32 build from php.net to run on IIS or Apache for Windows, or used your *nix distribution's (Linux, BSD, or another POSIX-compliant distribution) packaging system to install binaries created by a third party.

Building PHP

Unless you downloaded the source code as a tarball from php.net and compiled it yourself, however, you're most likely missing at least one component.

*nix Tools

The first piece of equipment in any C developer's toolkit is an actual C compiler. There's a good chance your distribution included one by default, and a very good chance that it included gcc (GNU Compiler Collection). You can easily check whether or not a compiler is installed by issuing gcc --version or cc --version, one of which will hopefully run successfully and respond with version information for the compiler installed.
If you don't have a compiler yet, check with your distribution's website for instructions on downloading and installing gcc. Typically this will amount to downloading an .rpm or .deb file and issuing a command to install it. Depending on your specific distribution, one of the following commands may simply work out of the box without requiring further research: urpmi gcc, apt-get install gcc, pkg_add -r gcc, or perhaps emerge gcc.

In addition to a compiler you'll also need the following programs and utilities: make, autoconf, automake, and libtool. These utilities can be installed using the same per-distribution methods you used for gcc, or they can be compiled from their source using tarballs available from gnu.org.

For best results, libtool version 1.4.3 and autoconf 2.13 with automake version 1.4 or 1.5 are recommended. Using newer versions of these packages will quite probably work as well, but only these versions are certified.

If you plan on using CVS to check out the latest and most up-to-date version of PHP to develop with, you'll also need bison and flex for constructing the language parser. Like the others, these two packages may either be installed using your distribution's packaging system, or downloaded from gnu.org and compiled from source.

If you choose to go the CVS route, you'll also need the cvs client itself. Again, this may be installed by your distribution, or downloaded and compiled. Unlike the other packages, however, this one is found at cvshome.org.

Win32 Tools

The Win32/PHP5 build system is a complete rewrite and represents a significant leap forward from the PHP4 build system. Instructions for compiling PHP4 under Windows are available on php.net; only the PHP5 build system, which requires Windows 2000, Windows 2003, or Windows XP, will be discussed here.

First, you'll need to grab libraries and development headers used by many of the core PHP extensions.
Fortunately, many of these files are redistributed from php.net as a single .zip file located at http://www.php.net/extra/win32build.zip.

Create a new directory named C:\PHPDEV\ and unzip win32build.zip using your favorite zip management program into this location. The folder structure contained in the zip file will create a subdirectory, C:\PHPDEV\win32build, which will contain further subfolders and files. It's not necessary to name your root folder PHPDEV; the only important thing is that win32build and the PHP source tree are both children of the same parent folder.

Next you'll need a compiler. If you've already got Visual C++ .NET you have what you need; otherwise, download Visual C++ Express from Microsoft at http://lab.msdn.microsoft.com/express/.

The installer, once you've downloaded and run it, will display the usual welcome, EULA (End-User License Agreement), and identification dialogs. Read through these screens and proceed using the Next buttons after you've agreed to the terms of the EULA and entered any appropriate information.

Installation location is of course up to you, and a typical installation will work just fine. If you'd like to create a leaner installation, you may deselect the three optional components: GUI, MSDN, and SQL Server.

The final package is the Platform SDK, also available for download from Microsoft at http://www.microsoft.com/downloads/details.aspx?FamilyId=A55B6B43-E24F-4EA3-A93E-40C0EC4F68E5. The site currently lists three download options: PSDK-x86.exe, PSDK-ia64.exe, and PSDK-amd64.exe. These options refer to x86-compatible 32-bit, Intel 64-bit, and AMD 64-bit processors respectively. If you're not sure which one applies to your processor, select PSDK-x86.exe, which should work cleanly, albeit less efficiently, with both 64-bit variants.

As before, proceed through the first few screens as you would with any other installer package until you are prompted to select between Typical and Custom installation.
A Typical installation includes the Core SDK package, which is sufficient for the purposes of building PHP. Other packages can be deselected by choosing a Custom installation, but if you have the hard disk space to spare, you might as well install it all. The other packages may come in handy later on.

So unless you're byte conscious, select Typical and proceed through the next couple of standard issue screens until the installer begins copying and registering files. This process should take a few minutes, so grab some popcorn.

Once installation is complete you'll have a new item on your Start menu: Microsoft Platform SDK for Windows Server 2003 SP1.

Obtaining the PHP Source Code

When downloading PHP, you have a few options. First, if your distribution supports the concept, you can download it from them using a command such as apt-get source php5. The advantage to this approach is that your distribution might have some known quirks that require modifications to the PHP source code. By downloading from them, you can be certain that these quirks have been patched for and your builds will have fewer issues. The disadvantage is that most distributions lag weeks, if not months, behind the official PHP releases, making the version you download outdated before it ever reaches your hard drive.

The next option, which is generally preferred, is to download php-x.y.z.tar.gz (where x.y.z is the currently released version) from www.php.net. This release of PHP will have been tested by countless other PHP users around the globe and will be quite up-to-date without pushing the absolute bleeding edge.

You could also go a small step further and download a snapshot tarball from snaps.php.net. On this site, the latest revisions of all the source code files in the PHP repository are packaged up every few hours.
An accidental commit by a core developer might make one of these bundles unusable occasionally, but if you need the latest PHP 6.0 features before they have been officially released, this is the easiest place to go looking.

Lastly, you can use CVS to fetch the individual files that make up the PHP source tree directly from the development repository used by the PHP core development team. For the purposes of extension and embedding development, this offers no significant advantage over using an official release tarball or a snapshot. However, if you plan to publish your extension or other application in a CVS repository, it will be helpful to be familiar with the checkout process.

Performing a CVS Checkout

The entire PHP project, from the Zend Engine and the core to the smallest PEAR component, is hosted at cvs.php.net. From here, hundreds of developers develop and maintain the bits and pieces that make up the whole of PHP and its related projects. Among the other parts housed here, the core PHP package is available in the php-src module and can be downloaded to a workstation with two simple commands. First you'll want to introduce yourself to the php.net CVS server by logging in.

    $ cvs -d:pserver:cvsread@cvs.php.net:/repository login

The cvsread account is a public use (read-only) account with a password of phpfi, an homage to a much earlier version of what we know today as PHP. Once logged in, the PHP sources may be checked out using

    $ cvs -d:pserver:cvsread@cvs.php.net:/repository co php-src

Variations of this command can be used to check out specific versions of PHP going back as far as PHP2. For more information, refer to the anonymous CVS instructions at http://www.php.net/anoncvs.

Configuring PHP for Development

As covered in Chapter 1, there are two special ./configure switches you'll want to use when building a development-friendly PHP, whether you plan to write an extension to PHP or embed PHP in another application.
These two switches should be used in addition to the other switches you'd normally use while building PHP.

--enable-debug

The --enable-debug switch turns on a few critical functions within the PHP and Zend source trees. First, it enables reporting of leaked memory at the end of every request. Recall from Chapter 3, "Memory Management," that the Zend Memory Manager will implicitly free per-request memory that was allocated but not explicitly freed prior to script end. By running a series of aggressive regression tests against newly developed code, leak points can be easily spotted and plugged prior to any public release. Take a look at the following code snippet:

    void show_value(int n)
    {
        char *message = emalloc(1024);

        sprintf(message, "The value of n is %d\n", n);
        php_printf("%s", message);
    }

If this (admittedly silly) block of code were executed during the course of a PHP request, it would leak 1,024 bytes of memory. Under ordinary circumstances ZendMM would quietly free that block at the end of script execution and not complain. With --enable-debug turned on, however, developers are treated to an error message giving them a clue about what needs to be addressed.

    /cvs/php5/ext/sample/sample.c(33) : Freeing 0x084504B8 (1024 bytes), script=-
    === Total 1 memory leaks detected ===

This short but informative message tells you that ZendMM had to clean up after your mess and identifies exactly where the lost memory block was allocated. Using this information, it's a simple matter to open the file, scroll down to the line in question, and add an appropriate call to efree(message) at the end of the function.

Memory leaks aren't the only problems you'll run into that are hard to track down, of course. Sometimes the problems are much more insidious, and far less telling. Let's say you've been working all night on a big patch that requires hitting a dozen files and changing a ton of code.
When everything is in place, you confidently issue make, try out a sample script, and are treated to the following output:

    $ sapi/cli/php -r 'myext_samplefunc();'
    Segmentation Fault

Well...that's just swell, but where could the problem be? Looking at your implementation of myext_samplefunc() doesn't reveal any obvious clues, and running it through gdb only shows a bunch of unknown symbols. Once again, --enable-debug lends a hand. By adding this switch to ./configure, the resulting PHP binary will contain all the debugging symbols needed by gdb or another core file examination program to show you where the problem occurred. Rebuilding with this option, and triggering the crash through gdb, you're now treated to something like the following:

    #0 0x1234567 php_myext_find_delimiter(str=0x1234567 "foo@#(FHVN)@\x98\xE0...", strlen=3, tsrm_ls=0x1234567)
    p = strchr(str, ',');

Suddenly the cause is clear. The str string is not a NULL-terminated string, as evidenced by the garbage at the end, but a non-binary-safe function was used on it. The underlying strchr() implementation tried scanning past the end of str's allocated memory and got into regions it didn't own, causing a segfault. A quick replacement using memchr() and the strlen parameter will prevent the crash.

--enable-maintainer-zts

This second ./configure option forces PHP to be built with the Thread Safe Resource Manager (TSRM)/Zend Thread Safety (ZTS) layer enabled. This switch will add complexity and processing time when it's not otherwise needed, but for the purposes of development, you'll find that's a good thing. For a detailed description of what ZTS is and why you want to develop with it turned on, refer to Chapter 1.

--enable-embed

One last ./configure switch of importance is only necessary if you'll be embedding PHP into another application. This option identifies that libphp5.so should be built as the selected SAPI in the same way that --with-apxs will build mod_php5.so for embedding PHP specifically into Apache.
Compiling on UNIX

Now that you've got all the necessary tools together, you've downloaded the PHP source tarball, and you've identified all the necessary ./configure switches, it's time to actually compile PHP. Assuming that you've downloaded php-5.1.0.tar.gz to your home directory, you'll enter the following series of commands to unpack the tarball and switch to the PHP source directory:

    [/home/sarag]$ tar -zxf php-5.1.0.tar.gz
    [/home/sarag]$ cd php-5.1.0

If you're using a tool other than GNU tar, you might need to use a slightly different command to unpack the tarball (the -c flag makes gzip write the decompressed data to standard output so that tar can read it):

    [/home/sarag]$ gzip -dc php-5.1.0.tar.gz | tar -xf -

Now, issue the ./configure command with the required switches and any other options you want enabled or disabled:

    [/home/sarag/php-5.1.0]$ ./configure --enable-debug \
        --enable-maintainer-zts --disable-cgi --enable-cli \
        --disable-pear --disable-xml --disable-sqlite \
        --without-mysql --enable-embed

After a lengthy process, during which dozens of lines of informational text will scroll up your screen, you'll be ready to start the compilation process:

    [/home/sarag/php-5.1.0]$ make all install

At this point, get up and grab a cup of coffee. Compile times can range anywhere from a couple of minutes on a high-end powerhouse system to half an hour on an old overloaded 486. When the build process has finished, you'll have a functional build of PHP with all the right configuration ready for use in development.

Compiling on Win32

As with the UNIX build, the first step to preparing a Windows build is to unpack the source tarball. By default, Windows doesn't know what to do with a .tar.gz file. In fact, if you downloaded PHP using Internet Explorer, you probably noticed that it changed the name of the tarball file to php-5.1.0.tar.tar. This isn't IE craving a plate of fish sticks or, depending on who you ask, a bug; it's a "feature." Start by renaming the file back to php-5.1.0.tar.gz (if necessary).
If you have a program installed that is capable of reading .tar.gz files, you'll notice the icon immediately change. You can now double-click on the file to open up the decompression program. If the icon doesn't change, or if nothing happens when you double-click the icon, it means that you have no tar/gzip compatible decompression program installed. Check your favorite search engine for WinZIP, WinRAR, or any other application that is suitable for extracting .tar.gz archives.

Whatever decompression program you use, have it decompress php-5.1.0.tar.gz to the root development folder you created earlier. This section will assume you have extracted it to C:\PHPDEV\ which, because the archive contains a folder structure, will result in the source tree residing in C:\PHPDEV\php-5.1.0.

After it's unpacked, open up a build environment window by choosing Start, All Programs, Microsoft Platform SDK for Windows Server 2003 SP1, Open Build Environment Window, Windows 2000 Build Environment, Set Windows 2000 Build Environment (Debug). The specific path to this shortcut might be slightly different depending on the version of the Platform SDK you have installed and the target platform you will be building for (2000, XP, 2003).

A simple command prompt window will open up stating the target build platform. This command prompt has most, but not all, of the necessary environment variables set up. You'll need to run one extra batch file in order to let the PHP build system know where Visual C++ Express is. If you accepted the default installation location, this batch file will be located at C:\Program Files\Microsoft Visual Studio 8\VC\bin\vcvars32.bat. If you can't find vcvars32.bat, check the same directory, or its parent, for vcvarsall.bat. Just be sure to run it inside the same command prompt window you just opened. It will set additional environment variables that the build process will need.
Now, change the directory to the location where you unpacked PHP (C:\PHPDEV\php-5.1.0) and run buildconf.bat.

    C:\Program Files\Microsoft Platform SDK> cd \PHPDEV\php-5.1.0
    C:\PHPDEV\php-5.1.0> buildconf.bat

If all is going well so far you'll see the following two lines of output:

    Rebuilding configure.js
    Now run 'cscript /nologo configure.js --help'

At this point, you can do as the message says and see what options are available. The --enable-maintainer-zts option is not necessary here because the Win32 build automatically assumes that ZTS will be required by any SAPI. If you wanted to turn it off, you could issue --disable-zts, but that's not the case here because you're building for a development environment anyway.

In this example I've removed a few other extensions that aren't relevant to extension and embedding development for the sake of simplicity. If you'd like to rebuild PHP using additional extensions, you'll need to hunt down the libraries on which they depend. (The ^ at the end of each line is the Windows command prompt's line-continuation character.)

    C:\PHPDEV\php-5.1.0> cscript /nologo configure.js --without-xml --without-wddx ^
        --without-simplexml --without-dom --without-libxml --disable-zlib ^
        --without-sqlite --disable-odbc --disable-cgi --enable-cli ^
        --enable-debug --without-iconv

Again, a stream of informative output will scroll by, followed by instructions to execute the final command:

    C:\PHPDEV\php-5.1.0> nmake

Finally, a working build of PHP compiled for the Win32 platform.

Summary

Now that PHP is installed with all the right options, you're ready to move on to generating a real, functional extension. In the next few chapters you'll be introduced to the anatomy of a PHP extension. Even if you only plan on embedding PHP into your application without extending the language at all, you'll want to read through this section because it explains the mechanics of interfacing with the PHP environment in full detail.

Chapter 5.
Your First Extension

Every PHP extension is built from at least two files: a configuration file, which tells the compiler what files to build and what external libraries will be needed, and at least one source file, which does the actual work.

Anatomy of an Extension

In practice, there is typically a second or third configuration file and one or more header files as well. For your first extension, you'll be working with one of each of these types of files and adding from there.

Configuration File

To start out, create a directory under the ext/ dir in your PHP source tree called "sample". In reality this new directory could be placed anywhere, but in order to demonstrate Win32 and static build options later in this chapter, I'll be asking you to put it here this one time. Next, enter this directory and create a file called config.m4 with the following contents:

    PHP_ARG_ENABLE(sample,
      [Whether to enable the "sample" extension],
      [  --enable-sample         Enable "sample" extension support])

    if test $PHP_SAMPLE != "no"; then
      PHP_SUBST(SAMPLE_SHARED_LIBADD)
      PHP_NEW_EXTENSION(sample, sample.c, $ext_shared)
    fi

This minimalist configuration sets up a ./configure option called --enable-sample. The second parameter to PHP_ARG_ENABLE will be displayed during the ./configure process as it reaches this extension's configuration file. The third parameter will be displayed as an available option if the end user issues ./configure --help.

Note

Ever wonder why some extensions are configured using --enable-extname and some are configured using --with-extname? Functionally, there is no difference between the two. In practice, however, --enable is meant for features that can be turned on without requiring any third-party libraries. --with, by contrast, is meant for features that do have such prerequisites. For now, your sample extension won't require linking against other libraries, so you'll be using the --enable version.
Chapter 17, "Configuration and Linking," will introduce using --with and instructing the compiler to use additional CFLAGS and LDFLAGS settings.

If an end user calls ./configure using the --enable-sample option, then a local environment variable, $PHP_SAMPLE, will be set to yes. PHP_SUBST() is a PHP-modified version of the standard autoconf AC_SUBST() macro and is necessary to enable building the extension as a shared module.

Last but not least, PHP_NEW_EXTENSION() declares the module and enumerates all the source files that must be compiled as part of the extension. If multiple files were required, they would be listed in the second parameter using a space as a delimiter, for example:

    PHP_NEW_EXTENSION(sample, sample.c sample2.c sample3.c, $ext_shared)

The final parameter is a counterpart to the PHP_SUBST(SAMPLE_SHARED_LIBADD) command and is likewise necessary for building as a shared module.

Header

When developing in C, it almost always makes sense to segregate certain types of data into external header files that are then included by the source files. Although PHP does not require this, it keeps things simple when a module grows beyond the scope of a single source file.
You'll start with the following contents in your new header file, called php_sample.h:

    #ifndef PHP_SAMPLE_H
    /* Prevent double inclusion */
    #define PHP_SAMPLE_H

    /* Define Extension Properties */
    #define PHP_SAMPLE_EXTNAME "sample"
    #define PHP_SAMPLE_EXTVER  "1.0"

    /* Import configure options
       when building outside of
       the PHP source tree */
    #ifdef HAVE_CONFIG_H
    #include "config.h"
    #endif

    /* Include PHP Standard Header */
    #include "php.h"

    /* Define the entry point symbol
     * Zend will use when loading this module
     */
    extern zend_module_entry sample_module_entry;
    #define phpext_sample_ptr &sample_module_entry

    #endif /* PHP_SAMPLE_H */

This header file accomplishes two primary tasks. First, if the extension is being built using the phpize tool (which is how you'll be building it through most of this book), then HAVE_CONFIG_H gets defined and config.h will be included as well. Regardless of how the extension is being compiled, it also includes php.h from the PHP source tree. This header file subsequently includes several other headers spread across the PHP sources, providing access to the bulk of the PHPAPI.

Next, the zend_module_entry struct used by your extension is declared external so that it can be picked up by Zend using dlopen() and dlsym() when this module is loaded using an extension= line.

This header file also includes a few preprocessor defines that will be used in the source file shortly.

Source

Last, and by no means least, you'll create a simple source skeleton in the file sample.c:

    #include "php_sample.h"

    zend_module_entry sample_module_entry = {
    #if ZEND_MODULE_API_NO >= 20010901
        STANDARD_MODULE_HEADER,
    #endif
        PHP_SAMPLE_EXTNAME,
        NULL, /* Functions */
        NULL, /* MINIT */
        NULL, /* MSHUTDOWN */
        NULL, /* RINIT */
        NULL, /* RSHUTDOWN */
        NULL, /* MINFO */
    #if ZEND_MODULE_API_NO >= 20010901
        PHP_SAMPLE_EXTVER,
    #endif
        STANDARD_MODULE_PROPERTIES
    };

    #ifdef COMPILE_DL_SAMPLE
    ZEND_GET_MODULE(sample)
    #endif

And that's it! These three files are everything needed to create a module skeleton.
Granted, it doesn't do anything useful, but it's a place to start and you'll be adding functionality through the rest of this section. First though, let's go through what's happening.

The opening line is simple enough: Include the header file you just created, and by extension all the other PHP core header files from the source tree.

Next, create the zend_module_entry struct you declared in the header file. You'll notice that the first element of the module entry is conditional based on the current ZEND_MODULE_API_NO definition. This API number roughly equates to PHP 4.2.0, so if you know for certain that your extension will never be built on any version older than this, you could eschew the #if lines entirely and just include the STANDARD_MODULE_HEADER element directly.

Consider, however, that the condition costs you very little in terms of compile time and nothing in terms of the resulting binary or the time it takes to process, so in most cases it will be best to just leave it in. The same applies to the version property near the end of this structure.

The other six elements of this structure are set to NULL for now; the comments next to these lines hint at what they'll eventually be used for.

Finally, at the bottom you'll find a short element common to every PHP extension that can be built as a shared module. This brief conditional simply adds a reference used by Zend when your extension is loaded dynamically. Don't worry too much about what it does or how it does it; just make sure that it's around or the next section won't work.

Building Your First Extension

Now that you've got all the files in place, it's time to make it go. As with building the main PHP binary, there are different steps to be taken depending on whether you're compiling for *nix or for Windows.

Building Under *nix

The first step is to generate a ./configure script using the information in config.m4 as a template.
This can be done by running the phpize program installed when you compiled the main PHP binary.

    $ phpize
    PHP Api Version: 20041225
    Zend Module Api No: 20050617
    Zend Extension Api No: 220050617

Note

The extra 2 at the start of Zend Extension Api No isn't a typo; it corresponds to the Zend Engine 2 version and is meant to keep this API number greater than its ZE1 counterpart.

If you look in the current directory at this point, you'll notice a lot more files than you had there a moment ago. The phpize program combined the information in your extension's config.m4 file with data collected from your PHP build and laid out all the pieces necessary to make a compile happen. This means that you don't have to struggle with makefiles and locating the PHP headers you'll be compiling against. PHP has already done that job for you.

The next step is a simple ./configure that you might perform with any other OSS package. You're not configuring the entire PHP bundle here, just your one extension, so all you need to type in is the following:

    $ ./configure --enable-sample

Notice that not even --enable-debug and --enable-maintainer-zts were used here. That's because phpize has already taken those values from the main PHP build and applied them to your extension's ./configure script.

Now build it! Like any other package, you can just type make and the generated script files will handle the rest.

When the build process finishes, you'll be treated to a message stating that sample.so has been compiled and placed in a directory called "modules" within your current build directory.

Building Under Windows

The config.m4 file you created earlier was actually specific to the *nix build. In order to make your extension compile under Windows, you'll need to create a separate, but similar, configuration file for it.
Add config.w32 with the following contents to your ext/sample directory:

    ARG_ENABLE("sample", "enable sample extension", "no");

    if (PHP_SAMPLE != "no") {
        EXTENSION("sample", "sample.c");
    }

As you can see, this file bears a high-level resemblance to config.m4. The option is declared, tested, and conditionally used to enable the build of your extension.

Now you'll repeat a few of the steps you performed in Chapter 4, "Setting Up a Build Environment," when you built the PHP core. Start by opening up a build window from the Start menu by selecting All Programs, Microsoft Platform SDK for Windows Server 2003 SP1, Open Build Environment Window, Windows 2000 Build Environment, Set Windows 2000 Build Environment (Debug), and running the C:\Program Files\Microsoft Visual Studio 8\VC\bin\vcvars32.bat batch file.

Remember, your installation might require you to select a different build target or run a slightly different batch file. Refer to the notes in the corresponding section of Chapter 4 to refresh your memory.

Again, you'll want to go to the root of your build directory and rebuild the configure script.

    C:\Program Files\Microsoft Platform SDK> cd \PHPDEV\php-5.1.0
    C:\PHPDEV\php-5.1.0> buildconf.bat
    Rebuilding configure.js
    Now run 'cscript /nologo configure.js --help'

This time, you'll run the configure script with an abridged set of options. Because you'll be focusing on just your extension and not the whole of PHP, you can leave out options pertaining to other extensions; however, unlike the Unix build, you do need to include the --enable-debug switch explicitly even though the core build already has it.

The only crucial switch you'll need here (apart from debug, of course) is --enable-sample=shared. The shared option is required here because configure.js doesn't know that you're planning to build sample as a loadable extension.
Your configure line should therefore look something like this:

    C:\PHPDEV\php-5.1.0> cscript /nologo configure.js --enable-debug --enable-sample=shared

Note

Recall that --enable-maintainer-zts is not required here as all Win32 builds assume that ZTS must be enabled. Options relating to SAPIs, such as embed, are also not required here as the SAPI layer is independent from the extension layer.

Lastly, you're ready to build the extension. Because this build is based from the core (unlike the Unix extension build, which was based from the extension) you'll need to specify the target name in your build line.

    C:\PHPDEV\php-5.1.0> nmake php_sample.dll

Once compilation is complete, you should have a working php_sample.dll binary ready to be used in the next step. Remember, because this book focuses on *nix development, the extension will be referred to as sample.so rather than php_sample.dll in all following text.

Loading an Extension Built as a Shared Module

In order for PHP to locate this module when requested, it needs to be located in the directory specified by the extension_dir setting in your php.ini. By default, php.ini is located in /usr/local/lib/php.ini; however, this default can be changed and often is with distribution packaging systems. Check the output of php -i to see where PHP is looking for your config file.

The extension_dir setting, in an unmodified php.ini, is an unhelpful ./. If you don't already have extensions being loaded, or just don't have any extensions other than sample.so anyway, you can change this value to the location where make put your module. Otherwise, just copy sample.so to the directory where this setting is pointing.

After extension_dir is pointing to the right place, there are two ways to tell PHP to load your module. The first is to call the dl() function within your script, for example dl('sample.so'), and then dump the list of loaded modules. If this script doesn't show sample as a loaded module, something has gone wrong.
Look for error messages above the output for a clue, or refer to your error_log if one is defined in your php.ini.

The second, and much more common, method is to specify the module in your php.ini using the extension directive. The extension setting is relatively unique among php.ini settings in that it can be specified multiple times with different values. So if you already have an extension setting in your php.ini, don't add it to the same line like a delimited list; instead insert an additional line containing just sample.so. At this point your php.ini should look something like this:

    extension_dir=/usr/local/lib/php/modules/
    extension=sample.so

Now you could run the same script without the dl() line, or just issue the command php -m and still see "sample" in the list of loaded modules.

Note

All sample code in this and the following chapters will assume you've loaded the current extension using this method. If you plan on using dl() instead, be sure to add the appropriate load line to the sample scripts.

Building Statically

In the list of loaded modules, you probably noticed that several modules were listed that were not included using the extension directive in php.ini. These modules are built directly into PHP and are compiled as part of the main build process.

Building Static Under *nix

At this point, if you tried navigating up a couple of directories to the PHP source tree root, you could run ./configure --help and see that although your sample extension is located in the ext/ directory along with all the other modules, it's not listed as an option. This is because, at the time that the ./configure script was generated, your extension was unknown. To regenerate ./configure and have it locate your new extension, all you need to do is issue one command:

    $ ./buildconf

Note

If you're using a production release of PHP to do development against, you'll find that ./buildconf by itself doesn't actually work.
In this case you'll need to issue ./buildconf --force to bypass some minor protection built into the buildconf script.

Now you can issue ./configure --help and see that --enable-sample is an available option. From here, you could re-issue ./configure with all the options you used in the main PHP build plus --enable-sample to create a single, ready-to-go binary containing a full PHP interpreter and your custom extension.

Of course, it's probably a bit early to be doing that. Your extension still needs to do something besides take up space. Let's stick to building a nice lean shared object for now.

Building Statically Under Windows

Regenerating the configure.js script for Windows follows the same pattern as regenerating the ./configure script for *nix. Navigate to the root of the PHP source tree and reissue buildconf.bat as you did in Chapter 4. The PHP build system will scan for config.w32 files, including the one you just made for ext/sample, and generate a new configure.js script with which to build a static php binary.

Functional Functions

The quickest link between userspace and extension code is the PHP_FUNCTION(). Start by adding the following code block near the top of your sample.c file just after #include "php_sample.h":

    PHP_FUNCTION(sample_hello_world)
    {
        php_printf("Hello World!\n");
    }

The PHP_FUNCTION() macro functions just like a normal C function declaration because that's exactly how it expands:

    #define PHP_FUNCTION(name) \
        void zif_##name(INTERNAL_FUNCTION_PARAMETERS)

which in this case evaluates out to:

    void zif_sample_hello_world(zval *return_value,
        char return_value_used, zval *this_ptr TSRMLS_DC)

Simply declaring the function isn't enough, of course. The engine needs to know the address of the function as well as how the function name should be exported to user space.
This is accomplished by the next code block, which you'll want to place immediately after the PHP_FUNCTION() block:

    static function_entry php_sample_functions[] = {
        PHP_FE(sample_hello_world, NULL)
        { NULL, NULL, NULL }
    };

The php_sample_functions vector is a simple NULL-terminated vector that will grow as you continue to add functionality to the sample extension. Every function you export will appear as an item in this vector. Taking apart the PHP_FE() macro, you see that it expands to

    { "sample_hello_world", zif_sample_hello_world, NULL },

thus providing both a name for the new function and a pointer to its implementation function. The third parameter in this set is used to provide argument hinting information such as requiring certain arguments to be passed by reference. You'll see this feature in use in Chapter 7, "Accepting Parameters."

So now you've got a list of exportable functions, but still nothing connecting it to the engine. This is accomplished with the last change to sample.c, which amounts to simply replacing the NULL, /* Functions */ line in your sample_module_entry structure with php_sample_functions, (be sure to keep that comma there!).

Now rebuild according to the instructions earlier and test it out using the -r option to the php command line, which allows running simple code fragments without having to create an entire file:

    $ php -r 'sample_hello_world();'

If all has gone well, you'll see the words "Hello World!" output almost immediately.

Zend Internal Functions

The zif_ string prefixed to internal function names stands for "Zend Internal Function" and is used to avoid probable symbol conflicts. For example, the userspace strlen() function could not be implemented as void strlen(INTERNAL_FUNCTION_PARAMETERS) as it would conflict with the C library's implementation of strlen.

Sometimes even the default prefix of zif_ simply won't do. Usually this is because the function name expands another macro and gets misinterpreted by the C compiler.
In these cases, an internal function may be given an arbitrary name using the PHP_NAMED_FUNCTION() macro; for example, PHP_NAMED_FUNCTION(zif_sample_hello_world) is identical to the earlier use of PHP_FUNCTION(sample_hello_world).

When adding an implementation declared using PHP_NAMED_FUNCTION(), the PHP_NAMED_FE() macro is used to link it into the function_entry vector. So if you declared your function as PHP_NAMED_FUNCTION(purplefunc), you'd use PHP_NAMED_FE(sample_hello_world, purplefunc, NULL) rather than using PHP_FE(sample_hello_world, NULL).

This practice can be seen in ext/standard/file.c where the fopen() function is actually declared using PHP_NAMED_FUNCTION(php_if_fopen). As far as userspace is concerned, there's nothing unusual about the function; it's still called as simply fopen(). Internally, however, the function is protected from being mangled by preprocessor macros and over-helpful compilers.

Function Aliases

Some functions can be referred to by more than one name. Recalling that ordinary functions are declared internally as the function's userspace name with zif_ prepended, it's easy to see that the PHP_NAMED_FE() macro could be used to create this alternative mapping:

    PHP_FE(sample_hello_world, NULL)
    PHP_NAMED_FE(sample_hi, zif_sample_hello_world, NULL)

The PHP_FE() macro associates the userspace function name sample_hello_world with zif_sample_hello_world, the expansion of PHP_FUNCTION(sample_hello_world). The PHP_NAMED_FE() macro then associates the userspace function name sample_hi with this same internal implementation.

Now pretend that, because of a major change in the Zend engine, the standard prefix for internal functions changes from zif_ to pif_. Your extension will suddenly stop being able to compile because when the PHP_NAMED_FE() macro is reached, zif_sample_hello_world is undefined.
This sort of unusual but troublesome case can be avoided by using the PHP_FNAME() macro to expand sample_hello_world for you:

    PHP_NAMED_FE(sample_hi, PHP_FNAME(sample_hello_world), NULL)

This way, if the function prefix ever changes, the function entry will update automatically using the macro expansions defined in the PHP Core.

Now that you've got this entry working, guess what? It's not necessary. PHP exports yet another macro designed specifically for creating function aliases. The previous example could be rewritten as simply:

    PHP_FALIAS(sample_hi, sample_hello_world, NULL)

Indeed this is the official way to create function aliases, and how you'll see it done nearly everywhere else in the PHP source tree.

Summary

In this chapter you created a simple working PHP extension and learned the steps necessary to build it for most major platforms. In the coming chapters, you'll add to this extension, ultimately including every type of PHP feature.

The PHP source tree and the tools it relies on to compile and build on the many platforms it supports are constantly changing. If something in this chapter failed to work, refer to the php.net online manual under Installation to see if your version has special needs.

Chapter 6.
Returning Values

Userspace functions make use of the return keyword to pass information back to their calling scope in the same manner that you're probably familiar with doing in a C application, for example:

    function sample_long() {
        return 42;
    }
    $bar = sample_long();

When sample_long() is called, the number 42 is returned and populated into the $bar variable. In C this might be done using a nearly identical code base:

    int sample_long(void)
    {
        return 42;
    }

    int main(void)
    {
        int bar = sample_long();
        return 0;
    }

Of course, in C you always know what the function being called is going to return based on its function prototype, so you can declare the variable the result will be stored in accordingly.
When dealing with PHP userspace, however, the variable type is dynamic and you have to fall back on the zval type introduced in Chapter 2, "Variables from the Inside Out."

The return_value Variable

You'll probably be tempted to believe that your internal function should return an immediate zval, or (more likely) allocate memory for a zval and return a zval*, such as in the following code block:

    PHP_FUNCTION(sample_long_wrong)
    {
        zval *retval;

        MAKE_STD_ZVAL(retval);
        ZVAL_LONG(retval, 42);

        return retval;
    }

Unfortunately, you'll be close, but ultimately wrong. Rather than forcing every function implementation to allocate a zval and return it, the Zend Engine pre-allocates this space before the function is called. It then initializes the zval's type to IS_NULL and passes that value in the form of a parameter named return_value. Here's that same function again, done correctly:

    PHP_FUNCTION(sample_long)
    {
        ZVAL_LONG(return_value, 42);
        return;
    }

Notice that nothing is directly returned by the PHP_FUNCTION() implementation. Instead, the return_value parameter is populated with appropriate data directly, and the Zend Engine hands this value back to the calling scope after the internal function has finished executing.

As a reminder, the ZVAL_LONG() macro is a simple wrapper around a set of assignment operations, in this case:

    Z_TYPE_P(return_value) = IS_LONG;
    Z_LVAL_P(return_value) = 42;

Or more primitively:

    return_value->type = IS_LONG;
    return_value->value.lval = 42;

Note

The is_ref and refcount properties of the return_value variable should almost never be modified by an internal function directly. These values are initialized and processed by the Zend Engine when it calls your function.

Let's take a look at this particular function in action by adding it to the sample extension from Chapter 5, "Your First Extension," just below the sample_hello_world() function.
You'll also need to expand the php_sample_functions struct to contain a function entry for sample_long() as shown:

    static function_entry php_sample_functions[] = {
        PHP_FE(sample_hello_world, NULL)
        PHP_FE(sample_long, NULL)
        { NULL, NULL, NULL }
    };

At this point the extension can be rebuilt by issuing make from the source directory, or nmake php_sample.dll from the PHP source root for Windows. If all has gone well, you can now run PHP and exercise your new function:

    $ php -r 'var_dump(sample_long());'

Wrap Your Macros Tightly

In the interest of readable, maintainable code, the ZVAL_*() macros have duplicated counterparts that are specific to the return_value variable. In each case, the ZVAL portion of the macro is replaced with the term RETVAL, and the initial parameter (which would otherwise denote the variable being modified) is omitted. In the prior example, the implementation of sample_long() can be reduced to the following:

    PHP_FUNCTION(sample_long)
    {
        RETVAL_LONG(42);
        return;
    }

Table 6.1 lists the RETVAL family of macros as defined by the Zend Engine. In all cases except two, the RETVAL macro is identical to its ZVAL counterpart with the initial return_value parameter removed.

Table 6.1. Return Value Macros

    Generic ZVAL Macro                           return_value Specific Counterpart
    ZVAL_NULL(return_value)                      RETVAL_NULL()
    ZVAL_BOOL(return_value, bval)                RETVAL_BOOL(bval)
    ZVAL_TRUE(return_value)                      RETVAL_TRUE
    ZVAL_FALSE(return_value)                     RETVAL_FALSE
    ZVAL_LONG(return_value, lval)                RETVAL_LONG(lval)
    ZVAL_DOUBLE(return_value, dval)              RETVAL_DOUBLE(dval)
    ZVAL_STRING(return_value, str, dup)          RETVAL_STRING(str, dup)
    ZVAL_STRINGL(return_value, str, len, dup)    RETVAL_STRINGL(str, len, dup)
    ZVAL_RESOURCE(return_value, rval)            RETVAL_RESOURCE(rval)

Note

Notice that the TRUE and FALSE macros have no parentheses. These are considered aberrations within the Zend/PHP coding standards but are retained primarily for backward compatibility. If you build an extension and receive an error reading undefined macro RETVAL_TRUE(), be sure to check that you did not include these parentheses.

Quite often, after your function has come up with a return value, it will be ready to exit and return control to the calling scope. For this reason there exists one more set of macros designed specifically for internal functions: the RETURN_*() family.

    PHP_FUNCTION(sample_long)
    {
        RETURN_LONG(42);
    }

Although it's not visible in the source, this function still returns at the end of the RETURN_LONG() macro call. This can be tested by adding a php_printf() call to the end of the function:

    PHP_FUNCTION(sample_long)
    {
        RETURN_LONG(42);
        php_printf("I will never be reached.\n");
    }

The php_printf(), as its contents suggest, will never be executed because the call to RETURN_LONG() implicitly leaves the function.

Like the RETVAL series, a RETURN counterpart exists for each of the simple types shown in Table 6.1. Also like the RETVAL series, the RETURN_TRUE and RETURN_FALSE macros do not use parentheses.

More complex types, such as objects and arrays, are also returned through the return_value parameter; however, their nature precludes a simple macro-based approach to creation. Even the resource type, while it has a RETVAL macro, requires additional work to generate. You'll see how to return these types later on in Chapters 8 through 11.

Is It Worth the Trouble?

One underused feature of the Zend internal function is the return_value_used parameter. Consider the following piece of userspace code:

    function sample_array_range() {
        $ret = array();
        for($i = 0; $i < 1000; $i++) {
            $ret[] = $i;
        }
        return $ret;
    }
    sample_array_range();

Because sample_array_range() is called without storing the result into a variable, the work (and memory) being used to create a 1,000 element array is completely wasted.
Of course, calling sample_array_range() in this manner is silly, but wouldn't it be nice to know ahead of time that its efforts will be in vain? Although it's not accessible to userspace functions, an internal function can conditionally skip otherwise pointless behavior like this depending on the setting of the return_value_used parameter common to all internal functions.

    PHP_FUNCTION(sample_array_range)
    {
        if (return_value_used) {
            int i;
            /* Return an array from 0 - 999 */
            array_init(return_value);
            for(i = 0; i < 1000; i++) {
                add_next_index_long(return_value, i);
            }
            return;
        } else {
            /* Save yourself the effort */
            php_error_docref(NULL TSRMLS_CC, E_NOTICE,
                "Static return-only function called without processing output");
            RETURN_NULL();
        }
    }

To see this function operate, just add it to your growing sample.c source file and toss in a matching entry to your php_sample_functions struct:

    PHP_FE(sample_array_range, NULL)

Returning Reference Values

As you already know from working in userspace, a PHP function may also return a value by reference. Due to implementation problems, returning references from an internal function should be avoided in versions of PHP prior to 5.1, as it simply doesn't work. Consider the following userspace code fragment:

    function &sample_reference_a() {
        /* If $a does not exist in the global scope yet,
         * create it with an initial value of NULL
         */
        if (!isset($GLOBALS['a'])) {
            $GLOBALS['a'] = NULL;
        }
        return $GLOBALS['a'];
    }
    $a = 'Foo';
    $b = &sample_reference_a();
    $b = 'Bar';

In this code fragment, $b is created as a reference of $a, just as if it had been set using $b = &$GLOBALS['a']; or (because it's being done in the global scope anyway) just $b = &$a;. When the final line is reached, both $a and $b, which you'll recall from Chapter 3, "Memory Management," are looking at the same actual value, contain the value 'Bar'.
Let's look at that same function again using an internals implementation:

    #if (PHP_MAJOR_VERSION > 5) || (PHP_MAJOR_VERSION == 5 && \
                              PHP_MINOR_VERSION > 0)
    PHP_FUNCTION(sample_reference_a)
    {
        zval **a_ptr, *a;

        /* Fetch $a from the global symbol table */
        if (zend_hash_find(&EG(symbol_table), "a", sizeof("a"),
                                   (void**)&a_ptr) == SUCCESS) {
            a = *a_ptr;
        } else {
            /* $GLOBALS['a'] doesn't exist yet, create it */
            ALLOC_INIT_ZVAL(a);
            zend_hash_add(&EG(symbol_table), "a", sizeof("a"), &a,
                                   sizeof(zval*), NULL);
        }
        /* Toss out the old return_value */
        zval_ptr_dtor(return_value_ptr);
        if (!a->is_ref && a->refcount > 1) {
            /* $a is in a copy-on-write reference set
             * It must be separated before it can be used
             */
            zval *newa;
            MAKE_STD_ZVAL(newa);
            *newa = *a;
            zval_copy_ctor(newa);
            newa->is_ref = 0;
            newa->refcount = 1;
            zend_hash_update(&EG(symbol_table), "a", sizeof("a"), &newa,
                                   sizeof(zval*), NULL);
            a = newa;
        }
        /* Promote to full-reference and increase refcount */
        a->is_ref = 1;
        a->refcount++;
        *return_value_ptr = a;
    }
    #endif /* PHP >= 5.1.0 */

The return_value_ptr parameter is another common parameter passed to all internal functions and is a zval** containing a pointer to return_value. By calling zval_ptr_dtor() on it, the default return_value zval* is freed. You're then free to replace it with a new zval* of your choosing, in this case the variable $a, which has been promoted to is_ref and, where necessary, separated from any non-full reference pairings it might have had.

If you were to compile and run this code now, however, you'd get a segfault.
In order to make it work, you'll need to add a structure to your php_sample.h file:

    #if (PHP_MAJOR_VERSION > 5) || (PHP_MAJOR_VERSION == 5 && \
                              PHP_MINOR_VERSION > 0)
    static
        ZEND_BEGIN_ARG_INFO_EX(php_sample_retref_arginfo, 0, 1, 0)
        ZEND_END_ARG_INFO()
    #endif /* PHP >= 5.1.0 */

Then use that structure when you declare your function in php_sample_functions:

    #if (PHP_MAJOR_VERSION > 5) || (PHP_MAJOR_VERSION == 5 && \
                              PHP_MINOR_VERSION > 0)
        PHP_FE(sample_reference_a, php_sample_retref_arginfo)
    #endif /* PHP >= 5.1.0 */

This structure, which you'll learn more about later in this chapter, provides vital hints to the Zend Engine function call routine. In this case it tells the ZE that return_value will need to be overridden, and that it should populate return_value_ptr with the correct address. Without this hint, ZE will simply place NULL in return_value_ptr, which would make this particular function crash when it reached zval_ptr_dtor().

Note

Each of these code fragments has been wrapped in an #if block to instruct the compiler that support for them should only be enabled if the PHP version is greater than or equal to 5.1. Without these conditional directives, the extension would not be able to compile on PHP4 (because several elements, including return_value_ptr, do not exist), and would fail to function properly on PHP 5.0 (where a bug causes reference returns to be copied by value).

Returning Values by Reference

Using the return construct to send values and variable references back from a function is all well and good, but sometimes you want to return multiple values from a function. You could use an array to do this, which we'll explore in Chapter 8, "Working with Arrays and HashTables," or you can return values back through the parameter stack.
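If the idea of handing results back through parameters feels unfamiliar, it may help to recall that plain C does the same thing with pointer arguments. The sketch below is not PHP API code; sketch_divmod() is a hypothetical helper, used only to show a function that reports a status through its return value while delivering two results through its parameters.

```c
#include <stddef.h>

/* Hypothetical helper: returns 0 on success or -1 on failure, and hands
 * back two results through the quot and rem pointer parameters. */
int sketch_divmod(long num, long den, long *quot, long *rem)
{
    if (den == 0 || quot == NULL || rem == NULL) {
        /* Nothing is written to the outputs on failure */
        return -1;
    }
    *quot = num / den;
    *rem = num % den;
    return 0;
}
```

The pass-by-reference techniques in the rest of this chapter give a PHP function the same ability: the caller's own variable is modified in place rather than a copy.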
Call-time Pass-by-ref

One of the simpler ways to pass variables by reference is by requiring the calling scope to include an ampersand (&) with the parameter, such as in the following piece of userspace code:

    function sample_byref_calltime($a) {
        $a .= ' (modified by ref!)';
    }
    $foo = 'I am a string';
    sample_byref_calltime(&$foo);
    echo $foo;

The ampersand (&) placed in the parameter call causes the actual zval used by $foo, rather than a copy of its contents, to be sent to the function. This allows the function to modify the value in place and effectively return information through its passed parameter. If sample_byref_calltime() hadn't been called with the ampersand placed in front of $foo, the changes made inside the function would not have affected the original variable.

Repeating this endeavor in C requires nothing particularly special. Create the following function after sample_long() in your sample.c source file:

    PHP_FUNCTION(sample_byref_calltime)
    {
        zval *a;
        int addtl_len = sizeof(" (modified by ref!)") - 1;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &a)
                                                       == FAILURE) {
            RETURN_NULL();
        }
        if (!a->is_ref) {
            /* parameter was not passed by reference,
             * leave without doing anything
             */
            return;
        }
        /* Make sure the variable is a string */
        convert_to_string(a);
        /* Enlarge a's buffer to hold the additional data */
        Z_STRVAL_P(a) = erealloc(Z_STRVAL_P(a),
                            Z_STRLEN_P(a) + addtl_len + 1);
        memcpy(Z_STRVAL_P(a) + Z_STRLEN_P(a),
                            " (modified by ref!)", addtl_len + 1);
        Z_STRLEN_P(a) += addtl_len;
    }

As always, this function needs to be added to the php_sample_functions structure:

    PHP_FE(sample_byref_calltime, NULL)

Compile-time Pass-by-ref

The more common way to pass by reference is by using compile-time pass-by-ref.
Here, the parameters to a function are declared to be for reference use only, and attempts to pass constants or intermediate values (such as the result of a function call) will result in an error because there is nowhere for the function to store the resulting value back into. A userspace compile-time pass-by-ref function might look something like the following:

    function sample_byref_compiletime(&$a) {
        $a .= ' (modified by ref!)';
    }
    $foo = 'I am a string';
    sample_byref_compiletime($foo);
    echo $foo;

As you can see, this varies from the call-time version only in the placement of the referencing ampersand. When looking at this function in C, the implementation in terms of function code is entirely identical. The only true difference is in how it is declared in the php_sample_functions block:

    PHP_FE(sample_byref_compiletime, php_sample_byref_arginfo)

where php_sample_byref_arginfo is a (verbosely named) constant structure, which you'll need to define before this entry will compile.

Note

The check for is_ref could actually be left out of the compile-time version because the condition will always be false (so the early exit is never taken), but it causes no harm to leave it in for now.

In Zend Engine 1 (PHP4), this will be a simple char* list made up of a length byte followed by a set of flags pertaining to each of a function's parameters in turn.

    static unsigned char php_sample_byref_arginfo[] =
                                    { 1, BYREF_FORCE };

Here, the 1 indicates that the vector only contains argument info for one parameter. The argument-specific arg info then follows in subsequent elements, with the first arg going in the second element as shown. If there had been a second or third argument involved, their flags would have gone in the third and fourth elements respectively, and so on. Possible values for a given argument's element are shown in Table 6.2.
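To make the vector layout concrete, here is a small standalone model of how such a length-prefixed flag list could be consulted. It is a sketch under stated assumptions rather than the engine's own code: the SKETCH_BYREF_* names and their numeric values are chosen for illustration (check the Zend Engine 1 headers for the real BYREF_* definitions), sketch_arg_flag() is a hypothetical helper, and treating undescribed arguments as following call-time semantics is an assumption of the sketch.

```c
#include <stddef.h>

/* Illustrative flag values; assumptions made for this sketch only */
enum {
    SKETCH_BYREF_NONE = 0,
    SKETCH_BYREF_FORCE = 1,
    SKETCH_BYREF_ALLOW = 2,
    SKETCH_BYREF_FORCE_REST = 3
};

/* Return the effective flag for the 1-based argument position arg_num.
 * arg_types[0] holds the number of described arguments; the flags for
 * arguments 1..count follow in elements 1..count. */
unsigned char sketch_arg_flag(const unsigned char *arg_types,
                              unsigned int arg_num)
{
    unsigned char count;

    if (arg_types == NULL || arg_num == 0 || arg_types[0] == 0) {
        /* No arg info at all: assume call-time semantics decide */
        return SKETCH_BYREF_ALLOW;
    }
    count = arg_types[0];
    if (arg_num <= count) {
        unsigned char flag = arg_types[arg_num];
        /* FORCE_REST forces its own position as well as the rest */
        return flag == SKETCH_BYREF_FORCE_REST ? SKETCH_BYREF_FORCE : flag;
    }
    /* Past the end of the list: forced only if the list ended with
     * FORCE_REST, otherwise fall back to call-time semantics */
    return arg_types[count] == SKETCH_BYREF_FORCE_REST
               ? SKETCH_BYREF_FORCE
               : SKETCH_BYREF_ALLOW;
}
```

With the one-element vector shown above, the first argument reports the force flag and any further argument falls back to the default.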
Table 6.2. Zend Engine 1 Arg Info Constants

    Reference Type      Meaning
    BYREF_NONE          Pass-by-ref is never allowed on this parameter.
                        Attempts to use call-time pass-by-ref will be
                        ignored and the parameter will be copied instead.
    BYREF_FORCE         Arguments are always passed by reference regardless
                        of how the function is called. This is equivalent
                        to using an ampersand in a userspace function
                        parameter declaration.
    BYREF_ALLOW         Argument passing by reference is determined by
                        call-time semantics. This is equivalent to ordinary
                        userspace function declaration.
    BYREF_FORCE_REST    The current argument and all subsequent arguments
                        will have BYREF_FORCE applied. This flag may only
                        be the last arg info flag in the list. Placing
                        additional flags after BYREF_FORCE_REST will result
                        in undefined behavior.

In Zend Engine 2 (PHP5+), you'll use a much more extensive structure containing information such as minimum and maximum parameter requirements, type hinting, and whether or not to force referencing.

First the arg info struct is declared using one of two macros. The simpler macro, ZEND_BEGIN_ARG_INFO(), takes two parameters:

    ZEND_BEGIN_ARG_INFO(name, pass_rest_by_reference)

name is quite simply how this struct will be referred to within the extension, in this case: php_sample_byref_arginfo.

pass_rest_by_reference takes on the same meaning here as using BYREF_FORCE_REST as the last element of a Zend Engine 1 arg info vector. If this parameter is set to 1, all arguments not explicitly described within the struct will be assumed to be compile-time pass-by-ref arguments.

The alternative begin macro, which introduces two new options not found in the Zend Engine 1 version, is ZEND_BEGIN_ARG_INFO_EX():

    ZEND_BEGIN_ARG_INFO_EX(name, pass_rest_by_reference, return_reference,
                           required_num_args)

name and pass_rest_by_reference have the same meanings here, of course.
return_reference, as you saw earlier in the chapter, gives a hint to Zend that your function will be overriding return_value_ptr with your own zval.

The final argument, required_num_args, is another shortcut hint to Zend that allows it to skip certain function calls entirely when that function's prototype is known to be incompatible with how it's being called.

After you have a suitable begin macro in place, it may be followed by zero or more ZEND_ARG_*INFO elements. The types and usages of these macros are shown in Table 6.3.

Table 6.3. ZEND_ARG_INFO Family of Macros

    Arg Info macro                  Purpose
    ZEND_ARG_PASS_INFO(by_ref)      by_ref here (as in all subsequent
                                    macros) is a binary option indicating
                                    whether the corresponding parameter
                                    should be forced as pass-by-reference.
                                    Setting this option to 1 is equivalent
                                    to using BYREF_FORCE in a Zend Engine 1
                                    vector.
    ZEND_ARG_INFO(by_ref, name)     This macro provides an additional name
                                    attribute used by internally generated
                                    error messages and the reflection API.
                                    It should be set to something helpful
                                    and non-cryptic.
    ZEND_ARG_ARRAY_INFO(by_ref, name, allow_null)
    ZEND_ARG_OBJ_INFO(by_ref, name, classname, allow_null)
                                    These two macros provide argument type
                                    hinting to internal functions,
                                    specifying that either an array or a
                                    particular class instance is expected
                                    as the parameter. Setting allow_null to
                                    a non-zero value will allow the calling
                                    scope to pass a NULL value in place of
                                    an array/object.

Lastly, all arg info structs using the Zend Engine 2 macros must terminate their list using ZEND_END_ARG_INFO(). For your sample function, you might select a final structure that looks like the following:

    ZEND_BEGIN_ARG_INFO(php_sample_byref_arginfo, 0)
        ZEND_ARG_PASS_INFO(1)
    ZEND_END_ARG_INFO()

In order to make extensions that are compatible with both ZE1 and ZE2, it's necessary to use an #ifdef statement and define the same arg_info structure for both, in this case:

    #ifdef ZEND_ENGINE_2
    static
        ZEND_BEGIN_ARG_INFO(php_sample_byref_arginfo, 0)
            ZEND_ARG_PASS_INFO(1)
        ZEND_END_ARG_INFO()
    #else /* ZE 1 */
    static unsigned char php_sample_byref_arginfo[] =
                                    { 1, BYREF_FORCE };
    #endif

Now that all the pieces are gathered together, it's time to create an actual compile-time pass-by-reference implementation. First, put the block defining php_sample_byref_arginfo for ZE1 and ZE2 into the header file php_sample.h.

Next, you could take two approaches. One approach would be to copy and paste the PHP_FUNCTION(sample_byref_calltime) implementation and rename it to PHP_FUNCTION(sample_byref_compiletime), and then add a PHP_FE(sample_byref_compiletime, php_sample_byref_arginfo) line to php_sample_functions. This approach is straightforward and probably less prone to confusion when making changes years from now.

Because this is just sample code, however, you can play a little looser and avoid code duplication by using PHP_FALIAS(), which you saw last chapter. This time, rather than making a duplicate of PHP_FUNCTION(sample_byref_calltime), add a single line to php_sample_functions:

    PHP_FALIAS(sample_byref_compiletime, sample_byref_calltime,
        php_sample_byref_arginfo)

As you'll recall from Chapter 5, this creates a userspace function called sample_byref_compiletime() with an internal implementation using sample_byref_calltime()'s code. The addition of php_sample_byref_arginfo makes this version unique.

Summary

In this chapter you looked at how to return values from internal functions directly by value, as a reference, and through their parameter stack using references. You also got a first look at argument type hinting using Zend Engine 2's zend_arg_info struct.

In the next chapter you'll delve more deeply into accepting parameters, both as elementary zvals and using zend_parse_parameters()'s powerful type juggling features.

Chapter 7. Accepting Parameters

With a couple of "sneak preview" exceptions, the extension functions you've dealt with so far have been simple, return-only factories.
Most functions, however, won't be so single purposed. You usually want to pass in some kind of parameter and receive a meaningful response based on the value and some additional processing.

Automatic Type Conversion with zend_parse_parameters()

As with return values, which you saw last chapter, parameter values are moved around using indirect zval references. The easiest way to get at these zval* values is using the zend_parse_parameters() function.

Calls to zend_parse_parameters() almost invariably begin with the ZEND_NUM_ARGS() macro followed by the ubiquitous TSRMLS_CC. ZEND_NUM_ARGS(), as its name suggests, returns an int representing the number of arguments actually passed to the function. Because of the way zend_parse_parameters() works internally, you'll probably never need to inspect this value directly, so just pass it on for now.

The next parameter to zend_parse_parameters() is the format parameter, which is made up of a string of letters or character sequences corresponding to the various primitive types supported by the Zend Engine. Table 7.1 shows the basic type characters.

The remaining parameters to ZPP depend on which specific type you've requested in your format string. For the simpler types, this is a pointer to a C language primitive. For example, a long data type is extracted like so:

    PHP_FUNCTION(sample_getlong)
    {
        long foo;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC,
                             "l", &foo) == FAILURE) {
            RETURN_NULL();
        }
        php_printf("The integer value of the parameter you "
                 "passed is: %ld\n", foo);
        RETURN_TRUE;
    }

Note

Although it's common for integers and longs to have the same data storage size, they cannot always be used interchangeably. Passing the address of an int where a long* is expected can lead to unexpected results, especially as 64-bit platforms become more prevalent. Always use the appropriate data type(s) as listed in Table 7.2.

Table 7.1. zend_parse_parameters() Type Specifiers

    Type Specifier    Userspace Datatype
    b                 Boolean
    l                 Integer
    d                 Floating point
    s                 String
    r                 Resource
    a                 Array
    o                 Object instance
    O                 Object instance of a specified type
    z                 Non-specific zval
    Z                 Dereferenced non-specific zval

Table 7.2. zend_parse_parameters() Data Types

Notice that all the more complex data types actually parse out as simple zvals. For the most part this is due to the same limitation that prevents returning complex data types using RETURN_*() macros: There's really no C-space analog to these structures. What ZPP does do for your function, however, is ensure that the zval* you do receive is of the appropriate type. If necessary, it will even perform implicit conversions such as casting arrays to stdClass objects.

The s and O types are also worth pointing out because they require a pair of parameters for each invocation. You'll see O more closely when you explore the Object data type in Chapters 10, "PHP4 Objects," and 11, "PHP5 Objects." In the case of the s type, let's say you're extending the sample_hello_world() function from Chapter 5, "Your First Extension," to greet a specific person by name:

    function sample_hello_world($name) {
        echo "Hello $name!\n";
    }

In C, you'll use the zend_parse_parameters() function to ask for a string:

    PHP_FUNCTION(sample_hello_world)
    {
        char *name;
        int name_len;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s",
                            &name, &name_len) == FAILURE) {
            RETURN_NULL();
        }
        php_printf("Hello ");
        PHPWRITE(name, name_len);
        php_printf("!\n");
    }

Tip

The zend_parse_parameters() function may fail because the function was passed too few arguments to satisfy the format string, or because one of the arguments passed simply cannot be converted to the requested type. In such a case, it will automatically output an error message so your extension doesn't have to.
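The format-string mechanism is easier to picture with a miniature, standalone imitation. The following sketch is not how the Zend Engine implements zend_parse_parameters(); sk_value, sk_type, and sk_parse() are hypothetical names invented here, only the l and s specifiers are modeled, and (unlike the real function) a non-string passed for s is simply rejected rather than converted. What it does show is the general technique: a format character selects how many output pointers to pull from the variable argument list, with s consuming a pair.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for a zval */
typedef enum { SK_LONG, SK_STRING } sk_type;
typedef struct {
    sk_type type;
    long lval;          /* used when type == SK_LONG */
    const char *sval;   /* used when type == SK_STRING */
} sk_value;

/* Parse argc values according to fmt; returns 0 on success, -1 on failure */
int sk_parse(int argc, const sk_value *argv, const char *fmt, ...)
{
    va_list ap;
    int i, ok = 0;

    if ((int)strlen(fmt) != argc) {
        fprintf(stderr, "sk_parse: expected %d argument(s), got %d\n",
                (int)strlen(fmt), argc);
        return -1;
    }
    va_start(ap, fmt);
    for (i = 0; i < argc && ok == 0; i++) {
        const sk_value *v = &argv[i];

        switch (fmt[i]) {
        case 'l': {  /* one output pointer; numeric strings are converted */
            long *dest = va_arg(ap, long *);
            *dest = (v->type == SK_LONG) ? v->lval
                                         : strtol(v->sval, NULL, 10);
            break;
        }
        case 's': {  /* a pair of output pointers: data and length */
            const char **str = va_arg(ap, const char **);
            int *len = va_arg(ap, int *);
            if (v->type != SK_STRING) {
                ok = -1;  /* sketch only: no conversion to string */
                break;
            }
            *str = v->sval;
            *len = (int)strlen(v->sval);
            break;
        }
        default:
            ok = -1;  /* unknown specifier */
        }
    }
    va_end(ap);
    if (ok != 0) {
        /* i was already advanced past the failing argument */
        fprintf(stderr, "sk_parse: argument %d cannot be parsed as '%c'\n",
                i, fmt[i - 1]);
    }
    return ok;
}
```

A call such as sk_parse(2, args, "sl", &name, &name_len, &n) fills a string pointer, its length, and a long in one pass, which mirrors the calling shape of the real function.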
To request more than one parameter, extend the format specifier to include additional characters and stack the subsequent arguments onto the zend_parse_parameters() call. Parameters are parsed left to right just as they are in a userspace function declaration:

    function sample_hello_world($name, $greeting) {
        echo "Hello $greeting $name!\n";
    }
    sample_hello_world('John Smith', 'Mr.');

    Type specifier    C datatype(s)
    b                 zend_bool
    l                 long
    d                 double
    s                 char*, int
    r                 zval*
    a                 zval*
    o                 zval*
    O                 zval*, zend_class_entry*
    z                 zval*
    Z                 zval**

Or, in C:

    PHP_FUNCTION(sample_hello_world)
    {
        char *name;
        int name_len;
        char *greeting;
        int greeting_len;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss",
          &name, &name_len, &greeting, &greeting_len) == FAILURE) {
            RETURN_NULL();
        }
        php_printf("Hello ");
        PHPWRITE(greeting, greeting_len);
        php_printf(" ");
        PHPWRITE(name, name_len);
        php_printf("!\n");
    }

In addition to the type primitives, three metacharacters exist for modifying how the parameters will be processed. Table 7.3 lists these modifiers.

Table 7.3. zend_parse_parameters() Modifiers

    Type Modifier    Meaning
    |                Optional parameters follow. When this is specified,
                     all previous parameters are considered required and
                     all subsequent parameters are considered optional.
    !                If a NULL is passed for the parameter corresponding to
                     the preceding argument specifier, the internal
                     variable provided will be set to an actual NULL
                     pointer as opposed to an IS_NULL zval.
    /                If the parameter corresponding to the preceding
                     argument specifier is in a copy-on-write reference
                     set, it will be automatically separated into a new
                     zval with is_ref==0 and refcount==1.

Optional Parameters

Taking another look at the revised sample_hello_world() example, your next step in building this out might be to make the $greeting parameter optional. In PHP:

    function sample_hello_world($name, $greeting='Mr./Ms.') {
        echo "Hello $greeting $name!\n";
    }

sample_hello_world() can now be called with both parameters or just the name:

    sample_hello_world('Ginger Rogers','Ms.');
    sample_hello_world('Fred Astaire');

with the default argument being used when none is explicitly given. In a C implementation, optional parameters are specified in a similar manner.

To accomplish this, use the pipe character (|) in zend_parse_parameters()'s format string. Arguments to the left of the pipe will be parsed from the call stack (if possible), while any argument on the right that isn't provided will be left unmodified. For example:

    PHP_FUNCTION(sample_hello_world)
    {
        char *name;
        int name_len;
        char *greeting = "Mr./Ms.";
        int greeting_len = sizeof("Mr./Ms.") - 1;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s|s",
          &name, &name_len, &greeting, &greeting_len) == FAILURE) {
            RETURN_NULL();
        }
        php_printf("Hello ");
        PHPWRITE(greeting, greeting_len);
        php_printf(" ");
        PHPWRITE(name, name_len);
        php_printf("!\n");
    }

Because optional parameters are not modified from their initial values unless they're provided as arguments, it's important to initialize any such parameters to some default value. In most cases this will be NULL/0, though sometimes (as above) another default is sensible.

IS_NULL Versus NULL

Every zval, even the ultra-simple IS_NULL data type, occupies a certain minimal amount of memory overhead. Beyond that, it takes a certain number of clock cycles to allocate that memory space, initialize the values, and then ultimately free it when it's deemed no longer useful.

For many functions, it makes no sense to go through this process only to find out that the parameter was flagged as unimportant by the calling scope through the use of a NULL argument. Fortunately, zend_parse_parameters() allows arguments to be flagged as "NULL permissible" by appending an exclamation point to their format specifier.
Consider the following two code fragments, one with the modifier and one without:

    PHP_FUNCTION(sample_arg_fullnull)
    {
        zval *val;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z",
                                        &val) == FAILURE) {
            RETURN_NULL();
        }
        if (Z_TYPE_P(val) == IS_NULL) {
            val = php_sample_make_defaultval(TSRMLS_C);
        }
    ...

    PHP_FUNCTION(sample_arg_nullok)
    {
        zval *val;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z!",
                                        &val) == FAILURE) {
            RETURN_NULL();
        }
        if (!val) {
            val = php_sample_make_defaultval(TSRMLS_C);
        }
    ...

These two versions really aren't so different code-wise, though the former uses nominally more processor time. In general, this feature won't be very useful, but it's good to know it's available.

Forced Separation

When a variable is passed into a function, whether by reference or not, its refcount is almost always at least 2: one reference for the variable itself, and another for the copy that was passed into the function. Before making changes to that zval (if acting on the zval directly), it's important to separate it from any copy-on-write reference set it may be part of.

This would be a tedious task were it not for the / format specifier, which automatically separates any copy-on-write referenced variable so that your function can do as it pleases. Like the NULL flag, this modifier goes after the type it means to impact. Also like the NULL flag, you won't know you need this feature until you actually have a use for it.

zend_get_parameters()

If you happen to be designing your code to work on very old versions of PHP, or you just have a function that never needs anything other than zval*s, you might consider using the zend_get_parameters() API call.

The zend_get_parameters() call differs from its newer parse counterpart in a few crucial ways. First, it performs no automatic type conversion; instead all arguments are extracted as primitive zval* data types.
The simplest use of zend_get_parameters() might be something like the following:

    PHP_FUNCTION(sample_onearg)
    {
        zval *firstarg;

        if (zend_get_parameters(ZEND_NUM_ARGS(), 1, &firstarg)
                                            == FAILURE) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                "Expected at least 1 parameter.");
            RETURN_NULL();
        }
        /* Do something with firstarg... */
    }

Second, as you can see from the manually applied error message, zend_get_parameters() does not output error text on failure. It's also very poor at handling optional parameters. Specifically, if you ask it to fetch four arguments, you had better be certain that at least four arguments were provided or it will return FAILURE.

Lastly, unlike parse, this specific get variant will automatically separate any copy-on-write reference sets. If you still wanted to skip automatic separation, you could use its sibling: zend_get_parameters_ex().

In addition to not separating copy-on-write reference sets, zend_get_parameters_ex() differs in that it returns zval** pointers rather than simply zval*. Though that distinction is probably one you won't know you need until you have cause to use it, its usage is ultimately quite similar:

    PHP_FUNCTION(sample_onearg)
    {
        zval **firstarg;

        if (zend_get_parameters_ex(1, &firstarg) == FAILURE) {
            WRONG_PARAM_COUNT;
        }
        /* Do something with firstarg... */
    }

Note

Notice that the _ex version does not require the ZEND_NUM_ARGS() parameter. This is due to the _ex version being added at a later time when changes to the Zend Engine made this parameter unnecessary.

In this example, you also used the WRONG_PARAM_COUNT macro, which handles displaying an E_WARNING error message and automatically leaving the function.

Handling Arbitrary Numbers of Arguments

Two more members of the zend_get_parameters family exist for extracting a set of zval* and zval** pointers in situations where either the number of parameters is prohibitively large, or will not actually be known until runtime.
Consider the var_dump() function, which will display the contents of an arbitrary number of variables passed to it:

    PHP_FUNCTION(var_dump)
    {
        int i, argc = ZEND_NUM_ARGS();
        zval ***args;

        args = (zval ***)safe_emalloc(argc, sizeof(zval **), 0);
        if (ZEND_NUM_ARGS() == 0 ||
            zend_get_parameters_array_ex(argc, args) == FAILURE) {
            efree(args);
            WRONG_PARAM_COUNT;
        }
        for (i=0; i [...]

[...]->name; the second name is accessed by following the next property: people->next->name, and then people->next->next->name, and so on until next is NULL, meaning that no more names exist in the list. More commonly, a loop might be used to iterate through such a list:

    void name_show(namelist *p)
    {
        while (p) {
            printf("Name: %s\n", p->name);
            p = p->next;
        }
    }

Such lists are very handy for FIFO chains where new data is added to the end of a list as it comes in, leaving another branch or thread to handle consuming the data:

    static namelist *people = NULL, *last_person = NULL;
    void name_add(namelist *person)
    {
        person->next = NULL;
        if (!last_person) {
            /* No one in the list yet */
            people = last_person = person;
            return;
        }
        /* Append new person to the end of the list */
        last_person->next = person;

        /* Update the list tail */
        last_person = person;
    }
    namelist *name_pop(void)
    {
        namelist *first_person = people;
        if (people) {
            people = people->next;
        }
        return first_person;
    }

New namelist structures can be shifted in and popped out of this list as many times as necessary without having to adjust the structure's size or block copy elements between positions.

The form of the linked list you just saw is singly linked, and while it has some interesting features, it also has some serious weaknesses. Given a pointer to one item in the linked list, it becomes difficult to cut that element out of the chain and ensure that the prior element will be properly linked to the next one.
In order to even know what the prior element was, it's necessary to iterate through the entire list until the element to be removed is found within the next property of a given temp element. On large lists this can present a significant investment in CPU time. A simple and relatively inexpensive solution to this problem is the doubly linked list.

With the doubly linked list, every element gets an additional pointer value indicating the location of the previous element:

    typedef struct _namelist namelist;
    struct _namelist {
        namelist *next, *prev;
        char *name;
    };

When an element is added to a doubly linked list, both of these pointers are updated accordingly:

    void name_add(namelist *person)
    {
        person->next = NULL;
        if (!last_person) {
            /* No one in the list yet */
            people = last_person = person;
            person->prev = NULL;
            return;
        }
        /* Append new person to the end of the list */
        last_person->next = person;
        person->prev = last_person;
        /* Update the list tail */
        last_person = person;
    }

So far you haven't seen any advantage to this, but now imagine you have an arbitrary namelist record from somewhere in the middle of the people list and you want to remove it.
In the singly linked list you'd need to do something like the following: void name_remove(namelist *person) { namelist *p; if (person == people) { /* Happens to be the first person in the list */ people = person->next; if (last_person == person) { /* Also happens to be the last person */ last_person = NULL; } return; } /* Search for prior person */ p = people; while (p) { if (p->next == person) { /* unlink */ p->next = person->next; if (last_person == person) { /* This was the last element */ last_person = p; } return; } p = p->next; } /* Not found in list */ } Now compare that code with the simpler approach found in a doubly linked list: void name_remove(namelist *person) { if (people == person) { people = person->next; } if (last_person == person) { last_person = person->prev; } if (person->prev) { person->prev->next = person->next; } if (person->next) { person->next->prev = person->prev; } } Rather than a long, complicated loop, delinking this element from the list requires only a simple set of reassignments wrapped in conditionals. A reverse of this process also allows elements to be inserted at arbitrary points in the list with the same improved efficiency. HashTables The Best of Both Worlds Although you'll quite likely use vectors or linked lists in a few places in your application, there exists one more type of collection that you'll end up using even more: The HashTable. A HashTable is a specialized form of a doubly linked list that adds the speed and efficiency of vectors in the form of lookup indices. HashTables are used so heavily throughout the Zend Engine and the PHP Core that an entire subset of the Zend API is devoted to handling these structures. As you saw in Chapter 2, "Variables from the Inside Out," all userspace variables are stored in HashTables as zval* pointers. In later chapters you'll see how the Zend Engine uses HashTables to store userspace functions, classes, resources, autoglobal labels, and other structures as well. 
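To make the idea of "a doubly linked list plus lookup indices" concrete, here is a minimal sketch in plain C. It is not the Zend Engine's HashTable: every name in it (toy_table, toy_bucket, toy_add(), toy_find(), TOY_SLOTS) and the multiply-by-31 hash are assumptions chosen purely for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_SLOTS 8 /* hypothetical fixed size; a power of 2 so the hash can be masked */

typedef struct toy_bucket {
    struct toy_bucket *chain;       /* next bucket sharing the same slot */
    struct toy_bucket *prev, *next; /* insertion-order doubly linked list */
    const char *key;
    int value;
} toy_bucket;

typedef struct {
    toy_bucket *slots[TOY_SLOTS];   /* the vector-like lookup index */
    toy_bucket *head, *tail;        /* ends of the linked list */
} toy_table;

/* Arbitrary illustrative string hash (not the engine's) */
static unsigned long toy_hash(const char *key)
{
    unsigned long h = 0;
    while (*key) {
        h = h * 31 + (unsigned char)*key++;
    }
    return h;
}

/* Jump straight to the right slot, then walk its short chain */
toy_bucket *toy_find(toy_table *t, const char *key)
{
    toy_bucket *b = t->slots[toy_hash(key) & (TOY_SLOTS - 1)];
    while (b && strcmp(b->key, key) != 0) {
        b = b->chain;
    }
    return b;
}

/* Returns 0 on success, -1 if the key exists or memory runs out */
int toy_add(toy_table *t, const char *key, int value)
{
    unsigned long slot = toy_hash(key) & (TOY_SLOTS - 1);
    toy_bucket *b;

    if (toy_find(t, key)) {
        return -1;
    }
    b = malloc(sizeof(*b));
    if (!b) {
        return -1;
    }
    b->key = key;
    b->value = value;
    /* Link into the lookup index */
    b->chain = t->slots[slot];
    t->slots[slot] = b;
    /* Append to the doubly linked list, as name_add() did */
    b->next = NULL;
    b->prev = t->tail;
    if (t->tail) {
        t->tail->next = b;
    } else {
        t->head = b;
    }
    t->tail = b;
    return 0;
}
```

A table zeroed with memset() is ready for use. Lookups go through the slot index, while walking from head via next visits elements in the order they were added.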
To refresh from Chapter 2, a Zend Engine HashTable can literally store any piece of data of any size. Functions, for example, are stored as a complete structure. Autoglobals are smaller elements of just a few bytes, whereas other structures such as variables and PHP5 class definitions are simply stored as pointers to other structs located elsewhere in memory. Further into this chapter you'll look at the function calls that make up the Zend Hash API and how you can use these methods in your extensions.

Zend Hash API

The Zend Hash API is split into a few basic categories and, with a couple of exceptions, the functions in these categories will generally return either SUCCESS or FAILURE.

Creation

Every HashTable is initialized by a common constructor:

    int zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent)

ht is a pointer to a HashTable variable either declared as an immediate value, or dynamically allocated via emalloc(), pemalloc(), or more commonly ALLOC_HASHTABLE(ht). The ALLOC_HASHTABLE() macro uses pre-sized blocks of memory from a special pool to speed the allocation time required and is generally preferred over ht = emalloc(sizeof(HashTable));.

nSize should be set to the maximum number of elements that the HashTable is expected to hold. If more than this number of elements are added to the HashTable, it will be able to grow, but only at a noticeable cost in processing time as Zend reindexes the entire table for the newly widened structure. If nSize is not a power of 2, it will be automatically enlarged to the next higher power according to the formula

    nSize = pow(2, ceil(log(nSize, 2)));

pHashFunction is a holdover from an earlier version of the Zend Engine and is no longer used, so this value should always be set to NULL.
In earlier versions of the Zend Engine, this value could be pointed to an alternate hashing algorithm to be used in place of the standard DJBX33A method, a quick, moderately collision-resistant hashing algorithm for converting arbitrary string keys into reproducible integer values.

pDestructor is a pointer to a method to be called whenever an element is removed from a HashTable, such as when using zend_hash_del() or replacing an item with zend_hash_update(). The prototype for any destructor method must be

    void method_name(void *pElement);

where pElement is a pointer to the item being removed from the HashTable.

The final option, persistent, is a simple flag that the engine passes on to the pemalloc() function calls you were introduced to in Chapter 3, "Memory Management." Any HashTables that need to remain available between requests must have this flag set and must have been allocated using pemalloc().

This method can be seen in use at the start of every PHP request cycle as the EG(symbol_table) global is initialized:

    zend_hash_init(&EG(symbol_table), 50, NULL, ZVAL_PTR_DTOR, 0);

As you can see here, when an item is removed from the symbol table (possibly in response to an unset($foo); statement), a pointer to the zval* stored in the HashTable (effectively a zval**) is sent to zval_ptr_dtor(), which is what the ZVAL_PTR_DTOR macro expands out to. Because 50 is not an exact power of 2, the size of the initial global symbol table will actually be 64, the next higher power of 2.
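The size rounding and the DJBX33A ("times 33, add") hash mentioned above can both be sketched in a few lines of standalone C. This is an illustration rather than the engine's own source: the function names are hypothetical, and the floor of 8 slots and the starting value of 5381 are assumptions of this sketch.

```c
#include <stddef.h>

/* Round a requested size up to the next power of 2.
 * The minimum of 8 is an assumption of this sketch. */
unsigned int sketch_round_up_pow2(unsigned int nSize)
{
    unsigned int size = 8;

    /* The second condition stops doubling before the value can overflow */
    while (size < nSize && size < 0x80000000U) {
        size <<= 1;
    }
    return size;
}

/* DJBX33A: multiply the running hash by 33 and add each key byte.
 * The seed of 5381 is the value conventionally used with this hash. */
unsigned long sketch_djbx33a(const char *arKey, size_t nKeyLen)
{
    unsigned long hash = 5381;
    size_t i;

    for (i = 0; i < nKeyLen; i++) {
        hash = hash * 33 + (unsigned char)arKey[i];
    }
    return hash;
}
```

With these definitions, sketch_round_up_pow2(50) yields 64, matching the symbol table example, and hashing the same key twice always gives the same integer, which is what lets a string key be turned into a slot number.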
Population

There are four primary functions for inserting or updating data in HashTables:

    int zend_hash_add(HashTable *ht, char *arKey, uint nKeyLen, void *pData, uint nDataSize, void **pDest);
    int zend_hash_update(HashTable *ht, char *arKey, uint nKeyLen, void *pData, uint nDataSize, void **pDest);
    int zend_hash_index_update(HashTable *ht, ulong h, void *pData, uint nDataSize, void **pDest);
    int zend_hash_next_index_insert(HashTable *ht, void *pData, uint nDataSize, void **pDest);

The first two functions here are for adding associatively indexed data to a HashTable, such as with the statement $foo['bar'] = 'baz'; which in C would look something like:

    zend_hash_add(fooHashTbl, "bar", sizeof("bar"), &barZval, sizeof(zval*), NULL);

The only difference between zend_hash_add() and zend_hash_update() is that zend_hash_add() will fail if the key already exists.

The next two functions deal with numerically indexed HashTables in a similar manner. This time, the distinction between the two lies in whether a specific index is provided, or if the next available index is assigned automatically.

If it's necessary to store the index value of the element being inserted using zend_hash_next_index_insert(), then the zend_hash_next_free_element() function may be used:

    ulong nextid = zend_hash_next_free_element(ht);
    zend_hash_index_update(ht, nextid, &data, sizeof(data), NULL);

In the case of each of these insertion and update functions, if a value is passed for pDest, the void* data element that pDest points to will be populated by a pointer to the copied data value. This parameter has the same usage (and result) as the pData parameter passed to the zend_hash_find() function you're about to look at.
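The relationship between pData, nDataSize, and pDest is easier to see outside the engine. The following standalone sketch models a single storage slot that copies whatever it is given. The sketch_slot type and sketch_slot_update() function are hypothetical names invented for this illustration, not part of the Zend API.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    void *pData;      /* copy of the caller's data, owned by the slot */
    size_t nDataSize; /* number of bytes held in that copy */
} sketch_slot;

/* Copy nDataSize bytes from pData into storage owned by the slot.
 * If pDest is non-NULL, it receives the address of the stored copy.
 * Returns 0 on success, -1 if memory could not be allocated. */
int sketch_slot_update(sketch_slot *slot, const void *pData,
                       size_t nDataSize, void **pDest)
{
    void *copy = malloc(nDataSize);

    if (!copy) {
        return -1;
    }
    memcpy(copy, pData, nDataSize);
    free(slot->pData); /* discard whatever was stored before */
    slot->pData = copy;
    slot->nDataSize = nDataSize;
    if (pDest) {
        *pDest = copy;
    }
    return 0;
}
```

Because the slot holds its own copy, later changes to the caller's variable do not affect the stored value, and the pointer handed back through pDest differs from the pointer passed in as pData.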
Recall

Because there are two distinct organizations to HashTable indices, there must be two methods for extracting them:

    int zend_hash_find(HashTable *ht, char *arKey, uint nKeyLength, void **pData);
    int zend_hash_index_find(HashTable *ht, ulong h, void **pData);

As you can guess, the first is for associatively indexed arrays while the second is for numerically indexed ones. Recall from Chapter 2 that when data is added to a HashTable, a new memory block is allocated for it and the data passed in is copied; when the data is extracted back out, it is the pointer to that data which is returned. The following code fragment adds data1 to the HashTable, and then extracts it back out such that at the end of the routine, *data2 contains the same contents as *data1 even though the pointers refer to different memory addresses.

    void hash_sample(HashTable *ht, sample_data *data1)
    {
        sample_data *data2;
        ulong targetID = zend_hash_next_free_element(ht);

        if (zend_hash_index_update(ht, targetID, data1, sizeof(sample_data), NULL) == FAILURE) {
            /* Should never happen */
            return;
        }
        if (zend_hash_index_find(ht, targetID, (void **)&data2) == FAILURE) {
            /* Very unlikely since we just added this element */
            return;
        }
        /* data1 != data2, however *data1 == *data2 */
    }

Often, retrieving the stored data is not as important as knowing that it exists; for this purpose two more functions exist:

    int zend_hash_exists(HashTable *ht, char *arKey, uint nKeyLen);
    int zend_hash_index_exists(HashTable *ht, ulong h);

These two methods do not return SUCCESS/FAILURE; rather they return 1 to indicate that the requested key/index exists or 0 to indicate absence.
The following code fragment performs roughly the equivalent of isset($foo):

    if (zend_hash_exists(EG(active_symbol_table), "foo", sizeof("foo"))) {
        /* $foo is set */
    } else {
        /* $foo does not exist */
    }

Quick Population and Recall

    ulong zend_get_hash_value(char *arKey, uint nKeyLen);

When performing multiple operations with the same associative key, it can be useful to precompute the hash using zend_get_hash_value(). The result can then be passed to a collection of "quick" functions that behave exactly like their non-quick counterparts, but use the precomputed hash value rather than recalculating it each time.

    int zend_hash_quick_add(HashTable *ht, char *arKey, uint nKeyLen, ulong hashval, void *pData, uint nDataSize, void **pDest);
    int zend_hash_quick_update(HashTable *ht, char *arKey, uint nKeyLen, ulong hashval, void *pData, uint nDataSize, void **pDest);
    int zend_hash_quick_find(HashTable *ht, char *arKey, uint nKeyLen, ulong hashval, void **pData);
    int zend_hash_quick_exists(HashTable *ht, char *arKey, uint nKeyLen, ulong hashval);

Surprisingly, there is no zend_hash_quick_del(). The "quick" hash functions might be used in something like the following code fragment, which copies a specific element from hta to htb, which are zval* HashTables:

    void php_sample_hash_copy(HashTable *hta, HashTable *htb, char *arKey, uint nKeyLen TSRMLS_DC)
    {
        ulong hashval = zend_get_hash_value(arKey, nKeyLen);
        zval **copyval;

        if (zend_hash_quick_find(hta, arKey, nKeyLen, hashval, (void**)&copyval) == FAILURE) {
            /* arKey doesn't actually exist */
            return;
        }
        /* The zval* is about to be owned by another hash table */
        (*copyval)->refcount++;
        zend_hash_quick_update(htb, arKey, nKeyLen, hashval, copyval, sizeof(zval*), NULL);
    }

Copying and Merging

The previous task, duplicating an element from one HashTable to another, is extremely common and is often done en masse.
To avoid the headache and trouble of repeated recall and population cycles, there exist three helper methods:

    typedef void (*copy_ctor_func_t)(void *pElement);
    void zend_hash_copy(HashTable *target, HashTable *source, copy_ctor_func_t pCopyConstructor, void *tmp, uint size);

Every element in source will be copied to target and then processed through the pCopyConstructor function. For HashTables such as userspace variable arrays, this provides the opportunity to increment the reference count so that when the zval* is removed from one or the other HashTable, it's not prematurely destroyed. If the same element already exists in the target HashTable, it is overwritten by the new element. Other existing elements (those not being overwritten) are not implicitly removed.

tmp should be a pointer to an area of scratch memory to be used by the zend_hash_copy() function while it's executing. Ever since PHP 4.0.3, however, this temporary space is no longer used. If you know your extension will never be compiled against a version older than 4.0.3, just leave this NULL.

size is the number of bytes occupied by each member element. In the case of a userspace variable hash, this would be sizeof(zval*).

    void zend_hash_merge(HashTable *target, HashTable *source, copy_ctor_func_t pCopyConstructor, void *tmp, uint size, int overwrite);

zend_hash_merge() differs from zend_hash_copy() only in the addition of the overwrite parameter. When set to a non-zero value, zend_hash_merge() behaves exactly like zend_hash_copy(); when set to zero, it skips any already existing elements.

    typedef zend_bool (*merge_checker_func_t)(HashTable *target_ht, void *source_data, zend_hash_key *hash_key, void *pParam);
    void zend_hash_merge_ex(HashTable *target, HashTable *source, copy_ctor_func_t pCopyConstructor, uint size, merge_checker_func_t pMergeSource, void *pParam);

The final form of this group of functions allows for selective copying using a merge checker function.
The following example shows zend_hash_merge_ex() in use to copy only the associatively indexed members of the source HashTable (which happens to be a userspace variable array):

    zend_bool associative_only(HashTable *ht, void *pData, zend_hash_key *hash_key, void *pParam)
    {
        /* True if there's a key, false if there's not */
        return (hash_key->arKey && hash_key->nKeyLength);
    }

    void merge_associative(HashTable *target, HashTable *source)
    {
        zend_hash_merge_ex(target, source, zval_add_ref, sizeof(zval*), associative_only, NULL);
    }

Iteration by Hash Apply

Like in userspace, there's more than one way to iterate a cater...array. The first, and generally easiest, method is using a callback system similar in function to the foreach() construct in userspace. This two-part system involves a callback function you'll write (which acts like the code nested in a foreach loop) and a call to one of the three hash application API functions.

    typedef int (*apply_func_t)(void *pDest TSRMLS_DC);
    void zend_hash_apply(HashTable *ht, apply_func_t apply_func TSRMLS_DC);

This simplest form of the hash apply family iterates through ht, calling apply_func for each element with a pointer to the current element passed in pDest.

    typedef int (*apply_func_arg_t)(void *pDest, void *argument TSRMLS_DC);
    void zend_hash_apply_with_argument(HashTable *ht, apply_func_arg_t apply_func, void *data TSRMLS_DC);

In this next hash apply form, an arbitrary argument is passed along with the hash element. This is useful for multipurpose hash apply functions where behavior can be customized depending on an additional parameter.

Each callback function, no matter which iterator function it applies to, expects one of the three possible return values shown in Table 8.1.
A simple foreach() loop in userspace such as the following:

    <?php
        foreach($arr as $val) {
            echo "The value is: $val\n";
        }
    ?>

would translate into the following callback in C:

    int php_sample_print_zval(zval **val TSRMLS_DC)
    {
        /* Duplicate the zval so that
         * the original's contents are not destroyed */
        zval tmpcopy = **val;

        zval_copy_ctor(&tmpcopy);
        /* Reset refcount & Convert */
        INIT_PZVAL(&tmpcopy);
        convert_to_string(&tmpcopy);
        /* Output */
        php_printf("The value is: ");
        PHPWRITE(Z_STRVAL(tmpcopy), Z_STRLEN(tmpcopy));
        php_printf("\n");
        /* Toss out old copy */
        zval_dtor(&tmpcopy);
        /* continue; */
        return ZEND_HASH_APPLY_KEEP;
    }

which would be iterated using

    zend_hash_apply(arrht, php_sample_print_zval TSRMLS_CC);

Table 8.1. Hash Apply Callback Return Values

    Constant: ZEND_HASH_APPLY_KEEP
    Meaning:  Returning this value completes the current loop and continues with the next value in the subject hash table. This is equivalent to issuing continue; within a foreach() control block.

    Constant: ZEND_HASH_APPLY_STOP
    Meaning:  This return value halts iteration through the subject hash table and is the same as issuing break; within a foreach() loop.

    Constant: ZEND_HASH_APPLY_REMOVE
    Meaning:  Similar to ZEND_HASH_APPLY_KEEP, this return value will jump to the next iteration of the hash apply loop. However, this return value will also delete the current element from the subject hash.

Note

Recall that when variables are stored in a hash table, only a pointer to the zval is actually copied; the contents of the zval are never touched by the HashTable itself. Your iterator callback prepares for this by declaring itself to accept a zval** even though the function type only calls for a single level of indirection. Refer to Chapter 2 for more information on why this is done.

    typedef int (*apply_func_args_t)(void *pDest, int num_args, va_list args, zend_hash_key *hash_key);
    void zend_hash_apply_with_arguments(HashTable *ht, apply_func_args_t apply_func, int numargs, ...);

In order to receive the key during loops as well as the value, the third form of zend_hash_apply() must be used.
For example, if you extended this exercise to support outputting the key:

    <?php
        foreach($arr as $key => $val) {
            echo "The value of $key is: $val\n";
        }
    ?>

then your current iterator callback would have nowhere to get $key from. By switching to zend_hash_apply_with_arguments(), however, your callback prototype and implementation now become

    int php_sample_print_zval_and_key(zval **val, int num_args, va_list args, zend_hash_key *hash_key)
    {
        /* Duplicate the zval so that
         * the original's contents are not destroyed */
        zval tmpcopy = **val;
        /* tsrm_ls is needed by output functions */
        TSRMLS_FETCH();

        zval_copy_ctor(&tmpcopy);
        /* Reset refcount & Convert */
        INIT_PZVAL(&tmpcopy);
        convert_to_string(&tmpcopy);
        /* Output */
        php_printf("The value of ");
        if (hash_key->nKeyLength) {
            /* String Key / Associative */
            PHPWRITE(hash_key->arKey, hash_key->nKeyLength);
        } else {
            /* Numeric Key */
            php_printf("%ld", hash_key->h);
        }
        php_printf(" is: ");
        PHPWRITE(Z_STRVAL(tmpcopy), Z_STRLEN(tmpcopy));
        php_printf("\n");
        /* Toss out old copy */
        zval_dtor(&tmpcopy);
        /* continue; */
        return ZEND_HASH_APPLY_KEEP;
    }

which can then be called as:

    zend_hash_apply_with_arguments(arrht, php_sample_print_zval_and_key, 0);

Note

This particular example required no arguments to be passed; for information on extracting variable argument lists from va_list args, see the POSIX documentation pages for va_start(), va_arg(), and va_end().

Notice that nKeyLength, rather than arKey, was used to test for whether the key was associative or not. This is because implementation specifics in Zend HashTables can sometimes leave data in the arKey variable. nKeyLength, however, can be safely used even for empty keys (for example, $foo[''] = "Bar";) because the trailing NUL byte is included, giving the key a length of 1.

Iteration by Move Forward

It's also trivially possible to iterate through a HashTable without using a callback. For this, you'll need to be reminded of an often ignored concept in HashTables: the internal pointer.
In userspace, the functions reset(), key(), current(), next(), prev(), each(), and end() can be used to access elements within an array depending on where an invisible bookmark believes the "current" position to be:

    <?php
        $arr = array('a'=>1, 'b'=>2, 'c'=>3);
        reset($arr);
        while (list($key, $val) = each($arr)) {
            /* Do something with $key and $val */
        }
        reset($arr);
        $firstkey = key($arr);
        $firstval = current($arr);
        $bval = next($arr);
        $cval = next($arr);
    ?>

Each of these functions is duplicated by (or, more to the point, wrapped around) internal Zend Hash API functions with similar names:

    /* reset() */
    void zend_hash_internal_pointer_reset(HashTable *ht);
    /* key() */
    int zend_hash_get_current_key(HashTable *ht, char **strIdx, uint *strIdxLen, ulong *numIdx, zend_bool duplicate);
    /* current() */
    int zend_hash_get_current_data(HashTable *ht, void **pData);
    /* next()/each() */
    int zend_hash_move_forward(HashTable *ht);
    /* prev() */
    int zend_hash_move_backwards(HashTable *ht);
    /* end() */
    void zend_hash_internal_pointer_end(HashTable *ht);
    /* Other... */
    int zend_hash_get_current_key_type(HashTable *ht);
    int zend_hash_has_more_elements(HashTable *ht);

Note

The next(), prev(), and end() userspace statements actually map to their move forward/backward statements followed by a call to zend_hash_get_current_data(). each() performs the same steps as next(), but calls and returns zend_hash_get_current_key() as well.
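The "invisible bookmark" idea can be sketched without the engine at all. The following standalone C fragment models a list that carries its own cursor and shows next() as "move forward, then read the current element". All names here (sketch_list, sketch_reset(), sketch_current(), sketch_next()) are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

typedef struct {
    const int *items;
    size_t count;
    size_t pos; /* the internal pointer: index of the "current" element */
} sketch_list;

/* Like reset(): put the bookmark back on the first element */
void sketch_reset(sketch_list *l)
{
    l->pos = 0;
}

/* Like current(): read the element under the bookmark.
 * Returns 0 and fills *out, or -1 when the bookmark is past the end. */
int sketch_current(const sketch_list *l, int *out)
{
    if (l->pos >= l->count) {
        return -1;
    }
    *out = l->items[l->pos];
    return 0;
}

/* Like next(): move the bookmark forward, then read what is there */
int sketch_next(sketch_list *l, int *out)
{
    if (l->pos < l->count) {
        l->pos++;
    }
    return sketch_current(l, out);
}
```

Because the cursor lives inside the list itself, any code that calls sketch_reset() or sketch_next() changes the position seen by every other user of the same list.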
Emulating a foreach() loop using iteration by moving forward actually starts to look more familiar, repeating the print_zval_and_key example from earlier:

    void php_sample_print_var_hash(HashTable *arrht)
    {
        for(zend_hash_internal_pointer_reset(arrht);
            zend_hash_has_more_elements(arrht) == SUCCESS;
            zend_hash_move_forward(arrht)) {
            char *key;
            uint keylen;
            ulong idx;
            int type;
            zval **ppzval, tmpcopy;

            type = zend_hash_get_current_key_ex(arrht, &key, &keylen, &idx, 0, NULL);
            if (zend_hash_get_current_data(arrht, (void**)&ppzval) == FAILURE) {
                /* Should never actually fail
                 * since the key is known to exist. */
                continue;
            }
            /* Duplicate the zval so that
             * the original's contents are not destroyed */
            tmpcopy = **ppzval;
            zval_copy_ctor(&tmpcopy);
            /* Reset refcount & Convert */
            INIT_PZVAL(&tmpcopy);
            convert_to_string(&tmpcopy);
            /* Output */
            php_printf("The value of ");
            if (type == HASH_KEY_IS_STRING) {
                /* String Key / Associative */
                PHPWRITE(key, keylen);
            } else {
                /* Numeric Key */
                php_printf("%ld", idx);
            }
            php_printf(" is: ");
            PHPWRITE(Z_STRVAL(tmpcopy), Z_STRLEN(tmpcopy));
            php_printf("\n");
            /* Toss out old copy */
            zval_dtor(&tmpcopy);
        }
    }

Most of this code fragment should be immediately familiar to you. The one item that hasn't yet been touched on is zend_hash_get_current_key()'s return value. When called, this function will return one of three constants as listed in Table 8.2.

Table 8.2. Zend Hash Key Types

    Constant: HASH_KEY_IS_STRING
    Meaning:  The current element is associatively indexed; therefore, a pointer to the element's key name will be populated into strIdx, and its length will be populated into strIdxLen. If the duplicate flag is set to a nonzero value, the key will be estrndup()'d before being populated into strIdx. The calling application is expected to free this duplicated string.

    Constant: HASH_KEY_IS_LONG
    Meaning:  The current element is numerically indexed and numIdx will be supplied with the index number.

    Constant: HASH_KEY_NON_EXISTANT
    Meaning:  The internal pointer is past the end of the HashTable's contents. Neither a key nor a data value is available at this position because no more exist.

Preserving the Internal Pointer

When iterating through a HashTable, particularly one containing userspace variables, it's not uncommon to encounter circular references, or at least self-overlapping loops. Suppose one iteration context starts looping through a HashTable and the internal pointer reaches, for example, the halfway mark. If a subordinate iterator then starts looping through the same HashTable, it obliterates the current internal pointer position, leaving the pointer at the end of the HashTable when control arrives back at the first loop.

The way this is resolved, both within the zend_hash_apply implementation and within custom move forward uses, is to supply an external pointer in the form of a HashPosition variable.

Each of the zend_hash_*() functions listed previously has a zend_hash_*_ex() counterpart that accepts one additional parameter in the form of a pointer to a HashPosition data type. Because the HashPosition variable is seldom used outside of a short-lived iteration loop, it's sufficient to declare it as an immediate variable. You can then pass its address on usage, such as in the following variation on the php_sample_print_var_hash() function you saw earlier:

    void php_sample_print_var_hash(HashTable *arrht)
    {
        HashPosition pos;

        for(zend_hash_internal_pointer_reset_ex(arrht, &pos);
            zend_hash_has_more_elements_ex(arrht, &pos) == SUCCESS;
            zend_hash_move_forward_ex(arrht, &pos)) {
            char *key;
            uint keylen;
            ulong idx;
            int type;
            zval **ppzval, tmpcopy;

            type = zend_hash_get_current_key_ex(arrht, &key, &keylen, &idx, 0, &pos);
            if (zend_hash_get_current_data_ex(arrht, (void**)&ppzval, &pos) == FAILURE) {
                /* Should never actually fail
                 * since the key is known to exist. */
                continue;
            }
            /* Duplicate the zval so that
             * the original's contents are not destroyed */
            tmpcopy = **ppzval;
            zval_copy_ctor(&tmpcopy);
            /* Reset refcount & Convert */
            INIT_PZVAL(&tmpcopy);
            convert_to_string(&tmpcopy);
            /* Output */
            php_printf("The value of ");
            if (type == HASH_KEY_IS_STRING) {
                /* String Key / Associative */
                PHPWRITE(key, keylen);
            } else {
                /* Numeric Key */
                php_printf("%ld", idx);
            }
            php_printf(" is: ");
            PHPWRITE(Z_STRVAL(tmpcopy), Z_STRLEN(tmpcopy));
            php_printf("\n");
            /* Toss out old copy */
            zval_dtor(&tmpcopy);
        }
    }

With these very slight additions, the HashTable's true internal pointer is preserved in whatever state it was initially in on entering the function. When it comes to working with internal pointers of userspace variable HashTables (that is, arrays), this extra step will very likely decide whether the scripter's code works as expected.

Destruction

There are only four destruction functions you need to worry about. The first two are used for removing individual elements from a HashTable:

    int zend_hash_del(HashTable *ht, char *arKey, uint nKeyLen);
    int zend_hash_index_del(HashTable *ht, ulong h);

As you can guess, these cover a HashTable's split-personality index design by providing deletion functions for both associative and numerically indexed hash elements. Each version returns either SUCCESS or FAILURE.

Recall that when an item is removed from a HashTable, the HashTable's destructor function is called with a pointer to the item to be removed passed as the only parameter.

    void zend_hash_clean(HashTable *ht);

When completely emptying out a HashTable, the quickest method is to call zend_hash_clean(), which will iterate through every element calling zend_hash_del() on them one at a time.

    void zend_hash_destroy(HashTable *ht);

Usually, when cleaning out a HashTable, you'll want to discard it entirely.
Calling zend_hash_destroy() will perform all the actions of a zend_hash_clean(), as well as free additional structures allocated during zend_hash_init().

A full HashTable life cycle might look like the following:

    int sample_strvec_handler(int argc, char **argv TSRMLS_DC)
    {
        HashTable *ht;

        /* Allocate a block of memory
         * for the HashTable structure */
        ALLOC_HASHTABLE(ht);
        /* Initialize its internal state */
        if (zend_hash_init(ht, argc, NULL, ZVAL_PTR_DTOR, 0) == FAILURE) {
            FREE_HASHTABLE(ht);
            return FAILURE;
        }
        /* Populate each string into a zval* */
        while (argc--) {
            zval *value;

            MAKE_STD_ZVAL(value);
            ZVAL_STRING(value, *argv, 1);
            argv++;
            if (zend_hash_next_index_insert(ht, (void**)&value, sizeof(zval*), NULL) == FAILURE) {
                /* Silently skip failed additions */
                zval_ptr_dtor(&value);
            }
        }
        /* Do some work */
        process_hashtable(ht);
        /* Destroy the hashtable
         * freeing all zval allocations as necessary */
        zend_hash_destroy(ht);
        /* Free the HashTable itself */
        FREE_HASHTABLE(ht);
        return SUCCESS;
    }

Sorting, Comparing, and Going to the Extreme(s)

A couple more callbacks exist in the Zend Hash API. The first handles comparing two elements either from the same HashTable, or from similar positions in different HashTables:

    typedef int (*compare_func_t)(void *a, void *b TSRMLS_DC);

Like the usort() callback in userspace PHP, this function expects you to compare the values of a and b. Using your own criteria for comparison, return either -1 if a is less than b, 1 if b is less than a, or 0 if they are equal.

    int zend_hash_minmax(HashTable *ht, compare_func_t compar, int flag, void **pData TSRMLS_DC);

The simplest API function to use this callback is zend_hash_minmax(), which, as the name implies, will return the highest or lowest valued element from a HashTable based on the ultimate result of multiple calls to the comparison callback. Passing zero for flag will return the minimum value; passing non-zero will return the maximum.
The following example compares the registered userspace functions by name and prints the lowest and highest named function (not case-sensitive):

    int fname_compare(zend_function *a, zend_function *b TSRMLS_DC)
    {
        return strcasecmp(a->common.function_name, b->common.function_name);
    }

    void php_sample_funcname_sort(TSRMLS_D)
    {
        zend_function *fe;

        if (zend_hash_minmax(EG(function_table), (compare_func_t)fname_compare, 0, (void **)&fe) == SUCCESS) {
            php_printf("Min function: %s\n", fe->common.function_name);
        }
        if (zend_hash_minmax(EG(function_table), (compare_func_t)fname_compare, 1, (void **)&fe) == SUCCESS) {
            php_printf("Max function: %s\n", fe->common.function_name);
        }
    }

The hash comparison function is also used in zend_hash_compare(), which evaluates two hashes against each other as a whole. If hta is found to be "greater" than htb, 1 will be returned. -1 is returned if htb is "greater" than hta, and 0 if they are deemed equal.

    int zend_hash_compare(HashTable *hta, HashTable *htb, compare_func_t compar, zend_bool ordered TSRMLS_DC);

This method begins by comparing the number of elements in each HashTable. If one HashTable contains more elements than the other, it is immediately deemed greater and the function returns quickly.

Next it starts a loop with the first element of hta. If the ordered flag is set, it compares keys/indices with the first element of htb; string keys are compared first on length, and then on binary sequence using memcmp(). If the keys are equal, the value of the element is compared with the first element of htb using the comparison callback function.

If the ordered flag is not set, the data portion of the first element of hta is compared against the element with a matching key/index in htb using the comparison callback function. If no matching element can be found in htb, then hta is considered greater than htb and 1 is returned.
If at the end of a given loop, hta and htb are still considered equal, comparison continues with the next element of hta until a difference is found or all elements have been exhausted, in which case 0 is returned.

The second callback function in this family is the sort function:

    typedef void (*sort_func_t)(void **Buckets, size_t numBuckets, size_t sizBucket, compare_func_t comp TSRMLS_DC);

This callback will be triggered once, and receive a vector of all the Buckets (elements) in the HashTable as a series of pointers. These Buckets may be swapped around within the vector according to the sort function's own logic with or without the use of the comparison callback. In practice, sizBucket will always be sizeof(Bucket*).

Unless you plan on implementing your own alternative bubblesort method, you won't need to implement a sort function yourself. A predefined sort method, zend_qsort, already exists for use as a callback to zend_hash_sort(), leaving you to implement the comparison function only.

    int zend_hash_sort(HashTable *ht, sort_func_t sort_func, compare_func_t compare_func, int renumber TSRMLS_DC);

The final parameter to zend_hash_sort(), when set, will toss out any existing associative keys or index numbers and reindex the array based on the result of the sorting operation. The userspace sort() implementation uses zend_hash_sort() in the following manner:

    zend_hash_sort(target_hash, zend_qsort, array_data_compare, 1 TSRMLS_CC);

where array_data_compare is a simple compare_func_t implementation that sorts according to the value of the zval*s in the HashTable.

zval* Array API

Ninety-five percent of the HashTables you'll work with in a PHP extension are going to be for the purpose of storing and retrieving userspace variables. In turn, most of your HashTables will themselves be wrapped in zval containers.
Easy Array Creation

To aid the creation and manipulation of these common HashTables, the PHP API exposes a simple set of macros and helper functions starting with array_init(zval *arrval). This function allocates a HashTable, calls zend_hash_init() with the appropriate parameters for a userspace variable hash, and populates the zval* with the newly created structure.

No special destruction function is needed because after the zval loses its last refcount through calls to zval_dtor()/zval_ptr_dtor(), the engine automatically invokes zend_hash_destroy() and FREE_HASHTABLE().

Combine the array_init() method you just learned about with the techniques for returning values from functions you saw in Chapter 6, "Returning Values":

    PHP_FUNCTION(sample_array)
    {
        array_init(return_value);
    }

Because return_value is a preallocated zval*, you don't have to do anything more to set it up. And because its only reference is the one you sent it out of the function with, you don't have to worry about cleaning it up either.

Easy Array Population

Just like with any HashTable, you'll populate an array by iteratively adding elements to it. With userspace variables specifically, you get to fall back on the primitive data types you know from C. A triumvirate of functions in the form add_assoc_*(), add_index_*(), and add_next_index_*() exists for each of the data types you already have ZVAL_*(), RETVAL_*(), and RETURN_*() macros for. For example:

    add_assoc_long(zval *arrval, char *key, long lval);
    add_index_long(zval *arrval, ulong idx, long lval);
    add_next_index_long(zval *arrval, long lval);

In each case, the array zval* comes first, followed by an associative keyname, a numeric index, or (for the next_index variety) nothing at all. Lastly comes the data element itself, which will ultimately be wrapped in a newly allocated zval* and added to the array with zend_hash_update(), zend_hash_index_update(), or zend_hash_next_index_insert().
The add_assoc_*() function variants with their prototypes are as follows. In each case assoc may be replaced with index or next_index and the key/index parameter adjusted or removed as appropriate.

    add_assoc_null(zval *aval, char *key);
    add_assoc_bool(zval *aval, char *key, zend_bool bval);
    add_assoc_long(zval *aval, char *key, long lval);
    add_assoc_double(zval *aval, char *key, double dval);
    add_assoc_string(zval *aval, char *key, char *strval, int dup);
    add_assoc_stringl(zval *aval, char *key,
                        char *strval, uint strlen, int dup);
    add_assoc_zval(zval *aval, char *key, zval *value);

The last version of these functions allows you to prepare zvals of any arbitrary type (including resource, object, or array) and add them to your growing array with the same simple ease. Try out a few additions to your sample_array() function:

    PHP_FUNCTION(sample_array)
    {
        zval *subarray;

        array_init(return_value);
        /* Add some scalars */
        add_assoc_long(return_value, "life", 42);
        add_index_bool(return_value, 123, 1);
        add_next_index_double(return_value, 3.1415926535);
        /* Toss in a static string, dup'd by PHP */
        add_next_index_string(return_value, "Foo", 1);
        /* Now a manually dup'd string */
        add_next_index_string(return_value, estrdup("Bar"), 0);

        /* Create a subarray */
        MAKE_STD_ZVAL(subarray);
        array_init(subarray);
        /* Populate it with some numbers */
        add_next_index_long(subarray, 1);
        add_next_index_long(subarray, 20);
        add_next_index_long(subarray, 300);
        /* Place the subarray in the parent */
        add_index_zval(return_value, 444, subarray);
    }

If you were to var_dump() the array returned by this function you'd get output something like the following:

    array(6) {
      ["life"]=> int(42)
      [123]=> bool(true)
      [124]=> float(3.1415926535)
      [125]=> string(3) "Foo"
      [126]=> string(3) "Bar"
      [444]=> array(3) {
        [0]=> int(1)
        [1]=> int(20)
        [2]=> int(300)
      }
    }

These add_*() functions may also be used for internal public properties by simple objects. Watch for them in Chapter 10, "PHP4 Objects."
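The numeric keys in that var_dump() output (124, 125, and 126 following the explicit 123) come from the way an array tracks its next free index. The following is a minimal standalone sketch of that rule in plain C, not the Zend API: toy_array, toy_init(), toy_add_index(), and toy_add_next_index() are hypothetical names used only for illustration, values are omitted, and duplicate keys are not handled.

```c
#include <stddef.h>

#define TOY_CAPACITY 16

/* Toy model of an array's numeric key bookkeeping (keys only). */
typedef struct {
    unsigned long keys[TOY_CAPACITY]; /* numeric keys in insertion order */
    size_t        count;              /* number of keys stored so far */
    unsigned long next_free;          /* index the next append will use */
} toy_array;

static void toy_init(toy_array *a)
{
    a->count = 0;
    a->next_free = 0;
}

/* Models add_index_*(): store at an explicit index and, when that index
 * is at or beyond the next free slot, move the next free slot past it.
 * Returns 0 on success or -1 when the toy array is full. */
static int toy_add_index(toy_array *a, unsigned long idx)
{
    if (a->count >= TOY_CAPACITY) {
        return -1;
    }
    a->keys[a->count++] = idx;
    if (idx >= a->next_free) {
        a->next_free = idx + 1;
    }
    return 0;
}

/* Models add_next_index_*(): append at the next free index.
 * Returns the index that was used, or -1 when the toy array is full. */
static long toy_add_next_index(toy_array *a)
{
    unsigned long idx = a->next_free;

    if (toy_add_index(a, idx) != 0) {
        return -1;
    }
    return (long)idx;
}
```

Associative keys such as "life" do not take part in this bookkeeping, which is why the first appended element in sample_array() lands at 124 rather than 1.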
Summary

You've just spent a long chapter learning about one of the most prevalent structures in the Zend Engine and PHP Core (second only to the zval*, of course). You compared different data storage mechanisms and were introduced to a large swath of the API that you'll use repeatedly.

By now you should have enough tools amassed to implement a fair portion of the standard extension. In the next few chapters you'll round off the remaining zval data types by exploring resources and objects.

Chapter 9. The Resource Data Type

So far, you've worked with fairly primitive userspace data types: strings, numbers, and true/false values. Even the arrays you started working with last chapter were just collections of primitive data types.

Complex Structures

Out in the real world, you'll usually have to work with more complex collections of data, often involving pointers to opaque structures. One common example of an opaque structure is the stdio file descriptor that appears even to C code as nothing more than a pointer.

    #include <stdio.h>

    int main(void)
    {
        FILE *fd;
        fd = fopen("/home/jdoe/.plan", "r");
        fclose(fd);
        return 0;
    }

The stdio file descriptor, like most file descriptors, is then used like a bookmark. The calling application (your extension) need only pass this value into the implementation functions such as feof(), fread(), fwrite(), fclose(), and so on. At some point, however, this bookmark must be accessible to userspace code; therefore, it's necessary to be able to represent it within the standard PHP variable, or zval*.

This is where a new data type comes into play. The RESOURCE data type stores a simple integer value within the zval* itself, which is then used as a lookup into an index of registered resources. The resource entry contains information about what internal data type the resource index represents as well as a pointer to the stored resource data.
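To make that lookup concrete, here is a standalone model in plain C of an index of registered resources. It is only a sketch of the idea and not the engine's implementation: toy_entry, toy_register(), and toy_lookup() are hypothetical names, and a small fixed-size array stands in for the engine's HashTable.

```c
#include <stddef.h>

#define TOY_MAX 8

/* One registered resource: the data pointer plus a type tag. */
typedef struct {
    void *ptr;   /* the stored resource data */
    int   type;  /* which internal data type ptr represents */
    int   used;  /* nonzero while this slot holds an entry */
} toy_entry;

static toy_entry toy_table[TOY_MAX];

/* Store ptr with its type tag. Returns the integer ID that a
 * userspace variable would carry, or -1 when the table is full. */
static int toy_register(void *ptr, int type)
{
    int id;

    for (id = 0; id < TOY_MAX; id++) {
        if (!toy_table[id].used) {
            toy_table[id].ptr = ptr;
            toy_table[id].type = type;
            toy_table[id].used = 1;
            return id;
        }
    }
    return -1;
}

/* Resolve an ID back into its pointer. When type is not NULL the
 * entry's type tag is written through it. Returns NULL for an
 * unknown ID. */
static void *toy_lookup(int id, int *type)
{
    if (id < 0 || id >= TOY_MAX || !toy_table[id].used) {
        return NULL;
    }
    if (type) {
        *type = toy_table[id].type;
    }
    return toy_table[id].ptr;
}
```

The variable itself never holds the pointer; it holds only the small integer returned by the registration step, which is what makes the value safe to hand to userspace.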
Defining Resource Types

In order for registered resource entries to understand anything about the resource they contain, it's necessary for that resource type to be declared. Start by adding the following piece of code to sample.c right after your existing function implementations:

    static int le_sample_descriptor;

    PHP_MINIT_FUNCTION(sample)
    {
        le_sample_descriptor = zend_register_list_destructors_ex(
                NULL, NULL, PHP_SAMPLE_DESCRIPTOR_RES_NAME,
                module_number);
        return SUCCESS;
    }

Next, scroll down to the bottom of your file and modify the sample_module_entry structure, replacing the NULL, /* MINIT */ line. Just as when you added your function list to this structure, you will want to make sure to keep a comma at the end of this line.

    PHP_MINIT(sample), /* MINIT */

Finally, you'll need to define PHP_SAMPLE_DESCRIPTOR_RES_NAME within php_sample.h by placing the following line next to your other constant definitions:

    #define PHP_SAMPLE_DESCRIPTOR_RES_NAME "File Descriptor"

PHP_MINIT_FUNCTION() represents the first of four special startup and shutdown operations that you were introduced to conceptually in Chapter 1, "The PHP Life Cycle," and which you'll explore in greater depth in Chapter 12, "Startup, Shutdown, and a Few Points in Between," and Chapter 13, "INI Settings."

What's important to know at this juncture is that the MINIT method is executed once when your extension is first loaded and before any requests have been received. Here you've used that opportunity to register destructor functions (the NULL values, which you'll change soon enough) for a resource type that will be thereafter known by a unique integer ID.

Registering Resources

Now that the engine is aware that you'll be storing some resource data, it's time to give userspace code a way to generate the actual resources.
To do that, implement the following re-creation of the fopen() command:

    PHP_FUNCTION(sample_fopen)
    {
        FILE *fp;
        char *filename, *mode;
        int filename_len, mode_len;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss",
                            &filename, &filename_len,
                            &mode, &mode_len) == FAILURE) {
            RETURN_NULL();
        }
        if (!filename_len || !mode_len) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                    "Invalid filename or mode length");
            RETURN_FALSE;
        }
        fp = fopen(filename, mode);
        if (!fp) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                    "Unable to open %s using mode %s",
                    filename, mode);
            RETURN_FALSE;
        }
        ZEND_REGISTER_RESOURCE(return_value, fp,
                                    le_sample_descriptor);
    }

Note: In order for the compiler to know what FILE* is, you'll need to include stdio.h. This could be placed in sample.c, but in preparation for a later part of this chapter, I'll ask you to place it in php_sample.h instead.

If you've been paying attention to the previous chapters, you'll recognize everything up to the final line. This one command does the job of storing the fp pointer into that index of resources, associating it with the type declared during MINIT, and storing a lookup key into return_value.

Note: If it's necessary to store more than one pointer value, or store an immediate value, a new memory segment must be allocated to store the data, and then a pointer to that memory segment can be registered as a resource.

Destroying Resources

At this point you have a method for attaching internal chunks of data to userspace variables. Because most of the data you're likely to attach to a userspace resource variable will need to be cleaned up at some point (by calling fclose() in this case), you'll probably assume you need a matching sample_fclose() function to receive the resource variable and handle destroying and unregistering it.

What would happen if the variable were simply unset() though? Without a reference to the original FILE* pointer, there'd be no way to fclose() it, and it would remain open until the PHP process died.
Because a single process serves many requests, this could take a very long time.

The answer comes from those NULL pointers you passed to zend_register_list_destructors_ex(). As the name implies, you're registering destruction methods. The first pointer refers to a method to be called when the last reference to a registered resource falls out of scope within a request. In practice, this typically means when unset() is called on the variable in which the resource was stored.

The second pointer passed into zend_register_list_destructors_ex() refers to another callback method that is executed for persistent resources when a process or thread shuts down. You'll take a look at persistent resources later in this chapter.

Let's define the first of these destruction methods now. Place the following bit of code above your PHP_MINIT_FUNCTION block:

    static void php_sample_descriptor_dtor(
                        zend_rsrc_list_entry *rsrc TSRMLS_DC)
    {
        FILE *fp = (FILE*)rsrc->ptr;
        fclose(fp);
    }

Next replace the first NULL in zend_register_list_destructors_ex() with a reference back to php_sample_descriptor_dtor:

    le_sample_descriptor = zend_register_list_destructors_ex(
            php_sample_descriptor_dtor, NULL,
            PHP_SAMPLE_DESCRIPTOR_RES_NAME, module_number);

Now, when a variable is assigned with a registered resource value from sample_fopen(), it knows to automatically fclose() the FILE* pointer when the variable falls out of scope either explicitly through unset(), or implicitly at the end of a function. No sample_fclose() implementation is even needed!

When unset($fp); is called here, php_sample_descriptor_dtor is automatically called by the engine to handle cleanup of the resource.

Decoding Resources

Creating a resource is only the first step because a bookmark is only as useful as its ability to return you to the original page.
Here's another new function:

    PHP_FUNCTION(sample_fwrite)
    {
        FILE *fp;
        zval *file_resource;
        char *data;
        int data_len;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "rs",
                &file_resource, &data, &data_len) == FAILURE ) {
            RETURN_NULL();
        }
        /* Use the zval* to verify the resource type and
         * retrieve its pointer from the lookup table */
        ZEND_FETCH_RESOURCE(fp, FILE*, &file_resource, -1,
            PHP_SAMPLE_DESCRIPTOR_RES_NAME, le_sample_descriptor);
        /* Write the data, and
         * return the number of bytes which were
         * successfully written to the file */
        RETURN_LONG(fwrite(data, 1, data_len, fp));
    }

Using the "r" format specifier to zend_parse_parameters() is a relatively new trick, but one that should be understandable from what you read in Chapter 7, "Accepting Parameters." What's truly fresh here is the use of ZEND_FETCH_RESOURCE().

Unfolding the ZEND_FETCH_RESOURCE() macro, one finds the following:

    #define ZEND_FETCH_RESOURCE(rsrc, rsrc_type, passed_id, \
            default_id, resource_type_name, resource_type) \
        rsrc = (rsrc_type) zend_fetch_resource(passed_id TSRMLS_CC, \
                        default_id, resource_type_name, NULL, \
                        1, resource_type); \
        ZEND_VERIFY_RESOURCE(rsrc);

Or in this case:

    fp = (FILE*) zend_fetch_resource(&file_resource TSRMLS_CC, -1,
                        PHP_SAMPLE_DESCRIPTOR_RES_NAME, NULL,
                        1, le_sample_descriptor);
    if (!fp) {
        RETURN_FALSE;
    }

Like the zend_hash_find() method you explored in the last chapter, zend_fetch_resource() uses an index into a collection (a HashTable, in fact) to pull out previously stored data. Unlike zend_hash_find(), this method performs additional data integrity checking such as ensuring that the entry in the resource table matches the correct resource type.

In this case, you've asked zend_fetch_resource() to match the resource type stored in le_sample_descriptor. If the supplied resource ID does not exist, or is of the incorrect type, then zend_fetch_resource() will return NULL and automatically generate an error.
By including the ZEND_VERIFY_RESOURCE() macro within the ZEND_FETCH_RESOURCE() macro, function implementations can automatically return, leaving the extension-specific code to focus on handling the generated resource value when conditions are correct. Now that your function has the original FILE* pointer back, it simply calls the internal fwrite() method as any normal program would.

Tip: To avoid having zend_fetch_resource() generate an error on failure, simply pass NULL for the resource_type_name parameter. Without a meaningful error message to display, zend_fetch_resource() will fail silently instead.

Another approach to translating a resource variable ID into a pointer is to use the zend_list_find() function:

    PHP_FUNCTION(sample_fwrite)
    {
        FILE *fp;
        zval *file_resource;
        char *data;
        int data_len, rsrc_type;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "rs",
                &file_resource, &data, &data_len) == FAILURE ) {
            RETURN_NULL();
        }
        fp = (FILE*)zend_list_find(Z_RESVAL_P(file_resource),
                                            &rsrc_type);
        if (!fp || rsrc_type != le_sample_descriptor) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                                "Invalid resource provided");
            RETURN_FALSE;
        }
        RETURN_LONG(fwrite(data, 1, data_len, fp));
    }

Although this method is probably more recognizable to someone with a generic background in C programming, it is also much more verbose than using ZEND_FETCH_RESOURCE(). Pick the method that best suits your programming style, but expect to see the ZEND_FETCH_RESOURCE() macro used predominantly in other extension code such as that found in the PHP core.

Forcing Destruction

Earlier you saw how using unset() to take a variable out of scope can trigger the destruction of a resource and cause its underlying resources to be cleaned up by your registered destruction method. Imagine now that a resource variable were copied into other variables:

This time, $fp wasn't the only reference to the registered resource so it hasn't actually gone out of scope yet and won't be destroyed.
This means that $evil_log can still be written to. In order to avoid having to search around for lost, stray references to a resource when you really, truly want it gone, it becomes necessary to have a sample_fclose() implementation after all:

    PHP_FUNCTION(sample_fclose)
    {
        FILE *fp;
        zval *file_resource;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "r",
                            &file_resource) == FAILURE ) {
            RETURN_NULL();
        }
        /* While it's not necessary to actually fetch the
         * FILE* resource, performing the fetch provides
         * an opportunity to verify that we are closing
         * the correct resource type. */
        ZEND_FETCH_RESOURCE(fp, FILE*, &file_resource, -1,
            PHP_SAMPLE_DESCRIPTOR_RES_NAME, le_sample_descriptor);
        /* Force the resource into self-destruct mode */
        zend_hash_index_del(&EG(regular_list),
                        Z_RESVAL_P(file_resource));
        RETURN_TRUE;
    }

This deletion method reinforces the fact that resource variables are registered within a global HashTable. Removing resource entries from this HashTable is a simple matter of using the resource ID as an index lookup into the regular list. Although other direct HashTable manipulation methods (such as zend_hash_index_find() and zend_hash_next_index_insert()) will work in place of the FETCH and REGISTER macros, such practice is discouraged where possible so that changes in the Zend API don't break existing extensions.

Like userspace variable HashTables (arrays), the EG(regular_list) HashTable has an automatic dtor method that is called whenever an entry is removed or overwritten. This method checks your resource's type, and calls the registered destruction method you provided during your MINIT call to zend_register_list_destructors_ex().

Note: In many places in the PHP Core and the Zend Engine you'll see zend_list_delete() used in this context rather than zend_hash_index_del(). The zend_list_delete() form takes into account reference counting, which you'll see later in this chapter.
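The interplay between type registration, type-checked fetching, and destruction on removal can be modeled outside of PHP. The following standalone C sketch is an illustration under stated assumptions, not the Zend implementation: every toy_* name is hypothetical, fixed-size arrays replace the engine's HashTables, and error reporting is reduced to return values.

```c
#include <stddef.h>

#define TOY_MAX_TYPES   4
#define TOY_MAX_ENTRIES 8

typedef void (*toy_dtor_t)(void *ptr);

typedef struct {
    void *ptr;   /* stored resource data */
    int   type;  /* type ID handed out by toy_register_type() */
    int   used;  /* nonzero while the slot holds a live entry */
} toy_entry;

static toy_dtor_t toy_dtors[TOY_MAX_TYPES];
static int        toy_num_types;
static toy_entry  toy_list[TOY_MAX_ENTRIES];

/* Plays the role of registering list destructors at MINIT:
 * returns a new type ID, or -1 when no type slots remain. */
static int toy_register_type(toy_dtor_t dtor)
{
    if (toy_num_types >= TOY_MAX_TYPES) {
        return -1;
    }
    toy_dtors[toy_num_types] = dtor;
    return toy_num_types++;
}

/* Plays the role of registering a resource: returns its ID or -1. */
static int toy_register(void *ptr, int type)
{
    int id;

    if (type < 0 || type >= toy_num_types) {
        return -1;
    }
    for (id = 0; id < TOY_MAX_ENTRIES; id++) {
        if (!toy_list[id].used) {
            toy_list[id].ptr = ptr;
            toy_list[id].type = type;
            toy_list[id].used = 1;
            return id;
        }
    }
    return -1;
}

/* Plays the role of a type-checked fetch: NULL unless the ID exists
 * and its entry carries the expected type. */
static void *toy_fetch(int id, int expected_type)
{
    if (id < 0 || id >= TOY_MAX_ENTRIES || !toy_list[id].used) {
        return NULL;
    }
    return toy_list[id].type == expected_type ? toy_list[id].ptr : NULL;
}

/* Plays the role of deleting the entry from the list: the destructor
 * registered for the entry's type runs, if there is one.
 * Returns 0 on success or -1 for an unknown ID. */
static int toy_delete(int id)
{
    if (id < 0 || id >= TOY_MAX_ENTRIES || !toy_list[id].used) {
        return -1;
    }
    if (toy_dtors[toy_list[id].type]) {
        toy_dtors[toy_list[id].type](toy_list[id].ptr);
    }
    toy_list[id].used = 0;
    toy_list[id].ptr = NULL;
    return 0;
}

/* Demo destructor: the "resource" is an int counting how many times
 * it has been closed. */
static void toy_count_dtor(void *ptr)
{
    ++*(int *)ptr;
}
```

Once toy_delete() has run, any copy of the ID that is still floating around fails the fetch, which mirrors why a forced close leaves stray variables unable to write to the file.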
Persistent Resources

The type of complex data structures that are usually stored in resource variables often require a fair amount of memory allocation, CPU time, or network communication to initialize. In cases where a script is very likely to need to reestablish these kinds of resources on each invocation, such as database links, it becomes useful to preserve the resource between requests.

Memory Allocation

From your exposure to earlier chapters you know that emalloc() and friends are the preferred set of functions to use when allocating memory within PHP because they are capable of garbage collection (should a script have to abruptly exit) in ways that system malloc() functions simply aren't. If a persistent resource is to stick around between requests, however, such garbage collection is obviously not a good thing.

Imagine for a moment that it became necessary to store the name of the opened file along with the FILE* pointer. Now, you'd need to create a custom struct in php_sample.h to hold this combination of information:

    typedef struct _php_sample_descriptor_data {
        char *filename;
        FILE *fp;
    } php_sample_descriptor_data;

And all the functions in sample.c dealing with your file resource would need to be modified:

    static void php_sample_descriptor_dtor(
                        zend_rsrc_list_entry *rsrc TSRMLS_DC)
    {
        php_sample_descriptor_data *fdata =
                    (php_sample_descriptor_data*)rsrc->ptr;
        fclose(fdata->fp);
        efree(fdata->filename);
        efree(fdata);
    }

    PHP_FUNCTION(sample_fopen)
    {
        php_sample_descriptor_data *fdata;
        FILE *fp;
        char *filename, *mode;
        int filename_len, mode_len;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss",
                            &filename, &filename_len,
                            &mode, &mode_len) == FAILURE) {
            RETURN_NULL();
        }
        if (!filename_len || !mode_len) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                    "Invalid filename or mode length");
            RETURN_FALSE;
        }
        fp = fopen(filename, mode);
        if (!fp) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                    "Unable to open %s using mode %s",
                    filename, mode);
            RETURN_FALSE;
        }
        fdata = emalloc(sizeof(php_sample_descriptor_data));
        fdata->fp = fp;
        fdata->filename = estrndup(filename, filename_len);
        ZEND_REGISTER_RESOURCE(return_value, fdata,
                                    le_sample_descriptor);
    }

    PHP_FUNCTION(sample_fwrite)
    {
        php_sample_descriptor_data *fdata;
        zval *file_resource;
        char *data;
        int data_len;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "rs",
                &file_resource, &data, &data_len) == FAILURE ) {
            RETURN_NULL();
        }
        ZEND_FETCH_RESOURCE(fdata, php_sample_descriptor_data*,
            &file_resource, -1,
            PHP_SAMPLE_DESCRIPTOR_RES_NAME, le_sample_descriptor);
        RETURN_LONG(fwrite(data, 1, data_len, fdata->fp));
    }

Note: Technically, sample_fclose() can be left as-is because it doesn't actually deal with the resource data directly. If you're feeling confident, try updating it to use the new structure yourself.

So far, everything is perfectly happy because you're still only registering non-persistent descriptor resources. You could even add a new function at this point to retrieve the original name of the file back out of the resource:

    PHP_FUNCTION(sample_fname)
    {
        php_sample_descriptor_data *fdata;
        zval *file_resource;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "r",
                            &file_resource) == FAILURE ) {
            RETURN_NULL();
        }
        ZEND_FETCH_RESOURCE(fdata, php_sample_descriptor_data*,
            &file_resource, -1,
            PHP_SAMPLE_DESCRIPTOR_RES_NAME, le_sample_descriptor);
        RETURN_STRING(fdata->filename, 1);
    }

However, problems will soon start to arise with usages such as this as you start to register persistent versions of your descriptor resource.

Delayed Destruction

As you've seen with non-persistent resources, once all the variables holding a resource ID have been unset() or have fallen out of scope, they are removed from EG(regular_list), which is the HashTable containing all per-request registered resources.

Persistent resources, as you'll see later in this chapter, are also stored in a second HashTable: EG(persistent_list).
Unlike EG(regular_list), the indexes used by this table are associative, and the elements are not automatically removed from the HashTable at the end of a request. Entries in EG(persistent_list) are only removed through manual calls to zend_hash_del() (which you'll see shortly) or when a thread or process completely shuts down (usually when the web server is stopped).

Like the EG(regular_list) HashTable, the EG(persistent_list) HashTable also has its own dtor method. Like the regular list, this method is also a simple wrapper that uses the resource's type to look up a proper destruction method. This time, it takes the destruction method from the second parameter to zend_register_list_destructors_ex(), rather than the first.

In practice, persistent and non-persistent resources are typically registered as two distinct types to avoid having non-persistent destruction code run against a resource that is supposed to be persistent. Depending on your implementation, you may choose to combine non-persistent and persistent destruction methods in a single type.
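The choice between the first and second destructor can be pictured with a small standalone C sketch. This is a model of the rule just described rather than engine code: toy_type, toy_destroy(), the toy_list enumeration, and toy_mark_closed() are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>

typedef void (*toy_dtor_t)(void *ptr);

/* The two destructors supplied when a resource type is registered. */
typedef struct {
    toy_dtor_t list_dtor;   /* first argument: per-request cleanup */
    toy_dtor_t plist_dtor;  /* second argument: persistent cleanup */
} toy_type;

/* Which list an entry is being removed from. */
enum toy_list { TOY_REGULAR, TOY_PERSISTENT };

/* Pick the destructor that matches the list doing the cleanup.
 * Returns 1 when a destructor ran, 0 when the slot was NULL and
 * nothing was done. */
static int toy_destroy(const toy_type *type, void *ptr, enum toy_list from)
{
    toy_dtor_t dtor = (from == TOY_PERSISTENT) ? type->plist_dtor
                                               : type->list_dtor;

    if (!dtor) {
        return 0;
    }
    dtor(ptr);
    return 1;
}

/* Demo destructor: the "resource" is an int flag set once closed. */
static void toy_mark_closed(void *ptr)
{
    *(int *)ptr = 1;
}
```

A type registered with only the second slot filled therefore survives per-request cleanup untouched and is torn down only when it leaves the persistent list.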
For now, add another static int to the top of sample.c for a new persistent descriptor resource:

    static int le_sample_descriptor_persist;

Then extend your MINIT function with a resource registration that uses a new dtor function aimed specifically at persistently allocated structures:

    static void php_sample_descriptor_dtor_persistent(
                        zend_rsrc_list_entry *rsrc TSRMLS_DC)
    {
        php_sample_descriptor_data *fdata =
                    (php_sample_descriptor_data*)rsrc->ptr;
        fclose(fdata->fp);
        pefree(fdata->filename, 1);
        pefree(fdata, 1);
    }

    PHP_MINIT_FUNCTION(sample)
    {
        le_sample_descriptor = zend_register_list_destructors_ex(
                php_sample_descriptor_dtor, NULL,
                PHP_SAMPLE_DESCRIPTOR_RES_NAME, module_number);
        le_sample_descriptor_persist =
                        zend_register_list_destructors_ex(
                NULL, php_sample_descriptor_dtor_persistent,
                PHP_SAMPLE_DESCRIPTOR_RES_NAME, module_number);
        return SUCCESS;
    }

By giving these two resource types the same name, their distinction will be transparent to the end user. Internally, only one will have php_sample_descriptor_dtor called on it during request cleanup; the other, as you'll see in a moment, will stick around for up to as long as the web server's process or thread does.

Long Term Registration

Now that a suitable cleanup method is in place, it's time to actually create some usable resource structures.
Often this is done using two separate functions that map internally to the same implementation, but since that would only complicate an already muddy topic, you'll accomplish the same feat here by simply accepting a Boolean parameter to sample_fopen():

    PHP_FUNCTION(sample_fopen)
    {
        php_sample_descriptor_data *fdata;
        FILE *fp;
        char *filename, *mode;
        int filename_len, mode_len;
        zend_bool persist = 0;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss|b",
                    &filename, &filename_len, &mode, &mode_len,
                    &persist) == FAILURE) {
            RETURN_NULL();
        }
        if (!filename_len || !mode_len) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                    "Invalid filename or mode length");
            RETURN_FALSE;
        }
        fp = fopen(filename, mode);
        if (!fp) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                    "Unable to open %s using mode %s",
                    filename, mode);
            RETURN_FALSE;
        }
        if (!persist) {
            fdata = emalloc(sizeof(php_sample_descriptor_data));
            fdata->filename = estrndup(filename, filename_len);
            fdata->fp = fp;
            ZEND_REGISTER_RESOURCE(return_value, fdata,
                                    le_sample_descriptor);
        } else {
            list_entry le;
            char *hash_key;
            int hash_key_len;

            fdata = pemalloc(sizeof(php_sample_descriptor_data), 1);
            fdata->filename = pemalloc(filename_len + 1, 1);
            /* Copy the name, including its terminating NUL */
            memcpy(fdata->filename, filename, filename_len + 1);
            fdata->fp = fp;
            ZEND_REGISTER_RESOURCE(return_value, fdata,
                            le_sample_descriptor_persist);

            /* Store a copy in the persistent_list */
            le.type = le_sample_descriptor_persist;
            le.ptr = fdata;
            hash_key_len = spprintf(&hash_key, 0,
                    "sample_descriptor:%s:%s", filename, mode);
            zend_hash_update(&EG(persistent_list),
                hash_key, hash_key_len + 1,
                (void*)&le, sizeof(list_entry), NULL);
            efree(hash_key);
        }
    }

The core portions of this function should be very familiar by now. A file was opened, its name stored into newly allocated memory, and it was registered into a request-specific resource ID populated into return_value. What's new this time is the second portion, but hopefully it's not altogether alien.
Here, you've actually done something very similar to what ZEND_REGISTER_RESOURCE() does; however, instead of giving it a numeric index and placing it in the per-request list, you've assigned it an associative key that can be reproduced in a later request and stowed it into the persistent list, which isn't automatically purged at the end of every script.

When one of these persistent descriptor resources goes out of scope, EG(regular_list)'s dtor function will check the registered list destructors for le_sample_descriptor_persist and, seeing that it's NULL, simply do nothing. This leaves the FILE* pointer and the char* name string safe for the next request.

When the resource is finally removed from EG(persistent_list), either because the thread/process is shutting down or because your extension has deliberately removed it, the engine will now go looking for a persistent destructor. Because you defined one for this resource type, it will be called and issue the appropriate pefree() calls to match the earlier pemalloc() calls.

Reuse

Putting a copy of a resource entry into the persistent_list would serve no purpose beyond extending the time that such resources can tie up memory and file locks unless you're somehow able to reuse them on subsequent requests.

Here's where that hash_key comes in.
When sample_fopen() is called, either for persistent or non-persistent use, your function can re-create the hash_key using the requested filename and mode and try to find it in the persistent_list before going to the trouble of opening the file again:

    PHP_FUNCTION(sample_fopen)
    {
        php_sample_descriptor_data *fdata;
        FILE *fp;
        char *filename, *mode, *hash_key;
        int filename_len, mode_len, hash_key_len;
        zend_bool persist = 0;
        list_entry *existing_file;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss|b",
                    &filename, &filename_len, &mode, &mode_len,
                    &persist) == FAILURE) {
            RETURN_NULL();
        }
        if (!filename_len || !mode_len) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                    "Invalid filename or mode length");
            RETURN_FALSE;
        }
        /* Try to find an already opened file */
        hash_key_len = spprintf(&hash_key, 0,
                "sample_descriptor:%s:%s", filename, mode);
        if (zend_hash_find(&EG(persistent_list), hash_key,
                hash_key_len + 1, (void **)&existing_file) == SUCCESS) {
            /* There's already a file open, return that! */
            ZEND_REGISTER_RESOURCE(return_value,
                existing_file->ptr, le_sample_descriptor_persist);
            efree(hash_key);
            return;
        }
        fp = fopen(filename, mode);
        if (!fp) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                    "Unable to open %s using mode %s",
                    filename, mode);
            /* Don't leak the key on the failure path */
            efree(hash_key);
            RETURN_FALSE;
        }
        if (!persist) {
            fdata = emalloc(sizeof(php_sample_descriptor_data));
            fdata->filename = estrndup(filename, filename_len);
            fdata->fp = fp;
            ZEND_REGISTER_RESOURCE(return_value, fdata,
                                    le_sample_descriptor);
        } else {
            list_entry le;

            fdata = pemalloc(sizeof(php_sample_descriptor_data), 1);
            fdata->filename = pemalloc(filename_len + 1, 1);
            /* Copy the name, including its terminating NUL */
            memcpy(fdata->filename, filename, filename_len + 1);
            fdata->fp = fp;
            ZEND_REGISTER_RESOURCE(return_value, fdata,
                            le_sample_descriptor_persist);
            /* Store a copy in the persistent_list */
            le.type = le_sample_descriptor_persist;
            le.ptr = fdata;
            /* hash_key has already been created by now */
            zend_hash_update(&EG(persistent_list),
                hash_key, hash_key_len + 1,
                (void*)&le, sizeof(list_entry), NULL);
        }
        efree(hash_key);
    }

Because all extensions use the same persistent HashTable list to store their resources in, it's important that you choose a hash key that is both reproducible and unique. A common convention, as seen in the sample_fopen() function, is to use the extension and resource type names as a prefix, followed by the creation criteria.

Liveness Checking and Early Departure

Although it's safe to assume that once you open a file, you can keep it open indefinitely, other resource types (particularly remote network resources) may have a tendency to become invalidated, especially when they're left unused for long periods between requests.

When recalling a stored persistent resource into active duty, it is therefore important to make sure that it's still usable. If the resource is no longer valid, it must be removed from the persistent list and the function should continue as though no already allocated resource had been found.
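Before turning to the engine's API, the recall-or-discard rule can be modeled in standalone C together with the key convention of a prefix followed by the creation criteria. This is a sketch under stated assumptions and not PHP's implementation: the toy_* names are hypothetical, a four-slot array replaces the persistent HashTable, snprintf() stands in for spprintf(), and no destructor is run when a stale entry is dropped.

```c
#include <stdio.h>
#include <string.h>

#define TOY_SLOTS   4
#define TOY_KEY_MAX 128

typedef struct {
    char  key[TOY_KEY_MAX];
    void *ptr;
    int   used;
} toy_pentry;

static toy_pentry toy_plist[TOY_SLOTS];

/* Build a reproducible key: a fixed prefix, then the creation
 * criteria. Returns the key length, or -1 if it does not fit. */
static int toy_make_key(char *buf, size_t buflen,
                        const char *filename, const char *mode)
{
    int n = snprintf(buf, buflen, "sample_descriptor:%s:%s",
                     filename, mode);

    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Store ptr under key, overwriting an existing entry with the same
 * key. Returns 0 on success or -1 if the key is too long or the
 * list is full. */
static int toy_store(const char *key, void *ptr)
{
    int i, slot = -1;

    if (strlen(key) >= TOY_KEY_MAX) {
        return -1;
    }
    for (i = 0; i < TOY_SLOTS; i++) {
        if (toy_plist[i].used && strcmp(toy_plist[i].key, key) == 0) {
            slot = i;       /* same key: reuse its slot */
            break;
        }
        if (!toy_plist[i].used && slot < 0) {
            slot = i;       /* remember the first free slot */
        }
    }
    if (slot < 0) {
        return -1;
    }
    strcpy(toy_plist[slot].key, key);
    toy_plist[slot].ptr = ptr;
    toy_plist[slot].used = 1;
    return 0;
}

/* Return the stored pointer if the key exists and is_alive() accepts
 * it. A stale entry is removed and NULL is returned, exactly as if
 * nothing had been stored. */
static void *toy_recall(const char *key, int (*is_alive)(void *ptr))
{
    int i;

    for (i = 0; i < TOY_SLOTS; i++) {
        if (toy_plist[i].used && strcmp(toy_plist[i].key, key) == 0) {
            if (is_alive(toy_plist[i].ptr)) {
                return toy_plist[i].ptr;
            }
            toy_plist[i].used = 0;   /* drop the dead entry */
            return NULL;
        }
    }
    return NULL;
}

/* Demo liveness check: the resource is an int flag, nonzero = usable. */
static int toy_flag_is_alive(void *ptr)
{
    return *(int *)ptr != 0;
}
```

Because a failed liveness check leaves the caller in the same state as a plain miss, the code that follows the lookup needs only one path for creating a fresh resource.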
The following hypothetical code block performs a liveness check on a socket stored in the persistent list:

    if (zend_hash_find(&EG(persistent_list), hash_key,
            hash_key_len + 1, (void**)&socket) == SUCCESS) {
        if (php_sample_socket_is_alive(socket->ptr)) {
            ZEND_REGISTER_RESOURCE(return_value,
                        socket->ptr, le_sample_socket);
            return;
        }
        zend_hash_del(&EG(persistent_list),
                            hash_key, hash_key_len + 1);
    }

As you can see, all that's been done here is to manually remove the list entry from the persistent list during runtime as opposed to engine shutdown (when it would normally be destroyed). This action handles the work of calling the persistent dtor method, which would have been defined by zend_register_list_destructors_ex(). On completion of this code block, the function will be in the same state it would have been if no resource had been found in the persistent list.

Agnostic Retrieval

At this point you can create file descriptor resources, store them persistently, and recall them transparently, but have you tried using a persistent version with your sample_fwrite() function? Frustratingly, it doesn't work! Recall how the resource pointer is resolved from its numeric ID:

    ZEND_FETCH_RESOURCE(fdata, php_sample_descriptor_data*,
        &file_resource, -1,
        PHP_SAMPLE_DESCRIPTOR_RES_NAME, le_sample_descriptor);

le_sample_descriptor is explicitly named so that the type can be verified and you can be assured that you're not using a mysql_connection_handle* or some other type when you expect to see, for example, a php_sample_descriptor_data* structure. Mixing and matching types is generally a "bad thing."

You know that the same data structure stored in le_sample_descriptor resources is also stored in le_sample_descriptor_persist resources, so to keep things simple in userspace, it'd be ideal if sample_fwrite() could simply accept either type equally. This is solved by using ZEND_FETCH_RESOURCE()'s sibling: ZEND_FETCH_RESOURCE2().
The only difference between these two macros is that the latter enables you to specify (that's right) two resource types. In this case you'd change that line to the following:

    ZEND_FETCH_RESOURCE2(fdata, php_sample_descriptor_data*,
        &file_resource, -1,
        PHP_SAMPLE_DESCRIPTOR_RES_NAME,
        le_sample_descriptor, le_sample_descriptor_persist);

Now, the resource ID contained in file_resource can refer to either a persistent or non-persistent Sample Descriptor resource and they will both pass validation checks.

Allowing for more than two resource types requires using the underlying zend_fetch_resource() implementation. Recall that the ZEND_FETCH_RESOURCE() macro you originally used expands out to

    fp = (FILE*) zend_fetch_resource(&file_resource TSRMLS_CC, -1,
                PHP_SAMPLE_DESCRIPTOR_RES_NAME, NULL,
                1, le_sample_descriptor);
    ZEND_VERIFY_RESOURCE(fp);

Similarly, the ZEND_FETCH_RESOURCE2() macro you were just introduced to also expands to the same underlying function:

    fp = (FILE*) zend_fetch_resource(&file_resource TSRMLS_CC, -1,
                PHP_SAMPLE_DESCRIPTOR_RES_NAME, NULL,
                2, le_sample_descriptor,
                le_sample_descriptor_persist);
    ZEND_VERIFY_RESOURCE(fp);

See a pattern? The sixth and subsequent parameters to zend_fetch_resource() say "There are N possible resource types I'm willing to match, and here they are...." So to match a third resource type (for example: le_sample_othertype), type the following:

    fp = (FILE*) zend_fetch_resource(&file_resource TSRMLS_CC, -1,
                PHP_SAMPLE_DESCRIPTOR_RES_NAME, NULL,
                3, le_sample_descriptor,
                le_sample_descriptor_persist,
                le_sample_othertype);
    ZEND_VERIFY_RESOURCE(fp);

And so on and so forth.

The Other refcounter

Like userspace variables, registered resources also have reference counters. In this case, the reference counter refers to how many container structures know about the resource ID in question.
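The two levels of counting, one on the variable container and one on the resource it refers to, can be modeled in a few lines of standalone C. This is a simplified sketch and not the engine's data structures: toy_rsrc, toy_var, and the three helper functions are hypothetical names, the is_ref flag is left out, and a destroyed flag stands in for running the destruction method.

```c
#include <stdlib.h>

/* The registered resource and its own counter. */
typedef struct {
    int refcount;   /* number of containers holding this resource */
    int destroyed;  /* set when the destruction method would run */
} toy_rsrc;

/* A variable container that refers to a resource. */
typedef struct {
    int       refcount;  /* number of variables sharing this container */
    toy_rsrc *rsrc;
} toy_var;

/* A new container takes one reference on the resource.
 * Returns NULL if memory cannot be allocated. */
static toy_var *toy_var_new(toy_rsrc *rsrc)
{
    toy_var *v = malloc(sizeof *v);

    if (!v) {
        return NULL;
    }
    v->refcount = 1;
    v->rsrc = rsrc;
    rsrc->refcount++;
    return v;
}

/* Another variable starts sharing the same container: only the
 * container's counter moves, the resource's counter does not. */
static void toy_var_share(toy_var *v)
{
    v->refcount++;
}

/* One variable lets go. The resource loses a reference only when
 * the container itself is freed, and is destroyed at zero. */
static void toy_var_unset(toy_var *v)
{
    if (--v->refcount > 0) {
        return;
    }
    if (--v->rsrc->refcount == 0) {
        v->rsrc->destroyed = 1;
    }
    free(v);
}
```

In this model the resource's counter changes only when a container is created or freed, never when a variable merely joins or leaves an existing container.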
You already know by now that when a userspace variable (zval*) is of type IS_RESOURCE, it doesn't really hold the pointer to any structure; it simply holds a HashTable index number so that it can look up the pointer from the EG(regular_list) HashTable.

When a resource is first created, such as by calling sample_fopen(), it's placed into a zval* container and its refcount is initialized to 1 because it's only held by that one variable.

    $a = sample_fopen('notes.txt', 'r');
    /* var->refcount = 1, rsrc->refcount = 1 */

If that variable is then copied to another, you know from Chapter 3, "Memory Management," that no new zval* is actually created. Rather, the variables share that zval* in a copy-on-write reference set. In this case, the refcount for the zval* is raised to 2; however, the refcount for the resource is still 1 because it is only held by one zval*.

    $b = $a;
    /* var->refcount = 2, rsrc->refcount = 1 */

When one of these two variables is unset(), the zval*'s refcount is decremented, but it's not destroyed because the other variable still refers to it.

    unset($b);
    /* var->refcount = 1, rsrc->refcount = 1 */

You also know by now that mixing full-reference sets with copy-on-write reference sets will force a variable to separate by copying into a new zval*. When this happens, the resource's reference count does get incremented because it's now owned by a second zval*.

    $b = $a;
    $c = &$a;
    /* bvar->refcount = 1, bvar->is_ref = 0
       acvar->refcount = 2, acvar->is_ref = 1
       rsrc->refcount = 2 */

Now, unsetting $b would destroy its zval* entirely, bringing the rsrc->refcount to 1. Unsetting either $a or $c (but not both) would not decrease the resource refcount, however, as the acvar zval* would still exist. It's not until all three variables (and by extension their two zval*s) are unset() that the resource's refcount reaches 0 and its destruction method is triggered.

Summary

Using the topics covered in this chapter, you can begin to apply the glue that PHP is so famous for.
The resource data type enables your extension to connect abstract concepts like opaque pointers from third-party libraries to the easy-to-use userspace scripting language that makes PHP so powerful.

In the next two chapters you'll delve into the last, but by no means least, data type in the PHP lexicon. You'll start by exploring simple Zend Engine 1-based classes, and move into their more powerful Zend Engine 2 successors.

Chapter 10. PHP4 Objects

Once upon a time, in a version long, long ago, PHP did not support object-oriented programming in any form. With the introduction of the Zend Engine (ZE1) in PHP 4, several new features appeared, including the object data type.

The Evolution of the PHP Object Type

This first incarnation of object-oriented programming (OOP) support covered only the barest implementation of object-related characteristics. In the words of one core developer, "A PHP4 object is just an Array with some functions bolted onto the side." It is this generation of PHP objects that you'll explore now.

With the second major release of the Zend Engine (ZE2) found in PHP5, several new features found their way into PHP's OOP implementation. For example, properties and methods may now be marked with access modifiers to make them inaccessible from outside your class definition, an additional suite of overloading functions is available to define custom behavior for internal language constructs, and interfaces can be used to enforce API standards between multiple class chains. When you reach Chapter 11, "PHP5 Objects," you'll build on the knowledge you gain here by implementing these features in PHP5-specific class definitions.

Implementing Classes

As you start to explore the world of OOP, it's time to shake off some of the baggage you've collected in the chapters leading up to this point. To do that, "reset" back to the skeleton extension you started with in Chapter 5, "Your First Extension."
In order to compile it alongside your earlier incarnation, you can name this version sample2. Place the three files shown in Listings 10.1 through 10.3 in ext/sample2/ off of your PHP source tree.

Listing 10.1. Configuration File: config.m4

    PHP_ARG_ENABLE(sample2,
      [Whether to enable the "sample2" extension],
      [  --enable-sample2      Enable "sample2" extension support])

    if test $PHP_SAMPLE2 != "no"; then
      PHP_SUBST(SAMPLE2_SHARED_LIBADD)
      PHP_NEW_EXTENSION(sample2, sample2.c, $ext_shared)
    fi

Listing 10.2. Header: php_sample2.h

    #ifndef PHP_SAMPLE2_H
    /* Prevent double inclusion */
    #define PHP_SAMPLE2_H

    /* Define Extension Properties */
    #define PHP_SAMPLE2_EXTNAME "sample2"
    #define PHP_SAMPLE2_EXTVER  "1.0"

    /* Import configure options
       when building outside of
       the PHP source tree */
    #ifdef HAVE_CONFIG_H
    #include "config.h"
    #endif

    /* Include PHP Standard Header */
    #include "php.h"

    /* Define the entry point symbol
     * Zend will use when loading this module
     */
    extern zend_module_entry sample2_module_entry;
    #define phpext_sample2_ptr &sample2_module_entry

    #endif /* PHP_SAMPLE2_H */

Listing 10.3. Source Code: sample2.c

    #include "php_sample2.h"

    static function_entry php_sample2_functions[] = {
        { NULL, NULL, NULL }
    };

Now, as you did in Chapter 5, you can issue phpize, ./configure, and make to build your sample2.so extension module.

Note

Like config.m4, your prior version of config.w32 will work here with nothing more than occurrences of sample replaced with sample2.

Declaring Class Entries

In userspace, the first step to defining a class is to declare it. For example:

    <?php
    class Sample2_FirstClass {
    }
    ?>

As you can no doubt guess, this gets slightly, but only slightly, harder from within an extension.
First, you'll need to define a zend_class_entry pointer within your source file similar to the le_sample_descriptor int you defined last chapter:

    zend_class_entry *php_sample2_firstclass_entry;

Now, you can initialize and register the class within your MINIT method:

    PHP_MINIT_FUNCTION(sample2)
    {
        zend_class_entry ce; /* Temporary Variable */

        /* Register Class */
        INIT_CLASS_ENTRY(ce, "Sample2_FirstClass", NULL);
        php_sample2_firstclass_entry =
            zend_register_internal_class(&ce TSRMLS_CC);

        return SUCCESS;
    }

Listing 10.3. Source Code: sample2.c (continued; the empty MINIT method shown here is the one the version above replaces)

    PHP_MINIT_FUNCTION(sample2)
    {
        return SUCCESS;
    }

    zend_module_entry sample2_module_entry = {
    #if ZEND_MODULE_API_NO >= 20010901
        STANDARD_MODULE_HEADER,
    #endif
        PHP_SAMPLE2_EXTNAME,
        php_sample2_functions,
        PHP_MINIT(sample2),
        NULL, /* MSHUTDOWN */
        NULL, /* RINIT */
        NULL, /* RSHUTDOWN */
        NULL, /* MINFO */
    #if ZEND_MODULE_API_NO >= 20010901
        PHP_SAMPLE2_EXTVER,
    #endif
        STANDARD_MODULE_PROPERTIES
    };

    #ifdef COMPILE_DL_SAMPLE2
    ZEND_GET_MODULE(sample2)
    #endif

Building this extension, and examining the output of get_declared_classes(), will show that Sample2_FirstClass is now available to userspace scripts.

Defining Method Implementations

At this point, you've only managed to implement stdClass, which is, of course, already available. You'll want your class to actually do something now.

To accomplish this, you'll fall back on another concept you picked up back in Chapter 5. Replace the NULL parameter to INIT_CLASS_ENTRY() with php_sample2_firstclass_functions and define that struct directly above the MINIT method as follows:

    static function_entry php_sample2_firstclass_functions[] = {
        { NULL, NULL, NULL }
    };

Look familiar? It should. This is the same structure you've been using to define ordinary procedural functions. You'll even populate this structure in nearly the same manner:

    PHP_NAMED_FE(method1, PHP_FN(Sample2_FirstClass_method1), NULL)

Alternatively, you could have used PHP_FE(method1, NULL).
However, as you'll recall from Chapter 5, this expects to find an implementation function named zif_method1, which might potentially conflict with another method1() implementation elsewhere. In order to namespace the function safely away from any procedural implementations, the class name gets prepended to the method name using drop cap-casing for the class name and camel-casing for the method name.

The PHP_FALIAS(method1, Sample2_FirstClass_method1, NULL) form is also acceptable; however, it may be slightly less intuitive when you come back later and wonder why there's no matching PHP_FE() line to go with it.

Now that you have a function list attached to your class definition, it's time to declare some methods. Create the following function above the php_sample2_firstclass_functions struct:

    PHP_FUNCTION(Sample2_FirstClass_countProps)
    {
        RETURN_LONG(zend_hash_num_elements(Z_OBJPROP_P(getThis())));
    }

Now add a matching PHP_NAMED_FE() entry in the function list itself:

    static function_entry php_sample2_firstclass_functions[] = {
        PHP_NAMED_FE(countprops,
                PHP_FN(Sample2_FirstClass_countProps), NULL)
        { NULL, NULL, NULL }
    };

Note

Be sure to notice that the function is named for userspace in all lowercase. The case-folding operations meant to ensure case-insensitivity in method and function names require that internal functions be given all lowercase names.

The only new element here should be getThis() which, in all current PHP versions, is actually a macro that resolves to this_ptr. this_ptr, in turn, carries essentially the same meaning as $this within a userspace object method. If no object instance is available, such as when a method is called statically, getThis() will return NULL.
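The lowercase requirement from the note above is easy to demonstrate outside the engine. What follows is a standalone sketch, not Zend Engine code: model_register() and model_lookup() are hypothetical helpers that store a name exactly as it was registered and fold only the requested name to lowercase before an exact comparison, which is the behavior the note describes.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_METHODS 8
#define MODEL_MAX_NAME    64

/* Hypothetical method table: names are kept exactly as registered. */
static const char *model_methods[MODEL_MAX_METHODS];
static size_t model_method_count = 0;

/* Store the name verbatim, the way an internal function list would.
 * Returns 1 on success, 0 if the table is full. */
static int model_register(const char *name)
{
    if (model_method_count >= MODEL_MAX_METHODS) {
        return 0;
    }
    model_methods[model_method_count++] = name;
    return 1;
}

/* Fold the requested name to lowercase, then compare exactly.
 * Returns 1 if a registered name matches, 0 otherwise. */
static int model_lookup(const char *requested)
{
    char folded[MODEL_MAX_NAME];
    size_t i, len = strlen(requested);

    if (len >= sizeof(folded)) {
        return 0;
    }
    for (i = 0; i < len; i++) {
        folded[i] = (char)tolower((unsigned char)requested[i]);
    }
    folded[len] = '\0';

    for (i = 0; i < model_method_count; i++) {
        if (strcmp(model_methods[i], folded) == 0) {
            return 1;
        }
    }
    return 0;
}
```

In this model a method registered as countprops is found no matter how the caller capitalizes it, while one registered as sayHello can never be found, because the folded request (sayhello) no longer matches the stored mixed-case name.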
Just as the data return semantics in object methods are identical to those of procedural functions, so are the parameter acceptance and arg_info methodology:

    PHP_FUNCTION(Sample2_FirstClass_sayHello)
    {
        char *name;
        int name_len;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s",
                            &name, &name_len) == FAILURE) {
            RETURN_NULL();
        }
        php_printf("Hello ");
        PHPWRITE(name, name_len);
        php_printf("!\nYou called an object method!\n");
        RETURN_TRUE;
    }

Constructors

Your class constructor can simply be implemented as any other ordinary class method, and the same rules will apply to internals as to userspace when it comes to nomenclature. Specifically, you'll want to name your constructor identically to the class name. The other ZE1 magic methods, __sleep() and __wakeup(), can be implemented in this manner as well.

Inheritance

Inheritance between internal objects in PHP4 is sketchy at best and should generally be avoided like dark alleys in a horror flick. If you absolutely must inherit from another object, you'll need to duplicate some ZE1 code:

    void php_sample2_inherit_from_class(zend_class_entry *ce,
                            zend_class_entry *parent_ce)
    {
        /* Copy the parent's methods; function_table stores
         * zend_function structures by value */
        zend_hash_merge(&ce->function_table,
                &parent_ce->function_table,
                (void (*)(void *))function_add_ref,
                NULL, sizeof(zend_function), 0);
        ce->parent = parent_ce;
        if (!ce->handle_property_get) {
            ce->handle_property_get =
                    parent_ce->handle_property_get;
        }
        if (!ce->handle_property_set) {
            ce->handle_property_set =
                    parent_ce->handle_property_set;
        }
        if (!ce->handle_function_call) {
            ce->handle_function_call =
                    parent_ce->handle_function_call;
        }
        if (!zend_hash_exists(&ce->function_table,
                    ce->name, ce->name_length + 1)) {
            /* No constructor of its own:
             * inherit the parent's under the child's name */
            zend_function *fe;
            if (zend_hash_find(&parent_ce->function_table,
                    parent_ce->name, parent_ce->name_length + 1,
                    (void**)&fe) == SUCCESS) {
                zend_hash_update(&ce->function_table,
                    ce->name, ce->name_length + 1,
                    fe, sizeof(zend_function), NULL);
                function_add_ref(fe);
            }
        }
    }

With this function defined, you can now place a call to it following zend_register_internal_class
in your MINIT block:

    INIT_CLASS_ENTRY(ce, "Sample2_FirstClass", NULL);
    /* Assumes php_sample2_ancestor is an already
     * registered zend_class_entry* */
    php_sample2_firstclass_entry =
        zend_register_internal_class(&ce TSRMLS_CC);
    php_sample2_inherit_from_class(php_sample2_firstclass_entry,
                                   php_sample2_ancestor);

Caution

Although this approach to inheritance will work, it should generally be avoided as ZE1 simply wasn't designed to handle internal object inheritance properly. As with most OOP practices in PHP, ZE2 (PHP5) and its revised object model are strongly encouraged for all but the most simple OOP-related tasks.

Working with Instances

Like other userspace variables, objects are stored in zval* containers. In ZE1, the zval* contained a HashTable* for properties, and a zend_class_entry* that points to the class definition. In ZE2, these values have been replaced by a handler table, which you'll delve into next chapter, and a numeric object ID that is used in a similar manner to resource IDs (discussed in Chapter 9, "The Resource Data Type").

This discrepancy between ZE1 objects and ZE2 objects is thankfully hidden from your extension by means of a branch of the Z_*() macro family you first saw way back in Chapter 2, "Variables from the Inside Out." Table 10.1 lists the two ZE1 macros which, like their non-OOP related cousins, have _P and _PP counterparts for dealing with one and two levels of indirection respectively.

Creating Instances

The majority of the time, your extension will not create object instances itself. Rather, a userspace script will invoke the new keyword to create an instance and call your class' constructor.

Should you need to create an instance, such as within a factory method, the object_init_ex(zval *val, zend_class_entry *ce) function from the ZENDAPI may be used to initialize the object instance into a variable.

Note that the object_init_ex() function does not invoke the constructor.
Table 10.1. Object Access Macros

    Macro            Purpose
    Z_OBJPROP(zv)    Resolves the built-in properties HashTable*
    Z_OBJCE(zv)      Returns the associated zend_class_entry*

When instantiating objects from an internal function, the constructor must be called manually. The following procedural function replicates the functionality of the new keyword.

    PHP_FUNCTION(sample2_new)
    {
        int argc = ZEND_NUM_ARGS();
        zval ***argv = safe_emalloc(sizeof(zval**), argc, 0);
        zend_class_entry *ce;

        if (argc == 0 ||
            zend_get_parameters_array_ex(argc, argv) == FAILURE) {
            efree(argv);
            WRONG_PARAM_COUNT;
        }
        /* First arg is classname */
        SEPARATE_ZVAL(argv[0]);
        convert_to_string(*argv[0]);
        /* class names are stored in lowercase */
        php_strtolower(Z_STRVAL_PP(argv[0]), Z_STRLEN_PP(argv[0]));
        if (zend_hash_find(EG(class_table),
                Z_STRVAL_PP(argv[0]), Z_STRLEN_PP(argv[0]) + 1,
                (void**)&ce) == FAILURE) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                "Class %s does not exist.",
                Z_STRVAL_PP(argv[0]));
            zval_ptr_dtor(argv[0]);
            efree(argv);
            RETURN_FALSE;
        }
        object_init_ex(return_value, ce);
        /* Call the constructor if it has one
         * Additional arguments will be passed through as
         * constructor parameters */
        if (zend_hash_exists(&ce->function_table,
                Z_STRVAL_PP(argv[0]), Z_STRLEN_PP(argv[0]) + 1)) {
            /* Object has constructor */
            zval *ctor, *dummy = NULL;

            /* constructor == classname */
            MAKE_STD_ZVAL(ctor);
            array_init(ctor);
            zval_add_ref(argv[0]);
            add_next_index_zval(ctor, *argv[0]);
            zval_add_ref(argv[0]);
            add_next_index_zval(ctor, *argv[0]);
            if (call_user_function_ex(&ce->function_table,
                    NULL, ctor,
                    &dummy, /* Don't care about return value */
                    argc - 1, argv + 1, /* parameters */
                    0, NULL TSRMLS_CC) == FAILURE) {
                php_error_docref(NULL TSRMLS_CC, E_WARNING,
                    "Unable to call constructor");
            }
            if (dummy) {
                zval_ptr_dtor(&dummy);
            }
            zval_ptr_dtor(&ctor);
        }
        zval_ptr_dtor(argv[0]);
        efree(argv);
    }

Don't forget to add a reference to this function in php_sample2_functions.
That's the list for your extension's procedural functions, not the list for your class' methods. You'll also need to add #include "ext/standard/php_string.h" in order to get the prototype for the php_strtolower() function.

This function is one of the busiest ones you've implemented yet and several features are likely to be entirely new. The first item, SEPARATE_ZVAL(), is actually a macroized version of a process you've already done several times involving zval_copy_ctor() to duplicate a value into a temporary structure and avoid modifying the original contents.

php_strtolower() is used to convert the class name to lowercase because this is how all class and function names are stored in PHP in order to achieve case-insensitivity for identifiers. This is just one of the many PHPAPI utility functions you can find in Appendix B, "PHPAPI."

EG(class_table) is a global registry of all zend_class_entry definitions available to the request. Note that in ZE1 (PHP4) this HashTable stores zend_class_entry* structures at a single level of indirection. In ZE2 (PHP5), these are stored at two levels of indirection. This shouldn't be an issue because directly accessing this table is an uncommon task, but you'd do well to be aware of it.

call_user_function_ex() is one of a pair of ZENDAPI calls you'll take a look at in Chapter 20, "Advanced Embedding." Here you've shifted forward by one zval** on the argument stack retrieved by zend_get_parameters_array_ex() in order to pass the remaining arguments on to the constructor untouched.

Accepting Instances

Often you'll need your functions or methods to accept objects from userspace. For this purpose, zend_parse_parameters() offers two format specifiers. The first is o (lowercase letter o), which will verify that the argument passed is an object and populate it into the passed zval**.
A simple usage of this type could be the following userspace function, which returns the name of the class for whatever object it is passed:

    PHP_FUNCTION(sample2_class_getname)
    {
        zval *objvar;
        zend_class_entry *objce;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "o",
                                    &objvar) == FAILURE) {
            RETURN_NULL();
        }
        objce = Z_OBJCE_P(objvar);
        RETURN_STRINGL(objce->name, objce->name_length, 1);
    }

The second format specifier used with objects, O (capital letter O), allows zend_parse_parameters() to verify not only the zval* type, but the class type as well. To do this, calling functions pass a zval** container along with a zend_class_entry* to validate against, as in this implementation, which expects a Sample2_FirstClass object instance:

    PHP_FUNCTION(sample2_reload)
    {
        zval *objvar;

        if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "O",
            &objvar, php_sample2_firstclass_entry) == FAILURE) {
            RETURN_NULL();
        }
        /* Call hypothetical "reload" function */
        RETURN_BOOL(php_sample2_fc_reload(objvar TSRMLS_CC));
    }

Accessing Properties

As you already saw, class methods have access to the current object instance by way of getThis(). Combining the result of this macro, or any other zval* containing an object instance, with the Z_OBJPROP_P() macro yields a HashTable* containing the real properties associated with the object.

An object's property list, being a simple HashTable* containing zval*s, is just another userspace variable array that happens to sit in a special location. Just as you'd use zend_hash_find(EG(active_symbol_table), ...) to retrieve a variable from the current scope, you'd also fetch and set object properties using the zend_hash API you learned about in Chapter 8, "Working with Arrays and HashTables."

For example, assuming you have an instance of Sample2_FirstClass in the zval* variable rcvdclass, the following code block would retrieve the property foo from the standard properties HashTable*.

    zval **fooval;

    if (zend_hash_find(Z_OBJPROP_P(rcvdclass),
            "foo", sizeof("foo"), (void**)&fooval) == FAILURE) {
        /* $rcvdclass->foo doesn't exist */
        return;
    }

To add elements to the properties table, simply reverse this process with a call to zend_hash_add(), or use a variant of the add_assoc_*() functions you were introduced to in Chapter 8 for dealing with arrays. Simply replace the word assoc with property when dealing with objects.

The following constructor method provides Sample2_FirstClass instances with a set of predefined default properties:

    PHP_NAMED_FUNCTION(php_sample2_fc_ctor)
    {
        /* For brevity, and to illustrate that arbitrary
         * function names may be used, the implementation
         * name was assigned manually this time */
        zval *objvar = getThis();

        if (!objvar) {
            php_error_docref(NULL TSRMLS_CC, E_WARNING,
                "Constructor called statically!");
            RETURN_FALSE;
        }

        add_property_long(objvar, "life", 42);
        add_property_double(objvar, "pi", 3.1415926535);
        /* Constructor return values are irrelevant */
    }

The constructor can then be linked into the object through the php_sample2_firstclass_functions list:

    PHP_NAMED_FE(sample2_firstclass, php_sample2_fc_ctor, NULL)

Summary

Although the functionality provided by ZE1 / PHP4 classes is limited at best, they do have the advantage of being compatible with the widely installed PHP4 base currently in production. The simple techniques covered in this chapter will allow you to write functional, versatile code that compiles and runs today and will continue working tomorrow.

In the next chapter, you'll find out what the buzz surrounding PHP5 is really about and why, if you want OOP functionality, you'll find a reason to upgrade and never look back.

Chapter 11. PHP5 Objects

Comparing a PHP5 object to its PHP4 ancestor is just plain unfair; however, many of the API functions used with PHP5 objects are built to conform to the PHP4 API. If you worked through Chapter 10