        %s" % (result,)

        def getHeadline(self, input):
            self.d = defer.Deferred()
            # Fake an asynchronous event that arrives after one second.
            reactor.callLater(1, self.processHeadline, input)
            self.d.addCallback(self._toHTML)
            return self.d

    def printData(result):
        print result
        reactor.stop()

    def printError(failure):
        print failure
        reactor.stop()

    h = HeadlineRetriever()
    d = h.getHeadline("Breaking News: Twisted Takes Us to the Moon!")
    d.addCallbacks(printData, printError)

    reactor.run()

28 | Chapter 3: Writing Asynchronous Code with Deferreds www.it-ebooks.info

Running Example 3-4 produces:
    Breaking News: Twisted Takes Us to the Moon!

Because the provided headline is fewer than 50 characters long, HeadlineRetriever fires the callback chain, invoking _toHTML and then printData, which prints the HTML headline.

Example 3-4 uses a helpful reactor method called callLater, which you can use to schedule events. In this example, we use callLater in getHeadline to fake an asynchronous event arriving after one second.

What happens when we replace the three lines before reactor.run() with the following?

    h = HeadlineRetriever()
    d = h.getHeadline("1234567890"*6)
    d.addCallbacks(printData, printError)

Running this version of the example, we get:

    [Failure instance: Traceback (failure with no frames):
            HomeHome page',
            '/about': 'AboutAll about me',
            }

        def process(self):
            self.setHeader('Content-Type', 'text/html')
            if self.path in self.resources:
                self.write(self.resources[self.path])
            else:
                # Unknown path: override the default 200 response code.
                self.setResponseCode(http.NOT_FOUND)
                self.write("Not FoundSorry, no such resource.")
            self.finish()

    class MyHTTP(http.HTTPChannel):
        requestFactory = MyRequestHandler

    class MyHTTPFactory(http.HTTPFactory):
        def buildProtocol(self, addr):
            return MyHTTP()

    reactor.listenTCP(8000, MyHTTPFactory())
    reactor.run()

42 | Chapter 4: Web Servers www.it-ebooks.info

As always, we register a factory that generates instances of our protocol with the reactor. In this case, instead of subclassing protocol.Protocol directly, we are taking advantage of a higher-level API, http.HTTPChannel, which inherits from basic.LineReceiver and already understands the structure of an HTTP request and the numerous behaviors required by the HTTP RFCs.

Our MyHTTP protocol specifies how to process requests by setting its requestFactory instance variable to MyRequestHandler, which subclasses http.Request. Request’s process method is a no-op that must be overridden in subclasses, which we do here. The HTTP response code is 200 unless overridden with setResponseCode, as we do to send a 404 http.NOT_FOUND when an unknown resource is requested.

To test this server, run python requesthandler.py; this will start up the web server on port 8000. You can then test accessing the supported resources, http://localhost:8000/ and http://localhost:8000/about, and an unsupported resource like http://localhost:8000/foo.

Handling GET Requests

Now that we have a good grasp of the structure of the HTTP protocol and how the low-level APIs work, we can move up to the high-level APIs in twisted.web.server that facilitate the construction of more sophisticated web servers.

Serving Static Content

A common task for a web server is to be able to serve static content out of some directory. Example 4-3 shows a basic implementation.

Example 4-3.
static_content.py

    from twisted.internet import reactor
    from twisted.web.server import Site
    from twisted.web.static import File

    resource = File('/var/www/mysite')
    factory = Site(resource)
    reactor.listenTCP(8000, factory)
    reactor.run()

At this level we no longer have to worry about HTTP protocol details. Instead, we use a Site, which subclasses http.HTTPFactory and manages HTTP sessions and dispatching to resources for us. A Site is initialized with the resource to which it is managing access.

A resource must provide the IResource interface, which describes how the resource gets rendered and how child resources in the resource hierarchy are added and accessed. In this case, we initialize our Site with a File resource representing a regular, non-interpreted file.

twisted.web contains implementations for many common resources. Besides File, available resources include a customizable Directory Listing and ErrorPage, a ProxyResource that renders results retrieved from another server, and an XMLRPC implementation.

The Site is registered with the reactor, which will then listen for requests on port 8000.

After starting the web server with python static_content.py, we can visit http://localhost:8000 in a web browser. The server serves up a directory listing for all of the files in /var/www/mysite/ (replace that path with a valid path to a directory on your system).

Static URL dispatch

What if you’d like to serve different content at different URLs? We can create a hierarchy of resources to serve at different URLs by registering Resources as children of the root resource using its putChild method. Example 4-4 demonstrates this static URL dispatch.

Example 4-4.
static_dispatch.py

    from twisted.internet import reactor
    from twisted.web.server import Site
    from twisted.web.static import File

    root = File('/var/www/mysite')
    root.putChild("doc", File("/usr/share/doc"))
    root.putChild("logs", File("/var/log/mysitelogs"))

    factory = Site(root)
    reactor.listenTCP(8000, factory)
    reactor.run()

Now, visiting http://localhost:8000/ in a web browser will serve content from /var/www/mysite, http://localhost:8000/doc will serve content from /usr/share/doc, and http://localhost:8000/logs/ will serve content from /var/log/mysitelogs.

These Resource hierarchies can be extended to arbitrary depths by registering child resources with existing resources in the hierarchy.

Serving Dynamic Content

Serving dynamic content looks very similar to serving static content; the big difference is that instead of using an existing Resource, like File, you’ll subclass Resource to define the new dynamic resource you want a Site to serve.

Example 4-5 implements a simple clock page that displays the local time when you visit any URL.

Example 4-5. dynamic_content.py

    from twisted.internet import reactor
    from twisted.web.resource import Resource
    from twisted.web.server import Site

    import time

    class ClockPage(Resource):
        isLeaf = True

        def render_GET(self, request):
            return "The local time is %s" % (time.ctime(),)

    resource = ClockPage()
    factory = Site(resource)
    reactor.listenTCP(8000, factory)
    reactor.run()

ClockPage is a subclass of Resource. We implement a render_ method for every HTTP method we want to support; in this case we only care about supporting GET requests, so render_GET is all we implement. If we were to POST to this web server, we’d get a 405 Method Not Allowed unless we also implemented render_POST.

The rendering method is passed the request made by the client.
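The render_ naming convention can be pictured as a lookup by attribute name. The block below is a minimal sketch of that idea in Python 3 using only the standard library; MiniResource and MiniClockPage are hypothetical stand-ins invented for illustration, and the code is a simplified model rather than Twisted's own implementation.

```python
import time


class MiniResource:
    """Toy stand-in for a resource (hypothetical; not Twisted's Resource)."""

    def render(self, method):
        # Look for a handler named after the HTTP method, e.g. render_GET.
        handler = getattr(self, "render_" + method, None)
        if handler is None:
            # No handler for this method: answer 405 Method Not Allowed.
            return 405, "Method Not Allowed"
        return 200, handler()


class MiniClockPage(MiniResource):
    """Supports GET only, like ClockPage in Example 4-5."""

    def render_GET(self):
        return "The local time is %s" % (time.ctime(),)


page = MiniClockPage()
get_status, get_body = page.render("GET")
post_status, post_body = page.render("POST")
```

Adding a render_POST method to MiniClockPage would be enough to make the second call succeed, which mirrors the behavior described for ClockPage.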
The request passed to the rendering method is not an instance of twisted.web.http.Request, as in Example 4-2; it is instead an instance of twisted.web.server.Request, which subclasses http.Request and understands application-layer ideas like session management and rendering.

render_GET returns whatever we want served as a response to a GET request. In this case, we return a string containing the local time.

If we start our server with python dynamic_content.py, we can visit any URL on http://localhost:8000 with a web browser and see the local time displayed and updated as we reload.

The isLeaf instance variable describes whether or not a resource will have children. Without more work on our part (as demonstrated in Example 4-6), only leaf resources get rendered; if we set isLeaf to False and restart the server, attempting to view any URL will produce a 404 No Such Resource.

Dynamic Dispatch

We know how to serve static and dynamic content. The next step is to be able to respond to requests dynamically, serving different resources based on the URL. Example 4-6 demonstrates a calendar server that displays the calendar for the year provided in the URL. For example, visiting http://localhost:8000/2013 will display the calendar for 2013, as shown in Figure 4-2.

Example 4-6. dynamic_dispatch.py

    from twisted.internet import reactor
    from twisted.web.resource import Resource, NoResource
    from twisted.web.server import Site

    from calendar import calendar

    class YearPage(Resource):
        def __init__(self, year):
            Resource.__init__(self)
            self.year = year

        def render_GET(self, request):
            return "%s" % (calendar(self.year),)

    class CalendarHome(Resource):
        def getChild(self, name, request):
            if name == '':
                # Nothing after the final slash: render this resource.
                return self
            if name.isdigit():
                return YearPage(int(name))
            else:
                return NoResource()

        def render_GET(self, request):
            return "Welcome to the calendar server!"

    root = CalendarHome()
    factory = Site(root)
    reactor.listenTCP(8000, factory)
    reactor.run()

Figure 4-2. Calendar

This example has the same structure as Example 4-3. A TCP server is started on port 8000, serving the content registered with a Site, which is a subclass of twisted.web.http.HTTPFactory and knows how to manage access to resources.

The root resource is CalendarHome, which subclasses Resource to specify how to look up child resources and how to render itself.

CalendarHome.getChild describes how to traverse a URL from left to right until we get a renderable resource. If there is no additional component to the requested URL (i.e., the request was for /), CalendarHome returns itself to be rendered by invoking its render_GET method. If the URL has an additional component to its path that is an integer, an instance of YearPage is rendered. If that path component couldn’t be converted to a number, an instance of twisted.web.resource.NoResource is returned instead, which will render a generic 404 page.

There are a few subtle points to this example that deserve highlighting.

Creating resources that are both renderable and have children

Note that CalendarHome does not set isLeaf to True, and yet it is still rendered when we visit http://localhost:8000.

In general, only resources that are leaves are rendered; this can be because isLeaf is set to True or because when traversing the resource hierarchy, that resource is where we are when the URL runs out. However, when isLeaf is True for a resource, its getChild method is never called. Thus, for resources that have children, isLeaf cannot be set to True.
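The traversal rules above can be modelled in a few lines. The following is a minimal sketch in Python 3 using only the standard library; Node, Home, find_resource, is_leaf, and get_child are hypothetical names chosen for illustration, and the loop is a simplified model of the rules described in the text, not Twisted's actual code.

```python
class Node:
    """Toy resource (hypothetical; loosely follows the rules in the text)."""

    is_leaf = False

    def __init__(self, label):
        self.label = label
        self.children = {}

    def get_child(self, name):
        # Default lookup: a registered child, or None to signal a 404.
        return self.children.get(name)


class Home(Node):
    """Mimics CalendarHome: renders itself at "/" and builds year pages."""

    def get_child(self, name):
        if name == "":
            return self
        if name.isdigit():
            return Node("year %s" % name)
        return None


def find_resource(root, path):
    """Walk the path left to right; return the node to render, or None (404)."""
    segments = path.lstrip("/").split("/")  # "/" -> [""], "/2013" -> ["2013"]
    node = root
    # Stop when the URL runs out or when a leaf is reached.
    while segments and not node.is_leaf:
        node = node.get_child(segments.pop(0))
        if node is None:
            return None
    return node


home = Home("home")
leaf = Node("leaf")
leaf.is_leaf = True  # a leaf is rendered without get_child ever being called
```

In this model, find_resource(home, "/2013") yields a year node because the URL runs out there, while find_resource(home, "/2013/foo") yields None because the year node has no child named foo. The leaf node is returned for any path at all.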
If we want CalendarHome to both get rendered and have children, we must override its getChild method to dictate resource generation.

In CalendarHome.getChild, if name == '' (i.e., if we are requesting the root resource), we return the resource itself to get rendered. Without that if condition, visiting http://localhost:8000 would produce a 404.

Similarly, YearPage does not have isLeaf set to True. That means that when we visit http://localhost:8000/2013, we get a rendered calendar because 2013 is at the end of the URL, but if we visit http://localhost:8000/2013/foo, we get a 404.

If we want http://localhost:8000/2013/foo to generate a calendar just like http://localhost:8000/2013, we need to set isLeaf to True or have YearPage override getChild to return itself, like we do in CalendarHome.

Redirects

In Example 4-6, visiting http://localhost:8000 produced a welcome page. What if we wanted http://localhost:8000 to instead redirect to the calendar for the current year?

In the relevant render method (e.g., render_GET), instead of rendering the resource at a given URL, we need to construct a redirect with twisted.web.util.redirectTo. redirectTo takes as arguments the URL component to which to redirect, and the request, which still needs to be rendered.

Example 4-7 shows a revised CalendarHome.render_GET that redirects to the URL for the current year’s calendar (e.g., http://localhost:8000/2013) upon requesting the root resource at http://localhost:8000.

Example 4-7. redirectTo

    from datetime import datetime
    from twisted.web.util import redirectTo

    def render_GET(self, request):
        # Pass the target as a string, not an int.
        return redirectTo(str(datetime.now().year), request)

Handling POST Requests

To handle POST requests, implement a render_POST method in your Resource.

A Minimal POST Example

Example 4-8 serves a page where users can fill out and submit to the web server the contents of a text box. The server will then display that text back to the user.

Example 4-8.
handle_post.py

    from twisted.internet import reactor
    from twisted.web.resource import Resource
    from twisted.web.server import Site

    import cgi

    class FormPage(Resource):
        isLeaf = True

        def render_GET(self, request):
            return """
    <html>
     <body>
      <form method="POST">
       <input name="form-field" type="text" />
       <input type="submit" />
      </form>
     </body>
    </html>
    """

        def render_POST(self, request):
            # request.args maps each field name to a list of values.
            return """
    <html>
     <body>You submitted: %s</body>
    </html>
    """ % (cgi.escape(request.args["form-field"][0]),)

    factory = Site(FormPage())
    reactor.listenTCP(8000, factory)
    reactor.run()

The FormPage Resource in handle_post.py implements both render_GET and render_POST methods.

render_GET returns the HTML for a blank page with a text box called "form-field". When a visitor visits http://localhost:8000, she will see this form.

render_POST extracts the text entered by the user from request.args, sanitizes it with cgi.escape, and returns HTML displaying what the user submitted.

Asynchronous Responses

In all of the Twisted web server examples up to this point, we have assumed that the server can instantaneously respond to clients without having to first retrieve an expensive resource (say, from a database query) or do expensive computation. What happens when responding to a request blocks?

Example 4-9 implements a dummy BusyPage resource that sleeps for five seconds before returning a response to the request.

Example 4-9. blocking.py

    from twisted.internet import reactor
    from twisted.web.resource import Resource
    from twisted.web.server import Site

    import time

    class BusyPage(Resource):
        isLeaf = True

        def render_GET(self, request):
            # Blocks the reactor: no other request is served meanwhile.
            time.sleep(5)
            return "Finally done, at %s" % (time.asctime(),)

    factory = Site(BusyPage())
    reactor.listenTCP(8000, factory)
    reactor.run()

If you run this server and then load http://localhost:8000 in several browser tabs in quick succession, you’ll observe that the last page to load will load N*5 seconds after the first page request, where N is the number of requests to the server.
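The N*5 figure can be checked with a small simulation. This is a minimal sketch in Python 3 using only the standard library; completion_times is a hypothetical helper, and it assumes all requests arrive at time zero and that a single thread handles them one after another, with a counter standing in for the real clock.

```python
def completion_times(num_requests, seconds_per_request):
    """Return when each request finishes if they are handled one at a time."""
    clock = 0
    finished = []
    for _ in range(num_requests):
        # The handler blocks for the full duration; nothing else runs meanwhile.
        clock += seconds_per_request
        finished.append(clock)
    return finished


three_tabs = completion_times(3, 5)
```

With three requests and a five-second handler, three_tabs is [5, 10, 15]: each request finishes five seconds after the one before it.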
In other words, the requests are processed serially.

This is terrible performance! We need our web server to be responding to other requests while an expensive resource is being processed.

One of the great properties of this asynchronous framework is that we can achieve the responsiveness that we want without introducing threads by using the Deferred API we already know and love.

Example 4-10 demonstrates how to use a Deferred instead of blocking on an expensive resource. deferLater replaces the blocking time.sleep(5) with a Deferred that will fire after five seconds, with a callback to _delayedRender to finish the request when the fake resource becomes available. Then, instead of waiting on that resource, render_GET returns NOT_DONE_YET immediately, freeing up the web server to process other requests.

Example 4-10. non_blocking.py

    from twisted.internet import reactor
    from twisted.internet.task import deferLater
    from twisted.web.resource import Resource
    from twisted.web.server import Site, NOT_DONE_YET

    import time

    class BusyPage(Resource):
        isLeaf = True

        def _delayedRender(self, request):
            request.write("Finally done, at %s" % (time.asctime(),))
            request.finish()

        def render_GET(self, request):
            # Fire after five seconds, passing the request to the callback.
            d = deferLater(reactor, 5, lambda: request)
            d.addCallback(self._delayedRender)
            return NOT_DONE_YET

    factory = Site(BusyPage())
    reactor.listenTCP(8000, factory)
    reactor.run()

If you run Example 4-10 and then load multiple instances of http://localhost:8000 in a browser, you may still find that the requests are processed serially. This is not Twisted’s fault: some browsers, notably Chrome, serialize requests to the same resource. You can verify that the web server isn’t blocking by issuing several simultaneous requests through cURL or a quick Python script.

More Practice and Next Steps

This chapter introduced Twisted HTTP servers, from the lowest-level APIs up through twisted.web.server.
We saw examples of serving static and dynamic content, handling GET and POST requests, and how to keep our servers responsive with asynchronous responses using Deferreds.

The Twisted Web HOWTO index has several in-depth tutorials related to HTTP servers, including on deployment and templating. This page is an excellent series of short, self-contained examples of Twisted Web concepts.

The Twisted Web examples directory has a variety of server examples, including examples for proxies, an XML-RPC server, and rendering the output of a server process.

Twisted is not a “web framework” like Django, web.py, or Flask. However, one of its many roles is as a framework for building frameworks! An example of this is the Klein micro-web framework, which you can also browse and download at that GitHub page.

CHAPTER 5
Web Clients

This chapter will talk about the HTTP client side of Twisted Web, starting with quick web resource retrieval for one-off applications and ending with the Agent API for developing flexible web clients.

Basic HTTP Resource Retrieval

Twisted has several high-level convenience classes for quick one-off resource retrieval.

Printing a Web Resource

twisted.web.client.getPage asynchronously retrieves a resource at a given URL. It returns a Deferred, which fires its callback with the resource as a string. Example 5-1 demonstrates the use of getPage; it retrieves and prints the resource at the user-supplied URL.

Example 5-1. print_resource.py

    from twisted.internet import reactor
    from twisted.web.client import getPage
    import sys

    def printPage(result):
        print result

    def printError(failure):
        print >>sys.stderr, failure

    def stop(result):
        reactor.stop()

    if len(sys.argv) != 2:
        print >>sys.stderr, "Usage: python print_resource.py