When creating services which handle more than one client at a time, the current state of each request must be managed. A typical way to keep the current state is the call stack: a stack of functions and their arguments. Unfortunately, this doesn't work well when there is more than one simultaneous request.
One common approach to this problem is preemptive multi-tasking, where each request is handled by a new process or thread with its own call stack. The difference between processes and threads, from the programmer's perspective, lies mostly in how objects common to multiple requests are shared: between processes, one must use sockets or shared memory; between threads, the programmer must take care to lock objects which could be touched by other threads. In either case, each request gets its own call stack, making things easy for the programmer. However, extra care must be taken with information that is not managed on the stack and is shared between requests.
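To make the threaded variant concrete, here is a minimal sketch (written for current Pythons, and not part of Twisted; the `handle_request` and `hit_count` names are invented for this illustration) of a thread per request, where locals are safe on each thread's own stack but shared state needs a lock:

```python
import threading

# Hypothetical shared state touched by every request; it must be
# guarded by a lock, since any thread may update it at any time.
hit_count = {"value": 0}
lock = threading.Lock()
results = []

def handle_request(request_id):
    # Locals such as request_id live on this thread's own call stack,
    # so they need no special care -- only the shared objects do.
    with lock:
        hit_count["value"] += 1
        results.append(request_id)

threads = [threading.Thread(target=handle_request, args=(i,))
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert hit_count["value"] == 5
```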
Another approach is event-driven: each request is broken down into distinct stages of work, and when each stage is done, it schedules the next stage to be run. This is called cooperative multi-tasking, and it is the approach the Twisted framework primarily uses to handle multiple requests. In Twisted, the execution of each stage is typically scheduled with reactor.callLater, and the result of each stage is reported with a callback, usually managed by a twisted.internet.defer.Deferred object.
While this Deferred approach works very well, it starts to have problems when the flow of a request is not a simple linear sequence of stages, or when results must flow between stages incrementally. An alternative cooperative multi-tasking approach is to view data flows as a hierarchy of stages, where the last stage in the flow pulls from previous stages, and so on. This approach allows for more granular control of the flow, giving incremental results and also allowing the next stage of the information to be chosen more dynamically. Unfortunately, this approach is somewhat complicated, since building iterators by hand isn't easy. Luckily, in version 2.2 and up, Python has generators, which are, in short, syntactic sugar for making iterators.
A further complication to this iterator-based approach is that every once in a while, an iterator in the flow (perhaps one nested several layers deep) may have to block on a resource. In this case, the flow must be paused so that other flows in the system have a chance to produce results. So, rather than blocking, the entire state of the iterator chain must be saved so that it can be resumed later. Furthermore, traditional exception handling doesn't work, since the call stack for a given request may be paused indefinitely. Helping the programmer manage these sorts of things is what the flow module does. While flow does not depend on generators, generators make it much easier to use effectively.
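The idea of saving a task's entire state so another task can run is exactly what a generator's yield provides. As a toy illustration (this is not the flow module's actual mechanism; `round_robin`, `counter`, and `letters` are made up for this sketch, which uses the modern next() builtin), a scheduler can step several generators in turn, each yield being the point where a task's locals and position are frozen:

```python
def counter(n):
    # Counts down from n, yielding control after every value.
    while n > 0:
        yield n
        n -= 1

def letters():
    yield "a"
    yield "b"

def round_robin(*generators):
    # A toy cooperative scheduler: each generator runs until its next
    # yield, at which point its locals and position are saved and the
    # next task gets a turn.
    pending = list(generators)
    results = []
    while pending:
        task = pending.pop(0)
        try:
            results.append(next(task))
        except StopIteration:
            continue          # this task is finished; drop it
        pending.append(task)  # otherwise, give it another turn later
    return results

assert round_robin(counter(3), letters()) == [3, "a", 2, "b", 1]
```

Note how the two tasks interleave: neither blocks the other, because each one gives up control at every yield.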
An iterator is basically an object which produces a sequence of values. Python's iterators are simply objects with an __iter__() member function which returns an object (usually itself) that has a next() member function. The next() method is then invoked until it raises a StopIteration exception. In Python 2.2, the for syntax knows about iterators, making them very nice to use.
```python
from twisted.python.compat import iter, StopIteration

class Counter:
    def __init__(self, count):
        self.count = count
    def __iter__(self):
        return self
    def next(self):
        ret = self.count
        self.count -= 1
        if ret:
            return ret
        raise StopIteration

def list(it):
    ret = []
    import sys
    if sys.version_info >= (2,2):
        for x in it:
            ret.append(x)
    else:
        it = iter(it)
        try:
            while 1:
                ret.append(it.next())
        except StopIteration:
            pass
    return ret

print list(Counter(3))
# prints: [3, 2, 1]
```
Often it is useful for an iterator to change state during its production of values. This can be done nicely with the 'state' pattern: simply store in the iterator the next function to be run.
```python
class States:
    def __iter__(self):
        self.state = self.next_initial
        return self
    def next_initial(self):
        self.state = self.next_middle
        return "one"
    def next_middle(self):
        self.state = self.next_final
        return "two"
    def next_final(self):
        raise StopIteration
    def next(self):
        return self.state()

print list(States())
# prints: ['one', 'two']
```
With Python 2.2, there is wonderful syntactic sugar for creating iterators: generators. When a generator function is first called, an iterator is returned, and from there on each invocation of next() gives the subsequent value produced by the yield statement. With generators, the two iterators above become very easy to express.
```python
from __future__ import generators   # <-- first line of file

def Counter(count):
    while count > 0:
        yield count
        count -= 1

def States():
    yield "one"
    yield "two"

print list(Counter(3))
print list(States())
# prints:
# [3, 2, 1]
# ['one', 'two']
```
An important detail here is that code which uses both iterators and generators (dump) can be expressed in a manner which works with Python 2.1, and thus can be included in Twisted's code base. One technical difference between iterators and generators is that raising an exception from a generator permanently halts the generator, while raising an exception from an iterator's next() method does not necessarily stop the iterator; that is, one could call the next() method again and possibly get results. From here on, we use the generator syntax for expressing iterators.
It is possible, and often useful, to view a data flow as a nested iterator. In this view, the 'last' iterator in the chain 'pulls' data from previous iterators in the data flow. If you wish, you may call the last iterator a consumer and the first iterator a producer. In the following example, we use the Counter generator defined above as our producer.
```python
from __future__ import generators

def Counter(count):
    while count > 0:
        yield count
        count -= 1

def Consumer():
    for result in Counter(3):
        if 2 != result:
            yield result

print list(Consumer())
# prints: [3, 1]
```
The problem with this approach, in a cooperative multi-tasking environment, is that a producer could potentially block, and if it did, the entire process would stop servicing all other requests. Thus, some mechanism for pausing the flow and resuming it later is required.
The flow module provides this ability to cooperate with other tasks by placing a control mechanism between each stage of a flow. This is accomplished, in code, by creating a wrapper object for each iterable, which one should yield before every call to next(), implicit or otherwise. During the yield, the control mechanism can take over to support the underlying cooperative multi-tasking mechanism.
```python
from __future__ import generators
import flow

def Counter(count):
    while count > 0:
        yield count
        count -= 1

def Consumer():
    producer = flow.wrap(Counter(3))
    yield producer
    for result in producer:
        if 2 != result:
            yield result
        yield producer

print list(flow.Block(Consumer))
# prints: [3, 1]
```
In the code above, producer.next() is implicitly called. It does several things, such as checking for the end of the iterator and scanning for failures. Its behavior can best be described with the more verbose version below, with Counter replaced by a simple list for brevity.
```python
from __future__ import generators
import flow

def Consumer():
    producer = flow.wrap([3,2,1])
    while 1:
        yield producer
        if producer.stop:
            break
        if producer.isFailure():
            raise producer.result
        if 2 != producer.result:
            yield producer.result

print list(flow.Block(Consumer))
# prints: [3, 1]
```
Another difference between plain old iterables and ones wrapped with the flow module is that exceptions thrown must be delayed for later delivery. This is done with twisted.python.failure.Failure. Within a for loop, failure objects are raised if their exception types were not provided to wrap(). Furthermore, Failure must also be used to send exceptions back from generators when they are recoverable.
```python
from __future__ import generators
import flow

def Producer():
    yield 1
    yield flow.Failure(IOError("recoverable"))
    yield 2
    assert 0, "asserting"
    yield 3

def Consumer():
    producer = flow.wrap(Producer(), trap=IOError)
    yield producer
    try:
        for result in producer:
            if result is IOError:
                # handle recoverable error
                pass
            else:
                yield result
            yield producer
    except flow.Failure, fail:
        # pass other failures up the stack
        fail.trap(AssertionError)
        # handle non-recoverable error
        yield str(fail.value)

print list(flow.Block(Consumer))
# prints: [1, 2, 'asserting']
```
This seems like quite a bit of effort, wrapping each iterator and then having to alter the calling sequence. Why? The answer is that it allows a flow.Cooperate object to be returned. When this happens, the entire call chain can be paused so that other flows can use the call stack. For flow.Iterator (which blocks), the implementation of Cooperate simply puts the call chain to sleep.
```python
import flow

lst = ['1', '2', flow.Cooperate(4), '3']
print list(flow.Block(lst))
# prints: ['1', '2', '3']
```
An application of Cooperate can be demonstrated with the Merge command. This simply zips two or more wrapped iterators together, without one blocking the other. In the example below, the States iterator isn't blocked by the Counter iterator.
```python
from __future__ import generators
import flow

def States():
    yield "one"
    yield "two"

def Counter(count):
    while count > 0:
        if not count % 2:
            yield flow.Cooperate()
        yield count
        count -= 1

mrg = flow.Merge(Counter(3), States())
print list(flow.Block(mrg))
# prints: [3, 'one', 'two', 2, 1]
```
The real value in flow comes not from its stand-alone use; in that case, Cooperate does very little, and the overhead imposed by flow isn't offset by added functionality. However, when flow is combined with Twisted's reactor.callLater and twisted.internet.defer.Deferred mechanisms, things get very cosy. In the example below, the first two items in the list are produced (although they are not delivered yet), other events in the reactor are allowed to proceed, and then the last item in the list is produced.
```python
from __future__ import generators
from twisted.internet import reactor
import flow

def prn(x):
    print x

d = flow.Deferred([1, 2, flow.Cooperate(1), 3])
d.addCallback(prn)
reactor.callLater(2, reactor.stop)
reactor.run()
# prints:
# [1, 2, 3]
```
While the flow module allows multiple cooperative tasks to work in a single thread, sometimes it is necessary to have the output of another thread be consumed within a flow. This can be done with the ThreadedIterator. In the following example, the Count implementation blocks within a thread by using sleep.
```python
from __future__ import generators
from twisted.internet import reactor
import flow

class Count:
    def __init__(self, count):
        self.count = count
    def __iter__(self):
        return self
    def next(self):
        # this is run in a separate thread
        from time import sleep
        sleep(.2)
        val = self.count
        if not val:
            raise flow.StopIteration
        self.count -= 1
        print "producing", val
        return val

d = flow.Deferred(flow.Threaded(Count(5)))
def prn(x):
    print "results", x
d.addCallback(prn)
reactor.callLater(4, reactor.stop)
reactor.run()
# prints:
# producing 5
# producing 4
# producing 3
# producing 2
# producing 1
# results [5, 4, 3, 2, 1]
# the blocking equivalent would be:
# list(flow.Block(flow.Threaded(Count(5))))
```
Since most standard database drivers are thread based, the flow module builds on the ThreadedIterator by providing a QueryIterator, which takes an SQL query and a ConnectionPool.
```python
from __future__ import generators
from twisted.enterprise import adbapi
from twisted.internet import reactor
import flow

dbpool = adbapi.ConnectionPool("SomeDriver", host='localhost',
                               db='Database', user='User', passwd='Password')
sql = """
  (SELECT 'one')
UNION ALL
  (SELECT 'two')
UNION ALL
  (SELECT 'three')
"""

def consumer():
    query = flow.Threaded(flow.QueryIterator(dbpool, sql))
    while 1:
        yield query
        if query.stop:
            break
        print "Processed result : ", query.result

def finish(result):
    print "Deferred Complete : ", result

f = flow.Deferred(consumer())
f.addBoth(finish)
reactor.callLater(1, reactor.stop)
reactor.run()
# prints:
# Processed result :  ('one',)
# Processed result :  ('two',)
# Processed result :  ('three',)
# Deferred Complete :  []
```