Indepth generator usage in Python (part2)

Posted on Aug 30th, 2008


Part 1: Introduction to generators in Python
Part 2: Indepth generator usage in Python (part2)
Part 3: Managing asynchronous operations with python generators (part3)

After my last post on generators in Python I realized I had skipped one thing I wanted to cover in the first part: how the return keyword interacts with yield in a generator. Take this example function and its usage:

def count_to_3or4():
	counter = 0

	while counter < 3:
		counter += 1
		yield counter

	return counter+1

c = count_to_3or4()
print c.next() # 1
print c.next() # 2
print c.next() # 3
print c.next() # 4, from return - or ?

If you've read the previous post, or have a basic understanding of generators, you would probably guess that the first three .next() calls print 1, 2 and 3, and that the fourth prints 4 from the return statement. You would be wrong. If you try to run the above code you will get this thrown back in your face:

  File "generators.py", line 15
    return counter+1
SyntaxError: 'return' with argument inside generator

So you can't use return in generators (functions containing yield)? Well, you can; you just can't use return with a value attached to it. A bare return inside a generator function ends the generator: the .next() call that reaches it raises a StopIteration exception, and so does every call after that. Take this example code:

def count_to_2or3():
	counter = 0

	while counter < 3:
		counter += 1
		yield counter

		if counter == 2:
			return

c = count_to_2or3()
print c.next() # 1
print c.next() # 2
print c.next() # 3, or ?

It will print 1 and 2. When you call .next() a third time, execution resumes after the yield, hits the return (since counter == 2, the if-clause evaluates to true) and raises StopIteration instead of printing 3. Basically, "return" inside a generator function does what "break" does inside a loop: it ends the iteration.
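A minimal sketch of the same idea, using a hypothetical take_until() helper that is not from the original post: the bare return stops the generator for good, so list() or a for loop simply ends, and any further next() call raises StopIteration. It uses the next() built-in (Python 2.6+) instead of .next() so it also runs on newer Pythons.

```python
def take_until(items, sentinel):
    # Hypothetical helper: yield items until the sentinel shows up.
    for item in items:
        if item == sentinel:
            return  # bare return: the generator is finished from here on
        yield item

# list() and for loops treat StopIteration as "no more items"
print(list(take_until([1, 2, 0, 3], 0)))  # [1, 2]

gen = take_until([1, 0, 2], 0)
print(next(gen))  # 1
try:
    next(gen)  # resumes, hits the return, raises StopIteration
except StopIteration:
    print("finished")
```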

Performance

When you use generators as iterators, together with for (or manually, for that matter), on large datasets you can see a substantial performance improvement over functions that build and return a list. These two functions produce exactly the same sequence of numbers, but one will be noticeably faster and use far less memory:

def count_to_list(stop):
	_list = []
	counter = 0

	while counter < stop:
		counter += 1
		_list.append(counter)

	return _list

def count_to_generator(stop):
	counter = 0

	while counter < stop:
		counter += 1
		yield counter

The first function builds the whole list of numbers in memory (which takes quite some time and memory) and then returns it. The second function, our generator, produces one number each time .next() is called on it, so it only needs a small, constant amount of memory, and it is a fair bit faster as well. Running this on my VPS yields (again, no pun intended) these results:

fredrik@holmstrom:~/python/generators$ time python list.py

real    0m0.611s
user    0m0.540s
sys     0m0.040s
fredrik@holmstrom:~/python/generators$ time python generator.py

real    0m0.385s
user    0m0.380s
sys     0m0.000s
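The post doesn't show what list.py and generator.py contain. A hypothetical harness along these lines, using the standard timeit module, compares the two functions from above by summing what each one produces; the compare() function and its defaults are assumptions, and absolute numbers depend on machine and Python version, so measure on your own setup.

```python
import timeit

def count_to_list(stop):
    # Build the whole list up front, as in the post.
    _list = []
    counter = 0
    while counter < stop:
        counter += 1
        _list.append(counter)
    return _list

def count_to_generator(stop):
    # Produce the same numbers lazily, one per next() call.
    counter = 0
    while counter < stop:
        counter += 1
        yield counter

def compare(stop=100000, number=3):
    # Hypothetical harness: time consuming each version with sum().
    list_time = timeit.timeit(lambda: sum(count_to_list(stop)), number=number)
    gen_time = timeit.timeit(lambda: sum(count_to_generator(stop)), number=number)
    return list_time, gen_time

if __name__ == "__main__":
    print(compare())  # (seconds for the list version, seconds for the generator)
```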

I didn't measure memory usage here, but trust me: generator.py will consume a lot less memory. This technique is called "lazy evaluation" in proper CS terms; there's a lot more to say on that topic alone, but this will do for now.
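For a rough feel of the memory difference, sys.getsizeof (Python 2.6+) can be used. It reports only the size of the object itself, not the ints a list refers to, so this sketch understates the list's real cost; the exact byte counts vary by platform, so none are shown. The last lines show the other side of lazy evaluation: an endless counter is fine as long as you only pull a few values from it.

```python
import sys
from itertools import count, islice

def count_to_generator(stop):
    # Same lazy counter as above.
    counter = 0
    while counter < stop:
        counter += 1
        yield counter

numbers_list = list(range(1, 1000001))     # all values materialized up front
numbers_gen = count_to_generator(1000000)  # nothing computed yet

# The list's pointer array alone is megabytes; the generator object stays
# small no matter how large stop is.
print(sys.getsizeof(numbers_list))
print(sys.getsizeof(numbers_gen))

# Laziness also makes unbounded sequences usable: only 5 values are computed.
print(list(islice(count(1), 5)))  # [1, 2, 3, 4, 5]
```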

Advanced usage

As I mentioned in the history section of my previous post about generators, Python 2.5 gave generators a substantial usability boost: yield became an expression, and generator objects gained the .send() and .throw() methods, which let us pass information back into the function at the yield. .send() works exactly like .next() except that it takes one argument, the value to pass back into the function. There are a few caveats you should look out for, though; take this snippet of code:

def echo():
	while True:
		print yield

Will give you the following SyntaxError:

  File "explained.py", line 3
    print yield
              ^

SyntaxError: invalid syntax

Changing the print yield to this:

def echo():
	while True:
		val = yield
		print val

Will make the code execute properly. However, it seems pretty non-pythonic to have to store the result of yield in a temporary variable just to print it, so instead we can do this:

def echo():
	while True:
		print (yield)

Wrapping the yield in parentheses lets you use its result directly instead of storing it in a temporary variable, so let's put our echo() generator to use:

def echo():
	while True:
		print (yield)

e = echo()
e.send("Hello")
e.send("World!")

But running this code will show you the second caveat of passing values back into a generator function; this TypeError will be thrown in your face:

Traceback (most recent call last):
  File "generator.py", line 6, in <module>
    e.send("Hello")
TypeError: can't send non-None value to a just-started generator

Remember how I said that yield pauses the execution of the generator function, and that calling the generator function (in this case e = echo()) doesn't execute any of its code until you call .next() on the generator object? Since .send() passes data into a yield statement that has paused the generator, we can't call .send() with a value before any code has run and no yield has paused the generator yet.

What this means in practice is that the first call on a new generator has to be .next() or .send(None). That runs the generator up to its first yield statement, where it pauses and waits for another call to .send() (or .next(), if you don't want to pass any data back) that passes data back in at that yield. Confusing? So changing the code above to this:

def echo():
	while True:
		print (yield)

e = echo()
e.send(None) # or e.next()
e.send("Hello")
e.send("World!")

Will make it run, printing:

Hello
World!
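Forgetting that first .next() or .send(None) is an easy mistake, so a common idiom (not something from this post) is a small decorator that primes the generator for you. A sketch, with a hypothetical primed() decorator and a collect() generator that records what it is sent instead of printing it; collect() also shows (yield) used inside a larger expression, where the parentheses are required.

```python
import functools

def primed(gen_func):
    # Hypothetical decorator: create the generator and advance it to its
    # first yield, so the caller can .send() a value right away.
    @functools.wraps(gen_func)
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        next(gen)  # same as gen.next() or gen.send(None)
        return gen
    return wrapper

@primed
def collect(received):
    # Append every value sent in; (yield) needs parentheses inside a call.
    while True:
        received.append((yield))

log = []
c = collect(log)
c.send("Hello")
c.send("World!")
print(log)  # ['Hello', 'World!']
```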

To illustrate exactly what's happening here, here's another example, slightly more advanced but achieving the same result as above:

def echo():
	counter = 0

	while True:
		counter += 1
		print (yield counter)

e = echo()
print "Yield nr %s" % e.send(None) 	# Sending nothing in (since we haven't paused
					# anything with yield yet) and yielding nr 1
					# back to the print statement

print "Yield nr %s" % e.send("Hello")	# the pause from nr 1 gets resumed, passing "Hello"
					# back in and printing it, then doing another loop
					# and yielding nr 2 back and pausing execution

print "Yield nr %s" % e.send("World!")	# the pause from nr 2 gets resumed, passing "World!"
					# back in and printing it, then doing another loop
					# and yielding nr 3 back and pausing execution

					# If we kept calling e.send("Blah"), etc.
					# here we could go on forever since the yield
					# statement is stuck in a "while True"-loop

Make sure to read the comments in the above code; I figured it would be a lot easier to explain if the comments were attached to the correct lines. Running the above code will yield (again, no pun intended ;p) the following results:

Yield nr 1
Hello
Yield nr 2
World!
Yield nr 3
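The same two-way flow is useful for more than printing. A sketch with a hypothetical averager() generator: each .send() passes a number in at the paused yield, and the next yield hands the updated running average back out as the return value of .send().

```python
def averager():
    # Hypothetical running-average generator.
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # hand the average out, receive the next number
        total += value
        count += 1
        average = total / count

avg = averager()
avg.send(None)       # prime: runs to the first yield, which yields None
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(60))  # 30.0
```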

Quite simple, and yet so powerful. There is one last thing I want to demonstrate in this second part of the tutorial: the .throw() method. Those of you familiar with languages other than Python might recognize the word throw and figure it has something to do with exceptions, and you'd be correct: it does.

As I've demonstrated, .send() sends data in to the paused yield statement, and .throw() does something similar: it sends in an exception, which gets raised at the line of the paused yield statement. Let's demonstrate:

def exceptional():
	while True:
		yield

e = exceptional()
e.next()
e.throw(Exception)

Will give you this output:

Traceback (most recent call last):
  File "generator.py", line 7, in <module>
    e.throw(Exception)
  File "generator.py", line 3, in exceptional
    yield
Exception

Which is correct, because you sent an Exception in. It is possible to call .throw() as the very first method on a new generator object, before any call to .next() or .send(). However, the exception is then raised before any code in the generator function has executed, so the function has no chance to handle it.

In the stack trace above you can also see that the exception is raised at the "yield" line, where the generator was paused by the first .next() call.
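To see both behaviours side by side, here's a sketch with a hypothetical guarded() generator that catches ValueError around its yield. Thrown into a fresh generator, the exception escapes straight to the caller because no code (and so no try block) has run yet; thrown after the first next(), it is raised at the paused yield and handled inside.

```python
def guarded():
    # Catches ValueError, but only once execution is inside the try.
    while True:
        try:
            yield "ready"
        except ValueError:
            print("handled inside the generator")

# Thrown before the first next(): nothing in the generator can catch it
fresh = guarded()
try:
    fresh.throw(ValueError)
except ValueError:
    print("escaped to the caller")

# Thrown after the first next(): raised at the paused yield, handled there
started = guarded()
next(started)  # same as started.next() in Python 2.5
print(started.throw(ValueError))  # prints "handled inside the generator", then "ready"
```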

Let’s do a more advanced example, with a custom exception class:

def exceptional():
	counter = 0
	while True:
		try:
			counter += 1
			yield counter
		except DemoException, exc:
			print "Caught exception with message: %s" % exc

class DemoException(Exception):
	pass

e = exceptional()
print e.next()
print e.throw(DemoException("Hello World"))

The above code will print this:

1
Caught exception with message: Hello World
2

And here's the magic: if you handle the exception that gets raised at the yield line (by wrapping the yield in a try/except block), the generator continues executing as normal, and .throw() returns the value produced by the next yield. All in all, .send() and .throw() work the same way, except that .throw() raises whatever you feed it as an exception inside the generator.
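Because .throw() returns whatever the next yield produces, an exception can even serve as a control signal rather than an error. A sketch with a hypothetical Reset exception and resettable_counter() generator, a variation on the example above:

```python
class Reset(Exception):
    # Hypothetical exception used as a signal to the generator.
    pass

def resettable_counter():
    counter = 0
    while True:
        try:
            counter += 1
            yield counter
        except Reset:
            counter = 0  # start over; the loop then yields 1 again

c = resettable_counter()
print(next(c))         # 1
print(next(c))         # 2
print(c.throw(Reset))  # 1, the value of the next yield
print(next(c))         # 2
```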

The ability to pass errors (exceptions) *into* generators allows some really neat error handling where the wrapping code doesn't need to know anything about the generator's internals, resulting in very clean and loosely coupled code.
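A sketch of that decoupling, with hypothetical tally() and feed() functions: the driver only forwards good values with .send() and conversion failures with .throw(), and the generator alone decides what a failure means (here it just counts it).

```python
def tally():
    # Consumer: keeps a running total and counts bad inputs reported to it.
    total = 0
    errors = 0
    while True:
        try:
            value = yield (total, errors)
            total += value
        except ValueError:
            errors += 1

def feed(consumer, raw_items):
    # Hypothetical driver: knows nothing about tally()'s internals.
    state = next(consumer)  # prime the consumer (consumer.next() in 2.5)
    for raw in raw_items:
        try:
            number = int(raw)
        except ValueError as exc:
            state = consumer.throw(exc)  # let the consumer handle the error
        else:
            state = consumer.send(number)
    return state

print(feed(tally(), ["1", "2", "oops", "4"]))  # (7, 1)
```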

In the next, and last, part I will go through a real-world example using asynchronous I/O and network calls, utilizing all the techniques explained in these two posts.

4 Responses to “Indepth generator usage in Python (part2)”

  2. Paddy3118 on 30 Aug 2008 at 9:10 pm

    You say:
    if counter is 2
    In one of your examples instead of:
    if counter == 2
    Although C-Python may well work with what you state I think it would be better to say the latter because, in general, ‘is’ cannot be used instead of ‘==’ when comparing integers.

    Try the following example:

    >>> x=2000
    >>> y = 123456 - 121456
    >>> x is y
    False
    >>> x == y
    True
    >>> x,y
    (2000, 2000)
    >>> id(x),id(y)
    (11789976, 12382812)
    >>>

    - Paddy.

  4. David Jones on 04 Jan 2009 at 5:43 pm

    Paddy is right. “counter is 2” is a bug. A dangerous sleeping bug.

    See http://drj11.wordpress.com/2007/06/11/python-perils-of-«x-is-1»/

    Nice article by the way.
